Video-to-Text Transcription Using Hugging Face Models