This project focuses on enhancing the audio quality of videos using a combination of advanced AI-powered techniques. It leverages Azure OpenAI to extract audio from videos, convert it into a transcript, correct grammar, eliminate filler words, and generate refined audio, which is then seamlessly synchronized back to the original video. The project aims to provide content creators and educators with polished, professional-quality videos by improving audio clarity and synchronization.
High-quality audio is critical for creating impactful video content, particularly for educators and creators. This project addresses common audio imperfections, such as grammatical errors, filler words, and synchronization mismatches. By integrating Azure OpenAI and cutting-edge tools like MoviePy, the solution provides a streamlined workflow to refine video audio. This enhances the overall video quality, ensuring a professional presentation.
The script extracts the audio track from the input video using MoviePy, saving it as a separate audio file.
The extracted audio is processed using a speech recognition engine to generate a transcript of the spoken content.
The transcript is refined using Azure OpenAI, which corrects grammatical errors and removes filler words (e.g., "uh", "um", "hmm") to improve clarity.
The cleaned transcript is converted back into an audio file using a text-to-speech engine.
The newly generated audio is perfectly synchronized with the original video, ensuring seamless integration without any delays or mismatches.
Test Videos: Various sample videos were tested, including short lectures, interviews, and casual content.
Performance Metrics:
Accuracy of Speech-to-Text conversion.
Quality of Grammar Correction and Filler Word Removal.
Seamlessness of Audio-Video Synchronization.
Tools and Libraries: MoviePy, Azure OpenAI, and Text-to-Speech engines.
Achieved high accuracy in converting speech to text and correcting grammar using Azure OpenAI.
Successfully removed filler words, resulting in more polished transcripts.
Generated refined audio perfectly synchronized with the original video.
Significant improvement in video audio quality, ensuring a professional output.
The Video Audio Enhancer project demonstrates an effective method for refining video audio using AI tools like Azure OpenAI. By automating processes such as grammar correction, filler word removal, and audio-video synchronization, the project provides an invaluable tool for creators and educators. Future enhancements could include support for multiple languages and real-time processing.
GitHub Repository: https://github.com/skstanwar/Curious-PM-
Documentation Page: https://skstanwar.github.io/Curious-PM-/
Flow Explanation Diagram: https://miro.com/app/live-embed/uXjVLRKRgGg=/?moveToViewport=-1247,-525,1837,912&embedId=383609660988
Extracts audio from the input video and converts it into text using a speech recognition engine.
Corrects grammatical errors in the transcript using Azure OpenAI.
Removes common filler words such as "uh", "um", and "hmm" from the transcript to improve clarity.
Converts the cleaned transcript back into audio.
Ensures the new audio is perfectly synchronized with the original video, without any delay or mismatch.
There are no datasets linked
There are no datasets linked