The Dramamine Style Reel Generator is a web-based application that creates stylized, Instagram-style reels from audio files with synchronized text overlays. It supports English, Hindi, and Tamil, and provides automatic speech-to-text transcription, romanization of non-English text, dynamic overlays, moody video filters, and background music integration. Word-level timestamps from OpenAI's Whisper API keep the text overlays aligned with the speech. The project aims to simplify content creation by automating video generation, making high-quality reels accessible without advanced video editing skills. This submission is for the ReadyTensor Agentic AI 2025 competition.
Audio Processing
Users upload an audio file (MP3, WAV, M4A, or MP4) through the web interface.
The application leverages OpenAI's Whisper API for speech-to-text transcription, extracting text along with precise timestamps.
Non-English transcriptions (Hindi, Tamil) undergo romanization using OpenAI's language models to enhance readability and accessibility.
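The transcription step above can be sketched as a small normalization routine. This is a minimal sketch, not the project's actual code: it assumes the transcription result has already been converted to a dictionary with a "words" list whose items carry "word", "start", and "end" fields (the shape of a verbose JSON response with word-level timestamps). The names TimedWord, extract_words, and needs_romanization are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# ISO 639-1 codes for the three languages the application supports.
LANGUAGE_CODES = {"English": "en", "Hindi": "hi", "Tamil": "ta"}


@dataclass
class TimedWord:
    """One transcribed word with its start and end time in seconds."""
    text: str
    start: float
    end: float


def needs_romanization(language: str) -> bool:
    """Only non-English transcriptions go through the romanization step."""
    return LANGUAGE_CODES[language] != "en"


def extract_words(response: dict) -> list[TimedWord]:
    """Turn a transcription response dict into a clean, time-ordered word list.

    Assumed input shape (hypothetical example):
    {"text": "...", "words": [{"word": "hello", "start": 0.0, "end": 0.4}, ...]}
    """
    words = []
    for item in response.get("words", []):
        text = item["word"].strip()
        if not text:
            continue  # skip whitespace-only entries
        words.append(TimedWord(text, float(item["start"]), float(item["end"])))
    # Sort by start time so later stages can rely on chronological order.
    return sorted(words, key=lambda w: w.start)
```

Keeping the words in a small typed structure means the overlay and assembly stages never need to know which transcription service produced them.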
Text Overlay Generation
The transcribed text is split into natural phrases to ensure fluid reading.
A variety of custom fonts (stored in the ./fonts directory) are used to enhance aesthetic appeal.
The text is dynamically positioned with randomized animations, making each video unique and engaging.
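One way the phrase splitting and randomized placement described above could work is sketched below. The grouping rule (break on punctuation, on a long pause, or after a maximum number of words), the thresholds, the animation names, and the 1080x1920 frame size are all assumptions made for illustration; the project's actual rules are not specified here.

```python
import random

# Hypothetical animation names, used only for illustration.
ANIMATIONS = ("fade", "slide", "pop")


def split_into_phrases(words, max_words=4, max_gap=0.6):
    """Group (text, start, end) tuples into short phrases.

    A phrase ends after punctuation, before a pause longer than
    max_gap seconds, or once it already holds max_words words.
    """
    groups, current = [], []
    for text, start, end in words:
        long_pause = bool(current) and start - current[-1][2] > max_gap
        if current and (long_pause or len(current) >= max_words):
            groups.append(current)
            current = []
        current.append((text, start, end))
        if text.endswith((".", ",", "!", "?")):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return [
        {"text": " ".join(w[0] for w in g), "start": g[0][1], "end": g[-1][2]}
        for g in groups
    ]


def pick_style(rng, fonts, frame_size=(1080, 1920), margin=0.1):
    """Choose a random font, position, and animation for one phrase.

    The margin keeps the anchor point away from the frame edges.
    """
    width, height = frame_size
    return {
        "font": rng.choice(fonts),
        "x": round(rng.uniform(margin, 1 - margin) * width),
        "y": round(rng.uniform(margin, 1 - margin) * height),
        "animation": rng.choice(ANIMATIONS),
    }
```

Passing a seeded random.Random instance to pick_style makes a given reel reproducible, while an unseeded one gives each video a different look.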
Video Assembly
Random background video clips are selected from the ./videos directory to provide visually appealing content.
Text overlays are synchronized with the transcribed timestamps, ensuring accurate placement in the video.
Video effects such as vignette filters, dynamic text animations, and motion effects are applied.
Background music (DramamineFM.mp3 from the ./audio directory) is mixed with the original audio to enhance the user experience.
The final video is processed and stored in the output/ directory, ready for download.
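Two of the assembly steps above, the vignette filter and the music mix, reduce to simple arithmetic. The sketch below is illustrative only and uses plain Python rather than whatever video library the project relies on: it assumes a quadratic radial falloff for the vignette and additive mixing with looped, attenuated music. The function names and default values (strength, music_gain) are hypothetical.

```python
import math


def vignette_weight(x, y, width, height, strength=0.5):
    """Brightness multiplier for pixel (x, y): 1.0 at the center,
    falling to 1.0 - strength at the corners (quadratic falloff)."""
    cx = (width - 1) / 2 or 1.0   # fall back to 1.0 for 1-pixel axes
    cy = (height - 1) / 2 or 1.0
    # Normalized distance: 0 at the center, 1 at the corners.
    dist = math.hypot((x - (width - 1) / 2) / cx,
                      (y - (height - 1) / 2) / cy) / math.sqrt(2)
    return 1.0 - strength * dist ** 2


def mix_with_music(voice, music, music_gain=0.25):
    """Mix voice samples with background music.

    Samples are floats in [-1.0, 1.0]. The music is looped to cover the
    voice track, scaled by music_gain, and the sum is clipped to range.
    """
    if not music:
        return list(voice)
    mixed = []
    for i, sample in enumerate(voice):
        value = sample + music_gain * music[i % len(music)]
        mixed.append(max(-1.0, min(1.0, value)))
    return mixed
```

Keeping the music gain well below 1.0 leaves the spoken audio dominant, and the clipping step guards against distortion when both tracks peak at once.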
Web Application & Deployment
A Flask-based web server provides an intuitive UI for users to upload audio and select language preferences.
The backend processes the audio, applies effects, and generates the video on the server as soon as the upload is received.
Users can preview and download the final video after processing.
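Before any processing starts, the server has to check what was uploaded. A minimal sketch of such a check is shown below, using the accepted formats, the three supported languages, and the 16MB limit stated in this document. The function name, the error messages, and the reading of 16MB as 16 * 1024 * 1024 bytes are assumptions. In a Flask application the size cap could also be enforced through the MAX_CONTENT_LENGTH configuration key.

```python
import os

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".mp4"}
SUPPORTED_LANGUAGES = {"English", "Hindi", "Tamil"}
MAX_UPLOAD_BYTES = 16 * 1024 * 1024  # assumed meaning of the 16MB limit


def validate_upload(filename, size_bytes, language):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        errors.append(f"Unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        errors.append("File exceeds the 16MB upload limit")
    if language not in SUPPORTED_LANGUAGES:
        errors.append(f"Unsupported language: {language}")
    return errors
```

Returning every problem at once, rather than stopping at the first, lets the UI show the user everything that needs fixing in a single response.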
Key Achievements
Successfully automates reel generation, reducing manual editing effort.
Provides accurate and timed text overlays using speech-to-text AI.
Supports multi-language processing with romanization for non-English text.
Enhances videos with dynamic animations, moody filters, and background music.
Offers a simple Flask-based UI that non-technical users can operate without guidance.
Performance Metrics
Processing Time: Average video generation time is 1-2 minutes per minute of audio.
Accuracy: Whisper transcription exceeds 90% word-level accuracy on clean audio.
User Experience: Feedback indicates high usability, especially for content creators seeking quick and aesthetic video production.
Limitations & Future Improvements
Current Limitations:
Maximum audio file size: 16MB.
Supports only English, Hindi, and Tamil (expansion needed for additional languages).
Processing speed depends on hardware and API response times.
Future Enhancements:
Expand to more languages for wider adoption.
Improve video effects and filters for enhanced aesthetics.
Introduce custom font and style selection for more user control.
Implement cloud-based processing for higher efficiency.