●5 reads●MIT License
Enhancing Audio Transcription and Speaker Diarization with OpenAI Whisper: A Case Study
Table of contents
🎙️ Audio Processing Application
A powerful audio processing application that transcribes, analyzes, and translates conversations with speaker diarization support. Available in both Uzbek and English interfaces.
🌟 Features
- Audio Transcription: Utilizes OpenAI's Whisper model for accurate speech-to-text conversion
- Speaker Diarization: Automatically identifies and labels different speakers in conversations
- Language Processing:
- Handles mixed Kazakh-Uzbek conversations
- Translates to pure Uzbek while preserving original structure
- Smart Analysis:
- Generates comprehensive conversation summaries
- Identifies key topics and points
- Analyzes speaker patterns and interactions
- User-Friendly Interface:
- Intuitive file upload system
- Real-time processing status
- Downloadable results in multiple formats
🚀 Getting Started
Prerequisites
python >= 3.8 openai streamlit python-dotenv
Installation
- Clone the repository:
git clone https://github.com/mustafoyev-202/OPENAI_STT.git cd audio-processor
- Install dependencies:
pip install -r requirements.txt
- Create a
.env
file in the root directory:
OPENAI_API_KEY=your_openai_api_key_here
Running the Application
streamlit run app.py
For the English version:
streamlit run main.py
💡 Usage
- Launch the application using the command above
- Upload an audio file (supported formats: MP3, WAV, M4A)
- Wait for the processing to complete
- View the results:
- Original transcription
- Speaker-labeled conversation
- Translated text (for Uzbek version)
- Conversation analysis and summary
- Download the results using the provided buttons
🔧 Configuration
The application can be configured through environment variables:
OPENAI_API_KEY
: Your OpenAI API key (required)- Additional Streamlit configurations can be set in
.streamlit/config.toml
🌍 Language Support
Uzbek Version (app.py)
- Transcribes mixed Kazakh-Uzbek conversations
- Translates to pure Uzbek
- Provides interface in Uzbek language
English Version (main.py)
- Transcribes English conversations
- Provides interface in English language
- Focuses on speaker diarization and analysis
🔍 Technical Details
The application uses several advanced technologies:
- OpenAI Whisper: For accurate speech-to-text conversion
- GPT-4: For speaker diarization and text analysis
- Streamlit: For the web interface
- Regular Expressions: For text formatting and cleaning
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/AmazingFeature
) - Commit your changes (
git commit -m 'Add some AmazingFeature'
) - Push to the branch (
git push origin feature/AmazingFeature
) - Open a Pull Request
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
⚠️ Important Notes
- Ensure your audio files are clear and of good quality
- Large audio files may take longer to process
- API usage is subject to OpenAI's pricing and rate limits
🙏 Acknowledgments
- OpenAI for providing the API services
- Streamlit for the awesome framework
- Contributors and users of this application
Models
Datasets
There are no datasets linked