Feb 20, 2025●5 reads●MIT License

Enhancing Audio Transcription and Speaker Diarization with OpenAI Whisper: A Case Study

#STT #Diarization

b
@baxtiyormustafoyev20

🎙️ Audio Processing Application

A powerful audio processing application that transcribes, analyzes, and translates conversations with speaker diarization support. Available in both Uzbek and English interfaces.

🌟 Features

Audio Transcription: Utilizes OpenAI's Whisper model for accurate speech-to-text conversion
Speaker Diarization: Automatically identifies and labels different speakers in conversations
Language Processing:
- Handles mixed Kazakh-Uzbek conversations
- Translates to pure Uzbek while preserving original structure
Smart Analysis:
- Generates comprehensive conversation summaries
- Identifies key topics and points
- Analyzes speaker patterns and interactions
User-Friendly Interface:
- Intuitive file upload system
- Real-time processing status
- Downloadable results in multiple formats

🚀 Getting Started

Prerequisites

python >= 3.8
openai
streamlit
python-dotenv

Installation

Clone the repository:

git clone https://github.com/mustafoyev-202/OPENAI_STT.git
cd audio-processor

Install dependencies:

pip install -r requirements.txt

Create a .env file in the root directory:

OPENAI_API_KEY=your_openai_api_key_here

Running the Application

streamlit run app.py

For the English version:

streamlit run main.py

💡 Usage

Launch the application using the command above
Upload an audio file (supported formats: MP3, WAV, M4A)
Wait for the processing to complete
View the results:
- Original transcription
- Speaker-labeled conversation
- Translated text (for Uzbek version)
- Conversation analysis and summary
Download the results using the provided buttons

🔧 Configuration

The application can be configured through environment variables:

OPENAI_API_KEY: Your OpenAI API key (required)
Additional Streamlit configurations can be set in .streamlit/config.toml

🌍 Language Support

Uzbek Version (app.py)

Transcribes mixed Kazakh-Uzbek conversations
Translates to pure Uzbek
Provides interface in Uzbek language

English Version (main.py)

Transcribes English conversations
Provides interface in English language
Focuses on speaker diarization and analysis

🔍 Technical Details

The application uses several advanced technologies:

OpenAI Whisper: For accurate speech-to-text conversion
GPT-4: For speaker diarization and text analysis
Streamlit: For the web interface
Regular Expressions: For text formatting and cleaning

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

⚠️ Important Notes

Ensure your audio files are clear and of good quality
Large audio files may take longer to process
API usage is subject to OpenAI's pricing and rate limits

🙏 Acknowledgments

OpenAI for providing the API services
Streamlit for the awesome framework
Contributors and users of this application

Models

OPENAI STT

Datasets

There are no datasets linked