This project uses the YOLOv8 model for real-time object detection and integrates it with Microsoft Azure services. The system also provides descriptive analysis of detected objects using OpenAI GPT and generates audio output via Azure Text-to-Speech (TTS).
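The detection step relies on YOLOv8; `yolo_model.py` (listed in the project structure below) holds the model loading and detection logic. A minimal sketch of that step, assuming the `ultralytics` package and the `yolov8n.pt` weights — the repository may use a different variant or interface:

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8 model (yolov8n.pt is an assumption; the repo may use another variant).
model = YOLO("yolov8n.pt")

def detect_objects(image_path: str) -> list[dict]:
    """Run YOLOv8 on one image and return detected class names with confidences."""
    results = model(image_path)  # inference on a single image
    detections = []
    for box in results[0].boxes:
        cls_id = int(box.cls[0])
        detections.append({
            "label": results[0].names[cls_id],
            "confidence": float(box.conf[0]),
        })
    return detections

if __name__ == "__main__":
    print(detect_objects("sample.jpg"))  # hypothetical test image
```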
Clone the repository and install the dependencies:

    git clone https://github.com/jihyeon26/Vision_detect_project.git
    cd Vision_detect_project
    pip install -r requirements.txt

Create a `.env` file in the project root with the following variables:

    OPENAI_ENDPOINT=https://your-openai-endpoint
    OPENAI_API_KEY=your_openai_api_key
    DEPLOYMENT_NAME=your-deployment-name
    SPEECH_ENDPOINT=https://your-speech-endpoint
    SPEECH_API_KEY=your_speech_api_key

Then start the application:

    python main.py
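The variables in `.env` are presumably read by `config.py`, listed in the project structure below as the environment variable loader. A minimal sketch of such a loader, assuming the `python-dotenv` package — the actual implementation may differ:

```python
import os
from dotenv import load_dotenv

# Read key/value pairs from the .env file into the process environment.
load_dotenv()

OPENAI_ENDPOINT = os.getenv("OPENAI_ENDPOINT")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
DEPLOYMENT_NAME = os.getenv("DEPLOYMENT_NAME")
SPEECH_ENDPOINT = os.getenv("SPEECH_ENDPOINT")
SPEECH_API_KEY = os.getenv("SPEECH_API_KEY")
```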
The repository is organized as follows:

    Vision_detect_project/
    │
    ├── main.py             # Main entry point for the application
    ├── yolo_model.py       # YOLO model loading and object detection logic
    ├── openai_service.py   # Communication with OpenAI's GPT for analysis
    ├── speech_service.py   # Communication with Azure TTS for audio generation
    ├── gradio_ui.py        # Gradio interface and event listeners
    ├── config.py           # Environment variable loader
    ├── requirements.txt    # Python dependencies
    ├── README.md           # Project documentation
    └── .env                # Environment variables (not included in the repo)
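`openai_service.py` handles the GPT analysis step. A minimal sketch of how detected labels might be turned into a description, assuming the Azure OpenAI chat completions API via the official `openai` package — the API version, prompt, and function name are illustrative, not taken from the repository:

```python
import os
from openai import AzureOpenAI

# Client configured from the same environment variables defined in .env.
client = AzureOpenAI(
    azure_endpoint=os.getenv("OPENAI_ENDPOINT"),
    api_key=os.getenv("OPENAI_API_KEY"),
    api_version="2024-02-01",  # assumed API version
)

def describe_detections(labels: list[str]) -> str:
    """Ask GPT for a short natural-language description of the detected objects."""
    response = client.chat.completions.create(
        model=os.getenv("DEPLOYMENT_NAME"),  # Azure deployment name, not a model id
        messages=[
            {"role": "system", "content": "You describe a scene from a list of detected objects."},
            {"role": "user", "content": f"Detected objects: {', '.join(labels)}. "
                                        "Describe the scene in one or two sentences."},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(describe_detections(["person", "bicycle", "dog"]))
```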