🎉 Use multi-modal vision-language models to analyze and detect any object in a video!
✨ Features
- Upload and analyze videos through an intuitive web interface
- Real-time frame-by-frame analysis using multimodal AI (sketched after this list)
- Natural language object description support
- Visual results display with confidence scores
- Image preprocessing for better detection accuracy
- Streaming response for real-time analysis feedback
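At its core, the frame-by-frame analysis above boils down to sampling frames with OpenCV and asking the vision model about each one. A minimal sketch, assuming the `ollama` Python client and the `llama3.2-vision` model from the setup below; the sampling interval, prompt wording, and function name are illustrative, not the project's exact code:

```python
import cv2
import ollama  # official Ollama Python client


def analyze_video(path: str, query: str, frame_interval: int = 30):
    """Yield (frame index, model answer) for every sampled frame."""
    cap = cv2.VideoCapture(path)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of video
        if frame_idx % frame_interval == 0:
            # Encode the frame as JPEG bytes so it can be sent to the model
            encoded, buf = cv2.imencode(".jpg", frame)
            if encoded:
                response = ollama.chat(
                    model="llama3.2-vision",
                    messages=[{
                        "role": "user",
                        "content": f"Does this image contain {query}? "
                                   "Answer yes/no and give a confidence score.",
                        "images": [buf.tobytes()],
                    }],
                )
                yield frame_idx, response["message"]["content"]
        frame_idx += 1
    cap.release()
```

Preprocessing steps (resizing, contrast adjustment) would slot in just before the encode step; streaming the yielded results back to the browser is sketched in the Usage section.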
🚀 Getting Started
Prerequisites
- Python 3.8+
- Ollama with the Llama 3.2 Vision model installed
- OpenCV
Installation
- Clone the repository

```bash
git clone https://github.com/JYe9/ollama_vlm_monitoring.git
cd ollama_vlm_monitoring
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Make sure Ollama is running with the Llama 3.2 Vision model (see the check after these steps)

```bash
ollama run llama3.2-vision
```

- Start the application

```bash
python main.py
```

- Access the web interface at http://localhost:8000
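If the analysis step fails, a common cause is that the model is not available to the Ollama server. You can verify it before starting the app; the Ollama HTTP API listens on port 11434 by default:

```bash
# List models known to the local Ollama server; llama3.2-vision should appear
curl http://localhost:11434/api/tags

# Pull the model explicitly if it is missing
ollama pull llama3.2-vision
```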
🛠️ Usage
- Open the web interface
- Upload a video file
- Enter a description of the object/person you want to find
- Click "Start Analysis"
- View results as they appear in real time (streamed from the server, as sketched below)
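On the server side, the real-time feedback maps naturally onto FastAPI's StreamingResponse: each frame's result is emitted as soon as the model returns it. A rough sketch reusing the generator from the Features section; the route path, form fields, and module name are assumptions, not necessarily the project's actual API:

```python
import json
import shutil
import tempfile

from fastapi import FastAPI, File, Form, UploadFile
from fastapi.responses import StreamingResponse

from analysis import analyze_video  # hypothetical module holding the generator sketched earlier

app = FastAPI()


@app.post("/analyze")  # hypothetical route; the real app may use a different path
async def analyze(video: UploadFile = File(...), query: str = Form(...)):
    # Persist the upload so OpenCV can open it from disk
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
        shutil.copyfileobj(video.file, tmp)
        path = tmp.name

    def event_stream():
        # One JSON line per analyzed frame, sent as soon as it is ready
        for frame_idx, answer in analyze_video(path, query):
            yield json.dumps({"frame": frame_idx, "result": answer}) + "\n"

    return StreamingResponse(event_stream(), media_type="application/x-ndjson")
```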
📦 Dependencies
- FastAPI
- OpenCV
- Ollama
- Jinja2
- uvicorn
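For reference, an unpinned requirements.txt matching the list above might look like the following; the repository's own file is authoritative for exact package names and versions (python-multipart is an assumption here, not in the list above):

```
fastapi
uvicorn
jinja2
opencv-python
ollama
python-multipart  # assumption: needed by FastAPI for file upload forms
```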