# Multi-Modal Vision-Language Monitoring System

Kristian Ye · Dec 26, 2024 · Apache 2.0

Tags: Computer Vision · Llama 3.2 Vision · Object Tracking · Ollama · Vision Language

## 🎉 Use Multi-Modal Vision-Language Models to Analyze and Detect Any Object in a Video!

## ✨ Features

- Upload and analyze videos through an intuitive web interface
- Real-time frame-by-frame analysis using multimodal AI
- Natural language object description support
- Visual results display with confidence scores
- Image preprocessing for better detection accuracy
- Streaming responses for real-time analysis feedback

## 🚀 Getting Started

### Prerequisites

- Python 3.8+
- Ollama with the Llama Vision model installed
- OpenCV

### Installation

1. Clone the repository:

```bash
git clone https://github.com/JYe9/ollama_vlm_monitoring.git
cd ollama_vlm_monitoring
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```

3. Make sure Ollama is running with the Llama Vision model:

```bash
ollama run llama3.2-vision
```

4. Start the application:

```bash
python main.py
```

5. Access the web interface at http://localhost:8000

## 🛠️ Usage

1. Open the web interface
2. Upload a video file
3. Enter a description of the object/person you want to find
4. Click "Start Analysis"
5. View results as they appear in real time

## 📦 Dependencies

- FastAPI
- OpenCV
- Ollama
- Jinja2
- uvicorn
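The repository's actual implementation is not reproduced here, but the frame-by-frame analysis described above could look roughly like the sketch below, using OpenCV for frame extraction and the `ollama` Python client to query `llama3.2-vision`. The function name `analyze_video`, the prompt wording, and the frame-sampling interval are illustrative assumptions, not taken from the project.

```python
# Minimal sketch of a frame-by-frame VLM analysis loop (assumed design,
# not the repository's actual code). Requires `pip install opencv-python ollama`
# and a running Ollama server with the llama3.2-vision model pulled.
import cv2
import ollama


def analyze_video(video_path: str, target_description: str, frame_step: int = 30):
    """Yield (frame_index, model_verdict) for every `frame_step`-th frame."""
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if frame_idx % frame_step == 0:
                # Encode the frame as JPEG bytes so it can be sent to the model.
                encoded, buffer = cv2.imencode(".jpg", frame)
                if encoded:
                    response = ollama.chat(
                        model="llama3.2-vision",
                        messages=[{
                            "role": "user",
                            "content": (
                                f"Does this frame contain: {target_description}? "
                                "Answer yes or no and give a confidence from 0 to 1."
                            ),
                            "images": [buffer.tobytes()],
                        }],
                    )
                    yield frame_idx, response["message"]["content"]
            frame_idx += 1
    finally:
        cap.release()
```

The streaming-feedback feature could then be wired to the web UI with a FastAPI endpoint along the lines of the next sketch. The route path `/analyze`, the form field names, and the temporary-file handling are likewise assumptions; it reuses the hypothetical `analyze_video` generator from the sketch above.

```python
# Sketch of a streaming analysis endpoint (assumed route and field names).
# Needs `pip install fastapi uvicorn python-multipart`.
import shutil
import tempfile

from fastapi import FastAPI, File, Form, UploadFile
from fastapi.responses import StreamingResponse

app = FastAPI()


@app.post("/analyze")
async def analyze(video: UploadFile = File(...), description: str = Form(...)):
    # Persist the upload to a temporary file so OpenCV can open it by path.
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
        shutil.copyfileobj(video.file, tmp)
        video_path = tmp.name

    def stream():
        # `analyze_video` is the hypothetical generator from the previous sketch.
        for frame_idx, verdict in analyze_video(video_path, description):
            yield f"frame {frame_idx}: {verdict}\n"

    return StreamingResponse(stream(), media_type="text/plain")
```

Because `StreamingResponse` flushes each yielded line as it is produced, the browser can display per-frame results while later frames are still being analyzed, which matches the "real-time analysis feedback" behavior listed in the features.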