Ona Vision is a real-time object detection and tracking system built using YOLOv8 and DeepSORT. It captures live video, detects objects, tracks them across frames, and streams processed video over the network. The system integrates observability using Prometheus to monitor performance metrics like FPS, inference time, and resource usage.
Designed as a full MLOps pipeline, Ona Vision covers model training, deployment, scaling, and monitoring. It leverages Docker, Kubernetes, and GitOps principles to ensure scalability and reliability for production environments.
Ona Vision was inspired by a personal experience where I lost my AirPods and had to manually sift through hours of CCTV footage, an inefficient and frustrating process. This project aims to automate and simplify video surveillance by enabling real-time detection and tracking of specific objects, making environments safer and security systems smarter.
```mermaid
flowchart TD
    A[Webcam] --> B[Server: Capture Frames]
    B --> C[YOLOv8: Object Detection]
    C --> D[Add Bounding Boxes & Labels]
    D --> E[Serialize Processed Frame]
    E --> F[Send Frame over Socket]
    F --> G[Client: Receive & Display Video]
    B --> H[Prometheus Metrics Collection]
    H --> I[Performance & Inference Stats]

    style A fill:#dbe9ff,stroke:#333,stroke-width:1px,color:#000
    style B fill:#a3c4f3,stroke:#333,stroke-width:1px,color:#000
    style C fill:#8ccf7e,stroke:#333,stroke-width:1px,color:#000
    style D fill:#a3c4f3,stroke:#333,stroke-width:1px,color:#000
    style E fill:#a3c4f3,stroke:#333,stroke-width:1px,color:#000
    style F fill:#a3c4f3,stroke:#333,stroke-width:1px,color:#000
    style G fill:#f7c978,stroke:#333,stroke-width:1px,color:#000
    style H fill:#f4dbb7,stroke:#333,stroke-width:1px,color:#000
    style I fill:#f4dbb7,stroke:#333,stroke-width:1px,color:#000
```
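To make the flow above concrete, here is a minimal sketch of the server-side loop, assuming the `ultralytics` and `opencv-python` packages. The host, port, and pickle-based framing are illustrative assumptions, not the repository's actual implementation:

```python
import pickle
import socket
import struct

import cv2
from ultralytics import YOLO

HOST, PORT = "0.0.0.0", 9999          # hypothetical listen address

model = YOLO("yolov8n.pt")            # small pretrained YOLOv8 model
cap = cv2.VideoCapture(0)             # webcam

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((HOST, PORT))
server.listen(1)
conn, _ = server.accept()

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame)            # YOLOv8 inference
    annotated = results[0].plot()     # draw bounding boxes and labels
    payload = pickle.dumps(annotated) # serialize the processed frame
    # Length-prefix each frame so the client can reassemble it.
    # (pickle keeps the sketch short; avoid it with untrusted peers.)
    conn.sendall(struct.pack(">I", len(payload)) + payload)
```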
```bash
pip install -r requirements.txt

python main.py     # Start detection server
python client.py   # Display real-time video stream

cd ui
python app.py      # Launch Flask web UI at http://localhost:5000
```
Then visit http://localhost:5000 in your browser.
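For context, here is a minimal sketch of what a socket client like `client.py` might do, assuming the length-prefixed pickle framing from the server sketch above; it is an illustration, not the repository's actual code:

```python
import pickle
import socket
import struct

import cv2

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("localhost", 9999))     # must match the server's port

def recv_exact(n: int) -> bytes:
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed")
        buf += chunk
    return buf

while True:
    (length,) = struct.unpack(">I", recv_exact(4))  # 4-byte frame size
    frame = pickle.loads(recv_exact(length))        # annotated image array
    cv2.imshow("Ona Vision", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):           # press 'q' to quit
        break
```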
| Metric | Description |
|---|---|
| FPS | Frames per second for performance monitoring |
| Inference Time | Time taken for YOLOv8 inference per frame |
| CPU Usage | System CPU utilization during processing |
| Memory Usage | System memory consumption during processing |
| Detection Confidence | Average confidence of detected objects per frame |
| Class-wise Object Count | Number of detected objects per class |
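As one way to expose these metrics, the `prometheus_client` library can be wired in roughly like this; the metric names and the `psutil`-based resource sampling are assumptions for illustration, not necessarily what the repository uses:

```python
import psutil
from prometheus_client import Counter, Gauge, Histogram, start_http_server

FPS = Gauge("ona_fps", "Frames processed per second")
INFERENCE_TIME = Histogram("ona_inference_seconds", "YOLOv8 inference time per frame")
CPU_USAGE = Gauge("ona_cpu_percent", "System CPU utilization")
MEMORY_USAGE = Gauge("ona_memory_percent", "System memory utilization")
CONFIDENCE = Gauge("ona_mean_confidence", "Average detection confidence per frame")
DETECTIONS = Counter("ona_detections_total", "Detected objects by class", ["cls"])

start_http_server(8000)  # Prometheus scrapes http://localhost:8000/metrics

def record_frame(results, infer_seconds: float, frame_seconds: float) -> None:
    """Update all metrics after one processed frame."""
    INFERENCE_TIME.observe(infer_seconds)
    FPS.set(1.0 / frame_seconds if frame_seconds else 0.0)
    CPU_USAGE.set(psutil.cpu_percent())
    MEMORY_USAGE.set(psutil.virtual_memory().percent)
    confs = [float(b.conf) for b in results[0].boxes]
    if confs:
        CONFIDENCE.set(sum(confs) / len(confs))
    for box in results[0].boxes:
        DETECTIONS.labels(cls=results[0].names[int(box.cls)]).inc()
```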
You can visit my repository at the link below.
https://github.com/josiah-mbao/Ona-Vision
To see a demo of the system in action, click the link below. The demo uses video footage from Pixabay, with inference running on T4 GPUs in Google Colab.
🎥 Click here to view the demo video
Ona Vision is designed to provide real-time object detection and tracking for a variety of scenarios, making it a versatile tool for industries and individuals alike.
Ona Vision was born from a real and frustrating experience: losing my AirPods and spending hours scrubbing through CCTV footage at my university, only to walk away with nothing. That process felt deeply inefficient and made me wonder: with so many cameras deployed in campuses, businesses, and public spaces, are we really using that video data to its full potential?
My long-term vision for Ona Vision is to enrich video data with insights and make it searchable: imagine being able to ask, "Show me all instances of a person picking up an AirPod on Tuesday between 2–4 PM." This would transform how we interact with surveillance footage, turning it into an intelligent, searchable asset instead of just a passive recording.
To move Ona Vision toward that goal, I'm focusing on a few key areas:
Access to Scalable Compute
Currently, I rely on Google Colab for inference. I'm in the process of enabling billing for my GCP account, but GPU compute costs remain a key challenge. Long-term, exploring more cost-effective edge inference or using spot instances could help.
More Robust MLOps Pipeline
I want to expand the current system to include:
Batch Video Processing
Real-time detection is powerful, but batch processing pre-recorded video is far more affordable and often sufficient for forensic or analytics use cases. Building support for offline processing pipelines is a top priority; a sketch follows below.
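Here is a minimal sketch of such an offline pipeline, assuming OpenCV and `ultralytics`; the file paths, frame-sampling rate, and JSON output format are illustrative assumptions:

```python
import json

import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
cap = cv2.VideoCapture("footage.mp4")   # pre-recorded clip (hypothetical path)
detections = []

frame_idx = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 5 == 0:              # sample every 5th frame to save compute
        for box in model(frame)[0].boxes:
            detections.append({
                "frame": frame_idx,
                "class": model.names[int(box.cls)],
                "confidence": float(box.conf),
                "bbox_xyxy": [float(v) for v in box.xyxy[0]],
            })
    frame_idx += 1

with open("detections.json", "w") as f:
    json.dump(detections, f, indent=2)  # results become queryable offline
```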
Edge Deployment on Raspberry Pi or Jetson Nano
Running Ona Vision on edge devices like the Raspberry Pi or NVIDIA Jetson Nano would allow for local, low-latency inference without relying on the cloud. This is ideal for privacy-sensitive environments (like homes or schools), or areas with limited internet connectivity. Lightweight YOLO variants or quantized models can make real-time detection feasible even on constrained hardware.
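As a sketch of how a lightweight model might be produced, `ultralytics` supports exporting YOLOv8 to formats suited to edge hardware; exact flags such as `int8` vary by version, so treat this as an assumption to verify against the docs:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                # nano variant for constrained hardware
model.export(format="onnx")               # portable ONNX, e.g. for Raspberry Pi
model.export(format="tflite", int8=True)  # int8-quantized TFLite (needs calibration data)
# TensorRT engines (format="engine") should be exported on the Jetson itself.
```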
Cloud Integration to Store Detection Data
Integrating with cloud platforms like Google Cloud, AWS, or Azure will enable long-term storage of inference results, logs, and video clips, supporting features like the searchable detection history described above.
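A minimal sketch of what the Google Cloud variant could look like, using the `google-cloud-storage` client; the bucket name and object path are hypothetical:

```python
import json

from google.cloud import storage

def upload_detections(detections: list, bucket_name: str = "ona-vision-results") -> None:
    """Persist one batch of detection results as a JSON object in GCS."""
    client = storage.Client()  # uses application-default credentials
    blob = client.bucket(bucket_name).blob("detections/run-001.json")
    blob.upload_from_string(json.dumps(detections), content_type="application/json")
```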
Web-Based Visualization Using Flask or FastAPI
While the current Flask UI offers basic monitoring, I aim to extend it into a full-featured dashboard with richer live visualizations of the metrics above.
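As a starting point, the dashboard front end could poll a small JSON endpoint like this Flask sketch; the route and payload shape are assumptions for illustration:

```python
import psutil
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/stats")
def stats():
    """Live system stats for the dashboard front end to poll."""
    return jsonify(
        cpu_percent=psutil.cpu_percent(),
        memory_percent=psutil.virtual_memory().percent,
    )

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```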
This project is open-source under the Apache 2.0 License.
Josiah Mbao – Software Engineer | MLOps Developer