This project is a video analysis tool that generates video synopses using computer vision models. It targets video summarization for surveillance, content analysis, and object tracking. It uses OWL-ViT or Florence-2 for object detection, SAM for segmentation, and a custom video synopsis algorithm to produce the condensed output.
Run the project using this Google Colab Notebook.
To install all dependencies, run:
pip install -r requirements.txt
To interactively run the project on a Streamlit-based web UI:
streamlit run ./app.py & npx localtunnel --port 8501
This will expose the Streamlit app through a localtunnel link.
Run main.py with the following example:
python main.py \
  --input_model "OWL-ViT" \
  --video "/content/text2video_synopsis/all_rush_video.mp4" \
  --classes "people,person" \
  --epoch 100
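For orientation, the flags in the command above map naturally onto an argparse interface. The sketch below is only an assumption about how such a parser might look; it is not taken from the project's main.py. The helper `split_classes`, the `choices` list, and the idea that `--classes` is split on commas are all hypothetical and inferred from the example values.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the four flags in the example command.
    parser = argparse.ArgumentParser(description="Video synopsis CLI (illustrative sketch)")
    parser.add_argument("--input_model", required=True,
                        choices=["OWL-ViT", "Florence-2-large"],
                        help="Detection model to use")
    parser.add_argument("--video", required=True,
                        help="Path to the input video file")
    parser.add_argument("--classes", required=True,
                        help="Object classes to detect, e.g. 'car,person,dog'")
    parser.add_argument("--epoch", required=True, type=int,
                        help="Number of iterations for synopsis optimization")
    return parser


def split_classes(raw: str) -> list[str]:
    # Assumption: classes arrive as one comma-separated string.
    # Whitespace around each entry is trimmed and empty entries are dropped.
    return [name.strip() for name in raw.split(",") if name.strip()]


# Parse the same values used in the example command.
args = build_parser().parse_args([
    "--input_model", "OWL-ViT",
    "--video", "/content/text2video_synopsis/all_rush_video.mp4",
    "--classes", "people,person",
    "--epoch", "100",
])
print(args.input_model, split_classes(args.classes), args.epoch)
```

Restricting `--input_model` with `choices` makes a misspelled model name fail at startup instead of partway through processing.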
--input_model: Detection model to use (OWL-ViT or Florence-2-large).
--video: Path to the input video file.
--classes: Object classes to detect. Examples:
    "People in the video", "Car on the road"
    "People with black t-shirt", "People with suitcase"
    "car,person,dog"
--epoch: Number of iterations for video synopsis optimization.

Motion Detection: Focuses processing on video segments with significant motion.
Object and Action Detection: Uses Florence-2 or OWL-ViT for object detection, and SAM for segmentation.
Flexible Synopsis Generation: Creates optimized video summaries based on user-defined object criteria.
Versatile Use Cases: Surveillance, content analysis, and object tracking.
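To make the motion detection feature more concrete, here is a minimal sketch of one common approach: frame differencing followed by grouping consecutive moving frames into segments. This is an assumption for illustration, not the project's implementation. The function names `motion_mask` and `motion_segments` and the threshold value are hypothetical, and the input is a synthetic grayscale clip rather than a decoded video.

```python
import numpy as np


def motion_mask(frames: np.ndarray, threshold: float = 5.0) -> np.ndarray:
    """Flag frames whose mean absolute difference from the previous frame exceeds `threshold`.

    frames: array of shape (num_frames, height, width), grayscale.
    """
    # Cast to float first so uint8 subtraction cannot wrap around.
    frames = frames.astype(np.float32)
    diffs = np.abs(frames[1:] - frames[:-1]).mean(axis=(1, 2))
    # The first frame has no predecessor, so it is treated as static.
    return np.concatenate([[False], diffs > threshold])


def motion_segments(mask: np.ndarray) -> list[tuple[int, int]]:
    """Collapse a per-frame mask into (start, end) frame ranges, end exclusive."""
    segments = []
    start = None
    for i, moving in enumerate(mask):
        if moving and start is None:
            start = i
        elif not moving and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(mask)))
    return segments


# Synthetic clip: 10 black frames with a bright square sliding right in frames 4 to 6.
clip = np.zeros((10, 32, 32), dtype=np.uint8)
for t in range(4, 7):
    clip[t, 8:16, 2 * t:2 * t + 8] = 255

print(motion_segments(motion_mask(clip)))
```

Only the frame ranges returned by `motion_segments` would then need to be passed to the heavier detection and segmentation models, which is the point of filtering on motion first.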