Ollama Optivus is an AI-based video processing solution designed to analyze and extract meaningful information from video files. It uses a combination of technologies, including computer vision (YOLO), optical character recognition (OCR), and speech-to-text transcription (Whisper).
It processes videos by extracting the audio and analyzing the content frame by frame to identify text and objects. It then generates detailed information based on the analysis and forwards it to an Ollama large language model.
Video Upload and Management: Upload, List, and Delete video files from the server.
Video Processing: Process video files by extracting audio, transcribing speech, detecting objects and texts in frames, and generating prompts to be used in an AI LLM.
AI LLM: Integrated with Ollama for advanced AI-based processing. Supports all Ollama models, such as deepseek-r1 or llama3.2.
Video Upload: Upload the video to the server.
Audio Extraction: FFmpeg is used to extract audio from the video file.
Audio Transcription: Whisper transcribes the extracted audio to text.
Frame Extraction: OpenCV extracts frames from the video to be analyzed one by one.
Object Detection: YOLOv8 detects objects in each frame of the video.
Text Detection: EasyOCR detects any text in each frame.
AI Prompting: Based on the extracted data, an AI model (Ollama) generates detailed information, summarizing the video.
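As a rough illustration, the steps above could be wired together in Python like this (a minimal sketch, not the project's actual code; the file paths, model choices, and one-frame-per-second sampling are assumptions):

```python
# Illustrative sketch of the processing pipeline described above.
# Not the project's actual implementation; paths and models are placeholders.
import subprocess
import cv2                      # OpenCV: frame extraction
import whisper                  # openai-whisper: audio transcription
import easyocr                  # EasyOCR: text detection
from ultralytics import YOLO    # YOLOv8: object detection

video_path = "data/videos/example.mp4"   # hypothetical path
audio_path = "data/audios/example.wav"   # hypothetical path

# 1. Audio extraction with FFmpeg
subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn", audio_path], check=True)

# 2. Audio transcription with Whisper
transcript = whisper.load_model("base").transcribe(audio_path)["text"]

# 3-5. Frame extraction (OpenCV), object detection (YOLOv8), text detection (EasyOCR)
detector = YOLO("yolov8n.pt")
reader = easyocr.Reader(["en"])
frames_info = []
cap = cv2.VideoCapture(video_path)
fps = int(cap.get(cv2.CAP_PROP_FPS) or 30)
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % fps == 0:  # sample roughly one frame per second
        objects = [detector.names[int(b.cls)] for b in detector(frame)[0].boxes]
        texts = [t[1] for t in reader.readtext(frame)]
        frames_info.append({"second": frame_idx // fps, "objects": objects, "texts": texts})
    frame_idx += 1
cap.release()

# 6. AI prompting: the collected data would then be sent to an Ollama model
prompt = f"Transcript:\n{transcript}\n\nFrame analysis:\n{frames_info}\n\nSummarize the video."
```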
Make sure Ollama is installed.
Note: There is no need to pull a model manually. The OLLAMA_AI_MODEL specified in .env or docker-compose.yml will be pulled and installed automatically on startup.
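If you do want to pull a model yourself, here is a minimal sketch using Ollama's HTTP API on its default port 11434 (parameter names follow Ollama's API documentation; the model name is just an example):

```python
# Manually pulling an Ollama model via its HTTP API (normally not needed,
# since the project pulls OLLAMA_AI_MODEL automatically on startup).
import requests

resp = requests.post(
    "http://localhost:11434/api/pull",              # default Ollama API address
    json={"model": "deepseek-r1", "stream": False}, # wait for the pull to finish
    timeout=600,
)
resp.raise_for_status()
print(resp.json())  # a final status object once the pull completes
```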
Clone the repository:
git clone https://github.com/TarikSeyceri/OllamaOptivus.ai.git
cd OllamaOptivus.ai
There are two methods to install this project.
Before building and running the Docker container, you may need to review docker-compose.yml and add or change environment variables accordingly. You can view the default environment variables in .env, but do not change the variables inside the .env file; change or add them in docker-compose.yml instead.
Then:
docker-compose up -d
Download and Install Node.js v18.20.5
Download and Install Python v3.11.9
Install Backend dependencies:
npm install
pip install -r requirements.txt
Verify that FFmpeg is installed:
ffmpeg -version
If the command above runs and prints FFmpeg's version, FFmpeg has been installed successfully.
Modify the environment variables in the .env file as needed, or leave them as they are.
Start the project:
npm run start
After running the project, with either Docker or locally, it will be accessible at:
http://localhost:3330
The .env file is straightforward and self-explanatory; you can open it and read through it. However, there are some important environment variables worth highlighting here:
NODE_ENV: Set to development or production. In development, the /swagger and /test endpoints are enabled, and HTTP_BEARER_TOKEN authentication and rate limiting are not active.
HTTP_BEARER_TOKEN: Pre-stored token for authenticating API requests in production.
PROCESSING_ONLY: If set to true, only video processing is allowed; video listing, uploading, and deleting are disabled. Should be enabled together with ALLOW_PROCESS_VIDEOS_OUTSIDE_VIDEOS_DIR.
ALLOW_PROCESS_VIDEOS_OUTSIDE_VIDEOS_DIR: When set to true, the /process endpoint is allowed to process videos from outside the project's VIDEOS_DIR folder, which means it can access other storage units such as USB, network, or cloud storage.
OLLAMA_AI_MODEL: Sets the Ollama AI model to be used. Default: deepseek-r1.
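In production, requests are expected to carry this token. Below is a minimal sketch, assuming it is sent as a standard Authorization: Bearer header (an assumption based on the variable name) and that the server runs on the default port 3330; the token value is a placeholder:

```python
# Sketch: calling the API in production with the HTTP_BEARER_TOKEN.
# "my-secret-token" is a placeholder; use the token you configured.
import requests

BASE_URL = "http://localhost:3330"
HEADERS = {"Authorization": "Bearer my-secret-token"}

resp = requests.get(f"{BASE_URL}/videos", headers=HEADERS, timeout=30)
print(resp.json())  # expects the "Video files listed!" response documented below
```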
All endpoints are prepared in a Postman Collection which can be imported.
In the development environment, the /swagger endpoint is enabled, which shows all available endpoints (without authentication).
GET /
Checks if the server is online.
Response Body:
{ "success": true, "msg": "Ollama Optivus Server is running!" }
POST /upload
Uploads a video file to the server.
The video must be an .mp4 file.
Note: Once a video has finished processing, you should send a delete request to the appropriate endpoint to remove it. If you don't, the video will be deleted automatically after the FILE_RETENTION_DAYS specified in the .env file. Be mindful not to overload the storage by leaving videos uncleaned until the retention period ends.
Request Body:
form-data with the key video and the video file as the value.
Response Body:
{ "success": true, "msg": "Video file uploaded!", "payload": { "videoFilePath": "data/videos/8047000029a4000c36b908dd2fd94b6e.mp4" } }
GET /videos
Lists the video files stored on the server.
Response Body:
{ "success": true, "msg": "Video files listed!", "payload": { "videos": [ "data/videos/8047000029a4000c36b908dd2fd94b6e.mp4" ] } }
DELETE /delete
Deletes a video file from the server.
Request Body:
{ "videoFileName": "8047000029a4000c36b908dd2fd94b6e.mp4" }
Response Body:
{ "success": true, "msg": "Video file deleted!" }
POST /process
Processes a video by analyzing its frames, extracting audio and transcribing it.
Request Body: JSON with keys:
videoFilePath: Path to the video file. (Required)
language: Language for transcription; 'en' or 'tr' for now. (Optional, default: "en")
videoExplanation: Text explaining the context of the video. (Optional, default: a suitable general sentence)
temperature: The creativity level for the AI model. (Optional, default: 0, meaning no creativity)
format: The response format, given as an Ollama JSON response schema. (Optional, default: { summary, events })
model: The Ollama AI model to be used for prompting. (Optional, default: env.OLLAMA_AI_MODEL)
noPrompting: If true, the processed data is returned and the AI prompting step is skipped, which means the format parameter is not used. Useful for debugging a video analysis or for sending a prompt request to an Ollama LLM manually. (Optional, default: false)
{ "videoFilePath": "data/videos/8047000029a4000c61f808dd2fd54bb4.mp4" }
{ "success": true, "msg": "Processing completed", "payload": { "response": "{ \"summary\": \"The conversation starts with verifying account details and setting up international transfers. The user is guided through the process, including adding recipient details and understanding fees. The assistant explains that there's a $15 fee for international transfers but a reduced $10 fee if using the mobile banking app. The conversation ends with thanking the user and providing final instructions.\", \"events\": [ { \"timestamp\": 2, \"description\": \"Verification of account balance\" }, { \"timestamp\": 4, \"description\": \"Setting up international transfer\" }, { \"timestamp\": 6, \"description\": \"Understanding fees for international transfers\" }, { \"timestamp\": 8, \"description\": \"Explaining the fee structure\" }, { \"timestamp\": 10, \"description\": \"Guiding through recipient setup\" }, { \"timestamp\": 12, \"description\": \"Clarifying if mobile app reduces fees\" }, { \"timestamp\": 14, \"description\": \"Confirming next steps\" } ] }\n\t\t \t\t\t\t\t\t\t\t\t\t\t\t\t\t \t " } }
{ "videoFilePath": "data/videos/8047000029a4000c61f808dd2fd54bb4.mp4", "language": "en", "videoExplanation": "The following info is the output of an analysis of a video call conversation between an agent and customer:", }
# Same as above response
Request Body (advanced example; the format schema is the same as Ollama's format schema):
{ "videoFilePath": "data/videos/8047000029a4000c61f808dd2fd54bb4.mp4", "language": "en", "videoExplanation": "The following info is the output of an analysis of a video call conversation between an agent and customer:", "temperature": 0, "model": "deepseek-r1", "format": { "type": "object", "properties": { "summary": { "type": "string" }, "isUnrespectfulConversation": { "type": "boolean" }, "customerFraud": { "type": "object", "properties": { "percentage": { "type": "number" }, "reason": { "type": "string" } }, "required": ["percentage", "reason"] }, "customerSatisfactionPercentage": { "type": "number" }, "events": { "type": "array", "items": { "type": "object", "properties": { "timestamp": { "type": "number" }, "description": { "type": "string" } }, "required": ["timestamp", "description"] } } }, "required": [ "summary", "isUnrespectfulConversation", "customerFraud", "customerSatisfactionPercentage", "events" ] } }
Response Body:
{ "success": true, "msg": "Processing completed", "payload": { "response": "{ \"summary\": \"The conversation starts with verifying account details and setting up international transfers. The user is guided through the process, including adding recipient details and understanding fees. The assistant explains that there's a $15 fee for international transfers but a reduced $10 fee if using the mobile banking app. The conversation ends with thanking the user and providing final instructions.\", \"isUnrespectfulConversation\": false, \"customerFraud\": {\"percentage\": 0, \"reason\": \"\"}, \"customerSatisfactionPercentage\": 95, \"events\": [ { \"timestamp\": 1576234800, \"description\": \"Verification of account balance completed successfully.\" }, { \"timestamp\": 1576234860, \"description\": \"Setting up international transfer requested by the customer.\" }, { \"timestamp\": 1576234920, \"description\": \"Customer informed about a $15 fee for international transfers.\" }, { \"timestamp\": 1576234980, \"description\": \"Explanation of reduced fee ($10) if using mobile banking app.\" }, { \"timestamp\": 1576235040, \"description\": \"Customer asked about setting up recipient details beforehand.\" }, { \"timestamp\": 1576235100, \"description\": \"Explanation of required steps for adding recipient details online.\" }, { \"timestamp\": 1576235160, \"description\": \"Guidance on receiving instructions via email or SMS.\" }, { \"timestamp\": 1576235220, \"description\": \"Customer thanks and requests further instructions sent to their email.\" }, { \"timestamp\": 1576235280, \"description\": \"Final instructions provided for completing the transfer setup.\" }, { \"timestamp\": 1576235340, \"description\": \"Completion of process and exit conversation.\" } ] }\n \t \t \t \t \t \t \t \t \t\t" } }
POST /test
Performs a test request.
Can be used to check some custom functionalities.
Enabled only in the development environment.
Request Body: empty
Response Body:
# Anything // for testing and dev
Winston, Morgan, and a custom Python logger are used to log all important events and errors. Logs are stored in the LOG_DIR directory and rotated daily.
To check the logs:
tail -f data/logs/node_YYYY-MM-DD.log
tail -f data/logs/processor_YYYY-MM-DD.log
Every hour, the server checks the videos, audios, json, and prompts directories and deletes files older than the configured retention period (default: 2 days).
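Conceptually, this cleanup resembles the sketch below (illustrative only, not the project's actual code; the directory paths and retention value mirror the defaults mentioned in this README):

```python
# Illustrative sketch of an hourly retention cleanup.
import os
import time

RETENTION_DAYS = 2                                                   # default retention
DIRS = ["data/videos", "data/audios", "data/json", "data/prompts"]   # assumed layout

def cleanup_old_files():
    cutoff = time.time() - RETENTION_DAYS * 24 * 3600
    for directory in DIRS:
        if not os.path.isdir(directory):
            continue
        for entry in os.scandir(directory):
            if entry.is_file() and entry.stat().st_mtime < cutoff:
                os.remove(entry.path)                # older than the retention period
```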
I welcome contributions to Ollama Optivus AI! To contribute to this project, please follow these steps:
Fork the Repository:
Create a copy of the repository by clicking the "Fork" button in the top-right corner of the GitHub repository page.
Clone Your Fork:
Clone your forked repository to your local machine:
git clone https://github.com/yourusername/OllamaOptivus.ai.git
cd OllamaOptivus.ai
Create a Branch:
git checkout -b feature/your-feature-name
Make Changes:
Implement your feature, fix a bug, or improve documentation. Be sure to write clear, concise commit messages.
Run Tests:
Ensure that all tests pass (if applicable). If you're adding new features, consider writing new tests.
Commit Changes:
After making your changes, commit them:
git add .
git commit -m "Add/Update [feature/fix]"
Push Your Branch:
git push origin feature/your-feature-name
If you find any issues or bugs in the project, please open an issue. Be sure to include as much detail as possible.
You can open an issue here: Issues
If you add a new feature or change existing functionality, please update the documentation accordingly. You can edit the README.md file directly or submit improvements to the project's documentation.