This project is an AI-powered assistant that allows users to have real-time conversations with an LLM while working. By capturing a screenshot of the user's current screen and processing voice input, the assistant provides intelligent, context-aware responses to enhance productivity and streamline workflows.
Voice Interaction: Speak naturally to ask questions without interrupting your work.
AI-Powered Responses: Utilizes LLMs to generate concise, insightful answers based on both the question and the screen's content.
Contextual Awareness: Automatically captures a screenshot to give the LLM visual context for more relevant answers.
Text-to-Speech Output: Converts responses into natural-sounding speech for a seamless hands-free experience.
Real-Time Assistance: Transcribes speech with Whisper, then processes the question and generates a response quickly enough to keep the interaction fluid.
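The features above amount to a four-step turn: transcribe the spoken question, capture the screen, ask the LLM with both as context, and speak the answer. The sketch below is a minimal illustration of that flow, not the project's actual code. The class name `ScreenAssistant` and its four callables are hypothetical, and each step is injected as a plain function so the flow can be run with stubs. In a real setup, `transcribe` would presumably wrap Whisper, and the other callables would wrap a screenshot tool, an LLM client, and a text-to-speech engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScreenAssistant:
    """Hypothetical wiring of one voice turn; every step is an injected callable."""

    capture_screen: Callable[[], bytes]    # returns the current screen as image bytes
    transcribe: Callable[[bytes], str]     # speech audio -> question text (e.g. Whisper)
    ask_llm: Callable[[str, bytes], str]   # (question, screenshot) -> answer text
    speak: Callable[[str], None]           # text-to-speech output

    def handle_turn(self, audio: bytes) -> str:
        """Run one question/answer turn and return the answer text."""
        question = self.transcribe(audio)           # 1. voice input to text
        screenshot = self.capture_screen()          # 2. visual context
        answer = self.ask_llm(question, screenshot) # 3. context-aware response
        self.speak(answer)                          # 4. hands-free output
        return answer


if __name__ == "__main__":
    # Stub implementations stand in for the real components.
    assistant = ScreenAssistant(
        capture_screen=lambda: b"fake-image-bytes",
        transcribe=lambda audio: "What does this error mean?",
        ask_llm=lambda question, image: f"Answer to: {question}",
        speak=print,
    )
    assistant.handle_turn(b"fake-audio-bytes")
```

Keeping the steps as injected callables is one possible design choice: it lets each component (speech recognition, screenshot capture, LLM, speech synthesis) be swapped or tested independently.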
This assistant is perfect for multitaskers, researchers, and professionals who need quick, AI-assisted insights without breaking their workflow.
Files can be found in the GitHub repo