This project is an AI-powered assistant that lets users hold real-time voice conversations with an LLM while they work. It captures a screenshot of the user's current screen, processes their voice input, and uses both to give context-aware responses that fit into the existing workflow.
Voice Interaction: Speak naturally to ask questions without interrupting your work.
AI-Powered Responses: Uses an LLM to generate concise answers based on both the question and the screen's content.
Contextual Awareness: Automatically captures a screenshot to give the LLM visual context for more relevant answers.
Text-to-Speech Output: Converts responses into natural-sounding speech for a seamless hands-free experience.
Real-Time Assistance: Transcribes speech with Whisper, then processes the question and generates a response quickly enough for fluid, real-time interaction.
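
The features above describe a four-stage flow: capture the screen, transcribe the spoken question, ask the LLM with both inputs, and speak the answer. The sketch below shows one way those stages could be wired together. It is not the project's actual code: the class name AssistantPipeline, the helper to_data_url, and the stage signatures are all hypothetical, and the stages are passed in as plain callables so that a real Whisper transcriber, screenshot tool, LLM client, or text-to-speech engine could be plugged in. Passing the screenshot as a base64 data URL is also an assumption; the format the LLM expects depends on the provider.

```python
import base64
from dataclasses import dataclass
from typing import Callable


def to_data_url(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a base64 data URL (assumed image format)."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


@dataclass
class AssistantPipeline:
    """Hypothetical wiring of the four stages; each stage is a callable."""

    capture_screen: Callable[[], bytes]   # returns PNG bytes of the screen
    transcribe: Callable[[bytes], str]    # audio bytes -> question text
    ask_llm: Callable[[str, str], str]    # (question, image data URL) -> answer
    speak: Callable[[str], None]          # answer text -> audio output

    def handle_utterance(self, audio: bytes) -> str:
        # Capture first, so the image reflects the screen at the moment
        # the question arrived (an ordering assumption, not from the source).
        screenshot = self.capture_screen()
        question = self.transcribe(audio)
        answer = self.ask_llm(question, to_data_url(screenshot))
        self.speak(answer)
        return answer


if __name__ == "__main__":
    # Stub stages stand in for the real screenshot, Whisper, LLM, and TTS calls.
    spoken = []
    pipeline = AssistantPipeline(
        capture_screen=lambda: b"placeholder image bytes",
        transcribe=lambda audio: "What does this error mean?",
        ask_llm=lambda question, image_url: f"Answer to: {question}",
        speak=spoken.append,
    )
    print(pipeline.handle_utterance(b"placeholder audio bytes"))
```

Keeping each stage behind a callable means any one of them can be swapped or stubbed in tests without touching the rest of the flow.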
This assistant is aimed at multitaskers, researchers, and professionals who need quick, AI-assisted insights without breaking their workflow.
The project files can be found in the GitHub repo.