Raven AI is a production-ready, RAG-powered portfolio assistant designed to provide grounded, context-aware answers about the developer's skills, projects, and professional experience. Unlike generic chatbots, Raven uses a Retrieval-Augmented Generation (RAG) pipeline to anchor its responses in a custom knowledge base derived from portfolio data, sharply reducing hallucinations on domain-specific queries.
The project implements a lightweight, custom RAG architecture built from first principles using Node.js and Google's Gemini API. By bypassing heavy orchestration frameworks in favor of a custom vector search implementation, the system achieves low-latency retrieval and high explainability. The assistant features hybrid routing logic that distinguishes between grounded portfolio questions (strict RAG) and general conversational queries (open-ended LLM), and is deployed live on Microsoft Azure.
This project implements a full-stack RAG pipeline without relying on heavy abstraction layers, demonstrating the core concepts of retrieval-augmented generation end to end (minimal sketches of each stage follow the list):
Source Data: Raw text data is extracted from the portfolio website's sections (About, Projects, Skills).
Chunking: Content is split into small, self-contained "knowledge chunks" so that each retrieval hit maps to one coherent passage, improving retrieval precision.
Embeddings: Each chunk is converted into a high-dimensional vector using Google's text-embedding-004 model via a custom Node.js ingestion script.
Vector Store: Instead of a dedicated vector database, the project uses a flat, in-memory JSON store (vector-database.json), giving sub-millisecond access times at this dataset size.
Query Processing: User queries are embedded on-the-fly using the same text-embedding-004 model.
Similarity Search: A custom retrieval routine computes cosine similarity between the query vector and every stored chunk, returning the top 3 chunks by semantic proximity to the user's query.
Context Injection: The retrieved chunks are injected into a system prompt that enforces strict grounding rules (e.g., "Answer only using the provided context").
LLM Inference: The augmented prompt is sent to Google Gemini 1.5 Pro / 2.5 Pro, which synthesizes the final answer.
Hybrid Routing: The system becomes a "Hybrid AI" by evaluating retrieval relevance scores: high-relevance matches trigger the strict RAG workflow, while low-relevance matches fall back to a general conversational mode, so the bot remains helpful even when the query is off-topic.
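The ingestion side of this pipeline fits in a few dozen lines. The sketch below assumes the official @google/generative-ai Node SDK; the blank-line chunking heuristic and the `chunk`/`ingest` helper names are illustrative rather than the project's exact code, though text-embedding-004 and vector-database.json match the pipeline described above.

```js
// ingest.js — minimal sketch of the ingestion pipeline: chunk -> embed -> JSON store.
import { writeFileSync } from "node:fs";
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const embedder = genAI.getGenerativeModel({ model: "text-embedding-004" });

// Naive chunker: split a section on blank lines so each chunk is one coherent passage.
function chunk(sectionName, text) {
  return text
    .split(/\n\s*\n/)
    .map((t) => t.trim())
    .filter(Boolean)
    .map((t, i) => ({ id: `${sectionName}-${i}`, text: t }));
}

// sections: { about: "...", projects: "...", skills: "..." }
async function ingest(sections) {
  const records = [];
  for (const [name, text] of Object.entries(sections)) {
    for (const c of chunk(name, text)) {
      const { embedding } = await embedder.embedContent(c.text);
      records.push({ id: c.id, text: c.text, vector: embedding.values });
    }
  }
  // The entire store is one flat JSON array: id, source text, and embedding per chunk.
  writeFileSync("vector-database.json", JSON.stringify(records));
}
```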
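Query-time retrieval is then a linear scan with cosine similarity. The math here is standard; `embedQuery` and `topK` are hypothetical helper names.

```js
// retrieval.js — linear-scan cosine similarity over the JSON store (illustrative sketch).
import { readFileSync } from "node:fs";
import { GoogleGenerativeAI } from "@google/generative-ai";

const store = JSON.parse(readFileSync("vector-database.json", "utf8"));
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const embedder = genAI.getGenerativeModel({ model: "text-embedding-004" });

// Queries must be embedded with the same model used for the chunks.
async function embedQuery(text) {
  const { embedding } = await embedder.embedContent(text);
  return embedding.values;
}

// cos(a, b) = (a · b) / (|a| |b|)
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query vector and keep the k best.
function topK(queryVector, k = 3) {
  return store
    .map((rec) => ({ ...rec, score: cosineSimilarity(queryVector, rec.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```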
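Context injection and inference reduce to string assembly plus one SDK call, reusing the `genAI` client from the previous sketch. The prompt wording paraphrases the grounding rule quoted above; `answerGrounded` is a hypothetical name, and the model id is one of the two mentioned in this write-up.

```js
// answer.js — grounded prompt assembly and Gemini inference (sketch).
const llm = genAI.getGenerativeModel({ model: "gemini-1.5-pro" });

async function answerGrounded(question, chunks) {
  // Number the chunks so the model can be instructed to stay within them.
  const context = chunks.map((c, i) => `[${i + 1}] ${c.text}`).join("\n\n");
  const prompt =
    "Answer only using the provided context. " +
    "If the context does not contain the answer, say so.\n\n" +
    `Context:\n${context}\n\nQuestion: ${question}`;
  const result = await llm.generateContent(prompt);
  return result.response.text();
}
```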
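Finally, the hybrid router is a single threshold check on the best retrieval score, composing the helpers sketched above. The 0.6 cutoff is an assumed value for illustration; the write-up does not state the actual threshold.

```js
// router.js — hybrid routing on the top cosine score (threshold value is an assumption).
const RELEVANCE_THRESHOLD = 0.6; // illustrative; tune against real queries

async function respond(question) {
  const queryVector = await embedQuery(question);
  const matches = topK(queryVector, 3);

  if (matches[0].score >= RELEVANCE_THRESHOLD) {
    // On-topic: strict RAG path, grounded in the retrieved chunks.
    return answerGrounded(question, matches);
  }
  // Off-topic: open-ended conversational fallback with no injected context.
  const result = await llm.generateContent(question);
  return result.response.text();
}
```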
Backend: A Node.js/Express API orchestrates retrieval, routing, and inference (a minimal route sketch follows this list).
Frontend: A responsive, accessible chat interface built with vanilla JavaScript.
Deployment: The system is deployed on Microsoft Azure App Service, demonstrating cloud-readiness and real-world accessibility.
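A minimal Express wiring for the backend might look like the sketch below. The /api/chat route, payload shape, and static directory are assumptions; reading the port from the environment matches how Azure App Service supplies it.

```js
// server.js — minimal Express wiring (route path and payload shape are illustrative).
import express from "express";

const app = express();
app.use(express.json());
app.use(express.static("public")); // serves the vanilla-JS chat UI

app.post("/api/chat", async (req, res) => {
  try {
    const answer = await respond(req.body.message); // hybrid router from the sketch above
    res.json({ answer });
  } catch (err) {
    res.status(500).json({ error: "Inference failed" });
  }
});

// Azure App Service injects PORT; default to 3000 locally.
app.listen(process.env.PORT || 3000);
```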
The Raven AI Assistant successfully meets the objectives of Module 1, delivering a functioning, deployed Q&A agent.
Accuracy & Grounding: The assistant correctly answers specific portfolio questions (e.g., "What is Adil's collaboration policy?") by retrieving the exact text chunks defined in the vector store. It refuses to hallucinate details not present in the source text when in RAG mode.
Hybrid Intelligence: The system successfully switches modes. It answers "Who is Adil?" using RAG context but answers "What is 2+2?" using general LLM knowledge, satisfying the optional enhancement for advanced reasoning flow.
Performance: By using a lightweight JSON store and custom retrieval logic, the system achieves near-instant retrieval latency, avoiding the network and indexing overhead that a dedicated vector database would add at this dataset size.
Live Deployment: The project is fully productionized and accessible publicly, validating the end-to-end pipeline from ingestion to user interaction.
Example Interaction:
User: "Tell me about the Arabic Romanizer project."
Retrieval: Identifies chunk-8 (Project details) with high cosine similarity.
Response: "The Arabic Romanizer is an AI-powered tool for transliteration..." (Grounded answer).