This project presents a Retrieval-Augmented Generation (RAG) chatbot that combines dense vector search with a lightweight language model to provide accurate, context-grounded responses. Unlike generic chatbots that rely purely on model knowledge, this system retrieves relevant information from a structured document base before generating its response. The solution is optimized for both CPU and GPU environments, supporting OpenVINO for Intel accelerators and PyTorch for CUDA-enabled devices.
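The CPU/GPU flexibility described above comes down to picking an inference backend at startup. A minimal sketch of that decision (the helper name and the exact logic are illustrative, not taken from the project; in practice the inputs would come from `torch.cuda.is_available()` and an import check for `openvino`):

```python
def choose_backend(cuda_available: bool, openvino_installed: bool) -> str:
    """Pick an inference backend, preferring a GPU when one is present."""
    if cuda_available:
        return "pytorch-cuda"  # CUDA-enabled device: run the model with PyTorch
    if openvino_installed:
        return "openvino"      # Intel CPU/accelerator: use the OpenVINO runtime
    return "pytorch-cpu"       # Fallback: plain PyTorch on CPU
```

The model weights themselves stay the same; only the runtime that executes them changes.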
The system is lightweight, easy to deploy locally, and can be adapted for real-world knowledge bases, research archives, or organizational document retrieval.
Recent advances in large language models have shown impressive capabilities in open-domain conversations. However, these models may hallucinate facts when not provided with external context.
RAG (Retrieval-Augmented Generation) bridges this gap by integrating retrieval (vector search) and generation (LLM response) into a single pipeline.
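The retrieve-then-generate flow can be sketched end to end. The following is a toy illustration, not the project's implementation: 2-d vectors stand in for real `all-MiniLM-L6-v2` embeddings, the function names are ours, and the final LLM call is replaced by prompt construction:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, chunks, k=2):
    """Rank embedded chunks by similarity to the query and keep the top-k."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:k]

def build_prompt(question, retrieved):
    """Ground the generation step: prepend the retrieved context to the question."""
    context = "\n".join(c["text"] for c in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Toy 2-d "embeddings" stand in for real sentence-transformer vectors.
chunks = [
    {"text": "RAG grounds answers in retrieved documents.", "vec": [1.0, 0.1]},
    {"text": "Bananas are yellow.", "vec": [0.0, 1.0]},
]
prompt = build_prompt("What does RAG do?", retrieve([1.0, 0.0], chunks, k=1))
```

In the real pipeline, `retrieve` is a vector-store query and the prompt is passed to the generation model instead of being returned as a string.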
In this project, we built a RAG chatbot capable of:

- Loading documents with `title` and `content` fields into `langchain` `Document` objects
- Embedding text chunks into dense vectors with `sentence-transformers/all-MiniLM-L6-v2`
- Retrieving the most relevant chunks for each query via dense vector search
- Generating context-grounded answers with `HuggingFaceTB/SmolLM2-360M-Instruct`
- Running on CPU (OpenVINO) or GPU (PyTorch/CUDA)
This architecture ensures that responses are context-grounded, fast, and hardware-flexible.
Each source document provides `title` and `content` fields and is wrapped in a `langchain` `Document` object. Sentence-transformer embeddings (`all-MiniLM-L6-v2`) are used to convert text chunks into dense vectors, and the retrieval pipeline is orchestrated with `langchain`.

- Embedding model: `sentence-transformers/all-MiniLM-L6-v2`
- Generation model: `HuggingFaceTB/SmolLM2-360M-Instruct`

| Parameter | Value |
|---|---|
| Chunk Size | 1000 characters |
| Overlap | 200 characters |
| Top-k Retrieved Chunks | 10 |
| Similarity Threshold | 0.1 |
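The splitter itself is not shown here; a minimal character-level sketch consistent with the chunk size and overlap in the table above (a real pipeline would more likely use `langchain`'s `RecursiveCharacterTextSplitter`, which prefers splitting on separators rather than fixed offsets):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200):
    """Split text into fixed-size character chunks, where each chunk
    repeats the last `overlap` characters of the previous one."""
    step = chunk_size - overlap  # advance 800 characters per chunk
    chunks, start = [], 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):  # the final chunk may be shorter
            break
        start += step
    return chunks
```

With these defaults, consecutive chunks share a 200-character window, so a sentence cut at a chunk boundary still appears whole in one of the two chunks.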

This project shows how Retrieval-Augmented Generation (RAG) can make chatbots more reliable by grounding responses in real data instead of relying only on a language model.
By combining a retriever, a vector database, and a generation model, the chatbot provides accurate and context-aware answers to document-based queries.
Although the system performs well, there's still room for improvement in speed, memory handling, and scalability for larger datasets.
Overall, this work is a practical example of how RAG can turn traditional chatbots into smarter, knowledge-driven assistants.