This project implements a Retrieval-Augmented Generation (RAG)-based AI assistant with a Streamlit front end. The system combines an embedding-based retriever with a generative model: documents are embedded into a local vector store, the most relevant documents are retrieved for each query, and those documents are used to ground the model's answers. The implementation comprises two main modules: rag_demo.py, which constructs embeddings, manages the vector collection, and performs similarity search; and app.py, which provides the Streamlit user interface for asking questions and viewing answers. The result is a lightweight, reproducible RAG pipeline that demonstrates how external knowledge improves response accuracy compared to a standalone generative model.

Figure 1: Streamlit UI used to interact with the RAG assistant.
Pre-trained generative models often produce fluent but unsupported responses when they lack access to up-to-date or domain-specific information. Retrieval-Augmented Generation (RAG) mitigates hallucination by augmenting generation with a retrieval step: relevant documents are fetched and provided as grounding context to the model. This project demonstrates a compact RAG pipeline with a Streamlit front end that makes the system interactive and accessible. The system stores knowledge as embeddings, retrieves the top matches for each query, and uses those matches to produce grounded answers, illustrating an effective approach to combining retrieval and generation in small-scale deployments.
The implementation is divided into backend and frontend components and includes a simple setup procedure for running the app locally.
Backend (rag_demo.py): The backend is responsible for converting documents into vector embeddings, storing them in a local collection, and retrieving the top matches for a query. The make_embedder() function initializes the embedding model used to convert text into numerical vectors. get_collection() prepares and returns the local vector collection used to store and query embeddings. answer_research_question(query) orchestrates retrieval and generation: it finds the most similar document chunks, passes them along with the user query to the language model, and returns the generated, grounded answer. We chose an embedding plus similarity-search approach to keep the pipeline modular, so the local vector store can be swapped for another vector database in future work.
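The sketch below illustrates how these three functions could fit together. The report does not name the embedding model, vector store, or language model, so sentence-transformers and ChromaDB are used here purely as illustrative assumptions, and the generation call is left as a stub.

# rag_demo.py (sketch): library choices are assumptions, not the confirmed implementation
import chromadb
from sentence_transformers import SentenceTransformer

def make_embedder():
    # Load an embedding model that maps text to dense vectors (model name assumed).
    return SentenceTransformer("all-MiniLM-L6-v2")

def get_collection():
    # Open (or create) the local collection that stores document embeddings.
    client = chromadb.PersistentClient(path="./vector_store")
    return client.get_or_create_collection(name="documents")

def generate(prompt):
    # Stub for the project's language-model call, which the report does not specify.
    raise NotImplementedError("plug in the generative model here")

def answer_research_question(query, top_k=5):
    # Embed the query, retrieve the top_k most similar chunks, and ground the answer on them.
    embedder = make_embedder()
    collection = get_collection()  # assumes documents were already embedded and added
    query_embedding = embedder.encode([query]).tolist()
    results = collection.query(query_embeddings=query_embedding, n_results=top_k)
    context = "\n\n".join(results["documents"][0])
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)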
Frontend (app.py): The Streamlit-based interface provides a text box where users enter questions and a "Get Answer" button to trigger the RAG pipeline. A checkbox allows users to inspect the top retrieved documents that influenced the answer, supporting transparency and debuggability. The frontend sends the user query to the backend, receives the model response, and displays both the generated answer and the retrieved evidence.
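A minimal sketch of app.py under the same assumptions; the widget labels mirror the description above, and displaying the retrieved documents assumes the backend exposes them alongside the answer.

# app.py (sketch): widget labels and wiring are assumptions based on the description above
import streamlit as st
from rag_demo import answer_research_question

st.title("RAG Assistant")

query = st.text_input("Enter your question")
show_sources = st.checkbox("Show retrieved documents")

if st.button("Get Answer") and query:
    with st.spinner("Retrieving documents and generating an answer..."):
        answer = answer_research_question(query)
    st.subheader("Answer")
    st.write(answer)
    if show_sources:
        # Showing sources assumes the backend also returns or exposes the retrieved chunks;
        # as sketched above, answer_research_question() returns only the generated answer.
        st.info("Retrieved document chunks would be listed here.")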

Clone the repository and move into it:
git clone https://github.com/your-username/rag-assistant.git
cd rag-assistant
Install the dependencies:
pip install -r requirements.txt
Run the application:
streamlit run app.py
Open the Streamlit app in your browser (default: http://localhost:8501).
Enter a question in the text box.
Click Get Answer to receive a grounded response.
Optionally enable the checkbox to view the retrieved documents; this makes it easy to evaluate whether the retrieval step provided relevant context.
The current system processes each query independently (single-turn retrieval). For future multi-turn interactions, we plan to add a conversational memory buffer that stores past queries and answers, either as raw history or as summarized state, enabling context-aware follow-ups. Reasoning in this system is performed by combining the retrieved document text with the user query and letting the model synthesize an answer; this grounding reduces unsupported claims and increases factual accuracy.
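One possible shape for the planned memory buffer is sketched below; the class name and structure are illustrative assumptions, not part of the current code.

# Sketch of a possible conversational memory buffer (planned feature, not yet implemented).
from collections import deque

class ConversationMemory:
    def __init__(self, max_turns=5):
        # Keep only the most recent turns to bound prompt length.
        self.turns = deque(maxlen=max_turns)

    def add(self, question, answer):
        self.turns.append((question, answer))

    def as_context(self):
        # Render past turns as plain text that can be prepended to the grounded prompt.
        return "\n".join(f"Q: {q}\nA: {a}" for q, a in self.turns)

In a multi-turn version, the text returned by as_context() would be inserted ahead of the retrieved documents when the prompt is assembled in answer_research_question().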
To validate the assistant's performance, we tested queries against known documents. The evaluation followed these steps:
Tested the assistant by providing custom questions.
Verified that the retrieved documents matched the context of each question.
Checked that answers improved when the retrieved documents were relevant.
Compared answers with and without retrieved documents to validate the advantage of retrieval (a sketch of this comparison follows the list).
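The comparison in the last step can be scripted roughly as follows; answer_without_retrieval() and the sample questions are hypothetical stand-ins, since the report does not include an evaluation script.

# Hypothetical harness for comparing answers with and without retrieval.
from rag_demo import answer_research_question

test_questions = [
    "Example question about the ingested documents",
    "Another example question",
]

def answer_without_retrieval(question):
    # Stand-in: in practice this would call the same generative model with no retrieved context.
    return "(model answer without retrieved context)"

for question in test_questions:
    grounded = answer_research_question(question)
    baseline = answer_without_retrieval(question)
    print(f"Q: {question}\n  with retrieval: {grounded}\n  without retrieval: {baseline}\n")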
Setting top_k to 5 balanced relevance and efficiency. Documents are split into 500-token chunks with a 100-token overlap to maintain context continuity across chunk boundaries. This balance was chosen to keep chunk sizes compatible with typical model context windows while using overlap to avoid losing boundary information that could be important for retrieval. In practice, chunking with moderate overlap improved recall because key phrases that straddle chunk boundaries remained searchable.
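A minimal sketch of this chunking scheme, using whitespace-separated words as a stand-in for model tokens (an assumption; the actual implementation may use a tokenizer):

# Split a document into ~500-token chunks with a 100-token overlap between neighbours.
def chunk_text(text, chunk_size=500, overlap=100):
    tokens = text.split()  # whitespace words approximate model tokens here
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunk = tokens[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(tokens):
            break
    return chunks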
The RAG assistant successfully retrieved relevant supporting documents for user queries.
The generated answers were more accurate and context-aware than those produced by the generative model alone.
The Streamlit interface allowed real-time interaction, making it easy to test the RAG pipeline.
The developed application demonstrates how RAG enhances AI assistants by integrating external knowledge retrieval with generation. Our implementation shows that:
A simple embedding-based retriever improves response quality.
The Streamlit interface provides an interactive and user-friendly way to explore RAG.
Even with a minimal setup (rag_demo.py + app.py), a functional RAG system can be built and extended further.
While the current assistant demonstrates the core of RAG, several improvements are planned: