ABSTRACT
This project presents a Custom Retrieval-Augmented Generation (RAG) Assistant with a graphical interface that allows users to interact with their own text documents.
The system integrates local LLMs served through Ollama (e.g., Llama 2, Mistral) with Hugging Face embeddings (sentence-transformers/all-MiniLM-L6-v2) and a FAISS index for efficient semantic search.
All data and processing run locally, ensuring privacy and offline capability.
Users simply select a folder of .txt files, initialize the system, and ask natural-language questions about their content.
METHODOLOGY
The assistant follows a classic RAG pipeline with three main components (document loading, retrieval, and generation), plus a graphical user interface:
Document Loading & Preprocessing
Local .txt files are read with a robust text loader, split into 1000-character chunks with a 200-character overlap, and encoded using the sentence-transformers/all-MiniLM-L6-v2 embedding model.
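As a rough illustration, the loading and splitting step might look like the sketch below (import paths assume a recent LangChain release and may differ in older versions; the folder path comes from the GUI's folder selection):

    from langchain_community.document_loaders import DirectoryLoader, TextLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    def load_and_split(folder):
        # Read every .txt file in the user-selected folder.
        loader = DirectoryLoader(
            folder, glob="**/*.txt",
            loader_cls=TextLoader, loader_kwargs={"encoding": "utf-8"},
        )
        docs = loader.load()
        # Split into 1000-character chunks with 200 characters of overlap.
        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
        return splitter.split_documents(docs)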
Retrieval Module
Embeddings are stored in a FAISS vector database for fast similarity search. The retriever returns the top-k most relevant chunks (k = 4 by default).
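A minimal sketch of building the index and retriever, assuming the chunks produced by the loading step above (the embedding and vector-store classes are shown with their langchain-community import paths, which also vary by version):

    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS

    # Encode chunks with all-MiniLM-L6-v2 and index them in FAISS.
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    chunks = load_and_split("data/")  # placeholder folder path
    vectorstore = FAISS.from_documents(chunks, embeddings)
    # Return the 4 most similar chunks per query (the default k used here).
    retriever = vectorstore.as_retriever(search_kwargs={"k": 4})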
Generation Module
The retrieved context and the user's query are passed to an Ollama model (e.g., llama2, mistral, phi) through LangChain's RetrievalQA chain for grounded text generation.
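The generation step can be sketched as follows; it assumes a running Ollama server with the chosen model already pulled (e.g., via ollama pull llama2) and the retriever built above:

    from langchain_community.llms import Ollama
    from langchain.chains import RetrievalQA

    llm = Ollama(model="llama2")  # swap in "mistral" or "phi" as desired
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm, retriever=retriever, return_source_documents=True,
    )
    result = qa_chain.invoke({"query": "Summarize the main points in these documents."})
    print(result["result"])

Switching models only requires changing the model name, provided the model has been pulled into the local Ollama installation.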
User Interface
A simple Tkinter GUI allows users to select datasets, initialize the RAG system, and query the assistant interactively.
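A simplified, hypothetical layout for such a GUI is sketched below; build_rag_chain is a placeholder for the loading, indexing, and chain-construction steps shown earlier, not a function from the actual project:

    import tkinter as tk
    from tkinter import filedialog, scrolledtext

    qa_chain = None  # set once the user initializes the RAG system

    def build_rag_chain(folder):
        # Hypothetical helper: would combine the loading, FAISS, and
        # RetrievalQA sketches above into one chain over the chosen folder.
        raise NotImplementedError("wire in the RAG pipeline sketched earlier")

    def select_folder():
        folder = filedialog.askdirectory()
        if folder:
            folder_var.set(folder)

    def initialize():
        global qa_chain
        qa_chain = build_rag_chain(folder_var.get())

    def ask():
        if qa_chain is None:
            output_box.insert(tk.END, "Initialize the RAG system first.\n")
            return
        question = query_entry.get()
        result = qa_chain.invoke({"query": question})
        output_box.insert(tk.END, f"Q: {question}\nA: {result['result']}\n\n")

    root = tk.Tk()
    root.title("Custom RAG Assistant")
    folder_var = tk.StringVar()
    tk.Button(root, text="Select dataset folder", command=select_folder).pack()
    tk.Button(root, text="Initialize RAG system", command=initialize).pack()
    query_entry = tk.Entry(root, width=60)
    query_entry.pack()
    tk.Button(root, text="Ask", command=ask).pack()
    output_box = scrolledtext.ScrolledText(root, width=80, height=20)
    output_box.pack()
    root.mainloop()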
Models: Ollama LLMs (e.g., llama2, mistral, phi), Hugging Face embeddings (sentence-transformers/all-MiniLM-L6-v2)
RESULTS
Example Output:
User: “Summarize the main points in these documents.”
Assistant: “The texts discuss AI fundamentals, transformer architectures, and retrieval-augmented generation methods using local models.”
CONCLUSION
The system delivers accurate, grounded responses over local text datasets with low latency, demonstrating how LangChain, FAISS, Hugging Face embeddings, and Ollama together enable practical offline RAG workflows.