## 🧩 Overview
This project implements a Retrieval-Augmented Generation (RAG) pipeline that lets a local Large Language Model (LLM), such as mistral:7b-instruct, answer user questions based only on your embedded local documents rather than on hallucinated content.
It is designed as a fully offline AI assistant, using:

- Ollama for local LLM inference
- FAISS for vector search
- NumPy for embedding vector storage and similarity retrieval
The system retrieves the most relevant text segments from your dataset, feeds them into the LLM's context, and generates a grounded answer.
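Conceptually, retrieval is nearest-neighbor search over embedding vectors. The toy snippet below illustrates the idea with NumPy and random stand-in vectors; the real pipeline gets its embeddings from src/embeddings.py and uses FAISS instead of brute force:

```python
import numpy as np

# Toy example: random vectors stand in for real document/query embeddings.
rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(100, 384))        # 100 chunks, 384-dim embeddings
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

query = rng.normal(size=384)
query /= np.linalg.norm(query)

scores = doc_vecs @ query                     # cosine similarity (unit vectors)
top_k = np.argsort(scores)[::-1][:3]          # indices of the 3 closest chunks
print(top_k, scores[top_k])
```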
## ⚙️ Features

- ✅ Local & offline: no OpenAI API key required
- ✅ FAISS vector search for document retrieval
- ✅ Ollama integration with any local model (mistral, llama3, gemma3, etc.)
- ✅ Context truncation handling (prevents model overload)
- ✅ Grounding verification: detects whether the answer was based on the retrieved context (one possible heuristic is sketched below)
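The actual grounding check lives in src/pipeline.py. As an illustration of the idea only, a simple heuristic flags an answer as grounded when most of its content words appear in the retrieved context (the function names and threshold below are hypothetical):

```python
import re

def content_words(text: str) -> set[str]:
    # Lowercased words of 4+ letters as a rough proxy for "content" words.
    return set(re.findall(r"[a-z]{4,}", text.lower()))

def looks_grounded(answer: str, context: str, threshold: float = 0.6) -> bool:
    answer_words = content_words(answer)
    overlap = len(answer_words & content_words(context))
    return bool(answer_words) and overlap / len(answer_words) >= threshold
```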
## 📁 Project Structure

```text
rag-assistant/
├── data/sample_docs/      # Your local corpus (Wikipedia or Ready Tensor publications)
│   ├── RAG.txt
│   ├── LangChain.txt
│   ├── MCP.txt
│   └── Agentic AI.txt
├── src/
│   ├── embeddings.py      # Local embedding generator
│   ├── indexer.py         # FAISS index builder / loader
│   ├── generator.py       # Model interaction (Ollama)
│   └── pipeline.py        # Main RAG pipeline (retrieval → generation → grounding)
├── index/                 # Auto-generated FAISS vector index (ignored by .gitignore)
├── .env_example           # Environment setup template ✅
├── .gitignore             # Secure Git ignore configuration ✅
├── requirements.txt       # Python dependencies
└── README.md              # This documentation
```
## 🚀 Setup Instructions

### 1️⃣ Install Requirements

Make sure Python 3.10+ is installed, then run:

```bash
pip install -r requirements.txt
```
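For reference, the dependency set for this stack is small; requirements.txt in the repo is authoritative, but it will look roughly like the following (requests and python-dotenv are assumptions here, for the Ollama HTTP calls and .env loading respectively):

```text
faiss-cpu
numpy
requests
python-dotenv
```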
### 2️⃣ Install Ollama & Model

Download and install Ollama:
👉 https://ollama.com/download

Then pull a compatible model (recommended):

```bash
ollama pull mistral:7b-instruct
```
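You can confirm the model is available before moving on:

```bash
ollama list                                    # should list mistral:7b-instruct
ollama run mistral:7b-instruct "Say hello."    # quick smoke test
```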
### 3️⃣ Prepare Environment

Copy the example environment file:

```bash
cp .env_example .env
```

You may adjust:

```text
OLLAMA_MODEL=mistral:7b-instruct
```
## 🔧 How to Run

### Build Index

```bash
python -m src.indexer
```
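For orientation, the core of an index build is only a few FAISS calls. The sketch below is illustrative rather than the repo's exact code: the file paths are assumptions, and the real src/indexer.py also handles chunking the documents and saving the chunk texts:

```python
import faiss
import numpy as np

# Illustrative index build; "index/embeddings.npy" is a hypothetical path.
embeddings = np.load("index/embeddings.npy").astype("float32")  # (n_chunks, dim)
faiss.normalize_L2(embeddings)                  # unit vectors: inner product = cosine

index = faiss.IndexFlatIP(embeddings.shape[1])  # exact inner-product search
index.add(embeddings)
faiss.write_index(index, "index/docs.faiss")
```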
### Ask a Question

```bash
python -m src.pipeline --ask "What is RAG?"
```
✅ Expected Output:

```text
🔍 Retrieved context sources:
💬 Model Answer:
Retrieval-Augmented Generation (RAG) is a technique...
✅ Answer appears grounded in retrieved context.
```
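Under the hood, the pipeline embeds the question, searches the FAISS index, stuffs the top chunks into the prompt, and calls Ollama's local HTTP API. The stripped-down sketch below shows one way to wire this together; the actual logic lives in src/pipeline.py and src/generator.py, and the file paths and the choice of the /api/embeddings endpoint are assumptions:

```python
import json

import faiss
import numpy as np
import requests

OLLAMA_URL = "http://localhost:11434"
MODEL = "mistral:7b-instruct"

def embed(text: str) -> np.ndarray:
    # Ollama's embeddings endpoint; the repo's embedder may differ.
    r = requests.post(f"{OLLAMA_URL}/api/embeddings",
                      json={"model": MODEL, "prompt": text})
    return np.array(r.json()["embedding"], dtype="float32")

index = faiss.read_index("index/docs.faiss")   # built by `python -m src.indexer`
with open("index/chunks.json") as f:           # hypothetical chunk-text store
    chunks = json.load(f)

question = "What is RAG?"
q = embed(question).reshape(1, -1)
faiss.normalize_L2(q)
_, ids = index.search(q, 3)                    # top-3 most similar chunks
context = "\n---\n".join(chunks[i] for i in ids[0])

prompt = f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"
r = requests.post(f"{OLLAMA_URL}/api/generate",
                  json={"model": MODEL, "prompt": prompt, "stream": False})
print(r.json()["response"])
```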
## 🔐 Environment & Security Practices

This repository strictly follows the Ready Tensor Secure AI Development guidelines:

| File | Purpose |
|------|---------|
| `.gitignore` | Prevents sensitive or large files (e.g., `.env`, `index/`) from being uploaded |
| `.env_example` | Documents the required environment variables without exposing real data |
| `.env` | Private local file containing your runtime configuration; never committed |
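The corresponding .gitignore entries are simply:

```text
.env
index/
```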
## 📘 Documentation & Reproducibility

- Clear file structure and reproducible setup
- No proprietary dependencies (fully open-source)
- Runs 100% locally with Ollama and FAISS
- Meets Ready Tensor's Technical Rubric for "Functional RAG system" and "Best Practices for AI/ML Documentation"
## 🧾 Licensing & Data Source

This project uses Wikipedia articles for demonstration. All content complies with Wikipedia's CC BY-SA 4.0 license.

If you adapt or expand this system using Ready Tensor publications, ensure the authors permit reuse under Ready Tensor's platform terms.