This repository provides a template for building a Retrieval-Augmented Generation (RAG) AI assistant that can answer questions using your own documents.
It combines vector search with natural language generation to produce answers grounded in the content you provide.
Think of it as: ChatGPT that knows about YOUR documents and can answer questions about them.
Implementation Plan
Replace the sample documents with your own content
The data/ directory contains sample files on various topics. Replace these with documents relevant to your domain:
data/
├── your_topic_1.txt
├── your_topic_2.txt
└── your_topic_3.txt
Each file should contain text content you want your RAG system to search through.
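As a rough illustration of how the files in data/ could be turned into the document format used in the tests below, here is a minimal sketch. The helper name `load_documents` is hypothetical and not part of the template. Using the file name as the title is also an assumption.

```python
from pathlib import Path


def load_documents(data_dir: str = "data") -> list[dict]:
    """Read every .txt file in data_dir into a {"content", "metadata"} dict.

    Hypothetical helper: the template may load files differently.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []  # nothing to load if the directory does not exist

    documents = []
    for path in sorted(root.glob("*.txt")):
        text = path.read_text(encoding="utf-8").strip()
        if not text:
            continue  # skip empty files so they never reach the vector store
        # Assumption: the file name (without extension) serves as the title.
        documents.append({"content": text, "metadata": {"title": path.stem}})
    return documents
```

The returned list has the same shape as the `documents` argument passed to `vdb.add_documents(...)` in the loading test.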
Test chunking:
from src.vectordb import VectorDB

vdb = VectorDB()
chunks = vdb.chunk_text("Your test text here...")
print(f"Created {len(chunks)} chunks")
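If you have not written `chunk_text` yet, one simple starting point is fixed-size character windows with a small overlap. The sketch below is a standalone function rather than the template's method. The `chunk_size` and `overlap` parameters and their defaults are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters so that a sentence cut
    at a boundary still appears whole in one of the two chunks.
    Parameter names and defaults are illustrative assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)  # keep only chunks with visible content
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

Character windows are easy to reason about but ignore sentence boundaries. A splitter from a library such as LangChain is a reasonable alternative if you prefer that.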
Test document loading:
documents = [{"content": "Test document", "metadata": {"title": "Test"}}]
vdb.add_documents(documents)
Test search:
results = vdb.search("your test query")
print(f"Found {len(results['documents'])} results")
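Once search returns chunks, the remaining RAG step is to hand them to the language model along with the question. The sketch below shows one possible way to assemble that prompt. It assumes `results["documents"]` is a flat list of strings, which may not match what your `search` returns. The `build_prompt` helper and the prompt wording are hypothetical.

```python
def build_prompt(question: str, results: dict, max_chunks: int = 3) -> str:
    """Combine retrieved chunks and the user question into one prompt string.

    Assumption: results["documents"] is a flat list of text chunks,
    ordered from most to least relevant.
    """
    chunks = results.get("documents", [])[:max_chunks]
    if chunks:
        # Number the chunks so the model (and you) can tell them apart.
        context = "\n\n".join(
            f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
        )
    else:
        context = "(no relevant context found)"
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Capping the number of chunks keeps the prompt short. Adjust `max_chunks` to fit the context window of the model you choose.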
Once implemented, run:
python src/app.py
After cloning, try this example question to get an idea of the expected behavior:
Input (query):
- "List three common tasks performed in NLP pipelines."
Expected Output:
Three common NLP pipeline tasks are tokenization, part-of-speech tagging, and named entity recognition.
Important: This template uses specific packages (ChromaDB, LangChain, HuggingFace Transformers) and approaches, but you are completely free to use whatever you prefer!
Before starting, make sure you have:
Clone and install dependencies:
git clone https://github.com/Avacaato/rt-aaidc-project1-template.git
cd rt-aaidc-project1-template
pip install -r requirements.txt
Configure your API key:
# Create environment file (choose the method that works on your system)
cp .env.example .env     # Linux/Mac
copy .env.example .env   # Windows
Edit .env and add your API key:
OPENAI_API_KEY=your_key_here
# OR
GROQ_API_KEY=your_key_here
# OR
GOOGLE_API_KEY=your_key_here
# OR
PERPLEXITY_API_KEY=your_key_here
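Because only one of these keys needs to be set, the application has to work out which provider to use. The following is a minimal sketch of that check, assuming the variables are already present in the process environment when it runs. The `detect_provider` helper, the provider labels, and the priority order are all hypothetical.

```python
import os

# Environment variable names come from the .env example above;
# the provider labels on the left are illustrative.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "google": "GOOGLE_API_KEY",
    "perplexity": "PERPLEXITY_API_KEY",
}

PLACEHOLDER = "your_key_here"  # value left over from an unedited .env file


def detect_provider(env=None) -> str:
    """Return the first provider whose API key is set to a real value."""
    env = os.environ if env is None else env
    for provider, var in PROVIDER_ENV_VARS.items():
        value = env.get(var, "").strip()
        if value and value != PLACEHOLDER:
            return provider
    raise RuntimeError(
        "No API key found; set one of: " + ", ".join(PROVIDER_ENV_VARS.values())
    )
```

Passing a plain dict as `env` makes the function easy to test without touching your real environment.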
RT-AAIDC-PROJECT1-TEMPLATE/
├── data/                # Place your documents here
├── src/                 # Application source code
│   ├── app.py           # Main application logic
│   └── vectordb.py      # Vector database and search logic
├── .env.example         # Environment template
├── .gitignore           # Git ignore rules
├── LICENSE              # Project license
├── requirements.txt     # Python dependencies
└── README.md            # This guide
Your implementation is complete when: