# Personal Knowledge Brain – User-Scoped RAG Assistant
## Overview
Personal Knowledge Brain (PKB) is a user-scoped Retrieval-Augmented Generation (RAG) assistant designed to answer questions grounded strictly in user-provided documents. Unlike generic chatbots, PKB focuses on personal knowledge management with persistent memory and clean architectural separation.
Each user has an isolated knowledge base and conversation memory, making the system suitable for multi-user and future multi-tenant deployments.
## Key Capabilities
- Retrieval-Augmented Generation (RAG)
- User-scoped document ingestion and isolation
- Semantic search using vector embeddings
- Persistent conversation memory
- Modular and extensible backend architecture
## Installation and Usage
### Prerequisites
- Python 3.10 or higher
- A Groq API key for LLM access
- Basic familiarity with Python virtual environments
### Installation

- Clone the repository:

```bash
git clone https://github.com/Rakmo5/readyTensor_RAG-project.git
cd readyTensor_RAG-project/project
```

- Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Create a `.env` file in the project root and add:

```
GROQ_API_KEY=your_api_key_here
```
### Usage

- Add text or markdown documents to:

```
data/users/<user_id>/documents/
```

- Run the ingestion pipeline to index documents:

```bash
python test_vector_store.py
```

- Query the assistant:

```bash
python test_chat.py
```
## Model Selection Rationale
### Embedding Model
The system uses the all-MiniLM-L6-v2 Sentence Transformer for document embedding. This model was selected due to its strong balance between semantic accuracy and computational efficiency. It performs well for semantic similarity tasks while remaining lightweight, making it suitable for scalable and user-scoped knowledge bases.
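At retrieval time, query and chunk embeddings are compared by cosine similarity. The sketch below uses plain Python with toy 3-dimensional vectors standing in for the model's 384-dimensional output; the function name is illustrative and not from the project's codebase.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for all-MiniLM-L6-v2 embeddings (really 384-dim).
query = [0.9, 0.1, 0.0]
chunk_relevant = [0.8, 0.2, 0.1]
chunk_unrelated = [0.0, 0.1, 0.9]

# A semantically related chunk scores higher than an unrelated one.
assert cosine_similarity(query, chunk_relevant) > cosine_similarity(query, chunk_unrelated)
```

In practice the vector store (ChromaDB here) performs this comparison internally; the snippet only illustrates the ranking principle.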
### Language Model
The assistant uses a Groq-hosted large language model for response generation. This model was chosen for its low-latency inference and reliable instruction-following behavior. When combined with retrieved document context, it enables grounded and responsive answers without relying on external or implicit knowledge.
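The "combined with retrieved document context" step can be sketched as a prompt-assembly helper. This is a hypothetical illustration (the function name and instruction wording are assumptions, not the project's actual prompt); the resulting string would be sent to the Groq-hosted model as the user message.

```python
def build_grounded_prompt(question, retrieved_chunks):
    """Assemble a prompt that instructs the model to answer only from context."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "What is PKB?",
    ["Personal Knowledge Brain (PKB) is a user-scoped RAG assistant."],
)
print(prompt)
```

Numbering the chunks (`[1]`, `[2]`, …) makes it possible for the model to cite which passage supports its answer, which aids the groundedness evaluation described later.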
## Safety Guardrails
To ensure reliability and prevent hallucinations, the assistant follows these safety guardrails:
- Responses are generated only from context retrieved from the user's own documents.
- If the retrieved context does not contain sufficient information, the assistant explicitly acknowledges uncertainty.
- Conversation memory is used only for maintaining dialogue continuity and personalization, not as a source of factual knowledge.
- The knowledge base is updated explicitly, preventing accidental ingestion of unverified or noisy information.
- The assistant avoids assumptions and does not rely on external or implicit world knowledge.
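The second guardrail (acknowledging uncertainty) can be enforced before the LLM is even called. The sketch below is a minimal, assumed implementation: the function name, threshold value, and decline message are illustrative, not taken from the project.

```python
UNCERTAIN_REPLY = "I don't have enough information in your documents to answer that."

def answer_or_decline(retrieved, min_score=0.35):
    """Decline when no retrieved chunk clears the similarity threshold.

    `retrieved` is a list of (chunk_text, similarity_score) pairs;
    the 0.35 threshold is illustrative, not tuned.
    Returns (usable_chunks, decline_message) where exactly one is None.
    """
    confident = [(text, score) for text, score in retrieved if score >= min_score]
    if not confident:
        return None, UNCERTAIN_REPLY
    confident.sort(key=lambda pair: pair[1], reverse=True)
    return confident, None

# A strong match passes through to generation.
chunks, decline = answer_or_decline([("PKB stores data per user.", 0.72)])
assert decline is None

# A weak match triggers an explicit acknowledgement of uncertainty.
chunks, decline = answer_or_decline([("irrelevant text", 0.10)])
assert decline == UNCERTAIN_REPLY
```

Filtering before generation is cheaper than asking the model to self-censor, and it guarantees the decline behavior rather than relying on instruction-following alone.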
## Evaluation Strategy
The system is evaluated qualitatively to ensure correctness, reliability, and grounded behavior. The primary evaluation criteria include:
- Retrieval Relevance: Whether the retrieved document chunks are semantically relevant to the user’s query.
- Answer Groundedness: Whether generated responses are directly supported by the retrieved context.
- Hallucination Avoidance: Whether the assistant appropriately acknowledges uncertainty when insufficient information is available.
- Conversational Consistency: Whether follow-up questions are answered coherently using prior conversational context.
- User-Scoped Isolation: Verification that knowledge and memory remain isolated across different users.
This evaluation approach prioritizes explainability and factual correctness over purely generative fluency.
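The "answer groundedness" criterion admits a simple automated proxy alongside qualitative review. The sketch below uses crude lexical overlap; it is an illustrative heuristic (not part of the project), and a real check would use semantic similarity or an LLM judge.

```python
def groundedness_score(answer, context):
    """Fraction of answer tokens that also appear in the retrieved context.

    A crude lexical proxy for groundedness: 1.0 means every answer
    token occurs in the context; low scores flag possible hallucination.
    """
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

score = groundedness_score(
    "embeddings are stored per user",
    "vector embeddings are stored persistently per user in chromadb",
)
assert score == 1.0  # every answer token is supported by the context
```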
## System Architecture
The system follows a clean backend-first design:
- Conversation Memory: Stored persistently using SQLite
- Knowledge Store: Vector embeddings stored in ChromaDB
- Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
- LLM: Groq-hosted large language model
- User Isolation: Separate data directories per user
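User isolation via separate data directories can be sketched with a small path helper. The `documents` subdirectory matches the `data/users/<user_id>/documents/` layout documented in the usage section; the `vector_store` subdirectory name is an assumption for illustration.

```python
from pathlib import Path
import tempfile

def user_paths(root, user_id):
    """Return the per-user directories implied by the documented layout.

    Mirrors the data/users/<user_id>/documents/ convention; the
    vector_store subdirectory name is assumed, not confirmed.
    """
    base = Path(root) / "data" / "users" / user_id
    return {
        "documents": base / "documents",
        "vector_store": base / "vector_store",  # assumed name
    }

with tempfile.TemporaryDirectory() as tmp:
    paths = user_paths(tmp, "alice")
    for p in paths.values():
        p.mkdir(parents=True, exist_ok=True)
    # Two users never share a directory, so knowledge cannot leak between them.
    assert user_paths(tmp, "alice")["documents"] != user_paths(tmp, "bob")["documents"]
```

Keying every read and write through a helper like this makes the isolation property easy to audit in one place.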
Each user operates within their own logical “knowledge brain”.
## RAG Workflow
- User adds documents (text or markdown)
- Documents are chunked into overlapping segments
- Chunks are embedded into vector representations
- Embeddings are stored persistently per user
- User queries are matched using semantic similarity
- Retrieved context is injected into the LLM prompt
- Grounded responses are generated
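The chunking step of the workflow above can be sketched as a character-window splitter. The sizes are illustrative defaults, and the project's actual chunking parameters and strategy (e.g. sentence-aware splitting) may differ.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (step 2 of the workflow).

    Each chunk shares `overlap` characters with its successor so that
    sentences cut at a boundary still appear whole in one chunk.
    Sizes are illustrative, not the project's actual parameters.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

text = "".join(str(i % 10) for i in range(500))
chunks = chunk_text(text, chunk_size=200, overlap=50)
assert len(chunks) == 3
assert chunks[0][-50:] == chunks[1][:50]  # consecutive chunks overlap
```

The overlap trades a little index size for recall: a fact straddling a chunk boundary would otherwise be unretrievable as a unit.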
## Knowledge vs Conversation Memory

The system intentionally separates:

- Knowledge base: user-provided documents, embedded and stored in ChromaDB, serving as the sole source of factual answers
- Conversation memory: dialogue history stored in SQLite, used only for continuity and personalization

This design improves accuracy, traceability, and explainability.
## Usage Summary
- Add documents to the user-specific documents directory
- Run the ingestion pipeline to index knowledge
- Query the assistant via the backend interface
- Responses are generated using retrieved document context
## Limitations
- Minimal interface (backend-focused)
- Manual ingestion step
- Limited document formats (text/markdown)
## Future Improvements
- Web-based chat interface
- Document upload via UI
- Knowledge editing and deletion
- Advanced memory summarization
- Multi-agent reasoning workflows
## Conclusion
Personal Knowledge Brain demonstrates a practical and extensible implementation of a user-scoped RAG system with persistent memory. The project emphasizes correctness, modularity, and real-world applicability over superficial features.