# Personal Knowledge Brain – User-Scoped RAG Assistant
## Overview
Personal Knowledge Brain (PKB) is a user-scoped Retrieval-Augmented Generation (RAG) assistant designed to answer questions grounded strictly in user-provided documents. Unlike generic chatbots, PKB focuses on personal knowledge management with persistent memory and clean architectural separation.
Each user has an isolated knowledge base and conversation memory, making the system suitable for multi-user and future multi-tenant deployments.
## Key Capabilities
- Retrieval-Augmented Generation (RAG)
- User-scoped document ingestion and isolation
- Semantic search using vector embeddings
- Persistent conversation memory
- Modular and extensible backend architecture
## Installation and Usage
### Prerequisites
- Python 3.10 or higher
- A Groq API key for LLM access
- Basic familiarity with Python virtual environments
### Installation

- Clone the repository:

```bash
git clone https://github.com/Rakmo5/readyTensor_RAG-project.git
cd readyTensor_RAG-project/project
```

- Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Create a `.env` file in the project root and add:

```
GROQ_API_KEY=your_api_key_here
```
### Usage

- Add text or markdown documents to:

```
data/users/<user_id>/documents/
```

- Run the ingestion pipeline to index documents:

```bash
python test_vector_store.py
```

- Query the assistant:

```bash
python test_chat.py
```
## Model Selection Rationale
### Embedding Model
The system uses the all-MiniLM-L6-v2 Sentence Transformer for document embedding. This model was selected due to its strong balance between semantic accuracy and computational efficiency. It performs well for semantic similarity tasks while remaining lightweight, making it suitable for scalable and user-scoped knowledge bases.
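At retrieval time, query and chunk embeddings are compared by cosine similarity. The sketch below uses plain Python with toy 3-dimensional vectors standing in for the model's 384-dimensional output; the function name is illustrative and not from the project's codebase.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for all-MiniLM-L6-v2 embeddings (really 384-dim).
query = [0.9, 0.1, 0.0]
chunk_relevant = [0.8, 0.2, 0.1]
chunk_unrelated = [0.0, 0.1, 0.9]

# A semantically related chunk scores higher than an unrelated one.
assert cosine_similarity(query, chunk_relevant) > cosine_similarity(query, chunk_unrelated)
```

In practice the vector store (ChromaDB here) performs this comparison internally; the snippet only illustrates the ranking principle.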
### Language Model
The assistant uses a Groq-hosted large language model for response generation. This model was chosen for its low-latency inference and reliable instruction-following behavior. When combined with retrieved document context, it enables grounded and responsive answers without relying on external or implicit knowledge.
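The "combined with retrieved document context" step can be sketched as a prompt-assembly helper. This is a hypothetical illustration (the function name and instruction wording are assumptions, not the project's actual prompt); the resulting string would be sent to the Groq-hosted model as the user message.

```python
def build_grounded_prompt(question, retrieved_chunks):
    """Assemble a prompt that instructs the model to answer only from context."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "What is PKB?",
    ["Personal Knowledge Brain (PKB) is a user-scoped RAG assistant."],
)
print(prompt)
```

Numbering the chunks (`[1]`, `[2]`, …) makes it possible for the model to cite which passage supports its answer, which aids the groundedness evaluation described later.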
## Safety Guardrails
To ensure reliability and prevent hallucinations, the assistant follows these safety guardrails:
- Responses are generated only from context retrieved from the user's own documents.
- If the retrieved context does not contain sufficient information, the assistant explicitly acknowledges uncertainty.
- Conversation memory is used only for maintaining dialogue continuity and personalization, not as a source of factual knowledge.
- The knowledge base is updated explicitly, preventing accidental ingestion of unverified or noisy information.
- The assistant avoids assumptions and does not rely on external or implicit world knowledge.
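The second guardrail (acknowledging uncertainty) can be enforced before the LLM is even called. The sketch below is a minimal, assumed implementation: the function name, threshold value, and decline message are illustrative, not taken from the project.

```python
UNCERTAIN_REPLY = "I don't have enough information in your documents to answer that."

def answer_or_decline(retrieved, min_score=0.35):
    """Decline when no retrieved chunk clears the similarity threshold.

    `retrieved` is a list of (chunk_text, similarity_score) pairs;
    the 0.35 threshold is illustrative, not tuned.
    Returns (usable_chunks, decline_message) where exactly one is None.
    """
    confident = [(text, score) for text, score in retrieved if score >= min_score]
    if not confident:
        return None, UNCERTAIN_REPLY
    confident.sort(key=lambda pair: pair[1], reverse=True)
    return confident, None

# A strong match passes through to generation.
chunks, decline = answer_or_decline([("PKB stores data per user.", 0.72)])
assert decline is None

# A weak match triggers an explicit acknowledgement of uncertainty.
chunks, decline = answer_or_decline([("irrelevant text", 0.10)])
assert decline == UNCERTAIN_REPLY
```

Filtering before generation is cheaper than asking the model to self-censor, and it guarantees the decline behavior rather than relying on instruction-following alone.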
## Evaluation Strategy
The system is evaluated qualitatively to ensure correctness, reliability, and grounded behavior. The primary evaluation criteria include:
- Retrieval Relevance: Whether the retrieved document chunks are semantically relevant to the user’s query.
- Answer Groundedness: Whether generated responses are directly supported by the retrieved context.
- Hallucination Avoidance: Whether the assistant appropriately acknowledges uncertainty when insufficient information is available.
- Conversational Consistency: Whether follow-up questions are answered coherently using prior conversational context.
- User-Scoped Isolation: Verification that knowledge and memory remain isolated across different users.
This evaluation approach prioritizes explainability and factual correctness over purely generative fluency.
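The "answer groundedness" criterion admits a simple automated proxy alongside qualitative review. The sketch below uses crude lexical overlap; it is an illustrative heuristic (not part of the project), and a real check would use semantic similarity or an LLM judge.

```python
def groundedness_score(answer, context):
    """Fraction of answer tokens that also appear in the retrieved context.

    A crude lexical proxy for groundedness: 1.0 means every answer
    token occurs in the context; low scores flag possible hallucination.
    """
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

score = groundedness_score(
    "embeddings are stored per user",
    "vector embeddings are stored persistently per user in chromadb",
)
assert score == 1.0  # every answer token is supported by the context
```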
## System Architecture
The system follows a clean backend-first design:
- Conversation Memory: Stored persistently using SQLite
- Knowledge Store: Vector embeddings stored in ChromaDB
- Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
- LLM: Groq-hosted large language model
- User Isolation: Separate data directories per user
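User isolation via separate data directories can be sketched with a small path helper. The `documents` subdirectory matches the `data/users/<user_id>/documents/` layout documented in the usage section; the `vector_store` subdirectory name is an assumption for illustration.

```python
from pathlib import Path
import tempfile

def user_paths(root, user_id):
    """Return the per-user directories implied by the documented layout.

    Mirrors the data/users/<user_id>/documents/ convention; the
    vector_store subdirectory name is assumed, not confirmed.
    """
    base = Path(root) / "data" / "users" / user_id
    return {
        "documents": base / "documents",
        "vector_store": base / "vector_store",  # assumed name
    }

with tempfile.TemporaryDirectory() as tmp:
    paths = user_paths(tmp, "alice")
    for p in paths.values():
        p.mkdir(parents=True, exist_ok=True)
    # Two users never share a directory, so knowledge cannot leak between them.
    assert user_paths(tmp, "alice")["documents"] != user_paths(tmp, "bob")["documents"]
```

Keying every read and write through a helper like this makes the isolation property easy to audit in one place.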
Each user operates within their own logical “knowledge brain”.
## RAG Workflow
- User adds documents (text or markdown)
- Documents are chunked into overlapping segments
- Chunks are embedded into vector representations
- Embeddings are stored persistently per user
- User queries are matched using semantic similarity
- Retrieved context is injected into the LLM prompt
- Grounded responses are generated
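The chunking step of the workflow above can be sketched as a character-window splitter. The sizes are illustrative defaults, and the project's actual chunking parameters and strategy (e.g. sentence-aware splitting) may differ.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (step 2 of the workflow).

    Each chunk shares `overlap` characters with its successor so that
    sentences cut at a boundary still appear whole in one chunk.
    Sizes are illustrative, not the project's actual parameters.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

text = "".join(str(i % 10) for i in range(500))
chunks = chunk_text(text, chunk_size=200, overlap=50)
assert len(chunks) == 3
assert chunks[0][-50:] == chunks[1][:50]  # consecutive chunks overlap
```

The overlap trades a little index size for recall: a fact straddling a chunk boundary would otherwise be unretrievable as a unit.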
## Knowledge vs Conversation Memory

The system intentionally separates:

- Knowledge base: user-provided documents, embedded and stored in ChromaDB, serving as the sole source of factual answers
- Conversation memory: dialogue history stored in SQLite, used only for continuity and personalization

This design improves accuracy, traceability, and explainability.
## Usage Summary
- Add documents to the user-specific documents directory
- Run the ingestion pipeline to index knowledge
- Query the assistant via the backend interface
- Responses are generated using retrieved document context
## Limitations
- Minimal interface (backend-focused)
- Manual ingestion step
- Limited document formats (text/markdown)
## Future Improvements
- Web-based chat interface
- Document upload via UI
- Knowledge editing and deletion
- Advanced memory summarization
- Multi-agent reasoning workflows
## Conclusion
Personal Knowledge Brain demonstrates a practical and extensible implementation of a user-scoped RAG system with persistent memory. The project emphasizes correctness, modularity, and real-world applicability over superficial features.