
Transform your documents into an intelligent knowledge assistant.
Large Language Models (LLMs) can produce fluent answers, but in knowledge-work settings (PMO, HR, operations) fluency is not enough: answers must be traceable to internal documents such as onboarding checklists, offboarding procedures, and timesheet policies. Without grounding, an assistant may invent plausible-sounding details that have no basis in the organization's actual policies.
This project implements a Retrieval-Augmented Generation (RAG) assistant called KnowBridge that answers user questions using only the indexed internal documents (currently .md and .docx). The system is designed to prioritize traceability and practical deployment: persistent storage, incremental re-indexing, and session-based chat history.
General-purpose LLMs are not reliable when asked detailed procedural questions: they can fabricate steps, conflate similar policies, or answer confidently from outdated training data. The core problem is the absence of source-bounded reasoning: the model must be forced to answer from retrieved context, not from prior training knowledge.
KnowBridge ingests Markdown (.md) and Word (.docx) knowledge files and uses a modular pipeline that separates the data, processing, embedding, storage, retrieval, generation, memory, and interface layers. This separation makes the system maintainable and easy to extend.

*Screenshot: Web UI — Knowledge Base (Upload & Index)*

*Screenshot: Web UI — Chat (Grounded Answer + Sources)*

*Screenshot: Example — Indexed Files Table / Vector DB State*

Data Layer
- Markdown (.md) and Word (.docx) knowledge files
- Sample documents in rag-knowledge-assistant/data/:
  - onboarding_alpha.md
  - offboarding_alpha.md
  - timesheet.md

Processing Layer
- Plain-text extraction and recursive character-based chunking

Embedding Layer
- sentence-transformers/all-MiniLM-L6-v2

Storage Layer
- Persistent ChromaDB collection with incremental upsert

Retrieval Layer
- Top-k similarity search with distance-threshold filtering

Generation Layer
- Groq LLM via ChatGroq, appending a SOURCES: line for traceability

Memory Layer
- SQLite-backed session history with rolling summaries

Interface Layer
- Gradio web UI for upload/indexing and chat
Uploaded .md and .docx files are handled via an upsert strategy: each file's SHA-256 hash is stored in Chroma metadata and compared against the hash of any later upload, so every file can be classified as added, updated, or unchanged. Unchanged files are skipped; updated files have their stale chunks replaced before re-indexing. A sketch of this check follows.
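As an illustration, here is a minimal sketch of the hash comparison, assuming a chromadb collection whose chunks carry the source and file_hash metadata described below; classify_upload is a hypothetical name, not the repo's actual API.

```python
import hashlib

def classify_upload(collection, source: str, content: bytes) -> str:
    """Classify an upload as "added", "updated", or "unchanged"."""
    new_hash = hashlib.sha256(content).hexdigest()

    # Fetch any chunks already stored for this source.
    existing = collection.get(where={"source": source}, include=["metadatas"])
    if not existing["ids"]:
        return "added"  # never indexed before

    if existing["metadatas"][0].get("file_hash") == new_hash:
        return "unchanged"  # skip re-embedding entirely

    # Content changed: drop the stale chunks so the file can be re-indexed.
    collection.delete(where={"source": source})
    return "updated"
```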
Documents are processed as plain text. Each stored chunk is associated with:
- source: a source identifier derived from the uploaded filename (legacy .md sources are stored without an extension for backward compatibility)
- file_hash: SHA-256 of the full document content

Chunking uses LangChain's recursive character-based splitter (chunk_size=1000, chunk_overlap=200, per the pipeline diagram). This balances retrieval precision (smaller chunks) and context completeness (overlap); see the sketch below.
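A minimal sketch of this step, using the parameters from the pipeline diagram; the sample path points at one of the repo's data files:

```python
import hashlib
from langchain_text_splitters import RecursiveCharacterTextSplitter

document_text = open("data/onboarding_alpha.md", encoding="utf-8").read()
file_hash = hashlib.sha256(document_text.encode("utf-8")).hexdigest()  # stored per chunk

# Recursive splitting: tries "\n\n", then "\n", then spaces, then characters.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_text(document_text)  # list[str]
```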
Each chunk is embedded using sentence-transformers/all-MiniLM-L6-v2. Embeddings are stored in a persistent ChromaDB collection at rag-knowledge-assistant/outputs/vector_db/knowledge_base. Benefits: the index survives restarts, nothing has to be re-embedded on launch, and incremental upserts stay cheap. A creation sketch follows.
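Continuing the sketch above, a persistent collection can be created roughly like this; the exact code in document_indexer.py may differ, and the cosine/HNSW setting matches the diagram below:

```python
import chromadb
from chromadb.utils import embedding_functions

# Same embedding model at index time and query time.
embed_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

client = chromadb.PersistentClient(path="outputs/vector_db")  # survives restarts
collection = client.get_or_create_collection(
    name="knowledge_base",
    embedding_function=embed_fn,
    metadata={"hnsw:space": "cosine"},  # cosine distance over an HNSW index
)

# `chunks` and `file_hash` come from the splitter sketch above;
# ids are illustrative but must be unique per chunk.
collection.add(
    ids=[f"onboarding_alpha-{i}" for i in range(len(chunks))],
    documents=chunks,
    metadatas=[{"source": "onboarding_alpha", "file_hash": file_hash}] * len(chunks),
)
```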
At query time, the user query is embedded with the same model and the closest chunks are retrieved from the Chroma collection.

Default retrieval parameters (configurable in YAML):
- n_results: 5
- threshold: 0.5 (distance; lower means a closer match)

Fallback behavior: if no chunks pass the threshold, the system returns the top-k results anyway (to avoid an empty context), paired with strict prompting to minimize hallucination. A sketch of this logic appears below.
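A sketch of the retrieval step with the default parameters and the fallback described above (the function name is hypothetical):

```python
def retrieve(collection, query: str, n_results: int = 5, threshold: float = 0.5) -> list[str]:
    """Top-k retrieval with distance filtering and a non-empty fallback."""
    res = collection.query(query_texts=[query], n_results=n_results)
    docs, dists = res["documents"][0], res["distances"][0]

    # Keep only chunks whose distance beats the threshold (lower = closer).
    kept = [doc for doc, dist in zip(docs, dists) if dist <= threshold]

    # Fallback: never hand the LLM an empty context; the strict prompt
    # is what guards against hallucination in this case.
    return kept if kept else docs
```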
The RAG assistant prompt enforces:
- answers must come only from the retrieved documents
- every answer must end with a SOURCES: ... line (or SOURCES: none)

The assistant supports long-running conversations with bounded context: recent messages are kept in a trimmed window, older turns are folded into a rolling summary, and both are persisted per session in SQLite. This allows the assistant to remain aware of the conversation while keeping prompts manageable. A minimal sketch follows.
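A minimal sketch of how the bounded context can be assembled, assuming messages loaded from the SQLite history and a rolling summary maintained elsewhere; the helper name is hypothetical (the repo's logic lives in chat_history_db.py and rag_pipeline.py):

```python
def build_history_context(messages: list[dict], summary: str, window: int = 6) -> str:
    """Combine the rolling summary with the last `window` raw messages.

    `window` mirrors the default trimming_window_size=6; older turns are
    assumed to have been folded into `summary` already.
    """
    recent = messages[-window:]
    lines = [f"{m['role']}: {m['content']}" for m in recent]

    parts = []
    if summary:
        parts.append(f"Conversation summary so far:\n{summary}")
    parts.append("Recent messages:\n" + "\n".join(lines))
    return "\n\n".join(parts)
```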
Indexing flow:

```
+--------------------------+
| Knowledge Files |
+------------+-------------+
|
v
+--------------------------+
| Upsert + Hash Checking |
| (added/updated/unchanged)|
+------------+-------------+
|
v
+--------------------------+
| Text Chunking |
| (Recursive Splitter) |
| size=1000, overlap=200 |
+------------+-------------+
|
v
+--------------------------+
| Sentence-Transformer |
| all-MiniLM-L6-v2 |
+------------+-------------+
|
v
+--------------------------+
| Chroma Persistent Vector |
| DB (cosine / HNSW) |
+--------------------------+
```

Query flow:

```
+--------------------+
| User Query |
+---------+----------+
|
v
+-----------------------------+
| Load SQLite Session History |
| + Rolling Summary |
+--------------+--------------+
|
v
+-----------------------------+
| Embed Query + Retrieve TopK |
| threshold filtering + |
| fallback to topK if empty |
+--------------+--------------+
|
v
+-----------------------------+
| Grounded Prompt Build |
| (docs-only + SOURCES line) |
+--------------+--------------+
|
v
+-----------------------------+
| Groq LLM (ChatGroq) |
| model: llama-3.1-8b-instant |
+--------------+--------------+
|
v
+-----------------------------+
| Response + SOURCES |
| Persist to SQLite |
+-----------------------------+
```
The system prompt explicitly states that the assistant must answer only from the provided documents. The generation prompt is constructed around “Relevant documents + User question”, making retrieval an explicit dependency for answering.
Each retrieved chunk is prefixed with [Source: <name>], and the model is required to append a final line:

SOURCES: <comma-separated list>

This provides transparent traceability to the document(s) used; a sketch of the prompt build follows.
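A sketch of the grounded prompt build and LLM call; the system-prompt wording paraphrases the rules above rather than quoting prompt_config.yaml, and the answer helper is a hypothetical name:

```python
from langchain_groq import ChatGroq

SYSTEM = (
    "Answer ONLY from the provided documents. If the answer is not in them, "
    "say so. End every answer with a line 'SOURCES: <comma-separated list>' "
    "naming the documents used, or 'SOURCES: none'."
)

def answer(llm: ChatGroq, question: str, chunks: list[tuple[str, str]]) -> str:
    # Prefix each retrieved chunk with its source, as described above.
    context = "\n\n".join(f"[Source: {src}]\n{text}" for src, text in chunks)
    prompt = f"Relevant documents:\n{context}\n\nUser question: {question}"
    reply = llm.invoke([("system", SYSTEM), ("user", prompt)])
    return reply.content

llm = ChatGroq(model="llama-3.1-8b-instant")  # needs GROQ_API_KEY in the environment
```

With the SOURCES: line appended by the model, the UI can surface exactly which documents backed each answer.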
The intended safe behavior for out-of-scope questions is a refusal: when the answer is not in the retrieved documents, the assistant should say so instead of guessing.
In addition, the retriever has a practical fallback (return top-k results even if none pass the threshold) to avoid empty-context generation. In practice, threshold tuning is important to balance recall vs. relevance.
Tech Stack
- LLM: Groq via LangChain (langchain_groq)
- Embeddings: Hugging Face sentence-transformers (langchain_huggingface, sentence-transformers)
- python-docx (Word .docx text extraction)
- python-dotenv, pyyaml

Repository Layout
- code/app.py: Gradio UI (upload/index + chat)
- code/document_indexer.py: chunking, embeddings, Chroma persistence, incremental upsert
- code/rag_pipeline.py: retrieval, prompt construction, LLM invocation, summarization
- code/chat_history_db.py: SQLite-backed history + rolling summaries
- code/config/config.yaml: model + vector DB parameters
- code/config/prompt_config.yaml: grounding prompt rules (including SOURCES:)

Key Parameters
- LLM model: llama-3.1-8b-instant
- Retrieval: n_results=5, threshold=0.5 (distance)
- Memory: trimming_window_size=6 messages in the prompt window

Limitations
- .doc (non-.docx) files are not supported
- .docx extraction is best-effort plain text (formatting, images, and complex layouts are not preserved)

Future Improvements
- Deterministic generation (temperature=0) for higher stability
- Richer .docx parsing/normalization (headers/footers, lists, section structure)

KnowBridge demonstrates a practical, modular RAG architecture for correctness-sensitive internal knowledge work. By combining persistent vector search, strict grounding prompts, incremental indexing, and session-based memory persistence, the assistant can answer operational questions in a way that is more traceable and controllable than a general-purpose LLM alone.
GitHub repository: https://github.com/rajapateriya/knowbridge