
Transform your documents into an intelligent knowledge assistant.
Large Language Models (LLMs) can produce fluent answers, but in document-centric workflows fluency is not enough—answers must be traceable to the documents a user provides. Whether those documents contain onboarding checklists, offboarding procedures, timesheet policies, or other reference material, an assistant without grounding may:
This project implements a Retrieval-Augmented Generation (RAG) assistant called KnowBridge that answers user questions using only the indexed documents provided by the user (currently .md, .docx, and .doc). The system is designed to prioritize traceability and practical deployment: persistent storage, incremental re-indexing, and session-based chat history.
General-purpose LLMs are not reliable when asked detailed procedural questions. They can:
The core problem is the absence of source-bounded reasoning: the model must be forced to answer from retrieved context, not from prior training knowledge.
Markdown (.md) and Word (.docx, .doc) knowledge files
KnowBridge uses a modular pipeline that separates:
This separation makes the system maintainable and easy to extend.

Web UI — Knowledge Base (Upload & Index)

Web UI — Chat (Grounded Answer + Sources)

Example — Indexed Files Table / Vector DB State

Data Layer
Markdown .md and Word .docx / .doc
rag-knowledge-assistant/data/:
onboarding_alpha.md
offboarding_alpha.md
timesheet.md
Processing Layer
Embedding Layer
sentence-transformers/all-MiniLM-L6-v2
Storage Layer
Retrieval Layer
Generation Layer
ChatGroq
SOURCES: line for traceability
Memory Layer
Interface Layer
To run KnowBridge locally, first ensure Python 3.10 or later is installed and a valid Groq API key is available. Then create and activate a virtual environment, install dependencies from requirements.txt, and add a .env file in the project root with GROQ_API_KEY=<your_api_key>. Once configuration is complete, start the application with python code/app.py from the rag-knowledge-assistant/ directory. The app launches a Gradio interface with two tabs: in the Knowledge Base tab, upload .md, .docx, or .doc files and trigger indexing; in the Chat tab, ask grounded questions and review answers with SOURCES citations generated from retrieved context.
KnowBridge handles uploaded .md, .docx, and .doc files through an incremental upsert workflow designed to avoid redundant processing. For each file, the system computes a SHA-256 hash and compares it with hash metadata already stored in Chroma. If the file is new, it is chunked, embedded, and inserted into the collection. If the source exists but the hash has changed, previous chunks for that source are removed and the updated content is re-indexed. If both source and hash match, the file is treated as unchanged and skipped, reducing embedding cost and indexing time.
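The added/updated/unchanged decision described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual indexer: `classify_upload` and the `stored_hashes` dictionary are hypothetical names, and the dictionary stands in for the hash metadata that the real system keeps in Chroma.

```python
import hashlib


def file_sha256(content: bytes) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def classify_upload(source: str, content: bytes, stored_hashes: dict[str, str]) -> str:
    """Decide what an indexer should do with an uploaded file.

    stored_hashes is a hypothetical source -> file_hash mapping that
    stands in for the metadata stored alongside chunks in Chroma.
    """
    new_hash = file_sha256(content)
    old_hash = stored_hashes.get(source)
    if old_hash is None:
        return "added"      # new source: chunk, embed, insert
    if old_hash != new_hash:
        return "updated"    # content changed: remove old chunks, re-index
    return "unchanged"      # same source and hash: skip


# Pretend timesheet.md was indexed earlier with this content.
stored = {"timesheet.md": file_sha256(b"Submit timesheets weekly.")}

same = classify_upload("timesheet.md", b"Submit timesheets weekly.", stored)
changed = classify_upload("timesheet.md", b"Submit timesheets daily.", stored)
fresh = classify_upload("onboarding_alpha.md", b"Welcome aboard.", stored)
print(same, changed, fresh)
```

Hashing the raw bytes means any edit, however small, triggers a re-index of that source, while a re-upload of an identical file costs only one hash computation.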
Documents are normalized to plain text before chunking and embedding. Each chunk stored in the vector database carries metadata for traceability, including source and file_hash. The source value is derived from the uploaded filename; legacy .md source names are still supported for backward compatibility. This metadata enables source-aware retrieval, update detection, and transparent citation in generated answers.
The system uses LangChain's recursive character splitter with a chunk size of 1000 characters and an overlap of 200 characters. This setting was selected to preserve enough local context for policy-style instructions while keeping chunks compact enough for precise similarity matching. Overlap reduces the risk of splitting important details across boundaries, which improves retrieval quality for multi-step procedural questions.
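To make the effect of the 1000/200 setting concrete, here is a simplified fixed-window splitter. It is deliberately not LangChain's recursive character splitter, which also tries to break on separators such as paragraphs and sentences; the function name `split_with_overlap` is hypothetical and only illustrates how size and overlap interact.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows that share `overlap` characters.

    Simplified stand-in for a recursive splitter: it ignores separators
    and only shows how chunk size and overlap determine the windows.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each new window starts 800 characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# A 2500-character document with position-dependent content.
document = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(document)
print([len(c) for c in chunks])
```

With these defaults, consecutive windows start 800 characters apart, so the last 200 characters of one chunk reappear at the start of the next. A detail that straddles a boundary is therefore fully contained in at least one chunk.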
Each text chunk is embedded using the sentence-transformer model all-MiniLM-L6-v2. The same embedding model is also used for user queries, ensuring representation consistency between indexed knowledge chunks and incoming questions during similarity search.
Embeddings are persisted in a ChromaDB collection named knowledge_base under rag-knowledge-assistant/outputs/vector_db/. Because storage is persistent, the index is reused across runs instead of rebuilt on every startup. This makes the assistant practical for iterative usage, where users repeatedly upload, revise, and query internal process documents.
At query time:
In the current configuration, n_results is set to 5 and the distance threshold is set to 0.5, where lower values indicate closer semantic match. The current query-processing design intentionally stays lightweight: it performs semantic retrieval over the indexed corpus, applies threshold-based filtering, and passes the retrieved context to a grounding-focused prompt. When no candidate passes the threshold, the retriever returns top-k candidates as a fallback to avoid empty-context generation; this is combined with strict prompting so the model can still refuse when the evidence is weak or off-topic. This design keeps the assistant broadly applicable to arbitrary user-supplied document sets rather than optimizing for one fixed domain.
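The threshold-plus-fallback behavior can be sketched as a small pure function. The function name and the `(text, distance)` tuple shape are assumptions made for illustration, as is the choice of `<=` for the comparison; the project's retriever may structure its results differently.

```python
def filter_with_fallback(candidates, threshold=0.5):
    """Keep candidates within the distance threshold, else fall back to all.

    candidates: list of (text, distance) pairs, assumed to be the top-k
    results already returned by the vector store (k = n_results = 5).
    Lower distance means a closer semantic match.
    """
    ranked = sorted(candidates, key=lambda pair: pair[1])
    passing = [pair for pair in ranked if pair[1] <= threshold]
    # Fallback: never hand the prompt an empty context; the grounding
    # prompt is expected to refuse if these candidates are off-topic.
    return passing if passing else ranked


good_hits = [("timesheet rules", 0.31), ("holiday list", 0.62), ("approval flow", 0.48)]
weak_hits = [("cafeteria menu", 0.90), ("parking map", 0.70)]

filtered = filter_with_fallback(good_hits)
fallback = filter_with_fallback(weak_hits)
print(filtered)
print(fallback)
```

The trade-off is visible in the second call: the fallback keeps generation from running on empty context, but it also passes weak matches through, which is why it is paired with a prompt that permits refusal.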
The generation layer uses a grounding-focused prompt that instructs the model to answer only from retrieved documents, explicitly refuse when the answer is not supported by context, and avoid relying on model-internal prior knowledge. Every response is required to end with a SOURCES line so users can quickly verify which indexed documents informed the answer.
KnowBridge supports long-running conversations through SQLite-based session persistence, recent-window trimming, and rolling summarization of older turns. This design keeps prompt size bounded while preserving useful context from earlier parts of the conversation, allowing follow-up questions to remain coherent without uncontrolled token growth.
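A minimal sketch of session persistence with recent-window trimming, using only `sqlite3`, might look like the following. The table schema, function names, and the `placeholder_summary` helper are all hypothetical; in particular, the placeholder simply truncates older turns, whereas a rolling summary in a system like this would more plausibly be produced by the LLM. The window size of 6 matches the trimming_window_size reported for this project.

```python
import sqlite3

TRIMMING_WINDOW_SIZE = 6  # messages kept verbatim in the prompt window


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a history database; this schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
    )
    return conn


def add_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    """Persist one chat turn for the given session."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()


def split_history(conn: sqlite3.Connection, session_id: str, window: int = TRIMMING_WINDOW_SIZE):
    """Return (older, recent): turns to summarize and turns to keep verbatim."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    cut = max(len(rows) - window, 0)
    return rows[:cut], rows[cut:]


def placeholder_summary(older) -> str:
    """Stand-in for a rolling summary: truncates each older turn."""
    return " | ".join(f"{role}: {content[:40]}" for role, content in older)


conn = open_history()
for i in range(8):
    add_message(conn, "demo-session", "user" if i % 2 == 0 else "assistant", f"message {i}")

older, recent = split_history(conn, "demo-session")
summary = placeholder_summary(older)
print(len(older), len(recent), summary)
```

Only the six most recent messages enter the prompt verbatim; everything earlier is compressed into one summary string, so prompt size stays bounded no matter how long the session runs.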
+--------------------------+
|     Knowledge Files      |
+------------+-------------+
             |
             v
+--------------------------+
|  Upsert + Hash Checking  |
| (added/updated/unchanged)|
+------------+-------------+
             |
             v
+--------------------------+
|      Text Chunking       |
|   (Recursive Splitter)   |
|  size=1000, overlap=200  |
+------------+-------------+
             |
             v
+--------------------------+
|   Sentence-Transformer   |
|     all-MiniLM-L6-v2     |
+------------+-------------+
             |
             v
+--------------------------+
| Chroma Persistent Vector |
|    DB (cosine / HNSW)    |
+--------------------------+

     +--------------------+
     |     User Query     |
     +---------+----------+
               |
               v
+-----------------------------+
| Load SQLite Session History |
|      + Rolling Summary      |
+--------------+--------------+
               |
               v
+-----------------------------+
| Embed Query + Retrieve TopK |
|    threshold filtering +    |
|  fallback to topK if empty  |
+--------------+--------------+
               |
               v
+-----------------------------+
|    Grounded Prompt Build    |
| (docs-only + SOURCES line)  |
+--------------+--------------+
               |
               v
+-----------------------------+
|     Groq LLM (ChatGroq)     |
| model: llama-3.1-8b-instant |
+--------------+--------------+
               |
               v
+-----------------------------+
|     Response + SOURCES      |
|      Persist to SQLite      |
+-----------------------------+
The system prompt explicitly states:
The generation prompt is constructed around “Relevant documents + User question”, making retrieval an explicit dependency for answering.
Each retrieved chunk is prefixed with [Source: <name>], and the model is required to append:
SOURCES: <comma-separated list>
This provides transparent traceability to the document(s) used.
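The prefix-and-citation convention can be sketched as two small helpers: one that formats retrieved chunks with a `[Source: <name>]` prefix, and one that reads the `SOURCES:` line back out of a model response. Both function names and the `(source, text)` tuple shape are hypothetical, and the parser is an illustrative addition rather than something the project is known to include.

```python
def build_context(chunks):
    """Format retrieved chunks so each one is labeled with its source.

    chunks: list of (source, text) pairs (an assumed shape).
    """
    return "\n\n".join(f"[Source: {source}]\n{text}" for source, text in chunks)


def parse_sources(answer: str) -> list[str]:
    """Return the names on the last 'SOURCES:' line, or [] if there is none."""
    for line in reversed(answer.strip().splitlines()):
        if line.strip().upper().startswith("SOURCES:"):
            names = line.split(":", 1)[1]
            return [name.strip() for name in names.split(",") if name.strip()]
    return []


context = build_context([
    ("timesheet.md", "Submit timesheets weekly."),
    ("onboarding_alpha.md", "Complete the onboarding checklist."),
])
answer = "Timesheets are submitted weekly.\nSOURCES: timesheet.md, onboarding_alpha.md"
cited = parse_sources(answer)
print(context)
print(cited)
```

Parsing the citation line makes it possible to check that every cited name actually appeared in the retrieved context, which is one way to catch a response that cites a document it was never shown.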
The intended safe behavior for out-of-scope questions is a refusal:
In addition, the retriever has a practical fallback (return top-k results even if none pass the threshold) to avoid empty-context generation. In practice, threshold tuning is important to balance recall vs. relevance.
This implementation uses semantic retrieval over ChromaDB with all-MiniLM-L6-v2 embeddings, n_results=5, and threshold-based filtering (threshold=0.5) before generation. The current publication focuses on system behavior and qualitative verification rather than a formal benchmark suite.
Representative validation queries include: "What are the onboarding steps for a new employee?", "What are the offboarding responsibilities?", and "How should timesheet entries be submitted and approved?" For each query, the system retrieves relevant chunks from onboarding_alpha.md, offboarding_alpha.md, or timesheet.md. The final answer then includes a SOURCES line for traceability, which is the expected behavior of retrieval-first grounded generation. These queries reflect the sample corpus only: KnowBridge is scoped by document availability and query answerability rather than by a single business domain.
langchain_groq
langchain_huggingface, sentence-transformers
python-docx (Word .docx text extraction)
python-dotenv, pyyaml
KnowBridge uses llama-3.1-8b-instant via Groq because it provides a reliable balance of instruction-following quality and low-latency responses for interactive RAG workflows. In this project setup, the model consistently follows grounding rules from the system prompt and works well with the retrieved-context format used by the pipeline. This makes it a practical choice for document-based Q&A without adding extra model-routing complexity.
code/app.py: Gradio UI (upload/index + chat)
code/document_indexer.py: chunking, embeddings, Chroma persistence, incremental upsert
code/rag_pipeline.py: retrieval, prompt construction, LLM invocation, summarization
code/chat_history_db.py: SQLite-backed history + rolling summaries
code/config/config.yaml: model + vector DB parameters
code/config/prompt_config.yaml: grounding prompt rules (including SOURCES:)
llama-3.1-8b-instant
n_results=5, threshold=0.5 (distance)
trimming_window_size=6 messages in the prompt window
.doc extraction depends on document format and available local conversion support
.docx extraction is best-effort plain-text (formatting, images, and complex layouts are not preserved)
temperature=0) for higher stability
.docx parsing/normalization (headers/footers, lists, section structure)
KnowBridge demonstrates a practical, modular RAG architecture for correctness-sensitive internal knowledge work. By combining persistent vector search, strict grounding prompts, incremental indexing, and session-based memory persistence, the assistant can answer operational questions in a way that is more traceable and controllable than a general-purpose LLM alone.
GitHub repository: https://github.com/rajapateriya/knowbridge