This project implements a Retrieval-Augmented Generation (RAG) chatbot that loads JSON documents as its knowledge base. It employs token-aware chunking to efficiently split large texts for embedding and retrieval. Using a large language model integrated via the Grok API, the chatbot delivers accurate, context-aware answers while maintaining session memory for conversational continuity. The system features a user-friendly Gradio interface, allowing users to interact with the chatbot seamlessly. This lightweight yet effective pipeline demonstrates how JSON-based document collections can power practical conversational AI applications.
The system follows an end-to-end pipeline to enable effective question answering over JSON-based documents using Retrieval-Augmented Generation.
Document Loading:
Documents are loaded from local JSON files using a custom loader that extracts relevant textual content. This step prepares the data for subsequent processing.
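The loader itself is project-specific, but its shape can be sketched with the standard library alone. The field names "title" and "content" below are illustrative assumptions, not the project's actual schema:

```python
import json
from pathlib import Path


def load_json_documents(path, text_fields=("title", "content")):
    """Read a JSON file containing a list of records and join the
    assumed text fields of each record into one document string.

    "title" and "content" are placeholder field names; the real
    loader would extract whatever fields the source JSON uses.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    docs = []
    for record in records:
        text = " ".join(str(record[f]) for f in text_fields if f in record)
        if text.strip():
            docs.append(text)
    return docs
```

Records missing all text fields are skipped, so downstream chunking only ever sees non-empty strings.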
Token-Aware Chunking:
To manage large documents and optimize retrieval, texts are split into overlapping chunks using a token-based splitter (TokenTextSplitter) configured with a chunk size of 200 tokens and an overlap of 30 tokens. This token-level splitting ensures chunks fit within the language model's context window and preserves semantic coherence.
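The windowing logic behind this step can be shown in a stdlib-only sketch. Whitespace splitting stands in here for the model-token counting that TokenTextSplitter performs; the stride arithmetic (chunk size minus overlap) is the same idea:

```python
def token_chunks(text, chunk_size=200, chunk_overlap=30):
    """Split text into overlapping windows of tokens.

    Whitespace tokenization is a simplification: the actual
    TokenTextSplitter counts model (tiktoken) tokens. Each window
    starts chunk_size - chunk_overlap tokens after the previous one,
    so consecutive chunks share chunk_overlap tokens of context.
    """
    tokens = text.split()
    stride = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

The overlap means a sentence cut at a chunk boundary still appears whole in the neighboring chunk, which is what preserves semantic coherence at retrieval time.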
Embedding and Vector Store Creation:
Each chunk is converted into vector embeddings using a pre-trained embedding model. The embeddings are stored in a vector database, enabling efficient similarity-based retrieval during query time.
Retriever Setup:
A retriever interface is built on top of the vector store to fetch relevant chunks based on user queries.
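The embedding, vector-store, and retriever steps can be illustrated together with a toy in-memory version. The bag-of-words "embedding" below is only a stand-in for the pre-trained embedding model, and the class is a stand-in for a real vector database; the similarity-search flow is the same:

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; the real pipeline would call a
    pre-trained embedding model here instead."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """In-memory stand-in for a vector database: embeds every chunk
    once at build time, then answers nearest-neighbor queries."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.vectors = [embed(c) for c in chunks]

    def as_retriever(self, k=2):
        """Return a callable that fetches the top-k most similar chunks
        for a query, mirroring the retriever interface built on the
        real vector store."""
        def retrieve(query):
            q = embed(query)
            scored = sorted(
                zip(self.chunks, self.vectors),
                key=lambda cv: cosine(q, cv[1]),
                reverse=True,
            )
            return [chunk for chunk, _ in scored[:k]]
        return retrieve
```

Exposing the store through `as_retriever` keeps the conversational chain decoupled from the storage backend, so the toy store could later be swapped for a production vector database without touching the chain.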
Conversational Chain with Session Memory:
A conversational retrieval chain is constructed by combining the retriever with a large language model (LLM), accessed via the Grok API. A session memory buffer tracks chat history to maintain context across multiple turns, enabling coherent and context-aware responses.
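The chain's control flow, retrieve context, fold in chat history, call the LLM, record the turn, can be sketched as follows. The `llm` callable is a stub for the Grok API client, and the prompt template is an illustrative assumption, not the project's actual prompt:

```python
class ConversationalRAGChain:
    """Minimal conversational retrieval chain with a session memory
    buffer. `retriever` maps a query to a list of context chunks;
    `llm` maps a prompt string to an answer (in the real system, a
    wrapper around the Grok API)."""

    def __init__(self, retriever, llm):
        self.retriever = retriever
        self.llm = llm
        self.history = []  # session memory: list of (question, answer)

    def ask(self, question):
        # Fetch relevant chunks and replay prior turns into the prompt
        # so answers stay consistent across the conversation.
        context = "\n".join(self.retriever(question))
        past = "\n".join(f"User: {q}\nAssistant: {a}"
                         for q, a in self.history)
        prompt = (
            f"Answer using only this context:\n{context}\n\n"
            f"Conversation so far:\n{past}\n\n"
            f"User: {question}\nAssistant:"
        )
        answer = self.llm(prompt)
        self.history.append((question, answer))
        return answer

    def reset(self):
        """Clear the session memory (used by the UI's reset action)."""
        self.history.clear()
```

Because the history buffer is replayed into every prompt, follow-up questions such as "what about the second one?" can be resolved against earlier turns.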
User Interface:
The entire pipeline is wrapped in a Gradio web interface, allowing users to ask questions interactively. The UI also supports resetting the conversation context.
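The UI wiring might look like the sketch below. The handlers are kept separate from the Gradio layer so they can be exercised without a browser; `chain` is assumed to expose `ask()` and `reset()`, and the widget layout is illustrative rather than the project's exact interface:

```python
def make_handlers(chain):
    """Build the two callbacks the UI needs: answer a message and
    reset the session. `chain` is any object with ask()/reset()."""

    def respond(message, chat_history):
        answer = chain.ask(message)
        # Clear the input box and append the new turn to the display.
        return "", chat_history + [(message, answer)]

    def reset():
        chain.reset()
        return []  # empty chat display

    return respond, reset


def build_ui(chain):
    """Assemble a Gradio Blocks app around the handlers
    (requires gradio to be installed)."""
    import gradio as gr  # imported lazily so handlers stay testable

    respond, reset = make_handlers(chain)
    with gr.Blocks() as demo:
        chatbot = gr.Chatbot()
        box = gr.Textbox(placeholder="Ask about the documents...")
        clear = gr.Button("Reset chat")
        box.submit(respond, [box, chatbot], [box, chatbot])
        clear.click(reset, None, chatbot)
    return demo
```

Calling `build_ui(chain).launch()` would serve the interface locally; the reset button clears both the on-screen chat and the chain's session memory.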
The implemented Retrieval-Augmented Generation chatbot successfully demonstrated effective retrieval and generation capabilities over JSON document collections. Token-aware chunking enabled precise segmentation of large texts, facilitating accurate embedding and retrieval of relevant information. The conversational chain, integrated with the Grok API, provided coherent and contextually relevant answers across multiple turns by leveraging session memory.
User interactions via the Gradio interface were smooth, with low-latency responses and the ability to reset the chat context seamlessly. The system handled diverse queries about the loaded JSON documents robustly, validating the overall design for practical conversational AI applications. While currently limited to JSON inputs, the modular architecture positions the application for easy expansion to other document types and enhanced LLM integrations.