This project presents a Retrieval-Augmented Generation (RAG) chatbot that combines the capabilities of HuggingFace Transformers, FAISS vector search, and a modern Streamlit-based interface to deliver real-time, context-aware responses. Unlike traditional chatbots that rely solely on pre-trained models, this system augments generation with semantic retrieval from a custom knowledge base, making it suitable for diverse domains such as documentation assistance, education, customer support, and developer guidance.
One of the distinguishing features of this chatbot is its conversational memory, which retains context across multiple interactions within a session. This memory component enables more natural, coherent, and contextually aware conversations.
The chatbot integrates multiple components that work seamlessly to provide a high-quality user experience.
First, it uses a RAG pipeline to combine semantic retrieval with LLM-based generation. The system ingests and indexes large datasets or web-based content, which are then stored as vector embeddings using FAISS. By employing the sentence-transformers/all-MiniLM-L6-v2 model, it ensures accurate semantic similarity search across the knowledge base.
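At its core, semantic retrieval reduces to nearest-neighbour search over embedding vectors. The following is a minimal, dependency-light sketch of what the FAISS index does under the hood, using NumPy cosine similarity on toy vectors (in the real system, the vectors would come from the sentence-transformers/all-MiniLM-L6-v2 encoder and the search would be delegated to FAISS; the function name here is illustrative):

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k document vectors most similar to the query,
    ranked by cosine similarity (what a FAISS flat index computes at scale)."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity per document
    return np.argsort(-scores)[:k]      # highest-scoring indices first

# Toy example: three 3-d "embeddings" and a query closest to docs 0 and 2
docs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.9, 0.1, 0.0]])
top = cosine_top_k(np.array([1.0, 0.0, 0.0]), docs, k=2)
```

FAISS performs the same ranking over hundreds of thousands of vectors with optimized index structures, which is why it is used here instead of a brute-force loop.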
The persistent memory module enhances dialogue flow by maintaining a structured record of previous exchanges. This memory persists throughout the runtime of the chatbot, ensuring that answers are contextually consistent.
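A session-scoped memory of this kind can be sketched as a bounded buffer of (user, assistant) turns that is serialized into the prompt on each request. This is a simplified stand-in for what LangChain's memory utilities provide; the class and method names are illustrative, not the project's actual API:

```python
from collections import deque

class ConversationMemory:
    """Keeps the last `max_turns` exchanges of the current session and
    renders them as plain-text context for the next prompt."""

    def __init__(self, max_turns=10):
        self.turns = deque(maxlen=max_turns)  # old turns drop off automatically

    def add(self, user_msg, bot_msg):
        self.turns.append((user_msg, bot_msg))

    def as_context(self):
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)
```

Bounding the buffer keeps the prompt within the model's context window while still preserving recent dialogue history.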
Finally, the Streamlit-powered interface makes the chatbot easy to use, offering a clean and interactive environment where users can ask questions and receive answers in real time. The application is optimized to handle large documents through chunk-based text splitting, which prevents meta-tensor errors and supports scalable ingestion.
The solution is implemented using Python 3.10+ and leverages several modern AI/ML frameworks. LangChain provides tools for chaining prompts, managing memory, and orchestrating the RAG workflow. HuggingFace Transformers power the LLM capabilities, while FAISS serves as the high-performance vector database for similarity search. The frontend is built with Streamlit, which enables rapid deployment and a responsive web-based chat experience. Additionally, dotenv is used for secure environment variable management.
The architecture follows a structured pipeline. It begins with data ingestion, where either local documents or scraped web content are loaded through LangChain utilities. The content is then subjected to semantic text chunking, breaking larger documents into meaningful sections for more efficient retrieval.
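The chunking step can be illustrated with a minimal overlapping splitter. This is a simplified sketch of the behaviour of LangChain's text splitters (which additionally try to break on sentence and paragraph boundaries); the function name and defaults are illustrative:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks with a small overlap so that
    sentences cut at a boundary still appear intact in one chunk."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back to create the overlap
    return chunks
```

The overlap is the key design choice: without it, a fact straddling a chunk boundary could be unretrievable because neither chunk contains it whole.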
Once the text is segmented, the system generates dense embeddings, which are indexed and stored within FAISS. At query time, relevant document chunks are retrieved and passed to the RAG pipeline, which integrates them with the Transformer model for generating contextually accurate responses.
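The hand-off from retrieval to generation is essentially prompt assembly: the retrieved chunks are stitched into a grounded prompt that the Transformer model then completes. A minimal sketch (the template wording and function name are illustrative assumptions, not the project's exact prompt):

```python
def build_rag_prompt(question, retrieved_chunks):
    """Assemble a grounded prompt: numbered context passages first,
    then the user's question, so the model answers from the context."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the passages makes it easy to instruct the model to cite its sources, a common extension of this pattern.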
To enhance continuity, the chatbot employs a conversational memory layer, which keeps track of the dialogue history. The entire pipeline is accessible through a Streamlit-based interface, providing users with an intuitive and interactive experience.
This chatbot has a broad range of applications. For instance, it can function as a documentation assistant, helping users quickly navigate technical manuals or product guides. In a developer support setting, it can provide immediate answers about frameworks and tools, reducing dependency on static resources. As an educational tutor, it can engage students with subject-specific queries, and in the customer service domain, it can automate responses with contextual accuracy.
To run the chatbot, the repository can be cloned from GitHub. Dependencies are installed using requirements.txt, and environment variables are configured through a .env file. Once the knowledge base is ingested using ingest.py, the chatbot can be launched with Streamlit:
```bash
# Clone the repository
git clone https://github.com/narevignesh/RAG_TRANSFORMERS.git
cd RAG_TRANSFORMERS

# Install dependencies
pip install -r requirements.txt

# Configure environment variables (place these in .env)
GROQ_API_KEY=your_groq_key_here
LANGSMITH_API_KEY=your_langsmith_key_here

# Ingest data
python ingest.py

# Run chatbot
streamlit run app.py
```