# Local Document-Powered Chatbot (FAISS + Google Gemini)

A lightweight, local, knowledge-grounded chatbot that answers questions strictly based on your own documents, powered by LangChain, FAISS, and Google Gemini.
## Overview

This project implements a local Retrieval-Augmented Generation (RAG) chatbot that retrieves information from text documents stored in the `knowledge_base/` directory and generates answers using Google Gemini.
## Tech Stack

- **LangChain**: document loading, chunking, vector stores, prompt templates
- **Google Gemini**: embeddings + chat model
- **FAISS**: fast vector similarity search
- **Python CLI**: simple terminal-based interaction
## Features

- Document ingestion and preprocessing
- Text chunking with overlap
- Embedding generation using Gemini
- FAISS vector indexing and similarity search
- Prompt-controlled RAG generation
- Retrieval evaluation with similarity scores
- Responsible AI guardrails to reduce hallucinations
## Architecture

```text
knowledge_base/
      ↓
Document Loading
      ↓
Text Chunking (overlap)
      ↓
Gemini Embeddings
      ↓
FAISS Vector Index
      ↓
User Question
      ↓
Top-K Similarity Retrieval
      ↓
Prompt Template
      ↓
Gemini LLM Answer
```
## How It Works

### 1. Document Ingestion

- All `.txt` files in `knowledge_base/` are loaded.
- Each document is wrapped in a LangChain `Document` object with metadata, as sketched below.
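A minimal sketch of this step (the helper name is illustrative, not the project's exact code):

```python
from pathlib import Path

from langchain_core.documents import Document

def load_knowledge_base(directory: str = "knowledge_base") -> list[Document]:
    """Load every .txt file in the directory as a LangChain Document."""
    docs = []
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        # Keep the source file name as metadata for retrieval logging later.
        docs.append(Document(page_content=text, metadata={"source": path.stem}))
    return docs
```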
### 2. Text Chunking

- Documents are split using `RecursiveCharacterTextSplitter`.
- Chunking improves retrieval quality and context relevance.
- Controlled via `config.py`: `CHUNK_SIZE` and `CHUNK_OVERLAP` (see the sketch below).
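A minimal sketch of the splitting step, assuming the loader above and the values from `config.py`:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

CHUNK_SIZE = 800      # from config.py
CHUNK_OVERLAP = 200   # from config.py

splitter = RecursiveCharacterTextSplitter(
    chunk_size=CHUNK_SIZE,        # maximum characters per chunk
    chunk_overlap=CHUNK_OVERLAP,  # overlap preserves context across boundaries
)
chunks = splitter.split_documents(load_knowledge_base())
```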
### 3. Embedding Generation

- Each chunk is converted into a vector using `GoogleGenerativeAIEmbeddings`.
- The embedding model is configurable via `EMBEDDING_MODEL` in `config.py` (see the sketch below).
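A sketch, assuming the `langchain-google-genai` integration package and a `GOOGLE_API_KEY` set in the environment:

```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings

EMBEDDING_MODEL = "models/gemini-embedding-001"  # from config.py

embeddings = GoogleGenerativeAIEmbeddings(model=EMBEDDING_MODEL)
```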
### 4. Vector Storage

- Chunk embeddings are stored in a FAISS index for fast similarity search (see the sketch below).
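A sketch of building and persisting the index, reusing `chunks` and `embeddings` from the sketches above (the index path is illustrative):

```python
from langchain_community.vectorstores import FAISS

# Embed every chunk and build the FAISS index in one call.
vectorstore = FAISS.from_documents(chunks, embeddings)

# Optionally persist the index so it can be reloaded without re-embedding.
vectorstore.save_local("faiss_index")
```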
### 5. Query Processing

- The user question is embedded.
- FAISS retrieves the Top-K most relevant chunks.
- Similarity scores are logged for evaluation (see the sketch below).
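A sketch of retrieval with scores, reusing `vectorstore` from above and `TOP_K` from `config.py`:

```python
TOP_K = 5  # from config.py

query = "What is LangChain?"
# Each result is a (Document, score) pair. Note: with a default FAISS index
# the score is an L2 distance (lower = more similar); normalize it if you
# need a similarity instead.
results = vectorstore.similarity_search_with_score(query, k=TOP_K)
```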
### 6. Answer Generation

- Retrieved chunks are injected into a custom `PromptTemplate`.
- The Gemini LLM generates an answer strictly from the provided context.
- If no relevant information exists, the system refuses safely (see the sketch below).
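A minimal sketch of the generation step, reusing `results` and `query` from the retrieval sketch; the chat model name is an assumption, not necessarily the one the project ships with:

```python
from langchain_core.prompts import PromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

prompt = PromptTemplate.from_template(
    "You are a helpful assistant that answers questions strictly based on "
    "the context below.\n"
    "If the answer is not contained in the context, respond with:\n"
    '"I could not find the answer in the provided documents."\n\n'
    "Context:\n{context}\n\nQuestion:\n{question}\n\nAnswer:"
)

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # assumed model name

context = "\n\n".join(doc.page_content for doc, _ in results)
answer = (prompt | llm).invoke({"context": context, "question": query})
print(answer.content)
```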
## Responsible AI Guardrails

This project intentionally limits model behavior to reduce hallucinations:

- Answers are strictly grounded in retrieved documents.
- A custom prompt enforces refusal if the context does not contain the answer.
- Empty or invalid user input is rejected (see the sketch after this list).
- Only `.txt` files are ingested.
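A sketch of the input guardrail (illustrative, not the project's exact check):

```python
def validate_question(raw: str) -> str | None:
    """Reject empty or whitespace-only input before it reaches the pipeline."""
    question = raw.strip()
    return question or None
```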
Safe refusal example:
"I could not find the answer in the provided documents."
## ⚠️ Limitations

- No factual verification beyond the document content.
- Not suitable for medical, legal, or critical decision-making.
## Prompt Template

The system uses an explicit prompt template:

```text
You are a helpful assistant that answers questions strictly based on the context below.
If the answer is not contained in the context, respond with:
"I could not find the answer in the provided documents."

Context:
{context}

Question:
{question}

Answer:
```
This ensures:

- Grounded answers that minimize hallucinations
- Transparent refusal behavior
- Reviewer-friendly explainability
## Retrieval Logging

For every user query, the system logs:

- Top-K retrieved chunks
- Similarity scores
- Chunk source metadata
- Text snippets

Example output:

```text
[1] Source: doc_1 | Score: 0.8721
[2] Source: doc_2 | Score: 0.9345
```
This enables:

- Manual inspection of retrieval quality
- Debugging noisy or insufficient context
- Clear RAG transparency
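A sketch of how such a log line might be produced from the `results` in the retrieval sketch above (the exact format is illustrative):

```python
for rank, (doc, score) in enumerate(results, start=1):
    snippet = doc.page_content[:80].replace("\n", " ")
    print(f"[{rank}] Source: {doc.metadata.get('source')} | "
          f"Score: {score:.4f} | Snippet: {snippet}...")
```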
## Example Session

```text
You: What is LangChain?

AI: LangChain is a framework for developing applications using language models that enables chaining prompts, retrieval, and tools.

[1] Source: doc_1 | Score: 0.8721 | Snippet: LangChain is a framework designed to simplify LLM application development...
[2] Source: doc_2 | Score: 0.9345 | Snippet: FAISS is used for similarity search in vector databases...
```
## Project Structure

```text
LangChain-Docs-Bot/
├── main.py            # CLI chatbot interface
├── faiss_index.py     # RAG pipeline, FAISS, PromptTemplate
├── config.py          # Chunking, embedding, retrieval config
├── knowledge_base/    # Local .txt documents
│   ├── doc1.txt
│   ├── doc2.txt
│   └── doc3.txt
├── requirements.txt
├── README.md
└── LICENSE
```
## Configuration (`config.py`)

```python
EMBEDDING_MODEL = "models/gemini-embedding-001"
CHUNK_SIZE = 800
CHUNK_OVERLAP = 200
TOP_K = 5
```
## Pipeline Diagram

```text
+-----------------------+
|    knowledge_base/    |
|   (.txt documents)    |
+-----------+-----------+
            |
            v
+-----------------------+
|     Text Chunking     |
|   (size + overlap)    |
+-----------+-----------+
            |
            v
+-----------------------+
|   Gemini Embeddings   |
+-----------+-----------+
            |
            v
+-----------------------+
|      FAISS Index      |
+-----------+-----------+
            |
            v
+-------------------------------+
|  Top-K Similarity Retrieval   |
+-----------+-------------------+
            |
            v
+-------------------------------+
| Prompt Template + Gemini LLM  |
|   (Context-grounded answer)   |
+-------------------------------+
```
## Summary

This project provides a clean, explainable, and responsible RAG implementation, suitable for:

- Learning retrieval-augmented generation
- Demonstrating LangChain + FAISS integration
- Evaluating retrieval quality
- Building local, document-grounded chatbots