The RAG AI Assistant with Local Hosted Qwen Chat LLM is a command-line-based question-answering chatbot built using the principles of Retrieval-Augmented Generation (RAG). The assistant uses a locally hosted version of the Qwen-7B-Chat Large Language Model (LLM) to generate responses based on a knowledge base stored in text files.
The knowledge base consists of `.txt` files stored in the `texts` folder; their contents are embedded with the `all-MiniLM-L6-v2` embedding model and indexed for retrieval.

To set up and run the RAG AI Assistant, follow these steps:
1. Clone the repository:

```bash
git clone https://github.com/AndreyGermanov/langchain_qwen_chat_cli.git
cd langchain_qwen_chat_cli
```
2. Create and activate a virtual environment:

```bash
python -m venv myenv
source myenv/bin/activate  # On Windows: myenv\Scripts\activate
```
3. Install the dependencies:

```bash
pip install -r requirements.txt
```

This will install LangChain, HuggingFace Transformers, ChromaDB, and other required libraries.
4. Run the application:

```bash
python app.py
```
⚠️ Note: The first time you run the application, it will download the Qwen-7B-Chat model (~14 GB), which may take some time depending on your internet connection.
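On later runs the weights are reused from the local HuggingFace cache. As a rough sketch, loading a local Qwen-7B-Chat with Transformers typically looks like this (the repository's actual loading code may differ):

```python
# Hypothetical loading snippet; qwen_chat.py's actual code may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Downloads the weights to the local HuggingFace cache on first use,
# then reuses the cached copy on subsequent runs.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", trust_remote_code=True
).eval()
```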
Once the application starts, you'll see the prompt:

```
Enter query:
```
You can now type questions related to the content in the `texts` folder.
Enter query: How to start learning?
Response:
To start learning, you should sign up for a free account on Ready Tensor if you haven't already done so. Then, enroll in the program and navigate to the Certifications hub to request access. After your request is approved, you will have immediate access to program materials, including weekly lectures, reading materials, and project guidelines. You can also use the lectures, tools, or other resources you prefer to learn.
Enter query: What is an objective of the module 1 project?
Response:
The objective of the module 1 project is to build a question-answering assistant using core concepts of agent architectures, retrieval-augmented generation (RAG), and tool use.
Enter query: What should I deliver to complete the project?
Response:
The deliverable for the project is a simple RAG-based question-answering or document-assistant app. This means you should create an application that uses the RAG (Retrieve And Generate) system to answer questions or assist with documents.
Enter query: What is a due date of the module 1 project?
Response:
The due date for the module 1 project is June 13, 2025 at 11:59 pm UTC.
Enter query: Who is the president of the United States?
Response:
The given context does not contain an answer to your question. It's recommended to provide more context if you need further assistance.
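The refusal on that last, out-of-scope question typically comes from the instructions in the RAG prompt rather than from the model itself. A minimal sketch of such a template follows; the wording here is an assumption, not the prompt actually used in `app.py`:

```python
# Hypothetical prompt template; the exact instructions in app.py may differ.
from langchain_core.prompts import ChatPromptTemplate

rag_prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below. If the context does "
    "not contain the answer, say so rather than guessing.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

# Fill the template with the retrieved chunks and the user's query.
messages = rag_prompt.format_messages(
    context="(retrieved document chunks)",
    question="Who is the president of the United States?",
)
```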
To expand the assistant's capabilities:
- Add new `.txt` files to the `texts` folder.
- Restart the application (`python app.py`) to re-process the updated content.

Each new file will be embedded and indexed during the next run, allowing the assistant to answer questions about its contents.
`app.py`

This is the main application logic:
- Loads the `.txt` documents from the `texts` folder using `DirectoryLoader`.
- Creates embeddings with `HuggingFaceEmbeddings`.
- Stores and retrieves them through a `Chroma` vector store (see the sketch below).
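Putting those pieces together, a minimal sketch of the indexing and retrieval flow might look like the following; the variable names and parameters such as chunk sizes are assumptions, not the repository's exact code:

```python
# Hypothetical sketch of the app.py pipeline; details are assumptions.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every .txt file from the knowledge-base folder.
docs = DirectoryLoader("texts", glob="*.txt", loader_cls=TextLoader).load()

# Split long documents into overlapping chunks for better retrieval.
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(docs)

# Embed the chunks and index them in a local Chroma vector store.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = Chroma.from_documents(chunks, embeddings)

# At query time, fetch the most relevant chunks to use as context.
retriever = store.as_retriever(search_kwargs={"k": 3})
relevant_docs = retriever.invoke("How to start learning?")
```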
`qwen_chat.py`

Implements a wrapper around the Qwen-7B-Chat model to make it compatible with LangChain's `BaseChatModel`.
Key functions include:
- `_generate`: Processes input messages, formats them into a prompt, and generates output using the model.
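A minimal sketch of what such a wrapper can look like follows; the class name, fields, and the use of Qwen's `chat()` helper (available when the model is loaded with `trust_remote_code=True`) are assumptions, not the repository's exact code:

```python
# Hypothetical wrapper sketch; qwen_chat.py's actual implementation may differ.
from typing import Any, List, Optional

from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.messages import AIMessage, BaseMessage
from langchain_core.outputs import ChatGeneration, ChatResult


class QwenChat(BaseChatModel):
    """LangChain-compatible wrapper around a locally loaded Qwen-7B-Chat."""

    model: Any = None      # transformers model loaded with trust_remote_code=True
    tokenizer: Any = None  # matching tokenizer

    @property
    def _llm_type(self) -> str:
        return "qwen-chat"

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[Any] = None,
        **kwargs: Any,
    ) -> ChatResult:
        # Flatten the LangChain messages into a single prompt string.
        prompt = "\n".join(m.content for m in messages)
        # Qwen-7B-Chat exposes a chat() helper for single-turn generation.
        response, _history = self.model.chat(self.tokenizer, prompt, history=None)
        return ChatResult(
            generations=[ChatGeneration(message=AIMessage(content=response))]
        )
```

Subclassing `BaseChatModel` this way lets the local model plug into standard LangChain chains alongside the `Chroma` retriever.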
The RAG AI Assistant with Local Hosted Qwen Chat LLM is a powerful demonstration of how modern AI technologies can be combined to create intelligent, context-aware applications without relying on cloud services. By integrating local LLMs, vector databases, and RAG techniques, this project provides a flexible and extensible foundation for future AI development. Whether you're a student working on certification projects or a developer exploring agentic systems, this assistant offers valuable insights into building autonomous, knowledge-driven applications.