The RAG AI Assistant with Local Hosted Qwen Chat LLM is a command-line-based question-answering chatbot built using the principles of Retrieval-Augmented Generation (RAG). The assistant uses a locally hosted version of the Qwen-7B-Chat Large Language Model (LLM) to generate responses based on a knowledge base stored in text files.
The knowledge base consists of `.txt` files stored in the `texts` folder; their contents are embedded with the `all-MiniLM-L6-v2` embedding model and indexed for retrieval.

To set up and run the RAG AI Assistant, follow these steps:
1. Clone the repository:

```bash
git clone https://github.com/AndreyGermanov/langchain_qwen_chat_cli.git
cd langchain_qwen_chat_cli
```
2. Create and activate a virtual environment:

```bash
python -m venv myenv
source myenv/bin/activate  # On Windows: myenv\Scripts\activate
```
3. Install the dependencies:

```bash
pip install -r requirements.txt
```

This will install LangChain, HuggingFace Transformers, ChromaDB, and other required libraries.
4. Run the application:

```bash
python app.py
```
⚠️ Note: The first time you run the application, it will download the Qwen-7B-Chat model (~14 GB), which may take some time depending on your internet connection.
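On later runs the weights are reused from the local HuggingFace cache. As a rough sketch, loading a local Qwen-7B-Chat with Transformers typically looks like this (the repository's actual loading code may differ):

```python
# Hypothetical loading snippet; qwen_chat.py's actual code may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Downloads the weights to the local HuggingFace cache on first use,
# then reuses the cached copy on subsequent runs.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", trust_remote_code=True
).eval()
```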
Once the application starts, you'll see the prompt:

```
Enter query:
```
You can now type questions related to the content in the `texts` folder.
Enter query: How to start learning?
Response:
To start learning, you should sign up for a free account on Ready Tensor if you haven't already done so. Then, enroll in the program and navigate to the Certifications hub to request access. After your request is approved, you will have immediate access to program materials, including weekly lectures, reading materials, and project guidelines. You can also use the lectures, tools, or other resources you prefer to learn.
Enter query: What is an objective of the module 1 project?
Response:
The objective of the module 1 project is to build a question-answering assistant using core concepts of agent architectures, retrieval-augmented generation (RAG), and tool use.
Enter query: What should I deliver to complete the project?
Response:
The deliverable for the project is a simple RAG-based question-answering or document-assistant app. This means you should create an application that uses the RAG (Retrieve And Generate) system to answer questions or assist with documents.
Enter query: What is a due date of the module 1 project?
Response:
The due date for the module 1 project is June 13, 2025 at 11:59 pm UTC.
Enter query: Who is the president of the United States?
Response:
The given context does not contain an answer to your question. It's recommended to provide more context if you need further assistance.
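The refusal on that last, out-of-scope question typically comes from the instructions in the RAG prompt rather than from the model itself. A minimal sketch of such a template follows; the wording here is an assumption, not the prompt actually used in `app.py`:

```python
# Hypothetical prompt template; the exact instructions in app.py may differ.
from langchain_core.prompts import ChatPromptTemplate

rag_prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below. If the context does "
    "not contain the answer, say so rather than guessing.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

# Fill the template with the retrieved chunks and the user's query.
messages = rag_prompt.format_messages(
    context="(retrieved document chunks)",
    question="Who is the president of the United States?",
)
```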
To expand the assistant's capabilities:
- Add new `.txt` files to the `texts` folder.
- Restart the application (`python app.py`) to re-process the updated content.

Each new file will be embedded and indexed during the next run, allowing the assistant to answer questions about its contents.
`app.py`

This is the main application logic:
- Loads the `.txt` documents from the `texts` folder using `DirectoryLoader`.
- Creates embeddings with `HuggingFaceEmbeddings`.
- Stores and retrieves them through a `Chroma` vector store (see the sketch below).
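Putting those pieces together, a minimal sketch of the indexing and retrieval flow might look like the following; the variable names and parameters such as chunk sizes are assumptions, not the repository's exact code:

```python
# Hypothetical sketch of the app.py pipeline; details are assumptions.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every .txt file from the knowledge-base folder.
docs = DirectoryLoader("texts", glob="*.txt", loader_cls=TextLoader).load()

# Split long documents into overlapping chunks for better retrieval.
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(docs)

# Embed the chunks and index them in a local Chroma vector store.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = Chroma.from_documents(chunks, embeddings)

# At query time, fetch the most relevant chunks to use as context.
retriever = store.as_retriever(search_kwargs={"k": 3})
relevant_docs = retriever.invoke("How to start learning?")
```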
`qwen_chat.py`

Implements a wrapper around the Qwen-7B-Chat model to make it compatible with LangChain's `BaseChatModel`.
Key functions include:
- `_generate`: Processes input messages, formats them into a prompt, and generates output using the model.
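A minimal sketch of what such a wrapper can look like follows; the class name, fields, and the use of Qwen's `chat()` helper (available when the model is loaded with `trust_remote_code=True`) are assumptions, not the repository's exact code:

```python
# Hypothetical wrapper sketch; qwen_chat.py's actual implementation may differ.
from typing import Any, List, Optional

from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.messages import AIMessage, BaseMessage
from langchain_core.outputs import ChatGeneration, ChatResult


class QwenChat(BaseChatModel):
    """LangChain-compatible wrapper around a locally loaded Qwen-7B-Chat."""

    model: Any = None      # transformers model loaded with trust_remote_code=True
    tokenizer: Any = None  # matching tokenizer

    @property
    def _llm_type(self) -> str:
        return "qwen-chat"

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[Any] = None,
        **kwargs: Any,
    ) -> ChatResult:
        # Flatten the LangChain messages into a single prompt string.
        prompt = "\n".join(m.content for m in messages)
        # Qwen-7B-Chat exposes a chat() helper for single-turn generation.
        response, _history = self.model.chat(self.tokenizer, prompt, history=None)
        return ChatResult(
            generations=[ChatGeneration(message=AIMessage(content=response))]
        )
```

Subclassing `BaseChatModel` this way lets the local model plug into standard LangChain chains alongside the `Chroma` retriever.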
The RAG AI Assistant with Local Hosted Qwen Chat LLM is a powerful demonstration of how modern AI technologies can be combined to create intelligent, context-aware applications without relying on cloud services. By integrating local LLMs, vector databases, and RAG techniques, this project provides a flexible and extensible foundation for future AI development. Whether you're a student working on certification projects or a developer exploring agentic systems, this assistant offers valuable insights into building autonomous, knowledge-driven applications.