In my previous publication, I talked about how we could build a RAG system with open-source frameworks. Well, sit tight, because we are about to unveil something even more interesting!
We will build a multi-agent AI system with open-source tools. Not only that, we will plug our previous RAG system into this agent, making it more powerful and impactful. So, what are you waiting for? Let's get started...
The proliferation of large language models (LLMs) presents a significant opportunity to automate and improve customer support systems. However, a key challenge is integrating these powerful but generic models with a company's specific knowledge base and existing communication platforms. This paper introduces the Tigress AI Assistant, a novel, open-source chatbot framework designed to address this challenge. Our framework combines a flexible LangGraph workflow with a Retrieval-Augmented Generation (RAG) system and a custom-built supervisor module. The system integrates with the Matrix communication protocol to provide real-time, context-aware responses. We detail the architecture, including the dynamic query-type detection, a secure RAG node for knowledge retrieval, and a hierarchical supervisor that orchestrates responses based on user intent. The Tigress AI Assistant demonstrates a robust and scalable approach to building intelligent chatbots that can handle diverse queries, from technical support to sales and general inquiries, while ensuring data privacy and maintaining a consistent brand voice.
Modern customer support is increasingly shifting towards automated, conversational interfaces. While early rule-based chatbots were limited in their ability to handle complex queries, the emergence of advanced large language models (LLMs) offers unprecedented potential for more natural and effective interactions. However, a "one-size-fits-all" approach using a generic LLM is often insufficient. These models lack specific, up-to-date company information and may not adhere to strict business policies or privacy guidelines.
The Tigress AI Assistant is a hybrid framework that bridges this gap. It is a multi-agent AI bot built on the Matrix protocol that automatically responds to customer inquiries and complaints. The agent is reinforced with a RAG system to improve customer satisfaction and efficiency, and it uses LangChain and LangGraph for orchestration.
The framework is built upon the following core principles:
Modularity: The system's components (LLM interface, RAG system, prompt manager, and communication client) are decoupled, allowing for easy substitution and customization.
Intelligence: A multi-layered architecture, featuring a supervisor and dynamic routing, ensures queries are handled by the most appropriate agent or process.
Integration: The framework is specifically designed to work with the Matrix communication protocol, a decentralized and secure network ideal for enterprise applications.
Our work presents a solution that is not only effective but also transparent and extensible, providing a blueprint for companies looking to deploy sophisticated, LLM-powered assistants.
Many existing agentic AI projects are built on proprietary platforms like WhatsApp (via the Twilio API) or OpenAI (via ChatOpenAI), which can keep developers from exploring new skills for solving real-world problems. To address this, the Tigress AI Assistant relies on two key open-source systems: the Matrix protocol and the Hugging Face ecosystem (specifically meta-llama/Meta-Llama-3).
Matrix Protocol: Matrix is an open network for secure, decentralized communication. It supports end-to-end encryption for privacy and security. It includes features for chatting, voice and video calls, file sharing, and real-time collaboration. Matrix supports public and private rooms, spaces for organizing multiple rooms, and bridging to other messaging platforms. It also allows for bots and automation and provides synchronized history across multiple devices.
Hugging Face / Meta-Llama-3: Meta-Llama-3-8B-Instruct is an open-source LLM developed by Meta, designed for instruction-following tasks. It enables applications such as text generation, question answering, and summarization. It is available on Hugging Face with pre-trained weights and supports fine-tuning. The model can be used with Hugging Face embeddings for semantic search and retrieval tasks. Users can run the model locally or via compatible platforms, and it integrates with frameworks like PyTorch for easy deployment.
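To make that concrete, here is a minimal sketch of calling the model locally through the transformers pipeline. The loading options (device placement, token budget) are assumptions that depend on your hardware, and you must have accepted the model's license on Hugging Face first:

```python
# Minimal sketch: run Meta-Llama-3-8B-Instruct locally via transformers.
# Assumes a GPU with enough memory, the accelerate package installed,
# and access to the gated model on Hugging Face.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",  # place the model on available devices
)

result = generator(
    "Summarize the benefits of decentralized messaging in one sentence.",
    max_new_tokens=64,
)
print(result[0]["generated_text"])
```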
The Tigress AI Assistant's architecture is built around a directed graph, orchestrated using LangGraph. This allows for a stateful and flexible workflow, where the path taken to generate a response depends on the nature of the user's query.
The system is composed of several key modules:
Matrix Client: A custom Python class that handles all communication with the Matrix homeserver. It includes functions for logging in, sending messages, and continuously syncing to receive new messages from designated rooms. This acts as the external interface for the bot.
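As an illustration of those responsibilities (logging in, reacting to new messages, sending replies, syncing continuously), here is a minimal sketch using the open-source matrix-nio library. The homeserver, credentials, and reply logic are placeholders; the actual project uses its own custom class:

```python
# Minimal Matrix bot sketch using matrix-nio (pip install matrix-nio).
# Homeserver URL, user ID, and password below are placeholders.
import asyncio
from nio import AsyncClient, RoomMessageText

async def main():
    client = AsyncClient("https://matrix.example.org", "@tigress-bot:example.org")
    await client.login("BOT_PASSWORD")

    async def on_message(room, event):
        # Skip our own messages to avoid reply loops.
        if event.sender == client.user_id:
            return
        reply = f"Received: {event.body}"  # the real bot invokes the LangGraph workflow here
        await client.room_send(
            room.room_id,
            message_type="m.room.message",
            content={"msgtype": "m.text", "body": reply},
        )

    client.add_event_callback(on_message, RoomMessageText)
    await client.sync_forever(timeout=30000)  # keep polling the homeserver

asyncio.run(main())
```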
Prompt Manager: This module is responsible for centralizing and formatting all prompts sent to the LLM. It loads configurations from a YAML file, allowing for easy modification of the bot's persona (e.g., business name, location), response guidelines, and specialized instructions for different query types (e.g., sales, technical, complaints).
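The exact schema of the configuration file is not shown here, so the sketch below assumes a simple layout with a persona block and per-query-type guidelines; treat the field names as illustrative:

```python
# Hypothetical PromptManager sketch. The YAML schema assumed here
# (persona, query_types) is an illustration, not the project's actual
# prompt_config.yaml. Requires PyYAML (pip install pyyaml).
import yaml

class PromptManager:
    def __init__(self, path="prompt_config.yaml"):
        with open(path) as f:
            self.config = yaml.safe_load(f)

    def build_prompt(self, query_type, user_input, context=""):
        persona = self.config["persona"]
        guidelines = self.config["query_types"].get(query_type, "")
        return (
            f"You are the assistant for {persona['business_name']} "
            f"in {persona['location']}.\n"
            f"Guidelines: {guidelines}\n"
            f"Context: {context}\n"
            f"User: {user_input}\nAssistant:"
        )
```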
Document Loader & RAG System: The DocumentLoader ingests unstructured text files (.txt) from a local directory. The TigressTechRAG class then processes these documents using a RecursiveCharacterTextSplitter to create manageable chunks. It uses HuggingFaceEmbeddings with the sentence-transformers/all-MiniLM-L6-v2 model to create vector representations, which are then stored in a Chroma vector database. This allows the system to retrieve highly relevant context for a given user query.
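Here is a hedged sketch of that ingestion-and-retrieval pipeline. LangChain import paths vary across versions, so the package names below assume the current split packages, and the chunk sizes, directory name, and sample query are illustrative:

```python
# Sketch of the ingestion/retrieval pipeline described above.
# Assumes langchain-community, langchain-huggingface, and langchain-chroma
# are installed; chunk sizes and paths are illustrative.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

# Load every .txt file from a local knowledge directory.
docs = DirectoryLoader("knowledge/", glob="*.txt", loader_cls=TextLoader).load()

# Split into overlapping chunks so each fits comfortably in the prompt.
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(docs)

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_db")

# Retrieve the most relevant chunks for a user query.
hits = vectorstore.similarity_search("How do I reset my router?", k=3)
context = "\n\n".join(doc.page_content for doc in hits)
```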
Supervisor: This is the central decision-making component. It analyzes the user's input, detects the query type (e.g., technical, sales, complaint), and determines the optimal processing path within the LangGraph workflow. It can decide whether to perform a RAG lookup or handle the query directly.
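As a rough illustration of keyword-based detection, here is one way such a check could look. The category names come from the project; the specific keyword lists are assumptions:

```python
# Illustrative keyword-based query-type detection. Only the category
# names come from the article; the keyword lists are assumptions.
QUERY_KEYWORDS = {
    "technical": ["error", "bug", "install", "crash", "not working"],
    "sales": ["price", "cost", "buy", "plan", "subscription"],
    "complaint": ["complaint", "refund", "disappointed", "terrible"],
}

def detect_query_type(user_input: str) -> str:
    text = user_input.lower()
    for qtype, words in QUERY_KEYWORDS.items():
        if any(w in text for w in words):
            return qtype
    return "general"

def needs_rag(query_type: str) -> bool:
    # Complaints and reports bypass RAG and go straight to the supervisor.
    return query_type in {"general", "technical", "sales"}
```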
Hugging Face LLM Client: A simple wrapper huggingface_completion function that calls a local or remote Hugging Face model (meta-llama/Meta-Llama-3-8B-Instruct in this implementation). This provides a streamlined interface for text generation.
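One possible shape for that wrapper, here using huggingface_hub's InferenceClient against a remote endpoint; the project's actual function may instead load the model locally:

```python
# One possible huggingface_completion wrapper, using huggingface_hub's
# InferenceClient for a remote endpoint. This is a sketch, not the
# project's actual implementation.
from huggingface_hub import InferenceClient

_client = InferenceClient(model="meta-llama/Meta-Llama-3-8B-Instruct")

def huggingface_completion(prompt: str, max_tokens: int = 256) -> str:
    response = _client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content
```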
Custom Tools: The system also includes custom tools, such as a calculator and an appointment-scheduling tool.
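A hedged sketch of what such tools might look like with LangChain's @tool decorator; the function bodies are illustrative, not the project's actual implementations:

```python
# Illustrative custom tools; the bodies are stand-ins for the real logic.
from langchain_core.tools import tool

@tool
def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression, e.g. '12 * 4 + 3'."""
    # Restrict eval to arithmetic characters only.
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        return "Unsupported expression."
    return str(eval(expression))  # acceptable only because of the filter above

@tool
def schedule_appointment(date: str, time: str, topic: str) -> str:
    """Record an appointment request; a real deployment would call a calendar API."""
    return f"Appointment requested for {date} at {time} about: {topic}"
```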
The bot's operational logic is defined within a StateGraph, which manages the flow of information through a series of nodes (a LangGraph wiring sketch follows the node descriptions below):
input_node: Receives a message from the Matrix client and initializes the AgentState with the user's query and sender information.
detect_query_type: The PromptManager analyzes the user's user_input to categorize it into one of several predefined types, such as general, technical, sales, or complaint. It also determines if a RAG lookup is necessary based on simple keywords.
Conditional Edge: Based on the detect_query_type output, the graph routes the query to one of two paths:
direct_llm_path: For queries that require a knowledge lookup (e.g., general, sales, technical), the state is passed to the secure_rag node.
supervisor_path: For more sensitive or complex queries (e.g., complaint, report), the state is passed directly to the supervisor node, bypassing the RAG and direct LLM nodes. This allows for specialized handling, such as escalating the issue or providing a specific, non-generative response.
secure_rag: The RAG system queries the Chroma knowledge base to retrieve relevant context to ground the LLM's response.
llm_node: Formats a final prompt using the PromptManager, combining the system persona, conversation history, and the retrieved context. It then calls the huggingface_completion function to generate the final response text.
supervisor_node: This node is a direct, simplified path for handling specific, pre-defined query types. It can either generate a canned response or perform a specialized action before passing the state to the output.
output_node: The final response is prepared for transmission back to the user via the Matrix client.
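Putting the nodes together, here is a rough wiring sketch in LangGraph. The node functions below are stubs standing in for the modules described earlier; only the node names and routing mirror the workflow above:

```python
# Rough LangGraph wiring of the workflow above. Node bodies are stubs;
# the real implementations call the modules described earlier.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict, total=False):
    user_input: str
    sender: str
    query_type: str
    context: str
    response: str

def input_node(state: AgentState) -> AgentState:
    return state  # populated from the Matrix message in the real bot

def detect_query_type(state: AgentState) -> AgentState:
    state["query_type"] = "general"  # keyword detection in the real bot
    return state

def secure_rag(state: AgentState) -> AgentState:
    state["context"] = ""  # Chroma retrieval in the real bot
    return state

def llm_node(state: AgentState) -> AgentState:
    state["response"] = "..."  # huggingface_completion in the real bot
    return state

def supervisor_node(state: AgentState) -> AgentState:
    state["response"] = "Your complaint has been escalated to our team."
    return state

def output_node(state: AgentState) -> AgentState:
    return state  # the Matrix client sends state["response"]

def route(state: AgentState) -> str:
    # Sensitive categories skip RAG and go straight to the supervisor.
    if state["query_type"] in {"complaint", "report"}:
        return "supervisor_path"
    return "direct_llm_path"

graph = StateGraph(AgentState)
graph.add_node("input_node", input_node)
graph.add_node("detect_query_type", detect_query_type)
graph.add_node("secure_rag", secure_rag)
graph.add_node("llm_node", llm_node)
graph.add_node("supervisor_node", supervisor_node)
graph.add_node("output_node", output_node)

graph.set_entry_point("input_node")
graph.add_edge("input_node", "detect_query_type")
graph.add_conditional_edges(
    "detect_query_type",
    route,
    {"direct_llm_path": "secure_rag", "supervisor_path": "supervisor_node"},
)
graph.add_edge("secure_rag", "llm_node")
graph.add_edge("llm_node", "output_node")
graph.add_edge("supervisor_node", "output_node")
graph.add_edge("output_node", END)

app = graph.compile()
result = app.invoke({"user_input": "What plans do you offer?", "sender": "@alice:example.org"})
```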
The Tigress AI Assistant is implemented in Python, leveraging a number of open-source libraries, including transformers, langchain, langgraph, and requests. The system is configured to run continuously, listening for messages on a specified Matrix channel.
The separation of concerns between modules proved highly effective during development and testing. For instance, the prompt template and business information can be adjusted simply by editing the prompt_config.yaml file, without requiring any code changes. The ability to switch between a RAG-powered path and a supervisor-controlled path provides a critical layer of safety and control, ensuring that sensitive queries are not handled by a generic, unconstrained generative model.
While performance metrics such as response latency and accuracy are dependent on the underlying hardware and LLM used, the architectural framework itself demonstrates a robust and scalable approach. The use of a local vector database (Chroma) and a local Hugging Face model (meta-llama/Meta-Llama-3-8B-Instruct) allows for low-latency inference while keeping data in a secure, on-premise environment.
The Tigress AI Assistant framework offers a practical and powerful solution for deploying intelligent customer support chatbots. By combining a Retrieval-Augmented Generation (RAG) system, a prompt management module, and a hierarchical supervisor within a flexible LangGraph workflow, the system can handle a wide range of user queries with context-awareness and policy adherence. Its modular design and integration with the secure Matrix protocol make it a strong candidate for enterprise-level applications where data privacy and customizability are paramount. Future work will focus on integrating more sophisticated tools (e.g., scheduling APIs, ticketing systems) and implementing a human-in-the-loop mechanism for seamless escalation.
https://github.com/AhmadTigress/customer-s_support_agent/tree/main
This project is licensed under the MIT License.