The goal of the project is to build a robust generative search system capable of effectively and accurately answering questions from various insurance policy documents, using LlamaIndex to build the generative search application.
1. Background:
The HelpMate AI project is designed to develop a Retrieval-Augmented Generation (RAG)
system using LlamaIndex. The system aims to process insurance documents, extract relevant
information, and provide accurate responses to user queries.
2. Objectives:
The overall objective of the project is to build an effective generative AI system capable of accurately answering queries from policy documents. The specific goals are to:
❖ Develop a RAG system capable of processing insurance policy documents.
❖ Implement LlamaIndex to enable efficient document retrieval and summarization.
❖ Enhance user interaction through a chatbot interface.
❖ Build a robust testing pipeline to evaluate response accuracy.
❖ Integrate OpenAI's GPT-3.5-turbo LLM for improved response generation.
3. System Design:
3.1 Architecture overview:
The system is built on a multi-layered architecture, as depicted in the system diagram:
Step 1: Build the Vector Store
• Embeddings: Converts insurance policy documents into numerical vector
representations.
• Vector Database: Stores the embeddings for efficient similarity searches.
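To make Step 1 concrete, the snippet below is a minimal sketch of how policy chunks could be embedded and searched, assuming the all-MiniLM-L6-v2 Sentence Transformers model and a simple in-memory store; the chunk texts are hypothetical, and the actual project may use a different model or a dedicated vector database.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Embedding model assumed for illustration; the project may use another one.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical policy chunks standing in for the parsed document text.
chunks = [
    "The policy covers hospitalization expenses up to the sum insured.",
    "Claims must be intimated within 30 days of discharge.",
]

# Encode the chunks once; this matrix acts as the in-memory "vector database".
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

def vector_search(query: str, top_k: int = 3) -> list[tuple[float, str]]:
    """Return up to top_k chunks ranked by cosine similarity to the query."""
    query_vector = embedder.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vectors @ query_vector  # cosine similarity (vectors are normalized)
    order = np.argsort(scores)[::-1][:top_k]
    return [(float(scores[i]), chunks[i]) for i in order]

print(vector_search("How long do I have to report a claim?"))
```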
Step 2: Cache, Search, Re-rank
• Query Processing: A user submits a query, which is first checked against a cache.
• Index Search Cache: If the query exists in the cache, the system retrieves results quickly.
• Vector DB Search: If the query is not found in the cache, it searches the main vector
database for the closest document chunks.
• Re-ranking with Cross Encoders: The top-k retrieved documents are re-ranked using a
cross-encoder model to improve relevance.
Step 3: Generative Search
• Query + Prompt + Top 3 Documents are passed to an LLM (Large Language Model) for
response generation.
• The final response is returned to the user.
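The snippet below sketches Steps 2 and 3 together: a cache lookup, cross-encoder re-ranking of the retrieved chunks, and response generation with the LLM. The cross-encoder model name, the in-memory cache, and the prompt wording are illustrative assumptions rather than the project's exact configuration.

```python
from sentence_transformers import CrossEncoder
from openai import OpenAI

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
client = OpenAI()                   # reads OPENAI_API_KEY from the environment
query_cache: dict[str, str] = {}    # simple in-memory cache keyed by query text

def answer(query: str, retrieved_chunks: list[str]) -> str:
    """Answer a query from chunks retrieved in Step 1, with caching and re-ranking."""
    if query in query_cache:                      # cache hit: return stored answer
        return query_cache[query]
    # Re-rank the retrieved chunks with the cross-encoder and keep the top 3.
    scores = reranker.predict([(query, chunk) for chunk in retrieved_chunks])
    top3 = [c for _, c in sorted(zip(scores, retrieved_chunks), reverse=True)[:3]]
    prompt = (
        "Answer the question using only the policy excerpts below.\n\n"
        + "\n---\n".join(top3)
        + f"\n\nQuestion: {query}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    answer_text = response.choices[0].message.content
    query_cache[query] = answer_text              # store for future cache hits
    return answer_text
```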
3.2 Data Sources
The system processes various data sources, including:
• Internal Document Repositories: Policy documents stored in cloud storage.
• External Databases: APIs providing real-time policy updates.
• User Queries: Input provided by users for processing and retrieval.
3.3 Technologies Used
The project uses the following technologies:
• Development Platform: Google Colab
• Programming Language: Python (pandas, OpenAI API, LlamaIndex, LangChain)
• Cloud Storage: Google Drive
• LLM Hosting: OpenAI API for natural language generation
• Indexing & Retrieval: LlamaIndex with vector-based search
• Embedding Model: Sentence Transformers for vectorization
• Search Optimization: Cross-Encoders for document re-ranking
4. Implementation Details
Data Loading
• Documents are stored in Google Drive and accessed via a SimpleDirectoryReader.
• Data is cleaned, normalized, and indexed for efficient search operations.
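A minimal loading sketch, assuming a recent LlamaIndex release (where SimpleDirectoryReader is imported from llama_index.core) and a hypothetical Google Drive folder path:

```python
from llama_index.core import SimpleDirectoryReader

# Assumed mount location of the policy documents in Google Drive (illustrative path).
DOCS_PATH = "/content/drive/MyDrive/hdfc_policy_docs"

# Load every file in the folder into LlamaIndex Document objects.
documents = SimpleDirectoryReader(DOCS_PATH).load_data()
print(f"Loaded {len(documents)} document objects")
```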
Query Engine Development
• Documents are converted into structured vector nodes.
• VectorStoreIndex enables efficient similarity search.
• Queries are processed via a chatbot that fetches and displays relevant responses.
• A custom prompt template is implemented to enhance response quality.
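The sketch below illustrates this flow end to end, assuming llama_index.core imports and the documents loaded in the previous step; the chunking parameters and prompt wording are illustrative rather than the project's exact settings.

```python
from llama_index.core import VectorStoreIndex, PromptTemplate
from llama_index.core.node_parser import SentenceSplitter

# Parse documents into nodes (chunks), then build the vector index over them.
parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = parser.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes)

# Custom prompt template to keep answers concise and grounded in the context.
qa_prompt = PromptTemplate(
    "Context information is below.\n{context_str}\n"
    "Using only this context, answer the question concisely.\n"
    "Question: {query_str}\nAnswer: "
)

query_engine = index.as_query_engine(
    similarity_top_k=3,
    text_qa_template=qa_prompt,
)
response = query_engine.query("What is the grace period for premium payment?")
print(response)
```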
LLM Integration and Usage
• The system uses an OpenAI API key to access GPT models.
• API requests are securely managed via environment variables.
• OpenAI's GPT-3.5-turbo generates responses dynamically.
• Fine-tuned models are explored for domain-specific understanding.
• Prompt engineering techniques optimize response quality.
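A minimal configuration sketch for the LLM integration, assuming the llama-index-llms-openai integration package; the parameter values shown are illustrative.

```python
import os
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# The key is read from an environment variable rather than hard-coded in the notebook.
assert "OPENAI_API_KEY" in os.environ, "Set OPENAI_API_KEY before running"

Settings.llm = OpenAI(
    model="gpt-3.5-turbo",
    temperature=0.1,   # low temperature for factual, less creative answers
    max_tokens=256,    # cap response length to keep answers concise
)
```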
5. Evaluation and Testing
• A set of predefined questions is used to test accuracy.
• Responses are validated using a similarity score and user feedback.
• Continuous monitoring and logging ensure system reliability and improvements over time.
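A minimal sketch of such a testing loop, reusing the query_engine built earlier; the questions and the 0.7 review threshold are illustrative assumptions.

```python
# Hypothetical predefined test questions for the insurance policy documents.
test_queries = [
    "What is the waiting period for pre-existing diseases?",
    "How do I file a hospitalization claim?",
    "Is ambulance cover included in the policy?",
]

for q in test_queries:
    response = query_engine.query(q)
    # Each source node carries a retrieval similarity score we can log and monitor.
    top_score = response.source_nodes[0].score if response.source_nodes else None
    print(f"Q: {q}\nA: {response}\nTop retrieval score: {top_score}\n")
    if top_score is not None and top_score < 0.7:
        print("-> Flagged for human review (low retrieval similarity)\n")
```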
6. Challenges and Solutions
• Data Quality Issues: Inconsistent formatting in documents caused retrieval errors. Solution: implemented preprocessing techniques such as text normalization and metadata correction.
• Query Optimization: Search results were sometimes inconsistent or irrelevant. Solution: used custom embedding models and cross-encoder re-ranking to improve retrieval accuracy.
• Scalability Concerns: Large document sets slowed down processing and retrieval. Solution: applied chunk-based indexing and optimized retrieval strategies.
• LLM Response Accuracy: AI-generated responses were sometimes misleading or incorrect. Solution: fine-tuned OpenAI's LLM and improved prompt engineering techniques.
• Integration Challenges: Difficulty coordinating LlamaIndex, the OpenAI API, and vector search. Solution: designed a modular system for optimized API interactions.
• Testing & Evaluation: Needed to validate AI responses to avoid hallucinations. Solution: built an automated testing pipeline with human validation checkpoints.
7. Lessons Learned
• Proper document formatting significantly improves retrieval accuracy.
• Custom embedding models enhance search performance.
• User feedback is crucial for refining the chatbot’s performance.
• Cross-Encoders significantly improve the quality of retrieved documents.
• LLM prompt engineering improves response generation quality.
8. Future Enhancements
• Implementing HyDE or FLARE for improved document retrieval.
• Expanding chatbot capabilities for multi-turn conversations.
• Leveraging fine-tuned LLMs to improve response quality.
• Enhancing the system’s ability to process real-time policy updates from APIs.
9. Summary of Steps Followed in Creating the AI Assistant for the HDFC Policy Document Using the RAG Pipeline
The basic RAG pipeline in LlamaIndex is summarized in the steps below.
• We started by importing the necessary libraries for implementing LlamaIndex.
• Then we mounted our Google Drive to retrieve the data from that location and set the API key for authentication with the OpenAI platform.
• We checked whether the API was working properly by asking OpenAI a general query, which returned a very generic response. Our aim is to improve this response using the RAG pipeline.
• We imported some additional libraries and loaded the seven HDFC policy documents, counted the number of documents loaded, and ensured that the count matched.
• Next, we built the query engine. For this, the necessary libraries were imported, the parser was built, the documents were parsed into nodes, the index was built, and finally the query engine was built.
• Then we checked the response using the query engine and found it to be much more accurate compared to our earlier response (without the RAG pipeline).
• We checked the source node and metadata of the response.
• We extracted the file name and page number of the response.
• We also extracted the response score and found it to be 0.90, indicating that the query vector is highly similar to the retrieved document vector.
• Then we created a query-response pipeline, which takes a user input and returns a response.
• As a next step, we created a function for the user to initialize a conversation and get a response based on the query. The function also has an option to exit and terminate the conversation (a minimal sketch of this function appears after this list).
• Then we created a function to test our pipeline, along with a list of three queries. We provided feedback on the response to each of the three queries and, based on that feedback, created a customized template to fine-tune the responses (reducing detail and making the responses more concise).
• We imported the necessary libraries and fine-tuned a few parameters such as temperature, max_tokens, chunk size, chunk overlap, context window, and similarity_top_k to get a less detailed, more concise response.
• Finally, our customized prompt template is ready to take queries and provide accurate responses.
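For reference, a minimal sketch of the conversation function described above, reusing the query_engine built earlier; the exit keyword and the printed prompts are illustrative assumptions.

```python
def initialize_conversation() -> None:
    """Take user questions in a loop until the user types 'exit'."""
    print("Ask a question about the HDFC policy (type 'exit' to quit).")
    while True:
        user_query = input("You: ").strip()
        if user_query.lower() == "exit":
            print("Ending the conversation. Goodbye!")
            break
        response = query_engine.query(user_query)
        print(f"HelpMate AI: {response}\n")

# initialize_conversation()  # uncomment to start an interactive session
```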
10. Conclusion
The HelpMate AI RAG system successfully processed insurance documents and provided
accurate responses. The integration of OpenAI’s LLM, vector search, and document re-ranking
significantly enhanced the system’s ability to generate contextually accurate answers. Future
improvements will focus on refining retrieval mechanisms, expanding data sources, and
improving model efficiency.
Project submitted by
Vikhyat Negi