Finance Analyst AI is a Retrieval-Augmented Generation (RAG) chatbot that enables natural language question-answering over financial documents. Users upload a PDF annual report, and the system automatically processes, chunks, embeds, and stores the content in a persistent vector database. At query time, relevant document segments are retrieved and passed to a large language model (LLM) to generate grounded, document-faithful responses.
The system is built with LangChain and ChromaDB, uses Groq's LLaMA-3.1-8B-Instant for generation and the HuggingFace all-MiniLM-L6-v2 model for embeddings, and is served through a Streamlit-based interface. It incorporates conversational memory to support multi-turn dialogue and is explicitly designed to avoid hallucinations by restricting responses strictly to retrieved context.
The system follows a modular six-stage pipeline:
Users upload a PDF via the Streamlit interface. The document is processed using LangChain's PyPDFLoader to extract raw text.
Extracted text is split into chunks of 800 tokens with a 100-token overlap using RecursiveCharacterTextSplitter. This ensures contextual continuity across chunk boundaries.
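The overlap logic can be sketched without any dependencies. This is a simplified sliding-window splitter, not LangChain's RecursiveCharacterTextSplitter (which also splits on separator boundaries); lengths here are measured in characters for simplicity, while the pipeline above counts tokens.

```python
def split_with_overlap(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Sliding-window splitter: each chunk shares `overlap` characters
    with the previous one, so context carries across chunk boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = split_with_overlap("A" * 2000, chunk_size=800, overlap=100)
# Yields 3 chunks; the last 100 characters of each chunk reappear
# at the start of the next one.
```

Because adjacent chunks share a window, a sentence cut at a boundary is still seen whole in at least one chunk.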
Each chunk is embedded using the HuggingFace all-MiniLM-L6-v2 model to generate dense vector representations.
Embeddings are stored in a persistent ChromaDB vector database, enabling efficient similarity-based retrieval across sessions.
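To make the storage step concrete, here is a minimal JSON-persisted stand-in for the vector store; `TinyVectorStore` and its methods are illustrative inventions, not the ChromaDB API, which handles persistence and indexing itself.

```python
import json
from pathlib import Path

class TinyVectorStore:
    """Toy stand-in for a persistent vector store such as ChromaDB:
    records survive across sessions by being written to disk."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def add(self, chunk_id: str, text: str, embedding: list[float]) -> None:
        # Store the chunk text alongside its dense vector representation.
        self.records.append({"id": chunk_id, "text": text, "embedding": embedding})

    def save(self, path: str) -> None:
        Path(path).write_text(json.dumps(self.records))

    @classmethod
    def load(cls, path: str) -> "TinyVectorStore":
        store = cls()
        store.records = json.loads(Path(path).read_text())
        return store
```

Persisting the index is what lets an uploaded report be queried in later sessions without re-embedding it.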
At query time, the user query is embedded using the same model, and the top-k (k=5) most semantically similar chunks are retrieved.
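The retrieval step reduces to ranking stored vectors by cosine similarity against the query embedding. A dependency-free sketch (the function names are illustrative; ChromaDB performs this search internally):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the ids of the k chunks most semantically similar to the query."""
    ranked = sorted(chunk_vecs, key=lambda cid: cosine(query_vec, chunk_vecs[cid]), reverse=True)
    return ranked[:k]
```

Using the same embedding model for queries and chunks is what makes these similarities meaningful: both live in the same vector space.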
Retrieved context is injected into a structured YAML-based prompt that defines role, constraints, and output format. The Groq LLaMA-3.1-8B-Instant model generates responses grounded strictly in retrieved context.
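The shape of such a prompt can be illustrated as follows; the template wording below is a hypothetical example of the role/constraints/output-format structure, not the production prompt.

```python
# Illustrative YAML-style prompt template: role, constraints, and output
# format are fixed, while retrieved context and the question are injected.
PROMPT_TEMPLATE = """\
role: financial document analyst
constraints:
  - answer ONLY from the provided context
  - if the context is insufficient, say so instead of guessing
output_format: concise prose, with figures quoted verbatim
context: |
{context}
question: {question}
"""

def build_prompt(chunks: list[str], question: str) -> str:
    """Number each retrieved chunk and inject it into the template."""
    context = "\n".join(f"  [{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Keeping the grounding constraint inside the prompt itself is what steers the model away from answering from its parametric knowledge.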
Conversational continuity is maintained using ConversationSummaryMemory, which compresses prior interactions into a rolling summary.
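The idea behind summary memory can be sketched without LangChain: instead of storing the full transcript, each turn is folded into one rolling summary. The `summarize` callback stands in for the LLM call that ConversationSummaryMemory makes; the class below is an illustrative analog, not the library implementation.

```python
from typing import Callable

class RollingSummaryMemory:
    """Keeps a single running summary rather than the full history,
    mirroring the idea behind LangChain's ConversationSummaryMemory."""

    def __init__(self, summarize: Callable[[str, str], str]) -> None:
        # summarize(old_summary, new_turn) -> new_summary; in the real
        # system this is an LLM call that compresses the dialogue.
        self.summarize = summarize
        self.summary = ""

    def add_turn(self, user: str, assistant: str) -> None:
        turn = f"User: {user}\nAssistant: {assistant}"
        self.summary = self.summarize(self.summary, turn)
```

The payoff is a prompt whose memory footprint stays roughly constant no matter how long the conversation runs.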
To assess system performance, RAGAS-based evaluation metrics were implemented, focusing on retrieval quality and answer grounding.
The system follows a standard Retrieval-Augmented Generation pipeline integrating document ingestion, semantic retrieval, and grounded response generation.
The system was evaluated using both qualitative testing and quantitative metrics to assess retrieval and generation performance.
We implemented the RAGAS framework to evaluate the quality of the Retrieval-Augmented Generation pipeline.
The evaluation process follows a structured pipeline:
Query → Retrieved Context → LLM Response → Metric Evaluation (RAGAS)
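To give a flavor of what answer grounding measures, here is a crude lexical proxy. RAGAS itself computes faithfulness with an LLM judge over extracted claims; the token-overlap score below is purely illustrative.

```python
def grounding_overlap(answer: str, contexts: list[str]) -> float:
    """Crude lexical proxy for faithfulness: the fraction of answer
    tokens that also appear in the retrieved context. RAGAS replaces
    this with LLM-judged claim verification."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(" ".join(contexts).lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)
```

A score near 1.0 suggests the answer stays within the retrieved context; a low score flags content the retriever never supplied.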
Performance depends on the quality of PDF text extraction
Scanned or image-based documents degrade performance due to lack of OCR
Retrieval precision is relatively low, introducing irrelevant context in some cases
Single-document focus limits broader comparative analysis
This project demonstrates how Retrieval-Augmented Generation can be effectively applied to financial document analysis.
Automated Financial Insights: Enables users to query complex reports using natural language
Reduced Hallucination Risk: Ensures responses are grounded strictly in source documents
Decision Support: Helps analysts quickly extract key metrics and trends
Scalability: Architecture can be extended to multi-document and enterprise-scale systems
Beyond finance, this approach is applicable in domains requiring high factual accuracy, such as healthcare, legal analysis, and compliance systems.
Finance Analyst AI presents a practical implementation of a Retrieval-Augmented Generation system tailored for financial documents. By combining semantic retrieval, prompt engineering, and conversational memory, the system delivers grounded and context-aware responses.
While the system demonstrates strong retrieval recall and reliable grounding, improvements in retrieval precision and document handling can further enhance performance. Future work includes hybrid retrieval methods, re-ranking strategies, and support for multi-document querying.