Finance Analyst AI is a Retrieval-Augmented Generation (RAG) chatbot that enables natural language question-answering over financial documents. Users upload a PDF annual report, and the system automatically processes, chunks, embeds, and stores the content in a persistent vector database. At query time, relevant document segments are retrieved and passed to a large language model (LLM) to generate grounded, document-faithful responses.
The system is built with LangChain and ChromaDB, uses Groq's LLaMA-3.1-8B-Instant for generation and the HuggingFace all-MiniLM-L6-v2 model for embeddings, and is served through a Streamlit-based interface. It incorporates conversational memory to support multi-turn dialogue and is explicitly designed to avoid hallucinations by restricting responses strictly to retrieved context.
The system follows a modular six-stage pipeline:
Users upload a PDF via the Streamlit interface. The document is processed using LangChain's PyPDFLoader to extract raw text.
Extracted text is split into chunks of 800 tokens with a 100-token overlap using RecursiveCharacterTextSplitter. This ensures contextual continuity across chunk boundaries.
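The overlap logic can be sketched without any dependencies. This is a simplified sliding-window splitter, not LangChain's RecursiveCharacterTextSplitter (which also splits on separator boundaries); lengths here are measured in characters for simplicity, while the pipeline above counts tokens.

```python
def split_with_overlap(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Sliding-window splitter: each chunk shares `overlap` characters
    with the previous one, so context carries across chunk boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = split_with_overlap("A" * 2000, chunk_size=800, overlap=100)
# Yields 3 chunks; the last 100 characters of each chunk reappear
# at the start of the next one.
```

Because adjacent chunks share a window, a sentence cut at a boundary is still seen whole in at least one chunk.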
Each chunk is embedded using the HuggingFace all-MiniLM-L6-v2 model to generate dense vector representations.
Embeddings are stored in a persistent ChromaDB vector database, enabling efficient similarity-based retrieval across sessions.
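To make the storage step concrete, here is a minimal JSON-persisted stand-in for the vector store; `TinyVectorStore` and its methods are illustrative inventions, not the ChromaDB API, which handles persistence and indexing itself.

```python
import json
from pathlib import Path

class TinyVectorStore:
    """Toy stand-in for a persistent vector store such as ChromaDB:
    records survive across sessions by being written to disk."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def add(self, chunk_id: str, text: str, embedding: list[float]) -> None:
        # Store the chunk text alongside its dense vector representation.
        self.records.append({"id": chunk_id, "text": text, "embedding": embedding})

    def save(self, path: str) -> None:
        Path(path).write_text(json.dumps(self.records))

    @classmethod
    def load(cls, path: str) -> "TinyVectorStore":
        store = cls()
        store.records = json.loads(Path(path).read_text())
        return store
```

Persisting the index is what lets an uploaded report be queried in later sessions without re-embedding it.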
At query time, the user query is embedded using the same model, and the top-k (k=5) most semantically similar chunks are retrieved.
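The retrieval step reduces to ranking stored vectors by cosine similarity against the query embedding. A dependency-free sketch (the function names are illustrative; ChromaDB performs this search internally):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the ids of the k chunks most semantically similar to the query."""
    ranked = sorted(chunk_vecs, key=lambda cid: cosine(query_vec, chunk_vecs[cid]), reverse=True)
    return ranked[:k]
```

Using the same embedding model for queries and chunks is what makes these similarities meaningful: both live in the same vector space.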
Retrieved context is injected into a structured YAML-based prompt that defines role, constraints, and output format. The Groq LLaMA-3.1-8B-Instant model generates responses grounded strictly in retrieved context.
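The shape of such a prompt can be illustrated as follows; the template wording below is a hypothetical example of the role/constraints/output-format structure, not the production prompt.

```python
# Illustrative YAML-style prompt template: role, constraints, and output
# format are fixed, while retrieved context and the question are injected.
PROMPT_TEMPLATE = """\
role: financial document analyst
constraints:
  - answer ONLY from the provided context
  - if the context is insufficient, say so instead of guessing
output_format: concise prose, with figures quoted verbatim
context: |
{context}
question: {question}
"""

def build_prompt(chunks: list[str], question: str) -> str:
    """Number each retrieved chunk and inject it into the template."""
    context = "\n".join(f"  [{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Keeping the grounding constraint inside the prompt itself is what steers the model away from answering from its parametric knowledge.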
Conversational continuity is maintained using ConversationSummaryMemory, which compresses prior interactions into a rolling summary.
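The idea behind summary memory can be sketched without LangChain: instead of storing the full transcript, each turn is folded into one rolling summary. The `summarize` callback stands in for the LLM call that ConversationSummaryMemory makes; the class below is an illustrative analog, not the library implementation.

```python
from typing import Callable

class RollingSummaryMemory:
    """Keeps a single running summary rather than the full history,
    mirroring the idea behind LangChain's ConversationSummaryMemory."""

    def __init__(self, summarize: Callable[[str, str], str]) -> None:
        # summarize(old_summary, new_turn) -> new_summary; in the real
        # system this is an LLM call that compresses the dialogue.
        self.summarize = summarize
        self.summary = ""

    def add_turn(self, user: str, assistant: str) -> None:
        turn = f"User: {user}\nAssistant: {assistant}"
        self.summary = self.summarize(self.summary, turn)
```

The payoff is a prompt whose memory footprint stays roughly constant no matter how long the conversation runs.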
To assess system performance, RAGAS-based evaluation metrics were implemented, focusing on retrieval quality and answer grounding.
The system follows a standard Retrieval-Augmented Generation pipeline integrating document ingestion, semantic retrieval, and grounded response generation.
The system was evaluated using both qualitative testing and quantitative metrics to assess retrieval and generation performance.
We implemented the RAGAS framework to evaluate the quality of the Retrieval-Augmented Generation pipeline.
The evaluation process follows a structured pipeline:
Query → Retrieved Context → LLM Response → Metric Evaluation (RAGAS)
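To give a flavor of what answer grounding measures, here is a crude lexical proxy. RAGAS itself computes faithfulness with an LLM judge over extracted claims; the token-overlap score below is purely illustrative.

```python
def grounding_overlap(answer: str, contexts: list[str]) -> float:
    """Crude lexical proxy for faithfulness: the fraction of answer
    tokens that also appear in the retrieved context. RAGAS replaces
    this with LLM-judged claim verification."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(" ".join(contexts).lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)
```

A score near 1.0 suggests the answer stays within the retrieved context; a low score flags content the retriever never supplied.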
Performance depends on the quality of PDF text extraction
Scanned or image-based documents degrade performance due to lack of OCR
Retrieval precision is relatively low, introducing irrelevant context in some cases
Single-document focus limits broader comparative analysis
This project demonstrates how Retrieval-Augmented Generation can be effectively applied to financial document analysis.
Automated Financial Insights: Enables users to query complex reports using natural language
Reduced Hallucination Risk: Ensures responses are grounded strictly in source documents
Decision Support: Helps analysts quickly extract key metrics and trends
Scalability: Architecture can be extended to multi-document and enterprise-scale systems
Beyond finance, this approach is applicable in domains requiring high factual accuracy, such as healthcare, legal analysis, and compliance systems.
Finance Analyst AI presents a practical implementation of a Retrieval-Augmented Generation system tailored for financial documents. By combining semantic retrieval, prompt engineering, and conversational memory, the system delivers grounded and context-aware responses.
While the system demonstrates strong retrieval recall and reliable grounding, improvements in retrieval precision and document handling can further enhance performance. Future work includes hybrid retrieval methods, re-ranking strategies, and support for multi-document querying.