TL;DR: This publication documents an advanced dual-agent RAG system (v2.0) featuring Traditional Orchestrator (fast, single-step) and ReAct Agentic System (comprehensive, multi-step reasoning). Built from the ground up with dual cognitive approaches, the system processes complex insurance documents and provides users choice between speed-optimized and reasoning-optimized query execution. This represents a complete architectural evolution from the basic single-pipeline RAG system in v1.0.
Insurance documents pose unique challenges: complex tables, dense legal text, and cross-referenced content. This publication presents the evolution from a basic RAG pipeline (v1.0) to a sophisticated dual-agent architecture (v2.0) that intelligently handles both simple and complex insurance queries.
Key Innovation: Users choose between two execution systems: the Traditional Orchestrator for speed, or the ReAct Agentic System for multi-step reasoning.
Technical Highlights: 4 specialized agents, learning-enabled classifier, 35+ test cases, premium calculations with GST, policy comparisons, 79% code reduction through modularization.
Stack: Django 5.1 + Streamlit 1.40 + LangChain 0.3 + ChromaDB 0.5 + Azure OpenAI
v1.0 (RAG Expert): Single-pipeline document search
v2.0 (Agentic Module): Dual-agent architecture with intelligent routing
v1.0: User Query → Document Search → Return Results
v2.0: User Query → Choose System:
      ├─ Traditional (FAST) → One agent → Result
      └─ ReAct (COMPREHENSIVE) → Multi-step reasoning → Result
| Dimension | v1.0 | v2.0 | Impact |
|---|---|---|---|
| Architecture | Single pipeline | Dual-agent (Traditional + ReAct) | User choice optimization |
| Agents | 0 | 4 specialized | Domain expertise |
| Query Types | Search only | Search + Premium + Comparison | 3x capability |
| Speed Options | 3-8s (one option) | 3-5s (fast) / 5-15s (deep) | Flexible trade-offs |
| Tools/Query | 1 tool | 1 (Trad) / 3-5 (ReAct) | Dynamic chaining |
| Reasoning | ❌ Hidden | ✅ Transparent (ReAct) | Trust & debugging |
| Learning | Static | Pattern learning | Continuous improvement |
| Evaluation | 1 metric | 3D metrics | Enhanced quality |
| Testing | Basic | 35+ test cases | 7x coverage |
| Code Quality | Monolithic | 79% reduction | Maintainability |
Design Philosophy: "Not all queries are equal: simple questions deserve fast answers, complex ones deserve deep reasoning."
Traditional Orchestrator: single-step intent routing for fast, deterministic answers (3-5s).
ReAct Agentic System: iterative multi-step reasoning with dynamic tool chaining (5-15s).
💡 TIP: Users select the interface based on query complexity: Port 8502 (fast) or Port 8503 (comprehensive).
Single Query, Multiple Steps:
Query: "Calculate premium for age 35, compare with ActivFit, recommend cheaper"
Iteration 1 (2.1s): Calculate ActivAssure premium → ₹15,000
Iteration 2 (2.8s): Retrieve ActivFit premium → ₹12,000
Iteration 3 (0.9s): Compare features → Side-by-side analysis
Iteration 4 (0.3s): Finish → "ActivFit saves ₹3,000/year"
Total: 4 iterations, 3 tools, 8.7 seconds
vs Traditional: Would require 3 separate queries + manual analysis
Capabilities:
Example:
Input: "2 adults aged 35 and 42, 1 child aged 8, 10L cover"
Output: Base ₹18,500 + GST ₹3,330 = Total ₹21,830
Speed: 3.2 seconds
Traditional System:
ReAct System:
Features:
Example Output:
| Feature | ActivFit | ActivAssure |
|---|---|---|
| Premium (35) | ₹12,000 | ₹15,000 |
| Waiting Period | 30 days | 90 days |
| Room Rent | 1% SI | 2% SI |
Recommendation: ActivFit - ₹3,000/year savings
Insurance documents are uniquely difficult to process:
Technical Complexity:
Business Requirements:
⚠️ CAUTION: Errors in insurance documents can lead to compliance violations, financial losses, and customer dissatisfaction, demanding robust validation.
| Approach | Limitation |
|---|---|
| Manual Processing | Hours per document, not scalable |
| Simple OCR | Misses semantic relationships |
| Rule-Based Systems | Brittle, high maintenance |
| Generic RAG | Poor table handling, no domain expertise |
v2.0 addresses these challenges with:
✅ Two Query Systems: Traditional (fast) + ReAct (comprehensive)
✅ Semantic Chunking: Embedding-based segmentation (0.75 threshold)
✅ Table Intelligence: Multi-page merging with header matching
✅ 4 Specialized Agents: Orchestrator, Retrieval, Premium, Comparison
✅ 9 Tools: 4 built-in (ChromaDB, OpenAI) + 5 custom (Excel, PDF, chunker, evaluator, classifier)
✅ Human-in-the-Loop: Manual validation at critical points
✅ 3D Evaluation: Coverage + similarity + diversity metrics
Technology Stack: Django 5.1 + Streamlit 1.40 + LangChain 0.3 + ChromaDB 0.5 + Azure OpenAI
The v2.0 architecture provides two independent query execution paths, each optimized for different complexity levels:
```
                     User Query
                         │
             ┌───────────▼───────────┐
             │     Choose System     │
             │ Traditional or ReAct? │
             └───────┬───────┬───────┘
                     │       │
         ┌───────────┘       └───────────┐
         │                               │
┌────────▼────────┐             ┌────────▼────────┐
│   TRADITIONAL   │             │  REACT AGENTIC  │
│  ORCHESTRATOR   │             │     SYSTEM      │
│   (Port 8502)   │             │   (Port 8503)   │
├─────────────────┤             ├─────────────────┤
│ • 3-5 seconds   │             │ • 5-15 seconds  │
│ • One agent     │             │ • Multi-tool    │
│ • Deterministic │             │ • Adaptive      │
└────────┬────────┘             └────────┬────────┘
         │                               │
         └───────────────┬───────────────┘
                         │
             ┌───────────▼───────────┐
             │     4 Specialized     │
             │    Agents (Shared)    │
             ├───────────────────────┤
             │ • Orchestrator        │
             │ • Retrieval           │
             │ • Premium Calculator  │
             │ • Comparison          │
             └───────────────────────┘
                         │
             ┌───────────▼───────────┐
             │  Services & Storage   │
             ├───────────────────────┤
             │ • ChromaDB (Vectors)  │
             │ • Azure OpenAI        │
             │ • Django REST API     │
             └───────────────────────┘
```
| Need | System | Speed | Complexity |
|---|---|---|---|
| Quick answer | Traditional | 3-5s | Single-step |
| Deep analysis | ReAct | 5-15s | Multi-step |
Example Queries:
Traditional: "What is waiting period?" / "Calculate premium for age 35"
ReAct: "Calculate premium for age 35, compare with ActivFit, and recommend cheaper option"
| Component | Version | Purpose |
|---|---|---|
| Django | 5.1.4 | Backend API + ORM |
| Django REST | 3.15.2 | RESTful endpoints |
| Streamlit | 1.40.2 | Interactive UIs |
| Component | Version | Purpose |
|---|---|---|
| LangChain | 0.3.27 | Agent orchestration |
| ChromaDB | 0.5.23 | Vector storage |
| Azure OpenAI | text-embedding-ada-002 | Embeddings (1536D) |
| Azure OpenAI | gpt-35-turbo | Chat completion |
| Scikit-learn | 1.5.2 | Semantic chunking |
| Component | Version | Purpose |
|---|---|---|
| PDFPlumber | 0.11.4 | Table extraction |
| Pandas | 2.2.3 | Data manipulation |
💡 TIP: The stack balances cutting-edge AI with production stability; all dependencies are actively maintained with security updates.
Design: Single-step intelligent routing to specialized agents.
Query → Intent Classifier → Agent Selection → Execute → Return
(Sub-10ms classification, one agent, 3-5s end to end)
Method: Pattern matching + keyword detection
Categories: retrieval, premium_calculation, comparison, general
```python
def detect_intent(query: str) -> str:
    query_lower = query.lower()
    if any(kw in query_lower for kw in ['premium', 'cost', 'calculate']):
        return 'premium_calculation'
    elif any(kw in query_lower for kw in ['compare', 'versus', 'vs']):
        return 'comparison'
    return 'retrieval'  # Default
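For context, a minimal dispatch sketch shows how the detected intent selects one agent; the `Orchestrator` class and `handle` method here are illustrative assumptions, not the repository's exact interface.

```python
# Illustrative sketch: map detected intent to one specialized agent
# and execute in a single pass (class and method names are hypothetical).
class Orchestrator:
    def __init__(self, retrieval_agent, premium_agent, comparison_agent):
        self.agents = {
            'retrieval': retrieval_agent,
            'premium_calculation': premium_agent,
            'comparison': comparison_agent,
        }

    def run(self, query: str) -> dict:
        intent = detect_intent(query)  # classifier shown above
        agent = self.agents.get(intent, self.agents['retrieval'])
        return agent.handle(query)     # one agent, one step, no iteration
```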
Query: "Calculate premium for 2 adults aged 35 and 40 with 5L cover"
Step 1: Classify → PREMIUM_CALCULATION (8ms)
Step 2: Route → Premium Calculator Agent
Step 3: Extract → {ages: [35, 40], sum_insured: 500000}
Step 4: Calculate → Age band 36-45 (eldest member), Family floater, GST 18%
Step 5: Return → "₹45,000 (Base: ₹38,135 + GST: ₹6,865)"
Total: 3.2 seconds
Purpose: Semantic search across document corpus
Features:
Example:
```python
retriever.retrieve(
    query="What is waiting period?",
    k=5,
    doc_type="policy",
    exclude_types=["brochure"],
)
```
Purpose: Insurance premium calculations
Capabilities:
Supported Configurations:
Age Bands: 18-35, 36-45, 46-55, 56-60, 61-65, 66-70, 71-75, 76-80
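A small sketch of the band lookup this implies; `find_age_band` is a hypothetical helper using the boundaries listed above, not the repository's API.

```python
# Map an age to its rate band (boundaries from the list above).
AGE_BANDS = [(18, 35), (36, 45), (46, 55), (56, 60),
             (61, 65), (66, 70), (71, 75), (76, 80)]

def find_age_band(age: int) -> str:
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"Age {age} outside supported bands (18-80)")

print(find_age_band(42))  # -> "36-45"
```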
Purpose: Multi-policy analysis
Process:
Output Format:
| Feature | Product A | Product B |
|---|---|---|
| Premium | ₹12,000 | ₹15,000 |
| Coverage | Details | Details |
POST /agents/query/
Request:
{ "query": "Calculate premium for age 35", "chroma_db_dir": "media/output/chroma_db/ActivAssure", "k": 5, "conversation_id": "user_123" }
Response:
{ "query": "Calculate premium for age 35", "response": "Annual premium: βΉ15,000", "agent_type": "premium_calculation", "confidence": 0.92, "execution_time": 3.2, "sources": [{"content": "...", "page": 5}] }
Design: Iterative reasoning loop with dynamic tool selection (max 10 iterations).
Query → ReAct Loop:
  ├─ THOUGHT: Analyze situation
  ├─ ACTION: Select tool + execute
  ├─ OBSERVATION: Process result
  └─ Repeat until FINISH
Query: "Calculate premium for age 35, compare with ActivFit, recommend cheaper"
Iteration 1 (2.1s):
  THOUGHT: "Need to calculate ActivAssure premium first"
  ACTION: premium_calculator(age=35, sum_insured=500000)
  OBSERVATION: "Premium: ₹15,000"

Iteration 2 (2.8s):
  THOUGHT: "Need ActivFit premium for comparison"
  ACTION: document_retriever(query="ActivFit premium age 35", k=3)
  OBSERVATION: "ActivFit: ₹12,000 for Individual age 35"

Iteration 3 (0.9s):
  THOUGHT: "Need feature comparison"
  ACTION: product_comparator(products=["ActivAssure", "ActivFit"])
  OBSERVATION: "Comparison table retrieved"

Iteration 4 (0.3s):
  THOUGHT: "Have all info, can provide final answer"
  ACTION: finish

✅ ANSWER: "ActivAssure: ₹15,000. ActivFit: ₹12,000 (₹3,000 cheaper).
Recommendation: ActivFit offers better value; saves ₹3,000/year."
Metadata:
| Tool | Purpose | Parameters |
|---|---|---|
| `document_retriever` | Search documents | query, k=5, doc_type |
| `premium_calculator` | Calculate premiums | age, sum_insured, config |
| `product_comparator` | Compare products | products[], criteria |
| `excel_query` | Query Excel data | query, workbook |
| `finish` | Complete reasoning | final_answer |
POST /agents/agentic/query/
Request:
{ "query": "Calculate for age 35, compare with ActivFit", "chroma_db_dir": "media/output/chroma_db/ActivAssure", "k": 5 }
Response:
{ "query": "...", "final_answer": "ActivAssure: βΉ15,000. ActivFit: βΉ12,000...", "reasoning_trace": [ { "iteration": 1, "thought": "...", "action": "premium_calculator", "observation": "...", "execution_time": 2.1 } ], "total_iterations": 4, "tools_used": ["premium_calculator", "document_retriever"], "total_execution_time": 8.7 }
Advantages:
Trade-offs:
💡 TIP: Use Traditional for 80% of queries (fast), ReAct for the 20% requiring deep analysis.
1. Semantic Search: query embedding via `text-embedding-ada-002`
2. Document Type Filtering: ChromaDB `where` clause
3. Context Assembly
4. LLM Response Generation (`gpt-35-turbo`)

```python
class RetrievalAgent:
    def retrieve(self, query: str, k: int = 5, doc_type: str = None) -> dict:
        """Semantic retrieval with optional document type filtering."""
        # Generate query embedding
        query_embedding = self.embedding_model.embed_query(query)

        # Build ChromaDB query with filtering
        query_params = {"query_embeddings": [query_embedding], "n_results": k}
        if doc_type and doc_type != "all":
            query_params["where"] = {"doc_type": doc_type}

        # Execute search and generate LLM response
        results = self.collection.query(**query_params)
        context = self._build_context(results)
        answer = self.llm.invoke(self._format_prompt(context, query))
        return {"answer": answer, "sources": self._extract_sources(results)}
```
Performance Characteristics:
Full Implementation: `backend/agents/retrieval_agent.py` includes context assembly, deduplication, and comprehensive error handling.
TIP: For frequently asked questions, implement a caching layer that stores query embeddings and responses. This can reduce response time by 70% for cache hits.
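A minimal sketch of such a cache, assuming exact-match keys on normalized query text; a production version might match on embedding similarity instead. The `ResponseCache` class is hypothetical.

```python
import hashlib

# Hypothetical exact-match cache keyed by normalized query text.
class ResponseCache:
    def __init__(self):
        self._store = {}

    def _key(self, query: str) -> str:
        return hashlib.sha256(query.lower().strip().encode()).hexdigest()

    def get(self, query: str):
        return self._store.get(self._key(query))

    def put(self, query: str, response: dict) -> None:
        self._store[self._key(query)] = response

cache = ResponseCache()
if (hit := cache.get("What is waiting period?")) is None:
    result = {"answer": "..."}  # fall through to full retrieval
    cache.put("What is waiting period?", result)
```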
The Premium Calculator Agent is a domain-specific agent that performs insurance premium calculations based on policy workbooks. This agent demonstrates the power of specialized agents in multi-agent architectures.
1. Mixed Age Format Support
2. Policy Type Handling
3. Excel Workbook Registry
```python
class PremiumCalculatorAgent:
    def calculate(self, query: str, context: dict) -> dict:
        """Calculate insurance premium from natural language query."""
        # Extract parameters: policy_name, adults, children, sum_insured
        params = self._extract_parameters(query)

        # Load policy workbook and detect format
        workbook = self._load_workbook(params['policy_name'])
        age_format = self._detect_age_format(workbook)  # 'exact' or 'age_band'

        # Calculate based on format
        premium = (self._calculate_exact_age(params, workbook)
                   if age_format == 'exact'
                   else self._calculate_age_band(params, workbook))

        # Add GST and format response
        gst, total = premium * 0.18, premium * 1.18
        return {
            "answer": self._format_answer(params, premium, gst, total),
            "calculation": {"gross_premium": premium, "gst": gst, "total": total},
        }
```
Key Features:
Full Implementation: See `backend/agents/calculators/` for the Excel workbook registry, age band mapping, and discount calculations.
Query: "Calculate premium for ActivAssure with 2 adults aged 32 and 45, 1 child aged 8, sum insured 5 lakhs"
Response:
**Premium Calculation for ActivAssure**
**Family Composition:** 2 Adult(s) + 1 Child(ren)
**Sum Insured:** ₹5,00,000
**Age Band:** 31-35 (adult 1), 46-50 (adult 2), 6-10 (child)
**Premium Breakdown:**
- Gross Premium: ₹16,563.00
- GST (18%): ₹2,981.34
- **Total Premium: ₹19,544.34**
All premiums are annual and include applicable taxes.
CAUTION: Premium calculations are estimates based on available policy workbooks. Always verify final premiums with official insurance provider documentation and account for rider options, medical conditions, and other factors not captured in base calculations.
The Comparison Agent enables side-by-side analysis of multiple insurance policies, helping users make informed decisions.
1. Multi-Policy Retrieval
2. Feature Extraction
3. Structured Output
```python
class ComparisonAgent:
    def compare(self, query: str, context: dict) -> dict:
        """Compare multiple insurance policies side-by-side."""
        policies = self._extract_policy_names(query)

        # Retrieve key information for each policy
        policy_data = {
            policy: self._retrieve_policy_info(policy) for policy in policies
        }

        # Generate structured comparison
        comparison = self._generate_comparison_table(policy_data)
        answer = self._format_comparison(comparison)
        return {"answer": answer, "comparison_data": comparison}
```
Comparison Features:
Extension Opportunity: Integrate with the Premium Calculator to show cost comparisons for the same family composition across policies, providing a complete cost-benefit analysis.
The multi-agent system can handle complex queries requiring multiple agents:
Example: a complex query such as "Calculate premium for age 35, compare with ActivFit, and recommend the cheaper option" chains the Premium Calculator, Retrieval, and Comparison agents in sequence.
This sophisticated coordination enables the system to handle real-world insurance queries that often involve multiple steps and decision points.
The ReAct (Reasoning + Acting) Agentic System represents an advanced query execution paradigm that enables complex, multi-step reasoning through an iterative Thought→Action→Observation loop. Unlike the traditional orchestrator's single-step routing, ReAct dynamically chains multiple tools based on intermediate results, making it ideal for complex insurance queries that require sequential decision-making.
ReAct Philosophy: Instead of directly answering a query, the agent reasons about what actions to take, observes the results, and iteratively refines its approach until reaching a comprehensive answer.
Key Components:
```
┌────────────────────────────────────────────────────────┐
│                  ReAct Iterative Loop                  │
│                (Maximum 10 iterations)                 │
└────────────────────────────┬───────────────────────────┘
                             │
                             ▼
                 ┌───────────────────────────┐
                 │        Iteration N        │
                 ├───────────────────────────┤
                 │ 1. THOUGHT                │
                 │    - Analyze state        │
                 │    - Plan next action     │
                 │    - Consider context     │
                 ├───────────────────────────┤
                 │ 2. ACTION                 │
                 │    - Select tool          │
                 │    - Format input         │
                 │    - Execute              │
                 ├───────────────────────────┤
                 │ 3. OBSERVATION            │
                 │    - Receive result       │
                 │    - Update context       │
                 │    - Check if done        │
                 └─────────────┬─────────────┘
                               │
                 ├── Continue? ──► Next Iteration
                 │
                 └── Done? ──────► Final Answer
```
```python
# backend/agents/agentic/agentic_system.py
from typing import Dict

class AgenticSystem:
    def __init__(self, llm, calculator, comparator, retriever):
        """Initialize ReAct-based system."""
        # Learning classifier for pattern recognition
        self.classifier = LearningIntentClassifier(llm)

        # Create ReAct tool wrappers
        self.react_tools = {
            'premium_calculator': PremiumCalculatorTool(calculator),
            'policy_comparator': PolicyComparatorTool(comparator),
            'document_retriever': DocumentRetrieverTool(retriever),
        }

        # ReAct agent (primary execution engine)
        self.react_agent = ReActAgent(llm, self.react_tools)

    def process_query(self, query: str, context: Dict) -> Dict:
        """Process query using ReAct iterative reasoning."""
        # Run ReAct loop for dynamic execution
        react_result = self.react_agent.run(query, context, max_iterations=10)

        # Classify intent for learning
        classification = self.classifier.classify(query, context)

        # Learn from execution
        inferred_intent = self._infer_intent_from_react(react_result)
        self.classifier.learn_from_feedback(
            query, classification['intent'], inferred_intent, context
        )

        return {
            'mode': 'react',
            'reasoning_trace': react_result['reasoning_trace'],
            'final_answer': react_result['final_answer'],
            'success': react_result['success'],
            'agentic_metadata': {
                'reasoning_iterations': react_result['iterations'],
                'tools_used': react_result['tools_used'],
                'learning_applied': True,
                'react_enabled': True,
            },
        }
```
```python
# backend/agents/agentic/react_agent.py
from typing import Dict

class ReActAgent:
    def run(self, query: str, context: Dict, max_iterations: int = 10) -> Dict:
        """Execute ReAct loop."""
        trace = ReActTrace(query=query)

        while trace.current_iteration < max_iterations:
            # Step 1: Generate thought and decide action
            thought, action, action_input = self._generate_step(trace, context)
            trace.add_step(ReActStep(
                step_type=ReActStepType.THOUGHT,
                content=thought,
            ))

            # Check if finished
            if action == "finish":
                trace.final_answer = action_input.get('answer', '')
                trace.success = True
                break

            # Step 2: Execute action (use tool)
            observation = self._execute_action(action, action_input, context)
            trace.add_step(ReActStep(
                step_type=ReActStepType.ACTION,
                content=f"{action}({action_input})",
                tool_used=action,
            ))

            # Step 3: Record observation
            trace.add_step(ReActStep(
                step_type=ReActStepType.OBSERVATION,
                content=str(observation)[:500],
                tool_output=observation,
            ))

        return trace.to_dict()

    def _generate_step(self, trace, context):
        """Use LLM to generate next reasoning step."""
        prompt = self._build_react_prompt(trace, context)
        response = self.llm.invoke(prompt)
        # Parse LLM output to extract:
        #   Thought: "I need to calculate premium first"
        #   Action: premium_calculator
        #   Action Input: {"age": 35, "sum_insured": 500000}
        return self._parse_llm_response(response.content)
```
The ReAct system includes a learning component that improves intent classification over time by analyzing which tools were actually used during execution.
```python
# backend/agents/agentic/intent_learner.py
from typing import Dict

class LearningIntentClassifier:
    def __init__(self, llm):
        self.llm = llm
        self.execution_patterns = []  # Historical execution data
        self.pattern_cache = {}       # Cached patterns for fast lookup

    def classify(self, query: str, context: Dict) -> Dict:
        """Classify intent using LLM + learned patterns."""
        # Check pattern cache first
        if cached_intent := self._check_cache(query):
            return {'intent': cached_intent, 'confidence': 0.9, 'source': 'cache'}

        # Use LLM for classification
        prompt = f"""
        Based on historical patterns, classify this insurance query:
        Query: {query}
        Intent options: PREMIUM_CALCULATION, DOCUMENT_RETRIEVAL,
                        POLICY_COMPARISON, COMPLEX_QUERY
        """
        response = self.llm.invoke(prompt)
        return {'intent': response.content.strip(), 'confidence': 0.7, 'source': 'llm'}

    def learn_from_feedback(self, query: str, predicted: str, actual: str, context: Dict):
        """Learn from execution results."""
        self.execution_patterns.append({
            'query': query,
            'predicted_intent': predicted,
            'actual_intent': actual,
            'match': predicted == actual,
            'timestamp': context.get('timestamp'),
        })
        # Update pattern cache
        self._update_pattern_cache(query, actual)
```
Each specialized agent is wrapped as a ReAct tool with a standardized interface:
```python
# backend/agents/agentic/react_tools.py
import json
from typing import Dict

class PremiumCalculatorTool(ReActTool):
    def __init__(self, calculator: PremiumCalculator):
        super().__init__(
            name="premium_calculator",
            description="Calculate insurance premiums based on age, "
                        "sum insured, and policy type",
        )
        self.calculator = calculator

    def execute(self, action_input: str, context: Dict) -> str:
        """Execute premium calculation and return observation."""
        params = json.loads(action_input)
        result = self.calculator.calculate_premium(
            policy_type=params.get('policy_type', 'family_floater'),
            members=params.get('members', []),
            sum_insured=params.get('sum_insured'),
        )
        if result.get('error'):
            return f"Error: {result['error']}"
        return f"Premium calculated: ₹{result['total_premium']:,.2f}"
```
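The standardized interface these wrappers share might look like the following sketch; the actual `ReActTool` base class lives in `react_tools.py` and may differ in detail.

```python
from abc import ABC, abstractmethod
from typing import Dict

# Assumed shape of the shared tool interface (sketch, not a copy).
class ReActTool(ABC):
    def __init__(self, name: str, description: str):
        self.name = name                # identifier used in Action: steps
        self.description = description  # surfaced to the LLM in the prompt

    @abstractmethod
    def execute(self, action_input: str, context: Dict) -> str:
        """Run the tool and return a string observation."""
```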
Let's trace a complex query through the ReAct system:
Query: "Calculate premium for 2 adults aged 32 and 45, then compare with ActivFit to see which is cheaper"
Iteration 1: Initial Assessment
THOUGHT: "This query has two parts: first calculate premium, then compare with ActivFit.
I should start by calculating the premium with the given parameters."
ACTION: premium_calculator
ACTION_INPUT: {
"policy_type": "family_floater",
"members": [{"age": 32}, {"age": 45}],
"sum_insured": 500000
}
OBSERVATION: "Premium calculated: βΉ45,000 (Gross: βΉ38,135 + GST: βΉ6,865)"
Iteration 2: Retrieve Product Information
THOUGHT: "I have the calculated premium (βΉ45,000). Now I need to find ActivFit's
premium for the same family composition to make a comparison."
ACTION: document_retriever
ACTION_INPUT: {
"query": "ActivFit premium 2 adults aged 32 and 45",
"product": "ActivFit",
"k": 5
}
OBSERVATION: "ActivFit premium details: For 2 adults (ages 32-45 band), sum insured 5L:
Base premium ₹35,000 + GST ₹6,300 = Total ₹41,300"
Iteration 3: Comparison Analysis
THOUGHT: "I now have both premiums:
- Calculated: ₹45,000
- ActivFit: ₹41,300
I can now provide a complete comparison and recommendation."
ACTION: finish
FINAL_ANSWER: "Based on your requirements (2 adults aged 32 and 45 with 5L cover):
Your calculated premium: ₹45,000
ActivFit premium: ₹41,300
**ActivFit is cheaper by ₹3,700 annually** (8.2% savings).
Recommendation: ActivFit offers better value for your family composition."
Metadata:
{ "total_iterations": 3, "tools_used": ["premium_calculator", "document_retriever", "finish"], "execution_time": "9.4 seconds", "learning_applied": true, "reasoning_steps_visible": true }
| Aspect | Traditional Orchestrator | ReAct Agentic System |
|---|---|---|
| Execution Model | Synchronous, single-pass | Iterative, multi-pass |
| State Management | Stateless (context per call) | Stateful (trace accumulation) |
| Tool Selection | Pre-determined by intent | Dynamic based on observations |
| Error Recovery | Fail fast | Can retry with different tools |
| Context Size | Fixed (single query) | Growing (accumulates observations) |
| Code Complexity | ~180 lines (orchestrator.py) | ~900 lines (4 files) |
| Token Usage | Low (1-2 LLM calls) | High (3-10+ LLM calls) |
| Latency | 3-5 seconds | 5-15 seconds |
| Cost | Lower (fewer API calls) | Higher (more API calls) |
| Transparency | Limited (intent + result) | Full (reasoning trace) |
Scenario 1: Conditional Logic
Query: "If premium for age 45 exceeds βΉ20,000, show me cheaper alternatives"
ReAct handles:
1. Calculate premium for age 45
2. Check if > ₹20,000
3. If yes, retrieve alternative products
4. Compare premiums
5. Rank by cost
Scenario 2: Multi-Product Analysis
Query: "Compare premiums across all products for age 35, then show coverage differences
for the top 3 cheapest options"
ReAct handles:
1. Calculate premium for age 35 (product-agnostic)
2. Retrieve premiums for ActivFit
3. Retrieve premiums for ActivAssure
4. Retrieve premiums for ActivCare
5. Sort by cost (top 3)
6. Retrieve coverage details for top 3
7. Generate comparison table
ReAct System Optimization Strategies:
Current Performance Metrics:
Implementation Files:
- `backend/agents/agentic/agentic_system.py` (155 lines)
- `backend/agents/agentic/react_agent.py` (403 lines)
- `backend/agents/agentic/react_tools.py` (152 lines)
- `backend/agents/agentic/intent_learner.py` (289 lines)
INFO: The ReAct system is designed for complex queries but can handle simple ones too. However, for simple queries, the traditional orchestrator is more efficient due to lower latency and cost.
Challenge: Extract content from complex insurance PDFs with multi-page tables and dense legal text.
Features:
```python
import pdfplumber
import pandas as pd

def extract_tables(pdf_path, output_dir):
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables = page.find_tables(table_settings={
                "vertical_strategy": "lines",
                "snap_tolerance": 3,
            })
            # Merge if headers match and rows are sequential
            if should_merge(prev_table, curr_table):
                merged = pd.concat([prev_table, curr_table])
```
Performance: 85-90% detection accuracy, ~30-45s/page
💡 TIP: Adjust `snap_tolerance` (1-3 for line-based, 5-7 for borderless tables).
Innovation: Spatial analysis excludes table bounding boxes to prevent duplication.
```python
# Filter out words intersecting with tables
non_table_words = [
    w for w in words
    if not intersects_with_table(w, table_bboxes)
]
```
Benefits: No text-table duplication, preserves table references
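The `intersects_with_table` helper can be a plain bounding-box overlap test; this sketch assumes pdfplumber word dictionaries (`x0`/`x1`/`top`/`bottom` keys) and table bboxes as `(x0, top, x1, bottom)` tuples.

```python
# Sketch of the overlap test referenced above.
def intersects_with_table(word: dict, table_bboxes: list) -> bool:
    for (tx0, ttop, tx1, tbottom) in table_bboxes:
        if (word["x0"] < tx1 and word["x1"] > tx0 and
                word["top"] < tbottom and word["bottom"] > ttop):
            return True
    return False
```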
Problem: Fixed-size chunks break mid-sentence, lose context.
Solution: Embedding-based chunking at natural semantic boundaries (cosine similarity threshold 0.75).
```python
# Calculate similarities between consecutive sentence embeddings
similarities = [
    cosine_similarity(embeddings[i], embeddings[i + 1])
    for i in range(len(embeddings) - 1)
]

# Create chunks at low-similarity boundaries
if similarity < 0.75 or length > max_size:
    create_new_chunk()
```
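A self-contained version of the fragment above, assuming the sentence list and one embedding per sentence are already computed; the `max_sentences` cap stands in for the original's chunk-length check.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def semantic_chunks(sentences, embeddings, threshold=0.75, max_sentences=10):
    """Group sentences into chunks, breaking at low-similarity boundaries."""
    chunks, current = [], [sentences[0]]
    for i in range(len(sentences) - 1):
        sim = cosine_similarity(
            np.asarray(embeddings[i]).reshape(1, -1),
            np.asarray(embeddings[i + 1]).reshape(1, -1),
        )[0][0]
        # Start a new chunk at a semantic boundary or at the size cap
        if sim < threshold or len(current) >= max_sentences:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks
```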
Results:
| Metric | Traditional | Semantic | Improvement |
|---|---|---|---|
| Context Quality | Poor | Excellent | Natural boundaries |
| Retrieval Accuracy | Baseline | +25-35% | Better matches |
| Processing Time | Fast | 8+ minutes | Quality trade-off |
⚠️ CAUTION: 8+ minute processing time; use semantic chunking for critical content and fixed-size chunking for less important sections.
Strategic validation at critical points ensures accuracy:
1. Table Mapping Review
2. CSV Bulk Upload
3. Approval Tracking
Benefits: High-stakes accuracy, user trust, catch edge cases
Configuration:
- Embeddings: `text-embedding-ada-002` (1536D)
- Persistent storage: `media/output/chroma_db/`

Collections by Product:
```
chroma_db/
├── ActivAssure/
├── ActivFit/
└── [other products]/
```
Metadata Schema:
{ "page": 5, "doc_type": "policy", "doc_name": "ActivAssure", "chunk_id": "chunk_127", "created_at": "2024-11-05T10:30:00Z" }
Query Features: top-k semantic search, metadata filtering via `where` clauses, and per-product collections.
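A sketch of a filtered query against one per-product collection, assuming the persistent layout above; the collection name and the placeholder embedding are illustrative.

```python
import chromadb

client = chromadb.PersistentClient(path="media/output/chroma_db/ActivAssure")
collection = client.get_collection("documents")  # collection name assumed

query_embedding = [0.0] * 1536  # placeholder; real code embeds the query
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=5,
    where={"doc_type": "policy"},  # metadata filter from the schema above
)
```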
Auto-categorization during ingestion:
| Category | Keywords | Use Case |
|---|---|---|
| Policy | policy, terms, coverage | Detailed terms |
| Brochure | brochure, marketing | Overview docs |
| Prospectus | prospectus, offering | Investment info |
| Terms | terms, conditions | Legal clauses |
| Premium Calculation | premium, rates | Pricing tables |
Benefits: Precision filtering, faster retrieval
| Endpoint | System | Speed | Use Case |
|---|---|---|---|
| `/api/extract_tables/` | Ingestion | N/A | Extract PDF tables |
| `/api/extract_text/` | Ingestion | N/A | Extract PDF text |
| `/api/chunk_and_embed/` | Ingestion | 8+ min | Semantic chunking |
| `/agents/query/` | Traditional | 3-5s | Fast single-step |
| `/agents/agentic/query/` | ReAct | 5-15s | Multi-step reasoning |
Environment Variables (.env):
```bash
# Azure OpenAI
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-35-turbo
AZURE_OPENAI_EMBEDDING_MODEL=text-embedding-ada-002

# Django
DEBUG=False
SECRET_KEY=your-secret-key
ALLOWED_HOSTS=localhost,127.0.0.1

# ChromaDB
CHROMA_DB_DIR=media/output/chroma_db/
```
Prompt Configuration (config/prompt_config.py):
```python
ORCHESTRATOR_SYSTEM_PROMPT = """
You are an insurance query classifier...
"""

REACT_AGENT_PROMPT = """
You have access to the following tools:
{tools}
Think step by step...
"""
```
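Rendering the ReAct prompt with the registered tools could look like this sketch, reusing the `react_tools` dict from the AgenticSystem section; how `{tools}` is actually filled is an assumption.

```python
# Sketch: fill the {tools} slot with registered tool names/descriptions.
tool_lines = "\n".join(
    f"- {tool.name}: {tool.description}" for tool in react_tools.values()
)
prompt = REACT_AGENT_PROMPT.format(tools=tool_lines)
```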
Centralized Logging (logs/utils.py):
```python
logger.info(f"Query: {query}, Intent: {intent}, Time: {elapsed}s")
logger.error(f"Premium calculation failed: {error}", exc_info=True)
```
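A minimal sketch of the centralized setup; the real configuration in `logs/utils.py` may differ (the file path and format string are assumptions).

```python
import logging

def get_logger(name: str) -> logging.Logger:
    """Hypothetical centralized logger factory."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.FileHandler("logs/app.log")
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s: %(message)s"
        ))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

logger = get_logger("agents")
```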
Log Levels:
Error Recovery:
| Component | Metric | Value | Notes |
|---|---|---|---|
| **Document Ingestion** | | | |
| Table Extraction | Speed | 30-45s/page | PDF complexity dependent |
| Table Extraction | Accuracy | 85-90% | Manual review recommended for complex tables |
| Text Extraction | Speed | 10-15s/page | Excluding tables |
| Semantic Chunking | Duration | 8-15 minutes | For 25-page document |
| Embedding Generation | Duration | 2-3 minutes | ChromaDB insert included |
| Full Pipeline | Total Time | 15-20 minutes | Complete document processing |
| **Query Performance** | | | |
| Traditional Orchestrator | Average | 3.5 seconds | Single-step retrieval |
| Traditional Orchestrator | P95 | 5 seconds | 95th percentile |
| ReAct (Simple Query) | Average | 6 seconds | 2-3 tool calls |
| ReAct (Simple Query) | P95 | 10 seconds | 95th percentile |
| ReAct (Complex Query) | Average | 12 seconds | 4-5 tool calls, multi-step reasoning |
| ReAct (Complex Query) | P95 | 15 seconds | 95th percentile |
| **Quality Metrics** | | | |
| Test Coverage | Test Cases | 35+ tests | Across 13 test classes |
| Test Coverage | Modules | 6 modules | Ingestion, retrieval, agents |
| Evaluation Metrics | Dimensions | 3D assessment | Term coverage, similarity, diversity |
| Intent Classification | Accuracy | High | Pattern-based with learning capability |
💡 TIP: The ReAct system is intentionally slower due to multi-step reasoning, providing more comprehensive and accurate answers than single-step retrieval.
35+ Test Cases Across 13 Test Classes:
| Module | Test Class | Tests | Coverage |
|---|---|---|---|
| Ingestion | PDFProcessingTests | 4 | Table/text extraction |
| Ingestion | ChunkingTests | 3 | Semantic chunking |
| Retrieval | DocumentRetrievalTests | 3 | Search & filtering |
| Retrieval | EvaluationTests | 2 | Metrics calculation |
| Agents | OrchestratorTests | 5 | Intent classification |
| Agents | PremiumCalculatorTests | 8 | All configurations |
| Agents | ComparisonTests | 3 | Multi-product analysis |
| Agents | ReActAgentTests | 4 | Multi-step reasoning |
| Agents | IntentLearnerTests | 3 | Pattern learning |
Test Execution:
```bash
# Run all tests
python manage.py test

# Specific module
python manage.py test agents.tests.OrchestratorTests
```
Sample Test:
```python
def test_premium_calculation_family_floater(self):
    """Test 2 Adults + 1 Child configuration."""
    response = self.client.post('/agents/query/', {
        'query': 'Calculate premium for 2 adults aged 35, 40 and child aged 8',
        'chroma_db_dir': 'media/output/chroma_db/ActivAssure',
    })
    self.assertEqual(response.status_code, 200)
    self.assertIn('agent_type', response.data)
    self.assertEqual(response.data['agent_type'], 'premium_calculation')
    self.assertIn('₹', response.data['response'])
```
3D Quality Assessment:
1. Term Coverage Score: `terms_found / total_query_terms`
2. Semantic Similarity
3. Result Diversity
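The three scores could be computed as in this sketch; term coverage follows the formula above, while the similarity and diversity definitions are assumptions about the evaluator's internals.

```python
from sklearn.metrics.pairwise import cosine_similarity

def term_coverage(query: str, results_text: str) -> float:
    """terms_found / total_query_terms, per the formula above."""
    terms = set(query.lower().split())
    found = sum(1 for t in terms if t in results_text.lower())
    return found / len(terms) if terms else 0.0

def semantic_similarity(query_emb, result_embs) -> float:
    """Mean cosine similarity between query and retrieved chunks."""
    return float(cosine_similarity([query_emb], result_embs)[0].mean())

def result_diversity(result_embs) -> float:
    """1 - mean pairwise similarity among results (assumed definition)."""
    sims = cosine_similarity(result_embs)
    n = len(result_embs)
    if n < 2:
        return 0.0
    return 1.0 - float((sims.sum() - n) / (n * (n - 1)))
```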
Real-Time Display:
```python
st.metric("Term Coverage", f"{coverage_score:.2%}")
st.metric("Similarity", f"{similarity_score:.3f}")
st.metric("Diversity", f"{diversity_score:.2%}")
```
Benefits: Transparency, debugging aid, quality monitoring
ReAct Agent Constraints
Document Processing
Query Processing
Data & Storage
Response Time Trade-offs
Concurrent Processing
Rate Limits
Infrastructure Dependencies
Scalability Constraints
Security & Access Control
Monitoring & Observability
Document Support
Advanced Features Not Included
⚠️ CAUTION: These limitations are documented transparently to set realistic expectations. Many can be addressed in future iterations with additional engineering effort.
Current Setup:
```
Load Balancer (Future)
  ├─ Django Backend (Single Instance → Scalable to Multiple)
  ├─ ChromaDB (File-based → Centralized with Shared Storage)
  └─ Streamlit Frontend (2 Instances: Traditional + ReAct)
```
Scaling Strategies:
Horizontal Scaling:
Component Separation:
Performance Optimization:
Key Metrics:
Health Checks:
```bash
# Backend
curl http://localhost:8000/health/

# ChromaDB connectivity
curl http://localhost:8000/api/health/chroma/
```
✅ Dual-Agent Success: Offering a speed-vs-depth choice increased user satisfaction
✅ Semantic Chunking: 25-35% better retrieval despite 8+ min overhead
✅ HITL Critical: Human validation caught 15-20% edge cases
✅ Test Coverage: 35+ tests prevented production issues
✅ Modular Code: 79% reduction improved maintainability
⚠️ Challenges:
1. ML-Based Intent Classification
2. Multi-Document Queries
3. Conversational Memory
4. Advanced Table Understanding
5. Performance Optimization
6. Enhanced Evaluation
This publication demonstrated the evolution from a basic RAG pipeline (v1.0) to a sophisticated dual-agent architecture (v2.0) for insurance document processing.
Key Achievements:
Innovation: Users intelligently choose between fast single-step routing (3-5s) and comprehensive multi-step reasoning (5-15s) based on query complexity.
Production-Ready: Deployed with Django + Streamlit, backed by ChromaDB and Azure OpenAI, with comprehensive testing and monitoring.
Impact: Transforms hours of manual insurance document analysis into seconds of automated, accurate responses with transparent reasoning.
Title: Enhanced Insurance Document Processing: A Production-Ready RAG System with Multi-Agent Intelligence (v2.0)
Version History:
Domain: Insurance Technology, Document Processing, Artificial Intelligence
Primary Technologies: RAG (Retrieval-Augmented Generation), Multi-Agent Systems, LangChain, Azure OpenAI, ChromaDB, Django, Streamlit
Author: Yuvaranjani Mani
Contact: GitHub - @Yuvaranjani123
Source Code:
License: MIT License
Version: 2.0 (Multi-Agent Enhanced Edition)
Publication Date: November 4, 2025
Last Updated: November 4, 2025
Supersedes: v1.0 - Insurance RAG
Related Publications:
Technologies and Frameworks:
Inspiration and Learning:
For Questions or Collaboration:
Version-Specific Resources:
If you found this publication helpful:
Built with Python, LangChain, Azure OpenAI, and cutting-edge multi-agent AI technologies
Β© 2025 Yuvaranjani Mani | MIT License