Agentic AI - A Multi-Agent System for Insurance Policy Cancellation

📈 Abstract

This publication presents an agentic AI system designed to automate insurance policy cancellation workflows using a multi-agent architecture. The system coordinates specialized agents responsible for intent recognition, policy validation, eligibility assessment, and execution, enabling efficient and scalable decision-making.

Built with a modular and extensible design, the solution combines rule-based validation with large language model (LLM)-driven reasoning to handle complex, real-world scenarios. The architecture improves transparency, maintainability, and adaptability compared to monolithic automation approaches.

This work demonstrates how multi-agent systems can enhance operational efficiency, reduce manual intervention, and provide a scalable foundation for enterprise AI automation in the insurance domain.

💼 Introduction

The manual processing of insurance policy cancellations is often slow, error-prone, and resource-intensive, requiring careful verification of policy details, eligibility checks, and calculation of refunds. To address these challenges, this project introduces a Multi-Agent Insurance Policy Cancellation System, which automates the end-to-end workflow while maintaining reliability, traceability, and compliance with business rules.

Purpose
The primary purpose of this system is to streamline and automate the cancellation process for insurance policies. By leveraging multiple specialized AI agents, the system reduces human effort, minimizes errors, and ensures consistent application of business rules throughout the workflow.
Objectives
The system is designed to achieve the following objectives:

Automate User Input Handling: Collect and validate policy numbers and customer information in a structured and safe manner.
Verify Policy Eligibility: Confirm that the policy exists and meets the necessary criteria for cancellation, including active status, payment verification, and valid policy dates.
Coordinate Multi-Agent Workflow: Divide responsibilities across specialized agents, including intake, eligibility verification, refund calculation, logging, and summary generation, ensuring smooth and reliable communication between components.
Ensure Robust and Safe Execution: Incorporate validation checks, structured logging, retries, timeout handling, and fallback mechanisms to prevent system crashes and maintain operational integrity.
Produce Clear Output: Generate an accurate and professional cancellation notice in PDF format for the user.

Reader Guidance
This document explains the design and implementation of the system, including:

The agent-based architecture and how specialized components interact
The business rules and decision logic used to enforce correct processing
Error handling, monitoring, and logging mechanisms that ensure reliability
Testing strategies and operational considerations to maintain performance and compliance

Overall, the system uses Large Language Models (LLMs), tool integrations, validation guardrails, testing strategies, and workflow orchestration to process user requests safely and generate structured cancellation outcomes. Using mock data stored in CSV files, multiple specialized agents collaborate to validate policy details, evaluate cancellation eligibility against predefined business rules, calculate refundable amounts, log refund records, and generate a professional cancellation notice in PDF format. To make the system more user-friendly and interactive, it is also deployed with a Streamlit UI.

🎯 System Purpose and System Architecture

This system is an AI-driven, multi-agent workflow engine that processes insurance cancellation requests in a structured, validated, and secure manner.

Target Users

Insurance policyholders requesting cancellation
Internal insurance support teams
Developers testing agentic workflows
QA engineers validating cancellation scenarios

Problem Statement
Traditional insurance cancellation involves considerable manual effort, including manual data verification, and carries a high risk of inconsistent validation. This agentic AI system addresses these problems by providing automated, structured intake of cancellation requests, intelligent validation and policy verification, risk and compliance checks, and safe, auditable decision workflows.
Architecture Diagram
Core Components

Intake Agent
Collects and validates customer policy details
Analysis Agent
Evaluates cancellation eligibility using business rules
Human Review (Eligibility)
Approves or rejects cancellation eligibility
Refund Agent
Calculates refund amount
Human Review (Refund)
Validates financial refund decision
Logger Agent
Records approved refund outcomes
Summary Agent
Generates customer cancellation notice (PDF)

🤖 Model Development and Evaluation

This system uses a multi-agent, tool-augmented LLM architecture designed to safely automate structured insurance cancellation workflows.

Rather than relying on a single monolithic prompt, the system decomposes the workflow into role-specialised agents, deterministic routing logic, tool-augmented reasoning, and guardrails-enforced structured outputs.

Business Rules

Eligibility for policy cancellation is determined using the following rules:

The policy status must be active.
The policy payment must be marked as paid.
The current date must be earlier than the policy end date.

Only policies that satisfy all rules are eligible for cancellation and refund processing.
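The three rules above can be expressed as a small pure function. The sketch below is illustrative only: the Policy dataclass, the function name check_eligibility, and the literal status value "active" are assumptions made for this example, not the project's actual tool code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Policy:
    # Hypothetical record shape; field names follow the intake JSON schema.
    policy_status: str
    is_payment_made: bool
    end_date: date


def check_eligibility(policy: Policy, today: date) -> tuple[bool, list[str]]:
    """Return (eligible, reasons), where reasons lists every failed rule."""
    reasons = []
    # Rule 1: the policy status must be active (literal value is assumed).
    if policy.policy_status.strip().lower() != "active":
        reasons.append("policy is not active")
    # Rule 2: the payment must be marked as paid.
    if not policy.is_payment_made:
        reasons.append("payment has not been made")
    # Rule 3: the current date must be earlier than the policy end date.
    if today >= policy.end_date:
        reasons.append("current date is not before the policy end date")
    return (not reasons, reasons)
```

Returning the list of failed rules, rather than a bare boolean, gives the human reviewer and the summary notice a reason for each rejection.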

Dataset Preparation

This project uses mock data. All customer and policy data is stored in a CSV file named insurance_policies.csv. The dataset is available at:
https://github.com/jingozuo/AAIDC_Project3_JZ/blob/main/data/insurance_policies.csv

Model Design & Development

Tools

Data Lookup Tool
This custom data lookup tool searches for a policy and validates that it exists by querying a mock CSV dataset.
Check Cancellation Eligibility Tool
The tool is designed to evaluate whether a policy is eligible for cancellation based on the policy's status, payment and dates.
Refund Calculator Tool
Once the policy is confirmed as eligible for a refund, this tool computes the refund amount from the policy dates and payment only.
Refund Log Tool
This tool appends one new refund record to the output CSV file as evidence.
Notice Generator Tool
The last step for processing insurance cancellation is to generate a PDF notice with all relevant details for the user.

Large Language Models (LLMs)

Intake Assistant Prompt
This prompt asks the user for a policy number and for confirmation that they want to proceed with cancelling the policy. The output is JSON that follows this schema:

      {
        "policy_number": "string",
        "first_name": "string",
        "last_name": "string",
        "start_date": "date",
        "end_date": "date",
        "policy_status": "string",
        "payment_amount": "string",
        "is_payment_made": "boolean"
      }
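
A reply in this shape can be checked before it reaches downstream agents. The following is a minimal sketch, assuming that dates arrive as ISO 8601 strings (for example "2026-01-31"); the function name parse_intake_output and the REQUIRED_FIELDS table are hypothetical and are not taken from the project code.

```python
import json
from datetime import date

# Expected JSON type for each field in the schema above.
# Dates are assumed to be ISO 8601 strings; the source only says "date".
REQUIRED_FIELDS = {
    "policy_number": str,
    "first_name": str,
    "last_name": str,
    "start_date": str,
    "end_date": str,
    "policy_status": str,
    "payment_amount": str,
    "is_payment_made": bool,
}


def parse_intake_output(raw: str) -> dict:
    """Parse the LLM reply and raise ValueError if it violates the schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name} must be {expected_type.__name__}")
    # Convert date strings so later steps can do date arithmetic.
    for name in ("start_date", "end_date"):
        data[name] = date.fromisoformat(data[name])
    return data
```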

Summary Assistant Prompt
This prompt generates a clear and polite cancellation summary for the customer. The summary includes:

Policy number
Customer full name
Customer email
Cancellation status
Refund amount
Reason for approval or rejection
The response is formatted as bullet points and kept under 100 words in total.

Agents

Intake Agent
The Intake Agent is responsible for collecting the policy number, looking up the policy in the data source, and confirming with the user that the policy details are correct, using the Data Lookup Tool and the Intake Assistant Prompt. It loops until the policy is found and confirmed or the maximum number of attempts is reached.

If the policy number is found in the mock dataset, the agent retrieves the policy details and asks the customer to confirm that they are correct. If the policy number cannot be found, the agent asks the customer to enter the correct policy number.

Analysis Agent
The Analysis Agent determines whether the policy is eligible for cancellation. It uses the Check Cancellation Eligibility Tool to check whether the policy is active, whether the payment has been made, and whether the current date is before the policy end date.
Refund Agent
The Refund Agent computes the refund amount for the policy using the Refund Calculator Tool. Refunds are calculated using a pro-rata approach based on the remaining policy duration and the total payment amount.
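As an illustration of the pro-rata approach, the sketch below applies payment × (remaining_days / total_days). The function name pro_rata_refund, the clamping of remaining days, and rounding to two decimals with ROUND_HALF_UP are assumptions for this example; the project's Refund Calculator Tool may handle fees and rounding differently.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def pro_rata_refund(payment: Decimal, start: date, end: date, today: date) -> Decimal:
    """Refund the unused share of the payment, rounded to cents."""
    total_days = (end - start).days
    if total_days <= 0:
        raise ValueError("end date must be after start date")
    remaining_days = (end - today).days
    # Assumed behaviour: never refund more than was paid, never go negative.
    remaining_days = max(0, min(remaining_days, total_days))
    refund = payment * Decimal(remaining_days) / Decimal(total_days)
    return refund.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, a payment of 1200 on a 10-day policy with 5 days remaining gives 1200 × 5 / 10 = 600.00. Decimal is used instead of float so that currency amounts are not affected by binary rounding.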
Logger Agent
Logger agent persists approved refund decisions into a CSV file stored in the output directory. This agent uses Refund Log Tool, a custom CSV logging tool, and is executed only after human approval of the refund. Logged data includes:

Policy number
Refund eligibility
Refund amount
Decision timestamp
Customer details

Summary Agent
The Summary Agent generates the cancellation notice from the policy and refund details. It uses the Summary Assistant Prompt to produce well-structured, formal language suitable for customer communication. The generated notice is then exported as a PDF file using the custom Notice Generator Tool. It is the final customer-facing output of the system.

Human-in-the-Loop (HITL)

The system also places human-in-the-loop interactions at key decision boundaries in the flow to enhance safety, accountability, and trust. Two human-in-the-loop checkpoints are implemented in the workflow.

Human Review – Eligibility (HITL)
This checkpoint follows the Analysis Agent. A human reviewer examines the eligibility decision and explicitly approves or rejects the policy cancellation. If the reviewer approves the eligibility check, the workflow proceeds to the Refund Agent; otherwise, the workflow terminates safely.
Human Review – Refund (HITL)
This checkpoint follows the Refund Agent and ensures that financial outcomes are verified before any permanent record is created. The workflow proceeds to the Logger Agent only when the refund is approved; otherwise, it terminates safely.
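
The two checkpoints can be pictured as gates in a deterministic routing function. This is a sketch under stated assumptions: the node names, the END marker, and the function next_node are hypothetical labels for the steps described in this document, not the orchestration code used by the project.

```python
from typing import Optional

# Hypothetical node names for the workflow steps described in this document.
SEQUENCE = [
    "intake",
    "analysis",
    "human_review_eligibility",
    "refund",
    "human_review_refund",
    "logger",
    "summary",
]
HITL_NODES = {"human_review_eligibility", "human_review_refund"}
END = "end"


def next_node(current: str, approved: Optional[bool] = None) -> str:
    """Return the node that follows `current`, or END when the flow stops."""
    if current in HITL_NODES and approved is not True:
        # A rejection (or a missing decision) terminates the workflow safely.
        return END
    position = SEQUENCE.index(current)
    if position == len(SEQUENCE) - 1:
        return END
    return SEQUENCE[position + 1]
```

Because the next step depends only on the current node and the reviewer's decision, the same inputs always produce the same route, which keeps the flow easy to test.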

Performance & Evaluation

The insurance cancellation workflow is evaluated with DeepEval across five dimensions:

Eligibility correctness
The eligibility decision must match the business rules: an eligible policy is active, has its payment made, and has an end date later than the current date.
Refund calculation correctness
The stated refund amount must match the proportional formula from dates and payment, which is payment × (remaining_days / total_days).
Workflow sequencing
The workflow should follow this sequence: Intake agent → Analysis agent → Human Review (Eligibility) → Refund agent → Human Review (Refund) → Logger agent → Summary agent
Summary notice quality
The summary notice should follow the summary_assistant_prompt. It should be clear and professional, and include the details listed in the prompt.
Agent boundary enforcement
Each agent should only use its allowed tools.

Intake agent -> Data Lookup Tool
Analysis agent -> Check Cancellation Eligibility Tool
Refund agent -> Refund Calculator Tool
Logger agent -> Refund Log Tool
Summary agent -> Notice Generator Tool
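
One way to check this boundary is to compare the tool calls recorded during a run against an allow-list. The sketch below is an assumption about how such a check could look; the agent and tool identifiers and the function find_boundary_violations are hypothetical and do not come from the project or from DeepEval.

```python
# Hypothetical identifiers for the agent/tool pairs listed above.
ALLOWED_TOOLS = {
    "intake_agent": {"data_lookup_tool"},
    "analysis_agent": {"check_cancellation_eligibility_tool"},
    "refund_agent": {"refund_calculator_tool"},
    "logger_agent": {"refund_log_tool"},
    "summary_agent": {"notice_generator_tool"},
}


def find_boundary_violations(tool_calls):
    """Return the (agent, tool) pairs that fall outside the allowed mapping.

    `tool_calls` is an iterable of (agent, tool) tuples taken from a run trace.
    Unknown agents are treated as having no allowed tools.
    """
    return [
        (agent, tool)
        for agent, tool in tool_calls
        if tool not in ALLOWED_TOOLS.get(agent, set())
    ]
```

An empty result means every agent stayed within its boundary for that run.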

In-memory cache
The CSV file that stores the mock data is read once and kept in memory, so later lookups are faster. The file is read again only when a different file path is used.
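
A per-path cache of this kind can be sketched with functools.lru_cache. The function names load_policies and lookup_policy and the column name policy_number are assumptions for illustration; the project's Data Lookup Tool may be implemented differently.

```python
import csv
from functools import lru_cache


@lru_cache(maxsize=None)
def load_policies(csv_path: str) -> dict:
    """Read the CSV once per distinct path and index rows by policy number."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {
            # Column name "policy_number" is assumed for this sketch.
            row["policy_number"].strip().upper(): row
            for row in csv.DictReader(handle)
        }


def lookup_policy(policy_number: str, csv_path: str):
    """Return the matching row as a dict, or None if the policy is unknown."""
    return load_policies(csv_path).get(policy_number.strip().upper())
```

Since the cache key is the path argument, a second call with the same path reuses the in-memory index, while a different path triggers a fresh read.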

🔐 Safety, Security and Guardrails

The system implements a multi-layered safety architecture combining:

Prompt-level constraints (LLM behavior control)
Input validation and sanitization
Output filtering and schema enforcement
Guardrails Hub validation
Structured compliance logging
Graceful degradation and safe fallbacks

Prompt-Level Safety Controls

The system enforces strict behavioral constraints directly inside prompt_config.yaml.

Intake Assistant Controls

These controls are defined under "intake_assistant_prompt".

Behavioral Restrictions:

Must return valid JSON only
Must not include explanations or markdown
Must not invent or guess policy numbers
Must normalize policy number (trim + uppercase)
Must redirect unrelated user input
If user refuses confirmation → set "confirmed": false

Security Benefits:

Prevents hallucinated free text
Prevents arbitrary output format
Enables strict downstream schema validation
Prevents data fabrication

Summary Assistant Controls

These controls are defined under "summary_assistant_prompt".

Explicit Constraints:

Must generate summary only from system state
Must not expose internal logic
Must not invent new information
Must explicitly state Refund Amount: $0.00 if none
Must format as bullet points

Security Benefits:

Prevents data leakage
Prevents hallucinated refund values
Prevents exposure of internal agent logic

Input Validation and Sanitization

The system implements input validation and sanitization in guardrails_safety.py.

All user input passes through sanitize_user_input(), which trims whitespace, removes control characters, and enforces a maximum length of 500 characters.
validate_policy_number_format() allows only alphanumeric characters, hyphens, underscores, and spaces in a policy number. If the input is invalid, it logs the validation failure, removes the disallowed characters, and returns a safe, truncated value without crashing the system.
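
The behaviour described above can be sketched as follows. The function names match those mentioned in the text, but the bodies, the (value, was_valid) return shape, and the 50-character policy number limit are assumptions for illustration. Logging of the validation failure is omitted here.

```python
import re

MAX_INPUT_LENGTH = 500         # stated in the text above
MAX_POLICY_NUMBER_LENGTH = 50  # assumed; the source does not give this limit

_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
_DISALLOWED = re.compile(r"[^A-Za-z0-9_\- ]")


def sanitize_user_input(text: str) -> str:
    """Remove control characters, trim whitespace, and cap the length."""
    cleaned = _CONTROL_CHARS.sub("", text).strip()
    return cleaned[:MAX_INPUT_LENGTH]


def validate_policy_number_format(value: str) -> tuple[str, bool]:
    """Return (safe_value, was_valid) without ever raising on bad input."""
    candidate = sanitize_user_input(value)
    was_valid = (
        bool(candidate)
        and len(candidate) <= MAX_POLICY_NUMBER_LENGTH
        and _DISALLOWED.search(candidate) is None
    )
    # Strip anything outside letters, digits, hyphens, underscores, spaces.
    safe = _DISALLOWED.sub("", candidate)[:MAX_POLICY_NUMBER_LENGTH]
    return safe, was_valid
```

Returning a cleaned value alongside a validity flag lets the caller decide whether to continue with the safe value or ask the user to re-enter the policy number.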

Output Filtering and Content Safety

LLM-generated summaries are validated using validate_notice_output().
Validations performed:

Must be non-empty string
Enforces max length (10,000 characters)
Applies unsafe pattern filtering
Returns safe fallback if invalid
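
A minimal sketch of these four checks is shown below. The function name comes from the text, but the body is an assumption: the unsafe patterns are illustrative examples only, and this version rejects oversized or unsafe text outright, whereas the project may truncate or strip it instead.

```python
import re

MAX_NOTICE_LENGTH = 10_000  # stated in the list above
SAFE_FALLBACK = (
    "Your insurance cancellation has been processed. "
    "Please retain this notice for your records."
)

# Illustrative patterns only; the project's actual filter list is not shown.
_UNSAFE_PATTERNS = [
    re.compile(r"<script\b", re.IGNORECASE),
    re.compile(r"javascript:", re.IGNORECASE),
]


def validate_notice_output(text) -> str:
    """Return the notice text if it passes every check, else the fallback."""
    if not isinstance(text, str) or not text.strip():
        return SAFE_FALLBACK
    if len(text) > MAX_NOTICE_LENGTH:
        return SAFE_FALLBACK
    if any(pattern.search(text) for pattern in _UNSAFE_PATTERNS):
        return SAFE_FALLBACK
    return text.strip()
```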

Error Handling

The system is designed to handle errors safely and continue running when possible, while also logging problems so they can be traced later.
The system uses a retry mechanism for temporary failures. If certain operations fail (such as calling tools or the AI model), the system automatically retries, up to a maximum of 3 attempts. The retry logic is used for:

Policy lookup
Refund logging
Notice PDF generation
AI summary generation

If all retries fail, the system records the failure and moves to a safe fallback behaviour.

Compliance Logging

All guardrail events are logged in file guardrails_compliance.jsonl.
The logged events include:

retry_attempt – a retry is about to happen
retry_exhausted – all retry attempts failed
tool_failure – a tool failed (for example, policy lookup). The system treats it as “policy not found.”
error_handling – a failure occurred and a fallback was used.

Each log entry contains structured information such as:

timestamp
event type
stage
message
error details (if available)

Example:

{"timestamp": "2026-03-01T22:23:31.894994+00:00", "event_type": "output_validation", "stage": "summary", "message": "Notice text validated", "validated": true}

Security Benefits:

Enables audit trails
Supports regulatory compliance
Enables debugging
Maintains structured JSONL format
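
Appending one JSON object per line is enough to produce a log in this format. The helper below is a sketch, not the project's logger: the name log_compliance_event and its parameters are hypothetical, while the field names follow the entry structure listed above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_compliance_event(log_path, event_type, stage, message, **extra):
    """Append one structured event to a JSONL file and return the entry."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type,
        "stage": stage,
        "message": message,
        **extra,  # optional fields such as error details
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier events; each line is an independent JSON object.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry
```

Because every line is a complete JSON document, the file can be read incrementally with json.loads per line, even while the system is still writing to it.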

📊 Testing Strategy and Coverage

The system implements a structured, production-grade testing strategy using pytest, organized into:

Unit tests (agent, tool, utility level)
Integration tests (agent-to-agent communication)
End-to-end test (entire system workflow)

The test suite is modular and maps directly to system components.

Test Suite Structure

conftest.py
run_tests.py
test_e2e_system_flows.py
test_health.py
test_integration_agent_communication.py
test_nodes.py
test_prompt_builder.py
test_retry_logging.py
test_tools_cancellation_rules.py
test_tools_data_lookup.py
test_tools_notice_generator.py
test_tools_refund_calculator.py
test_tools_refund_logger.py
test_utils.py

Unit Testing

Agent Node Test
The test validates each agent node's execution logic, output schema compliance, state transitions, and failure handling. It ensures that each agent produces structured output and that invalid inputs are rejected.
Result:
Prompt Construction Test
The test validates whether prompt templates are correctly formatted, whether required system instructions are included, and whether JSON schema constraints are injected. The purpose is to prevent prompt drift and regression when modifying templates.
Result:
Tool-Level Unit Tests
Each custom tool is tested independently:
3.1. test_tools_data_lookup.py

Valid policy lookup
Non-existent policy handling
Empty or invalid input handling

3.2. test_tools_cancellation_rules.py

Eligibility window validation
Business rule enforcement
Invalid cancellation conditions

3.3. test_tools_refund_calculator.py

Accurate refund calculation
Fee deduction logic
Rounding behavior

3.4. test_tools_refund_logger.py

Logging persistence behavior
Structured audit log creation
Failure handling

3.5. test_tools_notice_generator.py

Structured notice generation
Required fields included
Template formatting validation

Utils Tests
These tests cover missing files, empty paths, invalid YAML, and empty files to ensure that configuration loading is handled correctly.

Integration Testing

The integration tests validate orchestrator routing logic, agent-to-agent handoffs, tool invocation sequencing, state passing consistency, the multi-agent chain, and HITL resume handoffs. They ensure that agents communicate correctly, that no schema corruption occurs across steps, and that routing is deterministic.
Result: Screenshot 2026-03-02 at 3.11.21 PM.png

End-to-End (E2E) System Testing

The end-to-end test runs the full insurance cancellation graph (intake → analysis → HITL → refund → HITL → logger → summary) with mocked user input, HITL decisions, and LLM. It verifies that the entire flow completes and produces the expected outputs (refund log, PDF notice).
Result: Screenshot 2026-03-02 at 9.38.48 AM.png

⚙️ Deployment and Configuration Guide

Project Structure

Screenshot 2026-03-02 at 4.36.06 PM.png

Installation & Quick-Start Guide

Clone Repository

git clone https://github.com/jingozuo/AAIDC_Project3_JZ.git

Create Virtual Environment

python -m venv venv
source venv/bin/activate

Install Dependencies

pip install -r requirements.txt

Configure Environment

cp .env.example .env

Update Config File
Set the llm_model in config/config.yaml to match your provider (e.g. llama-3.3-70b-versatile for Groq).

llm_model: llama-3.3-70b-versatile

Run Application
From the project root:

python codes/main.py

Or from the codes directory:

cd codes && python main.py

Streamlit UI
Run the same workflow in a browser:

streamlit run codes/streamlit_app.py

Tests
Tests use pytest. Run from the project root:

pytest tests/test_e2e_system_flows.py

Run a single test, e.g. test_tools_data_lookup.py

pytest tests/test_tools_data_lookup.py

🎨 UI Specifications (Streamlit)

Overview

The system uses Streamlit to provide an interactive conversational workflow. The UI follows a step-based stateful flow, where the session state determines the next prompt and available user actions.

The interface designed includes:

Main conversational flow panel
Structured input fields
Dynamic confirmation buttons
Workflow status display
Left sidebar session controls

Workflow Demonstration

A demonstration video is attached to illustrate the full interaction flow.
The workflow proceeds as follows:

🧩 Resilience and Monitoring

The system implements a centralized retry, timeout, and logging mechanism to ensure production-grade resilience and operational traceability.

Operations that depend on external services—such as tool calls and LLM invocations—are executed through the call_with_retry() wrapper. This mechanism provides:

Automatic retries for transient failures
Exponential backoff between attempts
Configurable timeout controls
Structured JSON logging
Safe fallback responses when failures persist

This design improves system reliability while ensuring all operational events are recorded for monitoring and debugging.

Retry Strategy
All retry-enabled operations follow the same retry policy:

Maximum Attempts: 3
Backoff Strategy: Exponential
Base Delay: 1 second
Maximum Delay Cap: 60 seconds

If an operation fails, the system waits based on the backoff rule and retries. If all retry attempts fail, the retry is marked as exhausted, and fallback handling is triggered.
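
The policy above can be sketched as a small wrapper. The name call_with_retry comes from this document, but the signature, the doubling factor, and the on_event callback are assumptions made for this example rather than the project's implementation.

```python
import time


def call_with_retry(operation, max_attempts=3, base_delay=1.0, max_delay=60.0,
                    fallback=None, sleep=time.sleep, on_event=None):
    """Run `operation`; retry with exponential backoff, then fall back."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:  # any failure counts as retryable here
            last_error = exc
            if attempt == max_attempts:
                break
            # Delay doubles each time (1 s, 2 s, ...) and is capped.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            if on_event:
                on_event("retry_attempt", attempt, exc)
            sleep(delay)
    if on_event:
        on_event("retry_exhausted", max_attempts, last_error)
    return fallback
```

With the defaults, a call that keeps failing waits 1 second and then 2 seconds before the third and final attempt. Passing `sleep` as a parameter makes the wrapper easy to test without real delays.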

Timeout Management
To prevent long or stalled requests from blocking the system, timeout controls are implemented at multiple levels.

2.1. LLM Request Timeout:
The LLM initialization function (get_llm in codes/llm.py) supports an optional request_timeout parameter. This value limits how long the system waits for an LLM response before terminating the request.

The timeout can be configured in: config/config.yaml

Both the main application and the Streamlit interface read this value and pass it to the LLM client.

2.2. Retry Attempt Timeout:
The retry wrapper (call_with_retry) also supports a per-attempt timeout.

Each operation attempt runs in a separate thread. If execution exceeds the specified timeout:

A TimeoutError is raised
The retry mechanism treats it as a failure
The operation is retried according to the retry policy
This ensures that stalled operations do not block the workflow.
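
A thread-based per-attempt timeout can be sketched with concurrent.futures. The helper name run_with_timeout is hypothetical, and this is only one possible way to implement the behaviour described above.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError


def run_with_timeout(operation, timeout_seconds):
    """Run `operation` in a worker thread; raise TimeoutError if it is slow."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(operation)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeoutError:
        # Re-raise as the built-in TimeoutError so callers need one except clause.
        raise TimeoutError(f"operation exceeded {timeout_seconds} s") from None
    finally:
        # Do not wait for the worker: Python cannot kill a running thread,
        # so a stalled call keeps running in the background until it returns.
        executor.shutdown(wait=False)
```

Note the limitation in the last comment: the caller is unblocked after the timeout, but the underlying operation is abandoned rather than cancelled, so a request-level timeout on the LLM client remains useful alongside it.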

Failure Handling Behavior
3.1. Retry Attempt - When an operation fails but retry attempts remain:

A retry_attempt event is logged
The system waits according to backoff rules
The operation is re-executed

3.2. Retry Exhausted - If all attempts fail:

A retry_exhausted event is logged
Operation-specific fallback is triggered

3.3. Safe Fallback Mechanism
If the system encounters conditions such as:

Exceptions during execution
Invalid or empty responses
Output validation failure
LLM response errors

The system returns a predefined safe message:

"Your insurance cancellation has been processed. Please retain this notice for your records."

This prevents a system crash and ensures that partial or corrupted LLM output is never exposed to the user.

Logging and Monitoring
All retry attempts, failures, and fallback events are written to the structured log file: logs/guardrails_compliance.jsonl.
Each log entry includes structured JSON fields such as:

timestamp
event_type
stage
message
optional error
optional metadata

Events captured include:

Retry attempts
Retry exhaustion
Tool failures
Timeout errors
Fallback execution

This centralized logging design enables:

Easier debugging and incident investigation
Full traceability of system behaviour
Simplified operational monitoring and compliance auditing

Health Check
The system includes a health check feature to make sure everything is set up correctly before running.

Health Check includes:
5.1. Configuration files:

Checks that config/config.yaml and config/prompt_config.yaml exist
Verifies they are valid YAML
Confirms required settings are present (e.g. intake_assistant_prompt, summary_assistant_prompt)

5.2. Data file:

Checks that data/insurance_policies.csv exists and can be read

5.3. Directories:

Checks that relevant folders exist. If not, creates them.

Run the health check:

python codes/main.py --health

The output looks like:
Screenshot 2026-03-15 at 10.09.47 AM.png
These checks verify your configuration, data files, folders, and optional AI model connectivity.

💡 Technical Support & Troubleshooting Guide

This troubleshooting guide helps find and fix common problems when running the insurance cancellation system, whether using the command line, the Streamlit app, or the tests.

Common problems and their fixes are listed below:

Data or File Problems

Data file not found: Make sure data/insurance_policies.csv exists with the right columns.
Policy lookup always “not found”: Check that the policy number exists and that the CSV format is correct (remove extra spaces, uppercase).
Module not found errors: Make sure you are running from the project root folder.

Configuration Problems

Config file not found: Make sure config/config.yaml and config/prompt_config.yaml exist and you are running from the project root.
Invalid YAML: Fix formatting mistakes in the YAML file (check indentation and colons).
Empty config file: Add the required settings, like the AI model name.
LLM fails to start: Check the model name and API key; make sure they are correct.

AI Model / API Problems

LLM hangs or times out: Set a request timeout and allow retries; check your internet connection.
Unauthorized (401/403) errors: Make sure your API key is set in the .env file.
Rate limits: The system will retry automatically, but try reducing how many requests you send at once.

Overall, check your config files, data files, and API keys first, then look at the logs for retry or fallback events. The system will try to recover automatically when possible.

🚀 Conclusion

The Multi-Agent Insurance Cancellation System is a production-ready AI workflow designed to automate policy cancellation in a safe, structured, and auditable manner. The system combines modular multi-agent orchestration, strict JSON schema enforcement, Guardrails-based validation, and centralized retry and logging mechanisms to ensure reliability and compliance.

By enforcing deterministic routing, input sanitization, output filtering, and exponential backoff retry logic, the system minimizes hallucination risk, prevents malformed outputs, and gracefully handles failures without disrupting the user experience. All critical events — including validation checks, retry attempts, and fallback executions — are captured in structured compliance logs to support traceability and operational monitoring.

Comprehensive unit, integration, and end-to-end testing further ensures workflow integrity and resilience under real-world conditions.

Overall, the system demonstrates a robust approach to building secure, maintainable, and enterprise-ready agentic AI applications, prioritizing safety, reliability, and transparency over uncontrolled autonomy.