Abstract

Financial institutions must continuously screen entities for adverse media as part of KYC and AML compliance. Manual screening is slow and inconsistent, while simple keyword matching produces too many false positives and struggles with entity disambiguation.

I built a production-ready multi-agent system that automates FATF-compliant adverse media screening. Building on the Module 2 foundation, this version adds comprehensive testing, security guardrails, and a user-friendly interface—transforming a prototype into an enterprise-ready compliance tool. Four specialized AI agents work together to identify entities, search for relevant articles, classify them against FATF taxonomy, and resolve conflicts, delivering accurate and auditable results.

System Overview

I designed the system around agent specialization to solve adverse media screening. The Entity Disambiguation Agent accurately identifies entities using search, NER, and semantic matching. The Search Quality Agent finds relevant adverse media through progressive search strategies while filtering noise. The Classification Agent categorizes articles into six FATF categories (fraud, corruption, organized crime, terrorism, sanctions evasion, and other serious crimes) using dual-LLM validation. The Conflict Resolution Agent handles disagreements through rule-based logic, external validation, and LLM arbitration.

The workflow returns four outcomes: CLEAN (no adverse media), COMPLETED (adverse media found and classified), NEEDS_CONTEXT (more information needed), or MANUAL_REVIEW (requires human judgment). Each result includes confidence scores, reasoning, and audit trails.

For detailed information about the agent architecture, workflows, and methodology, see Module 2 publication. This publication focuses on production enhancements: testing, safety features, and user interface.

Testing Strategy

Production compliance systems require rigorous testing—errors can mean missed financial crimes or false accusations against innocent individuals. I implemented a comprehensive multi-layered testing approach that validates individual components, their interactions, and complete end-to-end workflows.

Testing Architecture: The Pyramid Approach

My test suite follows the testing pyramid pattern, balancing thorough coverage with fast execution. When tests fail, the failure location immediately indicates which system layer broke: unit test failures point to agent logic errors, integration test failures suggest data flow problems, and end-to-end test failures reveal orchestration issues.

I organized over 100 tests across 6 categories mirroring the system architecture. The bulk of testing focused on individual agent behavior and core workflow orchestration, validating everything from entity disambiguation and search to state management and error handling. Security received dedicated attention through comprehensive guardrail tests covering input validation, rate limiting, and prompt injection detection. Additional tests validated external API integrations, multi-agent collaboration, and complete end-to-end workflows.

Unit Testing: Realistic Scenarios

Unit tests validate that agents handle real-world challenges, not just toy examples. Consider a critical scenario: compliance analysts sometimes provide extensive context—hundreds of words of biographical information. This creates a technical challenge because language models have token limits.

I wrote tests specifically to verify the system handles extremely long inputs without crashing. See the test code example below:

This test validates that the system won't crash when analysts provide detailed context about complex entities—a scenario guaranteed to occur in production.

Integration Testing: Agent Collaboration

Individual agents working correctly is necessary but insufficient—they must collaborate effectively. Integration tests validate critical handoffs:

Disambiguation → Search: Resolved entities with extracted context enhance search query precision
Search → Classification: Complete article content flows to classification with all necessary metadata
Classification → Conflict Resolution: Dual-LLM outputs including reasoning and confidence scores enable informed arbitration

These integration tests caught several bugs that unit tests missed, including search queries not incorporating entity metadata and classification receiving truncated article content due to serialization issues.

System Testing: End-to-End Validation

System tests validate complete workflows from initial entity input through final assessment, exercising the full LangGraph orchestration. I tested realistic compliance scenarios including known adverse entities (high-profile fraud cases), clean entities (no adverse media found), ambiguous entities requiring additional context, API failures handled gracefully, and batch processing where one failure doesn't stop others.

Security Testing: Guardrail Validation

My 18 security tests validate protection against documented attack vectors including prompt injection (15+ attack patterns from OWASP AI Security guidance), XSS prevention (script tags and event handlers blocked), input validation (international names accepted, excessive length rejected), rate limiting enforcement, and output sanitization (API keys never exposed in error messages).

Coverage Metrics & Real Impact

Test coverage:

Overall: 85% of production code
Core workflow: 92%
Agent implementations: 88%
Guardrails: 95%
Service integrations: 78%

Bugs prevented by testing before deployment:

Disambiguation confidence comparison bug causing excessive manual review requests
Search deduplication failure where embedding similarity wasn't removing duplicate articles
Classification timeout when long articles exceeded token limits without truncation
State serialization error where complex nested objects didn't convert to JSON
Prompt injection vulnerability where Unicode homoglyphs disguised attack patterns

Each bug was caught during development rather than production, where impacts would have been significantly more costly.

Testing Philosophy

My approach reflects three principles for compliance software:

Test reality, not theory: I use realistic entity names creating disambiguation challenges, actual adverse media patterns from real investigations, and documented attack vectors from security research. Tests passing on toy examples but failing on realistic data provide false confidence.
Fail fast, fail clearly: Test failures immediately indicate what broke and where. This precision dramatically reduces debugging time during production incidents.
Trust but verify: Even obviously correct code gets tested. Confidence calculations, date parsing, and string sanitization all have tests because production systems have no room for assumptions about correctness.

This rigor gives compliance teams confidence that the system behaves correctly under production conditions including edge cases, service failures, and malicious inputs.

Security & Safety Guardrails

Production adverse media screening systems handle sensitive compliance decisions affecting individuals' financial access and reputations. A single misclassification could wrongly deny banking services or miss genuine financial crime indicators. Beyond accuracy, these systems face security threats including prompt injection attacks and data leakage risks. My guardrail implementation addresses these challenges through three interconnected layers: input validation, output validation, and bias detection.

Three-Layer Guardrail Architecture

My guardrail system operates at three critical workflow points. Input validation executes before processing begins, blocking malicious inputs that could compromise system behavior. Output validation runs after each agent completes work, verifying results meet quality standards and flagging low-confidence classifications for human review. Bias detection analyzes final classifications, identifying patterns indicating unfair treatment. Together, these layers create defense in depth where failure of one guardrail doesn't compromise overall integrity.

Input Validation: Blocking Malicious Inputs

Input validation serves as the first line of defense against attacks attempting to manipulate the system through crafted entity names or context. The system protects against several threat vectors.

Prompt injection attacks are blocked through pattern matching against 15+ documented attack vectors from OWASP AI Security guidance, including instruction override attempts ("Ignore previous instructions"), system prompt extraction, and Unicode-based obfuscation techniques.

XSS protection blocks script tags, event handlers, and JavaScript that could compromise the interface while allowing legitimate punctuation. "José O'Brien-García" passes validation while "John<script>alert(1)</script>Smith" is blocked.

Input sanitization accepts international names with Unicode characters, accommodates reasonable punctuation, normalizes whitespace, and enforces length limits preventing resource exhaustion.

Rate Limiting: Preventing Resource Abuse

Free-tier API services impose strict rate limits that cause processing failures if exceeded. My rate limiter prevents failures through intelligent request throttling and automatic retry logic.

I implemented service-specific limits tracking Groq (30 requests/min, 100K tokens/min), Together AI (60 requests/min), Brave Search (1 request/sec, 2K monthly), and Tavily (1K monthly). The limiter tracks usage across all dimensions simultaneously. Token estimation calculates expected token usage (approximately 4 characters per token) and waits proactively to stay within limits, preventing failed API calls. Retry logic with exponential backoff automatically retries up to 3 times when rate limits are hit, distinguishing temporary hits from permanent failures. Monthly tracking provides warnings at 80% usage, enabling proactive capacity management before quota exhaustion.

Output Validation: Ensuring Classification Quality

Output validation verifies agent results meet quality standards through multi-level checks. Required fields validation ensures every classification includes entity involvement type, adverse media category, confidence scores, and reasoning text. Logical consistency checking prevents contradictions like classifying entities as "neutral" but assigning "Fraud & Financial Crime" category, or marking perpetrators as "Not Adverse Media."

Evidence quality assessment requires serious classifications to cite specific facts. High-confidence classifications with short reasoning trigger warnings, and perfect confidence scores (0.99-1.0) raise suspicion of model overconfidence. Entity mention validation verifies the entity actually appears in article text—missing mentions indicate probable misattribution or disambiguation failure. Dual-LLM conflict detection flags disagreements between primary and secondary classifiers, recognizing that unanimous uncertainty still indicates unreliable results.

Bias Detection: Protecting Against Discrimination

Bias detection analyzes classification patterns that might indicate unfair treatment through evidence-aware multi-factor analysis.

I implemented name-based detection with evidence awareness, maintaining databases of Nigerian/African and Arabic/Middle Eastern name patterns. When these patterns appear in high-confidence perpetrator classifications, sophisticated evidence quality assessment activates, scoring across four dimensions: entity mentions (0-3 points), evidence keywords (0-3 points), reasoning length (0-2 points), and suspicious hedging (-2 points penalty).

Evidence-based severity assignment provides strong evidence (6+ points) low severity for statistical monitoring, moderate evidence (3-5 points) triggers warnings recommending review, and weak evidence (<3 points) with non-Western names generates mandatory human review alerts. Confidence pattern detection ensures very high confidence (>0.90) on serious accusations warrants extra scrutiny, as perfect scores suggest overconfidence or hallucination rather than genuine certainty. LLM disagreement detection assigns high severity to conflicts involving serious categories like terrorism or corruption, recognizing substantial uncertainty requiring human resolution.

Validation Results and Severity Levels

All guardrails use consistent severity taxonomy. Critical severity indicates fundamental failures invalidating results (missing fields, logical contradictions, entity misattribution) requiring immediate escalation to manual review. High severity signals serious concerns requiring human verification (low confidence on serious accusations, weak evidence, significant LLM disagreements). Medium severity suggests caution is warranted but doesn't mandate review (moderate evidence quality, suspicious patterns, minor inconsistencies). Low severity provides informational warnings for monitoring and auditing without blocking automated processing.

This graduated system enables risk-based processing where critical and high severity issues trigger immediate review while medium and low severity cases proceed with enhanced logging for compliance auditing.

Guardrails Philosophy

My implementation reflects three principles for responsible AI deployment in regulated environments.

Defense in depth: Multiple overlapping protective layers prevent single points of failure. Input validation blocks bad inputs, but output validation and bias detection catch issues that slip through.
Transparency and explainability: Every review requirement includes clear reasoning—"high confidence classification with weak evidence" rather than opaque "needs review" flags. This specificity helps human reviewers focus investigation.
Continuous improvement: Comprehensive logging enables learning from successes and failures. Every validation failure, bias indicator, and review requirement generates detailed logs for analyzing classification accuracy, refining validation rules, and training better models.

This rigorous guardrail architecture transforms the system from prototype to production-ready tool that compliance teams can trust with high-stakes decisions affecting real people's financial access and reputations.

User Interface Design and Implementation

To operationalize the multi-agent adverse media screening system for real-world compliance workflows, I developed a production-grade web interface using Streamlit. The interface balances regulatory requirements—audit trails, detailed reasoning, export capabilities—with user experience principles prioritizing clarity, speed, and actionable insights for compliance analysts.

Architecture and Design Philosophy

The UI implements a two-tab navigation pattern optimized for compliance analyst workflows: New Analysis for entity submission and real-time processing, and Results Dashboard for comprehensive result visualization and analysis history. This separation reflects the natural workflow of compliance teams who initiate screenings in batches then review results systematically.

A persistent sidebar provides system information, agent details, and contextual help—critical for production environments where multiple analysts with varying expertise levels operate the system.

Progressive disclosure input form with sidebar showing system configuration and expandable additional context fields including date of birth, occupation, location, and organization.

Entity Input: Progressive Disclosure

The input form balances simplicity with power-user flexibility. The basic interface requests entity name (required) and basic context (optional). An expandable "Additional Context" section reveals structured fields that dramatically improve disambiguation accuracy: date of birth, occupation, nationality, location, organization affiliation, and other identifiers.

This design accommodates both quick screenings (name-only) and high-stakes investigations requiring precise entity resolution. All inputs undergo server-side validation through my guardrails framework, with user-friendly error messages guiding corrections.

Results Dashboard: Status-Based Rendering

The results interface adapts dynamically based on processing outcomes, providing specialized views for each status category.

Clean Entity Results

For entities with no adverse media coverage, the interface emphasizes reassurance through visual hierarchy. The clean status view displays a prominent green checkmark with "Entity appears clean - No adverse media found", risk level badge showing "CLEAN" with "No Risk" indicator, search coverage metrics demonstrating thoroughness, and expandable assessment details supporting audit requirements.

Clean status results showing risk assessment, sources searched, and zero adverse articles found.

High-Risk Entity Results

When adverse media is detected, the interface shifts to action-oriented presentation with comprehensive analytical depth using a three-tier information architecture.

Tier 1 - Executive Summary: Risk level (HIGH/MEDIUM/LOW) with color-coded badges, key metrics (total articles, adverse count, percentage, sources searched), and alert banner stating "Significant adverse media found - Action required."
Tier 2 - Article Analysis Table: The core analytical interface designed for rapid triage, displaying status (adverse/clean indicators), article titles, entity involvement (perpetrator/victim/associate/witness), FATF category classification, confidence scores, and clickable source links. Articles are sortable by all columns with smart truncation preserving readability.

High-risk results showing article classification table with involvement types, FATF categories, and confidence scores.

Tier 3 - Detailed Analysis: Expandable article cards provide forensic-level detail required for compliance documentation, including final classification, resolution method (confidence-based, LLM arbitration, external validation), natural language reasoning explaining classification decisions, article preview with source link, and conflict information when dual-LLM disagreement occurred.

Category Breakdown and Export

For adverse findings, the interface displays FATF category distribution using metrics cards (e.g., "Fraud & Financial Crime: 2 articles, Other Serious Crimes: 1 article"). This visualization immediately communicates the nature and diversity of adverse findings—an entity with 5 articles across multiple FATF categories represents fundamentally different risk than 5 articles about a single historic incident.

FATF category distribution and multi-format export options (JSON, CSV, Summary).

I designed three export formats optimized for different downstream uses. JSON Export provides complete machine-readable output with metadata, confidence scores, and reasoning chains for integration with case management systems. CSV Export delivers article-level tabular data for spreadsheet analysis including classifications and confidence scores. Summary Report offers human-readable text summary for compliance documentation and regulatory reporting. Each export includes timestamped filenames with entity identifiers ensuring organization and traceability.

Production Features

Beyond visual design, the interface implements production-grade operational features.

Input validation and security ensures real-time client-side validation with server-side guardrails preventing prompt injection, XSS attacks, and malformed data. Maximum field lengths prevent resource exhaustion.
Error handling provides graceful degradation when backend services fail, user-friendly error messages without exposing system internals, and automatic retry suggestions for transient failures.
Session management enables persistent session state for tab switching without data loss, analysis history tracking, and workflow state preservation during context request scenarios.
Performance optimization includes spinner indicators during processing (typically 45-90 seconds), progressive disclosure preventing UI overload, lazy loading of detailed content, and caching of static resources.
Accessibility implements high-contrast color schemes meeting WCAG AA standards, semantic HTML for screen reader compatibility, keyboard navigation support, and responsive layout.

Design Impact on Compliance Workflows

My UI design choices directly support real-world compliance operations. The interface reduces false positives through context request workflows while accelerating triage via risk-stratified results. Comprehensive reasoning trails support audit requirements, analysis history enables batch processing, and clear confidence scores with conflict indicators facilitate escalation decisions.

Through this production-grade interface, the multi-agent adverse media system transitions from research prototype to operationally deployable compliance tool ready for integration into institutional KYC/AML workflows.