Financial institutions must continuously screen entities for adverse media as part of KYC and AML compliance. Manual screening is slow and inconsistent, while simple keyword matching produces too many false positives and struggles with entity disambiguation.
I built a production-ready multi-agent system that automates FATF-compliant adverse media screening. Building on the Module 2 foundation, this version adds comprehensive testing, security guardrails, and a user-friendly interface—transforming a prototype into an enterprise-ready compliance tool. Four specialized AI agents work together to identify entities, search for relevant articles, classify them against FATF taxonomy, and resolve conflicts, delivering accurate and auditable results.
I designed the system around agent specialization to solve adverse media screening. The Entity Disambiguation Agent accurately identifies entities using search, NER, and semantic matching. The Search Quality Agent finds relevant adverse media through progressive search strategies while filtering noise. The Classification Agent categorizes articles into six FATF categories (fraud, corruption, organized crime, terrorism, sanctions evasion, and other serious crimes) using dual-LLM validation. The Conflict Resolution Agent handles disagreements through rule-based logic, external validation, and LLM arbitration.
The workflow returns four outcomes: CLEAN (no adverse media), COMPLETED (adverse media found and classified), NEEDS_CONTEXT (more information needed), or MANUAL_REVIEW (requires human judgment). Each result includes confidence scores, reasoning, and audit trails.
For detailed information about the agent architecture, workflows, and methodology, see Module 2 publication. This publication focuses on production enhancements: testing, safety features, and user interface.
Production compliance systems require rigorous testing—errors can mean missed financial crimes or false accusations against innocent individuals. I implemented a comprehensive multi-layered testing approach that validates individual components, their interactions, and complete end-to-end workflows.
My test suite follows the testing pyramid pattern, balancing thorough coverage with fast execution. When tests fail, the failure location immediately indicates which system layer broke: unit test failures point to agent logic errors, integration test failures suggest data flow problems, and end-to-end test failures reveal orchestration issues.
I organized over 100 tests across 6 categories mirroring the system architecture. The bulk of testing focused on individual agent behavior and core workflow orchestration, validating everything from entity disambiguation and search to state management and error handling. Security received dedicated attention through comprehensive guardrail tests covering input validation, rate limiting, and prompt injection detection. Additional tests validated external API integrations, multi-agent collaboration, and complete end-to-end workflows.
Unit tests validate that agents handle real-world challenges, not just toy examples. Consider a critical scenario: compliance analysts sometimes provide extensive context—hundreds of words of biographical information. This creates a technical challenge because language models have token limits.
I wrote tests specifically to verify the system handles extremely long inputs without crashing. See the test code example below:

This test validates that the system won't crash when analysts provide detailed context about complex entities—a scenario guaranteed to occur in production.
Individual agents working correctly is necessary but insufficient—they must collaborate effectively. Integration tests validate critical handoffs:
These integration tests caught several bugs that unit tests missed, including search queries not incorporating entity metadata and classification receiving truncated article content due to serialization issues.
System tests validate complete workflows from initial entity input through final assessment, exercising the full LangGraph orchestration. I tested realistic compliance scenarios including known adverse entities (high-profile fraud cases), clean entities (no adverse media found), ambiguous entities requiring additional context, API failures handled gracefully, and batch processing where one failure doesn't stop others.
My 18 security tests validate protection against documented attack vectors including prompt injection (15+ attack patterns from OWASP AI Security guidance), XSS prevention (script tags and event handlers blocked), input validation (international names accepted, excessive length rejected), rate limiting enforcement, and output sanitization (API keys never exposed in error messages).
Test coverage:
Bugs prevented by testing before deployment:
Each bug was caught during development rather than production, where impacts would have been significantly more costly.
My approach reflects three principles for compliance software:
This rigor gives compliance teams confidence that the system behaves correctly under production conditions including edge cases, service failures, and malicious inputs.
Production adverse media screening systems handle sensitive compliance decisions affecting individuals' financial access and reputations. A single misclassification could wrongly deny banking services or miss genuine financial crime indicators. Beyond accuracy, these systems face security threats including prompt injection attacks and data leakage risks. My guardrail implementation addresses these challenges through three interconnected layers: input validation, output validation, and bias detection.
My guardrail system operates at three critical workflow points. Input validation executes before processing begins, blocking malicious inputs that could compromise system behavior. Output validation runs after each agent completes work, verifying results meet quality standards and flagging low-confidence classifications for human review. Bias detection analyzes final classifications, identifying patterns indicating unfair treatment. Together, these layers create defense in depth where failure of one guardrail doesn't compromise overall integrity.
Input validation serves as the first line of defense against attacks attempting to manipulate the system through crafted entity names or context. The system protects against several threat vectors.
Prompt injection attacks are blocked through pattern matching against 15+ documented attack vectors from OWASP AI Security guidance, including instruction override attempts ("Ignore previous instructions"), system prompt extraction, and Unicode-based obfuscation techniques.

XSS protection blocks script tags, event handlers, and JavaScript that could compromise the interface while allowing legitimate punctuation. "José O'Brien-García" passes validation while "John<script>alert(1)</script>Smith" is blocked.
Input sanitization accepts international names with Unicode characters, accommodates reasonable punctuation, normalizes whitespace, and enforces length limits preventing resource exhaustion.
Free-tier API services impose strict rate limits that cause processing failures if exceeded. My rate limiter prevents failures through intelligent request throttling and automatic retry logic.
I implemented service-specific limits tracking Groq (30 requests/min, 100K tokens/min), Together AI (60 requests/min), Brave Search (1 request/sec, 2K monthly), and Tavily (1K monthly). The limiter tracks usage across all dimensions simultaneously. Token estimation calculates expected token usage (approximately 4 characters per token) and waits proactively to stay within limits, preventing failed API calls. Retry logic with exponential backoff automatically retries up to 3 times when rate limits are hit, distinguishing temporary hits from permanent failures. Monthly tracking provides warnings at 80% usage, enabling proactive capacity management before quota exhaustion.
Output validation verifies agent results meet quality standards through multi-level checks. Required fields validation ensures every classification includes entity involvement type, adverse media category, confidence scores, and reasoning text. Logical consistency checking prevents contradictions like classifying entities as "neutral" but assigning "Fraud & Financial Crime" category, or marking perpetrators as "Not Adverse Media."
Evidence quality assessment requires serious classifications to cite specific facts. High-confidence classifications with short reasoning trigger warnings, and perfect confidence scores (0.99-1.0) raise suspicion of model overconfidence. Entity mention validation verifies the entity actually appears in article text—missing mentions indicate probable misattribution or disambiguation failure. Dual-LLM conflict detection flags disagreements between primary and secondary classifiers, recognizing that unanimous uncertainty still indicates unreliable results.
Bias detection analyzes classification patterns that might indicate unfair treatment through evidence-aware multi-factor analysis.
I implemented name-based detection with evidence awareness, maintaining databases of Nigerian/African and Arabic/Middle Eastern name patterns. When these patterns appear in high-confidence perpetrator classifications, sophisticated evidence quality assessment activates, scoring across four dimensions: entity mentions (0-3 points), evidence keywords (0-3 points), reasoning length (0-2 points), and suspicious hedging (-2 points penalty).
Evidence-based severity assignment provides strong evidence (6+ points) low severity for statistical monitoring, moderate evidence (3-5 points) triggers warnings recommending review, and weak evidence (<3 points) with non-Western names generates mandatory human review alerts. Confidence pattern detection ensures very high confidence (>0.90) on serious accusations warrants extra scrutiny, as perfect scores suggest overconfidence or hallucination rather than genuine certainty. LLM disagreement detection assigns high severity to conflicts involving serious categories like terrorism or corruption, recognizing substantial uncertainty requiring human resolution.
All guardrails use consistent severity taxonomy. Critical severity indicates fundamental failures invalidating results (missing fields, logical contradictions, entity misattribution) requiring immediate escalation to manual review. High severity signals serious concerns requiring human verification (low confidence on serious accusations, weak evidence, significant LLM disagreements). Medium severity suggests caution is warranted but doesn't mandate review (moderate evidence quality, suspicious patterns, minor inconsistencies). Low severity provides informational warnings for monitoring and auditing without blocking automated processing.
This graduated system enables risk-based processing where critical and high severity issues trigger immediate review while medium and low severity cases proceed with enhanced logging for compliance auditing.
My implementation reflects three principles for responsible AI deployment in regulated environments.
This rigorous guardrail architecture transforms the system from prototype to production-ready tool that compliance teams can trust with high-stakes decisions affecting real people's financial access and reputations.
To operationalize the multi-agent adverse media screening system for real-world compliance workflows, I developed a production-grade web interface using Streamlit. The interface balances regulatory requirements—audit trails, detailed reasoning, export capabilities—with user experience principles prioritizing clarity, speed, and actionable insights for compliance analysts.
The UI implements a two-tab navigation pattern optimized for compliance analyst workflows: New Analysis for entity submission and real-time processing, and Results Dashboard for comprehensive result visualization and analysis history. This separation reflects the natural workflow of compliance teams who initiate screenings in batches then review results systematically.
A persistent sidebar provides system information, agent details, and contextual help—critical for production environments where multiple analysts with varying expertise levels operate the system.

Progressive disclosure input form with sidebar showing system configuration and expandable additional context fields including date of birth, occupation, location, and organization.
The input form balances simplicity with power-user flexibility. The basic interface requests entity name (required) and basic context (optional). An expandable "Additional Context" section reveals structured fields that dramatically improve disambiguation accuracy: date of birth, occupation, nationality, location, organization affiliation, and other identifiers.
This design accommodates both quick screenings (name-only) and high-stakes investigations requiring precise entity resolution. All inputs undergo server-side validation through my guardrails framework, with user-friendly error messages guiding corrections.
The results interface adapts dynamically based on processing outcomes, providing specialized views for each status category.
For entities with no adverse media coverage, the interface emphasizes reassurance through visual hierarchy. The clean status view displays a prominent green checkmark with "Entity appears clean - No adverse media found", risk level badge showing "CLEAN" with "No Risk" indicator, search coverage metrics demonstrating thoroughness, and expandable assessment details supporting audit requirements.

Clean status results showing risk assessment, sources searched, and zero adverse articles found.
When adverse media is detected, the interface shifts to action-oriented presentation with comprehensive analytical depth using a three-tier information architecture.

High-risk results showing article classification table with involvement types, FATF categories, and confidence scores.
For adverse findings, the interface displays FATF category distribution using metrics cards (e.g., "Fraud & Financial Crime: 2 articles, Other Serious Crimes: 1 article"). This visualization immediately communicates the nature and diversity of adverse findings—an entity with 5 articles across multiple FATF categories represents fundamentally different risk than 5 articles about a single historic incident.

FATF category distribution and multi-format export options (JSON, CSV, Summary).
I designed three export formats optimized for different downstream uses. JSON Export provides complete machine-readable output with metadata, confidence scores, and reasoning chains for integration with case management systems. CSV Export delivers article-level tabular data for spreadsheet analysis including classifications and confidence scores. Summary Report offers human-readable text summary for compliance documentation and regulatory reporting. Each export includes timestamped filenames with entity identifiers ensuring organization and traceability.
Beyond visual design, the interface implements production-grade operational features.
My UI design choices directly support real-world compliance operations. The interface reduces false positives through context request workflows while accelerating triage via risk-stratified results. Comprehensive reasoning trails support audit requirements, analysis history enables batch processing, and clear confidence scores with conflict indicators facilitate escalation decisions.
Through this production-grade interface, the multi-agent adverse media system transitions from research prototype to operationally deployable compliance tool ready for integration into institutional KYC/AML workflows.