Cross-Check is an advanced phishing detection framework powered by Large Language Models (LLMs). Built using Google's Agent Development Kit (ADK) and Mesop, it implements a "debate" mechanism where multiple specialized AI agents analyze a website from different perspectives before reaching a consensus on its legitimacy.
This project serves as the capstone submission for the Agentic AI Developer Certification (Module 3). It aims to reduce the impact of LLM hallucinations by having agents cross-examine one another's findings, and it pairs that design with containerized deployment and automated testing.
The Challenge: Single-Point Failure
Traditional phishing detection often relies on single-point analysis: asking one model, "Is this phishing?" This approach is prone to hallucinations and misjudgments. A sophisticated phishing site might look entirely convincing to a single LLM, while a legitimate site might be flagged because of benign anomalies. Building a reliable system means moving beyond one-shot inference towards a panel of experts that debates the evidence.
Cross-Check is built as an ADK SequentialAgent pipeline with a debate loop at its center. Every request passes through three stages in order: deterministic pre-processing, the multi-agent debate, and a final judgement.
Before any AI analysis occurs, the UrlPreProcessor agent executes deterministic validation. It validates the URL format, verifies reachability, and scrapes the target website to extract clean HTML and visible text. This ensures that all subsequent agents analyze the exact same snapshot of the site and prevents wasted tokens on invalid inputs.
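The deterministic checks described above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's actual UrlPreProcessor: the function names `is_valid_url` and `extract_visible_text` are hypothetical, and the reachability check and page download are left out because they need network access.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Return True for absolute http(s) URLs that include a host (hypothetical helper)."""
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        # urlparse raises on malformed input such as an unterminated IPv6 literal.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.hostname)


class _VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping tags whose content is never rendered."""

    _HIDDEN = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._hidden_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self._HIDDEN:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self._HIDDEN and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if self._hidden_depth == 0 and text:
            self.chunks.append(text)


def extract_visible_text(html: str) -> str:
    """Return the rendered text of an HTML document as a single string."""
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Rejecting malformed input at this step costs no tokens, and handing every downstream agent the same extracted text keeps the debate anchored to a single snapshot of the page.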
The core of the system is the LoopAgent, which convenes a panel of four specialized experts to debate the findings:
| Agent | Focus |
| --- | --- |
| URL Analyst | Examines domain patterns, typosquatting, subdomain usage, and TLD characteristics. |
| HTML Structure Analyst | Inspects the code for hidden elements, obfuscated scripts, suspicious input fields, and deceptive redirection patterns. |
| Content Semantic Analyst | Analyzes visible text for manipulative language, requests for sensitive information, and social engineering tactics. |
| Brand Impersonation Analyst | Detects mismatches between the brand identity (e.g., Apple, PayPal) and the actual URL/content. |
These agents submit their findings to a Moderator, who evaluates whether a consensus exists. If the specialists disagree, the Moderator triggers another round, in which each agent refines its argument in light of peer feedback.
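The control flow of this debate can be sketched in plain Python, independent of ADK. Everything named here is an assumption made for illustration: the `Finding` record, the unanimity rule in `has_consensus`, the `max_rounds` cap, and the two stub specialists that stand in for LLM calls. In Cross-Check itself the Moderator is an agent, so its notion of consensus may be richer than a simple unanimity check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    agent: str
    verdict: str  # "PHISHING" or "LEGITIMATE"
    rationale: str


# A specialist sees the shared evidence plus the previous round's findings.
Specialist = Callable[[str, list[Finding]], Finding]


def has_consensus(findings: list[Finding]) -> bool:
    """Assumed moderator rule: consensus means every specialist agrees."""
    return len({f.verdict for f in findings}) == 1


def run_debate(
    evidence: str, specialists: list[Specialist], max_rounds: int = 3
) -> list[list[Finding]]:
    """Run rounds until the panel agrees or the round budget is spent."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    history: list[list[Finding]] = []
    previous: list[Finding] = []
    for _ in range(max_rounds):
        current = [specialist(evidence, previous) for specialist in specialists]
        history.append(current)
        if has_consensus(current):
            break
        previous = current  # peer feedback for the next round
    return history


# Stub specialists standing in for LLM-backed agents.
def always_phishing(evidence: str, previous: list[Finding]) -> Finding:
    return Finding("url_analyst", "PHISHING", "typosquatted domain")


def persuadable(evidence: str, previous: list[Finding]) -> Finding:
    if any(f.verdict == "PHISHING" for f in previous):
        return Finding("content_analyst", "PHISHING", "agrees after peer feedback")
    return Finding("content_analyst", "LEGITIMATE", "text looks benign")
```

Calling `run_debate("snapshot", [always_phishing, persuadable])` returns two rounds: the stubs disagree in the first and converge in the second. The round cap matters because a panel that never agrees would otherwise loop indefinitely; the full history is returned so a later stage can review every argument.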
Once the debate concludes, a distinct JudgementAgent reviews the entire conversation history. It weighs the final arguments from all specialists and delivers the authoritative PHISHING or LEGITIMATE verdict.
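Because the final verdict is binary, the free-text output of a judging model has to be reduced to exactly one of two labels. The sketch below shows one strict way to do that. It assumes the judge is prompted to end its answer with a line such as `VERDICT: PHISHING`; that line format and the `parse_verdict` helper are hypothetical and are not taken from the project.

```python
import re

# Assumed output contract: exactly one line of the form "VERDICT: <LABEL>".
_VERDICT_LINE = re.compile(
    r"^\s*VERDICT:\s*(PHISHING|LEGITIMATE)\s*$",
    flags=re.IGNORECASE | re.MULTILINE,
)


def parse_verdict(judge_output: str) -> str:
    """Return "PHISHING" or "LEGITIMATE", or raise if the output is ambiguous."""
    labels = {match.upper() for match in _VERDICT_LINE.findall(judge_output)}
    if len(labels) != 1:
        raise ValueError("expected exactly one unambiguous VERDICT line")
    return labels.pop()
```

Raising on a missing or contradictory verdict, rather than guessing, keeps an ambiguous model response from being silently reported as a confident classification.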
Cross-Check is engineered to meet professional software standards, ensuring it is testable, portable, and resilient.
The application is fully containerized using Docker. The Dockerfile implements best practices by using uv for fast, frozen dependency management and creating a non-root mesop user for security compliance. This makes the system immediately deployable to environments like Hugging Face Spaces or Kubernetes.
Reliability is checked through a multi-layered testing strategy that goes beyond unit tests:
| Layer | Description |
| --- | --- |
| Integration & Evaluation | The system includes a dedicated eval suite that uses the AgentEvaluator to run full end-to-end integration tests. Testing against structured datasets (legitimate.evalset.json and phishing.evalset.json) makes it possible to benchmark the system's actual detection performance and confirm that the "debate" mechanism functions correctly on real-world examples. |
| Unit Tests | Individual components, such as the UrlPreProcessor and utility functions, are verified using pytest to ensure robust error handling and correct data parsing. |
| CI/CD Pipeline | A GitHub Actions workflow (tests.yml) automatically executes the entire unit test suite on every push to catch regressions early. |
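To make "detection performance" concrete, the sketch below computes accuracy, precision, and recall from pairs of expected and predicted verdicts, treating PHISHING as the positive class. It is an illustrative calculation only: it does not reproduce how the AgentEvaluator scores a run, it assumes nothing about the evalset file schema, and the `DetectionMetrics` and `score` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionMetrics:
    accuracy: float
    precision: float
    recall: float


def score(pairs: list[tuple[str, str]]) -> DetectionMetrics:
    """Score (expected, predicted) verdict pairs with PHISHING as the positive class."""
    if not pairs:
        raise ValueError("at least one labelled example is required")
    tp = sum(1 for exp, pred in pairs if exp == "PHISHING" and pred == "PHISHING")
    fp = sum(1 for exp, pred in pairs if exp == "LEGITIMATE" and pred == "PHISHING")
    fn = sum(1 for exp, pred in pairs if exp == "PHISHING" and pred == "LEGITIMATE")
    tn = sum(1 for exp, pred in pairs if exp == "LEGITIMATE" and pred == "LEGITIMATE")
    # Guard against division by zero when a class never appears.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return DetectionMetrics(
        accuracy=(tp + tn) / len(pairs),
        precision=precision,
        recall=recall,
    )
```

Reporting precision and recall alongside accuracy separates the two failure modes that matter here: legitimate sites flagged as phishing lower precision, and phishing sites that slip through lower recall.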
Cross-Check demonstrates what Agentic AI can do when applied with engineering rigor. By simulating a human expert panel of analysts, a moderator, and a judge, it provides a transparent and robust defense against sophisticated phishing attacks, wrapped in a production-ready architecture.
You can explore the project here: Cross-Check on GitHub
PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection
Wenhao Li, Selvakumar Manickam, Yung-Wey Chong, Shankar Karuppayah
https://arxiv.org/abs/2506.15656