Phishing Analyzer
Multi-Agent Email Security System

A SOC-grade phishing detection system that analyzes raw .eml email files using a multi-agent architecture, producing deterministic risk scores and actions such as Allow, Flag, or Quarantine.
Designed to simulate real-world enterprise email security pipelines.
Key Highlights
- Analyze raw .eml email files
- Multi-agent architecture (Header, Content, URL, Domain, Attachment)
- SOC-style cross-agent correlation
- Deterministic risk scoring (0–100)
- Final actions: Allow / Flag / Quarantine
- Demo mode with phishing samples
- Real credential-phishing detection
- Interactive Streamlit UI
- 70%+ test coverage with pytest
System Architecture
The system processes emails using independent detection agents, then correlates their findings using SOC-style logic.

Project Structure
phishing-analyzer-prod/
│
├── __init__.py
├── logging_config.py
├── app/
│   └── app.py                  # Streamlit UI
│
├── phishing_analyzer/
│   ├── agents/
│   │   ├── ingestion.py
│   │   ├── header_agent.py
│   │   ├── content_agent.py
│   │   ├── url_agent.py
│   │   ├── domain_agent.py
│   │   ├── attachment_agent.py
│   │   ├── risk_agent.py
│   │   └── reporter_agent.py
│   │
│   ├── orchestration/
│   │   └── prefect_flow.py
│   │
│   ├── tools/
│   │   ├── url_tool.py
│   │   ├── attachment_tool.py
│   │   └── virustotal_tool.py
│   │
│   ├── config/
│   │   └── risk_config.py
│   │
│   ├── safety/
│   │   └── guardrails.py
│   │
│   └── utils/
│       └── resilience.py
│
├── samples/
│   ├── dhl_delivery_failure_phish.eml
│   ├── microsoft_password_reset_phish.eml
│   ├── Updates to how privacy settings work on Play.eml
│   └── Help shape Advent of Cyber 2026 ๐.eml
│
├── images/
│   ├── architecture.png
│   └── title.png
│
├── tests/
│   └── unit/
│
├── README.md
├── .env
├── requirements.txt
└── pyproject.toml
Analysis Flow
- Raw .eml email is ingested
- Email is parsed into structured components
- Each agent analyzes its own signal independently
- Agents return risk scores + indicators
- Risk Agent applies cross-agent correlation
- Final decision is produced:
  - Score
  - Severity
  - Action
  - Confidence
No agent can directly allow or block an email on its own.
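The flow above can be sketched as a small fan-out over independent agents. This is only an illustration: `AgentResult`, `run_agents`, and the two toy agents are hypothetical names, and the scores are made up rather than taken from the project's configuration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentResult:
    """What each detection agent hands back: a risk score plus the reasons for it."""
    agent: str
    risk: int
    indicators: list[str] = field(default_factory=list)


def run_agents(email: dict, agents: dict[str, Callable[[dict], AgentResult]]) -> list[AgentResult]:
    # Every agent sees the same parsed email and reports independently;
    # none of them decides the final action.
    return [agent(email) for agent in agents.values()]


# Toy stand-ins for the real agents (hypothetical weights).
def toy_content_agent(email: dict) -> AgentResult:
    hit = "password" in email.get("body", "").lower()
    return AgentResult("content", 40 if hit else 0, ["Password reset language"] if hit else [])


def toy_url_agent(email: dict) -> AgentResult:
    urls = email.get("urls", [])
    return AgentResult("url", 20 if urls else 0, ["URL present"] if urls else [])


results = run_agents(
    {"body": "Reset your password now", "urls": ["http://example.test/reset"]},
    {"content": toy_content_agent, "url": toy_url_agent},
)
```

The list of results is what a correlation step would consume; keeping the agents as plain callables makes each one testable in isolation.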
Agents Overview
Ingestion Agent
- Parses .eml files
- Extracts:
  - Email body
  - URLs
  - Attachments
  - Sender & domain
Header Agent
- Detects:
  - Brand impersonation
  - Sender spoofing indicators
- Adds risk for suspicious headers
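The parsing and extraction step described for the Ingestion Agent can be approximated with Python's standard `email` package. The function name `ingest_eml`, the URL regex, and the returned dictionary shape are assumptions for this sketch, not the project's actual interface.

```python
import re
from email import message_from_bytes, policy
from email.utils import parseaddr

# Deliberately simple URL pattern; a real extractor would be stricter.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def ingest_eml(raw: bytes) -> dict:
    """Parse raw .eml bytes into the components the other agents need."""
    msg = message_from_bytes(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    _, address = parseaddr(str(msg.get("From", "")))
    domain = address.rsplit("@", 1)[-1].lower() if "@" in address else ""
    return {
        "from": address,
        "domain": domain,
        "body": body,
        "urls": URL_RE.findall(body),
        # Only file names are collected; attachment payloads are never executed.
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


sample = (
    b"From: DHL Express <noreply@dhl-track-support.com>\r\n"
    b"Subject: Delivery failed\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Reschedule at http://dhl-track-support.com/login today.\r\n"
)
parsed = ingest_eml(sample)
```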
Content Agent
- Detects credential phishing
- Looks for:
  - Password reset language
  - Urgency & coercion
  - Brand impersonation keywords
- Assigns a non-zero risk score whenever these signals are found
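A keyword-driven check along these lines might look as follows. The phrase patterns and weights in `SIGNALS` are invented for illustration; the project's real rules and scores are not shown in this README.

```python
import re

# Hypothetical patterns and weights, chosen only for this example.
SIGNALS = {
    "Password reset language": (r"\b(reset|verify|confirm) your (password|account)\b", 30),
    "Urgency or coercion": (r"\b(urgent|immediately|within 24 hours|suspended)\b", 20),
    "Brand impersonation keyword": (r"\b(microsoft|dhl|paypal)\b", 15),
}


def analyze_content(body: str) -> dict:
    """Return a capped risk score and the indicators that produced it."""
    indicators, risk = [], 0
    for label, (pattern, weight) in SIGNALS.items():
        if re.search(pattern, body, flags=re.IGNORECASE):
            indicators.append(label)
            risk += weight
    return {"risk": min(risk, 100), "indicators": indicators}
```

Because every point of risk is tied to a named indicator, the result stays explainable without any model inference.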
URL Agent
- Detects:
  - Malformed / obfuscated URLs
  - URL shorteners
  - Suspicious URL keywords
- Works even without VirusTotal
- Still contributes risk in demo mode, using heuristics alone
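The three URL heuristics can be sketched with `urllib.parse` alone, which is why no reputation service is needed. The shortener set, keyword list, and the rule treating embedded credentials (`user@host`) as malformed are assumptions made for this example.

```python
from urllib.parse import urlparse

# Illustrative lists; the project's real lists may differ.
SHORTENERS = {"bit.ly", "tinyurl.com", "t.co"}
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "update")


def analyze_url(url: str) -> list[str]:
    """Return the heuristic indicators raised by a single URL."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse rejects some inputs outright, e.g. broken IPv6 literals.
        return ["Malformed URL detected"]
    host = (parsed.hostname or "").lower()
    indicators = []
    if parsed.scheme not in ("http", "https") or not host or "@" in parsed.netloc:
        indicators.append("Malformed URL detected")
    if host in SHORTENERS:
        indicators.append("URL shortener detected")
    if any(word in url.lower() for word in SUSPICIOUS_KEYWORDS):
        indicators.append("Suspicious URL keyword")
    return indicators
```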
Domain Agent
- Checks:
  - Domain age (WHOIS)
  - Recently registered domains
- Correlation triggers even when domain age is unknown
- Optional VirusTotal reputation lookup
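One way to handle a missing WHOIS answer is to treat "unknown" as an explicit state rather than an error. In the sketch below the WHOIS lookup itself is left out: the function takes a creation date that may be `None`. The 30-day cutoff, the risk values, and the name `score_domain_age` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

RECENT_DAYS = 30  # hypothetical cutoff for "recently registered"


def score_domain_age(created: Optional[datetime], now: Optional[datetime] = None) -> dict:
    """Score a domain by age; an unknown age is reported, not treated as a failure."""
    now = now or datetime.now(timezone.utc)
    if created is None:
        # No risk is added here, but the indicator stays visible so that
        # downstream correlation rules can still take it into account.
        return {"age_days": None, "risk": 0, "indicators": ["Domain age unknown"]}
    age_days = (now - created).days
    if age_days < RECENT_DAYS:
        return {"age_days": age_days, "risk": 30, "indicators": ["Recently registered domain"]}
    return {"age_days": age_days, "risk": 0, "indicators": []}
```

The `age_days: None` case matches the `"age_days": null` shape seen in the sample output.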
Attachment Agent
- Flags risky attachment types
- Optional hash-based VirusTotal lookup
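Both checks can be done without opening the file: the extension comes from the file name and the hash from the raw bytes. The extension set and the function name `inspect_attachment` are assumptions; the SHA-256 digest is what a hash-based reputation lookup would typically be keyed on.

```python
import hashlib
from pathlib import PurePath

# Illustrative set of extensions often treated as risky.
RISKY_EXTENSIONS = {".exe", ".js", ".vbs", ".scr", ".iso", ".docm"}


def inspect_attachment(filename: str, payload: bytes) -> dict:
    """Flag risky file types and compute a hash; the payload is never executed."""
    extension = PurePath(filename).suffix.lower()
    indicators = []
    if extension in RISKY_EXTENSIONS:
        indicators.append(f"Risky attachment type: {extension}")
    return {"sha256": hashlib.sha256(payload).hexdigest(), "indicators": indicators}
```

Using only the final suffix means a double extension such as `invoice.pdf.exe` is still judged by `.exe`.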
Risk Agent (Core Intelligence)
- Aggregates all agent risks
- Applies SOC-style correlation, for example:
  - Content phishing + URL → boosted risk
  - Content phishing + attachment → boosted risk
- Produces:
  - Final score
  - Severity
  - Action
  - Confidence
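Aggregation plus the two example correlation rules could be expressed roughly like this. The flat sum, the +15 boost, and the function name `correlate` are assumptions; the project's real weights live in its risk configuration and are not shown here.

```python
CORRELATION_BOOST = 15  # hypothetical boost per triggered rule


def correlate(results: dict[str, dict]) -> dict:
    """Sum per-agent risk, then add a boost for each matching correlation rule."""
    score = sum(result["risk"] for result in results.values())
    triggered = []

    def fired(agent: str) -> bool:
        return results.get(agent, {}).get("risk", 0) > 0

    if fired("content") and fired("url"):
        score += CORRELATION_BOOST
        triggered.append("Content phishing + URL")
    if fired("content") and fired("attachment"):
        score += CORRELATION_BOOST
        triggered.append("Content phishing + attachment")
    # Clamp to the documented 0-100 range and keep the rule names for auditing.
    return {"score": min(score, 100), "correlations": triggered}
```

Returning the names of the triggered rules alongside the score keeps every boost traceable.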
Risk Thresholds
| Score Range | Severity | Action |
|---|---|---|
| 0–49 | Info | Allow |
| 50–69 | Medium | Flag |
| 70–100 | High | Quarantine |
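The threshold table translates directly into a lookup. The function name `decide` is hypothetical; the boundaries and labels are the ones listed in the table.

```python
def decide(score: int) -> tuple[str, str]:
    """Map a 0-100 risk score to (severity, action) using the documented thresholds."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 70:
        return ("High", "Quarantine")
    if score >= 50:
        return ("Medium", "Flag")
    return ("Info", "Allow")
```

For the DHL sample with a score of 90, `decide(90)` yields `("High", "Quarantine")`, matching the sample output.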
Demo Mode vs Real-World Mode
Demo Mode (Default)
- VirusTotal optional
- Uses heuristic and structural analysis
- Safe for classrooms, demos, GitHub, interviews
- Still produces real phishing decisions
Real-World Mode
- Enable VirusTotal via VT_API_KEY
- Adds reputation-based confirmation
- Same scoring and correlation logic
- No logic changes required
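Switching between the two modes can come down to a single environment check. The helper name `virustotal_status` is hypothetical; the `VT_API_KEY` variable and the `"enabled"` / `"not_configured"` labels are the ones that appear in this README.

```python
import os
from typing import Mapping, Optional


def virustotal_status(env: Optional[Mapping[str, str]] = None) -> str:
    """Report whether reputation lookups are available, based on VT_API_KEY."""
    env = os.environ if env is None else env
    return "enabled" if env.get("VT_API_KEY", "").strip() else "not_configured"
```

Agents can branch on this value to skip reputation lookups while leaving scoring and correlation untouched.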
Sample Output (High-Risk Phishing)
dhl_delivery_failure_phish.eml
{
  "from": "DHL Express <noreply@dhl-track-support.com>",
  "domain": "dhl-track-support.com",
  "risk": {
    "score": 90,
    "severity": "High",
    "action": "Quarantine",
    "confidence": "High"
  },
  "findings": {
    "headers": [
      "Brand impersonation detected: dhl"
    ],
    "content": [
      "Brand impersonation detected: dhl"
    ],
    "urls": {
      "indicators": [
        "Malformed URL detected"
      ],
      "virustotal": "not_configured"
    },
    "attachments": {
      "indicators": [],
      "virustotal": "not_configured"
    },
    "domain": {
      "age_days": null,
      "virustotal": "enabled"
    }
  }
}
Python Virtual Environment Setup
1. Create virtual environment
python -m venv venv
2. Activate virtual environment
Windows
venv\Scripts\activate
macOS / Linux
source venv/bin/activate
3. Install dependencies
pip install -r requirements.txt
Run the Application
streamlit run app/app.py
Upload a .eml file and view the phishing analysis.
Run Tests
pytest --cov=phishing_analyzer
Minimum 70% test coverage enforced
Resilience & Reliability
The system is designed to fail safely:
- Retry logic with exponential backoff for external tools
- Timeouts to prevent stuck workflows
- Graceful degradation when VirusTotal is unavailable
- No silent failures: errors are logged and surfaced
- Deterministic behavior even when signals are missing
This ensures consistent behavior in real SOC environments.
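Retry with exponential backoff can be sketched in a few lines. This is a generic illustration, not the contents of `resilience.py`: the name `retry_with_backoff`, the default of three attempts, and the 0.5-second base delay are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, waiting base_delay * 2**n between failures; re-raise on the last one."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # surface the error instead of failing silently
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a per-call timeout would be set on the external request itself.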
Logging & Observability
- Centralized logging configuration
- Clear logs for:
  - Agent decisions
  - External tool failures
  - Correlation triggers
- Enables debugging, auditing, and future SIEM integration
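A centralized setup usually means one configured parent logger that every agent derives a child from. The sketch below is an assumption about how `logging_config.py` might be organized; the logger name, format string, and function name are illustrative.

```python
import logging


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Configure the shared parent logger once and return it."""
    logger = logging.getLogger("phishing_analyzer")
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = configure_logging()
# Child loggers inherit the handler, so each agent's lines carry its own name.
log.getChild("risk_agent").info("Correlation triggered: %s", "Content phishing + URL")
```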
Security & Safety Guardrails
- .eml files are parsed safely (no execution)
- Attachments are never opened or executed
- External API calls are isolated and optional
- Input validation and sanitization enforced
- No destructive actions performed on user systems
This project includes built-in safety mechanisms to ensure robustness, secure handling of untrusted email content, and fail-safe behavior under errors.
Input Sanitization & Content Safety
All user-supplied and email-derived text is sanitized before analysis or UI rendering:
- Removes embedded
- Strips all remaining HTML tags
- Decodes HTML entities
- Normalizes whitespace
This prevents:
- XSS risks in the Streamlit UI
- Malicious HTML or JavaScript execution
- Parser confusion from malformed markup
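The sanitization steps can be approximated with the standard library. This sketch assumes the embedded elements being removed are `<script>` and `<style>` blocks, which the list above does not spell out, and `sanitize_text` is a hypothetical name. Regex-based stripping is a simplification; a dedicated HTML parser is more robust against malformed markup.

```python
import html
import re


def sanitize_text(raw: str) -> str:
    """Strip markup from email-derived text before analysis or display."""
    # Assumption: drop script/style elements together with their contents.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", raw)
    # Strip every remaining tag.
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    # Decode entities such as &amp; into plain characters.
    text = html.unescape(text)
    # Collapse runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()
```

Because entities are decoded after tags are stripped, the output can still contain literal angle brackets, so it should be rendered as plain text rather than as HTML.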
Graceful Degradation
The system is designed to continue operating even when optional components fail:
- VirusTotal unavailable → system falls back to heuristic analysis
- WHOIS lookup fails → correlation still triggers
- Individual agent failure → overall pipeline continues
No single failure causes the system to crash or silently skip analysis.
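Isolating each agent behind a wrapper is one way to get this behavior. The wrapper name `run_safely`, the logger name, and the shape of the fallback result are assumptions for illustration.

```python
import logging
from typing import Callable

logger = logging.getLogger("phishing_analyzer.pipeline")


def run_safely(name: str, agent: Callable[[dict], dict], email: dict) -> dict:
    """Run one agent; on failure, log it and return a neutral, visible result."""
    try:
        return agent(email)
    except Exception as exc:
        logger.error("Agent %s failed: %s", name, exc)
        # The error is carried in the result so the skipped analysis is never silent.
        return {"risk": 0, "indicators": [], "error": f"{name} failed: {exc}"}
```

The remaining agents still run, and the report can show which signal was missing when the final score was computed.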
Deterministic & Auditable Decisions
- No opaque ML decisions in the core pipeline
- Every risk increase is traceable to:
  - A specific agent
  - A specific indicator
  - Or an explicit correlation rule
- Final decisions are explainable and auditable
Known Limitations
- Rule-based and heuristic driven (no ML model yet)
- Free VirusTotal API limits apply
- No attachment sandbox execution
- Designed for analysis and decisioning, not auto-remediation
Future Enhancements
- ML-based phishing classifier
- Attachment sandboxing
- SIEM / SOAR integration
- Batch email ingestion