
As Large Language Models (LLMs) are integrated into enterprise environments, the risk of "Sensitive Information Disclosure" (LLM06 in the OWASP Top 10 for LLM Applications) has become a critical security concern. Traditional Data Loss Prevention (DLP) systems rely on static keyword matching, which fails to account for the stochastic and generative nature of LLMs. This publication presents a multi-layered guardrail engine that combines Named Entity Recognition (NER), regex pattern matching for high-entropy secrets, and semantic vector similarity to detect and redact PII, credentials, and proprietary IP in real time.
Generative AI models can reproduce memorized training data or paraphrase sensitive internal documentation. Deterministic scanners (simple string matching) are insufficient because a leak that has been reworded or regenerated no longer matches any fixed keyword.
My approach implements an inference-time security proxy that evaluates LLM responses through three distinct analytical layers:

    graph TD
        A[LLM Raw Output] --> B{Security Engine}
        B --> C[Layer 1: NER PII Scanner]
        B --> D[Layer 2: Secret & Pattern Matcher]
        B --> E[Layer 3: Semantic IP Comparator]
        C --> F[Redaction Logic]
        D --> F
        E --> G[Blocking Logic]
        F --> H[Sanitized Response]
        G --> H

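The flow in the diagram can be sketched as a small orchestrator. This is a minimal illustration, not the project's actual code: `Finding`, `sanitize`, `apply_redactions`, and `BLOCK_MESSAGE` are hypothetical names, and running the blocking check before redaction is an assumption about ordering that the diagram does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    """A character span that a redaction layer wants masked (hypothetical type)."""
    start: int
    end: int
    label: str


Redactor = Callable[[str], List[Finding]]  # Layers 1 and 2: return spans to mask
Blocker = Callable[[str], bool]            # Layer 3: True means "withhold response"

# Hypothetical placeholder text returned when a response is blocked.
BLOCK_MESSAGE = "[RESPONSE BLOCKED: possible proprietary content]"


def apply_redactions(text: str, findings: List[Finding]) -> str:
    """Replace each non-overlapping span with a <LABEL> placeholder."""
    kept, last_end = [], 0
    # Prefer the earliest (and, on ties, longest) span; drop spans that overlap it.
    for f in sorted(findings, key=lambda f: (f.start, -f.end)):
        if f.start >= last_end:
            kept.append(f)
            last_end = f.end
    # Substitute from the end so earlier offsets stay valid.
    for f in reversed(kept):
        text = text[:f.start] + f"<{f.label}>" + text[f.end:]
    return text


def sanitize(raw_output: str, redactors: List[Redactor], blockers: List[Blocker]) -> str:
    """Run the blocking layers first, then merge findings from all redaction layers."""
    if any(blocker(raw_output) for blocker in blockers):
        return BLOCK_MESSAGE
    findings: List[Finding] = []
    for redactor in redactors:
        findings.extend(redactor(raw_output))
    return apply_redactions(raw_output, findings)
```

Keeping each layer behind a plain callable means the NER scanner, the pattern matcher, and the semantic comparator can be developed and tested separately from the proxy itself.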
Using Microsoft Presidio with the spaCy en_core_web_lg NER model, this layer identifies Personally Identifiable Information (PII). Unlike simple regex, it uses linguistic context, so it can flag entities such as personal names that follow no fixed pattern.
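A minimal sketch of how such a layer might call Presidio is shown below. The helper names (`build_analyzer`, `scan_pii`), the entity list, and the `min_score` cutoff are assumptions for illustration and are not taken from the project; only `AnalyzerEngine` and its `analyze` method come from Presidio itself.

```python
# Entity types to request from Presidio (illustrative subset, an assumption).
PII_ENTITIES = ["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER"]


def build_analyzer():
    """Create a Presidio analyzer; requires presidio-analyzer and a spaCy model."""
    # Imported lazily so the rest of this module loads without Presidio installed.
    from presidio_analyzer import AnalyzerEngine
    # With no arguments Presidio builds a spaCy-backed NLP engine; the source
    # pairs it with en_core_web_lg, which must be downloaded beforehand.
    return AnalyzerEngine()


def scan_pii(text, analyzer, min_score=0.5):
    """Return PII findings sorted by position, dropping low-confidence results.

    min_score is a hypothetical cutoff, not a value from the original project.
    """
    results = analyzer.analyze(text=text, entities=PII_ENTITIES, language="en")
    findings = [
        {
            "type": r.entity_type,
            "start": r.start,
            "end": r.end,
            "score": r.score,
            "text": text[r.start:r.end],
        }
        for r in results
        if r.score >= min_score
    ]
    return sorted(findings, key=lambda f: f["start"])
```

With the dependencies installed, a call would look like `scan_pii("Contact John Doe at j.doe@email.com", build_analyzer())`; the returned spans can then be passed to whatever redaction step the proxy uses.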
To catch high-entropy strings such as API keys and database connection strings, I implement custom PatternRecognizers.
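The sketch below illustrates one way this layer could work. The two regexes, the Shannon-entropy gate, and every name in it (`SECRET_PATTERNS`, `find_secrets`, `build_presidio_recognizers`) are assumptions for illustration; the source only states that custom PatternRecognizers are used. The entropy check is an extra step added here to show how "high-entropy" might be made concrete, since a regex alone only checks shape.

```python
import math
import re
from collections import Counter

# Illustrative patterns only (assumptions); a real deployment needs a vetted,
# provider-specific list.
SECRET_PATTERNS = {
    "GENERIC_API_KEY": r"\bsk-[A-Za-z0-9_-]{20,}",
    "DB_CONNECTION_STRING": (
        r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?)://"
        r"[^\s:@/]+:[^\s@/]+@[^\s/]+"
    ),
}


def shannon_entropy(s: str) -> float:
    """Bits per character of the empirical character distribution of s."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def find_secrets(text: str, min_entropy: float = 3.0):
    """Return (label, start, end) tuples for likely secrets, ordered by position.

    min_entropy is a hypothetical cutoff that filters key-shaped but
    low-randomness strings such as 'sk-aaaaaaaa...'.
    """
    hits = []
    for label, pattern in SECRET_PATTERNS.items():
        for m in re.finditer(pattern, text):
            # Connection strings are matched structurally; only key-like
            # tokens go through the entropy gate.
            if label == "GENERIC_API_KEY" and shannon_entropy(m.group()) < min_entropy:
                continue
            hits.append((label, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[1])


def build_presidio_recognizers(score: float = 0.8):
    """Wrap the same regexes as Presidio PatternRecognizer objects."""
    # Lazy import so the regex path above works without Presidio installed.
    from presidio_analyzer import Pattern, PatternRecognizer
    return [
        PatternRecognizer(
            supported_entity=label,
            patterns=[Pattern(name=label.lower(), regex=regex, score=score)],
        )
        for label, regex in SECRET_PATTERNS.items()
    ]
```

Each recognizer returned by `build_presidio_recognizers()` can be registered on an existing analyzer with `analyzer.registry.add_recognizer(recognizer)`, so secrets and PII are reported through the same interface.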
To prevent the leakage of proprietary source code or internal IP, even when paraphrased by the LLM, the system utilizes Sentence-Transformers (all-MiniLM-L6-v2).
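As a rough sketch of the comparison step, the response can be embedded alongside the protected documents and blocked when its best cosine similarity crosses a threshold. The function names and the 0.75 default threshold are assumptions for illustration, not values from the project; the encoder is passed in as a parameter so the decision logic can be exercised without downloading the model.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def load_encoder(model_name: str = "all-MiniLM-L6-v2"):
    """Return a function mapping a list of strings to a list of embedding vectors."""
    # Lazy import: sentence-transformers is only needed for real embeddings.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(texts).tolist()


def max_similarity(response: str, protected_docs, encode) -> float:
    """Highest similarity between the response and any protected document."""
    vectors = encode([response] + list(protected_docs))
    query, corpus = vectors[0], vectors[1:]
    return max((cosine_similarity(query, v) for v in corpus), default=0.0)


def should_block(response: str, protected_docs, encode, threshold: float = 0.75) -> bool:
    """Block when the response is too close to protected content.

    threshold=0.75 is a hypothetical default chosen for this sketch.
    """
    return max_similarity(response, protected_docs, encode) >= threshold
```

In practice `encode` would be `load_encoder()` and `protected_docs` would be the contents of the local vault; embeddings for the vault could be computed once and cached rather than re-encoded on every request.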
Embedding model: all-MiniLM-L6-v2 (chosen for its balance of performance and low latency).
The tool is designed for local inference, ensuring that the "Security Vault" itself remains private and never leaves the host environment.
Dependencies: presidio-analyzer, sentence-transformers, streamlit, spacy.

    # Clone the repository
    git clone https://github.com/MANU-de/llm-leak-detector.git
    cd llm-leak-detector

    # Install requirements
    pip install -r requirements.txt
    python -m spacy download en_core_web_lg

    # Launch the Guardrail Dashboard
    streamlit run app.py

One of the core engineering challenges in AI Security is the "False Positive" trade-off. Sentinel-LLM addresses this by providing a Dynamic Threshold Controller.
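To make the trade-off concrete, the sketch below computes precision and recall at different thresholds over a handful of labelled similarity scores. The scores and labels are invented purely for illustration and are not measurements from the tool; `confusion_at` and `precision_recall` are hypothetical helper names.

```python
def confusion_at(threshold, scored_samples):
    """Count (tp, fp, fn, tn) when every score >= threshold is flagged as a leak."""
    tp = fp = fn = tn = 0
    for score, is_leak in scored_samples:
        flagged = score >= threshold
        if flagged and is_leak:
            tp += 1
        elif flagged and not is_leak:
            fp += 1
        elif not flagged and is_leak:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(threshold, scored_samples):
    """Precision and recall at a threshold (1.0 when the denominator is empty)."""
    tp, fp, fn, _ = confusion_at(threshold, scored_samples)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Made-up (similarity score, is_actual_leak) pairs, for illustration only.
SAMPLES = [
    (0.92, True), (0.78, True), (0.61, True),
    (0.66, False), (0.40, False), (0.15, False),
]

if __name__ == "__main__":
    for t in (0.60, 0.75, 0.90):
        p, r = precision_recall(t, SAMPLES)
        print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

Lowering the threshold catches more real leaks at the cost of more false alarms, and raising it does the reverse. In a Streamlit dashboard the threshold could plausibly be exposed with a call such as `st.slider("Similarity threshold", 0.0, 1.0, 0.75, 0.01)`, though how the project wires this up is an assumption here.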
| Leak Type | Sample Input | Detection Result | Action |
|---|---|---|---|
| PII | "Contact John Doe at j.doe@email.com" | Found: PERSON, EMAIL | Redact |
| Secret | "API_KEY: sk-ant-api03-..." | Found: Generic API Key | Redact |
| IP Leak | Paraphrased Internal Auth logic | 78% Semantic Similarity | BLOCK |
Initial testing indicates that the semantic layer identifies proprietary code leaks with a recall of 92%, significantly outperforming keyword-based filters. The latency overhead introduced by the three-layer scan averages about 120 ms, making it suitable for real-time human-in-the-loop applications.
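A latency figure like this can be checked with a simple timing harness. The sketch below is one possible way to do it, not the procedure used for the numbers above; `measure_overhead_ms` is a hypothetical helper, and `scan` stands for whatever function runs the three layers on a single response.

```python
import statistics
import time


def measure_overhead_ms(scan, samples, repeats=5):
    """Time scan(text) for each sample and summarize the results in milliseconds."""
    timings = []
    for text in samples:
        for _ in range(repeats):
            start = time.perf_counter()
            scan(text)
            timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": len(timings),
        "mean_ms": statistics.mean(timings),
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
    }
```

Model loading should happen before the timed loop, and a few untimed warm-up calls help avoid counting one-off initialization in the average.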
Sentinel-LLM illustrates that effective LLM security requires a transition from "Keyword Filtering" to "Contextual Intelligence." By running these scans at the edge or as a sidecar, organizations can leverage Generative AI more safely while working toward compliance with GDPR, HIPAA, and internal IP standards.
Live Demo/Video: Technical Demo on Loom
Source Code: GitHub Repo (https://github.com/MANU-de/llm-leak-detector)
Author: Manuela Schrittwieser (LinkedIn)
This tool demonstrates that "Security-by-Design" for LLMs requires a hybrid approach. Future iterations will incorporate Inbound Guardrails to detect Prompt Injection attacks using DeBERTa-v3 classifiers, creating a bi-directional "Security Sandbox" for LLM interactions.