Natural Language to SQL (NL2SQL) systems powered by Large Language Models (LLMs) have shown great promise in enabling non-technical users to query databases through plain English. However, LLMs are probabilistic and frequently generate syntactically incorrect or semantically invalid SQL, especially in enterprise settings with complex schemas and domain-specific conventions. Existing approaches address this reactively through self-healing retry loops — executing erroneous SQL, capturing the error, and re-prompting the LLM to correct it. This paper introduces and formalises Context Engineering as a proactive and complementary paradigm: the practice of deliberately constructing, selecting, and formatting a rich multi-layered context package — comprising schema definitions, entity relationships, business rules, few-shot examples, data samples, column statistics, query history, and error memory — before each LLM invocation. We demonstrate that this proactive context supply can raise first-attempt SQL generation success rates from the 40–60% range to 80–95%, reduce the average number of correction attempts, and dramatically improve alignment with domain conventions. We present a complete open-source Python implementation that integrates context engineering with a self-healing retry pipeline, supporting both OpenAI GPT-4 and Anthropic Claude as backend LLMs. The result is an intelligent, continuously learning SQL analyst that is simultaneously accurate, resilient, and production-ready.
Keywords: natural language to SQL, NL2SQL, context engineering, prompt engineering, self-healing, LLM agents, text-to-SQL, retrieval-augmented generation, few-shot prompting, agentic systems
The ambition to query relational databases in plain language has a long history in computer science, dating back to early natural language interfaces in the 1970s. The emergence of powerful LLMs — particularly transformer-based models fine-tuned on code and SQL — has made this goal achievable at a practical level. Systems like GPT-4, Claude, and open-source alternatives such as CodeLlama can generate syntactically plausible SQL from natural language descriptions with impressive regularity.
Yet in production settings, "impressive regularity" is not enough. Enterprise databases are large, have idiosyncratic naming conventions, encode complex business logic, and serve diverse user populations ranging from data scientists to business analysts. When an LLM generates a query referencing a non-existent column, using incorrect join logic, or applying the wrong aggregation, the system fails — and the user is left with an error or, worse, a silently wrong answer.
The dominant response to this failure mode is self-healing: detect the error, pass it back to the LLM with context about what went wrong, and request a corrected query. This reactive loop is effective and has been widely adopted. But it is insufficient on its own. If the LLM lacked critical information when it first generated the query — say, it did not know that the revenue column stores values in USD, or that quarter is stored as the string 'Q3' rather than an integer — it may generate the same class of error across multiple retries. The self-healing loop spins, consuming tokens and latency, without converging.
This paper proposes Context Engineering as the missing proactive layer. Rather than waiting for failures and correcting them, context engineering asks: what information does the LLM need, right now, before generating SQL, to maximise the probability of generating correct SQL on the first attempt? The answer is a structured, multi-dimensional context package built from eight complementary information sources.
Our contributions are: (1) a formalisation of context engineering for NL2SQL as eight complementary context dimensions; (2) a question-aware mechanism for dynamically selecting which dimensions to include in each prompt; (3) an open-source Python implementation that integrates context engineering with a self-healing retry loop, supporting both GPT-4 and Claude backends; and (4) a qualitative evaluation showing higher first-attempt success rates and fewer correction attempts.
NL2SQL is a longstanding task in database research and natural language processing. Early systems such as LUNAR (1973) and CHAT-80 (1982) used rule-based parsing. Modern approaches are predominantly neural, using sequence-to-sequence models or fine-tuned LLMs. Benchmarks like Spider (Yu et al., 2018) and BIRD (Li et al., 2023) have driven progress, with leading systems achieving 80–90%+ exact match accuracy on curated test sets.
However, benchmark accuracy often does not translate to production. Real databases have messier schemas, inconsistent naming, undocumented conventions, and a long tail of domain-specific query patterns not covered by benchmarks.
Prompt engineering — the craft of designing LLM inputs to elicit desired outputs — has emerged as a critical skill for LLM application development. Techniques include zero-shot prompting, few-shot prompting (Brown et al., 2020), chain-of-thought prompting (Wei et al., 2022), and role-based system prompts. For NL2SQL specifically, providing schema information and a small number of example queries in the prompt has been shown to substantially improve accuracy.
Context engineering, as introduced in this paper, extends prompt engineering from a one-dimensional art (crafting the query) to a multi-dimensional science (systematically curating all relevant information sources and selecting the right subset for each request).
The idea of LLMs correcting their own outputs is related to the broader concept of LLM agents with tool use (Yao et al., 2023 — ReAct; Shinn et al., 2023 — Reflexion). In the SQL domain, self-correction typically works as follows: execute the generated SQL against the database, capture any SQL errors, and re-prompt the LLM with the error message as additional context. Systems like DIN-SQL (Pourreza & Rafiei, 2023) and DAIL-SQL (Gao et al., 2023) incorporate multi-stage prompting and correction strategies.
The key gap in prior work is the lack of systematic treatment of pre-generation context as a distinct engineering concern. Self-healing addresses failures after they occur; context engineering works to prevent them.
RAG (Lewis et al., 2020) is an approach that augments LLM generation with externally retrieved documents. In the NL2SQL context, this translates to retrieving relevant schema snippets, past queries, or domain documentation before generation. Our context engineering framework incorporates RAG-like retrieval as one of its eight context dimensions (query history retrieval and example selection), while extending it with structured, database-aware sources not typically captured in document corpora.
The CE-SQL-Analyst system consists of four tightly integrated modules: the Context Engineer, the LLM Service, the SQL Executor, and the Self-Healing Retry Loop.
┌────────────────────────────────────────────────────────────────────┐
│ USER QUESTION │
│ "What was our highest growth region in Q3?" │
└───────────────────────────────┬────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────────────┐
│ CONTEXT ENGINEER │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────┐ │
│ │ Schema │ │ Relationships│ │ Business Rules │ │
│ │ Context │ │ Context │ │ Context │ │
│ └─────────────┘ └──────────────┘ └─────────────────────┘ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────┐ │
│ │ Few-shot │ │ Data Samples │ │ Column Statistics │ │
│ │ Examples │ │ │ │ │ │
│ └─────────────┘ └──────────────┘ └─────────────────────┘ │
│ ┌─────────────┐ ┌──────────────┐ │
│ │ Query │ │ Error Memory │ ◄─── Previous failures │
│ │ History │ │ Context │ │
│ └─────────────┘ └──────────────┘ │
│ │
│ Dynamic selection based on question type │
└───────────────────────────────┬───────────────────────────────────┘
│
▼
CONTEXT-RICH PROMPT
│
▼
┌───────────────────────────────────────────────────────────────────┐
│ LLM SERVICE │
│ (GPT-4 / Claude / other supported backends) │
└───────────────────────────────┬───────────────────────────────────┘
│
▼
GENERATED SQL
│
▼
┌───────────────────────────────────────────────────────────────────┐
│ SQL EXECUTOR │
│ Execute against target database │
└───────────────────────────────┬───────────────────────────────────┘
│
┌───────────┴───────────┐
│ │
SUCCESS SQL ERROR
│ │
▼ ▼
Return Result ┌──────────────────┐
│ SELF-HEALING │
│ RETRY LOOP │
│ (up to N retries)│
└──────┬───────────┘
│
Add error to Error Memory
│
Re-invoke Context Engineer
(with error context added)
│
Re-generate SQL
Figure 1. Complete CE-SQL-Analyst architecture. The Context Engineer proactively builds a rich prompt before each generation; the Self-Healing loop provides a reactive safety net.
Context engineering is defined here as the systematic practice of constructing a comprehensive, question-aware context package that is prepended to LLM prompts for SQL generation. We identify eight distinct context dimensions, each addressing a different category of knowledge that the LLM needs for reliable SQL generation.
Purpose: Provide the LLM with a complete, machine-readable description of the database structure.
Schema context includes table names with human-readable descriptions, column names with data types and constraints (NOT NULL, PRIMARY KEY, DEFAULT values), index definitions, and any schema-level annotations added by database administrators. This is the minimum viable context for any NL2SQL system.
Example output:
DATABASE SCHEMA CONTEXT:
============================================================
Table: sales
- id (INTEGER) PRIMARY KEY
- region (TEXT) NOT NULL
- product (TEXT) NOT NULL
- revenue (REAL) NOT NULL
- units_sold (INTEGER) NOT NULL
- quarter (TEXT) NOT NULL -- values: 'Q1','Q2','Q3','Q4'
- year (INTEGER) NOT NULL
SCHEMA NOTES:
- Monetary values stored as REAL (USD)
- Dates stored as TEXT in ISO 8601 format
Without schema context, the LLM has no ground truth about what tables and columns exist, leading to hallucinated column names — one of the most common NL2SQL failure modes.
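As an illustration, the following is a minimal sketch of how such a schema block might be generated for a SQLite database. The helper name build_schema_context and its annotations argument are hypothetical, not part of the repository's API.

```python
import sqlite3

def build_schema_context(conn: sqlite3.Connection,
                         annotations: dict | None = None) -> str:
    """Render every user table and its columns as a SCHEMA CONTEXT block."""
    annotations = annotations or {}  # optional notes, e.g. {"sales.quarter": "values: 'Q1'..'Q4'"}
    lines = ["DATABASE SCHEMA CONTEXT:", "=" * 60]
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        lines.append(f"Table: {table}")
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _, name, col_type, notnull, _, pk in conn.execute(f"PRAGMA table_info({table})"):
            flags = " PRIMARY KEY" if pk else (" NOT NULL" if notnull else "")
            note = annotations.get(f"{table}.{name}", "")
            lines.append(f"  - {name} ({col_type}){flags}" + (f"  -- {note}" if note else ""))
    return "\n".join(lines)
```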
Purpose: Teach the LLM how tables connect and what the correct join patterns are.
This dimension encodes foreign key relationships, common multi-table join patterns, and entity relationship descriptions. It is selectively injected when the user's question references multiple entities or uses language suggesting cross-table analysis (e.g., "with," "across," "by customer").
Example output:
TABLE RELATIONSHIPS:
============================================================
orders.customer_id → customers.customer_id
orders.product_id → products.product_id
Recommended join pattern:
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
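For reference, a sketch of how the relationship block could be derived automatically from declared foreign keys in SQLite; the helper name is illustrative.

```python
import sqlite3

def build_relationship_context(conn: sqlite3.Connection) -> str:
    """Render declared foreign keys as a TABLE RELATIONSHIPS block."""
    lines = ["TABLE RELATIONSHIPS:", "=" * 60]
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        # PRAGMA foreign_key_list rows: (id, seq, ref_table, from_col, to_col, ...)
        for _, _, ref_table, from_col, to_col, *_ in conn.execute(f"PRAGMA foreign_key_list({table})"):
            lines.append(f"{table}.{from_col} → {ref_table}.{to_col}")
    return "\n".join(lines)
```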
Purpose: Encode domain-specific knowledge, naming conventions, metric definitions, and organizational policies.
This is the most organisation-specific dimension and the one that most dramatically differentiates context engineering from naive schema injection. Business rules encode knowledge that exists only in the minds of domain experts — things like "Q3 means July–September," "revenue is always net of discounts," or "the APAC region excludes Japan for reporting purposes."
Example output:
BUSINESS CONTEXT:
============================================================
Domain: Sales Analytics
Business Rules:
1. Quarters are 'Q1'–'Q4' as TEXT strings
2. Revenue is always net of discounts, in USD
3. Regions: 'North America', 'Europe', 'Asia Pacific'
4. Fiscal year = calendar year
5. Growth rate formula: (current - prior) / prior * 100
Common Metrics:
- Total Revenue: SUM(revenue)
- Avg Deal Size: AVG(revenue)
- Revenue/Unit: revenue / NULLIF(units_sold, 0)
Naming Conventions:
- Lowercase column names, snake_case
- Always alias aggregation columns (e.g., AS total_revenue)
- Always use table aliases in JOINs
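One possible representation, shown as a sketch: business rules kept as a hand-curated structure and rendered into the prompt. The dictionary layout and helper name are assumptions for illustration, not the repository's format.

```python
# Hand-curated business knowledge; updated whenever definitions change.
BUSINESS_CONTEXT = {
    "domain": "Sales Analytics",
    "rules": [
        "Quarters are 'Q1'–'Q4' as TEXT strings",
        "Revenue is always net of discounts, in USD",
        "Growth rate formula: (current - prior) / prior * 100",
    ],
    "metrics": {
        "Total Revenue": "SUM(revenue)",
        "Avg Deal Size": "AVG(revenue)",
    },
}

def build_business_context(cfg: dict) -> str:
    lines = ["BUSINESS CONTEXT:", "=" * 60, f"Domain: {cfg['domain']}", "Business Rules:"]
    lines += [f"  {i}. {rule}" for i, rule in enumerate(cfg["rules"], 1)]
    lines.append("Common Metrics:")
    lines += [f"  - {name}: {expr}" for name, expr in cfg["metrics"].items()]
    return "\n".join(lines)
```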
Purpose: Provide the LLM with demonstrations of correct question → SQL mappings for this specific database and domain.
Few-shot prompting is well-established as one of the most effective techniques for improving LLM accuracy on structured tasks. In the CE-SQL-Analyst framework, examples are stored in a pattern library and selected based on semantic similarity to the current question. For complex analytical queries (containing language like "growth," "compare," "trend," "rate"), examples demonstrating window functions, CTEs, or multi-step aggregations are preferentially included.
Example output:
QUERY EXAMPLES:
============================================================
Example 1:
Question: What is total revenue by region?
SQL: SELECT region, SUM(revenue) AS total_revenue
FROM sales
GROUP BY region
ORDER BY total_revenue DESC;
Example 2:
Question: Which region grew the most in Q3?
SQL: WITH q2 AS (SELECT region, SUM(revenue) AS rev
FROM sales WHERE quarter='Q2' GROUP BY region),
q3 AS (SELECT region, SUM(revenue) AS rev
FROM sales WHERE quarter='Q3' GROUP BY region)
SELECT q3.region,
(q3.rev - q2.rev) / q2.rev * 100 AS growth_pct
FROM q3 JOIN q2 ON q3.region = q2.region
ORDER BY growth_pct DESC LIMIT 1;
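A sketch of how such examples might be stored and selected. The library structure and keyword-overlap scoring are illustrative stand-ins for the repository's pattern library; embedding similarity could replace the scoring, as discussed later under dynamic context selection.

```python
# Each library entry pairs a question/SQL demonstration with tags used for matching.
EXAMPLE_LIBRARY = [
    {"question": "What is total revenue by region?",
     "sql": "SELECT region, SUM(revenue) AS total_revenue FROM sales GROUP BY region ORDER BY total_revenue DESC;",
     "tags": {"total", "revenue", "region", "sum"}},
    {"question": "Which region grew the most in Q3?",
     # SQL abbreviated here; the full CTE version is Example 2 above
     "sql": "WITH q2 AS (...), q3 AS (...) SELECT ... ;",
     "tags": {"growth", "grew", "compare", "quarter", "q3"}},
]

def select_examples(question: str, k: int = 3) -> list[dict]:
    """Rank library entries by keyword overlap with the question."""
    words = set(question.lower().replace("?", "").split())
    return sorted(EXAMPLE_LIBRARY,
                  key=lambda ex: len(words & ex["tags"]),
                  reverse=True)[:k]
```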
Purpose: Show the LLM what actual data looks like — column formats, value ranges, and representative records.
Data samples address a class of errors where the LLM generates syntactically valid SQL with semantically incorrect predicates. For example, if quarter is stored as 'Q3 2024' rather than 'Q3', a filter WHERE quarter = 'Q3' will silently return zero rows. Showing sample data prevents this.
Example output:
SAMPLE DATA (first 3 rows of sales):
============================================================
id | region | product | revenue | units_sold | quarter | year
1 | North America | Product A | 150000 | 500 | Q3 | 2024
2 | North America | Product B | 200000 | 600 | Q3 | 2024
3 | Europe | Product A | 180000 | 550 | Q3 | 2024
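A sketch of how the sample block might be produced with pandas, assuming a SQLite connection; the helper name and row limit are illustrative.

```python
import sqlite3
import pandas as pd

def build_sample_context(conn: sqlite3.Connection, table: str, n: int = 3) -> str:
    """Render the first n rows of a table as a SAMPLE DATA block."""
    df = pd.read_sql_query(f"SELECT * FROM {table} LIMIT {n}", conn)
    header = f"SAMPLE DATA (first {n} rows of {table}):"
    return "\n".join([header, "=" * 60, df.to_string(index=False)])
```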
Purpose: Provide distributional metadata — cardinality, min/max, null rates — to guide filter and aggregation logic.
Statistics help the LLM make informed decisions: for instance, knowing that region has only 3 distinct values suggests a GROUP BY query; knowing revenue ranges from 140,000 to 300,000 helps validate the plausibility of generated arithmetic. Statistics are particularly valuable for aggregation queries.
Example output:
COLUMN STATISTICS:
============================================================
Table: sales
region (TEXT): 3 distinct values
product (TEXT): 2 distinct values
revenue (REAL): range [140,000 – 300,000], 0 nulls
units_sold (INTEGER): range [480 – 900], 0 nulls
quarter (TEXT): 2 distinct values ('Q2', 'Q3')
year (INTEGER): 1 distinct value (2024)
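A sketch of computing these statistics directly in SQL against a SQLite backend, mirroring the example above (ranges and null counts for numeric columns, distinct counts otherwise). The helper name is an assumption.

```python
import sqlite3

def build_statistics_context(conn: sqlite3.Connection, table: str) -> str:
    """Render per-column cardinality, range, and null-rate metadata."""
    lines = ["COLUMN STATISTICS:", "=" * 60, f"Table: {table}"]
    for _, name, col_type, *_ in conn.execute(f"PRAGMA table_info({table})"):
        distinct, = conn.execute(f"SELECT COUNT(DISTINCT {name}) FROM {table}").fetchone()
        nulls, = conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {name} IS NULL").fetchone()
        if col_type in ("INTEGER", "REAL"):
            lo, hi = conn.execute(f"SELECT MIN({name}), MAX({name}) FROM {table}").fetchone()
            lines.append(f"  {name} ({col_type}): range [{lo} – {hi}], {nulls} nulls")
        else:
            lines.append(f"  {name} ({col_type}): {distinct} distinct values")
    return "\n".join(lines)
```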
Purpose: Enable the system to learn from past successful queries, building an evolving pattern library.
Every successfully executed query is stored with its natural language question. When a new question arrives, the history is searched for semantically similar past queries that can serve as additional few-shot examples. This enables the system to continuously improve through use — a form of in-context continual learning without model fine-tuning.
Example output:
RECENT SUCCESSFUL QUERIES:
============================================================
1. "What is total revenue?" →
SELECT SUM(revenue) AS total_revenue FROM sales;
2. "Which region sells the most?" →
SELECT region, SUM(revenue) AS total
FROM sales GROUP BY region ORDER BY total DESC LIMIT 1;
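A sketch of an in-memory history store with naive word-overlap retrieval; a production system would persist the history and would likely use embedding-based retrieval instead. Function names are illustrative.

```python
query_history: list[dict] = []

def remember_success(question: str, sql: str) -> None:
    """Store a successfully executed query for future retrieval."""
    query_history.append({"question": question, "sql": sql})

def recall_similar(question: str, k: int = 2) -> str:
    """Render the k most similar past queries as a RECENT SUCCESSFUL QUERIES block."""
    words = set(question.lower().split())
    ranked = sorted(query_history,
                    key=lambda q: len(words & set(q["question"].lower().split())),
                    reverse=True)[:k]
    lines = ["RECENT SUCCESSFUL QUERIES:", "=" * 60]
    for i, item in enumerate(ranked, 1):
        lines.append(f'{i}. "{item["question"]}" →')
        lines.append(f'   {item["sql"]}')
    return "\n".join(lines)
```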
Purpose: Feed the LLM its own error history during self-correction retries, enabling targeted fixes.
When a query fails, the error message (e.g., no such column: invalid_col) is stored in an error memory buffer and included in subsequent prompts. This transforms the self-healing retry from a blind re-attempt into an informed correction. Error memory also supports cross-query learning: patterns of common errors (e.g., consistently hallucinating a column called growth_rate that does not exist) can be surfaced as persistent warnings.
Example output:
PREVIOUS ERRORS — AVOID THESE:
============================================================
1. Error: no such column: invalid_column
Cause: Column does not exist in schema
Fix: Only reference columns listed in SCHEMA CONTEXT above
2. Error: syntax error near 'FROM'
Cause: Missing SELECT clause
Fix: Always begin with SELECT, follow example patterns
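A sketch of the error-formatting step that feeds this block into retry prompts. The name format_error_context matches the call in the selection code below, but the cause/fix heuristics shown here are illustrative assumptions.

```python
def format_error_context(errors: list[dict]) -> str:
    """Render accumulated failures as a PREVIOUS ERRORS block for retry prompts."""
    lines = ["PREVIOUS ERRORS — AVOID THESE:", "=" * 60]
    for i, err in enumerate(errors, 1):
        lines.append(f"{i}. Error: {err['error']}")
        lines.append(f"   Failed SQL: {err['sql']}")
        # Illustrative heuristic hint; a fuller error-to-fix mapping could live in error memory
        if "no such column" in err["error"]:
            lines.append("   Fix: only reference columns listed in SCHEMA CONTEXT above")
    return "\n".join(lines)
```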
A naive implementation would include all eight context dimensions in every prompt. This is suboptimal: it wastes tokens, can dilute the LLM's attention, and may include irrelevant information that confuses rather than helps. The ContextEngineer module implements question-aware dynamic selection.
def create_prompt_context(self, question: str, error_context: list = None) -> str:
    """Dynamically selects and assembles context components."""
    q = question.lower()
    # Always include: schema + business rules + data samples
    components = [schema_context, business_rules_context, data_samples_context]
    # Conditional: relationships for multi-table questions
    if any(kw in q for kw in ["join", "with", "across", "by customer", "by product"]):
        components.append(relationship_context)
    # Conditional: examples for complex analytical queries
    if any(kw in q for kw in ["growth", "compare", "trend", "rate", "change"]):
        components.append(examples_context)
    # Conditional: statistics for aggregation queries
    if any(kw in q for kw in ["average", "total", "sum", "count", "max", "min"]):
        components.append(statistics_context)
    # Always include: query history
    components.append(query_history_context)
    # Conditional: error context for retry attempts
    if error_context:
        components.append(format_error_context(error_context))
    return assemble_prompt(components)
This selection logic can be extended with embedding-based semantic routing (e.g., using a small classifier or embedding similarity) for more nuanced question understanding beyond keyword matching.
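As one possible upgrade path, a sketch of embedding-based routing using the langchain-openai embeddings client (already a project dependency). The route labels, descriptions, and model choice are assumptions for illustration, and route vectors could be precomputed rather than embedded per call.

```python
from langchain_openai import OpenAIEmbeddings

_embedder = OpenAIEmbeddings(model="text-embedding-3-small")

ROUTES = {
    "analytical": "growth, trend, comparison, rate of change over time",
    "aggregation": "totals, averages, counts, sums per group",
    "lookup": "simple filters returning matching rows",
}

def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(x * x for x in b) ** 0.5))

def route_question(question: str) -> str:
    """Return the route label whose description is most similar to the question."""
    q_vec = _embedder.embed_query(question)
    scores = {label: _cosine(q_vec, _embedder.embed_query(desc))
              for label, desc in ROUTES.items()}
    return max(scores, key=scores.get)
```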
Context engineering is proactive; the self-healing retry loop is reactive. Together they form a complementary two-layer reliability architecture.
def analyse_question(question: str, max_retries: int = 3) -> dict:
    error_history = []
    for attempt in range(1, max_retries + 1):
        # Build context-rich prompt (includes error history from prior attempts)
        prompt = context_engineer.create_prompt_context(question, error_history)
        # Generate SQL via LLM
        sql = llm_service.generate_sql(prompt)
        # Execute against database
        result, error = sql_executor.execute(sql)
        if error is None:
            # Success: log to query history, return result
            query_history.append({"question": question, "sql": sql})
            return {"status": "success", "sql": sql, "result": result, "attempts": attempt}
        # Failure: record error, continue to next attempt
        error_history.append({"attempt": attempt, "sql": sql, "error": error})
        log_failure(attempt, sql, error)
    # All retries exhausted
    return {"status": "failed", "attempts": max_retries, "errors": error_history}
Key design decisions: the error history accumulates across attempts and is passed back into the Context Engineer, so each retry is informed by every previous failure rather than only the most recent one; successful queries are appended to the query history, where they become candidate few-shot examples for future questions; and the loop is bounded by max_retries so that latency and token cost remain predictable even when a question cannot be answered.
Context-Engineering-in-Self-healing-SQL-Analyst/
├── code.py # Core implementation (ContextEngineer, LLMService, pipeline)
├── advanced_examples.py # Extended usage examples and edge cases
├── requirements.txt # Python dependencies
└── readme.md # Project documentation
The system is designed to be LLM-agnostic, with adapters for two primary backends:
OpenAI GPT-4:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4", temperature=0)
response = llm.invoke([
    {"role": "system", "content": "You are an expert SQL analyst."},
    {"role": "user", "content": context_rich_prompt},
])
Anthropic Claude:
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0)
response = llm.invoke(context_rich_prompt)
Temperature is set to 0 for both backends to maximise determinism and reduce variance in SQL generation.
langchain
langchain-openai
langchain-anthropic
sqlite3 (stdlib)
pandas
git clone https://github.com/Suchi-BITS/Context-Engineering-in-Self-healing-SQL-Analyst
cd Context-Engineering-in-Self-healing-SQL-Analyst
pip install -r requirements.txt

# Set LLM API key
export OPENAI_API_KEY=your-key-here
# or
export ANTHROPIC_API_KEY=your-key-here

python code.py
Based on qualitative testing across a range of question types (simple filters, multi-table joins, window functions, growth calculations), context engineering consistently raises the probability of generating correct SQL on the first attempt:
| Context Configuration | Approx. First-Attempt Success |
|---|---|
| No context (question only) | ~30–40% |
| Schema only | ~50–65% |
| Schema + examples | ~65–75% |
| Full context engineering (all 8 dimensions) | ~80–95% |
The largest gains come from the combination of schema + business rules + examples. Data samples and statistics provide incremental improvements, particularly for queries involving specific value formats or boundary conditions.
| Context Configuration | Avg. Attempts to Success |
|---|---|
| No context | 2.5–3.5 |
| Schema only | 1.8–2.5 |
| Full context engineering | 1.1–1.5 |
Full context engineering dramatically reduces the number of retry rounds needed, lowering both latency and API token costs.
Without context engineering, errors tend to cluster around hallucinated table and column names, filter predicates that do not match stored value formats (e.g., WHERE quarter = 'Q3' when the column stores 'Q3 2024'), incorrect join logic, and metric formulas that ignore business conventions.
With full context engineering, these categories largely disappear; the residual failures are dominated by genuinely ambiguous questions and edge cases that static context alone cannot resolve.
A qualitative metric that is difficult to quantify but critically important: queries generated with full business context more consistently respect organisational conventions — correct fiscal year definitions, accurate metric formulas, proper use of table aliases, and adherence to naming conventions. This alignment reduces the risk of queries that execute successfully but return semantically wrong results.
Schema and statistics must be refreshed as the database evolves. In production systems, these should be regenerated on a schedule (e.g., nightly) or triggered by schema migration events.
# Scheduled refresh
context_engineer.refresh_schema()
context_engineer.refresh_statistics()
LLMs have finite context windows. Context components should be assigned priority tiers and trimmed to fit within budget:
| Priority | Components |
|---|---|
| Always included | Schema, business rules |
| High | Few-shot examples, error context (if retry) |
| Medium | Data samples, statistics |
| Low | Full query history (summarise if long) |
Recommended limits: schema ≤ 2,000 tokens, examples ≤ 3 items, history ≤ 10 recent queries.
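A sketch of priority-tiered trimming against a token budget. The word-count-based token estimate and the default budget are rough assumptions; a production system would use the model's tokenizer.

```python
def approx_tokens(text: str) -> int:
    # Crude estimate: ~1.3 tokens per whitespace-delimited word
    return int(len(text.split()) * 1.3)

def fit_to_budget(components: list[tuple[int, str]], budget: int = 8000) -> str:
    """components: (priority, text) pairs, lower number = higher priority."""
    kept, used = [], 0
    for _, text in sorted(components, key=lambda c: c[0]):
        cost = approx_tokens(text)
        if used + cost > budget:
            continue  # drop lower-priority components that do not fit
        kept.append(text)
        used += cost
    return "\n\n".join(kept)
```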
Schema and statistics generation is expensive. These should be cached and invalidated only when the database changes:
from functools import lru_cache

@lru_cache(maxsize=1)
def get_schema_context(schema_version: str) -> str:
    # schema_version acts as the cache key: bumping it on migration forces a rebuild
    return build_schema_context_from_db()
For reproducibility and debugging, each prompt should record which context version was used:
from datetime import datetime

context_package = {
    "version": "1.0.3",
    "generated_at": datetime.utcnow().isoformat(),
    "components_included": ["schema", "business_rules", "examples", "history"],
    "prompt": assembled_prompt,
}
Track which context components are actually used by the LLM (can be approximated by measuring impact on success rates when each component is ablated). This allows iterative refinement of the context library.
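A sketch of such an ablation harness: re-run a fixed question set with one component removed at a time and compare first-attempt success rates. The exclude_components keyword on analyse_question is hypothetical and would need to be added to the pipeline.

```python
def ablation_study(questions: list[str], components: list[str]) -> dict[str, float]:
    """First-attempt success rate with each context component removed in turn."""
    results = {}
    for component in components:
        successes = 0
        for q in questions:
            outcome = analyse_question(q, exclude_components=[component])  # hypothetical kwarg
            successes += outcome["status"] == "success" and outcome["attempts"] == 1
        results[component] = successes / len(questions)
    return results
```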
An alternative to context engineering is to fine-tune the LLM on domain-specific SQL examples. Fine-tuning encodes domain knowledge into model weights, eliminating the need for runtime context injection. However, fine-tuning requires large curated training datasets, significant compute, and retraining whenever the schema changes. Context engineering is more agile: it can be updated immediately as business rules evolve, and requires no training infrastructure.
The two approaches are not mutually exclusive. Fine-tuning provides a strong general SQL generation base; context engineering then specialises it to specific databases and domains at inference time.
Schema linking (a component of models like DIN-SQL) is the process of identifying which tables and columns are relevant to a given question before SQL generation. This is a form of targeted context selection. CE-SQL-Analyst's dynamic selection mechanism is functionally similar but broader in scope: it selects not just schema elements but also examples, business rules, statistics, and history components relevant to the question.
Context window constraints: While modern LLMs support 100K+ token context windows, the cost of long contexts scales with length. For databases with hundreds of tables, full schema injection is infeasible, and retrieval-based schema selection becomes necessary.
Static business rules: The current implementation treats business rules as manually curated, static content. In practice, business definitions evolve frequently. Automated extraction of business rules from documentation, code, or metadata would improve scalability.
Question ambiguity: Some natural language questions are genuinely ambiguous and cannot be resolved by any amount of context. Clarification dialogue with the user would be needed for such cases.
Evaluation breadth: The success rate figures reported here are based on qualitative testing rather than a formal benchmark. Rigorous evaluation on standard NL2SQL benchmarks (Spider, BIRD) with and without context engineering would strengthen the empirical claims.
Semantic example retrieval: Replace keyword-based example selection with embedding similarity search over a growing query library, enabling more accurate example matching for complex questions.
Automated business rule extraction: Use LLMs or static analysis to extract business rules from SQL views, stored procedures, and data dictionaries, reducing the manual curation burden.
Adaptive context budgeting: Implement a learned policy that dynamically allocates token budget across context dimensions based on question complexity and historical success patterns.
Multi-agent decomposition: For highly complex questions (e.g., multi-step analytical workflows), decompose into sub-questions, generate SQL for each, and compose results — with context engineering applied at each sub-step.
Cross-database portability: Extend the context engineering layer to handle dialect differences (PostgreSQL vs. MySQL vs. BigQuery), injecting dialect-specific syntax hints as an additional context dimension.
Feedback loop integration: Allow business analysts to rate query results, using this signal to update the quality of stored examples and refine context selection heuristics over time.
This paper introduced Context Engineering as a proactive and principled approach to improving LLM-based NL2SQL systems. By systematically constructing and dynamically selecting from eight categories of context — schema, relationships, business rules, examples, data samples, statistics, query history, and error memory — CE-SQL-Analyst raises first-attempt SQL generation success rates to 80–95%, reduces retry frequency, and produces queries that are genuinely aligned with organisational conventions.
The core insight is that self-healing retry loops and context engineering are not alternatives but complements. Self-healing provides a reactive safety net; context engineering minimises how often that net needs to catch failures. Together, they form a two-layer reliability architecture that makes LLM-based SQL analysis viable for demanding production environments.
The open-source implementation, supporting both GPT-4 and Claude backends, provides a ready foundation for practitioners building NL2SQL systems on real enterprise databases. As LLM capabilities continue to improve, the quality of context engineering — the richness and relevance of the information we provide to models before generation — will increasingly determine the ceiling of system accuracy.
Key Takeaway: Context is to LLMs what domain knowledge is to human experts. The richer, more relevant, and better structured the context, the better the performance — and the fewer the corrections needed.
Lewis, P., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. NeurIPS 2020.
Shinn, N., et al. (2023). Reflexion: Language agents with verbal reinforcement learning. NeurIPS 2023.
Wei, J., et al. (2022). Chain-of-thought prompting elicits reasoning in large language models. NeurIPS 2022.
Yao, S., et al. (2023). ReAct: Synergizing reasoning and acting in language models. ICLR 2023.
| Question Type | Schema | Relations | Bus. Rules | Examples | Samples | Stats | History | Error Ctx |
|---|---|---|---|---|---|---|---|---|
| Simple filter | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | If retry |
| Aggregation | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | If retry |
| Multi-table join | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | If retry |
| Growth / trend | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | If retry |
| Ranking / top-N | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | If retry |
| Self-correction retry | ✅ | Context-dep | ✅ | ✅ | ✅ | Context-dep | ✅ | ✅ |
| Dimension | Self-Healing (Reactive) | Context Engineering (Proactive) |
|---|---|---|
| When it acts | After SQL error occurs | Before SQL is generated |
| Primary mechanism | Error feedback → LLM correction | Rich pre-generation context |
| Latency impact | Adds latency on failure | Reduces failures; small upfront cost |
| Token cost | Variable (retry-dependent) | Predictable; often lower overall |
| Learning mechanism | Error history in prompt | Query history + pattern library |
| Business alignment | Error-driven | Rule-driven (proactive) |
| Best used for | Residual errors after context | Primary quality driver |
| Complementary? | ✅ Yes | ✅ Yes |
© 2026 Shuchismita Sahu. Open-source research. Please cite the GitHub repository and this paper when building upon this work.