
The GPT-Interpretable Language (GIL) Compiler is a domain-specific compiler framework that translates high-level, human-readable task specifications into optimized prompts for large language models (LLMs). By introducing a structured interface for goal declaration, GIL abstracts the complexities of prompt engineering and facilitates consistent, accurate, and interpretable interactions with generative AI systems. This paper presents the design, methodology, system architecture, and empirical evaluation of GIL, demonstrating its effectiveness in improving LLM performance across summarization, classification, and information extraction tasks.
Large language models like GPT-4 have transformed natural language processing, yet their performance remains highly sensitive to prompt phrasing. This dependence creates a barrier for non-technical users and limits scalability for developers. GIL addresses this by introducing a high-level language interface that allows users to express intent in structured natural language or declarative form, which is then compiled into optimized prompts. The GIL compiler treats prompt construction as a code generation problem, leveraging techniques from programming language theory to ensure systematic, repeatable, and tunable prompt creation.
Prompt engineering remains an opaque process that relies heavily on trial and error. Existing systems lack the interpretability, composability, and abstraction layers needed to build modular and scalable LLM applications. GIL aims to:
+-----------------------+
| User Goal Input (DSL) |
+-----------------------+
            |
            v
+-----------------------+      +--------------------------+
| Parser                | ---> | Semantic Analyzer        |
+-----------------------+      +--------------------------+
            |                               |
            v                               v
+---------------------------------------------------------+
| Prompt Optimization and Template Generation             |
+---------------------------------------------------------+
            |
            v
+-----------------------+
| LLM Invocation        |
+-----------------------+
            |
            v
+-----------------------+
| Output Response       |
+-----------------------+
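The stages in the diagram can be read as a chain of small functions. The following is a minimal sketch of that chain, not the GIL implementation: `TaskSpec`, `parse`, `analyze`, `generate_prompt`, `compile_goal`, and the `invoke_llm` callable are all hypothetical names introduced for illustration, and the prompt wording is an assumed template.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical intermediate representation produced by the parser."""
    verb: str                      # e.g. "SUMMARIZE"
    target: str                    # e.g. "news_articles"
    modifiers: dict = field(default_factory=dict)


def parse(source: str) -> TaskSpec:
    """Parser stage: split 'VERB target ... key=value ...' into a TaskSpec.

    Tokens without '=' (such as WITH or AND) are ignored in this sketch.
    """
    verb, target, *rest = source.split()
    modifiers = dict(token.split("=", 1) for token in rest if "=" in token)
    return TaskSpec(verb.upper(), target, modifiers)


# Assumed set of task verbs, taken from the tasks named in the paper.
KNOWN_VERBS = {"SUMMARIZE", "CLASSIFY", "EXTRACT"}


def analyze(spec: TaskSpec) -> TaskSpec:
    """Semantic analysis stage: reject task verbs the compiler does not know."""
    if spec.verb not in KNOWN_VERBS:
        raise ValueError(f"unknown task verb: {spec.verb}")
    return spec


def generate_prompt(spec: TaskSpec) -> str:
    """Template generation stage: render the spec as a natural-language prompt."""
    prompt = f"{spec.verb.capitalize()} the following {spec.target.replace('_', ' ')}."
    for key, value in sorted(spec.modifiers.items()):
        prompt += f" Use {key}: {value.replace('_', ' ')}."
    return prompt


def compile_goal(source: str, invoke_llm=None) -> str:
    """Run the stages in diagram order; the LLM call is an injected callable."""
    prompt = generate_prompt(analyze(parse(source)))
    return invoke_llm(prompt) if invoke_llm else prompt
```

Passing the model call in as a parameter keeps the compile stages testable without network access; calling `compile_goal` with no `invoke_llm` simply returns the generated prompt.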
GIL introduces a DSL comprising structured instructions:
SUMMARIZE news_articles WITH tone=neutral AND format=bullet_points
CLASSIFY user_reviews INTO categories=positive,negative,neutral
EXTRACT named_entities FROM legal_document IN json_format
Modifiers such as tone, format, length, and audience guide the prompt refinement process.
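To make the statement forms concrete, here is a hedged sketch of how such instructions might be parsed into structured data. The grammar is an assumption inferred only from the three example statements above; `PATTERNS` and `parse_statement` are hypothetical names, and a real GIL grammar may accept more forms than these.

```python
import re

# One pattern per statement form shown in the examples. This grammar is an
# assumption inferred from those examples, not a published specification.
PATTERNS = {
    "SUMMARIZE": re.compile(r"^SUMMARIZE (?P<target>\w+)(?: WITH (?P<mods>.+))?$"),
    "CLASSIFY": re.compile(
        r"^CLASSIFY (?P<target>\w+) INTO categories=(?P<categories>[\w,]+)$"
    ),
    "EXTRACT": re.compile(
        r"^EXTRACT (?P<target>\w+) FROM (?P<source>\w+)(?: IN (?P<format>\w+))?$"
    ),
}


def parse_statement(line: str) -> dict:
    """Parse one DSL statement into a plain dict of its parts."""
    line = line.strip()
    verb = line.split(" ", 1)[0]
    pattern = PATTERNS.get(verb)
    match = pattern.match(line) if pattern else None
    if match is None:
        raise ValueError(f"cannot parse statement: {line!r}")
    # Keep only the named groups that actually matched.
    parsed = {"verb": verb}
    parsed.update({k: v for k, v in match.groupdict().items() if v is not None})
    if "mods" in parsed:
        # "tone=neutral AND format=bullet_points" -> {"tone": ..., "format": ...}
        pairs = (pair.split("=", 1) for pair in parsed.pop("mods").split(" AND "))
        parsed["modifiers"] = {key: value for key, value in pairs}
    if "categories" in parsed:
        parsed["categories"] = parsed["categories"].split(",")
    return parsed
```

For instance, `parse_statement("CLASSIFY user_reviews INTO categories=positive,negative,neutral")` yields a dict whose `categories` entry is a list of the three labels, which a later stage could turn into prompt text.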
| Task | Metric | Manual | Template | GIL |
|---|---|---|---|---|
| Summarization | Relevance Score (1–5) | 3.7 | 4.1 | 4.6 |
| Entity Extraction | Precision (%) | 82.1 | 86.5 | 91.3 |
| Sentiment Analysis | Accuracy (%) | 88.2 | 90.0 | 93.5 |
| Style Adherence | Style Score (1–5) | 3.9 | 4.0 | 4.8 |
| Prompt Length | Avg. tokens per prompt | 47 | 38 | 33 |
GIL outperformed both the manual and template baselines on every task in the table. Its prompts were also the shortest, averaging 33 tokens against 47 for manual prompts and 38 for template-based prompts.
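The size of these gains is easier to compare as relative changes. The short sketch below computes them from the values in the results table; the `results` dictionary and the `relative_change` helper are illustrative names, and the figures are copied from the table rather than newly measured.

```python
# Values copied from the results table, in the order (manual, template, GIL).
results = {
    "Summarization relevance (1-5)": (3.7, 4.1, 4.6),
    "Entity extraction precision (%)": (82.1, 86.5, 91.3),
    "Sentiment analysis accuracy (%)": (88.2, 90.0, 93.5),
    "Style adherence score (1-5)": (3.9, 4.0, 4.8),
    "Average tokens per prompt": (47, 38, 33),
}


def relative_change(baseline: float, value: float) -> float:
    """Percentage change of value relative to baseline (negative means lower)."""
    return (value - baseline) / baseline * 100


for name, (manual, template, gil) in results.items():
    print(
        f"{name}: {relative_change(manual, gil):+.1f}% vs manual, "
        f"{relative_change(template, gil):+.1f}% vs template"
    )
```

Note that for prompt length a negative change is the improvement: GIL prompts use roughly 30% fewer tokens than manual prompts (47 down to 33).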
The DSL grammar currently supports only English-based commands.
The quality of the final output depends on the quality of the underlying LLM API.
Future extensions include:
The GIL Compiler presents a novel approach to prompt engineering: it introduces a high-level structured language for expressing user intent and systematically compiles that language into optimized LLM prompts. It improves task performance, reduces prompt complexity, and enhances interpretability. GIL represents a step toward making LLMs programmable, modular, and user-friendly.
© Craig ML Dsouza, 2025.