A simple web application that lets you ask natural-language questions about your PostgreSQL and MySQL database and have them converted into SQL queries by an LLM. It includes schema browsing, query confirmation for destructive statements, and result display with pagination and hover-tooltips.
Result Rendering: Paginated, responsive table with hover popovers for long content.
Database Connection: Connect to or create databases directly from the UI.
Backend Workflow
π§ Environment Variables
Place a .env file at the project root with the following:
# For Together AI (default)
LLM_API_KEY=your-together-api-key
LLM_API_URL=https://api.together.xyz/v1/chat/completions
LLM_API_MODEL_NAME=meta-llama/Llama-3-70B-Instruct
# OR for Gemini (optional alternative)
GEMINI_API_URL=https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=your-gemini-api-key
π Local Development
1. Clone the repository
git clone https://github.com/ah-naf/nlsql.git
cd nlsql
2. Backend setup
cd backend
go mod tidy
3. Frontend setup
cd../frontend
npminstallnpm run build
4. Run the application
cd../backend
go run cmd/main.go
Open http://localhost:8080 in your browser.
π³ Run with Docker
This project supports multi-stage Docker builds (frontend + backend). Here's how to build and run:
1. Build the image
docker build -t nlsql-app .
2. Run the container
docker run -p 8080:8080 nlsql-app
Make sure your PostgreSQL instance is accessible from the container.
Add a new user named Alice with email alice@example.com
Delete all rows from temp_sessions table
Problems Solved
No SQL Expertise Required: Users can obtain data insights without writing or understanding SQL syntax.
Faster Querying: Business analysts or non-technical stakeholders can run ad-hoc queries by typing plain English, speeding up report generation.
Reduced Human Error: Converting natural language directly into SQL minimizes typos or syntax mistakes that often occur when writing SQL manually.
Interactive Schema Exploration: The schema browser allows users to discover table and column structures without switching to a separate database client.
Safe Destructive Operations: Built-in confirmation flow prevents accidental data loss from INSERT, UPDATE, or DELETE statements.
Transaction Support: Destructive queries can be wrapped in transactions; users receive confirmation prompts before committing changes.
Uniform Interface for Multiple Databases: Supports both PostgreSQL and MySQL backends, making it easy to switch or scale across different environments.
Future Work
Support Additional SQL/NoSQL Engines:
Add connectors for other relational databases (e.g., SQL Server, SQLite) and popular NoSQL stores (e.g., MongoDB, DynamoDB) to broaden applicability.
Enhanced Prompt History:
Maintain a conversation context so that follow-up questions (e.g., βShow only those hired after 2023β) are interpreted correctly relative to previous queries.
User Authentication & Roles:
Introduce role-based access control (RBAC) so that certain users can only view data while others can perform mutations.
Natural Language Refinement:
Implement pre-processing techniques (spell correction, stop-word removal) to handle more ambiguous or complex user inputs.
Visual Query Builder Integration:
Combine the LLM-powered approach with a drag-and-drop query builder for users who want both natural language and GUI-based query construction.
Performance Monitoring & Caching:
Track query execution times and cache frequently requested results to improve responsiveness.
Multi-Language Support:
Extend natural language parsing to support languages beyond English (e.g., Spanish, Chinese) by leveraging multilingual LLMs.
Limitations
LLM Context Window Constraints:
The free or low-cost LLM endpoints have a limited context window (e.g., 4Kβ8K tokens). As schemas grow (many tables/columns), prompts may exceed the allowable token limit, leading to truncated or degraded SQL generation.
Stateless Query Processing:
Without explicit conversational context, each query is evaluated independently. Follow-up questions that reference previous results may produce incorrect SQL unless the context is manually provided in the prompt.
Model Cost & Rate Limits:
Using free LLM tiers means stricter rate limits and potential throttling during peak usage. Upgrading to a paid plan may be required for production-level performance.
Synthetic Schema Exposure:
Embedding an entire database schema in the prompt could leak sensitive structure information if the LLM providerβs security controls are not robust.
Edge-Case SQL Parsing:
For very complex queries (window functions, subqueries in nested transactions), the LLM may generate syntactically valid SQL that does not align with user intent or may not optimize performance for large datasets.
Lack of Offline Mode:
Since the architecture depends on external LLM APIs, the application cannot generate queries when offline or if the LLM service is unavailable.
Acknowledgements
LLM Service Providers β Together AI, Gemini API, and meta-llamaβs open models for powering SQL generation.
PostgreSQL & MySQL β Foundations for reliable, ACID-compliant relational storage.
Go (Golang) & React β For a performant backend and a responsive frontend experience.
Docker & Docker Compose β Streamlining deployment and environment consistency.