I've developed a Proof of Concept (PoC) for a Retrieval-Augmented Generation (RAG) chatbot using Java 21 and Spring Boot.
The goal was to experiment with building a fully offline, local-first AI system that combines web scraping, embeddings, and a lightweight LLM.
The application integrates a local LLM (Mistral via Ollama), an in-memory vector store, and the all-MiniLM-L6-v2 embedding model (via DJL).
You can submit URLs, extract their content via Firecrawl, generate embeddings from that content, and then query the system for context-aware responses, all running 100% locally.
You can find the full source code on GitHub: https://github.com/lucadallavecchia/rag-poc
The idea was to explore how a Spring Boot application could:

- scrape web content on demand (via Firecrawl),
- generate embeddings locally with DJL and all-MiniLM-L6-v2,
- retrieve relevant chunks from an in-memory vector store, and
- generate answers with a local LLM (Mistral via Ollama).

Everything runs offline, without depending on external LLM APIs or SaaS tools.
| Tool/Library | Role |
|---|---|
| Java 21 | Language |
| Spring Boot | Backend framework |
| Ollama + Mistral | Local LLM for generation |
| Spring AI | Integration for LLM & embedding |
| DJL + MiniLM | Embedding model (all-MiniLM-L6-v2) |
| Firecrawl | Web scraping via API |
| In-memory store | Vector store for embedding chunks |
| REST API | For document upload + querying context |
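To make the retrieval step concrete, here is a minimal, dependency-free sketch of a cosine-similarity lookup over an in-memory store. The names (`RetrievalSketch`, `Chunk`, `topK`) are illustrative, not the project's actual API; in the real app the vectors are the 384-dimensional embeddings produced by all-MiniLM-L6-v2.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of in-memory vector retrieval; not the project's real classes.
public class RetrievalSketch {

    // A stored text chunk together with its embedding vector.
    record Chunk(String text, float[] embedding) {}

    // Cosine similarity: dot product divided by the product of the norms.
    static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return the k chunks most similar to the query embedding.
    static List<Chunk> topK(List<Chunk> store, float[] query, int k) {
        return store.stream()
                .sorted(Comparator.comparingDouble(
                        (Chunk c) -> cosineSimilarity(c.embedding(), query)).reversed())
                .limit(k)
                .toList();
    }

    public static void main(String[] args) {
        List<Chunk> store = new ArrayList<>();
        store.add(new Chunk("about cats", new float[]{1f, 0f}));
        store.add(new Chunk("about dogs", new float[]{0f, 1f}));
        float[] query = {0.9f, 0.1f};
        // The query vector points mostly along the "cats" direction.
        System.out.println(topK(store, query, 1).get(0).text()); // prints "about cats"
    }
}
```

The retrieved chunks are then concatenated into the prompt context before the question is handed to Mistral.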
# How it works

1. You submit URLs to the `/documents` endpoint; their content is scraped, chunked, and embedded.
2. When you query `/ask`, the top relevant chunks are selected using cosine similarity.
3. The resulting prompt (Context + Question) is sent to the local Mistral model via Ollama.

# Prerequisites

- Java 21
- Ollama with the `mistral` model installed
- Firecrawl API key

# Clone and run the project

```
git clone https://github.com/lucadallavecchia/rag-poc
cd rag-poc
./gradlew bootRun
```
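Once the app is running, the two endpoints can be exercised from the command line. The request payload shapes below are assumptions for illustration; check the controller DTOs in the repo for the actual field names.

```
# Assumed payload shapes -- verify against the actual request DTOs.
curl -X POST http://localhost:8080/documents \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

curl -X POST http://localhost:8080/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the article about?"}'
```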