I've developed a Proof of Concept (PoC) for a Retrieval-Augmented Generation (RAG) chatbot using Java 21 and Spring Boot.
The goal was to experiment with building a fully offline, local-first AI system that combines web scraping, embeddings, and a lightweight LLM.
The application integrates a local LLM (Mistral via Ollama), an in-memory vector store, and the all-MiniLM-L6-v2 embedding model (via DJL).
You can submit URLs, extract their content via Firecrawl, generate embeddings from that content, and then query the system for context-aware responses, all running 100% locally.
You can find the full source code on GitHub: https://github.com/lucadallavecchia/rag-poc
The idea was to explore how a Spring Boot application could:

- scrape web content on demand (via Firecrawl),
- generate embeddings locally with DJL and all-MiniLM-L6-v2,
- retrieve relevant chunks from an in-memory vector store, and
- generate answers with a local LLM (Mistral via Ollama).

Everything runs offline, without depending on external LLM APIs or SaaS tools.
| Tool/Library | Role |
|---|---|
| Java 21 | Language |
| Spring Boot | Backend framework |
| Ollama + Mistral | Local LLM for generation |
| Spring AI | Integration for LLM & embedding |
| DJL + MiniLM | Embedding model (all-MiniLM-L6-v2) |
| Firecrawl | Web scraping via API |
| In-memory store | Vector store for embedding chunks |
| REST API | For document upload + querying context |
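To make the retrieval step concrete, here is a minimal, dependency-free sketch of a cosine-similarity lookup over an in-memory store. The names (`RetrievalSketch`, `Chunk`, `topK`) are illustrative, not the project's actual API; in the real app the vectors are the 384-dimensional embeddings produced by all-MiniLM-L6-v2.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of in-memory vector retrieval; not the project's real classes.
public class RetrievalSketch {

    // A stored text chunk together with its embedding vector.
    record Chunk(String text, float[] embedding) {}

    // Cosine similarity: dot product divided by the product of the norms.
    static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return the k chunks most similar to the query embedding.
    static List<Chunk> topK(List<Chunk> store, float[] query, int k) {
        return store.stream()
                .sorted(Comparator.comparingDouble(
                        (Chunk c) -> cosineSimilarity(c.embedding(), query)).reversed())
                .limit(k)
                .toList();
    }

    public static void main(String[] args) {
        List<Chunk> store = new ArrayList<>();
        store.add(new Chunk("about cats", new float[]{1f, 0f}));
        store.add(new Chunk("about dogs", new float[]{0f, 1f}));
        float[] query = {0.9f, 0.1f};
        // The query vector points mostly along the "cats" direction.
        System.out.println(topK(store, query, 1).get(0).text()); // prints "about cats"
    }
}
```

The retrieved chunks are then concatenated into the prompt context before the question is handed to Mistral.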
# How it works

1. You submit URLs to the `/documents` endpoint; their content is scraped, chunked, and embedded.
2. When you query `/ask`, the top relevant chunks are selected using cosine similarity.
3. The resulting prompt (Context + Question) is sent to the local Mistral model via Ollama.

# Prerequisites

- Java 21
- Ollama with the `mistral` model installed
- Firecrawl API key

# Clone and run the project

```
git clone https://github.com/lucadallavecchia/rag-poc
cd rag-poc
./gradlew bootRun
```
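Once the app is running, the two endpoints can be exercised from the command line. The request payload shapes below are assumptions for illustration; check the controller DTOs in the repo for the actual field names.

```
# Assumed payload shapes -- verify against the actual request DTOs.
curl -X POST http://localhost:8080/documents \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

curl -X POST http://localhost:8080/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the article about?"}'
```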