Retrieval Augmented Generation (RAG) has revolutionized how we interact with large document collections. This publication details the evolution of the RAG Publications Assistant—from a static document retriever to a professional, session-isolated, and persistent AI platform.
A common failure in multi-document RAG systems is Context Leakage. If a user switches from "Research Paper A" to "Research Paper B," the AI often retains "memory" of the previous document, leading to hallucinations or mixed-context answers.
Our system solves this through a dual-isolation strategy:
- **Fresh session identifiers**: we call `uuid.uuid4()` to generate a new `chat_id` marker each time the user opens a document.
- **Scoped history retrieval**: the assistant only retrieves past questions tagged with the currently active `chat_id`, so conversation memory never crosses document boundaries.
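In code, this isolation can be as small as a few lines. Here is a minimal sketch assuming the Flask backend from the stack below; `start_document_session` and `fetch_scoped_history` are hypothetical names for illustration, not the post's actual functions:

```python
import uuid

from flask import Flask, session

app = Flask(__name__)
app.secret_key = "replace-with-a-real-secret"  # required for Flask sessions

def start_document_session(document_id: str) -> str:
    """Mint a fresh chat_id whenever the user switches documents,
    so no conversation memory carries over from the previous one."""
    chat_id = str(uuid.uuid4())
    session["chat_id"] = chat_id
    session["document_id"] = document_id
    return chat_id

def fetch_scoped_history(all_turns: list[dict]) -> list[dict]:
    """Return only the past turns tagged with the active chat_id."""
    return [t for t in all_turns if t["chat_id"] == session["chat_id"]]
```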
We migrated from Jina to Nomic AI (nomic-embed-text-v1) to leverage task-specific embedding types, which significantly improve retrieval precision:

- **Document embeddings (`search_document`)**: optimize the high-dimensional representation of the publication text for storage.
- **Query embeddings (`search_query`)**: tailor the user's question vector to match the document storage format, reducing noise in semantic search.

```python
# Implementation of task-specific retrieval
query_embedding = generate_nomic_embeddings_batch(
    API_KEY, [refined_question], task_type="search_query"
)[0]
```
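The storage side is symmetric: publication chunks are embedded with `task_type="search_document"` before indexing. Here is a minimal sketch, reusing the post's `generate_nomic_embeddings_batch` helper and assuming a local Qdrant instance with a 768-dimensional collection (the nomic-embed-text-v1 output size):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="http://localhost:6333")  # assumed local Qdrant

# nomic-embed-text-v1 outputs 768-dimensional vectors
client.create_collection(
    collection_name="publications",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

def index_publication(doc_id: str, chunks: list[str]) -> None:
    """Embed chunks with the storage-side task type and upsert them."""
    vectors = generate_nomic_embeddings_batch(
        API_KEY, chunks, task_type="search_document"
    )
    client.upsert(
        collection_name="publications",
        points=[
            PointStruct(
                id=i,
                vector=vec,
                payload={"text": chunk, "document_id": doc_id},
            )
            for i, (chunk, vec) in enumerate(zip(chunks, vectors))
        ],
    )

# At question time, the query-side embedding above searches the same space:
hits = client.search(
    collection_name="publications", query_vector=query_embedding, limit=5
)
```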
Rather than relying on ephemeral session memory alone, we implemented a MySQL-backed persistent storage layer, so conversation history survives page reloads and server restarts.
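A minimal sketch of what such a layer can look like, using mysql-connector-python; the `chat_history` schema and column names are assumptions for illustration, not the post's actual schema:

```python
import mysql.connector  # pip install mysql-connector-python

# Hypothetical schema: one row per Q&A turn, keyed by the session's chat_id
DDL = """
CREATE TABLE IF NOT EXISTS chat_history (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    chat_id     CHAR(36)     NOT NULL,
    document_id VARCHAR(255) NOT NULL,
    question    TEXT         NOT NULL,
    answer      TEXT         NOT NULL,
    created_at  TIMESTAMP    DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_chat_id (chat_id)
)
"""

def save_turn(conn, chat_id: str, document_id: str,
              question: str, answer: str) -> None:
    """Persist one Q&A turn so history survives restarts."""
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO chat_history (chat_id, document_id, question, answer) "
        "VALUES (%s, %s, %s, %s)",
        (chat_id, document_id, question, answer),
    )
    conn.commit()
    cursor.close()

def load_history(conn, chat_id: str) -> list[tuple[str, str]]:
    """Reload only the turns belonging to the active, isolated session."""
    cursor = conn.cursor()
    cursor.execute(
        "SELECT question, answer FROM chat_history "
        "WHERE chat_id = %s ORDER BY created_at",
        (chat_id,),
    )
    rows = cursor.fetchall()
    cursor.close()
    return rows
```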
We moved beyond a simple chat box to a high-fidelity, Glassmorphism-inspired UI.

The complete technology stack:
| Component | Technology |
|---|---|
| Backend | Flask (Python) |
| Vector Database | Qdrant |
| Embeddings | Nomic nomic-embed-text-v1 |
| Relational DB | MySQL |
| LLM Orchestration | LangChain / ChatCohere |
| Chunking Strategy | 1000-character semantic chunks |
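As a concrete reading of the chunking row, here is a sketch of one way to produce roughly 1000-character, boundary-aware chunks with LangChain's `RecursiveCharacterTextSplitter`; the overlap value is an assumption, as the post does not state one:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Splits on paragraph/sentence boundaries first, approximating
# "semantic" chunks under the 1000-character budget from the table.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # target size from the table above
    chunk_overlap=100,  # assumed; not specified in the post
)
chunks = splitter.split_text(publication_text)  # publication_text: raw paper text
```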
The RAG Publications Assistant demonstrates that an effective academic tool requires more than just an LLM; it requires a robust infrastructure for memory persistence and context isolation. By grounding every answer in specific, user-controlled datasets and maintaining strict session boundaries, we provide a reliable platform for deep academic inquiry.