This publication presents a Retrieval-Augmented Generation (RAG) system designed to query diabetic research documents, specifically focusing on traditional Indian medical methodologies. The system enables semantic search and natural language question answering over a curated set of PDFs, providing insights into Ayurvedic and other indigenous approaches to diabetes management.
# System Architecture
The proposed system follows a Retrieval-Augmented Generation (RAG) architecture designed to provide accurate, context-aware responses for diabetes-related queries by integrating traditional Indian medical knowledge with modern medical literature. The architecture consists of four primary components: document ingestion and processing, embedding generation and storage, query-time retrieval, and response generation using a large language model.
Documents are first ingested from curated medical sources and processed into semantically meaningful chunks. These chunks are converted into vector embeddings and stored in a vector database to enable efficient similarity-based retrieval. At inference time, user queries are embedded and matched against the stored vectors to retrieve relevant contextual information, which is then injected into the prompt provided to the language model. This grounding mechanism reduces hallucination and improves factual consistency.
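Concretely, the context-injection step amounts to assembling a prompt template around the retrieved chunks. The sketch below is illustrative only: the function name, template wording, and source-tagging scheme are assumptions, not the system's actual implementation.

```python
def build_augmented_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Assemble a grounded prompt from the user query and retrieved context.

    The template wording here is a hypothetical example; the deployed
    system's prompt may differ.
    """
    context = "\n\n".join(
        f"[Source {i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

Tagging each chunk with a source marker makes it easier for the model to attribute claims and for downstream checks to trace answers back to specific documents.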

The methodology of the proposed RAG system is divided into two major phases: an offline knowledge preparation phase and an online query processing phase.
In the offline phase, domain-specific documents related to diabetes and traditional Indian medical practices are collected, cleaned, and segmented into overlapping text chunks. Each chunk is embedded using a sentence-level embedding model and stored in a vector database for fast similarity search.
In the online phase, the system processes user queries in real time by generating embeddings, retrieving the most relevant document chunks, and constructing an augmented prompt that combines user intent with retrieved context. This augmented prompt is then passed to the language model to generate a grounded and contextually accurate response.
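A minimal sketch of the overlapping segmentation step in the offline phase is shown below. The `chunk_size` and `overlap` defaults are illustrative assumptions; the exact values used by the system are not reported here.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows.

    chunk_size and overlap are illustrative defaults, not the tuned values.
    Overlap preserves context that would otherwise be cut at chunk edges.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window slides each iteration
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the window has reached the end of the text
    return chunks
```

Character windows are the simplest variant; sentence- or paragraph-aware splitting follows the same sliding-window pattern while avoiding mid-sentence cuts.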

# Query Processing Pipeline
When a user submits a query, the system follows a structured query processing pipeline to ensure retrieval relevance and response accuracy. The pipeline begins with basic query normalization, such as lowercasing and noise removal, to ensure consistent embedding generation. The normalized query is then converted into a vector embedding using the same embedding model employed during document indexing.
The query embedding is used to perform a similarity search against the vector database, retrieving the top-k most relevant document chunks. These retrieved chunks are combined with the original query to construct an augmented prompt, which is passed to the language model for response generation. This approach ensures that generated answers are grounded in authoritative medical content.
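The retrieval portion of this pipeline can be sketched as follows. The hashed bag-of-words `embed` function is a deliberately simple stand-in for the actual sentence-embedding model (which this sketch does not specify); what it illustrates is the shared structure — normalize, embed with the same model used at indexing time, then score by cosine similarity, with unit-length vectors so a dot product suffices.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # illustrative embedding dimensionality, not the real model's

def embed(text: str) -> list[float]:
    """Toy hashed bag-of-words embedding. A stand-in for the system's
    sentence-embedding model; the same function must serve documents
    and queries alike."""
    vec = [0.0] * DIM
    tokens = re.findall(r"[a-z0-9]+", text.lower())  # query normalization
    for tok, count in Counter(tokens).items():
        vec[zlib.crc32(tok.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # unit length: dot product == cosine

def top_k(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Return the k corpus chunks most similar to the query."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]
```

In the real system the corpus embeddings would be precomputed and held in the vector database rather than re-embedded per query, as described in the optimization section below.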
# Optimization Techniques
Several optimization strategies were employed to improve retrieval accuracy and system efficiency. Chunk size and overlap were carefully tuned to balance semantic completeness with retrieval precision. Top-k retrieval parameters were selected to ensure sufficient contextual coverage while minimizing irrelevant information.
Additionally, embedding generation was optimized by caching document embeddings during preprocessing, reducing redundant computation at query time. These optimizations collectively improve system responsiveness and retrieval quality without introducing additional computational complexity.
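One way to realize the embedding cache described above is to key stored vectors on a content hash, so each chunk is embedded only on first sight. The class name and the `embed_fn` hook are hypothetical, chosen for illustration.

```python
import hashlib

class EmbeddingCache:
    """Cache embeddings keyed on a content hash so unchanged chunks are
    never re-embedded. `embed_fn` stands in for the real embedding model."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store: dict[str, list[float]] = {}
        self.misses = 0  # number of times the model was actually invoked

    def get(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]
```

Hashing the content (rather than, say, a document ID) also means re-ingested but unmodified chunks are recognized as already embedded.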
# Embedding Model Selection and Configuration
The embedding model was selected based on its ability to capture semantic similarity in medical and health-related textual data. Sentence-level embeddings were chosen to ensure meaningful representation of both traditional Indian medical terminology and modern clinical language.
The selected model produces fixed-dimensional dense vector representations, enabling efficient similarity search using vector databases. This choice ensures compatibility across heterogeneous medical documents while maintaining strong semantic alignment between user queries and retrieved content.
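Because the embeddings are fixed-dimensional dense vectors, a common design choice (assumed here, not stated above) is to normalize them to unit length at index time, so that cosine similarity reduces to a plain dot product:

```python
import math

def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dot(a: list[float], b: list[float]) -> float:
    """Inner product; on unit vectors this is the cosine of the angle."""
    return sum(x * y for x, y in zip(a, b))
```

Many vector databases exploit exactly this reduction by offering inner-product indexes over pre-normalized vectors, which is cheaper than computing full cosine similarity per comparison.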
# Significance and Implications
The proposed RAG system demonstrates the effectiveness of grounding large language models with curated domain-specific knowledge to reduce hallucinations and improve response reliability in healthcare applications. By integrating traditional Indian medical literature with contemporary diabetes research, the system enables holistic knowledge access that would otherwise be fragmented across sources.
This approach has significant implications for medical decision support, patient education, and knowledge discovery, particularly in resource-constrained environments. The system architecture is adaptable to other medical domains, making it a reusable blueprint for trustworthy AI-assisted healthcare applications.
# Methodology Summary
- Data Collection: Gathered diabetic research PDFs related to traditional Indian medicine; extracted text using PDF parsers and cleaned it.
- Chunking Strategy: Applied semantic chunking.
- Embedding Model: Used multilingual sentence embeddings.
- Vector Store: Indexed chunks for efficient similarity search.
- Query Flow: User query → embedding → similarity search → context retrieval → LLM-generated answer.
- Accuracy: The system successfully retrieved relevant content.