The Modeling and Simulation RAG System is a Retrieval-Augmented Generation framework developed to provide efficient access to domain-specific knowledge in the fields of modeling and simulation. By integrating vector-based document retrieval with large language model generation, the system allows users to pose natural language queries and receive accurate, context-grounded responses drawn from a custom set of documents. The primary motivation is to address the challenges of information retrieval from technical literature, where traditional search methods often fall short of providing precise and comprehensive answers. This publication details the system's architecture, implementation, and evaluation, demonstrating its utility for researchers, students, and practitioners in AI and simulation-related disciplines.
Modeling and simulation play pivotal roles in various scientific and engineering domains, enabling the analysis and prediction of complex systems without the need for physical prototypes or extensive real-world testing. However, accessing specific information from vast repositories of technical documents can be time-consuming and inefficient, often requiring manual sifting through irrelevant content. This inefficiency motivates the development of advanced retrieval systems that can intelligently extract and synthesize relevant information. The Modeling and Simulation RAG System emerges from this need, applying retrieval-augmented generation to provide users with quick, accurate answers to queries such as "What is the role of agent-based modeling in system simulation?" The system is built on the premise that integrating local vector stores with cloud-based language models can bridge the gap between unstructured document data and user queries, enhancing productivity in research and education.
At a high level, the system operates through a pipeline that begins with document ingestion, where text files containing information on modeling and simulation are processed and stored in a vector database. When a user submits a query, the system retrieves the most relevant document chunks using similarity search and augments the query with this context before passing it to a large language model for response generation. This architecture ensures that answers are grounded in the provided documents, reducing hallucinations common in standalone LLMs. The design prioritizes simplicity and efficiency, with a command-line interface for interaction, making it accessible for users without extensive technical setup.
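To make this flow concrete, the sketch below reduces the query path to its three steps. It is framework-agnostic: the `retriever` and `llm` objects, and their `retrieve` and `generate` methods, are hypothetical stand-ins for the FAISS retriever and Groq-backed generator described in the following sections.

```python
# Framework-agnostic sketch of the query path; `retriever` and `llm` are
# hypothetical stand-ins for the FAISS retriever and Groq LLM described below.
def answer_query(query: str, retriever, llm, k: int = 3) -> str:
    # 1. Retrieve the k document chunks most similar to the query.
    chunks = retriever.retrieve(query, k=k)

    # 2. Augment the query with the retrieved context.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

    # 3. Generate a context-grounded response with the language model.
    return llm.generate(prompt)
```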
The component architecture consists of several interconnected modules. The document loader handles input from the docs/ directory, splitting text into manageable chunks with a recursive character splitter to preserve context across chunk boundaries. Embeddings are generated with the all-MiniLM-L6-v2 sentence-transformer model from Hugging Face, which transforms text into vector representations stored in FAISS for fast retrieval. The retriever component performs similarity searches to fetch the top-k chunks, while the generator uses a Groq-hosted LLM to produce natural language responses. A fallback mechanism returns the raw retrieved documents if the API connection fails, ensuring reliability. This modular design allows for easy extension, such as adding support for PDF files or alternative embedding models.
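A minimal sketch of what the ingestion side of this architecture could look like with LangChain is given below. The chunk overlap value (50 characters), the docs/*.txt glob pattern, and the faiss_index save path are assumptions not specified above, and import paths vary across LangChain versions.

```python
# Sketch of the ingestion pipeline: load, split, embed, and index documents.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load all text files from the docs/ directory.
loader = DirectoryLoader("docs/", glob="**/*.txt", loader_cls=TextLoader)
documents = loader.load()

# Split into ~500-character chunks with overlap to preserve local context.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# Embed chunks with all-MiniLM-L6-v2 and index them in FAISS.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embeddings)

# Persist the index locally so retrieval can run without re-embedding.
vector_store.save_local("faiss_index")
```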
Although the system does not use LangGraph for orchestration, the workflow is managed through LangChain's chaining mechanisms, which sequence the retrieval and generation steps. The pipeline is initiated by loading the FAISS index, configuring the retriever with similarity search parameters (k=3), and defining a prompt template that incorporates the query and the retrieved context. This orchestration ensures a seamless flow from user input to output, with logging for debugging network or processing issues. The design leverages LangChain's flexibility to create a robust process that can handle multiple queries within a single session.
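The chain setup might look roughly like the following LangChain Expression Language sketch. The prompt wording, the Groq model name, and the index path are illustrative assumptions, the Groq API key is read from the GROQ_API_KEY environment variable, and the allow_dangerous_deserialization flag is required only in recent LangChain versions.

```python
# Sketch of the retrieval-plus-generation chain (LCEL-style composition).
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_groq import ChatGroq

# Load the persisted index and expose it as a similarity retriever (k=3).
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 3})

# Prompt template that injects the retrieved context alongside the query.
prompt = PromptTemplate.from_template(
    "Use the following context to answer the question.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

# Groq-hosted chat model; the model name is an illustrative choice.
llm = ChatGroq(model="llama-3.1-8b-instant")

def format_docs(docs):
    # Concatenate retrieved chunks into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)

# Compose retrieval, prompting, generation, and output parsing into one chain.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is the role of agent-based modeling in system simulation?"))
```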
The core system components are the ingestion script (generate_vector_embeddings.py), which processes documents into a FAISS vector store, and the query script (language_model.py), which sets up the RAG chain. The ingestion module loads text files, splits them into 500-character chunks with overlap, generates embeddings, and saves the index locally; once the index is built, retrieval can operate offline. The query module loads the index, configures the retriever and LLM, and handles user interactions through a command-line interface (CLI) loop, providing answers or falling back to raw retrieved text as needed.
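The interactive loop and fallback path could be sketched as follows, reusing the rag_chain and retriever defined above; the prompt strings and exception handling are assumptions, and the real language_model.py may differ in detail.

```python
# Sketch of the CLI loop with the fallback to raw retrieved documents.
def run_cli(rag_chain, retriever):
    while True:
        query = input("\nQuestion (or 'exit' to quit): ").strip()
        if query.lower() in {"exit", "quit"}:
            break
        try:
            # Normal path: retrieval-augmented answer from the Groq-backed chain.
            print(rag_chain.invoke(query))
        except Exception as exc:
            # Fallback path: if the API call fails, show the raw retrieved chunks.
            print(f"Generation failed ({exc}); returning retrieved passages instead:")
            for doc in retriever.invoke(query):
                print("-" * 60)
                print(doc.page_content)
```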
In this system, the "agent" functionality is embodied in the retrieval and generation modules. The retrieval module acts as an agent for context gathering, using FAISS to search and rank document chunks based on query similarity. The generation module, powered by Groq's LLM, functions as a reasoning agent, synthesizing the retrieved context into coherent responses. Together, these modules simulate agent-like behavior without explicit multi-agent orchestration, focusing on efficiency for domain-specific queries in modeling and simulation.