The rapid proliferation of high-velocity, multi-format data in the domains of finance, economics, and public policy presents a significant challenge for traditional information retrieval and analysis. Large Language Models (LLMs), while powerful, are constrained by the static nature of their training corpora, rendering them inadequate for real-time, evidence-based reasoning on contemporary events. To address this, we present a modular Retrieval-Augmented Generation (RAG) agent designed for specialized question-answering. The system architecture integrates Google's Gemini-1.5-Pro as its core reasoning engine, a local ChromaDB instance for knowledge persistence, and a suite of tools for dynamic data ingestion from heterogeneous sources, including structured APIs (NewsAPI, GDELT) and semi-structured RSS/Atom feeds. Its primary contribution is a dynamic, tool-driven ingestion pipeline that allows the agent to build a contextually relevant, ephemeral knowledge base on-demand. We demonstrate through qualitative evaluation that this approach enables nuanced, evidence-based question-answering on complex, evolving topics, providing a robust framework for specialized information synthesis.
Keywords: Retrieval-Augmented Generation (RAG), Large Language Models (LLM), Tool-Using Agents, Information Retrieval, Financial News Analysis, Knowledge Persistence.
The velocity and volume of information generated in the global economic and policy landscape necessitate advanced computational tools for timely analysis. Decision-makers, analysts, and researchers require the ability to synthesize information from a diverse set of sources, ranging from mainstream press to specialized institutional reports. While Large Language Models (LLMs) have demonstrated profound capabilities in natural language understanding and generation [1], they inherently lack access to information beyond their training data cutoff. This limitation severely curtails their utility for tasks requiring up-to-the-minute knowledge.
The Retrieval-Augmented Generation (RAG) paradigm [2] has emerged as a potent solution, augmenting LLM reasoning with information retrieved from external knowledge bases. However, most RAG implementations rely on static or pre-populated vector stores. Our work extends this paradigm by introducing a fully autonomous, tool-using agent that dynamically constructs its knowledge base in response to user queries.
This paper details the architecture and methodology of such an agent. The system leverages the native tool-calling capabilities of the Gemini-1.5-Pro model [3] to orchestrate a data pipeline that:
1. Selects and invokes the appropriate ingestion tool (NewsAPI, GDELT, or a specified RSS/Atom feed) for a given query;
2. Embeds the retrieved text and persists it, together with source metadata, in a local ChromaDB vector store;
3. Retrieves the most relevant document chunks via similarity search; and
4. Synthesizes a concise, source-cited answer from the retrieved context.
We posit that this agent-driven, on-demand RAG approach provides a flexible and powerful blueprint for developing specialized, real-time information analysis systems.
The system is architected as a modular, closed-loop pipeline orchestrated by a central agent. The core components are illustrated in Figure 1 and detailed below.
Figure 1: System architecture detailing the flow from user query to answer generation through ingestion, persistence, and retrieval cycles.
2.1. Core Reasoning Engine: The system is built around the gemini-1.5-pro-latest model, invoked via the LangChain framework. Its primary functions are planning, tool selection, and final answer synthesis.
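A minimal sketch of this setup, assuming the langchain-google-genai package and a GOOGLE_API_KEY in the environment (the tool list is a placeholder for the StructuredTool objects of Section 2.2):

```python
from langchain_google_genai import ChatGoogleGenerativeAI

# Placeholder for the StructuredTool objects described in Section 2.2.
tools: list = []

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro-latest",
    temperature=0.0,  # deterministic planning and synthesis
)

# Binding the tools lets the model emit native tool calls during planning.
agent_llm = llm.bind_tools(tools)
```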
2.2. Data Ingestion Subsystems: A suite of StructuredTool objects provides interfaces to heterogeneous external data sources (a sketch of one such tool follows this list):
NewsAPI Tool: Accesses high-latency, mainstream English-language news via a RESTful API.
GDELT Tool: Queries the GDELT project's massive global news database for low-latency, multilingual coverage.
RSS Tool: A generic parser for ingesting content from any specified RSS or Atom feed, targeting specialist publications.
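As an illustration, the RSS tool might be defined as below. This is a hedged sketch built on feedparser; the function and field names are assumptions rather than the paper's exact implementation, and persist is a hypothetical helper sketched in Section 2.3:

```python
import feedparser
from langchain_core.tools import StructuredTool

def rss_import(feed_url: str) -> str:
    """Ingest articles from the RSS or Atom feed at feed_url."""
    feed = feedparser.parse(feed_url)
    docs = [
        {
            "text": f"{entry.get('title', '')}. {entry.get('summary', '')}",
            "source": feed.feed.get("title", feed_url),
            "url": entry.get("link", ""),
            "date": entry.get("published", ""),
        }
        for entry in feed.entries
    ]
    persist(docs)  # hypothetical helper; see the persistence sketch in Section 2.3
    return f"Ingested {len(docs)} items from {feed_url}."

rss_tool = StructuredTool.from_function(
    func=rss_import,
    name="rss_import",
    description="Ingest articles from a specified RSS or Atom feed URL.",
)
```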
2.3. Knowledge Persistence Layer: This layer is responsible for storing and indexing retrieved information.
Embedding Model: The all-MiniLM-L6-v2 sentence-transformer model is used to generate 384-dimensional vector embeddings of text chunks. This model was chosen for its favorable balance of embedding quality and computational efficiency under local execution.
Vector Store: A local ChromaDB instance is used for persistent storage. Each document is stored with its text content, vector embedding, and critical metadata: source, url, and date.
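A sketch of this layer under the stated assumptions (local ChromaDB persistence, all-MiniLM-L6-v2 embeddings, and the metadata schema above); the collection name and ID scheme are illustrative:

```python
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma_store")
collection = client.get_or_create_collection("news")  # collection name is an assumption

def persist(docs: list[dict]) -> None:
    """Embed each text chunk and upsert it with its citation metadata."""
    texts = [d["text"] for d in docs]
    vectors = embedder.encode(texts).tolist()  # 384-dimensional vectors
    collection.upsert(
        ids=[d["url"] for d in docs],  # article URL doubles as a stable ID (assumption)
        documents=texts,
        embeddings=vectors,
        metadatas=[
            {"source": d["source"], "url": d["url"], "date": d["date"]}
            for d in docs
        ],
    )
```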
2.4. Retrieval and Synthesis Workflow: The agent follows a multi-step process. After an initial ingestion phase, a dedicated Vector Search Tool is used to perform a similarity search (cosine distance) against the populated ChromaDB instance. The retrieved context is then provided to the agent for the final synthesis step.
The operational workflow of the agent constitutes a cognitive loop, executed for each user query.
3.1. Query Decomposition and Tool Selection: Upon receiving a user query, the Gemini agent, guided by a system prompt, analyzes the request. It determines the optimal strategy for information gathering, selecting one or more tools from its available suite. For example, a query about "ECB policy" might trigger the RSS tool with the ECB's feed URL, while a query about "global supply chains" might trigger the GDELT tool.
3.2. Data Ingestion and Vectorization: The selected tool is executed with parameters inferred by the LLM. The raw data (JSON or XML) is parsed, and relevant text (e.g., title, description) is extracted for each article. Each extracted text chunk is then passed to the embedding model to compute its vector representation. The text, metadata, and vector are subsequently "upserted" into the ChromaDB collection.
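For the NewsAPI case, this step might look like the following sketch. The field names follow NewsAPI's documented /v2/everything response schema; the function name and the hypothetical persist helper mirror the earlier sketches:

```python
import requests

def news_api_search(query: str, api_key: str) -> str:
    """Fetch articles for the query and persist title/description chunks."""
    resp = requests.get(
        "https://newsapi.org/v2/everything",
        params={"q": query, "language": "en", "apiKey": api_key},
        timeout=30,
    )
    articles = resp.json().get("articles", [])
    docs = [
        {
            "text": f"{a.get('title') or ''}. {a.get('description') or ''}",
            "source": (a.get("source") or {}).get("name", ""),
            "url": a.get("url") or "",
            "date": a.get("publishedAt") or "",
        }
        for a in articles
    ]
    persist(docs)  # hypothetical helper from the persistence sketch (Section 2.3)
    return f"Loaded {len(docs)} articles for query '{query}'."
```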
3.3. Contextual Retrieval: Once the ingestion phase is complete (as indicated by the tool's output message), the agent's internal state directs it to use the Vector Search Tool. This tool performs a similarity search against the vector store, using the embedding of the original user query as the search vector. A temporal filter is applied by default, prioritizing documents published within the last 30 days. The top-k (default k=8) most relevant document chunks are returned as context.
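A sketch of the Vector Search Tool, reusing the collection and embedder from the Section 2.3 sketch; it assumes publication dates are stored as Unix timestamps in the metadata (a conversion not shown earlier) so that ChromaDB's $gte operator can express the 30-day cutoff:

```python
import time

def vector_database_search(query: str, k: int = 8, window_days: int = 30) -> list[dict]:
    """Return the top-k chunks published within the given time window."""
    cutoff = time.time() - window_days * 86400
    results = collection.query(
        query_embeddings=[embedder.encode(query).tolist()],
        n_results=k,
        where={"date": {"$gte": cutoff}},  # default temporal filter: last 30 days
    )
    # Pair each chunk with the metadata the agent needs for citations.
    return [
        {"text": doc, **meta}
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    ]
```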
3.4. Answer Synthesis and Citation: The retrieved document chunks are formatted and prepended to the final prompt for the Gemini agent. The agent's task is to synthesize this context into a coherent, concise answer that directly addresses the user's query. The system prompt explicitly instructs the agent to cite its sources using the metadata (source, date, url) provided with the retrieved context, ensuring traceability and verifiability.
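The context-formatting step might resemble the following; the prompt wording is illustrative, not the paper's verbatim system prompt:

```python
def build_context(chunks: list[dict]) -> str:
    """Format retrieved chunks so each one carries its citation metadata."""
    blocks = [
        f"[{c['source']} | {c['date']} | {c['url']}]\n{c['text']}"
        for c in chunks
    ]
    return (
        "Answer the user's question using ONLY the context below, and cite "
        "each claim with its (source, date, url).\n\n" + "\n\n".join(blocks)
    )
```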
Orchestration Framework: LangChain v0.2
LLM Interface: langchain-google-genai
Language Model: gemini-1.5-pro-latest (Temperature: 0.0)
Embedding Model: sentence-transformers/all-MiniLM-L6-v2
Vector Database: ChromaDB v0.5.0 (local persistence)
Data APIs: NewsAPI (/v2/everything), GDELT DOC 2.0 API (/api/v2/doc/doc)
We evaluated the system's efficacy with a set of representative queries designed to test different facets of its functionality.
Table 1: Qualitative evaluation of representative queries.

| Query Type | Example Query | Expected Behavior | Observed Outcome |
| --- | --- | --- | --- |
| Mainstream News | "What is the recent sentiment on US inflation according to major news outlets?" | Agent selects NewsAPI tool, ingests data, then performs vector search to synthesize an answer. | The agent correctly invoked news_api_search, loaded 30 articles, then used vector_database_search to provide a summarized answer with citations from sources like Bloomberg and Reuters. |
| Global Event | "Find reports on recent semiconductor manufacturing developments in Taiwan." | Agent selects GDELT tool for its international scope. | The agent correctly invoked gdelt_search, ingesting articles from various global sources. The final answer synthesized information from these geographically diverse reports. |
| Specialist Source | "Ingest the IMF blog RSS feed and summarize the main topics discussed." | Agent uses RSS tool with the specified URL, then vector-searches the newly ingested titles. | The agent successfully called rss_import and then, upon a follow-up query, provided a coherent summary of themes like sovereign debt and climate finance, citing the blog post titles. |
These qualitative results indicate that the agent can correctly map user intent to the appropriate tool, successfully execute the multi-step RAG workflow, and produce high-quality, cited answers.
The proposed system demonstrates a viable architecture for dynamic, real-time RAG. However, several limitations present avenues for future research:
Retrieval Depth: The current implementation ingests only titles and short descriptions. Future work should involve implementing a secondary full-text scraping and chunking mechanism for ingested URLs to provide deeper context.
Retrieval Strategy: The current retrieval relies solely on semantic similarity. A hybrid search approach, combining semantic (vector) search with keyword-based sparse retrieval (e.g., BM25), could improve relevance for queries containing specific proper nouns or technical terms.
Temporal Weighting: The date filter is a simple cutoff. A more sophisticated retrieval algorithm could incorporate a decay function, up-weighting more recent documents during the similarity scoring process itself (a minimal sketch follows this list).
Quantitative Benchmarking: A rigorous evaluation framework using established Q&A datasets (e.g., a time-sensitive version of HotpotQA or a custom-built dataset) is required to quantitatively measure precision, recall, and faithfulness against baseline models.
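A minimal sketch of such a decay function, assuming cosine similarities in [0, 1] and metadata timestamps in Unix time; the 14-day half-life is an illustrative choice, not a tuned value:

```python
import time

def recency_weighted_score(similarity: float, doc_ts: float,
                           half_life_days: float = 14.0) -> float:
    """Blend semantic similarity with an exponential recency decay."""
    age_days = (time.time() - doc_ts) / 86400
    decay = 0.5 ** (age_days / half_life_days)  # 1.0 for a brand-new document
    return similarity * decay
```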
We have presented a modular, tool-using RAG agent capable of performing real-time analysis on economic and policy-related topics. By dynamically ingesting information from heterogeneous data streams into a local vector store, the agent overcomes the static knowledge limitations of its underlying LLM. This architecture provides a flexible and powerful blueprint for developing specialized, intelligent information systems that can reason about a constantly evolving world. The successful qualitative evaluation underscores the potential of this agent-driven RAG approach for complex, real-world applications.
References
[1] Vaswani, A., et al. (2017). "Attention Is All You Need". Advances in Neural Information Processing Systems 30 (NIPS 2017).
[2] Lewis, P., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks". Advances in Neural Information Processing Systems 33 (NeurIPS 2020).
[3] Gemini Team, Google. (2024). "Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context". Google AI Blog.