Unlike generic RAG systems, Lumigo supports document-specific queries, generates follow-up questions, and provides transparent source attribution for all responses. It also features a Thesis Fallback Retrieval Module that automatically searches and incorporates open-access academic theses when internal documents are insufficient, enriching the knowledge base dynamically. Leveraging MongoDB, HuggingFace Embeddings, and Vertex AI, it delivers semantically grounded insights through a clean, interactive UI.
Conventional Retrieval-Augmented Generation (RAG) systems often provide broad, non-transparent answers with limited user control, restricting their usefulness in academic research workflows.
Lumigo addresses these challenges with a multi-agent architecture that divides the search and answering process into specialized agents and tools, resulting in:
```mermaid
flowchart TD
    style Start fill:#f0f0f0,stroke:#333,stroke-width:1px
    style AgentStart fill:#f0f0f0,stroke:#333,stroke-width:1px
    style AgentExpand fill:#f0f0f0,stroke:#333,stroke-width:1px
    style AgentRetrieve fill:#f0f0f0,stroke:#333,stroke-width:1px
    style AgentCite fill:#f0f0f0,stroke:#333,stroke-width:1px
    style AgentSynthesis fill:#f0f0f0,stroke:#333,stroke-width:1px
    style AgentDecide fill:#f0f0f0,stroke:#333,stroke-width:1px
    style End fill:#f0f0f0,stroke:#333,stroke-width:1px

    Start(["Start"])
    Start --> AgentStart["AgentStart\nLLM: mode_decide_prompt"]
    AgentStart -->|mode = explore| AgentExpand["AgentExpand\nLLM: expand_prompt"]
    AgentStart -->|mode = direct| AgentRetrieve["AgentRetrieve"]
    AgentExpand --> AgentRetrieve
    AgentRetrieve --> AgentCite["AgentCite"]
    AgentCite --> AgentSynthesis["AgentSynthesis"]
    AgentSynthesis --> AgentDecide["AgentDecide"]
    AgentDecide -->|continue = True| AgentExpand
    AgentDecide -->|continue = False| End(["End"])

    %% Tools used by AgentRetrieve
    subgraph "Tools used by AgentRetrieve"
        direction TB
        Tool1["VectorSearchTool"]
        Tool2["ThesisSearchTool"]
        Tool3["DocumentRerankTool"]
    end
    AgentRetrieve --> Tool1
    AgentRetrieve --> Tool2
    AgentRetrieve --> Tool3

    %% Tools used by AgentCite
    subgraph "Tools used by AgentCite"
        Tool4["AnswerGenerationTool"]
    end
    AgentCite --> Tool4

    %% Tools used by AgentDecide
    subgraph "Tools used by AgentDecide"
        Tool5["DecisionTool"]
    end
    AgentDecide --> Tool5
```
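The control flow in the flowchart can be read as a small state machine. The sketch below is illustrative only and is not Lumigo's actual code: every agent is a stub, and the names `State`, `run_pipeline`, `should_continue`, and `max_rounds` are assumptions introduced here. `max_rounds` in particular is an added safety bound that the diagram does not show.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class State:
    """Hypothetical shared state passed between agents."""
    query: str
    mode: str = "direct"  # "direct" or "explore", as in the diagram
    queries: List[str] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)
    answer: str = ""
    trace: List[str] = field(default_factory=list)  # visited agents, for inspection


def agent_start(state: State) -> State:
    # Stub for the LLM mode decision: treat questions as exploratory.
    state.mode = "explore" if state.query.strip().endswith("?") else "direct"
    state.queries = [state.query]
    state.trace.append("AgentStart")
    return state


def agent_expand(state: State) -> State:
    # Stub for LLM query expansion: add one rephrased query per call.
    state.queries.append(f"{state.query} (expanded {len(state.queries)})")
    state.trace.append("AgentExpand")
    return state


def agent_retrieve(state: State) -> State:
    # Stub for vector search, thesis fallback, and reranking.
    state.documents = [f"doc for: {q}" for q in state.queries]
    state.trace.append("AgentRetrieve")
    return state


def agent_cite(state: State) -> State:
    # Stub for citation-aware answer generation: one marker per document.
    state.answer = " ".join(f"[{i + 1}]" for i in range(len(state.documents)))
    state.trace.append("AgentCite")
    return state


def agent_synthesis(state: State) -> State:
    state.trace.append("AgentSynthesis")
    return state


def run_pipeline(query: str,
                 should_continue: Callable[[State, int], bool],
                 max_rounds: int = 3) -> State:
    """Follow the diagram's edges; stop when the decision says so or at max_rounds."""
    state = agent_start(State(query=query))
    if state.mode == "explore":
        state = agent_expand(state)
    for round_no in range(1, max_rounds + 1):
        for agent in (agent_retrieve, agent_cite, agent_synthesis):
            state = agent(state)
        state.trace.append("AgentDecide")
        if round_no == max_rounds or not should_continue(state, round_no):
            break
        state = agent_expand(state)  # continue = True loops back to AgentExpand
    return state
```

Calling `run_pipeline("What is RAG?", lambda s, n: n < 2)` walks the explore branch and loops once; inspecting `state.trace` shows the order in which the agents ran.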
Agent | Tools Used | Functionality Description |
---|---|---|
AgentRetrieve | VectorSearchTool | Performs semantic vector retrieval across document indexes, enabling precise and context-aware search results. |
AgentRetrieve | ThesisSearchTool | Searches academic thesis repositories as a fallback to enrich content when internal documents are insufficient. |
AgentRetrieve | DocumentRerankTool | Refines document ranking based on query context to surface the most relevant results. |
AgentCite | AnswerGenerationTool | Generates citation-aware, Markdown-formatted answers that clearly link back to source documents for transparency. |
AgentDecide | DecisionTool | Determines whether to continue query expansion or terminate based on the evolving dialogue context, enabling dynamic multi-turn interactions. |
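
One way the three AgentRetrieve tools could fit together is sketched below. This is a hedged illustration, not the project's implementation: the search tools are in-memory stand-ins that score by term overlap rather than embeddings, the rerank step is a plain sort, and `min_score`, `min_hits`, and `top_k` are hypothetical parameters chosen for the example.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (document text, relevance score)


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query terms found in the text."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return len(terms & words) / len(terms)


def make_search(corpus: List[str]) -> Callable[[str], List[Hit]]:
    """Build a stand-in search tool over an in-memory corpus."""
    def search(query: str) -> List[Hit]:
        return [(doc, overlap_score(query, doc)) for doc in corpus]
    return search


def retrieve_with_fallback(query: str,
                           vector_search: Callable[[str], List[Hit]],
                           thesis_search: Callable[[str], List[Hit]],
                           min_score: float = 0.5,
                           min_hits: int = 2,
                           top_k: int = 3) -> Tuple[List[Hit], bool]:
    """Use internal hits; query the thesis source only if too few are good enough."""
    hits = [h for h in vector_search(query) if h[1] >= min_score]
    used_fallback = len(hits) < min_hits
    if used_fallback:
        hits += [h for h in thesis_search(query) if h[1] >= min_score]
    # Rerank step: here simply a sort by score, highest first.
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits[:top_k], used_fallback
```

The function returns both the ranked hits and a flag saying whether the fallback was used, so a caller can tell the user when an answer draws on external theses.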
This modular design enables flexibility, easy maintenance, and independent upgrading or replacement of individual components without disrupting the overall system.
Lumigo leverages Retrieval-Augmented Generation (RAG) as an underlying technology to bridge semantic retrieval with large language model (LLM) generation, while emphasizing transparency and user control via its multi-agent orchestration.
Key integrations include:
This layered approach ensures that RAG functions as a supporting engine, while specialized agents handle precise task orchestration and user interaction.
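To make the link between semantic retrieval and LLM generation concrete, here is a minimal sketch of the generic RAG pattern: rank stored passages by cosine similarity to a query vector, then build a prompt that numbers each passage so the model can cite it. The tiny hand-written vectors, the `index` layout, and the prompt wording are all assumptions for illustration; Lumigo's actual embeddings, storage, and prompts are not shown in this document.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          index: Dict[str, Tuple[Sequence[float], str]],
          k: int = 2) -> List[Tuple[str, str]]:
    """Rank source_id -> (vector, text) entries by similarity to the query."""
    scored = sorted(index.items(),
                    key=lambda item: cosine(query_vec, item[1][0]),
                    reverse=True)
    return [(source_id, text) for source_id, (_, text) in scored[:k]]


def build_prompt(question: str, passages: List[Tuple[str, str]]) -> str:
    """Number each passage so the model can cite it as [n]."""
    lines = ["Answer using only the sources below and cite them as [n].", ""]
    for n, (source_id, text) in enumerate(passages, start=1):
        lines.append(f"[{n}] ({source_id}) {text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

Keeping the source identifier next to each numbered passage is what lets a generated `[n]` marker be traced back to a document afterwards.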
Lumigo includes a dedicated analytics page that provides rich data presentations of user interactions to support ongoing system insights and improvements. Features include:
These data presentations help monitor user behavior, guide content curation, and inform system enhancements for a better academic search experience.
Lumigo supports a variety of deployment environments to fit different user needs:
Thanks to its modular agent-tool architecture, Lumigo can easily incorporate new agents or tools to support emerging research workflows or expand capabilities.
```bash
git clone git@gitlab.com:sc310542-group/Lumigo.git
cd Lumigo
bash script/build-docker-image.sh

# Copy and configure environment variables
cp .env.example .env
# Fill in required keys like MONGODB_URI, OPENAI_API_KEY, VERTEX_PROJECT_ID, etc.

cd deploy
docker-compose up -d

# For development mode
bash script/run-dev-mode.sh
streamlit run app.py --server.port=7860
```
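Before starting the containers, it can help to confirm that `.env` is filled in. The helper below is a hypothetical convenience script, not part of the repository. It checks only the three keys named in the setup steps; the real `.env.example` may require more, and the parser handles only simple `KEY=VALUE` lines.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Keys named in the setup steps; the real .env.example may list more.
REQUIRED_KEYS = ("MONGODB_URI", "OPENAI_API_KEY", "VERTEX_PROJECT_ID")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Dict[str, str],
                 required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    path = Path(".env")
    env = parse_env(path.read_text()) if path.exists() else {}
    missing = missing_keys(env)
    if missing:
        print("Missing keys: " + ", ".join(missing))
    else:
        print("All required keys are set.")
```

Run it from the repository root after copying `.env.example` to `.env`.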
Lumigo is evaluated against traditional keyword-based academic search engines (e.g., Google Scholar), generic RAG systems, multi-agent Q&A frameworks, and domain-specific retrieval tools.
In this qualitative comparison, Lumigo is more transparent than generic RAG systems, which offer limited explainability, thanks to explicit source attribution and multi-agent orchestration. Its modular design gives users more control over references than monolithic systems do. Traditional engines cover more documents, but Lumigo adds value through dynamic follow-up questions and fallback retrieval of open-access theses.
Optimized vector indexing and caching ensure competitive response times despite multi-agent complexity.
Future work will expand this analysis with quantitative metrics on larger academic datasets to further validate Lumigo's advantages and identify areas for optimization.
Current Limitations | Planned Improvements |
---|---|
Fixed chunk sizes may break context | Dynamic semantic chunking to preserve context integrity. |
Limited file format support | Add support for image formats such as JPG and PNG, and for other media such as video, to widen ingestion capabilities. |
No quantitative benchmarking | Implement benchmarking with academic datasets to validate and improve system performance. |
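
The chunking limitation in the table is easy to see in a small example. The sketch below contrasts fixed-size chunking with a simple sentence-boundary packer. The packer is a deliberately simpler stand-in for the planned dynamic semantic chunking, not that feature itself, and both function names are hypothetical.

```python
import re
from typing import List


def fixed_chunks(text: str, size: int) -> List[str]:
    """Cut every `size` characters, regardless of sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def sentence_chunks(text: str, max_chars: int) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = ("Transformers use attention. Attention weighs tokens. "
        "Chunking splits text.")
print(fixed_chunks(text, 40))     # the first chunk is cut mid-word
print(sentence_chunks(text, 55))  # every chunk ends at a sentence boundary
```

A fixed 40-character window cuts the second sentence in the middle of a word, while the sentence packer keeps each sentence intact within its chunk.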
Lumigo is MIT licensed. Issues and feature requests can be submitted via GitLab. Contributions that help grow and improve the system are welcome.