Autonomous Multi-Agent Research Report Generation System

1. Introduction

Large Language Models are powerful at generating text, but they struggle with structured reasoning, source grounding, and evidence validation when operating as a single monolithic system. Traditional pipelines often combine planning, retrieval, verification, and synthesis into one opaque process, leading to hallucinations, weak attribution, and limited explainability.

This project presents an Autonomous Multi-Agent Research Report Generation System. It uses LangGraph for orchestration, ChromaDB for Retrieval-Augmented Generation (RAG), DuckDuckGo for web search, FastAPI for backend services, and Streamlit for the interactive UI. The system decomposes research into structured steps, and each step is executed by a specialized agent. This keeps reasoning modular, makes each step traceable, and ties the output to retrieved evidence.

The system supports multiple LLM providers (OpenAI, Groq, Google Gemini) and generates outputs in Markdown and PDF formats.

2. Problem Statement

Generating high-quality research reports requires:

Structured topic decomposition
Reliable information retrieval
Evidence validation
Logical synthesis
Proper citation and traceability

Single-agent systems attempt to solve all of these simultaneously, resulting in:

Hallucinated or unsupported claims
Weak citation grounding
Poor modularity
Low transparency in reasoning

This project addresses these limitations through a multi-agent architecture, where each agent performs a well-defined role within a coordinated workflow.

3. System Architecture Overview

The system is implemented as a LangGraph state machine, where agents operate over a shared structured state and communicate through deterministic transitions.

Technology Stack

Component	Technology	Purpose
Orchestration	LangGraph	State machine workflow
API	FastAPI	REST API
Frontend	Streamlit	Interactive UI
LLM	OpenAI / Groq / Gemini	Language models
Embeddings	sentence-transformers/all-MiniLM-L6-v2	Semantic encoding
Vector DB	ChromaDB	Persistent retrieval
Search	DuckDuckGo	Web search
Scraping	BeautifulSoup + requests	Content extraction
Export	ReportLab	PDF generation
Output Format	Markdown / PDF	Report formatting

4. Multi-Agent Design

The system consists of four specialized agents:

4.1 Planner Agent

Responsible for transforming the input topic into a structured research plan:

Decomposes topic into logical sections
Generates research questions
Produces structured JSON outline
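The later agents can only use the planner's output if the JSON outline parses and has the expected shape. The following is a minimal sketch of that validation step. It assumes a hypothetical outline schema of the form {"sections": [{"title": ..., "questions": [...]}]}, which may differ from the project's actual schema. The `parse_outline` helper is also hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    questions: list[str] = field(default_factory=list)


def parse_outline(raw: str) -> list[Section]:
    """Parse the planner's JSON reply into sections (hypothetical schema)."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict) or not isinstance(data.get("sections"), list):
        raise ValueError("outline must be an object with a 'sections' list")
    if not data["sections"]:
        raise ValueError("outline has no sections")
    parsed = []
    for item in data["sections"]:
        if not isinstance(item, dict) or not str(item.get("title", "")).strip():
            raise ValueError("every section must be an object with a title")
        questions = [str(q) for q in item.get("questions", [])]
        parsed.append(Section(title=str(item["title"]).strip(), questions=questions))
    return parsed
```

Every failure surfaces as a `ValueError`. A caller could therefore catch it and re-prompt the LLM instead of passing a broken outline to the Research Agent.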

4.2 Research Agent

Builds the knowledge base:

Performs DuckDuckGo search
Extracts webpage content
Chunks text into segments
Generates embeddings
Stores vectors in ChromaDB with metadata
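A simple sliding window illustrates the chunking step. This sketch makes three assumptions:

- CHUNK_SIZE and CHUNK_OVERLAP (defaults 800 and 120, listed under Configuration and Tuning) are measured in characters.
- Each chunk carries its source URL as metadata.
- `chunk_text` is a hypothetical name.

The project's real splitter may count tokens or respect sentence boundaries instead.

```python
def chunk_text(text: str, url: str, chunk_size: int = 800, overlap: int = 120) -> list[dict]:
    """Split text into overlapping character windows, each tagged with its source URL."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each window starts `step` characters after the previous one
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        piece = text[start:start + chunk_size]
        chunks.append({"text": piece, "metadata": {"source": url, "chunk_index": index}})
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means that a sentence cut at one window boundary still appears intact in the neighbouring chunk.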

4.3 Verifier Agent

Validates the quality and completeness of retrieved information:

Performs similarity search
Computes coverage score
Evaluates source diversity
Applies trust scoring heuristics
Flags insufficient or low-confidence data
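The following is a minimal sketch of how the verifier's checks could combine into a pass/flag decision. It uses the COVERAGE_THRESHOLD (0.7) and MIN_SOURCE_DIVERSITY (0.3) defaults from Configuration and Tuning. Two parts are assumptions for illustration:

- The diversity measure: distinct source domains divided by the number of retrieved chunks.
- The `verify` function itself.

The project's actual heuristics, including trust scoring, are not reproduced here.

```python
from urllib.parse import urlparse


def verify(coverage_score: float, chunk_sources: list[str],
           coverage_threshold: float = 0.7, min_diversity: float = 0.3) -> dict:
    """Return verification flags from a coverage score and the URLs of retrieved chunks."""
    domains = {urlparse(url).netloc for url in chunk_sources}
    # Assumed diversity measure: distinct domains divided by the number of retrieved chunks.
    diversity = len(domains) / len(chunk_sources) if chunk_sources else 0.0
    flags = []
    if coverage_score < coverage_threshold:
        flags.append("low_coverage")
    if diversity < min_diversity:
        flags.append("low_source_diversity")
    return {"verified": not flags, "source_diversity": round(diversity, 2), "flags": flags}
```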

4.4 Writer Agent

Generates the final report:

Retrieves top-k relevant chunks
Synthesizes findings into structured sections
Attaches citations
Outputs Markdown and PDF
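One simple way to attach citations is to number each distinct source the first time it is used and append a reference list at the end. The following sketch works under that assumption. `render_section`, `render_references`, and the (sentence, URL) claim shape are hypothetical. The real Writer Agent uses an LLM to synthesize the prose rather than emitting pre-written sentences.

```python
def render_section(title: str, claims: list[tuple[str, str]], citations: list[str]) -> str:
    """Render one Markdown section, numbering each source on first use.

    `claims` pairs a synthesized sentence with the URL of its supporting chunk;
    `citations` is shared across sections (and mutated) so numbering stays consistent.
    """
    lines = [f"## {title}", ""]
    for sentence, url in claims:
        if url not in citations:
            citations.append(url)
        lines.append(f"{sentence} [{citations.index(url) + 1}]")
    return "\n".join(lines)


def render_references(citations: list[str]) -> str:
    """Render the numbered reference list for all cited sources."""
    entries = [f"{i}. {url}" for i, url in enumerate(citations, start=1)]
    return "\n".join(["## References", "", *entries])
```

All sections share one `citations` list, so a source reused in a later section keeps its original number.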

5. Orchestration with LangGraph

The system uses a shared state object:

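from typing import TypedDict


class ResearchState(TypedDict):
    topic: str                # research topic supplied by the user
    outline: dict             # structured outline produced by the Planner
    research_complete: bool   # True once the Research Agent has built the knowledge base
    verified: bool            # Verifier's verdict on coverage and source quality
    report: str               # final report text produced by the Writer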

Workflow

User Topic
    ↓
Planner Agent
    ↓
Research Agent
    ↓
Verifier Agent
    ↓
Writer Agent
    ↓
Structured Report (MD/PDF)

Each agent updates the shared state, enabling controlled transitions between stages.
This design ensures:

Clear separation of concerns
Traceable reasoning flow
Modular debugging
Expandability
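The linear flow above can be mimicked in plain Python to show how the shared state evolves. Each agent is a function that receives the current state and returns only the keys it changes. Those keys are merged into the state before the next agent runs. This roughly matches how LangGraph applies a node's returned dictionary when the state keys have no custom reducer: the returned value overwrites the key.

This is a dependency-free stand-in, not the project's graph definition. The stub agents are hypothetical placeholders for the real LLM-backed ones.

```python
from typing import Callable, TypedDict


# total=False because the fields are filled in progressively as agents run.
class ResearchState(TypedDict, total=False):
    topic: str
    outline: dict
    research_complete: bool
    verified: bool
    report: str


Node = Callable[[ResearchState], dict]


# Hypothetical stub agents; the real ones call the LLM, search, and vector store.
def planner(state: ResearchState) -> dict:
    return {"outline": {"sections": [f"Overview of {state['topic']}"]}}


def researcher(state: ResearchState) -> dict:
    return {"research_complete": True}


def verifier(state: ResearchState) -> dict:
    return {"verified": state.get("research_complete", False)}


def writer(state: ResearchState) -> dict:
    body = "\n".join(f"## {s}" for s in state["outline"]["sections"])
    return {"report": f"# {state['topic']}\n\n{body}"}


def run_pipeline(topic: str, nodes: list[Node]) -> ResearchState:
    state: ResearchState = {"topic": topic}
    for node in nodes:
        state.update(node(state))  # merge the partial update into the shared state
    return state
```

Each function touches only its own keys, so any agent can be tested in isolation with a hand-built state dictionary. This is the modular-debugging benefit listed above.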

Flow Diagram

[Figure: flow diagram of the workflow pipeline]

6. Tool Integration

The system integrates multiple external tools:

6.1. DuckDuckGo Search Tool

Provides free web search capability for retrieving research sources.

6.2. Web Content Extraction Tool

Uses requests and BeautifulSoup to extract readable text from webpages.
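The following is a dependency-free sketch of the extraction idea. It uses Python's built-in html.parser in place of BeautifulSoup, which is what the project actually uses. The sketch does three things:

- Skips script and style content.
- Collapses whitespace.
- Discards pages shorter than MIN_TEXT_LENGTH (default 800, from Configuration and Tuning).

Whether the project applies that threshold at this stage is an assumption, and `extract_text` is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import Optional


class _TextCollector(HTMLParser):
    """Collect visible text, skipping the contents of script/style/noscript tags."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def extract_text(html: str, min_length: int = 800) -> Optional[str]:
    """Return readable text, or None if the page is too short to be useful."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(" ".join(collector.parts).split())  # collapse all whitespace
    return text if len(text) >= min_length else None
```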

6.3. Embedding Model

sentence-transformers/all-MiniLM-L6-v2 generates semantic embeddings for chunked text.

6.4. ChromaDB Vector Store

Stores embeddings and metadata for similarity-based retrieval.

6.5. Markdown Report Formatter

Formats structured output into publishable research documentation.

6.6. ReportLab PDF Exporter

Generates PDF versions of reports.

This satisfies the requirement of integrating multiple tools within a coordinated multi-agent system.

7. Execution Flow

User provides a research topic via API or UI.
Planner generates a structured outline.
The research agent performs a web search and builds a vector knowledge base.
Verifier checks coverage and retrieval quality.
Writer synthesizes the final evidence-backed report.
Final report is exported in Markdown and PDF formats.

8. Screenshots & User Interface

The system provides an interactive Streamlit frontend for easy usage, alongside the FastAPI backend. Below are key screenshots demonstrating the user experience and output quality.

8.1 Streamlit Frontend – Input Screen

Topic entry field
"Generate Report" button

[Screenshot: Streamlit input screen]

8.2 Final Report Output (Markdown View)

Structured sections with headings
Inline citations
Clean, readable formatting

[Screenshot: final report in Markdown view]

8.3 PDF Export Preview

Professional layout generated via ReportLab
Preserves citations and formatting

[Screenshot: PDF export preview]

These screenshots show the end-to-end user journey, from topic input to a fully cited, exportable research report.

9. Demo Video

A short walkthrough demonstrating:

End-to-end workflow (topic → report)
Agent pipeline execution
Verification step behavior
Final output (Markdown and PDF)

Video Link:

10. Repository Structure

.
├── backend/
│   ├── app/
│   │   ├── agents/               # Agent logic
│   │   ├── api/                  # FastAPI routes
│   │   ├── config/               # Settings & env handling
│   │   ├── core/                 # Shared utilities
│   │   ├── data/
│   │   │   ├── chroma/           # ChromaDB persistence
│   │   │   └── state/            # JSON state snapshots per report
│   │   ├── graph/                # LangGraph workflow definition
│   │   ├── schemas/              # Pydantic models
│   │   ├── tools/                # Search, loader, embedding tools
│   │   ├── outputs/              # ← Generated Markdown + PDF reports go here
│   │   └── main.py               # FastAPI entry point
│   └── tests/                    # Tests
├── frontend/
│   └── app.py                    # Streamlit UI
├── sample-scr/                   # Sample screenshots / example outputs
├── .env_example
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt

This modular structure reflects the agent-tool separation principle.

11. Key Design Decisions

11.1. Explicit Role Separation

Each agent performs one clearly defined responsibility.

11.2. Local Embeddings + Vector Store

Keeps embedding costs at zero and results reproducible, since no external embedding API is called.

11.3. Free Web Search Integration

Avoids dependence on paid search APIs.

11.4. State-Based Orchestration

LangGraph enables deterministic and extensible workflow control.

11.5. Multi-LLM Support

Supports OpenAI, Groq, and Google Gemini for flexibility.

11.6. API and UI Integration

FastAPI provides the backend services and Streamlit provides a user-friendly interface.

12. Performance Evaluation

Evaluation Setup

The system was tested on 10 topics across multiple domains:

Artificial Intelligence
Healthcare
Finance
Climate Science
Education

Each topic was processed once using the same system configuration.

Metrics Definition

Coverage Score = (number of sections with at least one supporting source) / (total planned sections)
Response Time = total time taken to generate final report
Source Count = number of unique sources used per report
Hallucination Proxy = percentage of unsupported claims (manually sampled)
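The coverage definition above is simple to compute. The following sketch assumes a mapping from each planned section to the sources retrieved for it. Both `coverage_score` and the mapping shape are hypothetical.

```python
def coverage_score(section_sources: dict[str, list[str]]) -> float:
    """Fraction of planned sections with at least one supporting source."""
    if not section_sources:
        return 0.0
    supported = sum(1 for sources in section_sources.values() if sources)
    return supported / len(section_sources)
```

For example, suppose five planned sections have four with at least one source. That gives 4/5 = 0.8, which clears the default COVERAGE_THRESHOLD of 0.7.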

Results

Metric	Value Range
Response Time	8–15 seconds
Coverage Score	0.75 – 0.90
Sources Used	4 – 8
Hallucination Proxy	~10–15% (sampled)

13. Comparison with Baseline

Baseline: Single-agent LLM pipeline without verification

Metric	Single-Agent	Multi-Agent System
Avg Latency	~6 sec	8–15 sec
Coverage Score	~0.55–0.65	0.75–0.90
Sources Used	2–3	4–8
Hallucination Proxy	~30–40%	~10–15%
Explainability	Low	High

14. Error Handling Mechanism

The system incorporates fault tolerance and robustness strategies:

Retry logic for failed API or network calls
Fallback mechanisms for insufficient search results
Verifier-based detection of low coverage scenarios
Graceful degradation allowing partial report generation
Deduplication and filtering of noisy data
Exception handling across agent transitions

Example Failure Scenario

If the Research Agent fails to retrieve sufficient content:

Retry is triggered (up to MAX_RETRIES times)
If still insufficient, Verifier flags low coverage
System proceeds with partial report generation
Output is marked with lower confidence
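The retry-then-degrade behavior in this scenario can be sketched as a loop bounded by MAX_RETRIES. With the default of 1, that means at most two attempts. The following parts are illustrative assumptions, not the project's actual code:

- The `research` callable.
- The `min_chunks` sufficiency test.
- The `low_confidence` flag.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def research_with_retry(research: Callable[[], list[str]], max_retries: int = 1,
                        min_chunks: int = 3) -> dict:
    """Run research up to 1 + max_retries times, then degrade gracefully."""
    chunks: list[str] = []
    for attempt in range(1 + max_retries):
        try:
            chunks = research()
        except Exception as exc:  # e.g. a network or API failure
            logger.warning("research attempt %d failed: %s", attempt + 1, exc)
            continue
        if len(chunks) >= min_chunks:
            return {"chunks": chunks, "low_confidence": False}
    # Still insufficient: keep whatever was gathered and mark the output.
    return {"chunks": chunks, "low_confidence": True}
```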

15. Significance & Real-World Impact

This system provides a scalable framework for automated research generation with strong grounding and traceability.

Key Contributions

Reduces manual research effort
Improves reliability via verification layers
Enables structured knowledge synthesis
Provides a transparent reasoning pipeline

Applications

Academic research automation
Business intelligence reporting
Market analysis
Technical documentation generation

Trade-offs

Improved accuracy comes at the cost of increased latency
System complexity is higher compared to single-agent pipelines

16. Results

The system successfully produces:

Structured, multi-section research reports
Citation-backed outputs
Traceable execution flow
Improved factual consistency compared to the single-agent baseline

17. License & Usage Rights

This project is licensed under the MIT License.

Permissions:

Commercial use
Modification
Distribution
Private use

Conditions:

License and copyright notice must be included

Limitations:

No liability or warranty provided

18. Configuration and Tuning

LLM Priority Order

OpenAI
Groq
Google Gemini

Override models:

OPENAI_MODEL=gpt-4o-mini
GROQ_MODEL=llama-3.1-8b-instant
GOOGLE_MODEL=gemini-2.0-flash
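The priority order implies a simple selection rule: use the first provider whose API key is configured. The sketch assumes these key variable names:

- OPENAI_API_KEY, the name used in the Docker example.
- GROQ_API_KEY (assumed).
- GOOGLE_API_KEY (assumed).

When no override is set, it falls back to the model names shown above. `select_provider` is hypothetical.

```python
import os
from typing import Mapping, Optional

# (provider, API-key variable, model-override variable, fallback model), in priority order.
# GROQ_API_KEY and GOOGLE_API_KEY are assumed names.
PROVIDERS = [
    ("openai", "OPENAI_API_KEY", "OPENAI_MODEL", "gpt-4o-mini"),
    ("groq", "GROQ_API_KEY", "GROQ_MODEL", "llama-3.1-8b-instant"),
    ("google", "GOOGLE_API_KEY", "GOOGLE_MODEL", "gemini-2.0-flash"),
]


def select_provider(env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Return (provider, model) for the first provider with a configured API key."""
    env = os.environ if env is None else env
    for name, key_var, model_var, default_model in PROVIDERS:
        if env.get(key_var):
            return name, env.get(model_var) or default_model
    raise RuntimeError("No LLM API key set; configure at least one provider.")
```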

Search & Retrieval

Variable	Default
MAX_SEARCH_RESULTS	4
MIN_TEXT_LENGTH	800
CHUNK_SIZE	800
CHUNK_OVERLAP	120
TOP_K_EVIDENCE	5
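ChromaDB performs the similarity search internally. The idea behind TOP_K_EVIDENCE can still be shown in a few lines: score every stored chunk embedding against the query embedding and keep the k best. This pure-Python version uses cosine similarity for illustration. ChromaDB's actual distance metric depends on how the collection is configured, and `top_k` is a hypothetical helper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: list[tuple[str, list[float]]], k: int = 5) -> list[str]:
    """Return the ids of the k chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

Raising k gives the Writer more evidence per section at the cost of a longer prompt. This is the speed/quality trade-off described under Performance Tuning.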

Verification

Variable	Default
COVERAGE_THRESHOLD	0.7
MIN_SOURCE_DIVERSITY	0.3
MAX_RETRIES	1

Storage

Variable	Default
CHROMA_PERSIST_DIR	backend/app/data/chroma
STATE_DIR	backend/app/data/state
OUTPUT_DIR	backend/app/outputs
EXPORT_PDF	1
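These variables map naturally onto a small settings object read from the environment. The following sketch uses only the standard library. Two parts are assumptions:

- The `Settings` class and its `from_env` constructor are hypothetical. The repository's config/ package may use a different mechanism, such as Pydantic settings.
- Treating EXPORT_PDF=1 as a boolean flag.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    max_search_results: int = 4
    min_text_length: int = 800
    chunk_size: int = 800
    chunk_overlap: int = 120
    top_k_evidence: int = 5
    coverage_threshold: float = 0.7
    min_source_diversity: float = 0.3
    max_retries: int = 1
    chroma_persist_dir: str = "backend/app/data/chroma"
    state_dir: str = "backend/app/data/state"
    output_dir: str = "backend/app/outputs"
    export_pdf: bool = True

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        """Build settings, letting upper-case environment variables override the defaults."""
        env = os.environ if env is None else env
        overrides = {}
        for f in fields(cls):
            raw = env.get(f.name.upper())
            if raw is None:
                continue
            kind = type(f.default)  # infer the conversion from the default value's type
            if kind is bool:
                overrides[f.name] = raw.strip().lower() in {"1", "true", "yes"}
            else:
                overrides[f.name] = kind(raw)
        return cls(**overrides)
```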

Performance Tuning

Optimize for Speed: Reduce MAX_SEARCH_RESULTS, CHUNK_SIZE, TOP_K_EVIDENCE.
Optimize for Quality: Increase MAX_SEARCH_RESULTS, CHUNK_SIZE, TOP_K_EVIDENCE, COVERAGE_THRESHOLD.
Optimize for Cost: Use Groq or Google models.

19. API Reference

Base URL: http://localhost:8000

Generate Research Report

POST /generate-report

Request:

{  
  "topic": "Applications of artificial intelligence in healthcare"  
}

Response:

{  
  "topic": "...",  
  "report": "...",  
  "citations": [],  
  "coverage_score": 0.85,  
  "verified": true,  
  "report_id": "uuid",  
  "outputs": {  
    "markdown": "...",  
    "pdf": "..."  
  },  
  "research_status": {  
    "planning": "completed",  
    "research": "completed",  
    "verification": "completed",  
    "writing": "completed"  
  }  
}
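The following is a minimal client for this endpoint using only the standard library. The base URL and request body come from this section. The function names and the 300-second timeout are assumptions; a generous timeout seems prudent because generation takes several seconds.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_request(topic: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a POST /generate-report request with a JSON body."""
    body = json.dumps({"topic": topic}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate-report",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_report(topic: str, base_url: str = BASE_URL, timeout: float = 300.0) -> dict:
    """Send the request and decode the JSON response (requires a running backend)."""
    with urllib.request.urlopen(build_request(topic, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

A caller can then check `verified` and `coverage_score` in the returned dictionary before relying on the report.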

Report History

GET /history
Returns a list of all generated reports.

Retrieve Specific Report

GET /history/{name}
Returns the Markdown content and PDF path for the named report.

Health Check

GET /health

{ "status": "ok" }

20. State Management

State snapshots are saved to:
backend/app/data/state/

Files:
{report_id}_planner.json
{report_id}_researcher.json
{report_id}_verifier.json
{report_id}_writer.json
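Writing one snapshot per stage follows directly from this naming scheme. The following sketch assumes each snapshot is the full state dictionary, serialized as JSON after the named agent finishes. `save_snapshot` is a hypothetical helper.

```python
import json
from pathlib import Path


def save_snapshot(state: dict, report_id: str, stage: str,
                  state_dir: str = "backend/app/data/state") -> Path:
    """Write the state after `stage` to {report_id}_{stage}.json and return its path."""
    directory = Path(state_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{report_id}_{stage}.json"
    # default=str keeps non-JSON values (e.g. datetimes) from aborting the write
    path.write_text(json.dumps(state, indent=2, default=str), encoding="utf-8")
    return path
```

Snapshots like these let you inspect exactly what each agent produced for a given report.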

21. Report Output

Generated reports are stored in:
backend/app/outputs/

Formats:

Markdown (.md)
PDF (.pdf)

22. Testing

Run tests (the --cov option requires the pytest-cov plugin):

pytest backend/tests/  
pytest backend/tests/ --cov=backend/app

23. Security Considerations

Never commit .env
Use HTTPS in production
Implement authentication
Enable rate limiting
Rotate API keys
Log API access
Use secrets manager
Backup state & outputs

24. Docker Deployment

Dockerfile:

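FROM python:3.11-slim
WORKDIR /app
# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY backend ./backend
COPY frontend ./frontend
# 8000: FastAPI backend, 8501: Streamlit frontend
EXPOSE 8000 8501
# Start the API in the background; Streamlit runs in the foreground as the main process
CMD ["sh", "-c", "python -m uvicorn backend.app.main:app --host 0.0.0.0 --port 8000 & python -m streamlit run frontend/app.py --server.port 8501 --server.address 0.0.0.0"]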

Build and run:

docker build -t research-generator .  
docker run -p 8000:8000 -p 8501:8501 -e OPENAI_API_KEY=your_key research-generator

25. Troubleshooting

No API Key

Ensure at least one provider key (e.g. OPENAI_API_KEY) is set in .env.

Port in Use

Use a different port: --port for uvicorn or --server.port for Streamlit.

ChromaDB Issues

Clear the persisted vector store (this deletes all stored embeddings):

rm -rf backend/app/data/chroma

Frontend Cannot Connect

Check that the backend is running:

curl http://localhost:8000/health

26. Future Improvements / Roadmap

Iterative verification loops
Source credibility scoring
Parallelized research per section
Web UI enhancements
Async search optimization
Credibility weighting algorithms
Streaming responses
Batch reports
Academic DB integration
Agent pipeline builder
Version comparison
Multi-language support

27. Conclusion

This project demonstrates how a properly designed multi-agent architecture can enhance reliability, modularity, and explainability in research report generation.
By combining:

LangGraph orchestration
Tool-integrated RAG
Explicit agent role separation
API and UI layers
Multi-LLM support

The system moves beyond a simple LLM pipeline and becomes a structured, extensible research automation framework.

GitHub Repository:
https://github.com/Raghul-S-Coder/Multi-Agent-Research-Evidence-Based-Report-Generator

Author:
Raghul S
GitHub: https://github.com/Raghul-S-Coder

Version: 0.1.0
Last Updated: February 2026