Multi AI Tools Deep Research Assistant

Introduction
1.1 Problem Statement
Modern information retrieval systems often struggle with:

Source Selection: Determining which knowledge source is most appropriate for a given query
Real-time Information: Accessing up-to-date information beyond training data cutoffs
Multi-domain Queries: Handling questions that span current events, academic research, and encyclopedic knowledge
Response Quality: Ensuring factual accuracy and minimizing hallucinations

1.2 Proposed Solution
We present an agentic AI system that:

Automatically selects appropriate tools based on query context
Orchestrates multiple searches when necessary
Synthesizes information from diverse sources
Provides cited responses to ensure transparency and verifiability

System Architecture
2.1 Core Components
2.1.1 LangGraph Workflow Engine
The system uses a state-based graph architecture with three primary nodes:
User Query → Agent (Tool Selection) → Tool Execution → Agent (Synthesis) → Response
State Management:

Maintains conversation history using typed dictionaries
Implements message accumulation for context preservation
Supports conditional routing based on tool requirements

2.1.2 Language Model Configuration

Primary Model: OpenAI GPT-OSS-20B via Groq
Temperature: 1.0 (balanced creativity and accuracy)
Reasoning Effort: Medium
Purpose: Tool selection and response synthesis
2.1.3 Multi-Tool Ecosystem
Screenshot 2025-11-03 23.05.11.png

2.2 Intelligent Tool Selection
The agent employs a sophisticated prompt-based routing system:
Decision Criteria:

Temporal relevance: Recent events → Tavily
Conceptual queries: Definitions, explanations → Wikipedia
Academic queries: Research, technical information → ArXiv
Hybrid queries: Multiple tools in parallel

2.3 Safety and Ethics Implementation
Built-in Safeguards:

Rejection of unethical/illegal information requests
Source citation requirements
Instruction injection prevention
Language-adaptive responses (English/French)

Evaluation Methodology
3.1 RAGAS Framework Integration
We implemented comprehensive evaluation using the RAGAS (Retrieval Augmented Generation Assessment) framework with three key metrics:
3.1.1 Faithfulness
Definition: Measures factual consistency between the generated answer and retrieved context.
Formula:
Faithfulness = (Number of supported claims) / (Total claims)
Importance: Detects hallucinations and ensures response grounding
3.1.2 Answer Relevancy
Definition: Evaluates how directly the answer addresses the user's question.
Method:

Uses embeddings to compute semantic similarity
Measures focus and pertinence of response
Penalizes verbose or off-topic content

3.1.3 Context Recall
Definition: Assesses whether all relevant information from the ground truth is captured in the retrieved context.
Formula:
Context Recall = (Relevant context retrieved) / (Total relevant context available)

Importance: Ensures comprehensive information retrieval
3.2 Evaluation Infrastructure
Configuration:

Evaluator LLM: Llama-3.1-8B-Instant (Groq)
Embeddings: sentence-transformers/all-MiniLM-L6-v2
Test Dataset: 5 diverse queries spanning multiple domains
Metrics: Faithfulness, Answer Relevancy, Context Recall

Test Categories:

Scientific concepts (Quantum physics, Machine learning)
Current events (AI in Africa)
Historical facts (Penicillin discovery)
Recent developments (Nuclear fusion)

Implementation Details
4.1 Workflow Logic
pythondef agent_workflow():
1. Receive user query
2. Analyze query type and requirements
3. Select appropriate tool(s)
4. Execute tool calls in parallel (if needed)
5. Aggregate and synthesize results
6. Generate cited response
7. Return formatted answer
  4.2 Error Handling Strategy
  Graceful Degradation:

Tool failures don't crash the system
Error messages logged for debugging
Alternative sources attempted when available
User-friendly error communication

4.3 Performance Optimizations
Key Improvements:

Increased retrieval depth: max_results=7 (Tavily), top_k=5 (others)
Advanced search mode: Enhanced relevance filtering
Parallel tool execution: Reduced latency for multi-source queries
Context-based synthesis: "Base your answer only on retrieved context"

Conclusion
This work demonstrates the practical implementation of an intelligent multi-tool research assistant that addresses key challenges in modern information retrieval. By combining LangGraph's orchestration capabilities, Groq's high-performance inference, and a curated set of specialized tools, we achieve:

Intelligent Automation: Context-aware tool selection without manual intervention
Quality Assurance: Measurable performance through RAGAS evaluation
Production Readiness: Robust error handling and scalable architecture

The system represents a significant step toward autonomous, reliable, and user-friendly AI assistants capable of handling diverse information needs across multiple domains.

Introduction
1.1 Problem Statement
Modern information retrieval systems often struggle with:

1.2 Proposed Solution
We present an agentic AI system that:

System Architecture
2.1 Core Components
2.1.1 LangGraph Workflow Engine
The system uses a state-based graph architecture with three primary nodes:
User Query → Agent (Tool Selection) → Tool Execution → Agent (Synthesis) → Response
State Management:

Maintains conversation history using typed dictionaries
Implements message accumulation for context preservation
Supports conditional routing based on tool requirements

2.1.2 Language Model Configuration

2.2 Intelligent Tool Selection
The agent employs a sophisticated prompt-based routing system:
Decision Criteria:

2.3 Safety and Ethics Implementation
Built-in Safeguards:

Rejection of unethical/illegal information requests
Source citation requirements
Instruction injection prevention
Language-adaptive responses (English/French)

Evaluation Methodology
3.1 RAGAS Framework Integration
We implemented comprehensive evaluation using the RAGAS (Retrieval Augmented Generation Assessment) framework with three key metrics:
3.1.1 Faithfulness
Definition: Measures factual consistency between the generated answer and retrieved context.
Formula:
Faithfulness = (Number of supported claims) / (Total claims)
Importance: Detects hallucinations and ensures response grounding
3.1.2 Answer Relevancy
Definition: Evaluates how directly the answer addresses the user's question.
Method:

Uses embeddings to compute semantic similarity
Measures focus and pertinence of response
Penalizes verbose or off-topic content

Importance: Ensures comprehensive information retrieval
3.2 Evaluation Infrastructure
Configuration:

Test Categories:

Scientific concepts (Quantum physics, Machine learning)
Current events (AI in Africa)
Historical facts (Penicillin discovery)
Recent developments (Nuclear fusion)

Implementation Details
4.1 Workflow Logic
pythondef agent_workflow():
1. Receive user query
2. Analyze query type and requirements
3. Select appropriate tool(s)
4. Execute tool calls in parallel (if needed)
5. Aggregate and synthesize results
6. Generate cited response
7. Return formatted answer
  4.2 Error Handling Strategy
  Graceful Degradation:

Tool failures don't crash the system
Error messages logged for debugging
Alternative sources attempted when available
User-friendly error communication

4.3 Performance Optimizations
Key Improvements:

The system represents a significant step toward autonomous, reliable, and user-friendly AI assistants capable of handling diverse information needs across multiple domains.

Multi AI Tools Deep Research Assistant

Multi AI Tools Deep Research Assistant

Code

Code