Touri is an AI-powered travel planning system that uses a multi-agent architecture to automate the end-to-end process of trip planning. Built on CrewAI and powered by Groq's LLM inference, the system orchestrates three specialized agents (FlightAgent, AccommodationAgent, and ItineraryAgent), each equipped with real-world API tools. Given a user's travel parameters, the system searches for live flights via FlightAPI.io, hotels via Booking.com, and weather forecasts via OpenWeatherMap, then assembles a complete travel plan with budget breakdown, currency conversion, and a day-by-day itinerary. The result is delivered through an interactive Streamlit web interface with PDF export capability.
Planning a trip involves coordinating multiple independent tasks: finding flights, booking accommodation, researching activities, checking weather, and managing a budget across currencies. Traditionally, a traveler must visit multiple platforms, manually compare options, and synthesize the information into a coherent plan, a process that is both time-consuming and cognitively demanding.
Large Language Models (LLMs) have demonstrated strong capability in reasoning, tool use, and structured output generation. Multi-agent frameworks like CrewAI extend this by allowing multiple specialized agents to collaborate on complex tasks, each with its own role, tools, and goal.
Touri applies this paradigm to travel planning. Rather than a single monolithic LLM prompt, the system decomposes the problem into three specialized sub-tasks handled by dedicated agents. Each agent is equipped with domain-specific tools and operates independently, with results assembled into a unified travel plan.
The key research questions this project addresses are:
1. Can a multi-agent LLM system reliably orchestrate real-world API calls to produce actionable travel plans?
2. How can rate-limit constraints on free-tier LLM APIs be managed in a production-like workflow?
3. What is the right balance between agent autonomy and deterministic post-processing for structured output reliability?
The system follows a sequential multi-agent pipeline:
User Input → FlightAgent → AccommodationAgent → ItineraryAgent → Plan Assembly → UI
Each agent runs in its own isolated CrewAI crew, with a 12-second pause between executions to respect Groq's token-per-minute (TPM) rate limits.
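This orchestration pattern can be sketched with plain functions standing in for the three crews (the stand-in functions and the `PAUSE_SECONDS` name are hypothetical; in the real system each call is a CrewAI crew kickoff):

```python
import time

# Hypothetical stand-ins for the three agent crews; in Touri each would be
# a separate CrewAI crew whose kickoff() returns the agent's raw output.
def run_flight_crew(params):        return {"flights": []}
def run_accommodation_crew(params): return {"accommodations": []}
def run_itinerary_crew(params):     return {"itinerary": []}

PAUSE_SECONDS = 12  # inter-crew pause to stay under Groq's TPM budget

def run_pipeline(params, pause=PAUSE_SECONDS):
    """Run the three crews sequentially, pausing between executions."""
    results = {}
    crews = [("flights", run_flight_crew),
             ("accommodations", run_accommodation_crew),
             ("itinerary", run_itinerary_crew)]
    for name, crew in crews:
        results[name] = crew(params)
        if name != "itinerary":  # no pause needed after the last crew
            time.sleep(pause)
    return results
```

Isolating each agent in its own crew means a rate-limit failure in one stage retries only that stage, rather than replaying the whole pipeline.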
All agents inherit from a shared BaseTravelAgent class that handles LLM initialization, retry logic, and tool registration.
```python
class BaseTravelAgent:
    GROQ_MODELS = {
        "llama-70b": "llama-3.3-70b-versatile",
        "llama-8b": "llama-3.1-8b-instant",
        "gpt-oss-20b": "openai/gpt-oss-20b",
        "gpt-oss-120b": "openai/gpt-oss-120b",
    }

    def __init__(self, model: str = "llama-70b", temperature: float = 0.3):
        self.llm = LLM(
            model=f"groq/{self.GROQ_MODELS[model]}",
            temperature=temperature,
            max_retries=6,
        )
```
FlightAgent uses FlightAPITool to query FlightAPI.io using the correct path-parameter URL format:
```
GET https://api.flightapi.io/onewaytrip/{api_key}/{origin}/{dest}/{date}/{adults}/0/0/{class}/USD
```
The response contains cross-referenced arrays (itineraries, legs, segments, carriers, places) that are resolved by ID to produce structured flight objects.
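The ID-resolution step can be sketched as a join over lookup tables built from the flat arrays (the exact field names below are assumptions based on the response shape described above, not the verbatim FlightAPI.io schema):

```python
def resolve_itineraries(payload):
    """Join FlightAPI.io-style flat arrays into structured flight objects.

    Builds id -> record lookup tables for legs, carriers, and places, then
    walks each itinerary's leg references. Field names are illustrative.
    """
    legs     = {leg["id"]: leg for leg in payload.get("legs", [])}
    carriers = {c["id"]: c["name"] for c in payload.get("carriers", [])}
    places   = {p["id"]: p["iata"] for p in payload.get("places", [])}

    flights = []
    for itin in payload.get("itineraries", []):
        for leg_id in itin["leg_ids"]:
            leg = legs[leg_id]
            flights.append({
                "price":   itin["price"],
                "carrier": carriers[leg["carrier_id"]],
                "origin":  places[leg["origin_id"]],
                "dest":    places[leg["destination_id"]],
                "depart":  leg["departure"],
            })
    return flights
```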
AccommodationAgent uses BookingComTool with a two-step flow: first resolving the city name to a dest_id via /v1/hotels/locations, then searching with that ID via /v1/hotels/search.
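The two-step flow can be sketched as follows; `api_get` is an injected HTTP helper (e.g. a thin wrapper around requests.get with the RapidAPI headers), and the query parameter names are assumptions, while the endpoint paths follow the text:

```python
def search_hotels(city, checkin, checkout, api_get):
    """Two-step Booking.com (RapidAPI) flow: resolve dest_id, then search.

    `api_get(path, params)` performs the HTTP request and returns parsed
    JSON; parameter names below are illustrative.
    """
    # Step 1: resolve the free-text city name to a Booking.com dest_id.
    locations = api_get("/v1/hotels/locations",
                        {"name": city, "locale": "en-gb"})
    dest_id = locations[0]["dest_id"]

    # Step 2: search hotels using the resolved destination ID.
    return api_get("/v1/hotels/search", {
        "dest_id": dest_id,
        "dest_type": "city",
        "checkin_date": checkin,
        "checkout_date": checkout,
    })
```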
ItineraryAgent uses WeatherTool to fetch a 5-day OpenWeatherMap forecast and builds a day-by-day activity plan grounded in the destination and user interests.
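Grounding the itinerary in the forecast amounts to collapsing OpenWeatherMap's 3-hourly /forecast entries into per-day summaries; a minimal sketch (assuming the standard response shape with `list`, `dt_txt`, and `main.temp`):

```python
from collections import defaultdict

def daily_summary(forecast):
    """Collapse OpenWeatherMap's 3-hourly 5-day forecast into per-day
    min/max temperatures, keyed by date string.

    Expects: {"list": [{"dt_txt": "YYYY-MM-DD HH:MM:SS",
                        "main": {"temp": ...}}, ...]}
    """
    by_day = defaultdict(list)
    for entry in forecast["list"]:
        day = entry["dt_txt"].split(" ")[0]   # date part of the timestamp
        by_day[day].append(entry["main"]["temp"])
    return {day: {"min": min(temps), "max": max(temps)}
            for day, temps in by_day.items()}
```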
A key technical requirement for Groq's function-calling API is that every tool must expose a Pydantic args_schema with explicit properties. Without this, Groq raises a BadRequestError because 'required' is present but 'properties' is missing. All four tools define explicit input schemas:
```python
class FlightSearchInput(BaseModel):
    origin: str = Field(..., description="Departure airport IATA code")
    destination: str = Field(..., description="Arrival airport IATA code")
    departure_date: str = Field(..., description="Date in YYYY-MM-DD format")
    return_date: Optional[str] = Field(default=None)
    adults: int = Field(default=1)
    travel_class: str = Field(default="ECONOMY")

class FlightAPITool(BaseTool):
    name: str = "FlightAPI.io Flight Search"
    description: str = "Search for real flight options using FlightAPI.io"
    args_schema: Type[BaseModel] = FlightSearchInput
```
LLMs frequently wrap JSON output in markdown code blocks or prepend prose. A robust parser was implemented using a character-by-character balanced brace tracker:
```python
import json

def safe_parse_json(text: str):
    # Strip markdown code fences (```json ... ```)
    if text.startswith("```"):
        text = text.split("```")[1]
        if text.startswith("json"):
            text = text[4:]
    # Find the first complete balanced JSON object or array
    for start_char, end_char in [('{', '}'), ('[', ']')]:
        start = text.find(start_char)
        if start == -1:
            continue
        depth = 0
        in_string = False
        for i, ch in enumerate(text[start:], start):
            if ch == '"':
                in_string = not in_string
            elif not in_string:
                if ch == start_char:
                    depth += 1
                elif ch == end_char:
                    depth -= 1
                    if depth == 0:
                        return json.loads(text[start:i + 1])
    return None
```
After agent execution, _create_travel_plan performs deterministic corrections:
- Date correction → overrides LLM-generated dates using day_number and the actual trip start date
- Currency conversion → fetches a single USD→target rate and applies it to all prices
- Weather attachment → calls WeatherTool directly and attaches the forecast
- Budget calculation → sums flights, accommodation, and activity costs
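The first two corrections can be sketched as pure functions (the `day_number`, `price_usd`, and `price_local` field names follow the descriptions above but are illustrative):

```python
from datetime import date, timedelta

def fix_itinerary_dates(days, trip_start):
    """Override LLM-generated dates deterministically: each day's date is
    derived from its day_number and the actual trip start date."""
    for day in days:
        offset = timedelta(days=day["day_number"] - 1)
        day["date"] = (trip_start + offset).isoformat()
    return days

def convert_prices(items, usd_rate, price_key="price_usd"):
    """Apply a single USD -> target-currency rate to every price."""
    for item in items:
        item["price_local"] = round(item[price_key] * usd_rate, 2)
    return items
```

Doing this after the crews finish, rather than prompting harder, is what makes the dates and totals trustworthy regardless of what the LLM emitted.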
Groq's free tier limits llama-3.3-70b-versatile to 12,000 TPM. The system manages this through:
- Sequential task execution with 12-second inter-task pauses
- Exponential backoff retry: 15s → 30s → 60s → 120s → 240s
- litellm.num_retries = 6 for HTTP-level retries
- Compact task prompts (~80 tokens vs ~300 in the original design)
- max_iter=1 on all agents to prevent multi-turn token consumption
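The backoff schedule can be sketched as a small retry wrapper (the function name is hypothetical; in Touri the retried call would be a crew kickoff that may raise Groq's RateLimitError, caught here as a generic exception for brevity):

```python
import time

def with_backoff(fn, delays=(15, 30, 60, 120, 240)):
    """Retry fn() with the exponential schedule from the text
    (15s -> 30s -> 60s -> 120s -> 240s), then make one final attempt.
    Any error on the final attempt propagates to the caller."""
    for delay in delays:
        try:
            return fn()
        except Exception:
            time.sleep(delay)
    return fn()  # final attempt; let any error propagate
```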
Problem: Groq rejected tool calls with invalid JSON schema ('required' present but 'properties' missing).
Cause: CrewAI's BaseTool subclasses without args_schema generate schemas with required fields but no properties, which Groq's strict validation rejects.
Fix: Added explicit Pydantic args_schema to all four tools. Resolved completely.
Problem: Flight tool returned {"success": true, "data": {"flights": [ ] } } despite a valid API key.
Cause: The original implementation used query parameters and expected data.status == 'success'. The actual API uses path parameters and returns flat cross-referenced arrays.
Fix: Rewrote URL construction and response parser to match the actual API contract. Flight results populated correctly after fix.
Problem: safe_parse_json failed on outputs like 'Here it is: \n{"flights": [ ] }' and {"accommodations": [...]} \n\nGo ahead: \n{"accommo... (two JSON objects).
Cause: Regex-based extraction with re.DOTALL greedily matched from first { to last }, spanning multiple JSON objects.
Fix: Replaced regex with balanced brace parser. Correctly extracts the first complete JSON object regardless of surrounding text.
Problem: RateLimitError on llama-3.3-70b-versatile (12k TPM) and llama-3.1-8b-instant (6k TPM) during multi-agent execution.
Cause: Three agents executing back-to-back consumed the full TPM budget within one minute.
Fix: Split crew into three sequential mini-crews with 12-second pauses. Combined with exponential backoff retry, the system completes successfully on the free tier.
The system successfully produces complete travel plans end-to-end with the following components and data sources:
| Component | Data Source |
|---|---|
| Flight search | FlightAPI.io |
| Hotel search | Booking.com (RapidAPI) |
| Itinerary generation | Groq LLM + OpenWeatherMap |
| Currency conversion | ExchangeRate API |
- Explicit args_schema on all tools is mandatory for Groq function calling; without it, 100% of tool calls fail.
- LLM output is unreliable for structured data without deterministic post-processing. Dates, in particular, must be corrected server-side regardless of prompt instructions.
- Sequential execution with inter-task pauses is sufficient to stay within free-tier TPM limits for a 3-agent system.
- Balanced-brace parsing is significantly more robust than regex for extracting JSON from LLM output that may contain prose, multiple JSON objects, or markdown formatting.
Touri demonstrates that a multi-agent LLM system can reliably orchestrate real-world API calls to produce actionable, structured travel plans. The key engineering insights from this project are:
1. Tool schema correctness is non-negotiable when using strict function-calling APIs like Groq: a Pydantic args_schema must be explicitly defined on every tool.
2. LLM output should be treated as unreliable structured data: deterministic post-processing (date correction, price conversion, JSON extraction) is essential for production reliability.
3. Rate limit management requires architectural decisions, not just retry logic: splitting tasks into separate crews with deliberate pauses is more effective than relying solely on backoff.
4. Agent autonomy should be bounded: max_iter=1 with clear, concise task prompts produces more consistent results than allowing agents to iterate freely, especially under token constraints.
Future work could explore parallel agent execution using async CrewAI flows, integration with booking APIs for direct reservation, and a conversational adjustment interface for iterative plan refinement.
Built with CrewAI · Groq · Streamlit · FlightAPI.io · Booking.com RapidAPI