Category : Applied Solution Showcase
Planning a trip today means juggling scattered tasks across many tabs: skim blogs and forums, check weather basics, map a realistic day-by-day plan, estimate costs (with currency conversion), and hunt for flights/hotels. Then redo everything when dates, prices, or weather change. This workflow is:
Fragmented and time-intensive: constant context-switching among research, itinerary building, budgeting, and bookings.
Not reusable: outputs aren't standardized into a brief you can share or iterate on.
What's needed is a reproducible way to coordinate distinct capabilities (research, itinerary design, budgeting, and booking suggestions), each with the right tools, clear handoffs, and guardrails, producing a concise, shareable Final Travel Brief.
Solution approach (scope):
Use an agentic architecture with role-specialized agents and a Supervisor to enforce order and prevent loops.
Ground select steps with tools (weather, FX conversion, demo bookings) for verifiability.
Expose a simple Streamlit UI and standardize outputs into a brief suitable for travelers and reviewers.
Constraints & considerations:
Prices/FX/weather vary; treat budgets as estimates.
Respect third-party ToS; demo booking lookups are illustrative; swap in official APIs for production.
Control LLM cost/latency via bounded turns and minimal tool calls; persist state for traceability.
Python: 3.9+ (3.10/3.11 recommended)
OS: macOS / Linux / Windows
Hardware: CPU-only is sufficient; no GPU required
Network: outbound HTTPS access (OpenAI, Open-Meteo, Frankfurter; optional booking sources)
Orchestration: LangGraph (StateGraph + MemorySaver) coordinating nodes: supervisor, research, itinerary, budget, booking, final.
Agent Runtime: LangChain (OpenAI Functions agent + AgentExecutor).
LLM Backbone: OpenAI chat model (e.g., gpt-4o) with function/tool calling; swappable to any functions-capable provider.
User Interface: Streamlit single-page app (text input, slider for max turns, chat transcript rendering).
Environment Management: python-dotenv for secrets and config.
Weather Tool: Open-Meteo geocoding + forecast (no API key).
Foreign Exchange Tool: Frankfurter API (ECB rates, no key). Minimal sketches of the weather and FX tools follow this list.
Bookings Tool (educational): requests + BeautifulSoup scrapers for flights/hotels; replace with official provider APIs for production (e.g., Amadeus, Duffel, Skyscanner partners).
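Both keyless tools are thin HTTP wrappers. Below is a minimal sketch of how they can be exposed as LangChain tools against the public Open-Meteo and Frankfurter endpoints; the module name (tools.py) and function names are illustrative, and the budget example later in this post stubs the FX call for determinism.

```python
# tools.py - minimal sketches of the two keyless tools (names are illustrative)
import requests
from langchain_core.tools import tool


@tool
def weather_tool(city: str) -> dict:
    """Return a short daily forecast for a city via Open-Meteo (no API key)."""
    # Geocode the city name, then fetch a compact daily forecast for it.
    geo = requests.get(
        "https://geocoding-api.open-meteo.com/v1/search",
        params={"name": city, "count": 1},
        timeout=10,
    ).json()["results"][0]
    daily = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": geo["latitude"],
            "longitude": geo["longitude"],
            "daily": "temperature_2m_max,temperature_2m_min,precipitation_sum",
            "timezone": "auto",
        },
        timeout=10,
    ).json()["daily"]
    return {"city": geo["name"], "daily": daily}


@tool
def fx_convert_tool(amount: float, from_ccy: str, to_ccy: str) -> dict:
    """Convert an amount between currencies using Frankfurter (ECB rates, no key)."""
    data = requests.get(
        "https://api.frankfurter.app/latest",
        params={"amount": amount, "from": from_ccy, "to": to_ccy},
        timeout=10,
    ).json()
    return {
        "amount": amount, "from": from_ccy, "to": to_ccy,
        "converted_amount": data["rates"][to_ccy], "rate_date": data["date"],
    }
```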
langgraph
langchain
langchain-openai
langchain-community
streamlit
python-dotenv
requests
beautifulsoup4
wikipedia # used by WikipediaAPIWrapper
duckduckgo-search # if not bundled via langchain_community
Pin exact versions in requirements.txt after a clean install on your target machine.
OPENAI_API_KEY - required for LLM calls
(Optional) HTTP_PROXY / HTTPS_PROXY - if your network requires it
Provide a .env_example and do not commit real keys; a minimal loading sketch follows.
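python-dotenv reads these values at startup. A minimal, fail-fast loading sketch (the config.py module name is an assumption; the variable names match the list above):

```python
# config.py - load secrets from .env and fail fast if the key is missing
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env in the working directory (copy .env_example to .env first)

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise RuntimeError("OPENAI_API_KEY is not set; copy .env_example to .env and add your key.")
```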
You type a travel goal once, and a small team of AI "specialists" works in order (research → itinerary → budget → booking), then hands you a single, polished Final Travel Brief in the app.
The moving parts:
What it does: Takes your trip goal and a "max turns" slider, then shows the Final Travel Brief and a chat-style log of everything the agents did.
Why it matters: Keeps the experience simple and watchable: no terminals, no configs.
What it does: Think of this as the project manager. It keeps track of state and decides which specialist works next.
Why it matters: Prevents chaos. Ensures agents work in a sensible order and don't loop forever.
What it does: After each step, it looks at progress and says, "Next up: research / itinerary / budget / booking... or we're done."
Why it matters: Keeps the workflow moving forward. If things look stuck, it skips ahead to "final".
Research: Finds quick, trustworthy context (weather snapshot, areas, transit basics).
Itinerary: Turns your goal + research into a realistic day plan (AM/PM/Evening) with backups.
Budget: Adds up flights, hotel, food/day, transit, activities; converts to local currency.
Booking: Suggests 2-3 flight and hotel options with short pros/cons.
Why they matter: Each agent has one job. That focus makes them clearer, faster, and easier to improve.
What it does: Merges everything into one clean Final Travel Brief you can copy/share.
Why it matters: You get a single answer, not scattered notes.
Why they matter: Agents don't guess basic facts; they pull them from reliable sources.
What it does: Powers each agentβs reasoning and writing, with built-in support for calling tools safely.
Why it matters: Good reasoning + tool use = practical answers, not just pretty text.
What it does: Tracks the conversation and turn count for this run.
Why it matters: Makes every step traceable and caps the work so it stays quick and affordable (a checkpoint-inspection sketch follows this list).
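Because the graph is compiled with a MemorySaver checkpointer (see graph.py below), the state of any run can be read back by its thread_id. A minimal sketch, assuming the build_graph and thread_id wiring shown later; the example goal is illustrative:

```python
# Sketch: run once, then inspect the checkpointed state for that thread.
# MemorySaver keeps checkpoints in memory, so inspection works on the same
# compiled graph object; swap in a persistent checkpointer to look across restarts.
import uuid
from langchain_core.messages import HumanMessage

from graph import build_graph

graph = build_graph()
thread_id = str(uuid.uuid4())
config = {"configurable": {"thread_id": thread_id}, "recursion_limit": 12}

graph.invoke(
    {"messages": [HumanMessage("3 days in Lisbon in May, $1500, from JFK")],
     "next": "research", "shared": {"thread_id": thread_id}},
    config=config,
)

snapshot = graph.get_state(config)            # latest checkpoint for this thread
print(snapshot.values["shared"]["turns"])     # how many supervisor turns were used
for msg in snapshot.values["messages"]:       # full, traceable conversation
    print(type(msg).__name__, str(getattr(msg, "content", msg))[:80])
```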
What happens when you press "Plan My Trip"
1. You submit a goal (e.g., "5 days in Tokyo in October, $3k, from DFW").
2. The orchestrator starts at the Supervisor, which picks the first phase (usually "research").
3. Research runs, calling weather (and light references) only if needed, and adds a short, cited summary.
4. The Supervisor checks progress and routes to Itinerary, which builds a walkable day plan and tips.
5. The Supervisor routes to Budget, which does one clean calculation pass and currency conversion, then writes a tidy breakdown.
6. The Supervisor routes to Booking, which proposes 2-3 options with pros/cons.
7. The Supervisor says "final", and the Finalizer stitches everything into your Final Travel Brief.
8. Streamlit displays the brief on top and the full conversation below (for transparency).
Input you give: A single natural-language goal, plus the max number of hand-offs ("turns").
Output you get:
One Final Travel Brief with:
Snapshot (dates, destination, trip length, budget)
Day-by-day (AM/PM/Evening + backups)
Budget totals (USD + local currency)
Top flight/hotel picks (pros/cons)
Quick tips (weather, transit, payments)
Guardrails (how it stays sane)
Default order: research → itinerary → budget → booking → final.
No endless loops: a strict turn cap stops runaway back-and-forth.
Tool discipline: prompts limit how often tools are called (e.g., budget does one short Python calc); a bounded-executor sketch follows this list.
Fail safe: if a step errors or stalls, Supervisor skips ahead to keep momentum.
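Two of these guardrails map to concrete knobs: the turn cap is the recursion_limit passed at invoke time (see main.py below), and tool discipline can be reinforced per agent by bounding the AgentExecutor itself. A minimal sketch (the helper name and limit values are illustrative):

```python
# Sketch: a stricter executor for make_agent (limit values are illustrative)
from langchain.agents import AgentExecutor


def make_bounded_executor(runnable, tools) -> AgentExecutor:
    """Wrap an agent with hard caps so a single node cannot spin on tool calls."""
    return AgentExecutor(
        agent=runnable,
        tools=tools,
        verbose=False,
        max_iterations=3,            # at most 3 reason/tool-call rounds per agent turn
        max_execution_time=60,       # stop after ~60 seconds and return a best effort
        handle_parsing_errors=True,  # recover from malformed tool calls instead of raising
    )
```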
Image source: https://blog.langchain.com/langgraph-multi-agent-workflows/
```python
# agents.py
from typing import TypedDict, Literal, List, Dict, Any

from langchain_core.messages import HumanMessage, AIMessage


class AgentState(TypedDict, total=False):
    messages: List[Any]  # chat history (HumanMessage/AIMessage)
    next: Literal["research", "itinerary", "budget", "booking", "final"]
    shared: Dict[str, Any]  # e.g., {"thread_id": "...", "turns": 3}
```
A single, minimal contract keeps all nodes interchangeable and easy to test.
```python
# graph.py
from langgraph.graph import StateGraph
from langgraph.checkpoint.memory import MemorySaver

from agents import (
    AgentState, supervisor_node, research_node, itinerary_node,
    budget_node, booking_node, final_node,
)


def build_graph():
    g = StateGraph(AgentState)
    for name, fn in [
        ("supervisor", supervisor_node),
        ("research", research_node),
        ("itinerary", itinerary_node),
        ("budget", budget_node),
        ("booking", booking_node),
        ("final", final_node),
    ]:
        g.add_node(name, fn)

    # workers -> supervisor; finish at "final"
    for n in ("research", "itinerary", "budget", "booking"):
        g.add_edge(n, "supervisor")
    g.set_finish_point("final")

    # supervisor chooses the next hop via state["next"]
    def route_decider(state: AgentState) -> str:
        return state.get("next", "final")

    g.add_conditional_edges("supervisor", route_decider, {
        "research": "research", "itinerary": "itinerary", "budget": "budget",
        "booking": "booking", "final": "final",
    })
    g.set_entry_point("supervisor")
    return g.compile(checkpointer=MemorySaver())
```
Why: explicit nodes + conditional edges make the flow self-documenting and easy to extend.
```python
# agents.py (excerpt)
def supervisor_node(state: AgentState) -> AgentState:
    turns = state.setdefault("shared", {}).get("turns", 0) + 1
    state["shared"]["turns"] = turns
    if turns >= 12:  # hard stop to avoid loops/cost
        state["next"] = "final"
        return state

    order = ["research", "itinerary", "budget", "booking", "final"]
    if turns == 1:  # first visit: kick off the default order with research
        state["next"] = order[0]
        return state

    # otherwise advance past the phase that just ran
    curr = state.get("next", "research")
    state["next"] = order[order.index(curr) + 1] if curr in order[:-1] else "final"
    return state
```
Why: a tiny, deterministic policy that prefers the default order and bails out if progress stalls.
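Because the routing policy is deterministic and needs no LLM, it can be covered by a plain unit test. A pytest-style sketch (the file name is illustrative; it exercises the supervisor_node above):

```python
# test_supervisor.py - sketch of a no-LLM unit test for the routing policy
from agents import supervisor_node


def test_default_order_and_hard_stop():
    state = {"messages": [], "shared": {}}
    seen = []
    for _ in range(6):
        state = supervisor_node(state)
        seen.append(state["next"])
    # first pass starts research, then advances through the fixed order
    assert seen[:5] == ["research", "itinerary", "budget", "booking", "final"]

    # the turn cap forces "final" no matter which phase is pending
    state = {"messages": [], "next": "research", "shared": {"turns": 11}}
    assert supervisor_node(state)["next"] == "final"
```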
```python
# agents.py (excerpt)
from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_functions_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableLambda
from langchain_core.messages import AIMessage
from langchain_core.tools import tool


def make_agent(llm: ChatOpenAI, tools: list, system_prompt: str):
    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),  # required by the functions agent
    ])

    def _run(state: AgentState) -> AgentState:
        # Flatten the running chat history into a single prompt for this agent.
        text = "\n".join(
            f"{type(m).__name__}: {getattr(m, 'content', m)}" for m in state.get("messages", [])
        )
        runnable = create_openai_functions_agent(llm, tools, prompt)
        executor = AgentExecutor(agent=runnable, tools=tools, verbose=False)
        out = executor.invoke({"input": text})["output"]
        return {**state, "messages": state.get("messages", []) + [AIMessage(content=out)]}

    return RunnableLambda(_run)


# Example: Budget agent with FX tool, tuned for determinism
@tool
def fx_convert_tool(amount: float, from_ccy: str, to_ccy: str) -> dict:
    """Convert an amount between currencies (demo: fixed-rate stand-in for Frankfurter)."""
    return {"amount": amount, "from": from_ccy, "to": to_ccy,
            "converted_amount": round(amount * 0.92, 2)}


budget_node = make_agent(
    ChatOpenAI(model="gpt-4o", temperature=0.2),
    [fx_convert_tool],
    "You are the Budget agent. Return a clear USD + local total.",
)
```
Why: every agent uses the same skeleton; only the model temperature, tools, and system prompt differ.
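The remaining workers are built from the same skeleton; only the tools, temperature, and system prompt change. A sketch of plausible wiring (the tool choices and prompts here are illustrative, reusing the keyless tools sketched earlier):

```python
# agents.py (excerpt) - sketch: the other workers reuse make_agent (wiring is illustrative)
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun

from tools import weather_tool  # the Open-Meteo sketch shown earlier

research_node = make_agent(
    ChatOpenAI(model="gpt-4o", temperature=0.3),
    [DuckDuckGoSearchRun(), weather_tool],
    "You are the Research agent. Gather a brief, cited snapshot: weather, areas, transit basics.",
)

itinerary_node = make_agent(
    ChatOpenAI(model="gpt-4o", temperature=0.7),  # a bit more creative for day planning
    [],                                           # pure writing; no tools needed
    "You are the Itinerary agent. Produce a realistic AM/PM/Evening plan with backups.",
)

booking_node = make_agent(
    ChatOpenAI(model="gpt-4o", temperature=0.2),
    [],  # swap in the demo bookings tool or an official provider API here
    "You are the Booking agent. Suggest 2-3 flight and hotel options with short pros/cons.",
)
```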
```python
# agents.py (excerpt)
def final_node(state: AgentState) -> AgentState:
    # Placeholder merge step: a full implementation assembles the brief from the
    # research/itinerary/budget/booking messages accumulated in state["messages"].
    brief = (
        "Final Travel Brief\n"
        "- Snapshot: ...\n"
        "- Day-by-day: ...\n"
        "- Budget: ...\n"
        "- Top picks: ...\n"
        "- Tips: ..."
    )
    return {**state, "messages": state.get("messages", []) + [AIMessage(content=brief, name="Finalizer")]}
```
```python
# main.py
import uuid

import streamlit as st
from langchain_core.messages import HumanMessage

from graph import build_graph
from agents import AgentState

st.title("Multi-Agent Travel Planner")
goal = st.text_area("Enter your travel goal:", height=100)
turns = st.slider("Max turns:", 6, 40, 12)

if st.button("Plan My Trip", use_container_width=True):
    thread_id = str(uuid.uuid4())
    state: AgentState = {
        "messages": [HumanMessage(goal or "5 days in Tokyo...")],
        "next": "research",
        "shared": {"thread_id": thread_id},
    }
    graph = build_graph()
    final_state = graph.invoke(
        state,
        config={"configurable": {"thread_id": thread_id}, "recursion_limit": turns},
    )
    st.subheader("Final Travel Brief")
    st.success(final_state["messages"][-1].content)
    st.subheader("Conversation Log")
    for m in final_state["messages"]:
        st.markdown(f"**{type(m).__name__}:** {getattr(m, 'content', m)}")
```
Why: minimal UI, clear control over depth (turns), and transparent output (final brief + full log).
The system outputs a single Final Travel Brief: a concise, shareable summary with a snapshot (dates, destination, trip length, budget), a practical day-by-day plan (AM/PM/Evening + backups), a budget breakdown in USD and local currency, two or three flight/hotel picks with quick pros/cons, and helpful tips (weather, transit, payments). A transparent Conversation Log shows each agent's contribution and any tool calls. To demo, run streamlit run travel_planner/main.py, paste a goal like "5 days in Tokyo in October 2025, $3000, from DFW, prefer food/culture + one day trip," click Plan My Trip, and review the final brief at the top (with optional screenshots placed where you showcase Research → Itinerary → Budget → Booking → Final). It's CPU-friendly (typically 20-90 seconds per run), and for production you can swap the demo bookings tool for a provider API.
Time-to-plan drops from hours of tab-juggling to ~[10-20] minutes per trip (single input → Final Travel Brief).
Confidence & transparency improve via a Conversation Log that shows exactly how each step was produced.
Cost clarity: a simple USD + local-currency breakdown reduces "surprise spend," targeting ±[10-15]% variance vs. actuals.
Consistency at scale: Supervisor-guided flow produces uniform outputs across travelers, routes, and seasons.
Lower run cost: bounded turns and minimal tool calls cap LLM usage to ~[N] calls / plan (≈ $[X].YY per run).
Fast iteration: modular agents/tools let you swap providers (e.g., bookings API) without rewriting the graph.
Portable & accessible: CPU-only, Streamlit front end, .env configuration; easy to run in dev or a lightweight container.
Travelers/Clients: actionable plans in minutes; clearer budgets; fewer back-and-forth emails.
Agencies/Operators: repeatable briefs; staff time saved; smoother handoffs to booking desks.
Developers/Educators: a compact, production-shaped example of agentic orchestration with LangGraph + LangChain.
Average end-to-end runtime: [mm]
Average cost per plan: $[X].YY
Tool calls per plan (weather/FX/booking): [a/b/c]
Budget delta vs. actual spend: ±[N]%
User satisfaction (Brief usefulness, 1-5): [4.2-4.7]
% runs needing human edits before sharing: [≤ 20%]
Data reliability & ToS: Demo booking scrapers are brittle and risk violating site policies; DOM changes and anti-bot measures can silently break results.
Latency & cost control: Multi-agent handoffs, external API calls, and retries can spike run time and spend without strict caps and caching.
Determinism & quality: Agents may drift (hallucinate, overuse tools, or loop); itinerary realism (timings, transfers, closures) is hard without strong grounding.
Observability & debugging: Tracing which prompt/tool caused an issue is non-trivial; need clear logs, errors, and metrics per phase.
Evaluation at scale: No automated way yet to score itinerary quality, budget accuracy, or booking plausibility beyond manual review.
Locale complexity: Time zones, holidays, seasonality, currencies, and languages introduce subtle edge cases.
Security & privacy: API keys, user inputs, and any PII must be protected; no secrets should live in the repo.
Production-grade bookings: Replace scrapers with provider APIs (e.g., Amadeus/Duffel), add rate limiting, retries, and circuit breakers.
Grounded realism: Integrate maps/transit/time-on-foot estimates, opening hours, and live events; enrich with city/POI knowledge via RAG.
Personalization loop: User profiles (interests, pace, dietary prefs), thumbs-up/down feedback, and iterative re-planning.
Quality & eval harness: Define KPIs (budget delta, timing feasibility, user rating), add unit/integration tests and synthetic evals per agent.
Observability: Structured logs, per-agent metrics, trace IDs, and a simple dashboard; error taxonomy with graceful fallbacks.
Cost/latency optimizations: Caching (weather/FX), shared context across agents, streaming responses, and adaptive max_turns; a minimal cache sketch follows this list.
Safety & compliance: Content filters, ToS checks for sources, and a privacy policy; redact PII in logs.
Features & exports: Multi-city trips, price tracking/alerts, calendar (ICS) export, shareable links, and PDF/Markdown brief generation.
Model flexibility: Pluggable LLMs per role (cheaper deterministic for Budget, creative for Itinerary), JSON-schema tool I/O, and offline stubs for demos.
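Weather and FX values change slowly relative to a planning session, so even a small in-process TTL cache removes most repeat calls. A minimal sketch (the decorator name and TTL values are illustrative):

```python
# Sketch: tiny TTL cache for slow-changing lookups (weather/FX); values are illustrative
import time
from functools import wraps


def ttl_cache(ttl_seconds: int = 900):
    """Memoize a function's results for ttl_seconds, keyed by its arguments."""
    def decorator(fn):
        store = {}

        @wraps(fn)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            hit = store.get(key)
            if hit and time.time() - hit[0] < ttl_seconds:
                return hit[1]          # fresh enough: reuse the cached value
            value = fn(*args, **kwargs)
            store[key] = (time.time(), value)
            return value
        return wrapper
    return decorator


# e.g., wrap the raw HTTP helper before exposing it as a LangChain tool
@ttl_cache(ttl_seconds=1800)
def get_fx_rate(from_ccy: str, to_ccy: str) -> float:
    ...  # call the Frankfurter endpoint here, as in the earlier tool sketch
```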
The Multi-Agent Travel Planner shows how an agentic workflow (Supervisor → Research → Itinerary → Budget → Booking → Final) can turn a single natural-language goal into a clear, shareable Final Travel Brief in minutes. By pairing role-focused agents with small, purpose-built tools (weather, FX, booking) and routing them with LangGraph, the system delivers predictable outputs, transparent reasoning (via the conversation log), and easy extensibility (swap tools/models, add agents) without rewriting the app.
This is an applied template you can run today and evolve tomorrow: replace the demo booking lookup with a provider API, add evaluation metrics and observability, and personalize itineraries with lightweight profiles. The result is a practical foundation for AI-assisted trip planning that teams can adapt for real users and real operations.
Try it, fork it, extend it. Run the Streamlit app, review the Final Travel Brief, and iterate on agents or tools to match your use case. Contributions and feedback are welcome, especially around production booking integrations, evaluation harnesses, and export workflows.
GitHub link: https://github.com/Ns90322/Multi-Agent-System