
Procurement teams in mid-to-large enterprises spend significant time manually researching suppliers, validating purchase requests, and preparing recommendations — often repeating the same structured process for every new requisition. Procurement Assistant is an early-stage, multi-agent AI system designed to automate this cycle. Built with LangGraph and the Anthropic Claude API, it chains four specialized agents — Intake, Procurement, Analyst, and Orchestrator — each with a clearly defined role and a strict output contract. The result is a fully traceable, policy-driven procurement workflow that goes from a free-text purchase request to a structured final report with a supplier recommendation, cost analysis, and a plain-language decision summary.
Enterprise procurement is one of those domains where the process is well-understood, highly structured, and yet still largely manual. In most companies, a purchase request follows a predictable path: the request is captured and validated, classified against the purchasing policy, researched for candidate suppliers, analysed for cost and risk, and finally approved, conditioned, escalated, or rejected.
The challenge is not that this process is complex — it is actually very structured. The challenge is that it is time-consuming, repetitive, and heavily dependent on institutional knowledge: knowing the buying rules, knowing the thresholds, knowing which supplier categories apply to which product types.
This is precisely where AI agents can add value. Not by replacing procurement specialists, but by automating the structured, rule-driven parts of the workflow — so that human attention is focused where it genuinely matters: exceptions, strategic decisions, and supplier relationships.
Procurement Assistant is an attempt to build exactly that. It is a first-version system targeting the core intake-to-decision cycle for procurement requests involving machinery, equipment, vehicles, professional services, and other categories defined in a company's purchasing policy.
Before diving into the architecture, it is worth stating the design principles that shaped the system — because they explain many of the implementation choices.
Auditability above all. Every decision made by every agent must reference the specific rule that drove it. A compliance officer must be able to read the decision log and understand the full reasoning chain without external context.
Strict output contracts. Agents do not produce free-form text. Every agent output is a structured Pydantic model, validated at runtime. If an agent cannot produce valid structured output, the system raises a typed exception (StructuredOutputError) rather than passing malformed data downstream.
Separation of concerns. Each agent does exactly one thing. The Intake Agent does not search for suppliers. The Procurement Agent does not make final decisions. The Orchestrator does not override upstream decisions. This makes the system easier to test, debug, and improve incrementally.
Configuration over hardcoding. Buying rules, decision thresholds, model selection, and prompt versions are all managed through YAML configuration files. Changing a threshold does not require touching agent code.
Prompt versioning. System prompts are versioned, checksummed, and changelog-tracked — the same discipline applied to code.
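Versioning prose the way code is versioned largely comes down to pinning content hashes. A minimal sketch of that discipline (the function names are illustrative, not the project's actual utilities):

```python
import hashlib

def checksum_prompt(text: str) -> str:
    """Stable content hash (hex SHA-256) for a prompt version."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def verify_prompt(text: str, pinned_checksum: str) -> bool:
    """True when the prompt on disk still matches the pinned checksum."""
    return checksum_prompt(text) == pinned_checksum
```

A changelog entry can then record the version label next to its checksum, so any drift between the file and the pinned hash is caught before the agent runs.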
The system is built as a directed agent graph using LangGraph. Each agent is a separate node in the graph. Nodes communicate exclusively through a shared state object (SharedState) — no agent calls another directly.
User Input (CLI)
│
▼
┌──────────────┐
│ Intake │ Validates & classifies the request.
│ Agent │ Triggers clarification rounds if required fields are missing.
└──────┬───────┘
│ validated_request, category_id, process_type
▼
┌──────────────┐
│ Procurement │ Searches for suppliers using web search, PDF parsing,
│ Agent │ and currency conversion tools.
└──────┬───────┘
│ supplier_recommendations, procurement_strategy, negotiation_points
▼
┌──────────────┐
│ Analyst │ Performs TCO analysis and risk assessment.
│ Agent │ Issues final decision based on configurable thresholds.
└──────┬───────┘
│ cost_analysis, risk_analysis, final_decision
▼
┌──────────────┐ ┌──────────────┐
│ Orchestrator │────►│ Human Review │ Activated when decision = ESCALATE
│ Agent │ │ Node │
└──────┬───────┘ └──────────────┘
▼
Final Report + Plain-Language Summary (CLI)
The graph is defined in procurement_system/graph/procurement_graph.py. The shared state flows through each node, accumulating outputs and a running decision log that is included in the final report.
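The state-accumulation pattern the graph relies on can be sketched without LangGraph itself: each node is a function that reads the shared state and returns a partial update, which a runner merges in before calling the next node. The node names mirror the diagram; the update payloads are illustrative.

```python
from typing import Callable

def intake_node(state: dict) -> dict:
    # Classify the request and append to the running decision log.
    return {"category_id": "MACHINERY", "process_type": "formal_rfq",
            "decision_log": state["decision_log"] + ["[intake] classified"]}

def procurement_node(state: dict) -> dict:
    # Research suppliers; real node would call search/PDF/currency tools.
    return {"supplier_recommendations": ["TechMachinery GmbH"],
            "decision_log": state["decision_log"] + ["[procurement] 1 option"]}

def run_graph(state: dict, nodes: list[Callable[[dict], dict]]) -> dict:
    """Apply each node in order, merging its partial update into state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state
```

This is exactly why no agent needs to call another directly: every node's inputs and outputs are visible in one place, which makes the flow easy to unit-test node by node.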
The Intake Agent is the entry point of the system. It receives a raw, free-text purchase request and is responsible for transforming it into a validated, classified, and routed procurement record.
Its core tasks are:
- Validation: checks that the required fields are present: `description`, `quantity`, `unit`. If any are missing, it does not guess or infer — it triggers a clarification round.
- Classification: assigns a product category (e.g. `MACHINERY`, `IT_SOFTWARE`, `PROFESSIONAL_SERVICES`) based on the enterprise buying rules.
- Routing: selects the process type, one of `catalog_purchase`, `rfq`, `formal_rfq`, or `strategic_sourcing` — driven entirely by the rules defined in `enterprise_buying_rules.yaml`.

One of the more interesting implementation details here is how clarification is handled. When required fields are missing, the Intake Agent does not just flag an error — the graph raises a `GraphInterrupt`, which pauses execution and surfaces a structured question to the CLI. The user answers, and the graph resumes via `Command(resume=answer)`.
Plain Python code, not the LLM, controls the clarification loop: how many rounds are allowed, and when to proceed with flags if the user does not respond. The agent only formulates the question; it does not track rounds or decide when to stop.
```python
except GraphInterrupt as e:
    interrupt_obj = e.args[0][0]
    payload = interrupt_obj.value
    question = payload.get("question", "Please provide the missing information")
    answer = input("> ").strip()
    final_state = graph.invoke(Command(resume=answer), config=config)
```
The tone of the clarification question adapts based on the round number — polite and open-ended in round 1, specific and urgent in the final round. This is controlled by a parameter injected at runtime, not hardcoded in the prompt.
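A minimal sketch of that runtime parameter, assuming a three-round limit and hypothetical tone labels:

```python
def clarification_tone(round_number: int, max_rounds: int = 3) -> str:
    """Pick the tone parameter injected into the clarification prompt.
    Tone labels here are illustrative, not the project's actual values."""
    if round_number >= max_rounds:
        return "specific_urgent"   # last chance: ask precisely for what is missing
    if round_number == 1:
        return "polite_open"       # first ask: open-ended and friendly
    return "direct"                # middle rounds: narrow the question down
```

Keeping this in code rather than in the prompt means the escalation behaviour can be changed (or tested) without touching a single prompt file.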
The Procurement Agent receives the validated request from the Intake Agent and is responsible for market research and sourcing strategy.
It produces:
Each supplier recommendation includes: name, type, estimated price range (total, not per-unit), lead time, reliability score, pros, cons, and contact priority.
The Procurement Agent has access to three tools:
| Tool | Purpose |
|---|---|
| Supplier Web Search | Searches the web for suppliers matching the request category, powered by Tavily |
| PDF Reader | Extracts and parses supplier catalogues, offers, and technical documents in PDF format |
| Currency Converter | Converts supplier price quotes to a common currency for fair comparison |
The tool layer is structured in three tiers: tools/ (agent-facing interface), services/ (business logic), and repositories/ (external API calls). This separation makes each layer independently testable.
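The tiering can be illustrated with the currency tool. The class names are assumptions, and the repository's external API call is replaced by stubbed data, but the shape of the layering is the point:

```python
class FxRepository:
    """Repository tier: would call an external FX API; stubbed here."""
    def get_rate(self, src: str, dst: str) -> float:
        rates = {("EUR", "USD"): 1.08}   # stubbed external data
        return rates[(src, dst)]

class CurrencyService:
    """Service tier: business logic, independent of any API client."""
    def __init__(self, repo: FxRepository):
        self.repo = repo

    def convert(self, amount: float, src: str, dst: str) -> float:
        if src == dst:
            return amount
        return round(amount * self.repo.get_rate(src, dst), 2)

def currency_converter_tool(amount: float, src: str, dst: str) -> str:
    """Tool tier: thin, agent-facing wrapper returning a plain string."""
    value = CurrencyService(FxRepository()).convert(amount, src, dst)
    return f"{value} {dst}"
```

In tests, the repository is the only layer that needs mocking; the service and tool tiers run against a fake repository exactly as shown.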
The Analyst Agent performs the financial and risk analysis, and issues the final procurement decision.
Its output includes:
Total Cost of Ownership (TCO) analysis:
Risk assessment:
- `risk_score` calculated as the weighted average of probability × impact for each risk, clamped to a 1.0–10.0 scale

Final decision, determined by configurable thresholds:
```yaml
# config/config_analyst_agent.yaml
decision_thresholds:
  auto_proceed: 3.0    # risk_score ≤ 3.0 → PROCEED
  auto_escalate: 7.5   # risk_score ≥ 7.5 → ESCALATE
```
The four possible decisions are:
| Decision | Meaning |
|---|---|
| `PROCEED` | Risk and budget within thresholds — approved |
| `PROCEED_WITH_CONDITIONS` | Approved, with specific conditions to satisfy before signing |
| `ESCALATE` | Requires human review — risk or value exceeds thresholds |
| `REJECT` | Request is fundamentally unfeasible |
The risk score formula is deterministic — given the same inputs, the same score is always produced. This is intentional: auditability requires reproducibility.
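Under those rules, scoring and thresholding reduce to a small pure function. The exact shape of the per-risk inputs is an assumption; the clamping range and the threshold values come from the configuration described earlier.

```python
def risk_score(risks: list[dict]) -> float:
    """Weighted average of probability × impact, clamped to [1.0, 10.0].
    Each risk: {"probability": 0..1, "impact": 1..10, "weight": > 0}."""
    total_weight = sum(r["weight"] for r in risks)
    raw = sum(r["weight"] * r["probability"] * r["impact"] for r in risks) / total_weight
    return min(10.0, max(1.0, raw))

def decide(score: float, auto_proceed: float = 3.0, auto_escalate: float = 7.5) -> str:
    """Map a risk score to a decision via the configured thresholds.
    REJECT is reserved for fundamentally unfeasible requests and is
    not threshold-driven, so it does not appear here."""
    if score <= auto_proceed:
        return "PROCEED"
    if score >= auto_escalate:
        return "ESCALATE"
    return "PROCEED_WITH_CONDITIONS"
```

Because both functions are pure, the same inputs always yield the same decision, which is what makes the audit trail reproducible.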
The Orchestrator is the final node in the graph. It makes no new decisions — it synthesises and communicates everything the upstream agents have produced.
It generates two outputs:
A structured final report (LLMFinalReport) containing the full TCO analysis, risk assessment, supplier recommendation, conditions, next steps, and the complete decision log from all agents.
A plain-language message for the requester (maximum 150 words, zero technical jargon). Terms like "TCO", "risk_score", or "rfq" are explicitly prohibited in this output. The message tells the requester: what was decided, who the recommended supplier is, what the cost looks like, and what the next concrete steps are.
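A rule like this is easy to enforce mechanically after generation. A sketch of such a post-hoc check, with deliberately simple token handling:

```python
# Banned terms taken from the rule above; the check itself is illustrative.
PROHIBITED_JARGON = {"tco", "risk_score", "rfq"}

def violates_message_rules(message: str, max_words: int = 150) -> bool:
    """True if the requester-facing message breaks the word budget
    or contains any banned technical term."""
    words = message.lower().replace(",", " ").replace(".", " ").split()
    if len(words) > max_words:
        return True
    return any(term in words for term in PROHIBITED_JARGON)
```

Running this as a validation step after the Orchestrator means a jargon slip can trigger a regeneration instead of reaching the requester.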
If the Analyst Agent has issued an ESCALATE decision, the graph routes through the Human Review Node before producing the final report — a deliberate circuit breaker to ensure high-risk purchases always involve a human.
All agents read from and write to a single SharedState TypedDict. This means there is no message-passing between agents — only state transitions. The decision log is an append-only list that every agent writes to, producing a complete audit trail by the time the Orchestrator runs.
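In LangGraph, an append-only field is expressed with a reducer annotation on the state schema, so node updates are concatenated rather than overwritten. A hypothetical subset of `SharedState`:

```python
import operator
from typing import Annotated, TypedDict

class SharedState(TypedDict, total=False):
    """Illustrative subset of the real shared state."""
    validated_request: dict
    supplier_recommendations: list
    final_decision: str
    # The Annotated reducer marks this field as append-only:
    # each node's log entries are added to the list, never replacing it.
    decision_log: Annotated[list, operator.add]
```

Every other field follows last-writer-wins semantics, which is safe here because each field has exactly one producing agent.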
Each agent has two prompts: a system prompt (defining the agent's role, reasoning approach, and output rules) and a user prompt (injected at runtime with the current state fields the agent needs). The user prompt is assembled programmatically by utils/prompt_assembler.py, which injects structured data from SharedState into a template.
This separation means the system prompt defines behaviour and the user prompt delivers data — a clean contract that makes prompts easier to test and version independently.
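A simplified stand-in for that assembly step, using `string.Template` rather than the project's actual templating:

```python
import string

def assemble_user_prompt(template: str, state: dict, fields: list[str]) -> str:
    """Render the user prompt by injecting only the state fields this
    agent needs (a sketch of what utils/prompt_assembler.py does)."""
    data = {name: str(state.get(name, "<missing>")) for name in fields}
    return string.Template(template).safe_substitute(data)
```

Because the field list is explicit, each agent sees only the slice of `SharedState` it is entitled to, and a missing field shows up as a visible placeholder rather than a silent blank.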
Agent outputs are enforced via Pydantic using LangChain's `.with_structured_output()`. If the LLM produces output that does not match the schema, the system raises `StructuredOutputError` — a typed exception that is caught at the top level in `main.py` and handled gracefully, returning control to the user for a new request without crashing the graph.
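The enforcement pattern can be sketched in a few lines, assuming Pydantic v2. The schema fields are a hypothetical subset, not the project's real intake contract:

```python
from pydantic import BaseModel, ValidationError

class StructuredOutputError(Exception):
    """Typed failure: the agent could not produce schema-valid output."""

class IntakeOutput(BaseModel):
    # Hypothetical subset of the real intake schema.
    description: str
    quantity: int
    unit: str
    category_id: str

def parse_intake(raw: dict) -> IntakeOutput:
    """Validate raw model output against the contract, failing loudly
    instead of passing malformed data downstream."""
    try:
        return IntakeOutput.model_validate(raw)
    except ValidationError as exc:
        raise StructuredOutputError(str(exc)) from exc
```

Converting `ValidationError` into a domain exception at the boundary keeps Pydantic internals out of the top-level error handling in `main.py`.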
The model used by each agent is configured independently in config/model_registry.yaml. This means different agents can run on different Claude models — for example, a lighter model for the Orchestrator (which mostly reformats existing data) and a more capable model for the Analyst (which requires deeper reasoning).
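Resolution can be as simple as a dictionary lookup with a default. The registry shape and the model identifiers below are placeholders, not the contents of the project's actual `model_registry.yaml`:

```python
# Parsed form of a hypothetical model registry; identifiers are placeholders.
MODEL_REGISTRY = {
    "default": "claude-capable-model",
    "orchestrator": "claude-light-model",   # mostly reformats existing data
}

def model_for(agent: str) -> str:
    """Resolve an agent's model name, falling back to the registry default."""
    return MODEL_REGISTRY.get(agent, MODEL_REGISTRY["default"])
```

Swapping a model for one agent is then a one-line YAML change, with no risk of touching the agents that were working fine.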
The codebase follows a layered architecture:
procurement_system/
├── agents/ ← Agent logic (one subdirectory per agent)
├── config/ ← All YAML configuration (buying rules, thresholds, models)
├── graph/ ← LangGraph graph definition
├── nodes/ ← Graph nodes (one per agent + human_review)
├── prompts/ ← System and user prompts (versioned, checksummed)
├── repositories/ ← External API calls (Tavily, currency, PDF)
├── schemas/ ← Pydantic models for all agent inputs and outputs
├── services/ ← Business logic layer
├── tools/ ← Agent-facing tool interfaces
└── utils/ ← Shared utilities (prompt loading, LLM setup, logging)
tests/
├── agents/ ← Agent-level tests
├── nodes/ ← Node-level tests
├── repositories/ ← Repository-level tests
└── services/ ← Service-level tests
The test suite mirrors the source structure exactly, with coverage at every layer.
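One invariant worth testing at the node level is audit-trail completeness: by the time the Orchestrator runs, every agent should have logged at least one entry. A sketch of such a check (the helper name is illustrative):

```python
REQUIRED_AGENTS = ("intake", "procurement", "analyst", "orchestrator")

def assert_complete_audit_trail(decision_log: list[str]) -> None:
    """Fail if any agent is missing from the decision log, which would
    mean the final report cannot show a full reasoning chain."""
    for agent in REQUIRED_AGENTS:
        assert any(entry.startswith(f"[{agent}]") for entry in decision_log), agent
```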
The system runs as a command-line application:
python main.py
A sample session:
Provide purchase requisition (or exit):
> 2 units CNC milling machine 3-axis, budget ~90000 USD, delivery by September 2026
📄 REPORT
============================================================
Decision: PROCEED_WITH_CONDITIONS
Message: Your request for 2 CNC milling machines has been reviewed.
We recommend TechMachinery GmbH as the primary supplier,
with a total estimated cost of $82,000–$94,000.
Before signing, please confirm their delivery guarantee
in writing. Next step: contact TechMachinery GmbH
to confirm pricing and lead time.
--- Decision log ---
• [intake] Classified as MACHINERY / formal_rfq — rule: threshold_50k_usd
• [procurement] 3 supplier options identified
• [analyst] risk_score: 3.8 — PROCEED_WITH_CONDITIONS
• [orchestrator] Final report compiled
This is a v0.1 system. It is important to be clear about what it does not yet do:
- Buying rules are static, defined manually in `enterprise_buying_rules.yaml`; there is no self-learning from past decisions.

These are known limitations, not oversights. The goal of this version was to validate the multi-agent workflow and the agent contracts — not to build a production-ready procurement platform.
The natural next steps for this system, in rough priority order:
| Component | Technology |
|---|---|
| Agent framework | LangGraph |
| LLM provider | Anthropic Claude (via API) |
| Output validation | Pydantic |
| Supplier web search | Tavily |
| PDF parsing | Custom service layer |
| Currency conversion | Custom service layer |
| Configuration | YAML |
| Language | Python 3.10+ |