
Multi Agent collaboration via a ‘Dynamic Router’ Orchestrator and GraphQL: handle composite prompts


Abstract

Handling complex 'composite' user prompts to update an application model can require specialized Agents, each knowing how to generate modifications for certain parts of the application.
Combining a 'dynamic router' (orchestrator) with a classic Agent Oriented Programming approach can fulfill such complex user prompts, with minimal coupling between agents.

Methodology

A framework was created (in Python) and evaluated by building several examples across multiple domains:

  • home automation
  • 'sim life' game world generation
  • web app generation

The framework is available at https://github.com/mrseanryan/gpt-multi-atomic-agents [under the MIT license].

Agents via a shared language of mutations

Agents can be built using the framework by first defining a 'language' of either function calls or GraphQL mutations (hereafter, both are referred to as 'mutations'). The mutations can be grouped into logical categories for reuse. Agents are defined with a simple description (typically a single-line LLM prompt) and with inputs and outputs, which are expressed in terms of the mutation categories. The categories simplify modelling how agents can understand each other's output.

Agents are defined declaratively, using minimal prompting and relying on a structured approach of described properties (via atomic-agents and Instructor). This declarative approach allows agents to be stored as data (JSON files), which enables custom, editable, and shareable agents.
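
For illustration, such a stored agent definition might look like the following sketch (the field names and values are hypothetical, not the framework's exact schema):

    import json

    # A hypothetical sketch of a stored agent definition; the field names are
    # illustrative, not the framework's exact schema.
    lawn_mower_agent = {
        "name": "lawn_mower_agent",
        "description": "Generates mowing actions for a lawn",  # the single-line prompt
        "topics": ["lawn", "mowing", "garden"],
        "input_categories": ["garden_state"],     # mutation categories it can read
        "output_categories": ["mowing_actions"],  # mutation categories it writes
        "agent_parameters": ["which_lawn"],       # e.g. front or back lawn
    }

    with open("lawn_mower_agent.json", "w") as f:
        json.dump(lawn_mower_agent, f, indent=2)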

Agents collaborate via a Blackboard of mutations

Agent collaboration is supported via a blackboard (shared memory), an approach inspired by classic Agent Oriented Programming. Each agent can read a filtered view of the blackboard, in terms of the categories that it accepts as input. Each agent can also write its output to the blackboard. By carefully aligning the input and output categories of the agents, the agents can collaborate indirectly, by reading each other's output. This low-coupled manner of agent collaboration allows new agents to be added or modified with minimal effort, and lets the system grow to a large number of agents.
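
A minimal sketch of such a blackboard, assuming mutations are tagged with a category (the class and field names are illustrative, not the framework's actual API):

    from dataclasses import dataclass, field

    @dataclass
    class Mutation:
        category: str  # e.g. "garden_state" or "mowing_actions"
        call: str      # the function call or GraphQL mutation, as text

    @dataclass
    class Blackboard:
        # Shared memory of mutations; a minimal sketch, not the framework's class.
        mutations: list[Mutation] = field(default_factory=list)

        def read(self, input_categories: set[str]) -> list[Mutation]:
            # Each agent sees only the categories it accepts as input.
            return [m for m in self.mutations if m.category in input_categories]

        def write(self, new_mutations: list[Mutation]) -> None:
            # Agents append their output; later agents may read it.
            self.mutations.extend(new_mutations)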

Agents are orchestrated via a 'dynamic router'

A 'dynamic router' is used to take the user's prompt and identify which agents are relevant.

NOTE: the term 'Dynamic Router' is used to distinguish this approach from systems which use imperative logic to decide on agent selection, a specified chain of agents, or other such 'static' approaches.

To assist the router in selecting agents, each agent has a description and a list of topics that it supports. For quality purposes and to avoid unwanted output (LLMs tend to be eager to produce output), the router rewrites the user prompt for each agent.

Additionally, to help the agent 'know' more about what the user is talking about (for example, a lawn mower agent would want to know: is this the front or the back lawn?), an agent can have agent-parameters. When rewriting the user's prompt for an agent, the router also tries to populate values for the agent-parameters. This was found to improve quality, since agents could better focus on the relevant details of the user's prompt.
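
For illustration, the lawn mower example might play out as follows (a hypothetical sketch; the values are invented):

    # Hypothetical illustration: the router rewrites the user's prompt for one
    # agent and populates that agent's parameters (values are invented).
    user_prompt = "Cut the grass out front, and water the plants in the back garden"

    rewritten_for_lawn_mower = {
        "agent_name": "lawn_mower_agent",
        "rewritten_prompt": "Mow the front lawn",
        "agent_parameters": {"which_lawn": "front"},  # focuses the agent on relevant details
    }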

The router also functions as an orchestrator: it takes the list of recommended agents and builds an execution plan. Each step of the execution plan specifies an agent to execute, along with a rewritten version of the user prompt. Depending on the complexity of the user prompt, the same agent may be used in multiple steps of the plan, for example if there is an inter-dependency between agents according to their inputs and outputs.
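
An execution plan can be pictured as an ordered list of steps, roughly as in this sketch (a hypothetical structure, not the framework's exact types):

    from dataclasses import dataclass

    @dataclass
    class ExecutionStep:
        agent_name: str
        rewritten_prompt: str            # the router's agent-specific rewrite
        agent_parameters: dict[str, str]

    # An execution plan is an ordered list of steps; the same agent may appear
    # more than once if agents depend on each other's output.
    plan = [
        ExecutionStep("garden_designer_agent", "Add a flower bed beside the front lawn", {}),
        ExecutionStep("lawn_mower_agent", "Mow the front lawn", {"which_lawn": "front"}),
    ]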

3-stage generation

The user's prompt is fulfilled by using an LLM to generate mutations, which are then applied on the client side.

A 3-stage process is used to manage the complexity and allow for human-in-the-loop feedback:

  1. Plan
  2. Generate
  3. Execute/Evaluate

More details of this PGE (Plan, Generate, Execute/Evaluate) pattern are described in this Medium article.

Plan Stage

The client of the framework can iterate over the plan with the user, using human-in-the-loop to improve quality and user engagement.

Generate Stage

When the user is satisfied with the plan, the client uses the framework to execute the plan, generating mutations. Each step of the plan involves executing the agent for that step. Existing application state is also input via mutations ("everything is a function"). Agent collaboration is achieved via the blackboard: at each step, the executing agent reads previous mutations that it understands (according to its input categories), and writes its output as new mutations, stored in the blackboard. The new mutations may in turn be read by later agents, according to their input categories. When all the steps have been executed, the blackboard holds a complete set of mutations, ready for the client to execute.
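
A single step of the Generate stage might look like this sketch, reusing the Blackboard pictured earlier (the agent API shown is an assumption, not the framework's actual interface):

    # Hypothetical sketch of one step of the Generate stage.
    def run_step(step, agents, blackboard):
        agent = agents[step.agent_name]

        # Read only the previous mutations this agent understands.
        context = blackboard.read(set(agent.input_categories))

        # The agent (an LLM call under the hood) generates new mutations from its
        # rewritten prompt, its agent-parameters, and the filtered context.
        new_mutations = agent.generate(
            prompt=step.rewritten_prompt,
            parameters=step.agent_parameters,
            context=context,
        )

        # Write the output back, so that later steps (other agents) can read it.
        blackboard.write(new_mutations)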

Execute/Evaluate Stage

The mutations are handled on the client, which executes them via appropriate handlers that the client provides. Typically the handlers update the application model, according to the domain. The client can route the mutations to the handlers using the categories.

At this stage, run-time evaluation of the LLM output can be performed, by verifying that the agent output matches the handlers that the client provides.
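
On the client, routing by category together with this run-time check might look like the following sketch (the handler wiring is illustrative):

    # Hypothetical client-side execution: route each mutation to a handler by
    # category, and flag any agent output that no handler can execute.
    def apply_mutations(blackboard, handlers):
        unmatched = []
        for mutation in blackboard.mutations:
            handler = handlers.get(mutation.category)
            if handler is None:
                # Run-time evaluation: this output has no matching handler.
                unmatched.append(mutation)
                continue
            handler(mutation)  # typically updates the application model
        return unmatched  # the client can report these, or ask agents to regenerate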

Supporting Libraries

Base client libraries were built (in Python and in TypeScript) to support the framework.

Validation

Test clients were built using the client libraries to demonstrate and validate the framework.
Different domains were chosen in order to verify that the framework was general-purpose, and not coupled to one domain or set of clients.

Results

The approach was successful in handling composite user prompts, where more than one AI Agent is required to generate the appropriate mutations.

The choice of modelling both data and updates as mutations ("everything is a function") helped to simplify the problem of modelling input and output in terms the agents could collaborate on. In particular, Function Calls are simple and reliable, working with off-the-shelf LLMs such as the Anthropic Claude family.
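
As an illustration of the Functions format, a single function-call mutation can be validated with a small Pydantic model (a minimal sketch; the model and example values are invented):

    from pydantic import BaseModel, ValidationError

    # Minimal sketch: both existing data and updates share one function-call
    # shape ("everything is a function"), validated via Pydantic.
    class FunctionCall(BaseModel):
        function_name: str
        category: str
        arguments: dict[str, str]

    raw_agent_output = {
        "function_name": "add_sprinkler",  # invented example
        "category": "garden_state",
        "arguments": {"location": "front lawn"},
    }

    try:
        call = FunctionCall(**raw_agent_output)
    except ValidationError as error:
        print(f"Rejected malformed agent output: {error}")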

However, modelling both data and updates purely as function calls proved too simplistic for some domains. More complex domains such as app generation require more detailed, composite mutation and data structures, which may be possible via GraphQL. The choice of GraphQL has a drawback: there is little support for validating incoming 'snapshots' of data (mutations and queries can be validated, but plain JSON documents cannot).
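
For contrast, a composite mutation in the GraphQL format might look like this (an invented schema for the web app generation domain, not one of the framework's test clients):

    # An invented composite GraphQL mutation for a web app generation domain;
    # the schema is illustrative only.
    ADD_PAGE_MUTATION = """
    mutation {
      addPage(
        title: "Home"
        sections: [
          { kind: HEADER, text: "Welcome" }
          { kind: BUTTON, text: "Sign up", action: "open_signup_form" }
        ]
      ) {
        id
      }
    }
    """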

The initial planning (routing) does add latency; however, the overhead (about 6 seconds) is minimal compared to the typical cost of generating mutations (about 30 seconds, up to one minute). In addition, with human-in-the-loop the perceived latency was found to be lower: although the separate planning stage increases the total wall-clock time, the user experiences more interaction and less idle waiting.

A feature missing from this approach is tool calling, where an agent that requires more data can pause generating output and emit 'tool calls' instead. This could be added via either GraphQL queries or specially named function calls.

Tradeoffs

The results were mostly positive and the framework serves as a working example of how to build collaborating agents in a low-coupled manner. As always, there are tradeoffs.

Advantages

  • Flexibility: handles complex ‘composite’ user prompts which require generation by multiple specialized agents.
  • Extensible on server side: the agents are not coupled, so agents can be added to the system via the Blackboard.
  • Extensible on client side: agents are serializable, since an agent is simply a GraphQL schema with allowed mutations and queries, along with a system prompt. Since agents are also low-coupled, agents could be submitted by a trusted client, allowing for further extension and customization of the system.
  • Robustness: agents generate independently via the Blackboard. The data schema is managed via GraphQL, a mature graph technology, or via Function Calls, which are simple to validate (e.g. via Pydantic).
  • Quality: each agent is an expert at handling a particular part of the application domain and can be tuned for that task. The GraphQL schema helps to ensure compatible input and output. The optional Evaluator can be used to detect misalignments and take action, for example by asking agents to regenerate given the evaluation results.

Disadvantages

  • Performance and latency: the ‘Dynamic Router’ Orchestrator uses an LLM to analyze the user prompt: this introduces some delay before the agents can start working. A mitigation could be to use classic NLP techniques such as Type-Token Ratio (TTR) or Hapax richness: if the user prompt is simple, then a simpler classifier-based router can be used (see the sketch after this list). Another possible mitigation would be to always first use a classifier-based router, and only if its output is doubtful then employ the Orchestrator.

  • Performance and latency: the Evaluator introduces delays before the final response can be sent to the client. Rather than adding an Evaluator, the relevant agents could be improved, for example by tuning their prompts or by involving a fine-tuned LLM for that task.

  • Complexity: the Orchestrator is only really warranted when there is already a requirement for multiple specialized Agents, and complex ‘composite’ user prompts that require generation by more than one agent are expected. Prematurely adding a ‘Dynamic Router’ Orchestrator may introduce unnecessary complexity.

  • Scalability: a consideration for scalability is whether to host the blackboard on the client or on the server. The framework itself accepts the blackboard 'on the wire' in each request from the client. This has the benefit of being simpler to scale, since the server is stateless. However, a drawback is that there is more network traffic and more responsibility on the client. The second drawback was mitigated by providing client-side libraries to support client development.
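
As a sketch of the classifier-gate mitigation mentioned in the first latency point above, the classic NLP measures are only a few lines (the rule and thresholds are illustrative, not tuned):

    from collections import Counter

    def type_token_ratio(tokens):
        # Unique tokens divided by total tokens.
        return len(set(tokens)) / len(tokens) if tokens else 0.0

    def hapax_richness(tokens):
        # Fraction of tokens occurring exactly once (hapax legomena).
        counts = Counter(tokens)
        return sum(1 for c in counts.values() if c == 1) / len(tokens) if tokens else 0.0

    def looks_simple(prompt):
        tokens = prompt.lower().split()
        # Illustrative rule, not tuned: short prompts with repetitive vocabulary
        # are routed to a cheap classifier; anything else goes to the Orchestrator.
        return len(tokens) <= 12 and (type_token_ratio(tokens) < 0.9 or hapax_richness(tokens) < 0.6)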

Conclusion

The ‘Dynamic Router’ Orchestrator is an advanced technique for handling complex ‘composite’ user prompts that require generation by multiple specialized agents. The technique has tradeoffs and should only be employed for more complex application domains where there is already a need for multiple specialized agents. More complex domains would require the GraphQL format rather than the simple Functions format, and that has the drawback of limited validation for incoming data. The use of a blackboard combined with agents defined in terms of input and output mutations resulted in low-coupled agents that can successfully collaborate.
