Agent Routing and Cost Control in Multi-Agent Systems
Notes on the fundamentals of agent route / agent routing, the common ways to implement it, and how to think about cost control in multi-agent systems.
Core idea#
What I want to capture in this card is a concept that’s very basic in agent systems, but becomes critical the moment you build a real product:
Agent Routing / Agent Route is essentially the scheduling layer in a multi-agent system that decides “who should handle this, in what order, and with what context.”
Reading it as just “dispatch requests to different agents” isn’t quite enough. More precisely, it involves three things at once:
- Task identification: what type of request this actually is
- Capability matching: which agent / tool / workflow should it go to
- Execution orchestration: is it a single hop, a multi-step chain, or a supervisor watching continuously
The explanation in the Gemini share link is very introductory but very clear:
- An Agent Route is a lot like a “front-desk triage station”
- It decides whether a request goes to the search agent, the code agent, the finance agent, or some other execution unit
- In complex tasks, it isn’t just a single dispatch but possibly a continuous relay of tasks
The two Botpress articles listed in the sources push this one layer further, toward productization:
- One emphasizes that routing is the brain of a multi-agent system
- The other emphasizes that once routing becomes dynamic, multi-layered, and multi-model, cost and latency quickly turn into system-level problems
So I now prefer to understand this topic as:
Agent Routing is not a small feature but the core layer in a multi-agent architecture that ties together intent understanding, task dispatch, context passing, latency control, and cost control.
Key points#
1. Agent Routing solves not “can it do this” but “who should do this”#
The default assumption of a single-agent system is:
- whatever the user says
- the same agent tries to catch it all
But the moment multiple specialist roles start appearing in the system, that assumption falls apart.
For example, a system might have:
- a research agent
- a coding agent
- a browser agent
- a reporting agent
- a human escalation channel
At that point, the first problem to solve isn’t whether a given agent can write a report, but:
- does the current request need search, analysis, writing, or executing an action
- is it done by a single agent, or relayed across several
- which agent should be called first
- which step must be handed off to a human
In other words, routing first solves task ownership.
If this step is done poorly, then even if each agent is individually capable, the system as a whole will feel:
- slow to respond
- prone to taking detours
- prone to calling tools redundantly
- expensive
- like the user is getting “passed around”
2. Routing is already part of multi-agent#
The original question had a very correct intuition:
“This already involves multi-agent too, right?”
Yes — and usually it’s not a peripheral problem, it’s the core one.
Because as soon as you have:
- multiple specialist agents
- multiple tools
- multi-stage tasks
- a supervisor / orchestrator
routing is already the scheduling hub of the multi-agent system.
More concretely, multi-agent setups usually contain at least the following layers:
- Orchestrator / Router
  - responsible for understanding how to break down the current task
  - decides who to hand it to
- Worker / Specialist Agents
  - each handles a specific subtask
- Context Handoff Layer
  - passes the necessary context, parameters, and history downstream
- Verification / Supervisor Layer
  - checks whether the result is acceptable, and retries or reassigns when needed
So “multi-agent” isn’t just a bunch of agents sitting side by side; it’s these agents having a mechanism for who goes first, who goes after, when to fall back, and when to escalate.
The first capability to emerge within that mechanism is routing.
3. Common routing approaches actually fall into three categories#
A. Static / rule-based routing#
The simplest approach is:
- keyword matching
- if/else
- regex
- explicitly enumerated rules
For example:
- refund / billing appears → finance agent
- code / bug / stack trace appears → coding agent
- angry complaint appears → human escalation
Pros:
- fast
- cheap
- controllable
- easy to explain and debug
Cons:
- very brittle
- bad at handling natural-language variation
- easily wrong when faced with multiple intents, shifts in tone, or cross-turn conversation
This approach suits:
- very clear task boundaries
- a limited business space
- extremely high requirements for controllability
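To make the trade-off concrete, here is a minimal sketch of the rule-based variant, assuming a hypothetical rule table and agent names that don't come from any of the sources. Rules are checked in order, so the escalation rule wins over the topical ones. The last call shows the brittleness listed above: a paraphrased refund request matches no rule and falls through to the default.

```python
import re

# Hypothetical rule table, checked top to bottom; the first match wins.
# The escalation rule comes first so an angry refund request reaches a human.
RULES = [
    (re.compile(r"\b(angry|furious|unacceptable|complaint)\b", re.IGNORECASE), "human_escalation"),
    (re.compile(r"\b(refund|billing|invoice)\b", re.IGNORECASE), "finance_agent"),
    (re.compile(r"\b(code|bug|stack ?trace|exception)\b", re.IGNORECASE), "coding_agent"),
]


def route_by_rules(message: str, default: str = "general_agent") -> str:
    """Return the target of the first matching rule, or the default target."""
    for pattern, target in RULES:
        if pattern.search(message):
            return target
    return default


print(route_by_rules("I need a refund for last month's invoice"))       # finance_agent
print(route_by_rules("Getting a stack trace when I run the build"))     # coding_agent
print(route_by_rules("This is unacceptable, my refund never arrived"))  # human_escalation
print(route_by_rules("My money never came back"))                       # general_agent
```

Everything here is fast, cheap, and explainable. The price is that every phrasing you care about has to be anticipated by someone.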
B. LLM-based semantic routing#
This is also the most common approach now:
- give the model a description of the agent / tool capabilities
- let the model judge “where to go next” based on the user input and context
- output a structured routing result, for example:
```json
{"next_agent":"research_agent","reason":"user asks for competitor info before writing report"}
```
Pros:
- flexible
- understands semantics
- can handle requests that are fairly natural, fuzzy, and cross-turn
Cons:
- slower than rule-based routing
- spends extra tokens on every routing decision
- can misjudge, and can also hallucinate an inappropriate agent
So while it’s powerful, abusing it turns every step into “ask the big model one more time,” which ultimately leads to:
- higher latency
- higher cost
- the whole system over-relying on the router model
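A minimal sketch of the semantic variant follows. It builds a prompt from capability descriptions and parses the structured result shape shown above. It also guards against the misjudgment and hallucination risk by validating the returned agent name against the registry. `call_llm` is a hypothetical placeholder for whatever model client is in use, not any specific SDK's API. The agent registry and the choice of `human_escalation` as the fallback are also assumptions.

```python
import json
from typing import Callable

# Hypothetical capability registry shown to the router model.
AGENTS = {
    "research_agent": "Finds and summarizes external information, e.g. competitor data.",
    "coding_agent": "Writes, reviews, and debugs code.",
    "reporting_agent": "Drafts reports from information that has already been collected.",
}


def build_router_prompt(user_input: str, history: list[str]) -> str:
    catalog = "\n".join(f"- {name}: {desc}" for name, desc in AGENTS.items())
    recent = "\n".join(history[-3:])  # only a short window of context, to limit tokens
    return (
        "Pick the next agent for this request.\n"
        f"Agents:\n{catalog}\n"
        f"Recent conversation:\n{recent}\n"
        f"User: {user_input}\n"
        'Reply with JSON only: {"next_agent": "<name>", "reason": "<short reason>"}'
    )


def route_with_llm(user_input: str, history: list[str],
                   call_llm: Callable[[str], str],
                   fallback: str = "human_escalation") -> dict:
    """Ask the (hypothetical) router model, then validate its answer before trusting it."""
    raw = call_llm(build_router_prompt(user_input, history))
    try:
        decision = json.loads(raw)
    except json.JSONDecodeError:
        return {"next_agent": fallback, "reason": "unparseable router output"}
    # Reject non-objects and agent names that are not in the registry (hallucinated routes).
    if not isinstance(decision, dict) or decision.get("next_agent") not in AGENTS:
        return {"next_agent": fallback, "reason": "router chose an unknown agent"}
    return decision


def fake_llm(prompt: str) -> str:
    # Stub standing in for a real model call.
    return '{"next_agent": "research_agent", "reason": "user asks for competitor info before writing report"}'


print(route_with_llm("Compare us with our competitors, then write a report", [], fake_llm))
```

Every call to `route_with_llm` is one more model round trip. That is exactly the latency and cost overhead listed above, which is why the later sections try to avoid paying it on every step.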
C. Hierarchical / supervisor routing#
Slightly more complex systems move into hierarchical routing:
- the top-level supervisor first decides the overall direction
- a subsystem then does second-level routing internally
- after a sub-agent finishes, it returns the result to the upper layer for acceptance
This is no longer just “dispatch” but orchestration + supervision.
Its pros are:
- well suited to complex business flows
- you can explicitly control stages and responsibility boundaries
- easy to attach retry / validation / fallback
But the cons are equally obvious:
- complex architecture
- longer chains
- context passing is more prone to bloat
- one full task run may consume many rounds of model calls
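A minimal sketch of the hierarchical pattern follows. It assumes two hypothetical subsystems, with lambdas standing in for sub-agents and simple keyword checks standing in for the supervisor's and the subsystem's routing decisions. It walks through the three steps listed above: a top-level pick, a second-level pick, and acceptance by the upper layer. A rejected result is reassigned to the next worker in the same subsystem before the task is escalated.

```python
from typing import Callable

Worker = Callable[[str], str]

# Hypothetical layout: the supervisor picks a subsystem, and each subsystem owns its workers.
# The lambdas stand in for sub-agent calls.
SUBSYSTEMS: dict[str, dict[str, Worker]] = {
    "engineering": {
        "coding_agent": lambda task: f"patch for: {task}",
        "review_agent": lambda task: f"review of: {task}",
    },
    "analysis": {
        "research_agent": lambda task: f"findings on: {task}",
    },
}


def supervisor_pick(task: str) -> str:
    """Top-level direction (a rule here; could be a model call)."""
    return "engineering" if "code" in task.lower() else "analysis"


def subsystem_pick(subsystem: str, task: str) -> str:
    """Second-level routing inside the chosen subsystem."""
    workers = SUBSYSTEMS[subsystem]
    if "review" in task.lower() and "review_agent" in workers:
        return "review_agent"
    return next(iter(workers))


def run_task(task: str, accept: Callable[[str], bool]) -> dict:
    subsystem = supervisor_pick(task)
    first = subsystem_pick(subsystem, task)
    # Try the chosen worker first, then reassign to the others in the same subsystem on rejection.
    order = [first] + [w for w in SUBSYSTEMS[subsystem] if w != first]
    for calls, name in enumerate(order, start=1):
        result = SUBSYSTEMS[subsystem][name](task)
        if accept(result):  # the upper layer accepts or rejects the sub-agent's output
            return {"status": "accepted", "worker": name, "calls": calls, "result": result}
    return {"status": "escalated", "worker": None, "calls": len(order)}


print(run_task("Fix the code in parser.py", accept=lambda out: out.startswith("review")))
```

The `calls` counter makes the cost side visible. Each rejection adds another sub-agent round, and in a real system each check by the upper layer is often a model call too.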
4. The real hard part isn’t picking the agent, it’s sending the right context along with it#
This point tends to get understated in a lot of “what is agent routing” explanations.
In practice, the hardest part of routing often isn’t:
- “research agent or coding agent”
but:
- which part of the context to pass to the downstream agent
- which history to keep, and which noise to trim
- whether parameters need to be standardized
- who continues the decision-making once the result comes back
If handoff is done poorly, you get the classic problems:
- the sub-agent doesn’t know why it was called
- the sub-agent didn’t receive the key constraints
- upstream and downstream see inconsistent task descriptions
- every agent switch re-feeds a large chunk of context, wasting tokens
So routing is actually bound together with the following terms:
- orchestration
- context engineering
- state management
- tool schema design
My take is:
In a real system, routing quality largely depends on handoff quality.
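One way to make handoff explicit is to pass a small typed packet instead of the whole transcript. The sketch below is an assumption on my part: its fields mirror the minimal handoff context listed in the "Implications for actual work" section, plus a `reason_for_call` field aimed at the "doesn't know why it was called" failure. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Handoff:
    """Hypothetical handoff packet passed to a downstream agent instead of the full transcript."""
    goal: str
    reason_for_call: str                 # why this agent was chosen, so it is not working blind
    constraints: list[str]
    completed_steps: list[str]
    key_results: dict[str, str]
    prohibited_actions: list[str] = field(default_factory=list)
    recent_turns: list[str] = field(default_factory=list)


def build_handoff(goal: str, reason: str, constraints: list[str], completed_steps: list[str],
                  key_results: dict[str, str], transcript: list[str],
                  keep_turns: int = 2, prohibited: Optional[list[str]] = None) -> Handoff:
    """Assemble an explicit, trimmed packet rather than re-feeding the whole conversation."""
    return Handoff(
        goal=goal.strip(),
        reason_for_call=reason.strip(),
        constraints=[c.strip() for c in constraints if c.strip()],  # drop blank constraints
        completed_steps=list(completed_steps),
        key_results=dict(key_results),
        prohibited_actions=list(prohibited or []),
        # transcript[-0:] would return everything, so treat keep_turns <= 0 as "no history".
        recent_turns=transcript[-keep_turns:] if keep_turns > 0 else [],
    )


packet = build_handoff(
    goal="Write a one-page competitor report",
    reason="Research is finished; only drafting remains",
    constraints=["max one page", "  "],
    completed_steps=["research_agent: collected competitor pricing"],
    key_results={"pricing": "<summary produced by research_agent>"},
    transcript=["turn 1", "turn 2", "turn 3", "turn 4"],
    prohibited=["send email to customers"],
)
print(packet.recent_turns)  # ['turn 3', 'turn 4']
```

Because the packet has a fixed shape, upstream and downstream read the same task description, and the token cost of a switch is bounded by `keep_turns` rather than by conversation length.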
5. Routing design directly determines the cost structure#
That Botpress cost-optimization article isn’t specifically about multi-agent, but it’s a very instructive read in this context.
Because one natural risk of multi-agent systems is:
Every extra routing decision, every extra agent handoff, every extra layer of model calls — cost and latency accumulate.
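A back-of-the-envelope calculation shows how quickly this adds up. The per-call prices and latencies below are invented for illustration, not figures from the Botpress article or any provider. The sketch also assumes calls run one after another.

```python
# Hypothetical per-call figures; real prices and latencies depend on provider, model, and prompt size.
CALL_PROFILE = {
    "big_router":       {"usd": 0.010, "ms": 900},
    "small_router":     {"usd": 0.0005, "ms": 120},
    "specialist":       {"usd": 0.020, "ms": 1500},
    "supervisor_check": {"usd": 0.008, "ms": 700},
}


def chain_cost(steps: list[str]) -> tuple[float, int]:
    """Total spend and latency of a chain of model calls, assuming they run sequentially."""
    usd = sum(CALL_PROFILE[s]["usd"] for s in steps)
    ms = sum(CALL_PROFILE[s]["ms"] for s in steps)
    return round(usd, 4), ms


single_hop = ["small_router", "specialist"]
relay = ["big_router", "specialist", "supervisor_check"] * 2  # two routed handoffs, each checked

print(chain_cost(single_hop))  # (0.0205, 1620)
print(chain_cost(relay))       # (0.076, 6200)
```

With these made-up numbers, the two-stage relay costs about 3.7 times as much as the single hop and takes about 3.8 times as long. The ratio grows with every extra layer.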
A few points especially worth noting:
a. Don’t hand every problem to the most expensive model#
High-cost models should be reserved for:
- complex judgments
- high-risk branching
- high-value answers
Whatever can be solved with lighter logic shouldn’t be forced onto a big model.
This means the routing layer itself should have a cost-aware policy:
- FAQ / clear rules → static routing
- medium-complexity intent → small model / embedding routing
- fuzzy and high-value questions → big-model router
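A minimal sketch of such a cost-aware policy follows. Routers are tried cheapest-first, and a request moves up a tier only when the cheaper router is not confident enough. The three tier functions are hypothetical stubs, and the confidence thresholds are arbitrary assumptions.

```python
from typing import Callable, Optional

# A router returns (agent name or None, confidence in [0, 1]).
Router = Callable[[str], tuple[Optional[str], float]]


def tiered_route(message: str, tiers: list[tuple[str, Router, float]]) -> tuple[str, str]:
    """Try routers cheapest-first; return (agent, tier that decided)."""
    for tier_name, router, min_confidence in tiers:
        agent, confidence = router(message)
        if agent is not None and confidence >= min_confidence:
            return agent, tier_name
    return "human_escalation", "none"  # no tier was confident enough


# Hypothetical stand-ins for the three tiers; in practice the second would be a small
# classifier or embedding lookup and the third a call to a large model.
def static_rules(message: str) -> tuple[Optional[str], float]:
    return ("faq_agent", 1.0) if "opening hours" in message.lower() else (None, 0.0)


def small_model(message: str) -> tuple[Optional[str], float]:
    return ("billing_agent", 0.9) if "charge" in message.lower() else ("research_agent", 0.4)


def big_model(message: str) -> tuple[Optional[str], float]:
    return ("research_agent", 0.8)


TIERS = [("static", static_rules, 1.0), ("small_model", small_model, 0.7), ("big_model", big_model, 0.6)]

print(tiered_route("What are your opening hours?", TIERS))         # ('faq_agent', 'static')
print(tiered_route("Why is there a double charge?", TIERS))         # ('billing_agent', 'small_model')
print(tiered_route("Should we enter the Japanese market?", TIERS))  # ('research_agent', 'big_model')
```

The expensive tier is only paid for on the last request. Tuning the thresholds is effectively tuning where the correctness/cost trade-off sits.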
b. Narrow the scope first, then call expensive capabilities#
This is a lot like the idea of scoping a knowledge base.
Don’t immediately have one master agent reason over all documents, all tools, and all specialist descriptions. A more sensible approach is usually:
- coarsely classify the task first
- then enter a specific subdomain
- then do finer routing within that subdomain
In other words:
Narrow the search space first, then do expensive reasoning.
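A minimal sketch of coarse-then-fine routing follows. Keyword overlap stands in for whatever classifier each stage would really use, and the catalog and trigger words are hypothetical. The point is structural: the second stage only ever sees the agents of the subdomain picked by the first. In a real system the second stage might be the expensive model call, whose prompt shrinks accordingly.

```python
import re

# Hypothetical catalog: agents grouped by subdomain, each with a few trigger words.
CATALOG = {
    "support": {"refund_agent": {"refund", "charge"}, "shipping_agent": {"delivery", "tracking"}},
    "engineering": {"coding_agent": {"bug", "error"}, "infra_agent": {"deploy", "server"}},
}
# Coarse hints per subdomain (a stand-in for a cheap first-stage classifier).
DOMAIN_HINTS = {
    "support": {"order", "refund", "delivery", "charge", "tracking"},
    "engineering": {"bug", "error", "deploy", "server"},
}


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def best_match(options: dict[str, set[str]], tokens: set[str]) -> str:
    # Highest keyword overlap wins; max() keeps the first option on ties.
    return max(options, key=lambda name: len(options[name] & tokens))


def route(message: str) -> tuple[str, str, int]:
    """Return (subdomain, agent, number of agents the fine stage had to consider)."""
    tokens = words(message)
    domain = best_match(DOMAIN_HINTS, tokens)  # stage 1: coarse classification
    agents = CATALOG[domain]                    # stage 2 only sees this subdomain's agents
    return domain, best_match(agents, tokens), len(agents)


print(route("The deploy failed with a server error"))
print(route("Where is my delivery? The tracking page is blank"))
```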
c. Don’t disguise simple tasks as AI tasks#
If an action is essentially just:
- a table lookup
- a rule check
- a data format conversion
- a fixed-template reply
then it should be done with code / a deterministic workflow / retrieval as much as possible.
Otherwise, the most common waste in a multi-agent system is:
- something that’s clearly fixed logic
- still gets handed layer by layer to an agent to “think about”
- ultimately turning something that could finish in a few milliseconds into several rounds of LLM calls
6. A more practical framing: routing is the balancer of “correctness, latency, cost”#
Conceptually, routing looks a lot like “smart dispatch.”
But from an engineering angle, it’s actually optimizing three things that conflict with one another:
Correctness#
- pick the right agent
- pass the right context
- fall back / escalate when necessary
Latency#
- fewer detours
- fewer redundant calls
- get the user a result as fast as possible
Cost#
- fewer unnecessary model calls
- less repeated reasoning
- control token, tool, and external-API spend
Whether a routing design is good usually isn’t about how “smart” it is, but whether it can keep an acceptable balance among these three.
7. Rather than a “jack-of-all-trades agent,” an orchestrator + specialists is often more reasonable#
There’s an implicit logic in that Botpress routing article that I largely agree with:
- don’t make a single agent responsible for understanding, executing, and explaining all the business at once
- the more stable approach is usually one orchestrator that judges and several specialists that execute
The benefits of doing this are:
- each agent’s context window can be smaller
- prompts are more focused
- the hallucination surface is narrower
- debugging is clearer
- permission boundaries are easier to control
But it’s not a free lunch.
Because once you split into multiple specialists:
- the router prompt has to be clearer
- the handoff schema has to be clearer
- logging and tracing must be better
- otherwise the system just becomes “more agents, harder to debug”
8. A direction well worth tracking long-term: semantic routing doesn’t have to mean asking an LLM every time#
Both the Gemini summary and the Botpress routing article lean toward LLM-based routing. But from a system-design view, I think what’s more worth remembering is:
Semantic routing ≠ calling an expensive big model to decide every single time.
There are quite a few alternative routes:
- embedding similarity routing
- a small classifier model
- hybrid rule + semantic routing
- staged routing (coarse classification + fine classification)
The point of these approaches is:
- retain a certain amount of semantic understanding
- while pushing latency and cost down
As a system grows, this kind of hybrid routing is often more robust than “relying entirely on one big model as the front-desk controller.”
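A minimal sketch of hybrid rule + semantic routing follows. A deterministic rule handles the unambiguous case first. Then the message is compared against example utterances for each route by cosine similarity. Only low-similarity requests fall through to an expensive router. The bag-of-words `embed` function is a deliberately crude stand-in for a real embedding model. The route examples, the 0.3 threshold, the `/human` command, and the `llm_router` target are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical example utterances per route; their vectors are computed once, ahead of time.
ROUTE_EXAMPLES = {
    "billing_agent": ["I was charged twice this month", "how do I get a refund"],
    "coding_agent": ["my script throws an exception", "help me fix this bug"],
}
ROUTE_VECTORS = {route: [embed(e) for e in examples] for route, examples in ROUTE_EXAMPLES.items()}


def hybrid_route(message: str, threshold: float = 0.3) -> tuple[str, str]:
    """Return (target, which stage decided)."""
    # 1) Cheap deterministic rule for an unambiguous case.
    if message.strip().lower().startswith("/human"):
        return "human_escalation", "rule"
    # 2) Semantic match against route examples: no LLM call, just one embedding.
    vec = embed(message)
    best_route, best_score = None, 0.0
    for route, vectors in ROUTE_VECTORS.items():
        score = max(cosine(vec, v) for v in vectors)
        if score > best_score:
            best_route, best_score = route, score
    if best_route is not None and best_score >= threshold:
        return best_route, "semantic"
    # 3) Low similarity: defer to an expensive router (or ask a clarifying question).
    return "llm_router", "fallback"


print(hybrid_route("I got charged twice for one order"))
print(hybrid_route("What's the weather like tomorrow?"))
```

With a real embedding model, paraphrases with no shared words would also match, which is exactly what this toy version cannot do. The control flow stays the same: rules, then similarity, then the big model only as a last resort.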
Current understanding / conclusion#
I now understand Agent Routing as the following sentence:
It’s the scheduling kernel of a multi-agent system, responsible for deciding — across correctness, latency, and cost — how a task gets broken down, dispatched, relayed, and collected.
What’s most worth committing to memory here isn’t the definition but a few judgments:
- Routing is itself a core multi-agent capability, not an auxiliary concept.
- The real difficulty isn’t just picking the agent, it’s context handoff.
- Routing design directly determines the cost curve.
- If it can be done with rules, don’t make it semantic; if it can be done lightweight, don’t make it a heavy model.
- A mature system looks more like orchestrator + specialists than a single jack-of-all-trades master agent.
Implications for actual work#
If I ever design an agent system myself, I think the routing layer should at minimum answer these questions clearly first:
1. What is the unit of routing?#
- route to an agent
- route to a tool
- route to a workflow
- or route to a human
2. What is routing based on?#
- keywords
- schema
- embeddings
- small-model classification
- big-model judgment
- a hybrid strategy
3. What is the minimal context for handoff?#
- the goal
- the constraints
- completed steps
- key intermediate results
- prohibited actions
4. How does fallback work?#
- ask a clarifying question
- switch agents
- downgrade to a rule-based flow
- escalate to a human
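These four options can be ordered into an explicit fallback ladder. The sketch below is one hypothetical ordering, not a rule from the sources. High-risk requests go straight to a person. Unclear intent triggers a clarifying question. A failed agent is swapped for the next untried candidate. The rule-based flow is the last automated option. The 0.4 confidence threshold is an arbitrary assumption.

```python
from enum import Enum
from typing import Optional


class Fallback(Enum):
    CLARIFY = "ask a clarifying question"
    SWITCH_AGENT = "switch agents"
    RULE_FLOW = "downgrade to a rule-based flow"
    HUMAN = "escalate to a human"


def choose_fallback(confidence: float, candidates: list[str], failed: set[str],
                    has_rule_flow: bool, high_risk: bool,
                    min_confidence: float = 0.4) -> tuple[Fallback, Optional[str]]:
    """Pick a fallback after a failed or uncertain routing step (hypothetical policy)."""
    if high_risk:                     # risky branches go straight to a person
        return Fallback.HUMAN, None
    if confidence < min_confidence:   # the intent itself is unclear: ask, don't guess
        return Fallback.CLARIFY, None
    remaining = [a for a in candidates if a not in failed]
    if remaining:                     # intent is clear but the agent failed: try the next one
        return Fallback.SWITCH_AGENT, remaining[0]
    if has_rule_flow:                 # no agent left: fall back to a deterministic flow
        return Fallback.RULE_FLOW, None
    return Fallback.HUMAN, None


print(choose_fallback(0.8, ["coding_agent", "review_agent"], {"coding_agent"},
                      has_rule_flow=True, high_risk=False))
```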
5. What is the cost policy?#
- which requests aren’t worth going multi-agent for
- which requests must be routed conservatively
- which requests are worth calling a high-cost model for
To be added#
Directions still worth fleshing out later:
- The specific abstractions LangGraph / CrewAI / Botpress each use for routing
- How a context handoff schema should be designed so it doesn’t get bloated
- Which key nodes multi-agent tracing / observability should record
- The applicability boundaries of an embedding router, a small-model router, and a big-model router
- The relationship between routing and permission control / prompt injection risk
Related links / sources#
- Gemini share: https://gemini.google.com/share/572c022c7cde
- Botpress: Ultimate Guide to AI Agent Routing (2026)
- Botpress: How to Optimize AI Spend Cost in Botpress