
When Your AI Agent Needs a Costume Change vs. a Whole New Actor

February 19, 2026 · AI Architecture · Multi-Agent Systems

The architectural decision a lot of AI builders get wrong and how to think about it.

I've been building a multi-agent AI assistant for a while now. Recently, I hit a problem that I think a lot of people building AI products will eventually face — and the solution turned out to be one of the more interesting architectural decisions I've made. Let me walk you through it.

The Situation

My assistant has a pipeline: a Planner figures out what to do, an Executor (Worker) does the work using tools, and a Synthesizer crafts the final response. One Worker agent handles 80+ tools — calendar, email, file operations, web search, code execution, you name it.

Note: The reason I implemented this Planner → Worker → Synthesizer system is to keep the Worker's context clean of unnecessary noise. Bloating an LLM's context with things it doesn't need created massive hallucination problems, specifically around tool calls. This architecture keeps the Worker's context clean and has eliminated tool-call hallucinations. That topic deserves its own article, which I'll publish soon, so stay tuned!
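To make the three stages concrete, here is a minimal, hypothetical sketch of the pipeline shape. The method names and the plan/result payloads are stand-ins for real LLM calls; the key point it illustrates is that conversation history stays at the orchestrator level and never reaches the Worker.

```python
from dataclasses import dataclass, field

@dataclass
class Pipeline:
    history: list = field(default_factory=list)  # lives at the orchestrator, not the Worker

    def plan(self, request: str) -> dict:
        # The Planner decides what to do and which tools are allowed.
        return {"task": request, "tools": ["calendar.read"]}

    def execute(self, plan: dict) -> str:
        # The Worker sees only the plan and its approved tools,
        # never the raw conversation history.
        return f"result for {plan['task']!r} via {plan['tools']}"

    def synthesize(self, request: str, result: str) -> str:
        # The Synthesizer crafts the user-facing response.
        return f"Here is what I found: {result}"

    def run(self, request: str) -> str:
        self.history.append(request)
        plan = self.plan(request)
        result = self.execute(plan)
        return self.synthesize(request, result)
```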

It works. For most things, it works really well. But "most things" isn't good enough when you're trying to build something reliable.

The Problem

I asked my assistant to review my resume. It read the file, analyzed it, and told me to change "Jan 2026 to Present" because it was inaccurate. Its suggestion? Change it to "Jan 2026 to Present." The exact same text. The model misread the content, hallucinated an issue, and confidently recommended changing something to what it already was.

This wasn't a one-off. Every time I asked it to do something that required precision - reviewing a document, comparing two files, analyzing data - the same class of errors showed up. The agent was treating "review my resume" with the same configuration as "what's the weather." Same model, same temperature, same level of attention to detail.

Think about that for a second. In what world would you assign the same person, with the same instructions, to both check the weather and audit a legal document?

The Insight

When I dug into why this was happening, I found three root causes:

  1. Wrong model for the job: The Worker used a fast, cheap model (Haiku) for everything. Great for "create a todo," terrible for "find inconsistencies in this 3-page document."
  2. Context degradation: My pipeline has three stages. By the time the Synthesizer - the agent that actually writes the response - gets the document content, it's been summarized and truncated. The agent writing the feedback has never seen the exact text it's commenting on.
  3. No task-appropriate behavior: There was no mechanism to say "for this specific task, be more careful, quote the source text before commenting, and use a smarter model."

The fix seemed obvious: specialize. But how you specialize matters enormously.

Two Approaches to Specialization

I landed on a two-tier system, and I think this framework applies to almost any AI product that uses agents.

Tier 1 — Specialized Profiles (Configuration-driven)

Same agent, different settings. Think of it like giving the same employee different checklists for different tasks. The Worker agent's execution loop - call tools, evaluate results, decide next step - works fine for most tasks. What needed to change was the configuration: which model to use, how creative to be, how much context to preserve, and what domain-specific instructions to follow.

So I created profiles. A "document review" profile uses a smarter model at low temperature with instructions to quote text before critiquing it. An "email" profile uses a fast model with instructions about threading and batch operations. A "data ops" profile uses minimal context — when you create a todo, the Synthesizer doesn't need the full tool response, just a confirmation.

Ten profiles. Same Worker code. Each one tuned for its task category. This covers roughly 80% of requests. The execution pattern is identical — the agent calls tools in a loop. Only the configuration changes.
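A profile can be as simple as a frozen config object keyed by task category. This sketch uses illustrative model names, temperatures, and instructions, not my actual values:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    model: str
    temperature: float
    fidelity: str      # "verbatim" | "standard" | "compact"
    instructions: str

# Three of the ten profiles, with placeholder settings.
PROFILES = {
    "document_review": Profile(
        model="smart-model", temperature=0.1, fidelity="verbatim",
        instructions="Quote the exact source text before critiquing it."),
    "email": Profile(
        model="fast-model", temperature=0.7, fidelity="standard",
        instructions="Respect threading; batch related operations."),
    "data_ops": Profile(
        model="fast-model", temperature=0.0, fidelity="compact",
        instructions="Confirm the operation; do not echo raw tool output."),
}

def profile_for(task_category: str) -> Profile:
    # Unknown categories fall back to a generic default.
    return PROFILES.get(task_category, Profile("fast-model", 0.3, "standard", ""))
```

The Worker's loop never changes; it just receives a different `Profile` per request.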

Tier 2 — Specialized Agents (Code-driven)

Different agent, different execution loop entirely. This is when the task fundamentally cannot be handled by the generic "call tools and evaluate" pattern.

Take data analysis. The generic Worker would read a CSV file and try to analyze it in one shot — sending raw data to the LLM and hoping for insights. That's not how data analysis works. You need to load the data into pandas, explore the structure, run computations, notice something interesting, drill deeper, maybe create a visualization, iterate. That's a fundamentally different execution pattern — not just different settings.

So I built a DataAnalysisAgent with its own tool-call loop. It starts a Python REPL session, executes code iteratively, reads outputs, decides what to investigate next, and can run up to 30 iterations of exploration before producing its final report.
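The shape of that loop, stripped to its essentials, looks something like this. It's a hypothetical sketch: `steps` stands in for the LLM deciding what code to run next, and the persistent namespace plays the role of the REPL session that survives across iterations.

```python
import contextlib
import io

MAX_ITERATIONS = 30  # exploration cap before the final report

class DataAnalysisAgent:
    def __init__(self):
        self.namespace = {}    # persistent REPL state shared across steps
        self.transcript = []   # outputs the agent reads to decide what's next

    def run_code(self, code: str) -> str:
        # Execute one snippet in the shared session and capture its stdout.
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, self.namespace)
        return buf.getvalue()

    def analyze(self, steps):
        for i, code in enumerate(steps):
            if i >= MAX_ITERATIONS:
                break
            self.transcript.append(self.run_code(code))
        return self.transcript
```

Because the namespace persists, a variable loaded in step one is still available in step twenty, exactly like a notebook session.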

Same for document review. The generic Worker reads a file and analyzes everything at once. A dedicated DocumentReviewAgent reads the file, enforces a quote-first protocol (you must quote the exact text before commenting on it), and structures feedback by category. The execution pattern is different — not just the configuration.
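The quote-first protocol is also mechanically checkable. A minimal sketch of that validation, with hypothetical names, is below; the second check is exactly the one that would have caught the "Jan 2026 to Present" bug:

```python
def validate_feedback(document: str, quote: str, suggestion: str) -> list[str]:
    """Return a list of protocol violations (empty means the feedback passes)."""
    problems = []
    # Rule 1: the quoted text must appear verbatim in the source document.
    if quote not in document:
        problems.append("quote not found verbatim in the document")
    # Rule 2: a suggested replacement must actually differ from the original.
    if suggestion.strip() == quote.strip():
        problems.append("suggestion is identical to the quoted text")
    return problems
```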

The Decision Framework

Here's how I decide which tier a task belongs to:

Can the generic tool-call loop handle this if I just change the settings? → Profile.

  • Checking the weather with a fast model → Profile
  • Drafting an email with a creative temperature → Profile
  • Creating a todo with minimal confirmation context → Profile

Does this task need a fundamentally different execution pattern? → Specialized Agent.

  • Iterative data analysis with REPL computation → Agent
  • Section-by-section document review with structured output → Agent
  • Side-by-side comparison requiring both texts in context simultaneously → Agent
  • Multi-source research with parallel sub-agents and verification → Agent

The key question is about the loop, not the content. If the task fits "call tool → evaluate → repeat," it's a profile. If the task needs its own choreography, it's an agent.
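In code, the framework reduces to a small routing step ahead of execution. The category names here are illustrative:

```python
# Task categories whose loop can't be the generic one (hypothetical names).
SPECIALIZED_AGENTS = {"data_analysis", "document_review", "comparison", "research"}
# Categories the generic loop handles with tuned settings.
PROFILE_TASKS = {"weather", "email", "todo"}

def route(task_category: str) -> tuple[str, str]:
    """Return ("agent" | "profile", name) for a classified task."""
    if task_category in SPECIALIZED_AGENTS:
        return ("agent", task_category)    # needs its own choreography
    if task_category in PROFILE_TASKS:
        return ("profile", task_category)  # generic loop, tuned configuration
    return ("profile", "default")          # generic loop, generic configuration
```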

The Glue: Context Fidelity

One thing I haven't seen discussed much in the AI architecture space is what I call context fidelity — how much of the raw tool output should reach the final response generator.

When you review a document, the Synthesizer needs the full verbatim text to generate accurate feedback. When you create a todo, the Synthesizer just needs "todo created: Buy groceries." Sending the full tool response for a todo creation is wasteful. Sending a truncated document for a review is destructive.

So each profile and agent declares its fidelity level:

  • Verbatim: No truncation. The Synthesizer sees exactly what the executor saw. Used for document review, code review, comparisons.
  • Standard: Reasonable truncation — the previous default. Used for most tasks.
  • Compact: Aggressive summarization. Just confirmations. Used for CRUD operations.
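Applying the declared level is then a one-function concern between the executor and the Synthesizer. The character budgets below are placeholders, and real "compact" mode would summarize rather than slice, but the sketch shows the three behaviors:

```python
def apply_fidelity(tool_output: str, level: str) -> str:
    if level == "verbatim":
        # No truncation: the Synthesizer sees exactly what the executor saw.
        return tool_output
    if level == "compact":
        # Keep only the confirmation line, e.g. "todo created: Buy groceries".
        first_line = (tool_output.splitlines() or [""])[0]
        return first_line[:80]
    # "standard": bounded but generous truncation (placeholder budget).
    return tool_output[:2000]
```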

This single setting fixed the "telephone game" problem. The resume review now works because the Synthesizer actually sees the resume text, not a summary of it.

How This May Apply to Your AI Product

If you're building anything with AI agents — whether it's a customer service bot, an internal tool, or a product feature — you'll eventually hit this same wall. Your one-size-fits-all agent will fail on tasks that need precision, or it'll be overkill for tasks that need speed.

Here's what I'd suggest:

  1. Start with one agent. Don't over-engineer. Get the generic loop working well.
  2. When failures cluster around a task type, create a profile first. Nine times out of ten, the execution pattern is fine — you just need different settings. This is cheap and fast to implement.
  3. When a profile isn't enough — when the task needs a different choreography — build a specialized agent. But make it extend a common base class so your orchestrator doesn't need to know the difference.
  4. Control context fidelity. This is the one most people miss. Your response generator's quality is bounded by the quality of what it receives. If you're truncating everything uniformly, you're silently degrading your precision tasks.
  5. Keep your executor blind. The Worker should only see its instructions and approved tools. Session state, conversation history, user preferences — that stays at the orchestrator level. The moment your executor starts seeing everything, it starts hallucinating based on context that has nothing to do with the current task.
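Point 3 — the common base class — is what keeps the orchestrator simple. A minimal sketch, with illustrative class names, of how a generic worker and a specialized agent present the same interface:

```python
from abc import ABC, abstractmethod

class BaseAgent(ABC):
    fidelity = "standard"  # each subclass declares its own level

    @abstractmethod
    def run(self, task: str) -> str: ...

class GenericWorker(BaseAgent):
    def run(self, task: str) -> str:
        return f"generic tool loop handled: {task}"

class DocumentReviewAgent(BaseAgent):
    fidelity = "verbatim"  # reviews need the full text downstream

    def run(self, task: str) -> str:
        return f"quote-first review of: {task}"

def orchestrate(agent: BaseAgent, task: str) -> str:
    # The orchestrator dispatches uniformly; it never inspects the concrete type.
    return agent.run(task)
```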

The resume review works now. The model quotes the exact text, verifies its suggestion is actually different from the original, and organizes feedback by category. Not because I switched to a better model — but because the architecture now gives the right configuration to the right task.

That's the thing about AI products. The model is rarely the bottleneck. The architecture is.

This is the first article in a series about the architectural decisions behind building reliable AI applications. Originally published on LinkedIn.

Next in the series: Planner, Worker, Synthesizer: The 3-Agent Pattern That Actually Works. Also worth reading: The Guardrails Problem.