
Designing Effective Multi-Agent Architectures



Papers on agentic and multi-agent systems (MAS) skyrocketed from 820 in 2024 to over 2,500 in 2025. This surge suggests that MAS are now a primary focus for the world’s top research labs and universities. Yet there is a disconnect: While research is booming, these systems still frequently fail when they hit production. Most teams instinctively try to fix these failures with better prompts. I use the term prompting fallacy to describe the belief that model and prompt tweaks alone can fix systemic coordination failures. You can’t prompt your way out of a system-level failure. If your agents are consistently underperforming, the issue likely isn’t the wording of the instruction; it’s the architecture of the collaboration.

Beyond the Prompting Fallacy: Common Collaboration Patterns

Some coordination patterns stabilize systems. Others amplify failure. There is no universal best pattern, only patterns that fit the task and the way information needs to flow. The following provides a quick orientation to common collaboration patterns and when they tend to work well.

Supervisor-based architecture

A linear, supervisor-based architecture is the most common starting point. One central agent plans, delegates work, and decides when the task is done. This setup can be effective for tightly scoped, sequential reasoning problems, such as financial analysis, compliance checks, or step-by-step decision pipelines. The strength of this pattern is control. The weakness is that every decision becomes a bottleneck. As soon as tasks become exploratory or creative, that same supervisor often becomes the point of failure. Latency increases. Context windows fill up. The system starts to overthink simple decisions because everything must pass through a single cognitive bottleneck.
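
To make the shape of this pattern concrete, here is a minimal, framework-agnostic sketch in Python. The Supervisor class, the worker names, and the plan() step are hypothetical placeholders rather than any specific framework’s API; a real system would back plan() and the workers with LLM calls. The point is that every step routes through one agent that holds all of the history.

from dataclasses import dataclass, field

@dataclass
class Supervisor:
    workers: dict                      # name -> callable(step) -> result
    history: list = field(default_factory=list)

    def plan(self, task):
        # Placeholder: in practice an LLM call that decomposes the task into steps.
        return [f"analyze: {task}", f"summarize: {task}"]

    def run(self, task):
        for step in self.plan(task):
            name = "analyst" if step.startswith("analyze") else "writer"
            result = self.workers[name](step)   # delegate one step at a time
            self.history.append(result)         # everything flows through the supervisor
        return self.history

# Toy workers standing in for LLM-backed agents.
supervisor = Supervisor(workers={
    "analyst": lambda step: f"[analysis of] {step}",
    "writer":  lambda step: f"[summary of] {step}",
})
print(supervisor.run("Q3 compliance report"))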

Blackboard-style architecture

In creative settings, a blackboard-style architecture with shared memory often works better. Instead of routing every thought through a manager, multiple specialists contribute partial solutions into a shared workspace. Other agents critique, refine, or build on those contributions. The system improves through accumulation rather than command. This mirrors how real creative teams work: Ideas are externalized, challenged, and iterated on collectively.
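
A minimal sketch of the idea, with toy functions standing in for LLM-backed specialists. The Blackboard class and the propose/critique/refine agents are hypothetical, not a particular library’s API; what matters is that progress accumulates in shared state rather than flowing through a manager.

# Shared workspace that specialists post to and read from.
class Blackboard:
    def __init__(self):
        self.entries = []

    def post(self, author, kind, content):
        self.entries.append({"author": author, "kind": kind, "content": content})

    def read(self, kind=None):
        return [e for e in self.entries if kind is None or e["kind"] == kind]

# Toy specialists standing in for LLM-backed agents.
def propose(board):
    board.post("ideator", "idea", "Open the piece with a production failure story")

def critique(board):
    for idea in board.read("idea"):
        board.post("critic", "critique", f"Risk: '{idea['content']}' may read as clickbait")

def refine(board):
    if board.read("critique"):
        board.post("editor", "revision", "Keep the failure story, but anonymize the company")

board = Blackboard()
for agent in (propose, critique, refine):   # the system improves by accumulation
    agent(board)
print(board.read())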

Peer-to-peer collaboration

In peer-to-peer collaboration, agents exchange information directly without a central controller. This can work well for dynamic tasks like web navigation, exploration, or multistep discovery, where the goal is to cover ground rather than converge quickly. The risk is drift. Without some form of aggregation or validation, the system can fragment or loop. In practice, this peer-to-peer style often shows up as swarms.
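
Here is a minimal sketch of direct peer exchange, with hypothetical Peer objects standing in for LLM-backed agents. The hop limit is the kind of guard that keeps this pattern from looping or fragmenting indefinitely.

# Agents exchange findings directly; a hop limit guards against endless loops.
class Peer:
    def __init__(self, name, explore):
        self.name = name
        self.explore = explore     # stands in for an LLM-backed action
        self.neighbors = []
        self.inbox = []

    def receive(self, finding, hops_left):
        self.inbox.append(finding)
        if hops_left > 0:
            follow_up = self.explore(finding)
            for peer in self.neighbors:
                peer.receive(follow_up, hops_left - 1)   # direct exchange, no hub

navigator = Peer("navigator", lambda f: f"links found while checking: {f}")
extractor = Peer("extractor", lambda f: f"facts pulled from: {f}")
navigator.neighbors, extractor.neighbors = [extractor], [navigator]

navigator.receive("seed query: agent scaling laws", hops_left=3)
print(extractor.inbox)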

Swarms architecture

Swarms work well in tasks like web research because the goal is coverage, not immediate convergence. Multiple agents explore sources in parallel, follow different leads, and surface findings independently. Redundancy is not a bug here; it’s a feature. Overlap helps validate signals, while divergence helps avoid blind spots. In creative writing, swarms are also effective. One agent proposes narrative directions, another experiments with tone, a third rewrites structure, and a fourth critiques clarity. Ideas collide, merge, and evolve. The system behaves less like a pipeline and more like a writers’ room.

The key risk with swarms is that they generate volume faster than they generate decisions, which can also lead to token burn in production. Consider strict exit conditions to prevent exploding costs. Also, without a later aggregation step, swarms can drift, loop, or overwhelm downstream components. That’s why they work best when paired with a concrete consolidation phase, not as a standalone pattern.
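
Here is a minimal sketch of a swarm with strict exit conditions and a consolidation phase. The explorer functions, the word-count stand-in for token counting, and consolidate() are all hypothetical placeholders; a production system would plug in real agents, a real token counter, and an LLM-backed aggregator.

from concurrent.futures import ThreadPoolExecutor

def run_swarm(explorers, query, max_rounds=2, token_budget=2_000):
    findings, spent = [], 0
    for _ in range(max_rounds):                      # exit condition 1: round cap
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda explore: explore(query), explorers))
        for text in results:
            spent += len(text.split())               # crude stand-in for token counting
            findings.append(text)
        if spent >= token_budget:                    # exit condition 2: cost cap
            break
    return consolidate(findings)

def consolidate(findings):
    # Placeholder aggregation: dedupe and keep the three longest findings.
    return sorted(set(findings), key=len, reverse=True)[:3]

# Toy explorers standing in for LLM-backed research agents.
explorers = [
    lambda q: f"source A on {q}",
    lambda q: f"source B on {q} with more detail",
    lambda q: f"source A on {q}",                    # overlap helps validate the signal
]
print(run_swarm(explorers, "multi-agent failure modes"))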

Considering all of this, many production systems benefit from hybrid patterns. A small number of fast specialists operate in parallel, while a slower, more deliberate agent periodically aggregates results, checks assumptions, and decides whether the system should continue or stop. This balances throughput with stability and keeps errors from compounding unchecked. I teach this agents-as-teams mindset throughout AI Agents: The Definitive Guide because most production failures are coordination problems long before they are model problems.
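
A minimal sketch of that hybrid loop, again with hypothetical stand-ins: fast specialists run in parallel, and a slower reviewer aggregates after each cycle and decides whether the system continues or stops.

from concurrent.futures import ThreadPoolExecutor

def hybrid_loop(specialists, reviewer, task, max_cycles=5):
    state = {"task": task, "notes": []}
    for _ in range(max_cycles):
        with ThreadPoolExecutor() as pool:               # fast, parallel phase
            state["notes"] += list(pool.map(lambda agent: agent(state), specialists))
        verdict = reviewer(state)                        # slow, deliberate phase
        if verdict["stop"]:
            return verdict["answer"]
    return reviewer(state)["answer"]                     # budget exhausted: best effort

# Toy stand-ins for LLM-backed agents.
specialists = [
    lambda st: f"draft for {st['task']}",
    lambda st: f"fact check on {st['task']}",
]

def reviewer(state):
    done = len(state["notes"]) >= 4                      # toy stopping rule
    return {"stop": done, "answer": " | ".join(state["notes"][-2:])}

print(hybrid_loop(specialists, reviewer, "release notes for v2.1"))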

If you think more deeply about this team analogy, you quickly realize that creative teams don’t run like research labs. They don’t route every thought through a single manager. They iterate, discuss, critique, and converge. Research labs, on the other hand, don’t operate like creative studios. They prioritize reproducibility, controlled assumptions, and tightly scoped analysis. They benefit from structure, not freeform brainstorming loops. This is why it’s no surprise when systems fail: If you apply one default agent topology to every problem, the system can’t perform at its full potential. Most failures attributed to “bad prompts” are actually mismatches between task, coordination pattern, information flow, and model architecture.


Breaking the Loop: “Hiring” Your Agents the Right Way

I design AI agents the same way I think about building a team. Each agent has a skill profile, strengths, blind spots, and an appropriate role. The system only works when these skills compound rather than interfere. A strong model placed in the wrong role behaves like a highly skilled hire assigned to the wrong job. It doesn’t merely underperform; it actively introduces friction. In my mental model, I categorize models by their architectural personality. The following is a high-level overview.

Decoder-only (the generators and planners): These are your standard LLMs like GPT or Claude. They are your talkers and coders, strong at drafting and step-by-step planning. Use them for execution: writing, coding, and producing candidate solutions.

Encoder-only (the analysts and investigators): Models like BERT and its modern successors such as ModernBERT and NeoBERT do not talk; they understand. They build contextual embeddings and are excellent at semantic search, filtering, and relevance scoring. Use them to rank, verify, and narrow the search space before your expensive generator even wakes up (a minimal sketch follows this list).

Mixture of experts (the specialists): MoE models behave like a set of internal specialist departments, where a router activates only a subset of experts per token. Use them when you need high capability but want to spend compute selectively.

Reasoning models (the thinkers): These are models optimized to spend more compute at test time. They pause, reflect, and check their own reasoning. They’re slower, but they often prevent expensive downstream mistakes.
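
To illustrate the division of labor between encoders and generators described above, here is a minimal rank-then-generate sketch. The embed() function is a bag-of-words placeholder (a real system would call an encoder-only model such as ModernBERT here), and the generate lambda stands in for a decoder-only LLM call; neither is a specific library’s API.

import math
from collections import Counter

def embed(text):
    # Placeholder embedding: bag-of-words counts. A real system would call an
    # encoder-only model (ModernBERT, NeoBERT, etc.) here.
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[token] * b[token] for token in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def rank_then_generate(query, documents, generate, top_k=2):
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    context = ranked[:top_k]       # only the most relevant docs reach the generator
    return generate(query, context)

docs = [
    "Supervisor agents route every decision through one bottleneck.",
    "Blackboard systems accumulate partial solutions in shared memory.",
    "Swarms explore in parallel and need a consolidation phase.",
]
# The lambda stands in for a decoder-only LLM call.
print(rank_then_generate(
    "Why do swarms need aggregation?",
    docs,
    generate=lambda q, ctx: f"(generator sees {len(ctx)} docs) {ctx[0]}",
))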

So if you find yourself writing a 2,000-word prompt to make a fast generator act like a thinker, you’ve made a bad hire. You don’t need a better prompt; you need a different architecture and better system-level scaling.

Designing Digital Organizations: The Science of Scaling Agentic Systems

Neural scaling [1] is continuous and works well for models. As shown by classic scaling laws, increasing parameter count, data, and compute tends to result in predictable improvements in capability. This logic holds for single models. Collaborative scaling [2], the kind you need in agentic systems, is different. It’s conditional. It grows, plateaus, and sometimes collapses depending on communication costs, memory constraints, and how much context each agent actually sees. Adding agents doesn’t behave like adding parameters.
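
For concreteness, the neural scaling laws referenced here take a simple power-law form. The line below quotes the parameter-scaling fit reported by Kaplan et al. [1] in LaTeX notation; the constants are their approximate empirical estimates, not exact values.

L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}

Collaborative scaling has no comparably clean, task-independent fit; as the next paragraphs argue, its behavior depends on topology and communication structure, not just agent count.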

This is why topology matters. Chains, trees, and other coordination structures behave very differently under load. Some topologies stabilize reasoning as systems grow. Others amplify noise, latency, and error. These observations align with early work on collaborative scaling in multi-agent systems, which shows that performance does not increase monotonically with agent count.

Recent work from Google Research and Google DeepMind [3] makes this distinction explicit. The difference between a system that improves with every loop and one that falls apart is not the number of agents or the size of the model. It’s how the system is wired. As the number of agents increases, so does the coordination tax: Communication overhead grows, latency spikes, and context windows blow up. In addition, when too many entities attempt to solve the same problem without clear structure, the system begins to interfere with itself. The coordination structure, the flow of information, and the topology of decision-making determine whether a system amplifies capability or amplifies error.

The System-Level Takeaway

If your multi-agent system is failing, thinking like a model practitioner is no longer enough. Stop reaching for the prompt. The surge in agentic research has made one truth undeniable: The field is moving from prompt engineering to organizational systems. The next time you design your agentic system, ask yourself:

  • How do I organize the team? (patterns) 
  • Who do I put in those slots? (hiring/architecture) 
  • Why could this fail at scale? (scaling laws)

Ultimately, the winners in the agentic era won’t be those with the smartest instructions but the ones who build the most resilient collaboration structures. Agentic performance is an architectural outcome, not a prompting problem.


References

  1. Jared Kaplan et al., “Scaling Laws for Neural Language Models,” (2020): https://arxiv.org/abs/2001.08361.
  2. Chen Qian et al., “Scaling Large Language Model-based Multi-Agent Collaboration,” (2025): https://arxiv.org/abs/2406.07155.
  3. Yubin Kim et al., “Towards a Science of Scaling Agent Systems,” (2025): https://arxiv.org/abs/2512.08296.
