Talk: Architecting Multi-Agent Developer Workflows

Felipe Hlibco

I gave a talk recently on multi-agent developer workflows. The prep forced me to organize a lot of scattered thinking into something presentable, and honestly that exercise alone justified the talk. This post distills the core arguments and architectural patterns I covered — not a transcript, more like the cleaned-up version of my speaker notes plus some refinements I’ve made since.

The punchline, for those who want it upfront: multi-agent systems work when the problem genuinely decomposes into specialized roles. But the instinct to split everything into agents? Same instinct that over-decomposed monoliths into too many microservices circa 2016. The pattern helps when it helps; it hurts when applied out of habit.

The Surge #

Gartner reported a 1,445% surge in inquiries about multi-agent systems from Q1 2024 to Q2 2025. Not a typo. Fourteen hundred percent.

I’ve seen this kind of spike before. Microservices around 2015-2016. Kubernetes in 2018-2019. The arc follows the same shape every time: a genuinely useful architectural concept gets discovered by the broader market, hype peaks, everyone tries applying it everywhere, most implementations turn out premature, and eventually the industry settles on sensible use cases. We went through this at TaskRabbit when we migrated from a monolith — the temptation to split into dozens of services was real, and we had to actively resist it.

Right now we’re somewhere between “everyone tries to apply it everywhere” and “most implementations are premature.” Good moment to talk patterns and anti-patterns.

Four Core Patterns #

After looking at dozens of multi-agent implementations (production systems, not conference demos), I’ve landed on four architectural patterns that cover the vast majority of useful designs.

1. Sequential Pipeline #

Simplest pattern. Agent A produces output, Agent B consumes it, Agent C takes B’s output. Deterministic handoffs, no branching. Think Unix pipes for AI agents.

When it works: Workflows with clear phase boundaries. Code generation followed by testing followed by documentation. Data extraction, then transformation, then validation. Each phase demands a different competency.

When it breaks: When phases aren’t actually independent. If Agent B needs to send feedback to Agent A — which happens constantly in real workflows — you’ve introduced a cycle that sequential flow can’t express. Now you need a loop or a coordinator.

Google’s ADK ships a SequentialAgent for exactly this. Deterministic flow control without LLM overhead on the orchestration layer; the agents think, the framework routes.
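The pattern is simple enough to sketch without any framework. Here’s a framework-agnostic toy in plain Python — each “agent” is a plain function standing in for an LLM call, and `run_pipeline` plays the role ADK’s SequentialAgent plays (all names here are illustrative, not ADK’s API):

```python
# Framework-agnostic sketch of the sequential pipeline pattern.
# Each "agent" is a plain function; in a real system each would wrap
# an LLM call with its own system prompt and tools.

def generate_code(spec: str) -> str:
    # Stand-in for a code-generation agent.
    return f"def solution():  # implements: {spec}"

def write_tests(code: str) -> str:
    # Stand-in for a test-writing agent consuming the generator's output.
    return f"{code}\n\ndef test_solution(): ..."

def document(code_with_tests: str) -> str:
    # Stand-in for a documentation agent.
    return f'"""Auto-generated docs."""\n{code_with_tests}'

def run_pipeline(spec: str) -> str:
    # Deterministic handoffs: A -> B -> C. No branching, and no LLM
    # spent on orchestration -- the routing is just a loop.
    stages = [generate_code, write_tests, document]
    artifact = spec
    for stage in stages:
        artifact = stage(artifact)
    return artifact
```

Note what’s absent: no coordinator, no state store, no inter-agent negotiation. That’s the appeal — and also why the pattern collapses the moment a downstream stage needs to talk back upstream.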

2. Generator-Critic #

Two agents in a loop: one generates, one evaluates. The generator produces output; the critic assesses quality, flags issues, sends feedback. Generator revises. Loop continues until the critic signs off or you hit a max iteration count.

When it works: Any task where quality assessment costs less than quality production. Writing (draft, review, revise). Code generation (write, test, fix failures). Data cleaning (propose corrections, validate against constraints, refine).

When it breaks: When the critic can’t articulate actionable feedback. If criticism amounts to “not good enough” without saying why, the generator has no gradient to follow. You burn tokens in a loop that never converges.

This pattern sits closest to how I actually work. Nobody writes perfect code on the first pass; you write, test, fix, repeat. Generator-critic automates that cycle. I used a variant of this for the humanization pipeline in my book project — a writer agent drafts, a detector scores, feedback loops back. Took a few iterations to get the feedback actionable enough, which only reinforced my point about the critic needing specificity.
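A minimal sketch of the loop, with stand-in functions in place of LLM calls (all names hypothetical). The key design point from above is baked in: the critic returns *named* failures, not a bare pass/fail, so the generator has something to act on:

```python
# Minimal generator-critic loop. The critic returns actionable feedback
# (here: which requirements are missing) -- without that specificity,
# the loop has no gradient to follow and just burns iterations.

def generate(draft: str, feedback: list[str]) -> str:
    # Stand-in generator: "revise" by addressing each named gap.
    return draft + "".join(f" {item}" for item in feedback)

def critique(draft: str, required: list[str]) -> list[str]:
    # Stand-in critic: names every missing requirement rather than
    # returning an unactionable "not good enough".
    return [term for term in required if term not in draft]

def generator_critic(task: str, required: list[str], max_iters: int = 5) -> str:
    draft = task
    for _ in range(max_iters):
        feedback = critique(draft, required)
        if not feedback:      # critic signs off
            break
        draft = generate(draft, feedback)
    return draft              # best effort if max_iters is exhausted
```

The `max_iters` cap matters in practice: it’s the budget that stops a non-converging critic from burning tokens forever.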

3. Coordinator (Hub-and-Spoke) #

A central coordinator agent receives tasks, analyzes them, routes subtasks to specialists. Coordinator aggregates results, handles conflicts.

When it works: Complex tasks where subtasks require genuinely different expertise. A customer support system where billing questions go to the billing agent, technical issues go to troubleshooting, account changes go to account management. The coordinator handles classification and routing.

When it breaks: Two failure modes. First, the coordinator becomes a bottleneck: every subtask routes through a single LLM call, adding latency and creating a single point of failure. Second — and this one bites harder — the taxonomy of specialist agents turns out to be unclear. Tasks don’t cleanly decompose into specialist domains, and the coordinator spends most of its time deciding who should handle ambiguous requests. I’ve seen teams burn weeks designing agent taxonomies that looked clean on a whiteboard but fell apart the moment real user queries hit the system.
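A toy version of the hub-and-spoke shape, using the support-system example above. The keyword classifier stands in for the coordinator’s LLM call; specialist names are illustrative:

```python
# Hub-and-spoke sketch: a coordinator classifies each request and routes
# it to a specialist. In a real system, classify() is an LLM call -- and
# it's exactly where ambiguous queries cause trouble.

def billing_agent(query: str) -> str:
    return f"billing: {query}"

def troubleshooting_agent(query: str) -> str:
    return f"troubleshooting: {query}"

def account_agent(query: str) -> str:
    return f"account: {query}"

SPECIALISTS = {
    "billing": billing_agent,
    "technical": troubleshooting_agent,
    "account": account_agent,
}

def classify(query: str) -> str:
    # Stand-in for the coordinator's classification step. Note the
    # implicit taxonomy assumption: every query fits exactly one bucket.
    if "invoice" in query or "charge" in query:
        return "billing"
    if "error" in query or "crash" in query:
        return "technical"
    return "account"

def coordinate(query: str) -> str:
    return SPECIALISTS[classify(query)](query)
```

Even this toy exposes the second failure mode: the fallthrough to `account` is doing a lot of quiet work, and real taxonomies accumulate exactly these catch-all buckets.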

4. Scatter-Gather (Parallel Fan-Out) #

Multiple agents receive the same input simultaneously, each processing independently. Results get gathered and synthesized by a coordinator or merge function.

When it works: Tasks that benefit from diverse perspectives. Research synthesis: send the same question to agents with different knowledge bases, merge results. Testing: run code against different test suites in parallel. Analysis: multiple agents assess the same dataset with different methodologies.

When it breaks: When the merge step proves as hard as the original problem. If synthesizing five analyses into a coherent conclusion requires as much intelligence as producing any single one, you haven’t saved anything. You just moved the hard work downstream.

Google ADK’s ParallelAgent handles fan-out. Wrap the gather step in a SequentialAgent for the full pattern.
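The same shape in plain Python, with a thread pool standing in for the fan-out and a deliberately trivial merge (names are illustrative). Note that the merge here is just concatenation — the warning above is that in real systems this step can carry most of the difficulty:

```python
# Scatter-gather sketch: the same input goes to every agent in parallel,
# then results are merged. The merge here is trivial (join); real systems
# often need genuine synthesis at this step.

from concurrent.futures import ThreadPoolExecutor

def make_analyst(methodology: str):
    def analyst(question: str) -> str:
        # Stand-in for an agent applying one methodology.
        return f"[{methodology}] finding on: {question}"
    return analyst

def scatter_gather(question: str, methodologies: list[str]) -> str:
    analysts = [make_analyst(m) for m in methodologies]
    with ThreadPoolExecutor() as pool:
        # Scatter: identical input to every agent, processed independently.
        results = list(pool.map(lambda agent: agent(question), analysts))
    # Gather: merge independent results into one artifact.
    return "\n".join(results)
```

With real LLM agents the fan-out would be async API calls rather than threads, but the structure — identical input, independent processing, single merge point — is the same.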

The Protocol Layer: MCP and A2A #

Two protocols have emerged to tackle the plumbing problem in multi-agent systems. They’re complementary, not competing.

MCP (Model Context Protocol) #

Anthropic introduced MCP in November 2024; it has since moved to the Linux Foundation. Core idea: standardize how AI agents discover and consume tools. Instead of every framework inventing its own tool integration format, MCP provides a common protocol — tool servers advertise capabilities, agent clients consume them.

The adoption numbers tell the story: 97 million+ monthly SDK downloads as of early 2026. That’s not experimental anymore. That’s infrastructure.

For developer workflows, MCP means agents connect to any compatible tool server (your IDE, CI/CD system, database, monitoring platform) through one integration pattern. Write the MCP client once; connect to everything.
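To make the discover-and-consume idea concrete, here’s a toy model of the pattern in plain Python. This is *not* the MCP SDK — the class and method names are invented for illustration — but it captures the contract: servers advertise tool metadata, clients discover and invoke through one generic interface instead of bespoke per-tool glue:

```python
# Toy model of the MCP idea (NOT the real SDK): a tool server advertises
# capabilities; a client discovers and invokes them generically.

class ToolServer:
    def __init__(self):
        self._tools = {}

    def tool(self, name: str, description: str):
        # Register a callable along with its advertised metadata.
        def register(fn):
            self._tools[name] = {"description": description, "fn": fn}
            return fn
        return register

    def list_tools(self) -> dict:
        # Discovery: clients see names and descriptions, never internals.
        return {name: meta["description"] for name, meta in self._tools.items()}

    def call(self, name: str, **kwargs):
        # Invocation through one uniform entry point.
        return self._tools[name]["fn"](**kwargs)

server = ToolServer()

@server.tool("run_tests", "Run the project's test suite")
def run_tests(path: str) -> str:
    return f"ran tests under {path}: all passing"
```

The real protocol adds transports, schemas, and capability negotiation on top, but the client-side payoff is the one sketched here: a client that speaks the protocol can enumerate and call any server’s tools without per-tool integration code.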

A2A (Agent-to-Agent Protocol) #

Google introduced A2A in April 2025, now also under the Linux Foundation with 150+ supporting organizations. Where MCP standardizes agent-to-tool communication, A2A standardizes agent-to-agent communication. It defines how agents discover each other, negotiate capabilities, exchange messages, manage shared state.

Why this matters: without A2A (or something equivalent), every multi-agent system invents its own inter-agent communication format. The coordinator pattern needs to know what each specialist can do and how to talk to it. With A2A, specialists advertise capabilities through “Agent Cards,” and coordinators discover them via a standardized registry.
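Here’s a sketch of what that discovery flow looks like: an Agent Card describing a specialist, plus a toy registry lookup a coordinator might run. The field names below only approximate the A2A card shape — treat the exact schema as an assumption and check the published spec before relying on it; the endpoint URL is hypothetical:

```python
# Illustrative Agent Card plus a toy registry lookup. Field names
# approximate the A2A card shape; the exact schema is an assumption here.

billing_card = {
    "name": "billing-agent",
    "description": "Handles invoices, refunds, and payment disputes",
    "url": "https://agents.example.com/billing",  # hypothetical endpoint
    "capabilities": {"streaming": True},
    "skills": [
        {"id": "refund", "description": "Process refund requests"},
        {"id": "invoice-lookup", "description": "Retrieve past invoices"},
    ],
}

REGISTRY = [billing_card]

def find_agents(skill_id: str) -> list[str]:
    # Coordinator-side discovery: which registered agents claim this skill?
    return [
        card["name"]
        for card in REGISTRY
        if any(skill["id"] == skill_id for skill in card["skills"])
    ]
```

The point of the card is that the coordinator never hardcodes what the billing agent can do — it asks the registry, and specialists can be added or upgraded without touching coordinator code.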

The protocols split neatly: MCP covers the vertical axis (agent to tools), A2A covers the horizontal axis (agent to agent). Together they form the interoperability layer that multi-agent systems need to escape proprietary, framework-locked implementations.

Framework Comparison: ADK, LangGraph, CrewAI #

I covered three frameworks in the talk. Each represents a different design philosophy.

Google ADK provides deterministic workflow agents (SequentialAgent, ParallelAgent, LoopAgent) alongside LLM-powered agents. Not every orchestration decision needs an LLM — routing a task through a fixed pipeline requires reliable execution, not intelligence. ADK separates those concerns. Ships in both Python and TypeScript, which matters if your team leans frontend-heavy.

LangGraph models agent workflows as stateful directed graphs. Nodes are tasks (LLM calls or deterministic logic), edges define transitions, state persists across the graph. Most flexible framework for custom topologies; if your workflow doesn’t fit sequential/parallel/loop patterns, LangGraph lets you define arbitrary graphs. The tradeoff: graph-based workflows get harder to reason about and debug than linear ones. I’ve watched teams spend more time debugging their graph topology than the actual agent logic inside the nodes.

CrewAI takes a role-based approach. Define agents by role (“researcher,” “writer,” “editor”), assign goals and backstories, let the framework manage collaboration. Most human-readable framework — reading a CrewAI config feels like reading an org chart. The abstraction breaks down when agents need fine-grained interaction control. “Researcher sends findings to writer” expresses easily; “researcher sends partial findings, writer starts drafting in parallel while researcher continues, then writer requests specific clarifications” gets ugly fast.

My recommendation: start with ADK’s deterministic agents for workflows you understand well. Move to LangGraph when you need custom topologies. Pick CrewAI when the workflow maps naturally to human team dynamics. Don’t choose a framework first and design your architecture around it — that’s backwards.

When One Agent Is Enough #

This part of the talk got the strongest reaction. Probably because it pushed against the room’s enthusiasm.

Anthropic published a report in early 2026 noting that while AI appears in 60% of developer work, only 0-20% gets fully delegated to autonomous agents. The rest involves human oversight, correction, direction.

That statistic should give pause to anyone designing a twelve-agent system for a task that a single agent with good tools could handle. Multi-agent systems add communication overhead, error propagation, state synchronization headaches, and debugging complexity. Every additional agent represents a node that can fail, hallucinate, or misinterpret instructions.

The question to ask before reaching for multi-agent architecture: does this problem genuinely decompose into roles requiring different capabilities? If one agent with access to the right tools via MCP can handle the full workflow, a single agent wins. Simpler to build, simpler to debug, simpler to maintain. I learned this the hard way building my adversarial writing pipeline — started with four specialized agents, collapsed to two (writer and detector) once I realized the “planner” and “editor” agents were just adding latency without improving output.

Multi-agent architecture earns its complexity when the task exceeds a single agent’s context window, when subtasks demand fundamentally different system prompts or toolsets, or when parallel execution delivers meaningful speedups. If none of those conditions hold, you’re adding complexity for the sake of it.

Task Horizons: Where This Goes Next #

The most forward-looking section of the talk covered task horizons. Early agent systems operated on minute-scale tasks: answer a question, write a function, generate a test. Current systems handle hour-scale work: build a feature, debug a complex issue, refactor a module.

The frontier sits at day-and-week-scale tasks: design and implement an entire system, plan and execute a migration, build and deploy a full application. Longer horizons demand strategic human checkpoints — not human-in-the-loop for every decision, but oversight at critical junctures.

The architectural implication: multi-agent systems tackling long-horizon tasks need durable state management, checkpoint/resume capabilities, and clear escalation paths. A system running for three days can’t keep everything in memory. It needs persistent state (A2A’s state management or a custom store), recoverable execution so a failure at hour 47 doesn’t torch 46 hours of progress, and the ability to pause for human input at predefined gates.
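The checkpoint/resume requirement can be sketched in a few lines. This is a minimal illustration, not a production design — a JSON file stands in for a durable store, and the step names are hypothetical — but it shows the invariant that matters: persist state after every completed step, so a crash at step N resumes at N rather than at zero:

```python
# Sketch of checkpoint/resume for a long-horizon workflow. State is
# persisted after every step; on restart, completed steps are skipped.
# The JSON file is a stand-in for a durable database.

import json
import os

def run_with_checkpoints(steps, state_path="workflow_state.json"):
    # Resume from a prior checkpoint if one exists.
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {"completed": [], "results": {}}

    for name, step in steps:
        if name in state["completed"]:
            continue  # already done in a previous run; skip on resume
        state["results"][name] = step(state["results"])
        state["completed"].append(name)
        with open(state_path, "w") as f:  # checkpoint after each step
            json.dump(state, f)
    return state
```

The same skeleton is where human gates slot in: a step can raise or park itself until an approval lands, and the checkpoint guarantees nothing before the gate is lost while the system waits.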

We’re early. But the direction looks clear enough. And the patterns I described — sequential, generator-critic, coordinator, scatter-gather — those are the building blocks that longer-horizon systems will compose from.