Multi-Agent Architectures: Coordination Patterns for Complex AI Systems

Amit Patel

ML Platform Lead

November 28, 2024 · 11 min read
Multi-Agent · Architecture · AI Systems · Coordination

Why Multi-Agent?

Single agents hit limits. They struggle to maintain context across long tasks, handle diverse skill requirements poorly, and become less reliable as complexity grows.

Multi-agent systems solve this by dividing work among specialized agents. Here's how to design them.

Architecture Patterns

Pattern 1: Sequential Pipeline

Agents process in sequence, each adding value:

    class SequentialPipeline:
        def __init__(self, agents: list[Agent]):
            self.agents = agents

        async def run(self, input: str) -> str:
            current = input
            for agent in self.agents:
                current = await agent.process(current)
            return current

    # Example: Document processing pipeline
    pipeline = SequentialPipeline([
        ExtractionAgent(),     # Extract key information
        ValidationAgent(),     # Verify accuracy
        SummarizationAgent(),  # Create summary
        FormattingAgent()      # Format output
    ])

Use when: Tasks have clear stages, each building on the previous.
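To make the pipeline concrete, here is a minimal runnable sketch. The `Agent` protocol and the two toy stages (`StripAgent`, `UppercaseAgent`) are hypothetical stand-ins, not part of any library; a real stage would likely call a model inside `process`.

```python
import asyncio
from typing import Protocol


class Agent(Protocol):
    """Assumed interface: one async method that transforms text."""

    async def process(self, text: str) -> str: ...


class StripAgent:
    # Hypothetical stage: trims surrounding whitespace.
    async def process(self, text: str) -> str:
        return text.strip()


class UppercaseAgent:
    # Hypothetical stage: stands in for a model call.
    async def process(self, text: str) -> str:
        return text.upper()


class SequentialPipeline:
    def __init__(self, agents: list[Agent]):
        self.agents = agents

    async def run(self, text: str) -> str:
        current = text
        for agent in self.agents:
            # Each stage receives the previous stage's output.
            current = await agent.process(current)
        return current


pipeline = SequentialPipeline([StripAgent(), UppercaseAgent()])
result = asyncio.run(pipeline.run("  quarterly report  "))
print(result)  # QUARTERLY REPORT
```

Because each stage only sees the previous stage's output, stage order is part of the design: swapping two stages can change the result.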

Pattern 2: Router/Specialist

A router agent directs queries to specialists:

    class RouterArchitecture:
        def __init__(
            self,
            router: RouterAgent,
            specialists: dict[str, Agent]
        ):
            self.router = router
            self.specialists = specialists

        async def run(self, query: str) -> str:
            # Router decides which specialist to use
            routing_decision = await self.router.route(query)
            specialist = self.specialists[routing_decision.specialist]

            # Specialist handles the query
            return await specialist.process(
                query,
                context=routing_decision.context
            )

    # Example: Customer service
    architecture = RouterArchitecture(
        router=IntentRouter(),
        specialists={
            "billing": BillingAgent(),
            "technical": TechnicalSupportAgent(),
            "sales": SalesAgent(),
            "general": GeneralAgent()
        }
    )

Use when: Different query types need different expertise.
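One practical question with this pattern is what happens when the router names a specialist that does not exist. The sketch below shows one possible answer: fall back to a default specialist. `KeywordRouter`, `EchoAgent`, `RoutingDecision`, and the `default` parameter are all hypothetical; a production router would more likely ask a model to classify intent.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class RoutingDecision:
    specialist: str
    context: dict = field(default_factory=dict)


class KeywordRouter:
    """Hypothetical router based on keyword matching."""

    KEYWORDS = {"invoice": "billing", "refund": "billing", "error": "technical"}

    async def route(self, query: str) -> RoutingDecision:
        lowered = query.lower()
        for keyword, specialist in self.KEYWORDS.items():
            if keyword in lowered:
                return RoutingDecision(specialist, {"matched": keyword})
        # No keyword matched: return a name that is not in the specialist map.
        return RoutingDecision("unknown")


class EchoAgent:
    """Hypothetical specialist that just labels the query it received."""

    def __init__(self, name: str):
        self.name = name

    async def process(self, query: str, context: dict) -> str:
        return f"[{self.name}] {query}"


async def run(query: str, router, specialists: dict, default: str = "general") -> str:
    decision = await router.route(query)
    # Fall back to the default specialist instead of raising KeyError.
    specialist = specialists.get(decision.specialist, specialists[default])
    return await specialist.process(query, context=decision.context)


specialists = {name: EchoAgent(name) for name in ("billing", "technical", "general")}
print(asyncio.run(run("Where is my refund?", KeywordRouter(), specialists)))
print(asyncio.run(run("Hello there", KeywordRouter(), specialists)))
```

The first query is routed to the billing specialist and the second falls through to the general one.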

Pattern 3: Hierarchical Delegation

A manager agent breaks down tasks and delegates:

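    class HierarchicalSystem:
        def __init__(self, manager: ManagerAgent, workers: list[Agent]):
            self.manager = manager
            self.workers = workers

        def select_worker(self, subtask) -> Agent:
            # Assumption: the original does not show how workers are chosen.
            # This sketch assumes each worker exposes can_handle(subtask).
            for worker in self.workers:
                if worker.can_handle(subtask):
                    return worker
            raise LookupError(f"No worker can handle subtask {subtask.id}")

        async def run(self, task: str) -> str:
            # Manager creates execution plan
            plan = await self.manager.plan(task)

            # Subtasks run one at a time, in plan order
            results = {}
            for subtask in plan.subtasks:
                worker = self.select_worker(subtask)
                results[subtask.id] = await worker.execute(subtask)

            # Manager synthesizes results
            return await self.manager.synthesize(task, results)

    # Example: Research task
    system = HierarchicalSystem(
        manager=ResearchManager(),
        workers=[
            WebSearchAgent(),
            DocumentAnalysisAgent(),
            DataExtractionAgent(),
            SynthesisAgent()
        ]
    )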

Use when: Complex tasks need decomposition and synthesis.
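When the subtasks in a plan do not depend on each other, the delegation step can run them concurrently rather than one at a time. The following is a sketch under that assumption, using `asyncio.gather`; `Subtask`, `Worker`, and `run_plan` are hypothetical names, and the `kind` field is an assumed way of matching subtasks to workers.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Subtask:
    id: str
    kind: str
    payload: str


class Worker:
    """Hypothetical worker that handles one kind of subtask."""

    def __init__(self, kind: str):
        self.kind = kind

    async def execute(self, subtask: Subtask) -> str:
        await asyncio.sleep(0)  # stands in for a model or tool call
        return f"{self.kind}:{subtask.payload}"


async def run_plan(subtasks: list[Subtask], workers: dict[str, Worker]) -> dict[str, str]:
    # Assumes subtasks are independent; dependent subtasks would need ordering.
    outputs = await asyncio.gather(
        *(workers[s.kind].execute(s) for s in subtasks)
    )
    # gather preserves input order, so zip pairs each id with its own result.
    return {s.id: out for s, out in zip(subtasks, outputs)}


workers = {"search": Worker("search"), "extract": Worker("extract")}
plan = [
    Subtask("t1", "search", "agent frameworks"),
    Subtask("t2", "extract", "pricing table"),
]
results = asyncio.run(run_plan(plan, workers))
print(results)
```

The trade-off is that a failure in any one subtask surfaces from `gather` as an exception, so a real system would need to decide whether one failed subtask should fail the whole plan.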

Pattern 4: Collaborative Discussion

Agents discuss and refine together:

    class CollaborativeDiscussion:
        def __init__(
            self,
            agents: list[Agent],
            moderator: ModeratorAgent,
            max_rounds: int = 5
        ):
            self.agents = agents
            self.moderator = moderator
            self.max_rounds = max_rounds

        async def run(self, topic: str) -> str:
            discussion = Discussion(topic)

            for _ in range(self.max_rounds):
                for agent in self.agents:
                    contribution = await agent.contribute(discussion)
                    discussion.add(agent.name, contribution)

                # Check for consensus after every agent has spoken this round
                if await self.moderator.has_consensus(discussion):
                    break

            return await self.moderator.summarize(discussion)

    # Example: Code review
    system = CollaborativeDiscussion(
        agents=[
            SecurityReviewer(),
            PerformanceReviewer(),
            MaintainabilityReviewer()
        ],
        moderator=ReviewModerator()
    )

Use when: Multiple perspectives improve output quality.
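The round loop and the consensus check are easier to reason about with a toy run. In this sketch, consensus is assumed to mean that every agent approved in the most recent round; that rule, along with `Reviewer`, `Moderator`, and `discuss`, is hypothetical and only meant to illustrate the control flow.

```python
import asyncio


class Discussion:
    def __init__(self, topic: str):
        self.topic = topic
        self.turns: list[tuple[str, str]] = []

    def add(self, name: str, contribution: str) -> None:
        self.turns.append((name, contribution))


class Reviewer:
    """Hypothetical agent that objects a fixed number of times, then approves."""

    def __init__(self, name: str, objections: int):
        self.name = name
        self.objections = objections

    async def contribute(self, discussion: Discussion) -> str:
        if self.objections > 0:
            self.objections -= 1
            return "object"
        return "approve"


class Moderator:
    def __init__(self, n_agents: int):
        self.n_agents = n_agents

    async def has_consensus(self, discussion: Discussion) -> bool:
        # Assumed rule: every agent approved in the latest round.
        latest = discussion.turns[-self.n_agents:]
        return all(text == "approve" for _, text in latest)


async def discuss(topic, agents, moderator, max_rounds: int = 5) -> tuple[int, bool]:
    """Return (rounds used, whether consensus was reached)."""
    discussion = Discussion(topic)
    for round_number in range(1, max_rounds + 1):
        for agent in agents:
            discussion.add(agent.name, await agent.contribute(discussion))
        if await moderator.has_consensus(discussion):
            return round_number, True
    return max_rounds, False


agents = [Reviewer("security", objections=2), Reviewer("performance", objections=0)]
outcome = asyncio.run(discuss("PR review", agents, Moderator(len(agents))))
print(outcome)  # (3, True)
```

Returning whether consensus was reached, not just the summary, lets callers distinguish agreement from simply running out of rounds.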

Communication Protocols

Structured Messages

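    from datetime import datetime
    from typing import Literal, Optional

    from pydantic import BaseModel

    class AgentMessage(BaseModel):
        sender: str
        recipient: str
        message_type: Literal["request", "response", "info", "error"]
        content: dict
        requires_response: bool
        deadline: Optional[datetime] = None  # no deadline unless one is set
        priority: int = 5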

Shared State

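    import asyncio
    from datetime import datetime, timezone
    from typing import Any

    class SharedState:
        def __init__(self):
            self.state = {}
            self.locks = {}

        async def read(self, key: str) -> Any:
            # Returns the stored record (value plus metadata), or None if unset
            return self.state.get(key)

        async def write(self, key: str, value: Any, agent_id: str):
            # One lock per key, so writers to different keys do not block each other
            async with self.locks.setdefault(key, asyncio.Lock()):
                self.state[key] = {
                    "value": value,
                    "updated_by": agent_id,
                    "updated_at": datetime.now(timezone.utc),
                }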
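Locking individual writes does not protect a read-modify-write sequence: two agents can read the same value, each compute an update, and one update is lost. One way to address this, sketched here, is an `update` method that holds the per-key lock across the whole sequence. The `update` method, its `fn` and `default` parameters, and the `_lock` helper are assumptions added for illustration.

```python
import asyncio
from typing import Any, Callable


class SharedState:
    def __init__(self):
        self.state: dict[str, Any] = {}
        self.locks: dict[str, asyncio.Lock] = {}

    def _lock(self, key: str) -> asyncio.Lock:
        # Create the per-key lock on first use.
        if key not in self.locks:
            self.locks[key] = asyncio.Lock()
        return self.locks[key]

    async def update(self, key: str, fn: Callable[[Any], Any], default: Any = None) -> Any:
        # Hold the per-key lock across the whole read-modify-write.
        async with self._lock(key):
            current = self.state.get(key, default)
            await asyncio.sleep(0)  # simulated await point, e.g. a model call
            self.state[key] = fn(current)
            return self.state[key]


async def main() -> int:
    shared = SharedState()
    # 50 concurrent increments of the same key.
    await asyncio.gather(
        *(shared.update("count", lambda n: n + 1, default=0) for _ in range(50))
    )
    return shared.state["count"]


total = asyncio.run(main())
print(total)  # 50
```

Without the lock, the simulated await point between the read and the write would let concurrent updates overwrite each other.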

Error Handling in Multi-Agent Systems

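    import asyncio

    # Assumed definitions: the original references these without defining them
    class RecoverableError(Exception):
        """Errors worth retrying, such as timeouts or rate limits."""

    class MaxRetriesExceeded(Exception):
        """Raised when every retry attempt has failed."""

    class ResilientMultiAgent:
        # validate_result and log_error are assumed to be defined elsewhere

        async def run_with_fallback(
            self,
            task: str,
            primary_agent: Agent,
            fallback_agent: Agent
        ) -> str:
            try:
                result = await asyncio.wait_for(
                    primary_agent.run(task),
                    timeout=30.0
                )
                if self.validate_result(result):
                    return result
            except Exception as e:
                self.log_error(primary_agent, e)

            # Fallback: reached on timeout, error, or failed validation
            return await fallback_agent.run(task)

        async def run_with_retry(
            self,
            task: str,
            agent: Agent,
            max_retries: int = 3
        ) -> str:
            last_error = None
            for attempt in range(max_retries):
                try:
                    return await agent.run(task)
                except RecoverableError as e:
                    last_error = e
                    # Exponential backoff; skip the sleep after the final attempt
                    if attempt < max_retries - 1:
                        await asyncio.sleep(2 ** attempt)
            raise MaxRetriesExceeded(last_error) from last_error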
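The timeout path of the fallback logic can be exercised with two toy agents. In this sketch, `SlowAgent`, `FastAgent`, and the standalone `run_with_fallback` function are hypothetical; the function also returns which path produced the answer, an addition that makes fallbacks visible to callers.

```python
import asyncio


class SlowAgent:
    """Hypothetical primary agent that takes too long to answer."""

    async def run(self, task: str) -> str:
        await asyncio.sleep(1.0)
        return "slow answer"


class FastAgent:
    """Hypothetical fallback agent that answers immediately."""

    async def run(self, task: str) -> str:
        return f"fallback answer for {task!r}"


async def run_with_fallback(task: str, primary, fallback, timeout: float) -> tuple[str, str]:
    try:
        result = await asyncio.wait_for(primary.run(task), timeout=timeout)
        return "primary", result
    except asyncio.TimeoutError:
        # wait_for cancels the primary's coroutine before raising.
        return "fallback", await fallback.run(task)


outcome = asyncio.run(
    run_with_fallback("summarize", SlowAgent(), FastAgent(), timeout=0.05)
)
print(outcome)  # ('fallback', "fallback answer for 'summarize'")
```

Because `asyncio.wait_for` cancels the timed-out call, the slow agent does not keep running in the background after the fallback takes over.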

Observability

    from datetime import datetime, timezone

    # ExecutionTrace and Span are assumed to be defined elsewhere;
    # ExecutionTrace is assumed to generate its own id.
    class MultiAgentTracer:
        def trace_execution(self, system_id: str):
            return ExecutionTrace(
                system_id=system_id,
                spans=[]
            )

        def start_agent_span(
            self,
            trace: ExecutionTrace,
            agent_id: str,
            task: str
        ) -> Span:
            span = Span(
                trace_id=trace.id,
                agent_id=agent_id,
                task=task,
                start_time=datetime.now(timezone.utc)
            )
            trace.spans.append(span)
            return span

        def end_agent_span(
            self,
            span: Span,
            result: str,
            tokens_used: int
        ):
            span.end_time = datetime.now(timezone.utc)
            span.result = result
            span.tokens_used = tokens_used
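Paired start and end calls are easy to get wrong when an agent raises partway through, because the end call may never run. One option, sketched below, is to wrap the span in a context manager so it is always closed. The `agent_span` helper and the simplified `Span` and `ExecutionTrace` dataclasses are hypothetical and stdlib-only.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    agent_id: str
    task: str
    start_time: float
    end_time: Optional[float] = None
    error: Optional[str] = None

    @property
    def duration(self) -> Optional[float]:
        return None if self.end_time is None else self.end_time - self.start_time


@dataclass
class ExecutionTrace:
    system_id: str
    spans: list[Span] = field(default_factory=list)


@contextmanager
def agent_span(trace: ExecutionTrace, agent_id: str, task: str):
    span = Span(agent_id=agent_id, task=task, start_time=time.monotonic())
    trace.spans.append(span)
    try:
        yield span
    except Exception as exc:
        span.error = repr(exc)  # record the failure, then let it propagate
        raise
    finally:
        span.end_time = time.monotonic()  # runs on success and on failure


trace = ExecutionTrace("research-system")

with agent_span(trace, "web-search", "find sources"):
    pass  # agent work would go here

try:
    with agent_span(trace, "extractor", "parse table"):
        raise ValueError("bad input")
except ValueError:
    pass

for span in trace.spans:
    print(span.agent_id, span.error)
```

The failed span still gets an end time and an error string, so a crash shows up in the trace instead of leaving an open span.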

Conclusion

Multi-agent architectures unlock capabilities single agents can't achieve. Choose your pattern based on task structure, start simple, and add complexity only when needed. And always—always—build in observability from day one.
