Preprint, posted May 23, 2026, SSRN

Coordination Overhead and Governance Readiness in Distributed LLM Multi-Agent Systems: An Empirical Architecture Evaluation

Reports a controlled benchmark of six distributed LLM multi-agent architectures, showing strong architecture effects on coordination overhead, latency, and token use, with limited practical quality differences in short-horizon tasks.

Author: Abhinav Mahajan
Date Written: May 13, 2026
Repository: SSRN

Research Focus

This paper reports a controlled empirical evaluation of distributed large language model (LLM) multi-agent architectures on a synthetic, short-horizon benchmark. Existing agent benchmarks demonstrate that LLM agents can collaborate, use tools, and operate in interactive environments, but they often under-measure the internal coordination work required to achieve final task success.

The study evaluates six architecture conditions (single-agent baseline, orchestrator-worker, graph workflow, blackboard/shared workspace, debate/critic, and bounded peer-to-peer) across 60 stratified tasks, with five repeated runs per task-condition pair. This yields 6 × 60 × 5 = 1,800 Gemini 2.5 Flash executions, all at temperature 0.7.
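The execution count follows directly from the factorial design. As a minimal sketch, the full run grid can be enumerated as below. The condition labels come from the description above; the task identifiers and variable names are hypothetical, since the paper's actual task IDs are not given on this page.

```python
from itertools import product

# Architecture conditions as named in the study description.
CONDITIONS = [
    "single-agent baseline",
    "orchestrator-worker",
    "graph workflow",
    "blackboard/shared workspace",
    "debate/critic",
    "bounded peer-to-peer",
]
N_TASKS = 60    # stratified tasks
N_REPEATS = 5   # repeated runs per task-condition pair

# Hypothetical task identifiers (assumption: the real IDs are not published here).
task_ids = [f"task-{i:02d}" for i in range(1, N_TASKS + 1)]

# One entry per (condition, task, repeat) execution.
run_grid = list(product(CONDITIONS, task_ids, range(1, N_REPEATS + 1)))

print(len(run_grid))  # 1800
```

Each task appears under every condition the same number of times, which is what allows architecture effects to be compared on a like-for-like basis.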

Study Details and Findings

  • Author: Abhinav Mahajan (Clayton Homes)
  • Date Written: May 13, 2026
  • Posted: May 23, 2026
  • Repository: SSRN
  • Pages: 14
  • JEL Classification: O33

The evaluation measures task quality, token-weighted and latency-weighted Coordination Overhead Ratio (COR), Governance Readiness Score (GRS), token cost, latency, acceptance-gate outcomes, and failure patterns. All runs completed without runtime errors or acceptance-gate failures.
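This page does not give the COR formula. One plausible reading, offered here purely as an assumption, is that COR is the share of a run's tokens (or wall-clock time) spent on coordination rather than on task work. The sketch below illustrates that reading; the `RunTrace` fields, function names, and example numbers are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class RunTrace:
    """Hypothetical per-run totals; field names are illustrative only."""
    task_tokens: int              # tokens spent producing the task answer
    coordination_tokens: int      # tokens spent on routing, handoffs, critiques
    task_latency_s: float         # seconds spent on task work
    coordination_latency_s: float # seconds spent on coordination


def overhead_ratio(coordination: float, task: float) -> float:
    """Assumed definition: coordination share of the combined total."""
    total = coordination + task
    if total <= 0:
        raise ValueError("run has no recorded work")
    return coordination / total


def token_cor(trace: RunTrace) -> float:
    """Token-weighted overhead ratio under the assumed definition."""
    return overhead_ratio(trace.coordination_tokens, trace.task_tokens)


def latency_cor(trace: RunTrace) -> float:
    """Latency-weighted overhead ratio under the assumed definition."""
    return overhead_ratio(trace.coordination_latency_s, trace.task_latency_s)


# Made-up example values, not measurements from the study.
example = RunTrace(task_tokens=600, coordination_tokens=400,
                   task_latency_s=3.0, coordination_latency_s=1.0)
print(token_cor(example))    # 0.4
print(latency_cor(example))  # 0.25
```

Under this reading a single-agent run with no coordination traffic scores 0.0 on both variants, and the two variants can diverge when coordination messages are short but slow, or long but fast.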

In this controlled short-horizon benchmark, architecture strongly affected coordination overhead, latency, and token use, while architecture explained only a small share of task-quality variance. Blackboard/shared workspace achieved the highest raw mean quality (0.9030), followed by graph workflow (0.8936), but this raw quality advantage was below the preregistered 0.10 practical effect threshold and both architectures incurred substantial coordination overhead.
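To make the practical-threshold logic concrete, here is a minimal sketch that compares a raw difference in mean quality against the 0.10 threshold. It assumes the threshold applies to the absolute difference in raw means; the function name is hypothetical. Only the two means reported above are used, so the example shows the gap between the top two architectures rather than the paper's own comparison, whose reference condition is not stated on this page.

```python
PRACTICAL_THRESHOLD = 0.10  # preregistered practical effect threshold (from the text)


def is_practically_meaningful(mean_a: float, mean_b: float,
                              threshold: float = PRACTICAL_THRESHOLD) -> bool:
    """Assumed rule: a raw mean difference counts only if it reaches the threshold."""
    return abs(mean_a - mean_b) >= threshold


# Raw mean quality values reported in the summary above.
blackboard_mean = 0.9030
graph_mean = 0.8936

gap = blackboard_mean - graph_mean
print(round(gap, 4))                                           # 0.0094
print(is_practically_meaningful(blackboard_mean, graph_mean))  # False
```

The gap between the two best-scoring architectures is roughly a tenth of the threshold, which is consistent with the summary's point that quality differences were small relative to the differences in overhead.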

GRS was constant at 1.0 across architectures because the shared trace and policy-boundary contract enforced common governance instrumentation; this is interpreted as evaluation-harness validation rather than architecture-specific governance superiority. The results support a task-dependent architecture-selection thesis for controlled short-horizon tasks: multi-agent collaboration should be justified by measurable quality, validation, reliability, or governance benefits that exceed its coordination, cost, and latency burden. The findings should not be generalized to long-horizon web, software-engineering, or enterprise workflows without replication on task suites designed to require delegation, memory, iterative validation, and stateful tool use.

Declaration of Interest

The author declares no known competing financial interests. Clayton Homes had no involvement in the research and is referenced solely for biographical context.

Funding

No external funding. All computational costs were incurred independently by the author.

Suggested Citation

Mahajan, Abhinav, Coordination Overhead and Governance Readiness in Distributed LLM Multi-Agent Systems: An Empirical Architecture Evaluation (May 13, 2026). Available at SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6781620.

Research and Applied AI Architecture

My research work connects empirical evidence, governance, and production architecture for enterprise AI systems.
