Research Focus
This paper reports a controlled empirical evaluation of distributed large language model (LLM) multi-agent architectures on a synthetic, short-horizon benchmark. Existing agent benchmarks show that LLM agents can collaborate, use tools, and operate in interactive environments, but they often under-measure the internal coordination work that agents expend to reach final task success.
The study evaluates six architecture conditions (single-agent baseline, orchestrator-worker, graph workflow, blackboard/shared workspace, debate/critic, and bounded peer-to-peer) across 60 stratified tasks, with five repeated runs per task-condition pair, for a total of 1,800 Gemini 2.5 Flash executions (6 × 60 × 5) at temperature 0.7.
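The execution count follows directly from the factorial design. A minimal sketch of how such a run grid could be enumerated is shown here; the condition identifiers and the `build_run_grid` helper are illustrative names, not taken from the paper's harness.

```python
from itertools import product

# Condition names come from the abstract; the identifier spellings are illustrative.
ARCHITECTURES = [
    "single_agent_baseline",
    "orchestrator_worker",
    "graph_workflow",
    "blackboard_shared_workspace",
    "debate_critic",
    "bounded_peer_to_peer",
]
N_TASKS = 60     # stratified tasks
N_REPEATS = 5    # repeated runs per task-condition pair


def build_run_grid():
    """Return one (architecture, task_id, repeat) tuple per planned execution."""
    return list(product(ARCHITECTURES, range(N_TASKS), range(N_REPEATS)))


run_grid = build_run_grid()
print(len(run_grid))  # 6 * 60 * 5 = 1800
```

Enumerating the grid up front makes it easy to confirm that every task-condition pair receives the same number of repeats before any model calls are made.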
Study Details and Findings
- Author: Abhinav Mahajan (Clayton Homes)
- Date Written: May 13, 2026
- Posted: May 23, 2026
- Repository: SSRN
- Pages: 14
- JEL Classification: O33
The evaluation measures task quality, token-weighted and latency-weighted Coordination Overhead Ratio (COR), Governance Readiness Score (GRS), token cost, latency, acceptance-gate outcomes, and failure patterns. All runs completed without runtime errors or acceptance-gate failures.
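The abstract does not give the formula for COR. One plausible reading, assumed here purely for illustration, is the share of a run's tokens (or wall-clock time) spent on coordination rather than on the task itself. The `RunTrace` fields and the numbers in the example are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class RunTrace:
    # Hypothetical trace fields; the paper's actual trace schema is not given here.
    task_tokens: int              # tokens spent directly on solving the task
    coordination_tokens: int      # tokens spent on routing, handoffs, critique, etc.
    task_seconds: float           # latency attributable to task work
    coordination_seconds: float   # latency attributable to coordination


def token_cor(trace: RunTrace) -> float:
    """Assumed definition: coordination tokens as a fraction of all tokens."""
    total = trace.task_tokens + trace.coordination_tokens
    return trace.coordination_tokens / total if total else 0.0


def latency_cor(trace: RunTrace) -> float:
    """Assumed definition: coordination time as a fraction of total time."""
    total = trace.task_seconds + trace.coordination_seconds
    return trace.coordination_seconds / total if total else 0.0


# Made-up numbers for illustration only.
example = RunTrace(task_tokens=800, coordination_tokens=200,
                   task_seconds=4.0, coordination_seconds=1.0)
print(token_cor(example), latency_cor(example))
```

Under this assumed definition, a single-agent run with no coordination traffic would score 0.0 on both ratios, which gives the multi-agent conditions a natural reference point.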
In this controlled short-horizon benchmark, architecture strongly affected coordination overhead, latency, and token use but explained only a small share of task-quality variance. Blackboard/shared workspace achieved the highest raw mean quality (0.9030), followed by graph workflow (0.8936); however, this raw quality advantage fell below the preregistered 0.10 practical effect threshold, and both architectures incurred substantial coordination overhead.
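To make the practical-threshold comparison concrete, the sketch below checks a difference in raw mean quality against the 0.10 threshold. It assumes the threshold applies to the raw mean difference rather than to a standardized effect size, which the abstract does not specify. Only the two means reported above are used; the other conditions' means are not given, so comparisons against them are not reproduced.

```python
PRACTICAL_THRESHOLD = 0.10  # preregistered practical effect threshold from the text


def exceeds_practical_threshold(mean_a: float, mean_b: float,
                                threshold: float = PRACTICAL_THRESHOLD) -> bool:
    """Return True if the absolute raw mean difference reaches the threshold.

    Assumption: the threshold is applied to raw mean differences.
    """
    return abs(mean_a - mean_b) >= threshold


blackboard_mean = 0.9030  # reported raw mean quality
graph_mean = 0.8936       # reported raw mean quality

gap = blackboard_mean - graph_mean
print(round(gap, 4))                                            # 0.0094
print(exceeds_practical_threshold(blackboard_mean, graph_mean))  # False
```

The gap between the two top-ranked architectures is about 0.0094, roughly a tenth of the threshold, so their ordering by raw mean would not count as a practically meaningful difference under this rule.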
GRS was constant at 1.0 across architectures because the shared trace and policy-boundary contract enforced common governance instrumentation; this is interpreted as evaluation-harness validation rather than architecture-specific governance superiority. The results support a task-dependent architecture-selection thesis for controlled short-horizon tasks: multi-agent collaboration should be justified by measurable quality, validation, reliability, or governance benefits that exceed its coordination, cost, and latency burden. The findings should not be generalized to long-horizon web, software-engineering, or enterprise workflows without replication on task suites designed to require delegation, memory, iterative validation, and stateful tool use.
Declaration of Interest
The author declares no known competing financial interests. Clayton Homes had no involvement in the research and is referenced solely for biographical context.
Funding
No external funding. All computational costs were incurred independently by the author.
Suggested Citation
Mahajan, Abhinav, Coordination Overhead and Governance Readiness in Distributed LLM Multi-Agent Systems: An Empirical Architecture Evaluation (May 13, 2026). Available at SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6781620.