Execution Systems
Agents Need Workflows
10 min read · Published February 28, 2026 · Updated February 28, 2026
By CogLab Editorial Team · Reviewed by Knyckolas Sutherland
If you’ve tried ‘AI agents’ in a real company, you’ve probably seen the same pattern: the demo looks magical, then the agent meets messy inputs, ambiguous goals, and unclear definitions of done—and the whole thing turns into a polite chaos machine.
The practical takeaway for founders is uncomfortable but empowering: agent performance is usually capped by workflow design, not model IQ.
A recent arXiv paper on multi-agent LLM systems for trading makes this unusually concrete. The authors argue that many multi-agent systems imitate an investment team with broad roles (analyst, manager), but those designs often rely on coarse instructions that don’t reflect real-world workflow complexity.
They propose explicitly decomposing the job into fine-grained tasks and evaluate it in a leakage-controlled backtest using Japanese stock data and multiple input sources (prices, financial statements, news, macro signals). They report improved risk-adjusted returns compared to coarse-grained designs.
The thesis for founders: stop building ‘role agents,’ start building ‘task lines.’ Each step should take bounded inputs, produce an auditable artifact, and have a definition of done—plus explicit stop conditions.
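The step contract above can be sketched in a few lines of code. This is a hypothetical illustration, not the paper's implementation: the names (`TaskStep`, `definition_of_done`, `max_attempts`) are assumptions chosen to mirror the article's vocabulary of bounded inputs, auditable artifacts, and explicit stop conditions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TaskStep:
    """One micro-task in a 'task line': bounded inputs in, auditable artifact out."""
    name: str
    required_inputs: list                         # bounded inputs the step may read
    run: Callable[[dict], dict]                   # produces the step's artifact
    definition_of_done: Callable[[dict], bool]    # checks the artifact is acceptable
    max_attempts: int = 2                         # explicit stop condition

    def execute(self, inputs: dict) -> dict:
        # Stop condition 1: refuse to run on incomplete inputs.
        missing = [k for k in self.required_inputs if k not in inputs]
        if missing:
            return {"status": "stopped", "reason": f"missing inputs: {missing}"}
        # Stop condition 2: bounded retries against the definition of done.
        for _ in range(self.max_attempts):
            artifact = self.run(inputs)
            if self.definition_of_done(artifact):
                return {"status": "done", "artifact": artifact}
        return {"status": "stopped", "reason": "definition of done not met"}
```

Because every step returns either a `done` artifact or a named stop reason, failures become inspectable records rather than silent drift.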
Trading is just a harsh environment where weak systems get punished quickly. But the failure modes map cleanly to everyday company workflows: underspecified objectives, messy inputs, implicit quality bars, and no intermediate artifacts you can inspect when it goes wrong.
What to do next week: pick one high-frequency workflow and decompose it into 6–10 micro-tasks (not roles). For example, inbound lead qualification can be split into extraction, summarization, disqualifiers, rubric scoring, draft response, next-step recommendation, and logging.
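A minimal sketch of that lead-qualification task line, with each micro-task as a pure function and every intermediate artifact kept for auditing. The function names and the toy heuristics (presence of an email address, a length-based rubric score) are illustrative assumptions, not a real qualification rubric.

```python
def extract(lead_text: str) -> dict:
    # Micro-task: pull structured fields out of raw text (toy heuristic).
    return {"has_email": "@" in lead_text, "raw": lead_text}

def summarize(fields: dict) -> dict:
    # Micro-task: one-line summary for human review.
    return {**fields, "summary": fields["raw"][:80]}

def check_disqualifiers(fields: dict) -> dict:
    # Micro-task: hard no-go rules, applied before any scoring.
    fields["disqualified"] = not fields["has_email"]
    return fields

def score(fields: dict) -> dict:
    # Micro-task: rubric score (toy rule: longer messages score higher, max 5).
    fields["score"] = 0 if fields["disqualified"] else min(len(fields["raw"]) // 20, 5)
    return fields

PIPELINE = [extract, summarize, check_disqualifiers, score]

def run_pipeline(lead_text: str) -> list[dict]:
    # Run each micro-task and keep every intermediate artifact for inspection.
    artifacts, state = [], lead_text
    for step in PIPELINE:
        state = step(state)
        artifacts.append(dict(state))
    return artifacts
```

The point is not the toy logic but the shape: each step is small enough to test on its own, and the artifact trail shows exactly where a bad output entered the line.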
Add checkpoints and stop conditions: human approval before external sends or state changes, and ‘stop and ask’ behavior when required fields are missing or confidence is low.
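The checkpoint logic can be as simple as a gate function run before any external action. The required fields and confidence threshold below are assumptions for illustration; the key property is that the default path is "stop and ask," and even a passing artifact only reaches a human-approval queue, never a direct send.

```python
# Illustrative gate: field names and the 0.8 threshold are assumptions.
REQUIRED_FIELDS = {"company", "contact_email"}
CONFIDENCE_FLOOR = 0.8

def gate(artifact: dict) -> str:
    """Decide what happens before an external send or state change."""
    missing = REQUIRED_FIELDS - artifact.keys()
    if missing:
        # 'Stop and ask' when required fields are absent.
        return f"stop_and_ask: missing {sorted(missing)}"
    if artifact.get("confidence", 0.0) < CONFIDENCE_FLOOR:
        # 'Stop and ask' when the model itself is unsure.
        return "stop_and_ask: low confidence"
    # Never send directly: route to a human checkpoint instead.
    return "queue_for_human_approval"
```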
Instrument it like a product: completion rate, human edit distance, time-to-done, and failure modes by step. Step-level visibility is what lets you iterate toward reliability.
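Two of those metrics can be computed with the standard library alone. Below, human edit distance is approximated as the fraction of a draft the reviewer changed (via `difflib.SequenceMatcher`), and completion rate is tracked per step; the metric definitions are assumptions, one reasonable way to operationalize the article's suggestion.

```python
import difflib
from collections import Counter

def edit_distance_ratio(draft: str, final: str) -> float:
    """0.0 = the human changed nothing, 1.0 = everything changed."""
    return 1.0 - difflib.SequenceMatcher(None, draft, final).ratio()

class StepMetrics:
    """Per-step outcome counts, so failure modes are visible by step."""
    def __init__(self):
        self.outcomes = Counter()

    def record(self, step: str, status: str):
        self.outcomes[(step, status)] += 1

    def completion_rate(self, step: str) -> float:
        done = self.outcomes[(step, "done")]
        total = sum(v for (s, _), v in self.outcomes.items() if s == step)
        return done / total if total else 0.0
```

Tracking these weekly per step, rather than one number for the whole agent, is what tells you which micro-task to fix first.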
The real moat isn’t prompts. It’s legibility plus feedback loops. You don’t need smarter agents—you need clearer work.
Frequently Asked Questions
Why do ‘role-based’ agents fail in production?
Roles are vague. Real work needs bounded inputs, intermediate artifacts, and explicit definitions of done—otherwise the agent can’t reliably recover from ambiguity.
What is fine-grained task decomposition?
It’s breaking a workflow into small, testable steps (micro-tasks) where each step produces an auditable output that feeds the next step.
What’s the fastest way to apply this in a startup?
Pick one high-volume workflow, decompose it into 6–10 micro-tasks, add a human checkpoint before any external or irreversible action, and track edit distance and completion rate weekly.