Agentic systems are struggling to scale (this should feel familiar)
This article is part of a series on agentic systems:
- Agentic systems are struggling to scale (this should feel familiar) 👈
- Agentic systems are still in the artisanal era
- Agentic systems are bound by the same fundamental limits
- Artifacts are compression: how systems handle complexity
- Why all systems become pipelines
- Long-lived systems need modularity
- Designing agentic systems for engineering organizations
- Writing this series with AI: a postmortem
The agents work. The system fails.
That is the pattern showing up across companies that tried to turn AI into a headcount strategy instead of a systems strategy.
Between 2023 and 2026, executives cut teams on the assumption that AI could absorb large parts of the workflow. The reversal came fast. Forbes and Human Resources Director, both drawing on the same rehiring pattern, point to the same outcome: companies tried to remove a workflow and discovered they had only removed the people holding it together. Careerminds survey data cited by HRD reports that two-thirds of companies are already rehiring roles they cut for AI. More than half said the rehiring began within six months.
TechRepublic reports that 55% of companies now regret AI-driven layoffs. Computerworld, summarizing Forrester, adds the missing diagnosis: too often management laid people off based on the future promise of AI, not on proven capability inside a working system.
This is the important point:
The industry is not struggling because AI is weak; it is struggling because systems were removed faster than they could be replaced.
AI did not remove the system. It accelerated it until the weak parts became impossible to ignore.
The first public failures are already here
Klarna became the clearest public example. Its leadership said AI was handling work equivalent to roughly 700 customer service agents, and the move was framed as evidence that replacement at scale had arrived. But as Digital Applied’s Klarna analysis summarizes, the routine volume held while customer satisfaction fell on the complex interactions that actually required judgment, escalation, and trust. Klarna then shifted back toward rehiring and a hybrid model.
That is not an isolated story. It is the same system failure in a more visible form.
| Local gain | System failure |
|---|---|
| AI handled routine volume | Quality dropped on edge cases |
| Staffing costs looked lower on paper | Rehiring and correction costs erased savings |
| Output arrived faster | Judgment, escalation, and accountability became the constraint |
The failure is easy to misread. The models did useful work. The diagnosis was wrong.
Companies removed the human layer that interpreted ambiguity, handled exceptions, and preserved continuity across the workflow. Once that layer was gone, the system degraded.
The wrong conclusion
The easy conclusion is that the models are still not capable enough.
That misses the failure mode.
The failure is not in the agent. It is in the system it operates in.
What broke first was the control layer around the agent: intent definition, handoffs, validation, and exception handling.
The real problem is coordination
The same pattern shows up in engineering teams building agentic systems.
An agent can write a function. It can scaffold something meaningful.
But ask it to help ship a real feature, one that spans research, design, implementation, testing, and rollout, and the surrounding system starts to break down.
That happens because agentic systems today often:
- Pass too much raw context
- Blur roles between planning and execution
- Lack clear handoffs and validation
Those failures are connected. When work is passed forward as raw context instead of as a structured handoff, every downstream step has to reinterpret intent, constraints, and success criteria for itself. Local work speeds up, but system throughput does not improve.
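To make the contrast concrete, here is a minimal sketch of what a structured handoff could look like, as opposed to forwarding raw context. The `Handoff` class, its field names, and the example values are illustrative assumptions for this article, not a reference to any specific framework.

```python
from dataclasses import dataclass, field

# Illustrative only: a structured handoff carries interpreted intent,
# explicit constraints, and success criteria, so the next step does not
# have to re-derive them from raw conversation or raw code context.
@dataclass
class Handoff:
    intent: str                      # what the next step is expected to achieve
    constraints: list[str]           # boundaries the next step must respect
    success_criteria: list[str]      # how the result will be validated
    artifacts: dict[str, str] = field(default_factory=dict)  # named inputs, not a raw dump

    def is_actionable(self) -> bool:
        """Only pass work forward when intent and validation are explicit."""
        return bool(self.intent) and bool(self.success_criteria)


# Example: a planning step hands implementation work to an execution step.
plan_to_execution = Handoff(
    intent="Add rate limiting to the public API",
    constraints=["No breaking changes to existing endpoints"],
    success_criteria=["429 returned above the configured threshold",
                      "Existing integration tests still pass"],
    artifacts={"design_note": "docs/rate-limiting.md"},
)

assert plan_to_execution.is_actionable()
```

The point of the sketch is not the data structure itself. It is that every downstream step receives intent, constraints, and success criteria already interpreted, instead of having to reconstruct them from raw context.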
This is not a model problem. It is a coordination problem.
The deeper pattern: learning from autopilot
As Maximilian Walterskirchen’s essay on piloting agentic engineering argues, autopilot did not simplify aviation. It forced the system around the cockpit to become more explicit.
Pilots were not removed from the loop. Their role shifted toward supervision, intervention, and takeover when automation no longer fit the situation. The new risk appeared at the handoff back to the human.
Automation does not remove the system. It shifts control within it.
That is the same structural change happening in software organizations. Agents do more local work, but the surrounding system still has to decide when to delegate, what to pass forward, how to validate results, and where human judgment remains necessary.
AI is not failing. Systems are failing under acceleration.
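One way to read that shift in code: the agent produces a result, but the surrounding system still owns validation and the decision to hand work back to a human. The function, field names, and thresholds below are illustrative assumptions, not part of any cited system.

```python
from enum import Enum, auto


class Disposition(Enum):
    ACCEPT = auto()      # validated, flows forward automatically
    RETRY = auto()       # validation failed on something the agent can fix
    ESCALATE = auto()    # ambiguity or risk that needs human judgment


# Illustrative sketch: the control layer, not the agent, decides what
# happens to a result. The checks and thresholds are stand-ins for real
# validation rules.
def route_result(result: dict) -> Disposition:
    checks_passed = result.get("checks_passed", False)
    confidence = result.get("confidence", 0.0)
    touches_protected_area = result.get("touches_protected_area", False)

    if touches_protected_area or confidence < 0.5:
        return Disposition.ESCALATE   # human judgment remains the backstop
    if not checks_passed:
        return Disposition.RETRY      # automation handles its own rework
    return Disposition.ACCEPT


# Example: a low-confidence change to a sensitive area is escalated, not merged.
print(route_result({"checks_passed": True, "confidence": 0.4,
                    "touches_protected_area": True}))
```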
The key insight: AI amplifies the system
The 2025 DORA report names the principle directly: “AI is an amplifier, not a fix.” Faster drafting does not repair unclear ownership, brittle reviews, or weak validation. It exposes them.
That is why the recurring debate about smarter models misses the main constraint.
The real question is not:
How do we make agents smarter?
It is:
How do we structure systems so intelligence can scale without losing control?
This series argues that agentic systems are not an AI problem, but a systems engineering problem.
The next question
If faster agents still yield brittle systems, the problem is not a lack of intelligence. It is that too much of the work is still being held together by operators compensating in real time.
So the next step is not to jump straight to architecture diagrams. It is to understand the maturity of the field itself, and why so many systems still depend on craft more than design.