Why Most AI Agents Fail When It Matters

Dmitriy Stepanov is Co-founder, CTO, CAIO, and Business Process Automation Expert at Glorium Technologies.

The velocity of the AI agent market makes it difficult to slow down, verify claims or test results against production conditions.

Thousands of vendors claim autonomous agent capabilities. However, Gartner estimates as few as 130 of them are genuine, with the rest flagged as what analysts call “agent washing,” a rebranding of basic automation or traditional RPA as autonomous AI agents.

One thing experts find curious is the way the market calls these systems “agents,” a word that implies they are steady colleagues you can lean on and delegate routine tasks to. But all it takes is one production deployment, and the “independent” agent becomes an unpredictable liability that’s powerful enough to authorize changes to a production database without warning.

And these are not hypothetical risks! They’re logged incidents that highlight how AI agents dazzle in demos but crumble under real production pressure, forcing companies to deal with the aftermath.

It’s intuitive to blame the model for these challenges, but the real culprits are bad evaluation criteria, deployment sequencing mistakes and gaps in supporting infrastructure.

The Problem Is Not The Model

The human reaction to every loss of confidence is to change the model, pick a better one, tune the prompts, add more guardrails to the system message and so on.

According to BCG’s survey of enterprise AI adopters, 70% of AI implementation challenges involve people and processes. Only 20% is attributed to technology problems and 10% to AI algorithms. This is the single most important ratio for this discussion because it highlights that the 70% is all about design: design of workflows, design of roles, design of where things go when something goes wrong.

Gartner predicts that “over 40% of agentic AI projects will be canceled by the end of 2027.” This won’t be because the models didn’t work but because the organizations deploying them didn’t have cost control, value metrics or risk management.

A longitudinal study conducted by researchers affiliated with Princeton University suggests the weakest link in AI implementation is predictability. The study of 14 frontier models over 18 months found that capability gains have not translated into gains in reliability. Benchmark accuracy improved, but consistency, robustness, predictability and safety did not. The models got smarter but not more reliable, which leaves the agent no better at knowing when it is wrong.
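To make the difference between accuracy and consistency concrete, here is a minimal sketch in Python. The task names, the run results and both metric definitions are illustrative assumptions, not the study's methodology: accuracy is taken as the share of all individual runs that pass, and consistency as the share of tasks that pass on every repeated run.

```python
# Hypothetical results: each task is run four times with the same input.
# True means the run passed. These values are made up for illustration.
RESULTS: dict[str, list[bool]] = {
    "refund_lookup": [True, True, True, True],
    "invoice_update": [True, True, False, True],
    "address_change": [True, False, True, True],
    "plan_downgrade": [True, True, True, False],
}


def accuracy(results: dict[str, list[bool]]) -> float:
    """Share of all individual runs that passed."""
    total_runs = sum(len(runs) for runs in results.values())
    passed_runs = sum(sum(runs) for runs in results.values())
    return passed_runs / total_runs


def consistency(results: dict[str, list[bool]]) -> float:
    """Share of tasks that passed on every one of their repeated runs."""
    fully_passing = sum(all(runs) for runs in results.values())
    return fully_passing / len(results)


if __name__ == "__main__":
    print(f"accuracy:    {accuracy(RESULTS):.4f}")
    print(f"consistency: {consistency(RESULTS):.4f}")
```

With this made-up data, accuracy is 0.8125 while consistency is only 0.25: the agent looks strong on an averaged benchmark, yet only one task in four can be trusted to succeed every time.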

Workflow Readiness Is The Differentiator

So, if model quality doesn’t decide the match, what does? The answer is how well the organization has restructured itself around the agent.

McKinsey’s 2025 State of AI survey shows that high-performing companies are 2.8 times more likely to have fundamentally redesigned their workflows around AI agents. Respondents who say they’ve experimented with agents without restructuring report only 10% adoption.

It’s clear: Agents don’t lift broken processes; they expose them.

The companies that get this right look very different from those that don’t. Smarsh deployed an AI customer support agent in financial services with limited scope, controlled execution and orchestration. They saw 59% adoption of customer self-service, 25% faster issue resolution and a 30% increase in productivity. Similarly, Zoom adopted an AI virtual agent with multistep routing, full observability and human intervention capabilities. Within three months, billing deflection increased from 0% to 30%, saving over 1,000 agent hours per month. Zoom went on to release Virtual Agent 3.0 as a customer-facing product in February 2026. The governance-first approach was validated as a viable operating model for AI systems.

The Speed Objection Doesn’t Survive The Data

In my experience, one of the most common objections is that governance is friction: extra steps to take before pushing code. Why go through the hassle if there’s another way that’s faster and less regulated? Speed feels like an advantage in the AI agent market, but the data suggests otherwise. Gartner’s predicted 40% cancellation rate points to what happens when organizations prioritize deployment over governance: expensive failures and programs set back by quarters.

In December 2025, the Financial Times reported (per TechTarget) a service disruption with Amazon’s AI coding agent, Kiro, which resulted in an outage affecting the AWS Cost Explorer. Amazon stated that this issue “stemmed from a misconfigured role” that was largely due to user error.

When a governance retrofit isn’t complete, the common response to incidents like these is to blame the model and call for better benchmarks, longer training or bigger context windows. But the model is not the reason peer review didn’t apply to the agent. That’s a governance gap. Governance is not friction. Cancellations, rollbacks, liability and lack of trust after a production incident are friction.

Plus, the regulations are catching up. In February 2026, NIST launched its AI Agent Standards Initiative, focusing on interoperability, security and governance. Today, AI agent behavior standards are no longer just a pipe dream, and the companies building AI governance platforms will need to be ahead of the regulatory curve. Otherwise, they’ll have to play catch-up.

What To Ask Before Your Next Deployment

The organizations that will thrive are not those with the most advanced models. They are the ones that built the wiring first—governance frameworks, monitoring infrastructure and error budgets—then matched agent capability to task risk with disciplined patience.

Before your next AI agent deployment, consider three questions: Are you measuring reliability or just accuracy? Have you started with tasks where failure is survivable? Is your infrastructure ready for compound errors when multistep workflows multiply uncertainty?
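The third question comes down to simple arithmetic, sketched below in Python. The sketch assumes every step in a workflow succeeds independently and with the same probability, which real workflows rarely satisfy exactly; the function name and the 95% figure are hypothetical and chosen only for illustration.

```python
def workflow_success_rate(step_success_rate: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps
    that each succeed with the same probability."""
    if not 0.0 <= step_success_rate <= 1.0:
        raise ValueError("step_success_rate must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must not be negative")
    return step_success_rate ** steps


if __name__ == "__main__":
    # Hypothetical agent that gets each individual step right 95% of the time.
    for steps in (1, 5, 10, 20):
        rate = workflow_success_rate(0.95, steps)
        print(f"{steps:>2} steps: {rate:.1%} end-to-end success")
```

Under these assumptions, an agent that is right 95% of the time per step completes a 5-step workflow about 77% of the time, a 10-step workflow about 60% of the time and a 20-step workflow about 36% of the time. Per-step accuracy that looks fine in a demo can still mean most long workflows fail somewhere along the way.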

The gap between demo success and production failure is measured in infrastructure, not model parameters. Fix that, and you fix the deployment.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
