Imagine you tell an AI agent to convert $10,000 in U.S. dollars to Canadian dollars by end of day. The agent executes — badly. It misreads parameters, makes an unauthorized leveraged bet, and your capital evaporates. Who’s responsible? Who pays you back?

Right now, nobody has to. And that, a group of researchers argues, is the defining vulnerability of the agentic AI era.

In a paper published on April 8, researchers from Microsoft Research, Columbia University, Google DeepMind, Virtuals Protocol, and the AI startup t54 Labs have proposed a sweeping new financial protection framework called the Agentic Risk Standard (ARS), designed to do for AI agents what escrow, insurance, and clearinghouses do for traditional financial transactions. The standard is open-source and available on GitHub via t54 Labs.

“We are talking about an entire ‘agentic economy’ here,” t54 founder Chandler Fang told Fortune in an emailed statement. “It is very different from simply using AI agents for financial tasks.” He said there are two fundamental types of agentic transactions: human-in-the-loop financial transactions and agent-autonomous transactions. Everyone’s focus is on the human-in-the-loop kind, he said, and that is a real problem: the financial ecosystem currently has no way to operate other than to defer all liability back to a human. It all comes down to the probabilistic nature of this technology, the researchers explained.

The probabilistic problem

The core problem the team identifies is what they call a “guarantee gap,” which they define as a “disconnect between the probabilistic reliability that AI safety techniques provide and the enforceable guarantees users need before delegating high-stakes tasks.” This description recalls what leadership expert Jason Wild previously told Fortune about how AI tools are probabilistic, befuddling managers everywhere. “Without a way to bound potential losses,” the t54 team wrote, “users rationally limit AI delegation to low-risk tasks, constraining the broader adoption of agent-based services.”

Model-level safety improvements, they argue, can reduce the probability of an AI failure, but cannot eliminate it. Large language models are inherently stochastic, meaning that no matter how well trained or well tuned an AI agent is, it can still hallucinate and make mistakes. When that agent is sitting on top of your brokerage account or executing financial API calls, even a single failure can produce immediate, realized loss.
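To see why “reduce but not eliminate” still leaves users exposed, consider a back-of-the-envelope calculation. The figures below are illustrative assumptions, not numbers from the paper:

```python
# Illustrative figures only; not drawn from the ARS paper.
tasks_per_month = 10_000       # assumed volume of delegated financial tasks
per_task_reliability = 0.999   # assumed 99.9% success rate after safety tuning
avg_loss_per_failure = 10_000  # assumed dollars at risk when a task fails

expected_failures = tasks_per_month * (1 - per_task_reliability)
expected_loss = expected_failures * avg_loss_per_failure

print(f"Expected failures per month: {expected_failures:.0f}")  # 10
print(f"Expected realized loss: ${expected_loss:,.0f}")         # $100,000
```

Even excellent per-task reliability leaves a residual loss that someone has to absorb. The paper’s point is that, today, no one is contractually obligated to.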

“Most trustworthy AI research aims to reduce the probability of failure,” said Wenyue Hua, a senior researcher at Microsoft Research. “That work is essential, but probability is not a guarantee. ARS takes a complementary approach: instead of trying to make the model perfect, we formalize what happens financially when it isn’t. The result is a settlement protocol where user protection is deterministic, not probabilistic.”

The researchers’ solution borrows directly from centuries of financial engineering. ARS introduces a layered settlement framework: escrow vaults that hold service fees and release them only upon verified task delivery; collateral requirements that AI service providers must post before accessing user funds; and optional underwriting, in which a risk-bearing third party prices the danger of an AI failure, charges a premium, and commits to reimbursing the user if things go wrong.
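None of this requires exotic machinery. A minimal sketch of the three layers might look like the following; the class names and interfaces are our own assumptions for illustration, not the published ARS specification on GitHub:

```python
from dataclasses import dataclass

@dataclass
class EscrowVault:
    """Holds the service fee; releases it only on verified task delivery."""
    fee: float

    def settle(self, delivery_verified: bool) -> float:
        # On verified success the provider is paid; otherwise the fee
        # returns to the user in full.
        return self.fee if delivery_verified else 0.0

@dataclass
class Provider:
    """An AI service provider posts collateral before accessing user funds."""
    collateral_posted: float = 0.0

    def post_collateral(self, required: float) -> None:
        self.collateral_posted = required

@dataclass
class Underwriter:
    """Optional risk-bearing third party that prices failure risk up front."""
    premium_rate: float  # e.g., 0.005 = 0.5% of the exposure

    def price(self, exposure: float) -> float:
        # Charges a premium proportional to the capital at risk.
        return exposure * self.premium_rate

    def reimburse(self, user_loss: float) -> float:
        # Commits to making the user whole on a verified claim.
        return user_loss
```

In this sketch, a failed fund-exchange task would be covered first by the provider’s posted collateral and then, if that is exhausted, by the underwriter.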

The framework distinguishes between two types of AI jobs. Standard service tasks — generating a slide deck, writing a report — carry limited financial exposure, so escrow-based settlement is sufficient. Tasks involving the exchange of funds — currency trading, leveraged positions, financial API calls — require the agent to access user capital before outcomes can be verified, which is where underwriting becomes essential. It is the same logic that governs derivatives markets, where clearinghouses stand between counterparties so that a single default doesn’t cascade.
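That split implies a simple routing rule: if the agent must hold user capital before the outcome can be verified, underwriting kicks in; otherwise escrow alone is enough. A hypothetical version in code (our illustration, not the paper’s specification):

```python
def settlement_path(touches_user_capital: bool) -> str:
    """Pick a settlement layer based on a task's financial exposure.

    Standard service tasks (slide decks, reports) put only the service fee
    at stake, so escrow suffices. Fund-exchange tasks (currency trades,
    leveraged positions) hand the agent user capital before results can be
    verified, so collateral and underwriting must stand behind them.
    """
    if touches_user_capital:
        return "escrow + collateral + underwriting"
    return "escrow"

assert settlement_path(False) == "escrow"  # e.g., generating a report
assert settlement_path(True) == "escrow + collateral + underwriting"  # e.g., an FX trade
```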

The paper maps ARS explicitly against existing risk-allocation industries in a table: construction uses performance bonds, e-commerce uses platform escrow, financial markets use margin requirements and clearinghouses, and DeFi uses smart contract collateralization. AI agents, the researchers argue, are simply the next high-stakes service category that needs its own version of that infrastructure.

The timing is crucial

Financial regulators are already circling. FINRA’s 2026 regulatory oversight report, released in December, included a first-ever section on generative AI, warning broker-dealers to develop procedures specifically targeting hallucinations and to scrutinize AI agents that may act “beyond the user’s actual or intended scope and authority.” The SEC and other agencies are watching closely.

But ARS is pitched as something regulators haven’t yet built: not a set of rules, but a protocol — a standardized state machine that governs how funds are locked, how claims are filed, and how reimbursements are triggered when an AI agent fails. The researchers acknowledge ARS is one layer of a larger trust stack, and that the real bottleneck will be building accurate risk-pricing models for agentic behavior.
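What might that protocol look like in practice? Here is a rough sketch of the kind of state machine the researchers describe, with state names of our own invention rather than the published spec’s:

```python
from enum import Enum, auto

class SettlementState(Enum):
    FUNDS_LOCKED = auto()  # escrow and collateral are in place
    DELIVERED = auto()     # the agent's output is submitted for verification
    SETTLED = auto()       # verified success: the fee is released to the provider
    CLAIM_FILED = auto()   # the user disputes the outcome
    REIMBURSED = auto()    # the claim is upheld: the user is repaid

# Every legal transition is enumerated; anything else is rejected. This is
# what makes the user's protection deterministic rather than probabilistic.
TRANSITIONS = {
    SettlementState.FUNDS_LOCKED: {SettlementState.DELIVERED, SettlementState.CLAIM_FILED},
    SettlementState.DELIVERED: {SettlementState.SETTLED, SettlementState.CLAIM_FILED},
    SettlementState.CLAIM_FILED: {SettlementState.REIMBURSED},
}

def advance(state: SettlementState, nxt: SettlementState) -> SettlementState:
    if nxt not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state.name} -> {nxt.name}")
    return nxt
```

However the real specification carves up its states, the point is the same: every dollar has a defined place to go at every step, including the step where the agent fails.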

“This paper is the first step in setting up a high-level framework to capture the end-to-end process associated with agent-autonomous transactions and what the risk assessment looks like,” Fang told Fortune. “Further down the road, we should introduce more specific details, models, and other research to understand how we figure out risk across different use cases.”
