JPMorgan runs 500+ AI use cases in production. Klarna's assistant did the work of 700 full-time human agents and added an estimated $40M to profit. Enterprise deployments now average 171% ROI. The AI agents that win aren't running better models than everyone else; they're built on a discipline you can copy.
The argument about whether AI agents work is over. In 2026 the evidence stopped being anecdotal and started showing up on balance sheets.
- JPMorgan runs more than 500 AI use cases in production — real-time fraud detection, anti-money-laundering screening with far fewer false positives, liquidity forecasting, document processing. Not pilots. Live, every day, at the scale of one of the largest banks on earth.
- Klarna’s AI assistant handled two-thirds of customer-service chats in its first month: 2.3 million conversations, the equivalent work of 700 full-time agents. It resolved issues in under two minutes, down from eleven, around the clock in 35+ languages, and added an estimated $40 million in profit improvement in a single year.
- Across the market, enterprise agent deployments are now returning an average 171% ROI — 192% in the US — roughly three times traditional automation. Agentic AI is the fastest-growing enterprise-tech priority, up 31.5% year over year.
This is what “production AI” looks like when it lands. The question worth your time is no longer “do agents work?” but “why do some deployments compound while others stall in pilot?” The data has an unusually clean answer.
It isn’t the model
Here’s the part that should change how you budget. The companies posting these numbers are not running secret, superior models. JPMorgan, Klarna, and a mid-market team in your city all reach for the same frontier models. Fewer than 8% of agent projects that stall are blocked by model capability at all.
What separates the deployments that pay off — and keep paying off — is everything around the model. The 2026 ROI research is blunt about it: the roughly one-in-eight agent projects that reach durable production share four traits, and not one of them is “a better model.” They are:
- Infrastructure built before deployment, not bolted on after.
- Governance documented before launch — what the agent can touch, and what it can’t.
- Baseline metrics captured before the pilot, so “is this working?” has a real answer.
- A named business owner accountable after it ships, not just during the demo.
That’s not a model-selection problem. It’s an operational discipline — and it’s learnable, repeatable, and exactly where most of the value is won or lost.
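One way to make the governance, baseline, and ownership traits concrete is to write them down as a small, deny-by-default record before launch. The sketch below is illustrative only: `AgentPolicy`, its field names, the action names, and the baseline value are all hypothetical placeholders, not details from any deployment mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical pre-launch record for a single agent."""
    name: str
    owner: str                       # named business owner, accountable after it ships
    allowed_actions: frozenset[str]  # what the agent can touch
    baseline_metric: str             # the number captured before the pilot
    baseline_value: float

    def permits(self, action: str) -> bool:
        # Deny by default: anything not explicitly listed is off limits.
        return action in self.allowed_actions


# Placeholder values for illustration only.
policy = AgentPolicy(
    name="refund-triage",
    owner="head-of-support",
    allowed_actions=frozenset({"read_order", "draft_reply"}),
    baseline_metric="median_resolution_minutes",
    baseline_value=11.0,
)

print(policy.permits("read_order"))    # listed, so allowed
print(policy.permits("issue_refund"))  # not listed, so refused
```

The design choice worth noting is the allowlist: the agent's reach is whatever was documented, and nothing else, so a new capability requires an explicit decision by the named owner.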
What the winners actually do
Look closely at JPMorgan’s 500-plus use cases and the same pattern keeps repeating:
- Each agent is scoped to a specific job. A fraud model has a defined input, a defined action, and a metric it lives or dies by. It isn’t “AI for the fraud team” in the abstract — it’s “cut false positives by X so the team works the cases that matter.” Capacity added, scope bounded.
- Each one is measured on the traffic it actually sees, not a demo. The operating number — resolution time, false-positive rate, deflection on real cases — tells you it’s working before customers do.
- Each one has a human path and a clear owner. The agent knows what it can’t handle and routes it; someone is accountable for its number ninety days from now.
Scope, measurement, ownership, a clean handoff. None of it is exotic. All of it is the difference between an agent that compounds and one that quietly rots in a proof-of-concept folder.
What to do this week
- Pick one bounded, high-volume task — the kind of repetitive, well-defined work where a fast, correct, always-on answer beats a queue. That’s where the first durable win almost always lives.
- Write down the operating metric before you build. Decide the one number that says “this is working,” and make sure it’s measured on real traffic, not the demo.
- Name an owner and define the handoff. One person accountable for the result; one clear rule for what the agent escalates to a human. Those two decisions, made on day one, are what keep a win a win.
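The second and third steps can be sketched in a few lines: one explicit escalation rule, and one operating metric computed only from the cases the agent actually handled. Everything here is an assumption for illustration: the `Case` fields, the two thresholds, and the sample data are hypothetical, and a real rule would come from your own governance document.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Case:
    confidence: float          # agent's confidence score, 0..1 (assumed to be available)
    amount: float              # value at stake in the case
    resolution_minutes: float  # time to resolution on real traffic


# Hypothetical thresholds; set these on day one with the named owner.
CONFIDENCE_FLOOR = 0.8
AMOUNT_CEILING = 500.0


def should_escalate(case: Case) -> bool:
    """One clear rule: hand off low-confidence or high-value cases to a human."""
    return case.confidence < CONFIDENCE_FLOOR or case.amount > AMOUNT_CEILING


def operating_metric(cases: list[Case]) -> Optional[float]:
    """Median resolution time over the cases the agent kept, or None if it kept none."""
    handled = [c.resolution_minutes for c in cases if not should_escalate(c)]
    return median(handled) if handled else None


def escalation_rate(cases: list[Case]) -> float:
    """Share of cases routed to a human."""
    return sum(should_escalate(c) for c in cases) / len(cases) if cases else 0.0


# Made-up sample traffic.
traffic = [
    Case(confidence=0.95, amount=40.0, resolution_minutes=2.0),
    Case(confidence=0.60, amount=40.0, resolution_minutes=9.0),   # low confidence
    Case(confidence=0.90, amount=900.0, resolution_minutes=3.0),  # high value
    Case(confidence=0.85, amount=120.0, resolution_minutes=4.0),
]

print(operating_metric(traffic))  # compare against the pre-pilot baseline
print(escalation_rate(traffic))
```

Tracking the escalation rate next to the operating metric keeps the number honest: a resolution time that improves only because more cases are being handed off will show up immediately.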
The bigger picture
The headlines will keep swinging between hype and skepticism, but the operators have already moved on. They’re not debating whether agents work — they’re shipping their fifth, their fiftieth, their five-hundredth, and booking the return.
The model is now a commodity; everyone has the same one. The advantage is in the operational layer — scoped, measured, owned, and built to run in production rather than impress in a demo. That layer is exactly what we build, and it’s the difference between an agent that makes the press release and one that’s still paying for itself a year later. The proof that agents work is in. The only open question is who builds them to last.