Moving AI agents from demos to money movement

What stops agents from executing in payments production
Four essentials to build or buy before agents can act in payment flows
How we're approaching AI agents at Nium
Where the real investment needs to go
The startup advantage
What this means for engineering leaders

Sekhar Cidambi

CTO

AI agents work great until you need them to do something that can't be wrong.

Take the AI agents we run in production at Nium. They handle customer onboarding, process thousands of documents, and accelerate compliance workflows. They’re fast, cost-effective, and reliable. But they don’t execute; they recommend actions. Humans make the final call.

This isn't caution. It's reality.

When you're processing payments across 100+ jurisdictions with different regulatory requirements, or making compliance decisions that carry legal weight, probabilistic systems just don’t cut it. You need deterministic behavior. You need explainability. You need auditability that meets regulatory standards.

I was on a panel recently at an AWS Builder Event with engineering leaders from Big EV and Visa, in front of a room full of CTOs and VPs of Engineering building fintech infrastructure. Everyone is grappling with this tension right now: the gap between "agent suggests" and "agent executes" is wide. But it's starting to close because teams are building the infrastructure layer that’s needed first.

In short, the hardest AI agent problems right now aren’t about AI — they’re about the plumbing.

What stops agents from executing in payments production

Let me be specific about what this gap looks like in practice.

When we deploy an AI agent for customer onboarding at Nium, it reads documents, extracts information, flags potential issues, validates information with other sources, and recommends next steps. A human reviews the recommendation, makes the final decision, and executes. The agent accelerates the process — we're seeing significant improvements in speed and cost-effectiveness — but it doesn't act autonomously.

Why not? The standard answer is "hallucinations" or "probabilistic behavior," and those are real concerns. But the actual blocker is deeper. In a regulated industry, we need:

Explainability for every action taken.
Auditability with attestation that can be reviewed later.
Standard operating procedures that must be followed exactly.
Guaranteed performance, not "usually correct" but deterministic behavior.

Whether you're processing a $10,000 payment or a $10 million one, you can't have mistakes. The compliance requirements, the audit trail, and the regulatory obligations all demand certainty. Probabilistic systems, by definition, don't provide that.

So the real question isn't "are the models good enough," but rather “what infrastructure layer needs to exist before we can safely let agents act.”

Four essentials to build or buy before agents can act in payment flows

During the panel discussion, the question of what’s actually missing became the focal point. Not "what can LLMs do?" but "what needs to be built around them?"

Here's what's missing in most production fintech stacks today:

A comprehensive recording layer.
Every action an agent takes or considers taking needs to be logged with full context. Not just what it did, but why. What data it saw, what decision path it followed, what alternatives it considered. This isn't standard application logging. It's a specialized audit trail that meets regulatory requirements and supports post-action review.
Maker-checker patterns at the model level.
One approach gaining traction: using separate LLMs, where one acts as the "maker" (proposes an action) and another acts as the "checker" (validates before execution). Startups are already building the different pieces of this verification puzzle. It's early, but the architecture makes sense. You're essentially building separation of duties into the agent layer itself.
Robust evaluation frameworks.
Not post-deployment monitoring, but development-time eval that tests agents against diverse, real-world datasets. How does the agent behave when it encounters an edge case? A document format it hasn't seen? A jurisdiction with different requirements? You need comprehensive test coverage before production, not after.
Learning loops that work in regulated environments.
Agents need to improve over time, but in fintech you can't just let them learn freely in production. You need structured feedback mechanisms where the agent encounters new stimuli, humans evaluate the behavior, and the system learns. Almost like teaching, not unsupervised training. The learning has to be deliberate and auditable.
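
As a rough illustration of the recording layer described in the list above, here is a minimal Python sketch of an append-only audit log in which each record commits to the previous one through a SHA-256 digest. The names AuditRecord and AuditLog and all field names are hypothetical, chosen for this example. This is not Nium's implementation, and hash chaining is only one possible way to make a log tamper-evident.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AuditRecord:
    """One agent decision, captured with the context a reviewer would need."""
    agent_id: str
    action: str          # what the agent proposed or did
    inputs: dict         # the data the agent saw
    rationale: str       # why it chose this action
    alternatives: list   # options it considered and rejected
    prev_hash: str       # digest of the previous record in the log

    def digest(self) -> str:
        # Canonical JSON (sorted keys) so the same record always hashes the same.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class AuditLog:
    """Append-only log; each record commits to the one before it."""
    GENESIS = "0" * 64

    def __init__(self):
        self._records = []

    def append(self, agent_id, action, inputs, rationale, alternatives):
        prev = self._records[-1].digest() if self._records else self.GENESIS
        record = AuditRecord(agent_id, action, inputs, rationale, alternatives, prev)
        self._records.append(record)
        return record

    def verify(self) -> bool:
        # Walk the chain; an edit to any record that has a successor breaks a link.
        expected = self.GENESIS
        for record in self._records:
            if record.prev_hash != expected:
                return False
            expected = record.digest()
        return True
```

One limitation of this sketch: nothing inside the log commits to the newest record, so a real system would also need to anchor the latest digest somewhere the writer cannot change.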

These aren't incremental improvements to existing infrastructure. They're net-new capabilities that most fintech stacks don't have yet.
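
The maker-checker separation from the list above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: Proposal, Verdict, run_maker_checker, and the two stub functions are hypothetical names, and the stubs stand in for what would be two separate model calls in practice.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Proposal:
    action: str
    amount_minor: int   # amount in minor units (e.g. cents) to avoid float rounding
    currency: str
    rationale: str


@dataclass(frozen=True)
class Verdict:
    approved: bool
    reason: str


def run_maker_checker(
    case: dict,
    maker: Callable[[dict], Proposal],
    checker: Callable[[dict, Proposal], Verdict],
    execute: Callable[[Proposal], None],
    escalate: Callable[[Proposal, Verdict], None],
) -> Verdict:
    """The maker proposes; nothing executes unless the checker approves."""
    proposal = maker(case)
    verdict = checker(case, proposal)   # checker sees only the case and the proposal
    if verdict.approved:
        execute(proposal)
    else:
        escalate(proposal, verdict)     # rejected proposals go to a human queue
    return verdict


# Hypothetical stand-ins for the two models.
def stub_maker(case: dict) -> Proposal:
    return Proposal("payout", case["requested_minor"], case["currency"], "invoice matched")


def stub_checker(case: dict, proposal: Proposal) -> Verdict:
    if proposal.amount_minor > case["limit_minor"]:
        return Verdict(False, "amount exceeds case limit")
    return Verdict(True, "within case limit")
```

The design point is that the execute path is reachable only through the checker's verdict, so separation of duties is enforced by the control flow rather than by convention.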

How we're approaching AI agents at Nium

Our current philosophy is cautious by design, not cautious by nature.

When we deploy a new AI agent capability, we run it in shadow mode first. The agent processes real production data, for example actual customer documents during onboarding, but it doesn't take any actions. It makes recommendations and we evaluate how well it performs against what a human would have done.

This runs for months, not weeks.

That might sound slow in an industry where teams want to ship in days. But in a regulated environment, with compliance obligations across 100+ jurisdictions, prudence justifies the timeline.
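
The shadow-mode comparison described above amounts to scoring agent recommendations against what humans actually decided. A minimal Python sketch follows, assuming each case yields one agent recommendation and one human decision. ShadowResult and summarize_shadow_run are hypothetical names, and a real evaluation would likely track far more than a single agreement rate.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowResult:
    case_id: str
    agent_recommendation: str   # what the agent would have done
    human_decision: str         # what the human reviewer actually did


def summarize_shadow_run(results):
    """Return (agreement_rate, disagreements) for a batch of shadow results."""
    total = len(results)
    if total == 0:
        raise ValueError("no shadow results to summarize")
    # Count each (agent, human) mismatch so reviewers can see where the agent drifts.
    disagreements = Counter(
        (r.agent_recommendation, r.human_decision)
        for r in results
        if r.agent_recommendation != r.human_decision
    )
    agreement_rate = (total - sum(disagreements.values())) / total
    return agreement_rate, disagreements
```

Keeping the disagreement pairs, not just the rate, matters here: an agent that approves what a human would reject is a different risk from one that escalates what a human would approve.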

During shadow mode, we're building the infrastructure I described above. The recording layer. The eval framework. The verification patterns. We're treating this as an infrastructure build, not a model deployment.

And we're holding ourselves to a higher standard than regulations currently require. Regulations will catch up eventually, but right now the responsibility is on us — the organizations building these systems — to define what "safe" means for AI agents in fintech. We must be very thoughtful about how we design, build, and test these agentic systems before they touch money movement.

Where the real investment needs to go

One of the audience members asked where I would put $100 million if I had it to invest in AI infrastructure for fintech. Here’s my answer:

Agentic payments infrastructure.
Not the agents themselves, but the ecosystem around them. When agents start making micropayments to each other (and that's coming), you need know-your-agent (KYA) frameworks, delegation mechanisms, and trust protocols. How does a user safely delegate an agent to act on their behalf? How do you verify an agent's identity and authority? This is net-new infrastructure that needs to be built at scale.
Solving for deterministic behavior from probabilistic systems.
This is the core technical challenge. Can you build verification layers that ensure deterministic outcomes even when the underlying system is probabilistic? It's not about making LLMs deterministic. It's about the architecture around them. This is hard infrastructure work, and whoever solves it first unlocks autonomous agents in regulated industries.
AI security.
The attack surface is expanding. Bad actors are using the same AI tools we are, and they're highly motivated. We need agentic security: agents that defend against agents. We need AI-powered vulnerability detection. We need threat models that account for adversarial AI. Security is always infrastructure, but AI-powered security is becoming critical infrastructure.
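
One narrow piece of the delegation and determinism questions above can be sketched in Python: whatever a model proposes, the decision to allow it is made by a plain rule check against an explicit mandate. Mandate, PaymentRequest, and authorize are hypothetical names for this example. This is not an existing KYA standard or protocol, and it leaves out how an agent's identity would actually be proven.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Mandate:
    """What a user has delegated to an agent (hypothetical shape)."""
    agent_id: str
    principal_id: str
    max_amount_minor: int          # per-payment cap, in minor units
    allowed_currencies: frozenset
    expires_at: datetime


@dataclass(frozen=True)
class PaymentRequest:
    agent_id: str
    amount_minor: int
    currency: str


def authorize(request: PaymentRequest, mandate: Mandate, now: datetime):
    """Return (allowed, reason). Pure function: same inputs, same answer."""
    if request.agent_id != mandate.agent_id:
        return False, "agent is not the mandate holder"
    if now >= mandate.expires_at:
        return False, "mandate expired"
    if request.currency not in mandate.allowed_currencies:
        return False, "currency not covered by mandate"
    if request.amount_minor <= 0 or request.amount_minor > mandate.max_amount_minor:
        return False, "amount outside mandate limit"
    return True, "within mandate"
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the check reproducible, so an auditor can replay a past decision and get the same result.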

These aren't moonshots. They're engineering problems with clear requirements. And I'm optimistic we'll see significant progress in the next six months, not because the models will get dramatically better, but because the infrastructure around them will mature.

The startup advantage

One thing that became clear during the panel: startups have a real advantage here.

Someone asked how small teams compete with big tech in AI. My answer: big tech has big tech debt. Legacy systems, regulatory constraints they've already locked in, architectural decisions made years ago. Startups building AI-native infrastructure today can make different choices.

And with AI-assisted development, small teams can move incredibly fast. The innovation happening in the SF Bay Area right now is remarkable, as is the number of startups working on different pieces of this puzzle: verification layers, eval frameworks, security tooling, agentic orchestration. Each one is solving a specific infrastructure problem that the industry needs.

The companies that will win aren't necessarily the ones with the best models. They're the ones building the right infrastructure layer between "demo" and "production."

What this means for engineering leaders

If you're a CTO or VP of Engineering evaluating AI agents for your fintech stack, here's what I'd ask:

Do you have the infrastructure to record, explain, and audit every action an agent would take? Not logging; audit trails that meet regulatory standards.
Can you run agents in shadow mode against production data for months before they act? Do you have the eval framework to measure their performance accurately?
What's your maker-checker architecture? How are you separating proposal from verification?

If those questions don't have clear answers, you're not ready to let agents act. And that's fine; most teams aren't. The work right now is building that foundation.

The future of AI agents in fintech isn't about waiting for better models. It's about building the infrastructure layer that makes autonomous action safe, auditable, and reliable. The teams doing that work now, deliberately, thoughtfully, with the right architectural patterns, will be the ones shipping agents that actually move money.

Not in five years. In the next 12-18 months. But only if the plumbing is built right.
