Notes on building a production AI agent
Lessons from owning the backend of a real-time, text-and-voice AI agent that validates insurance claims — and from becoming the person a team builds its agent workflows around.
There's a wide gap between a demo agent and one you trust with real claims. At Paiv I've spent my time in that gap, building and owning the backend for an AI agent that validates insurance-claim estimates in real time, over both text and voice. These are a few things I've learned getting an agent to production and keeping it there.
The agent is only as good as its tools
A language model is the easy part. The agent is reliable because of the layer underneath it — the tools it can call, the database queries those tools run, and the orchestration that decides what happens when. That's where the real engineering lives: making the agent's judgments accurate and fast enough to catch missing photos, details, and documents before an estimate reaches a human reviewer.
When something goes wrong with an agent, it's almost never the prompt. It's a tool returning the wrong shape, a query that's subtly off, or an orchestration path nobody tested. Treat the tool layer like the production surface it is.
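One way to treat the tool layer as a production surface is to check every tool result against an explicit shape before the model sees it. The sketch below is illustrative only: the tool, the field names (`claim_id`, `missing_photos`, `missing_documents`), and the `ToolShapeError` type are all hypothetical, not the actual Paiv schema.

```python
from dataclasses import dataclass
from typing import Any, List


class ToolShapeError(ValueError):
    """Raised when a tool result does not match the shape the agent expects."""


@dataclass(frozen=True)
class MissingItemsResult:
    # Hypothetical result of a hypothetical "find missing items" tool.
    claim_id: str
    missing_photos: List[str]
    missing_documents: List[str]


def parse_missing_items(raw: Any) -> MissingItemsResult:
    """Validate a raw tool payload before the agent reasons over it."""
    if not isinstance(raw, dict):
        raise ToolShapeError(f"expected an object, got {type(raw).__name__}")

    claim_id = raw.get("claim_id")
    if not isinstance(claim_id, str) or not claim_id:
        raise ToolShapeError("claim_id must be a non-empty string")

    def string_list(key: str) -> List[str]:
        # A bare string here is the classic "wrong shape" bug: it is
        # iterable, so downstream code silently loops over characters.
        value = raw.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ToolShapeError(f"{key} must be a list of strings")
        return list(value)

    return MissingItemsResult(
        claim_id=claim_id,
        missing_photos=string_list("missing_photos"),
        missing_documents=string_list("missing_documents"),
    )
```

Failing loudly at the tool boundary turns a confusing agent answer into an ordinary, debuggable exception with the offending field named in the message.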
Reliability is a data-integrity problem
The agent sits inside an event-driven pipeline handling roughly 100,000 events a day. At that volume, the interesting bugs are concurrency bugs — races and ordering problems that never show up in a demo and absolutely show up in production. A lot of "make the agent reliable" turned out to mean "make the data underneath it correct," which is a different and more durable kind of work.
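A common defense against races and out-of-order delivery is to give each event a version and accept a write only if it is newer than what is already stored. The sketch below is an in-memory stand-in, not the real pipeline: `ClaimStore`, `apply_event`, and the per-claim integer version are assumptions made for illustration. In a real datastore, a conditional write would play the role the lock plays here.

```python
import threading
from typing import Dict, Optional


class ClaimStore:
    """In-memory stand-in for a table that supports conditional writes."""

    def __init__(self) -> None:
        self._versions: Dict[str, int] = {}
        self._states: Dict[str, str] = {}
        self._lock = threading.Lock()

    def apply_event(self, claim_id: str, version: int, state: str) -> bool:
        """Apply an event only if it is newer than the stored version.

        Returns True if the write happened, False if the event was stale
        or a duplicate. The check and the write share one lock, so two
        handlers cannot both pass the check and overwrite each other.
        """
        with self._lock:
            current = self._versions.get(claim_id, 0)
            if version <= current:
                return False
            self._versions[claim_id] = version
            self._states[claim_id] = state
            return True

    def state_of(self, claim_id: str) -> Optional[str]:
        """Return the latest accepted state, or None for an unknown claim."""
        with self._lock:
            return self._states.get(claim_id)


store = ClaimStore()
store.apply_event("c-1", 2, "photos_received")  # arrives first
store.apply_event("c-1", 1, "created")          # late and stale: rejected
```

With this rule, redelivered and reordered events become harmless no-ops instead of silently rolling a claim back to an older state.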
Make agentic engineering a team capability
The highest-leverage thing I did wasn't a feature — it was infrastructure for how we build. I designed custom skills, hooks, and steering files for Cursor, Claude Code, and Kiro that let coding agents autonomously query DynamoDB and RDS, deploy through SAM, and parse CloudWatch logs.
The point is repeatability: when an agent workflow lives in shared, version-controlled config instead of one person's head, the whole team gets faster, and the workflows get better because more people use them. That config became the reference other engineers used to build their own setups.
Own the whole stack when the team is small
As the team contracted, the most useful thing I could be was someone who'd pick up whichever layer needed to move: the agent backend, real-time messaging on Lambda + AppSync + DynamoDB, or the React and React Native frontends. Full-stack context is also what makes good debt-reduction calls possible: consolidating feature flags into org-level settings, sunsetting low-usage features, and pruning services you can only safely cut when you understand how they connect.
The throughline: a production agent is a systems problem, not a prompting problem. The model is the part you think about least.