Agents — Field Note

Why Agent Demos Break in Production

Agent demos are optimized for spectacle. Production agents need memory boundaries, verifiable action, recovery paths, and a clear theory of when not to act.

May 02, 2025 · 8 min · Field Notes

The demo version of an agent is a magic trick: clean context, friendly tools, no legacy mess, and a user who knows exactly what to ask. Production is different. Production is a thousand small exceptions wearing a familiar interface.

Agents break when they are allowed to act without understanding the shape of responsibility around the action. Sending an email is not hard. Knowing whether this email should be sent, what prior conversation it depends on, what promise it implies, and how to recover if the premise is wrong is the real system.

A reliable agent needs at least four layers. It needs memory that distinguishes durable facts from temporary notes. It needs tools with explicit permissions and typed failure modes. It needs guardrails that are not just refusal policies but operational constraints. And it needs feedback loops from the real world so it can notice when its model of the work is drifting.
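To make the second layer concrete, here is a minimal sketch of a tool wrapper with explicit permissions and typed failure modes. Everything named here (`Tool`, `ToolResult`, `Failure`, `call_tool`, and the permission strings) is hypothetical and chosen for illustration; it is not taken from any particular agent framework.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, FrozenSet, Optional


class Failure(Enum):
    """Hypothetical failure categories; a real system would define its own."""
    PERMISSION_DENIED = "permission_denied"
    TOOL_ERROR = "tool_error"


@dataclass(frozen=True)
class ToolResult:
    ok: bool
    value: Optional[str] = None
    failure: Optional[Failure] = None
    detail: str = ""


@dataclass(frozen=True)
class Tool:
    name: str
    required_permission: str      # illustrative permission string
    run: Callable[[str], str]


def call_tool(tool: Tool, arg: str, granted: FrozenSet[str]) -> ToolResult:
    """Run a tool only if its permission was granted.

    Failures come back as typed values instead of exceptions, so the
    caller has to handle them explicitly.
    """
    if tool.required_permission not in granted:
        return ToolResult(
            ok=False,
            failure=Failure.PERMISSION_DENIED,
            detail=f"{tool.name} needs {tool.required_permission}",
        )
    try:
        return ToolResult(ok=True, value=tool.run(arg))
    except Exception as exc:  # turn any tool exception into a typed failure
        return ToolResult(ok=False, failure=Failure.TOOL_ERROR, detail=str(exc))


# Stand-in tool: it only formats a string and sends nothing.
send_email = Tool("send_email", "email:send", lambda body: f"sent: {body}")

denied = call_tool(send_email, "hello", frozenset({"email:draft"}))
allowed = call_tool(send_email, "hello", frozenset({"email:send"}))
```

In this sketch a missing permission and a crashing tool are different values of `Failure`, so the agent loop can treat "not allowed" differently from "try again later".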

The most important design choice is often restraint. A useful agent should know when to draft instead of send, ask instead of infer, summarize instead of decide, and stop instead of continuing a flawed chain of action. Autonomy without calibrated hesitation is just a faster way to create cleanup work.
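One way to picture calibrated hesitation is as a small decision function that runs before any action. The sketch below is an assumption, not a recommended policy: the `ProposedAction` fields, the `decide` function, and the thresholds 0.9 and 0.5 are all invented for illustration, and a real system would need its own signals and tuning.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACT = "act"      # carry out the action directly
    DRAFT = "draft"  # prepare the action for a human to review
    ASK = "ask"      # request clarification instead of inferring
    STOP = "stop"    # halt the chain of action


@dataclass(frozen=True)
class ProposedAction:
    description: str
    confidence: float       # hypothetical self-estimate in [0, 1]
    reversible: bool        # can the effect be undone cheaply?
    premise_verified: bool  # were the upstream assumptions checked?


def decide(action: ProposedAction,
           act_threshold: float = 0.9,
           ask_threshold: float = 0.5) -> Decision:
    """Pick the most restrained option that still makes progress."""
    if not action.premise_verified:
        return Decision.STOP   # do not continue a chain built on an unchecked premise
    if action.confidence < ask_threshold:
        return Decision.ASK    # too uncertain to infer intent
    if not action.reversible or action.confidence < act_threshold:
        return Decision.DRAFT  # irreversible or borderline: hand over a draft
    return Decision.ACT


email = ProposedAction("send renewal email", 0.95, reversible=False, premise_verified=True)
label = ProposedAction("add a label to a ticket", 0.95, reversible=True, premise_verified=True)
```

Under these assumed thresholds, `decide(email)` returns `Decision.DRAFT` because a sent email cannot be recalled, while `decide(label)` returns `Decision.ACT`.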

This is why provenance matters. People need to inspect the path from evidence to recommendation to action. They need to see which source was used, which assumption was made, and which human decision closed the loop. Without that, trust becomes a mood instead of a property of the system.
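The path from evidence to recommendation to action can be captured as a small record. This is a minimal sketch under stated assumptions: the `ProvenanceRecord` type, its field names, and the example ticket identifier are hypothetical, and a production system would likely persist such records rather than hold them in memory.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    recommendation: str
    sources: Tuple[str, ...]           # which evidence was used
    assumptions: Tuple[str, ...]       # what was assumed along the way
    approved_by: Optional[str] = None  # which human decision closed the loop
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def is_closed(self) -> bool:
        """Closed means there is evidence on record and a human signed off."""
        return bool(self.sources) and self.approved_by is not None

    def explain(self) -> str:
        """Render the record as a short, human-readable trail."""
        approver = self.approved_by or "nobody yet"
        return "\n".join([
            f"Recommendation: {self.recommendation}",
            "Sources: " + (", ".join(self.sources) or "none recorded"),
            "Assumptions: " + (", ".join(self.assumptions) or "none recorded"),
            f"Approved by: {approver}",
        ])


# Illustrative data only; "ticket-123" is a made-up identifier.
record = ProvenanceRecord(
    recommendation="Offer a refund",
    sources=("ticket-123",),
    assumptions=("customer is on the annual plan",),
)
approved = replace(record, approved_by="support lead")
```

Here `record.is_closed()` is false until a human approval is attached, which gives a reviewer something concrete to check instead of a feeling that the agent is probably right.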

I am interested in agents that make production work feel calmer: systems that remember what matters, act when it matters, and can prove why it mattered afterwards.

[Figure: watercolor sketch of an agent loop with memory, tools, guardrails, and feedback arrows, showing what production agents need beyond the demo.]

Themes: Agents, Reliability, Provenance
