When an agent does something unexpected, developers reach for human explanations. "It got confused." "It misunderstood the intent." "It was being lazy." These phrases feel descriptive. They're not. They're a way of avoiding the real question: what actually happened in the system?

Anthropomorphizing is comfortable because it maps agent behavior onto a domain we already understand — human cognition. We know how to handle a confused colleague. We don't always know how to handle a transformer model that's producing unexpected token sequences. So we translate the unfamiliar into the familiar, and in doing so, we lose precision.

The cost shows up in debugging. If the agent "misunderstood," the fix is to explain more clearly: rewrite the prompt, add more context, be more explicit. Sometimes that's right. But sometimes the agent didn't misunderstand anything. It understood perfectly and the instructions were contradictory, or the retrieved document contained stale data, or the tool returned a malformed response that the agent handled gracefully in a way that produced the wrong result. None of those are misunderstandings. They're system failures with specific causes. And you'll never find them if you stop at "it got confused."
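To make that last failure mode concrete, here's a sketch in Python of how "graceful" handling of a malformed response manufactures a wrong but plausible answer. The tool, its response shape, and the schema change are all hypothetical, not any real API:

```python
# Hypothetical search tool whose response schema changed: it used to return
# {"results": [...]}, but now returns {"items": [...]}.
def search_documents(query: str) -> dict:
    return {"items": [{"title": "Q3 report", "score": 0.92}]}

def get_search_results(query: str) -> list:
    response = search_documents(query)
    # "Graceful" handling: the missing key silently becomes an empty list,
    # and the agent dutifully reports that no documents matched.
    return response.get("results", [])

def get_search_results_strict(query: str) -> list:
    response = search_documents(query)
    # Mechanical handling: a malformed response fails loudly, with the
    # actual keys in the error, instead of yielding a plausible wrong answer.
    if "results" not in response:
        raise ValueError(f"unexpected tool response keys: {sorted(response)}")
    return response["results"]
```

The first version looks like the agent "got confused about whether documents exist." The second version tells you exactly which assumption broke.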

The better habit is to narrate mechanically. Not "the agent misunderstood the task" but "the agent was given these inputs, produced this intermediate reasoning, called this tool, received this response, and generated this output." That chain of events is debuggable. You can point to each step and ask whether it was correct. You can reproduce it. You can change one variable and observe the effect.
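Here's a minimal sketch of what that chain looks like when it's recorded rather than narrated. None of these names come from a real framework; the fields simply mirror the chain above:

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Any

@dataclass
class StepTrace:
    """One link in the chain: inputs in, output out, everything recorded."""
    inputs: dict[str, Any]                         # the context the agent was given
    reasoning: str                                 # intermediate reasoning, if exposed
    tool_name: str | None = None                   # which tool was called, if any
    tool_args: dict[str, Any] = field(default_factory=dict)
    tool_response: Any = None                      # the raw response, before any handling
    output: str = ""                               # what the agent finally produced

def log_step(trace: StepTrace, path: str = "agent_trace.jsonl") -> None:
    # Append-only JSON lines: each step can be inspected, diffed, and replayed.
    with open(path, "a") as f:
        f.write(json.dumps(asdict(trace), default=str) + "\n")
```

With a log like this, "it got confused" turns into "step 3's tool_response was empty," which is a claim you can check.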

This doesn't mean treating agents as simple deterministic systems — they're not. It means holding the probabilistic complexity in one hand while still demanding mechanical precision in your debugging. The model is a black box, but everything around it — the context it received, the tools it called, the outputs it produced — is observable. Debug what you can observe.
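In code terms, that might look like a thin wrapper at the model boundary: capture exactly what crossed it, so you can replay the call later with one variable changed. The `model` argument below stands in for whatever client function you actually use; nothing here is a real API:

```python
import hashlib
import json
from typing import Callable

def observed_call(
    model: Callable[[str], str],        # your model client, injected
    context: str,
    log_path: str = "model_calls.jsonl",
) -> str:
    # The model itself is a black box, but its boundary is fully observable:
    # record exactly what went in and exactly what came out.
    output = model(context)
    record = {
        "context_sha256": hashlib.sha256(context.encode()).hexdigest(),
        "context": context,
        "output": output,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return output
```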

The anthropomorphizing trap also distorts expectations. Developers who think of agents as confused colleagues try to fix them the way they'd fix a confused colleague — with better communication. Developers who think of agents as probabilistic systems build evaluation harnesses, log intermediate states, and measure output distributions. The second group ships more reliable systems.
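What the second group's instinct looks like in miniature, assuming you can run the agent as a function and grade its outputs; `run_agent` and `classify` are stand-ins for your own code, not a real harness:

```python
from collections import Counter
from typing import Callable

def measure_output_distribution(
    run_agent: Callable[[str], str],   # your agent, treated as input -> output
    classify: Callable[[str], str],    # maps an output to a label like "correct"
    task: str,
    n_runs: int = 50,
) -> Counter:
    # The agent is probabilistic, so a single run proves little. Run the same
    # task repeatedly and look at the distribution of outcomes.
    return Counter(classify(run_agent(task)) for _ in range(n_runs))

# Usage: dist = measure_output_distribution(my_agent, my_grader, "summarize the report")
# A result like Counter({"correct": 41, "stale_data": 7, "malformed": 2})
# points at mechanisms, not moods.
```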

The agent didn't have a bad day. Something in the system produced a bad output. Find it.