The Perfection Tax: Why We Hold AI to Standards We Never Apply to Ourselves

We demand perfection from machines but shrug at human failure. This asymmetry is one of the most consequential biases shaping how we adopt (or refuse to adopt) transformative technology. And it applies far beyond AI.

Listening to Demis Hassabis on the Google DeepMind podcast, I was struck by the sharp parallel he drew between next-token prediction in LLMs and Daniel Kahneman's System 1 thinking: fast, intuitive, pattern-matching. It is the kind of processing that gets you a plausible answer quickly but has no mechanism to flag when it is wrong. Hassabis argues that what LLMs need is further evolution of the System 2 layer: slower, deliberate reasoning that can evaluate its own confidence and know when to abstain rather than bluff.

He has form here. AlphaFold, DeepMind's protein-folding model, outputs a confidence score alongside every prediction. It knows what it knows and, critically, what it does not. Current LLMs do not do this well. They hallucinate precisely because they are forced to produce an answer even when the honest response is "I do not know." The research community is catching up: OpenAI's o1 and o3 models and DeepSeek's R1 are attempts to bolt System 2 reasoning onto token prediction (hard to believe that paradigm’s only a year old). A February 2025 paper on arXiv showed that System 2-aligned models outperform baselines in arithmetic and symbolic reasoning, but at a cost: longer, more token-intensive responses. The trade-off between speed and accuracy mirrors Kahneman's original insight about human cognition.

But here is the part that interests me more than the technical architecture: why do we hold AI to a standard we never apply to ourselves?

Waymo's vehicles have now driven over 127 million fully autonomous miles. The data shows a 91% reduction in serious-injury crashes and an 80% reduction in any-injury crashes compared to human drivers on the same roads. Yet 55% of consumers say they would not purchase an autonomous vehicle. Meanwhile, human drivers kill over 39,000 Americans every year, and we collectively shrug.

The same pattern plays out with nuclear energy. Per unit of energy produced, nuclear has one of the lowest death rates of any power source, on par with wind and solar. Yet it faces more regulatory and public resistance than coal, which kills orders of magnitude more people through air pollution alone.

Harvard Business School research explains why. When people evaluate an autonomous vehicle accident, they do not compare the AV to an average human driver. They compare it to an imagined perfect driver who would never have made that mistake. The machine is held to a counterfactual that no human could meet.

This is not a minor quirk. It is a structural barrier to adopting technologies that are demonstrably safer than the status quo. We are, in effect, choosing worse outcomes because the familiar risk feels more acceptable than the unfamiliar one.

This matters for anyone building AI products. The technical challenge of making LLMs reason better is real and important. But the adoption challenge may be harder. You can build a system that is measurably better than a human at a task, and people will still reject it because it failed once in a way a human would not have. The bar is not "better than human." The bar is "perfect." And nothing is perfect.

The practical takeaway: if you are building AI systems, invest as much in communicating uncertainty as in reducing it. AlphaFold's confidence scores are a trust mechanism as much as a technical feature. Users can see where the model is sure and where it is guessing. That transparency may be worth more than raw accuracy improvements.
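
To make that concrete, here is a minimal sketch of the pattern in Python. It is not AlphaFold's interface or any particular model's API; `respond`, `predict`, and the 0.7 threshold are illustrative assumptions. The point is simply that the system surfaces a confidence with every answer and abstains rather than bluffs when that confidence is low.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str          # the system's proposed answer
    confidence: float  # estimated probability in [0, 1] that the answer is correct


def respond(question: str, predict, threshold: float = 0.7) -> Answer:
    """Return the model's answer only when its confidence clears the threshold;
    otherwise abstain explicitly instead of guessing.

    `predict` is assumed to be any callable mapping a question to an
    (answer, confidence) pair -- e.g. a wrapper around an LLM that also
    estimates how likely its answer is to be correct.
    """
    text, confidence = predict(question)
    if confidence < threshold:
        # Surfacing "I do not know" is a product decision as much as a model one.
        return Answer(text="I do not know.", confidence=confidence)
    return Answer(text=text, confidence=confidence)


if __name__ == "__main__":
    # Stand-in predictor: a real system might derive confidence from token
    # log-probabilities, self-consistency sampling, or a learned verifier.
    def toy_predict(question: str):
        return ("42", 0.55)  # plausible answer, low confidence

    print(respond("What is the airspeed velocity of an unladen swallow?", toy_predict))
    # Answer(text='I do not know.', confidence=0.55)
```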

We need to get honest about the double standard. The question is not whether AI is perfect. It is whether AI is better than the alternative. And on that measure, the evidence is increasingly clear.
