On March 14, social media filled with claims that Israeli Prime Minister Netanyahu had been killed and replaced by an AI deepfake. The claims were false. Fact-checkers said so. Netanyahu released a video to prove he was alive. Then Grok, the AI chatbot built into X, looked at the video and labeled it a deepfake, citing “signs like static coffee levels, unnatural lip sync.” The post was seen over 100,000 times before correction.
This is a genuinely new problem.
Misinformation has always existed. What’s new is that every piece of counter-evidence has become potential evidence for the conspiracy. Netanyahu releases a video: that’s what a fake would do. Fact-checkers say it’s real: they’re compromised. An AI chatbot says it’s fake: the AI knows. In this environment, reality has become structurally unprovable. Not because the evidence is ambiguous, but because the validation layer itself is now contested.
The standard response to misinformation is more and better verification: more fact-checkers, better provenance tools, tighter moderation. That response assumes the problem is propagation speed. Get to false claims fast enough, attach the label, and correction outpaces spread. This worked poorly before AI. It doesn’t work at all now.
The problem isn’t propagation. It’s that Grok calling a real video fake is itself a propagation event, in the other direction. Now an authoritative-seeming source has confirmed the conspiracy. Now every subsequent denial gets weighed against an AI’s endorsement. The asymmetry is brutal: seeding doubt takes seconds, and the doubt compounds with each new piece of ostensible evidence. Undoing it requires rebuilding trust in every institution and tool the conspiracy has recruited as an adversary.
Liu Cixin’s Wallfacers, introduced in The Dark Forest (the sequel to The Three-Body Problem), solve a version of this. They develop a completely different relationship with information, reasoning from structure and incentives rather than from content. They stop being downstream of the information environment. The advantage isn’t better information; it’s independence from the information pipeline entirely.
That’s a real skill. But it’s not scalable. And “everyone becomes a Wallfacer” isn’t where this goes. What happens historically when populations lose trust in the information environment — Soviet Russia, wartime propaganda states — is that information retreats into small, high-trust networks. Private group chats. People whose judgment you’ve tested over time. Public information becomes noise; verified private networks become your actual epistemic reality.
That has its own failure mode. Networks built for trust close themselves off from the correction that openness enables. They calcify. And they’re just as susceptible to manipulation: harder to audit, and the manipulation only has to get inside the perimeter once.
There’s a counterargument worth taking seriously: Content Credentials (C2PA) and camera provenance tools are designed exactly for this. If every video is cryptographically signed at capture, the refutation trap collapses — you don’t argue about whether the video is real, you check the chain of custody. Adobe, Microsoft, and major camera manufacturers have committed to the standard.
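To make the mechanism concrete, here’s a minimal sketch of the kind of check capture-time signing enables, reduced to a single signature over the file bytes. This is an illustration, not the real C2PA format: actual Content Credentials embed a signed manifest with a certificate chain back to a trusted issuer, and the names here (`camera_key`, `video_bytes`) are hypothetical.

```python
# Minimal sketch of capture-time provenance. NOT the real C2PA
# format: Content Credentials carry a signed manifest with a
# certificate chain, not a bare signature over the file.
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
)
from cryptography.exceptions import InvalidSignature

# At capture: the camera signs a hash of the video bytes.
camera_key = Ed25519PrivateKey.generate()  # hypothetical device key
video_bytes = b"...raw video stream..."    # stand-in for the capture
signature = camera_key.sign(hashlib.sha256(video_bytes).digest())

# At verification: re-hash the file and check the signature against
# the device's public key. There is no argument about what the video
# shows, only about the chain of custody of its bytes.
public_key = camera_key.public_key()
try:
    public_key.verify(signature, hashlib.sha256(video_bytes).digest())
    print("provenance intact: bytes match what the camera signed")
except InvalidSignature:
    print("provenance broken: file differs from what was signed")
```

The design point is what’s absent: nothing in the check inspects the content. Whether the coffee level looks static is irrelevant; the only question is whether the bytes match what a known device signed.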
The problem is that the Netanyahu case already shows why provenance isn’t sufficient. Most social media platforms strip metadata on upload, so the signed manifest disappears before the video reaches viewers. And even if it survived, a provenance system is only as trusted as the institutions certifying it. Once that trust is contested, you’re back where you started.
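The stripping failure is mechanical, and the same toy scheme shows it. Assuming only that an upload pipeline changes bytes at all (a transcode, a new container, stripped metadata), the hash no longer matches what the camera signed, and the authentic video fails verification. The re-encoding step below is a hypothetical stand-in; any byte-level change behaves the same way.

```python
# Why upload pipelines break provenance in the toy scheme above:
# any byte-level change invalidates the signature, even when the
# underlying content is authentic.
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
)
from cryptography.exceptions import InvalidSignature

camera_key = Ed25519PrivateKey.generate()  # hypothetical device key
original = b"...raw video stream..."       # stand-in for the capture
signature = camera_key.sign(hashlib.sha256(original).digest())

# Stand-in for a platform's upload pipeline. The only property that
# matters is that the output bytes differ: new container, stripped
# metadata, and recompression all have the same effect.
uploaded = original + b"\x00"

try:
    camera_key.public_key().verify(
        signature, hashlib.sha256(uploaded).digest()
    )
    print("provenance intact")
except InvalidSignature:
    print("provenance broken: the authentic video fails verification")
```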
So we’re in a period where the refutation mechanism is broken and the technical fix requires trusting infrastructure that hasn’t been built yet. The honest answer to “where do we go” is: probably somewhere worse before somewhere better. Poisoned information environments generate demand for authority. Someone who can just tell you what’s true. That demand gets met. Not always well.
The Netanyahu case isn’t a warning about AI deepfakes. It’s a warning about what happens when the tools we reach for to verify reality produce the same kind of output as the tools we use to fake it, and the two become indistinguishable in the public mind.