What it is, in plain terms

Today's deepfakes and generative AI can fake a single video stream well enough to fool both humans and existing detection systems. PICAD takes a different approach: instead of trying to detect fakes by spotting visual artifacts (a losing arms race), it aims to make faking computationally impractical by requiring physical coherence across multiple distinct perspectives of the same scene. Those perspectives can be captured by several devices, or by a single device that produces several views itself (for example, with a mirror).

Why existing approaches fall short

  • Artifact detectors (AI-based deepfake detection) age out as generators improve.
  • Single-device depth sensors (Face ID, etc.) can be bypassed by compromising one capture point.
  • Active challenges (turn your head, blink) are increasingly easy to render in real time.
  • Stream-injection attacks bypass real sensors entirely by feeding synthetic data downstream.

What it can be used for

  • Identity / liveness / proof-of-human verification
  • High-stakes intent confirmation (notarial, financial, legal)
  • Remote authentication of physical objects (art, collectibles, luxury goods)
  • Remote inspections (real estate, insurance, secured premises)
  • Any context where a captured scene must demonstrably come from physical reality, not a synthetic source

Core idea

Capture the same physical scene from multiple distinct perspectives — either across multiple devices, or via a single device producing several perspectives through mirrors, optical feedback loops, or server-mediated re-display. A verification entity then checks that the resulting data streams are coherent according to physical laws (optics, fluid dynamics, acoustics, latency, etc.). Faking this convincingly across N angles in real time, while reacting to unpredictable system-issued challenges, is far harder than faking a single stream.
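As a rough illustration of what "coherent across perspectives" can mean for one signal, the sketch below checks timing only: after the verifier issues a challenge, every perspective must show the response within a real-time budget, and all perspectives must agree on when it happened. The function name, the parameter names, and the 0.5 s / 0.05 s thresholds are assumptions made for this example, not values taken from PICAD; a real verifier would combine many such checks (optics, acoustics, and so on).

```python
from itertools import combinations


def challenge_coherent(issued_at_s, observed_at_s, max_delay_s=0.5, max_skew_s=0.05):
    """Return True if every perspective shows the challenge response in time and in sync.

    issued_at_s   -- time the verifier issued the challenge, in seconds
    observed_at_s -- dict mapping a perspective id to the time the response
                     appears in that perspective's stream, in seconds
    max_delay_s   -- hypothetical real-time budget for the response
    max_skew_s    -- hypothetical tolerated disagreement between perspectives
    """
    times = list(observed_at_s.values())
    if len(times) < 2:
        # A single perspective cannot demonstrate cross-view coherence.
        return False
    # Each perspective must react after the challenge and within the budget.
    if any(not (0.0 <= t - issued_at_s <= max_delay_s) for t in times):
        return False
    # All perspectives must agree with each other on when the event happened.
    return all(abs(a - b) <= max_skew_s for a, b in combinations(times, 2))
```

For example, `challenge_coherent(10.0, {"phone": 10.20, "laptop": 10.22})` passes, while a stream that lags the others by 200 ms, or that reacts only after the budget has expired, fails.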

Some key mechanisms

  • Optical feedback loop (mise en abyme): two devices facing each other — or one device + mirror — create recursive video whose geometry and timing are computationally prohibitive to simulate in real time.
  • Entropy anchors: introduce physical objects with chaotic behavior (fluids, deformable materials) into the scene; their motion must remain coherent across all angles.
  • Server-transit verification: the captured stream is routed through the verification server, which can inject unpredictable alterations before re-displaying it. An attacker would need to alter the same content twice, in real time, without knowing the server's injections.
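The server-transit mechanism can be sketched in a heavily simplified form. Here each frame is reduced to a single mean-brightness number, the server's "unpredictable alteration" is a random sequence of small brightness offsets, and the check is whether the re-captured stream carries that same sequence. All names, the amplitude, and the 0.8 correlation threshold are assumptions for illustration; the source does not specify what the server injects or how it is detected.

```python
import secrets


def make_injection(n_frames, amplitude=8):
    """Draw unpredictable per-frame brightness offsets (+amplitude or -amplitude)."""
    return [amplitude if secrets.randbits(1) else -amplitude for _ in range(n_frames)]


def inject(luma, pattern):
    """Apply the server's offsets to per-frame mean brightness values."""
    return [value + offset for value, offset in zip(luma, pattern)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def injection_present(recaptured_luma, original_luma, pattern, threshold=0.8):
    """Check that the re-captured stream carries the server's injected pattern."""
    residual = [r - o for r, o in zip(recaptured_luma, original_luma)]
    return pearson(residual, pattern) >= threshold
```

In this toy model, a stream that was prepared in advance shows no trace of the pattern and is rejected, because the attacker could not know the offsets before the server drew them.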

Defender's structural advantages

  • Temporal asymmetry: the attacker must simulate in real time; the defender can analyze post-hoc with no latency budget.
  • Cost asymmetry: verifying coherence is cheaper than faking it — inverting the current arms-race dynamic.
  • Algorithm independence: security comes from physical-law constraints, not from any specific detection model that could be obsolete tomorrow.

Adjustable rigor

Verification levels can be tuned to the stakes — a social-network sign-up needs less than a property transaction. The protocol layers techniques progressively, allowing graceful evolution as adversaries improve.
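One way to picture this layering is a small policy table in which each level adds checks on top of the previous one. The level names, the perspective counts, and the assignment of mechanisms to levels below are hypothetical and chosen only to show the shape of such a configuration; the source does not define specific tiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerificationLevel:
    name: str
    min_perspectives: int  # hypothetical minimum number of distinct views
    checks: tuple          # mechanisms required at this level


# Hypothetical tiers: each level keeps the checks of the one below and adds more.
LEVELS = {
    "low": VerificationLevel("low", 2, ("multi_view_coherence",)),
    "medium": VerificationLevel(
        "medium", 2, ("multi_view_coherence", "server_transit")
    ),
    "high": VerificationLevel(
        "high",
        3,
        (
            "multi_view_coherence",
            "server_transit",
            "optical_feedback_loop",
            "entropy_anchor",
        ),
    ),
}


def required_checks(stakes):
    """Look up the checks required for a given stakes level ("low", "medium", "high")."""
    return LEVELS[stakes].checks
```

Under these assumptions, a social-network sign-up might map to "low" and a property transaction to "high"; new mechanisms could be appended to the upper tiers without changing the lower ones.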
