You just experienced inter-token timing. The tall red marks are saccades — rare moments where the rhythm broke. The short black marks are everything else: the steady rhythm that tells you nothing surprising was happening. Vertica reads the pattern.
Every token an AI generates is a marginal commitment — a forward pass, rendered in time. The time it takes to arrive is not infrastructure noise; it is mechanism. When the mechanism reorganizes — an attention reconfiguration, a feature-activation event, a draft model's rejection in speculative decoding — the rhythm breaks. Speculative decoding is especially suggestive: a draft rejection is, by architectural design, a moment where the target is doing something a simpler predictor can't anticipate. A prediction-error detector built into the inference stack itself, spiking exactly where Vertica reads.
Those breaks, read as the jerk in inter-token timing, index where the interesting moments are. They cue where the expensive interpretability tools — circuit tracing, activation-spectrum analysis — should point. Vertica is the conductor, not the orchestra.
Signal lives in time — in the jerk. Heavy-tailed events punctuating stable fixations.
No weights, no hidden states. Rendered tokens and their timing are what you need.
The boundary encodes the bulk. Each marginal token can be read to infer the neural bulk behind it.
Meta-interpretability. It doesn't replace other tools — it tells them when to fire.
We don't open the skull to see a brain think. We watch from outside — fMRI, EEG, reaction time, the rhythm of speech — and let observable patterns tell us about the hidden mechanism underneath.
Vertica does the same for AI. Each generated token is a frame. Treated like film, the stream of frames has a rhythm: mostly even, occasionally broken. The breaks are where the analogy lives.
In human vision, a saccade is a quick jump of the eye from one focus point to another. You make three or four every second. Between jumps the eye sits still and collects information; during the jump, vision is briefly suppressed — you literally can't see while your eye is moving. Reading this sentence, you're making a saccade every few words without noticing. The interesting information lives in the transitions, not in the fixations.
Inter-token timing carries the same structural pattern. Most tokens arrive smoothly, at a predictable pace. Rarely, the rhythm breaks — the model's forward pass takes longer than it should, or snaps into a different cadence. Those breaks are AI saccades: rare discontinuities punctuating an otherwise steady stream, marking moments where something distinct is happening under the hood. A reorganization. An insight. A decision point.
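What a break looks like in code: a minimal sketch, assuming per-token arrival timestamps from a self-hosted stream. The function name, the median/MAD robust z-score, and the threshold are illustrative choices for this sketch, not Vertica's actual pipeline.

```python
import numpy as np

def flag_saccades(arrival_times, z_thresh=4.0):
    """Flag inter-token gaps that break the stream's rhythm.

    arrival_times: wall-clock timestamp (seconds) of each rendered token.
    Returns a boolean mask over gaps, True where a gap is a heavy-tailed
    outlier under a robust (median/MAD) z-score.
    """
    deltas = np.diff(np.asarray(arrival_times, dtype=float))
    med = np.median(deltas)
    mad = np.median(np.abs(deltas - med)) or 1e-9   # guard a degenerate MAD
    robust_z = 0.6745 * (deltas - med) / mad        # ~N(0,1) for normal data
    return robust_z > z_thresh

# A steady ~20 ms cadence with one 90 ms stall at gap index 5.
gaps = [0.020, 0.021, 0.019, 0.020, 0.021, 0.090,
        0.019, 0.020, 0.021, 0.019, 0.020, 0.021]
times = np.insert(np.cumsum(gaps), 0, 0.0)
print(np.flatnonzero(flag_saccades(times)))  # -> [5]
```

Real streams are messier: network jitter, batching, scheduler noise. That is exactly why the scoping below insists on self-hosted, batch-of-one conditions.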
The analogy goes deeper on inspection. In predictive-coding accounts of vision, saccades are triggered by mismatch — when the brain's forward model fails to anticipate incoming input, the eye moves to re-sample. Speculative decoding in modern LLM inference has a remarkably similar architecture: a fast draft model proposes, a slower target verifies, mismatch drives fallback. The timing jerks Vertica reads are — by this architecture's design — prediction-error events. The mechanism isn't a confound to the signal; it is the signal.
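The acceptance rule behind that rejection is worth seeing. Below is a toy sketch of the standard verification step (accept a drafted token with probability min(1, p_target / p_draft)); the function name and the probabilities are invented for illustration, and real stacks verify whole drafted blocks in one batched target pass.

```python
import numpy as np

def first_rejection(draft_probs, target_probs, rng):
    """Toy speculative-decoding verification over one drafted block.

    draft_probs[i] / target_probs[i]: each model's probability for the
    i-th drafted token. Standard rule: accept token i with probability
    min(1, p_target / p_draft). The first rejection forces the target to
    fall back and decode itself; that stall is the timing hitch Vertica
    reads as jerk. Returns the index of the first rejection, or None.
    """
    for i, (q, p) in enumerate(zip(draft_probs, target_probs)):
        if rng.random() > min(1.0, p / q):
            return i  # mismatch: the draft failed to anticipate the target
    return None

rng = np.random.default_rng(0)
# Draft and target agree for three tokens, then diverge hard on the fourth.
print(first_rejection([0.9, 0.8, 0.9, 0.9], [0.9, 0.8, 0.9, 0.01], rng))
# prints 3: the first three proposals survive, the fourth is rejected
```

Accepted blocks land in bursts; a rejection stalls the stream. That rhythm is the only thing the boundary observer needs.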
No weights examined. No activations probed. Just the boundary — rendered tokens and their timing — read carefully. Per the holographic move: the bulk is, in principle, inferrable from the boundary. You don't need to go inside if you watch the outside closely enough.
fMRI for every AI.
Every architecture is readable in principle. We start with dense autoregressive transformers on self-hosted inference, where the signal clears the infrastructure noise floor most consistently. MoE, diffusion, and SSM variants have distinct timing profiles and need their own scoping.
Why jerk rather than velocity or acceleration? Jerk localizes to event onsets — it spikes when acceleration is changing, not throughout the event's duration. For rare, punctuated events, onset-localization is the right signal. Cost: jerk is noisier than lower derivatives. The tradeoff is favorable only in controlled conditions — self-hosted, batch-of-one, deterministic kernels — and paired with white-box interpretability tools that can confirm from inside. That is Vertica's intended operational scope. The empirical work is ahead, not behind.
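A worked illustration of the onset-localization claim, with invented numbers: a cadence that steps from 20 ms to 50 ms and back. Velocity (the gap itself) stays elevated for the whole slow segment; jerk, the third difference of arrival time, fires only where the cadence changes.

```python
import numpy as np

# Steady 20 ms cadence, a sustained 50 ms segment, then back to steady.
gaps = np.array([0.02] * 6 + [0.05] * 4 + [0.02] * 6)
t = np.insert(np.cumsum(gaps), 0, 0.0)   # token arrival times

velocity = np.diff(t, n=1)   # inter-token gap
accel    = np.diff(t, n=2)   # change in gap
jerk     = np.diff(t, n=3)   # change in the change: the onset detector

# Each diff shortens the array by one, so indices shift accordingly.
print("gap elevated at:", np.flatnonzero(velocity > 0.03))      # [6 7 8 9]
print("jerk spikes at: ", np.flatnonzero(np.abs(jerk) > 0.01))  # [4 5 8 9]
```

Velocity smears across the whole event; jerk produces a tight doublet at each edge. That is the sense in which jerk localizes onsets, and also the noise cost conceded above: differencing white timing noise inflates its variance (2, 6, 20 times the original after one, two, three differences).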
And if the predictions fail: the saccade (the jerk) remains what it is; only its proposed correlates were wrong, and the hypothesis rewrites.
Controls. Draft-rejection timing fires for boring reasons too: tokenization quirks, rare-token distributions, sampling temperature, draft-model capacity, and, in MoE architectures, expert-routing load. To isolate cognitive divergence: fix the draft, hold sampling deterministic, stratify by token rarity, and compare across dense and MoE models. Test structurally loaded predictions against two nulls: a position-matched random-token null, and a content-based detector null (punctuation, newlines, structural tokens). If content alone predicts the interesting moments, timing adds nothing; if timing lifts above content, Vertica earns its keep.
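The comparison the two nulls demand, sketched with synthetic stand-in data. Every number below is invented purely to show the shape of the test, not a result.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len = 500   # tokens in one generation

# Stand-ins: positions labeled interesting by a white-box tool, positions
# flagged by timing, and positions a content-only detector would flag
# (punctuation, newlines, structural tokens).
events        = rng.choice(seq_len, size=25, replace=False)
timing_flags  = np.unique(np.concatenate(
    [events[:15], rng.choice(seq_len, size=10, replace=False)]))
content_flags = rng.choice(seq_len, size=25, replace=False)

def hit_rate(flags):
    """Fraction of flagged positions that land on labeled events."""
    return np.isin(flags, events).mean()

# Null 1: position-matched random flags, same count as the timing flags.
random_null = np.mean([
    hit_rate(rng.choice(seq_len, size=len(timing_flags), replace=False))
    for _ in range(1000)])

# Null 2: the content-based detector itself.
print(f"timing {hit_rate(timing_flags):.2f} | "
      f"random null {random_null:.2f} | "
      f"content null {hit_rate(content_flags):.2f}")
```

Timing earns its keep only if the first number clears both of the others; on real data, the interesting step is stratifying this comparison by token rarity and across dense versus MoE models, as above.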
Vertica is moving from manifesto to experiment.
Frontier model shop dying for clarity?
Interpretability researcher with circuit-tracing tools?
Grad student looking for a falsifiable project?
Systems researcher with MoE or SSM access?
Methodologist with a sharper protocol?
Watching this direction and want to argue?
We want to hear from you.