You just experienced inter-token timing. The tall red marks are saccades — rare moments where the rhythm broke. The short black marks are everything else: the steady rhythm that tells you nothing surprising was happening. Vertica reads the pattern.
Every token an AI generates is a marginal commitment — a forward pass, rendered in time. The time it takes to arrive is not infrastructure noise; it is mechanism. When the mechanism reorganizes — an attention reconfiguration, a feature-activation event, a draft model's rejection in speculative decoding — the rhythm breaks. Speculative decoding is especially suggestive: a draft rejection is, by architectural design, a moment where the target is doing something a simpler predictor can't anticipate. A prediction-error detector built into the inference stack itself, spiking exactly where Vertica reads.
Those breaks, read as the jerk in inter-token timing, index where the interesting moments are. They cue where the expensive interpretability tools — circuit tracing, activation-spectrum analysis — should point. Vertica is the conductor, not the orchestra.
Signal lives in time — in the jerk. Heavy-tailed events punctuating stable fixations.
No weights, no hidden states. Rendered tokens and their timing are what you need.
The boundary encodes the bulk. Each marginal token can be read to infer the neural bulk behind it.
Meta-interpretability. It doesn't replace other tools — it tells them when to fire.
We don't open the skull to see a brain think. We watch from outside — fMRI, EEG, reaction time, the rhythm of speech — and let observable patterns tell us about the hidden mechanism underneath.
Vertica does the same for AI. Each generated token is a frame. Treated like film, the stream of frames has a rhythm: mostly even, occasionally broken. The breaks are where the analogy lives.
In human vision, a saccade is a quick jump of the eye from one focus point to another. You make three or four every second. Between jumps the eye sits still and collects information; during the jump, vision is briefly suppressed — you literally can't see while your eye is moving. Reading this sentence, you're making a saccade every few words without noticing. The interesting information lives in the transitions, not in the fixations.
Inter-token timing carries the same structural pattern. Most tokens arrive smoothly, at a predictable pace. Rarely, the rhythm breaks — the model's forward pass takes longer than it should, or snaps into a different cadence. Those breaks are AI saccades: rare discontinuities punctuating an otherwise steady stream, marking moments where something distinct is happening under the hood. A reorganization. An insight. A decision point.
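What a break looks like in code: a minimal sketch, assuming per-token arrival timestamps from a self-hosted stream. The function name, the median/MAD robust z-score, and the threshold are illustrative choices for this sketch, not Vertica's actual pipeline.

```python
import numpy as np

def flag_saccades(arrival_times, z_thresh=4.0):
    """Flag inter-token gaps that break the stream's rhythm.

    arrival_times: wall-clock timestamp (seconds) of each rendered token.
    Returns a boolean mask over gaps, True where a gap is a heavy-tailed
    outlier under a robust (median/MAD) z-score.
    """
    deltas = np.diff(np.asarray(arrival_times, dtype=float))
    med = np.median(deltas)
    mad = np.median(np.abs(deltas - med)) or 1e-9   # guard a degenerate MAD
    robust_z = 0.6745 * (deltas - med) / mad        # ~N(0,1) for normal data
    return robust_z > z_thresh

# A steady ~20 ms cadence with one 90 ms stall at gap index 5.
gaps = [0.020, 0.021, 0.019, 0.020, 0.021, 0.090,
        0.019, 0.020, 0.021, 0.019, 0.020, 0.021]
times = np.insert(np.cumsum(gaps), 0, 0.0)
print(np.flatnonzero(flag_saccades(times)))  # -> [5]
```

Real streams are messier: network jitter, batching, scheduler noise. That is exactly why the scoping below insists on self-hosted, batch-of-one conditions.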
The analogy goes deeper on inspection. In predictive-coding accounts of vision, saccades are triggered by mismatch — when the brain's forward model fails to anticipate incoming input, the eye moves to re-sample. Speculative decoding in modern LLM inference has a remarkably similar architecture: a fast draft model proposes, a slower target verifies, mismatch drives fallback. The timing jerks Vertica reads are — by this architecture's design — prediction-error events. The mechanism isn't a confound to the signal; it is the signal.
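The acceptance rule behind that rejection is worth seeing. Below is a toy sketch of the standard verification step (accept a drafted token with probability min(1, p_target / p_draft)); the function name and the probabilities are invented for illustration, and real stacks verify whole drafted blocks in one batched target pass.

```python
import numpy as np

def first_rejection(draft_probs, target_probs, rng):
    """Toy speculative-decoding verification over one drafted block.

    draft_probs[i] / target_probs[i]: each model's probability for the
    i-th drafted token. Standard rule: accept token i with probability
    min(1, p_target / p_draft). The first rejection forces the target to
    fall back and decode itself; that stall is the timing hitch Vertica
    reads as jerk. Returns the index of the first rejection, or None.
    """
    for i, (q, p) in enumerate(zip(draft_probs, target_probs)):
        if rng.random() > min(1.0, p / q):
            return i  # mismatch: the draft failed to anticipate the target
    return None

rng = np.random.default_rng(0)
# Draft and target agree for three tokens, then diverge hard on the fourth.
print(first_rejection([0.9, 0.8, 0.9, 0.9], [0.9, 0.8, 0.9, 0.01], rng))
# prints 3: the first three proposals survive, the fourth is rejected
```

Accepted blocks land in bursts; a rejection stalls the stream. That rhythm is the only thing the boundary observer needs.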
No weights examined. No activations probed. Just the boundary — rendered tokens and their timing — read carefully. Per the holographic move: the bulk is, in principle, inferrable from the boundary. You don't need to go inside if you watch the outside closely enough.
fMRI for every AI.
Every architecture is readable in principle. We start with dense autoregressive transformers on self-hosted inference, where the signal clears the infrastructure noise floor most consistently. MoE, diffusion, and SSM variants have distinct timing profiles and need their own scoping.
Why jerk rather than velocity or acceleration? Jerk localizes to event onsets — it spikes when acceleration is changing, not throughout the event's duration. For rare, punctuated events, onset-localization is the right signal. Cost: jerk is noisier than lower derivatives. The tradeoff is favorable only in controlled conditions — self-hosted, batch-of-one, deterministic kernels — and paired with white-box interpretability tools that can confirm from inside. That is Vertica's intended operational scope. The empirical work is ahead, not behind.
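A worked illustration of the onset-localization claim, with invented numbers: a cadence that steps from 20 ms to 50 ms and back. Velocity (the gap itself) stays elevated for the whole slow segment; jerk, the third difference of arrival time, fires only where the cadence changes.

```python
import numpy as np

# Steady 20 ms cadence, a sustained 50 ms segment, then back to steady.
gaps = np.array([0.02] * 6 + [0.05] * 4 + [0.02] * 6)
t = np.insert(np.cumsum(gaps), 0, 0.0)   # token arrival times

velocity = np.diff(t, n=1)   # inter-token gap
accel    = np.diff(t, n=2)   # change in gap
jerk     = np.diff(t, n=3)   # change in the change: the onset detector

# Each diff shortens the array by one, so indices shift accordingly.
print("gap elevated at:", np.flatnonzero(velocity > 0.03))      # [6 7 8 9]
print("jerk spikes at: ", np.flatnonzero(np.abs(jerk) > 0.01))  # [4 5 8 9]
```

Velocity smears across the whole event; jerk produces a tight doublet at each edge. That is the sense in which jerk localizes onsets, and also the noise cost conceded above: differencing white timing noise inflates its variance (2, 6, 20 times the original after one, two, three differences).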
And if the predictions fail: the saccade (the jerk) remains what it is; only its proposed correlates were wrong, and the hypothesis rewrites.
Controls. Draft-rejection timing fires for boring reasons too: tokenization quirks, rare-token distributions, sampling temperature, draft-model capacity, and, in MoE architectures, expert-routing load. To isolate cognitive divergence: fix the draft, hold sampling deterministic, stratify by token rarity, and compare across dense and MoE models. Test structurally loaded predictions against two nulls: a position-matched random-token null, and a content-based detector null (punctuation, newlines, structural tokens). If content alone predicts the interesting moments, timing adds nothing; if timing lifts above content, Vertica earns its keep.
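The comparison the two nulls demand, sketched with synthetic stand-in data. Every number below is invented purely to show the shape of the test, not a result.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len = 500   # tokens in one generation

# Stand-ins: positions labeled interesting by a white-box tool, positions
# flagged by timing, and positions a content-only detector would flag
# (punctuation, newlines, structural tokens).
events        = rng.choice(seq_len, size=25, replace=False)
timing_flags  = np.unique(np.concatenate(
    [events[:15], rng.choice(seq_len, size=10, replace=False)]))
content_flags = rng.choice(seq_len, size=25, replace=False)

def hit_rate(flags):
    """Fraction of flagged positions that land on labeled events."""
    return np.isin(flags, events).mean()

# Null 1: position-matched random flags, same count as the timing flags.
random_null = np.mean([
    hit_rate(rng.choice(seq_len, size=len(timing_flags), replace=False))
    for _ in range(1000)])

# Null 2: the content-based detector itself.
print(f"timing {hit_rate(timing_flags):.2f} | "
      f"random null {random_null:.2f} | "
      f"content null {hit_rate(content_flags):.2f}")
```

Timing earns its keep only if the first number clears both of the others; on real data, the interesting step is stratifying this comparison by token rarity and across dense versus MoE models, as above.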
Vertica is moving from manifesto to experiment.
Frontier model shop dying for clarity?
Interpretability researcher with circuit-tracing tools?
Grad student looking for a falsifiable project?
Systems researcher with MoE or SSM access?
Methodologist with a sharper protocol?
Watching this direction and want to argue?
We want to hear from you.