PROJECT OBLIQUE-GUARD: Latent Geometry Stabilization

Artifacts Available:Read Paper View Code

The Compression Axiom

For a decade, the field has debated the "Robustness vs. Accuracy" trade-off: the belief that one must sacrifice model performance to make it safe. We reject this premise.

Our research posits that adversarial vulnerability is not a stochastic failure of the learning process ("bugs"), but a deterministic artifact of hyper-efficient compression, known as Superposition.

To achieve high-efficiency inference, neural networks pack $M$ features into $d$ dimensions (where $M \gg d$ ). The so-called "attacks" are simply the mathematical residue of these overlapping feature vectors. To remove the vulnerability via traditional means (like adversarial training) is to cripple the model's capacity to compress information.

Foundational Physics: The Three Axioms

The Oblique-Guard architecture rests on three axiomatic definitions of neural computation, operationalizing the "Linear Representation Hypothesis" as a strict encoding standard.

I. Linear Semantic Alignment (LSA)

We treat latent activations not as amorphous tensors, but as discrete superpositions of semantic variables. A change in a concept (e.g., "Cat" to "Dog") manifests as a precise linear translation in activation space.

II. Hyper-Dimensional Feature Interleaving (HDFI)

The engine utilizes HDFI to maximize entropic efficiency. This state is defined by:

Overcompleteness: The feature basis exceeds the channel width ( $M > d$ ).
Non-Orthogonality: Feature vectors exhibit non-zero cosine similarity ( $\mathbf{v}_i^\top \mathbf{v}_j \neq 0$ ).

III. The Adversarial Manifold

We redefine "Adversarial Attacks" as Interference Exploits. An adversarial perturbation $\delta$ is a calculated input vector designed to maximize the Deterministic Interference term, pushing an activation across a decision threshold without semantic justification.

Interference Geometry Visualization — **Figure 2: The Interference Highway.** The adversary is not "tricking" the model; they are traversing a predictable path built by the model's own compression efficiency.

The "Kill Vector" Derivation

Our synthetic diagnostics confirm that adversaries do not inject random noise. Instead, they solve for specific Interference Normals.

We derived the exact form of the optimal adversarial perturbation ( $\delta^*$ ) in a linear superposition regime:

\delta^* \propto \mathbf{W}_e^\top (\mathbf{v}_k - \mathbf{v}_j)

This equation proves that the attack is strictly proportional to the interference between the target feature $\mathbf{v}_k$ and the current feature $\mathbf{v}_j$ . If the model's geometry is known, the attack is calculable, and therefore filterable.

Experimental Validation

We deployed this architecture on a standard Vision Transformer (ViT) chassis calibrated on CIFAR-10, utilizing a custom Interference Compression Module (ICM) to force superposition.

1. The Law of Inverse Robustness

Our results confirm a strict scaling law: As the bottleneck dimension $m$ decreases (increasing superposition pressure), the Normalized Robust Accuracy collapses. Denser packing reduces the margin for error, requiring smaller perturbations to bridge the distance between features.

2. Geometric Synchronization (Transferability)

Perhaps most critically, we found that input correlations constrain the lattice geometry.

Uncorrelated Inputs: Result in stochastic, unique lattices.
Correlated Inputs (Real World): Result in Deterministic Lattices. Independent models converge to nearly identical geometric arrangements.

This explains Attack Transferability: An attack generated on Model A succeeds on Model B because they share the same "Interference Highways."

Scaling Analysis — **Figure 3: Geometric Convergence.** Models trained on correlated data form identical interference patterns.

The Solution: Oblique-Guard

Instead of retraining the model to be "robust" (and less capable), we introduce the Oblique-Guard Layer.

This module treats the latent space as a known Interference Lattice. We map specific vector combinations where superposed features destructively interfere. Any input gradient attempting to traverse these specific high-interference corridors is flagged as an "Algorithmic Exploit" and zeroed out before activation.

By identifying inputs that align perfectly with our internal interference geometry, we effectively convert "Adversarial Vulnerability" from a liability into a unique digital signature of the attack itself.

Adversarial RobustnessLatent GeometrySuperpositionSafetyMechanistic Interpretability

Author

Ekjot Singh&Metanthropic

Director & Lead Researcher • Research Lab

Connect