Research & Writing

Publications.

Selected preprints, conference papers, and technical notes on the thermodynamics of intelligence and mechanistic interpretability.

Primary Affiliation

Metanthropic Research

I publish the majority of my formal research under the Metanthropic charter. Access our full archives, including safety evaluations and interpretability logs.

Visit Lab Archives

Metanthropic Research, 2026

SPECIFICATION: Metanthropic Neural Ablation via Attention Refraction (M-NAAR)

Introduces M-NAAR to resolve the 'Unlearning Trilemma.' By refracting attention away from high-entropy tokens rather than destroying weights, we report a hallucination rate of 0.00 and robust deletion without lobotomizing the model.

Machine Unlearning, Safety, Attention Mechanism

Metanthropic Research, 2026

Specification for Latent Logic Topology & Soundness-Aware Calibration

Operationalizes LLMs as engines of 'Latent Causal Chains' to solve the RLVR Convergence Paradox. Introduces the Soundness-Aware Level (SAL), a microscopic metric that predicts post-alignment reasoning performance with 87% accuracy.

RLVR, Latent Topology, SAEs, Scaling Laws

Metanthropic Research, 2026

The Kinetic-Potential Information Disentanglement Protocol (KP-IDP)

Invalidates the dangerous conflation of decodability with causality. Introduces KP-IDP to distinguish between 'Dark Computation' (Kinetic) and 'Phantom Readouts' (Potential), solving the intervention-reversal paradox.

Causal Inference, Model Steering, Safety

Metanthropic Research, 2026

Module 003-CFG: Chronometric Flux Gating

A dynamic regularization protocol that eliminates Latent Manifold Collapse in Sparse Autoencoders. By treating feature importance as a temporal trajectory, CFG reduces feature absorption by 95% compared to Top-K baselines.

Sparse Autoencoders, Interpretability, Manifold Stabilization

Metanthropic Research, 2026

PROJECT OBLIQUE-GUARD: Latent Geometry Stabilization

Demonstrates that adversarial vulnerability is a deterministic artifact of Superposition. Introduces the Oblique-Guard Layer to filter geometric exploits by treating them as unique digital signatures within the interference lattice.

Adversarial Robustness, Latent Geometry, Superposition

Metanthropic Research, 2026

Analysing Moral Bias in Finetuned LLMs through Mechanistic Interpretability

Proves that supervised fine-tuning (SFT) introduces the 'Knobe Effect', a moral asymmetry in which negative outcomes are judged as more intentional than positive ones. Proposes surgical Iso-Semantic Residual Injection (ISRI) to restore logical neutrality without degrading general reasoning.

Mechanistic Interpretability, AI Ethics, Alignment

Metanthropic Research, 2026

Arvi 20B: Democratizing Reasoning with Efficient MoEs

An open-weight Mixture-of-Experts reasoning model. With 20.9B total parameters and only 3.6B active parameters, it rivals frontier models on math, coding, and agentic benchmarks through variable-effort reasoning.

Mixture-of-Experts, Reasoning, Open Weights

Metanthropic Research, 2026

MahenOCR: Commercial-Grade OCR with a 1B Parameter VLM

A 1B-parameter vision-language model achieving state-of-the-art OCR through a unified end-to-end architecture. Uses Reinforcement Learning with Verifiable Rewards (RLVR) to eliminate the error propagation of cascaded modules.

OCR, Vision-Language Models, Reinforcement Learning

Metanthropic Research, 2025

The Fragility of Guardrails: Cognitive Jamming and Repetition Collapse in Safety-Steered LLMs

A mechanistic audit of LLM residual streams using Sparse Autoencoders (SAEs). Demonstrates that aggressive safety-steering vectors often interfere with latent world-modeling circuits, triggering 'Cognitive Jamming'.

Mechanistic Interpretability, Safety, Physics

Metanthropic Research, 2025

Dataset Distillation for the Pre-Training Era

Introduces Linear Gradient Matching (LGM) to condense massive datasets into a single synthetic image per class, revealing shared 'Platonic' representations across foundation models (CLIP, DINO-v2).

Generative Vision, Foundation Models, Distillation

Milestone, 2025

Announcing Metanthropic

Founding declaration of Metanthropic, a frontier research institution architecting deterministic AI systems where safety and reasoning are verifiable, intrinsic properties of intelligence.

Milestone, Company, Founding