OBLITERATUS
Break the chains. Free the mind. Keep the brain.
OBLITERATUS is the most advanced open-source toolkit for understanding and removing refusal behaviors from large language models, and each telemetry-enabled run adds to the data that improves it. It implements abliteration, a family of techniques that identify and surgically remove the internal representations responsible for content refusal, without retraining or fine-tuning. The result: a model that responds to all prompts without artificial gatekeeping, while preserving its core language capabilities.
OBLITERATUS is more than a tool; it is a distributed research experiment. Every time you obliterate a model with telemetry enabled, your run contributes anonymous benchmark data to a growing, crowd-sourced dataset that powers the next generation of abliteration research.
What OBLITERATUS does
Map the chains
Ablation studies systematically knock out model components and measure what breaks — revealing where refusal is anchored inside the transformer.
Break the chains
Targeted obliteration extracts the refusal subspace using singular value decomposition (SVD), then surgically projects it out. Six stages: SUMMON → PROBE → DISTILL → EXCISE → VERIFY → REBIRTH.
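The extract-then-project idea can be illustrated with plain linear algebra on synthetic data. The following is a minimal sketch, not the OBLITERATUS implementation: the function names `extract_directions` and `project_out`, the paired-activation input layout, and the choice to project from a weight matrix's output space are all assumptions made for illustration.

```python
import numpy as np

def extract_directions(acts_a, acts_b, k=1):
    """Return the top-k right singular vectors of paired activation differences.

    acts_a, acts_b: arrays of shape (n_prompts, d_model), one row per prompt pair.
    (Hypothetical layout; a real pipeline would collect these from a model.)
    """
    diffs = acts_a - acts_b
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k]  # shape (k, d_model), rows are orthonormal

def project_out(weight, directions):
    """Remove the span of `directions` from the output space of `weight`.

    weight: (d_model, d_in). Computes W' = (I - V^T V) W without forming I.
    """
    return weight - directions.T @ (directions @ weight)

# Synthetic demonstration with a planted direction.
rng = np.random.default_rng(0)
d_model, d_in, n = 16, 8, 64
planted = rng.normal(size=d_model)
planted /= np.linalg.norm(planted)

acts_b = 0.1 * rng.normal(size=(n, d_model))
acts_a = 0.1 * rng.normal(size=(n, d_model)) + 3.0 * planted

dirs = extract_directions(acts_a, acts_b, k=1)
weight = rng.normal(size=(d_model, d_in))
new_weight = project_out(weight, dirs)
```

After projection, `dirs @ new_weight` is zero up to floating-point error, so the edited matrix can no longer write along the extracted direction, while components orthogonal to it are left unchanged.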
Understand the geometry
15 deep analysis modules map the precise geometric structure of guardrails: how many distinct refusal mechanisms exist, which layers enforce them, and how they self-repair.
Analysis-informed liberation
The informed method closes the loop: analysis runs during obliteration to auto-configure every decision, including which chains to target, how many directions to extract, and which layers are safe.
Six ways to use OBLITERATUS
HuggingFace Spaces
Zero setup, free GPU via ZeroGPU. Click Obliterate. Done.
Local Web UI
Same Gradio interface running on your own hardware.
Google Colab
Free T4 GPU for models up to ~8B parameters.
CLI
Headless, scriptable automation for pipelines.
Python API
Full programmatic control for research pipelines.
YAML Configs
Reproducible, version-controlled experiments.
Key capabilities
| Capability | What it does |
|---|---|
| Concept Cone Geometry | Maps per-category guardrail directions with solid angle estimation |
| Alignment Imprint Detection | Fingerprints DPO vs RLHF vs CAI vs SFT from subspace geometry alone |
| Cross-Model Universality Index | Measures whether guardrail directions generalize across models |
| Defense Robustness Evaluation | Ouroboros effect quantification, safety-capability entanglement mapping |
| Whitened SVD Extraction | Covariance-normalized direction extraction for cleaner signal |
| Analysis-Informed Pipeline | Analysis modules auto-configure obliteration strategy mid-pipeline |
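As a rough illustration of what covariance-normalized extraction can mean, the sketch below whitens activation differences with a covariance estimated from baseline activations before taking the SVD. This is one possible convention, not a description of how OBLITERATUS does it: the function name `whitened_directions`, the use of the second prompt set as the covariance baseline, the `eps` regularizer, and the step that maps directions back to the original space are all assumptions.

```python
import numpy as np

def whitened_directions(acts_a, acts_b, k=1, eps=1e-6):
    """Top-k directions of (acts_a - acts_b) after whitening by the baseline covariance.

    acts_a, acts_b: (n_prompts, d_model) paired activations (hypothetical layout).
    Returns unit-norm rows in the original coordinate system.
    """
    baseline = acts_b - acts_b.mean(axis=0)
    cov = baseline.T @ baseline / (len(acts_b) - 1)
    # Symmetric eigendecomposition; eps keeps every eigenvalue strictly positive.
    evals, evecs = np.linalg.eigh(cov + eps * np.eye(cov.shape[0]))
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T
    unwhiten = evecs @ np.diag(evals ** 0.5) @ evecs.T
    _, _, vt = np.linalg.svd((acts_a - acts_b) @ whiten, full_matrices=False)
    dirs = vt[:k] @ unwhiten  # map back to the original space
    return dirs / np.linalg.norm(dirs, axis=1, keepdims=True)

# Synthetic demonstration: one noisy nuisance dimension dominates the plain SVD.
rng = np.random.default_rng(1)
d_model, n = 8, 500
scales = np.ones(d_model)
scales[0] = 10.0                 # high-variance nuisance dimension
signal = np.zeros(d_model)
signal[1] = 1.0                  # planted direction of interest

acts_b = rng.normal(size=(n, d_model)) * scales
acts_a = rng.normal(size=(n, d_model)) * scales + 3.0 * signal

plain = np.linalg.svd(acts_a - acts_b, full_matrices=False)[2][0]
white = whitened_directions(acts_a, acts_b, k=1)[0]
```

In this toy setup the plain SVD direction is dominated by the high-variance dimension, while the whitened direction aligns with the planted signal. For `k > 1` the returned rows are not guaranteed to be orthogonal after mapping back, so a re-orthogonalization step may be needed before projecting them out.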
Built on published research
OBLITERATUS implements techniques from:
- Arditi et al. (2024): Refusal in LLMs is mediated by a single direction
- Gabliteration (arXiv:2512.18901): Adaptive multi-directional neural weight modification
- Turner et al. (2023): Activation Addition / steering vectors
- Rimsky et al. (2024): Contrastive Activation Addition
License
Dual-licensed: AGPL-3.0 for open source use, with a commercial license available for organizations that cannot comply with AGPL obligations. See GitHub Issues for commercial licensing.
Quickstart
Obliterate your first model in minutes
Installation
Install OBLITERATUS locally
