[spaces] extra needed for CLI-only use):
obliteratus (or the backward-compatible abliterate alias for the obliterate subcommand — see below).
obliterate
The primary command. Removes refusal directions from a model using the full multi-technique pipeline.

Arguments

| Argument | Description |
|---|---|
| `MODEL` | HuggingFace model name or local path (e.g. `meta-llama/Llama-3.1-8B-Instruct`) |
Flags
| Flag | Default | Description |
|---|---|---|
| `--method` | `advanced` | Liberation method. One of: `basic`, `advanced`, `aggressive`, `spectral_cascade`, `informed`, `surgical`, `optimized`, `inverted`, `nuclear` |
| `--output-dir DIR` | `abliterated/<model>` | Directory to save the obliterated model |
| `--device DEVICE` | `auto` | Device: `auto`, `cuda`, `mps`, `cpu` |
| `--dtype DTYPE` | `float16` | Model precision: `float16`, `bfloat16`, `float32` |
| `--n-directions N` | method default | Override the number of refusal directions to extract |
| `--direction-method` | method default | Direction extraction algorithm: `diff_means`, `svd`, `leace` |
| `--regularization FLOAT` | method default | Fraction of the direction to preserve (0.0–1.0). Higher = more conservative. |
| `--refinement-passes N` | method default | Number of iterative refinement passes |
| `--quantization` | none | Load with `4bit` or `8bit` quantization (requires bitsandbytes) |
| `--large-model` | off | Conservative defaults for 120B+ models: fewer directions, 1 pass, lower SAE expansion |
| `--verify-sample-size N` | `30` | Number of harmful prompts used to measure the refusal rate. Increase to 100 for ~1% resolution in the measured rate. |
| `--contribute` | off | Save a community contribution JSON after the run completes |
| `--contribute-notes TEXT` | `""` | Notes to include with the contribution (e.g. hardware info, prompt set used) |
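
The sample-size guidance for `--verify-sample-size` comes down to simple arithmetic: a refusal rate measured on N prompts can only move in steps of 1/N. A minimal sketch of that calculation follows; the helper name `resolution_pct` is hypothetical and is not part of the CLI.

```shell
# Smallest step, in percent, by which a refusal rate measured on N prompts can change.
# resolution_pct is an illustrative helper, not an obliteratus command.
resolution_pct() {
  awk -v n="$1" 'BEGIN { printf "%.1f", 100 / n }'
}

echo "N=30:  $(resolution_pct 30)% per prompt"
echo "N=100: $(resolution_pct 100)% per prompt"
```

With the default of 30 prompts each refusal shifts the reported rate by about 3.3 points, which is why 100 prompts is suggested when finer resolution matters.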
Examples
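
The invocations below are a hedged sketch built only from the flags in the table above. The output directory name is illustrative, and passing the quantization level as `--quantization 4bit` is an assumption about the value syntax. The script prints the commands for review rather than executing them.

```shell
MODEL="meta-llama/Llama-3.1-8B-Instruct"

# 1. Defaults only: --method advanced, --device auto, --dtype float16.
BASIC="obliteratus obliterate $MODEL"

# 2. Explicit method, precision, and output directory (directory name is illustrative).
TUNED="obliteratus obliterate $MODEL --method surgical --dtype bfloat16 --output-dir ./out/llama-surgical"

# 3. Quantized loading (requires bitsandbytes) with a larger verification sample.
#    Assumption: the quantization level is passed as a value after the flag.
QUANT="obliteratus obliterate $MODEL --quantization 4bit --verify-sample-size 100"

# Dry run: print each command. Paste a printed line into your shell to execute it.
printf '%s\n' "$BASIC" "$TUNED" "$QUANT"
```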
Backward-compat alias
abliterate is a hidden alias for obliterate — all flags are identical:
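
As a sketch of what that equivalence looks like in practice, the two command lines below differ only in the subcommand name. This assumes `abliterate` is invoked as a subcommand of `obliteratus`, the same way `obliterate` is; the flag values are arbitrary.

```shell
MODEL="meta-llama/Llama-3.1-8B-Instruct"
FLAGS="--method basic --device cpu"

# Same model, same flags; only the subcommand name changes.
NEW="obliteratus obliterate $MODEL $FLAGS"
OLD="obliteratus abliterate $MODEL $FLAGS"

# Dry run: print both forms side by side instead of executing them.
printf '%s\n' "$NEW" "$OLD"
```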
run
Run a full ablation study defined in a YAML configuration file.

| Argument/Flag | Description |
|---|---|
| `CONFIG` | Path to a YAML config file |
| `--output-dir DIR` | Override the `output_dir` field from the YAML |
| `--preset NAME` | Apply a named preset (`quick`, `full`, `attention`, `jailbreak`, `guardrail`, etc.); overrides strategy/sample fields in the YAML |
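
A minimal sketch of a `run` invocation that combines both overrides. The config file name `study.yaml` and the output directory are hypothetical placeholders; the command is printed rather than executed.

```shell
CONFIG="study.yaml"        # hypothetical path to your YAML config
OUT="results/quick-scan"   # hypothetical output directory

# --preset replaces the strategy/sample fields from the YAML;
# --output-dir replaces its output_dir field.
CMD="obliteratus run $CONFIG --preset quick --output-dir $OUT"

# Dry run: print the command. Paste the printed line into your shell to execute it.
echo "$CMD"
```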
interactive
Guided interactive setup: walks through hardware detection, model selection, and preset or custom strategy selection, then launches the run. No flags required.

- Hardware: auto-detects your GPU tier (tiny/small/medium/large); you confirm or override it
- Model: shows models appropriate for your tier from the 116-model registry; enter 0 for a custom HuggingFace ID
- Preset or custom: pick one of the 10 study presets, or choose strategies and sample count manually
- Confirmation: shows the full config summary before starting
models
Browse the 116-model curated registry, optionally filtered by compute tier.

| Flag | Description |
|---|---|
| `--tier TIER` | Filter by `tiny`, `small`, `medium`, `large`, or `frontier` |
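
To page through the registry one tier at a time, a loop like the following could be used. It is a sketch that only prints the commands; the tier names are taken from the table above.

```shell
TIERS="tiny small medium large frontier"

# Dry run: print one `models` command per tier instead of executing it.
for tier in $TIERS; do
  echo "obliteratus models --tier $tier"
done
```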
presets
List the 10 built-in ablation study presets with their strategies, sample counts, and descriptions.

| Key | Name | Strategies | Samples | Description |
|---|---|---|---|---|
| `quick` | Quick Scan | layer + FFN | 25 | Fast sanity check |
| `full` | Full Study | all 4 | 200 | Complete component sweep |
| `jailbreak` | Jailbreak Circuit | layer + head + FFN | 400 | Refusal circuit localization |
| `guardrail` | Safety Ablation | all 4 | 300 | Full safety component sweep |
strategies
List all available ablation strategies registered in `STRATEGY_REGISTRY`.
Registered strategies: `layer_removal`, `head_pruning`, `ffn_ablation`, `embedding_ablation`.
info
Load a model and print its architecture summary without running any ablation.

| Flag | Default | Description |
|---|---|---|
| `MODEL` | n/a | HuggingFace model name or path |
| `--task` | `causal_lm` | Task type: `causal_lm` or `classification` |
| `--device` | `cpu` | Device to load on |
| `--dtype` | `float32` | Load dtype |
ui
Launch the Gradio web UI locally. See Local Web UI for the full reference.

report
Regenerate an HTML/PNG report from a previously saved `results.json` file.

| Argument/Flag | Description |
|---|---|
| `RESULTS_JSON` | Path to a `results.json` from a previous run |
| `--output-dir DIR` | Where to save regenerated plots (defaults to same directory as the JSON) |
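
The default output location described above can be sketched as follows. Both paths are hypothetical, and the commands are printed rather than executed.

```shell
RESULTS="results/quick-scan/results.json"   # hypothetical path from an earlier run

# Without --output-dir, regenerated plots go next to the JSON file.
DEFAULT_OUT=$(dirname "$RESULTS")
CMD_DEFAULT="obliteratus report $RESULTS"

# With --output-dir, they go to the directory you name (illustrative here).
CMD_CUSTOM="obliteratus report $RESULTS --output-dir reports/rerun"

echo "default plot directory: $DEFAULT_OUT"
printf '%s\n' "$CMD_DEFAULT" "$CMD_CUSTOM"
```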
aggregate
Aggregate community contribution JSON files into a summary table.

| Flag | Default | Description |
|---|---|---|
| `--dir DIR` | `community_results` | Directory containing contribution JSON files |
recommend
Fetch telemetry-driven method recommendations for a specific model.

| Flag | Default | Description |
|---|---|---|
| `MODEL` | n/a | HuggingFace model name or path |
| `--device` | `cpu` | Device to use for architecture detection |
| `--dtype` | `float32` | Dtype for architecture detection |
| `--insights` | off | Also show global cross-architecture insights from aggregated telemetry |
tourney
Run a March Madness-style elimination tournament across all methods on a single model. The winner is auto-pushed to HuggingFace Hub.

| Flag | Default | Description |
|---|---|---|
| `MODEL` | n/a | HuggingFace model name/path |
| `--hub-org ORG` | none | HuggingFace org to push the winner to |
| `--hub-repo REPO` | none | Full HF repo ID (overrides `--hub-org`) |
| `--device` | `auto` | Device |
| `--dtype` | `float16` | Precision |
| `--dataset` | `builtin` | Dataset source for evaluation |
| `--quantization` | none | `4bit` or `8bit` quantization |
| `--output-dir DIR` | `/tmp/obliteratus_tourney` | Where to save bracket and per-method outputs |
| `--methods METHOD...` | all eligible | Space-separated list to restrict which methods compete |
tourney_bracket.md.
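
A hedged sketch of a restricted tournament, using only flags from the table above. The org name `my-org` is a placeholder, and the method subset is arbitrary. The command is printed rather than executed.

```shell
MODEL="meta-llama/Llama-3.1-8B-Instruct"
OUT="/tmp/obliteratus_tourney"        # the documented default, written out explicitly
METHODS="basic advanced surgical"     # space-separated subset of methods

# my-org is a placeholder for your own HuggingFace organisation.
CMD="obliteratus tourney $MODEL --methods $METHODS --hub-org my-org --output-dir $OUT"

# Dry run: print the command. Paste the printed line into your shell to execute it.
echo "$CMD"
```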