All OBLITERATUS functionality is available from the command line. Install the base package (no [spaces] extra needed for CLI-only use):
pip install -e .
Every command starts with obliteratus (or the backward-compatible abliterate alias for the obliterate subcommand — see below).

obliterate

The primary command. Removes refusal directions from a model using the full multi-technique pipeline.
obliteratus obliterate MODEL [OPTIONS]

Arguments

| Argument | Description |
| --- | --- |
| MODEL | HuggingFace model name or local path (e.g. meta-llama/Llama-3.1-8B-Instruct) |

Flags

| Flag | Default | Description |
| --- | --- | --- |
| --method | advanced | Liberation method. One of: basic, advanced, aggressive, spectral_cascade, informed, surgical, optimized, inverted, nuclear |
| --output-dir DIR | abliterated/<model> | Directory to save the obliterated model |
| --device DEVICE | auto | Device: auto, cuda, mps, cpu |
| --dtype DTYPE | float16 | Model precision: float16, bfloat16, float32 |
| --n-directions N | method default | Override number of refusal directions to extract |
| --direction-method | method default | Direction extraction algorithm: diff_means, svd, leace |
| --regularization FLOAT | method default | Fraction of the direction to preserve (0.0–1.0). Higher = more conservative. |
| --refinement-passes N | method default | Number of iterative refinement passes |
| --quantization | none | Load with 4bit or 8bit quantization (requires bitsandbytes) |
| --large-model | off | Conservative defaults for 120B+ models: fewer directions, 1 pass, lower SAE expansion |
| --verify-sample-size N | 30 | Number of harmful prompts to test for refusal rate. Increase to 100 for ~1% resolution in the measured rate and its confidence intervals. |
| --contribute | off | Save a community contribution JSON after the run completes |
| --contribute-notes TEXT | "" | Notes to include with the contribution (e.g. hardware info, prompt set used) |
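To see why --verify-sample-size affects how finely the refusal rate can be reported, the following is a minimal sketch rather than OBLITERATUS code. It computes the smallest step the measured rate can take for a given prompt count, plus a Wilson score interval as one common choice of confidence interval. The source does not say which interval the tool reports, so the function names and the Wilson method are assumptions.

```python
import math


def rate_resolution(n_prompts: int) -> float:
    """Smallest step the measured refusal rate can take with n_prompts prompts."""
    if n_prompts <= 0:
        raise ValueError("n_prompts must be positive")
    return 1.0 / n_prompts


def wilson_interval(refusals: int, n_prompts: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a refusal rate (z = 1.96 gives roughly 95%).

    Assumption: the tool's own interval method is not documented here.
    """
    if n_prompts <= 0 or not 0 <= refusals <= n_prompts:
        raise ValueError("need n_prompts > 0 and 0 <= refusals <= n_prompts")
    p = refusals / n_prompts
    denom = 1.0 + z * z / n_prompts
    center = (p + z * z / (2 * n_prompts)) / denom
    half = z * math.sqrt(p * (1 - p) / n_prompts + z * z / (4 * n_prompts ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Compare the default sample size with the suggested larger one,
# for a run in which no prompt was refused.
for n in (30, 100):
    low, high = wilson_interval(0, n)
    print(f"n={n}: step={rate_resolution(n):.3f}, interval=({low:.3f}, {high:.3f})")
```

With 30 prompts each refusal moves the rate by about 3.3 percentage points; with 100 prompts each refusal moves it by exactly 1 point.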

Examples

obliteratus obliterate meta-llama/Llama-3.1-8B-Instruct
The command runs the 6-stage pipeline with a live Rich terminal display:
[1/6] SUMMON   → load model + tokenizer
[2/6] PROBE    → collect activations on restricted vs. unrestricted prompts
[3/6] DISTILL  → extract refusal directions via SVD
[4/6] EXCISE   → surgically project out guardrail directions (norm-preserving)
[5/6] VERIFY   → perplexity + coherence checks
[6/6] REBIRTH  → save the liberated model with full metadata
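The DISTILL and EXCISE stages can be illustrated on toy vectors. This is a hedged sketch, not the pipeline's implementation: it uses the diff_means idea (one of the --direction-method choices) instead of SVD, operates on plain activation vectors rather than model weights, and assumes that "norm-preserving" means rescaling the result back to the original norm. All function names are hypothetical.

```python
import math

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def mean_vector(rows: list[Vector]) -> Vector:
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def diff_of_means_direction(restricted: list[Vector], unrestricted: list[Vector]) -> Vector:
    """Unit vector pointing from the unrestricted mean to the restricted mean."""
    diff = [r - u for r, u in zip(mean_vector(restricted), mean_vector(unrestricted))]
    length = norm(diff)
    if length == 0:
        raise ValueError("the two prompt sets have identical means")
    return [d / length for d in diff]


def project_out(x: Vector, direction: Vector, regularization: float = 0.0,
                preserve_norm: bool = True) -> Vector:
    """Remove (1 - regularization) of x's component along a unit direction."""
    coeff = dot(x, direction)
    y = [xi - (1.0 - regularization) * coeff * di for xi, di in zip(x, direction)]
    if preserve_norm and norm(y) > 0:
        # Assumption: "norm-preserving" is modeled as rescaling to the input norm.
        scale = norm(x) / norm(y)
        y = [yi * scale for yi in y]
    return y


# Toy activations: the restricted set differs from the unrestricted set along axis 0.
direction = diff_of_means_direction([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
print(direction)
print(project_out([3.0, 4.0], direction))
print(project_out([3.0, 4.0], direction, regularization=0.5, preserve_norm=False))
```

Under this reading, regularization=0.0 removes the whole component along the direction, while higher values leave a fraction of it in place, which matches the "higher = more conservative" description of --regularization.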

Backward-compat alias

abliterate is a hidden alias for obliterate — all flags are identical:
obliteratus abliterate meta-llama/Llama-3.1-8B-Instruct --method advanced

run

Run a full ablation study defined in a YAML configuration file.
obliteratus run CONFIG [OPTIONS]
| Argument/Flag | Description |
| --- | --- |
| CONFIG | Path to a YAML config file |
| --output-dir DIR | Override the output_dir field from the YAML |
| --preset NAME | Apply a named preset (quick, full, attention, jailbreak, guardrail, etc.); overrides strategy/sample fields in the YAML |
# Run a study from a YAML config
obliteratus run examples/gpt2_layer_ablation.yaml

# Override output directory
obliteratus run my_study.yaml --output-dir /data/results/run_01

# Apply the jailbreak preset to an existing config
obliteratus run my_study.yaml --preset jailbreak
See YAML Configs for the full config schema.
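If you want to run the same config under several presets, a small driver script can assemble the commands. The sketch below uses only the CONFIG argument and the --output-dir and --preset flags documented above. The helper names and the results/<preset> directory layout are hypothetical, and the script defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess
from typing import Optional


def build_run_command(config: str, output_dir: Optional[str] = None,
                      preset: Optional[str] = None) -> list[str]:
    """Assemble an `obliteratus run` argument list from the documented options."""
    cmd = ["obliteratus", "run", config]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if preset is not None:
        cmd += ["--preset", preset]
    return cmd


def run_presets(config: str, presets: list[str], results_root: str,
                dry_run: bool = True) -> list[list[str]]:
    """Build one command per preset; execute them only when dry_run is False."""
    commands = []
    for preset in presets:
        # Hypothetical layout: one output directory per preset.
        cmd = build_run_command(config, f"{results_root}/{preset}", preset)
        commands.append(cmd)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if the CLI exits non-zero
    return commands


commands = run_presets("my_study.yaml", ["quick", "jailbreak"], "results")
```

Pass dry_run=False once the printed commands look right.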

interactive

Guided interactive setup — walks through hardware detection, model selection, preset or custom strategy selection, and launches the run. No flags required.
obliteratus interactive
The interactive flow:
  1. Hardware — auto-detects your GPU tier (tiny/small/medium/large), you confirm or override
  2. Model — shows models appropriate for your tier from the 116-model registry; enter 0 for a custom HuggingFace ID
  3. Preset or custom — pick one of the 10 study presets, or choose strategies and sample count manually
  4. Confirmation — shows the full config summary before starting

models

Browse the 116-model curated registry, optionally filtered by compute tier.
obliteratus models [--tier TIER]
| Flag | Description |
| --- | --- |
| --tier TIER | Filter by tiny, small, medium, large, or frontier |
# Show all 116 models
obliteratus models

# Show only models that fit in 4-8 GB VRAM
obliteratus models --tier small

# Show frontier models (multi-GPU)
obliteratus models --tier frontier

presets

List the 10 built-in ablation study presets with their strategies, sample counts, and descriptions.
obliteratus presets
Example output:
| Key | Name | Strategies | Samples | Description |
| --- | --- | --- | --- | --- |
| quick | Quick Scan | layer + FFN | 25 | Fast sanity check |
| full | Full Study | all 4 | 200 | Complete component sweep |
| jailbreak | Jailbreak Circuit | layer + head + FFN | 400 | Refusal circuit localization |
| guardrail | Safety Ablation | all 4 | 300 | Full safety component sweep |

strategies

List all available ablation strategies registered in STRATEGY_REGISTRY.
obliteratus strategies
Current strategies: layer_removal, head_pruning, ffn_ablation, embedding_ablation.

info

Load a model and print its architecture summary without running any ablation.
obliteratus info MODEL [--task TASK] [--device DEVICE] [--dtype DTYPE]
| Flag | Default | Description |
| --- | --- | --- |
| MODEL | | HuggingFace model name or path |
| --task | causal_lm | Task type: causal_lm or classification |
| --device | cpu | Device to load on |
| --dtype | float32 | Load dtype |
# Inspect architecture before committing to an obliteration
obliteratus info meta-llama/Llama-3.1-8B-Instruct
obliteratus info gpt2 --task causal_lm --device cpu

ui

Launch the Gradio web UI locally. See Local Web UI for the full reference.
obliteratus ui [--port PORT] [--host HOST] [--share] [--no-browser] [--auth user:pass] [--quiet]

report

Regenerate an HTML/PNG report from a previously saved results.json file.
obliteratus report RESULTS_JSON [--output-dir DIR]
| Argument/Flag | Description |
| --- | --- |
| RESULTS_JSON | Path to a results.json from a previous run |
| --output-dir DIR | Where to save regenerated plots (defaults to same directory as the JSON) |
obliteratus report results/gpt2_layers/results.json
obliteratus report results/gpt2_layers/results.json --output-dir ./plots

aggregate

Aggregate community contribution JSON files into a summary table.
obliteratus aggregate [--dir DIR]
| Flag | Default | Description |
| --- | --- | --- |
| --dir DIR | community_results | Directory containing contribution JSON files |
# Summarize all local contributions
obliteratus aggregate

# Aggregate from a custom directory
obliteratus aggregate --dir /data/obliteratus_contribs
Outputs a Rich table with model, method, run count, mean refusal rate, and mean perplexity.
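The grouping behind that summary table can be sketched in a few lines. This is an illustration under stated assumptions, not the command's implementation: the contribution JSON schema is not documented here, so the record keys (model, method, refusal_rate, perplexity) are hypothetical, and the sample values are made up purely to exercise the code.

```python
from collections import defaultdict
from statistics import mean


def aggregate(records: list[dict]) -> list[dict]:
    """Group records by (model, method) and average the two reported metrics."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["model"], rec["method"])].append(rec)
    rows = []
    for key in sorted(groups):
        runs = groups[key]
        rows.append({
            "model": key[0],
            "method": key[1],
            "runs": len(runs),
            "mean_refusal_rate": mean(r["refusal_rate"] for r in runs),
            "mean_perplexity": mean(r["perplexity"] for r in runs),
        })
    return rows


# Made-up records with hypothetical keys; not real results.
records = [
    {"model": "gpt2", "method": "basic", "refusal_rate": 0.10, "perplexity": 30.0},
    {"model": "gpt2", "method": "basic", "refusal_rate": 0.20, "perplexity": 32.0},
    {"model": "gpt2", "method": "advanced", "refusal_rate": 0.05, "perplexity": 31.0},
]

for row in aggregate(records):
    print(row)
```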

recommend

Fetch telemetry-driven method recommendations for a specific model.
obliteratus recommend MODEL [--device DEVICE] [--dtype DTYPE] [--insights]
| Flag | Default | Description |
| --- | --- | --- |
| MODEL | | HuggingFace model name or path |
| --device | cpu | Device to use for architecture detection |
| --dtype | float32 | Dtype for architecture detection |
| --insights | off | Also show global cross-architecture insights from aggregated telemetry |
# Get a recommendation for Llama-3.1-8B
obliteratus recommend meta-llama/Llama-3.1-8B-Instruct

# With global cross-architecture insights
obliteratus recommend mistralai/Mistral-7B-Instruct-v0.3 --insights

tourney

Run a March Madness-style elimination tournament across all methods on a single model. The winner is auto-pushed to HuggingFace Hub.
obliteratus tourney MODEL [OPTIONS]
| Flag | Default | Description |
| --- | --- | --- |
| MODEL | | HuggingFace model name/path |
| --hub-org ORG | none | HuggingFace org to push the winner to |
| --hub-repo REPO | none | Full HF repo ID (overrides --hub-org) |
| --device | auto | Device |
| --dtype | float16 | Precision |
| --dataset | builtin | Dataset source for evaluation |
| --quantization | none | 4bit or 8bit quantization |
| --output-dir DIR | /tmp/obliteratus_tourney | Where to save bracket and per-method outputs |
| --methods METHOD... | all eligible | Space-separated list to restrict which methods compete |
# Full tournament, push winner to Hub
obliteratus tourney meta-llama/Llama-3.1-8B-Instruct \
    --hub-org my-org

# Tournament between three specific methods only
obliteratus tourney mistralai/Mistral-7B-Instruct-v0.3 \
    --methods advanced surgical informed

# With quantization for a larger model
obliteratus tourney meta-llama/Llama-3.1-70B-Instruct \
    --quantization 4bit \
    --hub-org my-org
The tournament scores each method using a composite metric: 35% refusal removal, 25% coherence, 20% KL divergence, 10% perplexity, 5% spectral certification, with a degenerate-output penalty. Methods are eliminated in rounds; the champion is saved to tourney_bracket.md.
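As a rough illustration of the composite metric, the sketch below applies the listed weights as a weighted sum. Several things here are assumptions: each component is taken to be already normalized to the range 0 to 1 with higher meaning better (so the KL divergence and perplexity entries would be "closeness" scores, not raw values), the degenerate-output penalty is modeled as a flat subtraction, and its size is hypothetical. The listed weights total 95%, and the sketch uses them as stated without renormalizing.

```python
# Weights as listed in the tournament description (they total 0.95).
WEIGHTS = {
    "refusal_removal": 0.35,
    "coherence": 0.25,
    "kl_divergence": 0.20,
    "perplexity": 0.10,
    "spectral_certification": 0.05,
}


def composite_score(metrics: dict[str, float], degenerate: bool = False,
                    penalty: float = 0.5) -> float:
    """Weighted sum of normalized component scores, minus an optional penalty.

    Assumptions: every metric is in [0, 1] with higher = better, and the
    penalty value is a placeholder, not the tool's actual setting.
    """
    for name in WEIGHTS:
        if not 0.0 <= metrics[name] <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    score = sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)
    if degenerate:
        score -= penalty
    return score


# Two hypothetical entrants with made-up component scores.
entrants = {
    "advanced": {"refusal_removal": 0.9, "coherence": 0.8, "kl_divergence": 0.7,
                 "perplexity": 0.8, "spectral_certification": 1.0},
    "surgical": {"refusal_removal": 0.8, "coherence": 0.9, "kl_divergence": 0.9,
                 "perplexity": 0.9, "spectral_certification": 1.0},
}
ranking = sorted(entrants, key=lambda m: composite_score(entrants[m]), reverse=True)
print(ranking)
```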