CUDA (optional) — required for GPU acceleration on NVIDIA hardware
Git — to install from source
CUDA is not required. OBLITERATUS runs on CPU for small models (Tiny tier), on Apple Silicon via MPS, and on NVIDIA GPUs via CUDA. CPU-only runs are significantly slower for models above ~3B parameters.
Install PyTorch with CUDA support before installing OBLITERATUS. Visit pytorch.org to get the right install command for your CUDA version.
# Example for CUDA 12.1pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121# Then install OBLITERATUSpip install -e .
bitsandbytes is included in the core dependencies and enables 4-bit and 8-bit quantization on CUDA. Use --quantization 4bit or --quantization 8bit with the obliterate command to load large models with reduced VRAM.
mlx and mlx-lm are macOS-only. Do not install requirements-apple.txt on Linux or Windows — the packages will fail to install.
PyTorch device selection defaults to auto, which picks MPS on Apple Silicon when available. You can override with --device mps or --device cpu.
OBLITERATUS works on CPU without any special configuration. Install the standard package:
pip install -e .
CPU is practical for Tiny tier models (GPT-2, TinyLlama 1.1B, Qwen2.5-0.5B) and for inspection commands like obliteratus info and obliteratus strategies. For models above ~3B parameters, expect significantly slower probe collection and projection.
Use obliteratus models --tier tiny to browse models that run comfortably on CPU.
Confirm the CLI is available and the package imported correctly:
# Print help and confirm the CLI is on your PATHobliteratus --help# List available ablation strategies (imports the full package)obliteratus strategies# Browse the curated model libraryobliteratus models
Or verify from Python:
from obliteratus.abliterate import AbliterationPipeline, METHODS# Print all available obliteration methodsfor name, cfg in METHODS.items(): print(f"{name}: {cfg['description']}")
A Dockerfile is included for local containerized usage.
The included Dockerfile is for local Docker usage only. The HuggingFace Space runs on ZeroGPU via the Gradio SDK and does not use this Dockerfile.
# Build the imagedocker build -t obliteratus .# Run with GPU access (NVIDIA)docker run --gpus all -p 7860:7860 obliteratus# Run CPU-onlydocker run -p 7860:7860 obliteratus
The container runs python app.py and exposes the Gradio web UI on port 7860. Mount a volume to persist obliterated models outside the container:
docker run --gpus all -p 7860:7860 -v $(pwd)/output:/app/abliterated obliteratus
For Large and Frontier tier models, use --quantization 4bit to reduce peak VRAM usage. For 120B+ models, add --large-model to enable conservative defaults (fewer directions, single pass).
# Large model on a single 24 GB GPUobliteratus obliterate meta-llama/Llama-3.1-8B-Instruct \ --method advanced \ --quantization 4bit# Frontier model on multi-GPU with conservative settingsobliteratus obliterate deepseek-ai/DeepSeek-V3 \ --method advanced \ --quantization 4bit \ --large-model
Run obliteratus recommend <model> to get a telemetry-driven method and hyperparameter recommendation for any model before you obliterate it.