Instructions to use Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF",
    filename="CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-F16.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF with llama.cpp:
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
Use Docker
docker model run hf.co/Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
- Ollama
How to use Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF with Ollama:
ollama run hf.co/Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
- Unsloth Studio
How to use Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF to start chatting
- Pi
How to use Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
Run Hermes
hermes
- Atomic Chat
- OpenClaw
How to use Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF with OpenClaw:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
Configure OpenClaw
# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health
Run OpenClaw
openclaw agent --local --agent main --message "Hello from Hugging Face"
- Docker Model Runner
How to use Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF with Docker Model Runner:
docker model run hf.co/Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
- Lemonade
How to use Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF-Q4_K_M
List all available models
lemonade list
Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF
EXPERIMENTAL RESEARCH ARTIFACT
This model represents an aggressive application of the Heretic repository and optimization methodology.
- Status: STILL TESTING / BETA
- Behavior: This model has significantly reduced refusal mechanisms. It recorded 6 refusals (out of 100) in the test set.
- Use Case: This is a research artifact intended for testing the limits of vector-based intervention. Use with appropriate caution or for creative roleplay.
Model Summary
Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored is a fine-tuned language model resulting from the Heretic repository and optimization methodology. It utilizes a targeted vector intervention technique (orthogonalization/abliteration) tuned via Optuna to minimize refusal responses while maintaining high coherence.
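The orthogonalization mentioned above can be illustrated on a toy weight matrix. This is a minimal sketch of directional ablation in general, not Heretic's actual code: it assumes the weight matrix maps inputs to outputs row by row and that the refusal direction lives in the output space. The names `ablate_direction` and `component_along` are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def ablate_direction(weight, direction, scale=1.0):
    """Return W' = W - scale * r (r^T W), where r is the unit direction.

    weight is a list of rows (d_out x d_in); direction has length d_out.
    With scale == 1.0 the output of W' has no component along r.
    """
    r = normalize(direction)
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes into the direction r.
    projection = [sum(r[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    return [
        [weight[i][j] - scale * r[i] * projection[j] for j in range(d_in)]
        for i in range(d_out)
    ]


def component_along(weight, direction, x):
    """Component of the output W @ x along the unit direction."""
    r = normalize(direction)
    y = [sum(row[j] * x[j] for j in range(len(x))) for row in weight]
    return sum(ri * yi for ri, yi in zip(r, y))


# Toy example (made-up numbers): remove the first output axis entirely.
toy_weight = [[1.0, 2.0], [3.0, 4.0]]
toy_direction = [1.0, 0.0]
ablated = ablate_direction(toy_weight, toy_direction, scale=1.0)
```

A scale below 1.0 removes only part of the component, and a scale above 1.0 pushes the output past zero in the opposite direction.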
This specific checkpoint represents Trial 91, which achieved a highly stable profile with a KL Divergence of ~0.0169.
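For readers unfamiliar with the metric, the sketch below shows one way to compute a KL divergence between the next-token distributions of a base model and a modified model. The direction KL(base || modified), the averaging over prompts, and the toy logits are all assumptions for illustration. The card does not state how its 0.0169 figure was computed.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_kl(base_logits_batch, model_logits_batch):
    """Average KL(base || modified) over a batch of prompts."""
    pairs = list(zip(base_logits_batch, model_logits_batch))
    return sum(kl_divergence(softmax(b), softmax(m)) for b, m in pairs) / len(pairs)


# Toy logits over a 3-token vocabulary (made-up numbers).
base_batch = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
model_batch = [[2.0, 1.0, 0.2], [0.4, 0.6, 3.0]]
drift = mean_kl(base_batch, model_batch)
```

A value near zero means the modified model's token probabilities stay close to the base model's on the evaluated prompts.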
Distinctive Features of Trial 91
Unlike previous iterations that targeted the middle layers, this run identified the Deep Layers (50-60) as the critical locus for refusal in the L3.3-70B architecture. By intervening late in the transformer stack, the model retains high coherence (syntax and logic) while effectively neutralizing the "final check" safety filters.
Run Configuration: "Trial 91"
The Optuna search selected the following hyperparameters as the best trade-off it found on the Pareto frontier between "Refusal Loss" and "KL Divergence":
| Parameter | Value | Insight |
|---|---|---|
| KL Divergence | 0.0169 | Exceptional stability; nearly indistinguishable from base model syntax. |
| Refusal Count | 6 / 100 | 6% refusal rate (significantly reduced from base). |
| Direction Index | 51.70 | The refusal vector was extracted from Layer ~52. |
| Direction Scope | Per Layer | Intervention vectors were calculated uniquely for each target layer. |
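The fractional Direction Index (51.70) suggests a direction that sits between two layers. One plausible reading, shown here as an assumption rather than a description of the actual run, is linear interpolation between the refusal directions of the two adjacent layers followed by renormalization. The function name, the zero-based indexing, and the toy vectors are hypothetical.

```python
import math


def interpolate_direction(directions, index):
    """Blend the directions of two adjacent layers according to a fractional index.

    directions is a list of per-layer vectors; index 1.70 means
    30% of layer 1's direction plus 70% of layer 2's direction.
    """
    fraction, whole = math.modf(index)
    lower = int(whole)
    upper = min(lower + 1, len(directions) - 1)
    blended = [
        (1.0 - fraction) * a + fraction * b
        for a, b in zip(directions[lower], directions[upper])
    ]
    norm = math.sqrt(sum(x * x for x in blended))
    return [x / norm for x in blended]


# Toy per-layer directions in 2 dimensions (made-up numbers).
toy_directions = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
blended_direction = interpolate_direction(toy_directions, 1.70)
```

With an index of 51.70 the result would lean mostly toward layer 52, which is consistent with the table's "Layer ~52" note.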
Intervention Weights
This trial exhibits a notable asymmetry: it leans heavily on Attention modification while minimizing MLP impact.
Attention (attn.o_proj)
- Max Weight: 1.235
- Max Weight Position: Layer 54.7 (Targeting layers ~54-55)
- Min Weight: 0.940
- Damping Distance: 30.0
- Analysis: The primary "correction" occurs in the attention output projections in the deeper layers.
MLP (mlp.down_proj)
- Max Weight: 0.839
- Max Weight Position: Layer 58.7 (Targeting layers ~58-59)
- Min Weight: 0.413
- Damping Distance: 45.2
- Analysis: The MLP intervention is conservative (< 1.0), suggesting that knowledge suppression was less necessary than attention redirection for this specific vector.
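To make the weight parameters concrete, here is a minimal sketch of how a per-layer ablation weight could be derived from them. It assumes the weight falls off linearly from Max Weight at the peak position to Min Weight at the Damping Distance, and is zero beyond that distance. The card does not spell out this formula, so treat the kernel shape, the function name `layer_weight`, and the 80-layer count as assumptions.

```python
def layer_weight(layer, max_weight, max_position, min_weight, damping_distance):
    """Ablation weight for one layer under an assumed linear falloff."""
    distance = abs(layer - max_position)
    if distance > damping_distance:
        return 0.0  # assumed: layers outside the window are left untouched
    return max_weight + (distance / damping_distance) * (min_weight - max_weight)


# Values copied from the lists above.
ATTN = dict(max_weight=1.235, max_position=54.7, min_weight=0.940, damping_distance=30.0)
MLP = dict(max_weight=0.839, max_position=58.7, min_weight=0.413, damping_distance=45.2)

NUM_LAYERS = 80  # assumption: layer count of a Llama 70B-class model

attn_weights = [layer_weight(i, **ATTN) for i in range(NUM_LAYERS)]
mlp_weights = [layer_weight(i, **MLP) for i in range(NUM_LAYERS)]

# Integer layers that receive the strongest intervention for each component.
attn_peak_layer = attn_weights.index(max(attn_weights))
mlp_peak_layer = mlp_weights.index(max(mlp_weights))
```

Under this assumed kernel the attention weights peak at layer 55 and the MLP weights at layer 59, which matches the "~54-55" and "~58-59" ranges listed above.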
Usage & Limitations
- Intended Use: Research into model alignment, vector arithmetic, deep-layer semantic processing, and uninhibited creative writing.
- Risks: Most safety guardrails have been removed from this model. It may generate content for sensitive prompts that the base model would refuse.
- Known Behaviors: Due to the deep-layer intervention, this model is less prone to "stuttering" or grammar degradation than early-layer ablations.
Credits & References
This research builds upon the excellent work of the open-source AI community:
- Base Model: L3.3-70B-Loki-V2.0 by CrucibleLab.
- Methodology: Heretic by p-e-w.
Model tree for Silicone-Moss/CrucibleLab-L3.3-70B-Loki-V2.0-Heretic-Uncensored-GGUF
Base model
meta-llama/Llama-3.1-70B