Gemma 4 E2B-IT: Kali NetHunter Pentest LoRA
LoRA adapters for mlx-community/gemma-4-e2b-it-4bit, fine-tuned on Kali NetHunter penetration testing data for use on a rooted OnePlus 8T.
What it does
The adapter teaches the model to respond like an expert pentester, with structured output:
- Nmap scan analysis with risk-rated tables
- Attack plans with exact bash commands
- WiFi, SMB, DNS enumeration workflows
- NetHunter + Termux specific tooling
Training
- Base model: mlx-community/gemma-4-e2b-it-4bit (Gemma 4 E2B instruction-tuned, 4-bit quantized)
- Method: LoRA (rank 8, alpha 16, 4 layers)
- Data: 18 pentest examples + 2 validation (chat format with system/user/assistant)
- Iterations: 200 @ batch_size=1, lr=1e-5, grad_checkpoint=true
- Hardware: Apple Silicon 8GB (peak memory: 4.8GB)
- Final loss: Train 0.54, Val 2.13
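The chat format listed under Data can be sketched as one JSON object per line, each holding a `messages` list. This is a minimal sketch, assuming the `{"messages": [...]}` JSONL layout that mlx-lm's LoRA trainer accepts for chat data. The helpers `make_record` and `write_chat_jsonl`, the file name, and the assistant text are hypothetical placeholders, and the system prompt is borrowed from the Usage section purely for illustration; none of this reproduces the actual training records.

```python
import json
import tempfile
from pathlib import Path

# Assumption: the system prompt from the Usage section, reused for illustration.
SYSTEM_PROMPT = (
    "Expert pentester on rooted OnePlus 8T with Kali NetHunter + Termux. "
    "Give exact commands. Be concise."
)


def make_record(user_text, assistant_text):
    """Build one chat-format record with system, user, and assistant turns."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": assistant_text},
        ]
    }


def write_chat_jsonl(records, path):
    """Write records as JSON Lines: one JSON object per line."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


# Hypothetical placeholder record, not taken from the real dataset.
records = [
    make_record(
        "Generate an attack plan for SMB",
        "<placeholder: structured plan with commands>",
    )
]

# Write into a temporary directory so the sketch never overwrites real data.
out_path = write_chat_jsonl(records, Path(tempfile.mkdtemp()) / "train.jsonl")
```

With only 18 training and 2 validation examples, keeping every record in the same three-turn shape is an easy way to make the output style consistent.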
Usage
Note: Requires mlx-lm with Gemma 4 support. Use our gemma4-fixes branch, which includes critical bug fixes (see below), or the upstream gemma4 branch once PR #1103 is merged.
# Install mlx-lm with Gemma 4 fixes
git clone https://github.com/0xSoftBoi/mlx-lm.git
cd mlx-lm && git checkout gemma4-fixes
pip install -e .
from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/gemma-4-e2b-it-4bit",
    adapter_path="0xsoftboi/gemma-4-e2b-it-kali-nethunter-lora",
)

messages = [
    {"role": "system", "content": "Expert pentester on rooted OnePlus 8T with Kali NetHunter + Termux. Give exact commands. Be concise."},
    {"role": "user", "content": "Generate an attack plan for SMB"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=300)
print(response)
Upstream fixes (PR #1103)
This model was built alongside PR #1103 to ml-explore/mlx-lm, which adds comprehensive Gemma 4 support:
- Sanitizer bug fix: The multimodal wrapper in gemma4.py prepended a double "model." prefix to weight keys, causing ValueError when loading any Gemma 4 checkpoint. Fixed by removing the spurious prefix.
- PLE per-layer split: E2B models store embed_tokens_per_layer as a single [262144, 8960] tensor (~9.4GB float32), which exceeds Metal's 4GB buffer limit. We split it into per-layer nn.Embedding chunks, with sanitize logic that handles both quantized (.scales/.biases) and unquantized weights.
- Gemma 4 tool call parser: New function_gemma4 parser for the <|tool_call>...<tool_call|> format with <|"|> quote escaping, auto-detected via tokenizer_utils.
- Comprehensive tests: MoE variant (26B-A4B), K=V shared projection variant (31B), and multimodal sanitize round-trip.
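The per-layer split can be illustrated with a small NumPy sketch. It first reproduces the size arithmetic for the [262144, 8960] float32 tensor, then splits a toy table column-wise into equal per-layer chunks. The toy dimensions (`TOY_VOCAB`, `NUM_LAYERS`, `PER_LAYER_DIM`), the helper `split_per_layer`, and the choice to slice along the feature axis are assumptions for illustration; the actual change in the PR works on MLX arrays and nn.Embedding modules and may differ in detail.

```python
import numpy as np

# Size of the combined tensor quoted in the text: [262144, 8960] in float32.
VOCAB, TOTAL_DIM, BYTES_F32 = 262144, 8960, 4
total_bytes = VOCAB * TOTAL_DIM * BYTES_F32
print(f"combined table: {total_bytes / 1e9:.1f} GB")  # prints "combined table: 9.4 GB"


def split_per_layer(table, num_layers):
    """Split a [vocab, num_layers * d] table into num_layers [vocab, d] chunks."""
    _, total_dim = table.shape
    if total_dim % num_layers != 0:
        raise ValueError("total_dim must be divisible by num_layers")
    return np.split(table, num_layers, axis=1)


# Toy dimensions (hypothetical) so the sketch runs instantly.
TOY_VOCAB, NUM_LAYERS, PER_LAYER_DIM = 16, 5, 4
table = np.arange(TOY_VOCAB * NUM_LAYERS * PER_LAYER_DIM, dtype=np.float32)
table = table.reshape(TOY_VOCAB, NUM_LAYERS * PER_LAYER_DIM)

chunks = split_per_layer(table, NUM_LAYERS)

# Looking a token up in every chunk and concatenating recovers the original row,
# so the split changes storage layout without changing the embedding values.
token_id = 3
row = np.concatenate([chunk[token_id] for chunk in chunks])
assert np.array_equal(row, table[token_id])
```

Each chunk holds the full vocabulary but only one layer's slice of the features, so every individual buffer is a fraction of the combined size.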
Limitations
- Small training set (18 examples): good at matching the pentest output style, but may hallucinate specific CVEs or command flags
- E2B is a 2B-parameter model: works great on-device but is less capable than larger variants
- Some safety guardrails from the base instruct model remain active
License
Apache 2.0 (same as base model)