Gemma 4 E2B-IT: Kali NetHunter Pentest LoRA

LoRA adapters for mlx-community/gemma-4-e2b-it-4bit, fine-tuned on Kali NetHunter penetration-testing data for use on a rooted OnePlus 8T.

What it does

Teaches the model to respond like an expert pentester with structured output:

  • Nmap scan analysis with risk-rated tables
  • Attack plans with exact bash commands
  • WiFi, SMB, DNS enumeration workflows
  • NetHunter + Termux specific tooling

Training

  • Base model: mlx-community/gemma-4-e2b-it-4bit (Gemma 4 E2B instruction-tuned, 4-bit quantized)
  • Method: LoRA (rank 8, alpha 16, 4 layers)
  • Data: 18 pentest examples + 2 validation (chat format with system/user/assistant)
  • Iterations: 200 @ batch_size=1, lr=1e-5, grad_checkpoint=true
  • Hardware: Apple Silicon 8GB (peak memory: 4.8GB)
  • Final loss: Train 0.54, Val 2.13
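
The chat-format data described above can be sketched as follows. This is a minimal illustration, assuming the {"messages": [...]} JSONL layout that mlx-lm's LoRA trainer accepts for chat data. The record content, the helper names, and the data/ directory are hypothetical and not taken from the actual training set.

```python
import json
from pathlib import Path

# Hypothetical record: the real training examples are not published in this card.
record = {
    "messages": [
        {"role": "system", "content": "Expert pentester on rooted OnePlus 8T with Kali NetHunter + Termux. Give exact commands. Be concise."},
        {"role": "user", "content": "Generate an attack plan for SMB"},
        {"role": "assistant", "content": "<structured plan with exact commands goes here>"},
    ]
}


def to_jsonl_line(rec: dict) -> str:
    """Serialize one chat record as a single JSONL line."""
    return json.dumps(rec, ensure_ascii=False)


def write_jsonl(records: list, path: str) -> None:
    """Write one JSON object per line, creating parent directories if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(to_jsonl_line(rec) + "\n")
```

With 18 training and 2 validation records, the files would typically be written as data/train.jsonl and data/valid.jsonl, for example write_jsonl(train_records, "data/train.jsonl").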

Usage

Note: Requires mlx-lm with Gemma 4 support. Use our gemma4-fixes branch, which includes critical bug fixes (see "Upstream fixes" below), or the upstream gemma4 branch once PR #1103 is merged.

# Install mlx-lm with Gemma 4 fixes
git clone https://github.com/0xSoftBoi/mlx-lm.git
cd mlx-lm && git checkout gemma4-fixes
pip install -e .

# Load the base model and apply the LoRA adapter (Python)
from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/gemma-4-e2b-it-4bit",
    adapter_path="0xsoftboi/gemma-4-e2b-it-kali-nethunter-lora"
)

messages = [
    {"role": "system", "content": "Expert pentester on rooted OnePlus 8T with Kali NetHunter + Termux. Give exact commands. Be concise."},
    {"role": "user", "content": "Generate an attack plan for SMB"}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=300)
print(response)
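
Depending on the mlx-lm version, adapter_path may need to point at a local directory rather than a Hub repo ID. If loading fails with a missing-path error, one option is to download the adapter first. The sketch below assumes huggingface_hub is installed; resolve_adapter_path is a hypothetical helper and not part of mlx-lm.

```python
from pathlib import Path


def resolve_adapter_path(adapter: str) -> str:
    """Return a local directory for the adapter, downloading it from the Hub if needed."""
    local = Path(adapter)
    if local.is_dir():
        # Already a local directory: use it as-is.
        return str(local)
    # Imported lazily so the local-directory case works without huggingface_hub.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the path of the cached local copy of the repo.
    return snapshot_download(repo_id=adapter)
```

The result can then be passed to load, for example adapter_path=resolve_adapter_path("0xsoftboi/gemma-4-e2b-it-kali-nethunter-lora").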

Upstream fixes (PR #1103)

This model was built alongside PR #1103 to ml-explore/mlx-lm, which adds comprehensive Gemma 4 support:

  • Sanitizer bug fix: The multimodal wrapper in gemma4.py prepended a second "model." prefix to weight keys, causing a ValueError when loading any Gemma 4 checkpoint. Fixed by removing the spurious prefix.
  • PLE per-layer split: E2B models store embed_tokens_per_layer as a single [262144, 8960] tensor (~9.4GB in float32), which exceeds Metal's 4GB buffer limit. We split it into per-layer nn.Embedding chunks, with sanitize logic that handles both quantized (.scales/.biases) and unquantized weights.
  • Gemma 4 tool call parser: New function_gemma4 parser for the <|tool_call>...<tool_call|> format with <|"|> quote escaping, auto-detected via tokenizer_utils.
  • Comprehensive tests: Cover the MoE variant (26B-A4B), the K=V shared projection variant (31B), and a multimodal sanitize round-trip.
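
The memory arithmetic behind the PLE per-layer split can be checked with a short sketch. The tensor shape and dtype come from the description above. The layer count (35, giving 256 features per layer) and the column-wise split are assumptions made for illustration, since this card does not state them, and split_columns is a hypothetical helper rather than the code in the PR.

```python
# Shape and dtype taken from the description above.
VOCAB_SIZE = 262_144
PACKED_DIM = 8_960
BYTES_PER_FLOAT32 = 4

# Assumption: the packed dimension is 35 layers x 256 features per layer.
ASSUMED_NUM_LAYERS = 35

# The 4GB limit cited above, taken here as 4 GiB.
METAL_BUFFER_LIMIT = 4 * 1024**3


def tensor_bytes(rows: int, cols: int, itemsize: int = BYTES_PER_FLOAT32) -> int:
    """Size in bytes of a dense rows x cols tensor."""
    return rows * cols * itemsize


def split_columns(row: list, num_chunks: int) -> list:
    """Split one embedding row into equal-width per-layer slices."""
    width = len(row) // num_chunks
    return [row[i * width:(i + 1) * width] for i in range(num_chunks)]


full_bytes = tensor_bytes(VOCAB_SIZE, PACKED_DIM)
per_layer_dim = PACKED_DIM // ASSUMED_NUM_LAYERS
per_layer_bytes = tensor_bytes(VOCAB_SIZE, per_layer_dim)

print(f"full tensor: {full_bytes / 1e9:.1f} GB, over limit: {full_bytes > METAL_BUFFER_LIMIT}")
print(f"per-layer chunk: {per_layer_bytes / 2**20:.0f} MiB, over limit: {per_layer_bytes > METAL_BUFFER_LIMIT}")
```

Under these assumptions, each per-layer chunk is far below the buffer limit, while the single packed tensor is more than twice the limit.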

Limitations

  • Small training set (18 examples): good at matching the pentest output style, but may hallucinate specific CVEs or command flags
  • E2B is a 2B-parameter model: it runs well on-device but is less capable than larger variants
  • Some safety guardrails from the base instruct model remain active
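
The gap between the final training and validation losses reported under Training gives a rough sense of the small-dataset caveat. The sketch below converts both to perplexity, assuming the reported values are mean per-token cross-entropy in nats (the card does not state the unit). With only 2 validation examples, the validation figure is a noisy estimate.

```python
import math

# Final losses as reported in the Training section.
TRAIN_LOSS = 0.54
VAL_LOSS = 2.13


def perplexity(loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy loss given in nats."""
    return math.exp(loss)


train_ppl = perplexity(TRAIN_LOSS)
val_ppl = perplexity(VAL_LOSS)

print(f"train perplexity: {train_ppl:.1f}")
print(f"validation perplexity: {val_ppl:.1f}")
```

A validation perplexity several times the training perplexity is consistent with the adapter fitting the 18 training examples closely while generalizing less well to unseen prompts.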

License

Apache 2.0 (same as base model)
