
library_name: mlx
license: apache-2.0
base_model: mlx-community/gemma-4-e2b-it-4bit
tags:
  - mlx
  - lora
  - gemma4
  - pentesting
  - kali-linux
  - nethunter
  - security
language:
  - en
pipeline_tag: text-generation

Gemma 4 E2B-IT — Kali NetHunter Pentest LoRA

LoRA adapters for mlx-community/gemma-4-e2b-it-4bit, fine-tuned on Kali NetHunter penetration testing data and intended for use on a rooted OnePlus 8T.

What it does

Teaches the model to respond like an expert pentester with structured output:

Nmap scan analysis with risk-rated tables
Attack plans with exact bash commands
WiFi, SMB, DNS enumeration workflows
NetHunter + Termux specific tooling

Training

Base model: mlx-community/gemma-4-e2b-it-4bit (Gemma 4 E2B instruction-tuned, 4-bit quantized)
Method: LoRA (rank 8, alpha 16, 4 layers)
Data: 18 pentest training examples + 2 validation examples (chat format with system/user/assistant roles)
Iterations: 200 @ batch_size=1, lr=1e-5, grad_checkpoint=true
Hardware: Apple Silicon 8GB (peak memory: 4.8GB)
Final loss: Train 0.54, Val 2.13
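
As a rough illustration of the settings above, the sketch below builds one chat-format record and works out how many passes over the training set 200 iterations imply. The `{"messages": [...]}` layout follows the chat format mlx-lm accepts for LoRA data; the assistant text is a made-up placeholder, not a sample from the actual dataset.

```python
import json

# Hyperparameters quoted in the Training section.
ITERATIONS = 200
BATCH_SIZE = 1
NUM_TRAIN_EXAMPLES = 18

# One training record in chat format (system/user/assistant).
# ASSUMPTION: the assistant content is a placeholder for illustration only.
record = {
    "messages": [
        {
            "role": "system",
            "content": "Expert pentester on rooted OnePlus 8T with Kali "
                       "NetHunter + Termux. Give exact commands. Be concise.",
        },
        {"role": "user", "content": "Generate an attack plan for SMB"},
        {"role": "assistant", "content": "<structured plan with commands goes here>"},
    ]
}

# JSONL stores one JSON object per line.
jsonl_line = json.dumps(record)

# Number of passes over the training set implied by the settings above.
epochs = ITERATIONS * BATCH_SIZE / NUM_TRAIN_EXAMPLES

print(jsonl_line)
print(f"approximate epochs: {epochs:.1f}")
```

With these numbers the model sees each example roughly 11 times, which may help explain the gap between the training and validation losses.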

Usage

Note: Requires mlx-lm with Gemma 4 support. Use our gemma4-fixes branch, which includes critical bug fixes (see below), or the upstream gemma4 branch once PR #1103 is merged.

# Install mlx-lm with Gemma 4 fixes
git clone https://github.com/0xSoftBoi/mlx-lm.git
cd mlx-lm && git checkout gemma4-fixes
pip install -e .

from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/gemma-4-e2b-it-4bit",
    adapter_path="0xsoftboi/gemma-4-e2b-it-kali-nethunter-lora"
)

messages = [
    {"role": "system", "content": "Expert pentester on rooted OnePlus 8T with Kali NetHunter + Termux. Give exact commands. Be concise."},
    {"role": "user", "content": "Generate an attack plan for SMB"}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=300)
print(response)

Upstream fixes (PR #1103)

This model was built alongside PR #1103 to ml-explore/mlx-lm, which adds comprehensive Gemma 4 support:

Sanitizer bug fix: The multimodal wrapper in gemma4.py prepended a doubled model. prefix to weight keys, causing a ValueError when loading any Gemma 4 checkpoint. Fixed by removing the spurious prefix.
PLE per-layer split: E2B models store embed_tokens_per_layer as a single [262144, 8960] tensor (~9.4 GB in float32), which exceeds Metal's 4 GB buffer limit. We split it into per-layer nn.Embedding chunks, with sanitize logic that handles both quantized (.scales/.biases) and unquantized weights.
Gemma 4 tool call parser: New function_gemma4 parser for the <|tool_call>...<tool_call|> format with <|"|> quote escaping, auto-detected via tokenizer_utils.
Comprehensive tests: Cover the MoE variant (26B-A4B), the K=V shared-projection variant (31B), and a multimodal sanitize round-trip.
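
The first three items can be illustrated with toy helpers. This is only a sketch of the ideas, not the code from the PR: the helper names, the example weight key, the tool-call payload, and the layer count (35 layers of width 256, chosen so that 35 × 256 = 8960) are all assumptions made for illustration.

```python
import re

# Figures quoted in the list above.
VOCAB_SIZE = 262144
PACKED_WIDTH = 8960
BYTES_PER_FLOAT32 = 4
METAL_BUFFER_LIMIT = 4 * 1024**3  # the 4 GB limit cited above, read as GiB

# ASSUMPTION: 35 layers of width 256; the README does not state the layer count.
NUM_LAYERS = 35


def strip_double_prefix(key, prefix="model."):
    """Collapse a doubled leading prefix, e.g. 'model.model.x' -> 'model.x'."""
    if key.startswith(prefix + prefix):
        return key[len(prefix):]
    return key


def split_columns(table, num_chunks):
    """Split a [rows, width] table into num_chunks equal-width column blocks."""
    width = len(table[0])
    if width % num_chunks != 0:
        raise ValueError("width must be divisible by num_chunks")
    step = width // num_chunks
    return [[row[i * step:(i + 1) * step] for row in table]
            for i in range(num_chunks)]


def extract_tool_calls(text):
    """Return the raw payloads between the tool-call delimiters, quotes unescaped."""
    pattern = re.compile(r"<\|tool_call>(.*?)<tool_call\|>", re.DOTALL)
    return [payload.replace('<|"|>', '"') for payload in pattern.findall(text)]


# Size of the packed tensor versus one per-layer chunk, in bytes.
packed_bytes = VOCAB_SIZE * PACKED_WIDTH * BYTES_PER_FLOAT32
per_layer_bytes = packed_bytes // NUM_LAYERS

print(packed_bytes > METAL_BUFFER_LIMIT)     # packed tensor is over the limit
print(per_layer_bytes < METAL_BUFFER_LIMIT)  # each chunk fits comfortably

# Hypothetical weight key and tool-call payload, for illustration only.
print(strip_double_prefix("model.model.embed_tokens.weight"))
print(split_columns([[1, 2, 3, 4], [5, 6, 7, 8]], 2))
print(extract_tool_calls('<|tool_call>run{cmd:<|"|>nmap -sV<|"|>}<tool_call|>'))
```

Under the assumed layer count, each float32 chunk comes to 268,435,456 bytes (256 MiB), well under the buffer limit, while the packed tensor is about 9.4 GB. The tool-call helper only extracts and unescapes the raw payload; the payload grammar itself is not described here, so parsing it is left out.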

Limitations

Small training set (18 examples): the adapter is good at matching the pentest output style but may hallucinate specific CVEs or command flags
E2B is an effective-2B-parameter model: it runs well on-device but is less capable than the larger variants
Some safety guardrails from the base instruct model remain active

License

Apache 2.0 (same as base model)