---
library_name: mlx
license: apache-2.0
base_model: mlx-community/gemma-4-e2b-it-4bit
tags:
- mlx
- lora
- gemma4
- pentesting
- kali-linux
- nethunter
- security
language:
- en
pipeline_tag: text-generation
---

# Gemma 4 E2B-IT — Kali NetHunter Pentest LoRA

LoRA adapters for [mlx-community/gemma-4-e2b-it-4bit](https://huggingface.co/mlx-community/gemma-4-e2b-it-4bit), fine-tuned on Kali NetHunter penetration testing data for use on a rooted OnePlus 8T.

## What it does

Teaches the model to respond like an expert pentester with structured output:

- Nmap scan analysis with risk-rated tables
- Attack plans with exact bash commands
- WiFi, SMB, and DNS enumeration workflows
- NetHunter- and Termux-specific tooling

## Training

- **Base model:** `mlx-community/gemma-4-e2b-it-4bit` (Gemma 4 E2B instruction-tuned, 4-bit quantized)
- **Method:** LoRA (rank 8, alpha 16, 4 layers)
- **Data:** 18 pentest training examples plus 2 validation examples (chat format with system/user/assistant turns)
- **Iterations:** 200 @ batch_size=1, lr=1e-5, grad_checkpoint=true
- **Hardware:** Apple Silicon 8GB (peak memory: 4.8GB)
- **Final loss:** Train 0.54, Val 2.13

## Usage

> **Note:** Requires mlx-lm with Gemma 4 support. Use our [gemma4-fixes](https://github.com/0xSoftBoi/mlx-lm/tree/gemma4-fixes) branch, which includes critical bug fixes (see "Upstream fixes" below), or the upstream `gemma4` branch once [PR #1103](https://github.com/ml-explore/mlx-lm/pull/1103) is merged.

```bash
# Install mlx-lm with Gemma 4 fixes
git clone https://github.com/0xSoftBoi/mlx-lm.git
cd mlx-lm && git checkout gemma4-fixes
pip install -e .
```

```python
from mlx_lm import load, generate

# Load the 4-bit base model and apply the LoRA adapters on top of it
model, tokenizer = load(
    "mlx-community/gemma-4-e2b-it-4bit",
    adapter_path="0xsoftboi/gemma-4-e2b-it-kali-nethunter-lora"
)

messages = [
    {"role": "system", "content": "Expert pentester on rooted OnePlus 8T with Kali NetHunter + Termux. Give exact commands. Be concise."},
    {"role": "user", "content": "Generate an attack plan for SMB"}
]

# Render the chat messages into a single prompt string using the model's chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=300)
print(response)
```

## Upstream fixes (PR #1103)

This model was built alongside [PR #1103](https://github.com/ml-explore/mlx-lm/pull/1103) to ml-explore/mlx-lm, which adds comprehensive Gemma 4 support:

- **Sanitizer bug fix:** The multimodal wrapper in `gemma4.py` prepended a double `model.` prefix to weight keys, causing a `ValueError` when loading any Gemma 4 checkpoint. Fixed by removing the spurious prefix.
- **PLE per-layer split:** E2B models store `embed_tokens_per_layer` as a single `[262144, 8960]` tensor (~9.4GB in float32), which exceeds Metal's 4GB buffer limit. We split it into per-layer `nn.Embedding` chunks, with sanitize logic that handles both quantized (`.scales`/`.biases`) and unquantized weights.
- **Gemma 4 tool call parser:** New `function_gemma4` parser for the `<|tool_call>...` format with `<|"|>` quote escaping, auto-detected via `tokenizer_utils`.
- **Comprehensive tests:** MoE variant (26B-A4B), K=V shared projection variant (31B), and multimodal sanitize round-trip.

## Limitations

- Small training set (18 examples): good at matching the pentest output style, but may hallucinate specific CVEs or command flags
- E2B is a 2B-parameter model: it runs well on-device but is less capable than larger variants
- Some safety guardrails from the base instruct model remain active

## License

Apache 2.0 (same as base model)
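
## Appendix: sketch of the PLE per-layer split

The PLE per-layer split listed under "Upstream fixes" can be pictured with a small, hedged sketch. It is not the code from PR #1103: plain Python lists stand in for MLX arrays, the helper names (`table_bytes`, `split_per_layer`, `lookup`) are hypothetical, the chunk count of 35 is an assumption chosen only because it divides 8960 evenly, and the 4GB limit is taken to mean 4 GiB. It checks the size arithmetic for the `[262144, 8960]` float32 tensor and shows that looking up a token in each chunk and concatenating the results reproduces the original row.

```python
# Hypothetical sketch: plain Python lists stand in for MLX arrays.
METAL_BUFFER_LIMIT = 4 * 1024**3  # "4GB" read as 4 GiB (assumption)
ASSUMED_LAYERS = 35               # assumed chunk count, not taken from the PR


def table_bytes(rows, cols, bytes_per_elem=4):
    """Size in bytes of a dense rows x cols table (float32 by default)."""
    return rows * cols * bytes_per_elem


def split_per_layer(table, num_layers):
    """Split every row of `table` into `num_layers` equal column chunks."""
    width = len(table[0])
    if width % num_layers:
        raise ValueError("row width must be divisible by num_layers")
    step = width // num_layers
    return [
        [row[i * step:(i + 1) * step] for row in table]
        for i in range(num_layers)
    ]


def lookup(chunks, token_id):
    """Rebuild the full row for a token by concatenating per-chunk lookups."""
    row = []
    for chunk in chunks:
        row.extend(chunk[token_id])
    return row


# Size check for the tensor shape quoted in the fix description.
full_bytes = table_bytes(262144, 8960)
chunk_bytes = table_bytes(262144, 8960 // ASSUMED_LAYERS)
print(full_bytes, full_bytes > METAL_BUFFER_LIMIT)
print(chunk_bytes, chunk_bytes < METAL_BUFFER_LIMIT)

# Toy table (4 tokens, width 6) split into 3 chunks of width 2.
toy = [[r * 10 + c for c in range(6)] for r in range(4)]
chunks = split_per_layer(toy, num_layers=3)
print(lookup(chunks, 2) == toy[2])
```

Under these assumptions the single table is about 9.4GB while each chunk is 256 MiB, which is why splitting along the feature axis keeps every buffer well under the limit without changing what a lookup returns.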