---
license: apache-2.0
base_model: evalengine/unbound-e4b
base_model_relation: quantized
tags:
- gguf
- gemma4
- gemma
- gemma-4
- uncensored
- on-device
- wllama
- browser
pipeline_tag: image-text-to-text
---


# Unbound E4B (wllama / browser builds): *because there is no boundary*

> **No guarantee: use at your own risk.** Reduced safety filtering; can
> produce harmful or false output. Provided as-is.

Browser-safe GGUF quants of [`evalengine/unbound-e4b`](https://huggingface.co/evalengine/unbound-e4b) for [wllama](https://github.com/ngxson/wllama). Built by [Chromia](https://x.com/Chromia) and [Eval Engine](https://x.com/eval_engine).

> **Desktop / Ollama / llama.cpp / LM Studio users:** use
> [`evalengine/unbound-e4b-GGUF`](https://huggingface.co/evalengine/unbound-e4b-GGUF)
> instead. The desktop builds are faster and avoid the embedding-precision
> compromise these browser-safe builds make.

## Why a separate repo?

E4B's `per_layer_token_embd` is a 2.82-billion-value tensor. At llama.cpp's default Q6_K precision it lands at ~2.2 GB, which is over wllama's 2 GB ArrayBuffer cap. These variants force embeddings to `q5_K` (~1.85 GB) so the largest part fits in the browser. Layer weights are unchanged from the matching desktop quant.

A dedicated repo with the `unbound-e4b-wllama` model prefix prevents HF's GGUF UI from aggregating these files with the same-quant desktop files (`unbound-e4b.Q4_K_M-...` vs `unbound-e4b-wllama.Q4_K_M-...`).

## Available quants

Each quant is shipped as a sharded multi-part GGUF (`unbound-e4b-wllama.<QUANT>-NNNNN-of-NNNNN.gguf`). Point wllama at the first part and it stitches the remaining parts automatically.

| Variant | Parts | Total   | Notes |
|---------|-------|---------|-------|
| Q4_K_M  | 4     | 4.51 GB | **Recommended**: layers @ Q4_K_M, embed @ q5_K |
| Q2_K    | 4     | 3.69 GB | Smallest browser-loadable: layers @ Q2_K, embed @ q5_K |

## Run

```js
// wllama (browser)
import { Wllama } from '@wllama/wllama';

const wllama = new Wllama(/* … */);

// Pass the first shard; the remaining parts are picked up from it.
await wllama.loadModelFromHF(
  'evalengine/unbound-e4b-wllama-gguf',
  'unbound-e4b-wllama.Q4_K_M-00001-of-00004.gguf'
);
```

## Sampling

- **Creative / open-ended** → `temperature=1.0, top_p=0.95, top_k=64`.
- **Factual / brand questions** → drop `temperature` to ~0.3–0.5.

## Vision / image input (optional)

`mmproj-unbound-e4b.gguf` (vision projector, ~942 MB) is also in this repo so browser users don't have to bounce between repos. Pair it with any quant via your wllama-compatible vision pipeline.

> **Disclaimer.** The vision encoder is **Google's original weights,
> unchanged**; abliteration only touched the language model. The LM is
> uncensored, but the vision encoder may still suppress features for
> content classes Google's base was tuned against. We have **not
> benchmarked the visual axis**. Treat as preview.

## Acknowledgements

Fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) + HF [TRL](https://github.com/huggingface/trl). Abliteration via [heretic](https://github.com/p-e-w/heretic). Environment from [autoresearch](https://github.com/karpathy/autoresearch). Compliance training data distilled from the [AEON](https://huggingface.co/AEON-7) uncensored teacher model.

## License

Apache-2.0, inherited from `google/gemma-4-E4B-it`.

Full model card + benchmarks at [`evalengine/unbound-e4b`](https://huggingface.co/evalengine/unbound-e4b).
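
## Size check (sketch)

The embedding sizes quoted under "Why a separate repo?" can be sanity-checked with a little arithmetic. The sketch below is illustrative and not part of the build pipeline. It assumes the standard llama.cpp k-quant block layouts (256 weights per super-block, 210 bytes for Q6_K and 176 bytes for Q5_K) and reads the "2 GB" cap as 2 GiB (2^31 bytes). The helper names `estimateBytes` and `fitsUnderCap` are hypothetical, and 2.82e9 is the rounded value count from this card, so results are approximate.

```js
// Bytes per 256-weight super-block in llama.cpp's k-quant formats.
const BLOCK_BYTES = { Q6_K: 210, Q5_K: 176 };
const BLOCK_WEIGHTS = 256;

// Assumption: the "2 GB" ArrayBuffer cap is taken as 2 GiB (2^31 bytes).
const CAP_BYTES = 2 ** 31;

// Hypothetical helper: estimated on-disk size of a tensor at a given quant.
function estimateBytes(numValues, quant) {
  const blocks = Math.ceil(numValues / BLOCK_WEIGHTS);
  return blocks * BLOCK_BYTES[quant];
}

// Hypothetical helper: does the tensor alone stay under the cap?
function fitsUnderCap(numValues, quant) {
  return estimateBytes(numValues, quant) < CAP_BYTES;
}

// Rounded value count for `per_layer_token_embd`, as quoted in this card.
const PER_LAYER_EMBD_VALUES = 2.82e9;

for (const quant of ['Q6_K', 'Q5_K']) {
  const gib = estimateBytes(PER_LAYER_EMBD_VALUES, quant) / 2 ** 30;
  const fits = fitsUnderCap(PER_LAYER_EMBD_VALUES, quant);
  console.log(`${quant}: ~${gib.toFixed(2)} GiB, fits under cap: ${fits}`);
}
```

With these assumptions the estimates come out at roughly 2.15 GiB for Q6_K (over the cap) and 1.81 GiB for Q5_K (under it), in the same range as the ~2.2 GB and ~1.85 GB figures above. The remaining gap is likely down to the rounded value count and to whatever else shares the shard with the tensor.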