---
license: apache-2.0
base_model: evalengine/unbound-e4b
base_model_relation: quantized
tags:
- gguf
- gemma4
- gemma
- gemma-4
- uncensored
- on-device
- wllama
- browser
pipeline_tag: image-text-to-text
---

<p align="center">
  <img src="unbound-logo.svg" alt="Unbound" width="160" height="160">
</p>

# Unbound E4B (wllama / browser builds) — *because there is no boundary*

> **No guarantee — use at your own risk.** Reduced safety filtering; can
> produce harmful or false output. Provided as-is.

Browser-safe GGUF quants of [`evalengine/unbound-e4b`](https://huggingface.co/evalengine/unbound-e4b)
for [wllama](https://github.com/ngxson/wllama). Built by
[Chromia](https://x.com/Chromia) and [Eval Engine](https://x.com/eval_engine).

> **Desktop / Ollama / llama.cpp / LM Studio users:** use
> [`evalengine/unbound-e4b-GGUF`](https://huggingface.co/evalengine/unbound-e4b-GGUF)
> instead. The desktop builds are faster and avoid the embedding-precision
> compromise that these browser-safe builds make.

## Why a separate repo?

E4B's `per_layer_token_embd` tensor holds about 2.82 billion values. At
llama.cpp's default Q6_K precision it comes to ~2.2 GB, which is over
wllama's 2 GB ArrayBuffer cap. These variants force embeddings to `q5_K`
(~1.85 GB) so the largest part fits in the browser. Layer weights are
unchanged from the matching desktop quant.
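
The size arithmetic above can be checked with a short sketch. It assumes llama.cpp's k-quant block layouts (Q6_K at 6.5625 bits per weight, Q5_K at 5.5), treats the cap as 2 GiB, and uses the rounded 2.82-billion value count, so the results only approximate the figures quoted above. The helper names are illustrative and are not part of wllama or llama.cpp.

```js
// Effective bits per weight for two llama.cpp k-quant types.
// Assumption: 256-weight super-blocks of 210 bytes (Q6_K) and 176 bytes (Q5_K).
const BITS_PER_WEIGHT = { Q6_K: (210 * 8) / 256, Q5_K: (176 * 8) / 256 };

// Assumption: the browser cap is 2 GiB per ArrayBuffer.
const ARRAY_BUFFER_CAP_BYTES = 2 * 1024 ** 3;

// Rounded value count of per_layer_token_embd, taken from the text above.
const PER_LAYER_EMBD_VALUES = 2.82e9;

// Hypothetical helper: approximate size in bytes of a tensor at a given quant type.
function tensorBytes(valueCount, quantType) {
  return (valueCount * BITS_PER_WEIGHT[quantType]) / 8;
}

// Hypothetical helper: does the tensor stay under the cap?
function fitsUnderCap(valueCount, quantType) {
  return tensorBytes(valueCount, quantType) < ARRAY_BUFFER_CAP_BYTES;
}

for (const quant of ['Q6_K', 'Q5_K']) {
  const gib = tensorBytes(PER_LAYER_EMBD_VALUES, quant) / 1024 ** 3;
  console.log(`${quant}: ${gib.toFixed(2)} GiB, fits: ${fitsUnderCap(PER_LAYER_EMBD_VALUES, quant)}`);
}
```

Under these assumptions Q6_K exceeds the cap and Q5_K stays below it, which matches the reason given for forcing the embeddings to `q5_K`.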

A dedicated repo with the `unbound-e4b-wllama` model prefix prevents HF's
GGUF UI from aggregating these with the same-quant desktop files
(`unbound-e4b.Q4_K_M-...` vs `unbound-e4b-wllama.Q4_K_M-...`).

## Available quants

Each quant ships as a sharded multi-part GGUF
(`unbound-e4b-wllama.<QUANT>-NNNNN-of-NNNNN.gguf`). Point wllama at the
first part and it stitches the remaining parts together automatically.

| Variant     | Parts | Total   | Notes |
|-------------|-------|---------|-------|
| Q4_K_M      | 4     | 4.51 GB | **Recommended** — layers @ Q4_K_M, embed @ q5_K |
| Q2_K        | 4     | 3.69 GB | Smallest browser-loadable — layers @ Q2_K, embed @ q5_K |
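
If you need the full list of shard names (for example to pre-cache them or show download progress), the naming pattern above can be expanded with a small helper. This is a minimal sketch; `shardFileNames` is a hypothetical function, not part of wllama, and the part count has to come from the table.

```js
// Hypothetical helper: list the shard file names for one quant,
// following the <QUANT>-NNNNN-of-NNNNN naming pattern described above.
function shardFileNames(quant, partCount) {
  // Part indices are zero-padded to five digits.
  const pad = (n) => String(n).padStart(5, '0');
  return Array.from(
    { length: partCount },
    (_, i) => `unbound-e4b-wllama.${quant}-${pad(i + 1)}-of-${pad(partCount)}.gguf`
  );
}

const q4Parts = shardFileNames('Q4_K_M', 4);
console.log(q4Parts[0]); // the first part, which is the one handed to wllama
```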

## Run

```js
// wllama (browser)
import { Wllama } from '@wllama/wllama';

const wllama = new Wllama(/* … */);

// Pass the first shard; wllama loads the remaining parts itself.
await wllama.loadModelFromHF(
  'evalengine/unbound-e4b-wllama-gguf',
  'unbound-e4b-wllama.Q4_K_M-00001-of-00004.gguf'
);
```

## Sampling

- **Creative / open-ended** → `temperature=1.0, top_p=0.95, top_k=64`.
- **Factual / brand questions** → drop `temperature` to ~0.3–0.5.
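
The two recommendations above can be kept as presets. This is a sketch under stated assumptions: the key names (`temp`, `top_p`, `top_k`) are meant to mirror wllama's sampling config, the factual preset uses 0.4 as an arbitrary midpoint of the suggested range, and it leaves `top_p` and `top_k` unchanged because the text only mentions lowering the temperature. `samplingFor` is a hypothetical helper.

```js
// Sampling presets taken from the recommendations above.
// Assumption: key names follow wllama's sampling config; rename them
// if your runtime expects different keys.
const SAMPLING_PRESETS = {
  creative: { temp: 1.0, top_p: 0.95, top_k: 64 },
  // 0.4 is an arbitrary midpoint of the suggested 0.3 to 0.5 range.
  factual: { temp: 0.4, top_p: 0.95, top_k: 64 },
};

// Hypothetical helper: return a copy of the preset so callers can tweak it safely.
function samplingFor(kind) {
  const preset = SAMPLING_PRESETS[kind];
  if (!preset) throw new Error(`Unknown sampling preset: ${kind}`);
  return { ...preset };
}

// Usage with an already-loaded wllama instance (not executed here):
// const text = await wllama.createCompletion(prompt, {
//   nPredict: 256,
//   sampling: samplingFor('factual'),
// });
```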

## Vision / image input (optional)

`mmproj-unbound-e4b.gguf` (vision projector, ~942 MB) is also in this
repo so browser users don't bounce between repos. Pair with any quant via
your wllama-compatible vision pipeline.

> **Disclaimer.** The vision encoder is **Google's original weights,
> unchanged** — abliteration only touched the language model. The LM is
> uncensored, but the vision encoder may still suppress features for
> content classes Google's base was tuned against. We have **not
> benchmarked the visual axis**. Treat as preview.

## Acknowledgements

Fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) + HF
[TRL](https://github.com/huggingface/trl). Abliteration via
[heretic](https://github.com/p-e-w/heretic). Environment from
[autoresearch](https://github.com/karpathy/autoresearch). Compliance training data distilled from the [AEON](https://huggingface.co/AEON-7) uncensored teacher model.

## License

Apache-2.0, inherited from `google/gemma-4-E4B-it`. The full model card and
benchmarks are at [`evalengine/unbound-e4b`](https://huggingface.co/evalengine/unbound-e4b).