Bonsai-4B bonsai_q1_f32 for MLC/WebLLM
This repository contains an experimental MLC/WebLLM conversion of
prism-ml/Bonsai-4B-unpacked.
It is a browser-runtime artifact, not a new model, fine-tune, GGUF, MLX, or ONNX
mirror.
The weights use the local bonsai_q1_f32 format: binary signs packed into
uint32 words with one FP32 scale per 128-wide group. Linear layers,
embeddings, and the final lm head are stored in this format.
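As a rough illustration of the layout described above, the following sketch expands packed sign bits into FP32 weights. The group size of 128 and the one-scale-per-group rule come from the description; the bit order (least-significant bit first) and the sign mapping (a set bit means `+scale`) are assumptions made for illustration and may not match the actual on-disk layout. `dequantizeQ1` is a hypothetical helper, not part of MLC or WebLLM.

```javascript
// Sketch only: bit order and sign mapping are assumptions, not the confirmed layout.
const GROUP_SIZE = 128;                  // weights per FP32 scale (from the format description)
const WORDS_PER_GROUP = GROUP_SIZE / 32; // 4 uint32 words hold 128 sign bits

/**
 * Expand packed sign bits into FP32 weights.
 * @param {Uint32Array} packed  sign bits, 32 per word
 * @param {Float32Array} scales one scale per 128-weight group
 * @returns {Float32Array} dequantized weights, scales.length * 128 values
 */
function dequantizeQ1(packed, scales) {
  const out = new Float32Array(scales.length * GROUP_SIZE);
  for (let g = 0; g < scales.length; g++) {
    for (let i = 0; i < GROUP_SIZE; i++) {
      const word = packed[g * WORDS_PER_GROUP + (i >>> 5)];
      const bit = (word >>> (i & 31)) & 1;                    // assumed: LSB first
      out[g * GROUP_SIZE + i] = bit ? scales[g] : -scales[g]; // assumed: 1 => +scale
    }
  }
  return out;
}

// Storage cost per weight: 1 sign bit plus a 32-bit scale shared by 128 weights.
const bitsPerWeight = 1 + 32 / GROUP_SIZE;
```

Under these assumptions the storage cost works out to 1.25 bits per weight, close to the 1.251 reported in the Artifact Summary.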
Artifact Summary
| Field | Value |
|---|---|
| Source checkpoint | prism-ml/Bonsai-4B-unpacked |
| Architecture | Qwen3-shaped decoder |
| MLC model type | qwen3 |
| Quantization | bonsai_q1_f32 |
| Conversation template | qwen3_nothink |
| Context window in config | 32768 |
| Prefill chunk in config | 2048 |
| Total parameters | 4,021,784,576 |
| Quantized parameter size | 0.586 GB |
| Bits per parameter | 1.251 |
| Parameter shards | 19 |
| Artifact size | about 564 MB |
| WebGPU library | libs/bonsai-4b-bonsai_q1_f32-webgpu.wasm |
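
The size row can be cross-checked against the parameter count and bits-per-parameter rows. The sketch below is plain arithmetic on the table values; the only assumption is that the reported "GB" is a binary gigabyte (2^30 bytes), which is the reading under which the numbers agree.

```javascript
// Values copied from the Artifact Summary table.
const totalParams = 4021784576; // "Total parameters"
const bitsPerParam = 1.251;     // "Bits per parameter"

// Total quantized weight bytes.
const quantizedBytes = (totalParams * bitsPerParam) / 8;

// The same byte count in decimal and binary gigabytes.
const decimalGB = quantizedBytes / 1e9;
const binaryGiB = quantizedBytes / 2 ** 30;

console.log(decimalGB.toFixed(3), binaryGiB.toFixed(3));
```

The binary figure rounds to 0.586, matching the "Quantized parameter size" row.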
Runtime Requirement
This artifact requires an MLC/WebLLM runtime with Bonsai q1 support. It is not expected to load in an unmodified upstream WebLLM build until the Bonsai q1 runtime path is upstreamed.
Use this repository when you control the WebLLM runtime and want to test browser-local Bonsai inference through WebGPU.
WebLLM Configuration
const appConfig = {
  model_list: [
    {
      model: "https://huggingface.co/welcoma/Bonsai-4B-bonsai_q1_f32-MLC/resolve/main/",
      model_id: "Bonsai-4B-q1-MLC",
      model_lib:
        "https://huggingface.co/welcoma/Bonsai-4B-bonsai_q1_f32-MLC/resolve/main/libs/bonsai-4b-bonsai_q1_f32-webgpu.wasm",
      overrides: {
        context_window_size: 4096,
        prefill_chunk_size: 512,
      },
    },
  ],
};
The smaller override values above are intended for local browser smoke tests. Increase them only after measuring browser memory and cache behavior on the target device.
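
A minimal sketch of how this configuration might be passed to the engine follows. It assumes the patched runtime is installed under the upstream package name `@mlc-ai/web-llm` and keeps the upstream `CreateMLCEngine` entry point and OpenAI-style `chat.completions.create` call; a patched build may differ. `buildAppConfig` and `runSmokeTest` are hypothetical helper names.

```javascript
const REPO = "https://huggingface.co/welcoma/Bonsai-4B-bonsai_q1_f32-MLC/resolve/main/";

// Same values as the configuration above, wrapped in a function for reuse.
function buildAppConfig() {
  return {
    model_list: [
      {
        model: REPO,
        model_id: "Bonsai-4B-q1-MLC",
        model_lib: REPO + "libs/bonsai-4b-bonsai_q1_f32-webgpu.wasm",
        overrides: { context_window_size: 4096, prefill_chunk_size: 512 },
      },
    ],
  };
}

// Loads the model in the browser and runs one short prompt.
async function runSmokeTest() {
  // Assumption: the Bonsai-capable runtime resolves under the upstream package name.
  const { CreateMLCEngine } = await import("@mlc-ai/web-llm");
  const engine = await CreateMLCEngine("Bonsai-4B-q1-MLC", {
    appConfig: buildAppConfig(),
    initProgressCallback: (report) => console.log(report.text), // download/compile progress
  });
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Say hello in one sentence." }],
  });
  return reply.choices[0].message.content;
}
```

Calling `runSmokeTest()` from a page served over HTTPS or localhost should download the shards on first use and reuse the browser cache afterwards.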
Validation
The 4B artifact was converted and WebGPU-compiled on the GCP MLC/WebLLM builder VM, not on a local laptop.
- Source: prism-ml/Bonsai-4B-unpacked
- Quantization: bonsai_q1_f32
- Conversion peak RAM: 7.491 GB on CPU
- WebGPU compile completed successfully
- Compile estimate without KV cache: 2152.93 MB
- Compile estimate with 4K KV cache: 3304.93 MB
- Hugging Face round-trip check confirmed that the README and WebGPU wasm are present and that there is no accidental resolve/mirror path.
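
The two compile estimates give a rough way to project memory at other context sizes. The sketch below assumes that "4K" means 4096 tokens and that the KV cache grows linearly with the context window; both are assumptions, and the result is a planning estimate, not a measurement. `estimateTotalMB` is a hypothetical helper.

```javascript
// Compile estimates reported above, in MB.
const withoutKvMB = 2152.93;
const with4kKvMB = 3304.93;

// KV cache cost per token, assuming "4K" = 4096 tokens and linear growth.
const kvPerTokenMB = (with4kKvMB - withoutKvMB) / 4096;

// Projected total for a given context window size (in tokens).
function estimateTotalMB(contextWindow) {
  return withoutKvMB + kvPerTokenMB * contextWindow;
}

console.log(estimateTotalMB(4096).toFixed(2));  // reproduces the 4K estimate
console.log(estimateTotalMB(32768).toFixed(2)); // projection for the full config window
```

Under these assumptions the KV cache alone accounts for 1152 MB at 4K tokens and would be eight times that at the 32768-token window in the config, which is why the smaller override is suggested for smoke tests.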
Limitations
- This is an experimental runtime artifact, not a general transformers model checkpoint.
- Quality evaluation is limited to conversion/runtime smoke checks; no benchmark score is claimed by this repository.
- Browser success depends on WebGPU support, available GPU memory, cache quota, and a compatible patched WebLLM runtime.
- The ternary Bonsai family is not represented by this q1 format. Ternary models need a separate 2-bit/ternary MLC path.
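
Some of the browser-side conditions above can be checked before attempting a load. The sketch below uses the standard `navigator.gpu.requestAdapter()` and `navigator.storage.estimate()` browser APIs; `preflight` is a hypothetical helper, the 564 MB threshold is the approximate artifact size from the summary table, and passing this check does not guarantee that the model will fit in GPU memory or that the runtime supports the q1 format.

```javascript
// Approximate download size from the Artifact Summary ("about 564 MB").
const ARTIFACT_BYTES = 564 * 1e6;

/**
 * Best-effort check of WebGPU availability and remaining storage quota.
 * @param {object} nav navigator-like object (defaults to the browser's navigator)
 */
async function preflight(nav = globalThis.navigator) {
  const result = { webgpu: false, freeBytes: null, roomForArtifact: null };

  // WebGPU is usable only if the API exists and an adapter is returned.
  if (nav && nav.gpu) {
    const adapter = await nav.gpu.requestAdapter();
    result.webgpu = adapter != null;
  }

  // The Storage API reports an estimate, not an exact quota.
  if (nav && nav.storage && typeof nav.storage.estimate === "function") {
    const { quota = 0, usage = 0 } = await nav.storage.estimate();
    result.freeBytes = quota - usage;
    result.roomForArtifact = result.freeBytes > ARTIFACT_BYTES;
  }
  return result;
}
```

For example, `preflight().then(console.log)` in the browser console prints which conditions hold on the current device.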
Provenance
Original model by Prism ML:
MLC/WebLLM conversion by welcoma.