mohan007/pi05-mlx-4bit

mohan007 committed on May 4

Commit d66b7b6 · verified · 1 Parent(s): 8df43a9

Add README

Files changed (1)

README.md +50 -0

README.md ADDED

	@@ -0,0 +1,50 @@

+---
+license: other
+tags:
+  - mlx
+  - robotics
+  - pi0.5
+  - quantized
+  - apple-silicon
+---
+# pi0.5 — 4-bit Quantized MLX Weights
+4-bit quantized weights for [lerobot/pi05_base](https://huggingface.co/lerobot/pi05_base) converted to Apple MLX format.
+Runs on **Apple Silicon (M1/M2/M3)** in ~2.4 GB of RAM. Loading takes ~6 s and inference takes ~2 s per action chunk.
+## Architecture
+- PaliGemma 2B VLM (SigLIP + Gemma 2B) + Gemma 300M action expert
+- Flow-matching policy: 10-step Forward Euler denoising
+- Output: action chunk [B, 50, 32]
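The 10-step Forward Euler denoising listed above can be sketched in NumPy as follows. This is an illustrative toy, not the `mlx_pi05` implementation: `euler_sample`, `toy_velocity`, and the 0-to-1 time direction are assumptions made for the example, and a constant straight-line velocity stands in for the action expert's learned prediction.

```python
import numpy as np


def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Forward Euler.

    Hypothetical helper for illustration; the real model may use a
    different time direction or parameterization.
    """
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        # One explicit Euler step: move along the predicted velocity.
        x = x + dt * velocity_fn(x, t)
    return x


# Toy setup: start from Gaussian noise shaped like one action chunk [B, 50, 32].
rng = np.random.default_rng(0)
noise = rng.standard_normal((1, 50, 32)).astype(np.float32)
target = np.ones((1, 50, 32), dtype=np.float32)


def toy_velocity(x, t):
    # Stand-in for the learned velocity: a constant straight line
    # from the initial noise to the target.
    return target - noise


actions = euler_sample(toy_velocity, noise)
print(actions.shape)
```

With a constant velocity field the ten steps land on the target up to floating-point error. A learned field varies with `x` and `t`, so the step count then trades accuracy against latency.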
+## Usage
+```python
+import mlx.core as mx
+import numpy as np
+from huggingface_hub import hf_hub_download
+from mlx_pi05.load import load_model
+
+# Download quantized weights (~2.6 GB, one-time)
+npz_path = hf_hub_download("mohan007/pi05-mlx-4bit", "pi05_mlx_4bit.npz")
+
+# Load the quantized model with mlx_pi05
+model = load_model(quantized_path=npz_path, quantize=True)
+model.eval()
+
+# Run inference on dummy inputs: one 224x224 RGB image and five placeholder token IDs
+image_mlx = mx.array(np.zeros((1, 3, 224, 224), dtype=np.float32))
+lang_mlx  = mx.array(np.array([[1, 2, 3, 4, 5]], dtype=np.int32))
+actions   = model.sample_actions(image_mlx, lang_mlx)  # [1, 50, 32]
+```
+## Quantization
+- Gemma 2B + expert layers: 4-bit (group_size=64)
+- SigLIP kept in float16 (its fc2 input dimension, 4304, is not divisible by the group size of 64)
+- Total: ~2.4 GB vs ~7.2 GB float16
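The group-size constraint above comes down to simple arithmetic, sketched here in plain Python. The helper names are hypothetical. The 32 bits of per-group overhead assume one float16 scale and one float16 bias per group, which is how MLX's affine quantization is commonly described but is not stated in this README. The 2048 width is only an example of a dimension that divides evenly.

```python
def is_quantizable(in_dim, group_size=64):
    """A weight row splits into whole groups only if in_dim is a multiple of group_size."""
    return in_dim % group_size == 0


def effective_bits(bits=4, group_size=64, overhead_bits=32):
    """Average storage per weight, including per-group scale and bias.

    overhead_bits=32 is an assumption: one float16 scale plus one float16 bias.
    """
    return bits + overhead_bits / group_size


print(is_quantizable(4304))  # SigLIP fc2 input dim: 4304 = 67 * 64 + 16
print(is_quantizable(2048))  # example width that divides evenly
print(effective_bits())      # 4 + 32 / 64
```

Under these assumptions a "4-bit" layer costs about 4.5 bits per weight. The layers kept in float16 also count toward the total, so the overall saving is smaller than a straight 4x.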
+## Source
+Converted from the original float32 safetensors of [lerobot/pi05_base](https://huggingface.co/lerobot/pi05_base).