--- license: apache-2.0 base_model: - bytedance-research/Lance base_model_relation: quantized pipeline_tag: any-to-any library_name: Lance tags: - multimodal - image-generation - image-editing - image-understanding - any-to-any - quantized - nvfp4 - fp4 - 4-bit - blackwell language: - en - zh --- # Lance-3B NVFP4 (image checkpoint) 4-bit floating-point quantized variant of [bytedance-research/Lance](https://huggingface.co/bytedance-research/Lance), the **`Lance_3B` image-focused checkpoint**, using NVIDIA's **NVFP4** format (E2M1 weights + FP8 E4M3 per-block scales). Targets Blackwell tensor cores (RTX 50-series, B100/B200) where it gets hardware-accelerated dequantization with 5–10× the throughput of INT4 once paired with TensorRT-LLM / vLLM ≥ 0.8. **File-size: 24.7 GB → ~6 GB (4×)** Companion to the AWQ INT4 image variant: [`Reza2kn/Lance-3B-AWQ-INT4`](https://huggingface.co/Reza2kn/Lance-3B-AWQ-INT4). Video-flavoured sibling: [`Reza2kn/Lance-3B-Video-NVFP4`](https://huggingface.co/Reza2kn/Lance-3B-Video-NVFP4). ## Format - 4-bit E2M1 codes per weight (LUT {±0, ±0.5, ±1, ±1.5, ±2, ±3, ±4, ±6}) - FP8 E4M3 scale per 16-element block (1 byte per 16 weights) - Average **4.5 bits per weight** - Both `scales_fp8` (uint8 bytes carrying float8_e4m3fn) and `scales_bf16` (redundant copy) are stored — drop one for slimmer storage if your runtime supports the other. See the [video sibling NVFP4 README](https://huggingface.co/Reza2kn/Lance-3B-Video-NVFP4) for the full FP4 LUT and storage layout — identical here. ## Calibration Same AWQ activation statistics as the AWQ-INT4 image variant — 252 und-path + 252 gen-path Linears, all with activation data, calibrated on Lance's bundled `x2t_image` + `t2i` example sets (108.5 M tokens total). ## File layout ``` Lance_3B-NVFP4/ ├── nvfp4_state_dict.safetensors # ~6 GB: packed FP4 + FP8 + bf16 scales + pass-through ├── nvfp4_meta.json # per-weight scheme + block_size + shape + FP4 LUT └── README.md ``` ## How to use Production: vLLM ≥ 0.8 / TensorRT-LLM on Blackwell (Lance not yet wired in but format is compatible). Verification: reference `WQLinearNVFP4` swap-in module at **https://github.com/Reza2kn/lance-quant**. ## License Apache 2.0, inherited from the base model.