FE2E INT8 (Pre-quantized for CPU)
Pre-quantized INT8 model for FE2E (CVPR 2026) monocular depth + surface normal estimation from a single image.
Demo Space: WeReCooking2/FE2E-CPU
Files
| File | Size | Description |
|---|---|---|
| dit_int8_full.pt | 12.4 GB | Step1X-Edit DiT (12.4B params) + LDRN LoRA merged, dynamic INT8 quantized |
| vae_full.pt | 335 MB | AutoEncoder, FP32 |
Both files are saved with torch.save(model) (full model, not state_dict). Load with torch.load(..., mmap=True) to avoid doubling memory.
How it was made
- Loaded FP32 base model (step1x-edit-i1258.safetensors) on GPU
- Cast to FP32 on CPU
- Merged LDRN LoRA in full precision
- Applied torch.quantization.quantize_dynamic (INT8 on all nn.Linear layers)
- Saved full model with torch.save(model)
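The steps above can be sketched on a toy model. This is an illustration under stated assumptions, not the script used to build the released files: the two-layer network, the LoRA shapes, and the `scale` value are all hypothetical, and the merge uses the standard LoRA update `W + scale * (B @ A)`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy stand-in for the DiT; only the nn.Linear layers matter here.
model = nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 64)).eval()

# Merge a hypothetical LoRA update into the first layer in full precision:
# W <- W + scale * (B @ A), with A of shape (rank, in) and B of shape (out, rank).
rank, scale = 4, 1.0
lora_a = torch.randn(rank, 64) * 0.01
lora_b = torch.randn(128, rank) * 0.01
with torch.no_grad():
    model[0].weight += scale * (lora_b @ lora_a)

# Dynamic INT8 quantization: weights of every nn.Linear are stored as INT8,
# activations are quantized on the fly at inference time.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# Compare FP32 and INT8 outputs on the same input.
x = torch.randn(2, 64)
with torch.inference_mode():
    ref = model(x)
    out = qmodel(x)

print(type(qmodel[0]).__module__)
print("max abs difference:", (ref - out).abs().max().item())
```

Merging the LoRA before quantizing means the adapter weights are quantized together with the base weights, so no separate adapter is needed at load time.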
Usage
import torch
dit = torch.load("dit_int8_full.pt", map_location="cpu", weights_only=False, mmap=True)
vae = torch.load("vae_full.pt", map_location="cpu", weights_only=False, mmap=True)
Requires ~12 GB RAM with mmap loading.
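The ~12 GB figure is consistent with a back-of-the-envelope estimate of weight storage. The sketch below assumes 1 GB = 10^9 bytes and counts only the weights; it ignores quantization scales and zero points, any parameters left in FP32, and activation memory, so treat it as a rough lower bound rather than a measurement.

```python
# Rough weight-storage estimate for the DiT (12.4B parameters, from the table above).
params = 12.4e9

bytes_per_param = {"int8": 1, "fp32": 4}

# Weight size in GB for each precision (1 GB taken as 1e9 bytes).
size_gb = {name: params * nbytes / 1e9 for name, nbytes in bytes_per_param.items()}

for name, gb in size_gb.items():
    print(f"{name}: {gb:.1f} GB")
```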
Performance
| Platform | Time per image |
|---|---|
| GPU (RTX 5090, FP8 original) | ~2s |
| CPU (HF free Space, INT8) | ~29 min (768x1024) |
Inference uses a single denoising step and outputs both the depth map and the surface normal map simultaneously.
No ONNX export is provided: the PyTorch dynamo exporter produces a broken graph (100% NaN output).
Credits
- FE2E (CVPR 2026)
- Step1X-Edit base model
- rkfg/Step1X-Edit-FP8 FP8 quantization
Model tree for WeReCooking2/FE2E-INT8: base model stepfun-ai/Step1X-Edit