FE2E INT8 (Pre-quantized for CPU)

Pre-quantized INT8 model for FE2E (CVPR 2026) monocular depth + surface normal estimation from a single image.

Files

| File | Size | Description |
|---|---|---|
| `dit_int8_full.pt` | 12.4 GB | Step1X-Edit DiT (12.4B params) + LDRN LoRA merged, dynamic INT8 quantized |
| `vae_full.pt` | 335 MB | AutoEncoder, FP32 |

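Both files were saved with torch.save(model), so each holds the full pickled model rather than a state_dict. Loading therefore needs weights_only=False, and the model's class definitions must be importable in the loading environment. Pass mmap=True to torch.load so the weights are memory-mapped from disk instead of being copied into RAM a second time.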

How it was made

1. Loaded FP8 base model (step1x-edit-i1258.safetensors) on GPU
2. Cast to FP32 on CPU
3. Merged LDRN LoRA in full precision
4. Applied torch.quantization.quantize_dynamic (INT8 on all nn.Linear layers)
5. Saved full model with torch.save(model)
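The merge and quantization steps above can be sketched on a toy network. This is not the script used to build this repository: the two-layer model, the rank-4 LoRA factors, the scale value, and the merge_lora helper are all assumptions made for illustration. Only torch.quantization.quantize_dynamic is taken from the steps listed here.

```python
import torch
import torch.nn as nn


def merge_lora(linear: nn.Linear, lora_a: torch.Tensor, lora_b: torch.Tensor,
               scale: float = 1.0) -> nn.Linear:
    """Hypothetical helper: fold a low-rank update into the weight, W <- W + scale * (B @ A)."""
    with torch.no_grad():
        linear.weight.add_(scale * (lora_b @ lora_a))
    return linear


torch.manual_seed(0)

# Toy FP32 model on CPU (assumption: stands in for the DiT after the cast to FP32).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8)).float().eval()

# Hypothetical rank-4 LoRA factors for the first layer: A is (r, in), B is (out, r).
lora_a = torch.randn(4, 16) * 0.01
lora_b = torch.randn(32, 4) * 0.01
merge_lora(model[0], lora_a, lora_b)

# Dynamic INT8 quantization of every nn.Linear; returns a quantized copy of the model.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(2, 16)
with torch.no_grad():
    ref = model(x)   # FP32 reference output
    out = qmodel(x)  # INT8-weight output
print("max abs difference:", (ref - out).abs().max().item())
```

Merging before quantizing means the LoRA update is folded into the FP32 weights first, so the quantizer sees the final weights rather than the base weights and the adapter separately.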

Usage

import torch

dit = torch.load("dit_int8_full.pt", map_location="cpu", weights_only=False, mmap=True)
vae = torch.load("vae_full.pt", map_location="cpu", weights_only=False, mmap=True)

Requires ~12 GB RAM with mmap loading.

Performance

| Platform | Time per image |
|---|---|
| GPU (RTX 5090, FP8 original) | ~2 s |
| CPU (HF free Space, INT8) | ~29 min (768x1024) |

Inference uses a single denoising step and outputs both the depth map and the surface normal map in the same pass.

No ONNX export is provided: the PyTorch dynamo exporter produces a broken graph (100% NaN output).
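A check like the one below is one way to detect that kind of failure when validating an exported or quantized model. It is a minimal sketch: nan_fraction is a hypothetical helper, and the two tensors are made-up stand-ins for model outputs.

```python
import torch


def nan_fraction(t: torch.Tensor) -> float:
    """Hypothetical helper: share of elements in t that are NaN (0.0 to 1.0)."""
    return torch.isnan(t).float().mean().item()


# Made-up stand-ins for a healthy output and a fully broken one.
good = torch.tensor([0.5, 1.0, 2.0, 4.0])
bad = torch.full((4,), float("nan"))

print(nan_fraction(good), nan_fraction(bad))
```

A result of 1.0 corresponds to the "100% NaN" case described above.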

Credits

FE2E (CVPR 2026)
Step1X-Edit base model
rkfg/Step1X-Edit-FP8 FP8 quantization
