FE2E INT8 (Pre-quantized for CPU)

Pre-quantized INT8 model for FE2E (CVPR 2026), which estimates monocular depth and surface normals from a single image.

Demo Space: WeReCooking2/FE2E-CPU

Files

| File | Size | Description |
|---|---|---|
| dit_int8_full.pt | 12.4 GB | Step1X-Edit DiT (12.4B params) + LDRN LoRA merged, dynamic INT8 quantized |
| vae_full.pt | 335 MB | AutoEncoder, FP32 |

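Both files were saved with torch.save(model), so each holds the full pickled model rather than a state_dict. Load them with torch.load(..., weights_only=False, mmap=True): weights_only=False is required to unpickle a full model, and mmap=True maps the tensors from disk instead of copying them, which avoids doubling memory use during loading. Because the model is pickled, its class definitions must be importable at load time.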

How it was made

  1. Loaded FP8 base model (step1x-edit-i1258.safetensors) on GPU
  2. Cast to FP32 on CPU
  3. Merged LDRN LoRA in full precision
  4. Applied torch.quantization.quantize_dynamic (INT8 on all nn.Linear layers)
  5. Saved full model with torch.save(model)
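The merge and quantization steps above can be sketched on a toy layer. This assumes the standard LoRA formulation W' = W + (alpha / rank) * B @ A; the layer sizes, rank, alpha, and variable names are hypothetical, and the actual merge script for this model may differ.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-in for one DiT linear layer plus a LoRA adapter (sizes hypothetical).
in_features, out_features, rank, alpha = 16, 8, 4, 4.0
base = nn.Linear(in_features, out_features).float()  # FP32 on CPU
lora_A = torch.randn(rank, in_features) * 0.01       # down projection
lora_B = torch.randn(out_features, rank) * 0.01      # up projection

# Merge the LoRA in full precision: W' = W + (alpha / rank) * B @ A
with torch.no_grad():
    base.weight += (alpha / rank) * (lora_B @ lora_A)

model = nn.Sequential(base, nn.GELU(), nn.Linear(out_features, out_features)).eval()

# Dynamic INT8 quantization of every nn.Linear; returns a quantized copy.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(2, in_features)
with torch.no_grad():
    err = (model(x) - quantized(x)).abs().max().item()
print(type(quantized[0]), err)
```

Merging before quantizing means the adapter is folded into the weights at full precision, so only one rounding step is applied to the final matrix.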

Usage

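import torch

# weights_only=False is required because each file is a full pickled model, not a state_dict.
# mmap=True maps the tensors from disk instead of copying them into RAM up front.
dit = torch.load("dit_int8_full.pt", map_location="cpu", weights_only=False, mmap=True)
vae = torch.load("vae_full.pt", map_location="cpu", weights_only=False, mmap=True)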

Requires ~12 GB RAM with mmap loading.
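The RAM figure follows from the parameter count: at one byte per weight, 12.4B parameters take about 12.4 GB. The sketch below compares this with other precisions. It is a rough estimate only, since dynamic quantization leaves biases and non-Linear layers in FP32 and stores quantization scales, so real file sizes differ slightly.

```python
PARAMS = 12.4e9  # DiT parameter count stated in the Files table

BYTES_PER_PARAM = {"FP32": 4, "FP16/BF16": 2, "FP8/INT8": 1}


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Approximate weight storage in GB (10^9 bytes), ignoring metadata."""
    return params * bytes_per_param / 1e9


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>10}: ~{weights_gb(PARAMS, nbytes):.1f} GB")
```

By this estimate an FP32 copy of the DiT would need roughly 49.6 GB, which is why the INT8 file is the practical choice for CPU-only machines.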

Performance

| Platform | Time per image |
|---|---|
| GPU (RTX 5090, FP8 original) | ~2 s |
| CPU (HF free Space, INT8) | ~29 min (768x1024) |

Inference uses a single denoising step and outputs the depth and surface normal maps together in one pass.

No ONNX export is provided: the PyTorch dynamo exporter produces a broken graph whose output is 100% NaN.

Credits
