---
license: mit
tags:
  - diffusion
  - flow-matching
  - latent-diffusion
  - image-generation
  - imagenet
library_name: pytorch
---

# LWD — Learning When to Denoise

EMA weights for **"Learning When to Denoise: Optimizing Asynchronous Schedules
for Latent Diffusion."**

- 📄 Paper: https://arxiv.org/abs/2606.19662
- 💻 Code: https://github.com/bsq532087/LWD

These are the EMA weights of the LightningDiT-XL/1 (675M-parameter) denoiser
trained with our learned asynchronous semantic–texture schedule on
class-conditional ImageNet 256×256.

## Checkpoints

| File | Training budget | Unguided FID | AutoGuidance FID |
|------|-----------------|:------------:|:----------------:|
| `xl_400k.pt` | 400K iter (≈80 epochs)  | 2.87 | 1.14 |
| `xl_1m.pt`   | 1M iter (≈200 epochs)   | 2.37 | 1.05 |
| `xl_3m.pt`   | 3M iter (≈600 epochs)   | 2.14 | 1.02 |

Each file is a slim checkpoint of the form `{'ema': state_dict}` and can be
passed directly to the inference script in the code repository.

## Usage

```python
from huggingface_hub import hf_hub_download

# Downloads the file into the local Hugging Face cache and returns its path.
ckpt_path = hf_hub_download(repo_id="bsq532087/LWD", filename="xl_3m.pt")
# Then point the code repo's inference config / --ckpt at `ckpt_path`.
```
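
To inspect a downloaded checkpoint before handing it to the inference script, something like the following may help. This is a minimal sketch, assuming the file contains only tensors under the `ema` key as described in the Checkpoints section; `load_ema_state_dict` and `count_parameters` are hypothetical helper names, not functions from the code repository.

```python
import torch


def load_ema_state_dict(ckpt_path):
    """Return the state dict stored under the 'ema' key of a slim checkpoint."""
    # map_location="cpu" lets the weights be inspected without a GPU.
    # weights_only=True restricts unpickling to tensors and plain containers,
    # which is assumed to be sufficient for a {'ema': state_dict} file.
    ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=True)
    if "ema" not in ckpt:
        raise KeyError(f"expected an 'ema' key, found: {list(ckpt.keys())}")
    return ckpt["ema"]


def count_parameters(state_dict):
    """Sum the number of elements over all tensors in a state dict."""
    return sum(t.numel() for t in state_dict.values() if torch.is_tensor(t))
```

With `ckpt_path` from the snippet above, `count_parameters(load_ema_state_dict(ckpt_path))` should land near the 675M parameters quoted for LightningDiT-XL/1, although any non-trainable buffers saved in the state dict would also be counted. Building the model itself and calling `load_state_dict` on it is left to the code repository.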

The texture latent decoder (SD-VAE f16-d32) and the SemVAE semantic encoder are
inherited from SFD / LightningDiT; see the code repository for how to obtain
them.

## License & attribution

Released under the MIT License. The denoiser backbone derives from
[LightningDiT](https://github.com/hustvl/LightningDiT), and the semantic-first
latent setup and SemVAE encoder derive from [SFD](https://github.com/yuemingPAN/SFD);
please also comply with the licenses of those projects.

## Citation

```bibtex
@article{qian2026learning,
  title   = {Learning When to Denoise: Optimizing Asynchronous Schedules for Latent Diffusion},
  author  = {Qian, Bingshuo and Cheng, Xiang},
  journal = {arXiv preprint arXiv:2606.19662},
  year    = {2026},
}
```