+---
+license: mit
+tags:
+  - diffusion
+  - flow-matching
+  - latent-diffusion
+  - image-generation
+  - imagenet
+library_name: pytorch
+---
+# LWD — Learning When to Denoise
+EMA weights for **"Learning When to Denoise: Optimizing Asynchronous Schedules
+for Latent Diffusion."**
+
+- 📄 Paper: https://arxiv.org/abs/2606.19662
+- 💻 Code: https://github.com/bsq532087/LWD
+
+The checkpoints contain the LightningDiT-XL/1 (675M-parameter) denoiser
+trained with our learned asynchronous semantic–texture schedule on
+class-conditional ImageNet 256×256.
+## Checkpoints
+| File | Training budget | Unguided FID | AutoGuidance FID |
+|------|-----------------|:------------:|:----------------:|
+| `xl_400k.pt` | 400K iter (≈80 epochs)  | 2.87 | 1.14 |
+| `xl_1m.pt`   | 1M iter (≈200 epochs)   | 2.37 | 1.05 |
+| `xl_3m.pt`   | 3M iter (≈600 epochs)   | 2.14 | 1.02 |
+
+Each file is a slim checkpoint of the form `{'ema': state_dict}` and works as a
+drop-in checkpoint for the inference script in the code repository.
+## Usage
+```python
+from huggingface_hub import hf_hub_download
+
+# Download the 3M-iteration checkpoint (cached locally after the first call)
+ckpt_path = hf_hub_download(repo_id="bsq532087/LWD", filename="xl_3m.pt")
+# Then point the code repo's inference config / --ckpt at `ckpt_path`
+```
+The texture latent decoder (SD-VAE f16-d32) and the SemVAE semantic encoder are
+inherited from SFD / LightningDiT; see the code repository for how to obtain
+them.
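To inspect a downloaded file outside the inference script, the `{'ema': state_dict}` layout described under Checkpoints can be unpacked roughly as follows. This is a minimal sketch, not part of the code repository: the helper names `extract_ema_state_dict`, `load_ema_weights`, and `summarize` are hypothetical, and it assumes the file is a regular `torch.save` checkpoint whose `'ema'` entry maps parameter names to tensors.

```python
from typing import Any, Mapping, Tuple


def extract_ema_state_dict(ckpt: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the EMA state dict from a checkpoint shaped like {'ema': state_dict}."""
    if "ema" not in ckpt:
        raise KeyError(f"expected an 'ema' key, found: {sorted(ckpt)}")
    return ckpt["ema"]


def load_ema_weights(ckpt_path: str) -> Mapping[str, Any]:
    """Load a checkpoint file onto the CPU and return its EMA state dict."""
    # Imported lazily so the other helpers can be used without PyTorch installed.
    import torch

    ckpt = torch.load(ckpt_path, map_location="cpu")
    return extract_ema_state_dict(ckpt)


def summarize(state_dict: Mapping[str, Any]) -> Tuple[int, int]:
    """Return (number of entries, total element count) for a state dict."""
    total = sum(t.numel() for t in state_dict.values())
    return len(state_dict), total
```

With `ckpt_path` from the snippet above, `summarize(load_ema_weights(ckpt_path))` gives a quick sanity check on the parameter count. Loading the weights into a model still requires the denoiser class from the code repository.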
+## License & attribution
+Released under the MIT License. The denoiser backbone derives from
+[LightningDiT](https://github.com/hustvl/LightningDiT) and the semantic-first
+latent setup / SemVAE encoder from [SFD](https://github.com/yuemingPAN/SFD);
+please also respect the licenses of those projects.
+## Citation
+```bibtex
+@article{qian2026learning,
+  title   = {Learning When to Denoise: Optimizing Asynchronous Schedules for Latent Diffusion},
+  author  = {Qian, Bingshuo and Cheng, Xiang},
+  journal = {arXiv preprint arXiv:2606.19662},
+  year    = {2026},
+}
+```