SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers Paper • 2605.22668 • Published 20 days ago • 40
Bridging the Gap: Addressing Discrepancies in Diffusion Model Training for Classifier-Free Guidance Paper • 2311.00938 • Published Nov 2, 2023 • 1
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers Paper • 2602.15322 • Published Feb 17 • 11
Improving Diffusion Models's Data-Corruption Resistance using Scheduled Pseudo-Huber Loss Paper • 2403.16728 • Published Mar 25, 2024 • 1
ZClip: Adaptive Spike Mitigation for LLM Pre-Training Paper • 2504.02507 • Published Apr 3, 2025 • 89
Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis Paper • 2602.03139 • Published Feb 3 • 45
REX: Revisiting Budgeted Training with an Improved Schedule Paper • 2107.04197 • Published Jul 9, 2021 • 1
Cautious Optimizers: Improving Training with One Line of Code Paper • 2411.16085 • Published Nov 25, 2024 • 18
CAME: Confidence-guided Adaptive Memory Efficient Optimization Paper • 2307.02047 • Published Jul 5, 2023 • 3
Arcee's MergeKit: A Toolkit for Merging Large Language Models Paper • 2403.13257 • Published Mar 20, 2024 • 22
SDXL-Lightning: Progressive Adversarial Diffusion Distillation Paper • 2402.13929 • Published Feb 21, 2024 • 27