---
license: apache-2.0
base_model: zhen-nan/L2P
base_model_relation: quantized
tags:
- image-generation
- text-to-image
- int8
- comfyui
- quantized
---
# L2P: Unlocking Latent Potential for Pixel Generation (INT8 Quantized)


## 📦 Model Overview

This repository contains an **INT8 quantized** version of `model-1k-merge` from the [L2P (Latent-to-Pixel) framework](https://huggingface.co/zhen-nan/L2P). It has been repackaged and compressed for **ComfyUI** users who want the native 4K capabilities of L2P without the 19.6 GB VRAM and storage footprint of the original 16-bit model.

### 🔬 Quantization Details

This is a "healthy" mixed-precision quantization that balances VRAM reduction against output fidelity:

* **Size Reduction:** The checkpoint shrinks from **19.6 GB to 7.19 GB** (~63% smaller).
* **Mixed Precision:** The heaviest matrix layers (such as `qkv` and the feed-forward networks) are quantized to `INT8` with an `F32` scaling factor. Highly sensitive layers, including layer norms, biases, and the entire `local_decoder`, remain in `BF16` to prevent color banding and preserve image quality.
* **ComfyUI Ready:** The state dict keys are prefixed with `model.diffusion_model.`, and the attention Q/K/V tensors are packed into a single matrix for drop-in compatibility with ComfyUI.

## 🚀 How to Use (ComfyUI)

1. Download the `model-1k-merge-INT8.safetensors` file.
2. Place it in your ComfyUI models directory: `ComfyUI/models/checkpoints/` (or your designated diffusion model folder).
3. Load it using the standard `Load Checkpoint` node in ComfyUI.
4. Because the model bypasses the traditional VAE memory bottleneck, you can generate at high resolutions (up to 4K) directly in pixel space.

---

## 📖 About the Original L2P Framework

*An efficient transfer paradigm enabling high-quality, end-to-end pixel-space diffusion with minimal computational overhead and data requirements.*

Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources.
To address this, we propose the **Latent-to-Pixel (L2P)** transfer paradigm, an efficient framework that directly harnesses the rich knowledge of pre-trained LDMs to build powerful pixel-space models.

### Key Innovations:

* **No VAE Bottleneck:** L2P discards the VAE in favor of large-patch tokenization, unlocking native 4K ultra-high resolution generation.
* **Efficient Transfer:** Freezes the source LDM's intermediate layers, exclusively training shallow layers to learn the latent-to-pixel transformation.
* **Zero Real-Data Collection:** Utilizes LDM-generated synthetic images as the sole training corpus. L2P fits an already smooth data manifold, enabling rapid convergence.
* **Accessible Scaling:** This strategy allows L2P to seamlessly migrate massive latent priors to the pixel space using only 8 GPUs.

Extensive experiments across mainstream LDM architectures show that L2P incurs negligible training overhead, yet performs on par with the source LDM on DPG-Bench and reaches 93% performance on GenEval.

## 📜 Citation

If you use this model in your research or projects, please credit the original L2P authors:

```bibtex
@article{l2p2026,
  title={L2P: Unlocking Latent Potential for Pixel Generation},
  author={Original L2P Authors},
  journal={arXiv preprint arXiv:2605.12013},
  year={2026}
}
```
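
## 🧪 Appendix: Quantization Sketch

The Quantization Details section describes `INT8` weights stored with an `F32` scaling factor, Q/K/V tensors packed into one matrix, and keys prefixed with `model.diffusion_model.`. The sketch below illustrates those three ideas on toy nested lists. It is not the conversion script used for this checkpoint: the symmetric per-row scheme, the function names, and the `.q.weight` / `.k.weight` / `.v.weight` key suffixes are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def quantize_rows_int8(weight: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-row quantization (assumed scheme): w[i][j] ~= q[i][j] * scale[i]."""
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # An all-zero row keeps scale 1.0 so the division below is safe.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Round to the nearest integer and clamp to the symmetric INT8 range.
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def dequantize_rows(q_rows: List[List[int]], scales: List[float]) -> Matrix:
    """Recover approximate float weights from INT8 values and per-row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def pack_qkv_and_prefix(
    state: Dict[str, Matrix], prefix: str = "model.diffusion_model."
) -> Dict[str, Matrix]:
    """Stack Q, K, V row-wise into one matrix and prefix every key.

    The '.q.weight' / '.k.weight' / '.v.weight' suffixes are hypothetical;
    every '.q.weight' key is assumed to have matching K and V entries.
    """
    packed: Dict[str, Matrix] = {}
    for key, value in state.items():
        if key.endswith(".q.weight"):
            base = key[: -len(".q.weight")]
            packed[prefix + base + ".qkv.weight"] = (
                value + state[base + ".k.weight"] + state[base + ".v.weight"]
            )
        elif key.endswith((".k.weight", ".v.weight")):
            continue  # already folded into the packed matrix
        else:
            packed[prefix + key] = value
    return packed


# Toy data only; real checkpoints hold tensors, not nested lists.
weight = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
q, scales = quantize_rows_int8(weight)
restored = dequantize_rows(q, scales)
for row, back, scale in zip(weight, restored, scales):
    worst = max(abs(a - b) for a, b in zip(row, back))
    print(f"scale={scale:.6f} worst rounding error={worst:.6f}")

# Size reduction quoted in this card: 19.6 GB -> 7.19 GB.
print(f"reduction: {(1 - 7.19 / 19.6) * 100:.0f}%")  # prints "reduction: 63%"
```

With round-to-nearest, each restored weight differs from the original by at most half of its row's scale, which is one reason sensitive tensors with small dynamic range may be better left in `BF16`, as this card does for layer norms, biases, and the `local_decoder`.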