---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- image-generation
- class-conditional
- imagenet
- pixnerd
widget:
- output:
    url: PixNerd-XL-16-512/demo.png
language:
- en
---

# BiliSakura/PixNerd-diffusers

Self-contained PixNerd-XL/16 checkpoints for Hugging Face diffusers. **No external code repo is required**: each subfolder ships its own `pipeline.py`, component modules, and weights.

This repo is derived from the development bundle in [Visual-Generative-Foundation-Model-Collection](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection), but inference only needs:

- This model repo (`BiliSakura/PixNerd-diffusers`)
- PyPI `diffusers`, `torch`, `huggingface_hub`

This Hugging Face repo hosts **multiple self-contained checkpoints as subfolders**. Each subfolder includes its own `pipeline.py`, `model_index.json`, weights, and component code (`transformer/`, `scheduler/`).

## Available checkpoints

| Subfolder | Resolution | Source checkpoint |
| --- | --- | --- |
| [`PixNerd-XL-16-256/`](PixNerd-XL-16-256/) | 256×256 | `epoch%3D319-step%3D1600000_emainit.ckpt` |
| [`PixNerd-XL-16-512/`](PixNerd-XL-16-512/) | 512×512 | `res512_ft200k_epoch%3D325-step%3D1800000_emainit.ckpt` |

Both checkpoints are ImageNet class-conditional PixNerd-XL/16 exports with flow-matching sampling.

## Demo

![PixNerd-XL-16-512 demo](PixNerd-XL-16-512/demo.png)

Class 207 (golden retriever), 512×512, 25 steps.

## ImageNet class labels

Each variant keeps an English `id2label` map directly in its own `model_index.json` (DiT-style).

- `pipe.id2label`: maps each id to its English label
- `pipe.labels`: reverse map (English synonym to id), sorted for browsing
- `pipe.get_label_ids("golden retriever")`: looks up the ids for a label (here `[207]`)
- `pipe(class_labels="golden retriever", ...)`: string labels are resolved automatically
- `pipe(prompt="golden retriever", ...)`: deprecated alias for `class_labels`

Chinese labels are preserved in the main source repo under `src/labels/id2label_cn.json` for reference.

## Load from Hugging Face

```python
import torch
from diffusers import DiffusionPipeline

variant = "PixNerd-XL-16-256"  # or PixNerd-XL-16-512
resolution = 256 if variant.endswith("256") else 512

# If your diffusers version rejects the "repo/subfolder" form, download the
# subfolder with huggingface_hub.snapshot_download and load it from the local
# path, as in the "Load from a local clone" section.
pipe = DiffusionPipeline.from_pretrained(
    f"BiliSakura/PixNerd-diffusers/{variant}",
    trust_remote_code=True,  # the pipeline class lives in the subfolder's pipeline.py
    torch_dtype=torch.bfloat16,
).to("cuda")

# Scheduler defaults: timeshift=3.0, order=2 (see scheduler/scheduler_config.json)
images = pipe(
    class_labels="golden retriever",
    height=resolution,
    width=resolution,
    num_inference_steps=25,
    guidance_scale=4.0,
).images

# Inspect the label maps
print(pipe.id2label[207])  # "golden retriever"
pipe.get_label_ids("golden retriever")  # [207]

# Same call, relying on the pipeline's default step count and guidance scale
images = pipe(class_labels="golden retriever", height=resolution, width=resolution).images
```

## Load from a local clone

```python
import torch
from diffusers import DiffusionPipeline

repo = "models/BiliSakura/PixNerd-diffusers"  # path to the local clone
variant = "PixNerd-XL-16-256"

pipe = DiffusionPipeline.from_pretrained(
    f"{repo}/{variant}",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

images = pipe(class_labels="golden retriever", height=256, width=256).images
```

## Repo layout

```text
BiliSakura/PixNerd-diffusers/
├── README.md
├── PixNerd-XL-16-256/
│   ├── README.md
│   ├── pipeline.py
│   ├── model_index.json
│   ├── conversion_metadata.json
│   ├── transformer/
│   └── scheduler/
└── PixNerd-XL-16-512/
    ├── README.md
    ├── pipeline.py
    ├── model_index.json
    ├── conversion_metadata.json
    ├── transformer/
    └── scheduler/
```

## Interface notes

- The pipeline uses `class_labels` for ImageNet
class conditioning (`prompt` remains a deprecated alias).
- Pass integer ImageNet ids (`class_labels=207`) or human-readable synonyms (`class_labels="golden retriever"`).
- `height` and `width` should match the checkpoint's native resolution (256 or 512), but custom sizes work if both are divisible by the patch size (16).
- Architecture and conversion provenance are recorded in each checkpoint's `conversion_metadata.json`.

## Limitations

- Intended for ImageNet class-conditional generation.
- No text encoder is included, so strings are resolved against ImageNet labels rather than interpreted as free-form prompts.
- Output quality depends on scheduler settings and inference step count.

## Citation

Source paper (ICLR 2026):

- [PixNerd: Pixel Neural Field Diffusion](http://arxiv.org/abs/2507.23268)
- [Hugging Face Papers page](https://huggingface.co/papers/2507.23268)

Source code:

- Original PixNerd codebase: [MCG-NJU/PixNerd](https://github.com/MCG-NJU/PixNerd)
- Diffusers conversion code used for this export: [Bili-Sakura/PixNerd-diffusers](https://github.com/Bili-Sakura/PixNerd-diffusers)

```bibtex
@article{2507.23268,
  Author = {Shuai Wang and Ziteng Gao and Chenhui Zhu and Weilin Huang and Limin Wang},
  Title = {PixNerd: Pixel Neural Field Diffusion},
  Year = {2025},
  Eprint = {arXiv:2507.23268},
}
```
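
## Example: checking custom sizes

The interface notes say that custom `height` and `width` values work when they are divisible by the patch size (16). Below is a minimal sketch of a pre-flight check built on that rule alone. `check_size` and `round_to_patch` are hypothetical helpers written for illustration; they are not part of the pipeline's API.

```python
from typing import Tuple

PATCH_SIZE = 16  # patch size stated in the interface notes


def round_to_patch(value: int, patch_size: int = PATCH_SIZE) -> int:
    """Round down to the nearest multiple of patch_size, never below one patch."""
    return max(patch_size, (value // patch_size) * patch_size)


def check_size(height: int, width: int, patch_size: int = PATCH_SIZE) -> Tuple[int, int]:
    """Return (height, width) if both are positive multiples of patch_size, else raise."""
    for name, value in (("height", height), ("width", width)):
        if value <= 0 or value % patch_size != 0:
            # Suggest the closest smaller size that satisfies the rule.
            raise ValueError(
                f"{name}={value} is not a positive multiple of {patch_size}; "
                f"try {round_to_patch(value, patch_size)}"
            )
    return height, width
```

The validated values can then be passed as `height` and `width` in the pipeline call, for example after `h, w = check_size(384, 384)`. The check only covers divisibility; how well a checkpoint samples away from its native resolution is a separate question.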