---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
tags:
  - diffusers
  - image-generation
  - class-conditional
  - imagenet
  - pixnerd
widget:
  - output:
      url: PixNerd-XL-16-512/demo.png
language:
  - en
---

# BiliSakura/PixNerd-diffusers

Self-contained PixNerd-XL/16 checkpoints for Hugging Face diffusers. **No external code repo is required** — each subfolder ships its own `pipeline.py`, component modules, and weights.

This repo is derived from the development bundle in [Visual-Generative-Foundation-Model-Collection](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection), but inference only needs:

- This model repo (`BiliSakura/PixNerd-diffusers`)
- The PyPI packages `diffusers`, `torch`, and `huggingface_hub`

This Hugging Face repo hosts **multiple self-contained checkpoints as subfolders**. Besides `pipeline.py` and the weights, each subfolder contains a `model_index.json` and the component code under `transformer/` and `scheduler/`.

## Available checkpoints

| Subfolder | Resolution | Source checkpoint |
| --- | --- | --- |
| [`PixNerd-XL-16-256/`](PixNerd-XL-16-256/) | 256×256 | `epoch=319-step=1600000_emainit.ckpt` |
| [`PixNerd-XL-16-512/`](PixNerd-XL-16-512/) | 512×512 | `res512_ft200k_epoch=325-step=1800000_emainit.ckpt` |

Both checkpoints are ImageNet class-conditional PixNerd-XL/16 exports with flow-matching sampling.

## Demo

![PixNerd-XL-16-512 demo](PixNerd-XL-16-512/demo.png)

Class 207 — golden retriever, 512×512, 25 steps.

## ImageNet class labels

Each variant keeps an English `id2label` map directly in its own `model_index.json` (DiT-style).

- `pipe.id2label` — inspect id → English label correspondence
- `pipe.labels` — reverse maps (English synonym → id), sorted for browsing
- `pipe.get_label_ids("golden retriever")`
- `pipe(class_labels="golden retriever", ...)` — string labels resolved automatically
- `pipe(prompt="golden retriever", ...)` — deprecated alias for `class_labels`

Chinese labels are preserved in the main source repo under `src/labels/id2label_cn.json` for reference.
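The reverse lookup behind `pipe.labels` and `pipe.get_label_ids` can be pictured with the minimal sketch below. It assumes the DiT convention used by diffusers' `DiTPipeline`, where each `id2label` value holds comma-separated synonyms. `build_label_index` and `get_label_ids` here are hypothetical stand-ins written for illustration, not the code in this repo's `pipeline.py`, and the map is a three-class subset.

```python
from typing import Dict, List, Union

# Hypothetical three-class subset of an ImageNet id2label map (DiT-style:
# one string per class id, synonyms separated by commas).
ID2LABEL: Dict[int, str] = {
    0: "tench, Tinca tinca",
    1: "goldfish, Carassius auratus",
    207: "golden retriever",
}


def build_label_index(id2label: Dict[int, str]) -> Dict[str, int]:
    """Map every English synonym to its class id, sorted by synonym."""
    labels: Dict[str, int] = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            labels[name.strip()] = int(class_id)
    return dict(sorted(labels.items()))


def get_label_ids(labels: Dict[str, int], query: Union[str, List[str]]) -> List[int]:
    """Resolve one label or a list of labels to class ids; unknown names raise."""
    queries = [query] if isinstance(query, str) else query
    missing = [q for q in queries if q not in labels]
    if missing:
        raise ValueError(f"Unknown labels {missing}; see the keys of the label index.")
    return [labels[q] for q in queries]


labels = build_label_index(ID2LABEL)
print(get_label_ids(labels, "golden retriever"))  # [207]
print(get_label_ids(labels, ["Tinca tinca", "goldfish"]))  # [0, 1]
```

Splitting on commas is what lets several synonyms ("tench", "Tinca tinca") resolve to the same id.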

## Load from Hugging Face

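```python
import torch
from diffusers import DiffusionPipeline
from huggingface_hub import snapshot_download

variant = "PixNerd-XL-16-256"  # or "PixNerd-XL-16-512"
resolution = 256 if variant.endswith("256") else 512

# A Hub repo id cannot include a subfolder, so download only this variant's
# files and load the pipeline from the local snapshot.
snapshot_dir = snapshot_download(
    "BiliSakura/PixNerd-diffusers",
    allow_patterns=[f"{variant}/*"],
)

pipe = DiffusionPipeline.from_pretrained(
    f"{snapshot_dir}/{variant}",
    trust_remote_code=True,  # allows loading the subfolder's own pipeline.py
    torch_dtype=torch.bfloat16,
).to("cuda")

# Scheduler defaults: timeshift=3.0, order=2 (see scheduler/scheduler_config.json)

images = pipe(
    class_labels="golden retriever",  # string labels are resolved to ImageNet ids
    height=resolution,
    width=resolution,
    num_inference_steps=25,
    guidance_scale=4.0,
).images

# Label helpers
print(pipe.id2label[207])                      # "golden retriever"
print(pipe.get_label_ids("golden retriever"))  # [207]

# Integer ImageNet ids work as well
images = pipe(class_labels=207, height=resolution, width=resolution).images
```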
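The `timeshift=3.0` scheduler default noted in the example above controls how the sampling steps are spread over noise levels. The sketch below shows one common flow-matching time shift, σ' = sσ / (1 + (s - 1)σ), where σ is the noise level (1 = pure noise, 0 = data) and s is the shift. Treat it as an assumption: the exact formula and timestep convention of this repo's scheduler live in `scheduler/` and may differ, and `shift_sigma` / `shifted_schedule` are hypothetical names.

```python
from typing import List


def shift_sigma(sigma: float, shift: float = 3.0) -> float:
    """Warp a noise level sigma in [0, 1]; shift > 1 moves steps toward high noise."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(num_steps: int, shift: float = 3.0) -> List[float]:
    """Uniform sigmas from 1 (noise) to 0 (data), then warped by shift_sigma."""
    uniform = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift_sigma(s, shift) for s in uniform]


schedule = shifted_schedule(num_steps=4, shift=3.0)
print([round(s, 3) for s in schedule])  # [1.0, 0.9, 0.75, 0.5, 0.0]
```

Compared with the uniform grid `[1.0, 0.75, 0.5, 0.25, 0.0]`, the shifted schedule keeps the endpoints fixed but spends more of its steps at high noise levels; `shift=1.0` leaves the grid unchanged.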

## Load from a local clone

```python
import torch
from diffusers import DiffusionPipeline

repo = "models/BiliSakura/PixNerd-diffusers"
variant = "PixNerd-XL-16-256"

pipe = DiffusionPipeline.from_pretrained(
    f"{repo}/{variant}",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

images = pipe(class_labels="golden retriever", height=256, width=256).images
```

## Repo layout

```text
BiliSakura/PixNerd-diffusers/
├── README.md
├── PixNerd-XL-16-256/
│   ├── README.md
│   ├── pipeline.py
│   ├── model_index.json
│   ├── conversion_metadata.json
│   ├── transformer/
│   └── scheduler/
└── PixNerd-XL-16-512/
    ├── README.md
    ├── pipeline.py
    ├── model_index.json
    ├── conversion_metadata.json
    ├── transformer/
    └── scheduler/
```
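Before loading from a local clone, checking that a variant folder matches the layout above can save a confusing `from_pretrained` error. This is a hypothetical helper (`missing_entries`) that only checks the top-level entries listed in the tree, not the weight files inside `transformer/` and `scheduler/`.

```python
from pathlib import Path
from typing import List

# Entries each variant folder is expected to contain, per the layout above.
EXPECTED_FILES = ["README.md", "pipeline.py", "model_index.json", "conversion_metadata.json"]
EXPECTED_DIRS = ["transformer", "scheduler"]


def missing_entries(variant_dir: str) -> List[str]:
    """Return the expected files and directories that are absent from variant_dir."""
    root = Path(variant_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    variant_dir = "models/BiliSakura/PixNerd-diffusers/PixNerd-XL-16-256"
    problems = missing_entries(variant_dir)
    print("layout OK" if not problems else f"missing: {problems}")
```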

## Interface notes

- The pipeline uses `class_labels` for ImageNet class conditioning (`prompt` remains a deprecated alias).
- Pass integer ImageNet ids (`class_labels=207`) or human-readable synonyms (`class_labels="golden retriever"`).
- `height` and `width` should normally match the checkpoint's native resolution (256 or 512); custom sizes also work as long as both are divisible by the patch size (16).
- Architecture and conversion provenance are recorded in each checkpoint's `conversion_metadata.json`.
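The size rule in the notes above can be checked before calling the pipeline. This sketch assumes the patch size of 16 is applied directly to pixels (no `vae/` component appears in the repo layout), so an H×W image becomes an (H/16)×(W/16) grid of patches; `patch_grid` is a hypothetical helper, not part of the pipeline.

```python
from typing import Tuple

PATCH_SIZE = 16  # patch size stated for PixNerd-XL/16


def patch_grid(height: int, width: int, patch_size: int = PATCH_SIZE) -> Tuple[int, int]:
    """Return (rows, cols) of the patch grid, or raise if the size is not divisible."""
    if height % patch_size or width % patch_size:
        raise ValueError(
            f"height={height} and width={width} must both be divisible by {patch_size}"
        )
    return height // patch_size, width // patch_size


for size in (256, 512):
    rows, cols = patch_grid(size, size)
    print(f"{size}x{size}: {rows}x{cols} grid, {rows * cols} patches")
# 256x256: 16x16 grid, 256 patches
# 512x512: 32x32 grid, 1024 patches
```

Doubling the side length quadruples the number of patches, which is one reason the 512 checkpoint is noticeably more expensive to sample.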

## Limitations

- Intended for ImageNet class-conditional generation.
- No text encoder is included.
- Output quality depends on scheduler settings and inference step count.

## Citation

Source paper (ICLR 2026):

- [PixNerd: Pixel Neural Field Diffusion](http://arxiv.org/abs/2507.23268)
- [Hugging Face Papers page](https://huggingface.co/papers/2507.23268)

Source code:

- Original PixNerd codebase: [MCG-NJU/PixNerd](https://github.com/MCG-NJU/PixNerd)
- Diffusers conversion code used for this export: [Bili-Sakura/PixNerd-diffusers](https://github.com/Bili-Sakura/PixNerd-diffusers)

```bibtex
@article{2507.23268,
  Author = {Shuai Wang and Ziteng Gao and Chenhui Zhu and Weilin Huang and Limin Wang},
  Title = {PixNerd: Pixel Neural Field Diffusion},
  Year = {2025},
  Eprint = {arXiv:2507.23268},
}
```