---
library_name: diffusers
pipeline_tag: unconditional-image-generation
tags:
- diffusers
- sit
- image-generation
- class-conditional
- imagenet
license: mit
inference: true
widget:
- output:
    url: SiT-XL-2-512/demo.png
language:
- en
---

# SiT-diffusers

Diffusers-ready checkpoints for **Scalable Interpolant Transformers (SiT)**, converted for local/offline use.

This root folder is a model collection that contains:

- `SiT-S-2-256`
- `SiT-B-2-256`
- `SiT-L-2-256`
- `SiT-XL-2-256`
- `SiT-XL-2-512`

Each subfolder is a self-contained Diffusers model repo with:

- `pipeline.py`
- `transformer/transformer_sit.py`
- `scheduler/scheduler_config.json` (`FlowMatchEulerDiscreteScheduler`)
- `transformer/diffusion_pytorch_model.safetensors`
- `vae/diffusion_pytorch_model.safetensors`

Each variant embeds English `id2label` directly in `model_index.json` (DiT-style), so class labels can be passed as ImageNet ids or English synonym strings.

## Demo

![SiT-XL-2-512 demo](SiT-XL-2-512/demo.png)

Class-conditional sample (ImageNet class **207**, golden retriever), `SiT-XL/2` at 512×512, 250 steps, CFG 4.0, seed 0.

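
Before loading a variant offline, it can help to confirm that a subfolder actually contains the files listed above. The following is a minimal sketch, not part of this repo: `EXPECTED_FILES` and `missing_files` are hypothetical names, and the list simply mirrors the per-subfolder layout described in this README plus the `model_index.json` it mentions.

```python
from pathlib import Path

# Relative paths this README says each variant subfolder contains.
# model_index.json is included because the README states id2label lives there.
EXPECTED_FILES = (
    "model_index.json",
    "pipeline.py",
    "transformer/transformer_sit.py",
    "scheduler/scheduler_config.json",
    "transformer/diffusion_pytorch_model.safetensors",
    "vae/diffusion_pytorch_model.safetensors",
)


def missing_files(model_dir):
    """Return the expected relative paths that are absent from model_dir."""
    root = Path(model_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Hypothetical usage: an empty list means the layout looks complete.
    print(missing_files("./SiT-XL-2-512"))
```

An empty result only shows that the files exist; it does not verify that the weights are intact or loadable.
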
## Model Paths

Use paths relative to this root README:

| Model | Resolution | Local path |
| --- | ---: | --- |
| SiT-S/2 | 256x256 | `./SiT-S-2-256` |
| SiT-B/2 | 256x256 | `./SiT-B-2-256` |
| SiT-L/2 | 256x256 | `./SiT-L-2-256` |
| SiT-XL/2 | 256x256 | `./SiT-XL-2-256` |
| SiT-XL/2 | 512x512 | `./SiT-XL-2-512` |

## Inference Demo (Diffusers)

### 1) Load a local subfolder checkpoint

```python
import torch
from diffusers import DiffusionPipeline

model_path = "./SiT-XL-2-512"  # change to any path in the table above
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True is needed to load the custom pipeline.py in the subfolder
pipe = DiffusionPipeline.from_pretrained(
    model_path,
    trust_remote_code=True,
).to(device)

generator = torch.Generator(device=device).manual_seed(0)

# ImageNet class example: 207 = golden retriever
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))  # [207]

result = pipe(
    class_labels="golden retriever",
    height=512,
    width=512,
    num_inference_steps=250,  # official SiT comparisons commonly use 250 steps
    guidance_scale=4.0,
    generator=generator,
)
image = result.images[0]
image.save("sit_xl_512_demo.png")
```

### 2) Quick variant switch (256 models)

```python
import torch
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
# Re-seed so the result does not depend on an earlier run's generator state
generator = torch.Generator(device=device).manual_seed(0)

# Pick one 256x256 variant
model_path = "./SiT-S-2-256"
# model_path = "./SiT-B-2-256"
# model_path = "./SiT-L-2-256"
# model_path = "./SiT-XL-2-256"

pipe = DiffusionPipeline.from_pretrained(model_path, trust_remote_code=True).to(device)

image = pipe(
    class_labels=207,  # ImageNet id; an English label string also works
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=4.0,
    generator=generator,
).images[0]
image.save("sit_256_demo.png")
```
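
When switching between variants, the local path and the `height`/`width` arguments have to change together. One way to keep them in sync is a small lookup built from the Model Paths table. This is only an illustrative sketch: `VARIANTS` and `call_kwargs` are hypothetical helpers and are not shipped with these checkpoints; the defaults of 250 steps and guidance scale 4.0 are taken from the examples above.

```python
# (model name, resolution) -> local path, copied from the Model Paths table.
VARIANTS = {
    ("SiT-S/2", 256): "./SiT-S-2-256",
    ("SiT-B/2", 256): "./SiT-B-2-256",
    ("SiT-L/2", 256): "./SiT-L-2-256",
    ("SiT-XL/2", 256): "./SiT-XL-2-256",
    ("SiT-XL/2", 512): "./SiT-XL-2-512",
}


def call_kwargs(model, resolution, steps=250, guidance_scale=4.0):
    """Return (local_path, pipeline_kwargs) for one row of the table."""
    key = (model, resolution)
    if key not in VARIANTS:
        raise KeyError(f"no checkpoint for {model} at {resolution}x{resolution}")
    kwargs = {
        "height": resolution,
        "width": resolution,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }
    return VARIANTS[key], kwargs


if __name__ == "__main__":
    path, kwargs = call_kwargs("SiT-XL/2", 512)
    print(path, kwargs)
```

The returned values could then be passed on as `DiffusionPipeline.from_pretrained(path, trust_remote_code=True)` and `pipe(class_labels=207, generator=generator, **kwargs)`, matching the calls in the examples above.
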