---
title: Cosmos Predict Transfer Demo
emoji: 🌍
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "5.9.0"
python_version: "3.10"
app_file: app.py
pinned: false
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license
hardware: zero-a10g
---

# NVIDIA Cosmos World Foundation Models Demo

Interactive demo for **Cosmos Predict2.5** and **Cosmos Transfer2.5**, NVIDIA's World Foundation Models for Physical AI.

## Models

| Model | HuggingFace ID | Parameters | VRAM | Description |
|-------|----------------|------------|------|-------------|
| **Predict2.5-2B** | [nvidia/Cosmos-Predict2.5-2B](https://huggingface.co/nvidia/Cosmos-Predict2.5-2B) | 2.06B | 32.5 GB | Text/Image/Video to World generation |
| **Transfer2.5-2B** | [nvidia/Cosmos-Transfer2.5-2B](https://huggingface.co/nvidia/Cosmos-Transfer2.5-2B) | 2.36B | 65.4 GB | World-to-world translation |

## Features

### Predict2.5 (Tab 1)
- **Text2World**: Generate video worlds from text descriptions
- **Image2World**: Animate still images into video sequences
- **Video2World**: Extend videos with future predictions

### Transfer2.5 (Tab 2)
- **Style Transfer**: Transform videos between domains (day→night, sunny→rainy)
- **Control Types**: blur, edge, depth, segmentation
- **Structure Preservation**: Maintains spatial consistency

## Hardware Requirements

- **ZeroGPU** with NVIDIA H200 (70 GB VRAM)
- **Precision**: BF16 only (FP16 and FP32 are not supported)
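
As a rough illustration of how the VRAM figures in the Models table relate to a given GPU budget, the sketch below lists which models fit. The helper name `models_that_fit` is hypothetical and not part of this repository; the numbers are copied from the table above.

```python
# VRAM figures (GB) copied from the Models table in this README.
MODEL_VRAM_GB = {
    "Predict2.5-2B": 32.5,
    "Transfer2.5-2B": 65.4,
}


def models_that_fit(available_vram_gb: float) -> list[str]:
    """Return the models whose listed VRAM requirement fits the budget.

    Hypothetical helper for illustration; it only compares the numbers
    from the table and does not query any GPU.
    """
    return [
        name
        for name, required_gb in MODEL_VRAM_GB.items()
        if required_gb <= available_vram_gb
    ]


if __name__ == "__main__":
    # 70 GB is the ZeroGPU budget stated under Hardware Requirements.
    for budget_gb in (24.0, 48.0, 70.0):
        print(f"{budget_gb:>5} GB -> {models_that_fit(budget_gb)}")
```

On a machine with PyTorch installed, `torch.cuda.get_device_properties(0).total_memory` (in bytes) is one way to obtain the budget to pass in. Treat the result as approximate, since the table does not say whether its figures are decimal or binary gigabytes.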

## Default Parameters

| Parameter | Predict2.5 | Transfer2.5 |
|-----------|------------|-------------|
| Resolution | 720×480 | Input-dependent |
| Frames | 49 (~3 s at 16 fps) | Input-dependent |
| Inference Steps | 30 | 30 |
| Guidance Scale | 7.0 | 7.0 |
| Seed | 42 | 42 |
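
The "~3 s" figure in the Frames row is simply the frame count divided by the frame rate. A minimal sketch of that arithmetic, using a hypothetical helper name:

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Length of a generated clip in seconds: frames divided by frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


if __name__ == "__main__":
    # Predict2.5 defaults from the table: 49 frames at 16 fps.
    print(clip_duration_seconds(49, 16))  # 3.0625
```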

## API Usage

### Gradio Client

```python
from gradio_client import Client

# Replace with your Space ID (e.g. "user/space-name") or its full URL
client = Client("YOUR_SPACE_URL")

# Text2World
result = client.predict(
    prompt="A futuristic city at sunset",
    negative_prompt="low quality, blurry",
    num_frames=49,
    height=480,
    width=720,
    num_inference_steps=30,
    guidance_scale=7.0,
    seed=42,
    api_name="/run_predict_text2world"
)
```
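
When making several calls that differ in only one or two settings, it can help to keep the defaults in one place and override them per call. The sketch below assumes the endpoint accepts exactly the keyword arguments shown in the example above. `DEFAULT_TEXT2WORLD` and `text2world_kwargs` are hypothetical names, not part of this Space, and the default negative prompt is taken from that example rather than from the Default Parameters table.

```python
# Defaults mirror the Text2World example and the Default Parameters table.
DEFAULT_TEXT2WORLD = {
    "negative_prompt": "low quality, blurry",  # from the example call
    "num_frames": 49,
    "height": 480,
    "width": 720,
    "num_inference_steps": 30,
    "guidance_scale": 7.0,
    "seed": 42,
}


def text2world_kwargs(prompt: str, **overrides) -> dict:
    """Build keyword arguments for a Text2World call (hypothetical helper).

    Unknown parameter names are rejected so that typos fail early instead
    of being sent to the endpoint.
    """
    unknown = sorted(set(overrides) - set(DEFAULT_TEXT2WORLD))
    if unknown:
        raise TypeError(f"unexpected parameter(s): {', '.join(unknown)}")
    return {
        "prompt": prompt,
        **DEFAULT_TEXT2WORLD,
        **overrides,
        "api_name": "/run_predict_text2world",
    }


if __name__ == "__main__":
    # With a gradio_client.Client instance this would be used as:
    #   result = client.predict(**text2world_kwargs("A rainy street", seed=7))
    print(text2world_kwargs("A rainy street", seed=7))
```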

## Running Tests

```bash
# All tests
python -m tests.smoke_all

# Predict2.5 only (for GPUs with less than the 65.4 GB VRAM that Transfer2.5 needs)
python -m tests.smoke_all --predict-only

# Individual tests
python -m tests.smoke_predict
python -m tests.smoke_transfer
```

## References

- **Paper**: [arXiv:2511.00062](https://arxiv.org/abs/2511.00062) - World Simulation with Video Foundation Models for Physical AI
- **GitHub**: [cosmos-predict2.5](https://github.com/nvidia-cosmos/cosmos-predict2.5) | [cosmos-transfer2.5](https://github.com/nvidia-cosmos/cosmos-transfer2.5)
- **License**: [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license)

## Key Dependencies

```
torch==2.5.1
diffusers>=0.34.0
transformers>=4.52.4
gradio>=5.0.0
```
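
To confirm that a local environment matches the pins above, a small standard-library check along these lines may be useful. It is a simplified sketch: the function names are hypothetical, and it compares only the leading numeric part of each version (so pre-release tags are ignored). The `packaging` library handles version comparison more rigorously.

```python
import re
from importlib import metadata

# Pins copied from the Key Dependencies list above.
REQUIREMENTS = {
    "torch": ("==", "2.5.1"),
    "diffusers": (">=", "0.34.0"),
    "transformers": (">=", "4.52.4"),
    "gradio": (">=", "5.0.0"),
}


def release_tuple(version: str) -> tuple[int, ...]:
    """Leading numeric release segment, e.g. '2.5.1+cu121' -> (2, 5, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def satisfies(installed: str, op: str, required: str) -> bool:
    """Compare two versions under '==' or '>=' (the only operators used here)."""
    have, want = release_tuple(installed), release_tuple(required)
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))  # pad so '0.34' equals '0.34.0'
    want += (0,) * (width - len(want))
    return have == want if op == "==" else have >= want


def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means all pins are met."""
    problems = []
    for name, (op, required) in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (need {op}{required})")
            continue
        if not satisfies(installed, op, required):
            problems.append(f"{name}: found {installed}, need {op}{required}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["All key dependencies satisfied."]:
        print(line)
```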

## Citation

```bibtex
@article{cosmos2025,
  title={World Simulation with Video Foundation Models for Physical AI},
  author={NVIDIA},
  journal={arXiv preprint arXiv:2511.00062},
  year={2025}
}
```

---

**Note**: The first inference downloads and caches the model weights, which takes roughly 5-10 minutes. Subsequent runs reuse the cached weights.