---
title: Cosmos Predict Transfer Demo
emoji: 🌍
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "5.9.0"
python_version: "3.10"
app_file: app.py
pinned: false
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license
hardware: zero-a10g
---

# NVIDIA Cosmos World Foundation Models Demo

Interactive demo for **Cosmos Predict2.5** and **Cosmos Transfer2.5** - NVIDIA's World Foundation Models for Physical AI.

## Models

| Model | HuggingFace ID | Parameters | VRAM | Description |
|-------|----------------|------------|------|-------------|
| **Predict2.5-2B** | [nvidia/Cosmos-Predict2.5-2B](https://huggingface.co/nvidia/Cosmos-Predict2.5-2B) | 2.06B | 32.5 GB | Text/Image/Video to World generation |
| **Transfer2.5-2B** | [nvidia/Cosmos-Transfer2.5-2B](https://huggingface.co/nvidia/Cosmos-Transfer2.5-2B) | 2.36B | 65.4 GB | World-to-world translation |

## Features

### Predict2.5 (Tab 1)

- **Text2World**: Generate video worlds from text descriptions
- **Image2World**: Animate still images into video sequences
- **Video2World**: Extend videos with future predictions

### Transfer2.5 (Tab 2)

- **Style Transfer**: Transform videos between domains (day→night, sunny→rainy)
- **Control Types**: blur, edge, depth, segmentation
- **Structure Preservation**: Maintains spatial consistency

## Hardware Requirements

- **ZeroGPU** with NVIDIA H200 (70GB VRAM)
- Precision: BF16 only (FP16/FP32 not supported)

## Default Parameters

| Parameter | Predict2.5 | Transfer2.5 |
|-----------|------------|-------------|
| Resolution | 720×480 | Input-dependent |
| Frames | 49 (~3s at 16fps) | Input-dependent |
| Inference Steps | 30 | 30 |
| Guidance Scale | 7.0 | 7.0 |
| Seed | 42 | 42 |

## API Usage

### Gradio Client

```python
from gradio_client import Client

client = Client("YOUR_SPACE_URL")

# Text2World
result = client.predict(
    prompt="A futuristic city at sunset",
    negative_prompt="low quality, blurry",
    num_frames=49,
    height=480,
    width=720,
    num_inference_steps=30,
    guidance_scale=7.0,
    seed=42,
    api_name="/run_predict_text2world"
)
```

## Running Tests

```bash
# All tests
python -m tests.smoke_all

# Predict2.5 only (if VRAM < 65GB)
python -m tests.smoke_all --predict-only

# Individual tests
python -m tests.smoke_predict
python -m tests.smoke_transfer
```

## References

- **Paper**: [arXiv: 2511.00062](https://arxiv.org/abs/2511.00062) - World Simulation with Video Foundation Models for Physical AI
- **GitHub**: [cosmos-predict2.5](https://github.com/nvidia-cosmos/cosmos-predict2.5) | [cosmos-transfer2.5](https://github.com/nvidia-cosmos/cosmos-transfer2.5)
- **License**: [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license)

## Key Dependencies

```
torch==2.5.1
diffusers>=0.34.0
transformers>=4.52.4
gradio>=5.0.0
```

## Citation

```bibtex
@article{cosmos2025,
  title={World Simulation with Video Foundation Models for Physical AI},
  author={NVIDIA},
  journal={arXiv preprint arXiv:2511.00062},
  year={2025}
}
```

---

**Note**: First inference will download and cache models (~5-10 minutes). Subsequent runs use cached models.
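
## Example: Building a Text2World Request

The Predict2.5 defaults from the Default Parameters table can be gathered in one place so that a caller only overrides what changes between runs. The sketch below is illustrative and not part of the demo code: `Text2WorldParams`, `build_request`, and `clip_seconds` are hypothetical names, and the 16 fps rate is taken from the "49 (~3s at 16fps)" entry in the table. It makes no network calls.

```python
from dataclasses import asdict, dataclass

# Frame rate implied by the "49 (~3s at 16fps)" entry in Default Parameters.
FPS = 16


@dataclass(frozen=True)
class Text2WorldParams:
    """Hypothetical container mirroring the Predict2.5 defaults."""

    prompt: str
    negative_prompt: str = ""
    num_frames: int = 49
    height: int = 480
    width: int = 720
    num_inference_steps: int = 30
    guidance_scale: float = 7.0
    seed: int = 42


def clip_seconds(num_frames: int, fps: int = FPS) -> float:
    """Return the clip duration in seconds for a given frame count."""
    return num_frames / fps


def build_request(prompt: str, **overrides) -> dict:
    """Merge overrides into the defaults and add the endpoint name.

    Unknown keyword names raise TypeError, which catches typos early.
    """
    kwargs = asdict(Text2WorldParams(prompt=prompt, **overrides))
    kwargs["api_name"] = "/run_predict_text2world"
    return kwargs


if __name__ == "__main__":
    request = build_request("A futuristic city at sunset", seed=7)
    print(request)
    print(f"{clip_seconds(request['num_frames']):.2f} s")
```

Assuming the Space exposes the endpoint shown in the Gradio Client section, the resulting dictionary could be passed along as `client.predict(**build_request("A futuristic city at sunset"))`.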