cosmos-predict-transfer-demo

wbw2000 committed on Jan 26

Commit

585c8d6

1 Parent(s): f829a7a

Upload REPORT.md with huggingface_hub

Files changed (1)

REPORT.md +310 -0

	@@ -0,0 +1,310 @@

+# NVIDIA Cosmos World Foundation Models - Deployment Report
+## Summary
+| Item | Value |
+|------|-------|
+| **Space URL** | https://huggingface.co/spaces/wbw2000/cosmos-predict-transfer-demo |
+| **Commit Hash** | `f829a7a65deb6afc884d48e479a0c02f16885e24` |
+| **Created** | 2026-01-26 |
+| **Status** | Building / Pending Verification |
+---
+## 1. Model Information
+### Cosmos Predict2.5-2B
+| Property | Value |
+|----------|-------|
+| **Model ID** | `nvidia/Cosmos-Predict2.5-2B` |
+| **Parameters** | 2,059,174,912 (~2.06B) |
+| **VRAM Required** | 32.54 GB |
+| **Precision** | BF16 only |
+| **Capabilities** | Text2World, Image2World, Video2World |
+### Cosmos Transfer2.5-2B
+| Property | Value |
+|----------|-------|
+| **Model ID** | `nvidia/Cosmos-Transfer2.5-2B` |
+| **Parameters** | 2,358,047,744 (~2.36B) |
+| **VRAM Required** | 65.4 GB |
+| **Precision** | BF16 only |
+| **Control Inputs** | Blur, Edge, Depth, Segmentation |
+---
+## 2. Hardware Configuration
+| Property | Value |
+|----------|-------|
+| **Hardware** | ZeroGPU (NVIDIA H200) |
+| **VRAM** | 70 GB |
+| **Supported** | Both Predict2.5 (32GB) and Transfer2.5 (65GB) |
+| **GPU Duration** | Predict: 300s, Transfer: 420s |
+### Why ZeroGPU/H200
+- Predict2.5-2B requires 32.54 GB → an A10G (24 GB) is insufficient
+- Transfer2.5-2B requires 65.4 GB → only the ZeroGPU H200 (70 GB) or an A100 (80 GB) is sufficient
+- ZeroGPU H200 (70GB) is the most cost-effective option on HuggingFace
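The sizing argument above is plain subtraction, and a minimal sketch makes the margins explicit. The VRAM figures are the ones quoted in this report; the dictionary and function names are hypothetical and exist only for illustration.

```python
# VRAM figures are taken from this report; all names here are hypothetical.
MODEL_VRAM_GB = {
    "Cosmos-Predict2.5-2B": 32.54,
    "Cosmos-Transfer2.5-2B": 65.4,
}
GPU_VRAM_GB = {
    "A10G": 24.0,
    "ZeroGPU H200": 70.0,
    "A100": 80.0,
}


def headroom_gb(model: str, gpu: str) -> float:
    """Return the VRAM (GB) left after loading the model; negative means it does not fit."""
    return round(GPU_VRAM_GB[gpu] - MODEL_VRAM_GB[model], 2)


for model in MODEL_VRAM_GB:
    for gpu in GPU_VRAM_GB:
        margin = headroom_gb(model, gpu)
        verdict = "fits" if margin >= 0 else "too small"
        print(f"{model} on {gpu}: {margin:+.2f} GB ({verdict})")
```

The Transfer2.5 margin on the 70 GB device is only a few gigabytes, which leaves little room for activations or framework overhead beyond the quoted requirement.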
+---
+## 3. Key Dependencies
+```
+torch==2.5.1
+diffusers>=0.34.0
+transformers>=4.52.4
+accelerate>=1.7.0
+gradio>=5.0.0
+av>=14.0.0
+opencv-python-headless>=4.8.0
+imageio>=2.31.0
+scikit-image>=0.21.0
+```
+---
+## 4. Default Parameters
+### Predict2.5
+| Parameter | Default | Range |
+|-----------|---------|-------|
+| Resolution | 720×480 | 720×480 or 1280×720 |
+| Frames | 49 (~3s) | 17-97 |
+| Inference Steps | 30 | 10-50 |
+| Guidance Scale | 7.0 | 1.0-15.0 |
+| Seed | 42 | Any integer |
+### Transfer2.5
+| Parameter | Default | Range |
+|-----------|---------|-------|
+| Control Type | blur | blur, edge, depth, segmentation |
+| Inference Steps | 30 | 10-50 |
+| Guidance Scale | 7.0 | 1.0-15.0 |
+| Control Scale | 1.0 | 0.5-2.0 |
+| Seed | 42 | Any integer |
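The two tables above can be enforced before a request reaches the GPU. The following is a sketch of such a check, not the Space's actual code: the ranges and choices are copied from the tables, `num_frames`, `num_inference_steps` and `guidance_scale` follow the names used in the API example, and `resolution`, `control_type`, `control_scale` and `validate_params` are hypothetical names.

```python
# Ranges and choices copied from the two tables above; names are hypothetical.
NUMERIC_RANGES = {
    "predict": {
        "num_frames": (17, 97),
        "num_inference_steps": (10, 50),
        "guidance_scale": (1.0, 15.0),
    },
    "transfer": {
        "num_inference_steps": (10, 50),
        "guidance_scale": (1.0, 15.0),
        "control_scale": (0.5, 2.0),
    },
}
ALLOWED_CHOICES = {
    "predict": {"resolution": {(720, 480), (1280, 720)}},
    "transfer": {"control_type": {"blur", "edge", "depth", "segmentation"}},
}


def validate_params(model: str, params: dict) -> list[str]:
    """Return a list of readable problems; an empty list means the request is valid."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES[model].items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name}={params[name]} is outside [{low}, {high}]")
    for name, allowed in ALLOWED_CHOICES[model].items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name}={params[name]!r} is not one of {sorted(allowed)}")
    return problems


defaults = {"resolution": (720, 480), "num_frames": 49,
            "num_inference_steps": 30, "guidance_scale": 7.0}
print(validate_params("predict", defaults))                 # defaults are in range
print(validate_params("transfer", {"control_scale": 3.0}))  # one range violation
```

Returning a list of problems instead of raising on the first one lets a UI report every invalid field at once.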
+---
+## 5. Smoke Test Design
+### Tests Implemented
+#### Predict2.5 Tests (`tests/smoke_predict.py`)
+1. **Output Validation**: Verify video utilities work correctly
+2. **Model Loading**: Verify Predict2.5-2B can be loaded
+3. **Text2World Inference**: Generate video from text prompt
+#### Transfer2.5 Tests (`tests/smoke_transfer.py`)
+1. **Control Extraction**: Verify edge/depth extraction works
+2. **Style Consistency**: Verify SSIM computation works
+3. **Model Loading**: Verify Transfer2.5-2B can be loaded
+4. **Video Inference**: Apply style transfer to video
+### Running Tests
+```bash
+# All tests
+python -m tests.smoke_all
+# Predict2.5 only (if VRAM < 65GB)
+python -m tests.smoke_all --predict-only
+# Individual modules
+python -m tests.smoke_predict
+python -m tests.smoke_transfer
+```
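The `--predict-only` switch implies a simple rule: skip the Transfer2.5 suite when the GPU has less than 65 GB. A rough sketch of that rule and of a pass/fail summary follows. The function names and the "SOME TESTS FAILED" wording are assumptions; the real `tests.smoke_all` module may be organised differently.

```python
TRANSFER_MIN_VRAM_GB = 65.0  # threshold taken from the "--predict-only" comment above


def select_suites(total_vram_gb: float) -> list[str]:
    """Pick which smoke-test suites to run for a GPU with the given total VRAM."""
    suites = ["predict"]
    if total_vram_gb >= TRANSFER_MIN_VRAM_GB:
        suites.append("transfer")
    return suites


def summarize(results: dict[str, bool]) -> str:
    """Format a pass/fail summary from a mapping of test name to outcome."""
    passed = sum(results.values())
    status = "ALL TESTS PASSED" if passed == len(results) else "SOME TESTS FAILED"
    return f"Overall: {passed}/{len(results)} tests passed\nSTATUS: {status}"


print(select_suites(70.0))  # both suites
print(select_suites(40.0))  # Predict2.5 only
print(summarize({"output_validation": True, "model_loading": True}))
```

On a CUDA machine the input value could come from `torch.cuda.get_device_properties(0).total_memory / 1024**3`.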
+### Expected Output
+```
+======================================================================
+NVIDIA COSMOS WORLD FOUNDATION MODELS - SMOKE TESTS
+======================================================================
+[System Information]
+  python_version: 3.11.x
+  torch_version: 2.5.1
+  cuda_available: True
+  gpu_name: NVIDIA H200
+  gpu_total_vram_gb: 70.0
+[PREDICT2.5 TESTS]
+  output_validation: PASSED
+  model_loading: PASSED
+  text2world_inference: PASSED
+[TRANSFER2.5 TESTS]
+  control_extraction: PASSED
+  style_consistency: PASSED
+  model_loading: PASSED
+  video_inference: PASSED
+Overall: 7/7 tests passed
+STATUS: ALL TESTS PASSED
+```
+---
+## 6. Paper Consistency Validation
+### Reference
+**Paper**: arXiv 2511.00062 - "World Simulation with Video Foundation Models for Physical AI"
+### Validation Points
+#### 6.1 Predict2.5 - Temporal Consistency (Section 4.2)
+**Paper Claim**: "Generated videos maintain reasonable spatiotemporal continuity in short-term prediction"
+**Validation Method**:
+- Generate N=3 videos with different seeds
+- Compute mean frame-to-frame pixel difference
+- Verify differences are smooth (mean_diff < 50)
+**Metric**: Mean adjacent frame difference (pixel intensity 0-255 scale)
+**Pass Criteria**: mean_diff < 50 for majority of samples
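The metric and pass rule above can be sketched with NumPy. This assumes frames arrive as a `(T, H, W, C)` uint8 array and that "difference" means mean absolute difference; the report states neither explicitly, and the function names are hypothetical.

```python
import numpy as np


def mean_adjacent_frame_diff(frames: np.ndarray) -> float:
    """Mean absolute difference between consecutive frames on the 0-255 scale."""
    if frames.shape[0] < 2:
        raise ValueError("need at least two frames")
    # Cast to a signed type first so uint8 subtraction cannot wrap around.
    diffs = np.abs(np.diff(frames.astype(np.int16), axis=0))
    return float(diffs.mean())


def majority_pass(mean_diffs: list[float], threshold: float = 50.0) -> bool:
    """True when more than half of the samples fall below the threshold."""
    passed = sum(d < threshold for d in mean_diffs)
    return passed > len(mean_diffs) / 2


static = np.zeros((4, 2, 2, 3), dtype=np.uint8)  # identical frames
flicker = static.copy()
flicker[1::2] = 255                              # alternating black and white frames
print(mean_adjacent_frame_diff(static), mean_adjacent_frame_diff(flicker))
print(majority_pass([10.0, 60.0, 20.0]))         # 2 of 3 samples pass
```

A static clip scores 0 and a fully flickering clip scores 255, so the threshold of 50 sits well inside the low end of that scale.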
+#### 6.2 Predict2.5 - Reproducibility (Section 5)
+**Paper Claim**: "Fixed random seeds produce deterministic outputs"
+**Validation Method**:
+- Run same prompt with same seed twice
+- Compute SSIM between corresponding frames
+- Verify outputs are nearly identical
+**Metric**: Mean SSIM between run1 and run2
+**Pass Criteria**: mean_ssim > 0.95
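As an illustration of the frame-by-frame comparison, here is a sketch that uses a simplified, single-window SSIM computed over each whole frame. This is a swapped-in simplification: the windowed SSIM from scikit-image (listed in the dependencies) is probably what the tests use, and its scores will differ. The function names are hypothetical.

```python
import numpy as np


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """Single-window SSIM over the whole frame (no sliding window)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilising constants
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


def mean_ssim(run1: np.ndarray, run2: np.ndarray) -> float:
    """Average SSIM over corresponding frames of two (T, H, W, C) videos."""
    if run1.shape != run2.shape:
        raise ValueError("videos must have identical shapes")
    return float(np.mean([global_ssim(a, b) for a, b in zip(run1, run2)]))


rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(2, 8, 8, 3), dtype=np.uint8)
print(mean_ssim(video, video) > 0.95)        # same output twice passes
print(mean_ssim(video, 255 - video) > 0.95)  # an inverted video does not
```

With scikit-image installed, `skimage.metrics.structural_similarity(a, b, channel_axis=-1, data_range=255)` could replace `global_ssim` to get the windowed variant.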
+#### 6.3 Transfer2.5 - Structure Preservation (Section 3.2)
+**Paper Claim**: "Cosmos-Transfer2.5 preserves structural consistency during domain transfer"
+**Validation Method**:
+- Extract edge maps from input and output
+- Compute SSIM between edge maps
+- Verify edges are preserved
+**Metric**: Edge SSIM between input and output
+**Pass Criteria**: mean_edge_ssim > 0.3
+#### 6.4 Transfer2.5 - Domain Change (Section 4.3)
+**Paper Claim**: "Model can perform world-to-world translation (e.g., day→night)"
+**Validation Method**:
+- Apply day→night transfer
+- Compute SSIM between input and output
+- Verify output differs from input while maintaining structure
+**Metric**: Pixel SSIM between input and output
+**Pass Criteria**: 0.1 < mean_ssim < 0.9 (different but not random)
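The two Transfer2.5 criteria work together: edge maps must stay similar while pixels change, but not so much that the output is unrelated to the input. A small sketch of both thresholds from sections 6.3 and 6.4, applied to SSIM means computed elsewhere, could look like this. The function and key names are hypothetical.

```python
def check_transfer_metrics(mean_edge_ssim: float, mean_pixel_ssim: float) -> dict[str, bool]:
    """Apply the pass criteria from sections 6.3 and 6.4 to precomputed SSIM means."""
    return {
        # 6.3: edge maps of input and output must stay reasonably similar.
        "structure_preserved": mean_edge_ssim > 0.3,
        # 6.4: pixels must change (below 0.9) without becoming unrelated (above 0.1).
        "domain_changed": 0.1 < mean_pixel_ssim < 0.9,
    }


print(check_transfer_metrics(mean_edge_ssim=0.55, mean_pixel_ssim=0.40))  # plausible transfer
print(check_transfer_metrics(mean_edge_ssim=0.98, mean_pixel_ssim=0.97))  # output barely changed
print(check_transfer_metrics(mean_edge_ssim=0.05, mean_pixel_ssim=0.04))  # output unrelated
```

An output that is nearly identical to the input passes the structure check but fails the domain-change check, which is why both are needed.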
+### Running Validation
+```bash
+python -m tests.paper_validation
+python -m tests.paper_validation --skip-transfer  # If VRAM limited
+```
+---
+## 7. Limitations & Future Work
+### Current Limitations
+| Limitation | Reason | Potential Solution |
+|------------|--------|-------------------|
+| Transfer2.5 may fail on ZeroGPU | 65.4 GB leaves only ~4.6 GB of headroom under the 70 GB limit | Use A100 80GB or quantization |
+| Long cold start | Large model downloads | Use model caching |
+| Limited output length | Generation must finish within the GPU timeout (5 min) | Increase GPU duration |
+| No multi-view support | Not implemented | Add multi-view inference |
+### Not Covered from Paper
+1. **Multi-view generation** (Section 3.3) - Not implemented
+2. **Autonomous Vehicle post-training** (Section 4.1) - Not included
+3. **Full benchmark evaluation** (Section 5.2) - Only smoke tests
+4. **14B model variants** - VRAM insufficient
+### Future Improvements
+1. Add A100 80GB option for better Transfer2.5 stability
+2. Implement multi-view inference for robotics use cases
+3. Add quantized model variants for lower VRAM
+4. Implement full paper benchmark suite
+5. Add video-to-video inference for Predict2.5
+---
+## 8. API Usage Examples
+### Gradio Client (Python)
+```python
+from gradio_client import Client
+# Connect to Space
+client = Client("wbw2000/cosmos-predict-transfer-demo")
+# Text2World
+result = client.predict(
+    prompt="A peaceful garden with butterflies",
+    negative_prompt="low quality, blurry",
+    num_frames=49,
+    height=480,
+    width=720,
+    num_inference_steps=30,
+    guidance_scale=7.0,
+    seed=42,
+    api_name="/run_predict_text2world"
+)
+video_path, log = result
+print(f"Video saved to: {video_path}")
+```
+### REST API (curl)
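+```bash
+# Gradio 5 serves REST endpoints under /gradio_api/call/<api_name>.
+# Step 1: submit the job; the JSON response contains an event_id.
+curl -X POST "https://wbw2000-cosmos-predict-transfer-demo.hf.space/gradio_api/call/run_predict_text2world" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "data": [
+      "A futuristic city at sunset",
+      "low quality, blurry",
+      49, 480, 720, 30, 7.0, 42
+    ]
+  }'
+# Step 2: stream the result; set EVENT_ID to the event_id returned by step 1.
+curl -N "https://wbw2000-cosmos-predict-transfer-demo.hf.space/gradio_api/call/run_predict_text2world/$EVENT_ID"
+```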
+---
+## 9. References
+- **Paper**: https://arxiv.org/abs/2511.00062
+- **Predict2.5 GitHub**: https://github.com/nvidia-cosmos/cosmos-predict2.5
+- **Transfer2.5 GitHub**: https://github.com/nvidia-cosmos/cosmos-transfer2.5
+- **Predict2.5 HuggingFace**: https://huggingface.co/nvidia/Cosmos-Predict2.5-2B
+- **Transfer2.5 HuggingFace**: https://huggingface.co/nvidia/Cosmos-Transfer2.5-2B
+- **License**: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license
+---
+## 10. Changelog
+| Date | Change |
+|------|--------|
+| 2026-01-26 | Initial deployment to HuggingFace Spaces (commit `f829a7a65deb6afc884d48e479a0c02f16885e24`) |
+---
+*Report generated: 2026-01-26*
+*Author: Claude Code (Anthropic)*