Commit 0dce4ee · verified · 1 parent: 1e05f01 · committed by cy0307

Update omni comparison in model repo

results/omni_finetune/OMNI_MODEL_COMPARISON.md ADDED
@@ -0,0 +1,50 @@
+ # Omni Model Comparison
+
+ Generated: `2026-06-06T23:26:13+00:00`
+
+ Compare only rows with the same scope and target. Single-episode raw-feature metrics, 128-episode metadata baselines, Qwen3 structured JSON metrics, and Cosmos3 future-window metrics answer different questions.
+
+ ## Current Result Versions
+
+ | version | status | scope | source |
+ | --- | --- | --- | --- |
+ | Single-Episode Public-Sample Task Suite | verified | one public Xperience-10M sample episode | `results/episode_task_suite/summary_report.json` |
+ | 128-Episode Aligned Simple/NN Baselines | pass | selected 128-episode 96/16/16 split | `results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md` |
+ | 128-Episode Foundation-Model Branches | partial_verified | selected 128-episode split and compatible derived windows | `results/omni_finetune/verified_public/` |
+
+ Read the three rows this way:
+
+ - Version 1 (Single-Episode Public-Sample Task Suite) is the public-sample 12-task harness with minimal and neural heads.
+ - Version 2 (128-Episode Aligned Simple/NN Baselines) is the simple/NN baseline alignment on the selected 128-episode split, with every baseline evaluated on the same split.
+ - Version 3 (128-Episode Foundation-Model Branches) is the verified model-branch layer. The current final Qwen3-Omni LoRA package is the JSON-task diagnostic result; Cosmos3-Nano is a future-window compatibility result, not a full Cosmos diffusion fine-tune.
+
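The 96/16/16 split named in the version table can be read as an episode-level partition of the 128 selected episodes. The sketch below is one way such a split could be produced. It assumes the three parts are train/validation/test (the report does not label them) and that episodes are assigned by a seeded shuffle. `split_episodes`, the seed, and the episode IDs are hypothetical and not taken from the repository.

```python
import random


def split_episodes(episode_ids, sizes=(96, 16, 16), seed=0):
    """Partition episode IDs into disjoint groups of the given sizes."""
    if len(episode_ids) != sum(sizes):
        raise ValueError("sizes must add up to the number of episodes")
    # Sort first so the result does not depend on the input order.
    shuffled = sorted(episode_ids)
    random.Random(seed).shuffle(shuffled)
    parts, start = [], 0
    for size in sizes:
        parts.append(shuffled[start:start + size])
        start += size
    return parts


# Placeholder IDs; the real episode names are not given in the report.
episodes = [f"episode_{i:03d}" for i in range(128)]
train, val, test = split_episodes(episodes)
print(len(train), len(val), len(test))
```

Splitting by episode rather than by window keeps every window of a held-out episode out of training, which is what makes the rows comparable across baselines that share the split.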
+ ## 128-Episode Task Baselines
+
+ | task | simple | neural |
+ | --- | ---: | ---: |
+ | Action Recognition | macro_f1 0.0002 | macro_f1 0.0000 |
+ | Procedure Step Recognition | macro_f1 0.0000 | macro_f1 0.0000 |
+ | Action Boundary Detection | macro_f1 0.5220 | macro_f1 0.4582 |
+ | Next-Action Prediction | macro_f1 0.0002 | macro_f1 0.0000 |
+ | Hand Trajectory Forecasting | mpjpe | |
+ | Contact State Prediction | macro_f1 0.5168 | macro_f1 0.2195 |
+ | Object Relevance Prediction | micro_f1 0.1822 | micro_f1 0.1054 |
+ | Language Grounding | mrr 0.0128 | |
+ | Cross-Modal Retrieval | mrr | |
+ | Cross-Modal Reconstruction | r2 | |
+ | Temporal Order Verification | f1 0.3271 | |
+ | Multimodal Synchronization Detection | f1 | |
+
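The baseline table mixes `macro_f1` and `micro_f1`, and the two are not interchangeable. Macro averaging weights every class equally, so classes that are never predicted correctly pull the score toward zero; micro averaging pools all decisions first. The sketch below illustrates the difference for the single-label case on toy data. It is not the repository's evaluation code, and `macro_micro_f1` is a hypothetical helper.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy data: four classes, and a predictor that always outputs the majority class.
y_true = ["a"] * 7 + ["b", "c", "d"]
y_pred = ["a"] * 10
macro, micro = macro_micro_f1(y_true, y_pred)
print(f"macro_f1={macro:.4f} micro_f1={micro:.4f}")
```

On this toy input the majority-class predictor scores far lower on macro F1 than on micro F1, which is why rows using different averaging should not be ranked against each other.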
+ ## Verified Model Branches
+
+ | branch | backbone | eval samples | held-out episodes | key metrics |
+ | --- | --- | ---: | ---: | --- |
+ | Cosmos3-Nano Future-Window World Model | `cosmos_world_model` | 378 | 14 | future_retrieval_mrr=0.0221, temporal_consistency=0.0952, transition_accuracy=0.9683, contact_accuracy=0.7434 |
+ | Qwen3-Omni LoRA | `qwen3_omni_lora` | 448 | 14 | json_validity_rate=0.8750, action_macro_f1=0.0027, transition_accuracy=0.8504, contact_accuracy=0.6451 |
+ | Qwen3-Omni LoRA | `qwen3_omni_lora` | 448 | 14 | json_validity_rate=0.8527, action_macro_f1=0.0021, transition_accuracy=0.8281, contact_accuracy=0.6518 |
+ | Qwen3-Omni LoRA | `qwen3_omni_lora` | 448 | 14 | json_validity_rate=0.9978, action_macro_f1=0.0024, transition_accuracy=0.9710, contact_accuracy=0.7188 |
+
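The report does not define `json_validity_rate` or `future_retrieval_mrr`. The sketch below shows one plausible reading of each: validity as the fraction of outputs that parse as a JSON object, and MRR as the mean of 1/rank of the correct item. Both definitions, the function names, and the sample strings are assumptions for illustration only.

```python
import json


def json_validity_rate(outputs):
    """Fraction of model outputs that parse as a JSON object."""
    valid = 0
    for text in outputs:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            valid += 1
    return valid / len(outputs) if outputs else 0.0


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank, where rank is the 1-based position of the correct item."""
    return sum(1.0 / r for r in ranks) / len(ranks) if ranks else 0.0


# Made-up outputs: two parse as objects, one is truncated, one is plain text.
outputs = [
    '{"action": "pour", "contact": true}',
    '{"action": "cut"',
    "not json",
    '{"action": "stir"}',
]
rate = json_validity_rate(outputs)
mrr = mean_reciprocal_rank([1, 2, 4])
print(f"json_validity_rate={rate:.4f} mrr={mrr:.4f}")
```

Under this reading, a rate of 0.8750 over 448 samples would correspond to 392 parseable outputs, and 0.9978 to 447. Task metrics such as `action_macro_f1` then depend on how unparseable outputs are scored, which the report does not state.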
+ ## Pending
+
+ - Use the final Qwen3 full-eval package as the current Qwen result; older Qwen package rows remain historical diagnostics for comparison.
+ - Promote Cosmos3 from compatibility adapter to full Cosmos3 fine-tuning only after a separate environment with matching Diffusers/Cosmos dependencies is prepared.