cy0307 committed on
Commit
2f1cc3b
·
verified ·
1 Parent(s): 80e8448

Refresh model README

Files changed (1)
  1. README.md +20 -12
README.md CHANGED
@@ -74,7 +74,7 @@ before the multi-episode omni-model stage becomes a real held-out evaluation.
74
  | Task suite | 12 human-readable embodied-AI task contracts with input, process, output, metrics, predictions, and case-study walkthroughs |
75
  | Baselines | Minimal linear/ridge/logistic heads plus compact PyTorch MLP task heads over the same chronological split; companion simple/NN metadata baselines are also aligned to the selected 128-episode 96/16/16 split |
76
  | Research directions | Task mapping and extension probes for human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling |
77
- | Scale-up path | A first selected-episode Qwen3-Omni LoRA diagnostic pilot has completed on the 96/16/16 split; same-split simple/NN metadata baselines now cover the 12 task ids as a companion comparison. The Qwen result proves the multi-episode export/train/eval/package loop, but the weak held-out metrics make it a baseline for error analysis rather than a strong model. Cosmos 3/world-model and VLA/policy branches reuse the same split and package contract after their targets are implemented. |
78
  | Public surfaces | GitHub repo, GitHub Pages dashboard, GHCR static-site package, HF Space, HF artifact dataset, HF baseline-model repo, and HF collection |
79
 
80
  For the fastest interpretation of the current metrics, start with
@@ -111,7 +111,7 @@ This project is best read as a staged embodied-AI research study:
111
  | Task suite | Twelve human-readable tasks cover action, procedure, contact, object, language, retrieval, reconstruction, order, and synchronization questions. | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json) |
112
  | Baselines | Minimal heads and compact PyTorch MLP heads provide a first controlled comparison on the same chronological split; the selected 128-episode setup also has same-split simple/NN metadata baselines for JSON-supported tasks. | [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/), [`results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md`](results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md) |
113
  | Diagnostics | Audio contribution, modality ablations, timeline overlays, object labels, and alignment stress tests show which signals are useful and which tasks remain hard. | [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md), [`docs/single_episode_explorer.html`](docs/single_episode_explorer.html) |
114
- | Scale-up | The selected 128-episode Qwen3-Omni LoRA diagnostic pilot has a verified validation-aware held-out package: 96/16/16 selected episodes, 3,808 exported windows, 512 validation windows, 448 held-out test windows, and public-safe metrics/predictions. Same-split simple/NN metadata baselines are published separately for the 12 task ids. JSON validity is 87.50%, below the 98% target, so the next pass focuses on structured-output reliability and task-quality error analysis. | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/omni_finetune_verified_result.json`](docs/data/omni_finetune_verified_result.json), [`results/omni_finetune/verified_public/`](results/omni_finetune/verified_public/), [`results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md`](results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md) |
115
 
116
  Detailed dataset notes, reproduction checks, and generated JSON reports are
117
  included for readers who want to inspect the implementation, but they are
@@ -133,7 +133,7 @@ They give the current research state in one compact table:
133
  | Dataset context | Official Xperience-10M links, sample-vs-gated-data boundary, modality coverage, and redistribution policy are documented |
134
  | Evaluation protocol | Verified generated protocol for windowing, split policy, leakage controls, and per-task metrics |
135
  | Website and Hub pages | Public dashboard, Hugging Face Space, artifact dataset, baseline model repo, and collection use the same project framing and links |
136
- | Qwen3-Omni multi-episode pilot | Verified diagnostic result package exists for the selected 96/16/16 episode split; current held-out metrics are weak and below the JSON-validity quality target |
137
  | Raw Xperience-10M data / full Qwen weights | Not redistributed |
138
 
139
  ## 90-Second Research Project Path
@@ -152,7 +152,7 @@ If you are reading the project cold, open these in order:
152
  | 8 | What research directions does this support? | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`docs/data/research_directions.json`](docs/data/research_directions.json), [`docs/data/research_direction_extensions.json`](docs/data/research_direction_extensions.json) | The tasks are mapped to human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling. |
153
  | 9 | Which foundation model comes next? | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json), [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) | Qwen3-Omni is the first held-out LoRA baseline; Cosmos 3 is the first world-model branch; policy models wait for explicit action targets; Xperience-native pretraining is the full-corpus future goal. |
154
  | 10 | How do I reproduce it? | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md), [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) | Public commands and expected outputs are documented for the sample-episode task suite. |
155
- | 11 | What is still pending? | [`docs/data/omni_finetune_verified_result.json`](docs/data/omni_finetune_verified_result.json), [`DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md), [`MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md) | The first held-out diagnostic pilot is verified; strong model quality remains pending because JSON validity is 87.50% and action/subtask metrics remain weak. |
156
 
157
  A compact reader-path summary is available at
158
  [`docs/data/project_packet.json`](docs/data/project_packet.json).
@@ -481,8 +481,8 @@ python scripts/train_all_modalities_model.py --workspace /path/to/workspace
481
 
482
  This repo includes a first Qwen3-Omni fine-tuning path over Xperience-10M. The
483
  repository separates public-sample evidence from multi-episode fine-tuning
484
- artifacts. The validation-aware selected-episode held-out package is now verified as a
485
- diagnostic pilot, not a strong final model.
486
  The useful distinction is:
487
 
488
  - direct Qwen3-Omni inputs: RGB/fisheye video, embedded MP4 audio, and language
@@ -505,6 +505,11 @@ for public README, website, or Hugging Face updates only after the validator
505
  passes and `scripts/omni/package_verified_omni_result.py` creates a
506
  public-safe derived-artifact package. The current verified package is listed in
507
  [`docs/data/omni_finetune_verified_result.json`](docs/data/omni_finetune_verified_result.json).
 
 
 
 
 
508
 
509
  ### Sample Count Decision
510
 
@@ -544,13 +549,16 @@ Current status in this repo:
544
  - gated_metadata_audit: 12,102 complete visible episodes across 802 complete sessions
545
  - selected_episode_plan: 128 source-balanced episodes, 96/16/16 train/val/test
546
  - selected_download_size: 277.71 GiB excluding `visualization.rrd`
547
- - verified_validation_aware_diagnostic_package: true
548
  - selected_split: 96 train / 16 validation / 16 held-out test episodes
549
  - exported_windows: 2,848 train / 512 validation / 448 test
550
  - validation_samples_used: 512
551
  - held_out_eval: 448 test windows from 14 exported test episodes
552
- - train_loss / val_loss: 0.4130 / 0.0331
553
- - current_quality_target: JSON validity 87.50%, below the 98% target
 
 
 
554
  - gated dataset: available for selected multi-episode data preparation
555
  - source_discovery: `results/omni_finetune/source_discovery.json`
556
  - data_status: `results/omni_finetune/DATA_ACCESS_STATUS.md`
@@ -716,12 +724,12 @@ windows without depending on Qwen chat-message records.
716
  The public-safe verified package intentionally excludes raw data, base Qwen
717
  weights, LoRA weights, and full checkpoints. Adapter upload is a separate step:
718
  use it only when the intended adapter directory is present and the model card
719
- clearly distinguishes older smoke weights from the selected-episode diagnostic
720
- or validation-aware run.
721
 
722
  ```bash
723
  python3 scripts/omni/upload_qwen3_omni_lora_to_hf.py \
724
- --repo-id cy0307/ropedia-qwen3-omni-lora-smoke \
725
  --source-dir /path/to/adapter_upload_package \
726
  --message "Upload Xperience-10M Qwen3-Omni LoRA pilot"
727
  ```
 
74
  | Task suite | 12 human-readable embodied-AI task contracts with input, process, output, metrics, predictions, and case-study walkthroughs |
75
  | Baselines | Minimal linear/ridge/logistic heads plus compact PyTorch MLP task heads over the same chronological split; companion simple/NN metadata baselines are also aligned to the selected 128-episode 96/16/16 split |
76
  | Research directions | Task mapping and extension probes for human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling |
77
+ | Scale-up path | The selected-episode Qwen3-Omni LoRA final diagnostic result is verified on the 96/16/16 split; same-split simple/NN metadata baselines now cover the 12 task ids as a companion comparison. The Qwen result proves the multi-episode export/train/eval/package loop and meets the strict-JSON target, but weak action/subtask metrics make it a baseline for error analysis rather than a strong model. Cosmos3/world-model and VLA/policy branches reuse the same split and package contract after their targets are implemented. |
78
  | Public surfaces | GitHub repo, GitHub Pages dashboard, GHCR static-site package, HF Space, HF artifact dataset, HF baseline-model repo, and HF collection |
79
 
80
  For the fastest interpretation of the current metrics, start with
 
111
  | Task suite | Twelve human-readable tasks cover action, procedure, contact, object, language, retrieval, reconstruction, order, and synchronization questions. | [`RESEARCH_TAKEAWAYS.md`](RESEARCH_TAKEAWAYS.md), [`results/episode_task_suite/summary_report.json`](results/episode_task_suite/summary_report.json) |
112
  | Baselines | Minimal heads and compact PyTorch MLP heads provide a first controlled comparison on the same chronological split; the selected 128-episode setup also has same-split simple/NN metadata baselines for JSON-supported tasks. | [`results/episode_task_suite/neural_mlp/`](results/episode_task_suite/neural_mlp/), [`results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md`](results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md) |
113
  | Diagnostics | Audio contribution, modality ablations, timeline overlays, object labels, and alignment stress tests show which signals are useful and which tasks remain hard. | [`results/audio_ablation/AUDIO_ABLATION_SUMMARY.md`](results/audio_ablation/AUDIO_ABLATION_SUMMARY.md), [`docs/single_episode_explorer.html`](docs/single_episode_explorer.html) |
114
+ | Scale-up | The selected 128-episode Qwen3-Omni LoRA diagnostic path has a final verified held-out package: 96/16/16 selected episodes, 3,808 exported windows, 512 validation windows, 448 held-out test windows, and public-safe metrics/predictions. Same-split simple/NN metadata baselines are published for the 12 task ids, and the first Cosmos3-Nano future-window compatibility package is verified as a separate world-model branch. The final Qwen pass reaches 99.78% JSON validity, meeting the 98% target, while action/subtask quality remains weak and is the next error-analysis target. | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/omni_model_comparison.json`](docs/data/omni_model_comparison.json), [`docs/data/omni_finetune_verified_result.json`](docs/data/omni_finetune_verified_result.json), [`results/omni_finetune/OMNI_MODEL_COMPARISON.md`](results/omni_finetune/OMNI_MODEL_COMPARISON.md), [`results/omni_finetune/verified_public/`](results/omni_finetune/verified_public/), [`results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md`](results/omni_finetune/multi_episode_128_task_baselines/BASELINE_ALIGNMENT_REPORT.md) |
115
 
116
  Detailed dataset notes, reproduction checks, and generated JSON reports are
117
  included for readers who want to inspect the implementation, but they are
 
133
  | Dataset context | Official Xperience-10M links, sample-vs-gated-data boundary, modality coverage, and redistribution policy are documented |
134
  | Evaluation protocol | Verified generated protocol for windowing, split policy, leakage controls, and per-task metrics |
135
  | Website and Hub pages | Public dashboard, Hugging Face Space, artifact dataset, baseline model repo, and collection use the same project framing and links |
136
+ | Qwen3-Omni multi-episode pilot | Final verified diagnostic result package exists for the selected 96/16/16 episode split; JSON validity meets the target, while action/subtask metrics remain weak |
137
  | Raw Xperience-10M data / full Qwen weights | Not redistributed |
138
 
139
  ## 90-Second Research Project Path
 
152
  | 8 | What research directions does this support? | [`RESEARCH_ROADMAP.md`](RESEARCH_ROADMAP.md), [`docs/data/research_directions.json`](docs/data/research_directions.json), [`docs/data/research_direction_extensions.json`](docs/data/research_direction_extensions.json) | The tasks are mapped to human modeling, 3D/4D reconstruction, egocentric interaction, and world modeling. |
153
  | 9 | Which foundation model comes next? | [`FOUNDATION_MODEL_PLAN.md`](FOUNDATION_MODEL_PLAN.md), [`docs/data/foundation_model_plan.json`](docs/data/foundation_model_plan.json), [`XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md`](XPERIENCE_EMBODIED_FOUNDATION_MODEL_PRETRAINING.md) | Qwen3-Omni is the first held-out LoRA baseline; Cosmos 3 is the first world-model branch; policy models wait for explicit action targets; Xperience-native pretraining is the full-corpus future goal. |
154
  | 10 | How do I reproduce it? | [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md), [`notes/reproducibility_audit.md`](notes/reproducibility_audit.md) | Public commands and expected outputs are documented for the sample-episode task suite. |
155
+ | 11 | What is still pending? | [`docs/data/omni_finetune_verified_result.json`](docs/data/omni_finetune_verified_result.json), [`DATA_ACCESS_STATUS.md`](results/omni_finetune/DATA_ACCESS_STATUS.md), [`MULTI_EPISODE_ACCESS_STATUS.md`](results/omni_finetune/MULTI_EPISODE_ACCESS_STATUS.md) | The final held-out diagnostic Qwen pass is verified and the JSON-validity target is met; strong action/subtask model quality remains pending. |
156
 
157
  A compact reader-path summary is available at
158
  [`docs/data/project_packet.json`](docs/data/project_packet.json).
 
481
 
482
  This repo includes a first Qwen3-Omni fine-tuning path over Xperience-10M. The
483
  repository separates public-sample evidence from multi-episode fine-tuning
484
+ artifacts. The selected-episode held-out package is now verified as a
485
+ diagnostic result, not a strong final action/subtask model.
486
  The useful distinction is:
487
 
488
  - direct Qwen3-Omni inputs: RGB/fisheye video, embedded MP4 audio, and language
 
505
  passes and `scripts/omni/package_verified_omni_result.py` creates a
506
  public-safe derived-artifact package. The current verified package is listed in
507
  [`docs/data/omni_finetune_verified_result.json`](docs/data/omni_finetune_verified_result.json).
508
+ The current cross-version comparison is generated at
509
+ [`docs/data/omni_model_comparison.json`](docs/data/omni_model_comparison.json)
510
+ and [`results/omni_finetune/OMNI_MODEL_COMPARISON.md`](results/omni_finetune/OMNI_MODEL_COMPARISON.md);
511
+ it separates the single-episode task suite, 128-episode aligned simple/NN
512
+ baselines, and verified Qwen3/Cosmos model-branch packages.
513
 
514
  ### Sample Count Decision
515
 
 
549
  - gated_metadata_audit: 12,102 complete visible episodes across 802 complete sessions
550
  - selected_episode_plan: 128 source-balanced episodes, 96/16/16 train/val/test
551
  - selected_download_size: 277.71 GiB excluding `visualization.rrd`
552
+ - verified_final_diagnostic_package: true
553
  - selected_split: 96 train / 16 validation / 16 held-out test episodes
554
  - exported_windows: 2,848 train / 512 validation / 448 test
555
  - validation_samples_used: 512
556
  - held_out_eval: 448 test windows from 14 exported test episodes
557
+ - final_train_loss / final_val_loss: 0.0277 / 0.0278
558
+ - current_quality_target: JSON validity 99.78%, meeting the 98% target; action/subtask quality remains weak
559
+ - qwen3_lora_adapter_repo: https://huggingface.co/cy0307/ropedia-qwen3-omni-lora-128ep
560
+ - 128_aligned_baselines: 12 task ids, 8 simple metadata/text baselines, 6 neural metadata/text baselines
561
+ - cosmos3_branch: verified Cosmos3-Nano future-window compatibility package, 378 held-out future-window predictions from 14 test episodes
562
  - gated dataset: available for selected multi-episode data preparation
563
  - source_discovery: `results/omni_finetune/source_discovery.json`
564
  - data_status: `results/omni_finetune/DATA_ACCESS_STATUS.md`
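
The counts in the status list above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the repository: all variable and function names are hypothetical. It also assumes that JSON validity is measured over the 448 held-out test windows, which the README does not state; under that assumption, 447 valid outputs is one count that rounds to the reported 99.78%.

```python
# Figures copied from the status list; every name here is hypothetical.
selected_split = {"train": 96, "validation": 16, "test": 16}
exported_windows = {"train": 2848, "validation": 512, "test": 448}
exported_test_episodes = 14
cosmos_future_window_predictions = 378

# Total exported windows across the three splits.
total_windows = sum(exported_windows.values())

# Average windows and Cosmos predictions per exported test episode.
windows_per_test_episode = exported_windows["test"] / exported_test_episodes
cosmos_predictions_per_episode = (
    cosmos_future_window_predictions / exported_test_episodes
)

# Share of the 128 selected episodes in each split.
total_episodes = sum(selected_split.values())
split_fractions = {
    name: count / total_episodes for name, count in selected_split.items()
}


def validity_percent(valid_outputs: int, total_outputs: int) -> float:
    """Return the percentage of valid outputs, rounded to two decimals."""
    return round(100.0 * valid_outputs / total_outputs, 2)


# ASSUMPTION: validity is computed over the 448 held-out test windows.
assumed_valid_outputs = 447

print(total_windows)                     # 3808
print(windows_per_test_episode)          # 32.0
print(cosmos_predictions_per_episode)    # 27.0
print(split_fractions["train"])          # 0.75
print(validity_percent(assumed_valid_outputs, exported_windows["test"]))  # 99.78
```

The total of 3,808 matches the exported-window count quoted in the Scale-up table row, so the per-split figures and the headline figure agree.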
 
724
  The public-safe verified package intentionally excludes raw data, base Qwen
725
  weights, LoRA weights, and full checkpoints. Adapter upload is a separate step:
726
  use it only when the intended adapter directory is present and the model card
727
+ clearly distinguishes older smoke weights from the final selected-episode
728
+ diagnostic run.
729
 
730
  ```bash
731
  python3 scripts/omni/upload_qwen3_omni_lora_to_hf.py \
732
+ --repo-id cy0307/ropedia-qwen3-omni-lora-128ep \
733
  --source-dir /path/to/adapter_upload_package \
734
  --message "Upload Xperience-10M Qwen3-Omni LoRA pilot"
735
  ```
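
The README asks that the upload step run only when the intended adapter directory is present. A minimal sketch of that guard follows. The script path and the three flags are taken from the command above. The function name `build_upload_command` and the emptiness check are assumptions for illustration, not something the repository is known to provide.

```python
import tempfile
from pathlib import Path

# Script path copied from the command above.
UPLOAD_SCRIPT = "scripts/omni/upload_qwen3_omni_lora_to_hf.py"


def build_upload_command(repo_id: str, source_dir: str, message: str) -> list[str]:
    """Build the argv list for the upload script (hypothetical helper).

    Raises FileNotFoundError when the adapter directory is missing or empty,
    so the command is never assembled for a directory with nothing to upload.
    """
    source = Path(source_dir)
    if not source.is_dir() or not any(source.iterdir()):
        raise FileNotFoundError(f"adapter directory missing or empty: {source}")
    return [
        "python3", UPLOAD_SCRIPT,
        "--repo-id", repo_id,
        "--source-dir", str(source),
        "--message", message,
    ]


if __name__ == "__main__":
    # Demo with a throwaway directory holding one placeholder file.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "placeholder.bin").write_bytes(b"")
        command = build_upload_command(
            "cy0307/ropedia-qwen3-omni-lora-128ep",
            tmp,
            "Upload Xperience-10M Qwen3-Omni LoRA pilot",
        )
        print(" ".join(command[:4]))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository root. Building the command as a list rather than a shell string avoids quoting problems in the commit message.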