File size: 3,078 Bytes
540e67a
 
476e8e8
4602161
eeac43c
 
 
540e67a
b7a466b
 
540e67a
 
 
 
 
 
45c1706
540e67a
9371cfb
 
eeac43c
9371cfb
 
 
 
540e67a
 
6a1869c
540e67a
eeac43c
 
 
 
 
 
 
 
540e67a
8a8926d
540e67a
45c1706
540e67a
 
 
 
eeac43c
 
 
 
 
 
 
 
 
 
 
 
540e67a
04c0bde
476e8e8
 
 
9371cfb
 
 
 
eeac43c
9371cfb
 
 
eeac43c
 
 
 
 
 
4602161
 
540e67a
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
# Multi-Episode Access Status

Current status: access to the gated full `ropedia-ai/xperience-10m` dataset is
granted, and a metadata-only Hugging Face audit has been completed. A selected
128-episode pilot has produced a verified diagnostic Qwen3-Omni LoRA package
with held-out evaluation. The result is useful as a pipeline and error-analysis
baseline, not as a strong final model.

This file records the public data-access status and pilot requirements. It does
not include local-machine aliases, private paths, SSH hosts, or token locations.

## Selection Plan

| Item | Value |
| --- | ---: |
| Dataset | `ropedia-ai/xperience-10m` |
| Minimum pilot gate | 32 complete leaf episodes |
| Strategy | stratified round-robin across top-level session UUIDs |
| Metadata-audited visible complete episodes | 12,102 |
| Metadata-audited complete sessions | 802 |
| Current selected pilot | 128 source-balanced episodes |
| Recommended split | 96 train / 16 val / 16 test |
| Recommended estimated download | 277.71 GiB excluding `visualization.rrd` |
| Representative 32-episode estimate | ~70.5 GiB at median episode size |
| Smallest one-per-session 32-episode estimate | 35.35 GiB |
| Excluded file | `visualization.rrd` |

## Current Stage

The current Qwen3-Omni artifacts include a verified validation-monitored
diagnostic held-out run: 96/16/16 selected train/val/test episodes, 3,808
exported windows, 2,848 train examples, 512 validation examples, and 448
held-out test predictions from 14 exported test episodes. Training used eight
distributed accelerator processes for one epoch with LoRA rank 16 and recorded a
final train loss of 0.4130 plus a validation loss of 0.0331. The result verifies
the multi-episode pipeline and gives a real error-analysis baseline; it is still
not a strong final model.

A stronger model-quality pilot should be presented only after:

- selected valid episodes are available locally,
- the manifest builder confirms complete held-out episode splits,
- training finishes with recorded metadata and progress logs,
- evaluation runs on held-out test episodes,
- predictions, metrics, confusion matrices, and a run report are committed.
- JSON validity and action/subtask metrics improve beyond the current
  diagnostic baseline.

Current diagnostic metrics:

- JSON validity: 87.50%
- action macro-F1: 0.0027
- subtask accuracy: 0.0067
- transition accuracy: 0.8504
- next-action accuracy: 0.0246
- contact accuracy: 0.6451
- object micro-F1: 0.2230

The public data access summary is:

`results/omni_finetune/DATA_ACCESS_STATUS.md`

The current metadata-only full dataset audit is:

`results/omni_finetune/FULL_DATASET_METADATA_AUDIT.md`

The current 128-episode source-balanced download plan is:

`results/omni_finetune/XPERIENCE10M_128_EPISODE_SELECTION.md`

The current verified diagnostic package is:

`docs/data/omni_finetune_verified_result.json`

`results/omni_finetune/verified_public/`

The older machine-generated source discovery report remains a pre-access local
planning record:

`results/omni_finetune/DATA_BLOCKER_REPORT.md`