# Audio Ablation and Raw-Audio Upgrade
This report is generated from the committed task-suite artifacts and the audio stream of the local public-sample MP4.
It measures how much audio changes the primary metric of each single-episode task, with every variant evaluated under the same chronological split.
## Raw Audio Feature
- Source: `local_public_sample/fisheye_cam0.mp4`
- Has audio: `True`
- Sample rate: `16000` Hz
- Window feature dim: `588`
- Feature: per-window log-mel statistics computed from an STFT of the raw waveform, plus delta statistics and waveform-envelope statistics.
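
The feature described above can be illustrated with a minimal NumPy sketch. It is not the pipeline that produced this report: the function names, the 25 ms window, the 10 ms hop, the 40 mel bands, and the choice of mean/std/max statistics are all assumptions made for illustration, so the output length does not match the reported 588 dimensions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    hz_edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def window_features(wave, sr=16000, n_fft=400, hop=160, n_mels=40):
    """Summarize one audio window as a fixed-length vector (hypothetical layout)."""
    wave = np.asarray(wave, dtype=float)
    n_frames = 1 + (len(wave) - n_fft) // hop
    if n_frames < 2:
        raise ValueError("window too short to compute delta statistics")
    # Slice the window into overlapping frames, shape (n_frames, n_fft).
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wave[idx]
    # STFT power spectrum with a Hann window, then log-mel energies.
    power = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    # Delta: first-order difference between consecutive frames.
    delta = np.diff(log_mel, axis=0)
    # Envelope: peak absolute amplitude of each unwindowed frame.
    envelope = np.abs(frames).max(axis=1)
    return np.concatenate([
        log_mel.mean(axis=0), log_mel.std(axis=0),
        delta.mean(axis=0), delta.std(axis=0),
        np.array([envelope.mean(), envelope.std(), envelope.max()]),
    ])

# One second of a 440 Hz tone at the report's 16 kHz sample rate.
t = np.arange(16000) / 16000.0
feat = window_features(0.5 * np.sin(2.0 * np.pi * 440.0 * t))
print(feat.shape)  # 4 * n_mels + 3 values
```

Summarizing each window with statistics rather than keeping the full spectrogram gives a fixed-length vector regardless of window duration, which is what lets it stand in for another per-window audio feature.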
## Task Deltas
| Task | Metric | Current audio | No audio | Current audio delta (vs no audio) | Raw replaces audio | Raw replacement delta (vs current audio) |
| --- | --- | ---: | ---: | ---: | ---: | ---: |
| Current Action Recognition | macro_f1 | 0.0091 | 0.0088 | 0.0003 | 0.0013 | -0.0077 |
| Current Subtask Recognition | macro_f1 | 0.0113 | 0.0112 | 0.0001 | 0.0008 | -0.0104 |
| Action Transition Detection | macro_f1 | 0.4621 | 0.4687 | -0.0066 | 0.4792 | 0.0171 |
| Next-Action Prediction | macro_f1 | 0.0106 | 0.0107 | -0.0001 | 0.0060 | -0.0046 |
| Future Hand Motion Forecasting | mae | 4.4664 | 4.3038 | -0.1626 | 4.3059 | 0.1605 |
| Contact State Prediction | macro_f1 | 1.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0000 |
| Relevant Object Prediction | micro_f1 | 0.1581 | 0.1479 | 0.0102 | 0.1787 | 0.0206 |
| Language-to-Time Grounding | mrr | 0.0321 | 0.0272 | 0.0049 | 0.0248 | -0.0072 |
| Cross-Modal Window Retrieval | mrr | 0.3751 | 0.3892 | -0.0141 | 0.3275 | -0.0476 |
| Sensor-to-Visual Reconstruction | mae | 9.7942 | 10.4467 | 0.6524 | 8.8307 | 0.9635 |
| Temporal Order Verification | macro_f1 | 0.5172 | 0.4943 | 0.0230 | 0.5302 | 0.0129 |
| Cross-Modal Misalignment Detection | macro_f1 | 0.4173 | 0.4226 | -0.0052 | 0.4438 | 0.0264 |
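
The two delta columns can be reproduced from the three metric columns. The sketch below assumes that MAE is the only lower-is-better metric in this table and that each delta is a plain difference with the sign flipped for MAE; the helper name `improvement` is hypothetical. Three rows are copied from the table as a check.

```python
# Metrics where a lower value is better (assumption: only MAE in this table).
LOWER_IS_BETTER = {"mae"}

def improvement(metric, candidate, baseline):
    """Signed improvement of candidate over baseline; positive means better."""
    if metric in LOWER_IS_BETTER:
        return baseline - candidate
    return candidate - baseline

rows = [
    # (task, metric, current_audio, no_audio, raw_replaces_audio)
    ("Action Transition Detection", "macro_f1", 0.4621, 0.4687, 0.4792),
    ("Future Hand Motion Forecasting", "mae", 4.4664, 4.3038, 4.3059),
    ("Relevant Object Prediction", "micro_f1", 0.1581, 0.1479, 0.1787),
]

deltas = {}
for task, metric, current, no_audio, raw in rows:
    audio_delta = improvement(metric, current, no_audio)  # current audio vs no audio
    raw_delta = improvement(metric, raw, current)         # raw audio vs current audio
    deltas[task] = (round(audio_delta, 4), round(raw_delta, 4))
    print(f"{task}: {audio_delta:+.4f} {raw_delta:+.4f}")
```

Recomputing from the rounded table values can differ from the printed deltas by one unit in the last decimal place on some rows, because the report presumably rounds after subtracting unrounded metrics.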
## Aggregate
- Mean current-audio delta: `0.041849794979543296`
- Tasks where current handcrafted audio improves the primary metric: `6`
- Mean raw-replacement delta vs current handcrafted audio: `0.09362598132150173`
- Tasks where raw log-mel replacement improves over current handcrafted audio: `6`
A positive delta always means an improvement on the task's primary metric. For MAE tasks, where lower is better, the sign is flipped so that a reduction in MAE appears as a positive delta.
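
As a consistency check, the aggregate figures can be recomputed from the two delta columns of the table. This sketch assumes the mean is an unweighted average over the 12 tasks and that "improves" means a strictly positive delta; because it uses the rounded table values, it matches the reported means only to about four decimal places.

```python
from statistics import mean

# Delta columns copied from the Task Deltas table, in row order.
current_audio_delta = [0.0003, 0.0001, -0.0066, -0.0001, -0.1626, 0.0000,
                       0.0102, 0.0049, -0.0141, 0.6524, 0.0230, -0.0052]
raw_replacement_delta = [-0.0077, -0.0104, 0.0171, -0.0046, 0.1605, 0.0000,
                         0.0206, -0.0072, -0.0476, 0.9635, 0.0129, 0.0264]

def summarize(deltas):
    """Return (unweighted mean delta, number of strictly positive deltas)."""
    return mean(deltas), sum(1 for d in deltas if d > 0)

audio_mean, audio_wins = summarize(current_audio_delta)
raw_mean, raw_wins = summarize(raw_replacement_delta)
print(f"current audio: mean={audio_mean:.4f}, improved tasks={audio_wins}")
print(f"raw replacement: mean={raw_mean:.4f}, improved tasks={raw_wins}")
```

The mean combines F1, MRR, and MAE deltas that live on different scales, so the two MAE rows account for most of both averages; the per-task counts are the more scale-independent summary.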