ntsrigaud committed on
Commit d167d8d · verified · 1 Parent(s): 67c0c33

Upload two_stream_attn_v1_finetune_20260515T094538Z

Files changed (2)
  1. README.md +19 -19
  2. config.json +23 -23
README.md CHANGED
@@ -17,7 +17,7 @@ metrics:
17
  - accuracy
18
  - f1
19
  model-index:
20
- - name: two_stream_attn_v1_finetune_20260514T122537Z
21
  results:
22
  - task:
23
  type: gesture-recognition
@@ -26,12 +26,12 @@ model-index:
26
  type: IPN-Hand
27
  metrics:
28
  - type: accuracy
29
- value: 0.9584
30
  - type: f1
31
- value: 0.9573
32
  ---
33
 
34
- # two_stream_attn_v1_finetune_20260514T122537Z
35
 
36
  A real-time hand gesture classifier trained on
37
  a Hybrid Jester+IPN gesture dataset (Jester dynamic classes + IPN pointing classes).
@@ -91,7 +91,7 @@ Input (B, T=32, 147)
91
 
92
  | Class | Description |
93
  |-------|-------------|
94
- | `palm` | Open palm held flat toward camera (static hand shape) |
95
  | `swiping_right` | Horizontal swipe from left to right |
96
  | `swiping_left` | Horizontal swipe from right to left |
97
  | `swiping_down` | Vertical swipe downward |
@@ -106,7 +106,7 @@ Input (B, T=32, 147)
106
 
107
  | Class | Mode | Command | Runtime handling |
108
  |-------|------|---------|------------------|
109
- | `palm` | `discrete` | `erase_annotations` | Discrete command via GestureActivationController → CommandDispatcher |
110
  | `swiping_right` | `discrete` | `next_slide` | Discrete command via GestureActivationController → CommandDispatcher |
111
  | `swiping_left` | `discrete` | `previous_slide` | Discrete command via GestureActivationController → CommandDispatcher |
112
  | `swiping_down` | `discrete` | `stop_presentation` | Discrete command via GestureActivationController → CommandDispatcher |
@@ -139,7 +139,7 @@ from maestro.infrastructure.model.checkpoint_loader import load_inference_artifa
139
  # Download the artifact (cached after first call)
140
  local_path = hf_hub_download(
141
  repo_id="ntsrigaud/maestro-lstm-hybrid",
142
- filename="two_stream_attn_v1_finetune_20260514T122537Z_inference.pt",
143
  )
144
 
145
  # Load the artifact (includes model, class labels, and feature schema)
@@ -171,8 +171,8 @@ Two-phase transfer learning pipeline:
171
  - **Phase 1 (pretraining):** backbone pretrained on external checkpoint `two_stream_attn_v1_20260513T155730Z.pt` to learn generic gesture dynamics.
172
  - **Phase 2 (fine-tuning):** head replaced and model adapted on Hybrid Jester+IPN 10-gesture vocabulary.
173
  - **Stage A (frozen backbone):** 10 epoch(s) head-only warmup.
174
- - **Stage B (full model):** up to 60 epoch(s) joint fine-tuning with scheduler/early stopping.
175
- - **Stage B retention defences:** replay_max_samples_per_class=500, distillation_weight=0.0, replay_ce_weight=0.0, backbone_lr_multiplier=0.1, ewc_weight=2000.0, gpm_components=0, forgetting_penalty_weight=0.5.
176
 
177
  ## Training Configuration
178
 
@@ -199,23 +199,23 @@ Two-phase transfer learning pipeline:
199
 
200
  | Metric | Value |
201
  |--------|-------|
202
- | Accuracy | 95.8% |
203
- | Macro F1 | 95.7% |
204
 
205
  ### Per-Class Recall
206
 
207
  | Class | Recall |
208
  |-------|--------|
209
- | `palm` | 98.4% |
210
  | `swiping_right` | 95.3% |
211
- | `swiping_left` | 98.7% |
212
- | `swiping_down` | 96.8% |
213
  | `swiping_up` | 97.8% |
214
  | `zooming_in_full_hand` | 97.3% |
215
- | `zooming_out_full_hand` | 92.8% |
216
- | `point_one` | 97.1% |
217
- | `point_two` | 94.3% |
218
- | `unknown` | 88.9% |
219
 
220
  ## Comparison with Previous Architecture
221
 
@@ -246,4 +246,4 @@ Estimated CO₂ equivalent: negligible (<0.001 kg CO₂eq).
246
 
247
  ---
248
 
249
- *Generated by the Maestro training pipeline on 2026-05-14.*
 
17
  - accuracy
18
  - f1
19
  model-index:
20
+ - name: two_stream_attn_v1_finetune_20260515T094538Z
21
  results:
22
  - task:
23
  type: gesture-recognition
 
26
  type: IPN-Hand
27
  metrics:
28
  - type: accuracy
29
+ value: 0.9539
30
  - type: f1
31
+ value: 0.9527
32
  ---
33
 
34
+ # two_stream_attn_v1_finetune_20260515T094538Z
35
 
36
  A real-time hand gesture classifier trained on
37
  a Hybrid Jester+IPN gesture dataset (Jester dynamic classes + IPN pointing classes).
 
91
 
92
  | Class | Description |
93
  |-------|-------------|
94
+ | `fist` | |
95
  | `swiping_right` | Horizontal swipe from left to right |
96
  | `swiping_left` | Horizontal swipe from right to left |
97
  | `swiping_down` | Vertical swipe downward |
 
106
 
107
  | Class | Mode | Command | Runtime handling |
108
  |-------|------|---------|------------------|
109
+ | `fist` | `unmapped` | `` | Not mapped in active command map |
110
  | `swiping_right` | `discrete` | `next_slide` | Discrete command via GestureActivationController → CommandDispatcher |
111
  | `swiping_left` | `discrete` | `previous_slide` | Discrete command via GestureActivationController → CommandDispatcher |
112
  | `swiping_down` | `discrete` | `stop_presentation` | Discrete command via GestureActivationController → CommandDispatcher |
 
139
  # Download the artifact (cached after first call)
140
  local_path = hf_hub_download(
141
  repo_id="ntsrigaud/maestro-lstm-hybrid",
142
+ filename="two_stream_attn_v1_finetune_20260515T094538Z_inference.pt",
143
  )
144
 
145
  # Load the artifact (includes model, class labels, and feature schema)
 
171
  - **Phase 1 (pretraining):** backbone pretrained on external checkpoint `two_stream_attn_v1_20260513T155730Z.pt` to learn generic gesture dynamics.
172
  - **Phase 2 (fine-tuning):** head replaced and model adapted on Hybrid Jester+IPN 10-gesture vocabulary.
173
  - **Stage A (frozen backbone):** 10 epoch(s) head-only warmup.
174
+ - **Stage B (full model):** up to 48 epoch(s) joint fine-tuning with scheduler/early stopping.
175
+ - **Stage B retention defences:** replay_max_samples_per_class=500, distillation_weight=0.0, replay_ce_weight=0.0, backbone_lr_multiplier=0.1, gpm_components=0, forgetting_penalty_weight=0.5.
176
 
177
  ## Training Configuration
178
 
 
199
 
200
  | Metric | Value |
201
  |--------|-------|
202
+ | Accuracy | 95.4% |
203
+ | Macro F1 | 95.3% |
204
 
205
  ### Per-Class Recall
206
 
207
  | Class | Recall |
208
  |-------|--------|
209
+ | `fist` | 96.6% |
210
  | `swiping_right` | 95.3% |
211
+ | `swiping_left` | 98.1% |
212
+ | `swiping_down` | 96.4% |
213
  | `swiping_up` | 97.8% |
214
  | `zooming_in_full_hand` | 97.3% |
215
+ | `zooming_out_full_hand` | 93.2% |
216
+ | `point_one` | 96.6% |
217
+ | `point_two` | 93.5% |
218
+ | `unknown` | 87.9% |
219
 
220
  ## Comparison with Previous Architecture
221
 
 
246
 
247
  ---
248
 
249
+ *Generated by the Maestro training pipeline on 2026-05-15.*
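
The per-class recall changes in the README diff above are easier to judge side by side. The following is a minimal sketch, not part of the Maestro pipeline: it copies the recall percentages from the removed and added table rows and prints the change in percentage points. The names `OLD_RECALL`, `NEW_RECALL`, and `recall_deltas` are hypothetical, and pairing `palm` with `fist` by table position is an assumption made only for this comparison.

```python
# Per-class recall (%) copied from the README tables on both sides of this diff.
# The first row is `palm` in the old model and `fist` in the new one; pairing
# them by position is an assumption made for this comparison.
OLD_RECALL = {
    "palm": 98.4,
    "swiping_right": 95.3,
    "swiping_left": 98.7,
    "swiping_down": 96.8,
    "swiping_up": 97.8,
    "zooming_in_full_hand": 97.3,
    "zooming_out_full_hand": 92.8,
    "point_one": 97.1,
    "point_two": 94.3,
    "unknown": 88.9,
}
NEW_RECALL = {
    "fist": 96.6,
    "swiping_right": 95.3,
    "swiping_left": 98.1,
    "swiping_down": 96.4,
    "swiping_up": 97.8,
    "zooming_in_full_hand": 97.3,
    "zooming_out_full_hand": 93.2,
    "point_one": 96.6,
    "point_two": 93.5,
    "unknown": 87.9,
}


def recall_deltas(old, new):
    """Pair classes by position; return (old_name, new_name, delta_pp) tuples."""
    return [
        (old_name, new_name, round(new_val - old_val, 1))
        for (old_name, old_val), (new_name, new_val) in zip(old.items(), new.items())
    ]


deltas = recall_deltas(OLD_RECALL, NEW_RECALL)

# Print the largest regressions first.
for old_name, new_name, delta in sorted(deltas, key=lambda d: d[2]):
    label = old_name if old_name == new_name else f"{old_name} -> {new_name}"
    print(f"{label:<24} {delta:+.1f} pp")
```

Sorting by delta puts the classes that lost the most recall first, which is usually where to start when a headline accuracy drop such as the 95.8% to 95.4% change here needs explaining.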
config.json CHANGED
@@ -1,5 +1,5 @@
1
  {
2
- "model_version": "two_stream_attn_v1_finetune_20260514T122537Z",
3
  "model_config": {
4
  "model_name": "two_stream_attn_v1",
5
  "input_size": 147,
@@ -32,37 +32,37 @@
32
  }
33
  },
34
  "evaluation": {
35
- "test_accuracy": 0.9584274740171712,
36
- "test_macro_f1": 0.9573115163058359,
37
- "test_loss": 0.41317880455315303,
38
- "calibration_ece": 0.03046416184897343,
39
  "per_class_recall": {
40
- "palm": 0.9842931937172775,
41
  "swiping_right": 0.9525959367945824,
42
- "swiping_left": 0.9869158878504672,
43
- "swiping_down": 0.9684418145956607,
44
  "swiping_up": 0.9775967413441955,
45
  "zooming_in_full_hand": 0.9727272727272728,
46
- "zooming_out_full_hand": 0.9282700421940928,
47
- "point_one": 0.9711286089238845,
48
- "point_two": 0.9429347826086957,
49
- "unknown": 0.8888888888888888
50
  },
51
  "per_class_precision": {
52
- "palm": 0.9868766404199475,
53
- "swiping_right": 0.9723502304147466,
54
- "swiping_left": 0.9795918367346939,
55
- "swiping_down": 0.958984375,
56
- "swiping_up": 0.97165991902834,
57
  "zooming_in_full_hand": 0.928416485900217,
58
- "zooming_out_full_hand": 0.9649122807017544,
59
- "point_one": 0.9002433090024331,
60
- "point_two": 0.9665738161559888,
61
- "unknown": 0.9498680738786279
62
  }
63
  },
64
  "class_labels": [
65
- "palm",
66
  "swiping_right",
67
  "swiping_left",
68
  "swiping_down",
@@ -73,7 +73,7 @@
73
  "point_two",
74
  "unknown"
75
  ],
76
- "created_at": "2026-05-14T12:30:37.922314+00:00",
77
  "gesture_command_mapping": {
78
  "commands": {
79
  "swiping_up": "start_presentation",
 
1
  {
2
+ "model_version": "two_stream_attn_v1_finetune_20260515T094538Z",
3
  "model_config": {
4
  "model_name": "two_stream_attn_v1",
5
  "input_size": 147,
 
32
  }
33
  },
34
  "evaluation": {
35
+ "test_accuracy": 0.9538638985005767,
36
+ "test_macro_f1": 0.9526959411661929,
37
+ "test_loss": 0.42990012703744873,
38
+ "calibration_ece": 0.030860847365347433,
39
  "per_class_recall": {
40
+ "fist": 0.9656357388316151,
41
  "swiping_right": 0.9525959367945824,
42
+ "swiping_left": 0.9813084112149533,
43
+ "swiping_down": 0.9644970414201184,
44
  "swiping_up": 0.9775967413441955,
45
  "zooming_in_full_hand": 0.9727272727272728,
46
+ "zooming_out_full_hand": 0.9324894514767933,
47
+ "point_one": 0.9658792650918635,
48
+ "point_two": 0.9347826086956522,
49
+ "unknown": 0.8790123456790123
50
  },
51
  "per_class_precision": {
52
+ "fist": 0.972318339100346,
53
+ "swiping_right": 0.9612756264236902,
54
+ "swiping_left": 0.9868421052631579,
55
+ "swiping_down": 0.9607072691552063,
56
+ "swiping_up": 0.9542743538767395,
57
  "zooming_in_full_hand": 0.928416485900217,
58
+ "zooming_out_full_hand": 0.9692982456140351,
59
+ "point_one": 0.8910411622276029,
60
+ "point_two": 0.9717514124293786,
61
+ "unknown": 0.9393139841688655
62
  }
63
  },
64
  "class_labels": [
65
+ "fist",
66
  "swiping_right",
67
  "swiping_left",
68
  "swiping_down",
 
73
  "point_two",
74
  "unknown"
75
  ],
76
+ "created_at": "2026-05-15T09:50:52.965929+00:00",
77
  "gesture_command_mapping": {
78
  "commands": {
79
  "swiping_up": "start_presentation",