ntsrigaud/maestro-lstm-hybrid

@@ -17,7 +17,7 @@ metrics:
 - accuracy
 - f1
 model-index:
-- name: two_stream_attn_v1_20260512T145906Z
   results:
   - task:
       type: gesture-recognition
@@ -26,12 +26,12 @@ model-index:
       type: IPN-Hand
     metrics:
     - type: accuracy
-      value: 0.9675
     - type: f1
-      value: 0.9641
 ---
-# two_stream_attn_v1_20260512T145906Z
 A real-time hand gesture classifier trained on
 a Hybrid Jester+IPN gesture dataset (Jester dynamic classes + IPN pointing classes).
@@ -91,7 +91,7 @@ Input (B, T=32, 147)
 | Class | Description |
 |-------|-------------|
-| `unknown` | — |
 | `point_one` | Single-finger pointing gesture (continuous laser-pointer control) |
 | `point_two` | Two-finger pointing gesture (continuous annotation-pen control) |
 | `stop_sign` | Static open palm facing camera (Jester class) |
@@ -139,7 +139,7 @@ from maestro.infrastructure.model.checkpoint_loader import load_inference_artifa
 # Download the artifact (cached after first call)
 local_path = hf_hub_download(
     repo_id="ntsrigaud/maestro-lstm-hybrid",
-    filename="two_stream_attn_v1_20260512T145906Z_inference.pt",
 )
 # Load the artifact (includes model, class labels, and feature schema)
@@ -160,15 +160,19 @@ with torch.no_grad():
 ## Training Dataset
-- **Source**: Hybrid merge of Jester and IPN-Hand windows: Jester provides no_gesture/swiping/zoom/stop_sign classes; IPN-Hand provides point_one and point_two
-- **Used classes**: 10 (9 active gestures + `no_gesture` background)
 - **Dataset split**: 70% train / 15% val / 15% test (stratified by class)
 - **Augmentation**: temporal scale ±20%, spatial jitter σ=0.005
 ## Training Strategy
-Single-stage supervised training on IPN-Hand only.
-The model is initialized from scratch and optimized end-to-end on the target gesture set.
 ## Training Configuration
@@ -181,11 +185,11 @@ The model is initialized from scratch and optimized end-to-end on the target ges
 | Num layers | 4 |
 | MHA heads | 8 (head dim: 24) |
 | Dropout | 0.35 |
-| Learning rate | 0.001 |
 | Weight decay | 0.0005 |
 | Batch size | 128 |
-| Max epochs | 80 |
-| Early stopping patience | 20 |
 | Label smoothing | 0.05 |
 | Class weighting | disabled |
 | Max samples per class | 5000 |
@@ -195,22 +199,22 @@ The model is initialized from scratch and optimized end-to-end on the target ges
 | Metric | Value |
 |--------|-------|
-| Accuracy | 96.7% |
-| Macro F1 | 96.4% |
 ### Per-Class Recall
 | Class | Recall |
 |-------|--------|
-| `unknown` | 86.6% |
-| `point_one` | 98.4% |
-| `point_two` | 98.6% |
-| `stop_sign` | 98.4% |
-| `swiping_down` | 95.7% |
-| `swiping_left` | 99.1% |
-| `swiping_right` | 94.3% |
-| `swiping_up` | 94.2% |
-| `zooming_in_full_hand` | 98.1% |
 | `zooming_out_full_hand` | 97.1% |
 ## Comparison with Previous Architecture
@@ -228,7 +232,7 @@ The model is initialized from scratch and optimized end-to-end on the target ges
 - Trained on IPN Hand subjects only. Performance may degrade with unusual hand sizes,
   skin tones, or lighting conditions not represented in training data.
-- The `no_gesture` class represents background/transition frames. At runtime, predictions
   are filtered through per-class confidence thresholds defined in `production_hybrid.yaml`.
 - Requires **mediapipe>=0.10.14** for landmark extraction at inference time.
 - Not intended for safety-critical or accessibility-critical applications.
@@ -242,4 +246,4 @@ Estimated CO₂ equivalent: negligible (<0.001 kg CO₂eq).
 ---
-*Generated by the Maestro training pipeline on 2026-05-12.*

 - accuracy
 - f1
 model-index:
+- name: two_stream_attn_v1_finetune_20260513T050407Z
   results:
   - task:
       type: gesture-recognition
       type: IPN-Hand
     metrics:
     - type: accuracy
+      value: 0.9551
     - type: f1
+      value: 0.9481
 ---
+# two_stream_attn_v1_finetune_20260513T050407Z
 A real-time hand gesture classifier trained on
 a Hybrid Jester+IPN gesture dataset (Jester dynamic classes + IPN pointing classes).
 | Class | Description |
 |-------|-------------|
+| `unknown` | Background / transition / no gesture |
 | `point_one` | Single-finger pointing gesture (continuous laser-pointer control) |
 | `point_two` | Two-finger pointing gesture (continuous annotation-pen control) |
 | `stop_sign` | Static open palm facing camera (Jester class) |
 # Download the artifact (cached after first call)
 local_path = hf_hub_download(
     repo_id="ntsrigaud/maestro-lstm-hybrid",
+    filename="two_stream_attn_v1_finetune_20260513T050407Z_inference.pt",
 )
 # Load the artifact (includes model, class labels, and feature schema)
 ## Training Dataset
+- **Source**: Hybrid merge of Jester and IPN-Hand windows: Jester provides unknown/swiping/zoom/stop_sign classes; IPN-Hand provides point_one and point_two
+- **Used classes**: 10 (9 active gestures + `unknown` background)
 - **Dataset split**: 70% train / 15% val / 15% test (stratified by class)
 - **Augmentation**: temporal scale ±20%, spatial jitter σ=0.005
 ## Training Strategy
+Two-phase transfer learning pipeline:
+- **Phase 1 (pretraining):** backbone pretrained on external checkpoint `two_stream_attn_v1_20260513T045733Z.pt` to learn generic gesture dynamics.
+- **Phase 2 (fine-tuning):** head replaced and model adapted on Hybrid Jester+IPN 10-gesture vocabulary.
+- **Stage A (frozen backbone):** 10 epoch(s) head-only warmup.
+- **Stage B (full model):** up to 58 epoch(s) joint fine-tuning with scheduler/early stopping.
+- **Stage B retention defences:** replay_max_samples_per_class=500, distillation_weight=0.5, replay_ce_weight=0.3, backbone_lr_multiplier=0.1, ewc_weight=100.0, gpm_components=20, forgetting_penalty_weight=0.5.
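
As a rough illustration of the Stage A / Stage B split described above, the sketch below builds per-stage learning-rate groups as plain dictionaries, in the list-of-dicts shape that PyTorch optimizers accept for parameter groups. The `stage_param_groups` helper and the placeholder parameter names are hypothetical and not taken from the Maestro pipeline. Only the base learning rate (3e-05) and `backbone_lr_multiplier=0.1` come from this model card.

```python
BASE_LR = 3e-05               # "Learning rate" from the training configuration
BACKBONE_LR_MULTIPLIER = 0.1  # from the Stage B retention settings


def stage_param_groups(stage, backbone_params, head_params):
    """Return optimizer parameter groups for stage "A" or "B" (hypothetical helper).

    Stage A: backbone frozen, so only the head is handed to the optimizer.
    Stage B: full model, with the backbone on a reduced learning rate.
    """
    if stage not in ("A", "B"):
        raise ValueError(f"unknown stage: {stage!r}")
    groups = [{"params": list(head_params), "lr": BASE_LR}]
    if stage == "B":
        groups.append({
            "params": list(backbone_params),
            "lr": BASE_LR * BACKBONE_LR_MULTIPLIER,
        })
    return groups


# Placeholder strings stand in for real parameter tensors.
stage_a = stage_param_groups("A", ["backbone.w"], ["head.w"])
stage_b = stage_param_groups("B", ["backbone.w"], ["head.w"])
```

In an actual PyTorch run, the frozen backbone parameters would typically also have `requires_grad` set to `False` during Stage A. The replay, distillation, EWC, and GPM terms listed above are not modelled here.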
 ## Training Configuration
 | Num layers | 4 |
 | MHA heads | 8 (head dim: 24) |
 | Dropout | 0.35 |
+| Learning rate | 3e-05 |
 | Weight decay | 0.0005 |
 | Batch size | 128 |
+| Max epochs | 60 |
+| Early stopping patience | 12 |
 | Label smoothing | 0.05 |
 | Class weighting | disabled |
 | Max samples per class | 5000 |
 | Metric | Value |
 |--------|-------|
+| Accuracy | 95.5% |
+| Macro F1 | 94.8% |
 ### Per-Class Recall
 | Class | Recall |
 |-------|--------|
+| `unknown` | 82.8% |
+| `point_one` | 98.1% |
+| `point_two` | 97.2% |
+| `stop_sign` | 98.5% |
+| `swiping_down` | 92.2% |
+| `swiping_left` | 93.6% |
+| `swiping_right` | 88.5% |
+| `swiping_up` | 92.3% |
+| `zooming_in_full_hand` | 97.0% |
 | `zooming_out_full_hand` | 97.1% |
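
To make the per-class change between the two model cards easier to read, the recall values from both tables in this diff can be compared directly. This is only a convenience sketch: the dictionary names `previous` and `finetuned` are illustrative, and the numbers are copied from the removed and added table rows.

```python
# Per-class recall (%) copied from the removed (-) and added (+) table rows.
previous = {
    "unknown": 86.6, "point_one": 98.4, "point_two": 98.6, "stop_sign": 98.4,
    "swiping_down": 95.7, "swiping_left": 99.1, "swiping_right": 94.3,
    "swiping_up": 94.2, "zooming_in_full_hand": 98.1,
    "zooming_out_full_hand": 97.1,
}
finetuned = {
    "unknown": 82.8, "point_one": 98.1, "point_two": 97.2, "stop_sign": 98.5,
    "swiping_down": 92.2, "swiping_left": 93.6, "swiping_right": 88.5,
    "swiping_up": 92.3, "zooming_in_full_hand": 97.0,
    "zooming_out_full_hand": 97.1,
}

# Change in percentage points, rounded to the precision of the tables.
deltas = {name: round(finetuned[name] - previous[name], 1) for name in previous}
worst = min(deltas, key=deltas.get)

# Print the classes from largest drop to largest gain.
for name, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{name:<24}{delta:+.1f}")
```

On these numbers the largest drop is `swiping_right` (5.8 points), while `stop_sign` is the only class whose recall improves.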
 ## Comparison with Previous Architecture
 - Trained on IPN Hand subjects only. Performance may degrade with unusual hand sizes,
   skin tones, or lighting conditions not represented in training data.
+- The `unknown` class represents background/transition frames. At runtime, predictions
   are filtered through per-class confidence thresholds defined in `production_hybrid.yaml`.
 - Requires **mediapipe>=0.10.14** for landmark extraction at inference time.
 - Not intended for safety-critical or accessibility-critical applications.
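
The per-class confidence filtering mentioned above could look roughly like the following. This is a minimal sketch under stated assumptions: the schema of `production_hybrid.yaml` is not shown in this diff, so the `thresholds` values, the `default_threshold` parameter, and the choice to fall back to `unknown` are all hypothetical.

```python
def filter_prediction(probs, thresholds, default_threshold=0.5, fallback="unknown"):
    """Return the top class if it clears its own threshold, else the fallback.

    probs: mapping of class label to predicted probability.
    thresholds: mapping of class label to minimum accepted probability
                (hypothetical values, not read from production_hybrid.yaml).
    """
    top = max(probs, key=probs.get)
    if probs[top] >= thresholds.get(top, default_threshold):
        return top
    return fallback


# Made-up thresholds and probabilities, for illustration only.
thresholds = {"swiping_left": 0.8, "point_one": 0.6}
rejected = filter_prediction({"swiping_left": 0.62, "unknown": 0.38}, thresholds)
accepted = filter_prediction({"swiping_left": 0.91, "unknown": 0.09}, thresholds)
```

Here `rejected` falls back to `unknown` because 0.62 is below the 0.8 threshold, while `accepted` keeps `swiping_left`.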
 ---
+*Generated by the Maestro training pipeline on 2026-05-13.*

config.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "model_version": "two_stream_attn_v1_20260512T145906Z",
   "model_config": {
     "model_name": "two_stream_attn_v1",
     "input_size": 147,
@@ -16,9 +16,9 @@
     "window_step": null
   },
   "training_config": {
-    "epochs": 80,
     "batch_size": 128,
-    "learning_rate": 0.001,
     "weight_decay": 0.0005,
     "grad_clip_norm": 1.0,
     "seed": 42,
@@ -32,33 +32,33 @@
     }
   },
   "evaluation": {
-    "test_accuracy": 0.9674507008790687,
-    "test_macro_f1": 0.9640774183053038,
-    "test_loss": 0.3977402173430542,
-    "calibration_ece": 0.035556920550729974,
     "per_class_recall": {
-      "unknown": 0.8656716417910447,
-      "point_one": 0.9835841313269493,
-      "point_two": 0.9864314789687924,
-      "stop_sign": 0.9835164835164835,
-      "swiping_down": 0.9568965517241379,
-      "swiping_left": 0.990909090909091,
-      "swiping_right": 0.9425287356321839,
-      "swiping_up": 0.9423076923076923,
-      "zooming_in_full_hand": 0.9808917197452229,
       "zooming_out_full_hand": 0.9712643678160919
     },
     "per_class_precision": {
-      "unknown": 0.9613259668508287,
-      "point_one": 0.9663978494623656,
-      "point_two": 0.9404915912031048,
-      "stop_sign": 0.9728260869565217,
-      "swiping_down": 0.9327731092436975,
-      "swiping_left": 0.990909090909091,
-      "swiping_right": 0.9647058823529412,
-      "swiping_up": 0.9932432432432432,
-      "zooming_in_full_hand": 0.9777777777777777,
-      "zooming_out_full_hand": 0.9854227405247813
     }
   },
   "class_labels": [
@@ -73,7 +73,7 @@
     "zooming_in_full_hand",
     "zooming_out_full_hand"
   ],
-  "created_at": "2026-05-12T15:07:23.016730+00:00",
   "gesture_command_mapping": {
     "commands": {
       "swiping_up": "start_presentation",

 {
+  "model_version": "two_stream_attn_v1_finetune_20260513T050407Z",
   "model_config": {
     "model_name": "two_stream_attn_v1",
     "input_size": 147,
     "window_step": null
   },
   "training_config": {
+    "epochs": 60,
     "batch_size": 128,
+    "learning_rate": 3e-05,
     "weight_decay": 0.0005,
     "grad_clip_norm": 1.0,
     "seed": 42,
     }
   },
   "evaluation": {
+    "test_accuracy": 0.955096222380613,
+    "test_macro_f1": 0.9481389392072146,
+    "test_loss": 0.41714808393697267,
+    "calibration_ece": 0.026080811808244665,
     "per_class_recall": {
+      "unknown": 0.8283582089552238,
+      "point_one": 0.9808481532147743,
+      "point_two": 0.9715061058344641,
+      "stop_sign": 0.9853479853479854,
+      "swiping_down": 0.9224137931034483,
+      "swiping_left": 0.9363636363636364,
+      "swiping_right": 0.8850574712643678,
+      "swiping_up": 0.9230769230769231,
+      "zooming_in_full_hand": 0.9697452229299363,
       "zooming_out_full_hand": 0.9712643678160919
     },
     "per_class_precision": {
+      "unknown": 0.9380281690140845,
+      "point_one": 0.9409448818897638,
+      "point_two": 0.9250645994832042,
+      "stop_sign": 0.972875226039783,
+      "swiping_down": 0.9385964912280702,
+      "swiping_left": 0.9716981132075472,
+      "swiping_right": 0.9746835443037974,
+      "swiping_up": 1.0,
+      "zooming_in_full_hand": 0.9712918660287081,
+      "zooming_out_full_hand": 0.9726618705035971
     }
   },
   "class_labels": [
     "zooming_in_full_hand",
     "zooming_out_full_hand"
   ],
+  "created_at": "2026-05-13T05:12:19.988870+00:00",
   "gesture_command_mapping": {
     "commands": {
       "swiping_up": "start_presentation",
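
As a sanity check on the added `evaluation` block, macro F1 can be recomputed from the per-class precision and recall values. The sketch assumes that `test_macro_f1` is the unweighted mean of per-class F1 scores, which the diff does not state explicitly. Values are truncated to six decimals from the `+` lines.

```python
# Per-class recall and precision from the added (+) evaluation block.
recall = {
    "unknown": 0.828358, "point_one": 0.980848, "point_two": 0.971506,
    "stop_sign": 0.985348, "swiping_down": 0.922414, "swiping_left": 0.936364,
    "swiping_right": 0.885057, "swiping_up": 0.923077,
    "zooming_in_full_hand": 0.969745, "zooming_out_full_hand": 0.971264,
}
precision = {
    "unknown": 0.938028, "point_one": 0.940945, "point_two": 0.925065,
    "stop_sign": 0.972875, "swiping_down": 0.938596, "swiping_left": 0.971698,
    "swiping_right": 0.974684, "swiping_up": 1.0,
    "zooming_in_full_hand": 0.971292, "zooming_out_full_hand": 0.972662,
}


def f1(p, r):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


per_class_f1 = {name: f1(precision[name], recall[name]) for name in recall}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"macro F1 = {macro_f1:.3f}")
```

Under that assumption the result comes out at roughly 0.948, in line with the reported `test_macro_f1`.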