ntsrigaud/maestro-lstm-hybrid

@@ -17,7 +17,7 @@ metrics:
 - accuracy
 - f1
 model-index:
-- name: two_stream_attn_v1_finetune_20260515T104743Z
   results:
   - task:
       type: gesture-recognition
@@ -26,12 +26,12 @@ model-index:
       type: IPN-Hand
     metrics:
     - type: accuracy
-      value: 0.9566
     - type: f1
-      value: 0.9556
 ---
-# two_stream_attn_v1_finetune_20260515T104743Z
 A real-time hand gesture classifier trained on
 a Hybrid Jester+IPN gesture dataset (Jester dynamic classes + IPN pointing classes).
@@ -43,7 +43,7 @@ standard webcam using MediaPipe for landmark extraction.
 ## Model Description
 - **Architecture**: EnhancedTwoStreamLSTM (BiLSTM h=96×2, MHA 8 heads, proj=96, mean+max pool, MLP gate)
-- **Parameters**: 2,099,434
 - **Input**: `(batch, 16, 147)`
     — 16-frame sliding window at 30 FPS ≈ 533 ms
 - **Output**: Softmax logits over 10 gesture classes
@@ -139,7 +139,7 @@ from maestro.infrastructure.model.checkpoint_loader import load_inference_artifa
 # Download the artifact (cached after first call)
 local_path = hf_hub_download(
     repo_id="ntsrigaud/maestro-lstm-hybrid",
-    filename="two_stream_attn_v1_finetune_20260515T104743Z_inference.pt",
 )
 # Load the artifact (includes model, class labels, and feature schema)
@@ -168,10 +168,10 @@ with torch.no_grad():
 ## Training Strategy
 Two-phase transfer learning pipeline:
-- **Phase 1 (pretraining):** backbone pretrained on external checkpoint `two_stream_attn_v1_20260513T155730Z.pt` to learn generic gesture dynamics.
 - **Phase 2 (fine-tuning):** head replaced and model adapted on Hybrid Jester+IPN 10-gesture vocabulary.
 - **Stage A (frozen backbone):** 10 epoch(s) head-only warmup.
-- **Stage B (full model):** up to 80 epoch(s) joint fine-tuning with scheduler/early stopping.
 - **Stage B retention defences:** replay_max_samples_per_class=500, distillation_weight=0.0, replay_ce_weight=0.0, backbone_lr_multiplier=0.1, ewc_weight=N/A, gpm_components=0, forgetting_penalty_weight=0.5.
 ## Training Configuration
@@ -199,23 +199,23 @@ Two-phase transfer learning pipeline:
 | Metric | Value |
 |--------|-------|
-| Accuracy | 95.7% |
-| Macro F1 | 95.6% |
 ### Per-Class Recall
 | Class | Recall |
 |-------|--------|
-| `fist` | 96.2% |
-| `swiping_right` | 95.5% |
-| `swiping_left` | 98.7% |
-| `swiping_down` | 96.6% |
-| `swiping_up` | 97.6% |
-| `zooming_in_full_hand` | 97.5% |
-| `zooming_out_full_hand` | 93.9% |
 | `point_one` | 97.4% |
-| `point_two` | 94.3% |
-| `unknown` | 87.7% |
 ## Comparison with Previous Architecture
@@ -226,7 +226,7 @@ Two-phase transfer learning pipeline:
 | Feature projection | No | **Yes (→96)** |
 | Temporal pooling | Mean only | **Mean + Max** |
 | Cross-stream fusion | Concat only | **2-layer MLP gate** |
-| Parameters | ~182 K | ~2,099,434 |
 ## Limitations and Risks

 - accuracy
 - f1
 model-index:
+- name: two_stream_attn_v1_2layer_ld_cong_finetune_20260515T134706Z
   results:
   - task:
       type: gesture-recognition
       type: IPN-Hand
     metrics:
     - type: accuracy
+      value: 0.9606
     - type: f1
+      value: 0.9587
 ---
+# two_stream_attn_v1_2layer_ld_cong_finetune_20260515T134706Z
 A real-time hand gesture classifier trained on
 a Hybrid Jester+IPN gesture dataset (Jester dynamic classes + IPN pointing classes).
 ## Model Description
 - **Architecture**: EnhancedTwoStreamLSTM (BiLSTM h=96×2, MHA 8 heads, proj=96, mean+max pool, MLP gate)
+- **Parameters**: 1,208,554
 - **Input**: `(batch, 16, 147)`
     — 16-frame sliding window at 30 FPS ≈ 533 ms
 - **Output**: Softmax logits over 10 gesture classes
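
The `Input` line above implies a fixed window of 16 frames of 147 features each, covering 16 / 30 s ≈ 533 ms. A minimal sketch of that windowing, assuming per-frame feature vectors arrive one at a time; `FrameWindow` and `window_duration_ms` are hypothetical helpers for illustration and are not part of the `maestro` package:

```python
from collections import deque

WINDOW_FRAMES = 16   # from the model card: input is (batch, 16, 147)
FEATURE_DIM = 147    # from the model card
FPS = 30             # from the model card


def window_duration_ms(frames: int = WINDOW_FRAMES, fps: float = FPS) -> float:
    """Time span covered by one input window, in milliseconds."""
    return frames / fps * 1000.0


class FrameWindow:
    """Hypothetical sliding-window buffer over per-frame feature vectors."""

    def __init__(self, frames: int = WINDOW_FRAMES, feature_dim: int = FEATURE_DIM):
        self._buffer = deque(maxlen=frames)  # oldest frame drops out automatically
        self._feature_dim = feature_dim

    def push(self, features):
        """Add one frame; return a (1, frames, feature_dim) nested list once full."""
        if len(features) != self._feature_dim:
            raise ValueError(f"expected {self._feature_dim} features, got {len(features)}")
        self._buffer.append(list(features))
        if len(self._buffer) < self._buffer.maxlen:
            return None  # not enough frames yet
        return [list(self._buffer)]  # leading batch dimension of 1


print(f"window covers about {window_duration_ms():.0f} ms")
```

Once the buffer is full, every new frame yields a fresh window, so a prediction can be made per frame rather than once per 16 frames. Whether the actual pipeline strides this way is an assumption.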
 # Download the artifact (cached after first call)
 local_path = hf_hub_download(
     repo_id="ntsrigaud/maestro-lstm-hybrid",
+    filename="two_stream_attn_v1_2layer_ld_cong_finetune_20260515T134706Z_inference.pt",
 )
 # Load the artifact (includes model, class labels, and feature schema)
 ## Training Strategy
 Two-phase transfer learning pipeline:
+- **Phase 1 (pretraining):** backbone pretrained on external checkpoint `two_stream_attn_v1_2layer_pretrain_20260515T125437Z.pt` to learn generic gesture dynamics.
 - **Phase 2 (fine-tuning):** head replaced and model adapted on Hybrid Jester+IPN 10-gesture vocabulary.
 - **Stage A (frozen backbone):** 10 epoch(s) head-only warmup.
+- **Stage B (full model):** up to 66 epoch(s) joint fine-tuning with scheduler/early stopping.
 - **Stage B retention defences:** replay_max_samples_per_class=500, distillation_weight=0.0, replay_ce_weight=0.0, backbone_lr_multiplier=0.1, ewc_weight=N/A, gpm_components=0, forgetting_penalty_weight=0.5.
 ## Training Configuration
 | Metric | Value |
 |--------|-------|
+| Accuracy | 96.1% |
+| Macro F1 | 95.9% |
 ### Per-Class Recall
 | Class | Recall |
 |-------|--------|
+| `fist` | 97.3% |
+| `swiping_right` | 97.1% |
+| `swiping_left` | 98.3% |
+| `swiping_down` | 98.0% |
+| `swiping_up` | 98.2% |
+| `zooming_in_full_hand` | 97.0% |
+| `zooming_out_full_hand` | 95.1% |
 | `point_one` | 97.4% |
+| `point_two` | 95.1% |
+| `unknown` | 85.7% |
 ## Comparison with Previous Architecture
 | Feature projection | No | **Yes (→96)** |
 | Temporal pooling | Mean only | **Mean + Max** |
 | Cross-stream fusion | Concat only | **2-layer MLP gate** |
+| Parameters | ~182 K | ~1,208,554 |
 ## Limitations and Risks

config.json CHANGED

@@ -1,7 +1,7 @@
 {
-  "model_version": "two_stream_attn_v1_finetune_20260515T104743Z",
   "model_config": {
-    "model_name": "two_stream_attn_v1_2layer_finetune",
     "input_size": 147,
     "hidden_size": 96,
     "num_layers": 2,
@@ -32,33 +32,33 @@
     }
   },
   "evaluation": {
-    "test_accuracy": 0.9566320645905421,
-    "test_macro_f1": 0.9555633825064029,
-    "test_loss": 0.4212703935896512,
-    "calibration_ece": 0.030699590615060227,
     "per_class_recall": {
-      "fist": 0.9621993127147767,
-      "swiping_right": 0.9548532731376975,
-      "swiping_left": 0.9869158878504672,
-      "swiping_down": 0.9664694280078896,
-      "swiping_up": 0.9755600814663951,
-      "zooming_in_full_hand": 0.975,
-      "zooming_out_full_hand": 0.9388185654008439,
       "point_one": 0.973753280839895,
-      "point_two": 0.9429347826086957,
-      "unknown": 0.8765432098765432
     },
     "per_class_precision": {
-      "fist": 0.9790209790209791,
-      "swiping_right": 0.9701834862385321,
-      "swiping_left": 0.9777777777777777,
-      "swiping_down": 0.9551656920077972,
-      "swiping_up": 0.9618473895582329,
-      "zooming_in_full_hand": 0.9407894736842105,
-      "zooming_out_full_hand": 0.973741794310722,
-      "point_one": 0.8918269230769231,
-      "point_two": 0.9719887955182073,
-      "unknown": 0.9441489361702128
     }
   },
   "class_labels": [
@@ -73,7 +73,7 @@
     "point_two",
     "unknown"
   ],
-  "created_at": "2026-05-15T13:15:40.671167+00:00",
   "gesture_command_mapping": {
     "commands": {
       "swiping_up": "start_presentation",
@@ -83,8 +83,6 @@
       "zooming_in_full_hand": "zoom_in_view",
       "zooming_out_full_hand": "zoom_out_view",
       "fist": "erase_annotations",
-      "pinch": "activate_laser_pointer",
-      "click": "mouse_click",
       "unknown": "no_action"
     },
     "modes": {

 {
+  "model_version": "two_stream_attn_v1_2layer_ld_cong_finetune_20260515T134706Z",
   "model_config": {
+    "model_name": "two_stream_attn_v1_2layer_ld_cong",
     "input_size": 147,
     "hidden_size": 96,
     "num_layers": 2,
     }
   },
   "evaluation": {
+    "test_accuracy": 0.960553633217993,
+    "test_macro_f1": 0.9587371203998121,
+    "test_loss": 0.404887427927576,
+    "calibration_ece": 0.033548892410568715,
     "per_class_recall": {
+      "fist": 0.9725085910652921,
+      "swiping_right": 0.9706546275395034,
+      "swiping_left": 0.983177570093458,
+      "swiping_down": 0.980276134122288,
+      "swiping_up": 0.9816700610997964,
+      "zooming_in_full_hand": 0.9704545454545455,
+      "zooming_out_full_hand": 0.9514767932489452,
       "point_one": 0.973753280839895,
+      "point_two": 0.9510869565217391,
+      "unknown": 0.8567901234567902
     },
     "per_class_precision": {
+      "fist": 0.9433333333333334,
+      "swiping_right": 0.9728506787330317,
+      "swiping_left": 0.9813432835820896,
+      "swiping_down": 0.9613152804642167,
+      "swiping_up": 0.9620758483033932,
+      "zooming_in_full_hand": 0.9510022271714922,
+      "zooming_out_full_hand": 0.9740820734341252,
+      "point_one": 0.9298245614035088,
+      "point_two": 0.958904109589041,
+      "unknown": 0.9559228650137741
     }
   },
   "class_labels": [
     "point_two",
     "unknown"
   ],
+  "created_at": "2026-05-15T13:52:14.109098+00:00",
   "gesture_command_mapping": {
     "commands": {
       "swiping_up": "start_presentation",
       "zooming_in_full_hand": "zoom_in_view",
       "zooming_out_full_hand": "zoom_out_view",
       "fist": "erase_annotations",
       "unknown": "no_action"
     },
     "modes": {
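
The new `evaluation` block reports per-class precision and recall alongside `test_macro_f1`. As a sanity check, macro F1 can be recomputed as the unweighted mean of per-class F1 scores. The sketch below copies the values from the added lines of `config.json`; that the reported figure was produced this way is an assumption, not something the commit states.

```python
# Values copied from the "+" side of config.json in this commit.
recall = {
    "fist": 0.9725085910652921,
    "swiping_right": 0.9706546275395034,
    "swiping_left": 0.983177570093458,
    "swiping_down": 0.980276134122288,
    "swiping_up": 0.9816700610997964,
    "zooming_in_full_hand": 0.9704545454545455,
    "zooming_out_full_hand": 0.9514767932489452,
    "point_one": 0.973753280839895,
    "point_two": 0.9510869565217391,
    "unknown": 0.8567901234567902,
}
precision = {
    "fist": 0.9433333333333334,
    "swiping_right": 0.9728506787330317,
    "swiping_left": 0.9813432835820896,
    "swiping_down": 0.9613152804642167,
    "swiping_up": 0.9620758483033932,
    "zooming_in_full_hand": 0.9510022271714922,
    "zooming_out_full_hand": 0.9740820734341252,
    "point_one": 0.9298245614035088,
    "point_two": 0.958904109589041,
    "unknown": 0.9559228650137741,
}


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


per_class_f1 = {label: f1(precision[label], recall[label]) for label in recall}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)  # unweighted mean

weakest = min(per_class_f1, key=per_class_f1.get)
print(f"macro F1 = {macro_f1:.4f}, weakest class = {weakest}")
```

The recomputed value agrees with the reported `test_macro_f1` of 0.9587 to four decimals, and the lowest per-class F1 belongs to `unknown`, the one class whose recall fell in this commit (87.7% to 85.7%).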