Upload two_stream_attn_v1_finetune_20260512T041947Z
- README.md +250 -0
- config.json +100 -0
README.md
---
language:
- en
license: mit
tags:
- gesture-recognition
- hand-gesture
- pytorch
- mediapipe
- temporal-model
- lstm
- attention
- bidirectional
datasets:
- IPN-Hand
metrics:
- accuracy
- f1
model-index:
- name: two_stream_attn_v1_finetune_20260512T041947Z
  results:
  - task:
      type: gesture-recognition
    dataset:
      name: IPN Hand
      type: IPN-Hand
    metrics:
    - type: accuracy
      value: 0.9898
    - type: f1
      value: 0.9917
---

# two_stream_attn_v1_finetune_20260512T041947Z

A real-time hand gesture classifier trained on a hybrid Jester+IPN gesture dataset (Jester dynamic classes + IPN pointing classes).

This model is part of the **Maestro** pipeline, which enables touchless control of presentation and meeting software through hand gestures captured from a standard webcam, using MediaPipe for landmark extraction.

## Model Description

- **Architecture**: EnhancedTwoStreamLSTM (BiLSTM h=96×2, MHA 8 heads, proj=96, mean+max pool, MLP gate)
- **Parameters**: 2,099,434
- **Input**: `(batch, 32, 147)`, a 32-frame sliding window (≈1067 ms at 30 FPS)
- **Output**: raw logits over 10 gesture classes (apply softmax to obtain class probabilities)
- **Inference latency**: < 1 ms per call (CPU, single sample)
- **Feature schema**: `feature-schema-v5`

## Architecture

`EnhancedTwoStreamLSTM` splits the 147-dim feature vector into two parallel streams and processes each through a BiLSTM + self-attention + MLP-gate pipeline:

```
Input (B, T=32, 147)
│
├─ Stream A: Pose/Shape (73 dims)
│     Linear+LN+GELU → 96
│     4-layer BiLSTM (h=96) → (B, T, 192)
│     LayerNorm → Self-MHA (8 heads) + residual + post-LN
│     mean+max pool → pool_LN → ctx_a (B, 192)
│
├─ Stream B: Motion/Dynamics (74 dims)
│     (identical structure) → ctx_b (B, 192)
│
├─ MLP cross-stream gate
│     gate_a = Sigmoid(
│                Linear(96→192)(
│                  Tanh(Linear(192→96)(ctx_b))))
│     ctx_a  = LN(ctx_a × gate_a + ctx_a)
│     gate_b = Sigmoid(
│                Linear(96→192)(
│                  Tanh(Linear(192→96)(ctx_a))))
│     ctx_b  = LN(ctx_b × gate_b + ctx_b)
│
└─ cat(ctx_a, ctx_b) → (384,)
      LN → Linear(384→192) → GELU → Dropout → Linear(192→10)
```
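The cross-stream gate in the diagram can be read as the following pure-Python sketch for a single sample. It is illustrative only: the helper names are hypothetical (not part of the Maestro codebase), the weights are random, LayerNorm is shown without its learnable scale/shift, and it assumes `gate_b` is driven by the already-updated `ctx_a`, which the diagram's ordering suggests but the card does not state.

```python
import math
import random


def linear(weight, bias, x):
    """y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def layer_norm(x, eps=1e-5):
    """LayerNorm without the learnable scale/shift (gamma=1, beta=0)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def random_linear(n_in, n_out, rng):
    """Hypothetical uniform init, only so the sketch runs end to end."""
    bound = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-bound, bound) for _ in range(n_out)]
    return weight, bias


def gate(ctx_other, down, up):
    """Sigmoid(Linear(96->192)(Tanh(Linear(192->96)(ctx_other)))), each value in (0, 1)."""
    hidden = [math.tanh(v) for v in linear(*down, ctx_other)]
    return [1.0 / (1.0 + math.exp(-v)) for v in linear(*up, hidden)]


def gated_residual(ctx, g):
    """LN(ctx * gate + ctx): element-wise recalibration on top of a residual path."""
    return layer_norm([c * gi + c for c, gi in zip(ctx, g)])


def cross_stream_gate(ctx_a, ctx_b, params):
    gate_a = gate(ctx_b, *params["a"])
    new_a = gated_residual(ctx_a, gate_a)
    # Assumption: gate_b is computed from the updated ctx_a
    gate_b = gate(new_a, *params["b"])
    new_b = gated_residual(ctx_b, gate_b)
    return new_a + new_b  # list concatenation, i.e. cat(ctx_a, ctx_b)


CTX, BOTTLENECK = 192, 96
rng = random.Random(0)
params = {
    s: (random_linear(CTX, BOTTLENECK, rng), random_linear(BOTTLENECK, CTX, rng))
    for s in ("a", "b")
}
ctx_a = [rng.gauss(0.0, 1.0) for _ in range(CTX)]
ctx_b = [rng.gauss(0.0, 1.0) for _ in range(CTX)]
fused = cross_stream_gate(ctx_a, ctx_b, params)
print(len(fused))  # 384
```

Because the gate multiplies `ctx` by a value in (0, 1) and then adds `ctx` back, each dimension is scaled by a factor between 1 and 2 before normalisation, so the other stream can emphasise features but never zero them out.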

**Design rationale:**
- BiLSTMs encode temporal order via their recurrent cell state, so no positional encoding is needed.
- Mean+max pooling captures both sustained gesture shape (mean) and short, transient events (max).
- The 2-layer MLP gate provides non-linear cross-modal recalibration at ~37 K parameters per gate
  (vs ~263 K for full MHA cross-attention with a degenerate mean-pooled query).
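As a cross-check on the reported parameter count, the sketch below tallies parameters from PyTorch's standard layer formulas (`nn.Linear`, `nn.LayerNorm`, bidirectional `nn.LSTM` with both bias vectors, `nn.MultiheadAttention` with a packed in-projection). It assumes the mean and max pooled vectors are combined without learnable parameters (so `pool_LN` acts on 192 dims), that the 73/74 stream split in the diagram is exact, and that every LayerNorm has scale and shift. Under those assumptions, four BiLSTM layers per stream (the `num_layers: 4` value in `config.json`) reproduce the reported 2,099,434 exactly, which suggests, but does not prove, that this is the actual layout.

```python
def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weight + bias


def layer_norm_params(dim):
    return 2 * dim  # gamma + beta


def bilstm_params(n_in, hidden, num_layers):
    """Bidirectional nn.LSTM: per layer and direction, 4 gates x (W_ih, W_hh) plus b_ih, b_hh."""
    total, layer_in = 0, n_in
    for _ in range(num_layers):
        per_direction = 4 * hidden * (layer_in + hidden) + 2 * 4 * hidden
        total += 2 * per_direction
        layer_in = 2 * hidden  # the next layer sees both directions
    return total


def mha_params(embed_dim):
    """nn.MultiheadAttention with kdim = vdim = embed_dim and bias=True."""
    return 3 * embed_dim * embed_dim + 3 * embed_dim + linear_params(embed_dim, embed_dim)


def stream_params(n_in, proj=96, hidden=96, num_layers=4):
    ctx = 2 * hidden
    return (
        linear_params(n_in, proj) + layer_norm_params(proj)  # Linear + LN (+ GELU, no params)
        + bilstm_params(proj, hidden, num_layers)
        + layer_norm_params(ctx) + mha_params(ctx)           # pre-LN + self-MHA
        + layer_norm_params(ctx)                             # post-LN
        + layer_norm_params(ctx)                             # pool_LN (assumed 192-dim)
    )


def gate_params(ctx=192, bottleneck=96):
    return linear_params(ctx, bottleneck) + linear_params(bottleneck, ctx)


def model_params(num_layers=4, num_classes=10):
    ctx = 192
    streams = stream_params(73, num_layers=num_layers) + stream_params(74, num_layers=num_layers)
    fusion = 2 * (gate_params() + layer_norm_params(ctx))  # two gates, each followed by LN
    head = layer_norm_params(2 * ctx) + linear_params(2 * ctx, ctx) + linear_params(ctx, num_classes)
    return streams + fusion + head


print(gate_params())               # 37152
print(model_params(num_layers=4))  # 2099434
print(model_params(num_layers=2))  # 1208554
```

Each gate accounts for 37,152 parameters, consistent with the "~37 K" figure; nearly 80% of the total sits in the two BiLSTM stacks.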

## Gesture Classes

| Class | Description |
|-------|-------------|
| `no_gesture` | — |
| `point_one` | Single-finger pointing gesture (continuous laser-pointer control) |
| `point_two` | Two-finger pointing gesture (continuous annotation-pen control) |
| `stop_sign` | — |
| `swiping_down` | — |
| `swiping_left` | — |
| `swiping_right` | — |
| `swiping_up` | — |
| `zooming_in_full_hand` | — |
| `zooming_out_full_hand` | — |

## Gesture Usage In Presentation System

| Class | Mode | Command | Runtime handling |
|-------|------|---------|------------------|
| `no_gesture` | `unmapped` | `—` | Not mapped in `command_map_presentation.yaml` |
| `point_one` | `continuous` | `—` | Continuous tracker: `LaserPointerTracker` (bypasses the discrete dispatcher) |
| `point_two` | `continuous` | `—` | Continuous tracker: `AnnotationPenTracker` (bypasses the discrete dispatcher) |
| `stop_sign` | `unmapped` | `—` | Not mapped in `command_map_presentation.yaml` |
| `swiping_down` | `unmapped` | `—` | Not mapped in `command_map_presentation.yaml` |
| `swiping_left` | `unmapped` | `—` | Not mapped in `command_map_presentation.yaml` |
| `swiping_right` | `unmapped` | `—` | Not mapped in `command_map_presentation.yaml` |
| `swiping_up` | `unmapped` | `—` | Not mapped in `command_map_presentation.yaml` |
| `zooming_in_full_hand` | `unmapped` | `—` | Not mapped in `command_map_presentation.yaml` |
| `zooming_out_full_hand` | `unmapped` | `—` | Not mapped in `command_map_presentation.yaml` |
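The table implies a simple routing rule: continuous classes go straight to a tracker, mapped discrete classes would go to the command dispatcher, and everything else is ignored. The sketch below encodes that rule; `Route`, `route`, and the dictionaries are hypothetical names, the tracker names come from the table, and the discrete map is empty because no class is mapped for this model. Maestro's real dispatcher may work differently.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Route(Enum):
    CONTINUOUS = "continuous"
    DISCRETE = "discrete"
    UNMAPPED = "unmapped"


# Taken from the "Runtime handling" column above
CONTINUOUS_TRACKERS: Dict[str, str] = {
    "point_one": "LaserPointerTracker",
    "point_two": "AnnotationPenTracker",
}
# No class is mapped in command_map_presentation.yaml for this model
DISCRETE_COMMANDS: Dict[str, str] = {}


def route(label: str) -> Tuple[Route, Optional[str]]:
    """Decide where a predicted label is sent (hypothetical helper)."""
    if label in CONTINUOUS_TRACKERS:
        # Continuous gestures bypass the discrete dispatcher entirely
        return Route.CONTINUOUS, CONTINUOUS_TRACKERS[label]
    if label in DISCRETE_COMMANDS:
        return Route.DISCRETE, DISCRETE_COMMANDS[label]
    return Route.UNMAPPED, None


kind, target = route("point_one")
print(kind.value, target)  # continuous LaserPointerTracker
kind, target = route("swiping_left")
print(kind.value, target)  # unmapped None
```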

## Feature Schema (`feature-schema-v5`)

| Block | Dims | Description |
|-------|------|-------------|
| `position` | 0–62 | 21 wrist-relative, scale-normalised landmark positions (x, y, z) |
| `fingertip_spread` | 63–67 | 5 inter-fingertip Euclidean distances |
| `wrist_trajectory` | 68–70 | Net wrist displacement from the oldest frame in the window |
| `velocity` | 71–133 | 21 per-landmark wrist-relative velocity vectors (Δposition per unit time) |
| `joint_angles` | 134–143 | 10 MCP + PIP joint angles in radians |
| `wrist_vel_raw` | 144–146 | Camera-normalised wrist velocity (x, y, z); key directional signal |
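One way to keep downstream code in sync with this table is to encode the block boundaries once and check that they tile the 147-dim vector exactly. The sketch below is plain Python; the constant and helper names are illustrative, and the `(x, y, z)`-interleaved ordering inside `position` is an assumption the table does not spell out.

```python
# (name, first_index, last_index_inclusive), copied from the table
SCHEMA_V5 = [
    ("position", 0, 62),
    ("fingertip_spread", 63, 67),
    ("wrist_trajectory", 68, 70),
    ("velocity", 71, 133),
    ("joint_angles", 134, 143),
    ("wrist_vel_raw", 144, 146),
]
FEATURE_DIM = 147


def check_schema(schema, feature_dim):
    """Blocks must be contiguous, non-overlapping and cover exactly feature_dim dims."""
    expected_start = 0
    for name, start, end in schema:
        if start != expected_start:
            raise ValueError(f"gap or overlap before block {name!r}")
        expected_start = end + 1
    if expected_start != feature_dim:
        raise ValueError(f"schema covers {expected_start} dims, expected {feature_dim}")


def split_blocks(frame, schema=SCHEMA_V5):
    """Slice one 147-dim frame vector into named blocks."""
    if len(frame) != FEATURE_DIM:
        raise ValueError(f"expected {FEATURE_DIM} features, got {len(frame)}")
    return {name: frame[start:end + 1] for name, start, end in schema}


def landmarks_xyz(position_block):
    """Group the position block into 21 (x, y, z) triples, assuming interleaved order."""
    return [tuple(position_block[i:i + 3]) for i in range(0, len(position_block), 3)]


check_schema(SCHEMA_V5, FEATURE_DIM)
blocks = split_blocks([float(i) for i in range(FEATURE_DIM)])
print({name: len(values) for name, values in blocks.items()})
# {'position': 63, 'fingertip_spread': 5, 'wrist_trajectory': 3, 'velocity': 63, 'joint_angles': 10, 'wrist_vel_raw': 3}
print(landmarks_xyz(blocks["position"])[1])  # (3.0, 4.0, 5.0)
```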

## How to Use

```python
import torch
from huggingface_hub import hf_hub_download
from maestro.infrastructure.model.checkpoint_loader import load_inference_artifact

# Download the artifact (cached after the first call)
local_path = hf_hub_download(
    repo_id="ntsrigaud/maestro-lstm-hybrid",
    filename="two_stream_attn_v1_finetune_20260512T041947Z_inference.pt",
)

# Load the artifact (includes model, class labels, and feature schema)
artifact = load_inference_artifact(
    artifact_path=local_path,
    device=torch.device("cpu"),
)
artifact.model.eval()

# Build 147-dim feature vectors with LandmarkFeatureTransformer and fill a
# 32-frame SlidingWindowSequenceBuffer; `window_np` is the resulting
# float array of shape (32, 147).
with torch.no_grad():
    # tensor shape: (batch=1, T=32, F=147)
    window_tensor = torch.tensor(window_np, dtype=torch.float32).unsqueeze(0)
    logits = artifact.model(window_tensor)  # raw logits, shape (1, 10)
    probs = torch.softmax(logits, dim=1)    # optional: class probabilities
    pred_class = artifact.class_labels[logits.argmax(dim=1).item()]
```
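The snippet above assumes a full 32-frame window is already available. A minimal stand-in for that buffering step is sketched below with `collections.deque`; `SlidingWindow` is a hypothetical class, not Maestro's `SlidingWindowSequenceBuffer`, and it emits a window on every frame once full because `window_step` is `null` in `config.json` and the real stride is not documented. `torch.tensor` also accepts the nested list it returns.

```python
from collections import deque


class SlidingWindow:
    """Keep the most recent `length` frames and expose them as one window."""

    def __init__(self, length=32, feature_dim=147):
        self.length = length
        self.feature_dim = feature_dim
        self._frames = deque(maxlen=length)  # the oldest frame drops out automatically

    def push(self, frame):
        if len(frame) != self.feature_dim:
            raise ValueError(f"expected {self.feature_dim} features, got {len(frame)}")
        self._frames.append(list(frame))

    def ready(self):
        return len(self._frames) == self.length

    def window(self):
        """Return a (length, feature_dim) nested list, oldest frame first."""
        if not self.ready():
            raise RuntimeError("window not full yet")
        return [list(f) for f in self._frames]


buf = SlidingWindow()
window_np = None
for t in range(40):  # simulate 40 webcam frames, each filled with its frame index
    buf.push([float(t)] * 147)
    if buf.ready():
        window_np = buf.window()  # would be passed to the model here
print(len(window_np), len(window_np[0]), window_np[0][0])  # 32 147 8.0
```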

## Training Dataset

- **Source**: Hybrid merge of Jester and IPN-Hand windows: Jester provides the `no_gesture`, swiping, zooming, and `stop_sign` classes; IPN-Hand provides `point_one` and `point_two`.
- **Used classes**: 10 (9 active gestures + `no_gesture` background)
- **Dataset split**: 70% train / 15% val / 15% test (stratified by class)
- **Augmentation**: temporal scale ±20%, spatial jitter σ=0.005, and label-aware horizontal mirroring (`swiping_left` ↔ `swiping_right`)
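The card does not say whether mirroring is applied to raw landmarks or to the extracted features. A minimal sketch, assuming it is done in feature space with the `feature-schema-v5` layout: because the position, trajectory, and velocity blocks are wrist-relative or differential, a horizontal flip becomes a sign change on their x components, while distances, joint angles (assumed unsigned), y, and z are unchanged. The index constants and `mirror_window` are hypothetical names.

```python
# Index ranges from the feature-schema-v5 table; 3-vectors assumed stored as (x, y, z)
POSITION = range(0, 63)
WRIST_TRAJECTORY = range(68, 71)
VELOCITY = range(71, 134)
WRIST_VEL_RAW = range(144, 147)

# Every x component in the position / displacement / velocity blocks
X_INDICES = frozenset(
    i
    for block in (POSITION, WRIST_TRAJECTORY, VELOCITY, WRIST_VEL_RAW)
    for i in block
    if (i - block.start) % 3 == 0
)
LABEL_SWAP = {"swiping_left": "swiping_right", "swiping_right": "swiping_left"}


def mirror_window(window, label):
    """Horizontally mirror a (T, 147) window and swap left/right swipe labels.

    x components are negated; fingertip distances, joint angles, y and z are kept.
    Labels other than the two horizontal swipes are returned unchanged.
    """
    flipped = [[-v if i in X_INDICES else v for i, v in enumerate(frame)] for frame in window]
    return flipped, LABEL_SWAP.get(label, label)


window = [[float(i) for i in range(147)] for _ in range(32)]
flipped, new_label = mirror_window(window, "swiping_left")
print(new_label, flipped[0][3], flipped[0][1], flipped[0][144])  # swiping_right -3.0 1.0 -144.0
```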

## Training Strategy

Two-phase transfer learning pipeline:
- **Phase 1 (pretraining):** the backbone comes from an external pretraining checkpoint, `two_stream_attn_v1_20260512T041219Z.pt`, trained to learn generic gesture dynamics.
- **Phase 2 (fine-tuning):** the classification head is replaced and the model is adapted to the hybrid Jester+IPN 10-gesture vocabulary.
  - **Stage A (frozen backbone):** 10 epochs of head-only warmup.
  - **Stage B (full model):** up to 60 epochs of joint fine-tuning with LR scheduling and early stopping.
  - **Stage B retention defences:** replay_max_samples_per_class=500, distillation_weight=0.5, replay_ce_weight=0.3, backbone_lr_multiplier=0.1, ewc_weight=100.0, gpm_components=20, forgetting_penalty_weight=0.5.
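The two stages and two of the retention settings can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: it assumes epochs are counted from the start of Phase 2, that the head keeps the base learning rate in both stages, and that `backbone_lr_multiplier` scales the base rate for the backbone (in PyTorch this would typically be two optimizer parameter groups). The EWC term uses the standard form λ/2 · Σ Fᵢ(θᵢ − θ*ᵢ)²; whether Maestro's `ewc_weight` includes the ½ factor is not documented.

```python
import math
from dataclasses import dataclass

BASE_LR = 3e-5                # learning_rate from the training configuration
BACKBONE_LR_MULTIPLIER = 0.1  # backbone_lr_multiplier (Stage B)
STAGE_A_EPOCHS = 10           # head-only warmup length
EWC_WEIGHT = 100.0            # ewc_weight (Stage B)


@dataclass(frozen=True)
class StagePlan:
    stage: str
    backbone_trainable: bool
    head_lr: float
    backbone_lr: float


def plan_for_epoch(epoch: int) -> StagePlan:
    """Return the stage settings for a 0-based Phase 2 epoch (assumed layout)."""
    if epoch < STAGE_A_EPOCHS:
        # Stage A: backbone frozen, only the new head is updated
        return StagePlan("A", backbone_trainable=False, head_lr=BASE_LR, backbone_lr=0.0)
    # Stage B: whole model trains, backbone at a reduced learning rate
    return StagePlan("B", backbone_trainable=True, head_lr=BASE_LR,
                     backbone_lr=BASE_LR * BACKBONE_LR_MULTIPLIER)


def ewc_penalty(params, anchor, fisher, weight=EWC_WEIGHT):
    """Standard EWC: weight / 2 * sum_i F_i * (theta_i - theta*_i) ** 2.

    `anchor` holds the pretrained parameter values, `fisher` the diagonal Fisher estimate.
    """
    return 0.5 * weight * sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor, fisher))


plan = plan_for_epoch(12)
print(plan.stage, plan.backbone_trainable, math.isclose(plan.backbone_lr, 3e-6))  # B True True
print(ewc_penalty([1.0, 2.0], [0.0, 0.0], [1.0, 1.0]))  # 250.0
```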

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Architecture | EnhancedTwoStreamLSTM (BiLSTM h=96×2, MHA 8 heads, proj=96, mean+max pool, MLP gate) |
| Input size | 147 |
| Hidden size | 96/stream (BiLSTM output: 192) |
| Projection dim | 96 |
| Num layers | 4 (BiLSTM layers per stream) |
| MHA heads | 8 (head dim: 24) |
| Dropout | 0.35 |
| Learning rate | 3e-05 |
| Weight decay | 0.0005 |
| Batch size | 128 |
| Max epochs | 60 |
| Early stopping patience | 20 |
| Label smoothing | 0.05 |
| Class weighting | disabled |
| Max samples per class | 5000 |
| LR scheduler | ReduceLROnPlateau (factor=0.5, patience=8, min_lr=1e-6) |
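To make the interaction between the scheduler and early stopping concrete, here is a simplified stdlib re-implementation. It mirrors `ReduceLROnPlateau` in `mode="min"` (reduce once the number of bad epochs exceeds `patience`) but, as a simplification, ignores PyTorch's relative improvement threshold and cooldown; the early-stopping rule (stop after `patience` epochs without improvement) is an assumption, since the card does not define it.

```python
class PlateauScheduler:
    """Simplified ReduceLROnPlateau: after more than `patience` epochs
    without a new best validation loss, lr = max(lr * factor, min_lr)."""

    def __init__(self, lr, factor=0.5, patience=8, min_lr=1e-6):
        self.lr, self.factor, self.patience, self.min_lr = lr, factor, patience, min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


class EarlyStopping:
    """Stop once `patience` consecutive epochs fail to improve (assumed rule)."""

    def __init__(self, patience=20):
        self.patience, self.best, self.bad_epochs = patience, float("inf"), 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


sched = PlateauScheduler(lr=3e-5)
stopper = EarlyStopping(patience=20)
losses = [1.0, 0.8] + [0.8] * 30  # validation loss stalls after epoch 1
for epoch, loss in enumerate(losses):
    lr = sched.step(loss)
    if stopper.should_stop(loss):
        break
print(epoch, lr)
```

With a loss that stops improving after epoch 1, the learning rate is halved at epochs 10 and 19 and early stopping fires at epoch 21, so with patience 8 vs 20 the scheduler gets two chances to rescue training before it is halted.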

## Evaluation Results (Test Set)

| Metric | Value |
|--------|-------|
| Accuracy | 99.0% |
| Macro F1 | 99.2% |

### Per-Class Recall

| Class | Recall |
|-------|--------|
| `no_gesture` | 100.0% |
| `point_one` | 98.9% |
| `point_two` | 98.5% |
| `stop_sign` | 99.5% |
| `swiping_down` | 99.0% |
| `swiping_left` | 100.0% |
| `swiping_right` | 99.1% |
| `swiping_up` | 98.1% |
| `zooming_in_full_hand` | 99.2% |
| `zooming_out_full_hand` | 99.0% |
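The macro F1 above can be reproduced from the per-class precision and recall stored in `config.json`: macro F1 is the unweighted mean of the per-class F1 = 2PR/(P + R). (Accuracy cannot be recomputed the same way, because the per-class test supports are not listed.)

```python
# Per-class precision and recall copied from the "evaluation" block of config.json
PRECISION = {
    "no_gesture": 1.0,
    "point_one": 0.9836734693877551,
    "point_two": 0.9864130434782609,
    "stop_sign": 0.9964912280701754,
    "swiping_down": 1.0,
    "swiping_left": 0.9818181818181818,
    "swiping_right": 0.990909090909091,
    "swiping_up": 1.0,
    "zooming_in_full_hand": 0.9919484702093397,
    "zooming_out_full_hand": 0.9897959183673469,
}
RECALL = {
    "no_gesture": 1.0,
    "point_one": 0.9890560875512996,
    "point_two": 0.9850746268656716,
    "stop_sign": 0.9947460595446584,
    "swiping_down": 0.9903846153846154,
    "swiping_left": 1.0,
    "swiping_right": 0.990909090909091,
    "swiping_up": 0.9810126582278481,
    "zooming_in_full_hand": 0.9919484702093397,
    "zooming_out_full_hand": 0.9897959183673469,
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


per_class_f1 = {c: f1(PRECISION[c], RECALL[c]) for c in RECALL}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)  # unweighted mean over classes
print(f"macro F1 = {macro_f1:.4f}")             # macro F1 = 0.9917
print(min(per_class_f1, key=per_class_f1.get))  # point_two
```

The weakest class by F1 is `point_two`, even though `swiping_up` has the lowest recall, because `swiping_up` has perfect precision.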

## Comparison with Previous Architecture

| Feature | TwoStreamGestureLSTM | EnhancedTwoStreamLSTM |
|---------|---------------------|-----------------------|
| LSTM direction | Unidirectional | **Bidirectional** |
| Attention | Bahdanau (scalar) | **MHA Q/K/V (8 heads)** |
| Feature projection | No | **Yes (→96)** |
| Temporal pooling | Mean only | **Mean + Max** |
| Cross-stream fusion | Concat only | **2-layer MLP gate** |
| Parameters | ~182 K | ~2.1 M (2,099,434) |

## Limitations and Risks

- Trained only on subjects from the Jester and IPN Hand datasets. Performance may degrade with unusual hand sizes,
  skin tones, or lighting conditions not represented in the training data.
- The `no_gesture` class represents background/transition frames. At runtime, predictions
  are filtered through per-class confidence thresholds defined in `production_ipn.yaml`.
- Requires **mediapipe>=0.10.14** for landmark extraction at inference time.
- Not intended for safety-critical or accessibility-critical applications.
- Performance was measured on a held-out test split drawn from the same data sources; real-world
  generalisation may differ.
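Per-class confidence filtering of the kind described above can be sketched in a few lines: softmax the logits, keep the arg-max class only if its probability clears that class's threshold, and otherwise fall back to `no_gesture`. The threshold values below are hypothetical placeholders (the real ones live in `production_ipn.yaml`), and the function names are illustrative. For context, `config.json` reports a test-set calibration ECE of about 0.041, so raw softmax scores are close to, but not exactly, empirical confidence.

```python
import math

CLASS_LABELS = [
    "no_gesture", "point_one", "point_two", "stop_sign", "swiping_down",
    "swiping_left", "swiping_right", "swiping_up",
    "zooming_in_full_hand", "zooming_out_full_hand",
]

# Hypothetical thresholds; unlisted classes use default_threshold
THRESHOLDS = {"swiping_left": 0.9, "swiping_right": 0.9, "point_one": 0.7}


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_prediction(logits, labels=CLASS_LABELS, thresholds=THRESHOLDS,
                      default_threshold=0.5, background="no_gesture"):
    """Return (label, arg-max probability); the label falls back to
    `background` when the arg-max class misses its threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = labels[best]
    if probs[best] >= thresholds.get(label, default_threshold):
        return label, probs[best]
    return background, probs[best]


confident = [0.0] * 10
confident[5] = 5.0  # strong swiping_left logit -> p ~ 0.94
hesitant = [0.0] * 10
hesitant[5] = 2.0   # weaker swiping_left logit -> p ~ 0.45
print(filter_prediction(confident)[0])  # swiping_left
print(filter_prediction(hesitant)[0])   # no_gesture
```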

## Environmental Impact

Training was performed on CPU/MPS. Estimated training time: ~10 minutes.
Estimated CO₂ equivalent: negligible (<0.001 kg CO₂eq).

---

*Generated by the Maestro training pipeline on 2026-05-12.*

config.json
{
  "model_version": "two_stream_attn_v1_finetune_20260512T041947Z",
  "model_config": {
    "model_name": "two_stream_attn_v1",
    "input_size": 147,
    "hidden_size": 96,
    "num_layers": 4,
    "dropout": 0.35,
    "num_classes": 10
  },
  "feature_schema": {
    "feature_schema_version": "feature-schema-v5",
    "feature_dim": 147,
    "orientation_normalization": false,
    "window_length": 32,
    "window_step": null
  },
  "training_config": {
    "epochs": 60,
    "batch_size": 128,
    "learning_rate": 3e-05,
    "weight_decay": 0.0005,
    "grad_clip_norm": 1.0,
    "seed": 42,
    "label_smoothing": 0.05,
    "class_weighting": false,
    "max_samples_per_class": 5000,
    "scheduler": {
      "factor": 0.5,
      "patience": 8,
      "min_lr": 1e-06
    }
  },
  "evaluation": {
    "test_accuracy": 0.9898119122257053,
    "test_macro_f1": 0.9916782280254713,
    "test_loss": 0.3169419604159946,
    "calibration_ece": 0.04126546900162752,
    "per_class_recall": {
      "no_gesture": 1.0,
      "point_one": 0.9890560875512996,
      "point_two": 0.9850746268656716,
      "stop_sign": 0.9947460595446584,
      "swiping_down": 0.9903846153846154,
      "swiping_left": 1.0,
      "swiping_right": 0.990909090909091,
      "swiping_up": 0.9810126582278481,
      "zooming_in_full_hand": 0.9919484702093397,
      "zooming_out_full_hand": 0.9897959183673469
    },
    "per_class_precision": {
      "no_gesture": 1.0,
      "point_one": 0.9836734693877551,
      "point_two": 0.9864130434782609,
      "stop_sign": 0.9964912280701754,
      "swiping_down": 1.0,
      "swiping_left": 0.9818181818181818,
      "swiping_right": 0.990909090909091,
      "swiping_up": 1.0,
      "zooming_in_full_hand": 0.9919484702093397,
      "zooming_out_full_hand": 0.9897959183673469
    }
  },
  "class_labels": [
    "no_gesture",
    "point_one",
    "point_two",
    "stop_sign",
    "swiping_down",
    "swiping_left",
    "swiping_right",
    "swiping_up",
    "zooming_in_full_hand",
    "zooming_out_full_hand"
  ],
  "created_at": "2026-05-12T04:25:36.916751+00:00",
  "gesture_command_mapping": {
    "commands": {
      "swipe_up": "start_presentation",
      "swipe_down": "stop_presentation",
      "swipe_right": "next_slide",
      "swipe_left": "previous_slide",
      "zoom_in": "zoom_in_view",
      "zoom_out": "zoom_out_view",
      "open_palm_hold": "erase_annotations",
      "unknown": "no_action"
    },
    "modes": {
      "swipe_up": "discrete",
      "swipe_down": "discrete",
      "swipe_right": "discrete",
      "swipe_left": "discrete",
      "zoom_in": "discrete",
      "zoom_out": "discrete",
      "open_palm_hold": "discrete",
      "point_one": "continuous",
      "point_two": "continuous"
    }
  }
}
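The `gesture_command_mapping` keys use a different naming scheme (`swipe_left`, `zoom_in`, `unknown`) from `class_labels` (`swiping_left`, `zooming_in_full_hand`, `no_gesture`), which is consistent with the README listing those classes as unmapped. A small consistency check makes such mismatches visible. The sketch below uses only the standard library; `check_config` is a hypothetical helper, not part of Maestro, and the inline example is a trimmed config rather than the full file.

```python
import json
from typing import List


def check_config(cfg: dict) -> List[str]:
    """Return human-readable problems found in a config.json-shaped dict."""
    problems = []
    model, schema, labels = cfg["model_config"], cfg["feature_schema"], cfg["class_labels"]
    if model["input_size"] != schema["feature_dim"]:
        problems.append("model_config.input_size != feature_schema.feature_dim")
    if model["num_classes"] != len(labels):
        problems.append("model_config.num_classes != len(class_labels)")
    mapping = cfg.get("gesture_command_mapping", {})
    mapped_keys = set(mapping.get("commands", {})) | set(mapping.get("modes", {}))
    for key in sorted(mapped_keys - set(labels)):
        problems.append(f"mapping key {key!r} is not a class label")
    return problems


example = json.loads("""
{
  "model_config": {"input_size": 147, "num_classes": 2},
  "feature_schema": {"feature_dim": 147},
  "class_labels": ["point_one", "swiping_left"],
  "gesture_command_mapping": {
    "commands": {"swipe_left": "previous_slide"},
    "modes": {"swipe_left": "discrete", "point_one": "continuous"}
  }
}
""")
print(check_config(example))  # ["mapping key 'swipe_left' is not a class label"]
```

Run against the full file above (for example `check_config(json.load(open("config.json")))`), the sizes agree, but all eight mapping keys other than `point_one` and `point_two` would be reported.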