Upload folder using huggingface_hub

Files changed (7)

.gitattributes +4 -0
README.md +149 -0
au_compare_overlay.png +3 -0
au_effect_maps.png +3 -0
au_solid_neutral_vs_activated.png +3 -0
au_solid_panels.png +3 -0
au_to_mesh_pls_v2.npz +3 -0

.gitattributes CHANGED

@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+au_compare_overlay.png filter=lfs diff=lfs merge=lfs -text
+au_effect_maps.png filter=lfs diff=lfs merge=lfs -text
+au_solid_neutral_vs_activated.png filter=lfs diff=lfs merge=lfs -text
+au_solid_panels.png filter=lfs diff=lfs merge=lfs -text

README.md ADDED

@@ -0,0 +1,149 @@

+---
+license: mit
+library_name: py-feat
+tags:
+  - face-action-units
+  - facs
+  - facial-landmarks
+  - regression
+  - pls
+  - mediapipe
+  - py-feat
+---
+# AU → MP-mesh PLS (visualization)
+> Predicts the 478-vertex MediaPipe FaceMesh deformation from 20 AU intensities. Trained with pose covariates and pose×AU interactions for cleaner AU coefficients. Deploy with a 23-d input (20 AUs + 3 pose angles) and set pose to 0 at inference for the standard canonical frontal visualization.
+# AU+pose -> MP mesh PLS (v2)
+Linear PLS regression mapping 20 FACS AU intensities + 3 pose covariates
+(Pitch, Yaw, Roll) to 478×3 = 1434 MediaPipe FaceMesh vertex coordinates in a
+pose-canonical frame. Used to visualize how the MP face mesh deforms as AU
+sliders move.
+## Training data
+- 344,418 frames from 9,978 CelebV-HQ celebrity videos
+- Mesh source: MPDetector mp_facemesh_v2 (478-vertex MediaPipe topology)
+- AU source: Detector with img2pose face + xgb AU on the same frames
+- Pose source: MPDetector facial_transformation_matrices (Pitch, Yaw, Roll, radians)
+- Pose-filtered to |yaw| <= 40°, |pitch| <= 30°
+- Per-frame Umeyama similarity Procrustes alignment to a reference template
+  using 12 stable upper-face anchors (forehead 10/9/8/151, nose bridge 6/168/197/195,
+  outer canthi 33/263, inner canthi 133/362). Removes (R, s, t).
+- Top 1% of frames by max anchor residual dropped (alignment outliers).
+- IMPORTANT: no per-subject neutral subtraction. Predicting absolute aligned
+  coords directly follows the Cheong / Py-Feat tutorial 06 recipe. Per-subject
+  neutral subtraction was tested and capped R² at 0.083; switching to absolute
+  coords raised R² to 0.143 (without pose inputs) and 0.244 (with pose inputs).
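The per-frame alignment step described above can be sketched as follows. This is a minimal NumPy version of the Umeyama similarity fit, not the training code itself: the function name `umeyama_similarity`, the random stand-in template, and the order of the anchor list are assumptions for illustration (the anchor indices themselves are the ones listed above).

```python
import numpy as np

# Anchor vertex indices as listed in the Training data section.
# Their order inside the NPZ's `anchor_indices` array is an assumption here.
ANCHORS = [10, 9, 8, 151, 6, 168, 197, 195, 33, 263, 133, 362]

def umeyama_similarity(src, dst):
    """Least-squares similarity (s, R, t) such that s * src @ R.T + t ~ dst."""
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)             # (3, 3) cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                           # keep a proper rotation (no reflection)
    R = U @ D @ Vt
    var_src = (src_c ** 2).sum() / len(src)      # total variance of the source anchors
    s = np.trace(np.diag(S) @ D) / var_src
    t = mu_dst - s * (R @ mu_src)
    return s, R, t

# Synthetic check: distort a random "template" by a known similarity, then undo it.
rng = np.random.default_rng(0)
template = rng.normal(size=(478, 3))             # stand-in for the reference mesh
angle = 0.3
true_R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
frame = 1.7 * template @ true_R.T + np.array([5.0, -2.0, 1.0])

# Fit on the 12 anchors only, then apply the transform to the whole mesh.
s, R, t = umeyama_similarity(frame[ANCHORS], template[ANCHORS])
aligned = s * frame @ R.T + t

# Max anchor residual: the quantity the top-1% outlier filter would rank frames by.
max_anchor_residual = np.linalg.norm(aligned[ANCHORS] - template[ANCHORS], axis=1).max()
max_err = np.abs(aligned - template).max()
```

On real frames the anchors will not match the reference exactly, so `max_anchor_residual` is non-zero and can be thresholded per frame.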
+## Method
+- PLSRegression(n_components=83, scale=True), Cheong style
+- Inputs: [20 AU | 3 pose (Pitch, Yaw, Roll)] = 23 features
+- Outputs: 1434-d absolute pose-canonical mesh coords (image-space pixels)
+## Out-of-sample performance (3-fold GroupKFold by video_id)
+- **Variance-weighted R² = 0.2443 ± 0.0034** across 1434 dims
+- Per-fold R² = [0.2395, 0.2472, 0.2463]
+- MAE = 0.223 px (canonical-frame image space)
+R² is modest in absolute terms because 20 AU intensities cannot fully describe
+a 1434-d mesh deformation: the continuous xgb AU estimates are noisier than
+human FACS coding, and many micro-expressions are not captured by 20 AUs. For
+visualization, qualitatively correct AU directions matter more than R².
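The card does not spell out how the variance-weighted R² is computed. One common definition pools residual and total sums of squares across all output dimensions, which is equivalent to weighting each dimension's R² by its variance. The sketch below assumes that definition (the function name `variance_weighted_r2` is illustrative) and also checks the reported fold summary.

```python
import numpy as np

def variance_weighted_r2(y_true, y_pred):
    """R² pooled over output dims: 1 - sum(SS_res) / sum(SS_tot)."""
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res.sum() / ss_tot.sum()

# Sanity checks on synthetic data: a perfect prediction scores 1,
# predicting the per-dimension mean scores 0.
rng = np.random.default_rng(0)
y = rng.normal(size=(200, 1434))
r2_perfect = variance_weighted_r2(y, y)
r2_mean = variance_weighted_r2(y, np.broadcast_to(y.mean(axis=0), y.shape))

# Fold summary from the card: mean and population std (ddof=0) of the per-fold R².
folds = np.array([0.2395, 0.2472, 0.2463])
fold_mean = folds.mean()
fold_std = folds.std()   # ddof=0 reproduces the reported 0.0034
```

With these per-fold values, the sample standard deviation (`ddof=1`) would round to 0.0042, so the reported spread corresponds to the population form.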
+## Inference
+```python
+import numpy as np
+m = np.load("au_to_mesh_pls_v2.npz")
+au = np.zeros(20); au[m["au_columns"].tolist().index("AU12")] = 1.0  # smile
+pose = np.zeros(3)                                                    # [pitch, yaw, roll]
+x = np.concatenate([au, pose])                                        # (23,)
+flat = x @ m["coef"] + m["intercept"]                                # (1434,)
+# IMPORTANT: layout is axis-major [all x | all y | all z], NOT interleaved
+mesh = np.stack([flat[:478], flat[478:956], flat[956:]], axis=1)     # (478, 3)
+# Render with mediapipe.solutions.face_mesh.FACEMESH_TESSELATION
+# Optionally apply user-chosen rigid (R, s, t) post-hoc.
+```
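The axis-major layout noted in the snippet above is easy to get wrong, so here is a small self-contained check on synthetic numbers. The helper name `unflatten_axis_major` is illustrative and not part of the released file.

```python
import numpy as np

N_VERTS = 478

def unflatten_axis_major(flat):
    """Turn a (1434,) vector laid out as [all x | all y | all z] into (478, 3)."""
    return flat.reshape(3, N_VERTS).T

# Build a known mesh and flatten it axis-major, as the model output is laid out.
mesh_true = np.arange(N_VERTS * 3, dtype=float).reshape(N_VERTS, 3)
flat = mesh_true.T.reshape(-1)                   # x block, then y block, then z block

mesh = unflatten_axis_major(flat)
stacked = np.stack([flat[:478], flat[478:956], flat[956:]], axis=1)  # form used above
wrong = flat.reshape(N_VERTS, 3)                 # interleaved assumption: scrambles vertices
```

`mesh` and `stacked` recover the original vertices, while the interleaved reshape silently produces a valid-looking but scrambled array.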
+The NPZ also includes:
+- `mean_aligned_mesh` (478, 3) — population mean canonical mesh ("AU=0" face)
+- `mean_low_au_mesh`  (478, 3) — mean of low-AU-sum frames (cleaner neutral)
+## Citation context
+Adapts Cheong et al. 2023 / Py-Feat tutorial 06 (E. Jolly): affine-aligned 68
+dlib landmarks + pose -> 20 AUs via PLS. Direction inverted (AU+pose -> mesh)
+and scaled to MP's 478-vertex mesh on 10x larger wild-celebrity data.
+## File format
+NPZ with:
+  - coef             (23, 1434) float32 — linear weights, rows match input_columns
+  - intercept        (1434,)    float32 — bias
+  - input_columns    (23,)      str     — input feature order: AU01..AU43, Pitch, Yaw, Roll
+  - au_columns       (20,)      str     — convenience: just the AU subset
+  - pose_columns     (3,)       str     — convenience: just the pose subset
+  - mean_aligned_mesh (478, 3)  float32 — population mean (viz default)
+  - mean_low_au_mesh  (478, 3)  float32 — low-AU-frame mean (cleaner neutral)
+  - reference_anchors (12, 3)   float32 — Procrustes reference
+  - anchor_indices    (12,)     int32   — MP indices used as anchors
+  - n_components      ()        int32
+  - model_card        ()        str
+  - training_metadata ()        str (JSON)
+Loader: np.load("au_to_mesh_pls_v2.npz") — no extra deps needed.
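A quick way to confirm a downloaded file matches the layout listed above is to compare each key's shape against the table. The sketch below runs on a synthetic in-memory stand-in, so it needs no download. The `check_schema` helper, the placeholder AU names, and the metadata contents are assumptions for illustration only.

```python
import io
import json
import numpy as np

# Shapes as listed in the File format section.
EXPECTED_SHAPES = {
    "coef": (23, 1434), "intercept": (1434,),
    "input_columns": (23,), "au_columns": (20,), "pose_columns": (3,),
    "mean_aligned_mesh": (478, 3), "mean_low_au_mesh": (478, 3),
    "reference_anchors": (12, 3), "anchor_indices": (12,),
    "n_components": (), "model_card": (), "training_metadata": (),
}

def check_schema(npz):
    """Return a list of problems; an empty list means the layout matches."""
    problems = []
    for key, shape in EXPECTED_SHAPES.items():
        if key not in npz.files:
            problems.append(f"missing key: {key}")
        elif npz[key].shape != shape:
            problems.append(f"{key}: shape {npz[key].shape}, expected {shape}")
    return problems

# Synthetic stand-in with placeholder names and zero weights (not the real model).
au_names = [f"AU_placeholder_{i:02d}" for i in range(20)]
pose_names = ["Pitch", "Yaw", "Roll"]
buffer = io.BytesIO()
np.savez(
    buffer,
    coef=np.zeros((23, 1434), dtype=np.float32),
    intercept=np.zeros(1434, dtype=np.float32),
    input_columns=np.array(au_names + pose_names),
    au_columns=np.array(au_names),
    pose_columns=np.array(pose_names),
    mean_aligned_mesh=np.zeros((478, 3), dtype=np.float32),
    mean_low_au_mesh=np.zeros((478, 3), dtype=np.float32),
    reference_anchors=np.zeros((12, 3), dtype=np.float32),
    anchor_indices=np.arange(12, dtype=np.int32),
    n_components=np.int32(83),
    model_card=np.array("placeholder card text"),
    training_metadata=np.array(json.dumps({"note": "synthetic"})),
)
buffer.seek(0)

loaded = np.load(buffer)
problems = check_schema(loaded)
# 0-d string entries come back as arrays; .item() yields a plain Python str.
metadata = json.loads(loaded["training_metadata"].item())
```

For the real file, the same call would be `check_schema(np.load("au_to_mesh_pls_v2.npz"))`. If the string arrays were saved with object dtype, `allow_pickle=True` would be required, which is not the case for the fixed-width strings used in this stand-in.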
+## Applying pose post-hoc (optional)
+The model is trained with pose covariates as input but at inference users
+typically pass `pose=0` to get the canonical (frontal) deformation. To render
+the result at any chosen head pose, apply a rigid transform AFTER prediction:
+```python
+import numpy as np
+from scipy.spatial.transform import Rotation
+m = np.load("au_to_mesh_pls_v2.npz")
+# 1) Predict canonical mesh from AU vector at pose=0:
+au = np.zeros(20); au[m["au_columns"].tolist().index("AU12")] = 1.0
+x = np.concatenate([au, np.zeros(3)])                                 # (23,)
+flat = x @ m["coef"] + m["intercept"]                                 # (1434,)
+# IMPORTANT: layout is axis-major [all x | all y | all z], NOT interleaved
+canonical_mesh = np.stack([flat[:478], flat[478:956], flat[956:]], axis=1)  # (478, 3)
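+# 2) Apply user-chosen rigid pose (angles in radians; example values):
+pitch, yaw, roll = 0.0, 0.3, 0.0
+# Uppercase "XYZ" is SciPy's intrinsic convention (R = Rx·Ry·Rz); lowercase "xyz" is extrinsic.
+R = Rotation.from_euler("XYZ", [pitch, yaw, roll]).as_matrix()        # (3, 3)
+posed_mesh = canonical_mesh @ R.T                                     # (478, 3)
+# Or, to re-project onto an image with scale s and translation t: s * (canonical_mesh @ R.T) + t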
+```
+**Two ways to control pose at inference:**
+- **Render-time rotation** (recommended for clean separation): set `pose=0` in
+  the input vector, then rotate the predicted mesh post-hoc as above.
+- **Pose-conditioned prediction** (if you want the model's residual pose-correlated
+  bias): pass non-zero pose values directly into the input vector. The model was
+  trained with pose+pose×AU interactions, so non-zero pose changes the prediction
+  in addition to the rigid rotation needed at render time.
+For most use cases, render-time rotation is cleaner — the model's AU-driven
+deformation stays canonical and pose is purely a viewing transform.
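To make the difference between the two options concrete: if the exported model is exactly the 23-row linear map used in the Inference snippet, the pose inputs contribute an additive offset `pose @ coef[20:]` that does not depend on the AU values. The sketch below uses random stand-in weights, not the released coefficients, to show that decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
coef = rng.normal(size=(23, 1434))      # synthetic stand-in for m["coef"]
intercept = rng.normal(size=1434)       # synthetic stand-in for m["intercept"]

def predict(au, pose):
    """Linear prediction with the same input order as the card: [20 AU | 3 pose]."""
    return np.concatenate([au, pose]) @ coef + intercept

au_a = np.zeros(20); au_a[3] = 1.0      # one AU activated (index is arbitrary here)
au_b = np.zeros(20); au_b[11] = 0.5     # a different AU vector
pose = np.array([0.1, -0.2, 0.05])      # [pitch, yaw, roll] in radians

# Pose-conditioned prediction minus canonical (pose=0) prediction.
delta_a = predict(au_a, pose) - predict(au_a, np.zeros(3))
delta_b = predict(au_b, pose) - predict(au_b, np.zeros(3))
pose_offset = pose @ coef[20:]          # contribution of the three pose rows alone
```

Under that assumption `delta_a` and `delta_b` are identical and equal to `pose_offset`, so any pose×AU interaction terms used during training would have to live outside this 23-row matrix.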
+**Convention notes**:
+- MP canonical: y-axis UP, x-axis to subject's left, z-axis out of face.
+  Standard right-handed.
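+- SciPy distinguishes case in `Rotation.from_euler`: uppercase `"XYZ"` is intrinsic
+  (R = Rx·Ry·Rz), lowercase `"xyz"` is extrinsic (R = Rz·Ry·Rx). The intrinsic form
+  is the one intended to match img2pose Pitch/Yaw/Roll output.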
+- If you have MP's full 4x4 `facial_transformation_matrix` (M), apply via
+  `posed = (np.concatenate([canonical, np.ones((478, 1))], axis=1) @ M.T)[:, :3]`.
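The conventions above can be checked without SciPy. The sketch below builds R = Rx·Ry·Rz from elementary rotation matrices and confirms that the homogeneous 4x4 form gives the same result as `s * (mesh @ R.T) + t`. The helper names and the example scale and translation are illustrative, and the random array stands in for a predicted canonical mesh.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def euler_intrinsic_xyz(pitch, yaw, roll):
    """R = Rx(pitch) @ Ry(yaw) @ Rz(roll), angles in radians."""
    return rot_x(pitch) @ rot_y(yaw) @ rot_z(roll)

def apply_homogeneous(mesh, M):
    """Apply a 4x4 transform to an (N, 3) mesh, as in the note above."""
    homo = np.concatenate([mesh, np.ones((len(mesh), 1))], axis=1)
    return (homo @ M.T)[:, :3]

rng = np.random.default_rng(0)
canonical = rng.normal(size=(478, 3))   # stand-in for a predicted canonical mesh
R = euler_intrinsic_xyz(0.1, 0.3, -0.05)
scale, t = 2.0, np.array([10.0, 20.0, 0.0])

# Pack (scale, R, t) into a 4x4 matrix and compare both ways of applying it.
M = np.eye(4)
M[:3, :3] = scale * R
M[:3, 3] = t
posed_h = apply_homogeneous(canonical, M)
posed_direct = scale * (canonical @ R.T) + t
```

Both paths rotate about the coordinate origin, so a mesh whose coordinates are not centred will also swing around that origin; subtracting a chosen pivot first and adding it back afterwards is one way to rotate in place.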
+## Figures
+![au_solid_panels.png](./au_solid_panels.png)
+![au_solid_neutral_vs_activated.png](./au_solid_neutral_vs_activated.png)
+![au_compare_overlay.png](./au_compare_overlay.png)
+![au_effect_maps.png](./au_effect_maps.png)