EDGE G1 BeatDistance Checkpoints
This repository publishes G1 humanoid checkpoints and evaluation artifacts from the EDGE BeatDistance worktree.
Models
| Path | Use case | Notes |
|---|---|---|
| lbeat_relative_finetune/train-500.pt | Strongest beat-alignment checkpoint | Fine-tuned from the FK-beat BeatDistance checkpoint with the normalized G1 beat-loss estimator. This is the newest model and the one to try when rhythm is the priority. |
| fkbeats/train-2000.pt | Stable default G1 model | Full AIST G1 BeatDistance checkpoint trained with FK-derived beat metadata but without beat loss. It has better contact and smoothness than the lbeat fine-tune. |
| train-2000.pt | Older compatibility checkpoint | Earlier G1 BeatDistance checkpoint before the full FK-beat data/eval path. Keep only for reproducing older results. |
Current full-test metrics:
| Model | G1BAS | G1FKBAS | G1BeatF1 | G1FootSliding | G1GroundPenetration | RootSmoothnessJerkMean | G1Dist |
|---|---|---|---|---|---|---|---|
| fkbeats/train-2000.pt | 0.5446 | 0.5502 | 0.3333 | 0.6112 | 0.0665 | 724.8973 | 6.8652 |
| lbeat_relative_finetune/train-500.pt | 0.5956 | 0.5978 | 0.4372 | 0.7102 | 0.0979 | 629.5984 | 7.5871 |
Interpretation: the lbeat fine-tune improves beat metrics over the stable FK-beat model, but it also regresses foot sliding, ground penetration, and distribution distance. Use it as the rhythm-focused checkpoint, not as a universally better robot-motion checkpoint.
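To make the trade-off concrete, the sketch below computes the percentage change of the lbeat fine-tune relative to the FK-beat model for each metric. The numbers are copied from the table above; the dictionary layout and the `relative_change` helper are illustrative and not part of the EDGE codebase. The script reports signed changes only and does not assume which direction is better for any metric.

```python
# Metric values copied from the full-test metrics table in this README.
FKBEATS = {
    "G1BAS": 0.5446,
    "G1FKBAS": 0.5502,
    "G1BeatF1": 0.3333,
    "G1FootSliding": 0.6112,
    "G1GroundPenetration": 0.0665,
    "RootSmoothnessJerkMean": 724.8973,
    "G1Dist": 6.8652,
}
LBEAT = {
    "G1BAS": 0.5956,
    "G1FKBAS": 0.5978,
    "G1BeatF1": 0.4372,
    "G1FootSliding": 0.7102,
    "G1GroundPenetration": 0.0979,
    "RootSmoothnessJerkMean": 629.5984,
    "G1Dist": 7.5871,
}


def relative_change(baseline, candidate):
    """Return the percentage change of candidate relative to baseline."""
    return 100.0 * (candidate - baseline) / baseline


# Signed percentage change of the lbeat fine-tune versus the FK-beat model.
changes = {name: relative_change(FKBEATS[name], LBEAT[name]) for name in FKBEATS}

for name, pct in changes.items():
    print(f"{name:>24}: {pct:+.1f}%")
```

Read this way, G1BeatF1 rises by roughly 31% while G1GroundPenetration rises by roughly 47%, which is why the fine-tune is recommended only when rhythm is the priority.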
FK And Non-FK Model I/O
The FK (forward kinematics) and non-FK G1 models use the same inference interface and the same model input/output tensor shape. Both are G1 diffusion checkpoints that operate on normalized motion tensors of shape [T, 38], and both produce G1 motion pickles containing root pose and DoF positions.
FK is used around the model, not inside the neural network architecture:
- FK-derived beat labels improve the prepared dataset and beat evaluation.
- FK enables robot-native diagnostics such as foot sliding, ground penetration, FK beat alignment, and G1 rendering.
- The checkpoint still takes the same music/Jukebox features and beat-distance conditioning as the earlier G1 BeatDistance checkpoints.
So a teammate can swap fkbeats/train-2000.pt and lbeat_relative_finetune/train-500.pt in the same generation command. Use the G1/FK flags only for evaluation or rendering.
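As a minimal sketch of that swap, the snippet below builds the same evaluation command for both checkpoints and changes only the `--checkpoint` value. The module name and flags are copied from the command shown under Minimal Usage Shape; `build_eval_command` and `SHARED_FLAGS` are hypothetical names introduced here for illustration, and the script only prints the commands rather than running them.

```python
import shlex

# Flags copied from the usage command in this README; only the checkpoint varies.
SHARED_FLAGS = [
    "--motion_format", "g1",
    "--feature_type", "jukebox",
    "--data_path", "data/g1_aistpp_full_fkbeats",
    "--full_music_dir", "data/aist_full_audio",
    "--use_beats",
    "--beat_rep", "distance",
    "--no_render",
]


def build_eval_command(checkpoint):
    """Return the argv list for one evaluation run (hypothetical helper)."""
    return [
        "python", "-m", "eval.run_full_song_eval",
        "--checkpoint", checkpoint,
        *SHARED_FLAGS,
    ]


commands = [
    build_eval_command(checkpoint)
    for checkpoint in (
        "fkbeats/train-2000.pt",
        "lbeat_relative_finetune/train-500.pt",
    )
]

# Print shell-quoted commands so they can be inspected or pasted into a terminal.
for argv in commands:
    print(shlex.join(argv))
```

Each printed line can be passed to `subprocess.run` as a list if the runs should be launched from Python instead.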
Dataset Assets
Dataset archives are under dataset/:
| Path | Contents |
|---|---|
| dataset/g1_aistpp_full_fkbeats_motion_beat_data.tar | Prepared G1 FK-beat AIST tree: metadata, splits, source motions, sliced motions, and beat features. Contains 18,891 motion pickle files across train/test source and slices. |
| dataset/aist_g1_retargeted_raw_pkls.tar | Raw retargeted AIST-to-G1 motion pickles. Contains 1,408 files. |
| dataset/aist_full_audio_mp3.tar | Full AIST audio mp3 files. Contains 60 songs for whole-song generation. |
| dataset/splits/ | Official EDGE AIST split files and ignore list. |
| dataset/README.md | Extraction and usage notes. |
The prepared archive intentionally does not include the local precomputed Jukebox tensor cache or sliced wav symlinks. Those are large runtime caches tied to the cluster filesystem. The full mp3 audio is included so features and slices can be regenerated in a clean environment.
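After downloading, the archives can be sanity-checked against the file counts listed in the table above without extracting them. The sketch below is one way to do that with the standard library. It assumes the motion pickles use a `.pkl` suffix, which this README does not state explicitly, so a mismatch may simply mean the suffix or the counting rule differs. `count_pickles` and `EXPECTED` are illustrative names.

```python
import os
import tarfile

# Expected counts taken from the dataset table in this README.
EXPECTED = {
    "dataset/g1_aistpp_full_fkbeats_motion_beat_data.tar": 18891,
    "dataset/aist_g1_retargeted_raw_pkls.tar": 1408,
}


def count_pickles(archive_path, suffix=".pkl"):
    """Count regular files ending in `suffix` inside a tar archive.

    The `.pkl` default is an assumption about how the motion files are named.
    """
    with tarfile.open(archive_path) as tar:
        return sum(
            1 for member in tar
            if member.isfile() and member.name.endswith(suffix)
        )


for path, expected in EXPECTED.items():
    if not os.path.exists(path):
        # Archives are optional downloads, so a missing file is not an error.
        print(f"skip {path}: not found locally")
        continue
    found = count_pickles(path)
    status = "ok" if found == expected else "MISMATCH"
    print(f"{path}: found {found}, expected {expected} [{status}]")
```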
Evaluation Artifacts
| Path | Contents |
|---|---|
| eval_fkbeats/ | Reports, metrics, renders, and generated motions for the stable FK-beat checkpoint. |
| eval_lbeat_relative_finetune/ | Reports, metrics, renders, and generated motions for the lbeat fine-tuned checkpoint. |
| eval_fk/ | Older FK diagnostic eval artifacts. |
Minimal Usage Shape
Use the diffusion branch of the EDGE codebase and the repo-local Python environment. After downloading and extracting the dataset archives:
```bash
python -m eval.run_full_song_eval \
  --checkpoint lbeat_relative_finetune/train-500.pt \
  --motion_format g1 \
  --feature_type jukebox \
  --data_path data/g1_aistpp_full_fkbeats \
  --full_music_dir data/aist_full_audio \
  --use_beats \
  --beat_rep distance \
  --no_render
```
For robot-native rendering/evaluation, provide the G1 FK model assets available in the codebase and enable the corresponding G1 render/eval flags.
Important Caveats
- These are kinematic G1 motion-generation artifacts, not simulator-tracked policies.
- The lbeat model is rhythm-focused and currently scores worse than the stable FK-beat checkpoint on foot sliding, ground penetration, and distribution distance.
- AIST++ audio/data may have separate upstream licensing terms. Use the bundled data only in accordance with the original dataset terms.