---
license: mit
pipeline_tag: robotics
tags:
- retargeted-motion
- text-to-motion
- motion-to-text
- motion-prediction
- humanoid
- unitree-g1
- robotics
- motion-generation
---
# Test-Time Scaling for Controller-Aware Language-Conditioned Humanoid Motion Generation

This repository hosts the pretrained **checkpoints and runtime assets** for **TEXEDO**, a text-to-motion pipeline for the Unitree G1 humanoid. Given a language prompt, TEXEDO generates multiple candidate motions, decodes them into a 36-dimensional G1 robot motion format, scores them with dynamic and semantic verifiers, and selects the best candidate for deployment.

- 🌐 **Project page:** https://jianuocao.github.io/TEXEDO/
- 💻 **Code:** https://github.com/JianuoCao/TEXEDO
- 📄 **Paper:** https://arxiv.org/abs/2606.22998
- 📦 **Dataset:** https://huggingface.co/datasets/JianuoCao/TEXEDO

## Contents

| Logical name | What it is | Approx. size |
|---|---|---|
| `fsq_tokenizer` | FSQ motion tokenizer (encoder/decoder + codebook) for 36-dim G1 motion | ~216 MB |
| `fsq_norm_stats` | Per-channel normalization stats for the tokenizer | ~2 KB |
| `generator` | Stage-2 text→motion generator: flan-t5-base fine-tuned on FSQ motion tokens (multi-task) | ~3.2 GB |
| `dynamic_verifier` | Dynamic-feasibility (physical-plausibility) scorer | ~40 MB |
| `dynamic_norm_stats` | Normalization stats paired with the dynamic verifier | ~2 KB |
| `semantic_evaluator` | Text–motion matching evaluator (match net + decomposition + meta) | variable |
| `glove` | GloVe vocab for the semantic text encoder | ~20 MB |
| `g1_robot` | Unitree G1 MuJoCo model (XML + meshes) | ~26 MB |

> The base LM `google/flan-t5-base` is loaded from the public Hub at runtime and is not re-hosted here.

## Usage

The checkpoints are designed to be fetched automatically by the [TEXEDO code](https://github.com/JianuoCao/TEXEDO):

```bash
git clone https://github.com/JianuoCao/TEXEDO.git
cd TEXEDO
conda env create -f environment.yml
conda activate TEXEDO
pip install -e .

# Downloads these checkpoints + runtime assets into ./assets
python scripts/download_assets.py
```

Then run the full generate → score → select → render pipeline:

```bash
python -m pipeline.generate --prompt "a person waves with the right hand" --num-samples 8 --out-dir candidates/
python -m pipeline.score --motion-dir candidates/ --caption "a person waves with the right hand" --output scores.csv
python -m pipeline.select_best_of_n --scores scores.csv --motion-dir candidates/ --copy-best-to best/
python scripts/visualize_csv.py --input-dir best/ --output-dir viz/
```

You can also download a single file directly:

```python
from huggingface_hub import hf_hub_download

ckpt = hf_hub_download(
    repo_id="JianuoCao/TEXEDO-Checkpoint",
    filename="tokenizer/checkpoint_epoch_95.pt",
)
```

See the repo's [docs/MODELS.md](https://github.com/JianuoCao/TEXEDO/blob/main/docs/MODELS.md) for the full asset manifest and layout.

## Citation

```bibtex
@misc{cao2026texedotesttime,
  title={TEXEDO: Test-Time Scaling for Controller-Aware Language-Conditioned Humanoid Motion Generation},
  author={Jianuo Cao and Yuxin Chen and Yuzhen Song and Masayoshi Tomizuka and Chenran Li and Thomas Tian},
  year={2026},
  eprint={2606.22998},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2606.22998},
}
```

## License

Released under the MIT license. Third-party datasets, pretrained base models, robot assets, and dependencies retain their own licenses and terms of use.
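
## Appendix: best-of-N selection sketch

The select step in Usage picks one candidate out of the N scored motions. The snippet below is a minimal sketch of that idea, not the repository's implementation. The column names (`motion_file`, `dynamic_score`, `semantic_score`), the weighted-sum rule, and the helper names `load_rows` and `select_best` are all assumptions made for illustration; the actual layout of `scores.csv` and the logic of `pipeline.select_best_of_n` may differ.

```python
import csv
import io

# Hypothetical scores for three candidates. The real scores.csv produced by
# pipeline.score may use different column names and score ranges.
EXAMPLE_CSV = """motion_file,dynamic_score,semantic_score
cand_0.csv,0.90,0.40
cand_1.csv,0.70,0.80
cand_2.csv,0.20,0.95
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per candidate motion."""
    return list(csv.DictReader(io.StringIO(text)))


def select_best(rows, dynamic_weight=0.5, semantic_weight=0.5):
    """Return the row with the highest weighted sum of the two scores.

    The weighted sum is an assumed combination rule chosen for this sketch.
    """
    def combined(row):
        return (dynamic_weight * float(row["dynamic_score"])
                + semantic_weight * float(row["semantic_score"]))

    return max(rows, key=combined)


if __name__ == "__main__":
    best = select_best(load_rows(EXAMPLE_CSV))
    print(best["motion_file"])  # prints cand_1.csv
```

With equal weights the balanced candidate wins; shifting all weight to one verifier selects the candidate that scores highest on that verifier alone. This shows why both scores are needed to trade off physical feasibility against prompt fidelity.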