WalkingOnSaturn committed on
Commit c781b57 · verified · 1 Parent(s): 4731cee

initial: Gradio API server (Kimodo-SOMA-RP-v1.1) + constraints schema

Files changed (7)
  1. Dockerfile +42 -0
  2. LICENSE +201 -0
  3. README.md +53 -3
  4. constraints_schema.py +110 -0
  5. requirements.txt +375 -0
  6. server.py +225 -0
  7. start.sh +47 -0
Dockerfile ADDED
@@ -0,0 +1,42 @@
+ # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # SPDX-License-Identifier: Apache-2.0
+ #
+ # Forked from nvidia/Kimodo. We replace the Viser entrypoint (kimodo_demo)
+ # with a thin Gradio API server (server.py) that exposes /gradio_api/call/kimodo_motion
+ # for the GengaMachines webapp.
+
+ FROM nvcr.io/nvidia/pytorch:24.10-py3
+
+ ENV DEBIAN_FRONTEND=noninteractive \
+     PIP_DISABLE_PIP_VERSION_CHECK=1 \
+     PYTHONDONTWRITEBYTECODE=1 \
+     PYTHONUNBUFFERED=1 \
+     HF_HOME=/data/.huggingface \
+     XDG_CACHE_HOME=/data/.cache \
+     PIP_CACHE_DIR=/data/.cache/pip
+
+ WORKDIR /workspace
+
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     git curl ca-certificates \
+     cmake build-essential \
+     && rm -rf /var/lib/apt/lists/*
+
+ RUN rm -f /usr/local/bin/cmake || true
+
+ COPY requirements.txt /workspace/requirements.txt
+ ARG GITHUB_TOKEN=
+
+ RUN --mount=type=cache,target=/root/.cache/pip \
+     --mount=type=secret,id=GITHUB_TOKEN,mode=0444,required=false \
+     python -m pip install --upgrade pip \
+     && python -m pip install -r requirements.txt;
+
+ # Our custom files: replace the Viser demo with a Gradio API.
+ COPY server.py /workspace/server.py
+ COPY constraints_schema.py /workspace/constraints_schema.py
+ COPY start.sh /start.sh
+ RUN chmod +x /start.sh
+
+ EXPOSE 7860
+ ENTRYPOINT ["/start.sh"]
LICENSE ADDED
@@ -0,0 +1,201 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
README.md CHANGED
@@ -1,10 +1,60 @@
  ---
  title: Genga Kimodo
- emoji: 📉
+ emoji: 🐉
  colorFrom: yellow
- colorTo: green
+ colorTo: pink
  sdk: docker
+ app_port: 7860
  pinned: false
+ license: apache-2.0
+ short_description: API-only Kimodo motion backend for GengaMachines.
+ models:
+ - nvidia/Kimodo-SOMA-RP-v1
+ - nvidia/Kimodo-SOMA-RP-v1.1
  ---
  
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Genga × Kimodo
+
+ API-only Hugging Face Space backing the GengaMachines webapp's `/motion`
+ flow. Wraps NVIDIA Kimodo-SOMA-RP-v1.1 with a thin Gradio API.
+
+ The official interactive Kimodo demo (Viser) lives at
+ [nvidia/Kimodo](https://huggingface.co/spaces/nvidia/Kimodo). This Space is
+ deliberately **not** that: we expose `/gradio_api/call/kimodo_motion` so the
+ webapp can submit prompts + constraints and receive structured motion JSON.
+
+ ## API
+
+ ```
+ POST /gradio_api/call/kimodo_motion
+ Content-Type: application/json
+ Authorization: Bearer ${HF_TOKEN}
+
+ { "data": [prompt, num_frames, seed, cfg, num_steps, constraints_json] }
+ ```
+
+ Then poll `/gradio_api/call/kimodo_motion/<event_id>` for the SSE stream
+ (complete / error / heartbeat events).
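Reviewer note: the two-step flow above (POST the `data` array, then read the SSE stream) could be exercised from a client roughly as follows. This is a minimal sketch, not the webapp's actual code: the helper names `build_call_body`, `iter_sse_events` and `first_result` are hypothetical, and it assumes each SSE block carries an `event:` line plus a JSON-encoded `data:` line, with the `complete` payload being a one-element list holding the result envelope. It only builds the body and parses text that has already been received, so the HTTP calls themselves are left out.

```python
import json
from typing import Any, Iterator, List, Optional, Tuple


def build_call_body(
    prompt: str,
    num_frames: int,
    seed: int,
    cfg: float,
    num_steps: int,
    constraints: Optional[List[dict]] = None,
) -> str:
    """Return the JSON body for POST /gradio_api/call/kimodo_motion.

    The constraints list is JSON-stringified first, because the endpoint
    takes `constraints_json` as a string argument.
    """
    constraints_json = json.dumps(constraints or [])
    return json.dumps(
        {"data": [prompt, num_frames, seed, cfg, num_steps, constraints_json]}
    )


def iter_sse_events(stream_text: str) -> Iterator[Tuple[str, Any]]:
    """Yield (event_name, decoded_data) pairs from raw SSE text."""
    for block in stream_text.replace("\r\n", "\n").split("\n\n"):
        event = "message"  # SSE default when no `event:` line is present
        data_lines = []
        for line in block.split("\n"):
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if not data_lines:
            continue  # blank block or comment-only block
        raw = "\n".join(data_lines)
        try:
            yield event, json.loads(raw)
        except json.JSONDecodeError:
            yield event, raw  # keep non-JSON payloads as plain text


def first_result(stream_text: str) -> Any:
    """Return the first `complete` payload, or raise on an `error` event."""
    for event, data in iter_sse_events(stream_text):
        if event == "complete":
            # Assumption: the result arrives wrapped in a one-element list.
            return data[0] if isinstance(data, list) and data else data
        if event == "error":
            raise RuntimeError(f"kimodo_motion failed: {data}")
    return None  # only heartbeats so far
```

A caller would send `build_call_body(...)` with the headers shown above, read the `event_id` from the response, and feed the streamed text to `first_result`.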
+
+ Result envelope:
+
+ ```json
+ {
+ "status": "ok",
+ "numFrames": 90,
+ "fps": 30,
+ "rootTranslation": [[x, y, z], ...],
+ "jointRotMats": [[ [[..3..]], ...30 ], ...90 ],
+ "footContacts": [[lh, lt, rh, rt], ...] | null,
+ "summary": "..."
+ }
+ ```
+
+ `jointRotMats` is row-major `[N, 30, 3, 3]`: local-space SOMA joint rotations.
+ The webapp converts to quaternions in `SomaCharacterMesh.tsx`.
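Reviewer note: `SomaCharacterMesh.tsx` is not part of this commit, so the following is only a sketch of what a matrix-to-quaternion conversion over the `[N, 30, 3, 3]` payload might look like, written in Python for consistency with the rest of the diff. The function names and the `(w, x, y, z)` component order are assumptions; a renderer may expect `(x, y, z, w)` instead. It uses the usual branch-on-largest-diagonal method so the square root stays well conditioned.

```python
import math
from typing import List, Sequence, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed ordering


def rotmat_to_quat(m: Sequence[Sequence[float]]) -> Quat:
    """Convert one row-major 3x3 rotation matrix to a unit quaternion."""
    m00, m01, m02 = m[0]
    m10, m11, m12 = m[1]
    m20, m21, m22 = m[2]
    trace = m00 + m11 + m22
    if trace > 0.0:
        s = 2.0 * math.sqrt(trace + 1.0)  # s = 4w
        return (0.25 * s, (m21 - m12) / s, (m02 - m20) / s, (m10 - m01) / s)
    if m00 > m11 and m00 > m22:
        s = 2.0 * math.sqrt(1.0 + m00 - m11 - m22)  # s = 4x
        return ((m21 - m12) / s, 0.25 * s, (m01 + m10) / s, (m02 + m20) / s)
    if m11 > m22:
        s = 2.0 * math.sqrt(1.0 + m11 - m00 - m22)  # s = 4y
        return ((m02 - m20) / s, (m01 + m10) / s, 0.25 * s, (m12 + m21) / s)
    s = 2.0 * math.sqrt(1.0 + m22 - m00 - m11)  # s = 4z
    return ((m10 - m01) / s, (m02 + m20) / s, (m12 + m21) / s, 0.25 * s)


def frames_to_quats(joint_rot_mats: Sequence[Sequence[Sequence[Sequence[float]]]]) -> List[List[Quat]]:
    """Map a nested [N][J][3][3] list to [N][J] quaternions."""
    return [[rotmat_to_quat(joint) for joint in frame] for frame in joint_rot_mats]
```

Because the matrices are local (parent-relative), the resulting quaternions can be applied per joint without first undoing any parent transform.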
+
+ ## Constraints
+
+ `constraints_json` is a JSON-stringified list. See `constraints_schema.py`.
+ Coordinates: Y-up, meters, character-local. Supported types: `root2d`,
+ `fullbody`, `left-hand`, `right-hand`, `left-foot`, `right-foot`, `end-effector`.
constraints_schema.py ADDED
@@ -0,0 +1,110 @@
+ """Pydantic schema for the Kimodo constraint payload accepted by server.py.
+
+ Mirrors the JSON shape produced by the NVIDIA authoring demo so we can fall
+ back on the official kimodo.constraints classes for inference. The webapp
+ sends a JSON-stringified list of these objects in the Gradio `constraints_json`
+ arg.
+
+ Coordinates: Y-up, meters, character-local. Frame indices are 0-based within
+ the generated clip. The root is canonicalized to the XZ origin at frame 0.
+ """
+
+ from __future__ import annotations
+
+ from typing import List, Literal, Optional, Tuple, Union
+
+ from pydantic import BaseModel, Field, field_validator
+
+ Vec2 = Tuple[float, float]
+ Vec3 = Tuple[float, float, float]
+
+
+ class Root2DConstraint(BaseModel):
+     type: Literal["root2d"]
+     frame_indices: List[int]
+     smooth_root_2d: List[Vec2]
+     global_root_heading: Optional[List[Vec2]] = None
+
+     @field_validator("frame_indices")
+     @classmethod
+     def _frames_match_length(cls, v, info):
+         # We can only sanity-check inside this constraint; cross-list checks
+         # happen in server.py once num_frames is known.
+         if not v:
+             raise ValueError("root2d constraint must have at least one frame_index")
+         return v
+
+
+ class FullBodyConstraint(BaseModel):
+     type: Literal["fullbody"]
+     frame_indices: List[int]
+     root_positions: List[Vec3]
+     local_joints_rot: List[List[Vec3]]
+     smooth_root_2d: Optional[List[Vec2]] = None
+
+
+ class EndEffectorConstraint(BaseModel):
+     type: Literal[
+         "left-hand",
+         "right-hand",
+         "left-foot",
+         "right-foot",
+         "end-effector",
+     ]
+     frame_indices: List[int]
+     root_positions: List[Vec3]
+     local_joints_rot: List[List[Vec3]]
+     smooth_root_2d: Optional[List[Vec2]] = None
+ joint_names: Optional[List[str]] = None # required when type == "end-effector"
59
+
60
+ @field_validator("joint_names")
61
+ @classmethod
62
+ def _names_required_for_custom(cls, v, info):
63
+ if info.data.get("type") == "end-effector" and not v:
64
+ raise ValueError(
65
+ "type='end-effector' requires `joint_names`; use a typed variant "
66
+ "(left-hand, right-hand, left-foot, right-foot) otherwise."
67
+ )
68
+ return v
69
+
70
+
71
+ KimodoConstraint = Union[Root2DConstraint, FullBodyConstraint, EndEffectorConstraint]
72
+
73
+
74
+ def parse_constraints(payload: List[dict], num_frames: int) -> List[KimodoConstraint]:
75
+ """Validate the JSON payload from the webapp and bound-check frame indices.
76
+
77
+ Returns a list of typed Pydantic objects ready to feed kimodo's sampler.
78
+ Raises ValueError on the first violation; the caller surfaces the error to
79
+ the SSE stream so the webapp toast renders it.
80
+ """
81
+ if not isinstance(payload, list):
82
+ raise ValueError("constraints must be a JSON list")
83
+ out: List[KimodoConstraint] = []
84
+ for i, raw in enumerate(payload):
85
+ if not isinstance(raw, dict) or "type" not in raw:
86
+ raise ValueError(f"constraints[{i}]: must be a dict with a 'type' field")
87
+ t = raw["type"]
88
+ cls = {
89
+ "root2d": Root2DConstraint,
90
+ "fullbody": FullBodyConstraint,
91
+ "left-hand": EndEffectorConstraint,
92
+ "right-hand": EndEffectorConstraint,
93
+ "left-foot": EndEffectorConstraint,
94
+ "right-foot": EndEffectorConstraint,
95
+ "end-effector": EndEffectorConstraint,
96
+ }.get(t)
97
+ if cls is None:
98
+ raise ValueError(f"constraints[{i}]: unknown type '{t}'")
99
+ try:
100
+ obj = cls(**raw)
101
+ except Exception as e:
102
+ raise ValueError(f"constraints[{i}] ({t}): {e}") from e
103
+ for f in obj.frame_indices:
104
+ if f < 0 or f >= num_frames:
105
+ raise ValueError(
106
+ f"constraints[{i}] ({t}): frame_index {f} is out of range "
107
+ f"[0, {num_frames - 1}]"
108
+ )
109
+ out.append(obj)
110
+ return out
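Reviewer note: for quick tests or a client-side pre-flight check, the dispatch and bounds rules in `parse_constraints` can be mirrored without Pydantic. The sketch below is an assumption-laden stand-in, not a replacement: `precheck_constraints` and `KNOWN_TYPES` are hypothetical names, it checks only the type name, `frame_indices`, and the `joint_names` requirement (not vector shapes), and unlike `parse_constraints` it collects every problem instead of raising on the first one.

```python
from typing import Any, List

# Type names taken from the dispatch table in parse_constraints.
KNOWN_TYPES = frozenset(
    {
        "root2d",
        "fullbody",
        "left-hand",
        "right-hand",
        "left-foot",
        "right-foot",
        "end-effector",
    }
)


def precheck_constraints(payload: Any, num_frames: int) -> List[str]:
    """Return a list of human-readable problems; empty means 'looks fine'."""
    if not isinstance(payload, list):
        return ["constraints must be a JSON list"]
    errors: List[str] = []
    for i, raw in enumerate(payload):
        if not isinstance(raw, dict) or "type" not in raw:
            errors.append(f"constraints[{i}]: must be a dict with a 'type' field")
            continue
        t = raw["type"]
        if not isinstance(t, str) or t not in KNOWN_TYPES:
            errors.append(f"constraints[{i}]: unknown type '{t}'")
            continue
        frames = raw.get("frame_indices")
        if not isinstance(frames, list):
            errors.append(f"constraints[{i}] ({t}): frame_indices must be a list")
            continue
        if t == "root2d" and not frames:
            errors.append(f"constraints[{i}] ({t}): needs at least one frame_index")
        if t == "end-effector" and not raw.get("joint_names"):
            errors.append(f"constraints[{i}] ({t}): requires joint_names")
        for f in frames:
            # Same half-open range as the server: 0 <= f < num_frames.
            if not isinstance(f, int) or not 0 <= f < num_frames:
                errors.append(
                    f"constraints[{i}] ({t}): frame_index {f} is out of range "
                    f"[0, {num_frames - 1}]"
                )
    return errors
```

Collecting all errors is a deliberate difference: it suits a form that wants to highlight several bad fields at once, while the server's fail-fast behavior keeps the SSE error message short.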
requirements.txt ADDED
@@ -0,0 +1,375 @@
+ # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # SPDX-License-Identifier: Apache-2.0
+
+ # This file was autogenerated by uv via the following command:
+ # NOTE: `torch` (and its CUDA wheels) are intentionally omitted from this lockfile.
+ # The Docker base image (nvcr.io/nvidia/pytorch) already provides a tested PyTorch build.
+ #
+ # uv pip compile docker_requirements.in -o docker_requirements.txt --python-version 3.10 --python-platform x86_64-manylinux2014
+ kimodo @ git+https://github.com/nv-tlabs/kimodo.git
+ viser @ git+https://github.com/nv-tlabs/kimodo-viser.git
+ py-soma-x @ git+https://github.com/NVlabs/SOMA-X.git
+ accelerate==1.13.0
+     # via peft
+ aiofiles==24.1.0
+     # via gradio
+ annotated-doc==0.0.4
+     # via
+     #   fastapi
+     #   typer
+ annotated-types==0.7.0
+     # via pydantic
+ antlr4-python3-runtime==4.9.3
+     # via
+     #   hydra-core
+     #   omegaconf
+ anyio==4.12.1
+     # via
+     #   gradio
+     #   httpx
+     #   starlette
+ attrs==25.4.0
+     # via
+     #   jsonschema
+     #   referencing
+ av==16.1.0
+     # via
+     #   -r docker_requirements.in
+     #   kimodo
+ boto3==1.42.66
+     # via
+     #   -r docker_requirements.in
+     #   kimodo
+ botocore==1.42.66
+     # via
+     #   boto3
+     #   s3transfer
+ brotli==1.2.0
+     # via gradio
+ certifi==2026.2.25
+     # via
+     #   httpcore
+     #   httpx
+     #   requests
+ charset-normalizer==3.4.5
+     # via
+     #   requests
+     #   trimesh
+ click==8.3.1
+     # via
+     #   typer
+     #   uvicorn
+ colorlog==6.10.1
+     # via trimesh
+ einops==0.8.2
+     # via
+     #   -r docker_requirements.in
+     #   kimodo
+ embreex==2.17.7.post7
+     # via trimesh
+ exceptiongroup==1.3.1
+     # via anyio
+ fastapi==0.135.1
+     # via gradio
+ ffmpy==1.0.0
+     # via gradio
+ filelock==3.25.2
+     # via
+     #   -r docker_requirements.in
+     #   huggingface-hub
+     #   kimodo
+     #   torch
+ fsspec==2026.2.0
+     # via
+     #   gradio-client
+     #   huggingface-hub
+     #   torch
+ gradio==6.9.0
+     # via
+     #   -r docker_requirements.in
+     #   kimodo
+ gradio-client==2.3.0
+     # via
+     #   -r docker_requirements.in
+     #   gradio
+     #   kimodo
+ groovy==0.1.2
+     # via gradio
+ h11==0.16.0
+     # via
+     #   httpcore
+     #   uvicorn
+ hf-xet==1.4.0
+     # via huggingface-hub
+ httpcore==1.0.9
+     # via httpx
+ httpx==0.28.1
+     # via
+     #   gradio
+     #   gradio-client
+     #   huggingface-hub
+     #   safehttpx
+     #   trimesh
+ huggingface-hub==1.6.0
+     # via
+     #   accelerate
+     #   gradio
+     #   gradio-client
+     #   peft
+     #   tokenizers
+     #   transformers
+ hydra-core==1.3.2
+     # via
+     #   -r docker_requirements.in
+     #   kimodo
+ idna==3.11
+     # via
+     #   anyio
+     #   httpx
+     #   requests
+ imageio==2.37.3
+     # via viser
+ jinja2==3.1.6
+     # via
+     #   gradio
+     #   torch
+ jmespath==1.1.0
+     # via
+     #   boto3
+     #   botocore
+ jsonschema==4.26.0
+     # via trimesh
+ jsonschema-specifications==2025.9.1
+     # via jsonschema
+ lxml==6.0.2
+     # via
+     #   trimesh
+     #   yourdfpy
+ manifold3d==3.4.0
+     # via trimesh
+ mapbox-earcut==2.0.0
+     # via trimesh
+ markdown-it-py==4.0.0
+     # via rich
+ markupsafe==3.0.3
+     # via
+     #   gradio
+     #   jinja2
+ mdurl==0.1.2
+     # via markdown-it-py
+ msgspec==0.20.0
+     # via viser
+ nodeenv==1.10.0
+     # via viser
+ numpy==1.26.4
+     # via
+     #   -r docker_requirements.in
+     #   accelerate
+     #   embreex
+     #   gradio
+     #   imageio
+     #   kimodo
+     #   manifold3d
+     #   mapbox-earcut
+     #   motion-correction
+     #   pandas
+     #   peft
+     #   pycollada
+     #   scenepic
+     #   scipy
+     #   shapely
+     #   transformers
+     #   trimesh
+     #   vhacdx
+     #   viser
+     #   yourdfpy
+ omegaconf==2.3.0
+     # via
+     #   -r docker_requirements.in
+     #   hydra-core
+     #   kimodo
+ orjson==3.11.7
+     # via gradio
+ packaging==26.0
+     # via
+     #   -r docker_requirements.in
+     #   accelerate
+     #   gradio
+     #   gradio-client
+     #   huggingface-hub
+     #   hydra-core
+     #   kimodo
+     #   peft
+     #   transformers
+ pandas==2.3.3
+     # via gradio
+ peft==0.18.1
+     # via
+     #   -r docker_requirements.in
+     #   kimodo
+ pillow==12.1.1
+     # via
+     #   -r docker_requirements.in
+     #   gradio
+     #   imageio
+     #   kimodo
+     #   scenepic
+     #   trimesh
+ psutil==7.2.2
+     # via
+     #   accelerate
+     #   peft
+ pycollada==0.9.3
+     # via trimesh
+ pydantic==2.12.5
+     # via
+     #   -r docker_requirements.in
+     #   fastapi
+     #   gradio
+     #   kimodo
+ pydantic-core==2.41.5
+     # via pydantic
+ pydub==0.25.1
+     # via gradio
+ pygments==2.19.2
+     # via rich
+ python-dateutil==2.9.0.post0
+     # via
+     #   botocore
+     #   pandas
+     #   pycollada
+ python-multipart==0.0.22
+     # via gradio
+ pytz==2026.1.post1
+     # via
+     #   gradio
+     #   pandas
+ pyyaml==6.0.3
+     # via
+     #   accelerate
+     #   gradio
+     #   huggingface-hub
+     #   omegaconf
+     #   peft
+     #   transformers
+ referencing==0.37.0
+     # via
+     #   jsonschema
+     #   jsonschema-specifications
+ regex==2026.2.28
+     # via transformers
+ requests==2.32.5
+     # via viser
+ rich==14.3.3
+     # via
+     #   typer
+     #   viser
+ rpds-py==0.30.0
+     # via
+     #   jsonschema
+     #   referencing
+ rtree==1.4.1
+     # via trimesh
+ s3transfer==0.16.0
+     # via boto3
+ safehttpx==0.1.7
+     # via gradio
+ safetensors==0.7.0
+     # via
+     #   accelerate
+     #   peft
+     #   transformers
+ scenepic==1.1.2
+     # via
+     #   -r docker_requirements.in
+     #   kimodo
+ scipy==1.15.3
+     # via
+     #   -r docker_requirements.in
+     #   kimodo
+     #   scenepic
+     #   trimesh
+ semantic-version==2.10.0
+     # via gradio
+ shapely==2.1.2
+     # via trimesh
+ shellingham==1.5.4
+     # via typer
+ six==1.17.0
+     # via
+     #   python-dateutil
+     #   yourdfpy
+ starlette==0.52.1
+     # via
+     #   fastapi
+     #   gradio
+ svg-path==7.0
+     # via trimesh
+ tokenizers==0.22.2
+     # via transformers
+ tomlkit==0.13.3
+     # via gradio
+ tqdm==4.67.3
+     # via
+     #   -r docker_requirements.in
+     #   huggingface-hub
+     #   kimodo
+     #   peft
+     #   transformers
+     #   viser
+ transformers==5.1.0
+     # via
+     #   -r docker_requirements.in
+     #   kimodo
+     #   peft
+ trimesh==4.11.3
+     # via
+     #   -r docker_requirements.in
+     #   kimodo
+     #   viser
+     #   yourdfpy
+ typer==0.24.1
+     # via
+     #   gradio
+     #   huggingface-hub
+     #   typer-slim
+ typer-slim==0.24.0
+     # via transformers
+ typing-extensions==4.15.0
+     # via
+     #   anyio
+     #   exceptiongroup
+     #   fastapi
+     #   gradio
+     #   gradio-client
+     #   huggingface-hub
+     #   pydantic
+     #   pydantic-core
+     #   referencing
+     #   starlette
+     #   torch
+     #   typing-inspection
+     #   uvicorn
+     #   viser
+ typing-inspection==0.4.2
+     # via
+     #   fastapi
+     #   pydantic
+ tzdata==2025.3
+     # via pandas
+ urllib3==2.6.3
+     # via
+     #   -r docker_requirements.in
+     #   botocore
+     #   kimodo
+     #   requests
+ uvicorn==0.41.0
+     # via gradio
+ vhacdx==0.0.10
+     # via trimesh
+ websockets==15.0.1
+     # via viser
+ xxhash==3.6.0
+     # via trimesh
+ yourdfpy==0.0.60
+     # via viser
server.py ADDED
@@ -0,0 +1,225 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Gradio API replacing kimodo_demo's Viser entrypoint.
2
+
3
+ Exposes a single endpoint at `/gradio_api/call/kimodo_motion` that accepts:
4
+ (prompt, num_frames, seed, cfg, num_steps, constraints_json)
5
+
6
+ and returns a JSON envelope:
7
+ {
8
+ "status": "ok",
9
+ "numFrames": int,
10
+ "fps": 30,
11
+ "rootTranslation": [[x,y,z], ...], # [N, 3]
12
+ "jointRotMats": [[[[...]]]], # [N, 30, 3, 3]
13
+ "footContacts": [[lh, lt, rh, rt]], # [N, 4] (optional)
14
+ "summary": str
15
+ }
16
+
17
+ The webapp's src/lib/services/kimodo.ts polls
18
+ `/gradio_api/call/kimodo_motion/<event_id>` for the SSE event stream.
19
+ """
20
+
21
+ from __future__ import annotations
22
+
23
+ import json
24
+ import os
25
+ import sys
26
+ import traceback
27
+
28
+ import gradio as gr
29
+ import numpy as np
30
+ import torch
31
+
32
+ from constraints_schema import parse_constraints
33
+
34
+ # Lazy imports of kimodo so import-time failures (e.g. missing CUDA on the
35
+ # Space build container) don't kill `python server.py --help`.
36
+ _model = None
37
+ _skeleton = None
38
+ _device = None
39
+
40
+
41
+ def _load_model():
42
+ global _model, _skeleton, _device
43
+ if _model is not None:
44
+ return _model, _skeleton, _device
45
+ print("[server] loading Kimodo-SOMA-RP-v1.1 ...", file=sys.stderr, flush=True)
46
+ from kimodo import load_model
47
+
48
+ _device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
49
+ print(f"[server] device = {_device}", file=sys.stderr, flush=True)
50
+
51
+ model, resolved = load_model(
52
+ "Kimodo-SOMA-RP-v1.1",
53
+ device=_device,
54
+ default_family="Kimodo",
55
+ return_resolved_name=True,
56
+ )
57
+ print(f"[server] resolved model = {resolved}", file=sys.stderr, flush=True)
58
+ _model = model
59
+ _skeleton = model.skeleton
60
+ return _model, _skeleton, _device
61
+
62
+
63
+ def kimodo_motion(
+     prompt: str,
+     num_frames: int,
+     seed: int,
+     cfg: float,
+     num_steps: int,
+     constraints_json: str,
+     progress: gr.Progress = gr.Progress(),  # noqa: B008 (Gradio convention)
+ ) -> dict:
+     """Generate one SOMA motion sample. Heavy work runs on the GPU; constraint
+     parsing on the CPU. Returns the JSON envelope the webapp expects."""
+     try:
+         if not prompt or not prompt.strip():
+             return {"status": "error", "error": "prompt is empty"}
+         n = int(num_frames)
+         if n < 1 or n > 300:
+             return {
+                 "status": "error",
+                 "error": f"num_frames must be in [1, 300]; got {n}",
+             }
+
+         # Validate the constraints payload up front so a bad request doesn't
+         # waste GPU time. We accept the same JSON the kimodo CLI accepts; the
+         # extra cross-list validation in constraints_schema bounds-checks frame
+         # indices against num_frames.
+         try:
+             raw = json.loads(constraints_json) if constraints_json else []
+             parse_constraints(raw, n)  # validates shape + bounds
+         except (ValueError, json.JSONDecodeError) as e:
+             return {"status": "error", "error": f"constraint validation: {e}"}
+
+         progress(0.02, desc="Loading model...")
+         model, skeleton, device = _load_model()
+
+         # Convert the JSON list of dicts into kimodo constraint objects via
+         # the official loader, which accepts a list of dicts directly.
+         from kimodo.constraints import load_constraints_lst
+
+         constraint_lst = load_constraints_lst(raw, skeleton, device=device)
+
+         if seed is not None and int(seed) >= 0:
+             from kimodo.tools import seed_everything
+
+             seed_everything(int(seed))
+
+         progress(0.10, desc=f"Diffusion ({int(num_steps)} steps)...")
+         cfg_kwargs = {"cfg_type": "regular", "cfg_weight": float(cfg)}
+         # Single sample, single prompt. If you want multi-prompt later, this is
+         # where you'd thread it through.
+         output = model(
+             [prompt.strip()],
+             [n],
+             constraint_lst=constraint_lst,
+             num_denoising_steps=int(num_steps),
+             num_samples=1,
+             multi_prompt=True,
+             num_transition_frames=20,
+             return_numpy=True,
+             **cfg_kwargs,
+         )
+
+         progress(0.92, desc="Serializing...")
+
+         # Output keys we know exist (per generate.py): posed_joints, global_rot_mats.
+         # Shapes: posed_joints [n_samples, T, J, 3], global_rot_mats [n_samples, T, J, 3, 3].
+         if "posed_joints" not in output or "global_rot_mats" not in output:
+             return {
+                 "status": "error",
+                 "error": f"unexpected model output keys: {list(output.keys())}",
+             }
+         posed_joints = output["posed_joints"]
+         global_rot_mats = output["global_rot_mats"]
+         if posed_joints.ndim != 4 or global_rot_mats.ndim != 5:
+             return {
+                 "status": "error",
+                 "error": (
+                     f"unexpected shapes: posed_joints={posed_joints.shape}, "
+                     f"global_rot_mats={global_rot_mats.shape}"
+                 ),
+             }
+
+         # Convert global rotation matrices → local (parent-relative) so the
+         # client can apply per-bone rotations directly without doing inverse
+         # FK in the browser.
+         from kimodo.skeleton import global_rots_to_local_rots
+
+         joints_pos_t = torch.from_numpy(posed_joints[0]).to(device)
+         joints_rot_t = torch.from_numpy(global_rot_mats[0]).to(device)
+         local_rot_mats_t = global_rots_to_local_rots(joints_rot_t, skeleton)
+         local_rot_mats = local_rot_mats_t.detach().cpu().numpy().astype(np.float32)
+
+         # Root translation = posed_joints at the root joint index.
+         root_idx = int(getattr(skeleton, "root_idx", 0))
+         root_translation = (
+             joints_pos_t[:, root_idx, :].detach().cpu().numpy().astype(np.float32)
+         )
+
+         # Spot-check the SOMA shape: 30 joints expected for SOMA-RP-v1.1.
+         T, J = local_rot_mats.shape[0], local_rot_mats.shape[1]
+         if (T, J) != (n, 30):
+             return {
+                 "status": "error",
+                 "error": (
+                     f"expected ({n}, 30, 3, 3) for local_rot_mats, got "
+                     f"{local_rot_mats.shape}"
+                 ),
+             }
+
+         # Optional foot_contacts if the model emitted them.
+         foot_contacts_out = None
+         if "foot_contacts" in output:
+             fc = output["foot_contacts"]
+             # Drop the leading sample dim if present
+             if fc.ndim == 3:
+                 fc = fc[0]
+             foot_contacts_out = np.asarray(fc, dtype=np.float32).tolist()
+
+         progress(1.0, desc="Done")
+         return {
+             "status": "ok",
+             "numFrames": int(T),
+             "fps": int(getattr(model, "fps", 30)),
+             "rootTranslation": root_translation.tolist(),
+             "jointRotMats": local_rot_mats.tolist(),
+             "footContacts": foot_contacts_out,
+             "summary": prompt.strip(),
+         }
+     except Exception as e:
+         traceback.print_exc()
+         return {"status": "error", "error": f"{type(e).__name__}: {e}"}
+
+
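Review note: the global-to-local conversion that `kimodo_motion` delegates to `global_rots_to_local_rots` can be illustrated with a minimal NumPy sketch. This is not kimodo's implementation. It assumes a hypothetical `parents` array (parent index per joint, `-1` for the root) and the common convention `global_child = global_parent @ local_child`; the helper names `rot_z` and `global_to_local` are invented for the example.

```python
import numpy as np


def rot_z(theta: float) -> np.ndarray:
    """3x3 rotation about the z axis (illustrative helper)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def global_to_local(global_rots: np.ndarray, parents: np.ndarray) -> np.ndarray:
    """Convert global joint rotations to parent-relative rotations.

    global_rots: [T, J, 3, 3] rotation matrices.
    parents:     [J] parent index per joint, -1 for the root (assumed layout).
    """
    local = global_rots.copy()  # the root keeps its global rotation
    has_parent = parents >= 0
    parent_rots = global_rots[:, parents[has_parent]]  # [T, J', 3, 3]
    # The inverse of a rotation matrix is its transpose.
    parent_inv = np.swapaxes(parent_rots, -1, -2)
    local[:, has_parent] = parent_inv @ global_rots[:, has_parent]
    return local
```

Under the assumed convention, each local rotation is the parent's global rotation transposed, multiplied by the joint's own global rotation, so the client can apply it per bone without running inverse FK.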
+ with gr.Blocks(title="Genga Kimodo") as demo:
+     gr.Markdown(
+         "# Genga × Kimodo\n"
+         "API-only Space. Inference endpoint at `/gradio_api/call/kimodo_motion`.\n\n"
+         "This Space backs the GengaMachines webapp and is not a public sandbox. "
+         "For the official interactive Kimodo demo, see "
+         "[nvidia/Kimodo](https://huggingface.co/spaces/nvidia/Kimodo)."
+     )
+     in_prompt = gr.Textbox(label="Prompt", value="A person waves hello with their right hand.")
+     in_frames = gr.Slider(30, 300, value=90, step=6, label="num_frames (30 fps)")
+     in_seed = gr.Number(value=42, label="seed (use -1 to skip seeding)", precision=0)
+     in_cfg = gr.Slider(1.0, 10.0, value=5.0, step=0.5, label="cfg_weight")
+     in_steps = gr.Slider(10, 50, value=30, step=1, label="num_denoising_steps")
+     in_constraints = gr.Textbox(label="constraints_json", value="[]", lines=4)
+     btn = gr.Button("Generate")
+     out = gr.JSON(label="result")
+
+     btn.click(
+         fn=kimodo_motion,
+         inputs=[in_prompt, in_frames, in_seed, in_cfg, in_steps, in_constraints],
+         outputs=out,
+         api_name="kimodo_motion",
+     )
+
+
+ if __name__ == "__main__":
+     demo.queue(max_size=4).launch(
+         server_name="0.0.0.0",
+         server_port=int(os.environ.get("PORT", 7860)),
+         show_api=True,
+     )
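Review note: a client can sanity-check the JSON envelope returned by `kimodo_motion` before handing it to a renderer. The sketch below relies only on the field names visible in the return dictionary (`status`, `error`, `numFrames`, `rootTranslation`, `jointRotMats`) and on the 30-joint spot check; `check_envelope` is a hypothetical helper and not part of this commit.

```python
from typing import Any


def check_envelope(result: dict[str, Any], expected_joints: int = 30) -> int:
    """Return the frame count of a successful response, or raise ValueError."""
    if result.get("status") != "ok":
        raise ValueError(f"server error: {result.get('error', 'unknown')}")

    n = result["numFrames"]
    root = result["rootTranslation"]
    rots = result["jointRotMats"]

    # rootTranslation is expected to be [numFrames, 3].
    if len(root) != n or any(len(p) != 3 for p in root):
        raise ValueError("rootTranslation must have shape [numFrames, 3]")

    # jointRotMats is expected to be [numFrames, joints, 3, 3].
    if len(rots) != n:
        raise ValueError("jointRotMats must have numFrames entries")
    for frame in rots:
        if len(frame) != expected_joints:
            raise ValueError(f"expected {expected_joints} joints, got {len(frame)}")
        for mat in frame:
            if len(mat) != 3 or any(len(row) != 3 for row in mat):
                raise ValueError("each joint rotation must be a 3x3 matrix")
    return n
```

The `result` dictionary would come from whatever client calls the `kimodo_motion` endpoint, for instance the `gradio_client` package. Because the server reports failures inside the envelope rather than as an HTTP error, checking `status` first keeps the two failure paths distinct.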
start.sh ADDED
@@ -0,0 +1,47 @@
+ #!/usr/bin/env bash
+ # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # SPDX-License-Identifier: Apache-2.0
+ # Forked: replaces the upstream `kimodo_demo` Viser launch with a Gradio API
+ # (server.py) used programmatically by the GengaMachines webapp.
+ set -euo pipefail
+
+ cd /workspace
+
+ # pre-download checkpoints. We use SOMA-RP-v1.1 (per the integration plan); v1
+ # is downloaded too for parity with upstream.
+ python - <<'PY'
+ from huggingface_hub import snapshot_download
+ for repo in ("nvidia/Kimodo-SOMA-RP-v1", "nvidia/Kimodo-SOMA-RP-v1.1"):
+     print(f"snapshot_download({repo}) ...")
+     snapshot_download(repo)
+ print("Checkpoint download complete.")
+ PY
+
+ # launch text encoder (internal-only, port 9550)
+ echo "Starting text-encoder on :9550 ..."
+ kimodo_textencoder &
+ TEXT_ENCODER_PID=$!
+
+ cleanup() {
+   echo "Shutting down text-encoder (pid=${TEXT_ENCODER_PID}) ..."
+   kill "${TEXT_ENCODER_PID}" >/dev/null 2>&1 || true
+ }
+ trap cleanup EXIT
+
+ # wait for the text encoder to be healthy
+ echo "Waiting for text-encoder health ..."
+ for i in $(seq 1 1200); do
+   if curl -fsS "http://127.0.0.1:9550/" >/dev/null 2>&1; then
+     echo "Text-encoder is up."
+     break
+   fi
+   sleep 1
+   if [[ $i -eq 1200 ]]; then
+     echo "ERROR: text-encoder did not become healthy on http://127.0.0.1:9550/ within 1200s" >&2
+     exit 1
+   fi
+ done
+
+ # launch our Gradio API in place of kimodo_demo
+ echo "Starting Gradio API on :7860 ..."
+ exec python -u /workspace/server.py
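Review note: for comparison, the health-wait loop in `start.sh` could be expressed in Python roughly as follows. This is a sketch under assumptions: `http_probe` and `wait_until_healthy` are hypothetical names, and the probe and sleep function are injected so the retry logic can be exercised without a running text encoder. The 1200 attempts and 1 s delay mirror the script.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on `url` succeeds (urlopen raises on 4xx/5xx)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 1200,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds after each failure."""
    for _ in range(attempts):
        if probe():
            return True
        sleep(delay)
    return False
```

A caller mirroring the script would run `wait_until_healthy(lambda: http_probe("http://127.0.0.1:9550/"))` and exit with an error when it returns `False`.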