Upload TASK.md with huggingface_hub
TASK.md
ADDED
|
@@ -0,0 +1,31 @@
| 1 |
+
# TASK — 100M-active MoE, from scratch, physics-sim next-frame prediction, custom minimal vocab
|
| 2 |
+
|
| 3 |
+
Train a Qwen3-style sparse-MoE LM **from scratch** on the physics-simulation
|
| 4 |
+
next-frame-prediction corpus, using a **custom minimal tokenizer** whose vocab
|
| 5 |
+
contains only the tokens needed to emit the simulation text (digits, punctuation,
|
| 6 |
+
structural keywords). Target ~100M active params.
|
| 7 |
+
|
| 8 |
+
## Scaffold
|
| 9 |
+
- Model/trainer: github.com/AlexWortega/moe-200m-qwen3-100b- (Qwen3-MoE: GQA + partial RoPE
|
| 10 |
+
+ QK-Norm + RMSNorm, aux-loss-free sigmoid bias router, 1 shared + N routed top-2 SwiGLU
|
| 11 |
+
experts, tied embed/lm_head, Liger fused-CE, Muon optimizer). `MoEModelConfig` in model.py.
|
| 12 |
+
- Sibling 100M config exists: moe-100m-volta-week (good sizing reference).
|
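The scaffold names an aux-loss-free sigmoid bias router with top-2 routed experts. The following is a minimal pure-Python sketch of one common formulation of that idea, not the repository's code: a per-expert bias affects only which experts are selected, while the gate weights come from the unbiased sigmoid scores. The function names, the step size, and the sign-based bias update are assumptions made for illustration.

```python
import math

def route_top_k(logits, bias, k=2):
    """Pick k experts for one token; return {expert_index: gate_weight}."""
    # Sigmoid affinity score per routed expert.
    scores = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # The bias is added for selection only (assumed formulation).
    biased = [s + b for s, b in zip(scores, bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:k]
    # Gate weights use the unbiased scores, normalised over the chosen experts.
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}

def update_bias(bias, load, step=1e-3):
    """Nudge bias up for under-loaded experts and down for over-loaded ones."""
    mean = sum(load) / len(load)
    new_bias = []
    for b, l in zip(bias, load):
        if l < mean:
            new_bias.append(b + step)
        elif l > mean:
            new_bias.append(b - step)
        else:
            new_bias.append(b)
    return new_bias

# Hypothetical logits for four routed experts.
weights = route_top_k([2.0, 1.0, 0.0, -1.0], [0.0, 0.0, 0.0, 0.0])
```

Because balancing is done through the bias rather than an auxiliary loss term, the training loss stays the plain next-token cross-entropy in this sketch.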
| 13 |
+
|
| 14 |
+
## Data (HF Hub, from the physics-llm project)
|
| 15 |
+
- AlexWortega/physics-scenarios-raw, AlexWortega/physics-scenarios-packed (~900K scenes,
|
| 16 |
+
30 types, 24 train / 6 held-out). Format = the LFM2 serialization (Scene/Gravity/Frame/obj_...).
|
| 17 |
+
|
| 18 |
+
## Key requirement — custom vocab
|
| 19 |
+
Vocab = ONLY simulation tokens (tens–low-hundreds). With tied embeddings, shrinking vocab from
|
| 20 |
+
151,936 → ~100 frees ~97M embedding params, so the whole ~100M budget goes to the MoE/dynamics
|
| 21 |
+
(vs the 350M LFM2 whose huge vocab embeddings ate the budget). Drop free-text Scene/Frame
|
| 22 |
+
descriptions (not needed for physics); keep Type as a categorical token.
|
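The ~97M figure above can be checked with simple arithmetic. The sketch below assumes a hidden size of 640, which is not stated in the task and is chosen only because 151,936 × 640 comes out close to 97M; substitute the real `MoEModelConfig` value.

```python
def embedding_params(vocab_size, hidden_size, tied=True):
    """Parameters in the token embedding (plus lm_head when untied)."""
    per_matrix = vocab_size * hidden_size
    return per_matrix if tied else 2 * per_matrix

HIDDEN_SIZE = 640  # assumption, not taken from the task description

large_vocab = embedding_params(151_936, HIDDEN_SIZE)  # 97,239,040
tiny_vocab = embedding_params(100, HIDDEN_SIZE)       # 64,000
freed = large_vocab - tiny_vocab                      # 97,175,040

print(f"freed embedding params: {freed / 1e6:.1f}M")
```

With untied embeddings the saving would double, since the output projection scales with the vocabulary as well.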
| 23 |
+
|
| 24 |
+
## Success metric
|
| 25 |
+
Pymunk position error as % of scene diagonal (same as LFM2 baseline), via the existing harness
|
| 26 |
+
at /Users/aleksandrnikolich/Desktop/vae_llm/physics_blog/bench (physics_core.rollout). Baseline
|
| 27 |
+
to beat: LFM2-350M bf16 — @15f trained 0.38% / held-out 0.93%; orbit 0.75% @80f.
|
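As a rough illustration of the metric, position error as a percentage of the scene diagonal can be computed as below. This is a sketch under assumptions, not the existing harness: the function name, the argument layout, and the choice to average over objects are hypothetical, and `physics_core.rollout` may aggregate differently.

```python
import math

def position_error_pct(pred, truth, width, height):
    """Mean Euclidean position error as a percentage of the scene diagonal.

    pred and truth are equal-length lists of (x, y) positions for the same
    objects at the same frame; width and height are the scene dimensions.
    """
    diagonal = math.hypot(width, height)
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred, truth)]
    return 100.0 * (sum(dists) / len(dists)) / diagonal

# Hypothetical 300 x 400 scene (diagonal 500) with two objects.
err = position_error_pct([(0.0, 0.0), (10.0, 0.0)],
                         [(3.0, 4.0), (10.0, 0.0)],
                         width=300.0, height=400.0)
print(err)  # 0.5
```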
| 28 |
+
|
| 29 |
+
## High-impact unknowns -> Clarify
|
| 30 |
+
- experiment budget (GPU-h / wall-clock); GPU choice (eva02 A6000 vs eva01 4xV100)
|
| 31 |
+
- tokenizer/number encoding (char-level vs tiny-BPE) [genuine 2-path fork]
|
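To make the char-level branch of the tokenizer fork concrete, here is a minimal sketch of a vocabulary built from structural keywords plus single characters, with longest-match encoding. The keyword list, the character set, and the sample line are assumptions for illustration; the actual LFM2 serialization may use different fields and punctuation.

```python
# Hypothetical keyword and character inventory; adjust to the real serialization.
KEYWORDS = sorted(["Scene", "Gravity", "Frame", "Type", "obj_"],
                  key=len, reverse=True)  # try longer keywords first
CHARS = list("0123456789.-,:;=()[] \n")
VOCAB = KEYWORDS + CHARS
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

def encode(text):
    """Match a keyword at each position if possible, else one character."""
    ids, i = [], 0
    while i < len(text):
        for kw in KEYWORDS:
            if text.startswith(kw, i):
                ids.append(TOKEN_TO_ID[kw])
                i += len(kw)
                break
        else:
            ch = text[i]
            if ch not in TOKEN_TO_ID:
                raise ValueError(f"character {ch!r} is not in the vocab")
            ids.append(TOKEN_TO_ID[ch])
            i += 1
    return ids

def decode(ids):
    """Concatenate the token strings back into text."""
    return "".join(VOCAB[i] for i in ids)

sample = "Frame 1: obj_0 12.5,-3.0\n"  # made-up line, not real corpus data
ids = encode(sample)
```

Encoding each digit as its own token keeps the vocab at a few dozen entries at the cost of longer sequences; a tiny-BPE variant would trade a somewhat larger vocab for shorter sequences.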