"""Robot Learning Paradigms — a Gradio explorer. A layered ontology for robot learning. The app separates control substrate, learning objective, predictive/world model, architecture, data regime, and deployment role so modern foundation-model robotics is not forced into one flat algorithm list. Run: python robot_paradigms_app.py """ from __future__ import annotations import html as html_lib import gradio as gr import pandas as pd # --------------------------------------------------------------------------- # FAMILY METADATA (color chip + short description) # --------------------------------------------------------------------------- FAMILY = { "BC": ("#2563eb", "Behavioral Cloning — match the expert's action distribution: min_θ D[π_θ ‖ π_D]. The leaves are different action-heads (objectives) for the same parent."), "Reinforcement": ("#ea580c", "Maximize expected return by trial and error."), "Offline RL": ("#dc2626", "RL on a fixed log + a behavior-support constraint."), "Inverse RL": ("#9333ea", "Recover the reward (or a discriminator) from demos."), "Model-Based": ("#7c3aed", "Learn a forward model; plan or train inside it."), "Sequence": ("#ca8a04", "Cast control as generation over actions or trajectories."), "Goal-Cond.": ("#16a34a", "Condition π on a goal; relabel failures by what was reached."), "Hierarchical": ("#92400e", "Two-level: high-level picks subgoals/skills, low-level executes."), "Meta-Learning": ("#db2777", "Train across tasks so adaptation needs only a few steps / episodes."), "LLM-Orchestration": ("#0891b2", "LLM/VLM composes skills or emits constraints — no policy gradient."), } # --------------------------------------------------------------------------- # FAMILY-LEVEL OBJECTIVES (shown above every leaf in that family) # --------------------------------------------------------------------------- FAMILY_EQUATIONS = { "BC": r"\min_\theta\;\; D\!\left[\,\pi_\theta(a\mid o)\,\Big\|\,\pi_{\mathcal{D}}(a\mid o)\,\right]" r"\quad\text{(the divergence 
depends on the action-head — sub-tree below)}", "Reinforcement": r"\max_\pi\;\;\mathbb{E}_{\tau\sim\pi}\!\left[\,\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\,\right]", "Offline RL": r"\max_\pi\;\;\mathbb{E}_{s\sim\mathcal{D}}\!\left[\,Q(s,\pi(s))\,\right]" r"\quad\text{s.t.}\quad \pi(\cdot\mid s)\approx \pi_{\mathcal{D}}(\cdot\mid s)", "Inverse RL": r"\text{recover }\;r\;\text{ (or discriminator }D\text{)}\;\text{such that }\;" r"\arg\max_\pi \mathbb{E}_\pi\!\left[\sum\gamma^t r\right]\;\text{matches the expert.}", "Model-Based": r"\max_{a_{t:t+H}}\sum_{k=0}^{H}\gamma^{k}\,\hat r(\hat s_{t+k},a_{t+k})\;," r"\;\;\hat s_{t+1}=\hat f_\phi(\hat s_t,a_t)\;\;\text{(plan, or train }\pi\text{ inside)}", "Sequence": r"\text{Train a generative model over actions or trajectories; \textit{sample} at inference.}\;\;" r"p_\theta(a_t\mid\text{context})\;\;\text{or}\;\;p_\theta(\tau)", "Goal-Cond.": r"\pi(a\mid s, g)\;\;\;\text{HER relabel:}\;\;(s_t,a_t,s_{t+1},g)\;\to\;(s_t,a_t,s_{t+1},\,g'\!=\!s_T)", "Hierarchical": r"\pi_{\text{hi}}(z_k\mid s_{kT})\;\cdot\;\pi_{\text{lo}}(a_t\mid s_t, z_k)\;," r"\;\;\text{options terminate per }\beta(s)", "Meta-Learning": r"\min_\theta\;\;\mathbb{E}_{T\sim p(T)}\!\left[\,\mathcal{L}_T\!\big(\theta - \alpha\,\nabla_\theta \mathcal{L}_T(\theta)\big)\,\right]" r"\quad\text{(MAML form)}", "LLM-Orchestration": r"\text{plan}=\mathrm{LLM/VLM}(\text{instr},\,\text{scene})\;\;\Rightarrow\;\;" r"\text{execute via pretrained skills or }\arg\min_\tau\int \mathcal{C}_\text{VLM}(\tau(t))\,dt" r"\quad(\text{no policy gradient})", } # --------------------------------------------------------------------------- # FAMILY RELATIONS — techniques that share the parent equation but aren't # their own leaf. Rendered as a "Same objective, different wrapper" block. 
# ---------------------------------------------------------------------------
FAMILY_RELATIONS = {
    "BC": [
        ("VLA Foundation Models",
         r"\pi_\theta(a\mid o,\ell)\;\;\text{with}\;\;\theta\;\text{init from a VLM};\;"
         r"\text{action-head}\,\in\,\{\text{Flow}, \text{Diffusion}, \text{Tokenized}\}",
         "Not a new objective. The loss is one of the four heads in this sub-tree; "
         "what makes a model a VLA is (i) the trunk is initialised from a pretrained Vision-Language "
         "Model and (ii) fine-tuning runs on multi-embodiment / Open-X-Embodiment / DROID-scale data. "
         "Tokenized BC: RT-1, RT-2, RT-H, RT-X, OpenVLA, π₀-FAST, Gato, HPT, Gemini Robotics, "
         "Helix. Flow Matching: π₀ / π₀.5 / π₀.6 / OpenPI. Diffusion: RDT-1B, "
         "Octo, GR00T-N1."),
        ("DAgger (Dataset Aggregation)",
         r"\mathcal{D}_{i+1} = \mathcal{D}_i \cup \{(s,\pi^*(s)) : s\sim d^{\pi_i}\}",
         "Same BC loss — only the state distribution changes. Visit states with the current "
         "learner, label them with an online expert, retrain. Mitigates BC's compounding error "
         "under covariate shift."),
        ("Action Chunking (ACT / ALOHA)",
         r"\pi_\theta(a_{t:t+H}\mid o_t)\;\;\text{+ temporal ensemble}\;\;"
         r"a^{\text{exec}}_t = \frac{\sum_{k\,:\,t\in\text{chunk}_k} w_k\,a_t^{(k)}}{\sum_{k\,:\,t\in\text{chunk}_k} w_k}",
         "Same loss family; the action target is an H-step chunk, decoded jointly, then "
         "ensembled across overlapping chunks at execution with a weighted average "
         "(ACT uses exponential weights w_k). Policy behind ALOHA bimanual manipulation."),
        ("MSE-BC (classic supervised regression)",
         r"\mathcal{L} = \mathbb{E}_{(o,a)\sim\mathcal{D}}\!\left[\,\|a - \mu_\theta(o)\|^2\,\right]",
         "Degenerate single-mode case of a Gaussian flow / diffusion head — collapses "
         "multi-modal demos to the mean. This is the failure mode that motivated every leaf above."),
        ("Visual SSL Pretraining (R3M, MVP, VIP, Voltron)",
         r"\phi^* = \arg\min_\phi\,\mathcal{L}_{\text{SSL}}(\phi;\,\text{web video})\;\;\Rightarrow\;\;\pi_\psi(a\mid \phi(o))",
         "Orthogonal pretraining step, not a policy objective. 
Pretrain an encoder φ " "with contrastive / masked / value-based losses on Ego4D or web video, freeze it, then run " "any head above on φ(o)."), ], "Reinforcement": [ ("Domain Randomization / Sim-to-Real", r"\max_\pi\;\;\mathbb{E}_{\xi\sim p(\xi)}\,\mathbb{E}_{\tau\sim\pi,\,\mathrm{sim}_\xi}\!\left[\,\sum_t \gamma^t r_t\,\right]", "Same RL gradient inside an outer expectation over physics / visual params ξ. " "Not a new objective; it's a training-time recipe that makes the policy robust enough to " "transfer to the real robot. Backbone of every modern legged-locomotion deployment " "(ANYmal, RMA, Extreme Parkour, OpenAI Dactyl)."), ], "Sequence": [ ("Decision Transformer as an architecture receptacle", r"\text{tokens}=[\hat R_t,s_t,a_{t-1},\ldots]\;\xrightarrow{\text{causal Transformer}}\;p_\theta(a_t\mid\text{history},\hat R_t)", "Decision Transformer is not a new robot objective in the same sense as PPO or SAC. It is a " "sequence-modeling architecture that can hold several objectives: supervised BC-style next-action " "prediction, offline RL through return-to-go conditioning, goal-conditioned control when the return " "token is replaced by a goal token, or long-horizon planning when the output is a trajectory. 
In the " "tree it lives under Sequence because the core move is to turn a control problem into token prediction, " "but it overlaps with Offline RL and BC."), ("Long-range sequence control", r"p_\theta(a_{1:T}\mid c)=\prod_t p_\theta(a_t\mid a_{ str: color, _ = FAMILY[family] return ( f'' f'{family.upper()}' ) # --------------------------------------------------------------------------- # PARADIGMS — 20 equation-distinct leaves under 10 families # --------------------------------------------------------------------------- PARADIGMS: list[dict] = [ # ==================== BEHAVIORAL CLONING (BC) — 4 action heads ==================== dict( id="flow-matching-policy", name="Flow Matching Policy", family="BC", tagline="Learn a velocity field that flows noise into expert actions.", mapping="o → a via ODE integration of v_θ(a^t, o, t)", math=( r"\mathcal{L}(\theta) = \mathbb{E}_{t\sim U[0,1],\,a^0\sim\mathcal{N},\,a^1\sim\mathcal{D}}" r"\!\left[\,\bigl\|\,v_\theta(a^t, o, t) - (a^1 - a^0)\,\bigr\|^2\,\right]\;," r"\;\;a^t = (1\!-\!t)\,a^0 + t\,a^1\;\;\Rightarrow\;\;" r"a^1 = a^0 + \int_0^1 v_\theta(a^t, o, t)\,dt" ), intuition=( "Pick a noise sample a⁰ ~ 𝒩(0, I) and an expert action a¹; the conditional optimal-" "transport path between them has constant velocity (a¹ − a⁰). Train v_θ to match that " "velocity at random points along the path. At inference, integrate the ODE from noise " "to action. Multi-modal-aware like diffusion, but with straighter paths → fewer " "function evaluations (often 4–10 vs 50–100 for DDPM)." 
), key_papers=["π₀ (Physical Intelligence 2024)", "π₀.5 / π₀.6 (2025)", "OpenPI (2025)", "Conditional Flow Matching (Lipman 2023, original method)"], pros=["Multi-modal-aware (no mean collapse)", "Fewer NFEs than DDPM at similar quality", "Straight-line target makes training stable", "Drop-in action head for VLA backbones"], cons=["Still needs ODE solve at inference", "Sensitive to action normalisation", "Theory newer than diffusion → fewer ablations in literature"], when="Modern default for high-precision, multi-modal manipulation from demos — especially " "as the action head of a VLA (π₀ family).", ), dict( id="diffusion-policy", name="Diffusion Policy", family="BC", tagline="Generate actions by k-step denoising of Gaussian noise.", mapping="o → a via k-step denoising", math=( r"\mathcal{L}(\theta) = \mathbb{E}_{k,\,a,\,\epsilon\sim\mathcal{N}}" r"\!\left[\,\|\epsilon - \epsilon_\theta(a^k, o, k)\|^2\,\right]\;," r"\;\;a^{k-1} = a^{k} - \alpha\,\epsilon_\theta(a^{k}, o, k) + \sigma_k z\;,\;\;z\sim\mathcal{N}(0,I)" ), intuition=( "Treat π(a|o) as the reverse of a noising process: corrupt expert actions to noise across " "k steps, train ε_θ to predict the noise that was added, then denoise from pure noise at " "inference. The score-based parameterisation handles multi-modal demos (left vs right " "around an obstacle) gracefully — MSE-BC would average them into a wall." ), key_papers=["Diffusion Policy (Chi 2023)", "3D Diffusion Policy / DP3 (Ze 2024)", "Equivariant Diffusion Policy (Wang 2024)", "Consistency Policy (Prasad 2024)", "Diffusion-EDFs (Ryu 2023)", "RDT-1B (Liu 2024, as VLA action head)", "Octo (2024, diffusion head VLA)"], pros=["Captures multi-modal demos", "Strong empirical SOTA on manipulation", "Score head plugs cleanly into any encoder (incl. 
VLM trunks)"], cons=["k-step denoising at inference (slower than flow with straight paths)", "Sensitive to action normalisation & horizon", "DDIM / consistency tricks needed for real-time control"], when="Multi-modal demos, contact-rich manipulation, or as the diffusion action head of a VLA.", ), dict( id="tokenized-bc", name="Tokenized / Categorical BC", family="BC", tagline="Discretise actions, predict them autoregressively like text tokens.", mapping="o → (a^{(1)}, …, a^{(K)}) with a^{(j)} ∈ vocab 𝒱", math=( r"a \to (a^{(1)},\dots,a^{(K)}),\;\;a^{(j)}\in\mathcal{V}\;;\;\;\;" r"\mathcal{L}(\theta) = -\sum_{j=1}^{K}\log p_\theta\!\left(a^{(j)}\,\bigg|\,o,\,a^{(pessimism: CQL pushes " "down Q on OOD actions, IQL learns an expectile-regression value and never queries Q on " "OOD actions, AWAC turns the policy step into advantage-weighted BC, TD3+BC adds an " "‖π(s) − a_𝒟‖² penalty. All share the same objective shape above." ), key_papers=["BCQ (2019)", "CQL (Kumar 2020)", "IQL (Kostrikov 2021)", "AWAC (Nair 2020)", "TD3+BC (Fujimoto 2021)", "BEAR (2019)", "EDAC (2021)", "ReBRAC (2023)"], pros=["Uses logged data only — no env interaction needed", "Much safer than naive offline AC"], cons=["Bounded by data quality + state coverage", "Regulariser strength is fragile"], when="You have logged trajectories with rewards and you cannot interact with the system.", ), # ==================== INVERSE RL / ADVERSARIAL ==================== dict( id="maxent-irl", name="MaxEnt IRL (Recover the Reward)", family="Inverse RL", tagline="First infer r̂, then optimise it with RL.", mapping="demos → r̂(s, a) → RL → π", math=( r"\max_r\;\Big[\,\min_\pi\;\mathbb{E}_\pi[-r]\;-\;\mathbb{E}_{\pi^*}[-r]\,\Big]" r"\;-\;\psi(r)\quad\text{(MaxEnt: }\psi\,\text{= entropy regulariser on the implied }\pi_r\text{)}" ), intuition=( "Find a reward under which the expert is optimal (or near-optimal under a max-entropy " "prior); then run any RL algorithm on r̂. 
The recovered reward generalises to states " "the expert never visited — unlike BC, which only knows how to copy." ), key_papers=["MaxEnt IRL (Ziebart 2008)", "Guided Cost Learning (Finn 2016)", "f-IRL (Ni 2020)"], pros=["Generalises beyond the demo support", "Reusable, transferable reward"], cons=["Bilevel optimisation is hard", "Reward is non-identifiable (many r explain demos)"], when="You want a reusable reward function, not just a policy.", ), dict( id="gail", name="GAIL / AIRL (Adversarial Imitation)", family="Inverse RL", tagline="GAN for behaviour — fool a discriminator that knows the expert.", mapping="demos → D → shaped reward log D → RL → π", math=( r"\min_\pi\;\max_D\;\;\mathbb{E}_{(s,a)\sim\pi^*}\!\left[\,\log D(s,a)\,\right]" r"\;+\;\mathbb{E}_{(s,a)\sim\pi}\!\left[\,\log(1 - D(s,a))\,\right]" ), intuition=( "Train a discriminator to tell expert (s,a) from learner (s,a); the learner's shaped " "reward becomes 'how well I fooled D' (≈ −log(1 − D)). No explicit reward needed — " "just demos and rollouts. AIRL recovers an interpretable reward as a by-product." 
), key_papers=["GAIL (Ho & Ermon 2016)", "AIRL (Fu 2017)", "SQIL (Reddy 2019)"], pros=["No reward design needed", "Strong even from a few demos"], cons=["GAN-style instability", "Needs env interaction (it's still RL inside)"], when="You have demos and a simulator but no reward signal.", ), # ==================== MODEL-BASED / WORLD MODELS ==================== dict( id="forward-dynamics-mpc", name="Forward-Dynamics + MPC", family="Model-Based", tagline="Fit f̂(s, a)→s' in observation space; replan every step.", mapping="(s, a) → ŝ' then plan over ŝ-rollouts", math=( r"\hat s_{t+1} = \hat f_\phi(\hat s_t, a_t)\;\;\Rightarrow\;\;" r"a^*_{t:t+H} = \arg\max_{a_{t:t+H}}\sum_{k=0}^{H}\gamma^k\,\hat r(\hat s_{t+k}, a_{t+k})" r"\quad\text{(CEM / MPPI / shooting)}" ), intuition=( "Learn forward dynamics from interaction; at each step, sample candidate action " "sequences, roll them forward in the model, score them with the reward, and execute the " "first action of the best sequence (then re-plan). Way more sample-efficient than " "model-free — your robot 'thinks before it acts'." 
        ),
        key_papers=["PILCO (Deisenroth & Rasmussen 2011)", "PETS (Chua 2018)",
                    "Visual Foresight (Finn & Levine 2017)"],
        pros=["Very sample-efficient", "Same model reusable for any reward"],
        cons=["Model error compounds over the horizon", "Long-horizon planning is hard"],
        when="Real-robot RL where each rollout is expensive.",
    ),
    dict(
        id="latent-imagination",
        name="Latent Imagination (Dreamer / TD-MPC / MuZero)",
        family="Model-Based",
        tagline="Dream in a compact latent space; train the actor inside the dream.",
        mapping="φ(o)→z; ẑ_{t+1} ~ p_φ(·|ẑ_t, a_t); train π on imagined rollouts",
        math=(
            r"\hat z_{t+1}\!\sim\!p_\phi(\,\cdot\!\mid\!\hat z_t, a_t)\,,\;\;"
            r"\hat r_t\!\sim\!p_\phi(\,\cdot\!\mid\!\hat z_t)\;\;\Rightarrow\;\;"
            r"\max_\theta\;J(\theta) = \mathbb{E}_{\hat\tau\sim\text{dream}}\!\left[\,\sum_t \gamma^t \hat r_t\,\right]"
        ),
        intuition=(
            "Encode pixels into compact latents; learn a recurrent latent dynamics + reward head; "
            "train an actor-critic on cheap imagined rollouts entirely in latent space. "
            "Dreamer V3 solves Atari, DMC, and Minecraft with one set of hyperparameters; MuZero "
            "uses the same idea but plans with MCTS over the learned model."
), key_papers=["World Models (Ha 2018)", "PlaNet (Hafner 2019)", "Dreamer V1/V2/V3 (2020–23)", "TD-MPC / TD-MPC2 (Hansen 2022, 2024)", "MuZero (Schrittwieser 2020)", "EfficientZero (Ye 2021)", "DayDreamer (Wu 2022)"], pros=["Extremely sample-efficient", "Plans / trains in compact latents", "Same recipe scales across domains"], cons=["Model bias amplifies in dream", "Heavier engineering than PPO"], when="High-dim observations, limited real-interaction budget.", ), dict( id="generative-video-wm", name="Generative Video World Model", family="Model-Based", tagline="A neural simulator that predicts future video; action conditioning is optional but crucial for control.", mapping="context → future video where context may include text, goals, or actions", math=( r"p_\theta(o_{t+1:t+H}\mid o_{\le t}, c)\quad" r"c\in\{\ell, g, a_{t:t+H}, \text{mask / mixed conditions}\}" ), intuition=( "This is the umbrella for video-based world models. The key design choice is what the " "conditioning variable c contains. If c contains future actions, the model is a simulator. " "If c is only text or a goal, the model is closer to a video planner or world action model. " "The same pretrained video backbone can become either branch." 
), key_papers=["UniSim (Yang 2023)", "GAIA-1 (Wayve 2023)", "Genie / Genie-2 / Genie-3 (DeepMind 2024–25)", "UniPi (Du 2023)"], pros=["Uses video pretraining", "Can represent rich scene change", "Can become WAM, AC-WM, or flexibly conditioned WM"], cons=["Expensive to train", "Pixel-level prediction can waste capacity", "Control depends on the conditioning interface"], when="You want a foundation model over physical scene evolution rather than just a policy head.", ), dict( id="action-conditioned-wm", name="Action-Conditioned World Model (AC-WM)", family="Model-Based", tagline="Predict what would happen if the robot executed a proposed future action sequence.", mapping="(o_t, a_{t:t+H}) → o_{t+1:t+H} or z_{t+1:t+H}", math=( r"p_\theta(o_{t+1:t+H}\mid o_{\le t},a_{t:t+H})\quad\text{or}\quad" r"p_\theta(z_{t+1:t+H}\mid z_t,a_{t:t+H})" ), intuition=( "Actions go into the model. That makes the model a counterfactual simulator: try action " "sequence A, predict the future; try sequence B, predict a different future. This is the " "branch needed for closed-loop policy evaluation, RL inside the world model, model-predictive " "control, and learning from failures or autonomous play." 
), key_papers=["World Models (Ha 2018)", "Dreamer family", "Veo-Robotics (2025)", "Ctrl-World (2025)", "DreamDojo (2026)", "PlayWorld (2026)", "World-Gymnast (2026)", "WorldGym (2025)", "V-JEPA 2 (2025)"], pros=["Counterfactual rollouts", "Can use success, failure, and play data", "Enables RL, MPC, policy evaluation, and fine-grained planning"], cons=["Action representation is embodiment-specific", "Harder to preserve generic video pretraining", "Must model bad actions as well as good ones"], when="You need a real simulator-like model: evaluating candidate actions, training in imagination, or debugging policies.", ), dict( id="world-action-model", name="World Action Model (WAM)", family="Model-Based", tagline="Generate a successful future video and decode the actions that realize it.", mapping="(o_t, instruction) → (future video, actions)", math=( r"p_\theta(o_{t+1:T},a_{t:T}\mid o_t,\ell)\quad\text{or}\quad" r"p_\theta(o_{t+1:T}\mid o_t,\ell)\;p_\phi(a_{t:T}\mid o_{t:T})" ), intuition=( "Actions come out of the model. A WAM is best understood as a world-model-powered policy " "proposal generator: given the scene and instruction, imagine a successful execution, then " "recover actions from that imagined video. It belongs under world models because video " "generation is central, but behaviorally it overlaps with BC and VLA policies." 
), key_papers=["DreamZero (2026)", "Large Video Planner (2025)", "mimic-video (2025)", "VideoPolicy / Video Generators are Robot Policies (2025)", "Unified Video Action Model (2025)", "Cosmos Policy (2026)"], pros=["Preserves image+text video pretraining", "Easier learning target: successful executions", "Good best-of-N action proposals", "Cross-embodiment video trunk is natural"], cons=["Weak counterfactual reasoning", "Hard to use unlabeled failures without relabeling", "Policy evaluation and closed-loop RL still need an action-conditioned simulator"], when="You want strong action proposals from video pretraining, especially for instruction-following manipulation.", ), dict( id="occupancy-latent-wm", name="Occupancy / Latent State World Model", family="Model-Based", tagline="Predict compact physical state, occupancy, contact, or latent scene dynamics instead of raw pixels.", mapping="(z_t or 3D state, a_{t:t+H}) → z_{t+1:t+H}, occupancy, contacts, cost", math=( r"z_t=\phi(o_{\le t}),\quad p_\theta(z_{t+1}\mid z_t,a_t),\quad" r"\hat c_t=\psi(z_t)\;\text{for occupancy/contact/cost}" ), intuition=( "Occupancy and latent-state models are not separate from world models; they are a " "representation choice inside the AC-WM branch. Instead of predicting RGB pixels, the model " "predicts a planning-friendly state: free space, object occupancy, contacts, or a learned latent. " "This often makes manipulation and navigation planning more direct." 
        ),
        key_papers=["PlaNet / Dreamer latent dynamics", "TD-MPC / TD-MPC2",
                    "V-JEPA 2 latent prediction", "occupancy-flow and neural scene dynamics lines"],
        pros=["More planning-friendly than pixels", "Usually cheaper than video generation",
              "Connects naturally to MPC, collision checking, and cost maps"],
        cons=["May lose visual detail", "Representation design matters",
              "Needs the right state abstraction for the task"],
        when="You care about physical feasibility, collision, contact, or long-horizon planning more than photorealistic video.",
    ),
    # ==================== SEQUENCE-MODEL CONTROL ====================
    dict(
        id="decision-transformer",
        name="Decision Transformer (Return-Conditioned Sequence Model)",
        family="Sequence",
        tagline="Autoregress actions like tokens, conditioned on return-to-go.",
        mapping="(R̂_t, s_t, R̂_{t−1}, s_{t−1}, a_{t−1}, …) → a_t",
        math=(
            r"a_t\sim p_\theta\!\left(\,a_t\,\Big|\,\hat R_t,\,s_t,\,\hat R_{t-1},\,s_{t-1},\,a_{t-1},\,\dots\,\right)\;,"
            r"\;\;\hat R_t = \sum_{t'\ge t} r_{t'}\;\;\text{(return-to-go conditioning)}"
        ),
        intuition=(
            "No Bellman, no bootstrap, no critic. Treat trajectories as token sequences and train a "
            "transformer to predict the next action conditioned on a desired return-to-go "
            "(plus past states/actions). At inference, condition on a high target return and let "
            "the transformer generate the action sequence intended to achieve it."
), key_papers=["Decision Transformer (Chen 2021)", "Trajectory Transformer (Janner 2021)", "Gato (Reed 2022)"], pros=["Single supervised objective", "Multi-task friendly", "Cleanly reuses LM infra"], cons=["Return-conditioning can be brittle (overestimation of achievable returns)", "No principled stitching of suboptimal trajectories"], when="Offline data with rewards; you want a transformer-friendly RL formulation.", ), dict( id="trajectory-diffusion", name="Trajectory Diffusion (Diffuser)", family="Sequence", tagline="Denoise whole (state, action) trajectories; guide with a goal/reward.", mapping="goal / reward → τ = (s_1, a_1, …, s_T, a_T)", math=( r"p_\theta(\tau)\;\;\text{diffusion model over}\;\;\tau=(s_1,a_1,\dots,s_T,a_T)\;;" r"\;\;\;\text{plan}\;=\;\mathrm{sample}\!\left(\,p_\theta(\tau)\;\Big|\;\nabla_\tau \log p(g\mid\tau)\,\right)" ), intuition=( "Don't just denoise actions — denoise the entire trajectory of states and actions, then " "guide the sampler with classifier-style gradients for goals or rewards. Planning " "becomes Bayesian inference: sample plans from the posterior p(τ | g)." ), key_papers=["Diffuser (Janner 2022)", "Decision Diffuser (Ajay 2022)"], pros=["Unifies planning + control in one model", "Flexible task / goal conditioning"], cons=["Slow inference (denoise the whole τ)", "Discrete actions are awkward"], when="Offline data, long horizons, flexible / re-specifiable goal conditioning.", ), # ==================== GOAL-CONDITIONED ==================== dict( id="goal-conditioned", name="Goal-Conditioned + Hindsight Relabeling", family="Goal-Cond.", tagline="One policy commandable to any reachable goal; turn failures into supervision.", mapping="(s, g) → a", math=( r"\pi(a\mid s, g)\;\;\;\;\text{Hindsight Experience Replay:}\;\;\;" r"(s_t,a_t,s_{t+1},g)\;\to\;(s_t,a_t,s_{t+1},\,g'\!=\!s_T)" ), intuition=( "Condition the policy on a goal; relabel each failed rollout by treating whatever was " "actually reached as the 'intended' goal. 
Every episode becomes successful supervision " "for some goal — turns sparse-reward tasks into dense supervision." ), key_papers=["UVFA (Schaul 2015)", "HER (Andrychowicz 2017)", "GCSL (Ghosh 2019)", "RIG (Nair 2018)", "Play-LMP (Lynch 2019)"], pros=["Unifies many tasks into one policy", "Free supervision via relabeling"], cons=["Goal space must be observable / specifiable", "Sparse-reward exploration outside data support is still hard"], when="Many related tasks differing only by a goal you can specify or observe.", ), # ==================== HIERARCHICAL ==================== dict( id="hrl", name="Hierarchical (Options / Subgoal HRL)", family="Hierarchical", tagline="High level picks skills/subgoals; low level executes them.", mapping="π_hi(z | s); π_lo(a | s, z)", math=( r"\pi_{\text{hi}}(z_k\mid s_{kT})\,,\;\;\pi_{\text{lo}}(a_t\mid s_t, z_k)\,,\;\;" r"\text{option terminates per }\beta(s)\in[0,1]" ), intuition=( "Long-horizon tasks become tractable when chunked into reusable skills (options or " "subgoals). The high level operates at a coarse timescale and the low level executes " "primitives — better credit assignment, transferable skills." 
), key_papers=["Options (Sutton, Precup, Singh 1999)", "Option-Critic (Bacon 2017)", "FeUdal Networks (Vezhnevets 2017)", "HIRO (Nachum 2018)", "HAC (Levy 2017)"], pros=["Better long-horizon credit assignment", "Skills are transferable across tasks"], cons=["Complicated training (joint hi/lo + termination)", "Skill boundary discovery is hard"], when="Long, compositional tasks (kitchens, assembly, sequencing).", ), # ==================== META-LEARNING ==================== dict( id="meta-learning", name="Meta-Learning Policies (MAML / RL² / PEARL)", family="Meta-Learning", tagline="Learn to learn — adapt to a new task in a few steps / episodes.", mapping="task distribution p(T) → fast-adapting π", math=( r"\theta^* = \arg\min_\theta\;\;\mathbb{E}_{T\sim p(T)}\!\left[\,\mathcal{L}_T\!\big(\theta - \alpha\,\nabla_\theta \mathcal{L}_T(\theta)\big)\,\right]" r"\quad(\text{MAML})\;\;\;\Big|\;\;\;z\sim q(z\mid\text{history})\;(\text{PEARL: context-inferred latent})" ), intuition=( "Train across many tasks so that either (a) a few gradient steps on a new task get you " "to a good policy — MAML — or (b) inferring a task-latent z from a short interaction " "history is enough to specialise — PEARL. Two distinct realisations of 'learning to " "learn'; same outer objective shape." 
), key_papers=["MAML (Finn 2017)", "RL² (Duan 2016)", "PEARL (Rakelly 2019)", "VariBAD (Zintgraf 2020)", "ProMP (Rothfuss 2018)"], pros=["Few-shot adaptation to new tasks", "Principled multi-task formulation"], cons=["Needs a rich task distribution", "Inner loop is expensive at train time"], when="Many related tasks where rapid adaptation matters more than peak per-task performance.", ), # ==================== LLM / VLM ORCHESTRATION ==================== dict( id="llm-planner", name="LLM-as-Planner / Code-as-Policies", family="LLM-Orchestration", tagline="LLM composes pre-existing skills; no policy gradient.", mapping="(language, scene) → plan / Python program → skill calls", math=( r"\text{plan} = \mathrm{LLM}(\text{instruction},\,\text{scene description},\,\text{API})\;;" r"\;\;\text{exec}(\text{plan})\;\to\;a_{1:T}" ), intuition=( "Don't use the LLM for raw actuator output — use its reasoning to orchestrate " "vetted perception + motion primitives. Ground the LLM via scene captions, value " "functions, or affordances (SayCan), and let it write Python that calls your skill " "library (Code-as-Policies)." ), key_papers=["SayCan (Ahn 2022)", "Inner Monologue (Huang 2022)", "Code-as-Policies (Liang 2023)", "ProgPrompt (Singh 2023)"], pros=["Zero-shot task composition from language", "Human-readable plans / programs", "No policy training required"], cons=["Quality of the skill library = ceiling of behaviour", "Grounding is brittle"], when="Long-horizon, semantically rich tasks with a decent skill library.", ), dict( id="vlm-affordance", name="VLM-Affordance / Spatial Programs (VoxPoser / ReKep)", family="LLM-Orchestration", tagline="VLM emits where to act; a classical solver finds the trajectory.", mapping="(language, scene) → 3D cost / keypoint constraints → argmin_τ", math=( r"\mathcal{C}(x) = \mathrm{LLM\!+\!VLM}(\text{prompt},\,\text{scene})\;\;\Rightarrow\;\;" r"\tau^* = \arg\min_\tau \int_0^T \mathcal{C}(\tau(t))\,dt\quad\text{(s.t. 
kinematic constraints)}" ), intuition=( "Instead of asking a VLM for joint angles, ask it for a 3D voxel cost map or a set of " "relational keypoint constraints. A classical trajectory optimiser then minimises the " "VLM-defined cost subject to kinematics — combining open-vocabulary semantics with " "reliable motion planning." ), key_papers=["VoxPoser (Huang 2023)", "MOKA (Liu 2024)", "PIVOT (Nasiriany 2024)", "ReKep (Huang 2024)", "RoboPoint (Yuan 2024)"], pros=["Open-vocabulary tasks zero-shot", "No robot-specific fine-tuning needed"], cons=["Limited to tasks expressible as spatial cost / constraints", "VLM inference latency is the bottleneck"], when="Open-vocabulary pick / place / arrange tasks with no robot dataset.", ), ] # --------------------------------------------------------------------------- # PAPERS ATLAS (130+ entries — sortable / filterable table) # Tags follow the new equation-first taxonomy. Relations get distinct tags # so users can still filter for them (e.g. "VLA (Flow)" pulls the π₀ family). # --------------------------------------------------------------------------- PAPERS: list[tuple[str, str, int, str]] = [ # ----- Classic BC (MSE — relation, footnote, not a tree leaf) ----- ("ALVINN: An Autonomous Land Vehicle in a Neural Network", "Pomerleau", 1989, "Classic BC (MSE)"), ("End-to-End Learning for Self-Driving Cars", "Bojarski et al. 
(NVIDIA)", 2016, "Classic BC (MSE)"), ("BC-Z: Zero-Shot Task Generalization", "Jang et al.", 2021, "Classic BC (MSE)"), ("DART: Noise Injection for Robust IL", "Laskey et al.", 2017, "Classic BC (MSE)"), # ----- DAgger (BC relation) ----- ("DAgger: A Reduction of Imitation Learning to Structured Prediction", "Ross, Gordon, Bagnell", 2011, "DAgger (BC relation)"), ("AggreVaTe", "Ross & Bagnell", 2014, "DAgger (BC relation)"), ("SafeDAgger", "Zhang & Cho", 2017, "DAgger (BC relation)"), # ----- Action Chunking (BC relation) ----- ("Learning Fine-Grained Bimanual Manipulation (ACT / ALOHA)", "Zhao et al.", 2023, "ACT (BC relation)"), ("Mobile ALOHA", "Fu et al.", 2024, "ACT (BC relation)"), ("ALOHA Unleashed", "Zhao et al.", 2024, "ACT (BC relation)"), ("RoboAgent (MT-ACT)", "Bharadhwaj et al.", 2023, "ACT (BC relation)"), # ----- Diffusion Policy (leaf + VLA instances) ----- ("Diffusion Policy", "Chi et al.", 2023, "Diffusion Policy"), ("3D Diffusion Policy (DP3)", "Ze et al.", 2024, "Diffusion Policy"), ("Equivariant Diffusion Policy", "Wang et al.", 2024, "Diffusion Policy"), ("Consistency Policy", "Prasad et al.", 2024, "Diffusion Policy"), ("Diffusion-EDFs", "Ryu et al.", 2023, "Diffusion Policy"), ("RDT-1B (Robotics Diffusion Transformer)", "Liu et al.", 2024, "VLA — Diffusion head"), ("Octo: An Open-Source Generalist Robot Policy", "Octo team", 2024, "VLA — Diffusion head"), ("GR00T-N1 (NVIDIA Humanoid Foundation Model)", "NVIDIA", 2025, "VLA — Diffusion head"), # ----- Flow Matching Policy (leaf + VLA instances) ----- ("Conditional Flow Matching (foundational method)", "Lipman et al.", 2023, "Flow Matching Policy"), ("π₀ (Physical Intelligence)", "Black et al.", 2024, "VLA — Flow Matching head"), ("π₀.5 (open-world generalization, 104 homes)", "Physical Intelligence", 2025, "VLA — Flow Matching head"), ("π₀.6 (Physical Intelligence)", "Physical Intelligence", 2025, "VLA — Flow Matching head"), ("OpenPI (open-source π₀ / π₀.5)", "Physical Intelligence", 2025, 
"VLA — Flow Matching head"), # ----- Tokenized / Categorical BC (leaf + VLA instances) ----- ("RT-1 (Robotics Transformer)", "Brohan et al.", 2022, "VLA — Tokenized head"), ("RT-2 (VLM as Robot Controller)", "Brohan et al.", 2023, "VLA — Tokenized head"), ("Open X-Embodiment / RT-X", "Open X collaboration", 2023, "VLA — Tokenized head"), ("RT-H (Action Hierarchies with Language)", "Belkhale et al.", 2024, "VLA — Tokenized head"), ("OpenVLA", "Kim et al.", 2024, "VLA — Tokenized head"), ("π₀-FAST (autoregressive action tokenizer)", "Physical Intelligence", 2025, "VLA — Tokenized head"), ("Gato (A Generalist Agent)", "Reed et al.", 2022, "Tokenized / Categorical BC"), ("HPT (Heterogeneous Pretrained Transformers)", "Wang et al.", 2024, "VLA — Tokenized head"), ("GR-1 (Generative Robot)", "Wu et al.", 2024, "VLA — Tokenized head"), ("GR-2 (ByteDance)", "ByteDance", 2024, "VLA — Tokenized head"), ("Helix (dual-system VLA, 35-DoF @ 200 Hz)", "Figure AI", 2025, "VLA — Tokenized head"), ("Gemini Robotics (think-then-act VLA)", "Google DeepMind", 2025, "VLA — Tokenized head"), ("Gemini Robotics 1.5", "Google DeepMind", 2025, "VLA — Tokenized head"), # ----- Energy-Based / Implicit BC ----- ("Implicit Behavioral Cloning", "Florence et al.", 2021, "Energy-Based / Implicit BC"), # ----- Visual SSL (BC relation) ----- ("R3M", "Nair et al.", 2022, "Visual SSL (BC relation)"), ("MVP (Masked Visual Pretraining)", "Xiao et al.", 2022, "Visual SSL (BC relation)"), ("VIP (Value-Implicit Pretraining)", "Ma et al.", 2023, "Visual SSL (BC relation)"), ("Voltron", "Karamcheti et al.", 2023, "Visual SSL (BC relation)"), ("MCR (Manipulation-Centric Representations)", "Jiang et al.", 2024, "Visual SSL (BC relation)"), ("RPT (Robot Learning with Sensorimotor Pre-training)", "Radosavovic et al.", 2023, "Visual SSL (BC relation)"), # ----- Imitation data scaling (relation under BC) ----- ("MimicGen (synthetic demos via SE(3) replay)", "Mandlekar et al.", 2023, "Imitation data scaling"), 
("DexMimicGen (dexterous bimanual data scaling)", "Jiang et al.", 2024, "Imitation data scaling"), # ----- Value-Based RL ----- ("Playing Atari with Deep RL (DQN)", "Mnih et al.", 2013, "Value-Based RL"), ("Human-level control through Deep RL (DQN)", "Mnih et al.", 2015, "Value-Based RL"), ("Rainbow", "Hessel et al.", 2017, "Value-Based RL"), ("C51 (A Distributional Perspective on RL)", "Bellemare et al.", 2017, "Value-Based RL"), ("QR-DQN", "Dabney et al.", 2017, "Value-Based RL"), ("IQN", "Dabney et al.", 2018, "Value-Based RL"), ("R2D2", "Kapturowski et al.", 2019, "Value-Based RL"), ("Agent57", "Badia et al.", 2020, "Value-Based RL"), # ----- Policy Gradient RL ----- ("REINFORCE (Simple Statistical Gradient-Following)", "Williams", 1992, "Policy Gradient RL"), ("Trust Region Policy Optimization (TRPO)", "Schulman et al.", 2015, "Policy Gradient RL"), ("Proximal Policy Optimization (PPO)", "Schulman et al.", 2017, "Policy Gradient RL"), ("A3C", "Mnih et al.", 2016, "Policy Gradient RL"), ("ACER", "Wang et al.", 2016, "Policy Gradient RL"), ("IMPALA", "Espeholt et al.", 2018, "Policy Gradient RL"), ("PPG (Phasic Policy Gradient)", "Cobbe et al.", 2020, "Policy Gradient RL"), ("Ape-X DPG", "Horgan et al.", 2018, "Policy Gradient RL"), # ----- Off-Policy Actor-Critic ----- ("DDPG", "Lillicrap et al.", 2015, "Off-Policy Actor-Critic"), ("TD3 (Addressing Function Approximation Error)", "Fujimoto et al.", 2018, "Off-Policy Actor-Critic"), ("Soft Actor-Critic (SAC)", "Haarnoja et al.", 2018, "Off-Policy Actor-Critic"), ("DrQ-v2", "Yarats et al.", 2021, "Off-Policy Actor-Critic"), ("RAD (Data-Aug RL)", "Laskin et al.", 2020, "Off-Policy Actor-Critic"), ("CURL (Contrastive RL)", "Laskin et al.", 2020, "Off-Policy Actor-Critic"), # ----- Sim2Real (RL relation) ----- ("Domain Randomization", "Tobin et al.", 2017, "Sim2Real (RL relation)"), ("Learning Dexterous In-Hand Manipulation", "OpenAI / Andrychowicz et al.", 2018, "Sim2Real (RL relation)"), ("Solving Rubik's Cube with a 
Robot Hand", "OpenAI", 2019, "Sim2Real (RL relation)"), ("Sim-to-Real ANYmal Locomotion", "Hwangbo et al.", 2019, "Sim2Real (RL relation)"), ("RMA (Rapid Motor Adaptation)", "Kumar et al.", 2021, "Sim2Real (RL relation)"), ("Cassie / Berkeley Humanoid Walking", "Siekmann et al.", 2021, "Sim2Real (RL relation)"), ("Extreme Parkour with Legged Robots", "Cheng et al.", 2023, "Sim2Real (RL relation)"), # ----- Offline RL ----- ("BCQ (Batch-Constrained Q-Learning)", "Fujimoto et al.", 2019, "Offline RL"), ("CQL (Conservative Q-Learning)", "Kumar et al.", 2020, "Offline RL"), ("IQL (Implicit Q-Learning)", "Kostrikov et al.", 2021, "Offline RL"), ("AWAC", "Nair et al.", 2020, "Offline RL"), ("TD3+BC", "Fujimoto & Gu", 2021, "Offline RL"), ("BEAR", "Kumar et al.", 2019, "Offline RL"), ("EDAC", "An et al.", 2021, "Offline RL"), ("ReBRAC", "Tarasov et al.", 2023, "Offline RL"), # ----- MaxEnt IRL ----- ("MaxEnt IRL", "Ziebart et al.", 2008, "MaxEnt IRL"), ("Guided Cost Learning", "Finn et al.", 2016, "MaxEnt IRL"), ("f-IRL", "Ni et al.", 2020, "MaxEnt IRL"), # ----- GAIL / AIRL ----- ("GAIL (Generative Adversarial Imitation)", "Ho & Ermon", 2016, "GAIL / AIRL"), ("AIRL (Adversarial IRL)", "Fu et al.", 2017, "GAIL / AIRL"), ("SQIL (Soft Q Imitation Learning)", "Reddy et al.", 2019, "GAIL / AIRL"), # ----- Forward-Dynamics + MPC ----- ("PILCO", "Deisenroth & Rasmussen", 2011, "Forward-Dynamics + MPC"), ("PETS (Probabilistic Ensembles + TS)", "Chua et al.", 2018, "Forward-Dynamics + MPC"), ("Visual Foresight", "Finn & Levine", 2017, "Forward-Dynamics + MPC"), # ----- Latent Imagination ----- ("World Models", "Ha & Schmidhuber", 2018, "Latent Imagination"), ("PlaNet (Latent Dynamics)", "Hafner et al.", 2019, "Latent Imagination"), ("Dreamer", "Hafner et al.", 2020, "Latent Imagination"), ("DreamerV2", "Hafner et al.", 2021, "Latent Imagination"), ("DreamerV3", "Hafner et al.", 2023, "Latent Imagination"), ("TD-MPC", "Hansen et al.", 2022, "Latent Imagination"), ("TD-MPC2", 
"Hansen et al.", 2024, "Latent Imagination"), ("MuZero", "Schrittwieser et al.", 2020, "Latent Imagination"), ("EfficientZero", "Ye et al.", 2021, "Latent Imagination"), ("DayDreamer (Dreamer on Real Robots)", "Wu et al.", 2022, "Latent Imagination"), # ----- Generative Video World Model (+ LAPA relation) ----- ("UniSim", "Yang et al.", 2023, "Generative Video World Model"), ("GAIA-1", "Wayve", 2023, "Generative Video World Model"), ("Genie", "Bruce et al.", 2024, "Generative Video World Model"), ("Genie-2", "DeepMind", 2024, "Generative Video World Model"), ("Genie 3 (action-conditioned foundation world model)", "DeepMind", 2025, "Generative Video World Model"), ("UniPi (Universal Policy via Text-to-Video)", "Du et al.", 2023, "Generative Video World Model"), ("LAPA (Latent Action Pretraining)", "Ye et al.", 2024, "LAPA (Video-WM relation)"), ("ATM (Any-point Trajectory Modeling)", "Wen et al.", 2024, "LAPA (Video-WM relation)"), # ----- Decision Transformer ----- ("Decision Transformer", "Chen et al.", 2021, "Decision Transformer"), ("Trajectory Transformer", "Janner et al.", 2021, "Decision Transformer"), # ----- Trajectory Diffusion ----- ("Diffuser (Planning with Diffusion)", "Janner et al.", 2022, "Trajectory Diffusion"), ("Decision Diffuser", "Ajay et al.", 2022, "Trajectory Diffusion"), # ----- Goal-Conditioned + Hindsight ----- ("Universal Value Function Approximators (UVFA)", "Schaul et al.", 2015, "Goal-Conditioned + Hindsight"), ("Hindsight Experience Replay (HER)", "Andrychowicz et al.", 2017, "Goal-Conditioned + Hindsight"), ("GCSL (Goal-Conditioned Supervised Learning)", "Ghosh et al.", 2019, "Goal-Conditioned + Hindsight"), ("RIG (Visual RL with Imagined Goals)", "Nair et al.", 2018, "Goal-Conditioned + Hindsight"), ("Learning Latent Plans from Play (Play-LMP)", "Lynch et al.", 2019, "Goal-Conditioned + Hindsight"), # ----- Hierarchical ----- ("Between MDPs and semi-MDPs (Options)", "Sutton, Precup, Singh", 1999, "Hierarchical"), ("Option-Critic", 
"Bacon et al.", 2017, "Hierarchical"), ("FeUdal Networks (FuN)", "Vezhnevets et al.", 2017, "Hierarchical"), ("HIRO (Data-efficient HRL)", "Nachum et al.", 2018, "Hierarchical"), ("HAC", "Levy et al.", 2017, "Hierarchical"), # ----- Meta-Learning ----- ("MAML", "Finn et al.", 2017, "Meta-Learning"), ("RL² (Fast RL via Slow RL)", "Duan et al.", 2016, "Meta-Learning"), ("PEARL", "Rakelly et al.", 2019, "Meta-Learning"), ("VariBAD", "Zintgraf et al.", 2020, "Meta-Learning"), ("ProMP", "Rothfuss et al.", 2018, "Meta-Learning"), # ----- LLM-as-Planner ----- ("SayCan", "Ahn et al.", 2022, "LLM-as-Planner"), ("Inner Monologue", "Huang et al.", 2022, "LLM-as-Planner"), ("Code as Policies", "Liang et al.", 2023, "LLM-as-Planner"), ("ProgPrompt", "Singh et al.", 2023, "LLM-as-Planner"), # ----- VLM-Affordance ----- ("VoxPoser", "Huang et al.", 2023, "VLM-Affordance"), ("MOKA", "Liu et al.", 2024, "VLM-Affordance"), ("PIVOT (Visual Prompting)", "Nasiriany et al.", 2024, "VLM-Affordance"), ("ReKep (Relational Keypoint Constraints)", "Huang et al.", 2024, "VLM-Affordance"), ("RoboPoint", "Yuan et al.", 2024, "VLM-Affordance"), ] # --------------------------------------------------------------------------- # RENDERING # --------------------------------------------------------------------------- def render_paradigm(name: str) -> str: p = next((x for x in PARADIGMS if x["name"] == name), None) if p is None: return "Pick a paradigm from the dropdown." fam_chip = chip(p["family"]) papers_list = "".join(f"
<li>{paper}</li>
" for paper in p["key_papers"]) pros = "".join(f"
<li>{x}</li>
" for x in p["pros"]) cons = "".join(f"
<li>{x}</li>
" for x in p["cons"]) # Family-level objective (shared by every leaf in this family) fam_eq = FAMILY_EQUATIONS.get(p["family"], "") family_eq_block = ( f'
    ' f'
    Family objective — {p["family"]}
    ' f'
    $$ {fam_eq} $$
    ' f'
    ' ) if fam_eq else "" # Relations (same parent objective, different wrapper) rels = FAMILY_RELATIONS.get(p["family"], []) if rels: rel_items = "".join( f'
    ' f'
    {n}
    ' f'
    $$ {eq} $$
    ' f'
    {expl}
    ' f'
    ' for n, eq, expl in rels ) relations_block = ( f'
    ' f'
    ' f'Same family objective, different wrapper (relations)
    ' f'{rel_items}' f'
    ' ) else: relations_block = "" return f"""

    {p['name']}

    {fam_chip}

    {p['tagline']}

    {family_eq_block}
    Mapping
    {p['mapping']}
    Leaf-specific objective
    $$ {p['math']} $$
    Intuition

    {p['intuition']}

    Pros
      <ul>{pros}</ul>
    Cons
      <ul>{cons}</ul>
    Key Papers
      <ul>{papers_list}</ul>
    When to reach for it: {p['when']}
    {relations_block}
    """ def render_compare(name_a: str, name_b: str) -> str: def card(p): if p is None: return "
    " return f"""

    {p['name']}

    {chip(p['family'])}

    {p['tagline']}

    {p['mapping']}
    $$ {p['math']} $$

    {p['intuition']}

    When: {p['when']}
    """ a = next((x for x in PARADIGMS if x["name"] == name_a), None) b = next((x for x in PARADIGMS if x["name"] == name_b), None) return f"""
    {card(a)}{card(b)}
    """ def get_atlas_df(family_filter: str, year_min: int, query: str) -> pd.DataFrame: rows = [] q = (query or "").lower().strip() for title, authors, year, tag in PAPERS: if year < year_min: continue if family_filter != "All" and tag != family_filter: continue if q and q not in title.lower() and q not in authors.lower() and q not in tag.lower(): continue rows.append((year, title, authors, tag)) df = pd.DataFrame(rows, columns=["Year", "Title", "Authors", "Paradigm"]) return df.sort_values(["Year", "Title"], ascending=[False, True]).reset_index(drop=True) # --------------------------------------------------------------------------- # PICK-YOUR-PARADIGM (3-question guide) # --------------------------------------------------------------------------- def recommend(data_type: str, env: str, scale: str) -> str: recs: list[str] = [] if data_type == "Expert demos": # Modern BC defaults — flow matching and diffusion are the two heads # the field has converged on for multi-modal manipulation. recs += ["Flow Matching Policy", "Diffusion Policy"] if scale == "Large / multi-task": recs.append("Tokenized / Categorical BC") # VLA recipe else: recs.append("Energy-Based / Implicit BC") elif data_type == "Reward only": if env == "Simulator available": recs += ["Policy Gradient RL (PPO / TRPO family)", "Off-Policy Actor-Critic (SAC / TD3 family)", "Latent Imagination (Dreamer / TD-MPC / MuZero)"] elif env == "Logged data only": recs += ["Offline RL (Pessimistic Q + Behavior Constraint)", "Decision Transformer (Return-Conditioned Sequence Model)"] else: # Real robot only recs += ["Latent Imagination (Dreamer / TD-MPC / MuZero)", "Forward-Dynamics + MPC"] elif data_type == "Both": recs += ["GAIL / AIRL (Adversarial Imitation)", "MaxEnt IRL (Recover the Reward)", "Offline RL (Pessimistic Q + Behavior Constraint)"] if scale == "Large / multi-task": recs.append("Flow Matching Policy") # π₀-style VLA elif data_type == "Unlabeled video": recs += ["Generative Video World Model", "Flow 
Matching Policy", # downstream after LAPA latent decode "Diffusion Policy"] elif data_type == "Language + scene only": recs += ["LLM-as-Planner / Code-as-Policies", "VLM-Affordance / Spatial Programs (VoxPoser / ReKep)", "Tokenized / Categorical BC"] # VLA route # de-dup, preserve order seen: set[str] = set() recs = [r for r in recs if not (r in seen or seen.add(r))] cards = [] for r in recs[:4]: p = next((x for x in PARADIGMS if x["name"] == r), None) if not p: continue cards.append( f"
    " f"
    " f"{p['name']}{chip(p['family'])}
    " f"

    {p['tagline']}

    " f"
    {p['mapping']}
    " f"
    " ) if not cards: return "

    Pick options above and I'll suggest paradigms.

    " return f"

    Recommended paradigms

    {''.join(cards)}
    " # --------------------------------------------------------------------------- # UI # --------------------------------------------------------------------------- CSS = """ .gradio-container { font-family: -apple-system, BlinkMacSystemFont, "Inter", "Segoe UI", sans-serif; } #header { background: linear-gradient(135deg, #0f172a 0%, #1e3a8a 50%, #7c3aed 100%); color: white; padding: 28px 32px; border-radius: 12px; margin-bottom: 12px; } #header h1 { color:white; font-size:34px; margin:0; letter-spacing:-.5px; } #header p { color:#cbd5e1; font-size:16px; margin:8px 0 0 0; max-width:780px; } .fam-legend { display:flex; gap:8px; flex-wrap:wrap; padding:10px 16px; } .fam-legend span { padding:3px 10px; border-radius:999px; font-size:11px; color:white; font-weight:600; letter-spacing:.3px;} .mermaid { background:white; border-radius:10px; padding:14px; } """ # Injected into <head>: MathJax so $$...$$ renders inside gr.HTML, with a # debounced MutationObserver that re-typesets on every DOM update (tab switch, # dropdown change, etc.). HEAD_HTML = r""" """ # --------------------------------------------------------------------------- # Layered ontology — the clean classification. The tree below is a projection # of this ontology, not the canonical structure. 
# --------------------------------------------------------------------------- ONTOLOGY_LAYERS = [ dict( name="Control substrate", question="What object actually closes the loop?", classical="PID, impedance control, computed torque, LQR, MPC, hybrid automata.", modern="Neural feedback policy, action chunker, trajectory optimizer, skill graph, video-to-action generator.", items=[ ("Direct feedback", "u = π(o, g); the policy is the controller."), ("Trajectory / MPC", "optimize u_{t:t+H} online, execute the first action, replan."), ("Skill / option", "high-level discrete or continuous skill z, low-level controller executes."), ("Constraint / cost program", "LLM/VLM emits costs, keypoints, or code; solver/controller executes."), ("Future-video proposal", "generate a desired future, then decode actions or select a plan."), ], ), dict( name="Learning objective", question="What loss or optimization signal trains the controller?", classical="Designed objective J, Lyapunov function, tracking error, robust/adaptive criteria.", modern="Demonstrations, rewards, preferences, fixed logs, hindsight relabeling, self-supervised prediction.", items=[ ("BC / imitation", "match expert actions; action head may be regression, flow, diffusion, tokenized, or energy-based."), ("RL / optimal control", "maximize expected return; PPO/SAC/Q-learning are learning versions of optimal control."), ("Offline RL", "optimize return from fixed logs while staying near data support."), ("IRL / adversarial imitation", "recover reward or discriminator from expert behavior, then optimize it."), ("Predictive self-supervision", "learn dynamics, video, occupancy, contact, or latent evolution."), ], ), dict( name="Predictive model", question="Does the system learn a simulator or future generator?", classical="System identification plus model-based control.", modern="Latent dynamics, action-conditioned video, WAMs, occupancy/contact predictors, policy evaluators.", items=[ ("No explicit world model", "reactive 
policy; dynamics are implicit in the policy."), ("Forward dynamics / MPC", "state + action -> next state; plan with CEM/MPPI/gradients."), ("Latent imagination", "learn z dynamics and train/plan inside the latent world."), ("AC-WM", "actions in, future out; supports counterfactuals, RL, evaluation, MPC."), ("WAM", "instruction in, successful future + actions out; strong proposal generator."), ("Occupancy / contact / latent state", "planning-friendly representation inside the world-model branch."), ], ), dict( name="Architecture / representation", question="How are actions, states, and tasks parameterized?", classical="State coordinates, features, basis functions, linearization, observers.", modern="Transformers, diffusion/flow heads, VLA trunks, tokenizers, JEPA latents, 3D occupancy fields.", items=[ ("VLA trunk", "foundation VLM/VLA backbone; not an objective by itself."), ("Action head", "continuous regression, flow, diffusion, tokenized autoregression, energy-based scoring."), ("Decision Transformer", "causal sequence architecture; can instantiate BC, offline RL, or goal-conditioned control."), ("Video / latent generative model", "diffusion, autoregressive, JEPA-style, or recurrent latent dynamics."), ("Spatial representation", "2D pixels, 3D state, occupancy, keypoints, contact, cost maps."), ], ), dict( name="Data regime", question="Where does the learning signal come from?", classical="Designed experiments, system-ID rollouts, calibrated sensors, known plant model.", modern="Teleop demos, simulation, offline logs, autonomous play, internet video, multi-embodiment data.", items=[ ("Expert demonstrations", "teleop or kinesthetic data; powers BC, ACT, diffusion/flow policies."), ("Reward rollouts", "online or simulated interaction; powers PPO/SAC/model-based RL."), ("Fixed logs", "offline RL and sequence models; must handle support constraints."), ("Play / failures", "especially valuable for AC-WMs because actions explain all outcomes."), ("Human / internet 
video", "pretrains visual priors, video models, latent action models, WAMs."), ("Cross-embodiment data", "Open-X/DROID-style scaling; requires action abstraction or embodiment-specific heads."), ], ), dict( name="Deployment role", question="What does the learned system do at runtime?", classical="Track, stabilize, estimate, plan, verify safety.", modern="Policy, planner, simulator, critic/evaluator, data generator, safety filter, skill orchestrator.", items=[ ("Policy", "outputs actions directly."), ("Planner", "searches or optimizes over actions/trajectories."), ("Simulator", "rolls out counterfactual futures."), ("Evaluator / critic", "scores policies, plans, or imagined futures."), ("Data generator", "creates synthetic rollouts, relabels, or proposal trajectories."), ("Orchestrator", "selects tools, skills, constraints, or subgoals."), ], ), ] METHOD_STACKS = [ ("Diffusion Policy", "Direct feedback / action chunk", "BC", "none", "diffusion action head", "expert demos", "policy"), ("π₀ / OpenPI", "Direct feedback", "BC", "none", "VLA trunk + flow head", "cross-embodiment demos", "policy"), ("OpenVLA / RT-2", "Direct feedback", "BC", "none", "VLA trunk + tokenized actions", "web/VLM + robot demos", "policy"), ("Decision Transformer", "Sequence decoder", "offline return-conditioned MLE", "optional learned dynamics outside it", "causal Transformer", "fixed logs with rewards", "policy / planner"), ("Dreamer / TD-MPC", "Latent controller", "RL inside learned model", "latent dynamics", "recurrent / latent model", "rollouts", "policy + simulator"), ("AC-WM", "Planner / simulator", "predictive self-supervision + optional RL", "action-conditioned future model", "video/latent/occupancy predictor", "successes + failures + play", "simulator / evaluator / planner"), ("WAM / DreamZero", "Future-video proposal", "video-action generation", "text-conditioned future generator", "video model + action decoder", "human/robot video + task labels", "proposal policy / planner"), 
("SayCan / Code-as-Policies", "Skill orchestration", "planning over skills/costs", "usually external", "LLM/VLM + solver", "language + affordance data", "orchestrator"), ] EVOLUTION_STAGES = [ ("Classical feedback control", "PID, computed torque, impedance, LQR", "Known or identified model; hand-designed tracking/stability objective."), ("Optimal control and MPC", "trajectory optimization, CEM/MPPI/MPC", "Use dynamics + cost to optimize actions online."), ("Learning from demonstration", "LfD, BC, DAgger", "Replace hand-designed control law with supervised action learning."), ("Deep RL for robotics", "PPO, SAC, sim-to-real locomotion/manipulation", "Learn policies from reward, often in simulation."), ("Deep generative policies", "ACT, Diffusion Policy, flow policies", "Better multimodal imitation and long action chunks."), ("Offline decision models", "CQL/IQL, Decision Transformer, Trajectory Transformer, Diffuser", "Use fixed logs; sequence modeling enters decision-making."), ("Foundation-model policies", "RT-2, OpenVLA, π₀, Gemini Robotics, Helix", "VLM/VLA trunks inject language and semantic priors."), ("World-model robotics", "Dreamer, AC-WM, WAM, occupancy/contact WMs", "Learn simulators or future generators for planning, training, evaluation, and proposals."), ("Hybrid frontier", "VLA + world model + MPC/RL + LLM planner + safety filter", "Modern systems combine layers rather than choosing one algorithm."), ] def render_layered_ontology() -> str: layer_cards = [] for layer in ONTOLOGY_LAYERS: items = "".join( f"
<li>{name}: {desc}</li>
" for name, desc in layer["items"] ) layer_cards.append(f"""
    {layer['question']}

    {layer['name']}

    Control theory:
    {layer['classical']}
    Robot learning:
    {layer['modern']}
      <ul>{items}</ul>
    """) method_rows = "".join( f"<tr>" + "".join(f"<td>{cell}</td>" for cell in row) + "</tr>" for row in METHOD_STACKS ) evolution = "".join( f"
    " f"{stage}{examples}{meaning}
    " for stage, examples, meaning in EVOLUTION_STAGES ) return f"""
    Canonical view

    Robot learning is a stack, not a flat taxonomy

    The same method can be a policy, an architecture, a world model, a data recipe, and a deployment role. This layered ontology keeps classical control, robot learning, VLAs, sequence models, and world models in one clean frame.

    {''.join(layer_cards)}

    Method stacks

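    <table><thead><tr><th>Method</th><th>Control substrate</th><th>Objective</th><th>Predictive model</th><th>Architecture</th><th>Data</th><th>Role</th></tr></thead><tbody>{method_rows}</tbody></table>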

    How the field evolved

    {evolution}
    Rule of thumb: classify a method by asking six questions in order: what closes the loop, what trains it, whether it predicts the future, how it represents actions/state/tasks, what data it uses, and what role it plays at deployment. The old tree is useful as a map, but this stack is the cleaner ontology.
    """ # --------------------------------------------------------------------------- # Family tree — pure HTML/CSS, no third-party renderer. # --------------------------------------------------------------------------- # Each tuple: (display-name, FAMILY key, [(leaf-name, leaf-mapping), ...]) TREE_DATA = [ ("Behavioral Cloning (BC)", "BC", [ ("Flow Matching Policy", "L = E‖v_θ(a^t,o,t) − (a¹−a⁰)‖²"), ("Diffusion Policy", "L = E‖ε − ε_θ(a^k,o,k)‖²"), ("Tokenized / Categorical BC", "L = −Σ_j log p_θ(a^(j) | o, a^( str: """Render a sideways SVG tree: root → family branches → paradigm leaves.""" # --- layout constants --- LEAF_W, LEAF_H = 370, 44 LEAF_PAD = 6 FAM_GAP = 22 FAM_W, FAM_H = 220, 50 ROOT_R = 70 X_ROOT_C = 95 X_FAM = 250 X_LEAF = 530 PAD_TOP = 30 PAD_BOTTOM = 30 leaves: list[dict] = [] families: list[dict] = [] y = PAD_TOP for fi, (fam_name, fam_key, paradigms) in enumerate(TREE_DATA): color = FAMILY[fam_key][0] first_y = y leaf_start = len(leaves) for n, m in paradigms: leaves.append({"fi": fi, "name": n, "mapping": m, "color": color, "y": y}) y += LEAF_H + LEAF_PAD last_y = y - LEAF_PAD fam_center = (first_y + last_y) / 2 fam_top = fam_center - FAM_H / 2 families.append({ "fi": fi, "name": fam_name, "color": color, "y": fam_top, "leaf_range": (leaf_start, len(leaves)), }) y += FAM_GAP total_h = y + PAD_BOTTOM - FAM_GAP total_w = X_LEAF + LEAF_W + 30 root_cy = (families[0]["y"] + families[-1]["y"] + FAM_H) / 2 parts: list[str] = [] parts.append( f'' ) # ---- interaction styles (pure-CSS hover + click/focus, no JS) ---- parts.append( "" ) # ---- branches: root → family ---- rx = X_ROOT_C + ROOT_R - 4 ry = root_cy for fam in families: fx = X_FAM fy = fam["y"] + FAM_H / 2 c1x = rx + (fx - rx) * 0.55 c2x = fx - (fx - rx) * 0.55 path = f"M {rx},{ry} C {c1x},{ry} {c2x},{fy} {fx},{fy}" gid = f"g-root-{fam['fi']}" parts.append( f'' f'' f'' f'' ) parts.append( f'' ) # ---- branches: family → leaf ---- for fam in families: fx_right = X_FAM + FAM_W - 3 fy = fam["y"] 
+ FAM_H / 2 for li in range(*fam["leaf_range"]): leaf = leaves[li] lx = X_LEAF ly = leaf["y"] + LEAF_H / 2 c1x = fx_right + (lx - fx_right) * 0.5 c2x = lx - (lx - fx_right) * 0.5 path = f"M {fx_right},{fy} C {c1x},{fy} {c2x},{ly} {lx},{ly}" parts.append( f'' ) # ---- ROOT ---- parts.append( f'' f'' ) parts.append( f'' ) parts.append( f'🤖' ) parts.append( f'ROBOT' ) parts.append( f'POLICY' f'' ) # ---- FAMILY nodes ---- for fam in families: fam_name = html_lib.escape(fam["name"]) parts.append( f'' f'' ) parts.append( f'' f'
    ' f'{fam_name}
    ' ) # ---- LEAF cards ---- for leaf in leaves: c = leaf["color"] leaf_name = html_lib.escape(leaf["name"]) leaf_mapping = html_lib.escape(leaf["mapping"]) parts.append( f'' f'' ) parts.append( f'' ) parts.append( f'' f'
    ' f'
    {leaf_name}
    ' f'
    {leaf_mapping}
    ' f'
    ' ) parts.append("
    ") svg = "".join(parts) n_families = len(TREE_DATA) n_leaves = sum(len(t[2]) for t in TREE_DATA) return f"""
    🌳 Objective / Family Projection
    {n_families} families · {n_leaves} paradigms · this is a useful projection, but the Layered Ontology tab is the canonical view.
    Read this way: root on the left, broad families in the middle, representative leaves on the right. Some leaves are objectives, while others are architectures or world-model variants; use the Layered Ontology and Relationship Map tabs to see overlap.
    Hover any node to lift it; click (or tab to) a card to spotlight it.
    {svg}
    Notation:  o = observation  ·  s = state  ·  a = action  ·  g = goal  ·  R̂ = return-to-go  ·  π = policy  ·  Q = action-value  ·  f = dynamics  ·  φ = encoder  ·  D = discriminator  ·  τ = trajectory  ·  ẑ = latent action.
    Action-head superscripts:  a^k = action at diffusion step k  ·  a^t = action at flow time t ∈ [0,1]  ·  a^(j) = j-th action token.
    """ FAMILY_LAYOUT_CHOICES = [ "01 Compact Branch Cards", "02 Horizontal Swimlanes", "03 Subway Map", "04 Branch Matrix", "05 Radial Rings", "06 Accordion Codex", "07 Evolution Timeline", "08 Kanban Board", "09 Layer Stack", "10 Classic SVG Tree", "11 Game Skill Tree", ] def _family_payload(): rows = [] for fam_name, fam_key, paradigms in TREE_DATA: color, desc = FAMILY[fam_key] rows.append( dict( name=fam_name, key=fam_key, color=color, desc=desc, equation=FAMILY_EQUATIONS.get(fam_key, ""), leaves=[dict(name=n, mapping=m) for n, m in paradigms], ) ) return rows def _leaf_pills(leaves: list[dict], color: str, show_mapping: bool = False) -> str: if show_mapping: return "".join( f"
    {html_lib.escape(leaf['name'])}" f"{html_lib.escape(leaf['mapping'])}
    " for leaf in leaves ) return "".join( f"{html_lib.escape(leaf['name'])}" for leaf in leaves ) def _family_layout_css() -> str: return """ """ def _family_header(layout: str) -> str: n_families = len(TREE_DATA) n_leaves = sum(len(t[2]) for t in TREE_DATA) return f"""
    Family Projection alternatives

    {html_lib.escape(layout)}

    {n_families} projected families and {n_leaves} leaves. These are layout prototypes; the canonical taxonomy remains the layered ontology and ownership audit.

    """ _SKILL_TREE_CSS = """ """ def render_skill_tree() -> str: fams = _family_payload() n_families = len(fams) n_leaves = sum(len(f["leaves"]) for f in fams) branches = [] for i, fam in enumerate(fams): c = fam["color"] fam_name = html_lib.escape(fam["name"]) fam_desc = html_lib.escape(fam["desc"]) leaves = "".join( f'' for leaf in fam["leaves"] ) branches.append( f'
    ' f'' f'
    {leaves}
    ' f'
    ' ) return f"""
    {_SKILL_TREE_CSS}

    🎮 Robot Policy Skill Tree

    {n_families} branches · {n_leaves} skills
    Hover a branch to focus it (others dim) · hover a skill for its objective · click a skill to allocate it (gold = learned).
    🤖
    ROBOT
    POLICY
    {"".join(branches)}
    """ def render_family_layout(layout: str) -> str: fams = _family_payload() if layout not in FAMILY_LAYOUT_CHOICES: layout = FAMILY_LAYOUT_CHOICES[0] if layout == "11 Game Skill Tree": return render_skill_tree() if layout == "10 Classic SVG Tree": return render_tree() body = "" if layout == "01 Compact Branch Cards": cards = "".join( f"
    {html_lib.escape(fam['name'])}
    " f"
    {html_lib.escape(fam['desc'])}
    {_leaf_pills(fam['leaves'], fam['color'])}
    " for fam in fams ) body = f"
    {cards}
    " elif layout == "02 Horizontal Swimlanes": lanes = "".join( f"
    " f"
    {html_lib.escape(fam['name'])}
    {html_lib.escape(fam['key'])}
    " f"
    {_leaf_pills(fam['leaves'], fam['color'], True)}
    " for fam in fams ) body = f"
    {lanes}
    " elif layout == "03 Subway Map": lines = "".join( f"
    " f"
    {html_lib.escape(fam['name'])}
    " f"
    " + "".join( f"
    {html_lib.escape(leaf['name'])}
    " for leaf in fam["leaves"] ) + "
    " for fam in fams ) body = f"
    {lines}
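    " elif layout == "04 Branch Matrix": rows = "".join( f"<tr><td>{html_lib.escape(fam['name'])}</td>" f"<td>{html_lib.escape(fam['key'])}</td>" f"<td>{len(fam['leaves'])}</td>" f"<td>
    {_leaf_pills(fam['leaves'], fam['color'])}
    </td></tr>" for fam in fams ) body = f"
    <table><thead><tr><th>Family</th><th>Owner key</th><th>Leaves</th><th>Paradigms</th></tr></thead><tbody>{rows}</tbody></table>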
    " elif layout == "05 Radial Rings": rings = "".join( f"
    " f"
    {html_lib.escape(fam['name'])}
    {_leaf_pills(fam['leaves'], fam['color'])}
    " for fam in fams ) body = f"
    {rings}
    " elif layout == "06 Accordion Codex": details = "".join( f"
    {html_lib.escape(fam['name'])} · {len(fam['leaves'])} leaves" f"
    {html_lib.escape(fam['desc'])}
    {_leaf_pills(fam['leaves'], fam['color'], True)}
    " for fam in fams ) body = f"
    {details}
    " elif layout == "07 Evolution Timeline": timeline = "".join( f"
    " f"
    {i+1}
    " f"
    {html_lib.escape(fam['name'])}
    {html_lib.escape(fam['key'])}
    " f"
    {_leaf_pills(fam['leaves'], fam['color'])}
    " for i, fam in enumerate(fams) ) body = f"
    {timeline}
    " elif layout == "08 Kanban Board": cols = "".join( f"
    {html_lib.escape(fam['name'])}
    " f"
    {_leaf_pills(fam['leaves'], fam['color'], True)}
    " for fam in fams ) body = f"
    {cols}
    " elif layout == "09 Layer Stack": stack = "".join( f"
    " f"
    {html_lib.escape(fam['name'])}
    {html_lib.escape(fam['key'])}
    " f"
    {_leaf_pills(fam['leaves'], fam['color'])}
    " for fam in fams ) body = f"
    {stack}
    " return f"""
    {_family_layout_css()} {_family_header(layout)} {body}
    Selection guide: choose a layout based on the job. Cards are best for teaching; swimlanes and the matrix are best for precision; the subway map and timeline are best for showing progression; kanban is best for scanning all leaves.
    """ CONNECTION_TREES = [ { "title": "Objective Lineage", "subtitle": "Paradigms grouped by what signal trains the policy.", "color": "#2563eb", "root": "Training objective", "children": [ ("BC / imitation", [ "MSE-BC", "Flow Matching Policy", "Diffusion Policy", "Tokenized / Categorical BC", "Energy-Based / Implicit BC", "DAgger", "Action Chunking", ]), ("Reward optimization", [ "Value-Based Q-Learning", "Policy Gradient PPO / TRPO", "Off-Policy Actor-Critic SAC / TD3", ]), ("Fixed-log optimization", [ "Offline RL", "Decision Transformer", "Trajectory Diffusion", ]), ("Reward recovery", [ "MaxEnt IRL", "GAIL / AIRL", ]), ("Goal conditioning", [ "Goal-conditioned control", "Hindsight relabeling", ]), ], }, { "title": "Model And Planning Tree", "subtitle": "Paradigms grouped by whether they predict futures and how planning uses them.", "color": "#7c3aed", "root": "Predictive model", "children": [ ("No explicit model", [ "BC heads", "VLA policies", "Direct RL policies", ]), ("Forward dynamics", [ "Forward-Dynamics + MPC", "Latent Imagination", "Dreamer / TD-MPC", ]), ("Action-conditioned world model", [ "AC-WM", "Occupancy / contact / latent state WM", "Policy evaluation", "RL inside the model", ]), ("World action model", [ "WAM / DreamZero", "Future-video proposal", "Best-of-N planning", ]), ], }, { "title": "Architecture Tree", "subtitle": "Paradigms grouped by representation rather than objective.", "color": "#ca8a04", "root": "Representation", "children": [ ("Continuous action heads", [ "Regression BC", "Flow Matching Policy", "Diffusion Policy", "Energy-Based BC", ]), ("Token sequence models", [ "Decision Transformer", "Trajectory Transformer", "Tokenized BC", "OpenVLA / RT-2 action tokens", ]), ("Foundation trunks", [ "VLA", "π₀ / OpenPI", "Gemini Robotics", "Helix", ]), ("Spatial / latent worlds", [ "Occupancy WM", "Latent state WM", "Video / JEPA latent predictor", ]), ], }, { "title": "Deployment Tree", "subtitle": "Paradigms grouped by what component 
they become at runtime.", "color": "#0891b2", "root": "Runtime role", "children": [ ("Policy", [ "BC", "Diffusion Policy", "Flow Policy", "VLA", "RL policy", ]), ("Planner", [ "MPC", "Trajectory optimization", "Diffuser / guided trajectory sampling", "Best-of-N WAM planning", ]), ("Simulator / evaluator", [ "AC-WM", "WorldGym-style evaluation", "Latent imagination", ]), ("Orchestrator", [ "LLM-as-Planner", "Code-as-Policies", "VLM-affordance programs", "Hierarchical skills", ]), ], }, ] CONNECTION_CROSSLINKS = [ ("Decision Transformer", "Sequence architecture", "BC", "Offline RL", "Goal-conditioned control"), ("VLA", "Architecture + data regime", "Tokenized BC", "Flow heads", "Diffusion heads"), ("Diffusion Policy", "BC objective", "Generative action head", "Trajectory sampling", "VLA head"), ("AC-WM", "World model", "MPC", "RL inside model", "Policy evaluation"), ("WAM", "World model", "Future-video proposal", "BC-like action decoder", "Best-of-N planning"), ("SayCan / Code-as-Policies", "LLM orchestration", "Skill library", "Affordance scoring", "Classical solver"), ] def render_connection_trees() -> str: """Render multiple compact paradigm-connection trees.""" tree_blocks = [] for tree in CONNECTION_TREES: child_blocks = [] for parent, leaves in tree["children"]: leaf_html = "".join( f"
  • {html_lib.escape(leaf)}
  • " for leaf in leaves ) child_blocks.append(f"""
  • {html_lib.escape(parent)}
      {leaf_html}
  • """) tree_blocks.append(f"""
    Connection tree

    {html_lib.escape(tree['title'])}

    {html_lib.escape(tree['subtitle'])}

    {html_lib.escape(tree['root'])}
      {''.join(child_blocks)}
    """) cross_rows = "".join( "" + "".join( f"{cell}" if i else f"{cell}" for i, cell in enumerate(row) ) + "" for row in CONNECTION_CROSSLINKS ) return f"""
    Paradigm connections

    Connection trees between robot-learning paradigms

    Each tree slices the same methods by a different relationship: training objective, predictive model, architecture, or runtime role. The table underneath marks the important cross-links where one paradigm belongs to more than one branch.

    {''.join(tree_blocks)}
    {cross_rows}
    <tr><th>Paradigm</th><th>Primary branch</th><th>Connects to</th><th>Also connects to</th><th>Runtime interpretation</th></tr>
    """ WORLD_MODEL_PAPERS = [ ("RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies", "Atreya et al.", 2025, "World Models — Evaluation"), ("DreamZero: World Action Models are Zero-shot Policies", "Ye et al.", 2026, "World Action Model"), ("Large Video Planner Enables Generalizable Robot Control", "Chen et al.", 2025, "World Action Model"), ("mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs", "Pai et al.", 2025, "World Action Model"), ("Video Generators are Robot Policies", "Liang et al.", 2025, "World Action Model"), ("Unified Video Action Model", "Li et al.", 2025, "World Action Model"), ("Training Agents Inside of Scalable World Models", "Hafner et al.", 2025, "Action-Conditioned World Model"), ("Evaluating Gemini Robotics Policies in a Veo World Simulator", "Gemini Robotics Team et al.", 2025, "Action-Conditioned World Model"), ("Ctrl-World: A Controllable Generative World Model for Robot Manipulation", "Guo et al.", 2025, "Action-Conditioned World Model"), ("DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos", "Gao et al.", 2026, "Action-Conditioned World Model"), ("PlayWorld: Learning Robot World Models from Autonomous Play", "Yin et al.", 2026, "Action-Conditioned World Model"), ("V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning", "Assran et al.", 2025, "Action-Conditioned World Model"), ("Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning", "Kim et al.", 2026, "World Models — Planning"), ("World-Gymnast: Training Robots with Reinforcement Learning in a World Model", "Sharma et al.", 2026, "Action-Conditioned World Model"), ("Steering Your Diffusion Policy with Latent Space Reinforcement Learning", "Wagenmaker et al.", 2025, "World Models — Planning"), ("WorldGym: World Model as An Environment for Policy Evaluation", "Quevedo et al.", 2025, "Action-Conditioned World Model"), ] _existing_papers = {(title, authors, year, tag) for 
title, authors, year, tag in PAPERS} PAPERS.extend([paper for paper in WORLD_MODEL_PAPERS if paper not in _existing_papers]) SURVEY_PAPERS = [ ("A Survey on Vision-Language-Action Models for Embodied AI", "Ma et al.", 2024, "Survey — VLA / Embodied AI"), ("A Survey on Robotics with Foundation Models: toward Embodied AI", "Xu et al.", 2024, "Survey — Foundation Models"), ("Robot Learning in the Era of Foundation Models: A Survey", "Xiao et al.", 2023, "Survey — Foundation Models"), ("A Survey on Vision-Language-Action Models: An Action Tokenization Perspective", "Zhong et al.", 2025, "Survey — VLA / Action Tokenization"), ("Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey", "Shao et al.", 2025, "Survey — VLA / Manipulation"), ("Survey of Vision-Language-Action Models for Embodied Manipulation", "Li et al.", 2025, "Survey — VLA / Manipulation"), ("A Survey on Efficient Vision-Language-Action Models", "Yu et al.", 2025, "Survey — Efficient VLA"), ("Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey", "Guan et al.", 2025, "Survey — Efficient VLA"), ("Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review", "Lisondra et al.", 2025, "Survey — Service Robots"), ("Robot Learning from Human Videos: A Survey", "Ma et al.", 2026, "Survey — Human Video Learning"), ("World Model for Robot Learning: A Comprehensive Survey", "Hou et al.", 2026, "Survey — World Models"), ] _existing_papers = {(title, authors, year, tag) for title, authors, year, tag in PAPERS} PAPERS.extend([paper for paper in SURVEY_PAPERS if paper not in _existing_papers]) SURVEY_SOURCE_INDEX = [ dict( title="Robot Learning in the Era of Foundation Models: A Survey", authors="Xiao et al.", year=2023, arxiv="2311.14379", url="https://arxiv.org/abs/2311.14379", validates="Foundation-model robot learning across manipulation, navigation, planning, and reasoning.", landscape_role="Supports the high-level shift from 
task-specific robot learning to foundation-model policy stacks.", ), dict( title="A Survey on Robotics with Foundation Models: toward Embodied AI", authors="Xu et al.", year=2024, arxiv="2402.02385", url="https://arxiv.org/abs/2402.02385", validates="Foundation models for autonomous manipulation, high-level planning, low-level control, datasets, simulators, and benchmarks.", landscape_role="Supports separating foundation trunks from the objectives and controllers they are paired with.", ), dict( title="A Survey on Vision-Language-Action Models for Embodied AI", authors="Ma et al.", year=2024, arxiv="2405.14093", url="https://arxiv.org/abs/2405.14093", validates="VLA components, low-level control policies, high-level task planners, resources, and challenges.", landscape_role="Supports treating VLA as an architecture / representation family rather than a training objective.", ), dict( title="A Survey on Vision-Language-Action Models: An Action Tokenization Perspective", authors="Zhong et al.", year=2025, arxiv="2507.01925", url="https://arxiv.org/abs/2507.01925", validates="Action tokenization choices for VLA systems.", landscape_role="Supports the representation layer: tokenized actions are an action interface, not a separate objective.", ), dict( title="Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey", authors="Shao et al.", year=2025, arxiv="2508.13073", url="https://arxiv.org/abs/2508.13073", validates="Large VLM-based VLA taxonomies, monolithic vs hierarchical designs, RL integration, human-video learning, and world-model integration.", landscape_role="Supports the cross-link design: VLA systems are stacks that combine architecture, objective, data, and runtime role.", ), dict( title="Survey of Vision-Language-Action Models for Embodied Manipulation", authors="Li et al.", year=2025, arxiv="2508.15201", url="https://arxiv.org/abs/2508.15201", validates="VLA model structures, datasets, pre-training, post-training, and evaluation.", 
landscape_role="Supports the clean VLA branch and its data/evaluation modifiers.", ), dict( title="A Survey on Efficient Vision-Language-Action Models", authors="Yu et al.", year=2025, arxiv="2510.24795", url="https://arxiv.org/abs/2510.24795", validates="Efficient VLAs across data, model, and training process.", landscape_role="Supports efficiency as a deployment constraint layered over VLA architectures.", ), dict( title="Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey", authors="Guan et al.", year=2025, arxiv="2510.17111", url="https://arxiv.org/abs/2510.17111", validates="Latency, memory footprint, training/inference cost, model architecture, perception features, action generation, and deployment strategies.", landscape_role="Supports separating architecture efficiency from objective families like BC and RL.", ), dict( title="Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review", authors="Lisondra et al.", year=2025, arxiv="2505.20503", url="https://arxiv.org/abs/2505.20503", validates="Foundation models in mobile service robots, sensor fusion, real-time decision-making, task generalization, HRI, and deployment constraints.", landscape_role="Supports the runtime-role and deployment-bottleneck layers.", ), dict( title="Robot Learning from Human Videos: A Survey", authors="Ma et al.", year=2026, arxiv="2604.27621", url="https://arxiv.org/abs/2604.27621", validates="Human-video-based learning for robotics, human-robot skill transfer, and data foundations.", landscape_role="Supports human video as a data regime feeding WAMs, latent-action pretraining, representation learning, and imitation.", ), dict( title="World Model for Robot Learning: A Comprehensive Survey", authors="Hou et al.", year=2026, arxiv="2605.00080", url="https://arxiv.org/abs/2605.00080", validates="World models as learned simulators for policy learning, planning, simulation, evaluation, data generation, video generation, datasets, 
benchmarks, and protocols.", landscape_role="Supports predictive models as their own branch: AC-WM, WAM, latent imagination, and evaluator/simulator roles.", ), ] PARADIGM_NAMES = [p["name"] for p in PARADIGMS] FAMILY_LABELS = ["All"] + sorted({t for *_, t in PAPERS}) MAX_PAPER_YEAR = max(year for _, _, year, _ in PAPERS) LANDSCAPE_SYNTHESIS_HTML = r"""
    Clean landscape

    Robot learning is converging into policy stacks

    The cleanest map is not BC vs RL vs VLA vs world model. It is a stack: controller, objective, predictive model, architecture, data source, and runtime role. Most current papers are new combinations of these layers.

    1. Control substrate
    Classical feedback, MPC, skill graphs, neural policies, action chunkers, and future-video proposal generators are different ways to close the loop.
    2. Training signal
    BC, RL, offline RL, IRL, preferences, and predictive self-supervision are objectives. They should not be mixed up with architectures.
    3. Prediction layer
    World models are learned simulators or future generators. AC-WMs take candidate actions as input and predict the resulting future; WAMs propose a successful future and decode the actions that reach it.
    4. Representation
    VLA trunks, Transformers, diffusion/flow heads, tokenized actions, 3D maps, and latent states are representational choices.
    5. Data regime
    Data is the central bottleneck. Sources include teleop demos, fixed logs, sim rollouts, play, failures, cross-embodiment data, and human videos.
    6. Deployment role
    At runtime a learned component may be a policy, planner, simulator, evaluator, data generator, safety filter, or LLM/VLM orchestrator.

    Current field map

    <table>
      <thead>
        <tr><th>Region</th><th>What it means</th><th>Representative methods</th><th>Main open bottleneck</th></tr>
      </thead>
      <tbody>
        <tr><td>Classical / optimal control</td><td>Known dynamics, costs, constraints, stability, and replanning.</td><td>PID, impedance, LQR, MPC, trajectory optimization.</td><td>Manual modeling and limited semantic generality.</td></tr>
        <tr><td>Imitation policies</td><td>Learn direct control from demonstrations.</td><td>BC, ACT, Diffusion Policy, flow policies, energy-based BC.</td><td>Distribution shift, multimodality, and demonstration coverage.</td></tr>
        <tr><td>RL and offline RL</td><td>Optimize reward through interaction or fixed logs.</td><td>PPO, SAC, TD3, CQL, IQL, TD3+BC, VLA-RL.</td><td>Reward design, exploration, safety, and real-world sample cost.</td></tr>
        <tr><td>Sequence decision models</td><td>Treat control as conditional generation over tokens or trajectories.</td><td>Decision Transformer, Trajectory Transformer, Diffuser.</td><td>Long-horizon reliability and correct conditioning.</td></tr>
        <tr><td>Foundation robot policies</td><td>Use VLM/VLA trunks plus robot action heads and large multi-task data.</td><td>RT-2, OpenVLA, π₀/OpenPI, Gemini Robotics, GR00T, Helix.</td><td>Embodiment transfer, latency, evaluation, and data scaling.</td></tr>
        <tr><td>World-model robotics</td><td>Learn futures for planning, RL, evaluation, data generation, or proposals.</td><td>Dreamer, TD-MPC, AC-WM, WAM, DreamZero, PlayWorld, WorldGym.</td><td>Action grounding, physical consistency, long-horizon drift.</td></tr>
        <tr><td>LLM/VLM orchestration</td><td>Use language/vision models to select skills, constraints, code, or cost maps.</td><td>SayCan, Code-as-Policies, VoxPoser, ReKep, VLM spatial programs.</td><td>Grounding, verification, tool reliability, and skill-library limits.</td></tr>
      </tbody>
    </table>

    Survey reading stack

    VLA overview: Ma et al. 2024; Shao et al. 2025; Li et al. 2025.
    Action representation: Zhong et al. 2025 for tokenization; diffusion/flow heads for continuous actions.
    World models: Hou et al. 2026 for robot-learning world models; human-video and WAM papers for scalable data.
    Data and deployment: Yu et al. 2025 for efficient VLAs; Lisondra et al. 2025 for service robots; Ma et al. 2026 for human-video learning.
    Bottom line: modern robot learning is moving from single algorithms to integrated systems that combine a VLA policy, a generative action head, a world model or evaluator, a planner or orchestrator, and a safety/controller layer. The best taxonomy separates these layers instead of forcing each paper into exactly one family.
    """ SKILL_MAP_STAGES = [ { "id": "coarse", "label": "Stage 1 — Coarse Map", "subtitle": "First ask what job a component does in a robot-learning system.", "nodes": [ { "name": "Controller", "branch": "Control substrate", "plain": "The thing that actually sends actions to the robot.", "owns": "Closes the loop at runtime.", "not_owns": "Does not define the training loss, data source, or model architecture.", "formula": "a_t = pi(o_t, g)", "papers": "PID / impedance control; MPC; ACT; Diffusion Policy; pi0 / OpenPI.", "links": ["Learning objective", "Action representation", "Safety filter"], "unlocks": ["BC policy", "RL policy", "MPC planner", "VLA policy"], }, { "name": "Training Signal", "branch": "Learning objective", "plain": "The score or loss that teaches the robot what behavior is good.", "owns": "Defines what is optimized during training.", "not_owns": "Does not decide whether the model is a Transformer, VLA, or world model.", "formula": "min loss or max reward", "papers": "BC; PPO; SAC; CQL; IQL; GAIL; MaxEnt IRL.", "links": ["Controller", "Data source", "Evaluator"], "unlocks": ["BC", "RL", "Offline RL", "IRL"], }, { "name": "Future Model", "branch": "Predictive model", "plain": "A learned simulator that predicts what could happen next.", "owns": "Predicts future states, video, occupancy, contacts, or latent dynamics.", "not_owns": "Does not automatically become a policy; it may be a planner, evaluator, or data generator.", "formula": "future = f(current, action)", "papers": "PETS; Dreamer; TD-MPC; AC-WM; WAM; World Model surveys.", "links": ["MPC planner", "RL", "Policy evaluation"], "unlocks": ["Forward model", "Latent imagination", "AC-WM", "WAM"], }, { "name": "Representation", "branch": "Architecture / representation", "plain": "The language used inside the model: tokens, flows, diffusion steps, latents, or maps.", "owns": "Parameterizes observations, actions, goals, and memory.", "not_owns": "Does not by itself say what objective trained the 
robot.", "formula": "o, a, g -> tokens / latents / fields", "papers": "Decision Transformer; OpenVLA; RT-2; Diffusion Policy; V-JEPA 2.", "links": ["Training Signal", "Controller", "Future Model"], "unlocks": ["Action heads", "VLA trunk", "Sequence model", "Spatial map"], }, { "name": "Data Source", "branch": "Data regime", "plain": "Where the examples, rewards, logs, videos, or interactions come from.", "owns": "Defines the evidence available to learn from.", "not_owns": "Does not determine the algorithm alone; demos can train BC, VLA, or WAM decoders.", "formula": "D = demos / logs / rollouts / video", "papers": "Open X-Embodiment; DROID; MimicGen; human-video learning surveys.", "links": ["Training Signal", "Representation", "Evaluation"], "unlocks": ["Teleop demos", "Fixed logs", "Play", "Human video", "Cross-embodiment data"], }, { "name": "Runtime Role", "branch": "Deployment role", "plain": "What the learned thing is used for once the robot is running.", "owns": "Policy, planner, simulator, evaluator, data generator, orchestrator, or safety layer.", "not_owns": "Does not imply a particular training objective.", "formula": "component -> role in system", "papers": "SayCan; Code-as-Policies; WorldGym; DreamZero; MPC safety filters.", "links": ["Controller", "Future Model", "LLM planner"], "unlocks": ["Policy", "Planner", "Simulator", "Evaluator", "Orchestrator"], }, ], }, { "id": "middle", "label": "Stage 2 — Main Branches", "subtitle": "Then split the field into non-overlapping primary branches.", "nodes": [ { "name": "Imitation Branch", "branch": "Learning objective", "plain": "Copy expert behavior from demonstrations.", "owns": "Supervised action learning from expert actions.", "not_owns": "VLA, diffusion, and flow are action representations unless the BC loss is the main claim.", "formula": "min D[pi_theta(a|o) || pi_D(a|o)]", "papers": "BC; DAgger; ACT; Diffusion Policy; Flow Matching Policy.", "links": ["Action heads", "Teleop demos", "VLA trunk"], 
"unlocks": ["MSE-BC", "ACT", "Diffusion Policy", "Flow Matching Policy", "Tokenized BC"], }, { "name": "Reinforcement Branch", "branch": "Learning objective", "plain": "Improve behavior by maximizing reward.", "owns": "Reward optimization from interaction or imagined rollouts.", "not_owns": "A VLA fine-tuned with RL stays VLA as architecture, but RL is the objective.", "formula": "max E[sum gamma^t r_t]", "papers": "PPO; SAC; TD3; VLA-RL; VLAC.", "links": ["Simulator", "Reward model", "Safety"], "unlocks": ["Policy gradient", "Actor-critic", "VLA-RL", "Model-based RL"], }, { "name": "Offline Decision Branch", "branch": "Learning objective", "plain": "Learn from fixed logs without new robot interaction.", "owns": "Support-aware learning from static datasets.", "not_owns": "Decision Transformer is a sequence architecture that often lives here by use case.", "formula": "max Q(s, pi(s)) with pi near data", "papers": "BCQ; CQL; IQL; AWAC; TD3+BC; Decision Transformer.", "links": ["Fixed logs", "Sequence model", "Behavior constraint"], "unlocks": ["CQL", "IQL", "AWAC", "Decision Transformer"], }, { "name": "World Model Branch", "branch": "Predictive model", "plain": "Learn futures so the robot can plan, evaluate, or train in imagination.", "owns": "Prediction of future states, video, or latent dynamics.", "not_owns": "A world model is not automatically the controller.", "formula": "p(future | present, condition)", "papers": "PETS; Dreamer; TD-MPC; AC-WM; WAM; Hou et al. 
survey.", "links": ["MPC", "RL", "Policy evaluation", "Video data"], "unlocks": ["Forward dynamics", "Latent imagination", "AC-WM", "WAM", "Occupancy WM"], }, { "name": "Foundation Policy Branch", "branch": "Architecture / representation", "plain": "Use large vision-language backbones plus robot action heads.", "owns": "VLM/VLA trunk, action representation, and multi-task conditioning.", "not_owns": "Does not own BC or RL; those are the objectives used to train or tune it.", "formula": "(image, text) -> action head -> action", "papers": "RT-2; OpenVLA; pi0 / OpenPI; Gemini Robotics; GR00T; Helix.", "links": ["BC", "RL fine-tuning", "Cross-embodiment data", "World model"], "unlocks": ["Tokenized VLA", "Flow-head VLA", "Diffusion-head VLA", "Dual-system VLA"], }, { "name": "Orchestration Branch", "branch": "Deployment role", "plain": "Use language or vision-language models to choose skills, code, or constraints.", "owns": "High-level task decomposition and tool/skill selection.", "not_owns": "Does not own low-level policy learning unless it directly trains actions.", "formula": "plan = LLM(instruction, scene, tools)", "papers": "SayCan; Inner Monologue; Code-as-Policies; VoxPoser; ReKep.", "links": ["Skill library", "VLM affordance", "Classical solver"], "unlocks": ["LLM planner", "Spatial cost program", "Skill graph"], }, ], }, { "id": "fine", "label": "Stage 3 — Fine-Grained Leaves", "subtitle": "Finally inspect the leaves: equations, papers, and linked concepts.", "nodes": [ { "name": "Flow Matching Policy", "branch": "Imitation Branch", "plain": "Move noise smoothly into expert actions with a learned velocity field.", "owns": "A BC action head with a flow-matching loss.", "not_owns": "Not the whole VLA; pi0 uses this as one layer in a larger stack.", "formula": "L = E ||v_theta(a^t,o,t) - (a^1-a^0)||^2", "papers": "Conditional Flow Matching; pi0; pi0.5; OpenPI.", "links": ["BC", "VLA trunk", "ODE sampling", "Action chunking"], "unlocks": ["pi0 / OpenPI", "Fast 
continuous action generation"], }, { "name": "Diffusion Policy", "branch": "Imitation Branch", "plain": "Start from random action noise and denoise it into a robot action.", "owns": "A BC action head with a denoising objective.", "not_owns": "Not the same thing as trajectory diffusion or video diffusion.", "formula": "L = E ||epsilon - epsilon_theta(a^k,o,k)||^2", "papers": "Diffusion Policy; DP3; Equivariant Diffusion Policy; RDT-1B; Octo.", "links": ["BC", "Multimodal demos", "Action chunks", "VLA diffusion head"], "unlocks": ["Contact-rich manipulation", "Multimodal action distributions"], }, { "name": "Tokenized VLA", "branch": "Foundation Policy Branch", "plain": "Turn robot actions into tokens and predict them like words.", "owns": "Discrete or compressed action representation inside a VLA.", "not_owns": "The token loss is usually BC; the VLA branch owns the architecture, not the objective.", "formula": "L = -sum log p(a_token_j | image, text, previous tokens)", "papers": "RT-1; RT-2; OpenVLA; RT-X; pi0-FAST; Gemini Robotics.", "links": ["Tokenization", "Language model infrastructure", "Cross-embodiment data"], "unlocks": ["Generalist manipulation policy", "Shared action vocabulary"], }, { "name": "Decision Transformer", "branch": "Offline Decision Branch", "plain": "Treat a robot trajectory like a sentence and predict the next action.", "owns": "Sequence-model control over logged trajectories.", "not_owns": "Not a separate reward optimizer; it is usually supervised next-action prediction with return conditioning.", "formula": "p(a_t | return-to-go, states, past actions)", "papers": "Decision Transformer; Trajectory Transformer; Gato.", "links": ["Offline RL", "BC", "Goal conditioning", "Token sequence"], "unlocks": ["Return-conditioned control", "Long-context policy memory"], }, { "name": "AC-WM", "branch": "World Model Branch", "plain": "Ask what would happen if the robot tried a candidate action sequence.", "owns": "Action-conditioned future prediction.", 
"not_owns": "Not a policy proposal model; actions are inputs, not outputs.", "formula": "p(future | current observation, future actions)", "papers": "Dreamer; TD-MPC; Veo-Robotics; Ctrl-World; PlayWorld; WorldGym.", "links": ["MPC", "RL inside model", "Policy evaluation", "Failures and play"], "unlocks": ["Counterfactual rollouts", "Model-based policy improvement"], }, { "name": "WAM", "branch": "World Model Branch", "plain": "Imagine a successful future video and decode the actions to make it happen.", "owns": "Future-video proposal plus action decoding.", "not_owns": "Not a counterfactual simulator unless action conditioning is added.", "formula": "p(future video, actions | image, instruction)", "papers": "DreamZero; Large Video Planner; mimic-video; VideoPolicy; Unified Video Action Model.", "links": ["Video pretraining", "Best-of-N planning", "BC decoder", "VLA"], "unlocks": ["Instruction-conditioned action proposals", "Video-model policy prior"], }, { "name": "VLM Spatial Program", "branch": "Orchestration Branch", "plain": "Use a VLM to mark where the robot should act, then let a solver move there.", "owns": "Open-vocabulary spatial grounding and constraint generation.", "not_owns": "Not an end-to-end motor policy.", "formula": "cost map = VLM(scene, instruction); trajectory = argmin cost", "papers": "VoxPoser; MOKA; PIVOT; ReKep; RoboPoint.", "links": ["Classical motion planning", "Keypoints", "Affordances", "LLM planning"], "unlocks": ["Zero-shot spatial manipulation", "Inspectable constraints"], }, ], }, ] SKILL_MAP_NODE_LOOKUP = { node["name"]: (stage["label"], node) for stage in SKILL_MAP_STAGES for node in stage["nodes"] } SKILL_MAP_NODE_NAMES = list(SKILL_MAP_NODE_LOOKUP.keys()) OWNERSHIP_AUDIT = [ ("BC", "Learning objective", "match expert actions", "action head, demos, VLA trunk"), ("RL", "Learning objective", "maximize reward", "simulator, reward model, policy architecture"), ("Offline RL", "Learning objective", "optimize fixed logs with support 
constraints", "sequence model, dataset, behavior policy"), ("IRL / GAIL", "Learning objective", "infer reward or discriminator from demonstrations", "RL optimizer, demos"), ("Diffusion Policy", "Learning objective", "BC-style denoising action loss", "diffusion representation, demos, VLA head"), ("Flow Matching Policy", "Learning objective", "BC-style flow action loss", "ODE sampler, VLA trunk, cross-embodiment data"), ("Decision Transformer", "Architecture / representation", "causal sequence model for trajectories", "offline RL use case, BC loss, return tokens"), ("VLA", "Architecture / representation", "vision-language-action trunk and action interface", "BC/RL objective, dataset, action tokenizer"), ("Tokenized action", "Architecture / representation", "discrete action representation", "BC objective, VLA runtime"), ("AC-WM", "Predictive model", "action-conditioned future prediction", "MPC, RL, evaluator, play/failure data"), ("WAM", "Predictive model", "future-video proposal plus action decoding", "BC decoder, VLA prior, best-of-N planner"), ("Occupancy / latent WM", "Architecture / representation", "planning-friendly state representation inside a world model", "AC-WM, MPC, navigation/manipulation"), ("Open X / DROID-style data", "Data regime", "cross-embodiment robot data source", "VLA, BC, offline RL"), ("Human video learning", "Data regime", "unlabeled or weakly labeled human demonstration source", "WAM, latent action pretraining, visual representation"), ("SayCan / Code-as-Policies", "Deployment role", "skill orchestration at runtime", "LLM/VLM, skill library, solver"), ("VoxPoser / ReKep", "Deployment role", "spatial constraint or cost generation", "VLM grounding, classical motion planning"), ] def validate_landscape_structure() -> dict: """Return simple evidence that the skill map has single-owner nodes.""" required = {"name", "branch", "plain", "owns", "not_owns", "formula", "papers", "links", "unlocks"} names = [node["name"] for stage in SKILL_MAP_STAGES 
for node in stage["nodes"]] survey_titles = {title for title, *_ in SURVEY_PAPERS} source_titles = {src["title"] for src in SURVEY_SOURCE_INDEX} missing_fields = [ node.get("name", "") for stage in SKILL_MAP_STAGES for node in stage["nodes"] if required - set(node) ] return { "stages": len(SKILL_MAP_STAGES), "nodes": len(names), "unique_nodes": len(set(names)), "duplicate_nodes": sorted({name for name in names if names.count(name) > 1}), "missing_fields": missing_fields, "ownership_rows": len(OWNERSHIP_AUDIT), "survey_papers": len(SURVEY_PAPERS), "survey_sources": len(SURVEY_SOURCE_INDEX), "surveys_missing_sources": sorted(survey_titles - source_titles), "sources_missing_urls": sorted(src["title"] for src in SURVEY_SOURCE_INDEX if not src.get("url")), } def render_skill_map(stage_id: str = "coarse") -> str: stage = next((s for s in SKILL_MAP_STAGES if s["id"] == stage_id), SKILL_MAP_STAGES[0]) node_cards = [] for i, node in enumerate(stage["nodes"]): links = "".join(f"{html_lib.escape(link)}" for link in node["links"]) unlocks = "".join(f"
  • {html_lib.escape(item)}
  • " for item in node["unlocks"]) node_cards.append(f"""
    Lv {i + 1}
    {html_lib.escape(node['branch'])}

    {html_lib.escape(node['name'])}

    {html_lib.escape(node['plain'])}

    {html_lib.escape(node['formula'])}
    Details
    Owns:
    {html_lib.escape(node['owns'])}
    Does not own:
    {html_lib.escape(node['not_owns'])}
    Papers: {html_lib.escape(node['papers'])}
      {unlocks}
    """) return f"""
    RPG-style skill map

    Robot learning from coarse regions to fine-grained skills

    The map keeps branches clean by giving every node one primary job. Related ideas appear as links and unlocks, not duplicate children in multiple branches.

    {html_lib.escape(stage['label'])}

    {html_lib.escape(stage['subtitle'])}

    {len(stage['nodes'])} nodes
    {''.join(node_cards)}
    Test 1: one question.
    A branch is valid only if it answers one question: training signal, architecture, prediction, data, control, or runtime role.
    Test 2: one owner.
    A method gets one primary owner. VLA is architecture; BC/RL are objectives; AC-WM/WAM are predictive models.
    Test 3: links, not copies.
    Overlaps become prerequisites, modifiers, or cross-links. They are not repeated as separate children.
    Test 4: equation check.
    If two leaves have the same core equation, they belong together unless they differ by runtime role or representation.
    """


def render_skill_node(node_name: str) -> str:
    # Fall back to the first known node when the name is not in the lookup.
    stage_label, node = SKILL_MAP_NODE_LOOKUP.get(node_name, next(iter(SKILL_MAP_NODE_LOOKUP.values())))
    links = "".join(f"{html_lib.escape(link)}" for link in node["links"])
    unlocks = "".join(f"<li>{html_lib.escape(item)}</li>" for item in node["unlocks"])
    return f"""

    {html_lib.escape(node['name'])}

    {html_lib.escape(stage_label)} {html_lib.escape(node['branch'])}

    {html_lib.escape(node['plain'])}

    {html_lib.escape(node['formula'])}
    Branch owns
    {html_lib.escape(node['owns'])}
    Branch does not own
    {html_lib.escape(node['not_owns'])}
    Related papers: {html_lib.escape(node['papers'])}
    Linked words: {links}
    Unlocks
      {unlocks}
    """


def render_ownership_audit() -> str:
    result = validate_landscape_structure()
    rows = "".join(
        f"{html_lib.escape(item)}"
        f"{html_lib.escape(owner)}"
        f"{html_lib.escape(test)}"
        f"{html_lib.escape(links)}"
        for item, owner, test, links in OWNERSHIP_AUDIT
    )
    duplicate_text = ", ".join(result["duplicate_nodes"]) if result["duplicate_nodes"] else "None"
    missing_text = ", ".join(result["missing_fields"]) if result["missing_fields"] else "None"
    return f"""
    Branch ownership audit

    One primary owner, many typed links

    This table is the validation layer behind the map. Ambiguous terms get exactly one primary owner; related ideas are recorded as links rather than duplicated branches.

    Stages
    {result['stages']}
    Skill nodes
    {result['nodes']} total / {result['unique_nodes']} unique
    Duplicates
    {html_lib.escape(duplicate_text)}
    Missing fields
    {html_lib.escape(missing_text)}
    {rows}
    Ambiguous item Primary owner Ownership test Allowed links, not duplicate children
    """


def render_survey_source_index() -> str:
    rows = "".join(
        f""
        f"{html_lib.escape(src['title'])}"
        f"{html_lib.escape(src['authors'])}, {src['year']}"
        f"arXiv:{html_lib.escape(src['arxiv'])}"
        f"{html_lib.escape(src['validates'])}"
        f"{html_lib.escape(src['landscape_role'])}"
        f""
        for src in SURVEY_SOURCE_INDEX
    )
    return f"""
    Survey source index

    Which surveys support which parts of the map?

    This tab makes the landscape auditable: every major survey is tied to the branch or validation claim it supports.

    {rows}
    Survey Source What it validates Role in this landscape
    Audit rule: a survey is not used as a flat category label. It is used as evidence for one or more map layers: objective, architecture, predictive model, data regime, or runtime role.
    """


VALIDATION_RUBRIC_HTML = r"""
    Validation method

    How to prove the landscape is clean

    A clean map does not mean every paper has only one idea. It means each branch owns one kind of claim, while overlaps are represented as typed links.

    Branch question test
    Every branch must answer one sentence: what closes the loop, what trains it, what predicts the future, what represents actions/state/task, what data it uses, or what role it plays.
    Primary-owner test
    Each leaf gets one primary branch. For example, VLA is primarily architecture/data scale; BC and RL are objectives; WAM and AC-WM are world-model roles.
    Equation test
    If two methods optimize the same objective but use different wrappers, they belong under the same objective branch with different representation links.
    Cross-link test
    A paper that combines ideas should be represented as a stack, not copied into multiple branches. Example: pi0 = VLA trunk + flow-matching BC head + cross-embodiment demos.

    Canonical ownership rules

    <table>
      <thead>
        <tr><th>If the paper's main claim is...</th><th>Put it under...</th><th>Show these as links</th></tr>
      </thead>
      <tbody>
        <tr><td>A new loss, reward, or optimization procedure</td><td>Learning objective</td><td>Architecture, data, runtime role</td></tr>
        <tr><td>A new VLA/VLM trunk, tokenizer, action head, or latent representation</td><td>Architecture / representation</td><td>BC/RL objective, dataset, embodiment</td></tr>
        <tr><td>A model that predicts futures</td><td>Predictive model</td><td>Planner, policy, evaluator, video data</td></tr>
        <tr><td>A new dataset, teleop system, simulator, benchmark, or scaling recipe</td><td>Data regime / evaluation</td><td>Model trained on it, objective used</td></tr>
        <tr><td>A way to compose tools, skills, or constraints at runtime</td><td>Deployment role</td><td>Skill library, solver, VLM, safety layer</td></tr>
      </tbody>
    </table>
    Practical validation workflow: take each new paper, write its six-layer stack, pick the one layer that contains the paper's main contribution, then add the other layers as links. If two branches both seem primary, the branch definitions are too broad and should be split by question.
    """

RELATIONSHIP_GUIDE_HTML = r"""
    How to read the branches

    Paradigms are not a flat list

    A branch can be an objective, an architecture, a data regime, a wrapper, or a representation. Some branches are containers for others.

    Objective branch
    Defines the loss or optimization target. Examples: BC, PPO/SAC, Offline RL, IRL.
    Architecture branch
    Defines how variables are represented and decoded. Examples: Decision Transformer, diffusion trajectory model, VLA trunk.
    Container branch
    Can hold other branches. Examples: world model can contain RL, MPC, BC proposals, occupancy prediction, or sequence decoding.

    Important overlaps

    <table>
      <thead>
        <tr><th>Item</th><th>Primary home</th><th>Also overlaps with</th><th>Why</th></tr>
      </thead>
      <tbody>
        <tr><td>Decision Transformer</td><td>Sequence</td><td>Offline RL, BC, Goal-conditioned control</td><td>Causal Transformer architecture; the objective is next-action prediction, steered by return or goal tokens.</td></tr>
        <tr><td>VLA</td><td>BC relation</td><td>Flow, Diffusion, Tokenized BC, sometimes WAM</td><td>VLA says what trunk/data scale you use; the action head supplies the actual training loss.</td></tr>
        <tr><td>World Action Model</td><td>World Models</td><td>BC, VLA, best-of-N planning</td><td>It generates actions, but through a future-video model; it behaves like a policy proposal generator.</td></tr>
        <tr><td>Action-conditioned WM</td><td>World Models</td><td>MPC, RL, Offline RL, policy evaluation</td><td>It is a simulator: feed candidate actions, predict consequences, then plan or train inside it.</td></tr>
        <tr><td>Occupancy / latent state WM</td><td>World Models</td><td>AC-WM, MPC, navigation, manipulation</td><td>Occupancy is a representation inside the world-model branch, not a separate objective.</td></tr>
      </tbody>
    </table>

    Decision Transformer, concretely

    In long-range Decision Transformers, the architecture is a causal sequence model over trajectory tokens. A typical token stream is return-to-go, state, action, return-to-go, state, action, ... At each step the model predicts the next action token from the tokens that precede it. If you replace return-to-go with a goal, it becomes goal-conditioned sequence control. If you train only on expert demonstrations without reward tokens, it collapses toward BC. So DT is a container: it holds BC-like supervised learning, offline-RL-style return conditioning, and long-horizon memory in one architecture.
    """

WORLD_MODELS_101_HTML = r"""
    Beginner guide

    World models for robotics: WAMs vs action-conditioned world models

    Based on Anirudha Majumdar's March 17, 2026 discussion of whether robotics world models should condition on future actions.

    What is a world model?

    A world model is a learned simulator. Instead of immediately commanding the robot, it predicts what the world would look like next. The robot can then use those predictions to choose an action, train a policy, or evaluate whether a policy is likely to work.

    The central design choice

    Do actions go into the model as a condition, or come out of the model as part of the generated plan? That single choice changes what data the model can use and what kind of planning it supports.

    Actions out

    World Action Model (WAM)

    [current image + text] → [future video + actions]

    The model sees the current scene and an instruction such as place the marker in the basket. It imagines a successful video and also decodes the actions needed to produce that video.

    Examples: DreamZero, mimic-video, VideoPolicy, Unified Video Action Model, Large Video Planner, Cosmos Policy-style best-of-N planning.

    Actions in

    Action-Conditioned World Model (AC-WM)

    [current image + future actions] → [future video]

    The model receives a candidate action sequence, for example end-effector poses for the next second, and predicts what would happen if the robot executed those actions.

    Examples: Dreamer, Veo-Robotics, Ctrl-World, DreamDojo, PlayWorld, World-Gymnast, WorldGym, V-JEPA 2-style latent prediction.

    Why WAMs are attractive

    Preserves video pretraining.
    A WAM still uses image + text inputs, so adapting a pretrained video model to robotics is a smaller distribution shift.
    Easier target.
    It mostly learns what successful task execution looks like, rather than every possible consequence of arbitrary actions.
    Good action proposals.
    Generate many possible successful futures, score them with a reward model, and pick the best one.
    Cross-embodiment friendliness.
    Most of the video model can be shared across robots; only the action decoder is embodiment-specific.

    Why action conditioning is powerful

    Uses more data.
    Successes, failures, autonomous play, and rollouts from any policy can all train the model.
    Counterfactual reasoning.
    Ask: what if the robot moved left instead of right? WAMs do not naturally answer that, because actions are outputs of the model rather than inputs to it.
    RL inside the model.
    Train a policy in imagined rollouts before spending real robot time.
    Policy evaluation.
    Roll out a candidate policy in the model and estimate whether it will succeed.

    Concrete examples

    Pick-and-place.
    WAM: generate a plausible video of the cup going into the bin and decode actions. AC-WM: test ten grasp trajectories and predict which one spills or succeeds.
    Autonomous play.
    WAM: needs a task label after the fact. AC-WM: directly learns from the action sequence and resulting video, even if the attempt failed.
    Policy debugging.
    WAM: proposes a successful-looking behavior. AC-WM: runs the actual policy in imagination and reveals where it drifts or collides.
    Likely long-term direction: flexible conditioning. A single model can accept text when we want high-level proposals, actions when we need counterfactual rollouts, or both when we want task intent plus fine-grained control. This keeps WAM-style proposal quality while preserving AC-WM-style data scaling, planning, RL, and policy evaluation.
    """


def build_app() -> gr.Blocks:
    n_families = len(TREE_DATA)
    n_leaves = sum(len(t[2]) for t in TREE_DATA)
    n_papers = len(PAPERS)
    with gr.Blocks(title="Robot Learning Paradigms") as demo:
        gr.HTML(
            f""""""
        )
        legend_html = '
    ' + "".join( f"{name}" for name, (c, _) in FAMILY.items() ) + "
    "
        gr.HTML(legend_html)

        with gr.Tabs():
            # ============== TAB 0: Layered Ontology ==============
            with gr.Tab("Layered Ontology"):
                gr.HTML(render_layered_ontology())

            # ============== TAB 1: Landscape Synthesis ==============
            with gr.Tab("Landscape Synthesis"):
                gr.HTML(LANDSCAPE_SYNTHESIS_HTML)

            # ============== TAB 2: Skill Map ==============
            with gr.Tab("Skill Map"):
                with gr.Row():
                    skill_stage = gr.Radio(
                        choices=[s["label"] for s in SKILL_MAP_STAGES],
                        value=SKILL_MAP_STAGES[0]["label"],
                        label="Stage",
                        interactive=True,
                    )
                    skill_node = gr.Dropdown(
                        choices=SKILL_MAP_NODE_NAMES,
                        value=SKILL_MAP_NODE_NAMES[0],
                        label="Node detail",
                        interactive=True,
                    )
                skill_map_html = gr.HTML(render_skill_map("coarse"))
                skill_node_html = gr.HTML(render_skill_node(SKILL_MAP_NODE_NAMES[0]))

                def update_skill_stage(stage_label):
                    # Re-render the map and reset the node picker to the stage's first node.
                    stage = next((s for s in SKILL_MAP_STAGES if s["label"] == stage_label), SKILL_MAP_STAGES[0])
                    first_node = stage["nodes"][0]["name"]
                    return (
                        render_skill_map(stage["id"]),
                        gr.update(choices=[n["name"] for n in stage["nodes"]], value=first_node),
                        render_skill_node(first_node),
                    )

                skill_stage.change(
                    update_skill_stage,
                    inputs=skill_stage,
                    outputs=[skill_map_html, skill_node, skill_node_html],
                )
                skill_node.change(render_skill_node, inputs=skill_node, outputs=skill_node_html)

            # ============== TAB 3: Ownership Audit ==============
            with gr.Tab("Ownership Audit"):
                gr.HTML(render_ownership_audit())

            # ============== TAB 4: Validation Rubric ==============
            with gr.Tab("Validation Rubric"):
                gr.HTML(VALIDATION_RUBRIC_HTML)

            # ============== TAB 5: Family Projection ==============
            with gr.Tab("Family Projection"):
                family_layout_dd = gr.Dropdown(
                    choices=FAMILY_LAYOUT_CHOICES,
                    value=FAMILY_LAYOUT_CHOICES[-1],
                    label="Choose a layout prototype",
                    interactive=True,
                )
                family_layout_html = gr.HTML(render_family_layout(FAMILY_LAYOUT_CHOICES[-1]))
                family_layout_dd.change(
                    render_family_layout,
                    inputs=family_layout_dd,
                    outputs=family_layout_html,
                )

            # ============== TAB 6: Connection Trees ==============
            with gr.Tab("Connection Trees"):
                gr.HTML(render_connection_trees())

            # ============== TAB 7: Paradigm Explorer ==============
            with gr.Tab("Paradigm Explorer"):
                with gr.Row():
                    with gr.Column(scale=1, min_width=240):
                        gr.Markdown("**Pick a paradigm**")
                        family_filter_pe = gr.Dropdown(
                            choices=["All"] + list(FAMILY.keys()),
                            value="All",
                            label="Filter by family",
                            interactive=True,
                        )
                        paradigm_dd = gr.Dropdown(
                            choices=PARADIGM_NAMES,
                            value=PARADIGM_NAMES[0],
                            label="Paradigm",
                            interactive=True,
                        )
                    with gr.Column(scale=3):
                        paradigm_html = gr.HTML(
                            render_paradigm(PARADIGM_NAMES[0]),
                            elem_id="paradigm-card",
                        )

                def update_paradigm_dd(fam):
                    # Narrow the paradigm dropdown to the selected family.
                    if fam == "All":
                        names = PARADIGM_NAMES
                    else:
                        names = [p["name"] for p in PARADIGMS if p["family"] == fam]
                    if not names:
                        return gr.update(choices=[], value=None), "Nothing for that family."
                    return gr.update(choices=names, value=names[0]), render_paradigm(names[0])

                family_filter_pe.change(
                    update_paradigm_dd, inputs=family_filter_pe, outputs=[paradigm_dd, paradigm_html]
                )
                paradigm_dd.change(render_paradigm, inputs=paradigm_dd, outputs=paradigm_html)

            # ============== TAB 8: Comparison ==============
            with gr.Tab("Side-by-Side"):
                gr.Markdown(
                    "### Compare two paradigms at a glance\n"
                    "Pick any two paradigms; their core equations and intuition land next to each other.\n"
                    "The default pair contrasts the two BC heads the field has converged on."
                )
                default_a, default_b = "Flow Matching Policy", "Diffusion Policy"
                with gr.Row():
                    pick_a = gr.Dropdown(
                        choices=PARADIGM_NAMES,
                        value=default_a,
                        label="Paradigm A",
                        interactive=True,
                    )
                    pick_b = gr.Dropdown(
                        choices=PARADIGM_NAMES,
                        value=default_b,
                        label="Paradigm B",
                        interactive=True,
                    )
                cmp_html = gr.HTML(render_compare(default_a, default_b))
                pick_a.change(render_compare, inputs=[pick_a, pick_b], outputs=cmp_html)
                pick_b.change(render_compare, inputs=[pick_a, pick_b], outputs=cmp_html)

            # ============== TAB 9: Pick Your Paradigm ==============
            with gr.Tab("Pick Your Paradigm"):
                gr.Markdown(
                    "### What data + setup do you have?\n"
                    "Three questions → a short list of paradigms worth trying first."
                )
                with gr.Row():
                    q_data = gr.Radio(
                        ["Expert demos", "Reward only", "Both", "Unlabeled video", "Language + scene only"],
                        value="Expert demos",
                        label="What data do you have?",
                    )
                    q_env = gr.Radio(
                        ["Real robot only", "Simulator available", "Logged data only"],
                        value="Simulator available",
                        label="What about the environment?",
                    )
                    q_scale = gr.Radio(
                        ["Small / single task", "Large / multi-task"],
                        value="Small / single task",
                        label="Scale?",
                    )
                rec_html = gr.HTML(recommend("Expert demos", "Simulator available", "Small / single task"))
                for ctl in (q_data, q_env, q_scale):
                    ctl.change(recommend, inputs=[q_data, q_env, q_scale], outputs=rec_html)

            # ============== TAB 10: Paper Atlas ==============
            with gr.Tab(f"Paper Atlas ({n_papers}+)"):
                gr.Markdown(
                    "### The papers behind the paradigms\n"
                    "Filter by tag, year, or free-text. Tags include paradigm leaves "
                    "**plus** relation tags (`DAgger (BC relation)`, `Sim2Real (RL relation)`, "
                    "`Visual SSL (BC relation)`, `VLA — Flow / Diffusion / Tokenized head`, "
                    "`LAPA (Video-WM relation)`) and survey tags (`Survey — VLA`, "
                    "`Survey — World Models`, `Survey — Foundation Models`) so you can pull "
                    "both primary papers and landscape reviews."
                )
                with gr.Row():
                    fam_filter = gr.Dropdown(choices=FAMILY_LABELS, value="All", label="Paradigm / relation tag")
                    year_min = gr.Slider(1989, MAX_PAPER_YEAR, value=1989, step=1, label="Min year")
                    text_q = gr.Textbox(label="Search title / author")
                df_out = gr.Dataframe(
                    value=get_atlas_df("All", 1989, ""),
                    headers=["Year", "Title", "Authors", "Paradigm"],
                    interactive=False,
                    wrap=True,
                )
                for ctl in (fam_filter, year_min, text_q):
                    ctl.change(get_atlas_df, inputs=[fam_filter, year_min, text_q], outputs=df_out)

            # ============== TAB 11: Survey Sources ==============
            with gr.Tab("Survey Sources"):
                gr.HTML(render_survey_source_index())

            # ============== TAB 12: Relationship Map ==============
            with gr.Tab("Relationship Map"):
                gr.HTML(RELATIONSHIP_GUIDE_HTML)

            # ============== TAB 13: World Models 101 ==============
            with gr.Tab("World Models 101"):
                gr.HTML(WORLD_MODELS_101_HTML)

            # ============== TAB 14: The Big Picture ==============
            with gr.Tab("The Big Picture"):
                gr.Markdown(
                    """
                    ### The clean view: six layers, not one list

                    | Layer | Classical control object | Modern robot-learning object |
                    |---|---|---|
                    | **Control substrate** | PID, impedance, LQR, MPC, hybrid automata | neural policy, action chunker, skill graph, trajectory optimizer, future-video proposal generator |
                    | **Learning objective** | tracking loss, hand-designed cost, Lyapunov/robust objective | BC, RL, offline RL, IRL, preference learning, predictive self-supervision |
                    | **Predictive model** | plant model, system identification, observer | latent dynamics, AC-WM, WAM, occupancy/contact/latent state model |
                    | **Architecture** | state coordinates, linearization, basis functions | VLA trunk, diffusion/flow/tokenized head, Decision Transformer, video diffusion, JEPA latent |
                    | **Data regime** | calibrated experiments, known model, system-ID rollouts | teleop demos, fixed logs, sim, play, failures, internet video, cross-embodiment data |
                    | **Deployment role** | controller, estimator, planner, verifier | policy, planner, simulator, evaluator, data generator, LLM/VLM orchestrator |

                    ### Why the old flat taxonomy breaks

                    - **PPO, SAC, BC, IQL** are objectives / optimization procedures.
                    - **VLA** is an architecture + pretraining + data recipe, usually paired with BC-style action heads.
                    - **Decision Transformer** is a sequence architecture; it can instantiate BC, offline RL, or goal-conditioned control depending on conditioning tokens.
                    - **AC-WM and WAM** both belong under world models, but play different roles: simulator vs proposal generator.
                    - **Occupancy / latent state** is a representation inside world modeling, not a separate training objective.
                    - **Domain randomization, sim-to-real, representation pretraining, cross-embodiment transfer** are cross-cutting modifiers.

                    ### Evolution from control theory to current robot learning

                    1. **Classical feedback control →** PID, computed torque, impedance, LQR: hand-designed feedback laws and stability objectives.
                    2. **Optimal control / MPC →** known or identified dynamics plus online optimization over a horizon.
                    3. **Learning from demonstration →** replace hand-designed controllers with supervised policies from expert actions.
                    4. **Deep RL →** learn policy/value functions from reward, often in simulation with sim-to-real transfer.
                    5. **Generative policies →** ACT, diffusion, flow, and tokenized action heads model multimodal demonstrations better than MSE regression.
                    6. **Offline sequence decision models →** fixed logs become sequence data; DT, Trajectory Transformer, Diffuser blur BC and offline RL.
                    7. **Foundation-model robotics →** VLM/VLA trunks add language and semantic priors, but still need an action head and objective.
                    8. **World-model robotics →** learned simulators/future generators support planning, RL, policy evaluation, and proposal generation.
                    9. **Hybrid systems →** VLA policy + world model + MPC/RL + LLM planner + safety controller.

                    ### Practical classification rule

                    For any new method, classify it by answering:

                    1. What closes the loop at runtime?
                    2. What objective trains it?
                    3. Does it learn a future model? If yes, are actions inputs or outputs?
                    4. What architecture/representation does it use?
                    5. What data regime enables it?
                    6. What deployment role does it serve?

                    That is the cleanest way to cover current robot learning without confusing objectives, architectures, representations, and recipes.
                    """
                )

        gr.HTML(
            f"""
    Layered ontology of robot-learning paradigms · {n_leaves} paradigms · {n_papers}+ papers · 1989 → {MAX_PAPER_YEAR}
    """
        )
    return demo


if __name__ == "__main__":
    build_app().launch(
        server_name="127.0.0.1",
        server_port=7864,
        inbrowser=False,
        theme=gr.themes.Soft(primary_hue="indigo", secondary_hue="violet"),
        css=CSS,
        head=HEAD_HTML,
    )