Camais03 committed on
Commit
1ff3a4d
·
verified ·
1 Parent(s): d63e969

Update README.md

  1. README.md +197 -199
README.md CHANGED
@@ -15,9 +15,9 @@ tags:
15
  - beta
16
  ---
17
 
18
- # Crafter World Model (Beta)
19
 
20
- ## Update
21
 
22
  **Action sensitivity appears to be fixed in the current beta training setup, and the project has now moved into a beta phase.**
23
 
@@ -30,26 +30,25 @@ The main system is now reliably action-conditioned enough to expose publicly as
30
 
31
  ![Action Sensitivity Update](images/action_sensitivity_update.png)
32
 
33
- ### Current rollout behaviour
34
  <img src="images/rollout_large.gif" alt="Current rollout behaviour" width="1024">
35
 
36
  ### Training progress
37
- ![Training progress](images/wm_val_ode.png)
38
 
39
- ### Action sensitivity
40
- ![Action sensitivity](images/action_sensitivity_500k_full.png)
41
 
42
  ---
43
 
44
- ## What this repository is
45
 
46
  This repository contains my current work on an **action-conditioned world model for Crafter**, forming the first phase of a broader research agenda around:
47
 
48
- - model-based reinforcement learning
49
- - imagination-based control
50
- - long-horizon planning
51
- - sparse-reward environments
52
- - scalable world models trained on consumer hardware
53
 
54
  This project is currently best understood as a **research prototype in beta**.
55
 
@@ -57,324 +56,323 @@ It already includes a usable tokenizer, a latent-space action-conditioned world
57
 
58
  ---
59
 
60
- ## Current status
61
 
62
  The current system can:
63
 
64
- - compress Crafter observations into compact latent tokens
65
- - model future latent dynamics conditioned on actions
66
- - generate coherent multi-step rollouts
67
- - decode those rollouts back into plausible video
68
- - expose the model through an interactive imagined-game demo
69
 
70
  Earlier versions of this project could generate convincing futures without really following the supplied action sequence. That was the central bottleneck. The current setup appears to have resolved that issue well enough for public beta release.
71
 
72
  That said, the model is still not perfect. Remaining weaknesses include:
73
 
74
- - small-object confusion
75
- - some inventory and HUD detail errors
76
- - object-location drift
77
- - occasional mixing of similar sprites or structures
78
- - degradation over longer autoregressive rollouts
79
- - some remaining brittleness around rare states and rare transitions
80
 
81
  So this is a **serious, working beta research system**, not a final benchmarked product.
82
 
83
  ---
84
 
85
- ## Hardware note
86
 
87
  A major goal of this project is to show that meaningful world-model research can be done on **consumer hardware**.
88
 
89
  This work was trained on **a single RTX 3090 (24 GB)**.
90
 
91
- The setup should also be feasible on a **3060-class GPU** with smaller microbatches and **gradient accumulation**, at the cost of training speed.
92
 
93
  ---
94
 
95
- ## Project goal
96
 
97
  The immediate goal is to learn a world model that can:
98
 
99
- 1. compress Crafter observations into useful latent tokens
100
- 2. model future latent dynamics conditioned on actions
101
- 3. produce multi-step rollouts that are both visually coherent and action-faithful
102
 
103
  The longer-term goal is to use these learned dynamics for:
104
 
105
- - planning
106
- - control
107
- - reinforcement learning in imagination
108
- - eventually more general agents that can reason over imagined futures
109
 
110
  ---
111
 
112
- ## Relation to prior work
113
 
114
  This project is strongly inspired by recent scalable world-model work, especially the combination of:
115
 
116
- - causal or masked video tokenizers
117
- - latent-space dynamics models
118
- - action-conditioned rollout generation
119
- - evaluation through rollout quality and action sensitivity rather than reconstruction alone
120
 
121
  It is **inspired by Dreamer-4 style work**, but it is **not a full reproduction**, and it currently covers **only the world-modeling part of the pipeline**, not the later RL agent-training phase.
122
 
123
  It also draws from work on:
124
 
125
- - Crafter as a benchmark for sparse-reward, compositional environments
126
- - masked autoencoders as tokenizers for generative models
127
- - diffusion / shortcut-style training for latent dynamics
128
 
129
  ---
130
 
131
- ## What is included
132
 
133
  This repository currently includes code and assets for:
134
 
135
- - **Crafter data collection**
136
- - **causal MAE tokenizer pretraining**
137
- - **latent world-model pretraining**
138
- - **evaluation and diagnostics**
139
- - **interactive imagination demo / web-app deployment path**
140
- - **exported checkpoints in PyTorch, safetensors, and ONNX formats**
141
 
142
  It also includes supporting outputs such as:
143
 
144
- - rollout visualisations
145
- - validation plots
146
- - action sensitivity plots
147
- - failure-mode examples
148
- - exported checkpoints under `checkpoints/`
149
 
150
  ---
151
 
152
- ## Included checkpoints and exports
153
 
154
  The current exported files live under `checkpoints/` and include:
155
 
156
- - `mae_model.safetensors`
157
- - `mae_decode.onnx`
158
- - `world_model.safetensors`
159
- - `world_model_ema.safetensors`
160
- - `world_model.onnx`
161
 
162
  These are intended to support:
163
 
164
- - direct checkpoint download
165
- - lightweight inference experiments
166
- - Hugging Face Spaces deployment
167
- - future ONNX Runtime / browser / API-based demos
168
 
169
  ---
170
 
171
- ## Interactive demo / app
172
 
173
  This repo includes a usable interactive demo path for testing the world model as an imagined game.
174
 
175
  The basic idea is:
176
 
177
- - start from a real context window from the Crafter dataset
178
- - choose an action
179
- - predict the next latent frame with the world model
180
- - decode it through the MAE decoder
181
- - feed the prediction back into context
182
- - continue rolling forward open-loop
183
 
184
  This is not the real Crafter environment running underneath. It is the **model’s imagined continuation** of the game.
185
 
186
  The Hugging Face Space version is intended to make this easy to test without needing to run the training code locally.
187
 
188
- ### Intended controls
189
 
190
- - **Arrow keys / WASD**: movement
191
- - **Space**: interact / do
192
- - **Tab**: noop
193
- - **Shift**: sleep
194
- - **1–0**: place / craft actions
195
- - **R**: reset
196
- - **G**: save gif/json in the local demo version
197
 
198
  ---
199
 
200
- ## Training data
201
 
202
  The current beta model was trained primarily on **Crafter human expert data**.
203
 
204
- That choice was deliberate. At this stage, I wanted to maximize the density of meaningful action-conditioned transitions rather than optimize for broad coverage from random-policy play.
205
 
206
- A later stage of the project will revisit broader or more mixed data collection, including more game-agnostic or random-policy style data, but the current release is mainly built around the human expert regime.
207
 
208
  ---
209
 
210
- ## Data collection policy
211
 
212
  This repository also includes my current **Crafter data-collection policy code**, which was designed to improve action-conditioned learning by increasing the fraction of transitions where actions produce meaningful state changes.
213
 
214
  Key ideas include:
215
 
216
- - stuck detection through frame-change heuristics
217
- - forced interaction bursts when the agent appears stuck
218
- - periodic interleaving of interaction actions during movement
219
- - adaptive exploration behaviour
220
- - cleaner action classification logic to avoid action-name matching bugs
221
- - shard-based storage with episode metadata, gifs, and achievement events
222
 
223
- The motivation is simple: **if actions rarely produce visible consequences in the data, the world model has much less incentive to learn action-faithful dynamics**.
224
 
225
  ---
226
 
227
- ## Main components
228
 
229
- ## 1. Causal MAE tokenizer
230
 
231
  The tokenizer is a **causal masked autoencoder** trained on Crafter frame sequences.
232
 
233
  Main properties:
234
 
235
- - tube masking across frames
236
- - spatial self-attention within frames
237
- - periodic temporal causal attention
238
- - bottlenecked latent representation
239
- - MAE-style masked reconstruction objective
240
- - LPIPS-assisted reconstruction training
241
- - latent outputs intended for downstream world modeling, not just pretty decoding
242
 
243
  A major lesson from this project so far is that the tokenizer matters a lot more than it may seem at first.
244
 
245
  In particular:
246
 
247
- - **high masking turned out to be important**
248
- - lower masking can give cleaner-looking reconstructions while producing **worse downstream action sensitivity**
249
- - decoder quality alone is not a sufficient measure of whether the latent space is good for dynamics
250
 
251
  So although the decoder is mostly used for visualisation, the **latent space quality is still critical**, because the world model operates in that latent space.
252
 
253
  ---
254
 
255
- ## 2. Latent world model
256
 
257
  The world model is trained in latent space using an action-conditioned architecture based around a DiT-style token backbone.
258
 
259
  Current ingredients include:
260
 
261
- - action-conditioned latent prediction
262
- - shortcut-forcing style training
263
- - bucketed context / prediction-length sampling
264
- - autoregressive rollout evaluation
265
- - action-sensitivity diagnostics
266
- - EMA checkpointing
267
- - validation across multiple `(context, prediction)` regimes
268
 
269
  The world model now produces:
270
 
271
- - coherent future rollouts
272
- - much better action sensitivity than earlier versions
273
- - usable imagined-game behaviour in open loop
274
 
275
  This is the main milestone that moved the project into beta.
276
 
277
  ---
278
 
279
- ## 3. Diagnostics and evaluation
280
 
281
  I track progress with multiple diagnostics rather than relying on training loss alone.
282
 
283
  These include:
284
 
285
- - fixed-noise / denoising validation
286
- - ODE-style reconstruction validation
287
- - autoregressive rollout evaluation
288
- - action sensitivity evaluation
289
- - rollout gifs
290
- - failure-case inspection
291
- - multi-regime validation over several context/prediction bucket pairs
292
 
293
  This matters because a model can look good in one metric while still failing in the behaviour I actually care about.
294
 
295
  ---
296
 
297
- ## 4. Latent-space analysis
298
 
299
  I am also investigating better ways to reason about what makes a **good latent space** for downstream world modeling.
300
 
301
  The current exploratory tooling includes:
302
 
303
- - **UMAP** visualisation of latent structure
304
- - **GMM** complexity analysis over latent features
305
- - checkpoint-to-checkpoint latent comparisons
306
 
307
  At the moment this remains exploratory. I do not yet think I fully understand how to interpret these plots in a way that is directly actionable for world-model training, but I think it is an important direction.
308
 
309
  There is space in this repo for that analysis to become much more systematic over time.
310
 
311
- ### Example latent analysis
312
- ![UMAP of latent space](images/umap.png)
313
 
314
- ![GMM latent complexity](images/gmm_curve.png)
 
 
 
 
 
 
 
315
 
316
  ---
317
 
318
- ## Current strengths
319
 
320
  The current beta model already shows several encouraging properties:
321
 
322
- - coherent latent rollouts
323
- - meaningful action conditioning
324
- - usable open-loop imagination
325
- - multi-step rollout generation
326
- - stable training on a single consumer GPU
327
- - a clear path to demo deployment through Hugging Face Spaces
328
 
329
  ---
330
 
331
- ## Current failure modes
332
 
333
  The project is still very much an active research system, and several failure modes remain important.
334
 
335
- ### World-model failure modes
336
 
337
  Typical world-model failures include:
338
 
339
- - staying too stationary in some cases
340
- - gradual object-position drift across rollout steps
341
- - small rare details disappearing
342
- - rare entities or tiles becoming blurry or unstable
343
- - longer-horizon compounding error
344
-
345
- ### Tokenizer / decoder failure modes
346
-
347
- Typical tokenizer-related issues include:
348
 
349
- - inventory number mistakes
350
- - arrows or other small details being missed
351
- - confusion between furnaces, crafting tables, and similar sprites
352
- - imperfect preservation of object identity
353
- - occasional loss of fine HUD detail
354
 
355
- ### Example failure cases
356
- ![Failure mode example 1](images/failure_mode_1.png)
357
 
358
- ![Failure mode example 2](images/failure_mode_2.png)
359
 
360
- ### Action-space comparison
361
- ![Bad vs improved action sensitivity](images/action_space_bad_vs_fixed.png)
362
 
363
  These examples are included deliberately. I do not want the repo to present only the successes. The failure modes are a major part of the research story.
364
 
365
  ---
366
 
367
- ## Why the tokenizer matters so much
368
 
369
  One of the clearest takeaways from this work is that a tokenizer can look visually decent while still being a poor substrate for dynamics learning.
370
 
371
  A world model does not need the prettiest decoder output. It needs latents that preserve the distinctions required for:
372
 
373
- - causality
374
- - controllability
375
- - object identity
376
- - local change
377
- - action consequence
378
 
379
  That is why masking level, bottleneck structure, and latent organisation matter so much here.
380
 
@@ -382,15 +380,15 @@ My current view is that **a good world-model tokenizer is not just a compression
382
 
383
  ---
384
 
385
- ## Repository state
386
 
387
  A few caveats up front:
388
 
389
- - the training code is still a bit messy
390
- - some scripts were written for active notebook-based iteration
391
- - local paths may need editing before reuse
392
- - there are still older comments, experimental branches, and rough edges
393
- - names and interfaces may change as the project is cleaned up
394
 
395
  I am still sharing it because the core technical direction is now clear and useful.
396
 
@@ -398,59 +396,59 @@ Cleaning up the code for a more polished release is one of the next major tasks.
398
 
399
  ---
400
 
401
- ## Intended direction
402
 
403
  My aim is for this repository to become a strong base for other researchers who want to work on:
404
 
405
- - world models
406
- - latent dynamics
407
- - imagination-based planning
408
- - action-conditioned generative models
409
- - model-based RL on consumer hardware
410
 
411
  Over time I want this project to include:
412
 
413
- - cleaner training scripts
414
- - clearer explanations of each component
415
- - more structured ablations
416
- - better evaluation tools
417
- - fuller reproduction instructions
418
- - eventual downstream agent-training in imagination
419
 
420
  ---
421
 
422
- ## Scope of this release
423
 
424
  This release should be understood as:
425
 
426
- - a **beta research release**
427
- - a **working action-conditioned world model**
428
- - a **portfolio / research artifact**
429
- - a **foundation for future planning and RL work**
430
 
431
  It should **not** be understood as:
432
 
433
- - a polished library
434
- - a final benchmark result
435
- - a full Dreamer-4 reproduction
436
- - a complete end-to-end agent-training system
437
 
438
  ---
439
 
440
- ## If you want to explore the project
441
 
442
  Good places to start are:
443
 
444
- - the exported checkpoints under `checkpoints/`
445
- - the demo / app
446
- - the tokenizer training code
447
- - the world-model training code
448
- - the validation plots and rollout gifs
449
- - the failure-mode examples
450
 
451
  ---
452
 
453
- ## Acknowledgements
454
 
455
  This project was strongly influenced by several pieces of prior work:
456
 
@@ -464,12 +462,12 @@ Any mistakes, implementation choices, and deviations from the referenced work ar
464
 
465
  ---
466
 
467
- ## Citation
468
 
469
  If this repository is useful to your work, please cite the repository and the relevant upstream papers.
470
 
471
  ---
472
 
473
- ## Status
474
 
475
  **Beta. Active research. Action sensitivity fixed in the current setup, with further training, testing, cleanup, and longer-horizon improvement still in progress.**
 
15
  - beta
16
  ---
17
 
18
+ # Crafter World Model (Beta):
19
 
20
+ ## Update:
21
 
22
  **Action sensitivity appears to be fixed in the current beta training setup, and the project has now moved into a beta phase.**
23
 
 
30
 
31
  ![Action Sensitivity Update](images/action_sensitivity_update.png)
32
 
33
+ ### Current rollout behaviour:
34
  <img src="images/rollout_large.gif" alt="Current rollout behaviour" width="1024">
35
 
36
  ### Training progress
37
+ ![ODE progress](images/wm_val_ode.png)
38
 
39
+ ![Rollout progress](images/rollout_curves.png)
 
40
 
41
  ---
42
 
43
+ ## What this repository is:
44
 
45
  This repository contains my current work on an **action-conditioned world model for Crafter**, forming the first phase of a broader research agenda around:
46
 
47
+ - model-based reinforcement learning.
48
+ - imagination-based control.
49
+ - long-horizon planning.
50
+ - sparse-reward environments.
51
+ - scalable world models trained on consumer hardware.
52
 
53
  This project is currently best understood as a **research prototype in beta**.
54
 
 
56
 
57
  ---
58
 
59
+ ## Current status:
60
 
61
  The current system can:
62
 
63
+ - compress Crafter observations into compact latent tokens.
64
+ - model future latent dynamics conditioned on actions.
65
+ - generate coherent multi-step rollouts.
66
+ - decode those rollouts back into plausible video.
67
+ - expose the model through an interactive imagined-game demo.
68
 
69
  Earlier versions of this project could generate convincing futures without really following the supplied action sequence. That was the central bottleneck. The current setup appears to have resolved that issue well enough for public beta release.
70
 
71
  That said, the model is still not perfect. Remaining weaknesses include:
72
 
73
+ - small-object confusion.
74
+ - some inventory and HUD detail errors.
75
+ - considerable object-location drift.
76
+ - mixing of similar sprites or structures.
77
+ - degradation over longer autoregressive rollouts (a rollout can sometimes end with the player stuck, surrounded by stone).
 
78
 
79
  So this is a **serious, working beta research system**, not a final benchmarked product.
80
 
81
  ---
82
 
83
+ ## Hardware note:
84
 
85
  A major goal of this project is to show that meaningful world-model research can be done on **consumer hardware**.
86
 
87
  This work was trained on **a single RTX 3090 (24 GB)**.
88
 
89
+ The setup should also be feasible on a **3060-class (12 GB) GPU** with smaller microbatches, at the cost of training speed.
90
 
91
  ---
92
 
93
+ ## Project goal:
94
 
95
  The immediate goal is to learn a world model that can:
96
 
97
+ 1. compress Crafter observations into useful latent tokens.
98
+ 2. model future latent dynamics conditioned on actions.
99
+ 3. produce multi-step rollouts that are both visually coherent and action-faithful.
100
 
101
  The longer-term goal is to use these learned dynamics for:
102
 
103
+ - planning.
104
+ - control.
105
+ - reinforcement learning in imagination.
106
+ - eventually more general agents that can reason over imagined futures.
107
 
108
  ---
109
 
110
+ ## Relation to prior work:
111
 
112
  This project is strongly inspired by recent scalable world-model work, especially the combination of:
113
 
114
+ - causal or masked video tokenizers.
115
+ - latent-space dynamics models.
116
+ - action-conditioned rollout generation.
117
+ - evaluation through rollout quality and action sensitivity rather than reconstruction alone.
118
 
119
  It is **inspired by Dreamer-4 style work**, but it is **not a full reproduction**, and it currently covers **only the world-modeling part of the pipeline**, not the later RL agent-training phase.
120
 
121
  It also draws from work on:
122
 
123
+ - Crafter as a benchmark for sparse-reward, compositional environments.
124
+ - masked autoencoders as tokenizers for generative models.
125
+ - diffusion / shortcut-style training for latent dynamics.
126
 
127
  ---
128
 
129
+ ## What is included:
130
 
131
  This repository currently includes code and assets for:
132
 
133
+ - **Crafter data collection**.
134
+ - **causal MAE tokenizer pretraining**.
135
+ - **latent world-model pretraining**.
136
+ - **evaluation and diagnostics**.
137
+ - **interactive imagination demo / web-app deployment path**.
138
+ - **exported checkpoints in PyTorch, safetensors, and ONNX formats**.
139
 
140
  It also includes supporting outputs such as:
141
 
142
+ - rollout visualisations.
143
+ - validation plots.
144
+ - action sensitivity plots.
145
+ - failure-mode examples.
146
+ - exported checkpoints under `checkpoints/`.
147
 
148
  ---
149
 
150
+ ## Included checkpoints and exports:
151
 
152
  The current exported files live under `checkpoints/` and include:
153
 
154
+ - `mae_model.safetensors`.
155
+ - `mae_decode.onnx`.
156
+ - `world_model.safetensors`.
157
+ - `world_model_ema.safetensors`.
158
+ - `world_model.onnx`.
159
 
160
  These are intended to support:
161
 
162
+ - direct checkpoint download.
163
+ - lightweight inference experiments.
164
+ - Hugging Face Spaces deployment.
165
+ - future ONNX Runtime / browser / API-based demos.
166
 
167
  ---
168
 
169
+ ## Interactive demo / app:
170
 
171
  This repo includes a usable interactive demo path for testing the world model as an imagined game.
172
 
173
  The basic idea is:
174
 
175
+ - start from a real context window from the Crafter dataset.
176
+ - choose an action.
177
+ - predict the next latent frame with the world model.
178
+ - decode it through the MAE decoder.
179
+ - feed the prediction back into context.
180
+ - continue rolling forward open-loop.
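The loop above can be sketched roughly as follows. This is a minimal illustration, not the repository's demo code: `predict_next` and `decode` are hypothetical stand-ins for the world model and the MAE decoder, latents are plain lists of floats, and `context_len` is an assumed name for the context window size.

```python
from collections import deque
from typing import Callable, List, Sequence

Latent = List[float]


def open_loop_rollout(
    context: Sequence[Latent],
    actions: Sequence[int],
    predict_next: Callable[[List[Latent], int], Latent],
    decode: Callable[[Latent], object],
    context_len: int,
) -> List[object]:
    """Roll the model forward, feeding its own predictions back as context."""
    # Sliding window of the most recent latents; the oldest fall off the left.
    window = deque(context, maxlen=context_len)
    frames = []
    for action in actions:
        next_latent = predict_next(list(window), action)  # world-model step
        frames.append(decode(next_latent))  # decoding is for display only
        window.append(next_latent)  # the prediction becomes context
    return frames


# Toy stand-ins (assumptions for illustration only).
def toy_predict(window: List[Latent], action: int) -> Latent:
    return [window[-1][0] + action]


def toy_decode(latent: Latent) -> float:
    return latent[0]


frames = open_loop_rollout([[0.0]], [1, 1, 2], toy_predict, toy_decode, context_len=4)
print(frames)
```

Because every new context entry is a prediction rather than a real observation, errors compound over the rollout, which is why long open-loop horizons are the hard case.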
181
 
182
  This is not the real Crafter environment running underneath. It is the **model’s imagined continuation** of the game.
183
 
184
  The Hugging Face Space version is intended to make this easy to test without needing to run the training code locally.
185
 
186
+ ### Intended controls:
187
 
188
+ - **Arrow keys / WASD**: movement.
189
+ - **Space**: interact / do.
190
+ - **Tab**: noop.
191
+ - **Shift**: sleep.
192
+ - **1–0**: place / craft actions.
193
+ - **R**: reset.
194
+ - **G**: save gif/json in the local demo version.
195
 
196
  ---
197
 
198
+ ## Training data:
199
 
200
  The current beta model was trained primarily on **Crafter human expert data**.
201
 
202
+ That choice was deliberate. At this stage, I wanted to maximize the density of meaningful action-conditioned transitions. The full plan is to make this setup fairly game-agnostic: start from random policies, train in imagination, gather new data with the improved policy, and retrain, forming a feedback loop.
203
 
204
+ A later stage of the project will revisit broader or more mixed data collection, including more game-agnostic or random-policy style data, but the current release is mainly built around the human expert regime. The current dataset is only 100 episodes of human expert gameplay, so it is not a huge dataset by any means.
205
 
206
  ---
207
 
208
+ ## Data collection policy:
209
 
210
  This repository also includes my current **Crafter data-collection policy code**, which was designed to improve action-conditioned learning by increasing the fraction of transitions where actions produce meaningful state changes.
211
 
212
  Key ideas include:
213
 
214
+ - stuck detection through frame-change heuristics.
215
+ - forced interaction bursts when the agent appears stuck.
216
+ - periodic interleaving of interaction actions during movement.
217
+ - adaptive exploration behaviour.
218
+ - cleaner action classification logic to avoid action-name matching bugs.
219
+ - shard-based storage with episode metadata, gifs, and achievement events.
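As a rough illustration of the first two ideas, stuck detection from frame changes can be sketched like this. It is a minimal sketch under stated assumptions, not the repository's collection policy: the class name, `change_threshold`, and `patience` are hypothetical, and frames are flat sequences of pixel intensities.

```python
from typing import List, Optional, Sequence


class StuckDetector:
    """Flags the agent as stuck after several near-identical frames in a row."""

    def __init__(self, change_threshold: float = 1.0, patience: int = 3) -> None:
        self.change_threshold = change_threshold  # mean abs pixel change below this counts as "no change"
        self.patience = patience  # consecutive low-change steps before reporting stuck
        self._previous: Optional[List[int]] = None
        self._still_steps = 0

    def update(self, frame: Sequence[int]) -> bool:
        """Record a new frame and return True if the agent looks stuck."""
        if self._previous is not None:
            total = sum(abs(a - b) for a, b in zip(frame, self._previous))
            mean_change = total / len(frame)
            if mean_change < self.change_threshold:
                self._still_steps += 1
            else:
                self._still_steps = 0  # a visible change resets the counter
        self._previous = list(frame)
        return self._still_steps >= self.patience


detector = StuckDetector(change_threshold=1.0, patience=2)
still = [0, 0, 0, 0]
moved = [10, 10, 10, 10]
flags = [detector.update(f) for f in (still, still, still, moved)]
print(flags)
```

A collection policy could respond to a `True` flag by switching to a short burst of interaction actions before resuming movement.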
220
 
221
+ This was all mainly to improve achievement coverage while keeping the loop game-agnostic (explore, try to interact, try to craft). It gets 10/22 achievements on average, with some shards reaching 14.
222
 
223
  ---
224
 
225
+ ## Main components:
226
 
227
+ ## 1. Causal MAE tokenizer:
228
 
229
  The tokenizer is a **causal masked autoencoder** trained on Crafter frame sequences.
230
 
231
  Main properties:
232
 
233
+ - independent masking across frames (tube masking with higher masking ratios is a possible future experiment).
234
+ - spatial self-attention within frames.
235
+ - periodic temporal causal attention.
236
+ - bottlenecked latent representation.
237
+ - MAE-style masked reconstruction objective.
238
+ - LPIPS-assisted reconstruction training.
239
+ - latent outputs intended for downstream world modeling, not just pretty decoding.
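To make the difference between independent and tube masking concrete, here is a small sketch. It is an illustration under assumptions rather than the tokenizer's actual code: the function names and the boolean-list mask layout are hypothetical, and `True` means the patch is hidden from the encoder.

```python
import random
from typing import List


def independent_masks(
    num_frames: int, num_patches: int, mask_ratio: float, rng: random.Random
) -> List[List[bool]]:
    """Draw a fresh random set of hidden patches for every frame."""
    num_masked = round(mask_ratio * num_patches)
    masks = []
    for _ in range(num_frames):
        hidden = set(rng.sample(range(num_patches), num_masked))
        masks.append([i in hidden for i in range(num_patches)])
    return masks


def tube_masks(
    num_frames: int, num_patches: int, mask_ratio: float, rng: random.Random
) -> List[List[bool]]:
    """Draw one set of hidden patches and reuse it for every frame (a 'tube')."""
    num_masked = round(mask_ratio * num_patches)
    hidden = set(rng.sample(range(num_patches), num_masked))
    one_frame = [i in hidden for i in range(num_patches)]
    return [list(one_frame) for _ in range(num_frames)]


rng = random.Random(0)
ind = independent_masks(num_frames=4, num_patches=16, mask_ratio=0.75, rng=rng)
tube = tube_masks(num_frames=4, num_patches=16, mask_ratio=0.75, rng=rng)
print(sum(ind[0]), sum(tube[0]))  # hidden patches per frame
```

With independent masks, a patch hidden in one frame is often visible in a neighbouring frame. With tube masks the same locations stay hidden across the whole clip, so nothing can be copied from a neighbouring frame at the same location.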
240
 
241
  A major lesson from this project so far is that the tokenizer matters a lot more than it may seem at first.
242
 
243
  In particular:
244
 
245
+ - **high masking turned out to be important**.
246
+ - lower masking can give cleaner-looking reconstructions while producing **worse downstream action sensitivity**.
247
+ - decoder quality alone is not a sufficient measure of whether the latent space is good for dynamics.
248
 
249
  So although the decoder is mostly used for visualisation, the **latent space quality is still critical**, because the world model operates in that latent space.
250
 
251
  ---
252
 
253
+ ## 2. Latent world model:
254
 
255
  The world model is trained in latent space using an action-conditioned architecture based around a DiT-style token backbone.
256
 
257
  Current ingredients include:
258
 
259
+ - action-conditioned latent prediction.
260
+ - shortcut-forcing style training.
261
+ - bucketed context / prediction-length sampling.
262
+ - autoregressive rollout evaluation.
263
+ - action-sensitivity diagnostics.
264
+ - EMA checkpointing.
265
+ - validation across multiple `(context, prediction)` regimes.
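For readers unfamiliar with EMA checkpointing, the update rule is simple enough to sketch. This is a generic illustration, not the training script: parameters are plain floats keyed by name instead of tensors, and the `decay` value is an assumed placeholder rather than the setting used here.

```python
from typing import Dict


def ema_update(
    ema: Dict[str, float], params: Dict[str, float], decay: float = 0.999
) -> Dict[str, float]:
    """Return new EMA weights: decay * old_ema + (1 - decay) * current_params."""
    return {
        name: decay * ema[name] + (1.0 - decay) * params[name]
        for name in params
    }


# The EMA copy trails the raw weights and smooths out step-to-step noise.
ema = {"w": 1.0}
for raw in (0.0, 0.0, 0.0):
    ema = ema_update(ema, {"w": raw}, decay=0.9)
print(ema["w"])
```

The smoothed copy would be the one saved separately, as with `world_model_ema.safetensors` alongside `world_model.safetensors`.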
266
 
267
  The world model now produces:
268
 
269
+ - coherent future rollouts.
270
+ - much better action sensitivity than earlier versions.
271
+ - usable imagined-game behaviour in open loop.
272
 
273
  This is the main milestone that moved the project into beta.
274
 
275
  ---
276
 
277
+ ## 3. Diagnostics and evaluation:
278
 
279
  I track progress with multiple diagnostics rather than relying on training loss alone.
280
 
281
  These include:
282
 
283
+ - fixed-noise / denoising validation.
284
+ - ODE-style reconstruction validation.
285
+ - autoregressive rollout evaluation.
286
+ - action sensitivity evaluation.
287
+ - rollout gifs.
288
+ - failure-case inspection.
289
+ - multi-regime validation over several context/prediction bucket pairs.
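One simple way to quantify action sensitivity is to predict from the same context under two different actions and measure how far apart the predictions land. The sketch below assumes that definition; it is not necessarily the metric behind the plots in this repo, and `predict_next` is a hypothetical stand-in for the world model.

```python
import math
from typing import Callable, List, Sequence

Latent = List[float]


def action_sensitivity(
    context: List[Latent],
    actions: Sequence[int],
    alt_actions: Sequence[int],
    predict_next: Callable[[List[Latent], int], Latent],
) -> float:
    """Mean L2 distance between one-step predictions under paired actions."""
    distances = []
    for action, alt in zip(actions, alt_actions):
        pred = predict_next(context, action)
        alt_pred = predict_next(context, alt)
        distances.append(math.dist(pred, alt_pred))
    return sum(distances) / len(distances)


# Toy models (assumptions): one ignores the action, one uses it.
def ignores_action(context: List[Latent], action: int) -> Latent:
    return list(context[-1])


def uses_action(context: List[Latent], action: int) -> Latent:
    return [x + action for x in context[-1]]


ctx = [[0.0, 0.0]]
print(action_sensitivity(ctx, [1, 2], [3, 2], ignores_action))
print(action_sensitivity(ctx, [1, 2], [3, 2], uses_action))
```

A model that produces convincing futures while ignoring the supplied actions scores zero on a measure like this, however good its frames look.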
290
 
291
  This matters because a model can look good in one metric while still failing in the behaviour I actually care about.
292
 
293
  ---
294
 
295
+ ## 4. Latent-space analysis:
296
 
297
  I am also investigating better ways to reason about what makes a **good latent space** for downstream world modeling.
298
 
299
  The current exploratory tooling includes:
300
 
301
+ - **UMAP** visualisation of latent structure.
302
+ - **GMM** complexity analysis over latent features.
303
+ - checkpoint-to-checkpoint latent comparisons.
304
 
305
  At the moment this remains exploratory. I do not yet think I fully understand how to interpret these plots in a way that is directly actionable for world-model training, but I think it is an important direction.
306
 
307
  There is space in this repo for that analysis to become much more systematic over time.
308
 
309
+ Below are example UMAPs of latent spaces known to have good and bad action sensitivity. I am not sure these are the best tools for probing such spaces, though:
 
310
 
311
+ ### Example latent analysis:
312
+ ![UMAP of good latent space](images/umap_good.png)
313
+
314
+ ![UMAP of bad latent space](images/umap_good.png)
315
+
316
+ ![UMAP of good latent space with examples](images/umap_good_examples.png)
317
+
318
+ I need to go over these analyses properly. I think the GMM is currently set up incorrectly, and the UMAPs are not really comparable because they are computed on different splits.
319
 
320
  ---
321
 
322
+ ## Current strengths:
323
 
324
  The current beta model already shows several encouraging properties:
325
 
326
+ - coherent latent rollouts.
327
+ - meaningful action conditioning.
328
+ - usable open-loop imagination.
329
+ - multi-step rollout generation.
330
+ - stable training on a single consumer GPU.
331
+ - a clear path to demo deployment through Hugging Face Spaces.
332
 
333
  ---
334
 
335
+ ## Current failure modes:
336
 
337
  The project is still very much an active research system, and several failure modes remain important.
338
 
339
+ ### Failure modes:
340
 
341
  Typical world-model failures include:
342
 
343
+ - Snapping away from the chosen direction.
344
+ - Object-position drift across rollout steps.
345
+ - Poor memory of space (the model can lock you in a wall of stone).
346
+ - Rare details disappearing (a tokenizer issue; I am not sure whether it comes from the encoder or the decoder).
347
+ - NPCs and tiles becoming blurry or unstable and/or swapping places (moving sand/stone or coal).
348
+ - Confusion between furnaces, crafting tables, and similar sprites.
349
+ - Imperfect preservation of object identity.
350
+ - Occasional loss of fine HUD detail.
 
351
 
352
+ ### Example failure cases:
353
+ ![Failure mode arrows and npc swap](images/arrows_npc_swap_issues.png)
 
 
 
354
 
355
+ ![Failure mode enemy, furnace and tree blurring](images/enemy_furnace_tree_issues.png)
 
356
 
357
+ ![Failure mode furnace and crafting issues](images/furnace_crafting_issues.png)
358
 
359
+ ![Failure mode action following and object position](images/action_following_and_object_position.gif)
 
360
 
361
  These examples are included deliberately. I do not want the repo to present only the successes. The failure modes are a major part of the research story.
362
 
363
  ---
364
 
365
+ ## Why the tokenizer matters so much:
366
 
367
  One of the clearest takeaways from this work is that a tokenizer can look visually decent while still being a poor substrate for dynamics learning.
368
 
369
  A world model does not need the prettiest decoder output. It needs latents that preserve the distinctions required for:
370
 
371
+ - causality.
372
+ - controllability.
373
+ - object identity.
374
+ - local change.
375
+ - action consequence.
376
 
377
  That is why masking level, bottleneck structure, and latent organisation matter so much here.
378
 
 
380
 
381
  ---
382
 
383
+ ## Repository state:
384
 
385
  A few caveats up front:
386
 
387
+ - the training code is still a bit messy.
388
+ - some scripts were written for active notebook-based iteration.
389
+ - local paths may need editing before reuse.
390
+ - there are still older comments, experimental branches, and rough edges.
391
+ - names and interfaces may change as the project is cleaned up.
392
 
393
  I am still sharing it because the core technical direction is now clear and useful.
394
 
 
396
 
397
  ---
398
 
399
+ ## Intended direction:
400
 
401
  My aim is for this repository to become a strong base for other researchers who want to work on:
402
 
403
+ - world models.
404
+ - latent dynamics.
405
+ - imagination-based planning.
406
+ - action-conditioned generative models.
407
+ - model-based RL on consumer hardware.
408
 
409
  Over time I want this project to include:
410
 
411
+ - cleaner training scripts.
412
+ - clearer explanations of each component.
413
+ - more structured ablations.
414
+ - better evaluation tools.
415
+ - fuller reproduction instructions.
416
+ - eventual downstream agent-training in imagination.
417
 
418
  ---
419
 
420
+ ## Scope of this release:
421
 
422
  This release should be understood as:
423
 
424
+ - a **beta research release**.
425
+ - a **working action-conditioned world model**.
426
+ - a **portfolio / research artifact**.
427
+ - a **foundation for future planning and RL work**.
428
 
429
  It should **not** be understood as:
430
 
431
+ - a polished library.
432
+ - a final benchmark result.
433
+ - a full Dreamer-4 reproduction.
434
+ - a complete end-to-end agent-training system.
435
 
436
  ---
437
 
438
+ ## If you want to explore the project:
439
 
440
  Good places to start are:
441
 
442
+ - the exported checkpoints under `checkpoints/`.
443
+ - the demo / app.
444
+ - the tokenizer training code.
445
+ - the world-model training code.
446
+ - the validation plots and rollout gifs.
447
+ - the failure-mode examples.
448
 
449
  ---
450
 
451
+ ## Acknowledgements:
452
 
453
  This project was strongly influenced by several pieces of prior work:
454
 
 
462
 
463
  ---
464
 
465
+ ## Citation:
466
 
467
  If this repository is useful to your work, please cite the repository and the relevant upstream papers.
468
 
469
  ---
470
 
471
+ ## Status:
472
 
473
  **Beta. Active research. Action sensitivity fixed in the current setup, with further training, testing, cleanup, and longer-horizon improvement still in progress.**