blimp-textworld-standard-q8

Standard full-history RL on TextWorld q8.

This is a full-parameter RL fine-tuned checkpoint, not a LoRA adapter.

Base model: Qwen/Qwen3-1.7B
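
Because this is a full checkpoint rather than an adapter, it should load directly through the standard `transformers` causal-LM classes, with no PEFT step. The sketch below is a minimal, untested example. It assumes the repo id `andthattoo/blimp-textworld-standard-q8`, that tokenizer files ship alongside the weights, and that `transformers` and `torch` are installed. `load_checkpoint` is a hypothetical helper name, not part of the BLiMP repo.

```python
# Minimal loading sketch. REPO_ID and load_checkpoint are illustrative
# assumptions, not names taken from the BLiMP codebase.
REPO_ID = "andthattoo/blimp-textworld-standard-q8"


def load_checkpoint(repo_id: str = REPO_ID):
    """Load the full fine-tuned weights and tokenizer (no LoRA/PEFT merge)."""
    # Imported lazily so this file can be imported without the heavy deps.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: tokenizer files are present in the same repo.
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # The weights are stored as BF16 safetensors, so load them in that dtype.
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, torch_dtype=torch.bfloat16
    )
    model.eval()
    return model, tokenizer


if __name__ == "__main__":
    model, tokenizer = load_checkpoint()
    print(type(model).__name__)
```

How observations and actions are formatted into a prompt for TextWorld is defined by the GitHub repo, not by this snippet.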

Final held-out TextWorld q8 eval, 32 episodes:

  • untrained Qwen3-1.7B: success 0.375, mean steps 36.59
  • standard full-history RL (this checkpoint): success 0.375, mean steps 35.375
  • BLiMP block-memory RL: success 0.53125, mean steps 33.25
  • BLiMP + ECHO/score: success 0.5, mean steps 33.71875
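
With only 32 episodes, each success rate above maps to a whole number of solved episodes. The short sketch below does that arithmetic using only the figures listed; the variable names are illustrative.

```python
# Convert the reported success rates into solved-episode counts.
EPISODES = 32

success_rate = {
    "untrained Qwen3-1.7B": 0.375,
    "standard full-history RL": 0.375,
    "BLiMP block-memory RL": 0.53125,
    "BLiMP + ECHO/score": 0.5,
}

# rate * EPISODES is an exact integer for every row, so round() is safe.
solved = {name: round(rate * EPISODES) for name, rate in success_rate.items()}

for name, count in solved.items():
    print(f"{name}: {count}/{EPISODES} episodes solved")
```

One episode is worth 3.125 percentage points here, so the gap between standard full-history RL (12 of 32) and BLiMP block-memory RL (17 of 32) is five episodes.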

GitHub repo: https://github.com/andthattoo/blimp

Weights: Safetensors, model size 2B params, tensor type BF16
