---
library_name: transformers
base_model: Qwen/Qwen3-1.7B
tags:
- blimp
- textworld
- reinforcement-learning
- qwen3
---

# blimp-textworld-standard-q8

Standard full-history RL on TextWorld q8.

This is a full-parameter RL fine-tuned checkpoint, not a LoRA adapter.

Base model: `Qwen/Qwen3-1.7B`

Final held-out TextWorld q8 eval, 32 episodes:

- untrained Qwen3-1.7B: success 0.375, mean steps 36.59
- standard full-history RL: success 0.375, mean steps 35.375
- BLiMP block-memory RL: success 0.53125, mean steps 33.25
- BLiMP + ECHO/score: success 0.5, mean steps 33.71875

GitHub repo: https://github.com/andthattoo/blimp
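
To make the eval numbers easier to compare, here is a minimal sketch that converts each success rate into a solved-episode count and reports the change against the untrained baseline. It assumes that "success" is the fraction of the 32 held-out episodes that were solved, which the card does not state explicitly. The names `results`, `solved_episodes`, and `summary` are illustrative and do not come from the BLiMP repo.

```python
# Held-out TextWorld q8 results copied from this card (32 episodes per run).
# Each entry is (success rate, mean steps).
EPISODES = 32
results = {
    "untrained Qwen3-1.7B": (0.375, 36.59),
    "standard full-history RL": (0.375, 35.375),
    "BLiMP block-memory RL": (0.53125, 33.25),
    "BLiMP + ECHO/score": (0.5, 33.71875),
}


def solved_episodes(success_rate, episodes=EPISODES):
    """Convert a success rate into a whole number of solved episodes.

    Assumption: success is solved episodes divided by total episodes.
    """
    return round(success_rate * episodes)


baseline_success, baseline_steps = results["untrained Qwen3-1.7B"]

# Per-run summary: solved count plus deltas relative to the untrained model.
summary = {}
for name, (success, steps) in results.items():
    summary[name] = {
        "solved": solved_episodes(success),
        "success_delta": success - baseline_success,
        "steps_delta": steps - baseline_steps,
    }

for name, row in summary.items():
    print(
        f"{name}: {row['solved']}/{EPISODES} solved, "
        f"success {row['success_delta']:+.5f}, "
        f"mean steps {row['steps_delta']:+.3f} vs. untrained"
    )
```

Under that assumption, this checkpoint (standard full-history RL) solves the same number of episodes as the untrained model and needs slightly fewer steps on average, while both BLiMP variants solve more episodes in fewer steps.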