---
library_name: transformers
base_model: Qwen/Qwen3-1.7B
tags:
- blimp
- textworld
- reinforcement-learning
- qwen3
---

# blimp-textworld-standard-q8

Standard full-history RL on TextWorld q8.

This is a full-parameter RL fine-tuned checkpoint, not a LoRA adapter.

Base model: `Qwen/Qwen3-1.7B`

Final held-out TextWorld q8 eval, 32 episodes:

- untrained Qwen3-1.7B: success 0.375, mean steps 36.59
- standard full-history RL: success 0.375, mean steps 35.375
- BLiMP block-memory RL: success 0.53125, mean steps 33.25
- BLiMP + ECHO/score: success 0.5, mean steps 33.71875

GitHub repo: https://github.com/andthattoo/blimp