Reinforcement Learning
Transformers
Safetensors
qwen3
text-generation
blimp
textworld
text-generation-inference
How to use andthattoo/blimp-textworld-standard-q8 with Transformers:

```python
# Load model directly
# Qwen3-1.7B is a text-only causal LM, so AutoModelForCausalLM is the matching auto class.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("andthattoo/blimp-textworld-standard-q8")
model = AutoModelForCausalLM.from_pretrained("andthattoo/blimp-textworld-standard-q8")
```
blimp-textworld-standard-q8
This checkpoint was trained with standard full-history RL on TextWorld q8.
It is a full-parameter RL fine-tune, not a LoRA adapter.
Base model: Qwen/Qwen3-1.7B
Final held-out TextWorld q8 evaluation (32 episodes per model):
- untrained Qwen3-1.7B: success 0.375, mean steps 36.59
- standard full-history RL (this checkpoint): success 0.375, mean steps 35.375
- BLiMP block-memory RL: success 0.53125, mean steps 33.25
- BLiMP + ECHO/score: success 0.5, mean steps 33.71875
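
Because the evaluation covers only 32 episodes, it can help to read the success rates as episode counts. The sketch below assumes that "success" is the fraction of the 32 held-out episodes that were solved; the names `results` and `solved_episodes` are illustrative and do not come from the BLiMP repository.

```python
# Reported held-out results, copied from the list above: (success rate, mean steps).
# Assumption: success rate = solved episodes / 32.
EPISODES = 32
results = {
    "untrained Qwen3-1.7B": (0.375, 36.59),
    "standard full-history RL": (0.375, 35.375),
    "BLiMP block-memory RL": (0.53125, 33.25),
    "BLiMP + ECHO/score": (0.5, 33.71875),
}


def solved_episodes(success_rate, episodes=EPISODES):
    """Convert a success rate back into a whole number of solved episodes."""
    return round(success_rate * episodes)


counts = {name: solved_episodes(rate) for name, (rate, _) in results.items()}
baseline = counts["untrained Qwen3-1.7B"]

for name, n in counts.items():
    print(f"{name}: {n}/{EPISODES} solved ({n - baseline:+d} vs. untrained)")
```

Under that assumption, this checkpoint solves the same number of episodes as the untrained base model (12 of 32) with slightly fewer mean steps, while the two BLiMP variants solve 17 and 16 episodes respectively.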
GitHub repo: https://github.com/andthattoo/blimp