CoLAR Gemma 3-1B GSM-Hard RL

This repository stores CoLAR exports in a Hugging Face-compatible layout. The repo root works for standard Transformers loading, and extra_state.pt preserves the latent head for latent decoding.

Current Revision

Current tag: best-epoch00-step768
Stage: reinforcement-learning
Task: GSM-Hard reasoning
Compare slug: gemma3_1b_colar_gsm_hard_rl

Tagged Checkpoints

Tag	Local reference	Status
`best-epoch00-step768`	best epoch0 step768 export	current commit

Files

HF model files at repo root for standard decoding
extra_state.pt for CoLAR latent decoding
export_meta.json from the local export
latent_metadata.json with archival provenance

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('agurung/colar-gemma-3-1b-gsm-hard-rl', revision='best-epoch00-step768', torch_dtype='auto', device_map='auto')
tokenizer = AutoTokenizer.from_pretrained('agurung/colar-gemma-3-1b-gsm-hard-rl', revision='best-epoch00-step768')

For latent decoding, download the same revision and use extra_state.pt together with the repo root model files.
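The download step above can be sketched as follows. This is a minimal sketch, not the repository's own loader: it assumes `huggingface_hub` and `torch` are installed, and the helper names (`fetch_revision`, `extra_state_path`, `summarize`, `inspect_extra_state`) are hypothetical. The internal layout of extra_state.pt is not documented here, so the sketch only lists its top-level entries and does not attach the latent head to the model.

```python
from pathlib import Path

REPO_ID = "agurung/colar-gemma-3-1b-gsm-hard-rl"
REVISION = "best-epoch00-step768"


def fetch_revision(repo_id: str = REPO_ID, revision: str = REVISION) -> Path:
    """Download every file of the tagged revision and return the local folder."""
    # Imported lazily so the other helpers work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, revision=revision))


def extra_state_path(snapshot_dir) -> Path:
    """Return the path to extra_state.pt, failing early if it is missing."""
    path = Path(snapshot_dir) / "extra_state.pt"
    if not path.is_file():
        raise FileNotFoundError(f"extra_state.pt not found in {snapshot_dir}")
    return path


def summarize(state) -> dict:
    """Map each top-level key to the type name of its value."""
    if isinstance(state, dict):
        return {str(key): type(value).__name__ for key, value in state.items()}
    return {"<root>": type(state).__name__}


def inspect_extra_state(snapshot_dir) -> dict:
    """Load extra_state.pt on CPU and summarize its top-level structure."""
    import torch

    # Assumption: the file holds tensors and plain containers only.
    # weights_only=True refuses arbitrary pickled objects; if loading fails,
    # the file may need weights_only=False, which should only be used with
    # files you trust.
    state = torch.load(
        extra_state_path(snapshot_dir), map_location="cpu", weights_only=True
    )
    return summarize(state)
```

Calling `inspect_extra_state(fetch_revision())` downloads the tagged revision and prints nothing by itself; wrap it in `print(...)` to see which entries the file contains before wiring them into a CoLAR latent-decoding script.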

Notes

This is the RL export selected by the server6 matrix scripts as the best checkpoint.

Safetensors

Model size: 1.0B params
Tensor type: BF16

Model tree for agurung/colar-gemma-3-1b-gsm-hard-rl

Base model: google/gemma-3-1b-pt
Finetuned: google/gemma-3-1b-it
Finetuned from google/gemma-3-1b-it: this model (560 fine-tunes listed)