	---
	license: apache-2.0
	language:
	- dak
	- en
	tags:
	- reinforcement-learning
	- rl
	- grpo
	- dakota
	- indigenous-languages
	- thinking-machines
	- tinker
	base_model: Qwen/Qwen3-30B-A3B-Instruct-2507
	datasets:
	- HarleyCooper/Dakota-Grammar-RL
	---

	# Qwen3-30B-ThinkingMachines-Dakota1890

	This model is a Reinforcement Learning (RL) fine-tune of Qwen3-30B-A3B-Instruct-2507, optimized for Dakota language grammar and morphology.

	It was trained using the Thinking Machines Tinker distributed RL pipeline, leveraging the GRPO (Group Relative Policy Optimization) algorithm. The training process used a custom verifier environment built from Stephen Return Riggs' 1890 Dakota Grammar & Dictionary.

	## Model Details

	- Base Model: Qwen/Qwen3-30B-A3B-Instruct-2507
	- Architecture: LoRA Adapter (Rank 32)
	- Training Method: GRPO (Group Relative Policy Optimization)
	- Training Infrastructure: Thinking Machines Tinker
	- Language: Dakota (dak), English (en)
	- License: Apache 2.0

	## Training Data & Methodology

	The model was trained on a dataset of ~10,000 RL tasks generated from the 1890 Dakota Grammar. These tasks focus on:

	1. Morphology: Applying prefixes/suffixes (e.g., possessives `-ku`, `-ću`, `-tku`).
	2. Translation: Context-aware translation between Dakota and English.
	3. Character Preservation: Strict adherence to Dakota orthography (ŋ, š, ć, ź, ž, ʼ).
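A small scorer can illustrate the character-preservation objective. This is a minimal sketch, not the actual `DakotaGrammarRubric` code. Three things are assumptions: the function name `char_preservation_score`, the character set (copied from the list above), and the scoring rule, which is the share of the reference's special characters that also appear in the response.

```python
import unicodedata

# Special characters listed in this card; the exact set used in training is an assumption.
DAKOTA_SPECIAL_CHARS = set("ŋšćźžʼ")


def char_preservation_score(response: str, reference: str) -> float:
    """Fraction of the reference's special Dakota characters that also appear in the response.

    Hypothetical scorer for illustration only; not the DakotaGrammarRubric implementation.
    """
    # NFC-normalize so a precomposed letter (ć) and its decomposed form (c + U+0301) compare equal.
    resp = unicodedata.normalize("NFC", response)
    ref = unicodedata.normalize("NFC", reference)
    required = {ch for ch in ref if ch in DAKOTA_SPECIAL_CHARS}
    if not required:
        return 1.0  # nothing to preserve, so nothing can be lost
    present = {ch for ch in required if ch in resp}
    return len(present) / len(required)
```

NFC normalization matters here because a letter such as ć can be stored either as a single code point or as `c` followed by a combining accent. Without normalization, a visually correct answer could score zero.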

	### Reward Function

	The RL training used a composite reward function (`DakotaGrammarRubric`) with the following components:

	* Character Preservation (20%): Verifies correct usage of special Unicode characters.
	* Affix Accuracy (10%): Checks for correct morphological transformations.
	* Exact Match (40%): Rewards precise answers for rigid grammatical tasks.
	* Pattern Matching (15%): Uses regex to verify structural correctness.
	* Length Penalty (15%): Penalizes overly long responses to discourage verbosity.
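The card gives the component weights but not the rule for combining them. If each component yields a score in [0, 1] and the composite is a plain weighted sum, the rubric reduces to the sketch below. The component keys and `composite_reward` are hypothetical names. Treating the length penalty as a score where 1.0 means "no penalty" is also an assumption.

```python
# Weights as listed in this card; the component keys are hypothetical names.
REWARD_WEIGHTS = {
    "char_preservation": 0.20,
    "affix_accuracy": 0.10,
    "exact_match": 0.40,
    "pattern_matching": 0.15,
    "length_penalty": 0.15,  # assumed: 1.0 = no penalty, 0.0 = maximally penalized
}


def composite_reward(scores: dict) -> float:
    """Weighted sum of per-component scores, each assumed to lie in [0, 1]."""
    missing = REWARD_WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    total = 0.0
    for name, weight in REWARD_WEIGHTS.items():
        score = scores[name]
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score {score} outside [0, 1]")
        total += weight * score
    return total
```

Under these assumptions, a response with correct characters, affixes, structure, and length but no exact match would score about 0.6. This shows how much weight the exact-match term carries.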

	## Performance

	Metrics below are from Thinking Machines training run `i55d4x26`.

	* Composite Reward: 0.317 final (peak 0.442 at step 116; start 0.105)
	* Character Preservation: 0.619 final (peak 0.699; start 0.265)
	* Affix Accuracy: 1.000 final (start 0.957)
	* Exact Match: 0.100 final (peak 0.337; start 0.001)

	## Usage

	### With Hugging Face Transformers

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel

	base_model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"
	adapter_name = "HarleyCooper/Qwen3-30B-ThinkingMachines-Dakota1890"

	model = AutoModelForCausalLM.from_pretrained(
	    base_model_name,
	    device_map="auto",
	    torch_dtype="auto",
	    trust_remote_code=True,
	)
	tokenizer = AutoTokenizer.from_pretrained(base_model_name)
	# Attach the Dakota LoRA adapter on top of the base model
	model = PeftModel.from_pretrained(model, adapter_name)

	# Inference
	prompt = "Translate 'my elder brother' to Dakota using the correct possessive suffix."
	messages = [
	    {"role": "system", "content": "You are a Dakota language expert."},
	    {"role": "user", "content": prompt},
	]
	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(text, return_tensors="pt").to(model.device)

	outputs = model.generate(**inputs, max_new_tokens=128)
	# Decode only the newly generated tokens, not the echoed prompt
	new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
	print(tokenizer.decode(new_tokens, skip_special_tokens=True))

	# Alternatively:
	# from peft import AutoPeftModelForCausalLM
	# model = AutoPeftModelForCausalLM.from_pretrained(adapter_name, device_map="auto")
	```

	### With Thinking Machines Tinker

	This checkpoint is also available directly via the Tinker platform at:

	```
	tinker://da1ef918-d67a-5080-b500-dd1256db9ca7:train:0/sampler_weights/final
	```

	## Files

	* `adapter_model.safetensors`: The LoRA adapter weights.
	* `adapter_config.json`: Adapter configuration.
	* `tinker_metadata.json`: Metadata from the Thinking Machines training run.
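Because the adapter loads with `PeftModel.from_pretrained`, `adapter_config.json` is presumably a standard PEFT LoRA config. That format stores fields such as `r`, `lora_alpha`, `target_modules`, and `base_model_name_or_path`. The sketch below extracts those fields so you can confirm the rank and base model stated above. The helper names are hypothetical, and this card does not document any values other than rank 32 and the base model.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def load_adapter_config(path: Union[str, Path]) -> Dict[str, Any]:
    """Parse a local adapter_config.json file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_adapter_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the fields most relevant to reproducing the adapter; absent keys become None."""
    return {
        "peft_type": config.get("peft_type"),
        "base_model": config.get("base_model_name_or_path"),
        "rank": config.get("r"),
        "lora_alpha": config.get("lora_alpha"),
        "target_modules": config.get("target_modules"),
    }
```

For example, download the file with `huggingface_hub.hf_hub_download(adapter_name, "adapter_config.json")`. Then `summarize_adapter_config(load_adapter_config(path))["rank"]` should return 32.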

	## Citation

	If you use this model, please cite the original grammar source:

	> Riggs, S. R. (1890). Dakota Grammar, Texts, and Ethnography. Washington: Government Printing Office.

	Please also acknowledge the Thinking Machines / PrimeIntellect RL framework used for training.