---
license: apache-2.0
language:
- dak
- en
tags:
- reinforcement-learning
- rl
- grpo
- dakota
- indigenous-languages
- thinking-machines
- tinker
base_model: Qwen/Qwen3-30B-A3B-Instruct-2507
datasets:
- HarleyCooper/Dakota-Grammar-RL
---
# Qwen3-30B-ThinkingMachines-Dakota1890
This model is a **Reinforcement Learning (RL) fine-tune** of Qwen3-30B-A3B-Instruct-2507, optimized for **Dakota language grammar and morphology**.
It was trained with the **Thinking Machines Tinker** distributed RL pipeline using the **GRPO (Group Relative Policy Optimization)** algorithm. Training used a custom verifier environment built from Stephen Return Riggs' 1890 *Dakota Grammar & Dictionary*.
## Model Details
- **Base Model**: Qwen/Qwen3-30B-A3B-Instruct-2507
- **Architecture**: LoRA Adapter (Rank 32)
- **Training Method**: GRPO (Group Relative Policy Optimization)
- **Training Infrastructure**: Thinking Machines Tinker
- **Language**: Dakota (dak), English (en)
- **License**: Apache 2.0
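As a rough illustration of the "group relative" part of GRPO: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its group. The sketch below shows only that normalization step, using the commonly described formulation; the function name and the `eps` constant are assumptions for illustration and are not taken from the Tinker training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards of completions sampled for ONE prompt.

    Hypothetical helper: each advantage is (reward - group mean) divided by
    (group standard deviation + eps). eps avoids division by zero when all
    completions in the group receive the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for the same prompt, scored by a verifier.
advantages = group_relative_advantages([0.1, 0.4, 0.4, 0.7])
print(advantages)
```

Completions that score above their group's average get a positive advantage and are reinforced; those below average get a negative one. When every completion in a group scores the same, all advantages are zero and that prompt contributes no learning signal.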
## Training Data & Methodology
The model was trained on a dataset of **~10,000 RL tasks** generated from the 1890 Dakota Grammar. These tasks focus on:
1. **Morphology**: Applying prefixes/suffixes (e.g., possessives `-ku`, `-ću`, `-tku`).
2. **Translation**: Context-aware translation between Dakota and English.
3. **Character Preservation**: Strict adherence to Dakota orthography (ŋ, š, ć, ź, ž, ʼ).
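To make the character-preservation focus concrete, here is a minimal sketch of how such a check could be scored. It is not the verifier used in training: the function name, the scoring rule (fraction of the reference's special characters that also appear in the candidate), and the NFC normalization step are all assumptions. Only the character set is taken from the list above.

```python
import unicodedata
from collections import Counter

# Special characters listed in this model card; the real verifier's set may differ.
DAKOTA_SPECIAL_CHARS = set(unicodedata.normalize("NFC", "ŋšćźžʼ"))


def character_preservation_score(reference: str, candidate: str) -> float:
    """Hypothetical score in [0, 1]: share of the reference's special
    characters that the candidate also contains (counted with multiplicity)."""
    # Normalize so that "c" + combining acute compares equal to precomposed "ć".
    ref = unicodedata.normalize("NFC", reference)
    cand = unicodedata.normalize("NFC", candidate)

    expected = [ch for ch in ref if ch in DAKOTA_SPECIAL_CHARS]
    if not expected:
        return 1.0  # nothing to preserve

    available = Counter(ch for ch in cand if ch in DAKOTA_SPECIAL_CHARS)
    kept = 0
    for ch in expected:
        if available[ch] > 0:
            available[ch] -= 1
            kept += 1
    return kept / len(expected)


print(character_preservation_score("-ću", "-ću"))  # diacritic kept
print(character_preservation_score("-ću", "-cu"))  # diacritic dropped
```

A check of this kind penalizes the common failure where a model replaces `ć` with plain `c`, which changes the spelling of the suffix.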
### Reward Function
The RL training used a composite reward function (`DakotaGrammarRubric`) with the following components:
* **Character Preservation (20%)**: Verifies correct usage of special Unicode characters.
* **Affix Accuracy (10%)**: Checks for correct morphological transformations.
* **Exact Match (40%)**: Rewards precise answers for rigid grammatical tasks.
* **Pattern Matching (15%)**: Uses regex to verify structural correctness.
* **Length Penalty (15%)**: Penalizes overly long responses to discourage verbosity.
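The weights above sum to 100%, so the composite reward can be read as a weighted average of the component scores. The sketch below shows that combination only; it assumes every component (including the length term) is already expressed as a score in [0, 1], where 1.0 is best. The dictionary keys and function name are illustrative and do not come from the actual `DakotaGrammarRubric` implementation.

```python
# Weights as listed in this model card.
WEIGHTS = {
    "character_preservation": 0.20,
    "affix_accuracy": 0.10,
    "exact_match": 0.40,
    "pattern_matching": 0.15,
    "length_penalty": 0.15,
}


def composite_reward(scores: dict) -> float:
    """Hypothetical weighted sum of per-component scores, each in [0, 1]."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Example: correct characters and affix, concise, partially matching the
# expected pattern, but not an exact match.
example = {
    "character_preservation": 1.0,
    "affix_accuracy": 1.0,
    "exact_match": 0.0,
    "pattern_matching": 0.5,
    "length_penalty": 1.0,
}
print(round(composite_reward(example), 3))
```

Because Exact Match carries 40% of the weight, a response can satisfy every other component fully and still score at most 0.6 under this reading.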
## Performance
(Metrics from the Thinking Machines run `i55d4x26`)
* **Composite Reward**: 0.317 final (peak 0.442 at step 116; start 0.105)
* **Character Preservation**: 0.619 final (peak 0.699; start 0.265)
* **Affix Accuracy**: 1.000 final (start 0.957)
* **Exact Match**: 0.100 final (peak 0.337; start 0.001)
## Usage
### With Hugging Face Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"
adapter_name = "HarleyCooper/Qwen3-30B-ThinkingMachines-Dakota1890"

# Load the base model, then attach the LoRA adapter on top of it.
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, adapter_name)

# Inference
prompt = "Translate 'my elder brother' to Dakota using the correct possessive suffix."
messages = [
    {"role": "system", "content": "You are a Dakota language expert."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

# Alternatively, load the base model and adapter in one call:
# from peft import AutoPeftModelForCausalLM
# model = AutoPeftModelForCausalLM.from_pretrained(adapter_name, device_map="auto")
```
### With Thinking Machines Tinker
This checkpoint is also available directly via the Tinker platform at:
```
tinker://da1ef918-d67a-5080-b500-dd1256db9ca7:train:0/sampler_weights/final
```
## Files
* `adapter_model.safetensors`: The LoRA adapter weights.
* `adapter_config.json`: Adapter configuration.
* `tinker_metadata.json`: Metadata from the Thinking Machines training run.
## Citation
If you use this model, please cite the original grammar source:
> Riggs, S. R. (1890). *Dakota Grammar, Texts, and Ethnography*. Washington: Government Printing Office.
Please also credit the Thinking Machines / PrimeIntellect RL framework.