---
license: apache-2.0
language:
- dak
- en
tags:
- reinforcement-learning
- rl
- grpo
- dakota
- indigenous-languages
- thinking-machines
- tinker
base_model: Qwen/Qwen3-30B-A3B-Instruct-2507
datasets:
- HarleyCooper/Dakota-Grammar-RL
---

# Qwen3-30B-ThinkingMachines-Dakota1890

This model is a **Reinforcement Learning (RL) fine-tune** of Qwen3-30B-A3B-Instruct-2507, optimized for **Dakota language grammar and morphology**.

It was trained with the **Thinking Machines Tinker** distributed RL pipeline using the **GRPO (Group Relative Policy Optimization)** algorithm. Training used a custom verifier environment built from Stephen Return Riggs' 1890 *Dakota Grammar & Dictionary*.
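
For readers new to GRPO, the "group relative" part can be illustrated with a minimal sketch. It assumes the commonly described formulation, in which several completions are sampled for one prompt and each reward is normalized against the mean and standard deviation of its own group. The function name and the sample rewards are hypothetical, and the exact variant used in the Tinker pipeline may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds the verifier scores for completions sampled from the
    same prompt. `eps` avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt.
advantages = group_relative_advantages([0.10, 0.40, 0.40, 0.70])
print(advantages)
```

Completions that score above their group's average get a positive advantage and are reinforced; those below get a negative one. A group in which every completion scores the same yields zero advantage and so provides no learning signal.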

## Model Details

- **Base Model**: Qwen/Qwen3-30B-A3B-Instruct-2507
- **Architecture**: LoRA Adapter (Rank 32)
- **Training Method**: GRPO (Group Relative Policy Optimization)
- **Training Infrastructure**: Thinking Machines Tinker
- **Language**: Dakota (dak), English (en)
- **License**: Apache 2.0

## Training Data & Methodology

The model was trained on a dataset of **~10,000 RL tasks** generated from the 1890 Dakota Grammar. These tasks focus on:

1. **Morphology**: Applying prefixes/suffixes (e.g., possessives `-ku`, `-ću`, `-tku`).
2. **Translation**: Context-aware translation between Dakota and English.
3. **Character Preservation**: Strict adherence to Dakota orthography (ŋ, š, ć, ź, ž, ʼ).
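
To make the character-preservation task concrete, here is a minimal sketch of how such a check could be scored. It is an illustration under stated assumptions, not the actual verifier: the function names and the scoring rule (the fraction of expected special characters that reappear in the answer) are hypothetical. Text is NFC-normalized first so that a base letter followed by a combining accent compares equal to the precomposed character.

```python
import unicodedata
from collections import Counter

# Special characters named in the task list above, NFC-normalized.
DAKOTA_SPECIAL = set(unicodedata.normalize("NFC", "ŋšćźžʼ"))


def count_special(text):
    """Count occurrences of each special character in NFC-normalized text."""
    normalized = unicodedata.normalize("NFC", text)
    return Counter(ch for ch in normalized if ch in DAKOTA_SPECIAL)


def char_preservation_score(expected, answer):
    """Hypothetical score: share of expected special characters found in the answer."""
    want = count_special(expected)
    if not want:
        return 1.0  # nothing to preserve
    got = count_special(answer)
    matched = sum(min(n, got[ch]) for ch, n in want.items())
    return matched / sum(want.values())


# The suffix `-ću` with the accent dropped fails the check.
print(char_preservation_score("-ću", "-cu"))
# The same suffix typed as "c" + combining acute passes after normalization.
print(char_preservation_score("-ću", "-c\u0301u"))
```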

### Reward Function

The RL training used a composite reward function (`DakotaGrammarRubric`) with the following weighted components:

* **Character Preservation (20%)**: Verifies correct usage of special Unicode characters.
* **Affix Accuracy (10%)**: Checks for correct morphological transformations.
* **Exact Match (40%)**: Rewards precise answers for rigid grammatical tasks.
* **Pattern Matching (15%)**: Uses regex to verify structural correctness.
* **Length Penalty (15%)**: Discourages verbose outputs.
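
The weights above sum to 100%, so the composite can be read as a weighted average of per-component scores. The sketch below shows that combination only. The dictionary keys, the function name, and the convention that every component (including the length penalty) is a score in [0, 1] where 1 is best are assumptions made for illustration; the real `DakotaGrammarRubric` may combine its components differently.

```python
# Weights taken from the list above; key names are hypothetical.
WEIGHTS = {
    "character_preservation": 0.20,
    "affix_accuracy": 0.10,
    "exact_match": 0.40,
    "pattern_matching": 0.15,
    "length_penalty": 0.15,
}


def composite_reward(scores):
    """Weighted sum of component scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical component scores for a single completion.
example = {
    "character_preservation": 1.0,
    "affix_accuracy": 1.0,
    "exact_match": 0.0,
    "pattern_matching": 1.0,
    "length_penalty": 1.0,
}
print(round(composite_reward(example), 3))
```

Because exact match carries 40% of the weight, a completion that satisfies every other component but misses the exact answer is capped at 0.6 under these assumptions.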

## Performance

Metrics are from the Thinking Machines run `i55d4x26`:

* **Composite Reward**: 0.317 final (peak 0.442 at step 116; start 0.105)
* **Character Preservation**: 0.619 final (peak 0.699; start 0.265)
* **Affix Accuracy**: 1.000 final (start 0.957)
* **Exact Match**: 0.100 final (peak 0.337; start 0.001)
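
Since the final values sit below the reported peaks, it can help to see how much of each peak improvement the final checkpoint retains. The short sketch below does only that arithmetic on the numbers listed above. The `retained_fraction` helper is a hypothetical convenience, not a metric reported by the training run, and Affix Accuracy is omitted because no separate peak is listed for it.

```python
# (start, peak, final) values copied from the Performance list above.
METRICS = {
    "composite_reward": (0.105, 0.442, 0.317),
    "character_preservation": (0.265, 0.699, 0.619),
    "exact_match": (0.001, 0.337, 0.100),
}


def retained_fraction(start, peak, final):
    """Share of the start-to-peak improvement still present at the final step."""
    return (final - start) / (peak - start)


for name, (start, peak, final) in METRICS.items():
    kept = retained_fraction(start, peak, final)
    print(f"{name}: gain {final - start:+.3f}, keeps {kept:.0%} of the peak gain")
```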

## Usage

### With Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"
adapter_name = "HarleyCooper/Qwen3-30B-ThinkingMachines-Dakota1890"

# Load the base model, then attach the LoRA adapter on top of it.
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, adapter_name)

# Inference
prompt = "Translate 'my elder brother' to Dakota using the correct possessive suffix."
messages = [
    {"role": "system", "content": "You are a Dakota language expert."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

# Alternatively, load the base model and adapter in one call:
# from peft import AutoPeftModelForCausalLM
# model = AutoPeftModelForCausalLM.from_pretrained(adapter_name, device_map="auto")
```

### With Thinking Machines Tinker

This checkpoint is also available directly via the Tinker platform at:

```
tinker://da1ef918-d67a-5080-b500-dd1256db9ca7:train:0/sampler_weights/final
```

## Files

* `adapter_model.safetensors`: The LoRA adapter weights.
* `adapter_config.json`: Adapter configuration.
* `tinker_metadata.json`: Metadata from the Thinking Machines training run.

## Citation

If you use this model, please cite the original grammar source:

> Riggs, S. R. (1890). *Dakota Grammar, Texts, and Ethnography*. Washington: Government Printing Office.

Please also credit the Thinking Machines / PrimeIntellect RL framework.