---
license: apache-2.0
language:
- dak
- en
tags:
- reinforcement-learning
- rl
- grpo
- dakota
- indigenous-languages
- thinking-machines
- tinker
base_model: Qwen/Qwen3-30B-A3B-Instruct-2507
datasets:
- HarleyCooper/Dakota-Grammar-RL
---

# Qwen3-30B-ThinkingMachines-Dakota1890

This model is a **Reinforcement Learning (RL) fine-tune** of Qwen3-30B-A3B-Instruct-2507, optimized for **Dakota language grammar and morphology**.

It was trained using the **Thinking Machines Tinker** distributed RL pipeline, leveraging the **GRPO (Group Relative Policy Optimization)** algorithm. The training process used a custom verifier environment built from Stephen Return Riggs' 1890 *Dakota Grammar & Dictionary*.

## Model Details

- **Base Model**: Qwen/Qwen3-30B-A3B-Instruct-2507
- **Architecture**: LoRA Adapter (Rank 32)
- **Training Method**: GRPO (Group Relative Policy Optimization)
- **Training Infrastructure**: Thinking Machines Tinker
- **Language**: Dakota (dak), English (en)
- **License**: Apache 2.0

## Training Data & Methodology

The model was trained on a dataset of **~10,000 RL tasks** generated from the 1890 Dakota Grammar. These tasks focus on:

1. **Morphology**: Applying prefixes/suffixes (e.g., possessives `-ku`, `-ću`, `-tku`).
2. **Translation**: Context-aware translation between Dakota and English.
3. **Character Preservation**: Strict adherence to Dakota orthography (ŋ, š, ć, ź, ž, ʼ).

### Reward Function

The RL training used a composite reward function (`DakotaGrammarRubric`) with the following components:

* **Character Preservation (20%)**: Verifies correct usage of special Unicode characters.
* **Affix Accuracy (10%)**: Checks for correct morphological transformations.
* **Exact Match (40%)**: Rewards precise answers for rigid grammatical tasks.
* **Pattern Matching (15%)**: Uses regex to verify structural correctness.
* **Length Penalty (15%)**: Prevents verbosity.
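
The weighted combination listed above can be sketched roughly as follows. This is a minimal illustration and not the actual `DakotaGrammarRubric` code: the function names, the per-component scoring rules, and the assumption that every component returns a score in [0, 1] are all hypothetical. Only the weights and the list of special characters come from this card, and the example strings are illustrative rather than drawn from the dataset.

```python
import re

# Weights taken from the component list above; they sum to 1.0.
WEIGHTS = {
    "character_preservation": 0.20,
    "affix_accuracy": 0.10,
    "exact_match": 0.40,
    "pattern_matching": 0.15,
    "length_penalty": 0.15,
}

# Special characters named in the orthography list (assumed scoring set).
SPECIAL_CHARS = set("ŋšćźžʼ")


def character_preservation(response: str, reference: str) -> float:
    """Fraction of the reference's distinct special characters found in the response."""
    expected = {ch for ch in reference if ch in SPECIAL_CHARS}
    if not expected:
        return 1.0
    return len(expected & set(response)) / len(expected)


def affix_accuracy(response: str, suffix: str) -> float:
    """Hypothetical check: 1.0 if any word in the response ends with the suffix."""
    if not suffix:
        return 1.0
    return 1.0 if any(w.endswith(suffix) for w in response.split()) else 0.0


def exact_match(response: str, reference: str) -> float:
    """1.0 only when the stripped response equals the stripped reference."""
    return 1.0 if response.strip() == reference.strip() else 0.0


def pattern_matching(response: str, pattern: str) -> float:
    """1.0 if the regex pattern is found anywhere in the response."""
    return 1.0 if re.search(pattern, response) else 0.0


def length_score(response: str, reference: str, tolerance: float = 2.0) -> float:
    """Assumed rule: full score up to tolerance x reference length, then decaying."""
    limit = tolerance * max(len(reference), 1)
    return 1.0 if len(response) <= limit else limit / len(response)


def composite_reward(response: str, reference: str, suffix: str, pattern: str) -> float:
    """Weighted sum of the five component scores."""
    scores = {
        "character_preservation": character_preservation(response, reference),
        "affix_accuracy": affix_accuracy(response, suffix),
        "exact_match": exact_match(response, reference),
        "pattern_matching": pattern_matching(response, pattern),
        "length_penalty": length_score(response, reference),
    }
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Usage: an exact answer versus one that drops the special character ŋ.
print(composite_reward("suŋkaku", "suŋkaku", "ku", r"ku$"))
print(composite_reward("sunkaku", "suŋkaku", "ku", r"ku$"))
```

Under these assumed rules, a response that loses a special character forfeits both the character-preservation and exact-match shares of the reward, which together carry 60% of the weight.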

## Performance

(Metrics from the Thinking Machines run `i55d4x26`)

* **Composite Reward**: 0.317 final (peak 0.442 at step 116; start 0.105)
* **Character Preservation**: 0.619 final (peak 0.699; start 0.265)
* **Affix Accuracy**: 1.000 final (start 0.957)
* **Exact Match**: 0.100 final (peak 0.337; start 0.001)

## Usage

### With Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"
adapter_name = "HarleyCooper/Qwen3-30B-ThinkingMachines-Dakota1890"

# Load the base model, then attach the LoRA adapter on top of it.
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, adapter_name)

# Inference
prompt = "Translate 'my elder brother' to Dakota using the correct possessive suffix."
messages = [
    {"role": "system", "content": "You are a Dakota language expert."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

# Alternatively:
# from peft import AutoPeftModelForCausalLM
# model = AutoPeftModelForCausalLM.from_pretrained(adapter_name, device_map="auto")
```

### With Thinking Machines Tinker

This checkpoint is also available directly via the Tinker platform at:

```
tinker://da1ef918-d67a-5080-b500-dd1256db9ca7:train:0/sampler_weights/final
```

## Files

* `adapter_model.safetensors`: The LoRA adapter weights.
* `adapter_config.json`: Adapter configuration.
* `tinker_metadata.json`: Metadata from the Thinking Machines training run.

## Citation

If you use this model, please cite the original grammar source:

> Riggs, S. R. (1890). *Dakota Grammar, Texts, and Ethnography*. Washington: Government Printing Office.

Please also credit the Thinking Machines / PrimeIntellect RL framework.