---
tags:
- clawpathy
- tinker
- lora
- sft
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: tinker
---

# Qwen3-4B-Opus-Reasoning-SFT

Trained with [Clawpathy](https://github.com/clawpathy) using the Tinker platform.

## Training Details

| Parameter | Value |
|---|---|
| **Base model** | Qwen/Qwen3-4B-Instruct-2507 |
| **Method** | Supervised Fine-Tuning |
| **Dataset** | nohurry/Opus-4.6-Reasoning-3000x-filtered |
| **LoRA rank** | 32 |
| **Learning rate** | 0.00018 |
| **Steps** | 50 |
| **Final loss** | 1.236125 |
| **Tinker sampler path** | `tinker://332646c7-9a78-5c49-a58c-3ca971b142ce:train:0/sampler_weights/clawpathy-c942fbe6-f6b-final` |
| **Tinker state path** | `tinker://332646c7-9a78-5c49-a58c-3ca971b142ce:train:0/weights/clawpathy-c942fbe6-f6b-state` |

## Evaluation

```json
{
  "job_id": "c942fbe6-f6b",
  "model": "Trained: Qwen/Qwen3-4B-Instruct-2507 (job c942fbe6-f6b)",
  "eval_type": "benchmark",
  "benchmarks": [
    "Dataset SFT-match (50)",
    "MMLU (0-shot, 200)"
  ],
  "benchmark_names": [
    "Dataset SFT-match (50)",
    "MMLU (0-shot, 200)"
  ],
  "domain": "general",
  "accuracy": 0.21,
  "metrics": {
    "dataset_sft_match/token_f1": 0.3774,
    "dataset_sft_match/exact_match": 0.0,
    "dataset_sft_match/evaluated": 50,
    "dataset_sft_match/split": "train",
    "dataset_sft_match/format": "instruction",
    "dataset_sft_match/examples": [
      {
        "expected": "# Solution: Calculating Bus Rental and Toll Costs\n\n## Step 1: Find the total number of people going on the field trip\n\n- Fifth-grade students: 252\n- Teachers: 8\n- **Total people**:",
        "output": "## Solution\n\n**Step 1: Calculate the total number of people going on the trip**\n\n$$252 \\text{ students} + 8 \\text{ teachers} = 260 \\text{ people}$$\n\n**Step 2: Determine how many bu",
        "token_f1": 0.3976,
        "exact": false
      },
      {
        "expected": "## Missing Problem Description\n\nYour request appears to be incomplete. You've provided the **flushing instructions** for an interactive problem, but the **actual problem descriptio",
        "output": "I notice you've provided instructions for an interactive problem, but you haven't actually included the problem statement or the specific question you'd like me to solve.\n\nCould yo",
        "token_f1": 0.2332,
        "exact": false
      },
      {
        "expected": "**Answer: no**\n\n**Explanation:**\n\nThe premise states that \"Thorn thought the same thing.\" For this statement to make sense, there must be another person (or people) who had this th",
        "output": "## Solution\n\n**Step 1: Analyze the premise**\n- \"Apparently Thorn thought the same thing\" indicates that Thorn's thoughts aligned with someone else's thoughts\n- The word \"same\" impl",
        "token_f1": 0.4814,
        "exact": false
      },
      {
        "expected": "## Solution\n\nThe question asks for **automated software** specifically designed to create energy profile diagrams.\n\n### Analysis of Options:\n\n| Option | Software | Type | Automated",
        "output": "# Automated Software for Energy Profile Diagrams\n\n## Analysis of Each Option\n\n**A: TikZ** - This is a LaTeX package for creating graphics. While it can produce energy profile diagr",
        "token_f1": 0.391,
        "exact": false
      },
      {
        "expected": "No, these sentences are not paraphrases of each other. \n\nA paraphrase would convey the same meaning or information using different wording. However, these two sentences describe en",
        "output": "## Analysis\n\nLet me compare these two sentences to determine if they are paraphrases.\n\n**Sentence 1:** \"In Ventura County, the number of closings and advisories dropped by 73 perce",
        "token_f1": 0.3219,
        "exact": false
      },
      {
        "expected": "The answer is C (contradiction).\n\nThe two sentences directly contradict each other. Sentence 1 explicitly states that the woman is wearing a WHITE dress while riding a bicycle. How",
        "output": "# Analysis of Sentence Agreement\n\n## Step 1: Identify the key information in each sentence\n\n**Sentence 1:** \"A woman wearing a white dress riding a bicycle.\"\n- Subject: A woman\n- C",
        "token_f1": 0.3739,
        "exact": false
      },
      {
        "expected": "# Article Based on Provided Highlights\n\n## Clinton to Suspend Presidential Campaign Saturday, Sources Say\n\nIn a significant development in the Democratic presidential race, sources",
        "output": "# Clinton Suspends Presidential Campaign Amid Democratic Unity Efforts\n\n## Key Developments in the 2016 Democratic Primary Race\n\n**WASHINGTON, D.C.** \u2014 Democratic presidential nomi",
        "token_f1": 0.4765,
        "exact": false
      },
      {
        "expected": "# Problem Analysis\n\nBased on the example provided, I can identify this as a **modular arithmetic transformation problem**.\n\n## Problem Statement (Reconstructed)\n\nGiven:\n- A modulus",
        "output": "I notice you've provided an example but haven't actually asked a question. Let me analyze what this example demonstrates:\n\n## What this example shows:\n\n**Problem Type**: This appea",
        "token_f1": 0.1667,
        "exact": false
      },
      {
        "expected": "It appears you've provided the output format for a competitive programming problem, but I'm missing the actual problem statement. To give you a detailed solution with verification,",
        "output": "I notice that your message appears to be a partial problem statement, but it's incomplete. You've provided:\n\n- A conditional structure: \"Otherwise, print an integer m...\"\n- A descr",
        "token_f1": 0.4075,
        "exact": false
      },
      {
        "expected": "Based on the context provided, the answer is:\n\n**All federal funds**, including grants, loans, unions and employers who accepted taxpayer funds, and other forms of financial assist",
        "output": "# Answer Extraction\n\n## Step-by-Step Analysis\n\n**Step 1: Identify the question**\n- \"What was forced to comply?\"\n\n**Step 2: Locate relevant information in the context**\n- The key se",
        "token_f1": 0.5455,
        "exact": false
      }
    ],
    "mmlu/accuracy": 0.21,
    "mmlu/evaluated": 200,
    "mmlu/correct": 42,
    "mmlu/split": "validation",
    "mmlu/limit": 200,
    "mmlu/examples": [
      {
        "question": "The cyclic subgroup of Z_24 generated by 18 has order",
        "gold": "A",
        "pred": "A",
        "raw": "# Solution\n\n## Step 1: Understanding the Problem\n\nWe need to find the order of t",
        "correct": true
      },
      {
        "question": "Find the order of the factor group Z_6/<3>.",
        "gold": "B",
        "pred": "C",
        "raw": "# Solution: Order of the Factor Group Z_6/<3>\n\n## Step 1: Understand the problem",
        "correct": false
      },
      {
        "question": "Statement 1 | A permutation that is a product of m even permutations and n odd permutations is an even permutation if and only if n is even. Statement 2 | Every",
        "gold": "A",
        "pred": "A",
        "raw": "# Solution\n\n## Statement 1 Analysis\n\n**Statement 1:** A permutation that is a pr",
        "correct": true
      },
      {
        "question": "Find the order of the factor group (Z_4 x Z_12)/(<2> x <2>)",
        "gold": "C",
        "pred": "A",
        "raw": "# Solution\n\n## Step 1: Understand the problem\nWe need to find the order of the f",
        "correct": false
      },
      {
        "question": "Find the maximum possible order for some element of Z_4 x Z_6.",
        "gold": "C",
        "pred": "A",
        "raw": "# Finding the Maximum Order in Z_4 \u00d7 Z_6\n\n## Step 1: Understanding the Problem\n\n",
        "correct": false
      },
      {
        "question": "Statement 1 | The symmetric group S_3 is cyclic. Statement 2 | Every group is isomorphic to some group of permutations.",
        "gold": "D",
        "pred": "A",
        "raw": "# Solution\n\n## Statement 1: \"The symmetric group S_3 is cyclic.\"\n\n**Analysis:**\n",
        "correct": false
      },
      {
        "question": "Statement 1 | If a and b are elements of finite order in an Abelian group, then |ab| is the lcm (|a|,|b|). Statement 2 | If g is a group element and g^n = e, th",
        "gold": "B",
        "pred": "A",
        "raw": "# Verification of Group Theory Statements\n\n## Statement 1 Analysis\n**Statement 1",
        "correct": false
      },
      {
        "question": "Statement 1 | If f is a homomorphism from G to K and H is normal in G then f(H) is normal in K. Statement 2 | If f is a homomorphism from G to a group and H is ",
        "gold": "D",
        "pred": "A",
        "raw": "# Solution\n\n## Statement 1 Analysis\n**If f is a homomorphism from G to K and H i",
        "correct": false
      },
      {
        "question": "Find the maximum possible order for an element of S_n for n = 7.",
        "gold": "B",
        "pred": "A",
        "raw": "# Solution: Finding the Maximum Order of an Element in S\u2087\n\n## Step 1: Understand",
        "correct": false
      },
      {
        "question": "Statement 1 | Every integral domain has a field of quotients. Statement 2 | A polynomial of degree n over a ring can have at most n zeros counting multiplicity.",
        "gold": "C",
        "pred": "A",
        "raw": "# Solution\n\n## Statement 1: Every integral domain has a field of quotients\n\n**TR",
        "correct": false
      }
    ]
  },
  "results": [
    {
      "benchmark": "dataset_sft_match/token_f1",
      "score": 0.3774
    },
    {
      "benchmark": "dataset_sft_match/exact_match",
      "score": 0.0
    },
    {
      "benchmark": "dataset_sft_match/evaluated",
      "score": 50.0
    },
    {
      "benchmark": "mmlu/accuracy",
      "score": 0.21
    },
    {
      "benchmark": "mmlu/evaluated",
      "score": 200.0
    },
    {
      "benchmark": "mmlu/correct",
      "score": 42.0
    },
    {
      "benchmark": "mmlu/limit",
      "score": 200.0
    }
  ],
  "code_eval": null,
  "stored_eval": null,
  "dataset_id": "nohurry/Opus-4.6-Reasoning-3000x-filtered",
  "status": "completed",
  "requested_eval_tasks": [],
  "executed_eval_tasks": [
    "dataset_sft_match",
    "internal/mmlu_0_shot"
  ],
  "task_errors": {},
  "is_proxy": false,
  "is_complete": true
}
```
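The evaluation report lists `dataset_sft_match/token_f1`, `dataset_sft_match/exact_match`, and `mmlu/accuracy` without defining them. The sketch below shows one common reading: SQuAD-style bag-of-tokens F1, exact string match, and accuracy as `mmlu/correct / mmlu/evaluated` (42 / 200 = 0.21). It assumes lowercasing plus whitespace tokenization; the Clawpathy harness may normalize text differently, and the `expected`/`output` strings shown above appear truncated, so recomputing F1 from them is not expected to reproduce the stored per-example scores. The function names are illustrative, not part of any Clawpathy or Tinker API.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Bag-of-tokens F1 between a model output and a reference answer.

    Assumption: lowercase + whitespace split; the real eval harness
    may normalize punctuation or markdown differently.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; exactly one empty scores zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each token counts at most min(pred, ref) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction: str, reference: str) -> bool:
    # Assumption: compare after stripping surrounding whitespace only.
    return prediction.strip() == reference.strip()


def accuracy(correct: int, evaluated: int) -> float:
    # Fraction of multiple-choice questions whose predicted letter equals gold.
    return correct / evaluated if evaluated else 0.0


if __name__ == "__main__":
    print(token_f1("the answer is 260 people", "260 people in total"))
    print(accuracy(42, 200))  # mmlu/correct / mmlu/evaluated -> 0.21
```

Because F1 rewards partial token overlap, a long, reformatted answer (for example, one that adds "## Solution" headings) can score between 0 and 1 even when exact match is 0, which is consistent with the report's `exact_match` of 0.0 alongside a nonzero `token_f1`.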