---
tags:
- clawpathy
- tinker
- lora
- sft
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: tinker
---

# Qwen3-4B-Opus-Reasoning-SFT

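A LoRA supervised fine-tune of `Qwen/Qwen3-4B-Instruct-2507` on the `nohurry/Opus-4.6-Reasoning-3000x-filtered` dataset, trained with [Clawpathy](https://github.com/clawpathy) on the Tinker platform.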

## Training Details

| Parameter | Value |
|---|---|
| **Base model** | Qwen/Qwen3-4B-Instruct-2507 |
| **Method** | Supervised Fine-Tuning |
| **Dataset** | nohurry/Opus-4.6-Reasoning-3000x-filtered |
| **LoRA rank** | 32 |
| **Learning rate** | 0.00018 |
| **Steps** | 50 |
| **Final loss** | 1.236125 |
| **Tinker sampler path** | `tinker://332646c7-9a78-5c49-a58c-3ca971b142ce:train:0/sampler_weights/clawpathy-c942fbe6-f6b-final` |
| **Tinker state path** | `tinker://332646c7-9a78-5c49-a58c-3ca971b142ce:train:0/weights/clawpathy-c942fbe6-f6b-state` |
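
To give a sense of scale for the LoRA rank listed above, here is a minimal sketch of how many trainable parameters a rank-32 adapter adds to a single weight matrix. The layer shape used here is hypothetical and chosen only for illustration. It is not taken from this training run, and the table does not say which layers were adapted.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    LoRA learns the weight update as B @ A, where A has shape (rank, d_in)
    and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


RANK = 32  # LoRA rank from the Training Details table

# Hypothetical layer shape, for illustration only (an assumption).
D_IN, D_OUT = 2560, 2560

added = lora_param_count(D_IN, D_OUT, RANK)
full = D_IN * D_OUT
print(f"LoRA adds {added:,} parameters vs {full:,} in the full matrix "
      f"({added / full:.2%})")
```

The count grows linearly with the rank, so doubling the rank doubles the adapter size for the same layer.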

## Evaluation

The model was evaluated on two benchmarks: 50 examples from the training split of the SFT dataset, scored against the reference completions, and 200 questions from the MMLU validation split (0-shot).

| Benchmark | Metric | Value |
|---|---|---|
| **Dataset SFT-match (50)** | Token F1 | 0.3774 |
| **Dataset SFT-match (50)** | Exact match | 0.0 |
| **MMLU (0-shot, 200)** | Accuracy | 0.21 (42 of 200 correct) |

The MMLU accuracy is below the 0.25 expected from random guessing on four-option questions, and 9 of the 10 sampled predictions below are `A`, so the score may reflect answer extraction from long-form outputs rather than model knowledge alone.

The raw evaluation report follows. It includes 10 sample examples per benchmark, with the `expected`, `output`, `question`, and `raw` fields truncated.

{
  "job_id": "c942fbe6-f6b",
  "model": "Trained: Qwen/Qwen3-4B-Instruct-2507 (job c942fbe6-f6b)",
  "eval_type": "benchmark",
  "benchmarks": [
    "Dataset SFT-match (50)",
    "MMLU (0-shot, 200)"
  ],
  "benchmark_names": [
    "Dataset SFT-match (50)",
    "MMLU (0-shot, 200)"
  ],
  "domain": "general",
  "accuracy": 0.21,
  "metrics": {
    "dataset_sft_match/token_f1": 0.3774,
    "dataset_sft_match/exact_match": 0.0,
    "dataset_sft_match/evaluated": 50,
    "dataset_sft_match/split": "train",
    "dataset_sft_match/format": "instruction",
    "dataset_sft_match/examples": [
      {
        "expected": "# Solution: Calculating Bus Rental and Toll Costs\n\n## Step 1: Find the total number of people going on the field trip\n\n- Fifth-grade students: 252\n- Teachers: 8\n- **Total people**:",
        "output": "## Solution\n\n**Step 1: Calculate the total number of people going on the trip**\n\n$$252 \\text{ students} + 8 \\text{ teachers} = 260 \\text{ people}$$\n\n**Step 2: Determine how many bu",
        "token_f1": 0.3976,
        "exact": false
      },
      {
        "expected": "## Missing Problem Description\n\nYour request appears to be incomplete. You've provided the **flushing instructions** for an interactive problem, but the **actual problem descriptio",
        "output": "I notice you've provided instructions for an interactive problem, but you haven't actually included the problem statement or the specific question you'd like me to solve.\n\nCould yo",
        "token_f1": 0.2332,
        "exact": false
      },
      {
        "expected": "**Answer: no**\n\n**Explanation:**\n\nThe premise states that \"Thorn thought the same thing.\" For this statement to make sense, there must be another person (or people) who had this th",
        "output": "## Solution\n\n**Step 1: Analyze the premise**\n- \"Apparently Thorn thought the same thing\" indicates that Thorn's thoughts aligned with someone else's thoughts\n- The word \"same\" impl",
        "token_f1": 0.4814,
        "exact": false
      },
      {
        "expected": "## Solution\n\nThe question asks for **automated software** specifically designed to create energy profile diagrams.\n\n### Analysis of Options:\n\n| Option | Software | Type | Automated",
        "output": "# Automated Software for Energy Profile Diagrams\n\n## Analysis of Each Option\n\n**A: TikZ** - This is a LaTeX package for creating graphics. While it can produce energy profile diagr",
        "token_f1": 0.391,
        "exact": false
      },
      {
        "expected": "No, these sentences are not paraphrases of each other. \n\nA paraphrase would convey the same meaning or information using different wording. However, these two sentences describe en",
        "output": "## Analysis\n\nLet me compare these two sentences to determine if they are paraphrases.\n\n**Sentence 1:** \"In Ventura County, the number of closings and advisories dropped by 73 perce",
        "token_f1": 0.3219,
        "exact": false
      },
      {
        "expected": "The answer is C (contradiction).\n\nThe two sentences directly contradict each other. Sentence 1 explicitly states that the woman is wearing a WHITE dress while riding a bicycle. How",
        "output": "# Analysis of Sentence Agreement\n\n## Step 1: Identify the key information in each sentence\n\n**Sentence 1:** \"A woman wearing a white dress riding a bicycle.\"\n- Subject: A woman\n- C",
        "token_f1": 0.3739,
        "exact": false
      },
      {
        "expected": "# Article Based on Provided Highlights\n\n## Clinton to Suspend Presidential Campaign Saturday, Sources Say\n\nIn a significant development in the Democratic presidential race, sources",
        "output": "# Clinton Suspends Presidential Campaign Amid Democratic Unity Efforts\n\n## Key Developments in the 2016 Democratic Primary Race\n\n**WASHINGTON, D.C.** \u2014 Democratic presidential nomi",
        "token_f1": 0.4765,
        "exact": false
      },
      {
        "expected": "# Problem Analysis\n\nBased on the example provided, I can identify this as a **modular arithmetic transformation problem**.\n\n## Problem Statement (Reconstructed)\n\nGiven:\n- A modulus",
        "output": "I notice you've provided an example but haven't actually asked a question. Let me analyze what this example demonstrates:\n\n## What this example shows:\n\n**Problem Type**: This appea",
        "token_f1": 0.1667,
        "exact": false
      },
      {
        "expected": "It appears you've provided the output format for a competitive programming problem, but I'm missing the actual problem statement. To give you a detailed solution with verification,",
        "output": "I notice that your message appears to be a partial problem statement, but it's incomplete. You've provided:\n\n- A conditional structure: \"Otherwise, print an integer m...\"\n- A descr",
        "token_f1": 0.4075,
        "exact": false
      },
      {
        "expected": "Based on the context provided, the answer is:\n\n**All federal funds**, including grants, loans, unions and employers who accepted taxpayer funds, and other forms of financial assist",
        "output": "# Answer Extraction\n\n## Step-by-Step Analysis\n\n**Step 1: Identify the question**\n- \"What was forced to comply?\"\n\n**Step 2: Locate relevant information in the context**\n- The key se",
        "token_f1": 0.5455,
        "exact": false
      }
    ],
    "mmlu/accuracy": 0.21,
    "mmlu/evaluated": 200,
    "mmlu/correct": 42,
    "mmlu/split": "validation",
    "mmlu/limit": 200,
    "mmlu/examples": [
      {
        "question": "The cyclic subgroup of Z_24 generated by 18 has order",
        "gold": "A",
        "pred": "A",
        "raw": "# Solution\n\n## Step 1: Understanding the Problem\n\nWe need to find the order of t",
        "correct": true
      },
      {
        "question": "Find the order of the factor group Z_6/<3>.",
        "gold": "B",
        "pred": "C",
        "raw": "# Solution: Order of the Factor Group Z_6/<3>\n\n## Step 1: Understand the problem",
        "correct": false
      },
      {
        "question": "Statement 1 | A permutation that is a product of m even permutations and n odd permutations is an even permutation if and only if n is even. Statement 2 | Every",
        "gold": "A",
        "pred": "A",
        "raw": "# Solution\n\n## Statement 1 Analysis\n\n**Statement 1:** A permutation that is a pr",
        "correct": true
      },
      {
        "question": "Find the order of the factor group (Z_4 x Z_12)/(<2> x <2>)",
        "gold": "C",
        "pred": "A",
        "raw": "# Solution\n\n## Step 1: Understand the problem\nWe need to find the order of the f",
        "correct": false
      },
      {
        "question": "Find the maximum possible order for some element of Z_4 x Z_6.",
        "gold": "C",
        "pred": "A",
        "raw": "# Finding the Maximum Order in Z_4 \u00d7 Z_6\n\n## Step 1: Understanding the Problem\n\n",
        "correct": false
      },
      {
        "question": "Statement 1 | The symmetric group S_3 is cyclic. Statement 2 | Every group is isomorphic to some group of permutations.",
        "gold": "D",
        "pred": "A",
        "raw": "# Solution\n\n## Statement 1: \"The symmetric group S_3 is cyclic.\"\n\n**Analysis:**\n",
        "correct": false
      },
      {
        "question": "Statement 1 | If a and b are elements of finite order in an Abelian group, then |ab| is the lcm (|a|,|b|). Statement 2 | If g is a group element and g^n = e, th",
        "gold": "B",
        "pred": "A",
        "raw": "# Verification of Group Theory Statements\n\n## Statement 1 Analysis\n**Statement 1",
        "correct": false
      },
      {
        "question": "Statement 1 | If f is a homomorphism from G to K and H is normal in G then f(H) is normal in K. Statement 2 | If f is a homomorphism from G to a group and H is ",
        "gold": "D",
        "pred": "A",
        "raw": "# Solution\n\n## Statement 1 Analysis\n**If f is a homomorphism from G to K and H i",
        "correct": false
      },
      {
        "question": "Find the maximum possible order for an element of S_n for n = 7.",
        "gold": "B",
        "pred": "A",
        "raw": "# Solution: Finding the Maximum Order of an Element in S\u2087\n\n## Step 1: Understand",
        "correct": false
      },
      {
        "question": "Statement 1 | Every integral domain has a field of quotients. Statement 2 | A polynomial of degree n over a ring can have at most n zeros counting multiplicity.",
        "gold": "C",
        "pred": "A",
        "raw": "# Solution\n\n## Statement 1: Every integral domain has a field of quotients\n\n**TR",
        "correct": false
      }
    ]
  },
  "results": [
    {
      "benchmark": "dataset_sft_match/token_f1",
      "score": 0.3774
    },
    {
      "benchmark": "dataset_sft_match/exact_match",
      "score": 0.0
    },
    {
      "benchmark": "dataset_sft_match/evaluated",
      "score": 50.0
    },
    {
      "benchmark": "mmlu/accuracy",
      "score": 0.21
    },
    {
      "benchmark": "mmlu/evaluated",
      "score": 200.0
    },
    {
      "benchmark": "mmlu/correct",
      "score": 42.0
    },
    {
      "benchmark": "mmlu/limit",
      "score": 200.0
    }
  ],
  "code_eval": null,
  "stored_eval": null,
  "dataset_id": "nohurry/Opus-4.6-Reasoning-3000x-filtered",
  "status": "completed",
  "requested_eval_tasks": [],
  "executed_eval_tasks": [
    "dataset_sft_match",
    "internal/mmlu_0_shot"
  ],
  "task_errors": {},
  "is_proxy": false,
  "is_complete": true
}
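
The report does not say how `token_f1` and `exact_match` are computed. The sketch below shows one common definition, SQuAD-style token-overlap F1 over whitespace-split tokens, as an assumption rather than the evaluator's actual implementation. The function names `token_f1` and `exact_match` are illustrative. The last lines check that the reported MMLU accuracy matches `mmlu/correct` divided by `mmlu/evaluated`.

```python
from collections import Counter


def token_f1(expected: str, output: str) -> float:
    """Token-overlap F1 between a reference and a model output.

    Assumes plain whitespace tokenization; the real evaluator may
    normalize or tokenize differently.
    """
    ref = expected.split()
    hyp = output.split()
    if not ref or not hyp:
        # Both empty counts as a perfect match, one empty as a miss.
        return float(ref == hyp)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(ref) & Counter(hyp)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def exact_match(expected: str, output: str) -> bool:
    """True only if the two strings are identical after stripping whitespace."""
    return expected.strip() == output.strip()


# Consistency check on the reported MMLU numbers from the report above.
mmlu_correct, mmlu_evaluated = 42, 200
mmlu_accuracy = round(mmlu_correct / mmlu_evaluated, 4)
print(mmlu_accuracy)
```

Under a definition like this, an exact-match score of 0.0 is unsurprising for long free-form completions, since any difference in wording or formatting counts as a miss, while token F1 still gives partial credit for shared vocabulary.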