---
license: apache-2.0
base_model: Qwen/Qwen3-4B-Thinking-2507
language: en
library_name: peft
tags:
  - agentic
  - terminal-bench
  - sft
  - lora
  - qwen3
  - tool-use
  - bash
  - reasoning
datasets:
  - prometheus04/microagent-train-v2
---

# qwen3-4b-thinking-microagent

LoRA SFT pipeline + scripts + docs for fine-tuning
[`Qwen/Qwen3-4B-Thinking-2507`](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
into a terminal agent.

**Target:** beat 13% on Terminal-Bench 2.0 with a model trained on a single A100-40GB.

## What's in this repo

| Path | What |
|---|---|
| `README.md` | top-level overview |
| `docs/PROJECT_OVERVIEW.md` | project goals + status |
| `docs/DATA_PIPELINE.md` | how the training corpus is built |
| `docs/FILTER_DESIGN.md` | filter rules deep dive |
| `docs/MODEL_SELECTION.md` | why Qwen3-4B-Thinking-2507 vs alternatives |
| `docs/HPC_PRINCIPLES.md` | single-A100 training optimization playbook |
| `docs/REPRODUCIBILITY.md` | step-by-step reproduction guide |
| `docs/VAST_AI_SETUP.md` | running on cheap rental A100s |
| `docs/CHANGELOG.md` | v1 → v2 changes |
| `scripts/run_pipeline_v2.py` | builds the training corpus |
| `scripts/convert_code_v2.py` | code-specific filter (recovery + give_up) |
| `scripts/rewrite_giveups.py` | retrospective give_up rewriter |
| `scripts/train_v2.py` | HPC-grade LoRA training (Unsloth + packing + FA2) |
| `scripts/setup_a100.sh` | one-shot A100 installer |
| `scripts/merge_lora.py` | adapter → merged model for vLLM serving |
| `data/pipeline_v2_log.txt` | full v2 pipeline run log |

## Training corpus

Lives in a separate repo:
[`prometheus04/microagent-train-v2`](https://huggingface.co/datasets/prometheus04/microagent-train-v2)
(26,627 trajectories, ~1 GB).

## Why this exists

There's a lot of public commentary about training small agents on terminal-style
data, but far less *code you can actually run*. This repo is the end-to-end
recipe: corpus build, filter design rationale, HPC-optimized training, and the
reasoning behind every choice.

## Headline numbers (corpus)

- 26,627 trajectories, ~244M training tokens
- 81.7% multi-turn (≥6 turns), avg ~8.5 assistant turns
- 5.1% `<give_up>` examples for honest failure handling
- Math content: **0%** (deliberately dropped)
- Code content: **48.4%**

## Headline numbers (training, projected)

- A100-40GB single-GPU
- 4–5 hours wall time for 1 epoch
- ~$5 cost on Vast.ai
- ~80 MB final LoRA adapter
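
As a sanity check on the projections above, the headline figures can be turned into per-trajectory length, implied throughput, and implied rental rate. This is a back-of-the-envelope sketch, not a measurement: it assumes one epoch processes all ~244M corpus tokens exactly once (no truncation or padding overhead), and the variable names are illustrative.

```python
# Figures quoted in this README (corpus and projected training numbers).
TRAJECTORIES = 26_627
TRAIN_TOKENS = 244e6                  # ~244M training tokens
HOURS_FAST, HOURS_SLOW = 4.0, 5.0     # projected wall time for 1 epoch
COST_USD = 5.0                        # projected Vast.ai cost

# Average trajectory length in tokens.
tokens_per_trajectory = TRAIN_TOKENS / TRAJECTORIES

# Implied sustained throughput (tokens/second) at both ends of the range.
# Assumption: every corpus token is processed exactly once per epoch.
throughput_fast = TRAIN_TOKENS / (HOURS_FAST * 3600)
throughput_slow = TRAIN_TOKENS / (HOURS_SLOW * 3600)

# Implied hourly rental rate (USD/hour) at both ends of the range.
rate_high = COST_USD / HOURS_FAST
rate_low = COST_USD / HOURS_SLOW

print(f"avg tokens per trajectory: {tokens_per_trajectory:,.0f}")
print(f"implied throughput: {throughput_slow:,.0f} to {throughput_fast:,.0f} tokens/s")
print(f"implied rental rate: ${rate_low:.2f} to ${rate_high:.2f} per hour")
```

In other words, the projection implies roughly 9.2k tokens per trajectory, about 13.6k to 16.9k tokens/s sustained, and a rental rate of about $1.00 to $1.25 per hour.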

## How to run

See [`docs/REPRODUCIBILITY.md`](https://huggingface.co/prometheus04/qwen3-4b-thinking-microagent/blob/main/docs/REPRODUCIBILITY.md)
for the full step-by-step.

Short version:
```bash
git clone https://huggingface.co/prometheus04/qwen3-4b-thinking-microagent
cd qwen3-4b-thinking-microagent
huggingface-cli download prometheus04/microagent-train-v2 \
  --repo-type dataset --local-dir data
bash scripts/setup_a100.sh
python scripts/train_v2.py --output-dir runs/v1 --epochs 1.0
```
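
Once training finishes, the adapter can be attached to the base model with PEFT. The following is a minimal sketch, not the repo's own loading code: it assumes `train_v2.py` writes a standard PEFT adapter directly into `--output-dir` (the exact location may differ, for example a checkpoint subdirectory), that `accelerate` is installed for `device_map="auto"`, and `load_adapter` is a hypothetical helper name. For vLLM serving, `scripts/merge_lora.py` is the intended path.

```python
def load_adapter(adapter_dir: str, base_id: str = "Qwen/Qwen3-4B-Thinking-2507"):
    """Load the base model and attach a LoRA adapter from `adapter_dir`.

    Assumption: `adapter_dir` holds a standard PEFT adapter
    (adapter_config.json plus adapter weights).
    """
    # Heavy imports are kept inside the function so importing this file stays cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,  # bf16 is supported on A100
        device_map="auto",           # requires the `accelerate` package
    )
    # Wrap the base model with the trained LoRA weights.
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.eval()
    return tokenizer, model
```

Usage would look like `tokenizer, model = load_adapter("runs/v1")`, with the path adjusted to wherever the adapter actually lands.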

## Format the model learns

```
<think>brief reasoning</think>
<bash>shell commands</bash>
```

Or, to end the task:
```
<think>verification</think>
<finish>one-line summary</finish>
```

Or, to stop honestly when it cannot finish:
```
<think>three approaches all failed; out of turns</think>
<give_up>tried 3 distinct approaches; last failure: NameError: name 'x' is not defined</give_up>
```
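
An agent harness needs to turn each model response in the format above into a structured action. The sketch below shows one way to do that with regular expressions. It is illustrative only: `AgentTurn` and `parse_turn` are hypothetical names, not part of this repo, and it assumes the response contains at most one action tag, placed after the reasoning block.

```python
import re
from typing import NamedTuple, Optional


class AgentTurn(NamedTuple):
    thought: Optional[str]  # contents of <think>...</think>, if present
    action: str             # "bash", "finish", or "give_up"
    payload: str            # shell commands, summary, or failure note


_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
# The backreference \1 forces the closing tag to match the opening tag.
_ACTION_RE = re.compile(r"<(bash|finish|give_up)>(.*?)</\1>", re.DOTALL)


def parse_turn(text: str) -> Optional[AgentTurn]:
    """Parse one model response; return None if no action tag is found."""
    think = _THINK_RE.search(text)
    # Search for the action only after the reasoning block, so that tags
    # quoted inside <think> are not mistaken for the real action.
    start = think.end() if think else 0
    action = _ACTION_RE.search(text, start)
    if action is None:
        return None
    thought = think.group(1).strip() if think else None
    return AgentTurn(thought, action.group(1), action.group(2).strip())


if __name__ == "__main__":
    turn = parse_turn("<think>check the repo layout</think>\n<bash>ls -la</bash>")
    print(turn)
```

A harness would typically execute `payload` in a sandbox when `action == "bash"`, and stop the episode on `finish` or `give_up`. Returning `None` for malformed output lets the caller decide whether to retry or abort.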

## License

The code is MIT-licensed. The base model is Apache 2.0. The training corpus is
derived from NVIDIA's Nemotron-Terminal-Corpus (NVIDIA Open Model License).