---
license: apache-2.0
base_model: Qwen/Qwen3-4B-Thinking-2507
language: en
library_name: peft
tags:
- agentic
- terminal-bench
- sft
- lora
- qwen3
- tool-use
- bash
- reasoning
datasets:
- prometheus04/microagent-train-v2
---

# qwen3-4b-thinking-microagent

LoRA SFT pipeline + scripts + docs for fine-tuning [`Qwen/Qwen3-4B-Thinking-2507`](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) into a terminal agent.

**Target:** beat 13% on Terminal-Bench 2.0 with a single A100-40GB.

## What's in this repo

| Path | What |
|---|---|
| `README.md` | top-level overview |
| `docs/PROJECT_OVERVIEW.md` | project goals + status |
| `docs/DATA_PIPELINE.md` | how the training corpus is built |
| `docs/FILTER_DESIGN.md` | filter rules deep dive |
| `docs/MODEL_SELECTION.md` | why Qwen3-4B-Thinking-2507 vs alternatives |
| `docs/HPC_PRINCIPLES.md` | single-A100 training optimization playbook |
| `docs/REPRODUCIBILITY.md` | step-by-step reproduction guide |
| `docs/VAST_AI_SETUP.md` | running on cheap rental A100s |
| `docs/CHANGELOG.md` | v1 → v2 changes |
| `scripts/run_pipeline_v2.py` | builds the training corpus |
| `scripts/convert_code_v2.py` | code-specific filter (recovery + give_up) |
| `scripts/rewrite_giveups.py` | retrospective give_up rewriter |
| `scripts/train_v2.py` | HPC-grade LoRA training (Unsloth + packing + FA2) |
| `scripts/setup_a100.sh` | one-shot A100 installer |
| `scripts/merge_lora.py` | adapter → merged model for vLLM serving |
| `data/pipeline_v2_log.txt` | full v2 pipeline run log |

## Training corpus

Lives in a separate repo: [`prometheus04/microagent-train-v2`](https://huggingface.co/datasets/prometheus04/microagent-train-v2) (26,627 trajectories, ~1 GB).

## Why this exists

There's a lot of public commentary about training small agents on terminal-style data. There's much less *executable code* you can run. This repo is the end-to-end recipe: corpus build, filter design rationale, HPC-optimized training, and the reasoning behind every choice.

## Headline numbers (corpus)

- 26,627 trajectories, ~244M training tokens
- 81.7% multi-turn (≥6 turns), avg ~8.5 assistant turns
- 5.1% `give_up` examples for honest failure handling
- Math content: **0%** (deliberately dropped)
- Code content: **48.4%**

## Headline numbers (training, projected)

- A100-40GB single-GPU
- 4–5 hours wall time for 1 epoch
- ~$5 cost on Vast.ai
- ~80MB final LoRA adapter

## How to run

See [`docs/REPRODUCIBILITY.md`](https://huggingface.co/prometheus04/qwen3-4b-thinking-microagent/blob/main/docs/REPRODUCIBILITY.md) for the full step-by-step. Short version:

```bash
git clone https://huggingface.co/prometheus04/qwen3-4b-thinking-microagent
cd qwen3-4b-thinking-microagent
huggingface-cli download prometheus04/microagent-train-v2 \
  --repo-type dataset --local-dir data
bash scripts/setup_a100.sh
python scripts/train_v2.py --output-dir runs/v1 --epochs 1.0
```

## Format the model learns

```
brief reasoning shell commands
```

Or to end:

```
verification one-line summary
```

Or honest stop:

```
three approaches all failed; out of turns tried 3 distinct approaches; last failure: NameError: name 'x' is not defined
```

## License

MIT for code. Base model is Apache 2.0. Training corpus derived from NVIDIA's Nemotron-Terminal-Corpus (NVIDIA Open Model License).
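
## Sanity check on the headline numbers

The corpus and projected-training figures above imply a few derived quantities that are useful when planning a run: average tokens per trajectory, the sustained throughput needed to finish one epoch in 4–5 hours, and the implied hourly rental price. The sketch below only does that arithmetic. It assumes one epoch processes each of the ~244M tokens exactly once and ignores padding, evaluation, and checkpointing overhead; the constant and function names are illustrative and do not come from the repo's scripts.

```python
# Figures copied from the headline numbers above; everything derived
# from them is an estimate, not a measured result.
TRAJECTORIES = 26_627
TRAINING_TOKENS = 244_000_000   # "~244M training tokens"
WALL_HOURS = (4.0, 5.0)         # "4–5 hours wall time for 1 epoch"
COST_USD = 5.0                  # "~$5 cost on Vast.ai"


def tokens_per_trajectory(tokens: int, trajectories: int) -> float:
    """Average number of training tokens per trajectory."""
    return tokens / trajectories


def tokens_per_second(tokens: int, hours: float) -> float:
    """Sustained throughput needed to see every token once in `hours`."""
    return tokens / (hours * 3600)


def usd_per_hour(cost: float, hours: float) -> float:
    """Hourly GPU price implied by a total cost and a wall time."""
    return cost / hours


if __name__ == "__main__":
    avg = tokens_per_trajectory(TRAINING_TOKENS, TRAJECTORIES)
    print(f"avg tokens per trajectory: {avg:,.0f}")
    for hours in WALL_HOURS:
        tps = tokens_per_second(TRAINING_TOKENS, hours)
        price = usd_per_hour(COST_USD, hours)
        print(f"{hours:.0f} h epoch: {tps:,.0f} tokens/s, ${price:.2f}/h")
```

Under these assumptions the averages come out to roughly 9,200 tokens per trajectory, 13,600 to 16,900 tokens per second, and $1.00 to $1.25 per GPU-hour. If measured throughput on a given machine falls well below that range, the 4–5 hour projection is unlikely to hold.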