---
base_model:
- qwen/Qwen3.6-27B
tags:
- text-generation-inference
- image
- transformers
- unsloth
- qwen3_6
- reasoning
- chain-of-thought
- lora
- sft
- multimodal
- vision
- tool-use
- function-calling
- long-context
- agent
- science
license: apache-2.0
language:
- en
- zh
- es
- ru
- ja
pipeline_tag: image-text-to-text
datasets:
- Jackrong/Claude-opus-4.6-TraceInversion-9000x
- Jackrong/Claude-opus-4.7-TraceInversion-5000x
---
## 1. Base Model, Training Library & Cooperation

| Curriculum Stage | Focus & Sample Characteristics | Strategy Details |
|---|---|---|
| Stage 1: Format Inception | • Limit context to 4,096 tokens<br>• Emphasize stable reasoning templates | Focuses on short-to-medium-length, cleanly formatted reasoning samples. The primary goal is to establish a reliable, structured reasoning output format (such as auto-closing `<think>` tags), so that premature exposure to complex chains does not cause format collapse. |
| Stage 2: Complexity Expansion | • Extend length to 4,096-8,192 tokens<br>• Introduce high-difficulty logic samples | Gradually increases the ratio of complex reasoning chains. Distilling from "teacher models" whose reasoning-style distributions closely match the Qwen3.6 base keeps the capacity gap under control and makes knowledge transfer more efficient. |
| Stage 3: Long-Context SFT | • Progressively scale the window up to 32K tokens<br>• Replay 10% high-quality short samples | Pushes the model into deep reasoning scenarios with ultra-long contexts and multi-turn dialogues. To prevent capability drift or degraded short-instruction comprehension during long-text training, a 10% replay of high-quality short samples is strictly enforced. |
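
The stage boundaries and the replay rule in the table above can be sketched as a small data-bucketing routine. This is a minimal illustration under stated assumptions, not the authors' pipeline: the names `assign_stage` and `build_stage3_mix` are hypothetical, "32K" is read as 32,768 tokens, a sample of exactly 4,096 tokens is placed in Stage 1, and "10% replay" is interpreted as 10% of the final Stage 3 mix.

```python
import random
from typing import List, Sequence

# Token-length bounds per stage, taken from the curriculum table.
STAGE1_MAX = 4_096
STAGE2_MAX = 8_192
STAGE3_MAX = 32_768  # assumption: "32K" read as 32,768 tokens
REPLAY_RATIO = 0.10  # share of short samples replayed in Stage 3


def assign_stage(num_tokens: int) -> int:
    """Return the curriculum stage (1-3) for a sample, or 0 if it exceeds 32K."""
    if num_tokens <= STAGE1_MAX:
        return 1
    if num_tokens <= STAGE2_MAX:
        return 2
    if num_tokens <= STAGE3_MAX:
        return 3
    return 0


def build_stage3_mix(
    long_samples: Sequence[str],
    short_samples: Sequence[str],
    replay_ratio: float = REPLAY_RATIO,
    seed: int = 0,
) -> List[str]:
    """Mix long samples with replayed short ones.

    The replayed short samples make up roughly `replay_ratio` of the result.
    """
    if not 0.0 <= replay_ratio < 1.0:
        raise ValueError("replay_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # n_replay / (n_long + n_replay) = ratio  =>  n_replay = ratio * n_long / (1 - ratio)
    n_replay = round(replay_ratio * len(long_samples) / (1.0 - replay_ratio))
    n_replay = min(n_replay, len(short_samples))  # cannot replay more than we have
    mixed = list(long_samples) + rng.sample(list(short_samples), n_replay)
    rng.shuffle(mixed)
    return mixed
```

With 90 long samples and a pool of 50 short ones, `build_stage3_mix` returns 100 samples, 10 of which are replayed short samples. If the replay share is meant relative to the long samples instead, only the `n_replay` formula changes.
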

| Dimension | Details & Infrastructure |
|---|---|
| Training Hardware | NVIDIA DGX Cluster / H100 / RTX 6000 Pro |
| Fine-tuning Framework | Unsloth (used for highly efficient SFT of dense models and memory optimization) |

| Module / Component | Issue & Troubleshooting Diagnostics |
|---|---|
| Weight Merge (LoRA Merger) | Merging LoRA adapters back into the base model is highly prone to out-of-memory (OOM) errors at peak memory usage. Make sure the merge host has sufficient virtual memory, or perform the merge on the CPU in low precision. |
| Dependency Compatibility | PEFT, Transformers 5.x fusion mode, and Unsloth patches may occasionally cause module import failures (`ImportError`) or weight-mapping conflicts. Align your dependency versions with those provided in our finetuning-guide repository. |
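
For the weight-merge row above, a rough lower bound on the memory footprint of the merged weights helps when sizing the merge host. The sketch below is plain arithmetic rather than any library's API: `weight_memory_gib` is a hypothetical helper, and the parameter count of 27e9 is a nominal figure read from the "27B" in the base model name. Actual peak usage during a merge will be higher, since adapter weights and temporary per-layer tensors come on top.

```python
# Bytes per parameter for common floating-point dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed to hold the weights alone, in GiB (1 GiB = 2**30 bytes)."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unsupported dtype: {dtype}")
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    nominal_params = 27e9  # assumption: "27B" read as 27 billion parameters
    for dtype in ("bfloat16", "float32"):
        print(f"{dtype}: {weight_memory_gib(nominal_params, dtype):.1f} GiB")
```

Under these assumptions the weights alone take roughly 50 GiB in bfloat16 and about 100 GiB in float32, which is why a low-precision merge on a host with ample RAM or swap is the safer option.
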