mahsum committed on
Commit
1625ea0
·
verified ·
1 Parent(s): 785fa7c

Initial release: Jazari-4B-SFT-TR (experimental Turkish adaptation of Qwen3.5-4B)

Files changed (7)
  1. .gitattributes +1 -0
  2. README.md +177 -0
  3. chat_template.jinja +154 -0
  4. config.json +113 -0
  5. model.safetensors +3 -0
  6. tokenizer.json +3 -0
  7. tokenizer_config.json +33 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,177 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - tr
5
+ - en
6
+ base_model: Qwen/Qwen3.5-4B
7
+ tags:
8
+ - turkish
9
+ - experimental
10
+ - fine-tuned
11
+ - qwen
12
+ - conversational
13
+ - text-generation
14
+ pipeline_tag: text-generation
15
+ ---
16
+
17
+ # Jazari-4B-SFT-TR (Experimental)
18
+
19
+ A 4B-parameter Turkish-adapted LLM based on Qwen3.5-4B. This is an **experimental** model created as a low-budget case study in Turkish language adaptation.
20
+
21
+ > **Note:** This is a research/experimental release. It is NOT a production-ready model. See limitations below.
22
+
23
+ ## What is this?
24
+
25
+ Jazari-4B-SFT-TR is a Qwen3.5-4B model adapted for Turkish through:
26
+ 1. **Continued Pre-Training (CPT):** 674 MB of Turkish text, 11,939 steps
27
+ 2. **Supervised Fine-Tuning (SFT):** 73,182 examples across 13 categories, 9,600 steps (early stopping)
28
+
29
+ The entire training was done on a single RTX 5090 GPU (vast.ai) over approximately 1 week, using LoRA (r=128 for CPT, r=64 for SFT) with the Unsloth + TRL framework.
30
+
31
+ ## Training Details
32
+
33
+ | Parameter | Value |
34
+ |-----------|-------|
35
+ | Base Model | Qwen/Qwen3.5-4B |
36
+ | CPT Data | 674 MB Turkish text (AkademikDerlem, BellaTurca, OzenliDerlem, dolphin-r1-turkish, gsm8k_tr) |
37
+ | SFT Data | 73,182 examples (chat, reasoning, tool-calling, code, knowledge, helpsteer, etc.) |
38
+ | CPT Steps | 11,939 (1 epoch, train loss: 0.48) |
39
+ | SFT Steps | 9,600 (early stopping at epoch 1.05, best eval loss: 0.992) |
40
+ | LoRA Rank | CPT: r=128, SFT: r=64 (rsLoRA) |
41
+ | Precision | bf16 |
42
+ | Framework | Unsloth 2026.3.x + TRL 0.29.1 |
43
+ | GPU | NVIDIA RTX 5090 (32 GB VRAM) |
44
+ | Total Training Time | ~7 days |
45
+ | Total Cost | ~$25 (vast.ai) |
46
+
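The table above lists rank-stabilized LoRA (rsLoRA) with r=128 for CPT and r=64 for SFT. As a rough illustration of what that choice changes, the sketch below compares the standard LoRA scaling factor `alpha / r` with the rsLoRA factor `alpha / sqrt(r)`. The `ALPHA` value and the `lora_scale` helper are hypothetical; the README does not report the alpha that was actually used.

```python
import math


def lora_scale(alpha: float, r: int, rank_stabilized: bool) -> float:
    """Return the multiplier applied to the low-rank update B @ A."""
    # Standard LoRA divides alpha by r; rsLoRA divides by sqrt(r) instead,
    # so the update does not shrink as quickly when the rank grows.
    return alpha / math.sqrt(r) if rank_stabilized else alpha / r


ALPHA = 32.0  # hypothetical value, not stated in the README

for stage, rank in (("CPT", 128), ("SFT", 64)):
    plain = lora_scale(ALPHA, rank, rank_stabilized=False)
    stable = lora_scale(ALPHA, rank, rank_stabilized=True)
    print(f"{stage}: r={rank}  LoRA scale={plain:.3f}  rsLoRA scale={stable:.3f}")
```

At the same alpha, the rsLoRA multiplier is larger than the plain one by a factor of `sqrt(r)`, which is why high ranks such as 128 are usually paired with it.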
47
+ ## Evaluation Results
48
+
49
+ ### Custom Turkish Benchmark (360 multiple-choice questions)
50
+
51
+ > **Important:** This benchmark uses handcrafted questions inspired by global benchmarks (MMLU, ARC, GSM8K, TruthfulQA, HellaSwag, Winogrande). It is NOT an official benchmark and scores are not directly comparable to published leaderboard results. Questions tend to be easier than official benchmarks.
52
+
53
+ | Benchmark | Description | Questions | jazari-4b-sft | qwen3.5:4b (base) |
54
+ |-----------|-------------|:---------:|:-------------:|:------------------:|
55
+ | TR-MMLU | Turkish knowledge (12 subjects) | 120 | 85.0% | 80.0% |
56
+ | TR-ARC | Turkish science (elementary) | 50 | 98.0% | 84.0% |
57
+ | TR-GSM8K | Turkish math word problems | 50 | 52.0% | 66.0% |
58
+ | TR-TruthfulQA | Misconception/truthfulness | 40 | 95.0% | 52.5% |
59
+ | TR-HellaSwag | Sentence completion | 40 | 97.5% | 47.5% |
60
+ | TR-Winogrande | Coreference resolution | 30 | 66.7% | 43.3% |
61
+ | TR-Cultural | Turkish culture & idioms | 30 | 76.7% | 33.3% |
62
+ | **Overall** | | **360** | **82.5%** | **65.0%** |
63
+
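The **Overall** row is consistent with a question-weighted average of the per-benchmark accuracies rather than a plain mean of the seven percentages. The sketch below recomputes it from the table values; the dictionary layout and function name are illustrative only.

```python
# (questions, jazari-4b-sft %, qwen3.5:4b %) copied from the table above
RESULTS = {
    "TR-MMLU": (120, 85.0, 80.0),
    "TR-ARC": (50, 98.0, 84.0),
    "TR-GSM8K": (50, 52.0, 66.0),
    "TR-TruthfulQA": (40, 95.0, 52.5),
    "TR-HellaSwag": (40, 97.5, 47.5),
    "TR-Winogrande": (30, 66.7, 43.3),
    "TR-Cultural": (30, 76.7, 33.3),
}


def weighted_overall(column: int) -> float:
    """Question-weighted accuracy in percent for the given score column."""
    total_questions = sum(row[0] for row in RESULTS.values())
    # Convert each percentage back to an (approximate) count of correct answers.
    correct = sum(row[0] * row[column] / 100 for row in RESULTS.values())
    return 100 * correct / total_questions


print("jazari-4b-sft:", round(weighted_overall(1), 1))
print("qwen3.5:4b:   ", round(weighted_overall(2), 1))
```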
64
+ *Note: qwen3.5:4b scores may be affected by think-mode output parsing issues (121/360 answers could not be extracted). These results should be interpreted with caution.*
65
+
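To make the parsing caveat concrete, here is a minimal sketch of how a multiple-choice answer might be pulled out of think-mode output. It is not the evaluation script used for this benchmark; the function name, the A-D option set, and the regular expression are all assumptions.

```python
import re
from typing import Optional

# Assumes options are labelled A-D; the README does not state the label set.
CHOICE = re.compile(r"\b([A-D])\b")


def extract_choice(raw: str) -> Optional[str]:
    """Return the first option letter that appears after the reasoning block."""
    if "</think>" in raw:
        # Keep only the text after the last closing tag.
        visible = raw.rsplit("</think>", 1)[1]
    elif "<think>" in raw:
        # Reasoning was opened but never closed: generation was likely cut off.
        return None
    else:
        visible = raw
    match = CHOICE.search(visible)
    return match.group(1) if match else None


print(extract_choice("<think>Maybe A or B</think>\nB"))
print(extract_choice("<think>still reasoning about A"))
```

A response that runs out of tokens inside the reasoning block yields `None` here, which is one plausible way an answer ends up counted as not extractable.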
66
+ ### Real-World Evaluation (15 scenarios, scored by independent evaluator)
67
+
68
+ | Model | Accuracy | Turkish Quality | Helpfulness | Overall |
69
+ |-------|:--------:|:---------------:|:-----------:|:-------:|
70
+ | jazari-4b-sft | 5.20/10 | 7.33/10 | 5.00/10 | 5.84/10 |
71
+ | qwen3.5:4b | 5.00/10 | 7.00/10 | 4.53/10 | 5.51/10 |
72
+
73
+ *jazari won 6/15 scenarios, qwen won 3/15, 6 ties.*
74
+
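The **Overall** column in this table matches the unweighted mean of the three criterion scores. That reading is inferred from the numbers, not stated in the README; the short check below reproduces it.

```python
from statistics import mean

# (Accuracy, Turkish Quality, Helpfulness) copied from the table above
SCORES = {
    "jazari-4b-sft": (5.20, 7.33, 5.00),
    "qwen3.5:4b": (5.00, 7.00, 4.53),
}

for model, criteria in SCORES.items():
    print(f"{model}: {round(mean(criteria), 2)}/10")

# Win/loss/tie counts should add up to the 15 scenarios.
print(6 + 3 + 6)
```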
75
+ ## Strengths
76
+
77
+ - **Turkish fluency:** More natural Turkish responses than the base Qwen3.5-4B
78
+ - **Cultural knowledge:** Better understanding of Turkish idioms, proverbs, and cultural references
79
+ - **Conversational ability:** Stable, helpful responses in casual Turkish conversation
80
+ - **Speed:** ~5x faster inference than base model (no think-mode overhead)
81
+ - **Low cost:** Entire training cost ~$25 on vast.ai
82
+
83
+ ## Known Limitations & Failure Modes
84
+
85
+ **Be aware of these issues before using this model:**
86
+
87
+ 1. **Overconfidence:** The model rarely says "I don't know." It may generate plausible-sounding but incorrect answers.
88
+ 2. **Format discipline:** Struggles with strict output formats. When asked for JSON, may return plain text instead.
89
+ 3. **Translation:** Cannot reliably translate Turkish to English (may echo the Turkish input).
90
+ 4. **Grammar correction:** May not detect obvious spelling errors (e.g., "Herkezin" vs "Herkesin").
91
+ 5. **Math word problems:** Weaker than base model on TR-GSM8K (52% vs 66%).
92
+ 6. **Instruction following:** May add extra text when asked for short answers (e.g., "Just say Yes or No").
93
+ 7. **Hallucination:** May fabricate plausible-sounding facts, especially about specific places, people, or events.
94
+ 8. **Limited CPT data:** Only 674 MB of Turkish text was used for pre-training, which limits deep domain knowledge.
95
+
96
+ ### Example failures:
97
+
98
+ ```
99
+ Q: "Bu cümledeki yazım yanlışını bul: 'Herkezin bir fikri var.'"
100
+ A: "Cümlede ciddi bir sorun yok." # WRONG - should detect "Herkezin" → "Herkesin"
101
+
102
+ Q: "Bu cümleyi İngilizce'ye çevir: 'Türkiye güzel bir ülkedir.'"
103
+ A: "Türkiye güzel bir ülkedir." # WRONG - echoed Turkish instead of translating
104
+
105
+ Q: "Şu bilgileri JSON formatında ver: isim=Ali, yaş=25"
106
+ A: "Ali, 25 yaşındadır." # WRONG - should have returned JSON
107
+ ```
108
+
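Because the model may answer in plain text when JSON is requested, callers will probably want to validate the output before using it. The following is a minimal sketch of such a guard using only the standard library; `is_json_object` is a hypothetical helper, not part of this repository.

```python
import json


def is_json_object(text: str) -> bool:
    """Return True only if text parses as a JSON object (a dict)."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


# The failing answer from the example above, and what a valid answer could look like.
print(is_json_object("Ali, 25 yaşındadır."))
print(is_json_object('{"isim": "Ali", "yaş": 25}'))
```

When the check fails, a caller could retry the request or fall back to a stricter prompt.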
109
+ ## Usage
110
+
111
+ ```python
112
+ from transformers import AutoModelForCausalLM, AutoTokenizer
113
+
114
+ model = AutoModelForCausalLM.from_pretrained("mahsum/jazari-4b-sft-tr")
115
+ tokenizer = AutoTokenizer.from_pretrained("mahsum/jazari-4b-sft-tr")
116
+
117
+ messages = [
118
+ {"role": "user", "content": "Türkiye'nin en güzel şehri neresidir?"}
119
+ ]
120
+
121
+ inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt")
122
+ output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9, repetition_penalty=1.3)  # do_sample=True so temperature/top_p take effect
123
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
124
+ ```
125
+
126
+ ### GGUF / Ollama
127
+
128
+ ```bash
129
+ # Only if a GGUF version is available
130
+ ollama run mahsum/jazari-4b-sft-tr
131
+ ```
132
+
133
+ ## What this model is good for
134
+
135
+ - Turkish conversational AI experiments
136
+ - Studying low-budget Turkish LLM adaptation techniques
137
+ - Baseline for further Turkish fine-tuning
138
+ - Learning about CPT + SFT training pipelines
139
+
140
+ ## What this model is NOT good for
141
+
142
+ - Production deployment without additional safety measures
143
+ - Reliable factual question answering
144
+ - Tasks requiring strict format compliance (JSON, structured output)
145
+ - Translation tasks
146
+ - Any high-stakes application
147
+
148
+ ## Next Steps (v2 Roadmap)
149
+
150
+ - [ ] Full CPT with 26 GB cleaned Turkish text (currently only 674 MB used)
151
+ - [ ] Alignment training (SimPO/GRPO)
152
+ - [ ] Improved format discipline and instruction following
153
+ - [ ] Uncertainty calibration ("I don't know" training)
154
+ - [ ] Official Cetvel benchmark evaluation
155
+ - [ ] GGUF quantized versions for edge deployment
156
+
157
+ ## Citation
158
+
159
+ ```bibtex
160
+ @misc{aktas2026jazari,
161
+ title={Jazari-4B-SFT-TR: Low-Budget Turkish Adaptation of Qwen3.5-4B},
162
+ author={Aktas, Mahsum},
163
+ year={2026},
164
+ url={https://huggingface.co/mahsum/jazari-4b-sft-tr}
165
+ }
166
+ ```
167
+
168
+ ## Acknowledgments
169
+
170
+ - [Qwen Team](https://huggingface.co/Qwen) for the base model
171
+ - [Unsloth](https://github.com/unslothai/unsloth) for training optimizations
172
+ - [vast.ai](https://vast.ai) for affordable GPU compute
173
+ - Turkish NLP community for open datasets
174
+
175
+ ---
176
+
177
+ *Jazari is named after Al-Jazari (1136-1206), the Turkish-Muslim polymath and engineer, often regarded as the father of robotics.*
chat_template.jinja ADDED
@@ -0,0 +1,154 @@
1
+ {%- set image_count = namespace(value=0) %}
2
+ {%- set video_count = namespace(value=0) %}
3
+ {%- macro render_content(content, do_vision_count, is_system_content=false) %}
4
+ {%- if content is string %}
5
+ {{- content }}
6
+ {%- elif content is iterable and content is not mapping %}
7
+ {%- for item in content %}
8
+ {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
9
+ {%- if is_system_content %}
10
+ {{- raise_exception('System message cannot contain images.') }}
11
+ {%- endif %}
12
+ {%- if do_vision_count %}
13
+ {%- set image_count.value = image_count.value + 1 %}
14
+ {%- endif %}
15
+ {%- if add_vision_id %}
16
+ {{- 'Picture ' ~ image_count.value ~ ': ' }}
17
+ {%- endif %}
18
+ {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
19
+ {%- elif 'video' in item or item.type == 'video' %}
20
+ {%- if is_system_content %}
21
+ {{- raise_exception('System message cannot contain videos.') }}
22
+ {%- endif %}
23
+ {%- if do_vision_count %}
24
+ {%- set video_count.value = video_count.value + 1 %}
25
+ {%- endif %}
26
+ {%- if add_vision_id %}
27
+ {{- 'Video ' ~ video_count.value ~ ': ' }}
28
+ {%- endif %}
29
+ {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
30
+ {%- elif 'text' in item %}
31
+ {{- item.text }}
32
+ {%- else %}
33
+ {{- raise_exception('Unexpected item type in content.') }}
34
+ {%- endif %}
35
+ {%- endfor %}
36
+ {%- elif content is none or content is undefined %}
37
+ {{- '' }}
38
+ {%- else %}
39
+ {{- raise_exception('Unexpected content type.') }}
40
+ {%- endif %}
41
+ {%- endmacro %}
42
+ {%- if not messages %}
43
+ {{- raise_exception('No messages provided.') }}
44
+ {%- endif %}
45
+ {%- if tools and tools is iterable and tools is not mapping %}
46
+ {{- '<|im_start|>system\n' }}
47
+ {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
48
+ {%- for tool in tools %}
49
+ {{- "\n" }}
50
+ {{- tool | tojson }}
51
+ {%- endfor %}
52
+ {{- "\n</tools>" }}
53
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
54
+ {%- if messages[0].role == 'system' %}
55
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
56
+ {%- if content %}
57
+ {{- '\n\n' + content }}
58
+ {%- endif %}
59
+ {%- endif %}
60
+ {{- '<|im_end|>\n' }}
61
+ {%- else %}
62
+ {%- if messages[0].role == 'system' %}
63
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
64
+ {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
65
+ {%- endif %}
66
+ {%- endif %}
67
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
68
+ {%- for message in messages[::-1] %}
69
+ {%- set index = (messages|length - 1) - loop.index0 %}
70
+ {%- if ns.multi_step_tool and message.role == "user" %}
71
+ {%- set content = render_content(message.content, false)|trim %}
72
+ {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
73
+ {%- set ns.multi_step_tool = false %}
74
+ {%- set ns.last_query_index = index %}
75
+ {%- endif %}
76
+ {%- endif %}
77
+ {%- endfor %}
78
+ {%- if ns.multi_step_tool %}
79
+ {{- raise_exception('No user query found in messages.') }}
80
+ {%- endif %}
81
+ {%- for message in messages %}
82
+ {%- set content = render_content(message.content, true)|trim %}
83
+ {%- if message.role == "system" %}
84
+ {%- if not loop.first %}
85
+ {{- raise_exception('System message must be at the beginning.') }}
86
+ {%- endif %}
87
+ {%- elif message.role == "user" %}
88
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
89
+ {%- elif message.role == "assistant" %}
90
+ {%- set reasoning_content = '' %}
91
+ {%- if message.reasoning_content is string %}
92
+ {%- set reasoning_content = message.reasoning_content %}
93
+ {%- else %}
94
+ {%- if '</think>' in content %}
95
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
96
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
97
+ {%- endif %}
98
+ {%- endif %}
99
+ {%- set reasoning_content = reasoning_content|trim %}
100
+ {%- if loop.index0 > ns.last_query_index %}
101
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
102
+ {%- else %}
103
+ {{- '<|im_start|>' + message.role + '\n' + content }}
104
+ {%- endif %}
105
+ {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
106
+ {%- for tool_call in message.tool_calls %}
107
+ {%- if tool_call.function is defined %}
108
+ {%- set tool_call = tool_call.function %}
109
+ {%- endif %}
110
+ {%- if loop.first %}
111
+ {%- if content|trim %}
112
+ {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
113
+ {%- else %}
114
+ {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
115
+ {%- endif %}
116
+ {%- else %}
117
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
118
+ {%- endif %}
119
+ {%- if tool_call.arguments is defined %}
120
+ {%- for args_name, args_value in tool_call.arguments|items %}
121
+ {{- '<parameter=' + args_name + '>\n' }}
122
+ {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
123
+ {{- args_value }}
124
+ {{- '\n</parameter>\n' }}
125
+ {%- endfor %}
126
+ {%- endif %}
127
+ {{- '</function>\n</tool_call>' }}
128
+ {%- endfor %}
129
+ {%- endif %}
130
+ {{- '<|im_end|>\n' }}
131
+ {%- elif message.role == "tool" %}
132
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
133
+ {{- '<|im_start|>user' }}
134
+ {%- endif %}
135
+ {{- '\n<tool_response>\n' }}
136
+ {{- content }}
137
+ {{- '\n</tool_response>' }}
138
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
139
+ {{- '<|im_end|>\n' }}
140
+ {%- elif loop.last %}
141
+ {{- '<|im_end|>\n' }}
142
+ {%- endif %}
143
+ {%- else %}
144
+ {{- raise_exception('Unexpected message role.') }}
145
+ {%- endif %}
146
+ {%- endfor %}
147
+ {%- if add_generation_prompt %}
148
+ {{- '<|im_start|>assistant\n' }}
149
+ {%- if enable_thinking is defined and enable_thinking is false %}
150
+ {{- '<think>\n\n</think>\n\n' }}
151
+ {%- else %}
152
+ {{- '<think>\n' }}
153
+ {%- endif %}
154
+ {%- endif %}
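For the simplest case (optional system message, one user turn, no tools, no images), the template above reduces to a few string concatenations. The sketch below mirrors that path in plain Python so the resulting prompt is easy to inspect. `build_prompt` is a hypothetical helper that deliberately ignores tool calls, multi-turn history, and vision content.

```python
from typing import Optional


def build_prompt(user: str, system: Optional[str] = None,
                 enable_thinking: bool = True) -> str:
    """Render a single-turn prompt the way the template does for text-only input."""
    parts = []
    if system is not None:
        # System content is trimmed and wrapped in its own turn.
        parts.append(f"<|im_start|>system\n{system.strip()}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{user.strip()}<|im_end|>\n")
    # Generation prompt: the assistant turn is opened but not closed.
    parts.append("<|im_start|>assistant\n")
    # Thinking is on unless enable_thinking is explicitly false, in which
    # case an empty reasoning block is pre-filled.
    parts.append("<think>\n" if enable_thinking else "<think>\n\n</think>\n\n")
    return "".join(parts)


print(build_prompt("Merhaba!", enable_thinking=False))
```

Note that with the default settings the prompt ends in an open `<think>` tag, so the model starts generating inside a reasoning block unless `enable_thinking` is set to false.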
config.json ADDED
@@ -0,0 +1,113 @@
1
+ {
2
+ "architectures": [
3
+ "Qwen3_5ForConditionalGeneration"
4
+ ],
5
+ "torch_dtype": "bfloat16",
6
+ "eos_token_id": 248046,
7
+ "image_token_id": 248056,
8
+ "model_name": "/workspace/merged_cpt",
9
+ "model_type": "qwen3_5",
10
+ "pad_token_id": 248044,
11
+ "text_config": {
12
+ "attention_bias": false,
13
+ "attention_dropout": 0.0,
14
+ "attn_output_gate": true,
15
+ "bos_token_id": null,
16
+ "torch_dtype": "bfloat16",
17
+ "eos_token_id": 248044,
18
+ "full_attention_interval": 4,
19
+ "head_dim": 256,
20
+ "hidden_act": "silu",
21
+ "hidden_size": 2560,
22
+ "initializer_range": 0.02,
23
+ "intermediate_size": 9216,
24
+ "layer_types": [
25
+ "linear_attention",
26
+ "linear_attention",
27
+ "linear_attention",
28
+ "full_attention",
29
+ "linear_attention",
30
+ "linear_attention",
31
+ "linear_attention",
32
+ "full_attention",
33
+ "linear_attention",
34
+ "linear_attention",
35
+ "linear_attention",
36
+ "full_attention",
37
+ "linear_attention",
38
+ "linear_attention",
39
+ "linear_attention",
40
+ "full_attention",
41
+ "linear_attention",
42
+ "linear_attention",
43
+ "linear_attention",
44
+ "full_attention",
45
+ "linear_attention",
46
+ "linear_attention",
47
+ "linear_attention",
48
+ "full_attention",
49
+ "linear_attention",
50
+ "linear_attention",
51
+ "linear_attention",
52
+ "full_attention",
53
+ "linear_attention",
54
+ "linear_attention",
55
+ "linear_attention",
56
+ "full_attention"
57
+ ],
58
+ "linear_conv_kernel_dim": 4,
59
+ "linear_key_head_dim": 128,
60
+ "linear_num_key_heads": 16,
61
+ "linear_num_value_heads": 32,
62
+ "linear_value_head_dim": 128,
63
+ "mamba_ssm_dtype": "float32",
64
+ "max_position_embeddings": 262144,
65
+ "mlp_only_layers": [],
66
+ "model_type": "qwen3_5_text",
67
+ "mtp_num_hidden_layers": 1,
68
+ "mtp_use_dedicated_embeddings": false,
69
+ "num_attention_heads": 16,
70
+ "num_hidden_layers": 32,
71
+ "num_key_value_heads": 4,
72
+ "pad_token_id": null,
73
+ "partial_rotary_factor": 0.25,
74
+ "rms_norm_eps": 1e-06,
75
+ "rope_parameters": {
76
+ "mrope_interleaved": true,
77
+ "mrope_section": [
78
+ 11,
79
+ 11,
80
+ 10
81
+ ],
82
+ "partial_rotary_factor": 0.25,
83
+ "rope_theta": 10000000,
84
+ "rope_type": "default"
85
+ },
86
+ "tie_word_embeddings": true,
87
+ "use_cache": true,
88
+ "vocab_size": 248320
89
+ },
90
+ "tie_word_embeddings": true,
91
+ "unsloth_version": "2026.3.15",
92
+ "use_cache": false,
93
+ "video_token_id": 248057,
94
+ "vision_config": {
95
+ "deepstack_visual_indexes": [],
96
+ "depth": 24,
97
+ "torch_dtype": "bfloat16",
98
+ "hidden_act": "gelu_pytorch_tanh",
99
+ "hidden_size": 1024,
100
+ "in_channels": 3,
101
+ "initializer_range": 0.02,
102
+ "intermediate_size": 4096,
103
+ "model_type": "qwen3_5",
104
+ "num_heads": 16,
105
+ "num_position_embeddings": 2304,
106
+ "out_hidden_size": 2560,
107
+ "patch_size": 16,
108
+ "spatial_merge_size": 2,
109
+ "temporal_patch_size": 2
110
+ },
111
+ "vision_end_token_id": 248054,
112
+ "vision_start_token_id": 248053
113
+ }
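The `layer_types` list in this config follows a regular pattern: with `full_attention_interval` set to 4, every fourth layer uses full attention and the rest use linear attention. The sketch below regenerates the list from the two config values. The rule `(i + 1) % interval == 0` is inferred from the listed values, not taken from the model code.

```python
from typing import List


def layer_types(num_hidden_layers: int, full_attention_interval: int) -> List[str]:
    """Rebuild the per-layer attention type list from two config values."""
    return [
        # Layers 4, 8, 12, ... (1-indexed) are full attention.
        "full_attention" if (i + 1) % full_attention_interval == 0
        else "linear_attention"
        for i in range(num_hidden_layers)
    ]


types = layer_types(num_hidden_layers=32, full_attention_interval=4)
print(types[:4])
print(types.count("full_attention"), "full-attention layers out of", len(types))
```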
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f1e4f66e509f31fb51dce86e5d304a8aec523ac8b1dbb98c5394e84f056e75c8
3
+ size 9079166568
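The LFS pointer size gives a quick sanity check on the parameter count. Assuming every tensor is stored in bf16 (2 bytes per value, as `torch_dtype` in `config.json` suggests) and ignoring the small safetensors header, the file size implies roughly 4.5 billion stored values. That total would include the vision tower as well as the text model. `approx_param_count` is an illustrative helper.

```python
BYTES_PER_BF16 = 2
FILE_SIZE_BYTES = 9_079_166_568  # from the LFS pointer above


def approx_param_count(size_bytes: int, bytes_per_param: int = BYTES_PER_BF16) -> int:
    """Estimate stored parameters, ignoring the safetensors header."""
    return size_bytes // bytes_per_param


params = approx_param_count(FILE_SIZE_BYTES)
print(f"~{params / 1e9:.2f}B parameters")
```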
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
3
+ size 19989343
tokenizer_config.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "audio_bos_token": "<|audio_start|>",
4
+ "audio_eos_token": "<|audio_end|>",
5
+ "audio_token": "<|audio_pad|>",
6
+ "backend": "tokenizers",
7
+ "bos_token": null,
8
+ "clean_up_tokenization_spaces": false,
9
+ "eos_token": "<|im_end|>",
10
+ "errors": "replace",
11
+ "image_token": "<|image_pad|>",
12
+ "is_local": true,
13
+ "model_max_length": 262144,
14
+ "model_specific_special_tokens": {
15
+ "audio_bos_token": "<|audio_start|>",
16
+ "audio_eos_token": "<|audio_end|>",
17
+ "audio_token": "<|audio_pad|>",
18
+ "image_token": "<|image_pad|>",
19
+ "video_token": "<|video_pad|>",
20
+ "vision_bos_token": "<|vision_start|>",
21
+ "vision_eos_token": "<|vision_end|>"
22
+ },
23
+ "pad_token": "<|endoftext|>",
24
+ "padding_side": "left",
25
+ "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
26
+ "split_special_tokens": false,
27
+ "tokenizer_class": "TokenizersBackend",
28
+ "unk_token": null,
29
+ "video_token": "<|video_pad|>",
30
+ "vision_bos_token": "<|vision_start|>",
31
+ "vision_eos_token": "<|vision_end|>",
32
+ "chat_template": "{%- set image_count = namespace(value=0) %}\n{%- set video_count = namespace(value=0) %}\n{%- macro render_content(content, do_vision_count, is_system_content=false) %}\n {%- if content is string %}\n {{- content }}\n {%- elif content is iterable and content is not mapping %}\n {%- for item in content %}\n {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}\n {%- if is_system_content %}\n {{- raise_exception('System message cannot contain images.') }}\n {%- endif %}\n {%- if do_vision_count %}\n {%- set image_count.value = image_count.value + 1 %}\n {%- endif %}\n {%- if add_vision_id %}\n {{- 'Picture ' ~ image_count.value ~ ': ' }}\n {%- endif %}\n {{- '<|vision_start|><|image_pad|><|vision_end|>' }}\n {%- elif 'video' in item or item.type == 'video' %}\n {%- if is_system_content %}\n {{- raise_exception('System message cannot contain videos.') }}\n {%- endif %}\n {%- if do_vision_count %}\n {%- set video_count.value = video_count.value + 1 %}\n {%- endif %}\n {%- if add_vision_id %}\n {{- 'Video ' ~ video_count.value ~ ': ' }}\n {%- endif %}\n {{- '<|vision_start|><|video_pad|><|vision_end|>' }}\n {%- elif 'text' in item %}\n {{- item.text }}\n {%- else %}\n {{- raise_exception('Unexpected item type in content.') }}\n {%- endif %}\n {%- endfor %}\n {%- elif content is none or content is undefined %}\n {{- '' }}\n {%- else %}\n {{- raise_exception('Unexpected content type.') }}\n {%- endif %}\n{%- endmacro %}\n{%- if not messages %}\n {{- raise_exception('No messages provided.') }}\n{%- endif %}\n{%- if tools and tools is iterable and tools is not mapping %}\n {{- '<|im_start|>system\\n' }}\n {{- \"# Tools\\n\\nYou have access to the following functions:\\n\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\" }}\n {{- '\\n\\nIf you choose to call a function ONLY reply in the following format with NO 
suffix:\\n\\n<tool_call>\\n<function=example_function_name>\\n<parameter=example_parameter_1>\\nvalue_1\\n</parameter>\\n<parameter=example_parameter_2>\\nThis is the value for the second parameter\\nthat can span\\nmultiple lines\\n</parameter>\\n</function>\\n</tool_call>\\n\\n<IMPORTANT>\\nReminder:\\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\\n- Required parameters MUST be specified\\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\\n</IMPORTANT>' }}\n {%- if messages[0].role == 'system' %}\n {%- set content = render_content(messages[0].content, false, true)|trim %}\n {%- if content %}\n {{- '\\n\\n' + content }}\n {%- endif %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {%- set content = render_content(messages[0].content, false, true)|trim %}\n {{- '<|im_start|>system\\n' + content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" %}\n {%- set content = render_content(message.content, false)|trim %}\n {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if ns.multi_step_tool %}\n {{- raise_exception('No user query found in messages.') }}\n{%- endif %}\n{%- for message in messages %}\n {%- set content = render_content(message.content, true)|trim %}\n {%- if message.role == \"system\" 
%}\n {%- if not loop.first %}\n {{- raise_exception('System message must be at the beginning.') }}\n {%- endif %}\n {%- elif message.role == \"user\" %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is string %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in content %}\n {%- set reasoning_content = content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- set reasoning_content = reasoning_content|trim %}\n {%- if loop.index0 > ns.last_query_index %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content + '\\n</think>\\n\\n' + content }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {%- if loop.first %}\n {%- if content|trim %}\n {{- '\\n\\n<tool_call>\\n<function=' + tool_call.name + '>\\n' }}\n {%- else %}\n {{- '<tool_call>\\n<function=' + tool_call.name + '>\\n' }}\n {%- endif %}\n {%- else %}\n {{- '\\n<tool_call>\\n<function=' + tool_call.name + '>\\n' }}\n {%- endif %}\n {%- if tool_call.arguments is defined %}\n {%- for args_name, args_value in tool_call.arguments|items %}\n {{- '<parameter=' + args_name + '>\\n' }}\n {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}\n {{- args_value }}\n {{- '\\n</parameter>\\n' }}\n {%- endfor %}\n {%- endif %}\n {{- '</function>\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- 
'<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.previtem and loop.previtem.role != \"tool\" %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if not loop.last and loop.nextitem.role != \"tool\" %}\n {{- '<|im_end|>\\n' }}\n {%- elif loop.last %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- else %}\n {{- raise_exception('Unexpected message role.') }}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- else %}\n {{- '<think>\\n' }}\n {%- endif %}\n{%- endif %}"
33
+ }