Commit e3445f1 (verified) by pavelfedortsov · 1 parent: eca335a

Upload E4B colloquial merged for RunPod vLLM

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ card_assets/training_curves.png filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,162 @@
+ ---
+ language:
+ - ru
+ license: other
+ license_name: gemma
+ license_link: https://ai.google.dev/gemma/terms
+ base_model: google/gemma-4-E4B-it
+ pipeline_tag: text-generation
+ tags:
+ - gemma4
+ - russian
+ - colloquial
+ - style-transfer
+ - merged
+ - vllm
+ library_name: transformers
+ datasets:
+ - pavelfedortsov/russian-colloquial-sft-50k
+ ---
+
+ # gemma4-e4b-colloquial-ru-merged
+
+ *English:* Full-weight **Gemma 4 E4B** checkpoint with the colloquial Russian LoRA **merged in**, for serving with vLLM / [RunPod Serverless](https://www.runpod.io/serverless). No PEFT is needed at inference time.
+
+ ## What this model does
+
+ Rewrites **formal Russian** into **casual chat-style Russian** (Telegram-like), **without profanity**, while keeping facts, names, numbers, and paragraph structure.
+
+ **Not** a general chat model: use the instruction prefix from training (see below).
+
+ ## Model lineage
+
+ | Stage | Artifact |
+ |-------|----------|
+ | Base | [google/gemma-4-E4B-it](https://huggingface.co/google/gemma-4-E4B-it) |
+ | LoRA (SFT) | [pavelfedortsov/gemma4-e4b-lora-colloquial-ru](https://huggingface.co/pavelfedortsov/gemma4-e4b-lora-colloquial-ru) |
+ | **This repo** | LoRA merged into base + vLLM fixes (`k_norm`, processor configs) |
+
+ The merge was done with PEFT's `merge_and_unload()`; the `language_model` **k_norm** weights missing for layers 24–41 were copied from the base checkpoint (vLLM requires them).
+
+ ## Training data
+
+ - **50,000** SFT pairs in a profanity-free ("mat"-free) colloquial style
+ - Hub dataset: [pavelfedortsov/russian-colloquial-sft-50k](https://huggingface.co/datasets/pavelfedortsov/russian-colloquial-sft-50k)
+ - Built from [kurumikz/telegram-corpus-russian-kazakh](https://huggingface.co/datasets/kurumikz/telegram-corpus-russian-kazakh) + Gemini pair generation (see dataset card)
+
+ **User prompt template (training & inference).** In English: "Rewrite in simple conversational Russian, as in a chat. No profanity or rudeness. Keep the meaning:", followed by the formal text:
+
+ ```
+ Перепиши простым разговорным русским, как в переписке. Без мата и грубости. Сохрани смысл:
+ <формальный текст>
+ ```
+
+ ## Training configuration (LoRA → merge)
+
+ Training config (a copy is in `card_assets/train_colloquial_e4b_gpu.yaml`):
+
+ | Parameter | Value |
+ |-----------|--------|
+ | Base model | `google/gemma-4-E4B-it` |
+ | Method | LoRA on language tower (`model.language_model.*`) |
+ | LoRA rank / alpha | **32 / 64** |
+ | Target modules | `q_proj, k_proj, v_proj, o_proj` + MLP (`gate_proj, up_proj, down_proj`) |
+ | Dataset | 50k × 1 repeat |
+ | Epochs | **2** (12,500 optimizer steps) |
+ | Seq length | 512 |
+ | Batch | 1 × grad accum **8** (effective 8) |
+ | LR | 1e-4, cosine, warmup 3% |
+ | Precision | bf16, gradient checkpointing |
+ | Loss | assistant-only |
+ | Hardware | RunPod **A100 80GB** |
+
+ ## Training metrics (LoRA run)
+
+ ![Training curves](card_assets/training_curves.png)
+
+ | Metric | Start (step ~25) | End (step 12,500) | Best |
+ |--------|------------------|-------------------|------|
+ | `loss` | ~3.42 | ~0.81 | **~0.67** |
+ | `mean_token_accuracy` | ~0.63 | ~0.82 | **~0.84** |
+
+ Checkpoints were saved every 1,000 steps under the LoRA adapter repo.
+
+ ## Inference
+
+ ### RunPod Serverless (vLLM)
+
+ ```env
+ MODEL_NAME=pavelfedortsov/gemma4-e4b-colloquial-ru-merged
+ HF_TOKEN=<your_token>
+ TRUST_REMOTE_CODE=true
+ DTYPE=bfloat16
+ MAX_MODEL_LEN=4096
+ GPU_MEMORY_UTILIZATION=0.90
+ ENFORCE_EAGER=true
+ ENABLE_LORA=false
+ LANGUAGE_MODEL_ONLY=true
+ LIMIT_MM_PER_PROMPT={"image":0,"audio":0,"video":0}
+ ```
+
+ Recommended GPU: **≥40 GB** VRAM. The merged `model.safetensors` is ~32 GB on disk (`config.json` records `dtype: float32`); the settings above load it as bfloat16.
+
+ ### Transformers (local)
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model_id = "pavelfedortsov/gemma4-e4b-colloquial-ru-merged"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ formal = "Сегодня на совещании обсуждали внедрение новой версии API."
+ user = (
+     "Перепиши простым разговорным русским, как в переписке. "
+     "Без мата и грубости. Сохрани смысл:\n"
+     f"{formal}"
+ )
+ messages = [{"role": "user", "content": user}]
+ prompt = tokenizer.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
+ print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
+ ```
+
+ ### OpenAI-compatible API (RunPod / vLLM)
+
+ ```bash
+ curl "$RUNPOD_URL/v1/chat/completions" \
+   -H "Authorization: Bearer $RUNPOD_API_KEY" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "pavelfedortsov/gemma4-e4b-colloquial-ru-merged",
+     "messages": [{
+       "role": "user",
+       "content": "Перепиши простым разговорным русским, как в переписке. Без мата и грубости. Сохрани смысл:\nВаш формальный текст."
+     }],
+     "max_tokens": 512,
+     "temperature": 0.7
+   }'
+ ```
+
+ ## Limitations
+
+ - [Gemma license](https://ai.google.dev/gemma/terms) applies to the base architecture and weights.
+ - Quality varies on long news-style text; the model may shorten or paraphrase aggressively.
+ - Not safety-tuned; do not use in production without your own evaluation.
+ - Merged-model and LoRA-adapter inference can differ slightly in style.
+
+ ## Related repos
+
+ | Resource | Link |
+ |----------|------|
+ | LoRA adapter | https://huggingface.co/pavelfedortsov/gemma4-e4b-lora-colloquial-ru |
+ | Dataset (50k) | https://huggingface.co/datasets/pavelfedortsov/russian-colloquial-sft-50k |
+ | Base model | https://huggingface.co/google/gemma-4-E4B-it |
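The curl call in the README's "OpenAI-compatible API" section can also be issued from Python. The following is a minimal sketch using only the standard library; `build_payload` and `rewrite` are hypothetical helper names, `RUNPOD_URL` and `RUNPOD_API_KEY` are the same environment variables the curl example assumes, and the response is read in the usual OpenAI chat-completions shape, which has not been checked against this particular endpoint.

```python
import json
import os
import urllib.request

MODEL = "pavelfedortsov/gemma4-e4b-colloquial-ru-merged"
# Instruction prefix copied from the README prompt template.
PREFIX = (
    "Перепиши простым разговорным русским, как в переписке. "
    "Без мата и грубости. Сохрани смысл:\n"
)


def build_payload(formal_text: str, max_tokens: int = 512, temperature: float = 0.7) -> dict:
    """Build the same request body as the curl example (hypothetical helper)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": PREFIX + formal_text}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def rewrite(formal_text: str, base_url: str, api_key: str) -> str:
    """POST the payload and return the generated text (hypothetical helper)."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(formal_text)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # Assumed OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    url = os.environ.get("RUNPOD_URL")
    key = os.environ.get("RUNPOD_API_KEY")
    if url and key:  # only call the endpoint when both variables are set
        print(rewrite("Ваш формальный текст.", url, key))
```

Keeping the prefix in one constant avoids drift between training and inference prompts, which matters here because the README says the model is not a general chat model.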
card_assets/train_colloquial_e4b_gpu.yaml ADDED
@@ -0,0 +1,40 @@
+ # Colloquial SFT: google/gemma-4-E4B-it, 50k pairs (data/russian_colloquial_sft.jsonl)
+ # GPU 24GB+: batch 1 + grad accum 8. RunPod: scripts/run_train_colloquial_e4b_gpu.sh
+ model_id: google/gemma-4-E4B-it
+
+ dataset_path: data/russian_colloquial_sft.jsonl
+ dataset_repeats: 1
+
+ max_seq_length: 512
+ per_device_train_batch_size: 1
+ gradient_accumulation_steps: 8
+ num_train_epochs: 2
+ learning_rate: 1.0e-4
+ warmup_ratio: 0.03
+ lr_scheduler_type: cosine
+
+ lora_r: 32
+ lora_alpha: 64
+ lora_dropout: 0.05
+ lora_target_modules:
+   - q_proj
+   - k_proj
+   - v_proj
+   - o_proj
+   - gate_proj
+   - up_proj
+   - down_proj
+
+ gradient_checkpointing: true
+ use_cache: false
+ bf16: true
+ fp16: false
+
+ device: auto
+ logging_steps: 25
+ save_strategy: steps
+ save_steps: 1000
+ packing: false
+
+ output_dir: outputs/gemma4-e4b-lora-colloquial-ru
+ assistant_only_loss: true
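The "12,500 optimizer steps" figure in the README follows from the values in this config. Below is a small worked sketch of that arithmetic; the 50,000-pair dataset size comes from the README rather than the YAML, a single GPU is assumed, and the warmup rounding is an approximation since the trainer's exact rounding is not shown in this commit.

```python
import math

# Values copied from train_colloquial_e4b_gpu.yaml.
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_train_epochs = 2
dataset_repeats = 1
warmup_ratio = 0.03
lora_r, lora_alpha = 32, 64

# Assumptions: 50,000 pairs (README figure) and one GPU.
num_pairs = 50_000
num_devices = 1

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(num_pairs * dataset_repeats / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = round(total_steps * warmup_ratio)  # approximate; trainer rounding may differ
lora_scale = lora_alpha / lora_r  # standard LoRA scaling factor alpha / r

print(f"effective batch: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch}, total: {total_steps}")
print(f"approx. warmup steps: {warmup_steps}, LoRA scale: {lora_scale}")
```

With `save_steps: 1000`, this schedule would produce intermediate checkpoints at steps 1000 through 12000, consistent with the README's note about checkpoints.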
card_assets/training_curves.png ADDED

Git LFS Details

  • SHA256: 61941ae161241e56ff4ac919615a5a83c60b9ffa07df8ec3e753a13836e6da2a
  • Pointer size: 131 Bytes
  • Size of remote file: 161 kB
chat_template.jinja ADDED
@@ -0,0 +1,16 @@
+ {#- Text-only training template for Gemma 4 (E2B) with assistant-only loss. -#}
+ {{ bos_token }}
+ {%- for message in messages -%}
+ {%- if message['role'] == 'assistant' -%}
+ {{ '<|turn>model\n' }}{% generation %}{{ message['content'] | trim }}{{ '<turn|>\n' }}{% endgeneration %}
+ {%- elif message['role'] == 'user' -%}
+ {{ '<|turn>user\n' }}{{ message['content'] | trim }}{{ '<turn|>\n' }}
+ {%- elif message['role'] == 'system' -%}
+ {{ '<|turn>user\n' }}{{ message['content'] | trim }}{{ '<turn|>\n' }}
+ {%- else -%}
+ {{ raise_exception('Unsupported role: ' ~ message['role']) }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- if add_generation_prompt -%}
+ {{ '<|turn>model\n' }}
+ {%- endif -%}
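To make the prompt layout produced by this template concrete, here is a plain-Python sketch that mirrors its branches. It is an illustration only, not the renderer Transformers uses: `render` is a hypothetical function, `<bos>` is taken from `tokenizer_config.json`, and the whitespace behaviour reflects a reading of the `{%- ... -%}` trim markers rather than a run through Jinja. The `{% generation %}` markers, which delimit assistant text for assistant-only loss, have no counterpart in this sketch.

```python
BOS = "<bos>"  # bos_token from tokenizer_config.json


def render(messages, add_generation_prompt=False):
    """Mirror chat_template.jinja in plain Python (illustrative sketch)."""
    parts = [BOS]
    for message in messages:
        role = message["role"]
        content = message["content"].strip()  # Jinja's `trim` filter
        if role == "assistant":
            parts.append(f"<|turn>model\n{content}<turn|>\n")
        elif role in ("user", "system"):
            # The template emits system messages as ordinary user turns.
            parts.append(f"<|turn>user\n{content}<turn|>\n")
        else:
            raise ValueError(f"Unsupported role: {role}")
    if add_generation_prompt:
        parts.append("<|turn>model\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [{"role": "user", "content": "Сохрани смысл:\nВаш формальный текст."}]
    print(render(demo, add_generation_prompt=True))
```

Note that a system message is not given its own turn type: it is rendered exactly like a user turn, so a system message followed by a user message yields two consecutive user turns.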
config.json ADDED
@@ -0,0 +1,198 @@
+ {
+   "architectures": [
+     "Gemma4ForConditionalGeneration"
+   ],
+   "audio_config": {
+     "_name_or_path": "",
+     "architectures": null,
+     "attention_chunk_size": 12,
+     "attention_context_left": 13,
+     "attention_context_right": 0,
+     "attention_invalid_logits_value": -1000000000.0,
+     "attention_logit_cap": 50.0,
+     "chunk_size_feed_forward": 0,
+     "conv_kernel_size": 5,
+     "dtype": "float32",
+     "gradient_clipping": 10000000000.0,
+     "hidden_act": "silu",
+     "hidden_size": 1024,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "initializer_range": 0.02,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "model_type": "gemma4_audio",
+     "num_attention_heads": 8,
+     "num_hidden_layers": 12,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_proj_dims": 1536,
+     "problem_type": null,
+     "residual_weight": 0.5,
+     "return_dict": true,
+     "rms_norm_eps": 1e-06,
+     "subsampling_conv_channels": [
+       128,
+       32
+     ],
+     "use_clipped_linears": true
+   },
+   "audio_token_id": 258881,
+   "boa_token_id": 256000,
+   "boi_token_id": 255999,
+   "dtype": "float32",
+   "eoa_token_id": 258883,
+   "eoa_token_index": 258883,
+   "eoi_token_id": 258882,
+   "eos_token_id": [
+     1,
+     106
+   ],
+   "image_token_id": 258880,
+   "initializer_range": 0.02,
+   "model_type": "gemma4",
+   "text_config": {
+     "attention_bias": false,
+     "attention_dropout": 0.0,
+     "attention_k_eq_v": false,
+     "bos_token_id": 2,
+     "dtype": "float32",
+     "enable_moe_block": false,
+     "eos_token_id": 1,
+     "expert_intermediate_size": null,
+     "final_logit_softcapping": 30.0,
+     "global_head_dim": 512,
+     "head_dim": 256,
+     "hidden_activation": "gelu_pytorch_tanh",
+     "hidden_size": 2560,
+     "hidden_size_per_layer_input": 256,
+     "initializer_range": 0.02,
+     "intermediate_size": 10240,
+     "layer_types": [
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "full_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "full_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "full_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "full_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "full_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "full_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "full_attention"
+     ],
+     "max_position_embeddings": 131072,
+     "model_type": "gemma4_text",
+     "moe_intermediate_size": null,
+     "num_attention_heads": 8,
+     "num_experts": null,
+     "num_global_key_value_heads": null,
+     "num_hidden_layers": 42,
+     "num_key_value_heads": 2,
+     "num_kv_shared_layers": 18,
+     "pad_token_id": 0,
+     "rms_norm_eps": 1e-06,
+     "rope_parameters": {
+       "full_attention": {
+         "partial_rotary_factor": 0.25,
+         "rope_theta": 1000000.0,
+         "rope_type": "proportional"
+       },
+       "sliding_attention": {
+         "rope_theta": 10000.0,
+         "rope_type": "default"
+       }
+     },
+     "sliding_window": 512,
+     "tie_word_embeddings": true,
+     "top_k_experts": null,
+     "use_bidirectional_attention": null,
+     "use_cache": true,
+     "use_double_wide_mlp": false,
+     "vocab_size": 262144,
+     "vocab_size_per_layer_input": 262144
+   },
+   "tie_word_embeddings": true,
+   "transformers_version": "5.9.0",
+   "video_token_id": 258884,
+   "vision_config": {
+     "_name_or_path": "",
+     "architectures": null,
+     "attention_bias": false,
+     "attention_dropout": 0.0,
+     "chunk_size_feed_forward": 0,
+     "default_output_length": 280,
+     "dtype": "float32",
+     "global_head_dim": 64,
+     "head_dim": 64,
+     "hidden_activation": "gelu_pytorch_tanh",
+     "hidden_size": 768,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "initializer_range": 0.02,
+     "intermediate_size": 3072,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "max_position_embeddings": 131072,
+     "model_type": "gemma4_vision",
+     "num_attention_heads": 12,
+     "num_hidden_layers": 16,
+     "num_key_value_heads": 12,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "patch_size": 16,
+     "pooling_kernel_size": 3,
+     "position_embedding_size": 10240,
+     "problem_type": null,
+     "return_dict": true,
+     "rms_norm_eps": 1e-06,
+     "rope_parameters": {
+       "rope_theta": 100.0,
+       "rope_type": "default"
+     },
+     "standardize": false,
+     "use_clipped_linears": true
+   },
+   "vision_soft_tokens_per_image": 280
+ }
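The `text_config` values above line up with the README's remark about `k_norm` weights for layers 24–41. The sketch below works through the numbers; the repeating five-sliding-plus-one-full pattern is read off `layer_types`, while the idea that the KV-shared layers are the last `num_kv_shared_layers` layers is an assumption made here to explain the 24–41 range, not something the commit states.

```python
# Values copied from text_config in config.json.
num_hidden_layers = 42
num_kv_shared_layers = 18

# Pattern visible in layer_types: five sliding-attention layers, then one full-attention layer.
layer_types = (["sliding_attention"] * 5 + ["full_attention"]) * 7
assert len(layer_types) == num_hidden_layers

full_layers = [i for i, kind in enumerate(layer_types) if kind == "full_attention"]

# Assumption: the KV-shared layers are the final num_kv_shared_layers layers.
first_shared = num_hidden_layers - num_kv_shared_layers
shared_layers = list(range(first_shared, num_hidden_layers))

print("full-attention layer indices:", full_layers)
print("assumed KV-shared layers:", shared_layers[0], "to", shared_layers[-1])
```

Under that assumption the shared range starts at index 24 and ends at 41, the same layers the README names as needing `k_norm` weights copied from the base checkpoint.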
generation_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "bos_token_id": 2,
+   "do_sample": true,
+   "eos_token_id": [
+     1,
+     106,
+     50
+   ],
+   "pad_token_id": 0,
+   "temperature": 1.0,
+   "top_k": 64,
+   "top_p": 0.95,
+   "transformers_version": "5.5.0.dev0"
+ }
image_processor_config.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "do_convert_rgb": true,
+   "do_normalize": false,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.0,
+     0.0,
+     0.0
+   ],
+   "image_processor_type": "Gemma4ImageProcessor",
+   "image_seq_length": 280,
+   "image_std": [
+     1.0,
+     1.0,
+     1.0
+   ],
+   "max_soft_tokens": 280,
+   "patch_size": 16,
+   "pooling_kernel_size": 3,
+   "resample": 3,
+   "rescale_factor": 0.00392156862745098
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4bd8aa751975c345fb7ccff75b88b0b4590d7d8e7bfb2c146ad06230ad43b919
+ size 31764692904
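The three lines above are a Git LFS pointer, not the weights themselves. As a quick sketch of how to read one, the snippet below parses the pointer text and converts the size to more familiar units; `parse_lfs_pointer` is a hypothetical helper written for this illustration.

```python
# Pointer text copied from the model.safetensors diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4bd8aa751975c345fb7ccff75b88b0b4590d7d8e7bfb2c146ad06230ad43b919
size 31764692904
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its fields (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


info = parse_lfs_pointer(POINTER)
size_gb = info["size"] / 1e9       # decimal gigabytes
size_gib = info["size"] / 2**30    # binary gibibytes

print(f"{info['algo']} digest, {len(info['digest'])} hex chars")
print(f"{size_gb:.2f} GB ({size_gib:.2f} GiB)")
```

The decimal figure is the one that corresponds to the README's "~32 GB" estimate for the merged checkpoint.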
preprocessor_config.json ADDED
@@ -0,0 +1,21 @@
+ {
+   "dither": 0.0,
+   "feature_extractor_type": "Gemma4AudioFeatureExtractor",
+   "feature_size": 128,
+   "fft_length": 512,
+   "fft_overdrive": false,
+   "frame_length": 320,
+   "hop_length": 160,
+   "input_scale_factor": 1.0,
+   "max_frequency": 8000.0,
+   "mel_floor": 0.001,
+   "min_frequency": 0.0,
+   "padding_side": "right",
+   "padding_value": 0.0,
+   "per_bin_mean": null,
+   "per_bin_stddev": null,
+   "preemphasis": 0.0,
+   "preemphasis_htk_flavor": true,
+   "return_attention_mask": true,
+   "sampling_rate": 16000
+ }
processor_config.json ADDED
@@ -0,0 +1,75 @@
+ {
+   "audio_ms_per_token": 40,
+   "audio_seq_length": 750,
+   "feature_extractor": {
+     "dither": 0.0,
+     "feature_extractor_type": "Gemma4AudioFeatureExtractor",
+     "feature_size": 128,
+     "fft_length": 512,
+     "fft_overdrive": false,
+     "frame_length": 320,
+     "hop_length": 160,
+     "input_scale_factor": 1.0,
+     "max_frequency": 8000.0,
+     "mel_floor": 0.001,
+     "min_frequency": 0.0,
+     "padding_side": "right",
+     "padding_value": 0.0,
+     "per_bin_mean": null,
+     "per_bin_stddev": null,
+     "preemphasis": 0.0,
+     "preemphasis_htk_flavor": true,
+     "return_attention_mask": true,
+     "sampling_rate": 16000
+   },
+   "image_processor": {
+     "do_convert_rgb": true,
+     "do_normalize": false,
+     "do_rescale": true,
+     "do_resize": true,
+     "image_mean": [
+       0.0,
+       0.0,
+       0.0
+     ],
+     "image_processor_type": "Gemma4ImageProcessor",
+     "image_seq_length": 280,
+     "image_std": [
+       1.0,
+       1.0,
+       1.0
+     ],
+     "max_soft_tokens": 280,
+     "patch_size": 16,
+     "pooling_kernel_size": 3,
+     "resample": 3,
+     "rescale_factor": 0.00392156862745098
+   },
+   "image_seq_length": 280,
+   "processor_class": "Gemma4Processor",
+   "video_processor": {
+     "do_convert_rgb": true,
+     "do_normalize": true,
+     "do_rescale": true,
+     "do_resize": true,
+     "do_sample_frames": true,
+     "image_mean": [
+       0.0,
+       0.0,
+       0.0
+     ],
+     "image_std": [
+       1.0,
+       1.0,
+       1.0
+     ],
+     "max_soft_tokens": 70,
+     "num_frames": 32,
+     "patch_size": 16,
+     "pooling_kernel_size": 3,
+     "resample": 3,
+     "rescale_factor": 0.00392156862745098,
+     "return_metadata": false,
+     "video_processor_type": "Gemma4VideoProcessor"
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f
+ size 32169626
tokenizer_config.json ADDED
@@ -0,0 +1,96 @@
+ {
+   "audio_token": "<|audio|>",
+   "backend": "tokenizers",
+   "boa_token": "<|audio>",
+   "boi_token": "<|image>",
+   "bos_token": "<bos>",
+   "eoa_token": "<audio|>",
+   "eoc_token": "<channel|>",
+   "eoi_token": "<image|>",
+   "eos_token": "<eos>",
+   "eot_token": "<turn|>",
+   "escape_token": "<|\"|>",
+   "etc_token": "<tool_call|>",
+   "etd_token": "<tool|>",
+   "etr_token": "<tool_response|>",
+   "extra_special_tokens": [
+     "<|video|>"
+   ],
+   "image_token": "<|image|>",
+   "is_local": true,
+   "local_files_only": false,
+   "mask_token": "<mask>",
+   "model_max_length": 1000000000000000019884624838656,
+   "model_specific_special_tokens": {
+     "audio_token": "<|audio|>",
+     "boa_token": "<|audio>",
+     "boi_token": "<|image>",
+     "eoa_token": "<audio|>",
+     "eoc_token": "<channel|>",
+     "eoi_token": "<image|>",
+     "eot_token": "<turn|>",
+     "escape_token": "<|\"|>",
+     "etc_token": "<tool_call|>",
+     "etd_token": "<tool|>",
+     "etr_token": "<tool_response|>",
+     "image_token": "<|image|>",
+     "soc_token": "<|channel>",
+     "sot_token": "<|turn>",
+     "stc_token": "<|tool_call>",
+     "std_token": "<|tool>",
+     "str_token": "<|tool_response>",
+     "think_token": "<|think|>"
+   },
+   "pad_token": "<pad>",
+   "padding_side": "left",
+   "processor_class": "Gemma4Processor",
+   "response_schema": {
+     "properties": {
+       "content": {
+         "type": "string"
+       },
+       "role": {
+         "const": "assistant"
+       },
+       "thinking": {
+         "type": "string"
+       },
+       "tool_calls": {
+         "items": {
+           "properties": {
+             "function": {
+               "properties": {
+                 "arguments": {
+                   "additionalProperties": {},
+                   "type": "object",
+                   "x-parser": "gemma4-tool-call"
+                 },
+                 "name": {
+                   "type": "string"
+                 }
+               },
+               "type": "object",
+               "x-regex": "call\\:(?P<name>\\w+)(?P<arguments>\\{.*\\})"
+             },
+             "type": {
+               "const": "function"
+             }
+           },
+           "type": "object"
+         },
+         "type": "array",
+         "x-regex-iterator": "<\\|tool_call>(.*?)<tool_call\\|>"
+       }
+     },
+     "type": "object",
+     "x-regex": "(\\<\\|channel\\>thought\\n(?P<thinking>.*?)\\<channel\\|\\>)?(?P<tool_calls>\\<\\|tool_call\\>.*\\<tool_call\\|\\>)?(?P<content>(?:(?!\\<turn\\|\\>)(?!\\<\\|tool_response\\>).)+)?(?:\\<turn\\|\\>|\\<\\|tool_response\\>)?"
+   },
+   "soc_token": "<|channel>",
+   "sot_token": "<|turn>",
+   "stc_token": "<|tool_call>",
+   "std_token": "<|tool>",
+   "str_token": "<|tool_response>",
+   "think_token": "<|think|>",
+   "tokenizer_class": "GemmaTokenizer",
+   "unk_token": "<unk>"
+ }
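The top-level `x-regex` in `response_schema` describes how a raw completion splits into thinking, tool calls, and content. The sketch below exercises that pattern directly with Python's `re` module; it does not reproduce how Transformers applies the schema, `parse_response` is a hypothetical helper, and the `re.DOTALL` flag is an assumption added here so that multi-line replies match.

```python
import re

# Top-level x-regex from response_schema, with the JSON string escaping removed.
RESPONSE_RE = re.compile(
    r"(\<\|channel\>thought\n(?P<thinking>.*?)\<channel\|\>)?"
    r"(?P<tool_calls>\<\|tool_call\>.*\<tool_call\|\>)?"
    r"(?P<content>(?:(?!\<turn\|\>)(?!\<\|tool_response\>).)+)?"
    r"(?:\<turn\|\>|\<\|tool_response\>)?",
    re.DOTALL,  # assumption: allow "." to span newlines
)


def parse_response(raw):
    """Return the named groups that matched (hypothetical helper)."""
    match = RESPONSE_RE.match(raw)  # every group is optional, so this always matches
    return {name: value for name, value in match.groupdict().items() if value is not None}


if __name__ == "__main__":
    print(parse_response("Привет, как дела?<turn|>"))
    print(parse_response("<|channel>thought\nплан<channel|>Ок, сделаю.<turn|>"))
```

For the style-transfer use case in this repo, only the `content` group is expected to matter, since the training template contains no thinking or tool-call segments.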
video_processor_config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "do_convert_rgb": true,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "do_sample_frames": true,
+   "image_mean": [
+     0.0,
+     0.0,
+     0.0
+   ],
+   "image_std": [
+     1.0,
+     1.0,
+     1.0
+   ],
+   "max_soft_tokens": 70,
+   "num_frames": 32,
+   "patch_size": 16,
+   "pooling_kernel_size": 3,
+   "resample": 3,
+   "rescale_factor": 0.00392156862745098,
+   "return_metadata": false,
+   "video_processor_type": "Gemma4VideoProcessor"
+ }