anuragpradhan committed on
Commit
4e62066
·
verified ·
1 Parent(s): 52ef3c5

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,305 @@
1
+ ---
2
+ base_model:
3
+ - HuggingFaceTB/SmolVLM2-2.2B-Instruct
4
+ library_name: transformers
5
+ license: apache-2.0
6
+ datasets:
7
+ - HuggingFaceM4/the_cauldron
8
+ - HuggingFaceM4/Docmatix
9
+ - lmms-lab/LLaVA-OneVision-Data
10
+ - lmms-lab/M4-Instruct-Data
11
+ - HuggingFaceFV/finevideo
12
+ - MAmmoTH-VL/MAmmoTH-VL-Instruct-12M
13
+ - lmms-lab/LLaVA-Video-178K
14
+ - orrzohar/Video-STaR
15
+ - Mutonix/Vript
16
+ - TIGER-Lab/VISTA-400K
17
+ - Enxin/MovieChat-1K_train
18
+ - ShareGPT4Video/ShareGPT4Video
19
+ pipeline_tag: image-text-to-text
20
+ tags:
21
+ - bnb-my-repo
22
+ - video-text-to-text
23
+ language:
24
+ - en
25
+ ---
26
+ # HuggingFaceTB/SmolVLM2-2.2B-Instruct (Quantized)
27
+
28
+ ## Description
29
+ This model is a quantized version of the original model [`HuggingFaceTB/SmolVLM2-2.2B-Instruct`](https://huggingface.co/HuggingFaceTB/SmolVLM2-2.2B-Instruct).
30
+
31
+ It was quantized to 4-bit with the BitsAndBytes library using the [bnb-my-repo](https://huggingface.co/spaces/bnb-community/bnb-my-repo) space.
32
+
33
+ ## Quantization Details
34
+ - **Quantization Type**: int4
35
+ - **bnb_4bit_quant_type**: nf4
36
+ - **bnb_4bit_use_double_quant**: True
37
+ - **bnb_4bit_compute_dtype**: float32
38
+ - **bnb_4bit_quant_storage**: float32
39
+
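As a rough illustration of what `bnb_4bit_use_double_quant: True` buys, the sketch below estimates the effective storage cost per weight for 4-bit block-wise quantization with and without double quantization. The block sizes (64 weights per block, 256 constants per second-level block) and the 32-bit and 8-bit constant widths are assumptions based on the QLoRA defaults; they are not stated in this repository, and `bits_per_weight` is a hypothetical helper.

```python
def bits_per_weight(block_size=64, double_quant=True, second_block_size=256):
    """Estimate storage bits per weight for 4-bit block-wise quantization.

    Assumed (QLoRA defaults, not taken from this repository):
    - each block of `block_size` weights stores one scaling constant;
    - without double quantization that constant is 32-bit;
    - with double quantization it is 8-bit, plus one 32-bit constant
      shared by `second_block_size` first-level constants.
    """
    weight_bits = 4.0
    if double_quant:
        overhead = 8.0 / block_size + 32.0 / (block_size * second_block_size)
    else:
        overhead = 32.0 / block_size
    return weight_bits + overhead


plain = bits_per_weight(double_quant=False)
nested = bits_per_weight(double_quant=True)
print(f"without double quantization: {plain:.3f} bits per weight")
print(f"with double quantization:    {nested:.3f} bits per weight")
```

Under these assumptions the saving is a fraction of a bit per weight, which is small for a single layer but adds up across a multi-billion-parameter model.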
40
+
41
+
42
+ # 📄 Original Model Information
43
+
44
+
45
+
46
+
47
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/SmolVLM2_banner.png" width="800" height="auto" alt="Image description">
48
+
49
+ # SmolVLM2 2.2B
50
+
51
+ SmolVLM2-2.2B is a lightweight multimodal model designed to analyze video content. The model processes videos, images, and text inputs to generate text outputs - whether answering questions about media files, comparing visual content, or transcribing text from images. Despite its compact size, requiring only 5.2GB of GPU RAM for video inference, it delivers robust performance on complex multimodal tasks. This efficiency makes it particularly well-suited for on-device applications where computational resources may be limited.
52
+ ## Model Summary
53
+
54
+ - **Developed by:** Hugging Face 🤗
55
+ - **Model type:** Multi-modal model (image/multi-image/video/text)
56
+ - **Language(s) (NLP):** English
57
+ - **License:** Apache 2.0
58
+ - **Architecture:** Based on [Idefics3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (see technical summary)
59
+
60
+ ## Resources
61
+
62
+ - **Demo:** [Video Highlight Generator](https://huggingface.co/spaces/HuggingFaceTB/SmolVLM2-HighlightGenerator)
63
+ - **Blog:** [Blog post](https://huggingface.co/blog/smolvlm2)
64
+
65
+
66
+ ## Uses
67
+
68
+
69
+ SmolVLM2 can be used for inference on multimodal (video / image / text) tasks where the input consists of text queries along with video or one or more images. Text and media files can be interleaved arbitrarily, enabling tasks like captioning, visual question answering, and storytelling based on visual content. The model does not support image or video generation.
70
+
71
+ To fine-tune SmolVLM2 on a specific task, you can follow [the fine-tuning tutorial](https://github.com/huggingface/smollm/blob/main/vision/finetuning/Smol_VLM_FT.ipynb).
72
+
73
+ ## Evaluation
74
+
75
+ ### Vision Evaluation
76
+
77
+ | Model | MathVista | MMMU | OCRBench | MMStar | AI2D | ChartQA_Test | Science_QA | TextVQA Val | DocVQA Val |
+ |-------------------|-----------|-------|----------|--------|------|--------------|------------|-------------|------------|
+ | **SmolVLM2 2.2B** | 51.5 | 42 | 72.9 | 46 | 70 | 68.84 | 90 | 73.21 | 79.98 |
+ | SmolVLM 2.2B | 43.9 | 38.3 | 65.5 | 41.8 | 84.5 | 71.6 | 84.5 | 72.1 | 79.7 |
81
+
82
+
83
+ ### Video Evaluation
84
+ We evaluated the performance of the SmolVLM2 family on the following scientific benchmarks:
85
+
86
+ | Size | Video-MME | MLVU | MVBench |
+ |----------|-----------------|----------|---------------|
+ | 2.2B | 52.1 | 55.2 | 46.27 |
+ | 500M | 42.2 | 47.3 | 39.73 |
+ | 256M | 33.7 | 40.6 | 32.7 |
91
+
92
+
93
+ ### How to get started
94
+
95
+ You can use transformers to load, run inference with, and fine-tune SmolVLM2. Make sure you have num2words, flash-attn, and the latest version of transformers installed.
96
+ You can load the model as follows.
97
+
98
+ ```python
+ from transformers import AutoProcessor, AutoModelForImageTextToText
+ import torch
+
+ model_path = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
+ processor = AutoProcessor.from_pretrained(model_path)
+ # Load the weights in bfloat16 with FlashAttention-2, then move the model to the GPU
+ model = AutoModelForImageTextToText.from_pretrained(
+     model_path,
+     torch_dtype=torch.bfloat16,
+     attn_implementation="flash_attention_2",
+ ).to("cuda")
+ ```
110
+
111
+ #### Simple Inference
112
+
113
+ You can preprocess your inputs with the chat template and pass the result directly to the model.
114
+
115
+ ```python
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
+             {"type": "text", "text": "Can you describe this image?"},
+         ]
+     },
+ ]
+
+ # Apply the chat template and tokenize in a single call
+ inputs = processor.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     tokenize=True,
+     return_dict=True,
+     return_tensors="pt",
+ ).to(model.device, dtype=torch.bfloat16)
+
+ # Greedy decoding, capped at 64 new tokens
+ generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=64)
+ generated_texts = processor.batch_decode(
+     generated_ids,
+     skip_special_tokens=True,
+ )
+ print(generated_texts[0])
+ ```
141
+
142
+ #### Video Inference
143
+
144
+ To use SmolVLM2 for video inference, make sure you have decord installed.
145
+
146
+ ```python
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             # The video is referenced by a local file path
+             {"type": "video", "path": "path_to_video.mp4"},
+             {"type": "text", "text": "Describe this video in detail"}
+         ]
+     },
+ ]
+
+ inputs = processor.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     tokenize=True,
+     return_dict=True,
+     return_tensors="pt",
+ ).to(model.device, dtype=torch.bfloat16)
+
+ # Greedy decoding, capped at 64 new tokens
+ generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=64)
+ generated_texts = processor.batch_decode(
+     generated_ids,
+     skip_special_tokens=True,
+ )
+
+ print(generated_texts[0])
+ ```
173
+ #### Multi-image Interleaved Inference
174
+
175
+ You can interleave multiple media with text using chat templates.
176
+
177
+ ```python
+ import torch
+
+
+ # Text and images can appear in any order within a single user turn
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "text", "text": "What is the similarity between these two images?"},
+             {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
+             {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"},
+         ]
+     },
+ ]
+
+ inputs = processor.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     tokenize=True,
+     return_dict=True,
+     return_tensors="pt",
+ ).to(model.device, dtype=torch.bfloat16)
+
+ # Greedy decoding, capped at 64 new tokens
+ generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=64)
+ generated_texts = processor.batch_decode(
+     generated_ids,
+     skip_special_tokens=True,
+ )
+ print(generated_texts[0])
+ ```
207
+
208
+
209
+ ### Model optimizations
210
+
211
+ ## Misuse and Out-of-scope Use
212
+
213
+ SmolVLM is not intended for high-stakes scenarios or critical decision-making processes that affect an individual's well-being or livelihood. The model may produce content that appears factual but may not be accurate. Misuse includes, but is not limited to:
214
+
215
+ - Prohibited Uses:
216
+ - Evaluating or scoring individuals (e.g., in employment, education, credit)
217
+ - Critical automated decision-making
218
+ - Generating unreliable factual content
219
+ - Malicious Activities:
220
+ - Spam generation
221
+ - Disinformation campaigns
222
+ - Harassment or abuse
223
+ - Unauthorized surveillance
224
+
225
+ ### License
226
+
227
+ SmolVLM2 is built upon [the shape-optimized SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) as the image encoder and [SmolLM2](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct) as the text decoder.
228
+
229
+ We release the SmolVLM2 checkpoints under the Apache 2.0 license.
230
+
231
+ ## Citation information
232
+ You can cite us in the following way:
233
+ ```bibtex
234
+ @article{marafioti2025smolvlm,
235
+ title={SmolVLM: Redefining small and efficient multimodal models},
236
+ author={Andrés Marafioti and Orr Zohar and Miquel Farré and Merve Noyan and Elie Bakouch and Pedro Cuenca and Cyril Zakka and Loubna Ben Allal and Anton Lozhkov and Nouamane Tazi and Vaibhav Srivastav and Joshua Lochner and Hugo Larcher and Mathieu Morlon and Lewis Tunstall and Leandro von Werra and Thomas Wolf},
237
+ journal={arXiv preprint arXiv:2504.05299},
238
+ year={2025}
239
+ }
240
+ ```
241
+
242
+ ## Training Data
243
+ SmolVLM2 was trained on 3.3M samples drawn from ten datasets: [LLaVA-OneVision](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [M4-Instruct](https://huggingface.co/datasets/lmms-lab/M4-Instruct-Data), [MAmmoTH-VL](https://huggingface.co/datasets/MAmmoTH-VL/MAmmoTH-VL-Instruct-12M), [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K), [FineVideo](https://huggingface.co/datasets/HuggingFaceFV/finevideo), [Video-STaR](https://huggingface.co/datasets/orrzohar/Video-STaR), [Vript](https://huggingface.co/datasets/Mutonix/Vript), [VISTA-400K](https://huggingface.co/datasets/TIGER-Lab/VISTA-400K), [MovieChat](https://huggingface.co/datasets/Enxin/MovieChat-1K_train), and [ShareGPT4Video](https://huggingface.co/datasets/ShareGPT4Video/ShareGPT4Video).
244
+ The tables below give a general overview of the samples across modalities and the source of those samples.
245
+ <!--
246
+ <center><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_split.png" width="auto" height="auto" alt="Image description">
247
+ </center>
248
+
249
+ ### Details
250
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_datadetails.png" width="auto" height="auto" alt="Image description"> -->
251
+
252
+ ## Data Split per modality
253
+
254
+ | Data Type | Percentage |
+ |--------------|------------|
+ | Image | 34.4% |
+ | Text | 20.2% |
+ | Video | 33.0% |
+ | Multi-image | 12.3% |
260
+
261
+
262
+ ## Granular dataset slices per modality
263
+
264
+ ### Text Datasets
265
+ | Dataset | Percentage |
+ |--------------------------------------------|------------|
+ | llava-onevision/magpie_pro_ft3_80b_mt | 6.8% |
+ | llava-onevision/magpie_pro_ft3_80b_tt | 6.8% |
+ | llava-onevision/magpie_pro_qwen2_72b_tt | 5.8% |
+ | llava-onevision/mathqa | 0.9% |
271
+
272
+ ### Multi-image Datasets
273
+ | Dataset | Percentage |
+ |--------------------------------------------|------------|
+ | m4-instruct-data/m4_instruct_multiimage | 10.4% |
+ | mammoth/multiimage-cap6 | 1.9% |
277
+
278
+ ### Image Datasets
279
+ | Dataset | Percentage |
+ |--------------------------------------------|------------|
+ | llava-onevision/other | 17.4% |
+ | llava-onevision/vision_flan | 3.9% |
+ | llava-onevision/mavis_math_metagen | 2.6% |
+ | llava-onevision/mavis_math_rule_geo | 2.5% |
+ | llava-onevision/sharegpt4o | 1.7% |
+ | llava-onevision/sharegpt4v_coco | 1.5% |
+ | llava-onevision/image_textualization | 1.3% |
+ | llava-onevision/sharegpt4v_llava | 0.9% |
+ | llava-onevision/mapqa | 0.9% |
+ | llava-onevision/qa | 0.8% |
+ | llava-onevision/textocr | 0.8% |
292
+
293
+ ### Video Datasets
294
+ | Dataset | Percentage |
+ |--------------------------------------------|------------|
+ | llava-video-178k/1-2m | 7.3% |
+ | llava-video-178k/2-3m | 7.0% |
+ | other-video/combined | 5.7% |
+ | llava-video-178k/hound | 4.4% |
+ | llava-video-178k/0-30s | 2.4% |
+ | video-star/starb | 2.2% |
+ | vista-400k/combined | 2.2% |
+ | vript/long | 1.0% |
+ | ShareGPT4Video/all | 0.8% |
305
+
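The per-modality totals and the granular slices in the tables above can be cross-checked with a few lines of arithmetic. The sketch below copies the percentages from those tables and reports, for each modality, how far the sum of its slices is from the stated total. The names `MODALITY_TOTALS`, `SLICES` and `slice_gaps` are hypothetical, and small gaps are assumed to come from rounding each entry to one decimal place.

```python
# Percentages copied from the "Data Split per modality" table.
MODALITY_TOTALS = {"Image": 34.4, "Text": 20.2, "Video": 33.0, "Multi-image": 12.3}

# Percentages copied from the granular per-modality tables, in table order.
SLICES = {
    "Text": [6.8, 6.8, 5.8, 0.9],
    "Multi-image": [10.4, 1.9],
    "Image": [17.4, 3.9, 2.6, 2.5, 1.7, 1.5, 1.3, 0.9, 0.9, 0.8, 0.8],
    "Video": [7.3, 7.0, 5.7, 4.4, 2.4, 2.2, 2.2, 1.0, 0.8],
}


def slice_gaps(totals, slices):
    """Return (sum of slices - stated total) per modality, rounded to 0.1."""
    return {name: round(sum(slices[name]) - totals[name], 1) for name in totals}


gaps = slice_gaps(MODALITY_TOTALS, SLICES)
for name, gap in gaps.items():
    print(f"{name:12s} stated {MODALITY_TOTALS[name]:5.1f}%  gap {gap:+.1f}")
```

A gap of a tenth of a percentage point in either direction is consistent with independent rounding of the individual rows.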
added_tokens.json ADDED
@@ -0,0 +1,130 @@
1
+ {
2
+ "<end_of_utterance>": 49279,
3
+ "<fake_token_around_image>": 49189,
4
+ "<global-img>": 49152,
5
+ "<image>": 49190,
6
+ "<row_1_col_1>": 49153,
7
+ "<row_1_col_2>": 49154,
8
+ "<row_1_col_3>": 49155,
9
+ "<row_1_col_4>": 49156,
10
+ "<row_1_col_5>": 49157,
11
+ "<row_1_col_6>": 49158,
12
+ "<row_2_col_1>": 49159,
13
+ "<row_2_col_2>": 49160,
14
+ "<row_2_col_3>": 49161,
15
+ "<row_2_col_4>": 49162,
16
+ "<row_2_col_5>": 49163,
17
+ "<row_2_col_6>": 49164,
18
+ "<row_3_col_1>": 49165,
19
+ "<row_3_col_2>": 49166,
20
+ "<row_3_col_3>": 49167,
21
+ "<row_3_col_4>": 49168,
22
+ "<row_3_col_5>": 49169,
23
+ "<row_3_col_6>": 49170,
24
+ "<row_4_col_1>": 49171,
25
+ "<row_4_col_2>": 49172,
26
+ "<row_4_col_3>": 49173,
27
+ "<row_4_col_4>": 49174,
28
+ "<row_4_col_5>": 49175,
29
+ "<row_4_col_6>": 49176,
30
+ "<row_5_col_1>": 49177,
31
+ "<row_5_col_2>": 49178,
32
+ "<row_5_col_3>": 49179,
33
+ "<row_5_col_4>": 49180,
34
+ "<row_5_col_5>": 49181,
35
+ "<row_5_col_6>": 49182,
36
+ "<row_6_col_1>": 49183,
37
+ "<row_6_col_2>": 49184,
38
+ "<row_6_col_3>": 49185,
39
+ "<row_6_col_4>": 49186,
40
+ "<row_6_col_5>": 49187,
41
+ "<row_6_col_6>": 49188,
42
+ "<|reserved_special_token_0|>": 49191,
43
+ "<|reserved_special_token_10|>": 49201,
44
+ "<|reserved_special_token_11|>": 49202,
45
+ "<|reserved_special_token_12|>": 49203,
46
+ "<|reserved_special_token_13|>": 49204,
47
+ "<|reserved_special_token_14|>": 49205,
48
+ "<|reserved_special_token_15|>": 49206,
49
+ "<|reserved_special_token_16|>": 49207,
50
+ "<|reserved_special_token_17|>": 49208,
51
+ "<|reserved_special_token_18|>": 49209,
52
+ "<|reserved_special_token_19|>": 49210,
53
+ "<|reserved_special_token_1|>": 49192,
54
+ "<|reserved_special_token_20|>": 49211,
55
+ "<|reserved_special_token_21|>": 49212,
56
+ "<|reserved_special_token_22|>": 49213,
57
+ "<|reserved_special_token_23|>": 49214,
58
+ "<|reserved_special_token_24|>": 49215,
59
+ "<|reserved_special_token_25|>": 49216,
60
+ "<|reserved_special_token_26|>": 49217,
61
+ "<|reserved_special_token_27|>": 49218,
62
+ "<|reserved_special_token_28|>": 49219,
63
+ "<|reserved_special_token_29|>": 49220,
64
+ "<|reserved_special_token_2|>": 49193,
65
+ "<|reserved_special_token_30|>": 49221,
66
+ "<|reserved_special_token_31|>": 49222,
67
+ "<|reserved_special_token_32|>": 49223,
68
+ "<|reserved_special_token_33|>": 49224,
69
+ "<|reserved_special_token_34|>": 49225,
70
+ "<|reserved_special_token_35|>": 49226,
71
+ "<|reserved_special_token_36|>": 49227,
72
+ "<|reserved_special_token_37|>": 49228,
73
+ "<|reserved_special_token_38|>": 49229,
74
+ "<|reserved_special_token_39|>": 49230,
75
+ "<|reserved_special_token_3|>": 49194,
76
+ "<|reserved_special_token_40|>": 49231,
77
+ "<|reserved_special_token_41|>": 49232,
78
+ "<|reserved_special_token_42|>": 49233,
79
+ "<|reserved_special_token_43|>": 49234,
80
+ "<|reserved_special_token_44|>": 49235,
81
+ "<|reserved_special_token_45|>": 49236,
82
+ "<|reserved_special_token_46|>": 49237,
83
+ "<|reserved_special_token_47|>": 49238,
84
+ "<|reserved_special_token_48|>": 49239,
85
+ "<|reserved_special_token_49|>": 49240,
86
+ "<|reserved_special_token_4|>": 49195,
87
+ "<|reserved_special_token_50|>": 49241,
88
+ "<|reserved_special_token_51|>": 49242,
89
+ "<|reserved_special_token_52|>": 49243,
90
+ "<|reserved_special_token_53|>": 49244,
91
+ "<|reserved_special_token_54|>": 49245,
92
+ "<|reserved_special_token_55|>": 49246,
93
+ "<|reserved_special_token_56|>": 49247,
94
+ "<|reserved_special_token_57|>": 49248,
95
+ "<|reserved_special_token_58|>": 49249,
96
+ "<|reserved_special_token_59|>": 49250,
97
+ "<|reserved_special_token_5|>": 49196,
98
+ "<|reserved_special_token_60|>": 49251,
99
+ "<|reserved_special_token_61|>": 49252,
100
+ "<|reserved_special_token_62|>": 49253,
101
+ "<|reserved_special_token_63|>": 49254,
102
+ "<|reserved_special_token_64|>": 49255,
103
+ "<|reserved_special_token_65|>": 49256,
104
+ "<|reserved_special_token_66|>": 49257,
105
+ "<|reserved_special_token_67|>": 49258,
106
+ "<|reserved_special_token_68|>": 49259,
107
+ "<|reserved_special_token_69|>": 49260,
108
+ "<|reserved_special_token_6|>": 49197,
109
+ "<|reserved_special_token_70|>": 49261,
110
+ "<|reserved_special_token_71|>": 49262,
111
+ "<|reserved_special_token_72|>": 49263,
112
+ "<|reserved_special_token_73|>": 49264,
113
+ "<|reserved_special_token_74|>": 49265,
114
+ "<|reserved_special_token_75|>": 49266,
115
+ "<|reserved_special_token_76|>": 49267,
116
+ "<|reserved_special_token_77|>": 49268,
117
+ "<|reserved_special_token_78|>": 49269,
118
+ "<|reserved_special_token_79|>": 49270,
119
+ "<|reserved_special_token_7|>": 49198,
120
+ "<|reserved_special_token_80|>": 49271,
121
+ "<|reserved_special_token_81|>": 49272,
122
+ "<|reserved_special_token_82|>": 49273,
123
+ "<|reserved_special_token_83|>": 49274,
124
+ "<|reserved_special_token_84|>": 49275,
125
+ "<|reserved_special_token_85|>": 49276,
126
+ "<|reserved_special_token_86|>": 49277,
127
+ "<|reserved_special_token_87|>": 49278,
128
+ "<|reserved_special_token_8|>": 49199,
129
+ "<|reserved_special_token_9|>": 49200
130
+ }
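The IDs in `added_tokens.json` follow a regular layout: the 6×6 grid of `<row_r_col_c>` tokens sits directly after `<global-img>`, and the reserved tokens are numbered consecutively. The sketch below reproduces that layout from the values listed in the file; the helper names `row_col_token_id` and `reserved_token_id` are hypothetical and not part of any library.

```python
GLOBAL_IMG_ID = 49152   # "<global-img>" in added_tokens.json
GRID_SIZE = 6           # rows and columns run from 1 to 6
RESERVED_BASE = 49191   # "<|reserved_special_token_0|>"
RESERVED_COUNT = 88     # reserved tokens 0 through 87


def row_col_token_id(row, col):
    """ID of "<row_{row}_col_{col}>", laid out row by row after <global-img>."""
    if not (1 <= row <= GRID_SIZE and 1 <= col <= GRID_SIZE):
        raise ValueError("row and col must be between 1 and 6")
    return GLOBAL_IMG_ID + (row - 1) * GRID_SIZE + col


def reserved_token_id(n):
    """ID of "<|reserved_special_token_{n}|>"."""
    if not 0 <= n < RESERVED_COUNT:
        raise ValueError("n must be between 0 and 87")
    return RESERVED_BASE + n


print(row_col_token_id(3, 4))
print(reserved_token_id(10))
```

The JSON file itself is sorted by token string, which is why `<|reserved_special_token_10|>` appears before `<|reserved_special_token_1|>` even though the IDs are consecutive.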
chat_template.jinja ADDED
@@ -0,0 +1,2 @@
1
+ <|im_start|>{% for message in messages %}{{message['role'] | capitalize}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '<image>' }}{% endif %}{% endfor %}<end_of_utterance>
2
+ {% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}
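To make the template easier to read, here is a minimal plain-Python sketch of the prompt string it appears to produce. It is a re-implementation for illustration, not the Jinja rendering used by the library; `render_prompt` is a hypothetical name, and the sketch assumes the newline after `<end_of_utterance>` is kept as written in the template.

```python
def render_prompt(messages, add_generation_prompt=True):
    """Mirror the chat template: role prefix, content items, end marker."""
    parts = ["<|im_start|>"]
    for message in messages:
        content = message["content"]
        parts.append(message["role"].capitalize())
        # No space after the colon when the turn starts with an image.
        parts.append(":" if content[0]["type"] == "image" else ": ")
        for item in content:
            if item["type"] == "text":
                parts.append(item["text"])
            elif item["type"] == "image":
                parts.append("<image>")
            # Other item types are skipped, as in the template.
        parts.append("<end_of_utterance>\n")
    if add_generation_prompt:
        parts.append("Assistant:")
    return "".join(parts)


example = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "bee.jpg"},
            {"type": "text", "text": "Can you describe this image?"},
        ],
    }
]
print(render_prompt(example))
```

Note that the template only emits output for `text` and `image` items, so any other content type contributes nothing to the rendered string in this sketch.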
config.json ADDED
@@ -0,0 +1,162 @@
1
+ {
2
+ "architectures": [
3
+ "SmolVLMModel"
4
+ ],
5
+ "image_token_id": 49190,
6
+ "model_type": "smolvlm",
7
+ "pad_token_id": 128002,
8
+ "quantization_config": {
9
+ "_load_in_4bit": true,
10
+ "_load_in_8bit": false,
11
+ "bnb_4bit_compute_dtype": "float32",
12
+ "bnb_4bit_quant_storage": "float32",
13
+ "bnb_4bit_quant_type": "nf4",
14
+ "bnb_4bit_use_double_quant": true,
15
+ "llm_int8_enable_fp32_cpu_offload": false,
16
+ "llm_int8_has_fp16_weight": false,
17
+ "llm_int8_skip_modules": null,
18
+ "llm_int8_threshold": 6.0,
19
+ "load_in_4bit": true,
20
+ "load_in_8bit": false,
21
+ "quant_method": "bitsandbytes"
22
+ },
23
+ "scale_factor": 3,
24
+ "text_config": {
25
+ "_flash_attn_2_enabled": true,
26
+ "_name_or_path": "None",
27
+ "architectures": [
28
+ "VLlama3ForCausalLM"
29
+ ],
30
+ "attention_bias": false,
31
+ "attention_dropout": 0.0,
32
+ "head_dim": 64,
33
+ "hidden_act": "silu",
34
+ "hidden_size": 2048,
35
+ "initializer_range": 0.02,
36
+ "intermediate_size": 8192,
37
+ "max_position_embeddings": 8192,
38
+ "mlp_bias": false,
39
+ "model_type": "llama",
40
+ "neftune_noise_alpha": 0.0,
41
+ "num_attention_heads": 32,
42
+ "num_hidden_layers": 24,
43
+ "num_key_value_heads": 32,
44
+ "pad_token_id": 2,
45
+ "perceiver_config": {
46
+ "_name_or_path": "",
47
+ "add_cross_attention": false,
48
+ "architectures": null,
49
+ "attention_dropout": 0.0,
50
+ "bad_words_ids": null,
51
+ "begin_suppress_tokens": null,
52
+ "bos_token_id": null,
53
+ "chunk_size_feed_forward": 0,
54
+ "cross_attention_hidden_size": null,
55
+ "decoder_start_token_id": null,
56
+ "diversity_penalty": 0.0,
57
+ "do_sample": false,
58
+ "early_stopping": false,
59
+ "encoder_no_repeat_ngram_size": 0,
60
+ "eos_token_id": null,
61
+ "exponential_decay_length_penalty": null,
62
+ "finetuning_task": null,
63
+ "forced_bos_token_id": null,
64
+ "forced_eos_token_id": null,
65
+ "hidden_act": "silu",
66
+ "id2label": {
67
+ "0": "LABEL_0",
68
+ "1": "LABEL_1"
69
+ },
70
+ "is_decoder": false,
71
+ "is_encoder_decoder": false,
72
+ "label2id": {
73
+ "LABEL_0": 0,
74
+ "LABEL_1": 1
75
+ },
76
+ "length_penalty": 1.0,
77
+ "max_length": 20,
78
+ "min_length": 0,
79
+ "model_type": "vllama3",
80
+ "no_repeat_ngram_size": 0,
81
+ "num_beam_groups": 1,
82
+ "num_beams": 1,
83
+ "num_key_value_heads": 1,
84
+ "num_return_sequences": 1,
85
+ "output_attentions": false,
86
+ "output_hidden_states": false,
87
+ "output_scores": false,
88
+ "pad_token_id": null,
89
+ "prefix": null,
90
+ "problem_type": null,
91
+ "pruned_heads": {},
92
+ "qk_layer_norms_perceiver": false,
93
+ "remove_invalid_values": false,
94
+ "repetition_penalty": 1.0,
95
+ "resampler_depth": 6,
96
+ "resampler_head_dim": 96,
97
+ "resampler_n_heads": 16,
98
+ "resampler_n_latents": 64,
99
+ "return_dict": true,
100
+ "return_dict_in_generate": false,
101
+ "sep_token_id": null,
102
+ "suppress_tokens": null,
103
+ "task_specific_params": null,
104
+ "temperature": 1.0,
105
+ "tf_legacy_loss": false,
106
+ "tie_encoder_decoder": false,
107
+ "tie_word_embeddings": true,
108
+ "tokenizer_class": null,
109
+ "top_k": 50,
110
+ "top_p": 1.0,
111
+ "torch_dtype": null,
112
+ "torchscript": false,
113
+ "transformers_version": "4.46.0",
114
+ "typical_p": 1.0,
115
+ "use_bfloat16": false
116
+ },
117
+ "pixel_shuffle_factor": 3,
118
+ "pretraining_tp": 1,
119
+ "qk_layer_norms": false,
120
+ "rms_norm_eps": 1e-05,
121
+ "rope_scaling": null,
122
+ "rope_theta": 130000,
123
+ "torch_dtype": "bfloat16",
124
+ "transformers.js_config": {
125
+ "kv_cache_dtype": {
126
+ "fp16": "float16",
127
+ "q4f16": "float16"
128
+ }
129
+ },
130
+ "use_cache": true,
131
+ "use_resampler": false,
132
+ "vocab_size": 49280
133
+ },
134
+ "tie_word_embeddings": false,
135
+ "torch_dtype": "float32",
136
+ "transformers_version": "4.53.1",
137
+ "use_cache": false,
138
+ "use_reentrant_checkpointing": false,
139
+ "vision_config": {
140
+ "attention_dropout": 0.0,
141
+ "hidden_act": "gelu_pytorch_tanh",
142
+ "hidden_size": 1152,
143
+ "image_size": 384,
144
+ "initializer_range": 0.02,
145
+ "intermediate_size": 4304,
146
+ "layer_norm_eps": 1e-06,
147
+ "max_image_size": {
148
+ "longest_edge": 384
149
+ },
150
+ "model_type": "smolvlm_vision",
151
+ "num_attention_heads": 16,
152
+ "num_channels": 3,
153
+ "num_hidden_layers": 27,
154
+ "patch_size": 14,
155
+ "size": {
156
+ "longest_edge": 1920
157
+ },
158
+ "tie_word_embeddings": false,
159
+ "use_base_siglip": false
160
+ },
161
+ "vocab_size": 49280
162
+ }
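A few quantities can be derived from the values in `config.json`. The sketch below checks that the text model's `head_dim` matches `hidden_size / num_attention_heads`, and estimates how many visual tokens one 384×384 tile contributes. The second calculation assumes that the connector's pixel shuffle reduces the patch grid by `scale_factor` along each side, as in Idefics3-style models; that behavior is an assumption here, not something the config file states, and `visual_tokens_per_tile` is a hypothetical helper.

```python
# Values copied from config.json.
text_cfg = {"hidden_size": 2048, "num_attention_heads": 32, "head_dim": 64}
vision_cfg = {"image_size": 384, "patch_size": 14}
scale_factor = 3


def visual_tokens_per_tile(image_size, patch_size, scale_factor):
    """Estimate visual tokens for one square tile.

    Assumes the vision encoder yields (image_size // patch_size)^2 patches
    and that pixel shuffle merges scale_factor x scale_factor patches into
    one token (an assumption, see the lead-in).
    """
    patches_per_side = image_size // patch_size
    return (patches_per_side // scale_factor) ** 2


derived_head_dim = text_cfg["hidden_size"] // text_cfg["num_attention_heads"]
tokens = visual_tokens_per_tile(
    vision_cfg["image_size"], vision_cfg["patch_size"], scale_factor
)
print(f"derived head_dim: {derived_head_dim}")
print(f"estimated visual tokens per tile: {tokens}")
```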
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3c6d5775cff1dc78cb5976cf8ae1b83aa1ce2bf70362762424a88d6206bd8a51
3
+ size 1466365538
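The three lines added for `model.safetensors` are a Git LFS pointer rather than the weights themselves. The sketch below parses the pointer text shown above and converts the recorded size to GiB; `parse_lfs_pointer` is a hypothetical helper written for illustration, and it assumes the simple "key value" line layout seen in this pointer.

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:3c6d5775cff1dc78cb5976cf8ae1b83aa1ce2bf70362762424a88d6206bd8a51
size 1466365538
"""


def parse_lfs_pointer(text):
    """Split each "key value" line and unpack the oid into algorithm and digest."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
size_gib = info["size_bytes"] / 2**30
print(f"{info['hash_algorithm']} digest with {len(info['digest'])} hex characters")
print(f"stored file size: {size_gib:.2f} GiB")
```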
special_tokens_map.json ADDED
@@ -0,0 +1,39 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<fake_token_around_image>",
4
+ "<image>",
5
+ "<end_of_utterance>"
6
+ ],
7
+ "bos_token": {
8
+ "content": "<|im_start|>",
9
+ "lstrip": false,
10
+ "normalized": false,
11
+ "rstrip": false,
12
+ "single_word": false
13
+ },
14
+ "end_of_utterance_token": "<end_of_utterance>",
15
+ "eos_token": {
16
+ "content": "<end_of_utterance>",
17
+ "lstrip": false,
18
+ "normalized": false,
19
+ "rstrip": false,
20
+ "single_word": false
21
+ },
22
+ "fake_image_token": "<fake_token_around_image>",
23
+ "global_image_token": "<global-img>",
24
+ "image_token": "<image>",
25
+ "pad_token": {
26
+ "content": "<|im_end|>",
27
+ "lstrip": false,
28
+ "normalized": false,
29
+ "rstrip": false,
30
+ "single_word": false
31
+ },
32
+ "unk_token": {
33
+ "content": "<|endoftext|>",
34
+ "lstrip": false,
35
+ "normalized": false,
36
+ "rstrip": false,
37
+ "single_word": false
38
+ }
39
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,1191 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<|endoftext|>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<|im_start|>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "<|im_end|>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "<repo_name>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "4": {
37
+ "content": "<reponame>",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "5": {
45
+ "content": "<file_sep>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": true
51
+ },
52
+ "6": {
53
+ "content": "<filename>",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": true
59
+ },
60
+ "7": {
61
+ "content": "<gh_stars>",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": true
67
+ },
68
+ "8": {
69
+ "content": "<issue_start>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": true
75
+ },
76
+ "9": {
77
+ "content": "<issue_comment>",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": true
83
+ },
84
+ "10": {
85
+ "content": "<issue_closed>",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": true
91
+ },
92
+ "11": {
93
+ "content": "<jupyter_start>",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": true
99
+ },
100
+ "12": {
101
+ "content": "<jupyter_text>",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": true
107
+ },
108
+ "13": {
109
+ "content": "<jupyter_code>",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": true
115
+ },
116
+ "14": {
117
+ "content": "<jupyter_output>",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": true
123
+ },
124
+ "15": {
125
+ "content": "<jupyter_script>",
126
+ "lstrip": false,
127
+ "normalized": false,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": true
131
+ },
132
+ "16": {
133
+ "content": "<empty_output>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false,
138
+ "special": true
139
+ },
140
+ "49152": {
141
+ "content": "<global-img>",
142
+ "lstrip": false,
143
+ "normalized": false,
144
+ "rstrip": false,
145
+ "single_word": false,
146
+ "special": true
147
+ },
148
+ "49153": {
149
+ "content": "<row_1_col_1>",
150
+ "lstrip": false,
151
+ "normalized": false,
152
+ "rstrip": false,
153
+ "single_word": false,
154
+ "special": true
155
+ },
156
+ "49154": {
157
+ "content": "<row_1_col_2>",
158
+ "lstrip": false,
159
+ "normalized": false,
160
+ "rstrip": false,
161
+ "single_word": false,
162
+ "special": true
163
+ },
164
+ "49155": {
165
+ "content": "<row_1_col_3>",
166
+ "lstrip": false,
167
+ "normalized": false,
168
+ "rstrip": false,
169
+ "single_word": false,
170
+ "special": true
171
+ },
172
+ "49156": {
173
+ "content": "<row_1_col_4>",
174
+ "lstrip": false,
175
+ "normalized": false,
176
+ "rstrip": false,
177
+ "single_word": false,
178
+ "special": true
179
+ },
180
+ "49157": {
181
+ "content": "<row_1_col_5>",
182
+ "lstrip": false,
183
+ "normalized": false,
184
+ "rstrip": false,
185
+ "single_word": false,
186
+ "special": true
187
+ },
188
+ "49158": {
189
+ "content": "<row_1_col_6>",
190
+ "lstrip": false,
191
+ "normalized": false,
192
+ "rstrip": false,
193
+ "single_word": false,
194
+ "special": true
195
+ },
196
+ "49159": {
197
+ "content": "<row_2_col_1>",
198
+ "lstrip": false,
199
+ "normalized": false,
200
+ "rstrip": false,
201
+ "single_word": false,
202
+ "special": true
203
+ },
204
+ "49160": {
205
+ "content": "<row_2_col_2>",
206
+ "lstrip": false,
207
+ "normalized": false,
208
+ "rstrip": false,
209
+ "single_word": false,
210
+ "special": true
211
+ },
212
+ "49161": {
213
+ "content": "<row_2_col_3>",
214
+ "lstrip": false,
215
+ "normalized": false,
216
+ "rstrip": false,
217
+ "single_word": false,
218
+ "special": true
219
+ },
220
+ "49162": {
221
+ "content": "<row_2_col_4>",
222
+ "lstrip": false,
223
+ "normalized": false,
224
+ "rstrip": false,
225
+ "single_word": false,
226
+ "special": true
227
+ },
228
+ "49163": {
229
+ "content": "<row_2_col_5>",
230
+ "lstrip": false,
231
+ "normalized": false,
232
+ "rstrip": false,
233
+ "single_word": false,
234
+ "special": true
235
+ },
236
+ "49164": {
237
+ "content": "<row_2_col_6>",
238
+ "lstrip": false,
239
+ "normalized": false,
240
+ "rstrip": false,
241
+ "single_word": false,
242
+ "special": true
243
+ },
244
+ "49165": {
245
+ "content": "<row_3_col_1>",
246
+ "lstrip": false,
247
+ "normalized": false,
248
+ "rstrip": false,
249
+ "single_word": false,
250
+ "special": true
251
+ },
252
+ "49166": {
253
+ "content": "<row_3_col_2>",
254
+ "lstrip": false,
255
+ "normalized": false,
256
+ "rstrip": false,
257
+ "single_word": false,
258
+ "special": true
259
+ },
260
+ "49167": {
261
+ "content": "<row_3_col_3>",
262
+ "lstrip": false,
263
+ "normalized": false,
264
+ "rstrip": false,
265
+ "single_word": false,
266
+ "special": true
267
+ },
268
+ "49168": {
269
+ "content": "<row_3_col_4>",
270
+ "lstrip": false,
271
+ "normalized": false,
272
+ "rstrip": false,
273
+ "single_word": false,
274
+ "special": true
275
+ },
276
+ "49169": {
277
+ "content": "<row_3_col_5>",
278
+ "lstrip": false,
279
+ "normalized": false,
280
+ "rstrip": false,
281
+ "single_word": false,
282
+ "special": true
283
+ },
284
+ "49170": {
285
+ "content": "<row_3_col_6>",
286
+ "lstrip": false,
287
+ "normalized": false,
288
+ "rstrip": false,
289
+ "single_word": false,
290
+ "special": true
291
+ },
292
+ "49171": {
293
+ "content": "<row_4_col_1>",
294
+ "lstrip": false,
295
+ "normalized": false,
296
+ "rstrip": false,
297
+ "single_word": false,
298
+ "special": true
299
+ },
300
+ "49172": {
301
+ "content": "<row_4_col_2>",
302
+ "lstrip": false,
303
+ "normalized": false,
304
+ "rstrip": false,
305
+ "single_word": false,
306
+ "special": true
307
+ },
308
+ "49173": {
309
+ "content": "<row_4_col_3>",
310
+ "lstrip": false,
311
+ "normalized": false,
312
+ "rstrip": false,
313
+ "single_word": false,
314
+ "special": true
315
+ },
316
+ "49174": {
317
+ "content": "<row_4_col_4>",
318
+ "lstrip": false,
319
+ "normalized": false,
320
+ "rstrip": false,
321
+ "single_word": false,
322
+ "special": true
323
+ },
324
+ "49175": {
325
+ "content": "<row_4_col_5>",
326
+ "lstrip": false,
327
+ "normalized": false,
328
+ "rstrip": false,
329
+ "single_word": false,
330
+ "special": true
331
+ },
332
+ "49176": {
333
+ "content": "<row_4_col_6>",
334
+ "lstrip": false,
335
+ "normalized": false,
336
+ "rstrip": false,
337
+ "single_word": false,
338
+ "special": true
339
+ },
340
+ "49177": {
341
+ "content": "<row_5_col_1>",
342
+ "lstrip": false,
343
+ "normalized": false,
344
+ "rstrip": false,
345
+ "single_word": false,
346
+ "special": true
347
+ },
348
+ "49178": {
349
+ "content": "<row_5_col_2>",
350
+ "lstrip": false,
351
+ "normalized": false,
352
+ "rstrip": false,
353
+ "single_word": false,
354
+ "special": true
355
+ },
356
+ "49179": {
357
+ "content": "<row_5_col_3>",
358
+ "lstrip": false,
359
+ "normalized": false,
360
+ "rstrip": false,
361
+ "single_word": false,
362
+ "special": true
363
+ },
364
+ "49180": {
365
+ "content": "<row_5_col_4>",
366
+ "lstrip": false,
367
+ "normalized": false,
368
+ "rstrip": false,
369
+ "single_word": false,
370
+ "special": true
371
+ },
372
+ "49181": {
373
+ "content": "<row_5_col_5>",
374
+ "lstrip": false,
375
+ "normalized": false,
376
+ "rstrip": false,
377
+ "single_word": false,
378
+ "special": true
379
+ },
380
+ "49182": {
381
+ "content": "<row_5_col_6>",
382
+ "lstrip": false,
383
+ "normalized": false,
384
+ "rstrip": false,
385
+ "single_word": false,
386
+ "special": true
387
+ },
388
+ "49183": {
389
+ "content": "<row_6_col_1>",
390
+ "lstrip": false,
391
+ "normalized": false,
392
+ "rstrip": false,
393
+ "single_word": false,
394
+ "special": true
395
+ },
396
+ "49184": {
397
+ "content": "<row_6_col_2>",
398
+ "lstrip": false,
399
+ "normalized": false,
400
+ "rstrip": false,
401
+ "single_word": false,
402
+ "special": true
403
+ },
404
+ "49185": {
405
+ "content": "<row_6_col_3>",
406
+ "lstrip": false,
407
+ "normalized": false,
408
+ "rstrip": false,
409
+ "single_word": false,
410
+ "special": true
411
+ },
412
+ "49186": {
413
+ "content": "<row_6_col_4>",
414
+ "lstrip": false,
415
+ "normalized": false,
416
+ "rstrip": false,
417
+ "single_word": false,
418
+ "special": true
419
+ },
420
+ "49187": {
421
+ "content": "<row_6_col_5>",
422
+ "lstrip": false,
423
+ "normalized": false,
424
+ "rstrip": false,
425
+ "single_word": false,
426
+ "special": true
427
+ },
428
+ "49188": {
429
+ "content": "<row_6_col_6>",
430
+ "lstrip": false,
431
+ "normalized": false,
432
+ "rstrip": false,
433
+ "single_word": false,
434
+ "special": true
435
+ },
436
+ "49189": {
437
+ "content": "<fake_token_around_image>",
438
+ "lstrip": false,
439
+ "normalized": false,
440
+ "rstrip": false,
441
+ "single_word": false,
442
+ "special": true
443
+ },
444
+ "49190": {
445
+ "content": "<image>",
446
+ "lstrip": false,
447
+ "normalized": false,
448
+ "rstrip": false,
449
+ "single_word": false,
450
+ "special": true
451
+ },
452
+ "49191": {
453
+ "content": "<|reserved_special_token_0|>",
454
+ "lstrip": false,
455
+ "normalized": false,
456
+ "rstrip": false,
457
+ "single_word": false,
458
+ "special": true
459
+ },
460
+ "49192": {
461
+ "content": "<|reserved_special_token_1|>",
462
+ "lstrip": false,
463
+ "normalized": false,
464
+ "rstrip": false,
465
+ "single_word": false,
466
+ "special": true
467
+ },
468
+ "49193": {
469
+ "content": "<|reserved_special_token_2|>",
470
+ "lstrip": false,
471
+ "normalized": false,
472
+ "rstrip": false,
473
+ "single_word": false,
474
+ "special": true
475
+ },
476
+ "49194": {
477
+ "content": "<|reserved_special_token_3|>",
478
+ "lstrip": false,
479
+ "normalized": false,
480
+ "rstrip": false,
481
+ "single_word": false,
482
+ "special": true
483
+ },
484
+ "49195": {
485
+ "content": "<|reserved_special_token_4|>",
486
+ "lstrip": false,
487
+ "normalized": false,
488
+ "rstrip": false,
489
+ "single_word": false,
490
+ "special": true
491
+ },
492
+ "49196": {
493
+ "content": "<|reserved_special_token_5|>",
494
+ "lstrip": false,
495
+ "normalized": false,
496
+ "rstrip": false,
497
+ "single_word": false,
498
+ "special": true
499
+ },
500
+ "49197": {
501
+ "content": "<|reserved_special_token_6|>",
502
+ "lstrip": false,
503
+ "normalized": false,
504
+ "rstrip": false,
505
+ "single_word": false,
506
+ "special": true
507
+ },
508
+ "49198": {
509
+ "content": "<|reserved_special_token_7|>",
510
+ "lstrip": false,
511
+ "normalized": false,
512
+ "rstrip": false,
513
+ "single_word": false,
514
+ "special": true
515
+ },
516
+ "49199": {
517
+ "content": "<|reserved_special_token_8|>",
518
+ "lstrip": false,
519
+ "normalized": false,
520
+ "rstrip": false,
521
+ "single_word": false,
522
+ "special": true
523
+ },
524
+ "49200": {
525
+ "content": "<|reserved_special_token_9|>",
526
+ "lstrip": false,
527
+ "normalized": false,
528
+ "rstrip": false,
529
+ "single_word": false,
530
+ "special": true
531
+ },
532
+ "49201": {
533
+ "content": "<|reserved_special_token_10|>",
534
+ "lstrip": false,
535
+ "normalized": false,
536
+ "rstrip": false,
537
+ "single_word": false,
538
+ "special": true
539
+ },
540
+ "49202": {
541
+ "content": "<|reserved_special_token_11|>",
542
+ "lstrip": false,
543
+ "normalized": false,
544
+ "rstrip": false,
545
+ "single_word": false,
546
+ "special": true
547
+ },
548
+ "49203": {
549
+ "content": "<|reserved_special_token_12|>",
550
+ "lstrip": false,
551
+ "normalized": false,
552
+ "rstrip": false,
553
+ "single_word": false,
554
+ "special": true
555
+ },
556
+ "49204": {
557
+ "content": "<|reserved_special_token_13|>",
558
+ "lstrip": false,
559
+ "normalized": false,
560
+ "rstrip": false,
561
+ "single_word": false,
562
+ "special": true
563
+ },
564
+ "49205": {
565
+ "content": "<|reserved_special_token_14|>",
566
+ "lstrip": false,
567
+ "normalized": false,
568
+ "rstrip": false,
569
+ "single_word": false,
570
+ "special": true
571
+ },
572
+ "49206": {
573
+ "content": "<|reserved_special_token_15|>",
574
+ "lstrip": false,
575
+ "normalized": false,
576
+ "rstrip": false,
577
+ "single_word": false,
578
+ "special": true
579
+ },
580
+ "49207": {
581
+ "content": "<|reserved_special_token_16|>",
582
+ "lstrip": false,
583
+ "normalized": false,
584
+ "rstrip": false,
585
+ "single_word": false,
586
+ "special": true
587
+ },
588
+ "49208": {
589
+ "content": "<|reserved_special_token_17|>",
590
+ "lstrip": false,
591
+ "normalized": false,
592
+ "rstrip": false,
593
+ "single_word": false,
594
+ "special": true
595
+ },
596
+ "49209": {
597
+ "content": "<|reserved_special_token_18|>",
598
+ "lstrip": false,
599
+ "normalized": false,
600
+ "rstrip": false,
601
+ "single_word": false,
602
+ "special": true
603
+ },
604
+ "49210": {
605
+ "content": "<|reserved_special_token_19|>",
606
+ "lstrip": false,
607
+ "normalized": false,
608
+ "rstrip": false,
609
+ "single_word": false,
610
+ "special": true
611
+ },
612
+ "49211": {
613
+ "content": "<|reserved_special_token_20|>",
614
+ "lstrip": false,
615
+ "normalized": false,
616
+ "rstrip": false,
617
+ "single_word": false,
618
+ "special": true
619
+ },
620
+ "49212": {
621
+ "content": "<|reserved_special_token_21|>",
622
+ "lstrip": false,
623
+ "normalized": false,
624
+ "rstrip": false,
625
+ "single_word": false,
626
+ "special": true
627
+ },
628
+ "49213": {
629
+ "content": "<|reserved_special_token_22|>",
630
+ "lstrip": false,
631
+ "normalized": false,
632
+ "rstrip": false,
633
+ "single_word": false,
634
+ "special": true
635
+ },
636
+ "49214": {
637
+ "content": "<|reserved_special_token_23|>",
638
+ "lstrip": false,
639
+ "normalized": false,
640
+ "rstrip": false,
641
+ "single_word": false,
642
+ "special": true
643
+ },
644
+ "49215": {
645
+ "content": "<|reserved_special_token_24|>",
646
+ "lstrip": false,
647
+ "normalized": false,
648
+ "rstrip": false,
649
+ "single_word": false,
650
+ "special": true
651
+ },
652
+ "49216": {
653
+ "content": "<|reserved_special_token_25|>",
654
+ "lstrip": false,
655
+ "normalized": false,
656
+ "rstrip": false,
657
+ "single_word": false,
658
+ "special": true
659
+ },
660
+ "49217": {
661
+ "content": "<|reserved_special_token_26|>",
662
+ "lstrip": false,
663
+ "normalized": false,
664
+ "rstrip": false,
665
+ "single_word": false,
666
+ "special": true
667
+ },
668
+ "49218": {
669
+ "content": "<|reserved_special_token_27|>",
670
+ "lstrip": false,
671
+ "normalized": false,
672
+ "rstrip": false,
673
+ "single_word": false,
674
+ "special": true
675
+ },
676
+ "49219": {
677
+ "content": "<|reserved_special_token_28|>",
678
+ "lstrip": false,
679
+ "normalized": false,
680
+ "rstrip": false,
681
+ "single_word": false,
682
+ "special": true
683
+ },
684
+ "49220": {
685
+ "content": "<|reserved_special_token_29|>",
686
+ "lstrip": false,
687
+ "normalized": false,
688
+ "rstrip": false,
689
+ "single_word": false,
690
+ "special": true
691
+ },
692
+ "49221": {
693
+ "content": "<|reserved_special_token_30|>",
694
+ "lstrip": false,
695
+ "normalized": false,
696
+ "rstrip": false,
697
+ "single_word": false,
698
+ "special": true
699
+ },
700
+ "49222": {
701
+ "content": "<|reserved_special_token_31|>",
702
+ "lstrip": false,
703
+ "normalized": false,
704
+ "rstrip": false,
705
+ "single_word": false,
706
+ "special": true
707
+ },
708
+ "49223": {
709
+ "content": "<|reserved_special_token_32|>",
710
+ "lstrip": false,
711
+ "normalized": false,
712
+ "rstrip": false,
713
+ "single_word": false,
714
+ "special": true
715
+ },
716
+ "49224": {
717
+ "content": "<|reserved_special_token_33|>",
718
+ "lstrip": false,
719
+ "normalized": false,
720
+ "rstrip": false,
721
+ "single_word": false,
722
+ "special": true
723
+ },
724
+ "49225": {
725
+ "content": "<|reserved_special_token_34|>",
726
+ "lstrip": false,
727
+ "normalized": false,
728
+ "rstrip": false,
729
+ "single_word": false,
730
+ "special": true
731
+ },
732
+ "49226": {
733
+ "content": "<|reserved_special_token_35|>",
734
+ "lstrip": false,
735
+ "normalized": false,
736
+ "rstrip": false,
737
+ "single_word": false,
738
+ "special": true
739
+ },
740
+ "49227": {
741
+ "content": "<|reserved_special_token_36|>",
742
+ "lstrip": false,
743
+ "normalized": false,
744
+ "rstrip": false,
745
+ "single_word": false,
746
+ "special": true
747
+ },
748
+ "49228": {
749
+ "content": "<|reserved_special_token_37|>",
750
+ "lstrip": false,
751
+ "normalized": false,
752
+ "rstrip": false,
753
+ "single_word": false,
754
+ "special": true
755
+ },
756
+ "49229": {
757
+ "content": "<|reserved_special_token_38|>",
758
+ "lstrip": false,
759
+ "normalized": false,
760
+ "rstrip": false,
761
+ "single_word": false,
762
+ "special": true
763
+ },
764
+ "49230": {
765
+ "content": "<|reserved_special_token_39|>",
766
+ "lstrip": false,
767
+ "normalized": false,
768
+ "rstrip": false,
769
+ "single_word": false,
770
+ "special": true
771
+ },
772
+ "49231": {
773
+ "content": "<|reserved_special_token_40|>",
774
+ "lstrip": false,
775
+ "normalized": false,
776
+ "rstrip": false,
777
+ "single_word": false,
778
+ "special": true
779
+ },
780
+ "49232": {
781
+ "content": "<|reserved_special_token_41|>",
782
+ "lstrip": false,
783
+ "normalized": false,
784
+ "rstrip": false,
785
+ "single_word": false,
786
+ "special": true
787
+ },
788
+ "49233": {
789
+ "content": "<|reserved_special_token_42|>",
790
+ "lstrip": false,
791
+ "normalized": false,
792
+ "rstrip": false,
793
+ "single_word": false,
794
+ "special": true
795
+ },
796
+ "49234": {
797
+ "content": "<|reserved_special_token_43|>",
798
+ "lstrip": false,
799
+ "normalized": false,
800
+ "rstrip": false,
801
+ "single_word": false,
802
+ "special": true
803
+ },
804
+ "49235": {
805
+ "content": "<|reserved_special_token_44|>",
806
+ "lstrip": false,
807
+ "normalized": false,
808
+ "rstrip": false,
809
+ "single_word": false,
810
+ "special": true
811
+ },
812
+ "49236": {
813
+ "content": "<|reserved_special_token_45|>",
814
+ "lstrip": false,
815
+ "normalized": false,
816
+ "rstrip": false,
817
+ "single_word": false,
818
+ "special": true
819
+ },
820
+ "49237": {
821
+ "content": "<|reserved_special_token_46|>",
822
+ "lstrip": false,
823
+ "normalized": false,
824
+ "rstrip": false,
825
+ "single_word": false,
826
+ "special": true
827
+ },
828
+ "49238": {
829
+ "content": "<|reserved_special_token_47|>",
830
+ "lstrip": false,
831
+ "normalized": false,
832
+ "rstrip": false,
833
+ "single_word": false,
834
+ "special": true
835
+ },
836
+ "49239": {
837
+ "content": "<|reserved_special_token_48|>",
838
+ "lstrip": false,
839
+ "normalized": false,
840
+ "rstrip": false,
841
+ "single_word": false,
842
+ "special": true
843
+ },
844
+ "49240": {
845
+ "content": "<|reserved_special_token_49|>",
846
+ "lstrip": false,
847
+ "normalized": false,
848
+ "rstrip": false,
849
+ "single_word": false,
850
+ "special": true
851
+ },
852
+ "49241": {
853
+ "content": "<|reserved_special_token_50|>",
854
+ "lstrip": false,
855
+ "normalized": false,
856
+ "rstrip": false,
857
+ "single_word": false,
858
+ "special": true
859
+ },
860
+ "49242": {
861
+ "content": "<|reserved_special_token_51|>",
862
+ "lstrip": false,
863
+ "normalized": false,
864
+ "rstrip": false,
865
+ "single_word": false,
866
+ "special": true
867
+ },
868
+ "49243": {
869
+ "content": "<|reserved_special_token_52|>",
870
+ "lstrip": false,
871
+ "normalized": false,
872
+ "rstrip": false,
873
+ "single_word": false,
874
+ "special": true
875
+ },
876
+ "49244": {
877
+ "content": "<|reserved_special_token_53|>",
878
+ "lstrip": false,
879
+ "normalized": false,
880
+ "rstrip": false,
881
+ "single_word": false,
882
+ "special": true
883
+ },
884
+ "49245": {
885
+ "content": "<|reserved_special_token_54|>",
886
+ "lstrip": false,
887
+ "normalized": false,
888
+ "rstrip": false,
889
+ "single_word": false,
890
+ "special": true
891
+ },
892
+ "49246": {
893
+ "content": "<|reserved_special_token_55|>",
894
+ "lstrip": false,
895
+ "normalized": false,
896
+ "rstrip": false,
897
+ "single_word": false,
898
+ "special": true
899
+ },
900
+ "49247": {
901
+ "content": "<|reserved_special_token_56|>",
902
+ "lstrip": false,
903
+ "normalized": false,
904
+ "rstrip": false,
905
+ "single_word": false,
906
+ "special": true
907
+ },
908
+ "49248": {
909
+ "content": "<|reserved_special_token_57|>",
910
+ "lstrip": false,
911
+ "normalized": false,
912
+ "rstrip": false,
913
+ "single_word": false,
914
+ "special": true
915
+ },
916
+ "49249": {
917
+ "content": "<|reserved_special_token_58|>",
918
+ "lstrip": false,
919
+ "normalized": false,
920
+ "rstrip": false,
921
+ "single_word": false,
922
+ "special": true
923
+ },
924
+ "49250": {
925
+ "content": "<|reserved_special_token_59|>",
926
+ "lstrip": false,
927
+ "normalized": false,
928
+ "rstrip": false,
929
+ "single_word": false,
930
+ "special": true
931
+ },
932
+ "49251": {
933
+ "content": "<|reserved_special_token_60|>",
934
+ "lstrip": false,
935
+ "normalized": false,
936
+ "rstrip": false,
937
+ "single_word": false,
938
+ "special": true
939
+ },
940
+ "49252": {
941
+ "content": "<|reserved_special_token_61|>",
942
+ "lstrip": false,
943
+ "normalized": false,
944
+ "rstrip": false,
945
+ "single_word": false,
946
+ "special": true
947
+ },
948
+ "49253": {
949
+ "content": "<|reserved_special_token_62|>",
950
+ "lstrip": false,
951
+ "normalized": false,
952
+ "rstrip": false,
953
+ "single_word": false,
954
+ "special": true
955
+ },
956
+ "49254": {
957
+ "content": "<|reserved_special_token_63|>",
958
+ "lstrip": false,
959
+ "normalized": false,
960
+ "rstrip": false,
961
+ "single_word": false,
962
+ "special": true
963
+ },
964
+ "49255": {
965
+ "content": "<|reserved_special_token_64|>",
966
+ "lstrip": false,
967
+ "normalized": false,
968
+ "rstrip": false,
969
+ "single_word": false,
970
+ "special": true
971
+ },
972
+ "49256": {
973
+ "content": "<|reserved_special_token_65|>",
974
+ "lstrip": false,
975
+ "normalized": false,
976
+ "rstrip": false,
977
+ "single_word": false,
978
+ "special": true
979
+ },
980
+ "49257": {
981
+ "content": "<|reserved_special_token_66|>",
982
+ "lstrip": false,
983
+ "normalized": false,
984
+ "rstrip": false,
985
+ "single_word": false,
986
+ "special": true
987
+ },
988
+ "49258": {
989
+ "content": "<|reserved_special_token_67|>",
990
+ "lstrip": false,
991
+ "normalized": false,
992
+ "rstrip": false,
993
+ "single_word": false,
994
+ "special": true
995
+ },
996
+ "49259": {
997
+ "content": "<|reserved_special_token_68|>",
998
+ "lstrip": false,
999
+ "normalized": false,
1000
+ "rstrip": false,
1001
+ "single_word": false,
1002
+ "special": true
1003
+ },
1004
+ "49260": {
1005
+ "content": "<|reserved_special_token_69|>",
1006
+ "lstrip": false,
1007
+ "normalized": false,
1008
+ "rstrip": false,
1009
+ "single_word": false,
1010
+ "special": true
1011
+ },
1012
+ "49261": {
1013
+ "content": "<|reserved_special_token_70|>",
1014
+ "lstrip": false,
1015
+ "normalized": false,
1016
+ "rstrip": false,
1017
+ "single_word": false,
1018
+ "special": true
1019
+ },
1020
+ "49262": {
1021
+ "content": "<|reserved_special_token_71|>",
1022
+ "lstrip": false,
1023
+ "normalized": false,
1024
+ "rstrip": false,
1025
+ "single_word": false,
1026
+ "special": true
1027
+ },
1028
+ "49263": {
1029
+ "content": "<|reserved_special_token_72|>",
1030
+ "lstrip": false,
1031
+ "normalized": false,
1032
+ "rstrip": false,
1033
+ "single_word": false,
1034
+ "special": true
1035
+ },
1036
+ "49264": {
1037
+ "content": "<|reserved_special_token_73|>",
1038
+ "lstrip": false,
1039
+ "normalized": false,
1040
+ "rstrip": false,
1041
+ "single_word": false,
1042
+ "special": true
1043
+ },
1044
+ "49265": {
1045
+ "content": "<|reserved_special_token_74|>",
1046
+ "lstrip": false,
1047
+ "normalized": false,
1048
+ "rstrip": false,
1049
+ "single_word": false,
1050
+ "special": true
1051
+ },
1052
+ "49266": {
1053
+ "content": "<|reserved_special_token_75|>",
1054
+ "lstrip": false,
1055
+ "normalized": false,
1056
+ "rstrip": false,
1057
+ "single_word": false,
1058
+ "special": true
1059
+ },
1060
+ "49267": {
1061
+ "content": "<|reserved_special_token_76|>",
1062
+ "lstrip": false,
1063
+ "normalized": false,
1064
+ "rstrip": false,
1065
+ "single_word": false,
1066
+ "special": true
1067
+ },
1068
+ "49268": {
1069
+ "content": "<|reserved_special_token_77|>",
1070
+ "lstrip": false,
1071
+ "normalized": false,
1072
+ "rstrip": false,
1073
+ "single_word": false,
1074
+ "special": true
1075
+ },
1076
+ "49269": {
1077
+ "content": "<|reserved_special_token_78|>",
1078
+ "lstrip": false,
1079
+ "normalized": false,
1080
+ "rstrip": false,
1081
+ "single_word": false,
1082
+ "special": true
1083
+ },
1084
+ "49270": {
1085
+ "content": "<|reserved_special_token_79|>",
1086
+ "lstrip": false,
1087
+ "normalized": false,
1088
+ "rstrip": false,
1089
+ "single_word": false,
1090
+ "special": true
1091
+ },
1092
+ "49271": {
1093
+ "content": "<|reserved_special_token_80|>",
1094
+ "lstrip": false,
1095
+ "normalized": false,
1096
+ "rstrip": false,
1097
+ "single_word": false,
1098
+ "special": true
1099
+ },
1100
+ "49272": {
1101
+ "content": "<|reserved_special_token_81|>",
1102
+ "lstrip": false,
1103
+ "normalized": false,
1104
+ "rstrip": false,
1105
+ "single_word": false,
1106
+ "special": true
1107
+ },
1108
+ "49273": {
1109
+ "content": "<|reserved_special_token_82|>",
1110
+ "lstrip": false,
1111
+ "normalized": false,
1112
+ "rstrip": false,
1113
+ "single_word": false,
1114
+ "special": true
1115
+ },
1116
+ "49274": {
1117
+ "content": "<|reserved_special_token_83|>",
1118
+ "lstrip": false,
1119
+ "normalized": false,
1120
+ "rstrip": false,
1121
+ "single_word": false,
1122
+ "special": true
1123
+ },
1124
+ "49275": {
1125
+ "content": "<|reserved_special_token_84|>",
1126
+ "lstrip": false,
1127
+ "normalized": false,
1128
+ "rstrip": false,
1129
+ "single_word": false,
1130
+ "special": true
1131
+ },
1132
+ "49276": {
1133
+ "content": "<|reserved_special_token_85|>",
1134
+ "lstrip": false,
1135
+ "normalized": false,
1136
+ "rstrip": false,
1137
+ "single_word": false,
1138
+ "special": true
1139
+ },
1140
+ "49277": {
1141
+ "content": "<|reserved_special_token_86|>",
1142
+ "lstrip": false,
1143
+ "normalized": false,
1144
+ "rstrip": false,
1145
+ "single_word": false,
1146
+ "special": true
1147
+ },
1148
+ "49278": {
1149
+ "content": "<|reserved_special_token_87|>",
1150
+ "lstrip": false,
1151
+ "normalized": false,
1152
+ "rstrip": false,
1153
+ "single_word": false,
1154
+ "special": true
1155
+ },
1156
+ "49279": {
1157
+ "content": "<end_of_utterance>",
1158
+ "lstrip": false,
1159
+ "normalized": false,
1160
+ "rstrip": false,
1161
+ "single_word": false,
1162
+ "special": true
1163
+ }
1164
+ },
1165
+ "additional_special_tokens": [
1166
+ "<fake_token_around_image>",
1167
+ "<image>",
1168
+ "<end_of_utterance>"
1169
+ ],
1170
+ "bos_token": "<|im_start|>",
1171
+ "clean_up_tokenization_spaces": false,
1172
+ "end_of_utterance_token": "<end_of_utterance>",
1173
+ "eos_token": "<end_of_utterance>",
1174
+ "extra_special_tokens": {
1175
+ "end_of_utterance_token": "<end_of_utterance>",
1176
+ "fake_image_token": "<fake_token_around_image>",
1177
+ "global_image_token": "<global-img>",
1178
+ "image_token": "<image>"
1179
+ },
1180
+ "fake_image_token": "<fake_token_around_image>",
1181
+ "global_image_token": "<global-img>",
1182
+ "image_token": "<image>",
1183
+ "legacy": false,
1184
+ "model_max_length": 16384,
1185
+ "pad_token": "<|im_end|>",
1186
+ "processor_class": "SmolVLMProcessor",
1187
+ "tokenizer_class": "GPT2Tokenizer",
1188
+ "truncation_side": "left",
1189
+ "unk_token": "<|endoftext|>",
1190
+ "vocab_size": 49152
1191
+ }
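The high-numbered entries in `added_tokens_decoder` above follow a regular layout, so their IDs can be recomputed rather than looked up. The following is a minimal sketch of that layout, assuming only the IDs visible in this diff. The helper names (`tile_token_id`, `reserved_token_id`, `build_added_tokens`) are hypothetical and are not part of `transformers` or of this repository.

```python
# Sketch: reconstruct the ID layout of the added tokens 49152..49279.
# All helper names are hypothetical; the constants are read off the diff above.

GRID_SIZE = 6                 # tile tokens cover rows 1..6 and columns 1..6
GLOBAL_IMG_ID = 49152         # "<global-img>"
FIRST_TILE_ID = 49153         # "<row_1_col_1>"
FAKE_IMAGE_ID = 49189         # "<fake_token_around_image>"
IMAGE_ID = 49190              # "<image>"
FIRST_RESERVED_ID = 49191     # "<|reserved_special_token_0|>"
NUM_RESERVED = 88             # reserved tokens are numbered 0..87
END_OF_UTTERANCE_ID = 49279   # "<end_of_utterance>"


def tile_token_id(row, col):
    """Return the ID of "<row_{row}_col_{col}>" (1-based, row-major order)."""
    if not (1 <= row <= GRID_SIZE and 1 <= col <= GRID_SIZE):
        raise ValueError("row and col must be in 1..6")
    return FIRST_TILE_ID + (row - 1) * GRID_SIZE + (col - 1)


def reserved_token_id(index):
    """Return the ID of "<|reserved_special_token_{index}|>"."""
    if not (0 <= index < NUM_RESERVED):
        raise ValueError("index must be in 0..87")
    return FIRST_RESERVED_ID + index


def build_added_tokens():
    """Rebuild the id -> content mapping for IDs 49152 through 49279."""
    tokens = {
        GLOBAL_IMG_ID: "<global-img>",
        FAKE_IMAGE_ID: "<fake_token_around_image>",
        IMAGE_ID: "<image>",
        END_OF_UTTERANCE_ID: "<end_of_utterance>",
    }
    for row in range(1, GRID_SIZE + 1):
        for col in range(1, GRID_SIZE + 1):
            tokens[tile_token_id(row, col)] = f"<row_{row}_col_{col}>"
    for index in range(NUM_RESERVED):
        tokens[reserved_token_id(index)] = f"<|reserved_special_token_{index}|>"
    return tokens
```

The first of these IDs equals the `vocab_size` value of 49152 in the config, which appears consistent with the added tokens being appended after the base vocabulary. The sketch does not cover the low-numbered entries such as `<jupyter_text>`, which do not follow this pattern.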
vocab.json ADDED
The diff for this file is too large to render.