bigshanedogg committed on May 28

Commit

a03175b

1 Parent(s): ddcb423

Migration to Native transformers Support (>= 5.9.0) (#12)

- Migration to Native transformers Support (>= 5.9.0) (76f1e8140d2f2658fb681fa07de08155096cce06)

Co-authored-by: bigshane <bigshanedogg@users.noreply.huggingface.co>

Files changed (4)

README.md +42 -306
chat_template.jinja +108 -0
generation_config.json +5 -2
tokenizer_config.json +1 -1

README.md CHANGED

@@ -369,7 +369,7 @@ For all later turns, the reasoning (think) content from previous turns is not ad
 ## **Huggingface Usage Example**
-After downloading the model binaries, including the configuration files, to a local path(`/path/to/hyperclova-x-seed-think-14b`), you can run the following in a Python environment with the [Huggingface library](https://huggingface.co/docs/transformers/installation)(verified to work with version >= 4.53.0) and [timm(pytorch-image-models)](https://github.com/huggingface/pytorch-image-models) installed.
 You can use the `apply_chat_template` parameter to explicitly enable or disable the reasoning feature.
@@ -390,70 +390,16 @@ inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, force_r
 inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, skip_reasoning=True, return_dict=True, return_tensors="pt")
 ```
-### Non-think Example Code
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model_name = "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B"
-model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto")
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-chat = [
-  {"role": "system", "content": "- In this environment, various tools can be used to answer users' questions.\n- You are \"CLOVA X,\" an AI language model developed by NAVER.\n- Begin by creating a plan for solving the problem, and then utilize the tools accordingly to address the problem.\n- The current date is Monday, July 21, 2025.\n- Latest information such as news, stock prices, and shopping is retrieved through the tool_list.\n- If external tools are required, the assistant should not answer directly but must first obtain the necessary information via the assistant -> tool/function_call role, and then respond."},
-  {"role": "user", "content": "Explain in as much detail as possible the relationship between the Schrödinger equation and quantum mechanics."},
-]
-# By adding skip_reasoning=True, the model is forced to always answer directly without reasoning
-inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, skip_reasoning=True, return_dict=True, return_tensors="pt")
-inputs = inputs.to("cuda")
-output_ids = model.generate(
-    **inputs,
-    max_length=1024,
-    stop_strings=["<|endofturn|>", "<|stop|>"],
-    temperature=0.5,
-    top_p=0.6,
-    repetition_penalty=1.05,
-    tokenizer=tokenizer
-)
-print(tokenizer.batch_decode(output_ids))
-```
-### Think Example Code
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model_name = "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B"
-model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto")
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-chat = [
-  {"role": "system", "content": "- In this environment, various tools can be used to answer users' questions.\n- You are \"CLOVA X,\" an AI language model developed by NAVER.\n- Begin by creating a plan for solving the problem, and then utilize the tools accordingly to address the problem.\n- The current date is Monday, July 21, 2025.\n- Latest information such as news, stock prices, and shopping is retrieved through the tool_list.\n- If external tools are required, the assistant should not answer directly but must first obtain the necessary information via the assistant -> tool/function_call role, and then respond."},
-  {"role": "user", "content": "Explain in as much detail as possible the relationship between the Schrödinger equation and quantum mechanics."},
-]
-# By adding force_reasoning=True, the model is forced to always reason before responding
-inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, force_reasoning=True, return_dict=True, return_tensors="pt")
-inputs = inputs.to("cuda")
-output_ids = model.generate(
-    **inputs,
-    max_length=1024,
-    stop_strings=["<|endofturn|>", "<|stop|>"],
-    temperature=0.5,
-    top_p=0.6,
-    repetition_penalty=1.05,
-    tokenizer=tokenizer
-)
-print(tokenizer.batch_decode(output_ids))
-```
-### Hybrid(the model decides whether to use think or non-think mode) Example Code
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_name = "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B"
-model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto")
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 chat = [
@@ -461,8 +407,10 @@ chat = [
   {"role": "user", "content": "Explain in as much detail as possible the relationship between the Schrödinger equation and quantum mechanics."},
 ]
-# The model decides whether to answer after reasoning or to respond immediately without reasoning
 inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_dict=True, return_tensors="pt")
 inputs = inputs.to("cuda")
 output_ids = model.generate(
@@ -485,7 +433,7 @@ import json
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_name = "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B"
-model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto")
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 # 1) The name of the tool should be written as function_call.{{ name }}.
@@ -547,265 +495,53 @@ print(tokenizer.batch_decode(output_ids))
 ## **vLLM Usage Example**
-The HyperCLOVA X SEED Think model is built on a custom LLM architecture based on the LLaMA architecture, incorporating μP and Peri-LN techniques. For convenient use with vLLM, it is available as a dedicated vLLM plugin that can be installed and used with ease once vLLM is set up.
-1. Download vLLM plugin source code
-    ```bash
-    git clone https://github.com/NAVER-Cloud-HyperCLOVA-X/hcx-vllm-plugin
-    ```
-2. vLLM Plugin Build & Installation: While keeping the NAVER-Cloud-HyperCLOVA-X/hcx-vllm-plugin path downloaded in step 1, refer to the commands below.
-    ```bash
-    pip install -e .
-    ```
-After downloading the model checkpoint to a local path (`/path/to/hyperclova-x-seed-think-14b`), you can perform text inference by running the following commands on a GPU environment with A100 or higher.
 ```bash
-python -m vllm.entrypoints.openai.api_server --model=/path/to/hyperclova-x-seed-think-14b --trust_remote_code --port=8000
-curl http://localhost:8000/v1/completions \
 -H "Content-Type: application/json" \
 -d '{
-"prompt": "<|im_start|>tool_list\n<|im_end|>\n<|im_start|>system\n- The AI language model is named \"CLOVA X\" and was developed by NAVER.\n- Today is Friday, July 18, 2025.<|im_end|>\n<|im_start|>user\nExplain in as much detail as possible the relationship between the Schrödinger equation and quantum mechanics.<|im_end|>\n<|im_start|>assistant/think\n",
-"top_k":-1,
-"temperature":0.5,
-"top_p":0.6,
-"repetition_penalty":1.05,
-"stop":["<|im_end|><|endofturn|>", "<|im_end|><|stop|>"],
-"max_tokens":8192,
-"skip_special_tokens":false
 }'
 ```
-### Chat Completions Usage Example
-1.  Using the Chat completions endpoint
-<!-- end list -->
-  - Basic serving script (same as completions)
-    `vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B --trust_remote_code`
-  - Sampling parameters such as `top_k`, `temperature`, `top_p`, `repetition_penalty`, and `max_tokens` can be set freely.
-  - However, the `skip_special_tokens` and `stop` options must be set as below for vLLM to recognize the model's token generation stop signal and cease generation.
-  - request example
-    ```bash
-    curl -X POST http://localhost:8000/v1/chat/completions \
-    -H "Content-Type: application/json" \
-    -d '{
-        "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
-        "messages": [
-            {"role": "system", "content": "- The AI language model is named \"CLOVA X\" and was developed by NAVER.\n- Today is Friday, July 18, 2025."},
-            {"role": "user", "content": "Explain in as much detail as possible the relationship between the Schrödinger equation and quantum mechanics."}
-        ],
-        "skip_special_tokens":false,
-        "stop": [
-            "<|im_end|><|endofturn|>",
-            "<|im_end|><|stop|>"
-        ],
-        "top_k": -1,
-        "temperature": 0.5,
-        "top_p": 0.6,
-        "repetition_penalty": 1.05,
-        "max_tokens": 8192
-    }'
-    ```
-<!-- end list -->
-2.  tool call usage example
-<!-- end list -->
-  - Serving script
-      - You need to add `--enable-auto-tool-choice --tool-call-parser hcx` to the existing script.
-        `vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B --trust_remote_code --enable-auto-tool-choice --tool-call-parser hcx`
-  - request example
-      - If you put the available tools in `tools`, they will be applied to the `tool_list` part passed to the model.
-        ```bash
-        curl -X POST http://localhost:8000/v1/chat/completions \
-        -H "Content-Type: application/json" \
-        -d '{
-            "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
-            "messages": [
-            {"role": "user", "content": "Could you please tell me the current weather conditions in Boston, MA in Celsius?"}
-            ],
-            "stop": ["<|im_end|><|stop|>", "<|im_end|><|endofturn|>"],
-            "max_tokens": 8192,
-            "skip_special_tokens": false,
-            "tools": [
-            {
-                "type": "function",
-                "function": {
-                "name": "get_current_weather",
-                "description": "Retrieves the current weather conditions for a specified city and state.",
-                "parameters": {
-                    "type": "object",
-                    "required": ["location"],
-                    "properties": {
-                    "location": {
-                        "type": "string",
-                        "description": "The location for which to get the weather, in the format of \'\\'City, State\\'', such as \'\\'San Francisco, CA\\'' if State for the city exists. \'\\'City, Country\\'' if State for the city does not exist. Use short form for state."
-                    },
-                    "unit": {
-                        "type": "string",
-                        "description": "The unit of temperature for the weather report.",
-                        "enum": ["celsius", "fahrenheit"],
-                        "default": "fahrenheit"
-                    }
-                }
-                }
-            }
-            }
-        ]
-        }'
-        ```
-  - response example
-      - Parsed tool calls are returned in the `tool_calls` field.
-      - If there is a response generated by the model other than the tool call, it is returned in the `content` field. Otherwise, `null` is returned.
-        ```bash
-        {
-            "id": "chatcmpl-b9aad45639464c0ebf71861df13b4eb2",
-            "object": "chat.completion",
-            "created": 1753358351,
-            "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
-            "choices": [
-                {
-                    "index": 0,
-                    "message": {
-                        "role": "assistant",
-                        "reasoning_content": null,
-                        "content": null,
-                        "tool_calls": [
-                            {
-                                "id": "chatcmpl-tool-b9513352e4c64065a315e495b2613753",
-                                "type": "function",
-                                "function": {
-                                    "name": "get_current_weather",
-                                    "arguments": "{\"location\": \"Boston, MA\", \"unit\": \"celsius\"}"
-                                }
-                            }
-                        ]
-                    },
-                    "logprobs": null,
-                    "finish_reason": "tool_calls",
-                    "stop_reason": "<|im_end|><|stop|>"
-                }
-            ],
-            "usage": {
-                "prompt_tokens": 189,
-                "total_tokens": 224,
-                "completion_tokens": 35,
-                "prompt_tokens_details": null
-            },
-            "prompt_logprobs": null,
-            "kv_transfer_params": null
-        }
-        ```
-<!-- end list -->
-3.  reasoning usage example
-<!-- end list -->
-  - Serving script
-      - You need to add `--enable-reasoning --reasoning-parser hcx` to the existing script.
-        `vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B --trust_remote_code --enable-reasoning --reasoning-parser hcx`
-      - The `--enable-reasoning` option has been deprecated since vLLM v0.9.0.
-          - If you are using vLLM v0.9.0 or higher, you only need to add `--reasoning-parser hcx` without `--enable-reasoning`.
-      - The reasoning parser extracts the reasoning content from responses generated in reasoning mode. This option does not always make the model operate in reasoning mode, nor does excluding the parser necessarily force non-reasoning operation.
-      - request example
-          - `"chat_template_kwargs": {"force_reasoning": true}` forces reasoning.
-          - `"chat_template_kwargs": {"skip_reasoning": true}` forces non-reasoning.
-          - If both are set to `true`, `force_reasoning: true` has higher priority.
-          - If neither is given, the model decides whether to reason or not.
-        <!-- end list -->
-        ```bash
-        curl -X POST http://localhost:8000/v1/chat/completions \
-        -H "Content-Type: application/json" \
-        -d '{
-            "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
-            "messages": [
-                {"role": "system", "content": "You are a helpful assistant."},
-                {"role": "user", "content": "Tell me the prime number closest to 1000."}
-            ],
-            "stop": ["<|im_end|><|stop|>", "<|im_end|><|endofturn|>"],
-            "max_tokens": 8192,
-            "skip_special_tokens": false,
-            "chat_template_kwargs": {"force_reasoning": true}
-        }'
-        ```
-      - response example
-          - The reasoning part is returned in the `reasoning_content` field, and the assistant's final response is returned in the `content` field separately.
-        <!-- end list -->
-        ```bash
-        {
-            "id": "chatcmpl-157d282ebaca4333a9f04b1bdfa7eb8b",
-            "object": "chat.completion",
-            "created": 1753361336,
-            "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
-            "choices": [
-                {
-                    "index": 0,
-                    "message": {
-                        "role": "assistant",
-                        "reasoning_content": "Okay, so I need to find the prime number closest to 1000. Hmm, let's start by recalling what a prime number is. ... (중략) ... But in this case, at least for the nearest, 3 versus9, which gives997 as the answer.\n\nTherefore, the correct answer would be997.",
-                        "content": "\nThe prime number closest to 1000 is **997**. \n\nHere's the reasoning:\n1. **Check if 1000 is prime:** It's even, so not prime.\n2. **Find primes near 1000:**\n   - **Below 1000:** Check numbers downward. \n     - Upon verifying, 997 is prime (no divisors up to its square root).\n   - **Above 1000:** Next prime after 997 is 1009, which is 9 units away from 1000.\n3. **Compare distances:**\n   - \\( |1000 - 997| = 3 \\)\n   - \\( |1009 - 1000| = 9 \\)\n   \nSince 3 < 9, **997** is the closest prime. \n\n\\boxed{997}",
-                        "tool_calls": []
-                    },
-                    "logprobs": null,
-                    "finish_reason": "stop",
-                    "stop_reason": "<|im_end|><|endofturn|>"
-                }
-            ],
-            "usage": {
-                "prompt_tokens": 38,
-                "total_tokens": 3254,
-                "completion_tokens": 3216,
-                "prompt_tokens_details": null
-            },
-            "prompt_logprobs": null,
-            "kv_transfer_params": null
-        }
-        ```
-<!-- end list -->
-4.  reasoning + tool call usage example
-<!-- end list -->
-  - Serving script
-      - If you want to use both the reasoning parser and the tool call parser, you can combine the reasoning serving script and the tool call serving script.
-        `vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B --trust_remote_code --enable-auto-tool-choice --tool-call-parser hcx --enable-reasoning --reasoning-parser hcx`
 ## License
@@ -828,4 +564,4 @@ The model is licensed under [HyperCLOVA X SEED Model License Agreement](./LICENS
 ## Questions
-For any other questions, please feel free to contact us at [dl_hcxopensource@navercorp.com](mailto:dl_hcxopensource@navercorp.com).

 ## **Huggingface Usage Example**
+After downloading the model binaries, including the configuration files, to a local path(`/path/to/hyperclova-x-seed-think-14b`), you can run the following in a Python environment with the [Huggingface library](https://huggingface.co/docs/transformers/installation) (verified to work with version >= 5.9.0) installed.
 You can use the `apply_chat_template` parameter to explicitly enable or disable the reasoning feature.
 inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, skip_reasoning=True, return_dict=True, return_tensors="pt")
 ```
+### Basic Example Code
+The example below runs the model in **hybrid mode** (the model decides whether to reason).
+To force a specific mode, replace the `inputs = ...` line with one of the commented alternatives.
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_name = "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B"
+model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 chat = [
   {"role": "user", "content": "Explain in as much detail as possible the relationship between the Schrödinger equation and quantum mechanics."},
 ]
+# Hybrid (default): the model decides whether to reason before answering.
 inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_dict=True, return_tensors="pt")
+# Think mode:     inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, force_reasoning=True, return_dict=True, return_tensors="pt")
+# Non-think mode: inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, skip_reasoning=True,  return_dict=True, return_tensors="pt")
 inputs = inputs.to("cuda")
 output_ids = model.generate(
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_name = "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B"
+model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 # 1) The name of the tool should be written as function_call.{{ name }}.
 ## **vLLM Usage Example**
+The HyperCLOVA X SEED Think model is natively supported by vLLM. After installing vLLM, you can serve the model directly.
+```bash
+pip install vllm
+vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B
+```
+### Chat Completions Request
+- Sampling parameters such as `top_k`, `temperature`, `top_p`, `repetition_penalty`, and `max_tokens` can be set freely.
+- However, the `skip_special_tokens` and `stop` options must be set as below for vLLM to recognize the model's token generation stop signal and cease generation.
 ```bash
+curl -X POST http://localhost:8000/v1/chat/completions \
 -H "Content-Type: application/json" \
 -d '{
+    "model": "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B",
+    "messages": [
+        {"role": "system", "content": "- The AI language model is named \"CLOVA X\" and was developed by NAVER.\n- Today is Friday, July 18, 2025."},
+        {"role": "user", "content": "Explain in as much detail as possible the relationship between the Schrödinger equation and quantum mechanics."}
+    ],
+    "skip_special_tokens": false,
+    "stop": ["<|im_end|><|endofturn|>", "<|im_end|><|stop|>"],
+    "top_k": -1,
+    "temperature": 0.5,
+    "top_p": 0.6,
+    "repetition_penalty": 1.05,
+    "max_tokens": 8192
 }'
 ```
+### Controlling Reasoning Mode
+Use `chat_template_kwargs` in the request body to control reasoning behavior:
+- `{"force_reasoning": true}` — Always reason before answering.
+- `{"skip_reasoning": true}` — Always answer directly without reasoning.
+- If neither is given, the model decides on its own.
+- If both are set to `true`, `force_reasoning` takes priority.
+Example:
+```json
+"chat_template_kwargs": {"force_reasoning": true}
+```
+> **Note:** Native `--reasoning-parser` and `--tool-call-parser` support for HyperCLOVA-X is not yet available in vLLM upstream. To extract `reasoning_content` and `tool_calls` as structured response fields, please refer to the [hcx-vllm-plugin](https://github.com/NAVER-Cloud-HyperCLOVA-X/hcx-vllm-plugin) repository.
 ## License
 ## Questions
+For any other questions, please feel free to contact us at [dl_hcxopensource@navercorp.com](mailto:dl_hcxopensource@navercorp.com).
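
The updated README controls reasoning through `chat_template_kwargs` in the request body. As a minimal sketch of that request in Python (standard library only), the snippet below builds the same body as the curl example and posts it to a locally running `vllm serve` instance. The helper names `build_payload` and `post_chat`, the mode labels, and the `http://localhost:8000` base URL are assumptions made for illustration; they are not part of the commit.

```python
import json
import urllib.request

MODEL = "naver-hyperclovax/HyperCLOVAX-SEED-Think-14B"
# Stop sequences copied from the README's chat completions example.
STOP = ["<|im_end|><|endofturn|>", "<|im_end|><|stop|>"]


def build_payload(user_text, mode="hybrid", max_tokens=8192):
    """Build a chat completions body.

    mode is a hypothetical label: "hybrid", "think", or "non-think".
    """
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_text}],
        "skip_special_tokens": False,
        "stop": STOP,
        "max_tokens": max_tokens,
    }
    if mode == "think":
        payload["chat_template_kwargs"] = {"force_reasoning": True}
    elif mode == "non-think":
        payload["chat_template_kwargs"] = {"skip_reasoning": True}
    elif mode != "hybrid":
        raise ValueError(f"unknown mode: {mode!r}")
    return payload


def post_chat(payload, base_url="http://localhost:8000"):
    """POST the body to an OpenAI-compatible server (assumed to be running)."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds and prints the body; call post_chat(...) once a server is up.
    print(json.dumps(build_payload("Tell me the prime number closest to 1000.", "think"), indent=2))
```

In hybrid mode the body carries no `chat_template_kwargs` at all, which matches the README's statement that the model decides on its own when neither flag is given.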

chat_template.jinja ADDED

	@@ -0,0 +1,108 @@

+{% if tools is not defined or tools is none %}
+    {{- '<|im_start|>tool_list\n<|im_end|>\n' }}
+{%- else %}
+    {{- '<|im_start|>tool_list\n[' }}
+    {%- for tool in tools %}
+        {{- '{"name": "' }}
+        {{- tool.function.name }}
+        {{- '", ' }}
+        {{- '"description": "' }}
+        {{- tool.function.description }}
+        {{- '"' }}
+        {%- if tool.function.parameters is defined %}
+            {{- ', "parameters": ' }}
+            {{- tool.function.parameters | tojson }}
+        {%- endif %}
+        {{- '}' }}
+        {%- if not loop.last %}
+            {{- ', ' }}
+        {%- endif %}
+    {%- endfor %}
+{{- ']<|im_end|>\n' }}
+{%- endif %}
+{%- set ns = namespace(is_searching=true, last_query_index=-1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.is_searching and (message.role == 'user' or message.role == 'tool') %}
+        {%- set ns.last_query_index = index %}
+        {%- set ns.is_searching = false %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if loop.index0 == 0 and message.role != 'system' %}
+        {{- '<|im_start|>system\n<|im_end|>\n' }}
+    {%- endif %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- elif message.content is iterable and message.content is not none %}
+        {%- set ns_content = namespace(text='') %}
+        {%- for part in message.content %}
+            {%- if part.type is defined and part.type == 'text' and part.text is defined %}
+                {%- set ns_content.text = ns_content.text + part.text %}
+            {%- endif %}
+        {%- endfor %}
+        {%- set content = ns_content.text %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- set reasoning_content = '' %}
+    {%- if message.reasoning_content is defined and message.reasoning_content is not none %}
+        {%- set reasoning_content = message.reasoning_content %}
+    {%- endif %}
+    {%- if message.role == "assistant" %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if reasoning_content %}
+                {{- '<|im_start|>assistant/think\n' + reasoning_content.strip('\n') + '<|im_end|>\n' }}
+            {%- endif %}
+        {%- endif %}
+        {%- if content %}
+            {{- '<|im_start|>assistant\n' + content.strip('\n') + '<|im_end|>' }}
+            {%- if message.tool_calls %}
+                {{- '\n' }}
+            {%- else %}
+                {{- '<|endofturn|>\n' }}
+            {%- endif %}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {{- '<|im_start|>assistant -> tool/function_call\n[' }}
+            {%- for tool_call in message.tool_calls %}
+                {%- if not loop.first %}
+                    {{- ', ' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}' }}
+                {%- endfor %}
+            {{- ']<|im_end|><|stop|>\n' }}
+        {%- endif %}
+    {%- elif message.role == "tool" %}
+        {{- '<|im_start|>tool/function_call\n' + content + '<|im_end|>\n' }}
+    {%- else %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {%- if force_reasoning is defined and force_reasoning is true %}
+        {{- '<|im_start|>assistant/think\n' }}
+    {%- elif skip_reasoning is defined and skip_reasoning is true %}
+        {{- '<|im_start|>assistant/think\n<|im_end|>\n<|im_start|>assistant\n' }}
+    {%- else %}
+        {{- '<|im_start|>assistant' }}
+    {%- endif %}
+{%- endif %}
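
To make the template's control flow easier to follow, here is a plain-Python sketch of three pieces of it: the generation-prompt branch at the end, the reverse search for the last `user`/`tool` message, and the flattening of list-style content. The function names are hypothetical and the code only mirrors the Jinja logic above; it is not how Transformers renders the template.

```python
def generation_prompt(force_reasoning=False, skip_reasoning=False):
    """Mirror the add_generation_prompt branch; force_reasoning wins if both are set."""
    if force_reasoning is True:
        return "<|im_start|>assistant/think\n"
    if skip_reasoning is True:
        # An empty think block is emitted before the answer turn.
        return "<|im_start|>assistant/think\n<|im_end|>\n<|im_start|>assistant\n"
    return "<|im_start|>assistant"


def last_query_index(messages):
    """Index of the last 'user' or 'tool' message, or -1 if there is none."""
    for index in range(len(messages) - 1, -1, -1):
        if messages[index]["role"] in ("user", "tool"):
            return index
    return -1


def keeps_reasoning(messages, index):
    """Reasoning is rendered only for assistant turns after the last query."""
    message = messages[index]
    return (
        message["role"] == "assistant"
        and index > last_query_index(messages)
        and bool(message.get("reasoning_content"))
    )


def flatten_content(content):
    """Strings pass through; lists keep only the text of 'text' parts."""
    if isinstance(content, str):
        return content
    if isinstance(content, (list, tuple)):
        return "".join(
            part["text"]
            for part in content
            if isinstance(part, dict) and part.get("type") == "text" and "text" in part
        )
    return ""
```

Two differences from the template removed from `tokenizer_config.json` are visible here: `last_query_index` now falls back to `-1` rather than `messages|length - 1`, and `skip_reasoning` now emits an empty `assistant/think` block instead of opening the `assistant` turn directly.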

generation_config.json CHANGED

@@ -1,8 +1,11 @@
 {
-  "_from_model_config": true,
   "bos_token_id": 100257,
   "eos_token_id": 100257,
   "pad_token_id": 100257,
-  "transformers_version": "4.52.4",
   "use_cache": false
 }

 {
   "bos_token_id": 100257,
   "eos_token_id": 100257,
+  "max_new_tokens": 256,
   "pad_token_id": 100257,
+  "stop_strings": [
+    "<|endofturn|>",
+    "<|stop|>"
+  ],
   "use_cache": false
 }
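
The new defaults add `max_new_tokens` and `stop_strings` to the generation config. The sketch below illustrates, on already-decoded text, what stopping at the first stop string means: output is kept up to and including the earliest completed stop string. `cut_at_stop_string` is a hypothetical helper for illustration, not the Transformers implementation. Note also that the README example removed in this commit passed `tokenizer=tokenizer` to `generate` alongside `stop_strings`; whether that argument is still needed with 5.9.0 is not confirmed here.

```python
# Stop strings as listed in the updated generation_config.json.
DEFAULT_STOP_STRINGS = ("<|endofturn|>", "<|stop|>")


def cut_at_stop_string(text, stop_strings=DEFAULT_STOP_STRINGS):
    """Keep text up to and including the earliest completed stop string."""
    ends = [text.find(stop) + len(stop) for stop in stop_strings if stop in text]
    return text[: min(ends)] if ends else text


if __name__ == "__main__":
    sample = "Hello<|im_end|><|endofturn|>ignored tail"
    print(cut_at_stop_string(sample))
```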

tokenizer_config.json CHANGED

@@ -491,9 +491,9 @@
     "<PASSWORD>"
   ],
   "bos_token": "<|endoftext|>",
-  "chat_template": "{% if tools is not defined or tools is none %}\n    {{- '<|im_start|>tool_list\\n<|im_end|>\\n' }}\n{%- else %}\n    {{- '<|im_start|>tool_list\\n[' }}\n    {%- for tool in tools %}\n        {{- '{\"name\": \"' }}\n        {{- tool.function.name }}\n        {{- '\", ' }}\n        {{- '\"description\": \"' }}\n        {{- tool.function.description }}\n        {{- '\"' }}\n        {%- if tool.function.parameters is defined %}\n            {{- ', \"parameters\": ' }}\n            {{- tool.function.parameters | tojson }}\n        {%- endif %}\n        {{- '}' }}\n        {%- if not loop.last %}\n            {{- ', ' }}\n        {%- endif %}\n    {%- endfor %}\n{{- ']<|im_end|>\\n' }}\n{%- endif %}\n\n{%- set ns = namespace(is_searching=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n    {%- set index = (messages|length - 1) - loop.index0 %}\n    {%- if ns.is_searching and (message.role == 'user' or message.role == 'tool') %}\n        {%- set ns.last_query_index = index %}\n        {%- set ns.is_searching = false %}\n    {%- endif %}\n{%- endfor %}\n\n{%- for message in messages %}\n    {%- if loop.index0 == 0 and message.role != 'system' %}\n        {{- '<|im_start|>system\\n<|im_end|>\\n' }}\n    {%- endif %}\n\n    {%- if message.content is string %}\n        {%- set content = message.content %}\n    {%- else %}\n        {%- set content = '' %}\n    {%- endif %}\n\n    {%- set reasoning_content = '' %}\n    {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n        {%- set reasoning_content = message.reasoning_content %}    \n    {%- endif %}\n    {%- if message.role == \"assistant\" %}\n        {%- if loop.index0 > ns.last_query_index %}\n            {%- if reasoning_content %}\n                {{- '<|im_start|>assistant/think\\n' + reasoning_content.strip('\\n') + '<|im_end|>\\n' }}\n            {%- endif %}\n        {%- endif %}\n\n        {%- if content %}\n          
  {{- '<|im_start|>assistant\\n' + content.strip('\\n') + '<|im_end|>' }}\n            {%- if message.tool_calls %}\n                {{- '\\n' }}\n            {%- else %}\n                {{- '<|endofturn|>\\n' }}\n            {%- endif %}\n        {%- endif %}\n\n        {%- if message.tool_calls %}\n            {{- '<|im_start|>assistant -> tool/function_call\\n[' }}\n            {%- for tool_call in message.tool_calls %}\n                {%- if not loop.first %}\n                    {{- ', ' }}\n                {%- endif %}\n                {%- if tool_call.function %}\n                    {%- set tool_call = tool_call.function %}\n                {%- endif %}\n                {{- '{\"name\": \"' }}\n                {{- tool_call.name }}\n                {{- '\", \"arguments\": ' }}\n                {%- if tool_call.arguments is string %}\n                    {{- tool_call.arguments }}\n                {%- else %}\n                    {{- tool_call.arguments | tojson }}\n                {%- endif %}\n                {{- '}' }}\n                {%- endfor %}\n            {{- ']<|im_end|><|stop|>\\n' }}\n\n        {%- endif %}\n    {%- elif message.role == \"tool\" %}\n        {{- '<|im_start|>tool/function_call\\n' + content + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {%- if force_reasoning is defined and force_reasoning is true %}\n        {{- '<|im_start|>assistant/think\\n' }}\n    {%- elif skip_reasoning is defined and skip_reasoning is true %}\n        {{- '<|im_start|>assistant\\n' }}\n    {%- else %}\n        {{- '<|im_start|>assistant' }}\n    {%- endif %}\n{%- endif %}",
   "clean_up_tokenization_spaces": true,
   "eos_token": "<|endoftext|>",
   "model_max_length": 1000000000000000019884624838656,
   "pad_token": "<|endoftext|>",
   "tokenizer_class": "GPT2Tokenizer",

     "<PASSWORD>"
   ],
   "bos_token": "<|endoftext|>",
   "clean_up_tokenization_spaces": true,
   "eos_token": "<|endoftext|>",
+  "extra_special_tokens": {},
   "model_max_length": 1000000000000000019884624838656,
   "pad_token": "<|endoftext|>",
   "tokenizer_class": "GPT2Tokenizer",