Instructions to use seonglae/yokhal-md with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use seonglae/yokhal-md with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="seonglae/yokhal-md")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("seonglae/yokhal-md")
model = AutoModelForCausalLM.from_pretrained("seonglae/yokhal-md")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use seonglae/yokhal-md with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "seonglae/yokhal-md"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "seonglae/yokhal-md",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/seonglae/yokhal-md

SGLang

How to use seonglae/yokhal-md with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "seonglae/yokhal-md" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "seonglae/yokhal-md",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "seonglae/yokhal-md" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "seonglae/yokhal-md",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
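
The vLLM and SGLang servers above are both called through the same OpenAI-compatible chat endpoint, so the curl calls can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the response layout (`choices[0].message.content`) is assumed to follow the usual OpenAI chat-completions shape.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_text):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, user_text, timeout=60):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, user_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the reply sits under choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server):
# chat("http://localhost:8000", "seonglae/yokhal-md", "What is the capital of France?")
```

Port 8000 matches the vLLM example; for the SGLang examples, pass `http://localhost:30000` as `base_url` instead.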

Docker Model Runner
How to use seonglae/yokhal-md with Docker Model Runner:
```
docker model run hf.co/seonglae/yokhal-md
```

Yokhal (욕쟁이 할머니)

Korean Chatbot based on Google Gemma

Model Details

Model Description

Fine-tuned by: Seonglae Cho
Model type: Gemma
Language(s) (NLP): Korean, English
Finetuned from model: Gemma-2b-it

Model Sources

Repository: https://github.com/seonglae/yokhal
Demo: https://huggingface.co/spaces/seonglae/yokhal

Uses

Direct Use

Korean Chatbot with Internet culture

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.


import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = 'seonglae/yokhal-md'
device = None  # set to e.g. 'cuda:0' to pin the model to a single device

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16,
    device_map="auto" if device is None else device,
    attn_implementation="flash_attention_2")  # drop this argument if flash-attn is not installed
sys_prompt = '한국어로 대답해'  # "Answer in Korean"
texts = ['안녕', '서울은 오늘 어때']  # "Hi", "How is Seoul today?"
# Gemma has no system role, so the system prompt is prepended to each user message
chats = [[{'role': 'user', 'content': f'{sys_prompt}\n{t}'}] for t in texts]
prompts = [tokenizer.apply_chat_template(c, tokenize=False, add_generation_prompt=True) for c in chats]
input_ids = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**input_ids, max_new_tokens=100, repetition_penalty=1.05)
for output in outputs:
    print(tokenizer.decode(output, skip_special_tokens=True), end='\n\n')

Training Details

Trained on 2 × RTX 3090 GPUs.

More information is available in the GitHub source code.

Training Data

[More Information Needed]

Training Procedure

Weights initialized from an Internet comments dataset
Trained on the Korean Namuwiki dataset until step 80000 (the main branch holds the step-30000 checkpoint because of repetition issues beyond that step)

seq_length 1024 with dataset packing
batch 3 per device
lr 1e-5
optim adafactor
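
The card does not describe how dataset packing was implemented. As a rough illustration of the general idea, the sketch below concatenates tokenized documents, separated by an end-of-sequence token, and cuts the stream into fixed-length blocks. The function name `pack_sequences` and the choice to drop the trailing partial block are assumptions for this example, not the author's code.

```python
def pack_sequences(tokenized_docs, seq_length, eos_token_id):
    """Yield fixed-length token blocks from documents joined by an EOS token."""
    buffer = []
    for doc in tokenized_docs:
        buffer.extend(doc)
        buffer.append(eos_token_id)  # mark the document boundary
        while len(buffer) >= seq_length:
            yield buffer[:seq_length]
            buffer = buffer[seq_length:]
    # Assumption: leftover tokens shorter than seq_length are discarded


# Toy example with seq_length=4; the card uses seq_length 1024
blocks = list(pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], seq_length=4, eos_token_id=0))
print(blocks)
```

With the settings listed above on two GPUs, and assuming no gradient accumulation (the card does not say), each optimizer step would cover 2 × 3 × 1024 = 6144 tokens.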

Instruction tuning on a Korean instruction dataset using QLoRA (not on the main branch)

seq_length 2048
lr 2e-4

Preprocessing [optional]

Gemma's chat template does not support an explicit system prompt, so I trained with the system prompt prepended to the first user message, as shown below:

# Fold a leading system message into the first user message
if chat[0]['role'] == 'system':
  chat[1]['content'] = f"{chat[0]['content']}\n{chat[1]['content']}"
  chat = chat[1:]
prompt = tokenizer.apply_chat_template(chat, tokenize=False)
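
The same preprocessing step can be written as a standalone helper that does not modify its input. This is a sketch only: the name `merge_system_prompt` is hypothetical, and it assumes each message is a dict with `role` and `content` keys, as in the snippet above.

```python
def merge_system_prompt(chat):
    """Return a copy of chat with a leading system message folded into the next turn."""
    if len(chat) >= 2 and chat[0]["role"] == "system":
        system, first = chat[0], chat[1]
        merged = {**first, "content": f"{system['content']}\n{first['content']}"}
        return [merged] + [dict(m) for m in chat[2:]]
    # No system message (or nothing to merge into): return an unchanged copy
    return [dict(m) for m in chat]


chat = [
    {"role": "system", "content": "한국어로 대답해"},  # "Answer in Korean"
    {"role": "user", "content": "안녕"},  # "Hi"
]
merged_chat = merge_system_prompt(chat)
print(merged_chat)
```

The result can be passed to `tokenizer.apply_chat_template` in place of the original list.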

Source Code

Training Hyperparameters

Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Safetensors
Model size: 3B params
Tensor type: BF16
