Yokhal (욕쟁이 할머니, "Swearing Grandma")
Korean Chatbot based on Google Gemma
Model Details
Model Description
- Fine-tuned by: Seonglae Cho
- Model type: Gemma
- Language(s) (NLP): Korean, English
- Finetuned from model: Gemma-2b-it
Model Sources
Uses
Direct Use
A Korean chatbot that speaks in the style of Korean Internet culture
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model.
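import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = 'seonglae/yokhal-md'
device = None  # None lets device_map="auto" place the model; or set e.g. 'cuda:0'

tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side='left')  # left-pad for batched generation
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16,
                                             device_map="auto" if device is None else device,
                                             attn_implementation="flash_attention_2")  # requires flash-attn; drop this argument otherwise
sys_prompt = '한국어로 대답해'  # "Answer in Korean"
texts = ['안녕', '서울은 오늘 어때']  # "Hi", "How is Seoul today?"
# Gemma has no system role, so the system prompt is prepended to the user message
chats = [[{'role': 'user', 'content': f'{sys_prompt}\n{t}'}] for t in texts]
prompts = [tokenizer.apply_chat_template(c, tokenize=False, add_generation_prompt=True) for c in chats]
# The chat template already inserts <bos>, so do not add special tokens a second time
inputs = tokenizer(prompts, return_tensors="pt", padding=True, add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, repetition_penalty=1.05)
for output in outputs:
    print(tokenizer.decode(output, skip_special_tokens=True), end='\n\n')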
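The `repetition_penalty=1.05` argument is handled in Transformers by `RepetitionPenaltyLogitsProcessor`, which rescales the logit of every token that already appears in the context: positive logits are divided by the penalty and negative logits are multiplied by it, so both become less likely. Below is a minimal pure-Python sketch of that rule; the function name `apply_repetition_penalty` is hypothetical, and the real processor operates on batched tensors inside `generate`.

```python
from typing import List, Sequence


def apply_repetition_penalty(logits: Sequence[float], seen_ids: Sequence[int],
                             penalty: float = 1.05) -> List[float]:
    """Penalize every token id that already appears in the context.

    Positive logits are divided by `penalty` and negative logits are
    multiplied by it, so both move toward "less likely".
    """
    out = list(logits)
    for tok in set(seen_ids):  # each seen token is penalized once, however often it occurs
        score = out[tok]
        out[tok] = score * penalty if score < 0 else score / penalty
    return out


if __name__ == "__main__":
    logits = [2.1, -0.5, 0.0, 1.05]
    print(apply_repetition_penalty(logits, seen_ids=[0, 1, 1]))
```

A value as small as 1.05 only nudges repeated tokens down; larger values suppress repetition more aggressively at the cost of fluency.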
Training Details
Trained on 2× RTX 3090 GPUs.
More information is available in the GitHub source code.
Training Data
[More Information Needed]
Training Procedure
- Weights initialized from a model trained on an Internet comments dataset
- Trained on the Korean Namuwiki dataset up to step 80,000 (the main branch holds the step-30,000 checkpoint, because a repetition issue appeared beyond that point)
  - seq_length: 1024, with dataset packing
  - batch: 3 per device
  - lr: 1e-5
  - optim: adafactor
- Instruction tuning on a Korean instruction dataset using QLoRA (not on the main branch)
  - seq_length: 2048
  - lr: 2e-4
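"Dataset packing" means concatenating many short documents into one token stream and cutting it into fixed-length blocks, so no compute is wasted on padding. The sketch below assumes an EOS token separates documents and that the final partial block is dropped, as in the `group_texts` step of the Hugging Face causal-LM examples; the actual training code is not shown here, and `pack_sequences` is a hypothetical name.

```python
from typing import Iterable, List


def pack_sequences(docs: Iterable[List[int]], seq_length: int, eos_id: int) -> List[List[int]]:
    """Concatenate tokenized documents (each followed by an EOS token) into one
    stream and cut it into blocks of exactly seq_length tokens.
    The trailing remainder shorter than seq_length is dropped."""
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)  # mark the document boundary
    n_full = len(stream) // seq_length
    return [stream[i * seq_length:(i + 1) * seq_length] for i in range(n_full)]


if __name__ == "__main__":
    docs = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
    for block in pack_sequences(docs, seq_length=4, eos_id=1):
        print(block)
```

Because every packed sequence is exactly 1,024 tokens, one optimizer step at 3 sequences per device on 2 GPUs covers 2 × 3 × 1,024 = 6,144 tokens, assuming plain data parallelism and no gradient accumulation (neither is stated above). Under those assumptions the step-30,000 checkpoint has seen about 184M tokens and step 80,000 about 492M.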
Preprocessing [optional]
Gemma's chat template does not support an explicit system role, so during training I prepended the system prompt to the first user message, as follows:
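def to_prompt(chat, tokenizer):
    # Gemma has no system role: fold the system message into the first user turn
    if chat[0]['role'] == 'system':
        chat[1]['content'] = f"{chat[0]['content']}\n{chat[1]['content']}"
        chat = chat[1:]
    try:
        return tokenizer.apply_chat_template(chat, tokenize=False)
    except Exception:  # the Gemma template raises on non-alternating roles; skip such chats
        return None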
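To show what text the model actually receives after this preprocessing, here is a pure-Python approximation of the gemma-2b-it chat template: a `<bos>` token, then one `<start_of_turn>role ... <end_of_turn>` block per message, with `assistant` renamed to `model` and roles required to alternate starting with `user`. This assumes the fine-tuned tokenizer keeps the gemma-2b-it template; the tokenizer's own `chat_template` is authoritative, and both function names below are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def fold_system_prompt(chat: List[Message]) -> List[Message]:
    """Merge a leading system message into the first user turn without mutating the input."""
    if len(chat) > 1 and chat[0]['role'] == 'system':
        first_user = dict(chat[1])
        first_user['content'] = f"{chat[0]['content']}\n{first_user['content']}"
        return [first_user] + chat[2:]
    return list(chat)


def render_gemma_prompt(chat: List[Message], add_generation_prompt: bool = True) -> str:
    """Approximate the gemma-2b-it chat template as a plain string."""
    parts = ['<bos>']
    for i, msg in enumerate(chat):
        # Even positions must be user turns, odd positions assistant turns
        if (msg['role'] == 'user') != (i % 2 == 0):
            raise ValueError('roles must alternate user/assistant/user/...')
        role = 'model' if msg['role'] == 'assistant' else msg['role']
        parts.append(f"<start_of_turn>{role}\n{msg['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        parts.append('<start_of_turn>model\n')  # cue the model to answer
    return ''.join(parts)


if __name__ == "__main__":
    chat = [{'role': 'system', 'content': '한국어로 대답해'},
            {'role': 'user', 'content': '안녕'}]
    print(render_gemma_prompt(fold_system_prompt(chat)))
```

Rendering a chat that still starts with a `system` message raises an error, mirroring why the system prompt has to be folded into the user turn before calling `apply_chat_template`.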
Training Hyperparameters
- Training regime: [More Information Needed]
Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary