KaifengGGG/WenYanWen_English_Parallel
How to use KaifengGGG/Llama3-8b-Hanscripter with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="KaifengGGG/Llama3-8b-Hanscripter")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("KaifengGGG/Llama3-8b-Hanscripter")
model = AutoModelForCausalLM.from_pretrained("KaifengGGG/Llama3-8b-Hanscripter")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use KaifengGGG/Llama3-8b-Hanscripter with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "KaifengGGG/Llama3-8b-Hanscripter"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "KaifengGGG/Llama3-8b-Hanscripter",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
# Or use Docker:
docker model run hf.co/KaifengGGG/Llama3-8b-Hanscripter
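The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the default URL assumes the vLLM server started above is listening on localhost:8000.

```python
import json
import urllib.request

MODEL_ID = "KaifengGGG/Llama3-8b-Hanscripter"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request to a running server and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return the same JSON that the curl command prints. Passing a different `base_url` points the same helper at another OpenAI-compatible endpoint.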
How to use KaifengGGG/Llama3-8b-Hanscripter with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "KaifengGGG/Llama3-8b-Hanscripter" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "KaifengGGG/Llama3-8b-Hanscripter",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
# Or run the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "KaifengGGG/Llama3-8b-Hanscripter" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "KaifengGGG/Llama3-8b-Hanscripter",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'

How to use KaifengGGG/Llama3-8b-Hanscripter with Docker Model Runner:
docker model run hf.co/KaifengGGG/Llama3-8b-Hanscripter
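The OpenAI-compatible chat completion endpoints used in the curl examples return a JSON object whose generated text sits under `choices[0].message.content`. As a small sketch, a helper like the following (the name `extract_reply` is hypothetical) pulls that text out of an already decoded response. The sample dictionary is hand-written for illustration and is not real model output.

```python
def extract_reply(response):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written example payload, shaped like a chat completion response.
sample_response = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Paris"}}
    ]
}
reply = extract_reply(sample_response)
```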
Hanscripter is an instruction-tuned language model focused on translating Classical Chinese (i.e., WenYanWen 文言文) into English. Our GitHub repo.
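To make the translation use case concrete, here is a minimal sketch of how a Classical Chinese passage might be wrapped in a chat message for the model. The instruction wording and the helper name `build_translation_messages` are assumptions, since the model card does not document a required prompt format.

```python
def build_translation_messages(classical_text):
    """Wrap a Classical Chinese passage in a single user message."""
    # Assumed instruction wording; adjust it to whatever prompt works best.
    instruction = "Translate the following Classical Chinese text into English:\n"
    return [{"role": "user", "content": instruction + classical_text.strip()}]


messages = build_translation_messages("學而時習之,不亦說乎?")
```

The resulting `messages` list has the same shape as the `messages` used in the Transformers examples, so it can be passed to `pipe(messages)` or to `tokenizer.apply_chat_template(...)` unchanged.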
Below are detailed descriptions of the various parameters and technologies used.
The model uses bitsandbytes for model quantization, which reduces memory use and improves computational efficiency:
True - Enables the use of 4-bit quantization.
False - Nested quantization is not used.

Settings for training the model are as follows:
False
True - Optimized for use with A100 GPUs, employing Brain Floating Point (bf16).
True
True
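The quantization and precision settings described above could be expressed roughly as follows. This is a sketch under stated assumptions: the card's parameter names are not shown, so mapping the two quantization flags onto the `load_in_4bit` and `bnb_4bit_use_double_quant` arguments of `transformers.BitsAndBytesConfig`, and the bf16 setting onto the `bf16` argument of `transformers.TrainingArguments`, is a guess rather than the authors' confirmed configuration. The helper `load_quantized_model` is hypothetical.

```python
MODEL_ID = "KaifengGGG/Llama3-8b-Hanscripter"

# Assumed mapping of the two documented flags onto BitsAndBytesConfig arguments.
QUANT_KWARGS = {
    "load_in_4bit": True,                # 4-bit quantization enabled
    "bnb_4bit_use_double_quant": False,  # nested (double) quantization not used
}

# Assumed mapping of the bf16 setting onto a TrainingArguments keyword.
PRECISION_KWARGS = {"bf16": True}


def load_quantized_model(model_id=MODEL_ID):
    """Load the model in 4-bit (needs transformers, bitsandbytes, and a GPU)."""
    # Imported lazily so the settings above can be inspected without these packages.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(**QUANT_KWARGS)
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )
```

Keeping the flags in plain dictionaries makes it easy to compare them against the values listed above before any model weights are downloaded.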