Instructions to use MLP-KTLim/llama-3-Korean-Bllossom-8B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use MLP-KTLim/llama-3-Korean-Bllossom-8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MLP-KTLim/llama-3-Korean-Bllossom-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
# Llama 3 is a text-only causal language model, so AutoModelForCausalLM is the matching class
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MLP-KTLim/llama-3-Korean-Bllossom-8B")
model = AutoModelForCausalLM.from_pretrained("MLP-KTLim/llama-3-Korean-Bllossom-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the conversation with the model's own chat template and tokenize it
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use MLP-KTLim/llama-3-Korean-Bllossom-8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MLP-KTLim/llama-3-Korean-Bllossom-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MLP-KTLim/llama-3-Korean-Bllossom-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/MLP-KTLim/llama-3-Korean-Bllossom-8B

SGLang

How to use MLP-KTLim/llama-3-Korean-Bllossom-8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MLP-KTLim/llama-3-Korean-Bllossom-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MLP-KTLim/llama-3-Korean-Bllossom-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MLP-KTLim/llama-3-Korean-Bllossom-8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MLP-KTLim/llama-3-Korean-Bllossom-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use MLP-KTLim/llama-3-Korean-Bllossom-8B with Docker Model Runner:
```
docker model run hf.co/MLP-KTLim/llama-3-Korean-Bllossom-8B
```

Fine-tuning

#10

by kakarooky - opened Jul 19, 2024

Discussion

kakarooky

Jul 19, 2024

•

edited Jul 19, 2024

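Hello.
First of all, thank you for creating such a good model.

I am trying to fine-tune it, but my existing fine-tuning method does not work well.
(Other Llama 3-based Korean models fine-tune without problems...)
I am asking in case the input context format is slightly different.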

# Input dataset
{'text': '<s>[INST] 라이브 스트리밍이란? [/INST] Live Streaming은 방송 현장에서 중계되는 영상이나 이미 편성된 동영상 미디어 파일을 사용자에게 실시간으로 전송하는 서비스입니다.\\n주로 스포츠 중계, 온라인 교육, 개인 방송, 메타버스, 라이브 커머스 등의 서비스에 사용합니다. </s>'}

# Inference after fine-tuning
[{'generated_text': '<s>[INST] 라이브 스트리밍이란 [/INST]<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s<s'}]

Would it be possible to share a simple dataset that works for fine-tuning, or the source code?

ShinDJ

MLP-LAB org Jul 19, 2024

•

edited Jul 19, 2024

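Hello, this is Dongjae Shin, a researcher at the MLP Lab, Seoul National University of Science and Technology.

The prompt you posted appears to be a Llama 2 prompt. This model is based on Llama 3, so please refer to the prompt shown in the README example.

In addition, please understand that it is difficult to release the dataset and source code at this time.

Thank you for using the BLLOSSOM model.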

ShinDJ changed discussion status to closed Jul 22, 2024
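
For readers who hit the same problem, here is a minimal sketch of converting a Llama 2 style record like the one in the question into the Llama 3 chat layout. It assumes the model follows the standard Llama 3 Instruct format (`<|begin_of_text|>`, `<|start_header_id|>`, `<|eot_id|>`); the README remains the authority on the exact prompt. The helper names `llama2_text_to_messages` and `render_llama3` and the shortened English sample record are hypothetical, introduced only for illustration.

```python
import re

# Matches a single-turn Llama 2 style record: "<s>[INST] question [/INST] answer </s>"
LLAMA2_PATTERN = re.compile(
    r"^\s*<s>\s*\[INST\]\s*(.*?)\s*\[/INST\]\s*(.*?)\s*</s>\s*$",
    re.DOTALL,
)


def llama2_text_to_messages(text):
    """Split a single-turn Llama 2 formatted string into chat messages."""
    match = LLAMA2_PATTERN.match(text)
    if match is None:
        raise ValueError("text is not in the expected Llama 2 [INST] format")
    question, answer = match.groups()
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]


def render_llama3(messages):
    """Render messages in the standard Llama 3 Instruct layout (assumed here)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    return "".join(parts)


# Hypothetical, shortened stand-in for the record posted in the question
record = {
    "text": "<s>[INST] What is live streaming? [/INST] "
            "Live streaming delivers video to viewers in real time. </s>"
}

messages = llama2_text_to_messages(record["text"])
llama3_text = render_llama3(messages)
print(llama3_text)
```

In practice, passing `messages` to `tokenizer.apply_chat_template(messages, tokenize=False)` is the safer route, since it uses the template shipped with the model rather than a hand-written one. Note also that the rendered string already begins with `<|begin_of_text|>`, so tokenizing it again with special tokens enabled may add a second beginning-of-text token.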
