Instructions to use meta-llama/Meta-Llama-3-8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use meta-llama/Meta-Llama-3-8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

Inference
Local Apps Settings

vLLM

How to use meta-llama/Meta-Llama-3-8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "meta-llama/Meta-Llama-3-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Meta-Llama-3-8B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/meta-llama/Meta-Llama-3-8B

SGLang

How to use meta-llama/Meta-Llama-3-8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "meta-llama/Meta-Llama-3-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Meta-Llama-3-8B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "meta-llama/Meta-Llama-3-8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Meta-Llama-3-8B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use meta-llama/Meta-Llama-3-8B with Docker Model Runner:
```
docker model run hf.co/meta-llama/Meta-Llama-3-8B
```

The model just repeats part of the input

#83

by summerstay - opened Apr 26, 2024

Discussion

summerstay

Apr 26, 2024

I have tried many different prompts and settings but whatever I do, I can't get a long response from this base model without it just repeating itself over and over. I thought it might have to do with the special tokens, but adding in didn't seem to help. Any idea what is going wrong? I have tried repetition and presence parameters, but I can't seem to find a setting between "does nothing" and "makes it descend into gibberish".

realdanielbyrne

Apr 26, 2024

This is the base model hence it is only trained to predict the next sequence of words. Generally base models know a lot about language, and nothing about chatting. You will need to either fine-tune this model on instruction following or try starting with the instruction tuned version of this model, meta-llama/Meta-Llama-3-8B-Instruct.

summerstay

Apr 26, 2024

Thanks. I have used GPT-3 as a base model. It seems like this is much more prone to repetition than GPT-3 was. I have finally gotten it working okay, but only by turning up the repetition penalty to more than 1. Much higher and the penalty stops it from being able to end sentences (because . is penalized) and soon loses all sense entirely.

realdanielbyrne

Jun 20, 2024

Interesting. Typically in the instruction tuned models then encode a stop token which accomplishes what you are attempting to do with the repeat penalty.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment