Libraries

How to use hf-100/Llama-3-Spellbound-Instruct-8B-0.3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="hf-100/Llama-3-Spellbound-Instruct-8B-0.3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
# Llama 3 is a text-only causal language model, so use AutoModelForCausalLM
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("hf-100/Llama-3-Spellbound-Instruct-8B-0.3")
model = AutoModelForCausalLM.from_pretrained("hf-100/Llama-3-Spellbound-Instruct-8B-0.3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use hf-100/Llama-3-Spellbound-Instruct-8B-0.3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "hf-100/Llama-3-Spellbound-Instruct-8B-0.3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hf-100/Llama-3-Spellbound-Instruct-8B-0.3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
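
The same request can also be built from Python. The following is a minimal sketch using only the standard library. It assumes the vLLM server started above is listening on localhost:8000, and the helper names `build_chat_request` and `send_chat_request` are illustrative, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "hf-100/Llama-3-Spellbound-Instruct-8B-0.3"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the parsed JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return a dictionary whose reply text sits under `["choices"][0]["message"]["content"]`, as is usual for OpenAI-compatible servers. Passing `base_url="http://localhost:30000"` would target an SGLang server on the port used elsewhere on this page.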

Use Docker

docker model run hf.co/hf-100/Llama-3-Spellbound-Instruct-8B-0.3

SGLang

How to use hf-100/Llama-3-Spellbound-Instruct-8B-0.3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "hf-100/Llama-3-Spellbound-Instruct-8B-0.3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hf-100/Llama-3-Spellbound-Instruct-8B-0.3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "hf-100/Llama-3-Spellbound-Instruct-8B-0.3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hf-100/Llama-3-Spellbound-Instruct-8B-0.3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use hf-100/Llama-3-Spellbound-Instruct-8B-0.3 with Docker Model Runner:
```
docker model run hf.co/hf-100/Llama-3-Spellbound-Instruct-8B-0.3
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Llama-3 Spellbound Instruct Tuning-Free

Updated Aspects

Trained on additional tokens
Improved mix of subject matter in the training data
Trained for 1.5M additional tokens
Additional training on DPO dataset

Model Rationale

Llama 3 is a strong base model with solid world understanding and creativity. Additional instruct fine-tuning trades that world understanding and creativity for instruction following, which Llama 3 doesn't need in order to adhere to most forms of roleplay.

This model was trained on unstructured text only; no instruct-related fine-tuning was performed.

Made by tryspellbound.com.

(tryspellbound.com does not currently use this model; it uses Claude 3 Sonnet.)

Features of this fine-tune for Llama 3:

Roleplaying in multi-turn stories where the history is presented in a single message
Dynamic switching of writing styles for different scenarios
Interpretation of formatting marks 'quote' and 'action'

Warning: The underlying model, Llama 3, was trained on data that included adult content. This fine-tune does not add additional guardrails and is not suitable for all environments.

Purpose of the Model

The main goal is to explore how presenting LLMs with history and instructions separately affects their performance, demonstrating:

Improved coherence in long conversations
Enhanced quality of character interactions
Decreased instruction adherence, which could be improved with additional training
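
One way to picture the idea of presenting history and instructions separately is sketched below. This is an illustration only. The layout (instructions in a system message, the whole story history flattened into a single user message), the `Name: text` line format, the use of double quotes for 'quote' and asterisks for 'action', and the helper names `flatten_history` and `build_messages` are all assumptions, not the format the model was trained on. The advanced prompting document remains the reference for the real format.

```python
def flatten_history(turns):
    """Join (speaker, kind, text) turns into one block of story text.

    Assumed convention: 'quote' turns are wrapped in double quotes and
    'action' turns in asterisks. The real marks may differ.
    """
    lines = []
    for speaker, kind, text in turns:
        if kind == "quote":
            lines.append(f'{speaker}: "{text}"')
        elif kind == "action":
            lines.append(f"{speaker}: *{text}*")
        else:
            raise ValueError(f"unknown turn kind: {kind!r}")
    return "\n".join(lines)


def build_messages(instructions, turns):
    """Keep instructions and history apart: one system message, one user message."""
    return [
        {"role": "system", "content": instructions},
        {"role": "user", "content": flatten_history(turns)},
    ]


# Hypothetical story turns, invented for this example
turns = [
    ("Mira", "action", "pushes open the tavern door"),
    ("Mira", "quote", "Is anyone here?"),
    ("Innkeeper", "quote", "Just me and the rats."),
]
messages = build_messages("Continue the story as the Innkeeper.", turns)
```

The resulting `messages` list has the same shape as the one passed to `pipe(messages)` and `tokenizer.apply_chat_template` in the Transformers examples, so it could be dropped in there to experiment with this layout.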

Advanced prompting of the model

For advanced prompting, see this document

Safetensors

Model size: 8B params
Tensor type: BF16

Model tree for hf-100/Llama-3-Spellbound-Instruct-8B-0.3

Merges: 6 models
Quantizations: 1 model