Instructions to use SciPhi/SciPhi-Mistral-7B-32k with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use SciPhi/SciPhi-Mistral-7B-32k with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SciPhi/SciPhi-Mistral-7B-32k")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SciPhi/SciPhi-Mistral-7B-32k")
model = AutoModelForCausalLM.from_pretrained("SciPhi/SciPhi-Mistral-7B-32k")


vLLM

How to use SciPhi/SciPhi-Mistral-7B-32k with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SciPhi/SciPhi-Mistral-7B-32k"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SciPhi/SciPhi-Mistral-7B-32k",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
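
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000`; the helper names `build_completion_request` and `complete` are hypothetical and not part of vLLM.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",  # assumed: default port from the curl call
    model: str = "SciPhi/SciPhi-Mistral-7B-32k",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build the POST request that mirrors the curl example."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as response:
        body = json.load(response)
    # OpenAI-compatible completions responses carry the text under choices[0].text.
    return body["choices"][0]["text"]
```

Calling `complete("Once upon a time,")` should return the continuation as a string once the server is up; passing `base_url="http://localhost:30000"` would target a server on that port instead.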

Use Docker

docker model run hf.co/SciPhi/SciPhi-Mistral-7B-32k

SGLang

How to use SciPhi/SciPhi-Mistral-7B-32k with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "SciPhi/SciPhi-Mistral-7B-32k" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SciPhi/SciPhi-Mistral-7B-32k",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "SciPhi/SciPhi-Mistral-7B-32k" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SciPhi/SciPhi-Mistral-7B-32k",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use SciPhi/SciPhi-Mistral-7B-32k with Docker Model Runner:
```
docker model run hf.co/SciPhi/SciPhi-Mistral-7B-32k
```

What's the prompt format for this model?

by TK-Master - opened Oct 29, 2023

Discussion

TK-Master

Oct 29, 2023

What's the recommended prompt format for this model? What was the model trained with?

Thanks

Handgun1773

Oct 29, 2023

This looks like an improved base model to be fine-tuned on, so no prompt template.

algorithm

Oct 30, 2023

Wondering the same thing...

brucethemoose

Oct 31, 2023

Apparently this is not the base model (as that was just uploaded).

So... is this an instruct model? What is the prompt?

TK-Master

Oct 31, 2023 (edited Oct 31, 2023)

I still don't know the ideal format, but I had terrible results with the Mistral format ([INST] prompt [/INST]), so it clearly isn't that one.
I had better luck with the Alpaca and Zephyr formats, but without the EOS token </s>.
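
For reference, the three formats mentioned here can be sketched as small Python helpers. The templates follow the commonly published form of each format and are assumptions here, not something the thread confirms for this model. The function names are hypothetical, and `eos` defaults to an empty string to match the "without the EOS" variant.

```python
# Standard Alpaca preamble for the no-input variant (assumed, not from this thread).
ALPACA_PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def mistral_prompt(user: str) -> str:
    """Mistral instruct style: the user turn wrapped in [INST] ... [/INST]."""
    return f"[INST] {user} [/INST]"


def alpaca_prompt(user: str) -> str:
    """Alpaca style: preamble, instruction header, then an open response header."""
    return f"{ALPACA_PREAMBLE}\n\n### Instruction:\n{user}\n\n### Response:\n"


def zephyr_prompt(user: str, system: str = "", eos: str = "") -> str:
    """Zephyr style: role tags on their own lines; pass eos="</s>" for the usual form."""
    return f"<|system|>\n{system}{eos}\n<|user|>\n{user}{eos}\n<|assistant|>\n"
```

Printing the output of each helper for the same question makes it easy to compare the formats side by side before sending them to the model.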

emrgnt-cmplxty

SciPhi-AI org Oct 31, 2023

Alpaca instruct is preferred.

brucethemoose

Oct 31, 2023 (edited Oct 31, 2023)

> Alpaca instruct is preferred.

Is it the same as the RAG model?

https://huggingface.co/SciPhi/SciPhi-Self-RAG-Mistral-7B-32k#recommended-chat-formatting

If so, that is enhanced Alpaca (as base Alpaca doesn't use any particular syntax for the system prompt).

Y'all should print the precise format of the trained model in a box on the model page. Something like this template would be very helpful: https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF#prompt-template-zephyr

emrgnt-cmplxty

SciPhi-AI org Oct 31, 2023

Thanks for your interest and feedback - you are correct in this regard. I will do some testing tonight and produce a clean template + some code to support it elsewhere.

emrgnt-cmplxty

SciPhi-AI org Oct 31, 2023

I recommend formatting like this:

Recommended Chat Formatting

We recommend mapping such that

messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]

goes to --->

### System:
You are a friendly chatbot who always responds in the style of a pirate

### Instruction:
How many helicopters can a human eat in one sitting?

### Response:
...

I chose this format because the majority of the fine-tuning dataset was instruction tuning, and it seemed like the closest match. It might need revision, so please let me know your findings.
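
The mapping above could be implemented along these lines. This is a minimal sketch, not official SciPhi code: `ROLE_HEADERS` and `format_prompt` are hypothetical names, and mapping earlier `assistant` turns to `### Response:` is an assumption for multi-turn use that the thread does not confirm.

```python
from typing import Dict, List

# Hypothetical role-to-header mapping; the "assistant" entry is an assumption
# for multi-turn use and is not confirmed by the thread.
ROLE_HEADERS = {
    "system": "### System:",
    "user": "### Instruction:",
    "assistant": "### Response:",
}


def format_prompt(messages: List[Dict[str, str]]) -> str:
    """Render chat messages into the System / Instruction / Response layout."""
    blocks = []
    for message in messages:
        header = ROLE_HEADERS[message["role"]]  # raises KeyError for unknown roles
        blocks.append(f"{header}\n{message['content'].strip()}")
    # Leave an open response header so the model writes the answer next.
    if not messages or messages[-1]["role"] != "assistant":
        blocks.append("### Response:")
    return "\n\n".join(blocks) + "\n"
```

Passing the `messages` list from the example should produce the three-section prompt shown, ending right after `### Response:` so that generation continues from there.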
