
Libraries

How to use Sinensis/Psyonic-Cetacean-Ultra-Quality-20b-4.0bpw-h6-exl2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Sinensis/Psyonic-Cetacean-Ultra-Quality-20b-4.0bpw-h6-exl2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

# This is a Llama2-based text-generation model, so the causal-LM auto class applies
tokenizer = AutoTokenizer.from_pretrained("Sinensis/Psyonic-Cetacean-Ultra-Quality-20b-4.0bpw-h6-exl2")
model = AutoModelForCausalLM.from_pretrained("Sinensis/Psyonic-Cetacean-Ultra-Quality-20b-4.0bpw-h6-exl2")


vLLM

How to use Sinensis/Psyonic-Cetacean-Ultra-Quality-20b-4.0bpw-h6-exl2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Sinensis/Psyonic-Cetacean-Ultra-Quality-20b-4.0bpw-h6-exl2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sinensis/Psyonic-Cetacean-Ultra-Quality-20b-4.0bpw-h6-exl2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
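
The same request can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are illustrative and not part of vLLM, and it assumes the server started above is listening on `localhost:8000` and returns the usual OpenAI-style `choices[0].text` field.

```python
import json
import urllib.request

MODEL_ID = "Sinensis/Psyonic-Cetacean-Ultra-Quality-20b-4.0bpw-h6-exl2"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request mirroring the curl call."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text.

    Assumes an OpenAI-compatible response body with a `choices` list.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example (requires a running server):
# print(complete("Once upon a time,"))
```

Passing `base_url="http://localhost:30000"` would point the same helper at a server listening on port 30000 instead.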

Use Docker

docker model run hf.co/Sinensis/Psyonic-Cetacean-Ultra-Quality-20b-4.0bpw-h6-exl2

SGLang

How to use Sinensis/Psyonic-Cetacean-Ultra-Quality-20b-4.0bpw-h6-exl2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Sinensis/Psyonic-Cetacean-Ultra-Quality-20b-4.0bpw-h6-exl2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sinensis/Psyonic-Cetacean-Ultra-Quality-20b-4.0bpw-h6-exl2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Sinensis/Psyonic-Cetacean-Ultra-Quality-20b-4.0bpw-h6-exl2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sinensis/Psyonic-Cetacean-Ultra-Quality-20b-4.0bpw-h6-exl2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Psyonic-Cetacean-V1-20B-Ultra-Quality

4bpw h6 exl2 quant of https://huggingface.co/DavidAU/Psyonic-Cetacean-V1-20B-Ultra-Quality-Float32
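
As a rough guide to what 4.0 bits per weight means for storage, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate only: it takes the "20b" in the model name as a nominal 20 billion parameters (an assumption, the exact count may differ) and ignores the separately quantized head layer, the KV cache, and runtime overhead.

```python
def estimate_weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the quantized weights alone, in bytes."""
    return num_params * bits_per_weight / 8


NOMINAL_PARAMS = 20e9   # assumption: "20b" read as 20 billion parameters
BITS_PER_WEIGHT = 4.0   # from the "4.0bpw" part of the repository name

size_bytes = estimate_weight_bytes(NOMINAL_PARAMS, BITS_PER_WEIGHT)
print(f"{size_bytes / 1e9:.1f} GB")      # 10.0 GB
print(f"{size_bytes / 2**30:.2f} GiB")   # 9.31 GiB
```

Actual memory use at inference time will be higher, since the context cache grows with sequence length.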

Original merge model: https://huggingface.co/jebcarter/psyonic-cetacean-20B

This is a Llama2-based stack merge consisting of:

Prompt format: Alpaca
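
For readers unfamiliar with the Alpaca format, a minimal sketch of a prompt builder follows. The template text is the commonly used no-input Alpaca wording; it is an assumption that this model expects exactly this preamble, and `build_alpaca_prompt` is an illustrative helper name, not something shipped with the model.

```python
# Commonly used Alpaca template (no-input variant); assumed, not taken from this card.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Alpaca instruction/response markers."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


prompt = build_alpaca_prompt("Begin a text adventure set on a derelict whaling ship.")
print(prompt)
```

The resulting string ends with `### Response:` followed by a newline, so the model's continuation is the reply itself.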

This model is focused on storywriting and text adventure, with a side order of Assistant and Chat functionality. Like its ancestor Psyfighter-2, this model works better if you let it improvise and riff on your concepts rather than feeding it an excess of detail. Additionally, either the removal of the ChatML vocab or the stack merging process itself has produced a model that is not only uncensored but actively anti-censored, so please be aware that this model can and will kill you during adventures or output NSFW material if prompted accordingly.

Thanks to https://huggingface.co/jebcarter for a wonderful model and https://huggingface.co/DavidAU for remastering it.
