Psyonic-Cetacean-V1-20B-Ultra-Quality
4.0 bpw EXL2 quantization (h6, i.e. 6-bit output head) of https://huggingface.co/DavidAU/Psyonic-Cetacean-V1-20B-Ultra-Quality-Float32
Original merge model: https://huggingface.co/jebcarter/psyonic-cetacean-20B
This is a Llama2-based stack merge consisting of:
Prompt format: Alpaca
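The card names the prompt format but does not spell out the template. As a minimal sketch, assuming the widely used Stanford Alpaca wording (the preamble text and the helper name `build_alpaca_prompt` are illustrative assumptions, not taken from this model's documentation), a prompt could be assembled like this:

```python
# Sketch of an Alpaca-style prompt builder.
# Assumption: the standard Stanford Alpaca template; this model card only says
# "Alpaca" and does not give the exact wording.

PREAMBLE_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
PREAMBLE_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request."
)


def build_alpaca_prompt(instruction: str, user_input: str = "") -> str:
    """Return an Alpaca-formatted prompt ending at the response header."""
    if user_input:
        return (
            f"{PREAMBLE_WITH_INPUT}\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{user_input}\n\n"
            "### Response:\n"
        )
    return (
        f"{PREAMBLE_NO_INPUT}\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


if __name__ == "__main__":
    # Hypothetical example instruction for a text adventure.
    print(build_alpaca_prompt("Start a text adventure set in a flooded city."))
```

The returned string is passed to whichever EXL2-capable backend you use as the raw prompt; the model's completion is whatever follows `### Response:`.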
This model is focused on storywriting and text adventure, with a side order of Assistant and Chat functionality. Like its ancestor Psyfighter-2, this model functions better if you let it improvise and riff on your concepts rather than feeding it an excess of detail. Additionally, either the removal of the ChatML vocab or the stack merging process itself has produced a model that is not only uncensored but actively anti-censored, so please be aware that this model can and will kill you during adventures or output NSFW material if prompted accordingly.
Thanks to https://huggingface.co/jebcarter for a wonderful model and https://huggingface.co/DavidAU for remastering it.