Concerned about the use of PPL scores

by FallenMerick - opened May 29, 2024

Discussion

FallenMerick

May 29, 2024

But... what about Q8?

The mountain moved:

150 points better: PPL = 8.5850 +/- 0.05881 VS: BASE/ORIGINAL: PPL = 8.6012 +/- 0.05900

Looking at these numbers, it appears that 150 points = 0.015 PPL. If this is the case, then the margin of error on these scores is +/- 588 points, and the 150 points of improvement is well within that margin of error, essentially making it meaningless as a marker for the improvement itself.
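The arithmetic behind this comparison can be checked in a few lines. This is only a sketch using the two scores quoted above. The "1 point = 0.0001 PPL" conversion is inferred from the post, and combining the two error bars as if the runs were independent is an assumption (both scores were presumably measured on the same test file, so the errors are likely correlated). Note that the straight subtraction gives roughly 0.016 PPL, slightly more than the quoted 150 points.

```python
import math

POINTS_PER_PPL = 10_000  # assumed from the post: 1 "point" = 0.0001 PPL

# Scores quoted in the thread (value, reported +/-)
new_ppl, new_err = 8.5850, 0.05881
base_ppl, base_err = 8.6012, 0.05900

# Improvement in raw PPL and in "points"
delta = base_ppl - new_ppl
delta_points = delta * POINTS_PER_PPL

# Combined uncertainty, ASSUMING the two error bars are independent
combined_err = math.sqrt(new_err**2 + base_err**2)

# How many combined error bars the improvement spans
z = delta / combined_err

print(f"improvement: {delta:.4f} PPL ({delta_points:.0f} points)")
print(f"reported +/- on new score: {new_err * POINTS_PER_PPL:.0f} points")
print(f"improvement / combined error: {z:.2f}")
```

A ratio well below 1 means this single pair of scores cannot, on its own, distinguish the improvement from noise, which is the point being made above.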

It's possible that this 32-bit remaster is indeed much more intelligent and capable than its 16-bit counterpart, but that improvement doesn't appear to be reflected in the PPL scores themselves. I would love to see some other benchmark differences between these two versions of PsyCet, and potentially some A/B blind testing as well if possible.

DavidAU

Owner May 29, 2024

edited May 29, 2024

Completely agree. That is why you need to look at the scores of all the quants.
What matters is the margin of movement.
If only a few quants show a change, it is error; if they all show a change, it is movement.
Likewise, the +/- is a RANGE, not just a margin of error.
And like perplexity, the range is also an average, which makes it a terrible indicator of true mathematical movement, especially in something as large and complex as an LLM.
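One way to make the "all quants move versus only a few move" idea concrete is a sign test. The sketch below is an illustration, not the method used for this model: it assumes each quant is an independent trial that is equally likely to score better or worse when nothing has really changed. That independence is questionable here, since all quants derive from the same weights and are scored on the same file. The function name and the example counts are hypothetical.

```python
from math import comb

def sign_test_p_value(n_quants: int, n_improved: int) -> float:
    """Two-sided sign test: chance of a split at least this lopsided
    if each quant were a fair coin flip (ASSUMED independent)."""
    k = max(n_improved, n_quants - n_improved)
    tail = sum(comb(n_quants, i) for i in range(k, n_quants + 1)) / 2**n_quants
    return min(1.0, 2 * tail)

# Hypothetical counts, for illustration only
print(sign_test_p_value(10, 10))  # every quant improved
print(sign_test_p_value(10, 5))   # half improved, half did not
```

Under these assumptions, a consistent shift across many quants is much harder to explain as chance than a shift in one or two of them, even when each individual difference sits inside its own error bar.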

The actual size of the range is also a factor, and it is relative to the base perplexity level, in this case 8-9.
I.e., if the range were, say, 1.000 or higher, you might have an unstable model.
However, if the base level were 15, an "error" range of 1 would be no issue.

This is because perplexity is not linear; it is closer to orders of magnitude.
It is only a relative, rough "30,000-foot" view of the model.
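For reference, perplexity is conventionally defined as the exponential of the average negative log-likelihood per token, which is why equal steps in PPL do not correspond to equal steps in model quality. The sketch below uses that standard definition; the helper name and the toy log-probabilities are illustrative, and only the two PPL values come from this thread.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Toy input: a model that gives every token probability 0.5 has PPL 2
toy_ppl = perplexity([math.log(0.5)] * 4)

# The two scores from the thread, compared on the log scale (nats per token)
base_ppl, new_ppl = 8.6012, 8.5850
nats_gained = math.log(base_ppl) - math.log(new_ppl)

print(f"toy perplexity: {toy_ppl:.4f}")
print(f"log-scale difference: {nats_gained:.6f} nats per token")
```

Comparing on the log scale puts scores from models with different base perplexities on a common footing, since the same absolute PPL gap is a smaller relative change at a higher base level.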

And then there is the file used to calculate perplexity itself.
A wild card.
Change this, you change everything.

The original creator of this model tested it himself.
To put it mildly, he was blown away. His comments are all over his Discord, KoboldAI.
Members of his group who are familiar with the model also tested it.
Without exception, all of them were impressed.
Likewise, a lot of real-world testing, comparing the original with the new and improved version, was done prior to release to further confirm that the change was a positive one.
These methods confirm or refute perplexity changes and also reveal positive and/or negative changes.

As per Jeb Carter, creator of the model:

Instruction following has improved dramatically.
New abilities have emerged.
He had to REDUCE the instruction sets used because the model no longer needed such specific instructions.
Prose, nuance and depth have all improved.
Issues with the original model have disappeared.

This is not "something for nothing"; it is a method of ensuring maximum precision at every step just before "ggufing" the model.
The methods employed only ensure that precision loss is minimized or eliminated.
It is mathematically and theoretically sound.
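The general claim about precision can be illustrated with a round trip through 16-bit and 32-bit floats. This is a minimal sketch of rounding error only. It does not reproduce the conversion pipeline used for this model, and the helper name and the sample value 0.1 are arbitrary choices for illustration.

```python
import struct

def round_trip(fmt: str, value: float) -> float:
    """Store a Python float in a narrower IEEE 754 format and read it back.
    'e' = 16-bit half precision, 'f' = 32-bit single precision."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

value = 0.1  # arbitrary sample value

as_fp16 = round_trip("e", value)
as_fp32 = round_trip("f", value)

err16 = abs(as_fp16 - value)
err32 = abs(as_fp32 - value)

print(f"fp16: {as_fp16!r}  error {err16:.3e}")
print(f"fp32: {as_fp32!r}  error {err32:.3e}")
```

Each intermediate step carried out in the narrower format can add rounding error of this kind, which is the motivation for keeping values in 32-bit until the final quantization step. Whether that shows up as a measurable quality difference is a separate question from the arithmetic shown here.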

FallenMerick

May 29, 2024

I believe what is being said, and it absolutely makes sense that maintaining 32-bit precision from the beginning would lead to better overall precision in the final product. I just wanted to point out why I don't believe PPL is very useful as a measurement for the improvement being seen.

I appreciate the insight!

hazkun

Jun 7, 2024

The Stheno model by Sao10K is really good. It scores almost 70 on the leaderboard with just 8B parameters. Is it possible to make an fp32 version of that model too?

DavidAU

Owner Jun 8, 2024

I am aware of this model. If I recall there are a number of versions?
Do you mean one in particular?

hazkun

Jun 8, 2024

From his post, it seems like 3.2 is not his final model. I don't know, maybe he will update the model. But in my experience it already outperforms even 70B models. An fp32 version would pack quite a punch, since the 3.2 version is fairly popular right now.
