Instructions to use brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw")
model = AutoModelForCausalLM.from_pretrained("brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw")

Notebooks
Google Colab
Kaggle

vLLM

How to use brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
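The same curl request can be sent from Python with only the standard library. This is a minimal sketch that assumes a server from the steps above is already listening locally. `build_completion_request` and `complete` are hypothetical helper names, not part of vLLM or SGLang. The code also assumes the standard OpenAI completions response shape (`choices[0].text`).

```python
import json
import urllib.request

# Same model ID and sampling settings as the curl example above.
MODEL_ID = "brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw"


def build_completion_request(
    base_url: str,
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.5,
    model: str = MODEL_ID,
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, prompt: str, **kwargs) -> str:
    """Send the request and return the text of the first choice."""
    request = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the standard OpenAI completions response shape.
    return body["choices"][0]["text"]
```

For example, `complete("http://localhost:8000", "Once upon a time,")` targets the vLLM server. The SGLang server below listens on port 30000 instead, so the same call works with `"http://localhost:30000"`.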

Use Docker

docker model run hf.co/brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw

SGLang

How to use brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw with Docker Model Runner:
```
docker model run hf.co/brucethemoose/Yi-34B-200K-RPMerge-exl2-40bpw
```

Errors on load in text gen web ui.

by Xire - opened Feb 15, 2024

Discussion

Xire

Feb 15, 2024

I was gonna try this, but I can't get it to run: it errors the moment I try to load it. It doesn't matter whether I use ExLlamav2_HF, the non-HF loader, ExLlama, etc.; none of them work.
Below are the errors (not sure how to format code here):

Traceback (most recent call last):
  File "/home/xire/text-generation-webui/modules/ui_model_menu.py", line 210, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
  File "/home/xire/text-generation-webui/modules/models.py", line 89, in load_model
    output = load_func_map[loader](model_name)
  File "/home/xire/text-generation-webui/modules/models.py", line 411, in ExLlamav2_HF_loader
    return Exllamav2HF.from_pretrained(model_name)
  File "/home/xire/text-generation-webui/modules/exllamav2_hf.py", line 162, in from_pretrained
    config.prepare()
  File "/home/xire/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/config.py", line 188, in prepare
    with safe_open(st_file, framework = "pt", device = "cpu") as f:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer

I saw in the other discussion that the creator suggested removing periods from the folder name, but mine has none. My folder name is:
brucethemoose_Yi-34B-200K-RPMerge-exl2-40bpw

ProphetOfBostrom

Feb 26, 2024

For future reference, in case anyone is searching:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
is a corrupt-file error. The file on your computer is different from what you were supposed to download, probably because the download got interrupted; hence "incomplete buffer".
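A cut-off download can surface as a *header* error because of how the format is laid out. A `.safetensors` file starts with an 8-byte little-endian header length. Next comes a JSON header listing each tensor's `data_offsets`, then the raw tensor data. The header alone therefore predicts the exact file size. The sketch below uses a hypothetical `check_safetensors` helper built only on the standard library; it is not part of the `safetensors` package. It compares that predicted size with the size on disk, which is the kind of mismatch the `MetadataIncompleteBuffer` message points to. It checks sizes only, so it cannot catch damaged bytes inside a file of the right length.

```python
import json
import os
import struct


def check_safetensors(path: str) -> tuple[bool, str]:
    """Return (ok, reason) by comparing a .safetensors file's size with its header.

    Layout: 8-byte little-endian u64 header length N, then N bytes of JSON,
    then tensor data. Each tensor's "data_offsets" are [begin, end] relative
    to the start of the data section.
    """
    file_size = os.path.getsize(path)
    if file_size < 8:
        return False, f"only {file_size} bytes: header length is missing"
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        if 8 + header_len > file_size:
            return False, "file ends inside the JSON header: truncated"
        try:
            header = json.loads(f.read(header_len))
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            return False, "header is not valid JSON"
    # The largest end offset over all tensors is the expected data-section size.
    data_len = max(
        (info["data_offsets"][1] for name, info in header.items() if name != "__metadata__"),
        default=0,
    )
    expected = 8 + header_len + data_len
    if file_size < expected:
        return False, f"expected {expected} bytes, found {file_size}: truncated"
    if file_size > expected:
        return False, f"expected {expected} bytes, found {file_size}: extra trailing data"
    return True, "size matches header"
```

Running it over every `*.safetensors` file in the model's folder would point to the shard that needs to be downloaded again.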

For text-generation-webui:
Just running the downloader again should find and replace the damaged (probably truncated, i.e. cut off early) files. If they're truncated, ooba tends to resume where the download was interrupted instead of wasting time re-downloading.

Now, if you're like me and wondering why this model no longer emits spaces: that's a WIP.

Xire

Feb 28, 2024

Ah, I see, thanks for that. I left the download running while doing other things, so perhaps it cut out midway. It's hard to tell when the file size is still the same.
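A file's size alone may not reveal a bad download, so comparing a checksum is a stronger test. This is a minimal sketch that assumes you copy the expected SHA256 from the file's page on the Hub, where large LFS-stored files list one. `sha256_of_file` and `matches_expected` are hypothetical helper names. The file is hashed in chunks so that multi-gigabyte shards never sit fully in memory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_sha256: str) -> bool:
    """Compare against a pasted checksum, ignoring case and surrounding whitespace."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

A mismatch means the local shard differs from the one on the Hub and should be downloaded again, even if the two sizes happen to agree.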

Xire changed discussion status to closed Feb 28, 2024
