Errors on load in text-generation-webui
I was going to try this, but I can't get it to run: it errors the moment I try to load it. It doesn't matter whether I use ExLlamav2_HF, ExLlamav2 (non-HF), ExLlama, etc.; none of them work.
Below are the errors (not sure how to format code here):
```
Traceback (most recent call last):
  File "/home/xire/text-generation-webui/modules/ui_model_menu.py", line 210, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
  File "/home/xire/text-generation-webui/modules/models.py", line 89, in load_model
    output = load_func_map[loader](model_name)
  File "/home/xire/text-generation-webui/modules/models.py", line 411, in ExLlamav2_HF_loader
    return Exllamav2HF.from_pretrained(model_name)
  File "/home/xire/text-generation-webui/modules/exllamav2_hf.py", line 162, in from_pretrained
    config.prepare()
  File "/home/xire/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/config.py", line 188, in prepare
    with safe_open(st_file, framework = "pt", device = "cpu") as f:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
```
I saw in the other discussion that the creator suggested removing periods from the folder name, but mine has none. My folder name is:
brucethemoose_Yi-34B-200K-RPMerge-exl2-40bpw
For future reference, in case anyone's searching: `safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer`
is a corrupt-file error. The file on your computer is different from what you were supposed to download; most likely the download got interrupted, hence "incomplete buffer".
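Because the error means the file's length doesn't match what its own header describes, a truncated shard can be spotted without loading the model. A minimal sketch, assuming the standard safetensors layout (an 8-byte little-endian header length N, then an N-byte JSON header whose entries give each tensor's `data_offsets`, then the raw tensor data). `check_safetensors` is a hypothetical helper, not part of the `safetensors` library, which performs stricter checks (dtypes, shapes, contiguous offsets) of its own:

```python
import json
import os
import struct


def check_safetensors(path: str) -> tuple[bool, str]:
    """Hypothetical helper: compare a file's size with the size its header implies."""
    actual = os.path.getsize(path)
    if actual < 8:
        return False, "file too small to hold the header length"
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (n,) = struct.unpack("<Q", f.read(8))
        if 8 + n > actual:
            return False, "header itself is cut off"
        header = json.loads(f.read(n))
    # Each tensor entry lists [start, end) byte offsets into the data section;
    # the optional "__metadata__" entry is a plain string map with no offsets.
    data_end = max(
        (v["data_offsets"][1] for k, v in header.items() if k != "__metadata__"),
        default=0,
    )
    expected = 8 + n + data_end
    if actual != expected:
        return False, f"expected {expected} bytes, found {actual} (incomplete buffer)"
    return True, "ok"
```

Run it over every `*.safetensors` file in the model folder; any file it reports as not ok is a candidate for re-downloading.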
For text-generation-webui:
Just running the downloader again should find and replace the damaged (probably truncated, i.e. cut off early) files. If they're truncated, ooba tends to resume where the download was interrupted rather than wasting time re-downloading.
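The "find the damaged files and resume the truncated ones" idea can be sketched as a size comparison plus an HTTP `Range` request for the missing tail. This is not text-generation-webui's actual downloader code; `plan_repair` is hypothetical, and it assumes you already have the expected size of each file (for example from `huggingface_hub`'s `HfApi().model_info(repo_id, files_metadata=True)`, whose `siblings` should carry a `size`) and that the server honors `Range` requests:

```python
import os


def plan_repair(local_dir: str, expected_sizes: dict[str, int]) -> dict[str, dict]:
    """Hypothetical helper: decide per file whether to skip, resume, or re-download."""
    plan = {}
    for name, expected in expected_sizes.items():
        path = os.path.join(local_dir, name)
        if not os.path.exists(path):
            plan[name] = {"action": "download", "headers": {}}
            continue
        have = os.path.getsize(path)
        if have == expected:
            plan[name] = {"action": "skip"}
        elif have < expected:
            # Truncated: request only the bytes after what is already on disk,
            # then append them to the existing file.
            plan[name] = {"action": "resume", "headers": {"Range": f"bytes={have}-"}}
        else:
            # Larger than expected is not a simple truncation: overwrite from scratch.
            plan[name] = {"action": "redownload", "headers": {}}
    return plan
```

A size check only catches truncation; a file with the right length but damaged contents would still pass.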
Now, if you're like me and wondering why this model no longer emits spaces, that's a work in progress.
Ah, I see, thanks for that. I left the download running while doing other things, so perhaps it cut out midway. It's hard to tell when the file size is still the same.
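When the file size alone doesn't settle it, a checksum does. A minimal sketch, assuming you copy the expected SHA256 from the file's page on the Hub (Git LFS files typically list one there); `sha256_of` and `matches` are hypothetical helper names:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly multi-GB) file in 1 MiB chunks so it never sits fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    # expected_hex is assumed to be the SHA256 copied from the Hub's file page.
    return sha256_of(path) == expected_hex.strip().lower()
```

Any shard whose digest differs from the listed one should be deleted and downloaded again.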