4 bit version?
I tried doing it myself but ran into problems when using this: https://github.com/0cc4m/GPTQ-for-LLaMa (it adds support for mpt models)
I was looking into this as well. I tried to use the main GPTQ-for-LLaMa to quantize it (this model just sounds a million times more promising than the original), but I'm getting errors because it is not a LLaMA model. I saw that about a week ago Occam released a quantized version, so it is doable (https://huggingface.co/OccamRazor/mpt-7b-storywriter-4bit-128g). I just don't know how. I also looked through Occam's GitHub with his version of KoboldAI and originally just didn't see his GPTQ implementation.
Anyway, now that I see mpasila's link I'm going to try that route. I have data right now too, so if it works I would be happy to upload a working model. Maybe TheBloke will beat me to it, hah.
Edit: I tried every which way to make the GPTQ fork linked above work. Does anyone have the sauce? I even tried the gptneox version, which at least failed a different way (CUDA memory overrun). When I run the llama version, it fails every time, complaining that the tokenizer is not compatible with the NeoX-style tokenizer.
I also tried installing it two different ways: the old way with the conda env, and the new way of making a fresh conda env and then running the pip install git command listed on the repo. I couldn't get the pip install way to work at all.
I will have a look tomorrow if I have the time.
so if i had to guess we need that layer mapping...
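For anyone following along, here is a rough sketch of what a "layer mapping" could look like, assuming it means a per-architecture table of where the transformer blocks live and which linear layers inside each block get quantized. The `LAYER_MAP` table and the `group_quantizable` helper are made up for illustration and are not taken from any GPTQ repo; the module names are the ones I believe the public LLaMA and MPT model code uses, so double-check them against the actual checkpoint.

```python
import re
from collections import defaultdict

# Hypothetical per-architecture table (illustration only, not from a GPTQ repo).
# "blocks" is the prefix of the transformer block list; "linears" are the
# linear submodules inside each block that a quantizer would visit.
LAYER_MAP = {
    "llama": {
        "blocks": "model.layers",
        "linears": [
            "self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj",
            "self_attn.o_proj", "mlp.gate_proj", "mlp.up_proj", "mlp.down_proj",
        ],
    },
    "mpt": {
        "blocks": "transformer.blocks",
        "linears": ["attn.Wqkv", "attn.out_proj", "ffn.up_proj", "ffn.down_proj"],
    },
}


def group_quantizable(param_names, arch):
    """Return {block_index: [linear names]} for weights matching the arch map."""
    spec = LAYER_MAP[arch]
    pattern = re.compile(
        r"^" + re.escape(spec["blocks"]) + r"\.(\d+)\.(.+)\.weight$"
    )
    found = defaultdict(list)
    for name in param_names:
        m = pattern.match(name)
        if m and m.group(2) in spec["linears"]:
            found[int(m.group(1))].append(m.group(2))
    return dict(found)


# Parameter names shaped like those in an MPT state dict (assumed layout).
mpt_names = [
    "transformer.wte.weight",
    "transformer.blocks.0.norm_1.weight",
    "transformer.blocks.0.attn.Wqkv.weight",
    "transformer.blocks.0.attn.out_proj.weight",
    "transformer.blocks.0.ffn.up_proj.weight",
    "transformer.blocks.0.ffn.down_proj.weight",
]

print(group_quantizable(mpt_names, "mpt"))    # the MPT linears found in block 0
print(group_quantizable(mpt_names, "llama"))  # LLaMA names match nothing here
```

If the names really differ like this, it would explain why a quantizer that only knows the LLaMA names finds nothing to quantize in an MPT checkpoint, separately from the tokenizer complaint.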