QuixiAI/WizardLM_alpaca_evol_instruct_70k_unfiltered
How to use seonglae/wizardlm-7b-uncensored-gptq with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="seonglae/wizardlm-7b-uncensored-gptq")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("seonglae/wizardlm-7b-uncensored-gptq")
model = AutoModelForCausalLM.from_pretrained("seonglae/wizardlm-7b-uncensored-gptq")
How to use seonglae/wizardlm-7b-uncensored-gptq with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "seonglae/wizardlm-7b-uncensored-gptq"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "seonglae/wizardlm-7b-uncensored-gptq",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, use Docker:
docker model run hf.co/seonglae/wizardlm-7b-uncensored-gptq
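The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the response parsing assumes the server returns the usual OpenAI completions shape (`choices[0].text`).

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="seonglae/wizardlm-7b-uncensored-gptq",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    # Same JSON body as the curl example.
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Assumption: OpenAI-style response with a "choices" list.
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` should return the generated continuation. Passing `base_url="http://localhost:30000"` targets a server listening on port 30000 instead.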
How to use seonglae/wizardlm-7b-uncensored-gptq with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "seonglae/wizardlm-7b-uncensored-gptq" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "seonglae/wizardlm-7b-uncensored-gptq",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, launch the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "seonglae/wizardlm-7b-uncensored-gptq" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "seonglae/wizardlm-7b-uncensored-gptq",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use seonglae/wizardlm-7b-uncensored-gptq with Docker Model Runner:
docker model run hf.co/seonglae/wizardlm-7b-uncensored-gptq
This model must be loaded with AutoGPTQ, so install the auto-gptq package first.
It is a no-act-order model.
from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM
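model_id = 'seonglae/wizardlm-7b-uncensored-gptq'
# model_basename is not defined in the original snippet. Set it to the name of the
# weights file in the repo (without extension); None lets AutoGPTQ pick its default.
model_basename = None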
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
model_id,
model_basename=model_basename,
trust_remote_code=True,
device='cuda:0',
use_triton=False,
use_safetensors=True,
)
pipe = pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
do_sample=True,  # sampling must be enabled for temperature and top_p to take effect
temperature=0.5,
top_p=0.95,
max_new_tokens=100,
repetition_penalty=1.15,
)
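prompt = "USER: Are you AI?\nASSISTANT:"
# The pipeline returns a list of dicts; print the generated text of the first result
print(pipe(prompt)[0]["generated_text"])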
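The prompt above follows a `USER:` / `ASSISTANT:` layout. Below is a small sketch of how such prompts could be built and how the reply could be separated from the pipeline output. The helper names `build_prompt` and `extract_reply` are hypothetical, and the multi-turn layout is an assumption extrapolated from the single-turn example, not a documented format for this model.

```python
def build_prompt(user_message, history=()):
    """Format a conversation in the USER:/ASSISTANT: style.

    `history` is a sequence of (user, assistant) pairs. The multi-turn
    layout is an assumption based on the single-turn example.
    """
    lines = []
    for user, assistant in history:
        lines.append(f"USER: {user}")
        lines.append(f"ASSISTANT: {assistant}")
    lines.append(f"USER: {user_message}")
    lines.append("ASSISTANT:")
    return "\n".join(lines)


def extract_reply(generated_text, prompt):
    """Return only the assistant's reply from prompt + continuation text."""
    # The text-generation pipeline returns the prompt followed by the continuation.
    if generated_text.startswith(prompt):
        reply = generated_text[len(prompt):]
    else:
        reply = generated_text
    # Drop anything after the model starts writing a new user turn.
    return reply.split("\nUSER:", 1)[0].strip()
```

For example, `extract_reply(pipe(prompt)[0]["generated_text"], prompt)` would give just the answer text, where `prompt = build_prompt("Are you AI?")`.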