Can you provide the data processing demo used before training LLMs?
#7
by scall - opened
Data processing demo:
import torch

# Assumes `tokenizer` and `SEQ_LEN` are defined elsewhere and that the pad token id is 0.
def generate_and_tokenize_prompt2(examples, CUTOFF_LEN=SEQ_LEN):
    instruction, input_text = examples['instruction'].strip(), examples['input'].strip()
    # Include the "###" input section only when an input is present.
    if len(input_text) > 0:
        user_prompt = f"User:{instruction}\n###{input_text}\n\nAssistant: \n"
    else:
        user_prompt = f"User:{instruction}\n\nAssistant: \n"
    len_user_prompt_tokens = len(tokenizer(user_prompt, truncation=True, max_length=CUTOFF_LEN + 1)["input_ids"]) - 1  # no eos token
    full_tokens = tokenizer(user_prompt + examples["output"], truncation=True, max_length=CUTOFF_LEN + 1, padding="max_length")["input_ids"][:-1]
    # Mask the prompt and the padding (id 0) so only response tokens contribute to the loss.
    labels = [-100] * len_user_prompt_tokens + [tok if tok != 0 else -100 for tok in full_tokens[len_user_prompt_tokens:]]
    # Note: the attention mask is all ones, so padded positions are attended to as well.
    return {"input_ids": full_tokens, "labels": torch.LongTensor(labels), "attention_mask": torch.LongTensor([1] * len(full_tokens))}
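The label masking in the demo can be checked without a tokenizer. The following is a minimal sketch, not the code used to train manticore-13b: `build_labels_and_mask` is a hypothetical helper, the token ids are made up, and the pad id of 0 is an assumption carried over from the demo. It masks prompt and padding positions with -100 (the default `ignore_index` of PyTorch's cross-entropy loss) and, unlike the demo, also sets the attention mask to 0 on padding.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels_and_mask(full_ids, prompt_len, pad_id=0):
    """Hypothetical helper: mask prompt and padding positions.

    full_ids   -- token ids of prompt + response, padded to a fixed length
    prompt_len -- number of leading tokens that belong to the prompt
    pad_id     -- id used for padding (assumed to be 0, as in the demo)
    """
    labels, attention_mask = [], []
    for pos, tok in enumerate(full_ids):
        is_pad = tok == pad_id
        in_prompt = pos < prompt_len
        # Only response tokens keep their id as a training target.
        labels.append(IGNORE_INDEX if (is_pad or in_prompt) else tok)
        # Padding positions are excluded from attention.
        attention_mask.append(0 if is_pad else 1)
    return labels, attention_mask


# Made-up ids: 4 prompt tokens, 3 response tokens, 2 padding tokens.
full_ids = [1, 15, 16, 17, 42, 43, 2, 0, 0]
labels, attention_mask = build_labels_and_mask(full_ids, prompt_len=4)
print(labels)
print(attention_mask)
```

One caveat with this approach: if the pad id is shared with a real token (for example an unknown-token id), occurrences of that token inside the response are masked too.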
I use this data processing to instruction-tune manticore-13b. The training loss is initially very large, around 78.0, but your training loss is very small (https://wandb.ai/wing-lian/manticore-13b/runs/nq3u3uoh/workspace).
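As a rough reference point for judging a loss value like 78.0: a model that guesses uniformly over a vocabulary of V tokens has a mean per-token cross-entropy of ln V. The sketch below assumes a vocabulary of about 32,000 tokens, which is typical for LLaMA-family tokenizers but is not stated in this thread; `uniform_guess_loss` is a hypothetical helper name.

```python
import math


def uniform_guess_loss(vocab_size):
    """Mean per-token cross-entropy (in nats) of a uniform guess."""
    return math.log(vocab_size)


# Assumed vocabulary size; check tokenizer.vocab_size for the real value.
baseline = uniform_guess_loss(32000)
print(round(baseline, 2))  # 10.37
```

If a logged loss is far above this baseline, it may be worth checking how the logged value is aggregated and whether the labels line up with the inputs. This is only a heuristic, not a diagnosis.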
How high is your learning rate set to?
A single batch has size 48, running on a single node with 8 GPUs. The learning rate is 1e-5, with 200 warmup steps.
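To make the reported settings concrete, here is a minimal sketch of how a learning rate of 1e-5 with 200 warmup steps could evolve. The post does not say which schedule shape is used, so the linear warmup followed by a constant rate is an assumption, and `warmup_lr` is a hypothetical helper rather than a function from any training library.

```python
def warmup_lr(step, peak_lr=1e-5, warmup_steps=200):
    """Learning rate at a given optimizer step.

    Assumption: linear ramp from 0 to peak_lr over warmup_steps,
    then constant. The real run may decay the rate afterwards.
    """
    if step >= warmup_steps:
        return peak_lr
    return peak_lr * step / warmup_steps


for step in (0, 50, 100, 200, 1000):
    print(step, warmup_lr(step))
```

Under this assumption the rate is very small during the first few dozen steps, so the earliest logged losses mostly reflect the starting model and the data pipeline rather than the optimizer settings.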