Instructions to use openaccess-ai-collective/manticore-13b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openaccess-ai-collective/manticore-13b with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="openaccess-ai-collective/manticore-13b")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("openaccess-ai-collective/manticore-13b")
    model = AutoModelForCausalLM.from_pretrained("openaccess-ai-collective/manticore-13b")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use openaccess-ai-collective/manticore-13b with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "openaccess-ai-collective/manticore-13b"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "openaccess-ai-collective/manticore-13b",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker
docker model run hf.co/openaccess-ai-collective/manticore-13b
- SGLang
How to use openaccess-ai-collective/manticore-13b with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "openaccess-ai-collective/manticore-13b" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "openaccess-ai-collective/manticore-13b",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "openaccess-ai-collective/manticore-13b" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "openaccess-ai-collective/manticore-13b",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
- Docker Model Runner
How to use openaccess-ai-collective/manticore-13b with Docker Model Runner:
docker model run hf.co/openaccess-ai-collective/manticore-13b
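The curl calls to the OpenAI-compatible `/v1/completions` endpoint shown above can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_completion_request` and `complete` are hypothetical, the default base URL assumes the vLLM port used above (pass `base_url="http://localhost:30000"` for the SGLang examples), and the response is assumed to follow the usual OpenAI completions shape with a `choices[0]["text"]` field.

```python
import json
import urllib.request

MODEL_ID = "openaccess-ai-collective/manticore-13b"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for /v1/completions.

    The payload mirrors the JSON body used in the curl examples.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first completion's text.

    Requires a server that is already running. Assumes an
    OpenAI-style response body: {"choices": [{"text": ...}, ...]}.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` would return the generated continuation. Building the request separately from sending it makes the payload easy to inspect without a live server.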
base_model: huggyllama/llama-13b
# base_model: /workspace/manticore-13b/
base_model_config: huggyllama/llama-13b
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: false
datasets:
  - path: winglian/evals
    data_files:
      - hf/ARC-Challenge.jsonl
      - hf/ARC-Easy.jsonl
      - mmlu/abstract_algebra.jsonl
      - mmlu/conceptual_physics.jsonl
      - mmlu/formal_logic.jsonl
      - mmlu/high_school_physics.jsonl
      - mmlu/logical_fallacies.jsonl
    type: explainchoice
  - path: winglian/evals
    data_files:
      - openai/tldr.jsonl
    type: summarizetldr
  - path: winglian/evals
    data_files:
      - hellaswag/hellaswag-concise.jsonl
    type: concisechoice
  - path: metaeval/ScienceQA_text_only
    type: concisechoice
  - path: ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered
    type: alpaca
  - path: ehartford/wizard_vicuna_70k_unfiltered
    type: sharegpt
  - path: winglian/chatlogs-en-cleaned
    data_files:
      - sharegpt_cleaned.jsonl
    type: sharegpt
  - path: teknium/GPT4-LLM-Cleaned
    type: alpaca
  - path: teknium/GPTeacher-General-Instruct
    data_files: gpt4-instruct-similarity-0.6-dataset.json
    type: gpteacher
  - path: QingyiSi/Alpaca-CoT
    data_files:
      - Chain-of-Thought/formatted_cot_data/aqua_train.json
      - Chain-of-Thought/formatted_cot_data/creak_train.json
      - Chain-of-Thought/formatted_cot_data/ecqa_train.json
      - Chain-of-Thought/formatted_cot_data/esnli_train.json
      - Chain-of-Thought/formatted_cot_data/gsm8k_train.json
      - Chain-of-Thought/formatted_cot_data/qasc_train.json
      - Chain-of-Thought/formatted_cot_data/qed_train.json
      - Chain-of-Thought/formatted_cot_data/sensemaking_train.json
      - Chain-of-Thought/formatted_cot_data/strategyqa_train.json
      - GPTeacher/Roleplay/formatted_roleplay-similarity_0.6-instruct-dataset.json
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
adapter:
lora_model_dir:
sequence_len: 2048
max_packed_sequence_len: 2048
lora_r:
lora_alpha:
lora_dropout:
lora_target_modules:
lora_fan_in_fan_out:
wandb_project: manticore-13b
wandb_watch:
wandb_run_id:
wandb_log_model:
output_dir: ./manticore-13b
batch_size: 512
micro_batch_size: 8
num_epochs: 4
optimizer:
torchdistx_path:
lr_scheduler:
learning_rate: 0.000032
train_on_inputs: false
group_by_length: false
bf16: true
tf32: true
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention: true
flash_attention:
gptq_groupsize:
gptq_model_v1:
warmup_steps: 20
eval_steps: 10
save_steps:
debug:
deepspeed:
weight_decay: 0
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
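
The config sets `batch_size: 512` and `micro_batch_size: 8`, which implies gradient accumulation. The sketch below works through that arithmetic under two stated assumptions: that `batch_size` is the total number of samples per optimizer step across all devices, and that `micro_batch_size` is the number of samples per device per forward pass. The config does not state how many GPUs were used, so `world_size` is a hypothetical parameter, and the training framework's actual derivation may differ.

```python
def gradient_accumulation_steps(batch_size, micro_batch_size, world_size=1):
    """Micro-batches each device accumulates before one optimizer step.

    Assumes batch_size is the global batch per optimizer step and
    micro_batch_size is per device per forward pass (both assumptions,
    not confirmed by the config itself).
    """
    if batch_size % world_size != 0:
        raise ValueError("batch_size must be divisible by world_size")
    per_device_batch = batch_size // world_size
    if per_device_batch % micro_batch_size != 0:
        raise ValueError("per-device batch must be divisible by micro_batch_size")
    return per_device_batch // micro_batch_size


# Values from the config; the device counts are illustrative only.
print(gradient_accumulation_steps(512, 8))                # 64 on one device
print(gradient_accumulation_steps(512, 8, world_size=8))  # 8 on eight devices
```

Under these assumptions the effective batch stays at 512 samples per step regardless of device count; only the number of accumulated micro-batches per device changes.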