todo
make a model card and put a cute girl on it
some info
Making this public so it can be tried and possibly merged if desired while I work on getting the energy to write a proper card.
Short list of things to know:
- This is a bunch of creative data (RP, story writing, etc.) applied to ToastyPigeon/ms3-roselily-instruct.
- Instruct format: ChatML or Alpaca preferred, Tekken v7 possible
- ChatML tokens were assigned to unused tokens 20 and 21; this leaves all the Tekken tokens intact, so merges w/ Tekken models are feasible
- The instruct-tuning phase did include Tekken v7, so those tokens are initialized and recognized, but I did not continue with it on the creative step because I do not like it for creative stuff (too restrictive with turn order)
- Feels a little less sensitive to samplers than Instruct-based MS3 models, but should probably still be used with conservative samplers
chat templates
You may need to set <|im_end|> and/or </s> as stopping strings depending on which format you're using. The model generates both properly, but tokenizers can be finicky about which tokens they stop on by default.
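Where your backend exposes a stop-sequence option (for example the `stop` field of an OpenAI-compatible completions request), setting both strings there is the simplest fix. If it does not, one fallback is to trim the output on the client side. The sketch below assumes you receive the raw generated text with special tokens left in; `truncate_at_stop` and `STOP_STRINGS` are hypothetical names, not part of any library.

```python
# Hypothetical client-side fallback: cut a raw completion at the first stop string.
# Assumes special tokens are NOT stripped from the text you get back.
STOP_STRINGS = ("<|im_end|>", "</s>")


def truncate_at_stop(text: str, stops: tuple = STOP_STRINGS) -> str:
    """Return text up to (not including) the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        # Keep the earliest match across all stop strings.
        if idx != -1 and idx < cut:
            cut = idx
    return text[:cut]


raw = "She smiled and waved.<|im_end|>\n<|im_start|>user\n"
print(truncate_at_stop(raw))  # She smiled and waved.
```

Checking for both strings regardless of format means the same helper works whether you prompt with ChatML or Alpaca.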
Alpaca w/ System
### System:
{system prompt}
### Instruction:
{user message}
### Response:
{model answer}</s>
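A minimal sketch of filling in the single-turn Alpaca template above in Python. It assumes the sections are separated by single newlines exactly as written and stops the prompt right after `### Response:` so the model writes the answer; `build_alpaca_prompt` is a hypothetical helper. Only the single-turn layout is shown here, so how to chain multiple turns is not covered.

```python
# Hypothetical helper for the "Alpaca w/ System" template.
# Assumption: single newlines between sections, as in the template above.


def build_alpaca_prompt(system_prompt: str, user_message: str) -> str:
    """Return a single-turn Alpaca prompt ending where the model's answer begins."""
    return (
        "### System:\n"
        f"{system_prompt}\n"
        "### Instruction:\n"
        f"{user_message}\n"
        "### Response:\n"  # the model continues from here and should end with </s>
    )


prompt = build_alpaca_prompt("You are a storyteller.", "Write a haiku about rain.")
print(prompt)
```

Because the model closes its answer with </s>, that is the stop string to set when using this format.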
ChatML
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{model answer}<|im_end|>
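A minimal sketch of rendering a conversation in the ChatML layout above, assuming the usual ChatML convention of `<|im_start|>{role}\n{content}<|im_end|>\n` per turn and an open assistant turn at the end for the model to fill in. `build_chatml_prompt` is a hypothetical helper; if the repo's tokenizer ships a ChatML chat template, `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` in Transformers is the sturdier route.

```python
# Hypothetical helper that renders messages in ChatML.
# Assumption: standard ChatML turn layout, as in the template above.


def build_chatml_prompt(messages: list) -> str:
    """Render messages (dicts with 'role' and 'content') and open an assistant turn."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Leave the assistant turn open; the model should close it with <|im_end|>.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a storyteller."},
    {"role": "user", "content": "Write a haiku about rain."},
])
print(prompt)
```

Since the open assistant turn is closed by the model with <|im_end|>, that is the stop string to set when using this format.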
The model also saw some completion training in chat mode and adventure mode.
Base model: mistralai/Mistral-Small-24B-Base-2501