Instructions to use allura-org/MS3-24B-Roselily-Creative with libraries, inference providers, notebooks, and local apps.

Libraries

How to use allura-org/MS3-24B-Roselily-Creative with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="allura-org/MS3-24B-Roselily-Creative")

# Load model directly
# (the base model, Mistral-Small-24B-Base-2501, is text-only, so the causal LM class applies)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("allura-org/MS3-24B-Roselily-Creative")
model = AutoModelForCausalLM.from_pretrained("allura-org/MS3-24B-Roselily-Creative")

Local Apps

vLLM

How to use allura-org/MS3-24B-Roselily-Creative with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "allura-org/MS3-24B-Roselily-Creative"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "allura-org/MS3-24B-Roselily-Creative",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
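
The same request can be made from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on `http://localhost:8000`. The helper names `build_completion_request` and `complete` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

MODEL_ID = "allura-org/MS3-24B-Roselily-Creative"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-compatible completion responses put the text under choices[0].text.
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` should return the continuation; pass `base_url="http://localhost:30000"` to target a server on a different port.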

Use Docker

docker model run hf.co/allura-org/MS3-24B-Roselily-Creative

SGLang

How to use allura-org/MS3-24B-Roselily-Creative with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "allura-org/MS3-24B-Roselily-Creative" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "allura-org/MS3-24B-Roselily-Creative",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "allura-org/MS3-24B-Roselily-Creative" \
        --host 0.0.0.0 \
        --port 30000


todo

make a model card and put a cute girl on it

some info

Making this public so it can be tried and possibly merged if desired while I work on getting the energy to write a proper card.

Short list of things to know:

This is ToastyPigeon/ms3-roselily-instruct with a bunch of creative data (RP, story writing, etc.) trained on top.
Instruct format: ChatML or Alpaca preferred, Tekken v7 possible.
ChatML tokens were assigned to unused tokens 20 and 21. This leaves all the Tekken tokens intact, so merges with Tekken models are feasible.
The instruct-tuning phase did include Tekken v7, so those tokens are initialized and recognized, but I did not continue with it on the creative step because I do not like it for creative stuff (too restrictive with turn order).
Feels a little less sensitive to samplers than Instruct-based MS3 models, but should probably still be used with conservative samplers.
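
To see how the ChatML markers are mapped in a loaded tokenizer, something like the sketch below could be used. It assumes "tokens 20 and 21" refers to vocabulary IDs and does not assume which marker got which ID. The helper names are hypothetical; `convert_tokens_to_ids` and `encode` are the standard Transformers tokenizer methods.

```python
CHATML_TOKENS = ("<|im_start|>", "<|im_end|>")


def chatml_token_ids(tokenizer):
    """Map each ChatML marker to the ID the tokenizer assigns it."""
    ids = tokenizer.convert_tokens_to_ids(list(CHATML_TOKENS))
    return dict(zip(CHATML_TOKENS, ids))


def is_single_token(tokenizer, marker):
    """True if the marker encodes to exactly one token (no BOS/EOS added)."""
    return len(tokenizer.encode(marker, add_special_tokens=False)) == 1


# Usage (downloads the tokenizer, so it is left commented out here):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("allura-org/MS3-24B-Roselily-Creative")
# print(chatml_token_ids(tok))
# print(all(is_single_token(tok, m) for m in CHATML_TOKENS))
```

If a marker does not encode to a single token, the tokenizer in use is probably not the one shipped with this model.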

chat templates

You may need to set <|im_end|> and/or </s> as stopping strings, depending on which format you're using. The model generates both properly, but tokenizers can be finicky about what they stop on by default.
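
If the frontend or server does not stop on these strings by itself, the output can be trimmed after the fact. This is a minimal sketch of that idea; `truncate_at_stop` is a hypothetical helper, not part of any library.

```python
STOP_STRINGS = ("<|im_end|>", "</s>")


def truncate_at_stop(text, stop_strings=STOP_STRINGS):
    """Cut text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


print(truncate_at_stop("The door creaked open.<|im_end|>\n<|im_start|>user"))
```

OpenAI-compatible completion endpoints also accept a `stop` list in the request body, which avoids generating the extra tokens in the first place.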

Alpaca w/ System

### System:
{system prompt}

### Instruction:
{user message}

### Response:
{model answer}</s>
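
A prompt in this layout can be assembled with a small helper. The sketch below follows the template above and stops right after the response header so the model writes the answer. Leaving out the system block when no system prompt is given is an assumption, and `build_alpaca_prompt` is a hypothetical name.

```python
def build_alpaca_prompt(user_message, system_prompt=None):
    """Format one turn in the Alpaca-with-system layout, ready for the model's answer."""
    parts = []
    if system_prompt:
        # Assumption: the system block is simply omitted when there is no system prompt.
        parts.append(f"### System:\n{system_prompt}\n")
    parts.append(f"### Instruction:\n{user_message}\n")
    parts.append("### Response:\n")
    # Joining with a newline produces the blank line between blocks.
    return "\n".join(parts)


print(build_alpaca_prompt("Describe the tavern.", "You are a narrator."))
```

With this format, `</s>` is the string to stop on.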

ChatML

<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{model answer}<|im_end|>
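
The ChatML layout can be rendered from a list of messages in the same way. This is a sketch that mirrors the template above; `build_chatml_prompt` and its `add_generation_prompt` flag are hypothetical names, and whether a BOS token gets prepended is left to the tokenizer.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as ChatML text."""
    rendered = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open the assistant turn so the model writes the reply.
        rendered += "<|im_start|>assistant\n"
    return rendered


messages = [
    {"role": "system", "content": "You are a narrator."},
    {"role": "user", "content": "Describe the tavern."},
]
print(build_chatml_prompt(messages))
```

With this format, `<|im_end|>` is the string to stop on.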

The model also saw some completion training in chat mode and adventure mode.

Safetensors
Model size: 24B params
Tensor type: F16

Model tree for allura-org/MS3-24B-Roselily-Creative

Base model: mistralai/Mistral-Small-24B-Base-2501
Finetuned: ToastyPigeon/ms3-roselily-instruct
Finetuned (2): this model
Merges: 2 models
Quantizations: 4 models