pagestorm-research-preview-14b-full-book-mlx-8bit
An MLX 8-bit quantization of Pageshift-Entertainment/pagestorm-research-preview-14b-full-book, a 13.5B-parameter staged full-book story generator (Mistral ministral3 architecture, 262K context).
Converted with mlx-lm 0.31.3:
mlx_lm.convert --hf-path Pageshift-Entertainment/pagestorm-research-preview-14b-full-book \
    -q --q-bits 8 --q-group-size 64 --mlx-path pagestorm-14b-mlx-8bit
- 8.5 bits/weight, ~13 GB on disk (down from ~27 GB bf16)
- ~38 tok/s, ~14.4 GB peak memory on Apple Silicon
- Chat template, tekken tokenizer, and the original story_stage_generation.py helper are retained
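The 8.5 bits/weight figure can be reproduced with a little arithmetic. The sketch below assumes that each group of 64 weights carries one 16-bit scale and one 16-bit bias (32 bits of overhead per group), and that every parameter is quantized; both are simplifying assumptions, and the function names are illustrative rather than part of mlx-lm.

```python
def effective_bits_per_weight(q_bits: int, group_size: int,
                              overhead_bits: int = 32) -> float:
    """Bits per weight once per-group metadata is amortized.

    overhead_bits=32 assumes one 16-bit scale plus one 16-bit bias
    per group (an assumption, not read from the checkpoint).
    """
    return q_bits + overhead_bits / group_size


def size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


bpw = effective_bits_per_weight(q_bits=8, group_size=64)
print(bpw)                               # 8.5
print(round(size_gb(13.5e9, bpw), 1))    # 14.3 (quantized estimate)
print(round(size_gb(13.5e9, 16), 1))     # 27.0 (bf16 baseline)
```

The quantized estimate comes out slightly above the ~13 GB quoted above; the gap is consistent with sizes being reported in binary units (GiB), though that is a guess rather than something the card states.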
How it works — staged generation
This is not a chat model. Generation is a chain of stage-named roles, each rendered as a
<|start_header_id|>{stage}<|stop_header_id|> block terminated by <|eot_id|>:
prompt → book_preview → book_plan → first_chapter_plan → first_chapter_text
→ full_book_chapters_plan → book_characters_list → scene_breakdown → chapter_text
You drive it stage by stage: feed the prior stages, then open the next stage header and let the
model fill it. Minimal example generating the book_preview from a one-line idea:
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tok = load("skibare87/pagestorm-research-preview-14b-full-book-mlx-8bit")

idea = "A heist thriller aboard a generation ship where the target is the only seed vault."

# Close the "prompt" stage, then open the "book_preview" header for the model to fill.
prompt = (f"<|start_header_id|>prompt<|stop_header_id|>\n\n{idea}<|eot_id|>"
          f"<|start_header_id|>book_preview<|stop_header_id|>\n\n")

# verbose=True would stream the text as it is generated; it is omitted here
# so that print() does not show the same text a second time.
print(generate(model, tok, prompt, max_tokens=512,
               sampler=make_sampler(temp=0.8, top_p=0.95)))
<|eot_id|> is the EOS token, so each stage stops on its own. See the upstream
story_stage_generation.py for the full multi-chapter assembly logic (per-chapter scene
breakdowns and chapter text).
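The stage-by-stage loop described above can be sketched as a small prompt builder. This is a minimal illustration, not the upstream logic: the helper names (render_stage, build_prompt, run_chain) are hypothetical, it only covers the linear stages, and it assumes each new stage is conditioned on all earlier stages verbatim. The model call is passed in as a plain callable so the prompt assembly can be checked without loading the model.

```python
from typing import Callable, List, Tuple

HEADER = "<|start_header_id|>{stage}<|stop_header_id|>\n\n"
EOT = "<|eot_id|>"


def render_stage(stage: str, text: str) -> str:
    """Render one completed stage block, terminated by <|eot_id|>."""
    return HEADER.format(stage=stage) + text + EOT


def build_prompt(done: List[Tuple[str, str]], next_stage: str) -> str:
    """Concatenate completed stages, then open the next stage header."""
    return "".join(render_stage(s, t) for s, t in done) + HEADER.format(stage=next_stage)


def run_chain(idea: str, stages: List[str],
              generate_fn: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Fill each stage in order, feeding all prior stages back in.

    generate_fn is any callable mapping a prompt string to generated text.
    """
    done = [("prompt", idea)]
    for stage in stages:
        text = generate_fn(build_prompt(done, stage)).strip()
        done.append((stage, text))
    return done


if __name__ == "__main__":
    # Stub generator so the sketch runs without the model.
    demo = run_chain("A heist on a generation ship.",
                     ["book_preview", "book_plan"],
                     lambda prompt: f"[{len(prompt)} prompt chars]")
    for stage, text in demo:
        print(stage, "->", text)
```

With the model loaded as in the example above, generate_fn could be something like lambda p: generate(model, tok, p, max_tokens=2048, sampler=make_sampler(temp=0.8, top_p=0.95)); the max_tokens value there is an arbitrary placeholder. The per-chapter scene_breakdown and chapter_text stages need the looping logic in story_stage_generation.py and are not covered by this sketch.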
Credit & license
All credit to Pageshift-Entertainment for the original model, the LongPage dataset, and the accompanying research. Apache-2.0, same as the base model. This is an unofficial quantization for local MLX inference.