pagestorm-research-preview-14b-full-book-mlx-8bit

An MLX 8-bit quantization of Pageshift-Entertainment/pagestorm-research-preview-14b-full-book: a 13.5B-parameter staged full-book story generator built on Mistral's ministral3 architecture, with a 262K-token context window.

Converted with mlx-lm 0.31.3:

mlx_lm.convert --hf-path Pageshift-Entertainment/pagestorm-research-preview-14b-full-book \
  -q --q-bits 8 --q-group-size 64 --mlx-path pagestorm-14b-mlx-8bit
  • 8.5 bits/weight, ~13 GB on disk (down from ~27 GB bf16)
  • ~38 tok/s, ~14.4 GB peak memory on Apple Silicon
  • Chat template, tekken tokenizer, and the original story_stage_generation.py helper are retained
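The 8.5 bits/weight figure follows from group-wise affine quantization. With --q-group-size 64, every 64 weights share one scale and one bias. The sketch below assumes both are stored as 16-bit values, as they would be for a bf16 source. It is back-of-the-envelope arithmetic and does not read the converted files; the function name is hypothetical.

```python
def effective_bits_per_weight(q_bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Storage cost per weight under group-wise affine quantization.

    Each group of `group_size` weights shares one scale and one bias,
    assumed here to be stored at `scale_bits` precision (16 for a bf16 source).
    """
    overhead = 2 * scale_bits / group_size  # scale + bias, amortized over the group
    return q_bits + overhead


bits = effective_bits_per_weight(q_bits=8, group_size=64)
print(bits)                 # 8.5
print(round(bits / 16, 3))  # fraction of the bf16 per-weight footprint
```

Under the same assumption, a 4-bit conversion with the same group size would cost 4.5 bits/weight. Smaller groups raise the overhead: a group size of 32 makes the 8-bit cost 9.0.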

How it works — staged generation

This is not a chat model. Generation does not alternate between conversational roles. Instead, it runs through a chain of named stages. Each stage is rendered as a <|start_header_id|>{stage}<|stop_header_id|> block and terminated by <|eot_id|>:

prompt → book_preview → book_plan → first_chapter_plan → first_chapter_text
       → full_book_chapters_plan → book_characters_list → scene_breakdown → chapter_text
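The stage order and header format above can be captured in a few lines. The sketch below is a hypothetical prompt builder, not code from the upstream story_stage_generation.py. It mirrors the blank line after each header used in the example prompt further down. Because the upstream script produces scene breakdowns and chapter text per chapter, the tuple records only the order in which each stage name first appears.

```python
# Stage names in the order the chain above lists them.
STAGES = (
    "prompt", "book_preview", "book_plan", "first_chapter_plan",
    "first_chapter_text", "full_book_chapters_plan", "book_characters_list",
    "scene_breakdown", "chapter_text",
)


def header(stage: str) -> str:
    """Open a stage block; the trailing blank line matches the example prompt."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return f"<|start_header_id|>{stage}<|stop_header_id|>\n\n"


def build_prompt(completed: list[tuple[str, str]], next_stage: str) -> str:
    """Render finished stages as closed blocks, then leave next_stage open."""
    closed = "".join(header(s) + text + "<|eot_id|>" for s, text in completed)
    return closed + header(next_stage)


print(build_prompt([("prompt", "A heist thriller aboard a generation ship.")], "book_preview"))
```

The unknown-stage check is a design choice for catching typos in stage names early. The model itself would simply see an unfamiliar header.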

You drive it stage by stage: feed the prior stages, then open the next stage header and let the model fill it. Minimal example generating the book_preview from a one-line idea:

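from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Downloads (on first use) and loads the 8-bit weights and tokenizer.
model, tok = load("skibare87/pagestorm-research-preview-14b-full-book-mlx-8bit")

idea = "A heist thriller aboard a generation ship where the target is the only seed vault."

# A closed `prompt` stage, then an open `book_preview` header for the model to fill.
prompt = (f"<|start_header_id|>prompt<|stop_header_id|>\n\n{idea}<|eot_id|>"
          f"<|start_header_id|>book_preview<|stop_header_id|>\n\n")

# verbose=True already streams the text and prints timing stats, so the
# return value is kept in a variable instead of being printed a second time.
preview = generate(model, tok, prompt, max_tokens=512,
                   sampler=make_sampler(temp=0.8, top_p=0.95), verbose=True)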

<|eot_id|> is the EOS token, so each stage stops on its own. See the upstream story_stage_generation.py for the full multi-chapter assembly logic (per-chapter scene breakdowns and chapter text).
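The same pattern extends to several stages in a loop. Each finished stage is closed with <|eot_id|> and fed back as context before the next header is opened. The sketch below is a hypothetical helper: run_stages, OPENING_STAGES and the injected generate_fn are illustrative names and are not part of mlx-lm or the upstream script. It covers only the linear opening of the chain. It treats the model as a plain callable, so it can be exercised without loading weights.

```python
from typing import Callable, List, Sequence, Tuple

EOT = "<|eot_id|>"

# Hypothetical: the linear opening of the stage chain. The per-chapter
# stages are left to the upstream story_stage_generation.py.
OPENING_STAGES = ("book_preview", "book_plan", "first_chapter_plan", "first_chapter_text")


def header(stage: str) -> str:
    return f"<|start_header_id|>{stage}<|stop_header_id|>\n\n"


def run_stages(generate_fn: Callable[[str], str], idea: str,
               stages: Sequence[str] = OPENING_STAGES) -> List[Tuple[str, str]]:
    """Fill each stage in turn, feeding every finished stage back as context.

    generate_fn receives the full prompt (ending in an open header) and
    returns the model's continuation for that stage.
    """
    history: List[Tuple[str, str]] = [("prompt", idea)]
    for stage in stages:
        # Close every finished stage with <|eot_id|>, then open the next one.
        prompt = "".join(header(s) + text + EOT for s, text in history)
        prompt += header(stage)
        # Defensive: strip the terminator if the decoder happened to keep it.
        text = generate_fn(prompt).removesuffix(EOT)
        history.append((stage, text))
    return history


# Example wiring to mlx-lm (needs Apple Silicon and the downloaded weights).
# The max_tokens value is an assumption; chapter-length stages need far more:
#
#   from mlx_lm import load, generate
#   from mlx_lm.sample_utils import make_sampler
#   model, tok = load("skibare87/pagestorm-research-preview-14b-full-book-mlx-8bit")
#   sampler = make_sampler(temp=0.8, top_p=0.95)
#   book = run_stages(
#       lambda p: generate(model, tok, p, max_tokens=4096, sampler=sampler),
#       "A heist thriller aboard a generation ship where the target is the only seed vault.",
#   )
```

Injecting the generator keeps the prompt bookkeeping separate from model loading. This also makes it easy to swap in different sampling settings per stage.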

Credit & license

All credit goes to Pageshift-Entertainment for the original model, the LongPage dataset, and the accompanying research. Licensed under Apache-2.0, the same as the base model. This is an unofficial quantization for local MLX inference.

