metadata
license: apache-2.0
language:
  - en
base_model: Pageshift-Entertainment/pagestorm-research-preview-14b-full-book
base_model_relation: quantized
tags:
  - gguf
  - llama.cpp
  - story-generation
  - staged-generation
  - full-book
  - ministral3
pipeline_tag: text-generation

PageStorm Research Preview 14B Full Book — GGUF

GGUF quantizations of Pageshift-Entertainment/pagestorm-research-preview-14b-full-book, a ministral3 model trained to produce a full novel from a single prompt via a staged generation pipeline.

Files

  • pagestorm-research-preview-14b-full-book-Q8_0.gguf (~14 GB)
  • pagestorm-research-preview-14b-full-book-Q4_K_M.gguf (~7.7 GB)

Requirements

  • A llama.cpp build whose runtime supports the mistral3 architecture (llm_build_mistral3 / LLM_ARCH_MISTRAL3). Older builds will fail to load it.

Notes

  • The Q8_0 file was converted with convert_hf_to_gguf.py --outtype q8_0.
  • The Q4_K_M file was quantized from a temporary BF16 GGUF exported from the original BF16 Hugging Face checkpoint, not requantized from Q8_0.
  • The source config.json needed original_max_position_embeddings changed from the float 16384.0 to the integer 16384 so that the converter could write the integer-typed rope KV field.
  • The model uses a staged protocol with custom role headers (<|start_header_id|>…<|stop_header_id|>) and <|eot_id|> as the stage stop token, so it is not a plain chat model. See the base model card and its story_stage_generation.py for the prompt protocol.
  • Native context is 262144 tokens. The KV cache at that length is large, so quantize it (--cache-type-k q8_0 --cache-type-v q8_0) and/or cap --ctx-size to fit VRAM.
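
The config.json fix described in the notes above could be reproduced with something like the following. This is a minimal sketch, not the script actually used for this repo. It assumes only the field name original_max_position_embeddings given above; where that field sits inside config.json is not stated here, so the sketch searches the whole file. The helper names coerce_integral_floats and patch_config are hypothetical.

```python
import json
from pathlib import Path

# Field named in the notes above; its nesting inside config.json is an assumption,
# so the search below is recursive rather than tied to one location.
KEY = "original_max_position_embeddings"


def coerce_integral_floats(node, key=KEY):
    """Recursively replace integral float values stored under `key` with ints.

    Returns the number of values changed. Non-integral floats are left alone.
    """
    changed = 0
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key and isinstance(value, float) and value.is_integer():
                node[name] = int(value)  # e.g. 16384.0 -> 16384
                changed += 1
            else:
                changed += coerce_integral_floats(value, key)
    elif isinstance(node, list):
        for item in node:
            changed += coerce_integral_floats(item, key)
    return changed


def patch_config(path):
    """Rewrite a config.json in place; return how many fields were fixed."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8"))
    changed = coerce_integral_floats(config)
    if changed:
        path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return changed
```

Calling patch_config on a local copy of the checkpoint's config.json before running convert_hf_to_gguf.py should be enough. A return value of 0 means the field was already an integer or was not found.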
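
To get a feel for why the KV cache note above matters, a rough size estimate can be computed from the attention shape. The sketch below assumes full (non-windowed) attention and uses PLACEHOLDER layer and head counts that are not taken from this model's config.json; substitute the real values before relying on the numbers. The function name kv_cache_bytes is hypothetical. The q8_0 figure follows from its block layout of 32 int8 values plus one 16-bit scale.

```python
# Approximate bytes per cached element for two llama.cpp cache types.
# q8_0 stores blocks of 32 int8 values plus one 16-bit scale: 34 bytes per 32 elements.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type="f16"):
    """Estimate combined K and V cache size in bytes for full attention."""
    per_token = 2 * n_layers * n_kv_heads * head_dim  # factor 2: one K and one V entry
    return int(per_token * n_ctx * BYTES_PER_ELEMENT[cache_type])


if __name__ == "__main__":
    # PLACEHOLDER shape, NOT read from this model: replace with the values
    # from the base model's config.json.
    shape = dict(n_layers=40, n_kv_heads=8, head_dim=128)
    for n_ctx in (32768, 262144):
        for cache_type in ("f16", "q8_0"):
            size = kv_cache_bytes(n_ctx=n_ctx, cache_type=cache_type, **shape)
            print(f"ctx={n_ctx:>6} {cache_type:>5}: {size / 2**30:.1f} GiB")
```

The estimate ignores compute buffers and the model weights themselves, so treat it as a lower bound when choosing --ctx-size.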

Attribution

Base model © Pageshift Entertainment, Apache-2.0. This repo only redistributes a quantized copy of those weights.