---
license: apache-2.0
language:
- en
base_model: Pageshift-Entertainment/pagestorm-research-preview-14b-full-book
base_model_relation: quantized
tags:
- gguf
- llama.cpp
- story-generation
- staged-generation
- full-book
- ministral3
pipeline_tag: text-generation
---
# PageStorm Research Preview 14B Full Book — GGUF
GGUF quantizations of
[Pageshift-Entertainment/pagestorm-research-preview-14b-full-book](https://huggingface.co/Pageshift-Entertainment/pagestorm-research-preview-14b-full-book),
a `ministral3` model trained to produce a full novel from a single prompt via a
staged generation pipeline.
## Files
- `pagestorm-research-preview-14b-full-book-Q8_0.gguf` (~14 GB)
- `pagestorm-research-preview-14b-full-book-Q4_K_M.gguf` (~7.7 GB)
## Requirements
- A llama.cpp build whose runtime supports the **`mistral3`** architecture
  (`llm_build_mistral3` / `LLM_ARCH_MISTRAL3`). Builds without it will fail to load these files.
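Before loading, you can check which architecture a file declares by reading its `general.architecture` metadata key. The following is a minimal stdlib-only sketch of a GGUF header reader (GGUF v2+ layout: magic, version, tensor count, KV count, then typed key/value pairs). `gguf_architecture` is a hypothetical helper, not part of llama.cpp; the `gguf` Python package in llama.cpp's `gguf-py` offers a full reader (`GGUFReader`).

```python
import struct
from typing import Any, BinaryIO

# GGUF metadata value type codes mapped to little-endian struct formats.
_SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    # GGUF strings are a uint64 byte length followed by UTF-8 bytes.
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("truncated GGUF string")
    return data.decode("utf-8")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        # Arrays carry their element type and a uint64 element count.
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    return _read(f, _SCALAR_FORMATS[vtype])


def gguf_architecture(f: BinaryIO) -> str:
    """Return the `general.architecture` value from an open GGUF stream."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:  # v1 used 32-bit counts; not handled in this sketch
        raise ValueError(f"unsupported GGUF version {version}")
    _read(f, "<Q")  # tensor count (unused here)
    kv_count = _read(f, "<Q")
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "general.architecture":
            return value
    raise KeyError("general.architecture not found")
```

Usage: `with open("pagestorm-research-preview-14b-full-book-Q4_K_M.gguf", "rb") as f: print(gguf_architecture(f))`. Per the requirement above, this should report `mistral3`. The key is normally written first, so the loop usually returns before reaching large arrays such as the tokenizer vocabulary.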
## Notes
- The Q8_0 file was converted with `convert_hf_to_gguf.py --outtype q8_0`.
- The Q4_K_M file was quantized from a temporary BF16 GGUF exported from the
original BF16 Hugging Face checkpoint, not requantized from Q8_0.
- The source `config.json` stored `original_max_position_embeddings` as the float
  `16384.0`; it was changed to the integer `16384` so the converter could write the
  corresponding RoPE metadata (KV) field as an integer.
- The model uses a **staged** protocol with custom role headers
(`<|start_header_id|>…<|stop_header_id|>`) and `<|eot_id|>` as the stage stop
token — it is not a plain chat model. See the base model card and its
`story_stage_generation.py` for the prompt protocol.
- The native context length is 262,144 tokens, and the KV cache at that length is very
  large. Quantize the KV cache (`--cache-type-k q8_0 --cache-type-v q8_0`) and/or cap
  `--ctx-size` so it fits in VRAM.
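The two export paths in the notes can be reproduced roughly as follows. This sketch only builds the command lines: Q8_0 written directly by `convert_hf_to_gguf.py`, and Q4_K_M quantized by `llama-quantize` from a temporary BF16 GGUF. The local paths, output filenames, the use of `--outfile`, and `llama-quantize` being on `PATH` (a CMake build usually places it in `build/bin/`) are all assumptions.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical local paths; adjust to your llama.cpp checkout and HF download.
LLAMA_CPP = Path("llama.cpp")
HF_MODEL_DIR = Path("pagestorm-research-preview-14b-full-book")
STEM = "pagestorm-research-preview-14b-full-book"


def conversion_commands(llama_cpp: Path, hf_dir: Path, stem: str) -> list[list[str]]:
    """Build argv lists for the Q8_0 and Q4_K_M exports described in the notes."""
    converter = str(llama_cpp / "convert_hf_to_gguf.py")
    bf16 = f"{stem}-BF16.gguf"  # temporary intermediate; delete after quantizing
    return [
        # Q8_0: written directly by the converter.
        [sys.executable, converter, str(hf_dir),
         "--outtype", "q8_0", "--outfile", f"{stem}-Q8_0.gguf"],
        # Q4_K_M, step 1: export the BF16 checkpoint to a BF16 GGUF.
        [sys.executable, converter, str(hf_dir),
         "--outtype", "bf16", "--outfile", bf16],
        # Q4_K_M, step 2: quantize from BF16 (not from the Q8_0 file).
        ["llama-quantize", bf16, f"{stem}-Q4_K_M.gguf", "Q4_K_M"],
    ]


def run_all(commands: list[list[str]]) -> None:
    for argv in commands:
        subprocess.run(argv, check=True)  # raise on the first failing step
```

Quantizing from BF16 rather than from Q8_0 avoids stacking two rounds of quantization error on the Q4_K_M weights.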
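The `config.json` fix from the notes can be applied programmatically. The notes don't say where in the file the key sits (top level or nested under a sub-config), so this minimal sketch searches the whole JSON tree and converts only integral float values of `original_max_position_embeddings` to `int`. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def intify_rope_field(node: Any, key: str = "original_max_position_embeddings") -> int:
    """Recursively replace integral float values of `key` with ints; return the count changed."""
    changed = 0
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, float) and v.is_integer():
                node[k] = int(v)  # e.g. 16384.0 -> 16384
                changed += 1
            else:
                changed += intify_rope_field(v, key)
    elif isinstance(node, list):
        for item in node:
            changed += intify_rope_field(item, key)
    return changed


def patch_config(path: Path) -> int:
    """Rewrite config.json in place only if something changed."""
    config = json.loads(path.read_text(encoding="utf-8"))
    changed = intify_rope_field(config)
    if changed:
        path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return changed
```

Run `patch_config(Path(".../config.json"))` on the downloaded checkpoint before invoking the converter; a return value of `0` means there was nothing to fix.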
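Because the model is driven stage by stage rather than through a chat template, each stage is a raw completion that should end at `<|eot_id|>`. This sketch does not reproduce the staged prompt format, which is defined in the base model's `story_stage_generation.py`; `prompt` must be assembled per that script. It only shows the per-stage mechanics against `llama-server`'s `/completion` endpoint: `<|eot_id|>` is passed as a stop string, and the returned text is also cut client-side. Whether the token is surfaced as text or treated as end-of-generation depends on the GGUF's tokenizer metadata. The server URL and `n_predict` value are assumptions.

```python
import json
import urllib.request

STAGE_STOP = "<|eot_id|>"  # stage stop token named in the notes


def build_stage_request(prompt: str, n_predict: int = 4096) -> dict:
    """Payload for llama-server's raw /completion endpoint (no chat template applied)."""
    return {"prompt": prompt, "n_predict": n_predict, "stop": [STAGE_STOP]}


def cut_at_stop(text: str, stop: str = STAGE_STOP) -> str:
    """Keep only the text before the first stop token, in case it appears in the output."""
    return text.split(stop, 1)[0]


def run_stage(prompt: str, url: str = "http://127.0.0.1:8080/completion") -> str:
    """Send one stage to a running llama-server and return its cleaned output."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_stage_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return cut_at_stop(json.load(resp)["content"])
```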
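To choose `--ctx-size`, you can estimate KV cache memory as context × layers × KV heads × head dimension × (bytes per K element + bytes per V element). This sketch assumes every layer caches the full context (no sliding-window layers) and that K and V share one head dimension. It also ignores compute buffers and the weights themselves. The layer and head numbers are placeholders, not this model's values; read the real ones from the GGUF metadata (`mistral3.block_count`, `mistral3.attention.head_count_kv`, `mistral3.attention.key_length`).

```python
# Bytes per cached element. q8_0 stores blocks of 32 int8 values plus one fp16 scale,
# i.e. 34 bytes per 32 elements.
BYTES_PER_ELEM = {"f16": 2.0, "q8_0": 34 / 32}


def kv_cache_bytes(n_ctx: int, n_layers: int, n_kv_heads: int, head_dim: int,
                   k_type: str = "f16", v_type: str = "f16") -> float:
    """Estimated K+V cache size in bytes for n_ctx tokens."""
    elems_per_token = n_layers * n_kv_heads * head_dim  # for K; V has the same count
    return n_ctx * elems_per_token * (BYTES_PER_ELEM[k_type] + BYTES_PER_ELEM[v_type])


def max_ctx_for_budget(budget_bytes: float, n_layers: int, n_kv_heads: int, head_dim: int,
                       k_type: str = "f16", v_type: str = "f16") -> int:
    """Largest context whose estimated KV cache fits in budget_bytes."""
    per_token = kv_cache_bytes(1, n_layers, n_kv_heads, head_dim, k_type, v_type)
    return int(budget_bytes // per_token)


# Placeholder dimensions for illustration only.
PLACEHOLDER = dict(n_layers=40, n_kv_heads=8, head_dim=128)
```

For example, `max_ctx_for_budget(8 * 2**30, **PLACEHOLDER, k_type="q8_0", v_type="q8_0")` gives the largest context that fits an 8 GiB KV budget under those placeholder dimensions. Whatever the real dimensions, q8_0 K/V uses 34/64 (about 53%) of the f16 footprint.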
## Attribution
Base model © Pageshift Entertainment, Apache-2.0. This repo only redistributes a
quantized copy of those weights.