🚧 WORK IN PROGRESS — v1 EARLY RELEASE 🚧
A bigger, better model (v2) is in development.
This page is the v1 early release of book-builder-bookwriter-v1. It is not the final model. Skim the rest of this card before you generate anything so you know what you're getting and what you're not.
book-builder-bookwriter-v1
A 7.6B-parameter prose-writing model fine-tuned on 10,211 human-authored novels (310,316 chapters, ~1.82 billion tokens) with a strict story-bible-to-chapter format. Built for BookBuilder to generate novel chapters from structured story bibles.
⚠ Work in Progress (v1, early release)
This is an early checkpoint of an ongoing training run. Training was paused at step 5000 / 9697 (~52% of one epoch) so the artifacts could be released publicly while the larger run continues.
Known limitations of v1:
- Treats bibles as style prompts, not strict plot instructions. Expect drift from the synopsis on one-shot generation.
- Does not reliably follow "FORBIDDEN" rules, character role assignments, or per-character constraints in rich BookBuilder-style bibles.
- Loaded keywords in synopses (proper names like "Stillwater", specific years like "1947") can trigger off-topic associations from the training corpus.
- On longer generations, may drift into paragraph-level repetition loops without the tuned sampling defaults in ollama/Modelfile.
What's coming:
- v2 (Q3/Q4 2026): Larger base model (Qwen 2.5 14B or 32B) + synthesized instruction-following data so the model can actually obey FORBIDDEN sections, distinguish protagonists from antagonists, and follow beat sheets. This addresses the main v1 limitation.
- v1 continuation to step 9697: the LoRA may be taken to the full single-epoch checkpoint and re-released as book-builder-bookwriter-v1.1 if testing shows it's worth the additional training compute. The resumable training state is preserved on branch resumable-step-5000.
Best use for v1 right now: as a prose-style backbone inside a structured pipeline (like BookBuilder itself) that provides per-chapter beat sheets and plot anchors at generation time. The pipeline supplies the discipline the v1 model can't enforce on its own. For one-shot "give me a chapter from a synopsis" use, v1 produces readable prose but will frequently drift from the intended plot.
This is NOT a chat model. Do not prompt it like ChatGPT. See "How to use" below.
What this model does
1. You write a Story Bible in the format shown below (or fill in the template).
2. You give the model the bible followed by a ### Chapter header and your chapter title.
3. The model writes the chapter prose.
It will not answer questions. It will not respond to "Write me a story about X." It only continues prose conditioned on the bible context.
Format the model expects
Every training example looked exactly like this:
### Bible
Title: [book title]
Author: [author name]
Genre: [genre]
Publisher: [publisher]
Synopsis: [1-3 paragraphs describing the book]
### Genre
[genre]
### Chapter
[chapter title]
[chapter prose...]
Your prompt MUST end with ### Chapter\n[chapter title]\n\n (the header line, the title line, then one blank line); the model fills in the prose from there.
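If you generate bibles programmatically, a small helper keeps the layout above exact. This is a minimal sketch, not part of the released repo: the function name build_prompt and the sample values are hypothetical, and it assumes sections are separated by single newlines exactly as rendered above (compare against bible_template.txt before relying on it).

```python
def build_prompt(title: str, author: str, genre: str, publisher: str,
                 synopsis: str, chapter_title: str) -> str:
    """Assemble a story-bible prompt in the training layout (hypothetical helper).

    Assumes single newlines between lines and sections, as in the format
    shown in this card. The result ends with the chapter title plus one
    blank line so the model starts writing prose immediately.
    """
    lines = [
        "### Bible",
        f"Title: {title}",
        f"Author: {author}",
        f"Genre: {genre}",
        f"Publisher: {publisher}",
        f"Synopsis: {synopsis.strip()}",
        "### Genre",
        genre,
        "### Chapter",
        chapter_title,
    ]
    # "\n".join leaves no trailing newline, so append "\n\n": one newline
    # ends the title line, the second produces the required blank line.
    return "\n".join(lines) + "\n\n"


# Hypothetical example values
prompt = build_prompt(
    title="The Long Road",
    author="Jane Doe",
    genre="Western",
    publisher="Example Press",
    synopsis="A retired marshal is pulled back into one last job.",
    chapter_title="Chapter 1: The Long Drive Home",
)
```

Writing the prompt to a file with open(path, "w", encoding="utf-8") gives you a your_bible.txt usable with any of the options below.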
How to use
Option 1: Ollama (easiest)
IMPORTANT: A plain ollama pull from HF discards any Modelfile parameters in the repo, so Ollama runs the model with its default sampling — which on long completion prompts causes repetition loops and over-long generations. Use the setup script below to register the model under the name bookbuilder with tuned defaults that prevent both problems.
One-shot setup:
curl -sSL https://huggingface.co/Fordentinc/book-builder-bookwriter-v1/resolve/main/ollama/setup_ollama.sh | bash
# Optional: pass a quant tag, default is Q5_K_M
# curl -sSL https://huggingface.co/Fordentinc/book-builder-bookwriter-v1/resolve/main/ollama/setup_ollama.sh | bash -s Q8_0
Then:
ollama run bookbuilder < your_bible.txt
The script pulls the GGUF, builds a local Modelfile with the right repeat_penalty 1.18, num_predict 2500, and stop tokens, and registers the result as bookbuilder.
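If you'd rather pull manually (without the loop fix), you need to set the sampling parameters yourself every session. ollama run does not take sampling flags, so set them from inside the interactive session:
ollama pull hf.co/Fordentinc/book-builder-bookwriter-v1:Q5_K_M
ollama run hf.co/Fordentinc/book-builder-bookwriter-v1:Q5_K_M
/set parameter num_predict 2500
/set parameter repeat_penalty 1.18
/set parameter temperature 0.75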
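Another way to apply the tuned values on every request is Ollama's local REST API. Its /api/generate endpoint accepts an options object, plus a raw flag that skips any prompt template, which matters for a completion-only model. A minimal standard-library sketch: it assumes Ollama listens on its default port 11434 and that the model was registered as bookbuilder by the setup script (substitute the hf.co/... tag if you pulled manually). The helper names are hypothetical, and the stop tokens the setup script adds are not reproduced here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_ollama_request(prompt: str, model: str = "bookbuilder") -> dict:
    """Build a /api/generate payload carrying the card's tuned sampling values."""
    return {
        "model": model,
        "prompt": prompt,
        "raw": True,       # send the bible verbatim; no chat template applied
        "stream": False,   # return one JSON object instead of a token stream
        "options": {
            "num_predict": 2500,
            "repeat_penalty": 1.18,
            "temperature": 0.75,
        },
    }


def generate_chapter(prompt: str, model: str = "bookbuilder") -> str:
    """POST the payload and return the generated prose (needs a running Ollama)."""
    body = json.dumps(build_ollama_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires a running Ollama server):
# with open("your_bible.txt", encoding="utf-8") as f:
#     print(generate_chapter(f.read()))
```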
Available quants:
- Q4_K_M (4.7 GB) - fits on 8 GB GPUs, some token-decoding artifacts
- Q5_K_M (5.4 GB) - balanced, recommended for 24 GB cards
- Q8_0 (8.1 GB) - near-lossless, cleaner token decoding than K-quants
- F16 (15.2 GB) - full precision, no quantization artifacts
Option 2: LM Studio
- Search for Fordentinc/book-builder-bookwriter-v1
- Download the Q5_K_M quant
- Switch to Completion mode (not Chat). This is critical.
- Paste your filled-in bible as the input
- Generate
Option 3: llama.cpp
./llama-cli -hf Fordentinc/book-builder-bookwriter-v1:Q5_K_M \
--temp 0.8 --top-p 0.95 -n 2048 \
-f your_bible.txt
Option 4: Transformers + PEFT (Python, full bf16)
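import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
base_id = "Qwen/Qwen2.5-7B"  # base model the LoRA was trained on
adapter = "Fordentinc/book-builder-bookwriter-v1"
# Tokenizer from the adapter repo; base weights in bf16, placed across available devices
tok = AutoTokenizer.from_pretrained(adapter)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter)  # attach the LoRA adapter
model.eval()
# Prompt file: the filled-in bible ending with "### Chapter\n[chapter title]\n\n"
with open("your_bible.txt", encoding="utf-8") as f:
    bible_and_chapter_header = f.read()
# Move inputs to the device holding the model instead of hard-coding "cuda"
inputs = tok(bible_and_chapter_header, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True, temperature=0.8, top_p=0.95,
    repetition_penalty=1.05,
)
# Decode only the newly generated tokens so the bible is not echoed back
prompt_len = inputs["input_ids"].shape[1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))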
Option 5: vLLM (production / OpenAI-compatible API)
vLLM cannot load the LoRA adapter alone; use the merged bf16 weights instead:
vllm serve Fordentinc/book-builder-bookwriter-v1 --dtype bfloat16 --max-model-len 16384
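Because the model is completion-only, call vLLM's OpenAI-compatible /v1/completions endpoint rather than /v1/chat/completions, so no chat template is applied. A minimal standard-library sketch: it assumes the server started above is on vLLM's default port 8000 and serves the model under its repo id (vLLM's default when --served-model-name is not set). repetition_penalty and min_p are vLLM extra sampling parameters, not part of the OpenAI spec. The helper names are hypothetical.

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/completions"  # vLLM's default port
MODEL_ID = "Fordentinc/book-builder-bookwriter-v1"


def build_completion_request(prompt: str, max_tokens: int = 2048) -> dict:
    """Payload for the plain-completions endpoint, using the card's sampling table."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.8,
        "top_p": 0.95,
        # vLLM-specific extras accepted alongside the OpenAI fields
        "repetition_penalty": 1.05,
        "min_p": 0.05,
    }


def complete(prompt: str, max_tokens: int = 2048) -> str:
    """Send the request and return the first choice's text (needs a running server)."""
    body = json.dumps(build_completion_request(prompt, max_tokens)).encode("utf-8")
    req = urllib.request.Request(
        VLLM_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```

The prompt length plus max_tokens must fit within the --max-model-len 16384 set on the server.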
Step-by-step: from blank page to chapter
- Download the template: bible_template.txt
- Fill in every field. The model relies on each section to anchor character voices, setting, and tone.
- Save as plain text (e.g. my_book.txt).
- End the file with ### Chapter followed by your chapter title and one blank line. Example:
### Chapter
Chapter 1: The Long Drive Home

- Run inference using one of the options above.
- The model writes ~1500-4000 words of prose, then stops on its own or hits your max_new_tokens cap.
- For chapter 2: keep the same bible, change the chapter header, and optionally append the last paragraph of chapter 1 so the model continues smoothly.
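Editors that strip trailing blank lines on save can silently break the required ending. A hypothetical pre-flight check, sketched under the assumption that the file must end with the ### Chapter header, one title line, and exactly one blank line, as in the steps above:

```python
from typing import Optional


def check_prompt_ending(text: str) -> Optional[str]:
    """Return None if text ends correctly, else a short description of the problem.

    Hypothetical helper: expects '### Chapter', then a title line, then one blank line.
    """
    text = text.replace("\r\n", "\n")  # normalize Windows line endings first
    if not text.endswith("\n\n"):
        return "file must end with the chapter title followed by one blank line"
    if text.endswith("\n\n\n"):
        return "more than one blank line after the chapter title"
    lines = text[:-2].split("\n")
    if len(lines) < 2 or lines[-2] != "### Chapter":
        return "expected '### Chapter' followed by a chapter title line"
    if not lines[-1].strip():
        return "chapter title is empty"
    return None


# Usage:
# with open("my_book.txt", encoding="utf-8") as f:
#     problem = check_prompt_ending(f.read())
```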
See example_bibles/ for two complete working examples.
Recommended sampling parameters
| Parameter | Value | Why |
|---|---|---|
| temperature | 0.8 | Lower = repetitive, higher = incoherent |
| top_p | 0.95 | Standard nucleus sampling |
| repetition_penalty | 1.05 | Prevents loops; do not push past 1.15 |
| max_new_tokens | 2048-4096 | Most chapters land in 1500-3500 tokens |
| min_p | 0.05 (if supported) | Better than top_k for prose |
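The same settings go by different names in each runtime covered above. A hypothetical lookup table as a sketch: the spellings follow Hugging Face generate() keyword arguments, llama.cpp command-line flags, and Ollama options keys, and should be checked against the versions you run. It uses the table's values; the Ollama setup script applies its own (repeat_penalty 1.18, num_predict 2500).

```python
# Recommended values from the sampling table (max_new_tokens: low end of 2048-4096)
RECOMMENDED = {
    "temperature": 0.8,
    "top_p": 0.95,
    "repetition_penalty": 1.05,
    "max_new_tokens": 2048,
    "min_p": 0.05,
}

# Spelling per runtime: (HF generate kwarg, llama.cpp CLI flag, Ollama option key)
NAMES = {
    "temperature": ("temperature", "--temp", "temperature"),
    "top_p": ("top_p", "--top-p", "top_p"),
    "repetition_penalty": ("repetition_penalty", "--repeat-penalty", "repeat_penalty"),
    "max_new_tokens": ("max_new_tokens", "-n", "num_predict"),
    "min_p": ("min_p", "--min-p", "min_p"),
}


def hf_kwargs() -> dict:
    """Keyword arguments for transformers' model.generate() (sampling enabled)."""
    kwargs = {NAMES[k][0]: v for k, v in RECOMMENDED.items()}
    return {**kwargs, "do_sample": True}


def llama_cpp_args() -> list:
    """Command-line flags for llama-cli / llama-server."""
    args = []
    for k, v in RECOMMENDED.items():
        args += [NAMES[k][1], str(v)]
    return args


def ollama_options() -> dict:
    """The 'options' object for Ollama's REST API."""
    return {NAMES[k][2]: v for k, v in RECOMMENDED.items()}
```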
Training details
- Base model: Qwen 2.5 7B (Apache 2.0)
- Method: QLoRA, r=16, alpha=32, dropout 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training corpus: 10,211 novels with derivable story bibles (Sci-Fi, Thriller, Romance, Crime, Western, Fantasy, etc.)
- Corpus stats: 310,316 chapter rows, ~1.4 billion words, ~1.82 billion tokens
- Hardware: 1× NVIDIA B200 (180 GB HBM3e)
- Sequence length: 2048
- Effective batch: 32 (per-device 4, grad-accum 8)
- Optimizer: paged_adamw_8bit
- LR: 2e-4 cosine, 3% warmup
- Epochs: 1
- Wall time: ~13.5 hours
- Data cleanup: em-dashes removed (replaced with commas), smart-quotes normalized, residual front-matter stripped, chapters with <800 or >6000 words filtered out
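The step counts quoted at the top of this card can be checked against the corpus and batch figures above. A minimal sketch under stated assumptions: one chapter row per training sequence, and optimizer steps per epoch floored by gradient accumulation (which reproduces the card's 9697). The Qwen2.5-7B layer shapes used for the LoRA parameter count (hidden 3584, intermediate 18944, 28 layers, 4 KV heads of dim 128) come from the public base-model config, not from this card.

```python
import math

# Figures from this card
CHAPTERS = 310_316
PER_DEVICE_BATCH, GRAD_ACCUM = 4, 8
SEQ_LEN = 2048
STEPS_DONE = 5000
TOKENS, WORDS = 1.82e9, 1.4e9

# Optimizer steps per epoch: micro-batches per epoch, floored by grad accumulation
micro_batches = math.ceil(CHAPTERS / PER_DEVICE_BATCH)  # 77,579
steps_per_epoch = micro_batches // GRAD_ACCUM            # 9,697
fraction_done = STEPS_DONE / steps_per_epoch             # ~0.516

# Upper bound on tokens processed by step 5000 (every sequence full length)
max_tokens_seen = STEPS_DONE * PER_DEVICE_BATCH * GRAD_ACCUM * SEQ_LEN  # 327,680,000

# Average chapter size vs. the 2048-token training window
avg_words_per_chapter = WORDS / CHAPTERS                 # ~4,512
avg_tokens_per_chapter = TOKENS / CHAPTERS               # ~5,865
window_coverage = SEQ_LEN / avg_tokens_per_chapter       # ~0.35

# Assumed Qwen2.5-7B shapes (public base config, not this card)
HIDDEN, INTERMEDIATE, LAYERS, KV_DIM = 3584, 18944, 28, 4 * 128
R, ALPHA = 16, 32
# (in_features, out_features) of each LoRA target module in one decoder layer
targets = {
    "q_proj": (HIDDEN, HIDDEN), "k_proj": (HIDDEN, KV_DIM), "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN), "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE), "down_proj": (INTERMEDIATE, HIDDEN),
}
# Each adapter adds A (r x in) and B (out x r): r * (in + out) parameters
lora_params = LAYERS * sum(R * (i + o) for i, o in targets.values())  # 40,370,176
scaling = ALPHA / R  # LoRA update scale alpha / r = 2.0
```

If chapters were chunked or packed rather than truncated to one sequence each, the window-coverage figure does not apply.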
Quirks and limitations
- No system messages, no chat history, no [INST] tags. The model was never shown those during training.
- Bibles outside its training distribution (highly experimental forms, non-Western names, heavy modern slang) may produce uneven results.
- Names follow Western conventions (USA/UK/Italy/Western Europe). The training filter excluded other naming traditions.
- Em-dashes are absent from training data and the model will rarely produce them. This is intentional.
- Chapter length is learned from data (avg ~4,500 words). To force shorter chapters, cap max_new_tokens.
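To pick a max_new_tokens cap for a target chapter length, the corpus ratio from Training details (~1.82 billion tokens over ~1.4 billion words, about 1.3 tokens per word) gives a rough conversion. A hypothetical helper as a sketch; the ratio is a corpus average, so individual chapters will vary.

```python
import math
from fractions import Fraction

# Corpus averages from Training details: ~1.82B tokens over ~1.4B words
TOKENS_PER_WORD = Fraction(182, 140)  # exactly 13/10; avoids float rounding


def max_new_tokens_for(words: int) -> int:
    """Rough token cap for a chapter of about `words` words (rounded up)."""
    return math.ceil(words * TOKENS_PER_WORD)
```

For example, a ~2,000-word chapter maps to a cap of 2,600 tokens.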
License
Apache 2.0 (inherited from base Qwen 2.5 7B). You may use it commercially.
Citation
@misc{bookbuilder_bookwriter_v1_2026,
author = {Fordentinc},
title = {book-builder-bookwriter-v1: A prose-writing LoRA on Qwen 2.5 7B},
year = {2026},
url = {https://huggingface.co/Fordentinc/book-builder-bookwriter-v1},
}
Reporting issues
Open a discussion on this model page. Include the bible you used (first 500 chars) and the first 200 chars of the model output.