🚧 WORK IN PROGRESS — v1 EARLY RELEASE 🚧

A bigger, better model (v2) is in development.

This page is the v1 early release of book-builder-bookwriter-v1. It is not the final model. Skim the rest of this card before you generate anything so you know what you're getting and what you're not.

book-builder-bookwriter-v1

A 7.6B-parameter prose-writing model fine-tuned on 10,211 human-authored novels (310,316 chapters, ~1.82 billion tokens) with a strict story-bible-to-chapter format. Built for BookBuilder to generate novel chapters from structured story bibles.

⚠ Work in Progress (v1, early release)

This is an early checkpoint of an ongoing training run. Training was paused at step 5000 / 9697 (~52% of one epoch) so the artifacts could be released publicly while the larger run continues.

Known limitations of v1:

Treats bibles as style prompts, not strict plot instructions. Expect drift from the synopsis on one-shot generation.

Does not reliably follow "FORBIDDEN" rules, character role assignments, or per-character constraints in rich BookBuilder-style bibles.

Loaded keywords in synopses (proper names like "Stillwater", specific years like "1947") can trigger off-topic associations from the training corpus.

On longer generations, may drift into paragraph-level repetition loops without the tuned sampling defaults in ollama/Modelfile.

What's coming:

v2 (Q3/Q4 2026): Larger base model (Qwen 2.5 14B or 32B) + synthesized instruction-following data so the model can actually obey FORBIDDEN sections, distinguish protagonists from antagonists, and follow beat sheets. This addresses the main v1 limitation.

v1 continuation to step 9697: the LoRA may be taken to the full single-epoch checkpoint and re-released as book-builder-bookwriter-v1.1 if testing shows it's worth the additional training compute. The resumable training state is preserved at branch resumable-step-5000.

Best use for v1 right now: as a prose-style backbone inside a structured pipeline (like BookBuilder itself) that provides per-chapter beat sheets and plot anchors at generation time. The pipeline supplies the discipline the v1 model can't enforce on its own. For one-shot "give me a chapter from a synopsis" use, v1 produces readable prose but will frequently drift from the intended plot.

This is NOT a chat model. Do not prompt it like ChatGPT. See "How to use" below.

What this model does

You write a Story Bible in the format shown below (or fill in the template). You give the model the bible followed by a ### Chapter header and a chapter title. The model writes the chapter prose.

It will not answer questions. It will not respond to "Write me a story about X." It only continues prose conditioned on the bible context.

Format the model expects

Every training example looked exactly like this:

### Bible
Title: [book title]
Author: [author name]
Genre: [genre]
Publisher: [publisher]
Synopsis: [1-3 paragraphs describing the book]

### Genre
[genre]



### Chapter
[chapter title]

[chapter prose...]

Your prompt MUST end with `### Chapter\n[chapter title]\n\n`; the model then fills in the prose.
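The layout above can be assembled programmatically. The following is a minimal sketch, not the author's tooling: `build_prompt` is a hypothetical helper, it covers only the fields shown above (the real bible_template.txt may contain more), and the single blank line between sections is an assumption.

```python
def build_prompt(title, author, genre, publisher, synopsis, chapter_title):
    """Assemble a completion prompt in the bible-to-chapter layout."""
    bible = "\n".join([
        "### Bible",
        f"Title: {title}",
        f"Author: {author}",
        f"Genre: {genre}",
        f"Publisher: {publisher}",
        f"Synopsis: {synopsis.strip()}",
    ])
    genre_block = f"### Genre\n{genre}"
    # The prompt must end with the header, the title, and one blank line.
    chapter_header = f"### Chapter\n{chapter_title.strip()}\n\n"
    # Assumption: one blank line separates the sections.
    return f"{bible}\n\n{genre_block}\n\n{chapter_header}"
```

Calling `build_prompt(...)` with your own metadata and a title such as "Chapter 1: The Long Drive Home" yields a string that can be written to your_bible.txt or passed directly to any of the completion backends below.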

How to use

Option 1: Ollama (easiest)

IMPORTANT: A plain ollama pull from HF discards any Modelfile parameters in the repo, so Ollama runs the model with its default sampling. On long completion prompts, those defaults cause repetition loops and over-long generations. Use the setup script below to register the model under the name bookbuilder with tuned defaults that prevent both problems.

One-shot setup:

curl -sSL https://huggingface.co/Fordentinc/book-builder-bookwriter-v1/resolve/main/ollama/setup_ollama.sh | bash
# Optional: pass a quant tag, default is Q5_K_M
# curl -sSL https://huggingface.co/Fordentinc/book-builder-bookwriter-v1/resolve/main/ollama/setup_ollama.sh | bash -s Q8_0

Then:

ollama run bookbuilder < your_bible.txt

The script pulls the GGUF, builds a local Modelfile that sets repeat_penalty 1.18, num_predict 2500, and the stop tokens, then registers the result as bookbuilder.

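If you'd rather pull manually (without the loop fix), you need to set the sampling parameters yourself in every session. ollama run does not take sampling flags on the command line, so set them from inside the interactive prompt:

ollama pull hf.co/Fordentinc/book-builder-bookwriter-v1:Q5_K_M
ollama run hf.co/Fordentinc/book-builder-bookwriter-v1:Q5_K_M
# Then, at the >>> prompt:
/set parameter num_predict 2500
/set parameter repeat_penalty 1.18
/set parameter temperature 0.75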
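For scripted use, the same sampling values can be sent with each request through Ollama's local HTTP API. This is a sketch under assumptions: it targets the default endpoint on port 11434, uses the bookbuilder name registered by the setup script, and sets `raw` to bypass chat templating, which this card does not itself prescribe. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(prompt, model="bookbuilder"):
    """Build a raw-completion request carrying the sampling options."""
    payload = {
        "model": model,
        "prompt": prompt,
        "raw": True,      # assumption: send the prompt as-is, no chat template
        "stream": False,  # return one JSON object instead of a stream
        "options": {
            "num_predict": 2500,
            "repeat_penalty": 1.18,
            "temperature": 0.75,
        },
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def generate(prompt, model="bookbuilder"):
    """Send the request to a running Ollama server and return the text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the manual pull, pass `model="hf.co/Fordentinc/book-builder-bookwriter-v1:Q5_K_M"` instead of the default name.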

Available quants:

Q4_K_M (4.7 GB) - fits on 8 GB GPUs, some token-decoding artifacts
Q5_K_M (5.4 GB) - balanced, recommended for 24 GB cards
Q8_0 (8.1 GB) - near-lossless, cleaner token decoding than K-quants
F16 (15.2 GB) - unquantized 16-bit weights, no quantization artifacts

Option 2: LM Studio

Search for Fordentinc/book-builder-bookwriter-v1
Download the Q5_K_M quant
Switch to Completion mode (not Chat). This is critical.
Paste your filled-in bible as the input
Generate

Option 3: llama.cpp

./llama-cli -hf Fordentinc/book-builder-bookwriter-v1:Q5_K_M \
  --temp 0.8 --top-p 0.95 -n 2048 \
  -f your_bible.txt

Option 4: Transformers + PEFT (Python, full bf16)

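from pathlib import Path

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_id = "Qwen/Qwen2.5-7B"
adapter = "Fordentinc/book-builder-bookwriter-v1"

# Load the tokenizer from the adapter repo and the base weights in bf16.
tok = AutoTokenizer.from_pretrained(adapter)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base, adapter)
model.eval()

# The file must end with "### Chapter", the chapter title, and one blank line.
bible_and_chapter_header = Path("your_bible.txt").read_text(encoding="utf-8")

# Move the inputs to wherever device_map="auto" placed the model.
inputs = tok(bible_and_chapter_header, return_tensors="pt").to(model.device)
with torch.inference_mode():
    out = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=True, temperature=0.8, top_p=0.95,
        repetition_penalty=1.05,
    )
# Prints the prompt followed by the generated chapter.
print(tok.decode(out[0], skip_special_tokens=True))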

Option 5: vLLM (production / OpenAI-compatible API)

vLLM cannot load the LoRA adapter alone; use the merged bf16 weights instead:

vllm serve Fordentinc/book-builder-bookwriter-v1 --dtype bfloat16 --max-model-len 16384
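Once the server is up, query the plain completions endpoint, not the chat one, since this is not a chat model. The sketch below assumes vLLM's default port 8000 and uses hypothetical helper names. `repetition_penalty` is a vLLM extension to the OpenAI request schema, not a standard OpenAI field.

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/completions"  # assumes vLLM's default port

def build_completion_request(prompt, max_tokens=2048):
    """Build a text-completion request with the card's sampling values."""
    payload = {
        "model": "Fordentinc/book-builder-bookwriter-v1",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.8,
        "top_p": 0.95,
        "repetition_penalty": 1.05,  # vLLM-specific sampling parameter
    }
    return urllib.request.Request(
        VLLM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def complete(prompt, max_tokens=2048):
    """Send the request to a running vLLM server and return the chapter text."""
    with urllib.request.urlopen(build_completion_request(prompt, max_tokens)) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```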

Step-by-step: from blank page to chapter

Download the template: bible_template.txt
Fill in every field. The model relies on each section to anchor character voices, setting, and tone.
Save as plain text (e.g. my_book.txt).
End the file with ### Chapter followed by your chapter title and one blank line. Example:
```
### Chapter
Chapter 1: The Long Drive Home
```
Run inference using one of the options above.
The model writes ~1500-4000 words of prose, then stops or hits your max_new_tokens cap.
For chapter 2: keep the same bible, change the chapter header, optionally append the last paragraph of chapter 1 so the model continues smoothly.

See example_bibles/ for two complete working examples.
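The chapter-2 step can be scripted. This is a sketch with hypothetical helpers. Placing the carried-over paragraph directly after the new chapter header is an assumption, because the steps above do not say where it should go.

```python
def last_paragraph(chapter_text):
    """Return the final non-empty paragraph of a chapter."""
    paragraphs = [p.strip() for p in chapter_text.split("\n\n") if p.strip()]
    return paragraphs[-1] if paragraphs else ""

def next_chapter_prompt(prompt, new_title, previous_chapter=None):
    """Keep the bible, swap the chapter header, optionally seed the prose."""
    marker = "### Chapter"
    # Everything before the last chapter header is the unchanged bible.
    head = prompt[: prompt.rindex(marker)]
    new_prompt = f"{head}{marker}\n{new_title.strip()}\n\n"
    if previous_chapter:
        tail = last_paragraph(previous_chapter)
        if tail:
            # Assumption: the carried-over paragraph opens the new chapter.
            new_prompt += tail + "\n\n"
    return new_prompt
```

Pass the chapter-1 prompt and the generated chapter-1 text to get a prompt for chapter 2. Omit `previous_chapter` to start the new chapter cold.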

Recommended sampling parameters

| Parameter | Value | Why |
|---|---|---|
| `temperature` | 0.8 | Lower = repetitive, higher = incoherent |
| `top_p` | 0.95 | Standard nucleus sampling |
| `repetition_penalty` | 1.05 | Prevents loops; do not push past 1.15 |
| `max_new_tokens` | 2048-4096 | Most chapters land in 1500-3500 tokens |
| `min_p` | 0.05 (if supported) | Better than top_k for prose |

Training details

Base model: Qwen 2.5 7B (Apache 2.0)
Method: QLoRA, r=16, alpha=32, dropout 0.05
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training corpus: 10,211 novels with derivable story bibles (Sci-Fi, Thriller, Romance, Crime, Western, Fantasy, etc.)
Corpus stats: 310,316 chapter rows, ~1.4 billion words, ~1.82 billion tokens
Hardware: 1× NVIDIA B200 (180 GB HBM3e)
Sequence length: 2048
Effective batch: 32 (per-device 4, grad-accum 8)
Optimizer: paged_adamw_8bit
LR: 2e-4 cosine, 3% warmup
Epochs: 1
Wall time: ~13.5 hours
Data cleanup: em-dashes removed (replaced with commas), smart-quotes normalized, residual front-matter stripped, chapters with <800 or >6000 words filtered out
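The step count and release point quoted on this card can be cross-checked from the figures above. This is a back-of-the-envelope sketch. The rounding (ceiling on micro-batches, floor on optimizer steps) is an assumption about how a typical trainer counts steps, not a statement about the author's training script.

```python
import math

rows = 310_316            # chapter rows in the corpus
per_device_batch = 4
grad_accum = 8
steps_done = 5_000        # checkpoint released as v1

# Assumed rounding: ceil for micro-batches, floor for optimizer steps.
micro_batches = math.ceil(rows / per_device_batch)       # 77579
steps_per_epoch = micro_batches // grad_accum            # 9697
progress = steps_done / steps_per_epoch
rows_seen = steps_done * per_device_batch * grad_accum   # 160000

tokens, words = 1_820_000_000, 1_400_000_000
tokens_per_word = tokens / words
avg_words_per_chapter = words / rows

print(f"steps per epoch: {steps_per_epoch}")
print(f"epoch progress at release: {progress:.1%}")
print(f"tokens per word: {tokens_per_word:.2f}")
print(f"average words per chapter: {avg_words_per_chapter:.0f}")
```

Under these assumptions the numbers agree with the card: 9,697 steps in one epoch, roughly 52% of it completed at step 5000, and an average chapter of about 4,500 words.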

Quirks and limitations

No system messages, no chat history, no [INST] tags. The model was never shown those during training.
Bibles outside its training distribution (highly experimental forms, non-Western names, heavy modern slang) may produce uneven results.
Names follow Western conventions (USA/UK/Italy/Western Europe). The training filter excluded other naming traditions.
Em-dashes are absent from training data and the model will rarely produce them. This is intentional.
Chapter length is learned from data (avg ~4,500 words). To force shorter chapters, cap max_new_tokens.
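To turn a target chapter length in words into a max_new_tokens cap, one option is the corpus-wide token-to-word ratio from the training details (about 1.82 billion tokens over about 1.4 billion words). This is a rough sketch with a hypothetical helper. The ratio is an average, so real chapters will tokenize somewhat differently.

```python
CORPUS_TOKENS = 1_820_000_000  # from the corpus stats on this card
CORPUS_WORDS = 1_400_000_000

def max_new_tokens_for(target_words):
    """Convert a word target into a token cap using the corpus-wide ratio."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (target_words * CORPUS_TOKENS + CORPUS_WORDS - 1) // CORPUS_WORDS

print(max_new_tokens_for(1500))  # 1950
print(max_new_tokens_for(2000))  # 2600
```

The cap only truncates generation. It does not make the model wrap up the scene sooner, so a capped chapter may end mid-thought.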

License

Apache 2.0 (inherited from base Qwen 2.5 7B). Commercial use is permitted.

Citation

@misc{bookbuilder_bookwriter_v1_2026,
  author = {Fordentinc},
  title  = {book-builder-bookwriter-v1: A prose-writing LoRA on Qwen 2.5 7B},
  year   = {2026},
  url    = {https://huggingface.co/Fordentinc/book-builder-bookwriter-v1},
}

Reporting issues

Open a discussion on this model page. Include the bible you used (first 500 chars) and the first 200 chars of the model output.
