# Changelog

All notable changes to the Liquid AI Spam Classifier project.

## [v0.5.9] - 2026-04-16 (retrain script fixes: NaN gradients + accurate example counts)

### Summary

Fixed gradient explosion (NaN loss) that crashed full retrains on Apple Silicon, and corrected stale example counts and time estimates across all retrain command files.

### Fixed

- `retrain_liquid.py`: `ACTIVATION_OFFLOADING = True` caused `AssertionError: Torch not compiled with CUDA enabled`; TRL's `OffloadActivations` is CUDA-only and crashes on MPS. Set to `False`.
- `retrain_liquid.py`: gradient explosion (`loss: 0`, `grad_norm: nan`, `entropy: nan`) caused by learning rate too aggressive without activation offloading. Fixed:
  - `LEARNING_RATE`: `2e-4` → `5e-5`
  - `max_grad_norm`: `0.3` → `1.0`
  - Added `warmup_steps=100` to ramp LR gradually and prevent early gradient spikes
- `retrain.command` (liquid), `Retrain.command`, `spam-classifier-mlx/retrain.command`: menu displayed stale example counts (~20,000) and time estimates (~2.5-3.5 hrs) that didn't match actual dataset sizes. Corrected to actual counts (liquid/mlx full: ~16,000; mlx fast: ~6,800) and recalculated time estimates.
- `retrain.command` (liquid): removed "activation offloading" from memory optimizations note since it is now disabled on MPS.

### Changed

- `spam-classifier-liquid/spam_classifier_liquid.ipynb`: added `torch_empty_cache_steps=50` and `dataloader_pin_memory=False` to notebook `SFTConfig` to match retrain script

---

## [v0.5.3] - 2026-04-16 (GGUF rename to spam-classifier-F16.gguf)

### Summary

Renamed the local and HuggingFace GGUF file to `spam-classifier-F16.gguf` so that HuggingFace's model card parser can detect the quantization type (F16) and display it in the GGUF variants widget. Updated all local file references accordingly.
### Changed

- `spam-classifier.gguf` → `spam-classifier-F16.gguf` (local file rename)
- `VoltageVagabond/spam-classifier-liquid-GGUF`: deleted `spam-classifier.gguf`, uploaded `spam-classifier-F16.gguf`, updated README
- Updated all references in `StartServer.command`, `Retrain.command`, `merge_and_convert_gguf.py`, `verify_gguf_model.py`, both `Modelfile`s, `spam-classifier-liquid-GGUF/README.md`, and this changelog

---

## [v0.5.2] - 2026-04-16 (GGUF system prompt patch + llama-server fixes)

### Summary

Baked the spam classifier system prompt directly into the GGUF model file's `tokenizer.chat_template` metadata so any client (llama.cpp, LM Studio, Ollama) applies the correct behavior without manual configuration. Fixed two llama-server startup bugs introduced by a brew update.

### Changed

- `spam-classifier-F16.gguf`: patched `tokenizer.chat_template` to use `"You are an email spam classifier..."` as the default system prompt; done via raw binary rewrite of the GGUF metadata section (string grows by 118 bytes; tensor data section is untouched)
- `StartServer.command`: fixed `-fa on` (flag syntax changed in brew b8680; was bare `-fa`, now requires explicit `on`/`off`/`auto`)
- `StartServer.command`: added `--webui-config` with `systemMessage` and `temperature` so the llama.cpp Web UI pre-fills the system prompt automatically (the Web UI uses the raw `/completion` endpoint and does not apply the chat template on its own)

---

## [v0.5.1] - 2026-04-16 (consolidated retrain script + memory optimizations)

### Summary

Replaced the two separate `retrain-fast.command` and `retrain-full.command` scripts with a single `retrain.command` that prompts for fast or full mode at launch. Applied memory optimizations to `retrain_liquid.py` to reduce MPS GPU pressure during training. Added a top-level `Retrain.command` pipeline script in the LLM Project root that chains retrain → GGUF rebuild → HuggingFace upload.
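
The single-launcher flow can be pictured as a small mapping from a menu key to a training command. The following is a Python sketch of that idea only: the real `retrain.command` is a shell launcher whose contents are not reproduced in this changelog, and `build_retrain_command` is a hypothetical helper. The `f` / `u` / `q` keys and the `--mode fast` / `--mode full` flags are the ones this changelog records for `retrain.command` and `retrain_liquid.py`.

```python
"""Sketch: map a retrain menu choice to a retrain_liquid.py invocation."""
import sys

# Menu keys as listed for retrain.command; modes as accepted by retrain_liquid.py.
MODES = {"f": "fast", "u": "full"}


def build_retrain_command(choice):
    """Return the argv list for a menu choice, or None when the user quits.

    Hypothetical helper for illustration; not taken from the project scripts.
    """
    key = choice.strip().lower()
    if key == "q":
        return None
    if key not in MODES:
        raise ValueError(f"unknown menu choice: {choice!r}")
    return [sys.executable, "retrain_liquid.py", "--mode", MODES[key]]


if __name__ == "__main__":
    for choice in ("f", "U", "q"):
        print(choice, "->", build_retrain_command(choice))
```

Returning an argv list rather than a shell string keeps the sketch easy to hand to `subprocess.run(cmd, check=True)`, which raises on a non-zero exit code so a chained pipeline can stop at the first failing step.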
### Changed

- `retrain-fast.command` + `retrain-full.command` → replaced by single `retrain.command`
  - Double-click launches a menu: f) Fast (~1-1.5 hrs) / u) Full (~2.5-3.5 hrs) / q) Quit
  - Includes adapter swap prompt with backup logic (same as before, just unified)
  - Reminds user to run `Retrain.command` in LLM Project root for GGUF rebuild

### Memory optimizations in `retrain_liquid.py`

- `activation_offloading=True`: offloads forward-pass activations from MPS to CPU RAM; frees ~25% MPS memory at ~15% speed cost (biggest knob for avoiding OOM)
- `torch_empty_cache_steps=50`: flushes the MPS memory pool every 50 optimizer steps; prevents memory fragmentation from causing OOM mid-run
- `optim="adamw_torch_fused"`: fused AdamW kernel; slightly faster and lower peak memory than unfused `adamw_torch`
- `dataloader_pin_memory=False`: pin_memory is a CUDA optimization that wastes memory on MPS; explicitly disabled
- Already enabled: `gradient_checkpointing=True`, `bf16=True`, `MAX_LENGTH=256`

### Added (LLM Project root)

- `Retrain.command`: end-to-end pipeline that runs retrain → swap adapter → rebuild GGUF (clears `merged-liquid-full/` cache so new adapter is actually baked in) → upload adapter + GGUF to HuggingFace → remind to restart llama.cpp server

---

## [v0.5.0] - 2026-04-16 (GGUF merged model + server commands)

### Summary

Converted the trained LoRA adapter into a fully merged standalone GGUF file suitable for llama.cpp, Ollama, and LM Studio. Added `StartServer.command` and `StopServer.command` for launching the llama.cpp server locally. Uploaded the merged GGUF to a new HuggingFace repo with full platform instructions.
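
To make "merged" concrete: a LoRA adapter stores two small matrices per target layer, and merging folds their scaled product into the base weight so that no adapter file is needed at runtime. The sketch below shows that arithmetic on a toy 2×2 layer in plain Python. It is an illustration only, not the code in `merge_and_convert_gguf.py`; the matrix values and the helper names (`matmul`, `merge_lora`) are made up, and the `alpha / rank` scale assumes the standard LoRA formulation.

```python
"""Toy illustration of folding a LoRA update into a base weight matrix."""


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(base, lora_b, lora_a, alpha, rank):
    """Return base + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [base[i][j] + scale * delta[i][j] for j in range(len(base[0]))]
        for i in range(len(base))
    ]


# Hypothetical 2x2 layer with a rank-1 adapter (values chosen for easy arithmetic).
base_weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape: out_features x rank
lora_a = [[0.5, 0.25]]    # shape: rank x in_features
alpha, rank = 2, 1        # same alpha / rank ratio (2.0) as alpha 16 with rank 8

merged = merge_lora(base_weight, lora_b, lora_a, alpha, rank)

# Applying the merged matrix matches applying base and adapter separately.
x = [[1.0], [1.0]]        # one input column vector
separate = [
    [bx[0] + (alpha / rank) * dx[0]]
    for bx, dx in zip(matmul(base_weight, x), matmul(lora_b, matmul(lora_a, x)))
]
assert matmul(merged, x) == separate
print(merged)
```

Because the merged matrix gives the same output as base plus adapter, the resulting file can be loaded by tools that know nothing about LoRA.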
### Added

- `merge_and_convert_gguf.py`: merges LoRA adapter into base model weights then converts to GGUF F16 using llama.cpp's `convert_hf_to_gguf.py` script
- `spam-classifier-F16.gguf` (~2.2 GB): fully merged standalone GGUF; no separate base model or adapter file needed at runtime
- `StartServer.command`: double-click launcher for llama.cpp server with all Apple Silicon performance flags (`-ngl 99`, `-fa`, `--mlock`, 8-bit KV cache, perf-core thread pinning) and system prompt injected at startup
- `StopServer.command`: kills the server by PID file, falls back to port kill
- `Modelfile`: Ollama configuration with system prompt and temperature baked in; allows `ollama create spam-classifier -f Modelfile` for zero-config deployment
- `llama-server-config.json`: reference config showing all server flags
- `upload_adapter_to_root.py`: uploads adapter files to HF repo root (required by gguf-my-lora Space which expects `adapter_config.json` at root, not in subfolder)
- `upload_merged_gguf.py`: creates and uploads to `VoltageVagabond/spam-classifier-liquid-GGUF`
- `upload_gguf_readme.py`: uploads README with per-platform usage instructions
- `verify_gguf_model.py`: tests the GGUF against real test set examples using llama-cpp-python; confirms fine-tuning is active, not just base model behavior

### New HuggingFace repo

- `VoltageVagabond/spam-classifier-liquid-GGUF`: merged F16 GGUF with Modelfile, README covering Ollama / LM Studio / llama.cpp server / llama.cpp CLI usage, and educational disclaimer for senior project context

### docs

- `docs/08-gguf-conversion-guide.md`: full guide covering Option A (gguf-my-lora Space), Option B (local merge + convert), and a troubleshooting section for every error encountered (wrong Space, nested subfolder, too many requests, redirect loop, adapter-only GGUF failing to load)
- `docs/README.md`: added guide 8 to table of contents

### Key lessons documented

- `gguf-my-lora` Space produces an adapter-only GGUF (~8.6 MB), NOT a standalone
  model; this causes "failed to load" errors in Ollama/LM Studio without `--lora`
- The system prompt must match training format exactly or the model falls back to base LFM2.5 general-assistant behavior
- GGUF format cannot embed a system prompt in weights; Modelfile (Ollama) is the closest "set it and forget it" workaround for end users

---

## [v0.4.9] - 2026-04-16 (GGUF conversion guide + adapter repo root upload)

### Summary

Documented how to convert the trained LoRA adapter to GGUF format so it can be used with llama.cpp, Ollama, and LM Studio. Also fixed the HuggingFace model repo so that adapter files are at the root level (required by the gguf-my-lora Space).

### Added

- `docs/08-gguf-conversion-guide.md`: step-by-step guide covering two conversion paths (Option A: gguf-my-lora Space in browser; Option B: merge locally then convert with llama.cpp), plus a full troubleshooting section for every error encountered
- `upload_adapter_to_root.py` (project root): helper script that uploads `adapter_config.json`, `adapter_model.safetensors`, `tokenizer_config.json`, `tokenizer.json`, and `chat_template.jinja` to the root of the `VoltageVagabond/spam-classifier-liquid` HF repo (the gguf-my-lora Space requires `adapter_config.json` at root, not inside an `adapters/` subfolder)
- `docs/README.md` updated to include guide 8 in the table of contents

### Issues encountered and fixed (documented in guide 8)

- `gguf-my-repo` Space gave "no model_type in config.json": wrong Space; LoRA adapters need `gguf-my-lora`, not `gguf-my-repo`
- `gguf-my-lora` gave "adapter_config.json not found": adapter files were nested in `adapters/` subfolder on HF, not at repo root; fixed by uploading to root
- `gguf-my-repo` showed "too many requests": Space has 1,900+ likes and gets heavy traffic; workaround is to duplicate the Space to your own account
- HuggingFace sign-in redirect loop: caused by stale cookies; fixed by clearing cookies or using incognito window

---

## [v0.4.8] - 2026-04-14 (8-bit KV cache quantization)

### Summary

Enable 8-bit quantization for the KV cache at inference time to reduce memory usage without changing model weights or training.

### What changed in app.py

- `model.generate()` now passes `cache_implementation="quantized"` and `cache_config={"backend": "hqq", "nbits": 8}`, quantizing both the key and value cache to 8-bit during generation
- Used the `hqq` backend (recommended for int8; `quanto` only supports int2/int4)
- Model weights remain at BF16; only the runtime KV cache is affected

### What changed in requirements.txt

- Added `hqq>=0.2.0`, the required package for the HQQ quantization backend

---

## [v0.4.7] - 2026-04-14 (Documentation sync with fine_tune.py)

### Summary

Audit pass to bring `README.md` in line with `fine_tune.py`. The README had a stale "Binary classification only" limitation note (3-class has been live since v0.4.0) and an out-of-date batch size, plus it was still quoting the pre-optimization training time.

### Changes to README.md

- Training Details table:
  - Batch size `4` → `1 (effective 4 with gradient accumulation steps = 4)` to match `BATCH_SIZE = 1` and `GRADIENT_ACCUMULATION_STEPS = 4` in fine_tune.py (lines 72-73)
  - Added explicit rows for Max sequence length (256), Optimizer (`adamw_torch`), Weight dtype (bfloat16), Device (MPS), and Max gradient norm (0.3) to match the code
  - Training time `~2–2.5 hours` → `~1–1.5 hours` to match the in-code comment on line 241, with a note that the older figure reflected the pre-v0.4.3 config
- Limitations: "Binary classification only" note replaced with "Three-class classification (SPAM / HAM / PHISHING) as of v0.4.0"

### Rationale

`fine_tune.py` is the source of truth.
Values read from the file:

```
LORA_RANK                   = 8              (line 53)
LORA_ALPHA                  = 16             (line 54)
LORA_DROPOUT                = 0.1            (line 55)
LORA_TARGET_MODULES         = 8              (lines 56-68; q/k/v/out_proj, w1/w2/w3, in_proj)
NUM_EPOCHS                  = 3              (line 71)
BATCH_SIZE                  = 1              (line 72)
GRADIENT_ACCUMULATION_STEPS = 4              (line 73)
LEARNING_RATE               = 2e-4           (line 74)
MAX_LENGTH                  = 256            (line 75)
optim                       = "adamw_torch"  (line 226)
torch_dtype                 = bfloat16       (line 167)
device_map                  = "mps"          (line 166)
max_grad_norm               = 0.3            (comment / training args)
Training time comment       = "~1-1.5 hours" (line 241)
```

No code changes, no retraining in this release.

---

## [v0.4.6] - 2026-04-14 (HF Spaces deployment fixes)

### Summary

Got the liquid Space (`VoltageVagabond/spam-classifier-liquid`) running on HF after several iterations diagnosing adapter download failures.

### Q&A from this session

**Q: Why does the Space log say `Adapters not found at /app/adapters` when the local app works fine?**

A: The local `adapters/` directory is git-ignored and never uploaded to the Space (too large + the upload script explicitly excludes it). On HF Spaces the directory doesn't exist, so the app falls through to the "no adapter" code path.

**Q: How was that fixed?**

A: Added a `snapshot_download` fallback in `app.py`: if local adapters are missing, download them from the `VoltageVagabond/spam-classifier-liquid` model repo at startup.

**Q: First attempt got `401 Repository Not Found`. Why?**

A: The model repo was set to **private** and the Space had no `HF_TOKEN` secret. The Space container runs anonymously by default, so it couldn't authenticate. Fix: made the model repo public (no token needed). Alternative: keep private and add `HF_TOKEN` as a Space repository secret with read scope.

**Q: Next error: `Can't find 'adapter_config.json' at '/root/.cache/.../snapshots/...'`. Why?**

A: The model repo doesn't store adapter files at the root; they're nested under `adapters_fast/`, `adapters_full/`, `adapters_backup/`.
The download succeeded but `PeftModel.from_pretrained` looked at the snapshot root and couldn't find `adapter_config.json`. Fix: use `allow_patterns=["adapters_fast/*"]` and set `ADAPTER_PATH = snapshot_path / "adapters_fast"` so PEFT loads from the right subdir.

**Q: Why is classification slow on HF but fast locally?**

A: HF free tier (`cpu-basic`) is 2 vCPUs, 16 GB RAM, no GPU. Local Mac uses Apple Silicon Metal/MPS acceleration. A 1.2B-param transformer on CPU is just slow. Realistic speedups (high → low impact):

1. Upgrade Space to a T4 GPU (~$0.40/hr, only billed when running)
2. 4-bit quantization via `bitsandbytes` (~2-3× faster on CPU)
3. Reduce `max_tokens` from 750 → ~100 (you only need SPAM/HAM)
4. `model.merge_and_unload()`: bake LoRA into base model, removes per-call overhead
5. Switch to GGUF + llama-cpp-python; significantly faster than HF transformers on CPU

**Q: Why does the model repo need to be public for the Space to work?**

A: The Space container runs anonymously. Public repo = anonymous downloads work. Private repo = need an authenticated `HF_TOKEN` secret in the Space settings. The Space being public/private is independent: that controls who can view the demo, not what the container can fetch.

### Changes

- `app.py`: added `snapshot_download` fallback that pulls from the HF model repo when local adapters are missing
- `app.py`: passes `os.environ.get("HF_TOKEN")` to `snapshot_download` so the same code path works for both public and private model repos
- `app.py`: `allow_patterns=["adapters_fast/*"]` and `ADAPTER_PATH` now points at the `adapters_fast/` subdirectory inside the downloaded snapshot

---

## [v0.4.5] - 2026-04-14

### Beginner-Code Compliance: app.py

Refactored `app.py` to match the beginner-friendly coding style used in course lecture notebooks.
**What changed:**

- Replaced 3 lambda functions in Gradio event handlers with named functions (`make_example_handler`, `clear_input`)
- Replaced ternary operator for emoji selection with explicit `if/else` block
- No behavior changes: all Gradio event wiring, feedback logging, and chat logic unchanged

## [v0.4.4] - 2026-04-14

### Chat App Upgrade: app.py

Replaced the two-tab Gradio app (Classify + Chat) with a polished chat-only interface.

**What changed:**

- Removed the Classify tab entirely; chat is now the full interface
- Added HTML topbar with project title, model name, and badge pills (matches XAI project style)
- Added clickable example prompt buttons (spam, ham, phishing) that populate the input
- Added 👍 / 👎 feedback buttons that log to `data/feedback/feedback_log.csv`
  - CSV columns: `timestamp`, `user_input`, `model_response`, `rating`
  - Feedback status resets after each new submission
- Increased `max_tokens` from 500 → 750 to reduce mid-sentence cutoffs
- Fixed Gradio 6 compatibility: `theme`/`css` moved to `launch()`, `gr.Chatbot` returns full history list
- Paths anchored to `Path(__file__).parent` so the app works from any launch directory
- Updated `Dockerfile`: consolidated to install deps from `requirements.txt`, removed redundant pip install lines

## [v0.4.3] - 2026-04-07

### Memory & Speed Optimization: fine_tune.py

Reduced peak memory usage from ~50 GB to a target of ~8–14 GB by changing five training parameters. No change to model architecture or LoRA adapter structure; accuracy is unaffected.
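
The memory figures quoted for this release can be sanity-checked with simple arithmetic. The sketch below assumes a round 1.2 billion parameters and optimizer state for every parameter, which is how the ~9.6 GB Adam estimate in the table for this release is framed; the function names are illustrative and nothing here is taken from `fine_tune.py`.

```python
"""Back-of-envelope memory estimates for a ~1.2B-parameter model."""

PARAMS = 1.2e9  # assumed round parameter count
GB = 1e9        # decimal gigabytes


def adam_state_gb(params, bytes_per_value):
    """Adam keeps two values (first and second moment) per parameter."""
    return 2 * params * bytes_per_value / GB


def weight_gb(params, bytes_per_param):
    """Memory needed just to hold the model weights."""
    return params * bytes_per_param / GB


def attention_ratio(old_len, new_len):
    """Relative size of an n x n attention score matrix at two lengths."""
    return (old_len / new_len) ** 2


adam_fp32 = adam_state_gb(PARAMS, 4)  # float32 moments
adam_8bit = adam_state_gb(PARAMS, 1)  # 8-bit moments

print(f"Adam state: {adam_fp32:.1f} GB -> {adam_8bit:.1f} GB "
      f"({1 - adam_8bit / adam_fp32:.0%} smaller)")
print(f"Weights: {weight_gb(PARAMS, 4):.1f} GB (float32) -> "
      f"{weight_gb(PARAMS, 2):.1f} GB (bfloat16)")
print(f"Attention scores, 512 vs 256 tokens: {attention_ratio(512, 256):.0f}x")
print(f"Effective batch size: {1 * 4}")  # BATCH_SIZE x GRADIENT_ACCUMULATION_STEPS
```

Under these assumptions the optimizer state comes to about 9.6 GB in float32 and 2.4 GB in 8-bit, a 75% reduction, and the weights halve from about 4.8 GB to 2.4 GB. Actual peak memory also depends on activations, framework overhead, and which parameters are trainable.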
| Parameter | Before | After | Why |
|-----------|--------|-------|-----|
| `BATCH_SIZE` | 4 | 1 | Smaller batch = 4× less activation memory per step |
| `GRADIENT_ACCUMULATION_STEPS` | 1 | 4 | Keeps effective batch size at 4 so training dynamics are unchanged |
| `MAX_LENGTH` | 512 | 256 | Attention memory scales O(n²) with sequence length, so halving it cuts ~4× attention memory; spam emails rarely exceed 256 tokens |
| `optim` | `adamw` (default) | `adamw_8bit` | Adam optimizer normally stores 2 full float32 copies of every parameter for momentum tracking (~9.6 GB for a 1.2B model); 8-bit Adam quantizes those to 8-bit integers with negligible quality loss (~75% reduction) |
| `torch_dtype` | `"auto"` | `torch.bfloat16` | Forces model weights to load in bfloat16 (2 bytes/param) instead of float32 (4 bytes/param), halving weight memory; bfloat16 has the same exponent range as float32 so training stability is preserved |
| `device_map` | `"auto"` | `"mps"` | Pins all layers to the MPS GPU; `"auto"` can spill layers to CPU causing slow cross-device copies and inflated memory readings |
| `gradient_checkpointing_kwargs` | not set | `{"use_reentrant": False}` | Suppresses deprecation warning on newer PyTorch; no behavior change |
| `max_grad_norm` | not set | `0.3` | Clips gradient norms to prevent occasional instability spikes during training |

**Why quality is unaffected:**

- 8-bit Adam was validated by Dettmers et al.
  (2022) to match full-precision Adam loss curves on LLM fine-tuning
- bfloat16 was designed specifically for training: same exponent range as float32, just less mantissa precision
- Effective batch size (1 × 4 accumulation = 4) is identical to the original (4 × 1)
- 256 tokens covers the vast majority of spam/ham emails in this dataset

## [v0.4.2] - 2026-04-07

### Updated: Training Data Pipeline

- **Added puyang2025/seven-phishing-email-datasets and zefang-liu/phishing-email-dataset** as additional sources in `build_liquid_datasets.py`; parquets generated by the spam-xai-project sibling and shared across all three classifier projects
- **Updated data counts** in `retrain-fast.command` and `retrain-full.command` to reflect new ~190K source pool

## [v0.4.1] - 2026-03-28

### Retrain Commands with Adapter Swap

- `retrain-fast.command` and `retrain-full.command` now prompt after training to swap the new adapter as the default
- Selecting "y" backs up `adapters/` to `adapters_backup/` and copies the new adapter in
- App and notebook automatically use whichever adapter is in `adapters/`
- Old `retrain.command` (2-class, 4K examples) removed; replaced by fast/full versions

## [v0.4.0] - 2026-03-28

### Added: 3-Class Training Data + HuggingFace Upload

- **NEW: Phishing detection**: model can now classify as SPAM, HAM, or PHISHING (previously binary only)
- Prepared two new training datasets from 5 combined sources:
  - **FAST** (8,000 examples): ~1 hr retrain, in `new_training_data/liquid_fast/`
  - **FULL** (20,000 examples): ~3 hr retrain, in `new_training_data/liquid_full/`
- Data sources: existing 4K FaroukMoc2 + locuoco 250K (HF) + ealvaradob phishing (HF) + luongnv89 phishing with reasoning (HF) + Enron
- Added `retrain_liquid.py` script with `--mode fast` and `--mode full` (saves to `adapters_fast/` or `adapters_full/`)
- Uploaded project to HuggingFace: `VoltageVagabond/spam-classifier-liquid` (model repo)
- Created HuggingFace Space:
  `VoltageVagabond/spam-classifier-liquid-space` (Docker + Gradio demo)
- Created `README.md` with HF model card metadata and `Dockerfile` for HF Space
- Uploaded complete dataset to HF: `VoltageVagabond/spam-email-dataset` with all raw sources

## [v0.3.2] - 2026-03-28

### Fixed

- Fixed `ValueError: train_dataset is required` crash during evaluation step; SFTTrainer requires `train_dataset` even for eval-only usage

### Added

- `--eval-only` flag for `fine_tune.py`: loads saved adapter and runs evaluation + generation test without retraining (~minutes instead of ~2 hours)
- `evaluate.command`: double-click launcher for eval-only mode

## [v0.3.1] - 2026-03-27

### Updated

- Corrected training time estimates across all files:
  - Notebook (1 epoch): ~45 minutes on Apple Silicon
  - fine_tune.py (3 epochs): ~2-2.5 hours on Apple Silicon
  - Slowdown vs v0.2.0 due to targeting 8 module types instead of 4 (better quality, more compute per step)
- Fixed training data counts in setup guide (3,200 train / 800 test, not 500/100)
- Added training time comparison table to training guide
- Added batch size 4 saturation note to tuning tips
- Added `docs/07-code-sources-reference.md`: every source, citation, and empirical finding for paper writing

## [v0.3.0] - 2026-03-27

### Changed: LoRA config aligned with Liquid AI official cookbook

- **Source:** [Liquid4All/cookbook](https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb)
- Target modules expanded from 4 (attention only) to 8 (attention + GLU + conv):
  - Attention: `q_proj`, `k_proj`, `v_proj`, `out_proj`
  - Feed-forward GLU: `w1`, `w2`, `w3`
  - Conv: `in_proj`
- LoRA rank 32 → 8, alpha 64 → 16 (matching cookbook values)
- Dropout 0.05 → 0.1 (matching cookbook)
- Fixed `o_proj` → `out_proj` (correct layer name for LFM2 architecture)

## [v0.2.1] - 2026-03-27

### Note

- Verified Liquid AI version does NOT have the orphaned port issue that affected the MLX version
- PyTorch loads the model directly
  into the Python process; no child servers spawned
- When the app exits, all model memory is freed automatically
- No cleanup trap needed (unlike MLX version which spawns llama-server processes)

## [v0.2.0] - 2026-03-27

### Changed

- Increased batch size from 1 to 4 for faster training (parallel processing on MPS)
- Increased LoRA rank from 16 to 32 (and alpha from 32 to 64) for better adapter quality
- Removed gradient accumulation (not needed with batch size 4)
- Memory usage ~7-8 GB (comfortable on 24 GB Apple Silicon)

### Tested and reverted

- Batch size 8 tested: MPS GPU saturates at batch size 4, no speed gain beyond that. Steps halved but each step took 2x longer. Batch size 4 is the sweet spot for Apple Silicon.

## [v0.1.1] - 2026-03-27

### Fixed

- Renamed `max_seq_length` to `max_length` in fine_tune.py, notebook, and docs for TRL v0.29 compatibility
- Fixed `launch-notebook.command` not showing Jupyter install errors
- Added model loading time note (30-60 seconds) to `launch UI.command`

## [v0.1.0] - 2026-03-27

### Added

- Project scaffolding (requirements and gitignore)
- Training data copied from MLX sibling project
- `fine_tune.py`: LoRA fine-tuning via TRL SFTTrainer (Liquid AI's official method)
- `app.py`: Gradio web UI with Classify and Chat tabs
- `.command` launcher scripts for macOS
- Beginner-friendly documentation (6 guides)
- Interactive Jupyter notebook walkthrough