
Changelog

All notable changes to the Liquid AI Spam Classifier project.

[v0.5.9] - 2026-04-16 (retrain script fixes: NaN gradients + accurate example counts)

Summary

Fixed gradient explosion (NaN loss) that crashed full retrains on Apple Silicon, and corrected stale example counts and time estimates across all retrain command files.

Fixed

  • retrain_liquid.py — ACTIVATION_OFFLOADING = True caused AssertionError: Torch not compiled with CUDA enabled; TRL's OffloadActivations is CUDA-only and crashes on MPS. Set to False.
  • retrain_liquid.py — gradient explosion (loss: 0, grad_norm: nan, entropy: nan) caused by a learning rate that was too aggressive without activation offloading. Fixed:
    • LEARNING_RATE: 2e-4 → 5e-5
    • max_grad_norm: 0.3 → 1.0
    • Added warmup_steps=100 to ramp LR gradually and prevent early gradient spikes
  • retrain.command (liquid), Retrain.command, spam-classifier-mlx/retrain.command — menu displayed stale example counts (20,000) and time estimates (2.5-3.5 hrs) that didn't match actual dataset sizes. Corrected to actual counts (liquid/mlx full: ~16,000; mlx fast: ~6,800) and recalculated time estimates.
  • retrain.command (liquid) — removed "activation offloading" from memory optimizations note since it is now disabled on MPS.
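
As a rough illustration of why the lower learning rate, the warmup, and gradient clipping work together, here is a minimal pure-Python sketch. It is not the code in retrain_liquid.py: the function names are hypothetical, the schedule is a plain linear ramp with the later decay omitted, and only the three constants come from the entry above.

```python
import math
from typing import List

LEARNING_RATE = 5e-5   # value from the changelog entry above
WARMUP_STEPS = 100     # value from the changelog entry above
MAX_GRAD_NORM = 1.0    # value from the changelog entry above


def warmup_lr(step: int, base_lr: float = LEARNING_RATE,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear ramp from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr


def clip_by_global_norm(grads: List[float],
                        max_norm: float = MAX_GRAD_NORM) -> List[float]:
    """Scale the whole gradient vector down if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    # Clipping cannot repair a norm that is already NaN or inf, which is
    # why the entry above also lowers the learning rate and adds warmup.
    if not math.isfinite(norm) or norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]


if __name__ == "__main__":
    for step in (0, 50, 100, 500):
        print(step, warmup_lr(step))
    print(clip_by_global_norm([3.0, 4.0]))
```

Early steps therefore run at a fraction of the target rate, which limits how far a single noisy batch can move the weights before the optimizer statistics settle.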

Changed

  • spam-classifier-liquid/spam_classifier_liquid.ipynb — added torch_empty_cache_steps=50 and dataloader_pin_memory=False to notebook SFTConfig to match retrain script

[v0.5.3] - 2026-04-16 (GGUF rename to spam-classifier-F16.gguf)

Summary

Renamed the local and HuggingFace GGUF file to spam-classifier-F16.gguf so that HuggingFace's model card parser can detect the quantization type (F16) and display it in the GGUF variants widget. Updated all local file references accordingly.

Changed

  • spam-classifier.gguf → spam-classifier-F16.gguf (local file rename)
  • VoltageVagabond/spam-classifier-liquid-GGUF — deleted spam-classifier.gguf, uploaded spam-classifier-F16.gguf, updated README
  • Updated all references in StartServer.command, Retrain.command, merge_and_convert_gguf.py, verify_gguf_model.py, both Modelfiles, spam-classifier-liquid-GGUF/README.md, and this changelog
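
To show why the filename matters, the sketch below extracts a quantization tag from a GGUF filename. The regular expression is an assumption made for illustration only; it is not HuggingFace's actual parser, whose rules are not described in this changelog.

```python
import re
from typing import Optional

# Hypothetical pattern: a quantization tag sitting right before ".gguf".
QUANT_TAG = re.compile(r"[-_.](F16|BF16|F32|Q\d+_[A-Z0-9_]+)\.gguf$",
                       re.IGNORECASE)


def quant_tag(filename: str) -> Optional[str]:
    """Return the quantization tag in the filename, or None if absent."""
    match = QUANT_TAG.search(filename)
    return match.group(1).upper() if match else None


if __name__ == "__main__":
    print(quant_tag("spam-classifier.gguf"))      # old name: no tag to detect
    print(quant_tag("spam-classifier-F16.gguf"))  # new name: tag is present
```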

[v0.5.2] - 2026-04-16 (GGUF system prompt patch + llama-server fixes)

Summary

Baked the spam classifier system prompt directly into the GGUF model file's tokenizer.chat_template metadata so any client (llama.cpp, LM Studio, Ollama) applies the correct behavior without manual configuration. Fixed two llama-server startup bugs introduced by a brew update.

Changed

  • spam-classifier-F16.gguf — patched tokenizer.chat_template to use "You are an email spam classifier..." as the default system prompt; done via raw binary rewrite of the GGUF metadata section (string grows by 118 bytes; tensor data section is untouched)
  • StartServer.command — fixed -fa on (flag syntax changed in brew b8680; was bare -fa, now requires explicit on/off/auto)
  • StartServer.command — added --webui-config with systemMessage and temperature so the llama.cpp Web UI pre-fills the system prompt automatically (the Web UI uses the raw /completion endpoint and does not apply the chat template on its own)
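
The metadata patch relies on how GGUF stores strings: a little-endian 64-bit length followed by the UTF-8 bytes. The sketch below shows that encoding and why a longer chat template grows the file by exactly the number of added bytes. The template strings are placeholders, not the real tokenizer.chat_template, and the actual patch script is not reproduced here.

```python
import struct
from typing import Tuple


def encode_gguf_string(text: str) -> bytes:
    """Encode a string as GGUF does: uint64 length prefix + UTF-8 payload."""
    data = text.encode("utf-8")
    return struct.pack("<Q", len(data)) + data


def decode_gguf_string(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Decode one GGUF string; return it with the offset just past it."""
    (length,) = struct.unpack_from("<Q", buf, offset)
    start = offset + 8
    return buf[start:start + length].decode("utf-8"), start + length


# Placeholder templates (assumptions for illustration only).
SYSTEM_PREFIX = "You are an email spam classifier..."
old_template = "{{ messages }}"
new_template = SYSTEM_PREFIX + old_template

# The 8-byte prefix is the same size in both, so only the payload grows.
growth = len(encode_gguf_string(new_template)) - len(encode_gguf_string(old_template))

if __name__ == "__main__":
    print("metadata grows by", growth, "bytes")
```

Every byte after the rewritten string shifts by the same amount, so a real patch would presumably also need to keep the alignment padding in front of the tensor data section valid.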

[v0.5.1] - 2026-04-16 (consolidated retrain script + memory optimizations)

Summary

Replaced the two separate retrain-fast.command and retrain-full.command scripts with a single retrain.command that prompts for fast or full mode at launch. Applied memory optimizations to retrain_liquid.py to reduce MPS GPU pressure during training. Added a top-level Retrain.command pipeline script in the LLM Project root that chains retrain → GGUF rebuild → HuggingFace upload.

Changed

  • retrain-fast.command + retrain-full.command → replaced by single retrain.command
    • Double-click launches a menu: f) Fast (1-1.5 hrs) / u) Full (2.5-3.5 hrs) / q) Quit
    • Includes adapter swap prompt with backup logic (same as before, just unified)
    • Reminds user to run Retrain.command in LLM Project root for GGUF rebuild

Memory optimizations in retrain_liquid.py

  • activation_offloading=True — offloads forward-pass activations from MPS to CPU RAM; frees ~25% MPS memory at ~15% speed cost (biggest knob for avoiding OOM)
  • torch_empty_cache_steps=50 — flushes the MPS memory pool every 50 optimizer steps; prevents memory fragmentation from causing OOM mid-run
  • optim="adamw_torch_fused" — fused AdamW kernel; slightly faster and lower peak memory than unfused adamw_torch
  • dataloader_pin_memory=False — pin_memory is a CUDA optimization that wastes memory on MPS; explicitly disabled
  • Already enabled: gradient_checkpointing=True, bf16=True, MAX_LENGTH=256

Added (LLM Project root)

  • Retrain.command — end-to-end pipeline: retrain → swap adapter → rebuild GGUF (clears merged-liquid-full/ cache so new adapter is actually baked in) → upload adapter + GGUF to HuggingFace → remind to restart llama.cpp server

[v0.5.0] - 2026-04-16 (GGUF merged model + server commands)

Summary

Converted the trained LoRA adapter into a fully merged standalone GGUF file suitable for llama.cpp, Ollama, and LM Studio. Added StartServer.command and StopServer.command for launching the llama.cpp server locally. Uploaded the merged GGUF to a new HuggingFace repo with full platform instructions.

Added

  • merge_and_convert_gguf.py — merges LoRA adapter into base model weights then converts to GGUF F16 using llama.cpp's convert_hf_to_gguf.py script
  • spam-classifier-F16.gguf (~2.2 GB) — fully merged standalone GGUF; no separate base model or adapter file needed at runtime
  • StartServer.command — double-click launcher for llama.cpp server with all Apple Silicon performance flags (-ngl 99, -fa, --mlock, 8-bit KV cache, perf-core thread pinning) and system prompt injected at startup
  • StopServer.command — kills the server by PID file, falls back to port kill
  • Modelfile — Ollama configuration with system prompt and temperature baked in; allows ollama create spam-classifier -f Modelfile for zero-config deployment
  • llama-server-config.json — reference config showing all server flags
  • upload_adapter_to_root.py — uploads adapter files to HF repo root (required by gguf-my-lora Space which expects adapter_config.json at root, not in subfolder)
  • upload_merged_gguf.py — creates and uploads to VoltageVagabond/spam-classifier-liquid-GGUF
  • upload_gguf_readme.py — uploads README with per-platform usage instructions
  • verify_gguf_model.py — tests the GGUF against real test set examples using llama-cpp-python; confirms fine-tuning is active, not just base model behavior
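
Merging a LoRA adapter means folding the low-rank update into each targeted weight matrix, W' = W + (alpha / r) · B·A, so that no adapter is needed at runtime. The toy sketch below shows that arithmetic on 2×2 lists. It is not merge_and_convert_gguf.py, which presumably relies on library tooling; the matrices and helper names are made up for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(weight: Matrix, lora_a: Matrix, lora_b: Matrix,
               rank: int, alpha: float) -> Matrix:
    """Return W + (alpha / rank) * (B @ A)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy shapes: W is 2x2, B is 2x1, A is 1x2, so the rank is 1.
# alpha=2 keeps the same alpha/rank ratio (2.0) as rank 8 with alpha 16.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]
A = [[0.0, 1.0]]
merged = merge_lora(W, A, B, rank=1, alpha=2)

if __name__ == "__main__":
    print(merged)
```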

New HuggingFace repo

  • VoltageVagabond/spam-classifier-liquid-GGUF — merged F16 GGUF with Modelfile, README covering Ollama / LM Studio / llama.cpp server / llama.cpp CLI usage, and educational disclaimer for senior project context

docs

  • docs/08-gguf-conversion-guide.md — full guide: Option A (gguf-my-lora Space), Option B (local merge + convert), troubleshooting section covering every error encountered (wrong Space, nested subfolder, too many requests, redirect loop, adapter-only GGUF failing to load)
  • docs/README.md — added guide 8 to table of contents

Key lessons documented

  • gguf-my-lora Space produces an adapter-only GGUF (~8.6 MB), NOT a standalone model — this causes "failed to load" errors in Ollama/LM Studio without --lora
  • The system prompt must match training format exactly or the model falls back to base LFM2.5 general-assistant behavior
  • GGUF format cannot embed a system prompt in weights — Modelfile (Ollama) is the closest "set it and forget it" workaround for end users

[v0.4.9] - 2026-04-16 (GGUF conversion guide + adapter repo root upload)

Summary

Documented how to convert the trained LoRA adapter to GGUF format so it can be used with llama.cpp, Ollama, and LM Studio. Also fixed the HuggingFace model repo so that adapter files are at the root level (required by the gguf-my-lora Space).

Added

  • docs/08-gguf-conversion-guide.md — step-by-step guide covering two conversion paths (Option A: gguf-my-lora Space in browser; Option B: merge locally then convert with llama.cpp), plus a full troubleshooting section for every error encountered
  • upload_adapter_to_root.py (project root) — helper script that uploads adapter_config.json, adapter_model.safetensors, tokenizer_config.json, tokenizer.json, and chat_template.jinja to the root of the VoltageVagabond/spam-classifier-liquid HF repo (the gguf-my-lora Space requires adapter_config.json at root, not inside an adapters/ subfolder)
  • docs/README.md updated to include guide 8 in the table of contents

Issues encountered and fixed (documented in guide 8)

  • gguf-my-repo Space gave "no model_type in config.json" — wrong Space; LoRA adapters need gguf-my-lora, not gguf-my-repo
  • gguf-my-lora gave "adapter_config.json not found" — adapter files were nested in adapters/ subfolder on HF, not at repo root; fixed by uploading to root
  • gguf-my-repo showed "too many requests" — Space has 1,900+ likes and gets heavy traffic; workaround is to duplicate the Space to your own account
  • HuggingFace sign-in redirect loop — caused by stale cookies; fixed by clearing cookies or using incognito window

[v0.4.8] - 2026-04-14 (8-bit KV cache quantization)

Summary

Enable 8-bit quantization for the KV cache at inference time to reduce memory usage without changing model weights or training.

What changed in app.py

  • model.generate() now passes cache_implementation="quantized" and cache_config={"backend": "hqq", "nbits": 8}, quantizing both the key and value cache to 8-bit during generation
  • Used the hqq backend (recommended for int8; quanto only supports int2/int4)
  • Model weights remain at BF16; only the runtime KV cache is affected
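
To give a feel for what 8-bit cache quantization does to stored values, here is a simple min-max quantizer in pure Python. This is a generic affine scheme chosen for illustration; it is not the HQQ algorithm and not the code path that model.generate() runs.

```python
from typing import List, Tuple


def quantize_uint8(values: List[float]) -> Tuple[List[int], float, float]:
    """Map floats onto integer codes 0..255 using a min-max affine scale."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for constant input
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate floats from the 8-bit codes."""
    return [c * scale + lo for c in codes]


if __name__ == "__main__":
    keys = [-1.0, -0.25, 0.0, 0.5, 1.0]  # made-up cache values
    codes, scale, lo = quantize_uint8(keys)
    restored = dequantize(codes, scale, lo)
    worst = max(abs(a - b) for a, b in zip(keys, restored))
    print(codes, "max error:", worst)
```

Each cached value then takes one byte instead of the two bytes of BF16, at the cost of a rounding error of at most half a quantization step, plus a small amount of scale metadata.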

What changed in requirements.txt

  • Added hqq>=0.2.0 — required package for the HQQ quantization backend

[v0.4.7] - 2026-04-14 (Documentation sync with fine_tune.py)

Summary

Audit pass to bring README.md in line with fine_tune.py. The README had a stale "Binary classification only" limitation note (3-class has been live since v0.4.0) and an out-of-date batch size, plus it was still quoting the pre-optimization training time.

Changes to README.md

  • Training Details table:
    • Batch size 4 → 1 (effective 4 with gradient accumulation steps = 4) to match BATCH_SIZE = 1 and GRADIENT_ACCUMULATION_STEPS = 4 in fine_tune.py (lines 72-73)
    • Added explicit rows for Max sequence length (256), Optimizer (adamw_torch), Weight dtype (bfloat16), Device (MPS), and Max gradient norm (0.3) to match the code
    • Training time ~2–2.5 hours → ~1–1.5 hours to match the in-code comment on line 241, with a note that the older figure reflected the pre-v0.4.3 config
  • Limitations: "Binary classification only" note replaced with "Three-class classification (SPAM / HAM / PHISHING) as of v0.4.0"

Rationale

fine_tune.py is the source of truth. Values read from the file:

LORA_RANK                  = 8     (line 53)
LORA_ALPHA                 = 16    (line 54)
LORA_DROPOUT               = 0.1   (line 55)
LORA_TARGET_MODULES        = 8     (lines 56-68; q/k/v/out_proj, w1/w2/w3, in_proj)
NUM_EPOCHS                 = 3     (line 71)
BATCH_SIZE                 = 1     (line 72)
GRADIENT_ACCUMULATION_STEPS= 4     (line 73)
LEARNING_RATE              = 2e-4  (line 74)
MAX_LENGTH                 = 256   (line 75)
optim                      = "adamw_torch"   (line 226)
torch_dtype                = bfloat16         (line 167)
device_map                 = "mps"            (line 166)
max_grad_norm              = 0.3              (comment / training args)
Training time comment      = "~1-1.5 hours"   (line 241)

No code changes, no retraining in this release.


[v0.4.6] - 2026-04-14 (HF Spaces deployment fixes)

Summary

Got the liquid Space (VoltageVagabond/spam-classifier-liquid) running on HF after several iterations diagnosing adapter download failures.

Q&A from this session

Q: Why does the Space log say Adapters not found at /app/adapters when the local app works fine?
A: The local adapters/ directory is git-ignored and never uploaded to the Space (too large + the upload script explicitly excludes it). On HF Spaces the directory doesn't exist, so the app falls through to the "no adapter" code path.

Q: How was that fixed?
A: Added a snapshot_download fallback in app.py: if local adapters are missing, download them from the VoltageVagabond/spam-classifier-liquid model repo at startup.

Q: First attempt got 401 Repository Not Found. Why?
A: The model repo was set to private and the Space had no HF_TOKEN secret. The Space container runs anonymously by default, so it couldn't authenticate. Fix: made the model repo public (no token needed). Alternative: keep private and add HF_TOKEN as a Space repository secret with read scope.

Q: Next error: Can't find 'adapter_config.json' at '/root/.cache/.../snapshots/...'. Why?
A: The model repo doesn't store adapter files at the root — they're nested under adapters_fast/, adapters_full/, adapters_backup/. The download succeeded but PeftModel.from_pretrained looked at the snapshot root and couldn't find adapter_config.json. Fix: use allow_patterns=["adapters_fast/*"] and set ADAPTER_PATH = snapshot_path / "adapters_fast" so PEFT loads from the right subdir.

Q: Why is classification slow on HF but fast locally?
A: HF free tier (cpu-basic) is 2 vCPUs, 16 GB RAM, no GPU. Local Mac uses Apple Silicon Metal/MPS acceleration. A 1.2B-param transformer on CPU is just slow. Realistic speedups (high → low impact):

  1. Upgrade Space to a T4 GPU (~$0.40/hr, only billed when running)
  2. 4-bit quantization via bitsandbytes (~2-3× faster on CPU)
  3. Reduce max_tokens from 750 → ~100 (the reply only needs the class label: SPAM, HAM, or PHISHING)
  4. model.merge_and_unload() — bake LoRA into base model, removes per-call overhead
  5. Switch to GGUF + llama-cpp-python — significantly faster than HF transformers on CPU

Q: Why does the model repo need to be public for the Space to work?
A: The Space container runs anonymously. Public repo = anonymous downloads work. Private repo = need an authenticated HF_TOKEN secret in the Space settings. The Space being public/private is independent — that controls who can view the demo, not what the container can fetch.

Changes

  • app.py — added snapshot_download fallback that pulls from the HF model repo when local adapters are missing
  • app.py — passes os.environ.get("HF_TOKEN") to snapshot_download so the same code path works for both public and private model repos
  • app.py — allow_patterns=["adapters_fast/*"] and ADAPTER_PATH now points at the adapters_fast/ subdirectory inside the downloaded snapshot
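
The fallback logic described in this release can be sketched as follows. The function name resolve_adapter_path is hypothetical and app.py may be structured differently. The download callable is injected so the sketch runs without network access; in the real app it would presumably be huggingface_hub.snapshot_download, which accepts the repo_id, allow_patterns, and token keyword arguments used here.

```python
import os
from pathlib import Path
from typing import Callable

REPO_ID = "VoltageVagabond/spam-classifier-liquid"  # model repo named above
SUBDIR = "adapters_fast"                            # subfolder named above


def resolve_adapter_path(local_dir: Path,
                         download: Callable[..., str]) -> Path:
    """Prefer local adapters; otherwise download only the needed subfolder."""
    if (local_dir / "adapter_config.json").is_file():
        return local_dir
    snapshot_path = Path(download(
        repo_id=REPO_ID,
        allow_patterns=[f"{SUBDIR}/*"],
        token=os.environ.get("HF_TOKEN"),  # None works for public repos
    ))
    # PEFT must be pointed at the subfolder, not the snapshot root.
    return snapshot_path / SUBDIR


if __name__ == "__main__":
    def fake_download(**kwargs: object) -> str:
        print("would download with", kwargs)
        return "/tmp/fake-snapshot"

    print(resolve_adapter_path(Path("adapters"), fake_download))
```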

[v0.4.5] - 2026-04-14

Beginner-Code Compliance — app.py

Refactored app.py to match the beginner-friendly coding style used in course lecture notebooks.

What changed:

  • Replaced 3 lambda functions in Gradio event handlers with named functions (make_example_handler, clear_input)
  • Replaced ternary operator for emoji selection with explicit if/else block
  • No behavior changes — all Gradio event wiring, feedback logging, and chat logic unchanged
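
The style change looks roughly like the sketch below. The names make_example_handler and clear_input come from the entry above, but their bodies, the pick_emoji helper, and the emoji choices are assumptions, since app.py is not shown here.

```python
def make_example_handler(example_text):
    """Replace `lambda: example_text` with a named inner function."""
    def handler():
        return example_text
    return handler


def clear_input():
    """Replace `lambda: ""`; returns the value that empties a textbox."""
    return ""


def pick_emoji(label):
    """Replace a ternary with an explicit if/else block (hypothetical helper)."""
    if label == "HAM":
        emoji = "✅"
    else:
        emoji = "🚫"
    return emoji


if __name__ == "__main__":
    spam_example = make_example_handler("You won a free cruise!")
    print(spam_example())
    print(repr(clear_input()))
    print(pick_emoji("SPAM"))
```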

[v0.4.4] - 2026-04-14

Chat App Upgrade — app.py

Replaced the two-tab Gradio app (Classify + Chat) with a polished chat-only interface.

What changed:

  • Removed the Classify tab entirely — chat is now the full interface
  • Added HTML topbar with project title, model name, and badge pills (matches XAI project style)
  • Added clickable example prompt buttons (spam, ham, phishing) that populate the input
  • Added 👍 / 👎 feedback buttons that log to data/feedback/feedback_log.csv
    • CSV columns: timestamp, user_input, model_response, rating
    • Feedback status resets after each new submission
  • Increased max_tokens from 500 → 750 to reduce mid-sentence cutoffs
  • Fixed Gradio 6 compatibility: theme/css moved to launch(), gr.Chatbot returns full history list
  • Paths anchored to Path(__file__).parent so the app works from any launch directory
  • Updated Dockerfile: consolidated to install deps from requirements.txt, removed redundant pip install lines
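
A minimal sketch of the feedback logging described above, using the four CSV columns from the entry. The function name log_feedback and the rating values are assumptions; the path is anchored to the script's folder in the same spirit as the Path(__file__).parent change.

```python
import csv
from datetime import datetime
from pathlib import Path

FIELDS = ["timestamp", "user_input", "model_response", "rating"]


def log_feedback(csv_path: Path, user_input: str,
                 model_response: str, rating: str) -> None:
    """Append one feedback row, writing the header if the file is new."""
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now().isoformat(timespec="seconds"),
            "user_input": user_input,
            "model_response": model_response,
            "rating": rating,
        })


if __name__ == "__main__":
    log_path = Path(__file__).parent / "data" / "feedback" / "feedback_log.csv"
    log_feedback(log_path, "Claim your prize now", "SPAM", "up")
```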

[v0.4.3] - 2026-04-07

Memory & Speed Optimization — fine_tune.py

Reduced peak memory usage from ~50 GB to a target of ~8–14 GB by changing five training parameters. No change to model architecture or LoRA adapter structure — accuracy is unaffected.

| Parameter | Before | After | Why |
| --- | --- | --- | --- |
| BATCH_SIZE | 4 | 1 | Smaller batch = 4× less activation memory per step |
| GRADIENT_ACCUMULATION_STEPS | 1 | 4 | Keeps effective batch size at 4 so training dynamics are unchanged |
| MAX_LENGTH | 512 | 256 | Attention memory scales O(n²) with sequence length — halving it cuts ~4× attention memory; spam emails rarely exceed 256 tokens |
| optim | adamw (default) | adamw_8bit | Adam optimizer normally stores 2 full float32 copies of every parameter for momentum tracking (9.6 GB for a 1.2B model); 8-bit Adam quantizes those to 8-bit integers with negligible quality loss (75% reduction) |
| torch_dtype | "auto" | torch.bfloat16 | Forces model weights to load in bfloat16 (2 bytes/param) instead of float32 (4 bytes/param), halving weight memory; bfloat16 has the same exponent range as float32 so training stability is preserved |
| device_map | "auto" | "mps" | Pins all layers to the MPS GPU; "auto" can spill layers to CPU causing slow cross-device copies and inflated memory readings |
| gradient_checkpointing_kwargs | not set | {"use_reentrant": False} | Suppresses deprecation warning on newer PyTorch; no behavior change |
| max_grad_norm | not set | 0.3 | Clips gradient norms to prevent occasional instability spikes during training |

Why quality is unaffected:

  • 8-bit Adam was validated by Dettmers et al. (2022) to match full-precision Adam loss curves on LLM fine-tuning
  • bfloat16 was designed specifically for training — same exponent range as float32, just less mantissa precision
  • Effective batch size (1 × 4 accumulation = 4) is identical to the original (4 × 1)
  • 256 tokens covers the vast majority of spam/ham emails in this dataset
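
The figures quoted in this release can be reproduced with a few lines of arithmetic. The sketch takes the changelog's own assumptions at face value (Adam state for all 1.2B parameters, 1 GB = 10^9 bytes); the variable names are illustrative.

```python
PARAMS = 1.2e9        # model size quoted in the changelog
GB = 1e9              # decimal gigabytes, matching the 9.6 GB figure
BYTES_FP32, BYTES_BF16, BYTES_INT8 = 4, 2, 1

# Adam keeps two state tensors (first and second moment) per parameter.
adam_state_fp32_gb = PARAMS * BYTES_FP32 * 2 / GB   # about 9.6
adam_state_8bit_gb = PARAMS * BYTES_INT8 * 2 / GB   # about 2.4
adam_reduction = 1 - adam_state_8bit_gb / adam_state_fp32_gb  # about 0.75

# Weight memory: float32 versus bfloat16.
weights_fp32_gb = PARAMS * BYTES_FP32 / GB          # about 4.8
weights_bf16_gb = PARAMS * BYTES_BF16 / GB          # about 2.4

# Effective batch size = per-step batch size x accumulation steps.
effective_before = 4 * 1
effective_after = 1 * 4

# Attention memory grows with the square of the sequence length.
attention_ratio = (512 / 256) ** 2                  # 4.0

if __name__ == "__main__":
    print(f"Adam state: {adam_state_fp32_gb:.1f} GB -> {adam_state_8bit_gb:.1f} GB")
    print(f"Weights:    {weights_fp32_gb:.1f} GB -> {weights_bf16_gb:.1f} GB")
    print(f"Effective batch: {effective_before} -> {effective_after}")
    print(f"Attention memory ratio: {attention_ratio:.0f}x")
```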

[v0.4.2] - 2026-04-07

Updated — Training Data Pipeline

  • Added puyang2025/seven-phishing-email-datasets and zefang-liu/phishing-email-dataset as additional sources in build_liquid_datasets.py — parquets generated by the spam-xai-project sibling and shared across all three classifier projects
  • Updated data counts in retrain-fast.command and retrain-full.command to reflect new ~190K source pool

[v0.4.1] - 2026-03-28

Retrain Commands with Adapter Swap

  • retrain-fast.command and retrain-full.command now prompt after training to swap the new adapter as the default
  • Selecting "y" backs up adapters/ to adapters_backup/ and copies the new adapter in
  • App and notebook automatically use whichever adapter is in adapters/
  • Old retrain.command (2-class, 4K examples) removed — replaced by fast/full versions
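
The swap-with-backup step can be sketched in Python as below. The real logic lives in the .command shell scripts, which are not shown here, so the function name and the exact order of operations are assumptions.

```python
import shutil
from pathlib import Path


def swap_adapter(new_adapter: Path, active: Path, backup: Path) -> None:
    """Back up the active adapter folder, then copy the new adapter in."""
    if active.exists():
        if backup.exists():
            shutil.rmtree(backup)      # keep only the most recent backup
        shutil.copytree(active, backup)
        shutil.rmtree(active)
    shutil.copytree(new_adapter, active)


if __name__ == "__main__":
    answer = input("Swap the new adapter in as the default? [y/N] ")
    if answer.strip().lower() == "y":
        swap_adapter(Path("adapters_fast"), Path("adapters"),
                     Path("adapters_backup"))
```

Copying rather than moving leaves the freshly trained folder in place, so the same adapter can be swapped in again later.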

[v0.4.0] - 2026-03-28

Added — 3-Class Training Data + HuggingFace Upload

  • NEW: Phishing detection — model can now classify as SPAM, HAM, or PHISHING (previously binary only)
  • Prepared two new training datasets from 5 combined sources:
    • FAST (8,000 examples): ~1 hr retrain — new_training_data/liquid_fast/
    • FULL (20,000 examples): ~3 hr retrain — new_training_data/liquid_full/
  • Data sources: existing 4K FaroukMoc2 + locuoco 250K (HF) + ealvaradob phishing (HF) + luongnv89 phishing with reasoning (HF) + Enron
  • Added retrain_liquid.py script with --mode fast and --mode full (saves to adapters_fast/ or adapters_full/)
  • Uploaded project to HuggingFace: VoltageVagabond/spam-classifier-liquid (model repo)
  • Created HuggingFace Space: VoltageVagabond/spam-classifier-liquid-space (Docker + Gradio demo)
  • Created README.md with HF model card metadata and Dockerfile for HF Space
  • Uploaded complete dataset to HF: VoltageVagabond/spam-email-dataset with all raw sources

[v0.3.2] - 2026-03-28

Fixed

  • Fixed ValueError: train_dataset is required crash during evaluation step — SFTTrainer requires train_dataset even for eval-only usage

Added

  • --eval-only flag for fine_tune.py — loads saved adapter and runs evaluation + generation test without retraining (~minutes instead of ~2 hours)
  • evaluate.command — double-click launcher for eval-only mode
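
A flag like --eval-only is typically wired up with argparse, as in this sketch. The help text and the build_parser name are assumptions; fine_tune.py may define the flag differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface with an optional eval-only switch."""
    parser = argparse.ArgumentParser(description="LoRA fine-tuning script")
    parser.add_argument(
        "--eval-only",
        action="store_true",
        help="load the saved adapter and evaluate without retraining",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.eval_only:
        print("Skipping training; running evaluation only.")
    else:
        print("Running full training, then evaluation.")
```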

[v0.3.1] - 2026-03-27

Updated

  • Corrected training time estimates across all files:
    • Notebook (1 epoch): ~45 minutes on Apple Silicon
    • fine_tune.py (3 epochs): ~2-2.5 hours on Apple Silicon
    • Slowdown vs v0.2.0 due to targeting 8 module types instead of 4 (better quality, more compute per step)
  • Fixed training data counts in setup guide (3,200 train / 800 test, not 500/100)
  • Added training time comparison table to training guide
  • Added batch size 4 saturation note to tuning tips
  • Added docs/07-code-sources-reference.md — every source, citation, and empirical finding for paper writing

[v0.3.0] - 2026-03-27

Changed — LoRA config aligned with Liquid AI official cookbook

  • Source: Liquid4All/cookbook
  • Target modules expanded from 4 (attention only) to 8 (attention + GLU + conv):
    • Attention: q_proj, k_proj, v_proj, out_proj
    • Feed-forward GLU: w1, w2, w3
    • Conv: in_proj
  • LoRA rank 32 → 8, alpha 64 → 16 (matching cookbook values)
  • Dropout 0.05 → 0.1 (matching cookbook)
  • Fixed o_proj → out_proj (correct layer name for LFM2 architecture)
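
Collected in one place, the settings from this release look like the dictionary below. The keys follow the parameter names of PEFT's LoraConfig (r, lora_alpha, lora_dropout, target_modules), so passing them as LoraConfig(**lora_kwargs) is a plausible use, but that call is an assumption and is not executed here.

```python
lora_kwargs = {
    "r": 8,               # was 32
    "lora_alpha": 16,     # was 64
    "lora_dropout": 0.1,  # was 0.05
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "out_proj",  # attention
        "w1", "w2", "w3",                          # feed-forward GLU
        "in_proj",                                 # conv
    ],
}

# LoRA scales its update by alpha / r; the ratio is the same before and after.
scaling_after = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
scaling_before = 64 / 32

if __name__ == "__main__":
    print(len(lora_kwargs["target_modules"]), "target modules")
    print("scaling:", scaling_before, "->", scaling_after)
```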

[v0.2.1] - 2026-03-27

Note

  • Verified Liquid AI version does NOT have the orphaned port issue that affected the MLX version
    • PyTorch loads the model directly into the Python process — no child servers spawned
    • When the app exits, all model memory is freed automatically
    • No cleanup trap needed (unlike MLX version which spawns llama-server processes)

[v0.2.0] - 2026-03-27

Changed

  • Increased batch size from 1 to 4 for faster training (parallel processing on MPS)
  • Increased LoRA rank from 16 to 32 (and alpha from 32 to 64) for better adapter quality
  • Removed gradient accumulation (not needed with batch size 4)
  • Memory usage ~7-8 GB (comfortable on 24 GB Apple Silicon)

Tested and reverted

  • Batch size 8 tested — MPS GPU saturates at batch size 4, no speed gain beyond that. Steps halved but each step took 2x longer. Batch size 4 is the sweet spot for Apple Silicon.

[v0.1.1] - 2026-03-27

Fixed

  • Renamed max_seq_length to max_length in fine_tune.py, notebook, and docs for TRL v0.29 compatibility
  • Fixed launch-notebook.command not showing Jupyter install errors
  • Added model loading time note (30-60 seconds) to launch UI.command

[v0.1.0] - 2026-03-27

Added

  • Project scaffolding (requirements and gitignore)
  • Training data copied from MLX sibling project
  • fine_tune.py — LoRA fine-tuning via TRL SFTTrainer (Liquid AI's official method)
  • app.py — Gradio web UI with Classify and Chat tabs
  • .command launcher scripts for macOS
  • Beginner-friendly documentation (6 guides)
  • Interactive Jupyter notebook walkthrough