Changelog
All notable changes to the Liquid AI Spam Classifier project.
[v0.5.9] - 2026-04-16 (retrain script fixes: NaN gradients + accurate example counts)
Summary
Fixed gradient explosion (NaN loss) that crashed full retrains on Apple Silicon, and corrected stale example counts and time estimates across all retrain command files.
Fixed
- `retrain_liquid.py` — `ACTIVATION_OFFLOADING = True` caused `AssertionError: Torch not compiled with CUDA enabled`; TRL's `OffloadActivations` is CUDA-only and crashes on MPS. Set to `False`.
- `retrain_liquid.py` — gradient explosion (`loss: 0`, `grad_norm: nan`, `entropy: nan`) caused by learning rate too aggressive without activation offloading. Fixed:
  - `LEARNING_RATE`: `2e-4` → `5e-5`
  - `max_grad_norm`: `0.3` → `1.0`
  - Added `warmup_steps=100` to ramp LR gradually and prevent early gradient spikes
- `retrain.command` (liquid), `Retrain.command`, `spam-classifier-mlx/retrain.command` — menu displayed stale example counts (20,000) and time estimates (2.5-3.5 hrs) that didn't match actual dataset sizes. Corrected to actual counts (liquid/mlx full: ~16,000; mlx fast: ~6,800) and recalculated time estimates.
- `retrain.command` (liquid) — removed "activation offloading" from memory optimizations note since it is now disabled on MPS.
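The effect of the learning-rate, clipping, and warmup changes can be sketched in plain Python. This is an illustration only, assuming a linear warmup from zero (later decay is ignored) and global-norm clipping; the helper names `warmup_lr` and `clip_scale` are hypothetical and are not taken from `retrain_liquid.py` or TRL.

```python
import math

LEARNING_RATE = 5e-5   # lowered from 2e-4
MAX_GRAD_NORM = 1.0    # raised from 0.3
WARMUP_STEPS = 100


def warmup_lr(step: int, base_lr: float = LEARNING_RATE,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup: ramp from 0 to base_lr over warmup_steps, then hold."""
    if step >= warmup_steps:
        return base_lr
    return base_lr * step / warmup_steps


def clip_scale(grads: list[float], max_norm: float = MAX_GRAD_NORM) -> float:
    """Return the factor that global-norm clipping multiplies gradients by."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(total_norm):
        # Clipping cannot repair a NaN/inf norm, so this sketch fails loudly.
        raise ValueError("non-finite gradient norm")
    return min(1.0, max_norm / (total_norm + 1e-6))


if __name__ == "__main__":
    for step in (0, 25, 50, 100, 500):
        print(step, warmup_lr(step))
    print(clip_scale([3.0, 4.0]))  # norm 5.0 is scaled down to about 1.0
```

The early steps see only a fraction of the target learning rate, which is the window where the NaN losses described above tended to appear.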
Changed
- `spam-classifier-liquid/spam_classifier_liquid.ipynb` — added `torch_empty_cache_steps=50` and `dataloader_pin_memory=False` to notebook `SFTConfig` to match retrain script
[v0.5.3] - 2026-04-16 (GGUF rename to spam-classifier-F16.gguf)
Summary
Renamed the local and HuggingFace GGUF file to spam-classifier-F16.gguf so that
HuggingFace's model card parser can detect the quantization type (F16) and display
it in the GGUF variants widget. Updated all local file references accordingly.
Changed
- `spam-classifier.gguf` → `spam-classifier-F16.gguf` (local file rename)
- `VoltageVagabond/spam-classifier-liquid-GGUF` — deleted `spam-classifier.gguf`, uploaded `spam-classifier-F16.gguf`, updated README
- Updated all references in `StartServer.command`, `Retrain.command`, `merge_and_convert_gguf.py`, `verify_gguf_model.py`, both `Modelfile`s, `spam-classifier-liquid-GGUF/README.md`, and this changelog
[v0.5.2] - 2026-04-16 (GGUF system prompt patch + llama-server fixes)
Summary
Baked the spam classifier system prompt directly into the GGUF model file's
tokenizer.chat_template metadata so any client (llama.cpp, LM Studio, Ollama)
applies the correct behavior without manual configuration. Fixed two llama-server
startup bugs introduced by a brew update.
Changed
- `spam-classifier-F16.gguf` — patched `tokenizer.chat_template` to use `"You are an email spam classifier..."` as the default system prompt; done via raw binary rewrite of the GGUF metadata section (string grows by 118 bytes; tensor data section is untouched)
- `StartServer.command` — fixed `-fa on` (flag syntax changed in brew b8680; was bare `-fa`, now requires explicit `on`/`off`/`auto`)
- `StartServer.command` — added `--webui-config` with `systemMessage` and `temperature` so the llama.cpp Web UI pre-fills the system prompt automatically (the Web UI uses the raw `/completion` endpoint and does not apply the chat template on its own)
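Why a longer chat template grows the file by a fixed number of bytes can be sketched as follows. GGUF (v2 and later) stores a metadata string as a little-endian 64-bit byte length followed by UTF-8 bytes. The helper names below are hypothetical and this is not the script used for the patch; a real rewrite would also have to locate the key inside the metadata section and keep the tensor data aligned, which this sketch does not attempt.

```python
import struct


def encode_gguf_string(text: str) -> bytes:
    """Length-prefixed string: uint64 little-endian byte count + UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack("<Q", len(data)) + data


def decode_gguf_string(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Return the decoded string and the offset just past it."""
    (length,) = struct.unpack_from("<Q", buf, offset)
    start = offset + 8
    end = start + length
    return buf[start:end].decode("utf-8"), end


def replace_string(buf: bytes, offset: int, new_text: str) -> bytes:
    """Swap the string stored at offset, leaving all other bytes unchanged."""
    _, end = decode_gguf_string(buf, offset)
    return buf[:offset] + encode_gguf_string(new_text) + buf[end:]


if __name__ == "__main__":
    old = encode_gguf_string("short template") + b"REST-OF-FILE"
    new = replace_string(old, 0, "short template with a system prompt")
    print(len(new) - len(old))  # growth equals the extra UTF-8 bytes
```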
[v0.5.1] - 2026-04-16 (consolidated retrain script + memory optimizations)
Summary
Replaced the two separate retrain-fast.command and retrain-full.command scripts
with a single retrain.command that prompts for fast or full mode at launch.
Applied memory optimizations to retrain_liquid.py to reduce MPS GPU pressure
during training. Added a top-level Retrain.command pipeline script in the LLM
Project root that chains retrain → GGUF rebuild → HuggingFace upload.
Changed
- `retrain-fast.command` + `retrain-full.command` → replaced by single `retrain.command`
  - Double-click launches a menu: f) Fast (1-1.5 hrs) / u) Full (2.5-3.5 hrs) / q) Quit
  - Includes adapter swap prompt with backup logic (same as before, just unified)
  - Reminds user to run `Retrain.command` in LLM Project root for GGUF rebuild
Memory optimizations in retrain_liquid.py
- `activation_offloading=True` — offloads forward-pass activations from MPS to CPU RAM; frees ~25% MPS memory at ~15% speed cost (biggest knob for avoiding OOM)
- `torch_empty_cache_steps=50` — flushes the MPS memory pool every 50 optimizer steps; prevents memory fragmentation from causing OOM mid-run
- `optim="adamw_torch_fused"` — fused AdamW kernel; slightly faster and lower peak memory than unfused `adamw_torch`
- `dataloader_pin_memory=False` — pin_memory is a CUDA optimization that wastes memory on MPS; explicitly disabled
- Already enabled: `gradient_checkpointing=True`, `bf16=True`, `MAX_LENGTH=256`
Added (LLM Project root)
- `Retrain.command` — end-to-end pipeline: retrain → swap adapter → rebuild GGUF (clears `merged-liquid-full/` cache so new adapter is actually baked in) → upload adapter + GGUF to HuggingFace → remind to restart llama.cpp server
[v0.5.0] - 2026-04-16 (GGUF merged model + server commands)
Summary
Converted the trained LoRA adapter into a fully merged standalone GGUF file suitable for llama.cpp, Ollama, and LM Studio. Added StartServer.command and StopServer.command for launching the llama.cpp server locally. Uploaded the merged GGUF to a new HuggingFace repo with full platform instructions.
Added
- `merge_and_convert_gguf.py` — merges LoRA adapter into base model weights then converts to GGUF F16 using llama.cpp's convert_hf_to_gguf.py script
- `spam-classifier-F16.gguf` (~2.2 GB) — fully merged standalone GGUF; no separate base model or adapter file needed at runtime
- `StartServer.command` — double-click launcher for llama.cpp server with all Apple Silicon performance flags (-ngl 99, -fa, --mlock, 8-bit KV cache, perf-core thread pinning) and system prompt injected at startup
- `StopServer.command` — kills the server by PID file, falls back to port kill
- `Modelfile` — Ollama configuration with system prompt and temperature baked in; allows `ollama create spam-classifier -f Modelfile` for zero-config deployment
- `llama-server-config.json` — reference config showing all server flags
- `upload_adapter_to_root.py` — uploads adapter files to HF repo root (required by gguf-my-lora Space which expects adapter_config.json at root, not in subfolder)
- `upload_merged_gguf.py` — creates and uploads to VoltageVagabond/spam-classifier-liquid-GGUF
- `upload_gguf_readme.py` — uploads README with per-platform usage instructions
- `verify_gguf_model.py` — tests the GGUF against real test set examples using llama-cpp-python; confirms fine-tuning is active, not just base model behavior
New HuggingFace repo
- `VoltageVagabond/spam-classifier-liquid-GGUF` — merged F16 GGUF with Modelfile, README covering Ollama / LM Studio / llama.cpp server / llama.cpp CLI usage, and educational disclaimer for senior project context
docs
- `docs/08-gguf-conversion-guide.md` — full guide: Option A (gguf-my-lora Space), Option B (local merge + convert), troubleshooting section covering every error encountered (wrong Space, nested subfolder, too many requests, redirect loop, adapter-only GGUF failing to load)
- `docs/README.md` — added guide 8 to table of contents
Key lessons documented
- `gguf-my-lora` Space produces an adapter-only GGUF (~8.6 MB), NOT a standalone model — this causes "failed to load" errors in Ollama/LM Studio without `--lora`
- The system prompt must match training format exactly or the model falls back to base LFM2.5 general-assistant behavior
- GGUF format cannot embed a system prompt in weights — Modelfile (Ollama) is the closest "set it and forget it" workaround for end users
[v0.4.9] - 2026-04-16 (GGUF conversion guide + adapter repo root upload)
Summary
Documented how to convert the trained LoRA adapter to GGUF format so it can be used with llama.cpp, Ollama, and LM Studio. Also fixed the HuggingFace model repo so that adapter files are at the root level (required by the gguf-my-lora Space).
Added
- `docs/08-gguf-conversion-guide.md` — step-by-step guide covering two conversion paths (Option A: gguf-my-lora Space in browser; Option B: merge locally then convert with llama.cpp), plus a full troubleshooting section for every error encountered
- `upload_adapter_to_root.py` (project root) — helper script that uploads `adapter_config.json`, `adapter_model.safetensors`, `tokenizer_config.json`, `tokenizer.json`, and `chat_template.jinja` to the root of the `VoltageVagabond/spam-classifier-liquid` HF repo (the gguf-my-lora Space requires `adapter_config.json` at root, not inside an `adapters/` subfolder)
- `docs/README.md` updated to include guide 8 in the table of contents
Issues encountered and fixed (documented in guide 8)
- `gguf-my-repo` Space gave "no model_type in config.json" — wrong Space; LoRA adapters need `gguf-my-lora`, not `gguf-my-repo`
- `gguf-my-lora` gave "adapter_config.json not found" — adapter files were nested in `adapters/` subfolder on HF, not at repo root; fixed by uploading to root
- `gguf-my-repo` showed "too many requests" — Space has 1,900+ likes and gets heavy traffic; workaround is to duplicate the Space to your own account
- HuggingFace sign-in redirect loop — caused by stale cookies; fixed by clearing cookies or using incognito window
[v0.4.8] - 2026-04-14 (8-bit KV cache quantization)
Summary
Enable 8-bit quantization for the KV cache at inference time to reduce memory usage without changing model weights or training.
What changed in app.py
- `model.generate()` now passes `cache_implementation="quantized"` and `cache_config={"backend": "hqq", "nbits": 8}`, quantizing both the key and value cache to 8-bit during generation
- Used the `hqq` backend (recommended for int8; `quanto` only supports int2/int4)
- Model weights remain at BF16; only the runtime KV cache is affected
What changed in requirements.txt
- Added `hqq>=0.2.0` — required package for the HQQ quantization backend
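To give a feel for what storing cache values in 8 bits trades off, here is a minimal sketch of plain min-max (affine) quantization. This is deliberately not the HQQ algorithm, which chooses its quantization parameters differently; the function names are hypothetical and the sketch only shows the storage idea of one byte per value plus a scale and an offset.

```python
def quantize_8bit(values: list[float]) -> tuple[bytes, float, float]:
    """Map floats onto integer codes 0..255 using the range of the input."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant input
    codes = bytes(round((v - lo) / scale) for v in values)
    return codes, scale, lo


def dequantize_8bit(codes: bytes, scale: float, lo: float) -> list[float]:
    """Recover approximate floats from the stored codes."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    cache_slice = [-1.0, -0.25, 0.0, 0.5, 1.0]
    codes, scale, lo = quantize_8bit(cache_slice)
    restored = dequantize_8bit(codes, scale, lo)
    worst = max(abs(a - b) for a, b in zip(cache_slice, restored))
    print(len(codes), "bytes, worst error", worst)
```

Because BF16 uses two bytes per value, one byte per cached value roughly halves the cache size, ignoring the small overhead of the scales and offsets.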
[v0.4.7] - 2026-04-14 (Documentation sync with fine_tune.py)
Summary
Audit pass to bring README.md in line with fine_tune.py. The README had a
stale "Binary classification only" limitation note (3-class has been live
since v0.4.0) and an out-of-date batch size, plus it was still quoting the
pre-optimization training time.
Changes to README.md
- Training Details table:
  - Batch size `4` → `1 (effective 4 with gradient accumulation steps = 4)` to match `BATCH_SIZE = 1` and `GRADIENT_ACCUMULATION_STEPS = 4` in fine_tune.py (lines 72-73)
  - Added explicit rows for Max sequence length (256), Optimizer (`adamw_torch`), Weight dtype (bfloat16), Device (MPS), and Max gradient norm (0.3) to match the code
  - Training time `~2–2.5 hours` → `~1–1.5 hours` to match the in-code comment on line 241, with a note that the older figure reflected the pre-v0.4.3 config
- Limitations: "Binary classification only" note replaced with "Three-class classification (SPAM / HAM / PHISHING) as of v0.4.0"
Rationale
fine_tune.py is the source of truth. Values read from the file:
LORA_RANK = 8 (line 53)
LORA_ALPHA = 16 (line 54)
LORA_DROPOUT = 0.1 (line 55)
LORA_TARGET_MODULES = 8 (lines 56-68; q/k/v/out_proj, w1/w2/w3, in_proj)
NUM_EPOCHS = 3 (line 71)
BATCH_SIZE = 1 (line 72)
GRADIENT_ACCUMULATION_STEPS= 4 (line 73)
LEARNING_RATE = 2e-4 (line 74)
MAX_LENGTH = 256 (line 75)
optim = "adamw_torch" (line 226)
torch_dtype = bfloat16 (line 167)
device_map = "mps" (line 166)
max_grad_norm = 0.3 (comment / training args)
Training time comment = "~1-1.5 hours" (line 241)
No code changes, no retraining in this release.
[v0.4.6] - 2026-04-14 (HF Spaces deployment fixes)
Summary
Got the liquid Space (VoltageVagabond/spam-classifier-liquid) running on HF after
several iterations diagnosing adapter download failures.
Q&A from this session
Q: Why does the Space log say Adapters not found at /app/adapters when the local
app works fine?
A: The local adapters/ directory is git-ignored and never uploaded to the Space
(too large + the upload script explicitly excludes it). On HF Spaces the directory
doesn't exist, so the app falls through to the "no adapter" code path.
Q: How was that fixed?
A: Added a snapshot_download fallback in app.py: if local adapters are missing,
download them from the VoltageVagabond/spam-classifier-liquid model repo at startup.
Q: First attempt got 401 Repository Not Found. Why?
A: The model repo was set to private and the Space had no HF_TOKEN secret.
The Space container runs anonymously by default, so it couldn't authenticate.
Fix: made the model repo public (no token needed). Alternative: keep private and
add HF_TOKEN as a Space repository secret with read scope.
Q: Next error: Can't find 'adapter_config.json' at '/root/.cache/.../snapshots/...'. Why?
A: The model repo doesn't store adapter files at the root — they're nested under
adapters_fast/, adapters_full/, adapters_backup/. The download succeeded but
PeftModel.from_pretrained looked at the snapshot root and couldn't find
adapter_config.json. Fix: use allow_patterns=["adapters_fast/*"] and set
ADAPTER_PATH = snapshot_path / "adapters_fast" so PEFT loads from the right subdir.
Q: Why is classification slow on HF but fast locally?
A: HF free tier (cpu-basic) is 2 vCPUs, 16 GB RAM, no GPU. Local Mac uses Apple
Silicon Metal/MPS acceleration. A 1.2B-param transformer on CPU is just slow.
Realistic speedups (high → low impact):
- Upgrade Space to a T4 GPU (~$0.40/hr, only billed when running)
- 4-bit quantization via `bitsandbytes` (~2-3× faster on CPU)
- Reduce `max_tokens` from 750 → ~100 (you only need SPAM/HAM)
- `model.merge_and_unload()` — bake LoRA into base model, removes per-call overhead
- Switch to GGUF + llama-cpp-python — significantly faster than HF transformers on CPU
Q: Why does the model repo need to be public for the Space to work?
A: The Space container runs anonymously. Public repo = anonymous downloads work.
Private repo = need an authenticated HF_TOKEN secret in the Space settings.
The Space being public/private is independent — that controls who can view the
demo, not what the container can fetch.
Changes
- `app.py` — added `snapshot_download` fallback that pulls from the HF model repo when local adapters are missing
- `app.py` — passes `os.environ.get("HF_TOKEN")` to `snapshot_download` so the same code path works for both public and private model repos
- `app.py` — `allow_patterns=["adapters_fast/*"]` and `ADAPTER_PATH` now points at the `adapters_fast/` subdirectory inside the downloaded snapshot
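The fallback described in this release might look roughly like the sketch below. It assumes "local adapters are missing" means there is no `adapter_config.json` in the local directory; the function name `resolve_adapter_path` and the injectable `downloader` parameter are hypothetical and exist only so the logic can be exercised without network access. The `repo_id`, `allow_patterns`, and `token` arguments are real `snapshot_download` parameters.

```python
import os
from pathlib import Path

MODEL_REPO = "VoltageVagabond/spam-classifier-liquid"
ADAPTER_SUBDIR = "adapters_fast"


def resolve_adapter_path(local_dir: Path, downloader=None) -> Path:
    """Return a directory that should contain adapter_config.json."""
    if (local_dir / "adapter_config.json").exists():
        return local_dir  # local run: adapters are already on disk

    if downloader is None:
        # Imported lazily so local runs do not need huggingface_hub.
        from huggingface_hub import snapshot_download as downloader

    snapshot_path = Path(
        downloader(
            repo_id=MODEL_REPO,
            allow_patterns=[f"{ADAPTER_SUBDIR}/*"],  # skip the other adapter sets
            token=os.environ.get("HF_TOKEN"),        # None is fine for public repos
        )
    )
    # PEFT must be pointed at the subdirectory, not the snapshot root.
    return snapshot_path / ADAPTER_SUBDIR
```

Returning the subdirectory rather than the snapshot root is the step that addresses the "Can't find 'adapter_config.json'" error from the Q&A above.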
[v0.4.5] - 2026-04-14
Beginner-Code Compliance — app.py
Refactored app.py to match the beginner-friendly coding style used in course lecture notebooks.
What changed:
- Replaced 3 lambda functions in Gradio event handlers with named functions (`make_example_handler`, `clear_input`)
- Replaced ternary operator for emoji selection with explicit `if`/`else` block
- No behavior changes — all Gradio event wiring, feedback logging, and chat logic unchanged
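A minimal sketch of this style of refactor is shown below. Only the names `make_example_handler` and `clear_input` come from the changelog; their bodies, the `rating_emoji` helper, and its argument are assumptions made for illustration.

```python
def make_example_handler(example_text: str):
    """Return a named callback that fills the input box with example_text."""
    def fill_example() -> str:
        return example_text
    return fill_example


def clear_input() -> str:
    """Named replacement for a lambda that returns an empty string."""
    return ""


def rating_emoji(is_positive: bool) -> str:
    """Hypothetical helper: explicit if/else instead of a ternary."""
    if is_positive:
        emoji = "👍"
    else:
        emoji = "👎"
    return emoji


if __name__ == "__main__":
    examples = ["You won a free cruise!", "Meeting moved to 3pm"]
    handlers = [make_example_handler(text) for text in examples]
    print(handlers[0](), "|", handlers[1](), "|", repr(clear_input()))
```

A factory function also avoids the late-binding pitfall of defining lambdas inside a loop, since each call captures its own `example_text`.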
[v0.4.4] - 2026-04-14
Chat App Upgrade — app.py
Replaced the two-tab Gradio app (Classify + Chat) with a polished chat-only interface.
What changed:
- Removed the Classify tab entirely — chat is now the full interface
- Added HTML topbar with project title, model name, and badge pills (matches XAI project style)
- Added clickable example prompt buttons (spam, ham, phishing) that populate the input
- Added 👍 / 👎 feedback buttons that log to `data/feedback/feedback_log.csv`
  - CSV columns: `timestamp`, `user_input`, `model_response`, `rating`
  - Feedback status resets after each new submission
- Increased `max_tokens` from 500 → 750 to reduce mid-sentence cutoffs
- Fixed Gradio 6 compatibility: `theme`/`css` moved to `launch()`, `gr.Chatbot` returns full history list
- Paths anchored to `Path(__file__).parent` so the app works from any launch directory
- Updated `Dockerfile`: consolidated to install deps from `requirements.txt`, removed redundant pip install lines
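The feedback log described above could be written along these lines. The four column names come from the changelog; the function name `log_feedback`, the UTC ISO timestamp format, and the `"up"`/`"down"` rating values are assumptions for this sketch. In the app the log path would be anchored to `Path(__file__).parent`; here it is passed in as a parameter.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = ["timestamp", "user_input", "model_response", "rating"]


def log_feedback(log_path: Path, user_input: str,
                 model_response: str, rating: str) -> None:
    """Append one feedback row, writing the header if the file is new."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not log_path.exists()
    # newline="" lets the csv module handle line endings and quoted newlines.
    with log_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(COLUMNS)
        timestamp = datetime.now(timezone.utc).isoformat()
        writer.writerow([timestamp, user_input, model_response, rating])
```

Using the `csv` module rather than manual string joins matters here because email text routinely contains commas, quotes, and line breaks.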
[v0.4.3] - 2026-04-07
Memory & Speed Optimization — fine_tune.py
Reduced peak memory usage from ~50 GB to a target of ~8–14 GB by changing five training parameters. No change to model architecture or LoRA adapter structure — accuracy is unaffected.
| Parameter | Before | After | Why |
|---|---|---|---|
| `BATCH_SIZE` | 4 | 1 | Smaller batch = 4× less activation memory per step |
| `GRADIENT_ACCUMULATION_STEPS` | 1 | 4 | Keeps effective batch size at 4 so training dynamics are unchanged |
| `MAX_LENGTH` | 512 | 256 | Attention memory scales O(n²) with sequence length — halving it cuts ~4× attention memory; spam emails rarely exceed 256 tokens |
| `optim` | `adamw` (default) | `adamw_8bit` | Adam optimizer normally stores 2 full float32 copies of every parameter for momentum tracking |
| `torch_dtype` | `"auto"` | `torch.bfloat16` | Forces model weights to load in bfloat16 (2 bytes/param) instead of float32 (4 bytes/param), halving weight memory; bfloat16 has the same exponent range as float32 so training stability is preserved |
| `device_map` | `"auto"` | `"mps"` | Pins all layers to the MPS GPU; "auto" can spill layers to CPU causing slow cross-device copies and inflated memory readings |
| `gradient_checkpointing_kwargs` | not set | `{"use_reentrant": False}` | Suppresses deprecation warning on newer PyTorch; no behavior change |
| `max_grad_norm` | not set | 0.3 | Clips gradient norms to prevent occasional instability spikes during training |
Why quality is unaffected:
- 8-bit Adam was validated by Dettmers et al. (2022) to match full-precision Adam loss curves on LLM fine-tuning
- bfloat16 was designed specifically for training — same exponent range as float32, just less mantissa precision
- Effective batch size (1 × 4 accumulation = 4) is identical to the original (4 × 1)
- 256 tokens covers the vast majority of spam/ham emails in this dataset
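The arithmetic behind these savings can be checked with a few lines. This is a back-of-the-envelope sketch, assuming the 1.2B parameter count mentioned elsewhere in this changelog; it counts only raw weight storage and the relative size of the attention score matrix, not the full training footprint.

```python
PARAMS = 1.2e9  # model size quoted elsewhere in this changelog


def weight_gb(params: float, bytes_per_param: int) -> float:
    """Raw weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


fp32_gb = weight_gb(PARAMS, 4)   # float32: 4 bytes per parameter
bf16_gb = weight_gb(PARAMS, 2)   # bfloat16: 2 bytes per parameter

# Attention scores form an n x n matrix, so memory scales with n squared.
attention_ratio = (512 ** 2) / (256 ** 2)

# Effective batch size = per-device batch size x accumulation steps.
effective_before = 4 * 1
effective_after = 1 * 4

if __name__ == "__main__":
    print(f"weights: {fp32_gb:.1f} GB -> {bf16_gb:.1f} GB")
    print(f"attention memory ratio: {attention_ratio:.0f}x")
    print(f"effective batch: {effective_before} -> {effective_after}")
```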
[v0.4.2] - 2026-04-07
Updated — Training Data Pipeline
- Added puyang2025/seven-phishing-email-datasets and zefang-liu/phishing-email-dataset as additional sources in `build_liquid_datasets.py` — parquets generated by the spam-xai-project sibling and shared across all three classifier projects
- Updated data counts in `retrain-fast.command` and `retrain-full.command` to reflect new ~190K source pool
[v0.4.1] - 2026-03-28
Retrain Commands with Adapter Swap
- `retrain-fast.command` and `retrain-full.command` now prompt after training to swap the new adapter as the default
- Selecting "y" backs up `adapters/` to `adapters_backup/` and copies the new adapter in
- App and notebook automatically use whichever adapter is in `adapters/`
- Old `retrain.command` (2-class, 4K examples) removed — replaced by fast/full versions
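The backup-then-swap step can be sketched in Python as below. The real logic lives in the `.command` shell scripts, so this is only an illustration; the function name `swap_adapter` is hypothetical, and overwriting any previous backup is an assumption rather than documented behavior.

```python
import shutil
from pathlib import Path


def swap_adapter(new_adapter: Path, active: Path, backup: Path) -> None:
    """Back up the active adapter directory, then copy the new one in."""
    if not new_adapter.is_dir():
        raise FileNotFoundError(f"no trained adapter at {new_adapter}")

    if active.exists():
        if backup.exists():
            shutil.rmtree(backup)       # assumption: keep only the latest backup
        shutil.copytree(active, backup)
        shutil.rmtree(active)

    shutil.copytree(new_adapter, active)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, text in (("adapters", "old"), ("adapters_fast", "new")):
            (root / name).mkdir()
            (root / name / "adapter_config.json").write_text(text)
        swap_adapter(root / "adapters_fast", root / "adapters",
                     root / "adapters_backup")
        print((root / "adapters" / "adapter_config.json").read_text())
```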
[v0.4.0] - 2026-03-28
Added — 3-Class Training Data + HuggingFace Upload
- NEW: Phishing detection — model can now classify as SPAM, HAM, or PHISHING (previously binary only)
- Prepared two new training datasets from 5 combined sources:
  - FAST (8,000 examples): ~1 hr retrain — `new_training_data/liquid_fast/`
  - FULL (20,000 examples): ~3 hr retrain — `new_training_data/liquid_full/`
- Data sources: existing 4K FaroukMoc2 + locuoco 250K (HF) + ealvaradob phishing (HF) + luongnv89 phishing with reasoning (HF) + Enron
- Added `retrain_liquid.py` script with `--mode fast` and `--mode full` (saves to `adapters_fast/` or `adapters_full/`)
- Uploaded project to HuggingFace: `VoltageVagabond/spam-classifier-liquid` (model repo)
- Created HuggingFace Space: `VoltageVagabond/spam-classifier-liquid-space` (Docker + Gradio demo)
- Created `README.md` with HF model card metadata and `Dockerfile` for HF Space
- Uploaded complete dataset to HF: `VoltageVagabond/spam-email-dataset` with all raw sources
[v0.3.2] - 2026-03-28
Fixed
- Fixed `ValueError: train_dataset is required` crash during evaluation step — SFTTrainer requires `train_dataset` even for eval-only usage
Added
- `--eval-only` flag for `fine_tune.py` — loads saved adapter and runs evaluation + generation test without retraining (~minutes instead of ~2 hours)
- `evaluate.command` — double-click launcher for eval-only mode
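One way such a flag can be wired up is sketched below with `argparse`. Only the flag name `--eval-only` comes from the changelog; `parse_args`, `plan_steps`, and the step names are hypothetical placeholders for whatever `fine_tune.py` actually does.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="LoRA fine-tuning (sketch)")
    parser.add_argument(
        "--eval-only",
        action="store_true",
        help="load the saved adapter and evaluate without retraining",
    )
    return parser.parse_args(argv)


def plan_steps(eval_only: bool) -> list[str]:
    """Return the ordered steps a run would perform."""
    # The training set is loaded in both modes because the trainer
    # requires a train_dataset even when only evaluating.
    steps = ["load_model", "load_datasets"]
    if eval_only:
        steps.append("load_saved_adapter")
    else:
        steps.append("train")
    steps.extend(["evaluate", "generation_test"])
    return steps


if __name__ == "__main__":
    args = parse_args()
    print(plan_steps(args.eval_only))
```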
[v0.3.1] - 2026-03-27
Updated
- Corrected training time estimates across all files:
- Notebook (1 epoch): ~45 minutes on Apple Silicon
- fine_tune.py (3 epochs): ~2-2.5 hours on Apple Silicon
- Slowdown vs v0.2.0 due to targeting 8 module types instead of 4 (better quality, more compute per step)
- Fixed training data counts in setup guide (3,200 train / 800 test, not 500/100)
- Added training time comparison table to training guide
- Added batch size 4 saturation note to tuning tips
- Added `docs/07-code-sources-reference.md` — every source, citation, and empirical finding for paper writing
[v0.3.0] - 2026-03-27
Changed — LoRA config aligned with Liquid AI official cookbook
- Source: Liquid4All/cookbook
- Target modules expanded from 4 (attention only) to 8 (attention + GLU + conv):
  - Attention: `q_proj`, `k_proj`, `v_proj`, `out_proj`
  - Feed-forward GLU: `w1`, `w2`, `w3`
  - Conv: `in_proj`
- LoRA rank 32 → 8, alpha 64 → 16 (matching cookbook values)
- Dropout 0.05 → 0.1 (matching cookbook)
- Fixed `o_proj` → `out_proj` (correct layer name for LFM2 architecture)
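The new settings can be summarized as a plain dictionary, shown here as a sketch rather than the actual contents of `fine_tune.py`. The key names follow PEFT's `LoraConfig` fields; `lora_scaling` is a hypothetical helper that applies the standard LoRA scaling factor of alpha divided by rank.

```python
LORA_SETTINGS = {
    "r": 8,                 # was 32
    "lora_alpha": 16,       # was 64
    "lora_dropout": 0.1,    # was 0.05
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "out_proj",  # attention
        "w1", "w2", "w3",                          # feed-forward GLU
        "in_proj",                                 # conv
    ],
}


def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update."""
    return alpha / rank


if __name__ == "__main__":
    print(lora_scaling(64, 32), lora_scaling(16, 8))
    print(len(LORA_SETTINGS["target_modules"]), "target modules")
```

Both the old and new rank/alpha pairs give the same scaling factor, so the change shrinks adapter capacity without changing how strongly the update is weighted.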
[v0.2.1] - 2026-03-27
Note
- Verified Liquid AI version does NOT have the orphaned port issue that affected the MLX version
- PyTorch loads the model directly into the Python process — no child servers spawned
- When the app exits, all model memory is freed automatically
- No cleanup trap needed (unlike MLX version which spawns llama-server processes)
[v0.2.0] - 2026-03-27
Changed
- Increased batch size from 1 to 4 for faster training (parallel processing on MPS)
- Increased LoRA rank from 16 to 32 (and alpha from 32 to 64) for better adapter quality
- Removed gradient accumulation (not needed with batch size 4)
- Memory usage ~7-8 GB (comfortable on 24 GB Apple Silicon)
Tested and reverted
- Batch size 8 tested — MPS GPU saturates at batch size 4, no speed gain beyond that. Steps halved but each step took 2x longer. Batch size 4 is the sweet spot for Apple Silicon.
[v0.1.1] - 2026-03-27
Fixed
- Renamed `max_seq_length` to `max_length` in fine_tune.py, notebook, and docs for TRL v0.29 compatibility
- Fixed `launch-notebook.command` not showing Jupyter install errors
- Added model loading time note (30-60 seconds) to `launch UI.command`
[v0.1.0] - 2026-03-27
Added
- Project scaffolding (requirements and gitignore)
- Training data copied from MLX sibling project
- `fine_tune.py` — LoRA fine-tuning via TRL SFTTrainer (Liquid AI's official method)
- `app.py` — Gradio web UI with Classify and Chat tabs
- `.command` launcher scripts for macOS
- Beginner-friendly documentation (6 guides)
- Interactive Jupyter notebook walkthrough