# Changelog

All notable changes to the Liquid AI Spam Classifier project.

## [v0.5.9] - 2026-04-16 (retrain script fixes: NaN gradients + accurate example counts)

### Summary

Fixed gradient explosion (NaN loss) that crashed full retrains on Apple Silicon, and
corrected stale example counts and time estimates across all retrain command files.

### Fixed

- `retrain_liquid.py` — `ACTIVATION_OFFLOADING = True` caused `AssertionError: Torch not
  compiled with CUDA enabled`; TRL's `OffloadActivations` is CUDA-only and crashes on MPS.
  Set to `False`.
- `retrain_liquid.py` — gradient explosion (`loss: 0`, `grad_norm: nan`, `entropy: nan`),
  attributed to a learning rate that was too aggressive once activation offloading was
  disabled. Fixed:
  - `LEARNING_RATE`: `2e-4` → `5e-5`
  - `max_grad_norm`: `0.3` → `1.0`
  - Added `warmup_steps=100` to ramp the LR up gradually and prevent early gradient spikes
- `retrain.command` (liquid), `Retrain.command`, `spam-classifier-mlx/retrain.command` —
  the menu displayed stale example counts (~20,000) and time estimates (~2.5-3.5 hrs) that
  didn't match actual dataset sizes. Corrected to actual counts (liquid/mlx full: ~16,000;
  mlx fast: ~6,800) and recalculated time estimates.
- `retrain.command` (liquid) — removed "activation offloading" from the memory optimizations
  note since it is now disabled on MPS.
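
The two training-side fixes above can be illustrated without any ML framework. The sketch below shows two things. The first is a linear warmup-then-decay learning-rate schedule, the shape of the `linear` scheduler that Hugging Face `TrainingArguments` uses by default, here with the new `5e-5` peak and `warmup_steps=100`. The second is clipping by global gradient norm, the operation that `max_grad_norm` controls. This is an illustration, not the code in `retrain_liquid.py`. The helper names and `total_steps=2000` are hypothetical.

```python
import math


def linear_warmup_lr(step: int, base_lr: float = 5e-5,
                     warmup_steps: int = 100, total_steps: int = 2000) -> float:
    """Linear warmup to base_lr, then linear decay to 0 (hypothetical helper)."""
    if step < warmup_steps:
        # Ramp up from 0 so the first updates stay small
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def clip_by_global_norm(grads: list[float], max_norm: float = 1.0) -> list[float]:
    """Scale all gradients together if their combined L2 norm exceeds max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if math.isnan(total_norm):
        # Clipping cannot repair NaNs: scaling a NaN still yields NaN
        return grads
    if total_norm <= max_norm:
        return grads
    scale = max_norm / (total_norm + 1e-6)
    return [g * scale for g in grads]


if __name__ == "__main__":
    for step in (0, 50, 100, 1050, 2000):
        print(step, linear_warmup_lr(step))
    print(clip_by_global_norm([3.0, 4.0]))  # norm 5.0 -> scaled to norm ~1.0
```

Clipping only rescales large but finite gradients. A NaN norm passes straight through, so clipping alone cannot recover a run once `grad_norm: nan` appears.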

### Changed

- `spam-classifier-liquid/spam_classifier_liquid.ipynb` — added `torch_empty_cache_steps=50`
  and `dataloader_pin_memory=False` to the notebook `SFTConfig` to match the retrain script

---

## [v0.5.3] - 2026-04-16 (GGUF rename to spam-classifier-F16.gguf)

### Summary

Renamed the local and HuggingFace GGUF file to `spam-classifier-F16.gguf` so that
HuggingFace's model card parser can detect the quantization type (F16) and display
it in the GGUF variants widget. Updated all local file references accordingly.

### Changed

- `spam-classifier.gguf` → `spam-classifier-F16.gguf` (local file rename)
- `VoltageVagabond/spam-classifier-liquid-GGUF` — deleted `spam-classifier.gguf`,
  uploaded `spam-classifier-F16.gguf`, updated README
- Updated all references in `StartServer.command`, `Retrain.command`,
  `merge_and_convert_gguf.py`, `verify_gguf_model.py`, both `Modelfile`s,
  `spam-classifier-liquid-GGUF/README.md`, and this changelog

---

## [v0.5.2] - 2026-04-16 (GGUF system prompt patch + llama-server fixes)

### Summary

Baked the spam classifier system prompt directly into the GGUF model file's
`tokenizer.chat_template` metadata so any client (llama.cpp, LM Studio, Ollama)
applies the correct behavior without manual configuration. Fixed two llama-server
startup bugs introduced by a brew update.

### Changed

- `spam-classifier-F16.gguf` — patched `tokenizer.chat_template` to use
  `"You are an email spam classifier..."` as the default system prompt; done via a raw
  binary rewrite of the GGUF metadata section (the string grows by 118 bytes; the tensor
  data section is untouched)
- `StartServer.command` — changed the flash-attention flag to `-fa on` (flag syntax changed
  in brew b8680; was bare `-fa`, now requires an explicit `on`/`off`/`auto`)
- `StartServer.command` — added `--webui-config` with `systemMessage` and
  `temperature` so the llama.cpp Web UI pre-fills the system prompt automatically
  (the Web UI uses the raw `/completion` endpoint and does not apply the chat
  template on its own)
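
The claim that the tensor data is untouched follows from the GGUF file layout. A GGUF file has a fixed header, then metadata key/value pairs, then tensor descriptors. The descriptors' data offsets are relative to the start of the aligned tensor-data section. Growing a metadata string therefore shifts where that section begins, but not the offsets inside it, as long as the alignment padding is recomputed.

The sketch below parses only the fixed header (v2/v3 layout, little-endian) and shows the alignment arithmetic using the spec's default `general.alignment` of 32. It is not the patch script used for this release, and the byte positions in the example are made up.

```python
import struct

GGUF_MAGIC = b"GGUF"
DEFAULT_ALIGNMENT = 32  # GGUF default when general.alignment is absent


def read_gguf_header(buf: bytes) -> dict:
    """Parse the fixed GGUF header: magic, version, tensor count, metadata KV count."""
    if buf[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    # '<' = little-endian, no padding; I = uint32, Q = uint64 (v2+ layout)
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", buf, 4)
    return {"version": version, "tensor_count": tensor_count, "kv_count": kv_count}


def align_offset(offset: int, alignment: int = DEFAULT_ALIGNMENT) -> int:
    """Round offset up to the next multiple of alignment."""
    return offset + (-offset) % alignment


if __name__ == "__main__":
    fake = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
    print(read_gguf_header(fake))
    # Hypothetical: tensor descriptors end at byte 1000 before the patch
    old_start = align_offset(1000)
    new_start = align_offset(1000 + 118)  # metadata string grew by 118 bytes
    print(old_start, new_start)  # per-tensor offsets stay valid relative to these
```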

---

## [v0.5.1] - 2026-04-16 (consolidated retrain script + memory optimizations)

### Summary

Replaced the two separate `retrain-fast.command` and `retrain-full.command` scripts
with a single `retrain.command` that prompts for fast or full mode at launch.
Applied memory optimizations to `retrain_liquid.py` to reduce MPS GPU pressure
during training. Added a top-level `Retrain.command` pipeline script in the LLM
Project root that chains retrain → GGUF rebuild → HuggingFace upload.

### Changed

- `retrain-fast.command` + `retrain-full.command` → replaced by single `retrain.command`
  - Double-click launches a menu: f) Fast (~1-1.5 hrs) / u) Full (~2.5-3.5 hrs) / q) Quit
  - Includes adapter swap prompt with backup logic (same as before, just unified)
  - Reminds user to run `Retrain.command` in LLM Project root for GGUF rebuild

### Memory optimizations in `retrain_liquid.py`

- `activation_offloading=True` — offloads forward-pass activations from MPS to CPU
  RAM; frees ~25% MPS memory at ~15% speed cost (biggest knob for avoiding OOM)
- `torch_empty_cache_steps=50` — flushes the MPS memory pool every 50 optimizer
  steps; prevents memory fragmentation from causing OOM mid-run
- `optim="adamw_torch_fused"` — fused AdamW kernel; slightly faster and lower peak
  memory than unfused `adamw_torch`
- `dataloader_pin_memory=False` — pin_memory is a CUDA optimization that wastes
  memory on MPS; explicitly disabled
- Already enabled: `gradient_checkpointing=True`, `bf16=True`, `MAX_LENGTH=256`

### Added (LLM Project root)

- `Retrain.command` — end-to-end pipeline: retrain → swap adapter → rebuild GGUF
  (clears `merged-liquid-full/` cache so new adapter is actually baked in) →
  upload adapter + GGUF to HuggingFace → remind to restart llama.cpp server

---

## [v0.5.0] - 2026-04-16 (GGUF merged model + server commands)

### Summary

Converted the trained LoRA adapter into a fully merged standalone GGUF file
suitable for llama.cpp, Ollama, and LM Studio. Added StartServer.command and
StopServer.command for launching the llama.cpp server locally. Uploaded the
merged GGUF to a new HuggingFace repo with full platform instructions.

### Added

- `merge_and_convert_gguf.py` — merges LoRA adapter into base model weights
  then converts to GGUF F16 using llama.cpp's `convert_hf_to_gguf.py` script
- `spam-classifier-F16.gguf` (~2.2 GB) — fully merged standalone GGUF;
  no separate base model or adapter file needed at runtime
- `StartServer.command` — double-click launcher for llama.cpp server with all
  Apple Silicon performance flags (`-ngl 99`, `-fa`, `--mlock`, 8-bit KV cache,
  perf-core thread pinning) and system prompt injected at startup
- `StopServer.command` — kills the server by PID file, falls back to port kill
- `Modelfile` — Ollama configuration with system prompt and temperature baked in;
  allows `ollama create spam-classifier -f Modelfile` for zero-config deployment
- `llama-server-config.json` — reference config showing all server flags
- `upload_adapter_to_root.py` — uploads adapter files to HF repo root (required
  by gguf-my-lora Space which expects `adapter_config.json` at root, not in subfolder)
- `upload_merged_gguf.py` — creates and uploads to `VoltageVagabond/spam-classifier-liquid-GGUF`
- `upload_gguf_readme.py` — uploads README with per-platform usage instructions
- `verify_gguf_model.py` — tests the GGUF against real test set examples using
  llama-cpp-python; confirms fine-tuning is active, not just base model behavior
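
The server flags mentioned in v0.5.0 and v0.5.2 can be sketched as a Python argument list for `llama-server`. This is a hypothetical reconstruction, not the contents of `StartServer.command`. The sketch makes the following assumptions:

- The 8-bit KV cache is expressed with llama.cpp's `--cache-type-k` / `--cache-type-v q8_0` options.
- The port value is chosen for illustration.
- The perf-core thread pinning and the `--webui-config` payload are left out because their exact values are not given here.

Newer builds take an explicit value for `-fa`. llama.cpp has also historically required flash attention to be enabled for a quantized V cache.

```python
import shlex
import subprocess


def build_server_cmd(model_path: str, port: int = 8080) -> list[str]:
    """Assemble a llama-server command line (hypothetical sketch)."""
    return [
        "llama-server",
        "-m", model_path,
        "-ngl", "99",              # offload up to 99 layers (effectively all) to the GPU
        "-fa", "on",               # newer builds need an explicit on/off/auto value
        "--mlock",                 # keep the model resident in RAM instead of swapping
        "--cache-type-k", "q8_0",  # 8-bit K cache
        "--cache-type-v", "q8_0",  # 8-bit V cache (relies on flash attention)
        "--port", str(port),       # assumed port
    ]


def start_server(model_path: str) -> subprocess.Popen:
    """Launch the server in the background and return the process handle."""
    return subprocess.Popen(build_server_cmd(model_path))


if __name__ == "__main__":
    print(shlex.join(build_server_cmd("spam-classifier-F16.gguf")))
```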

### New HuggingFace repo

- `VoltageVagabond/spam-classifier-liquid-GGUF` — merged F16 GGUF with Modelfile,
  README covering Ollama / LM Studio / llama.cpp server / llama.cpp CLI usage,
  and educational disclaimer for senior project context

### Docs

- `docs/08-gguf-conversion-guide.md` — full guide: Option A (gguf-my-lora Space),
  Option B (local merge + convert), troubleshooting section covering every error
  encountered (wrong Space, nested subfolder, too many requests, redirect loop,
  adapter-only GGUF failing to load)
- `docs/README.md` — added guide 8 to table of contents

### Key lessons documented

- `gguf-my-lora` Space produces an adapter-only GGUF (~8.6 MB), NOT a standalone
  model — this causes "failed to load" errors in Ollama/LM Studio when it is loaded on its
  own instead of being applied with `--lora` on top of the base model
- The system prompt must match training format exactly or the model falls back to
  base LFM2.5 general-assistant behavior
- GGUF format cannot embed a system prompt in weights — Modelfile (Ollama) is the
  closest "set it and forget it" workaround for end users

---

## [v0.4.9] - 2026-04-16 (GGUF conversion guide + adapter repo root upload)

### Summary

Documented how to convert the trained LoRA adapter to GGUF format so it can be
used with llama.cpp, Ollama, and LM Studio. Also fixed the HuggingFace model repo
so that adapter files are at the root level (required by the gguf-my-lora Space).

### Added

- `docs/08-gguf-conversion-guide.md` — step-by-step guide covering two conversion
  paths (Option A: gguf-my-lora Space in browser; Option B: merge locally then convert
  with llama.cpp), plus a full troubleshooting section for every error encountered
- `upload_adapter_to_root.py` (project root) — helper script that uploads
  `adapter_config.json`, `adapter_model.safetensors`, `tokenizer_config.json`,
  `tokenizer.json`, and `chat_template.jinja` to the root of the
  `VoltageVagabond/spam-classifier-liquid` HF repo (the gguf-my-lora Space requires
  `adapter_config.json` at root, not inside an `adapters/` subfolder)
- `docs/README.md` updated to include guide 8 in the table of contents
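
A root-level upload like the one `upload_adapter_to_root.py` performs can be sketched with `huggingface_hub.HfApi.upload_file`, where `path_in_repo` is just the file name rather than `adapters/<name>`. This is a hypothetical sketch, not the script itself. The `upload_to_root` name and the local directory argument are assumptions, and the API object can be passed in so the logic can be exercised without network access.

```python
from pathlib import Path

REPO_ID = "VoltageVagabond/spam-classifier-liquid"
ROOT_FILES = [
    "adapter_config.json",
    "adapter_model.safetensors",
    "tokenizer_config.json",
    "tokenizer.json",
    "chat_template.jinja",
]


def upload_to_root(local_dir: Path, api=None) -> list[str]:
    """Upload adapter files to the repo root; return the in-repo paths used."""
    if api is None:
        from huggingface_hub import HfApi  # uses your saved login or HF_TOKEN
        api = HfApi()
    uploaded = []
    for name in ROOT_FILES:
        api.upload_file(
            path_or_fileobj=str(local_dir / name),
            path_in_repo=name,  # root level, not "adapters/" + name
            repo_id=REPO_ID,
            repo_type="model",
        )
        uploaded.append(name)
    return uploaded
```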

### Issues encountered and fixed (documented in guide 8)

- `gguf-my-repo` Space gave "no model_type in config.json" — wrong Space; LoRA
  adapters need `gguf-my-lora`, not `gguf-my-repo`
- `gguf-my-lora` gave "adapter_config.json not found" — adapter files were nested
  in `adapters/` subfolder on HF, not at repo root; fixed by uploading to root
- `gguf-my-repo` showed "too many requests" — Space has 1,900+ likes and gets
  heavy traffic; workaround is to duplicate the Space to your own account
- HuggingFace sign-in redirect loop — caused by stale cookies; fixed by clearing
  cookies or using an incognito window

---

## [v0.4.8] - 2026-04-14 (8-bit KV cache quantization)

### Summary

Enabled 8-bit quantization for the KV cache at inference time to reduce memory
usage without changing model weights or training.

### What changed in app.py

- `model.generate()` now passes `cache_implementation="quantized"` and
  `cache_config={"backend": "hqq", "nbits": 8}`, quantizing both the key and
  value cache to 8-bit during generation
- Used the `hqq` backend (recommended for int8; `quanto` only supports int2/int4)
- Model weights remain at BF16; only the runtime KV cache is affected

### What changed in requirements.txt

- Added `hqq>=0.2.0` — required package for the HQQ quantization backend
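
One way to reason about what 8-bit KV caching saves is the usual cache-size estimate: two tensors (K and V) per attention layer, each holding `kv_heads × head_dim` values per token. The sketch below applies that estimate with hypothetical layer and head counts, not LFM2's actual configuration. LFM2 mixes convolution and attention blocks, and only the attention blocks hold a KV cache.

The estimate ignores two things:

- the quantization metadata (scales and zero-points);
- any recent tokens that a quantized cache keeps in full precision.

Real savings are therefore somewhat below the clean 2× shown.

```python
def kv_cache_bytes(attn_layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_value: float) -> float:
    """Approximate KV cache size: K and V tensors for every attention layer."""
    return 2 * attn_layers * kv_heads * head_dim * tokens * bytes_per_value


if __name__ == "__main__":
    # Hypothetical shape, not LFM2's real config
    shape = dict(attn_layers=6, kv_heads=8, head_dim=64, tokens=1024)
    bf16 = kv_cache_bytes(**shape, bytes_per_value=2)  # BF16: 2 bytes per value
    int8 = kv_cache_bytes(**shape, bytes_per_value=1)  # 8-bit: 1 byte per value
    print(f"bf16: {bf16 / 2**20:.1f} MiB, int8: {int8 / 2**20:.1f} MiB")
```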

---

## [v0.4.7] - 2026-04-14 (Documentation sync with fine_tune.py)

### Summary

Audit pass to bring `README.md` in line with `fine_tune.py`. The README had a
stale "Binary classification only" limitation note (3-class has been live
since v0.4.0) and an out-of-date batch size, plus it was still quoting the
pre-optimization training time.

### Changes to README.md

- Training Details table:
  - Batch size `4` → `1 (effective 4 with gradient accumulation steps = 4)`
    to match `BATCH_SIZE = 1` and `GRADIENT_ACCUMULATION_STEPS = 4` in
    fine_tune.py (lines 72-73)
  - Added explicit rows for Max sequence length (256), Optimizer
    (`adamw_torch`), Weight dtype (bfloat16), Device (MPS), and Max gradient
    norm (0.3) to match the code
  - Training time `~2–2.5 hours` → `~1–1.5 hours` to match the in-code comment
    on line 241, with a note that the older figure reflected the
    pre-v0.4.3 config
- Limitations: "Binary classification only" note replaced with
  "Three-class classification (SPAM / HAM / PHISHING) as of v0.4.0"

### Rationale

`fine_tune.py` is the source of truth. Values read from the file:

```
LORA_RANK                   = 8               (line 53)
LORA_ALPHA                  = 16              (line 54)
LORA_DROPOUT                = 0.1             (line 55)
LORA_TARGET_MODULES         = 8               (lines 56-68; q/k/v/out_proj, w1/w2/w3, in_proj)
NUM_EPOCHS                  = 3               (line 71)
BATCH_SIZE                  = 1               (line 72)
GRADIENT_ACCUMULATION_STEPS = 4               (line 73)
LEARNING_RATE               = 2e-4            (line 74)
MAX_LENGTH                  = 256             (line 75)
optim                       = "adamw_torch"   (line 226)
torch_dtype                 = bfloat16        (line 167)
device_map                  = "mps"           (line 166)
max_grad_norm               = 0.3             (comment / training args)
Training time comment       = "~1-1.5 hours"  (line 241)
```

No code changes, no retraining in this release.
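
Two quantities follow from the values above rather than being set directly. The first is the effective batch size (per-device batch × gradient accumulation steps). The second is the LoRA scaling factor, which PEFT computes as `lora_alpha / r` when rsLoRA is off.

Below is a small sketch of the arithmetic. The helper names are hypothetical. The 3,200-example figure is the train split mentioned under v0.3.1, used here only to illustrate the steps-per-epoch calculation.

```python
import math

BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 4
LORA_RANK = 8
LORA_ALPHA = 16


def effective_batch(batch_size: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples contributing to each optimizer step."""
    return batch_size * accum_steps * num_devices


def lora_scaling(alpha: int, rank: int) -> float:
    """Factor applied to the LoRA update (standard, non-rsLoRA form)."""
    return alpha / rank


def optimizer_steps_per_epoch(num_examples: int, eff_batch: int) -> int:
    """Approximate optimizer updates per epoch; trainers may round a partial batch differently."""
    return math.ceil(num_examples / eff_batch)


if __name__ == "__main__":
    eb = effective_batch(BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS)
    print("effective batch:", eb)
    print("LoRA scaling:", lora_scaling(LORA_ALPHA, LORA_RANK))
    print("steps/epoch for 3,200 examples:", optimizer_steps_per_epoch(3200, eb))
```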

---

## [v0.4.6] - 2026-04-14 (HF Spaces deployment fixes)

### Summary

Got the liquid Space (`VoltageVagabond/spam-classifier-liquid`) running on HF after
several iterations diagnosing adapter download failures.

### Q&A from this session

**Q: Why does the Space log say `Adapters not found at /app/adapters` when the local
app works fine?**

A: The local `adapters/` directory is git-ignored and never uploaded to the Space
(too large, and the upload script explicitly excludes it). On HF Spaces the directory
doesn't exist, so the app falls through to the "no adapter" code path.

**Q: How was that fixed?**

A: Added a `snapshot_download` fallback in `app.py`: if local adapters are missing,
download them from the `VoltageVagabond/spam-classifier-liquid` model repo at startup.

**Q: First attempt got `401 Repository Not Found`. Why?**

A: The model repo was set to **private** and the Space had no `HF_TOKEN` secret.
The Space container runs anonymously by default, so it couldn't authenticate.
Fix: made the model repo public (no token needed). Alternative: keep it private and
add `HF_TOKEN` as a Space repository secret with read scope.

**Q: Next error: `Can't find 'adapter_config.json' at '/root/.cache/.../snapshots/...'`. Why?**

A: The model repo doesn't store adapter files at the root — they're nested under
`adapters_fast/`, `adapters_full/`, `adapters_backup/`. The download succeeded but
`PeftModel.from_pretrained` looked at the snapshot root and couldn't find
`adapter_config.json`. Fix: use `allow_patterns=["adapters_fast/*"]` and set
`ADAPTER_PATH = snapshot_path / "adapters_fast"` so PEFT loads from the right subdirectory.

**Q: Why is classification slow on HF but fast locally?**

A: HF free tier (`cpu-basic`) is 2 vCPUs, 16 GB RAM, no GPU. The local Mac uses Apple
Silicon Metal/MPS acceleration. A 1.2B-param transformer on CPU is just slow.
Realistic speedups (high → low impact):

1. Upgrade Space to a T4 GPU (~$0.40/hr, only billed when running)
2. 4-bit quantization via `bitsandbytes` (~2-3× faster on CPU)
3. Reduce `max_tokens` from 750 → ~100 (you only need the SPAM/HAM/PHISHING label)
4. `model.merge_and_unload()` — bake LoRA into base model, removes per-call overhead
5. Switch to GGUF + llama-cpp-python — significantly faster than HF transformers on CPU

**Q: Why does the model repo need to be public for the Space to work?**

A: The Space container runs anonymously. Public repo = anonymous downloads work.
Private repo = need an authenticated `HF_TOKEN` secret in the Space settings.
Whether the Space itself is public or private is independent: that controls who can view the
demo, not what the container can fetch.

### Changes

- `app.py` — added `snapshot_download` fallback that pulls from the HF model repo
  when local adapters are missing
- `app.py` — passes `os.environ.get("HF_TOKEN")` to `snapshot_download` so the same
  code path works for both public and private model repos
- `app.py` — sets `allow_patterns=["adapters_fast/*"]`, and `ADAPTER_PATH` now points at
  the `adapters_fast/` subdirectory inside the downloaded snapshot
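
The fallback described above can be sketched as follows. `snapshot_download(repo_id=..., allow_patterns=..., token=...)` is the real `huggingface_hub` call and returns the local snapshot directory.

The following are assumptions rather than the code in `app.py`:

- the function name `resolve_adapter_path`;
- the `download` parameter, which lets the logic be tested offline;
- the "adapter present" check.

```python
import os
from pathlib import Path

MODEL_REPO = "VoltageVagabond/spam-classifier-liquid"
SUBDIR = "adapters_fast"


def resolve_adapter_path(local_dir: Path, download=None) -> Path:
    """Use local adapters if present, otherwise fetch only adapters_fast/ from the Hub."""
    if (local_dir / "adapter_config.json").exists():
        return local_dir
    if download is None:
        from huggingface_hub import snapshot_download as download
    snapshot_path = Path(download(
        repo_id=MODEL_REPO,
        allow_patterns=[f"{SUBDIR}/*"],    # skip adapters_full/ and adapters_backup/
        token=os.environ.get("HF_TOKEN"),  # None is fine for a public repo
    ))
    # PEFT needs the folder that directly contains adapter_config.json
    return snapshot_path / SUBDIR
```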

---

## [v0.4.5] - 2026-04-14

### Beginner-Code Compliance — app.py

Refactored `app.py` to match the beginner-friendly coding style used in course lecture notebooks.

**What changed:**

- Replaced 3 lambda functions in Gradio event handlers with named functions (`make_example_handler`, `clear_input`)
- Replaced ternary operator for emoji selection with explicit `if/else` block
- No behavior changes — all Gradio event wiring, feedback logging, and chat logic unchanged

## [v0.4.4] - 2026-04-14

### Chat App Upgrade — app.py

Replaced the two-tab Gradio app (Classify + Chat) with a polished chat-only interface.

**What changed:**

- Removed the Classify tab entirely — chat is now the full interface
- Added HTML topbar with project title, model name, and badge pills (matches XAI project style)
- Added clickable example prompt buttons (spam, ham, phishing) that populate the input
- Added 👍 / 👎 feedback buttons that log to `data/feedback/feedback_log.csv`
  - CSV columns: `timestamp`, `user_input`, `model_response`, `rating`
  - Feedback status resets after each new submission
- Increased `max_tokens` from 500 → 750 to reduce mid-sentence cutoffs
- Fixed Gradio 6 compatibility: `theme`/`css` moved to `launch()`, `gr.Chatbot` returns full history list
- Paths anchored to `Path(__file__).parent` so the app works from any launch directory
- Updated `Dockerfile`: consolidated to install deps from `requirements.txt`, removed redundant pip install lines
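
The feedback log described above amounts to appending one row per rating to a CSV with a fixed header. Below is a minimal sketch using the standard `csv` module. The function name, the UTC ISO-8601 timestamp format, and the example `"up"`/`"down"` rating values are assumptions, not necessarily what `app.py` writes.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["timestamp", "user_input", "model_response", "rating"]


def log_feedback(log_path: Path, user_input: str, model_response: str, rating: str) -> None:
    """Append one feedback row, writing the header the first time the file is created."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user_input": user_input,
            "model_response": model_response,
            "rating": rating,
        })
```

In the app itself, `log_path` would be built from `Path(__file__).parent` so the file lands in `data/feedback/` regardless of the launch directory.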

## [v0.4.3] - 2026-04-07

### Memory & Speed Optimization — fine_tune.py

Reduced peak memory usage from ~50 GB to a target of ~8–14 GB by changing the training parameters below. No change to model architecture or LoRA adapter structure — accuracy is unaffected.

| Parameter | Before | After | Why |
|-----------|--------|-------|-----|
| `BATCH_SIZE` | 4 | 1 | Smaller batch = 4× less activation memory per step |
| `GRADIENT_ACCUMULATION_STEPS` | 1 | 4 | Keeps effective batch size at 4 so training dynamics are unchanged |
| `MAX_LENGTH` | 512 | 256 | Attention memory scales O(n²) with sequence length, so halving it cuts attention memory ~4×; spam emails rarely exceed 256 tokens |
| `optim` | `adamw` (default) | `adamw_8bit` | Adam keeps two float32 states (first and second moment) per trainable parameter, ~9.6 GB if all 1.2B parameters were trained; with LoRA only the adapter parameters carry optimizer state, so the real saving is much smaller. 8-bit Adam quantizes those states to 8-bit integers with negligible quality loss (~75% reduction) |
| `torch_dtype` | `"auto"` | `torch.bfloat16` | Forces model weights to load in bfloat16 (2 bytes/param) instead of float32 (4 bytes/param), halving weight memory; bfloat16 has the same exponent range as float32, so training stability is preserved |
| `device_map` | `"auto"` | `"mps"` | Pins all layers to the MPS GPU; `"auto"` can spill layers to CPU, causing slow cross-device copies and inflated memory readings |
| `gradient_checkpointing_kwargs` | not set | `{"use_reentrant": False}` | Suppresses deprecation warning on newer PyTorch; no behavior change |
| `max_grad_norm` | not set | `0.3` | Clips gradient norms to prevent occasional instability spikes during training |

**Why quality is unaffected:**

- 8-bit Adam was validated by Dettmers et al. (2022) to match full-precision Adam loss curves on LLM fine-tuning
- bfloat16 was designed specifically for training — same exponent range as float32, just less mantissa precision
- Effective batch size (1 × 4 accumulation = 4) is identical to the original (4 × 1)
- 256 tokens covers the vast majority of spam/ham emails in this dataset
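
The memory figures in the table follow from bytes-per-value arithmetic. The sketch below reproduces three things:

- the ~9.6 GB Adam-state figure for 1.2B parameters;
- the 2× weight saving from bfloat16;
- the same optimizer-state calculation for a hypothetical LoRA trainable-parameter count. The real count depends on rank and target modules.

```python
GB = 10**9  # decimal gigabytes, matching the figures quoted in the table


def weight_bytes(params: int, bytes_per_param: int) -> int:
    """Memory for the model weights alone."""
    return params * bytes_per_param


def adam_state_bytes(trainable_params: int, bytes_per_state: int) -> int:
    """Adam keeps two states (first and second moment) per trainable parameter."""
    return 2 * trainable_params * bytes_per_state


if __name__ == "__main__":
    full = 1_200_000_000  # 1.2B parameters
    print("fp32 weights:", weight_bytes(full, 4) / GB, "GB")
    print("bf16 weights:", weight_bytes(full, 2) / GB, "GB")
    print("Adam fp32 states (all params):", adam_state_bytes(full, 4) / GB, "GB")
    print("Adam 8-bit states (all params):", adam_state_bytes(full, 1) / GB, "GB")
    lora = 5_000_000  # hypothetical LoRA trainable-parameter count
    print("Adam fp32 states (LoRA only):", adam_state_bytes(lora, 4) / GB, "GB")
    # Naive attention scores grow with the square of sequence length
    print("attention memory ratio 512 vs 256:", (512 / 256) ** 2)
```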

## [v0.4.2] - 2026-04-07

### Updated — Training Data Pipeline

- **Added puyang2025/seven-phishing-email-datasets and zefang-liu/phishing-email-dataset** as additional sources in `build_liquid_datasets.py` — parquets generated by the spam-xai-project sibling and shared across all three classifier projects
- **Updated data counts** in `retrain-fast.command` and `retrain-full.command` to reflect new ~190K source pool

## [v0.4.1] - 2026-03-28

### Retrain Commands with Adapter Swap

- `retrain-fast.command` and `retrain-full.command` now prompt after training to swap the new adapter as the default
- Selecting "y" backs up `adapters/` to `adapters_backup/` and copies the new adapter in
- App and notebook automatically use whichever adapter is in `adapters/`
- Old `retrain.command` (2-class, 4K examples) removed — replaced by fast/full versions
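
The swap step amounts to a backup-then-replace directory copy. Below is a minimal Python sketch with `shutil`; the real `.command` files are shell scripts. It uses the directory names from this entry. The function name is hypothetical, and so is the choice to overwrite any previous `adapters_backup/`, keeping only the most recent backup.

```python
import shutil
from pathlib import Path


def swap_adapter(project: Path, new_adapter: str) -> None:
    """Back up adapters/ to adapters_backup/, then copy the new adapter into adapters/."""
    live = project / "adapters"
    backup = project / "adapters_backup"
    source = project / new_adapter  # e.g. "adapters_fast" or "adapters_full"
    if not (source / "adapter_config.json").exists():
        raise FileNotFoundError(f"no trained adapter in {source}")
    if live.exists():
        if backup.exists():
            shutil.rmtree(backup)  # assumption: keep only the most recent backup
        shutil.move(str(live), str(backup))
    shutil.copytree(source, live)
```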

## [v0.4.0] - 2026-03-28

### Added — 3-Class Training Data + HuggingFace Upload

- **NEW: Phishing detection** — model can now classify as SPAM, HAM, or PHISHING (previously binary only)
- Prepared two new training datasets from 5 combined sources:
  - **FAST** (8,000 examples): ~1 hr retrain — `new_training_data/liquid_fast/`
  - **FULL** (20,000 examples): ~3 hr retrain — `new_training_data/liquid_full/`
- Data sources: existing 4K FaroukMoc2 + locuoco 250K (HF) + ealvaradob phishing (HF) + luongnv89 phishing with reasoning (HF) + Enron
- Added `retrain_liquid.py` script with `--mode fast` and `--mode full` (saves to `adapters_fast/` or `adapters_full/`)
- Uploaded project to HuggingFace: `VoltageVagabond/spam-classifier-liquid` (model repo)
- Created HuggingFace Space: `VoltageVagabond/spam-classifier-liquid-space` (Docker + Gradio demo)
- Created `README.md` with HF model card metadata and `Dockerfile` for HF Space
- Uploaded complete dataset to HF: `VoltageVagabond/spam-email-dataset` with all raw sources
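
The `--mode` switch on `retrain_liquid.py` maps each mode to its own dataset and output folder. Below is a hypothetical `argparse` sketch using the paths named in this entry; the actual script's option handling may differ.

```python
import argparse
from pathlib import Path

# Mode -> (training data dir, adapter output dir), taken from the paths in this entry
MODES = {
    "fast": (Path("new_training_data/liquid_fast"), Path("adapters_fast")),
    "full": (Path("new_training_data/liquid_full"), Path("adapters_full")),
}


def parse_args(argv=None) -> argparse.Namespace:
    """Parse --mode and attach the matching data and output directories."""
    parser = argparse.ArgumentParser(description="Retrain the Liquid spam classifier")
    parser.add_argument("--mode", choices=sorted(MODES), required=True,
                        help="which training set to use: fast or full")
    args = parser.parse_args(argv)
    args.data_dir, args.output_dir = MODES[args.mode]
    return args
```

Invoked as `python retrain_liquid.py --mode fast`, this sketch would read from `new_training_data/liquid_fast/` and save to `adapters_fast/`.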

## [v0.3.2] - 2026-03-28

### Fixed

- Fixed `ValueError: train_dataset is required` crash during evaluation step — SFTTrainer requires `train_dataset` even for eval-only usage

### Added

- `--eval-only` flag for `fine_tune.py` — loads saved adapter and runs evaluation + generation test without retraining (~minutes instead of ~2 hours)
- `evaluate.command` — double-click launcher for eval-only mode

## [v0.3.1] - 2026-03-27

### Updated

- Corrected training time estimates across all files:
  - Notebook (1 epoch): ~45 minutes on Apple Silicon
  - fine_tune.py (3 epochs): ~2-2.5 hours on Apple Silicon
  - Slowdown vs v0.2.0 due to targeting 8 module types instead of 4 (better quality, more compute per step)
- Fixed training data counts in setup guide (3,200 train / 800 test, not 500/100)
- Added training time comparison table to training guide
- Added batch size 4 saturation note to tuning tips
- Added `docs/07-code-sources-reference.md` — every source, citation, and empirical finding for paper writing

## [v0.3.0] - 2026-03-27

### Changed — LoRA config aligned with Liquid AI official cookbook

- **Source:** [Liquid4All/cookbook](https://github.com/Liquid4All/cookbook/blob/main/finetuning/notebooks/sft_with_trl.ipynb)
- Target modules expanded from 4 (attention only) to 8 (attention + GLU + conv):
  - Attention: `q_proj`, `k_proj`, `v_proj`, `out_proj`
  - Feed-forward GLU: `w1`, `w2`, `w3`
  - Conv: `in_proj`
- LoRA rank 32 → 8, alpha 64 → 16 (matching cookbook values)
- Dropout 0.05 → 0.1 (matching cookbook)
- Fixed `o_proj` → `out_proj` (correct layer name for LFM2 architecture)
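
Here is why the `o_proj` → `out_proj` rename mattered. When `target_modules` is a plain list of names, PEFT matches each entry against the end of a module's dotted name. A name that does not exist in the architecture therefore contributes no adapters.

The sketch below imitates that suffix rule with plain strings. The module names are hypothetical LFM2-style examples, not a dump of the real model. PEFT's actual matcher also accepts regex strings and other forms not shown here.

```python
TARGETS_OLD = ["q_proj", "k_proj", "v_proj", "o_proj"]
TARGETS_NEW = ["q_proj", "k_proj", "v_proj", "out_proj", "w1", "w2", "w3", "in_proj"]

# Hypothetical module names in the style of an LFM2 checkpoint
MODULE_NAMES = [
    "model.layers.0.conv.in_proj",
    "model.layers.0.feed_forward.w1",
    "model.layers.0.feed_forward.w2",
    "model.layers.0.feed_forward.w3",
    "model.layers.2.self_attn.q_proj",
    "model.layers.2.self_attn.k_proj",
    "model.layers.2.self_attn.v_proj",
    "model.layers.2.self_attn.out_proj",
]


def matches(module_name: str, target: str) -> bool:
    """Exact name or dotted-suffix match, as with a plain list of target names."""
    return module_name == target or module_name.endswith("." + target)


def matched_modules(targets: list[str], names: list[str]) -> list[str]:
    """Module names that at least one target would wrap with a LoRA adapter."""
    return [n for n in names if any(matches(n, t) for t in targets)]


def unmatched_targets(targets: list[str], names: list[str]) -> list[str]:
    """Targets that match no module and therefore add no adapters."""
    return [t for t in targets if not any(matches(n, t) for n in names)]


if __name__ == "__main__":
    print("old, unmatched:", unmatched_targets(TARGETS_OLD, MODULE_NAMES))
    print("new, adapted:", len(matched_modules(TARGETS_NEW, MODULE_NAMES)))
```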

## [v0.2.1] - 2026-03-27

### Note

- Verified Liquid AI version does NOT have the orphaned port issue that affected the MLX version
- PyTorch loads the model directly into the Python process — no child servers spawned
- When the app exits, all model memory is freed automatically
- No cleanup trap needed (unlike MLX version which spawns llama-server processes)

## [v0.2.0] - 2026-03-27

### Changed

- Increased batch size from 1 to 4 for faster training (parallel processing on MPS)
- Increased LoRA rank from 16 to 32 (and alpha from 32 to 64) for better adapter quality
- Removed gradient accumulation (not needed with batch size 4)
- Memory usage ~7-8 GB (comfortable on 24 GB Apple Silicon)

### Tested and reverted

- Batch size 8 tested — MPS GPU saturates at batch size 4, no speed gain beyond that. Steps halved but each step took 2x longer. Batch size 4 is the sweet spot for Apple Silicon.

## [v0.1.1] - 2026-03-27

### Fixed

- Renamed `max_seq_length` to `max_length` in fine_tune.py, notebook, and docs for TRL v0.29 compatibility
- Fixed `launch-notebook.command` not showing Jupyter install errors
- Added model loading time note (30-60 seconds) to `launch UI.command`

## [v0.1.0] - 2026-03-27

### Added

- Project scaffolding (requirements and gitignore)
- Training data copied from MLX sibling project
- `fine_tune.py` — LoRA fine-tuning via TRL SFTTrainer (Liquid AI's official method)
- `app.py` — Gradio web UI with Classify and Chat tabs
- `.command` launcher scripts for macOS
- Beginner-friendly documentation (6 guides)
- Interactive Jupyter notebook walkthrough