---
title: Spam Classifier — Liquid AI
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "5.23.0"
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
tags:
  - spam-detection
  - liquid-ai
  - lora
  - peft
  - gradio
  - nlp
  - text-classification
models:
  - LiquidAI/LFM2.5-1.2B-Instruct
datasets:
  - VoltageVagabond/spam-email-dataset
---

## Senior Project Notice

This repository was created for a senior project in ENGT 375 Applied Machine Learning at Old Dominion University. It is provided for educational and research demonstration purposes only. It is not intended for production use, security filtering, or making real-world spam/phishing decisions. Always use established security tools for operational email protection.

---
library_name: transformers
tags:
  - spam-detection
  - liquid-ai
  - lora
  - peft
  - apple-silicon
  - nlp
  - text-classification
license: mit
base_model: LiquidAI/LFM2.5-1.2B-Instruct
datasets:
  - VoltageVagabond/spam-email-dataset
pipeline_tag: text-generation
---

# Spam Classifier — Liquid AI LFM2.5-1.2B LoRA Fine-Tune

**ENGT 375 — Applied Machine Learning | Spring 2026 | ODU**

Liquid AI's LFM2.5-1.2B-Instruct model, fine-tuned with LoRA adapters using HuggingFace Transformers + PEFT for spam email classification.

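
For readers new to LoRA, the rank and alpha values listed under Model Details translate into simple arithmetic. The sketch below assumes the standard LoRA formulation (the low-rank update is scaled by `alpha / r`, with `A` of shape `r × d_in` and `B` of shape `d_out × r`). The 2048 × 2048 layer shape and the helper names are hypothetical, chosen for illustration rather than read from this model's config.

```python
# LoRA hyperparameters as listed in this model card.
LORA_RANK = 8
LORA_ALPHA = 16


def lora_scaling(alpha: int, rank: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Hypothetical layer shape, for illustration only.
D_IN, D_OUT = 2048, 2048

added = lora_param_count(D_IN, D_OUT, LORA_RANK)
full = D_IN * D_OUT

print(f"scaling factor: {lora_scaling(LORA_ALPHA, LORA_RANK)}")
print(f"adapter params: {added:,} vs. full matrix: {full:,} ({added / full:.2%})")
```

Under these assumptions, each adapted matrix trains well under 1% of the parameters a full fine-tune of that matrix would update, which is what makes training on a single Apple Silicon machine practical.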

## Model Details

- **Base model:** LiquidAI/LFM2.5-1.2B-Instruct
- **Fine-tuning:** LoRA (rank 8, alpha 16, dropout 0.1)
- **Framework:** HuggingFace Transformers + PEFT + TRL
- **Hardware:** Apple Silicon (M-series)
- **Task:** Classify emails as spam, ham, or phishing (three-class as of v0.4.0; earlier versions were binary spam/ham)

## LoRA Target Modules

`w1`, `w2`, `in_proj`, `out_proj`, `v_proj`, `k_proj`, `q_proj`, `w3`

## Training Details

| Hyperparameter | Value |
|----------------|-------|
| Training examples | ~8,000 (fast) / ~16,000 (full); 3-class (Spam/Ham/Phishing) |
| Test examples | ~20% holdout from the retrain split |
| Epochs | 3 |
| Batch size | 1 (effective 4 with gradient accumulation steps = 4) |
| Learning rate | 2e-4 |
| Max sequence length | 256 |
| Optimizer | adamw_torch (bitsandbytes 8-bit not supported on MPS) |
| Weight dtype | bfloat16 |
| Device | MPS (Apple Silicon) |
| Gradient checkpointing | Enabled (use_reentrant=False) |
| Max gradient norm | 0.3 |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.1 |
| Target modules | 8 (q_proj, k_proj, v_proj, out_proj, w1, w2, w3, in_proj) |
| Training time | ~1–1.5 hours (per fine_tune.py; earlier docs listed ~2–2.5 hours before the v0.4.3 memory optimization) |

### Hardware

- **Device:** Apple Silicon (M-series)
- **Backend:** PyTorch MPS (Metal Performance Shaders)

## Dataset

- [VoltageVagabond/spam-email-dataset](https://huggingface.co/datasets/VoltageVagabond/spam-email-dataset)

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model from the Hub, then attach the LoRA adapter
# weights stored in the local adapters/ directory.
base_model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct")
model = PeftModel.from_pretrained(base_model, "adapters")

# The tokenizer comes from the base model; the adapter does not change it.
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct")
```

## Gradio Interface

```bash
pip install -r requirements.txt
python app.py
```

## Files

- `adapters/`: LoRA adapter weights + config
- `fine_tune.py`: Training script
- `app.py`: Gradio web interface
- `training_data/`: Training dataset

## Intended Use

This model is an **educational demonstration** of LLM fine-tuning with HuggingFace PEFT, created as part of a university course project. It is suitable for:

- Learning how LoRA fine-tuning works with the HuggingFace ecosystem (Transformers + PEFT + TRL)
- Exploring Liquid AI's novel architecture for text classification
- Comparing different LLM fine-tuning frameworks (MLX vs. HuggingFace)

It is **not** intended for production spam filtering.

## Limitations

- May misclassify legitimate marketing emails as spam
- Trained on **English emails only**; not suitable for other languages
- Training set (~8K fast / ~16K full) is modest compared to production spam filters, so generalization may be limited

**Note:** Three-class classification (SPAM / HAM / PHISHING) is supported as of v0.4.0; earlier versions were binary. The model is deployed as a HuggingFace Space (see Space header above).

## Related Models

| Model | Description | Link |
|-------|-------------|------|
| spam-classifier-mlx | Qwen 3.5 0.8B MLX LoRA fine-tune | [VoltageVagabond/spam-classifier-mlx](https://huggingface.co/VoltageVagabond/spam-classifier-mlx) |
| spam-xai-model | sklearn voting ensemble (RF + LR + SVM) with LIME/SHAP/ELI5 explainability | [VoltageVagabond/spam-xai-model](https://huggingface.co/VoltageVagabond/spam-xai-model) |
| spam-xai-classifier (Space) | Live Gradio web app for the sklearn classifier | [VoltageVagabond/spam-xai-classifier](https://huggingface.co/spaces/VoltageVagabond/spam-xai-classifier) |

## Citation

```bibtex
@misc{voltagevagabond2026spamliquid,
  title={Spam Classifier — Liquid AI LFM2.5-1.2B LoRA Fine-Tune},
  author={VoltageVagabond},
  year={2026},
  howpublished={\url{https://huggingface.co/VoltageVagabond/spam-classifier-liquid}},
  note={ENGT 375 — Applied Machine Learning, Old Dominion University, Spring 2026}
}
```