---
title: Spam Classifier — Liquid AI
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "5.23.0"
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
tags:
- spam-detection
- liquid-ai
- lora
- peft
- gradio
- nlp
- text-classification
models:
- LiquidAI/LFM2.5-1.2B-Instruct
datasets:
- VoltageVagabond/spam-email-dataset
---

---
library_name: transformers
tags:
- spam-detection
- liquid-ai
- lora
- peft
- apple-silicon
- nlp
- text-classification
license: mit
base_model: LiquidAI/LFM2.5-1.2B-Instruct
datasets:
- VoltageVagabond/spam-email-dataset
pipeline_tag: text-generation
---

# Spam Classifier — Liquid AI LFM2.5-1.2B LoRA Fine-Tune

**ENGT 375 — Applied Machine Learning | Spring 2026 | ODU**

> **Disclaimer:** This model was created as a student project for ENGT 375 (Applied Machine Learning) at Old Dominion University, Spring 2026. It is intended for **educational and research purposes only** and should not be used as the sole spam/phishing filter in production. Classification accuracy may vary, and the model may produce incorrect or misleading results. Always use established email security tools for real-world spam filtering.

Liquid AI's LFM2.5-1.2B-Instruct model, fine-tuned with LoRA adapters using HuggingFace Transformers + PEFT to classify emails as spam, ham, or phishing.
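As a quick sanity check on the LoRA and batching settings listed under Training Details, the arithmetic below derives the LoRA scaling factor and an approximate optimizer-step count. This is a minimal sketch, not code from `fine_tune.py`: it assumes PEFT's default (non-rsLoRA) scaling of `lora_alpha / r`, one optimizer update per `batch_size × gradient_accumulation_steps` examples, and takes the "~8,000" and "~16,000" training-set sizes at face value. The helper names `lora_scaling` and `optimizer_steps` are hypothetical.

```python
import math

# Values copied from the Training Details table; dataset sizes are approximate ("~").
LORA_RANK = 8
LORA_ALPHA = 16
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 4
EPOCHS = 3
TRAIN_SIZES = {"fast": 8_000, "full": 16_000}


def lora_scaling(alpha: int, rank: int) -> float:
    """Assumed standard LoRA scaling: the low-rank update B @ A is multiplied by alpha / r."""
    return alpha / rank


def optimizer_steps(n_examples: int, batch: int, accum: int, epochs: int) -> int:
    """Approximate optimizer updates: one update per `batch * accum` examples, per epoch.

    Rounds a final partial accumulation up to a full step (an assumption; trainers differ).
    """
    effective_batch = batch * accum
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    print(f"LoRA scaling alpha/r = {lora_scaling(LORA_ALPHA, LORA_RANK)}")
    print(f"Effective batch size = {PER_DEVICE_BATCH * GRAD_ACCUM_STEPS}")
    for name, n in TRAIN_SIZES.items():
        steps = optimizer_steps(n, PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, EPOCHS)
        print(f"{name}: ~{n} examples -> ~{steps} optimizer steps over {EPOCHS} epochs")
```

Under these assumptions the adapter update is scaled by 2.0, the effective batch is 4, and the fast and full runs come to roughly 6,000 and 12,000 optimizer steps over 3 epochs. Actual counts depend on the exact dataset size after the holdout split and on how the trainer handles a final partial accumulation.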
## Model Details

- **Base model:** LiquidAI/LFM2.5-1.2B-Instruct
- **Fine-tuning:** LoRA (rank 8, alpha 16, dropout 0.1)
- **Framework:** HuggingFace Transformers + PEFT + TRL
- **Hardware:** Apple Silicon (M-series)
- **Task:** Classify emails as spam, ham, or phishing

## LoRA Target Modules

`w1`, `w2`, `in_proj`, `out_proj`, `v_proj`, `k_proj`, `q_proj`, `w3`

## Training Details

| Hyperparameter | Value |
|----------------|-------|
| Training examples | ~8,000 (fast) / ~16,000 (full), 3-class Spam/Ham/Phishing |
| Test examples | ~20% holdout from the retrain split |
| Epochs | 3 |
| Batch size | 1 (effective 4 with gradient accumulation steps = 4) |
| Learning rate | 2e-4 |
| Max sequence length | 256 tokens |
| Optimizer | `adamw_torch` (bitsandbytes 8-bit optimizers are not supported on MPS) |
| Weight dtype | bfloat16 |
| Device | MPS (Apple Silicon) |
| Gradient checkpointing | Enabled (`use_reentrant=False`) |
| Max gradient norm | 0.3 |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.1 |
| Target modules | 8 (q_proj, k_proj, v_proj, out_proj, w1, w2, w3, in_proj) |
| Training time | ~1–1.5 hours (per `fine_tune.py`; earlier docs listed ~2–2.5 hours before the v0.4.3 memory optimization) |

### Hardware

- **Device:** Apple Silicon (M-series)
- **Backend:** PyTorch MPS (Metal Performance Shaders)

## Dataset

- [VoltageVagabond/spam-email-dataset](https://huggingface.co/datasets/VoltageVagabond/spam-email-dataset)

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base Liquid AI model from the Hugging Face Hub
base_model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct")
# Attach the LoRA adapter weights stored in the local `adapters/` directory
model = PeftModel.from_pretrained(base_model, "adapters")
model.eval()  # inference mode (disables LoRA dropout)
# The tokenizer is unchanged by fine-tuning, so load it from the base model
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct")
```

## Gradio Interface

```bash
pip install -r requirements.txt
python app.py
```

## Files

- `adapters/` — LoRA adapter weights + config
- `fine_tune.py` — Training script
- `app.py` — Gradio web interface
- `training_data/` — Training dataset

## Intended Use
This model is an **educational demonstration** of LLM fine-tuning with HuggingFace PEFT, created as part of a university course project. It is suitable for:

- Learning how LoRA fine-tuning works with the HuggingFace ecosystem (Transformers + PEFT + TRL)
- Exploring Liquid AI's LFM2.5 architecture for text classification
- Comparing different LLM fine-tuning frameworks (MLX vs. HuggingFace)

It is **not** intended for production spam filtering.

## Limitations

- May misclassify legitimate marketing emails as spam
- Trained on **English emails only**, so it is not suitable for other languages
- The training set (~8K fast / ~16K full) is modest compared with those of production spam filters, so generalization may be limited

**Note:** Three-class classification (SPAM / HAM / PHISHING) is supported as of v0.4.0; earlier versions were binary. The model is deployed as a HuggingFace Space (see the Space header above).

## Related Models

| Model | Description | Link |
|-------|-------------|------|
| spam-classifier-mlx | Qwen 3.5 0.8B MLX LoRA fine-tune | [VoltageVagabond/spam-classifier-mlx](https://huggingface.co/VoltageVagabond/spam-classifier-mlx) |
| spam-xai-model | sklearn voting ensemble (RF + LR + SVM) with LIME/SHAP/ELI5 explainability | [VoltageVagabond/spam-xai-model](https://huggingface.co/VoltageVagabond/spam-xai-model) |
| spam-xai-classifier (Space) | Live Gradio web app for the sklearn classifier | [VoltageVagabond/spam-xai-classifier](https://huggingface.co/spaces/VoltageVagabond/spam-xai-classifier) |

## Citation

```bibtex
@misc{voltagevagabond2026spamliquid,
  title={Spam Classifier — Liquid AI LFM2.5-1.2B LoRA Fine-Tune},
  author={VoltageVagabond},
  year={2026},
  howpublished={\url{https://huggingface.co/VoltageVagabond/spam-classifier-liquid}},
  note={ENGT 375 — Applied Machine Learning, Old Dominion University, Spring 2026}
}
```