---
title: Spam Email Classifier with XAI
emoji: 📧
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: "5.23.0"
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
tags:
  - spam-detection
  - xai
  - lime
  - shap
  - eli5
  - scikit-learn
  - nlp
  - explainable-ai
models:
  - VoltageVagabond/spam-classifier-gradio-model
  - VoltageVagabond/spam-classifier-mlx
  - VoltageVagabond/spam-classifier-liquid
  - VoltageVagabond/spam-xai-model
datasets:
  - VoltageVagabond/spam-email-dataset
---

## Senior Project Notice

This repository was created for a senior project in ENGT 375 Applied Machine Learning at Old Dominion University. It is provided for educational and research demonstration purposes only. It is not intended for production use, security filtering, or making real-world spam/phishing decisions. Always use established security tools for operational email protection.

# Spam Email Classifier with XAI Explanations

A Gradio web app that classifies emails as spam or ham and explains each prediction with three explainable AI (XAI) methods: LIME, SHAP, and ELI5.

## Features

- Paste any email and get an instant spam/ham prediction
- **LIME** explanations: which words pushed the decision toward spam or ham
- **SHAP** feature importance: game-theoretic attribution of the prediction to individual features
- **ELI5**: the model's internal feature weights
- **Side-by-side comparison** of all three XAI methods
- **Plain English summary** of why the model made its decision
- **User feedback**: thumbs up/down to log corrections for batch retraining
- Adjustable classification threshold
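
As a rough illustration of the adjustable threshold, the sketch below turns a spam probability into a label. The function name `classify` and the default of `0.5` are assumptions made for this example and may not match what `app.py` does.

```python
def classify(spam_probability: float, threshold: float = 0.5) -> str:
    """Label an email as 'spam' when its probability meets the threshold.

    Hypothetical helper for illustration; not taken from app.py.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return "spam" if spam_probability >= threshold else "ham"


# The same probability can flip labels as the threshold moves.
default_label = classify(0.7)                 # uses the assumed 0.5 default
strict_label = classify(0.7, threshold=0.8)   # stricter cut-off
```

Raising the threshold makes the classifier flag fewer emails as spam, which reduces false positives at the cost of letting more spam through.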

## How to Run Locally

```bash
pip install -r requirements.txt
python train.py       # train the models (first time only)
python app.py         # launch the Gradio app
```

## Retraining with Feedback

```bash
python retrain.py             # retrain with accumulated feedback corrections
python retrain.py --no-feedback  # retrain with original data only
```
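
The difference between the two modes can be pictured with the sketch below. It is only an assumption about how corrections might be combined with the original data: the function `merge_feedback`, the `(text, label)` tuple format, and the rule that a correction overrides the original label are all hypothetical and are not taken from `retrain.py`.

```python
def merge_feedback(base, feedback, use_feedback=True):
    """Combine original (text, label) pairs with user corrections.

    Hypothetical illustration: a correction for a text that already
    exists replaces its label; new texts are appended.
    """
    merged = dict(base)           # text -> label, keeps insertion order
    if use_feedback:
        merged.update(feedback)   # corrections override original labels
    return list(merged.items())


base = [("win a free prize", "ham"), ("meeting at 3pm", "ham")]
feedback = [("win a free prize", "spam"), ("cheap pills online", "spam")]

with_feedback = merge_feedback(base, feedback)
without_feedback = merge_feedback(base, feedback, use_feedback=False)
```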

## Model

A voting ensemble (Random Forest, Logistic Regression, and SVM) trained on the SpamAssassin and Enron email datasets, using TF-IDF text features plus 24 metadata features.
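
A minimal sketch of this kind of ensemble in scikit-learn is shown below. It is not the project's training code: the toy emails, the hyperparameters, and the choice of hard (majority) voting are assumptions, and the 24 metadata features are left out because they are not described here.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy data for illustration only (1 = spam, 0 = ham);
# not the SpamAssassin or Enron corpora.
emails = [
    "win a free prize now click here",
    "cheap pills limited offer buy now",
    "claim your free cash reward today",
    "meeting moved to 3pm tomorrow",
    "please review the attached report",
    "lunch on friday with the team",
]
labels = [1, 1, 1, 0, 0, 0]

# Assumed setup: majority vote across the three classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(kernel="linear")),
    ],
    voting="hard",
)

# TF-IDF turns raw text into features before the ensemble sees it.
model = make_pipeline(TfidfVectorizer(), ensemble)
model.fit(emails, labels)

predictions = model.predict(emails)
```

Hard voting is used here only to keep the sketch small. An adjustable threshold needs class probabilities, which would call for soft voting with estimators that can output probabilities.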

## Tech Stack

- scikit-learn (ensemble classifier)
- LIME + SHAP + ELI5 (explainability)
- Gradio (web interface)
- NLTK (text preprocessing)