---
title: Spam Email Classifier with XAI
emoji: 📧
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: "5.23.0"
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
tags:
  - spam-detection
  - xai
  - lime
  - shap
  - eli5
  - scikit-learn
  - nlp
  - explainable-ai
models:
  - VoltageVagabond/spam-classifier-gradio-model
  - VoltageVagabond/spam-classifier-mlx
  - VoltageVagabond/spam-classifier-liquid
  - VoltageVagabond/spam-xai-model
datasets:
  - VoltageVagabond/spam-email-dataset
---

## Senior Project Notice

This repository was created for a senior project in ENGT 375 Applied Machine Learning at Old Dominion University. It is provided for educational and research demonstration purposes only. It is not intended for production use, security filtering, or making real-world spam/phishing decisions. Always use established security tools for operational email protection.

# Spam Email Classifier with XAI Explanations

A Gradio web app that classifies emails as spam or ham and provides explainable AI (XAI) insights using three different methods.

## Features

- Paste any email and get an instant spam/ham prediction
- **LIME** explanations: which words pushed the decision
- **SHAP** feature importance: game-theoretic attribution
- **ELI5**: model-internal feature weights
- **Side-by-side comparison** of all three XAI methods
- **Plain English summary** of why the model made its decision
- **User feedback**: thumbs up/down to log corrections for batch retraining
- Adjustable classification threshold

## How to Run Locally

```bash
pip install -r requirements.txt
python train.py    # train the models (first time only)
python app.py      # launch the Gradio app
```

## Retraining with Feedback

```bash
python retrain.py                 # retrain with accumulated feedback corrections
python retrain.py --no-feedback   # retrain with original data only
```

## Model

Voting ensemble (Random Forest + Logistic Regression + SVM) trained on SpamAssassin + Enron email datasets using TF-IDF + 24 metadata features.

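
The ensemble described above can be sketched roughly as follows. This is a minimal illustration, not the code in `train.py`: the toy emails, the hyperparameters, the choice of soft voting, and the `classify` helper are all assumptions made for the example, and the 24 metadata features are left out so that only the TF-IDF path is shown.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy data invented for illustration; not the SpamAssassin or Enron corpora.
texts = [
    "win a free prize now",
    "claim your free cash reward",
    "free money click here now",
    "urgent prize waiting claim now",
    "meeting moved to tuesday",
    "please review the attached report",
    "lunch tomorrow with the team",
    "notes from the project meeting",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = spam, 0 = ham

# Soft voting averages the predicted probabilities of the three models.
# Whether the project uses soft or hard voting is an assumption here.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(kernel="linear", probability=True, random_state=0)),
    ],
    voting="soft",
)

# TF-IDF turns raw text into features before the ensemble sees it.
model = make_pipeline(TfidfVectorizer(), ensemble)
model.fit(texts, labels)


def classify(email_text, threshold=0.5):
    """Hypothetical helper: label an email using an adjustable spam threshold."""
    spam_probability = model.predict_proba([email_text])[0][1]
    label = "spam" if spam_probability >= threshold else "ham"
    return label, spam_probability


label, probability = classify("claim your free prize", threshold=0.5)
print(label, round(probability, 3))
```

Raising `threshold` makes the sketch more conservative about flagging spam, which is one plausible way an adjustable classification threshold could be wired to the ensemble's averaged probability.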
## Tech Stack

- scikit-learn (ensemble classifier)
- LIME + SHAP + ELI5 (explainability)
- Gradio (web interface)
- NLTK (text preprocessing)