---
base_model: google/paligemma2-3b-pt-224
library_name: peft
pipeline_tag: image-text-to-text
license: mit
language:
- en
tags:
- paligemma
- paligemma2
- lora
- peft
- transformers
- multimodal
- hate-speech-detection
- multi-label-classification
- vision-language-model
- mmhs150k
- content-moderation
- social-media
- meme-classification
datasets:
- mmhs150k
metrics:
- f1
model-index:
- name: paligemma2-3b-mmhs150k-lora
  results:
  - task:
      type: image-text-to-text
      name: Multi-Modal Hate Speech Detection
    dataset:
      name: MMHS150K
      type: mmhs150k
    metrics:
    - name: F1 Micro (Test)
      type: f1
      value: 0.5404
    - name: F1 Macro (Test)
      type: f1
      value: 0.4896
    - name: F1 Micro (Validation)
      type: f1
      value: 0.5378
    - name: Subset Accuracy (Validation)
      type: accuracy
      value: 0.4338
---

# PaliGemma 2 LoRA Adapter for Multi-Modal Hateful Content Classification

[![GitHub](https://img.shields.io/badge/GitHub-Repository-blue.svg)](https://github.com/amirhossein-yousefi/text_image_multi_modal_vlm) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/) [![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-orange)](https://huggingface.co/Amirhossein75/paligemma2-3b-mmhs150k-lora)

## 🎯 Model Overview

This is a **LoRA (Low-Rank Adaptation) adapter** fine-tuned on top of [google/paligemma2-3b-pt-224](https://huggingface.co/google/paligemma2-3b-pt-224) for **multi-label hateful content detection** on paired **text + image** data using the MMHS150K dataset.

### ✨ Key Features

- **Multi-Modal Understanding**: Processes text and image together for context-aware classification
- **Multi-Label Classification**: Can detect multiple types of hate speech in a single sample
- **Generative Approach**: Uses generative classification instead of a traditional classification head
- **Efficient Fine-Tuning**: The LoRA adapter weights take only ~24MB on disk
- **JSON Output**: Generates structured JSON arrays for easy downstream processing

This model uses **generative classification**: instead of training a dedicated classification head, the model generates a strict JSON array of labels (e.g., `["racist", "sexist"]`).

## Model Details

### Model Description

Given an image and its associated text, the model outputs a JSON array containing zero or more labels from a fixed label set. The model is trained to classify hateful memes and social media content into multiple hate speech categories.

**High-level flow:**

1. Build a strict "return JSON only" prompt listing allowed labels.
2. Feed (text + image) to the VLM.
3. Generate a short response.
4. Parse the first JSON array found (best-effort JSON extraction).
5. Convert labels into multi-hot predictions and compute multi-label metrics.

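
Steps 1, 4, and 5 of the flow above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the helper names (`build_prompt`, `extract_labels`, `to_multi_hot`) and the exact prompt wording are assumptions, and only the label set is taken from this card.

```python
import json
import re

# Label set documented in this model card.
LABELS = ["racist", "sexist", "homophobe", "religion", "otherhate"]


def build_prompt(text: str, labels=LABELS) -> str:
    """Step 1: build a 'return JSON only' prompt (hypothetical wording)."""
    allowed = json.dumps(labels)
    return (
        f"Classify the text and image into zero or more of these labels: {allowed}. "
        f"Return ONLY a JSON array of applicable labels.\nText: {text}"
    )


def extract_labels(response: str, labels=LABELS) -> list:
    """Step 4: return the first bracketed span that parses as a JSON array.

    Unknown labels, non-string items, and duplicates are dropped.
    If nothing parses, an empty list is returned.
    """
    for match in re.finditer(r"\[[^\[\]]*\]", response):
        try:
            parsed = json.loads(match.group())
        except json.JSONDecodeError:
            continue  # e.g. single-quoted lists are not valid JSON
        kept = []
        for item in parsed:
            if isinstance(item, str) and item in labels and item not in kept:
                kept.append(item)
        return kept
    return []


def to_multi_hot(predicted: list, labels=LABELS) -> list:
    """Step 5: convert label names into a fixed-order 0/1 vector."""
    return [1 if name in predicted else 0 for name in labels]


if __name__ == "__main__":
    response = 'Sure: ["racist", "sexist"]'
    print(to_multi_hot(extract_labels(response)))  # [1, 1, 0, 0, 0]
```

Filtering against the allowed label set keeps a malformed or hallucinated label from silently becoming a prediction; whether to drop or log such labels is a design choice for the downstream system.
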

| Property | Value |
|----------|-------|
| **Developed by** | [Amirhossein Yousefi](https://github.com/amirhossein-yousefi) |
| **Model type** | Vision-Language Model (VLM) with LoRA adapter |
| **Language(s)** | English |
| **License** | MIT |
| **Base model** | [google/paligemma2-3b-pt-224](https://huggingface.co/google/paligemma2-3b-pt-224) |
| **Parameters (Base)** | 3B |
| **Adapter size** | ~24MB |
| **Input** | Text + Image (224×224) |
| **Output** | JSON array of hate speech labels |

### Model Sources

| Resource | Link |
|----------|------|
| **Repository** | [github.com/amirhossein-yousefi/text_image_multi_modal_vlm](https://github.com/amirhossein-yousefi/text_image_multi_modal_vlm) |
| **Base Model** | [google/paligemma2-3b-pt-224](https://huggingface.co/google/paligemma2-3b-pt-224) |
| **Dataset** | [MMHS150K](https://gombru.github.io/2019/10/09/MMHS/) |

### 🏷️ Label Classes

The model classifies content into the following **5 hate speech categories**:

| Label | Description | Examples |
|-------|-------------|----------|
| `racist` | Content with racial discrimination | Slurs, stereotypes, dehumanization based on race/ethnicity |
| `sexist` | Content with gender-based discrimination | Misogyny, gender stereotypes, harassment based on gender |
| `homophobe` | Content with anti-LGBTQ+ discrimination | Slurs, stereotypes targeting LGBTQ+ individuals |
| `religion` | Content with religious discrimination | Attacks on religious groups, religious stereotypes |
| `otherhate` | Other forms of hateful content | Hate not covered by the categories above |

## Uses

### ✅ Direct Use

This model is intended for detecting and classifying hateful content in multimodal (text + image) social media posts, memes, and similar content.
It can be used for:

- **Content moderation systems** - Automated flagging of potentially harmful content
- **Research on hate speech detection** - Academic studies on multi-modal hate speech
- **Social media analysis** - Understanding patterns of hateful content
- **Dataset annotation assistance** - Semi-automated labeling of hate speech datasets
- **Educational purposes** - Understanding how VLMs can be applied to content moderation

### ⚠️ Out-of-Scope Use

- **Production moderation without human review:** This model should not be the sole decision-maker for content removal.
- **Non-English content:** The model is trained on English data only.
- **Single-modality analysis:** The model is trained on paired inputs; best results are achieved with both text and image.
- **Real-time high-stakes decisions:** The model makes errors and should not be used for legal or other high-stakes decisions without human oversight.
- **Surveillance or censorship:** This model should not be used for mass surveillance or unjust censorship.

## Bias, Risks, and Limitations

### Known Limitations

- **Dataset Bias:** The model is trained on the MMHS150K dataset, which may carry biases from the original annotations.
- **Cultural Context:** Performance may vary across different types of hateful content and cultural contexts.
- **Error Rate:** The model produces false positives and false negatives and should be used with human oversight.
- **JSON Parsing:** Generated JSON output may occasionally be malformed and requires robust parsing.
- **Temporal Bias:** The model may not recognize new slurs, memes, or evolving hate speech patterns.
- **Image Quality:** Performance may degrade on low-quality, distorted, or heavily edited images.

### Recommendations

- ✅ Always use human review for critical content moderation decisions.
- ✅ Validate model outputs against your specific use case before deployment.
- ✅ Consider the cultural and contextual limitations of the training data.
- ✅ Implement robust JSON parsing with fallback mechanisms.
- ✅ Regularly evaluate model performance on new data distributions.
- ✅ Combine with other moderation signals for production systems.

## 🚀 How to Get Started with the Model

### Installation

```bash
pip install transformers peft torch pillow accelerate
```

### Quick Start - Load the Model

```python
import torch
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

# Model identifiers
BASE_MODEL = "google/paligemma2-3b-pt-224"
LORA_ADAPTER = "Amirhossein75/paligemma2-3b-mmhs150k-lora"

# Load the base model
base_model = AutoModelForImageTextToText.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto",  # or "cpu" for CPU-only inference
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, LORA_ADAPTER)

# Load the processor (tokenizer + image preprocessing) from the base model
processor = AutoProcessor.from_pretrained(BASE_MODEL)

print("✅ Model loaded successfully!")
```

### Full Inference Example

```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

# Load base model and adapter
BASE_MODEL = "google/paligemma2-3b-pt-224"
LORA_ADAPTER = "Amirhossein75/paligemma2-3b-mmhs150k-lora"

processor = AutoProcessor.from_pretrained(BASE_MODEL)
base_model = AutoModelForImageTextToText.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, LORA_ADAPTER)
model.eval()

# Prepare input
image = Image.open("path/to/image.jpg").convert("RGB")
text = "Some text to analyze"

# Create prompt
class_names = ["racist", "sexist", "homophobe", "religion", "otherhate"]
prompt = (
    f"Classify the following text and image into zero or more of these labels: {class_names}. "
    f"Return ONLY a JSON array of applicable labels. Text: {text}"
)

# Generate; floating-point inputs (pixel values) are cast to the model dtype
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, dtype=torch.float16
)
with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=64)

# outputs[0] contains the prompt followed by the completion,
# so decode only the newly generated tokens
prompt_len = inputs["input_ids"].shape[-1]
result = processor.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(result)  # e.g., ["racist", "sexist"]
```

> **Note:** This is a LoRA adapter and requires loading the base model first. You cannot use `AutoModel.from_pretrained()` directly on the adapter.

### Batch Inference

```python
import json
import re

import torch


def parse_json_labels(response: str) -> list:
    """Extract the first JSON array from a model response, or return []."""
    match = re.search(r"\[.*?\]", response, flags=re.DOTALL)
    if match:
        try:
            return json.loads(match.group())
        except json.JSONDecodeError:
            pass
    return []


def classify_batch(model, processor, images, texts, class_names):
    """Classify image-text pairs one at a time and return a list of label lists."""
    results = []
    for image, text in zip(images, texts):
        prompt = (
            f"Classify the following text and image into zero or more of these labels: {class_names}. "
            f"Return ONLY a JSON array of applicable labels. Text: {text}"
        )
        inputs = processor(text=prompt, images=image, return_tensors="pt").to(
            model.device, dtype=torch.float16
        )
        with torch.inference_mode():
            outputs = model.generate(**inputs, max_new_tokens=64)
        # Skip the prompt tokens; otherwise the label list inside the prompt
        # would be matched by the regex instead of the model's answer
        prompt_len = inputs["input_ids"].shape[-1]
        response = processor.decode(outputs[0][prompt_len:], skip_special_tokens=True)
        results.append(parse_json_labels(response))
    return results
```

## Training Details

### 📊 Training Data

**MMHS150K (Multi-Modal Hate Speech)** - A large-scale dataset for multi-modal hate speech detection containing ~150K tweets with associated images.


| Split | Samples | Description |
|-------|---------|-------------|
| Train | ~135,000 | Training samples |
| Validation | 5,000 | Validation samples |
| Test | ~10,000 | Held-out test samples |

**Dataset structure:**

- `train.csv`, `val.csv`, `test.csv` with columns: `text`, `image_path`, `labels`
- Labels are multi-hot encoded for: racist, sexist, homophobe, religion, otherhate

**Data Source:** Twitter/X posts with associated images, annotated for hate speech categories.

### Training Procedure

#### 🖥️ Hardware Used

| Component | Specification |
|-----------|---------------|
| **GPU** | NVIDIA A100 (40GB/80GB) |
| **Platform** | Google Colab Pro |
| **GPU Memory** | 40GB+ |
| **Precision** | bf16 (Brain Float 16) mixed precision |
| **CUDA Version** | 11.8+ |

> **Note:** The NVIDIA A100 is a data center GPU based on the Ampere architecture. It ships with 40GB of HBM2 or 80GB of HBM2e memory, with roughly 1.6 to 2.0 TB/s of memory bandwidth depending on the variant, which makes it well suited to fine-tuning large VLMs.

#### ⚙️ Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| **Training regime** | bf16 mixed precision |
| **Optimizer** | AdamW |
| **Learning rate** | 2e-4 |
| **Batch size** | 4 (with gradient accumulation) |
| **Epochs** | 1 |
| **Max sequence length** | 512 |
| **Warmup steps** | 100 |

#### 🔧 LoRA Configuration

| Parameter | Value |
|-----------|-------|
| **LoRA rank (r)** | 4 |
| **LoRA alpha** | 32 |
| **LoRA dropout** | 0.05 |
| **Target modules** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| **Task type** | CAUSAL_LM |
| **Bias** | none |
| **Adapter size** | ~24MB |

#### ⏱️ Training Time & Throughput

| Metric | Value |
|--------|-------|
| **Validation time** | 458.13s (0:07:38) |
| **Validation throughput** | 10.914 samples/s |
| **Epochs completed** | 1.0 |
| **Final validation loss** | 0.3525 |

## 📈 Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

| Dataset | Samples | Description |
|---------|---------|-------------|
| **Validation set** | 5,000 | MMHS150K validation split |
| **Test set** | ~10,000 | MMHS150K test split |

#### Metrics Explained

| Metric | Description | Interpretation |
|--------|-------------|----------------|
| **F1 Micro** | F1 computed from true/false positives and false negatives pooled over all labels | Higher is better. Every sample-label decision counts equally, so frequent labels dominate. |
| **F1 Macro** | Unweighted mean of the per-label F1 scores | Higher is better. Gives equal weight to each class, including rare ones. |
| **Subset Accuracy** | Exact match accuracy | Higher is better. All labels of a sample must match exactly. |
| **Hamming Loss** | Fraction of sample-label decisions that are wrong | Lower is better. Measures per-label errors. |

### 📊 Results

#### This Model's Performance

| Split | F1 Micro | F1 Macro | Subset Accuracy | Hamming Loss |
|-------|----------|----------|-----------------|--------------|
| **Validation** | 0.5378 | 0.5000 | 0.4338 | 0.1422 |
| **Test** | 0.5404 | 0.4896 | – | – |

#### Comparison with Other Models in the Project

| Model | Hardware | Split | F1 Micro | F1 Macro | Subset Acc | Hamming Loss |
|-------|----------|-------|----------|----------|------------|--------------|
| **Qwen2-VL 2B + LoRA** | RTX 3080 (16GB) | Validation | 0.6172 | 0.5077 | 0.4366 | 0.14276 |
| **PaliGemma 2 3B + LoRA** (this model) | A100 | Validation | 0.5378 | 0.5000 | 0.4338 | 0.14220 |
| **Qwen2-VL 2B + LoRA** | RTX 3080 (16GB) | Test | 0.6110 | 0.4992 | – | – |
| **PaliGemma 2 3B + LoRA** (this model) | A100 | Test | 0.5404 | 0.4896 | – | – |

> **Note:** The Qwen2-VL model was trained on a local Windows machine with an NVIDIA GeForce RTX 3080 Laptop GPU (16GB VRAM), NVIDIA driver 581.57, and CUDA 13.0.

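
The four metrics reported in this section can be computed directly from multi-hot label matrices. The following is a minimal pure-Python sketch for illustration only: the function name `multilabel_metrics` and the toy data are assumptions, a label with no true or predicted positives is scored as F1 = 0 by choice, and the project's own evaluation code may rely on a library instead.

```python
def multilabel_metrics(y_true, y_pred):
    """Compute micro/macro F1, subset accuracy, and Hamming loss.

    y_true and y_pred are equally shaped lists of 0/1 rows,
    one row per sample and one column per label.
    """
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    tp = [0] * n_labels  # per-label true positives
    fp = [0] * n_labels  # per-label false positives
    fn = [0] * n_labels  # per-label false negatives
    exact_matches = 0
    wrong_decisions = 0

    for true_row, pred_row in zip(y_true, y_pred):
        exact_matches += int(true_row == pred_row)
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t == 1 and p == 1:
                tp[j] += 1
            elif t == 0 and p == 1:
                fp[j] += 1
            elif t == 1 and p == 0:
                fn[j] += 1
            wrong_decisions += int(t != p)

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0  # assumed convention: 0 when undefined

    return {
        # Micro: pool the counts over all labels, then compute one F1.
        "f1_micro": f1(sum(tp), sum(fp), sum(fn)),
        # Macro: compute F1 per label, then take the unweighted mean.
        "f1_macro": sum(f1(a, b, c) for a, b, c in zip(tp, fp, fn)) / n_labels,
        "subset_accuracy": exact_matches / n_samples,
        "hamming_loss": wrong_decisions / (n_samples * n_labels),
    }


if __name__ == "__main__":
    # Toy example; columns follow racist, sexist, homophobe, religion, otherhate.
    y_true = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 0, 0], [1, 1, 0, 0, 0]]
    y_pred = [[1, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [1, 0, 0, 0, 1]]
    for name, value in multilabel_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

The gap between micro and macro F1 in the toy data mirrors the pattern in the results tables: labels that are rarely predicted correctly pull the macro average down more than the micro average.
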
## 🔧 Technical Specifications

### Model Architecture and Objective

| Component | Description |
|-----------|-------------|
| **Base Model** | PaliGemma 2 (3B parameters) - a vision-language model by Google |
| **Architecture** | Transformer-based VLM with SigLIP vision encoder |
| **Vision Encoder** | SigLIP-So400m/14 |
| **Text Decoder** | Gemma 2 2B |
| **Image Resolution** | 224 × 224 pixels |
| **Adapter** | LoRA (Low-Rank Adaptation) |
| **Objective** | Generative multi-label classification via JSON array generation |

### Compute Infrastructure

#### Hardware

| Component | Training | Inference (Recommended) |
|-----------|----------|------------------------|
| **GPU** | NVIDIA A100 (40GB) | Any GPU with 8GB+ VRAM |
| **Platform** | Google Colab Pro | Local / Cloud |
| **Precision** | bf16 | fp16 / bf16 |
| **Memory** | 40GB+ GPU RAM | 8GB+ GPU RAM |

#### Software

| Package | Version |
|---------|---------|
| **Python** | 3.8+ |
| **Transformers** | 4.40+ |
| **PEFT** | 0.17.1 |
| **PyTorch** | 2.0+ |
| **Accelerate** | 0.27+ |
| **Pillow** | 9.0+ |

## 📚 Citation

If you use this model, please cite:

**BibTeX:**

```bibtex
@misc{yousefi2024paligemma-hatespeech,
  author = {Yousefi, Amirhossein},
  title = {Multi-Modal Vision-Language Models for Hateful Content Classification},
  year = {2024},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/amirhossein-yousefi/text_image_multi_modal_vlm}},
  note = {PaliGemma 2 LoRA adapter for MMHS150K hate speech detection}
}
```

**APA:**

Yousefi, A. (2024). *Multi-Modal Vision-Language Models for Hateful Content Classification*. GitHub. https://github.com/amirhossein-yousefi/text_image_multi_modal_vlm

## 📖 More Information

For more details on training, evaluation, and usage, see the [GitHub repository](https://github.com/amirhossein-yousefi/text_image_multi_modal_vlm).

### Related Models

- [Qwen2-VL 2B MMHS150K LoRA](https://huggingface.co/Amirhossein75/qwen2-vl-2b-mmhs150k-lora) - Alternative VLM fine-tuned on the same dataset

## 👤 Model Card Authors

[Amirhossein Yousefi](https://github.com/amirhossein-yousefi)

## 📧 Model Card Contact

- **GitHub:** [amirhossein-yousefi](https://github.com/amirhossein-yousefi)
- **Hugging Face:** [Amirhossein75](https://huggingface.co/Amirhossein75)

---

### Framework Versions

| Framework | Version |
|-----------|---------|
| PEFT | 0.17.1 |
| Transformers | 4.40+ |
| PyTorch | 2.0+ |