---
license: llama3.2
base_model: meta-llama/Llama-3.2-11B-Vision-Instruct
datasets:
- QCRI/MemeXplain
language:
- en
- ar
pipeline_tag: image-text-to-text
tags:
- meme-detection
- propaganda
- hate-speech
- multimodal
- vision-language
- explainability
library_name: transformers
---
# MemeIntel: Explainable Detection of Propagandistic and Hateful Memes

MemeIntel is a Vision-Language Model fine-tuned from [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) for detecting propaganda in Arabic memes and hateful content in English memes, with explainable reasoning.

## Model Description

MemeIntel addresses the challenge of understanding and moderating complex, context-dependent multimodal content on social media. The model performs:

- **Label Detection**: Classifies memes into categories (propaganda/not-propaganda/not-meme/other for Arabic; hateful/not-hateful for English)
- **Explanation Generation**: Provides human-readable explanations for its predictions

The model was trained using a novel multi-stage optimization approach on the [MemeXplain](https://huggingface.co/datasets/QCRI/MemeXplain) dataset.
## Usage

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Load model and processor
model = MllamaForConditionalGeneration.from_pretrained(
    "QCRI/MemeIntel",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("QCRI/MemeIntel")

# Load your meme image
image = Image.open("path/to/meme.jpg")
```
### Arabic Propaganda Meme Detection (Arabic Explanation)

```python
messages = [
    {"role": "system", "content": "You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts."},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts. I will provide you with Arabic memes and the text extracted from these images. Your task is to classify the image as one of the following: 'propaganda', 'not-propaganda', 'not-meme', or 'other', and provide a brief explanation in Arabic. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: لما يقولي انتي مالكيش عزيز\nاعز ما ليا البطاطس المقلية"}
    ]}
]

input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```
### Arabic Propaganda Meme Detection (English Explanation)

```python
messages = [
    {"role": "system", "content": "You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts."},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts. I will provide you with Arabic memes and the text extracted from these images. Your task is to classify the image as one of the following: 'propaganda', 'not-propaganda', 'not-meme', or 'other', and provide a brief explanation in English. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: وأنا أبكي\n٣\nانت تتمنى وانا البي\n{7"}
    ]}
]

input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```
### English Hateful Meme Detection

```python
messages = [
    {"role": "system", "content": "You are an expert social media image analyzer specializing in identifying hateful content in memes"},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "I will provide you with memes and the text extracted from these images. Your task is to classify the image as one of the following: 'hateful' or 'not-hateful' and provide a brief explanation. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: bows here, bows there, bows everywhere"}
    ]}
]

input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```
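The three examples above repeat the same template, generate, and decode steps, and decoding `output[0]` returns the prompt together with the response. The following is a minimal sketch of a wrapper that returns only the newly generated text. `run_meme_prompt` is a hypothetical helper name, not part of Transformers or of this repository, and it assumes the `model`, `processor`, and `image` objects loaded in the Usage section.

```python
def run_meme_prompt(model, processor, image, messages, max_new_tokens=256):
    """Run one chat-formatted prompt on one image and return only the new text.

    Hypothetical helper: it wraps the same calls used in the examples above
    and slices off the prompt tokens before decoding.
    """
    # Render the chat messages into a single prompt string.
    input_text = processor.apply_chat_template(messages, add_generation_prompt=True)

    # Tokenize the prompt and preprocess the image, then move to the model device.
    inputs = processor(
        image, input_text, add_special_tokens=False, return_tensors="pt"
    ).to(model.device)

    # generate() returns the prompt tokens followed by the new tokens.
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the tokens produced after the prompt.
    prompt_len = inputs["input_ids"].shape[-1]
    new_tokens = output[0][prompt_len:]
    return processor.decode(new_tokens, skip_special_tokens=True).strip()
```

With this helper, `print(run_meme_prompt(model, processor, image, messages))` can stand in for the last four lines of each example.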
## Prompt Templates

### Arabic Meme (Arabic Explanation)

```
System: You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts.
User: You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts. I will provide you with Arabic memes and the text extracted from these images. Your task is to classify the image as one of the following: 'propaganda', 'not-propaganda', 'not-meme', or 'other', and provide a brief explanation in Arabic. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: {OCR_TEXT}
```

### Arabic Meme (English Explanation)

```
System: You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts.
User: You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts. I will provide you with Arabic memes and the text extracted from these images. Your task is to classify the image as one of the following: 'propaganda', 'not-propaganda', 'not-meme', or 'other', and provide a brief explanation in English. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: {OCR_TEXT}
```

### English Hateful Meme

```
System: You are an expert social media image analyzer specializing in identifying hateful content in memes
User: I will provide you with memes and the text extracted from these images. Your task is to classify the image as one of the following: 'hateful' or 'not-hateful' and provide a brief explanation. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: {OCR_TEXT}
```
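The templates above differ only in the task, the explanation language, and the `{OCR_TEXT}` slot, so they can be assembled programmatically. The sketch below is one way to do that; `build_messages` and the constant names are hypothetical and not part of the released code. The OCR text is interpolated as a plain value, so braces in the extracted text are passed through unchanged.

```python
# System prompts and shared instruction, copied from the templates above.
PROPAGANDA_SYSTEM = (
    "You are an expert social media image analyzer specializing in "
    "identifying propaganda in Arabic contexts."
)
HATEFUL_SYSTEM = (
    "You are an expert social media image analyzer specializing in "
    "identifying hateful content in memes"
)
FORMAT_INSTRUCTION = (
    "Start your response with 'Label:' followed by the classification label, "
    "then on a new line begin with 'Explanation:' and briefly state your reasoning."
)


def build_messages(task, ocr_text, explanation_language="English"):
    """Return chat messages for one meme (hypothetical helper).

    task: "propaganda" (Arabic memes) or "hateful" (English memes).
    explanation_language: "Arabic" or "English"; only used for "propaganda".
    """
    if task == "propaganda":
        system = PROPAGANDA_SYSTEM
        # The Arabic templates repeat the system sentence in the user turn.
        user = (
            f"{PROPAGANDA_SYSTEM} I will provide you with Arabic memes and the "
            "text extracted from these images. Your task is to classify the "
            "image as one of the following: 'propaganda', 'not-propaganda', "
            "'not-meme', or 'other', and provide a brief explanation in "
            f"{explanation_language}. {FORMAT_INSTRUCTION} "
            f"Text extracted: {ocr_text}"
        )
    elif task == "hateful":
        system = HATEFUL_SYSTEM
        user = (
            "I will provide you with memes and the text extracted from these "
            "images. Your task is to classify the image as one of the "
            "following: 'hateful' or 'not-hateful' and provide a brief "
            f"explanation. {FORMAT_INSTRUCTION} Text extracted: {ocr_text}"
        )
    else:
        raise ValueError(f"unknown task: {task!r}")

    return [
        {"role": "system", "content": system},
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": user},
        ]},
    ]
```

For example, `build_messages("propaganda", ocr_text, explanation_language="Arabic")` yields the message list for the first template, ready to pass to `processor.apply_chat_template`.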
## Expected Output Format

The model outputs in the following format:

```
Label: [classification_label]
Explanation: [reasoning for the classification]
```
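Because the response follows a fixed `Label:` / `Explanation:` layout, it can be split into fields with a small parser. The sketch below is an illustration only: `parse_prediction`, `Prediction`, and `ALLOWED_LABELS` are hypothetical names, and lowercasing the label is a design choice of this sketch. It assumes the input is the generated response without the prompt, since the prompt itself contains the strings `'Label:'` and `'Explanation:'`.

```python
import re
from typing import NamedTuple, Optional

# Label sets listed in the Model Description section of this card.
ALLOWED_LABELS = {
    "propaganda": {"propaganda", "not-propaganda", "not-meme", "other"},
    "hateful": {"hateful", "not-hateful"},
}


class Prediction(NamedTuple):
    label: Optional[str]
    explanation: Optional[str]
    is_valid: bool


def parse_prediction(response, task=None):
    """Split a 'Label: ... / Explanation: ...' response into its fields.

    `response` must be the generated text only (prompt removed).
    `task` is "propaganda", "hateful", or None to skip the label-set check.
    """
    # The label runs to the end of its line, or up to 'Explanation:'
    # if the model put both fields on one line.
    label_match = re.search(
        r"Label:[ \t]*([^\n]*?)[ \t]*(?=Explanation:|\n|$)", response
    )
    # The explanation is everything after 'Explanation:', possibly multi-line.
    expl_match = re.search(r"Explanation:\s*(.*)", response, flags=re.DOTALL)

    label = None
    if label_match:
        # Normalize: drop stray quotes or brackets and lowercase the label.
        label = label_match.group(1).strip("'\"[] ").lower() or None
    explanation = expl_match.group(1).strip() if expl_match else None

    if label is None:
        is_valid = False
    elif task is None:
        is_valid = True
    else:
        is_valid = label in ALLOWED_LABELS[task]
    return Prediction(label, explanation, is_valid)
```

Checking `is_valid` before using a prediction gives a simple guard against responses that drift from the expected format or return a label outside the task's label set.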
## Training

- **Base Model**: [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)
- **Training Dataset**: [QCRI/MemeXplain](https://huggingface.co/datasets/QCRI/MemeXplain)
- **Training Method**: Multi-stage optimization approach

## Performance

MemeIntel achieves state-of-the-art results:

- **ArMeme (Arabic Propaganda)**: ~3% absolute improvement over previous SOTA
- **Hateful Memes (English)**: ~7% absolute improvement over previous SOTA
## Citation

If you use this model, please cite:

```bibtex
@inproceedings{kmainasi-etal-2025-memeintel,
    title = "{M}eme{I}ntel: Explainable Detection of Propagandistic and Hateful Memes",
    author = "Kmainasi, Mohamed Bayan and
      Hasnat, Abul and
      Hasan, Md Arid and
      Shahroor, Ali Ezzat and
      Alam, Firoj",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1539/",
    doi = "10.18653/v1/2025.emnlp-main.1539",
    pages = "30263--30279",
    ISBN = "979-8-89176-332-6",
}
```
## License

This model is released under the [Llama 3.2 Community License](https://www.llama.com/llama3_2/license/).

## Authors

- Mohamed Bayan Kmainasi
- Abul Hasnat
- Md Arid Hasan
- Ali Ezzat Shahroor
- Firoj Alam

Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University