---
title: Flickr Image Captioning
emoji: 🖼️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.20.0
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
models:
  - OmarGamal48812/flickr-captioning
---

# Flickr Image Captioning

Live demo of a **ResNet50 + Bahdanau attention + GRU + GloVe** image captioning
model trained on **Flickr8k + Flickr30k** (39,874 images × 5 captions).

Upload an image and the Space will return:
- the top caption (beam search, k = 5)
- the top-k alternative captions
- a per-word attention heatmap showing which image regions the decoder attended to for each generated token
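
For readers unfamiliar with beam search, the decoding step can be sketched as follows. This is a minimal illustration and not the Space's actual code: `beam_search`, `toy_step`, and the toy probability table are hypothetical names and data, with the table standing in for the real decoder's next-token distribution. It also assumes plain summed log-probabilities with no length normalization, which may differ from what the app does.

```python
import math


def beam_search(step_fn, start, end, k=5, max_len=20):
    """Return up to k (log_prob, tokens) hypotheses, best first.

    step_fn(tokens) must return a dict mapping each candidate next token
    to its probability given the tokens generated so far.
    """
    beams = [(0.0, [start])]  # live hypotheses: (summed log-prob, tokens)
    finished = []             # hypotheses that have emitted the end token
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for tok, p in step_fn(seq).items():
                if p > 0:
                    candidates.append((score + math.log(p), seq + [tok]))
        if not candidates:
            break
        # Keep only the k highest-scoring extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:k]:
            (finished if seq[-1] == end else beams).append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len, if any
    finished.sort(key=lambda c: c[0], reverse=True)
    return finished[:k]


def toy_step(seq):
    """Hypothetical stand-in for the decoder: a fixed bigram table."""
    table = {
        "<start>": {"a": 0.6, "the": 0.4},
        "a": {"dog": 0.7, "cat": 0.3},
        "the": {"dog": 0.5, "cat": 0.5},
        "dog": {"<end>": 1.0},
        "cat": {"<end>": 1.0},
    }
    return table[seq[-1]]


results = beam_search(toy_step, "<start>", "<end>", k=5)
for log_prob, tokens in results:
    print(f"{math.exp(log_prob):.2f}", " ".join(tokens[1:-1]))
```

The first entry of `results` corresponds to the top caption, and the remaining entries to the alternatives.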

## Test-set performance (beam width = 5)

| BLEU-4 | METEOR | CIDEr | ROUGE-L |
|---:|---:|---:|---:|
| 0.3093 | 0.4709 | 0.7961 | 0.5257 |

## Model

The checkpoint, vocabulary, training config, and metrics are published in
the companion model repository:

[**OmarGamal48812/flickr-captioning**](https://huggingface.co/OmarGamal48812/flickr-captioning)

The Space downloads them at startup with `huggingface_hub.hf_hub_download`.
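
A startup download along these lines might look like the sketch below. The repository ID comes from this README, but the file names in `ARTIFACTS` are assumptions made for illustration (check the model repository for the real ones), and `fetch_artifacts` is a hypothetical helper, not a function from `app.py`.

```python
from pathlib import Path

REPO_ID = "OmarGamal48812/flickr-captioning"

# Hypothetical file names; the actual names in the model repo may differ.
ARTIFACTS = ("model.pt", "vocab.json", "config.json")


def fetch_artifacts(repo_id=REPO_ID, filenames=ARTIFACTS, download=None):
    """Download each file once and return a {filename: local Path} mapping.

    `download` defaults to huggingface_hub.hf_hub_download, which caches
    files locally and returns the cached path on later calls.
    """
    if download is None:
        # Imported lazily so the helper can be exercised without the library.
        from huggingface_hub import hf_hub_download as download
    return {
        name: Path(download(repo_id=repo_id, filename=name))
        for name in filenames
    }
```

Calling `fetch_artifacts()` once at import time in the app would make the returned paths available to whatever code loads the checkpoint and vocabulary.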

## Source code

[github.com/OmarGamal488/flickr-image-captioning](https://github.com/OmarGamal488/flickr-image-captioning)