---
title: Flickr Image Captioning
emoji: 🖼️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.20.0
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
models:
  - OmarGamal48812/flickr-captioning
---

# Flickr Image Captioning

Live demo of a **ResNet50 + Bahdanau attention + GRU + GloVe** image captioning model trained on **Flickr8k + Flickr30k** (39,874 images × 5 captions).

Upload an image and the Space returns:

- the top caption (beam search, k = 5)
- the top-k alternative captions
- a per-word attention heatmap showing what the decoder looked at for each token

## Test-set performance (beam = 5)

| BLEU-4 | METEOR | CIDEr | ROUGE-L |
|---:|---:|---:|---:|
| 0.3093 | 0.4709 | 0.7961 | 0.5257 |

## Model

The checkpoint, vocabulary, training config, and metrics are published in the companion model repository:
[**OmarGamal48812/flickr-captioning**](https://huggingface.co/OmarGamal48812/flickr-captioning)

The Space downloads them at startup with `huggingface_hub.hf_hub_download`.

## Source code

[github.com/OmarGamal488/flickr-image-captioning](https://github.com/OmarGamal488/flickr-image-captioning)
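
## Beam search sketch

The captions above are decoded with beam search (k = 5). The block below is a minimal, framework-free sketch of that idea, not the Space's actual decoder: `beam_search`, `toy_step`, and the toy probability table are hypothetical names and data introduced only for illustration. In the real model, the step function would presumably be the attention + GRU decoder returning log-probabilities over the vocabulary. The sketch also omits length normalization, which a real implementation may or may not apply.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens so far, summed log-probability)


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    start: str = "<start>",
    end: str = "<end>",
    beam_width: int = 5,
    max_len: int = 20,
) -> List[Hypothesis]:
    """Return up to `beam_width` hypotheses, best first.

    `step_fn` is a stand-in for the decoder (an assumption of this sketch):
    it maps the tokens generated so far to {next_token: log_probability}.
    """
    beams: List[Hypothesis] = [([start], 0.0)]
    finished: List[Hypothesis] = []

    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for tok, logp in step_fn(tokens).items():
                candidates.append((tokens + [tok], score + logp))
        if not candidates:
            break

        # Keep only the `beam_width` best expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break

    finished.extend(beams)  # hypotheses cut off at max_len
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


def toy_step(tokens: List[str]) -> Dict[str, float]:
    """Hypothetical next-token table keyed on the last token only."""
    table = {
        "<start>": {"a": math.log(0.6), "the": math.log(0.4)},
        "a": {"dog": math.log(0.5), "cat": math.log(0.5)},
        "the": {"dog": math.log(0.9), "cat": math.log(0.1)},
        "dog": {"<end>": 0.0},
        "cat": {"<end>": 0.0},
    }
    return table[tokens[-1]]


results = beam_search(toy_step, beam_width=5)
greedy = beam_search(toy_step, beam_width=1)
for tokens, score in results:
    print(" ".join(tokens[1:-1]), round(math.exp(score), 2))
```

In this toy table, greedy decoding (`beam_width=1`) commits to "a" first because it is the likelier first word, while the wider beam also keeps "the" alive and ends up ranking "the dog" highest overall. That is the usual motivation for returning the top-k alternatives instead of a single greedy caption.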