Linking Language and Vision: A Deep Learning Method for Captioning Images
- DOI
- 10.2991/978-94-6463-805-9_9
- Keywords
- Image Captioning; Multi-Modal Alignment Loss; VGG16; Flickr30k; LSTM; CNN; Attention Heatmaps
- Abstract
Image captioning bridges visual understanding and language comprehension by enabling machines to describe images in natural language. This paper details a method that fuses Convolutional Neural Networks (CNNs) with attention-augmented Bidirectional Long Short-Term Memory (BiLSTM) networks for improved captioning. The main contributions comprise the introduction of a Multi-Modal Alignment Loss for better matching of textual and visual data, the use of attention heatmaps for improved model interpretability, and advanced data augmentation for greater model robustness. A pre-trained VGG16 model extracts visual features, and performance is evaluated on the Flickr30k dataset with metrics including BLEU, METEOR, ROUGE-L, CIDEr, and SPICE. These enhancements improve the relevance, readability, and versatility of generated captions, with applications in assistive technology for people with disabilities and automatic content creation.
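The abstract does not specify the exact form of the Multi-Modal Alignment Loss. A common choice for aligning paired image and caption embeddings is a cosine-similarity penalty; the sketch below illustrates that idea with NumPy. The function name and the cosine formulation are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def multimodal_alignment_loss(image_feats, text_feats, eps=1e-8):
    """Hypothetical cosine-based alignment loss: penalizes mismatch
    between paired image and caption embeddings (the exact form used
    in the paper is not given in the abstract)."""
    # L2-normalize each row so the dot product becomes cosine similarity
    img = image_feats / (np.linalg.norm(image_feats, axis=1, keepdims=True) + eps)
    txt = text_feats / (np.linalg.norm(text_feats, axis=1, keepdims=True) + eps)
    # Cosine similarity of each matched image/caption pair
    cos = np.sum(img * txt, axis=1)
    # Loss is 1 - cosine, averaged over the batch: 0 when perfectly aligned
    return float(np.mean(1.0 - cos))

# Perfectly aligned pairs give (near-)zero loss
a = np.array([[1.0, 0.0], [0.0, 2.0]])
print(round(multimodal_alignment_loss(a, a), 6))  # → 0.0
```

In a training loop this term would typically be added, with a weighting coefficient, to the standard cross-entropy captioning loss, encouraging the visual and textual embeddings to share a common space.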
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - Houda Benaliouche
AU  - Lina Ines Filali
AU  - Oualid Bouhaddi
PY  - 2025
DA  - 2025/08/05
TI  - Linking Language and Vision: A Deep Learning Method for Captioning Images
BT  - Proceedings of the First International Conference on Artificial Intelligence, Smart Technologies and Communications (AISTC 2025)
PB  - Atlantis Press
SP  - 65
EP  - 75
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-805-9_9
DO  - 10.2991/978-94-6463-805-9_9
ID  - Benaliouche2025
ER  -