Text-Image Correlation in Generative-AI: An In Silico Study of Their Adaptivity

Md Solaiman; Md Aumit Hasan; Afsana Begum

doi:10.2991/978-94-6239-664-7_75

Authors

Md Solaiman¹*, Md Aumit Hasan¹, Afsana Begum¹

¹Department of Software Engineering, Daffodil International University, Dhaka, 1216, Bangladesh

*Corresponding author. Email: solaiman35-1107@diu.edu.bd

Corresponding Author

Md Solaiman

Available Online 8 June 2026.

DOI: 10.2991/978-94-6239-664-7_75
Keywords: Generative AI (GA); CLIP; BLIP; Microsoft GIT; Stable Diffusion; Midjourney; DALL-E 3
Abstract: Generative artificial intelligence (GA) has the potential to revolutionize several industries, including the arts, entertainment, and content creation, by facilitating data synthesis and enhancing creativity through techniques such as variational autoencoders (VAEs) and generative adversarial networks (GANs). However, the visual appeal of AI-generated images and their relationship to the text prompts used to generate them remain unclear. To our knowledge, no previous work has measured this relationship; we do so in practice using three two-stage neural-network pipelines built on the BLIP, GIT, and CLIP ResNet architectures. On a cosine-similarity scale ranging from -1 to 1, we obtained similarities of 0.45 with CLIP, 0.46 with BLIP, and 0.36 with Microsoft GIT. These findings suggest that although generative AI (GA) shows an impressive correlation between image and text signals, it cannot replicate the contextual knowledge and nuanced creativity that are fundamental to humans. To support future research in this field, we will also make available a combined dataset of images from three GA models (Stable Diffusion, DALL-E 3, and Midjourney), along with their quality and aesthetic ratings as assessed by OpenAI ImageGPT-small, Microsoft Swin Transformer, and Google ViT.
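Assuming the reported scores are cosine similarities between a text embedding and an image embedding, the -1 to 1 range follows directly from the definition: the score is the cosine of the angle between the two vectors. The minimal Python sketch below computes that quantity for two plain vectors. The vectors are hypothetical toy values, not outputs of CLIP, BLIP, or GIT; how those pipelines produce and normalize their embeddings is not described in the abstract and is not shown here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b, a value in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    # Dot product of the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean (L2) norms of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings standing in for a text-prompt embedding and an
# image embedding; real model embeddings have hundreds of dimensions.
text_embedding = [0.2, 0.7, 0.1, 0.4]
image_embedding = [0.3, 0.5, 0.6, 0.1]

score = cosine_similarity(text_embedding, image_embedding)
print(f"text-image similarity: {score:.2f}")
```

A score of 1 means the vectors point in the same direction, 0 means they are orthogonal (unrelated), and -1 means they point in opposite directions, so values such as 0.45 or 0.36 indicate a moderate positive alignment rather than a percentage match.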
Copyright: © 2026 The Author(s)
Open Access: Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
Series: Advances in Intelligent Systems Research
Publication Date: 8 June 2026
ISBN: 978-94-6239-664-7
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Md Solaiman
AU  - Md Aumit Hasan
AU  - Afsana Begum
PY  - 2026
DA  - 2026/06/08
TI  - Text-Image Correlation in Generative-AI: An In Silico Study of Their Adaptivity
BT  - Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
PB  - Atlantis Press
SP  - 1096
EP  - 1109
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-664-7_75
DO  - 10.2991/978-94-6239-664-7_75
ID  - Solaiman2026
ER  -