Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)

Text-Image Correlation in Generative-AI: An In Silico Study of Their Adaptivity

Authors
Md Solaiman (1, *), Md Aumit Hasan (1), Afsana Begum (1)
(1) Department of Software Engineering, Daffodil International University, Dhaka, 1216, Bangladesh
*Corresponding author. Email: solaiman35-1107@diu.edu.bd
Corresponding Author
Md Solaiman
Available Online 8 June 2026.
DOI
10.2991/978-94-6239-664-7_75
Keywords
Generative AI (GA); CLIP; BLIP; Microsoft GIT; Stable Diffusion; Midjourney; DALL-E 3
Abstract

Generative artificial intelligence (GA) has the potential to revolutionize several industries, including the arts, entertainment, and content creation, by facilitating data synthesis and improving creativity through techniques such as variational autoencoders (VAEs) and generative adversarial networks (GANs). However, the visual attractiveness of AI-generated images and their relationship to the text prompts used to generate them are not yet fully understood. To our knowledge, no previous work has demonstrated this relationship; we measure it in practice with three two-stage neural-network pipelines based on the BLIP, GIT, and CLIP ResNet architectures. On a cosine-similarity scale ranging from -1 to 1, we obtained similarities of 0.45 from the CLIP architecture, 0.46 from the BLIP architecture, and 0.36 from Microsoft GIT. These findings suggest that, while generative AI (GA) demonstrates an impressive correlation between image and text signals, it is unable to mimic the contextual knowledge and nuanced creativity that are fundamental to humans. To support future research in this field, we will also make available a combined dataset of images from three generative AI (GA) models (Stable Diffusion, DALL-E 3, and Midjourney), along with their quality and aesthetics ratings as assessed by OpenAI ImageGPT-small, Microsoft Swin-Transformer, and Google ViT.
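
For readers unfamiliar with the metric, the cosine-similarity scale mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name and the two four-dimensional vectors are hypothetical, made up for illustration, and stand in for the text and image embeddings that a model such as CLIP would produce. The sketch only shows why the score always falls between -1 and 1.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors.

    The result lies in [-1, 1]: 1 for vectors pointing the same way,
    0 for orthogonal vectors, -1 for vectors pointing in opposite directions.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings (assumption: real ones would come from a
# text encoder and an image encoder and have hundreds of dimensions).
text_embedding = [0.2, 0.7, 0.1, 0.4]
image_embedding = [0.3, 0.5, 0.0, 0.6]

score = cosine_similarity(text_embedding, image_embedding)
print(f"cosine similarity: {score:.2f}")
```

Because the score depends only on the angle between the vectors and not on their lengths, it can be compared across encoders whose embeddings differ in magnitude, although scores from different models are not necessarily calibrated against each other.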

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title
Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
Series
Advances in Intelligent Systems Research
Publication Date
8 June 2026
ISBN
978-94-6239-664-7
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Md Solaiman
AU  - Md Aumit Hasan
AU  - Afsana Begum
PY  - 2026
DA  - 2026/06/08
TI  - Text-Image Correlation in Generative-AI: An In Silico Study of Their Adaptivity
BT  - Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
PB  - Atlantis Press
SP  - 1096
EP  - 1109
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-664-7_75
DO  - 10.2991/978-94-6239-664-7_75
ID  - Solaiman2026
ER  -