Text-Image Correlation in Generative-AI: An In Silico Study of Their Adaptivity
- DOI
- 10.2991/978-94-6239-664-7_75
- Keywords
- Generative AI (GA); CLIP; BLIP; Microsoft GIT; Stable Diffusion; Midjourney; DALL-E 3
- Abstract
Generative artificial intelligence (GA) has the potential to revolutionize several industries, including the arts, entertainment, and content creation, by facilitating data synthesis and enhancing creativity through techniques such as variational autoencoders (VAEs) and generative adversarial networks (GANs). However, the visual appeal of AI-generated images and their relationship to the text prompts used to generate them are not yet well understood. To our knowledge, no previous work has demonstrated this relationship. We examine it in practice using three two-stage neural-network pipelines based on the BLIP, GIT, and CLIP ResNet architectures. On a cosine-similarity scale ranging from -1 to 1, we obtained a similarity of 0.45 with the CLIP architecture, 0.46 with the BLIP architecture, and 0.36 with Microsoft GIT. These findings suggest that, while generative AI demonstrates an impressive correlation between image and text signals, it is unable to mimic the contextual knowledge and nuanced creativity that are fundamental to humans. To support future research in this field, we will also make available a combined dataset of images from three generative AI models (Stable Diffusion, DALL-E 3, and Midjourney), along with their quality ratings and aesthetics as assessed by OpenAI ImageGPT-small, Microsoft Swin-Transformer, and Google ViT.
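The cosine-similarity scale mentioned in the abstract can be illustrated with a minimal sketch. The vectors below are made-up stand-ins for an image embedding and a text embedding, not outputs of CLIP, BLIP, or GIT, and the function name `cosine_similarity` is a hypothetical helper, not the authors' code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy embeddings (illustrative values only, not model outputs).
image_embedding = [1.0, 0.0]
text_embedding = [1.0, 1.0]

score = cosine_similarity(image_embedding, text_embedding)
print(round(score, 4))  # 0.7071
```

A score of 1 means the two vectors point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions, so the reported values of 0.36 to 0.46 sit in the positive part of that range.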
- Copyright
- © 2026 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - Md Solaiman
AU  - Md Aumit Hasan
AU  - Afsana Begum
PY  - 2026
DA  - 2026/06/08
TI  - Text-Image Correlation in Generative-AI: An In Silico Study of Their Adaptivity
BT  - Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
PB  - Atlantis Press
SP  - 1096
EP  - 1109
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-664-7_75
DO  - 10.2991/978-94-6239-664-7_75
ID  - Solaiman2026
ER  -