MEDFUSION: A Multimodal Medical Diagnosis using Symptoms and Images
- DOI
- 10.2991/978-94-6239-713-2_5
- Keywords
- Multimodal fusion; medical diagnosis; symptom analysis; medical imaging; deep learning; MobileNetV2
- Abstract
Multimodal artificial intelligence has emerged as an effective approach to medical diagnostics: by integrating heterogeneous data sources such as clinical symptoms and medical images, it addresses the limitations of unimodal diagnostic systems [10], [15], [12]. This article presents MEDFUSION, a modular multimodal medical diagnostic framework that combines deep learning-based medical image classification with structured symptom-based machine learning through a confidence-weighted late fusion technique [2], [15]. The symptom-based component uses an ensemble of Random Forest, Naive Bayes, and Logistic Regression models [3], trained on a benchmark dataset of 4,920 samples in which 132 binary symptoms are mapped to 41 diseases. The image analysis component employs an optimized MobileNetV2 architecture [11], trained and tested on 17 publicly available medical image datasets covering X-ray [9], CT [16], MRI, ultrasound, OCT [14], fundus, dermoscopy [12], endoscopy, and otoscopy images. Each dataset was processed individually using standardized image preprocessing methods [4], and the component reaches an average classification accuracy of 96%. Decision-level fusion combines the probabilistic outputs of both modalities [15], yielding a more robust overall diagnosis. Experimental evaluation shows that the proposed framework achieves an accuracy of up to 98% on selected benchmark datasets and outperforms the individual modalities on the multimodal test-pair dataset, indicating the effectiveness of integrating symptom-based prediction with image-based deep learning models [1], [5]. The framework is implemented as an interactive interface built with the Streamlit library, which supports symptom input, image upload, and display of confidence values.
The results show the potential of multimodal AI for clinical decision support, with gains in robustness and interpretability [17], [8].
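As a rough illustration of the decision-level step described above, the sketch below fuses two per-class probability outputs in Python. The abstract does not state the exact weighting rule, so several things here are assumptions rather than the authors' method: the function name `fuse_late` is hypothetical, each modality's weight is taken to be its own highest class probability, and both modalities are assumed to score a shared set of disease labels.

```python
from typing import Dict, Tuple


def fuse_late(symptom_probs: Dict[str, float],
              image_probs: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Fuse two class-probability dictionaries with confidence-based weights.

    Assumption: the weight of a modality is its own top probability, so a
    confident model contributes more than an uncertain one.
    """
    w_sym = max(symptom_probs.values())  # confidence of the symptom model
    w_img = max(image_probs.values())    # confidence of the image model
    total = w_sym + w_img

    # A label missing from one modality is treated as probability 0.0 there.
    labels = set(symptom_probs) | set(image_probs)
    fused = {
        label: (w_sym * symptom_probs.get(label, 0.0)
                + w_img * image_probs.get(label, 0.0)) / total
        for label in labels
    }

    # The fused diagnosis is the label with the highest combined score.
    best = max(fused, key=fused.get)
    return best, fused


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    symptoms = {"pneumonia": 0.5, "bronchitis": 0.4, "asthma": 0.1}
    image = {"pneumonia": 0.9, "bronchitis": 0.1}
    diagnosis, scores = fuse_late(symptoms, image)
    print(diagnosis, scores)
```

Because each input sums to one and the weights are normalized by their total, the fused scores also sum to one and can be shown directly as confidence values. Other weighting rules, such as weights derived from validation accuracy, would fit the same structure.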
- Copyright
- © 2026 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - T. D. Venkatesh
AU  - R. Krishna Priya
AU  - R. S. Vignesh
PY  - 2026
DA  - 2026/06/25
TI  - MEDFUSION: A Multimodal Medical Diagnosis using Symptoms and Images
BT  - Proceedings of the International Conference on Advances in Computing Technology and Artificial Intelligence (COMPUTATIA 2026)
PB  - Atlantis Press
SP  - 66
EP  - 84
SN  - 2589-4919
UR  - https://doi.org/10.2991/978-94-6239-713-2_5
DO  - 10.2991/978-94-6239-713-2_5
ID  - Venkatesh2026
ER  -