Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)

Advanced Hybrid CNN-ViT Ensemble with Attention and FPN Mechanism for Retinal OCT Disease Classification

Authors
Abdullah Al Noman1, *, Eamin Hasan Shanto1, Mahir Faysal1, Jamil Hasan1, Samidul Islam Imran Kayes1, Mohammad Jahangir Alam1
1Department of Computer Science and Engineering, Daffodil International University, Dhaka, 1216, Bangladesh
*Corresponding author. Email: noman15-5713@diu.edu.bd
Available Online 8 June 2026.
DOI
10.2991/978-94-6239-664-7_41
Keywords
Retinal OCT; Deep Learning; Convolutional Neural Networks; Vision Transformer; Feature Pyramid Network; Ensemble Learning
Abstract

Retinal diseases are among the leading causes of vision loss worldwide, and optical coherence tomography (OCT) imaging is crucial for their timely diagnosis. In this paper, we propose a hybrid deep learning model that combines Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), Feature Pyramid Networks (FPN), and cross-modal attention to simultaneously exploit local retinal texture and global structural features. An ensemble mechanism combines three fusion strategies (attention-based fusion, concatenation, and weighted fusion) to improve robustness and accuracy. Experiments on the publicly available OCTDL dataset, which covers seven disease classes, achieve 96.3% accuracy and a 95.1% macro F1-score, outperforming conventional CNN and ViT baselines. These results demonstrate the strong potential of hybrid ensemble models for multi-disease classification of retinal OCT images.
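The abstract names three fusion strategies and an ensemble but gives no implementation details. The following is therefore only a minimal sketch, not the authors' method. It assumes the CNN and ViT branches have already produced fixed-length feature vectors of equal size D. It illustrates concatenation, weighted, and attention-based fusion, each feeding its own linear classifier head, and ensembles them by averaging class probabilities. The attention step uses a single hypothetical learned query vector over the two branch features as a simplified stand-in for cross-modal attention. The names `alpha`, `query`, `ensemble_predict`, and the probability-averaging rule are all assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: weights is a K x D list of rows, x has length D."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def concat_fusion(cnn_feat, vit_feat):
    # Concatenation: stack both feature vectors into one vector of length 2D.
    return cnn_feat + vit_feat


def weighted_fusion(cnn_feat, vit_feat, alpha):
    # Weighted sum with a scalar mixing weight alpha in [0, 1] (assumed fixed here).
    return [alpha * c + (1.0 - alpha) * v for c, v in zip(cnn_feat, vit_feat)]


def attention_fusion(cnn_feat, vit_feat, query):
    # Scaled dot-product attention of one (hypothetical) query over the two
    # branch features; a simplified stand-in for cross-modal attention.
    d = len(query)
    scores = [sum(q * f for q, f in zip(query, feat)) / math.sqrt(d)
              for feat in (cnn_feat, vit_feat)]
    w_cnn, w_vit = softmax(scores)
    return [w_cnn * c + w_vit * v for c, v in zip(cnn_feat, vit_feat)]


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for trained head parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def ensemble_predict(cnn_feat, vit_feat, params):
    """Average the class probabilities of the three fusion heads; return (argmax, probs)."""
    heads = [
        (concat_fusion(cnn_feat, vit_feat), params["concat"]),
        (weighted_fusion(cnn_feat, vit_feat, params["alpha"]), params["weighted"]),
        (attention_fusion(cnn_feat, vit_feat, params["query"]), params["attention"]),
    ]
    probs = [softmax(linear(w, b, fused)) for fused, (w, b) in heads]
    num_classes = len(probs[0])
    avg = [sum(p[k] for p in probs) / len(probs) for k in range(num_classes)]
    return max(range(num_classes), key=lambda k: avg[k]), avg


if __name__ == "__main__":
    rng = random.Random(0)
    D, K = 8, 7  # hypothetical feature size; 7 classes as stated for OCTDL
    params = {
        "alpha": 0.5,
        "query": [rng.gauss(0.0, 1.0) for _ in range(D)],
        "concat": (random_matrix(K, 2 * D, rng), [0.0] * K),
        "weighted": (random_matrix(K, D, rng), [0.0] * K),
        "attention": (random_matrix(K, D, rng), [0.0] * K),
    }
    cnn_feat = [rng.gauss(0.0, 1.0) for _ in range(D)]
    vit_feat = [rng.gauss(0.0, 1.0) for _ in range(D)]
    label, avg_probs = ensemble_predict(cnn_feat, vit_feat, params)
    print("predicted class:", label)
```

Averaging probabilities (soft voting) is only one possible ensembling rule; majority voting over the heads' argmax predictions or learned ensemble weights would be equally plausible readings of the abstract.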

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Series
Advances in Intelligent Systems Research
Publication Date
8 June 2026
ISBN
978-94-6239-664-7
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Abdullah Al Noman
AU  - Eamin Hasan Shanto
AU  - Mahir Faysal
AU  - Jamil Hasan
AU  - Samidul Islam Imran Kayes
AU  - Mohammad Jahangir Alam
PY  - 2026
DA  - 2026/06/08
TI  - Advanced Hybrid CNN-ViT Ensemble with Attention and FPN Mechanism for Retinal OCT Disease Classification
BT  - Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
PB  - Atlantis Press
SP  - 593
EP  - 607
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-664-7_41
DO  - 10.2991/978-94-6239-664-7_41
ID  - AlNoman2026
ER  -