Proceedings of the 3rd International Conference on Integrated Intelligent Computing Communication & Security (ICIIC 2021)

Deep Cross of Intra and Inter Modalities for Visual Question Answering

Authors
Rishav Bhardwaj
Corresponding Author
Rishav Bhardwaj
Available Online 13 September 2021.
DOI
10.2991/ahis.k.210913.007
Keywords
Deep Learning, Inter-Modality Fusion, Intra-Modality Fusion, Visual Question Answering
Abstract

Visual Question Answering (VQA) has recently attracted interest in the deep learning community. The main challenge in VQA is to understand the meaning of each modality and to decide how to fuse their features. This paper introduces DXMN (Deep Cross Modality Network), which takes into consideration not only inter-modality fusion but also intra-modality fusion. The main idea behind the architecture is to take the position of each feature into account and then to model the relationships both between the features of different modalities and among the features within each modality, so that they are learned more effectively. The architecture is pretrained on question answering datasets such as VQA v2.0, GQA, and Visual Genome and is later fine-tuned to achieve state-of-the-art performance. DXMN achieves an accuracy of 68.65 on test-standard and 68.43 on test-dev of the VQA v2.0 dataset.
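
The abstract does not state how the two kinds of fusion are computed. As a rough illustration only, the sketch below assumes that intra-modality fusion can be modelled as self-attention within one modality and inter-modality fusion as cross-attention between modalities, with position information added element-wise to each feature. The function names (`attend`, `add_positions`, `fuse`), the identity projections, and the toy feature values are all hypothetical and are not taken from the DXMN paper.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys_values):
    """Scaled dot-product attention with identity projections (no learned weights).

    Each query vector is replaced by a weighted average of `keys_values`.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, keys_values)) for i in range(d)])
    return out

def add_positions(features, positions):
    """Inject position information by element-wise addition (an assumed encoding)."""
    return [[f + p for f, p in zip(feat, pos)] for feat, pos in zip(features, positions)]

def fuse(image_feats, question_feats):
    """Assumed ordering: intra-modality attention first, then inter-modality attention."""
    # Intra-modality: each modality attends to its own features.
    img = attend(image_feats, image_feats)
    qst = attend(question_feats, question_feats)
    # Inter-modality: each modality attends to the features of the other one.
    img_cross = attend(img, qst)
    qst_cross = attend(qst, img)
    return img_cross, qst_cross

# Toy inputs: three image-region features and two question-token features, dimension 2.
image_feats = add_positions(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1]],
)
question_feats = add_positions(
    [[0.5, 0.5], [1.0, 0.0]],
    [[0.0, 0.0], [0.1, 0.1]],
)
img_out, qst_out = fuse(image_feats, question_feats)
```

A trained model would use learned query, key, and value projections, multiple heads, and stacked layers. The identity projections here only keep the example short enough to show where the two kinds of fusion would sit relative to each other.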

Copyright
© 2021, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 3rd International Conference on Integrated Intelligent Computing Communication & Security (ICIIC 2021)
Series
Atlantis Highlights in Computer Sciences
Publication Date
13 September 2021
ISBN
978-94-6239-428-5
ISSN
2589-4900

Cite this article

TY  - CONF
AU  - Rishav Bhardwaj
PY  - 2021
DA  - 2021/09/13
TI  - Deep Cross of Intra and Inter Modalities for Visual Question Answering
BT  - Proceedings of the 3rd International Conference on Integrated Intelligent Computing Communication & Security (ICIIC 2021)
PB  - Atlantis Press
SP  - 47
EP  - 53
SN  - 2589-4900
UR  - https://doi.org/10.2991/ahis.k.210913.007
DO  - 10.2991/ahis.k.210913.007
ID  - Bhardwaj2021
ER  -