Multimodal Cross-guided Attention Networks for Visual Question Answering

Haibin Liu; Shengrong Gong; Yi Ji; Jianyu Yang; Tengfei Xing; Chunping Liu

doi:10.2991/cmsa-18.2018.80


Corresponding Author

Haibin Liu

Available Online April 2018.

DOI: 10.2991/cmsa-18.2018.80
Keywords: visual question answering; attention; cross-guided; gated activation
Abstract: Visual Question Answering (VQA) is an attractive topic that combines computer vision with natural language processing. It is more challenging than text-based question answering because of its multimodal nature. The VQA reasoning process requires both effective semantic embedding and fine-grained visual comprehension. Existing approaches predominantly infer answers from visual spatial information, neglecting important semantic information in questions and the guidance information between images and questions. To remedy this, we imitate the human mechanism of cross-reasoning about visual and textual information and propose a multimodal cross-guided attention network (MCAN) for VQA. MCAN employs a cross-guided joint learning strategy with a gated activation learning method, which lets it capture both rich visual spatial information and significant semantic information simultaneously. We evaluate the proposed model on two public datasets: the VQA dataset and the COCO-QA dataset. Extensive experiments show state-of-the-art performance on both datasets.
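
The cross-guided idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the dot-product scoring, the mean-pooled question guide, the gated tanh form of the "gated activation", and every function name below are assumptions chosen for illustration only.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(features):
    n = len(features)
    return [sum(f[i] for f in features) / n for i in range(len(features[0]))]


def attend(features, guide):
    """Pool `features`, weighting each by its dot-product similarity to `guide`."""
    weights = softmax([dot(f, guide) for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return pooled, weights


def gated_tanh(x, w, w_gate):
    """Assumed gate form: tanh(W x) multiplied element-wise by sigmoid(W_g x)."""
    out = []
    for row, gate_row in zip(w, w_gate):
        candidate = math.tanh(dot(row, x))
        gate = 1.0 / (1.0 + math.exp(-dot(gate_row, x)))
        out.append(candidate * gate)
    return out


def cross_guided_attention(regions, words):
    # Step 1 (assumed): the question guides attention over image regions.
    v_att, region_weights = attend(regions, mean_vector(words))
    # Step 2 (assumed): the attended image guides attention over question words.
    q_att, word_weights = attend(words, v_att)
    return v_att, q_att, region_weights, word_weights


# Toy inputs: three image-region vectors and two question-word vectors.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
words = [[1.0, 0.0], [0.9, 0.1]]

v_att, q_att, region_weights, word_weights = cross_guided_attention(regions, words)

# Fuse the two attended vectors through the assumed gated activation.
identity = [[1.0, 0.0], [0.0, 1.0]]
fused = gated_tanh([a + b for a, b in zip(v_att, q_att)], identity, identity)
```

In this toy setup the region most similar to the question receives the largest attention weight, and the gate scales each fused component by a factor between 0 and 1. A trained model would learn the projection matrices rather than use the identity matrices shown here.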
Copyright: © 2018, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title: Proceedings of the 2018 International Conference on Computer Modeling, Simulation and Algorithm (CMSA 2018)
Series: Advances in Intelligent Systems Research
Publication Date: April 2018
ISBN: 978-94-6252-523-8
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Haibin Liu
AU  - Shengrong Gong
AU  - Yi Ji
AU  - Jianyu Yang
AU  - Tengfei Xing
AU  - Chunping Liu
PY  - 2018/04
DA  - 2018/04
TI  - Multimodal Cross-guided Attention Networks for Visual Question Answering
BT  - Proceedings of the 2018 International Conference on Computer Modeling, Simulation and Algorithm (CMSA 2018)
PB  - Atlantis Press
SP  - 347
EP  - 353
SN  - 1951-6851
UR  - https://doi.org/10.2991/cmsa-18.2018.80
DO  - 10.2991/cmsa-18.2018.80
ID  - Liu2018/04
ER  -