Proceedings of the 2022 3rd International Conference on Artificial Intelligence and Education (IC-ICAIE 2022)

Multi-stage Semantic Attention with Transformer for Multi-label Image Classification

Authors
Qi Du¹,*, Ying Ma¹, Jianmin Li¹
¹College of Computer and Information Engineering, Xiamen University of Technology, No.600 Ligong Road, Jimei District, Xiamen, 361024, Fujian Province, China
*Corresponding author. Email: duqi13228882809@163.com
Corresponding Author
Qi Du
Available Online 27 December 2022.
DOI
10.2991/978-94-6463-040-4_178
Keywords
Multi-label Image Classification; Transformer; Semantic Attention
Abstract

Multi-label image classification is a fundamental task that seeks to assign multiple labels to a single image. In recent years, many approaches based on deep convolutional neural networks (CNNs) have been proposed that discover label semantics and learn semantic image representations by modeling label correlations. However, small objects and visually similar objects are often predicted inaccurately because of the limited representational capacity of convolutional kernels. To address this problem, this paper introduces the Twins transformer. Because the image representations at different stages of this model capture features at different levels or scales and differ in discriminative capacity, we design a multi-stage semantic attention with transformer (MSAT) framework that learns semantic image representations through the model's own multi-stage mechanism and employs a three-layer standard transformer decoder as the feature-fusion component. Experiments on the VOC 2007 dataset show that MSAT improves multi-label image classification performance to some extent.
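
As a rough illustration of the idea summarized above, the following sketch shows a single cross-attention step in which one learned query per label attends over tokens gathered from several backbone stages, followed by an independent sigmoid score per label. It is not the authors' implementation: the abstract describes a three-layer standard transformer decoder, whereas this sketch uses one single-head attention step with no learned projections, and it assumes all stage tokens have already been projected to a common dimension. The names `cross_attend`, `predict`, `label_queries`, and `classifiers` are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, tokens):
    """Each label query attends over all tokens (single head, no projections).

    Simplification: a standard decoder layer would also apply learned
    projections, self-attention, a feed-forward block, and normalization.
    """
    scale = math.sqrt(len(tokens[0]))
    fused = []
    for q in queries:
        weights = softmax([dot(q, t) / scale for t in tokens])
        fused.append(
            [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(len(q))]
        )
    return fused


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(stage_tokens, label_queries, classifiers, threshold=0.5):
    """Score every label independently, as multi-label classification requires.

    stage_tokens: list of stages, each a list of token vectors (assumed to
    share one dimension after projection).
    """
    # Gather tokens from every stage into one sequence for the queries to read.
    tokens = [t for stage in stage_tokens for t in stage]
    fused = cross_attend(label_queries, tokens)
    probs = [sigmoid(dot(f, w)) for f, w in zip(fused, classifiers)]
    return probs, [p >= threshold for p in probs]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, num_labels = 8, 20  # VOC 2007 has 20 object categories

    def rand_vec():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    # Hypothetical token counts per stage: coarser stages have fewer tokens.
    stages = [[rand_vec() for _ in range(n)] for n in (16, 4, 1)]
    queries = [rand_vec() for _ in range(num_labels)]
    heads = [rand_vec() for _ in range(num_labels)]
    probs, labels = predict(stages, queries, heads)
    print(sum(labels), "of", num_labels, "labels predicted present")
```

Because each label has its own query and its own sigmoid output, any number of labels can be active for one image, which is what distinguishes this setting from single-label softmax classification.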

Copyright
© 2023 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 2022 3rd International Conference on Artificial Intelligence and Education (IC-ICAIE 2022)
Series
Atlantis Highlights in Computer Sciences
Publication Date
27 December 2022
ISBN
978-94-6463-040-4
ISSN
2589-4900

Cite this article

TY  - CONF
AU  - Qi Du
AU  - Ying Ma
AU  - Jianmin Li
PY  - 2022
DA  - 2022/12/27
TI  - Multi-stage Semantic Attention with Transformer for Multi-label Image Classification
BT  - Proceedings of the 2022 3rd International Conference on Artificial Intelligence and Education (IC-ICAIE 2022)
PB  - Atlantis Press
SP  - 1193
EP  - 1199
SN  - 2589-4900
UR  - https://doi.org/10.2991/978-94-6463-040-4_178
DO  - 10.2991/978-94-6463-040-4_178
ID  - Du2022
ER  -