The Investigation of Data-Parallelism Strategy Based on VIT Model

Zhengtao Feng

doi:10.2991/978-94-6463-300-9_61

Authors

Zhengtao Feng¹,*

¹Software Engineering, Nanhai Campus, South China Normal University, Taoyuan East Road, Nanhai District, Foshan, 528200, China

*Corresponding author. Email: 20202005347@m.scnu.edu.cn

Available Online 27 November 2023.

DOI: 10.2991/978-94-6463-300-9_61
Keywords: Vision Transformer Model; Data Parallelism; Distributed Model Training
Abstract: As techniques for training large models advance, the research community has become increasingly interested in methods that improve training efficiency. The Vision Transformer (ViT) model is a novel approach to image processing and one of the first attempts to apply the Transformer architecture in this domain. This study runs repeated experiments with data parallelism to identify the conditions and settings that make ViT training most effective. Data parallelism, a distributed training approach, splits the training workload evenly across multiple GPUs, which allows the training results of different configurations to be compared. The basic configuration and the number of GPUs are varied to find the setup with the best training outcome, and in particular the optimal number of GPUs for training ViT on the CIFAR10 dataset. The experimental findings suggest that three GPUs give the best results: this configuration shows the largest decrease in loss value and the highest image-classification accuracy among the experiments. Furthermore, compared with training ViT without data parallelism, data parallelism with any number of GPUs improved the efficiency of ViT training.
Copyright: © 2023 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
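
The data-parallel scheme summarized in the abstract (split each batch evenly across workers, compute gradients locally, then average them) can be illustrated with a small CPU-only sketch. This is not the paper's code: the one-parameter linear model, the toy data, and the helper names `grad_mse`, `split_evenly`, and `data_parallel_grad` are hypothetical stand-ins for the ViT model, CIFAR10 batches, and GPUs. It only shows that, with equally sized shards, averaging per-worker gradients reproduces the full-batch gradient.

```python
# Illustrative sketch only: simulates data parallelism on CPU in plain Python.
# The linear model, data, and "workers" are hypothetical stand-ins for the
# ViT model, CIFAR10 batches, and GPUs mentioned in the abstract.

def grad_mse(w, xs, ys):
    """Mean gradient of 0.5 * (w*x - y)^2 with respect to w over one shard."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def split_evenly(items, num_workers):
    """Split a batch into equally sized contiguous shards, one per worker."""
    if len(items) % num_workers != 0:
        raise ValueError("batch size must be divisible by the number of workers")
    size = len(items) // num_workers
    return [items[i * size:(i + 1) * size] for i in range(num_workers)]

def data_parallel_grad(w, xs, ys, num_workers):
    """Each worker computes a gradient on its shard; the results are averaged."""
    x_shards = split_evenly(xs, num_workers)
    y_shards = split_evenly(ys, num_workers)
    local_grads = [grad_mse(w, xs_k, ys_k)
                   for xs_k, ys_k in zip(x_shards, y_shards)]
    # Averaging the local gradients plays the role of an all-reduce mean.
    return sum(local_grads) / num_workers

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
w = 0.5

full = grad_mse(w, xs, ys)                               # single-worker gradient
parallel = data_parallel_grad(w, xs, ys, num_workers=3)  # three simulated workers
print(abs(full - parallel) < 1e-12)
```

Because the averaged gradient matches the single-worker gradient in this idealized setting, any differences between GPU counts in a real run would have to come from other factors, such as the effective batch size or per-GPU settings, which this sketch does not model.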

Volume Title: Proceedings of the 2023 International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2023)
Series: Advances in Computer Science Research
Publication Date: 27 November 2023
ISBN: 978-94-6463-300-9
ISSN: 2352-538X

Cite this article

TY  - CONF
AU  - Zhengtao Feng
PY  - 2023
DA  - 2023/11/27
TI  - The Investigation of Data-Parallelism Strategy Based on VIT Model
BT  - Proceedings of the 2023 International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2023)
PB  - Atlantis Press
SP  - 589
EP  - 599
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-300-9_61
DO  - 10.2991/978-94-6463-300-9_61
ID  - Feng2023
ER  -

download .riscopy to clipboard