Construction and Deduction of a Multimodal Traffic Prediction System: From Baseline Models to Fusion Innovation

Zijun Chen; Zhengyuan Zhou

doi:10.2991/978-94-6239-648-7_30

Authors

Zijun Chen¹, Zhengyuan Zhou²,*

¹School of Science and Technology, Hong Kong Metropolitan University, Hong Kong, China

²Institute of Collaborative Innovation, University of Macau, Macau, China

*Corresponding author. Email: mc46674@um.edu.mo

Corresponding Author

Zhengyuan Zhou

Available Online 24 April 2026.

DOI: 10.2991/978-94-6239-648-7_30
Keywords: Traffic Prediction; Multimodal Data; Graph Neural Network; Temporal Convolutional Network; METR-LA
Abstract: With the rapid development of intelligent transportation systems, urban traffic flow prediction has become a core problem in urban management and traffic optimization. Traditional prediction methods often rely on single-modal data, such as historical speed or flow, and therefore cannot fully exploit the multi-source information available in traffic networks. This paper proposes a multimodal traffic flow prediction method that integrates Graph Neural Networks (GNN) with Temporal Convolutional Networks (TCN) to capture complex spatiotemporal dependencies, and that fuses multimodal data including historical speeds, weather conditions, event information, and road topology. Using the Metro Traffic Los Angeles (METR-LA) dataset from the Los Angeles traffic management department, the approach employs a gated attention mechanism to dynamically weight and combine features from the different modalities. Experimental results show that the method achieves Mean Absolute Error (MAE) values of 2.71, 3.55, and 4.63 km/h for 15-, 30-, and 60-minute predictions, respectively. These correspond to MAE reductions of 4.9%, 4.1%, and 5.1% at the three horizons relative to state-of-the-art models such as the Fusion Transformer Network (FusionTransNet). The method also shows significant improvements in Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE).
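Note on the reported metrics: the abstract quotes MAE, RMSE, and MAPE and a percentage improvement over a baseline, but this page does not give the exact definitions used. The following is a minimal Python sketch of the standard forms of these metrics, under two assumptions that are not confirmed by the source: that zero-valued targets are skipped when computing MAPE, and that "improvement" means (baseline error - model error) / baseline error. All function names and the toy speed values are hypothetical and are not taken from the paper.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |target - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.

    Assumption: targets equal to zero are skipped to avoid division by zero.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)


def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to the baseline (assumed definition)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


def implied_baseline(model_error, improvement_pct):
    """Baseline error implied by a model error and a quoted percentage reduction."""
    return model_error / (1.0 - improvement_pct / 100.0)


if __name__ == "__main__":
    # Toy speeds for illustration only; these are not data from the paper.
    observed = [60.0, 50.0, 40.0, 30.0]
    predicted = [58.0, 53.0, 40.0, 33.0]
    print("MAE :", mae(observed, predicted))
    print("RMSE:", rmse(observed, predicted))
    print("MAPE:", mape(observed, predicted))
    # Back-calculation from the abstract's 15-minute figures (2.71 km/h, 4.9%).
    print("Implied baseline MAE:", implied_baseline(2.71, 4.9))
```

Under the assumed definition, the back-calculation only indicates roughly what baseline MAE the quoted 4.9% reduction would correspond to at the 15-minute horizon; the baseline values themselves are not stated on this page.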
Copyright: © 2026 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the International Workshop on Advances in Deep Learning for Image Analysis and Computer Vision (IWADIC 2025)
Series: Advances in Computer Science Research
Publication Date: 24 April 2026
ISBN: 978-94-6239-648-7
ISSN: 2352-538X

Cite this article

TY  - CONF
AU  - Zijun Chen
AU  - Zhengyuan Zhou
PY  - 2026
DA  - 2026/04/24
TI  - Construction and Deduction of a Multimodal Traffic Prediction System: From Baseline Models to Fusion Innovation
BT  - Proceedings of the International Workshop on Advances in Deep Learning for Image Analysis and Computer Vision (IWADIC 2025)
PB  - Atlantis Press
SP  - 268
EP  - 279
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6239-648-7_30
DO  - 10.2991/978-94-6239-648-7_30
ID  - Chen2026
ER  -
