Multi-agent Reinforcement Learning for Traffic Signal Control: Comparison with Centralized, Decentralized and Hybrid Approaches
- DOI
- 10.2991/978-94-6463-972-8_6
- Keywords
- Traffic Signal Control; Deep Reinforcement Learning; Multi-agent System
- Abstract
Unproductive time spent in transportation not only frustrates passengers but also incurs significant economic costs globally. The rise of autonomous transportation systems has intensified the need for adaptive traffic signal control (ATSC) methods capable of responding to dynamic traffic conditions. Multi-agent reinforcement learning (MARL), which treats each traffic signal as an intelligent agent, offers a promising approach to ATSC in complex urban environments, yet its real-world deployment remains challenging due to complex spatial-temporal dependencies and the need for inter-agent coordination. Existing deep learning-based ATSC approaches often rely on historical or current traffic states, resulting in a one-step temporal lag that limits real-time responsiveness. To address this, we propose a multi-agent traffic signal control framework that integrates a sequence-to-sequence (seq2seq) time-series prediction module with a value decomposition network (VDN). The seq2seq model predicts future traffic states from historical data, enabling proactive signal control. To further enhance temporal representation and decision stability, we incorporate a gated recurrent unit (GRU) into the VDN architecture. Experimental evaluations on two real-world traffic networks, Jinan12 and Hangzhou16, show that our method outperforms baseline models, including independent deep Q-learning with GRU (IDQN-GRU) and standard VDN, on the Hangzhou16 dataset and achieves performance comparable to VDN on the Jinan12 dataset.
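The two components named in the abstract can be sketched to make their relationship concrete: each intersection is an agent whose Q-network encodes its observation history with a GRU, and VDN combines agents by summing their chosen Q-values into a joint value. The NumPy sketch below is illustrative only, not the authors' implementation; the single-layer GRU, the dimensions, the initialization scheme, and the class names (`GRUCell`, `GRUAgent`, `vdn_joint_q`) are all assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    # Minimal GRU cell: update gate z, reset gate r, candidate state h~.
    def __init__(self, in_dim, hid_dim):
        s = 1.0 / np.sqrt(hid_dim)
        self.Wz = rng.uniform(-s, s, (hid_dim, in_dim + hid_dim))
        self.Wr = rng.uniform(-s, s, (hid_dim, in_dim + hid_dim))
        self.Wh = rng.uniform(-s, s, (hid_dim, in_dim + hid_dim))

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)
        r = sigmoid(self.Wr @ xh)
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde

class GRUAgent:
    # Per-intersection Q-network: a GRU rolls over the observation sequence,
    # and a linear head maps the final hidden state to one Q-value per phase.
    def __init__(self, obs_dim, n_actions, hid_dim=16):
        self.cell = GRUCell(obs_dim, hid_dim)
        self.Wq = rng.uniform(-0.1, 0.1, (n_actions, hid_dim))
        self.hid_dim = hid_dim

    def q_values(self, obs_seq):
        h = np.zeros(self.hid_dim)
        for x in obs_seq:
            h = self.cell.step(x, h)
        return self.Wq @ h

def vdn_joint_q(agents, obs_seqs, actions):
    # VDN's additive assumption: Q_tot(tau, a) = sum_i Q_i(tau_i, a_i).
    return sum(ag.q_values(o)[a] for ag, o, a in zip(agents, obs_seqs, actions))

# Two intersections, 5-step observation histories, 3 signal phases each.
agents = [GRUAgent(obs_dim=4, n_actions=3) for _ in range(2)]
histories = [rng.normal(size=(5, 4)) for _ in range(2)]
joint_q = vdn_joint_q(agents, histories, actions=[0, 1])
```

Because the mixing is a plain sum, each agent can act greedily on its own Q-values at execution time while training remains centralized on `Q_tot`, which is the coordination property that motivates VDN here.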
- Copyright
- © 2025 The Author(s)
- Open Access
This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
- Cite this article
TY  - CONF
AU  - Lip Wei Ong
AU  - Hooi Ling Khoo
AU  - Kok-Lim Yau
AU  - Chuan Fang Ong
PY  - 2025
DA  - 2025/12/29
TI  - Multi-agent Reinforcement Learning for Traffic Signal Control: Comparison with Centralized, Decentralized and Hybrid Approaches
BT  - Proceedings of the 14th Asia-Pacific Conference on Transportation and the Environment (APTE 2025)
PB  - Atlantis Press
SP  - 50
EP  - 60
SN  - 2589-4943
UR  - https://doi.org/10.2991/978-94-6463-972-8_6
DO  - 10.2991/978-94-6463-972-8_6
ID  - Ong2025
ER  -