Pri-DDQN: learning adaptive traffic signal control strategy through a hybrid agent
Complex & Intelligent Systems (IF 5.0), Pub Date: 2024-11-18, DOI: 10.1007/s40747-024-01651-5
Yanliu Zheng, Juan Luo, Han Gao, Yi Zhou, Keqin Li
Adaptive traffic signal control is the core of intelligent transportation systems (ITS). It can effectively relieve traffic congestion and improve travel efficiency. Methods based on the deep Q-learning network (DQN) have become the mainstream approach to single-intersection traffic signal control. However, most of them neglect differences in the importance of samples and the dependencies between traffic states, and they cannot respond quickly to randomly changing traffic flows. In this paper, we propose Pri-DDQN, a new reinforcement learning method for single-intersection traffic signal control. We model the traffic environment as a reinforcement learning environment in which the agent chooses the best action to schedule traffic flow at the intersection based on real-time traffic states. With the goal of minimizing waiting time and queue length at the intersection, we train the agent with double DQN, incorporate traffic state and reward into the loss function, and update the target network parameters asynchronously to improve the agent's learning ability. We use a power function to adjust the exploration rate dynamically in order to accelerate convergence. In addition, we introduce a priority-based dynamic experience replay mechanism that increases the sampling rate of important samples. The results show that Pri-DDQN achieves better performance: compared with the best baseline, it reduces the average queue length at the intersection by 13.41% and the average waiting time by 32.33%.
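The abstract says the agent is trained with double DQN, but it does not spell out the modified loss (which also incorporates traffic state and reward) or the asynchronous target update. The sketch below therefore shows only the standard double DQN bootstrap target: the online network selects the next signal phase and the target network evaluates it. Q-values are plain lists, one entry per phase action, and the function names `double_dqn_target` and `td_error` are hypothetical.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Standard double DQN target for a single transition (illustrative only).

    q_online_next: Q(s', a; theta) from the online network, one value per phase.
    q_target_next: Q(s', a; theta_minus) from the target network.
    """
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward
    # The online network *selects* the greedy next action...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...and the target network *evaluates* it. Decoupling selection from
    # evaluation reduces the overestimation bias of vanilla DQN's max operator.
    return reward + gamma * q_target_next[best_action]


def td_error(q_taken: float, target: float) -> float:
    """Temporal-difference error for the action actually taken."""
    return target - q_taken
```

For example, with a (hypothetical) reward of -2.0, `gamma=0.9`, online next-state values `[1.0, 3.0, 2.0]` and target values `[0.5, 1.5, 4.0]`, the online network picks phase 1, so the target is -2.0 + 0.9 × 1.5 = -0.65. Vanilla DQN would instead take the target network's maximum (4.0) and get 1.6.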
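The abstract states only that a power function adjusts the exploration rate; it does not give the formula. As an assumption, the sketch below uses the hypothetical schedule ε(t) = ε_end + (ε_start − ε_end)(1 − t/T)^p together with ε-greedy phase selection. The names `power_epsilon` and `select_phase`, and all default values, are illustrative and not taken from the paper.

```python
import random
from typing import Sequence


def power_epsilon(step: int, total_steps: int, eps_start: float = 1.0,
                  eps_end: float = 0.01, power: float = 2.0) -> float:
    """Hypothetical power-function exploration schedule.

    Decays from eps_start to eps_end over total_steps following (1 - t/T) ** power.
    power > 1 drops epsilon quickly at first; power < 1 keeps exploring longer.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp training progress to [0, 1]
    return eps_end + (eps_start - eps_end) * (1.0 - frac) ** power


def select_phase(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice among signal phases (actions)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: random phase
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit: best phase
```

With the defaults, ε starts at 1.0, falls to 0.2575 halfway through training and stays at 0.01 after `total_steps`. The exponent is the single knob that trades early exploration against faster convergence.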
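The abstract does not say how Pri-DDQN assigns priorities in its dynamic experience replay. As a stand-in, the sketch below uses the well-known proportional prioritized replay scheme (Schaul et al.). Each transition's priority is its absolute TD error plus a small constant, sampling probability is proportional to priority^α, and importance-sampling weights correct the resulting bias. The class name and the `alpha`, `beta` and `eps` defaults are hypothetical.

```python
import random
from typing import Any, List, Tuple


class PrioritizedReplay:
    """Minimal proportional prioritized replay buffer (illustrative, O(N) sampling)."""

    def __init__(self, capacity: int, alpha: float = 0.6, beta: float = 0.4,
                 eps: float = 1e-6, seed: int = 0) -> None:
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (0 = uniform)
        self.beta = beta    # strength of the importance-sampling correction
        self.eps = eps      # keeps zero-error transitions sampleable
        self.data: List[Any] = []
        self.priorities: List[float] = []
        self.pos = 0        # next slot to overwrite once the buffer is full
        self.rng = random.Random(seed)

    def add(self, transition: Any) -> None:
        # New transitions get the current max priority so they are sampled soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition  # overwrite the oldest entry
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int) -> Tuple[List[int], List[Any], List[float]]:
        if not self.data:
            raise ValueError("buffer is empty")
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idxs = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the batch maximum
        # (a common simplification of normalizing over the whole buffer).
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs: List[int], td_errors: List[float]) -> None:
        # Larger |TD error| -> higher priority -> sampled more often.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, one would sample a batch, compute TD errors with the double DQN target, scale each squared error by its weight, and then call `update_priorities` with the new errors. A larger buffer would normally use a sum tree for O(log N) sampling; the linear scan here keeps the sketch short.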