Encoding Distributional Soft Actor-Critic for Autonomous Driving in Multi-Lane Scenarios [Research Frontier]
IEEE Computational Intelligence Magazine (IF 10.3), Pub Date: 4-5-2024, DOI: 10.1109/mci.2024.3364428
Jingliang Duan 1, Yangang Ren 2, Fawang Zhang 3, Jie Li 2, Shengbo Eben Li 2, Yang Guan 2, Keqiang Li 2

This paper proposes a new reinforcement learning (RL) algorithm, called encoding distributional soft actor-critic (E-DSAC), for decision-making in autonomous driving. Unlike existing RL-based decision-making methods, E-DSAC handles a variable number of surrounding vehicles and does not require manually pre-designed sorting rules, which results in higher policy performance and generality. First, an encoding distributional policy iteration (DPI) framework is developed by embedding a permutation-invariant module in the distributional RL framework; the module employs a feature neural network (NN) to encode the indicators of each vehicle. The encoding DPI framework is proven to have important convergence and global optimality properties. Next, building on this framework, the E-DSAC algorithm is obtained by adding a gradient-based update rule for the feature NN to the policy evaluation process of the DSAC algorithm. A multi-lane driving task and a corresponding reward function are then designed to verify the effectiveness of the proposed algorithm. Results show that the policy learned by E-DSAC realizes efficient, smooth, and relatively safe autonomous driving in the designed scenario, and that its final policy performance surpasses that of DSAC by approximately threefold. Its effectiveness has also been verified in real-vehicle experiments.
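The permutation-invariant encoding described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how per-vehicle features are combined, so sum pooling is assumed here, and the two-layer network, the layer sizes, and the names `init_feature_net` and `encode_vehicles` are hypothetical choices made only for illustration.

```python
import numpy as np


def init_feature_net(in_dim, hidden_dim, out_dim, seed=0):
    """Random weights for a small two-layer feature network (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }


def encode_vehicles(params, vehicles):
    """Encode a (num_vehicles, in_dim) array into one fixed-size vector.

    The same network is applied to every vehicle, then the per-vehicle
    features are summed. Sum pooling is an assumption of this sketch; it
    makes the result independent of row order and of the number of rows.
    """
    hidden = np.tanh(vehicles @ params["W1"] + params["b1"])
    features = hidden @ params["W2"] + params["b2"]
    return features.sum(axis=0)


# Each row is one surrounding vehicle; the 4 indicators are placeholders
# (for example relative position and relative speed), not taken from the paper.
params = init_feature_net(in_dim=4, hidden_dim=16, out_dim=8)
rng = np.random.default_rng(1)
three_vehicles = rng.normal(size=(3, 4))
five_vehicles = rng.normal(size=(5, 4))

code_a = encode_vehicles(params, three_vehicles)
code_b = encode_vehicles(params, three_vehicles[::-1])  # same vehicles, reversed order
code_c = encode_vehicles(params, five_vehicles)         # different vehicle count
```

Because the pooled vector has the same length for any number of vehicles and does not depend on their order, a downstream policy or value network can consume it without a hand-written sorting rule, which is the property the abstract emphasizes.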

Updated: 2024-08-19