Robust car-following based on deep reinforcement learning
Transportation Research Part C: Emerging Technologies (IF 7.6). Pub Date: 2024-01-17. DOI: 10.1016/j.trc.2024.104486. Fabian Hart, Ostap Okhrin, Martin Treiber
One of the biggest challenges in the development of learning-driven automated driving technologies remains the handling of uncommon, rare events that may not have been encountered in training. Especially when a model is trained on real driving data, unusual situations such as emergency braking maneuvers may be underrepresented, resulting in a model that lacks robustness in rare events. This study focuses on car-following based on reinforcement learning and demonstrates that existing approaches, trained with real driving data, fail to handle safety-critical situations. Since collecting data that represents all kinds of possible car-following events, including safety-critical situations, is challenging, we propose a training environment that harnesses stochastic processes to generate diverse and challenging scenarios.
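As an illustration of what such a stochastic scenario generator might look like, the following Python sketch draws a leader speed profile from a mean-reverting (Ornstein-Uhlenbeck-like) random process. The process choice, parameter names, and values are assumptions made for illustration, not the paper's exact training environment.

```python
import numpy as np

def sample_leader_speed_profile(duration_s=300.0, dt=0.1, v_mean=15.0,
                                v_max=35.0, theta=0.05, sigma=1.0, seed=None):
    """Draw a leader speed trajectory from a mean-reverting
    (Ornstein-Uhlenbeck-like) process, clipped to [0, v_max].

    Illustrative only: the process choice and parameters are assumptions,
    not the specification used in the paper.
    """
    rng = np.random.default_rng(seed)
    n = int(duration_s / dt)
    v = np.empty(n)
    v[0] = v_mean
    for k in range(1, n):
        drift = theta * (v_mean - v[k - 1]) * dt          # pull back toward v_mean
        noise = sigma * np.sqrt(dt) * rng.standard_normal()  # random speed fluctuation
        v[k] = np.clip(v[k - 1] + drift + noise, 0.0, v_max)
    return v

# Example: one 5-minute leader episode sampled at 10 Hz
leader_speeds = sample_leader_speed_profile(seed=42)
```

Because the sampled trajectories can reach standstill and include sharp decelerations, such an environment naturally exposes the agent to rare, safety-critical regimes that real driving logs rarely contain.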
Our experiments show that training with real data can lead to models that collide in safety-critical situations, whereas the proposed model exhibits excellent performance and remains accident-free, comfortable, and string-stable even in extreme scenarios such as full braking by the leading vehicle. Its robustness is demonstrated by simulating car-following scenarios for various reward-function parametrizations and for a diverse range of artificial and real leader data that were not included in training and were qualitatively different from the training data. We further show that conventional reward designs can encourage aggressive behavior when approaching other vehicles. Finally, we compare the proposed model with classical car-following models and find that it achieves equal or superior results.
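For context, rewards of the kind discussed here typically trade off gap tracking against comfort while heavily penalizing collisions. The sketch below is a hypothetical parametrization in that spirit; the terms, weights, and names (w_gap, w_jerk, collision_penalty) are illustrative assumptions, not the paper's actual reward function.

```python
def car_following_reward(gap, ego_speed, jerk,
                         desired_time_gap=1.5, w_gap=1.0, w_jerk=0.1,
                         collision_penalty=100.0):
    """Hypothetical car-following reward: track a desired time gap,
    penalize jerk for comfort, and strongly penalize collisions.

    The structure and weights are illustrative assumptions; the paper
    studies several reward parametrizations of its own.
    """
    if gap <= 0.0:
        # Collision (or non-positive gap): dominant negative reward.
        return -collision_penalty
    desired_gap = desired_time_gap * ego_speed
    gap_error = abs(gap - desired_gap) / max(desired_gap, 1.0)
    comfort_cost = jerk ** 2
    return -(w_gap * gap_error + w_jerk * comfort_cost)
```

If the gap-tracking term dominates, a design of this shape can reward closing in on the leader as quickly as possible, which is one way a conventional reward can induce the aggressive approaching behavior mentioned above.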