Improving the Robustness of Pedestrian Detection in Autonomous Driving With Generative Data Augmentation
IEEE Network (IF 6.8), Pub Date: 2024-02-16, DOI: 10.1109/mnet.2024.3366232
Yalun Wu, Yingxiao Xiang, Endong Tong, Yuqi Ye, Zhibo Cui, Yunzhe Tian, Lejun Zhang, Jiqiang Liu, Zhen Han, Wenjia Niu
Pedestrian detection plays a crucial role in autonomous driving by identifying the position, size, orientation, and dynamic features of pedestrians in images or videos, helping autonomous vehicles make better decisions and control actions. Notably, the performance of pedestrian detection models depends largely on the quality and diversity of the available training data, yet current autonomous-driving datasets are limited in diversity, scale, and quality. In recent years, numerous studies have proposed data augmentation strategies to expand dataset coverage and maximize the utilization of existing training data. However, these augmentation methods often overlook the diversity of data scenarios. To overcome this challenge, in this paper we propose a more comprehensive data augmentation method based on image descriptions and diffusion models, designed to cover a wider range of scene variations, including different weather conditions and lighting situations. We design a classifier to select data samples for augmentation, then extract visual features via image captioning and convert them into high-level semantic information serving as textual descriptions of the corresponding samples. Finally, we use diffusion models to generate new variants. Additionally, we design three modification patterns to increase diversity in aspects such as weather conditions, lighting, and pedestrian poses. We conducted extensive experiments on the KITTI dataset and in real-world environments, demonstrating that our proposed method significantly improves the performance of pedestrian detection models in complex scenarios. This careful treatment of data augmentation notably enhances the applicability and robustness of pedestrian detection models in real autonomous driving scenarios.
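The abstract outlines a three-stage pipeline: a classifier selects samples worth augmenting, captions are turned into text prompts modified along three patterns (weather, lighting, pose), and a diffusion model generates variants from those prompts. The sketch below illustrates that flow only; all function names, the concrete modification phrases, and the selection threshold are illustrative assumptions, not the authors' implementation (the captioner, difficulty scorer, and diffusion model are passed in as callables, e.g. a Stable Diffusion img2img pipeline in practice).

```python
# Hypothetical sketch of the caption-and-diffusion augmentation pipeline
# described in the abstract. The three modification patterns and their
# phrases below are assumed for illustration, not taken from the paper.

MODIFICATION_PATTERNS = {
    "weather":  ["in heavy rain", "in dense fog", "in light snow"],
    "lighting": ["at dusk", "at night under streetlights", "in harsh sunlight"],
    "pose":     ["pedestrian crossing the road", "pedestrian waiting at a curb"],
}

def select_for_augmentation(samples, score_fn, threshold=0.5):
    """Classifier stage: keep samples whose score exceeds a threshold."""
    return [s for s in samples if score_fn(s) > threshold]

def build_prompt(caption, pattern, variant_idx=0):
    """Caption stage: append one modification phrase to the image caption."""
    phrases = MODIFICATION_PATTERNS[pattern]
    return f"{caption}, {phrases[variant_idx % len(phrases)]}"

def augment(samples, caption_fn, score_fn, generate_fn):
    """Full pipeline: select -> caption -> modify prompt -> generate variant."""
    variants = []
    for sample in select_for_augmentation(samples, score_fn):
        caption = caption_fn(sample)
        for pattern in MODIFICATION_PATTERNS:
            # generate_fn stands in for a text-conditioned diffusion model
            variants.append(generate_fn(build_prompt(caption, pattern)))
    return variants
```

In this sketch each selected sample yields one variant per pattern; a real setup could instead sample several phrases per pattern to further widen scene coverage.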