CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View
Complex & Intelligent Systems (IF 5.0), Pub Date: 2024-07-27, DOI: 10.1007/s40747-024-01567-0
Peicheng Shi, Zhiqiang Liu, Xinlong Dong, Aixi Yang

In the wave of research on autonomous driving, 3D object detection from the Bird's Eye View (BEV) perspective has emerged as a pivotal area of focus. The essence of this challenge is the effective fusion of camera and LiDAR data in the BEV space. Current approaches predominantly train and predict in the front view and Cartesian coordinate system, often overlooking the inherent structural and operational differences between camera and LiDAR sensors. This paper introduces CL-FusionBEV, a 3D object detection method tailored for sensor data fusion in the BEV perspective. Our approach begins with a view transformation in which an implicit learning module lifts camera features into BEV space, aligning them with the prediction module. To bring LiDAR into the same representation, we voxelize the point cloud and project it into BEV space, generating LiDAR BEV spatial features. To integrate the BEV features of both modalities, we develop a multi-modal cross-attention mechanism and an implicit multi-modal fusion network that strengthen the synergy between the two data sources. To counteract the loss of global reasoning and feature interaction that can arise from multi-modal cross-attention alone, we further propose a BEV self-attention mechanism that performs global feature operations over the fused BEV map. We evaluate our method on the large-scale nuScenes autonomous driving dataset. CL-FusionBEV achieves a mean Average Precision (mAP) of 73.3% and a nuScenes Detection Score (NDS) of 75.5%, detecting cars and pedestrians with high accuracies of 89% and 90.7%, respectively, and it outperforms existing methods on occluded and distant objects.
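The abstract does not disclose implementation details, but the fusion stages it names follow a common pattern: camera BEV features query LiDAR BEV features via cross-attention, and a self-attention pass over the fused map restores global interaction. The following minimal PyTorch sketch illustrates that pattern under stated assumptions; the class, layer choices, and shapes are illustrative guesses, not the authors' code, and it assumes both modalities have already been projected to BEV feature maps of matching shape.

import torch
import torch.nn as nn

class BEVFusionBlock(nn.Module):
    """Hypothetical sketch of camera-LiDAR BEV fusion: cross-attention
    (camera queries, LiDAR keys/values) followed by BEV self-attention
    for global reasoning. Not the paper's exact architecture."""

    def __init__(self, dim=128, heads=8):
        super().__init__()
        # Camera BEV tokens attend to LiDAR BEV tokens.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Self-attention over the fused BEV map for global interaction.
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, cam_bev, lidar_bev):
        # cam_bev, lidar_bev: (B, C, H, W) BEV feature maps.
        B, C, H, W = cam_bev.shape
        cam = cam_bev.flatten(2).transpose(1, 2)      # (B, H*W, C)
        lidar = lidar_bev.flatten(2).transpose(1, 2)  # (B, H*W, C)
        # Cross-attention: camera queries gather LiDAR evidence.
        fused, _ = self.cross_attn(cam, lidar, lidar)
        fused = self.norm1(fused + cam)
        # Self-attention over the whole BEV grid.
        out, _ = self.self_attn(fused, fused, fused)
        out = self.norm2(out + fused)
        return out.transpose(1, 2).reshape(B, C, H, W)

# Demo on a small 32x32 BEV grid; a production-size grid (e.g. 200x200)
# would need windowed or deformable attention to stay tractable.
fusion = BEVFusionBlock(dim=128)
cam_bev = torch.randn(1, 128, 32, 32)
lidar_bev = torch.randn(1, 128, 32, 32)
bev_features = fusion(cam_bev, lidar_bev)  # (1, 128, 32, 32)

Using camera features as queries and LiDAR features as keys/values lets the geometrically precise LiDAR evidence correct the depth-ambiguous camera projection; the trailing self-attention pass addresses the global-reasoning deficit the abstract attributes to cross-attention alone.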



Updated: 2024-07-27