Object Pose Estimation Based on Multi-precision Vectors and Seg-Driven PnP
International Journal of Computer Vision (IF 11.6), Pub Date: 2024-12-07, DOI: 10.1007/s11263-024-02317-y
Yulin Wang, Hongli Li, Chen Luo

Object pose estimation from a single RGB image has wide application potential but is difficult to achieve. Existing pose estimation methods use various inference pipelines. One popular pipeline first uses a Convolutional Neural Network (CNN) to predict the 2D projections of 3D keypoints in a single RGB image and then calculates the 6D pose via a Perspective-n-Point (PnP) solver. Because of the gap between synthetic and real data, a model trained on synthetic data has difficulty predicting the 6D pose accurately when applied to real data. To address this problem, we propose a two-stage pipeline for object pose estimation based on multi-precision vectors and segmentation-driven (Seg-Driven) PnP. In the keypoint localization stage, we first develop a CNN-based three-branch network to predict multi-precision 2D vectors pointing to 2D keypoints. We then introduce an accurate and fast Keypoint Voting scheme of Multi-precision vectors (KVM), which computes low-precision 2D keypoints from low-precision vectors and refines them using mid- and high-precision vectors. In the pose calculation stage, we propose Seg-Driven PnP to refine the 3D translation of poses and obtain the optimal pose by minimizing the non-overlapping area between segmented and rendered masks. Seg-Driven PnP leverages 2D segmentation trained on real images to improve the accuracy of pose estimation trained on synthetic data, thereby reducing the synthetic-to-real gap. Extensive experiments show that our approach substantially outperforms state-of-the-art methods on the LM and HB datasets. Importantly, the proposed method works reasonably well for weakly textured and occluded objects in diverse scenes.
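
The mask-based selection criterion in the abstract (choosing the pose whose rendered mask has the smallest non-overlapping area with the segmented mask) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `non_overlap_area` and `select_best_candidate` are hypothetical, the non-overlapping area is assumed here to be the pixel count of the XOR of two binary masks, and the rendered masks for each candidate pose are assumed to be supplied by some external renderer that is not shown.

```python
import numpy as np


def non_overlap_area(seg_mask, rendered_mask):
    """Count pixels covered by exactly one of the two binary masks (XOR area)."""
    seg = np.asarray(seg_mask, dtype=bool)
    ren = np.asarray(rendered_mask, dtype=bool)
    if seg.shape != ren.shape:
        raise ValueError("masks must have the same shape")
    return int(np.logical_xor(seg, ren).sum())


def select_best_candidate(seg_mask, rendered_masks):
    """Return (index, cost) of the rendered mask with the smallest XOR area."""
    costs = [non_overlap_area(seg_mask, m) for m in rendered_masks]
    best = int(np.argmin(costs))
    return best, costs[best]


def square_mask(top, left, size=4, shape=(8, 8)):
    """Toy stand-in for a renderer: a filled square at the given offset."""
    mask = np.zeros(shape, dtype=bool)
    mask[top:top + size, left:left + size] = True
    return mask


# Toy data: the "segmented" object is a 4x4 square; each candidate mask
# stands in for the silhouette rendered under one candidate translation.
seg = square_mask(2, 2)
candidates = [square_mask(0, 0), square_mask(2, 3), square_mask(2, 2)]

best_index, best_cost = select_best_candidate(seg, candidates)
print(best_index, best_cost)
```

In this toy setup the third candidate coincides with the segmented mask, so it is the one selected. A real pipeline would generate the candidates by perturbing the translation returned by a PnP solver and rendering the object model under each perturbed pose.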




Updated: 2024-12-07