Multi-view human pose and shape estimation via mesh-aligned voxel interpolation
Information Fusion (IF 14.7), Pub Date: 2024-08-26, DOI: 10.1016/j.inffus.2024.102651
Yixuan Zhang, Jiguang Zhang, Shibiao Xu, Jun Xiao

Although multi-view human pose and shape regression methods can draw on information from other views for complementation and correction, existing methods still fail to fully exploit the multi-view setup, and thus fall short of efficiently aligning and merging features across views. To tackle these problems, we propose a multi-view framework in which features from all views are aligned and merged through multi-view voxel aggregation with inverse projection. Our framework has three major characteristics. First, a multi-view volumetric aggregation module improves prediction by exploiting information from feature maps at different scales. Second, instead of using all voxels, a mesh-aligned voxel selection module makes prediction efficient by eliminating redundant background voxels. Third, the framework further improves parametric human body modeling through a dual-branch strategy, with one branch for parametric human model prediction and the other for 3D keypoint prediction; their mutual influence is critical to improving both tasks. Additionally, since the scarcity of datasets also hinders the development of multi-view methods, we propose an approach for creating occlusion datasets specifically for the multi-view occlusion case. Experimental results verify the effectiveness of the proposed framework on two benchmarks, Human3.6M and MPI-INF-3DHP.
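The core idea of multi-view voxel aggregation with inverse projection can be illustrated with a minimal NumPy sketch: project each voxel center into every camera view, sample the corresponding 2D feature, and average across the views in which the voxel is visible. Note this is an illustrative reconstruction, not the paper's implementation; the function name, nearest-neighbor sampling, and mean fusion are assumptions for clarity (the paper operates on multi-scale feature maps inside a learned network).

```python
import numpy as np

def aggregate_voxel_features(feature_maps, proj_mats, grid_min, grid_max, res):
    """Fuse per-view 2D features into a 3D voxel grid by inverse projection.

    feature_maps: list of (H, W, C) arrays, one per view.
    proj_mats:    list of (3, 4) camera projection matrices, one per view.
    grid_min/max: (3,) world-space bounds of the voxel grid.
    res:          number of voxels per axis.
    Returns a (res, res, res, C) grid of view-averaged features.
    """
    # Voxel centers in world coordinates, flattened to (N, 3).
    axes = [np.linspace(grid_min[d], grid_max[d], res) for d in range(3)]
    centers = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    homog = np.concatenate([centers, np.ones((len(centers), 1))], axis=1)

    C = feature_maps[0].shape[-1]
    acc = np.zeros((len(centers), C))
    cnt = np.zeros((len(centers), 1))
    for fmap, P in zip(feature_maps, proj_mats):
        uvw = homog @ P.T                      # project voxel centers: (N, 3)
        u = uvw[:, 0] / uvw[:, 2]              # pixel coordinates
        v = uvw[:, 1] / uvw[:, 2]
        H, W, _ = fmap.shape
        ui, vi = np.round(u).astype(int), np.round(v).astype(int)
        # Keep only voxels in front of the camera that land inside the image.
        valid = (uvw[:, 2] > 0) & (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
        acc[valid] += fmap[vi[valid], ui[valid]]   # nearest-neighbor sampling
        cnt[valid] += 1
    # Average over the views that see each voxel; unseen voxels stay zero.
    vox = acc / np.maximum(cnt, 1)
    return vox.reshape(res, res, res, C)
```

In the paper's pipeline, a grid like this would then be filtered by the mesh-aligned voxel selection module, so only voxels near the estimated body surface are kept for regression.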

Updated: 2024-08-26