Look Ma, no markers: holistic performance capture without the hassle
ACM Transactions on Graphics (IF 7.8), Pub Date: 2024-11-19, DOI: 10.1145/3687772
Charlie Hewitt, Fatemeh Saleh, Sadegh Aliakbarian, Lohit Petikam, Shideh Rezaeifar, Louis Florentin, Zafiirah Hosenie, Thomas J. Cashman, Julien Valentin, Darren Cosker, Tadas Baltrusaitis

We tackle the problem of highly accurate, holistic performance capture for the face, body and hands simultaneously. Motion-capture technologies used in film and game production typically focus on face, body or hand capture independently, involve complex and expensive hardware, and require a high degree of manual intervention from skilled operators. While machine-learning-based approaches exist to overcome these problems, they usually only support a single camera, often operate on a single part of the body, do not produce precise world-space results, and rarely generalize outside specific contexts. In this work, we introduce the first technique for marker-free, high-quality reconstruction of the complete human body, including eyes and tongue, without requiring any calibration, manual intervention or custom hardware. Our approach produces stable world-space results from arbitrary camera rigs and supports varied capture environments and clothing. We achieve this through a hybrid approach that combines machine learning models trained exclusively on synthetic data with powerful parametric models of human shape and motion. We evaluate our method on a number of body, face and hand reconstruction benchmarks and demonstrate state-of-the-art results that generalize across diverse datasets.

Updated: 2024-11-19