HQ3DAvatar: High-quality Implicit 3D Head Avatar
ACM Transactions on Graphics (IF 7.8), Pub Date: 2024-04-09, DOI: 10.1145/3649889
Kartik Teotia, Mallikarjun B R, Xingang Pan, Hyeongwoo Kim, Pablo Garrido, Mohamed Elgharib, Christian Theobalt

Multi-view volumetric rendering techniques have recently shown great potential in modeling and synthesizing high-quality head avatars. A common approach to capturing full-head dynamic performance is to track the underlying geometry using a mesh-based template or 3D cube-based graphics primitives. While these model-based approaches achieve promising results, they often fail to learn complex geometric details such as the mouth interior, hair, and topological changes over time. This article presents a novel approach to building highly photorealistic digital head avatars. Our method learns a canonical space via an implicit function parameterized by a neural network. It leverages multiresolution hash encoding in the learned feature space, enabling high-quality output, faster training, and high-resolution rendering. At test time, our method is driven by a monocular RGB video. Here, an image encoder extracts face-specific features that also condition the learnable canonical space. This encourages deformation-dependent texture variations during training. We also propose a novel optical flow-based loss that ensures correspondences in the learned canonical space, thus encouraging artifact-free and temporally consistent renderings. We show results on challenging facial expressions and demonstrate free-viewpoint renderings at interactive real-time rates for a resolution of 480×270. Our method outperforms related approaches both visually and numerically. We will release our multiple-identity dataset to encourage further research.
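The multiresolution hash encoding the abstract mentions refers to the grid-hashing scheme popularized by Instant-NGP (Müller et al., 2022). For illustration, below is a minimal PyTorch sketch of such an encoding; the level count, table size, and resolution range are hypothetical defaults, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class HashEncoding(nn.Module):
    """Minimal multiresolution hash encoding sketch (Instant-NGP style)."""

    def __init__(self, n_levels=16, n_features=2, log2_table_size=19,
                 base_res=16, max_res=2048):
        super().__init__()
        self.n_levels = n_levels
        self.table_size = 2 ** log2_table_size
        # Per-level grid resolutions grow geometrically from base_res to max_res.
        growth = (max_res / base_res) ** (1.0 / (n_levels - 1))
        self.register_buffer("resolutions", torch.tensor(
            [int(base_res * growth ** l) for l in range(n_levels)]))
        # One learnable feature table per level, initialized near zero.
        self.tables = nn.Parameter(
            torch.empty(n_levels, self.table_size, n_features).uniform_(-1e-4, 1e-4))
        # Large primes from the Instant-NGP spatial hash.
        self.register_buffer("primes", torch.tensor([1, 2654435761, 805459861]))

    def _hash(self, ijk):
        # ijk: (..., 3) integer grid coords -> (...,) indices into a level's table.
        h = ijk * self.primes
        return (h[..., 0] ^ h[..., 1] ^ h[..., 2]) % self.table_size

    def forward(self, xyz):
        # xyz: (N, 3) points in [0, 1]^3 -> (N, n_levels * n_features) features.
        feats = []
        for level in range(self.n_levels):
            pos = xyz * self.resolutions[level]
            lo, w = pos.floor().long(), pos - pos.floor()
            feat = 0.0
            for corner in range(8):  # trilinear blend over the 8 cell corners
                offset = torch.tensor([(corner >> d) & 1 for d in range(3)],
                                      device=xyz.device)
                weight = torch.prod(torch.where(offset.bool(), w, 1.0 - w),
                                    dim=-1, keepdim=True)
                feat = feat + weight * self.tables[level][self._hash(lo + offset)]
            feats.append(feat)
        return torch.cat(feats, dim=-1)
```

In a typical pipeline, `HashEncoding()(torch.rand(1024, 3))` yields per-point feature vectors (here of size 32) that feed a small MLP predicting density and color. Likewise, the optical flow-based loss is only named in the abstract; one plausible instantiation, sketched below under assumed inputs (per-pixel canonical coordinates predicted for two frames and a precomputed flow field), penalizes canonical-space disagreement between flow-linked pixels. The tensor layout and the L1 penalty are assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def canonical_flow_loss(canon_t, canon_t1, flow_t_to_t1):
    """Penalize canonical-space disagreement between flow-linked pixels.

    canon_t, canon_t1: (B, 3, H, W) per-pixel canonical coordinates for
                       frames t and t+1 (assumed network outputs).
    flow_t_to_t1:      (B, 2, H, W) optical flow from t to t+1, in pixels.
    Occlusion masking is omitted for brevity.
    """
    B, _, H, W = canon_t.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=canon_t.device, dtype=torch.float32),
        torch.arange(W, device=canon_t.device, dtype=torch.float32),
        indexing="ij")
    # Where each pixel of frame t lands in frame t+1.
    x1 = xs + flow_t_to_t1[:, 0]
    y1 = ys + flow_t_to_t1[:, 1]
    # Normalize to [-1, 1] for grid_sample, fetch the matched canonical coords.
    grid = torch.stack([2 * x1 / (W - 1) - 1, 2 * y1 / (H - 1) - 1], dim=-1)
    canon_matched = F.grid_sample(canon_t1, grid, align_corners=True)
    return (canon_t - canon_matched).abs().mean()
```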