Representing Long Volumetric Video with Temporal Gaussian Hierarchy
ACM Transactions on Graphics (IF 7.8) · Pub Date: 2024-11-19 · DOI: 10.1145/3687919
Zhen Xu, Yinghao Xu, Zhiyuan Yu, Sida Peng, Jiaming Sun, Hujun Bao, Xiaowei Zhou

This paper aims to address the challenge of reconstructing long volumetric videos from multi-view RGB videos. Recent dynamic view synthesis methods leverage powerful 4D representations, such as feature grids or point cloud sequences, to achieve high-quality rendering results. However, they are typically limited to short (1–2 s) video clips and often suffer from large memory footprints when dealing with longer videos. To solve this issue, we propose a novel 4D representation, named Temporal Gaussian Hierarchy, to compactly model long volumetric videos. Our key observation is that dynamic scenes generally exhibit varying degrees of temporal redundancy, as they consist of areas changing at different speeds. Motivated by this, our approach builds a multi-level hierarchy of 4D Gaussian primitives, where each level separately describes scene regions with a different degree of content change and adaptively shares Gaussian primitives to represent unchanged scene content across temporal segments, thus effectively reducing the number of Gaussian primitives. In addition, the tree-like structure of the Gaussian hierarchy allows us to efficiently represent the scene at a particular moment with a subset of Gaussian primitives, leading to nearly constant GPU memory usage during training and rendering regardless of the video length. Moreover, we design a Compact Appearance Model that mixes diffuse and view-dependent Gaussians to further minimize the model size while maintaining rendering quality. We also develop a hardware-accelerated rasterization pipeline for Gaussian primitives to improve rendering speed. Extensive experimental results demonstrate the superiority of our method over alternative methods in terms of training cost, rendering speed, and storage usage. To our knowledge, this work is the first approach capable of efficiently handling hours of volumetric video data while maintaining state-of-the-art rendering quality.
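The core selection mechanism described above can be illustrated with a small sketch. This is not the authors' implementation; the class, method names, and the choice of a binary temporal subdivision are assumptions made for illustration: level l splits the video into 2^l equal segments, each segment stores the Gaussian primitives whose content changes at that temporal granularity, and rendering a time t touches exactly one segment per level, so the active set stays roughly constant no matter how long the video is.

```python
class TemporalGaussianHierarchy:
    """Illustrative sketch of a temporal hierarchy over Gaussian primitives.

    Level l divides the video duration into 2**l equal segments; slowly
    changing (or static) content lives at coarse levels and is shared
    across long time spans, while fast-changing content lives at fine
    levels covering short segments.
    """

    def __init__(self, duration, num_levels):
        self.duration = duration
        self.num_levels = num_levels
        # levels[l][s] -> Gaussian ids stored in segment s of level l
        self.levels = [[[] for _ in range(2 ** l)] for l in range(num_levels)]

    def _segment(self, level, time):
        # Index of the segment at `level` that covers `time`.
        return min(int(time / self.duration * 2 ** level), 2 ** level - 1)

    def insert(self, gaussian_id, level, time):
        """Assign a Gaussian to the segment of `level` covering `time`."""
        self.levels[level][self._segment(level, time)].append(gaussian_id)

    def active_gaussians(self, time):
        """Collect one segment per level: the subset needed to render `time`."""
        active = []
        for level in range(self.num_levels):
            active.extend(self.levels[level][self._segment(level, time)])
        return active


h = TemporalGaussianHierarchy(duration=10.0, num_levels=3)
h.insert("static_bg", level=0, time=0.0)  # shared across the whole video
h.insert("slow_obj", level=1, time=7.0)   # second half only
h.insert("fast_obj", level=2, time=7.0)   # third quarter only
print(h.active_gaussians(7.0))  # ['static_bg', 'slow_obj', 'fast_obj']
print(h.active_gaussians(1.0))  # ['static_bg']
```

Because each query gathers exactly one segment per level, the number of primitives loaded for any frame depends on the hierarchy depth and scene complexity, not on the total video length, which mirrors the near-constant GPU memory usage claimed in the abstract.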

Updated: 2024-11-19