Complex & Intelligent Systems (IF 5.0) | Pub Date: 2024-12-19 | DOI: 10.1007/s40747-024-01696-6 | Wei Sun, Ruijia Cui, Qianzhou Wang, Xianguang Kong, Yanning Zhang
We present a novel learning model with attention and prior guidance for view synthesis. In contrast to previous works that focus on optimizing for specific scenes with densely captured views, our model explores a generic deep neural framework to reconstruct radiance fields from a limited number of input views. To address challenges arising from under-constrained conditions, our approach employs cost volumes for geometry-aware scene reasoning, and integrates relevant knowledge from the ray-cast space and the surrounding-view space using an attention model. Additionally, a denoising diffusion model learns a prior over scene color, facilitating regularization of the training process and enabling high-quality radiance field reconstruction. Experimental results on diverse benchmark datasets demonstrate that our approach can generalize across scenes and produce realistic view synthesis results using only three input images, surpassing the performance of previous state-of-the-art methods. Moreover, our reconstructed radiance field can be effectively optimized by fine-tuning the target scene to achieve higher quality results with reduced optimization time. The code will be released at https://github.com/dsdefv/nerf.
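The core rendering pipeline sketched in the abstract (aggregate per-view features at each ray sample with attention, decode density and color, then alpha-composite along the ray) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the variance-style attention score (borrowed from MVS cost-volume matching) and the softplus/sigmoid decoding heads are illustrative stand-ins for the paper's learned attention model and MLPs.

```python
import numpy as np

def attention_aggregate(feats):
    """Softmax attention over source views at each ray sample.

    feats: (S, V, C) per-sample, per-view features. The score, negative
    squared distance to the mean view feature (a variance-style cost as in
    MVS cost volumes), is an assumed stand-in for a learned attention model.
    """
    mean = feats.mean(axis=1, keepdims=True)           # (S, 1, C)
    scores = -((feats - mean) ** 2).sum(axis=2)        # (S, V)
    scores -= scores.max(axis=1, keepdims=True)        # stabilize softmax
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)                  # weights sum to 1 per sample
    return (w[:, :, None] * feats).sum(axis=1), w      # (S, C), (S, V)

def volume_render(colors, sigmas, deltas):
    """Standard NeRF quadrature: alpha-composite colors along one ray."""
    alphas = 1.0 - np.exp(-sigmas * deltas)            # (S,) per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # transmittance
    weights = alphas * trans                           # (S,) compositing weights
    return (weights[:, None] * colors).sum(axis=0), weights

# Toy end-to-end pass for a single ray with 3 source views.
rng = np.random.default_rng(0)
S, V, C = 16, 3, 8                                     # samples, views, channels
feats = rng.normal(size=(S, V, C))                     # hypothetical sampled features
agg, attn = attention_aggregate(feats)
# Assumed decoding heads: softplus density, sigmoid color from feature channels.
sigmas = np.log1p(np.exp(agg[:, 0]))                   # (S,) non-negative density
colors = 1.0 / (1.0 + np.exp(-agg[:, 1:4]))            # (S, 3) colors in [0, 1]
deltas = np.full(S, 0.1)                               # uniform sample spacing
pixel, ray_weights = volume_render(colors, sigmas, deltas)
```

The diffusion prior in the paper would enter as an extra regularization term on rendered colors during training; it is omitted here for brevity.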
Regularizing generalizable neural radiance fields with limited-view images