Warping the Residuals for Image Editing with StyleGAN
International Journal of Computer Vision (IF 11.6), Pub Date: 2024-11-18, DOI: 10.1007/s11263-024-02301-6
Ahmet Burak Yildirim, Hamza Pehlivan, Aysegul Dundar

StyleGAN models show editing capabilities via their semantically interpretable latent organizations, which require successful GAN inversion methods to edit real images. Many works have been proposed for inverting images into StyleGAN’s latent space. However, their results either suffer from low fidelity to the input image or from poor editing quality, especially for edits that require large transformations. That is because low bit-rate latent spaces lose many image details to the information bottleneck, even though they provide an editable space. On the other hand, higher bit-rate latent spaces can pass all the image details to StyleGAN for perfect reconstruction but suffer from poor editing quality. In this work, we present a novel image inversion architecture that extracts high-rate latent features and includes a flow estimation module to warp these features to adapt them to edits. This is because edits often involve spatial changes in the image, such as adjustments to pose or smile. Thus, high-rate latent features must be accurately repositioned to match their new locations in the edited image space. We achieve this by employing flow estimation to determine the necessary spatial adjustments, followed by warping the features to align them correctly in the edited image. Specifically, we estimate the flows from the StyleGAN features of the edited and unedited latent codes. By estimating the high-rate features and warping them for edits, we achieve both high fidelity to the input image and high-quality edits. We run extensive experiments and compare our method with state-of-the-art inversion methods. Quantitative metrics and visual comparisons show significant improvements.
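To make the flow-estimate-and-warp step concrete, below is a minimal PyTorch sketch of the idea described in the abstract: a small network predicts a dense flow field from the StyleGAN features of the unedited and edited latent codes, and that flow is used to warp the high-rate residual features into the edited image space. The module names, network depth, and feature shapes (FlowEstimator, warp_features, 512-channel 64x64 features) are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of flow estimation from edited/unedited StyleGAN features
# followed by warping of high-rate residual features. Not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FlowEstimator(nn.Module):
    """Predicts a dense 2-D flow field from the StyleGAN features of the
    unedited and edited latent codes (channel-wise concatenated)."""

    def __init__(self, feat_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * feat_channels, 64, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 2, 3, padding=1),  # per-pixel (dx, dy) displacement
        )

    def forward(self, feat_unedited: torch.Tensor, feat_edited: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([feat_unedited, feat_edited], dim=1))


def warp_features(features: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp high-rate (residual) features with the predicted flow so that image
    details land at their new spatial locations in the edited image."""
    b, _, h, w = features.shape
    # Base sampling grid in the normalized [-1, 1] coordinates used by grid_sample.
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h, device=features.device),
        torch.linspace(-1.0, 1.0, w, device=features.device),
        indexing="ij",
    )
    base_grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    # Convert the pixel-space flow to the normalized coordinate range.
    flow_norm = torch.stack(
        [flow[:, 0] * 2.0 / max(w - 1, 1), flow[:, 1] * 2.0 / max(h - 1, 1)], dim=-1
    )
    return F.grid_sample(features, base_grid + flow_norm, align_corners=True)


# Toy usage with assumed 512-channel StyleGAN features at 64x64 resolution.
feat_unedited = torch.randn(1, 512, 64, 64)
feat_edited = torch.randn(1, 512, 64, 64)
high_rate_residuals = torch.randn(1, 512, 64, 64)

flow = FlowEstimator(512)(feat_unedited, feat_edited)  # (1, 2, 64, 64)
warped = warp_features(high_rate_residuals, flow)      # residuals aligned to the edit
```

The warped residuals would then be fed back into the generator alongside the edited latent code, so that reconstruction detail follows the spatial changes introduced by the edit.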



