On Mitigating Stability-Plasticity Dilemma in CLIP-guided Image Morphing via Geodesic Distillation Loss
International Journal of Computer Vision (IF 11.6), Pub Date: 2024-12-10, DOI: 10.1007/s11263-024-02308-z
Yeongtak Oh, Saehyung Lee, Uiwon Hwang, Sungroh Yoon

Large-scale language-vision pre-training models, such as CLIP, have achieved remarkable results in text-guided image morphing by leveraging several unconditional generative models. However, existing CLIP-guided methods struggle to achieve photorealistic morphing when adapting the generator from the source domain to the target domain. Specifically, current guidance methods fail to provide detailed explanations of the morphing regions within the image, leading to misguidance and catastrophic forgetting of the original image’s fidelity. In this paper, we propose a novel approach that uses proper regularization losses to overcome these difficulties by addressing the stability-plasticity (SP) dilemma in CLIP guidance. Our approach consists of two key components: (1) a geodesic cosine similarity loss that minimizes the distance between inter-modality features (i.e., image and text) in a projected subspace of CLIP space, and (2) a latent regularization loss that minimizes the distance between intra-modality features (i.e., image and image) on the image manifold. As a drop-in replacement for the naive directional CLIP loss, our method achieves superior morphing results for both images and videos across various benchmarks, including CLIP-inversion.
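
For readers unfamiliar with the baseline, the following is a minimal sketch of the naive directional CLIP loss that the abstract says is being replaced, in the form commonly used in CLIP-guided generator adaptation: one minus the cosine similarity between the image-embedding shift and the text-embedding shift. It also shows the arc-length (geodesic) distance between two embeddings on the unit hypersphere, purely to illustrate what "geodesic" means for normalized features. The abstract does not give the paper's actual loss formula or its projected subspace, so this is not the authors' method. All function names and the toy 3-dimensional vectors are hypothetical stand-ins for real CLIP embeddings.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v."""
    return dot(normalize(u), normalize(v))


def directional_clip_loss(img_src, img_tgt, txt_src, txt_tgt):
    """Naive directional loss: 1 - cos(delta_image, delta_text).

    img_src / img_tgt: embeddings of the source and adapted images.
    txt_src / txt_tgt: embeddings of the source and target prompts.
    """
    delta_img = [b - a for a, b in zip(img_src, img_tgt)]
    delta_txt = [b - a for a, b in zip(txt_src, txt_tgt)]
    return 1.0 - cosine_similarity(delta_img, delta_txt)


def geodesic_distance(u, v):
    """Arc length between u and v after projecting both onto the unit sphere."""
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    c = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(c)


# Toy stand-ins for CLIP embeddings (hypothetical values, not model output).
img_src, img_tgt = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]
txt_src, txt_tgt = [0.0, 0.0, 1.0], [0.0, 1.0, 1.0]

# The image shift and the text shift point the same way, so the loss is zero.
loss = directional_clip_loss(img_src, img_tgt, txt_src, txt_tgt)

# Orthogonal embeddings are a quarter circle apart on the unit sphere.
angle = geodesic_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Note that the directional loss only constrains the direction of the embedding shift and places no penalty on how far the adapted image drifts from the source, which is consistent with the forgetting problem the abstract describes.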




Updated: 2024-12-10