Exploring Bi-Level Inconsistency via Blended Images for Generalizable Face Forgery Detection
IEEE Transactions on Information Forensics and Security (IF 6.3), Pub Date: 2024-06-20, DOI: 10.1109/tifs.2024.3417266
Peiqi Jiang 1, Hongtao Xie 1, Lingyun Yu 1, Guoqing Jin 2, Yongdong Zhang 1

The challenge of generalization in face forgery detection has become increasingly prominent as manipulation techniques continue to evolve. Although recent image blending-based methods have shown remarkable potential, their performance often drops sharply on datasets with large domain gaps. This limitation arises because prior methods rely exclusively on blending unaltered faces with various augmentations to produce common artifacts, which ignores the inherent characteristics of the forged regions. To fully exploit the potential of image blending-based methods for generalizable Deepfake detection, we propose a novel image synthesis framework called Bi-Level Inconsistency Generator (Bi-LIG) to introduce bi-level inconsistency in the synthesized images. Specifically, Bi-LIG generates synthetic images by blending source and target images from both pristine and forged image sets, introducing a) Extrinsic-Inconsistency between real and pseudo-forged regions, and b) Inherent-Inconsistency between real and manipulated areas. In this way, Bi-LIG creates a diverse synthesized image set and establishes a generalizable training domain. Furthermore, we propose a novel face forgery detection network named Token Consistency Constrained Vision Transformer, in which two modules are developed based on patch consistency learning. First, a Patch Token Contrast module is employed to learn the bi-level patch inconsistencies. Second, a Progressive Patch Token Assemble module is adopted to aggregate local patch relations and enhance the inconsistency representations. Experimental results demonstrate the effectiveness and superiority of our method on both in-dataset and cross-dataset evaluations. Notably, our approach outperforms state-of-the-art methods by 5.09% and 10.15% on cross-dataset evaluations on DFDCp and DFDC, respectively.
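
As an illustration of the image-blending step the abstract refers to, the following is a minimal sketch of generic mask-based alpha blending, I = M * S + (1 - M) * T, where S is the source image, T the target image, and M a soft mask with values in [0, 1]. The abstract does not give Bi-LIG's formulation, so this formula, the function name `blend`, and the grayscale list-of-lists image representation are all assumptions made for illustration. It is not the authors' implementation.

```python
def blend(source, target, mask):
    """Alpha-blend two same-sized grayscale images (lists of rows).

    Hypothetical helper for illustration only; not the Bi-LIG code.
    A mask value of 1.0 keeps the source pixel, 0.0 keeps the target
    pixel, and values in between mix the two.
    """
    if not (len(source) == len(target) == len(mask)):
        raise ValueError("source, target and mask must have the same height")
    blended = []
    for src_row, tgt_row, mask_row in zip(source, target, mask):
        if not (len(src_row) == len(tgt_row) == len(mask_row)):
            raise ValueError("source, target and mask must have the same width")
        blended.append(
            [m * s + (1.0 - m) * t for s, t, m in zip(src_row, tgt_row, mask_row)]
        )
    return blended


# Usage: the left pixel is taken from the source, the right pixel from
# the target, and the middle pixel is an even mix of both.
source = [[200, 200, 200]]
target = [[100, 100, 100]]
mask = [[1.0, 0.5, 0.0]]
result = blend(source, target, mask)
```

In this sketch the mask boundary is where a blended image becomes inconsistent with its surroundings. Per the abstract, Bi-LIG draws the source and target from both pristine and forged image sets, so that a synthesized image can carry inconsistency from the blend as well as from an already manipulated region.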

Updated: 2024-06-20