MINTIME: Multi-Identity Size-Invariant Video Deepfake Detection
IEEE Transactions on Information Forensics and Security ( IF 6.3 ) Pub Date : 2024-06-03 , DOI: 10.1109/tifs.2024.3409054
Davide Alessandro Coccomini 1 , Giorgos Kordopatis-Zilos 2 , Giuseppe Amato 1 , Roberto Caldelli 3 , Fabrizio Falchi 1 , Symeon Papadopoulos 4 , Claudio Gennaro 1

In this paper, we present MINTIME, a video deepfake detection method that effectively captures spatial and temporal inconsistencies in videos that depict multiple individuals and varying face sizes. Unlike previous approaches that either employ simplistic a-posteriori aggregation schemes, i.e., averaging or max operations, or only focus on the largest face in the video, our proposed method learns to accurately detect spatio-temporal inconsistencies across multiple identities in a video through a Spatio-Temporal Transformer combined with a Convolutional Neural Network backbone. This is achieved through an Identity-aware Attention mechanism that applies a masking operation on the face sequence to process each identity independently, which enables effective video-level aggregation. Furthermore, our system incorporates two novel embedding schemes: (i) the Temporal Coherent Positional Embedding, which encodes the temporal information of the face sequences of each identity, and (ii) the Size Embedding, which captures the relative sizes of the faces to the video frames. MINTIME achieves state-of-the-art performance on the ForgeryNet dataset, with a remarkable improvement of up to 14% AUC in videos containing multiple people. Moreover, it demonstrates very robust generalization capabilities in cross-forgery and cross-dataset settings. The code is publicly available at: https://github.com/davide-coccomini/MINTIME-Multi-Identity-size-iNvariant-TIMEsformer-for-Video-Deepfake-Detection .
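
The identity-aware masking and the size embedding described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes that each face token carries an integer identity id, that a token may attend only to tokens of the same identity, and that the relative face size is quantized into a fixed number of buckets before being looked up in an embedding table. The function names, the masking rule, and the bucket count `n_buckets` are all hypothetical choices made for illustration.

```python
import math
from typing import List


def identity_mask(identity_ids: List[int]) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    Assumption: attention is allowed only between tokens of the same identity.
    """
    return [[a == b for b in identity_ids] for a in identity_ids]


def masked_softmax(scores: List[float], allowed: List[bool]) -> List[float]:
    """Softmax restricted to allowed positions; masked positions get weight 0.

    Assumes at least one position is allowed (true for any row of
    identity_mask, because every token shares an identity with itself).
    """
    peak = max(s for s, ok in zip(scores, allowed) if ok)  # numerical stability
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


def relative_size_bucket(face_w: int, face_h: int,
                         frame_w: int, frame_h: int,
                         n_buckets: int = 10) -> int:
    """Quantize the face-area / frame-area ratio into an embedding index."""
    ratio = (face_w * face_h) / (frame_w * frame_h)
    return min(int(ratio * n_buckets), n_buckets - 1)


# Five face tokens from one video: identity 0 appears three times, identity 1 twice.
ids = [0, 0, 1, 0, 1]
mask = identity_mask(ids)

# Raw attention scores of token 0 against all five tokens (made-up numbers).
weights = masked_softmax([0.2, 1.0, 3.0, -0.5, 0.7], mask[0])
print(weights)  # positions 2 and 4 (identity 1) receive zero weight

# A 960x540 face in a 1920x1080 frame covers a quarter of the frame area.
print(relative_size_bucket(960, 540, 1920, 1080))
```

In a real Transformer the same boolean mask would typically be applied to the attention logits of every head before the softmax, and the bucket index would select a learned vector that is added to the token embedding.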

Updated: 2024-06-03