International Journal of Computer Vision (IF 11.6), Pub Date: 2024-09-30, DOI: 10.1007/s11263-024-02233-1
Ruicong Liu, Haofei Wang, Feng Lu
Gaze, as a pivotal indicator of human emotion, plays a crucial role in various computer vision tasks. However, the accuracy of gaze estimation often deteriorates significantly in unseen environments, which limits its practical value. Enhancing the generalizability of gaze estimators to new domains is therefore a critical challenge. A common limitation of existing domain adaptation research is its inability to identify and leverage the factors that truly influence the adaptation process. This shortcoming often results in limited accuracy and unstable adaptation. To address this issue, this article identifies a truly influential factor in the cross-domain problem: high-frequency components (HFC). The finding stems from an analysis of gaze jitter, a frequently overlooked but impactful issue in which predictions can deviate drastically even for visually similar input images. Inspired by this finding, we propose an "embed-then-suppress" HFC manipulation strategy to adapt gaze estimation to new domains. Our method first embeds additive HFC into the input images, then performs domain adaptation by suppressing the impact of HFC. Specifically, the suppression is carried out in a contrastive manner: each original image is paired with its HFC-embedded version, so that the method can suppress the impact of HFC by contrasting the representations within each pair. The proposed method is evaluated on four cross-domain gaze estimation tasks. The experimental results show that it not only improves gaze estimation accuracy but also significantly reduces gaze jitter in the target domain. Compared with previous studies, our method offers higher accuracy, reduced gaze jitter, and improved adaptation stability, indicating its potential for practical deployment.
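The abstract does not specify how HFC are extracted, where the embedded HFC come from, or the exact form of the contrastive objective, so the following is only a minimal sketch of the "embed-then-suppress" idea under stated assumptions. It assumes 2D grayscale images, a radial FFT high-pass filter with a hypothetical `cutoff`, HFC taken from a hypothetical `donor` image, and an InfoNCE-style loss in which each original image's positive is its own HFC-embedded version. All function names and parameters are illustrative and are not taken from the paper.

```python
import numpy as np


def high_frequency_components(image, cutoff=0.25):
    """Keep only spatial frequencies whose radius exceeds `cutoff` (cycles/pixel)."""
    fy = np.fft.fftfreq(image.shape[0])[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(image.shape[1])[None, :]  # horizontal frequencies
    high_pass = np.sqrt(fx ** 2 + fy ** 2) > cutoff
    # The mask is symmetric in frequency, so the result is real up to round-off.
    return np.real(np.fft.ifft2(np.fft.fft2(image) * high_pass))


def embed_hfc(image, donor, strength=0.5, cutoff=0.25):
    """Add the donor's HFC to `image` (an assumed embedding rule)."""
    return image + strength * high_frequency_components(donor, cutoff)


def pair_contrastive_loss(feat_orig, feat_embedded, temperature=0.1):
    """InfoNCE-style loss (assumed form): row i of `feat_orig` should match
    row i of `feat_embedded` and differ from the other rows in the batch."""
    a = feat_orig / np.linalg.norm(feat_orig, axis=1, keepdims=True)
    b = feat_embedded / np.linalg.norm(feat_embedded, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


# Usage with random stand-in data; a real gaze network would replace the
# flattening "encoder" used here.
rng = np.random.default_rng(0)
images = rng.random((4, 32, 32))
donors = rng.random((4, 32, 32))
embedded = np.stack([embed_hfc(x, d) for x, d in zip(images, donors)])
loss = pair_contrastive_loss(images.reshape(4, -1), embedded.reshape(4, -1))
```

Minimizing such a loss pushes the representation of an image and that of its HFC-embedded version together, which is one way to make the features less sensitive to HFC. The authors' actual objective may differ.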
Title: From Gaze Jitter to Domain Adaptation: Generalizing Gaze Estimation by Manipulating High-Frequency Components