SuS-X: Training-Free Name-Only Transfer of Vision-Language Models
arXiv - CS - Computation and Language, Pub Date: 2022-11-28, DOI: arxiv-2211.16198
Vishaal Udandarao, Ankush Gupta, Samuel Albanie

Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet effective way to train large-scale vision-language models. CLIP demonstrates impressive zero-shot classification and retrieval on diverse downstream tasks. However, to leverage its full potential, fine-tuning still appears to be necessary. Fine-tuning the entire CLIP model can be resource-intensive and unstable. Moreover, recent methods that aim to circumvent this need for fine-tuning still require access to images from the target distribution. In this paper, we pursue a different approach and explore the regime of training-free "name-only transfer" in which the only knowledge we possess about the downstream task comprises the names of downstream target categories. We propose a novel method, SuS-X, consisting of two key building blocks, SuS and TIP-X, that requires neither intensive fine-tuning nor costly labelled data. SuS-X achieves state-of-the-art zero-shot classification results on 19 benchmark datasets. We further show the utility of TIP-X in the training-free few-shot setting, where we again achieve state-of-the-art results over strong training-free baselines. Code is available at https://github.com/vishaal27/SuS-X.
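
To make the "name-only" setting concrete, the following is a minimal sketch of generic CLIP-style zero-shot classification, where the classifier is built from category names alone. It is not the SuS-X method itself (the abstract does not describe SuS or TIP-X in detail). The prompt template, the temperature value, and the toy 3-dimensional feature vectors standing in for real image and text encoder outputs are all illustrative assumptions.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def build_prompts(class_names, template="a photo of a {}."):
    """Turn bare category names into text prompts (template is an assumption)."""
    return [template.format(name) for name in class_names]

def zero_shot_classify(image_feature, text_features, temperature=100.0):
    """Softmax over scaled cosine similarities between one image and each class prompt."""
    img = l2_normalize(image_feature)
    logits = [
        temperature * sum(a * b for a, b in zip(img, l2_normalize(t)))
        for t in text_features
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The only task knowledge used: the category names.
class_names = ["cat", "dog", "car"]
prompts = build_prompts(class_names)

# Toy vectors standing in for encoder outputs (hypothetical values, one per prompt).
text_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_feature = [0.2, 0.9, 0.1]

probs = zero_shot_classify(image_feature, text_features)
predicted = class_names[probs.index(max(probs))]
print(prompts[class_names.index(predicted)], "->", predicted)
```

In a real pipeline the toy vectors would be replaced by features from a pre-trained vision-language model; no images or labels from the target distribution are needed to build the classifier, which is the constraint the name-only regime imposes.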

Updated: 2022-11-30