Cross-Image Context Matters for Bongard Problems, arXiv - CS - Computer Vision and Pattern Recognition

arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2023-09-07, DOI: arxiv-2309.03468
Nikhil Raghuraman, Adam W. Harley, Leonidas Guibas

Current machine learning methods struggle to solve Bongard problems, which are a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images, and then classifying whether or not a new query image depicts the key concept. On Bongard-HOI, a benchmark for natural-image Bongard problems, existing methods have only reached 66% accuracy (where chance is 50%). Low accuracy is often attributed to neural nets' lack of ability to find human-like symbolic rules. In this work, we point out that many existing methods are forfeiting accuracy due to a much simpler problem: they do not incorporate information contained in the support set as a whole, and rely instead on information extracted from individual supports. This is a critical issue, because unlike in few-shot learning tasks concerning object classification, the "key concept" in a typical Bongard problem can only be distinguished using multiple positives and multiple negatives. We explore a variety of simple methods to take this cross-image context into account, and demonstrate substantial gains over prior methods, leading to new state-of-the-art performance on Bongard-LOGO (75.3%) and Bongard-HOI (72.45%) and strong performance on the original Bongard problem set (60.84%).
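To make the per-image versus set-level distinction concrete, here is a toy sketch. It is not the authors' method: the abstract does not specify their architectures, the 3-dimensional vectors below are hypothetical stand-ins for image embeddings, and the set-level rule is a swapped-in diagonal Fisher-style linear discriminant chosen only for illustration. Dimension 0 carries the hypothetical "key concept", while dimensions 1 and 2 are nuisance features that vary widely within both the positive and the negative set. Comparing the query with each support in isolation (nearest neighbor) gets fooled by the nuisance dimensions. Pooling statistics over all positives and all negatives reveals which dimension actually separates the two sets.

```python
from statistics import fmean, pvariance
from typing import Sequence

Vec = Sequence[float]

# Hypothetical support set: dim 0 encodes the concept, dims 1-2 are nuisance.
POSITIVES = [[1.0, 5.0, -3.0], [1.1, -4.0, 2.0], [0.9, 0.0, 6.0]]
NEGATIVES = [[0.0, 4.0, -2.0], [0.1, -5.0, 3.0], [-0.1, 1.0, 5.0]]


def nearest_support(query: Vec, positives: Sequence[Vec], negatives: Sequence[Vec]) -> bool:
    """Per-image baseline: compare the query with each support independently."""
    def sq_dist(a: Vec, b: Vec) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best_pos = min(sq_dist(query, p) for p in positives)
    best_neg = min(sq_dist(query, n) for n in negatives)
    return best_pos < best_neg  # True means "query depicts the concept"


def set_level_classifier(query: Vec, positives: Sequence[Vec], negatives: Sequence[Vec],
                         eps: float = 1e-6) -> bool:
    """Set-level rule: weight each dimension by how well it separates the whole
    positive set from the whole negative set (diagonal Fisher-style discriminant)."""
    score = 0.0
    for d in range(len(query)):
        pos_vals = [p[d] for p in positives]
        neg_vals = [n[d] for n in negatives]
        mu_pos, mu_neg = fmean(pos_vals), fmean(neg_vals)
        # Within-set spread, pooled across both sets; needs several supports per side.
        pooled_var = (pvariance(pos_vals) + pvariance(neg_vals)) / 2
        weight = (mu_pos - mu_neg) / (pooled_var + eps)
        # Signed distance from the midpoint between the two set means.
        score += weight * (query[d] - (mu_pos + mu_neg) / 2)
    return score > 0


query_pos = [1.0, 4.2, -2.2]  # hypothetical query that has the concept
query_neg = [0.0, -4.1, 2.1]  # hypothetical query that lacks it

print("per-image:", nearest_support(query_pos, POSITIVES, NEGATIVES),
      nearest_support(query_neg, POSITIVES, NEGATIVES))        # → per-image: False True
print("set-level:", set_level_classifier(query_pos, POSITIVES, NEGATIVES),
      set_level_classifier(query_neg, POSITIVES, NEGATIVES))   # → set-level: True False
```

The per-image rule misclassifies both queries because each lands closest to a support of the opposite label on the nuisance dimensions. The set-level rule only works because it has several positives and several negatives from which to estimate per-set means and spreads, which mirrors the abstract's point that the key concept can only be distinguished using multiple positives and multiple negatives.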

Updated: 2023-09-08