A general supply-inspect cost framework to regulate the reliability-usability trade-offs for few-shot inference, Complex & Intelligent Systems


Complex & Intelligent Systems (IF 5.0), Pub Date: 2024-08-19, DOI: 10.1007/s40747-024-01599-6
Fernando Martínez-Plumed, Gonzalo Jaimovitch-López, Cèsar Ferri, María José Ramírez-Quintana, José Hernández-Orallo

Language models and other recent machine learning paradigms blur the distinction between generative and discriminative tasks, in a continuum that is regulated by the degree of pre- and post-supervision that is required from users, as well as the tolerated level of error. In few-shot inference, we need to find a trade-off between the number and cost of the solved examples that have to be supplied, those that have to be inspected (some of them accurate but others needing correction) and those that are wrong but pass undetected. In this paper, we define a new Supply-Inspect Cost Framework, associated graphical representations and comprehensive metrics that consider all these elements. To optimise few-shot inference under specific operating conditions, we introduce novel algorithms that go beyond the concept of rejection rules in both static and dynamic contexts. We illustrate the effectiveness of all these elements for a transformative domain, data wrangling, for which language models can have a huge impact if we are able to properly regulate the reliability-usability trade-off, as we do in this paper.
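The trade-off described in the abstract can be made concrete with a small sketch. This is not the paper's Supply-Inspect Cost Framework or any of its algorithms; the abstract gives no formulas, so everything below is an illustrative assumption: the four unit costs, the names `Costs`, `total_cost` and `best_threshold`, and the simple rule "inspect every output whose confidence falls below a threshold". It only shows how supplied examples, inspected outputs, and undetected errors might be combined into one number and how a static rejection threshold could be picked against it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Costs:
    """Hypothetical unit costs; names and values are assumptions, not from the paper."""
    supply: float   # providing one solved example in the prompt
    inspect: float  # checking one model output
    correct: float  # extra cost of fixing an inspected output that was wrong
    miss: float     # a wrong output that is accepted without inspection


def total_cost(n_supplied: int,
               confidences: Sequence[float],
               is_correct: Sequence[bool],
               threshold: float,
               costs: Costs) -> float:
    """Total cost when outputs with confidence below `threshold` are inspected."""
    cost = n_supplied * costs.supply
    for conf, ok in zip(confidences, is_correct):
        if conf < threshold:
            cost += costs.inspect          # the user looks at this output
            if not ok:
                cost += costs.correct      # and has to fix it
        elif not ok:
            cost += costs.miss             # wrong, but passes undetected
    return cost


def best_threshold(n_supplied: int,
                   confidences: Sequence[float],
                   is_correct: Sequence[bool],
                   costs: Costs) -> float:
    """Pick the threshold with the lowest total cost on labelled validation data."""
    # Each distinct confidence is a candidate; infinity means "inspect everything".
    candidates = sorted(set(confidences)) + [float("inf")]
    return min(candidates,
               key=lambda t: total_cost(n_supplied, confidences, is_correct, t, costs))


if __name__ == "__main__":
    costs = Costs(supply=1.0, inspect=0.5, correct=1.0, miss=5.0)
    confidences = [0.9, 0.8, 0.4, 0.3]
    is_correct = [True, True, False, True]
    t = best_threshold(2, confidences, is_correct, costs)
    print(t, total_cost(2, confidences, is_correct, t, costs))
```

In this toy setting, raising the threshold trades more inspection effort for fewer undetected errors, which is the reliability-usability tension the abstract refers to. The paper's own algorithms are described as going beyond rejection rules of this kind, in both static and dynamic contexts.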


Updated: 2024-08-19
