Characterising harmful data sources when constructing multi-fidelity surrogate models
Artificial Intelligence (IF 5.1), Pub Date: 2024-08-23, DOI: 10.1016/j.artint.2024.104207
Nicolau Andrés-Thió, Mario Andrés Muñoz, Kate Smith-Miles

Surrogate modelling techniques have received growing attention in recent years, being applied to both the modelling and the optimisation of industrial design problems. These techniques are highly relevant when assessing the performance of a particular design carries a high cost, as the overall cost can be reduced by constructing a model that is queried in lieu of the expensive source. The construction of these models can sometimes employ other sources of information that are cheaper but less accurate. The existence of these sources, however, raises the question of which sources should be used when constructing a model. Recent studies have attempted to characterise harmful data sources to guide practitioners in deciding when to ignore a given source. These studies have done so in a synthetic setting, characterising sources using large amounts of data that are not available in practice. Some of these studies have also been shown to potentially suffer from bias in the benchmarks used in their analysis. In this study, we approach the characterisation of harmful low-fidelity sources as an algorithm selection problem. We employ recently developed benchmark filtering techniques to conduct a bias-free assessment, providing objectively varied benchmark suites of different sizes for future research. Analysing one of these benchmark suites with the technique known as Instance Space Analysis, we provide an intuitive visualisation of when a low-fidelity source should be used. By performing this analysis using only the limited data available to train a surrogate model, we are able to provide guidelines that can be used directly in an applied industrial setting.
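The abstract frames the decision of whether to include a low-fidelity source as an algorithm selection problem that must be settled from the limited data available for training. As a rough illustration only (this is not the authors' Instance Space Analysis method), the sketch below uses leave-one-out cross-validation on the high-fidelity samples to compare a surrogate trained on high-fidelity data alone against a simple multi-fidelity surrogate that rescales the low-fidelity values and models the remaining discrepancy. The inverse-distance-weighted surrogate, the affine scaling hf ≈ rho·lf + c, the 5% margin and all function names are hypothetical choices made for this example; the test pair is the Forrester function with its commonly used low-fidelity variant.

```python
import math
from typing import List, Sequence, Tuple


def idw_predict(x: float, xs: Sequence[float], ys: Sequence[float],
                power: float = 2.0) -> float:
    """Inverse-distance-weighted interpolation: a deliberately simple surrogate."""
    num, den = 0.0, 0.0
    for xi, yi in zip(xs, ys):
        d = abs(x - xi)
        if d == 0.0:
            return yi  # interpolate exactly at a sampled point
        w = 1.0 / d ** power
        num += w * yi
        den += w
    return num / den


def fit_affine(lf: Sequence[float], hf: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of hf ~ rho * lf + c (hypothetical multi-fidelity scaling)."""
    n = len(lf)
    m_lf, m_hf = sum(lf) / n, sum(hf) / n
    var_lf = sum((a - m_lf) ** 2 for a in lf)
    cov = sum((a - m_lf) * (b - m_hf) for a, b in zip(lf, hf))
    # A constant low-fidelity signal carries no information: fall back to rho = 0.
    rho = cov / var_lf if var_lf > 1e-12 else 0.0
    return rho, m_hf - rho * m_lf


def loo_error(xs: List[float], y_hf: List[float], y_lf: List[float],
              use_lf: bool) -> float:
    """Root-mean-square leave-one-out error on the high-fidelity samples."""
    sq = 0.0
    for i in range(len(xs)):
        tx = xs[:i] + xs[i + 1:]
        thf = y_hf[:i] + y_hf[i + 1:]
        if use_lf:
            tlf = y_lf[:i] + y_lf[i + 1:]
            rho, c = fit_affine(tlf, thf)
            # Model the discrepancy left over after rescaling the low-fidelity values.
            resid = [h - (rho * lo + c) for h, lo in zip(thf, tlf)]
            pred = rho * y_lf[i] + c + idw_predict(xs[i], tx, resid)
        else:
            pred = idw_predict(xs[i], tx, thf)
        sq += (pred - y_hf[i]) ** 2
    return math.sqrt(sq / len(xs))


def select_source(xs: List[float], y_hf: List[float], y_lf: List[float],
                  margin: float = 0.05) -> str:
    """Use the low-fidelity source only if it cuts the LOO error by a relative `margin`."""
    hf_err = loo_error(xs, y_hf, y_lf, use_lf=False)
    mf_err = loo_error(xs, y_hf, y_lf, use_lf=True)
    return "multi-fidelity" if mf_err < (1.0 - margin) * hf_err else "high-fidelity only"


def forrester_hf(x: float) -> float:
    """Forrester test function, used here as the expensive source."""
    return (6 * x - 2) ** 2 * math.sin(12 * x - 4)


if __name__ == "__main__":
    xs = [i / 5 for i in range(6)]  # six high-fidelity samples on [0, 1]
    y_hf = [forrester_hf(x) for x in xs]
    # Commonly used low-fidelity Forrester variant: 0.5 f(x) + 10 (x - 0.5) - 5.
    y_lf = [0.5 * h + 10 * (x - 0.5) - 5 for x, h in zip(xs, y_hf)]
    print(select_source(xs, y_hf, y_lf))
```

Preferring the high-fidelity-only model unless the low-fidelity source clearly reduces the error is a design choice of this sketch; with only a handful of samples, the leave-one-out estimates are themselves noisy.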

Updated: 2024-08-23
