Good results from sensor data: Performance of machine learning algorithms for regression problems in chemical sensors
Sensors and Actuators B: Chemical ( IF 8.0 ) Pub Date : 2024-08-29 , DOI: 10.1016/j.snb.2024.136528 Lajos Höfler
Accurately predicting unseen data, instead of mere memorization of training examples, is a critical goal of machine learning. This generalization is particularly important in the field of chemical sensors, where the ability to accurately predict the chemical properties or concentration levels of unknown samples is crucial. The paper presents a comprehensive yet accessible introduction to various machine learning concepts, highlighting the importance of model interpretability and generalization in ensuring reliable and accurate results in this context. Nonlinear sensor array data are utilized to introduce key concepts (e.g., bias-variance tradeoff) and techniques (linear models, partial least squares regression, support vector machines, k-nearest neighbors, decision trees, ensemble methods, automated machine learning, symbolic regression, and artificial neural networks), providing a solid foundation to make informed decisions when selecting machine learning techniques for sensor-specific regression applications. The results clearly indicate a number of conclusions. First, overparameterized deep feedforward neural networks show great accuracy and generalization when trained on a sufficiently large dataset. Second, symbolic regression models proved to be more accurate than deep feedforward neural networks and classical machine learning techniques on smaller datasets. Third, the performance of various machine learning models was dataset-dependent, showing the importance of comparative studies to determine the most suitable approach. It is clear that the optimal model cannot be known a priori. This paper aims to provide a starting point for investigations on the performance of different machine learning techniques in chemical sensor applications.
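The distinction the abstract draws between memorizing training examples and generalizing to unseen samples can be illustrated with a small comparative sketch. The example below is not the paper's data or code: it assumes a hypothetical single sensor with a saturating (nonlinear) response, generates synthetic readings, and compares a linear model against k-nearest neighbors on a held-out test set. All function names (`make_data`, `fit_linear`, `fit_knn`, `rmse`) and the response curve are illustrative assumptions.

```python
import math
import random

def make_data(n, rng):
    """Synthetic (signal, concentration) pairs; the response curve is an assumption."""
    data = []
    for _ in range(n):
        conc = rng.uniform(0.0, 10.0)                              # hypothetical concentration
        signal = conc / (1.0 + 0.5 * conc) + rng.gauss(0.0, 0.02)  # saturating response + noise
        data.append((signal, conc))
    return data

def fit_linear(train):
    """Ordinary least squares for one feature, in closed form."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_knn(train, k):
    """k-nearest neighbors regression: average the targets of the k closest signals."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict

def rmse(model, data):
    """Root-mean-square error of a model over (signal, concentration) pairs."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in data) / len(data))

rng = random.Random(0)  # fixed seed so the run is repeatable
train, test = make_data(200, rng), make_data(100, rng)

models = {
    "linear": fit_linear(train),
    "1-NN": fit_knn(train, 1),
    "5-NN": fit_knn(train, 5),
}

# Report error on the training set and on samples the models never saw.
results = {name: (rmse(m, train), rmse(m, test)) for name, m in models.items()}
for name, (train_err, test_err) in results.items():
    print(f"{name:7s} train RMSE = {train_err:.3f}   test RMSE = {test_err:.3f}")
```

The 1-NN model reproduces every training target exactly, so its training error is zero, yet its error on held-out samples is not. That gap is why models are compared on unseen data. Which model ranks best here depends on the assumed response curve, noise level, and dataset size, so the ranking from this sketch should not be read as a general result.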
Updated: 2024-08-29