Journal of Cleaner Production (IF 9.7) Pub Date: 2020-09-06, DOI: 10.1016/j.jclepro.2020.123795 Dominik Seiler, Garret E. O’Donnell
In the last decade, the data availability of industrial processes has gained increasing importance due to the rise of new digital technologies. In practice, many data gaps still exist as a result of missing or broken measuring devices, which impedes the endeavour of comprehensive data availability. Proxy metering devices represent an easy-to-adopt solution for missing data: they approximate unknown parameters with a mathematical model built on process knowledge and readings from available online meters. To date, the dissemination of proxy metering devices in industrial surroundings has been limited by a lack of general guidelines and the high complexity of available cases. This research therefore provides comprehensive instructions for developing proxy metering devices on the basis of small datasets. Complexity measures are applied to categorise five case datasets according to their inherent intricacy. Based on this classification, the performance of three regression algorithms (multiple linear regression, partial least squares regression, and neural networks) is analysed in combination with two dataset modification techniques, bootstrapping and artificial noise injection. The results highlight in particular the positive influence on neural networks of replicating a small training dataset by bootstrapping, and indicate when the addition of artificial noise is appropriate.
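The two dataset modification techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example readings, the replication factor, and the choice to scale Gaussian noise by each column's standard deviation are all assumptions made for illustration, and the abstract does not state whether noise is applied to inputs, targets, or both.

```python
import random
import statistics


def bootstrap_replicate(rows, n_samples, rng):
    """Draw n_samples rows with replacement from the original dataset."""
    return [rng.choice(rows) for _ in range(n_samples)]


def inject_noise(rows, column_stds, rel_sigma, rng):
    """Add zero-mean Gaussian noise to every value.

    Assumption: the noise standard deviation for a column is
    rel_sigma times that column's standard deviation.
    """
    return [
        tuple(v + rng.gauss(0.0, rel_sigma * column_stds[j])
              for j, v in enumerate(row))
        for row in rows
    ]


def augment(rows, factor, rel_sigma, seed=0):
    """Enlarge a small dataset by bootstrapping, then perturb it with noise."""
    rng = random.Random(seed)
    # Column spreads are measured on the original data, not the resample.
    column_stds = [statistics.pstdev(col) for col in zip(*rows)]
    resampled = bootstrap_replicate(rows, factor * len(rows), rng)
    return inject_noise(resampled, column_stds, rel_sigma, rng)


# Hypothetical readings: two available meters plus the quantity to be proxied.
small = [(1.0, 2.0, 3.1), (1.5, 2.2, 3.9), (2.0, 2.9, 5.2), (2.5, 3.1, 5.8)]
augmented = augment(small, factor=5, rel_sigma=0.05)
print(len(augmented), "rows after augmentation")
```

The augmented rows would then serve as training data for whichever regression model is being evaluated. Setting `rel_sigma=0.0` reduces the procedure to plain bootstrap replication.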
Title (translated from the Chinese listing): Filling the gaps: comparing proxy metering techniques for small data scenarios