Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach, ACM Computing Surveys

ACM Computing Surveys (IF 23.8), Pub Date: 2024-12-18, DOI: 10.1145/3708497
Firas Bayram, Bestoun S. Ahmed

Artificial intelligence (AI), and especially its sub-field of Machine Learning (ML), is affecting everyone's daily life through its ubiquitous applications. In recent years, AI researchers and practitioners have introduced principles and guidelines for building systems that make reliable and trustworthy decisions. From a practical perspective, conventional ML systems process historical data to extract features, which are subsequently used to train ML models that perform the desired task. In practice, however, a fundamental challenge arises when the system must be operationalized and deployed so that it can evolve and operate continuously in real-life environments. To address this challenge, Machine Learning Operations (MLOps) has emerged as a potential recipe for standardizing ML solutions in deployment. Although MLOps has demonstrated great success in streamlining ML processes, thoroughly defining the specifications of robust MLOps approaches remains of great interest to researchers and practitioners. In this paper, we provide a comprehensive overview of the trustworthiness property of MLOps systems. Specifically, we highlight technical practices for achieving robust MLOps systems. In addition, we survey existing research approaches that address the robustness aspects of ML systems in production. We also review the tools and software available for building MLOps systems and summarize their support for handling the robustness aspects. Finally, we present the open challenges and propose possible future directions and opportunities within this emerging field. The aim of this paper is to give researchers and practitioners working on practical AI applications a comprehensive view for adopting robust ML solutions in production environments.

Updated: 2024-12-18
