Computers in Industry (IF 8.2), Pub Date: 2023-11-28, DOI: 10.1016/j.compind.2023.104053. Authors: François Loison, Benoit Eynard
Structured enterprise information systems such as Enterprise Resource Planning (ERP) and Product Lifecycle Management (PLM) have reached a plateau of maturity and store up to hundreds of millions of objects and links. These data are crucial for enterprise processes and operations, and they are frequently the target of data transformations such as migration to a new data system, reorganisation according to new business paradigms, cleansing, purging, and archiving. Making such a transformation manageable, iterative, and achievable requires a divide-and-conquer strategy that splits the data into loosely coupled data packages. Most data migration methods recommend a divide-and-conquer strategy but do not explain how to produce these loosely coupled packages. This paper shows that there are two different approaches to producing them, each relying on a wide range of algorithms: clustering and community detection. In addition, data packages must be meaningful from a PLM business perspective and sized at a mesoscopic scale, so that they provide operational and achievable options for data transformation. Finally, a PLM-specific algorithm is proposed for pre-processing the data before clustering. For this purpose, a tool-supported, multi-pass method able to combine and sequence data clustering approaches and algorithms has been developed: Data Systemizer (D6). Graph-based clustering metrics help assess the benefit of the multi-pass data clustering approach and provide principles for selecting the right chain of clustering approaches and algorithms.
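To make "loosely coupled data packages" and "graph-based clustering metrics" concrete, here is a minimal sketch. It assumes that PLM objects and their links can be modelled as an undirected, unweighted graph. It uses one well-known community detection algorithm, label propagation, with a deterministic tie-break chosen here only for reproducibility, to split a toy graph into packages. It then scores two candidate splits with two common graph metrics: the share of links that cross package boundaries, and Newman modularity. This is not the D6 tool, nor the PLM-specific pre-processing algorithm proposed in the paper. The object identifiers (`ASM-`, `PRT-`, `DOC-`) and all function names are hypothetical.

```python
"""Illustrative sketch (not D6): split a toy PLM link graph into data
packages with label propagation and score splits with graph metrics."""
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # hypothetical: one PLM link between two object ids


def adjacency(edges: List[Edge]) -> Dict[str, Set[str]]:
    """Build undirected adjacency sets from a list of links."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def label_propagation(edges: List[Edge], max_iter: int = 50) -> List[Set[str]]:
    """Deterministic label propagation (a community detection algorithm).

    Nodes are visited in sorted order. Each node adopts the most frequent
    label among its neighbours, keeping its current label if it is among the
    tied best, otherwise taking the smallest tied label (assumed tie-break).
    """
    adj = adjacency(edges)
    labels = {n: n for n in adj}  # every object starts in its own package
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            tied = [lab for lab, c in counts.items() if c == best]
            new = labels[node] if labels[node] in tied else min(tied)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:  # stop once no label changes in a full pass
            break
    packages: Dict[str, Set[str]] = defaultdict(set)
    for node, lab in labels.items():
        packages[lab].add(node)
    return sorted(packages.values(), key=min)


def coupling(edges: List[Edge], packages: List[Set[str]]) -> float:
    """Share of links crossing package boundaries (lower = looser coupling)."""
    owner = {n: i for i, p in enumerate(packages) for n in p}
    cross = sum(1 for u, v in edges if owner[u] != owner[v])
    return cross / len(edges)


def modularity(edges: List[Edge], packages: List[Set[str]]) -> float:
    """Newman modularity Q = sum_c [L_c/m - (d_c/2m)^2] for an unweighted graph."""
    m = len(edges)
    owner = {n: i for i, p in enumerate(packages) for n in p}
    intra = [0] * len(packages)       # L_c: links inside package c
    degree_sum = [0] * len(packages)  # d_c: total degree of package c
    for u, v in edges:
        degree_sum[owner[u]] += 1
        degree_sum[owner[v]] += 1
        if owner[u] == owner[v]:
            intra[owner[u]] += 1
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(intra, degree_sum))


# Hypothetical toy data: two product structures sharing a single link.
LINKS: List[Edge] = [
    ("ASM-1", "PRT-1"), ("ASM-1", "PRT-2"), ("PRT-1", "DOC-1"), ("PRT-2", "DOC-1"),
    ("ASM-2", "PRT-3"), ("ASM-2", "PRT-4"), ("PRT-3", "DOC-2"), ("PRT-4", "DOC-2"),
    ("PRT-2", "PRT-3"),
]
BY_TYPE = [{"ASM-1", "ASM-2", "DOC-1", "DOC-2"}, {"PRT-1", "PRT-2", "PRT-3", "PRT-4"}]

if __name__ == "__main__":
    for name, split in [("label propagation", label_propagation(LINKS)),
                        ("by object type", BY_TYPE)]:
        print(f"{name}: coupling={coupling(LINKS, split):.3f} "
              f"modularity={modularity(LINKS, split):.3f}")
```

On this toy graph, label propagation yields one package per product structure. That split cuts only the single PRT-2/PRT-3 link (coupling 0.111, modularity 0.389). Grouping by object type cuts 8 of the 9 links (coupling 0.889, modularity -0.395). This shows how such metrics can compare candidate packagings, including the outputs of different passes in a multi-pass chain.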
PLM data transformation: a mesoscopic-scale perspective and industrial case study