A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions, ACM Computing Surveys


A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions
ACM Computing Surveys (IF 23.8), Pub Date: 2024-10-16, DOI: 10.1145/3689036
Sheng Zhou, Hongjia Xu, Zhuonan Zheng, Jiawei Chen, Zhao Li, Jiajun Bu, Jia Wu, Xin Wang, Wenwu Zhu, Martin Ester

Clustering is a fundamental machine learning task which aims at assigning instances into groups so that similar samples belong to the same cluster while dissimilar samples belong to different clusters. Shallow clustering methods usually assume that data are collected and expressed as feature vectors within which clustering is performed. However, clustering high-dimensional data, such as images, texts, videos, and graphs, poses significant challenges for clustering tasks, such as indiscriminate representation and intricate relationships among instances. Over the past decades, deep learning has achieved remarkable success in effective representation learning and modeling complex relationships. Motivated by these advancements, Deep Clustering seeks to improve clustering outcomes through deep learning techniques, garnering considerable interest from both academia and industry. Despite many contributions to this vibrant area of research, the lack of systematic analysis and a comprehensive taxonomy has hindered progress in this field. In this survey, we first explore how deep learning can be integrated into deep clustering and identify two fundamental components: the representation learning module and the clustering module. Then we summarize and analyze the representative design of these two modules. Furthermore, we introduce a novel taxonomy of deep clustering based on how these two modules interact, specifically through multistage, generative, iterative, and simultaneous approaches. In addition, we present well-known benchmark datasets, evaluation metrics, and open-source tools to clearly demonstrate different experimental approaches. Finally, we examine the practical applications of deep clustering and propose challenging areas for future research.
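As a rough illustration of the two components named in the abstract, the sketch below chains a representation module and a clustering module in the simplest "multistage" fashion: embed first, then cluster. It is not taken from the survey. The `encode` function is a hypothetical hand-written projection standing in for a trained deep encoder, and `kmeans` and `multistage_cluster` are illustrative names chosen here; the toy data is made up for the example.

```python
import random


def encode(x):
    """Representation module (stand-in, an assumption of this sketch).

    A fixed projection from 3-D to 2-D. A real deep clustering method
    would use a trained neural encoder here instead.
    """
    return (x[0] + x[1], x[2])


def kmeans(points, k, iters=20, seed=0):
    """Clustering module: plain Lloyd's k-means on the embedded points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels


def multistage_cluster(data, k):
    """Multistage interaction: run the representation module to completion,
    then hand its output to the clustering module (no feedback between them)."""
    embedded = [encode(x) for x in data]
    return kmeans(embedded, k)


# Toy data (hypothetical): two well-separated groups of 3-D points.
data = [
    (0.0, 0.1, 0.0), (0.1, 0.0, 0.1), (0.0, 0.0, 0.1),
    (5.0, 5.1, 5.0), (5.1, 5.0, 4.9), (4.9, 5.0, 5.0),
]
labels = multistage_cluster(data, k=2)
```

In this arrangement the clustering result never influences the representation. The iterative and simultaneous categories mentioned in the abstract differ precisely in letting the two modules inform each other during training, which this sketch does not attempt to show.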


Updated: 2024-10-16
