Reflecting on a Decade of Evolution: MapReduce‐Based Advances in Partitioning‐Based, Hierarchical‐Based, and Density‐Based Clustering (2013–2023)
WIREs Data Mining and Knowledge Discovery (IF 6.4), Pub Date: 2024-10-21, DOI: 10.1002/widm.1566
Tanvir Habib Sardar

Traditional clustering algorithms are ill-suited to large real-world datasets or big data because of their computational cost and scalability limits. In response, research over the last decade has moved towards distributed clustering using the MapReduce framework. This study conducts a bibliometric review to assess, establish, and measure the patterns and trends of MapReduce-based partitioning, hierarchical, and density clustering algorithms over the past decade (2013–2023). A comprehensive digital text-mining-based search technique, with multiple field-specific keywords, inclusion measures, and exclusion criteria, is employed to obtain the research landscape from the Scopus database. The data obtained from Scopus is analyzed using the VOSviewer software tool and coded using the R statistical analysis tool. The analysis identifies the number of scholarly articles, the diversity of article sources, their impact and growth patterns, the most influential authors and co-authors, the most cited articles, the most contributing affiliations and countries and their collaborations, the use of different keywords and their impact, and so forth. The study further explores the articles and reports the methodologies employed for designing MapReduce-based counterparts of traditional partitioning, hierarchical, and density clustering algorithms, along with their optimizations and hybridizations. Finally, the study lists the main research challenges encountered in the past decade for MapReduce-based partitioning, hierarchical, and density clustering, and suggests possible areas for future research in this field.

Updated: 2024-10-21