Improving Hadoop MapReduce performance on heterogeneous single board computer clusters
Future Generation Computer Systems (IF 6.2), Pub Date: 2024-06-15, DOI: 10.1016/j.future.2024.06.025
Sooyoung Lim, Dongchul Park

Over the past decade, Apache Hadoop has become a leading framework for big data processing. Single board computer (SBC) clusters, predominantly built from Raspberry Pi (RPi) boards, have been employed to explore the potential of MapReduce processing at low power and cost because, capital costs aside, power consumption has also become a primary concern in many industries. After building an SBC cluster, it is common to add more nodes, particularly newer-generation SBCs, to the existing cluster or to replace old (or inactive) nodes with new ones to improve performance, which inevitably produces heterogeneous SBC clusters. Running the Hadoop framework on these heterogeneous SBC clusters raises challenging new problems due to the computing resource discrepancies across nodes. Native Hadoop does not carefully consider the heterogeneity of cluster nodes. Consequently, heterogeneous SBC Hadoop clusters suffer significant performance variation or, more critically, persistent node failures. This paper proposes a new Hadoop Yet Another Resource Negotiator (YARN) architecture design to improve MapReduce performance on heterogeneous SBC Hadoop clusters with tight computing resources. We implement two new scheduling policies in Hadoop YARN based on the accurate computing resource information that each SBC node provides: (1) two MapReduce task scheduling frameworks (master-driven and slave-driven) that determine the more effective processing mode, and (2) ApplicationMaster (AM) and reduce task distribution mechanisms that deliver the best Hadoop performance by minimizing performance variation. The proposed Hadoop framework thus makes the best use of a performance-frugal SBC Hadoop cluster by intelligently distributing MapReduce tasks to each node. To our knowledge, the proposed framework is the first redesigned Hadoop YARN architecture to address these various challenging problems, particularly for big data processing. Extensive experiments with Hadoop benchmarks demonstrate that the redesigned framework outperforms native Hadoop by an average factor of 2.55 and 1.55 under I/O-intensive and CPU-intensive workloads, respectively.
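To make the idea of resource-aware task placement more concrete, the sketch below is a minimal, hypothetical illustration rather than the authors' implementation: the Node fields and the weighted_reduce_plan helper are invented for this example. It shows the general spirit of the AM and reduce task distribution mechanisms described above, where each heterogeneous SBC node reports its usable CPU and memory and reduce tasks are apportioned roughly in proportion to that capacity so that slower nodes are not overloaded.

```python
# A minimal, hypothetical sketch of capacity-weighted reduce task placement
# on a heterogeneous SBC cluster (illustrative only, not the paper's design).

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_cores: int   # usable cores reported by the node
    mem_mb: int      # usable memory reported by the node

def weighted_reduce_plan(nodes, num_reduces):
    """Assign reduce task counts in proportion to a simple capacity score."""
    scores = {n.name: n.cpu_cores * n.mem_mb for n in nodes}
    total = sum(scores.values())
    plan = {name: (score * num_reduces) // total for name, score in scores.items()}
    # Hand out any remainder to the highest-capacity nodes first.
    leftover = num_reduces - sum(plan.values())
    for name, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if leftover == 0:
            break
        plan[name] += 1
        leftover -= 1
    return plan

if __name__ == "__main__":
    cluster = [
        Node("rpi3-1", cpu_cores=4, mem_mb=1024),  # older generation
        Node("rpi4-1", cpu_cores=4, mem_mb=4096),  # newer generation
        Node("rpi4-2", cpu_cores=4, mem_mb=8192),
    ]
    print(weighted_reduce_plan(cluster, num_reduces=12))
```

Running this example assigns most of the 12 reduce tasks to the higher-capacity RPi 4 nodes and none to the 1 GB RPi 3 node, mirroring how heterogeneity-aware placement reduces the performance variation that native Hadoop exhibits on such clusters.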

Updated: 2024-06-15