Global reduction for geo-distributed MapReduce across cloud federation
Future Generation Computer Systems (IF 6.2), Pub Date: 2024-08-26, DOI: 10.1016/j.future.2024.107492
Thouraya Gouasmi, Ahmed Hadj Kacem

Geo-distributed big data processing is growing steadily: data originates in different countries and is held in datacenters (DCs) across the globe, and applications use multiple sites to improve reliability, security, and processing performance. Popular frameworks such as Hadoop and Spark have been redesigned to process geographically distributed data where it resides. However, these methods still transfer large amounts of data over the Internet, which incurs high processing time and cost for many applications, and in several cases the output of the computation is smaller than its input. In this paper, we keep the data locality principle of processing data at its different locations but abandon the principle of transferring the entire intermediate results to a single global reducer. We propose Geo-MR, an intelligent geo-distributed MapReduce-based framework across a federated cloud, based on two heuristic algorithms: (i) GResearch, which chooses the best clusters as global reducers to reduce communication and optimize bandwidth use; and (ii) Geo-MR, which schedules only the relevant data to the selected global reducers that compute the final results. As a baseline, we propose an exact MapReduce scheduling model for benchmarking, against which we compare and discuss the results of the Geo-MR heuristic. The experimental results show that Geo-MR improves resource utilization (bandwidth and cluster VMs) in the cloud federation and consequently reduces cost and job response time.
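
The idea of selecting more than one global reducer can be illustrated with a small greedy sketch. This is not the paper's GResearch or Geo-MR algorithm, which the abstract does not specify; it only assumes that each key partition of the intermediate results is reduced at the cluster that minimizes the summed transfer time for that partition. The cluster names, partition sizes, bandwidth figures, and the functions `transfer_time` and `assign_reducers` are all hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical data: intermediate result size (MB) held by each cluster, per key partition.
partition_sizes: Dict[str, Dict[str, float]] = {
    "p0": {"eu": 800, "us": 200, "asia": 100},
    "p1": {"eu": 50, "us": 100, "asia": 900},
}

# Hypothetical symmetric inter-cluster bandwidth (MB/s).
bandwidth: Dict[FrozenSet[str], float] = {
    frozenset({"eu", "us"}): 50,
    frozenset({"eu", "asia"}): 20,
    frozenset({"us", "asia"}): 25,
}


def transfer_time(sizes: Dict[str, float],
                  bandwidth: Dict[FrozenSet[str], float],
                  reducer: str) -> float:
    """Summed time (s) to ship one partition from every other cluster to `reducer`."""
    return sum(
        size / bandwidth[frozenset({src, reducer})]
        for src, size in sizes.items()
        if src != reducer  # data already at the reducer is not transferred
    )


def assign_reducers(partition_sizes: Dict[str, Dict[str, float]],
                    bandwidth: Dict[FrozenSet[str], float]) -> Dict[str, str]:
    """Greedily pick, for each partition, the cluster with the lowest transfer time."""
    return {
        partition: min(sizes, key=lambda c: transfer_time(sizes, bandwidth, c))
        for partition, sizes in partition_sizes.items()
    }


if __name__ == "__main__":
    print(assign_reducers(partition_sizes, bandwidth))
```

With these made-up numbers, each partition is reduced at the cluster that already holds most of it, so only the smaller remainders cross the wide-area links. A real scheduler would also have to account for VM capacity and monetary cost, which the abstract names as objectives.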

Updated: 2024-08-26