Straggler mitigation via hierarchical scheduling in elastic stream computing systems
Future Generation Computer Systems (IF 6.2), Pub Date: 2024-12-14, DOI: 10.1016/j.future.2024.107673. Minghui Wu, Dawei Sun, Shang Gao, Rajkumar Buyya
Skewed data distribution causes certain tasks or nodes to handle much more data than others, which slows their execution and turns them into stragglers. Existing solutions try to mitigate stragglers by balancing the workload through either data stream grouping or task scheduling. This “one size fits all” approach considers requirements at only a single level and fails to address the system's diverse needs across multiple levels, which ultimately limits its performance. To address these issues and mitigate stragglers effectively, we propose a hierarchical collaborative strategy called Ms-Stream. It aims to balance data stream workloads among tasks and to keep the load difference among compute nodes within an acceptable range. This paper discusses the strategy from the following aspects: (1) Ms-Stream constructs topology, grouping, and resource models and formalizes three problems: data stream grouping, task subgraph partitioning, and task deployment. (2) Ms-Stream employs a lightweight two-level grouping method that supports dynamic workload assignment for stateful tasks, selectively offloading resources from straggler tasks to others. (3) Ms-Stream uses the directed acyclic graph (DAG) representation of a streaming application to allocate communication-intensive tasks to the same group, while ensuring that computation-intensive tasks are distributed evenly across groups. (4) Ms-Stream deploys task groups to compute nodes with varying resource capacities following the descending maximum padding priority rule to balance the workload. System throughput and latency are evaluated with real-world streaming applications. Experimental results show significant improvements: compared with existing state-of-the-art approaches, Ms-Stream reduces maximum system latency by 61% and increases maximum throughput by more than 2x.
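The two-level grouping in aspect (2) can be pictured with the minimal Python sketch below. It is not the Ms-Stream algorithm, which the abstract does not spell out; it assumes a scheme in the spirit of partial key grouping. The first level hashes each key to a primary task so that keyed state stays local. The second level applies only when the primary task's tuple count exceeds an assumed straggler threshold (1.5x the mean task load), in which case the tuple goes to a second hash-chosen task if that task is less loaded. The class name TwoLevelGrouper, the threshold, and the salted CRC32 hashing are hypothetical, and the sketch ignores the state migration or merging a real stateful operator would need once a key is split across two tasks.

```python
import zlib
from typing import List


def stable_hash(key: str, salt: str) -> int:
    # Deterministic across processes (Python's built-in hash() of str is randomized).
    return zlib.crc32((salt + ":" + key).encode("utf-8"))


class TwoLevelGrouper:
    """Hypothetical two-level grouper: hash routing first, load-aware offload second."""

    def __init__(self, num_tasks: int, straggler_ratio: float = 1.5) -> None:
        self.num_tasks = num_tasks
        self.straggler_ratio = straggler_ratio  # assumed threshold, not taken from the paper
        self.load: List[int] = [0] * num_tasks  # tuples routed to each task so far

    def is_straggler(self, task: int) -> bool:
        # A task counts as a straggler when its load exceeds ratio x the mean task load.
        mean = sum(self.load) / self.num_tasks
        return mean > 0 and self.load[task] > self.straggler_ratio * mean

    def route(self, key: str) -> int:
        # Level 1: key grouping keeps every tuple of a key on its primary task.
        primary = stable_hash(key, "primary") % self.num_tasks
        target = primary
        # Level 2: only if the primary is a straggler, try one alternative task for this key.
        if self.is_straggler(primary):
            secondary = stable_hash(key, "secondary") % self.num_tasks
            if self.load[secondary] < self.load[primary]:
                target = secondary
        self.load[target] += 1
        return target


if __name__ == "__main__":
    # Skewed stream: 80% of tuples share the key "hot".
    grouper = TwoLevelGrouper(num_tasks=4)
    for i in range(10_000):
        grouper.route("hot" if i % 5 else f"k{i % 97}")
    print("per-task load:", grouper.load)
```

Because each key can reach at most two tasks in this sketch, the partial state for a hot key is spread over only two places, whereas plain hash grouping would pin the hot key to a single task and make that task a straggler.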
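Aspect (3) combines two goals: keep heavily communicating tasks in the same group, and spread computation evenly across groups. The greedy Python sketch below illustrates that trade-off; it is an assumed stand-in, not the paper's partitioning method. It contracts DAG edges in decreasing order of traffic while the merged cluster stays under an assumed compute cap (slack × total compute / k), then assigns the resulting clusters to k groups with the longest-processing-time (LPT) rule: heaviest cluster first, into the currently lightest group. The function name partition_dag, the slack parameter, and the per-task compute and per-edge traffic weights are hypothetical inputs.

```python
from typing import Dict, List, Tuple


def partition_dag(compute: Dict[str, float],
                  edges: List[Tuple[str, str, float]],
                  k: int,
                  slack: float = 1.2) -> List[List[str]]:
    """Hypothetical greedy sketch: co-locate heavy-traffic edges, balance compute over k groups."""
    cap = slack * sum(compute.values()) / k  # assumed per-group compute cap
    parent = {t: t for t in compute}  # union-find forest over tasks
    weight = dict(compute)  # total compute of each cluster, valid at cluster roots

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    # Phase 1: merge endpoints of the heaviest-traffic edges if the merged cluster fits the cap.
    for u, v, _traffic in sorted(edges, key=lambda e: e[2], reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv and weight[ru] + weight[rv] <= cap:
            parent[rv] = ru
            weight[ru] += weight[rv]

    clusters: Dict[str, List[str]] = {}
    for t in compute:
        clusters.setdefault(find(t), []).append(t)

    # Phase 2 (LPT): heaviest cluster first, each into the currently lightest group.
    groups: List[List[str]] = [[] for _ in range(k)]
    loads = [0.0] * k
    for root in sorted(clusters, key=lambda r: weight[r], reverse=True):
        g = loads.index(min(loads))
        groups[g].extend(clusters[root])
        loads[g] += weight[root]
    return groups


if __name__ == "__main__":
    compute = {"spout": 1, "parse": 2, "split": 3, "count": 4, "sink": 2}
    edges = [("spout", "parse", 10), ("parse", "split", 8),
             ("split", "count", 9), ("count", "sink", 1)]
    print(partition_dag(compute, edges, k=2))
```

With the demo inputs, the two heaviest edges (spout→parse and split→count) end up inside groups, while the edges that would overflow the cap are cut, leaving group compute loads of 7 and 5.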
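The abstract does not define the descending maximum padding priority rule used in aspect (4), so the Python sketch below substitutes a common heterogeneity-aware heuristic and should be read only as an illustration. Task groups are placed in decreasing order of demand, and each goes to the feasible node whose utilization after placement, (used + demand) / capacity, would be lowest. The function name deploy, the single scalar resource dimension, and the utilization criterion are all assumptions.

```python
from typing import Dict


def deploy(group_demand: Dict[str, float],
           node_capacity: Dict[str, float]) -> Dict[str, str]:
    """Hypothetical stand-in: largest group first, onto the node left least utilized."""
    used = {n: 0.0 for n in node_capacity}  # resource already committed on each node
    placement: Dict[str, str] = {}
    for g in sorted(group_demand, key=group_demand.get, reverse=True):
        d = group_demand[g]
        # Only nodes with enough remaining capacity can host this group.
        feasible = [n for n in node_capacity if used[n] + d <= node_capacity[n]]
        if not feasible:
            raise ValueError(f"no node can host group {g!r} (demand {d})")
        # Normalizing by capacity lets larger nodes take proportionally more work.
        best = min(feasible, key=lambda n: (used[n] + d) / node_capacity[n])
        used[best] += d
        placement[g] = best
    return placement


if __name__ == "__main__":
    groups = {"g1": 6, "g2": 4, "g3": 3, "g4": 2}
    nodes = {"big": 12, "small": 6}
    print(deploy(groups, nodes))
```

Placing the largest groups first leaves the small groups to fill the remaining gaps, and comparing utilization rather than absolute load keeps a small node from being treated as if it had the same headroom as a large one.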
Updated: 2024-12-14