Tetris: Proactive Container Scheduling for Long-Term Load Balancing in Shared Clusters
IEEE Transactions on Services Computing (IF 5.5), Pub Date: 2024-08-13, DOI: 10.1109/tsc.2024.3442544
Fei Xu¹, Xiyue Shen¹, Shuohao Lin¹, Li Chen², Zhi Zhou³, Fen Xiao⁴, Fangming Liu⁵

Long-running containerized workloads (e.g., machine learning), which typically show time-varying patterns, are increasingly prevalent in shared production clusters. To improve workload performance, current schedulers mainly focus on optimizing the short-term benefits of cluster load balancing or of initial container placement on servers. However, this inevitably causes many invalid migrations (i.e., containers migrated back and forth among servers within a short time window), leading to significant service level objective (SLO) violations. This paper introduces Tetris, a model predictive control (MPC)-based container scheduling strategy that proactively migrates long-running workloads for cluster load balancing. Specifically, we first build a discrete-time dynamic model for the long-term optimization of container scheduling. To solve this optimization problem, Tetris employs two main components: (1) a container resource predictor, which leverages time-series analysis approaches to accurately predict container resource consumption; and (2) an MPC-based container scheduler, which jointly optimizes cluster load balancing and container migration cost over a sliding time window. We implement and open-source a prototype of Tetris based on K8s (Kubernetes). Extensive prototype experiments and trace-driven simulations demonstrate that, compared to state-of-the-art container scheduling strategies, Tetris improves the cluster load balancing degree by up to 77.8% without incurring any SLO violations.
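The interplay of the two components (a predictor feeding a scheduler that trades load balance against migration cost over a sliding window) can be illustrated with a small receding-horizon sketch. This is not the Tetris implementation; everything below is an assumption made for illustration: Holt's linear-trend smoothing stands in for the paper's time-series predictor, the population standard deviation of server loads stands in for the "load balancing degree", the migration cost is a flat per-move penalty `mig_weight`, and the candidate search only considers staying put or moving a single container. All names (`holt_forecast`, `imbalance`, `horizon_cost`, `mpc_step`) are hypothetical.

```python
import statistics
from typing import Dict, List

# Hypothetical model: container id -> index of the server hosting it.
Placement = Dict[str, int]


def holt_forecast(history: List[float], horizon: int,
                  alpha: float = 0.5, beta: float = 0.3) -> List[float]:
    """Stand-in predictor (assumption): Holt's linear-trend exponential
    smoothing. Needs >= 2 observations; forecasts are clamped at zero."""
    level, trend = history[0], history[1] - history[0]
    for x in history[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [max(0.0, level + h * trend) for h in range(1, horizon + 1)]


def imbalance(placement: Placement, demand: Dict[str, float],
              n_servers: int) -> float:
    """Assumed load-balancing metric: population std-dev of server loads."""
    loads = [0.0] * n_servers
    for container, server in placement.items():
        loads[server] += demand[container]
    return statistics.pstdev(loads)


def horizon_cost(plan: Placement, current: Placement,
                 forecasts: Dict[str, List[float]], n_servers: int,
                 mig_weight: float) -> float:
    """Predicted imbalance summed over the window, plus a penalty per
    container that the plan moves away from its current server."""
    horizon = len(next(iter(forecasts.values())))
    cost = sum(
        imbalance(plan, {c: f[k] for c, f in forecasts.items()}, n_servers)
        for k in range(horizon)
    )
    moves = sum(1 for c in plan if plan[c] != current[c])
    return cost + mig_weight * moves


def mpc_step(current: Placement, histories: Dict[str, List[float]],
             n_servers: int, horizon: int = 3,
             mig_weight: float = 1.0) -> Placement:
    """One receding-horizon decision. Candidates are 'stay put' and every
    single-container move; each is held fixed over the predicted window and
    the cheapest is returned. The caller applies it, observes new usage,
    and calls mpc_step again at the next time step."""
    forecasts = {c: holt_forecast(h, horizon) for c, h in histories.items()}
    best = dict(current)
    best_cost = horizon_cost(best, current, forecasts, n_servers, mig_weight)
    for container in current:
        for server in range(n_servers):
            if server == current[container]:
                continue
            candidate = dict(current)
            candidate[container] = server
            cost = horizon_cost(candidate, current, forecasts,
                                n_servers, mig_weight)
            if cost < best_cost:  # strict: ties keep the earlier plan
                best, best_cost = candidate, cost
    return best


if __name__ == "__main__":
    current = {"a": 0, "b": 0, "c": 1}
    histories = {"a": [4, 4, 4], "b": [4, 4, 4], "c": [1, 1, 1]}
    print(mpc_step(current, histories, n_servers=2))  # prints {'a': 1, 'b': 0, 'c': 1}
```

Because each candidate is scored on predicted load over the whole window rather than on the current snapshot, a move that helps only briefly is charged for the imbalance it causes later, and `mig_weight` keeps the scheduler from migrating when the predicted gain is small (with `mig_weight=20.0`, the example above keeps the current placement). A faithful MPC formulation would optimize a sequence of placements over the window; holding one plan fixed is a simplification chosen to keep the sketch short.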

Updated: 2024-08-13