Gwydion: Efficient auto-scaling for complex containerized applications in Kubernetes through Reinforcement Learning, Journal of Network and Computer Applications



Journal of Network and Computer Applications (IF 7.7), Pub Date: 2024-11-26, DOI: 10.1016/j.jnca.2024.104067
José Santos, Efstratios Reppas, Tim Wauters, Bruno Volckaert, Filip De Turck

Containers have reshaped application deployment and life-cycle management in recent cloud platforms. The paradigm shift from large monolithic applications to complex graphs of loosely-coupled microservices aims to increase deployment flexibility and operational efficiency. However, efficient allocation and scaling of microservice applications is challenging due to their intricate inter-dependencies. Existing works do not consider microservice dependencies, which could lead to the application’s performance degradation when service demand increases. As dependencies increase, communication between microservices becomes more complex and frequent, leading to slower response times and higher resource consumption, especially during high demand. In addition, performance issues in one microservice can also trigger a ripple effect across dependent services, exacerbating the performance degradation across the entire application. This paper studies the impact of microservice inter-dependencies in auto-scaling by proposing Gwydion, a novel framework that enables different auto-scaling goals through Reinforcement Learning (RL) algorithms. Gwydion has been developed based on the OpenAI Gym library and customized for the popular Kubernetes (K8s) platform to bridge the gap between RL and auto-scaling research by training RL algorithms on real cloud environments for two opposing reward strategies: cost-aware and latency-aware. Gwydion focuses on improving resource usage and reducing the application’s response time by considering microservice inter-dependencies when scaling horizontally. Experiments with microservice benchmark applications, such as Redis Cluster (RC) and Online Boutique (OB), show that RL agents can reduce deployment costs and the application’s response time compared to default scaling mechanisms, achieving up to 50% lower latency while avoiding performance degradation. 
For RC, cost-aware algorithms can reduce the number of deployed pods (2 to 4), resulting in slightly higher latency (300 μs to 6 ms) but lower resource consumption. For OB, all RL algorithms exhibit a notable response time improvement by considering all microservices in the observation space, enabling the sequential triggering of actions across different deployments. This leads to nearly 30% cost savings while maintaining consistently lower latency throughout the experiment. Gwydion aims to advance auto-scaling research in a rapidly evolving dynamic cloud environment.
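
The two opposing reward strategies named in the abstract can be illustrated with a minimal Python sketch. It is not Gwydion's implementation: the abstract gives no reward formulas, so the `Observation` fields, both reward functions, the `scale` helper, and all numeric values below are hypothetical and chosen only to show how a cost-aware and a latency-aware objective can rank the same deployment states differently.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical snapshot of one deployment (field names are assumptions)."""
    pods: int          # number of replicas currently deployed
    latency_ms: float  # measured response time in milliseconds


def cost_aware_reward(obs: Observation, max_pods: int) -> float:
    """Illustrative cost-aware reward: fewer pods gives a higher reward."""
    return 1.0 - obs.pods / max_pods


def latency_aware_reward(obs: Observation, target_ms: float) -> float:
    """Illustrative latency-aware reward: lower latency gives a higher reward.

    The value is floored at -1.0 so very slow responses do not dominate.
    """
    return max(-1.0, 1.0 - obs.latency_ms / target_ms)


def scale(pods: int, action: int, min_pods: int, max_pods: int) -> int:
    """Apply a horizontal scaling action (negative, zero, or positive pod
    delta) and clamp the result to the allowed replica range."""
    return max(min_pods, min(max_pods, pods + action))


if __name__ == "__main__":
    # Made-up states: a small, slower deployment and a large, faster one.
    small = Observation(pods=2, latency_ms=5.0)
    large = Observation(pods=6, latency_ms=1.0)

    for name, obs in (("small", small), ("large", large)):
        print(
            name,
            round(cost_aware_reward(obs, max_pods=10), 2),
            round(latency_aware_reward(obs, target_ms=4.0), 2),
        )
```

Under these assumed formulas the cost-aware reward prefers the smaller deployment and the latency-aware reward prefers the larger one, which is the trade-off the abstract describes for the two strategies.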


Updated: 2024-11-26
