Fatriot: Fault-tolerant MEC architecture for mission-critical systems using a SmartNIC
Journal of Network and Computer Applications (IF 7.7), Pub Date: 2024-07-29, DOI: 10.1016/j.jnca.2024.103978
Taejune Park, Myoungsung You, Jinwoo Kim, Seungsoo Lee
Multi-access edge computing (MEC), which deploys cloud infrastructure close to end devices to reduce latency, plays a pivotal role in mission-critical services such as smart grids, self-driving cars, and healthcare. Fault tolerance is paramount for these services, because failures can lead to fatal accidents and blackouts. However, the distributed nature of MEC architectures makes them more susceptible to failures than traditional cloud systems. Existing research in this field has focused on preventing failures in MEC systems rather than on restoring them from failure conditions. To bridge this gap, we introduce Fatriot, a SmartNIC-based architecture designed to ensure fault tolerance in MEC systems. Fatriot actively monitors MEC hosts for anomalies and seamlessly redirects incoming service traffic to backup hosts when it detects a failure. Because it operates as a stand-alone solution on a SmartNIC, Fatriot guarantees the continuous operation of its fault-tolerance mechanism even during severe errors on the MEC host (e.g., kernel failure), maintaining uninterrupted operation of mission-critical services. Our prototype of Fatriot, implemented on the NetFPGA-SUME, effectively mitigates various failure scenarios while adding minimal overhead to services (less than 1%).
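The detect-then-redirect idea in the abstract can be illustrated with a small sketch. This is not the paper's design: the abstract does not say how anomalies are detected or how traffic is steered, so the heartbeat timeout, the sticky failover policy, and every name below (`FailoverMonitor`, `heartbeat`, `next_hop`, the host labels) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Toy model of detect-then-redirect; NOT the mechanism from the paper."""

    primary: str               # hypothetical label for the monitored MEC host
    backup: str                # hypothetical label for the backup host
    timeout_s: float           # assumed rule: silence longer than this means failure
    last_heartbeat_s: float = 0.0
    failed_over: bool = False

    def heartbeat(self, now_s: float) -> None:
        # Record a liveness signal from the primary host.
        self.last_heartbeat_s = now_s

    def next_hop(self, now_s: float) -> str:
        # Decide where incoming service traffic should go at time now_s.
        # Once the primary has been silent for too long, stay on the backup
        # (a sticky policy, chosen here only to keep the sketch simple).
        if now_s - self.last_heartbeat_s > self.timeout_s:
            self.failed_over = True
        return self.backup if self.failed_over else self.primary


monitor = FailoverMonitor("mec-host-a", "mec-host-b", timeout_s=0.5)
monitor.heartbeat(1.0)
healthy_target = monitor.next_hop(1.2)   # heartbeat is recent
failed_target = monitor.next_hop(2.0)    # primary silent for 1.0 s
```

In this sketch the decision logic depends only on state held by the monitor itself, which loosely mirrors the abstract's point that the mechanism runs on the SmartNIC and therefore keeps working when the host kernel fails.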
Updated: 2024-07-29