SAS: Speculative Locality Aware Scheduling for I/O intensive scientific analysis in clouds
Future Generation Computer Systems (IF 6.2), Pub Date: 2024-11-29, DOI: 10.1016/j.future.2024.107622
Ali Zahir, Ashiq Anjum, Satish Narayana Srirama, Rajkumar Buyya

Executing data-intensive analysis workflows in a multi-cloud environment, such as the Worldwide LHC Computing Grid (WLCG) at CERN, requires a large amount of input data stored across multiple storage elements. The turnaround time of an individual analysis workflow running on an edge machine is dominated by the data reading time, so minimizing that time improves the overall efficiency of the analysis process. To address this, we use speculative scheduling to optimize multi-cloud analysis workflows by intelligently streaming data before a task arrives for execution at the edge machine. We propose an Event System (ES), an in-memory serverless process responsible for proactively providing input data to the workflow processes. The ES prefetches data from the storage elements into the memory of the edge machine that executes the workflow. Using locality-aware scheduling and prefetching algorithms, it performs speculative scheduling based on an evaluation of historical execution logs with a Bayesian inference model. The serverless ES learns about incoming jobs ahead of time and uses intelligent data streaming to supply data to them, reducing the overall scheduling and data access latencies and significantly improving the overall turnaround time. We evaluated the proposed system with a large analysis workflow from High Energy Physics (HEP) by emulating the WLCG infrastructure in a controlled environment. The results show that speculative and locality-aware scheduling techniques can improve the execution of data-intensive workflows in the cloud environment by over 30%.
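
The abstract does not give the ES algorithms in detail, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that historical logs can be reduced to (user, dataset) request pairs, estimates which dataset a user is likely to request next with a Dirichlet-smoothed posterior, and picks the lowest-latency storage element holding a replica. The names `SpeculativePrefetcher`, `nearest_replica`, `alpha`, and all dataset and storage element labels are hypothetical.

```python
from collections import Counter, defaultdict


class SpeculativePrefetcher:
    """Toy Bayesian predictor of the next dataset a user will request.

    Hypothetical illustration: the model estimates P(dataset | user) from
    historical request counts with a symmetric Dirichlet prior.
    """

    def __init__(self, datasets, alpha=1.0):
        self.datasets = list(datasets)
        self.alpha = alpha  # prior pseudo-count added to every dataset
        self.counts = defaultdict(Counter)  # user -> dataset -> request count

    def observe(self, user, dataset):
        """Record one historical request from the execution logs."""
        self.counts[user][dataset] += 1

    def posterior(self, user):
        """Posterior mean probability of each dataset for this user."""
        seen = self.counts[user]
        total = sum(seen.values()) + self.alpha * len(self.datasets)
        return {d: (seen[d] + self.alpha) / total for d in self.datasets}

    def predict(self, user):
        """Dataset with the highest posterior probability."""
        post = self.posterior(user)
        return max(post, key=post.get)


def nearest_replica(replicas, latency_ms):
    """Locality-aware choice: the storage element with the lowest latency."""
    return min(replicas, key=lambda se: latency_ms[se])


# Assumed example data: request history, replica locations, and latencies.
history = [("alice", "run_A"), ("alice", "run_A"),
           ("alice", "run_B"), ("bob", "run_C")]
replica_map = {"run_A": ["se_far", "se_near"],
               "run_B": ["se_far"],
               "run_C": ["se_near"]}
latency_ms = {"se_far": 120.0, "se_near": 8.0}

prefetcher = SpeculativePrefetcher(replica_map.keys())
for user, dataset in history:
    prefetcher.observe(user, dataset)

# Decide what to stream into edge memory before alice's next job arrives.
likely = prefetcher.predict("alice")
source = nearest_replica(replica_map[likely], latency_ms)
```

In this sketch a real system would then start copying `likely` from `source` into the edge machine's memory; a misprediction costs only the wasted transfer, which is the usual trade-off in speculative prefetching.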

Updated: 2024-11-29