STSyn: Speeding Up Local SGD With Straggler-Tolerant Synchronization
IEEE Transactions on Signal Processing (IF 4.6), Pub Date: 2024-08-30, DOI: 10.1109/tsp.2024.3452035
Feng Zhu, Jingjing Zhang, Xin Wang

Synchronous local stochastic gradient descent (local SGD) leaves some workers idle and incurs random delays caused by slow, straggling workers, because each round waits for the workers to complete the same number of local updates. To address this issue, this paper proposes a novel local SGD strategy called STSyn. The key idea is to wait only for the $K$ fastest workers at each synchronization round while keeping all workers computing continually, and to make full use of every effective (completed) local update of each worker, stragglers included. To evaluate the performance of STSyn, the paper analyzes the average wall-clock time, the average number of local updates, and the average number of uploading workers per round. The convergence of STSyn is also rigorously established for nonconvex objective functions under both homogeneous and heterogeneous data distributions. Experimental results show that STSyn outperforms state-of-the-art schemes, thanks to its straggler-tolerant technique and the inclusion of additional effective local updates at each worker. The impact of system parameters is also investigated. By waiting only for the faster workers and allowing heterogeneous synchronization with different numbers of local updates across workers, STSyn substantially improves both time and communication efficiency.
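
The timing behavior described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not give the per-round update target, the compute-time model, or the upload rule, so the parameter `target_updates`, the exponential compute times, the absence of a cap on extra updates, and the rule "a worker uploads if it completed at least one local update" are all hypothetical choices made for illustration. The function name `simulate_round` is likewise invented here.

```python
import random
from itertools import accumulate


def simulate_round(num_workers, k, target_updates, rng, mean_time=1.0):
    """Simulate the timing of one straggler-tolerant synchronization round.

    Assumptions (not taken from the paper): each local update takes an
    i.i.d. exponential time with mean `mean_time`; the round ends when the
    k-th fastest worker finishes `target_updates` local updates; every
    worker keeps computing until then, with no cap on extra updates.
    """
    # finish[i][j] = time at which worker i completes its (j+1)-th update.
    finish = []
    for _ in range(num_workers):
        durations = [rng.expovariate(1.0 / mean_time) for _ in range(target_updates)]
        finish.append(list(accumulate(durations)))

    # The round ends when the k-th fastest worker reaches the target.
    round_end = sorted(f[-1] for f in finish)[k - 1]

    completed = []
    for f in finish:
        # Count updates this worker finished before the round ended.
        done = sum(1 for t in f if t <= round_end)
        if done == target_updates:
            # Fast workers do not idle: they continue with extra updates.
            clock = f[-1]
            while True:
                clock += rng.expovariate(1.0 / mean_time)
                if clock > round_end:
                    break
                done += 1
        completed.append(done)

    # Hypothetical upload rule: any worker with a completed update uploads.
    uploaders = sum(1 for c in completed if c > 0)
    return round_end, completed, uploaders


if __name__ == "__main__":
    rng = random.Random(0)
    round_end, completed, uploaders = simulate_round(
        num_workers=8, k=4, target_updates=3, rng=rng
    )
    print("round duration:", round_end)
    print("completed local updates per worker:", completed)
    print("uploading workers:", uploaders)
```

Averaging `round_end`, `completed`, and `uploaders` over many simulated rounds gives rough empirical counterparts of the three quantities the abstract says are analyzed: wall-clock time, number of local updates, and number of uploading workers per round.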

Updated: 2024-08-30