Occlusion-aware and jitter-rejection 3D video real-time pose estimation for construction workers
Automation in Construction (IF 9.6), Pub Date: 2025-01-29, DOI: 10.1016/j.autcon.2025.106015
Benyang Song, Jiajun Wang, Xiaoling Wang, Tuocheng Zeng, Dongze Li

Video pose estimation is widely employed to monitor the activities of workers at construction sites. However, previous studies have often overlooked the challenges posed by complex occlusions and motion jitter, resulting in inaccurate or unrealistic postures that impair subsequent analysis. This paper presents a three-dimensional (3D) worker pose estimation pipeline that mitigates occlusions and jitter in on-site videos. Initially, YOLOv8 is adopted to rapidly extract two-dimensional (2D) skeletons from video frames. Subsequently, a view-temporal fusion module comprising a heuristic multi-view fusion module and a motion smoothing module is introduced to address occlusions and jitter, respectively. Finally, MotionBERT is employed to lift the 2D skeletons to 3D. The proposed method achieves a mean per joint position error of 25.64 mm on the Human3.6M dataset with 25% fewer computations, running at 60 FPS. On-site experiments indicate that it is valuable for reconstructing workers' 3D postures from video, facilitating safety monitoring at construction sites.
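The abstract describes a three-stage pipeline: 2D skeleton extraction with YOLOv8, a view-temporal fusion module that handles occlusions and jitter, and 2D-to-3D lifting with MotionBERT. The sketch below shows how such a per-frame pipeline could be wired together; it is not the authors' implementation. The yolov8n-pose checkpoint, the exponential-moving-average smoother (a simple stand-in for the paper's motion smoothing module), and the placeholder lifting function are all assumptions, and the paper's heuristic multi-view fusion step is omitted entirely.

```python
# Minimal sketch of a monocular variant of the pipeline stages named in the
# abstract. Assumptions (not from the paper): ultralytics YOLOv8-pose as the
# 2D detector, an EMA filter as a toy jitter-suppression stand-in, and a
# placeholder for the MotionBERT-style 2D-to-3D lifting step.

import numpy as np
from ultralytics import YOLO

detector = YOLO("yolov8n-pose.pt")  # hypothetical checkpoint choice

def extract_2d_skeleton(frame):
    """Return (num_joints, 2) pixel keypoints for the first detected person."""
    result = detector(frame, verbose=False)[0]
    if result.keypoints is None or len(result.keypoints.xy) == 0:
        return None  # person missed on this frame (e.g. heavy occlusion)
    return result.keypoints.xy[0].cpu().numpy()

class EMASmoother:
    """Toy temporal smoothing: exponential moving average over joint positions."""
    def __init__(self, alpha=0.6):
        self.alpha = alpha
        self.state = None
    def __call__(self, keypoints):
        if keypoints is None:
            return self.state  # hold the last pose when no detection is available
        if self.state is None:
            self.state = keypoints
        else:
            self.state = self.alpha * keypoints + (1 - self.alpha) * self.state
        return self.state

def lift_to_3d(keypoints_2d_sequence):
    """Placeholder for MotionBERT-style 2D-to-3D lifting over a clip of frames."""
    raise NotImplementedError("plug in a pretrained 2D-to-3D lifting model here")

# Example per-frame usage (frames would come from cv2.VideoCapture or similar):
#   smoother = EMASmoother()
#   smoothed_2d = smoother(extract_2d_skeleton(frame))
```

In the paper itself, the smoothing and occlusion handling are performed by the view-temporal fusion module across multiple camera views before lifting, rather than by a single-view filter as sketched here.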
