Proactive safety hazard identification using visual–text semantic similarity for construction safety management
Automation in Construction (IF 9.6) · Pub Date: 2024-07-12 · DOI: 10.1016/j.autcon.2024.105602
Yiheng Wang, Bo Xiao, Ahmed Bouferguene, Mohamed Al-Hussein

Automated safety management in construction can reduce injuries by identifying hazardous postures, actions, and missing personal protective equipment (PPE). However, existing computer vision (CV) methods have limitations in connecting recognition results to text-based safety rules. To address this issue, this paper presents a multi-modal framework that bridges the gap between construction image monitoring and safety knowledge. The framework includes an image processing module that utilizes CV and dense image captioning techniques, and a text processing module that employs natural language processing for semantic similarity evaluation. Experiments showed a mean average precision of 49.6% in dense captioning and an F1 score of 74.3% in hazard identification. While the proposed framework demonstrates a promising multi-modal approach towards automated safety hazard identification and reasoning, improvements in dataset size and model performance are still needed to enhance its effectiveness in real-world applications.
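The text-processing step described above, matching a generated image caption against text-based safety rules by semantic similarity, can be sketched as follows. This is a toy illustration, not the paper's actual method: it uses bag-of-words cosine similarity in place of the learned sentence embeddings an NLP module would compute, and the caption, rule texts, and threshold value are all hypothetical.

```python
from collections import Counter
import math

def similarity(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words term counts -- a simple
    stand-in for learned sentence embeddings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def identify_hazards(caption: str, rules: list[str], threshold: float = 0.3):
    """Flag every safety rule whose similarity to the caption meets
    the threshold (threshold value is illustrative, not from the paper)."""
    return [(rule, score) for rule in rules
            if (score := similarity(caption, rule)) >= threshold]

# Hypothetical dense-caption output and rule texts, for illustration only.
caption = "a worker on a scaffold without a hard hat"
rules = [
    "a worker on site must wear a hard hat",
    "ladders must be secured before climbing",
]
matches = identify_hazards(caption, rules)  # only the hard-hat rule matches
```

In a real pipeline the bag-of-words vectors would be replaced by embeddings from a pretrained language model, which can match a caption to a rule even when they share no surface vocabulary.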
