Construction safety inspection with contrastive language-image pre-training (CLIP) image captioning and attention
Automation in Construction (IF 9.6), Pub Date: 2024-11-22, DOI: 10.1016/j.autcon.2024.105863
Wei-Lun Tsai, Phuong-Linh Le, Wang-Fat Ho, Nai-Wen Chi, Jacob J. Lin, Shuai Tang, Shang-Hsien Hsieh

Traditional safety inspections require significant human effort and time to capture site photos and textual descriptions. While standardized forms and image captioning techniques have been explored to improve inspection efficiency, compiling reports with both visual and text data remains challenging due to the multiplicity of safety-related knowledge. To assist inspectors in evaluating violations more efficiently, this paper presents an image-language model, utilizing Contrastive Language-Image Pre-training (CLIP) fine-tuning and prefix captioning to automatically generate safety observations. A user-friendly mobile phone application has been created to streamline safety report documentation for site engineers. The language model successfully classifies nine violation types with an average accuracy of 73.7%, outperforming the baseline model by 41.8%. Experiment participants confirmed that the mobile application is helpful for safety inspections. This automated framework simplifies safety documentation by identifying violation scenes through images, improves overall safety performance, and supports the digital transformation of construction sites.

Updated: 2024-11-22