Integrated vision language and foundation model for automated estimation of building lowest floor elevation
Computer-Aided Civil and Infrastructure Engineering (IF 8.5) · Pub Date: 2024-07-26 · DOI: 10.1111/mice.13310 · Yu-Hsuan Ho, Longxiang Li, Ali Mostafavi
Street view imagery has emerged as a valuable resource for urban analytics research. Recent studies have explored its potential for estimating lowest floor elevation (LFE), offering a scalable alternative to traditional on‐site measurements, crucial for assessing properties' flood risk and damage extent. While existing methods rely on object detection, the introduction of image segmentation has expanded the utility of street view images for LFE estimation, although challenges still remain in segmentation quality and capability to distinguish front doors from other doors. To address these challenges in LFE estimation, this study integrates the Segment Anything model, a segmentation foundation model, with vision language models (VLMs) to conduct text‐prompt image segmentation on street view images for LFE estimation. By evaluating various VLMs, integration methods, and text prompts, the most suitable model was identified for street view image analytics and LFE estimation tasks, thereby improving the coverage of the current LFE estimation model based on image segmentation from 33% to 56% of properties. Remarkably, our proposed method, ELEV‐VISION‐SAM, significantly enhances the availability of LFE estimation to almost all properties in which the front door is visible in the street view image. In addition, the findings present the first baseline and quantified comparison of various vision models for street view image‐based LFE estimation. The model and findings not only contribute to advancing street view image segmentation for urban analytics but also provide a novel approach for image segmentation tasks for other civil engineering and infrastructure analytics tasks.
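The abstract describes coupling a vision language model (VLM) with the Segment Anything model to perform text-prompted segmentation of front doors in street view images. As a rough illustration of that general VLM-plus-SAM pattern, and not the authors' ELEV-VISION-SAM implementation, the sketch below passes a bounding box from a hypothetical text-prompted detector (`detect_front_door_box`, a placeholder for whichever VLM or open-vocabulary detector is used) to SAM's box-prompted predictor via Meta's public `segment_anything` API.

```python
# Hedged sketch: text-prompted front-door segmentation by chaining a
# vision language model (VLM) detector with the Segment Anything model (SAM).
# Illustrates the VLM + SAM pattern described in the abstract; it is NOT the
# authors' ELEV-VISION-SAM pipeline.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor


def detect_front_door_box(image_rgb: np.ndarray, prompt: str) -> np.ndarray:
    """Placeholder for a text-prompted detector (e.g., an open-vocabulary
    VLM). Should return a single XYXY pixel box matching `prompt`.
    Hypothetical helper, not a real library call."""
    raise NotImplementedError("plug in a VLM / open-vocabulary detector here")


def segment_front_door(image_path: str,
                       sam_checkpoint: str = "sam_vit_h_4b8939.pth",
                       prompt: str = "front door") -> np.ndarray:
    """Return a binary mask of the front door in a street view image."""
    image_bgr = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

    # 1) VLM step: turn the text prompt into a bounding box.
    box_xyxy = detect_front_door_box(image_rgb, prompt)  # shape (4,)

    # 2) SAM step: refine the box prompt into a pixel-accurate mask.
    sam = sam_model_registry["vit_h"](checkpoint=sam_checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)
    masks, scores, _ = predictor.predict(box=box_xyxy, multimask_output=False)

    # In an LFE workflow, the lowest pixel of this door mask would then be
    # combined with depth/geometry information to estimate floor elevation.
    return masks[0]
```

This sketch only shows the segmentation hand-off; the study itself evaluates multiple VLMs, integration methods, and text prompts, and the door-to-elevation computation is outside the scope of this example.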
Updated: 2024-07-26