Transformer fusion-based scale-aware attention network for multispectral victim detection
Complex & Intelligent Systems (IF 5.0), Pub Date: 2024-06-16, DOI: 10.1007/s40747-024-01515-y
Yunfan Chen, Yuting Li, Wenqi Zheng, Xiangkui Wan

In the aftermath of a natural disaster, victims trapped in rubble are difficult for smart drones to detect, because visibility is poor in the adverse disaster environment and the victims appear at widely varying sizes. To address these challenges, a transformer fusion-based scale-aware attention network (TFSANet) is proposed. It counters adverse environmental conditions in disaster areas by robustly integrating the latent interactions between RGB and thermal images, and it addresses the detection of victims of various sizes. First, a transformer fusion model is developed and combined with a two-stream backbone network to effectively fuse the complementary characteristics of RGB and thermal images, so that victims remain detectable when the disaster area is obscured by conditions such as smog and heavy rain. Second, a scale-aware attention mechanism is embedded into the head network to adaptively adjust the size of the receptive fields, so that victims at different scales can be captured. Extensive experiments on two challenging datasets show that TFSANet achieves superior results. The proposed method achieves 86.56% average precision (AP) on the National Institute of Informatics—Chiba University (NII-CU) multispectral aerial person detection dataset, outperforming the state-of-the-art approach by 4.38%. On the drone-captured RGBT person detection (RGBTDronePerson) dataset, the proposed method improves on the AP of the state-of-the-art approach by 4.33%.
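The abstract names two mechanisms but does not give their equations. Below is a minimal, dependency-free Python sketch of each general idea under stated assumptions; it is not the authors' implementation. `fuse_rgb_thermal` assumes a generic transformer-style fusion that concatenates RGB and thermal feature tokens and applies one self-attention step over the joint sequence, so each token can draw on both modalities. `scale_aware_attention` assumes a selective-kernel-style design that runs branches with different dilation rates (and therefore different receptive fields) and blends them with softmax weights. All function names, the identity Q/K/V projections, and the use of 1-D signals in place of 2-D feature maps are hypothetical simplifications.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_rgb_thermal(rgb: List[Vector], thermal: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical cross-modal fusion (not TFSANet's published design):
    concatenate RGB and thermal tokens, apply one single-head self-attention
    step with identity Q/K/V projections, and add a residual connection."""
    tokens = rgb + thermal  # joint sequence of 2N tokens, each of dimension d
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in tokens:
        # Scaled dot-product scores of this token against every token of both modalities.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        attended = [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        fused.append([qj + aj for qj, aj in zip(q, attended)])  # residual add
    n = len(rgb)
    # Split the fused sequence back into the two backbone streams.
    return fused[:n], fused[n:]


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1-D cross-correlation with zero padding; a larger dilation
    widens the receptive field without adding weights."""
    k = len(kernel)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for t, w in enumerate(kernel):
            j = i + (t - k // 2) * dilation
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def scale_aware_attention(
    signal: Sequence[float], kernel: Sequence[float], dilations: Sequence[int] = (1, 2, 3)
) -> Tuple[List[float], List[float]]:
    """Hypothetical selective-kernel-style scale attention: one branch per
    dilation rate, each scored by its global average response, then blended
    with softmax weights so the effective receptive field adapts to the input."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    descriptors = [sum(b) / len(b) for b in branches]  # global average pooling per branch
    weights = softmax(descriptors)
    out = [sum(w * b[i] for w, b in zip(weights, branches)) for i in range(len(signal))]
    return out, weights
```

For example, `scale_aware_attention([0, 0, 1, 0, 0], [1, 1, 1], dilations=(1, 2))` blends a dilation-1 branch and a dilation-2 branch and returns their mixing weights. In a trained network the attention projections and the branch-scoring layers would be learned; they are omitted here only to keep the sketch short.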

Updated: 2024-06-17