A hybrid CNN-transformer network: Accurate and efficient semantic segmentation of crops and weeds on resource-constrained embedded devices
Crop Protection ( IF 2.5 ) Pub Date : 2024-11-12 , DOI: 10.1016/j.cropro.2024.107018
Yifan Wei, Yuncong Feng, Dongcheng Zu, Xiaoli Zhang

Weed control plays a crucial role in agricultural production. Deploying advanced vision algorithms on intelligent weeding robots enables autonomous and efficient resolution of weed-related issues. Vision transformers are highly sensitive to plant texture and shape, but their computational cost is prohibitive. Consequently, we propose a novel hybrid CNN-transformer network for the semantic segmentation of crops and weeds on resource-constrained embedded devices. Our network follows an encoder–decoder structure, incorporating the proposed concat extended downsampling block in the encoder, which increases inference speed by reducing memory access time and improves the accuracy of feature extraction. For global semantic extraction, we introduce the proposed parallel input transformer semantic enhancement module, which employs a shared transformer block to increase the computation rate. Additionally, the global–local semantic fusion block effectively mitigates the semantic gap problem. To fully exploit the transformer's ability to process plant texture and shape, we employ a fusion enhancement block in the decoder, thus minimizing the loss of feature information. Segmentation results on three public benchmark datasets show that our network outperforms commonly used CNN-based, transformer-based, and hybrid CNN-transformer methods in segmentation accuracy. Moreover, our network comprises only 0.1887M parameters and 0.2145G floating-point operations. We also evaluate inference speed on an NVIDIA Jetson Orin NX embedded system, where a single image is processed in 28.28 ms, corresponding to a detection speed of 35.36 FPS. The experimental results highlight that our network achieves the best inference speed and the strongest segmentation performance on resource-constrained embedded systems.
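As a quick consistency check on the reported throughput, the 35.36 FPS detection speed follows directly from the 28.28 ms per-image latency (a minimal sketch; the variable names are illustrative, not from the paper):

```python
# Reported single-image inference latency on the NVIDIA Jetson Orin NX (ms).
latency_ms = 28.28

# Throughput in frames per second is the reciprocal of per-frame latency.
fps = 1000.0 / latency_ms
print(f"{fps:.2f} FPS")  # ≈ 35.36 FPS, matching the reported detection speed
```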

Updated: 2024-11-12