Few-shot agricultural pest recognition based on multimodal masked autoencoder, Crop Protection



Crop Protection (IF 2.5), Pub Date: 2024-10-28, DOI: 10.1016/j.cropro.2024.106993
Yinshuo Zhang, Lei Chen, Yuan Yuan

Visual recognition methods based on deep convolutional neural networks perform well in pest diagnosis and have become a research hotspot. However, agricultural pest recognition faces several challenges: few training samples, category imbalance, similarity in appearance, and small pest targets. Existing deep learning-based pest recognition methods typically rely solely on unimodal image data, so their recognition performance depends heavily on the size and quality of the annotated training dataset. Constructing large-scale, high-quality pest datasets carries significant economic and technical costs, which limits how well existing methods generalize in practice. To address these challenges, this paper proposes a few-shot pest recognition model called MMAE (multimodal masked autoencoder). First, the masked autoencoder in MMAE incorporates self-supervised learning, which can be applied to few-shot datasets and improves recognition accuracy. Second, MMAE embeds textual modal information on top of image modal information, improving pest recognition by exploiting the correlation and complementarity between the two modalities. Experimental results show that MMAE outperforms the existing models it was compared against, reaching a recognition accuracy of 98.12%, which is 1.61 percentage points higher than the current state-of-the-art MAE method. This work shows that introducing textual information can help the visual encoder capture agricultural pest features at a finer level of granularity, providing a methodological reference for agricultural pest recognition under few-shot conditions.
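The abstract does not give implementation details, so the following is only a rough sketch of the two ideas it names: random patch masking in the style of a masked autoencoder, and combining image tokens with text tokens. Everything specific here is an assumption for illustration, not the authors' method: the function names `patchify`, `random_mask_patches`, and `fuse_tokens` are hypothetical, the 0.75 mask ratio is the default from the original MAE work rather than a value reported for MMAE, and simple token concatenation stands in for whatever fusion mechanism the paper actually uses.

```python
import numpy as np


def patchify(image, patch_size):
    """Cut an (H, W, C) image into flattened, non-overlapping square patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    x = image.reshape(gh, patch_size, gw, patch_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, C)
    return x.reshape(gh * gw, patch_size * patch_size * c)


def random_mask_patches(patches, mask_ratio=0.75, rng=None):
    """Keep a random subset of patches; return them, their indices, and a mask.

    mask[i] is True where patch i is hidden from the encoder.
    mask_ratio=0.75 is an assumed value, not one taken from the MMAE paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_patches = patches.shape[0]
    num_keep = int(round(num_patches * (1.0 - mask_ratio)))
    ids_keep = np.sort(rng.permutation(num_patches)[:num_keep])
    mask = np.ones(num_patches, dtype=bool)
    mask[ids_keep] = False
    return patches[ids_keep], ids_keep, mask


def fuse_tokens(visible_image_tokens, text_tokens):
    """Assumed fusion: concatenate image and text tokens along the sequence axis."""
    if visible_image_tokens.shape[1] != text_tokens.shape[1]:
        raise ValueError("image and text tokens must share an embedding width")
    return np.concatenate([visible_image_tokens, text_tokens], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((224, 224, 3))        # placeholder for a pest photo
    patches = patchify(image, patch_size=16)  # 14 x 14 grid of patches
    visible, ids_keep, mask = random_mask_patches(patches, 0.75, rng)
    text_tokens = rng.random((8, patches.shape[1]))  # placeholder text embeddings
    fused = fuse_tokens(visible, text_tokens)
    print(patches.shape, visible.shape, int(mask.sum()), fused.shape)
```

In a real masked autoencoder the encoder sees only the visible tokens and a decoder is trained to reconstruct the masked patches; that reconstruction objective is what makes the pretraining self-supervised, which is why it can be run on small labelled datasets.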


Updated: 2024-10-28
