Self-supervised learning of Vision Transformers for digital soil mapping using visual data


Geoderma (IF 5.6), Pub Date: 2024-10-10, DOI: 10.1016/j.geoderma.2024.117056
Paul Tresson, Maxime Dumont, Marc Jaeger, Frédéric Borne, Stéphane Boivin, Loïc Marie-Louise, Jérémie François, Hassan Boukcim, Hervé Goëau

In arid environments, prospecting for cultivable land is challenging because of harsh climatic conditions and vast, hard-to-access areas. However, the soil is often bare, with little vegetation cover, which makes it easy to observe from above. Remote sensing can therefore drastically reduce the cost of exploring these areas. Over the past few years, deep learning has extended remote sensing analysis, first with Convolutional Neural Networks (CNNs), then with Vision Transformers (ViTs). The main drawback of deep learning methods is their reliance on large calibration datasets: data collection is cumbersome and costly, particularly in drylands. However, recent studies demonstrate that ViTs can be pre-trained in a self-supervised manner, taking advantage of large amounts of unlabelled data. These backbone models can then be fine-tuned into a supervised regression model using only a small amount of labelled data.
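The last step described above, fitting a supervised regression model on top of a pre-trained backbone with few labelled samples, can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the "backbone" is a fixed random projection standing in for a self-supervised ViT, the inputs and the target are synthetic, and only a linear head is trained while the backbone stays frozen (a simplified form of fine-tuning, often called linear probing). All function names are hypothetical.

```python
import random


def make_frozen_backbone(in_dim, feat_dim, seed=0):
    """Hypothetical stand-in for a pre-trained encoder.

    A fixed random projection followed by ReLU. Its weights are never
    updated, which mimics a frozen backbone.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) / in_dim ** 0.5 for _ in range(in_dim)]
               for _ in range(feat_dim)]

    def backbone(x):
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

    return backbone


def predict(head, features):
    """Linear regression head: weighted sum of features plus a bias."""
    w, b = head
    return sum(wi * fi for wi, fi in zip(w, features)) + b


def mse(head, feats, targets):
    """Mean squared error of the head over a set of feature vectors."""
    return sum((predict(head, f) - y) ** 2
               for f, y in zip(feats, targets)) / len(targets)


def fit_head(feats, targets, lr=0.02, epochs=1000):
    """Train only the regression head with full-batch gradient descent."""
    dim, n = len(feats[0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(feats, targets):
            err = predict((w, b), f) - y
            for i in range(dim):
                grad_w[i] += 2.0 * err * f[i] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Synthetic "few labelled data" setting (all values are made up).
rng = random.Random(42)
in_dim, feat_dim, n_labelled = 16, 8, 20
backbone = make_frozen_backbone(in_dim, feat_dim)
inputs = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(n_labelled)]
targets = [sum(x[:4]) + 1.0 for x in inputs]  # synthetic regression target

feats = [backbone(x) for x in inputs]  # backbone is applied, never trained
initial_mse = mse(([0.0] * feat_dim, 0.0), feats, targets)
head = fit_head(feats, targets)
final_mse = mse(head, feats, targets)
print(f"MSE before training the head: {initial_mse:.3f}")
print(f"MSE after training the head:  {final_mse:.3f}")
```

In a real pipeline the stand-in backbone would be replaced by an actual pre-trained ViT from a deep learning framework, and some or all backbone layers might also be updated during fine-tuning. The sketch only shows why few labels can suffice: the trainable part is small once the features come from a pre-trained encoder.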


Updated: 2024-10-10
