On crop yield modelling, predicting, and forecasting and addressing the common issues in published studies
Precision Agriculture (IF 5.4), Pub Date: 2024-12-07, DOI: 10.1007/s11119-024-10212-2
Patrick Filippi, Si Yang Han, Thomas F.A. Bishop

There has been a recent surge in the number of studies that aim to model crop yield using data-driven approaches. This is largely due to the increasing availability of remote sensing data (e.g. satellite imagery) and precision agriculture data (e.g. high-resolution crop yield monitor data), as well as the abundance of machine learning modelling approaches. However, there are several common issues in published studies in the field of precision agriculture (PA) that must be addressed. These include the terminology used in relation to crop yield modelling, predicting, forecasting, and interpolating, as well as the way that models are calibrated and validated. As a typical example, many studies take a crop yield map, or several plots within a field, from a single season, build a model with satellite or Unmanned Aerial Vehicle (UAV) imagery, validate it using data-splitting or some kind of cross-validation (e.g. k-fold), and describe the result as a ‘prediction’ or ‘forecast’ of crop yield. This is a problem because the approach does not test the forecasting ability of the model: the model is built on the same season that it is validated with, which substantially overestimates its value for decision-making, such as an in-season application of fertiliser. This is an all-too-common flaw in the logic of many published studies. Moving forward, it is essential that clear definitions and guidelines for data-driven yield modelling and validation are outlined so that there is a closer connection between the goal of a study and its actual outputs/outcomes. To demonstrate this, the current study uses a case study dataset from a collection of large neighbouring farms in New South Wales, Australia. The dataset includes 160 yield maps of winter wheat (Triticum aestivum) covering 26,400 hectares over a 10-year period (2014–2023).
Machine learning crop yield models are built at 30 m spatial resolution with a suite of predictor data layers that relate to crop yield. These include datasets that represent soil variation, terrain, weather, and satellite imagery of the crop. Predictions are made at both the within-field (30 m) and field resolution. Crop yield predictions are useful for an array of applications, so four experiments were set up to reflect different scenarios: Experiment 1, forecasting yield mid-season (e.g. for mid-season fertilisation); Experiment 2, forecasting yield late-season (e.g. for late-season logistics/forward selling); Experiment 3, predicting yield in a previous season for a field with no yield data in that season; and Experiment 4, predicting yield in a previous season for a field with some yield data (e.g. two combine harvesters, of which only one was fitted with a yield monitor). This study showcases how different model calibration and validation approaches clearly impact prediction quality, and therefore how results should be interpreted in data-driven crop yield modelling studies. This is key for ensuring that the wealth of data-driven crop yield modelling studies not only contribute to the science, but also deliver actual value to growers, industry, and governments.
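
The contrast the abstract draws between same-season cross-validation and a genuine forecast can be illustrated with a small sketch of two validation splits. This is not the authors' code: the `Sample` record, both function names, and the toy data are assumptions made for illustration only. The first splitter trains only on seasons strictly earlier than the test season, which is one plausible reading of the forecasting scenarios (Experiments 1 and 2). The second holds out a whole field in one season, which is one plausible reading of Experiment 3.

```python
from collections import namedtuple

# Hypothetical record layout (an assumption): one row per 30 m pixel,
# tagged with the field it belongs to and the season it was harvested in.
Sample = namedtuple("Sample", ["field", "season"])


def forward_season_splits(samples, min_train_seasons=1):
    """Yield (test_season, train_idx, test_idx) where every training sample
    comes from a season strictly earlier than the test season."""
    seasons = sorted({s.season for s in samples})
    for pos, test_season in enumerate(seasons):
        if pos < min_train_seasons:
            continue  # not enough earlier seasons to train on yet
        train_idx = [i for i, s in enumerate(samples) if s.season < test_season]
        test_idx = [i for i, s in enumerate(samples) if s.season == test_season]
        yield test_season, train_idx, test_idx


def leave_field_season_out_splits(samples):
    """Yield ((field, season), train_idx, test_idx), holding out every sample
    of one field in one season and training on everything else."""
    groups = sorted({(s.field, s.season) for s in samples})
    for group in groups:
        test_idx = [i for i, s in enumerate(samples) if (s.field, s.season) == group]
        train_idx = [i for i, s in enumerate(samples) if (s.field, s.season) != group]
        yield group, train_idx, test_idx


# Toy data: 3 seasons x 2 fields x 2 pixels per field.
samples = [Sample(f, y) for y in (2021, 2022, 2023) for f in ("A", "B") for _ in range(2)]
forward = list(forward_season_splits(samples))
held_out = list(leave_field_season_out_splits(samples))
```

A random k-fold split over the same `samples` would place pixels from the same field and season in both the training and test sets, which is the situation the abstract says overestimates a model's forecasting value.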




Updated: 2024-12-07