An investigation of the impact of imbalance on the analysis of the US crop variety evaluation program data
Crop Science ( IF 2.0 ) Pub Date : 2024-05-24 , DOI: 10.1002/csc2.21262
Zhou Fang 1 , Dewayne D. Deng 2 , Johnie N. Jenkins 2 , Qian M. Zhou 1

Multi-environment trial data from many crop variety evaluation programs are imbalanced because only a subset of varieties is selected for testing in the following year, which leaves many variety-by-year combinations missing. Inspired by the US National Cotton Variety Test trial, we conducted new simulation studies to investigate selection processes that differ from those in the existing literature. Our four main contributions are as follows. First, we adopted a framework that uses logistic regression to generate imbalanced data that follow a missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR) mechanism. Second, our selection process can depend on multiple traits, whereas all existing studies used only a single trait for selection. Third, besides variance components (VCs), long-term trends that reflect genetic and non-genetic development are of interest because the simulated data span over 30 years. Last, we evaluated the prediction accuracy for a variety's overall and location-specific performance. The results show that estimation of the VCs and long-term trends is worst under MNAR when a single trait is used for selection. Compared with VC estimation, long-term trend estimation is more influenced by the missing mechanism and the missing rate. However, the prediction accuracy for a variety's performance is mainly driven by the missing rate and is less sensitive to the selection process. If the genetic and non-genetic long-term trends are ignored, both estimation and prediction deteriorate. More testing years would improve estimation and prediction, despite a higher missing rate.
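The logistic-regression idea in the first contribution can be illustrated with a minimal sketch. This is not the authors' simulation code: the function name `selection_prob`, the coefficients, the single-trait setup, and the two-year layout are all assumptions made here for illustration. The probability that a variety is retained for the next year is a logistic function of nothing (missing completely at random), of an already observed trait value (missing at random), or of the value that would itself go missing (missing not at random).

```python
import numpy as np

def selection_prob(mechanism, observed, unobserved, intercept=0.0, slope=1.5):
    """Probability that each variety is kept for the following year.

    Hypothetical helper: intercept and slope are illustrative values,
    not estimates taken from the paper.
    """
    if mechanism == "MCAR":
        # Selection ignores the data entirely.
        linear = np.full(observed.shape, intercept, dtype=float)
    elif mechanism == "MAR":
        # Selection depends only on data that were observed.
        linear = intercept + slope * observed
    elif mechanism == "MNAR":
        # Selection depends on the value that may end up missing.
        linear = intercept + slope * unobserved
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return 1.0 / (1.0 + np.exp(-linear))

rng = np.random.default_rng(0)
n_varieties = 2000

# Assumed toy model: one genetic effect per variety plus yearly noise.
genetic = rng.normal(size=n_varieties)
year1 = genetic + rng.normal(scale=0.5, size=n_varieties)  # always observed
year2 = genetic + rng.normal(scale=0.5, size=n_varieties)  # subject to selection

for mechanism in ("MCAR", "MAR", "MNAR"):
    p_keep = selection_prob(mechanism, year1, year2)
    kept = rng.random(n_varieties) < p_keep
    # Varieties that were not selected have no year-2 record.
    year2_obs = np.where(kept, year2, np.nan)
    print(mechanism,
          "missing rate:", round(float(1.0 - kept.mean()), 3),
          "mean of retained year-2 values:", round(float(np.nanmean(year2_obs)), 3))
```

In this toy setup the intercept controls the overall missing rate and the slope controls how strongly selection favors high-performing varieties. A selection rule based on several traits could be sketched by replacing the single slope term with a weighted sum of trait values.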

Updated: 2024-05-24