Using machine learning to link climate, phylogeny and leaf area in eucalypts through a 50-fold expansion of leaf trait datasets

Journal of Ecology (IF 5.3), Pub Date: 2024-07-14, DOI: 10.1111/1365-2745.14354
Karina Guo 1,2, William K. Cornwell 2, Jason G. Bragg 1,2


1 INTRODUCTION

Leaves are a fundamental unit of photosynthesis, and their size influences many ecological processes. This has led to an extensive body of research on leaf size, ranging from its role in regulating carbon flux over vast areas of the Earth (Reich, 2012) to its influence on ecosystem dynamics through individual plant growth and survival (Leigh et al., 2017; Wang et al., 2019; Wright et al., 2017). Understanding variation in leaf size can potentially facilitate better predictions of adaptive shifts in traits (Pritzkow et al., 2020; Wang et al., 2022). This will enable better comprehension of leaf energy balances (Wright et al., 2017) and their relationship with models of forest productivity and plantation growth (Battaglia et al., 1998; Madani et al., 2018; Reich, 2012; Violle et al., 2007).

Across climatic gradients, leaf area has generally been found to increase from dry to wet environments and from colder to hotter climates (Moles et al., 2014; Peppe et al., 2011; Souza et al., 2018; Wright et al., 2017). One proposed explanation is that smaller leaves exchange heat and gases with the environment more readily due to a reduced boundary layer. This likely promotes cooling in warm environments, provided sufficient water is available (Leigh et al., 2017; Nobel, 2009). However, trait–climate relationships are complex, and can also depend on spatial scale and evolutionary history (Ackerly et al., 2002; McDonald et al., 2003; Milla & Reich, 2011). For example, global trait–climate relationships have been found to be decoupled at local scales (Ackerly & Cornwell, 2007; Reich et al., 2003). At this scale, when trait–climate relationships are examined within species (intraspecific trait variation, or ITV) across a very diverse set of species, they are sometimes reported to be weaker, absent, or even opposite in direction (Ackerly et al., 2002; An et al., 2021; McDonald et al., 2003; Westerband et al., 2021; Wilde et al., 2023). Several possible scenarios are illustrated in Figure 1, depicting ways in which trait–climate associations might vary at different scales within a large clade. In Scenario 1, adaptation to climate leads to concordant trait–climate associations within and between species. In Scenario 2, gene flow between populations may prevent adaptation to local environments within species, counteracting environmental pressures (Alexander et al., 2022; Leimu & Fischer, 2008). In Scenario 3, phenotypic adaptation is limited over shorter time scales, so that trait–climate associations are observed primarily in deep clades (An et al., 2021; Leimu & Fischer, 2008). Studying links between leaf traits and climatic variables across varying evolutionary scales, from ITV (e.g. An et al., 2021) to major plant clades (e.g. Ackerly & Reich, 1999; Wilde et al., 2023), is critical to predicting phenotypic evolution and shifts in traits under a changing climate.

**FIGURE 1** Three scenarios illustrating impacts of evolutionary divergence and intraspecific gene flow on trait–climate relationships. Groups A and A* are populations of the same species and remain connected to each other by gene flow. Groups C and D are separate species that have quite recently, but completely, diverged and share limited recent gene flow. The circles represent different internal nodes within the hypothetical phylogenetic tree. In all three scenarios, there is a positive overall trait–climate association. In Scenario 1, there is a strong trait–climate relationship within each of the two recently diverged clades, resulting in roughly similar slopes in each clade. In Scenario 2, gene flow within species suppresses local adaptation, causing divergence from overall trait–climate trends, as seen in groups A and A*. This effect is relaxed in groups C and D because of the lack of gene flow, so they exhibit a strong trait–climate relationship. In Scenario 3, trait evolution is more constrained, so that strong adaptation is observed only among groups that diverged longer ago. Here, there are no trait–climate relationships within the clade containing A and A* or the clade containing C and D, but there is an association overall, reflecting adaptation over longer time scales.
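The contrast between Scenario 1 and Scenario 3 can be made concrete by comparing regression slopes fitted within groups against the slope fitted to the pooled data. The sketch below is only an illustration with invented toy numbers; the variable names, the climate and trait values, and the use of an ordinary least-squares slope are all assumptions made for this example and are not the authors' data or analysis.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def summarize(x_groups, y_groups):
    """Return (list of within-group slopes, pooled slope)."""
    within = [ols_slope(x, y) for x, y in zip(x_groups, y_groups)]
    pooled_x = [v for group in x_groups for v in group]
    pooled_y = [v for group in y_groups for v in group]
    return within, ols_slope(pooled_x, pooled_y)


# Hypothetical climate values (arbitrary units) for two groups
# occupying different parts of a climate gradient.
climate_low = [1.0, 2.0, 3.0]
climate_high = [7.0, 8.0, 9.0]

# Scenario 1 (toy version): the trait tracks climate within each group.
s1_low = [0.5 * c for c in climate_low]
s1_high = [0.5 * c for c in climate_high]

# Scenario 3 (toy version): the trait is constant within each group
# and differs only between groups.
s3_low = [1.0, 1.0, 1.0]
s3_high = [4.0, 4.0, 4.0]

for name, traits in [("Scenario 1", [s1_low, s1_high]),
                     ("Scenario 3", [s3_low, s3_high])]:
    within, pooled = summarize([climate_low, climate_high], traits)
    print(name, "within-group slopes:", within, "pooled slope:", pooled)
```

In the first toy dataset the within-group and pooled slopes agree, whereas in the second the within-group slopes are zero while the pooled slope remains positive, which is the pattern the caption describes for Scenario 3. Scenario 2 would correspond to a mixture, with flat slopes in some groups and positive slopes in others.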

Currently, there is a paucity of research that examines leaf area variation from the perspectives of phylogeny and ITV simultaneously, despite its potential usefulness (Leimu & Fischer, 2008; Mudrák et al., 2019; Souza et al., 2018). One potential reason lies in the laborious and time-intensive nature of data collection (Bastias et al., 2017; Li et al., 2020). Manual measurements hinder the acquisition of datasets with high intraspecific sampling within and across different clades and climates (Bastias et al., 2017; Li et al., 2020). Consequently, few studies have spanned intraspecific and phylogenetic scales (see also Brenskelle et al., 2020; Cutts et al., 2021; Goëau et al., 2020; Pearson et al., 2020; Wilde et al., 2023), possibly resulting in conflicting reports on the effects of ITV (also suggested by Bastias et al., 2017; Li et al., 2020).

This study addresses this issue by using machine learning (ML) paired with herbarium records. Herbarium specimens are pressed plants of various taxa collected globally. These specimens provide a holistic representation of plant material that includes both mature and juvenile leaves, along with mature and immature individuals (Kozlov et al., 2021). These datasets offer extensive phylogenetic and geographic sampling due to the long history of herbaria, but extracting trait data manually is impractical. Thus, we employed ML to automate the extraction of trait data from these specimens. Other studies have successfully used ML in trait extraction from herbarium images (Hussein et al., 2021; Weaver et al., 2020; Wilde et al., 2023; Younis et al., 2018). Our approach extends these studies to explore a focal clade in a phylogenetic framework. This was motivated by a recent study of herbarium specimens (Wilde et al., 2023) that found trait–climate relationships observed at the level of genera were often inconsistent or absent within species. This raises the question: within a large and widely distributed clade, at what phylogenetic or taxonomic scales are trait–climate associations observed? Here we address this question in eucalypt trees. By using ML, we are able to rapidly estimate leaf size from plant specimens and generate a dataset with which to explore shifts in trait–climate associations across different taxonomic levels and across a molecular phylogeny.

Eucalypts are dominant canopy trees throughout many Australian forests and shrublands (Booth et al., 2015; Govindan, 2005). The eucalypt clade consists of three genera, Eucalyptus (L'Hér.), Angophora (Cav.) and Corymbia (K.D. Hill & L.A.S. Johnson). They were selected as our study taxa due to their wide distribution across Australia (Figure 2), the availability of a molecular phylogeny (Thornhill et al., 2019) and their characteristically simple leaves with entire margins, which make this method feasible. By pairing our ML-generated dataset with the fully resolved eucalypt phylogenetic tree (Thornhill et al., 2019), we could link microevolution to macroevolution, facilitating observations of the shift in trait–climate relationships across different clades and evolutionary depths.

In summary, we use ML to address the scarcity of trait datasets spanning wide phylogenetic and spatial scales (Moran et al., 2016). This allows us to build a large dataset to address the following questions: (a) How does leaf area vary in eucalypts across Australian climates? (b) Do associations between leaf area and climate change at different levels of evolutionary depth? We will examine these questions in relation to the scenarios outlined in Figure 1. Addressing these questions will lead to a better understanding of the factors shaping leaf area and provide methods for automating the collection of trait data from specimen images, applicable to additional taxa and traits in the future.


Updated: 2024-07-14
