Plant Biotechnology Journal (IF 10.1), Pub Date: 2024-09-20, DOI: 10.1111/pbi.14479. Junxiang Zhang, Shuang Liu, Shuo Zhao, Yuxin Nie, Zhihong Zhang
Cultivated strawberry (Fragaria × ananassa, 2n = 8x = 56) is an important horticultural crop with substantial economic and nutritional value. Improving cultivated strawberry is challenging not only because of its octoploid genome but also because of polyploidization and frequent homoeologous exchanges, which have replaced substantial portions of some subgenomes with sequences derived from ancestrally related chromosomes (Edger et al., 2019). A high-quality genome for cultivated strawberry would therefore provide important information for identifying agriculturally important genes for breeding. Several cultivated strawberry genomes have been assembled; however, some of the published reference genomes remain incomplete, and others are not truly haplotype-resolved (Edger et al., 2019; Lee et al., 2021; Mao et al., 2023; Song et al., 2024).
Here, we de novo assembled a telomere-to-telomere, haplotype-resolved reference genome with 56 chromosomes (Figure 1a) of the white-fruited strawberry cultivar ‘Chulian’ (Figure S1) by combining PacBio HiFi, ONT ultra-long, Hi-C and Illumina sequencing data. Candidate centromere sequences and regions were identified for each chromosome (Figure S2 and Table S1). We divided the 56 chromosomes into two haplotypes, Hap1 (chrX-X-1) and Hap2 (chrX-X-2), each comprising 28 chromosomes. The final assembly sizes were 787.52 Mb in 33 contigs for Hap1 and 778.03 Mb in 34 contigs for Hap2, with contig N50 values of 27.92 Mb and 26.45 Mb, respectively. By searching for the telomeric repeat (TTTAGGG)n, we identified 52 telomeres in Hap1 and 50 in Hap2 (Figure S2; Table S2).
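The telomere search described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used for the assembly: the function names, the 1 kb end window and the threshold of 10 tandem units are assumptions chosen for the example; only the repeat unit TTTAGGG comes from the text. With 28 chromosomes per haplotype, at most 56 telomeres can be detected per haplotype, so the reported 52 and 50 correspond to 4 and 6 chromosome ends without a detected repeat array.

```python
import re

# Plant-type telomeric repeat named in the text, plus its reverse complement
# (the repeat appears as CCCTAAA when read from the opposite strand).
TELOMERE_UNIT = "TTTAGGG"
TELOMERE_UNIT_RC = "CCCTAAA"


def count_tandem_units(seq, unit):
    """Return the largest number of back-to-back copies of `unit` found in `seq`."""
    pattern = re.compile(f"(?:{unit})+")
    longest = 0
    for match in pattern.finditer(seq):
        longest = max(longest, len(match.group()) // len(unit))
    return longest


def has_telomere(chrom_seq, window=1000, min_units=10):
    """Return (left_end, right_end) booleans for one chromosome sequence.

    `window` and `min_units` are illustrative thresholds (assumptions),
    not values taken from the paper.
    """
    seq = chrom_seq.upper()

    def region_has_array(region):
        best = max(
            count_tandem_units(region, TELOMERE_UNIT),
            count_tandem_units(region, TELOMERE_UNIT_RC),
        )
        return best >= min_units

    return region_has_array(seq[:window]), region_has_array(seq[-window:])
```

Summing the two booleans over all 28 chromosomes of a haplotype would give a telomere count comparable in kind to the figures reported above.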
The completeness and accuracy of the ‘Chulian’ genome assembly were evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO) assessments (Tables S3 and S4), which showed that the assembly had high coverage and quality. A total of 110 001 and 108 859 protein-coding genes were annotated in Hap1 and Hap2, respectively. In addition, 5864 and 5830 transcription factors were predicted in Hap1 and Hap2, respectively. Information on repetitive sequences is provided in Tables S5 and S6.
We conducted a collinearity analysis of Hap1 (reference) and Hap2 (query) to investigate variation between the two haplotype genomes of ‘Chulian’. We discovered 16 315 syntenic blocks totaling ~631 Mb, covering 92.82% and 93.96% of the Hap1 and Hap2 genomes, respectively (Figure S3; Table S7). We also compared ‘Chulian’ with the high-quality genome of the cultivated strawberry ‘Yanli’ (Mao et al., 2023), because the two cultivars differ in phenotypes such as fruit colour, hardness and powdery mildew resistance. The haplotype genomes of ‘Chulian’ and ‘Yanli’ showed high similarity and collinearity (Figure S4). We compared Hap1 and Hap2 of ‘Chulian’ with Hap1 and Hap2 of ‘Yanli’ to analyse the number of structural variations (SVs), their length range and the position of the largest SV on each chromosome (Figure S5). SVs longer than 100 bp located in gene regions (exons and introns), promoter regions (2 kb upstream of the start codon) and downstream regions (2 kb downstream of the stop codon) between ‘Chulian’ and ‘Yanli’ were also catalogued (Appendix S1). Interestingly, relative to ‘Yanli’, many ‘Chulian’ genes with large SVs in their exon and promoter regions were related to disease resistance, including genes encoding LRR-containing receptor protein kinases, TIR-NBS-LRR class proteins, chitinases and putative powdery mildew resistance proteins (Appendix S2). We also found numerous transcription factor genes (WRKY, MYB, MADS-box, bHLH, ERF, bZIP, etc.) of ‘Chulian’ with large SVs in their exon and promoter regions compared with ‘Yanli’ (Appendix S2); the functions of these transcription factors require further investigation.
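The assignment of SVs to exon, intron, promoter and downstream regions can be illustrated with a small interval-overlap sketch. Everything here is a simplified assumption rather than the authors' method: the `Gene` record, the function names and the use of coding-region boundaries (ignoring UTRs) are hypothetical; only the 2 kb flanks and the 100 bp length filter are taken from the text.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

FLANK = 2000       # 2 kb promoter / downstream window, as stated in the text
MIN_SV_LEN = 100   # only SVs longer than 100 bp are considered


@dataclass
class Gene:
    """Hypothetical, simplified gene model (1-based inclusive coordinates)."""
    name: str
    strand: str                     # "+" or "-"
    cds_start: int                  # leftmost coding coordinate
    cds_end: int                    # rightmost coding coordinate
    exons: List[Tuple[int, int]]    # coding exons within cds_start..cds_end


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def classify_sv(gene: Gene, sv_start: int, sv_end: int) -> Set[str]:
    """Return the set of gene regions that an SV interval overlaps."""
    labels: Set[str] = set()
    if sv_end - sv_start + 1 <= MIN_SV_LEN:
        return labels  # too short to be reported

    left = (gene.cds_start - FLANK, gene.cds_start - 1)
    right = (gene.cds_end + 1, gene.cds_end + FLANK)
    # On the minus strand the start codon sits at the right-hand end,
    # so the promoter is the right flank.
    promoter, downstream = (left, right) if gene.strand == "+" else (right, left)

    if overlaps(sv_start, sv_end, *promoter):
        labels.add("promoter")
    if overlaps(sv_start, sv_end, *downstream):
        labels.add("downstream")

    exons = sorted(gene.exons)
    if any(overlaps(sv_start, sv_end, s, e) for s, e in exons):
        labels.add("exon")
    # Introns are the gaps between consecutive exons.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start > prev_end + 1 and overlaps(sv_start, sv_end, prev_end + 1, next_start - 1):
            labels.add("intron")
    return labels
```

Returning a set rather than a single label lets a large SV that spans, for example, both an exon and the adjacent intron be reported under both regions.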
The fruit flesh of ‘Chulian’ is white owing to a loss of anthocyanin accumulation. To identify candidate genes responsible for this white-fruit phenotype, we used the high-quality genome sequences to examine FaMYB10, the master positive regulator of anthocyanin biosynthesis, in ‘Chulian’ and ‘Yanli’. Interestingly, FaMYB10 on chr1-2-1 of ‘Chulian’ carried an 8-bp insertion (‘ACTTATAC’) at nucleotide position 491 (Figure S6a). Because of the resulting premature stop codon, FaMYB10 on chr1-2-1 of ‘Chulian’ encodes a truncated protein of 179 amino acids, compared with 233 amino acids in ‘Yanli’ (Figure S6b). FaMYB10 on chr1-2-2 differed from ‘Yanli’ by only a single nucleotide: a point mutation (C to A) at the 94th nucleotide, resulting in an amino acid substitution from histidine (H) in ‘Yanli’ to asparagine (N) in ‘Chulian’ (Figures S1a, b). Transient functional analysis showed that overexpressing FaMYB10 from chr1-2-1 of ‘Yanli’ rescued the anthocyanin-deficient phenotype of ‘Chulian’ (Figure S7). Interestingly, fruits transiently expressing FaMYB10 from chr1-2-2 of ‘Chulian’ under its own promoter [Pro-CL-FaMYB10(1–2-2)] did not recover from the anthocyanin-deficient phenotype. In contrast, fruits transiently expressing FaMYB10 from chr1-2-2 of ‘Yanli’ under its own promoter [Pro-YL-FaMYB10(1–2-2)] did recover (Figure 1b). Furthermore, the expression levels of some anthocyanin biosynthetic genes increased in fruits expressing Pro-YL-FaMYB10(1–2-2) compared with control fruits (Figure S8). These results suggest that the point mutation in FaMYB10 on chr1-2-2 of ‘Chulian’ affects its function, although the molecular basis awaits further investigation. Together, the 8-bp insertion in FaMYB10 on chr1-2-1 and the point mutation in FaMYB10 on chr1-2-2 are the main causes of the white-fruit phenotype of ‘Chulian’.
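The two FaMYB10 lesions can be checked against the standard genetic code with a short sketch. Nucleotide 94 is the first base of codon 32, and a C to A change at the first base of either histidine codon (CAT, CAC) gives an asparagine codon (AAT, AAC), consistent with the reported H to N substitution. An 8-bp insertion is not a multiple of three, so it shifts the reading frame. The toy coding sequence below is hypothetical and is not the FaMYB10 sequence; the function names are likewise assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_position(nt_pos):
    """Map a 1-based nucleotide position to (1-based codon number, 0-based offset in codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_after(cds, nt_pos, insertion):
    """Insert `insertion` immediately after 1-based position `nt_pos`."""
    return cds[:nt_pos] + insertion + cds[nt_pos:]


# Hypothetical toy CDS (NOT FaMYB10): ATG GCT GTG ATG GCT GCT TAA
TOY_CDS = "ATGGCTGTGATGGCTGCTTAA"
INSERTION = "ACTTATAC"  # the 8-bp insertion sequence reported in the text

wild_type = translate(TOY_CDS)                           # "MAVMAA"
mutant = translate(insert_after(TOY_CDS, 6, INSERTION))  # "MATYT"
```

In the toy example the frameshift exposes an in-frame TGA shortly after the insertion, so the mutant product is shorter than the wild type, the same kind of effect as the truncation from 233 to 179 amino acids described above.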
Cultivated strawberry is an allo-octoploid species with four subgenomes (Edger et al., 2019). Homoeologous genes from different subgenomes often differ in expression, and dominant expression of one homoeolog has been detected in many allopolyploid species. During the development of ‘Chulian’ fruits, FaMYB10 on chr1-2 was the dominantly expressed homoeolog (Figure S9). However, the fruit skin of ‘Chulian’ turned red and accumulated anthocyanin under light treatment (Figure 1c). We performed RNA-seq on the skin of ripening fruits under light and shade treatments. A total of 5265 genes were differentially expressed, of which 2215 were upregulated and 3050 were downregulated (Figure S10a). KEGG analysis revealed that these differentially expressed genes were mainly involved in plant hormone signal transduction, plant circadian rhythm, protein processing in the endoplasmic reticulum and flavonoid metabolism pathways (Figure S10b). Intriguingly, the transcript level of FaMYB10 on chr1-4 of ‘Chulian’, rather than that of FaMYB10 on chr1-2, was significantly higher in fruit skin under light treatment than under shade treatment (Figure 1d). Moreover, the promoter of FaMYB10 on chr1-4 contained more light-responsive elements and more salicylic acid- and methyl jasmonate-responsive elements than the promoter of FaMYB10 on chr1-2 of ‘Chulian’ (Figure S11; Table S8).
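The split of differentially expressed genes into up- and downregulated sets can be sketched as follows. The text does not state the thresholds used, so the cutoffs below (absolute log2 fold change of at least 1, adjusted p-value below 0.05), the function name and the convention that positive values mean higher expression under light are all assumptions made for illustration. The only figures taken from the text are the reported counts, which are internally consistent: 2215 + 3050 = 5265.

```python
def classify_deg(log2_fold_change, adjusted_p, lfc_cutoff=1.0, p_cutoff=0.05):
    """Label one gene as 'up', 'down' or 'not_significant'.

    Positive log2 fold change is taken to mean higher expression under
    light than under shade (an assumed convention, as are both cutoffs).
    """
    if adjusted_p >= p_cutoff or abs(log2_fold_change) < lfc_cutoff:
        return "not_significant"
    return "up" if log2_fold_change > 0 else "down"


# Reported counts from the text; the sum matches the stated total.
REPORTED_UP = 2215
REPORTED_DOWN = 3050
REPORTED_TOTAL = REPORTED_UP + REPORTED_DOWN
```

Applying `classify_deg` to each gene in a results table and tallying the labels would reproduce the kind of summary shown in the text, provided the same thresholds as the original analysis were used.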
In conclusion, we obtained a high-quality haplotype-resolved genome of the octoploid white-fruited cultivar ‘Chulian’. We found that an 8-bp insertion in the coding region of FaMYB10 on chr1-2-1 and a single-nucleotide mutation in FaMYB10 on chr1-2-2 were associated with the loss of anthocyanins in the fruits. Interestingly, we found that light regulated anthocyanin accumulation by activating the expression of FaMYB10 on chr1-4 rather than the homoeolog on chr1-2, which is dominantly expressed during fruit development. These results lay a solid foundation for comparative genomic analysis, for understanding gene expression patterns across the subgenomes of polyploid species and for fruit colour breeding in cultivated strawberry.
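Title (translated from Chinese): Telomere-to-telomere haplotype-resolved genome of white-fruited strawberry reveals the complexity of fruit colour formation in cultivated strawberry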