Plant Biotechnology Journal (IF 10.1), Pub Date: 2024-09-12, DOI: 10.1111/pbi.14466. Authors: Wenting Xu (1), Jingang Liang (2), Fan Wang (1), Litao Yang (1, 3)
Gene copy number is crucial for understanding genomic architecture and its implications in plant and animal genetics (Alonge et al., 2020; Castagnone-Sereno et al., 2019). In agriculture, copy number variations (CNVs) are vital because they affect yield, stress resistance and metabolic capabilities (Yuan et al., 2021). Transgenesis, the introduction of foreign DNA into plant genomes, has revolutionized agriculture by creating genetically modified (GM) plants with desirable traits. Assessing transgene copy number in GM plants helps confirm the stability and expression of introduced traits and is crucial for regulatory compliance and biosafety assessments (Liang et al., 2022). Evaluating gene copy number in transgenic plants is technically challenging because transgene integration events are variable (Faure, 2021). Several techniques, including Southern blotting (SB), quantitative real-time PCR (qPCR), digital PCR (dPCR) and paired-end whole-genome sequencing (PE-WGS), have been reported for gene copy number determination (Cusenza et al., 2021). However, no systematic comparison of these four methods has been reported, especially concerning PE-WGS.
Here, we performed a comparative benchmarking of gene copy number assessment techniques, including SB, qPCR, dPCR and PE-WGS, using four GM crop events (FG72 soybean, 12-5 maize, and G6H1 and G281 rice) as examples (Figure 1a; Data S1).
In SB analysis, we used various restriction endonucleases for genomic DNA digestion. For the event G6H1, BamHI, SacI, KpnI and StuI revealed one copy of cry1Ab/vip3H and G6epsps. The G281 event showed a single copy of hLF but uncertain G6epsps copy numbers (one or two). FG72 displayed inconsistent band patterns for 2mepsps and hppdPfW336, suggesting copy numbers of one or two. Maize 12-5, analysed with KpnI and XbaI, indicated a single copy of G10epsps and cry1Ab/cry2Aj (Figure 1b, Table S1).
We employed absolute quantification via qPCR, using endogenous reference genes SPS, Lectin and zSSIIb for rice, soybean and maize, respectively. All assays were validated for high efficiency and precision (Figure 1c, Table S2). Quantitative real-time PCR results showed G6H1's G6epsps and cry1Ab/vip3H had values of 0.98 and 0.96; G281's G6epsps and hLF were 1.68 and 1.54; FG72's 2mepsps and hppdPfW336 were 1.72 and 1.67; maize 12-5's G10epsps and cry1Ab/cry2Aj were 0.81 and 0.83 (Table S3). These values suggest a single T-DNA fragment integration in G6H1 and 12-5 and two fragments in G281 and FG72.
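The standard-curve arithmetic behind such values can be sketched as follows. This is a minimal illustration, assuming each assay has a linear standard curve of the form Ct = slope * log10(copies) + intercept and that the reported value is the ratio of transgene copies to endogenous reference copies. The function names, curve parameters and Ct values are hypothetical and are not taken from Tables S2 or S3.

```python
def copies_from_ct(ct, slope, intercept):
    """Invert a standard curve Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the curve slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


def transgene_copy_ratio(ct_target, ct_reference, target_curve, reference_curve):
    """Ratio of transgene copies to endogenous reference gene copies."""
    target = copies_from_ct(ct_target, *target_curve)
    reference = copies_from_ct(ct_reference, *reference_curve)
    return target / reference


# Hypothetical (slope, intercept) pair shared by both assays, for illustration only.
curve = (-3.32, 38.0)
efficiency = amplification_efficiency(curve[0])
# The target crossing threshold one cycle earlier than the reference
# corresponds to roughly twice as many target copies.
ratio = transgene_copy_ratio(25.0, 26.0, curve, curve)
print(f"efficiency = {efficiency:.3f}, copy ratio = {ratio:.2f}")
```

In practice the two assays rarely share identical curves, which is one reason the ratio is sensitive to how well each standard curve is calibrated.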
Digital PCR quantifies target DNA absolutely, without a standard curve, and copy number is obtained by comparing the target with an endogenous reference gene. Results showed G6H1 had 0.94 and 0.97 copies for G6epsps and cry1Ab/vip3H. G281's G6epsps and hLF were 1.85 and 1.93. FG72's 2mepsps and hppdPfW336 were 1.69 and 1.68. 12-5's G10epsps and cry1Ab/cry2Aj were 0.57 and 0.59 (Table S4). These results indicate single exogenous gene copies in G6H1 and 12-5 and dual copies in G281 and FG72.
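As a rough illustration of how a dPCR copy ratio is usually derived, the sketch below applies the standard Poisson correction to the fraction of positive partitions. It assumes a duplex reaction in which target and reference are read from the same set of partitions. The partition counts are hypothetical and the function names are illustrative, not part of any instrument software.

```python
import math


def copies_per_partition(positive, total):
    """Poisson-corrected mean copies per partition: lambda = -ln(1 - p)."""
    if not 0 <= positive < total:
        raise ValueError("positive must be >= 0 and smaller than total")
    return -math.log(1.0 - positive / total)


def dpcr_copy_ratio(target_positive, reference_positive, total):
    """Target-to-reference copy ratio from one duplex dPCR run."""
    target = copies_per_partition(target_positive, total)
    reference = copies_per_partition(reference_positive, total)
    return target / reference


# Hypothetical run: 20000 partitions, 15000 positive for the transgene,
# 10000 positive for the endogenous reference gene.
naive_ratio = 15000 / 10000                      # ignores multiple copies per partition
corrected = dpcr_copy_ratio(15000, 10000, 20000)  # accounts for them
print(f"naive = {naive_ratio:.2f}, Poisson-corrected = {corrected:.2f}")
```

The gap between the naive and corrected ratios shows why the correction matters when a large share of partitions is positive.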
In PE-WGS analysis, the sequencing depths for G6H1, G281, FG72 and 12-5 were 28.81×, 28.91×, 48.70× and 23.94×, respectively (Table S5). Read counts for target genes were used to calculate copy numbers: G6H1 had 1.08 copies of G6epsps and 0.83 copies of cry1Ab/vip3H; G281 had 2.01 copies of G6epsps and 1.91 copies of hLF; FG72 had 1.80 copies of 2mepsps and 2.00 copies of hppdPfW336; 12-5 had 0.58 copies of G10epsps and 0.61 copies of cry1Ab/cry2Aj (Table S6).
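The exact formula used for Table S6 is not given here, so the following is only a sketch of one common depth-ratio approach: convert the read count over the transgene into a mean depth and divide it by the genome-wide sequencing depth. The read count, read length and gene length are hypothetical; only the 28.81 depth for G6H1 comes from the text above.

```python
def mean_depth(read_count, read_length, region_length):
    """Approximate mean depth over a region from the number of mapped reads."""
    return read_count * read_length / region_length


def wgs_copy_number(target_reads, read_length, target_length, genome_depth):
    """Copy number estimated as target depth relative to genome-wide depth."""
    return mean_depth(target_reads, read_length, target_length) / genome_depth


# Hypothetical inputs: 300 reads of 150 bp mapped to a 1500 bp transgene,
# compared against the 28.81 genome-wide depth reported for G6H1.
estimate = wgs_copy_number(
    target_reads=300, read_length=150, target_length=1500, genome_depth=28.81
)
print(f"estimated copies = {estimate:.2f}")
```

A real pipeline would also need to handle reads that only partly overlap the gene and uneven coverage caused by GC content, which this sketch ignores.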
Our systematic measurements of four GM events showed that all four techniques are suitable for this purpose to varying degrees. All methods accurately quantified single-copy genes; however, discrepancies emerged for multi-copy genes (Figure 1d). The strengths and limitations of each method are summarized in Figure 1e.
Southern blotting is less accurate and less sensitive for multi-copy genes: it often underestimates copy number because of complex arrangements such as tandem repeats, and it can overestimate copy number because of incomplete digestion and cross-hybridization. Quantitative PCR, while more accurate than SB, struggles with high-copy genes because its resolution is limited to roughly two-fold differences, although proper primer design and reaction optimization can still yield relatively accurate results. Digital PCR achieves high accuracy for multi-copy genes because of its partitioning capability, which allows it to detect minor changes in copy number, such as a 1.2-fold change from 5 to 6 copies (Whale et al., 2012). Paired-end whole-genome sequencing also provides precise quantification for multi-copy genes, given adequate coverage and sophisticated data analysis tools, and is especially useful for genes with complex genomic rearrangements (Hehir-Kwa et al., 2018). Furthermore, PE-WGS has demonstrated high performance in the comprehensive molecular characterization of transgenic plants and animals, including the identification of transgene insertion sites, flanking sequences, entire T-DNA integration structures and plasmid backbone presence, among other features.
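The two-fold resolution limit mentioned for qPCR can be made concrete with a short calculation. Under the idealized assumption of exponential amplification with efficiency E, each cycle multiplies the product by (1 + E), so a fold difference between two samples shifts the Ct by log(fold) / log(1 + E). The function name below is illustrative.

```python
import math


def expected_delta_ct(copies_a, copies_b, efficiency=1.0):
    """Expected Ct shift between two copy numbers under ideal exponential PCR."""
    return math.log(copies_b / copies_a) / math.log(1.0 + efficiency)


two_fold = expected_delta_ct(1, 2)      # one copy versus two copies
five_to_six = expected_delta_ct(5, 6)   # the 1.2-fold change cited for dPCR
print(f"1 -> 2 copies: {two_fold:.2f} cycles")
print(f"5 -> 6 copies: {five_to_six:.2f} cycles")
```

A doubling shifts the Ct by a full cycle, whereas going from 5 to 6 copies shifts it by only about a quarter of a cycle, a difference that replicate-to-replicate Ct scatter can obscure.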
Digital PCR and PE-WGS are most effective for distinguishing heterozygotes from homozygotes. Digital PCR offers absolute quantification without needing a standard curve, providing precise measurements. Paired-end whole-genome sequencing, through high-resolution mapping of paired-end reads, can differentiate between heterozygotes and homozygotes by analysing read depth. Distinguishing homozygotes from heterozygotes with SB is challenging because sequence homology gives both genotypes similar band patterns. Quantitative PCR can distinguish heterozygotes from homozygotes based on Ct values, but it requires careful calibration and controls and is influenced by PCR efficiency.
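One way to turn a measured copy ratio into a zygosity call is sketched below. It assumes a diploid plant, a single-locus insertion and an endogenous reference gene present once per haploid genome, so that a heterozygous plant is expected to give about half the ratio of a homozygous one. The function name, the tolerance and the example ratios are hypothetical and would need calibration against known controls.

```python
def call_zygosity(ratio, copies_per_insertion=1, tolerance=0.2):
    """Classify a transgene/reference ratio as heterozygous, homozygous or ambiguous."""
    expected = {
        "heterozygous": 0.5 * copies_per_insertion,
        "homozygous": 1.0 * copies_per_insertion,
    }
    for label, value in expected.items():
        # Accept the call if the ratio lies within a relative tolerance of the expectation.
        if abs(ratio - value) <= tolerance * value:
            return label
    return "ambiguous"


# Hypothetical ratios for illustration only.
for measured in (0.52, 0.97, 0.75):
    print(measured, call_zygosity(measured))
```

Ratios that fall between the two expected values are reported as ambiguous rather than forced into a class, which matters more for methods with wider measurement scatter.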
Southern blotting and PE-WGS require substantial amounts of DNA, whereas PCR-based methods need significantly less. Quantitative PCR necessitates high-quality DNA free from degradation and PCR inhibitors. Digital PCR is more tolerant of DNA degradation and inhibitors, delivering accurate quantification even with crude DNA extracts (Whale et al., 2012).
Quantitative PCR and dPCR are generally easier to set up and perform than SB and PE-WGS. SB is labor-intensive, involving several complex steps, including DNA digestion, membrane transfer, hybridization and autoradiography. Paired-end whole-genome sequencing requires strict protocols, thorough DNA extraction, library preparation, machine sequencing and data analysis, necessitating personnel skilled in molecular biology and bioinformatics. Quantitative PCR involves optimizing conditions, constructing positive plasmid DNA, creating standard curves and testing, which requires standard molecular laboratory skills. Digital PCR entails sample partitioning using instruments that streamline the process into one platform.
Quantitative PCR requires the least technical expertise, primarily an understanding of primer design and qPCR setup. Digital PCR needs moderate expertise, especially for sample emulsification or droplet/well creation. Southern blotting demands high technical skills, particularly for DNA digestion, gel electrophoresis, transfer and hybridization. Paired-end whole-genome sequencing requires significant expertise in data analytics and bioinformatics, in addition to library preparation and sequencing skills. Quantitative PCR and dPCR experiments are faster than SB and PE-WGS, typically concluding within a day, whereas SB and PE-WGS each require at least 3 days. Digital PCR is quicker than qPCR because it does not require a standard curve.
Costwise, SB is relatively cheap owing to lower reagent costs and basic equipment needs. Quantitative PCR has a medium cost, with moderately expensive reagents and higher throughput, which reduces per-sample costs. Digital PCR is more costly due to expensive equipment and lower throughput but provides absolute quantification without standard curves. Paired-end whole-genome sequencing is the most expensive, justified by its comprehensive genomic characterization capabilities beyond single gene copy estimation.
We propose prioritizing dPCR and PE-WGS for precise gene copy number analysis. Paired-end whole-genome sequencing is especially suited to assessing many genes within a sample, while dPCR is optimal when fewer targets are measured per sample. Together they offer robust tools for genomic research and biotechnological applications.
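Article title (translated from the Chinese-language version of this page): Comparative evaluation of gene copy number estimation techniques in genetically modified crops: insights from Southern blotting, qPCR, dPCR and NGS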