QUIC-seq: quick, ultra-affordable, high-throughput and convenient RNA sequencing
Plant Biotechnology Journal (IF 10.1), Pub Date: 2024-10-24, DOI: 10.1111/pbi.14496
Shouzhen Teng, Dan Wang, Yiheng Qian, Revocatus Bahitwa, Jinghong Shao, Mingrui Suo, Mingchi Xu, Luyuan Yang, Tianyi Li, Qiuying Yu, Hai Wang
Quantifying transcript levels is essential for understanding gene functions and regulatory networks. Various techniques have been developed to measure gene transcription levels, including transcriptome sequencing (Mutz et al., 2013). A standard RNA-seq experiment involves several key steps: RNA extraction, mRNA purification, reverse transcription, second-strand cDNA synthesis, adapter ligation, and library amplification. Traditionally, RNA-seq covers the entire coding region and therefore requires substantial sequencing depth to produce reliable results. To reduce this burden, targeted 3′ end transcriptome sequencing methods such as PAT-seq (Harrison et al., 2015) have been developed, lowering both the required sequencing coverage and the cost; however, these methods can only process one RNA sample at a time, which limits their application in large-scale studies. More recent methods such as SiPAS (Wang et al., 2022) and MP3RNA-seq (Chen et al., 2021) have made the construction of 3′ terminal libraries more convenient and suitable for high-throughput applications. Recent studies have also shown that Tn5 can act on RNA/DNA hybrids (Choi et al., 2024; Lu et al., 2020), eliminating the need for second-strand cDNA synthesis. Building on these findings, we developed QUIC-seq, a quick, ultra-affordable, high-throughput and convenient method for gene expression analysis.
The workflow for QUIC-seq library preparation is illustrated in Figure 1a. First, total RNA is extracted from each sample and quantified. Approximately 500 ng of RNA from each sample is transferred into a 96-well plate for reverse transcription using specific reverse transcription primers (Table S1). Next, half of the RNA/DNA hybrids from each well are pooled and recovered with magnetic beads. The pooled RNA/DNA hybrids are then fragmented using Tn5 transposase, followed by direct PCR amplification. The amplified product is size-selected using magnetic beads and sequenced on an Illumina sequencing platform.
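Because the samples are pooled after reverse transcription, each read has to be assigned back to its well computationally. The following is a minimal sketch of that demultiplexing step, not the authors' pipeline. It assumes a hypothetical read layout in which the first 8 nt of read 1 carry the well barcode and the next 10 nt carry the UMI; the actual primer design is given in Table S1, and the lengths, the one-mismatch tolerance and all function names here are illustrative assumptions.

```python
from collections import defaultdict

# ASSUMPTION: hypothetical read layout, not taken from the paper.
# First 8 nt of read 1 = well barcode, next 10 nt = UMI.
BARCODE_LEN = 8
UMI_LEN = 10


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, barcode_to_sample, max_mismatch=1):
    """Return the sample whose barcode is within max_mismatch of the
    observed sequence, or None if there is no match or the match is ambiguous."""
    hits = {
        sample
        for barcode, sample in barcode_to_sample.items()
        if hamming(observed, barcode) <= max_mismatch
    }
    return hits.pop() if len(hits) == 1 else None


def demultiplex(read1_seqs, barcode_to_sample):
    """Group reads by sample. Returns {sample: [(read_index, umi), ...]}."""
    per_sample = defaultdict(list)
    for index, seq in enumerate(read1_seqs):
        if len(seq) < BARCODE_LEN + UMI_LEN:
            continue  # too short to contain barcode + UMI
        barcode = seq[:BARCODE_LEN]
        umi = seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
        sample = assign_barcode(barcode, barcode_to_sample)
        if sample is not None:
            per_sample[sample].append((index, umi))
    return dict(per_sample)


# Toy example with made-up barcodes and sample names.
barcodes = {"ACGTACGT": "leaf", "TTGGCCAA": "root"}
reads = [
    "ACGTACGT" + "AAAAAAAAAA" + "TTTT",  # exact match to leaf
    "ACGTACGA" + "CCCCCCCCCC",           # one mismatch, still leaf
    "TTGGCCAA" + "GGGGGGGGGG",           # exact match to root
    "GGGGGGGGGGGGGGGGGGGG",              # no barcode within tolerance
    "ACGT",                              # too short
]
assignments = demultiplex(reads, barcodes)
```

Keeping the UMI alongside each assigned read allows duplicate molecules to be collapsed later, after alignment.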
Optimizing the QUIC-seq library conditions involved adjusting several steps. First, we tested reverse transcriptases from different manufacturers (Figure S1). The AIII enzyme performed best, followed by the VIII and TIV enzymes. Despite AIII's superior performance, its cost was double that of VIII, making it less economical. We also examined the effect of inactivating the reverse transcriptase. The results indicated that enzyme inactivation negatively affected the downstream data. This could be because the high temperature causes the RNA/DNA hybrids to denature and separate, resulting in sub-optimal library construction. In addition, we found that adding SDS had little effect on the number of genes detected (Figure 1b; Figure S2).
When Tn5 fragments RNA/DNA hybrids, it creates a 9 bp gap. Previous studies (Choi et al., 2024; Lu et al., 2020) suggested adding Bst3.0 or reverse transcriptase to fill this gap. However, we detected more genes (Figure 1b) without adding Bst3.0 or reverse transcriptase. This indicates that Takara Ex Premier polymerase effectively fills the gap in the hybrids, a novel finding of this study. In summary, we eliminated the steps of reverse transcriptase inactivation, Tn5 transposase inactivation, and gap-filling with Bst3.0 or reverse transcriptase, streamlining the procedure and reducing both time and cost.
To assess the reproducibility of the library construction, we performed two independent QUIC-seq experiments. The results showed a high correlation coefficient of 0.98 between the replicates (Figure 1c). TruSeq reads were evenly distributed across the entire gene, while QUIC-seq reads were concentrated at the 3′ end, as expected (Figure 1d). QUIC-seq has fewer reads aligned to intergenic regions than TruSeq, improving read utilization (Figure S3). To test for sample mixing, we used RNA from Arabidopsis, rice and maize in a single experiment. Approximately 99.8% of the reads from these samples aligned to their respective reference genomes, indicating minimal cross-contamination between barcodes (Figure 1e; Figure S4). In principle, sequencing more reads detects more genes but also raises costs. Our analysis showed that around 20 000 genes were detected at 1.5 × 10⁶ reads, with the number plateauing thereafter (Figure 1f).
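A saturation analysis of this kind can be approximated by subsampling the aligned reads and counting how many genes are detected at each depth. The sketch below is one simple way to do this and is not the authors' procedure. The function name, the detection threshold (`min_reads`), the fixed random seed and the toy gene identifiers are all assumptions made for illustration.

```python
import random


def saturation_curve(read_gene_ids, depths, min_reads=1, seed=0):
    """Count detected genes at increasing subsampling depths.

    read_gene_ids: one gene ID per aligned read.
    depths: read depths to evaluate.
    Subsamples are nested prefixes of a single shuffle, so the number of
    detected genes can never decrease as depth increases.
    Returns a list of (depth, genes_detected) tuples.
    """
    rng = random.Random(seed)
    shuffled = list(read_gene_ids)
    rng.shuffle(shuffled)

    counts = {}
    curve = []
    position = 0
    for depth in sorted(depths):
        depth = min(depth, len(shuffled))  # cannot sample more reads than exist
        for gene in shuffled[position:depth]:
            counts[gene] = counts.get(gene, 0) + 1
        position = max(position, depth)
        detected = sum(1 for c in counts.values() if c >= min_reads)
        curve.append((depth, detected))
    return curve


# Toy data: four hypothetical genes with very different expression levels.
toy_reads = ["g1"] * 50 + ["g2"] * 30 + ["g3"] * 15 + ["g4"] * 5
curve = saturation_curve(toy_reads, depths=[10, 50, 100])
```

The depth at which the curve flattens gives a practical per-sample sequencing target; in the data reported above, that plateau is reached at about 1.5 × 10⁶ reads.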
We compared the performance of QUIC-seq with that of other RNA-seq methods. The correlation coefficient between the TruSeq and QUIC-seq library construction methods was 0.9 (Figure 1g), comparable to other 3′ terminal RNA-seq methods such as MP3RNA-seq (Chen et al., 2021). In terms of gene identification, the two methods detected 22 939 genes in common (Figure 1h; Table S2). However, QUIC-seq offers significant advantages in cost and throughput. Unlike MP3RNA-seq, QUIC-seq simplifies the process by omitting RNA strand degradation and the conversion of single-stranded cDNA to double-stranded cDNA. While BOLT-seq requires the same amount of time for library preparation, it involves more complex steps, such as adding Tn5 and performing PCR amplification in a 96-well plate (Choi et al., 2024). In contrast, QUIC-seq requires only reverse transcription in a 96-well plate, with Tn5 fragmentation and PCR amplification done in a few Eppendorf tubes, significantly reducing operational complexity. Additionally, QUIC-seq libraries include three barcodes, further reducing costs. The incorporation of UMIs in the reverse transcription primers also enables more accurate gene expression quantification after UMI deduplication, a feature lacking in BOLT-seq. In summary, QUIC-seq is a simplified and efficient library preparation method that can process high-throughput samples (96 samples) within 4 h at a significantly reduced cost of only $0.823 per sample (Figure 1i; Tables S3 and S4).
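UMI deduplication can be illustrated with a short sketch: reads that share both a gene and a UMI are treated as PCR copies of one original molecule and counted once. This is a simplified, exact-match version written for illustration and is not the authors' pipeline; it does not correct for sequencing errors within the UMI, and the function name and toy records are assumptions.

```python
from collections import defaultdict


def umi_counts(records):
    """Collapse PCR duplicates by exact UMI match.

    records: iterable of (gene_id, umi) pairs, one per aligned read.
    Returns {gene_id: number of distinct UMIs}, i.e. an estimate of the
    number of original molecules captured per gene.
    """
    umis_per_gene = defaultdict(set)
    for gene_id, umi in records:
        umis_per_gene[gene_id].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


# Toy example: gene g1 has three reads but only two distinct UMIs.
toy_records = [
    ("g1", "AAAA"),
    ("g1", "AAAA"),  # PCR duplicate of the first read
    ("g1", "CCCC"),
    ("g2", "AAAA"),  # same UMI on a different gene is a separate molecule
]
deduplicated = umi_counts(toy_records)
```

Counting distinct UMIs rather than raw reads removes amplification bias from the expression estimate, which is the benefit described above.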
Given the method's stability and convenience, we conducted several experiments using QUIC-seq. First, we identified 2477 differentially expressed genes (DEGs) in response to nitrogen and phosphorus stress, with 1339 genes up-regulated and 1138 down-regulated (Figure 1j; Figure S5). To further explore the role of Opaque2 (O2) and prolamin-box binding factor 1 (PBF1) in endosperm formation and to understand the expression regulatory network, we used QUIC-seq to screen target genes. In a previous transcriptome analysis using O2-GFP and GFP, 1715 up-regulated and 772 down-regulated genes were identified with FDR < 0.05 (Zhu et al., 2023). Under stricter criteria (FDR < 0.05, |log2FoldChange| > 1), QUIC-seq identified 2281 O2-up-regulated and 2913 O2-down-regulated genes, revealing more DEGs than PER-seq (Figure S6). Additionally, we found 539 PBF1-activated and 480 PBF1-repressed genes, with 88 genes up-regulated and 217 genes down-regulated in both the O2-GFP and PBF1-GFP datasets. In analysing the expression regulatory network of BBM, we identified four transcription factor genes with functions similar to BBM. QUIC-seq was employed for library construction and sequencing, leading to the identification of DEGs in bHLH48, WRKY95, GATA28, and IFA1, ultimately uncovering 50 co-expressed regulatory genes associated with these four transcription factors (Figure 1k; Figure S7).
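The DEG criteria used above (FDR < 0.05, |log2FoldChange| > 1) and the overlap between the O2 and PBF1 gene sets can be expressed as a short filter. The sketch below assumes the differential expression statistics have already been computed by an upstream tool; the function name, the input layout and the toy values are hypothetical and are not data from the study.

```python
def classify_degs(results, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes into up- and down-regulated sets.

    results: {gene_id: (log2_fold_change, fdr)}.
    A gene is called a DEG if fdr < fdr_cutoff and
    |log2_fold_change| > lfc_cutoff.
    Returns (up_regulated, down_regulated) as sets of gene IDs.
    """
    up, down = set(), set()
    for gene_id, (log2_fc, fdr) in results.items():
        if fdr < fdr_cutoff and abs(log2_fc) > lfc_cutoff:
            if log2_fc > 0:
                up.add(gene_id)
            else:
                down.add(gene_id)
    return up, down


# Toy values only; gene names and statistics are made up.
o2_results = {
    "geneA": (2.3, 0.001),   # up
    "geneB": (-1.8, 0.010),  # down
    "geneC": (0.6, 0.001),   # significant but fold change too small
    "geneD": (3.0, 0.200),   # large fold change but not significant
}
pbf1_results = {
    "geneA": (1.5, 0.020),   # up
    "geneB": (-2.2, 0.004),  # down
    "geneE": (1.9, 0.030),   # up in PBF1 only
}

o2_up, o2_down = classify_degs(o2_results)
pbf1_up, pbf1_down = classify_degs(pbf1_results)

# Genes regulated in the same direction in both datasets.
shared_up = o2_up & pbf1_up
shared_down = o2_down & pbf1_down
```

Applying the fold-change threshold in addition to the FDR cutoff is what makes these criteria stricter than the FDR-only filter of the earlier analysis.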