Calling small variants with universality and Bayesian-frequentist hybridism
bioRxiv - Bioinformatics Pub Date : 2021-02-22 , DOI: 10.1101/2020.08.23.263749
Xiaofei Zhao , Allison Hu , Sizhen Wang , Xiaoyue Wang

The accuracy of variant calling is crucially important in clinical settings, as the misdiagnosis of a genetic disease such as cancer can compromise patient survival. Although many variant callers have been developed, variant-calling accuracy is still insufficient for clinical applications. Here we describe UVC, a method for calling small variants of germline or somatic origin. By combining contrary assumptions with sublation, we found two principles to improve variant calling. First, we discovered the following power-law universality: allele fraction is inversely proportional to the cubic root of variant-calling error rate. Second, we found that zero inflation can combine Bayesian and frequentist models of sequencing bias. We evaluated UVC against other state-of-the-art variant callers by considering a variety of calling modes (germline, somatic, tumor-only, and cell-free DNA with unique molecular identifiers (UMIs)), sequencing platforms (Illumina, BGI, and IonTorrent), sequencing types (whole-genome, whole-exome, and PCR-amplicon), human reference genomes (hg19, hs37d5, and GRCh38), aligners (BWA and NovoAlign), and representative sequencing depths and purities for both tumor and normal. UVC generally outperformed other germline variant callers on the GIAB germline truth sets. UVC strongly outperformed other somatic variant callers on in silico mixtures simulating 192 combinations of tumor/normal sequencing depths and tumor/normal purities. UVC strongly outperformed other somatic variant callers on the GIAB somatic truth sets derived from physical mixture and on the SEQC2 somatic reference sets derived from the breast-cancer cell line HCC1395. UVC achieved 100% concordance with the manual review conducted by multiple independent researchers on a Qiagen 71-gene-panel dataset derived from 16 patients with colon adenoma.
Additionally, UVC outperformed Mageri and smCounter2, the state-of-the-art UMI-aware variant callers, on the tumor-only datasets used for publishing these two variant callers. Performance was measured using the sensitivity-specificity trade-off for all called variants. The improved variant calls generated by UVC from previously published UMI-based sequencing data provide additional biological insight about DNA damage repair. UVC enables highly accurate calling of small variants from a variety of sequencing data, which can directly benefit patients in clinical settings. UVC is open-sourced under the BSD 3-Clause license at https://github.com/genetronhealth/uvc and quay.io/genetronhealth/gcc-6-3-0-uvc-0-6-0-441a694.
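
The power-law statement in the abstract can be written out explicitly. The following is only a sketch of what the sentence implies algebraically. The symbols $f$ (allele fraction), $\varepsilon$ (variant-calling error rate), $c$ (an unspecified proportionality constant) and $Q$ (a Phred-scaled quality) are notation assumed here for illustration, not taken from the paper, and the abstract does not say how these quantities are estimated in UVC.

```latex
% Assumed notation: f = allele fraction, \varepsilon = variant-calling error rate,
% c > 0 an unspecified constant, Q = Phred-scaled error rate.
\begin{align}
  f &\propto \varepsilon^{-1/3}
    \quad\Longleftrightarrow\quad
    \varepsilon = c^{3} f^{-3}, \\
  Q &= -10 \log_{10} \varepsilon
     = 30 \log_{10} f - 30 \log_{10} c .
\end{align}
```

Read this way, the error rate falls with the cube of the allele fraction. Under these assumptions, doubling $f$ lowers $\varepsilon$ eightfold, which corresponds to an increase of $30 \log_{10} 2 \approx 9$ on the Phred scale.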

Updated: 2021-02-22