-
Considerations in the search for epistasis Genome Biol. (IF 10.1) Pub Date : 2024-11-19 Marleen Balvert, Johnathan Cooper-Knock, Julian Stamp, Ross P. Byrne, Soufiane Mourragui, Juami van Gils, Stefania Benonisdottir, Johannes Schlüter, Kevin Kenna, Sanne Abeln, Alfredo Iacoangeli, Joséphine T. Daub, Brian L. Browning, Gizem Taş, Jiajing Hu, Yan Wang, Elham Alhathli, Calum Harvey, Luna Pianesi, Sara C. Schulte, Jorge González-Domínguez, Erik Garrisson, Michael P. Snyder, Alexander Schönhuth
Epistasis refers to changes in the effect on phenotype of a unit of genetic information, such as a single nucleotide polymorphism or a gene, dependent on the context of other genetic units. Such interactions are both biologically plausible and good candidates to explain observations which are not fully explained by an additive heritability model. However, the search for epistasis has so far largely
-
Transcription of a centromere-enriched retroelement and local retention of its RNA are significant features of the CENP-A chromatin landscape Genome Biol. (IF 10.1) Pub Date : 2024-11-18 B. J. Chabot, R. Sun, A. Amjad, S. J. Hoyt, L. Ouyang, C. Courret, R. Drennan, L. Leo, A. M. Larracuente, L. J. Core, R. J. O’Neill, B. G. Mellone
Centromeres depend on chromatin containing the conserved histone H3 variant CENP-A for function and inheritance, while the role of centromeric DNA repeats remains unclear. Retroelements are prevalent at centromeres across taxa and represent a potential mechanism for promoting transcription to aid in CENP-A incorporation or for generating RNA transcripts to maintain centromere integrity. In this study
-
VI-VS: calibrated identification of feature dependencies in single-cell multiomics Genome Biol. (IF 10.1) Pub Date : 2024-11-15 Pierre Boyeau, Stephen Bates, Can Ergen, Michael I. Jordan, Nir Yosef
Unveiling functional relationships between various molecular cell phenotypes from data using machine learning models is a key promise of multiomics. Existing methods either use flexible but hard-to-interpret models or simpler, misspecified models. VI-VS (Variational Inference for Variable Selection) balances flexibility and interpretability to identify relevant feature relationships in multiomic data
-
Cohesin distribution alone predicts chromatin organization in yeast via conserved-current loop extrusion Genome Biol. (IF 10.1) Pub Date : 2024-11-14 Tianyu Yuan, Hao Yan, Kevin C. Li, Ivan Surovtsev, Megan C. King, Simon G. J. Mochrie
Inhomogeneous patterns of chromatin-chromatin contacts within 10–100-kb-sized regions of the genome are a generic feature of chromatin spatial organization. These features, termed topologically associating domains (TADs), have led to the loop extrusion factor (LEF) model. Currently, our ability to model TADs relies on the observation that in vertebrates TAD boundaries are correlated with DNA sequences
-
Adenine base editors induce off-target structure variations in mouse embryos and primary human T cells Genome Biol. (IF 10.1) Pub Date : 2024-11-11 Leilei Wu, Shutan Jiang, Meisong Shi, Tanglong Yuan, Yaqin Li, Pinzheng Huang, Yingqi Li, Erwei Zuo, Changyang Zhou, Yidi Sun
The safety of CRISPR-based gene editing methods is of the utmost priority in clinical applications. Previous studies have reported that Cas9 cleavage induced frequent aneuploidy in primary human T cells, but whether cleavage-mediated editing of base editors would generate off-target structure variations remains unknown. Here, we investigate the potential off-target structural variations associated
-
IAMSAM: image-based analysis of molecular signatures using the Segment Anything Model Genome Biol. (IF 10.1) Pub Date : 2024-11-11 Dongjoo Lee, Jeongbin Park, Seungho Cook, Seongjin Yoo, Daeseung Lee, Hongyoon Choi
Spatial transcriptomics is a cutting-edge technique that combines gene expression with spatial information, allowing researchers to study molecular patterns within tissue architecture. Here, we present IAMSAM, a user-friendly web-based tool for analyzing spatial transcriptomics data focusing on morphological features. IAMSAM accurately segments tissue images using the Segment Anything Model, allowing
-
SpottedPy quantifies relationships between spatial transcriptomic hotspots and uncovers environmental cues of epithelial-mesenchymal plasticity in breast cancer Genome Biol. (IF 10.1) Pub Date : 2024-11-11 Eloise Withnell, Maria Secrier
Spatial transcriptomics is revolutionizing the exploration of intratissue heterogeneity in cancer, yet capturing cellular niches and their spatial relationships remains challenging. We introduce SpottedPy, a Python package designed to identify tumor hotspots and map spatial interactions within the cancer ecosystem. Using SpottedPy, we examine epithelial-mesenchymal plasticity in breast cancer and highlight
-
scDOT: optimal transport for mapping senescent cells in spatial transcriptomics Genome Biol. (IF 10.1) Pub Date : 2024-11-08 Nam D. Nguyen, Lorena Rosas, Timur Khaliullin, Peiran Jiang, Euxhen Hasanaj, Jose A. Ovando-Ricardez, Marta Bueno, Irfan Rahman, Gloria S. Pryhuber, Dongmei Li, Qin Ma, Toren Finkel, Melanie Königshoff, Oliver Eickelberg, Mauricio Rojas, Ana L. Mora, Jose Lugo-Martinez, Ziv Bar-Joseph
The low resolution of spatial transcriptomics data necessitates additional information for optimal use. We developed scDOT, which combines spatial transcriptomics and single cell RNA sequencing to improve the ability to reconstruct single cell resolved spatial maps and identify senescent cells. scDOT integrates optimal transport and expression deconvolution to learn non-linear couplings between cells
-
GraphPCA: a fast and interpretable dimension reduction algorithm for spatial transcriptomics data Genome Biol. (IF 10.1) Pub Date : 2024-11-07 Jiyuan Yang, Lu Wang, Lin Liu, Xiaoqi Zheng
The rapid advancement of spatial transcriptomics technologies has revolutionized our understanding of cell heterogeneity and intricate spatial structures within tissues and organs. However, the high dimensionality and noise in spatial transcriptomic data present significant challenges for downstream data analyses. Here, we develop GraphPCA, an interpretable and quasi-linear dimension reduction algorithm
-
CaClust: linking genotype to transcriptional heterogeneity of follicular lymphoma using BCR and exomic variants Genome Biol. (IF 10.1) Pub Date : 2024-11-05 Kazimierz Oksza-Orzechowski, Edwin Quinten, Shadi Shafighi, Szymon M. Kiełbasa, Hugo W. van Kessel, Ruben A. L. de Groen, Joost S. P. Vermaat, Julieta H. Sepúlveda Yáñez, Marcelo A. Navarrete, Hendrik Veelken, Cornelis A. M. van Bergen, Ewa Szczurek
Tumours exhibit high genotypic and transcriptional heterogeneity. Both affect cancer progression and treatment, but have been predominantly studied separately in follicular lymphoma. To comprehensively investigate the evolution and genotype-to-phenotype maps in follicular lymphoma, we introduce CaClust, a probabilistic graphical model integrating deep whole exome, single-cell RNA and B-cell receptor
-
TDFPS-Designer: an efficient toolkit for barcode design and selection in nanopore sequencing Genome Biol. (IF 10.1) Pub Date : 2024-11-04 Junhai Qi, Zhengyi Li, Yao-zhong Zhang, Guojun Li, Xin Gao, Renmin Han
Oxford Nanopore Technologies (ONT) offers ultrahigh-throughput multi-sample sequencing but only provides barcode kits that enable up to 96-sample multiplexing. We present TDFPS-Designer, a new toolkit for nanopore sequencing barcode design, which creates significantly more barcodes: 137 with a length of 20 base pairs, 410 at 24 bp, and 1779 at 30 bp, far surpassing ONT’s offerings. It includes GPU-based
-
Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data Genome Biol. (IF 10.1) Pub Date : 2024-10-31 Xiaoting Li, Lucas A. N. Melo, Harmen J. Bussemaker
Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific binding (ASB) both in terms of read coverage and representation of individual variants in the cell lines
-
Response to "Neglecting normalization impact in semi-synthetic RNA-seq data simulation generates artificial false positives" and "Winsorization greatly reduces false positives by popular differential expression methods when analyzing human population samples" Genome Biol. (IF 10.1) Pub Date : 2024-10-30 Xinzhou Ge, Yumei Li, Wei Li, Jingyi Jessica Li
Two correspondences raised concerns or comments about our analyses regarding exaggerated false positives found by differential expression (DE) methods. Here, we discuss the points they raise and explain why we agree or disagree with these points. We add new analysis to confirm that the Wilcoxon rank-sum test remains the most robust method compared to the other five DE methods (DESeq2, edgeR, limma-voom
-
Winsorization greatly reduces false positives by popular differential expression methods when analyzing human population samples Genome Biol. (IF 10.1) Pub Date : 2024-10-30 Lu Yang, Xianyang Zhang, Jun Chen
A recent study found severely inflated type I error rates for DESeq2 and edgeR, two dominant tools used for differential expression analysis of RNA-seq data. Here, we show that by properly addressing the outliers in the RNA-Seq data using winsorization, the type I error rate of DESeq2 and edgeR can be substantially reduced, and the power is comparable to Wilcoxon rank-sum test for large datasets. Therefore
-
Neglecting the impact of normalization in semi-synthetic RNA-seq data simulations generates artificial false positives Genome Biol. (IF 10.1) Pub Date : 2024-10-30 Boris P. Hejblum, Kalidou Ba, Rodolphe Thiébaut, Denis Agniel
A recent study reported exaggerated false positives by popular differential expression methods when analyzing large population samples. We reproduce the differential expression analysis simulation results and identify a caveat in the data generation process. Data not truly generated under the null hypothesis led to incorrect comparisons of benchmark methods. We provide corrected simulation results
-
pan-Draft: automated reconstruction of species-representative metabolic models from multiple genomes Genome Biol. (IF 10.1) Pub Date : 2024-10-25 Nicola De Bernardini, Guido Zampieri, Stefano Campanaro, Johannes Zimmermann, Silvio Waschina, Laura Treu
The accurate reconstruction of genome-scale metabolic models (GEMs) for unculturable species poses challenges due to the incomplete and fragmented genetic information typical of metagenome-assembled genomes (MAGs). While existing tools leverage sequence homology from single genomes, this study introduces pan-Draft, a pan-reactome-based approach exploiting recurrent genetic evidence to determine the
-
Plant conservation in the age of genome editing: opportunities and challenges Genome Biol. (IF 10.1) Pub Date : 2024-10-24 Kangquan Yin, Mi Yoon Chung, Bo Lan, Fang K. Du, Myong Gi Chung
Numerous plant taxa are threatened by habitat destruction or overexploitation. To overcome these threats, new methods are urgently needed for rescuing threatened and endangered plant species. Here, we review the genetic consequences of threats to species populations. We highlight potential advantages of genome editing for mitigating negative effects caused by new pathogens and pests or climate change
-
STASCAN deciphers fine-resolution cell distribution maps in spatial transcriptomics by deep learning Genome Biol. (IF 10.1) Pub Date : 2024-10-22 Ying Wu, Jia-Yi Zhou, Bofei Yao, Guanshen Cui, Yong-Liang Zhao, Chun-Chun Gao, Ying Yang, Shihua Zhang, Yun-Gui Yang
Spatial transcriptomics technologies have been widely applied to decode cellular distribution by resolving gene expression profiles in tissue. However, sequencing techniques still limit the ability to create a fine-resolved spatial cell-type map. To this end, we develop a novel deep-learning-based approach, STASCAN, to predict the spatial cellular distribution of captured or uncharted areas where only
-
Mapping lineage-traced cells across time points with moslin Genome Biol. (IF 10.1) Pub Date : 2024-10-21 Marius Lange, Zoe Piran, Michal Klein, Bastiaan Spanjaard, Dominik Klein, Jan Philipp Junker, Fabian J. Theis, Mor Nitzan
Simultaneous profiling of single-cell gene expression and lineage history holds enormous potential for studying cellular decision-making. Recent computational approaches combine both modalities into cellular trajectories; however, they cannot make use of all available lineage information in destructive time-series experiments. Here, we present moslin, a Gromov-Wasserstein-based model to couple cellular
-
A comprehensive study of genetic regulation and disease associations of plasma circulatory microRNAs using population-level data Genome Biol. (IF 10.1) Pub Date : 2024-10-21 Rima Mustafa, Michelle M. J. Mens, Arno van Hilten, Jian Huang, Gennady Roshchupkin, Tianxiao Huan, Linda Broer, Joyce B. J. van Meurs, Paul Elliott, Daniel Levy, M. Arfan Ikram, Marina Evangelou, Abbas Dehghan, Mohsen Ghanbari
MicroRNAs (miRNAs) are small non-coding RNAs that post-transcriptionally regulate gene expression. Perturbations in plasma miRNA levels are known to impact disease risk and have potential as disease biomarkers. Exploring the genetic regulation of miRNAs may yield new insights into their important role in governing gene expression and disease mechanisms. We present genome-wide association studies of
-
Scalable identification of lineage-specific gene regulatory networks from metacells with NetID Genome Biol. (IF 10.1) Pub Date : 2024-10-18 Weixu Wang, Yichen Wang, Ruiqi Lyu, Dominic Grün
The identification of gene regulatory networks (GRNs) is crucial for understanding cellular differentiation. Single-cell RNA sequencing data encode gene-level covariations at high resolution, yet data sparsity and high dimensionality hamper accurate and scalable GRN reconstruction. To overcome these challenges, we introduce NetID leveraging homogenous metacells while avoiding spurious gene–gene correlations
-
MHConstructor: a high-throughput, haplotype-informed solution to the MHC assembly challenge Genome Biol. (IF 10.1) Pub Date : 2024-10-17 Kristen J. Wade, Rayo Suseno, Kerry Kizer, Jacqueline Williams, Juliano Boquett, Stacy Caillier, Nicholas R. Pollock, Adam Renschen, Adam Santaniello, Jorge R. Oksenberg, Paul J. Norman, Danillo G. Augusto, Jill A. Hollenbach
The extremely high levels of genetic polymorphism within the human major histocompatibility complex (MHC) limit the usefulness of reference-based alignment methods for sequence assembly. We incorporate a short-read, de novo assembly algorithm into a workflow for novel application to the MHC. MHConstructor is a containerized pipeline designed for high-throughput, haplotype-informed, reproducible assembly
-
HBI: a hierarchical Bayesian interaction model to estimate cell-type-specific methylation quantitative trait loci incorporating priors from cell-sorted bisulfite sequencing data Genome Biol. (IF 10.1) Pub Date : 2024-10-15 Youshu Cheng, Biao Cai, Hongyu Li, Xinyu Zhang, Gypsyamber D’Souza, Sadeep Shrestha, Andrew Edmonds, Jacquelyn Meyers, Margaret Fischl, Seble Kassaye, Kathryn Anastos, Mardge Cohen, Bradley E. Aouizerat, Ke Xu, Hongyu Zhao
Methylation quantitative trait loci (meQTLs) quantify the effects of genetic variants on DNA methylation levels. However, most published studies utilize bulk methylation datasets composed of different cell types and limit our understanding of cell-type-specific methylation regulation. We propose a hierarchical Bayesian interaction (HBI) model to infer cell-type-specific meQTLs, which integrates a large-scale
-
Multi-omics reveals lactylation-driven regulatory mechanisms promoting tumor progression in oral squamous cell carcinoma Genome Biol. (IF 10.1) Pub Date : 2024-10-15 Fengyang Jing, Lijing Zhu, Jianyun Zhang, Xuan Zhou, Jiaying Bai, Xuefen Li, Heyu Zhang, Tiejun Li
Lactylation, a post-translational modification, is increasingly recognized for its role in cancer progression. This study investigates its prevalence and impact in oral squamous cell carcinoma (OSCC). Immunohistochemical staining of 81 OSCC cases shows lactylation levels correlate with malignancy grading. Proteomic analyses of six OSCC tissue pairs reveal 2765 lactylation sites on 1033 proteins, highlighting
-
SDePER: a hybrid machine learning and regression method for cell-type deconvolution of spatial barcoding-based transcriptomic data Genome Biol. (IF 10.1) Pub Date : 2024-10-14 Yunqing Liu, Ningshan Li, Ji Qi, Gang Xu, Jiayi Zhao, Nating Wang, Xiayuan Huang, Wenhao Jiang, Huanhuan Wei, Aurélien Justet, Taylor S. Adams, Robert Homer, Amei Amei, Ivan O. Rosas, Naftali Kaminski, Zuoheng Wang, Xiting Yan
Spatial barcoding-based transcriptomic (ST) data require deconvolution for cellular-level downstream analysis. Here we present SDePER, a hybrid machine learning and regression method to deconvolve ST data using reference single-cell RNA sequencing (scRNA-seq) data. SDePER tackles platform effects between ST and scRNA-seq data, ensuring a linear relationship between them while addressing sparsity and
-
When less is more: sketching with minimizers in genomics Genome Biol. (IF 10.1) Pub Date : 2024-10-14 Malick Ndiaye, Silvia Prieto-Baños, Lucy M. Fitzgerald, Ali Yazdizadeh Kharrazi, Sergey Oreshkov, Christophe Dessimoz, Fritz J. Sedlazeck, Natasha Glover, Sina Majidian
The exponential increase in sequencing data calls for conceptual and computational advances to extract useful biological insights. One such advance, minimizers, allows for reducing the quantity of data handled while maintaining some of its key properties. We provide a basic introduction to minimizers, cover recent methodological developments, and review the diverse applications of minimizers to analyze
-
scCTS: identifying the cell type-specific marker genes from population-level single-cell RNA-seq Genome Biol. (IF 10.1) Pub Date : 2024-10-14 Luxiao Chen, Zhenxing Guo, Tao Deng, Hao Wu
Single-cell RNA-sequencing (scRNA-seq) provides gene expression profiles of individual cells from complex samples, facilitating the detection of cell type-specific marker genes. In scRNA-seq experiments with multiple donors, population-level variation adds an extra layer of complexity to cell type-specific gene detection; for example, marker genes may not appear in all donors. Motivated by this observation
-
The ribosome profiling landscape of yeast reveals a high diversity in pervasive translation Genome Biol. (IF 10.1) Pub Date : 2024-10-14 Chris Papadopoulos, Hugo Arbes, David Cornu, Nicolas Chevrollier, Sandra Blanchet, Paul Roginski, Camille Rabier, Safiya Atia, Olivier Lespinet, Olivier Namy, Anne Lopes
Pervasive translation is a widespread phenomenon that plays a critical role in the emergence of novel microproteins, but the diversity of translation patterns contributing to their generation remains unclear. Based on 54 ribosome profiling (Ribo-Seq) datasets, we investigated the yeast Ribo-Seq landscape using a representation framework that allows the comprehensive inventory and classification of
-
zMAP toolset: model-based analysis of large-scale proteomic data via a variance stabilizing z-transformation Genome Biol. (IF 10.1) Pub Date : 2024-10-14 Xiuqi Gui, Jing Huang, Linjie Ruan, Yanjun Wu, Xuan Guo, Ruifang Cao, Shuhan Zhou, Fengxiang Tan, Hongwen Zhu, Mushan Li, Guoqing Zhang, Hu Zhou, Lixing Zhan, Xin Liu, Shiqi Tu, Zhen Shao
Isobaric labeling-based mass spectrometry (ILMS) has been widely used to quantify, on a proteome-wide scale, the relative protein abundance in different biological conditions. However, large-scale ILMS data sets typically involve multiple runs of mass spectrometry, bringing great computational difficulty to the integration of ILMS samples. We present zMAP, a toolset that makes ILMS intensities comparable
-
Transipedia.org: k-mer-based exploration of large RNA sequencing datasets and application to cancer data Genome Biol. (IF 10.1) Pub Date : 2024-10-10 Chloé Bessière, Haoliang Xue, Benoit Guibert, Anthony Boureux, Florence Rufflé, Julien Viot, Rayan Chikhi, Mikaël Salson, Camille Marchet, Thérèse Commes, Daniel Gautheret
Indexing techniques relying on k-mers have proven effective in searching for RNA sequences across thousands of RNA-seq libraries, but without enabling direct RNA quantification. We show here that arbitrary RNA sequences can be quantified in seconds through their decomposition into k-mers, with a precision akin to that of conventional RNA quantification methods. Using an index of the Cancer Cell Line
-
Graphasing: phasing diploid genome assembly graphs with single-cell strand sequencing Genome Biol. (IF 10.1) Pub Date : 2024-10-10 Mir Henglin, Maryam Ghareghani, William T. Harvey, David Porubsky, Sergey Koren, Evan E. Eichler, Peter Ebert, Tobias Marschall
Haplotype information is crucial for biomedical and population genetics research. However, current strategies to produce de novo haplotype-resolved assemblies often require either difficult-to-acquire parental data or an intermediate haplotype-collapsed assembly. Here, we present Graphasing, a workflow which synthesizes the global phase signal of Strand-seq with assembly graph topology to produce chromosome-scale
-
Spatiotemporal modeling reveals high-resolution invasion states in glioblastoma Genome Biol. (IF 10.1) Pub Date : 2024-10-10 Varsha Thoppey Manoharan, Aly Abdelkareem, Gurveer Gill, Samuel Brown, Aaron Gillmor, Courtney Hall, Heewon Seo, Kiran Narta, Sean Grewal, Ngoc Ha Dang, Bo Young Ahn, Kata Osz, Xueqing Lun, Laura Mah, Franz Zemp, Douglas Mahoney, Donna L. Senger, Jennifer A. Chan, A. Sorana Morrissy
Diffuse invasion of glioblastoma cells through normal brain tissue is a key contributor to tumor aggressiveness, resistance to conventional therapies, and dismal prognosis in patients. A deeper understanding of how components of the tumor microenvironment (TME) contribute to overall tumor organization and to programs of invasion may reveal opportunities for improved therapeutic strategies. Towards
-
Systematic perturbations of SETD2, NSD1, NSD2, NSD3, and ASH1L reveal their distinct contributions to H3K36 methylation Genome Biol. (IF 10.1) Pub Date : 2024-10-10 Gerry A. Shipman, Reinnier Padilla, Cynthia Horth, Bo Hu, Eric Bareke, Francisca N. Vitorino, Joanna M. Gongora, Benjamin A. Garcia, Chao Lu, Jacek Majewski
Methylation of histone 3 lysine 36 (H3K36me) has emerged as an essential epigenetic component for the faithful regulation of gene expression. Despite its importance in development and disease, how the molecular agents collectively shape the H3K36me landscape is unclear. We use mouse mesenchymal stem cells to perturb the H3K36me methyltransferases (K36MTs) and infer the activities of the five most prominent
-
Drought-responsive dynamics of H3K9ac-marked 3D chromatin interactions are integrated by OsbZIP23-associated super-enhancer-like promoter regions in rice Genome Biol. (IF 10.1) Pub Date : 2024-10-10 Yu Chang, Jiahan Liu, Minrong Guo, Weizhi Ouyang, Jiapei Yan, Lizhong Xiong, Xingwang Li
In response to drought stress (DS), plants undergo complex processes that entail significant transcriptome reprogramming. However, the intricate relationship between the dynamic alterations in the three-dimensional (3D) genome and the modulation of gene co-expression in drought responses remains a relatively unexplored area. In this study, we reconstruct high-resolution 3D genome maps based on genomic
-
Improved detection of methylation in ancient DNA Genome Biol. (IF 10.1) Pub Date : 2024-10-10 Susanna Sawyer, Pere Gelabert, Benjamin Yakir, Alejandro Llanos-Lizcano, Alessandra Sperduti, Luca Bondioli, Olivia Cheronet, Christine Neugebauer-Maresch, Maria Teschler-Nicola, Mario Novak, Ildikó Pap, Ildikó Szikossy, Tamás Hajdu, Vyacheslav Moiseyev, Andrey Gromov, Gunita Zariņa, Eran Meshorer, Liran Carmel, Ron Pinhasi
Reconstructing premortem DNA methylation levels in ancient DNA has led to breakthrough studies such as the prediction of anatomical features of the Denisovan. These studies rely on computationally inferring methylation levels from damage signals in naturally deaminated cytosines, which requires expensive high-coverage genomes. Here, we test two methods for direct methylation measurement developed for
-
Optimizing and benchmarking polygenic risk scores with GWAS summary statistics Genome Biol. (IF 10.1) Pub Date : 2024-10-08 Zijie Zhao, Tim Gruenloh, Meiyi Yan, Yixuan Wu, Zhongxuan Sun, Jiacheng Miao, Yuchang Wu, Jie Song, Qiongshi Lu
Polygenic risk score (PRS) is a major research topic in human genetics. However, a significant gap exists between PRS methodology and applications in practice due to often unavailable individual-level data for various PRS tasks including model fine-tuning, benchmarking, and ensemble learning. We introduce an innovative statistical framework to optimize and benchmark PRS models using summary statistics
-
Visualizing scRNA-Seq data at population scale with GloScope Genome Biol. (IF 10.1) Pub Date : 2024-10-08 Hao Wang, William Torous, Boying Gong, Elizabeth Purdom
Increasingly, scRNA-Seq studies explore cell populations across different samples and the effect of sample heterogeneity on an organism’s phenotype. However, relatively few bioinformatic methods have been developed that adequately address the variation between samples for such population-level analyses. We propose a framework for representing the entire single-cell profile of a sample, which we call
-
DEMINING: A deep learning model embedded framework to distinguish RNA editing from DNA mutations in RNA sequencing data Genome Biol. (IF 10.1) Pub Date : 2024-10-08 Zhi-Can Fu, Bao-Qing Gao, Fang Nan, Xu-Kai Ma, Li Yang
Precise calling of promiscuous adenosine-to-inosine RNA editing sites from transcriptomic datasets is hindered by DNA mutations and sequencing/mapping errors. Here, we present a stepwise computational framework, called DEMINING, to distinguish RNA editing and DNA mutations directly from RNA sequencing datasets, with an embedded deep learning model named DeepDDR. After transfer learning, DEMINING can
-
Integrated large-scale metagenome assembly and multi-kingdom network analyses identify sex differences in the human nasal microbiome Genome Biol. (IF 10.1) Pub Date : 2024-10-08 Yanmei Ju, Zhe Zhang, Mingliang Liu, Shutian Lin, Qiang Sun, Zewei Song, Weiting Liang, Xin Tong, Zhuye Jie, Haorong Lu, Kaiye Cai, Peishan Chen, Xin Jin, Wenwei Zhang, Xun Xu, Huanming Yang, Jian Wang, Yong Hou, Liang Xiao, Huijue Jia, Tao Zhang, Ruijin Guo
Respiratory diseases impose an immense health burden worldwide. Epidemiological studies have revealed extensive disparities in the incidence and severity of respiratory tract infections between men and women. It has been hypothesized that there might also be a nasal microbiome axis contributing to the observed sex disparities. Here, we study the nasal microbiome of healthy young adults in the largest
-
In vivo perturb-seq of cancer and microenvironment cells dissects oncologic drivers and radiotherapy responses in glioblastoma Genome Biol. (IF 10.1) Pub Date : 2024-10-07 S. John Liu, Christopher Zou, Joanna Pak, Alexandra Morse, Dillon Pang, Timothy Casey-Clyde, Ashir A. Borah, David Wu, Kyounghee Seo, Thomas O’Loughlin, Daniel A. Lim, Tomoko Ozawa, Mitchel S. Berger, Roarke A. Kamber, William A. Weiss, David R. Raleigh, Luke A. Gilbert
Genetic perturbation screens with single-cell readouts have enabled rich phenotyping of gene function and regulatory networks. These approaches have been challenging in vivo, especially in adult disease models such as cancer, which include mixtures of malignant and microenvironment cells. Glioblastoma (GBM) is a fatal cancer, and methods of systematically interrogating gene function and therapeutic
-
APC mutations dysregulate alternative polyadenylation in cancer Genome Biol. (IF 10.1) Pub Date : 2024-10-07 Austin M. Gabel, Andrea E. Belleville, James D. Thomas, Jose Mario Bello Pineda, Robert K. Bradley
Alternative polyadenylation (APA) affects most human genes and is recurrently dysregulated in all studied cancers. However, the mechanistic origins of this dysregulation are incompletely understood. We describe an unbiased analysis of molecular regulators of poly(A) site selection across The Cancer Genome Atlas and identify that colorectal adenocarcinoma is an outlier relative to all other cancer subtypes
-
Assessing and mitigating batch effects in large-scale omics studies Genome Biol. (IF 10.1) Pub Date : 2024-10-03 Ying Yu, Yuanbang Mai, Yuanting Zheng, Leming Shi
Batch effects in omics data are notoriously common technical variations unrelated to study objectives, and may result in misleading outcomes if uncorrected, or hinder biomedical discovery if over-corrected. Assessing and mitigating batch effects is crucial for ensuring the reliability and reproducibility of omics data and minimizing the impact of technical variations on biological interpretation. In
-
Jointly benchmarking small and structural variant calls with vcfdist Genome Biol. (IF 10.1) Pub Date : 2024-10-02 Tim Dunn, Justin M. Zook, James M. Holt, Satish Narayanasamy
In this work, we extend vcfdist to be the first variant call benchmarking tool to jointly evaluate phased single-nucleotide polymorphisms (SNPs), small insertions/deletions (INDELs), and structural variants (SVs) for the whole genome. First, we find that a joint evaluation of small and structural variants uniformly reduces measured errors for SNPs (−28.9%), INDELs (−19.3%), and SVs (−52.4%) across
-
A genome-wide association study reveals molecular mechanism underlying powdery mildew resistance in cucumber Genome Biol. (IF 10.1) Pub Date : 2024-10-02 Xuewen Xu, Yujiao Du, Suhao Li, Ming Tan, Hamza Sohail, Xueli Liu, Xiaohua Qi, Xiaodong Yang, Xuehao Chen
Powdery mildew is a disease with one of the most substantial impacts on cucumber production globally. The most efficient approach for controlling powdery mildew is the development of genetic resistance; however, few genes associated with inherent variations in cucumber powdery mildew resistance have been identified as of yet. In this study, we re-sequence 299 cucumber accessions, which are divided
-
Characterization of regeneration initiating cells during Xenopus laevis tail regeneration Genome Biol. (IF 10.1) Pub Date : 2024-10-01 Radek Sindelka, Ravindra Naraine, Pavel Abaffy, Daniel Zucha, Daniel Kraus, Jiri Netusil, Karel Smetana, Lukas Lacina, Berwini Beduya Endaya, Jiri Neuzil, Martin Psenicka, Mikael Kubista
Embryos are masters of regeneration and wound healing. They rapidly close wounds and scarlessly remodel and regenerate injured tissue. Regeneration has been extensively studied in many animal models using new tools such as single-cell analysis. However, until now, these studies have been based primarily on experiments that assess regeneration from 1 day post injury onward. In this paper, we reveal that critical steps initiating regeneration
-
Loss of Lateral suppressor gene is associated with evolution of root nodule symbiosis in Leguminosae Genome Biol. (IF 10.1) Pub Date : 2024-09-30 Tengfei Liu, Zhi Liu, Jingwei Fan, Yaqin Yuan, Haiyue Liu, Wenfei Xian, Shuaiying Xiang, Xia Yang, Yucheng Liu, Shulin Liu, Min Zhang, Yuannian Jiao, Shifeng Cheng, Jeff J. Doyle, Fang Xie, Jiayang Li, Zhixi Tian
Root nodule symbiosis (RNS) is a fascinating evolutionary event. Given that few genes conferring the evolution of RNS in Leguminosae have been functionally validated, the genetic basis of the evolution of RNS remains largely unknown. Identifying the genes involved in the evolution of RNS will help to resolve this question. Here, we investigate the gene loss event during the evolution of RNS in Leguminosae
-
Massive detection of cryptic recessive genetic defects in dairy cattle mining millions of life histories Genome Biol. (IF 10.1) Pub Date : 2024-09-30 Florian Besnard, Ana Guintard, Cécile Grohs, Laurence Guzylack-Piriou, Margarita Cano, Clémentine Escouflaire, Chris Hozé, Hélène Leclerc, Thierry Buronfosse, Lucie Dutheil, Jeanlin Jourdain, Anne Barbat, Sébastien Fritz, Marie-Christine Deloche, Aude Remot, Blandine Gaussères, Adèle Clément, Marion Bouchier, Elise Contat, Anne Relun, Vincent Plassard, Julie Rivière, Christine Péchoux, Marthe Vilotte
Dairy cattle breeds are populations of limited effective size, subject to recurrent outbreaks of recessive defects that are commonly studied using positional cloning. However, this strategy, based on the observation of animals with characteristic features, may overlook a number of conditions, such as immune or metabolic genetic disorders, which may be confused with pathologies of environmental etiology
-
Author Correction: NERD-seq: a novel approach of Nanopore direct RNA sequencing that expands representation of non-coding RNAs Genome Biol. (IF 10.1) Pub Date : 2024-09-26 Luke Saville, Li Wu, Jemaneh Habtewold, Yubo Cheng, Babita Gollen, Liam Mitchell, Matthew Stuart-Edwards, Travis Haight, Majid Mohajerani, Athanasios Zovoilis
Correction: Genome Biol 25, 233 (2024) https://doi.org/10.1186/s13059-024-03375-8 Following publication of the original article [1], the authors identified two typos in affiliations 3 and 4. The incorrect and correct affiliations are given below. Incorrect: 3 Southern Alberta Genome Sciences Centre, Lethbridge, AB, T1K3M4, Canada Correct: 3 Southern Alberta Genome Sciences Centre, University of Lethbridge
-
A realistic benchmark for differential abundance testing and confounder adjustment in human microbiome studies Genome Biol. (IF 10.1) Pub Date : 2024-09-25 Jakob Wirbel, Morgan Essex, Sofia Kirke Forslund, Georg Zeller
In microbiome disease association studies, it is a fundamental task to test which microbes differ in their abundance between groups. Yet, consensus on suitable or optimal statistical methods for differential abundance testing is lacking, and it remains unexplored how these cope with confounding. Previous differential abundance benchmarks relying on simulated datasets did not quantitatively evaluate
-
Recruitment of the m6A/m6Am demethylase FTO to target RNAs by the telomeric zinc finger protein ZBTB48 Genome Biol. (IF 10.1) Pub Date : 2024-09-19 Syed Nabeel-Shah, Shuye Pu, Giovanni L. Burke, Nujhat Ahmed, Ulrich Braunschweig, Shaghayegh Farhangmehr, Hyunmin Lee, Mingkun Wu, Zuyao Ni, Hua Tang, Guoqing Zhong, Edyta Marcon, Zhaolei Zhang, Benjamin J. Blencowe, Jack F. Greenblatt
N6-methyladenosine (m6A), the most abundant internal modification on eukaryotic mRNA, and N6, 2′-O-dimethyladenosine (m6Am), are epitranscriptomic marks that function in multiple aspects of posttranscriptional regulation. Fat mass and obesity-associated protein (FTO) can remove both m6A and m6Am; however, little is known about how FTO achieves its substrate selectivity. Here, we demonstrate that ZBTB48
-
A dynamic regulome of shoot-apical-meristem-related homeobox transcription factors modulates plant architecture in maize Genome Biol. (IF 10.1) Pub Date : 2024-09-19 Zi Luo, Leiming Wu, Xinxin Miao, Shuang Zhang, Ningning Wei, Shiya Zhao, Xiaoyang Shang, Hongyan Hu, Jiquan Xue, Tifu Zhang, Fang Yang, Shutu Xu, Lin Li
The shoot apical meristem (SAM), from which all above-ground tissues of plants are derived, is critical to plant morphology and development. In maize (Zea mays), loss-of-function mutant studies have identified several SAM-related genes, most encoding homeobox transcription factors (TFs), located upstream of hierarchical networks of hundreds of genes. Here, we collect 46 transcriptome and 16 translatome
-
Atlas of telomeric repeat diversity in Arabidopsis thaliana Genome Biol. (IF 10.1) Pub Date : 2024-09-16 Yueqi Tao, Wenfei Xian, Zhigui Bao, Fernando A. Rabanal, Andrea Movilli, Christa Lanz, Gautam Shirsekar, Detlef Weigel
Telomeric repeat arrays at the ends of chromosomes are highly dynamic in composition, but their repetitive nature and technological limitations have made it difficult to assess their true variation in genome diversity surveys. We have comprehensively characterized the sequence variation immediately adjacent to the canonical telomeric repeat arrays at the very ends of chromosomes in 74 genetically diverse
-
Splam: a deep-learning-based splice site predictor that improves spliced alignments Genome Biol. (IF 10.1) Pub Date : 2024-09-16 Kuan-Hao Chao, Alan Mao, Steven L. Salzberg, Mihaela Pertea
The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. We describe Splam, a novel method for predicting splice junctions in DNA using deep residual convolutional neural networks. Unlike previous models, Splam looks at a 400-base-pair window flanking each splice site, reflecting the biological splicing process that relies primarily on signals
-
ESCHR: a hyperparameter-randomized ensemble approach for robust clustering across diverse datasets Genome Biol. (IF 10.1) Pub Date : 2024-09-16 Sarah M. Goggin, Eli R. Zunder
Clustering is widely used for single-cell analysis, but current methods are limited in accuracy, robustness, ease of use, and interpretability. To address these limitations, we developed an ensemble clustering method that outperforms other methods at hard clustering without the need for hyperparameter tuning. It also performs soft clustering to characterize continuum-like regions and quantify clustering
-
Dimension reduction, cell clustering, and cell–cell communication inference for single-cell transcriptomics with DcjComm Genome Biol. (IF 10.1) Pub Date : 2024-09-09 Qian Ding, Wenyi Yang, Guangfu Xue, Hongxin Liu, Yideng Cai, Jinhao Que, Xiyun Jin, Meng Luo, Fenglan Pang, Yuexin Yang, Yi Lin, Yusong Liu, Haoxiu Sun, Renjie Tan, Pingping Wang, Zhaochun Xu, Qinghua Jiang
Advances in single-cell transcriptomics provide an unprecedented opportunity to explore complex biological processes. However, computational methods for analyzing single-cell transcriptomics still have room for improvement especially in dimension reduction, cell clustering, and cell–cell communication inference. Herein, we propose a versatile method, named DcjComm, for comprehensive analysis of single-cell
-
A comprehensive map of the aging blood methylome in humans Genome Biol. (IF 10.1) Pub Date : 2024-09-06 Kirsten Seale, Andrew Teschendorff, Alexander P. Reiner, Sarah Voisin, Nir Eynon
During aging, the human methylome undergoes both differential and variable shifts, accompanied by increased entropy. The distinction between variably methylated positions (VMPs) and differentially methylated positions (DMPs), their contribution to epigenetic age, and the role of cell type heterogeneity remain unclear. We conduct a comprehensive analysis of > 32,000 human blood methylomes from 56 datasets
-
DeepKINET: a deep generative model for estimating single-cell RNA splicing and degradation rates Genome Biol. (IF 10.1) Pub Date : 2024-09-06 Chikara Mizukoshi, Yasuhiro Kojima, Satoshi Nomura, Shuto Hayashi, Ko Abe, Teppei Shimamura
Messenger RNA splicing and degradation are critical for gene expression regulation, the abnormality of which leads to diseases. Previous methods for estimating kinetic rates have limitations, assuming uniform rates across cells. DeepKINET is a deep generative model that estimates splicing and degradation rates at single-cell resolution from scRNA-seq data. DeepKINET outperforms existing methods on
-
Author Correction: A benchmark of computational methods for correcting biases of established and unknown origin in CRISPR-Cas9 screening data Genome Biol. (IF 10.1) Pub Date : 2024-09-04 Alessandro Vinceti, Rafaele M. Iannuzzi, Isabella Boyle, Lucia Trastulla, Catarina D. Campbell, Francisca Vazquez, Joshua M. Dempster, Francesco Iorio
Correction: Genome Biol 25, 192 (2024) https://doi.org/10.1186/s13059-024-03336-1 Following publication of the original article [1], the authors identified an omission in the competing interests section. The omitted text is given in bold below. Competing interests FI receives funding from Open Targets, a public-private initiative involving academia and industry and performs consultancy for the joint
-
Publisher Correction: scParser: sparse representation learning for scalable single-cell RNA sequencing data analysis Genome Biol. (IF 10.1) Pub Date : 2024-09-04 Kai Zhao, Hon-Cheong So, Zhixiang Lin
Publisher Correction: Genome Biol 25, 223 (2024) https://doi.org/10.1186/s13059-024-03345-0 Following publication of the original article [1], the authors identified a typesetting error in Eqs. 3, 4 and 10, as well as in the equation in Algorithm 1. An erroneous “ll” was typeset at the start of the equations. The incorrect and corrected versions are published in this correction article. Incorrect equation
-
Improved simultaneous mapping of epigenetic features and 3D chromatin structure via ViCAR Genome Biol. (IF 10.1) Pub Date : 2024-09-03 Sean M. Flynn, Somdutta Dhir, Krzysztof Herka, Colm Doyle, Larry Melidis, Angela Simeone, Winnie W. I. Hui, Rafael de Cesaris Araujo Tavares, Stefan Schoenfelder, David Tannahill, Shankar Balasubramanian
Methods to measure chromatin contacts at genomic regions bound by histone modifications or proteins are important tools to investigate chromatin organization. However, such methods do not capture the possible involvement of other epigenomic features such as G-quadruplex DNA secondary structures (G4s). To bridge this gap, we introduce ViCAR (viewpoint HiCAR), for the direct antibody-based capture of