-
Multi-omics approaches reveal that diffuse midline gliomas present altered DNA replication and are susceptible to replication stress therapy Genome Biol. (IF 10.1) Pub Date : 2024-12-20 Anastasia E. Hains, Kashish Chetal, Tsunetoshi Nakatani, Joana G. Marques, Andreas Ettinger, Carlos A. O. Biagi Junior, Adriana Gonzalez-Sandoval, Renjitha Pillai, Mariella G. Filbin, Maria-Elena Torres-Padilla, Ruslan I. Sadreyev, Capucine Van Rechem
The fatal diffuse midline gliomas (DMG) are characterized by an undruggable H3K27M mutation in H3.1 or H3.3. K27M impairs normal development by stalling differentiation. The identification of targetable pathways remains very poorly explored. Toward this goal, we undertake a multi-omics approach to evaluate replication timing profiles, transcriptomics, and cell cycle features in DMG cells from both
-
Systematic evaluation of methylation-based cell type deconvolution methods for plasma cell-free DNA Genome Biol. (IF 10.1) Pub Date : 2024-12-19 Tongyue Sun, Jinqi Yuan, Yacheng Zhu, Jingqi Li, Shen Yang, Junpeng Zhou, Xinzhou Ge, Susu Qu, Wei Li, Jingyi Jessica Li, Yumei Li
Plasma cell-free DNA (cfDNA) is derived from cellular death in various tissues. By investigating the tissue origin of cfDNA through cell type deconvolution, we can detect changes in tissue homeostasis that occur during disease progression or in response to treatment. Consequently, cfDNA has emerged as a valuable noninvasive biomarker for disease detection and treatment monitoring. Although there are many
-
TEMPTED: time-informed dimensionality reduction for longitudinal microbiome studies Genome Biol. (IF 10.1) Pub Date : 2024-12-19 Pixu Shi, Cameron Martino, Rungang Han, Stefan Janssen, Gregory Buck, Myrna Serrano, Kouros Owzar, Rob Knight, Liat Shenhav, Anru R. Zhang
Longitudinal studies are crucial for understanding complex microbiome dynamics and their link to health. We introduce TEMPoral TEnsor Decomposition (TEMPTED), a time-informed dimensionality reduction method for high-dimensional longitudinal data that treats time as a continuous variable, effectively characterizing temporal information and handling varying temporal sampling. TEMPTED captures key microbial
-
HIPSD&R-seq enables scalable genomic copy number and transcriptome profiling Genome Biol. (IF 10.1) Pub Date : 2024-12-18 Jan Otoničar, Olga Lazareva, Jan-Philipp Mallm, Milena Simovic-Lorenz, George Philippos, Pooja Sant, Urja Parekh, Linda Hammann, Albert Li, Umut Yildiz, Mikael Marttinen, Judith Zaugg, Kyung Min Noh, Oliver Stegle, Aurélie Ernst
Single-cell DNA sequencing (scDNA-seq) enables decoding somatic cancer variation. Existing methods are hampered by low throughput or cannot be combined with transcriptome sequencing in the same cell. We propose HIPSD&R-seq (HIgh-throughPut Single-cell Dna and Rna-seq), a scalable yet simple and accessible assay to profile low-coverage DNA and RNA in thousands of cells in parallel. Our approach builds
-
GenomeDelta: detecting recent transposable element invasions without repeat library Genome Biol. (IF 10.1) Pub Date : 2024-12-18 Riccardo Pianezza, Anna Haider, Robert Kofler
We present GenomeDelta, a novel tool for identifying sample-specific sequences, such as recent transposable element (TE) invasions, without requiring a repeat library. GenomeDelta compares high-quality assemblies with short-read data to detect sequences absent from the short reads. It is applicable to both model and non-model organisms and can identify recent TE invasions, spatially heterogeneous sequences
-
SQUiD: ultra-secure storage and analysis of genetic data for the advancement of precision medicine Genome Biol. (IF 10.1) Pub Date : 2024-12-18 Jacob Blindenbach, Jiayi Kang, Seungwan Hong, Caline Karam, Thomas Lehner, Gamze Gürsoy
Cloud computing allows storing the ever-growing genotype-phenotype datasets crucial for precision medicine. Due to the sensitive nature of this data and varied laws and regulations, additional security measures are needed to ensure data privacy. We develop SQUiD, a secure queryable database for storing and analyzing genotype-phenotype data. SQUiD allows storage and secure querying of data in a low-security
-
Transcriptional regulatory network reveals key transcription factors for regulating agronomic traits in soybean Genome Biol. (IF 10.1) Pub Date : 2024-12-18 Wu Jiao, Mangmang Wang, Yijian Guan, Wei Guo, Chang Zhang, Yuanchun Wei, Zhenwei Zhao, Hongyu Ma, Longfei Wang, Xinyu Jiang, Wenxue Ye, Dong Cao, Qingxin Song
Transcription factors (TFs) bind regulatory genomic regions to orchestrate spatio-temporal expression of target genes. Global dissection of the cistrome is critical for elucidating transcriptional networks underlying complex agronomic traits in crops. Here, we generate a comprehensive genome-wide binding map for 148 TFs using DNA affinity purification sequencing in soybean. We find TF binding sites
-
Evaluating data requirements for high-quality haplotype-resolved genomes for creating robust pangenome references Genome Biol. (IF 10.1) Pub Date : 2024-12-18 Prasad Sarashetti, Josipa Lipovac, Filip Tomas, Mile Šikić, Jianjun Liu
Long-read technologies from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have transformed genomics research by providing diverse data types like HiFi, Duplex, and ultra-long ONT. Despite recent strides in achieving haplotype-phased gapless genome assemblies using long-read technologies, concerns persist regarding the representation of genetic diversity, prompting the development
-
Cas12f1 gene drives propagate efficiently in herpesviruses and induce minimal resistance Genome Biol. (IF 10.1) Pub Date : 2024-12-18 Zhuangjie Lin, Qiaorui Yao, Keyuan Lai, Kehua Jiao, Xianying Zeng, Guanxiong Lei, Tongwen Zhang, Hongsheng Dai
Synthetic CRISPR-Cas9 gene drive has been developed to control harmful species. However, resistance to Cas9 gene drive can be acquired easily when DNA repair mechanisms patch up the genetic insults introduced by Cas9 and incorporate mutations to the sgRNA target. Although many strategies to reduce the occurrence of resistance have been developed so far, they are difficult to implement and not always
-
EpiGePT: a pretrained transformer-based language model for context-specific human epigenomics Genome Biol. (IF 10.1) Pub Date : 2024-12-18 Zijing Gao, Qiao Liu, Wanwen Zeng, Rui Jiang, Wing Hung Wong
The inherent similarities between natural language and biological sequences have inspired the use of large language models in genomics, but current models struggle to incorporate chromatin interactions or predict in unseen cellular contexts. To address this, we propose EpiGePT, a transformer-based model designed for predicting context-specific human epigenomic signals. By incorporating transcription
-
SyntheVAEiser: augmenting traditional machine learning methods with VAE-based gene expression sample generation for improved cancer subtype predictions Genome Biol. (IF 10.1) Pub Date : 2024-12-18 Brian Karlberg, Raphael Kirchgaessner, Jordan Lee, Matthew Peterkort, Liam Beckman, Jeremy Goecks, Kyle Ellrott
The accuracy of machine learning methods is often limited by the amount of training data that is available. We propose to improve machine learning training regimes by augmenting datasets with synthetically generated samples. We present a method for synthesizing gene expression samples and test the system’s capabilities for improving the accuracy of categorical prediction of cancer subtypes. We developed
-
Increased spatial coupling of integrin and collagen IV in the immunoresistant clear-cell renal-cell carcinoma tumor microenvironment Genome Biol. (IF 10.1) Pub Date : 2024-12-05 Alex C. Soupir, Mitchell T. Hayes, Taylor C. Peak, Oscar Ospina, Nicholas H. Chakiryan, Anders E. Berglund, Paul A. Stewart, Jonathan Nguyen, Carlos Moran Segura, Natasha L. Francis, Paola M. Ramos Echevarria, Jad Chahoud, Roger Li, Kenneth Y. Tsai, Jodi A. Balasi, Yamila Caraballo Peres, Jasreman Dhillon, Lindsey A. Martinez, Warren E. Gloria, Nathan Schurman, Sean Kim, Mark Gregory, James Mulé, Brooke
Immunotherapy has improved survival for patients with advanced clear cell renal cell carcinoma (ccRCC), but resistance to therapy develops in most patients. We use cellular-resolution spatial transcriptomics in patients with immunotherapy-naïve and immunotherapy-exposed primary ccRCC tumors to better understand immunotherapy resistance. Spatial molecular imaging of tumor and adjacent stroma samples from 21 tumors
-
Systemic evaluation of various CRISPR/Cas13 orthologs for knockdown of targeted transcripts in plants Genome Biol. (IF 10.1) Pub Date : 2024-12-05 Lu Yu, Jiawei Zou, Amjad Hussain, Ruoyu Jia, Yibo Fan, Jinhang Liu, Xinhui Nie, Xianlong Zhang, Shuangxia Jin
The CRISPR/Cas13 system, recognized for its compact size and specificity in targeting RNA, is currently employed for RNA degradation. However, the potential of various CRISPR/Cas13 subtypes, particularly concerning the knockdown of endogenous transcripts, remains to be comprehensively characterized in plants. Here we present a full spectrum of editing profiles for seven Cas13 orthologs from five distinct
-
Chromatin loops gather targets of upstream regulators together for efficient gene transcription regulation during vernalization in wheat Genome Biol. (IF 10.1) Pub Date : 2024-12-03 Yanyan Liu, Xintong Xu, Chao He, Liujie Jin, Ziru Zhou, Jie Gao, Minrong Guo, Xin Wang, Chuanye Chen, Mohammed H. Ayaad, Xingwang Li, Wenhao Yan
Plants respond to environmental stimuli by altering gene transcription, which is closely related to chromatin status, including histone modification, chromatin accessibility, and three-dimensional chromatin interaction. Vernalization is essential for the transition to reproductive growth for winter wheat. How wheat reshapes its chromatin features, especially chromatin interaction during vernalization
-
EpiCHAOS: a metric to quantify epigenomic heterogeneity in single-cell data Genome Biol. (IF 10.1) Pub Date : 2024-12-03 Katherine Kelly, Michael Scherer, Martina Maria Braun, Pavlo Lutsik, Christoph Plass
Epigenetic heterogeneity is a fundamental property of biological systems and is recognized as a potential driver of tumor plasticity and therapy resistance. Single-cell epigenomics technologies have been widely employed to study epigenetic variation between—but not within—cellular clusters. We introduce epiCHAOS: a quantitative metric of cell-to-cell heterogeneity, applicable to any single-cell epigenomics
-
SMART: spatial transcriptomics deconvolution using marker-gene-assisted topic model Genome Biol. (IF 10.1) Pub Date : 2024-12-02 Chen Xi Yang, Don D. Sin, Raymond T. Ng
While spatial transcriptomics offers valuable insights into gene expression patterns within the spatial context of tissue, many technologies do not have single-cell resolution. Here, we present SMART, a marker gene-assisted deconvolution method that simultaneously infers the cell type-specific gene expression profile and the cellular composition at each spot. Using multiple datasets, we show that
-
MoCHI: neural networks to fit interpretable models and quantify energies, energetic couplings, epistasis, and allostery from deep mutational scanning data Genome Biol. (IF 10.1) Pub Date : 2024-12-02 Andre J. Faure, Ben Lehner
We present MoCHI, a tool to fit interpretable models using deep mutational scanning data. MoCHI infers free energy changes, as well as interaction terms (energetic couplings) for specified biophysical models, including from multimodal phenotypic data. When a user-specified model is unavailable, global nonlinearities (epistasis) can be estimated from the data. MoCHI also leverages ensemble, background-averaged
-
HTAD: a human-in-the-loop framework for supervised chromatin domain detection Genome Biol. (IF 10.1) Pub Date : 2024-12-02 Wei Shen, Ping Zhang, Yiwei Jiang, Hailin Tao, Zhike Zi, Li Li
Topologically associating domains (TADs) are essential units of genome architecture, influencing transcriptional regulation and diseases. Despite numerous methods proposed for TAD identification, it remains challenging due to complex background and nested TAD structures. We introduce HTAD, a human-in-the-loop TAD caller that combines machine learning with human supervision to achieve high accuracy
-
Functional screening reveals genetic dependencies and diverging cell cycle control in atypical teratoid rhabdoid tumors Genome Biol. (IF 10.1) Pub Date : 2024-12-02 Daniel J. Merk, Foteini Tsiami, Sophie Hirsch, Bianca Walter, Lara A. Haeusser, Jens D. Maile, Aaron Stahl, Mohamed A. Jarboui, Anna Lechado-Terradas, Franziska Klose, Sepideh Babaei, Jakob Admard, Nicolas Casadei, Cristiana Roggia, Michael Spohn, Jens Schittenhelm, Stephan Singer, Ulrich Schüller, Federica Piccioni, Nicole S. Persky, Manfred Claassen, Marcos Tatagiba, Philipp J. Kahle, David E. Root
Atypical teratoid rhabdoid tumors (ATRT) are incurable high-grade pediatric brain tumors. Despite intensive research efforts, the prognosis for ATRT patients under currently established treatment protocols is poor. While novel therapeutic strategies are urgently needed, the generation of molecular-driven treatment concepts is a challenge mainly due to the absence of actionable genetic alterations.
-
Genetic-by-age interaction analyses on complex traits in UK Biobank and their potential to identify effects on longitudinal trait change Genome Biol. (IF 10.1) Pub Date : 2024-11-28 Thomas W. Winkler, Simon Wiegrebe, Janina M. Herold, Klaus J. Stark, Helmut Küchenhoff, Iris M. Heid
Genome-wide association studies (GWAS) have identified thousands of loci for disease-related human traits in cross-sectional data. However, the impact of age on genetic effects is underacknowledged. Also, identifying genetic effects on longitudinal trait change has been hampered by small sample sizes for longitudinal data. Such effects on deteriorating trait levels over time or disease progression
-
Hierarchical annotation of eQTLs by H-eQTL enables identification of genes with cell type-divergent regulation Genome Biol. (IF 10.1) Pub Date : 2024-11-25 Pawel F. Przytycki, Katherine S. Pollard
While context-type-specific regulation of genes is largely determined by cis-regulatory regions, attempts to identify cell type-specific eQTLs are complicated by the nested nature of cell types. We present hierarchical eQTL (H-eQTL), a network-based model for hierarchical annotation of bulk-derived eQTLs to levels of a cell type tree using single-cell chromatin accessibility data and no clustering
-
Publisher Correction: Tagging large CNV blocks in wheat boosts digitalization of germplasm resources by ultra-low-coverage sequencing Genome Biol. (IF 10.1) Pub Date : 2024-11-25 Jianxia Niu, Wenxi Wang, Zihao Wang, Zhe Chen, Xiaoyu Zhang, Zhen Qin, Lingfeng Miao, Zhengzhao Yang, Chaojie Xie, Mingming Xin, Huiru Peng, Yingyin Yao, Jie Liu, Zhongfu Ni, Qixin Sun, Weilong Guo
Correction: Genome Biol 25, 171 (2024) https://doi.org/10.1186/s13059-024-03315-6 Following publication of the original article [1], the authors identified a typesetting error, whereby the equal contribution statement was mistakenly omitted. The correct statement is as follows: Jianxia Niu, Wenxi Wang and Zihao Wang are co-first authors and contributed equally. The original article [1] has been corrected
-
scStateDynamics: deciphering the drug-responsive tumor cell state dynamics by modeling single-cell level expression changes Genome Biol. (IF 10.1) Pub Date : 2024-11-21 Wenbo Guo, Xinqi Li, Dongfang Wang, Nan Yan, Qifan Hu, Fan Yang, Xuegong Zhang, Jianhua Yao, Jin Gu
Understanding tumor cell heterogeneity and plasticity is crucial for overcoming drug resistance. Single-cell technologies enable analyzing cell states at a given condition, but catenating static cell snapshots to characterize dynamic drug responses remains challenging. Here, we propose scStateDynamics, an algorithm to infer tumor cell state dynamics and identify common drug effects by modeling single-cell
-
The genomic portrait of the Picene culture provides new insights into the Italic Iron Age and the legacy of the Roman Empire in Central Italy Genome Biol. (IF 10.1) Pub Date : 2024-11-21 Francesco Ravasini, Helja Kabral, Anu Solnik, Luciana de Gennaro, Francesco Montinaro, Ruoyun Hui, Chiara Delpino, Stefano Finocchi, Pierluigi Giroldini, Oscar Mei, Michael Allen Beck De Lotto, Elisabetta Cilli, Mogge Hajiesmaeil, Letizia Pistacchia, Flavia Risi, Chiara Giacometti, Christiana Lyn Scheib, Kristiina Tambets, Mait Metspalu, Fulvio Cruciani, Eugenia D’Atanasio, Beniamino Trombetta
The Italic Iron Age is characterized by the presence of various ethnic groups partially examined from a genomic perspective. To explore the evolution of Iron Age Italic populations and the genetic impact of Romanization, we focus on the Picenes, one of the most fascinating pre-Roman civilizations, who flourished on the Middle Adriatic side of Central Italy between the 9th and the 3rd century BCE, until
-
Considerations in the search for epistasis Genome Biol. (IF 10.1) Pub Date : 2024-11-19 Marleen Balvert, Johnathan Cooper-Knock, Julian Stamp, Ross P. Byrne, Soufiane Mourragui, Juami van Gils, Stefania Benonisdottir, Johannes Schlüter, Kevin Kenna, Sanne Abeln, Alfredo Iacoangeli, Joséphine T. Daub, Brian L. Browning, Gizem Taş, Jiajing Hu, Yan Wang, Elham Alhathli, Calum Harvey, Luna Pianesi, Sara C. Schulte, Jorge González-Domínguez, Erik Garrisson, Michael P. Snyder, Alexander Schönhuth
Epistasis refers to changes in the effect on phenotype of a unit of genetic information, such as a single nucleotide polymorphism or a gene, dependent on the context of other genetic units. Such interactions are both biologically plausible and good candidates to explain observations which are not fully explained by an additive heritability model. However, the search for epistasis has so far largely
-
Transcription of a centromere-enriched retroelement and local retention of its RNA are significant features of the CENP-A chromatin landscape Genome Biol. (IF 10.1) Pub Date : 2024-11-18 B. J. Chabot, R. Sun, A. Amjad, S. J. Hoyt, L. Ouyang, C. Courret, R. Drennan, L. Leo, A. M. Larracuente, L. J. Core, R. J. O’Neill, B. G. Mellone
Centromeres depend on chromatin containing the conserved histone H3 variant CENP-A for function and inheritance, while the role of centromeric DNA repeats remains unclear. Retroelements are prevalent at centromeres across taxa and represent a potential mechanism for promoting transcription to aid in CENP-A incorporation or for generating RNA transcripts to maintain centromere integrity. In this study
-
VI-VS: calibrated identification of feature dependencies in single-cell multiomics Genome Biol. (IF 10.1) Pub Date : 2024-11-15 Pierre Boyeau, Stephen Bates, Can Ergen, Michael I. Jordan, Nir Yosef
Unveiling functional relationships between various molecular cell phenotypes from data using machine learning models is a key promise of multiomics. Existing methods either use flexible but hard-to-interpret models or simpler, misspecified models. VI-VS (Variational Inference for Variable Selection) balances flexibility and interpretability to identify relevant feature relationships in multiomic data
-
Cohesin distribution alone predicts chromatin organization in yeast via conserved-current loop extrusion Genome Biol. (IF 10.1) Pub Date : 2024-11-14 Tianyu Yuan, Hao Yan, Kevin C. Li, Ivan Surovtsev, Megan C. King, Simon G. J. Mochrie
Inhomogeneous patterns of chromatin-chromatin contacts within 10–100-kb-sized regions of the genome are a generic feature of chromatin spatial organization. These features, termed topologically associating domains (TADs), have led to the loop extrusion factor (LEF) model. Currently, our ability to model TADs relies on the observation that in vertebrates TAD boundaries are correlated with DNA sequences
-
Adenine base editors induce off-target structure variations in mouse embryos and primary human T cells Genome Biol. (IF 10.1) Pub Date : 2024-11-11 Leilei Wu, Shutan Jiang, Meisong Shi, Tanglong Yuan, Yaqin Li, Pinzheng Huang, Yingqi Li, Erwei Zuo, Changyang Zhou, Yidi Sun
The safety of CRISPR-based gene editing methods is of the utmost priority in clinical applications. Previous studies have reported that Cas9 cleavage induced frequent aneuploidy in primary human T cells, but whether cleavage-mediated editing by base editors would generate off-target structural variations remains unknown. Here, we investigate the potential off-target structural variations associated
-
IAMSAM: image-based analysis of molecular signatures using the Segment Anything Model Genome Biol. (IF 10.1) Pub Date : 2024-11-11 Dongjoo Lee, Jeongbin Park, Seungho Cook, Seongjin Yoo, Daeseung Lee, Hongyoon Choi
Spatial transcriptomics is a cutting-edge technique that combines gene expression with spatial information, allowing researchers to study molecular patterns within tissue architecture. Here, we present IAMSAM, a user-friendly web-based tool for analyzing spatial transcriptomics data focusing on morphological features. IAMSAM accurately segments tissue images using the Segment Anything Model, allowing
-
SpottedPy quantifies relationships between spatial transcriptomic hotspots and uncovers environmental cues of epithelial-mesenchymal plasticity in breast cancer Genome Biol. (IF 10.1) Pub Date : 2024-11-11 Eloise Withnell, Maria Secrier
Spatial transcriptomics is revolutionizing the exploration of intratissue heterogeneity in cancer, yet capturing cellular niches and their spatial relationships remains challenging. We introduce SpottedPy, a Python package designed to identify tumor hotspots and map spatial interactions within the cancer ecosystem. Using SpottedPy, we examine epithelial-mesenchymal plasticity in breast cancer and highlight
-
scDOT: optimal transport for mapping senescent cells in spatial transcriptomics Genome Biol. (IF 10.1) Pub Date : 2024-11-08 Nam D. Nguyen, Lorena Rosas, Timur Khaliullin, Peiran Jiang, Euxhen Hasanaj, Jose A. Ovando-Ricardez, Marta Bueno, Irfan Rahman, Gloria S. Pryhuber, Dongmei Li, Qin Ma, Toren Finkel, Melanie Königshoff, Oliver Eickelberg, Mauricio Rojas, Ana L. Mora, Jose Lugo-Martinez, Ziv Bar-Joseph
The low resolution of spatial transcriptomics data necessitates additional information for optimal use. We developed scDOT, which combines spatial transcriptomics and single cell RNA sequencing to improve the ability to reconstruct single cell resolved spatial maps and identify senescent cells. scDOT integrates optimal transport and expression deconvolution to learn non-linear couplings between cells
-
GraphPCA: a fast and interpretable dimension reduction algorithm for spatial transcriptomics data Genome Biol. (IF 10.1) Pub Date : 2024-11-07 Jiyuan Yang, Lu Wang, Lin Liu, Xiaoqi Zheng
The rapid advancement of spatial transcriptomics technologies has revolutionized our understanding of cell heterogeneity and intricate spatial structures within tissues and organs. However, the high dimensionality and noise in spatial transcriptomic data present significant challenges for downstream data analyses. Here, we develop GraphPCA, an interpretable and quasi-linear dimension reduction algorithm
-
CaClust: linking genotype to transcriptional heterogeneity of follicular lymphoma using BCR and exomic variants Genome Biol. (IF 10.1) Pub Date : 2024-11-05 Kazimierz Oksza-Orzechowski, Edwin Quinten, Shadi Shafighi, Szymon M. Kiełbasa, Hugo W. van Kessel, Ruben A. L. de Groen, Joost S. P. Vermaat, Julieta H. Sepúlveda Yáñez, Marcelo A. Navarrete, Hendrik Veelken, Cornelis A. M. van Bergen, Ewa Szczurek
Tumours exhibit high genotypic and transcriptional heterogeneity. Both affect cancer progression and treatment, but have been predominantly studied separately in follicular lymphoma. To comprehensively investigate the evolution and genotype-to-phenotype maps in follicular lymphoma, we introduce CaClust, a probabilistic graphical model integrating deep whole exome, single-cell RNA and B-cell receptor
-
TDFPS-Designer: an efficient toolkit for barcode design and selection in nanopore sequencing Genome Biol. (IF 10.1) Pub Date : 2024-11-04 Junhai Qi, Zhengyi Li, Yao-zhong Zhang, Guojun Li, Xin Gao, Renmin Han
Oxford Nanopore Technologies (ONT) offers ultrahigh-throughput multi-sample sequencing but only provides barcode kits that enable up to 96-sample multiplexing. We present TDFPS-Designer, a new toolkit for nanopore sequencing barcode design, which creates significantly more barcodes: 137 with a length of 20 base pairs, 410 at 24 bp, and 1779 at 30 bp, far surpassing ONT’s offerings. It includes GPU-based
-
Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data Genome Biol. (IF 10.1) Pub Date : 2024-10-31 Xiaoting Li, Lucas A. N. Melo, Harmen J. Bussemaker
Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific binding (ASB) both in terms of read coverage and representation of individual variants in the cell lines
-
Response to "Neglecting normalization impact in semi-synthetic RNA-seq data simulation generates artificial false positives" and "Winsorization greatly reduces false positives by popular differential expression methods when analyzing human population samples" Genome Biol. (IF 10.1) Pub Date : 2024-10-30 Xinzhou Ge, Yumei Li, Wei Li, Jingyi Jessica Li
Two correspondences raised concerns or comments about our analyses regarding exaggerated false positives found by differential expression (DE) methods. Here, we discuss the points they raise and explain why we agree or disagree with these points. We add new analysis to confirm that the Wilcoxon rank-sum test remains the most robust method compared to the other five DE methods (DESeq2, edgeR, limma-voom
-
Winsorization greatly reduces false positives by popular differential expression methods when analyzing human population samples Genome Biol. (IF 10.1) Pub Date : 2024-10-30 Lu Yang, Xianyang Zhang, Jun Chen
A recent study found severely inflated type I error rates for DESeq2 and edgeR, two dominant tools used for differential expression analysis of RNA-seq data. Here, we show that by properly addressing the outliers in the RNA-seq data using winsorization, the type I error rate of DESeq2 and edgeR can be substantially reduced, and the power is comparable to that of the Wilcoxon rank-sum test for large datasets. Therefore
-
Neglecting the impact of normalization in semi-synthetic RNA-seq data simulations generates artificial false positives Genome Biol. (IF 10.1) Pub Date : 2024-10-30 Boris P. Hejblum, Kalidou Ba, Rodolphe Thiébaut, Denis Agniel
A recent study reported exaggerated false positives by popular differential expression methods when analyzing large population samples. We reproduce the differential expression analysis simulation results and identify a caveat in the data generation process. Data not truly generated under the null hypothesis led to incorrect comparisons of benchmark methods. We provide corrected simulation results
-
pan-Draft: automated reconstruction of species-representative metabolic models from multiple genomes Genome Biol. (IF 10.1) Pub Date : 2024-10-25 Nicola De Bernardini, Guido Zampieri, Stefano Campanaro, Johannes Zimmermann, Silvio Waschina, Laura Treu
The accurate reconstruction of genome-scale metabolic models (GEMs) for unculturable species poses challenges due to the incomplete and fragmented genetic information typical of metagenome-assembled genomes (MAGs). While existing tools leverage sequence homology from single genomes, this study introduces pan-Draft, a pan-reactome-based approach exploiting recurrent genetic evidence to determine the
-
Plant conservation in the age of genome editing: opportunities and challenges Genome Biol. (IF 10.1) Pub Date : 2024-10-24 Kangquan Yin, Mi Yoon Chung, Bo Lan, Fang K. Du, Myong Gi Chung
Numerous plant taxa are threatened by habitat destruction or overexploitation. To overcome these threats, new methods are urgently needed for rescuing threatened and endangered plant species. Here, we review the genetic consequences of threats to species populations. We highlight potential advantages of genome editing for mitigating negative effects caused by new pathogens and pests or climate change
-
STASCAN deciphers fine-resolution cell distribution maps in spatial transcriptomics by deep learning Genome Biol. (IF 10.1) Pub Date : 2024-10-22 Ying Wu, Jia-Yi Zhou, Bofei Yao, Guanshen Cui, Yong-Liang Zhao, Chun-Chun Gao, Ying Yang, Shihua Zhang, Yun-Gui Yang
Spatial transcriptomics technologies have been widely applied to decode cellular distribution by resolving gene expression profiles in tissue. However, sequencing techniques still limit the ability to create a fine-resolved spatial cell-type map. To this end, we develop a novel deep-learning-based approach, STASCAN, to predict the spatial cellular distribution of captured or uncharted areas where only
-
Mapping lineage-traced cells across time points with moslin Genome Biol. (IF 10.1) Pub Date : 2024-10-21 Marius Lange, Zoe Piran, Michal Klein, Bastiaan Spanjaard, Dominik Klein, Jan Philipp Junker, Fabian J. Theis, Mor Nitzan
Simultaneous profiling of single-cell gene expression and lineage history holds enormous potential for studying cellular decision-making. Recent computational approaches combine both modalities into cellular trajectories; however, they cannot make use of all available lineage information in destructive time-series experiments. Here, we present moslin, a Gromov-Wasserstein-based model to couple cellular
-
A comprehensive study of genetic regulation and disease associations of plasma circulatory microRNAs using population-level data Genome Biol. (IF 10.1) Pub Date : 2024-10-21 Rima Mustafa, Michelle M. J. Mens, Arno van Hilten, Jian Huang, Gennady Roshchupkin, Tianxiao Huan, Linda Broer, Joyce B. J. van Meurs, Paul Elliott, Daniel Levy, M. Arfan Ikram, Marina Evangelou, Abbas Dehghan, Mohsen Ghanbari
MicroRNAs (miRNAs) are small non-coding RNAs that post-transcriptionally regulate gene expression. Perturbations in plasma miRNA levels are known to impact disease risk and have potential as disease biomarkers. Exploring the genetic regulation of miRNAs may yield new insights into their important role in governing gene expression and disease mechanisms. We present genome-wide association studies of
-
Scalable identification of lineage-specific gene regulatory networks from metacells with NetID Genome Biol. (IF 10.1) Pub Date : 2024-10-18 Weixu Wang, Yichen Wang, Ruiqi Lyu, Dominic Grün
The identification of gene regulatory networks (GRNs) is crucial for understanding cellular differentiation. Single-cell RNA sequencing data encode gene-level covariations at high resolution, yet data sparsity and high dimensionality hamper accurate and scalable GRN reconstruction. To overcome these challenges, we introduce NetID, which leverages homogeneous metacells while avoiding spurious gene–gene correlations
-
MHConstructor: a high-throughput, haplotype-informed solution to the MHC assembly challenge Genome Biol. (IF 10.1) Pub Date : 2024-10-17 Kristen J. Wade, Rayo Suseno, Kerry Kizer, Jacqueline Williams, Juliano Boquett, Stacy Caillier, Nicholas R. Pollock, Adam Renschen, Adam Santaniello, Jorge R. Oksenberg, Paul J. Norman, Danillo G. Augusto, Jill A. Hollenbach
The extremely high levels of genetic polymorphism within the human major histocompatibility complex (MHC) limit the usefulness of reference-based alignment methods for sequence assembly. We incorporate a short-read, de novo assembly algorithm into a workflow for novel application to the MHC. MHConstructor is a containerized pipeline designed for high-throughput, haplotype-informed, reproducible assembly
-
HBI: a hierarchical Bayesian interaction model to estimate cell-type-specific methylation quantitative trait loci incorporating priors from cell-sorted bisulfite sequencing data Genome Biol. (IF 10.1) Pub Date : 2024-10-15 Youshu Cheng, Biao Cai, Hongyu Li, Xinyu Zhang, Gypsyamber D’Souza, Sadeep Shrestha, Andrew Edmonds, Jacquelyn Meyers, Margaret Fischl, Seble Kassaye, Kathryn Anastos, Mardge Cohen, Bradley E. Aouizerat, Ke Xu, Hongyu Zhao
Methylation quantitative trait loci (meQTLs) quantify the effects of genetic variants on DNA methylation levels. However, most published studies use bulk methylation datasets composed of different cell types, which limits our understanding of cell-type-specific methylation regulation. We propose a hierarchical Bayesian interaction (HBI) model to infer cell-type-specific meQTLs, which integrates a large-scale
-
Multi-omics reveals lactylation-driven regulatory mechanisms promoting tumor progression in oral squamous cell carcinoma Genome Biol. (IF 10.1) Pub Date : 2024-10-15 Fengyang Jing, Lijing Zhu, Jianyun Zhang, Xuan Zhou, Jiaying Bai, Xuefen Li, Heyu Zhang, Tiejun Li
Lactylation, a post-translational modification, is increasingly recognized for its role in cancer progression. This study investigates its prevalence and impact in oral squamous cell carcinoma (OSCC). Immunohistochemical staining of 81 OSCC cases shows lactylation levels correlate with malignancy grading. Proteomic analyses of six OSCC tissue pairs reveal 2765 lactylation sites on 1033 proteins, highlighting
-
SDePER: a hybrid machine learning and regression method for cell-type deconvolution of spatial barcoding-based transcriptomic data Genome Biol. (IF 10.1) Pub Date : 2024-10-14 Yunqing Liu, Ningshan Li, Ji Qi, Gang Xu, Jiayi Zhao, Nating Wang, Xiayuan Huang, Wenhao Jiang, Huanhuan Wei, Aurélien Justet, Taylor S. Adams, Robert Homer, Amei Amei, Ivan O. Rosas, Naftali Kaminski, Zuoheng Wang, Xiting Yan
Spatial barcoding-based transcriptomic (ST) data require deconvolution for cellular-level downstream analysis. Here we present SDePER, a hybrid machine learning and regression method to deconvolve ST data using reference single-cell RNA sequencing (scRNA-seq) data. SDePER tackles platform effects between ST and scRNA-seq data, ensuring a linear relationship between them while addressing sparsity and
-
When less is more: sketching with minimizers in genomics Genome Biol. (IF 10.1) Pub Date : 2024-10-14 Malick Ndiaye, Silvia Prieto-Baños, Lucy M. Fitzgerald, Ali Yazdizadeh Kharrazi, Sergey Oreshkov, Christophe Dessimoz, Fritz J. Sedlazeck, Natasha Glover, Sina Majidian
The exponential increase in sequencing data calls for conceptual and computational advances to extract useful biological insights. One such advance, minimizers, allows for reducing the quantity of data handled while maintaining some of its key properties. We provide a basic introduction to minimizers, cover recent methodological developments, and review the diverse applications of minimizers to analyze
-
scCTS: identifying the cell type-specific marker genes from population-level single-cell RNA-seq Genome Biol. (IF 10.1) Pub Date : 2024-10-14 Luxiao Chen, Zhenxing Guo, Tao Deng, Hao Wu
Single-cell RNA-sequencing (scRNA-seq) provides gene expression profiles of individual cells from complex samples, facilitating the detection of cell type-specific marker genes. In scRNA-seq experiments with multiple donors, population-level variation adds an extra layer of complexity to cell type-specific gene detection; for example, marker genes may not appear in all donors. Motivated by this observation
-
The ribosome profiling landscape of yeast reveals a high diversity in pervasive translation Genome Biol. (IF 10.1) Pub Date : 2024-10-14 Chris Papadopoulos, Hugo Arbes, David Cornu, Nicolas Chevrollier, Sandra Blanchet, Paul Roginski, Camille Rabier, Safiya Atia, Olivier Lespinet, Olivier Namy, Anne Lopes
Pervasive translation is a widespread phenomenon that plays a critical role in the emergence of novel microproteins, but the diversity of translation patterns contributing to their generation remains unclear. Based on 54 ribosome profiling (Ribo-Seq) datasets, we investigated the yeast Ribo-Seq landscape using a representation framework that allows the comprehensive inventory and classification of
-
zMAP toolset: model-based analysis of large-scale proteomic data via a variance stabilizing z-transformation Genome Biol. (IF 10.1) Pub Date : 2024-10-14 Xiuqi Gui, Jing Huang, Linjie Ruan, Yanjun Wu, Xuan Guo, Ruifang Cao, Shuhan Zhou, Fengxiang Tan, Hongwen Zhu, Mushan Li, Guoqing Zhang, Hu Zhou, Lixing Zhan, Xin Liu, Shiqi Tu, Zhen Shao
Isobaric labeling-based mass spectrometry (ILMS) has been widely used to quantify, on a proteome-wide scale, the relative protein abundance in different biological conditions. However, large-scale ILMS data sets typically involve multiple runs of mass spectrometry, bringing great computational difficulty to the integration of ILMS samples. We present zMAP, a toolset that makes ILMS intensities comparable
-
Transipedia.org: k-mer-based exploration of large RNA sequencing datasets and application to cancer data Genome Biol. (IF 10.1) Pub Date : 2024-10-10 Chloé Bessière, Haoliang Xue, Benoit Guibert, Anthony Boureux, Florence Rufflé, Julien Viot, Rayan Chikhi, Mikaël Salson, Camille Marchet, Thérèse Commes, Daniel Gautheret
Indexing techniques relying on k-mers have proven effective in searching for RNA sequences across thousands of RNA-seq libraries, but without enabling direct RNA quantification. We show here that arbitrary RNA sequences can be quantified in seconds through their decomposition into k-mers, with a precision akin to that of conventional RNA quantification methods. Using an index of the Cancer Cell Line
-
Graphasing: phasing diploid genome assembly graphs with single-cell strand sequencing Genome Biol. (IF 10.1) Pub Date : 2024-10-10 Mir Henglin, Maryam Ghareghani, William T. Harvey, David Porubsky, Sergey Koren, Evan E. Eichler, Peter Ebert, Tobias Marschall
Haplotype information is crucial for biomedical and population genetics research. However, current strategies to produce de novo haplotype-resolved assemblies often require either difficult-to-acquire parental data or an intermediate haplotype-collapsed assembly. Here, we present Graphasing, a workflow which synthesizes the global phase signal of Strand-seq with assembly graph topology to produce chromosome-scale
-
Spatiotemporal modeling reveals high-resolution invasion states in glioblastoma Genome Biol. (IF 10.1) Pub Date : 2024-10-10 Varsha Thoppey Manoharan, Aly Abdelkareem, Gurveer Gill, Samuel Brown, Aaron Gillmor, Courtney Hall, Heewon Seo, Kiran Narta, Sean Grewal, Ngoc Ha Dang, Bo Young Ahn, Kata Osz, Xueqing Lun, Laura Mah, Franz Zemp, Douglas Mahoney, Donna L. Senger, Jennifer A. Chan, A. Sorana Morrissy
Diffuse invasion of glioblastoma cells through normal brain tissue is a key contributor to tumor aggressiveness, resistance to conventional therapies, and dismal prognosis in patients. A deeper understanding of how components of the tumor microenvironment (TME) contribute to overall tumor organization and to programs of invasion may reveal opportunities for improved therapeutic strategies. Towards
-
Systematic perturbations of SETD2, NSD1, NSD2, NSD3, and ASH1L reveal their distinct contributions to H3K36 methylation Genome Biol. (IF 10.1) Pub Date : 2024-10-10 Gerry A. Shipman, Reinnier Padilla, Cynthia Horth, Bo Hu, Eric Bareke, Francisca N. Vitorino, Joanna M. Gongora, Benjamin A. Garcia, Chao Lu, Jacek Majewski
Methylation of histone 3 lysine 36 (H3K36me) has emerged as an essential epigenetic component for the faithful regulation of gene expression. Despite its importance in development and disease, how the molecular agents collectively shape the H3K36me landscape is unclear. We use mouse mesenchymal stem cells to perturb the H3K36me methyltransferases (K36MTs) and infer the activities of the five most prominent
-
Drought-responsive dynamics of H3K9ac-marked 3D chromatin interactions are integrated by OsbZIP23-associated super-enhancer-like promoter regions in rice Genome Biol. (IF 10.1) Pub Date : 2024-10-10 Yu Chang, Jiahan Liu, Minrong Guo, Weizhi Ouyang, Jiapei Yan, Lizhong Xiong, Xingwang Li
In response to drought stress (DS), plants undergo complex processes that entail significant transcriptome reprogramming. However, the intricate relationship between the dynamic alterations in the three-dimensional (3D) genome and the modulation of gene co-expression in drought responses remains a relatively unexplored area. In this study, we reconstruct high-resolution 3D genome maps based on genomic
-
Improved detection of methylation in ancient DNA Genome Biol. (IF 10.1) Pub Date : 2024-10-10 Susanna Sawyer, Pere Gelabert, Benjamin Yakir, Alejandro Llanos-Lizcano, Alessandra Sperduti, Luca Bondioli, Olivia Cheronet, Christine Neugebauer-Maresch, Maria Teschler-Nicola, Mario Novak, Ildikó Pap, Ildikó Szikossy, Tamás Hajdu, Vyacheslav Moiseyev, Andrey Gromov, Gunita Zariņa, Eran Meshorer, Liran Carmel, Ron Pinhasi
Reconstructing premortem DNA methylation levels in ancient DNA has led to breakthrough studies such as the prediction of anatomical features of the Denisovan. These studies rely on computationally inferring methylation levels from damage signals in naturally deaminated cytosines, which requires expensive high-coverage genomes. Here, we test two methods for direct methylation measurement developed for
-
Optimizing and benchmarking polygenic risk scores with GWAS summary statistics Genome Biol. (IF 10.1) Pub Date : 2024-10-08 Zijie Zhao, Tim Gruenloh, Meiyi Yan, Yixuan Wu, Zhongxuan Sun, Jiacheng Miao, Yuchang Wu, Jie Song, Qiongshi Lu
Polygenic risk score (PRS) is a major research topic in human genetics. However, a significant gap exists between PRS methodology and practical applications because individual-level data are often unavailable for various PRS tasks, including model fine-tuning, benchmarking, and ensemble learning. We introduce an innovative statistical framework to optimize and benchmark PRS models using summary statistics