Research Areas
Research in my laboratory focuses on (1) the vertebrate mitochondrion as a simplified biological system featuring the three essential biological processes, (2) host-parasite interactions at the molecular level, especially the mechanisms governing the origin of new viral and bacterial pathogens through genomic recombination and horizontal gene transfer, (3) molecular phylogenetics, (4) microbial genomic evolution, (5) the origin and evolution of alternative splicing, and (6) the development of powerful computational tools.
Recent Publications
Xia X. 2017. DAMBE6: New tools for microbial genomics, phylogenetics and molecular evolution. J Hered esx033. doi: 10.1093/jhered/esx033.
DAMBE is a comprehensive software workbench for data analysis in molecular biology, phylogenetics and evolution. Several important new functions have been added since version 5 of DAMBE: 1) comprehensive genomic profiling of translation initiation efficiency of different genes in different prokaryotic species, 2) a new index of translation elongation (ITE) that takes into account both tRNA-mediated selection and background mutation on codon-anticodon adaptation, 3) a new and accurate phylogenetic approach based on pairwise alignment only, which is useful for highly divergent sequences from which a reliable multiple sequence alignment is difficult to obtain. Many other functions have been updated and improved including PWM for motif characterization, Gibbs sampler for de novo motif discovery, hidden Markov models for protein secondary structure prediction, self-organizing map for non-linear clustering of transcriptomic data, comprehensive sequence alignment and phylogenetic functions. DAMBE features a graphic, user-friendly and intuitive interface, and is freely available from http://dambe.bio.uottawa.ca.
Abolbaghaei A, Silke JR, Xia X. 2017. How Changes in Anti-SD Sequences Would Affect SD Sequences in Escherichia coli and Bacillus subtilis. G3: Genes, Genomes, Genetics.
The 3' end of the small ribosomal RNAs (ssu rRNA) in bacteria is directly involved in the selection and binding of mRNA transcripts during translation initiation via well-documented interactions between a Shine-Dalgarno (SD) sequence located upstream of the initiation codon and an anti-SD (aSD) sequence at the 3' end of the ssu rRNA. Consequently, the 3' end of ssu rRNA (3'TAIL) is strongly conserved among bacterial species because a change in the region may impact the translation of many protein-coding genes. Escherichia coli and Bacillus subtilis differ in their 3' ends of ssu rRNA, being GAUCACCUCCUUA3' in E. coli and GAUCACCUCCUUUCU3' or GAUCACCUCCUUUCUA3' in B. subtilis. Such differences in 3'TAIL lead to species-specific SDs (designated SDEc for E. coli and SDBs for B. subtilis) that can form strong and well-positioned SD/aSD pairing in one species but not in the other. Selection mediated by the species-specific 3'TAIL is expected to favour SDBs against SDEc in B. subtilis but favour SDEc against SDBs in E. coli. Among well-positioned SDs, SDEc is used more in E. coli than in B. subtilis, and SDBs more in B. subtilis than in E. coli. Highly expressed genes and genes of high translation efficiency tend to have longer SDs than lowly expressed genes and genes with low translation efficiency in both species, but more so in B. subtilis than in E. coli. Both species overuse SDs matching the bolded part of 3'TAIL shown above. The 3'TAIL difference contributes to host-specificity of phages.
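The SD/aSD pairing described above can be illustrated with a minimal sketch. It assumes perfect Watson-Crick pairing only (G-U wobble pairs and the positioning of the SD relative to the start codon are ignored), and the function names, the `min_len` threshold, and the example upstream sequence are hypothetical; only the two 3'TAIL sequences come from the abstract.

```python
# Minimal sketch: find the longest stretch of an mRNA upstream region that can
# pair perfectly with the 3' tail of the ssu rRNA. Assumes Watson-Crick pairs
# only; this is an illustration, not the method used in the paper.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# 3'TAIL sequences quoted in the abstract (written 5'->3').
TAIL_ECOLI = "GAUCACCUCCUUA"
TAIL_BSUBTILIS = "GAUCACCUCCUUUCU"


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def longest_sd_match(upstream, tail, min_len=4):
    """Return (start, motif) for the longest motif in `upstream` that is
    perfectly complementary to part of `tail`, or None if no motif of at
    least `min_len` bases exists. `min_len` is an arbitrary assumption."""
    target = reverse_complement(tail)  # what a perfect SD would look like
    best = None
    for i in range(len(upstream)):
        for j in range(i + min_len, len(upstream) + 1):
            motif = upstream[i:j]
            if motif not in target:
                break  # longer motifs from this start cannot match either
            if best is None or len(motif) > len(best[1]):
                best = (i, motif)
    return best


# Hypothetical upstream region: pairs over 9 bases with the B. subtilis tail
# but only over 7 bases with the E. coli tail.
upstream = "GAAAGGAGG"
match_bs = longest_sd_match(upstream, TAIL_BSUBTILIS)
match_ec = longest_sd_match(upstream, TAIL_ECOLI)
```

Comparing `match_bs` and `match_ec` for the same upstream region shows how one motif can pair more extensively with one species' 3'TAIL than with the other's.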
Xia X. 2017. Bioinformatics and Drug Discovery. Curr Top Med Chem 17:xxx-xxx
Bioinformatic analysis can not only accelerate drug target identification and drug candidate screening and refinement, but also facilitate characterization of side effects and predict drug resistance. High-throughput data such as genomic, epigenetic, genome architecture, cistromic, transcriptomic, proteomic, and ribosome profiling data have all made significant contributions to mechanism-based drug discovery and drug repurposing. Accumulation of protein and RNA structures, as well as development of homology modeling and protein structure simulation, coupled with large structure databases of small molecules and metabolites, paved the way for more realistic protein-ligand docking experiments and more informative virtual screening. I present the conceptual framework that drives the collection of these high-throughput data, summarize the utility and potential of mining these data in drug discovery, outline a few inherent limitations in the data and in the software mining these data, point out new ways to refine analysis of these diverse types of data, and highlight commonly used software and databases relevant to drug discovery.
Wei Y, Xia X. 2017. The Role of +4U as an Extended Translation Termination Signal in Bacteria. Genetics 205:539–549.
Termination efficiency of stop codons depends on the first 3’ flanking (+4) base in bacteria and eukaryotes. In both Escherichia coli and Saccharomyces cerevisiae, termination read-through is reduced in the presence of +4U; however, the molecular mechanism underlying +4U function is poorly understood. Here, we perform comparative genomics analysis on 25 bacterial species (covering Actinobacteria, Bacteroidetes, Cyanobacteria, Deinococcus-Thermus, Firmicutes, Proteobacteria and Spirochaetae) with bioinformatics approaches to examine the influence of +4U in bacterial translation termination by contrasting highly and lowly expressed genes (HEGs and LEGs). We estimated gene expression using the recently formulated Index of Translation Elongation, ITE, and identified stop codon near-cognate tRNAs from well annotated genomes. We show that +4U was consistently over-represented in UAA-ending HEGs relative to LEGs. The result is consistent with the interpretation that +4U enhances termination mainly for UAA. Usage of +4U decreases in GC-rich species where most stop codons are UGA and UAG, with few UAA-ending genes, which is expected if UAA usage in HEGs drives up +4U usage. In highly expressed genes, +4U usage increases significantly with abundance of UAA nc_tRNAs (near-cognate tRNAs which decode codons differing from UAA by a single nucleotide), particularly those with a mismatch at the first stop codon site. UAA is always the preferred stop codon in highly expressed genes, and our results suggest that UAAU is the most efficient translation termination signal in bacteria.
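The HEG versus LEG contrast described above rests on tallying the +4 base for each stop codon. The following is a minimal sketch of that tally, assuming each input record is a coding sequence that ends with its stop codon followed by exactly one downstream base; the function names and example records are hypothetical and do not come from the paper.

```python
# Minimal sketch: count the base immediately 3' of the stop codon (+4 site),
# grouped by stop codon. Input format is an assumption made for illustration.
from collections import Counter, defaultdict

STOP_CODONS = {"UAA", "UAG", "UGA"}


def plus4_counts(records):
    """Tally +4 bases per stop codon. Each record must end with
    <stop codon><one downstream base>; DNA input is converted to RNA."""
    counts = defaultdict(Counter)
    for seq in records:
        seq = seq.upper().replace("T", "U")
        stop, plus4 = seq[-4:-1], seq[-1]
        if stop in STOP_CODONS:  # skip records without a recognizable stop
            counts[stop][plus4] += 1
    return counts


def plus4u_fraction(counts, stop):
    """Fraction of genes ending in `stop` that carry U at the +4 site."""
    total = sum(counts[stop].values())
    return counts[stop]["U"] / total if total else float("nan")


# Hypothetical toy records (not real genes).
records = ["AUGGCUUAAU", "AUGAAAUAAU", "AUGCCCUAAG", "AUGGGGUGAA"]
counts = plus4_counts(records)
uaa_plus4u = plus4u_fraction(counts, "UAA")
```

Running the same tally separately on a set of highly expressed genes and a set of lowly expressed genes, and comparing the resulting fractions for UAA-ending genes, would mirror the comparison summarized in the abstract.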
Vlasschaert C, Cook D, Xia X, Gray DA. 2017. The evolution and functional diversification of the deubiquitinating enzyme superfamily. Genome Biol Evol. 9:558-573
Ubiquitin and ubiquitin-like molecules are attached to and removed from cellular proteins in a dynamic and highly regulated manner. Deubiquitinating enzymes are critical to this process, and the genetic catalogue of deubiquitinating enzymes expanded greatly over the course of evolution. Extensive functional redundancy has been noted among the 93 members of the human deubiquitinating enzyme (DUB) superfamily. This is especially true of genes that were generated by duplication (termed paralogs) as they often retain considerable sequence similarity. Since complete redundancy in systems should be eliminated by selective pressure we theorized that many overlapping DUBs must have significant and unique spatiotemporal roles that can be evaluated in an evolutionary context. We have determined the evolutionary history of the entire class of deubiquitinating enzymes, including the sequence and means of duplication for all paralogous pairs. To establish their uniqueness, we have investigated cell-type specificity in developmental and adult contexts, and have investigated the co-emergence of substrates from the same duplication events. Our analysis has revealed examples of DUB gene subfunctionalization, neofunctionalization, and nonfunctionalization.
Xia X. 2016. PhyPA: phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences. Molecular Phylogenetics and Evolution 102:331–343.
While pairwise sequence alignment (PSA) by dynamic programming is guaranteed to generate one of the optimal alignments, multiple sequence alignment (MSA) of highly divergent sequences often results in poorly aligned sequences, plaguing all subsequent phylogenetic analysis. One way to avoid this problem is to use only PSA to reconstruct phylogenetic trees, which can only be done with distance-based methods. I compared the accuracy of this new computational approach (named PhyPA for phylogenetics by pairwise alignment) against the maximum likelihood method using MSA (the ML+MSA approach), based on nucleotide, amino acid and codon sequences simulated with different topologies and tree lengths. I present a surprising discovery that the fast PhyPA method consistently outperforms the slow ML+MSA approach for highly diverged sequences even when all optimization options were turned on for the ML+MSA approach. Only when sequences are not highly diverged (i.e., when a reliable MSA can be obtained) does the ML+MSA approach outperform PhyPA. The true topologies are always recovered by ML with the true alignment from the simulation. However, with MSA derived from alignment programs such as MAFFT or MUSCLE, the recovered topology consistently has higher likelihood than that for the true topology. Thus, the failure to recover the true topology by the ML+MSA approach is caused not by insufficient search of tree space, but by the distortion of phylogenetic signal by MSA methods. I have implemented in DAMBE PhyPA and two approaches making use of multi-gene data sets to derive phylogenetic support for subtrees equivalent to resampling techniques such as bootstrapping and jackknifing.
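The core idea of building a distance matrix from pairwise alignments alone can be sketched as follows. This is an illustration under simplifying assumptions, not the DAMBE implementation: it uses Needleman-Wunsch global alignment with a linear gap penalty, arbitrary scoring parameters, and the uncorrected p-distance, all of which are choices made here for brevity.

```python
# Minimal sketch: pairwise global alignment by dynamic programming, then a
# p-distance for every sequence pair. Scoring values are assumptions.
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(a, b):
    """Proportion of differing sites among aligned, gap-free columns."""
    aln_a, aln_b = global_align(a, b)
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Return {(name_a, name_b): p-distance} for every pair in `seqs`."""
    return {(na, nb): p_distance(a, b)
            for (na, a), (nb, b) in combinations(seqs.items(), 2)}


# Hypothetical toy sequences.
seqs = {"s1": "ACGTACGT", "s2": "ACGAACGT", "s3": "ACGTACGT"}
dist = distance_matrix(seqs)
```

The resulting pairwise distances could then be passed to any distance-based tree-building method, such as neighbor-joining; that step is omitted here.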
Wei, Y., Wang, J., Xia, X. 2016. Coevolution between stop codon usage and release factors in bacterial species. Molecular Biology and Evolution 33:2357-2367.
Three stop codons in bacteria represent different translation termination signals, and their usage is expected to depend on their differences in translation termination efficiency, mutation bias, and relative abundance of release factors (RF1 decoding UAA and UAG, and RF2 decoding UAA and UGA). In 14 bacterial species (covering Proteobacteria, Firmicutes, Cyanobacteria, Actinobacteria and Spirochetes) with cellular RF1 and RF2 quantified, UAA is consistently over-represented in highly expressed genes (HEGs) relative to lowly expressed genes (LEGs), whereas UGA usage is the opposite even in species where RF2 is far more abundant than RF1. UGA usage relative to UAG increases significantly with PRF2 [=RF2/(RF1+RF2)] as expected from adaptation between stop codons and their decoders. PRF2 is greater than 0.5 over a wide range of AT content (measured by PAT3 as the proportion of AT at third codon sites), but decreases rapidly towards zero at the high range of PAT3. This explains why bacterial lineages with high PAT3 often have UGA reassigned because of low RF2. There is no indication that UAG is a minor stop codon in bacteria as claimed in a recent publication. The claim is invalid because of the failure to apply the two key criteria in identifying a minor codon: 1) it is least preferred by HEGs (or most preferred by LEGs) and 2) it corresponds to the least abundant decoder. Our results suggest a more plausible explanation for why UAA usage increases, and UGA usage decreases, with PAT3, but UAG usage remains low over the entire PAT3 range.
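The two quantities defined in the abstract, PRF2 = RF2/(RF1+RF2) and PAT3 (the proportion of AT at third codon sites), are simple to compute. The sketch below assumes in-frame coding sequences and counts every complete codon, including the stop codon if present; whether the authors excluded stop codons is not stated here, so that detail is an assumption, as are the function names and example values.

```python
# Minimal sketch of the two indices defined in the abstract.
def p_rf2(rf1, rf2):
    """PRF2 = RF2 / (RF1 + RF2), from release factor abundances."""
    return rf2 / (rf1 + rf2)


def p_at3(cds_list):
    """Proportion of A or T (U) at third codon sites across in-frame CDSs.
    Trailing bases that do not complete a codon are ignored."""
    at = total = 0
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        usable = len(cds) - len(cds) % 3
        for i in range(2, usable, 3):  # indices 2, 5, 8, ... are third sites
            total += 1
            if cds[i] in "AT":
                at += 1
    return at / total if total else float("nan")


# Hypothetical values for illustration only.
example_prf2 = p_rf2(rf1=1.0, rf2=3.0)
example_pat3 = p_at3(["ATGAAATAA"])  # third sites: G, A, A
```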
Vlasschaert, C., Xia, X., Gray, D.A. 2016. Selection preserves Ubiquitin Specific Protease 4 alternative exon skipping in therian mammals. Scientific Reports 6:20039.
Ubiquitin specific protease 4 (USP4) is a highly networked deubiquitinating enzyme with reported roles in cancer, innate immunity and RNA splicing. In mammals it has two dominant isoforms arising from inclusion or skipping of exon 7 (E7). We evaluated two plausible mechanisms for the generation of these isoforms: (A) E7 skipping due to a long upstream intron and (B) E7 skipping due to inefficient 5′ splice sites (5′SS) and/or branchpoint sites (BPS). We then assessed whether E7 alternative splicing is maintained by selective pressure or arose from genetic drift. Both transcript variants were generated from a USP4-E7 minigene construct with short flanking introns, an observation consistent with the second mechanism whereby differential splice signal strengths are the basis of E7 skipping. Optimization of the downstream 5′SS eliminated E7 skipping. Experimental validation of the correlation between 5′SS identity and exon skipping in vertebrates pinpointed the +6 site as the key splicing determinant. Therian mammals invariably display a 5′SS configuration favouring alternative splicing and the resulting isoforms have distinct subcellular localizations. We conclude that alternative splicing of mammalian USP4 is under selective maintenance and that long and short USP4 isoforms may target substrates in various cellular compartments.
Vlasschaert, C., Xia, X., Coulombe, J., Gray, D.A. 2015. Evolution of the highly networked deubiquitinating enzymes USP4, USP15 and USP11. BMC Evolutionary Biology 15:230.
Background: USP4, USP15 and USP11 are paralogous deubiquitinating enzymes as evidenced by structural organization and sequence similarity. Based on known interactions and substrates it would appear that they have partially redundant roles in pathways vital to cell proliferation, development and innate immunity, and elevated expression of all three has been reported in various human malignancies. The nature and order of duplication events that gave rise to these extant genes has not been determined, nor has their functional redundancy been established experimentally at the organismal level. Methods: We have employed phylogenetic and syntenic reconstruction methods to determine the chronology of the duplication events that generated the three paralogs and have performed genetic crosses to evaluate redundancy in mice. Results: Our analyses indicate that USP4 and USP15 arose from whole genome duplication prior to the emergence of jawed vertebrates. Despite having lower sequence identity, USP11 was generated later in vertebrate evolution by small-scale duplication of the USP4-encoding region. While USP11 was subsequently lost in many vertebrate species, all available genomes retain a functional copy of either USP4 or USP15, and through genetic crosses of mice with inactivating mutations we have confirmed that viability is contingent on a functional copy of USP4 or USP15. Loss of ubiquitin-exchange regulation, constitutive skipping of the seventh exon and neural-specific expression patterns are derived states of USP11. Post-translational modification sites differ between USP4, USP15 and USP11 throughout evolution. Conclusions: In isolation, sequence alignments can generate erroneous USP gene phylogenies. Through a combination of methodologies the gene duplication events that gave rise to USP4, USP15, and USP11 have been established. Although it operates in the same molecular pathways as the other USPs, the rapid divergence of the more recently generated USP11 enzyme precludes its functional interchangeability with USP4 and USP15. Given their multiplicity of substrates, the emergence (and in some cases subsequent loss) of these USP paralogs would be expected to alter the dynamics of the networks in which they are embedded.
Prabhakaran, R., Chithambaram, S., Xia, X. 2015. Escherichia coli and Staphylococcus phages: Effect of translation initiation efficiency on differential codon adaptation mediated by virulent and temperate lifestyles. Journal of General Virology 96:1169-1179.
Rapid biosynthesis is key to the success of bacteria and viruses. Highly expressed genes in bacteria exhibit strong codon bias corresponding to differential availability of tRNAs. However, a large clade of lambdoid coliphages exhibit relatively poor codon adaptation to the host translation machinery, in contrast to other coliphages that exhibit strong codon adaptation to the host. Three possible explanations were previously proposed but dismissed: 1) the phage-borne tRNA genes that reduce the dependence of phage translation on host tRNAs, 2) lack of time needed for evolving codon adaptation due to recent host switching, and 3) strong strand asymmetry with biased mutation disrupting codon adaptation. Here we examine the possibility that phages with relatively poor codon adaptation have poor translation initiation, which would weaken the selection on codon adaptation. We measure translation initiation by 1) the strength and position of the Shine-Dalgarno (SD) sequence and 2) the stability of secondary structure of sequences flanking the SD and start codon, which is known to affect accessibility of the SD and start codon. Phage genes with strong codon adaptation have significantly stronger SD sequences than those with poor codon adaptation. The former also have significantly weaker secondary structure in sequences flanking the SD and start codon than the latter. Thus, lambdoid phages do not exhibit strong codon adaptation because they have relatively inefficient translation initiation and would benefit little from increased elongation efficiency. We also provide evidence suggesting that phage lifestyle (virulent versus temperate) affects selection intensity on the efficiency of translation initiation and elongation.
Sun X, Xia H, Yang Q. 2015. Dating the origin of the major lineages of Branchiopoda. Palaeoworld 25:303–317.
Despite the well-established phylogeny and good fossil record of branchiopods, a consistent macro-evolutionary timescale for the group remains elusive. This study focuses on the early branchiopod divergence dates where the fossil record is extremely fragmentary or missing. On the basis of a large genomic dataset and carefully evaluated fossil calibration points, we assess the quality of the branchiopod fossil record by calibrating the tree against well-established first occurrences, providing paleontological estimates of divergence times and completeness of their fossil record. The maximum age constraints were set using the quantitative approach of Marshall (2008). We tested the alternative placements of Yicaris and Wujicaris in the referred arthropod tree via the likelihood checkpoints method. Divergence dates were calculated using Bayesian relaxed molecular clock and penalized likelihood methods. Our results show that the stem group of Branchiopoda is rooted in the late Neoproterozoic (563 ± 7 Ma); the crown-Branchiopoda diverged during the middle Cambrian to Early Ordovician (478–512 Ma), likely representing the origin of the freshwater biota; the Phyllopoda clade diverged during the Ordovician (448–480 Ma) and Diplostraca during the Late Ordovician to early Silurian (430–457 Ma). By evaluating the congruence between the observed times of appearance of clades in the fossil record and the results derived from molecular data, we found that the uncorrelated rate model gave more congruent results for shallower divergence events whereas the auto-correlated rate model gave more congruent results for deeper events.
Xia X. 2015. A major controversy in codon-anticodon adaptation resolved by a new codon usage index. Genetics 199:573-579.