-
A systematic review of deep learning chemical language models in recent era J. Cheminfom. (IF 7.1) Pub Date : 2024-11-18 Hector Flores-Hernandez, Emmanuel Martinez-Ledesma
Discovering new chemical compounds with specific properties can provide advantages for fields that rely on materials for their development, although this task comes at a high cost in terms of complexity and resources. Since the beginning of the data age, deep learning techniques have revolutionized the process of designing molecules by analyzing and learning from representations of molecular data,
-
QSPRpred: a Flexible Open-Source Quantitative Structure-Property Relationship Modelling Tool J. Cheminfom. (IF 7.1) Pub Date : 2024-11-14 Helle W. van den Maagdenberg, Martin Šícho, David Alencar Araripe, Sohvi Luukkonen, Linde Schoenmaker, Michiel Jespers, Olivier J. M. Béquignon, Marina Gorostiola González, Remco L. van den Broek, Andrius Bernatavicius, J. G. Coen van Hasselt, Piet H. van der Graaf, Gerard J. P. van Westen
Building reliable and robust quantitative structure–property relationship (QSPR) models is a challenging task. First, the experimental data needs to be obtained, analyzed and curated. Second, the number of available methods is continuously growing and evaluating different algorithms and methodologies can be arduous. Finally, the last hurdle that researchers face is to ensure the reproducibility of
-
Accelerated hit identification with target evaluation, deep learning and automated labs: prospective validation in IRAK1 J. Cheminfom. (IF 7.1) Pub Date : 2024-11-14 Gintautas Kamuntavičius, Alvaro Prat, Tanya Paquet, Orestis Bastas, Hisham Abdel Aty, Qing Sun, Carsten B. Andersen, John Harman, Marc E. Siladi, Daniel R. Rines, Sarah J. L. Flatters, Roy Tal, Povilas Norvaišas
Target identification and hit identification can be transformed through the application of biomedical knowledge analysis, AI-driven virtual screening and robotic cloud lab systems. However, there are few prospective studies that evaluate the efficacy of such integrated approaches. We synergistically integrate our in-house-developed target evaluation (SpectraView) and deep-learning-driven virtual screening
-
Comparative evaluation of methods for the prediction of protein–ligand binding sites J. Cheminfom. (IF 7.1) Pub Date : 2024-11-11 Javier S. Utgés, Geoffrey J. Barton
The accurate identification of protein–ligand binding sites is of critical importance in understanding and modulating protein function. Accordingly, ligand binding site prediction has remained a research focus for over three decades with over 50 methods developed and a change of paradigm from geometry-based to machine learning. In this work, we collate 13 ligand binding site predictors, spanning 30 years
-
Protein-small molecule binding site prediction based on a pre-trained protein language model with contrastive learning J. Cheminfom. (IF 7.1) Pub Date : 2024-11-06 Jue Wang, Yufan Liu, Boxue Tian
Predicting protein-small molecule binding sites, the initial step in structure-guided drug design, remains challenging for proteins lacking experimentally derived ligand-bound structures. Here, we propose CLAPE-SMB, which integrates a pre-trained protein language model with contrastive learning to provide high accuracy predictions of small molecule binding sites that can accommodate proteins without
-
Milestones in chemoinformatics: global view of the field J. Cheminfom. (IF 7.1) Pub Date : 2024-11-05 Jürgen Bajorath
Over the past ~25 years, chemoinformatics has evolved as a scientific discipline, with a strong foundation in pharmaceutical research and scientific roots that can be traced back to the late 1950s. It covers a wide methodological spectrum and is perhaps best positioned in the greater context of chemical information science. Herein, the chemoinformatics discipline is delineated, characteristic (and
-
StreaMD: the toolkit for high-throughput molecular dynamics simulations J. Cheminfom. (IF 7.1) Pub Date : 2024-11-05 Aleksandra Ivanova, Olena Mokshyna, Pavel Polishchuk
Molecular dynamics simulations serve as a prevalent approach for investigating the dynamic behaviour of proteins and protein–ligand complexes. Due to its versatility and speed, GROMACS stands out as a commonly utilized software platform for executing molecular dynamics simulations. However, its effective utilization requires substantial expertise in configuring, executing, and interpreting molecular
-
Quantitative structure–activity relationships of chemical bioactivity toward proteins associated with molecular initiating events of organ-specific toxicity J. Cheminfom. (IF 7.1) Pub Date : 2024-11-05 Domenico Gadaleta, Marina Garcia de Lomana, Eva Serrano-Candelas, Rita Ortega-Vallbona, Rafael Gozalbes, Alessandra Roncaglioni, Emilio Benfenati
The adverse outcome pathway (AOP) concept has gained attention as a way to explore the mechanism of chemical toxicity. In this study, quantitative structure–activity relationship (QSAR) models were developed to predict compound activity toward protein targets relevant to molecular initiating events (MIE) upstream of organ-specific toxicities, namely liver steatosis, cholestasis, nephrotoxicity, neural
-
Accurate prediction of protein–ligand interactions by combining physical energy functions and graph-neural networks J. Cheminfom. (IF 7.1) Pub Date : 2024-11-04 Yiyu Hong, Junsu Ha, Jaemin Sim, Chae Jo Lim, Kwang-Seok Oh, Ramakrishnan Chandrasekaran, Bomin Kim, Jieun Choi, Junsu Ko, Woong-Hee Shin, Juyong Lee
We introduce an advanced model for predicting protein–ligand interactions. Our approach combines the strengths of graph neural networks with physics-based scoring methods. Existing structure-based machine-learning models for protein–ligand binding prediction often fall short in practical virtual screening scenarios, hindered by the intricacies of binding poses, the chemical diversity of drug-like molecules
-
Searching chemical databases in the pre-history of cheminformatics J. Cheminfom. (IF 7.1) Pub Date : 2024-11-04 Peter Willett
This article highlights research from the last century that has provided the basis for the searching techniques that are used in present-day cheminformatics systems, and thus provides an acknowledgement of the contributions made by early pioneers in the field.
-
GTransCYPs: an improved graph transformer neural network with attention pooling for reliably predicting CYP450 inhibitors J. Cheminfom. (IF 7.1) Pub Date : 2024-10-29 Candra Zonyfar, Soualihou Ngnamsie Njimbouom, Sophia Mosalla, Jeong-Dong Kim
State-of-the-art medical studies have shown that predicting CYP450 enzyme inhibitors is beneficial in the early stages of drug discovery. However, developing accurate machine learning (ML)-based in silico methods for predicting CYP450 inhibitors remains challenging. Here, we introduce GTransCYPs, an improved graph neural network (GNN) with a transformer mechanism for predicting CYP450 inhibitors. This model significantly
-
A comprehensive comparison of deep learning-based compound-target interaction prediction models to unveil guiding design principles J. Cheminfom. (IF 7.1) Pub Date : 2024-10-28 Sina Abdollahi, Darius P. Schaub, Madalena Barroso, Nora C. Laubach, Wiebke Hutwelker, Ulf Panzer, Søren W. Gersting, Stefan Bonn
The evaluation of compound-target interactions (CTIs) is at the heart of drug discovery efforts. Given the substantial time and monetary costs of classical experimental screening, significant efforts have been dedicated to develop deep learning-based models that can accurately predict CTIs. A comprehensive comparison of these models on a large, curated CTI dataset is, however, still lacking. Here,
-
Towards the prediction of drug solubility in binary solvent mixtures at various temperatures using machine learning J. Cheminfom. (IF 7.1) Pub Date : 2024-10-28 Zeqing Bao, Gary Tom, Austin Cheng, Jeffrey Watchorn, Alán Aspuru-Guzik, Christine Allen
Drug solubility is an important parameter in the drug development process, yet it is often tedious and challenging to measure, especially for expensive drugs or those available in small quantities. To alleviate these challenges, machine learning (ML) has been applied to predict drug solubility as an alternative approach. However, the majority of existing ML research has focused on the predictions of
-
MEF-AlloSite: an accurate and robust Multimodel Ensemble Feature selection for the Allosteric Site identification model J. Cheminfom. (IF 7.1) Pub Date : 2024-10-23 Sadettin Y. Ugurlu, David McDonald, Shan He
A crucial mechanism for controlling the actions of proteins is allostery. Allosteric modulators have the potential to provide many benefits compared to orthosteric ligands, such as increased selectivity and saturability of their effect. The identification of new allosteric sites presents prospects for the creation of innovative medications and enhances our comprehension of fundamental biological mechanisms
-
Graph neural processes for molecules: an evaluation on docking scores and strategies to improve generalization J. Cheminfom. (IF 7.1) Pub Date : 2024-10-23 Miguel García-Ortegón, Srijit Seal, Carl Rasmussen, Andreas Bender, Sergio Bacallado
Neural processes (NPs) are models for meta-learning which output uncertainty estimates. So far, most studies of NPs have focused on low-dimensional datasets of highly-correlated tasks. While these homogeneous datasets are useful for benchmarking, they may not be representative of realistic transfer learning. In particular, applications in scientific research may prove especially challenging due to
-
Large-scale annotation of biochemically relevant pockets and tunnels in cognate enzyme–ligand complexes J. Cheminfom. (IF 7.1) Pub Date : 2024-10-15 O. Vavra, J. Tyzack, F. Haddadi, J. Stourac, J. Damborsky, S. Mazurenko, J. M. Thornton, D. Bednar
Tunnels in enzymes with buried active sites are key structural features allowing the entry of substrates and the release of products, thus contributing to the catalytic efficiency. Targeting the bottlenecks of protein tunnels is also a powerful protein engineering strategy. However, the identification of functional tunnels in multiple protein structures is a non-trivial task that can only be addressed
-
Insights into predicting small molecule retention times in liquid chromatography using deep learning J. Cheminfom. (IF 7.1) Pub Date : 2024-10-07 Yuting Liu, Akiyasu C. Yoshizawa, Yiwei Ling, Shujiro Okuda
In untargeted metabolomics, structures of small molecules are annotated using liquid chromatography-mass spectrometry by leveraging information from the molecular retention time (RT) in the chromatogram and m/z (formerly called "mass-to-charge ratio") in the mass spectrum. However, correct identification of metabolites is challenging due to the vast array of small molecules. Therefore, various in
-
Data mining of PubChem bioassay records reveals diverse OXPHOS inhibitory chemotypes as potential therapeutic agents against ovarian cancer J. Cheminfom. (IF 7.1) Pub Date : 2024-10-07 Sejal Sharma, Liping Feng, Nicha Boonpattrawong, Arvinder Kapur, Lisa Barroilhet, Manish S. Patankar, Spencer S. Ericksen
Focused screening on target-prioritized compound sets can be an efficient alternative to high throughput screening (HTS). For most biomolecular targets, compound prioritization models depend on prior screening data or a target structure. For phenotypic or multi-protein pathway targets, it may not be clear which public assay records provide relevant data. The question also arises as to whether data
-
Bitter peptide prediction using graph neural networks J. Cheminfom. (IF 7.1) Pub Date : 2024-10-07 Prashant Srivastava, Alexandra Steuer, Francesco Ferri, Alessandro Nicoli, Kristian Schultz, Saptarshi Bej, Antonella Di Pizio, Olaf Wolkenhauer
Bitter taste is an unpleasant taste modality that affects food consumption. Bitter peptides are generated during enzymatic processes that produce functional, bioactive protein hydrolysates or during the aging process of fermented products such as cheese, soybean protein, and wine. Understanding the underlying peptide sequences responsible for bitter taste can pave the way for more efficient identification
-
A multi-view feature representation for predicting drugs combination synergy based on ensemble and multi-task attention models J. Cheminfom. (IF 7.1) Pub Date : 2024-09-27 Samar Monem, Aboul Ella Hassanien, Alaa H. Abdel-Hamid
This paper proposes a novel multi-view ensemble predictor model that is designed to address the challenge of determining synergistic drug combinations by predicting both the synergy score values and the synergy class labels of drug combinations with cancer cell lines. The proposed methodology involves representing drug features through four distinct views: Simplified Molecular-Input Line-Entry System
-
Combining graph neural networks and transformers for few-shot nuclear receptor binding activity prediction J. Cheminfom. (IF 7.1) Pub Date : 2024-09-27 Luis H. M. Torres, Joel P. Arrais, Bernardete Ribeiro
Nuclear receptors (NRs) play a crucial role as biological targets in drug discovery. However, determining which compounds can act as endocrine disruptors and modulate the function of NRs with a reduced amount of candidate drugs is a challenging task. Moreover, the computational methods for NR-binding activity prediction mostly focus on a single receptor at a time, which may limit their effectiveness
-
Computer-aided pattern scoring (C@PS): a novel cheminformatic workflow to predict ligands with rare modes-of-action J. Cheminfom. (IF 7.1) Pub Date : 2024-09-23 Sven Marcel Stefan, Katja Stefan, Vigneshwaran Namasivayam
The identification, establishment, and exploration of potential pharmacological drug targets are major steps of the drug development pipeline. Target validation requires diverse chemical tools that come with a spectrum of functionality, e.g., inhibitors, activators, and other modulators. Particularly tools with rare modes-of-action allow for a proper kinetic and functional characterization of the targets-of-interest
-
EC-Conf: A ultra-fast diffusion model for molecular conformation generation with equivariant consistency J. Cheminfom. (IF 7.1) Pub Date : 2024-09-03 Zhiguang Fan, Yuedong Yang, Mingyuan Xu, Hongming Chen
Despite recent advances in 3D molecular conformation generation driven by diffusion models, the high computational cost of the iterative diffusion/denoising process limits their application. Here, an equivariant consistency model (EC-Conf) was proposed as a fast diffusion method for low-energy conformation generation. In EC-Conf, a modified SE(3)-equivariant transformer model was directly used to encode
-
RAIChU: automating the visualisation of natural product biosynthesis J. Cheminfom. (IF 7.1) Pub Date : 2024-09-03 Barbara R. Terlouw, Friederike Biermann, Sophie P. J. M. Vromans, Elham Zamani, Eric J. N. Helfrich, Marnix H. Medema
Natural products are molecules that fulfil a range of important ecological functions. Many natural products have been exploited for pharmaceutical and agricultural applications. In contrast to many other specialised metabolites, the products of modular nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) systems can often (partially) be predicted from the DNA sequence of the biosynthetic
-
Evaluating the generalizability of graph neural networks for predicting collision cross section J. Cheminfom. (IF 7.1) Pub Date : 2024-08-29 Chloe Engler Hart, António José Preto, Shaurya Chanana, David Healey, Tobias Kind, Daniel Domingo-Fernández
Ion Mobility coupled with Mass Spectrometry (IM-MS) is a promising analytical technique that enhances molecular characterization by measuring collision cross-section (CCS) values, which are indicative of the molecular size and shape. However, the effective application of CCS values in structural analysis is still constrained by the limited availability of experimental data, necessitating the development
-
BuildAMol: a versatile Python toolkit for fragment-based molecular design J. Cheminfom. (IF 7.1) Pub Date : 2024-08-25 Noah Kleinschmidt, Thomas Lemmin
In recent years computational methods for molecular modeling have become a prime focus of computational biology and cheminformatics. Many dedicated systems exist for modeling specific classes of molecules such as proteins or small drug-like ligands. These are often heavily tailored toward the automated generation of molecular structures based on some meta-input by the user and are not intended for
-
Deep learning of multimodal networks with topological regularization for drug repositioning J. Cheminfom. (IF 7.1) Pub Date : 2024-08-23 Yuto Ohnuki, Manato Akiyama, Yasubumi Sakakibara
Computational techniques for drug-disease prediction are essential in enhancing drug discovery and repositioning. While many methods utilize multimodal networks from various biological databases, few integrate comprehensive multi-omics data, including transcriptomes, proteomes, and metabolomes. We introduce STRGNN, a novel graph deep learning approach that predicts drug-disease relationships using
-
Automatic molecular fragmentation by evolutionary optimisation J. Cheminfom. (IF 7.1) Pub Date : 2024-08-19 Fiona C. Y. Yu, Jorge L. Gálvez Vallejo, Giuseppe M. J. Barca
Molecular fragmentation is an effective suite of approaches to reduce the formal computational complexity of quantum chemistry calculations while enhancing their algorithmic parallelisability. However, the practical applicability of fragmentation techniques remains hindered by a dearth of automation and effective metrics to assess the quality of a fragmentation scheme. In this article, we present the
-
Democratizing cheminformatics: interpretable chemical grouping using an automated KNIME workflow J. Cheminfom. (IF 7.1) Pub Date : 2024-08-16 José T. Moreira-Filho, Dhruv Ranganath, Mike Conway, Charles Schmitt, Nicole Kleinstreuer, Kamel Mansouri
With the increased availability of chemical data in public databases, innovative techniques and algorithms have emerged for the analysis, exploration, visualization, and extraction of information from these data. One such technique is chemical grouping, where chemicals with common characteristics are categorized into distinct groups based on physicochemical properties, use, biological activity, or
-
Metis: a python-based user interface to collect expert feedback for generative chemistry models J. Cheminfom. (IF 7.1) Pub Date : 2024-08-14 Janosch Menke, Yasmine Nahal, Esben Jannik Bjerrum, Mikhail Kabeshov, Samuel Kaski, Ola Engkvist
One challenge that current de novo drug design models face is a disparity between the user’s expectations and the actual output of the model in practical applications. Tailoring models to better align with chemists’ implicit knowledge, expectations and preferences is key to overcoming this obstacle effectively. While interest in preference-based and human-in-the-loop machine learning in chemistry is
-
Geometric deep learning for molecular property predictions with chemical accuracy across chemical space J. Cheminfom. (IF 7.1) Pub Date : 2024-08-13 Maarten R. Dobbelaere, István Lengyel, Christian V. Stevens, Kevin M. Van Geem
Chemical engineers heavily rely on precise knowledge of physicochemical properties to model chemical processes. Despite the growing popularity of deep learning, it is only rarely applied for property prediction due to data scarcity and limited accuracy for compounds in industrially-relevant areas of the chemical space. Herein, we present a geometric deep learning framework for predicting gas- and liquid-phase
-
MolCompass: multi-tool for the navigation in chemical space and visual validation of QSAR/QSPR models J. Cheminfom. (IF 7.1) Pub Date : 2024-08-12 Sergey Sosnin
The exponential growth of data is challenging for humans because their ability to analyze data is limited. Especially in chemistry, there is a demand for tools that can visualize molecular datasets in a convenient graphical way. We propose a new, ready-to-use, multi-tool, and open-source framework for visualizing and navigating chemical space. This framework adheres to the low-code/no-code (LCNC) paradigm
-
Building shape-focused pharmacophore models for effective docking screening J. Cheminfom. (IF 7.1) Pub Date : 2024-08-09 Paola Moyano-Gómez, Jukka V. Lehtonen, Olli T. Pentikäinen, Pekka A. Postila
The performance of molecular docking can be improved by comparing the shape similarity of the flexibly sampled poses against the target proteins’ inverted binding cavities. The effectiveness of these pseudo-ligands or negative image-based models in docking rescoring is boosted further by performing enrichment-driven optimization. Here, we introduce a novel shape-focused pharmacophore modeling algorithm
-
An automated calculation pipeline for differential pair interaction energies with molecular force fields using the Tinker Molecular Modeling Package J. Cheminfom. (IF 7.1) Pub Date : 2024-08-08 Felix Bänsch, Mirco Daniel, Harald Lanig, Christoph Steinbeck, Achim Zielesny
An automated pipeline for comprehensive calculation of intermolecular interaction energies based on molecular force-fields using the Tinker molecular modelling package is presented. Starting with non-optimized chemically intuitive monomer structures, the pipeline allows the approximation of global minimum energy monomers and dimers, configuration sampling for various monomer–monomer distances, estimation
-
Evaluation of reinforcement learning in transformer-based molecular design J. Cheminfom. (IF 7.1) Pub Date : 2024-08-08 Jiazhen He, Alessandro Tibo, Jon Paul Janet, Eva Nittinger, Christian Tyrchan, Werngard Czechtizky, Ola Engkvist
Designing compounds with a range of desirable properties is a fundamental challenge in drug discovery. In pre-clinical early drug discovery, novel compounds are often designed based on an already existing promising starting compound through structural modifications for further property optimization. Recently, transformer-based deep learning models have been explored for the task of molecular optimization
-
Hamiltonian diversity: effectively measuring molecular diversity by shortest Hamiltonian circuits J. Cheminfom. (IF 7.1) Pub Date : 2024-08-07 Xiuyuan Hu, Guoqing Liu, Quanming Yao, Yang Zhao, Hao Zhang
In recent years, significant advancements have been made in molecular generation algorithms aimed at facilitating drug development, and molecular diversity holds paramount importance within the realm of molecular generation. Nonetheless, the effective quantification of molecular diversity remains an elusive challenge, as extant metrics exemplified by Richness and Internal Diversity fall short in concurrently
-
Advancements in biotransformation pathway prediction: enhancements, datasets, and novel functionalities in enviPath J. Cheminfom. (IF 7.1) Pub Date : 2024-08-06 Jasmin Hafner, Tim Lorsbach, Sebastian Schmidt, Liam Brydon, Katharina Dost, Kunyang Zhang, Kathrin Fenner, Jörg Wicker
enviPath is a widely used database and prediction system for microbial biotransformation pathways of primarily xenobiotic compounds. The data and prediction system are freely available both via a web interface and a public REST API. Since its initial release in 2016, we have extended the data available in enviPath and improved the performance of the prediction system and the usability of the overall system. We
-
PETA: evaluating the impact of protein transfer learning with sub-word tokenization on downstream applications J. Cheminfom. (IF 7.1) Pub Date : 2024-08-02 Yang Tan, Mingchen Li, Ziyi Zhou, Pan Tan, Huiqun Yu, Guisheng Fan, Liang Hong
Protein language models (PLMs) play a dominant role in protein representation learning. Most existing PLMs regard proteins as sequences of 20 natural amino acids. The problem with this representation method is that it simply divides the protein sequence into sequences of individual amino acids, ignoring the fact that certain residues often occur together. Therefore, it is inappropriate to view amino
-
A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example J. Cheminfom. (IF 7.1) Pub Date : 2024-08-02 Run-Hsin Lin, Pinpin Lin, Chia-Chi Wang, Chun-Wei Tung
Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets
-
Implementation of a soft grading system for chemistry in a Moodle plugin: reaction handling J. Cheminfom. (IF 7.1) Pub Date : 2024-08-01 Louis Plyer, Gilles Marcou, Céline Perves, Fanny Bonachera, Alexander Varnek
Here, we present a new method for evaluating questions on chemical reactions in the context of remote education. This method can be used when binary grading is not sufficient as some tolerance may be acceptable. In order to determine a grade, the developed workflow uses the pairwise similarity assessment of two considered reactions, each encoded by a single molecular graph with the help of the Condensed
-
Transfer learning across different chemical domains: virtual screening of organic materials with deep learning models pretrained on small molecule and chemical reaction data J. Cheminfom. (IF 7.1) Pub Date : 2024-07-30 Chengwei Zhang, Yushuang Zhai, Ziyang Gong, Hongliang Duan, Yuan-Bin She, Yun-Fang Yang, An Su
Machine learning is becoming a preferred method for the virtual screening of organic materials due to its cost-effectiveness over traditional computationally demanding techniques. However, the scarcity of labeled data for organic materials poses a significant challenge for training advanced machine learning models. This study showcases the potential of utilizing databases of drug-like small molecules
-
Reproducible MS/MS library cleaning pipeline in matchms J. Cheminfom. (IF 7.1) Pub Date : 2024-07-29 Niek F. de Jonge, Helge Hecht, Michael Strobel, Mingxun Wang, Justin J. J. van der Hooft, Florian Huber
Mass spectral libraries have proven to be essential for mass spectrum annotation, both for library matching and training new machine learning algorithms. A key step in training machine learning models is the availability of high-quality training data. Public libraries of mass spectrometry data that are open to user submission often suffer from limited metadata curation and harmonization. The resulting
-
Hilbert-curve assisted structure embedding method J. Cheminfom. (IF 7.1) Pub Date : 2024-07-29 Gergely Zahoránszky-Kőhalmi, Kanny K. Wan, Alexander G. Godfrey
Chemical space embedding methods are widely utilized in various research settings for dimensional reduction, clustering and effective visualization. The maps generated by the embedding process can provide valuable insight to medicinal chemists in terms of the relationships between structural, physicochemical and biological properties of compounds. However, these maps are known to be difficult to interpret
-
A computational workflow for analysis of missense mutations in precision oncology J. Cheminfom. (IF 7.1) Pub Date : 2024-07-29 Rayyan Tariq Khan, Petra Pokorna, Jan Stourac, Simeon Borko, Ihor Arefiev, Joan Planas-Iglesias, Adam Dobias, Gaspar Pinto, Veronika Szotkowska, Jaroslav Sterba, Ondrej Slaby, Jiri Damborsky, Stanislav Mazurenko, David Bednar
Every year, more than 19 million cancer cases are diagnosed, and this number continues to increase annually. Since standard treatment options have varying success rates for different types of cancer, understanding the biology of an individual's tumour becomes crucial, especially for cases that are difficult to treat. Personalised high-throughput profiling, using next-generation sequencing, allows for
-
Enhancing molecular property prediction with auxiliary learning and task-specific adaptation J. Cheminfom. (IF 7.1) Pub Date : 2024-07-24 Vishal Dey, Xia Ning
Pretrained Graph Neural Networks have been widely adopted for various molecular property prediction tasks. Despite their ability to encode structural and relational features of molecules, traditional fine-tuning of such pretrained GNNs on the target task can lead to poor generalization. To address this, we explore the adaptation of pretrained GNNs to the target task by jointly training them with multiple
-
CACTI: an in silico chemical analysis tool through the integration of chemogenomic data and clustering analysis J. Cheminfom. (IF 7.1) Pub Date : 2024-07-24 Karla P. Godinez-Macias, Elizabeth A. Winzeler
It is well-accepted that knowledge of a small molecule’s target can accelerate optimization. Although chemogenomic databases are helpful resources for predicting or finding compound interaction partners, they tend to be limited and poorly annotated. Furthermore, unlike genes, compound identifiers are often not standardized, and many synonyms may exist, especially in the biological literature, making
-
Estimating the synthetic accessibility of molecules with building block and reaction-aware SAScore J. Cheminfom. (IF 7.1) Pub Date : 2024-07-23 Shuan Chen, Yousung Jung
Synthetic accessibility prediction is a task to estimate how easily a given molecule might be synthesizable in the laboratory, playing a crucial role in computer-aided molecular design. Although synthesis planning programs can determine synthesis routes, their slow processing times make them impractical for large-scale molecule screening. On the other hand, existing rapid synthesis accessibility estimation
-
Reaction rebalancing: a novel approach to curating reaction databases J. Cheminfom. (IF 7.1) Pub Date : 2024-07-19 Tieu-Long Phan, Klaus Weinbauer, Thomas Gärtner, Daniel Merkle, Jakob L. Andersen, Rolf Fagerberg, Peter F. Stadler
Reaction databases are a key resource for a wide variety of applications in computational chemistry and biochemistry, including Computer-aided Synthesis Planning (CASP) and the large-scale analysis of metabolic networks. The full potential of these resources can only be realized if datasets are accurate and complete. Missing co-reactants and co-products, i.e., unbalanced reactions, however, are the
-
piscesCSM: prediction of anticancer synergistic drug combinations J. Cheminfom. (IF 7.1) Pub Date : 2024-07-19 Raghad AlJarf, Carlos H. M. Rodrigues, Yoochan Myung, Douglas E. V. Pires, David B. Ascher
While drug combination therapies are of great importance, particularly in cancer treatment, identifying novel synergistic drug combinations has been a challenging venture. Computational methods have emerged in this context as a promising tool for prioritizing drug combinations for further evaluation, though they have presented limited performance, utility, and interpretability. Here, we propose a novel
-
Ualign: pushing the limit of template-free retrosynthesis prediction with unsupervised SMILES alignment J. Cheminfom. (IF 7.1) Pub Date : 2024-07-15 Kaipeng Zeng, Bo Yang, Xin Zhao, Yu Zhang, Fan Nie, Xiaokang Yang, Yaohui Jin, Yanyan Xu
Retrosynthesis planning poses a formidable challenge in the organic chemical industry, particularly in pharmaceuticals. Single-step retrosynthesis prediction, a crucial step in the planning process, has witnessed a surge in interest in recent years due to advancements in AI for science. Various deep learning-based methods have been proposed for this task in recent years, incorporating diverse levels
-
LVPocket: integrated 3D global-local information to protein binding pockets prediction with transfer learning of protein structure classification J. Cheminfom. (IF 7.1) Pub Date : 2024-07-07 Ruifeng Zhou, Jing Fan, Sishu Li, Wenjie Zeng, Yilun Chen, Xiaoshan Zheng, Hongyang Chen, Jun Liao
Previous deep learning methods for predicting protein binding pockets mainly employed 3D convolution, yet an abundance of convolution operations may lead the model to excessively prioritize local information, thus overlooking global information. Moreover, it is essential for us to account for the influence of diverse protein folding structural classes. Because proteins classified differently structurally
-
Advancements in hand-drawn chemical structure recognition through an enhanced DECIMER architecture J. Cheminfom. (IF 7.1) Pub Date : 2024-07-05 Kohulan Rajan, Henning Otto Brinkhaus, Achim Zielesny, Christoph Steinbeck
Accurate recognition of hand-drawn chemical structures is crucial for digitising hand-written chemical information in traditional laboratory notebooks or facilitating stylus-based structure entry on tablets or smartphones. However, the inherent variability in hand-drawn structures poses challenges for existing Optical Chemical Structure Recognition (OCSR) software. To address this, we present an enhanced
-
PromptSMILES: prompting for scaffold decoration and fragment linking in chemical language models J. Cheminfom. (IF 7.1) Pub Date : 2024-07-04 Morgan Thomas, Mazen Ahmad, Gary Tresadern, Gianni de Fabritiis
SMILES-based generative models are amongst the most robust and successful recent methods used to augment drug design. They are typically used for complete de novo generation; however, scaffold decoration and fragment linking applications are sometimes desirable, which requires a different grammar, architecture, and training dataset, and therefore re-training of a new model. In this work, we describe a simple
-
Application of machine reading comprehension techniques for named entity recognition in materials science J. Cheminfom. (IF 7.1) Pub Date : 2024-07-02 Zihui Huang, Liqiang He, Yuhang Yang, Andi Li, Zhiwen Zhang, Siwei Wu, Yang Wang, Yan He, Xujie Liu
Materials science is an interdisciplinary field that studies the properties, structures, and behaviors of different materials. A large amount of scientific literature contains rich knowledge in the field of materials science, but manually analyzing these papers to find material-related data is a daunting task. In information processing, named entity recognition (NER) plays a crucial role as it can
-
CPSign: conformal prediction for cheminformatics modeling J. Cheminfom. (IF 7.1) Pub Date : 2024-06-28 Staffan Arvidsson McShane, Ulf Norinder, Jonathan Alvarsson, Ernst Ahlberg, Lars Carlsson, Ola Spjuth
Conformal prediction has seen many applications in pharmaceutical science, as it can calibrate the outputs of machine learning models and produce valid prediction intervals. Here we present the open source software CPSign, a complete implementation of conformal prediction for cheminformatics modeling. CPSign implements inductive and transductive conformal prediction for classification and
-
AutoTemplate: enhancing chemical reaction datasets for machine learning applications in organic chemistry J. Cheminfom. (IF 7.1) Pub Date : 2024-06-27 Lung-Yi Chen, Yi-Pei Li
This paper presents AutoTemplate, an innovative data preprocessing protocol, addressing the crucial need for high-quality chemical reaction datasets in the realm of machine learning applications in organic chemistry. Recent advances in artificial intelligence have expanded the application of machine learning in chemistry, particularly in yield prediction, retrosynthesis, and reaction condition prediction
-
Llamol: a dynamic multi-conditional generative transformer for de novo molecular design J. Cheminfom. (IF 7.1) Pub Date : 2024-06-21 Niklas Dobberstein, Astrid Maass, Jan Hamaekers
Generative models have demonstrated substantial promise in Natural Language Processing (NLP) and have found application in designing molecules, as seen in Generative Pretrained Transformer (GPT) models. In our efforts to develop such a tool for exploring the organic chemical space in search of potentially electro-active compounds, we present Llamol, a single novel generative transformer model based on
-
Physicochemical modelling of the retention mechanism of temperature-responsive polymeric columns for HPLC through machine learning algorithms J. Cheminfom. (IF 7.1) Pub Date : 2024-06-21 Elena Bandini, Rodrigo Castellano Ontiveros, Ardiana Kajtazi, Hamed Eghbali, Frédéric Lynen
Temperature-responsive liquid chromatography (TRLC) offers a promising alternative to reversed-phase liquid chromatography (RPLC) for environmentally friendly analytical techniques by utilizing pure water as a mobile phase, eliminating the need for harmful organic solvents. TRLC columns, packed with temperature-responsive polymers coupled to silica particles, exhibit a unique retention mechanism influenced
-
A BERT-based pretraining model for extracting molecular structural information from a SMILES sequence J. Cheminfom. (IF 7.1) Pub Date : 2024-06-19 Xiaofan Zheng, Yoichi Tomiura
Obtaining the desired molecular properties, and combinations of them, through theory or experiment is a costly process. Using machine learning to analyze molecular structure features and to predict molecular properties is a potentially efficient alternative for accelerating the prediction of molecular properties. In this study, we analyze molecular properties
-
Stereochemically-aware bioactivity descriptors for uncharacterized chemical compounds J. Cheminfom. (IF 7.1) Pub Date : 2024-06-18 Arnau Comajuncosa-Creus, Aksel Lenes, Miguel Sánchez-Palomino, Dylan Dalton, Patrick Aloy
Stereochemistry plays a fundamental role in pharmacology. Here, we systematically investigate the relationship between stereoisomerism and bioactivity on over 1 M compounds, finding that a very significant fraction (~40%) of spatial isomer pairs show, to some extent, distinct bioactivities. We then use the 3D representation of these molecules to train a collection of deep neural networks (Signaturizers3D)