-
Interface-aware molecular generative framework for protein–protein interaction modulators J. Cheminfom. (IF 7.1) Pub Date : 2024-12-20 Jianmin Wang, Jiashun Mao, Chunyan Li, Hongxin Xiang, Xun Wang, Shuang Wang, Zixu Wang, Yangyang Chen, Yuquan Li, Kyoung Tai No, Tao Song, Xiangxiang Zeng
Protein–protein interactions (PPIs) play a crucial role in numerous biochemical and biological processes. Although several structure-based molecular generative models have been developed, PPI interfaces and compounds targeting PPIs exhibit distinct physicochemical properties compared to traditional binding pockets and small-molecule drugs. As a result, generating compounds that effectively target PPIs
-
MolNexTR: a generalized deep learning model for molecular image recognition J. Cheminfom. (IF 7.1) Pub Date : 2024-12-18 Yufan Chen, Ching Ting Leung, Yong Huang, Jianwei Sun, Hao Chen, Hanyu Gao
In the field of chemical structure recognition, the task of converting molecular images into machine-readable data formats such as SMILES strings stands as a significant challenge, primarily due to the varied drawing styles and conventions prevalent in chemical literature. To bridge this gap, we proposed MolNexTR, a novel image-to-graph deep learning model that collaborates to fuse the strengths of
-
FlavorMiner: a machine learning platform for extracting molecular flavor profiles from structural data J. Cheminfom. (IF 7.1) Pub Date : 2024-12-10 Fabio Herrera-Rocha, Miguel Fernández-Niño, Jorge Duitama, Mónica P. Cala, María José Chica, Ludger A. Wessjohann, Mehdi D. Davari, Andrés Fernando González Barrios
Flavor is the main factor driving consumer acceptance of food products. However, tracking the biochemistry of flavor is a formidable challenge due to the complexity of food composition. Current methodologies for linking individual molecules to flavor in foods and beverages are expensive and time-consuming. Predictive models based on machine learning (ML) are emerging as an alternative to speed up
-
Be aware of overfitting by hyperparameter optimization! J. Cheminfom. (IF 7.1) Pub Date : 2024-12-09 Igor V. Tetko, Ruud van Deursen, Guillaume Godin
Hyperparameter optimization is very frequently employed in machine learning. However, an optimization of a large space of parameters could result in overfitting of models. In recent studies on solubility prediction the authors collected seven thermodynamic and kinetic solubility datasets from different data sources. They used state-of-the-art graph-based methods and compared models developed for each
-
Human-in-the-loop active learning for goal-oriented molecule generation J. Cheminfom. (IF 7.1) Pub Date : 2024-12-09 Yasmine Nahal, Janosch Menke, Julien Martinelli, Markus Heinonen, Mikhail Kabeshov, Jon Paul Janet, Eva Nittinger, Ola Engkvist, Samuel Kaski
Machine learning (ML) systems have enabled the modelling of quantitative structure–property relationships (QSPR) and structure–activity relationships (QSAR) using existing experimental data to predict target properties for new molecules. These property predictors hold significant potential in accelerating drug discovery by guiding generative artificial intelligence (AI) agents to explore desired chemical
-
CSearch: chemical space search via virtual synthesis and global optimization J. Cheminfom. (IF 7.1) Pub Date : 2024-12-05 Hakjean Kim, Seongok Ryu, Nuri Jung, Jinsol Yang, Chaok Seok
The two key components of computational molecular design are virtually generating molecules and predicting the properties of these generated molecules. This study focuses on an effective method for molecular generation through virtual synthesis and global optimization of a given objective function. Using a pre-trained graph neural network (GNN) objective function to approximate the docking energies
-
DeepMol: an automated machine and deep learning framework for computational chemistry J. Cheminfom. (IF 7.1) Pub Date : 2024-12-05 João Correia, João Capela, Miguel Rocha
The domain of computational chemistry has experienced a significant evolution due to the introduction of Machine Learning (ML) technologies. Despite its potential to revolutionize the field, researchers are often encumbered by obstacles, such as the complexity of selecting optimal algorithms, the automation of data pre-processing steps, the necessity for adaptive feature engineering, and the assurance
-
Sort & Slice: a simple and superior alternative to hash-based folding for extended-connectivity fingerprints J. Cheminfom. (IF 7.1) Pub Date : 2024-12-03 Markus Dablander, Thierry Hanser, Renaud Lambiotte, Garrett M. Morris
Extended-connectivity fingerprints (ECFPs) are a ubiquitous tool in current cheminformatics and molecular machine learning, and one of the most prevalent molecular feature extraction techniques used for chemical prediction. Atom features learned by graph neural networks can be aggregated to compound-level representations using a large spectrum of graph pooling methods. In contrast, sets of detected
-
cidalsDB: an AI-empowered platform for anti-pathogen therapeutics research J. Cheminfom. (IF 7.1) Pub Date : 2024-11-28 Emna Harigua-Souiai, Ons Masmoudi, Samer Makni, Rafeh Oualha, Yosser Z. Abdelkrim, Sara Hamdi, Oussama Souiai, Ikram Guizani
Computer-aided drug discovery (CADD) is nurtured by recent advances in big data analytics and Artificial Intelligence (AI) towards enhanced drug discovery (DD) outcomes. In this context, reliable datasets are of utmost importance. We herein present CidalsDB, a novel web server for AI-assisted DD against infectious pathogens, namely Leishmania parasites and Coronaviruses. We performed a literature search
-
Group graph: a molecular graph representation with enhanced performance, efficiency and interpretability J. Cheminfom. (IF 7.1) Pub Date : 2024-11-28 Piao-Yang Cao, Yang He, Ming-Yang Cui, Xiao-Min Zhang, Qingye Zhang, Hong-Yu Zhang
The exploration of chemical space holds promise for developing influential chemical entities. Molecular representations, which reflect features of molecular structure in silico, assist in navigating chemical space appropriately. Unlike atom-level molecular representations, such as SMILES and atom graph, which can sometimes lead to confusing interpretations about chemical substructures, substructure-level
-
GT-NMR: a novel graph transformer-based approach for accurate prediction of NMR chemical shifts J. Cheminfom. (IF 7.1) Pub Date : 2024-11-26 Haochen Chen, Tao Liang, Kai Tan, Anan Wu, Xin Lu
In this work, inspired by the graph transformer, we presented an improved protocol, termed GT-NMR, which integrates 2D molecular graph representation with Transformer architecture, for accurate yet efficient prediction of NMR chemical shifts. The effectiveness of the GT-NMR was thoroughly examined with the standard nmrshiftdb2 dataset, 37 natural products and structural elucidation of 11 pairs of natural
-
Suitability of large language models for extraction of high-quality chemical reaction dataset from patent literature J. Cheminfom. (IF 7.1) Pub Date : 2024-11-26 Sarveswara Rao Vangala, Sowmya Ramaswamy Krishnan, Navneet Bung, Dhandapani Nandagopal, Gomathi Ramasamy, Satyam Kumar, Sridharan Sankaran, Rajgopal Srinivasan, Arijit Roy
With the advent of artificial intelligence (AI), it is now possible to design diverse and novel molecules from previously unexplored chemical space. However, a challenge for chemists is the synthesis of such molecules. Recently, there have been attempts to develop AI models for retrosynthesis prediction, which rely on the availability of a high-quality training dataset. In this work, we explore the
-
Molecular identification via molecular fingerprint extraction from atomic force microscopy images J. Cheminfom. (IF 7.1) Pub Date : 2024-11-25 Manuel González Lastre, Pablo Pou, Miguel Wiche, Daniel Ebeling, Andre Schirmeisen, Rubén Pérez
Non-contact Atomic Force Microscopy with CO-functionalized metal tips (referred to as HR-AFM) provides access to the internal structure of individual molecules adsorbed on a surface with totally unprecedented resolution. Previous works have shown that deep learning (DL) models can retrieve the chemical and structural information encoded in a 3D stack of constant-height HR-AFM images, leading to molecular
-
A systematic review of deep learning chemical language models in recent era J. Cheminfom. (IF 7.1) Pub Date : 2024-11-18 Hector Flores-Hernandez, Emmanuel Martinez-Ledesma
Discovering new chemical compounds with specific properties can provide advantages for fields that rely on materials for their development, although this task comes at a high cost in terms of complexity and resources. Since the beginning of the data age, deep learning techniques have revolutionized the process of designing molecules by analyzing and learning from representations of molecular data,
-
QSPRpred: a Flexible Open-Source Quantitative Structure-Property Relationship Modelling Tool J. Cheminfom. (IF 7.1) Pub Date : 2024-11-14 Helle W. van den Maagdenberg, Martin Šícho, David Alencar Araripe, Sohvi Luukkonen, Linde Schoenmaker, Michiel Jespers, Olivier J. M. Béquignon, Marina Gorostiola González, Remco L. van den Broek, Andrius Bernatavicius, J. G. Coen van Hasselt, Piet H. van der Graaf, Gerard J. P. van Westen
Building reliable and robust quantitative structure–property relationship (QSPR) models is a challenging task. First, the experimental data needs to be obtained, analyzed and curated. Second, the number of available methods is continuously growing and evaluating different algorithms and methodologies can be arduous. Finally, the last hurdle that researchers face is to ensure the reproducibility of
-
Accelerated hit identification with target evaluation, deep learning and automated labs: prospective validation in IRAK1 J. Cheminfom. (IF 7.1) Pub Date : 2024-11-14 Gintautas Kamuntavičius, Alvaro Prat, Tanya Paquet, Orestis Bastas, Hisham Abdel Aty, Qing Sun, Carsten B. Andersen, John Harman, Marc E. Siladi, Daniel R. Rines, Sarah J. L. Flatters, Roy Tal, Povilas Norvaišas
Target identification and hit identification can be transformed through the application of biomedical knowledge analysis, AI-driven virtual screening and robotic cloud lab systems. However, there are few prospective studies that evaluate the efficacy of such integrated approaches. We synergistically integrate our in-house-developed target evaluation (SpectraView) and deep-learning-driven virtual screening
-
Comparative evaluation of methods for the prediction of protein–ligand binding sites J. Cheminfom. (IF 7.1) Pub Date : 2024-11-11 Javier S. Utgés, Geoffrey J. Barton
The accurate identification of protein–ligand binding sites is of critical importance in understanding and modulating protein function. Accordingly, ligand binding site prediction has remained a research focus for over three decades with over 50 methods developed and a change of paradigm from geometry-based to machine learning. In this work, we collate 13 ligand binding site predictors, spanning 30 years
-
Protein-small molecule binding site prediction based on a pre-trained protein language model with contrastive learning J. Cheminfom. (IF 7.1) Pub Date : 2024-11-06 Jue Wang, Yufan Liu, Boxue Tian
Predicting protein-small molecule binding sites, the initial step in structure-guided drug design, remains challenging for proteins lacking experimentally derived ligand-bound structures. Here, we propose CLAPE-SMB, which integrates a pre-trained protein language model with contrastive learning to provide high accuracy predictions of small molecule binding sites that can accommodate proteins without
-
Milestones in chemoinformatics: global view of the field J. Cheminfom. (IF 7.1) Pub Date : 2024-11-05 Jürgen Bajorath
Over the past ~25 years, chemoinformatics has evolved as a scientific discipline, with a strong foundation in pharmaceutical research and scientific roots that can be traced back to the late 1950s. It covers a wide methodological spectrum and is perhaps best positioned in the greater context of chemical information science. Herein, the chemoinformatics discipline is delineated, characteristic (and
-
StreaMD: the toolkit for high-throughput molecular dynamics simulations J. Cheminfom. (IF 7.1) Pub Date : 2024-11-05 Aleksandra Ivanova, Olena Mokshyna, Pavel Polishchuk
Molecular dynamics simulations serve as a prevalent approach for investigating the dynamic behaviour of proteins and protein–ligand complexes. Due to its versatility and speed, GROMACS stands out as a commonly utilized software platform for executing molecular dynamics simulations. However, its effective utilization requires substantial expertise in configuring, executing, and interpreting molecular
-
Quantitative structure–activity relationships of chemical bioactivity toward proteins associated with molecular initiating events of organ-specific toxicity J. Cheminfom. (IF 7.1) Pub Date : 2024-11-05 Domenico Gadaleta, Marina Garcia de Lomana, Eva Serrano-Candelas, Rita Ortega-Vallbona, Rafael Gozalbes, Alessandra Roncaglioni, Emilio Benfenati
The adverse outcome pathway (AOP) concept has gained attention as a way to explore the mechanism of chemical toxicity. In this study, quantitative structure–activity relationship (QSAR) models were developed to predict compound activity toward protein targets relevant to molecular initiating events (MIE) upstream of organ-specific toxicities, namely liver steatosis, cholestasis, nephrotoxicity, neural
-
Accurate prediction of protein–ligand interactions by combining physical energy functions and graph-neural networks J. Cheminfom. (IF 7.1) Pub Date : 2024-11-04 Yiyu Hong, Junsu Ha, Jaemin Sim, Chae Jo Lim, Kwang-Seok Oh, Ramakrishnan Chandrasekaran, Bomin Kim, Jieun Choi, Junsu Ko, Woong-Hee Shin, Juyong Lee
We introduce an advanced model for predicting protein–ligand interactions. Our approach combines the strengths of graph neural networks with physics-based scoring methods. Existing structure-based machine-learning models for protein–ligand binding prediction often fall short in practical virtual screening scenarios, hindered by the intricacies of binding poses, the chemical diversity of drug-like molecules
-
Searching chemical databases in the pre-history of cheminformatics J. Cheminfom. (IF 7.1) Pub Date : 2024-11-04 Peter Willett
This article highlights research from the last century that has provided the basis for the searching techniques that are used in present-day cheminformatics systems, and thus provides an acknowledgement of the contributions made by early pioneers in the field.
-
GTransCYPs: an improved graph transformer neural network with attention pooling for reliably predicting CYP450 inhibitors J. Cheminfom. (IF 7.1) Pub Date : 2024-10-29 Candra Zonyfar, Soualihou Ngnamsie Njimbouom, Sophia Mosalla, Jeong-Dong Kim
State-of-the-art medical studies proved that predicting CYP450 enzyme inhibitors is beneficial in the early stage of drug discovery. However, developing accurate machine learning (ML)-based in silico methods for predicting CYP450 inhibitors remains challenging. Here, we introduce GTransCYPs, an improved graph neural network (GNN) with a transformer mechanism for predicting CYP450 inhibitors. This model significantly
-
A comprehensive comparison of deep learning-based compound-target interaction prediction models to unveil guiding design principles J. Cheminfom. (IF 7.1) Pub Date : 2024-10-28 Sina Abdollahi, Darius P. Schaub, Madalena Barroso, Nora C. Laubach, Wiebke Hutwelker, Ulf Panzer, Søren W. Gersting, Stefan Bonn
The evaluation of compound-target interactions (CTIs) is at the heart of drug discovery efforts. Given the substantial time and monetary costs of classical experimental screening, significant efforts have been dedicated to developing deep learning-based models that can accurately predict CTIs. A comprehensive comparison of these models on a large, curated CTI dataset is, however, still lacking. Here,
-
Towards the prediction of drug solubility in binary solvent mixtures at various temperatures using machine learning J. Cheminfom. (IF 7.1) Pub Date : 2024-10-28 Zeqing Bao, Gary Tom, Austin Cheng, Jeffrey Watchorn, Alán Aspuru-Guzik, Christine Allen
Drug solubility is an important parameter in the drug development process, yet it is often tedious and challenging to measure, especially for expensive drugs or those available in small quantities. To alleviate these challenges, machine learning (ML) has been applied to predict drug solubility as an alternative approach. However, the majority of existing ML research has focused on the predictions of
-
MEF-AlloSite: an accurate and robust Multimodel Ensemble Feature selection for the Allosteric Site identification model J. Cheminfom. (IF 7.1) Pub Date : 2024-10-23 Sadettin Y. Ugurlu, David McDonald, Shan He
A crucial mechanism for controlling the actions of proteins is allostery. Allosteric modulators have the potential to provide many benefits compared to orthosteric ligands, such as increased selectivity and saturability of their effect. The identification of new allosteric sites presents prospects for the creation of innovative medications and enhances our comprehension of fundamental biological mechanisms
-
Graph neural processes for molecules: an evaluation on docking scores and strategies to improve generalization J. Cheminfom. (IF 7.1) Pub Date : 2024-10-23 Miguel García-Ortegón, Srijit Seal, Carl Rasmussen, Andreas Bender, Sergio Bacallado
Neural processes (NPs) are models for meta-learning which output uncertainty estimates. So far, most studies of NPs have focused on low-dimensional datasets of highly-correlated tasks. While these homogeneous datasets are useful for benchmarking, they may not be representative of realistic transfer learning. In particular, applications in scientific research may prove especially challenging due to
-
Large-scale annotation of biochemically relevant pockets and tunnels in cognate enzyme–ligand complexes J. Cheminfom. (IF 7.1) Pub Date : 2024-10-15 O. Vavra, J. Tyzack, F. Haddadi, J. Stourac, J. Damborsky, S. Mazurenko, J. M. Thornton, D. Bednar
Tunnels in enzymes with buried active sites are key structural features allowing the entry of substrates and the release of products, thus contributing to the catalytic efficiency. Targeting the bottlenecks of protein tunnels is also a powerful protein engineering strategy. However, the identification of functional tunnels in multiple protein structures is a non-trivial task that can only be addressed
-
Insights into predicting small molecule retention times in liquid chromatography using deep learning J. Cheminfom. (IF 7.1) Pub Date : 2024-10-07 Yuting Liu, Akiyasu C. Yoshizawa, Yiwei Ling, Shujiro Okuda
In untargeted metabolomics, structures of small molecules are annotated using liquid chromatography-mass spectrometry by leveraging information from the molecular retention time (RT) in the chromatogram and m/z (formerly called "mass-to-charge ratio") in the mass spectrum. However, correct identification of metabolites is challenging due to the vast array of small molecules. Therefore, various in
-
Data mining of PubChem bioassay records reveals diverse OXPHOS inhibitory chemotypes as potential therapeutic agents against ovarian cancer J. Cheminfom. (IF 7.1) Pub Date : 2024-10-07 Sejal Sharma, Liping Feng, Nicha Boonpattrawong, Arvinder Kapur, Lisa Barroilhet, Manish S. Patankar, Spencer S. Ericksen
Focused screening on target-prioritized compound sets can be an efficient alternative to high throughput screening (HTS). For most biomolecular targets, compound prioritization models depend on prior screening data or a target structure. For phenotypic or multi-protein pathway targets, it may not be clear which public assay records provide relevant data. The question also arises as to whether data
-
Bitter peptide prediction using graph neural networks J. Cheminfom. (IF 7.1) Pub Date : 2024-10-07 Prashant Srivastava, Alexandra Steuer, Francesco Ferri, Alessandro Nicoli, Kristian Schultz, Saptarshi Bej, Antonella Di Pizio, Olaf Wolkenhauer
Bitter taste is an unpleasant taste modality that affects food consumption. Bitter peptides are generated during enzymatic processes that produce functional, bioactive protein hydrolysates or during the aging process of fermented products such as cheese, soybean protein, and wine. Understanding the underlying peptide sequences responsible for bitter taste can pave the way for more efficient identification
-
A multi-view feature representation for predicting drugs combination synergy based on ensemble and multi-task attention models J. Cheminfom. (IF 7.1) Pub Date : 2024-09-27 Samar Monem, Aboul Ella Hassanien, Alaa H. Abdel-Hamid
This paper proposes a novel multi-view ensemble predictor model that is designed to address the challenge of determining synergistic drug combinations by predicting both the synergy score values and the synergy class labels of drug combinations with cancer cell lines. The proposed methodology involves representing drug features through four distinct views: Simplified Molecular-Input Line-Entry System
-
Combining graph neural networks and transformers for few-shot nuclear receptor binding activity prediction J. Cheminfom. (IF 7.1) Pub Date : 2024-09-27 Luis H. M. Torres, Joel P. Arrais, Bernardete Ribeiro
Nuclear receptors (NRs) play a crucial role as biological targets in drug discovery. However, determining which compounds can act as endocrine disruptors and modulate the function of NRs with a reduced amount of candidate drugs is a challenging task. Moreover, the computational methods for NR-binding activity prediction mostly focus on a single receptor at a time, which may limit their effectiveness
-
Computer-aided pattern scoring (C@PS): a novel cheminformatic workflow to predict ligands with rare modes-of-action J. Cheminfom. (IF 7.1) Pub Date : 2024-09-23 Sven Marcel Stefan, Katja Stefan, Vigneshwaran Namasivayam
The identification, establishment, and exploration of potential pharmacological drug targets are major steps of the drug development pipeline. Target validation requires diverse chemical tools that come with a spectrum of functionality, e.g., inhibitors, activators, and other modulators. Particularly tools with rare modes-of-action allow for a proper kinetic and functional characterization of the targets-of-interest
-
EC-Conf: A ultra-fast diffusion model for molecular conformation generation with equivariant consistency J. Cheminfom. (IF 7.1) Pub Date : 2024-09-03 Zhiguang Fan, Yuedong Yang, Mingyuan Xu, Hongming Chen
Despite recent advances in 3D molecule conformation generation driven by diffusion models, the high computational cost of the iterative diffusion/denoising process limits their application. Here, an equivariant consistency model (EC-Conf) was proposed as a fast diffusion method for low-energy conformation generation. In EC-Conf, a modified SE(3)-equivariant transformer model was directly used to encode
-
RAIChU: automating the visualisation of natural product biosynthesis J. Cheminfom. (IF 7.1) Pub Date : 2024-09-03 Barbara R. Terlouw, Friederike Biermann, Sophie P. J. M. Vromans, Elham Zamani, Eric J. N. Helfrich, Marnix H. Medema
Natural products are molecules that fulfil a range of important ecological functions. Many natural products have been exploited for pharmaceutical and agricultural applications. In contrast to many other specialised metabolites, the products of modular nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) systems can often (partially) be predicted from the DNA sequence of the biosynthetic
-
Evaluating the generalizability of graph neural networks for predicting collision cross section J. Cheminfom. (IF 7.1) Pub Date : 2024-08-29 Chloe Engler Hart, António José Preto, Shaurya Chanana, David Healey, Tobias Kind, Daniel Domingo-Fernández
Ion Mobility coupled with Mass Spectrometry (IM-MS) is a promising analytical technique that enhances molecular characterization by measuring collision cross-section (CCS) values, which are indicative of the molecular size and shape. However, the effective application of CCS values in structural analysis is still constrained by the limited availability of experimental data, necessitating the development
-
BuildAMol: a versatile Python toolkit for fragment-based molecular design J. Cheminfom. (IF 7.1) Pub Date : 2024-08-25 Noah Kleinschmidt, Thomas Lemmin
In recent years, computational methods for molecular modeling have become a prime focus of computational biology and cheminformatics. Many dedicated systems exist for modeling specific classes of molecules such as proteins or small drug-like ligands. These are often heavily tailored toward the automated generation of molecular structures based on some meta-input by the user and are not intended for
-
Deep learning of multimodal networks with topological regularization for drug repositioning J. Cheminfom. (IF 7.1) Pub Date : 2024-08-23 Yuto Ohnuki, Manato Akiyama, Yasubumi Sakakibara
Computational techniques for drug-disease prediction are essential in enhancing drug discovery and repositioning. While many methods utilize multimodal networks from various biological databases, few integrate comprehensive multi-omics data, including transcriptomes, proteomes, and metabolomes. We introduce STRGNN, a novel graph deep learning approach that predicts drug-disease relationships using
-
Automatic molecular fragmentation by evolutionary optimisation J. Cheminfom. (IF 7.1) Pub Date : 2024-08-19 Fiona C. Y. Yu, Jorge L. Gálvez Vallejo, Giuseppe M. J. Barca
Molecular fragmentation is an effective suite of approaches to reduce the formal computational complexity of quantum chemistry calculations while enhancing their algorithmic parallelisability. However, the practical applicability of fragmentation techniques remains hindered by a dearth of automation and effective metrics to assess the quality of a fragmentation scheme. In this article, we present the
-
Democratizing cheminformatics: interpretable chemical grouping using an automated KNIME workflow J. Cheminfom. (IF 7.1) Pub Date : 2024-08-16 José T. Moreira-Filho, Dhruv Ranganath, Mike Conway, Charles Schmitt, Nicole Kleinstreuer, Kamel Mansouri
With the increased availability of chemical data in public databases, innovative techniques and algorithms have emerged for the analysis, exploration, visualization, and extraction of information from these data. One such technique is chemical grouping, where chemicals with common characteristics are categorized into distinct groups based on physicochemical properties, use, biological activity, or
-
Metis: a python-based user interface to collect expert feedback for generative chemistry models J. Cheminfom. (IF 7.1) Pub Date : 2024-08-14 Janosch Menke, Yasmine Nahal, Esben Jannik Bjerrum, Mikhail Kabeshov, Samuel Kaski, Ola Engkvist
One challenge that current de novo drug design models face is a disparity between the user’s expectations and the actual output of the model in practical applications. Tailoring models to better align with chemists’ implicit knowledge, expectations and preferences is key to overcoming this obstacle effectively. While interest in preference-based and human-in-the-loop machine learning in chemistry is
-
Geometric deep learning for molecular property predictions with chemical accuracy across chemical space J. Cheminfom. (IF 7.1) Pub Date : 2024-08-13 Maarten R. Dobbelaere, István Lengyel, Christian V. Stevens, Kevin M. Van Geem
Chemical engineers heavily rely on precise knowledge of physicochemical properties to model chemical processes. Despite the growing popularity of deep learning, it is only rarely applied for property prediction due to data scarcity and limited accuracy for compounds in industrially-relevant areas of the chemical space. Herein, we present a geometric deep learning framework for predicting gas- and liquid-phase
-
MolCompass: multi-tool for the navigation in chemical space and visual validation of QSAR/QSPR models J. Cheminfom. (IF 7.1) Pub Date : 2024-08-12 Sergey Sosnin
The exponential growth of data is challenging for humans because their ability to analyze data is limited. Especially in chemistry, there is a demand for tools that can visualize molecular datasets in a convenient graphical way. We propose a new, ready-to-use, multi-tool, and open-source framework for visualizing and navigating chemical space. This framework adheres to the low-code/no-code (LCNC) paradigm
-
Building shape-focused pharmacophore models for effective docking screening J. Cheminfom. (IF 7.1) Pub Date : 2024-08-09 Paola Moyano-Gómez, Jukka V. Lehtonen, Olli T. Pentikäinen, Pekka A. Postila
The performance of molecular docking can be improved by comparing the shape similarity of the flexibly sampled poses against the target proteins’ inverted binding cavities. The effectiveness of these pseudo-ligands or negative image-based models in docking rescoring is boosted further by performing enrichment-driven optimization. Here, we introduce a novel shape-focused pharmacophore modeling algorithm
-
An automated calculation pipeline for differential pair interaction energies with molecular force fields using the Tinker Molecular Modeling Package J. Cheminfom. (IF 7.1) Pub Date : 2024-08-08 Felix Bänsch, Mirco Daniel, Harald Lanig, Christoph Steinbeck, Achim Zielesny
An automated pipeline for comprehensive calculation of intermolecular interaction energies based on molecular force-fields using the Tinker molecular modelling package is presented. Starting with non-optimized chemically intuitive monomer structures, the pipeline allows the approximation of global minimum energy monomers and dimers, configuration sampling for various monomer–monomer distances, estimation
-
Evaluation of reinforcement learning in transformer-based molecular design J. Cheminfom. (IF 7.1) Pub Date : 2024-08-08 Jiazhen He, Alessandro Tibo, Jon Paul Janet, Eva Nittinger, Christian Tyrchan, Werngard Czechtizky, Ola Engkvist
Designing compounds with a range of desirable properties is a fundamental challenge in drug discovery. In pre-clinical early drug discovery, novel compounds are often designed based on an already existing promising starting compound through structural modifications for further property optimization. Recently, transformer-based deep learning models have been explored for the task of molecular optimization
-
Hamiltonian diversity: effectively measuring molecular diversity by shortest Hamiltonian circuits J. Cheminfom. (IF 7.1) Pub Date : 2024-08-07 Xiuyuan Hu, Guoqing Liu, Quanming Yao, Yang Zhao, Hao Zhang
In recent years, significant advancements have been made in molecular generation algorithms aimed at facilitating drug development, and molecular diversity holds paramount importance within the realm of molecular generation. Nonetheless, the effective quantification of molecular diversity remains an elusive challenge, as extant metrics exemplified by Richness and Internal Diversity fall short in concurrently
-
Advancements in biotransformation pathway prediction: enhancements, datasets, and novel functionalities in enviPath J. Cheminfom. (IF 7.1) Pub Date : 2024-08-06 Jasmin Hafner, Tim Lorsbach, Sebastian Schmidt, Liam Brydon, Katharina Dost, Kunyang Zhang, Kathrin Fenner, Jörg Wicker
enviPath is a widely used database and prediction system for microbial biotransformation pathways of primarily xenobiotic compounds. The data and prediction system are freely available both via a web interface and a public REST API. Since its initial release in 2016, we have extended the data available in enviPath and improved the performance of the prediction system and usability of the overall system. We
-
PETA: evaluating the impact of protein transfer learning with sub-word tokenization on downstream applications J. Cheminfom. (IF 7.1) Pub Date : 2024-08-02 Yang Tan, Mingchen Li, Ziyi Zhou, Pan Tan, Huiqun Yu, Guisheng Fan, Liang Hong
Protein language models (PLMs) play a dominant role in protein representation learning. Most existing PLMs regard proteins as sequences of 20 natural amino acids. The problem with this representation method is that it simply divides the protein sequence into sequences of individual amino acids, ignoring the fact that certain residues often occur together. Therefore, it is inappropriate to view amino
-
A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example J. Cheminfom. (IF 7.1) Pub Date : 2024-08-02 Run-Hsin Lin, Pinpin Lin, Chia-Chi Wang, Chun-Wei Tung
Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets
-
Implementation of a soft grading system for chemistry in a Moodle plugin: reaction handling J. Cheminfom. (IF 7.1) Pub Date : 2024-08-01 Louis Plyer, Gilles Marcou, Céline Perves, Fanny Bonachera, Alexander Varnek
Here, we present a new method for evaluating questions on chemical reactions in the context of remote education. This method can be used when binary grading is not sufficient as some tolerance may be acceptable. In order to determine a grade, the developed workflow uses the pairwise similarity assessment of two considered reactions, each encoded by a single molecular graph with the help of the Condensed
-
Transfer learning across different chemical domains: virtual screening of organic materials with deep learning models pretrained on small molecule and chemical reaction data J. Cheminfom. (IF 7.1) Pub Date : 2024-07-30 Chengwei Zhang, Yushuang Zhai, Ziyang Gong, Hongliang Duan, Yuan-Bin She, Yun-Fang Yang, An Su
Machine learning is becoming a preferred method for the virtual screening of organic materials due to its cost-effectiveness over traditional computationally demanding techniques. However, the scarcity of labeled data for organic materials poses a significant challenge for training advanced machine learning models. This study showcases the potential of utilizing databases of drug-like small molecules
-
Reproducible MS/MS library cleaning pipeline in matchms J. Cheminfom. (IF 7.1) Pub Date : 2024-07-29 Niek F. de Jonge, Helge Hecht, Michael Strobel, Mingxun Wang, Justin J. J. van der Hooft, Florian Huber
Mass spectral libraries have proven to be essential for mass spectrum annotation, both for library matching and training new machine learning algorithms. A key step in training machine learning models is the availability of high-quality training data. Public libraries of mass spectrometry data that are open to user submission often suffer from limited metadata curation and harmonization. The resulting
-
Hilbert-curve assisted structure embedding method J. Cheminfom. (IF 7.1) Pub Date : 2024-07-29 Gergely Zahoránszky-Kőhalmi, Kanny K. Wan, Alexander G. Godfrey
Chemical space embedding methods are widely utilized in various research settings for dimensional reduction, clustering and effective visualization. The maps generated by the embedding process can provide valuable insight to medicinal chemists in terms of the relationships between structural, physicochemical and biological properties of compounds. However, these maps are known to be difficult to interpret
-
A computational workflow for analysis of missense mutations in precision oncology J. Cheminfom. (IF 7.1) Pub Date : 2024-07-29 Rayyan Tariq Khan, Petra Pokorna, Jan Stourac, Simeon Borko, Ihor Arefiev, Joan Planas-Iglesias, Adam Dobias, Gaspar Pinto, Veronika Szotkowska, Jaroslav Sterba, Ondrej Slaby, Jiri Damborsky, Stanislav Mazurenko, David Bednar
Every year, more than 19 million cancer cases are diagnosed, and this number continues to increase annually. Since standard treatment options have varying success rates for different types of cancer, understanding the biology of an individual's tumour becomes crucial, especially for cases that are difficult to treat. Personalised high-throughput profiling, using next-generation sequencing, allows for
-
Enhancing molecular property prediction with auxiliary learning and task-specific adaptation J. Cheminfom. (IF 7.1) Pub Date : 2024-07-24 Vishal Dey, Xia Ning
Pretrained Graph Neural Networks have been widely adopted for various molecular property prediction tasks. Despite their ability to encode structural and relational features of molecules, traditional fine-tuning of such pretrained GNNs on the target task can lead to poor generalization. To address this, we explore the adaptation of pretrained GNNs to the target task by jointly training them with multiple
-
CACTI: an in silico chemical analysis tool through the integration of chemogenomic data and clustering analysis J. Cheminfom. (IF 7.1) Pub Date : 2024-07-24 Karla P. Godinez-Macias, Elizabeth A. Winzeler
It is well-accepted that knowledge of a small molecule’s target can accelerate optimization. Although chemogenomic databases are helpful resources for predicting or finding compound interaction partners, they tend to be limited and poorly annotated. Furthermore, unlike genes, compound identifiers are often not standardized, and many synonyms may exist, especially in the biological literature, making
-
Estimating the synthetic accessibility of molecules with building block and reaction-aware SAScore J. Cheminfom. (IF 7.1) Pub Date : 2024-07-23 Shuan Chen, Yousung Jung
Synthetic accessibility prediction is a task to estimate how easily a given molecule might be synthesizable in the laboratory, playing a crucial role in computer-aided molecular design. Although synthesis planning programs can determine synthesis routes, their slow processing times make them impractical for large-scale molecule screening. On the other hand, existing rapid synthesis accessibility estimation