An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies
Immunity (IF 25.5), Pub Date: 2024-08-19, DOI: 10.1016/j.immuni.2024.07.022
Yiquan Wang 1, Huibin Lv 2, Qi Wen Teo 2, Ruipeng Lei 1, Akshita B Gopal 1, Wenhao O Ouyang 1, Yuen-Hei Yeung 3, Timothy J C Tan 4, Danbi Choi 1, Ivana R Shen 1, Xin Chen 4, Claire S Graham 1, Nicholas C Wu 5
Despite decades of antibody research, it remains challenging to predict the specificity of an antibody solely based on its sequence. Two major obstacles are the lack of appropriate models and the inaccessibility of datasets for model training. In this study, we curated >5,000 influenza hemagglutinin (HA) antibodies by mining research publications and patents, which revealed many distinct sequence features between antibodies to HA head and stem domains. We then leveraged this dataset to develop a lightweight memory B cell language model (mBLM) for sequence-based antibody specificity prediction. Model explainability analysis showed that mBLM could identify key sequence features of HA stem antibodies. Additionally, by applying mBLM to HA antibodies with unknown epitopes, we discovered and experimentally validated many HA stem antibodies. Overall, this study not only advances our molecular understanding of the antibody response to the influenza virus but also provides a valuable resource for applying deep learning to antibody research.
Updated: 2024-08-19