DOI: 10.1145/3583780.3614974

MPerformer: An SE(3) Transformer-based Molecular Perceptron

Published: 21 October 2023

Abstract

Molecular perception aims to construct 3D molecules from 3D atom clouds (i.e., atom types and corresponding 3D coordinates), determining bond connections, bond orders, and other molecular attributes within molecules. It is essential for realizing many applications in cheminformatics and bioinformatics, such as modeling quantum chemistry-derived molecular structures in protein-ligand complexes. Additionally, many molecular generation methods can only generate molecular 3D atom clouds, requiring molecular perception as a necessary post-processing step. However, existing molecular perception methods mainly rely on predefined chemical rules and fail to fully leverage 3D geometric information, so their performance is often sub-optimal. In this study, we propose MPerformer, an SE(3) Transformer-based molecular perceptron exhibiting SE(3)-invariance, to construct 3D molecules from 3D atom clouds efficiently. In addition, we propose a multi-task pretraining-and-finetuning paradigm to learn this model. In the pretraining phase, we jointly minimize an attribute prediction loss and an atom cloud reconstruction loss, mitigating the data imbalance issue of molecular attributes and enhancing the robustness and generalizability of the model. Experiments show that MPerformer significantly outperforms state-of-the-art molecular perception methods in precision and robustness, benefiting various molecular generation scenarios.
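The multi-task pretraining objective described in the abstract, which jointly minimizes an attribute prediction loss and an atom cloud reconstruction loss, can be illustrated with a short sketch. The PyTorch code below is a minimal, hypothetical sketch and not the authors' implementation: the class name MultiTaskPretrainLoss, the specific loss choices (cross-entropy for per-pair attribute classes, MSE for 3D coordinate reconstruction), and the recon_weight term are assumptions for illustration only; MPerformer's actual losses, weighting, and featurization differ.

```python
# Illustrative sketch (not from the paper): combine an attribute-prediction term
# with an atom-cloud reconstruction term into a single pretraining objective.
# All names, shapes, and weights here are hypothetical.
import torch
import torch.nn as nn

class MultiTaskPretrainLoss(nn.Module):
    """Weighted sum of an attribute classification loss (e.g., bond existence /
    bond order per atom pair) and a 3D coordinate reconstruction loss."""

    def __init__(self, recon_weight: float = 1.0):
        super().__init__()
        self.attr_loss = nn.CrossEntropyLoss()  # attribute prediction term
        self.recon_loss = nn.MSELoss()          # atom cloud reconstruction term
        self.recon_weight = recon_weight

    def forward(self, attr_logits, attr_labels, coord_pred, coord_target):
        # attr_logits: (num_atom_pairs, num_classes); attr_labels: (num_atom_pairs,)
        # coord_pred, coord_target: (num_atoms, 3)
        l_attr = self.attr_loss(attr_logits, attr_labels)
        l_recon = self.recon_loss(coord_pred, coord_target)
        return l_attr + self.recon_weight * l_recon

# Toy usage with random tensors standing in for model outputs:
loss_fn = MultiTaskPretrainLoss(recon_weight=0.5)
logits = torch.randn(10, 4, requires_grad=True)  # 10 atom pairs, 4 bond classes
labels = torch.randint(0, 4, (10,))
coords = torch.randn(6, 3, requires_grad=True)   # 6 atoms in 3D
target = torch.randn(6, 3)
loss = loss_fn(logits, labels, coords, target)
loss.backward()
```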

Cited By

  • (2025) Generation of SARS-CoV-2 dual-target candidate inhibitors through 3D equivariant conditional generative neural networks. Journal of Pharmaceutical Analysis. https://doi.org/10.1016/j.jpha.2025.101229. Online publication date: Feb-2025.
  • (2024) MMPolymer: A Multimodal Multitask Pretraining Framework for Polymer Property Prediction. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2336-2346. https://doi.org/10.1145/3627673.3679684. Online publication date: 21-Oct-2024.

Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023
5508 pages
ISBN:9798400701245
DOI:10.1145/3583780

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 October 2023

Author Tags

  1. SE(3) transformer
  2. molecular generation
  3. molecular perception
  4. multi-task pretraining

Qualifiers

  • Research-article

Conference

CIKM '23

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)143
  • Downloads (Last 6 weeks)15
Reflects downloads up to 12 Feb 2025
