Pre-training via Denoising for Molecular Property Prediction

Zaidi, Sheheryar; Schaarschmidt, Michael; Martens, James; Kim, Hyunjik; Teh, Yee Whye; Sanchez-Gonzalez, Alvaro; Battaglia, Peter; Pascanu, Razvan; Godwin, Jonathan

arXiv:2206.00133 (cs)

[Submitted on 31 May 2022 (v1), last revised 25 Oct 2022 (this version, v2)]

Abstract: Many important problems involving molecular property prediction from 3D structures have limited data, posing a generalization challenge for neural networks. In this paper, we describe a pre-training technique based on denoising that achieves a new state-of-the-art in molecular property prediction by utilizing large datasets of 3D molecular structures at equilibrium to learn meaningful representations for downstream tasks. Relying on the well-known link between denoising autoencoders and score-matching, we show that the denoising objective corresponds to learning a molecular force field -- arising from approximating the Boltzmann distribution with a mixture of Gaussians -- directly from equilibrium structures. Our experiments demonstrate that using this pre-training objective significantly improves performance on multiple benchmarks, achieving a new state-of-the-art on the majority of targets in the widely used QM9 dataset. Our analysis then provides practical insights into the effects of different factors -- dataset sizes, model size and architecture, and the choice of upstream and downstream datasets -- on pre-training.
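
The link between denoising and force-field learning stated in the abstract can be sketched with the standard denoising score-matching argument. This is a minimal sketch, not the paper's own derivation: the symbols E (energy), x_i (equilibrium structures), sigma (noise scale), and f_theta (the network) are notation assumed here for illustration, and units are chosen so that k_B T = 1.

```latex
% Boltzmann distribution over conformations x with energy E(x), assuming k_B T = 1.
% Its score (gradient of the log-density) is the force field:
p(x) \propto \exp\bigl(-E(x)\bigr)
\quad\Longrightarrow\quad
\nabla_x \log p(x) = -\nabla_x E(x).

% Approximate p by a mixture of Gaussians centred at the n equilibrium structures x_i:
q_\sigma(\tilde{x}) = \frac{1}{n} \sum_{i=1}^{n} \mathcal{N}\bigl(\tilde{x};\, x_i,\, \sigma^2 I\bigr).

% Noising one structure: \tilde{x} = x_i + \sigma\epsilon, with \epsilon \sim \mathcal{N}(0, I).
% The conditional score is proportional to the added noise:
\nabla_{\tilde{x}} \log q_\sigma(\tilde{x} \mid x_i)
  = \frac{x_i - \tilde{x}}{\sigma^2}
  = -\frac{\epsilon}{\sigma}.

% Noise-prediction (denoising) objective for a network f_\theta:
\mathcal{L}(\theta)
  = \mathbb{E}_{i,\,\epsilon}\,
    \bigl\| f_\theta(x_i + \sigma\epsilon) - \epsilon \bigr\|^2 .

% Its minimiser is the posterior mean of the noise, which equals the mixture score up to scale:
f_{\theta^\ast}(\tilde{x})
  = \mathbb{E}\bigl[\epsilon \mid \tilde{x}\bigr]
  = -\sigma\, \nabla_{\tilde{x}} \log q_\sigma(\tilde{x}).
```

Under these assumptions, predicting the noise added to an equilibrium structure amounts, up to the factor of -sigma, to estimating the score of the Gaussian-mixture approximation, which plays the role of the force field in the first line.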

Subjects:	Machine Learning (cs.LG); Biomolecules (q-bio.BM); Machine Learning (stat.ML)
Cite as:	arXiv:2206.00133 [cs.LG]
	(or arXiv:2206.00133v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2206.00133

Submission history

From: Sheheryar Zaidi
[v1] Tue, 31 May 2022 22:28:34 UTC (512 KB)
[v2] Tue, 25 Oct 2022 00:58:51 UTC (554 KB)
