Uni-Mol2: Exploring Molecular Pretraining Model at Scale

Ji, Xiaohong; Wang, Zhen; Gao, Zhifeng; Zheng, Hang; Zhang, Linfeng; Ke, Guolin; E, Weinan


arXiv:2406.14969 (cs)

[Submitted on 21 Jun 2024 (v1), last revised 1 Jul 2024 (this version, v2)]



Abstract: In recent years, pretraining models have made significant advancements in natural language processing (NLP), computer vision (CV), and the life sciences. The advances in NLP and CV are predominantly driven by the expansion of model parameters and data size, a phenomenon now recognized as scaling laws. However, scaling laws in molecular pretraining models remain unexplored. In this work, we present Uni-Mol2, an innovative molecular pretraining model that leverages a two-track transformer to effectively integrate features at the atomic level, graph level, and geometry structure level. Along with this, we systematically investigate the scaling law within molecular pretraining models, characterizing the power-law correlations between validation loss and model size, dataset size, and computational resources. Consequently, we successfully scale Uni-Mol2 to 1.1 billion parameters through pretraining on 800 million conformations, making it the largest molecular pretraining model to date. Extensive experiments show consistent improvement in the downstream tasks as the model size grows. Uni-Mol2 with 1.1B parameters also outperforms existing methods, achieving an average 27% improvement on the QM9 dataset and 14% on the COMPAS-1D dataset.
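
As a rough illustration of what a power-law correlation between validation loss and model size looks like, the sketch below fits a curve of the form loss ≈ a · N^(−b) by least squares in log-log space. The functional form (with no irreducible-loss term), the helper name `fit_power_law`, and all numbers are assumptions made for illustration; the abstract does not give the paper's exact parameterization, and the data points here are synthetic, not results from the paper.

```python
import math


def fit_power_law(xs, losses):
    """Fit loss ~= a * x**(-b) by ordinary least squares in log-log space.

    Hypothetical helper for illustration; returns the pair (a, b).
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in losses]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope and intercept of the best-fit line: log(loss) = intercept + slope * log(x)
    cov = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    var = sum((lx - mean_x) ** 2 for lx in log_x)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic points generated from a = 2.0, b = 0.1 (made-up values, NOT from the paper).
model_sizes = [1e7, 1e8, 1e9]
synthetic_losses = [2.0 * n ** -0.1 for n in model_sizes]

a, b = fit_power_law(model_sizes, synthetic_losses)
print(f"fitted a={a:.3f}, b={b:.3f}")
```

The same fit applies unchanged when the x-axis is dataset size or compute instead of parameter count; on a log-log plot, a power law of this form appears as a straight line with slope −b.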

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.14969 [cs.LG]
	(or arXiv:2406.14969v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.14969

Submission history

From: Xiaohong Ji
[v1] Fri, 21 Jun 2024 08:28:54 UTC (2,397 KB)
[v2] Mon, 1 Jul 2024 09:08:44 UTC (2,396 KB)
