Multi-Scale Protein Language Model for Unified Molecular Modeling

Zheng, Kangjie; Long, Siyu; Lu, Tianyu; Yang, Junwei; Dai, Xinyu; Zhang, Ming; Nie, Zaiqing; Ma, Wei-Ying; Zhou, Hao


arXiv:2403.12995v1 (q-bio)

[Submitted on 5 Mar 2024 (this version), latest version 13 Jun 2024 (v4)]



Abstract: Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small molecules. In this paper, we propose ms-ESM (multi-scale ESM), a novel approach that enables multi-scale unified molecular modeling. ms-ESM achieves this by pre-training on multi-scale code-switch protein sequences and utilizing a multi-scale position encoding to capture relationships among residues and atoms. Experimental results indicate that ms-ESM surpasses previous methods in protein-molecule tasks, demonstrating the full utilization of protein language models. Further investigations reveal that through unified molecular modeling, ms-ESM not only gains molecular knowledge but also retains its understanding of proteins.
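
The abstract names two ingredients, code-switch sequences and a multi-scale position encoding, without giving their construction. The following is a minimal sketch of one way such an input could be built, not the authors' implementation. It assumes that "code-switching" means replacing selected residues with their constituent atoms, and that each token carries a pair of indices: a residue-scale position and an atom-scale position. The function name `code_switch`, the `RESIDUE_ATOMS` table, the `"A:CB"` token format, and the convention that residue tokens get atom position 0 are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Heavy-atom names for three residues, using standard PDB atom naming.
# Toy subset for illustration only; a real table would cover all residue types.
RESIDUE_ATOMS: Dict[str, List[str]] = {
    "G": ["N", "CA", "C", "O"],
    "A": ["N", "CA", "C", "O", "CB"],
    "S": ["N", "CA", "C", "O", "CB", "OG"],
}


def code_switch(
    sequence: str, unzip: Set[int]
) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Build a mixed residue/atom token sequence (hypothetical scheme).

    Residues whose index is in `unzip` are expanded into atom tokens.
    Every token gets a (residue_position, atom_position) pair:
    atoms of one residue share its residue position, and residue-scale
    tokens use atom position 0 (an assumption of this sketch).
    """
    tokens: List[str] = []
    positions: List[Tuple[int, int]] = []
    for res_idx, res in enumerate(sequence):
        if res_idx in unzip and res in RESIDUE_ATOMS:
            # Expand this residue into its atoms, numbered from 1.
            for atom_idx, atom in enumerate(RESIDUE_ATOMS[res], start=1):
                tokens.append(f"{res}:{atom}")
                positions.append((res_idx, atom_idx))
        else:
            # Keep the residue as a single residue-scale token.
            tokens.append(res)
            positions.append((res_idx, 0))
    return tokens, positions


tokens, positions = code_switch("GAS", unzip={1})
print(tokens)     # ['G', 'A:N', 'A:CA', 'A:C', 'A:O', 'A:CB', 'S']
print(positions)  # [(0, 0), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 0)]
```

In this sketch the residue-scale index keeps atom tokens aligned with their neighbours in the protein sequence, while the atom-scale index distinguishes atoms within one residue. How ms-ESM actually encodes atom-level relationships is not stated in the abstract.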

Subjects:	Biomolecules (q-bio.BM); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Cite as:	arXiv:2403.12995 [q-bio.BM]
	(or arXiv:2403.12995v1 [q-bio.BM] for this version)
	https://doi.org/10.48550/arXiv.2403.12995

Submission history

From: Siyu Long
[v1] Tue, 5 Mar 2024 13:35:41 UTC (578 KB)
[v2] Thu, 16 May 2024 08:21:11 UTC (802 KB)
[v3] Fri, 31 May 2024 07:28:40 UTC (910 KB)
[v4] Thu, 13 Jun 2024 02:29:34 UTC (910 KB)
