Learning Molecular Representation in a Cell

Liu, Gang; Seal, Srijit; Arevalo, John; Liang, Zhenwen; Carpenter, Anne E.; Jiang, Meng; Singh, Shantanu

Computer Science > Machine Learning

arXiv:2406.12056 (cs)

[Submitted on 17 Jun 2024 (v1), last revised 2 Oct 2024 (this version, v3)]

Abstract: Predicting drug efficacy and safety in vivo requires information on biological responses (e.g., cell morphology and gene expression) to small molecule perturbations. However, current molecular representation learning methods do not provide a comprehensive view of cell states under these perturbations and struggle to remove noise, hindering model generalization. We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells. We integrate molecules and cellular response data as nodes into a context graph, connecting them with weighted edges based on chemical, biological, and computational criteria. For each molecule in a training batch, InfoAlign optimizes the encoder's latent representation with a minimality objective to discard redundant structural information. A sufficiency objective decodes the representation to align with different feature spaces from the molecule's neighborhood in the context graph. We demonstrate that the proposed sufficiency objective for alignment is tighter than existing encoder-based contrastive methods. Empirically, we validate representations from InfoAlign in two downstream applications: molecular property prediction against up to 27 baseline methods across four datasets, plus zero-shot molecule-morphology matching.
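The abstract combines three pieces: a weighted context graph linking molecules to cellular-response profiles, a minimality term that compresses the encoder's latent code, and a sufficiency term that decodes that code toward the feature spaces of a molecule's graph neighbors. The following is a minimal, stdlib-only sketch of how these pieces fit together. It is not the authors' implementation, and several choices in it are assumptions:

- The toy graph, node names, and edge weights are hypothetical.
- Neighbors are sampled by a weighted random walk.
- Minimality follows a standard variational information-bottleneck reading: a Gaussian encoder with KL divergence to a standard-normal prior.
- Sufficiency uses hypothetical linear decoders with a mean-squared-error loss.
- The `beta` weighting is illustrative.

```python
import math
import random

# Hypothetical toy context graph: molecules and cellular-response profiles are
# nodes; weights stand in for the chemical/biological/computational criteria.
CONTEXT_GRAPH = {
    "mol_A": {"mol_B": 0.8, "morph_1": 0.6, "expr_1": 0.4},
    "mol_B": {"mol_A": 0.8, "expr_1": 0.9},
    "morph_1": {"mol_A": 0.6},
    "expr_1": {"mol_A": 0.4, "mol_B": 0.9},
}


def weighted_random_walk(graph, start, length, rng):
    """Assumed neighborhood sampler: take `length` steps, choosing each next
    node with probability proportional to its edge weight."""
    path = [start]
    node = start
    for _ in range(length):
        neighbors = graph[node]
        if not neighbors:
            break  # dead end: stop the walk early
        names = list(neighbors)
        node = rng.choices(names, weights=[neighbors[n] for n in names], k=1)[0]
        path.append(node)
    return path


def minimality_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)); larger when z carries more
    information about the input (assumed variational-IB form)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def linear_decode(weights, z):
    """Hypothetical linear decoder: one weight row per output feature."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]


def sufficiency_mse(z, decoders, targets):
    """Sum over neighbor feature spaces of the mean squared error between the
    decoded latent code and that neighbor's features (MSE is an assumption)."""
    total = 0.0
    for space, target in targets.items():
        pred = linear_decode(decoders[space], z)
        total += sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    return total


def infoalign_style_loss(mu, log_var, decoders, targets, beta=1e-3, rng=None):
    """Sufficiency + beta * minimality on a reparameterized sample of z."""
    rng = rng or random.Random(0)
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
    return sufficiency_mse(z, decoders, targets) + beta * minimality_kl(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(1)
    print(weighted_random_walk(CONTEXT_GRAPH, "mol_A", 3, rng))
    decoders = {"morph": [[1.0, 0.0]], "expr": [[0.0, 1.0]]}  # hypothetical
    targets = {"morph": [0.5], "expr": [-0.2]}  # hypothetical neighbor features
    print(infoalign_style_loss([0.4, -0.1], [-2.0, -2.0], decoders, targets))
```

In this reading, `beta` trades compression against alignment. Raising it pushes the latent code toward the prior, which discards structural detail. Lowering it lets the code retain whatever helps reconstruct the neighbors' morphology or expression features.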

Comments:	20 pages, 5 tables, 7 figures
Subjects:	Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2406.12056 [cs.LG]
	(or arXiv:2406.12056v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.12056

Submission history

From: Gang Liu
[v1] Mon, 17 Jun 2024 19:48:42 UTC (604 KB)
[v2] Sat, 22 Jun 2024 22:50:05 UTC (576 KB)
[v3] Wed, 2 Oct 2024 19:26:46 UTC (615 KB)
