ReLSO: A Transformer-based Model for Latent Space Optimization and Generation of Proteins

Castro, Egbert; Godavarthi, Abhinav; Rubinfien, Julian; Givechian, Kevin B.; Bhaskar, Dhananjay; Krishnaswamy, Smita

Computer Science > Machine Learning

arXiv:2201.09948 (cs)

[Submitted on 24 Jan 2022 (v1), last revised 31 May 2022 (this version, v2)]

Abstract: The development of powerful natural language models has increased the ability to learn meaningful representations of protein sequences. In addition, advances in high-throughput mutagenesis, directed evolution, and next-generation sequencing have allowed for the accumulation of large amounts of labeled fitness data. Leveraging these two trends, we introduce Regularized Latent Space Optimization (ReLSO), a deep transformer-based autoencoder that features a highly structured latent space trained jointly to generate sequences and predict fitness. Through regularized prediction heads, ReLSO introduces a powerful protein sequence encoder and a novel approach for efficient fitness landscape traversal. Using ReLSO, we explicitly model the sequence-function landscape of large labeled datasets and generate new molecules by optimizing within the latent space using gradient-based methods. We evaluate this approach on several publicly available protein datasets, including variant sets of anti-ranibizumab and GFP. We observe greater sequence optimization efficiency (increase in fitness per optimization step) with ReLSO than with other approaches, and ReLSO more robustly generates high-fitness sequences. Furthermore, the attention-based relationships learned by the jointly trained ReLSO models provide a potential avenue towards sequence-level fitness attribution information.
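The abstract describes generating new sequences by gradient-based optimization inside the learned latent space, scored by a jointly trained fitness predictor, and reports efficiency as the increase in fitness per optimization step. The sketch below illustrates only that loop, under loudly stated assumptions: the fitness function is a hypothetical toy quadratic standing in for ReLSO's trained prediction head, the encoder and decoder are omitted entirely, and plain fixed-step gradient ascent is used. It is not the authors' implementation, and the names `toy_fitness`, `latent_gradient_ascent`, and `optimization_efficiency` are illustrative only; the paper's exact efficiency metric may differ in detail.

```python
# Hypothetical sketch of gradient ascent in a learned latent space.
# ASSUMPTION: toy_fitness is a hand-written stand-in for a trained
# fitness-prediction head; no encoder/decoder is modeled here.
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def toy_fitness(z: Sequence[float]) -> float:
    """Hypothetical smooth predicted-fitness surface with its peak at z* = (1, -2)."""
    return -((z[0] - 1.0) ** 2) - ((z[1] + 2.0) ** 2)


def toy_fitness_grad(z: Sequence[float]) -> Vector:
    """Analytic gradient of toy_fitness with respect to the latent point z."""
    return [-2.0 * (z[0] - 1.0), -2.0 * (z[1] + 2.0)]


def latent_gradient_ascent(
    z0: Sequence[float],
    fitness: Callable[[Sequence[float]], float],
    grad: Callable[[Sequence[float]], Vector],
    step_size: float = 0.1,
    n_steps: int = 50,
) -> Tuple[List[Vector], List[float]]:
    """Run z <- z + step_size * grad f(z) for n_steps.

    Returns every visited latent point (including z0) and its predicted fitness.
    In a full pipeline each point would be decoded back into a sequence.
    """
    z = list(z0)
    trajectory = [list(z)]
    scores = [fitness(z)]
    for _ in range(n_steps):
        g = grad(z)
        z = [zi + step_size * gi for zi, gi in zip(z, g)]
        trajectory.append(list(z))
        scores.append(fitness(z))
    return trajectory, scores


def optimization_efficiency(scores: Sequence[float]) -> float:
    """Mean increase in predicted fitness per optimization step."""
    n_steps = len(scores) - 1
    if n_steps <= 0:
        raise ValueError("need at least one optimization step")
    return (scores[-1] - scores[0]) / n_steps


if __name__ == "__main__":
    traj, scores = latent_gradient_ascent([0.0, 0.0], toy_fitness, toy_fitness_grad)
    print(traj[-1], scores[-1], optimization_efficiency(scores))
```

Starting from z0 = (0, 0) with step_size=0.1, each step shrinks the distance to the toy optimum by a factor of 0.8, so over 50 steps the predicted fitness rises from -5 to nearly 0 and the efficiency comes out at about 0.1 per step. A real model would replace the analytic gradient with automatic differentiation through the prediction head.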

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2201.09948 [cs.LG]
	(or arXiv:2201.09948v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2201.09948

Submission history

From: Egbert Castro
[v1] Mon, 24 Jan 2022 20:55:53 UTC (8,448 KB)
[v2] Tue, 31 May 2022 14:51:32 UTC (11,034 KB)