Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

HTML conversions sometimes display errors due to content that did not convert correctly from the source. This paper uses the following packages that are not yet supported by the HTML conversion tool. Feedback on these issues are not necessary; they are known and are being worked on.

  • failed: yhmath

Authors: achieve the best HTML results from your LaTeX submissions by following these best practices.

License: CC BY-NC-ND 4.0
arXiv:2403.13829v1 [q-bio.BM] 07 Mar 2024

Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization

Xiangxin Zhou1,2,3123{}^{1,2,3}start_FLOATSUPERSCRIPT 1 , 2 , 3 end_FLOATSUPERSCRIPT   Xiwei Cheng3,4*34{}^{3,4*}start_FLOATSUPERSCRIPT 3 , 4 * end_FLOATSUPERSCRIPT Yuwei Yang33{}^{3}start_FLOATSUPERSCRIPT 3 end_FLOATSUPERSCRIPT Yu Bao33{}^{3}start_FLOATSUPERSCRIPT 3 end_FLOATSUPERSCRIPT Liang Wang1,212{}^{1,2}start_FLOATSUPERSCRIPT 1 , 2 end_FLOATSUPERSCRIPT Quanquan Gu33{}^{3}start_FLOATSUPERSCRIPT 3 end_FLOATSUPERSCRIPT
11{}^{1}start_FLOATSUPERSCRIPT 1 end_FLOATSUPERSCRIPTSchool of Artificial Intelligence, University of Chinese Academy of Sciences
22{}^{2}start_FLOATSUPERSCRIPT 2 end_FLOATSUPERSCRIPTCenter for Research on Intelligent Perception and Computing (CRIPAC),
  State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS),
  Institute of Automation, Chinese Academy of Sciences (CASIA)
33{}^{3}start_FLOATSUPERSCRIPT 3 end_FLOATSUPERSCRIPTByteDance Research
44{}^{4}start_FLOATSUPERSCRIPT 4 end_FLOATSUPERSCRIPTHalıcıoğlu Data Science Institute, University of California San Diego
Equal Contribution. Work was done during Xiangxin’s and Xiwei’s internship at ByteDance.Corresponding Author: Quanquan Gu (quanquan.gu@bytedance.com).
Abstract

Recently, 3D generative models have shown promising performances in structure-based drug design by learning to generate ligands given target binding sites. However, only modeling the target-ligand distribution can hardly fulfill one of the main goals in drug discovery – designing novel ligands with desired properties, e.g., high binding affinity, easily synthesizable, etc. This challenge becomes particularly pronounced when the target-ligand pairs used for training do not align with these desired properties. Moreover, most existing methods aim at solving de novo design task, while many generative scenarios requiring flexible controllability, such as R-group optimization and scaffold hopping, have received little attention. In this work, we propose DecompOpt, a structure-based molecular optimization method based on a controllable and decomposed diffusion model. DecompOpt presents a new generation paradigm which combines optimization with conditional diffusion models to achieve desired properties while adhering to the molecular grammar, Additionally, DecompOpt offers a unified framework covering both de novo design and controllable generation. To achieve so, ligands are decomposed into substructures which allows fine-grained control and local optimization. Experiments show that DecompOpt can efficiently generate molecules with improved properties than strong de novo baselines, and demonstrate great potential in controllable generation tasks.

1 Introduction

Refer to caption
Figure 1: Vina Scores distribution of protein-ligand pairs in CrossDocked2020 dataset. 8.188.18-8.18- 8.18 kcal/mol, marked by the red vertical line, is a commonly used value representing moderate binding affinity.

Structure-based drug design (SBDD) (Anderson, 2003) is an approach that involves designing drug molecules based on the 3D structure of a target. The goal of SBDD is to generate ligands with desired properties which can bind tightly to the target binding site. Recently, several works have cast SBDD into a conditional 3D molecular generation task and achieved remarkable success thanks to the powerful deep generative models. In these models, the target binding site serves as the condition and the conditional distribution of ligands is learnt in a data-driven manner using various generative models. Peng et al. (2022) and Zhang et al. (2022) proposed to generate molecules given pockets in an auto-regressive fasion using atoms or fragments as building blocks respectively. Guan et al. (2023a); Schneuing et al. (2022); Lin et al. (2022) use diffusion models to generate ligands by modeling atom types and positions.

Generative models are powerful approaches for extracting the underlying molecular grammar (e.g., the reasonable atomic valence, stable molecular conformation, etc.). However, they cannot generate molecules with desired properties if the training data do not align with these properties as well. Indeed, unsatisfying data quality is a common challenges in drug discovery (Vamathevan et al., 2019). As Figure 1 shows, the ligands in CrossDocked2020 (Francoeur et al., 2020), a widely used training dataset for SBDD models, have moderate binding affinities measured by molecular docking scores. Solely maximizing the likelihood of training data can mislead SBDD models and cause inefficiency in generating potential drug candidates. To overcome this limitation, molecular optimization (Xie et al., 2021; Fu et al., 2022) offers a direct path for searching molecules with desired properties in the broad chemical space. However, its application to 3D molecule generation remains unexplored.

On the other hand, current SBDD models are mostly limited to De novo design (Hartenfeller & Schneider, 2011), which focuses on generating ligands from scratch, is the main task that most efforts have been devoted to. However, controllable molecular generation scenarios, such as R-group design (Takeuchi et al., 2021) (also known as biosteric replacement) and scaffold hopping (Böhm et al., 2004), are equally, if not more, important. Unlike de novo design, controllable generation tasks start from an existing compound, and only modify a local substructure to improve the synthetic accessibility, potency and drug-likeness properties or to move into novel chemical space for patenting (Langdon et al., 2010). Controllable generation aims to utilize prior knowledge, such as a known active compound, in the design process to increase the chance of finding promising candidates. Some initial efforts have been made to address controllable molecular generation problem. For example, Igashov et al. (2022); Imrie et al. (2020; 2021); Huang et al. (2022) propose to use generative models to design linkers between the given fragments. However, these methods are designed for a specific controllable generation task and cannot be generalized.

To overcome the aforementioned challenges and limitations in existing SBDD approaches, we propose DecompOpt, a controllable and decomposed diffusion model for structure-based molecular optimization. DecompOpt combines diffusion models with optimization algorithm to harness the advantages of both approaches. Diffusion models are used to extract molecular grammar in a data-driven fashion, while optimization algorithm is used to effectively optimize the desired properties. Furthermore, DecompOpt offers a unified generation framework for both de novo design and controllable generation through ligands decomposition. Notably, a ligand that binds to a target binding site can be naturally decomposed into several substructures, i.e., arms and scaffold, where arms locally interact with corresponding subpockets and scaffold links all arms to form a complete molecule. Such decomposition motivates us to design a conditional diffusion models in the decomposed drug space which ensures flexible and fine-grained control over each substructure. We highlight our main contributions as follows:

  • We propose a new molecular generation paradigm, which combines diffusion models with iterative optimization to learn molecular grammar and optimize desired properties simultaneously.

  • We design a unified generation framework for de novo design and controllable generation via a controllable and decomposed diffusion model.

  • For de novo design, our method can generate ligands with an average Vina Dock score of 8.988.98-8.98- 8.98 and a Success Rate of 52.5%percent52.552.5\%52.5 %, achieving a new SOTA on the CrossDocked2020 benchmark.

  • For controllable generation, our method shows promising results in various practical tasks of SBDD, including R-group design and scaffold hopping.

2 Related Work

Molecule Generation

Deep generative models have shown promising results in molecule generation. In the last decade, researchers have explored various representations and models for molecule generation. Molecules can be represented in 1D (e.g., SMILES (Weininger, 1988), SELFIES (Krenn et al., 2020)), 2D (i.e., molecular graph (Bonchev, 1991)), and 3D. Among them, 3D representations attract recent attention since they capture the complete information of molecules, and have better potential to generate and optimize molecules with regard to 3D properties, such as bioactivity for a given target (Baillif et al., 2023).

SBDD represents an important application for 3D molecule generation. Ragoza et al. (2022) generate 3D molecules in atomic density grids using a variational autoencoder (Kingma & Welling, 2013). Luo et al. (2021); Liu et al. (2022); Peng et al. (2022) propose to generated atoms (and bonds) auto-regressively in 3D space, while Zhang et al. (2022) use fragment as building blocks instead. Guan et al. (2023a); Lin et al. (2022); Schneuing et al. (2022) introduce SE(3)-equivariant diffusion models for SBDD. More recent works have incorporated domain knowledge into 3D generative models, such as the correspondence between local fragments and subpockets. Guan et al. (2023a) suggest to break ligands into substructures and model them using decomposed priors in a diffusion framework, leading to remarkably improved binding affinities of the generated molecules. Zhang & Liu (2023) propose a subpocket prototype-augmented 3D molecule generation scheme to establish the relation between subpockets and their corresponding fragments. Existing methods based on deep generative models are powerful at distribution learning. However, when the training examples do not have the desired properties, these models can hardly generate out-of-distribution samples with these properties.

Molecule Optimization Optimization-based algorithms are another popular approach to design drug molecules. Methods within this category rely on predefined computable objectives to guide the optimization. Various optimization methods have been proposed for 2D drug design. JTVAE (Jin et al., 2018) uses Bayesian optimization in the latent space to indirectly optimize molecules. Reinforcement learning is used to manipulate SMILES strings (Olivecrona et al., 2017) and 2D molecular graphs (Zhou et al., 2019; Jin et al., 2020). MARS (Xie et al., 2021) leverages adaptive Markov chain Monte Carlo sampling to accelerate the exploration of chemical space. RetMol develops a retrieval-based generation scheme for iteratively improving molecular properties. Genetic algorithm is also a popular choice. GA+D (Nigam et al., 2020) uses deep learning enhanced genetic algorithm to design SELFIES strings. Graph-GA (Jensen, 2019) conducts genetic algorithm on molecular graph representation. GEGL (Ahn et al., 2020) adopts genetic algorithm to generate high quality samples for imitation learning by deep neural networks. AutoGrow 4 (Spiegel & Durrant, 2020) and RGA (Fu et al., 2022) are genetic algorithms for SBDD which incorporate target structures in molecular optimization. Both of them use molecular docking score as an objective to optimize the fitness between target structure and the generated ligands. In addition, RGA uses neural models to stablize genetic algorithm and includes target structures as a condition in its modeling. To our best knowledge, there are limited efforts on generating 3D molecules using molecular optimization.

Although optimization algorithms offers a direct approach to achieve desired properties, they require computable and accurate objectives to guide the exploration. However, not all desired properties for drug design can be easily formulated as objectives, such as molecular validity. Considering the benefits of both generative models and optimization algorithm, it is reasonable to combine them to achieve further enhanced results.

Controllable Generation De novo design aims to generate molecules from scratch, and the above-mentioned methods mainly focus on this task. Besides it, another line of research focuses on controllable molecule generation, which requires generating or optimizing partial molecules. R-group design is a task to decorate a fixed scaffold with fragments to enhance the desired properties. Langevin et al. (2020) and Maziarz et al. (2022); Imrie et al. (2021) propose to constrain the scaffold region using SMILES-based and 2D graph-based models. However, similar attempts have been rarely observed in 3D molecule generation. Scaffold hopping, on the other hand, requires the replacement of the scaffold to explore novel chemical space while maintaining the favorable decorative substructures. Imrie et al. (2020; 2021) propose autoregressive models to design 2D linker conditioning on geometric features of the input fragments and pharmacophores. Huang et al. (2022); Igashov et al. (2022) extend the application to the 3D space using variational autoencoders and diffusion models. However, existing 3D controllable generation methods are specifically designed for a single task. There lacks an unified framework to cover all possible conditional generation tasks, as well as de novo design.

3 Method

In this section, we will present our method, named DecompOpt, as illustrated in Figure 2. In Section 3.1, we show how to design a controllable and decomposed diffusion model that can generate ligand molecules conditioning on both protein subpockets and reference arms. In Section 3.2, we show how to efficiently optimize the properties of the generated ligand molecules in the decomposed drug space by improving the arm conditions.

3.1 Controllable and Decomposed Diffusion Models

Refer to caption
Figure 2: Illustration of DecompOpt. In each iteration of optimization: (1) For each subpocket, a reference arm is sampled from the ordered arm list. (2) The controllable and decomposed diffusion model generated ligand molecules based on arm (and subpocket) conditions. (3) The generated ligand molecules are collected and further decomposed into scaffolds and arms. (4) Poor arms in the ordered arm lists are replaced with the new arms that show better properties.

A ligand molecule that binds to a specific protein pocket can be naturally decomposed into several components (i.e., arms and scaffold). The arms of a ligand molecule locally interact with subpockets of the target protein. Notably, they are the main contributors to binding affinity. The scaffold links all the arms to form a complete molecule. Inspired by this, Guan et al. (2023b) introduced decomposed priors to diffusion models for SBDD. The decomposed priors not only induce a better variational lower bound as the training objective but also provides possibilities to achieve controllability in molecular generation. Specifically, the decomposed priors allow for relatively independent modeling of each arm. To combine generative models with optimization, a flexible and controllable generation framework is need. Thus we propose a controllable and decomposed diffusion model that allows for fine-grained control over the arms of the generated ligands. Considering the different functionalities of the arms and scaffold, we only control the arms that play important roles in interaction with pockets, and leave room for the generative model on the scaffold to achieve the trade-off between controllability and diversity.

Provided with a target binding site that can be represented as 𝒫={(𝒙i𝒫,𝒗i𝒫)}i{1,,N𝒫}𝒫subscriptsubscriptsuperscript𝒙𝒫𝑖subscriptsuperscript𝒗𝒫𝑖𝑖1subscript𝑁𝒫{\mathcal{P}}=\{({\bm{x}}^{\mathcal{P}}_{i},{\bm{v}}^{\mathcal{P}}_{i})\}_{i% \in\{1,\ldots,N_{\mathcal{P}}\}}caligraphic_P = { ( bold_italic_x start_POSTSUPERSCRIPT caligraphic_P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_v start_POSTSUPERSCRIPT caligraphic_P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) } start_POSTSUBSCRIPT italic_i ∈ { 1 , … , italic_N start_POSTSUBSCRIPT caligraphic_P end_POSTSUBSCRIPT } end_POSTSUBSCRIPT, we aim to generate a ligand molecule that can be represented as ={(𝒙i,𝒗i,𝒃ij)}i,j{1,,N}subscriptsubscriptsuperscript𝒙𝑖subscriptsuperscript𝒗𝑖subscriptsuperscript𝒃𝑖𝑗𝑖𝑗1subscript𝑁{\mathcal{M}}=\{({\bm{x}}^{\mathcal{M}}_{i},{\bm{v}}^{\mathcal{M}}_{i},{\bm{b}% }^{\mathcal{M}}_{ij})\}_{i,j\in\{1,\ldots,N_{\mathcal{M}}\}}caligraphic_M = { ( bold_italic_x start_POSTSUPERSCRIPT caligraphic_M end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_v start_POSTSUPERSCRIPT caligraphic_M end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_b start_POSTSUPERSCRIPT caligraphic_M end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) } start_POSTSUBSCRIPT italic_i , italic_j ∈ { 1 , … , italic_N start_POSTSUBSCRIPT caligraphic_M end_POSTSUBSCRIPT } end_POSTSUBSCRIPT. N𝒫subscript𝑁𝒫N_{\mathcal{P}}italic_N start_POSTSUBSCRIPT caligraphic_P end_POSTSUBSCRIPT and Nsubscript𝑁N_{\mathcal{M}}italic_N start_POSTSUBSCRIPT caligraphic_M end_POSTSUBSCRIPT are the number of atoms in the protein pocket and ligand molecule, respectively. Here 𝒙3𝒙superscript3{\bm{x}}\in\mathbb{R}^{3}bold_italic_x ∈ blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT, 𝒗d𝒗superscript𝑑{\bm{v}}\in\mathbb{R}^{d}bold_italic_v ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT, 𝒃5𝒃superscript5{\bm{b}}\in\mathbb{R}^{5}bold_italic_b ∈ blackboard_R start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT denote the atom position, atom type, and bond type, respectively. A ligand molecule {\mathcal{M}}caligraphic_M can be decomposed into a scaffold 𝒮𝒮{\mathcal{S}}caligraphic_S and several arms {𝒜k}k{1,,K}subscriptsubscript𝒜𝑘𝑘1𝐾\{{\mathcal{A}}_{k}\}_{k\in\{1,\ldots,K\}}{ caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_k ∈ { 1 , … , italic_K } end_POSTSUBSCRIPT, where =𝒮𝒜1𝒜K𝒮subscript𝒜1subscript𝒜𝐾{\mathcal{M}}={\mathcal{S}}\cup{\mathcal{A}}_{1}\cup\cdots\cup{\mathcal{A}}_{K}caligraphic_M = caligraphic_S ∪ caligraphic_A start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∪ ⋯ ∪ caligraphic_A start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT. We denote the subpocket that is within 10Å10Å10\text{\r{A}}10 Å of the atoms of the k𝑘kitalic_k-th arm 𝒜ksubscript𝒜𝑘{\mathcal{A}}_{k}caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT as 𝒫ksubscript𝒫𝑘{\mathcal{P}}_{k}caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. The controllable and decomposed diffusion model is expected to generate a ligand molecule {\mathcal{M}}caligraphic_M given a protein pocket 𝒫𝒫{\mathcal{P}}caligraphic_P and several reference arms {𝒜k}subscript𝒜𝑘\{{{\mathcal{A}}}_{k}\}{ caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT }, which can be formulated as modeling the conditional distribution q(|𝒫,{Ak})𝑞conditional𝒫subscript𝐴𝑘q({\mathcal{M}}|{\mathcal{P}},\{{A}_{k}\})italic_q ( caligraphic_M | caligraphic_P , { italic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } ). And the arms of the generated ligand molecules are expected to be similar to the corresponding reference arms.

Generally, there are two critical modules in our model: a condition encoder and a diffusion-based decoder. We employ an SE(3)-equivariant neural network (Satorras et al., 2021), named EGNN, to encode a reference arm 𝒜ksubscript𝒜𝑘{\mathcal{A}}_{k}caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and its surrounding subpocket 𝒫ksubscript𝒫𝑘{\mathcal{P}}_{k}caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. We introduce subpockets here to include information of intermolecular interaction and relative positions. Specifically, we build a k𝑘kitalic_k-nearest-neighbor (knn) geometric graph on the complex of the reference arm and its surrounding subpocket and apply the EGNN to learn its representation as follows: [𝑨k,𝑷k]=Enc(𝒜k,𝒫k)subscript𝑨𝑘subscript𝑷𝑘Encsubscript𝒜𝑘subscript𝒫𝑘[{\bm{A}}_{k},{\bm{P}}_{k}]=\text{Enc}({\mathcal{A}}_{k},{\mathcal{P}}_{k})[ bold_italic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_italic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ] = Enc ( caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ), where 𝑨k|𝒜k|×Dsubscript𝑨𝑘superscriptsubscript𝒜𝑘𝐷{\bm{A}}_{k}\in\mathbb{R}^{|{\mathcal{A}}_{k}|\times D}bold_italic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT | caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | × italic_D end_POSTSUPERSCRIPT, 𝑷k|𝒫k|×Dsubscript𝑷𝑘superscriptsubscript𝒫𝑘𝐷{\bm{P}}_{k}\in\mathbb{R}^{|{\mathcal{P}}_{k}|\times D}bold_italic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT | caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | × italic_D end_POSTSUPERSCRIPT, []delimited-[][\cdot][ ⋅ ] denotes concatenation along the first dimension. Each row of 𝑨ksubscript𝑨𝑘{\bm{A}}_{k}bold_italic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT (resp. 𝑷ksubscript𝑷𝑘{\bm{P}}_{k}bold_italic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT) corresponds to a condition feature of an atom in the reference arm 𝒜ksubscript𝒜𝑘{\mathcal{A}}_{k}caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT (resp. the subpocket 𝒫ksubscript𝒫𝑘{\mathcal{P}}_{k}caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT). 𝒂k=Agg([𝑨k,𝑷k])Dsubscript𝒂𝑘Aggsubscript𝑨𝑘subscript𝑷𝑘superscript𝐷{\bm{a}}_{k}=\text{Agg}([{\bm{A}}_{k},{\bm{P}}_{k}])\in\mathbb{R}^{D}bold_italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = Agg ( [ bold_italic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_italic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ] ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT is the global SE(3)-invariant condition feature aggregated from the atom-wise condition features.

The diffusion model first perturbs samples by iteratively injecting random noises which are independent of the arm conditions. This leads to the forward process as follows:

q(1:T|0,𝒫,{𝒜k})=q(1:T|0,𝒫)=t=1Tq(t|t1,𝒫),𝑞conditionalsubscript:1𝑇subscript0𝒫subscript𝒜𝑘𝑞conditionalsubscript:1𝑇subscript0𝒫superscriptsubscriptproduct𝑡1𝑇𝑞conditionalsubscript𝑡subscript𝑡1𝒫\displaystyle q({\mathcal{M}}_{1:T}|{\mathcal{M}}_{0},{\mathcal{P}},\{{{% \mathcal{A}}}_{k}\})=q({\mathcal{M}}_{1:T}|{\mathcal{M}}_{0},{\mathcal{P}})=% \prod_{t=1}^{T}q({\mathcal{M}}_{t}|{\mathcal{M}}_{t-1},{\mathcal{P}}),italic_q ( caligraphic_M start_POSTSUBSCRIPT 1 : italic_T end_POSTSUBSCRIPT | caligraphic_M start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , caligraphic_P , { caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } ) = italic_q ( caligraphic_M start_POSTSUBSCRIPT 1 : italic_T end_POSTSUBSCRIPT | caligraphic_M start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , caligraphic_P ) = ∏ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_q ( caligraphic_M start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | caligraphic_M start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT , caligraphic_P ) , (1)

where 0q(|𝒫,{𝒜k})similar-tosubscript0𝑞conditional𝒫subscript𝒜𝑘{\mathcal{M}}_{0}\sim q({\mathcal{M}}|{\mathcal{P}},\{{\mathcal{A}}_{k}\})caligraphic_M start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ∼ italic_q ( caligraphic_M | caligraphic_P , { caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } ) and {t}t=1Tsuperscriptsubscriptsubscript𝑡𝑡1𝑇\{{\mathcal{M}}_{t}\}_{t=1}^{T}{ caligraphic_M start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT is a sequence of perturbed ligands. We will introduced the aforementioned condition features into the reverse (generative) process of the diffusion model as:

pθ(0:T1|T,𝒫,{𝒜k})=t=1Tpθ(t1|t,𝒫,{𝒜k}).subscript𝑝𝜃conditionalsubscript:0𝑇1subscript𝑇𝒫subscript𝒜𝑘superscriptsubscriptproduct𝑡1𝑇subscript𝑝𝜃conditionalsubscript𝑡1subscript𝑡𝒫subscript𝒜𝑘\displaystyle p_{\theta}({\mathcal{M}}_{0:T-1}|{\mathcal{M}}_{T},{\mathcal{P}}% ,\{{{\mathcal{A}}}_{k}\})=\prod_{t=1}^{T}p_{\theta}({\mathcal{M}}_{t-1}|{% \mathcal{M}}_{t},{\mathcal{P}},\{{{\mathcal{A}}}_{k}\}).italic_p start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( caligraphic_M start_POSTSUBSCRIPT 0 : italic_T - 1 end_POSTSUBSCRIPT | caligraphic_M start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT , caligraphic_P , { caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } ) = ∏ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( caligraphic_M start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT | caligraphic_M start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , caligraphic_P , { caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } ) . (2)

To model pθ(t1|t,𝒫,{𝒜k})subscript𝑝𝜃conditionalsubscript𝑡1subscript𝑡𝒫subscript𝒜𝑘p_{\theta}({\mathcal{M}}_{t-1}|{\mathcal{M}}_{t},{\mathcal{P}},\{{{\mathcal{A}% }}_{k}\})italic_p start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( caligraphic_M start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT | caligraphic_M start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , caligraphic_P , { caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } ), for the input (i.e., the ligand molecule being generated at time step t𝑡titalic_t) of the diffusion-based decoder, we denote the SE(3)-invariant feature of its each arm atom as 𝒗i𝒜subscriptsuperscript𝒗𝒜𝑖{\bm{v}}^{{\mathcal{A}}}_{i}bold_italic_v start_POSTSUPERSCRIPT caligraphic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and each scaffold atom as 𝒗isubscriptsuperscript𝒗𝑖{\bm{v}}^{{\mathcal{M}}}_{i}bold_italic_v start_POSTSUPERSCRIPT caligraphic_M end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. For each arm atom that belongs to its k𝑘kitalic_k-th arm, we incorporate the aforementioned arm condition as 𝒗~i𝒜=MLP([𝒗i𝒜,𝒂k])subscriptsuperscript~𝒗𝒜𝑖MLPsubscriptsuperscript𝒗𝒜𝑖subscript𝒂𝑘\tilde{{\bm{v}}}^{\mathcal{A}}_{i}=\text{MLP}([{\bm{v}}^{{\mathcal{A}}}_{i},{% \bm{a}}_{k}])over~ start_ARG bold_italic_v end_ARG start_POSTSUPERSCRIPT caligraphic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = MLP ( [ bold_italic_v start_POSTSUPERSCRIPT caligraphic_A end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ] ). For each scaffold atom, we do not introduce any condition (i.e., 𝒗~i𝒮MLP(𝒗i𝒮)subscriptsuperscript~𝒗𝒮𝑖MLPsubscriptsuperscript𝒗𝒮𝑖\tilde{{\bm{v}}}^{\mathcal{S}}_{i}\coloneqq\text{MLP}({\bm{v}}^{\mathcal{S}}_{% i})over~ start_ARG bold_italic_v end_ARG start_POSTSUPERSCRIPT caligraphic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ≔ MLP ( bold_italic_v start_POSTSUPERSCRIPT caligraphic_S end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT )) and leave enough room for the generative model to generate diverse scaffolds. For each atom of the protein pocket, we let 𝒗i𝒫=MLP([𝒗i𝒫,Agg([MLP(𝒗i𝒫1),,MLP(𝒗i𝒫K)])){\bm{v}}^{\mathcal{P}}_{i}=\text{MLP}([{\bm{v}}^{\mathcal{P}}_{i},\text{Agg}([% \text{MLP}({\bm{v}}^{{\mathcal{P}}_{1}}_{i}),\cdots,\text{MLP}({\bm{v}}^{{% \mathcal{P}}_{K}}_{i})]))bold_italic_v start_POSTSUPERSCRIPT caligraphic_P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = MLP ( [ bold_italic_v start_POSTSUPERSCRIPT caligraphic_P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , Agg ( [ MLP ( bold_italic_v start_POSTSUPERSCRIPT caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) , ⋯ , MLP ( bold_italic_v start_POSTSUPERSCRIPT caligraphic_P start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ] ) ), where 𝒗i𝒫ksubscriptsuperscript𝒗subscript𝒫𝑘𝑖{\bm{v}}^{{\mathcal{P}}_{k}}_{i}bold_italic_v start_POSTSUPERSCRIPT caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is the atom condition feature that corresponds to a specific row of 𝑷ksubscript𝑷𝑘{\bm{P}}_{k}bold_italic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT if this atom belongs to the k𝑘kitalic_k-th subpocket and is set to be 𝟎0{\bm{0}}bold_0 otherwise. For the SE(3)-equivariant feature (i.e., the coordinate in 3D space) of each atom, we do not introduce any conditions. Nevertheless, the geometric information is embedded in the SE(3)-invariant features thanks to the particular design of EGNN. After the input feature is augmented by the condition, the rest of the diffusion-based decoder mainly follows DecompDiff (Guan et al., 2023b), including the decomposed prior distribution, model architecture, training loss, etc.

Algorithm 1 Optimization Process
A specific protein pocket 𝒫𝒫{\mathcal{P}}caligraphic_P with detected subpockets {𝒫k}k=1,,Ksubscriptsubscript𝒫𝑘𝑘1𝐾\{{\mathcal{P}}_{k}\}_{k=1,\ldots,K}{ caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_k = 1 , … , italic_K end_POSTSUBSCRIPT, a reference ligand {\mathcal{M}}caligraphic_M, pre-trained decomposed and controllable diffusion model denoted as Enc()Enc\text{Enc}(\cdot)Enc ( ⋅ ) and DiffDec()DiffDec\text{DiffDec}(\cdot)DiffDec ( ⋅ )
OAL(𝒫k)OALsubscript𝒫𝑘\text{OAL}({\mathcal{P}}_{k})OAL ( caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
# Intialize all ordered arm lists for arms.
{𝒜k}Decompose(,𝒫,{𝒫k})subscript𝒜𝑘Decompose𝒫subscript𝒫𝑘\{{\mathcal{A}}_{k}\}\leftarrow\text{Decompose}({\mathcal{M}},{\mathcal{P}},\{% {\mathcal{P}}_{k}\}){ caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } ← Decompose ( caligraphic_M , caligraphic_P , { caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } )
for k1𝑘1k\leftarrow 1italic_k ← 1 to K𝐾Kitalic_K do
     # sksubscript𝑠𝑘s_{k}italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is the evaluated score.
     (sk,𝒜k)DockEval(𝒜k,𝒫k)subscript𝑠𝑘subscript𝒜𝑘DockEvalsubscript𝒜𝑘subscript𝒫𝑘(s_{k},{\mathcal{A}}_{k})\leftarrow\text{DockEval}({\mathcal{A}}_{k},{\mathcal% {P}}_{k})( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ← DockEval ( caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
     OAL(𝒫k)OALsubscript𝒫𝑘\text{OAL}({\mathcal{P}}_{k})OAL ( caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) (sk,𝒜k)absentdelimited-⟨⟩subscript𝑠𝑘subscript𝒜𝑘\leftarrow\langle(s_{k},{\mathcal{A}}_{k})\rangle← ⟨ ( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ⟩
end for
# Start iterative optimization.
for i1𝑖1i\leftarrow 1italic_i ← 1 to N𝑁Nitalic_N do
     # The inner loop is just for better illustration
     # and is paralled as a batch in practice.
     for j1𝑗1j\leftarrow 1italic_j ← 1 to B𝐵Bitalic_B do
         Sample {𝒜k}subscript𝒜𝑘\{{\mathcal{A}}_{k}\}{ caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } from OAL(𝒫k)OALsubscript𝒫𝑘\text{OAL}({\mathcal{P}}_{k})OAL ( caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
         DiffDec(𝒫,{Enc(𝒜k,𝒫k)})DiffDec𝒫Encsubscript𝒜𝑘subscript𝒫𝑘{\mathcal{M}}\leftarrow\text{DiffDec}({\mathcal{P}},\{\text{Enc}({\mathcal{A}}% _{k},{\mathcal{P}}_{k})\})caligraphic_M ← DiffDec ( caligraphic_P , { Enc ( caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) } )
         {𝒜k}Decompose(,𝒫,{𝒫k})subscript𝒜𝑘Decompose𝒫subscript𝒫𝑘\{{\mathcal{A}}_{k}\}\leftarrow\text{Decompose}({\mathcal{M}},{\mathcal{P}},\{% {\mathcal{P}}_{k}\}){ caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } ← Decompose ( caligraphic_M , caligraphic_P , { caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } )
         for k1𝑘1k\leftarrow 1italic_k ← 1 to K𝐾Kitalic_K do
              (sk,𝒜k)DockEval(𝒜k,𝒫k)subscript𝑠𝑘subscript𝒜𝑘DockEvalsubscript𝒜𝑘subscript𝒫𝑘(s_{k},{\mathcal{A}}_{k})\leftarrow\text{DockEval}({\mathcal{A}}_{k},{\mathcal% {P}}_{k})( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ← DockEval ( caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
              Append (sk,𝒜k)subscript𝑠𝑘subscript𝒜𝑘(s_{k},{\mathcal{A}}_{k})( italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , caligraphic_A start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) to OAL(𝒫k)OALsubscript𝒫𝑘\text{OAL}({\mathcal{P}}_{k})OAL ( caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
              Sort OAL(𝒫k)OALsubscript𝒫𝑘\text{OAL}({\mathcal{P}}_{k})OAL ( caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) by sksubscript𝑠𝑘s_{k}italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT
              Keep top-M𝑀Mitalic_M elements in OAL(𝒫k)OALsubscript𝒫𝑘\text{OAL}({\mathcal{P}}_{k})OAL ( caligraphic_P start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT )
         end for
     end for
end for

To shed lights on the insights behind the dedicated model design, we will discuss more about our special considerations as follows. The encoded conditions follow the principle of decomposition. It is notable that different reference arms that locally interact with different subpockets are encoded and play roles in parallel, which allows us to control the generation more flexibly. For example, we can control the different arms of the generated ligand molecules separately. Another concern is about diversity. We do not explicitly introduce any regularization like VAE (Kingma & Welling, 2013) over the representation space of conditions in consideration of the unsmooth nature of chemical space. Nevertheless, the source of randomness is two-fold: the sampling procedure of the diffusion model and the degree of freedom of scaffold. Notably, the scaffold and arms will impact each other in the generative process and the randomness of scaffolds will also flow into arms. Expectedly, the arm of the generated ligand molecule will be similar to its corresponding reference arm but not exactly the same, which is the workhorse of our framework. This characteristic reflects both the abilities of exploration and exploitation, which is critical to the optimization process that will be introduced in the following.

3.2 Optimization in the Decomposed Drug Space

Thanks to the controllability and decomposition of the model introduced above, we can optimize the ligand molecules in the decomposed drug space. We will introduce the optimization process of DecompOpt as follows.

Due to the characteristics of decomposition, different arms that locally interact with different subpockets can be evaluated seperately. Thus we can define a score for each arm, which can be a single objective or a value that scalarizes multiple objectives by weighted sum. The following optimization process is oriented towards a given protein pocket. For each subpocket of the given protein pocket, we build an ordered list with a certain max size to store potential arms sorted by their scores. We can initialize the ordered arm lists (OAL) by decomposing reference ligands or ligand molecules generated by generative models. In each iteration of optimization, we use the controllable and decomposed diffusion models to generate a batch of ligand molecules conditioned on reference arms sampled from the ordered arm lists and further decompose them to get new arm candidates. The new arms are first refined by re-docking and then evaluated by oracles. Here we introduce an optional re-docking procedure to assign each arm with a higher-quality pose so that better interaction information can be injected into arm conditions. Then the new arms are inserted into the corresponding ordered lists and the arms with bad scores will be remove to keep predefined max size. As the process goes on, more and more arms that can well interact with the subpockets will be discovered.

The optimization process based DecompOpt is summarized as Algorithm 1. This optimization process shares similar ideas with well-recognized evolutionary algorithms. Arms in each ordered arm list evolve towards desired properties. The controllable generation can be viewed as one kind of mutation. But, differently, generative mutation is more efficient than the widely-used mutation defined by rule-based perturbation.

4 Experiments

4.1 Experimental Setup

Dataset

We utilized the CrossDocked2020 dataset (Francoeur et al., 2020) to train and evaluate our model. Additionally, we adopted the same filtering and splitting strategies as the previous work (Luo et al., 2021; Peng et al., 2022; Guan et al., 2023a). The strategy focuses on retaining high-quality complexes (RMSD <1absent1<1< 1Å) and diverse proteins (sequence identity <<< 30%percent3030\%30 %), leading to 100,000100000100,000100 , 000 protein-binding complexes for training and 100100100100 novel protein for testing.

Implementation Details For iterative optimization, we select arms with desired properties as conditions. We initialize the list of arms with 20 molecules generated by DecompDiff in our experiment. To better align with the practical requirements of pharmaceutical practice, where the goal is to generate molecules with high drug likeness, synthesis feasibility, binding affinity, and other pertinent properties, we introduced a multi-objective optimization score to effectively balance these different objectives. In each iteration, we evaluate QED, SA, Vina Min (which will be introduced in detail later) of decomposed arms from generated molecules, then calculate Z-score(xi)=(ximean(X))/std(X),xiXformulae-sequenceZ-scoresubscript𝑥𝑖subscript𝑥𝑖mean𝑋std𝑋subscript𝑥𝑖𝑋\text{Z-score}(x_{i})=(x_{i}-\text{mean}(X))/\text{std}(X),\ x_{i}\in XZ-score ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) = ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - mean ( italic_X ) ) / std ( italic_X ) , italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ italic_X (also known as the standard score (Zill, 2020)) of each property, where X𝑋Xitalic_X denotes the set of evaluated values of a specific property. The Z-scores of each property are aggregated with same weights as criteria for selecting the top-M𝑀Mitalic_M arms as next iteration conditions. In our experiment, we conduct 30 rounds of optimization and sample 20 molecules in each round. For top-M𝑀Mitalic_M arm selection, we set M𝑀Mitalic_M=3. For each protein pocket, if the average properties of sampled ligand molecules are no longer improved, the optimization process will be early stopped.

Baselines There are two types of baselines from the perspective of generation and optimization, respectively. Generation Perspective: We compare our model with various representative generative baselines: liGAN (Ragoza et al., 2022) is a 3D CNN-based conditional VAE model which generate ligand molecules in atomic density grids. AR (Luo et al., 2021), Pocket2Mol (Peng et al., 2022) and GraphBP (Liu et al., 2022) are GNN-based methods that generate 3D molecules atom by atom in an autoregressive manner. TargetDiff (Guan et al., 2023a) is a diffusion-based method which generates atom coordinates and atom types in a non-autoregressive way, and the prior distribution is a standard Gaussian and bonds are generated with OpenBabel (O’Boyle et al., 2011). DecompDiff (Guan et al., 2023b) is a diffusion-based method with decomposed priors and validity guidance which generates atoms and bonds of 3D ligand molecules in an end-to-end manner. Decompdiff has three optional decomposed priors: reference priors, pocket priors, and optimal priors. Our method also follows this setting. Optimization Perspective: We choose the most related work, RGA (Fu et al., 2022), which is a reinforced genetic algorithm with policy networks and also focus on structure-based molecular optimization, as the baseline. Besides, we also introduced the controllability into TargetDiff by whole molecule conditions rather than arm conditions. We name this baseline as TargetDiff w/ Opt..

Table 1: Summary of different properties of reference molecules and molecules generated by our model and other generation (Gen.) and optimization (Opt.) baselines. (\uparrow) / (\downarrow) denotes a larger / smaller number is better. Top 2 results are highlighted with bold text and underlined text, respectively.

Methods Vina Score (\downarrow) Vina Min (\downarrow) Vina Dock (\downarrow) High Affinity (\uparrow) QED (\uparrow) SA (\uparrow) Diversity (\uparrow) Success Avg. Med. Avg. Med. Avg. Med. Avg. Med. Avg. Med. Avg. Med. Avg. Med. Rate (\uparrow) Reference -6.36 -6.46 -6.71 -6.49 -7.45 -7.26 - - 0.48 0.47 0.73 0.74 - - 25.0% Gen. LiGAN - - - - -6.33 -6.20 21.1% 11.1% 0.39 0.39 0.59 0.57 0.66 0.67 3.9% GraphBP - - - - -4.80 -4.70 14.2% 6.7% 0.43 0.45 0.49 0.48 0.79 0.78 0.1% AR -5.75 -5.64 -6.18 -5.88 -6.75 -6.62 37.9% 31.0% 0.51 0.50 0.63 0.63 0.70 0.70 7.1% Pocket2Mol -5.14 -4.70 -6.42 -5.82 -7.15 -6.79 48.4% 51.0% 0.56 0.57 0.74 0.75 0.69 0.71 24.4% TargetDiff -5.47 -6.30 -6.64 -6.83 -7.80 -7.91 58.1% 59.1% 0.48 0.48 0.58 0.58 0.72 0.71 10.5% DecompDiff -5.67 -6.04 -7.04 -7.09 -8.39 -8.43 64.4% 71.0% 0.45 0.43 0.61 0.60 0.68 0.68 24.5% Opt. RGA - - - - -8.01 -8.17 64.4% 89.3% 0.57 0.57 0.71 0.73 0.41 0.41 46.2% Gen. + Opt. TargetDiff w/ Opt. -7.87 -7.48 -7.82 -7.48 -8.30 -8.15 71.5% 95.9% 0.66 0.68 0.68 0.67 0.31 0.30 25.8% DecompOpt -5.87 -6.81 -7.35 -7.72 -8.98 -9.01 73.5% 93.3% 0.48 0.45 0.65 0.65 0.60 0.61 52.5%

Metircs

We evaluate the generated molecules from target binding affinity and molecular properties. We employ AutoDock Vina (Eberhardt et al., 2021) to estimate the target binding affinity, following the same setup as Luo et al. (2021); Ragoza et al. (2022). We collect all generated molecules across 100 test proteins and report the mean and median of affinity-related metrics (Vina Score, Vina Min, Vina Dock, and High Affinity) and property-related metrics (drug-likeness QED (Bickerton et al., 2012), synthesizability SA (Ertl & Schuffenhauer, 2009), and diversity). Vina Score directly estimates the binding affinity based on the generated 3D molecules, Vina Min conducts a local structure minimization before estimation, Vina Dock involves a re-docking process and reflects the best possible binding affinity, and High Affinity measures the percentage of how many generated molecules binds better than the reference molecule per test protein. Following Yang et al. (2021); Long et al. (2022); Guan et al. (2023b); Yang et al. (2024), we further report the percentage of molecules which pass certain criteria (QED >0.25absent0.25>0.25> 0.25, SA >0.59absent0.59>0.59> 0.59, Vina Dock <8.18absent8.18<-8.18< - 8.18) as Success Rate for comprehensive evaluation, considering that practical drug design also requires the generated molecules to be drug-like, synthesizable, and maintain high binding affinity simultaneously (Jin et al., 2020; Xie et al., 2021). Additionally, we also evaluate the generated molecules from the perspective of molecular conformation. Please refer to Appendix B for more results.

4.2 Main Results

We evaluate the effectiveness of our model in terms of binding affinity and molecular properties. As shown in Table 1, DecompOpt outperforms baselines by a large margin in affinity-related metrics and Success Rate, a comprehensive metric. Specifically, DecompOpt surpasses the strong baseline DecompDiff in all metrics, except diversity. All these gains clearly indicate that the optimization works as we expected and our method better aligns with the goal of drug discovery compared with the generative baselines.

Notably, RGA, which also focuses on structure-based molecular optimization, achieves promising Sucess Rate. And our method performs even better. More importantly, molecular optimization methods like RGA inevitably encounter the pitfalls of low diversity and inefficiency. Because these optimization methods usually start with a reference ligand and iteratively make small changes on it to gradually improve the property. They are easy to be trapped in local solutions. However, DecompOpt inherites advantages of both optimization and generation, achieving high Success Rate and considerable diversity at the same time. See Appendix C for additional results.

Table 2: Summary of results of single-objective optimization. The improvements of DecompOpt over the baseline are highlighted in green.
Table 3: Comparison of optimization strategies using molecule-level and arm-level conditions.
Property DecompDiff DecompOpt
Avg. Med. Avg. Med.
QED (\uparrow) 0.48 0.48 0.52 (+8.3%) 0.53 (+10.4%)
SA (\uparrow) 0.67 0.66 0.74 (+10.5%) 0.74 (+12.1%)
Vina Min (\downarrow) -6.01 -5.98 -6.72 (+11.8%) -6.72 (+12.4%)
Method Vina Min (\downarrow)
Avg. Med.
DecompDiff -6.01 -5.98
Molecule-level Opt. -6.62 -6.66
Arm-level Opt. -6.72 -6.72
Table 3: Comparison of optimization strategies using molecule-level and arm-level conditions.

4.3 Ablation Studies

Single-Objective Optimization

To further validate the effectiveness of DecompOpt, we test our method on the setting of single-objective optimization with reference priors. Specifically, we use QED, SA, and Vina Min as the objective and analyze the results of three experiments, respectively. As shown in Table 3, each experiment effectively improves the corresponding property.

Benefits of Decomposed Optimization The decomposed drug space allows for optimizing each arm of a ligand molecule separately. In our method, arms of each subpockets are evaluated and selected independently (namely, arm-level optimization). We also tried evaluating the property of the whole ligand molecules, choosing those with desired property, and decomposing them to serve as the arm conditions in the next optimization iteration (namely, molecule-level optimization). We compare these two optimization under the setting of reference prior. As shown in Table 3, arm-wise optimization performs better than molecule-wise optimization, which demonstrates benefits brought by decomposition in terms of optimization efficiency.

4.4 Controllability

Various molecular optimization scenarios, including R-group design and scaffold hopping, play a crucial role in real-world drug discovery. They enhance binding affinity, potency, and other relevant molecular properties with greater precision. Our controllable and decomposed diffusion model seamlessly integrates with these scenarios by incorporating expert knowledge through decomposed arm conditions, better aligned with the demands and objectives of the pharmaceutical industry.

Table 4: Scaffold hopping results of Decompdiff and DecompOpt on CrossDocked2020 test set.

Methods Valid (\uparrow) Unique (\uparrow) Novel (\uparrow) Complete   Rate (\uparrow) Scaffold Similarity (\downarrow) DecompDiff+Inpainting 0.95 0.48 0.85 89.2% 0.40 DecompOpt +Inpainting 0.96 0.46 0.88 93.0% 0.35

R-group Design

R-group optimization is a widely used technique to optimize molecules’ substituents for improving biological activity. Our model is well-suited for this task by employing finer-level arm conditions to guide the optimization. To achieve the optimization goal, we decompose the compound into a scaffold and multiple arms. Subsequently, we choose a specific arm for optimization to enhance its binding affinity. The optimization process involves conditional inpainting (Lugmayr et al., 2022) inspired by Schneuing et al. (2022).

We diffuse the remaining part of the compound while predicting the selected arm, which is conditioned on the reference arm at each step. We initialize the list of arms by the arms defined as r-group to optimize. From the molecules generated through this process, we can select substituents with higher Vina Min Score to serve as the condition for the next iteration. Results of R-group optimization on protein 3DAF and 4F1M are presented in Figure 3. More results on protein 4G3D can be found in Appendix D. After 30 rounds of optimization, our generated molecules achieve a vina minimize score more than 1 kcal/mol better than the reference. Moreover, we compare DecompOpt with Decompdiff for R-group optimization on Vina minimize, Tanimoto similarity, and Complete rate with detailed result in Appendix D. Tanimoto similarity is calculated using rdkit GetMorganFingerprint and TanimotoSimilarity functions. Complete rate measures the proportion of completed molecules in generated result. As Table 14 shows, the decomposed conditional arms bring greater controllability over the shape, positioning, and properties of the generated substituents compared to diffusion inpainting.

Fragment growing is a another technique for R-group design. Unlike R-group optimization, fragment growing aims to design new substituents instead of optimization existing ones. By designing novel arms priors and predicting the number of atoms through chemistry software or expert guidance, DecompOpt can naturally facilitate incremental growth and optimization of newly generated arms, leading to improved biological activity. A case study on the application of fragment growing can be found in Appendix D.

Scaffold Hopping

[Uncaptioned image]
Figure 3: Visualization of reference binding molecules (left column), molecules generated by DecompOpt (middle and right column) with 30 rounds of optimization on protein 3DAF (top row) and 4F1M (bottom row). Optimized R-group are highlighted in red.
[Uncaptioned image]
Figure 4: Examples of scaffold hopping accomplished by DecompOpt. For each row, the left image shows the reference ligand, the middle and right images are two examples generated by DecompOpt. Reference and generated scaffolds are highlighted in green.

In contrast to R-group design, scaffold hopping involves the generation of scaffolds to connect pivotal groups, which play a critical role in interaction with proteins. We apply our decomposed conditional diffusion model to scaffold hopping by inpainting scaffold in fixed arms and incorporating pivotal arms as structural conditions. We generate 20 molecules for each target on our test set. Following Huang et al. (2022), we measure the Validity, Uniqueness, and Novelty of generated molecules. Additionally, we compute the Complete Rate, which measures the proportion of successfully constructed molecules with all atoms connected. To better understand the influence of conditional arms on scaffold generation, we estimate Scaffold Similarity between generated scaffold and reference scaffold following Polykovskiy et al. (2020). A detailed description of scaffold hopping evaluation metrics can be found in Appendix D. The supplementary information provided indicates the inclusion of arm conditions can influence scaffold generation through message passing, leading to a more consistent scaffold when compared to diffusion inpainting without arm conditions. As Table 4 shows, our model achieves higher validity and complete rate. Higher novelty and lower scaffold similarity indicate that our model is better at maintaining diversity and exploring novel molecules while controlling pivotal groups. In Figure 4, we show results of scaffold hopping on 1R1H and 5AEH.

More visualization results can be found in Appendix D.

5 Conclusions

In this work, we proposed a controllable and decomposed diffusion model for structure-based molecular optimization and opened a new paradigm that combines generation and optimization for structure-based drug design. Our method shows promising performance on both de novo design and controllable generation, indicating its great potential in drug discovery. We would like to point out that in our current controllable diffusion models, we did not explore the best way for multi-objective optimization. We plan to investigate it in our future work.

References

  • Ahn et al. (2020) Sungsoo Ahn, Junsu Kim, Hankook Lee, and Jinwoo Shin. Guiding deep molecular optimization with genetic exploration. Advances in neural information processing systems, 33:12008–12021, 2020.
  • Anderson (2003) Amy C Anderson. The process of structure-based drug design. Chemistry & biology, 10(9):787–797, 2003.
  • Baillif et al. (2023) Benoit Baillif, Jason Cole, Patrick McCabe, and Andreas Bender. Deep generative models for 3d molecular structure. Current Opinion in Structural Biology, 80:102566, 2023.
  • Bickerton et al. (2012) G Richard Bickerton, Gaia V Paolini, Jérémy Besnard, Sorel Muresan, and Andrew L Hopkins. Quantifying the chemical beauty of drugs. Nature chemistry, 4(2):90–98, 2012.
  • Böhm et al. (2004) Hans-Joachim Böhm, Alexander Flohr, and Martin Stahl. Scaffold hopping. Drug discovery today: Technologies, 1(3):217–224, 2004.
  • Bonchev (1991) Danail Bonchev. Chemical graph theory: introduction and fundamentals, volume 1. CRC Press, 1991.
  • Eberhardt et al. (2021) Jerome Eberhardt, Diogo Santos-Martins, Andreas F Tillack, and Stefano Forli. Autodock vina 1.2. 0: New docking methods, expanded force field, and python bindings. Journal of chemical information and modeling, 61(8):3891–3898, 2021.
  • Ertl & Schuffenhauer (2009) Peter Ertl and Ansgar Schuffenhauer. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. Journal of cheminformatics, 1(1):1–11, 2009.
  • Francoeur et al. (2020) Paul G Francoeur, Tomohide Masuda, Jocelyn Sunseri, Andrew Jia, Richard B Iovanisci, Ian Snyder, and David R Koes. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. Journal of chemical information and modeling, 60(9):4200–4215, 2020.
  • Fu et al. (2022) Tianfan Fu, Wenhao Gao, Connor Coley, and Jimeng Sun. Reinforced genetic algorithm for structure-based drug design. Advances in Neural Information Processing Systems, 35:12325–12338, 2022.
  • Guan et al. (2023a) Jiaqi Guan, Wesley Wei Qian, Xingang Peng, Yufeng Su, Jian Peng, and Jianzhu Ma. 3d equivariant diffusion for target-aware molecule generation and affinity prediction. In International Conference on Learning Representations, 2023a.
  • Guan et al. (2023b) Jiaqi Guan, Xiangxin Zhou, Yuwei Yang, Yu Bao, Jian Peng, Jianzhu Ma, Qiang Liu, Liang Wang, and Quanquan Gu. DecompDiff: Diffusion models with decomposed priors for structure-based drug design. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (eds.), Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp.  11827–11846. PMLR, 23–29 Jul 2023b. URL https://proceedings.mlr.press/v202/guan23a.html.
  • Halgren (1996) Thomas A Halgren. Merck molecular force field. i. basis, form, scope, parameterization, and performance of mmff94. Journal of computational chemistry, 17(5-6):490–519, 1996.
  • Hartenfeller & Schneider (2011) Markus Hartenfeller and Gisbert Schneider. De novo drug design. Chemoinformatics and computational chemical biology, pp. 299–323, 2011.
  • Ho et al. (2020) Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
  • Huang et al. (2022) Yinan Huang, Xingang Peng, Jianzhu Ma, and Muhan Zhang. 3dlinker: an e (3) equivariant variational autoencoder for molecular linker design. arXiv preprint arXiv:2205.07309, 2022.
  • Igashov et al. (2022) Ilia Igashov, Hannes Stärk, Clément Vignac, Victor Garcia Satorras, Pascal Frossard, Max Welling, Michael Bronstein, and Bruno Correia. Equivariant 3d-conditional diffusion models for molecular linker design. arXiv preprint arXiv:2210.05274, 2022.
  • Imrie et al. (2020) Fergus Imrie, Anthony R Bradley, Mihaela van der Schaar, and Charlotte M Deane. Deep generative models for 3d linker design. Journal of chemical information and modeling, 60(4):1983–1995, 2020.
  • Imrie et al. (2021) Fergus Imrie, Thomas E Hadfield, Anthony R Bradley, and Charlotte M Deane. Deep generative design with 3d pharmacophoric constraints. Chemical science, 12(43):14577–14589, 2021.
  • Jensen (2019) Jan H Jensen. A graph-based genetic algorithm and generative model/monte carlo tree search for the exploration of chemical space. Chemical science, 10(12):3567–3572, 2019.
  • Jin et al. (2018) Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Junction tree variational autoencoder for molecular graph generation. In International conference on machine learning, pp. 2323–2332. PMLR, 2018.
  • Jin et al. (2020) Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Multi-objective molecule generation using interpretable substructures. In International conference on machine learning, pp. 4849–4859. PMLR, 2020.
  • Kingma & Ba (2014) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Kingma & Welling (2013) Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
  • Krenn et al. (2020) Mario Krenn, Florian Häse, AkshatKumar Nigam, Pascal Friederich, and Alan Aspuru-Guzik. Self-referencing embedded strings (selfies): A 100% robust molecular string representation. Machine Learning: Science and Technology, 1(4):045024, 2020.
  • Langdon et al. (2010) Sarah R Langdon, Peter Ertl, and Nathan Brown. Bioisosteric replacement and scaffold hopping in lead generation and optimization. Molecular informatics, 29(5):366–385, 2010.
  • Langevin et al. (2020) Maxime Langevin, Hervé Minoux, Maximilien Levesque, and Marc Bianciotto. Scaffold-constrained molecular generation. Journal of Chemical Information and Modeling, 60(12):5637–5646, 2020.
  • Lin et al. (2022) Haitao Lin, Yufei Huang, Meng Liu, Xuanjing Li, Shuiwang Ji, and Stan Z Li. Diffbp: Generative diffusion of 3d molecules for target protein binding. arXiv preprint arXiv:2211.11214, 2022.
  • Liu et al. (2022) Meng Liu, Youzhi Luo, Kanji Uchino, Koji Maruhashi, and Shuiwang Ji. Generating 3d molecules for target protein binding. arXiv preprint arXiv:2204.09410, 2022.
  • Long et al. (2022) Siyu Long, Yi Zhou, Xinyu Dai, and Hao Zhou. Zero-shot 3d drug design by sketching and generating. arXiv preprint arXiv:2209.13865, 2022.
  • Lugmayr et al. (2022) Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. Repaint: Inpainting using denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  11461–11471, 2022.
  • Luo et al. (2021) Shitong Luo, Jiaqi Guan, Jianzhu Ma, and Jian Peng. A 3d generative model for structure-based drug design. Advances in Neural Information Processing Systems, 34:6229–6239, 2021.
  • Maziarz et al. (2022) Krzysztof Maziarz, Henry Richard Jackson-Flux, Pashmina Cameron, Finton Sirockin, Nadine Schneider, Nikolaus Stiefl, Marwin Segler, and Marc Brockschmidt. Learning to extend molecular scaffolds with structural motifs. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=ZTsoE8G3GG.
  • Nichol & Dhariwal (2021) Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pp. 8162–8171. PMLR, 2021.
  • Nigam et al. (2020) AkshatKumar Nigam, Pascal Friederich, Mario Krenn, and Alan Aspuru-Guzik. Augmenting genetic algorithms with deep neural networks for exploring the chemical space. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=H1lmyRNFvr.
  • O’Boyle et al. (2011) Noel M O’Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch, and Geoffrey R Hutchison. Open babel: An open chemical toolbox. Journal of cheminformatics, 3(1):1–14, 2011.
  • Olivecrona et al. (2017) Marcus Olivecrona, Thomas Blaschke, Ola Engkvist, and Hongming Chen. Molecular de-novo design through deep reinforcement learning. Journal of cheminformatics, 9(1):1–14, 2017.
  • Peng et al. (2022) Xingang Peng, Shitong Luo, Jiaqi Guan, Qi Xie, Jian Peng, and Jianzhu Ma. Pocket2mol: Efficient molecular sampling based on 3d protein pockets. In International Conference on Machine Learning, pp. 17644–17655. PMLR, 2022.
  • Polykovskiy et al. (2020) Daniil Polykovskiy, Alexander Zhebrak, Benjamin Sanchez-Lengeling, Sergey Golovanov, Oktai Tatanov, Stanislav Belyaev, Rauf Kurbanov, Aleksey Artamonov, Vladimir Aladinskiy, Mark Veselov, et al. Molecular sets (moses): a benchmarking platform for molecular generation models. Frontiers in pharmacology, 11:565644, 2020.
  • Ragoza et al. (2022) Matthew Ragoza, Tomohide Masuda, and David Ryan Koes. Generating 3d molecules conditional on receptor binding sites with deep generative models. Chemical science, 13(9):2701–2713, 2022.
  • Satorras et al. (2021) Vıctor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E (n) equivariant graph neural networks. In International conference on machine learning, pp. 9323–9332. PMLR, 2021.
  • Schneuing et al. (2022) Arne Schneuing, Yuanqi Du, Charles Harris, Arian Jamasb, Ilia Igashov, Weitao Du, Tom Blundell, Pietro Lió, Carla Gomes, Max Welling, Michael Bronstein, and Bruno Correia. Structure-based drug design with equivariant diffusion models. arXiv preprint arXiv:2210.13695, 2022.
  • Spiegel & Durrant (2020) Jacob O Spiegel and Jacob D Durrant. Autogrow4: an open-source genetic algorithm for de novo drug design and lead optimization. Journal of cheminformatics, 12(1):1–16, 2020.
  • Takeuchi et al. (2021) Kosuke Takeuchi, Ryo Kunimoto, and Jürgen Bajorath. R-group replacement database for medicinal chemistry. Future Science OA, 7(8):FSO742, 2021.
  • Vamathevan et al. (2019) Jessica Vamathevan, Dominic Clark, Paul Czodrowski, Ian Dunham, Edgardo Ferran, George Lee, Bin Li, Anant Madabhushi, Parantu Shah, Michaela Spitzer, et al. Applications of machine learning in drug discovery and development. Nature reviews Drug discovery, 18(6):463–477, 2019.
  • Weininger (1988) David Weininger. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences, 28(1):31–36, 1988.
  • Xie et al. (2021) Yutong Xie, Chence Shi, Hao Zhou, Yuwei Yang, Weinan Zhang, Yong Yu, and Lei Li. {MARS}: Markov molecular sampling for multi-objective drug discovery. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=kHSu4ebxFXY.
  • Yang et al. (2021) Yuwei Yang, Siqi Ouyang, Meihua Dang, Mingyue Zheng, Lei Li, and Hao Zhou. Knowledge guided geometric editing for unsupervised drug design. 2021. URL https://openreview.net/forum?id=91muTwt1_t5.
  • Yang et al. (2024) Yuwei Yang, Siqi Ouyang, Xueyu Hu, Meihua Dang, Mingyue Zheng, Hao Zhou, and Lei Li. Structure-based drug design via 3d molecular generative pre-training and sampling. arxiv, 2024. URL https://arxiv.org/abs/2402.14315v1.
  • Zhang & Liu (2023) Zaixi Zhang and Qi Liu. Learning subpocket prototypes for generalizable structure-based drug design. arXiv preprint arXiv:2305.13997, 2023.
  • Zhang et al. (2022) Zaixi Zhang, Yaosen Min, Shuxin Zheng, and Qi Liu. Molecule generation for target protein binding with structural motifs. In The Eleventh International Conference on Learning Representations, 2022.
  • Zhou et al. (2019) Zhenpeng Zhou, Steven Kearnes, Li Li, Richard N Zare, and Patrick Riley. Optimization of molecules via deep reinforcement learning. Scientific reports, 9(1):10752, 2019.
  • Zill (2020) Dennis G Zill. Advanced engineering mathematics. Jones & Bartlett Learning, 2020.

Appendix A Implementation Details

In this section, we will provide more implementation details of our methods. Though some contents, such as the SE(3)-equivariant layer and the loss function, can be referred to Guan et al. (2023b), we still include them here to make our paper more self-containing.

A.1 Featurization

We follow the decomposition algorithm proposed by Guan et al. (2023b) to decompose ligand molecules into arms and a scaffold. We define the part of proteins that lies within 10Å of any atom of an arm as its corresponding subpocket.

Following DecompDiff (Guan et al., 2023b), we represent each protein atom with the following features: one-hot element indicator (H, C, N, O, S, Se), one-hot amino acid type indicator (20 dimension), one-dim flag indicating whether the atom is a backbone atom, and one-hot arm/scaffold region indicator. If the distance between the protein atom and any arm center is within 10Å, the protein atom will be labeled as belonging to an arm region and otherwise a scaffold region. The ligand atom is represented with following features: one-hot element indicator (C, N, O, F, P, S, Cl) and one-hot arm/scaffold indicator. Different from DecompDiff, the atom features are enhanced by concatenating an SE(3)-invariant feature of arms and their corresponding subpockets encoded by the condition encoder after the orignal features.

Two graphs are constructed for message passing in the protein-ligand complex: a k-nearest neighbors graph 𝒢Ksubscript𝒢𝐾\mathcal{G}_{K}caligraphic_G start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT upon ligand atoms and protein atoms (we choose k=32𝑘32k=32italic_k = 32 in all experiments) and a fully-connected graph 𝒢Lsubscript𝒢𝐿{\mathcal{G}}_{L}caligraphic_G start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT upon ligand atoms. The edge features are the outer products of distance embedding and edge type. The distance embedding is obtained by expanding distance with radial basis functions (RBF) located at 20 centers between 0Å and 10Å. The edge type is a 4-dim one-hot vector indicating the edge is between ligand atoms, protein atoms, ligand-protein atoms or protein-ligand atoms. In the ligand graph, the ligand bond is represented with a one-hot bond type vector (non-bond, single, double, triple, aromatic), an additional feature indicating whether or not two ligand atoms are from the same arm/scaffold.

A.2 Model Details

The controllable and decomposed diffusion model consist of two parts: a condition encoder and a diffusion-based decoder. The building block is an SE(3)-equivariant layer that is composed of three layers: atom update layer, bond update layer, and position update layer.

We denote the protein pocket as 𝒫={(𝐱i𝒫,vi𝒫)}i{1,,N𝒫}𝒫subscriptsubscriptsuperscript𝐱𝒫𝑖subscriptsuperscriptv𝒫𝑖𝑖1subscript𝑁𝒫{\mathcal{P}}=\{({\mathbf{x}}^{\mathcal{P}}_{i},{\textnormal{v}}^{\mathcal{P}}% _{i})\}_{i\in\{1,\dots,N_{\mathcal{P}}\}}caligraphic_P = { ( bold_x start_POSTSUPERSCRIPT caligraphic_P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , v start_POSTSUPERSCRIPT caligraphic_P end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) } start_POSTSUBSCRIPT italic_i ∈ { 1 , … , italic_N start_POSTSUBSCRIPT caligraphic_P end_POSTSUBSCRIPT } end_POSTSUBSCRIPT and the ligand molecule as ={(𝒙i,𝒗i,𝒃ij)}i,j{1,,N}subscriptsubscript𝒙𝑖subscript𝒗𝑖subscript𝒃𝑖𝑗𝑖𝑗1subscript𝑁\mathcal{M}=\{({\bm{x}}_{i},{\bm{v}}_{i},{\bm{b}}_{ij})\}_{i,j\in\{1,\dots,N_{% \mathcal{M}}\}}caligraphic_M = { ( bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_italic_b start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) } start_POSTSUBSCRIPT italic_i , italic_j ∈ { 1 , … , italic_N start_POSTSUBSCRIPT caligraphic_M end_POSTSUBSCRIPT } end_POSTSUBSCRIPT, where 𝒙𝒙{\bm{x}}bold_italic_x is the atom position, 𝒗𝒗{\bm{v}}bold_italic_v is the atom type, and 𝒃ijsubscript𝒃𝑖𝑗{\bm{b}}_{ij}bold_italic_b start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT is the chemical bond type between the atom i𝑖iitalic_i and the atom j𝑗jitalic_j. For brevity, we omit the superscript 𝒫𝒫{\mathcal{P}}caligraphic_P or {\mathcal{M}}caligraphic_M in the following. We use 𝐡isubscript𝐡𝑖{\mathbf{h}}_{i}bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT to denote the the SE(3)-invariant hidden state of i𝑖iitalic_i-th atom, 𝐱isubscript𝐱𝑖{\mathbf{x}}_{i}bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT to denote the i𝑖iitalic_i-th atom’s coordinate, which is SE(3)-equivariant, and 𝐞ijsubscript𝐞𝑖𝑗{\mathbf{e}}_{ij}bold_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT to denote the hidden state of the edge between the i𝑖iitalic_i-th atom and the j𝑗jitalic_j-th atom. They can be obtained as we described in the previous subsection. And we use t𝑡titalic_t to denote the time embedding as that in Ho et al. (2020).

Atom Update Layer

We denote the atom update layer as ϕa{ϕa1,ϕa2,ϕa3,ϕa4}subscriptitalic-ϕ𝑎subscriptitalic-ϕ𝑎1subscriptitalic-ϕ𝑎2subscriptitalic-ϕ𝑎3subscriptitalic-ϕ𝑎4\phi_{a}\coloneqq\{\phi_{a1},\phi_{a2},\phi_{a3},\phi_{a4}\}italic_ϕ start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ≔ { italic_ϕ start_POSTSUBSCRIPT italic_a 1 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT italic_a 2 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT italic_a 3 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT italic_a 4 end_POSTSUBSCRIPT }.

We first use the atom update layer ϕa1subscriptitalic-ϕ𝑎1\phi_{a1}italic_ϕ start_POSTSUBSCRIPT italic_a 1 end_POSTSUBSCRIPT to model protein-ligand interation as follows:

Δ𝐡K,ij𝒩K(i)ϕa1(𝐡i,𝐡j,𝐱i𝐱j,𝐞ij,t),Δsubscript𝐡𝐾𝑖subscript𝑗subscript𝒩𝐾𝑖subscriptitalic-ϕ𝑎1subscript𝐡𝑖subscript𝐡𝑗normsubscript𝐱𝑖subscript𝐱𝑗subscript𝐞𝑖𝑗𝑡\Delta{\mathbf{h}}_{K,i}\leftarrow\sum_{j\in\mathcal{N}_{K}(i)}\phi_{a1}({% \mathbf{h}}_{i},{\mathbf{h}}_{j},||{\mathbf{x}}_{i}-{\mathbf{x}}_{j}||,{% \mathbf{e}}_{ij},t),roman_Δ bold_h start_POSTSUBSCRIPT italic_K , italic_i end_POSTSUBSCRIPT ← ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_N start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( italic_i ) end_POSTSUBSCRIPT italic_ϕ start_POSTSUBSCRIPT italic_a 1 end_POSTSUBSCRIPT ( bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , | | bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | | , bold_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT , italic_t ) , (3)

where 𝒩K(i)subscript𝒩𝐾𝑖{\mathcal{N}}_{K}(i)caligraphic_N start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( italic_i ) is the set of neighbors of the i𝑖iitalic_i-th atom in the protein-ligand complex graph 𝒢Ksubscript𝒢𝐾{\mathcal{G}}_{K}caligraphic_G start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT.

We then further use the atom update layer ϕa2subscriptitalic-ϕ𝑎2\phi_{a2}italic_ϕ start_POSTSUBSCRIPT italic_a 2 end_POSTSUBSCRIPT and ϕa3subscriptitalic-ϕ𝑎3\phi_{a3}italic_ϕ start_POSTSUBSCRIPT italic_a 3 end_POSTSUBSCRIPT to model the interaction inside the ligand as follows:

𝐦ijϕa2(𝐱i𝐱j,𝐞ij),subscript𝐦𝑖𝑗subscriptitalic-ϕ𝑎2normsubscript𝐱𝑖subscript𝐱𝑗subscript𝐞𝑖𝑗\displaystyle{\mathbf{m}}_{ij}\leftarrow\phi_{a2}(||{\mathbf{x}}_{i}-{\mathbf{% x}}_{j}||,{\mathbf{e}}_{ij}),bold_m start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ← italic_ϕ start_POSTSUBSCRIPT italic_a 2 end_POSTSUBSCRIPT ( | | bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | | , bold_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) , (4)
Δ𝐡L,ij𝒩L(i)ϕa3(𝐡i,𝐡j,𝐦ji,t),Δsubscript𝐡𝐿𝑖subscript𝑗subscript𝒩𝐿𝑖subscriptitalic-ϕ𝑎3subscript𝐡𝑖subscript𝐡𝑗subscript𝐦𝑗𝑖𝑡\displaystyle\Delta{\mathbf{h}}_{L,i}\leftarrow\sum_{j\in{\mathcal{N}}_{L}(i)}% \phi_{a3}({\mathbf{h}}_{i},{\mathbf{h}}_{j},{\mathbf{m}}_{ji},t),roman_Δ bold_h start_POSTSUBSCRIPT italic_L , italic_i end_POSTSUBSCRIPT ← ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( italic_i ) end_POSTSUBSCRIPT italic_ϕ start_POSTSUBSCRIPT italic_a 3 end_POSTSUBSCRIPT ( bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , bold_m start_POSTSUBSCRIPT italic_j italic_i end_POSTSUBSCRIPT , italic_t ) , (5)

where 𝒩L(i)subscript𝒩𝐿𝑖{\mathcal{N}}_{L}(i)caligraphic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( italic_i ) represents the set of neighbors of the i𝑖iitalic_i-th atom in the ligand graph 𝒢Lsubscript𝒢𝐿{\mathcal{G}}_{L}caligraphic_G start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT. Finally, we update the hidden state of atoms by the atom update layer ϕa4subscriptitalic-ϕ𝑎4\phi_{a4}italic_ϕ start_POSTSUBSCRIPT italic_a 4 end_POSTSUBSCRIPT as follows:

𝐡i𝐡i+ϕa4(Δ𝐡K,i+Δ𝐡L,i).subscript𝐡𝑖subscript𝐡𝑖subscriptitalic-ϕ𝑎4Δsubscript𝐡𝐾𝑖Δsubscript𝐡𝐿𝑖{\mathbf{h}}_{i}\leftarrow{\mathbf{h}}_{i}+\phi_{a4}(\Delta{\mathbf{h}}_{K,i}+% \Delta{\mathbf{h}}_{L,i}).\vspace{-6mm}bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ← bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_ϕ start_POSTSUBSCRIPT italic_a 4 end_POSTSUBSCRIPT ( roman_Δ bold_h start_POSTSUBSCRIPT italic_K , italic_i end_POSTSUBSCRIPT + roman_Δ bold_h start_POSTSUBSCRIPT italic_L , italic_i end_POSTSUBSCRIPT ) . (6)

Bond Update Layer

We update the hidden states of the edges by the bond update layer ϕbsubscriptitalic-ϕ𝑏\phi_{b}italic_ϕ start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT as follows:

𝐞ijk𝒩L(i)\{j}ϕb(𝐡i,𝐡j,𝐡k,𝐦kj,𝐦ji,t).subscript𝐞𝑖𝑗subscript𝑘\subscript𝒩𝐿𝑖𝑗subscriptitalic-ϕ𝑏subscript𝐡𝑖subscript𝐡𝑗subscript𝐡𝑘subscript𝐦𝑘𝑗subscript𝐦𝑗𝑖𝑡{\mathbf{e}}_{ij}\leftarrow\sum_{k\in{\mathcal{N}}_{L}(i)\backslash\{j\}}\phi_% {b}({\mathbf{h}}_{i},{\mathbf{h}}_{j},{\mathbf{h}}_{k},{\mathbf{m}}_{kj},{% \mathbf{m}}_{ji},t).\vspace{-6mm}bold_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ← ∑ start_POSTSUBSCRIPT italic_k ∈ caligraphic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( italic_i ) \ { italic_j } end_POSTSUBSCRIPT italic_ϕ start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_m start_POSTSUBSCRIPT italic_k italic_j end_POSTSUBSCRIPT , bold_m start_POSTSUBSCRIPT italic_j italic_i end_POSTSUBSCRIPT , italic_t ) . (7)

Position Update Layer

The atom positions are updated by the position update layer ϕp{ϕp1,ϕp2}subscriptitalic-ϕ𝑝subscriptitalic-ϕ𝑝1subscriptitalic-ϕ𝑝2\phi_{p}\coloneqq\{\phi_{p1},\phi_{p2}\}italic_ϕ start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ≔ { italic_ϕ start_POSTSUBSCRIPT italic_p 1 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT italic_p 2 end_POSTSUBSCRIPT } as follows:

Δ𝐱K,ij𝒩K(i)(𝐱j𝐱i)ϕp1(𝐡i,𝐡j,𝐱i𝐱j,t),Δsubscript𝐱𝐾𝑖subscript𝑗subscript𝒩𝐾𝑖subscript𝐱𝑗subscript𝐱𝑖subscriptitalic-ϕ𝑝1subscript𝐡𝑖subscript𝐡𝑗normsubscript𝐱𝑖subscript𝐱𝑗𝑡\displaystyle\Delta{\mathbf{x}}_{K,i}\leftarrow\sum_{j\in{\mathcal{N}}_{K}(i)}% ({\mathbf{x}}_{j}-{\mathbf{x}}_{i})\phi_{p1}({\mathbf{h}}_{i},{\mathbf{h}}_{j}% ,||{\mathbf{x}}_{i}-{\mathbf{x}}_{j}||,t),roman_Δ bold_x start_POSTSUBSCRIPT italic_K , italic_i end_POSTSUBSCRIPT ← ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_N start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ( italic_i ) end_POSTSUBSCRIPT ( bold_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) italic_ϕ start_POSTSUBSCRIPT italic_p 1 end_POSTSUBSCRIPT ( bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , | | bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | | , italic_t ) , (8)
Δ𝐱L,ij𝒩L(i)(𝐱j𝐱i)ϕp2(𝐡i,𝐡j,𝐱i𝐱j,𝐦ji,t),Δsubscript𝐱𝐿𝑖subscript𝑗subscript𝒩𝐿𝑖subscript𝐱𝑗subscript𝐱𝑖subscriptitalic-ϕ𝑝2subscript𝐡𝑖subscript𝐡𝑗normsubscript𝐱𝑖subscript𝐱𝑗subscript𝐦𝑗𝑖𝑡\displaystyle\Delta{\mathbf{x}}_{L,i}\leftarrow\sum_{j\in{\mathcal{N}}_{L}(i)}% ({\mathbf{x}}_{j}-{\mathbf{x}}_{i})\phi_{p2}({\mathbf{h}}_{i},{\mathbf{h}}_{j}% ,||{\mathbf{x}}_{i}-{\mathbf{x}}_{j}||,{\mathbf{m}}_{ji},t),roman_Δ bold_x start_POSTSUBSCRIPT italic_L , italic_i end_POSTSUBSCRIPT ← ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( italic_i ) end_POSTSUBSCRIPT ( bold_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) italic_ϕ start_POSTSUBSCRIPT italic_p 2 end_POSTSUBSCRIPT ( bold_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , | | bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | | , bold_m start_POSTSUBSCRIPT italic_j italic_i end_POSTSUBSCRIPT , italic_t ) , (9)
𝐱i𝐱i+(Δ𝐱K,i+Δ𝐱L,i)𝟙mol,subscript𝐱𝑖subscript𝐱𝑖Δsubscript𝐱𝐾𝑖Δsubscript𝐱𝐿𝑖subscript1mol\displaystyle{\mathbf{x}}_{i}\leftarrow{\mathbf{x}}_{i}+(\Delta{\mathbf{x}}_{K% ,i}+\Delta{\mathbf{x}}_{L,i})\cdot\mathds{1}_{\text{mol}},bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ← bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + ( roman_Δ bold_x start_POSTSUBSCRIPT italic_K , italic_i end_POSTSUBSCRIPT + roman_Δ bold_x start_POSTSUBSCRIPT italic_L , italic_i end_POSTSUBSCRIPT ) ⋅ blackboard_1 start_POSTSUBSCRIPT mol end_POSTSUBSCRIPT , (10)

where 𝟙molsubscript1mol\mathds{1}_{\text{mol}}blackboard_1 start_POSTSUBSCRIPT mol end_POSTSUBSCRIPT is the indicator of ligand atoms since we assume the protein atoms are fixed as the context.

In practice, the condition encoder consists of two SE(3)-equivariant layers and the diffusion-based decoder consists of six SE(3)-equivaraint layers. In each SE(3)-equivariant layer, following Guan et al. (2023b), we apply graph attention to aggregate the message of each node/edge. The key/value/query embedding is obtained through a 2-layer MLP with LayerNorm and ReLU activation. Stacking these three layers as a block, our model consists of 6 blocks with hidden_dim=128 and n_heads=16. Additionally, the diffusion-based decoder also have two prediction heads (which are simply 2-layer MLPs and following Softmax function) that maps the learned hidden states of atoms and edges to the predicted atom type and bond type.

A.3 Training Details

Refer to caption
Figure 5: Illustration of training. For this case, there are actually three pairs of arms and subpockets input to the condition encoders separately. For brevity, we only plot one as an example.

Given a pair of protein and ligand molecule, we first decompose the molecule to get the arms. We add noise to the ligand molecules in the training set to get the perturbed molecules as the forward process of diffusion models (equation 1). The forward process is a Markov chain with fixed variance schedule {βt}t=1,,Tsubscriptsubscript𝛽𝑡𝑡1𝑇\{\beta_{t}\}_{t=1,\dots,T}{ italic_β start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_t = 1 , … , italic_T end_POSTSUBSCRIPT (Ho et al., 2020). We denote αt=1βtsubscript𝛼𝑡1subscript𝛽𝑡\alpha_{t}=1-\beta_{t}italic_α start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = 1 - italic_β start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and α¯t=s=1tαssubscript¯𝛼𝑡superscriptsubscriptproduct𝑠1𝑡subscript𝛼𝑠\bar{\alpha}_{t}=\prod_{s=1}^{t}\alpha_{s}over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = ∏ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT italic_α start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT. More specifically, the noises at time t𝑡titalic_t are injected as follows:

q(𝐱t|𝐱0)=𝒩(𝐱t;α¯t𝐱0,(1α¯t)𝐈),𝑞conditionalsubscript𝐱𝑡subscript𝐱0𝒩subscript𝐱𝑡subscript¯𝛼𝑡subscript𝐱01subscript¯𝛼𝑡𝐈\displaystyle q({\mathbf{x}}_{t}|{\mathbf{x}}_{0})={\mathcal{N}}({\mathbf{x}}_% {t};\sqrt{\bar{\alpha}_{t}}{\mathbf{x}}_{0},(1-\bar{\alpha}_{t}){\mathbf{I}}),italic_q ( bold_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | bold_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = caligraphic_N ( bold_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ; square-root start_ARG over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG bold_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , ( 1 - over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) bold_I ) , (11)
q(𝐯t|𝐯0)=𝒞(𝐯t|α¯t𝐯0+(1α¯t)/Ka),𝑞conditionalsubscript𝐯𝑡subscript𝐯0𝒞conditionalsubscript𝐯𝑡subscript¯𝛼𝑡subscript𝐯01subscript¯𝛼𝑡subscript𝐾𝑎\displaystyle q({\mathbf{v}}_{t}|{\mathbf{v}}_{0})={\mathcal{C}}({\mathbf{v}}_% {t}|\bar{\alpha}_{t}{\mathbf{v}}_{0}+(1-\bar{\alpha}_{t})/K_{a}),italic_q ( bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | bold_v start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = caligraphic_C ( bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_v start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + ( 1 - over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) / italic_K start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ) , (12)
q(𝐛t|𝐛0)=𝒞(𝐯t|α¯t𝐛0+(1α¯t)/Kb),𝑞conditionalsubscript𝐛𝑡subscript𝐛0𝒞conditionalsubscript𝐯𝑡subscript¯𝛼𝑡subscript𝐛01subscript¯𝛼𝑡subscript𝐾𝑏\displaystyle q({\mathbf{b}}_{t}|{\mathbf{b}}_{0})={\mathcal{C}}({\mathbf{v}}_% {t}|\bar{\alpha}_{t}{\mathbf{b}}_{0}+(1-\bar{\alpha}_{t})/K_{b}),italic_q ( bold_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | bold_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = caligraphic_C ( bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + ( 1 - over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) / italic_K start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ) , (13)

where Kasubscript𝐾𝑎K_{a}italic_K start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT and Kbsubscript𝐾𝑏K_{b}italic_K start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT are the number of atom classes and bond classes respectively.

Then the arms and subpockets are input to the condition encoder. The output of condition encoder and the perturbed ligand are further input to the diffusion-based decoder. Then the reconstruction loss Ltsubscript𝐿𝑡L_{t}italic_L start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT at time t𝑡titalic_t is defined as follows:

Lt(v)=k=1Ka𝒄(𝐯t,𝐯0)klog𝒄(𝐯t,𝐯0)k𝒄(𝐯t,𝐯^0)k,superscriptsubscript𝐿𝑡𝑣superscriptsubscript𝑘1subscript𝐾𝑎𝒄subscriptsubscript𝐯𝑡subscript𝐯0𝑘𝒄subscriptsubscript𝐯𝑡subscript𝐯0𝑘𝒄subscriptsubscript𝐯𝑡subscript^𝐯0𝑘\displaystyle L_{t}^{(v)}=\sum_{k=1}^{K_{a}}\bm{c}({\mathbf{v}}_{t},{\mathbf{v% }}_{0})_{k}\log\frac{\bm{c}({\mathbf{v}}_{t},{\mathbf{v}}_{0})_{k}}{\bm{c}({% \mathbf{v}}_{t},\hat{\mathbf{v}}_{0})_{k}},italic_L start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_v ) end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUPERSCRIPT bold_italic_c ( bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , bold_v start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT roman_log divide start_ARG bold_italic_c ( bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , bold_v start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG bold_italic_c ( bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG bold_v end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG , (14)
Lt(b)=k=1Kb𝒄(𝐛t,𝐛0)klog𝒄(𝐛t,𝐛0)k𝒄(𝐛t,𝐛^0)k,superscriptsubscript𝐿𝑡𝑏superscriptsubscript𝑘1subscript𝐾𝑏𝒄subscriptsubscript𝐛𝑡subscript𝐛0𝑘𝒄subscriptsubscript𝐛𝑡subscript𝐛0𝑘𝒄subscriptsubscript𝐛𝑡subscript^𝐛0𝑘\displaystyle L_{t}^{(b)}=\sum_{k=1}^{K_{b}}\bm{c}({\mathbf{b}}_{t},{\mathbf{b% }}_{0})_{k}\log\frac{\bm{c}({\mathbf{b}}_{t},{\mathbf{b}}_{0})_{k}}{\bm{c}({% \mathbf{b}}_{t},\hat{\mathbf{b}}_{0})_{k}},italic_L start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_b ) end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT end_POSTSUPERSCRIPT bold_italic_c ( bold_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , bold_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT roman_log divide start_ARG bold_italic_c ( bold_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , bold_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG bold_italic_c ( bold_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG bold_b end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG , (15)
Lt(x)=𝐱0𝐱^02,superscriptsubscript𝐿𝑡𝑥superscriptnormsubscript𝐱0subscript^𝐱02\displaystyle L_{t}^{(x)}=||{\mathbf{x}}_{0}-\hat{{\mathbf{x}}}_{0}||^{2},italic_L start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_x ) end_POSTSUPERSCRIPT = | | bold_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT - over^ start_ARG bold_x end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT | | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (16)
Lt=Lt(x)+γvLt(v)+γbLt(b),subscript𝐿𝑡superscriptsubscript𝐿𝑡𝑥subscript𝛾𝑣superscriptsubscript𝐿𝑡𝑣subscript𝛾𝑏superscriptsubscript𝐿𝑡𝑏\displaystyle L_{t}=L_{t}^{(x)}+\gamma_{v}L_{t}^{(v)}+\gamma_{b}L_{t}^{(b)},italic_L start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = italic_L start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_x ) end_POSTSUPERSCRIPT + italic_γ start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_v ) end_POSTSUPERSCRIPT + italic_γ start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_b ) end_POSTSUPERSCRIPT , (17)

where (𝐱t,𝐯t,𝐛t)subscript𝐱𝑡subscript𝐯𝑡subscript𝐛𝑡({\mathbf{x}}_{t},{\mathbf{v}}_{t},{\mathbf{b}}_{t})( bold_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , bold_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ), (𝐱0,𝐯t,𝐛t)subscript𝐱0subscript𝐯𝑡subscript𝐛𝑡({\mathbf{x}}_{0},{\mathbf{v}}_{t},{\mathbf{b}}_{t})( bold_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , bold_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ), and (𝐱^0,𝐯^0,𝐛^0)subscript^𝐱0subscript^𝐯0subscript^𝐛0(\hat{{\mathbf{x}}}_{0},\hat{{\mathbf{v}}}_{0},\hat{{\mathbf{b}}}_{0})( over^ start_ARG bold_x end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , over^ start_ARG bold_v end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , over^ start_ARG bold_b end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) represents atom positions, atom types, and bond types of the perturbed molecule at time t𝑡titalic_t, ground truth molecule, and the predicted molecule respectively, 𝒄(𝐯t,𝐯0)=𝒄/k=1Kack𝒄subscript𝐯𝑡subscript𝐯0superscript𝒄superscriptsubscript𝑘1subscript𝐾𝑎superscriptsubscript𝑐𝑘{\bm{c}}({\mathbf{v}}_{t},{\mathbf{v}}_{0})=\bm{c}^{\star}/\sum_{k=1}^{K_{a}}c% _{k}^{\star}bold_italic_c ( bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , bold_v start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = bold_italic_c start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT / ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT and 𝒄(𝐯t,𝐯0)=[αt𝐯t+(1αt)/Ka][α¯t1𝐯0+(1α¯t1)/Ka]superscript𝒄subscript𝐯𝑡subscript𝐯0direct-productdelimited-[]subscript𝛼𝑡subscript𝐯𝑡1subscript𝛼𝑡subscript𝐾𝑎delimited-[]subscript¯𝛼𝑡1subscript𝐯01subscript¯𝛼𝑡1subscript𝐾𝑎\bm{c}^{\star}({\mathbf{v}}_{t},{\mathbf{v}}_{0})=[\alpha_{t}{\mathbf{v}}_{t}+% (1-\alpha_{t})/K_{a}]\odot[\bar{\alpha}_{t-1}{\mathbf{v}}_{0}+(1-\bar{\alpha}_% {t-1})/K_{a}]bold_italic_c start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ( bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , bold_v start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = [ italic_α start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + ( 1 - italic_α start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) / italic_K start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ] ⊙ [ over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT bold_v start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + ( 1 - over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ) / italic_K start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ], 𝒄(𝐛t,𝐛0)=𝒄/k=1Kbck𝒄subscript𝐛𝑡subscript𝐛0superscript𝒄superscriptsubscript𝑘1subscript𝐾𝑏superscriptsubscript𝑐𝑘{\bm{c}}({\mathbf{b}}_{t},{\mathbf{b}}_{0})=\bm{c}^{\star}/\sum_{k=1}^{K_{b}}c% _{k}^{\star}bold_italic_c ( bold_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , bold_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = bold_italic_c start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT / ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT and 𝒄(𝐛t,𝐛0)=[αt𝐛t+(1αt)/Kb][α¯t1𝐛0+(1α¯t1)/Kb]superscript𝒄subscript𝐛𝑡subscript𝐛0direct-productdelimited-[]subscript𝛼𝑡subscript𝐛𝑡1subscript𝛼𝑡subscript𝐾𝑏delimited-[]subscript¯𝛼𝑡1subscript𝐛01subscript¯𝛼𝑡1subscript𝐾𝑏\bm{c}^{\star}({\mathbf{b}}_{t},{\mathbf{b}}_{0})=[\alpha_{t}{\mathbf{b}}_{t}+% (1-\alpha_{t})/K_{b}]\odot[\bar{\alpha}_{t-1}{\mathbf{b}}_{0}+(1-\bar{\alpha}_% {t-1})/K_{b}]bold_italic_c start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ( bold_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , bold_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = [ italic_α start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + ( 1 - italic_α start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) / italic_K start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ] ⊙ [ over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT bold_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + ( 1 - over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ) / italic_K start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ]. Note that the condition encoder and the diffusion-based decoder are jointly trained.

In practice, we set the loss weights as γv=100subscript𝛾𝑣100\gamma_{v}=100italic_γ start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT = 100 and γb=100subscript𝛾𝑏100\gamma_{b}=100italic_γ start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT = 100. Follwing the setting of Guan et al. (2023b), we set the number of diffusion steps as 1000. For this diffusion noise schedule, we choose to use a sigmoid β𝛽\betaitalic_β schedule with β1=1e-7subscript𝛽11e-7\beta_{1}=\texttt{1e-7}italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 1e-7 and βT=2e-3subscript𝛽𝑇2e-3\beta_{T}=\texttt{2e-3}italic_β start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT = 2e-3 for atom coordinates, and a cosine β𝛽\betaitalic_β schedule suggested in Nichol & Dhariwal (2021) with s=0.01𝑠0.01s=0.01italic_s = 0.01 for atom types and bond types.

We use Adam Kingma & Ba (2014) with init_learning_rate=0.0005, betas=(0.95, 0.999) to train the model. And we set batch_size=16 and clip_gradient_norm=8. During the training phase, we add a small Gaussian noise with a standard deviation of 0.1 to protein atom coordinates as data augmentation. We also schedule to decay the learning rate exponentially with a factor of 0.6 and a minimum learning rate of 1e-6. The learning rate is decayed if there is no improvement for the validation loss in 10 consecutive evaluations. The evaluation is performed for every 1000 training steps. We trained our model on one NVIDIA GeForce GTX A100 GPU, and it could converge within 237k steps.

A.4 Sampling Details

Refer to caption
Figure 6: Illustration of the sampling.

To sample molecules using the pre-trained controllable and decomposed diffusion model, assume that there are available arms as conditions, we can first sample a noisy molecule from the prior distribution and derive a molecule by iteratively denoising following the reverse process (equation 2). More specifically, the denoising step at time t𝑡titalic_t corresponds to sampling molecules from the following distributions:

q(𝐱t1|𝐱t,𝐱^0)=𝒩(𝐱t1;𝝁~t(𝐱t,𝐱^0),β~t𝐈),𝑞conditionalsubscript𝐱𝑡1subscript𝐱𝑡subscript^𝐱0𝒩subscript𝐱𝑡1subscript~𝝁𝑡subscript𝐱𝑡subscript^𝐱0subscript~𝛽𝑡𝐈\displaystyle q({\mathbf{x}}_{t-1}|{\mathbf{x}}_{t},\hat{{\mathbf{x}}}_{0})={% \mathcal{N}}({\mathbf{x}}_{t-1};\tilde{{\bm{\mu}}}_{t}({\mathbf{x}}_{t},\hat{{% \mathbf{x}}}_{0}),\tilde{\beta}_{t}{\mathbf{I}}),italic_q ( bold_x start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT | bold_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG bold_x end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = caligraphic_N ( bold_x start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ; over~ start_ARG bold_italic_μ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG bold_x end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) , over~ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_I ) , (18)
q(𝐯t1|𝐯t,𝐯^0)=𝒞(𝐯t1|𝒄~t(𝐯t,𝐯^0)),𝑞conditionalsubscript𝐯𝑡1subscript𝐯𝑡subscript^𝐯0𝒞conditionalsubscript𝐯𝑡1subscript~𝒄𝑡subscript𝐯𝑡subscript^𝐯0\displaystyle q({\mathbf{v}}_{t-1}|{\mathbf{v}}_{t},\hat{{\mathbf{v}}}_{0})={% \mathcal{C}}({\mathbf{v}}_{t-1}|\tilde{{\bm{c}}}_{t}({\mathbf{v}}_{t},\hat{{% \mathbf{v}}}_{0})),italic_q ( bold_v start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT | bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG bold_v end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = caligraphic_C ( bold_v start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT | over~ start_ARG bold_italic_c end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG bold_v end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) ) , (19)
q(𝐛t1|𝐛t,𝐛^0)=𝒞(𝐛t1|𝒄~t(𝐛t,𝐛^0)),𝑞conditionalsubscript𝐛𝑡1subscript𝐛𝑡subscript^𝐛0𝒞conditionalsubscript𝐛𝑡1subscript~𝒄𝑡subscript𝐛𝑡subscript^𝐛0\displaystyle q({\mathbf{b}}_{t-1}|{\mathbf{b}}_{t},\hat{{\mathbf{b}}}_{0})={% \mathcal{C}}({\mathbf{b}}_{t-1}|\tilde{{\bm{c}}}_{t}({\mathbf{b}}_{t},\hat{{% \mathbf{b}}}_{0})),italic_q ( bold_b start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT | bold_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG bold_b end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = caligraphic_C ( bold_b start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT | over~ start_ARG bold_italic_c end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG bold_b end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) ) , (20)

where 𝝁~t(𝐱t,𝐱^0)=α¯t1βt1α¯t𝐱^0+αt(1α¯t1)1α¯t𝐱tsubscript~𝝁𝑡subscript𝐱𝑡subscript^𝐱0subscript¯𝛼𝑡1subscript𝛽𝑡1subscript¯𝛼𝑡subscript^𝐱0subscript𝛼𝑡1subscript¯𝛼𝑡11subscript¯𝛼𝑡subscript𝐱𝑡\tilde{{\bm{\mu}}}_{t}({\mathbf{x}}_{t},\hat{{\mathbf{x}}}_{0})=\frac{\sqrt{% \bar{\alpha}_{t-1}\beta_{t}}}{1-\bar{\alpha}_{t}}\hat{{\mathbf{x}}}_{0}+\frac{% \sqrt{\alpha}_{t}(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_{t}}{\mathbf{x}}_{t}over~ start_ARG bold_italic_μ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG bold_x end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = divide start_ARG square-root start_ARG over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT italic_β start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG end_ARG start_ARG 1 - over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG over^ start_ARG bold_x end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + divide start_ARG square-root start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( 1 - over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ) end_ARG start_ARG 1 - over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG bold_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, β~t=1α¯t11α¯tβtsubscript~𝛽𝑡1subscript¯𝛼𝑡11subscript¯𝛼𝑡subscript𝛽𝑡\tilde{\beta}_{t}=\frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_{t}}\beta_{t}over~ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = divide start_ARG 1 - over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_ARG start_ARG 1 - over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG italic_β start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, 𝒄~(𝐯t,𝐯^0)=𝒄~/k=1Kac~k~𝒄subscript𝐯𝑡subscript^𝐯0superscript~𝒄superscriptsubscript𝑘1subscript𝐾𝑎superscriptsubscript~𝑐𝑘\tilde{\bm{c}}({\mathbf{v}}_{t},\hat{{\mathbf{v}}}_{0})=\tilde{\bm{c}}^{\star}% /\sum_{k=1}^{K_{a}}\tilde{c}_{k}^{\star}over~ start_ARG bold_italic_c end_ARG ( bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG bold_v end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = over~ start_ARG bold_italic_c end_ARG start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT / ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUPERSCRIPT over~ start_ARG italic_c end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT and 𝒄~(𝐯t,𝐯^0)=[αt𝐯t+(1αt)/Ka][α¯t1𝐯^0+(1α¯t1)/Ka]superscript~𝒄subscript𝐯𝑡subscript^𝐯0direct-productdelimited-[]subscript𝛼𝑡subscript𝐯𝑡1subscript𝛼𝑡subscript𝐾𝑎delimited-[]subscript¯𝛼𝑡1subscript^𝐯01subscript¯𝛼𝑡1subscript𝐾𝑎\tilde{\bm{c}}^{\star}({\mathbf{v}}_{t},\hat{{\mathbf{v}}}_{0})=[\alpha_{t}{% \mathbf{v}}_{t}+(1-\alpha_{t})/K_{a}]\odot[\bar{\alpha}_{t-1}\hat{{\mathbf{v}}% }_{0}+(1-\bar{\alpha}_{t-1})/K_{a}]over~ start_ARG bold_italic_c end_ARG start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ( bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG bold_v end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = [ italic_α start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + ( 1 - italic_α start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) / italic_K start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ] ⊙ [ over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT over^ start_ARG bold_v end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + ( 1 - over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ) / italic_K start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ], 𝒄~(𝐛t,𝐛^0)=𝒄~/k=1Kbc~k~𝒄subscript𝐛𝑡subscript^𝐛0superscript~𝒄superscriptsubscript𝑘1subscript𝐾𝑏superscriptsubscript~𝑐𝑘\tilde{\bm{c}}({\mathbf{b}}_{t},\hat{{\mathbf{b}}}_{0})=\tilde{\bm{c}}^{\star}% /\sum_{k=1}^{K_{b}}\tilde{c}_{k}^{\star}over~ start_ARG bold_italic_c end_ARG ( bold_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG bold_b end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = over~ start_ARG bold_italic_c end_ARG start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT / ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT end_POSTSUPERSCRIPT over~ start_ARG italic_c end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT and 𝒄~(𝐛t,𝐛^0)=[αt𝐛t+(1αt)/Kb][α¯t1𝐛^0+(1α¯t1)/Kb]superscript~𝒄subscript𝐛𝑡subscript^𝐛0direct-productdelimited-[]subscript𝛼𝑡subscript𝐛𝑡1subscript𝛼𝑡subscript𝐾𝑏delimited-[]subscript¯𝛼𝑡1subscript^𝐛01subscript¯𝛼𝑡1subscript𝐾𝑏\tilde{\bm{c}}^{\star}({\mathbf{b}}_{t},\hat{{\mathbf{b}}}_{0})=[\alpha_{t}{% \mathbf{b}}_{t}+(1-\alpha_{t})/K_{b}]\odot[\bar{\alpha}_{t-1}\hat{{\mathbf{b}}% }_{0}+(1-\bar{\alpha}_{t-1})/K_{b}]over~ start_ARG bold_italic_c end_ARG start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ( bold_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , over^ start_ARG bold_b end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = [ italic_α start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_b start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + ( 1 - italic_α start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) / italic_K start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ] ⊙ [ over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT over^ start_ARG bold_b end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + ( 1 - over¯ start_ARG italic_α end_ARG start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ) / italic_K start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ]. Here (𝐱^0,𝐯^0,𝐛^0)subscript^𝐱0subscript^𝐯0subscript^𝐛0(\hat{{\mathbf{x}}}_{0},\hat{{\mathbf{v}}}_{0},\hat{{\mathbf{b}}}_{0})( over^ start_ARG bold_x end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , over^ start_ARG bold_v end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , over^ start_ARG bold_b end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) is the molecule output by the diffusion-based decoder, whose input is the noisy molecule at time t𝑡titalic_t and the condition feature. The sampling step is illustrated as Figure 6. During sampling, we also apply validity guidance proposed by Guan et al. (2023b), which encourages the model to generate molecules with valid structures.

A.5 Optimization Details

Refer to caption
Figure 7: Illustration of molecular optimization (revised based on Figure 2). It is highlighted where we apply the condition encoder and the diffusion-based decoder.

To generate molecules with desired properties, we can apply the pre-trained controllable and decomposed diffusion models for structure-based molecular optimization without any fine-tuning. The optimization procedure is summarized as Algorithm 1 and illustrated as Figure 7. In practice, since reference ligands are not available, we can initialize the arm lists with 20 ligands generated by DecompDiff, and this is the actual setting in our experiment. We have provided the optimization procedure in detail that can be found in Section 3.2 and Section 4.1.

Appendix B Evaluation of Molecular Conformation

To evaluate generated molecules from the perspective of molecular conformation, we compute the Jensen-Shannon divergences (JSD) in atom distance distributions between the reference molecules and the generated molecules (see Figure 8).

We also compute different bond distance and bond angle distributions of the generated molecules and compare them against the corresponding reference empirical distributions in Tables 5 and 6.

Refer to caption
Figure 8: Comparing the distribution for distances of all-atom for reference molecules in the test set and model-generated molecules. Jensen-Shannon divergence (JSD) between two distributions is reported.
Table 5: Jensen-Shannon divergence between bond distance distributions of the reference molecules and the generated molecules, and lower values indicate better performances. “-”, “=”, and “:” represent single, double, and aromatic bonds, respectively. We highlight the best two results with bold text and underlined text, respectively.
Bond liGAN GraphBP AR Pocket2 Mol Target Diff Decomp Diff Ours
C--C 0.601 0.368 0.609 0.496 0.369 0.359 0.362
C===C 0.665 0.530 0.620 0.561 0.505 0.537 0.504
C--N 0.634 0.456 0.474 0.416 0.363 0.344 0.328
C===N 0.749 0.693 0.635 0.629 0.550 0.584 0.566
C--O 0.656 0.467 0.492 0.454 0.421 0.376 0.373
C===O 0.661 0.471 0.558 0.516 0.461 0.374 0.329
C::::C 0.497 0.407 0.451 0.416 0.263 0.251 0.196
C::::N 0.638 0.689 0.552 0.487 0.235 0.269 0.219
Table 6: Jensen-Shannon divergence between bond angle distributions of the reference molecules and the generated molecules, and lower values indicate better performances. We highlight the best two results with bold text and underlined text, respectively.
Angle liGAN GraphBP AR Pocket2 Mol Target Diff Decomp Diff Ours
CCC 0.598 0.424 0.340 0.323 0.328 0.314 0.280
CCO 0.637 0.354 0.442 0.401 0.385 0.324 0.331
CNC 0.604 0.469 0.419 0.237 0.367 0.297 0.280
OPO 0.512 0.684 0.367 0.274 0.303 0.217 0.198
NCC 0.621 0.372 0.392 0.351 0.354 0.294 0.266
CC=O 0.636 0.377 0.476 0.353 0.356 0.259 0.257
COC 0.606 0.482 0.459 0.317 0.389 0.339 0.338

To further measure the quality of generated conformation, we optimize the generated structures with Merck Molecular Force Field (MMFF) (Halgren, 1996) and calculate the energy difference between pre- and pos- MMFF-optimized coordinates for different rigid fragments that do not contain any rotatable bonds. As Table 7 and Figure 9 show, DecompOpt achieves low energy differences and outperforms baselines in most cases. We also calculate the energy difference before and after force field optimization for the whole molecules. As Table 8 and Figure 10 show, notably, DecompOpt outperforms all diffusion-based methods by a large margin and achieve comparable performance with the best baseline. These results show that the conformation of ligands generated by DecompOpt is high-quality and stable.

Table 7: Median energy difference for rigid fragment of different fragment size (3/4/5/6/7/8 atoms) before and after the force-field optimization.
Methods Median Energy Difference (\downarrow)
3 4 5 6 7 8
LiGAN 86.32 165.15 105.96 185.70 243.79 332.81
AR 25.79 73.06 23.89 30.42 56.47 76.50
Pocket2Mol 10.43 33.93 34.47 27.86 33.90 42.97
TargetDiff 7.31 30.57 18.01 11.98 28.92 50.42
DecompDiff 6.01 29.20 10.78 4.33 12.74 30.68
DecompOpt 6.00 16.59 9.89 2.61 13.29 31.49
Refer to caption
Figure 9: Median energy difference for molecules with different number of rotatable bonds before and after the force-field optimization.
Table 8: Median energy difference for molecules with different number of rotatable bonds (1/2/3/4/5/6/7 rotatable bonds) before and after the force-field optimization.
Methods Median Energy Difference (\downarrow)
1 2 3 4 5 6 7
LiGAN 810.45 981.53 1145.96 1783.95 1960.24 2547.32 2735.75
AR 176.67 222.74 244.51 268.01 332.89 388.70 441.90
Pocket2Mol 105.64 125.19 168.84 199.33 204.82 226.73 263.96
TargetDiff 225.48 253.72 303.60 344.12 360.74 420.47 434.30
DecompDiff 279.44 264.16 268.23 265.57 262.69 279.73 289.07
DecompOpt 63.33 169.17 215.19 248.35 202.81 237.38 238.32
Refer to caption
Figure 10: Median energy difference for molecules with different number of rotatable bonds before and after the force-field optimization.

Appendix C Additional Results

C.1 Full evaluation results

We provide box plots of evaluation metrics as shown in Figure 11.

Refer to caption
Figure 11: The boxplots of QED, SA, Vina Score, Vina Minimize, and Vina Dock of ligands generated by DecompOpt and baseline models.

Following Guan et al. (2023b), our model also has variants of priors. Table 9 shows the results of multiple variants of our models. The setting of Ref Prior, Pocket Prior, and Opt Prior strictly follows DecompDiff (Guan et al., 2023b). Ref Best means using the best checkpoint instead of the last checkpoint for each target pocket during optimization with reference priors for evaluation. For Pocket/Opt Best, it is similar. Best of Best means using the best checkpoint across all checkpoints with Ref Prior and Pocket Prior during optimization for each target pocket.

Table 9: Summary of different properties of reference molecules and molecules generated by our model and other generation (Gen.) and optimization (Opt.) baselines. (\uparrow) / (\downarrow) denotes a larger / smaller number is better.

Methods Vina Score (\downarrow) Vina Min (\downarrow) Vina Dock (\downarrow) High Affinity (\uparrow) QED (\uparrow) SA (\uparrow) Diversity (\uparrow) Success Avg. Med. Avg. Med. Avg. Med. Avg. Med. Avg. Med. Avg. Med. Avg. Med. Rate Reference -6.36 -6.46 -6.71 -6.49 -7.45 -7.26 - - 0.48 0.47 0.73 0.74 - - 25.0% DecompOpt (Ref Prior) -5.68 -5.88 -6.53 -6.49 -7.49 -7.66 59.2% 65.0% 0.56 0.58 0.73 0.73 0.64 0.66 35.4% DecompOpt (Ref Best) -5.75 -5.97 -6.58 -6.70 -7.63 -8.02 62.6% 74.3% 0.56 0.59 0.73 0.72 0.63 0.67 39.4% DecompOpt (Pocket Prior) -5.27 -6.38 -7.07 -7.45 -8.85 -8.72 71.4% 93.8% 0.40 0.36 0.63 0.63 0.60 0.61 29.2% DecompOpt (Pocket Best) -5.33 -6.49 -7.08 -7.60 -9.01 -8.98 73.9% 100% 0.41 0.39 0.63 0.63 0.59 0.60 44.7% DecompOpt (Opt Prior) -5.73 -6.64 -7.29 -7.53 -8.78 -8.72 70.3% 89.9% 0.46 0.44 0.65 0.65 0.61 0.61 38.1% DecompOpt (Opt Best) -5.87 -6.81 -7.35 -7.72 -8.98 -9.01 73.5% 93.3% 0.48 0.45 0.65 0.65 0.60 0.61 52.5% DecompOpt (Best of Best) -6.22 -6.94 -7.50 -7.74 -8.98 -8.95 76.2% 100% 0.51 0.51 0.67 0.67 0.61 0.63 60.6%

C.2 Trade-off between Success Rate and Diversity

In addition to overall performance, we also show the trade-off between Success Rate and diversity of RGA, TargetDiff w/ Opt., and DecompOpt for each target protein pocket in Figure 12. DecompOpt shows general superiority to the other two baselines in most cases considering both Success Rate and diversity.

Refer to caption
Figure 12: Trade-off of Success Rate and diversity. Each point with coordinate (x,y)𝑥𝑦(x,y)( italic_x , italic_y ) represents a pocket with Success Rate x𝑥xitalic_x and diversity y𝑦yitalic_y. The closer to the top right, the better.

C.3 Evaluation of the ability to design novel ligands

We additionally test the Novelty and Similarity of generated ligands compared with the reference ligand. Novelty is defined as the ratio of generated ligands that are different from the reference ligands of the corresponding pockets in the test set. Similarity is defined as the Tanimoto Similarity between the generated ligands and the corresponding reference ligands. The results show that the generated ligands are not similar to reference ligands in the test set. Besides, we also test Uniqueness and Diversity of generated ligands. Uniqueness is the percentage of unique molecules among all the generated molecules. Diversity is the same as that in Section 4.1. The results are reported in Table 10. These results show that DecompOpt can design novel ligands, which is an important ability for drug discovery.

Table 10: Evaluation of the ability to design novel ligands.
Methods Novelty Similarity Uniqueness Diversity
LiGAN 100%percent\%% 0.22 87.82%percent\%% 0.66
AR 100%percent\%% 0.24 100%percent\%% 0.70
Pocket2Mol 100%percent\%% 0.26 100%percent\%% 0.69
TargetDiff 100%percent\%% 0.30 99.63%percent\%% 0.72
DecompDiff 100%percent\%% 0.34 99.99%percent\%% 0.68
RGA 100%percent\%% 0.37 96.82%percent\%% 0.41
DecompOpt 100%percent\%% 0.36 100%percent\%% 0.60

C.4 Effects of subpockets.

To study the influence of subpockets in controlling the optimization, we further conducted an ablation study using only arms without subpockets as conditions. As Table 11 shows, while DecompOpt, when solely with arms as conditions, is capable of optimizing all metrics, its efficiency in this scenario is not as well as DecompOpt that utilizes both arms and pockets as conditions. Recall that we use SE(3)-invaraint features of arms (and subpockets) as conditions. Without subpockets, this feature would be agnostic to the molecular interaction and spatial relation between the arms and subpockets. Such information is important to some of the properties (e.g., Vina scores). The SE(3)-invaraint features from pairs of subpockets and arms contain the aforementioned information and are better aligned with the protein-ligand complex being generated.

Table 11: Comparison of DecompOpt optimization results with only arms and arm-pocket complexes as conditions. (\uparrow) / (\downarrow) denotes a larger / smaller number is better.

Methods Vina Score (\downarrow) Vina Min (\downarrow) Vina Dock (\downarrow) High Affinity (\uparrow) QED (\uparrow) SA (\uparrow) Diversity (\uparrow) Success Avg. Med. Avg. Med. Avg. Med. Avg. Med. Avg. Med. Avg. Med. Avg. Med. Rate DecompDiff -5.67 -6.04 -7.04 -7.09 -8.39 -8.43 64.4% 71.0% 0.45 0.43 0.61 0.60 0.68 0.68 24.5% DecompOpt (arms-only) -5.52 -6.26 -7.05 -7.26 -8.65 -8.64 66.6% 86.1% 0.46 0.43 0.63 0.63 0.63 0.63 45.7% DecompOpt -5.87 -6.81 -7.35 -7.72 -8.98 -9.01 73.5% 93.3% 0.48 0.45 0.65 0.65 0.60 0.61 52.5%

C.5 Influence of the quality of initial ligands on performance.

To study the influence of the quality of initial ligands on performance of structure-based molecular optimization, we have conducted an ablation study focusing on Vina Min score optimization, using ligands with high and low Vina Min scores as initializations for the arm lists. Due to limited resources, we chose to conduct this study on the protein 2V3R, which is randomly chosen from our test set. We generated 100 ligands using DecompDiff and selected 20 ligands with the highest and lowest Vina Min scores. These ligands were then used as the initial conditions for the optimization process. As shown in Table 12, the optimization outcomes are slightly influenced by the quality of the initial ligands. However, regardless the quality of the initial ligands, DecompOpt can consistently improve the quality of the generated ligands.

Table 12: Comparison of optimization with initial ligands of different quality.
High Vina Min Scores Low Vina Min Scores ΔΔ\Deltaroman_Δ (high - low)
Avg. Med. Avg. Med. Avg.
Initial ligands -8.54 -8.48 -7.08 -7.04 -1.46
DecompOpt -9.12 -8.96 -9.00 -8.96 -0.12

C.6 Influence of the number of initial ligands on performance.

To study the influence of the number of initial molecules on the performance of structure-based molecular optimization, we further run the experiments with initial arm lists of 1 and 5 molecules generated by DecompDiff. As Table 13 indicates, the initial number of molecules has a modest impact on the optimization outcomes, with a higher number of molecules generally leading to improved performance. Notably, even when starting with a single molecule generated by DecompDiff, DecompOpt demonstrates a considerably high success rate.

Table 13: Summary of results using different number of molecules to initialize arm lists. (\uparrow) / (\downarrow) denotes a larger / smaller number is better.

Methods Vina Score (\downarrow) Vina Min (\downarrow) Vina Dock (\downarrow) High Affinity (\uparrow) QED (\uparrow) SA (\uparrow) Diversity (\uparrow) Success Avg. Med. Avg. Med. Avg. Med. Avg. Med. Avg. Med. Avg. Med. Avg. Med. Rate init num = 1 -5.41 -6.61 -7.12 -7.51 -8.78 -8.82 70.4% 88.9% 0.47 0.45 0.64 0.63 0.61 0.61 47.0% init num = 5 -5.71 -6.71 -7.25 -7.58 -8.86 -8.97 71.8% 93.3% 0.49 0.46 0.65 0.64 0.60 0.61 49.4% init num = 20 -5.87 -6.81 -7.35 -7.72 -8.98 -9.01 73.5% 93.3% 0.48 0.45 0.65 0.65 0.60 0.61 52.5%

Appendix D Extended Results of Controllability

D.1 R-group Optimization

We provide additional R-group Optimization experiment on protein 4G3D, as shown in Figure 13.

Refer to caption
Figure 13: Additional R-group optimization result. The left column is reference binding molecule, the middle and right columns are molecules generated by DecompOpt with 30 rounds of optimization on protein 4G3D. Optimized R-group are highlighted in red.
Table 14: R-group optimization results generated using Decompdiff and DecompOpt on protein 3DAF and 4F1M. DecompOpt was optimized over 30 rounds towards high Vina Min Score and evaluated using the final round results. Both targets were assessed with 20 generated molecules and the mean of properties are reported.

Model 3DAF 4F1M Vian Min (\downarrow) Tanimoto Sim. (\uparrow) Complete. (\uparrow) Vian Min (\downarrow) Tanimoto Sim. (\uparrow) Complete. (\uparrow) Decompdiff -8.44 0.15 60.0% -5.90 0.15 65.0% DecompOpt -9.39 0.23 95.0% -6.32 0.49 55.0%

D.2 Fragment Growing

Enhancing the binding affinity of drug candidates through combination of R-group optimization and fragment growing can effectively leverage capabilities of DecompOpt. The quantitative results are shown in Table 14. For our case study, we perform R-group optimization and fragment growing on 5AEH. Starting from a high binding affinity drug candidate, we first optimize R-group for 30 rounds same as workflow in Section 4. Subsequently, we design the new arms prior and atom num with expert guidance, and expand fragments using DecompOpt. As Figure 14 shows, DecompOpt ultimately generates molecules with a Vina Min Score more than 4 kcal/mol better than the reference.

Refer to caption
Figure 14: Example of R-group optimization and fragment growing conducted using DecompOpt on 5AEH. The reference ligand, the best R-group result, and the best fragment growing result based on R-group optimization are displayed from left to right. The selected R-group is highlighted in red, while the newly extended arm is highlighted in orange.

D.3 Scaffold Hopping

Additional Evaluation Metrics

In addition to evaluation metrics discussed in Section 4, we evaluated Validity, Uniqueness, Novelty, Complete Rate, and Scaffold Similarity to measure models’ capability in scaffold hopping. Detailed calculation of these metrics as follows:

  • Validity is defined as the fraction of generated molecules that can be successfully sanitized.

  • Uniqueness measures the proportion of unique molecules among the generated molecules.

  • Novelty measures the fraction of generated molecules that not presented in training set.

  • Complete Rate measures the proportion of completed molecules within the generated results.

  • Scaffold Similarity Following Polykovskiy et al. (2020), Bemis–Murcko scaffolds are extracted using rdkit function MurckoScaffold. We count the occurrences of scaffolds in all generated and reference molecules, creating vectors G𝐺Gitalic_G and R𝑅Ritalic_R, where each dimension represents the count of a specific scaffold. The scaffold similarity is calculated as the cosine similarity between vectors G𝐺Gitalic_G and R𝑅Ritalic_R.

More Examples of Generated Results

For scaffold hopping, we provide more visualization of ligands generated by DecompOpt and DecompDiff on protein 2Z3H, 4AVW, 4QLK, and 4BEL, which are shown in Figure 15.

Refer to caption
Figure 15: More examples of Scaffold Hopping results. The left column shows reference ligands, Scaffold Hopping results generated by DecompOpt are shown at the second and the third rows, and results generated by DecompDiff are shown at the fourth and the fifth rows. Scaffold are highlighted in green.

Changes in Molecular Properties After Scaffold Hopping

Scaffold hopping aims at finding scaffold structures that can connect existing functional groups without disrupting their interactions with the target protein. The main purpose of this is to find novel scaffolds which are not protected by existing patents while maintaining comparable properties as the original molecule. Therefore, we did not implement property optimization mechanisms in scaffold hopping tasks and solely focusing on designing scaffolds that can connect existing arms. We provide the property comparison before and after scaffold hopping in Table 15. As the result shows, the properties of the ligands remain relatively consistent before and after the process of scaffold hopping.

Table 15: Summary of properties of reference molecules and molecules generated through scaffold hopping using DecompOpt. (\uparrow) / (\downarrow) denotes a larger / smaller number is better.

Methods Vina Score (\downarrow) Vina Min (\downarrow) Vina Dock (\downarrow) QED (\uparrow) SA (\uparrow) Avg. Med. Avg. Med. Avg. Med. Avg. Med. Avg. Med. Reference -6.36 -6.46 -6.71 -6.49 -7.45 -7.26 0.48 0.47 0.73 0.74 Scaffold Hopping by DecompOpt -5.89 -6.13 -6.46 -6.28 -7.28 -7.48 0.49 0.48 0.71 0.69