Journal of Cheminformatics. 2018 Jul 24;10:33. doi: 10.1186/s13321-018-0287-6

Multi-objective de novo drug design with conditional graph generative model

Yibo Li 1, Liangren Zhang 1, Zhenming Liu 1
PMCID: PMC6057868  PMID: 30043127

Abstract

Recently, deep generative models have revealed themselves as a promising way of performing de novo molecule design. However, previous research has focused mainly on generating SMILES strings instead of molecular graphs. Although available, current graph generative models are often too general and computationally expensive. In this work, a new de novo molecular design framework is proposed based on a type of sequential graph generator that does not use atom-level recurrent units. Compared with previous graph generative models, the proposed method is much more tuned for molecule generation and has been scaled up to cover significantly larger molecules in the ChEMBL database. It is shown that the graph-based model outperforms SMILES based models in a variety of metrics, especially in the rate of valid outputs. For drug design tasks, a conditional graph generative model is employed. This method offers higher flexibility and is suitable for generation based on multiple objectives. The results demonstrate that this approach can be effectively applied to several drug design problems, including the generation of compounds containing a given scaffold, compounds with specific drug-likeness and synthetic accessibility requirements, as well as dual inhibitors against JNK3 and GSK-3β.

Electronic supplementary material

The online version of this article (10.1186/s13321-018-0287-6) contains supplementary material, which is available to authorized users.

Keywords: Deep learning, De novo drug design, Graph generative model

Background

The ultimate goal of drug design is the discovery of new chemical entities with desirable pharmacological properties. Achieving this goal requires medicinal chemists to explore the chemical space for new molecules, which has proved to be extremely difficult, mainly due to the size and complexity of the chemical space. It is estimated that there are around $10^{60}$–$10^{100}$ synthetically available molecules [1]. Meanwhile, the space of chemical compounds exhibits a discontinuous structure, making searching difficult to perform [2].

De novo molecular design aims at assisting this process with computer-based methods. Early works developed various algorithms to produce new molecular structures, such as atom-based elongation or fragment-based combination [3, 4]. Those algorithms are often coupled with global optimization techniques such as ant colony optimization [5, 6], genetic algorithms [7, 8] or particle swarm optimization [9] for the generation of molecules with desired properties.

Recent developments in deep learning [10] have shed new light on the area of de novo molecule generation. Previous works have shown that deep generative models are very effective in modeling the SMILES representation of molecules using recurrent neural networks (RNN), an architecture that has been extensively applied to tasks related to sequential data [11]. Segler et al. [12] applied a SMILES language model (LM) to the task of generating focused molecule libraries by fine-tuning the trained network with a smaller set of molecules with desirable properties. Olivecrona et al. [13] used a GRU [14] based LM trained on the ChEMBL [15] dataset to generate SMILES strings. The model is then fine-tuned using reinforcement learning for the generation of molecules with specific requirements. Popova et al. [16] proposed to integrate the generative and predictive network together in the generation phase. Besides language models, Gómez-Bombarelli et al. [2] used a variational autoencoder (VAE) [17] to generate drug-like compounds from the ZINC database [18]. This work aimed at obtaining a bi-directional mapping between molecule space and a continuous latent space so that operations on molecules can be achieved by manipulating the latent representation. Blaschke et al. [19] compared different architectures for VAE and applied it to the task of designing active compounds against DRD2.

The research described above demonstrated the effectiveness of SMILES based models for molecule generation. However, producing valid SMILES strings requires the model to learn rules that are irrelevant to molecular structures, such as the SMILES grammar and atom ordering, which increases the burden on the model and makes the SMILES string a less preferable representation compared with molecular graphs. Research in deep learning has recently enabled the direct generation of molecular graphs. Johnson et al. [20] proposed a sequential generation approach for graphs. Though their implementation is mainly for reasoning tasks, this framework is potentially applicable to molecule generation. A more recent method [21] was proposed for generating the entire graph all at once. This model has been successfully applied to the generation of small molecular graphs. The implementation most similar to ours is the recent work by Li et al. [22], which uses a sequential decoding scheme similar to that of Johnson et al. Decoding invariance is introduced by sampling different atom orderings from a predefined distribution. This method has been applied to the generation of molecules with fewer than 20 heavy atoms from the ChEMBL dataset. Though inspiring, the methods discussed above share a few common problems. First of all, the generators proposed are relatively general. This design allows those techniques to be applied to various scenarios but requires further optimization before application to molecule generation. Secondly, many of those models suffer from scalability issues, which restricts their application to molecules of small size.

In this work, we propose a graph-based generator that is more suited for molecules. The model is scaled to cover compounds containing up to 50 heavy atoms in the ChEMBL dataset. Results show that the proposed graph-based model is able to outperform SMILES based methods in a variety of metrics, including the rate of valid outputs, KL and JS divergence of molecular properties, as well as NLL loss. A conditional version of the model is employed to solve various multi-objective drug design tasks, and the results demonstrate promising performance.

Methods

Molecular graph

A molecular graph is a way of representing the structural information of a molecule using a graph $G=(V,E)$. Atoms and bonds in the molecule are viewed as graph nodes ($v \in V$) and edges ($e \in E$). Each node is labeled with its corresponding atom type, while each edge is labeled with its corresponding bond type. We refer to the set of all atom types and bond types as $A$ and $B$ respectively. In this work, the atom type is specified using three variables: the atomic symbol (or equivalently the atomic number), the number of explicit hydrogens attached, and the number of formal charges. For example, the nitrogen atom in pyrrole can be represented as the triple (“N”, 1, 0). The set of all atom types ($A$) is extracted from molecules in the ChEMBL dataset (see Additional file 1: Supplementary Text 1), and contains 33 members in total. For bonds, we only consider the following four bond types: single, double, triple and aromatic. A visualized demonstration of a molecular graph is given in Fig. 1.
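As a concrete illustration, the following sketch (assuming RDKit is installed; the helper is ours, not part of the paper's code) extracts these atom-type triples from pyrrole:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("c1cc[nH]c1")  # pyrrole

for atom in mol.GetAtoms():
    atom_type = (
        atom.GetSymbol(),         # atomic symbol, e.g. "N"
        atom.GetNumExplicitHs(),  # number of explicit hydrogens
        atom.GetFormalCharge(),   # formal charge
    )
    print(atom.GetIdx(), atom_type)
# The aromatic nitrogen in pyrrole is reported as ("N", 1, 0).
```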

Fig. 1.

Fig. 1

Cimetidine and its graph based representation. In graph based generative models, molecules (see a) are represented as graphs $G=(V,E)$ (see b), where atoms and bonds are viewed as nodes and edges respectively. Atom types are specified by three parameters: the atomic symbol (or equivalently the atomic number), the number of explicit hydrogens attached, and the number of formal charges (see c). For bond types, only single, double, triple and aromatic bonds are considered in this work (see d)

Graph generative model

We now consider deep generative models that can directly output molecular graphs. In this work, we mainly focus on sequential graph generators, which build a graph by iteratively refining its intermediate structure. The process starts from the empty graph $G_0=(\emptyset,\emptyset)$. At step $i$, a graph transition $t_i$ is selected from the set of all available transition actions $T(G_i)$ based on the generation history $(G_0,\ldots,G_i)$. The selection is done by sampling $t_i$ from a probability distribution $t_i \sim p_\theta(t_i|G_i,\ldots,G_0)$, which is parametrized by a deep network. Then, $t_i$ is performed on $G_i$ to get the graph structure for the next step, $G_{i+1}=t_i(G_i)$. At the final step $n$, the termination operation $t^*$ is performed and the model outputs $G=G_n$ as the final result.

The entire process is illustrated in Fig. 2. We call the mapping $T$, which determines all available graph transitions at each step, a decoding scheme. The sequence $r=((G_0,t_0),(G_1,t_1),\ldots,(G_n,t_n))$ is called a decoding route of $G$, and the distribution $p_\theta(t_i|G_i,\ldots,G_0)$ is called a decoding policy.
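The decoding loop can be sketched as follows; `empty_graph`, the policy object and the transition methods are hypothetical placeholders for $G_0$, $p_\theta$ and the transitions in $T$:

```python
# A schematic sketch of the sequential decoding loop described above.
def generate(policy, max_steps=200):
    history = [empty_graph()]            # G_0 = (empty set, empty set)
    for _ in range(max_steps):
        G_i = history[-1]
        t_i = policy.sample(history)     # t_i ~ p_theta(t | G_i, ..., G_0)
        if t_i.is_termination:           # t* ends the generation
            return G_i
        history.append(t_i.apply(G_i))   # G_{i+1} = t_i(G_i)
    return history[-1]
```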

Fig. 2.

Fig. 2

A schematic representation of the molecule generation process. Starting with the empty graph $G_0$, initialization is performed to add the first atom. At each step, a graph transition (append, connect or terminate) is sampled and performed on the intermediate molecule structure. The probability for sampling each transition is given by $p_\theta(t|G_i,\ldots,G_0)$, which is parametrized using a deep neural network. Finally, the termination operation is performed to end the generation

Previous graph generative models are usually too general and less optimized for the generation of molecular graphs. Here we offer the following optimizations:

  1. A much simpler decoding scheme T is used to decrease the number of steps required for generation.

  2. No atom-level recurrent unit is used in the decoding policy. Instead, we explored two other options: (1) parametrizing the decoding policy as a Markov process and (2) using only a molecule-level recurrent unit. These modifications help to increase the scalability of the model.

  3. During the calculation of the log-likelihood loss, we sample $r$ from a parametrized distribution $q_\alpha(r|G)$. The parameter $\alpha$ controls the degree of randomness of $q_\alpha$, offering higher flexibility for the model.

The following three sections are devoted to the detailed discussions of the optimizations above.

Decoding scheme

The transitions in $T(G_i)$ given the intermediate state $G_i$ are restricted to the following four types:

  1. Initialization At the beginning of the generation, the only allowed transition is to add the first atom to the empty graph $G_0$.

  2. Append This action adds a new atom to $G_i$ and connects it to an existing atom with a new bond.

  3. Connect This action connects two existing atoms $v_1, v_2 \in V_i$ with a new bond. For simplicity, we only allow connections to start from the latest appended atom $v^*$, which means that $v_1=v^*$.

  4. Termination ($t^*$) End the generation process.

The entire process is shown in Fig. 2, and a more detailed illustration is provided in Additional file 2: Figures S1 and S2. In theory, $T(G)$ should not contain actions that violate the chemical validity of molecules. However, in order to test the ability of the model to learn those constraints, we do not explicitly exclude those actions from $T(G)$ during training.

Note that compared with the implementation in [22], the action of adding a new atom and the action of connecting it to the molecule are merged into a single “append” step. This helps to reduce the number of steps during generation. It is easy to show that the number of steps required for generating a graph $G=(V,E)$ equals exactly $|E|+2$, which is generally much smaller than the length of the corresponding SMILES string (as shown in Additional file 2: Figure S3).
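As a quick illustration of this step count (one initialization, one append or connect per bond, one termination), the sketch below compares $|E|+2$ with the SMILES length using RDKit; the example molecule is ours:

```python
from rdkit import Chem

smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
mol = Chem.MolFromSmiles(smiles)

n_steps = mol.GetNumBonds() + 2   # |E| + 2 decoding steps
print(n_steps, "graph steps vs", len(smiles), "SMILES characters")
# 15 graph steps vs 21 SMILES characters for this example
```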

Decoding policy

During generation, the decoding policy $p_\theta$ needs to specify the probability value for each graph transition in $T(G_i)$. More specifically, $p_\theta$ needs to output the following probability values:

  1. $p^A_v$ for each $v \in V_i$: a matrix of size $|A|\times|B|$, whose element $(p^A_v)_{ab}$ represents the probability of appending a new atom of type $a \in A$ to atom $v$ with a new bond of type $b \in B$.

  2. $p^C_v$ for each $v \in V_i$: a vector of size $|B|$, whose element $(p^C_v)_b$ represents the probability of connecting the latest appended atom $v^*$ with $v$ using a new bond of type $b \in B$.

  3. $p^*$: a scalar value indicating the probability of terminating the generation.

A visualized depiction of $p^A_v$, $p^C_v$ and $p^*$ is shown in Fig. 2. The decoding policy $p_\theta$ is parameterized using a neural network. At each step, the network accepts the decoding history $(G_0,\ldots,G_i)$ as input and calculates the probability values ($p^A_v$, $p^C_v$, $p^*$) as output. In this work, we explored two novel graph generation architectures, namely MolMP and MolRNN. Unlike the methods proposed in [20, 22], the two architectures do not involve atom-level recurrence, which helps to increase the scalability of the model.

MolMP

MolMP models graph generation as a Markov process, where the transition of $G_i$ only depends on the current state of the graph, not on the history (Fig. 3a). This means that $p_\theta(t|G_i,\ldots,G_0)=p_\theta(t|G_i)$. Since this type of architecture does not include any recurrent units, it is less computationally expensive compared with RNN based models. Moreover, the computation at different steps can be easily parallelized during training. The detailed architecture of MolMP is given as follows:

  1. An initial atom embedding $h^0_v$ is first generated for each atom $v$:
     $$h^0_v = \mathrm{Embedding}_\theta(v) \quad (1)$$
     $h^0_v$ is determined based on the following information: (1) the atom type of $v$ and (2) whether $v$ is the latest appended atom. The dimension of $h^0_v$ is set to 16.
  2. $h^0_v$ is passed to a sequence of $L$ graph convolutional layers:
     $$h^l_v = \mathrm{GraphConv}^l_\theta(h^{l-1}_v, G_i) \quad (2)$$
     where $l=1,\ldots,L$. Except for the first layer, each convolutional layer $\mathrm{GraphConv}^l_\theta$ adopts a “BN-ReLU-Conv” structure as suggested in [23]. The detailed architecture of graph convolution is described in “Graph convolution”. We use six convolution layers in this work ($L=6$), with 32, 64, 128, 128, 256 and 256 output units respectively.
     The outputs from all graph convolutional layers are then concatenated together, followed by batch normalization and ReLU:
     $$h^{\mathrm{skip}}_v = \mathrm{relu}(\mathrm{bn}(\mathrm{Concat}(h^1_v,\ldots,h^L_v))) \quad (3)$$
  3. $h^{\mathrm{skip}}_v$ is passed to the fully connected network $\mathrm{MLP}^{FC}_\theta$ to obtain the final atom-level representation $h_v$:
     $$h_v = \mathrm{MLP}^{FC}_\theta(h^{\mathrm{skip}}_v) \quad (4)$$
     $\mathrm{MLP}^{FC}_\theta$ consists of two linear layers, with 256 and 512 output units each. Batch normalization and ReLU are applied after each layer.
  4. Average pooling is applied at the graph level to obtain the molecule representation $h_{G_i}$:
     $$h_{G_i} = \mathrm{AvgPool}([h_v]_{v \in V_i}) \quad (5)$$
  5. The probability value for each action is produced by first calculating the unnormalized values ($\hat{p}^A_v$, $\hat{p}^C_v$ and $\hat{p}^*$) as follows:
     $$\hat{p}^A_v, \hat{p}^C_v = \mathrm{MLP}_\theta(h_v, h_{G_i}) \quad (6)$$
     $$\hat{p}^* = \mathrm{MLP}^*_\theta(h_{G_i}) \quad (7)$$
     Those values are then normalized to get the final result:
     $$p^A_v = \hat{p}^A_v / P \quad (8)$$
     $$p^C_v = \hat{p}^C_v / P \quad (9)$$
     $$p^* = \hat{p}^* / P \quad (10)$$
     where $P = \sum_v \sum_{a,b} (\hat{p}^A_v)_{ab} + \sum_v \sum_b (\hat{p}^C_v)_b + \hat{p}^*$.

     $\mathrm{MLP}_\theta$ is a two-layer fully connected network with hidden size 128 and output size $|A|\times|B|+|B|$. This output is then split into the matrix $\hat{p}^A_v$ of size $|A|\times|B|$ and the vector $\hat{p}^C_v$ of length $|B|$. $\mathrm{MLP}^*_\theta$ is a one-layer fully connected network. Both $\mathrm{MLP}_\theta$ and $\mathrm{MLP}^*_\theta$ use an exponential activation in the output layer.

The architecture of the entire network is shown in Fig. 4.
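The following numpy sketch illustrates the joint normalization of Eqs. 6–10; the exponentiated scores are random placeholders standing in for the network outputs:

```python
import numpy as np

n_atoms, n_atom_types, n_bond_types = 4, 33, 4
rng = np.random.default_rng(0)

# exponential activation on the raw network outputs gives positive scores
append_hat = np.exp(rng.normal(size=(n_atoms, n_atom_types, n_bond_types)))
connect_hat = np.exp(rng.normal(size=(n_atoms, n_bond_types)))
term_hat = np.exp(rng.normal())

# normalize jointly over all append, connect and termination actions (Eq. 8-10)
Z = append_hat.sum() + connect_hat.sum() + term_hat  # the constant P
p_append, p_connect, p_term = append_hat / Z, connect_hat / Z, term_hat / Z
assert np.isclose(p_append.sum() + p_connect.sum() + p_term, 1.0)
```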

Fig. 3.

Fig. 3

The two types of graph generative architectures explored in this work: a MolMP: this architecture treats graph generation as a Markov process, in which the transition of $G_i$ only depends on the current state of the graph, not on the history. b MolRNN: this architecture adds a single molecule-level recurrent unit to MolMP

Fig. 4.

Fig. 4

Network architecture of MolMP. This figure shows the detailed model architecture of MolMP. MolRNN adopts a structure highly similar to that of MolMP, except for the inclusion of a molecule-level recurrent unit

MolRNN

The second architecture adds a single molecule level recurrent unit to MolMP, as shown in Fig. 3. We refer to this method as MolRNN. The model architecture is specified as follows:

  1. First of all, the model generates the atom-level ($h_v$, $v \in V_i$) and molecule-level ($h_{G_i}$) representations for the graph state $G_i$. This part of the network uses the same architecture as that in MolMP.

  2. Given $h_v$ and $h_{G_i}$, the hidden state of the molecule-level recurrent unit ($h^{RNN}_i$) is updated as:
     $$h^{RNN}_{i+1} = \mathrm{RNN}_\theta(h^{RNN}_i, h_{v^*}, h_{G_i}) \quad (11)$$
     where $h_{v^*}$ is the representation of the latest appended atom $v^*$. The recurrent network $\mathrm{RNN}_\theta$ is implemented using three GRU layers with a hidden size of 512.
  3. The probability values $p^A_v$, $p^C_v$, $p^*$ are calculated in the same manner as in MolMP, by replacing $h_{G_i}$ in Eqs. 6 and 7 with $h^{RNN}_{i+1}$.

The overall architecture of MolRNN is highly similar to that of MolMP. However, it is found that the molecule-level recurrent unit in MolRNN provides significant improvements in model performance (see “Model performance and sample quality”), while inducing little extra computational cost compared with MolMP.
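A minimal PyTorch sketch of the molecule-level update in Eq. 11 is given below (the paper's implementation uses MXNet; layer sizes follow the text, other details are illustrative):

```python
import torch
import torch.nn as nn

class MolLevelRNN(nn.Module):
    def __init__(self, atom_dim=512, mol_dim=512, hidden=512, layers=3):
        super().__init__()
        self.gru = nn.GRU(atom_dim + mol_dim, hidden, num_layers=layers)

    def forward(self, h_v_star, h_G, h_rnn):
        # one step: input is the latest appended atom plus the molecule state
        x = torch.cat([h_v_star, h_G], dim=-1).unsqueeze(0)  # (1, batch, feat)
        _, h_next = self.gru(x, h_rnn)                       # updated hidden state
        return h_next
```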

Graph convolution

In this work, we rely on graph convolutional networks (GCN) [24] to extract information from intermediate graph states $G_i$. Each graph convolutional layer adopts the “BN-ReLU-Conv” structure described before. The convolution part is structured as follows:

$$h^l_v = W^l h^{l-1}_v + \sum_{b \in B} \Theta^l_b \sum_{u \in N^{\mathrm{bond}}_b(v)} h^{l-1}_u + \sum_{1<d\le D} \Phi^l_d \sum_{u \in N^{\mathrm{path}}_d(v)} h^{l-1}_u \quad (12)$$

where $h^l_v$ is the output representation of atom $v$ at layer $l$, and $h^{l-1}_v$ is the input representation. $N^{\mathrm{bond}}_b(v)$ is the set of all atoms directly connected to atom $v$ with a bond of type $b$, and $N^{\mathrm{path}}_d(v)$ is the set of all atoms whose distance to atom $v$ equals $d$. $D$ represents the receptive field size, which is set to 3 in this work. $W^l$, $\Theta^l_b$ and $\Phi^l_d$ are weight parameters of layer $l$.

In brief, the output representation of atom $v$ at each layer $l$ ($h^l_v$) is calculated from the following information:

  1. The input representation of $v$ ($h^{l-1}_v$),

  2. Information from local neighbors, given by $\sum_{b \in B} \Theta^l_b \sum_{u \in N^{\mathrm{bond}}_b(v)} h^{l-1}_u$. Note that this part of the information is conditioned on the bond type $b$ between $v$ and its neighboring atom $u$.

  3. Information from remote neighbors, given by $\sum_{1<d\le D} \Phi^l_d \sum_{u \in N^{\mathrm{path}}_d(v)} h^{l-1}_u$. This part of the information is conditioned on the distance $d$ between $v$ and its remote neighbor $u$.

The architecture is illustrated in Fig. 5. Our implementation of graph convolution is similar to the edge conditioned convolution by Simonovsky et al. [25], except that we directly include the information of remote neighbors of v in order to achieve a larger receptive field with fewer layers.
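The sketch below illustrates Eq. 12 in PyTorch (for illustration only; the paper uses MXNet). The adjacency matrices `A_bond` (one 0/1 matrix per bond type) and `A_path` (0/1 matrices marking atom pairs at each graph distance $d$) are assumed to be precomputed from $G_i$:

```python
import torch
import torch.nn as nn

class RemoteNeighborConv(nn.Module):
    def __init__(self, in_dim, out_dim, n_bond_types=4, max_dist=3):
        super().__init__()
        self.self_lin = nn.Linear(in_dim, out_dim)                       # W^l
        self.bond_lin = nn.ModuleList(
            [nn.Linear(in_dim, out_dim, bias=False)
             for _ in range(n_bond_types)])                              # Theta^l_b
        self.path_lin = nn.ModuleList(
            [nn.Linear(in_dim, out_dim, bias=False)
             for _ in range(max_dist - 1)])                              # Phi^l_d, d = 2..D

    def forward(self, h, A_bond, A_path):
        out = self.self_lin(h)
        for b, lin in enumerate(self.bond_lin):
            out = out + lin(A_bond[b] @ h)   # sum over local neighbors of type b
        for i, lin in enumerate(self.path_lin):
            out = out + lin(A_path[i] @ h)   # neighbors at distance d = i + 2
        return out
```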

Fig. 5.

Fig. 5

Architecture of graph convolutional layer. At each layer, the output representation for atom v is given by: (1) the input representation of v from previous layers, (2) information of local neighbors and (3) information of distant neighbors

Likelihood function

To train the generative model, we need to maximize the log-likelihood $\log p_\theta(G)$ of the training samples. However, for the step-wise generative models discussed above, the likelihood is only tractable for a given decoding route $r=((G_0,t_0),(G_1,t_1),\ldots,(G_n,t_n))$:

$$\log p_\theta(G,r) = \sum_{i=0}^{n} \log p_\theta(t_i|G_i,\ldots,G_0) \quad (13)$$

While the marginal likelihood can be computed as:

$$\log p_\theta(G) = \log \sum_{r \in R(G)} p_\theta(G,r) \quad (14)$$

where $R(G)$ is the set of all possible decoding routes for $G$. The marginal likelihood function is intractable for most molecules encountered in drug design. One way to resolve this problem is to use importance sampling as proposed in [22]:

$$\log p_\theta(G) = \log \mathbb{E}_{r \sim q(r|G)} \left[ \frac{p_\theta(G,r)}{q(r|G)} \right] \quad (15)$$

where $q(r|G)$ is a predefined distribution on $R(G)$. Both the deterministic and the fully randomized $q(r|G)$ were explored in the previous work [22]. However, a more desirable solution may lie somewhere between deterministic decoding and fully randomized decoding. In this work, instead of sampling from the distribution $q(r|G)$, we sample $r$ from a distribution $q_\alpha(r|G)$ that is parameterized by $0 \le \alpha \le 1$. $q_\alpha(r|G)$ is designed such that the decoding largely follows depth-first decoding with canonical ordering, but at each step there is a small probability $1-\alpha$ that the model will make a random mistake. In this way, the parameter $\alpha$ can be used to control the randomness of the distribution $q_\alpha$. The algorithm is shown in Additional file 1: Supplementary Text 4.

$$\log p_\theta(G) = \log \mathbb{E}_{r \sim q_\alpha(r|G)} \left[ \frac{p_\theta(G,r)}{q_\alpha(r|G)} \right] \approx \log \frac{1}{k} \sum_{i=1}^{k} \frac{p_\theta(G,r_i)}{q_\alpha(r_i|G)} \quad (16)$$

For $\alpha=1$, the distribution falls back to deterministic decoding. The parameter $\alpha$ is treated as a hyperparameter optimized for model performance. We tried $\alpha \in \{1.0, 0.8, 0.6\}$ for both MolMP and MolRNN.
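A sketch of the $k$-sample estimator in Eq. 16; `sample_route`, `log_prob` and `log_p_theta` are hypothetical placeholders for the proposal $q_\alpha$ and the route log-likelihood of Eq. 13:

```python
import math

def estimate_log_likelihood(G, model, q_alpha, k=5):
    log_weights = []
    for _ in range(k):
        r = q_alpha.sample_route(G)                        # r ~ q_alpha(r | G)
        log_w = model.log_p_theta(G, r) - q_alpha.log_prob(r, G)
        log_weights.append(log_w)
    # log-mean-exp of the importance weights (numerically stable)
    m = max(log_weights)
    return m + math.log(sum(math.exp(w - m) for w in log_weights) / k)
```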

Conditional generative model

Most molecule design tasks require producing compounds that satisfy certain criteria, such as being synthetically available or having a high affinity for a certain target. Previous research has developed various methods to achieve objective-directed molecule generation. Segler et al. [12] used transfer learning in the design of focused compound libraries. Olivecrona et al. [13] applied reinforcement learning (RL) to objective-based chemical design and reported promising performance in various tasks. Guimaraes et al. [26] proposed ORGAN, which combines SeqGAN with a domain-specific objective term, and showed that ORGAN is effective in the optimization of different molecular properties. Neil et al. [27] created a benchmark analysis of various RL based methods in different molecule design tasks. In this work, we explored another way to achieve requirement-based molecule design, using a conditional generative model. We first convert the given requirement to a numerical representation called the conditional code ($c$), and the generative model is then modified to be conditioned on $c$. For the graph generative model, this means that the decoding policy is now $p_\theta(t_i|G_i,\ldots,G_0,c)$ (see Fig. 6). Compared with the previous approaches by Olivecrona et al. and Guimaraes et al., the conditional generative model does not require reinforcement learning, and provides the following flexibilities:

  1. The conditional code can incorporate both discrete and continuous objectives, and can easily scale to multiple objectives.

  2. When changing the generation objective, previous methods usually require the model to be retrained on the new condition. For conditional generative models, this can be achieved simply by changing the conditional code input $c$.

Both graph based and SMILES based conditional generators are implemented in this work. For the graph based model, the graph convolution is modified to include $c$ as an input:

$$h^l_v = W^l h^{l-1}_v + \sum_{b \in B} \Theta^l_b \sum_{u \in N^{\mathrm{bond}}_b(v)} h^{l-1}_u + \sum_{1<d\le D} \Phi^l_d \sum_{u \in N^{\mathrm{path}}_d(v)} h^{l-1}_u + \Psi^l c \quad (17)$$

Simply stated, $c$ is included in the graph convolution architecture by adding an additional term $\Psi^l c$ to the unconditional implementation in Eq. 12. For the SMILES based model, the conditional code is included by concatenating it with the input at each step: $x'_i = \mathrm{Concat}(x_i, c)$, where $x_i$ is the one-hot representation of the SMILES character input at step $i$.
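The following sketch illustrates both conditioning mechanisms; the dimensions and names are illustrative assumptions, not taken from the paper's code:

```python
import torch

# SMILES side: concatenate c with the one-hot character input at each step
x_i = torch.zeros(1, 47)                 # hypothetical one-hot SMILES character
c = torch.tensor([[0.84, 1.9]])          # e.g. a (QED, SA) conditional code
x_i_cond = torch.cat([x_i, c], dim=-1)   # x'_i = Concat(x_i, c)

# Graph side: the extra term Psi^l c of Eq. 17 is one additional linear map
cond_lin = torch.nn.Linear(2, 32, bias=False)  # Psi^l
h_cond_term = cond_lin(c)                # added to each atom's output in Eq. 12
```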

Fig. 6.

Fig. 6

Conditional generative models: a For the generation of molecules based on a given requirement, the requirement (query) is first converted to a numerical representation called the conditional code; the generative model is then modified to be conditioned on the conditional code. b Scaffold based molecule generation: the model is required to generate molecules based on a given scaffold. This requirement is first converted to a conditional code called the scaffold fingerprint. Then molecules containing the query scaffold are generated based on the fingerprint. c Generation based on drug-likeness and synthetic accessibility: drug-likeness and synthetic accessibility are first quantified using QED and SAscore. The model then generates molecules based on the two properties. d Design of dual inhibitors of JNK3 and GSK-3β: the generation process is based on the bioactivity fingerprint containing the activity requirement for each target

Conditional models have already been used in previous work [21] for molecule generation, but were restricted to simple properties such as the number of heavy atoms as conditional codes. Also, the method has not yet been applied to multi-objective molecule generation. Here, we apply this method to more complex drug design tasks, including scaffold-based generation, property-based generation and the design of dual inhibitors of JNK3 and GSK-3β (see Fig. 6). The best performing graph based and SMILES based generators (see “Model performance and sample quality”) are implemented in conditional versions and applied to those tasks.

Scaffold-based generation

The concept of molecular scaffold has long been of significant importance in medicinal chemistry [28]. Though various definitions are available, the most widely accepted definition is given by Bemis and Murcko [29], who proposed deriving the scaffold of a given molecule by removing all side chain atoms. Studies have found various scaffolds that have privileged characteristics in terms of activity against certain targets [30–32]. Once such a privileged structure is found, a related task is to produce compound libraries containing such scaffolds for subsequent screening.

Here, the conditional graph generative model is applied to generate compounds containing a scaffold $s$, which is drawn from a predefined scaffold set $S=\{s_i\}_{i=1}^{N_S}$. The set $S$ is extracted from the list of approved drugs in DrugBank [33]. Two types of structures are extracted from the molecules to construct $S$: (1) the Bemis–Murcko scaffolds, and (2) ring assemblies. Ring assemblies are included in $S$ because we found that including extra structural information besides Bemis–Murcko scaffolds helps to improve conditional generation performance. The detailed scaffold extraction workflow is shown in Additional file 1: Supplementary Text 2. For each molecule $G$, the conditional code $c=(c_1,c_2,\ldots,c_{N_S})$ is set to be the binary vector such that $c_i=1$ if $G$ contains $s_i$ as a substructure and $c_i=0$ otherwise. We refer to $c$ as the scaffold fingerprint of $G$, as it can in fact be viewed as a substructure fingerprint based on the scaffold set $S$. To generate molecules containing the substructure $s \in S$, the fingerprint $c_s$ of $s$ is used as the conditional code. The output should contain two types of molecules:

  1. Molecules containing $s$ as their Bemis–Murcko scaffold.

  2. Molecules whose Bemis–Murcko scaffold contains $s$ but does not itself reside in $S$.

The procedure is demonstrated in Fig. 7. Using this method, detailed control can be exerted over the scaffold of the output structure.
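A minimal sketch of the scaffold fingerprint using RDKit substructure matching; the two scaffold SMILES here are placeholders for the DrugBank-derived set $S$:

```python
from rdkit import Chem

scaffold_set = [Chem.MolFromSmiles(s) for s in ("c1ccccc1", "c1ccncc1")]

def scaffold_fingerprint(mol):
    # c_i = 1 if the molecule contains scaffold s_i as a substructure
    return [int(mol.HasSubstructMatch(s)) for s in scaffold_set]

mol = Chem.MolFromSmiles("Cc1ccccc1")    # toluene
print(scaffold_fingerprint(mol))          # [1, 0]
```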

Fig. 7.

Fig. 7

Workflow for scaffold based molecule generation. The scaffold set $S$ is first extracted from compounds in DrugBank. The conditional code $c$ is set to be the substructure fingerprint based on $S$. Training is performed with the training samples labeled with $c_G$. After training, scaffold based generation is performed using the fingerprint $c_s$ of the query scaffold $s \in S$

Generation based on synthetic accessibility and drug-likeness

Drug-likeness and synthetic accessibility are two properties of significant importance in the development of novel drug candidates. Drug-likeness measures the consistency of a given compound with currently known drugs in terms of structural or physical properties, and is frequently used to filter out obviously non-drug-like compounds in the early phases of screening [34, 35]. Synthetic accessibility is also an important property for de novo drug design, since subsequent experimental validation requires synthesis of the given compound [36]. In this task, the model is required to generate molecules according to a given level of drug-likeness and synthetic accessibility. Drug-likeness is measured using the Quantitative Estimate of Drug-likeness (QED) [37], and synthetic accessibility is evaluated using the SA score [36]. The conditional code $c$ is defined as $c=(\mathrm{QED}, \mathrm{SA})$, where the QED and SA score are both calculated using RDKit [38].

In practice, instead of specifying a single value of QED and SA score, we often use intervals to express the requirements for the desired output molecules. This means that we are required to sample molecules from the distribution $p_\theta(G|c \in C)$, where the generation requirement is described as a set $C$ instead of a single point $c$. Here, we sample from $p_\theta(G|c \in C)$ by first drawing $c$ from $p(c|c \in C)$, and then drawing $G$ from $p_\theta(G|c)$. Sampling from $p(c|c \in C)$ can be achieved by first sampling $c$ from $p(c)$ using molecules from the test set, and then filtering $c$ according to the requirement $c \in C$.
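The sketch below illustrates computing $c=(\mathrm{QED}, \mathrm{SA})$ and sampling $c$ from $p(c|c \in C)$ over a hypothetical test set; the SA score implementation ships in RDKit's contrib directory:

```python
import os, sys, random
from rdkit import Chem, RDConfig
from rdkit.Chem import QED

sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
import sascorer

def code(mol):
    return (QED.qed(mol), sascorer.calculateScore(mol))

def sample_code(test_mols, C, n=100):
    # estimate p(c) from test-set molecules, keep only codes inside C
    codes = [code(m) for m in test_mols]
    in_C = [c for c in codes
            if C[0][0] < c[0] < C[0][1] and C[1][0] < c[1] < C[1][1]]
    return random.choices(in_C, k=n)      # c ~ p(c | c in C)

C1 = ((0.84, 1.0), (0.0, 1.9))            # high drug-likeness, easy to make
```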

Designing dual inhibitor against JNK3 and GSK-3β

With the ability to model multiple requirements at once, conditional generative models can be used to design compounds with specific activity profiles against multiple targets. Here, we consider the task of designing dual inhibitors against both c-Jun N-terminal kinase 3 (JNK3) and glycogen synthase kinase-3 beta (GSK-3β). Both targets are serine/threonine (S/T) kinases, and both have been shown to be related to the pathogenesis of various diseases [39, 40]. Notably, both JNK3 and GSK-3β are potential targets in the treatment of Alzheimer’s disease (AD). Jointly inhibiting JNK3 and GSK-3β may provide potential benefits for the treatment of AD.

The conditional code is set to be $c=(c_{\mathrm{JNK3}}, c_{\mathrm{GSK3\beta}})$, where $c_{\mathrm{JNK3}}$ and $c_{\mathrm{GSK3\beta}}$ are binary values indicating whether the compound is active against JNK3 and GSK-3β. For compounds in the ChEMBL dataset, $c_{\mathrm{JNK3}}$ and $c_{\mathrm{GSK3\beta}}$ are labeled using a separately trained predictor. A random forest (RF) classifier, which has been demonstrated to provide good performance for kinase activity prediction [41], is used as the predictor for GSK-3β and JNK3 activity, with ECFP6 [42] as the descriptor. The predictive model is trained using activity data from ExCAPE-DB [43], an integrated database with activity values from ChEMBL and PubChem [44]. The workflow for data extraction and predictor training is provided in Additional file 1: Supplementary Text 3. Only 1.2% of molecules in ChEMBL are predicted to be active against JNK3 or GSK-3β. This imbalance results in a low enrichment rate during conditional generation. For better results, the model is first trained in the unconditional setting, and then fine-tuned on the 1.2% of molecules mentioned above.
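A sketch of such a predictor (ECFP6 corresponds to a Morgan fingerprint of radius 3); the molecules and labels are placeholders for the ExCAPE-DB data, and hyperparameters other than the descriptor are our assumptions:

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def ecfp6(smiles, n_bits=2048):
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=3, nBits=n_bits)
    return np.array(fp)

X = np.stack([ecfp6(s) for s in ["CCO", "c1ccccc1O", "CCN(CC)CC"]])
y = np.array([0, 1, 0])                  # hypothetical activity labels

clf = RandomForestClassifier(n_estimators=500).fit(X, y)
print(clf.predict_proba(X)[:, 1])        # predicted probability of activity
```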

Training details

The graph generative models are trained using the ChEMBL dataset. The data processing workflow largely follows Olivecrona et al. [13], as described in Additional file 1: Supplementary Text 1. MXNet [45] is used to implement the networks, and the Adam optimizer [46] is used for network training. An initial learning rate of 0.001 is used together with a decay rate of 0.001 every 100 iterations. Other parameters of the optimizer are set to the default values suggested in [46] (that is, $\beta_1=0.9$, $\beta_2=0.999$ and $\epsilon=10^{-8}$). The training lasts for 5 epochs with a mini-batch size of 200.

During training, the decoding route is drawn from the distribution $q_\alpha(r|G)$. We tried three $\alpha$ values: 1.0, 0.8 and 0.6, as discussed previously. For $\alpha=1.0$, $k$ is set to 1 and the training can be performed on a single Nvidia GeForce GTX 1080Ti GPU for both MolMP and MolRNN; training lasts 14 h for MolMP and 16 h for MolRNN. For $\alpha=0.8$ and $\alpha=0.6$, $k$ is set to 5 and the training is performed synchronously on 4 GPUs; training lasts 30 h for MolMP and 35 h for MolRNN.

For the scaffold based and property based generation tasks, the conditional graph generator is trained using the same settings as the unconditional model. For the generation of GSK-3β and JNK3 inhibitors, the model is first trained using the full dataset, and then fine-tuned on the subset predicted to be active against GSK-3β or JNK3. The fine-tuning uses a learning rate of 0.0001 and a decay rate of 0.002 every 100 iterations. The fine-tuning lasts for 10 epochs and takes 1 h to finish.

In theory, the hyperparameters of the models mentioned above, including the training conditions (batch size, learning rate, decay rate, $\beta_1$, $\beta_2$), model architectures (the number of convolutional layers, the hidden size in each layer) as well as $\alpha$, should be optimized to achieve the best performance. However, due to the computational cost of both MolMP and MolRNN, we were unable to systematically optimize the hyperparameters. A thorough discussion is only given for $\alpha$, which determines the degree of randomness of $q_\alpha$. No optimization is performed on the model architecture except fitting it into memory.

Comparison with SMILES based methods

The proposed graph-based model is compared with several SMILES based models. Two types of methods, variational autoencoders (VAE) and language models (LM), are considered in this comparison. The implementation of the SMILES VAE follows Gómez-Bombarelli et al. [2]. The encoder contains three 1D convolutional layers, with 9, 9 and 10 filters and kernel sizes 9, 9 and 11 respectively, and a fully connected layer with 435 hidden units. The model uses 196 latent variables and a decoder with three GRU layers of 488 hidden units. VAEs for sequential data face an “optimization challenge” [47, 48]. While the original implementation uses KL-annealing to tackle this problem, we follow the method by Kingma et al. [49] of controlling the level of free bits, which offers higher flexibility and stability compared with KL-annealing. We restrict the minimum level of free bits to 0.03 for each latent variable.

For the LMs, two types of recurrent units are adopted. The first type uses GRU and includes two architectures: the first (SMILES GRU1) consists of three GRU layers with 512 hidden units each, and the second (SMILES GRU2) uses a wider GRU architecture with 1024 units, following the implementation by Olivecrona et al. [13]. Besides GRU, we also included an LSTM based SMILES language model following Segler et al. [12]. This architecture uses three LSTM layers, each with 1024 units.

Comparison with reinforcement learning (RL) based methods

We also compared the performance of the conditional generative model with three RL based methods. The first method, proposed by Olivecrona et al., maximizes the following objective during model optimization:

$$G(x) = -\left(\log p(x) + \sigma S(x) - \log q_\theta(x)\right)^2 \quad (18)$$

where $p(x)$ is the Prior network pre-trained on the ChEMBL dataset, and $q_\theta(x)$ is the Agent network for task-specific molecule generation. SMILES GRU2 is used as the architecture for both the Prior and the Agent.

This method is referred to as “REINVENT” [13]. We also include the following two baselines in the comparison. The first is a non-regularized RL method with the following objective:

$$G(x) = \sigma S(x) \quad (19)$$

We refer to this method as “Naive RL”. The second method includes a prior term in addition to Eq. 19:

$$G(x) = \sigma S(x) + \log p(x) \quad (20)$$

We refer to this method as “RL + Prior”.
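For reference, the three objectives can be written side by side as follows; the per-sample quantities are hypothetical placeholders:

```python
def reinvent_objective(log_prior, log_agent, score, sigma=20.0):
    return -(log_prior + sigma * score - log_agent) ** 2     # Eq. 18

def naive_rl_objective(score, sigma=20.0):
    return sigma * score                                     # Eq. 19

def rl_prior_objective(log_prior, score, sigma=20.0):
    return sigma * score + log_prior                         # Eq. 20
```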

Evaluation metrics

Several metrics have been employed to evaluate the performance of generative models:

Sample validity

To test whether the generative models are capable of producing chemically correct outputs, 300,000 structures are generated for each model and subsequently evaluated by RDKit for the rate of valid outputs. We also evaluate the ability of each model to produce novel structures, by assessing the rate of generated compounds that do not occur in the training set.

$D_{KL}$ and $D_{JS}$ for molecular properties

A good molecule generator should correctly model the distribution of important molecular properties. Therefore, the distributions of molecular weight (MW), log-partition coefficient (LogP) and QED between the generated set ($p_g$) and the test set ($p_{data}$) are compared for each method using the Kullback–Leibler divergence ($D_{KL}$) and the Jensen–Shannon divergence ($D_{JS}$):

$$D_{KL}(p_g \| p_{data}) = \int_{\mathbb{R}} p_g(x) \log \frac{p_g(x)}{p_{data}(x)} \, dx \quad (21)$$
$$D_{JS}(p_g \| p_{data}) = \frac{1}{2} D_{KL}\!\left(p_g \,\Big\|\, \frac{p_g+p_{data}}{2}\right) + \frac{1}{2} D_{KL}\!\left(p_{data} \,\Big\|\, \frac{p_g+p_{data}}{2}\right) \quad (22)$$

$D_{KL}$ and $D_{JS}$ are widely used for deep generative models for both training [17, 50] and evaluation [51]. Here, the two values are computed using the kernel density estimation method implemented in SciPy [52]. We used a Gaussian kernel with the bandwidth selected by Scott’s rule [53].
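A sketch of this computation with SciPy; synthetic samples stand in for the generated and test sets, and Scott's rule is SciPy's default bandwidth:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kl(p, q, dx):
    eps = 1e-12
    return np.sum(p * np.log((p + eps) / (q + eps))) * dx   # Eq. 21 on a grid

def js(p, q, dx):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m, dx) + 0.5 * kl(q, m, dx)          # Eq. 22

gen = np.random.normal(360, 90, 5000)    # hypothetical generated MW values
ref = np.random.normal(380, 85, 5000)    # hypothetical test-set MW values

xs = np.linspace(0, 900, 2048)
dx = xs[1] - xs[0]
p_g, p_data = gaussian_kde(gen)(xs), gaussian_kde(ref)(xs)
print(kl(p_g, p_data, dx), js(p_g, p_data, dx))
```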

Performance metrics ($R_c$, $K_{cc'}$ and $\mathrm{EOR}_c$) for task-specific molecule design

For discrete conditional codes, let $M_{c'}$ be a set of molecules sampled from the distribution $p_\theta(G|c')$. $M_{c'}$ is obtained by first sampling molecular graphs conditioned on $c'$ and then removing invalid molecules. The size $|M_{c'}|$ is set to 1000. Let $N_{cc'}$ be the set of molecules in $M_{c'}$ that satisfy the condition $c$ ($c$ may differ from $c'$). The ratio $K_{cc'}$ is defined as:

$$K_{cc'} = \frac{|N_{cc'}|}{|M_{c'}|} \quad (23)$$

The matrix $K_{cc'}$ can be used to evaluate the ability of the model to control its output based on the conditional code. When $c=c'$, this value gives the rate of correctly generated outputs, denoted $R_c$. A high quality conditional model should have a high value of $R_c$ and low values of $K_{cc'}$ for $c \ne c'$. In practice, we find that the values of $K_{cc'}$ for scaffold and property based generation are significantly smaller than $R_c$ and have relatively little influence on the model’s performance. Therefore, the $K_{cc'}$ results are omitted for the scaffold and property based tasks, and are only reported for the task of kinase inhibitor design.

Let $R^0_c$ be the rate of molecules in the training data that satisfy condition $c$. The enrichment over random, $\mathrm{EOR}_c$, is defined as:

$$\mathrm{EOR}_c = \frac{R_c}{R^0_c} \quad (24)$$

The definition is similar to that used in previous work [12], except that in their implementation $R^0_c$ is calculated using samples generated by the unconditioned model $p_\theta(G)$. For continuous codes, a subset $C$ of the conditional code space is used to describe the generation requirements. $M_C$ is sampled from $p_\theta(G|c \in C)$, and the values of $K_{CC'}$, $R_C$ and $\mathrm{EOR}_C$ can be calculated in a similar manner.
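A minimal sketch of Eqs. 23–24; `satisfies` is a hypothetical predicate implementing the condition check (e.g. a scaffold match):

```python
def K(samples_by_code, satisfies, c, c_prime):
    M = samples_by_code[c_prime]          # valid molecules sampled with code c'
    N = [m for m in M if satisfies(m, c)]
    return len(N) / len(M)                # Eq. 23

def enrichment(samples_by_code, satisfies, c, training_set):
    R_c = K(samples_by_code, satisfies, c, c)            # diagonal: R_c
    R_c0 = sum(satisfies(m, c) for m in training_set) / len(training_set)
    return R_c / R_c0                                    # Eq. 24: EOR_c
```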

Rate of reproduced active compounds

For the target based generation tasks, the rate of reproduced molecules is also reported, following previous works [12, 13]. Take JNK3 as an example. During the evaluation, two sets of outputs are generated using two conditions: JNK3(+), GSK-3β(−) and JNK3(+), GSK-3β(+). The two sets of outputs are denoted $M_{c_1}$ and $M_{c_2}$ respectively. Here, $|M_{c_1}|$ and $|M_{c_2}|$ are both set to 50,000. Let $T$ be the set of active molecules within the JNK3 test set. The rate of reproduced molecules (reprod) is calculated as:

$$\mathrm{reprod} = \frac{|(M_{c_1} \cup M_{c_2}) \cap T|}{|T|} \quad (25)$$

For GSK-3β, the calculation can be done in a similar manner.

Sample diversity

For a good objective-based molecule generator, the outputs are not only required to satisfy the given condition $c$, but also to be structurally diverse. Benhenda [54] has suggested that the diversity of the model outputs should be consistent with the natural diversity of molecules satisfying $c$, and proposed the following statistic to evaluate the structural diversity of a given set of compounds:

$$I(M) = \frac{1}{|M|^2} \sum_{(x,y) \in M \times M} T_d(x,y) \quad (26)$$

where $M$ is the set of sampled molecules, and $T_d(x,y)$ is the Tanimoto distance between molecules $x$ and $y$, defined using the Tanimoto similarity $T_s$ as $T_d(x,y) = 1 - T_s(x,y)$. This metric is called the internal diversity of the molecule set $M$.

For each condition $c$, the natural diversity $I^0_c$ is first calculated using molecules from ChEMBL. The diversity of the conditional outputs, $I_c$, is then calculated for each model. Note that when calculating $I^0_c$ and $I_c$, we only include molecules that satisfy the condition $c$. Finally, the value $|I_c - I^0_c|$ is compared among the different models to assess their ability to reproduce the natural compound diversity.
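A sketch of Eq. 26 using RDKit; the fingerprint choice (Morgan, radius 2) is our assumption, as the text does not specify it:

```python
from itertools import product
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def internal_diversity(mols, radius=2, n_bits=2048):
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, radius, nBits=n_bits)
           for m in mols]
    # average Tanimoto distance T_d = 1 - T_s over all pairs in M x M
    dists = [1.0 - DataStructs.TanimotoSimilarity(a, b)
             for a, b in product(fps, fps)]
    return sum(dists) / len(fps) ** 2

mols = [Chem.MolFromSmiles(s) for s in ("CCO", "c1ccccc1", "CC(=O)NC")]
print(internal_diversity(mols))
```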

Results and discussion

Model performance and sample quality

Several randomly generated samples from MolRNN, grouped by molecular weight, are shown in Fig. 8a–c. A quantitative comparison between SMILES based and graph based models (MolMP and MolRNN) has been performed, and the results are summarized in Tables 1 and 2. We first analysed model performance in terms of the rate of valid outputs and the rate of valid and novel outputs. It can be seen from the results that both MolRNN and MolMP outperform all SMILES based methods. It is also noted that changing $\alpha$ from 1.0 to 0.8 significantly increases the rate of valid outputs for both MolMP and MolRNN, while further decreasing $\alpha$ produces only a marginal effect. The high validity of output structures from the graph-based models is not surprising, as the generation of SMILES imposes much stricter rules on the output compared with the generation of molecular graphs.

Fig. 8.

Fig. 8

A visualized demonstration of model outputs. a–c Output samples from MolRNN, grouped by molecular weight (a MW < 300, b 300 ≤ MW < 500, c MW ≥ 500); d, e Common mistakes made by SMILES based and graph based models respectively; f Examples of broken aromaticity occurring during graph generation

Table 1.

Comparison between SMILES based and graph-based generators in output validity

Model % valid % novel % valid and novel
SMILES VAE 0.804 ± 0.016 0.986 ± 0.000 0.793 ± 0.016
SMILES GRU1 0.886 ± 0.002 0.984 ± 0.000 0.872 ± 0.002
SMILES GRU2 0.932 ± 0.002 0.965 ± 0.001 0.899 ± 0.002
SMILES LSTM 0.935 ± 0.006 0.975 ± 0.001 0.912 ± 0.006
MolMP (α=1.0) 0.952 ± 0.002 0.98 ± 0.001 0.933 ± 0.001
MolMP (α=0.8) 0.962 ± 0.002 0.984 ± 0.001 0.946 ± 0.001
MolMP (α=0.6) 0.963 ± 0.001 0.988 ± 0.001** 0.951 ± 0.001
MolRNN (α=1.0) 0.967 ± 0.001 0.959 ± 0.000 0.928 ± 0.001
MolRNN (α=0.8) 0.970 ± 0.001 0.976 ± 0.001 0.947 ± 0.001
MolRNN (α=0.6) 0.970 ± 0.001 0.985 ± 0.000 0.955 ± 0.001***

Results are reported as mean ± SD. The best performance in each metric is highlighted in italics. For each metric, a paired t-test is carried out for the difference between the best and second best performing methods (*** for p ≤ 0.001, ** for p ≤ 0.01 and * for p ≤ 0.05). Multiple models are highlighted if the difference is not significant

Table 2.

Comparison between SMILES based and graph-based generators in $D_{KL}$ (×10⁻³) and $D_{JS}$ (×10⁻³)

Model MW LogP QED
DKL DJS DKL DJS DKL DJS
SMILES VAE 13.5 ± 0.6 3.6 ± 0.2 3.9 ± 0.4 0.9 ± 0.1 2.6 ± 0.4 0.6 ± 0.1
SMILES GRU1 8.6 ± 0.4 2.3 ± 0.1 3.1 ± 0.3 0.7 ± 0.0 1.5 ± 0.3 0.3 ± 0.1
SMILES GRU2 7.8 ± 0.3 2.0 ± 0.1 1.4 ± 0.2 0.3 ± 0.0 2.2 ± 0.3 0.5 ± 0.1
SMILES LSTM 6.5 ± 0.7 1.8 ± 0.2 3.4 ± 1.2 0.8 ± 0.3 1.9 ± 1.3 0.4 ± 0.3
MolMP (α=1.0) 11.5 ± 1.3 3.4 ± 0.4 7.0 ± 1.8 1.7 ± 0.4 5.3 ± 1.2 1.3 ± 0.3
MolMP (α=0.8) 8.3 ± 1.6 2.4 ± 0.5 4.3 ± 1.2 0.9 ± 0.2 2.7 ± 0.8 0.6 ± 0.2
MolMP (α=0.6) 8.4 ± 1.0 2.4 ± 0.3 5.0 ± 1.3 1.1 ± 0.4 3.0 ± 0.9 0.7 ± 0.2
MolRNN (α=1.0) 5.0 ± 0.6 1.4 ± 0.2 2.8 ± 0.5 0.7 ± 0.1 2.0 ± 0.6 0.5 ± 0.1
MolRNN (α=0.8) 4.1 ± 0.7 1.1 ± 0.2 1.6 ± 0.3 0.3 ± 0.1 1.0 ± 0.2 0.2 ± 0.0
MolRNN (α=0.6) 3.3 ± 0.2* 0.9 ± 0.1** 3.0 ± 0.4 0.5 ± 0.1 1.1 ± 0.4 0.2 ± 0.1

Results are reported as mean ± SD. The best performance in each metric is highlighted in italics. Paired t-tests are carried out for the difference between the best and second best performing methods (*** for p ≤ 0.001, ** for p ≤ 0.01 and * for p ≤ 0.05). Multiple models are highlighted if the difference is not significant

Figure 8d, e summarize the common mistakes made by SMILES based and graph based models during generation. The results in Fig. 8d show that the most common cause of invalid outputs for SMILES based models is grammar mistakes, such as unclosed parentheses or unpaired ring numberings. For the graph based model, the majority of invalid outputs are caused by broken aromaticity, as demonstrated in Fig. 8f. This is likely a result of the stepwise decoding pattern of graph based models: the decoder can only see part of the aromatic structure during generation, while the determination of aromaticity requires information about the entire ring. It is also observed that mistakes related to atom valence are relatively rare, meaning that those rules are easy to learn with graph convolution.

Graph based methods also have the advantage of giving highly interpretable outputs compared with SMILES, which means that a large portion of invalid outputs can be easily corrected if necessary. For example, broken aromaticity can be restored by iteratively refining the number of explicit hydrogens on aromatic atoms, and unclosed aromatic rings can be corrected simply by connecting the two ends with a new aromatic bond. Though possible, those corrections may introduce additional bias to the output samples depending on the implementation, and are thus not adopted in the subsequent tasks.

Next, we investigate the ability of the generators to learn the distribution of molecular properties, as shown in Table 2. The results show that MolRNN gives the best performance in $D_{KL}$ and $D_{JS}$ for molecular weight (MW) and QED, while SMILES GRU2 gives the best performance for LogP. For MolMP, although it outperforms SMILES GRU1 in the rate of valid outputs, it fails to give better performance in $D_{KL}$ and $D_{JS}$. This observation suggests that the molecule-level recurrent unit in MolRNN significantly improves the ability of the model to learn the data distribution.

Regarding the influence of $\alpha$ on $D_{KL}$ and $D_{JS}$, it is found that changing $\alpha$ from 1.0 to 0.8 significantly improves the performance of MolMP and MolRNN for all molecular properties. Further decreasing $\alpha$ to 0.6 has different effects on MolMP and MolRNN: for MolMP, it hurts the overall performance in $D_{KL}$ and $D_{JS}$, while for MolRNN, it improves the performance for molecular weight but significantly decreases the performance for LogP. Overall, $\alpha=0.8$ is a better choice for MolMP, and $\alpha=0.6$ is better suited for MolRNN.

Generally, MolRNN shows significant advantages among all generative models considered. In the subsequent evaluation of conditional generative models, the best performing graph based model (MolRNN) and the best performing SMILES based model (SMILES GRU2) are implemented as conditional models and compared across all tasks.

Scaffold-based generation

In the first task, conditional generative models are trained to produce molecules based on given scaffolds. To illustrate the results, scaffold 1, extracted from the antihypertensive drug Candesartan (see Fig. 9a), is used as an example, along with several related scaffolds (scaffolds 2–4) derived from scaffold 1 (Fig. 9a). Conditional codes $c$ are constructed for each scaffold, and output structures are produced according to the corresponding code.

Fig. 9.

Fig. 9

Results of scaffold based generation. a Candesartan and the extracted scaffolds (scaffold 1–4); b Output samples based on scaffold 1–4

Results for both the SMILES based and graph based conditional generators are given in Table 3. In terms of output validity, the graph based model is able to produce a higher fraction of valid outputs for scaffolds 1–4 compared with the SMILES based method, similar to the results of the unconditional models.

Table 3.

Performance of graph based and SMILES based model on scaffold diversification tasks

Condition (c) $R^0_c$ $I^0_c$ Model % valid $R_c$ $\mathrm{EOR}_c$ Diversity ($I_c$)
scaffold 1 7.9×10⁻⁵ 0.46 Graph 0.931 ± 0.008 0.86 ± 0.03 10865 0.45 ± 0.01
SMILES 0.924 ± 0.005 0.87 ± 0.01 10976 0.46 ± 0.01
scaffold 2 1.1×10⁻⁴ 0.50 Graph 0.900 ± 0.016 0.77 ± 0.04 6972 0.47 ± 0.02*
SMILES 0.896 ± 0.011 0.84 ± 0.01* 7607 0.44 ± 0.01
scaffold 3 7.9×10⁻⁵ 0.56 Graph 0.940 ± 0.019* 0.56 ± 0.08** 7086 0.60 ± 0.02
SMILES 0.898 ± 0.024 0.37 ± 0.07 4623 0.59 ± 0.03
scaffold 4 5.8×10⁻³ 0.82 Graph 0.982 ± 0.001*** 0.88 ± 0.01 151 0.815 ± 0.001
SMILES 0.969 ± 0.002 0.88 ± 0.00 151 0.819 ± 0.00***

Results are reported as mean ± SD. The best performance in each metric is highlighted in italics. Paired t-tests are carried out for the difference between the graph and SMILES based methods (*** for p ≤ 0.001, ** for p ≤ 0.01 and * for p ≤ 0.05)

In terms of the rate of correctly generated outputs ($R_c$), although the models are unable to achieve 100% correctness, the $R_c$ results are significantly higher than $R^0_c$, offering a high enrichment rate over random. Both the graph based and SMILES based models achieve $\mathrm{EOR}_c > 1000$ for scaffolds 1–3, as well as $\mathrm{EOR}_c > 100$ for scaffold 4, showing a promising ability to produce outputs enriched according to the given scaffold query. Comparing the $R_c$ results of the two architectures, the graph based model performs better for scaffold 3, while the SMILES based method performs better for scaffold 2. The two models have similar performance for scaffolds 1 and 4.

The structural diversity of the output samples is also evaluated for each model. Both the graph based and SMILES based methods result in a slightly lower output diversity $I_c$ compared with the natural diversity $I^0_c$. For scaffold 2, the graph based method performs better than the SMILES based method, while for scaffold 4, the SMILES based method yields the better result. For scaffolds 1 and 3, the difference between the graph based and SMILES based methods is not significant.

A comparison between the conditional generative models and RL based approaches is performed using scaffold 4 as an example. We set $\sigma=20$ and formulate the score function $S_c(x)$ as follows:

$$S_c(x) = \begin{cases} 1, & \text{if } x \text{ is valid and satisfies } c \\ 0, & \text{otherwise} \end{cases} \quad (27)$$

The results are summarized in Table 4. It is easily observed that all RL based approaches, including Naive RL, RL + Prior and REINVENT, are capable of achieving near perfect results on $R_c$. However, in terms of output diversity, the RL based methods yield worse performance compared with the conditional generative models. Among them, Naive RL results in the lowest output diversity of 0.468, followed by RL + Prior with an output diversity of 0.55. REINVENT achieves a much higher output diversity of 0.750, but this is still lower than that of the conditional generative models.

Table 4.

Comparison between conditional generative models and RL based models in the task of generating molecules containing scaffold 4

Model % valid $R_c$ $\mathrm{EOR}_c$ Diversity ($I_c$)
REINVENT 0.998 ± 0.000 1.000 ± 0.000 172 0.75 ± 0.01
Naive RL 0.984 ± 0.015 0.999 ± 0.001 172 0.48 ± 0.08
RL + Prior 0.999 ± 0.001 1.000 ± 0.000 172 0.55 ± 0.09
Graph 0.982 ± 0.001 0.88 ± 0.01 151 0.815 ± 0.001
SMILES 0.969 ± 0.002 0.88 ± 0.00 151 0.819 ± 0.000

Results are reported as mean ± SD

The best performance in each metric is highlighted in italics

The results above show that conditional generators and RL based methods have opposite performance in $R_c$ and $I_c$. This is mainly because the two methods operate on different objectives. The former, trained using maximum likelihood estimation (MLE), optimizes $D_{KL}(p(x|c) \| q_\theta)$ (the proof is given in Additional file 1: Supplementary Text 5). During training, the conditional generative model is encouraged to cover all modes of the data distribution, but is not penalized for placing probability mass on undesired outputs, and therefore yields a lower $R_c$.

The RL based approaches, however, optimize a completely different objective. It can be proved that maximizing Eq. 18 is equivalent to minimizing the reversed KL divergence $D_{KL}(q_\theta \| p(x|c))$. In fact, if the score $\sigma S(x)$ is formulated as $\log p(c|x)$, the log-likelihood that molecule $x$ satisfies the requirement $c$, we can obtain the following relationship between $G(x)$ and $D_{KL}$:

$$\nabla_\theta D_{KL}(q_\theta \| p(x|c)) = -\mathbb{E}_{x \sim q_\theta}[\nabla_\theta G(x)] \quad (28)$$

The derivation is given in Additional file 1: Supplementary Text 6. This objective forces the model to comply with the given condition $c$, but may result in mode collapse and therefore lower output diversity. In short, conditional generative models and RL based methods each emphasize a different aspect of the molecule distribution, and future research may explore the possibility of combining the two.

Several samples generated by the graph based model are given for each scaffold in Fig. 9b. Recall that the outputs for a given scaffold $s$ should contain two types of molecules: (1) molecules with $s$ as their Bemis–Murcko scaffold and (2) molecules whose Bemis–Murcko scaffold contains $s$ but does not reside inside $S$. Both types are observed for scaffolds 1–4, as shown in Fig. 9b. By further investigating the generated samples, it is observed that the model seems to have learned the side-chain characteristics of each scaffold. For example, samples generated from scaffolds 1–3 usually have their substitutions at restricted positions, and frequently contain a long aliphatic side chain. Interestingly, this reflects the structure–activity relationship (SAR) of angiotensin II (Ang II) receptor antagonists [55]. In fact, scaffolds 1–3 have long been treated as privileged structures against Ang II receptors [28], and as a result, molecules containing scaffolds 1–3 are largely biased towards those that match the SAR rules for the target. When trained on this biased dataset, the model memorizes the underlying structure–activity relationship as a byproduct of scaffold based learning. This characteristic is beneficial for the generation of libraries containing specified privileged structures.

Generation based on drug-likeness and synthetic accessibility

In this task, the generative model is used to produce molecules according to requirements on drug-likeness and synthetic accessibility. The conditional code is specified as $c=(\mathrm{QED}, \mathrm{SA})$. In the first experiment, the models are required to generate molecules based on the following requirements expressed as subsets of the conditional code space: $C_1=(0.84,1)\times(0,1.9)$, $C_2=(0,0.27)\times(0,2.5)$, $C_3=(0.84,1)\times(3.4,+\infty)$ and $C_4=(0,0.27)\times(4.8,+\infty)$.

The values are determined from the distribution of QED and SAscore in the ChEMBL dataset (see Fig. 10a) using the 90% and 10% quantiles. The conditions are illustrated in Fig. 10b. The four sets represent four classes of molecules; the first class $C_1$, which contains structures with high drug-likeness and high synthetic accessibility, defines the set of compounds that is most important for drug design.

Fig. 10.

Fig. 10

Location of $C_1$–$C_4$ and $c_1$–$c_4$. a Distribution of QED and SAscore in the ChEMBL dataset; b Location of the input conditions ($C_1$–$C_4$ and $c_1$–$c_4$)

Quantitative evaluations of the graph based and SMILES based models are given in Table 5. Again, under all conditions ($C_1$–$C_4$), the graph based model outperforms the SMILES based model in the rate of valid outputs. The difference is most significant for the conditions specifying low synthetic accessibility (that is, high SAscore, given by $C_3$ and $C_4$). This observation suggests that the SMILES based model has difficulty generating complex structures while maintaining structural validity.

Table 5.

Performance of graph based and SMILES based model on property based generation tasks

Condition (C) $R^0_C$ $I^0_C$ Model % valid $R_C$ $\mathrm{EOR}_C$ Diversity ($I_C$)
C1 0.009 0.810 Graph 0.997 ± 0.000*** 0.55 ± 0.01*** 61 0.798 ± 0.002
SMILES 0.995 ± 0.001 0.51 ± 0.00 57 0.806 ± 0.000***
C2 0.012 0.850 Graph 0.970 ± 0.002*** 0.55 ± 0.01** 46 0.841 ± 0.001
SMILES 0.944 ± 0.001 0.52 ± 0.00 43 0.841 ± 0.001
C3 0.011 0.868 Graph 0.957 ± 0.001*** 0.35 ± 0.01** 32 0.864 ± 0.001
SMILES 0.894 ± 0.007 0.31 ± 0.00 28 0.866 ± 0.001**
C4 0.008 0.867 Graph 0.929 ± 0.003*** 0.73 ± 0.01** 91 0.863 ± 0.001
SMILES 0.613 ± 0.015 0.66 ± 0.00 82 0.863 ± 0.000

Results are reported as mean ± SD. The best performance in each metric is highlighted in italics. Paired t-tests are carried out for the difference between the graph and SMILES based methods (*** for p ≤ 0.001, ** for p ≤ 0.01 and * for p ≤ 0.05)

The graph based model also provides better performance in terms of $R_C$ and $\mathrm{EOR}_C$, as shown in Table 5. It is noted that both the graph and SMILES based models give comparatively low $R_C$ and $\mathrm{EOR}_C$ for condition $C_3$, which corresponds to molecules with high drug-likeness and low synthetic accessibility. This result is relatively easy to understand: since the definition of drug-likeness contains a requirement for high synthetic accessibility, finding molecules with both a high QED score and a high SAscore is in itself a difficult task. For the other conditions, the $R_C$ results for both models vary from 50 to 70%. Those values are lower than in the scaffold based task, but nonetheless show enrichment over the ChEMBL distribution for all conditions. The diversity of the generated samples is also reported. Unlike the performance in % valid and $R_C$, the SMILES based method produces outputs with slightly higher diversity than the graph based method; the difference is statistically significant for tasks $C_1$ and $C_3$.

We compared the conditional generative models with the RL based methods using $C_1$ as an example. As in “Scaffold-based generation”, we set $\sigma$ to 20 and use the discrete score function $S_C(x)$ defined in Eq. 27. The results are summarized in Table 6. Overall, the performance of the RL based methods is similar to that in the scaffold based task: all RL methods achieve a high level of $R_C$, but with lower output diversity.

Table 6.

Comparison between conditional generative models and RL based models in the task of generating molecules satisfying condition $C_1$ (that is, QED > 0.84 and SA score < 1.9)

Model % valid Rc EORc Diversity (Ic)
REINVENT 0.999 ± 0.001 0.986 ± 0.004 110 0.73 ± 0.07
Naive RL 0.993 ± 0.006 0.948 ± 0.052 105 0.64 ± 0.05
RL + Prior 1.000 ± 0.000 0.999 ± 0.000 111 0.44 ± 0.16
Graph 0.929 ± 0.003 0.73 ± 0.01 91 0.863 ± 0.001
SMILES 0.613 ± 0.015 0.66 ± 0.00 82 0.863 ± 0.000

Results are reported as Mean±SD

The best performance in each metric is highlighted in italics face

For a visual demonstration, the distributions of QED and SAscore for the output samples from the graph based generator are shown in Fig. 11. Random samples are also drawn for each class and are visualized in Fig. 12. The structural features of the output samples are mostly consistent with the predefined conditions, with small and simple molecules for C1 and highly complex molecules for C4.

Fig. 11 Distribution of QED and SAscore for generated results: the upper row shows the distributions of QED and SAscore of molecules generated under conditions C1, C2, C3 and C4, with the conditions C1–C4 shown as intervals represented by error bars. The lower row shows the distributions of QED and SAscore of molecules generated using single point conditions (c1, c2, c3 and c4), with the conditions c1–c4 represented as dots in the plot

Fig. 12 Samples generated under the four predefined conditions on drug-likeness and synthetic accessibility score

Note that the conditional models also support generation based on a given point in (QED, SAscore) space. This is demonstrated visually using the graph based conditional model, with the generation process now conditioned on a single point of the conditional code c. Here, four conditions are used: c1 = (0.84, 1.9), c2 = (0.27, 2.5), c3 = (0.84, 3.8) and c4 = (0.27, 4.8). Those conditions are also shown in Fig. 10.
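Concretely, such single point generation could look like the following sketch, where `model.sample` is a hypothetical API for the trained conditional generator:

```python
# Sketch of single-point conditional sampling; `model.sample` is a
# hypothetical interface, not the repository's actual API.
codes = {
    "c1": (0.84, 1.9),  # high QED, low SAscore
    "c2": (0.27, 2.5),
    "c3": (0.84, 3.8),
    "c4": (0.27, 4.8),
}
samples = {name: model.sample(n=1000, condition=code)
           for name, code in codes.items()}
```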

The distributions of QED and SAscore for the molecules output by the graph based model are shown in Fig. 10e–h. The results show that, although the requirement is specified using a single value of QED and SAscore, the distributions of the two properties for the output samples are relatively dispersed. This result is not surprising, since QED and SAscore are relatively abstract descriptions of structural features, and a small modification of a molecule's structure may lead to significant changes in its QED and SA scores. Nonetheless, the generated samples are enriched around the corresponding code c. It is also observed that the distribution of SAscore is more concentrated than that of QED. This is probably because SAscore is a direct measurement of molecular graph complexity, which may be easier for the graph based generator to model. In contrast, QED is a more abstract descriptor related to various molecular properties.

Generating dual inhibitors for JNK3 and GSK-3β

In this task, the models are used to generate dual inhibitors of JNK3 and GSK-3β. A predictive model is first used to label the conditional codes for the ChEMBL dataset, and the conditional graph generator is trained on the labeled training set. The two predictors yield good results in general, with AUC = 0.983 for JNK3 and AUC = 0.984 for GSK-3β. The ROC curves for the two models are shown in Additional file 2: Figure S4.
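As an illustration of this labeling step, the sketch below trains a random forest classifier on ECFP6 fingerprints (RF and ECFP are the components named in this work; the hyperparameters and stand-in data are assumptions):

```python
# Sketch of an activity predictor for labeling the conditional codes.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def ecfp6(smiles, n_bits=2048):
    # ECFP6 corresponds to a Morgan fingerprint with radius 3
    fp = AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles), 3, nBits=n_bits)
    arr = np.zeros((n_bits,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

# Stand-in data; in practice use the curated JNK3/GSK-3beta activity sets
train_X = np.stack([ecfp6(s) for s in ["CCO", "c1ccccc1O", "CCN", "c1ccncc1"]])
train_y = np.array([0, 1, 0, 1])

clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(train_X, train_y)
probs = clf.predict_proba(train_X)[:, 1]
print(roc_auc_score(train_y, probs))  # evaluate on a held-out set in practice
```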

Results for both the SMILES based and graph based conditional generators are given in Table 7. In terms of output validity, the graph based model outperforms the SMILES based model in generating GSK-3β selective and JNK3 selective compounds, but for the generation of dual inhibitors, the SMILES based model achieves better performance. In terms of Rc and EORc, the SMILES based model obtains better performance in generating dual inhibitors and selective inhibitors against GSK-3β, while the graph based model performs better in the task of generating JNK3 selective inhibitors. The Kcc matrices for the graph based and SMILES based models are shown in Table 8. For both models, it is noted that when generating compounds that are active against both JNK3 and GSK-3β, a significant fraction of the outputs fall into the category of GSK-3β positive and JNK3 negative. Nonetheless, in terms of the enrichment over random (EORc), the two models achieve high performance for all selectivity combinations. Note that selective inhibitors of GSK-3β are relatively enriched in the ChEMBL database, according to the result of the predictor. In comparison, selective inhibitors against JNK3 and dual inhibitors of both JNK3 and GSK-3β are much rarer. However, the model is still able to achieve significant enrichment for these two types of selectivity. This result suggests potential applications for target combinations with low data enrichment rates.

Table 7. Performance of graph based and SMILES based models on the inhibitor generation task

| Condition (c) | R0 | I0 | Model | % valid | Rc | EORc | Diversity |
|---|---|---|---|---|---|---|---|
| GSK-3β(+), JNK3(+) | 0.0008 | 0.806 | Graph | 0.939 ± 0.007 | 0.53 ± 0.01 | 666 | 0.783 ± 0.006 |
| | | | SMILES | 0.959 ± 0.003** | 0.56 ± 0.01*** | 697 | 0.784 ± 0.003 |
| GSK-3β(+), JNK3(−) | 0.01 | 0.860 | Graph | 0.932 ± 0.007 | 0.42 ± 0.01 | 42 | 0.851 ± 0.001 |
| | | | SMILES | 0.928 ± 0.003* | 0.47 ± 0.01*** | 47 | 0.854 ± 0.001** |
| GSK-3β(−), JNK3(+) | 0.0008 | 0.829 | Graph | 0.955 ± 0.003** | 0.61 ± 0.00*** | 759 | 0.814 ± 0.002 |
| | | | SMILES | 0.944 ± 0.003 | 0.56 ± 0.01 | 698 | 0.821 ± 0.001*** |

Results are reported as mean ± SD. The best performance in each metric is highlighted in italics. Paired t-tests are carried out for the difference between the graph based and SMILES based methods (*** for p < 0.001, ** for p < 0.01 and * for p < 0.05)

Table 8. The Kcc matrix for the kinase inhibitor generation task; the diagonal elements Kcc = Rc are omitted since they have been reported in Table 7

| Condition (c) | Model | GSK-3β(+), JNK3(+) | GSK-3β(+), JNK3(−) | GSK-3β(−), JNK3(+) |
|---|---|---|---|---|
| GSK-3β(+), JNK3(+) | Graph | – | 0.178 ± 0.007 | 0.018 ± 0.001 |
| | SMILES | – | 0.167 ± 0.010* | 0.063 ± 0.006 |
| GSK-3β(+), JNK3(−) | Graph | 0.034 ± 0.001*** | – | 0.003 ± 0.000*** |
| | SMILES | 0.082 ± 0.007 | – | 0.023 ± 0.002 |
| GSK-3β(−), JNK3(+) | Graph | 0.024 ± 0.004*** | 0.022 ± 0.002*** | – |
| | SMILES | 0.083 ± 0.007 | 0.057 ± 0.002 | – |

Results are reported as mean ± SD. The best performance in each metric is highlighted in italics. Paired t-tests are carried out for the difference between the graph based and SMILES based methods (*** for p < 0.001, ** for p < 0.01 and * for p < 0.05)

Similar to previous tasks, a comparison with RL based methods is performed, here focusing on the task of generating dual inhibitors against JNK3 and GSK-3β. For the design of the score function, we employ Sc(x) similar to that used in previous tasks (Eq. 27), with the value of σ set to 20 for Sd. The results are summarized in Table 9. Note that the result for RL + Prior is omitted, since in this task we found that it tends to collapse quickly to a single molecule and could not provide meaningful results. The performance of Naive RL and REINVENT is similar to that reported in previous sections: both RL based methods achieve high values of Rc, but have much lower output diversity.

Table 9. Comparison between conditional generative models and RL based models on the task of generating dual inhibitors against GSK-3β and JNK3

| Model | % valid | Rc | EORc | Diversity (Ic) |
|---|---|---|---|---|
| REINVENT | 0.999 ± 0.001 | 0.996 ± 0.005 | 1245 | 0.3 ± 0.2 |
| Naive RL | 0.987 ± 0.007 | 0.969 ± 0.022 | 1211 | 0.4 ± 0.1 |
| Graph | 0.955 ± 0.003 | 0.61 ± 0.00 | 759 | 0.814 ± 0.002 |
| SMILES | 0.944 ± 0.003 | 0.56 ± 0.01 | 698 | 0.821 ± 0.001 |

Results are reported as mean ± SD. The best performance in each metric is highlighted in italics

To better demonstrate the structural distribution of the generated samples, a visualization based on t-SNE [56] is performed using ECFP6 fingerprints. The samples generated under different selectivity specifications, together with the molecules in the test set for each target, are projected into two-dimensional embeddings and shown in Fig. 13a–d. The results show that the conditional generator tends to produce molecules near the test set samples, which is consistent with observations based on other methods [12]. It is also observed that molecules generated under different selectivity conditions occupy distinct regions of chemical space.
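A sketch of this projection, assuming `test_smiles` and `generated_smiles` hold the respective SMILES sets (the t-SNE settings here are illustrative, not those used for Fig. 13):

```python
# Sketch: t-SNE projection of ECFP6 fingerprints.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.manifold import TSNE

def fp_matrix(smiles_list, n_bits=2048):
    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue
        arr = np.zeros((n_bits,))
        DataStructs.ConvertToNumpyArray(
            AllChem.GetMorganFingerprintAsBitVect(mol, 3, nBits=n_bits), arr)
        rows.append(arr)
    return np.stack(rows)

test_smiles = ["c1ccccc1O", "CCO"]       # stand-ins
generated_smiles = ["c1ccncc1", "CCN"]   # stand-ins
X = np.vstack([fp_matrix(test_smiles), fp_matrix(generated_smiles)])
# Tiny perplexity only because of the stand-in data size
emb = TSNE(n_components=2, perplexity=1, random_state=0).fit_transform(X)
```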

Fig. 13 Visualizing the distribution of generated samples for each target. The figure shows the t-SNE visualization of: a molecules from the test set of GSK-3β and samples conditioned on JNK3(−), GSK-3β(+); b molecules from the test set of GSK-3β and samples conditioned on JNK3(+), GSK-3β(+); c molecules from the test set of JNK3 and samples conditioned on JNK3(+), GSK-3β(−); d molecules from the test set of JNK3 and samples conditioned on JNK3(+), GSK-3β(+)

For each selectivity condition, several molecules sampled from the model are shown in Fig. 14a–c. Investigating the generated structures in detail, it can be observed that the model tends to generate samples containing well-established scaffolds for the corresponding target. For JNK3, structures such as diaminopurines [57] and triazolones [58], which have been frequently used in the design of JNK inhibitors, occur often in the generated samples. The observation is the same for GSK-3β, with examples like the 2,3-bis-arylmaleimides, a class of widely studied GSK-3 inhibitors [59]. On the other hand, aminopyrimidines appear frequently in the outputs of all selectivity conditions, but are more enriched among the generated dual inhibitors. Those observations show good interpretability of the outputs and indicate that the structural features of the generated samples are in line with existing knowledge about the two targets.

Fig. 14 Samples conditioned on different selectivity conditions. a–c Generated samples under different selectivity conditions (a dual inhibitors, b GSK-3β selective inhibitors, c JNK3 selective inhibitors); d, e recovered actives of JNK3 and GSK-3β, respectively

Finally, we report the percentage of test set molecules reproduced by the model for each target. From the results, 10.3% of the molecules are reproduced for JNK3 and 6.0% for GSK-3β. Note that the molecules in the test set for each target were excluded from the ChEMBL training set in this task, which means that the method is capable of generating molecules that have been experimentally confirmed to be active without seeing them in the training sets of either the predictive model or the conditional generative model.
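This reproduction rate amounts to a canonical SMILES set intersection, as in the following sketch (variable names and data are assumptions):

```python
# Sketch: fraction of held-out actives reproduced by the generator,
# matched on canonical SMILES.
from rdkit import Chem

def canonical_set(smiles_iter):
    mols = (Chem.MolFromSmiles(s) for s in smiles_iter)
    return {Chem.MolToSmiles(m) for m in mols if m is not None}

generated_smiles = ["OCC", "c1ccccc1"]  # stand-ins
test_actives = ["CCO", "CC"]            # stand-ins ("CCO" matches "OCC")

actives = canonical_set(test_actives)
recovered = canonical_set(generated_smiles) & actives
print(len(recovered) / len(actives))  # 0.5 for the stand-ins
```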

Several recovered actives are shown in Fig. 14d–e. Those molecules show relatively high structural diversity, indicating that the model does not collapse onto a subgroup of active compounds. A quantitative evaluation using internal diversity shows that the recovered GSK-3β inhibitors have an internal diversity of 0.819, while the recovered JNK3 inhibitors have an internal diversity of 0.761. Those values, although slightly lower, are relatively close to the diversity of the test set molecules, which is 0.867 for GSK-3β and 0.852 for JNK3.
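A sketch of the internal diversity computation, assuming it is defined as one minus the mean pairwise Tanimoto similarity over ECFP6 fingerprints, a common formulation consistent with the scale of the reported values (the paper's exact definition may differ in detail):

```python
# Sketch: internal diversity = 1 - mean pairwise Tanimoto similarity.
from itertools import combinations
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def internal_diversity(smiles_list, n_bits=2048):
    fps = [AllChem.GetMorganFingerprintAsBitVect(
               Chem.MolFromSmiles(s), 3, nBits=n_bits)
           for s in smiles_list]
    sims = [DataStructs.TanimotoSimilarity(a, b)
            for a, b in combinations(fps, 2)]
    return 1.0 - sum(sims) / len(sims)

print(internal_diversity(["CCO", "c1ccccc1O", "c1ccncc1"]))  # stand-ins
```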

Conclusions

In this work, a new framework for de novo molecular design is proposed based on a graph generative model and is applied to solve several drug design problems. The graph generator is designed to be better suited to molecule generation, using a simple decoding scheme and a graph convolutional architecture that is less computationally expensive. Furthermore, a more flexible way of introducing decoding invariance is suggested. The method is trained on molecules from the ChEMBL dataset and demonstrates better performance than SMILES based methods, especially in terms of the rate of valid outputs.

To generate molecules with specific requirements, we propose to use a conditional generative model, which offers high flexibility and does not require reinforcement learning. The model is applied to problems highly relevant to drug design: generating molecules based on a given scaffold, generating molecules with good drug-likeness and synthetic accessibility, and generating molecules with a specific activity profile against multiple targets. The results show that the conditional generative model can effectively produce enriched outputs based on the given requirements. A comparison with RL based methods shows that although the conditional generative model yields lower output accuracy, it is capable of achieving higher output diversity.

This work can be extended in various directions. First of all, the models used in this work completely ignore stereochemistry. Stereochemistry is extremely important in drug development, and introducing this information would improve the applicability of the existing models. Secondly, for target based generation, it would be more helpful to jointly train the generator and the predictor, utilizing strategies such as semi-supervised learning [60, 61]. Finally, beyond the three tasks examined in this work, the conditional graph generator can be used in many other scenarios. To summarize, the graph generative architecture proposed in this work gives promising results in various drug design tasks, and it is worthwhile to explore its other potential applications.

Additional files

13321_2018_287_MOESM1_ESM.pdf (238.6KB, pdf)

Additional file 1. Containing additional information about the implementation details of experiments.

13321_2018_287_MOESM2_ESM.pdf (502.7KB, pdf)

Additional file 2. Containing supplementary figures.

Authors' contributions

YL formulated the concept and contributed to the implementation. YL wrote the manuscript; LZ and ZL reviewed and edited the manuscript. All authors read and approved the final manuscript.

Acknowledgements

We would like to thank Xiaodong Dou for his help on the discussion of generated inhibitors of JNK3 and GSK3β. Thanks to Bo Yang who helped with the profiling of Additional file 1: Supplementary Text 8.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

The materials supporting this article, including the source code, sampled molecules and pretrained models, are available at: https://github.com/kevinid/molecule_generator.

Funding

This research was supported by the National Natural Science Foundation of China (Grant Nos. 81573273, 81673279, 21572010 and 21772005), the National Major Scientific and Technological Special Project for “Significant New Drugs Development” (2018ZX09735001-003) and Beijing Natural Science Foundation (7172118).

Ethics approval and consent to participate

Not applicable.

Consent for publication

The authors declare no competing financial interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Abbreviations

SMILES

simplified molecular-input line-entry system

RNN

recurrent neural network

LM

language model

RF

random forest

RL

reinforcement learning

VAE

variational autoencoder

GRU

gated recurrent unit

DRD2

dopamine receptor D2

JNK3

c-Jun N-terminal kinase 3

GSK3β

glycogen synthase kinase-3 beta

QED

quantitative estimate of drug-likeness

SA

synthetic accessibility

ECFP

extended connectivity fingerprint

t-SNE

t-distributed stochastic neighbor embedding

DKL

Kullback–Leibler divergence

DJS

Jensen–Shannon divergence

Footnotes

Electronic supplementary material

The online version of this article (10.1186/s13321-018-0287-6) contains supplementary material, which is available to authorized users.

Contributor Information

Liangren Zhang, Email: liangren@bjmu.edu.cn.

Zhenming Liu, Email: zmliu@bjmu.edu.cn.

References

1. Schneider G, Fechner U. Computer-based de novo design of drug-like molecules. Nat Rev Drug Discov. 2005;4(8):649–663. doi: 10.1038/nrd1799
2. Gómez-Bombarelli R, Duvenaud D, Hernández-Lobato JM, Aguilera-Iparraguirre J, Hirzel TD, Adams RP, Aspuru-Guzik A (2016) Automatic chemical design using a data-driven continuous representation of molecules. arXiv preprint arXiv:1610.02415v1
3. Böhm H-J. The computer program LUDI: a new method for the de novo design of enzyme inhibitors. J Comput Aided Mol Des. 1992;6(1):61–78. doi: 10.1007/BF00124387
4. Mauser H, Stahl M. Chemical fragment spaces for de novo design. J Chem Inf Model. 2007;47(2):318–324. doi: 10.1021/ci6003652
5. Reutlinger M, Rodrigues T, Schneider P, Schneider G. Multi-objective molecular de novo design by adaptive fragment prioritization. Angew Chem Int Ed. 2014;53(16):4244–4248. doi: 10.1002/anie.201310864
6. Hiss JA, Reutlinger M, Koch CP, Perna AM, Schneider P, Rodrigues T, Haller S, Folkers G, Weber L, Baleeiro RB. Combinatorial chemistry by ant colony optimization. Future Med Chem. 2014;6(3):267–280. doi: 10.4155/fmc.13.203
7. Dey F, Caflisch A. Fragment-based de novo ligand design by multiobjective evolutionary optimization. J Chem Inf Model. 2008;48(3):679–690. doi: 10.1021/ci700424b
8. Yuan Y, Pei J, Lai L. LigBuilder 2: a practical de novo drug design approach. J Chem Inf Model. 2011;51(5):1083–1091. doi: 10.1021/ci100350u
9. Hartenfeller M, Proschak E, Schüller A, Schneider G. Concept of combinatorial de novo design of drug-like molecules by particle swarm optimization. Chem Biol Drug Des. 2008;72(1):16–26. doi: 10.1111/j.1747-0285.2008.00672.x
10. Goodfellow I, Bengio Y, Courville A. Deep learning. Massachusetts: MIT Press; 2016
11. Lipton ZC, Berkowitz J, Elkan C (2015) A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019
12. Segler MH, Kogej T, Tyrchan C, Waller MP. Generating focussed molecule libraries for drug discovery with recurrent neural networks. ACS Cent Sci. 2018;4(1):120–130. doi: 10.1021/acscentsci.7b00512
13. Olivecrona M, Blaschke T, Engkvist O, Chen H. Molecular de-novo design through deep reinforcement learning. J Cheminform. 2017;9(1):48. doi: 10.1186/s13321-017-0235-x
14. Cho K, Van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv preprint arXiv:1406.1078
15. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2011;40(D1):1100–1107. doi: 10.1093/nar/gkr777
16. Popova M, Isayev O, Tropsha A (2017) Deep reinforcement learning for de-novo drug design. arXiv preprint arXiv:1711.10907
17. Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114
18. Irwin JJ, Shoichet BK. ZINC: a free database of commercially available compounds for virtual screening. J Chem Inf Model. 2005;45(1):177. doi: 10.1021/ci049714+
19. Blaschke T, Olivecrona M, Engkvist O, Bajorath J, Chen H. Application of generative autoencoder in de novo molecular design. Mol Inform. 2018;37(1–2):1700123. doi: 10.1002/minf.201700123
20. Johnson DD (2017) Learning graphical state transitions. In: International conference on learning representations
21. Simonovsky M, Komodakis N (2018) GraphVAE: towards generation of small graphs using variational autoencoders. arXiv preprint arXiv:1802.03480
22. Li Y, Vinyals O, Dyer C, Pascanu R, Battaglia P (2018) Learning deep generative models of graphs. In: International conference on learning representations
23. He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. arXiv preprint arXiv:1603.05027
24. Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K, Pande V (2018) MoleculeNet: a benchmark for molecular machine learning. arXiv preprint arXiv:1703.00564
25. Simonovsky M, Komodakis N (2017) Dynamic edge-conditioned filters in convolutional neural networks on graphs. arXiv preprint arXiv:1704.02901
26. Lima Guimaraes G, Sanchez-Lengeling B, Outeiral C, Cunha Farias PL, Aspuru-Guzik A (2017) Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models. arXiv preprint arXiv:1705.10843
27. Neil D, Segler M, Guasch L, Ahmed M, Plumbley D, Sellwood M, Brown N (2018) Exploring deep recurrent models with reinforcement learning for molecule design. In: International conference on learning representations
28. Braese S. Privileged scaffolds in medicinal chemistry: design, synthesis, evaluation. London: RSC Publishing; 2015
29. Bemis GW, Murcko MA. The properties of known drugs. 1. Molecular frameworks. J Med Chem. 1996;39(15):2887–2893. doi: 10.1021/jm9602928
30. Reis J, Gaspar A, Milhazes N, Borges FM. Chromone as a privileged scaffold in drug discovery: recent advances. J Med Chem. 2017;60(19):7941–7957. doi: 10.1021/acs.jmedchem.6b01720
31. Schuffenhauer A, Ertl P, Roggo S, Wetzel S, Koch MA, Waldmann H. The scaffold tree: visualization of the scaffold universe by hierarchical scaffold classification. J Chem Inf Model. 2007;47(1):47–58. doi: 10.1021/ci600338x
32. Varin T, Schuffenhauer A, Ertl P, Renner S. Mining for bioactive scaffolds with scaffold networks: improved compound set enrichment from primary screening data. J Chem Inf Model. 2011;51(7):1528–1538. doi: 10.1021/ci2000924
33. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34(Database issue):668–672. doi: 10.1093/nar/gkj067
34. Kadam R, Roy N. Recent trends in drug-likeness prediction: a comprehensive review of in silico methods. Indian J Pharm Sci. 2007;69(5):609. doi: 10.4103/0250-474X.38464
35. Tian S, Wang J, Li Y, Li D, Xu L, Hou T. The application of in silico drug-likeness predictions in pharmaceutical research. Adv Drug Deliv Rev. 2015;86:2–10. doi: 10.1016/j.addr.2015.01.009
36. Ertl P, Schuffenhauer A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J Cheminform. 2009;1(1):8. doi: 10.1186/1758-2946-1-8
37. Bickerton GR, Paolini GV, Besnard J, Muresan S, Hopkins AL. Quantifying the chemical beauty of drugs. Nat Chem. 2012;4(2):90–98. doi: 10.1038/nchem.1243
38. RDKit: Open Source Cheminformatics. http://www.rdkit.org/
39. Koch P, Gehringer M, Laufer SA. Inhibitors of c-Jun N-terminal kinases: an update. J Med Chem. 2014;58(1):72–95. doi: 10.1021/jm501212r
40. McCubrey JA, Davis NM, Abrams SL, Montalto G, Cervello M, Basecke J, Libra M, Nicoletti F, Cocco L, Martelli AM. Diverse roles of GSK-3: tumor promoter–tumor suppressor, target in cancer therapy. Adv Biol Regul. 2014;54:176. doi: 10.1016/j.jbior.2013.09.013
41. Merget B, Turk S, Eid S, Rippmann F, Fulle S. Profiling prediction of kinase inhibitors: toward the virtual assay. J Med Chem. 2016;60(1):474–485. doi: 10.1021/acs.jmedchem.6b01611
42. Rogers D, Hahn M. Extended-connectivity fingerprints. J Chem Inf Model. 2010;50(5):742–754. doi: 10.1021/ci100050t
43. Sun J, Jeliazkova N, Chupakhin V, Golib-Dzib J-F, Engkvist O, Carlsson L, Wegner J, Ceulemans H, Georgiev I, Jeliazkov V. ExCAPE-DB: an integrated large scale dataset facilitating big data analysis in chemogenomics. J Cheminform. 2017;9(1):17. doi: 10.1186/s13321-017-0203-5
44. Bolton EE, Wang Y, Thiessen PA, Bryant SH. PubChem: integrated platform of small molecules and biological activities. Annu Rep Comput Chem. 2008;4:217–241. doi: 10.1016/S1574-1400(08)00012-1
45. Chen T, Li M, Li Y, Lin M, Wang N, Wang M, Xiao T, Xu B, Zhang C, Zhang Z (2015) MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. CoRR abs/1512.01274
46. Kingma D, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
47. Bowman SR, Vilnis L, Vinyals O, Dai AM, Jozefowicz R, Bengio S (2015) Generating sentences from a continuous space. arXiv preprint arXiv:1511.06349
48. Chen X, Kingma DP, Salimans T, Duan Y, Dhariwal P, Schulman J, Sutskever I, Abbeel P (2016) Variational lossy autoencoder. arXiv preprint arXiv:1611.02731
49. Kingma DP, Salimans T, Jozefowicz R, Chen X, Sutskever I, Welling M. Improved variational inference with inverse autoregressive flow. In: Advances in neural information processing systems, pp 4743–4751
50. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial networks. arXiv preprint arXiv:1406.2661
51. Im DJ, Ma AH, Taylor GW, Branson K (2018) Quantitatively evaluating GANs with divergences proposed for training. In: International conference on learning representations
52. Jones E, Oliphant T, Peterson P et al (2001) SciPy: open source scientific tools for Python. http://www.scipy.org/
53. Scott DW. Multivariate density estimation: theory, practice, and visualization. New York: Wiley; 2008
54. Benhenda M (2017) ChemGAN challenge for drug discovery: can AI reproduce natural chemical diversity? arXiv preprint arXiv:1708.08227
55. Almansa C, Gómez LA, Cavalcanti FL, de Arriba AF, García-Rafanell J, Forn J. Synthesis and structure–activity relationship of a new series of potent AT1 selective angiotensin II receptor antagonists: 5-(biphenyl-4-ylmethyl)pyrazoles. J Med Chem. 1997;40(4):547–558. doi: 10.1021/jm9604383
56. van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9:2579–2605
57. Krenitsky VP, Nadolny L, Delgado M, Ayala L, Clareen SS, Hilgraf R, Albers R, Hegde S, D'Sidocky N, Sapienza J. Discovery of CC-930, an orally active anti-fibrotic JNK inhibitor. Bioorg Med Chem Lett. 2012;22(3):1433–1438. doi: 10.1016/j.bmcl.2011.12.027
58. Probst GD, Bowers S, Sealy JM, Truong AP, Hom RK, Galemmo RA, Konradi AW, Sham HL, Quincy DA, Pan H. Highly selective c-Jun N-terminal kinase (JNK) 2 and 3 inhibitors with in vitro CNS-like pharmacokinetic properties prevent neurodegeneration. Bioorg Med Chem Lett. 2011;21(1):315–319. doi: 10.1016/j.bmcl.2010.11.010
59. Osolodkin DI, Palyulin VA, Zefirov NS. Glycogen synthase kinase 3 as an anticancer drug target: novel experimental findings and trends in the design of inhibitors. Curr Pharm Des. 2013;19(4):665–679. doi: 10.2174/138161213804581972
60. Kingma DP, Mohamed S, Rezende DJ, Welling M. Semi-supervised learning with deep generative models. In: Advances in neural information processing systems, pp 3581–3589
61. Siddharth N, Paige B, van de Meent J-W, Desmaison A, Wood F, Goodman ND, Kohli P, Torr PH (2017) Learning disentangled representations with semi-supervised deep generative models. arXiv preprint arXiv:1706.00400
