Using Domain-Knowledge to Assist Lead Discovery in Early-Stage Drug Design

  • Conference paper
Inductive Logic Programming (ILP 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13191)

Abstract

We are interested in generating new small molecules which could act as inhibitors of a biological target, when there is limited prior information on target-specific inhibitors. This form of drug-design is assuming increasing importance with the advent of new disease threats for which known chemicals only provide limited information about target inhibition. In this paper, we propose the combined use of deep neural networks and Inductive Logic Programming (ILP) that allows the use of symbolic domain-knowledge (B) to explore the large space of possible molecules. Assuming molecules and their activities to be instances of random variables X and Y, the problem is to draw instances from the conditional distribution of X, given Y and B (\(D_{X|Y,B}\)). We decompose this into the constituent parts of obtaining the distributions \(D_{X|B}\) and \(D_{Y|X,B}\), and describe the design and implementation of models to approximate the distributions. The design consists of generators (to approximate \(D_{X|B}\) and \(D_{X|Y,B}\)) and a discriminator (to approximate \(D_{Y|X,B}\)). We investigate our approach using the well-studied problem of inhibitors for the Janus kinase (JAK) class of proteins. We first assume that no data on inhibitors are available for the target protein (JAK2), but that a small number of inhibitors are known for homologous proteins (JAK1, JAK3 and TYK2). We show that the inclusion of relational domain-knowledge results in a potentially more effective generator of inhibitors than simple random sampling from the space of molecules or a generator without access to symbolic relations. The results suggest a way of combining symbolic domain-knowledge and deep generative models to constrain the exploration of the chemical space of molecules, when there is limited information on target-inhibitors. We also show how samples from the conditional generator can be used to identify potentially novel target inhibitors.
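
The decomposition described in the abstract can be read as an application of Bayes' rule. As a sketch only, writing the distributions as probabilities (this notation is an assumption made here for illustration, and the body of the paper may state the decomposition differently):

\[
P(X \mid Y, B) \;=\; \frac{P(Y \mid X, B)\, P(X \mid B)}{P(Y \mid B)} \;\propto\; P(Y \mid X, B)\, P(X \mid B)
\]

Under this reading, \(P(X \mid B)\) plays the role of \(D_{X|B}\) (approximated by a generator) and \(P(Y \mid X, B)\) plays the role of \(D_{Y|X,B}\) (approximated by the discriminator). The denominator does not depend on X, so it acts only as a normalising constant when sampling molecules.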

Notes

  1. Such a model is only possible in the controlled experiment here. In practice, no inhibitors would be available for the target and activity values would have to be obtained by hit assays, or perhaps in silico docking calculations.

  2. Again, this is feasible in the controlled experiment here. In practice, we will have no inhibitors for the target, and we will have to perform this assessment on the data available for the target’s homologues (Tr).

  3. Could we have directly used ILP for constructing the discriminator? Yes, but there is substantial evidence to suggest that the use of ILP through BotGNNs results in better discriminators [4].

  4. A good reason to consider dissimilar molecules is that it allows us to explore more diverse molecules.

  5. It is likely that a BotGNN with access to the information in \(B_D\) along with the Chemprop prediction would result in a better proxy model. We do not explore this here.

References

  1. Schneider, P., et al.: Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19(5), 353–364 (2020)

  2. Gaulton, A., et al.: The ChEMBL database in 2017. Nucleic Acids Res. 45(D1), D945–D954 (2017)

  3. Williams, K., et al.: Cheaper faster drug development validated by the repositioning of drugs against neglected tropical diseases. J. R. Soc. Interface 12(104), 20141289 (2015)

  4. Dash, T., Srinivasan, A., Baskar, A.: Inclusion of domain-knowledge into GNNs using mode-directed inverse entailment. arXiv preprint arXiv:2105.10709 (2021)

  5. Muggleton, S.: Inverse entailment and progol. New Gener. Comput. 13(3), 245–286 (1995). https://doi.org/10.1007/BF03037227

  6. Bowman, S.R., Vilnis, L., Vinyals, O., Dai, A.M., Józefowicz, R., Bengio, S.: Generating sentences from a continuous space. In: CoNLL (2016)

  7. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  8. Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks. arXiv preprint arXiv:1505.00387 (2015)

  9. Krishnan, S.R., Bung, N., Bulusu, G., Roy, A.: Accelerating de novo drug design against novel proteins using deep learning. J. Chem. Inf. Model. 61(2), 621–630 (2021)

  10. Landrum, G., et al.: RDKit: open-source cheminformatics (2006). https://www.rdkit.org/docs/index.html

  11. Van Craenenbroeck, E., Vandecasteele, H., Dehaspe, L.: DMax’s functional group and ring library (2002). https://dtai.cs.kuleuven.be/software/dmax/

  12. Stokes, J.M., et al.: A deep learning approach to antibiotic discovery. Cell 180(4), 688–702 (2020)

  13. Srinivasan, A.: The aleph manual (2001). https://www.cs.ox.ac.uk/activities/programinduction/Aleph/aleph.html

  14. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: ICLR (2014)

  15. Hamilton, W.L., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: NIPS (2017)

  16. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)

  17. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT (2019)

  18. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners. OpenAI Blog 1(8), 9 (2019)

  19. Dymock, B.W., See, C.S.: Inhibitors of JAK2 and JAK3: an update on the patent literature 2010–2012. Exp. Opin. Ther. Pat. 23(4), 449–501 (2013)

  20. Dymock, B.W., Yang, E.G., Chu-Farseeva, Y., Yao, L.: Selective JAK inhibitors. Fut. Med. Chem. 6(12), 1439–1471 (2014)

  21. Mak, K.K., Rao, P.M.: Artificial intelligence in drug development: present status and future prospects. Drug Disc. Today 24(3), 773–780 (2019)

  22. Popova, M., Isayev, O., Tropsha, A.: Deep reinforcement learning for de novo drug design. Sci. Adv. 4(7), eaap7885 (2018)

  23. Segler, M.H., Kogej, T., Tyrchan, C., Waller, M.P.: Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4(1), 120–131 (2017)

  24. Born, J., Manica, M., Oskooei, A., Cadow, J., Markert, G., Martínez, M.R.: PaccMann\(^{RL}\): de novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning. iScience 24(4), 102269 (2021)

  25. Stahl, N., Falkman, G., Karlsson, A., Mathiason, G., Bostrom, J.: Deep reinforcement learning for multiparameter optimization in de novo drug design. J. Chem. Inf. Model. 59(7), 3166–3176 (2019)

  26. Bung, N., Krishnan, S.R., Bulusu, G., Roy, A.: De novo design of new chemical entities for SARS-CoV-2 using artificial intelligence. Fut. Med. Chem. 13(6), 575–585 (2021)

  27. Grisoni, F., Moret, M., Lingwood, R., Schneider, G.: Bidirectional molecule generation with recurrent neural networks. J. Chem. Inf. Model. 60(3), 1175–1183 (2020)

  28. Grechishnikova, D.: Transformer neural network for protein-specific de novo drug generation as a machine translation problem. Sci. Rep. 11(1), 1–13 (2021)

  29. Mahmood, O., Mansimov, E., Bonneau, R., Cho, K.: Masked graph modeling for molecule generation. Nat. Commun. 12(1), 1–12 (2021)

  30. Schwalbe-Koda, D., Gómez-Bombarelli, R.: Generative models for automatic chemical design. In: Schütt, K.T., Chmiela, S., von Lilienfeld, O.A., Tkatchenko, A., Tsuda, K., Müller, K.-R. (eds.) Machine Learning Meets Quantum Physics. LNP, vol. 968, pp. 445–467. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-40245-7_21

  31. Dash, T., Chitlangia, S., Ahuja, A., Srinivasan, A.: Incorporating domain knowledge into deep neural networks. arXiv preprint arXiv:2103.00180 (2021)

  32. Lavrač, N., Džeroski, S., Grobelnik, M.: Learning nonrecursive definitions of relations with LINUS. In: Kodratoff, Y. (ed.) EWSL 1991. LNCS, vol. 482, pp. 265–281. Springer, Heidelberg (1991). https://doi.org/10.1007/BFb0017020

  33. França, M.V.M., Zaverucha, G., d’Avila Garcez, A.S.: Fast relational learning using bottom clause propositionalization with artificial neural networks. Mach. Learn. 94(1), 81–104 (2013). https://doi.org/10.1007/s10994-013-5392-1

  34. Dash, T., Srinivasan, A., Vig, L., Orhobor, O.I., King, R.D.: Large-scale assessment of deep relational machines. In: Riguzzi, F., Bellodi, E., Zese, R. (eds.) ILP 2018. LNCS (LNAI), vol. 11105, pp. 22–37. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99960-9_2

  35. Lodhi, H.: Deep relational machines. In: Lee, M., Hirose, A., Hou, Z.-G., Kil, R.M. (eds.) ICONIP 2013. LNCS, vol. 8227, pp. 212–219. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-42042-9_27

  36. Dash, T., Srinivasan, A., Joshi, R.S., Baskar, A.: Discrete stochastic search and its application to feature-selection for deep relational machines. In: Tetko, I.V., Kůrková, V., Karpov, P., Theis, F. (eds.) ICANN 2019. LNCS, vol. 11728, pp. 29–45. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30484-3_3

  37. Dash, T., Srinivasan, A., Vig, L.: Incorporating symbolic domain knowledge into graph neural networks. Mach. Learn. 110(7), 1609–1636 (2021). https://doi.org/10.1007/s10994-021-05966-z

  38. Dash, T., Chitlangia, S., Ahuja, A., Srinivasan, A.: How to tell deep neural networks what we know. arXiv preprint arXiv:2107.10295 (2021)

  39. Stevens, R., Taylor, V., Nichols, J., Maccabe, A.B., Yelick, K., Brown, D.: AI for science. Technical report, Argonne National Lab. (ANL), Argonne, IL (United States) (2020)

  40. Kaalia, R., Srinivasan, A., Kumar, A., Ghosh, I.: ILP-assisted de novo drug design. Mach. Learn. 103(3), 309–341 (2016)

  41. Ertl, P., Schuffenhauer, A.: Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1(1), 1–11 (2009)

Acknowledgements

AS is a Visiting Professorial Fellow at UNSW, Sydney; and a TCS Affiliate Professor. We thank Indrajit Bhattacharya for thoughtful discussions on system-design.

Author information

Corresponding author

Correspondence to Tirtharaj Dash.

Appendices

A Domain-Knowledge Used in Experiments

The domain constraints in \(B_G\) are in the form of constraints on acceptable molecules. These constraints are broadly of two kinds: (i) Those concerned with the validity of a generated SMILES string. This involves various syntax-level checks, and is done here by the RDKit molecular modelling package; (ii) Problem-specific constraints on some bulk-properties of the molecule. These are: the molecular weight must be in the range (200, 700), the octanol-water partition coefficient (logP) must be below 6.0, and the synthetic accessibility score (SAS) must be below 5.0. We use the scoring approach proposed in [41] to compute the SAS of a molecule based on its SMILES representation.
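
As an illustration of how the bulk-property constraints above could be applied as a filter, the following is a minimal Python sketch. It is not the authors' implementation: the `MoleculeProperties` container and the function names are hypothetical, the validity flag and property values are assumed to have been computed beforehand (for example with RDKit and an SAS scorer), and reading (200, 700) as an open interval is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoleculeProperties:
    """Precomputed properties for one generated SMILES string (hypothetical container)."""
    smiles: str
    is_valid: bool      # outcome of a syntax-level validity check done elsewhere
    mol_weight: float   # molecular weight
    logp: float         # octanol-water partition coefficient
    sas: float          # synthetic accessibility score


def satisfies_bg(p: MoleculeProperties) -> bool:
    """Return True if the molecule passes the validity and bulk-property constraints."""
    if not p.is_valid:
        return False
    # (200, 700) is treated as an open interval here; that reading is an assumption.
    return 200.0 < p.mol_weight < 700.0 and p.logp < 6.0 and p.sas < 5.0


def filter_acceptable(candidates):
    """Keep only the candidates that satisfy every constraint."""
    return [p for p in candidates if satisfies_bg(p)]
```

For example, a candidate with molecular weight 350.0, logP 3.2 and SAS 2.8 would be kept, while one with logP 6.5 would be rejected regardless of its other properties. The property values in such a call are made up for illustration and do not come from the paper.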

The domain-knowledge in \(B_D\) broadly divides into two kinds: (i) Propositional, consisting of molecular properties. These are: molecular weight, logP, SAS, number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), number of rotatable bonds (NRB), number of aromatic rings (NumRings), Topological Polar Surface Area (TPSA), and quantitative estimate of drug-likeness (QED); (ii) Relational, which is a collection of logic programs (written in Prolog) defining almost 100 relations for various functional groups (such as amide, amine, ether, etc.) and various ring structures (such as aromatic, non-aromatic, etc.). The initial version of these background relations was used within the DMax chemistry assistant [11]. More details on this background knowledge can be found in [4, 37].
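
To give a feel for what a functional-group relation computes, the following Python sketch finds ether-like C-O-C patterns in a small atom-and-bond graph. It is only an illustrative stand-in: the relations in \(B_D\) are Prolog programs, the data layout and the function name `ether_groups` are hypothetical, and the pattern is deliberately simplified (it would, for instance, also match the single-bonded oxygen of an ester).

```python
def ether_groups(atoms, bonds):
    """Find oxygen atoms singly bonded to exactly two carbons.

    atoms: dict mapping atom id -> element symbol (heavy atoms only)
    bonds: list of (atom_id_a, atom_id_b, bond_order) tuples
    Returns a list of (carbon_id, oxygen_id, carbon_id) triples.
    """
    # Build an adjacency list that remembers the bond order of each edge.
    neighbours = {atom_id: [] for atom_id in atoms}
    for a, b, order in bonds:
        neighbours[a].append((b, order))
        neighbours[b].append((a, order))

    found = []
    for atom_id, element in atoms.items():
        if element != "O":
            continue
        nbrs = neighbours[atom_id]
        # Simplified ether test: two neighbours, both carbon, both single bonds.
        if len(nbrs) == 2 and all(order == 1 and atoms[n] == "C" for n, order in nbrs):
            c1, c2 = sorted(n for n, _ in nbrs)
            found.append((c1, atom_id, c2))
    return found
```

With dimethyl ether encoded as atoms `{1: "C", 2: "O", 3: "C"}` and bonds `[(1, 2, 1), (2, 3, 1)]`, the function reports one match. With ethanol encoded as atoms `{1: "C", 2: "C", 3: "O"}` and the same bonds, the oxygen has a single heavy-atom neighbour and nothing is reported.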

B Proxy Model for Predicting Hit Confirmation

A proxy for the results of hit confirmation assays is constructed using the assay results available for the target. This allows us to approximate the results of such assays on molecules for which experimental activity is not available. Of course, such a model is only possible within the controlled experimental design we have adopted, in which information on target inhibition is deliberately not used when constructing the discriminator in D and generator in G2. In practice, if such target-inhibition information is not available, then a proxy model would have to be constructed by other means (for example, using the activity of inhibitors of homologues).

We use the state-of-the-art chemical activity prediction package Chemprop (see Note 5). We train a Chemprop model using the data consisting of JAK2 inhibitors and their pIC50 values. The parameter settings used are: class-balance = TRUE, and epochs = 100 (all other parameters were set to their default values within Chemprop). Chemprop partitions the data into 80% for training, 10% for validation and 10% for test. Chemprop allows the construction of both classification and regression models. The performance of both kinds of models is tabulated below:

| Partition | Classification (AUC) | Regression (RMSE) |
|-----------|----------------------|-------------------|
| Valid     | 0.9472               | 0.6515            |
| Test      | 0.8972               | 0.6424            |

The classification model is more robust, since pIC50 values are on a log-scale. We use the classification model for obtaining the results in Fig. 7, and we use the prediction of pIC50 values from the regression model as a proxy for the results of the hit-confirmation assays.
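
Because pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, an error in pIC50 units corresponds to a multiplicative error in IC50. The short Python sketch below makes that arithmetic concrete. The function names are hypothetical and the code is not part of the Chemprop workflow described above.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm).
    return 9.0 - math.log10(ic50_nm)


def fold_error(error_in_pic50: float) -> float:
    """Multiplicative IC50 error that corresponds to an error in pIC50 units."""
    return 10.0 ** error_in_pic50
```

For instance, an IC50 of 1000 nM corresponds to a pIC50 of 6, and `fold_error(0.6424)` is roughly 4.4, so a prediction that is off by one test-set RMSE is off by a factor of about four in IC50.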

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Dash, T., Srinivasan, A., Vig, L., Roy, A. (2022). Using Domain-Knowledge to Assist Lead Discovery in Early-Stage Drug Design. In: Katzouris, N., Artikis, A. (eds) Inductive Logic Programming. ILP 2021. Lecture Notes in Computer Science, vol 13191. Springer, Cham. https://doi.org/10.1007/978-3-030-97454-1_6

  • DOI: https://doi.org/10.1007/978-3-030-97454-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-97453-4

  • Online ISBN: 978-3-030-97454-1

  • eBook Packages: Computer Science, Computer Science (R0)
