Using Domain-Knowledge to Assist Lead Discovery in Early-Stage Drug Design

Dash, Tirtharaj; Srinivasan, Ashwin; Vig, Lovekesh; Roy, Arijit

doi:10.1007/978-3-030-97454-1_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13191)

Included in the following conference series:

International Conference on Inductive Logic Programming

Abstract

We are interested in generating new small molecules that could act as inhibitors of a biological target when there is limited prior information on target-specific inhibitors. This form of drug design is assuming increasing importance with the advent of new disease threats for which known chemicals provide only limited information about target inhibition. In this paper, we propose the combined use of deep neural networks and Inductive Logic Programming (ILP), which allows symbolic domain-knowledge (B) to be used in exploring the large space of possible molecules. Assuming molecules and their activities to be instances of random variables X and Y, the problem is to draw instances from the conditional distribution of X, given Y and B ($D_{X|Y,B}$). We decompose this into the constituent parts of obtaining the distributions $D_{X|B}$ and $D_{Y|X,B}$, and describe the design and implementation of models to approximate these distributions. The design consists of generators (to approximate $D_{X|B}$ and $D_{X|Y,B}$) and a discriminator (to approximate $D_{Y|X,B}$). We investigate our approach using the well-studied problem of inhibitors for the Janus kinase (JAK) class of proteins. We assume that no data on inhibitors are available for the target protein (JAK2), but that a small number of inhibitors are known for homologous proteins (JAK1, JAK3 and TYK2). We show that the inclusion of relational domain-knowledge results in a potentially more effective generator of inhibitors than either simple random sampling from the space of molecules or a generator without access to symbolic relations. The results suggest a way of combining symbolic domain-knowledge and deep generative models to constrain the exploration of the chemical space of molecules when there is limited information on target inhibitors. We also show how samples from the conditional generator can be used to identify potentially novel target inhibitors.
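
The decomposition described in the abstract is consistent with Bayes' rule applied conditionally on B. As a sketch only: the density notation $p(\cdot)$ below is introduced here for illustration, and the paper's own formulation may differ in detail.

```latex
\[
  p(x \mid y, B)
  \;=\; \frac{p(y \mid x, B)\, p(x \mid B)}{p(y \mid B)}
  \;\propto\; p(y \mid x, B)\, p(x \mid B)
\]
% p(x | y, B) plays the role of D_{X|Y,B}  (conditional generator)
% p(x | B)    plays the role of D_{X|B}    (unconditional generator)
% p(y | x, B) plays the role of D_{Y|X,B}  (discriminator)
```

Read this way, a generator for $D_{X|B}$ proposes molecules that respect the domain-knowledge, and a discriminator for $D_{Y|X,B}$ re-weights them towards the desired activity.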

Notes

1. Such a model is only possible in the controlled experiment here. In practice, no inhibitors would be available for the target and activity values would have to be obtained by hit assays, or perhaps in silico docking calculations.
2. Again, this is feasible in the controlled experiment here. In practice, we will have no inhibitors for the target, and we will have to perform this assessment on the data available for the target’s homologues (Tr).
3. Could we have directly used ILP for constructing the discriminator? Yes, but there is substantial evidence to suggest that the use of ILP through BotGNNs results in better discriminators [4].
4. A good reason to consider dissimilar molecules is that it allows us to explore more diverse molecules.
5. It is likely that a BotGNN with access to the information in $B_D$ along with the Chemprop prediction would result in a better proxy model. We do not explore this here.

References

1. Schneider, P., et al.: Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19(5), 353–364 (2020)
2. Gaulton, A., et al.: The ChEMBL database in 2017. Nucleic Acids Res. 45(D1), D945–D954 (2017)
3. Williams, K., et al.: Cheaper faster drug development validated by the repositioning of drugs against neglected tropical diseases. J. R. Soc. Interface 12(104), 20141289 (2015)
4. Dash, T., Srinivasan, A., Baskar, A.: Inclusion of domain-knowledge into GNNs using mode-directed inverse entailment. arXiv:2105.10709 (2021)
5. Muggleton, S.: Inverse entailment and Progol. New Gener. Comput. 13(3), 245–286 (1995). https://doi.org/10.1007/BF03037227
6. Bowman, S.R., Vilnis, L., Vinyals, O., Dai, A.M., Józefowicz, R., Bengio, S.: Generating sentences from a continuous space. In: CoNLL (2016)
7. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
8. Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks. arXiv preprint arXiv:1505.00387 (2015)
9. Krishnan, S.R., Bung, N., Bulusu, G., Roy, A.: Accelerating de novo drug design against novel proteins using deep learning. J. Chem. Inf. Model. 61(2), 621–630 (2021)
10. Landrum, G., et al.: RDKit: open-source cheminformatics (2006). https://www.rdkit.org/docs/index.html
11. Van Craenenbroeck, E., Vandecasteele, H., Dehaspe, L.: DMax’s functional group and ring library (2002). https://dtai.cs.kuleuven.be/software/dmax/
12. Stokes, J.M., et al.: A deep learning approach to antibiotic discovery. Cell 180(4), 688–702 (2020)
13. Srinivasan, A.: The Aleph manual (2001). https://www.cs.ox.ac.uk/activities/programinduction/Aleph/aleph.html
14. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: ICLR (2014)
15. Hamilton, W.L., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: NIPS (2017)
16. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
17. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT (2019)
18. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners. OpenAI Blog 1(8), 9 (2019)
19. Dymock, B.W., See, C.S.: Inhibitors of JAK2 and JAK3: an update on the patent literature 2010–2012. Exp. Opin. Ther. Pat. 23(4), 449–501 (2013)
20. Dymock, B.W., Yang, E.G., Chu-Farseeva, Y., Yao, L.: Selective JAK inhibitors. Fut. Med. Chem. 6(12), 1439–1471 (2014)
21. Mak, K.K., Rao, P.M.: Artificial intelligence in drug development: present status and future prospects. Drug Disc. Today 24(3), 773–780 (2019)
22. Popova, M., Isayev, O., Tropsha, A.: Deep reinforcement learning for de novo drug design. Sci. Adv. 4(7), eaap7885 (2018)
23. Segler, M.H., Kogej, T., Tyrchan, C., Waller, M.P.: Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4(1), 120–131 (2017)
24. Born, J., Manica, M., Oskooei, A., Cadow, J., Markert, G., Martínez, M.R.: PaccMann$^{RL}$: de novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning. iScience 24(4), 102269 (2021)
25. Stahl, N., Falkman, G., Karlsson, A., Mathiason, G., Bostrom, J.: Deep reinforcement learning for multiparameter optimization in de novo drug design. J. Chem. Inf. Model. 59(7), 621–630 (2019)
26. Bung, N., Krishnan, S.R., Bulusu, G., Roy, A.: De novo design of new chemical entities for SARS-CoV-2 using artificial intelligence. Fut. Med. Chem. 13(6), 575–585 (2021)
27. Grisoni, F., Moret, M., Lingwood, R., Schneider, G.: Bidirectional molecule generation with recurrent neural networks. J. Chem. Inf. Model. 60(3), 1175–1183 (2020)
28. Grechishnikova, D.: Transformer neural network for protein-specific de novo drug generation as a machine translation problem. Sci. Rep. 11(1), 1–13 (2021)
29. Mahmood, O., Mansimov, E., Bonneau, R., Cho, K.: Masked graph modeling for molecule generation. Nat. Commun. 12(1), 1–12 (2021)
30. Schwalbe-Koda, D., Gómez-Bombarelli, R.: Generative models for automatic chemical design. In: Schütt, K.T., Chmiela, S., von Lilienfeld, O.A., Tkatchenko, A., Tsuda, K., Müller, K.-R. (eds.) Machine Learning Meets Quantum Physics. LNP, vol. 968, pp. 445–467. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-40245-7_21
31. Dash, T., Chitlangia, S., Ahuja, A., Srinivasan, A.: Incorporating domain knowledge into deep neural networks. arXiv:2103.00180 (2021)
32. Lavrač, N., Džeroski, S., Grobelnik, M.: Learning nonrecursive definitions of relations with LINUS. In: Kodratoff, Y. (ed.) EWSL 1991. LNCS, vol. 482, pp. 265–281. Springer, Heidelberg (1991). https://doi.org/10.1007/BFb0017020
33. França, M.V.M., Zaverucha, G., d’Avila Garcez, A.S.: Fast relational learning using bottom clause propositionalization with artificial neural networks. Mach. Learn. 94(1), 81–104 (2013). https://doi.org/10.1007/s10994-013-5392-1
34. Dash, T., Srinivasan, A., Vig, L., Orhobor, O.I., King, R.D.: Large-scale assessment of deep relational machines. In: Riguzzi, F., Bellodi, E., Zese, R. (eds.) ILP 2018. LNCS (LNAI), vol. 11105, pp. 22–37. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99960-9_2
35. Lodhi, H.: Deep relational machines. In: Lee, M., Hirose, A., Hou, Z.-G., Kil, R.M. (eds.) ICONIP 2013. LNCS, vol. 8227, pp. 212–219. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-42042-9_27
36. Dash, T., Srinivasan, A., Joshi, R.S., Baskar, A.: Discrete stochastic search and its application to feature-selection for deep relational machines. In: Tetko, I.V., Kůrková, V., Karpov, P., Theis, F. (eds.) ICANN 2019. LNCS, vol. 11728, pp. 29–45. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30484-3_3
37. Dash, T., Srinivasan, A., Vig, L.: Incorporating symbolic domain knowledge into graph neural networks. Mach. Learn. 110(7), 1609–1636 (2021). https://doi.org/10.1007/s10994-021-05966-z
38. Dash, T., Chitlangia, S., Ahuja, A., Srinivasan, A.: How to tell deep neural networks what we know. arXiv:2107.10295 (2021)
39. Stevens, R., Taylor, V., Nichols, J., Maccabe, A.B., Yelick, K., Brown, D.: AI for science. Technical report, Argonne National Lab. (ANL), Argonne, IL (United States) (2020)
40. Kaalia, R., Srinivasan, A., Kumar, A., Ghosh, I.: ILP-assisted de novo drug design. Mach. Learn. 103(3), 309–341 (2016)
41. Ertl, P., Schuffenhauer, A.: Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1(1), 1–11 (2009)

Acknowledgements

AS is a Visiting Professorial Fellow at UNSW, Sydney; and a TCS Affiliate Professor. We thank Indrajit Bhattacharya for thoughtful discussions on system-design.

Author information

Authors and Affiliations

Department of CSIS & APPCAIR, BITS Pilani, Goa Campus, India
Tirtharaj Dash & Ashwin Srinivasan
TCS Research, New Delhi, India
Lovekesh Vig
TCS Innovation Labs (Life Sciences Division), Hyderabad, India
Arijit Roy

Corresponding author

Correspondence to Tirtharaj Dash.

Editor information

Editors and Affiliations

National Center for Scientific Research Demokritos, Athens, Greece
Nikos Katzouris
University of Piraeus, Athens, Greece
Alexander Artikis

Appendices

A Domain-Knowledge Used in Experiments

The domain-knowledge in $B_G$ takes the form of constraints on acceptable molecules. These constraints are broadly of two kinds: (i) Those concerned with the validity of a generated SMILES string. This involves various syntax-level checks, and is done here by the RDKit molecular modelling package; (ii) Problem-specific constraints on some bulk properties of the molecule. These are: the molecular weight must be in the range (200, 700), the octanol-water partition coefficient (logP) must be below 6.0, and the synthetic accessibility score (SAS) must be below 5.0. We use the scoring approach proposed in [41] to compute the SAS of a molecule from its SMILES representation.
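
As an illustration of the second kind of constraint, the following is a minimal sketch in plain Python. It assumes the three property values have already been computed elsewhere (for example with a cheminformatics toolkit such as RDKit); the names `BulkProperties`, `satisfies_bulk_constraints` and `filter_acceptable` are hypothetical, the range (200, 700) is read as an open interval, and the SMILES validity check of kind (i) is not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BulkProperties:
    """Precomputed bulk properties of one molecule (hypothetical container)."""
    mol_weight: float  # molecular weight
    logp: float        # octanol-water partition coefficient
    sas: float         # synthetic accessibility score


# Thresholds quoted in the text for the problem-specific constraints.
MW_RANGE = (200.0, 700.0)
MAX_LOGP = 6.0
MAX_SAS = 5.0


def satisfies_bulk_constraints(props: BulkProperties) -> bool:
    """Return True if all three bulk-property constraints hold.

    Assumption: (200, 700) is an open interval, so the bounds are excluded.
    """
    low, high = MW_RANGE
    return (
        low < props.mol_weight < high
        and props.logp < MAX_LOGP
        and props.sas < MAX_SAS
    )


def filter_acceptable(candidates: dict[str, BulkProperties]) -> list[str]:
    """Keep the identifiers (e.g. SMILES strings) that pass every constraint."""
    return [
        name for name, props in candidates.items()
        if satisfies_bulk_constraints(props)
    ]


if __name__ == "__main__":
    # Property values below are made up purely to exercise the filter.
    candidates = {
        "candidate_1": BulkProperties(mol_weight=350.0, logp=3.2, sas=2.8),
        "candidate_2": BulkProperties(mol_weight=150.0, logp=1.0, sas=2.0),
        "candidate_3": BulkProperties(mol_weight=450.0, logp=6.5, sas=3.0),
    }
    print(filter_acceptable(candidates))
```

Keeping the thresholds as named constants makes it easy to see which part of the filter is problem-specific and would change for a different target.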

The domain-knowledge in $B_D$ broadly divides into two kinds: (i) Propositional, consisting of molecular properties. These are: molecular weight, logP, SAS, number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), number of rotatable bonds (NRB), number of aromatic rings (NumRings), Topological Polar Surface Area (TPSA), and quantitative estimation of drug-likeness (QED); (ii) Relational, which is a collection of logic programs (written in Prolog) defining almost 100 relations for various functional groups (such as amide, amine, ether, etc.) and various ring structures (such as aromatic, non-aromatic, etc.). The initial version of these background relations was used within the DMax chemistry assistant [11]. More details on this background knowledge can be found in [4, 37].

B Proxy Model for Predicting Hit Confirmation

A proxy for the results of hit confirmation assays is constructed using the assay results available for the target. This allows us to approximate the results of such assays on molecules for which experimental activity is not available. Of course, such a model is only possible within the controlled experimental design we have adopted, in which information on target inhibition is deliberately not used when constructing the discriminator in D and generator in G2. In practice, if such target-inhibition information is not available, then a proxy model would have to be constructed by other means (for example, using the activity of inhibitors of homologues).

We use the state-of-the-art chemical activity prediction package Chemprop (see Note 5). We train a Chemprop model using data consisting of JAK2 inhibitors and their pIC50 values. The parameter settings used are: class-balance = TRUE and epochs = 100 (all other parameters were set to their default values within Chemprop). Chemprop partitions the data into 80% for training, 10% for validation and 10% for testing. Chemprop allows the construction of both classification and regression models. The performance of both kinds of models is tabulated below:
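
The 80/10/10 partition can be pictured with the short sketch below. Chemprop performs its split internally, so this is not its code: a seeded random shuffle is assumed, and the function name `split_80_10_10` is hypothetical.

```python
import random


def split_80_10_10(records, seed=0):
    """Shuffle the records and partition them 80% / 10% / 10%.

    Integer arithmetic is used for the cut points; any remainder from
    rounding down ends up in the test partition.
    """
    items = list(records)
    random.Random(seed).shuffle(items)  # seeded for reproducibility

    n_train = len(items) * 8 // 10
    n_valid = len(items) // 10

    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]
    return train, valid, test


if __name__ == "__main__":
    # Integers stand in for (molecule, pIC50) records.
    train, valid, test = split_80_10_10(range(100), seed=42)
    print(len(train), len(valid), len(test))
```

Fixing the seed keeps the three partitions identical across runs, which matters when classification and regression models are to be compared on the same held-out molecules.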

Partition	Classification (AUC)	Regression (RMSE)
Valid	0.9472	0.6515
Test	0.8972	0.6424
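
For readers unfamiliar with the two metrics in the table, the sketch below computes them from scratch on toy inputs. It is illustrative only: it is not the evaluation code used by Chemprop, the function names are hypothetical, and the AUC is computed through its pairwise-ranking interpretation rather than by tracing an ROC curve.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = active, 0 = inactive).

    Uses the equivalence between AUC and the probability that a randomly
    chosen positive is scored above a randomly chosen negative; ties
    contribute one half.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values, unrelated to the JAK2 data.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(rmse([6.0, 7.0, 8.0], [6.0, 7.0, 10.0]))
```

A higher AUC is better, with 0.5 corresponding to random ranking, whereas a lower RMSE is better and is expressed in the same units as the predicted quantity (here, pIC50).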

The classification model is more robust, since pIC50 values are on a log-scale. We use the classification model for obtaining the results in Fig. 7, and we use the prediction of pIC50 values from the regression model as a proxy for the results of the hit-confirmation assays.

Cite this paper

Dash, T., Srinivasan, A., Vig, L., Roy, A. (2022). Using Domain-Knowledge to Assist Lead Discovery in Early-Stage Drug Design. In: Katzouris, N., Artikis, A. (eds) Inductive Logic Programming. ILP 2021. Lecture Notes in Computer Science, vol 13191. Springer, Cham. https://doi.org/10.1007/978-3-030-97454-1_6

DOI: https://doi.org/10.1007/978-3-030-97454-1_6
Published: 24 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-97453-4
Online ISBN: 978-3-030-97454-1
eBook Packages: Computer Science, Computer Science (R0)
