Abstract
The prediction of acid dissociation constants (pKa) is a prerequisite for predicting many other properties of a small molecule, such as its protein–ligand binding affinity, distribution coefficient (log D), membrane permeability, and solubility. Predicting each of these properties requires knowledge of the relevant protonation states and the solution free energy penalty of each state. The SAMPL6 pKa Challenge was the first time that a separate challenge was conducted for evaluating pKa predictions as part of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) exercises. This challenge was motivated by significant inaccuracies observed in prior physical property prediction challenges, such as the SAMPL5 log D Challenge, caused by protonation state and pKa prediction issues. The goal of the pKa challenge was to assess the performance of contemporary pKa prediction methods for drug-like molecules. The challenge set was composed of 24 small molecules that resembled fragments of kinase inhibitors, a number of which were multiprotic. Eleven research groups contributed blind predictions for a total of 37 distinct pKa prediction methods. In addition to blinded submissions, four widely used pKa prediction methods were included in the analysis as reference methods. Collecting both microscopic and macroscopic pKa predictions allowed in-depth evaluation of pKa prediction performance. This article highlights deficiencies of typical pKa prediction evaluation approaches when the distinction between microscopic and macroscopic pKas is ignored; in particular, we suggest more stringent evaluation criteria for microscopic and macroscopic pKa predictions guided by the available experimental data.
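The distinction between microscopic and macroscopic pKas can be illustrated with a minimal sketch. It assumes a hypothetical diprotic molecule with one fully protonated state, two singly deprotonated tautomers (labelled a and b), and one fully deprotonated state; the function name and the example values are illustrative and are not taken from the challenge data. The standard textbook relations are used: the first macroscopic dissociation constant is the sum of the two microscopic constants, and the reciprocal of the second is the sum of the reciprocals.

```python
import math


def macro_pkas_diprotic(pk1a, pk1b, pk2a, pk2b):
    """Combine microscopic pKas of a diprotic molecule into macroscopic pKas.

    pk1a, pk1b: fully protonated state -> tautomer a / tautomer b
    pk2a, pk2b: tautomer a / tautomer b -> fully deprotonated state
    """
    # Both deprotonation paths connect the same end states, so the
    # microscopic pKas must close the thermodynamic cycle.
    cycle_gap = (pk1a + pk2a) - (pk1b + pk2b)
    if abs(cycle_gap) > 1e-6:
        raise ValueError(f"thermodynamic cycle does not close (gap {cycle_gap})")

    # K1 = k1a + k1b
    macro_pk1 = -math.log10(10.0 ** -pk1a + 10.0 ** -pk1b)
    # 1/K2 = 1/k2a + 1/k2b
    macro_pk2 = math.log10(10.0 ** pk2a + 10.0 ** pk2b)
    return macro_pk1, macro_pk2


# Hypothetical microscopic pKas (illustrative values only).
pk1, pk2 = macro_pkas_diprotic(pk1a=4.0, pk1b=5.0, pk2a=9.0, pk2b=8.0)
print(f"macroscopic pKa1 = {pk1:.2f}, macroscopic pKa2 = {pk2:.2f}")
```

In this sketch the macroscopic values differ from every individual microscopic value, which is why comparing a predicted microscopic pKa directly against a measured macroscopic pKa can misstate the error.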
Top-performing submissions for macroscopic pKa predictions achieved RMSE of 0.7–1.0 pKa units and included both quantum chemical and empirical approaches; for these submissions, the total number of extra or missing macroscopic pKas predicted was fewer than 8 for 24 molecules. A large number of submissions had RMSE spanning 1–3 pKa units. Molecules with sulfur-containing heterocycles or iodo and bromo groups were less accurately predicted on average considering all methods evaluated. For a subset of molecules, we utilized experimentally determined microstates based on NMR to evaluate the dominant tautomer predictions for each macroscopic state. Prediction of dominant tautomers was a major source of error for microscopic pKa predictions, especially errors in charged tautomers. The degree of inaccuracy in pKa predictions observed in this challenge is detrimental to protein–ligand binding affinity predictions due to errors in dominant protonation state predictions and the calculation of free energy corrections for multiple protonation states. Underestimation of ligand pKa by 1 unit can lead to binding free energy errors of up to 1.2 kcal/mol. The SAMPL6 pKa Challenge demonstrated the need for improving pKa prediction methods for drug-like molecules, especially for challenging moieties and multiprotic molecules.
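To give a feel for how a pKa error propagates into a free energy, the following sketch uses a deliberately simplified model and is not the multiple protonation states correction analysis of the article. It assumes a monoprotic basic ligand whose neutral form is the one that binds, a temperature of 298.15 K, and a hypothetical ligand with a true pKa of 9.0 at pH 7.4; all function names and numbers are illustrative. Under these assumptions, the cost of selecting the neutral form is RT ln(1 + 10^(pKa - pH)), and the change caused by a 1-unit pKa error is bounded in magnitude by RT ln 10, roughly 1.36 kcal/mol at this temperature.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def neutral_state_penalty(pka, ph, temperature=298.15):
    """Free energy cost (kcal/mol) of populating the neutral form of a monoprotic base."""
    rt = R_KCAL * temperature
    # Fraction neutral = 1 / (1 + 10**(pKa - pH)); penalty = -RT ln(fraction).
    return rt * math.log(1.0 + 10.0 ** (pka - ph))


def penalty_error(true_pka, pka_error, ph, temperature=298.15):
    """Change in the penalty when the pKa is mispredicted by pka_error units."""
    predicted = neutral_state_penalty(true_pka + pka_error, ph, temperature)
    reference = neutral_state_penalty(true_pka, ph, temperature)
    return predicted - reference


# Magnitude limit for a 1-unit pKa error in this simplified model.
upper_bound = R_KCAL * 298.15 * math.log(10.0)

# Hypothetical ligand: true pKa 9.0, predicted 8.0, evaluated at pH 7.4.
error = penalty_error(true_pka=9.0, pka_error=-1.0, ph=7.4)
print(f"penalty error = {error:.2f} kcal/mol (limit {upper_bound:.2f} kcal/mol)")
```

The size of the error depends on how far the pKa lies from the pH of interest: it approaches the limit when the charged form dominates and shrinks toward zero when the neutral form already dominates.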












Data availability
SAMPL6 \(\text {p}K_{\text{a}}\) challenge instructions, submissions, experimental data, and analysis are available in the SAMPL6 GitHub repository: https://github.com/samplchallenges/SAMPL6. An archive copy of the pKa Challenge directory of the SAMPL6 GitHub repository (SAMPL6-repository-pKadirectory.zip) is also available in the Supplementary Documents bundle (Electronic Supplementary Material 2). The Supplementary Documents bundle also includes the following: (1) Table S1 in CSV format (SAMPL6-pKa-chemical-identiers-table.csv), (2) Table S2 in CSV format (macroscopic-pKa-statistics-24mol-hungarian-match.csv), (3) Table S3 in CSV format (microscopic-pKa-statistics-8mol-hungarian-match-table.csv), (4) Table S4 in CSV format (microscopic-pKa-statistics-8mol-microstate-match-table.csv), (5) Figure S1 in CSV format (experimental-microstates-of-8mol-based-on-NMR.csv), (6) the Jupyter Notebook used for the enumeration of microstates (enumerate-microstates-with-Epik-and-OpenEye-QUACPAC.ipynb), (7) a CSV table of SAMPL6 molecule IDs and OpenEye OEChem generated SMILES (molecule_ID_and_SMILES.csv).
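Several of the files above refer to a "hungarian match" between experimental and predicted pKa values. The idea can be sketched as follows, under stated assumptions: the cost of a pairing is taken to be the squared difference between the two values, and an exhaustive search over pairings is used as a stand-in for the Hungarian algorithm. The search returns a minimum-cost assignment, as the Hungarian algorithm would, but it is only practical for the handful of pKa values a single molecule has. In practice a routine such as `scipy.optimize.linear_sum_assignment` would likely be used instead. The function name and the example values are hypothetical.

```python
from itertools import permutations


def match_pkas(experimental, predicted):
    """Pair experimental and predicted pKas so the summed squared error is minimal.

    Returns (pairs, n_missing, n_extra), where pairs is a list of
    (experimental, predicted) tuples, n_missing counts experimental pKas left
    without a prediction, and n_extra counts predictions left without a partner.
    """
    exp, pred = list(experimental), list(predicted)
    swapped = len(exp) > len(pred)
    short, long_ = (pred, exp) if swapped else (exp, pred)

    best_cost, best_perm = float("inf"), ()
    # Try every way of assigning each value in the shorter list to a
    # distinct value in the longer list.
    for perm in permutations(range(len(long_)), len(short)):
        cost = sum((short[i] - long_[j]) ** 2 for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm

    if swapped:
        pairs = [(long_[j], short[i]) for i, j in enumerate(best_perm)]
    else:
        pairs = [(short[i], long_[j]) for i, j in enumerate(best_perm)]
    unmatched = len(long_) - len(short)
    n_missing = unmatched if swapped else 0
    n_extra = 0 if swapped else unmatched
    return pairs, n_missing, n_extra


# Hypothetical molecule: two measured pKas, three predicted pKas.
pairs, n_missing, n_extra = match_pkas([4.1, 9.3], [2.0, 4.5, 9.0])
print(pairs, n_missing, n_extra)
```

Counting unmatched values separately matters because the matching alone would otherwise hide methods that predict too many or too few pKas for a molecule.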
Abbreviations
- SAMPL: Statistical Assessment of the Modeling of Proteins and Ligands
- pKa: \(-\log _{10}\) of the acid dissociation equilibrium constant
- log P: \(\log _{10}\) of the organic solvent-water partition coefficient (\(K_{ow}\)) of neutral species
- log D: \(\log _{10}\) of the organic solvent-water distribution coefficient (\(D_{ow}\))
- SEM: Standard error of the mean
- RMSE: Root mean squared error
- MAE: Mean absolute error
- \(\tau \): Kendall’s rank correlation coefficient (Tau)
- \(R^2\): Coefficient of determination (R-Squared)
- MPSC: Multiple protonation states correction for binding free energy
- DL: Database lookup
- LFER: Linear free energy relationship
- QSPR: Quantitative structure–property relationship
- ML: Machine learning
- QM: Quantum mechanics
- LEC: Linear empirical correction
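The error and correlation statistics abbreviated above can be computed for already-matched pairs of experimental and predicted pKa values as in the following sketch. Two choices here are assumptions rather than details taken from the article: \(R^2\) is computed as the squared Pearson correlation coefficient, and Kendall's \(\tau \) is computed in its simplest form (tau-a), which does not correct for ties. The function names and the example values are illustrative.

```python
import math


def mae(experimental, predicted):
    """Mean absolute error."""
    return sum(abs(p - e) for e, p in zip(experimental, predicted)) / len(experimental)


def rmse(experimental, predicted):
    """Root mean squared error."""
    n = len(experimental)
    return math.sqrt(sum((p - e) ** 2 for e, p in zip(experimental, predicted)) / n)


def r_squared(experimental, predicted):
    """Squared Pearson correlation (one common definition of R^2)."""
    n = len(experimental)
    mean_e = sum(experimental) / n
    mean_p = sum(predicted) / n
    sxy = sum((e - mean_e) * (p - mean_p) for e, p in zip(experimental, predicted))
    sxx = sum((e - mean_e) ** 2 for e in experimental)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)


def kendall_tau_a(experimental, predicted):
    """Kendall's tau-a: (concordant - discordant) / number of pairs, ties ignored."""
    n = len(experimental)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (experimental[i] - experimental[j]) * (predicted[i] - predicted[j])
            if product > 0:
                score += 1  # pair ranked in the same order
            elif product < 0:
                score -= 1  # pair ranked in the opposite order
    return score / (n * (n - 1) / 2)


# Hypothetical matched values (illustrative only).
exp_pkas = [3.0, 5.0, 7.0, 9.0]
pred_pkas = [3.5, 4.5, 7.5, 10.0]
print(mae(exp_pkas, pred_pkas), rmse(exp_pkas, pred_pkas))
print(r_squared(exp_pkas, pred_pkas), kendall_tau_a(exp_pkas, pred_pkas))
```

MAE and RMSE measure how far predictions are from the measured values, whereas \(R^2\) and \(\tau \) only measure how well trends and rank order are reproduced, so a method can score well on one kind of statistic and poorly on the other.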
Acknowledgements
We would like to acknowledge the infrastructure and website support of Mike Chiu that allowed a seamless collection of challenge submissions. Mike Chiu also provided assistance with constructing a submission validation script to ensure all submissions adhered to the machine-readable format. We are grateful to Kiril Lanevskij for suggesting the Hungarian algorithm for matching experimental and predicted \(\text {p}K_{\text{a}}\) values. We would like to thank Thomas Fox for providing MoKa reference calculations. We acknowledge Caitlin Bannan for guidance on defining a working microstate definition for the challenge and on designing the challenge. We thank Brad Sherborne for his valuable insights at the conception of the \(\text {p}K_{\text{a}}\) challenge and for connecting us with Timothy Rhodes and Dorothy Levorse, who were able to provide resources and expertise for experimental measurements performed at MRL. We acknowledge Paul Czodrowski, who provided feedback on multiple stages of this work: challenge construction, purchasable compound selection, and manuscript draft. MI, JDC, and DLM gratefully acknowledge support from NIH Grant R01GM124270 supporting the SAMPL Blind Challenges. MI, ASR, AR, and JDC acknowledge support from the Sloan Kettering Institute. JDC acknowledges support from NIH Grant P30CA008748 and NIH Grant R01GM121505. DLM appreciates financial support from the National Institutes of Health (Grant No. R01GM108889) and the National Science Foundation (Grant No. CHE 1352608). MRG acknowledges support of MCB-1519640 from the National Science Foundation. MI acknowledges the Doris J. Hutchinson Fellowship. MI, ASR, AR, and JDC are grateful to OpenEye Scientific for providing a free academic software license for use in this work. MI, ASR, AR, and JDC thank Janos Fejervari and the ChemAxon team, who gave us permission to include ChemAxon/Chemicalize \(\text {p}K_{\text{a}}\) predictions as a reference prediction in the challenge analysis.
Disclaimers
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author information
Contributions
Conceptualization, MI, JDC; Methodology, MI, JDC, ASR; Software, MI, AR, ASR; Formal Analysis, MI, ASR; Investigation, MI; Resources, JDC, DLM; Data Curation, MI; Writing - Original Draft, MI; Writing - Review and Editing, MI, JDC, ASR, AR, DLM, MRG; Visualization, MI, AR; Supervision, JDC, DLM; Project Administration, MI; Funding Acquisition, JDC, DLM, MI.
Ethics declarations
Conflict of interest
JDC was a member of the Scientific Advisory Board for Schrödinger, LLC during part of this study, and is a current Scientific Advisory Board member for OpenEye Scientific and scientific advisor to Foresite Labs. DLM is a current member of the Scientific Advisory Board of OpenEye Scientific and an Open Science Fellow with Silicon Therapeutics. The Chodera laboratory receives or has received funding from multiple sources, including the National Institutes of Health, the National Science Foundation, the Parker Institute for Cancer Immunotherapy, Relay Therapeutics, Entasis Therapeutics, Vir Biotechnology, Silicon Therapeutics, EMD Serono (Merck KGaA), AstraZeneca, XtalPi, the Molecular Sciences Software Institute, the Starr Cancer Consortium, the Open Force Field Consortium, Cycle for Survival, a Louis V. Gerstner Young Investigator Award, The Einstein Foundation, and the Sloan Kettering Institute. A complete list of funding can be found at http://choderalab.org/funding.
Cite this article
Işık, M., Rustenburg, A.S., Rizzi, A. et al. Overview of the SAMPL6 pKa challenge: evaluating small molecule microscopic and macroscopic pKa predictions. J Comput Aided Mol Des 35, 131–166 (2021). https://doi.org/10.1007/s10822-020-00362-6
DOI: https://doi.org/10.1007/s10822-020-00362-6