J. Chem. Inf. Model. 2008, 48, 384-396

Use of 3D QSAR Models for Database Screening: A Feasibility Study

Alexander Hillebrecht and Gerhard Klebe*


Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, Marbacher Weg 6, 35032 Marburg,
Germany

Received August 8, 2007

The applicability and scope of 3D QSAR methods (CoMFA, CoMSIA) to screen databases are examined.
A protocol requiring minimal user intervention has been established to align training and test set molecules
using FlexS. Isozymes of human carbonic anhydrase (hCA) serve as the model system; all results are
exemplified by studying affinity toward hCA II and selectivity between hCA I and II. The predictive power of
the obtained models is assessed through prediction of 663 compounds not included in the training set and
compared to 2D QSAR models derived from fragment (MACCS) or property (VSA) based descriptors. The
predictive power is evaluated with respect to the following criteria: the numerical, concerning the absolute
accuracy of prediction, and the categorical, characterizing the ability to assign a compound to the correct
activity class.

* Corresponding author e-mail: Klebe@staff.uni-marburg.de.
10.1021/ci7002945 CCC: $40.75 © 2008 American Chemical Society
Published on Web 01/23/2008

INTRODUCTION

Quantitative Structure Activity Relationships (QSAR) describe the dependence of the biological activity or any other property under consideration (then sometimes termed QSPR) of a set of molecules on their chemical structures. The various approaches differ in the type of descriptors employed to encode the structures (e.g., an overall property, topological indices, molecular field descriptors) and in the statistical learning method used to extract a correlation (e.g., linear regression, PLS, decision trees, artificial neural networks, support vector machines). Since the introduction of Hansch1 and Free-Wilson2 analysis in 1964, QSAR techniques have become widely used and established tools in the lead optimization process.3

QSAR methods require a training set of molecules for which the target property is known in advance to generate a model (except, e.g., kNN-QSAR4 which is purely instance-based thus requiring no model deduction). Such a model can subsequently be used to predict the numeric (e.g., pKi, IC50) or categorical (e.g., active/inactive, high/low pKi) activity of a novel molecule not contained in the training set. Furthermore, several of the statistical methods available provide information about the relevance of the applied descriptors for the activity/property being studied. For example, regression techniques return weights for each independent variable thus indicating through sign and magnitude how strongly and in which direction the respective descriptor modulates the dependent variable. This knowledge provides insights into the mechanism of action of the studied molecules and supports the synthesis planning of prospective modifications in order to improve the desired properties (direct design). However, all QSAR approaches, even those which do not produce easily interpretable output (like ANNs), can be used to predict the activity of molecules (indirect design). Thus, QSAR models can be used for lead optimization5 or database screening.6-8

Despite such potential only a limited number of studies have been communicated that actually use 3D QSAR models for database screening. Moro et al. employed autocorrelation vectors of the molecular electrostatic potential (MEP) in conjunction with PLS analysis to screen a small-sized focused virtual combinatorial library containing 841 entries for potent antagonists of the human A3 receptor.9 The autocorrelation approach produces descriptors which encode the three-dimensional distribution of molecular properties, but their comparative evaluation does not depend on an alignment. Similarly, Pastor et al. use a modified auto- and cross-correlation transform to derive alignment-independent 3D descriptors from the well-known GRID molecular interaction fields. This GRIND10 (GRid-INdependent Descriptors) approach has been used for virtual screening11 and 3D QSAR analyses.12 Murcia and Ortiz13 described the development of a fully automatic virtual screening workflow consisting of docking and a subsequent COMBINE analysis. They established the models and simulated virtual screening experiments using factor Xa inhibitors. COMparative BINding Energy (COMBINE) analysis14 is a receptor-based 3D QSAR approach using protein-ligand interaction energy terms as independent variables thus requiring knowledge or reasonable assumptions about the binding geometries. Zhang et al. performed a virtual screen for HMG-CoA reductase inhibitors using a combination of pharmacophore filtering, docking, and CoMFA predictions. Therein, the docking procedure provided the molecular alignment rule.15

In the present study we assess systematically the suitability of 3D QSAR models (CoMFA,16,17 CoMSIA18,19) for large scale applications. 3D QSAR techniques require an appropriate spatial superimposition of the molecules under consideration prior to calculation of field descriptors. Various procedures have been pursued for this crucial step,20,21 many of them being entirely manual (rmsd fit, minimization in the binding pocket of the target protein) or at least requiring user intervention (selection of docking poses). Besides the subjective nature of a user-supervised alignment, the enormous time effort makes such an approach inappropriate for large screening scenarios. Thus, we decided to develop a protocol which employs the property- and interaction-based FlexS22 program as alignment engine. FlexS is fast enough for screening purposes, easily extensible to combinatorial libraries, and uses a physicochemical description of the molecules in terms of Gaussian functions which is consistent with the methodology applied in CoMSIA. The statistical performance of the models obtained by this fully automated alignment procedure is compared to the results of a study previously conducted in our laboratory.23 There, the same data set was used, but the molecules were manually aligned and minimized in the binding pocket.

In addition to 3D QSAR we also derived 2D models and applied both to an external test set of 663 molecules. Two kinds of 2D descriptors are used: The public version of the well-known MDL MACCS keys,24,25 which encode the presence or absence of distinct structural fragments in a molecule, and the 32 partitioned van der Waals surface area (VSA) descriptors developed by Labute,26 which are property based. The performance in terms of external predictivity is compared on a numerical and categorical basis: predictive r2 and Spearman’s r2 assess the correlation between experimental and calculated activity values, Spearman’s rank correlation coefficient characterizing the ability to correctly predict the rank order of the compounds according to their activity. While these parameters characterize the overall numerical accuracy of the models, the ability to enrich compounds with a desired activity is even more important for database screening purposes. In contrast to other virtual screening approaches such as docking/scoring or pharmacophore matching, however, one limitation of QSAR is that it works only on the rather limited piece of chemical space covered by the training set compounds. Hence, only focused libraries can be screened reliably, and in order to obtain enrichment or classification ratings a certain activity threshold has to be predefined for class assignment (e.g., if pKi > 7, the compound is classified “active”). In this study, we report sensitivity and specificity, classical enrichment plots, receiver operating characteristic (ROC) curves, the area under the ROC curve (AUC), and hit rates to compare different models from an objective point of view.27,28

All studies reported herein are based on pKi values of sulfonamide-type inhibitors toward human carbonic anhydrase (hCA) isozymes. CAs are zinc-containing lyases (EC 4.2.1.1), which catalyze the reversible hydration of carbon dioxide to bicarbonate under the release of one proton.29,30 They are involved in a variety of important physiological processes, such as pH and CO2 regulation, bone resorption, calcification, metabolic reactions, tumorigenicity, and electrolyte secretion.31,32 Therefore, inhibitors of carbonic anhydrases offer the opportunity to treat several physiological disorders, e.g., as drugs against glaucoma, mountain sickness, and epilepsy.33 Due to the high structural similarity and the broad tissue distribution of the different isoforms of CAs, selectivity is an issue of major concern in the design of CA inhibitors. In the present study we used the pKi value measured against hCA II (in the following sections referred to as pKi(II)) as an example for affinity prediction and the difference of the pKi values toward hCA I and hCA II (in the following sections referred to as ∆pKi(I-II)) as an example for selectivity prediction.

Chart 1. Formulas of the Nine Scaffolds Contained in Training Set and Test Set Ligands.

METHODS

All 3D QSAR studies and all PLS analyses were performed using SYBYL 7.1 (ref 34). MACCS keys24 and VSA descriptors26 were calculated using MOE 2005.06 (ref 35). VSA descriptor selection was accomplished by the SVL (Scientific Vector Language) script AutoQSAR.36 Automation of ligand alignment was realized via the Python interface “pyflexs” running FlexS 1.20.2 (ref 22). Compound prediction was implemented in SPL (Sybyl Programming Language).

Data Set and Preparation. In order to obtain a suitable training set 144 ligands were taken from our previous 3D QSAR study and subjected to the FlexS alignment procedure. For details about selection criteria and composition of this data set we refer to ref 23. The following procedure was pursued to automate ligand alignment: For each of the nine scaffolds (Chart 1, throughout this paper we will refer to these scaffolds by the corresponding capital letters A to I) comprised in the training and the test sets two items had to be defined: (1) a so-called MAPREF which is an anchoring fragment used to initiate the incremental construction algorithm of FlexS and (2) a reference ligand for spatial alignment. The assignment of atoms in each ligand to the corresponding MAPREF substructure is achieved by graph matching and does not require any user intervention. The
second item to be predefined is one complete reference ligand onto which the respective candidate molecules are superimposed. This reference should be at best a compound with maximum spatial extensions adopting a “representative” conformation for the individual chemical class, if possible extracted from a crystal structure. The utilization of a MAPREF substructure as alignment template ensures internal consistency of the produced alignment. Without such constraints placements lacking proper superimposition of the anchoring fragments with that of the reference ligand might achieve higher similarity scores because the overall similarity comprising various properties of the entire molecule of a distinct pose can be higher than the score obtained for the desired pose where common substructures are mutually overlaid. Using this procedure all except six molecules could be aligned ending up with a final training set size of 138 molecules.

The training set protocol was applied to 663 compounds serving as a real life test example. It has to be noted that this test set represents a rather unbalanced sample since the only selection criteria were availability of experimental data to be predicted and affiliation to one of the crude chemotypes A to I considered in the training set. Most of the compounds could be aligned applying default settings in FlexS; only for some rather rigid and large molecules the threshold for the minimum van der Waals overlap volume had to be reduced from 0.6 to 0.4 in order to obtain a superimposition solution.

3D QSAR Analyses. For CoMFA16 analysis the interaction energies between a probe atom and the ligand atoms were calculated using a grid box of 26 × 34 × 25 points with 1 Å spacing, embedding all ligands with a margin of at least 4 Å in each direction. The same box dimensions were also used for CoMSIA18 studies. A positively charged sp3-carbon atom was used as a probe atom for calculating steric and electrostatic CoMFA fields applying SYBYL standard parameters (TRIPOS standard field, dielectric constant 1/r, cutoff 30 kcal/mol). CoMSIA fields were computed for steric, electrostatic, hydrophobic, and hydrogen-bonding properties, using a probe of charge +1, radius of 1 Å, hydrophobicity and hydrogen-bonding properties of +1, and an attenuation factor α of 0.3 for the Gaussian distance-dependent function. All fields were scaled with the CoMFA_STD scaling procedure, assigning equal weights to each field. The response variables (pKi(II), ∆pKi(I-II)) were correlated with the field descriptors using SAMPLS37 in a leave-one-out cross-validation analysis. The optimal number of PLS components was determined by successively extracting one more latent variable until the corresponding q2 value is not further increased by more than five percent.38 Afterward, a PLS analysis39 was performed without cross-validation using the optimal number of components, applying no column filtering.

1D and 2D QSAR Analyses. The MDL MACCS keys have already proved useful for screening purposes in conjunction with PLS Discriminant Analysis (PLS-DA) models.40 We decided to use them for building numeric 2D QSAR models. The public version was used that evaluates the presence of 166 distinct molecular fragments. Instead of computing binary fingerprints which store the information about the presence or the absence of one particular substructure the “counted” version was used resulting in 166 integers each capturing the exact number of occurrences of the fragment in a molecule. This protocol better differentiates molecules that contain qualitatively the same substructures but differ in their counts, which is often the case in focused libraries.

In addition to such fragment oriented models also the property based VSA descriptors were used to derive 1D QSAR models. Each of the 32 VSA descriptors is calculated as the sum of all atomic contributions to the approximated van der Waals surface areas with the respective property falling into a certain range. In MOE this type of descriptor can be calculated for three properties: SlogP (calculated octanol/water partition coefficient), SMR (calculated molar refractivity), and PEOE (Gasteiger-Marsili partial charges). These descriptors show a rather low degree of correlation among each other, capture different aspects of protein-ligand interactions and transport phenomena, and their broad applicability to many QSAR related problems has been shown. In order to select a suitable subset of descriptors the AutoQSAR36 procedure was applied. It identifies the “best” model based on leave-one-out cross-validation. This procedure should only generate rather crude 1D QSAR models. It will not evaluate the capacity of VSA descriptors exhaustively, and the obtained models should rather serve as a first overview for comparison to get an idea as to what can be achieved by a crude “quick and dirty” 1D model.

Evaluation of Predictivity. In order to assess and compare the predictive power of the different models, several statistical parameters as well as plots are reported: The predictive r2 value is usually used to characterize the performance in terms of external predictivity of QSAR models. Additionally, Spearman’s rank correlation coefficient was calculated which quantifies the ability of a model to correctly predict the relative order instead of the absolute numeric value of the modeled variable.

Since our study is intended to assess the predictive power of QSAR models for database screening purposes it is reasonable to apply also figures-of-merit commonly used in virtual screening, where the correct prediction of class memberships is even more important. Therefore, a somewhat arbitrary threshold had to be defined which determines class affiliation: For pKi(II) the 2% or 5% of molecules with the highest (or lowest, respectively) activity were considered as the “high activity” (or “low activity”) class, whereas the remaining part of the molecules is assigned to the complementary class. This setting simulates screening experiments where one wants to enrich compounds which possess a remarkably high affinity toward the target of interest, compared to the bulk of the training set. Defining the 2% or 5% of lowest activity as the class of interest, the situation represents antitarget modeling, where a distinct receptor must not be inhibited (this holds particularly for, e.g., cytochrome P450 or hERG channel blockers). The same aspects are examined for the activity difference ∆pKi(I-II). This corresponds to a screening scenario with the aim of enriching compounds selective toward hCA I or hCA II, respectively. As direct measures for correct class prediction the sensitivity Se (also called “true positives rate” or “recall”), the specificity Sp, and the hit rate H (or “precision”) are reported27,28

\[ Se = \frac{TP}{TP + FN} \times 100\% \]

\[ Sp = \frac{TN}{TN + FP} \times 100\% \]

\[ H = \frac{TP}{TP + FP} \times 100\% \]

where FP and FN are the number of false positives and false negatives, and TP and TN are the number of true positives and negatives, respectively.
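The figures of merit introduced in this section can be sketched in a few lines of Python. This is a minimal illustration, not the SPL implementation used in the study. The function names are hypothetical, and the form of the predictive r2 (one minus the ratio of the squared prediction errors to the squared deviations of the test set activities from the training set mean) is the definition commonly used in the CoMFA literature and is assumed here, since the text does not spell it out.

```python
from typing import Sequence, Tuple


def predictive_r2(y_test: Sequence[float], y_pred: Sequence[float],
                  train_mean: float) -> float:
    """Predictive r2 = 1 - PRESS / SD (assumed, commonly used definition).

    PRESS: sum of squared differences between experimental and predicted
           test set activities.
    SD:    sum of squared deviations of the experimental test set
           activities from the mean activity of the training set.
    The value becomes negative when the model predicts worse than the
    training set mean would.
    """
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_test, y_pred))
    sd = sum((obs - train_mean) ** 2 for obs in y_test)
    return 1.0 - press / sd


def _percent(numerator: int, denominator: int) -> float:
    """Ratio in percent; NaN if the denominator is zero (undefined rate)."""
    return 100.0 * numerator / denominator if denominator else float("nan")


def classification_rates(tp: int, fn: int, tn: int, fp: int
                         ) -> Tuple[float, float, float]:
    """Return (Se, Sp, H) in percent from the four confusion counts."""
    se = _percent(tp, tp + fn)   # sensitivity: actives that were found
    sp = _percent(tn, tn + fp)   # specificity: inactives that were discarded
    h = _percent(tp, tp + fp)    # hit rate: predicted actives that are true
    return se, sp, h
```

With, for example, 8 true positives, 2 false negatives, 85 true negatives, and 5 false positives, `classification_rates(8, 2, 85, 5)` gives a sensitivity of 80%, a specificity of about 94.4%, and a hit rate of about 61.5%.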
Figure 1. Comparison of performance of different models evaluated via cross-validation. The leave-one-out q2 is shown for prediction of pKi(II) (left) and ∆pKi(I-II) (right), respectively.

Table 1. Statistical Results of the Different QSAR Analyses

method        CoMFA     CoMFA     CoMSIA    CoMSIA    MACCS     VSA
alignment     manual    FlexS     manual    FlexS
dep. var. pKi(II)
q2            0.853     0.798     0.860     0.822     0.818     0.790
sPRESS        0.504     0.593     0.489     0.552     0.556     0.629
r2            0.949     0.880     0.943     0.867     0.884     0.837
S             0.297     0.457     0.313     0.476     0.444     0.553
F             423.840   243.854   453.840   441.585   264.532   27.33
no. comp.     6         4         5         2         4         17
dep. var. ∆pKi(I-II)
q2            0.758     0.715     0.786     0.784     0.670     0.584
sPRESS        0.598     0.629     0.552     0.548     0.684     0.799
r2            0.977     0.851     0.950     0.905     0.802     0.660
S             0.184     0.455     0.368     0.364     0.529     0.720
F             633.455   190.350   330.962   316.402   188.962   14.028
no. comp.     9         4         4         4         3         13

Table 2. Numerical Measures of Predictivity for Different QSAR Methods

                             CoMFA     CoMSIA    MACCS     VSA
pred. r2       pKi(II)       0.454     0.482     0.302     -0.710
               ∆pKi(I-II)    -0.079    0.001     -0.476    -0.893
Spearman’s r2  pKi(II)       0.407     0.443     0.393     0.288
               ∆pKi(I-II)    0.115     0.118     0.069     0.000

To give an even more comprehensive illustration of the results several kinds of plots are reported: (1) A plot displaying the predicted activity/selectivity value of a molecule on the ordinate versus its experimental one on the abscissa. (2) A classical enrichment plot. On the x-axis this graph displays the amount of database entries screened, and on the y-axis it shows the amount of actives retrieved from the database. (3) A receiver operating characteristic (ROC) curve. This type of plot is quite popular in many scientific areas like psychology, medicine, acoustics, or criminology to assess the ability of a diagnostic system to distinguish signal from noise. On the x-axis a ROC plot displays the term 1 - specificity which corresponds to the “noise” in the data set, and on the y-axis it shows the sensitivity which can be thought of as the “signal” that is to be identified by the ranking procedure. Nevertheless, its application in drug design and especially in virtual screening is still not standard although it exhibits some advantages compared to the usually applied enrichment curves.28,41 Most important, the area under the curve (AUC) of a ROC plot can be used to directly compare the achieved accuracy of a computer test. Furthermore, the shape of the “ideal curve” of an enrichment plot depends on the ratio of actives to inactives in the database, which is not the case for the ROC curves. Finally, enrichment plots only capture one aspect of a screening experiment, namely the power to retrieve actives, i.e., the sensitivity, whereas ROC curves also illustrate the second important aspect, the ability to discard inactives, i.e., the specificity.
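The construction of a ROC curve described above can be illustrated with a short Python sketch. It is not taken from the study. The function names `roc_points` and `roc_auc` are hypothetical, ties in the predicted values are processed as one step, and the AUC is obtained with the trapezoidal rule, which is one common choice. Compounds are ranked by descending score, so to retrieve the compounds with the lowest pKi the predicted values would have to be negated first.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], is_active: Sequence[bool]
               ) -> Tuple[List[float], List[float]]:
    """Return the ROC curve as (1 - specificity, sensitivity) fractions.

    Compounds are ranked by descending score; after each distinct score
    value the fraction of inactives (noise) and of actives (signal)
    retrieved so far is recorded. Requires at least one active and one
    inactive compound.
    """
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    n_act = sum(1 for _, active in ranked if active)
    n_inact = len(ranked) - n_act
    noise, signal = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Compounds with identical scores cannot be ranked against each
        # other, so they are added to the counts together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        noise.append(fp / n_inact)
        signal.append(tp / n_act)
        i = j
    return noise, signal


def roc_auc(noise: Sequence[float], signal: Sequence[float]) -> float:
    """Area under the ROC curve by the trapezoidal rule (fraction, 0 to 1)."""
    return sum((noise[k] - noise[k - 1]) * (signal[k] + signal[k - 1]) / 2.0
               for k in range(1, len(noise)))
```

A ranking that places all actives ahead of all inactives yields an AUC of 1.0 with this sketch, whereas a set of compounds that all receive the same score yields 0.5, the value of the main diagonal.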

RESULTS AND DISCUSSION

Comparison of 3D and 2D QSAR Models in Terms of Internal Predictivity. To get an impression about the internal consistency, Figure 1 displays the leave-one-out q2 value as a crude measure of model performance. For all methods q2 is smaller for selectivity prediction (∆pKi(I-II)) than for affinity prediction (pKi(II)) which is a consequence of the fact that ∆pKi(I-II) is a difference of two single variables and thus contains the sum of both errors. In the case of CoMFA and CoMSIA the automated alignment yields slightly smaller q2 values (the largest difference between manual and automated alignment: 0.055 for CoMFA, pKi(II)). The observation that the differences with respect to the alignment method are smaller for CoMSIA can be attributed to the smoother Gaussian functional form used to derive the descriptors which makes this method less sensitive toward slight shifts of the molecules. Since the overall performance of the 3D QSAR methods is virtually the same, it can be stated that the automated alignment procedure is suitable for large scale applications.

The 1D and 2D QSAR approaches produce models of similar quality for prediction of pKi(II); however, in the case of selectivity prediction the decrease in q2 is significantly larger compared to the 3D methods. This decrease is more pronounced for the VSA descriptors compared to the MACCS keys. Nevertheless, still highly significant models (q2 > 0.5) can be obtained.

The full set of statistical parameters is shown in Table 1.

External Numerical Predictivity: Correlation Coefficients. A test set of 663 sulfonamide type inhibitors was aligned with the above-mentioned FlexS protocol; subsequently, pKi(II) and ∆pKi(I-II) were predicted based on the 3D QSAR models previously obtained by FlexS alignment and the 1D/2D models (independent of conformation and orientation). Table 2 shows the predictive r2 and Spearman’s rank correlation coefficient for the different models. CoMSIA yields the best results in all cases, with CoMFA exhibiting slightly worse values. The MACCS approach performs significantly better than VSA; however, its predictive power is clearly worse compared to the 3D techniques. Regarding the size and structural imbalance of the test set with respect to the training set the results for prediction of pKi(II) are remarkably good for the 3D methods and still acceptable for the MACCS approach, whereas the VSA approach fails to make useful predictions. These findings confirm the well-known “beware of q2!” phenomenon, stating that a “good q2” is by no means indicative that a model also possesses sufficient predictivity with respect to novel external compounds not regarded for training, though the relative ranking

of the model quality can still be estimated based on the achieved q2 in our study. Obviously, none of the approaches is able to give accurate numerical predictions for the selectivity variable ∆pKi(I-II). Nevertheless, we will show in the following sections that these models are not completely useless with respect to screening purposes for which accurate numerical predictions are less important than correct assignment to affinity classes.

Figure 2. Plots of experimental versus predicted pKi(II) values for the four QSAR approaches considered in this study. A: CoMFA; B: CoMSIA; C: MACCS; D: VSA. Dashed lines mark a range of ±1 logarithmic unit deviation; the solid line indicates perfect correlation. The shape of the points indicates the individual scaffold class of the respective ligands, the capital letters corresponding to the structural assignment given in Chart 1. The bold circle in part D marks two extreme outliers discussed in the text.

External Numerical Predictivity: Experimental versus Predicted Plots. Figure 2 shows the plots of experimental versus predicted pKi(II) values for all four approaches. Each point corresponds to one molecule and indicates its membership to a chemical class by point shape. The distribution of points along the abscissa (i.e., the experimental pKi(II)) reflects again the imbalance of the test set. Compounds with pKi(II) values below 6.5 are rarely found, whereas those possessing pKi(II) between 6.5 and 9.0 are clearly overrepresented. The plots demonstrate the value of CoMFA (Figure 2A) and CoMSIA (Figure 2B) for this large scale application since most of the test compounds are predicted correctly within ±1 logarithmic unit. The same holds for the MACCS key approach (Figure 2C) albeit with some more molecules falling outside this tolerance. Closer inspection of the plots reveals a tendency of the MACCS keys and even more of the VSA descriptors (Figure 2D) to overpredict some of the thiadiazolesulfonamides (scaffold A, Chart 1) and the benzothiazolesulfonamides (C). In order to shed some light on the reason for this finding, we pick two molecules marked by a bold circle in Figure 2D whose affinity is predicted more than 3 logarithmic units too high by the VSA model, whereas the other approaches provide a reasonable estimate. A projection of these molecules into the PCA space of the training set calculated using VSA descriptors reveals them as extreme structural outliers. They contain very bulky phenylpyridinium groups which are not present in any of the training set compounds. The most relevant descriptor responsible for the overprediction is SMR_VSA5 (strongly correlated to molecule volume and polarizability) which for most of the training set molecules does not exceed an upper limit of 160 but takes values of 212 and 231 for the two ill-predicted compounds, respectively. Obviously, the field- and fragment-based descriptors are more robust with respect to structural extrapolation, at least in the case of our data set. It has to be noted, however, that in CoMFA/CoMSIA a descriptor used to evaluate a compound contains several thousand values, whereas for MACCS it contains only 166 and for VSA only 32. Thus, it is not too surprising that a set of VSA values is only a crude approximative description of a molecule. Moreover, the VSA descriptors contain no information about the spatial distribution of the encoded properties across the molecule.

As expected, deviations from correct predictions are generally larger for selectivity prediction of ∆pKi(I-II) as shown in Figure 3. However, despite the poor predictive r2 for all models the overall appearance of the plots suggests that the 3D QSAR models can give rough estimates in terms of selectivity. The group of thiadiazolesulfonamides (scaffold A, Chart 1) with experimental ∆pKi(I-II) > -1 is remarkably underpredicted by all four models. This can be easily explained by the fact that in the training set this chemical class has a mean ∆pKi(I-II) of -1.91 with a maximum of -0.71. The poorly predicted subset, however, exhibits a mean ∆pKi(I-II) of -0.52. Correct prediction would require clear extrapolation. In the case of benzothiazolesulfonamides (scaffold C) the MACCS and VSA models perform significantly worse compared to the 3D models.

External Categorical Predictivity: Sensitivity, Specificity, and Hit Rates. In order to assess the categorical external predictivity of the established models we will first report the resulting sensitivities (Se), specificities (Sp), and hit rates (H) after a distinct threshold for the modeled variable has been defined. Each compound is labeled “positive” or “negative” according to the above-mentioned arbitrarily chosen threshold. The thresholds are selected such that compounds with the highest/lowest 2% or 5% of pKi(II)/

Figure 3. Plots of experimental versus predicted ∆pKi(I-II) for the four QSAR approaches considered in this study. A: CoMFA; B:
CoMSIA; C: MACCS; D: VSA. Dashed lines mark a range of ( 1 logarithmic unit deviation; the solid line indicates perfect correlation.
The shape of the points indicates the individual scaffold class of the respective ligands, the capital letters corresponding to the structural
assignment given in Chart 1.

∆pKi(I-II) are retrieved. In a screening scenario one is corresponding results are obtained for the difference ∆pKi-
usually only interested in identifying compounds with (I-II).
“extreme” activities either to find those with high affinity External Categorical Predictivity: ROC Plots. Figure
for a particular target, with low affinity for an antitarget, or 5 shows two examples of ROC plots monitoring the
with extraordinary selectivity profiles. In a real life situation screening progress. The main diagonal corresponds to a
an even lower amount (e1%) would be of interest due to random classifier unable to discriminate signal from noise.
the large size of databases screened, but for our present study Thus, for any possible threshold the same percentage for
this would result in a very small absolute number of sensitivity (signal) and 1 - specificity (noise) is achieved.
molecules. Most likely rather unstable statistical results would be suggested. The same threshold will then be applied to the calculated values, and the comparison with the experimentally determined classification yields the assignment to “true positives/negatives” and “false positives/negatives” (TP, TN, FP, FN). The main difference between this kind of evaluation and the ROC and enrichment plots described below is that the latter methods monitor the evolution of Se or Sp as a function of a variable threshold, whereas the approach applied here analyzes the classification performance taking the predicted values “as is”.

The resulting sensitivities, specificities, and hit rates are shown in Figure 4 for the various approaches and thresholds. The plots reveal a trend toward the 1D/2D methods exhibiting a higher sensitivity, i.e., they tend to omit fewer actives than the 3D models. This is achieved at the cost of a reduced specificity as they also label many inactives as actives. This results in most cases in higher hit rates for the 3D QSAR approaches. The MACCS keys-based models perform significantly better compared to the VSA models; they yield high sensitivities in conjunction with reasonable specificities. The plots also show that the results are generally better when identifying compounds with minimal pKi(II) or maximal ∆pKi(I-II), respectively, compared to the case of maximal pKi(II) or minimal ∆pKi(I-II). This finding corresponds to the observation described above that compounds with very low pKi (i.e., the classes of scaffolds H and I) are usually well predicted, whereas those with high pKi(II) (particularly the scaffolds A and C) are often overpredicted. In consequence

Its AUC is 50.0%, and any classifier better than random therefore has to produce an AUC above this lower limit. The curve of an ideal classifier coincides with the left and the top edge of the coordinate system and encloses an AUC of 100.0%.

Figure 5A shows the ROC plots for the retrieval of the 5% compounds with lowest pKi(II). For this example all four methods perform similarly well. For a real life scenario the left section of the plot is most interesting since it indicates how much signal (actives) can be identified by the model while still discarding most of the noise (inactives). In our case about 35% of actives can be retrieved without extracting false positives. For the remaining part of the screening, the MACCS descriptors perform slightly worse compared to the other models. At higher noise levels ([1 - specificity] > 0.4) the 3D methods slightly outperform the 1D and 2D approaches.

In Figure 5B, the ROC curves to identify the 5% compounds with lowest ∆pKi(I-II) (i.e., the most selective ones for hCA II) are shown. Here, the differences between the four approaches are more pronounced. The worst performance is indicated for the VSA model, which intermittently even drops below the random line. All approaches exhibit a poorer performance in the “interesting” left part of the plot compared to the previous example. In this area MACCS and 3D models perform similarly, whereas at specificities below 80% ([1 - specificity] > 0.2) 3D QSAR models outperform the 1D/2D models.
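The ROC analysis discussed here can be illustrated with a minimal Python sketch. It is not the procedure used in the study; it only assumes that compounds are ranked by ascending predicted value (as in the retrieval of the compounds with lowest pKi(II)), that both actives and inactives are present, and that tied predictions need no special treatment. The function names `roc_points` and `auc` are hypothetical.

```python
def roc_points(predicted, is_active):
    """Return (1 - specificity, sensitivity) points for a ranked list.

    Assumption: the compounds with the LOWEST predicted values are
    retrieved first, and both classes occur at least once.
    """
    ranked = sorted(zip(predicted, is_active), key=lambda pair: pair[0])
    n_actives = sum(1 for _, active in ranked if active)
    n_inactives = len(ranked) - n_actives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, active in ranked:
        if active:
            tp += 1  # one more active retrieved
        else:
            fp += 1  # one more inactive retrieved
        points.append((fp / n_inactives, tp / n_actives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule (1.0 = ideal, 0.5 = random)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (invented): four compounds, two of them active.
curve = roc_points([1.0, 2.0, 3.0, 4.0], [True, False, True, False])
print(auc(curve))
```

An ideal ranking places all actives first and yields an area of 1.0, whereas a ranking no better than chance approaches 0.5, in line with the limits quoted in the text.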

Figure 4. Comparison of sensitivity (A, D), specificity (B, E), and hit rate (C, F) for the four QSAR methods. A, B, C: pKi(II); D, E, F:
∆pKi(I-II).
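The quantities compared in Figure 4 follow from the TP, TN, FP, and FN counts obtained with a fixed threshold. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that sensitivity is TP/(TP+FN), specificity is TN/(TN+FP), and hit rate is TP/(TP+FP). The paper defines its measures in an earlier section, and those definitions take precedence if they differ.

```python
def confusion_counts(experimental, predicted, threshold, active_if_below=True):
    """Count TP, TN, FP, FN when one threshold is applied to both value sets.

    Assumption: a compound is 'active' if its value lies at or below the
    threshold (active_if_below=True) or at or above it (False).
    """
    def is_hit(value):
        return value <= threshold if active_if_below else value >= threshold

    tp = tn = fp = fn = 0
    for exp_value, pred_value in zip(experimental, predicted):
        truly_active, called_active = is_hit(exp_value), is_hit(pred_value)
        if truly_active and called_active:
            tp += 1
        elif truly_active:
            fn += 1
        elif called_active:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def classification_rates(tp, tn, fp, fn):
    """Sensitivity, specificity, and hit rate (assumed definitions, see lead-in)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    hit_rate = tp / (tp + fp) if (tp + fp) else None
    return sensitivity, specificity, hit_rate


# Toy values (invented, not taken from the data set).
counts = confusion_counts([5.0, 5.5, 7.0, 8.0, 9.0],
                          [5.2, 6.5, 5.8, 8.1, 7.9], threshold=6.0)
print(counts, classification_rates(*counts))
```

A method that labels many compounds as active raises the sensitivity but lowers the specificity and, usually, the hit rate, which is the trade-off described for the 1D/2D models.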

Figure 5. Receiver operating characteristic (ROC) curves for the retrieval of the 5% compounds with lowest pKi(II) (A) and ∆pKi(I-II)
(B), respectively, applying the four QSAR approaches considered in this study. The main diagonal corresponds to a random classifier,
whereas the left and top edge of the plot would show the ROC line of an ideal classifier.
In order to present a concise comparison for the other retrieval experiments, Figure 6 shows bar plots denoting the corresponding AUCs. For classification with respect to pKi(II) (Figure 6A), all approaches perform comparably well; however, the VSA models tend to be worse in identifying compounds with maximum pKi(II). In the case of selectivity prediction (∆pKi(I-II), Figure 6B) the 1D/2D models outperform 3D QSAR in successfully retrieving the most hCA I selective compounds, whereas the opposite is true for identification of hCA II selective compounds. As demonstrated above, 3D QSAR methods obviously suffer less from the overprediction problem than the 1D/2D techniques. The decreasing performance of the 3D models to classify the most hCA I selective compounds results from the fact that already the individual pKi(I) values for this isoform are modeled with reduced accuracy (data not shown). It has to be noted that the AUCs of the ROC curves allow for an overall comparison of the different approaches. However, the detailed characteristics of a classifier’s performance can only be deduced considering the overall shape of a ROC curve.

Figure 6. Bar plots displaying the area under the curve (AUC) values of the ROC curves for the four QSAR approaches considered in this study and activity thresholds obtained when pKi(II) (A) and ∆pKi(I-II) (B) are predicted.

Figure 7. Enrichment plots for the retrieval of the 5% compounds with lowest pKi(II) (A) and ∆pKi(I-II) (B), respectively, applying the four QSAR approaches considered in this study. The main diagonal corresponds to a random selection, the steep line to the left to an ideal retrieval.
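Enrichment plots such as those in Figure 7 relate the fraction of the ranked database that has been screened to the fraction of actives recovered so far. The following Python sketch shows one way to compute such a curve; it is an illustration under stated assumptions (ranking by ascending predicted value, at least one active in the list), and the function name `enrichment_curve` is hypothetical.

```python
def enrichment_curve(predicted, is_active):
    """Return (fraction of database screened, fraction of actives found) points.

    Assumption: compounds are screened in order of ascending predicted
    value, and the list contains at least one active compound.
    """
    ranked = sorted(zip(predicted, is_active), key=lambda pair: pair[0])
    n_total = len(ranked)
    n_actives = sum(1 for _, active in ranked if active)
    found = 0
    curve = [(0.0, 0.0)]
    for position, (_, active) in enumerate(ranked, start=1):
        if active:
            found += 1  # an active was retrieved at this rank
        curve.append((position / n_total, found / n_actives))
    return curve


# Toy data (invented): five compounds, two of them active.
for screened, recovered in enrichment_curve([1, 2, 3, 4, 5],
                                            [True, False, True, False, False]):
    print(f"{screened:.1f} {recovered:.1f}")
```

In contrast to a ROC curve, the abscissa counts all screened compounds, actives included, so the curve directly answers how many actives are found after a given portion of the database has been examined.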
External Categorical Prediction: Classical Enrichment Plots. The important information which cannot be derived from the ROC plots is the amount of actives identified when a certain subset of the database is screened. This type of information is provided by the classical enrichment plots usually used to visualize the screening performance in drug design. Similarly to the ROC curves the left part of the plots is the most interesting for practical applications. We will illustrate such an evaluation using the same examples as for the ROC plots.

Figure 7A shows the plot for the enrichment of the 5% compounds with lowest pKi(II). Within the first few percent of the screened database all four methods successfully separate actives from inactives. For the remainder of the screening process all models show a similarly satisfying albeit not perfect performance.

Figure 7B displays the enrichment curves for the retrieval of the 5% compounds with highest selectivity toward hCA II. On the very left-hand side of the curves CoMFA, CoMSIA, and MACCS models discriminate satisfactorily for the first 1-2% of the database, whereas the VSA model does not depart from a random selection. For the remainder of the screening progress all models perform rather disappointingly; however, the 3D models still show some advantages. The VSA descriptors remain close to the performance of a random classifier.

Interpretation of the MACCS Models. Besides the use of QSAR equations to predict novel compounds the coefficients of the equation indicate the relative importance of individual descriptors (usually multiplied by the standard deviation of the descriptor, denoted as stdev*coeff). With respect to the evaluation of the 3D QSAR models we refer to our recent paper23 giving a detailed interpretation in terms of contour maps. Depending on the method applied, the resulting contours indicate either purely ligand-based, purely protein-based, or mixed protein-ligand-based information. In contrast to this former study where an alignment has been produced exploiting the protein binding pocket as a reference, here we have used a similarity-based alignment strategy. Therefore, conclusions based on contour maps have to be interpreted with some caution particularly referring to properties given by the protein environment. Nevertheless, since several crystal structures of complexes were used as reference some correlation between the properties of the binding pocket and the contour maps derived from the aligned ligands can be expected.

Each of the 166 MACCS descriptors captures frequency of occurrence of a distinct molecular fragment in a molecule. Thus, if the product stdev*coeff adopts large absolute values the corresponding fragment takes strong influence on the biological activity. If the sign is positive its occurrence enhances biological activity (or selectivity, respectively). If
it is negative, the presence of that fragment in a molecule is detrimental for its activity. Figure 8A displays a plot of 130 stdev*coeff values derived from the QSAR equation for pKi(II), one for each MACCS key (36 of the 166 were discarded due to zero variance). The highest peak is experienced by MACCS_81 (MACCS numbering corresponds to the original MDL numbering and not to the numbering scheme used in Figure 8), which corresponds to the fragment SA(A)A (where A is any atom except hydrogen). This pattern is highly abundant in the data set, but molecules exhibiting this fragment particularly frequently (two or three times, found in 25 molecules) achieve an extraordinarily high pKi(II) (mean(subset) = 8.63, mean(all) = 7.17). The corresponding molecules possess in general more than one sulfonamide group mostly along with a thiophene ring. For example, the high-affinity thienothiopyranes (scaffold A) have one sulfonamide anchor and a second sulfonyl group in the thiopyrane ring, and additionally a third time the pattern is matched by the thiophene ring. An example indicating affinity decrease is fragment MACCS_89, which encodes the number of OAAAO substructures. This pattern occurs particularly in sulfamates (scaffold I) which have significantly lower pKi(II) values (mean(subset) = 5.25) compared to the entire training set. This is due to the replacement of the sulfonamide anchor by a sulfamate group, which is known to be a poorer zinc binder.

Figure 8B shows a plot of stdev*coeff derived from the QSAR equation for ∆pKi(I-II). The high value of the product stdev*coeff for MACCS_43 means that the substructure QHAQH (where Q is any atom except hydrogen and carbon) increases selectivity toward hCA I. The average for ∆pKi(I-II) is -1.29 (i.e., most compounds inhibit hCA II stronger than hCA I). Molecules with higher occurrence of this fragment compared to the remainder of the training set (46 molecules with more than one occurrence) hence possess a higher ∆pKi(I-II) (mean(subset) = 0.03). Compounds of this subset comprise structures with scaffold H, (thio)urea, and guanidinium moieties, respectively. An example for substructures increasing selectivity toward hCA II is MACCS_72 encoding OAAO. The average ∆pKi(I-II) for molecules containing this fragment more than once is -2.64. Hydroxysulfonamides (scaffold F) as well as ortho-dimethoxy substituted phenyl rings are captured by this pattern. A comparison of the stdev*coeff plots shows that many of the values are anticorrelated, i.e., the same MACCS key is associated with a positive sign for pKi(II) and a negative sign for ∆pKi(I-II) and vice versa. This is a consequence of the fact that a factor increasing affinity toward hCA II often diminishes the difference ∆pKi(I-II).

Figure 8. Bar plots of stdev*coeff derived from MACCS QSAR equations for prediction of pKi(II) (A) and ∆pKi(I-II) (B), respectively.

These examples show not only that the MACCS-based models are useful for prediction of novel compounds but also that the coefficients of the derived equation can be used to identify substructures of importance to influence a molecular property such as activity. Of course, the conclusions drawn from this type of analysis are only of statistical nature. There is no basis to assume a causal relationship between occurrence of a distinct fragment and its effect on biological activity. It might well be, that, as a hypothetical example, a compound class such as thiophenes most frequently contains several sulfonyl groups and exhibits at the same time higher affinities than the average of the training set. However, it remains unresolved whether the thiophene ring, sulfonyl groups, or the combination of both are responsible for this observation. Since the MACCS keys do not take the functional characteristics of the encoded groups into account with respect to protein-ligand interactions, they can only indirectly evidence a possible correlation. Another limitation of this approach is that many of the encoded fragments are rather small and do not represent a meaningful “chemical unit”. The approach also does not consider any information about connectivity between single fragments. Thus, the method will fail if molecules are attempted to be predicted which possess a similar count of the same fragments as a subset of ligands from the training set but connected in a different way. Clearly, this shortcoming will not occur using a 3D QSAR method as an evaluation technique. Nevertheless, if narrow focused libraries with close similarity to the training set compounds are evaluated, the MACCS key based method will probably yield reasonable accuracy. An advantage of the fragmental description of the molecules is, despite ignorance of actual functional groups, the simple and straightforward translation into chemical structures. This is in contrast to many other property- or graph-based descriptors commonly applied in QSAR studies, e.g., the VSA descriptors used in our fourth approach to evaluate the data set. We therefore will refrain from a detailed interpretation. In principle the relevance of properties such as logP, MR etc. falling into distinct intervals could be assessed. However, the obtained information will remain rather indirect and general due to the global character of the descriptors. Particularly, it faces the problem of a nontrivial translation into chemical structures in accordance with the highlighted properties.

CONCLUSION

Within the present study we assessed the predictive power of QSAR approaches with respect to their applicability to
screen databases. Since we were especially interested in 3D methods, a protocol had to be established which is reliable and robust enough to produce consistent spatial alignments of the molecules under consideration. Due to the large number of compounds encountered in real life screening scenarios the protocol, once set up, has to be applicable without further manual intervention. FlexS in combination with automated recognition of a chemical compound class of ligands to be superimposed via the MAPREF methodology was chosen to successfully accomplish this task. We could demonstrate that CoMFA and CoMSIA models based on this alignment perform comparably well with similar models based on manually derived alignments using the protein’s binding pocket as a reference point along with subsequent force field relaxation. Since the superimposition comprises a rather elaborate and time-consuming step we also tested the performance of alignment-free 1D and 2D QSAR models particularly with respect to database screening. Therefore, fragment-based MACCS descriptors and property-based VSA descriptors were computed based on the 2D molecular information. The external predictivity was assessed based on a test set of 663 compounds with known activities. Of course, this number of test molecules does not approach the size of a real library, but it should be sufficiently large for the intended benchmark test. In summary, the 3D QSAR models and the MACCS keys performed quite well with respect to affinity prediction (pKi(II)), whereas the VSA descriptors did not succeed in establishing models with comparable predictive power. In terms of numerical affinity prediction the 3D QSAR models significantly outperformed the 1D and 2D approaches. They tend to be more specific than the MACCS keys but at the cost of a lower sensitivity. The models are difficult to mutually rank against each other since relevance, predictive value, and applicability depend on the specific goal of the project, e.g., whether retrieval of only a few compounds with an enhanced activity is intended or whether as many actives as possible should be detected. Thus, even if the MACCS approach is easier to perform and computationally less demanding it does not make the 3D methods superfluous for screening purposes. Also with respect to the aspect of generality of the descriptors the molecular fields of a ligand approximate better the concept of molecular recognition, and second they should not suffer from the fact of missing connectivity information as the MACCS keys do. Most likely, the 3D QSAR methods are much more robust in handling data sets composed of compounds with structurally rather diverse molecular skeletons. Nevertheless, we could show that the PLS coefficients of the derived MACCS model can be interpreted meaningfully and used to extract knowledge about the influence of individual fragments on the dependent variables.

With respect to a quantitative selectivity prediction, only the internal consistency was convincing. None of the techniques was able to make satisfying numerical forecasts on this large data set. However, this can be attributed to the rather imbalanced composition of the test set and the generally higher error contained in a “composed variable”. The results on categorical predictivity suggest that the 3D models can still give crude estimates about selectivity.

The main purpose of this study was to systematically assess the performance of 3D QSAR models with respect to database screening. Although the results are convincing, one has to keep in mind that they will depend on the training and test set composition and cannot be transferred generally. Furthermore, additional investigations considering other QSAR techniques (other statistical methods, 4D,42 5D,43 6D44 approaches) need to be done to collect more experience on the scope and limitations of QSAR methods for database screening.

ACKNOWLEDGMENT

The authors acknowledge the kind support of BioSolveIT with respect to special FlexS parameters, in particular Markus Lilienthal. The Chemical Computing Group (CCG) is acknowledged for provision of one research license of MOE. The authors are grateful to Prof. Dr. Claudiu T. Supuran (University of Florence) for making the data set of CA I and CA II inhibitors available to us.

Supporting Information Available: Atomic coordinates of all molecules of the data set with assigned pKi(I), pKi(II), and ∆pKi(I-II) values as SD file. This material is available free of charge via the Internet at http://pubs.acs.org.

REFERENCES AND NOTES

(1) Hansch, C.; Fujita, T. ρ-σ-π analysis - A method for the correlation of biological activity and chemical structure. J. Am. Chem. Soc. 1964, 86, 1616-1626.
(2) Free, S. M., Jr.; Wilson, J. W. A Mathematical Contribution to Structure-Activity Studies. J. Med. Chem. 1964, 53, 395-9.
(3) Kubinyi, H. QSAR: Hansch Analysis and Related Approaches; VCH: Weinheim, 1993; Vol. 1.
(4) Zheng, W.; Tropsha, A. Novel variable selection quantitative structure-property relationship approach based on the k-nearest-neighbor principle. J. Chem. Inf. Comput. Sci. 2000, 40, 185-94.
(5) Brown, N.; Lewis, R. A. Exploiting QSAR methods in lead optimization. Curr. Opin. Drug Discovery Dev. 2006, 9, 419-24.
(6) Shen, M.; Beguin, C.; Golbraikh, A.; Stables, J. P.; Kohn, H.; Tropsha, A. Application of predictive QSAR models to database mining: identification and experimental validation of novel anticonvulsant compounds. J. Med. Chem. 2004, 47, 2356-64.
(7) Oloff, S.; Mailman, R. B.; Tropsha, A. Application of validated QSAR models of D1 dopaminergic antagonists for database mining. J. Med. Chem. 2005, 48, 7322-32.
(8) Tropsha, A. Application of Predictive QSAR Models to Database Mining. In Chemoinformatics in Drug Discovery; Oprea, T. I., Ed.; Wiley-VCH: Weinheim, 2004; Vol. 23, pp 437-455.
(9) Moro, S.; Bacilieri, M.; Cacciari, B.; Bolcato, C.; Cusan, C.; Pastorin, G.; Klotz, K. N.; Spalluto, G. The application of a 3D-QSAR (autoMEP/PLS) approach as an efficient pharmacodynamic-driven filtering method for small-sized virtual library: application to a lead optimization of a human A3 adenosine receptor antagonist. Bioorg. Med. Chem. 2006, 14, 4923-32.
(10) Pastor, M.; Cruciani, G.; McLay, I.; Pickett, S.; Clementi, S. GRid-INdependent descriptors (GRIND): a novel class of alignment-independent three-dimensional molecular descriptors. J. Med. Chem. 2000, 43, 3233-43.
(11) Carosati, E.; Mannhold, R.; Wahl, P.; Hansen, J. B.; Fremming, T.; Zamora, I.; Cianchetta, G.; Baroni, M. Virtual screening for novel openers of pancreatic K(ATP) channels. J. Med. Chem. 2007, 50, 2117-26.
(12) Benedetti, P.; Mannhold, R.; Cruciani, G.; Ottaviani, G. GRIND/ALMOND investigations on CysLT1 receptor antagonists of the quinolinyl(bridged)aryl type. Bioorg. Med. Chem. 2004, 12, 3607-17.
(13) Murcia, M.; Ortiz, A. R. Virtual screening with flexible docking and COMBINE-based models. Application to a series of factor Xa inhibitors. J. Med. Chem. 2004, 47, 805-20.
(14) Ortiz, A. R.; Pisabarro, M. T.; Gago, F.; Wade, R. C. Prediction of drug binding affinities by comparative binding energy analysis. J. Med. Chem. 1995, 38, 2681-2691.
(15) Zhang, Q. Y.; Wan, J.; Xu, X.; Yang, G. F.; Ren, Y. L.; Liu, J. J.; Wang, H.; Guo, Y. Structure-based rational quest for potential novel inhibitors of human HMG-CoA reductase by combining CoMFA 3D QSAR modeling and virtual screening. J. Comb. Chem. 2007, 9, 131-8.
(16) Cramer, R. D. Comparative Molecular Field Analysis (CoMFA). 1. Effect of Shape on Binding of Steroids to Carrier Proteins. J. Am. Chem. Soc. 1988, 110, 5959-5967.
(17) Cramer, R. D.; DePriest, S. A.; Patterson, D. E.; Hecht, P. The Developing Practice of Comparative Molecular Field Analysis. In 3D QSAR in Drug Design: Theory Methods and Applications; Kubinyi, H., Ed.; ESCOM: Leiden, 1993; pp 443-485.
(18) Klebe, G.; Abraham, U.; Mietzner, T. Molecular similarity indices in a comparative analysis (CoMSIA) of drug molecules to correlate and predict their biological activity. J. Med. Chem. 1994, 37, 4130-46.
(19) Klebe, G. Comparative Molecular Similarity Indices Analysis: CoMSIA. Perspect. Drug Discovery Des. 1998, 12/13/14, 87-104.
(20) Klebe, G. Structural Alignment of Molecules. In 3D QSAR in Drug Design: Theory Methods and Applications; Kubinyi, H., Ed.; ESCOM: Leiden, 1993; Vol. 1, pp 173-199.
(21) Lemmen, C.; Lengauer, T. Computational methods for the structural alignment of molecules. J. Comput.-Aided Mol. Des. 2000, 14, 215-32.
(22) Lemmen, C.; Lengauer, T.; Klebe, G. FLEXS: a method for fast flexible ligand superposition. J. Med. Chem. 1998, 41, 4502-20.
(23) Hillebrecht, A.; Supuran, C. T.; Klebe, G. Integrated approach using protein and ligand information to analyze selectivity- and affinity-determining features of carbonic anhydrase isozymes. ChemMedChem 2006, 1, 839-53.
(24) MDL Information Systems, Inc. 14600 Catalina Street, San Leandro, CA 94577.
(25) Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 2002, 42, 1273-80.
(26) Labute, P. A widely applicable set of descriptors. J. Mol. Graphics Modell. 2000, 18, 464-77.
(27) Witten, I. H.; Frank, E. Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed.; Morgan Kaufmann Publishers: San Francisco, 2005.
(28) Triballeau, N.; Bertrand, H. O.; Acher, F. Are You Sure You Have a Good Model? In Pharmacophores and Pharmacophore Searches; Langer, T., Hoffmann, R. D., Eds.; Wiley-VCH: Weinheim, 2006; Vol. 32, pp 325-364.
(29) Maren, T. H. Carbonic anhydrase: chemistry, physiology, and inhibition. Physiol. Rev. 1967, 47, 595-781.
(30) Lindskog, S. Structure and mechanism of carbonic anhydrase. Pharmacol. Ther. 1997, 74, 1-20.
(31) Geers, C.; Gros, G. Carbon dioxide transport and carbonic anhydrase in blood and muscle. Physiol. Rev. 2000, 80, 681-715.
(32) Supuran, C. T.; Scozzafava, A. Carbonic Anhydrase Inhibitors. Curr. Med. Chem.: Imm., Endoc., Metab. Agents 2001, 1, 61-97.
(33) Supuran, C. T.; Scozzafava, A. Applications of carbonic anhydrase inhibitors and activators in therapy. Expert Opin. Ther. Pat. 2002, 12, 217-242.
(34) SYBYL molecular modeling package, Version 7.1; Tripos Inc.: 1699 South Hanley Road, Suite 303, St. Louis, MO 63144, 2005.
(35) MOE; Chemical Computing Group: Montreal, Canada.
(36) AutoQSAR. The SVL script AutoQSAR is freely available to MOE licensees and can be downloaded at http://svl.chemcomp.com.
(37) Bush, B. L.; Nachbar, R. B., Jr. Sample-distance partial least squares: PLS optimized for many variables, with application to CoMFA. J. Comput.-Aided Mol. Des. 1993, 7, 587-619.
(38) Thibaut, U.; Folkers, G.; Klebe, G.; Kubinyi, H.; Merz, A.; Rognan, D. Recommendations for CoMFA Studies and 3D QSAR Publications. In 3D QSAR in Drug Design: Theory Methods and Applications; Kubinyi, H., Ed.; ESCOM: Leiden, 1993; Vol. 1, pp 711-716.
(39) Wold, S.; Johansson, E.; Cocchi, M. PLS - Partial Least-Squares Projections to Latent Structures. In 3D QSAR in Drug Design: Theory Methods and Applications; Kubinyi, H., Ed.; ESCOM: Leiden, 1993; pp 523-550.
(40) Evers, A.; Hessler, G.; Matter, H.; Klabunde, T. Virtual screening of biogenic amine-binding G-protein coupled receptors: comparative evaluation of protein- and ligand-based virtual screening protocols. J. Med. Chem. 2005, 48, 5448-65.
(41) Triballeau, N.; Acher, F.; Brabet, I.; Pin, J. P.; Bertrand, H. O. Virtual screening workflow development guided by the “receiver operating characteristic” curve approach. Application to high-throughput docking on metabotropic glutamate receptor subtype 4. J. Med. Chem. 2005, 48, 2534-47.
(42) Hopfinger, A. J.; Wang, S.; Tokarski, J. S.; Jin, B.; Albuquerque, M.; Madhav, P. J.; Duraiswami, C. Construction of 3D-QSAR Models Using the 4D-QSAR Analysis Formalism. J. Am. Chem. Soc. 1997, 119, 10509-10524.
(43) Vedani, A.; Dobler, M. 5D-QSAR: the key for simulating induced fit? J. Med. Chem. 2002, 45, 2139-49.
(44) Vedani, A.; Dobler, M.; Lill, M. A. Combining protein modeling and 6D-QSAR. Simulating the binding of structurally diverse ligands to the estrogen receptor. J. Med. Chem. 2005, 48, 3700-3.

CI7002945
