Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Fragment-based identification of druggable 'hot spots' of proteins using Fourier domain correlation techniques

2009, Bioinformatics

BIOINFORMATICS ORIGINAL PAPER Vol. 25 no. 5 2009, pages 621–627 doi:10.1093/bioinformatics/btp036 Structural bioinformatics Fragment-based identification of druggable ‘hot spots’ of proteins using Fourier domain correlation techniques Ryan Brenke1,† , Dima Kozakov2,† , Gwo-Yu Chuang2,† , Dmitri Beglov2,† , David Hall2 , Melissa R. Landon1 , Carla Mattos3 and Sandor Vajda2,∗ 1 Program in Bioinformatics, 2 Structural Bioinformatics Laboratory, Department of Biomedical Engineering, Boston University, Boston, MA, USA and 3 Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, NC, USA Received on October 9, 2008; revised on January 8, 2009; accepted on January 13, 2009 Advance Access publication January 28, 2009 Associate Editor: Burkhard Rost ABSTRACT Motivation: The binding sites of proteins generally contain smaller regions that provide major contributions to the binding free energy and hence are the prime targets in drug design. Screening libraries of fragment-sized compounds by NMR or X-ray crystallography demonstrates that such ‘hot spot’ regions bind a large variety of small organic molecules, and that a relatively high ‘hit rate’ is predictive of target sites that are likely to bind drug-like ligands with high affinity. Our goal is to determine the ‘hot spots’ computationally rather than experimentally. Results: We have developed the FTMAP algorithm that performs global search of the entire protein surface for regions that bind a number of small organic probe molecules. The search is based on the extremely efficient fast Fourier transform (FFT) correlation approach which can sample billions of probe positions on dense translational and rotational grids, but can use only sums of correlation functions for scoring and hence is generally restricted to very simple energy expressions. The novelty of FTMAP is that we were able to incorporate and represent on grids a detailed energy expression, resulting in a very accurate identification of low-energy probe clusters. Overlapping clusters of different probes are defined as consensus sites (CSs). We show that the largest CS is generally located at the most important subsite of the protein binding site, and the nearby smaller CSs identify other important subsites. Mapping results are presented for elastase whose structure has been solved in aqueous solutions of eight organic solvents, and we show that FTMAP provides very similar information. The second application is to renin, a long-standing pharmaceutical target for the treatment of hypertension, and we show that the major CSs trace out the shape of the first approved renin inhibitor, aliskiren. Availability: FTMAP is available as a server at http://ftmap.bu.edu/. Contact: vajda@bu.edu Supplementary information: Supplementary Material is available at Bioinformatics online. ∗ To whom correspondence should be addressed. authors wish it to be known that, in their opinion, the first four authors should be regraded as joint First Authors. † The 1 INTRODUCTION It has been recognized for many protein–ligand complexes that certain regions of the binding surface, often called ‘hot spots’, contribute a disproportionate amount to the binding free energy (Hajduk et al., 2005). Such regions are more likely to bind small drug-like compounds with high affinity than the rest of the binding site, and hence their identification is important for drug design (DeLano, 2002; Vajda and Guarnieri, 2006). Both NMR (nuclear magnetic resonance) and X-ray crystallography techniques have been used to find ‘hot spots’ of proteins. Using NMR the 15 N-labeled protein is screened against a library of small probe compounds (Hajduk et al., 2005; Vajda and Guarnieri, 2006). Applications to a variety of proteins demonstrate that the ‘hot spots’ bind a variety of small molecules, and a high ‘hit rate’ is a good predictor of druggability. Indeed, a high correlation was observed between the number of different probes binding to a site, and the ability to identify high-affinity druglike ligands that bind there (Hajduk et al., 2005). The X-ray technique, known as Multiple Solvent Crystal Structures (MSCS) method, is based on solving the structure of the protein in aqueous solutions of various probe compounds, primarily organic solvents (Mattos and Ringe, 1996). Each structure shows a few organic molecules associated with the protein surface in the first shell of water molecules. The power of the method arises from superimposing a number of structures solved in different solvents. Most probes generally cluster in the binding site, and the overlapping probe clusters form ‘consensus’ sites (CSs) that delineate the functionally most important subsites. As demonstrated by applications to porcine elastase (Allen et al., 1996; Mattos and Ringe, 1996; Mattos et al., 2006) and thermolysin (English et al., 1999, 2001), some probes may also bind at crystal contacts or in small buried pockets, but the large CSs occur in the ‘hot spots’ of the binding site. Since the identification of ‘hot spots’ by NMR or X-ray crystallography is very expensive, it is important to explore whether similar information can be obtained by computations. Computational mapping methods place molecular probes on the protein surface in order to explore its binding properties. A number of methods identify potential binding sites (An et al., 2005; Glaser et al., 2006; Laurie and Jackson, 2005). Some early methods such as GRID (Goodford, 1985) and Multiple Copy Simultaneous Search © The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org 621 R.Brenke et al. (MCSS) (Caflisch et al., 1993; Miranker and Karplus, 1991) have been developed to find favorable binding positions for specific molecules or functional groups rather than to identify ‘hot spots’. Both methods result in many energy minima, and it is difficult to determine which of the minima are actually relevant (Mattos and Ringe, 1996). In this article, we describe the FTMAP algorithm specifically developed to reproduce NMR and X-ray mapping results using a number of organic probe molecules. For each probe the algorithm generates 2000 bound positions using rigid body docking, refines the positions by energy minimization, clusters the resulting conformations and ranks the clusters on the basis of this average free energy. The docking step is based on the fast Fourier transform (FFT) correlation approach which helps to efficiently sample billions of probe positions on dense translational and rotational grids, but can use only sums of correlation functions for scoring and hence, is generally restricted to very simple energy expressions. The novelty of FTMAP is that we were able to incorporate and represent on grids a detailed energy expression which includes attractive and repulsive van der Waals terms, electrostatic interaction energy based on Poisson–Boltzmann calculations, a cavity term to represent the effect of non-polar enclosures and a structure-based pairwise interaction potential. The energy expression is key to the accuracy of FTMAP, and the algorithm matches or exceeds the accuracy of our earlier mapping method (Dennis et al., 2002; Silberstein et al., 2003), although FTMAP requires only about one-sixth of the CPU time. Here, we show that FTMAP reproduces the X-ray mapping results for elastase very well (Allen et al., 1996; Mattos et al., 2006). The second application presented is to renin, a long-standing pharmaceutical target for the treatment of hypertension, with the first renin inhibitor approved in 2007 (Rahuel et al., 2000; Wood et al., 2003). For renin we do not have experimental mapping results, and our goal here is to show that mapping can reliably identify the ‘hot spots’ that substantially contribute to the free energy of ligand binding and hence should be the primary targets of drug design efforts. Two more applications are described in Supplementary Material. In addition, the FTMAP server page provides mapping results for both unbound and ligand-bound structures of 10 drug target proteins (http://ftmap.bu.edu/). 2 2.1 METHODS Outline of the mapping algorithm The FTMAP algorithm consists of five steps as follows. Step 1: rigid body docking of probe molecules. Protein structures are downloaded from the Protein Data Bank (PDB) (Berman et al., 2000). All bound ligands and water molecules are removed. For each structure, we use 16 small molecules as probes (ethanol, isopropanol, isobutanol, acetone, acetaldehyde, dimethyl ether, cyclohexane, ethane, acetonitrile, urea, methylamine, phenol, benzaldehyde, benzene, acetamide and N,N-dimethylformamide). The selection of this probe library is further discussed in the Supplemantary Material. For each probe, billions of docked conformations are sampled by the rigid body docking step that will be discussed in more details. Note that mapping requires only the atomic coordinates of the two molecules, i.e. no a priori information on the binding site is used. The 2000 best poses for each probe are retained for further processing. Step 2: minimization and rescoring. The free energy of each of the 2000 complexes, generated in Step 1, is minimized using the CHARMM potential with the Analytic Continuum Electrostatic (ACE) model representing the 622 electrostatics and solvation terms as implemented in version 27 of CHARMM (Brooks et al., 1983) using the parameter set from version 19 of the program. During the minimization by an adopted basis Newton–Raphson method the protein atoms are held fixed, while the atoms of the probe molecules are free to move. Step 3: clustering and ranking. The minimized probe conformations from Step 2 are grouped into clusters using a simple greedy algorithm. The lowest energy structure is selected and the structures within 3 Å RMSD are joined in the first cluster. The members of this cluster are removed, and we select the next lowest energy structure to start the second cluster. This step is repeated until the entire set is exhausted. Clusters with less than 10 members are excluded from consideration, thereby avoiding narrow energy minima with low entropy (Ruvinsky and Kozintsev, 2006). The clusters are ranked on the basis of their Boltzman averaged energies, and for each probe six clusters with the lowest average free energies are retained. Further details on the selection of the clustering parameters are given in the Supplementary Material. Step 4: determination of CSs. To determine the ‘hot spots’ defined by overlapping probe clusters, the clusters of different probes are clustered using the distance between the centers of mass of the cluster centers as the distance measure. Again we use a simple greedy algorithm. The cluster with the maximum number of neighbors (defined as cluster centers within 4 Å) is selected as the center of CS1, and the clusters within 4 Å as the members of CS1. The clusters in CS1 are removed from consideration, and the procedure is repeated until all clusters are assigned to a CS. Having formed the CSs, their centers are fixed, but occasionally clusters have to be moved to assure that each is closest to the center of its own CS. Finally the CSs are ranked based on the number of their clusters. Duplicate clusters of the same type are considered in the count. Step 5: characterization of the binding site. First, we select the largest CS1 that generally identifies the most important subsite (or ‘hot spot’). CS1 forms the kernel of the binding site. Since additional clustering of probes close to the main CS is likely to indicate other subsites of the binding site, we expand the binding site by adding any CS (irrespective of its size) within 7 Å from any CS already in the binding site, and continue this procedure until no further expansion is possible. The probes in the resulting set of CSs are used to describe the binding site. In particular, we count the nonbonded interactions and hydrogen bonds between all atoms of these probes and the individual protein residues using the HBPLUS program (McDonald and Thornton, 1994). 2.2 The FFT correlation approach to mapping In Step 1, we perform exhaustive evaluation of an energy function in the discretized 6D space of mutual orientations of the protein (receptor) and a small molecular probe (ligand). The center of mass of the receptor is fixed at the origin of the coordinate system. The translational space is represented as a grid of 0.8 Å displacements of the ligand center of mass, and the rotational space is sampled using 500 rotations based on a deterministic layered Sukharev grid sequence which quasi-uniformly covers the space (Lindemann et al., 2004). The energy function describing the receptor–ligand interactions is defined on the grid and is expressed as the sum of P correlation functions for all possible translations α, β, γ of the ligand at a given rotation:  E(α,β,γ ) = Rp (i,j,k)Lp (i+α,j +β,k +γ ) p i,j,k where Rp (i,j,k) and Lp (i,j,k) are the components of the correlation function defined on the receptor and the ligand, respectively. This expression can be efficiently calculated using P forward and one inverse FFTs, denoted by FT and IFT, respectively: E(α,β,γ ) = IFT( P  p {FT∗ {Rp }FT{Lp }})(α,β,γ ) Protein mapping FT{F}(l,m,n) =  F(i,j,k)exp−2π i(li/N1 +mj/N2 +nk/N3 ) i,j,k IFT{f }(i,j,k) = C  f (l,m,n)exp2π i(li/N1 +mj/N2 +nk/N3 ) l,m,n √ where i = −1; N1 , N2 and N3 are the dimensions of the grid along the three coordinates; and C = 1/(N1 N2 N3 ). If N1 = N2 = N3 = N, the efficiency of this approach is O(N 3 log(N 3 )) as compared with O(N 6 ) when all evaluations are performed directly. For each rotation of the ligand we generate the FT(Lp ) function on the grid and then calculate the sum of the correlation functions using the above formula, resulting in scoring function values for all possible translations. Since the function may have multiple minima, we retain the four lowest energy regions of the translational space for each rotation. To derive the first region we select the lowest energy solution, remove the surrounding volume of the 27 Å3 , and repeat this step three more times. Finally, results from different rotations are collected and sorted. 2.3 Energy function in the rigid body docking step The energy expression in Step 1 includes the simplified van der Waals energy Evdw with attractive (Eattr ) and repulsive (Erep ) contributions, the electrostatic interaction energy Eelec , a cavity term Ecavity describing the contributions from hydrophobic enclosures and the statistical knowledge-based pairwise potential Epair representing other solvation effects: E = Evdw +w2 Eelec +w3 Ecavity +w4 Epair Evdw = Eattr +w1 Erep Eelec = Nl  Epair = NL NR   qi φrPB i=1 2.3.2 Electrostatic interactions We approximate the electrostatic energy as the interaction energy between the electrostatic potential φrPB of the solvated protein and the atomic charges qi of the probe. Thus, the influence of the probe on the electrostatic potential of the protein-solvent system is neglected, assuming that the probe is small and not strongly charged. Using a dielectric continuum model with low ion concentration (corresponding to 0.1 mol salt concentration), the electrostatic potential of the solvated protein is calculated by solving the linearized Poisson–Boltzmann equation → → → → → ∇ǫ(− r )∇φrPB (− r )−κ 2 (− r )φrPB (− r ) = −4πρ(− r ), → → → where ǫ(− r ), κ(− r ) and ρ(− r ) are the dielectric constant, the modified Debye–Hückel screening factor, and the fixed charge density of the protein, respectively. The dielectric boundary between the low dielectric protein region and the external bulk solvent is placed to account for the reduced water mobility and hence reduced polarization in binding site cavities. This is achieved by dividing atoms of the protein into ‘cavity’ and ‘non-cavity’ sets. Atoms are considered ‘cavity’ if they are not accessible to a large spherical probe of 5.75 Å radius. The size of the probe is selected to represent the typical dimensions of protein active sites. Each cavity atom is assigned a dielectric radius equal to its van der Waals radius (rvdw) plus 1.4 Å. In contrast, each non-cavity atom has a small fixed dielectric radius of 0.1 Å. These radii define a continuous surface that separates the low dielectric protein and its extension into the cavities (ǫ = 4.0) from the bulk solvent (ǫ = 80.0). We use the Poisson–Boltzmann module PBEQ (D.Beglov and B.Roux, unpublished data) of CHARMM (Brooks et al., 1983) to calculate the potential φrPB . The electrostatic energy is then expressed as the vector product of the functions Rp (l,m,n) = φrPB (l,m,n)  qj if (l,m,n) ∋ (aj ∈ J) Lp (l,m,n) = 0 otherwise εij i=1 j=1 where NR and NL denote the numbers of atoms in the receptor and the ligand, respectively. The coefficients w1 = 11.1, w2 = 44.4, w3 = 0.88 and w4 = 3.33 weight the different contributions to the scoring function. The weight w1 of the repulsive term is selected to avoid substantial steric clashes, but to allow for some atomic overlaps accounting for the (limited) flexibility of the receptor. The coefficients w2 and w3 are selected according to calorimetric considerations and are scaled to the van der Waals term. The goal of adding the attractive cavity term Ecavity is to bias the sampling toward the cavities of the protein. In the remainder of this section we describe the biophysical origin of these energy terms, as well as their implementation on a grid. 2.3.1 van der Waals energy We use stepwise functions to represent the attractive and repulsive steric terms as suggested by Vakser (Vakser and Aflalo, 1994). The repulsive interactions are cutoff at the van der Waals radius (rvdw) plus 1.8 Å because we want the penalty function to be tolerant enough and to allow for differences between bound and unbound structures. The attractive part is truncated at 6 Å. On the grid, the functions describing the receptor and the ligand can be represented as follows. Rp (l,m,n) = −cl,m,n +w1 rl,m,n  1 if (l,m,n) ∋ (aj ∈ J) Lp (l,m,n) = 0 otherwise where cl,m,n is the number of atoms that are at the distance d < r < D from the grid point (l,m,n), rl,m,n is the number of atoms that are at the distance r < d from the same grid point and (l,m,n) ∋ (aj ∈ J) means that the grid point (l,m,n) overlaps with atom aj of atom type J. As mentioned, D = 6 Å and d = rvdw +1.8 Å. Thus, the correlation of Rp and Lp provides a shape complementarity term representing both repulsive and attractive interactions, the former for the distances r < d, and the latter in the range d < r < D. defined on the receptor and on the ligand grids, respectively. The potential is truncated at 15.0 kcal/mol. 2.3.3 Pairwise statistical potential The general form of a pairwise contact potential is Epair = NL NR   εij . i=1 j=1 For a pair of atoms ai and aj of types I and J, respectively, εij = εIJ , where εIJ is the contact energy between atoms of types I and J, if d < rij < D; otherwise εij = 0. We use the DARS (decoys as the reference state) potential that originally has been developed for protein–protein docking (Chuang et al., 2008; Kozakov et al., 2006), and has been extended to describe the interactions between proteins and the molecular probes considered here. The DARS parameters used in this work are listed in the Supplementary Material. In order to evaluate the energy function using FFT it must be written as a sum of correlation functions. Based on the eigenvalue–eigenvector decomposition of the matrix of pairwise interaction coefficients εIJ , these coefficients can be written as K  εIJ = λp upI upJ p=1 where λp is the p-th eigenvalue of the interaction matrix, and upI is the I-th component of the p-th eigenvector. Each term in the eigenvalue–eigenvector decomposition represents an energy contribution proportional to the absolute value of the eigenvalue λp , and such contributions are independent due to the orthogonality of the eigenvectors. We have shown that restricting consideration to the first four terms yields around 10% error in the energy values, comparable with the error of representing the energies on a grid (Kozakov et al., 2006). The energy term with the p-th eigenvalue of the 623 R.Brenke et al. pairwise potential is defined by the correlation of the functions Rp (l,m,n) = Nr  Lp (l,m,n) =  upI δi i=1 upJ if (l,m,n) ∋ (aj ∈ J) 0 otherwise where δi is 1 if atom i of the receptor is closer than 6 Å to the grid point (l,m,n). 2.3.4 Cavity term Non-polar cavities disrupt water structure and create favorable environment for ligand binding (Young et al., 2007). To represent this effect we place a Gaussian ball, with σ = 10 Å, at each grid point, and calculate its correlation with the Cα atoms of non-polar residues. For each point in space, this function measures the fraction of the ball occupied by the non-polar regions of the protein: Rp (l,m,n) = NR  Lp (l,m,n) =  i=1 2 −ri,(lmn) 1 ) √ exp( σ2 σ 2π 1 if (l,m,n) ∋ AL 0 otherwise where NR is the number of atoms in the receptor, ri,(lmn) is the distance from atom i to grid point (l,m,n), and AL is an atom of the ligand probe. The correlation of Rp and Lp is large in non-polar cavities, and it is small on protrusions and flat surfaces. This way the function, added to the energy expression with a negative sign, improves the sampling of the cavity regions. 3 RESULTS AND DISCUSSION 3.1 The mapping of elastase Porcine pancreatic elastase is a serine endopeptidase with two βbarrel domains. The catalytic residues Ser203, His60 and Asp108 lie in a cleft between the domains and the peptide substrate binding site lies roughly perpendicular to the inter-domain cleft, with the S4 −S1 subsites in one domain and the S1′ −S3′ in the other. Mattos and co-workers (Allen et al., 1996; Mattos et al., 2006) solved the crystal structure of elastase in the presence of 95% acetone, 55% dimethylformamide, 80% 5-hexene-1,2-diol, 80% isopropanol, 80% ethanol and 40% trifluoroethanol. In addition, a crystal structure was solved in a mixture of 40% benzene, 50% isopropanol and 10% water, and one in 40% cyclohexane, 50% isopropanol and 10% water. Isopropanol was bound to elastase in these last two structures, but there were no bound benzene or cyclohexane molecules observed in the electron density maps. Most other organic solvents cluster in the active site, delineating five subsites (Fig. 1A and B). The S1 pocket binds acetone, dimethylformamide, 5-hexene-1,2-diol, ethanol, two molecules of trifluoroethanol and three molecules of isopropanol (Fig. 1A). Further probe binding in the active site occurs at S4 , the oxyanion hole, and two sites on the leaving group side of the catalytic triad close to the S3′ and S1′ pockets (Fig. 1B). S1′ binds only trifluoroethanol, but all the other subsites bind at least three different types of solvent molecules. We have mapped an elastase structure co-crystalyzed with the inhibitor trifluoroacetyl-L-lysyl-L-prolyl-p-isopropylanilid (PDB code 1ela) (Mattos et al., 1994). The structure also includes an acetic acid molecule, a sulfate ion, a calcium ion and 166 water molecules. All ligands, ions and water molecules were removed before the mapping. The largest CS (CS1) is a supercluster of 20 probe clusters in the S1 pocket (Fig. 1C). All the 16 probes are represented 624 Fig. 1. Binding of organic solvents to elastase, determined by X-ray crystallography (Allen et al., 1996; Mattos et al., 2006) and computational mapping. (A) Probe molecules in the S1 pocket of elastase, based on superimposing elastase structures solved in acetone, dimethylformamide, 5-hexene-1,2-diol, isopropanol, ethanol and trifluoroethanol. Probes are color-coded to distinguish between different molecules. (B) Probe molecules in the active site of elastase, based on superimposing the structures listed in (A). The probes are color-coded to distinguish between the different subsites. (C) Centers of probe clusters in the largest CS of elastase, located in the S1 pocket, from mapping the protein using the 16 probes. Probes are color-coded to distinguish between different molecules. (D) Centers of probe clusters in the five CSs located in the active site of elastase as determined by the mapping. The probes are color-coded to distinguish between the different CSs: CS1, cyan; CS4, salmon; CS5, sky blue; CS7, orange; and CS8, pale green. here (including three clusters of benzaldehyde), even benzene and cyclohexane that were not observed to bind in the MSCS experiments (Mattos et al., 2006). Figure 1D shows CS1 (cyan) and the adjacent CSs. CS4 (salmon) is located in the S3′ pocket and includes nine probe clusters. CS5 (sky blue) is in the S1′ pocket, and includes eight probe clusters. CS7 (orange) has seven probe clusters in the oxyanion hole. Finally, CS8 (pale green) is in the S4 subsite and has four probe clusters. A comparison of Figure 1B and D shows that using FTMAP we identify all the five subsites that bind any organic molecule in the experiments. Figure 2A and B compare the protein–probe interactions and hydrogen bonds from the experimental and computational mapping results shown in Figure 1B and D, respectively. The most important residues in the S1 subsite are Gln200, Cys199, Val224 and Thr221 (Fig. 2A). The interactions are primarily provided by the aliphatic portions of Gln200 and Thr221, and none of these residues participate in hydrogen bonds (Fig. 2B), in good agreement with the preference for substrates with apolar residues in the S1 pocket Protein mapping A B Fig. 2. (A) Intermolecular non-bonded interactions between probes and elastase residues, determined by X-ray crystallography (Mattos et al., 2006) and computational mapping. The experimental and computational results are based on the interactions found between various elastase residues and the probes in the clusters shown in Figure 1B and D, respectively. (B) The same as panel (A), but for hydrogen bonds rather than non-bonded interactions. (Mattos et al., 1994). The next set of important residues, Ser203, Thr44, His60 and Gly201, are in the largely polar oxyanion hole. Several probes also form hydrogen bonds with Ser203, Thr44 and the backbone of Gly201 (Fig. 2B). The S4 subsite is lined by Phe223 and the aliphatic portion of Arg226. The only other important residues are in S3′ (His43 and Leu77) and in S1′ (Leu156, which also interacts with S3′ ). The binding of probes coincides with the way these sites are occupied by elastase inhibitors. For example, in the Supplementary Material we compare the residue–probe contacts from our mapping results with the contacts between the elastase residues and a peptidomimetic aminimide inhibitor (Peisach et al., 1995). The agreement is excellent for all important residues. 3.2 Renin Renin, an aspartic endopeptidase, catalyzes the rate-limiting step in the renin-angiotensin system (RAS), the cleavage of the Leu10-Val11 peptide bond of angiotensinogen to form angiotensin I. Angiotensin I is converted by the angiotensin converting enzyme (ACE) into angiotensin II, which increases blood pressure. Renin is a long-standing pharmaceutical target for the treatment of hypertension, and a variety of stable peptide-like analogs of angiotensinogen have been developed over the past 20 years that Fig. 3. Mapping results for renin. (A) Superposition of aliskiren (magenta) and a peptidomimetic inhibitor (orange) in the binding site of renin. The important subsites are labeled. (B) Mapping of the aliskiren-bound structure (PDB code 2v0z). Shown are the clusters centers in the CSs located in the renin-active site, superimposed on the inhibitor structures shown in panel (A). (C) Mapping of the ligand-free renin structure (PDB code 2ren). Cluster centers as in panel (B). (D) Mapping of renin co-crystallized with a peptidomimetic inhibitor (PDB code 1rne). Cluster centers as in panel (B). were shown to inhibit renin and lower blood pressure (Fisher and Hollenberg, 2001). The peptidomimetic inhibitors, however, suffer from limited bioavailability due to poor lipophilicity and a relatively large size. Aliskiren, the first orally effective renin inhibitor with better pharmaceutical properties, was approved for the treatment of hypertension in 2007 (Rahuel et al., 2000; Wood et al., 2003). Figure 3A shows the renin-bound aliskiren structure (PDB code 2v0z, shown in magenta), superimposed with a typical peptidomimetic inhibitor (PDB code 1rne, shown in orange). The peptide-like inhibitors bind to renin subsites S4 −S1 , S1′ and S2′ , while aliskiren is lacking the P4 and P2 analogs, and hence it does not interact with the S2 or S4 subsites. The P3 to P1 pharmacophore of aliskiren is accommodated by the complementary pockets S3 and S1 (Wood et al., 2003). Aliskiren has an additional group binding to sp the S3 subsite, a deep, narrow and fairly polar pocket behind S3 , which is considered essential for high binding affinity (Rahuel et al., 2000). We present mapping results for three renin structures: (i) cocrystallized with aliskiren (PDB code 2v0z), (ii) ligand-free (PDB code 2ren) and (iii) co-crystallized with a peptidomimetic inhibitor (PDB code 1rne). Table 1 lists the largest 10 CSs located in the active site for each structure. The three largest CSs for the aliskirenbound structure are the subsites S1 (extending toward the S1′ pocket), 625 R.Brenke et al. Table 1. Summary of renin mapping results No. 1 2 3 4 5 6 7 8 9 10 A 2V0Z 2REN 1RNE Aliskiren Size Pocket Apo-Enzyme Size Pocket Peptidomimetic Size Pocket 18 13 12 11 10 8 7 7 4 2 22 16 12 10 9 6 5 4 3 3 14 14 12 10 8 7 6 6 5 4 S1 −S1′ S3SP S2′ – – S1′ S1′ – S3 – S2′ S1 −S1′ S3SP S1 S2 −S4 – – S3 – – S2′ S3SP S1 −S1′ S3 S1 – – – S1′ S2 −S4 B sp S3 and S2′ , with 18, 13 and 12 probe clusters, respectively (Table 1 and Fig. 3B). There are two much smaller superclusters in S1′ , and another in S4 . Since this is the aliskiren-bound structure, one could sp assume that the large S3 subsite is induced by ligand binding. However, this is definitely not the case. As shown in Table 1 and Figure 3C, the mapping of the renin apostructure (2ren) places the sp third largest CS (CS3), with 12 probe clusters, in the S3 subsite, thus sp the pocket exists before any ligand binding. The S3 subsite is also found, with 14 probe clusters, in the structure co-crystallized with the peptidomimetic inhibitor (PDB code 1rne). More generally, in all three structures the three most important subsites are the same, sp i.e. S2′ , S3 and S1 , the latter extending toward S1′ , and only the relative sizes (i.e. the numbers of probe clusters) vary. The most important difference between the three structures is a CS between the S2 and S4 subsites, present in both the apostructure (Fig. 3C) and the peptidomimetic-bound protein (Fig. 3D), but absent in the aliskiren-bound structure (Fig. 3B). However, this site is relatively sp small, containing substantially fewer probe clusters than S3 . Figure 4A shows the distribution of non-bonded interactions that the bound peptidomimetic, aliskiren and the probes establish with individual renin residues. The probe interactions are based on mapping the structure co-crystallized with the peptidomimetic inhibitor rather than aliskiren. Nevertheless, the residues in the sp S3 subsite (Thr12, Gln13 and Tyr14) interact with many probes. Figure 4A also shows that these residues, particularly Tyr14 deep in sp the S3 pocket, interact with aliskiren, but slightly or not at all with the peptidomimetic. Tyr14 also forms hydrogen bonds with both aliskiren and the probes, but not with the peptidomimetic (Fig. 4B). In contrast, residues Tyr220, Ser222, His287 and Met289 in the S2 subsite that interact with the peptidomimetic but not with aliskiren attract very few probes, although Figure 4A is based on the mapping of the peptidomimetic-bound structure. Other residues interacting with the peptidomimetic backbone are Ser76 and Thr77, both located between the S1 and S2 pockets. These residues also bind very few probes, attesting that the probe clusters trace the shape of aliskiren rather than that of the peptide. We note that Thr77 forms hydrogen bonds with the peptide, but with neither aliskiren nor the probes (Fig. 4B). 626 Fig. 4. (A) Intermolecular non-bonded interactions between inhibitors, probes and renin residues. The interactions are shown for the peptidomimetic inhibitor, aliskiren and the probes. The probe interactions are based on mapping the structure co-crystallized with the peptidomimetic inhibitor (PDB code 1rne). Two atoms are considered to interact if located within 5 Å from each other. (B) Same as in panel (A), but hydrogen bonds rather than non-bonded interactions. 4 CONCLUSION Finding the most druggable pockets of target proteins, and identifying molecular fragments or functional groups that tend to bind there, are important steps in drug design. Such information can be obtained, at considerable costs, by determining the structure of a protein in a number of organic solvents by X-ray crystallography, or by NMR-based screening programs that detect the binding of fragment-sized compounds. In both methods, it has been observed that the small organic compounds cluster in the important pockets of the binding site, and this provides information on their druggability. Here, we describe the FTMAP algorithm which can be used to perform a close computational analog of such experimental druggability analyses. The main novelty of FTMAP is that it combines the highly efficient FFT correlation method of sampling protein–probe complexes with the use of a detailed and highly accurate energy function for scoring the complex conformations. The two applications presented here demonstrate the type of information that can be obtained by FTMAP. For elastase the algorithm correctly identifies all important subsites, in good agreement with the results obtained by X-ray crystallography. For renin, a long-standing pharmaceutical target, the few largest CSs Protein mapping trace out the shape of the first approved renin inhibitor, aliskiren (Rahuel et al., 2000; Wood et al., 2003), rather than that of peptidomimetic inhibitors that have been studied for several decades but did not provide any successful drug candidate. It is important that the mapping reveals the better fit of aliskiren into the ‘hot spots’ even when applied to a renin structure without any bound ligand, or to structures co-crystallized with peptidomimetic inhibitors. Two more applications are described in the Supplementary Material. FTMAP is substantially more efficient than our earlier mapping method CS-Map (Dennis et al., 2002; Silberstein et al., 2003). The FFT-based search is twice as fast as the non-linear minimization used in CS-Map, and takes about 30 min of CPU time on a single 1 GHz PIII processor for a small protein. The primary speed-up is achieved in the local minimization. Due to the improved sampling by FTMAP, for each probe it is sufficient to refine the 2000 lowest energy poses rather than over 6000 as we do in CS-Map (Dennis et al., 2002; Silberstein et al., 2003). Since the refinement of a single protein–probe complex (with the protein fixed) takes about 30 s, for each probe the minimization on a single CPU is reduced from 54 h to 18 h. Although the minimizations can be easily distributed among multiple processors, this is still a substantial gain, improving the applicability of the method. FTMAP is available as a server at http://ftmap.bu.edu/. The server requires only the PDB file or PDB code of the protein to be mapped. The output data include (i) a PDB file with the original protein and the six lowest energy cluster representatives for each probe type, grouped into CSs; (ii) the number of non-bonded interactions between the probes and each residue; and (ii) the number of hydrogen bonds between the probes and each residue. The FTMAP program is also available to academic users free of charge. Funding: National Institutes of Health (GM064700 to S.V.); National Institute of Environmental Health Sciences (P42ES07381 to S.V.); National Science Foundation ( MCB-0237297 to C.M.). Conflict of Interest: none declared. REFERENCES Allen,K. et al. (1996) An experimental approach to mapping the binding surfaces of crystalline proteins. J. Phys. Chem., 100, 2605–2611. An,J.H. et al. (2005) Pocketome via comprehensive identification and classification of ligand binding envelopes. Mol. Cell. Proteomics, 4, 752–761. Berman,H. et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242. Brooks,B.R. et al. (1983) CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J. Comp. Chem., 4, 187–217. Caflisch,A. et al. (1993) Multiple copy simultaneous search and construction of ligands in binding sites: application to inhibitors of HIV-1 aspartic proteinase. J. Med. Chem., 36, 2142–2167. Chuang,G. et al. (2008) DARS (decoys as the reference state) potentials for proteinprotein docking. Biophys. J., 95, 4217–4127. DeLano,W.L. (2002) Unraveling hot spots in binding interfaces: progress and challenges. Curr. Opin. Struct. Biol., 12, 14–20. Dennis,S. et al. (2002) Computational mapping identifies the binding sites of organic solvents on proteins. Proc. Natl Acad. Sci. USA, 99, 4290–4295. English,A.C. et al. (1999) Locating interaction sites on proteins: the crystal structure of thermolysin soaked in 2% to 100% isopropanol. Proteins, 37, 628–640. English,A.C. et al. (2001) Experimental and computational mapping of the binding surface of a crystalline protein. Protein Eng., 14, 47–59. Fisher,N. and Hollenberg,N. (2001) Is there a future for renin inhibitors? Expert Opin. Investig. Drugs, 10, 417–426. Glaser,F. et al. (2006) A method for localizing ligand binding pockets in protein structures. Proteins, 62, 479–488. Goodford,P.J. (1985) A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J. Med. Chem., 28, 849–875. Hajduk,P.J. et al. (2005) Druggability indices for protein targets derived from NMRbased screening data. J. Med. Chem., 48, 2518–2525. Kozakov,D. et al. (2006) PIPER: an FFT-based protein docking program with pairwise potentials. Proteins, 65, 392–406. Laurie,A.T.R. and Jackson,R.M. (2005) Q-sitefinder: an energy-based method for the prediction of protein–ligand binding sites. Bioinformatics, 21, 1908–1916. Lindemann,S. et al. (2004) Incremental grid sampling strategies in robotics. In Proceedings of the Sixth International Workshop on the Algorithmic Foundations of Robotics. Springer, Berlin/Heidelberg, Zeist, Netherlands, pp. 313–328. Mattos,C. and Ringe,D. (1996) Locating and characterizing binding sites on proteins. Nat. Biotechnol., 14, 595–599. Mattos,C. et al. (1994) Analogous inhibitors of elastase do not always bind analogously. Nat. Struct. Biol., 1, 55–58. Mattos,C. et al. (2006) Multiple solvent crystal structures: probing binding sites, plasticity and hydration. J. Mol. Biol., 357, 1471–1482. McDonald,I. and Thornton,J. (1994) Satisfying hydrogen bonding potential in proteins. J. Mol. Biol., 238, 777–793. Miranker,A. and Karplus,M. (1991) Functionality maps of binding sites: a multiple copy simultaneous search method. Proteins, 11, 29–34. Peisach,E. et al. (1995) Interaction of a peptidomimetic aminimide inhibitor with elastase. Science, 269, 66–69. Rahuel,J. et al. (2000) Structure-based drug design: the discovery of novel nonpeptide orally active inhibitors of human renin. Chem. Biol., 7, 493–504. Ruvinsky,A.M. and Kozintsev,A.V. (2006) Novel statistical-thermodynamic methods to predict protein–ligand binding positions using probability distribution functions. Proteins, 62, 202–208. Silberstein,M. et al. (2003) Identification of substrate binding sites in enzymes by computational solvent mapping. J. Mol. Biol., 332, 1095–1113. Vajda,S. and Guarnieri,F. (2006) Characterization of protein-ligand interaction sites using experimental and computational methods. Curr. Opin. Drug. Discov. Dev., 9, 354–362. Vakser,I. and Aflalo,C. (1994) Hydrophobic docking: a proposed enhancement to molecular recognition techniques. Proteins, 20, 320–329. Wood,J.M. et al. (2003) Structure-based design of aliskiren, a novel orally effective renin inhibitor. Biochem. Biophys. Res. Commun., 308, 698–705. Young,T. et al. (2007) Motifs for molecular recognition exploiting hydrophobic enclosure in protein-ligand binding. Proc. Natl Acad. Sci. USA, 104, 808–813. 627