Docking Manual
SYBYL®-X 2.1
Mid 2013
This material contains confidential and proprietary information of Certara, L.P. and third parties furnished under the
Tripos Software License Agreement. This material may be copied only as necessary for a Licensee’s internal use
consistent with the Agreement. The allowed use includes printing of hardcopy versions hereof as minimally necessary
for Licensee’s internal use. Neither Certara, L.P., nor any person acting on its behalf, makes any warranty or
representation, expressed or implied, with respect to the accuracy, completeness, or usefulness of the material
contained in this manual or in the corresponding electronic documentation, nor in the programs or data described
herein. Certara, L.P. assumes no responsibility nor liability with respect to the use of this manual, any materials
contained herein, or programs described herein, or for any damages resulting from the use of any of the above. Except
for printing of hardcopy versions as stated, no part of this manual may be reproduced in any form or by any means
without permission in writing from Tripos (DE), Inc., 1699 South Hanley Road, Suite 200, St. Louis, Missouri 63144-
2917, USA (314-647-1099).
Selected software programs for methodologies contained or documented herein are covered by one or more of the
following patents: AllChem: US 7,860,657; Comparative Molecular Field Analysis (CoMFA): US 5,025,388; US
5,307,287; US 5,751,605; AT E150883; BE 0592421; CH 0592421; DE 691 25 300 T2; FR 0592421; GB 0592421;
IT 0592421; NL 0592421; SE 0592421. HQSAR: US 6,208,942. Embedded NLM: US 6,675,103. Topomers: US
6,185,506; US 6,240,374; US 7,184,893; US 7,212,951. TopCoMFA: US 7,329,222. DBTop: US 7,330,793. OptiSim:
US 6,535,819. Surflex software programs for chemical analysis by morphological similarity: US 6,470,305 B1.
SYBYL, UNITY, CoMFA, CombiFlexX, Concord, DiverseSolutions, GALAHAD, LeapFrog, OptDesign, StereoPlex,
and Alchemy are registered trademarks of Certara, L.P.
AUSPYX, Benchware, CScore, DISCOtech, Distill, GASP, HQSAR, Legion, MOLCAD, Molecular Spreadsheet,
Muse, OptiDock, OptiSim, Pantheon, ProTable, ProtoPlex, Selector, SiteID, Topomer CoMFA, Topomer Search,
Tuplets, and Tripos Bookshelf are trademarks of Certara, L.P.
RACHEL is a trademark of Drug Design Methodologies.
Surflex, Surflex-Dock, and Surflex-Sim are trademarks of BioPharmics LLC.
“FairCom” and “c-tree Plus” are trademarks of FairCom Corporation and are registered in the United States and other
countries.
All other trademarks are the sole property of their respective owners.
Docking Suite Table of Contents
2. Surflex-Dock Tutorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Perform a Simple Surflex-Dock Run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Use Placed Fragments to Guide Docking . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Allow Protein Movement to Accommodate Docked Ligands . . . . . . . . . . . . 28
2.4 Validation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4. Surflex-Dock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1 Run Surflex-Dock Standalone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.2 The Surflex-Dock Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Reading Material
• BioPharmics’ Surflex Manual: Docking and Similarity
• See the list of Recommended Reading about Surflex-Dock on page 80.
Acknowledgments
Surflex-Dock was developed by Prof. Ajay N. Jain, University of California San
Francisco (UCSF) and BioPharmics LLC.
Multiple Engines
Running Surflex-Dock with one fragment selected and specifying more than
one engine would fail in SYBYL-X 2.0. This issue no longer occurs.
Module-Based Licensing
SYBYL continues to run with a license file issued before the SYBYL-X release.
Functionality License
This docking exercise uses thymidine kinase and 10 active ligands. Upon
successful completion of this tutorial, you will know how to use Surflex-Dock
to dock a series of ligands into the active site of a protein receptor. You can also
perform the validation test outlined at the end of the tutorial to verify that
Surflex-Dock enriches actives over inactives.
A Matter of Time: This tutorial includes a series of docking runs. The full set
requires about 30 minutes of personal time and 5 minutes of CPU time.
2. Make a local copy of the necessary files: the protein structure as retrieved from
RCSB and a file containing the structures of 10 active ligands.
This is the central dialog for the preparation of a Surflex-Dock run. Here you
will prepare the protein, define the active site, and generate the Surflex
protomol.
1KIM, as stored in the RCSB repository, is a dimer. The file contains two
complete units related by symmetry. Each consists of a protein, ligand, sulfate
salt, and water molecules. This means that every atom has a symmetry-related
duplicate in the file.
Usage notes about using dimers (or any multi-unit protein) with Surflex-Dock:
• SYBYL assigns the first set of atoms to chain A and the second set to
chain B. However, because Surflex-Dock does not take chain names into
consideration, it perceives corresponding residues in the two
symmetrical units as duplicate entries.
• You should only use those units in the protein that define or enclose the
active site.
• If the active site is completely defined within a monomeric unit, you
only need to use a single unit in Surflex-Dock. The other unit(s)
should be removed, as demonstrated in this tutorial. Using all the
units may lead to unexpected results since Surflex-Dock does not
consider chain names.
• If the active site is defined by multiple units, use those multiple
units. In this case, it is advised to generate the protomol using either
the Ligand mode (if a crystallographic ligand is present) or the
Automatic mode. Using the Residue mode may lead to unexpected
results since Surflex-Dock does not consider chain names.
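The duplicate-entry problem described in these usage notes can be sketched in a few lines of Python. This is only an illustration, not part of SYBYL or Surflex-Dock: the function name and the (chain, residue name, residue number) tuples are hypothetical, and the sketch simply shows which residues collide once the chain label is ignored.

```python
from collections import defaultdict


def chain_blind_duplicates(residues):
    """Return residues that become indistinguishable when the chain is ignored.

    residues: iterable of (chain, residue_name, residue_number) tuples.
    """
    seen = defaultdict(list)
    for chain, name, number in residues:
        # Key on name and number only, mimicking a chain-blind reader.
        seen[(name, number)].append(chain)
    return {key: chains for key, chains in seen.items() if len(chains) > 1}


# Hypothetical input for illustration: TYR101 is present in both units.
residues = [("A", "TYR", 101), ("B", "TYR", 101), ("A", "GLN", 125)]
print(chain_blind_duplicates(residues))
```

Any residue reported by such a check is a candidate for removal before docking when the active site lies within a single unit.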
6. Identify all the structures that you want to remove: all the residues in chain B,
the ligand in chain B, the co-crystallized salt in both chains, and all the waters.
You will keep only the residues and the ligand in chain A.
! Press Remove Substructures.
Only the residues, co-crystallized salts and waters associated with chain B are
visible in the SYBYL window because a default option in the dialog is to Hide
Unselected.
Only the ligand atoms in chain A are visible in the SYBYL window because the
Hide Unselected option is on.
The protein residues in chain A are displayed again. The ligand is colored
green-blue. It is still in the protein cavity and will be extracted later, when you
leave this dialog and return to the Surflex-Dock - Define SFXC File dialog.
The Prepare Protein Structure dialog is displayed again, but items in the lower
half of the dialog are disabled. This is because you must first perform an
analysis.
! Press Analyze Selected Structure.
All residues identified by the analysis are located some distance from the cavity
and should not take any part in the docking operation.
Usage Note: Deciding whether to spend the time fixing those few residues
depends on how you intend to proceed with the docking. If you have a known
ligand and plan to use it to generate the protomol and if the problem residues
are far from the active site, you can simply proceed with the protein as is.
However, if the problem residues are close to the cavity, or if you intend to let
Surflex-Dock find the cavity and generate the protomol automatically, or if you
intend to optimize the geometry of the entire protein prior to docking, it would be
best to spend some time to fully prepare the protein first.
9. Add all the hydrogens to the protein and ligand. These are necessary for
Surflex-Dock.
! On the Add Hydrogens line press Add.
The hydrogens were added to the protein in the ideal geometry described in the
biopolymer dictionary’s residue files. A quick minimization step of these
hydrogens in the context of the protein will be performed later.
A few residues are reported as missing some of their hydrogens. These are the
problem residues, all of them away from the cavity.
10. Prepare the molecule for a brief minimization using a suitable force field,
AMBER7 FF99. The corresponding atom types will be taken from the
dictionary for all residues, but 31 atoms are reported as not having their
AMBER/Kollman atom type assignment.
! On the Type Atoms line press Show.
All 31 ligand atoms are highlighted. They have the correct SYBYL atom types,
but this structure does not match any of the monomers defined in the dictionary.
! In the Assign AMBER Atom Types dialog set Atom Types to AMBER7
FF99 and press Assign Atom Types.
! Close the dialog.
The Prepare Protein Structure dialog still reports that 31 atoms do not have the
proper types. This is because other sets of AMBER and Kollman atom types
could be loaded on the ligand.
11. Orient the sidechain amides in all ASN and GLN residues to maximize
hydrogen bonding.
! On the Fix Sidechain Amides line press Fix.
12. Perform a focused minimization of the protein and its co-crystallized ligand.
! On the Staged Minimization line press Perform.
Watch the minimization proceed in two brief stages. While the minimization is
proceeding the atoms are color coded according to the local strain energy (sum
of energy terms in which each atom is involved). When the molecule is once
again colored by atom types the minimization is complete.
13. This is all the protein preparation that is needed in the context of this tutorial
where the known ligand will be used to generate the protomol.
The ligand is extracted into a separate molecule area and also saved into a Mol2
file whose name is derived from the name of the protein (1kim) followed by the
string _ligand.
! Press OK in the Message dialog reporting that the ligand has been
named 1kim_ligand, extracted to M3, and saved to the
1kim_ligand.mol2 file.
14. Specify the mode of construction for the protomol. The choices are:
• Automatic—Surflex-Dock finds the largest cavity in the receptor
protein.
• Ligand—A ligand in the same coordinate space as the receptor.
• Residues—User-specified residues in the receptor.
• Multi-Channel Surface—Use MOLCAD’s multi-channel functionality
to detect potential active site cavities. If multiple cavities are found, their
surfaces are displayed in blue and listed in a dialog where you can select
the surface that contains the active site.
! In the Surflex-Dock - Define SFXC File dialog, set the Protomol
Generation Mode to Ligand.
Two parameters determine the extent of the protomol. Their default values are
adequate for most datasets.
• Threshold is a factor (from 0 to 1) determining how much the protomol
can be buried in the protein. The default is 0.50. Increasing this number
will decrease the volume. Using a very small number will greatly
increase the time it takes to generate the protomol.
• Bloat can be used to inflate the protomol and include nearby crevices.
The Prefix is a text string that reflects the conditions used to generate the
protomol. The prefix is used to name the file containing the protomol and the
Surflex-Dock control file. It is composed of:
• The name of the Mol2 file containing the prepared protein;
• A single letter representing the mode of generation: A(utomatic) or
L(igand) or R(esidues) or M(ulti-channel);
• The Threshold value;
• The Bloat value.
Usage Note: If the active site is an open channel, rather than an enclosed cavity,
remember to increase the Bloat value so that the generated protomol is large
enough to account for those open ends. If the Bloat value is too small, Surflex-
Dock may have difficulties defining the limits of the protomol at the ends of the
open channel. Increasing the Bloat value to 1 or 2 is sufficient in most
cases.
16. Go ahead and generate the protomol based on the ligand’s coordinates.
! Press Generate.
It may take about a minute to generate the protomol, which is then stored in the
file 1kim_H-L-0.50-0-protomol.mol2.
The file takes its name from the protein and the conditions for the protomol
generation: 1kim_H-L-0.50-0.sfxc.
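The naming scheme described above can be sketched as follows. This is a minimal illustration, not SYBYL code: the function names are hypothetical, and the two-decimal Threshold and integer Bloat formatting are assumptions inferred from the example file names in this tutorial.

```python
# Mode letters as listed in the description of the Prefix.
MODE_LETTERS = {
    "Automatic": "A",
    "Ligand": "L",
    "Residues": "R",
    "Multi-Channel Surface": "M",
}


def sfxc_prefix(protein_name, mode, threshold=0.50, bloat=0):
    """Compose the prefix: protein name, mode letter, Threshold, Bloat.

    Formatting (two decimals, integer bloat) is assumed from the examples.
    """
    return f"{protein_name}-{MODE_LETTERS[mode]}-{threshold:.2f}-{bloat}"


def output_files(prefix):
    """Derive the protomol and control file names from the prefix."""
    return {
        "protomol": f"{prefix}-protomol.mol2",
        "control": f"{prefix}.sfxc",
    }


prefix = sfxc_prefix("1kim_H", "Ligand", threshold=0.50, bloat=0)
print(prefix)
print(output_files(prefix))
```

With the tutorial settings this reproduces the names 1kim_H-L-0.50-0-protomol.mol2 and 1kim_H-L-0.50-0.sfxc.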
! Enter tk.hits as the file name or use the file browser to locate the
file.
Usage Note: Surflex-Dock expects the ligands to be properly typed and
protonated as expected at physiological pH and their geometry to be optimized in any
arbitrary conformation. The most convenient way to achieve this is to press
Prepare to access the Ligand Preparation Setup dialog (see the Prepare
Ligands Manual for details). You may also use Concord to generate the
necessary 3D coordinates in the Surflex-Dock - Details dialog.
20. CScore — In the interest of speed CScore calculation will not be included in
this tutorial. In your own work, if you have access to CScore, you may want to
use additional scoring functions to evaluate the interactions between the ligand
and the receptor. For information see the CScore Details dialog on page 54.
! Surflex-Dock is fast, so you can set this small job to Run in Current
SYBYL Session.
! Enter tk as the job’s Name.
The job’s name will be used to create a directory containing the complete output
for the docking run. All results will be found there.
The docked ligands appear in the list in the center of the dialog. Surflex-Dock
produced multiple poses for each ligand. For each of the docked ligands, the
reported score is that of the pose with the highest score.
All ten ligands were docked successfully. They are listed in descending order of
total score values expressed as -log(Kd).
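Because the total score is expressed as -log(Kd), a score can be converted back to an approximate dissociation constant. The sketch below assumes a base-10 logarithm and Kd in mol/L, neither of which is stated explicitly here; the function name is hypothetical.

```python
def kd_from_score(total_score):
    """Invert score = -log10(Kd). Base 10 and molar units are assumptions."""
    return 10.0 ** (-total_score)


# Illustrative score values only, not results from this tutorial.
for score in (4.0, 6.0, 8.0):
    print(f"score {score:.1f} -> Kd of about {kd_from_score(score):.1e} M")
```

Under these assumptions each additional score unit corresponds to a tenfold smaller Kd, which is why the list is sorted in descending order of score.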
This means that, when you select a line in the list, the corresponding ligand will
be displayed.
The ligand is displayed as capped sticks. It is the best of 20 docked poses for
TK_ganciclovir.
Note that, by default, only the ligand’s polar hydrogens are displayed.
! In the View pull-down near the top of the Results Browser, select
1kim_H-L-0.50-0-protomol.mol2.
Note: This active site is not used during the run, but is created as a visual aid for
viewing results. The residues in this file are identified as those containing at
least one atom within 2.5 Å of any protomol atom.
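The 2.5 Å selection rule in this note can be sketched in Python. This is an illustration of the distance criterion only, not the SYBYL implementation: the function name, the input layout, and the coordinates are hypothetical.

```python
import math


def site_residues(protein_atoms, protomol_coords, cutoff=2.5):
    """Return residues with at least one atom within `cutoff` Å of a protomol atom.

    protein_atoms: list of (residue_id, (x, y, z)) pairs, one per atom.
    protomol_coords: list of (x, y, z) tuples.
    """
    selected = []
    for residue, coord in protein_atoms:
        if residue in selected:
            continue  # one qualifying atom is enough for the whole residue
        if any(math.dist(coord, p) <= cutoff for p in protomol_coords):
            selected.append(residue)
    return selected


# Hypothetical coordinates for illustration.
atoms = [("TYR101", (0.0, 0.0, 0.0)), ("GLN125", (1.0, 0.0, 0.0)),
         ("ALA200", (10.0, 0.0, 0.0))]
protomol = [(0.0, 0.0, 2.0)]
print(site_residues(atoms, protomol))
```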
! Rotate the combined structures to see how well this docked ligand
fits into and interacts with the active site.
25. Read in the experimental structure of the co-crystallized thymidine, which you
extracted earlier in this tutorial.
26. Make the experimental ligand more visible by rendering it in capped sticks.
27. Hydrogen bonds between the active site and docked ligands are displayed by the Results
Browser. This is not the case for the molecule you just read in. You must
display those manually:
! View > Hydrogen Bonds > Intermolecular
30. Use the buttons to scroll through the docked ligands, one at a time.
! Click to view the highest scoring pose of the first ligand in the
list.
! Click again to view the highest scoring pose of the next ligand.
! Scroll back to the top of the list so that the asterisk is on the
TK_ganciclovir line.
31. Look at multiple docked poses for each ligand. You can do so by combining
the buttons and the Examine N Poses slider.
! Move the Examine N Poses slider to the right or use the right arrow
to show progressively more docked poses for TK_ganciclovir.
! Click to scroll down the list and view the specified number of
poses for each ligand.
! When you are done, click to undisplay the docked poses for the
ligand being examined and remove the asterisk from the list.
! In the Results Browser, activate the Table radio button above the list.
The docked poses for this and all other ligands were saved in a Multi-Mol2 file
named tk-results.mol2 in the job directory. The docked poses are also stored in
3D SLN format along with all the score values in the companion file,
tk-results.sln. Viewing the results for this ligand as a spreadsheet, named
TK_THYMIDINE, created the matching table file, TK_thymidine.tbl, also in the
job directory.
The spreadsheet contains one row for each of the 20 docked poses. The
numbers reported in the Results Browser for each ligand are those of the top-
scoring pose.
34. Save the top pose of the top-scoring ligand for use in a subsequent docking run
later in this tutorial.
! At the bottom of the Results Browser press Save Results.
Most of the default settings in the Save Results dialog will be used. With the
docked ligands sorted by descending total score values, the best pose of the
highest-scoring ligand will be saved.
! Set the Save Results dialog as follows:
- First N (Descending Total_Score): automatically set to 1.
- Number of Poses per Ligand: automatically set to 1.
- Toggle on Strip Pose Number from Molecule Name.
- Output Formats: select only Multi-Mol2.
- Output Prefix: type ganciclo_top.
- Press OK.
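The effect of these settings can be sketched as a ranking followed by two cuts. This is a hypothetical illustration of the selection logic, not the code behind the Save Results dialog; the function name, the pose numbering, and the score values are invented for the example.

```python
def select_poses(poses, first_n=1, poses_per_ligand=1):
    """Pick the top ligands by best total score, then the top poses of each.

    poses: list of (ligand_name, pose_number, total_score) tuples.
    """
    by_ligand = {}
    for ligand, pose, score in poses:
        by_ligand.setdefault(ligand, []).append((score, pose))
    for entries in by_ligand.values():
        entries.sort(reverse=True)  # best-scoring pose first
    # Rank ligands by the score of their best pose, descending.
    ranked = sorted(by_ligand, key=lambda lig: by_ligand[lig][0][0], reverse=True)
    return [
        (lig, pose, score)
        for lig in ranked[:first_n]
        for score, pose in by_ligand[lig][:poses_per_ligand]
    ]


# Hypothetical scores for illustration only.
poses = [
    ("TK_ganciclovir", 0, 7.1),
    ("TK_ganciclovir", 1, 6.4),
    ("TK_thymidine", 0, 6.9),
]
print(select_poses(poses, first_n=1, poses_per_ligand=1))
```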
35. Close the Results Browser and clear the SYBYL screen.
38. Retrieve the Surflex-Dock control file created for the first docking run. This file
includes the names of the Mol2 files containing the prepared protein, the ligand,
and the protomol. All these will be reused.
40. Read in the highest scoring pose produced by the initial docking run. You saved
it in ganciclo_top.mol2 after exploring the results of that run.
! Press Import and retrieve ganciclo_top.mol2.
The ligand is displayed in capped sticks inside the active site. Hydrogen bonds
between the ligand and protein residues are shown as yellow dashed lines, and
those residues are labeled. These visual elements can be toggled on and off
within the dialog.
41. Fragment the ligand. With the Prefer Rings During Fragmentation option
below the list, fragments that include a partial ring structure will be
automatically expanded to the full ring during the fragment generation.
A few fragments are listed in the dialog along with the number of atoms they
contain. Not all generated fragments are retained for inclusion in this list.
! Use the buttons to the right of the list to view each of the
fragments and decide which one(s) to use as hints for the next
docking run.
Usage Note: In your own work you may want to use the tools to the right of the
list to create additional fragments.
• —Access the Sketcher where you can modify the selected
fragment’s structure, add hydrogens, and modify its name.
• —Combine two or more selected fragments into a single new
fragment.
• —Filter the selected fragment(s) by the specified SLN string to
create new fragment(s).
• —Remove the selected fragment(s) from the list.
42. Select one fragment to be used as representative of the types of interactions you
want to favor between docked ligands and the active site.
! Clear all selections in the list, then select TK_ganciclovir-frag-025,
which is H-bonded to TYR101 and GLN125.
43. Adjust the parameters that will direct the docking with placed fragments. The
most important of these are:
• the penalty associated with the deviation between docked poses and the
placed fragments.
• whether to dock only the ligands that include a substructural match to
the fragments
44. Indicate that only the selected fragment will be used and give a name to the
fragment constraint file.
! Enter tk.hits as the file name or use the file browser to locate the
file.
! Toggle Perform CScore Calculations off if it is active.
The job’s name will be used to create a directory containing the complete output
for the docking run. All results will be found there.
Even though no substructural match to the placed fragment was required, these
three compounds can be oriented, more successfully than the others, to match
the constraining fragment.
! Scroll the list all the way to the right to bring the last two columns
into view.
FragRMSD reports the RMS distance between the constraining fragment and
the matching atoms in the docked pose. A value of -1 indicates no alignment.
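An RMS distance of this kind can be computed as sketched below. The sketch assumes the atom pairing between fragment and pose is already known and supplied in matching order; how Surflex-Dock determines that pairing is not described here, and the function name is hypothetical.

```python
import math


def frag_rmsd(fragment_coords, matched_coords):
    """RMS distance between paired atoms; -1.0 when there is no alignment.

    Both arguments are lists of (x, y, z) tuples in matching order (assumed).
    """
    if not fragment_coords or len(fragment_coords) != len(matched_coords):
        return -1.0
    squared = sum(
        math.dist(a, b) ** 2 for a, b in zip(fragment_coords, matched_coords)
    )
    return math.sqrt(squared / len(fragment_coords))


# Hypothetical coordinates: every atom is displaced by 1 Å along z.
fragment = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(frag_rmsd(fragment, pose))
print(frag_rmsd(fragment, []))
```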
49. Display the placed fragment that provided guidance during docking.
! In the upper-right corner of the dialog press Constraints.
Most of the default settings in the Save Results dialog will be used. With the
selected ligands sorted by descending total score values, the best pose of each
ligand will be saved.
The file ciclovir.sln is created, containing the 3D coordinates of the top scoring
optimized pose for each of the ciclovir trio. You will dock these in the next
docking run: Allow Protein Movement to Accommodate Docked Ligands on
page 28.
50. Close the Results Browser and clear the SYBYL screen.
53. Retrieve the Surflex-Dock control file created for the first docking run. This file
includes the names of the Mol2 files containing the prepared protein and the
protomol. All these will be reused.
! Access the Filename [...] file browser and retrieve
1kim_H-L-0.50-0.sfxc.
! Make sure that the Constraints option is off.
The job’s name will be used to create a subdirectory containing the complete
output for the docking run. All results will be found there.
The Results Browser provides access to two sets of scores because two
consecutive Surflex-Dock runs are performed when protein flexibility is involved. You
can swap between the score sets via the Score pull-down menu:
• Base—The scores and poses resulting from a first, standard run.
• Protein Flexibility [PF-re]—The scores and poses resulting from the
second run, which allows protein movement and rescores the docked
poses. This list includes a Pose column because, for any given ligand,
the ranking of poses may be different before and after optimization.
58. To compare it with the relaxed active site, display the original one.
! In the View pull-down near the top of the Results Browser, select
tk_ciclovir_flex_site.mol2.
Two decoy datasets, taken from a publication by Pham & Jain (Ref. 1), are
available for use with the thymidine kinase ligands described in this tutorial.
• $TA_DEMO/surflex/Bissantz_Pham.hits
A set of decoys, described by Bissantz et al. (Ref. 2), consists of 990
randomly chosen, non-reactive molecules taken from the Available
Chemicals Directory (ACD) and filtered according to drug likeness as
described by Pham & Jain. The file was filtered to remove duplicates,
reducing the dataset to 851 compounds. This file also includes the 10
thymidine kinase ligands used in the Surflex-Dock tutorial.
• $TA_DEMO/surflex/Zinc_Pham.hits
A set of decoys, described by Pham & Jain, consists of 1000
randomly selected compounds taken from the ZINC (Ref. 3) database
(drug-like subset). The file distributed with SYBYL also includes the 10
thymidine kinase ligands used in the Surflex-Dock tutorial.
If this were a prospective virtual screen, you could have assayed only the top 5% of
your compounds (saving a large amount of time and resources) and still retrieved
80% of the actives.
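Retrieving 80% of the actives in the top 5% of the ranked list corresponds to an enrichment factor of 0.80 / 0.05 = 16 over random selection. The sketch below shows how such numbers can be derived from a ranked list; the function name and the 200-compound example are hypothetical and are not the tutorial's datasets.

```python
def recovery_and_enrichment(ranked_is_active, top_fraction):
    """Fraction of actives found in the top slice, and the enrichment factor.

    ranked_is_active: list of booleans ordered from best to worst score.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    n_top = max(1, round(n_total * top_fraction))
    found = sum(ranked_is_active[:n_top])
    recovered = found / n_actives
    # Enrichment compares the recovery with the fraction actually screened.
    enrichment = recovered / (n_top / n_total)
    return recovered, enrichment


# Hypothetical screen: 200 compounds, 10 actives, 8 of them ranked in the top 10.
ranked = [True] * 8 + [False] * 2 + [True] * 2 + [False] * 188
recovered, enrichment = recovery_and_enrichment(ranked, 0.05)
print(f"recovered {recovered:.0%} of actives, enrichment {enrichment:.1f}x")
```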
References
[1] Pham, T.A.; Jain, A.N. “Parameter Estimation for Scoring Protein-Ligand
Interactions Using Negative Training Data.” J. Med. Chem. 2006, 49,
5856-5868.
[2] Bissantz, C.; Folkers, G.; Rognan, D. “Protein-based virtual screening of
chemical databases. 1. Evaluation of different docking/scoring
combinations.” J. Med. Chem. 2000, 43, 4759-4767.
[3] ZINC, a free database for virtual screening, available for download from
http://zinc.docking.org/
License availability allows you to select the docking engine and additional
features within the dialog: the use of pharmacophore constraints and the
docking of a combinatorial library. See License Requirements for the Docking
Suite on page 7.
Descriptor File
Several docking engines are accessible through this dialog. If you have only
licensed a single docking engine (such as Surflex-Dock), ignore references to
the unlicensed applications.
Ligands
Options
Job Submission
Name: Enter the name to use for the subdirectory that will be created
containing the complete output for the docking run.
Additional Information:
Analyze Docking Results on page 56
Protein Structure
Input Format: How to read in the protein structure: Mol2 file, PDB file, or
Mol Area.
Protein Source: Enter the name of the input file or molecule area containing
the protein receptor or use the adjacent browser to retrieve it.
Prepare: Access the Prepare Protein Structure dialog to prepare the protein
structure for docking with Surflex. In particular, the protein must be
protonated at physiological pH including non-polar hydrogens, and the active
site must not contain any docked ligand.
Protomol Generation
Protomol: The name of the Mol2 file containing the protomol consists of the
Prefix followed by “-protomol” and a .mol2 extension.
Edit: Press this button to access the SYBYL Sketcher (described in the SYBYL
Basics Manual). You can then add, delete, or modify the atoms that make up the
protomol.
SFXC File: The name of the Surflex control file consists of the Prefix and a
.sfxc extension.
Select from Substructure Lists: All the substructures in the protein are sorted
by type for easier selection. Names are all preceded by the name of the chain to
which they belong. Banks of buttons (select all, invert selection, clear
selection) assist in each selection. Clicking a second time on any item
unselects it.
• Residues—List of all the amino acid residues.
• Other—List of the substructures that are not amino acid residues or water.
Typically found here are ligands and cofactors.
• Water—List of co-crystallized water molecules.
Selection Radius: Enter the radius (in Å) surrounding already selected
structure(s). All residues for which at least one atom is within the selection
radius are included and highlighted in the list. The default radius is 0.1 Å,
so that only the designated substructures are selected.
Hide Unselected: By default, only the selected substructures are shown. These
can be protein residues, cofactors, ligands, and co-crystallized waters.
Label Selected: Whether to label the selected substructures.
Pick from Screen: Pick one or more substructures directly from the screen or
specify them via the Substructure Expression dialog, a variant of the Atom
Expression dialog where the smallest unit is a substructure.
Clear All: Clear the selection no matter how it was made.
Substructures: This field echoes the selection made in the dialog. You may also
edit this field directly.
In the Surflex-Dock - Define SFXC File dialog set the Protomol Generation
Mode to Residues and press the [...] button nearby.
Note: Surflex-Dock uses the residues identifying the active site for the sole
purpose of generating the protomol. The jobname_site.mol2 file displayed in
the Results Browser is created as a visual aid, and its residues are identified
based on the protomol coordinates.
Selection Methods
Ligand Based: Select the Mol2 file containing the ligand of interest.
Pick from Screen: Pick one or more substructures directly from the screen or
specify them via the Substructure Expression dialog, a variant of the Atom
Expression dialog where the smallest unit is a substructure.
Select from Substructure Lists: All the substructures in the protein are sorted
by type for easier selection. Names are all preceded by the name of the chain to
which they belong. Banks of buttons (select all, invert selection, clear
selection) assist in each selection. Clicking a second time on any item
unselects it.
• Residues—List of all the amino acid residues.
• Other—List of the substructures that are not amino acid residues or water.
Typically found here are ligands and cofactors.
• Water—List of co-crystallized water molecules.
Selection Tools
Usage Note: Use this dialog to create a master list of fragments and save them
all in a multi-Mol2 file. In subsequent docking runs, import the master file into
the dialog, select the fragments of immediate interest and include only those in
the fragment constraint file that will be used for more focused docking.
Source
Press Import to retrieve small molecules and fragments in a variety of file
formats. Each imported molecular entity is added to the list below, where you
can display and modify it as needed.
Usage Notes:
• Each fragment that will be used to constrain the docking run must have
its atoms and bonds typed properly. It must share Cartesian coordinate
space with the protein and be oriented to form the desired interactions
with residues in the active site.
• To import a collection of ligands from multiple protein-ligand
complexes you must align the proteins before accessing the Docking
dialog. Backbone, sidechain, and water atoms will be automatically
excluded. All remaining substructures will be imported, coordinates
Visualize
To help in the visualization and selection of placed fragments you may choose
to display:
Fragments
Match Parameters
Require Fragment Match: Whether to dock only the ligands that include a
substructural match to the fragment.
• Off (-fskip)—All ligands are docked. Those that have no matching fragment
are docked in the normal fashion. This is the default.
• On (+fskip)—Screen the input ligands based on a specific structural moiety
and submit to docking only those that match.
Include Hydrogens: Whether to force all hydrogens that exist in the fragment to
be matched explicitly by the ligand to be docked.
• Off (-fhmatch)—All fragment hydrogens are ignored. This is the default.
• On (+fhmatch)—Provide fine user control over the structural moiety to match
by including specific hydrogens.
Input Options
Results Optimization
General Parameters
Flags
Spin Alignment
Number of Spins per Alignment: Used with the spin alignment method to control
the rotation around the normal vector from the observer to the molecule.
Default = 12. This feature corresponds to the -nspin n option.
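The dialog settings in this part of the manual map onto the option tokens -fskip/+fskip, -fhmatch/+fhmatch, and -nspin n. The sketch below only assembles those tokens from the dialog choices. The function name is hypothetical, and where the tokens belong on an actual Surflex-Dock command line is not shown here; consult BioPharmics' Surflex Manual for that.

```python
def match_options(require_fragment_match=False, include_hydrogens=False, nspin=12):
    """Translate dialog choices into the option tokens named in this manual.

    Defaults mirror the documented defaults: both toggles off, 12 spins.
    """
    return [
        "+fskip" if require_fragment_match else "-fskip",
        "+fhmatch" if include_hydrogens else "-fhmatch",
        "-nspin",
        str(nspin),
    ]


print(" ".join(match_options()))
print(" ".join(match_options(require_fragment_match=True, nspin=24)))
```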
Output Options
All poses for all successfully docked ligands are stored in a single Multi-Mol2
file.
Table 1 Parameter sets for the various docking modes in the SYBYL
interface (left column) and at the Surflex-Dock command line.
This dialog is brought up by pressing the Runtime button in the Docking dialog
(in Surflex-Dock mode). It allows you to insert options into the command that
will be executed when you submit the job.
See BioPharmics’ Surflex Manual for descriptions of the command line options.
Warning: No validation will be performed when you close the Runtime
Parameters dialog. The options you enter will be used “as is.” In addition, runtime
parameters supersede the options specified in the Surflex-Dock - Details dialog.
Parameters: The default parameter file to use with the CScore program itself
($TA_MOLTABLES/flexx_cscore.par). Read about this parameter file in the CScore
Manual.
Relax Structure: Relax the protein/ligand pair, using the various options in
the associated parameter file.
Score Relaxed: Generate additional CScore scores for the relaxed protein/ligand
pair. This option is available only if the Relax Structure box has been
checked.
Scoring Functions: Toggle on or off the various scoring functions (described in
the CScore Manual) to compute additional scores and create the corresponding
columns in the spreadsheet. The consensus score is then generated from the
combination of these scores and the score computed by the selected docking
engine.
Notes:
• All other SYBYL tools are available via the menubar or command line
while this browser is in operation.
• The rendering and color of molecules displayed via the Results Browser
may be customized. See Customizing the Rendering and Colors of
Displayed Results on page 62.
List Size
Ligand Index: This slider determines the position in the total list of the
ligand at the top of the current view. This feature is particularly useful for
large ligand sets when used in combination with the Max Listed slider.
Max Listed: This slider determines the number of compounds in the visible list.
The first compound is determined by the Ligand Index slider.
Scroll through the list, one page at a time. The size of the visible portion of
the list is defined by the Max Listed slider.
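The interplay of the two sliders amounts to taking a window out of the full results list. The sketch below is a hypothetical illustration, assuming that Ligand Index counts from 1; the function name and ligand names are invented.

```python
def visible_rows(results, ligand_index=1, max_listed=10):
    """Return the window of results shown in the list.

    ligand_index: 1-based position of the first visible ligand (assumed).
    max_listed: number of compounds in the visible list.
    """
    start = ligand_index - 1
    return results[start:start + max_listed]


# Hypothetical list of 25 ligand names.
names = [f"ligand_{i}" for i in range(1, 26)]
print(visible_rows(names, ligand_index=11, max_listed=10))
```

Scrolling one page at a time then corresponds to advancing the Ligand Index by the Max Listed value.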
Sliders
Ligand Index The position in the total list of the ligand at the top of
the current view. Use the slider or its < and > buttons to
change this number. This feature is particularly useful
for large ligand sets when used in combination with the
Max Listed slider.
Max Listed The number of compounds in the visible list. The first
compound is determined by the Ligand Index.
Examine N Poses This slider is active only for the Base score set, and only
while a docked or aligned ligand is being examined. Move the
slider to the right to display an increasing number of poses
for that ligand. The maximum number of poses per ligand is
specified at run time; fewer poses may be found for
some ligands.
Score Sets
For details of score values refer to Surflex-Dock Values in the Results Browser
on page 63.
Protein Flexibility [PF-re] The second set of scores resulting from a docking run
that includes protein flexibility. The option to Allow
Protein Movement must be specified in the Surflex-Dock
Details dialog before the run.
Selection Action
Molecule When this radio button is active any selection in the list
toggles the display of the selected ligand(s) as capped
sticks. The highest scoring pose is displayed for each
selected ligand.
Table When this radio button is active a selection in the list
opens the spreadsheet of poses associated with each
selected ligand. To close an open spreadsheet, simply
unselect the corresponding line in the results list. This
synchronizes the display status in the dialog with what
is on the screen.
Results List
All column headers above the list can be used to sort the entire results list in
either direction. Move any of the vertical bars to adjust the width of the
corresponding columns.
Selection Buttons
Examine Controls
Examine compounds one at a time. Multiple poses of the examined ligand can
be displayed. An asterisk identifies the compound in the list.
Marking Buttons
Marked compounds are flagged with a plus sign (+) in front of their name in the
list. Marking is retained even for compounds that are not in the currently visible
list.
Workflow:
1. With the Molecule radio button active, mark the compounds of interest
using any combination of the following:
• Select any number of compounds in the list to display them and click
to mark them.
• Use to examine compounds one at a time and click to
mark any you find interesting as you go.
2. Click to clear the selection.
3. Click to select all marked ligands in the visible portion of the list.
4. Switch to the Table view above the list.
5. Click again to open the spreadsheets associated with the ligands
marked in the visible portion of the list.
6. When you are done with the spreadsheets, make sure that the view is set to
Table then click to close them all.
Counters
The colors and rendering used to display results via the Results Browser can be
customized through the use of variables.
You may set any of the variables described below in the console (even while the
Results Browser is open). To load your preferences automatically at SYBYL
startup, enter the corresponding lines in your $HOME/sybyl.ini file (see the
sample sybyl.ini file in the Toolkit Utilities Manual). Any variable not
explicitly set before the first launch of the Results Browser will be set to its
default value.
Rendering
You may set the following variables to any of the standard rendering options:
BALLS_ONLY, BALL_AND_STICKS, CAPPED_STICKS (default), SPACEFILL,
STICKS_ONLY. To disable rendering and display the molecules as lines, set the
variable(s) to the single character A (short for ANTIALIASED_LINES).
• FLXANS!RENDERMODE — Used for rendering the displayed molecules.
Default is capped_sticks.
• FLXANS!EXRENDERMODE — Used for rendering the poses of the
examined molecule.
Default is capped_sticks.
Y Y FLXANS!EXAMINECOLOR1 GREEN
N Y FLXANS!EXAMINECOLOR2 GREEN
Y N FLXANS!EXAMINECOLOR3 RED
N N FLXANS!EXAMINECOLOR4 RED
You may set these variables to any of the 24 SYBYL colors (see the Color
Editor in the SYBYL Reference Guide) or to ORIGINAL_COLOR, which by
default colors atoms by atom type.
If no other ligands are displayed, the color of the examined ligand is always set
to ORIGINAL_COLOR (which defaults to atom type coloring).
The list itself reports the values for the top-scoring pose for each ligand.
To access the information about additional docked poses click Table above the
list then select a ligand in the list. A spreadsheet will open in which each row is
the structure of a docked pose for that ligand.
For a simple Surflex-Dock run the columns contain the following information:
• Total_Score = The total Surflex-Dock score expressed as -log(Kd).
(See The Surflex-Dock Scoring Function on page 79.) The total score
includes the Crash score [Ref. 3].
• Crash = The degree of inappropriate penetration by the ligand into the
protein and of interpenetration (self-clash) between ligand atoms that are
separated by rotatable bonds. Crash scores close to 0 are favorable.
Negative numbers indicate penetration. The smaller the crash score, the
better Surflex-Dock is at screening out false positives. However, this
may discard true positives that fit tightly in the pocket.
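Because Total_Score is expressed as -log(Kd), a score can be converted back to
an approximate dissociation constant. The following Python sketch illustrates
only that arithmetic. It assumes a base-10 logarithm and Kd in mol/L (the usual
pKd convention, which this manual does not state explicitly), and the function
names are hypothetical, not part of SYBYL or Surflex-Dock.

```python
import math


def score_to_kd(total_score: float) -> float:
    """Convert a -log10(Kd) score to Kd in mol/L (assumed convention)."""
    return 10.0 ** (-total_score)


def kd_to_score(kd_molar: float) -> float:
    """Convert Kd in mol/L to a -log10(Kd) score (assumed convention)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)


# Under these assumptions, a score of 6.0 corresponds to a micromolar Kd.
kd_example = score_to_kd(6.0)
```

Under these assumptions each additional score unit corresponds to a tenfold
tighter predicted binding.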
If you specified the structure of the known ligand as reference, you will see an
additional column:
• Similarity = Surflex-Sim similarity [Ref. 11] between the top scoring
pose and the ligand provided as a reference for the Surflex-Dock run.
If placed fragment(s) were used to direct the run you will see additional
columns:
• FragIndex = ID number of the fragment that most closely matches the
docked ligand. The ID number reflects the position of the fragment in
the xxx-frag.mol2 file used in the docking run.
• FragRMSD = RMS distance between the docked ligand and the
reported fragment.
If you allowed protein movement, another set of scores is available via the
Score pull-down above the list: Protein Flexibility [PF-re]. The list is
repopulated with new values and the following additions:
• Pose = Indication of which pose in the initial run has the best score
after optimization with protein flexibility.
• Strain = Nominal ligand strain relative to the nearby local minimum in
units of pKd.
• Total = Ligand’s score corrected for strain energy.
• Ligmin = Energy of the nearby ligand minimum (kcal/mol).
• Full = Absolute energy of the optimized ligand including protein
interaction (kcal/mol).
• Complex = Absolute energy of the complex including ligand, protein
pocket, and intermolecular interactions (kcal/mol).
• Cscale = Scaled complex score that normalizes the protein score
components so that ligand poses that contact different numbers of
protein atoms are more directly comparable.
• Pmove = Average movement of the protein atoms in the pocket for this
pose.
If a CScore calculation was performed at the end of the run you will see
additional columns. These are accessible only in the ligand spreadsheets and
they are computed only for the Base score set.
• x_Score = one column for each requested scoring function.
Fragment List List of all the placed fragments submitted to the
docking run (jobname-constraints.mol2 in the job
directory). Fragments selected in the list are displayed in
capped sticks.
Buttons Navigation and selection buttons to assist in the
selection and display of constraining fragments.
Fragment Color Specify the color style for the displayed fragments. The
Spectrum scheme is particularly useful when
including multiple fragments in the display of the docking
results.
Multi-Volume Surface
The surface style is determined by Tailor variable CONTOUR DISPLAY_AS.
Hydrogens
Display H-bonds Toggle this check box to turn on and off the display of
H-bonds between displayed site and ligand(s). This
feature is useful when examining docking results. It uses
the Monitor Hydrogen Bonds functionality described in
the Graphics Manual.
Display Non-Polar H Toggle this check box to turn on and off the display of
non-polar hydrogens. By default (off), only polar
hydrogens (those connected to potential H-bond
acceptors or donors) are displayed.
Topomer Search
Access:
In the Results Browser press Visualization.
In the Visualization Options dialog, press Advanced.
Expression Tools Click the buttons and select molecule areas from the list
to form an expression in the field below. Alternatively,
you can type the expression directly into the field.
Briefly, the buttons are:
• ( ) Open and close parentheses for grouping objects
and operations.
• + Union
• - Difference
• & Intersection
• ` Negation
• Del to remove the last button selection from the
expression.
• Clr to clear the entire expression field.
Volume Color Select the color for the resulting volume surface.
OK The surface is displayed as an independent background
in D1 and may be deleted via the Delete button in the
Visualization Options dialog. The surface’s style is
determined by Tailor variable CONTOUR DISPLAY_AS.
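The operations offered by the Expression Tools buttons (union, difference,
intersection, negation) behave like ordinary set operations. The Python sketch
below illustrates them under a simplifying assumption: each molecule area is
modeled as a set of integer grid points inside a small bounding region. This
representation and all names in it are hypothetical and say nothing about how
SYBYL computes volumes internally.

```python
def box(lo, hi):
    """Return the set of integer grid points in the cube [lo, hi) on each axis."""
    return {(x, y, z)
            for x in range(lo, hi)
            for y in range(lo, hi)
            for z in range(lo, hi)}


# Hypothetical areas: a bounding region, a "site" volume, and a "ligand" volume.
universe = box(0, 4)
site = box(0, 3)
ligand = box(2, 4)

union = site | ligand            # "+" button
difference = site - ligand       # "-" button
intersection = site & ligand     # "&" button
negation = universe - site       # "`" button, relative to the bounding region
```

Parentheses in an expression play the same grouping role as parentheses in the
Python expressions above, for example `(site | ligand) - (site & ligand)`.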
Save Mode
Output
Output Formats Select one or more format(s) for the saved compounds:
• MDB—A SYBYL database of individual .mol2
files.
• SLN File—A file in SLN format (.sln) that includes
all the scores.
• SD File—A file in MDL data format (.sdf) that
includes all the scores.
• Multi-Mol2—A single .mol2 file containing all the
saved compounds.
• Spreadsheet—A spreadsheet of the selected
compounds with their scores. The corresponding
table file (.tbl) is created automatically.
Output Prefix The text string to use to name the file(s) containing the
saved results. The files will be created in the current
working directory.
This text string must start with an alphabetic character
and may contain digits and underscores (_) in any
position after the first. All other characters will be ignored.
Helper Files
CScore
Acknowledgments
Surflex-Dock was developed by Prof. Ajay N. Jain, University of California San
Francisco (UCSF) and BioPharmics LLC.
Licensing:
License Requirements for the Docking Suite on page 7.
Access:
Documentation:
BioPharmics’ Surflex Manual.
4. Given that each probe’s score represents that probe’s contribution to the
protein-ligand binding affinity, clusters of high scoring probes identify the
“stickiest” parts of the protein’s surface. The algorithm searching for sticky
spots is biased toward hydrophobic regions in the interior of the protein
because many protein-ligand complexes involve a receptor pocket, and the
binding affinity is often due in large part to hydrophobic interactions.
5. Before regrouping the sticky spots into a pocket, it is necessary to eliminate
disconnected sticky spots, thereby avoiding disconnected pockets. Spheres
are placed on a 1 Å cubical grid. Each sphere grows until it reaches the van
der Waals surface of a protein atom. Spheres with radii less than 0.5 Å are
discarded. The remaining set of protein-free spheres approximates a
negative image of the protein.
6. The sticky spots are merged into a pocket, the protomol, through a process
of accretion on the set of protein-free spheres. The final dimension of the
protomol is biased toward a size that can accommodate a small ligand.
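The sphere-growing part of step 5 can be sketched in a few lines of Python.
This is a brute-force illustration only, not the Surflex implementation: it
places a sphere at every point of a 1 Å cubic grid, sets each radius to the
distance from the grid point to the nearest van der Waals surface, and discards
spheres with radii below 0.5 Å. The function name, the input layout, and the
example radius are assumptions; probe scoring and the accretion of step 6 are
not covered.

```python
import math
from itertools import product


def protein_free_spheres(atoms, lo, hi, spacing=1.0, min_radius=0.5):
    """Return (center, radius) pairs for spheres that fit outside the protein.

    atoms: list of ((x, y, z), vdw_radius) tuples, in angstroms.
    lo, hi: bounds of the cubic grid on each axis, in angstroms.
    """
    n_points = int(round((hi - lo) / spacing)) + 1
    axis = [lo + i * spacing for i in range(n_points)]
    spheres = []
    for center in product(axis, repeat=3):
        # Grow the sphere until it touches the nearest van der Waals surface.
        radius = min(math.dist(center, xyz) - r for xyz, r in atoms)
        if radius >= min_radius:  # grid points inside atoms give radius < 0
            spheres.append((center, radius))
    return spheres


# Hypothetical one-atom "protein": a single atom of radius 1.7 at the origin.
example_spheres = protein_free_spheres([((0.0, 0.0, 0.0), 1.7)], lo=-2, hi=2)
```

The surviving spheres are the negative image of the protein mentioned above. A
production implementation would use a spatial index rather than scanning every
atom for every grid point.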
For each ligand, rotatable bonds are identified as all acyclic, non-terminal
bonds that are single or amide bonds. Surflex-Dock’s treatment of ring
flexibility is optional and uses a small, generic library of 5-, 6-, or
7-membered rings.
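The rotatable-bond rule can be expressed as a simple filter. The Python sketch
below is an illustration under stated assumptions, not the Surflex-Dock code:
the molecule is given as a heavy-atom bond list with bond types and ring flags
already assigned, and a bond is treated as terminal when either of its atoms has
no other heavy-atom neighbor. The `Bond` class and function name are
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bond:
    a: int          # index of the first heavy atom
    b: int          # index of the second heavy atom
    order: str      # "single", "double", "amide", "aromatic", ...
    in_ring: bool   # True if the bond is part of a ring


def rotatable_bonds(bonds):
    """Return bonds that are single or amide, acyclic, and non-terminal."""
    degree = {}
    for bond in bonds:
        degree[bond.a] = degree.get(bond.a, 0) + 1
        degree[bond.b] = degree.get(bond.b, 0) + 1
    return [
        bond for bond in bonds
        if bond.order in ("single", "amide")
        and not bond.in_ring
        and degree[bond.a] > 1   # assumption: terminal means degree 1
        and degree[bond.b] > 1
    ]


# N-methylacetamide heavy atoms: 0 = CH3, 1 = C, 2 = O, 3 = N, 4 = CH3.
nma = [
    Bond(0, 1, "single", False),
    Bond(1, 2, "double", False),
    Bond(1, 3, "amide", False),
    Bond(3, 4, "single", False),
]
nma_rotatable = rotatable_bonds(nma)
```

In this example only the central amide bond qualifies, because both methyl
bonds are terminal in the heavy-atom graph and the carbonyl bond is double.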
The list of atom pairs of interest is established by pruning out all protein-ligand
atom pairs for which the distance between their van der Waals surfaces is
greater than 2 Å. Each atom in the remaining protein-ligand pairs is labeled as
being non-polar (e.g. H in CH3) or polar (e.g. H in N-H or O in C=O). Each
polar atom is also assigned a charge.
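The pruning of atom pairs by surface distance can be sketched as follows. This
Python example is illustrative only: the surface-to-surface distance is taken
as the center-to-center distance minus both van der Waals radii, and pairs more
than 2 Å apart are dropped. The function name, the tuple layout, and the radii
in the example are assumptions, and the polar/non-polar labeling is not shown.

```python
import math


def close_pairs(protein_atoms, ligand_atoms, max_gap=2.0):
    """Return (protein_label, ligand_label, gap) for pairs within max_gap.

    Each atom is a (label, (x, y, z), vdw_radius) tuple, in angstroms.
    gap is the distance between the two van der Waals surfaces.
    """
    pairs = []
    for p_label, p_xyz, p_radius in protein_atoms:
        for l_label, l_xyz, l_radius in ligand_atoms:
            gap = math.dist(p_xyz, l_xyz) - p_radius - l_radius
            if gap <= max_gap:
                pairs.append((p_label, l_label, gap))
    return pairs


# Hypothetical coordinates and radii, chosen only to exercise the cutoff.
protein = [("O1", (0.0, 0.0, 0.0), 1.5), ("C7", (10.0, 0.0, 0.0), 1.7)]
ligand = [("N2", (4.0, 0.0, 0.0), 1.5)]
kept_pairs = close_pairs(protein, ligand)
```

Here the O1/N2 pair is kept (surfaces 1.0 Å apart) while the C7/N2 pair is
pruned (surfaces 2.8 Å apart).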
Atomic Charges
Surflex-Dock does not use atomic charges. Atoms are classified as polar or
non-polar. Polar atoms are assigned a “charge” that reflects their
hydrogen-bonding ability.
It is therefore very important to pass Surflex the molecules in the protonation
state you consider relevant at biological pH, and to include non-polar
hydrogens.
[5] Jain, A.N. “Surflex: Fully Automatic Flexible Molecular Docking Using
a Molecular Similarity-Based Search Engine” J. Med. Chem. 2003, 46,
499-511.
[6] Ruppert, J.; Welch, W.; Jain, A.N. “Automatic identification and
representation of protein binding sites for molecular docking” Protein
Sci. 1997, 6, 524-33.
[7] Jain, A.N. “Scoring noncovalent protein-ligand interactions: A
continuous differentiable function tuned to compute binding affinities”
J. Comput. Aided Mol. Des. 1996, 10, 427-40.
[8] Kellenberger, E.; Rodrigo, J.; Muller, P.; Rognan, D. “Comparative
Evaluation of Eight Docking Tools for Docking and Virtual Screening
Accuracy” PROTEINS: Structure, Function, and Bioinformatics 2004,
57, 225-242.
[9] Krier, M.; de Araújo-Júnior, J.X.; Schmitt, M.; Duranton, J.; Justiano-
Basaran, H.; Lugnier, C.; Bourguignon, J-J.; Rognan, D. “Design of
Small-Sized Libraries by Combinatorial Assembly of Linkers and
Functional Groups to a Given Scaffold: Application to the Structure-
Based Optimization of a Phosphodiesterase 4 Inhibitor” J. Med. Chem.
2005, 48, 3816-3822.
[10] Bissantz, C.; Folkers, G.; Rognan, D. “Protein-based virtual screening of
chemical databases. 1. Evaluation of different docking/scoring
combinations” J. Med. Chem. 2000, 43, 4759-4767.
[11] Jain, A.N. “Morphological similarity: A 3D molecular similarity
method correlated with protein-ligand recognition” J. Comput. Aided
Mol. Des. 2000, 14, 199-213.
[12] More publications at: http://www.biopharmics.com/publications.html
[13] BioPharmics’ Surflex Manual: Docking and Similarity.