Bioluminate User Manual
BioLuminate 1.9
User Manual
Schrödinger Press
BioLuminate User Manual Copyright © 2015 Schrödinger, LLC. All rights reserved.
While care has been taken in the preparation of this publication, Schrödinger
assumes no responsibility for errors or omissions, or for damages resulting from
the use of the information contained herein.
Schrödinger software includes software and libraries provided by third parties. For
details of the copyrights, and terms and conditions associated with such included
third party software, use your browser to open third_party_legal.html, which is in
the docs folder of your Schrödinger software installation.
This publication may refer to other third party software not included in or with
Schrödinger software ("such other third party software"), and provide links to third
party Web sites ("linked sites"). References to such other third party software or
linked sites do not constitute an endorsement by Schrödinger, LLC or its affiliates.
Use of such other third party software and linked sites may be subject to third
party license agreements and fees. Schrödinger, LLC and its affiliates have no
responsibility or liability, directly or indirectly, for such other third party software
and linked sites, or for damage resulting from the use thereof. Any warranties that
we make regarding Schrödinger products and services do not apply to such other
third party software or linked sites, or to the interaction between, or
interoperability of, Schrödinger products and services and such other third party
software.
May 2015
Contents
Document Conventions
15.8 Running Residue Scanning from the Command Line
16.3 Relaxing the Structure Around the Mutation Site
In addition to the use of italics for names of documents, the font conventions that are used in
this document are summarized in the table below.
Font                  Example               Use
Sans serif            Project Table         Names of GUI features, such as panels, menus, menu items, buttons, and labels
Monospace             $SCHRODINGER/maestro  File names, directory names, commands, environment variables, command input and output
Italic                filename              Text that the user must replace with a value
Sans serif uppercase  CTRL+H                Keyboard keys
Links to other locations in the current document or to other PDF documents are colored like
this: Document Conventions.
In descriptions of command syntax, the following UNIX conventions are used: braces { }
enclose a choice of required items, square brackets [ ] enclose optional items, and the bar
symbol | separates items in a list from which one item must be chosen. Lines of command
syntax that wrap should be interpreted as a single command.
File name, path, and environment variable syntax is generally given with the UNIX
conventions. To obtain the Windows conventions, replace the forward slash / with the backslash \ in
path or directory names, and replace the $ at the beginning of an environment variable with a %
at each end. For example, $SCHRODINGER/maestro becomes %SCHRODINGER%\maestro.
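The conversion rule above can be sketched as a small Python helper. This is an illustration only: the function name unix_to_windows is hypothetical and is not part of any Schrödinger tool, and the sketch assumes that environment variable names consist of letters, digits, and underscores.

```python
import re


def unix_to_windows(text: str) -> str:
    """Rewrite UNIX-style path syntax in the Windows convention.

    Hypothetical helper for illustration; not part of any Schrodinger API.
    """
    # A leading $ on a variable name becomes a % at each end of the name.
    converted = re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", r"%\1%", text)
    # Forward slashes in the path become backslashes.
    return converted.replace("/", "\\")


if __name__ == "__main__":
    print(unix_to_windows("$SCHRODINGER/maestro"))
```

Applied to the example in the text, the helper turns $SCHRODINGER/maestro into %SCHRODINGER%\maestro.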
Keyboard references are given in the Windows convention by default, with Mac equivalents in
parentheses, for example CTRL+H (⌘H). Where Mac equivalents are not given, COMMAND
should be read in place of CTRL. The convention CTRL-H is not used.
In this document, to type text means to type the required text in the specified location, and to
enter text means to type the required text, then press the ENTER key.
References to literature sources are given in square brackets, like this: [10].
Chapter 1: Introduction
This manual documents the unique tools and capabilities of BioLuminate, and provides
references to other documents for the related tools. A brief description of the tool set is given below,
with links to the relevant parts of this manual or of other manuals. The descriptions are
classified by function. These tools are divided between the Tools menu, for actions that take
little time and may be interactive, and the Tasks menu, for actions that may require running a
job that takes longer.
For a tutorial introduction to BioLuminate features, see the BioLuminate Quick Start Guide.
Protein Structure Quality Viewer (Tools → Protein Structure Quality): Show reports on
deviations of protein parameters from standard values, in graphical and tabular form. See
Chapter 3.
Residue Analysis (Tasks → Residue Analysis): Calculate energetic and other properties
of residues. See Chapter 4.
Consensus Visualization (Tools → Protein Consensus Viewer): Locate consensus waters,
counter ions and ligands in a set of homologs to a reference protein. See Chapter 5.
Reactive Protein Residues (Tools → Reactive Residue Identification): Identify residues
that are prone to specified reactions, by matching sequence patterns and some structural
information. See Chapter 6.
Aggregation Surface (Tasks → Aggregation Surface): Predict regions on a protein surface
that have a propensity for aggregation. See Chapter 7.
Protein Interaction Analysis (Tasks → Protein Interaction Analysis): Analyze the
interactions at the interface of two proteins. See Chapter 8.
Low Mode Vibrational Sampling (Tasks → Low Normal Mode Analysis): Locate and
visualize large-scale vibrational motions in a protein. See Chapter 9.
SiteMap (Tools → Binding Site Identification): Locate druggable sites on a protein. See the
SiteMap User Manual.
Protein Preparation Wizard (Tools → Protein Preparation): Prepare proteins for modeling
by assigning bonds, fixing structural defects, removing unwanted parts, assigning
protonation and tautomeric states, and refining the structure. See the Protein Preparation Guide.
Simple Homology Modeling (Tasks → Simple Homology Modeling): Predict the structure
of proteins using homology modeling, where the homology is high and the alignment of
the query and the template is straightforward. See Chapter 12.
Structure Prediction (Tasks → Advanced Homology Modeling): Predict the structure of
single-chain or multi-chain proteins, including multimers, by homology modeling. See
Chapter 3 through Chapter 5 of the Prime User Manual.
Refinement (Tasks → Loop + Sidechain Prediction or Tasks → Implicit Solvent
Refinement + Analysis): Refine protein structures by performing predictions of selected side
chains or loops, or minimizations of various parts of protein structures. See Chapter 6 of
the Prime User Manual.
Peptide Helicity (Tasks → Peptide Alpha Helicity): Predict the stability of alpha helices for
small peptide sequences, using molecular dynamics.
Peptide Docking (Tasks → Peptide Docking): Dock peptides to a receptor, starting from
the sequence. The receptor is kept largely rigid, while the conformational space of the
peptide is explored.
Peptide QSAR (Tasks → Peptide QSAR): Predict properties of small peptides using a
QSPR (sequence-property) model based on peptide descriptors.
Align Binding Sites (Tools → Binding Site Alignment): Align the sites on a set of proteins
at which drug-like molecules can bind. See Chapter 7 of the Prime User Manual.
Protein Structure Alignment (Tools → Protein Structure Alignment): Structurally align two
or more proteins, using secondary structure information as well as coordinates. See
Chapter 7 of the Prime User Manual.
Superposition (Tools → Superposition): Align two or more structures by minimizing the
RMSD of a selected set of atoms. See Section 10.3 of the Maestro User Manual.
Protein-Protein Docking (Tasks → Protein-Protein Docking): Predict how two proteins
interact, using a rigid body search algorithm. See Chapter 11.
Linux:
To run any Schrödinger program on a Linux platform, or start a Schrödinger job on a remote
host from a Linux platform, you must first set the SCHRODINGER environment variable to the
installation directory for your Schrödinger software. To set this variable, enter the following
command at a shell prompt:
Once you have set the SCHRODINGER environment variable, you can run programs and utilities
with the following commands:
$SCHRODINGER/program &
$SCHRODINGER/utilities/utility &
You can start the BioLuminate interface with the following command:
Windows:
The primary way of running Schrödinger applications on a Windows platform is from a
graphical interface. To start the BioLuminate interface, double-click on the BioLuminate icon, on a
Maestro project, or on a structure file; or choose Start → All Programs → Schrodinger-2015-2 →
BioLuminate. You do not need to make any settings before starting BioLuminate or running
programs. The default working directory is the Schrodinger folder in your Documents folder.
If you want to run applications from the command line, you can do so in one of the shells that
are provided with the installation and have the Schrödinger environment set up:
Mac:
The primary way of running Schrödinger software on a Mac is from a graphical interface. To
start the BioLuminate interface, click its icon on the dock. If there is no BioLuminate icon on
the dock, you can put one there by dragging it from the SchrodingerSuite2015-2 folder in your
Applications folder. This folder contains icons for all the available interfaces. The default
working directory is the Schrodinger folder in your Documents folder ($HOME/Documents/
Schrodinger).
Running software from the command line is similar to Linux: open a terminal window and
run the program. You can also start BioLuminate from the command line in the same way as on
Linux. The default working directory is then the directory from which you start BioLuminate.
You do not need to set the SCHRODINGER environment variable, as this is set in your default
environment on installation. To set other variables, on OS X 10.7 use the command
You can start a job immediately by clicking Run. The job is run on the currently selected host
with the current job settings and the job name in the Job name text box. If you want to change
the job name, you can edit it in the text box before starting the job. Details of the job settings
are reported in the status bar, which is below the Job toolbar.
If you want to change the job settings, such as the host on which to run the job and the number
of processors to use, click the Settings button. (You can also click the arrow next to the button
and choose Job Settings from the menu that is displayed.)
You can then make the settings in the Job Settings dialog box, and choose to just save the
settings by clicking OK, or save the settings and start the job by clicking Run. These settings
apply only to jobs that are started from the current panel.
If you want to save the input files for the job but not run it, click the Settings button and choose
Write. A dialog box opens in which you can provide the job name, which is used to name the
files. The files are written to the current working directory.
The Settings button also allows you to change the panel settings. You can choose Read, to read
settings from an input file for the job and apply them to the panel, or you can choose Reset
Panel to reset all the panel settings to their default values.
You can also set preferences for all jobs and how the interface interacts with the job at various
stages. This is done in the Preferences panel, which you can open at the Jobs section by
choosing Preferences from the Settings button menu.
Note: The items present on the Settings menu can vary with the application. The descriptions
above cover all of the items. Jaguar has an Edit item and extra functions for the Read
and Write items, which are described later in the manual.
The icon on the Job Status button shows the status of jobs for the application that belong to the
current project. It starts spinning when the first job is successfully launched, and stops spinning
when the last job finishes. It changes to an exclamation point if a job is not launched
successfully.
Clicking the button shows a small job status window that lists the job name and status for all
active jobs submitted for the application from the current project, and a summary message at
the bottom. The rows are colored according to the status: yellow for submitted, green for
launched, running, or finished, red for incorporated, died, or killed. You can double-click on a
row to open the Monitor panel and monitor the job, or click the Monitor button to open the
Monitor panel and close the job status window. The job status is updated while the window is
open. If a job finishes while the window is open, the job remains displayed but with the new
status. Click anywhere outside the window to close it.
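As an illustration only, the row coloring described above can be written as a lookup table. The names ROW_COLORS and row_color are hypothetical and do not correspond to any Schrödinger API; the status names and colors are taken directly from the description.

```python
# Row color for each job status, as listed in the job status window description.
ROW_COLORS = {
    "submitted": "yellow",
    "launched": "green",
    "running": "green",
    "finished": "green",
    "incorporated": "red",
    "died": "red",
    "killed": "red",
}


def row_color(status: str) -> str:
    """Return the row color for a job status (case-insensitive).

    Raises KeyError for a status that the description does not list.
    """
    return ROW_COLORS[status.strip().lower()]
```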
Jobs are run under the Job Control facility, which manages the details of starting the job,
transferring files, checking on status, and so on. For more information about this facility and how it
operates, as well as details of the Job Settings dialog box, see the Job Control Guide.
Chapter 2
The BioLuminate interface is a customized form of the Maestro interface that is specially
designed for biologics use. It inherits most of the capabilities of the Maestro interface (though
organized differently), and it has features of its own.
This chapter focuses on the features that are unique to BioLuminate. Summaries of the main
Maestro features are given, with references to the Maestro User Manual for details. If you have
never used Maestro, you should be able to gain a basic understanding of its operation from this
chapter.
If you prefer to use the standard Maestro interface, you can do so. Most of the capabilities of
BioLuminate are available from the BioLuminate submenu of the Applications menu, and some
of them are on the Tools menu.
Menu bar. This is at the top of the window on Linux and Windows, and is the menu bar
on the Mac.
Manager toolbar. This toolbar is just below the menu bar on Linux and Windows, and at
the top of the window on the Mac. Each label on this toolbar displays or hides another
toolbar. By default they are all hidden, as much of their function is available in the Toggle
Table. See Section 2.4 of the Maestro User Manual for details of the toolbars.
Toggle Table. This dockable panel is displayed on the right side of the main window. You
can undock it from the main window and redock it with the docking button.
Many other panels are also dockable. You can change the docking behavior in the
Preferences panel (Edit → Settings → Preferences, or CTRL+,).
Workspace. This is the large black area that occupies the main part of the main window.
It is where structures are displayed, along with any associated objects such as surfaces
and text labels.
Status bar. This bar is below the Workspace. At the left is a button that displays
information on what jobs are running, which you can click to open the Monitor panel for detailed
information on your jobs. When the pointer is not over an atom in the Workspace, the
status bar gives information on the contents of the Workspace. When the pointer is over an
atom, the status bar gives information on the identity of the atom. For more information,
see Section 2.5 of the Maestro User Manual.
Auto-help. This orange-yellow bar at the bottom of the main window gives tips on the
current action that can be performed in the Workspace.
There are several other components of the main window that can be displayed when needed, by
choosing Edit → Settings and then choosing the component. These components include the
Sequence Viewer, which displays the sequences of proteins that are in the Workspace; the Find
toolbar (also opened with CTRL+F (⌘F)), which you can use to find structural components in
the Workspace, like chains or residues; and the Clipping Planes window, which shows a top
view of the Workspace and the planes where the structures in the Workspace are clipped for
display.
To add structures to the project, you can import them from an external source, such as a file or
the Protein Data Bank (PDB). To import structures from a file, choose File → Import
Structures. To get structures directly from the PDB (either from a local copy or from the web),
choose File → Get PDB. Details of both of these methods for importing can be found in
Section 3.1 of the Maestro User Manual.
To see a list of all the entries in the project, you can open the Project Table panel with CTRL+T
(⌘T) or Window → Show Project Table. This panel lists the entries in the project with their
properties, and provides ways of performing actions on the entries and their properties, such as
management tasks, sorting, grouping, plotting, and import and export of entries and properties. Full
details of the operation of the Project Table can be found in Chapter 9 of the Maestro User
Manual. The menu organization in the Project Table panel in the BioLuminate interface is a
little different from that in the standard Maestro interface: the Table menu in the standard
interface is split between the Table menu and the Tools menu in the BioLuminate interface.
When the Toggle Table is displayed, a set of shortcut (or context) menus is also available in the
Workspace, which you open by right-clicking. These menus offer the same functions as the
toggles.
The interaction with the Workspace provided by the Toggle Table panel is very similar to the
operation of PyMOL. If you are familiar with PyMOL, this interface should be easy to learn. If
you are familiar with Maestro but not with PyMOL, you can close the Toggle Table panel and
use the standard Maestro interaction with the Workspace.
The features of the Toggle Table panel are described in detail in the following subsections.
Note: The terminology used in the Toggle Table panel is the PyMOL terminology, which is
somewhat different from that used in standard Maestro.
Reset: Reset the view to the default view, in which the view axes are aligned with the
coordinate axes of the structure.
Zoom: Change the view of the Workspace so that all atoms fit inside the Workspace
area.
Orient: Orient the Workspace structures by translating and rotating them so that the
center of mass is at the origin, the largest principal axis lies on the x axis, and the
second-largest principal axis lies on the y axis.
Note: This operation changes the coordinates of the structures, not just the coordinates
of the view.
Draw: Save an image of the Workspace to a file in TIFF, JPEG, or PNG format. (Same as
File → Save Image.)
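The geometry behind the Orient action can be sketched in a few lines of Python. This is a minimal illustration and not BioLuminate's implementation: it assumes that "largest principal axis" means the direction of greatest mass-weighted spread of the atoms, it uses plain power iteration to find the axes, and the names orient and _dominant_axis are hypothetical. Degenerate inputs (a single atom, or atoms that all lie on one line) are not handled.

```python
import math


def _dominant_axis(matrix, iterations=200):
    """Unit eigenvector for the largest eigenvalue of a symmetric 3x3 matrix,
    found by power iteration from a fixed, arbitrary start vector."""
    v = [1.0, 0.7, 0.4]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            break
        v = [x / norm for x in w]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def orient(coords, masses=None):
    """Return new coordinates with the center of mass at the origin, the axis of
    greatest spread along x, and the axis of second-greatest spread along y."""
    masses = masses or [1.0] * len(coords)
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    centered = [[c[k] - com[k] for k in range(3)] for c in coords]
    # Mass-weighted covariance of the centered coordinates.
    cov = [[sum(m * p[i] * p[j] for m, p in zip(masses, centered)) / total
            for j in range(3)] for i in range(3)]
    a1 = _dominant_axis(cov)
    lam = sum(a1[i] * cov[i][j] * a1[j] for i in range(3) for j in range(3))
    # Remove the first axis from the matrix so the next search finds the second.
    deflated = [[cov[i][j] - lam * a1[i] * a1[j] for j in range(3)] for i in range(3)]
    a2 = _dominant_axis(deflated)
    dot = sum(x * y for x, y in zip(a1, a2))
    a2 = [y - dot * x for x, y in zip(a1, a2)]  # make a2 exactly orthogonal to a1
    n2 = math.sqrt(sum(x * x for x in a2))
    a2 = [x / n2 for x in a2]
    # The cross product gives a right-handed third axis, so no mirror image is made.
    a3 = [a1[1] * a2[2] - a1[2] * a2[1],
          a1[2] * a2[0] - a1[0] * a2[2],
          a1[0] * a2[1] - a1[1] * a2[0]]
    return [[sum(p[k] * a[k] for k in range(3)) for a in (a1, a2, a3)]
            for p in centered]
```

Because the function returns new coordinates rather than a new camera position, it mirrors the point made in the note: the structure itself is moved, not just the view.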
The All row: Actions taken in this row apply to all entries in the Toggle Table.
Entry rows: These rows apply to a single project entry that is currently in the Workspace.
The name of the row is the Title property of the entry. Entry rows are deleted when the
entry is excluded from the Workspace, and a new row is added when an entry is added to
the Workspace.
Selection rows: When a group of atoms is selected, a selection row is added to the table.
Actions in this row apply to the group of atoms that was selected to create this row and
even apply after those atoms are no longer selected. The selection row named (Selection)
always refers to the most recent group of selected atoms. If a new selection is made while
a selection row is active, that row now refers to the new set of selected atoms. Only one
selection can be active at a time.
Selection rows can be renamed: renamed selection rows always refer to the same set of
atoms regardless of subsequent Workspace selections. To rename a selection row, choose
A → Rename Selection. Selection row names are always enclosed in parentheses.
Selection rows are deleted when any atoms they refer to are removed from the
Workspace. To delete a selection row, choose A → Delete Selection.
Using the A button submenus, selection rows can also be duplicated, copied, and
extracted to define a new entity that is independent of the objects from which the
selection was originally derived.
Some operations or menu items change based on whether they are being applied to the All row,
an entry row, or a selection row. The descriptions below primarily describe the behavior for
entry rows. When a behavior changes for the All row or a selection row, the change is noted
after the description.
Clicking the name of the All row or an entry row changes the visibility of that object in the
Workspace. When the object's visibility is off, the name is dimmed. This is a quick way of
showing or hiding the atoms in entries. Hiding atoms does not remove them from the
Workspace, so any action taken on the entry or the entire Workspace applies to the hidden atoms as
well as the visible atoms.
For instance, if the Workspace contains ten entries and only one entry is visible, choosing
Clean from the A menu in the All row operates on all ten structures. This may take a
considerable amount of time to complete and lead to unexpected results. Similarly, if a panel imports
structures from the Workspace, it imports all ten structures rather than just the visible structure.
Clicking the name of the current selection row deselects the selected atoms, but the selection
remains defined. Clicking the name of any other selection row selects the atom group that the
row refers to, and any currently selected atoms that are not part of this atom group are
deselected.
Zoom: Change the view of the Workspace so that all the atoms in the structure fit inside
and fill the Workspace area. In the All row, this action is equivalent to clicking the Zoom
button at the top of the panel.
Orient: Orient the structure by translating and rotating it so that the center of mass is at
the origin, the largest principal axis lies on the x axis, and the second-largest principal
axis lies on the y axis. When you apply this operation to the selection, it is the center of
mass and principal axes of the selection that are used, but the entire structure is
reoriented.
Note: This operation changes the coordinates of the structures, not just the coordinates
of the view (the camera angle).
Center: Center the structure in the Workspace, by translating the structure so that its
centroid is at the center of the Workspace.
Origin: Set the center around which rotation is performed to the centroid of the structure.
The centroid need not be at the Workspace origin.
Simple: Show proteins as ribbons (C alpha trace) colored by chain, ligands and bound
receptor as sticks, and solvent, disulfides and ions as lines. Atom colors are not changed.
Simple (no solvent): Same as Simple, but no waters are shown.
Ball and Stick: Show atoms and bonds in ball and stick, with no protein ribbons.
B-Factor: Show the protein as tubes with residues colored by the B-factors of the
residues, in a relative scheme that ranges from shades of blue, through green and yellow to
red.
Technical: Show atoms as sticks, colored with rainbow colors by residue position, and
show polar contacts (hydrogen bonds) as yellow dotted lines.
Ligand: Show proteins as ribbons colored with rainbow colors by residue position, and
ligands as lines. All protein atoms within 5 Å of the ligand are shown as lines, with
carbons colored with rainbow colors by residue position. Waters are shown as sticks, polar
contacts are shown as yellow dotted lines, and the view is zoomed in to the atoms shown.
Ligand Sites: This submenu shows variations on the Ligand preset that alter the way the
protein or region around the ligand is shown:
Cartoon: Show proteins as cartoons rather than ribbons.
Solid surface: Show the molecular surface of the protein around the ligand as an
opaque surface, colored by the nearest non-hydrogen atom.
Transparent surface: Show the molecular surface of the protein around the ligand
as a semi-transparent surface, colored by the nearest non-hydrogen atom; show
atoms and bonds as sticks.
Dot surface: Show the molecular surface of the protein around the ligand as a dot
surface, colored by the nearest non-hydrogen atom; show atoms and bonds as sticks.
Mesh surface: Show the molecular surface of the protein around the ligand as a
mesh surface, colored by the nearest non-hydrogen atom; show atoms and bonds as
sticks.
Pretty: Show proteins as cartoons colored with rainbow colors by residue position and
ligands as sticks.
Pretty (with solvent): Same as Pretty, but waters are shown as ball and stick.
Publication: Same as Pretty, but protein helices are two-sided.
Publication (with solvent): Same as Pretty (with solvent), but protein helices are
two-sided.
Protein Interface: Color ribbons and carbons by chain, show anything not at a protein
interface as cartoon ribbons, and interface residues as ball and stick. Non-carbon atoms
retain their previous coloring. Interface residues are residues in a chain with more than
300 atoms that are within 4.5 Å of another chain with more than 300 atoms.
Antibody: Show everything as cartoon ribbons colored by antibody structure. The light
chain is colored in red hues, the heavy chain is colored in blue hues, and everything else
is colored green. Constant regions are dark hues and the CDR regions are bright hues.
The light chain L1-L3 loops are shaded orange to brown, while the heavy chain H1-H3
loops are shaded grey-blue to cyan.
Default: Show everything as lines with default colors (colored by element with green
carbons).
Within Object
Involving Side Chains
Involving Solvent
Excluding Solvent
Excluding Main Chain
Excluding Intra-Main Chain
Just Intra-Side Chain
Just Intra-Main Chain
To Other Atoms In Entry
To Other Atoms In Entry Excluding Solvent
To Any Atoms
To Any Atoms Excluding Solvent
Each menu choice clears any previous choice before applying the new choice.
If you want more flexibility in choosing the atom groups between which polar contacts are
shown, you can use the Non-Bonded Interactions panel. Choose Style → H-Bonds and Halogen
Bonds from the menu bar at the top of the Workspace to open this panel. You can also use the
HBonds toolbar button on the Measurements toolbar.
All: All atoms.
Polymer: Backbone and side-chain atoms.
Organic: Ligand atoms.
Solvent: Water atoms.
Surface Residues: All residues with solvent-exposed surface area greater than 10 Å².
Protein Interface: Residues in a chain of more than 300 atoms that are within 4.5 Å of
another chain of more than 300 atoms.
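The Protein Interface definition can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the BioLuminate implementation: atoms are given as hypothetical (chain, residue number, coordinates) tuples, the cutoff is taken to be in ångströms and inclusive, and a residue counts as an interface residue if any one of its atoms lies within the cutoff of any atom in another large chain. The function name interface_residues is likewise hypothetical.

```python
import math
from collections import defaultdict


def interface_residues(atoms, min_chain_atoms=300, cutoff=4.5):
    """Return the set of (chain, residue_number) pairs at a protein interface.

    atoms: iterable of (chain, residue_number, (x, y, z)) tuples.
    Only chains with more than min_chain_atoms atoms are considered.
    """
    by_chain = defaultdict(list)
    for chain, resnum, xyz in atoms:
        by_chain[chain].append((resnum, xyz))
    # Keep only chains with MORE than the threshold number of atoms.
    large = {c: a for c, a in by_chain.items() if len(a) > min_chain_atoms}

    found = set()
    for chain, chain_atoms in large.items():
        for resnum, xyz in chain_atoms:
            if (chain, resnum) in found:
                continue  # residue already known to be at the interface
            for other, other_atoms in large.items():
                if other == chain:
                    continue
                if any(math.dist(xyz, o) <= cutoff for _, o in other_atoms):
                    found.add((chain, resnum))
                    break
    return found
```

The all-pairs distance check is quadratic in the number of atoms, which keeps the sketch short; a spatial grid or k-d tree would be the usual choice for large structures.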
Atom selections can also be generated by picking atom groups in the Workspace. To set the
kind of atom group you want to pick, choose Edit → Pick Mode, then the atom group name
(Atoms, Residues, Chains, Molecules, or Entries), using the menu bar at the top of the
Workspace. You can also set the mode by typing the first letter of the name when the pointer is in the
Workspace. To pick an atom group, click on an atom in the Workspace that belongs to the
group. You can see information about the atom in the Status bar when you pause the pointer
over the atom.
The Symmetry Mates submenu items control the cutoff distance from the original structure for
which symmetry mate atoms should be displayed. Note that no matter how far the cutoff is
placed from the original structure, only the nearest-neighbor mates are created and shown. The
symmetry mates are created as separate, temporary entries (scratch entries) and are shown in
the Toggle Table. You can remove the symmetry mates by choosing Generate → Symmetry
Mates → Show None.
The Modify actions all have a choice of atom groups to which they apply.
Around: Select all atoms or residues within a given distance from the current set of
atoms, and deselect the current set of atoms. The distance is chosen from the submenu,
and can encompass atoms only or be filled to entire residues that have any atoms within
the chosen distance.
Expand: Expand the current selection to include all atoms or residues within a given
distance from the current set of atoms. The distance is chosen from the submenu, and can
encompass atoms only or be filled to entire residues that have any atoms within the
chosen distance.
Extend: Expand the current selection to include all atoms or residues within a given
number of bonds from the current set of atoms.
Invert: Deselect the current set of atoms and select all other atoms within a given atom
group. The atom groups can be chosen from the submenu:
Within Objects: All atoms in the entry that are not part of the selection are selected.
Within Chains: In each chain that contains selected atoms, the unselected atoms are
selected, and the selected atoms are deselected.
Within Residues: In each residue that contains selected atoms, the unselected
atoms are selected, and the selected atoms are deselected.
Within Molecules: In each molecule that contains selected atoms, the unselected
atoms are selected, and the selected atoms are deselected.
Within Any: All atoms in the entire Workspace that are not part of the selection are
selected.
Complete: Add all other atoms within a given atom group to the selection. The atom
groups can be chosen from the submenu:
Residues: In each residue that contains selected atoms, all atoms are selected.
Chains: In each chain that contains selected atoms, all atoms are selected.
Objects: In each entry that contains selected atoms, all atoms are selected.
Molecules: In each molecule that contains selected atoms, all atoms are selected.
C-alphas: All alpha carbons for residues within the selection are selected. All other
atoms are deselected.
Restrict to: Reduce the selection to only those atoms within a specific group that are
currently selected. The available atom groups are:
Object: Restrict the selection to atoms in a specific entry, chosen from the
submenu.
Selection: Restrict the selection to atoms in a specific selection row, chosen from
the submenu.
Visible: Restrict the selection to atoms that are visible.
Polymer: Restrict the selection to backbone and side-chain atoms.
Organic: Restrict the selection to ligand atoms.
Solvent: Restrict the selection to water atoms.
Inorganic: Restrict the selection to atoms other than H, C, N, O, F, P, S, Cl, Br, or I.
Include: Include additional atom groups in the current selection. The available atom
groups are:
Object: Include atoms in a specific entry, chosen from the submenu.
Selection: Include atoms in a specific selection row, chosen from the submenu.
Visible: Include all visible Workspace atoms.
Exclude: Exclude specific atoms from the selection. The available atom groups are:
Object: Exclude atoms in a specific entry, chosen from the submenu.
Selection: Exclude atoms in a specific selection row, chosen from the submenu.
Visible: Exclude atoms that are visible.
Polymer: Exclude backbone and side-chain atoms.
Organic: Exclude ligand atoms.
Solvent: Exclude water atoms.
Inorganic: Exclude atoms with element other than H, C, N, O, F, P, S, Cl, Br, or I.
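Several of the Modify actions amount to set operations on atom indices, which the following Python sketch makes explicit for Around, Expand, and Invert. It is an illustration only: the function names are hypothetical, atoms are represented by their index into a coordinate list, and distances are assumed to be inclusive of the cutoff. How BioLuminate combines the "fill to entire residues" option with Around is not stated in the text, so fill_residues is shown as a separate step.

```python
import math


def within(coords, selection, distance):
    """Indices of all atoms within `distance` of any selected atom.
    The selected atoms themselves are always included (distance zero)."""
    return {i for i, p in enumerate(coords)
            if any(math.dist(p, coords[j]) <= distance for j in selection)}


def around(coords, selection, distance):
    """Around: select the neighbors and deselect the current atoms."""
    return within(coords, selection, distance) - set(selection)


def expand(coords, selection, distance):
    """Expand: keep the current atoms and add the neighbors."""
    return within(coords, selection, distance) | set(selection)


def invert(group, selection):
    """Invert: select every atom of `group` that is not currently selected."""
    return set(group) - set(selection)


def fill_residues(residue_of, indices):
    """Grow a set of atom indices to whole residues.
    residue_of[i] is the residue identifier of atom i."""
    hit = {residue_of[i] for i in indices}
    return {i for i, r in enumerate(residue_of) if r in hit}
```

With four atoms on a line at x = 0, 1, 2 and 5 and atom 0 selected, a distance of 2 gives {1, 2} for Around and {0, 1, 2} for Expand, which is the only difference between the two actions.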
For selection rows, Delete Selection just removes the row from the Toggle Table. The atoms in
this selection group are deselected but otherwise remain unaltered.
You can change the name of a selection row with the Rename Selection action. If you do this
for the default selection row, the selection is preserved for future use as a named selection. You
can also rename a named selection.
For entry rows, the action creates a new project entry, placed below the original entry, that is a
duplicate of that entry. Both structures remain in the Workspace and are listed in the Toggle Table.
For selection rows, this action creates a duplicate selection row in the Toggle Table. No project
entry is created. This can be useful if you want to use a selection as the basis for another
selection.
To clear the Workspace entirely, choose Remove Everything from Workspace in the All row.
To create a project entry by removing atoms from entries and placing them into a new entry,
you can choose Extract to New Project Entry in a selection row. The new project entry contains
the atoms in the selection, and the atoms are deleted from their current structure.
You can also add hydrogens from the main menu bar with Edit → Add Hydrogens or from the
Edit toolbar.
The computed properties are displayed in a window that opens, and you can copy and paste the
text. They are not stored in the Project Table.
These items appear on both the Show menu itself and on the As submenu. When you choose
from the Show menu, the representation is added to the display. When you choose from the As
submenu, the previous representation is replaced by the new choice.
For instance, a residue shown with lines and cartoon ribbons is shown as only ball and stick if
S → As → Ball and Stick is chosen, but is shown as cartoon and ball and stick if S → Ball and
Stick is chosen.
Two commands on the menu for entry rows and the All row create and display molecular
surfaces. The surface is created if it does not exist; otherwise, the color and representation of
the surface are changed.
Everything: Hide all features: atomic, ribbon, and surface representations and labels.
Atoms: Hide atoms and bonds.
Ribbon: Hide all ribbon representations.
Cartoon: Hide all ribbon representations.
Label: Hide labels. The labels remain defined and can be redisplayed.
Nonbonded Atoms: Hide atoms with no attachments, such as Cl ions.
Mesh: Hide surfaces.
Surface: Hide surfaces.
Main Chain: Hide backbone atoms.
Side Chain: Hide side-chain atoms.
Waters: Hide water atoms.
Hydrogens: Hide nonpolar or all hydrogens, as chosen from the submenu.
Symmetry Mates: Remove all crystal symmetry mates from the Workspace. This is a
Workspace setting, so it affects all symmetry mates in the Workspace.
Polar Contacts: Remove polar contact (hydrogen bond) markers.
All Others: Hide everything for all atoms in the Workspace other than the atoms defined
in the row.
Labels can be cleared from the Workspace with L → Clear. They can be hidden with H →
Label. If they are hidden, they can be redisplayed with S → Label, while if they are cleared,
they need to be created (usually by other Label menu commands) before they can be shown
again.
Residues: Label the first carbon atom in each residue with the three-letter PDB code and
residue number.
Chains: Label the first and last residue in each chain with the chain name.
The next set of commands offers a choice of the label content, including identifiers and
numeric properties.
Other Properties: Submenu with other properties that can be used for labels:
  Formal Charge: Label each atom with its formal charge.
  Partial Charge (0.00): Label each atom with its partial charge to two decimal
  places.
  Partial Charge (0.0000): Label each atom with its partial charge to four decimal
  places.
  MacroModel Text Type: Label each atom with its MacroModel atom type.
  MacroModel Numeric Type: Label each atom with the numerical index for the
  MacroModel atom type.
  Stereochemistry: Label each atom with E,Z and R,S stereochemistry.
Atom Identifiers: Submenu of atom identifiers that can be used for labels.
Color by Element: Color H, C, N, O and S atoms. Default colors are white for H, green
for C, blue for N, red for O and yellow for S. There are several choices for modifying the
color scheme on this submenu.
  Reset HNOS: Set H, N, O and S atoms to their default color. Carbon atoms remain
  their current color.
  Custom Color {C}HNOS: Pick the color for carbon atoms and set H, N, O, and S
  atoms to their default color. Clicking on the menu item opens a palette of colors to
  choose from for carbon atoms, while selecting Recent Color Choices lists the most
  recent colors chosen by this command.
  Custom Color {H}CNOS: Pick the color for hydrogen atoms and set C, N, O, and S
  atoms to their default color. Clicking on the menu item opens a palette of colors to
  choose from for hydrogen atoms, while selecting Recent Color Choices lists the
  most recent colors chosen by this command.
Color by Chain: Color atoms by chain:
  by Chain (Carbons): Change the color of carbon atoms only.
  by Chain (Calpha): Change the color of alpha carbon atoms only.
  by Chain: Change the color of all atoms.
  Chainbows: Each chain is colored with rainbow colors.
Color by Substructure: Helices, sheets and loops are colored by the chosen color
scheme. All other atoms retain their current color.
Color by Spectrum: Color all residues by a spectrum of colors.
  Rainbow (Carbons): Color carbon atoms only with rainbow colors by residue
  position. The chain is divided into segments, each of which has residues of the same
  color.
  Rainbow (Calpha): Color alpha carbon atoms only with rainbow colors by residue
  position.
Most of the menu items are the same as those on the Toggle Table buttons or button menus. The
Selection shortcut menu has a Disable item, which turns off the selection. The Workspace
shortcut menu has Enable and Disable actions, which you can use to display or undisplay any
of the Workspace rows. These actions are also available from some of the submenus. This
menu also has a Select action, which you can use to create a selection from the visible
(displayed) atoms. It also allows you to operate on the visible atoms only or on all atoms in the
Workspace.
Chapter 3
The quality of a protein structure is often measured by deviations from values reported in the
PDB. You can analyze a protein and display tabular and graphical reports on its quality in the
Protein Structure Quality Viewer panel, which you open by choosing Tools → Protein Structure
Quality.
If there is a protein in the Workspace, it is analyzed when you open the panel. Otherwise, you
can display a protein in the Workspace and click Analyze Workspace to perform the analysis.
At the top of the panel, the protein table lists the chains in the protein that is analyzed along
with various measures of the overall structure quality. You can analyze multiple proteins and
they are all listed in the table, and you can select multiple chains in a single protein for
reporting, but you cannot select multiple proteins.
The remainder of the panel consists of two tabs that show different data: the Ramachandran
Plot tab, and the Protein Report tab.
Pausing the cursor over a point displays information for that residue at the top of the panel, and
highlights the residue in the Workspace. Clicking on a point selects the point and zooms the
Workspace image in to that residue, and highlights it with pale yellow markers. The point is
displayed as an outline instead of solid black. The residue information is displayed at the top of
the panel. Click again on the point to deselect it.
Below the plot, you can select options to change the appearance of the residues in the Work-
space structure for each of the three regions. The appearance is changed by coloring the resi-
dues and modifying the molecular representation. The coloring is applied when you analyze
the Workspace or change the color scheme, so you should set these options first, and then click
Analyze Workspace or change the color scheme. Deselecting any of these options does not
revert the color scheme to the original scheme, so you must change the scheme manually to
revert it.
Figure 3.1. The Protein Structure Quality Viewer panel, Ramachandran Plot tab.
The property is plotted as a function of the row number in the table in the area below the table.
The plot area has a toolbar, which is described in Section 3.3. The red dashed horizontal lines
and blue dashed vertical lines can be dragged to highlight a portion of the plot of interest,
which is given a white background, and the rest is gray. The atoms associated with the high-
lighted portion of the plot are interactively selected in the Workspace, and the rows for the
points in the highlighted region are selected in the table.
Figure 3.2. The Protein Structure Quality Viewer panel, Protein Report tab.
If you want to export the values in the table, click Export or Export All. Export exports the
current property table as a text file. Export All exports all property tables as a text file. Both
buttons open a file selector in which you can navigate to the location and name the file.
Reset
Reset the plot to the original pan and zoom settings.
Back
Display the previous view of the plot in the view history.
Next
Display the next view of the plot in the view history.
Pan/zoom
Pan the plot by dragging with the left mouse button, zoom by dragging with the right mouse
button.
Zoom to rectangle
Drag out a rectangle on the plot to zoom in to that rectangle.
Configure subplots
Configure the margins and spacing of each plot in the panel.
Chapter 4
To open the Residue Analysis panel, choose Tasks → Residue Analysis in the main window.
Before analyzing a protein, you should prepare it with the Protein Preparation Wizard. Calcu-
lating the energetic properties requires an all-atom structure with bond assignments. To
analyze the properties of a protein, first display it in the Workspace, and then click Analyze
Workspace. A job is run to calculate energetic properties. This job can take several minutes,
and progress is reported in a bar at the bottom of the panel.
When the job finishes, the table is filled in and the first property is plotted below. You can sort
the table by clicking on the heading of the column you want to sort by. A second click changes
the sort direction. The table columns are described in Table 4.1.
Column              Description
Residue             Residue identity: chain, residue number and insertion code, 3-letter name.
Hydropathy          Hydropathy calculated using the Kyte-Doolittle scale [6], normalized by the
                    solvent-accessible surface area.
Potential Energy    Sum of the residue-based internal energy and the non-bonded interaction
                    energy (vdW, electrostatic) between the residue and the remainder of the
                    system.
Internal Energy     Sum of energies arising from intra-residue bonded interactions (bonds,
                    angles, torsions) and intra-residue non-bonded interactions (vdW,
                    electrostatic).
Interaction Energy  Energy of interaction between this residue and all other atoms.
SASA (Non-polar)    Solvent-accessible surface area of nonpolar atoms of this residue.
SASA (Polar)        Solvent-accessible surface area of polar atoms of this residue.
SASA                Total solvent-accessible surface area of this residue.
Rotatable bonds     Number of rotatable bonds in this residue.
If you want to view properties for water molecules or het groups (ligands, cofactors, ions),
select the appropriate Show option below the table. You can export the table data to a CSV file
by clicking Export and then providing the file name in the file selector that opens.
To highlight one or more residues in the Workspace, select the table rows. If you have Fit on
selection selected, the view zooms in (or out) so that these residues occupy most of the Work-
space.
To examine a particular property for all residues, you can make a plot of the property as a func-
tion of residue position. Choose the property from the Graph property option menu to display
the plot. You can use the plot to select residues in the table and highlight them in the Work-
space, by moving the dotted lines to enclose the residues you are interested in. For example,
you might want to select residues that have significantly larger or significantly smaller values
of the properties than the average, by moving the upper or the lower red dotted line to enclose
just those data points.
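The same kind of selection can be reproduced outside the panel on exported table data. The following is a minimal sketch, assuming the property values have been read into a dictionary keyed by residue label; the function name, the residue labels, the energy values, and the use of a standard-deviation threshold in place of the dragged dotted lines are all illustrative assumptions, not part of BioLuminate.

```python
from statistics import mean, stdev


def select_outliers(values, n_sigma=1.5):
    """Return residue labels whose value lies more than n_sigma sample
    standard deviations above or below the mean (hypothetical helper)."""
    mu = mean(values.values())
    sigma = stdev(values.values())
    lower = mu - n_sigma * sigma   # plays the role of the lower dotted line
    upper = mu + n_sigma * sigma   # plays the role of the upper dotted line
    return [res for res, v in values.items() if v < lower or v > upper]


# Hypothetical per-residue interaction energies (arbitrary units).
energies = {
    "A:10 ALA": -5.0, "A:11 GLY": -4.0, "A:12 LYS": -6.0,
    "A:13 ASP": -5.5, "A:14 TRP": -4.5, "A:15 ARG": -40.0,
}
print(select_outliers(energies))
```

With so few data points a single extreme value inflates the standard deviation, so the threshold multiplier may need to be adjusted for real tables.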
The panel has a toolbar that you can use to manipulate and configure the plot or to save an
image of the plot. This is a generic toolbar, and some of the actions may not be useful in the
current context. The toolbar buttons are described below.
Reset
Reset the plot to the original pan and zoom settings.
Back
Display the previous view of the plot in the view history.
Next
Display the next view of the plot in the view history.
Pan/zoom
Pan the plot by dragging with the left mouse button, zoom by dragging with the right mouse
button.
Zoom to rectangle
Drag out a rectangle on the plot to zoom in to that rectangle.
Configure subplots
Configure the margins and spacing of each plot in the panel.
If you want to analyze another protein, click Reset to clear all the panel data.
Chapter 5
The Consensus Visualization panel helps you to identify conserved waters, counter ions, and
ligands for a protein. To open this panel, choose Tools → Protein Consensus Viewer.
Homologs of the target protein are identified by a BLAST search. You can select a subset of
these homologs to determine the consensus between them for the locations of waters, counter
ions, and ligands. These homologs are aligned, both by sequence and by structure, to the target
protein. The consensus between the positions of the waters, counter ions and ligands is then
determined. Consensus analysis can help you to quickly identify moieties that are repeated
among multiple structures, such as structurally important waters that should be included in
modeling studies.
For a tutorial introduction to consensus visualization, see Chapter 5 of the BioLuminate Quick
Start Guide.
Browse for File: Open a file browser in which you can navigate to the desired location
and select the file that contains the structure. The allowed file types are Maestro and
PDB.
From PDB ID: Import the structure from the specified PDB ID. Opens the Enter PDB ID
dialog box, in which you can enter the PDB ID of the structure. The structure is retrieved
from a local copy of the PDB if it is available, or from the RCSB web site, depending on
the preference set for PDB retrieval.
From Workspace: Import the structure that is displayed in the Workspace.
If you choose to prepare the protein beforehand in the Protein Preparation Wizard panel, you
should ensure that you do not delete any of the molecules for which you are seeking a
consensus. In particular, you might want to deselect Delete waters beyond N from het groups,
or make the distance large enough to ensure that you have the relevant waters.
If you already have a set of homologs, you can simply import them with the Import button.
Once they are imported you can align them structurally by clicking Align.
To run a BLAST search for homologs, click Find and Align Homologs. First, the Blast Search
Settings panel opens. You do not usually need to change the BLAST search settings. When you
are satisfied with the settings, click Start Job. A Job Progress dialog box replaces the Blast
Search Settings dialog box, and displays the log file from the BLAST search.
After a few minutes, the job finishes and the BLAST Search Results dialog box opens, with the
results of the search. The top ten results are selected in the table by default. You can change the
number of top rows to select by entering the number of rows in the text box below the table,
and clicking Select Top. You can also manually select rows in the table.
When you have finished selecting rows, click Incorporate Selected Rows. If you do not have a
local installation of the BLAST or PDB databases, the search is done on the web, and a
warning is displayed: "Multiple Sequence Viewer is attempting to access a remote server.
Would you like to continue?" You can select Do not ask this question again to prevent the
warning from opening each time a structure is downloaded, then click OK.
If an information box opens stating that problems were found when importing a structure, you
can select Do not show this dialog again to prevent it from opening for each structure that has
problems, and click OK. The structures are imported without any preprocessing, so they might
have structural defects. For the purposes of this panel, it is generally acceptable to use struc-
tures from the PDB that have structural issues.
The homologs you selected are aligned, added to the sequence viewer, and displayed in the
Workspace. All atoms are marked in all of the homologs.
For each of the three types of molecules, you can perform the following actions:
Choose whether to display all, only the consensus, or none of the molecules of the given
type, from the Display option menu. For example, viewing all the molecules gives an
indication of whether a consensus exists, whereas viewing the consensus shows whether there
is a strong enough consensus to consider the molecule as conserved.
Select and display residues near the molecules of the given type. You can choose to dis-
play residues near any of the molecules or only the consensus molecules. The action is
not performed until you click Select.
Change the color of the residues that are near the molecules of the given type, by clicking
on the color button, and choosing a color in the color selector that opens.
When displaying the molecules, the identity of any consensus molecule can be ascertained by
moving the cursor over the structure in the Workspace and viewing the text in the status bar,
below the Workspace. Consensus waters and ions are displayed as spheres, consensus ligands
as ball-and-stick, and they are highlighted with a silhouette (which can be changed with Style
→ Highlights, Text and Arrows).
Chapter 6
You may need to identify reactive residues in a protein, so that you can mutate them to improve
the protein properties. This can be done in the Reactive Protein Residues panel, which you
open by choosing Tools → Reactive Residue Identification.
Reactive residues are identified by matching residue patterns in the sequence. Four patterns are
provided by default, for the common reactions: deamidation, oxidation, glycosylation, and
proteolysis. You can use these patterns or you can set up and use your own patterns.
To identify reactive residues, include the protein you want to analyze in the Workspace, and
click Analyze Workspace. The structure in the Workspace is analyzed to identify residues that
match the patterns.
The results are listed in the table, showing the reaction type, the reactive residues identified, the
solvent-accessible surface area of the reactive residues, their percentage exposure to solvent,
and their B-factors (if available). The B-factor shown is the average over all atoms in the
residue.
You can use the Show option menu to show only the results for a particular reaction type. You
can also apply filters on the percentage solvent exposure and the B-factor.
To sort the table by the values in a column, click on the column heading. Click again to change
the direction of the sort. The column by which the table is sorted is indicated by an arrow on
the right side of the heading, which also indicates the sort direction.
The reactive sites are marked with spheres in the Workspace. The spheres are colored
according to the reaction type, and the color legend for the spheres is given below the table.
When you select a table row, the residue is highlighted in the Workspace. If you have Fit on
selection selected, the view zooms in to that residue.
If you want to define your own reactive groups, click Edit, to open the Edit Patterns dialog box.
This dialog box lists the patterns in a table, giving the pattern name, the definition, and a
hotspot index. To edit an existing pattern, double-click the table cell that you want to edit, and
enter the changes. The lower part of the panel explains the syntax for the patterns in the Defini-
tion column. The syntax is an extended PROSITE syntax, which allows you to specify
secondary structure and some properties:
Standard IUPAC one-letter (upper case) codes are used for all amino acids.
Lower case x is used for any amino acid.
Some examples of valid and invalid patterns are given below, with comments.
The hotspot index is the index of the reactive residue in the pattern.
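To illustrate how a sequence pattern and a hotspot index work together, here is a minimal sketch that handles only the basic PROSITE elements (upper-case residue codes, x for any residue, [..] for allowed residues, {..} for excluded residues, and (n) or (n,m) repetition). It does not cover the secondary-structure or property extensions mentioned above. The function names are hypothetical, the example patterns are illustrative rather than the panel's defaults, and treating the hotspot index as a zero-based offset from the start of the match is an assumption.

```python
import re

# One pattern element: x, a residue code, [ABC] or {ABC}, with optional (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            regex = "."                        # any amino acid
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            regex = core                       # single code or [ABC] set
        if lo is not None:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(regex)
    return "".join(parts)


def find_hotspots(sequence, pattern, hotspot_index=0):
    """Return zero-based sequence positions of the reactive residue.

    hotspot_index is assumed here to be a zero-based offset from the
    start of each match; overlapping matches are reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() + hotspot_index for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]"))
print(find_hotspots("ANGSNPT", "N-{P}-[ST]"))
```

For patterns with variable-length repetition, a fixed offset no longer identifies a unique pattern element, so a fuller implementation would need to track element positions within each match.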
Chapter 7
Protein aggregation often occurs via hydrophobic regions. You can locate and analyze these
regions using the Aggregation Surface panel, which you open by choosing Tasks →
Aggregation Surface. Aggregation regions are defined by identifying clusters of exposed hydrophobic
residues, and creating a molecular surface that is colored red in the regions near these residues.
Once you have located potential aggregation regions on a protein, you might want to mutate
residues in these regions to reduce the tendency to aggregate.
3. Enter a name in the Surface name text box, if you want to change the default name.
This name is used in the Manage Surfaces panel (Style → Create and Manage Surfaces →
Surface Manager) to identify the surface.
4. Click Create Surface.
A progress bar is displayed above the buttons while the surface is being created. Surface
creation should take less than a minute.
When the surface is created, it is displayed in the Workspace and other surfaces are hidden. To
display more than one surface, use the Manage Surfaces panel (Window → Show Surface
Manager).
The aggregation surface is a molecular surface that is created for a large probe molecule, repre-
senting part of a protein. It is colored red in the regions near the hydrophobic residue clusters.
The side chains of these residues and the alpha carbons are also colored red, and the remaining
backbone atoms are colored gray. The side chains are displayed in ball-and-stick representa-
tion, and the backbone is displayed as lines. Residues that are in contact with these residues are
displayed as lines in gray. All other residues are hidden, so you see only the residues that
contribute to the aggregation regions and very near neighbors. The surface is semi-transparent,
so you can see the atoms inside the surface. The residues involved in the clusters are colored
red in the Workspace sequence viewer.
The surface is stored with the project entry, just like any other surface. If you want, you can
create more than one surface for the Workspace contents. For example, you might have two
proteins displayed and want to analyze the aggregation regions at the interface.
Surface grid spacing: Set the grid spacing in angstroms for generation of the surface. A
smaller number results in a smoother surface, but takes longer to generate the surface.
Probe radius: Set the radius of the probe for defining the surface. This is the radius of
the sphere that is rolled over the van der Waals surface to create a Connolly surface. The
large default radius is intended to model a protein probe. Hydrogens are included when
creating the surface. See Section 12.1.2 of the Maestro User Manual for details on the
construction of the surface.
Transparency: Set the default transparency of the surface. The transparency can be
changed in the Surface Display Options dialog box; see Section 12.4.2 of the Maestro
User Manual.
Radius to find neighbors: Specify the distance cutoff for finding hydrophobic neighbors
to identify aggregation regions. The distance between any side-chain heavy atom in one
hydrophobic residue and any side-chain heavy atom in another hydrophobic residue must
be less than this cutoff for the residues to be counted as neighbors.
Hydrophobic neighbors required for site: Specify the minimum number of hydrophobic
neighbors required to include a residue (site) in an aggregation region.
Buried residue SASA: Specify the maximum solvent-accessible surface area (SASA) for
a residue to be regarded as buried. Buried residues are not included in aggregation
regions. The SASA is measured for the heavy atoms only: it does not include hydrogens.
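The way the last three options interact can be sketched as follows. This is an illustration only, not the algorithm used by BioLuminate: the data layout, function names and numerical values are hypothetical, and it assumes that buried residues are removed before neighbors are counted and that two residues are neighbors when the shortest distance between their side-chain heavy atoms is below the cutoff.

```python
import math
from itertools import combinations


def min_distance(atoms_a, atoms_b):
    """Shortest distance between any atom in atoms_a and any atom in atoms_b."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def aggregation_sites(residues, neighbor_radius, min_neighbors, buried_sasa):
    """Return hydrophobic residues that qualify as aggregation sites.

    residues maps a residue label to a dict with "sasa" (heavy-atom SASA)
    and "atoms" (side-chain heavy-atom coordinates) for hydrophobic residues.
    """
    # Residues at or below the buried-SASA threshold are treated as buried.
    exposed = {name: r for name, r in residues.items() if r["sasa"] > buried_sasa}
    neighbors = {name: set() for name in exposed}
    for a, b in combinations(exposed, 2):
        if min_distance(exposed[a]["atoms"], exposed[b]["atoms"]) < neighbor_radius:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return sorted(n for n, nbrs in neighbors.items() if len(nbrs) >= min_neighbors)


# Hypothetical hydrophobic residues with one side-chain atom each.
residues = {
    "LEU1": {"sasa": 50.0, "atoms": [(0.0, 0.0, 0.0)]},
    "VAL2": {"sasa": 40.0, "atoms": [(3.0, 0.0, 0.0)]},
    "ILE3": {"sasa": 60.0, "atoms": [(0.0, 3.0, 0.0)]},
    "PHE4": {"sasa": 70.0, "atoms": [(20.0, 0.0, 0.0)]},
    "ALA5": {"sasa": 2.0, "atoms": [(1.0, 1.0, 0.0)]},   # buried
}
print(aggregation_sites(residues, neighbor_radius=5.0, min_neighbors=2, buried_sasa=10.0))
```

In this example PHE4 is exposed but isolated, and ALA5 is close to the cluster but buried, so neither is reported as a site.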
When the analysis finishes, the table is filled in with a list of residues that contribute to the
aggregation regions of the surface. For each residue, its contribution to the surface and the
index of the group to which it belongs is listed in the table. The contribution is a count of
surface elements, which is roughly proportional to the surface area due to that residue. A group
is a set of residues that contribute to the same aggregation region, defined by the Radius to find
neighbors setting in the Aggregation Surface - Options dialog box.
When you select a row in the table, the residue is selected in the Workspace. You can select
entire aggregation regions (groups) by choosing the group from the Select group option menu.
The residues are selected in the table and the Workspace. The option menu shows the sum of
contributions to the group as well as the group index.
If Color selected green is selected, the red surface patch associated with the selected residues is
changed to green, and the residues themselves are colored green also. You can reset the colors
to the original aggregation color scheme (red and gray) by clicking Reset Aggregation Colors.
If you want to zoom in to the selection in the Workspace when you select residues or groups,
select Zoom to selection.
You can display or hide the surface by selecting or deselecting Display surface. As the color
scheme is applied to both the residues and the surface, the residue coloring still changes with
selection when the surface is hidden.
If you want to export the data from the table in CSV format, click Export. A file selector opens,
so you can save the file in the desired location.
One way of doing this is to perform a set of mutations to reduce the size of the aggregation
regions. You can create a set of structures that have single mutations at selected sites, which
you choose from the residues in the aggregation regions, as follows:
The residues that were found to contribute to the aggregation region are listed in the table in the
Residues tab. You can then select any of them for mutation, define the mutations, and run the
job. See Chapter 15 for details of setting up a residue scanning job.
Another option is to mutate a single residue or a loop. Mutating a loop (loop swap) is useful
if you want to mutate more than one residue and the residues are adjacent. For this purpose,
you can use the Residue and Loop Mutation panel. The loop to change is defined by selecting
the residues in the Workspace. You can take advantage of the fact that you only have the aggre-
gation residues and their neighbors visible in the Workspace to choose the residues for the loop
to swap, or you can use the Analysis tab to select the residues, and use the selection. See
Chapter 13 for details of using this panel.
Chapter 8
It can be useful to analyze the specific nature of the interactions at the interface of two proteins.
BioLuminate provides this capability in the Protein Interaction Analysis panel. The analysis
locates residues in one protein that are within a given distance of residues in another protein,
and presents counts of hydrogen bonds, salt bridges, disulfide bonds, pi-pi stacking interac-
tions, and van der Waals clashes, and reports the van der Waals surface complementarity and
buried solvent-accessible surface area.
To open the Protein Interaction Analysis panel, choose Tasks → Protein Interaction Analysis in
the main window.
The proteins whose interface you want to analyze must be part of a single project entry. They
could come from a protein-protein docking run, for example, or an antibody-antigen complex,
or a multi-chain protein from the PDB. Once you have a structure, from whatever source, it
must be properly prepared, for example in the Protein Preparation Wizard panel.
You do not have to include all the chains in the analysis, provided that there is one chain in
each group. Chains that are unassigned are not analyzed.
The analysis is performed using settings that characterize the interactions, such as interatomic
distances and angles. You can change the settings in the Advanced Options dialog box, which
you open by clicking Advanced Options. These settings affect the values that are listed in the
table after the analysis is done. The settings are listed below. To return to the default settings,
click Reset.
Set the cutoff for determining the closest interaction neighbors for the interaction analysis. Any
residue that has any atom within the specified distance of the target residue is considered a
neighbor of that residue. The default is 4.0 Å.
Hydrogen bonds
Specify the criteria for determining whether a hydrogen bond exists. The four atoms involved
in the hydrogen bond are designated D-H...A-X, where D is the donor atom and A is the
acceptor atom. The default values are the Maestro defaults.
Minimum acceptor angle: Set the minimum acceptor angle H...A-X, in degrees. The
default is 90.
Minimum donor angle: Set the minimum donor angle D-H...A, in degrees. The default is
120.
Maximum distance: Set the maximum H...A distance, in angstroms. The default is 2.5 Å.
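The three hydrogen-bond criteria can be combined into a single geometric test, as in the sketch below. The function names and example coordinates are hypothetical; the default thresholds are the values quoted above, and treating each threshold as inclusive is an assumption.

```python
import math


def angle_deg(p, q, r):
    """Angle in degrees at point q formed by the points p-q-r."""
    v1 = [a - b for a, b in zip(p, q)]
    v2 = [a - b for a, b in zip(r, q)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def is_hydrogen_bond(d, h, a, x, max_dist=2.5,
                     min_donor_angle=120.0, min_acceptor_angle=90.0):
    """Test the D-H...A-X geometry against the three criteria."""
    return (math.dist(h, a) <= max_dist                      # H...A distance
            and angle_deg(d, h, a) >= min_donor_angle        # D-H...A angle
            and angle_deg(h, a, x) >= min_acceptor_angle)    # H...A-X angle


# Hypothetical coordinates in angstroms.
donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
acceptor, acceptor_neighbor = (2.9, 0.0, 0.0), (3.5, 1.0, 0.0)
print(is_hydrogen_bond(donor, hydrogen, acceptor, acceptor_neighbor))
```

Here the H...A distance is 1.9 Å, the donor angle is 180 degrees and the acceptor angle is about 121 degrees, so all three criteria are met.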
Salt bridges
Specify the maximum distance between an ion and a protein atom for detecting a salt bridge, in
the Maximum distance box. The default is 4.0 Å.
Pi stacking
Set the maximum distance between the centroids of the two aromatic rings in the Maximum
centroid distance box. The default is 4.0 Å.
Set the maximum overlap distance of the van der Waals spheres of any two atoms in the
Allowable overlap box. If R_A + R_B - R_AB is greater than the allowable overlap, the atoms are
considered to clash, where R_A and R_B are the van der Waals radii of atoms A and B, and R_AB
is the distance between atoms A and B. The default is 0.4 Å.
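The clash criterion is a one-line comparison, sketched below. The function name, the coordinates and the radii are hypothetical; only the overlap formula and the default allowable overlap of 0.4 come from the text.

```python
import math


def is_clash(pos_a, radius_a, pos_b, radius_b, allowable_overlap=0.4):
    """Return True if the vdW spheres of two atoms overlap by more than
    the allowable overlap (all values in angstroms)."""
    overlap = radius_a + radius_b - math.dist(pos_a, pos_b)
    return overlap > allowable_overlap


# Two atoms with a hypothetical radius of 1.7 each.
print(is_clash((0.0, 0.0, 0.0), 1.7, (2.8, 0.0, 0.0), 1.7))  # overlap 0.6
print(is_clash((0.0, 0.0, 0.0), 1.7, (3.2, 0.0, 0.0), 1.7))  # overlap 0.2
```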
Column                 Description
Residue                This column lists residues in either group that have contact-type
                       interactions with residues in the other group.
Closest                This column lists residues from the other group that are within a
                       specified distance of the residue listed in the Residue column. The
                       default distance is 4.0 Å.
Specific Interactions  Text list of specific interactions between the residue listed in the
                       Residue column and residues in the other group. The list covers
                       hydrogen bonds, salt bridges, pi-pi interactions, disulfides, and van der
                       Waals clashes.
# HB                   Number of hydrogen bonds between the residue listed in the Residue
                       column and residues in the other group. The criteria for detecting
                       hydrogen bonds can be changed in the Advanced Options dialog box,
                       prior to the analysis.
# Salt Bridges         Number of salt bridges between the residue listed in the Residue column
                       and residues in the other group.
# Pi Stacking          Number of pi-pi stacking interactions between the residue listed in the
                       Residue column and residues in the other group.
# Disulfides           Number of disulfide bonds between the residue listed in the Residue
                       column and residues in the other group.
# vdW Clash            Number of van der Waals clashes between the residue listed in the
                       Residue column and residues in the other group. A clash is defined as an
                       overlap of the van der Waals radii of two atoms by more than a specified
                       cutoff. The default is 0.4 Å.
vdW Complementarity    Van der Waals shape complementarity between the residue listed in the
                       Residue column and residues in the other group, as defined in Ref. 10.
Buried SASA            Fraction of the solvent-accessible surface area of the residue listed in
                       the Residue column that is buried by the interaction with residues in the
                       other group.
If you want to import the results into another application, you can export them to a CSV file by
clicking the Export Table button, and naming the file in the file selector that opens. The file is
comma-separated with a heading row. Table cells that have multiple rows are exported as
double-quoted text with line breaks embedded.
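Because some cells contain embedded line breaks, the exported file should be read with a CSV parser rather than split line by line. The sketch below uses Python's standard csv module; the function name, the sample text, its column headings and its values are made up for illustration and may not match the actual export.

```python
import csv
import io


def read_interaction_table(text):
    """Parse exported CSV text into a list of row dictionaries, keeping
    embedded line breaks inside double-quoted cells."""
    # newline="" leaves line endings untranslated so the csv module
    # can handle quoted multi-line cells itself.
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical export with one multi-line cell.
sample = 'Residue,Closest,# HB\r\n"A:45 ARG","B:12 ASP\nB:13 GLU",2\r\n'
rows = read_interaction_table(sample)
for row in rows:
    # Split a multi-line cell back into its individual entries.
    print(row["Residue"], row["Closest"].split("\n"), int(row["# HB"]))
```

When reading from a file on disk, opening it with open(path, newline="") gives the same behavior.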
You can choose the property from the Color by option menu. The choices are:
When you choose an option from this menu, the text boxes for the minimum and maximum
values are updated to reflect the possible range of values.
The colors that represent the chosen minimum value and maximum value of the interaction are
chosen from the Minimum color and Maximum color option menus. The minimum and
maximum values for the color display are set with the value options, described below. The
color for the minimum is applied to any value below the chosen minimum for display; likewise
the color for the maximum is applied to any value above the chosen maximum for display. The
colors between the two limits are obtained by a linear interpolation of the RGB color values.
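The coloring rule described above, with clamping outside the chosen range and linear interpolation of the RGB components inside it, can be sketched as follows. The function name and the example colors are illustrative assumptions; rounding each component to the nearest integer is also an assumption.

```python
def interpolate_color(value, vmin, vmax, color_min, color_max):
    """Map a value to an RGB tuple by linear interpolation between
    color_min (at vmin) and color_max (at vmax), clamping outside the range."""
    if vmax == vmin:
        return tuple(color_min)
    t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))   # values outside the range take the end colors
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(color_min, color_max))


white, red = (255, 255, 255), (255, 0, 0)
print(interpolate_color(0.2, 0.0, 1.0, white, red))   # inside the range
print(interpolate_color(5.0, 0.0, 1.0, white, red))   # above the maximum
```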
The minimum value and maximum value for the color scheme can be chosen by selecting one
of the Minimum value and Maximum value options. There are two options for each value:
Minimum/Maximum in table: Set the values to use for coloring to the minimum value and
maximum value found in the table. The color scheme so defined is then relative to the
range observed, rather than fixed.
Supplied value: Set the values to use for coloring to the values given in the boxes. The
default minimum value is zero; the default maximum is the allowed maximum for the
total number of specific interactions, 1.0 for the vdW complementarity, and 100% for the
buried SASA.
If you only want to color some of the residues, you can select the table rows for the residues
you want to color, and select Only color selected rows.
When you have chosen the property, set up the color scheme, and optionally chosen residues to
color, click Color Residues to apply the color scheme.
Chapter 9
Finding large-scale motions in proteins can provide information on the flexibility or rigidity of
parts of the protein, on domain movements, or give some clues to biochemical processes.
Trajectories from molecular dynamics simulations can show the large-scale motions of
proteins, but these simulations are not resolved into individual modes, and they require a large
amount of computational resources. The lowest vibrational modes of a protein can be determined
and visualized using the Low Mode Vibrational Sampling panel. You can open this panel
by choosing Tasks → Low Normal Mode Analysis → Calculate or Visualize.
Before running the calculation, you should ensure that the protein is properly prepared, using
the Protein Preparation Wizard (Tools → Protein Preparation). You should remove waters and
solvent molecules so that you are analyzing just the protein (and its ligands, if any).
First, the input structure is minimized (using the PRCG method for a maximum of 10000 iterations
to a gradient convergence threshold of 0.05 kJ mol⁻¹ Å⁻¹). This ensures that the structure
is at its minimum, which is important for generating the vibrational modes. The vibrational
modes are generated as a set of structures sampled at regular intervals along a full cycle of the
vibrational mode. The rotational and translational modes (the trivial modes) are discarded. The
vibrational modes are then visualized by displaying the structures in sequence as a movie in
the Workspace.
2. Set the number of frames to generate per vibrational cycle in the Number of frames per
mode text box.
Each frame is a snapshot of the structure at a particular point in the vibration between the
classical limits. The full cycle of a vibration is divided evenly to define the coordinates
used for the snapshots, so this number should be divisible by 4.
3. Set the maximum amplitude of vibration in the Vibrational amplitude text box.
This is the maximum displacement of the fastest-moving atom. For proteins, the fastest
moving atom could potentially move a large distance if the motion involves a long loop,
for example. This choice is somewhat arbitrary: you might for example want to exagger-
ate the motions to make them easier to see.
4. Click Run.
A job is started that generates the series of structures for each vibration. The limit on the
number of atoms that can be processed depends on the memory available on the machine, but
with 4 GB of memory, it is about 5000. For example, 1ETT, with about 4800 atoms, runs
successfully. The job can take several hours to run, depending on the size of the protein.
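The reason for choosing a number of frames that is divisible by 4 can be illustrated with a short sketch. It assumes that the motion along the mode is sinusoidal and that the frames are evenly spaced in phase; the function name is hypothetical and the program's actual parametrization may differ.

```python
import math


def frame_displacements(n_frames, amplitude):
    """Displacement of the fastest-moving atom in each frame of one full cycle."""
    # Frame k sits at phase 2*pi*k/n_frames of the vibrational cycle.
    return [
        amplitude * math.sin(2.0 * math.pi * k / n_frames) for k in range(n_frames)
    ]
```

With 8 frames, frames 2 and 6 fall exactly on the two classical limits and frames 0 and 4 on the minimized structure. With 6 frames, no frame reaches the full amplitude.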
When you have finished viewing the modes, you should clean up the structures in the project,
by clicking Remove Structures from Project, or deleting the entry group in the Project Table
panel.
Chapter 10
The alpha helical tendency of one or more peptides is determined from a molecular dynamics
simulation by tracking i to i+4 hydrogen bond formation, and other indications of helical struc-
ture. The prediction is made entirely from the sequences, by building them as idealized alpha-
helices, and then performing a molecular dynamics simulation in water using simulated
annealing, to simulate experimental melting experiments. At the end of the simulation, aver-
ages are taken to determine the values of properties that can be used to determine the helicity
of the sequences.
Simulations are set up in the Start Simulations tab, and results are presented in the Results tab.
To run a simulation:
1. Select an option for the sequences to simulate under Use sequences from.
The options are Workspace (the included entries), Project Table (the selected entries),
File, or Sequences. If you choose File, enter the file name in the text box or click Browse
to navigate to and select the file. If you choose Sequences, type or paste the sequences
into the text box, separated by commas. Each sequence that you provide is simulated sep-
arately in a single job.
2. Click Start Simulations.
A Start dialog box opens, in which you can set the job name, select a host, and specify the
number of processors. Click Start in this dialog box to submit the job to a host for
execution.
Since the simulations are likely to take many hours, it is a good idea to run the job on a
multiprocessor host. The simulation uses a 3D domain decomposition of the simulation
box, so you can specify the number of processors for each dimension in the decomposi-
tion (labeled x, y, and z). See Section 3.10 of the Desmond User Manual for more
information. A good choice is 2 processors for each, on an 8-core CPU.
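The number of processors used is the product of the x, y, and z values. The following sketch lists the possible decompositions for a given number of cores and picks the most evenly divided one. The function names are hypothetical, and preferring the most balanced decomposition is an assumption made for illustration, not a rule stated by Desmond.

```python
def decompositions(n_cores):
    """All (x, y, z) processor counts whose product equals n_cores."""
    triples = []
    for x in range(1, n_cores + 1):
        if n_cores % x:
            continue
        remaining = n_cores // x
        for y in range(1, remaining + 1):
            if remaining % y == 0:
                triples.append((x, y, remaining // y))
    return triples


def most_balanced(n_cores):
    """The decomposition with the smallest spread between dimensions."""
    return min(decompositions(n_cores), key=lambda t: max(t) - min(t))
```

For 8 cores, most_balanced returns (2, 2, 2), which matches the choice suggested above.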
When the simulation finishes, you can review the results of the simulations in the Results tab.
Click Load Output File to read the results, which are in a CSV file. A file selector opens, in which
you can navigate to and select the output CSV file. The results are presented in a table, whose
columns are described in Table 10.1. The averages are taken over the length of the simulation.
If you want to run another calculation, click Reset to clear all panel data.
Table 10.1. Columns of the Results table in the Peptide Helicity panel.
Column Description
The independent variables in these models are sets of amino acid properties taken from the
literature [7-9], and do not need to be explicitly provided. Only the observed property is
needed, for the training set and the test set of the model.
To open the Peptide QSAR panel, choose Tasks → Peptide QSAR. The model is set up in the
Setup tab, and the results are presented in the Results tab.
The sequences can come from peptide structures in the project, from a Fasta file, or from a
CSV file, and must be all of the same length. You can choose the source of the sequences from
the Load sequences from option menu.
Project Table (selected entries): Select the entries in the Project Table that contain the
peptide sequences.
Workspace (included entries): Include the entries in the Workspace that contain the peptide
sequences. The sequences are displayed in the Workspace sequence viewer.
CSV file: Read the sequences and the observables from a CSV file.
Fasta file: Read the sequences from a Fasta file.
The choice you make determines which controls are displayed in the rest of the dialog box for
the selection of the observables.
If you load the sequences from a CSV file, the observable must also be in the CSV file.
You can specify the file with the usual tools (File text box, Browse button), and you can
specify the column that contains the sequence and the column that contains the observ-
able. If the sequences have names, you can also check the box for the name and specify
the column that contains the name. If the CSV file has a header row, select Has header.
If you load the sequences from the Project Table or the Workspace, you can use a project
property for the observable, and choose the property from the Property option menu.
For any of the methods you can load the observable from a CSV file. You must specify
the CSV file, the column in the file that contains the observable, and indicate whether the
file has a header row.
You can also leave the observable property undefined, by choosing Enter manually in the
table / Has no observables. You can do this when building a model if you plan to enter the
values manually in the table. If you are making predictions for new sequences, select this
option to load sequences without observables. This option is not available when loading
sequences from a CSV file.
Click OK when you have made your choices. The dialog box closes, and the sequence table in
the Peptide QSAR panel is filled in (see Figure 10.2 on page 59). If you chose to leave the
observables undefined, you can edit the table cells to supply the values.
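If you prepare the CSV file with a script, it can be useful to check it before loading. The following is a minimal sketch under stated assumptions: the function name is hypothetical, columns are given as 0-based indices, and sequences are converted to upper case. It enforces the requirement that all sequences have the same length.

```python
import csv


def load_peptides(path, seq_col, obs_col, has_header=True):
    """Return (sequence, observable) pairs from a CSV file."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    if has_header:
        rows = rows[1:]  # skip the heading row
    peptides = [
        (row[seq_col].strip().upper(), float(row[obs_col])) for row in rows if row
    ]
    lengths = {len(seq) for seq, _ in peptides}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return peptides
```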
First, you need to choose a training set and a test set. There are three options:
You can choose the sets explicitly. Select Use rows marked as Training Set and Test Set.
Select the rows for one of the sets in the table, choose the set from the Set selected rows
as option menu, and click Update. Do the same for the other set. You should of course
only select rows that have observable values.
You can use all the sequences, and assign the training and test sets randomly. Select Use
all rows in the table, and set the percentage for the test set in the Randomly select text box.
You can also specify a seed for the random number generator.
You can use a subset of the sequences, and assign the training and test sets randomly.
Select Use only rows marked as "either", and set the percentage for the test set in the Ran-
domly select text box. Select the rows to use in the table, choose Either training or test
from the Set selected rows as option menu, and click Update. You can also specify a seed
for the random number generator.
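The random assignment with a percentage and a seed can be sketched as follows. This illustrates the idea only; the function name is hypothetical, and the panel's own random number generator and rounding may give a different assignment for the same seed.

```python
import random


def split_rows(rows, test_percent, seed=None):
    """Randomly assign rows to a training set and a test set."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_percent / 100.0)
    return shuffled[n_test:], shuffled[:n_test]  # (training, test)
```

Using the same seed twice gives the same training and test sets, which is useful when you want to compare descriptor sets or methods on an identical split.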
Next, choose the type of amino acid descriptor set to be used as the X (independent) variables
in the model, from the Peptide descriptor type option menu. The choices are:
zvalue: Use the three z-value variables (z1, z2, z3) of Hellberg et al. [7] for the amino
acid descriptors. These are derived from a principal components analysis (PCA) of 29
physicochemical variables for the 20 coded amino acids. The descriptors include molecular
weight, pKa, pI, side-chain vdW volumes, NMR shifts, retention times, partition
coefficients, and solvent exposure. Choose this variable set only if the peptides in your set consist
entirely of coded amino acids.
ezvalue: Use the five extended z-value variables of Sandberg et al. [8] for the amino acid
descriptors. These are derived from a principal components analysis of 26 physicochemical
descriptors for 87 amino acids (including the 20 coded amino acids). The descriptors
include molecular weight, NMR shifts, partition coefficients, side-chain vdW volumes,
HOMO and LUMO energies, heats of formation, polarizabilities, surface areas, hardnesses,
TLC retention times, hydrogen-bond donor and acceptor counts, and side-chain
charges.
dpps: Use the 10 divided physicochemical property scores of Tian et al. [9]. These are
derived from 23 electronic, 54 hydrophobic, 37 steric and 5 H-bond properties of the 20
coded amino acids, by applying principal components analysis to each of the groups separately
and keeping 4 electronic components and 2 each for the other groups. Choose this
variable set only if the peptides in your set consist entirely of coded amino acids.
all: Use all of the above variables. There is likely to be some linear dependence between
these variable sets.
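To illustrate how a sequence becomes a row of X variables, the sketch below concatenates the descriptor values of each residue in sequence order, so that a peptide of n residues with 3 descriptors per residue gives 3n variables. The concatenation order and the function name are assumptions, and the numbers in the table are made-up placeholders, not published z-values.

```python
def encode_peptide(sequence, descriptors):
    """Concatenate per-residue descriptor values into one row of X variables."""
    row = []
    for residue in sequence:
        if residue not in descriptors:
            raise KeyError(f"no descriptors for residue {residue!r}")
        row.extend(descriptors[residue])
    return row


# Made-up placeholder values for illustration only; NOT published z-values.
TOY_DESCRIPTORS = {
    "A": (0.1, -0.2, 0.3),
    "G": (0.4, 0.5, -0.6),
    "K": (-0.7, 0.8, 0.9),
}
```

Because every row must have the same number of variables, this kind of encoding is one reason the sequences must all be of the same length.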
The models are built using partial least squares techniques, with a choice of two variants that
you can select from the QSAR method option menu. For each of these variants, you can set
options to control the application of the method, by clicking the Advanced Options for variant,
and making settings in the dialog box that opens; see Section 10.2.5 on page 64.
When you have finished making settings, click Build. The model may take a minute to build,
and then the results are displayed in the Results tab. To apply this model to other sequences,
you must save it first. Click Export Model in the Results tab to save the model to a file.
The Statistics tables display statistics for the training set and the test set. For the training set,
four statistics are shown: the standard deviation, the R² correlation coefficient, the R² correlation
coefficient from cross-validation, and the stability. The latter two statistics are calculated
with a leave-n-out method; the stability indicates how sensitive the results are to the choice of
the training set. For the test set, the root-mean-square error, the Q² correlation coefficient and
the Pearson r value are shown.
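If you export the predictions, you can recompute the test set statistics yourself. The sketch below uses common textbook definitions, which are assumptions here: in particular, Q² is taken as 1 minus the ratio of the residual sum of squares to the total sum of squares about the mean of the observed test values, and the definition used by the panel may differ.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error of the predictions."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def q_squared(observed, predicted):
    """1 - SS_res/SS_tot, with SS_tot taken about the mean observed value."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)
```

Note that predictions with a constant offset from the observed values still give a Pearson r of 1, while the RMSE and Q² show the error.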
The results table shows the observed and predicted values for the peptide sequences used for
the training and test sets, when building a model, and for the chosen sequences when applying
a model. The table also shows the sequence name and the training and test set classification.
To use the results in another application, you can export them to a CSV file, by clicking Export
Predictions and navigating to a location and naming the file in the file selector that opens.
If you have just developed a model, you must export it before you can apply it to a new set of
sequences in the Setup tab. Click Export Model, and navigate to a location and name the file in
the file selector that opens.
For a visual representation of the results, click Plot to display a scatter plot of predicted values
against observed values, with a line of perfect fit, in the Peptide QSAR Scatter Plot panel. The
panel has a plot toolbar, which allows you to configure the plot and save an image.
1. Load the sequences you want to apply the model to, if you do not already have the
sequences loaded.
See Section 10.2.1 on page 60 for information on loading sequences.
2. Choose the QSAR method for the model you want to apply.
3. Select Apply a model.
4. Click Browse to navigate to the model file and open it.
5. Choose an option for the table rows to apply the model to.
You can apply it to all rows, or to rows that are marked in the table as training, test, or
either, or to rows that are not marked.
6. Click Apply.
The model is applied and the results are displayed in the Results tab.
Specify the maximum number of PLS factors to use in the regression model. Regression
models are built for increasing numbers of PLS factors up to this number. The maximum
number that can be used is limited by the number of descriptors, which is 3 times the number
of residues for the zvalue set, 5 times the number of residues for the ezvalue set, and x times
the number of residues for the dpps set. It is rarely useful to build models with more than a few
PLS factors, as models with a large number tend to be overfit.
Autoscale X variables
Scale the X variables by dividing the values of each property by the standard deviation in the
value of the property.
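A minimal sketch of this scaling is given below. It assumes the sample standard deviation (n - 1 in the denominator) and applies only the division described above, with no mean centering; the function name is hypothetical.

```python
import statistics


def autoscale(columns):
    """Divide each column of X values by its sample standard deviation."""
    scaled = []
    for column in columns:
        sd = statistics.stdev(column)
        if sd == 0.0:
            raise ValueError("cannot scale a constant column")
        scaled.append([value / sd for value in column])
    return scaled
```

After scaling, every column has unit standard deviation, so properties with large numerical ranges do not dominate the regression.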
Stop adding PLS factors when standard deviation of the regression drops to
Select this option to stop adding PLS factors when the standard deviation of the regression
drops below the value specified in the text box. Using this option could result in fewer PLS
factors than the number specified in the Maximum number of PLS factors box, but adding more
factors may not yield any improvement in the model.
Eliminate X variables whose t-value is less than the value given in the text box. The t-value is
the ratio of the coefficient of the variable in the fitted model to the standard error of the model.
Small t values indicate that the variable is not contributing significantly to the model.
Specify the maximum number of KPLS factors to use in the regression model. Regression
models are built for increasing numbers of KPLS factors up to this number. The maximum
number that can be used is limited by the number of descriptors, which is 3 times the number
of residues for the zvalue set, 5 times the number of residues for the ezvalue set, and x times
the number of residues for the dpps set. It is rarely useful to build models with more than a few
PLS factors, as models with a large number tend to be overfit.
Kernel nonlinearity
Change the kernel nonlinearity value. A Gaussian kernel exp(-d²/σ²) is used, where d is the
Euclidean distance between two X variables. The nonlinearity value is 1/σ, so small values are
almost linear, and large values are very nonlinear. Higher nonlinearity typically leads to tighter
fitting, but it also tends to give poorer predictions on new peptides. Click Reset to reset the
value to the default.
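The effect of the nonlinearity value can be explored with a short sketch. It assumes that the kernel has the form exp(-d²/σ²) with the nonlinearity value equal to 1/σ, and that d is the Euclidean distance between the descriptor vectors of two peptides; the function name is hypothetical.

```python
import math


def gaussian_kernel(x, y, nonlinearity):
    """exp(-d^2 / sigma^2) with sigma = 1 / nonlinearity."""
    d_squared = sum((a - b) ** 2 for a, b in zip(x, y))
    # Dividing by sigma^2 is the same as multiplying by nonlinearity^2.
    return math.exp(-d_squared * nonlinearity ** 2)
```

For a small nonlinearity value the kernel stays close to 1 and varies slowly with distance. For a large value it falls off sharply, so only very similar peptides influence each other.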
Stop adding KPLS factors when standard deviation of the regression drops to
Select this option to stop adding KPLS factors when the standard deviation of the regression
drops below the value specified in the text box. Using this option could result in fewer KPLS
factors than the number specified in the Maximum number of KPLS factors box.
Calculate a confidence interval for each predicted value in the test set, by bootstrapping. This is
done by sampling the training set randomly with replacement to generate a new training set of the
same size with duplicates, building a model and making predictions of the test set, then
repeating the procedure a specified number of times. The standard deviation from the original
test set is then calculated as the uncertainty.
Specify the number of times a random sample is made and a prediction obtained in the uncer-
tainty calculations. This number determines how many values are used in calculating the stan-
dard deviation, and should be at least 5.
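The bootstrapping procedure can be sketched as follows. To keep the example self-contained, an ordinary least-squares line in one variable stands in for the PLS or KPLS model, and the spread of the predictions over the resampled models is reported as the uncertainty. Both choices, and all function names, are assumptions made for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for the PLS/KPLS model."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return 0.0, my  # degenerate resample: predict the mean
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bootstrap_uncertainty(train, test_xs, n_samples=10, seed=0):
    """Standard deviation of each test prediction over resampled models."""
    rng = random.Random(seed)
    predictions = [[] for _ in test_xs]
    for _ in range(n_samples):
        # Sample the training set with replacement, keeping the same size.
        sample = [rng.choice(train) for _ in train]
        slope, intercept = fit_line([x for x, _ in sample], [y for _, y in sample])
        for i, x in enumerate(test_xs):
            predictions[i].append(slope * x + intercept)
    return [statistics.stdev(p) for p in predictions]
```

A larger number of samples gives a more stable estimate of the standard deviation, at the cost of building more models.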
Peptide docking is done with Glide, which treats the receptor as a rigid structure but with soft-
ening of the potentials in the active site region to simulate small adjustments of the receptor to
the ligand. The peptide ligands are treated flexibly. As peptides are very flexible compared to
typical non-peptide ligands, the Glide docking is performed with increased sampling, and
several docking runs are done for each peptide, with different input conformations, to further
increase the sampling.
Centroid of Workspace ligand: Center the binding site and thus the docking grids at the
centroid of the ligand molecule that you pick in the Workspace. Use this option if your
receptor has a ligand that occupies the binding site. To pick the ligand molecule, select
Pick and click on a ligand atom in the Workspace. Information on the ligand is shown in
the box once you have picked the ligand.
Centroid of selected residues: Set the center of the binding site at the centroid of a set of
residues that you select. This option is useful if you don't have a bound ligand. You
should choose residues whose centroid is approximately where the centroid of a bound
ligand should be.
To select the residues:
1. Click Select Residues.
2. Click the Residues tab in the Atom Selection dialog box.
3. Pick the residues in the Workspace.
4. Click Add, then click OK.
The X, Y, and Z text boxes show the coordinates of the center of the binding site.
Once you have defined the center of the binding site, you next need to define its extent, as
Glide needs to know how large the volume is in which the ligand can be docked. There are two
options:
Set automatically based on peptide ligand set: Select this option if you want the box size
to be determined automatically once the peptide ligand set is defined. This option ensures
that the box size is no larger than is necessary, and is the default.
Outer box fully accommodates linear peptide of N residues: Specify the maximum number
of residues in the peptides that you want to dock. This option sets the box size so that
it accommodates the longest peptide in a linear conformation. The box may be larger than
necessary to dock the actual conformations.
With the Glide technology, peptides that have a diameter greater than 80 Å cannot be docked
due to the grid box size limit. This corresponds to a linear peptide of about 16 residues. Glide
also restricts the number of rotatable bonds to 100, due to the rapid increase in the number of
conformations with the number of rotatable bonds. The maximum peptide length that can be
docked in practice will depend on the conformation and the residue types.
File: Add peptide sequences from a file. Opens a file selector in which you can locate
and open the file. The file must be a plain text file (.txt) or a PDB file (.pdb). If the file
is a plain text file, there must be one sequence per line in the file. If it is a PDB file, only
the first sequence is used.
Project Table: Use the peptide sequences from the selected entries in the Project Table.
You should select the entries in the Project Table first, before clicking this button. Only
the sequences are used: no structural information is kept, as structures are generated by a
conformational search.
Text: Enter the sequence of the peptide to be docked as a string of one-letter standard
residue codes. Opens a dialog box in which you can type in the sequence.
The sequences that are added to the list can be edited or removed. If a sequence is colored red, it
contains an invalid single-letter residue code, and must be corrected. You can edit individual
rows by double-clicking on the row, then editing the text. You can select multiple rows to
remove them with the Delete button, or you can remove all rows with the Delete All button.
If you want to use the sequences somewhere else, you can export them to a plain text file, one
sequence per line, by clicking Export.
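If you generate the plain text file of sequences with a script, you can check the sequences before importing them. The sketch below assumes that the valid codes are the 20 standard one-letter amino acid codes and that lower-case letters are converted to upper case; the function names are hypothetical.

```python
STANDARD_CODES = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_codes(sequence):
    """Return the characters in sequence that are not standard one-letter codes."""
    return [c for c in sequence.strip().upper() if c not in STANDARD_CODES]


def read_sequences(path):
    """Read one sequence per line from a plain text file, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]
```

A sequence for which invalid_codes returns a non-empty list corresponds to one that would need to be corrected in the panel.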
The structures that are built do not have capping groups, and by default are in zwitterionic
form, i.e. with charged termini. (The option to neutralize termini is turned off and cannot be
changed in the current release.)
When generating conformers from sequences, you can allow peptide linkages in the cis confor-
mation rather than the usual trans conformation by selecting Allow cis amide bonds.
Each peptide is docked several times in independent docking runs, with different starting
conformations. This helps to increase the sampling of conformations in the docking process,
and thus produce a better set of poses for the peptide. You can specify the number of indepen-
dent docking runs to perform for each peptide in the Number of independent docking runs per
peptide box.
After these runs have finished, the poses from each run are combined and sorted to give a set of
poses for the peptide. You can specify the number of poses to return from each of these runs in
the Number of poses to return for each independent docking run box. If the number specified is
too small, you risk losing good poses. For example, if the first ten poses from a particular run
are lower than any of the poses from the other runs, and you only allow one pose from each
run, you would lose nine of the best poses.
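The effect of the number of poses returned per run can be illustrated with a short sketch. Scores are plain numbers with lower values being better, and the function name is hypothetical; this shows the combining logic only, not the actual docking workflow.

```python
def combine_poses(runs, poses_per_run):
    """Keep the best poses_per_run scores from each run, then merge and sort."""
    kept = []
    for scores in runs:
        kept.extend(sorted(scores)[:poses_per_run])
    return sorted(kept)
```

If one run produces ten poses that all score better than every pose from a second run, keeping one pose per run returns only the single best pose from the first run plus one from the second. Keeping ten per run retains all ten.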
Finally, you can set options for scoring the docked poses. The scores are estimates of the
binding free energy, and are returned as properties of the docked poses. The two options are:
MM-GBSA: Use the MM-GBSA ligand binding energy as the score. This option involves
an extra calculation to evaluate the binding free energy in implicit solvent, and is therefore
more expensive, but more accurate.
Glidescore: Use the GlideScore value from the docking run to score the peptide poses.
This is the Glide SP docking score, and is produced automatically as part of the docking
run.
For more information on the Glide docking process, see the Glide User Manual. For more
information on the MM-GBSA method used, see Chapter 8 of the Prime User Manual.
Docking peptides is time consuming as there are many conformations to explore. A single
peptide can take a few hours to dock. It is recommended that you run the job across multiple
processors if you can, especially if you want to dock multiple peptides. The number of proces-
sors and the number of subjobs can be specified in the Job Settings dialog box.
The job output is a file containing the receptor and the poses of the docked peptides (in a pose
viewer file). You can step through the poses in the presence of the receptor using the Pose
Viewer panel; see Chapter 6 of the Glide User Manual for more information.
Chapter 11
The question of whether one protein binds to another, and where, can be addressed by protein-
protein docking. Protein-protein docking in BioLuminate is performed using the Piper
program [5], under license from Boston University. The job can be set up in the Protein-Protein
Docking panel. In this panel you can set up jobs to dock two arbitrary proteins, dock an antigen
to an antibody, or dock one protein to itself to form a dimer or a trimer.
One protein is treated as the receptor and the other as the ligand. In the general case, it
does not matter which protein is treated as the receptor and which protein is treated as the
ligand. For antibody-antigen docking, the receptor is the antibody and the ligand is the antigen.
The algorithm samples all possible orientations of the two proteins, subject to whatever
constraints are applied. It uses a grid to locate the best poses of the two proteins, with a
maximum resolution in the poses of about 5. The docking is performed as a rigid-body opti-
mization: there is no subsequent minimization of the interfacial region.
To prepare your protein for docking, you can use the Protein Preparation Wizard (see the
Protein Preparation Guide). When you prepare your structure, you should select Fill in missing
side chains using Prime to predict the side chains, and Fill in missing loops using Prime to
predict the loops. The loop prediction used in the protein preparation is the faster look-up
method, rather than the more extensive ab initio loop building. Both of these predictions can
take several minutes.
If you are concerned about the accuracy of the surface side chains or loops, you can do a more
extensive prediction in the Refinement panel (Tasks → Loop and Sidechain Prediction). If you
have access to the X-ray data, you could consider performing some refinement of the structure
with PrimeX, the X-ray refinement program. Choose Tasks → Advanced Tasks → Protein X-Ray
Refinement → Display Toolbar, and use the toolbar to perform the refinement tasks. See
the PrimeX User Manual for more information.
In the docking experiment, however, the accuracy of the methods may not be sufficient to
distinguish between conformations, so an extensive prediction of the surface side chains or
loops is probably not necessary. The presence of the side chain or loop in the right region is
likely to be more important than its exact conformation. If you want to test the effects of loop
or side chain conformations, you can generate multiple conformations and dock each of them.
Next, choose the receptor protein and the ligand protein. It does not matter which of your two
proteins you choose as the receptor and which you choose as the ligand. To select the struc-
tures, click Receptor or Ligand in the Protein Structures section, and choose a source from the
menu that is displayed. The menu items are the same for each menu:
Browse for File: Opens a file selector so that you can browse to the location of the file
and select it. The structure is added to the project and to the Workspace.
From the Workspace: Use the structure that is in the Workspace. You should ensure that
only the desired structure is included in the Workspace. If you have prepared the protein
with the Protein Preparation Wizard, the structure should already be in the Workspace.
From PDB ID: Opens a dialog box in which you can specify a PDB ID. The protein you
specify is imported from the PDB, either from a local copy or from the web site. It is
added to the project and to the Workspace. Proteins from the PDB are likely to have
atoms missing.
When you import the structures, if hydrogens are missing, you are prompted to add them. If the
protein has multiple chains, you are prompted to choose the chains. You can choose more than
one chain for the docking.
When the protein is imported, the structure is added to the project and included in the Work-
space. If the structure is removed from the Workspace you can add it with the Include and
zoom button (eye icon).
To view the sequences of the proteins you imported in a sequence viewer, click View
Sequences.
When you have selected the proteins to dock, you can set two parameters to control the
docking. The first is the number of ligand orientations to the receptor that are sampled. You can
set the value in the Number of ligand rotations to probe box. The default of 70,000 corresponds
approximately to sampling every 5° in the space of Euler angles, and is the maximum value
allowed. Decreasing the number of rotations generally degrades the results, but decreases the
run time.
The second parameter is the number of poses to return, which you can set in the Maximum
poses to return box. Each pose is the center of a cluster that results from clustering the top
1000 results of rigid docking of the ligand. If more clusters are found than the number of poses
to return, the clusters are ranked by size and poses are chosen from the largest clusters. If fewer
clusters are found than the maximum number of poses to return, one pose per cluster is
returned.
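The selection of poses from the clusters can be sketched as follows. Each cluster is represented by its center pose and its size; the function name and the data layout are hypothetical, and the clustering itself is not shown.

```python
def select_poses(clusters, max_poses):
    """clusters: list of (center_pose, member_count); returns the chosen centers."""
    # Rank clusters from largest to smallest, then keep at most max_poses centers.
    ranked = sorted(clusters, key=lambda c: c[1], reverse=True)
    return [center for center, _ in ranked[:max_poses]]
```

When there are fewer clusters than the maximum number of poses, the function simply returns one pose for every cluster.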
After setting the parameters, click Run to run the docking job, or click the Settings button to
make settings for running the job. In the Job Settings dialog box, you can choose whether to
append the results to the project (Append new entries) or leave them on disk (Do not incorpo-
rate), name the job, choose a host to run the job on, and set the number of processors to use for
distribution of the work. A typical docking job with the default parameters takes several hours
on a single processor, so distributing the job over multiple processors (cores) reduces the turn-
around time considerably.
Constraints can be added in the Constraint section of the panel. Each constraint is represented
by a row in the table in this section. To add a new constraint row to the table, click Add
Constraint. All of the settings needed to define the constraint can be made in the cells of the
table.
Column Description
To define the constraint, you must choose the constraint type in the Type column, choose the
protein to apply it to in the Protein column, set a value in the Bonus column if it is an attractive
constraint, and then select the residues to apply the constraint to by using the Residues
column. You can select the residues in the Workspace, then choose From Workspace selection,
or you can choose Choose Workspace atoms, then use the Atom Selection dialog box to select
the residues for the constraint. See Section 6.5 of the Maestro User Manual for information on
using the Atom Selection dialog box.
To delete a constraint, click the red minus icon in the Actions column.
The dimer and the trimer are subject to symmetry constraints: the dimer must have a twofold
axis of rotation (C2 axis), and the trimer must have a threefold axis of rotation (C3 axis).
As for the other types of docking, you can choose the antibody and antigen in the Protein struc-
tures section. After prompting you to add missing hydrogens, the antibody is analyzed to
locate the CDR regions and determine which are the light and heavy chains. A progress bar is
displayed while the analysis is done. When the analysis finishes, another dialog box opens,
prompting you to select the chains to use for the antibody (receptor). You must select two
chains: a light chain and a heavy chain. When selecting the antigen, an alert box opens,
warning you about the chains. You can dismiss this box.
Apart from these changes, docking an antigen to an antibody is set up and run in the same way
as a general protein-protein docking job.
Chapter 12
If you have a protein sequence and want to build a homology model of the protein, there are
three ways that you can proceed in the BioLuminate interface.
Proteins for which you expect a high homology with the template and that require only a
straightforward alignment to the template can be modeled in the Homology Model panel. In the
default mode, any missing loops are predicted using a curated database of known loops in the
PDB. This approach is very fast and a full homology model can typically be generated using
this panel in 2-5 minutes. To open this panel, choose Tasks → Homology Modeling → Simple
Homology Modeling. The use of this panel is described below.
Proteins where the homology is not as high or where alignment of the template and the query is
required can be modeled either in the Multiple Sequence Viewer panel (opened with Tools →
Multiple Sequence Viewer), or with the Structure Prediction panel (opened with Tasks →
Homology Modeling → Advanced Homology Modeling). For information on using these panels,
see the Multiple Sequence Viewer document for the Multiple Sequence Viewer and the Prime
User Manual for the Structure Prediction panel.
For a tutorial introduction to homology modeling using the Homology Modeling panel, see
Chapter 6 of the BioLuminate Quick Start Guide. For a tutorial introduction to advanced
homology modeling, see the Prime Quick Start Guide.
2. Choose a source for the template structure on which to build the model, using the Tem-
plate button.
You can either read in a template structure from a Maestro file or a PDB file (Browse for
File), or you can run a BLAST search, as follows:
When the job finishes, the model is added to the Workspace, in cartoon representation. The
cartoon is colored by how the template was used: dark blue for residues for which all coordi-
nates were taken from the template, cyan for residues for which the backbone was taken from
the template, and red for residues that were entirely modeled, not using the template. The title
given to the model includes information on the query and the template.
If the job fails, it is likely that there is insufficient homology or poor alignment between the
reference and the template to build a model. In this case you should use the Advanced
Homology Modeling panel (Tasks → Homology Modeling → Advanced Homology Modeling) or
the Multiple Sequence Viewer (Tools → Multiple Sequence Viewer).
If you want to examine the quality of the model structure, click Examine Model Quality, to open
the Protein Structure Quality Viewer panel, where you can view reports on the protein structure,
a Ramachandran plot, and plots of protein properties.
If you want to refine loops in the model, click Refine Loops. The Refinement panel opens with
the Refine loops task selected. You should only need to refine the loops if they were not
predicted from the template. Click Non-Template in the Refinement panel to list only the loops
that did not come from the template. See Chapter 6 of the Prime User Manual for information
on this panel.
Chapter 13
At some point in a workflow, you might want to mutate a single residue, or replace a single
loop with another loop. You can do this in the Residue and Loop Mutation panel, which you
open from the Tasks menu.
BioLuminate provides other tools for mutations of more than one residue. The Residue Scanning
panel allows you to mutate a protein at multiple sites to generate a set of proteins, each
with a single mutation; see Chapter 15 for information. If you want to mutate residues to or
from cysteine and break or form disulfide bonds, you should use the Cysteine Mutation
panel; see Chapter 16 for information.
The workflow for each of these mutations is summarized here and detailed in the following
sections.
To find a particular residue in the Workspace, you can use the Find tool. Type CTRL+F (⌘F) in
the Workspace to display the Find toolbar below the Workspace. You can then choose Residue
number or Residue type from the Find menu to choose residues by number or type, then enter
the number in the text box or choose the type from the menu, and click the N or P button.
To find a residue in the sequence viewer in the Residue and Loop Mutation panel, enter the
residue letter in the Find Pattern text box, and click the arrow keys to step through the occur-
rences. You can also enter multiple residues to find a pattern, e.g. D-A-P. The pattern uses
PROSITE syntax, which is explained in the tool tip. When you have found the residue you
want, clear the Find Pattern text box, then click on the residue to select it, or simply select it in
the Workspace. The residues that are found are selected in the Workspace, so clicking on one
of them changes the selection to the desired residue.
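The pattern search described above can be reproduced outside the panel for quick checks on a sequence. The following is a minimal sketch, assuming the common PROSITE elements (a dash between positions, x for any residue, square brackets for allowed residues, braces for excluded residues, and a repeat count in parentheses). The function names prosite_to_regex and find_pattern are illustrative and are not part of BioLuminate, and the subset of PROSITE syntax that the panel accepts may differ.

```python
import re

# One pattern element: optional "<" anchor, a body, an optional repeat count
# such as (2) or (2,4), and an optional ">" anchor.
_ELEMENT = re.compile(r"(<?)([^()<>]+)(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern such as 'D-A-P' into a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"Cannot parse pattern element: {element!r}")
        start, body, low, high, end = match.groups()
        if body in ("x", "X"):
            token = "."                       # any residue
        elif body.startswith("[") and body.endswith("]"):
            token = body                      # any one of the listed residues
        elif body.startswith("{") and body.endswith("}"):
            token = "[^" + body[1:-1] + "]"   # any residue except those listed
        elif len(body) == 1 and body.isalpha():
            token = body.upper()              # a single specific residue
        else:
            raise ValueError(f"Unsupported pattern element: {element!r}")
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + token + ("$" if end else ""))
    return "".join(parts)


def find_pattern(sequence: str, pattern: str) -> list:
    """Return the 1-based start positions of all matches, including overlaps."""
    lookahead = "(?=(?:" + prosite_to_regex(pattern) + "))"
    return [m.start() + 1 for m in re.finditer(lookahead, sequence)]
```

For example, find_pattern("MDAPKDAPG", "D-A-P") reports the two occurrences that start at positions 2 and 6 of that sequence.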
If the sequence viewer does not have a sequence in it, you can click Import and choose From
Workspace to load the sequence of the Workspace structure.
To begin editing a standard amino acid in the Workspace to produce a custom amino acid,
check Edit. The amino acid replaces the protein in the Workspace, with the backbone repre-
sented as ball and stick and the side chain as lines.
To add groups to the structure or to change elements, you can use the Build and the Fragments
toolbars. Click Build or Fragments on the Manager toolbar at the top of the main window to
display these toolbars. With the Fragments toolbar, you can select a fragment, then click on an
atom to replace that atom with the fragment. With the Build toolbar, you can sketch a structure
with the Draw tool, and change the element with the Set Element tool. You might also want to add
hydrogens after sketching a structure, which you can do from the Edit toolbar with the Add H
tool. You should also use the Clean Up tool after sketching the structure and adding hydrogens,
to ensure that the structure is not distorted.
For more information on building structures, see Chapter 5 of the Maestro User Manual.
When you have finished editing, clear the Edit check box. A dialog box prompts you to provide
a 3-letter PDB name for the custom amino acid. The default is USR. You can edit the name in
the Three-letter PDB Name text box after creating a custom amino acid.
The text on the amino acid option menu is set to Custom when you finish editing.
After the mutation is defined, the mutation is reported in the panel, and a title for the mutated
structure is entered in the Mutated structure title text box. You can edit this title if you wish.
There are two types of refinement available: minimization and molecular dynamics simulation.
You can choose one or the other or both. The minimization is run first.
Minimization is the fastest option, and is a good choice if the mutated residue is in approxi-
mately the right conformation. If you choose to do a minimization, you can run it in the gas
phase or in implicit solvent. The residues minimized can be limited to just the altered residues,
all residues within a given distance of the altered residues, or the entire structure.
Molecular dynamics is much more time-consuming, but it can sample other parts of conforma-
tional space, particularly if you run simulated annealing. Once the structure is mutated and any
minimization is done, the protein is prepared for the simulation by adding explicit water mole-
cules. The Molecular Dynamics panel or the Simulated Annealing panel opens, and you can
make settings and run the simulation, which uses the Desmond molecular dynamics program.
For more information on these panels, see Chapter 3 of the Desmond User Manual.
If you decide you do not want to run the simulation, which can take many hours, you should
delete the entry group in the Project Table that was created for the simulation, as follows:
1. Select the entry group (click the row with the number in square brackets).
2. Choose Entry → Unlock.
3. Choose Entry → Delete.
For information on using the Project Table, see Chapter 9 of the Maestro User Manual.
The basic procedure is summarized below, and details are given in the following sections.
5. If desired, load a new structure and select residues to specify a starting set of residues for
the replacement loop.
By default, the replacement loop starts out identical to the original loop.
6. If desired, edit the replacement loop by clicking on residues in the table and specifying an
insertion, deletion or mutation at that point.
7. Choose the number of models to generate and the prediction method.
8. Click Accept.
The Insertions, Deletions, and Loop Swaps panel closes.
9. Click Mutate.
No refinement of the loop modification is offered, so you must perform any refinement inde-
pendently.
Browse for File: Open a file browser in which you can navigate to the desired location
and select the file that contains the structure. The allowed file types are Maestro and
PDB.
From PDB ID: Import the structure from the specified PDB ID. Opens the Enter PDB ID
dialog box, in which you can enter the PDB ID of the sequence. The sequence and structures
are retrieved from a local copy of the PDB if it is available, or from the RCSB web
site, depending on the preference set for PDB retrieval.
From Workspace: Import the structure that is displayed in the Workspace.
The sequence for the imported structure is added to the sequence viewer. This allows you to
import multiple structures, and choose which structure to take a loop from.
When you have the structure that you want to mutate, you can select the loop to mutate either
in the Workspace or in the sequence viewer in the panel.
Selecting the loop in the Workspace allows you to visually identify the loop. To ensure that you
are picking residues in the Workspace to define the loop, choose Edit → Pick Mode → Residues,
or type the letter R with the pointer in the Workspace. You can then pick residues that
belong to the loop you are interested in. The residues that you pick are highlighted in the
Workspace (yellow dots) and in the sequence viewer (white letter on blue background).
Selecting the loop in the sequence viewer allows you to easily search for patterns in the
sequence, or to use the secondary structure assignment (SSA) to identify loops. The SSA has
no annotation where there are loops (the absence of any other secondary structure). To search
for patterns in the sequence, enter the pattern in the Find Pattern text box, with a dash sepa-
rating each residue from the next, e.g. D-A-P. The syntax is summarized in the tool tip for the
text box, and is given in more detail on page 40. You can use the arrow keys to step through the
patterns. All of the matches are selected in the Workspace, so you might have to select the loop
that you want in the Workspace, or by clearing the Find Pattern text box, then selecting the
residues to use.
When you select the residues for the loop to mutate, you should select at least two residues on
either side of the residues that you plan to modify. These extra residues are stem residues,
which are used by the Prime software when rebuilding the loop, to properly fit the new loop
onto the structure. For example, if a single residue is being deleted, the original loop should
consist of five residues - the residue to delete and the two residues on either side of it.
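The stem rule above can be expressed as a short calculation. This is a minimal sketch, assuming that residues are numbered consecutively with no gaps or insertion codes around the loop. The function name loop_with_stems is hypothetical and is used only for illustration.

```python
def loop_with_stems(modified_residues, stem_length=2):
    """Return the residue numbers to select for the original loop.

    modified_residues: residue numbers that will be inserted at, deleted,
    or mutated. stem_length: number of unmodified stem residues to include
    on each side (the text recommends at least two).
    """
    if not modified_residues:
        raise ValueError("At least one residue to modify is required")
    first = min(modified_residues) - stem_length
    last = max(modified_residues) + stem_length
    return list(range(first, last + 1))
```

Deleting the single residue 45, for instance, gives loop_with_stems([45]) as residues 43 through 47, which is the five-residue loop described in the example above.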
After you have selected the residues, click Workspace Selection in the Original loop section, to
register the selection and copy the structure for the loop mutation job. The original loop is used
by default for the replacement loop, which you can modify as described in the next section.
If you want to replace the loop with a loop from another structure, you can place another struc-
ture in the Workspace, and select a loop from this structure. If the structure is not already in the
sequence viewer, you can import it as described above. You can also select the loop in the same
way as for the original loop.
You might want to align the sequence for the replacement to the sequence for the original
structure, so that you can select the loop by its alignment. To do the alignment, make sure that
the original sequence is at the top of the sequence viewer (use the shortcut menu for the
sequence name to move it if necessary), and add the replacement sequence to the selected
sequences. You can then use either the pairwise alignment tool or the multiple alignment tool
to align the sequences (using ClustalW).
You can also do manual alignment, with locking and unlocking of gaps. See the Multiple
Sequence Viewer document for more information on doing manual alignment.
When you have selected the residues in the desired structure for the replacement loop, click
Workspace Selection in the Replacement loop section. If you decide not to use this loop, you
can click Set to Original Loop to change the replacement loop back to the original loop struc-
ture.
When you choose to insert a residue, a new column is added to the table before or after the
current position. The cell in the Original row has three dashes, to indicate that there is no
residue in the sequence at this position. The cell in the Replacement row has the residue name
between two colons, to indicate that the chain and residue number are not yet assigned. Inser-
tion codes are added for the inserted residues when the job is run.
If you delete a residue, the cell in the Replacement row is set to three dashes, to indicate the
deletion.
Likewise, if you mutate a residue, the cell in the Replacement row has the new residue name
between two colons.
You can also choose whether to create the loops via a fast table-lookup method or via a full
Prime loop prediction, which builds loops by sampling multiple conformations and scoring
them, to produce the best loop structures. The fast method takes only a minute or so, whereas
the thorough loop sampling can take hours. To make the choice, select Fast or Thorough under
Loop prediction. No additional refinement beyond these methods is done for new loop predic-
tions.
When you have finished selecting options, click the Accept button to return to the main
Residue and Loop Mutation panel. Click the Mutate button on that panel to begin the loop swap
job.
Chapter 14
As part of protein design, it can be useful to cross-link two proteins. For example, suppose you
have two oligopeptide fragments or protein domains that bind to a third protein. Both frag-
ments or domains need to bind to the third protein for function. To increase efficacy, you might
want to try to tether those two fragments or domains together. Another use for cross-linking is
for circular permutation of a protein, in which you connect the termini and break the chain at
some other point. Provided the break is outside the binding region, you could create a protein
that still binds a ligand, but may interact differently in the cellular environment.
The Crosslink Proteins panel allows you to cross-link two pre-positioned proteins by connecting
chain termini with peptide linkers. To open this panel, choose Tasks → Crosslink Proteins.
Before you can add the linkers, you must ensure that the proteins are in a single project entry. If
they are in different entries, you can create a project entry by clicking the Workspace button on
the Manager toolbar to show the Workspace toolbar, then click the Create Entry button, and
name the entry in the dialog box that opens.
As for any modeling exercise, you should ensure that the protein is prepared, by using the
Protein Preparation Wizard. To cross-link the proteins, you must delete all het groups and
waters when you prepare the protein, and ensure that the protein contains only the standard
amino acids. The het groups can be restored later, for example by creating another project entry
that contains only the het groups and waters, then merging this entry with the results of the
cross-linking.
When you pick the termini, it can be useful to select them in the Workspace first. To do this,
you can display the sequence viewer (Edit → Settings → Show Sequence Viewer), select the
terminal residues in the sequence viewer, then choose A → Zoom in the Selection row in the
Toggle Table. The residues are marked with yellow selection markers, which makes them easy
to pick.
To pick the first terminus, select Pick residue in Workspace for Connection residue one, and
pick a terminal residue in the Workspace. A warning is posted if you pick a non-terminal
residue. The text box is filled in with the residue ID in the form chain:resnum(resname), e.g.
A:1(PRO). The alpha carbon of the residue is marked with a green sphere in the Workspace.
After the residue is picked the check box is automatically cleared.
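If you need to process residue IDs in this form in a script, they can be split into their parts with a regular expression. The sketch below assumes the chain:resnum(resname) layout shown above, with an integer residue number and no insertion code. The names ResidueId and parse_residue_id are hypothetical.

```python
import re
from typing import NamedTuple


class ResidueId(NamedTuple):
    chain: str
    resnum: int
    resname: str


# chain name, colon, residue number, residue name in parentheses
_RESIDUE_ID = re.compile(
    r"^(?P<chain>[^:]*):(?P<resnum>-?\d+)\((?P<resname>[A-Za-z0-9]{1,4})\)$"
)


def parse_residue_id(text: str) -> ResidueId:
    """Parse an ID such as 'A:1(PRO)' into chain, number, and residue name."""
    match = _RESIDUE_ID.match(text.strip())
    if match is None:
        raise ValueError(f"Not a chain:resnum(resname) ID: {text!r}")
    return ResidueId(
        chain=match.group("chain"),
        resnum=int(match.group("resnum")),
        resname=match.group("resname").upper(),
    )
```

Parsing "A:1(PRO)" in this way yields chain A, residue number 1, and residue name PRO.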
To pick the second terminus, select Pick residue in Workspace for Connection residue two, and
pick a terminal residue in the Workspace. If residue one was an N-terminal residue, residue
two must be a C-terminal residue, and vice versa. A warning is posted if you pick a non-
terminal residue or a terminus of the wrong type. The text box is filled in with the residue ID as
for the first residue, and the alpha carbon of the residue is marked with a green sphere in the
Workspace. When the residue is picked the check box is automatically cleared, and the two text
boxes are colored green to indicate that the definition of the connection residues is complete.
After the two residues are picked, the Inter-residue distance text box is filled in with the
distance between the alpha carbons of the two picked residues. The distance is used to calcu-
late and display the approximate number of linker units needed to link the proteins.
To define a monomer, enter the sequence of 1-letter codes for the monomer in the Define multi-
residue monomer text box, and click Add to List.
To import monomers, click Add from File, and choose the file in the file selector that opens.
The file must be a text file with one sequence per line. All the sequences in the file are added to
the Choose Monomers option menu.
The next step after defining the monomers is to select the monomers you want to use, from the
Choose monomers menu. The monomers that you define are automatically selected, and their
sequence is displayed in the text box of the menu. To add a monomer, choose it from the menu.
The selected monomers have a check mark next to them. To remove a monomer, choose it from
the menu; the check mark is removed and the sequence is removed from the text box. You can
choose monomers of different lengths to create linkers with different numbers of residues.
Once you have chosen monomers for the linkers, the Inter-residue distance / Average chosen
monomer length text box reports the ratio of the inter-residue distance to the average length of
the monomers chosen for the linker. This ratio gives an approximate number of monomers that
must be included in the linker to form a proper link. You can set the minimum and maximum
number of monomers in the linkers on the basis of this information, in the Lengths of linker
chains to build boxes.
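The ratio reported in the panel can be approximated by hand when you are planning which monomers to use. The following is a rough sketch only: the span of 3.5 Å per residue is an illustrative assumption, not the value used by the panel, and the function names are hypothetical.

```python
import math

# Illustrative assumption: each linker residue spans about 3.5 angstroms.
# The value that the panel uses is not stated in this manual.
ASSUMED_SPAN_PER_RESIDUE = 3.5


def average_monomer_length(monomers, span_per_residue=ASSUMED_SPAN_PER_RESIDUE):
    """Average length in angstroms of the chosen monomers (1-letter sequences)."""
    if not monomers:
        raise ValueError("Choose at least one monomer")
    return span_per_residue * sum(len(m) for m in monomers) / len(monomers)


def monomer_ratio(inter_residue_distance, monomers,
                  span_per_residue=ASSUMED_SPAN_PER_RESIDUE):
    """Inter-residue distance divided by the average monomer length."""
    return inter_residue_distance / average_monomer_length(monomers, span_per_residue)


def suggested_linker_lengths(ratio, slack=1):
    """Bracket the ratio to get a minimum and maximum number of monomers.

    The slack adds extra monomers to the maximum, because a linker rarely
    follows a straight line between the two termini.
    """
    minimum = max(1, math.floor(ratio))
    maximum = math.ceil(ratio) + slack
    return minimum, maximum
```

With the monomers GGS and GGGGS and an inter-residue distance of 35 Å, this sketch gives a ratio of 2.5 and suggests building linkers of 2 to 4 monomers under the stated assumption.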
The monomer units in each linker are selected randomly for each linker that has more than one
monomer. The number of linkers to construct for each number of monomers can be specified
explicitly or as a percentage of the possible variations, by choosing one of the Random compo-
sition options, and providing the number or percentage in the box. The total possible linkers
scales as N^M, where N is the number of monomers chosen and M is the linker length.
The random selection can be modified by specifying the fraction of each monomer that you
want in the linkers. Select Fraction of monomer, choose the monomer from the option menu,
and set the limits in the text boxes. Linkers for which the fraction of each monomer does not
fall within the specified range are discarded. No checking is done that the fractions add up to 1,
so you must ensure that your choices do not result in impossible requirements (such as setting
the minimum for two monomers greater than 0.5). A warning is presented if no linkers are
generated, so if you see this warning, you should check the fractions, if you have set them.
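To see how the fraction limits prune the set of possible linkers, and why some limits leave no linkers at all, you can enumerate the candidates for a small case. This is a sketch under stated assumptions: it enumerates every composition exhaustively rather than sampling at random as the panel does, and it takes the fraction of a monomer to be its count divided by the number of monomers in the linker. The function names are hypothetical.

```python
from itertools import product


def enumerate_linkers(monomers, length):
    """All ordered choices of `length` monomers: len(monomers) ** length tuples."""
    return list(product(monomers, repeat=length))


def within_fraction_limits(linker, limits):
    """Check a linker against limits of the form {monomer: (minimum, maximum)}."""
    for monomer, (low, high) in limits.items():
        fraction = linker.count(monomer) / len(linker)
        if not (low <= fraction <= high):
            return False
    return True


def filter_linkers(monomers, length, limits):
    """Return the sequences of the linkers that satisfy every fraction limit."""
    return [
        "".join(linker)
        for linker in enumerate_linkers(monomers, length)
        if within_fraction_limits(linker, limits)
    ]
```

For two monomers and a linker length of 3 there are 2^3 = 8 candidates. Requiring a minimum fraction of 0.6 for one monomer keeps 4 of them, while requiring a minimum of 0.6 for both monomers keeps none, which is the impossible requirement described above.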
Loop lookup from curated PDB: Use a table of known loops taken from the PDB to
determine the loop conformation. This is the fastest method.
Simple de novo loop creation: Build the loop residue by residue to produce a single loop
conformation that has no clashes with the existing structure.
It is a good idea to refine the structure after it is built, as the simpler methods used here might
not give the optimal conformation, though a minimization of the linker is performed as part of
the procedure.
The chains are connected with each linker that is generated, and the strain energy for the linker
is evaluated as the difference between the linker energy in the linked conformation and the
minimum energy of the linker in the unbound conformation. You can choose a method for
calculating the strain energy of the new loop from the Energy calculation option menu.
To start the job, click Run, or click the Settings button and make settings in the Job Settings
dialog box, then click Save and Start. The time taken depends on the length and number of
linker chains.
The strain energy is useful to score multiple possible cross-linking chains on a relative basis.
Generally, chains with lower strain energies are better. Performing a search of multiple lengths
and conformations and focusing on those cross-links with the lower strain energies helps to
select those candidates that are more likely to accommodate the connected domains in the
starting conformation. However, note that the strain energy is only one component of the
energy of the linked protein structure: it does not include the interaction energy between the
linker and the protein, which can compensate for the strain.
If you want to examine the structures, use the Step through results buttons, which display the
structures in turn in the Workspace. The table row for the structure in the Workspace is high-
lighted. You can also add the original structure to the Workspace by selecting Show original
structure in gray for comparison.
If you want to display the results at some other time, you can click Export, and export the
results to a zip file. The results are also present in the Project Table. To view the results in the
Results tab later, you can click Import, and navigate to the zip file.
Chapter 15
This chapter describes how to scan a protein for potential residue mutations, generate mutated
structures, and compare the properties of the mutated structures. The mutation sites and the
mutations can be selected manually or selected automatically based on homology modeling or
3D structural criteria.
If you have not already opened the panel, open it from the Tasks menu, as described above.
You can display the protein before or after opening the panel.
The first part of the procedure is to import the protein into the workflow. You can either use the
Workspace structure, selected residues in the Workspace structure, or a structure from a
Maestro file. If you choose one of the Workspace options from the Import structure from option
menu, click the Import button to import the protein. If you choose Maestro File, click the
Browse button to locate and open the file.
If your structure has not been prepared for modeling, you are asked if you want to use the
Protein Preparation Wizard to prepare the structure. You cannot proceed if you do not have a
protein that is an all-atom, 3D structure with bonding information. After preparing the protein,
you can import it into the workflow.
When the structure has been imported and analyzed, the residues table is filled in with all the
residues in the chain or chains that can be mutated.
For more information on the stability and affinity, see Section 15.7 on page 109.
The residues that are mutated when you run the job are the residues that have mutations
defined. The number of residues for which mutations are defined is reported at the bottom of
the tab, below the report of the number of residues selected in the table.
There are two main ways of choosing the mutations: manually, or based on analysis of homo-
logs to the structure of interest. Manual selection is described in this section, and using
homology is described in Section 15.4 on page 102.
You can choose mutations for individual residues, or you can apply a set of mutations to a
number of residues.
When you select rows in the table, the Workspace view zooms in to the residues you have
selected, if Fit on select is checked. The selected residues are highlighted with green carbons,
and the remaining residues are dimmed. (You can adjust how far the view zooms in by making
a setting in the Preferences panel. Choose Edit → Settings → Preferences, select Fitting under
Workspace in the tree on the left, then enter a value in the Fit margin text box.)
A text message is displayed at the bottom of the tab that reads Will mutate M of N residues;
Total res types: K. The number of mutations is the total number of structures generated. Only
one site is mutated at a time, so the number of structures is linear in both the number of sites
mutated and the number of mutations at each site.
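Because only one site is mutated in each structure, the size of a residue scanning job is simply the sum of the mutations defined at each site. A minimal sketch follows; the dictionary layout and the function name count_scan_structures are illustrative and do not correspond to a BioLuminate file format.

```python
def count_scan_structures(mutations_by_site):
    """Count the structures generated by a single-site residue scan.

    mutations_by_site maps a residue ID to the residue types it is mutated to.
    Duplicate residue types at a site are counted once.
    """
    return sum(len(set(mutations)) for mutations in mutations_by_site.values())


# Hypothetical selection: three sites with two, one, and three mutations.
example = {
    "A:45": ["ALA", "GLY"],
    "A:46": ["ALA"],
    "A:50": ["ALA", "GLY", "SER"],
}
```

For the example selection, count_scan_structures(example) is 6, one structure for each mutation at each site.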
Homology suggestions are generated for a single chain. You can choose the chain from the
Chain option menu. Once the selection is complete, click Save to apply the suggestions to the
residue list, including the mutations obtained from the homologs. To generate suggestions for
more than one chain, reopen the panel, and choose another chain.
At the end of the job, the BLAST Search Results dialog box opens, so that you can choose how
many of the homologs you want to use. You can do one of the following to select homologs:
When you have selected the homologs, click Incorporate Selected Rows to add the structures
to the sequence viewer.
If you have a set of homologs in a file, click the Import homologs button and choose From
file. A file selector opens so you can locate and import the file.
If you want to manually import structures from the PDB, click Import and choose From
PDB ID, then specify the PDB ID in the dialog box that opens.
If the structures are in the Workspace, click Import and choose From Workspace.
To run the alignment job, click Align Homologs, or click the Multiple Alignment toolbar button.
The alignment, which uses ClustalW, is usually quick, but a progress dialog box may be
displayed briefly. The alignment adds gaps in the sequence viewer as necessary.
To align sequences manually, you can drag residues to the right or the left, to fill or create gaps.
If you have already created gaps that you want to preserve during another alignment, you can
click the Lock Gaps toolbar button. The gap symbol changes to - to indicate that the gap is
locked. To unlock the gaps again, click the Unlock Gaps toolbar button.
Variability at position >/< N %: Select residues based on the percentage variability at the
residue position. The variability is the normalized Shannon entropy for residue variation
at the position, expressed as a percentage. You can apply a minimum or a maximum variability
by choosing from the option menu, and specify the cutoff in the text box.
Variability at position >/< N residue types: Select residues based on the residue type
variability at the residue position. The variability is the number of residue types observed at
the position. Choose whether to apply a minimum or a maximum variability from the
option menu, and specify the threshold for the number of residue types that can vary in
the text box.
Ignore positions with group conservations: Ignore (do not select) residues that are
strongly conserved or that are either strongly or weakly conserved. Select the appropriate
option for strong conservation or both strong and weak conservation.
Strong conservation means that all residues at a particular position are in one of the fol-
lowing groups: STA, NEQK, NHQK, NDEQ, QHRK, MILV, MILF, HY, FYW. Weak
conservation means that all residues at a particular position are in one of the following
groups: CSA, ATV, SAG, STNK, STPA, SGND, SNDEQK, NDEQHK, NEQHRK,
FVLIM, HFY. These definitions are those used by ClustalW.
Ignore parent sequence in applying above criteria: Ignore the parent (query) sequence
when applying the variability or conservation criteria. The selection is then based on the
variability in the homologs only.
Parent residue different than N % homologs at position: Select residues for which the
parent residue is different from more than the specified percentage of homologs at the
residue position.
When residues are selected on the basis of homology, a default set of mutations is also defined.
The mutations for a given residue include all the variants found at that residue position.
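The homology criteria above can be illustrated for a single alignment column. The sketch below makes several assumptions that are not stated in this manual: the Shannon entropy is normalized by the logarithm of 20 (the number of standard residue types), gap characters are ignored, and a column with a single residue type is reported separately from group conservation. The function names are hypothetical; the group definitions are those listed above.

```python
import math
from collections import Counter

STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def percent_variability(column, alphabet_size=20):
    """Normalized Shannon entropy of an alignment column, as a percentage."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = 0.0
    for count in Counter(residues).values():
        p = count / total
        entropy -= p * math.log(p)
    return 100.0 * entropy / math.log(alphabet_size)


def residue_type_count(column):
    """Number of distinct residue types observed in the column."""
    return len(set(column) - {"-"})


def conservation(column):
    """Classify a column as 'identical', 'strong', 'weak', or 'none'."""
    residues = set(column) - {"-"}
    if len(residues) <= 1:
        return "identical"
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return "strong"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "weak"
    return "none"


def default_mutations(parent, homolog_column):
    """Variants seen in the homologs at this position, excluding the parent."""
    return sorted(set(homolog_column) - {parent, "-"})
```

A column containing S, T, and A falls in the strong group STA, whereas a column containing C, S, and A matches only the weak group CSA.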
Solvent accessible surface area: Select this option to select residues by their
solvent-accessible surface area (SASA) relative to an isolated residue of the same type, and set a
threshold for the maximum or minimum allowed relative SASA. This option is useful for
locating surface or buried residues. (The SASA is calculated by the Lee-Richards method
[11], which is also used in Canvas and Phase).
Residue side chain makes no more than N interactions with protein: Select this option to
filter out residues whose side chains have multiple interactions with other protein residues,
and set the maximum number of residues with which the side chain has interactions.
Residue side chain interacts/does not interact with molecule N: Select this option to
select residues by their interaction with a particular molecule. Choose whether to allow or
disallow the interaction from the option menu, and specify the molecule in the text box,
or select Pick and pick the molecule in the Workspace. This is useful for mutating residues
near a ligand, or for mutating residues at protein-protein interfaces, for example.
Interactions are determined by a distance cutoff: any residue that has atoms within 4 Å of the
side chain is considered to interact.
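The distance criterion can be sketched with plain coordinate lists. This is an illustration only, assuming Cartesian coordinates in Å and treating a distance exactly equal to the cutoff as an interaction. The function name interacting_residues and the data layout are hypothetical and are not the BioLuminate implementation.

```python
import math


def interacting_residues(side_chain_atoms, residues, cutoff=4.0):
    """List the residues that have any atom within `cutoff` of the side chain.

    side_chain_atoms: list of (x, y, z) tuples for the side chain of interest.
    residues: dict mapping a residue ID to a list of (x, y, z) atom tuples.
    """
    hits = []
    for residue_id, atoms in residues.items():
        close = any(
            math.dist(side_atom, atom) <= cutoff
            for side_atom in side_chain_atoms
            for atom in atoms
        )
        if close:
            hits.append(residue_id)
    return hits
```

The length of the returned list is the number of interacting residues, which is the quantity compared with N in the side-chain interaction filter.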
If you select residues only by their structural attributes, there are no suggestions for the muta-
tions, so the result is that the matching residues are selected in the residues table. You can then
set the mutations for these residues. If structural attribute criteria are combined with homology
criteria, then the mutations are set as well as selecting the residues.
The region around the mutation site that is relaxed during the calculations is defined by a cutoff
distance, which you specify in the Cutoff text box. Any residue that has an atom within the
specified distance of a hypothetical Arg residue at the mutation site is included in the refine-
ment. A hypothetical Arg residue is used to ensure that the set of residues refined is identical,
regardless of the initial or mutated residue identities. This ensures that comparisons of the
properties of the mutated structures are not affected by the choice of residues that are relaxed.
Before the minimization of the region around the mutation site, a side-chain prediction of the
mutated residues is performed, which does a thorough exploration of possible side-chain
conformations. The method used for the side-chain prediction can be chosen from the Refine-
ment option menu. The choices are:
If you want to use an implicit membrane model for the protein, you can set it up by clicking
Advanced Options. Select Use implicit solvent membrane, and click Define Membrane to place
the membrane on the protein. The membrane is modeled with a low-dielectric slab.
mutations per residue small if you choose this option, to avoid combinatorial explosion.
For example, mutating 5 residues with only one mutation per residue results in 31 possible
mutated structures; with two mutations per residue the number of possible mutated
structures is 242; with three mutations it is 1023; with four it is 3124; and with ten it is
161050.
You can set a limit on the number of mutations in any structure in the Maximum number of
simultaneous deviations from input text box. This restricts the search space, which is useful if
you are making many mutations at a number of sites.
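The growth in the number of possible structures, and the effect of limiting the number of simultaneous mutations, can be estimated before you run a job. The sketch below assumes that every site has the same number of mutations and that a structure counts as mutated if at least one site differs from the input. With no limit the count is (m + 1)^n - 1 for n sites and m mutations per site. The function name count_combinations is hypothetical.

```python
from math import comb


def count_combinations(n_sites, mutations_per_site, max_simultaneous=None):
    """Count the mutated structures for combined mutations at several sites.

    Each term chooses k of the n sites to mutate and one of the m mutations
    at each chosen site. max_simultaneous caps the number of sites mutated
    at once; None means no cap.
    """
    if max_simultaneous is None:
        limit = n_sites
    else:
        limit = min(max_simultaneous, n_sites)
    return sum(
        comb(n_sites, k) * mutations_per_site ** k
        for k in range(1, limit + 1)
    )
```

For 5 sites with ten mutations each, limiting the structures to at most two simultaneous mutations reduces the count from 161050 to 1050.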
You can also specify the maximum number of output structures to return in the Maximum
number of output structures text box. The structures that are returned are those with the best
(most negative) value of the binding affinity or the stability.
The change in stability is always calculated, and also the solvent-accessible surface area
(SASA), hydropathy, rotatable bonds, and complementarity. The properties are described in
more detail in Table 15.1 on page 110. By default, the pKa property is not calculated, as this is
the most time-consuming property to calculate. To calculate it, click Advanced and select
Report changes in pKa in the Residue Scanning - Advanced Options dialog box.
If you run the job on a multiprocessor host, you can divide the job into subjobs and distribute
them over multiple processors. The minimum work a subjob can do is to mutate one residue to
another residue, so you should not specify more subjobs than there are mutations. For optimal
load balancing, the number of subjobs should be a few times the number of processors. The job
takes several minutes per residue mutation to run, depending on the refinement options.
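The guidance on subjobs can be turned into a simple rule of thumb. In this sketch, "a few times the number of processors" is taken to mean three times, which is an assumption; the function name choose_subjob_count is hypothetical and is not a BioLuminate setting.

```python
def choose_subjob_count(n_mutations, n_processors, subjobs_per_processor=3):
    """Suggest a number of subjobs for a distributed residue scanning job.

    The result never exceeds the number of mutations, because one mutation
    is the smallest unit of work that a subjob can do.
    """
    if n_mutations < 1 or n_processors < 1:
        raise ValueError("Need at least one mutation and one processor")
    return min(n_mutations, n_processors * subjobs_per_processor)
```

A job with 100 mutations on 8 processors would be split into 24 subjobs by this rule, whereas a job with only 10 mutations would use 10 subjobs.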
Monte Carlo affinity maturation jobs perform a random walk with a different random seed in
each subjob if the job is distributed over multiple processors. The results of each walk are
collated at the end to select the structures with the best values of the stability or affinity.
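The collation step can be pictured as merging the per-subjob results and keeping the entries with the most negative values. The sketch below assumes each result is a (label, value) pair, where the value is the change in stability or affinity. The data layout and function name are illustrative assumptions, and any handling of duplicate structures found by different walks is not shown.

```python
import heapq
from typing import Iterable, List, Tuple

Result = Tuple[str, float]  # (mutation label, change in stability or affinity)


def collate_best(subjob_results: Iterable[Iterable[Result]],
                 max_structures: int) -> List[Result]:
    """Merge results from all subjobs and keep the most negative values."""
    merged = [result for results in subjob_results for result in results]
    # nsmallest returns the entries in ascending order of value (best first).
    return heapq.nsmallest(max_structures, merged, key=lambda r: r[1])


if __name__ == "__main__":
    walks = [[("A", -1.0), ("B", -3.5)], [("C", -2.0)]]
    print(collate_best(walks, max_structures=2))
```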
If you want to examine results from a job that was completed earlier, you can open this panel
by choosing Tasks → Residue Scanning → View Results. You can then click Import to locate
the Maestro file (.maegz) that contains your results and import it into the current project.
The results are listed in the Mutations table, with one mutated structure per row. The mutations
in each structure are identified by the residue positions and the original and mutated residue
names. For residue scanning, there is only one mutation per structure, but for affinity matura-
tion, there may be multiple mutations. The properties that were calculated for each mutant are
listed in the table. The properties are described briefly in Table 15.1. Some of these properties
are described in more detail in the sections below. In addition to the properties listed, the
changes in the Prime energy properties are also available. All properties are calculated after the
refinement is performed, and so include relaxation of the protein after mutation.
Property Description
For affinity maturation, residue-level properties are summed, so they represent total changes
due to the mutations. Dividing by the number of residues would give the average change in
these properties. The properties for the individual residues are not reported. You can also create
a LOGO plot from the affinity maturation results by clicking Create LOGO plot. The plot is
written to a .png file and the data to a .csv file with the job name as the base name.
You can sort the table columns by clicking on the column headings. You can plot any of these
properties against the mutation (table row) by choosing the property from the Graph property
option menu. If you want to export the table data as a CSV file, click Export, and navigate to a
location and name the file.
You can select a region in the graph using the horizontal and vertical dashed lines, which can
be dragged to create the selection. The rows corresponding to the selected region of the graph
are highlighted in the table above, and the residues are highlighted in the Workspace.
If you select a table row, the view zooms in to the mutated residue. To display the original
structure, select Display original structure in grey. The parent protein is displayed and colored
grey. You can then see how the mutation is positioned in relation to the original residue.
                ΔG1
    R + L   ---------->   RL
      |                    |
      | ΔG3                | ΔG4
      v                    v
    R + L'  ---------->   RL'
                ΔG2

where R is the receptor, L is the ligand in the parent, and L' is the mutated ligand. R+L and
R+L' represent the separated receptor and ligand. RL and RL' represent the receptor bound to
the ligand. The change in binding affinity is

    ΔΔG(bind) = ΔG2 - ΔG1 = ΔG4 - ΔG3

Experiment measures ΔG1 and ΔG2, but it is ΔG3 and ΔG4 that are calculated, to optimize the
cancellation of error in the computational models. The calculations are done with Prime
MM-GBSA, which uses an implicit (continuum) solvation model. A negative value indicates that
the mutant binds better than the parent protein.
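The equality of the two expressions for the change in binding affinity follows from closing the thermodynamic cycle. The short derivation below assumes the usual assignment of the legs: 1 is binding of the parent ligand, 2 is binding of the mutated ligand, 3 is the mutation in the separated state, and 4 is the mutation in the bound state. Going from R+L to RL' by either route must give the same free energy change:

```latex
\begin{align}
\Delta G_1 + \Delta G_4 &= \Delta G_3 + \Delta G_2 \\
\Delta\Delta G_{\mathrm{bind}} &= \Delta G_2 - \Delta G_1 = \Delta G_4 - \Delta G_3
\end{align}
```

The same argument applies to any cycle of this shape, which is why the computed mutation legs can stand in for the experimentally measured binding legs.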
                ΔG1
    L(u)    ---------->   L(f)
      |                    |
      | ΔG3                | ΔG4
      v                    v
    L'(u)   ---------->   L'(f)
                ΔG2

where L(u) is the unfolded parent ligand, L(f) is the folded parent ligand, L'(u) is the unfolded
mutated ligand, and L'(f) is the folded mutated ligand. The change in stability is

    ΔΔG(stability) = ΔG2 - ΔG1 = ΔG4 - ΔG3

Experiment measures ΔG1 and ΔG2, but it is ΔG3 and ΔG4 that are calculated, to optimize the
cancellation of error in the computational models. For the purpose of the model, the unfolded
ligand is represented as a tripeptide, A-X-B, where X is the residue that is mutated, and A and
B are its neighbors, capped with ACE and NMA. The assumption is that the remaining
interactions in the unfolded state are negligible. The calculations are done with Prime MM-GBSA,
which uses an implicit (continuum) solvation model.
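To make the unfolded-state model concrete, the sketch below assembles the residue names of the capped tripeptide for a given position in a sequence. It is a hypothetical illustration only: the function name is invented, and how BioLuminate treats a residue at a chain terminus (with only one neighbor) is not described here, so the sketch simply refuses that case.

```python
from typing import List, Sequence


def unfolded_state_model(sequence: Sequence[str], index: int) -> List[str]:
    """Return residue names for the capped tripeptide ACE-A-X-B-NMA.

    `sequence` holds 3-letter residue names; `index` (0-based) is the mutated
    residue X. A and B are its immediate sequence neighbors.
    """
    if not 0 < index < len(sequence) - 1:
        raise ValueError("sketch only handles residues with a neighbor on both sides")
    return ["ACE", sequence[index - 1], sequence[index], sequence[index + 1], "NMA"]


if __name__ == "__main__":
    print("-".join(unfolded_state_model(["GLY", "TYR", "SER", "LEU"], 1)))
```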
The structure file must be in Maestro format and contain either a single protein structure, with
or without a ligand, or a receptor structure followed by a set of ligand structures (a pose
viewer file).
The output Maestro file (jobname-out.maegz) contains first the starting structure, then the
refined mutant structures. The property s_bioluminate_Mutations identifies the mutation
made in each structure relative to the starting structure (for which this property is None), in the
form chain:resnum(oldres>newres), e.g. H:103(TYR>ALA). The other properties reported
are differences in property values between the mutant and the starting structure, as listed in
Table 15.1.
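If you post-process the output file with your own scripts, the mutation string can be split into its parts with a regular expression. The sketch below handles only the single-mutation form shown above, chain:resnum(oldres>newres). How insertion codes or multiple mutations are encoded in the property value is not covered here, so those cases are deliberately rejected rather than guessed. The class and function names are illustrative.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    chain: str
    resnum: int
    old: str
    new: str


# chain, colon, residue number, then (OLD>NEW) with 3-letter residue names
_PATTERN = re.compile(
    r"^(?P<chain>[^:]*):(?P<resnum>-?\d+)\((?P<old>\w{3})>(?P<new>\w{3})\)$"
)


def parse_mutation(text: str) -> Mutation:
    """Parse a string such as 'H:103(TYR>ALA)' into its components."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized mutation string: {text!r}")
    return Mutation(match["chain"], int(match["resnum"]), match["old"], match["new"])


if __name__ == "__main__":
    print(parse_mutation("H:103(TYR>ALA)"))
```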
The first task is to open or create a database, which you can do with the tools at the top of the
panel. The last database used is stored as a preference, so it is opened whenever you open the
panel. The current database is the one that is used to add nonstandard residues to the residue
selection tools.
Once you have opened an existing database or created a new database, you can add nonstan-
dard residues to it. If you already have structures for these residues, you can import them from
a file, from the selected entries in the project, or from the Workspace. The residues are added,
and a check is done as the residues are imported to detect duplicates (by the 3-letter residue
name). If a duplicate is found, a warning is posted, and you can choose to overwrite the
existing residue with the imported residue, assign a new name to the imported residue by using
the automatic naming scheme, or provide a new name for the imported residue.
The table that shows the contents of the database lists standard residues and built-in nonstan-
dard residues in addition to the custom residues that you add. The extra residues are not actu-
ally stored in the database, as they are already in the installation, but they are presented in the
table so that you can select them as templates for building new residues. The built-in nonstan-
dard residues are also listed so that you can mark them to appear in the residue selection tools.
The table columns are described in Table 15.2.
Column        Description
Name          Three-letter residue name. This name must be unique in the database. You can
              edit the table cell to change the name of any custom residue.
Code          One-letter residue code. This code does not have to be unique.
Structure     2D structure of the residue. A larger version is displayed in a tooltip when
              you pause the pointer over the table cell. If the residue is a custom residue
              that is not locked, you can double-click in the cell to edit it in the Sketch
              Non-Standard Residue tab.
Locked        This column indicates the source of the residue if it came from the
              installation, or the edit status of the residue. Residues that are locked
              cannot be edited unless they are unlocked.
Mutate to     The check box in this column marks residues that are included in the residue
              selection tools for mutation.
Description   Description of the residue, such as its chemical name. You can double-click
              in the cell to edit the description.
The table also has a shortcut menu, from which you can perform various actions on the table.
The menu items are described in Table 15.3.
Item                         Description
Use as Template              Use the selected row as a template for building a new residue. When
                             you choose this item, the residue is added to the drawing area in the
                             Sketch Non-Standard Residue tab, where you can modify it and add the
                             new residue to the database. This item is only present if a single
                             residue row is selected.
Enable Mutate To             Check the box in the Mutate to column of the table for the selected
                             residues, so that they appear on lists in application panels. This
                             item is not present if all the residues selected are standard
                             residues.
Lock/Unlock                  This item is only present for custom residues. Locking a residue
                             makes it noneditable until it is unlocked. The Edit Structure item
                             does not appear on the menu for locked residues (or for standard or
                             built-in residues).
Edit Structure               Edit the structure of the residue. The structure is displayed in the
                             Sketch Non-Standard Residue tab, where you can modify it and update
                             it in the database. This item is only present for custom residues
                             that are not locked.
Set 1-letter code            Set the 1-letter code for the residue. The 1-letter code does not
                             have to be unique. This item is only present for custom residues that
                             are not locked.
Set stereoisomer to isomer   Change the stereoisomer from L to D or D to L. This item is only
                             present for custom residues that are not locked.
Select All                   Select all residues in the table.
Select Inverse               Invert the selection of residues in the table: the selected residues
                             are deselected and the unselected residues are selected.
Export to File               Export the selected residues to a Maestro file.
Add to Project Table         Add the selected residues to the Project Table as new project entries.
Duplicate Rows               Duplicate the selected rows in the table. The new rows are added at
                             the end of the table, with new, unique names.
Delete Rows                  Delete the selected rows from the table. This action removes these
                             structures from the database.
You can build new residues or edit existing residues using the tools in the Sketch Non-Standard
Residue tab, as described below.
Figure 15.6. The Nonstandard Residues panel, Sketch Nonstandard Residue tab.
The remaining task in this panel is to select the nonstandard residues that will be shown in the
residue selection tools.
Chapter 16
Disulfide bridges between cysteine residues add to the stability of a protein structure. Mutating
residues to form or break disulfide bridges offers a way of controlling the stability of a protein.
This chapter describes how to run a cysteine mutation calculation to locate and rank possible
disulfide bridges. The calculation is set up and run in the Cysteine Mutation panel, in the Run
tab, and the results are presented in the Results tab.
If you want to analyze a single protein, first display it in the Workspace. The protein must be
one that has been prepared for use in modeling. If it has not been prepared, we recommend that
you prepare it with the Protein Preparation Wizard (on the Tools menu and the Tasks menu).
Details of preparing a protein can be found in the Protein Preparation Guide.
The protein must be analyzed to locate possible residue pairs that could be mutated to cyste-
ines, or to locate disulfide bridges that could be broken by mutation to other residues, which
you do by clicking Analyze Workspace. Instead of analyzing the entire protein, you can
analyze the Workspace selection. To do this, select the desired residues in the Workspace struc-
ture, select Analyze only selected Workspace residues in the Cysteine Mutation panel, then
click Analyze Workspace.
If your structure has not been prepared for modeling, you are asked if you want to use the
Protein Preparation Wizard to prepare the structure. You cannot proceed if you do not have a
protein that is an all-atom, 3D structure with bonding information. After preparing the protein,
display it in the Workspace again and click Analyze Workspace.
If you want to analyze a molecular dynamics trajectory, click Analyze MD Trajectory. A file
selector opens, in which you can locate a Desmond MD simulation results file (-out.cms).
After selecting the file, you are prompted to specify the interval at which the analysis is
performed on the trajectory, as a number of steps. The analysis takes some time, and is run
under job control. Running a Desmond MD simulation requires prior protein preparation, so no
further preparation is necessary.
When the structure has been analyzed, the Residue pairs for mutation table is filled in with all
the residue pairs that meet the criteria for forming or breaking a disulfide bridge. The criterion
for identifying potential cysteine pairs is a Cβ-Cβ distance between the residues that is less
than the distance specified in the panel. For Gly, the distance is taken from the alpha hydrogen.
Column              Description
(Index)             The first column contains the index of the residue pair. When the table is
                    filtered to show only certain residue pairs, this index remains the same
                    (i.e. it is not a table row number).
Type                Mutation type, which can be one of the following:
                    X-X -> S-S: Mutation of two residues to Cys with formation of a disulfide
                    bond.
                    S-X -> S-S: Mutation of one residue to Cys with formation of a disulfide
                    bond to a nearby cysteine.
                    S-S -> S-X: Mutation of one Cys residue of a bonded pair to break a
                    disulfide bond.
Residue 1           Identity of the first residue in the pair, given by the chain letter, the
                    residue number and insertion code, and the 3-letter residue name. For
                    Cys-Cys pairs, this residue is the residue that is mutated, so the pair is
                    listed twice, in opposite order, to allow selection of only one of the pair
                    to mutate.
Residue 2           Identity of the second residue in the pair, given by the chain letter, the
                    residue number and insertion code, and the 3-letter residue name.
β-carbon Distance   Distance in angstroms between the beta carbons (CB) of the two residues. In
                    the case of Gly, the distance is taken from the alpha hydrogen (HA) that
                    would be replaced by the beta carbon of the Cys.
Separation          Sequence separation between the residues in the pair, defined as the number
                    of residues between the two residues. Displayed as N/A if the residues are
                    in different chains.
SASA                Combined solvent-accessible surface area of the two residues in the pair.
Sec. Structure      Secondary structure elements of the two residues in the pair (helix, strand,
                    etc.). If both have the same secondary structure element, it is only given
                    once, otherwise the two elements are given in the form element1/element2.
B Factor            Temperature factors (crystallographic B factors) of the two residues in the
                    pair, represented as B(residue 1)/B(residue 2).
Frame               Trajectory frame, if an analysis was performed on a trajectory. Each pair
                    can come from a different frame, and the structure in that frame is the one
                    that is mutated.
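The distance criterion used to populate the table can be illustrated with a few lines of Python. This is a minimal sketch, not the BioLuminate implementation: it assumes you already have one coordinate per residue (the CB atom, or the HA atom for Gly), keyed by a residue label of your choosing, and the cutoff passed in stands for whatever distance is set in the panel.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def candidate_pairs(cb_coords: Dict[str, Coord],
                    cutoff: float) -> List[Tuple[str, str, float]]:
    """Return residue pairs whose CB-CB distance is less than `cutoff`.

    `cb_coords` maps a residue label to its CB coordinates in angstroms
    (HA coordinates for Gly). Each pair is reported once.
    """
    labels = sorted(cb_coords)
    pairs = []
    for i, first in enumerate(labels):
        for second in labels[i + 1:]:
            distance = math.dist(cb_coords[first], cb_coords[second])
            if distance < cutoff:
                pairs.append((first, second, distance))
    return pairs


if __name__ == "__main__":
    # Made-up coordinates and cutoff, for illustration only.
    coords = {"A:10": (0.0, 0.0, 0.0), "A:25": (3.0, 4.0, 0.0), "A:40": (10.0, 0.0, 0.0)}
    print(candidate_pairs(coords, cutoff=6.0))
```

The double loop is quadratic in the number of residues, which is adequate for a sketch. A spatial grid or k-d tree would be the usual choice for large structures or long trajectories.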
The first option menu allows you to include or exclude secondary structure elements for
one or both residues, and the second option menu allows you to choose the secondary
structure elements.
f. Use the Solvent accessible surface area option, menu, and text box
Specify the minimum or maximum combined SASA for the residue pair.
These display options allow you to restrict the list of residue pairs to those that are of interest,
so that it is easier to select the pairs in the table that you want to mutate. The job mutates only
the selected pairs in the table, so you must make a selection before you run the job.
If you have Cys-Cys pairs whose bond you want to break by mutating one of the cysteines to
another residue, you must also select the mutations, from the Cys -> X replacement residues
option menu. The option menu works in the same way as the X -> Cys replacement residues
option menu, described above.
Residues within N: Optimize all residues that have atoms within the specified distance
of the mutated residue pair.
Adjacent residues in sequence: Optimize the N residues next in the sequence on either
side of the residues in the pair, where N is selected from the option menu.
None: Do not optimize any residues but the two residues in the pair.
You can also choose whether to run the minimization in the gas phase (the fastest option), or
use an implicit solvent model, or not to minimize at all. If the solvent-accessible surface area of
the minimization region is negligible, the implicit solvent model may not be of any value; if the
SASA is large enough, the implicit solvent model should probably be used.
If you want to choose whether to incorporate the results into the project, click the Settings
button. The Job Settings dialog box opens. There are two output options:
The job is run locally, and its progress is displayed in a status bar at the bottom of the panel.
If you want to run the job from the command line, click the Settings button arrow to display its
menu and choose Write. The input files and a shell script (.sh) to run the job are written to a
subdirectory of the current directory, all named with the job name. You can then execute the
shell script to run the job.
The results of the mutation job are displayed in the Mutations table. The table columns are
described in Table 16.2.
Column           Description
Residues         Chain name, residue number and insertion code of both residues in the mutated
                 pair.
Original         3-letter names of original residues in the pair.
Mutated          3-letter names of mutated residues.
Ei               Change in interaction energy between the residue pair and the rest of the
                 protein on mutation.
Strain E         Change in strain energy on mutation. The strain energy is the difference in
                 internal energy between the state of the residue pair in the protein and the
                 relaxed state of each residue or of the disulfide in the gas phase.
Ei+Strain E      Change in the sum of interaction energy and strain energy on mutation, equal
                 to Ei + Strain E.
Pre-min Score    Geometric energy score of the mutated protein prior to minimization. The
                 score is calculated using an empirical function that is derived from the
                 distributions of the internal coordinates of cysteine disulfides in the PDB.
                 Geometries that are seen in the PDB for disulfides yield lower scores.
Weighted Score   Weighted sum of change in interaction energy, change in strain energy,
                 pre-minimization score and post-minimization score. The last of these uses
                 the same method as the pre-minimization score and is reported in the Project
                 Table, but is not reported here. The score includes a shift for mutations for
                 which any of the four components of the weighted score is larger than a
                 threshold for that component. Sorting the weighted score places the mutations
                 that pass all threshold tests first, followed by all the others.
Selecting a row in the table zooms in on the mutated pair in the Workspace, if you have Fit on
select selected. The mutated residues are displayed in ball-and-stick representation, and the
rest of the structure uses a darker color scheme. If you want to see the original residues as well,
select Display original structure in grey.
Chapter 17
The main task of antibody modeling in BioLuminate is to construct a homology model based
on a database of antibody templates. Once the model is constructed, it can be humanized if
required. A database of antibody templates is provided with BioLuminate, based on a new
analysis of antibodies in the PDB from 2010 [2]. You can modify or add to this database, or
create your own database. Antibody-antigen docking is covered in Section 11.5 on page 75.
The modeling is performed using the Antibody Prediction panel, which you open by choosing
Tasks → Antibody Modeling → Prediction in the main window.
The left part of the panel displays a diagram of the Fv region of an antibody. Clicking on the
light variable or heavy variable region in the diagram displays a menu, from which you can
choose the source of the sequence for this region. The choices are:
Browse for File: Open a file browser in which you can navigate to the desired location
and select the file that contains the sequence.
From Workspace: Use the sequence for the structure that is displayed in the Workspace.
Exactly one structure must be displayed.
From Selected Entries in the Project Table: Use the sequence from the entry that is
selected in the Project Table. Exactly one entry must be selected.
From PDB ID: Use the sequence from the specified PDB ID. Opens the Enter PDB ID
dialog box, in which you can enter the PDB ID of the sequence. The sequence is retrieved
from a local copy of the PDB if it is available, or from the RCSB web site, depending on
the preference set for PDB retrieval.
Enter/Paste New Sequence: Type or paste in the sequence. Opens the Sequence Editor
dialog box, in which you can name the sequence and type it or paste it in, as a string of
single-letter codes.
Figure 17.2. The Antibody Prediction panel showing the import menu.
If you intend to model only part of an antibody and use the input structure for the rest, you
should ensure that you import a sequence that has an associated structure. This means that you
can choose any of the menu items except the last.
When the protein is read in, it is analyzed to find the chains and the loops. If there is more than
one possible chain that could be used (for example, in a dimer), a dialog box opens, in which
you can choose one of the chains. When the analysis finishes and you have chosen a chain if
requested, the region in the diagram is colored to indicate that the chain is assigned. When both
chains have been chosen, the text prompting you to import the two chains is no longer
displayed.
You can view the sequences in a sequence viewer at any time by clicking View Sequences.
You can choose one or more databases to use when modeling an antibody. The default is the
database in the installation. If you want to select the databases, click Choose Databases, to
open the Choose Database dialog box. This dialog box has a table of databases that you can
choose from. You can do the following:
Add a database to the list, by clicking Add Database and navigating to the database in the
file selector that opens.
Select a database to use, by checking the check box in the Active column.
Remove a database from the list, by clicking the button in the Actions column.
When you have finished modifying the list and choosing the active databases, click OK.
If you want to use the coordinates from the input structure, select Input structure. The
sequences that you imported for the light and heavy regions must of course be associated with
a structure, and an error message is posted if there is no structure available.
If you want to use a homology model, select Homology search. To run the search, click Search.
When the search finishes, the table in the lower part of the Framework Search tab is filled in
with the results, in order of their score. The table columns are described in Table 17.1.
Column Description
You can select a single template for the antibody, or you can select different templates for the
heavy and light chains (and a template for the common framework).
If you want to use different templates for the heavy and light chains, select Use separate
templates for heavy and light chains. The search results table is split into two, one for the heavy
chain and one for the light chain. You can select one template from each table. You can also
select a template for the common framework region that is different from either of these. To do
so, click Select Common Framework, and choose the template in the dialog box that opens.
Figure 17.4. The Antibody Prediction panel after searching for templates.
When you have selected templates or chosen to use input coordinates, click Accept to accept
the choice and move on to the next stage. The Basic Loop Model tab is displayed automatically,
after a short pause.
Figure 17.5. The Antibody Prediction panel for selection of separate H and L templates.
First, choose a property from the Available properties list at the top of the panel. The list shows
the property name, the family (category) it belongs to, and the range of values, for numeric
properties. You can limit the list to a particular property family by choosing from the Show
family option menu. If you type in the Property text box, a completion list is displayed below it,
from which you can choose a property.
Once you have chosen a property, it is displayed in the Property text box. You can then use the
text boxes and menus to the right to define the restrictions on the values of this property. Click
Add to add the filter to the Filtering definitions and criteria list. You can add multiple criteria or
definitions, and each of them is applied to the databases. The number of structures in the data-
base that match the filters is reported at the end of the list. The search is case-insensitive, so for
example 1FSK matches 1fsk. To do case-sensitive searches, select Case-sensitive.
If you want to filter on multiple properties, you can choose another property, set up the restric-
tions on its value, and click Add to add the filter to the list. The filters are cumulative (implicit
AND), so the resulting structures are those that match all filters.
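The cumulative (AND) behavior of the filters, and the case-insensitive text matching, can be sketched as follows. This is an illustration of the logic only: the record layout, the property names (pdb_id, resolution), and the helper names are assumptions made for the example, and the values are made up.

```python
from typing import Any, Callable, Dict, List, Tuple

Check = Callable[[Any], bool]


def text_equals(value: str, case_sensitive: bool = False) -> Check:
    """Build a check that matches text, ignoring case unless requested."""
    def check(actual: Any) -> bool:
        if actual is None:
            return False
        left, right = str(actual), value
        if not case_sensitive:
            left, right = left.lower(), right.lower()
        return left == right
    return check


def in_range(low: float, high: float) -> Check:
    """Build a check that accepts numeric values in the closed range [low, high]."""
    def check(actual: Any) -> bool:
        return actual is not None and low <= actual <= high
    return check


def apply_filters(structures: List[Dict[str, Any]],
                  filters: List[Tuple[str, Check]]) -> List[Dict[str, Any]]:
    """Keep only the structures that satisfy every filter (implicit AND)."""
    return [s for s in structures
            if all(check(s.get(prop)) for prop, check in filters)]


if __name__ == "__main__":
    database = [{"pdb_id": "1fsk", "resolution": 1.9},   # illustrative values
                {"pdb_id": "2abc", "resolution": 3.2}]
    filters = [("pdb_id", text_equals("1FSK")), ("resolution", in_range(0.0, 2.5))]
    print(apply_filters(database, filters))
```

Adding another (property, check) pair to the list narrows the result further, which mirrors clicking Add for each additional criterion.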
As another example, to filter out structures for which a property has values in a given range:
The six CDR loops are listed in a table that provides radio buttons for selecting the source of
input coordinates. To choose the source, select the appropriate radio button. You can only
choose to use the input structure if the framework also came from the input structure. By
default all loops are predicted from the database.
Figure 17.7. The Antibody Prediction panel, Basic Loop Model tab.
If you want to choose individual templates for one or more loops, double-click in the Loop
Template column of the input table in the Basic Loop Model tab, or right-click and choose
Select Cluster, to open the Loop Clusters dialog box. This dialog box shows you a list of clus-
ters of structurally similar templates for each loop, with information on the template with the
highest sequence similarity to the query, and allows you to select that template to use in the
model. The procedure is described in more detail in Section 17.1.7 on page 133. Choosing a
template automatically selects use of the antibody loop database for the loop. Information on
the template chosen is shown in the Loop Template column of the table as User(PDB ID); this
column shows Automatic for automatic prediction from the database, or Input Structure if the
loop coordinates are being taken from the input structure.
If you want to include the template antigen in the model, select Include template antigen. This
option is only available if the template has an antigen, which is indicated in the list of templates
in the Framework Search tab.
Likewise, if you want to include ligands, cofactors, or water in the model, select them in the
ligands list. Only ligands are displayed by default. To display the cofactors and water, select
Show all above the list.
Otherwise, the selection of the loop is done by sorting the clusters by the size of the cluster,
then locating the largest cluster in the list that has a member whose loop sequence similarity to
that of the query is greater than the loop similarity cutoff. The cluster member with the greatest
similarity to the query is the one that is used to build the loop.
You can build more than one model for the structure, by setting the desired number in the
Number of models text box. If more than one model is requested, a series of diverse models is
returned. The first model returned is usually the most likely to be correct.
To generate the loop model or models, click Generate Loop Models. To assess the quality of a
generated model, you can examine it in the Workspace (click View in Workspace), or analyze it
in the Structure Quality Viewer (click View in Structure Quality Viewer). If you have multiple
models, you can choose them from the Model to view option menu.
When you view a model in the Workspace, it is colored by residue with the following color
scheme:
Blue: residues for which the full residue conformation was copied from the template.
Cyan: residues for which the residue backbone conformation was copied from the
template, and the side chain was modeled (because there was a residue mutation in the
template relative to the query).
Red: residues for which both the backbone and the side chain were modeled.
Maroon: residues in the CDR loops.
When you view the models in the Protein Structure Quality Viewer, all models are listed in the
table at the top of the viewer, and the structures are colored in the Workspace according to the
Ramachandran plot regions.
To model the full Fab region, select Generate Fab model, then click Browse and select a struc-
ture file that contains the desired Fab constant region.
To model the entire antibody including the constant region (2 Fab and Fc), select Generate full
antibody model, choose an Fc template from the table, and click OK. There are very few
templates, so you should be aware that the quality of the results in this region might be low.
The model of the Fc region is built even if the sequence for this region is incomplete or
missing, by using the template sequence. Generate Fv model is selected by default.
If you want to change the definitions of the loops, select Use custom CDR loop region, and edit
the starting and ending residues of any of the loops in the loop table. You can reset any value to
its default by right-clicking in the table cell and choosing Reset.
When this dialog box is first opened, the antibody databases are searched for loops of the same
length as those in the query sequence for each of the six loops, and the loops are clustered
structurally. The dialog box may therefore take a while to open. The progress of the clustering
is reported in the Basic Loop Model tab, below the input table. Subsequently these loops and
clusters are reused, so opening is much faster.
The clusters for the loop chosen from the Loop option menu are listed in the table, in order of
decreasing maximum sequence similarity to the query sequence. For each cluster, the loop
template with the highest sequence similarity to the query sequence within the cluster is chosen
as the cluster representative. The list is restricted to the clusters for which the similarity of the
representative is greater than the cutoff specified in the Minimum similarity box. Information on
the cluster representative is given in the table. The table columns are described in Table 17.2.
You can sort the table by clicking on a column heading.
You can set the minimum acceptable similarity between a loop from the database and the query
loop in the Minimum similarity box. Only the clusters that contain at least one loop whose simi-
larity to the query is greater than this threshold are used in building the model, whether in the
default automatic procedure or by manual selection of a cluster.
Column                 Description
Source                 PDB ID of the member of the given cluster that has the highest sequence
                       similarity to the query (this member is called the cluster
                       representative). The template used for the framework region is also
                       included in the list.
Max Similarity         Sequence similarity of the cluster representative to the query.
Sequence               Sequence of the query for this loop (above) and of the cluster
                       representative (below). Residues that differ are marked in red.
Members                Number of members of the cluster.
Resolution             Resolution of the X-ray structure for the cluster representative, in
                       angstroms.
Average B-Factor       Temperature factor of the X-ray structure for the cluster representative,
                       averaged over the backbone atoms of the loop.
RMSD to Framework      RMS deviation of the coordinates of the CDR loop stem residues in the
                       cluster representative from those of the framework template.
RMSD within Cluster    Average of the RMS differences of the coordinates of each pair of loops
(Average)              in the cluster.
RMSD within Cluster    Maximum RMS difference of the coordinates of any pair of loops in the
(Maximum)              cluster.
If the minimum cluster similarity is changed, this change applies to all loops. If no cluster is
found with a loop that has the minimum similarity, then the program automatically uses the
template loop in the database that has the greatest similarity to the query, no matter which
cluster it belongs to.
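The automatic template selection (largest cluster first, subject to the similarity cutoff) together with the fallback just described can be sketched in Python. The data layout is an assumption made for illustration: each cluster is a list of (template ID, similarity) pairs, with similarity on the same scale as the cutoff. The function name is hypothetical and this is not the program's actual code.

```python
from typing import List, Tuple

Member = Tuple[str, float]  # (template ID, sequence similarity to the query loop)


def select_loop_template(clusters: List[List[Member]], cutoff: float) -> Member:
    """Pick the loop template used to build the model.

    Clusters are examined from largest to smallest. The first cluster whose
    most similar member exceeds `cutoff` supplies that member. If no cluster
    qualifies, the most similar member overall is used.
    """
    non_empty = [cluster for cluster in clusters if cluster]
    if not non_empty:
        raise ValueError("no loop templates available")
    for cluster in sorted(non_empty, key=len, reverse=True):
        best = max(cluster, key=lambda member: member[1])
        if best[1] > cutoff:
            return best
    # Fallback: ignore cluster membership and take the most similar loop.
    return max((m for cluster in non_empty for m in cluster), key=lambda m: m[1])


if __name__ == "__main__":
    example = [[("t1", 0.3), ("t2", 0.4), ("t3", 0.2)], [("t4", 0.9), ("t5", 0.6)]]
    print(select_loop_template(example, cutoff=0.5))
```

Preferring the largest qualifying cluster favors the most commonly observed loop conformation, while the cutoff keeps the chosen template reasonably close in sequence to the query.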
Figure 17.9. The Antibody Prediction panel, Advanced Loop Model tab.
If you generated more than one model, you should select the model you want to use from the
Model to view option menu before leaving the Basic Loop Model tab.
Here are some general guidelines for when to refine a loop with Prime in the Advanced Loop
Model tab:
If the loop is long (9 residues or more) and the homology to the template is good (similar-
ity above about 80%) then refinement with Prime is not usually necessary.
If the loop sequence similarity is less than 40%, the basic model quality usually is very
poor and a Prime refinement is recommended.
When building a new H3 loop with new sequences on the native (crystal) structure of an
antibody, Prime refinement is recommended.
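The guidelines above can be written as a small Python helper. The function name and return strings are hypothetical. Cases that the guidelines do not cover are reported as such rather than guessed.

```python
def prime_refinement_advice(loop_length, similarity_pct, new_h3_on_native=False):
    """Map the three guidelines onto a short recommendation string."""
    if new_h3_on_native:
        # New H3 sequence built on the native (crystal) structure.
        return "recommended"
    if similarity_pct < 40:
        # Basic model quality is usually very poor below 40% similarity.
        return "recommended"
    if loop_length >= 9 and similarity_pct > 80:
        # Long loop with good homology to the template.
        return "not usually necessary"
    return "no guideline given"


advice_long_good = prime_refinement_advice(10, 85)
advice_low_similarity = prime_refinement_advice(12, 35)
advice_uncovered = prime_refinement_advice(6, 60)
```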
The quality of the Prime refinement is greater for shorter loops. Accurate and detailed loop
predictions using Prime can take hours for each loop. It is not usually necessary to run an
advanced loop prediction for any of the loops except the H3 loop. The H3 loop is selected by
default for an advanced prediction, as it is the hardest loop to predict. The results are returned
to the Project Table as new structures.
The input structure for the loop prediction is always the structure that came from the Basic
Loop Model tab. This means that you cannot do sequential loop predictions in this panel, but
you can use the Refinement panel to predict more than one loop (Tasks → Loop + Sidechain
Prediction). See Chapter 6 of the Prime User Manual for information on refinement tasks.
To choose the loop for prediction, select it in the table. Click Run Prime to run a Prime loop
prediction job for the selected loop. A dialog box opens so you can name the job, select a host,
and specify the maximum number of processors to use. The single, best loop prediction is
returned. The Prime loop prediction algorithm is automatically selected based on the length of
the loop.
The same controls as in the Basic Loop Model tab are present for viewing the structure.
17.1.9 Summary
The basic procedure for running a prediction using a homology model is as follows:
1. Import the sequences for the light and heavy chains: Click the part of the diagram for the
region you want to import, and choose a source from the list that is displayed.
2. Set up the antibody databases that you want to use to search for homologs for the frame-
work region and the loops:
3. (Optional) Click Choose Databases in the Framework Search tab to add databases and
select them in the table. The default database is the one from the installation.
4. (Optional) Click Filter Search Results to filter the databases by properties of the struc-
tures in the databases.
5. Select Homology search and run the homology search by clicking Search.
6. Select the homolog in the results table that you want to use for the framework region, and
click Accept.
7. (Optional) In the Basic Loop Model tab, select the loop cluster you want to use for each
loop in the model by clicking Select Cluster, and choosing a cluster in the Loop Clusters
panel. By default a cluster is chosen automatically, based on cluster size and a minimum
similarity criterion.
8. (Optional) Include the antigen, ligands, cofactors, and water in the model.
9. (Optional) Set the number of models of the antibody that you want to generate.
10. Click Generate Loop Models to generate the models.
11. (Optional) If you think that the H3 loop needs further refinement, select it in the
Advanced Loop Model tab, and click Run Prime.
You can also use coordinates from existing structures instead of homology models, for
example if you want to vary just one of the loops. The procedure is similar to that given above.
1. Import the sequences for the light and heavy chains: Click the part of the diagram for the
region you want to import, and choose a source from the list that is displayed.
2. Set up the antibody databases that you want to use to search for homologs for the frame-
work region and the loops:
3. (Optional) Click Choose Databases in the Framework Search tab to add databases and
select them in the table. The default database is the one from the installation.
4. Select Input coordinates and click Accept.
5. (Optional) In the Basic Loop Model tab, select the loop cluster you want to use for each
loop in the model by clicking Select Cluster, and choosing a cluster in the Loop Clusters
panel. By default a cluster is chosen automatically, based on similarity, then cluster size.
6. (Optional) Check the boxes for the loops for which you want to use input coordinates.
7. (Optional) Set the number of models of the antibody that you want to generate.
8. Click Generate Loop Models to generate the models.
9. (Optional) If you think that the H3 loop needs further refinement, select it in the
Advanced Loop Model tab, and click Run Prime.
The mutations can be set up and run in the Antibody Humanization: Residue Mutation panel,
which you open by choosing Tasks → Antibody Modeling → Humanization → Residue Mutation
in the main window. The panel has two tabs, one for setting up the criteria for choosing
residues to mutate, and one for selecting the residues and defining the mutants. When both
these tasks are done, you can start the job to mutate the residues.
Once the analysis is complete, you are prompted to choose chains for binding affinity calculations.
One set of chains is used as the ligand; the remaining chains are the receptor. These
labels are arbitrary, so it does not matter whether the antigen is treated as the ligand or the receptor.
When you open this dialog box, the sequence is shown in the sequence viewer, colored by
residue type, with its secondary structure assignment and disulfide bond annotation. You can
choose which chain to display by using the Show in viewer options. This sequence is called the
parent sequence.
The database in the installation is used by default for the search. If you want to use a different
database or add a custom database, click Choose Databases and select the desired databases.
The default database is a database of human antibody data.
To find the homologs, click Search Antibody Database for Homologs. The progress of the
search is shown in the status area at the bottom of the panel. When the search is done, the
homologs must be aligned to the parent sequence, so that selection of residues for mutation can
be done on the basis of matching residue positions. Click Align Homologs to perform the
multiple sequence alignment of the homologs to the parent.
Criteria that are based on homology use the variations in the residues at each residue position
among the homologs, or between the homologs and the parent. The variations found are used
as a basis for choosing default mutations. The criteria that you can set are:
Variability at position >/< N %: Filter residues based on the percentage variability at the
residue position. Choose whether to apply a minimum or a maximum variability from the
option menu, and specify the percentage threshold in the text box.
Variability at position >/< N residue types: Filter residues based on the residue type
variability at the residue position. Choose whether to apply a minimum or a maximum
variability from the option menu, and specify the threshold for the number of residue types
that can vary in the text box.
Ignore positions with group conservations: Ignore (do not select) residues that are
strongly conserved or that are either strongly or weakly conserved. Select the appropriate
option for strong conservation or both strong and weak conservation.
Ignore parent sequence in applying above criteria: Ignore the parent (query) sequence
when applying the variability or conservation criteria. Only the variability in the
homologs is considered.
Parent residue different than N % homologs at position: Select residues for which the
parent residue is different from more than the specified percentage of homologs at the
residue position.
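The homology criteria can be illustrated with a short Python sketch that works on an existing alignment. The definitions are assumptions made for illustration: percentage variability is taken as the fraction of sequences that differ from the most common residue at the position, and gaps are treated like any other character. The function name is hypothetical.

```python
from collections import Counter


def position_stats(parent, homologs, ignore_parent=False):
    """Per-position variability for equal-length, pre-aligned sequences."""
    stats = []
    for i, parent_res in enumerate(parent):
        column = [h[i] for h in homologs]
        pool = column if ignore_parent else column + [parent_res]
        counts = Counter(pool)
        most_common_count = counts.most_common(1)[0][1]
        stats.append({
            "position": i,
            # Number of distinct residue types seen at this position.
            "residue_types": len(counts),
            # Assumed definition: share of sequences unlike the most common residue.
            "variability_pct": 100.0 * (1 - most_common_count / len(pool)),
            # Share of homologs whose residue differs from the parent residue.
            "parent_differs_pct": 100.0 * sum(r != parent_res for r in column) / len(column),
        })
    return stats


# Toy alignment (illustrative sequences only).
parent = "AKT"
homologs = ["AKS", "ARS", "AKS"]
stats = position_stats(parent, homologs)
# Positions where the parent differs from more than 50% of the homologs.
selected = [s["position"] for s in stats if s["parent_differs_pct"] > 50]
```

In the toy alignment only the last position is selected, because the parent residue differs from all three homologs there.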
The criteria based on the 3D structure of the parent antibody include solvent-accessible surface
area (SASA) and interactions between residues. Interactions are determined by a distance
cutoff: any residue that has atoms within 4 Å of a given side chain is considered to interact
with it.
Solvent accessible surface area: Select this option to filter residues by their
solvent-accessible surface area (SASA) relative to an isolated residue of the same type, and set a
threshold for the maximum or minimum allowed relative SASA. This option is useful for
locating surface (or buried) residues.
Residue side chain makes no more than N interactions with protein: Select this option to
filter out residues whose side chains make multiple interactions with the protein, and set
the maximum number of such interactions.
Residue side chain interacts/does not interact with molecule N: Select this option to filter
residues by their interaction with a selected molecule. Choose whether to allow or
disallow the interaction from the option menu, and specify the molecule in the text box, or
select Pick and pick the molecule in the Workspace.
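The distance-based definition of an interaction can be sketched in Python as follows. The data layout, residue labels, and function name are hypothetical, and the sketch treats a distance exactly equal to the cutoff as an interaction, which is an assumption.

```python
from math import dist

CUTOFF = 4.0  # angstroms, the cutoff quoted in the text


def interacting_residues(side_chain_atoms, other_residues, cutoff=CUTOFF):
    """Return the names of residues with any atom within cutoff of the side chain."""
    hits = []
    for name, atoms in other_residues.items():
        if any(dist(a, b) <= cutoff for a in side_chain_atoms for b in atoms):
            hits.append(name)
    return hits


# Toy coordinates in angstroms (illustrative only).
side_chain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
others = {
    "B:45": [(4.0, 0.0, 0.0)],  # 2.5 from the second side-chain atom
    "B:46": [(9.0, 0.0, 0.0)],  # 7.5 from the nearest side-chain atom
    "C:12": [(0.0, 3.0, 0.0)],  # 3.0 from the first side-chain atom
}
hits = interacting_residues(side_chain, others)
# A filter of the "no more than N interactions" kind, here with N = 2.
passes_limit = len(hits) <= 2
```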
The residues table lists all the residues in the structure. The first column identifies the residues.
The second column specifies mutations of the residues. Any residue for which mutations are
defined is mutated when the job is run.
To define the mutations for a given residue, click in the Mutations column. A popup is
displayed, from which you can select one or more residues to mutate to (including any defined
nonstandard residues), or select groups of residue types. When you make a selection, the
residue list in the table cell is updated. When you have finished selecting mutations, press
ENTER. To cancel the current changes to the mutations, press ESC.
To define a common set of mutations for multiple residues, you can use the tools at the top of
the tab. The target residues for mutation can be selected using the Mutate option menu, which
has the following choices:
All residues: Set the mutations for all residues in the table. When you select this option,
all residues are selected in the table.
Selected residues: Set the mutations for the residues that are selected in the table.
Residue selection: Set the mutations for selected residue types. Opens the Residue
Scanning - Select Residues dialog box, in which you can limit the selection of residues
by chain, solvent exposure, or residue type. All residues that meet the criteria are selected
in the residues table when you click OK in the dialog box. The setting on the Mutate
option menu is switched to Selected residues, as the dialog box is simply a means of
selecting residues.
Atom selection: Select residues for setting up mutations by general atom selection
procedures. Opens the Atom Selection dialog box, in which you can make selections by
combinations of criteria. All residues that meet the criteria are selected in the residues table
when you click OK in the dialog box. The setting on the Mutate option menu is switched
to Selected residues, as use of the dialog box is simply a means of selecting residues.
Next, define the mutations for the selected residues from the second option menu. When you
click on the menu, a popup displays a list of mutations that can be selected (just as for editing
the table cell). Press ENTER to select the mutations, or ESC to cancel. Click Apply to apply this
set of mutations to the residues that are selected in the table.
The change displayed is the value for the mutated structure minus the value for the parent structure, so a
positive value for the change means that the mutated structure has a larger (more positive)
value than the parent structure.
The change in stability is always calculated. You can choose which of the other properties to
calculate by selecting the appropriate Calculate option. The properties are described in more
detail in Table 15.1 on page 110. By default, the pKa property is not selected, as this is the
most time-consuming property to calculate. The Affinity property is also not selected by default,
as it requires more than just the antibody. If the Workspace contains an antigen (or other
structure) to which you want to calculate the binding affinity, you can select this property.
Before the minimization of the region around the mutation site, a side-chain prediction of the
mutated residues is performed, which does a thorough exploration of possible side-chain
conformations. The method used for the side-chain prediction can be chosen from the Refine-
ment option menu. The choices are:
To run the job with the current job settings, enter a name in the Job Name dialog box and
click Run.
To make job settings in the Job Settings dialog box, including the host, job name, number
of processors, number of subjobs, and treatment of output, click the Settings button. Click
Run in the dialog box to run the job.
The Mutations table lists all the mutations that were generated, along with the changes in a
range of properties as a result of the mutation. The properties include SASA (total, polar, and
nonpolar), pKa, hydropathy, number of rotatable bonds, energy, potential energy, and stability.
These quantities are defined in Table 15.1 on page 110. You can sort the table columns by
clicking on the column heading. You can plot any of these properties against the mutation
(table row) by choosing the property from the Graph property option menu. If you want to
export the table data as a CSV file, click Export, and navigate to a location and name the file.
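Once the table has been exported, the CSV file can be processed outside the panel. The Python sketch below ranks rows by a numeric column. The column headings and mutation labels in the sample are hypothetical stand-ins, since the headings in a real export may differ.

```python
import csv
import io


def rank_mutations(csv_file, column, ascending=True):
    """Read an exported table and return its rows sorted by a numeric column."""
    rows = list(csv.DictReader(csv_file))
    return sorted(rows, key=lambda row: float(row[column]), reverse=not ascending)


# Stand-in for an exported file; headings and values are made up for the example.
sample = io.StringIO(
    "Mutation,Delta Stability\n"
    "H:TYR33->PHE,1.2\n"
    "L:SER91->ALA,-0.8\n"
    "H:ASN52->ASP,0.3\n"
)
ranked = rank_mutations(sample, "Delta Stability")
```

To use a real export, open the file with open(path, newline="") and pass the file object in place of the sample.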
You can select a region in the graph using the horizontal and vertical dashed lines, which can
be dragged to create the selection. The rows corresponding to the selected region of the graph
are highlighted in the table above, and the residues are highlighted in the Workspace.
If you select a table row, the view zooms in to the mutated residue. To display the original
structure, select Display original structure in grey. The parent antibody is displayed and colored
grey. You can then see how the mutation is positioned in relation to the original residue.
17.2.10 Summary
1. Import a structure into the workflow, either from the Workspace or from a Maestro file
(Import structure from).
2. Click Homology Criteria.
3. (Optional) Select databases for running the search for homologs.
4. Run the search for homologs (Search Antibody Database for Homologs) and align the
homologs (Align Homologs). You can also click Import homologs to import the homologs.
5. (Optional) Select an option for the chain to show, and examine the alignment for that
chain.
6. Choose the regions that you want to substitute residues in.
7. Specify the criteria for automatic selection of residues, in the Selection by homology crite-
ria and Parent structure 3D criteria sections.
The human frameworks can be automatically located by a database search that finds the best
matches to the query antibody. If you want to choose the databases to search for human anti-
bodies, click Choose Databases. See Section 17.1.2 on page 127 for details of this task. All
databases that you select as active are searched when you run a search for human frameworks.
The first task is to import the antibody structure. Click one of the Import antibody structure to
humanize buttons to import the structure into the workflow:
Next, a structure for the framework is needed, which you import in the Specify replacement
framework section. Click one of the Import framework from buttons to import a framework:
Database: Search the antibody databases for the best-fitting human antibody
frameworks. The human antibodies are aligned to the query antibody to find the best alignment
of the framework regions to the query framework regions where the CDR loops join.
Framework Structure: Import a file that contains the framework structure. The structure
does not have to be an entire antibody, provided it contains the framework region.
Framework Sequence: Import the sequence for the framework region. A homology
model is built for the framework region using the default options, as represented in the
Antibody Prediction panel.
The templates that were found or imported for the framework regions are listed in the Frame-
work table, along with sequence identity and stem geometry scores. The results are sorted by
score, which you can use to select a framework. You can also adjust the score by selecting the
weights of its components in the Weights dialog box, which you open by clicking Weight
Options.
To replace the original framework with the framework that is selected in the table, click
Replace Framework. When the resulting model is built, a sequence viewer panel opens,
showing the sequences of the original light and heavy variable chains and the chains of the
grafted model, annotated with disulfide bridges, SSA, and CDR regions. This sequence viewer
allows you to compare the original and the replacement framework.
The framework replacement is done without adjusting the loops in the CDR regions. There
may be clashes between the CDR loops and the new framework, which can be relieved by
back-mutation to the query, or to some other residue type.
The residues that clash can be listed in the table in the Define back mutation from human framework
to query section, by selecting the option Show residues with CDR/framework vdW clash
or Show residues within N Å of CDR side chain (or both). You can adjust the threshold
distance to the side chain if you wish.
Each framework residue that clashes is shown, along with the distance to the CDR side chain
and a check for CDR clashes. The Mutations column lists the residue that the clashing residue
will be mutated to.
To select the residue you want to mutate to, click in the column and choose from the
menu that is displayed. In addition to the standard residues, this menu has an item for the
corresponding query residue and an item for no mutation, which disables mutation.
To disable mutation of multiple residues, select them in the table and click Clear Muta-
tions.
Column              Description
Heavy               PDB ID of the template used for the heavy framework region.
Light               PDB ID of the template used for the light framework region.
Structure           Source of the structure used for the framework regions, which can be either
                    a crystal structure or a homology model. The scoring depends on the
                    structure source.
Composite Score     Average of the Heavy Sim. and Light Sim. scores. This score is used as the
                    sequence similarity score in the calculation of the weighted score.
Heavy Sim.          Sequence similarity of the entire variable domain sequence (framework and
                    CDR) for the heavy chain. The similarity is the number of matching residues
                    divided by the total number of residues, where matching means that the
                    two residues have a positive score in the BLOSUM62 matrix.
Light Sim.          Sequence similarity of the entire variable domain sequence (framework and
                    CDR) for the light chain. The similarity is the number of matching residues
                    divided by the total number of residues, where matching means that the
                    two residues have a positive score in the BLOSUM62 matrix.
CDR Stem Geom       Fitness score for the geometry of the CDR loop stem residues. The stem
                    residues are the residues in the framework region that are directly attached
                    to the CDR loops. The geometry score is related to the RMSD between the
                    native and the grafted antibody for the C-N distance, the C-N-C angle, and
                    the C-C-N-C dihedral across the bond between the stem residue and the first
                    (or last) loop residue.
Weighted Geom+Sim   Weighted combination of the stem geometry fitness score and the sequence
                    similarity score. The rows in the table are ordered by this score. The
                    weights can be specified by clicking Weight Options and specifying the
                    weights in the Weights dialog box.
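The similarity and composite scores in the table can be illustrated with a short Python sketch. Several points are assumptions: the sequences are taken to be pre-aligned and of equal length, only a small excerpt of BLOSUM62 is embedded for the demonstration, and the weighted score is shown as a simple linear combination, which may not be the form the program uses. All function names are hypothetical.

```python
# Small excerpt of BLOSUM62 for the demonstration; real use needs the full matrix.
BLOSUM62_EXCERPT = {
    ("K", "R"): 2, ("D", "E"): 2, ("I", "V"): 3, ("S", "T"): 1,
    ("A", "G"): 0, ("D", "L"): -4,
}


def is_match(a, b, matrix=BLOSUM62_EXCERPT):
    """True when the residue pair has a positive substitution score."""
    if a == b:
        return True  # diagonal BLOSUM62 entries for standard residues are positive
    key = (a, b) if (a, b) in matrix else (b, a)
    return matrix[key] > 0  # raises KeyError for pairs outside the excerpt


def similarity(seq_a, seq_b, matrix=BLOSUM62_EXCERPT):
    """Matching residues divided by the total number of aligned residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(is_match(a, b, matrix) for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def composite_score(heavy_sim, light_sim):
    """Average of the heavy-chain and light-chain similarities."""
    return (heavy_sim + light_sim) / 2


def weighted_score(geom, sim, w_geom, w_sim):
    """Assumed linear combination of the geometry and similarity scores."""
    return w_geom * geom + w_sim * sim


heavy = similarity("KDIA", "REVG")  # K/R, D/E, I/V match; A/G scores 0
light = similarity("SDA", "TLA")    # S/T and A/A match; D/L scores -4
composite = composite_score(heavy, light)
```

A score of zero does not count as a match, because the table specifies a positive score.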
To set the mutation to the query for multiple residues, select them in the table and click
Mutate to Query.
To select residues in the table by picking them in the Workspace, select Pick residue, then
pick the residues. This allows you to use the 3D structure to find residues to mutate.
When you have decided which residues to mutate and what to mutate them to, click Perform
Back Mutations. The residues shown in the table that have a valid mutation in the Mutations
column are mutated, and the mutated residues and neighbors within 4 Å are refined.
After you have performed these tasks, you may want to refine the structure of the antibody
further to relieve strain. For example, you might want to do a side-chain refinement on the
framework to relieve minor clashes with the CDR loops.
Structures that are added to a database are automatically characterized and curated. Antibody
structures and the light and heavy chains are identified using a similarity search against known
antibodies, and structures that do not meet the similarity criteria are rejected. Then the constit-
uent regions of the antibody chains, including the framework region, as well as the six hyper-
variable loops, are identified and annotated for use in subsequent predictions.
When you first open the panel, the default database is loaded. Normally this database is
installed by an administrator and you do not have write privileges, so it is opened read-only.
You can import this database into your own database if you want, as described below. It is only
necessary to do this if you want to modify the data in some way, because you can specify
multiple databases for modeling.
To open a database, or to create a new database, click Open/Create. A file selector opens, and
you can navigate to the desired location. If you want to open a database, select the database
from the list of files. It should have the extension .db. If you want to create a new database,
enter the name in the File name text box.
The structures in the database are shown in the antibody table. You can restrict the structures
that are listed in the table by searching for strings in the table and only showing the rows that
contain the string. By default, all visible text is searched, but you can change it by clicking
Select and choosing the columns you want to search on. To return to searching all visible
columns, click Reset. The text box has a tool tip that explains the syntax for the search string,
which can include relational operators, wild cards, regular expressions, and some special
terms.
The table shows only a few columns by default. If you want to see all the columns, select All
for the Show columns option. The full set of columns includes the identity (residue range) and length of
each loop, and a range of information from the originating PDB structure. If you want to export
the information in the columns to a CSV file, click Export Table, and navigate to a location and
name the file in the file selector that opens.
To delete structures from the database, select them in the table and click Delete Selected Rows.
If you want to clear the entire database, click Delete All Data. You should exercise care when
deleting rows or all data, as these functions are not reversible.
Structures can be added to the database from several sources, represented by buttons in the Add
structures to database section. In each case, the imported data is automatically processed for
you: the antibody chains are identified, the constituent regions of the chains are determined,
and all the pertinent information required for subsequent modeling is saved in a rapidly
accessed format. You need only supply the antibody structures in PDB (or Maestro) format.
The structures are filtered to ensure that only those structures that have characteristics of anti-
bodies are included in the database. You can alter the criteria for filtering structures in the Anti-
body Database - Advanced Options dialog box, which you open by clicking Advanced Options.
You can also select the file types that are presented when importing structures from files.
In the Minimum sequence identity section, you can specify cutoffs for determining whether a
structure should be included in the database, based on percentage sequence identity. If no chain
in the structure passes the tests, it is not included in the database. A chain must meet the FR
threshold and either the VL or VH threshold to be included in the database. If a chain meets
only one of the VL or VH thresholds, it is only included as a light or heavy variable chain.
For the framework regions, you can specify which of the four framework regions to use in the
similarity analysis, by selecting the options for FR1, FR2, FR3, or FR4 in the Framework
regions used in similarity analysis section.
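The inclusion rule for chains can be sketched in Python as follows. The function names and the threshold values in the example are hypothetical, and the sketch does not cover how the FR identity is computed from the selected framework regions.

```python
def chain_roles(fr_identity, vl_identity, vh_identity, fr_min, vl_min, vh_min):
    """Return the roles ('light', 'heavy') under which a chain would be kept."""
    if fr_identity < fr_min:
        return set()  # the FR threshold must always be met
    roles = set()
    if vl_identity >= vl_min:
        roles.add("light")
    if vh_identity >= vh_min:
        roles.add("heavy")
    return roles


def include_structure(chains, fr_min, vl_min, vh_min):
    """A structure is kept if at least one of its chains passes the tests."""
    return any(chain_roles(fr, vl, vh, fr_min, vl_min, vh_min) for fr, vl, vh in chains)


# Hypothetical thresholds and percentage identities, for illustration only.
limits = dict(fr_min=60, vl_min=70, vh_min=70)
light_only = chain_roles(85, 90, 30, **limits)
rejected = chain_roles(40, 90, 90, **limits)
kept = include_structure([(40, 90, 90), (85, 20, 75)], **limits)
```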
You can create a database from the command line using a set of sequences or structures with
the following command:
The database file must have a .db extension. If you specify an existing database, the new
entries are appended; otherwise, a new database is created. The input files can be FASTA files or
Maestro files. If you provide structures, they are used as is; if you provide sequences without
structures, an Fv antibody model is built using the standard options for antibody prediction.
References
Getting Help
The docs folder (directory) of your software installation, which contains HTML and
PDF documentation. Index pages are available in this folder.
The Schrödinger web site, http://www.schrodinger.com/. In particular, you can use the
Knowledge Base, http://www.schrodinger.com/kb, to find current information on a range
of topics, and the Known Issues page, http://www.schrodinger.com/knownissues, to find
information on software issues.
To get information:
Pause the pointer over a GUI feature (button, menu item, menu, ...). In the main window,
information is displayed in the Auto-Help text box, which is located at the foot of the
main window, or in a tooltip. In other panels, information is displayed in a tooltip.
If the tooltip does not appear within a second, check that Show tooltips is selected under
General → Appearance in the Preferences panel, which you can open with CTRL+, (⌘,).
Not all features have tooltips.
Click the Help button in the lower right corner of a panel or press F1, for information
about a panel or the tab that is displayed in a panel. The help topic is displayed in the Help
panel. The button may have text or an icon:
Choose Help → Online Help or press CTRL+H (⌘H) to open the default help topic.
When help is displayed in the Help panel, use the navigation links in the help topic or
search the help.
Choose Help → Documentation Index to open a page that has links to all the documents.
Click a link to open the document.
Choose Help → Search Manuals to search the manuals. The search tab in Adobe Reader
opens, and you can search across all the PDF documents. You must have Adobe Reader
installed to use this feature.
For information on:
Problems and solutions: choose Help → Knowledge Base or Help → Advanced Help
Options → Known Issues → product.
New software features: choose Help → Advanced Help Options → New Features.
Python scripting: choose Help → Advanced Help Options → Python Module Overview.
Utility programs: choose Help → About → About Utilities.
Keyboard shortcuts: choose Help → Keyboard Shortcuts.
Installation and licensing: see the Installation Guide.
Running and managing jobs: see the Job Control Guide.
Using Maestro: see the Maestro User Manual.
Maestro commands: see the Maestro Command Reference Manual.
Web: http://www.schrodinger.com/supportcenter
E-mail: help@schrodinger.com
Mail: Schrödinger, 101 SW Main Street, Suite 1300, Portland, OR 97204
Phone: +1 888 891-4701 (USA, 8am-8pm Eastern Time)
+49 621 438-55173 (Europe, 9am-5pm Central European Time)
Fax: +1 503 299-4532 (USA, Portland office)
FTP: ftp://ftp.schrodinger.com
Generally, using the web form is best because you can add machine output and upload files, if
necessary. You will need to include the following information:
4. Click Create.
An archive file is created, and an information dialog box with the name and location of
the file opens. You can highlight and copy the name of the file.
5. Upload the file specified in the dialog box to the support web form.
If you have already submitted a support request, use the upload link in the email response
from Schrödinger to upload the file. If you need to submit a new request, you can upload
the file when you fill in the form.
6. Copy and paste any log messages from the window used to start the interface or the job
into the web form (or an e-mail message), or attach them as a file.
Windows: Right-click in the window and choose Select All, then press ENTER to
copy the text.
Mac: Start the Console application (Applications → Utilities), filter on the application
that you used to start the job (Maestro, BioLuminate, Elements), and copy the text.
If Maestro failed:
1. Open the Diagnostics panel.
Windows: Start → All Programs → Schrodinger-2015-2 → Diagnostics
Mac: Applications → SchrodingerSuite2015-2 → Diagnostics
Linux/command line: $SCHRODINGER/diagnostics
2. When the diagnostics have run, click Technical Support.
A dialog box opens, with instructions. You can highlight and copy the name of the file.
3. Upload the file specified in the dialog box to the support web form.
If you have already submitted a support request, use the upload link in the email response
from Schrödinger to upload the file. If you need to submit a new request, you can upload
the file when you fill in the form.
4. Upload the error files to the support web form.
The files should be in the following location:
Windows: %LOCALAPPDATA%\Schrodinger\appcrash
(Choose Start → Run and paste this location into the Open text box.)
Attach maestro_error_pid.txt and maestro.exe_pid_timestamp.dmp.
Mac: $HOME/Library/Logs/CrashReporter
(Go → Home → Library → Logs → CrashReporter)
Attach maestro_error_pid.txt and maestro_timestamp_machinename.crash.
Linux: $HOME/.schrodinger/appcrash
Attach maestro_error_pid.txt and crash_report_timestamp_pid.txt.
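The crash-file locations listed above can be resolved with a short Python helper when you collect files for a support request. The function name is hypothetical. The platform strings follow the values of sys.platform, and the folder names are taken from the list above.

```python
import ntpath
import posixpath


def appcrash_dir(platform, env):
    """Return the crash-file folder listed above for the given platform.

    platform is a sys.platform-style string; env is a mapping such as os.environ.
    """
    if platform.startswith("win"):
        return ntpath.join(env["LOCALAPPDATA"], "Schrodinger", "appcrash")
    if platform == "darwin":
        return posixpath.join(env["HOME"], "Library", "Logs", "CrashReporter")
    if platform.startswith("linux"):
        return posixpath.join(env["HOME"], ".schrodinger", "appcrash")
    raise ValueError("no crash-file location is listed for " + platform)


# Example with an explicit environment mapping (the home folder is illustrative).
linux_dir = appcrash_dir("linux", {"HOME": "/home/me"})
```

On a real system, call appcrash_dir(sys.platform, os.environ) after importing sys and os.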