Molrep MR Tutorial
MOLREP tutorial
1. Using MOLREP with ccp4i

Notations:

Select directory
A window whose title contains the words "Select directory".

Self RF solutions ... [Browse]
In the line containing the words "Self RF solutions", press the button (panel, etc.) labelled "Browse".

Project <pst> ...
In the line containing the word "Project", type pst in the text box.

[Auto Do not use] pseudo-translation vector
Change the selection from "Auto" to "Do not use" in the line containing the words "pseudo-translation vector".

[+] use sequence
Press the untitled button in the line containing the words "use sequence".

[] use sequence
Release the untitled button in the line containing the words "use sequence".

The parent window has a smaller indent than the child one. The line denoting an action has the same indent as the window where this action has to be performed.

1.1. Example: s100

The crystal has symmetry H3. The asymmetric unit contains a dimer. Data were collected to a resolution of 2.5 Å. The molecular replacement model is a monomer from 1mho, sequence identity 38%.
>ccp4i
CCP4Interface [Directories&ProjectDir] Directories & Project Directory [Add project] Project <s100> ... [Browse] Select directory Go to the tutorial: [Go up directory] ... [tutorial] and then to the s100 directory: [s100] [OK] Project for this session of CCP4Interface [s100] [Apply&exit] Warning [Dismiss]

This example is convenient for illustrating the basic options of MOLREP.

1.1.1. Simple run

The default parameters are used in this run. The user only needs to define two input files, containing the search model and the data. Choose the module from the pull-down menu on the left (yellow button):

CCP4Interface [Refinement Molecular Replacement]

and then the program to run:

CCP4Interface [Molrep-auto MR] Molrep Molecular Replacement MTZ in ... [Browse] Select Input MTZ file [s100.mtz] [OK] Model in ... [Browse] Select Input PDB file [monomer.pdb] [OK] [Run] [Run Now]

First, the program computes the rotation function. Then it finds the position of one monomer using the conventional translation function. Finally, it fixes the monomer just found and searches for the second one. The number of monomers to search for is estimated assuming a solvent content of about 50%. By default, the program checks whether the data are seriously anisotropic and, if so, applies an anisotropic correction. (In this example the anisotropy is not significant.)

To see the log-file, first select the job from the list of running/finished/failed jobs:

CCP4Interface [1 ... FINISHED molrep ...] [View Files From Job] [View Log File] CCP4I fileviewer

The log-file contains five tables. The first one is the list of peaks of the Cross Rotation Function (CRF), sorted by height. The second table shows the results of the Translation Function (TF) for these peaks (orientations). The second table, re-sorted by TF score, gives the third table. The search for the second monomer (with the first one fixed) is represented by the last two tables, which are organised similarly to the previous two. The final model contains two monomers oriented according to peaks 1 and 3 of the CRF.

CCP4I fileviewer [Quit]

More details can be found in another output file:

CCP4Interface [View Files From Job] [s100_1_molrep.doc] /s100_1_molrep.doc [Quit]
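The simple run estimates the number of monomers from an assumed solvent content of about 50%. The arithmetic behind such an estimate can be sketched with the standard Matthews relation (protein fraction of roughly 1.23/V_M, with V_M in Å^3/Da). This is only an illustration: the function name estimate_nmon and its arguments are hypothetical, and MOLREP's internal procedure may differ in detail.

```shell
#!/bin/sh
# Hypothetical helper, not part of MOLREP: estimate the number of monomers
# per asymmetric unit for an assumed solvent content of 50%.
#   $1 = unit cell volume (A^3)
#   $2 = number of symmetry operators in the unit cell
#   $3 = molecular mass of the search model (Da)
estimate_nmon() {
    awk -v v="$1" -v z="$2" -v mw="$3" 'BEGIN {
        vm = 1.23 / (1 - 0.5)      # Matthews coefficient at 50% solvent (2.46)
        n  = v / (z * mw * vm)     # monomers that fit one asymmetric unit
        r  = int(n + 0.5)          # round to the nearest integer
        if (r < 1) r = 1           # always search for at least one copy
        print r
    }'
}

# Illustrative numbers only (not the s100 cell):
# estimate_nmon 500000 9 10000
```

Halving the mass passed as the third argument roughly doubles the estimate, which is why the expected number of monomers has to be set by hand when only part of a molecule is used as the search model.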
1.1.2. Use of sequence

Modification of the model according to the sequence of the target protein usually improves the contrast and therefore increases the probability of finding the solution. In MOLREP this modification is done as follows.

Firstly, the sequence derived from the pdb-file is aligned to the target sequence. This alignment takes into account the 3-D structure of the model: gaps and insertions are considered unlikely within sequence fragments corresponding to helices and strands in the search model, and more probable in the loops and at the surface of the search model.

Secondly, the atoms of the aligned residues are matched, and atoms in the search model that have no counterpart in the target are deleted. For example, if VAL in the target sequence corresponds to LEU in the search model, then only N, CA, C, O, CB and CG of LEU are kept in the search model, and CD1 and CD2 are deleted. CG is renamed to CG1, LEU to VAL, and the residue number of LEU in the model is changed to the number of VAL in the target sequence. Residues in the search model that align to gaps in the target sequence are deleted. Note that no new atoms are added, because predictions are less reliable than experiment.

If the sequence similarity between the target and the search model is low (say 20%), this kind of modification can result in the deletion of too many atoms. Such a sparse model usually performs worse than the complete one. For such cases there are other ways to improve the contrast and to increase the probability of finding the solution: the use of oligomers as search models, deletion of loops, use of NMR models, and adjusting keywords such as SIM, COMP and RESMAX.

Molrep Molecular Replacement [+] use sequence Seq in ... [Browse] Select Input SEQ file [s100.seq] [OK] [Run] [Run Now] .. File Exists [continue] CCP4Interface [2 ... FINISHED molrep ...] [View Files From Job] [View Log File] CCP4I fileviewer

With use sequence on, the log-file shows the sequence alignment used for the model correction. In this example, the model correction makes no dramatic changes (only the score for the second monomer increases). This is not surprising, as the original model was already good enough.

[Quit]
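The LEU to VAL example from the description of the sequence-based correction can be sketched as a small filter over PDB ATOM records. This is not MOLREP's code: the function name leu_to_val is hypothetical, it converts every LEU it sees rather than only those aligned to VAL, and it leaves the residue numbers untouched. It only illustrates the rule "delete atoms without a counterpart, rename the rest, add nothing", using the fixed columns of the PDB format (atom name in columns 13-16, residue name in columns 18-20).

```shell
#!/bin/sh
# Hypothetical illustration of the pruning rule, reading PDB lines on stdin.
leu_to_val() {
    awk '
    /^ATOM/ && substr($0, 18, 3) == "LEU" {
        name = substr($0, 13, 4)
        gsub(/ /, "", name)
        # CD1 and CD2 have no counterpart in VAL: drop them
        if (name == "CD1" || name == "CD2") next
        # CG of LEU becomes CG1 of VAL; no new atoms (e.g. CG2) are added
        if (name == "CG") $0 = substr($0, 1, 12) " CG1" substr($0, 17)
        # rename the residue itself
        $0 = substr($0, 1, 17) "VAL" substr($0, 21)
    }
    { print }
    '
}

# Usage (file names are placeholders):
# leu_to_val < model.pdb > pruned.pdb
```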
1.1.3. Self-rotation function

The self-rotation function (SRF) reveals non-crystallographic symmetry (NCS). It gives independent information (along with the Matthews coefficient) about the contents of the asymmetric unit. Knowledge of the oligomeric state of the protein in the unknown structure also helps to decide which oligomer (if available) should be used as a search model. Moreover, if the SRF shows an oligomer with point group symmetry, the NCS operators derived from the SRF can be used in the so-called Locked Rotation Function. This is discussed in the next subsection.

Molrep Molecular Replacement Do [molecular replacement self rotation function] [Run] [Run Now]

Output file s100_3_rf.ps shows sections of the Self Rotation Function. It can be viewed as follows (if a ps-viewer is defined in ccp4i):

CCP4Interface [3 ... FINISHED molrep ...] [View Files From Job] [s100_3_rf.ps]

The dimer produces a pronounced peak at 83° and 26° in the section 180°.

Note that if there are many monomers in the AU of the unknown structure (the program does not know this at this step, as it uses only the experimental data and does not interpret the SRF automatically), the radius of integration has to be reduced to eliminate the effect of intermolecular vectors. A reasonable choice of integration radius is the diameter of the search model. In our example the radius derived automatically from the unit cell parameters is reasonable and there is no need to define it manually. When this is needed, the integration radius can be changed as follows.

Molrep Molecular Replacement [Parameters for Self-Rotation Function] Search radius <30> [Run] [Run Now]

1.1.4. Lock-rotation function

In many cases, the correct peaks are very low in the list of the strongest CRF peaks and may be absent from the list that is further tested by the TF. These are the cases where the lock-rotation function can help. The lock-rotation function is applicable when the NCS operators form a point group (possibly together with a subset of operators of the crystal point group). It applies the NCS operators to the Cross Rotation Function and averages the results. After this, the correct peaks become stronger and the ghost peaks sink.

Molrep Molecular Replacement Do [self rotation function molecular replacement]

Adjust specific parameters (grey panels at the bottom of the job window):

[Search parameters] [+] Locked rotation function Use Self Rotation function with < 1 > ... Self RF solutions ... [Browse] Select Input MRO File [s100_3_srf.molrep_rf] [OK] Self RF solutions ... [View] .../s100_3_srf

Click the first peak to comment it out (it is the identity operator or a symmetry equivalent of the latter):
[Sol RF 1 ...] [Save&Exit] [Run] [Run Now] .. File Exists [continue] CCP4Interface [4 ... FINISHED molrep ...] [View Files From Job] [View Log File] CCP4I fileviewer

In our example, the use of the LRF instead of the CRF clearly improves the contrast in the orientation search. It also pushes the correct peaks to the first and second positions, whereas they were first and third in the previous run. Note that there are no significant changes in the TF (as it should be).

[Quit]
1.1.5. Use of dimer found by MOLREP

If two or more monomers are found, the program tests all their symmetry equivalents to detect a dimer. If a dimer is detected, its coordinates are written to a separate pdb-file. If there is more than one dimer in the AU, the dimer found can be used in subsequent runs. In general, the use of an oligomer increases the probability of finding the solution, provided that the search oligomer is similar to that in the unknown structure.

Molrep Molecular Replacement [] use sequence [] Locked rotation function Model in ... [Browse] Select Input PDB file [s100_4_molrep_dimer.pdb] [OK] [Run] [Run Now] CCP4Interface [5 ... FINISHED molrep ...] [View Files From Job] [View Log File] CCP4I fileviewer

The contrast is, of course, significantly better than in the previous runs, where the search model was a monomer.

[Quit]

1.1.6. Rigid body refinement

This is the last step to perform before proceeding with restrained refinement, and the last step shown for this example.

Molrep Molecular Replacement Do [molecular replacement pure RB refinement] Model in ... [Browse] Select Input PDB file [s100_4_molrep1.pdb] [OK]

Bug: Coords out has the same name as Model in; this name has to be changed:

Coords out <s100_6_molrep1.pdb> ... [Run] [Run Now] [Close] CCP4Interface [6 ... FINISHED molrep ...] [View Files From Job] [View Log File] CCP4I fileviewer

The gain in CC and R may seem small, but these are not good criteria when the model is still far from the real structure. The starting model for the subsequent restrained refinement is improved, and this improvement is sometimes crucial for the interpretation of the map after the restrained refinement.

[Quit] [Exit]

1.2. Example: handling translational NCS

Space group is P21. The asymmetric unit contains four monomers, which are pair-wise related by translational NCS.

>ccp4i

CCP4Interface [Directories&ProjectDir] Directories & Project Directory [Add project] Project <pst> ... [Browse] Select directory [Go up directory] [pst] [OK] Project for this session of CCP4Interface [pst] [Apply&exit] Warning [Dismiss]

1.2.1. Automatic translational NCS mode

The program automatically detects translational NCS and uses this information when it performs the TF. In effect, the search model contains two monomers related by a non-crystallographic translation, although technically everything is done in reciprocal space.

CCP4Interface [Molrep-auto MR] Molrep Molecular Replacement MTZ in ... [Browse] Select Input MTZ file [pst.mtz] [OK] Model in ... [Browse] Select Input PDB file [model.pdb] [OK] [Run] [Run Now] [1 ... FINISHED molrep ...] [View Files From Job] [View Log File] CCP4I fileviewer [Quit]
1.2.2. Translational NCS mode off

The translational NCS mode can be switched off, in which case a single monomer is the search model. This is essential, e.g., in specific cases of disordered crystals, where there are strong peaks in the Patterson map which do not correspond to translational NCS.

Molrep Molecular Replacement [Search parameters] [Auto Do not use] pseudo-translation vector [Run] [Run Now] .. File Exists [continue] [Close] CCP4Interface [2 ... FINISHED molrep ...] [View Files From Job] [View Log File] CCP4I fileviewer

In this particular case, it was not a good idea to switch the translational NCS mode off: only three of the four monomers are found.

[Quit] [Exit]
1.3. Example: mobile domains

Space group is P6522, resolution is 2.8 Å, and the asymmetric unit contains one monomer. The search model is 1s2o. There are two mobile domains in the molecule, which can be seen using molecular graphics:

>cd tutorial/1tj3
>coot --pdb 1s2oA.pdb

Coot [Display Manager] Display Control 0 ... [Bonds ... C-alphas] [OK] Two domains are clearly seen. [File] [Exit]

This example illustrates a frequently used technique, in which the search model is dissected into relatively rigid domains. The orientations and positions of these domains are found in sequential runs of molecular replacement. This is applicable, for instance, when the structure of a holo-enzyme is known and the structure of its apo-form is to be solved. In this example there are four steps of structure solution.

This example also demonstrates a relatively new technique, MR with the Spherically Averaged Phased Translation Function (SAPTF). A brief discussion of this technique follows. Given estimates of the phases and an atomic model of a homologous protein, MR techniques can be used to position the model in the density. The standard approach prescribes the following route: (i) conventional rotation function, which does not use phase information, (ii) phased translation function. In MR with SAPTF the phase information is used in all steps, and the order of the steps is changed: (i) SAPTF (phase information is used!), (ii) phased rotation function, (iii) phased translation function to confirm the position of the molecule found in step (i). A third way is a 6-dimensional search, which is somewhat slower (hours instead of minutes).

Finally, this example shows the fitting of two homologous structures as implemented in MOLREP. A specific feature of this fitting is that it does not need a preliminary sequence alignment: it fits the largest fragments among those which are similar in 3-D and ignores the rest of the structure. These features can be used to define domains in cases where their definition is not obvious.

>ccp4i

CCP4Interface [Directories&ProjectDir] Directories & Project Directory [Add project] Project <1tj3> ... [Browse] Select directory [Go up directory] [1tj3] [OK] Project for this session of CCP4Interface [1tj3] [Apply&exit] Warning [Dismiss]

Here one can undertake a straightforward MR, as e.g. described in ??, and find that the automatic MR finds the solution, but refinement does not improve the R-factors much and there is no density for domain 2. This is because of the flexibility of the molecule. At this point, it is worth noting that a domain in a wrong position is a double error in terms of structure factors, and hence in terms of density interpretation: firstly, it is not present in the right place, and secondly, it is present where it should not be. Thus, it is a good idea to start from the very beginning and to solve the structure by parts, searching for one domain after another.

1.3.1. Search for domain 1 using MOLREP

This is a straightforward run, except that the expected number of monomers is defined manually. This is because MOLREP estimates the number of monomers in the AU of the unknown crystal structure assuming a solvent content of about 50%. As we deal with approximately half of the model, the estimated number of monomers would be two instead of one. Of course, in the automatic run the program would not find the non-existent second monomer, but it would spend some time attempting to do so.

CCP4Interface [Molrep-auto MR] Molrep Molecular Replacement MTZ in ... [Browse] Select Input MTZ file [1tj3.mtz] [OK] Model in ... [Browse] Select Input PDB file [1s2oA_dom1.pdb] [OK] [Search parameters] Search for <1> monomers in the asymmetric unit [Run] [Run Now] [Close]

Output: 1s2oA_dom1_molrep1.pdb: MR solution for domain 1.

1.3.2. Refinement of the first domain using REFMAC
CCP4Interface [Molecular Replacement Refinement] [Run Refmac5] Run Refmac5 MTZ in ... [Browse] Select Input MTZ file [1tj3.mtz] [OK] PDB in ... [Browse] Select Input PDB file [1s2oA_dom1_molrep1.pdb] [OK] [Refinement parameters] Do <20> cycles of maximum likelihood ... [Run] [Run&View Com File] No harvest dataset name given [Continue] View Command File Substitute auto for MATRIX 0.3. This will be the default in the next release of CCP4. [Continue] [Close] [2 ... FINISHED refmac5 ...] [View Files From Job] [View Log File] CCP4I fileviewer

R-free decreases!

[Quit]

Output: 1s2oA_dom1_molrep1_refmac1.pdb: refined model of domain 1, 1tj3_refmac1.mtz: MTZ file including phases estimated by REFMAC.

1.3.3. Search for the domain 2 with domain 1 fixed

In this step, MR with SAPTF is used; see the comments at the beginning of this section.

CCP4Interface [Refinement Molecular Replacement] [Molrep-auto MR] Molrep Molecular Replacement Do [molecular replacement SAPTF + Phased RF] [+] Input fixed model MTZ in ... [Browse] Select Input MTZ file [1tj3_refmac1.mtz] [OK] Use [+] experimental phases from input MTZ file FP [FWT] SIGFP [Unassigned] PH [PHWT] Weight [Unassigned] Model in ... [Browse] Select Input PDB file [1s2oA_dom2.pdb] [OK] Fixed in ... [Browse] Select Input PDB file [1s2oA_dom1_molrep1_refmac1.pdb] [OK] [Experimental Data] Use [observed mask] structure factors [Search parameters] Search for <1> monomers in the asymmetric unit [Run] [Run Now] [Close] [3 ... FINISHED molrep ...] [View Files From Job] [View Log File] CCP4I fileviewer

Only the correct orientation passed the comparison of the positions found by SAPTF at step (i) and by the Phased Translation Function at step (iii) (see the discussion at the beginning of this section).

[Quit]
Output: 1s2oA_dom2_molrep1.pdb: model containing both domains, where domain 1 is taken from the Fixed model and domain 2 is the one just found.

1.3.4. Refinement of the whole molecule
Run Refmac5 MTZ in ... [Browse] Select Input MTZ file [1tj3.mtz] [OK] PDB in ... [Browse] Select Input PDB file [1s2oA_dom2_molrep1.pdb] [OK] [Run] [Run&View Com File] No harvest dataset name given [Continue] View Command File Substitute auto for MATRIX 0.3 [Continue] [Close] [4 ... FINISHED refmac5 ...] [View Files From Job] [View Log File] CCP4I fileviewer

This is a correct solution: the R-free is already below 30%, without any rebuilding or addition of water.

[Quit]

Output: 1s2oA_dom2_molrep1_refmac1.pdb: complete refined model.

1.3.5. Fitting of original and final models

Model is the moving model to fit, and Model2 is the fixed model to match.

CCP4Interface [Refinement Molecular Replacement] [Molrep-auto MR] Molrep Molecular Replacement Do [SAPTF fitting two molecules] Model in ... [Browse] Select Input PDB file [1s2oA.pdb] [OK] Model2 in ... [Browse] Select Input PDB file [1s2oA_dom2_molrep1_refmac1.pdb] [OK] [Run] [Run Now] [Close] [Exit]

Output: 1s2oA_molrep1.pdb: initial model fitted to the final model.

1.3.6. Comparison of the initial and final structures
Coot [File] [Open Coordinates] Select Coordinates File [Filter] [1s2oA_molrep1.pdb] [OK] [Display Manager] Display Control 0 ... [Bonds ... C-alphas] 1 ... [Bonds ... C-alphas] [OK]

Note that the superposition turns out to be by the larger domains. Do not be confused: the two domains in the final model have different chain identifiers, so their C-alpha representations in COOT are coloured differently.

[File] [Exit]
1.4. MR with twinned data

In simple cases of twinning, MR automatically finds the solution, which can be refined to give electron density of reasonable quality.
CCP4Interface [Directories&ProjectDir] Directories & Project Directory [Add project] Project <twin> ... [Browse] Select directory [Go up directory] [twin] [OK] Project for this session of CCP4Interface [twin] [Apply&exit] Warning [Dismiss]
1.4.1.
CCP4Interface [Molecular Replacement Refinement] [Run Sfcheck & Procheck] Check - Run Sfcheck and Procheck [] Run Procheck to analyse structure geometry

Due to a bug, the following step must be performed here, although the file it defines is not used:

Coords in ... [Browse] Select Input PDB file [monomer.pdb] [OK]
... Run Sfcheck to analyse [experimental data only] MTZ in ... [Browse] Select Input MTZ file [twin.mtz] [OK] [Run] [Run Now] [Close] [1 ... FINISHED check ...] [View Files From Job] [monomer_sfcheck1.ps]

The last line on the first page says that the twinning fraction is 0.407; the Partial Twinning Test on the second page was used to estimate it.

1.4.2. MR - structure solution
CCP4Interface [Refinement Molecular Replacement] [Molrep-auto MR] Molrep Molecular Replacement MTZ in ... [Browse] Select Input MTZ file [twin.mtz] [OK] Model in ... [Browse] Select Input PDB file [monomer.pdb] [OK] [+] Use sequence Seq in ... [Browse] Select Input SEQ file [s100.seq] [OK] [Run] [Run Now] [Close]

Output: monomer_molrep1.pdb: MR solution.

1.4.3. Refinement of the MR solution using REFMAC with no account for twinning

CCP4Interface [Molecular Replacement Refinement] [Run Refmac5] Run Refmac5 MTZ in ... [Browse] Select Input MTZ file [twin.mtz] [OK] PDB in ... [Browse] Select Input PDB file [monomer_molrep1.pdb] [OK] [Run] [Run&View Com File] No harvest dataset name given [Continue] View Command File Substitute auto for MATRIX 0.3 [Continue] [Close] [3 ... FINISHED refmac5 ...] [Exit]

Output: monomer_molrep1_refmac1.pdb: refined model, twin_refmac1.mtz: MTZ file to compute the electron density.

1.4.4. Checking the density with COOT

>coot --pdb monomer_molrep1_refmac1.pdb
Coot [File] [Auto Open MTZ] Select Dataset File [Filter] [twin_refmac1.mtz] [OK]

The density is reasonable.

[File] [Exit]
2. Using MOLREP from the command line

This option is illustrated below with the examples of the previous section. Details of using MOLREP from the command line can be seen by running MOLREP with the key -h.
>molrep -h
2.1. Example: s100

2.1.1. Simple run

The only two keys used here are -f and -m. They must be followed by the names of the files containing the experimental data and the model, respectively.
>cd tutorial/s100
>molrep -f s100.mtz -m monomer.pdb
Output: molrep.pdb: solution, molrep.doc: detailed log-file, containing more information than goes to the standard output, molrep_dimer.pdb: the dimer found in this run, molrep_rf.tab: list of CRF-peaks, molrep.bat: command file to rerun this job.

2.1.2. Use of sequence

The extra key -s requests the use of sequence and must be followed by the name of the file containing the sequence in fasta-format.
>molrep -f s100.mtz -m monomer.pdb -s s100.seq
Output filenames are the same as in the previous run. The sequence alignment (using 3-D structural information, see above) is shown in molrep.doc.
2.1.3. Self-rotation function

If only -f is given, calculation of the self-rotation function is assumed.
>molrep -f s100.mtz
Output: molrep_rf.ps: figures showing sections of the SRF, molrep_srf.tab: list of SRF-peaks, molrep.doc: detailed log-file, which, in particular, contains the relations between SRF-peaks and all their symmetry equivalents, molrep.bat: command file to rerun this job.

2.1.4. Lock-rotation function

Edit molrep_srf.tab to delete the first peak. The key -i switches on the interactive mode and allows entering keywords specifying that the LRF is required. In this particular run the keyword file_tsrf can be omitted, as molrep_srf.tab is the default name for the list of SRF-peaks. When keyword input is completed, press enter.

>molrep -f s100.mtz -m monomer.pdb -s s100.seq -i
lock y
file_tsrf molrep_srf.tab
nsrf 1

2.1.5. Use of dimer found by MOLREP

Note that sequence correction is not possible when the model contains more than one chain. In our protocol it is not needed, as the monomers in molrep_dimer.pdb are already corrected. In the general case, each chain can be corrected separately using molrep, and the corrected chains can then be merged into one file. The following run performs sequence correction without further RF and TF:

>molrep -m molrep.pdb -s s100.seq

The output of the last run, align.pdb, can be used in the next steps of MR.

2.1.6. Rigid body refinement

The file molrep.pdb, containing the MR solution, has to be renamed. The key -mx instead of -m tells the program that rigid-body refinement has to be performed rather than the molecular replacement search.

>mv molrep.pdb solution.pdb
>molrep -f s100.mtz -mx solution.pdb

2.2. Example: handling translational NCS

The next two runs are very similar to the previous examples. The information about translational NCS can be seen in the log-file after the first run, where pst is on by default.

2.2.1. Automatic translational NCS mode

2.2.2. Translational NCS mode off

2.3. Example: mobile domains

In this example the runs of MOLREP alternate with REFMAC runs. What is shown is how to run the programs from the command line, but this can easily be turned into shell scripts.

2.3.1. Search for domain 1 using MOLREP

>cd ../1tj3
>molrep -f 1tj3.mtz -m 1s2oA_dom1.pdb -i
nmon 1

>mv molrep.pdb step1.pdb
>mv molrep.doc step1.doc

2.3.2. Refinement of the first domain using REFMAC

>refmac5 xyzin step1.pdb hklin 1tj3.mtz \
 xyzout step2.pdb hklout step2.mtz
weight auto
ncyc 20
labin FP=FP SIGFP=SIGFP FREE=FreeR_flag
end

2.3.3. Search for the domain 2 with domain 1 fixed

>molrep -f step2.mtz -mx step2.pdb -m 1s2oA_dom2.pdb -i
labin F=FWT PH=PHWT
diff m
prf y
nmon 1
np 50

>mv molrep.pdb step3.pdb
>mv molrep.doc step3.doc

Here, the keyword np controls the number of analysed peaks of the Phased Rotation Function. The default value of 30 would work for this example, but in general it is a good idea to increase it for tasks like this, where a difference map is used instead of the experimental one.

2.3.4. Refinement of the whole molecule

>refmac5 xyzin step3.pdb hklin 1tj3.mtz \
 xyzout step4.pdb hklout step4.mtz
weight auto
ncyc 20
labin FP=FP SIGFP=SIGFP FREE=FreeR_flag
end

2.3.5. Fitting of original and final models

Input: 1s2oA.pdb: fixed model, step4.pdb: moving model. Output: molrep.pdb: the moving model fits the fixed one by the larger domain. (Note the second strong peak in the TF in the last run. This peak corresponds to the fit by the smaller domain.)
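Section 2.3 remarks that the command-line runs can be turned into shell scripts. A minimal sketch of steps 2.3.1 to 2.3.4 is given below. It uses only the keys and keywords shown above; the function and variable names are illustrative, the keywords are fed through here-documents in place of interactive typing (the empty line stands for the final enter), and it assumes that both programs return a non-zero exit status on failure, which should be checked for the installed versions.

```shell
#!/bin/sh
# Sketch only: chains the runs of section 2.3. Run it in tutorial/1tj3.
DATA=1tj3.mtz            # experimental data
DOM1=1s2oA_dom1.pdb      # search model for domain 1
DOM2=1s2oA_dom2.pdb      # search model for domain 2

# Keep the MOLREP output of a step under a step-specific name ($1 = prefix).
save_step() {
    mv molrep.pdb "$1.pdb" && mv molrep.doc "$1.doc"
}

# Restrained refinement: $1 = input model, $2 = prefix of the output files.
refine() {
    refmac5 xyzin "$1" hklin "$DATA" xyzout "$2.pdb" hklout "$2.mtz" <<EOF
weight auto
ncyc 20
labin FP=FP SIGFP=SIGFP FREE=FreeR_flag
end
EOF
}

# 2.3.1: conventional search for domain 1, one copy expected.
search_domain1() {
    molrep -f "$DATA" -m "$DOM1" -i <<EOF || return 1
nmon 1

EOF
    save_step step1
}

# 2.3.3: search for domain 2 with the refined domain 1 fixed.
search_domain2() {
    molrep -f step2.mtz -mx step2.pdb -m "$DOM2" -i <<EOF || return 1
labin F=FWT PH=PHWT
diff m
prf y
nmon 1
np 50

EOF
    save_step step3
}

# Run the four steps in order, stopping at the first failure.
solve_by_domains() {
    search_domain1 && refine step1.pdb step2 &&
    search_domain2 && refine step3.pdb step4
}

# To run the whole protocol, uncomment:
# solve_by_domains
```

Keeping each step in its own function makes it easy to rerun a single stage, for example search_domain2 with a larger np, without repeating the earlier ones.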