Motivation: The development of an open-source platform to predict protein 1D features and 3D structure is an important task. In this paper, we report an open-source toolkit for protein 3D structure modeling, named OPUS-X. It contains three modules: OPUS-TASS2, which predicts protein torsion angles, secondary structure and solvent accessibility; OPUS-Contact, which predicts the distance and orientation information between different residue pairs; and OPUS-Fold2, which uses the constraints derived from the first two modules to guide folding.
Results: OPUS-TASS2 is an upgraded version of our previous method OPUS-TASS. OPUS-TASS2 integrates protein global structure information and significantly outperforms OPUS-TASS. OPUS-Contact combines multiple raw co-evolutionary features with protein 1D features predicted by OPUS-TASS2, and delivers better results than the open-source state-of-the-art method trRosetta. OPUS-Fold2 is a complementary version of our previous method OPUS-Fold. OPUS-Fold2 is a gradient-based protein folding framework built on differentiable energy terms, as opposed to OPUS-Fold, which is a sampling-based method designed to handle non-differentiable terms. OPUS-Fold2 exhibits comparable performance to the Rosetta folding protocol in trRosetta when using identical inputs. OPUS-Fold2 is written in Python and TensorFlow 2.4, which makes source-code-level modification straightforward.
Python 3.7
TensorFlow 2.4
hh-suite3
The standalone version of OPUS-X is hosted on Baidu Drive with password `93sa`. Also, it can be downloaded directly from Here. The corresponding files for the test sets (CAMEO-Hard61 (60), CAMEO (78), CASP13 (26), and CASP14 (15)) we used can be downloaded from Baidu Drive with password `2tn4`, and they can be downloaded directly from Here.
More information about generating `pssm` and `hhm` can be found in our OPUS-TASS repo.
- Change `hhsuite3_path` to your local hh-suite3 path.
- Put `pssm` (generated by psiblast), `hhm` (generated by hh-suite3), `aln` (generated by DeepMSA) and `npz` (generated by trRosetta) into `OPUS-TASS2_and_Contact\tmp_files`. Note that `npz` can also be generated by other predictors that output trRosetta-style results.
- Change `trr_cons_path` to the path of your trRosetta-style outputs. Here, we use the outputs from OPUS-Contact (*.contact.npz).
- Change `init_torsions` to your OPUS-TASS2 outputs path (*.tass2).
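Before pointing `trr_cons_path` at a file, it can help to confirm that the file has a trRosetta-style layout. The sketch below is not part of OPUS-X: it assumes the key names `dist`, `omega`, `theta` and `phi`, each stored as an L × L × bins array, and the helper name `summarize_constraints` is hypothetical. The demo file is synthetic, and its bin counts (37, 25, 25, 13) are an assumption based on trRosetta's usual output, so check them against your own predictor.

```python
import os
import tempfile

import numpy as np

# Assumed trRosetta-style keys; verify against your own predictor's output.
EXPECTED_KEYS = ("dist", "omega", "theta", "phi")


def summarize_constraints(npz_path):
    """Return {key: shape} after checking each array is L x L x bins."""
    summary = {}
    with np.load(npz_path) as data:
        for key in EXPECTED_KEYS:
            if key not in data.files:
                raise KeyError(f"missing array '{key}' in {npz_path}")
            arr = data[key]
            if arr.ndim != 3 or arr.shape[0] != arr.shape[1]:
                raise ValueError(f"'{key}' should be L x L x bins, got {arr.shape}")
            summary[key] = arr.shape
    return summary


# Synthetic stand-in for a real *.contact.npz file (10 residues).
L = 10
demo_path = os.path.join(tempfile.mkdtemp(), "demo.contact.npz")
np.savez(
    demo_path,
    dist=np.zeros((L, L, 37)),
    omega=np.zeros((L, L, 25)),
    theta=np.zeros((L, L, 25)),
    phi=np.zeros((L, L, 13)),
)
shapes = summarize_constraints(demo_path)
print(shapes)
```

Running the same helper on a real file should raise early if a key is missing or an array is not square in its first two dimensions.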
Predictors | SS3 | SS8 | MAE(ϕ) | MAE(ψ) | ASA |
---|---|---|---|---|---|
NetSurfP-2.0 | 83.78 | 70.38 | 20.1 | 29.99 | 0.779 |
SPOT-1D | 83.69 | 70.72 | 19.55 | 29.97 | 0.775 |
OPUS-TASS | 84.15 | 72.12 | 19.26 | 29.47 | - |
OPUS-TASS2 | 84.55 | 72.5 | 19.07 | 28.79 | 0.797 |
Predictors | SS3 | SS8 | MAE(ϕ) | MAE(ψ) | ASA |
---|---|---|---|---|---|
NetSurfP-2.0 | 80.68 | 69.14 | 19.94 | 31.43 | 0.749 |
SPOT-1D | 82.37 | 71.11 | 19.39 | 30.1 | 0.744 |
OPUS-TASS | 83.4 | 73.27 | 18.85 | 28 | - |
OPUS-TASS2 | 85.96 | 76.28 | 17.94 | 25.17 | 0.804 |
Predictors | SS3 | SS8 | MAE(ϕ) | MAE(ψ) | ASA |
---|---|---|---|---|---|
NetSurfP-2.0 | 75.39 | 61.87 | 22.62 | 40.54 | 0.68 |
SPOT-1D | 75.19 | 61.41 | 23.19 | 43.98 | 0.663 |
OPUS-TASS | 77.3 | 63.53 | 21.91 | 38.93 | - |
OPUS-TASS2 | 80.87 | 68.26 | 20.53 | 33.48 | 0.735 |
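The MAE(ϕ) and MAE(ψ) columns above are mean absolute errors in degrees. Because torsion angles are periodic, such errors are commonly computed with wrap-around, so that -179° and 179° differ by 2° rather than 358°. The sketch below assumes that convention, which this README does not spell out; the function name `angular_mae` is hypothetical.

```python
import numpy as np


def angular_mae(pred_deg, true_deg):
    """Mean absolute error between angles in degrees, with 360-degree wrap-around."""
    pred = np.asarray(pred_deg, dtype=float)
    true = np.asarray(true_deg, dtype=float)
    diff = np.abs(pred - true) % 360.0
    # Take the shorter way around the circle.
    diff = np.minimum(diff, 360.0 - diff)
    return float(diff.mean())


# Per-residue errors here are 2, 10 and 30 degrees.
mae = angular_mae([-179.0, 60.0, -60.0], [179.0, 50.0, -90.0])
print(mae)
```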
We use the TM-score to measure the accuracy of the 3D structures predicted by the trRosetta folding protocol when its constraints are taken from the trRosetta outputs and from the OPUS-Contact outputs, respectively.
Predictors | CAMEO-Hard61 (60) | CAMEO (78) | CASP13 (26) | CASP14 (15) |
---|---|---|---|---|
trRosetta | 0.600 | 0.668 | 0.659 | 0.427 |
OPUS-Contact | 0.616 | 0.684 | 0.671 | 0.469 |
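For readers unfamiliar with the metric, the TM-score of a model against a native structure of length L is the average of 1 / (1 + (d_i / d0)^2) over aligned residues, where d_i is the Cα distance of residue i and d0 = 1.24 (L - 15)^(1/3) - 1.8 Å. The sketch below evaluates that sum for coordinates that are already superposed; the reference TM-score program additionally searches for the superposition that maximizes the score, which is omitted here. The function name `tm_score_fixed` and the 0.5 Å floor on d0 for short chains are assumptions of this sketch.

```python
import numpy as np


def tm_score_fixed(pred_ca, native_ca):
    """TM-score for already-superposed C-alpha coordinates (no superposition search)."""
    pred = np.asarray(pred_ca, dtype=float)
    native = np.asarray(native_ca, dtype=float)
    length = native.shape[0]
    # Length-dependent distance scale; floored at 0.5 A for short chains (assumption).
    d0 = 1.24 * np.cbrt(length - 15) - 1.8 if length > 15 else 0.5
    d0 = max(d0, 0.5)
    d = np.linalg.norm(pred - native, axis=1)
    return float(np.mean(1.0 / (1.0 + (d / d0) ** 2)))


rng = np.random.default_rng(0)
native = rng.normal(size=(100, 3)) * 10.0
shifted = native + np.array([1.0, 0.0, 0.0])  # every residue displaced by 1 A
print(tm_score_fixed(native, native), tm_score_fixed(shifted, native))
```

A model identical to the native scores 1, and the score falls toward 0 as per-residue distances grow relative to d0.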
We compare the folding performance of OPUS-Fold2 and the Rosetta folding protocol in trRosetta on CAMEO-Hard61 (60). `dist` denotes the prediction obtained by distance-guided folding exclusively, `ori` denotes the prediction obtained by orientation-guided (ω, θ and φ) folding exclusively, `dist+ori` denotes the prediction obtained using both of them, and `complete` denotes the prediction obtained using trRosetta's original complete energy terms (including the Ramachandran, the omega, the van der Waals (vdw), and the centroid backbone hydrogen bonding (cen_hb) terms).
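To give a feel for what distance-guided, gradient-based folding means, here is a toy sketch. It is not the OPUS-Fold2 implementation: it uses plain NumPy with a hand-written gradient instead of TensorFlow, optimizes Cartesian Cα coordinates directly, and uses a simple harmonic loss on pairwise distances. The helix parameters, noise level, step size and function name are all illustrative assumptions.

```python
import numpy as np


def distance_loss_and_grad(x, target):
    """Harmonic loss sum_{i<j} (|x_i - x_j| - target_ij)^2 and its gradient w.r.t. x."""
    diff = x[:, None, :] - x[None, :, :]      # (N, N, 3), diff[i, j] = x_i - x_j
    dist = np.linalg.norm(diff, axis=-1)      # (N, N)
    np.fill_diagonal(dist, 1.0)               # avoid division by zero on the diagonal
    err = dist - target
    np.fill_diagonal(err, 0.0)                # self-pairs contribute nothing
    loss = 0.5 * np.sum(err ** 2)             # symmetric matrix counts each pair twice
    grad = 2.0 * np.sum((err / dist)[:, :, None] * diff, axis=1)
    return loss, grad


# Toy target: C-alpha trace of an idealized helix (20 residues).
idx = np.arange(20)
angle = np.deg2rad(100.0) * idx
true_x = np.stack([2.3 * np.cos(angle), 2.3 * np.sin(angle), 1.5 * idx], axis=1)
target = np.linalg.norm(true_x[:, None, :] - true_x[None, :, :], axis=-1)

# Start from a perturbed structure and run plain gradient descent.
rng = np.random.default_rng(0)
x = true_x + rng.normal(scale=0.3, size=true_x.shape)
initial_loss, _ = distance_loss_and_grad(x, target)
for _ in range(500):
    _, grad = distance_loss_and_grad(x, target)
    x = x - 0.01 * grad
final_loss, _ = distance_loss_and_grad(x, target)
print(initial_loss, final_loss)
```

In a real setting the target distances would come from predicted distributions rather than a known structure, and orientation terms would add further differentiable penalties to the same loss.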
OPUS-Fold2 optimization process of target 2020-01-18_00000081_1.pdb (with 444 residues in length) in CAMEO-Hard61 (60).
Some intermediate structures of target 6BZT_D_21_522.pdb (with 501 residues in length) during the optimization process of OPUS-Fold2.
This project is freely available for academic use only.
@article{xu2021opus1,
title={OPUS-X: an open-source toolkit for protein torsion angles, secondary structure, solvent accessibility, contact map predictions and 3D folding},
author={Xu, Gang and Wang, Qinghua and Ma, Jianpeng},
journal={Bioinformatics},
year={2021},
publisher={Oxford University Press}
}