Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

DCI: An Accurate Quality Assessment Criteria for Protein Complex Structure Models

[Uncaptioned image] Wenda Wang
Institute for Mathematical Sciences
Renmin University of China
Beijing, China, 100872
wangwenda87@ruc.edu.cn
&Jiaqi Zhai
Institute for Mathematical Sciences
Renmin University of China
Beijing, China, 100872
2022000744@ruc.edu.cn
\ANDHe Huang
Institute for Mathematical Sciences
Renmin University of China
Beijing, China, 100872
hehuang@ruc.edu.cn
&* Xinqi Gong
Institute for Mathematical Sciences
Renmin University of China
Beijing, China, 100872
xinqigong@ruc.edu.cn
Abstract

The structure of proteins is the basis for studying protein function and drug design. The emergence of AlphaFold 2 has greatly promoted the prediction of protein 3D structures, and it is of great significance to give an overall and accurate evaluation of the predicted models, especially the complex models. Among the existing methods for evaluating multimer structures, DockQ is the most commonly used. However, as a more suitable metric for complex docking, DockQ cannot provide a unique and accurate evaluation in the non-docking situation. Therefore, it is necessary to propose an evaluation strategy that can directly evaluate the whole complex without limitation and achieve good results. In this work, we proposed DCI score, a new evaluation strategy for protein complex structure models, which only bases on distance map and CI (contact-interface) map, DCI focuses on the prediction accuracy of the contact interface based on the overall evaluation of complex structure, is not inferior to DockQ in the evaluation accuracy according to CAPRI classification, and is able to handle the non-docking situation better than DockQ. Besides, we calculated DCI score on CASP datasets and compared it with CASP official assessment, which obtained good results. In addition, we found that DCI can better evaluate the overall structure deviation caused by interface prediction errors in the case of multi-chains. Our DCI is available at https://gitee.com/WendaWang/DCI-score.git, and the online-server is available at http://mialab.ruc.edu.cn/DCIServer/.

Keywords Protein Complex Structures  \cdot Evaluation Metric  \cdot Matrix Calculation

1 Introduction

Quality assessment of predicted models is very important for protein structure prediction task. Giving an accurate score for the predicted protein structure will help a lot for the improvement of the prediction methods and the subsequent applications and studies.

In the evaluation metrics for an overall protein structure, the most traditional one is RMSD (Root Mean Square Deviation) (Kabsch, 1976, 1978), which measures the deviation between the predicted model with the native structure by calculating the distance of equivalent atoms after structure superimposition. But, RMSD calculates the deviation of aligned atoms, if the predicted model is especially different from its native structure, the bad regions of non-aligned parts will be ignored, and RMSD measures the distance of all pairs of equivalent atoms equally, a small deviation will result in a high RMSD. TM-score (Template Model) (Zhang and Skolnick, 2007) makes an improvement on the basis of RMSD. Instead of using simple root mean square deviation during superposition, TM-score uses an objective function that can eliminate the influence of protein sequence length on the score, and the distance of corresponding residues is scaled to obtain a continuous value between 0-1 as the final score. GDT (Zemla, 2003) was used to evaluate structures since CASP 4 (Zemla et al., 2001); its method is to calculate the maximum proportion of atoms whose distance after superposition is less than the threshold value in the total atoms according to the specified threshold value, and select the average of the proportion obtained by different thresholds as the final score. GDT can greatly reduce the influence of atoms that seriously deviate from the native structure on the evaluation results, and has a stricter requirement for the completeness of the prediction part. The algorithm to find the best Superimposition is complicated, and the structure can only be evaluated as a rigid body. More importantly, the evaluation strategy based on superimposition is strongly influenced by domain motions, changes in the relative orientation between domains may lead to large changes in the evaluation scores, while lDDT (Mariani et al., 2013), as a local evaluation metric without superimposition, is more flexible and extensive in its application. lDDT considers pairs of atoms from different residues within a predefined distance threshold, calculates the distance difference between this pair of atoms in the native structure and the predicted model, and finally scores the average of the proportion of atom pairs with distance differences within 0.5Å, 1Å, 2Å, 4Å.

All of the above mentioned evaluation metrics are mainly used for protein monomer. But for complexes, the contact sites and interface between protein and protein are crucial for their structure and function (Gao and Skolnick, 2010). In addition, with the emergence of AlphaFold 2 (Jumper et al., 2021; Evans et al., 2021), single protein structures can be predicted almost accurate. Now when evaluating complexes, the quality of the contact interface is the most critical difference compared to the evaluation of monomers. In both CASP (Critical Assessment of Structure Prediction) and CAPRI (Critical Assessment of PRotein Interaction prediction) assessments, there is an evaluation strategy specifically for complexes. In recent years at the CASP, such as CASP 14 (Kryshtafovych et al., 2021; Ozden et al., 2021) and recent CASP 15, the official assessment criteria uses oligo-lDDT, TM-score, ICS (interface contact score) and IPS (interface patch score) to evaluate complex predicted results, where oligo-lDDT evaluating models locally, TM-score evaluating models globally, ICS and IPS evaluating interface areas specifically. The complex assessment strategy in CAPRI takes a different approach, which uses three metrics related to the contact interface (Lensink et al., 2007): Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT, lRMS and iRMS. Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT is the proportion of native contacts, lRMS is the backbone RMSD of the ligand common residues after the receptors are superimposed. iRMS is the backbone RMSD of the common interface residues. Based on these three metrics, the predicted model quality can be classified as high, medium, acceptable and incorrect. DockQ (Basu and Wallner, 2016) is a continuous model quality metric that combines Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT, lRMS, and iRMS for protein-protein docking, currently the most versatile and recognized method for evaluating protein complexes, which has also been used in AlphaFold-multimer. DockQ can evaluate not only dimers but also multimer protein complexes. For evaluating multimers, DockQ divides all the chains of the multimer into two parts and then calculate the value just like that for a dimer. But because of that, as a suitable metric for docking, DockQ depends heavily on the selection of ligand and receptor and structure superimposition, so cannot provide an unique and accurate evaluation for a multimer complex without the knowledge of which chains are ligand or receptor. In conclusion, it is still necessary to develop an objective and good evaluation metric for evaluating entire protein complex structures.

In this work, we developed DCI score, a new assessment metric for protein complex, which only based on distance map and CI (contact-interface) map, achieving good results both in comparison with DockQ and CASP assessment. The pipeline of DCI is shown in Fig. 1.

DCI highlights the accuracy of the contact interface based on the overall comparison of two complex structures. It takes a pair of structures to be compared as input, calculates the distance map and CI map of the two structures respectively. The CI map is used to calculate Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT, to characterize intra-chain contacts and inter-chain contacts in the complex separately, we extend Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT to intra-Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT and inter-Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT. The next step is comparing the distance maps to obtain distance difference map. Then we measure the importance of residues according to whether a pair of residues contact and which chain they are located respectively. We give a higher weight to the important residues pair and a lower weight to the unimportant residues pair to form a weight matrix which used to act on the distance difference map to get the difference value V𝑉Vitalic_V. V𝑉Vitalic_V represents the overall difference between the predicted structure and the native structure, Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT represents the accuracy of contact prediction. So V𝑉Vitalic_V is weighted and summed with Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT to obtain the final DCI score.

Refer to caption
Figure 1: DCI pipeline. Firstly, calculate distance map and CI map of two structures respectively. The distance maps are used to obtain distance difference map and the CI maps are used to calculate Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT and generate the weight matrix together with native structure. Distance difference map is multiplied by weight matrix and normalized to a difference value V𝑉Vitalic_V. At last, V𝑉Vitalic_V is weighted sum with Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT to get DCI score.

2 Materials and Methods

2.1 DCI maps

When the three-dimensional structure of a protein complex is obtained, the distance map and the contact-interface map can be naturally calculated, in which the elements in the distance map represent the Cα𝛼\alphaitalic_α-atom distance between any two residues in the complex. At the same time, contacts for intra-chain and interfaces for inter-chain can be obtained by applying different distance thresholds to the residue pairs located between or within a chain, and we made the inter-chain contact threshold slightly larger than the intra-chain contact threshold to obtain more interface contacts. In this work, the inter-chain contact threshold is set to 8Å, and the intra-chain contact threshold is set to 6Å, the resulting contact-interface map is a binary matrix of 1 if there is a contact and 0 if there is no contact.

2.1.1 Fraction of Native Contact in Predicted Interface

Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT in CAPRI(Lensink et al., 2007) indicates the fraction of native contacts preserved in the contacts of the predicted model. Because of the great improvement in the prediction accuracy of monomer, the prediction accuracy of intra-chain contacts in the complex has reached a high level, and inter-chain contacts are currently the main factor affecting the overall prediction accuracy of the structure. In order to characterize intra-chain contacts and inter-chain contacts in the complex separately, we extend Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT to intra-Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT and inter-Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT, where inter-Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT represents the fraction of native inter-chain contacts appeared in the predicted model, and intra-Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT represents the fraction of native intra-chain contacts appeared in the predicted model.

2.1.2 Distance Difference Map

Obviously, two similar structures tend to have similar distance maps, so the starting point is to compare the difference of distance maps between structures to obtain their similarity. As shown in Equation.1, distance difference map D~~𝐷\widetilde{D}over~ start_ARG italic_D end_ARG represents the absolute value of the difference between the distance map of ground truth Dgsubscript𝐷𝑔D_{g}italic_D start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT and predicted model Dpsubscript𝐷𝑝D_{p}italic_D start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT. The element size in D~~𝐷\widetilde{D}over~ start_ARG italic_D end_ARG represents the similarity of the position of corresponding residues pair in the structures, and a smaller value means that the two structures are more similar at that residues pair. Since distance matrix can be calculated by single structure, so the superimposition operation can be skipped after getting common residues of a pair of structures.

D~=|DgDp|~𝐷subscript𝐷𝑔subscript𝐷𝑝\widetilde{D}=|D_{g}-D_{p}|\vspace*{-5pt}over~ start_ARG italic_D end_ARG = | italic_D start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT - italic_D start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT | (1)

2.2 Mask and Weight Matrix

It is not reasonable to consider the difference of each pair of residues with the same importance, so consider a weight matrix acting on the distance difference matrix. The value of the element in the weight matrix represents the importance of the distance difference of the corresponding position of the residues pair, which is determined by i) whether a pair of residues is in contact, ii) whether it is from intra-chain or inter-chain, and iii) from which chain respectively if from inter-chain. The objective of our consideration is to focus more on the inter-chain, contact residues. And if the residues pair are located in two different chains, the closer contact between the two chains have, the more emphasis on the evaluation of the residues pair from these two chains. To represent the location of the three types of residue pairs above, we define the corresponding mask matrices: The inter-chain/intra-chain mask matrix MIsubscript𝑀𝐼M_{I}italic_M start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT is a binary matrix where the corresponding position of the intra-chain residues pair is 1 and the corresponding position of the inter-chain residues pair is 0 (as Fig. 2 left 1); For a chains set \mathbb{C}blackboard_C of a structure, Mjksubscript𝑀𝑗𝑘M_{jk}italic_M start_POSTSUBSCRIPT italic_j italic_k end_POSTSUBSCRIPT is another binary matrix, the corresponding position of the residues pair from chain j𝑗jitalic_j and chain k𝑘kitalic_k is 1 (j,k𝑗𝑘j,k\in\mathbb{C}italic_j , italic_k ∈ blackboard_C), otherwise the corresponding position is 0 (as Fig. 2 right 1-3); And the mask matrix MCsubscript𝑀𝐶M_{C}italic_M start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT which represents whether a residues pair is in contact, is exactly the native CI map CIg𝐶subscript𝐼𝑔CI_{g}italic_C italic_I start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT.

Refer to caption
Figure 2: Schematic diagram of masks. The picture shows the masks of a trimer. The first mask on the left indicates whether the residues pairs are in the same chain. The colored part are intra-chain and the gray part are inter-chain. The three masks on the right indicate which two chains the residue pairs are located in when they are in different chains. For example, the dark blue part of the first mask on the right represents that the residue pairs come from the first and second chains respectively. For all masks, assign 1 to the colored parts and 0 to the gray parts.

The weight matrix W𝑊Witalic_W measuring the importance of residues pair is combined by mask matrices mentioned above. The final weight matrix is composed of two parts: first we use MIsubscript𝑀𝐼M_{I}italic_M start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT to give a higher weight ωiesubscript𝜔𝑖𝑒\omega_{ie}italic_ω start_POSTSUBSCRIPT italic_i italic_e end_POSTSUBSCRIPT for inter-chain residues pair and a lower weight ωiasubscript𝜔𝑖𝑎\omega_{ia}italic_ω start_POSTSUBSCRIPT italic_i italic_a end_POSTSUBSCRIPT for intra-chain residues pair. WIsubscript𝑊𝐼W_{I}italic_W start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT that give different weights to intra-chain and inter-chain residues pairs is defined as Equation.2. And we considered weighted average for intra-chain weight and inter-chain weight as ωie+ωia=1subscript𝜔𝑖𝑒subscript𝜔𝑖𝑎1\omega_{ie}+\omega_{ia}=1italic_ω start_POSTSUBSCRIPT italic_i italic_e end_POSTSUBSCRIPT + italic_ω start_POSTSUBSCRIPT italic_i italic_a end_POSTSUBSCRIPT = 1.

WI=ωiaMI+ωie(1MI)subscript𝑊𝐼subscript𝜔𝑖𝑎subscript𝑀𝐼subscript𝜔𝑖𝑒1subscript𝑀𝐼W_{I}=\omega_{ia}M_{I}+\omega_{ie}(1-M_{I})italic_W start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT = italic_ω start_POSTSUBSCRIPT italic_i italic_a end_POSTSUBSCRIPT italic_M start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT + italic_ω start_POSTSUBSCRIPT italic_i italic_e end_POSTSUBSCRIPT ( 1 - italic_M start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT ) (2)

And use MCsubscript𝑀𝐶M_{C}italic_M start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT to enlarge the weight of contact residues pair and reduce the weight of non-contact residues pair by two coefficients ωcsubscript𝜔𝑐\omega_{c}italic_ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT and ωncsubscript𝜔𝑛𝑐\omega_{nc}italic_ω start_POSTSUBSCRIPT italic_n italic_c end_POSTSUBSCRIPT correspondingly. WCsubscript𝑊𝐶W_{C}italic_W start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT that give different weights to contact and non-contact residues pairs is defined as Equation.3.

WC=ωcMC+ωnc(1MC)subscript𝑊𝐶subscript𝜔𝑐subscript𝑀𝐶subscript𝜔𝑛𝑐1subscript𝑀𝐶W_{C}=\omega_{c}M_{C}+\omega_{nc}(1-M_{C})italic_W start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT = italic_ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT italic_M start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT + italic_ω start_POSTSUBSCRIPT italic_n italic_c end_POSTSUBSCRIPT ( 1 - italic_M start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT ) (3)

second we use Mjksubscript𝑀𝑗𝑘M_{jk}italic_M start_POSTSUBSCRIPT italic_j italic_k end_POSTSUBSCRIPT to give different weight for residues pair from different chains, that the weight ωjksubscript𝜔𝑗𝑘\omega_{jk}italic_ω start_POSTSUBSCRIPT italic_j italic_k end_POSTSUBSCRIPT will be higher if two chains have more contacts, to achieve this, we define a series of interface features: for an interface formed by chain j and chain k, Lj.Lkformulae-sequencesubscript𝐿𝑗subscript𝐿𝑘L_{j}.L_{k}italic_L start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT . italic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPTrepresents the respective length of the two chains, Iklsubscript𝐼𝑘𝑙I_{kl}italic_I start_POSTSUBSCRIPT italic_k italic_l end_POSTSUBSCRIPT is the number of interface residue pairs, While Cjsubscript𝐶𝑗C_{j}italic_C start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and Cksubscript𝐶𝑘C_{k}italic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT represent the number of residues in the interface of the two chains, and Cjksubscript𝐶𝑗𝑘C_{jk}italic_C start_POSTSUBSCRIPT italic_j italic_k end_POSTSUBSCRIPT represents the total number of residues in the interface, i.e. Cjk=Cj+Cksubscript𝐶𝑗𝑘subscript𝐶𝑗subscript𝐶𝑘C_{jk}=C_{j}+C_{k}italic_C start_POSTSUBSCRIPT italic_j italic_k end_POSTSUBSCRIPT = italic_C start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + italic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. We define ωjk=Ijk2+Cjk2LjLksubscript𝜔𝑗𝑘superscriptsubscript𝐼𝑗𝑘2superscriptsubscript𝐶𝑗𝑘2subscript𝐿𝑗subscript𝐿𝑘\omega_{jk}=\frac{I_{jk}^{2}+C_{jk}^{2}}{L_{j}\cdot L_{k}}italic_ω start_POSTSUBSCRIPT italic_j italic_k end_POSTSUBSCRIPT = divide start_ARG italic_I start_POSTSUBSCRIPT italic_j italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_C start_POSTSUBSCRIPT italic_j italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_L start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ⋅ italic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG as the weight that measures the importance of the contact between the two chains. wDsubscript𝑤𝐷w_{D}italic_w start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT that assign different weights to residues from different chains is shown as Equation.4.

WD=j,kωjkMjksubscript𝑊𝐷subscript𝑗𝑘subscript𝜔𝑗𝑘subscript𝑀𝑗𝑘W_{D}=\sum\limits_{j,k\in\mathbb{C}}\omega_{jk}M_{jk}italic_W start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j , italic_k ∈ blackboard_C end_POSTSUBSCRIPT italic_ω start_POSTSUBSCRIPT italic_j italic_k end_POSTSUBSCRIPT italic_M start_POSTSUBSCRIPT italic_j italic_k end_POSTSUBSCRIPT (4)

The final weight matrix W𝑊Witalic_W is obtained based on the expansion or reduction of WIsubscript𝑊𝐼W_{I}italic_W start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT and WCsubscript𝑊𝐶W_{C}italic_W start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT, plus the additional weight of WDsubscript𝑊𝐷W_{D}italic_W start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT as Equation.5.

W=WIWC+WD𝑊subscript𝑊𝐼subscript𝑊𝐶subscript𝑊𝐷W=W_{I}W_{C}+W_{D}italic_W = italic_W start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT italic_W start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT + italic_W start_POSTSUBSCRIPT italic_D end_POSTSUBSCRIPT (5)

After obtaining distance difference matrix and weight matrix, the Hadamard product of the two matrices is mapped to the difference value V𝑉Vitalic_V on the 0-1 continuous interval as shown in Equation.6.

V=11+WD~¯𝑉11¯direct-product𝑊~𝐷V=\frac{1}{1+\overline{W\odot\widetilde{D}}}italic_V = divide start_ARG 1 end_ARG start_ARG 1 + over¯ start_ARG italic_W ⊙ over~ start_ARG italic_D end_ARG end_ARG end_ARG (6)

So far we have gotten V represents the overall difference between the predicted structure and the native structure and intra-Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT, inter-Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT represents the accuracy of intra-chain and inter-chain contact prediction. Finally DCI score is obtained by weighted average of intra-Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT, inter-Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT and difference value V𝑉Vitalic_V as shown in Equation.7, where ω1+ω2+ω3=1subscript𝜔1subscript𝜔2subscript𝜔31\omega_{1}+\omega_{2}+\omega_{3}=1italic_ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_ω start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + italic_ω start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = 1.

DCI=ω1Intra-Fnat+ω2Inter-Fnat+ω3V𝐷𝐶𝐼subscript𝜔1𝐼𝑛𝑡𝑟𝑎subscript𝐹𝑛𝑎𝑡subscript𝜔2𝐼𝑛𝑡𝑒𝑟subscript𝐹𝑛𝑎𝑡subscript𝜔3𝑉DCI=\omega_{1}\cdot Intra\operatorname{-}F_{nat}+\omega_{2}\cdot Inter% \operatorname{-}F_{nat}+\omega_{3}\cdot Vitalic_D italic_C italic_I = italic_ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_I italic_n italic_t italic_r italic_a - italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT + italic_ω start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⋅ italic_I italic_n italic_t italic_e italic_r - italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT + italic_ω start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ⋅ italic_V (7)

The diagram containing the above algorithm flow is shown in Fig. 3.

Refer to caption
Figure 3: algorithm of DCI. DCI inputs a set of structures for comparison and calculates the distance map and CI map for each structure (purple represents the true structure and blue represents the predicted structure, dark regions in the maps represent the close or contact of residue pairs). Distance difference map is calculated using the distance maps of two structures, and Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT is calculated using true CI map, several masks of the position of residues pair was generated according to the native structure, which is combined into the weight matrix and operate Hadamard product with the distance difference map, DCI score is obtained by weighted summation of this value with Fnatsubscript𝐹𝑛𝑎𝑡F_{nat}italic_F start_POSTSUBSCRIPT italic_n italic_a italic_t end_POSTSUBSCRIPT finally.

2.3 Penalty

In order to make the distance matrix of the native structure and the predicted structure can be subtracted, it is required to keep the dimension of the two matrices consistent, so it is necessary to intercept the residues common to the two structures. If the residues of the native structure are cropped due to the loss of residues in the predicted structure, a penalty coefficient will be added to the DCI calculation. The penalty coefficient is the ratio of the cropped length to the original length in native structure. In practice, whether calculating DCI with penalty is optional.

2.4 Datasets

Here we use five protein complex datasets for parameter optimization and results analysis. We briefly described each dataset and labels for evaluation and comparison. More detailed information about the datasets is presented in the Appendix.

2.4.1 Overview of Five Datasets

  1. 1.

    CAPRI set(Basu and Wallner, 2016) contains models submitted to CAPRI between 2005–2014 with their respective DockQ scores and CAPRI classifications. Since the prediction models of T35 and T36 contains only one chains in the pdb files, we removed these two targets, then it has 13 targets and 18205 predicted models totally. According to CAPRI classification, there are 16042 incorrect, 860 acceptable, 817 medium, 486 high models. (http://cb.iri.univ-lille1.fr/Users/lensink/Score_set/)

  2. 2.

    Heterodimer-AF2 (HAF2) set was generated by collecting the structures of heterodimers from the Protein Data Bank and predicting different models for each heterodimer (https://zenodo.org/record/6569837). Overall, the HAF2 dataset has 13 targets with 1370 models, 457 incorrect, 84 acceptable, 488 medium and 341 high.

  3. 3.

    Multimer-AF2(MAF2) set used AlphaFold 2 and AlphaFold-Multimer to generate protein complex structures for protein targets derived from the EVCoupling (Hopf et al., 2017) and DeepHomo (Yan and Huang, 2021) datasets.(https://zenodo.org/record/6570843) There are 2503 targets with 4985 models have correct classification and DockQ value.

  4. 4.

    CASP 14 set includes multimers in the CASP 14 whose native structures can be found, there are 17 targets and 1914 models. (https://www.predictioncenter.org/casp14)

  5. 5.

    CASP 15 set includes multimers in the CASP 15 whose native structures can be found and official assessments are shown on CASP website, there are 21 targets and 5304 models. (https://www.predictioncenter.org/casp15/)

2.4.2 Reference Label

In order to complete the parameter optimization and result testing with the above datasets, a reference label needs to be provided for each model. Labels in the datasets can be divided into two categories: classification label and regression label. On the one hand, CAPRI dataset and HAF2 dataset contain both DockQ result and CAPRI classification, MAF2 dataset only has DockQ result but can calculate CAPRI classification for dimer models (multimers more than two chains need assign ligand and receptor to get CAPRI classification), then the CAPRI classification in these three datasets can be regarded as reference label in the next optimization. On the other hand, CASP14 dataset has several evaluation metrics supplied by CASP, we select four main metrics that used in official assessment of recent CASP competition—oligo-lDDT, TM-score, ICS (F1 score) and IPS (Jaccard Coefficient) and calculate mean value of these four metrics as the reference label. According to the type of reference label, we can also divide datasets into classification set (CAPRI, HAF2, MAF2) and regression set (CASP14, CASP15), totally 2567 targets and 31778 models, the usage and number of obtained datasets are shown in Tabel 1.

Table 1: Description of dataset
Type Dataset Target Model
CAPRI 13 18205
Classification HAF2 13 1370
MAF2 2503 4985
Regression CASP14 17 1914
CASP15 21 5304
Summary 2567 31778

2.5 Parameter optimization

The process of DCI calculation mentioned in "Mask and Weight Matrix" requires 7 parameters including inter-chain weight ωiesubscript𝜔𝑖𝑒\omega_{ie}italic_ω start_POSTSUBSCRIPT italic_i italic_e end_POSTSUBSCRIPT, intra-chain weight ωiasubscript𝜔𝑖𝑎\omega_{ia}italic_ω start_POSTSUBSCRIPT italic_i italic_a end_POSTSUBSCRIPT in Equation.2, contact weight ωcsubscript𝜔𝑐\omega_{c}italic_ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT, non-contact weight ωncsubscript𝜔𝑛𝑐\omega_{nc}italic_ω start_POSTSUBSCRIPT italic_n italic_c end_POSTSUBSCRIPT in Equation.3, and DCI weight coefficients ω1,ω2,ω3subscript𝜔1subscript𝜔2subscript𝜔3\omega_{1},\omega_{2},\omega_{3}italic_ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ω start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_ω start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT in Equation.7. And we have two constraints in Equation.8:

ωie+ωia=1ω1+ω2+ω3=1formulae-sequencesubscript𝜔𝑖𝑒subscript𝜔𝑖𝑎1subscript𝜔1subscript𝜔2subscript𝜔31\omega_{ie}+\omega_{ia}=1\qquad\omega_{1}+\omega_{2}+\omega_{3}=1\vspace*{-9pt}italic_ω start_POSTSUBSCRIPT italic_i italic_e end_POSTSUBSCRIPT + italic_ω start_POSTSUBSCRIPT italic_i italic_a end_POSTSUBSCRIPT = 1 italic_ω start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_ω start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + italic_ω start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = 1 (8)

so that the number of weight coefficients can be reduced to 5: ωie,ωc,ωnc,ω2,ω3subscript𝜔𝑖𝑒subscript𝜔𝑐subscript𝜔𝑛𝑐subscript𝜔2subscript𝜔3\omega_{ie},\omega_{c},\omega_{nc},\omega_{2},\omega_{3}italic_ω start_POSTSUBSCRIPT italic_i italic_e end_POSTSUBSCRIPT , italic_ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT , italic_ω start_POSTSUBSCRIPT italic_n italic_c end_POSTSUBSCRIPT , italic_ω start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_ω start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT. In addition to optimizing these weight coefficients, we optimized classification cutoffs to calculate CAPRI classification by DCI score as accurate as possible. Here we give a brief introduction about the process and result of parameter optimization, more details are mentioned in the Appendix.

2.5.1 Weight coefficient

In this work we used Bayesian optimization (Frazier, 2018) on five weight coefficients, which aims at finding an acceptable local optimal solution for an objective function f(𝝎|𝒙,y)𝑓conditional𝝎𝒙𝑦f(\boldsymbol{\omega}|\boldsymbol{x},y)italic_f ( bold_italic_ω | bold_italic_x , italic_y ), where 𝝎=(ωie,ωc,ωnc,ω2,ω3)𝝎subscript𝜔𝑖𝑒subscript𝜔𝑐subscript𝜔𝑛𝑐subscript𝜔2subscript𝜔3\boldsymbol{\omega}=(\omega_{ie},\omega_{c},\omega_{nc},\omega_{2},\omega_{3})bold_italic_ω = ( italic_ω start_POSTSUBSCRIPT italic_i italic_e end_POSTSUBSCRIPT , italic_ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT , italic_ω start_POSTSUBSCRIPT italic_n italic_c end_POSTSUBSCRIPT , italic_ω start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_ω start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ), 𝒙𝒙\boldsymbol{x}bold_italic_x and y𝑦yitalic_y represent datasets and their labels. we use a few objective functions f𝑓fitalic_f to find more possible weight coefficient combinations including overall MCC𝑀𝐶𝐶MCCitalic_M italic_C italic_C (Matthews Correlation Coefficient), w_MCC𝑤_𝑀𝐶𝐶w\_MCCitalic_w _ italic_M italic_C italic_C (weighted MCC for four classifications) of CAPRI classification, cor𝑐𝑜𝑟coritalic_c italic_o italic_r (Pearson correlation coefficient), MSE𝑀𝑆𝐸MSEitalic_M italic_S italic_E (square loss between DCI score and CASP reference value). We created training set by classification set and mixed set (both classification and regression set).After obtaining good parameter combinations in training set, we calculated ACC𝐴𝐶𝐶ACCitalic_A italic_C italic_C (accuracy) and MCC𝑀𝐶𝐶MCCitalic_M italic_C italic_C in test set and selected the one with the best performance. The size ratio of the training set to the test set is 8:2. The specific selection of objective functions and training set, and the parameter combinations selected from the training set can be found in Appendix. Finally we determined all weight coefficients as ωie=0.8,ωc=1.4,ωnc=0.9,ω2=0.17,ω3=0.8formulae-sequencesubscript𝜔𝑖𝑒0.8formulae-sequencesubscript𝜔𝑐1.4formulae-sequencesubscript𝜔𝑛𝑐0.9formulae-sequencesubscript𝜔20.17subscript𝜔30.8\omega_{ie}=0.8,\omega_{c}=1.4,\omega_{nc}=0.9,\omega_{2}=0.17,\omega_{3}=0.8italic_ω start_POSTSUBSCRIPT italic_i italic_e end_POSTSUBSCRIPT = 0.8 , italic_ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT = 1.4 , italic_ω start_POSTSUBSCRIPT italic_n italic_c end_POSTSUBSCRIPT = 0.9 , italic_ω start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 0.17 , italic_ω start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = 0.8.

2.5.2 Classification cutoffs

We expect to obtain the correct CAPRI classification by DCI score. After calculating the DCI score with the optimized weight coefficients, three appropriate cutoffs c1,c2,c3subscript𝑐1subscript𝑐2subscript𝑐3c_{1},c_{2},c_{3}italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT are selected to classify the classification data in the test set, so that the classification is as accurate as possible. The minimum step size of grid search is set to 0.01 in the optimization method, and c1,c2,c3subscript𝑐1subscript𝑐2subscript𝑐3c_{1},c_{2},c_{3}italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT are searched in the range of [0.25,0.5],[0.45,0.7],[0.6,0.85]0.250.50.450.70.60.85[0.25,0.5],[0.45,0.7],[0.6,0.85][ 0.25 , 0.5 ] , [ 0.45 , 0.7 ] , [ 0.6 , 0.85 ] respectively. Finally we selected c1=0.48,c2=0.6,c3=0.82formulae-sequencesubscript𝑐10.48formulae-sequencesubscript𝑐20.6subscript𝑐30.82c_{1}=0.48,c_{2}=0.6,c_{3}=0.82italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 0.48 , italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 0.6 , italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = 0.82 as classification cutoffs.

3 Results

We have obtained the complete calculation method of DCI score according to Equation.1-7, and got the optimized parameter values and cutoffs, here we will show the excellence of DCI by comparing it with other common metrics in complex structure assessment.

Classification results

This part of results is the comparison with DockQ. We evaluated the results not only on the complete test set, but also on the samples belonging to CAPRI, HAF2 and MAF2 in the test set which we called sub-test sets. First in terms of correlation, the overall Pearson correlation between DCI and DockQ is as high as 0.96, which indicates DCI score and DockQ, the state-of-the-art metric in complex assessment, have a strong connection. Fig. 4 shows a scatter plot of DCI versus DockQ on the test set,

Refer to caption
Figure 4: Scatter plot of DCI versus DockQ on the test set. Models are colored according to CAPRI classification as Incorrect (blue), Acceptable (green), Medium (red), High (yellow), and the Pearson correlation within the different quality classes is 0.46 (incorrect), 0.64 (acceptable), 0.9 (medium), 0.96 (high) respectively, and the overall Pearson correlation is 0.96.

and Fig. 5 shows DCI versus DockQ on each sub-test set, the Pearson correlation between DCI and DockQ on CAPRI, HAF2, MAF2 sub-test set is 0.91, 0.98, 0.996 respectively.

Refer to caption
Figure 5: Scatter plot of DCI vs. DockQ on each sub-test set. Models are colored according to CAPRI classification as Incorrect (blue), Acceptable (green), Medium (red), High (yellow), the Pearson correlation between DCI and DockQ on each sub-test set is 0.91 (CAPRI), 0.98 (HAF2), 0.996 (MAF2) respectively.

In terms of comparison of CAPRI classification results, DCI score and DockQ results are equivalent. The correlation coefficients between DCI and CAPRI true labels, DockQ and CAPRI true labels are equal, both are 0.78. Moreover, for the classification results of the test set samples, DCI was better than DockQ in the overall results, both ACC (accuracy) and MCC (Matthews correlation coefficient), and the accuracy of the four categories was also equivalent to DockQ. Tabel 2 shows the specific classification accuracy results.

Table 2: ACC and MCC comparation on test set
Target Range DCI DockQ
overall 0.903 0.898
Incorrect 0.97 0.955
ACC Acceptable 0.96 0.949
Medium 0.929 0.935
High 0.948 0.956
overall 0.786 0.783
Incorrect 0.925 0.893
MCC Acceptable 0.557 0.518
Medium 0.692 0.701
High 0.718 0.783

DCI showed comparable classification performance to DockQ on all three sub-test sets, we chose the MAF2 sub-test set with the best results for presentation as Tabel 3. The other two sub-test sets’ results are presented in the Appendix.

Table 3: ACC and MCC comparation on MAF2 sub-test set
Dataset Target Range DCI DockQ
overall 0.798 0.655
Incorrect 0.972 0.861
ACC Acceptable 0.943 0.826
Medium 0.823 0.775
MAF2 High 0.857 0.847
overall 0.713 0.52
Incorrect 0.926 0.628
MCC Acceptable 0.506 0.009
Medium 0.604 0.489
High 0.692 0.681
Refer to caption
Figure 6: Case study of complex 7AC9. This complex has 3 chains but we do not know which part is the ligand and which part is the receptor, if choosing the shortest chain (I) as the ligand, the DockQ value is 0.173, and if calculating mean DockQ value when three chains are chosen as ligand in turn (0.173 for chain I, 0.483 for chain H and 0.87 for chain L), the value is 0.509, however, the predicted structure only has obvious errors on chain I, and the rest positions are basically consistent with the native structure. Compared with the two values calculated by DockQ, DCI = 0.684 can more correctly reflect the prediction accuracy of this complex.

3.1 Advantages in multimer assessment

Here we show the advantage of DCI score in evaluating multimers compared to the current state-of-art metric DockQ, using a typical case (pdbID: 7AC9) as Fig. 6, an ordinary trimer with unknown ligand (required to calculate DockQ). If one of the chains is chosen as the ligand, It can be seen that the DockQ values of the three chains as ligand are very different. If we follow the current default principle of selecting ligand: choosing the shortest chain as the ligand, Then the ligand of this structure will be chain I which has only 9 residues, but obviously the predicted model is pretty similar to ground truth, and there is only a distinct deviation on the I chain, so it can be seen that DockQ, in the case of this unknown ligand, The evaluation of protein complexes is not sufficiently accurate. We also experimented with two trimer targets in the CAPRI dataset, with a total of 2950 samples, The results in Fig. 7 show that when the ligand is unknown, the relevant calculation of DockQ cannot guarantee the correct evaluation. Whether it is to calculate DockQ separately with each chain as ligand and then take the average, or to choose the shortest chain as ligand by default, Both results in a significant decrease in the accuracy of structure evaluation (the horizontal and vertical dashed lines in the figure represent the classification thresholds of DockQ and DCI respectively), which also reflects the advantage of DCI score in directly evaluating multimer regardless of ligand selection. Through the above two sets of experiments, we clarify the instability of DockQ in calculating non-docking situation, and also show the advantage of DCI score in this ubiquitous situation.

Refer to caption
Figure 7: DCI vs.multi-method DockQ on two CAPRI trimer targets. a. shows scatter plots of Target37 and Target40 DCI scores versus DockQ scores. The left subgraph is the standard case that calculate DockQ based on the known ligand. The latter two subgraphs show that when we assume that the ligand partition of the two targets is unknown, calculate DockQ with each chain as ligand and then average them (subgraph2), or calculate DockQ with the shortest chain in the structure as ligand (subgraph3). It can be seen that compared with the standard DockQ in subgraph1, the classification accuracy is significantly decreased in other two cases (ideally, the samples of the four classes should be located in the four diagonal grids). b. shows DockQ distribution differences calculated in these three cases. It is obviously that when ligand is unknown, the results of DockQ are unstable. So it can be seen that DockQ is not able to accurately evaluate the complex in the non-docking situation.

3.2 CASP results

In this section, we evaluated the results of the CASP14 test set and the CASP15 set (hereafter referred to as a CASP set), then analyzed the batch interface errors appeared in CASP assessment and the performance of DCI score in CASP targets. Fig. 8 shows the scatter plot of the DCI score and the CASP score :(oligo-lDDT+TM+ICS+IPS)/4.

Refer to caption
Figure 8: Scatter plot of DCI versus CASP score on CASP set. This figure contains 381 samples in CASP14 test set and 5314 samples in CASP15 set, color from cold to warm tones indicates the number of chains from small to large. The Pearson correlation coefficient between DCI score and CASP score is 0.522. The green line represents the fitting result of DCI score and CASP score when the number of chains is low (<6), and the corresponding light green areas are the confidence interval of ±0.25plus-or-minus0.25\pm 0.25± 0.25 given by us.

It can be found in Fig. 8 that when the number of chains is small, DCI correlates strongly with CASP score, but when the number of chains increases, the samples will deviate from the confidence interval. The Pearson correlation coefficient of DCI and CASP score on the CASP set is 0.522, after removing samples with 6 or more chains from CASP set, the Pearson correlation coefficient increased to 0.942. We fitted the samples after removing models that more than 6 chains and generate ±0.25plus-or-minus0.25\pm 0.25± 0.25 confidence intervals as overall trend. For outliers deviating from the overall trend, we divided them into the following two kinds.

First kind of outliers with high CASP score and low DCI score at the left upper side of Fig. 8, whose chain numbers are large. Low DCI score versus high CASP score means DCI is more strict than CASP evaluation for large complexes. To find out the reasons, we first analyzed the four metrics in CASP score.

We analyzed the relationship between the four metrics in CASP score and DCI score as shown in Fig. 9 focusing on the first kind of outliers. It can be found that when the metrics measuring the interface (ICS, IPS) increases, the metrics measuring the overall structure (TM-score, lDDT) is almost horizontal and always at high values. We consider that as the metrics mainly designed to evaluate single protein, TM-score and lDDT regard multimers as monomers when evaluating, which will cause inaccuracy in the overall structure evaluation of multimers. For example, lDDT is not sensitive to the relative orientation of domains, which is no longer an advantage when evaluating multimers because the relative orientation of chains is crucial to the multimers structures.

Refer to caption
Figure 9: Relationship between DCI score and four metrics in CASP score. This figure shows the outliers above the overall trend in Fig. 8. The vast majority are complexes with more than 6 chains. The left picture shows the scatter plots of DCI score an four metrics (TM-score, lDDT, ICS, IPS) in CASP score respectively. It can be found that when the metrics measuring the interface (ICS, IPS) increases, the metrics measuring the overall structure (TM-score, lDDT) is almost horizontal and always at high values. And the right picture shows the difference in distribution between DCI score and the four metrics, when the number of chains is large, the strictness of DCI score compared with CASP score is mainly reflected in the difference between DCI score and the overall metrics TM-score and lDDT.

When evaluating complexes using the metrics used to evaluate monomers, it is difficult to capture the overall structural deviation due to errors of interface prediction, but interface predicted accuracy is the most important for complex prediction compared to single chain prediction. Thanks to AlphaFold 2, the 3D structure of a single chain in a complex has already gotten high quality (Callaway, 2022), but the relative position and orientation of the chains in a complex are mainly determined by the interface predicted accuracy. Even two correct single-chain structures, if connected by a wrong interface, the obtained overall structure also has a large deviation from the native structure. Moreover, the more chains there are, the more interface errors are likely to occur, then the more chains are likely to deviate from the correct position. We call this kind of problem : batch interface errors.

Unlike CASP score, DCI can better reflect the overall structural deviation caused by batch interface errors. Because the mechanism of DCI is emphasizing the prediction accuracy of interface on the basis of evaluating overall structural similarity. The relative position deviation caused by interface error between two chains can be well reflected by the increase of corresponding elements in the distance difference map.

There are two sets of cases of CASP results that show the accuracy of DCI in evaluating multimers compared to CASP score. Fig. 10 shows the top 6 CASP score cases of target H1036 (9 chains) in CASP14 test set, but the DCI scores are very low.

Refer to caption
Figure 10: Case study of target H1036. This target has nine chains, the above models are the top six CASP scores among the predicted models of H1036. But after aligning these models with the native structure, it can be seen that the overall similarity of each prediction model is not high, And the main reason is that the predicted error of the interface formed by any neighboring chains would add up to a large deviation in the overall structure of the model. For example, some chains of TS018_1 and TS177_2 are completely opposite to the correct position, while some chains of TS055_2, TS418_2 and TS193_5 are significantly deviated from the correct position. However, chains like TS403_4 have relatively fewer deviation errors, so its DCI score (0.223) is higher than other models, which the selection of best model is consistent with the result of CASP assessment.

Also as shown in Fig. 11, three predicted models have similar CASP scores around 0.6, but as the number of chains increases from 2 to 10 and then to 27, contact errors between interfaces lead to more and more differences in the relative orientation of chains, which leads to briefly decreases in DCI score.

Refer to caption
Figure 11: 3 Cases study of CASP15. The above three pairs of structures were from T1187o (2 chains), H1137 (10 chains), H1111 (27 chains) and one predicted model of each, and the CASP score is 0.6, but as the number of chains increased, For example, the predicted structure of T1187o only shows many deviations in the loop region, while 6 chains of H1137 show serious deviations in direction. After alignment of the predicted model onto the native structure of H1111, the native structure and each chain of the predicted model show a complete "interleaved" state. Almost no single pair of chains can be broadly aligned in the overall structure. The "interleaved" details of H1111 is shown in Appendix.

According to the above analysis, compared with CASP assessment, DCI can better reflect the case of batch interface errors. And because the larger the number of chains, the more likely and number of batch interface errors will occur, so the DCI score will be more strict than the CASP score, that is, the more likely to have lower DCI score.

It is worth noting that more attention is paid to the relative orientation between the chains, which is consistent with the situation that in the current structure prediction task, the single-chain structure prediction is very accurate, and the key difficulty is the combination of multi-chains.

Refer to caption
Figure 12: Missing chain case in CASP set. This case shows T1034o in CASP14, which represents the second kind of outlier cases. Its native structure has 4 chains, while the predicted model has only 2 chains. Therefore, in the process of structure length alignment, the first two chains of the native structure are clipped by default. However, the predicted model actually corresponds to the second and third chains in the native structure. As a result, DCI score was only 0.279. When the number and order of chains correspond correctly, the DCI score (0.148) returns to normal trend.

Second kind of outliers with low CASP scores and high DCI score, this case reflects a common problem in the use of most evaluation metrics, that is, the prediction models only have partial chains of the native structure. As shown in Fig. 12, This target is a proxy below the overall trend in Fig. 8, which DCI score is 0.279, higher than the CASP score of 0.136. This is the case because the native structure has 4 chains, but the predicted model contains only 2 of them. So when the length of these two structures is aligned, the first two chains of the native structure are selected by default. To solve this problem, the number and order of chains need to be determined at the time of input to facilitate structure alignment. If the number of chains does not correspond, it needs to be corrected manually. If the chain order does not correspond, our program provides a special parameter to align the chain order.

4 Discussion

In this work, we used DCI to solve two problems in the evaluation of protein complex predicted model:

  1. 1.

    When evaluating a normal multimer for unspecified receptor and ligand, the evaluation of DockQ will become unstable, while DCI score can give an overall and accurate evaluation.

  2. 2.

    When the number of chains increases, batch interface errors are more likely to occur. In this case, TM-score and lDDT used by CASP do not well reflect the overall structure deviation caused by interface prediction errors, where DCI does a significantly better job.

In terms of the DCI algorithm, one issue that needs to be considered is, while DCI reflects batch interface errors, it is likely to be affected by the number of chains and the length of the structure, that is, more or longer chains will magnify the judgment of the algorithm on batch interface errors, resulting in a very low score. Therefore removing these effects from the algorithm is the main goal in the next step.

It is also worth noting that we have given cutoffs for CAPRI classification using DCI, but it is clear that when the number of chains is large, the classification cutoffs are only references, and what is more important is the value of DCI score to evaluate the structural similarity between complexes.

In terms of the application of DCI score, we consider that DCI score is well suited as a reference label or loss function for protein complex model quality assessment (QA) problems. In addition, it will also play a good role in the task of using deep learning models to predict the distance matrix between strands and contact residue pairs.

5 Conclusion

In this work we proposed a new assessment metric, DCI for evaluating protein complex predicted model quality. DCI score is a continuous value within 0 to 1, which only depends on the distance maps and contact-interface map of structures. Instead of dividing a complex into two parts (ligand and receptor) like DockQ, DCI can evaluate a structure as a whole. And DCI can better evaluate batch interaface errors in multi-chain case. This accurate and stable evaluation assessment can better guide the prediction task of protein complex, and for protein engineering such as drug design, it also has a positive influence.

References

  • Kabsch (1976) W. Kabsch. A solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A, 32, 1976.
  • Kabsch (1978) Wolfgang Kabsch. A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A, 34:827–828, 1978.
  • Zhang and Skolnick (2007) Yang Zhang and Jeffrey Skolnick. Scoring function for automated assessment of protein structure template quality. Proteins, 57:702–10, 09 2007. doi:10.1002/prot.20264.
  • Zemla (2003) Adam Zemla. Lga: A method for finding 3d similarities in protein structures. Nucleic acids research, 31(13):3370—3374, July 2003. ISSN 0305-1048. doi:10.1093/nar/gkg571. URL https://europepmc.org/articles/PMC168977.
  • Zemla et al. (2001) Adam Zemla, Česlovas Venclovas, John Moult, and Krzysztof Fidelis. Processing and evaluation of predictions in casp4. Proteins: Structure, Function, and Bioinformatics, 45:13 – 21, 02 2001. doi:10.1002/prot.10052.
  • Mariani et al. (2013) Valerio Mariani, Marco Biasini, Alessandro Barbato, and Torsten Schwede. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics, 29(21):2722–2728, 08 2013. ISSN 1367-4803. doi:10.1093/bioinformatics/btt473. URL https://doi.org/10.1093/bioinformatics/btt473.
  • Gao and Skolnick (2010) Mu Gao and Jeffrey Skolnick. ialign: A method for the structural comparison of protein-protein interfaces. Bioinformatics (Oxford, England), 26:2259–65, 09 2010. doi:10.1093/bioinformatics/btq404.
  • Jumper et al. (2021) John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon Kohl, Andrew Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, and Demis Hassabis. Highly accurate protein structure prediction with alphafold. Nature, 596:1–11, 08 2021. doi:10.1038/s41586-021-03819-2.
  • Evans et al. (2021) Richard Evans, Michael O’Neill, Alexander Pritzel, Natasha Antropova, Andrew Senior, Tim Green, Augustin Žídek, Russ Bates, Sam Blackwell, Jason Yim, Olaf Ronneberger, Sebastian Bodenstein, Michal Zielinski, Alex Bridgland, Anna Potapenko, Andrew Cowie, Kathryn Tunyasuvunakool, Rishub Jain, Ellen Clancy, Pushmeet Kohli, John Jumper, and Demis Hassabis. Protein complex prediction with alphafold-multimer. bioRxiv, 2021. doi:10.1101/2021.10.04.463034. URL https://www.biorxiv.org/content/early/2021/10/04/2021.10.04.463034.
  • Kryshtafovych et al. (2021) Andriy Kryshtafovych, Torsten Schwede, Maya Topf, Krzysztof Fidelis, and John Moult. Critical assessment of methods of protein structure prediction(casp)–round xiv. Proteins: Structure, Function, and Bioinformatics, 89, 09 2021. doi:10.1002/prot.26237.
  • Ozden et al. (2021) Burcu Ozden, Andriy Kryshtafovych, and Ezgi Karaca. Assessment of the casp14 assembly predictions. Proteins: Structure, Function, and Bioinformatics, 89, 08 2021. doi:10.1002/prot.26199.
  • Lensink et al. (2007) Marc Lensink, Raul Mendez Giraldez, and Shoshana Wodak. Docking and scoring protein complexes: Capri 3rd edition. Proteins, 69:704–18, 12 2007. doi:10.1002/prot.21804.
  • Basu and Wallner (2016) Sankar Basu and Björn Wallner. Dockq: A quality measure for protein-protein docking models. PLoS ONE, 11:e0161879, 08 2016. doi:10.1371/journal.pone.0161879.
  • Hopf et al. (2017) Thomas Hopf, John Ingraham, Frank Poelwijk, Charlotta Schärfe, Michael Springer, Chris Sander, and Debora Marks. Mutation effects predicted from sequence co-variation. Nature Biotechnology, 35, 01 2017. doi:10.1038/nbt.3769.
  • Yan and Huang (2021) Yumeng Yan and Sheng-You Huang. Accurate prediction of inter-protein residue-residue contacts for homo-oligomeric protein complexes. Briefings in Bioinformatics, 22(5), 03 2021. ISSN 1477-4054. doi:10.1093/bib/bbab038. URL https://doi.org/10.1093/bib/bbab038. bbab038.
  • Frazier (2018) P. Frazier. A tutorial on bayesian optimization. ArXiv, abs/1807.02811, 2018.
  • Callaway (2022) Ewen Callaway. After alphafold: protein-folding contest seeks next big breakthrough. Nature, 613, 12 2022. doi:10.1038/d41586-022-04438-1.