1 appendices

2 Training and Assessment criteria

2.1 Datasets

A dataset of 1,800 antibody-antigen complexes from AbDb was used (Ferdous, 2018). Complexes with antigens larger than 700 or smaller than 20 were discarded, resulting in 1,214 complexes. Because this is a dataset of bound complex structures, we produced noisy, unbound-like antibody models by modeling the antibodies with AFM, which exposes both docking and training to noisy modeled structures (Fig. S2B). The complexes were divided into train and test sets using a 97% sequence identity cut-off for antibody sequences. The final train set included 790 complexes, with 85 additional complexes used for validation.
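The size filter described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Complex` record, its field names, and the function name are hypothetical, and the size is taken in whatever unit the cut-offs of 20 and 700 refer to, which the text does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Complex:
    # Hypothetical record; field names are illustrative only.
    pdb_id: str
    antigen_size: int  # same (unstated) unit as the cut-offs in the text


MIN_ANTIGEN_SIZE = 20   # antigens smaller than this are discarded
MAX_ANTIGEN_SIZE = 700  # antigens larger than this are discarded


def filter_by_antigen_size(complexes: List[Complex]) -> List[Complex]:
    """Keep complexes whose antigen size lies within [20, 700]."""
    return [
        c for c in complexes
        if MIN_ANTIGEN_SIZE <= c.antigen_size <= MAX_ANTIGEN_SIZE
    ]
```

Both bounds are treated as inclusive here, since the text discards only antigens strictly larger than 700 or strictly smaller than 20.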

2.2 Sampling antibody-antigen complexes

The antibodies were docked to antigens using the antibody-antigen docking protocol of PatchDock. PatchDock is an efficient geometric rigid docking method that maximizes shape complementarity (Duhovny, 2002; Schneidman-Duhovny, 2005). Here we used a higher sampling precision to generate a higher fraction of “positives”, i.e. acceptable accuracy models, for training, typically generating 100,000 docking models. These models were scored and re-ranked using the SOAP-PP statistical potential (Dong, 2013). For training, we used the 2,500 top-scoring models along with up to 50 positives irrespective of their ranking. For validation, we used the 2,500 top-scoring docking models based on SOAP-PP ranking.
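The selection of training models from the ranked docking output could look roughly like the sketch below. It rests on several assumptions that are not stated in the text: each model is represented as a hypothetical `(model_id, score, is_positive)` tuple, a lower SOAP-PP score is treated as better, and the "up to 50 positives" are taken as the best-ranked positives, merged with the top-scoring set without duplicates.

```python
from typing import List, Tuple

# Hypothetical representation: (model_id, soap_pp_score, is_positive)
Model = Tuple[int, float, bool]


def select_training_models(
    scored_models: List[Model],
    n_top: int = 2500,
    max_positives: int = 50,
) -> List[Model]:
    """Return the n_top best-scoring models plus up to max_positives
    positives, regardless of where the positives are ranked."""
    # Assumption: lower score means a better-ranked model.
    ranked = sorted(scored_models, key=lambda m: m[1])
    top = ranked[:n_top]

    # Best-ranked positives, wherever they fall in the ranking.
    positives = [m for m in ranked if m[2]][:max_positives]

    # Merge while avoiding duplicates of positives already in the top set.
    top_ids = {m[0] for m in top}
    selected = list(top)
    selected.extend(m for m in positives if m[0] not in top_ids)
    return selected
```

For validation, the same function with `max_positives=0` reduces to taking the top-scoring models only.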

2.3 Training

ContactNet was trained end-to-end as a binary classifier by optimizing the cross-entropy loss, under the assumption that the distributions of correct and incorrect complexes are separable. We trained the model with the AdamW optimizer (Loshchilov, 2017), using an initial learning rate of 1e-4 that decays 100-fold during training under a cosine decay schedule, and a weight decay factor of 5e-3. Each batch contained 52 docking models, randomly selected from different complexes for better generalization. One epoch was defined as 2,000 batches, or roughly 100,000 docking models. The network was trained for 160 epochs. The major challenge in the training process was the unbalanced nature of the data: there were hundreds of thousands of negative complexes compared to a small number of positive ones. To cope with this challenge, each batch was composed of 25% positive (acceptable accuracy) docking models and 75% negative (incorrect) ones. We used the W&B platform for experiment tracking (Biewald, 2020) and trained the model on a single RTX 2080 GPU for 20 hours.
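The batch composition and learning rate schedule can be illustrated with a short sketch. It is not the authors' training code: the function names are hypothetical, positives are sampled with replacement (an assumption motivated by their scarcity), and the schedule assumes the learning rate falls from 1e-4 to 1e-6 (a 100-fold decay) along a single cosine curve over 160 epochs of 2,000 batches.

```python
import math
import random

BATCH_SIZE = 52
POSITIVE_FRACTION = 0.25          # 13 positives and 39 negatives per batch
TOTAL_STEPS = 160 * 2000          # epochs x batches per epoch


def sample_balanced_batch(positives, negatives, rng,
                          batch_size=BATCH_SIZE,
                          positive_fraction=POSITIVE_FRACTION):
    """Draw one batch with a fixed positive/negative ratio."""
    n_pos = round(batch_size * positive_fraction)
    n_neg = batch_size - n_pos
    # Assumption: sampling with replacement, since positives are scarce.
    batch = rng.choices(positives, k=n_pos) + rng.choices(negatives, k=n_neg)
    rng.shuffle(batch)
    return batch


def cosine_lr(step, total_steps=TOTAL_STEPS, lr_max=1e-4, lr_min=1e-6):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

With a batch size of 52, the 25%/75% split gives exactly 13 positive and 39 negative docking models per batch, so every gradient step sees positives despite their rarity in the full docking output.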

2.4 Assessment criteria

Each complex model is assessed for accuracy based on its root mean square deviation (RMSD) from the correct structure, as used at CAPRI (Mendez, 2005; Lensink, 2007). A docking model is considered acceptable if the ligand $C\alpha$ RMSD after superposition of the receptors is < 10 Å or the interface $C\alpha$ RMSD is < 4 Å. A docking model is of medium accuracy if the ligand $C\alpha$ RMSD is < 5 Å or the interface $C\alpha$ RMSD is < 2 Å. The success rate is the percentage of benchmark cases with at least one acceptable or medium accuracy model among the top-N predictions. The hit rate is the number of docking models of acceptable or higher accuracy among the top-N predictions.
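The criteria above translate directly into a few small functions. The sketch below uses the stated RMSD thresholds; the function names and the representation of a benchmark as one ranked list of accuracy labels per case are illustrative assumptions.

```python
from typing import List


def classify_model(ligand_rmsd: float, interface_rmsd: float) -> str:
    """Label a docking model from its ligand and interface RMSD (in Å)."""
    if ligand_rmsd < 5.0 or interface_rmsd < 2.0:
        return "medium"
    if ligand_rmsd < 10.0 or interface_rmsd < 4.0:
        return "acceptable"
    return "incorrect"


def hit_rate(ranked_labels: List[str], n: int) -> int:
    """Number of acceptable-or-better models among the top-n predictions."""
    return sum(label != "incorrect" for label in ranked_labels[:n])


def success_rate(benchmark: List[List[str]], n: int) -> float:
    """Percentage of cases with at least one hit among the top-n predictions."""
    if not benchmark:
        return 0.0
    successes = sum(1 for labels in benchmark if hit_rate(labels, n) > 0)
    return 100.0 * successes / len(benchmark)
```

The medium check comes first because every medium accuracy model also satisfies the acceptable thresholds.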