Abstract
The physiology of a cell can be viewed as the product of thousands of proteins acting in concert to shape the cellular response. Coordination is achieved in part through networks of protein–protein interactions that assemble functionally related proteins into complexes, organelles, and signal transduction pathways. Understanding the architecture of the human proteome has the potential to inform cellular, structural, and evolutionary mechanisms and is critical to elucidating how genome variation contributes to disease (refs 1–3). Here we present BioPlex 2.0 (Biophysical Interactions of ORFeome-derived complexes), which uses robust affinity purification–mass spectrometry methodology (ref. 4) to elucidate protein interaction networks and co-complexes nucleated by more than 25% of protein-coding genes from the human genome, and constitutes, to our knowledge, the largest such network so far. With more than 56,000 candidate interactions, BioPlex 2.0 contains more than 29,000 previously unknown co-associations and provides functional insights into hundreds of poorly characterized proteins while enhancing network-based analyses of domain associations, subcellular localization, and co-complex formation. Unsupervised Markov clustering (ref. 5) of interacting proteins identified more than 1,300 protein communities representing diverse cellular activities. Genes essential for cell fitness (refs 6, 7) are enriched within 53 communities representing central cellular functions. Moreover, we identified 442 communities associated with more than 2,000 disease annotations, placing numerous candidate disease genes into a cellular framework. BioPlex 2.0 exceeds previous experimentally derived interaction networks in depth and breadth, and will be a valuable resource for exploring the biology of incompletely characterized proteins and for elucidating larger-scale patterns of proteome organization.
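As an illustration of the Markov clustering step mentioned above, the following is a minimal sketch of the general MCL idea (alternating expansion and inflation of a column-stochastic matrix) on a toy graph. It is not the authors' pipeline: the function names, the inflation value of 2.0, the added self-loops, the dense pure-Python matrices, and the six-node example graph are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: element-wise power, then re-normalize the columns."""
    return normalize_columns([[x ** power for x in row] for row in m])


def markov_cluster(edges, n, inflation=2.0, max_iter=50, tol=1e-9):
    """Cluster an undirected graph on nodes 0..n-1 (toy-sized graphs only)."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    for i in range(n):
        adj[i][i] = 1.0  # self-loops, a common MCL convention (assumption)
    m = normalize_columns(adj)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Rows that retain flow are attractors; the columns they attract
    # form one cluster. Identical clusters are merged via the set.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy network: two triangles joined by one bridging edge.
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
communities = markov_cluster(toy_edges, 6)
print(communities)
```

Higher inflation values generally yield smaller, more numerous clusters, so any comparison of community counts depends on that parameter; the value used for BioPlex 2.0 is not stated in this section.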
References
Havugimana, P. C. et al. A census of human soluble protein complexes. Cell 150, 1068–1081 (2012)
Wan, C. et al. Panorama of ancient metazoan macromolecular complexes. Nature 525, 339–344 (2015)
Menche, J. et al. Uncovering disease-disease relationships through the incomplete interactome. Science 347, 1257601 (2015)
Huttlin, E. L. et al. The BioPlex network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015)
Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002)
Blomen, V. A. et al. Gene essentiality and synthetic lethality in haploid human cells. Science 350, 1092–1096 (2015)
Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015)
Pan, Q., Shai, O., Lee, L. J., Frey, B. J. & Blencowe, B. J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008)
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)
Stenson, P. D. et al. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum. Genet. 133, 1–9 (2014)
Krogan, N. J. et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637–643 (2006)
Hein, M. Y. et al. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell 163, 712–723 (2015)
Guruharsha, K. G. et al. A protein complex network of Drosophila melanogaster. Cell 147, 690–703 (2011)
Yang, X. et al. A public genome-scale lentiviral expression library of human ORFs. Nat. Methods 8, 659–661 (2011)
Ruepp, A. et al. CORUM: the comprehensive resource of mammalian protein complexes–2009. Nucleic Acids Res. 38, D497–D501 (2010)
Rual, J. F. et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature 437, 1173–1178 (2005)
Rolland, T. et al. A proteome-scale map of the human interactome network. Cell 159, 1212–1226 (2014)
Ryan, C. J. et al. High-resolution network biology: connecting sequence with function. Nat. Rev. Genet. 14, 865–879 (2013)
Dutkowski, J. et al. A gene ontology inferred from molecular networks. Nat. Biotechnol. 31, 38–45 (2013)
Magrane, M. & UniProt Consortium. UniProt Knowledgebase: a hub of integrated protein data. Database 2011, bar009 (2011)
Uhlén, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015)
Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014)
Zhong, Y. et al. Distinct regulation of autophagic activity by Atg14L and Rubicon associated with Beclin 1–phosphatidylinositol-3-kinase complex. Nat. Cell Biol. 11, 468–476 (2009)
Austin-Tse, C. et al. Zebrafish ciliopathy screen plus human mutational analysis identifies C21orf59 and CCDC65 defects as causing primary ciliary dyskinesia. Am. J. Hum. Genet. 93, 672–686 (2013)
Piñero, J. et al. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database 2015, bav028 (2015)
Babu, M. et al. Interaction landscape of membrane-protein complexes in Saccharomyces cerevisiae. Nature 489, 585–589 (2012)
Floyd, B. J. et al. Mitochondrial protein interaction mapping identifies regulators of respiratory chain function. Mol. Cell 63, 621–632 (2016)
Chantranupong, L. et al. The CASTOR proteins are arginine sensors for the mTORC1 pathway. Cell 165, 153–164 (2016)
Dong, R. et al. Endosome-ER contacts control actin nucleation and retromer function through VAP-dependent regulation of PI4P. Cell 166, 408–423 (2016)
Rappsilber, J., Mann, M. & Ishihama, Y. Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat. Protoc. 2, 1896–1906 (2007)
Eng, J. K., McCormack, A. L. & Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994)
Elias, J. E. & Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007)
Huttlin, E. L. et al. A tissue-specific atlas of mouse protein phosphorylation and expression. Cell 143, 1174–1189 (2010)
Sowa, M. E., Bennett, E. J., Gygi, S. P. & Harper, J. W. Defining the human deubiquitinating enzyme interaction landscape. Cell 138, 389–403 (2009)
Behrends, C., Sowa, M. E., Gygi, S. P. & Harper, J. W. Network organization of the human autophagy system. Nature 466, 68–76 (2010)
Franceschini, A. et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 41, D808–D815 (2013)
Warde-Farley, D. et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 38, W214–W220 (2010)
Chatr-aryamontri, A. et al. The BioGRID interaction database: 2013 update. Nucleic Acids Res. 41, D816–D823 (2013)
Licata, L. et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 40, D857–D861 (2012)
Pratt, D. et al. NDEx, the Network Data Exchange. Cell Syst. 1, 302–305 (2015)
Calvo, S. E., Clauser, K. R. & Mootha, V. K. MitoCarta2.0: an updated inventory of mammalian mitochondrial proteins. Nucleic Acids Res. 44 (D1), D1251–D1257 (2016)
Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013)
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995)
Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000)
Gallegos, L. L. et al. A protein interaction map for cell-cell adhesion regulators identifies DUSP23 as a novel phosphatase for β-catenin. Sci. Rep. 6, 27114 (2016)
Wilson-Grady, J. T., Haas, W. & Gygi, S. P. Quantitative comparison of the fasted and re-fed mouse liver phosphoproteomes using lower pH reductive dimethylation. Methods 61, 277–286 (2013)
Ran, F. A. et al. Genome engineering using the CRISPR–Cas9 system. Nat. Protoc. 8, 2281–2308 (2013)
Tan, M. K., Lim, H. J., Bennett, E. J., Shi, Y. & Harper, J. W. Parallel SCF adaptor capture proteomics reveals a role for SCF(FBXL17) in NRF2 activation via BACH1 repressor turnover. Mol. Cell 52, 9–24 (2013)
Meng, Z., Moroishi, T. & Guan, K. L. Mechanisms of Hippo pathway regulation. Genes Dev. 30, 1–17 (2016)
Wilhelm, M. et al. Mass-spectrometry-based draft of the human proteome. Nature 509, 582–587 (2014)
Acknowledgements
We thank M. Vidal and D. Hill for ORFeome 8.1, and the Nikon Imaging Center (Harvard Medical School) for imaging support. This work was supported by the National Institutes of Health (U41 HG006673 to S.P.G., J.W.H., and E.L.H.) and Biogen (S.P.G., J.W.H.). J.A.P. is supported by K01DK098285, and S.S. was supported by the Canadian Institutes for Health Research.
Author information
Contributions
The study was conceived by S.P.G. and J.W.H. E.L.H. developed CompPASS-Plus and software for data collection and integration, performed all informatic analyses, and oversaw data collection and pipeline quality. R.J.B. directed the cell culture and biochemistry pipeline and organized samples for mass spectrometry analysis with L.T. J.A.P. and J.R.C. were responsible for all mass spectrometry operation. K.B., G.C., F.G., M.P.G., H.P., R.A.O., S.T., G.Z., and J.S. performed DNA and cell line production. R.J.B. and G.Z. performed all affinity purifications. L.P.-V., A.E.W., and S.S. performed validation experiments. B.K.E. and R.R. provided computational support. Data interpretation was performed by E.L.H., K.L., K.G.G., S.A.-T., S.P.G., and J.W.H. Data visualization tools were constructed by S.P.G. and D.K.S. The paper was written by E.L.H., S.P.G., and J.W.H. and was edited by all authors.
Ethics declarations
Competing interests
S.P.G. is a consultant for Biogen, Inc.
Additional information
Reviewer Information Nature thanks J. Coon and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Figure 1 BioPlex network coverage and validation of interactions for a set of poorly studied proteins in BioPlex 2.0 using HCT116 cells.
a, BioPlex network coverage of selected protein classes. Light shades represent total proteins, while dark shades represent baits targeted for AP–MS. BioPlex 1.0 is depicted in blue shades while BioPlex 2.0 is highlighted in red. b–m, The indicated bait proteins (teal) were expressed in HCT116 cells and anti-HA immune complexes analysed by mass spectrometry. HCIPs were determined using CompPASS-Plus. Interactions observed in both HCT116 and HEK293T cells are indicated with blue edges and nodes. Interactions seen in HEK293T but not HCT116 are shown in grey edges and nodes. b, TMEM111; c, ZNHIT3; d, RMND5A; e, SMTNL2; f, FBXO28; g, C3orf75; h, C9orf41; i, MPP2; j, ZNF219; k, ZNF483; l, WDR37; m, LRCH3.
Extended Data Figure 2 Validation of interactions in BioPlex 2.0.
a–c, Systematic analysis of 14-3-3 interactions by reciprocal AP–MS. a, The matrix relates 39 BioPlex 2.0 baits (horizontal) with six 14-3-3 proteins (left) which were detected as preys one or more times. Coloured (that is, non-white) boxes indicate interactions that were observed in BioPlex 2.0; the specific colour indicates the outcome of a reciprocal AP–MS experiment targeting the 14-3-3 protein instead. Boxes shaded red could not be detected in the reciprocal direction because the 14-3-3 protein YWHAE failed sequence validation and could not be subjected to AP–MS analysis; boxes shaded light grey were also not observed in reciprocal orientation, probably because those particular proteins (shaded in grey across the top) were not detectable in HEK293T cells and not expected to appear as preys in the 14-3-3 pulldowns. Blue boxes indicate interactions that were observed in reciprocal orientation, while dark grey boxes were not observed in reciprocal orientation. Note that SFN is listed in both horizontal and vertical directions because it was a bait in the BioPlex 2.0 network. b, Reciprocal interactions among 14-3-3 proteins. Shading is the same as above, with black indicating that self-interactions are not considered for reciprocal analysis. c, Summary of interaction results across a and b. Overall, more than 40% of 14-3-3 interactions were confirmed via reciprocal immunoprecipitation; after accounting for YWHAE and those BioPlex baits that are not detected in HEK293T cells in the absence of overexpression, the reciprocal rate rises to 63% of eligible interactions. d–i, Validation of a PDLIM7–PTPN14 BioPlex 2.0 network in MCF10A cells. This network is regulated by the Hippo kinase system, which is activated upon contact inhibition of cell proliferation. To validate this network, including previously unreported interactions, a series of AP–MS experiments were performed in proliferating or contact-inhibited MCF10A cells and HCIPs were identified using CompPASS.
d, Summary of interactions identified in BioPlex 2.0 or MCF10A AP–MS experiments. Edges detected in BioPlex 2.0 only are red, while edges detected in both cell lines are purple and edges unique to the MCF10A immunoprecipitations are shaded blue. MCF10A-specific edges that could not appear in BioPlex 2.0 because neither of their constituent proteins was targeted as a bait are shown as dashed lines. Nodes are coloured to represent their status in the BioPlex network: black nodes were targeted as baits in BioPlex 2.0 and grey nodes appear as preys, while white nodes do not appear in BioPlex at all. Edges observed in MCF10A experiments are assumed to have been detected in both confluent and sub-confluent cells, unless they have been labelled with an ‘S’ or a ‘C’, implying that they were detected only under sub-confluent or confluent conditions, respectively. Interactions further confirmed via immunoprecipitation–western analysis are labelled with ‘W’ (see h and i). e, Duplicate network highlighting previously unreported edges within the combined BioPlex 2.0/MCF10A Hippo interaction network. Edges highlighted in grey have been reported previously, while new edges are highlighted in blue. f, Summary of overlap between BioPlex 2.0 and the MCF10A interaction networks. Sixty-five per cent of eligible interactions were confirmed. g, Summary of novel and previously reported interaction counts in the combined Hippo network: 63% of interactions have not been previously reported. h–i, Immunoprecipitation–western analysis confirmation of interactions among PDLIM7–PTPN14 (h) and PTPN14–MAGI1 (i). IP, immunoprecipitation.
Extended Data Figure 3 BioPlex 2.0 enables subcellular localization prediction for additional uncharacterized proteins.
a, Increased interaction density expands subcellular localization predictions from BioPlex 2.0. b, Subcellular localization predictions for a selection of uncharacterized human proteins for which no confident prediction could be made in BioPlex 1.0. Where possible, the figure indicates whether predicted localization is consistent with the Human Protein Atlas (ref. 21). c–j, Sub-networks highlighting primary and secondary neighbours for selected uncharacterized human proteins whose subcellular localization can be predicted using the BioPlex network. Nodes are coloured according to subcellular localization data provided by UniProt. P values were calculated by Fisher's exact test as described in Methods with multiple testing correction. Localizations depicted in c, e, g, and i are consistent with recent characterization as listed in UniProt; the localization given in d is consistent with MitoCarta 2.0 (ref. 41).
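The Fisher's exact test named in this legend can be sketched as a one-sided enrichment test on a 2×2 table. How the authors actually built their tables is described in their Methods, which are not part of this section, so the layout below (neighbours versus non-neighbours, annotated versus not annotated with a given localization) and the function name are assumptions for illustration only.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (illustrative):
      a = neighbours with the localization, b = neighbours without it,
      c = other proteins with it,           d = other proteins without it.
    Returns the hypergeometric probability of observing a or more
    annotated neighbours when the table margins are held fixed.
    """
    row1 = a + b            # number of neighbours
    col1 = a + c            # number of annotated proteins
    total = a + b + c + d   # all proteins considered
    denom = comb(total, row1)
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(total - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / denom


# Hypothetical counts: all 3 neighbours carry the annotation,
# none of the 3 remaining proteins does.
p_enriched = fisher_exact_greater(3, 0, 0, 3)
print(p_enriched)
```

Because one such test is run per protein and localization, the raw P values need the multiple testing correction the legend mentions before any threshold is applied.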
Extended Data Figure 4 Validation of subcellular localization predictions using anti-HA immunofluorescence.
The indicated bait proteins fused at their C terminus with an HA tag were expressed after transient infection of lentiviruses at low multiplicity of infection; after 2 days, cells were fixed and subjected to anti-HA-based immunofluorescence (red). Nuclei were stained with Hoechst. For baits with predicted mitochondrial localization, cells were co-stained with anti-TOMM20 antibodies (green). Z-series optical sections were acquired via spinning disk confocal microscopy; maximum intensity projections are shown. Scale bar, 20 μm.
Extended Data Figure 5 Increased scope of BioPlex 2.0 network reveals additional domain–domain associations.
a, Numbers of PFAM domain associations detected within BioPlex 1.0 and 2.0 interaction networks. b, A selection of domain interactions detected in both networks highlighting increased significance due to greater coverage of the BioPlex 2.0 network (red) versus its earlier form (blue). c, A subset of domain–domain associations detected within BioPlex 2.0, but not BioPlex 1.0. Although over 4,000 new domain–domain associations were detected overall (a; Benjamini–Hochberg adjusted P < 0.01), for purposes of display only domain associations with P < 10⁻¹⁵ are shown. d, Selected domain–domain associations involving domains of unknown function (DUF*, where * represents the variable number); an adjusted P value less than 10⁻⁶ was required. e–g, Sub-networks highlighting interactions underlying associations among selected domain pairs. Blue and red shading highlights proteins bearing the indicated domains. Asterisks denote central proteins whose names are denoted above each sub-network. e, GDI/Ras association; f, KBP-C/Kinesin association; g, DUF4482/KRAB association.
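The Benjamini–Hochberg adjustment referred to in this legend follows a standard step-up procedure, sketched below. The function name and the example P values are hypothetical; only the procedure itself is standard.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each P value is scaled by m / rank (rank 1 = smallest), and a running
    minimum taken from the largest rank downwards keeps the adjusted
    values monotone in the ranks.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four domain pairs.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

An association is then reported when its adjusted value falls below the chosen threshold, such as the 0.01 cut-off quoted in the legend.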
Extended Data Figure 6 Cullin domain associations reflect regulatory proteins and substrate adaptors.
a, Modular structure of cullin–RING E3 ubiquitin ligases. Edge colours unite domain(s) within the same protein molecules. Shading highlights individual domains as cullins (purple), adaptor proteins (light blue), substrate-binding modules (green), or other (grey). CSN, COP9 signalosome. b, Cullin domain associations. Edges connect domains that were found to associate with each other more frequently than expected (see Methods). P values were calculated by Fisher's exact test with multiple testing correction. Self-loops indicate domains that were found to preferentially associate with other proteins containing the same domain. Nodes are coloured to reflect protein function as described in a. c, d, Pairwise enrichment of the indicated PFAM domains among neighbours of each indicated cullin-domain-containing protein. Proteins that have been specifically targeted for AP–MS as baits are highlighted in blue; those that appear as preys only are black. Domains are grouped by function with colour coding as described above. CSN, COP9 signalosome; GLMN, glomulin. c, Red boxes indicate significant enrichment (P < 0.01) after multiple testing correction; NS indicates the specified domain was found, but significance thresholds were not met. d, Networks depict the immediate neighbours of each cullin-domain-containing protein (centre, blue). Neighbours that contain the indicated domains are highlighted in red.
Extended Data Figure 7 BioPlex 2.0 expands functional insights into uncharacterized proteins.
a, Stacked bar graph depicting the number of baits targeted in BioPlex 1.0 and BioPlex 2.0 with gene symbols matching each pattern; BioPlex 2.0 matches have been subdivided to indicate the fraction associated with one or more enriched functional classes (hypergeometric test; Benjamini–Hochberg adjusted P < 0.01). This fraction is also expressed as a percentage for each bar. b–k, Nearest neighbour sub-networks centred on selected human proteins with limited previous characterization. Colour coding is used to highlight proteins that match any enriched functional categories. l–n, Validation of C13orf18 association with components of the BECN1 complex (h). Extracts prepared from HEK293T cells expressing the indicated constructs were subjected to affinity purification using anti-GFP resin (l, m) or anti-Flag magnetic beads (n), followed by immunoblotting with anti-BECN1 or anti-C13orf18 antibodies.
Extended Data Figure 8 MCL clustering subdivides the BioPlex 2.0 network into clusters of functionally associated proteins.
a, Summary of sub-network topologies for all 1,320 complexes. Numbers indicate the counts of complexes matching each topology. b–e, Selected protein complexes that associate proteins with related functions. Coloured nodes and edges associate individual proteins with enriched classifications. Inset diagrams indicate complex coverage in BioPlex 1.0. Black nodes and edges indicate proteins and interactions that were present in BioPlex 1.0; empty nodes depict proteins from the BioPlex 2.0 community that were not detected in BioPlex 1.0.
Extended Data Figure 9 Network properties and community distribution of fitness genes.
a, Overlap among BioPlex 2.0 and two published lists of cellular fitness genes (refs 6, 7). b–e, Simulations reveal distinctive network properties of cellular fitness genes (see Methods for details). b, Mean vertex degree; c, mean eigenvector centrality; d, mean local clustering coefficient; e, graph assortativity. f, Expanded view of the BioPlex community network from Fig. 3a, including descriptions of 53 communities that are enriched for cellular fitness proteins. Numbers after each community description correspond to cluster indices as found in Supplementary Tables 6–8.
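Two of the network properties listed in this legend, mean vertex degree and mean local clustering coefficient, can be computed with a short sketch like the one below. It uses the textbook definitions on an undirected graph; the function names, the four-node example network and the "fitness" subset are hypothetical, and the authors' simulation procedure (described in their Methods) is not reproduced here.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-interactions."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def mean_degree(adj, nodes):
    """Average number of interaction partners over the given nodes."""
    return sum(len(adj[n]) for n in nodes) / len(nodes)


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that interact with each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Every neighbour-neighbour edge is seen from both ends, hence // 2.
    links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2
    return 2.0 * links / (k * (k - 1))


def mean_local_clustering(adj, nodes):
    """Average local clustering coefficient over the given nodes."""
    return sum(local_clustering(adj, n) for n in nodes) / len(nodes)


# Hypothetical network: a triangle A-B-C with one extra partner D on C.
toy = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
all_nodes = sorted(toy)
fitness_like = ["A", "B", "C"]  # hypothetical subset of interest

print(mean_degree(toy, all_nodes), mean_degree(toy, fitness_like))
print(mean_local_clustering(toy, fitness_like))
```

Comparing such a statistic for a gene set against the same statistic for many randomly drawn sets of equal size is one common way to judge whether the set is unusual, but the sampling scheme the authors used is not stated in this section.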
Extended Data Figure 10 The BioPlex interaction network and hereditary disease: patient mutations in the hereditary spastic paraplegia protein KIAA0196/SPG8 affect formation of the WASH complex.
a–c, BioPlex 2.0 communities associated with congenital or hereditary disease states. Green nodes are associated with the indicated disease (DisGeNET), while other community members are grey. Edge colours indicate connectivity of individual communities revealed through MCL clustering. a, Bardet–Biedl syndrome; b, mitochondrial complex I deficiency; c, hereditary spastic paraplegia (the WASH complex). d, Quantitative analysis of the association of KIAA0196/SPG8 and its mutant forms found in hereditary spastic paraplegia was performed using tandem mass tagging proteomics, and the relative abundance of individual WASH complex subunits displayed as a heat map. e, HEK293T cells were gene-edited to delete endogenous KIAA0196. Wild type (WT) or disease variants (N471D/L619F/V626F) of KIAA0196 (N-terminally Flag tagged) were expressed in these cells and assayed by immunoblotting. f, Work-flow for the tandem mass tagging approach to quantify KIAA0196-associated proteins. g, Quantitative interaction proteomics of wild type and variants of KIAA0196. Average relative intensities of biological replicates of interacting proteins are shown. Error bars, mean ± s.d. The number of peptides quantified for each protein is indicated in parentheses. h, i, Immunoprecipitation (IP)/immunoblotting (IB) was performed on three biological replicates to examine association of WASH complex members by immunoblotting. Average relative intensities of immunoblot signals for biological triplicates are shown; error bars, mean ± s.d.
Supplementary information
Supplementary Information
This file contains Supplementary Figure 1. (PDF 1227 kb)
Supplementary Table
This file contains Supplementary Table 1. (XLSX 5414 kb)
Supplementary Table
This file contains Supplementary Table 2. (XLSX 112 kb)
Supplementary Table
This file contains Supplementary Table 3. (XLSX 543 kb)
Supplementary Table
This file contains Supplementary Table 4. (XLSX 323 kb)
Supplementary Table
This file contains Supplementary Table 5. (XLSX 5340 kb)
Supplementary Table
This file contains Supplementary Table 6. (XLSX 246 kb)
Supplementary Table
This file contains Supplementary Table 7. (XLSX 508 kb)
Supplementary Table
This file contains Supplementary Table 8. (XLSX 244 kb)
Supplementary Table
This file contains Supplementary Table 9. (XLSX 17 kb)
Supplementary Table
This file contains Supplementary Table 10. (XLSX 33 kb)
About this article
Cite this article
Huttlin, E., Bruckner, R., Paulo, J. et al. Architecture of the human interactome defines protein communities and disease networks. Nature 545, 505–509 (2017). https://doi.org/10.1038/nature22366
DOI: https://doi.org/10.1038/nature22366