Comparative interactomics analysis of protein family interaction networks using PSIMAP (protein structural interactome map)

Donghoon Oh


Bioinformatics, 2005



BIOINFORMATICS ORIGINAL PAPER Vol. 21 no. 15 2005, pages 3234–3240
doi:10.1093/bioinformatics/bti512
Structural bioinformatics

Daeui Park1,†, Semin Lee2,†, Dan Bolser3, Michael Schroeder4, Michael Lappe5, Donghoon Oh1 and Jong Bhak2,6,∗

1 Object Interaction Technologies Inc., CMS Building 705, 373-1, Guseong-dong, Yuseong-gu, Daejeon 305-701, Korea, 2 Department of BioSystems, KAIST, 373-1, Yuseong-gu, Daejeon 305-701, Korea, 3 Medical Research Council, Dunn Human Nutrition Unit, Cambridge CB2 2QH, UK, 4 Biotec and Department of Computing, TU Dresden, 108 01307 Dresden, Germany, 5 Max-Planck-Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany and 6 NGIC, KRIBB, Daejeon 305-333, Korea

Received on February 28, 2005; Revised on April 22, 2005; Accepted on May 19, 2005
Advance Access publication May 24, 2005

∗ To whom correspondence should be addressed.
† The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Published by Oxford University Press 2005
Downloaded from https://academic.oup.com/bioinformatics/article/21/15/3234/195577 by guest on 18 May 2023

ABSTRACT

Motivation: Many genomes have been completely sequenced. However, detecting and analyzing their protein–protein interactions by experimental methods such as co-immunoprecipitation, tandem affinity purification and yeast two-hybrid (Y2H) screening is not as fast as genome sequencing. Therefore, a computational prediction method based on known protein structural interactions will be useful for analyzing large-scale protein–protein interaction rules within and among complete genomes.

Results: We confirmed that the predicted protein family interactomes (the full set of protein family interactions within a proteome) of all 146 species are scale-free networks. They share a small core network comprising 36 protein families related to indispensable cellular functions. We found two fundamental differences between prokaryotic and eukaryotic interactomes: (1) eukarya had significantly more hub families than archaea and bacteria and (2) certain special hub families determined the topology of the eukaryotic interactomes. Our comparative analysis suggests that a very small number of expansive protein families drove the evolution of interactomes and appear to have played a key role in species diversification.

Contact: jong@kribb.re.kr

Supplementary information: http://interactomics.org

INTRODUCTION

Since 1995, over 250 genomes have been completely sequenced (Bernal et al., 2001; Shendure et al., 2004). The availability of such genomic sequence data allows us to conduct comparative genomics studies, yielding important information on developmental processes and disease defense mechanisms (Eichler and Sankoff, 2003; O’Brien et al., 1999; Rubin et al., 2000). However, comparing proteomes alone is not sufficient to fully understand how the cellular machinery evolved over a long period of time. The fundamental differences among organisms cannot be fully explained by looking only at their genes and proteins. A step forward would be to look at all the interactions among them (Ng et al., 2003; von Mering et al., 2002).

The full range of functional complexity and diversity in biological systems is probably the result of interactions among biological entities. The architecture and organization of such interactions are best represented as networks, for example, networks of interacting proteins that reflect biochemical pathways and genetic regulation. It has been reported that, owing to functional constraints, biological interaction networks are tightly conserved (Bolser and Park, 2003). The question is what basic similarities and differences underlie the networks of interacting proteins in the completely sequenced genomes. Answering this question requires examining many complete proteomes and their interactomes. However, present experimental technology is not fast enough to map the molecular interactions of proteins for all the completely sequenced genomes. Consequently, computational methods for assigning and predicting protein interactions from genomic sequence data have been developed (Dandekar et al., 1998; Enright et al., 1999; Huynen and Bork, 1998; Marcotte et al., 1999; Overbeek et al., 1999; Pellegrini et al., 1999; Tan et al., 2004). These methods, however, have focused more on discovering functional interactions than physical interactions.

Therefore, we introduced a structure-oriented protein interaction protocol: PSIMAP (protein structural interactome map) (Gong et al., 2005; Park et al., 2001). Interactions among structural protein families are fundamental to the workings of cells. They occur in multi-domain polypeptide chains, in multi-subunit proteins and in transient complexes among proteins that also exist independently. One critical aspect of PSIMAP is that it lets us view interactions among protein domains in terms of their structural families, so that the large-scale patterns and evolution of interactomes can be analyzed across species (Fig. 1). PSIMAP extracts the exact molecular interaction information of proteins from the Protein Data Bank (PDB) (Berman et al., 2000) and their domains from the Structural Classification of Proteins (SCOP) (Murzin et al., 1995). With the assistance of bioinformatics, its predictive capacity can be extended to a genomic scale. We have built a high-throughput, homology-based interaction prediction method utilizing PSIMAP.

The key advantages of this PDB-derived predictive method are that (1) it covers a many-fold larger dataset of probable protein interactions, (2) it can reveal the history of interactions in genomes despite the limited amount of experimental interaction information available at present and (3) it provides explicit models of structural interactions that can be used to validate the predicted interactions. A disadvantage is that individual protein interaction information cannot be readily verified. However, we found that 340 of the 591 (57.5%) human protein interactions in the Database of Interacting Proteins could be explained by structural interactions (Moon et al., 2005). To investigate broad evolutionary trends in protein interaction networks, conserved protein family interactions are more appropriate than individual protein interactions. For this reason, our comparative analysis using PSIMAP does not require precise verification of individual molecular interactions. The reliability of the family interaction predictions depends mostly on accurate family detection in the genomes. Particular organisms may inevitably have false or missing predictions, but in general this should not greatly affect the overall comparative analysis at the family level.

Based on the comparative analysis of protein family interaction networks reconstructed by PSIMAP (see Supplementary Table 1), we gained insight into species diversification in terms of computationally predicted protein family interactomes and their common features. First, we found that the protein family interactomes of all 146 species share a small core network comprising 47 protein family interaction pairs (including self- or homo-interacting pairs). Although this core is small and probably incomplete, we can infer that it is well conserved in diverse life forms. A notable outcome was that a very small number of expansive protein families played a key role in interactome growth and determined the characteristics of the interactomes of prokarya (archaea and bacteria) and eukarya. Eukaryotic interactomes have a distinctly different topology from prokaryotic interactomes regardless of proteome size.

METHODS

Protein structure assignment

To construct the protein family interaction network of a particular proteome, we first assigned the known three-dimensional structural families (on which PSIMAP is based) to the protein sequences. For this, we used 146 completely sequenced species from the European Bioinformatics Institute (EBI) and their 578 625 protein sequences (Pruess et al., 2003). The 146 genomes comprised archaea (15), bacteria (122) and eukarya (9). Of the 578 625 proteins, 296 630 (52%) had at least one SCOP family assignment. The SCOP database (version 1.65) is a manually curated protein structural domain classification system. Structure assignment was performed with the PSI-BLAST algorithm (Altschul et al., 1997) using the default BLOSUM62 amino acid substitution matrix and an E-value (expectation value) threshold of 0.01. Several thresholds ranging from 0.0001 to 0.01 were tested. We chose 0.01 after manual inspection of the results, because it did not cause an explosion of erroneous matches in the iterative search process of PSI-BLAST. Our experiments showed that an E-value of 0.01 provided 2–3% higher coverage than an E-value of 0.0001 without sacrificing assignment quality.

Our homology assignments were made at the SCOP family level for the well-classified and clearly distinguishable SCOP classes: a (all alpha proteins), b (all beta proteins), c [alpha and beta proteins (a/b)], d [alpha and beta proteins (a + b)], e (multi-domain proteins), f (membrane and cell surface proteins and peptides) and g (small proteins). Of the 2327 SCOP families, 2091 (90%) were assigned to at least one protein sequence. We then filtered out 371 SCOP families that had no interaction information in PSIMAP. Multi-domain proteins contain more than one SCOP domain. A problem in structural assignment is that alignments of two domain sequences can erroneously overlap on the same region of a multi-domain protein. To overcome this, we treated two domain sequences whose alignments overlapped by ≤15 amino acid residues as two separate domains in the protein sequence.

Mapping the protein family interaction

To construct species-specific protein family interactomes, we used PSIMAP (see http://psimap.org and http://psibase.kaist.ac.kr). PSIMAP is a global interaction map that describes domain–domain and protein–protein interaction information for known PDB structures. It examines every possible pair of structural domains within a protein or complex and checks whether there are at least five residue contacts within a distance of 5 Å (Bolser et al., 2003). Although the number of PDB structures is relatively small in comparison with the sequence data, PSIMAP can cover the majority of known protein structural information (Fig. 1a) (Aloy and Russell, 2002).

All the predicted protein family interactomes are species specific, and our aim was to find evolutionary trends among all the interactomes. Hence, the interactomes are built at the level of protein family interactions instead of individual proteins. Figure 1b shows how interaction intensity is unevenly distributed in a spherical network layout of the interactomes of three model species. The distinct topologies of the protein family interaction networks enabled us to compare and analyze them in order to form hypotheses about how the interactomes have expanded.

Regarding the coverage of the interactomes, we assumed that the present PDB (the source of interaction information for our analysis) represents the majority of protein folds in nature. We made this assumption because the number of PDB entries is growing exponentially while the number of new folds is increasing very slowly. It has been reported that there may be fewer than 2000 distinct protein architectures in nature (Alexandrov and Go, 1994; Chothia, 1992; Orengo et al., 1994; Wang, 1996; Zhang, 1997). Therefore, the present structural interactome data represent a relatively complete set of distinct protein families, although this does not imply that all possible family interactions have been observed in the PDB. Also, a significant portion of the unassigned genes represents transmembrane proteins whose structures have yet to be determined experimentally. We therefore suggest that the PDB and PSIMAP cover the majority of the existing soluble families in nature. Table 1 shows the subdivision of the 1720 observed protein families and the 2404 observed protein family interactions.

Table 1. Summary of structure assignment and comparative analysis of protein family interaction networks at the superkingdom level

Description                                                           Archaea  Bacteria  Eukarya      All
No. of species                                                             15       122        9      146
No. of proteins                                                        35 197   362 484  180 944  578 625
No. of structure assigned proteins                                     16 085   186 539   94 006  296 630
No. of families assigned to proteins                                      986      1526     1616    2091a
No. of families assigned to proteins that can be covered by PSIMAP        830      1281     1340    1720a
No. of protein family interaction pairs                                  1086      1691     1916    2404a

a Non-redundant count.

RESULTS AND DISCUSSION

The core protein family network of life

We found 36 protein families common to all 146 species. They produced 47 protein family interaction pairs (1.3 links per family) that are predicted to be conserved across all species (Fig. 2c). By comparison, about one-third of all protein family interaction pairs (31%, 734 of 2404) were found in more than 80 species (Fig. 2a and Supplementary Table 2). The statistical likelihood of forming the core network (36 families) among the 1720 protein families is $5 \times 10^{-9}$ under a Poisson distribution (i.e. not random). A notable aspect of the core network is that it contains only 17 hetero-interaction pairs, which reflects that many protein families are self- or homo-interacting (e.g. homodimeric proteins). There were 1358 homo- and 1251 hetero-interaction pairs in the PDB. In archaeal interactomes, 60% of the interaction pairs were homo-interactions. Bacteria had 59%, and eukarya had the lowest rate at 53%. The eukaryotic homo-interaction rate is significantly lower than those of the other superkingdoms (Kruskal–Wallis test). This indicates that eukaryotic interactomes may have diversified their repertoires of interaction partners more than those of archaea and bacteria.

Of the 36 core protein families, 16 (44%) were related to protein translation. Notably, c.37.1.8, the most highly interactive protein
family (Bolser et al., 2003; Bolser and Park, 2003), contained domain variants directly related to protein translation, such as elongation factors Tu/1-alpha/2 and initiation factors IF2/eIF2/eIF5b. Seven protein families (19%) were related to DNA-binding proteins. Another five protein families (14%) were related to ATP metabolism (see Supplementary Table 3). Our results corroborate previous studies on well-conserved and minimal gene sets. The functions of the protein families constituting the core network are mostly related to protein translation, ribosomal structure and biogenesis (Aravind et al., 2000; Koonin, 2000; Mushegian, 1999; Tatusov et al., 1997).

Fig. 1. Global view of protein family interaction networks. (a) An overview of PSIMAP for all the protein family interactions (1895 protein family nodes, 2655 protein family interaction pairs). Each family is color-coded by its connectivity. More than one-fifth of the families are part of the large connected cluster centered on c.37.1.8 (G-proteins). Other highly connected nodes are b.47.1.2 (eukaryotic proteases) and d.58.1.5 (ferredoxin domain). Most of the protein families have only 1–3 interaction partners. (b) Spherical layout of the protein family interactomes of three species from three different superkingdoms: Thermoplasma acidophilum (archaea, 432 protein families, 552 protein family interaction pairs), E.coli (bacteria, 856 protein families, 1100 protein family interaction pairs) and Homo sapiens (eukarya, 1126 protein families, 1624 protein family interaction pairs). The protein families were taken from SCOP, and the protein family interactions were assigned by PSIMAP. The density of each network varies with the number of protein families and their interactions, reflecting the complexity of the interactome.

Fig. 2. The frequency of protein family interaction pairs and a core protein family network. (a) The frequency of protein family interaction pairs in the 146 species. X represents the number of species and Y the number of family interaction pairs. Of the 2404 protein family interaction pairs, 47 (2%) are observed in all 146 species. (b) The frequency of protein family interaction pairs for the three superkingdoms. The numbers of species and protein family interactions were normalized so that they could be plotted on the same scale. (c) A core protein family network, consisting of the protein family interaction pairs conserved in all 146 species (36 protein families, 47 protein family interaction pairs). It contains 17 hetero-interaction pairs (interactions between different protein families) and 30 homo-interaction pairs. Loops indicate homo-interactions. Boxes represent functional clusters. The largest box, on the right, is for protein translation, and the boxes on the left are for DNA binding and ATP metabolism. Other functions found are mostly enzymatic activities such as exonuclease, dehydrogenase, nitrogenase, aldolase and DNA gyrase.

At the other end of the scale, many protein family interactions appear to be species or lineage specific (Fig. 2a, right-hand side and Fig. 2b, left-hand side). This U-shaped trend is, however, not common to all superkingdoms. Archaea and bacteria show more unique protein family interactions, whereas eukarya have fewer. Eukarya have a higher ratio of common protein families and relatively few unique interaction pairs (Fig. 2b). To check whether this trend also holds for the occurrence of protein families, we plotted protein family numbers without considering their interactions (data not shown). We found the same U-shaped distribution for prokarya. This suggests faster evolution rates in prokarya (Nancy et al., 1995), leaving more room for new interaction links.
This would result in a high ratio of species-specific protein families and their interactions. Also, the eukaryotic era is shorter than that of prokarya, so eukarya are bound to have fewer unique protein family interactions. We suspect that some portion of the unique protein families and their interactions is an artifact of incomplete protein structure assignment. As the structural assignment ratio of proteomes increases, this portion should gradually decrease, giving an accurate number of highly species-specific protein interactions.

Functional coverage of structure assigned proteins

We carried out a test to quantify how much of the biological function is covered by the structurally assigned proteins. We used EBI’s GOA-slim (Camon et al., 2004), a selected subset of Gene Ontology (GO) terms (Harris et al., 2004). With it, we found that most GO annotations for a complete proteome (97% on average) can be covered by structurally assigned proteins. Our structural assignment covers only ∼50% of whole proteomes, but its functional coverage was high enough to represent overall biological functions. The functional assignment comparison is given in Supplementary Figure 1.

We also carried out a functional analysis of the core protein families. Of the 296 630 structure-assigned proteins, 45 164 belonged to the core protein families, and they covered 83% of GO annotations at the level of EBI’s GOA-slim. In a GO ‘biological process’ mapping, >91% of the proteins assigned to core protein families were associated with central biochemical process categories. These included physiological process (GO:0007582), cellular process (GO:0009987), cellular physiological process (GO:0050875), metabolism (GO:0008152), nucleobase, nucleoside, nucleotide and nucleic acid metabolism (GO:0006139), biosynthesis (GO:0009058), macromolecule metabolism (GO:0043170) and transport (GO:0006810).
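The coverage figures above (97% of GO-slim annotations per proteome, 83% for the core families) are ratios between sets of GO terms. The paper does not spell out the exact definition, so the following is only a minimal sketch under one assumption. It treats coverage as the fraction of GO-slim terms annotated anywhere in a proteome that are also annotated to at least one protein in a chosen subset, either the structure-assigned proteins or the proteins in core families. The function name `go_coverage` and the toy annotations are hypothetical; only the GO term IDs are taken from the text.

```python
from typing import Dict, Set


def go_coverage(annotations: Dict[str, Set[str]], subset: Set[str]) -> float:
    """Fraction of GO-slim terms in the proteome that are annotated to at
    least one protein in `subset` (an assumed definition, see lead-in)."""
    # Every GO-slim term annotated to any protein in the proteome.
    all_terms: Set[str] = set().union(*annotations.values())
    if not all_terms:
        return 0.0
    # Terms reachable only through the chosen subset of proteins.
    covered: Set[str] = set()
    for protein, terms in annotations.items():
        if protein in subset:
            covered |= terms
    return len(covered) / len(all_terms)


# Hypothetical toy proteome: protein ID -> GO-slim terms.
annotations = {
    "P1": {"GO:0008152", "GO:0009058"},  # metabolism, biosynthesis
    "P2": {"GO:0006810"},                # transport
    "P3": {"GO:0006928", "GO:0008152"},  # cell motility, metabolism
}
structure_assigned = {"P1", "P2"}  # hypothetical structure-assigned proteins
core_family_proteins = {"P1"}      # hypothetical members of core families

print(f"{go_coverage(annotations, structure_assigned):.0%}")    # 75%
print(f"{go_coverage(annotations, core_family_proteins):.0%}")  # 50%
```

Because the sketch uses a set union, a term counts as covered once however many proteins carry it. A per-annotation ratio would be an equally plausible reading of the text.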
In the ‘molecular function’ category of GO, >90% of the proteins were associated with binding (GO:0005488), catalytic activity (GO:0003824), hydrolase activity (GO:0016787), nucleic acid binding (GO:0003676), ligase activity (GO:0016874), transporter activity (GO:0005215), transferase activity (GO:0016740), helicase activity (GO:0004386) and structural molecule activity (GO:0005198). Given the functional coverage, it seems likely that the core protein family network spanning all types of life forms was formed in the very early stage of evolution, occupying the core biochemical processes for life. After the initial formation, a gradual attachment of the interactome seems to have occurred for a long period of time as peripheral functions, such as cell motility (GO:0006928), membrane fusion (GO:0006944), extracellular structure organization and biogenesis (GO:0043062), and pathogenesis (GO:0009405), were needed. 3237 Downloaded from https://academic.oup.com/bioinformatics/article/21/15/3234/195577 by guest on 18 May 2023 (c) D.Park et al. Connectivity of protein family interaction networks Network topology and interactome complexity of prokarya and eukarya The function of a protein is often affected by its interacting partners. The topology of interaction networks among protein families is determined by (1) the number of protein families, (2) the number of their interactions and (3) the topology of the interaction link patterns. Having calculated the number of all the interaction pairs for the 146 species, we found that the number of interaction pairs increases linearly along with the number of protein families without any particular deviation from species to species (Fig. 3a). This indicates that the size of a protein interaction network itself does not determine the topology 3238 of the evolving network. Rather, it is closely related to the presence and number of extreme hub families that are capable of continuous growth. 
As shown in Figure 3a, increasing one interaction node resulted in one additional interaction edge on average, regardless of the superkingdoms. Figure 3b shows that eukaryotic proteomes have distinctively higher γ values upon increasing the number of interaction pairs. This is due to the different complexity levels they have with a small number of large hub families. Figure 3c shows the degree of interaction for the most highly interacting protein family (G-proteins, c.37.1.8 SCOP family). The number of interacting partners of G-proteins is very high in eukarya. This is a major distinction between eukarya and prokarya, where certain families exploded in the number of interaction partners as the overall interactome size increased. In contrast to the G-proteins, 2Fe-2S ferredoxin domains (d.15.4.2 SCOP family) did not show many extra interaction partners in eukarya, although the interactome size increased (Fig 3d). This is because, while the G-proteins grew rapidly in eukarya, the 2Fe-2S ferredoxin domains remained conservative in incorporating new interacting partners. In other words, these two families could be important components distinguishing eukarya from other superkingdoms. From a functional view, it is plausible that G-proteins have evolved to transduce signals and mediate multicellularity, resulting in a homogeneous environment in eukarya, while 2Fe-2S ferredoxin domains have evolved under the pressure of necessity for diverse metabolism in prokarya. Specifically, in the comparison of E.coli and S.cerevisiae, the number of interaction pairs is similar in size (1120 and 1097, respectively) even though they belong to different superkingdoms (see Supplementary Table 1). However, the value of γ representing connectivity distribution of the network was 2.58 in E.coli and only 2.03 in S.cerevisiae. This is because yeast has more hub families, and the hub families are larger despite the similar number of nodes and edges. 
We suggest that the difference between the interaction network topologies of E.coli and S.cerevisiae is responsible for the difference in their organismal complexity. One mechanism that could produce more interactions with the same number of interacting families is compartmentalization within cells: the same kind of structure can have different interacting partners if it is located in different compartments, without interfering with other similar molecular interactions. If we regard all cellular functions as part of information processing, this could be viewed as an information-processing optimization strategy in S.cerevisiae.

A simple method of characterizing an interactome is to calculate the degree (the total number of connections) of each interacting protein family. In the protein family interactomes, the degree (k) is the total number of connecting partners at the protein family level, i.e. the actual number of nearest neighbors of a node (protein family). (In physics this quantity is often called ‘connectivity’, a term that has a different meaning in graph theory; Dorogovtsev and Mendes, 2002; Fraser et al., 2002.) The degree distribution P(k), the distribution of degrees over the whole network, has been reported to follow a power law, P(k) ∼ k^−γ, in most biological networks (Fraser et al., 2002; Wagner, 2001). We found average γ values of 2.57 ± 0.11 for the 15 archaeal species and 2.55 ± 0.11 for the 122 bacterial species, whereas the average γ for the nine eukaryotic species was much lower, 2.08 ± 0.09. This means that protein family interactomes are scale-free (Jeong et al., 2001; Steffen et al., 2002) in all superkingdoms, with a distinction between eukarya and the rest (see Supplementary Table 1). Eukarya have more hub families (Barabasi and Oltvai, 2004) than archaea and bacteria. In eukarya, the hub families also have more interaction partners, although the criterion for selecting the members of a hub family is arguable. The highest degree ranges from 24 to 38 for eukarya, from 11 to 17 for bacteria and from 9 to 11 for archaea.

Eukarya generally have more multi-domain proteins than prokarya (Apic et al., 2001). The eukaryotic protein family interaction networks can be explained by many factors, including the presence of multi-domain proteins; however, multi-domain architecture alone cannot fully account for them. For example, Pseudomonas syringae (bacteria) and Saccharomyces cerevisiae (eukarya) have similar assignable proteome sizes of 2812 (52% of the total) and 2784 (45% of the total) proteins, and have 752 and 720 multi-domain proteins, respectively; however, their γ values differ (2.67 and 2.03). That is, S.cerevisiae has more hub protein families than P.syringae. Although multi-domain architecture contributes to the difference, it does not account for the whole architectural difference between the two types of interactome. Nor is the high number of interaction partners a simple function of proteome size. In Encephalitozoon cuniculi (a eukaryotic parasite; proteome size: 839 proteins; genome size: 2.9 Mb), the largest hub family has 24 interaction partners, while 80% of the other protein families have one or two interaction partners. Although its proteome is smaller than that of Escherichia coli (2338 proteins, 5.2 Mb), its main hub family has far more partners than that of E.coli (15 interaction partners). This implies that there is a fundamental difference between prokaryotic and eukaryotic interactomes.

Fig. 3. Correlation analysis among the characteristics of a protein family network. (a) Correlation between the number of protein families and the number of their interaction pairs in a species. The number of protein family interaction pairs increased linearly (slope: 1.35) with the number of protein families in each species across all three superkingdoms; no relation to the kind of species was observed. (b) The number of protein family interaction pairs in a species plotted against the γ value of its protein family interaction network, for all 146 species. The γ values of eukarya (1.96–2.24) are distinctively lower than those of archaea (2.37–2.75) and bacteria (2.16–2.74). (c and d) Two examples of protein families that show a positive correlation between the number of interaction partners of the family and the total number of protein families in the species. c.37.1.8 (G-proteins) showed a positive correlation especially in eukarya; d.15.4.2 (2Fe-2S ferredoxin domains) showed a positive correlation in all three superkingdoms.

Expansion of protein family networks

As discussed above, interactomes can reveal differences among species in terms of their network topologies. To analyze the evolution of protein family networks, we measured, across the 146 predicted interaction networks, the correlation between the interaction degree of each protein family and the total number of interaction pairs in the interactome. The correlation was calculated as Pearson’s correlation coefficient, with scores of 0.5 to 1 classed as positive correlation, −0.49 to 0.49 as no correlation and −1 to −0.5 as negative correlation (see Supplementary Table 4). We found that only a limited number of protein families (199 out of 1720) have positive correlations; that is, for most protein families the degree does not correlate positively with the total number of interaction pairs in the interactome. One example of a positive correlation is c.37.1.8 (Fig. 3c). Another is d.15.4.2, whose number of interaction partners increased in all species along with the number of protein family interaction pairs (Fig. 3d).
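The correlation-based classification of families described above can be sketched as follows. This is a minimal illustration, assuming that each family's degree and each species' total number of interaction pairs are available as parallel lists, with degree 0 where a family is absent from a species. The helper names and the data are hypothetical; the bands follow the text, and values in the small gaps between the stated bands (for example 0.495) are treated here as no correlation, which is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def classify(r):
    """Bands from the text: 0.5 to 1 positive, -1 to -0.5 negative, otherwise none."""
    if r >= 0.5:
        return "positive"
    if r <= -0.5:
        return "negative"
    return "none"

# Hypothetical data: total interaction pairs per species, and the degree of
# two families in the same species (0 would mean the family is absent).
total_pairs = [300, 600, 900, 1100, 1400]
family_degree = {
    "hub-like family": [3, 5, 9, 12, 20],
    "static family": [2, 2, 2, 2, 2],
}
for family, degrees in family_degree.items():
    r = pearson_r(degrees, total_pairs)
    print(family, round(r, 2), classify(r))
```

Counting how many families fall into the "positive" band is then a single pass over all families.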
d.15.4.2 is the ferredoxin family, made up of iron–sulfur proteins that mediate electron transfer in a range of metabolic reactions (Mason and Cammack, 1992; Otaka and Ooi, 1989). In chloroplasts, 2Fe-2S ferredoxins function as electron carriers in the photosynthetic electron transport chain and as electron donors to various cellular proteins (Gibney et al., 1996). From the correlation pattern analysis of other ferredoxin families, it is probable that protein family interactions related to ATP synthesis were central to life from very early on, and that they increased in number in a manner shared by all species.

CONCLUSION

We introduced an analysis protocol based on protein family interactions predicted by PSIMAP. Using this protocol, we identified a core network of 47 protein family interaction pairs shared by all 146 species. The families constituting the core network function in protein translation, ribosomal structure, DNA binding and ATP metabolism. The results are consistent with previous studies showing that all species share the same basic protein families and family interactions critical to cellular functions. We noted topological differences in the interactomes across species: the protein family networks of eukarya had more hub families than those of archaea and bacteria, which implies an architectural difference between prokaryotic and eukaryotic interactomes. The number of protein family interaction pairs increased linearly with the number of protein families, regardless of species. In this respect, we suggest that the increase in network size does not by itself determine the characteristics of an evolving network. Only a small number of protein families have a very large number of interaction partners, especially in eukarya. We suggest that big hub families continuously increase their number of interaction partners.
Therefore, the addition of new protein families to the evolving network is driven not by a random process acting uniformly on all protein families but by a selective process favoring special hub families. Recent empirical data on the dynamics of edge attachment in various growing networks support this mechanism (Eriksen and Hornquist, 2002; Jeong et al., 2003; Newman, 2001). Interestingly, this indicates that a very small number of special protein families (for example, the G-proteins, SCOP family c.37.1.8) play a key role in driving species diversification, especially in higher organisms. Our approach can detect an ‘interaction’ core for many species without expensive experiments. As the rate of structural assignment using the PDB rises, we expect to expand and eventually complete the core. At this stage, our finding confirms the known small core network.

ACKNOWLEDGEMENTS

The authors would like to thank their colleagues at the Biomatics laboratory, especially HwanHo Choi. Maryana Bhak corrected and edited this manuscript. S.L. would like to thank Juok Cho for helping with the statistical analysis. This project was supported by the Biogreen21 program of RDA, the KRIBB Research Initiative Program and grant R01-2004-000-10172-0 from KOSEF. We thank the reviewers for their constructive help.

Conflict of Interest: none declared.

REFERENCES

Alexandrov,N.N. and Go,N. (1994) Biological meaning, statistical significance and classification of local spatial similarities in nonhomologous proteins. Protein Sci., 3, 866–875.
Aloy,P. and Russell,R.B. (2002) Interrogating protein interaction networks through structural biology. Proc. Natl Acad. Sci. USA, 99, 5896–5901.
Altschul,S.F. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Apic,G. et al. (2001) An insight into domain combinations. Bioinformatics, 17 (Suppl. 1), S83–S89.
Aravind,L. et al. (2000) Lineage-specific loss and divergence of functionally linked genes in eukaryotes. Proc. Natl Acad. Sci. USA, 97, 11319–11324.
Barabasi,A.L. and Oltvai,Z.N. (2004) Network biology: understanding the cell’s functional organization. Nat. Rev. Genet., 5, 101–113.
Berman,H.M. et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
Bernal,A. et al. (2001) Genomes OnLine Database (GOLD): a monitor of genome projects world-wide. Nucleic Acids Res., 29, 126–127.
Bolser,D.M. and Park,J.H. (2003) Biological network evolution hypothesis applied to protein structural interactome. Genomics Inform., 1, 7–19.
Bolser,D. et al. (2003) Visualisation and graph-theoretic analysis of a large-scale protein structural interactome. BMC Bioinformatics, 4, 45.
Camon,E. et al. (2004) The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res., 32, D262–D266.
Chothia,C. (1992) Proteins. One thousand families for the molecular biologist. Nature, 357, 543–544.
Dandekar,T. et al. (1998) Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem. Sci., 23, 324–328.
Dorogovtsev,S.N. and Mendes,J.F. (2002) Evolution of networks. Adv. Phys., 51, 1079–1187.
Eichler,E.E. and Sankoff,D. (2003) Structural dynamics of eukaryotic chromosome evolution. Science, 301, 793–797.
Enright,A.J. et al. (1999) Protein interaction maps for complete genomes based on gene fusion events. Nature, 402, 86–90.
Eriksen,K.A. and Hornquist,M. (2002) Scale-free growing networks imply linear preferential attachment. Phys. Rev. E Stat. Nonlin. Soft Matter Phys., 65, 017102.
Fraser,H.B. et al. (2002) Evolutionary rate in the protein interaction network. Science, 296, 750–752.
Gibney,B.R. et al. (1996) Ferredoxin and ferredoxin-heme maquettes. Proc. Natl Acad. Sci. USA, 93, 15041–15046.
Gong,S. et al. (2005) PSIbase: a database of Protein Structural Interactome map (PSIMAP). Bioinformatics, 21, 2541–2543.
Harris,M.A. et al. (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res., 32, D258–D261.
Huynen,M.A. and Bork,P. (1998) Measuring genome evolution. Proc. Natl Acad. Sci. USA, 95, 5849–5856.
Jeong,H. et al. (2001) Lethality and centrality in protein networks. Nature, 411, 41–42.
Jeong,H. et al. (2003) Measuring preferential attachment for evolving networks. Europhys. Lett., 61, 567–572.
Koonin,E.V. (2000) How many genes can make a cell: the minimal-gene-set concept. Annu. Rev. Genomics Hum. Genet., 1, 99–116.
Marcotte,E.M. et al. (1999) Detecting protein function and protein–protein interactions from genome sequences. Science, 285, 751–753.
Mason,J.R. and Cammack,R. (1992) The electron-transport proteins of hydroxylating bacterial dioxygenases. Annu. Rev. Microbiol., 46, 277–305.
Moon,H.S. et al. (2005) Architecture of basic building blocks in protein and domain structural interaction networks. Bioinformatics, 21, 1479–1486.
Murzin,A.G. et al. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536–540.
Mushegian,A. (1999) The minimal genome concept. Curr. Opin. Genet. Dev., 9, 709–714.
Nancy,A.M. et al. (1995) Faster evolutionary rates in endosymbiotic bacteria than in cospeciating insect hosts. J. Mol. Evol., 41, 727–731.
Newman,M.E. (2001) Clustering and preferential attachment in growing networks. Phys. Rev. E Stat. Nonlin. Soft Matter Phys., 64, 025102.
Ng,S.K. et al. (2003) Integrative approach for computationally inferring protein domain interactions. Bioinformatics, 19, 923–929.
O’Brien,S.J. et al. (1999) The promise of comparative genomics in mammals. Science, 286, 458–462, 479–481.
Orengo,C.A. et al. (1994) Protein superfamilies and domain superfolds. Nature, 372, 631–634.
Otaka,E. and Ooi,T. (1989) Examination of protein sequence homologies: V. New perspectives on evolution between bacterial and chloroplast-type ferredoxins inferred from sequence evidence. J. Mol. Evol., 29, 246–254.
Overbeek,R. et al. (1999) The use of gene clusters to infer functional coupling. Proc. Natl Acad. Sci. USA, 96, 2896–2901.
Park,J. et al. (2001) Mapping protein family interactions: intramolecular and intermolecular protein family interaction repertoires in the PDB and yeast. J. Mol. Biol., 307, 929–938.
Pellegrini,M. et al. (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl Acad. Sci. USA, 96, 4285–4288.
Pruess,M. et al. (2003) The Proteome Analysis database: a tool for the in silico analysis of whole proteomes. Nucleic Acids Res., 31, 414–417.
Rubin,G.M. et al. (2000) Comparative genomics of the eukaryotes. Science, 287, 2204–2215.
Shendure,J. et al. (2004) Advanced sequencing technologies: methods and goals. Nat. Rev. Genet., 5, 335–344.
Steffen,M. et al. (2002) Automated modelling of signal transduction networks. BMC Bioinformatics, 3, 34.
Tan,S.H. et al. (2004) ADVICE: Automated Detection and Validation of Interaction by Co-Evolution. Nucleic Acids Res., 32, W69–W72.
Tatusov,R.L. et al. (1997) A genomic perspective on protein families. Science, 278, 631–637.
von Mering,C. et al. (2002) Comparative assessment of large-scale data sets of protein–protein interactions. Nature, 417, 399–403.
Wagner,A. (2001) The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol. Biol. Evol., 18, 1283–1292.
Wang,Z.X. (1996) How many fold types of protein are there in nature? Proteins, 26, 186–191.
Zhang,C.T. (1997) Relations of the numbers of protein sequences, families and folds. Protein Eng., 10, 757–761.


