Evaluating link prediction methods

Published: 01 December 2015

Abstract

Link prediction is a popular research area with important applications in a variety of disciplines, including biology, social science, security, and medicine. The fundamental requirement of link prediction is the accurate and effective prediction of new links in networks. While there are many different methods proposed for link prediction, we argue that the practical performance potential of these methods is often unknown because of challenges in the evaluation of link prediction, which impact the reliability and reproducibility of results. We describe these challenges, provide theoretical proofs and empirical examples demonstrating how current methods lead to questionable conclusions, show how the fallacy of these conclusions is illuminated by methods we propose, and develop recommendations for consistent, standard, and applicable evaluation metrics. We also recommend the use of precision-recall threshold curves and associated areas in lieu of receiver operating characteristic curves due to complications that arise from extreme imbalance in the link prediction classification problem.
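The abstract's preference for precision-recall curves over receiver operating characteristic curves under extreme imbalance can be illustrated with a small sketch. This is not the authors' experimental procedure: the data are synthetic Gaussian scores, and the helper names (`roc_auc`, `average_precision`, `simulate`) and all parameter values are assumptions chosen purely for illustration. It uses only the Python standard library.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive appears.

    A common summary of the precision-recall curve; ties in scores are
    broken by input order here, which is a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def simulate(n_pos, n_neg, shift, seed):
    """Hypothetical predictor: positives score N(shift, 1), negatives N(0, 1)."""
    rng = random.Random(seed)
    labels = [1] * n_pos + [0] * n_neg
    scores = [rng.gauss(shift, 1.0) for _ in range(n_pos)]
    scores += [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    return labels, scores


if __name__ == "__main__":
    # Same score distributions, balanced versus heavily imbalanced classes.
    for n_neg in (50, 50_000):
        labels, scores = simulate(n_pos=50, n_neg=n_neg, shift=2.0, seed=0)
        print(n_neg,
              round(roc_auc(labels, scores), 3),
              round(average_precision(labels, scores), 3))
```

Because the score distributions are identical in both runs, the ROC area should stay roughly the same as negatives are added, while average precision should fall sharply. That gap is the kind of effect that can make ROC-based comparisons look more favorable than the practical precision of the predicted links.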

Published In

Knowledge and Information Systems, Volume 45, Issue 3
December 2015
266 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Class imbalance
  2. Link prediction and Evaluation
  3. Sampling
  4. Temporal effects on link prediction
  5. Threshold curves

Qualifiers

  • Article
