research-article

Efficient top-k SimRank-based similarity join

Editors: Chen Li, Volker Markl

Authors: Guoliang Li

Proceedings of the VLDB Endowment, Volume 8, Issue 3

Pages 317 - 328

https://doi.org/10.14778/2735508.2735520

Published: 01 November 2014

Abstract

SimRank is a popular and widely adopted measure of the similarity between nodes in a graph. Computing the SimRank similarities for all pairs of nodes is expensive in both time and space, especially for large graphs. In real-world applications, users are interested only in the most similar pairs. To address this problem, we study the top-k SimRank-based similarity join problem, which finds the k pairs of nodes with the largest SimRank similarities among all possible pairs. To the best of our knowledge, this is the first attempt to address this problem. We encode each node as a vector by summarizing its neighbors, and transform the calculation of the SimRank similarity between two nodes into computing the dot product of the corresponding vectors. We devise an efficient two-step framework to compute the top-k similar pairs using the vectors. For large graphs, exact algorithms cannot meet the high-performance requirement, so we also devise an approximate algorithm that efficiently identifies the top-k similar pairs under a user-specified accuracy requirement. Experiments on both real and synthetic datasets show that our method achieves high performance and good scalability.
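To make the problem statement concrete, the sketch below computes SimRank with the standard iterative recurrence and then picks the k highest-scoring pairs by brute force. It is a naive baseline for illustration only, not the vector encoding or two-step framework described in the abstract. The function names `simrank` and `top_k_pairs`, the decay factor 0.8, the iteration count, and the toy graph are all assumptions made for this example.

```python
import heapq
from itertools import combinations


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank on a small graph (illustrative baseline).

    in_neighbors maps every node to the list of its in-neighbors.
    decay and iterations are assumed values, not taken from the paper.
    """
    nodes = sorted(in_neighbors)
    # Start with s(a, a) = 1 and s(a, b) = 0 for a != b.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new_sim = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[(a, b)] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # A node without in-neighbors is similar only to itself.
                    new_sim[(a, b)] = 0.0
                    continue
                # Average similarity over all pairs of in-neighbors, then decay.
                total = sum(sim[(i, j)] for i in ia for j in ib)
                new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


def top_k_pairs(sim, nodes, k):
    """Return the k distinct node pairs with the largest similarity."""
    scored = ((sim[(a, b)], a, b) for a, b in combinations(sorted(nodes), 2))
    return heapq.nlargest(k, scored)


# Hypothetical toy graph: root -> a, root -> b, a -> c, b -> d.
toy_graph = {"root": [], "a": ["root"], "b": ["root"], "c": ["a"], "d": ["b"]}
toy_top = top_k_pairs(simrank(toy_graph), toy_graph, k=2)
print(toy_top)
```

On this toy graph the two top pairs are (a, b) and (c, d), since a and b share their only in-neighbor and c and d inherit similarity from them. The all-pairs loop does work proportional to the product of in-degrees for every pair in every iteration, which is the cost that motivates a top-k formulation for large graphs.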

Published In

Proceedings of the VLDB Endowment, Volume 8, Issue 3

November 2014

144 pages

ISSN: 2150-8097

Editors: Chen Li (University of California, Irvine), Volker Markl (TU Berlin)

Publisher: VLDB Endowment