research-article

Fast top-k similarity join for SimRank

Published: 01 March 2017

Abstract

SimRank is a well-studied measure of similarity between two nodes in a network. However, computing SimRank for all pairs of nodes in a network is not only time-consuming but also impractical, since in many real-world applications users are interested only in the most similar pairs. This paper focuses on the top-k similarity join based on SimRank. We first present an incremental algorithm for computing SimRank. On top of that, we derive an iterative batch pruning framework that filters out unpromising nodes iteration by iteration and obtains the top-k pairs quickly. Specifically, we define the concept of a super node: for a node in the network, its SimRank with its super node is not less than its SimRank with any other node. Based on this property, we propose a tight upper bound for each node that can be easily calculated after each iteration. Experiments on both real-life and synthetic datasets demonstrate that our method achieves better performance and scalability than the state-of-the-art solution.
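
To make the problem setting concrete, the following is a minimal sketch of the brute-force baseline that a top-k SimRank join tries to avoid: compute SimRank for every pair of nodes with the standard recurrence of Jeh and Widom, then keep the k highest-scoring pairs. It is not the incremental algorithm or the pruning framework described in the abstract. The function names `simrank` and `top_k_pairs`, the decay factor `c=0.6`, the iteration count, and the toy graph are all assumptions chosen for illustration.

```python
import heapq
from itertools import combinations


def simrank(in_neighbors, c=0.6, iterations=10):
    """Naive all-pairs SimRank using the standard recurrence.

    in_neighbors maps each node to the list of nodes that have an edge into it.
    c is the decay factor (an assumed value, not taken from the paper).
    """
    nodes = list(in_neighbors)
    # Initial scores: a node is fully similar to itself, not at all to others.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new_sim = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[(a, b)] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # A node without in-neighbors has score 0 with every other node.
                    new_sim[(a, b)] = 0.0
                    continue
                # Average similarity over all pairs of in-neighbors, scaled by c.
                total = sum(sim[(x, y)] for x in ia for y in ib)
                new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


def top_k_pairs(sim, nodes, k):
    """Return the k distinct-node pairs with the highest scores as (score, a, b)."""
    pairs = ((sim[(a, b)], a, b) for a, b in combinations(nodes, 2))
    return heapq.nlargest(k, pairs)


# Hypothetical toy graph: u points to a, b and c; v points to c.
graph = {"u": [], "v": [], "a": ["u"], "b": ["u"], "c": ["u", "v"]}
scores = simrank(graph)
best = top_k_pairs(scores, list(graph), k=1)
print(best)
```

In this toy graph, `a` and `b` share their only in-neighbor, so they form the most similar pair. The baseline scores every pair on every iteration, which is exactly the cost that motivates pruning unpromising nodes early.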




    Published In

    Information Sciences: an International Journal, Volume 381, Issue C
    March 2017
    371 pages

    Publisher

    Elsevier Science Inc., United States


    Cited By

    • (2023) Efficient and Accurate SimRank-Based Similarity Joins: Experiments, Analysis, and Improvement. Proceedings of the VLDB Endowment 17:4 (617-629). DOI: 10.14778/3636218.3636219. Online publication date: 1-Dec-2023
    • (2023) Efficient Single-Source SimRank Query by Path Aggregation. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (3342-3352). DOI: 10.1145/3580305.3599328. Online publication date: 6-Aug-2023
    • (2022) Personalized query techniques in graphs. Information Sciences: an International Journal 607:C (961-1000). DOI: 10.1016/j.ins.2022.06.023. Online publication date: 1-Aug-2022
    • (2022) Privacy preserving similarity joins using MapReduce. Information Sciences: an International Journal 493:C (20-33). DOI: 10.1016/j.ins.2019.03.035. Online publication date: 20-Apr-2022
    • (2020) Multiple-user closest keyword-set querying in road networks. Information Sciences: an International Journal 509:C (133-149). DOI: 10.1016/j.ins.2019.09.009. Online publication date: 1-Jan-2020
    • (2017) Partial sums-based P-Rank computation in information networks. Proceedings of the International Conference on Web Intelligence (1122-1130). DOI: 10.1145/3106426.3109447. Online publication date: 23-Aug-2017
