research-article

Closest Pairs Search Over Data Stream

Published: 13 November 2023
  • Editorial Notes

    The authors have requested minor, non-substantive changes to the VoR and, in accordance with ACM policies, a Corrected VoR was published on April 15, 2024. For reference purposes the VoR may still be accessed via the Supplemental Material section on this page.

    Abstract

    k-closest pair (KCP for short) search is a fundamental problem in database research. Given a set S of d-dimensional streaming data, KCP search aims to retrieve the k pairs with the shortest distances between them. While existing works have studied the continuous 1-closest pair query (i.e., k=1) over dynamic data environments, which allow for object insertions and deletions, they incur high computational costs and cannot easily support KCP search with k>1. This paper investigates the problem of KCP search over data stream, aiming to incrementally maintain as few pairs as possible to support KCP search with arbitrary k. To achieve this, we introduce the concept of NNS (short for Nearest Neighbour pair-Set), which consists of all the nearest neighbour pairs and allows us to support KCP search by accessing only O(k) objects. We further observe that in most cases, we only need a small portion of NNS to answer KCP search, as typically k ≪ n. Based on this observation, we propose TNNS (short for Threshold-based NN pair Set), which contains a small number of high-quality NN pairs, and a partition named τ-DLBP (short for τ-Distance Lower-Bound based Partition) to organize objects, with τ being an integer significantly smaller than n. τ-DLBP organizes objects using up to O(log n / τ) partitions and is able to support the construction and update of TNNS efficiently.
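
    To make the problem statement concrete, the following is a minimal brute-force sketch of KCP search and of a nearest-neighbour pair set over a static list of points. It is not the paper's NNS, TNNS, or τ-DLBP method; the function names k_closest_pairs and nearest_neighbour_pairs, the use of Euclidean distance, and the (dist, i, j) tuple layout are assumptions made purely for illustration.

```python
import heapq
import math
from itertools import combinations


def k_closest_pairs(points, k):
    """Brute-force KCP: the k pairs with the smallest Euclidean distance.

    Returns a sorted list of (dist, i, j) tuples with i < j.
    Illustrative baseline only; it examines all O(n^2) pairs.
    """
    candidates = (
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    return heapq.nsmallest(k, candidates)


def nearest_neighbour_pairs(points):
    """For each object, pair it with its nearest neighbour.

    Mutual pairs are stored once, so the result holds at most n
    unordered pairs, returned as sorted (dist, i, j) tuples with i < j.
    """
    pairs = set()
    for i, p in enumerate(points):
        dist, j = min(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        pairs.add((dist, min(i, j), max(i, j)))
    return sorted(pairs)


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (5, 0), (5, 0.5), (9, 9)]
    print(k_closest_pairs(pts, 2))
    print(nearest_neighbour_pairs(pts))
```

    One property that can be stated with certainty is that the single closest pair (k=1) always appears in the nearest-neighbour pair set, since each endpoint of that pair is the other's nearest neighbour up to ties. How the paper answers queries with larger k from NNS is not described in the abstract and is not reproduced here.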

    Supplemental Material

    PDF File - 3617326-VoR
    Version of Record for "Closest Pairs Search Over Data Stream" by Zhu et al., Proceedings of the ACM on Management of Data, Vol 1, No. 3 (PACMMOD).

    Information & Contributors

    Information

    Published In

    Proceedings of the ACM on Management of Data, Volume 1, Issue 3
    PACMMOD
    September 2023
    472 pages
    EISSN:2836-6573
    DOI:10.1145/3632968
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cube
    2. k-closest pair search
    3. partition
    4. streaming data

    Qualifiers

    • Research-article

    Funding Sources

    • National Key Research and Development Program of China
    • National Natural Science Foundation of China
    • National Natural Science Foundation of Liao Ning

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 211
    • Downloads (last 6 weeks): 39
    Reflects downloads up to 10 Aug 2024.

    Citations

    Cited By

    • (2024) SWISP: Distributed Convoy Mining via Sliding Window-based Indexing and Sub-track Partitioning. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 4518-4531. DOI: 10.1109/ICDE60146.2024.00344. Online publication date: 13-May-2024.
    • (2024) Multiple Continuous Top-K Queries Over Data Stream. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 1575-1588. DOI: 10.1109/ICDE60146.2024.00129. Online publication date: 13-May-2024.
    • (2024) TSec: An Efficient and Effective Framework for Time Series Classification. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 1394-1406. DOI: 10.1109/ICDE60146.2024.00115. Online publication date: 13-May-2024.
    • (2024) Classic distance join queries using compact data structures. Information Sciences: an International Journal, 674:C. DOI: 10.1016/j.ins.2024.120732. Online publication date: 18-Jul-2024.
