DOI: 10.1145/3514221.3517831
research-article

Faster and Better Solution to Embed Lp Metrics by Tree Metrics

Published: 11 June 2022

Abstract

Hierarchically Separated Trees (HSTs) are the most popular way to embed a metric space into a tree metric. Many optimization problems that are hard to approximate well on the original metric, e.g., task assignment, trip planning, and facility location planning, admit good approximation bounds once the metric is embedded into an HST. Existing work focuses on constructing HSTs for arbitrary metric spaces, so a general-purpose algorithm needs Ω(n^2) time to achieve the tight O(log n) distortion guarantee, where distortion is the prevalent measure of an HST's effectiveness and usability. However, we observe that (1) many applications of HSTs use only Lp metrics (e.g., Euclidean space), (2) the state-of-the-art solution is still too time-consuming to construct HSTs for large-scale data, and (3) the distortions of existing algorithms are satisfactory only for high-dimensional data. This motivates us to study the problem of Embedding Lp metrics through Tree metrics (ELT). We aim to design an algorithm that constructs HSTs in less than O(n^2) time, with not only an O(log n) distortion guarantee but also good and robust empirical distortion. Specifically, we first present a general divide-and-conquer framework and prove that it has an O(log n) distortion guarantee. To achieve a time complexity better than O(n^2), we then design two optimization techniques: reducing the construction to nearest neighbor search (via indexing) and sampling. Finally, extensive experiments demonstrate that our algorithm, DCsam, outperforms the state-of-the-art algorithms by a large margin in both distortion and running time.
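To make the notions of an HST and of distortion concrete, the following Python sketch builds a 2-HST over distinct points in Euclidean space with a randomly shifted quadtree, a classic divide-and-conquer construction, and then measures the distortion of the resulting tree. It is an illustration under stated assumptions, not the paper's divide-and-conquer framework or DCsam: the function names (`build_hst`, `tree_distance`, `distortion`), the single random shift, and the use of cell diameters as edge weights are all hypothetical choices for this example, and the known guarantees of quadtree embeddings depend on the dimension and aspect ratio of the data rather than matching the O(log n) bound the paper proves.

```python
import itertools
import math
import random


def build_hst(points, seed=0, max_depth=200):
    """Build a randomly shifted quadtree (a 2-HST) over distinct points in R^d.

    Returns (paths, sides): paths[p] lists the grid-cell ids of point p from
    level 0 down to the level where p is alone in its cell, and sides[k] is
    the side length of the level-k cells (halved at every level).
    """
    rng = random.Random(seed)
    d = len(points[0])
    lo = [min(p[i] for p in points) for i in range(d)]
    delta = max(max(p[i] for p in points) - lo[i] for i in range(d)) or 1.0
    # One random shift in [0, delta) per coordinate, shared by all levels.
    shift = [rng.random() * delta for _ in range(d)]
    paths = [[] for _ in points]
    sides = []

    def split(idx, level):
        # Divide and conquer: bucket the points by their cell at this level,
        # then recurse into every cell that still holds more than one point.
        if level > max_depth:
            raise ValueError("points could not be separated (duplicates?)")
        if len(sides) == level:
            sides.append(2 * delta / 2 ** level)
        side = sides[level]
        groups = {}
        for p in idx:
            cell = tuple(math.floor((points[p][i] - lo[i] + shift[i]) / side)
                         for i in range(d))
            paths[p].append(cell)
            groups.setdefault(cell, []).append(p)
        for group in groups.values():
            if len(group) > 1:
                split(group, level + 1)

    split(list(range(len(points))), 0)
    return paths, sides


def tree_distance(paths, sides, d, a, b):
    """Distance between the leaves of points a and b in the HST."""
    if a == b:
        return 0.0
    k = -1  # deepest level whose cell contains both a and b
    for cell_a, cell_b in zip(paths[a], paths[b]):
        if cell_a != cell_b:
            break
        k += 1
    side = sides[k] if k >= 0 else 2 * sides[0]  # k == -1: virtual root
    # Edges into level-j nodes weigh sqrt(d) * sides[j] (each leaf edge absorbs
    # the remaining geometric tail), so the a-b path costs 2 * sqrt(d) * side.
    # Both points lie in one cell of diameter sqrt(d) * side, so the tree
    # distance never underestimates the Euclidean distance.
    return 2 * math.sqrt(d) * side


def distortion(points, paths, sides):
    """Max and mean of tree / Euclidean distance over all pairs of points."""
    d = len(points[0])
    ratios = [tree_distance(paths, sides, d, a, b) / math.dist(points[a], points[b])
              for a, b in itertools.combinations(range(len(points)), 2)]
    return max(ratios), sum(ratios) / len(ratios)


if __name__ == "__main__":
    rng = random.Random(42)
    pts = [(rng.random(), rng.random()) for _ in range(200)]
    paths, sides = build_hst(pts, seed=7)
    print("max / mean distortion:", distortion(pts, paths, sides))
```

Because every pair of points shares a grid cell whose diameter bounds their Euclidean distance, the tree distance never underestimates the original one, so the maximum ratio is the worst-case distortion of this particular tree. Averaging the results over several seeds approximates the expected distortion by which probabilistic tree embeddings are typically judged. Note that this sketch computes all pairwise ratios only to evaluate the tree; the construction itself performs no pairwise distance computations.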


Cited By

  • (2025) DIMS: Distributed Index for Similarity Search in Metric Spaces. IEEE Transactions on Knowledge and Data Engineering 37, 1, 210-225. DOI: 10.1109/TKDE.2024.3487759. Online publication date: Jan 2025.
  • (2024) Dimensionality Reduction for Partial Label Learning: A Unified and Adaptive Approach. IEEE Transactions on Knowledge and Data Engineering 36, 8, 3765-3782. DOI: 10.1109/TKDE.2024.3367721. Online publication date: 20 Feb 2024.
  • (2024) HJG: An Effective Hierarchical Joint Graph for ANNS in Multi-Metric Spaces. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 4275-4287. DOI: 10.1109/ICDE60146.2024.00326. Online publication date: 13 May 2024.
  • (2023) LiteHST: A Tree Embedding based Method for Similarity Search. Proceedings of the ACM on Management of Data 1, 1, 1-26. DOI: 10.1145/3588715. Online publication date: 30 May 2023.

Information

Published In

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN:9781450392495
DOI:10.1145/3514221
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hierarchically separated tree
  2. metric embedding

Qualifiers

  • Research-article

Funding Sources

  • Hong Kong ITC ITF grants
  • Microsoft Research Asia Collaborative Research Grant
  • the Hong Kong RGC Theme-based project TRS
  • HKUST-Webank joint research lab grants
  • National Key Research and Development Program of China Grant
  • HKUST-NAVER/LINE AI Lab
  • HKUST Global Strategic Partnership Fund (2021 SJTU-HKUST)
  • the Hong Kong RGC CRF Project
  • Guangdong Basic and Applied Basic Research Foundation
  • the National Science Foundation of China (NSFC) under Grant
  • the Hong Kong RGC RIF Project
  • China NSFC
  • the State Key Laboratory of Software Development Environment Open Funding
  • the Hong Kong RGC GRF Project
  • the Hong Kong RGC AOE Project
  • Didi-HKUST joint research lab

Conference

SIGMOD/PODS '22

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics

  • Downloads (last 12 months): 54
  • Downloads (last 6 weeks): 5
Reflects downloads up to 15 Jan 2025
