Abstract
Reverse nearest neighbor (RNN) queries arise in many applications; they are derived from nearest neighbor (NN) queries but are more complex to process. Processing NN queries generally involves sophisticated data structures and methods, and the problem has been well addressed for low-dimensional data (usually fewer than 10 dimensions), whereas efficiently processing exact NN or RNN queries over high-dimensional data remains challenging. This paper proposes an algorithm for evaluating RNN queries in higher-dimensional lp spaces. The main idea is that an RNN query can be processed efficiently using relevant information that is readily available in, and retrievable from, memory. The data space containing a finite dataset is divided into multiple small regions that form an unbalanced multiway region tree; an index containing important information is then created from the tree and the sorted lists of tuples in the dataset. The algorithm consists of two pruning approaches and a verification method, all based on the index and the characteristics of lp spaces. Extensive experiments show that the algorithm performs well over various datasets and outperforms the existing state-of-the-art methods CSD, VR-RNN, SFT and TPL.
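For orientation, the query itself can be stated as a brute-force baseline. The sketch below is not the algorithm proposed in this paper; it only illustrates what an RNN query returns. The helper names `lp_dist` and `brute_force_rnn` are hypothetical, and the tie-handling (a tuple counts as an RNN of the query when no other tuple is strictly closer) is an assumption.

```python
import math

def lp_dist(a, b, p=2.0):
    """l_p distance between two equal-length points (p >= 1, or math.inf)."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

def brute_force_rnn(q, data, p=2.0):
    """Return every tuple t in data for which q is at least as close as any other tuple.

    Assumption: ties are resolved in favor of q. Cost is O(|data|^2) distance
    computations, which is what index-based methods try to avoid.
    """
    result = []
    for i, t in enumerate(data):
        d_q = lp_dist(t, q, p)
        # t is an RNN of q if no other tuple of data is strictly closer to t than q
        if all(lp_dist(t, u, p) >= d_q for j, u in enumerate(data) if j != i):
            result.append(t)
    return result

data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
print(brute_force_rnn((0.4, 0.0), data))  # [(0.0, 0.0), (1.0, 0.0)]
```

The two far tuples are each other's nearest neighbor, so the query point near the origin is not their nearest neighbor and they are excluded.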
References
Allheeib, N., Adhinugraha, K., Taniar, D., Islam, M.S.: Computing reverse nearest neighbourhood on road maps. World Wide Web 25, 99–130 (2022)
Blackard, J.A., Dean, D.J., Anderson, C.W.: UCI repository of machine learning databases (1998). http://archive.ics.uci.edu/ml/datasets/Covertype. Accessed 10 Aug 2022
Borodin, A., Ostrovsky, R., Rabani, Y.: Lower bounds for high dimensional nearest neighbor search and related problems. In: Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing (STOC 1999), pp. 312–321 (1999)
Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: a multidimensional workload-aware histogram. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (SIGMOD 2001), pp. 211–222 (2001)
Casanova, G., et al.: Dimensional testing for reverse k-nearest neighbor search. Proc. VLDB Endowment 10(7), 769–780 (2017)
Chahal, H., Toner, H., Rahkovsky, I.: Small data’s big AI potential. Center for Security and Emerging Technology (2021). https://cset.georgetown.edu/publication/small-datas-big-ai-potential/. Accessed 26 July 2022
Cheema, M.A., Zhang, W., Lin, X., Zhang, Y.: Efficiently processing snapshot and continuous reverse k nearest neighbors queries. VLDB J. 21(5), 703–728 (2012)
Das, R., Biswas, S.K., Devi, D., Sarma, B.: An oversampling technique by integrating reverse nearest neighbor in SMOTE: Reverse-SMOTE. In: 2020 International Conference on Smart Electronics and Communication (ICOSEC), pp. 1239–1244 (2020)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), 614–656 (2003)
Guo, Y.-R., Bai, Y.-Q., Li, C.-N., Shao, Y.-H., Ye, Y.-F., Jiang, C.-Z.: Reverse nearest neighbors Bhattacharyya bound linear discriminant analysis for multimodal classification. Eng. Appl. Artif. Intell. 97, 104033 (2021)
Har-Peled, S., Indyk, P., Motwani, R.: Approximate nearest neighbor: towards removing the curse of dimensionality. Theory Comput. 8(1), 321–350 (2012)
Hu, L., Liu, H., Zhang, J., Liu, A.: KR-DBSCAN: a density-based clustering algorithm based on reverse nearest neighbor and influence space. Expert Syst. Appl. 186, 115763 (2021)
Jin, P., et al.: Maximizing the influence of bichromatic reverse k nearest neighbors in geo-social networks. World Wide Web 26(4), 1567–1598 (2023)
Khedr, A.M., Raj, P.V.P.: DRNNA: decomposable reverse nearest neighbor algorithm for vertically distributed databases. In: 2021 18th International Multi-Conference on Systems, Signals & Devices (SSD), pp. 681–686 (2021)
Korn, F., Muthukrishnan, S.: Influence sets based on reverse nearest neighbor queries. ACM SIGMOD Rec. 29(2), 201–212 (2000)
Li, Y., Liu, G., Bai, M., Gao, J., Ye, L., Ming, Z.: CSD: Discriminance with conic section for improving reverse k nearest neighbors queries. arXiv:2005.08483 (2020)
Panetta, K.: Gartner top 10 data and analytics trends for 2021 (2021). https://www.gartner.com/smarterwithgartner/gartner-top-10-data-and-analytics-trends-for-2021. Accessed 15 July 2022
Sharifzadeh, M., Shahabi, C.: VoR-tree: R-trees with Voronoi diagrams for efficient processing of spatial nearest neighbor queries. Proc. VLDB Endowment 3(1–2), 1231–1242 (2010)
Singh, A., Ferhatosmanoğlu, H., Tosun, A.Ş.: High dimensional reverse nearest neighbor queries. In: Proceedings of the Twelfth International Conference on Information and Knowledge Management (CIKM 2003), pp. 91–98 (2003)
Singh, V., Singh, A.K.: SIMP: accurate and efficient near neighbor search in high dimensional spaces. In: Proceedings of the 15th International Conference on Extending Database Technology (EDBT 2012), pp. 492–503 (2012)
Stanoi, I., Agrawal, D., Abbadi, A.E.: Reverse nearest neighbor queries for dynamic databases. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 44–53 (2000)
Tao, Y., Papadias, D., Lian, X., Xiao, X.: Multi-dimensional reverse kNN search. VLDB J. 16(3), 293–316 (2007)
U.S. Census Bureau. https://www2.census.gov/geo/tiger/TGRGDB21/. Accessed 24 July 2022
Wang, S., Zhang, Y., Lin, X., Cheema, M.A.: Maximize spatial influence of facility bundle considering reverse k nearest neighbors. In: Database Systems for Advanced Applications, DASFAA 2018, pp. 684–700 (2018)
Wu, W., Yang, F., Chan, C.-Y., Tan, K.-L.: FINCH: Evaluating reverse k-nearest-neighbor queries on location data. Proc. VLDB Endowment 1(1), 1056–1067 (2008)
Yang, S., Cheema, M.A., Lin, X., Zhang, Y., Zhang, W.: Reverse k nearest neighbors queries and spatial reverse top-k queries. VLDB J. 26(2), 151–176 (2017)
Zheng, B., Zhao, X., Weng, L., Hung, N.Q.V., Liu, H., Jensen, C.S.: PM-LSH: A fast and accurate LSH framework for high-dimensional approximate NN search. Proc. VLDB Endowment 13(5), 643–655 (2020)
Appendix: Proofs for Lemmas
Lemma 1.
Consider the dataset \(R\) and its UR-tree index with the region set \(\{P_i : i = 0, 1, \cdots, s\}\). For a query point \(Q\), a region \(P_i\) and \(R_i = R \cap P_i\) \((0 \le i \le s)\), if \(d(c, Q) > r_{\max} + d_{\max}\), then \(\forall t \in R_i\), \(t \notin RNN(Q)\), where \(c\) is the center-point of \(P_i\), \(r_{\max} = d(c, y)\), \(d_{\max} = \max\{d(t, t_{NN}) \mid t \in R_i\}\), and \(y = (y_1, \cdots, y_n)\) is the max-point of \(P_i\).
Proof of Lemma 1:
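Let \(Q\) be a query point, \(P_i\) a region and \(R_i = R \cap P_i\), and suppose, as shown in Fig. A, that \(d(c, Q) > r_{\max} + d_{\max} = d(c, y) + \max\{d(t, t_{NN}) \mid t \in R_i\}\), where \(c\) is the center-point of \(P_i\) and \(y = (y_1, \cdots, y_n)\) is the max-point of \(P_i\).
For arbitrary \(t \in R_i\), according to the triangle inequality of the distance function, we have
\[ d(c, Q) \le d(c, t) + d(t, Q). \]
Since \(d(c, Q) > r_{\max} + d_{\max}\), we have
\[ d(t, Q) \ge d(c, Q) - d(c, t) > r_{\max} + d_{\max} - d(c, t). \]
Moreover, \(d(c, t) \le d(c, y) = r_{\max}\) as \(y\) is the max-point of \(P_i\). Therefore,
\[ d(t, Q) > r_{\max} + d_{\max} - r_{\max} = d_{\max} \ge d(t, t_{NN}). \]
Thus the nearest neighbor \(t_{NN}\) of \(t\) in the dataset is strictly closer to \(t\) than \(Q\) is, so the query point \(Q\) is not the nearest neighbor of tuple \(t\), that is, \(t \notin RNN(Q)\).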
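The pruning test of Lemma 1 can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: it assumes each region is an axis-aligned box whose max-point is its upper corner, and the names `lp_dist`, `max_nn_dist` and `region_can_be_pruned` are hypothetical. In an index, \(r_{\max}\) and \(d_{\max}\) would presumably be precomputed per region rather than derived at query time as done here.

```python
import math

def lp_dist(a, b, p=2.0):
    """l_p distance between two equal-length points (p >= 1, or math.inf)."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

def max_nn_dist(region_idx, data, p=2.0):
    """d_max: largest distance from a tuple of the region to its nearest neighbor in data."""
    return max(
        min(lp_dist(data[i], u, p) for j, u in enumerate(data) if j != i)
        for i in region_idx
    )

def region_can_be_pruned(q, center, max_point, d_max, p=2.0):
    """Lemma 1 test: True if d(c, Q) > r_max + d_max, so no tuple of the region is in RNN(Q)."""
    r_max = lp_dist(center, max_point, p)
    return lp_dist(center, q, p) > r_max + d_max

# Hypothetical data: two tuples inside the box [0, 1] x [0, 1] and one outside it.
data = [(0.2, 0.2), (0.8, 0.7), (4.0, 4.0)]
region_idx = [0, 1]
center, max_point = (0.5, 0.5), (1.0, 1.0)
d_max = max_nn_dist(region_idx, data)

print(region_can_be_pruned((3.0, 3.0), center, max_point, d_max))  # True
print(region_can_be_pruned((1.2, 1.2), center, max_point, d_max))  # False
```

A result of `False` does not mean the region contains an RNN of the query; it only means this test cannot exclude the region, so its tuples still need further pruning or verification.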
We restate Lemma 2 that summarizes some characteristics of lp spaces, and then we prove it.
Lemma 2.
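Let \(x = (x_1, \cdots, x_n)\) and \(y = (y_1, \cdots, y_n)\) be two arbitrary points in \(\mathbb{R}^n\), and let \(\sigma > 0\) be a constant. Then:
(1°) \(\|x\|_p \le n^{1/p - 1/q}\,\|x\|_q\), for \(1 \le p < q\).
(2°) \(\|x\|_\infty \le \|x\|_p \le n^{1/p}\,\|x\|_\infty\), for \(1 \le p < \infty\).
(3°) \(d_p(x, y) > \sigma\) if \(|x_i - y_i| > \sigma\) for some \(1 \le i \le n\), where the distance function \(d_p(\cdot, \cdot)\) is induced by \(\|\cdot\|_p\).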
We will present the proof of (1°) by using Hölder’s inequality.
Hölder’s Inequality:
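Assume that \(r\) and \(s\) are in the open interval \((1, \infty)\) with \(1/r + 1/s = 1\). Then, for arbitrary \(a = (a_1, a_2, \cdots, a_n)\), \(b = (b_1, b_2, \cdots, b_n) \in \mathbb{R}^n\),
\[ \sum_{i=1}^{n} |a_i b_i| \le \Big(\sum_{i=1}^{n} |a_i|^r\Big)^{1/r} \Big(\sum_{i=1}^{n} |b_i|^s\Big)^{1/s}, \]
that is,
\[ \|ab\|_1 \le \|a\|_r \, \|b\|_s, \]
where \(ab = (a_1 b_1, \cdots, a_n b_n)\) denotes the componentwise product.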
Proof of Lemma 2:
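(1°) Let \(s = q/p\) and \(r = q/(q - p)\). Then \(s > 1\), \(r > 1\) and \(1/r + 1/s = 1\) since \(1 \le p < q\). Let \(a_i = 1\) and \(b_i = |x_i|^p\), \(i = 1, \cdots, n\). By Hölder's inequality, we have
\[ \sum_{i=1}^{n} |x_i|^p = \sum_{i=1}^{n} |a_i b_i| \le \Big(\sum_{i=1}^{n} 1^r\Big)^{1/r} \Big(\sum_{i=1}^{n} |x_i|^{ps}\Big)^{1/s} = n^{(q-p)/q} \Big(\sum_{i=1}^{n} |x_i|^q\Big)^{p/q}. \]
Then, raising both sides to the power \(1/p\),
\[ \Big(\sum_{i=1}^{n} |x_i|^p\Big)^{1/p} \le n^{(q-p)/(pq)} \Big(\sum_{i=1}^{n} |x_i|^q\Big)^{1/q} = n^{1/p - 1/q} \Big(\sum_{i=1}^{n} |x_i|^q\Big)^{1/q}. \]
That is, \(\|x\|_p \le n^{1/p - 1/q}\,\|x\|_q\) by the definition \(\|x\|_p = \big(\sum_{i=1}^{n} |x_i|^p\big)^{1/p}\).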
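(2°) By the definition of the lp-norm \(\|\cdot\|_p\), (1°), and \(\|x\|_q \to \|x\|_\infty\) as \(q \to \infty\), we have
\[ \|x\|_\infty = \max_{1 \le i \le n} |x_i| = \Big(\max_{1 \le i \le n} |x_i|^p\Big)^{1/p} \le \Big(\sum_{i=1}^{n} |x_i|^p\Big)^{1/p} = \|x\|_p \le \lim_{q \to \infty} n^{1/p - 1/q}\,\|x\|_q = n^{1/p}\,\|x\|_\infty. \]
That is,
\[ \|x\|_\infty \le \|x\|_p \le n^{1/p}\,\|x\|_\infty. \]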
(3°) If |xi − yi| > σ for some 1 ≤ i ≤ n, then ||x − y||∞ ≥ |xi − yi| > σ. Thus, dp(x, y) = ||x − y||p ≥ ||x − y||∞ > σ by using (2°).
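The three parts of Lemma 2 are easy to check numerically on concrete vectors. The following sketch is only a sanity check, not part of the proof; the function names `lp_norm` and `check_lemma2` and the tolerance `tol` (added to absorb floating-point rounding in parts (1°) and (2°)) are assumptions introduced here.

```python
import math

def lp_norm(x, p):
    """l_p norm of a vector (p >= 1, or math.inf for the max norm)."""
    if math.isinf(p):
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def check_lemma2(x, y, p, q, sigma, tol=1e-9):
    """Evaluate parts (1), (2), (3) of Lemma 2 for 1 <= p < q < infinity.

    Returns a tuple of three booleans, one per part.
    """
    n = len(x)
    # (1) ||x||_p <= n^(1/p - 1/q) * ||x||_q
    part1 = lp_norm(x, p) <= n ** (1.0 / p - 1.0 / q) * lp_norm(x, q) + tol
    # (2) ||x||_inf <= ||x||_p <= n^(1/p) * ||x||_inf
    part2 = (lp_norm(x, math.inf) <= lp_norm(x, p) + tol
             and lp_norm(x, p) <= n ** (1.0 / p) * lp_norm(x, math.inf) + tol)
    # (3) if some coordinate differs by more than sigma, then d_p(x, y) > sigma
    diff = [a - b for a, b in zip(x, y)]
    exceeds = any(abs(d) > sigma for d in diff)
    part3 = (not exceeds) or lp_norm(diff, p) > sigma
    return part1, part2, part3

print(check_lemma2((3.0, -4.0, 0.0), (0.0, 0.0, 0.0), p=1, q=2, sigma=3.5))
# (True, True, True)
```

Part (3°) is what makes per-coordinate pruning possible: a single coordinate gap larger than σ already rules out a point without computing the full lp distance.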
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhu, L., Zhang, S., Song, X., Ma, Q., Meng, W. (2023). Processing Reverse Nearest Neighbor Queries Based on Unbalanced Multiway Region Tree Index. In: Zhang, F., Wang, H., Barhamgi, M., Chen, L., Zhou, R. (eds) Web Information Systems Engineering – WISE 2023. WISE 2023. Lecture Notes in Computer Science, vol 14306. Springer, Singapore. https://doi.org/10.1007/978-981-99-7254-8_57
DOI: https://doi.org/10.1007/978-981-99-7254-8_57
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7253-1
Online ISBN: 978-981-99-7254-8
eBook Packages: Computer Science, Computer Science (R0)