Processing Reverse Nearest Neighbor Queries Based on Unbalanced Multiway Region Tree Index

Zhu, Liang; Zhang, Shilan; Song, Xin; Ma, Qin; Meng, Weiyi

doi:10.1007/978-981-99-7254-8_57


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14306)

Included in the following conference series:

International Conference on Web Information Systems Engineering


Abstract

Reverse nearest neighbor (RNN) queries arise in many applications; they are derived from nearest neighbor (NN) queries but are more complex to process. Processing NN queries generally involves sophisticated data structures and methods, and has been well addressed for low-dimensional data (usually fewer than 10 dimensions), while efficiently processing exact NN or RNN queries for high-dimensional data remains a challenging problem. This paper proposes an algorithm for evaluating RNN queries in higher-dimensional l_p spaces. The main idea is that an RNN query can be processed efficiently using relevant information that is readily available and retrievable from memory. The data space containing a finite dataset is divided into multiple small regions that form an unbalanced multiway region tree; an index containing important information is then created from the tree and the sorted lists of tuples in the dataset. The algorithm consists of two pruning approaches and a verification method, all based on the index and the characteristics of l_p spaces. Extensive experiments over various datasets demonstrate the performance of the algorithm and show that it outperforms the existing state-of-the-art methods CSD, VR-RNN, SFT and TPL.



Author information

Authors and Affiliations

Hebei University, Baoding, 071002, Hebei, China
Liang Zhu, Shilan Zhang, Xin Song & Qin Ma
State University of New York at Binghamton, Binghamton, NY, 13902, USA
Weiyi Meng


Corresponding authors

Correspondence to Liang Zhu or Xin Song.

Editor information

Editors and Affiliations

Renmin University of China, Beijing, China
Feng Zhang
Victoria University, Footscray, VIC, Australia
Hua Wang
Qatar University, Doha, Qatar
Mahmoud Barhamgi
Swinburne University of Technology, Hawthorn, Australia
Lu Chen
Swinburne University of Technology, Hawthorn, Australia
Rui Zhou

Appendix: Proofs for Lemmas

Lemma 1.

Consider the dataset R and its UR-tree Index with the region set {P_i: i = 0, 1, ⋅⋅⋅, s}. For a query point Q, a region P_i and R_i = R ∩ P_i (0 ≤ i ≤ s), if d(c, Q) > r_max + d_max, then ∀t ∈ R_i, t ∉ RNN(Q), where c is the center-point of P_i, r_max = d(c, y), d_max = max{d(t, t_NN) | ∀t ∈ R_i}, and y = (y₁, ⋅⋅⋅, y_n) is the max-point of P_i.

Proof of Lemma 1:

For a query point Q, a region P_i and R_i = R ∩ P_i, suppose that, as shown in Fig. A, d(c, Q) > r_max + d_max = d(c, y) + max{d(t, t_NN) | ∀t ∈ R_i}, where c is the center-point of P_i and y = (y₁, ⋅⋅⋅, y_n) is the max-point of P_i.

For arbitrary t ∈ R_i, according to the triangle inequality of distance function, we have

$$ \begin{aligned} d(\mathbf{c}, \mathbf{t}) + d(\mathbf{t}, Q) &\ge d(\mathbf{c}, Q) \\ d(\mathbf{t}, Q) &\ge d(\mathbf{c}, Q) - d(\mathbf{c}, \mathbf{t}) \end{aligned} $$

Since d(c, Q) > r_max + d_max, we have

$$ \begin{aligned} d(\mathbf{t}, Q) &> r_{max} + d_{max} - d(\mathbf{c}, \mathbf{t}) \\ d(\mathbf{t}, Q) - d_{max} &> r_{max} - d(\mathbf{c}, \mathbf{t}) \end{aligned} $$

Moreover, d(c, t) ≤ d(c, y) = r_max as y is the max-point of P_i. Therefore,

$$ \begin{aligned} d(\mathbf{t}, Q) - d_{max} &> 0 \\ d(\mathbf{t}, Q) &> d_{max} = \max\{ d(\mathbf{t}, \mathbf{t}_{NN}) \mid \forall \mathbf{t} \in \mathbf{R}_{i} \} \ge d(\mathbf{t}, \mathbf{t}_{NN}) \end{aligned} $$

Thus, query point Q is not the nearest neighbor of tuple t, that is, t ∉ RNN(Q).
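The pruning test of Lemma 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the region P_i is assumed to be an axis-aligned box given by hypothetical corner points `low` and `high` (with `high` playing the role of the max-point y), nearest-neighbor distances are computed by brute force rather than read from the UR-tree index, and all function names are invented for this sketch.

```python
import math
import random


def lp_dist(a, b, p=2.0):
    """Distance induced by the l_p norm (p = math.inf gives the max norm)."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def nn_dist(t, data, p=2.0):
    """d(t, t_NN): distance from t to its nearest neighbor among the other tuples."""
    return min(lp_dist(t, u, p) for u in data if u is not t)


def can_prune_region(low, high, members, data, q, p=2.0):
    """Lemma 1 test: True if d(c, Q) > r_max + d_max, so no member is in RNN(Q)."""
    # Assumption: the region is the box [low, high]; c is its center-point.
    c = [(lo + hi) / 2.0 for lo, hi in zip(low, high)]
    r_max = lp_dist(c, high, p)  # high stands in for the max-point y
    d_max = max(nn_dist(t, data, p) for t in members)
    return lp_dist(c, q, p) > r_max + d_max


def brute_force_rnn(q, data, p=2.0):
    """Reference answer: tuples that are at least as close to Q as to their own NN."""
    return [t for t in data if lp_dist(t, q, p) <= nn_dist(t, data, p)]


# Small demonstration on random 2-d data (hypothetical values).
random.seed(0)
data = [(random.random(), random.random()) for _ in range(200)]
low, high = (0.0, 0.0), (0.5, 0.5)
members = [t for t in data if all(l <= x <= h for x, l, h in zip(t, low, high))]
q = (3.0, 3.0)
if members and can_prune_region(low, high, members, data, q):
    rnn = brute_force_rnn(q, data)
    # Whenever the lemma fires, the brute-force result contains no region member.
    print(any(t in rnn for t in members))
```

The test is one-directional: when `can_prune_region` returns False, the tuples in the region may or may not belong to RNN(Q) and still need further pruning or verification.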

We restate Lemma 2 that summarizes some characteristics of l_p spaces, and then we prove it.

Lemma 2.

Let x = (x₁, ⋅⋅⋅, x_n), y = (y₁, ⋅⋅⋅, y_n) ∈ ℜⁿ be two arbitrary points, and let σ > 0 be a constant. Then:

(1°) ||x||_p ≤ n^(1/p−1/q) ||x||_q, for 1 ≤ p < q.

(2°) ||x||_∞ ≤ ||x||_p ≤ n^(1/p)||x||_∞, for 1 ≤ p < ∞.

(3°) d_p(x, y) > σ if |x_i − y_i|>σ for some 1 ≤ i ≤ n, where the distance function d_p(⋅,⋅) is induced by ||⋅||_p.

We will present the proof of (1°) by using Hölder’s inequality.

Hölder’s Inequality:

Assume that r and s are in the open interval (1, ∞) with 1/r + 1/s = 1. Then, for arbitrary a = (a₁, a₂, ⋅⋅⋅, a_n), b = (b₁, b₂, ⋅⋅⋅, b_n) ∈ ℜⁿ,

$$ \|\mathbf{a}\mathbf{b}\|_{1} \le \|\mathbf{a}\|_{r} \, \|\mathbf{b}\|_{s} $$

that is,

$$ \sum_{i=1}^{n} |a_{i} b_{i}| \le \left( \sum_{i=1}^{n} |a_{i}|^{r} \right)^{1/r} \left( \sum_{i=1}^{n} |b_{i}|^{s} \right)^{1/s} $$

Proof of Lemma 2:

(1°) Let s = q/p and r = q/(q − p). Then s > 1, r > 1 and 1/r + 1/s = 1 since 1 ≤ p < q. Let a_i = 1 and b_i = |x_i|^p, i = 1, ⋅⋅⋅, n. By Hölder’s inequality, we have

$$ \begin{aligned} \sum_{i=1}^{n} |x_{i}|^{p} &= \sum_{i=1}^{n} \left( 1 \cdot |x_{i}|^{p} \right) = \sum_{i=1}^{n} |a_{i} b_{i}| \le \left( \sum_{i=1}^{n} |a_{i}|^{r} \right)^{1/r} \left( \sum_{i=1}^{n} |b_{i}|^{s} \right)^{1/s} \\ &= n^{1/r} \left( \sum_{i=1}^{n} |b_{i}|^{s} \right)^{1/s} = n^{(q-p)/q} \left( \sum_{i=1}^{n} |x_{i}|^{ps} \right)^{1/s} = n^{(q-p)/q} \left( \sum_{i=1}^{n} |x_{i}|^{q} \right)^{p/q} \end{aligned} $$

Then,

$$ \begin{aligned} \left( \sum_{i=1}^{n} |x_{i}|^{p} \right)^{1/p} &\le \left( n^{(q-p)/q} \left( \sum_{i=1}^{n} |x_{i}|^{q} \right)^{p/q} \right)^{1/p} = n^{(q-p)/(qp)} \left( \sum_{i=1}^{n} |x_{i}|^{q} \right)^{1/q} \\ &= n^{1/p - 1/q} \left( \sum_{i=1}^{n} |x_{i}|^{q} \right)^{1/q} = n^{1/p - 1/q} \, \|\mathbf{x}\|_{q} \end{aligned} $$

That is, ||x||_p ≤ n^(1/p−1/q)||x||_q by the definition ||x||_p = (∑_{i=1}^{n} |x_i|^p)^(1/p).
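As a quick sanity check of (1°), consider an illustrative vector that is not taken from the paper: x = (3, 4) with n = 2, p = 1 and q = 2.

$$ \|\mathbf{x}\|_{1} = 3 + 4 = 7, \qquad n^{1/p - 1/q} \, \|\mathbf{x}\|_{2} = 2^{1/2} \cdot \sqrt{3^{2} + 4^{2}} = 5\sqrt{2} \approx 7.071 $$

So 7 ≤ 5√2, as (1°) states. The bound is attained when all |x_i| are equal: for x = (1, 1), ||x||_1 = 2 and 2^(1/2) ||x||_2 = √2 · √2 = 2.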

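(2°) By the definition of the l_p-norm ||⋅||_p, we have ||x||_∞ ≤ ||x||_p, since max_i |x_i|^p ≤ ∑_{i=1}^{n} |x_i|^p. Letting q → ∞ in (1°), where ||x||_q → ||x||_∞, gives the upper bound. Hence we have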

$$ \|\mathbf{x}\|_{\infty} \le \|\mathbf{x}\|_{p} \quad \text{and} \quad \|\mathbf{x}\|_{p} \le n^{1/p} \, \|\mathbf{x}\|_{\infty} $$

That is,

$$ \|\mathbf{x}\|_{\infty} \le \|\mathbf{x}\|_{p} \le n^{1/p} \, \|\mathbf{x}\|_{\infty} \quad \text{for } 1 \le p < \infty $$

(3°) If |x_i − y_i| > σ for some 1 ≤ i ≤ n, then ||x − y||_∞ ≥ |x_i − y_i| > σ. Thus, d_p(x, y) = ||x − y||_p ≥ ||x − y||_∞ > σ by using (2°).
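The three parts of Lemma 2 can be checked numerically with a short Python sketch. It is offered as an illustration only: the function names are hypothetical, the tolerance `eps` is an assumption added to absorb floating-point rounding, and the coordinate filter shows one plausible way part (3°) could be used to discard a candidate without computing a full l_p distance.

```python
import math
import random


def lp_norm(x, p):
    """l_p norm of x; p = math.inf gives the max norm."""
    if math.isinf(p):
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def exceeds_by_one_coordinate(x, y, sigma):
    """Part (3) as a cheap filter: True means d_p(x, y) > sigma for every p >= 1."""
    return any(abs(a - b) > sigma for a, b in zip(x, y))


def check_lemma2(x, p, q, eps=1e-9):
    """Numerically check parts (1) and (2) for one vector x and 1 <= p < q."""
    n = len(x)
    part1 = lp_norm(x, p) <= n ** (1.0 / p - 1.0 / q) * lp_norm(x, q) + eps
    part2 = (lp_norm(x, math.inf) <= lp_norm(x, p) + eps
             and lp_norm(x, p) <= n ** (1.0 / p) * lp_norm(x, math.inf) + eps)
    return part1 and part2


# Random spot check over a few dimensions and norm pairs (hypothetical values).
random.seed(1)
for n in (2, 5, 20):
    x = [random.uniform(-1.0, 1.0) for _ in range(n)]
    for p, q in ((1, 2), (2, 3), (1.5, 10)):
        print(n, p, q, check_lemma2(x, p, q))
```

Note that the filter works in one direction only: a False result does not imply d_p(x, y) ≤ σ. For example, x = (0, 0), y = (1, 1) and σ = 1 give False, although d_2(x, y) = √2 > 1.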


Cite this paper

Zhu, L., Zhang, S., Song, X., Ma, Q., Meng, W. (2023). Processing Reverse Nearest Neighbor Queries Based on Unbalanced Multiway Region Tree Index. In: Zhang, F., Wang, H., Barhamgi, M., Chen, L., Zhou, R. (eds) Web Information Systems Engineering – WISE 2023. WISE 2023. Lecture Notes in Computer Science, vol 14306. Springer, Singapore. https://doi.org/10.1007/978-981-99-7254-8_57


DOI: https://doi.org/10.1007/978-981-99-7254-8_57
Published: 21 October 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7253-1
Online ISBN: 978-981-99-7254-8
eBook Packages: Computer Science, Computer Science (R0)
