Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
article

iDistance: An adaptive B+-tree based indexing method for nearest neighbor search

Published: 01 June 2005 Publication History

Abstract

In this article, we present an efficient B+-tree based indexing method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional metric space. iDistance partitions the data based on a space- or data-partitioning strategy, and selects a reference point for each partition. The data points in each partition are transformed into a single dimensional value based on their similarity with respect to the reference point. This allows the points to be indexed using a B+-tree structure and KNN search to be performed using one-dimensional range search. The choice of partition and reference points adapts the index structure to the data distribution.We conducted extensive experiments to evaluate the iDistance technique, and report results demonstrating its effectiveness. We also present a cost model for iDistance KNN search, which can be exploited in query optimization.

References

[1]
Aggarwal, C., Procopiuc, C., Wolf, J., Yu, P., and Park, J. 1999. Fast algorithm for projected clustering. In Proceedings of the ACM SIGMOD International Conference on Management of Data.]]
[2]
Arya, S., Mount, D., Netanyahu, N., Silverman, R., and Wu, A. 1994. An optimal algorithm for approximate nearest neighbor searching. In Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, 573--582.]]
[3]
Arya, S., Mount, D., Netanyahu, N., Silverman, R., and Wu, A. 1998. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM 45, 6, 891--923.]]
[4]
Berchtold, S., Böhm, C., Jagadish, H., Kriegel, H., and Sander, J. 2000. Independent quantization: An index compression technique for high-dimensional data spaces. In Proceedings of the International Conference on Data Engineering. 577--588.]]
[5]
Berchtold, S., Böhm, C., and Kriegel, H.-P. 1998a. The pyramid-technique: Towards breaking the curse of dimensionality. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 142--153.]]
[6]
Berchtold, S., Ertl, B., Keim, D., Kriegel, H.-P., and Seidl, T. 1998b. Fast nearest neighbor search in high-dimensional space. In Proceedings of the International Conference on Data Engineering. 209--218.]]
[7]
Berchtold, S., Keim, D., and Kriegel, H. 1996. The X-tree: An index structure for high-dimensional data. In Proceedings of the International Conference on Very Large Data Bases. 28--37.]]
[8]
Beyer, K., Goldstein, J., Ramakrishnan, R., and Shaft, U. 1999. When is nearest neighbors meaningful? In Proceedings of the International Conference on Database Theory.]]
[9]
Böhm, C., Berchtold, S., and Keim, D. 2001. Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Comput. Surv. 33, 3, 322--373.]]
[10]
Bozkaya, T. and Ozsoyoglu, M. 1997. Distance-based indexing for high-dimensional metric spaces. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 357--368.]]
[11]
Chakrabarti, K. and Mehrotra, S. 1999. The hybrid tree: An index structure for high dimensional feature spaces. In Proceedings of the International Conference on Data Engineering. 322--331.]]
[12]
Chakrabarti, K. and Mehrotra, S. 2000. Local dimensionality reduction: a new approach to indexing high dimensional spaces. In Proceedings of the International Conference on Very Large Databases. 89--100.]]
[13]
Ciaccia, P., Patella, M., and Zezula, P. 1997. M-trees: An efficient access method for similarity search in metric space. In Proceedings of the International Conference on Very Large Data Bases. 426--435.]]
[14]
Cui, B., Ooi, B. C., Su, J. W., and Tan, K. L. 2003. Contorting high dimensional data for efficient main memory processing. In Proceedings of the ACM SIGMOD Conference. 479--490.]]
[15]
Cui, B., Ooi, B. C., Su, J. W., and Tan, K. L. 2004. Indexing high-dimensional data for efficient in-memory similarity search. In IEEE Trans. Knowl. Data Eng. to appear.]]
[16]
Faloutsos, C. and Lin, K.-I. 1995. Fastmap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 163--174.]]
[17]
Filho, R. F. S., Traina, A., and Faloutsos, C. 2001. Similarity search without tears: The omni family of all-purpose access methods. In Proceedings of the International Conference on Data Engineering. 623--630.]]
[18]
Goldstein, J. and Ramakrishnan, R. 2000. Contrast plots and p-sphere trees: space vs. time in nearest neighbor searches. In Proceedings of the International Conference on Very Large Databases. 429--440.]]
[19]
Guha, S., Rastogi, R., and Shim, K. 1998. Cure: an efficient clustering algorithm for large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data.]]
[20]
Guttman, A. 1984. R-trees: A dynamic index structure for spatial searching. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 47--57.]]
[21]
Jagadish, H., Ooi, B. C., Tan, K.-L., Yu, C., and Zhang, R. 2004. iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. Tech. Rep. www.comp.nus.edu.sg/~ooibc, National University of Singapore.]]
[22]
Jolliffe, I. T. 1986. Principle Component Analysis. Springer-Verlag.]]
[23]
Katamaya, N. and Satoh, S. 1997. The SR-tree: An index structure for high-dimensional nearest neighbor queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data.]]
[24]
Koudas, N., Ooi, B. C., Tan, K.-L., and Zhang, R. 2004. Approximate NN queries on streams with guaranteed error/performance bounds. In Proceedings of the International Conference on Very Large Data Bases. 804--815.]]
[25]
Kruskal, J. B. 1956. On the shortest spanning subtree of a graph and the travelling salesman problem. In Proceedings of the American Mathematical Society 7, 48--50.]]
[26]
Lin, K., Jagadish, H., and Faloutsos, C. 1995. The TV-tree: An index structure for high-dimensional data. VLDB Journal 3, 4, 517--542.]]
[27]
MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. In Fifth Berkeley Symposium on Mathematical statistics and probability. University of California Press, 281--297.]]
[28]
Ooi, B. C., Tan, K. L., Yu, C., and Bressan, S. 2000. Indexing the edge: a simple and yet efficient approach to high-dimensional indexing. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 166--174.]]
[29]
Pagel, B.-U., Korn, F., and Faloutsos, C. 2000. Deflating the dimensionality curse using multiple fractal dimensions. In Proceedings of the International Conference on Data Engineering.]]
[30]
Ramakrishnan, R. and Gehrke, J. 2000. Database Management Systems. McGraw-Hill.]]
[31]
Sakurai, Y., Yoshikawa, M., and Uemura, S. 2000. The a-tree: An index structure for high-dimensional spaces using relative approximation. In Proceedings of the International Conference on Very Large Data Bases. 516--526.]]
[32]
Tao, Y., Faloutsos, C., and Papadias, D. 2003. The power-method: A comprehensive estimation technique for multi-dimensional queries. In Proceedings of the Conference on Information and Knowledge Management.]]
[33]
Traina, A., Seeger, B., and Faloutsos, C. 2000. Slim-trees: high performance metric trees minimizing overlap between nodes. In Advances in Database Technology---EDBT 2000, International Conference on Extending Database Technology, Konstanz, Germany, March 27--31, 2000, Proceedings. Lecture Notes in Computer Science, vol. 1777. Springer-Verlag, 51--65.]]
[34]
Weber, R., Schek, H., and Blott, S. 1998. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proceedings of the International Conference on Very Large Data Bases. 194--205.]]
[35]
White, D. and Jain, R. 1996. Similarity indexing with the SS-tree. In Proceedings of the International Conference on Data Engineering. 516--523.]]
[36]
Yu, C., Ooi, B. C., Tan, K. L., and Jagadish, H. 2001. Indexing the distance: an efficient method to knn processing. In Proceedings of the International Conference on Very Large Data Bases. 421--430.]]
[37]
Zhang, T., Ramakrishnan, R., and Livny, M. 1996. Birch: an efficient data clustering method for very large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data.]]

Cited By

View all
  • (2025)DIMS: Distributed Index for Similarity Search in Metric SpacesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.348775937:1(210-225)Online publication date: Jan-2025
  • (2025)Learning Road Network Index Structure for Efficient Map MatchingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.348519537:1(423-437)Online publication date: 1-Jan-2025
  • (2025)Geospatial indexing for sea–land navigation based on machine learningComputers and Electrical Engineering10.1016/j.compeleceng.2024.109433118:PBOnline publication date: 7-Jan-2025
  • Show More Cited By

Index Terms

  1. iDistance: An adaptive B+-tree based indexing method for nearest neighbor search

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Database Systems
    ACM Transactions on Database Systems  Volume 30, Issue 2
    June 2005
    328 pages
    ISSN:0362-5915
    EISSN:1557-4644
    DOI:10.1145/1071610
    Issue’s Table of Contents

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 June 2005
    Published in TODS Volume 30, Issue 2

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Indexing
    2. KNN
    3. nearest neighbor queries

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)102
    • Downloads (Last 6 weeks)11
    Reflects downloads up to 26 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)DIMS: Distributed Index for Similarity Search in Metric SpacesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.348775937:1(210-225)Online publication date: Jan-2025
    • (2025)Learning Road Network Index Structure for Efficient Map MatchingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.348519537:1(423-437)Online publication date: 1-Jan-2025
    • (2025)Geospatial indexing for sea–land navigation based on machine learningComputers and Electrical Engineering10.1016/j.compeleceng.2024.109433118:PBOnline publication date: 7-Jan-2025
    • (2025)How good are multi-dimensional learned indexes? An experimental surveyThe VLDB Journal10.1007/s00778-024-00893-634:2Online publication date: 21-Jan-2025
    • (2024)FurMoLi: A Future Query Technique for Moving Objects Based on a Learned IndexMathematics10.3390/math1213203212:13(2032)Online publication date: 29-Jun-2024
    • (2024)DIDS: Double Indices and Double Summarizations for Fast Similarity SearchProceedings of the VLDB Endowment10.14778/3665844.366585117:9(2198-2211)Online publication date: 1-May-2024
    • (2024)Enhancing K-nearest neighbor algorithm: a comprehensive review and performance analysis of modificationsJournal of Big Data10.1186/s40537-024-00973-y11:1Online publication date: 11-Aug-2024
    • (2024)A Memory-Disaggregated Radix TreeACM Transactions on Storage10.1145/366428920:3(1-41)Online publication date: 6-Jun-2024
    • (2024)DForest: A Minimal Dimensionality-Aware Indexing for High-Dimensional Exact Similarity SearchIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.338111136:10(5092-5105)Online publication date: 1-Oct-2024
    • (2024)A Survey of Multi-Dimensional Indexes: Past and Future TrendsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.336418336:8(3635-3655)Online publication date: 1-Aug-2024
    • Show More Cited By

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media