Research article, DOI: 10.1145/1862344.1862349

Dimension reduction for distance-based indexing

Published: 18 September 2010

Abstract

Distance-based indexing exploits only the triangle inequality to answer similarity queries in metric spaces. Because a metric space lacks coordinate structure, mathematical tools from R^n can be applied only indirectly, which makes theoretical study of metric-space indexing difficult. To address this problem, we formalize a "pivot space model" in which data is mapped from the metric space to R^n, preserving all pairwise distances under L∞. With this model, it can be shown that the indexing problem in a metric space can be studied equivalently in R^n. Further, we show the necessity of dimension reduction for R^n and that the only effective form of dimension reduction is to select existing dimensions, i.e., pivot selection. The coordinate structure of R^n makes it possible to apply many mathematical tools. In particular, Principal Component Analysis (PCA) is incorporated into a heuristic method for pivot selection and shown to be effective over a large range of workloads. We also show that PCA can be used to reliably measure the intrinsic dimension of a metric space.
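
The mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`pivot_map`, `l_inf`, `max_gap`, `range_query`), the choice of edit distance as the metric, and the word list are all assumptions made for illustration. Each object becomes the vector of its distances to a set of pivots. By the triangle inequality, the L∞ distance between two such vectors never exceeds the original distance, and it equals the original distance when every object serves as a pivot.

```python
from itertools import combinations

def pivot_map(x, pivots, dist):
    """Map object x to the vector of its distances to each pivot."""
    return [dist(x, p) for p in pivots]

def l_inf(u, v):
    """L-infinity (Chebyshev) distance between two equal-length vectors."""
    return max(abs(a - b) for a, b in zip(u, v))

def edit_distance(s, t):
    """Levenshtein distance, a metric on strings (illustrative choice)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]

def max_gap(data, pivots, dist):
    """Largest amount by which L-inf in pivot space underestimates dist."""
    vecs = {x: pivot_map(x, pivots, dist) for x in data}
    return max(dist(a, b) - l_inf(vecs[a], vecs[b])
               for a, b in combinations(data, 2))

def range_query(q, r, data, pivots, dist):
    """Return objects within r of q, pruning with the L-inf lower bound."""
    qv = pivot_map(q, pivots, dist)
    # In a real index this table would be precomputed once, not per query.
    table = [(x, pivot_map(x, pivots, dist)) for x in data]
    results, computed = [], 0
    for x, xv in table:
        if l_inf(qv, xv) > r:
            continue  # lower bound exceeds r, so dist(q, x) > r as well
        computed += 1
        if dist(q, x) <= r:
            results.append(x)
    return results, computed

# Hypothetical sample data.
words = ["index", "indent", "pivot", "pilot", "metric", "matrix"]

if __name__ == "__main__":
    print(max_gap(words, words, edit_distance))       # 0: all objects are pivots
    print(max_gap(words, words[:2], edit_distance))   # non-negative with fewer pivots
    print(range_query("pivat", 1, words, ["index", "pivot"], edit_distance))
```

Selecting a subset of pivots corresponds to keeping a subset of the coordinates. The lower bound stays valid, so no true result is pruned, but it becomes looser, and this is why the choice of pivots matters.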

Published In

SISAP '10: Proceedings of the Third International Conference on SImilarity Search and APplications
September 2010
130 pages
ISBN: 9781450304207
DOI: 10.1145/1862344

Sponsors

  • Bilkent University
  • Mexican Computer Science Society

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dimension reduction
  2. intrinsic dimension
  3. metric space
  4. pivot selection
  5. pivot space model
  6. similarity query

Qualifiers

  • Research-article

Conference

SISAP '10
Sponsor:
  • Bilkent University

Cited By

  • (2019) Calculating Fourier Transforms in SQL. Advances in Databases and Information Systems, pp. 151-166. DOI: 10.1007/978-3-030-28730-6_10. Online publication date: 13-Aug-2019
  • (2018) A comparison of pivot selection techniques for permutation-based indexing. Information Systems, 52:C, pp. 176-188. DOI: 10.1016/j.is.2015.01.010. Online publication date: 30-Dec-2018
  • (2017) High-Dimensional Simplexes for Supermetric Search. Similarity Search and Applications, pp. 96-109. DOI: 10.1007/978-3-319-68474-1_7. Online publication date: 28-Sep-2017
  • (2014) Speed Up Distance-Based Similarity Query Using Multiple Threads. 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming, pp. 215-219. DOI: 10.1109/PAAP.2014.54. Online publication date: Jul-2014
  • (2014) Excluded Middle Forest Versus Vantage Point Tree: An Analytical and Empirical Comparison. Practical Applications of Intelligent Systems, pp. 431-437. DOI: 10.1007/978-3-642-54927-4_41. Online publication date: 19-Jul-2014
  • (2013) Building an Information Retrieval System: Global Indexing or Local Indexing? Software Engineering and Applications, 02:01, pp. 6-14. DOI: 10.12677/SEA.2013.21002. Online publication date: 2013
  • (2013) Pivot Selection Strategies for Permutation-Based Similarity Search. Proceedings of the 6th International Conference on Similarity Search and Applications - Volume 8199, pp. 91-102. DOI: 10.1007/978-3-642-41062-8_10. Online publication date: 2-Oct-2013
  • (2011) Multivariate regression for pivot selection: A preliminary study. 2011 3rd Symposium on Web Society, pp. 115-121. DOI: 10.1109/SWS.2011.6101281. Online publication date: Oct-2011
