research-article

Similarity search on Bregman divergence: towards non-metric indexing

Published: 01 August 2009

Abstract

In this paper, we examine the problem of indexing over non-metric distance functions. In particular, we focus on a general class of distance functions, namely Bregman divergence [6], to support nearest neighbor and range queries. Distance functions such as KL-divergence and Itakura-Saito distance are special cases of Bregman divergence, with wide applications in statistics, speech recognition and time series analysis, among others. Unlike in metric spaces, key properties such as the triangle inequality and distance symmetry do not hold for such distance functions. A direct adaptation of existing indexing infrastructure developed for metric spaces is thus not possible. We devise a novel solution to handle this class of distance measures by expanding and mapping points in the original space to a new extended space. Subsequently, we show how state-of-the-art tree-based indexing methods, for low- to moderate-dimensional datasets, and vector approximation file (VA-file) methods, for high-dimensional datasets, can be adapted on this extended space to answer such queries efficiently. Improved distance bounding techniques and distribution-based index optimization are also introduced to improve the performance of query answering and index construction respectively; both can be applied to R-trees and VA-files. Extensive experiments are conducted to validate our approach on a variety of datasets and a range of Bregman divergence functions.
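The properties named in the abstract can be illustrated with a minimal Python sketch. It uses the standard definition D_f(x, y) = f(x) - f(y) - <grad f(y), x - y> and the usual generators for KL-divergence and Itakura-Saito distance. The helper names (`bregman`, `extend`, `divergence_via_extended`) are hypothetical and chosen for illustration. The abstract does not spell out how the extended space is constructed; appending f(x) as one extra coordinate is an assumption made here, shown only because it makes the divergence to a fixed query linear in the extended point.

```python
import math

def bregman(f, grad_f, x, y):
    """Standard Bregman divergence: f(x) - f(y) - <grad f(y), x - y>."""
    g = grad_f(y)
    return f(x) - f(y) - sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))

# Generator of KL-divergence: f(x) = sum_i x_i log x_i
def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)

def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]

# Generator of Itakura-Saito distance: f(x) = -sum_i log x_i
def burg(x):
    return -sum(math.log(xi) for xi in x)

def burg_grad(x):
    return [-1.0 / xi for xi in x]

def kl(x, y):
    return bregman(neg_entropy, neg_entropy_grad, x, y)

def itakura_saito(x, y):
    return bregman(burg, burg_grad, x, y)

# Assumed construction (not taken from the abstract): x -> (x_1, ..., x_d, f(x)).
def extend(f, x):
    return list(x) + [f(x)]

def divergence_via_extended(f, grad_f, x_ext, q):
    """D_f(x, q) as a linear function w . x_ext + c of the extended point."""
    g = grad_f(q)
    w = [-gi for gi in g] + [1.0]                      # weights on (x, f(x))
    c = sum(gi * qi for gi, qi in zip(g, q)) - f(q)    # depends on q only
    return sum(wi * xi for wi, xi in zip(w, x_ext)) + c

p = [0.7, 0.2, 0.1]
q = [0.2, 0.3, 0.5]

# Asymmetry: the two directions give different values.
print("KL(p, q) =", kl(p, q), " KL(q, p) =", kl(q, p))

# Triangle inequality fails for Itakura-Saito on the points 1, 2, 4.
direct = itakura_saito([1.0], [4.0])
detour = itakura_saito([1.0], [2.0]) + itakura_saito([2.0], [4.0])
print("direct =", direct, " detour =", detour)

# The linear form on the extended point agrees with the direct definition.
print(divergence_via_extended(neg_entropy, neg_entropy_grad,
                              extend(neg_entropy, p), q))
```

Under this assumed construction, a range query around a fixed q becomes a linear inequality on extended points, which is the kind of test that bounding-box indexes such as R-trees can evaluate against a rectangle.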

References

[1]
S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM, 45(6):891--923, 1998.
[2]
A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705--1749, 2005.
[3]
N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An efficient and robust access method for points and rectangles. In SIGMOD Conference, pages 322--331, 1990.
[4]
A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. In ICML, pages 97--104, 2006.
[5]
S. Boltz, E. Debreuve, and M. Barlaud. High dimensional statistical distance for region-of-interest tracking: Application to combining a soft geometric constraint with radiometry. In CVPR, 2007.
[6]
L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3):200--217, 1967.
[7]
L. Cayton. Fast nearest neighbor retrieval for Bregman divergences. In ICML, pages 112--119, 2008.
[8]
L. Chen and X. Lian. Efficient similarity search in nonmetric spaces with local constant embedding. IEEE Trans. Knowl. Data Eng., 20(3):321--336, 2008.
[9]
H. Ferhatosmanoglu, E. Tuncel, D. Agrawal, and A. E. Abbadi. Vector approximation based indexing for non-uniform high dimensional data sets. In CIKM, pages 202--209, 2000.
[10]
J. H. Friedman, J. L. Bentley, and R. A. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw., 3(3):209--226, 1977.
[11]
J. Goldberger, S. Gordon, and H. Greenspan. An efficient image similarity measure based on approximations of KL-divergence between two Gaussian mixtures. In ICCV, pages 487--493, 2003.
[12]
R. M. Gray, A. Buzo, A. H. Gray, and Y. Matsuyama. Distortion measures for speech processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4).
[13]
A. Guttman. R-trees: A dynamic index structure for spatial searching. In SIGMOD Conference, pages 47--57, 1984.
[14]
F. Itakura and S. Saito. A statistical method for estimation of speech spectral density and formant frequencies. Electronics and Communications in Japan, 53:36--43, 1970.
[15]
H. V. Jagadish, B. C. Ooi, K.-L. Tan, C. Yu, and R. Zhang. iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst., 30(2):364--397, 2005.
[16]
B. Kulis, M. Sustik, and I. Dhillon. Learning low-rank kernel matrices. In ICML, pages 505--512, 2006.
[17]
B. Long, Z. M. Zhang, and P. S. Yu. Graph partitioning based on link distributions. In AAAI, pages 578--583, 2007.
[18]
G. McLachlan and D. Peel. Finite Mixture Models. Wiley-Interscience, 2000.
[19]
F. Nielsen and R. Nock. On approximating the smallest enclosing Bregman balls. In Symposium on Computational Geometry, pages 485--486, 2006.
[20]
F. C. N. Pereira, N. Tishby, and L. Lee. Distributional clustering of English words. In ACL, pages 183--190, 1993.
[21]
J. Puzicha, Y. Rubner, C. Tomasi, and J. M. Buhmann. Empirical evaluation of dissimilarity measures for color and texture. In ICCV, pages 1165--1172, 1999.
[22]
N. Rasiwasia, P. Moreno, and N. Vasconcelos. Bridging the gap: Query by semantic example. IEEE Transactions on Multimedia, 9(5).
[23]
D. Suciu and N. N. Dalvi. Foundations of probabilistic answers to queries. In SIGMOD Conference, 2005.
[24]
J. Tang, X.-S. Hua, G.-J. Qi, Y. Song, and X. Wu. Video annotation based on kernel linear neighborhood propagation. IEEE Transactions on Multimedia, 4(10):620--628, 2008.
[25]
A. K. H. Tung, R. Zhang, N. Koudas, and B. C. Ooi. Similarity search: A matching based approach. In VLDB, pages 631--642, 2006.
[26]
R. Weber, H.-J. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, pages 194--205, 1998.

Published In

Proceedings of the VLDB Endowment  Volume 2, Issue 1
August 2009
1293 pages

Publisher

VLDB Endowment

Publication History

Published in PVLDB Volume 2, Issue 1

Qualifiers

  • Research-article

Cited By

  • (2019) Pruning Algorithms for Low-Dimensional Non-metric k-NN Search: A Case Study. Similarity Search and Applications, pages 72-85. DOI: 10.1007/978-3-030-32047-8_7. Online publication date: 2-Oct-2019
  • (2018) Pigeonring. Proceedings of the VLDB Endowment, 12:1, pages 28-42. DOI: 10.14778/3275536.3275539. Online publication date: 1-Sep-2018
  • (2017) Distance-Based Index Structures for Fast Similarity Search. Cybernetics and Systems Analysis, 53:4, pages 636-658. DOI: 10.1007/s10559-017-9966-y. Online publication date: 1-Jul-2017
  • (2015) A Locality Sensitive Hashing Filter for Encrypted Vector Databases. Fundamenta Informaticae, 137:2, pages 291-304. DOI: 10.5555/2751298.2751305. Online publication date: 1-Apr-2015
  • (2015) Indefinite proximity learning. Neural Computation, 27:10, pages 2039-2096. DOI: 10.1162/NECO_a_00770. Online publication date: 1-Oct-2015
  • (2015) GFilter: A General Gram Filter for String Similarity Search. IEEE Transactions on Knowledge and Data Engineering, 27:4, pages 1005-1018. DOI: 10.1109/TKDE.2014.2349914. Online publication date: 1-Apr-2015
  • (2013) The Bregman variational dual-tree framework. Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 22-31. DOI: 10.5555/3023638.3023641. Online publication date: 11-Aug-2013
  • (2013) Learning to prune in metric and non-metric spaces. Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 1, pages 1574-1582. DOI: 10.5555/2999611.2999787. Online publication date: 5-Dec-2013
  • (2013) Asymmetric signature schemes for efficient exact edit similarity query processing. ACM Transactions on Database Systems, 38:3, pages 1-44. DOI: 10.1145/2508020.2508023. Online publication date: 5-Sep-2013
  • (2013) Engineering Efficient and Effective Non-metric Space Library. Proceedings of the 6th International Conference on Similarity Search and Applications - Volume 8199, pages 280-293. DOI: 10.1007/978-3-642-41062-8_28. Online publication date: 2-Oct-2013
