DOI: 10.1145/1458082.1458110
research-article

A novel optimization approach to efficiently process aggregate similarity queries in metric access methods

Published: 26 October 2008

Abstract

A similarity query takes an element as the query center and searches a dataset for either the elements within a bounding radius of that center or the k elements nearest to it. Several algorithms have been developed to execute similarity queries efficiently. However, some queries require more than one center; we call these Aggregate Similarity Queries. Such queries arise when the user supplies multiple desirable examples and requests data elements that are similar to all of them, as happens when applying relevance feedback. Here we give the first algorithms that can handle aggregate similarity queries on Metric Access Methods (MAMs) such as the M-tree and the Slim-tree. Our method, which we call Metric Aggregate Similarity Search (MASS), has the following properties: (a) it requires only the triangle inequality property; (b) it guarantees no false dismissals, as we prove that it lower-bounds the aggregate distance scores; (c) it can work with any MAM; (d) it can handle any number of query centers, whether they are scattered all over the space or concentrated in a restricted region. Experiments on both real and synthetic data show that our method scales with the number of elements and, when the dataset lies in a spatial domain, also with its dimensionality. Moreover, it achieves better results than previous related methods.
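
To make the lower-bounding idea in property (b) concrete, the following minimal Python sketch answers an aggregate k-nearest-neighbor query over a flat set of covering balls. It is not the MASS algorithm and not an M-tree or Slim-tree implementation. The sum used as the aggregate function, the Euclidean metric, and every name in the code (`build_balls`, `ball_lower_bound`, `aggregate_knn`) are assumptions made for illustration only. The sketch relies solely on the triangle inequality: for any element x inside a ball with pivot p and covering radius r, d(x, q) >= max(0, d(p, q) - r) for each query center q, so summing these terms gives a lower bound on the aggregate score of every element in the ball.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance; any metric obeying the triangle inequality would do."""
    return math.dist(a, b)


def aggregate_distance(x, centers, dist=euclidean):
    """Aggregate score of x: here, the sum of its distances to all query centers (an assumption)."""
    return sum(dist(x, q) for q in centers)


def ball_lower_bound(pivot, radius, centers, dist=euclidean):
    """Lower bound on the aggregate score of any element inside the ball.

    By the triangle inequality, d(x, q) >= d(pivot, q) - radius for x in the ball.
    """
    return sum(max(0.0, dist(pivot, q) - radius) for q in centers)


def build_balls(points, pivots, dist=euclidean):
    """Hypothetical one-level index: assign each point to its nearest pivot."""
    groups = {p: [] for p in pivots}
    for x in points:
        nearest = min(pivots, key=lambda p: dist(x, p))
        groups[nearest].append(x)
    # Each ball is (pivot, covering radius, members).
    return [
        (p, max((dist(p, x) for x in members), default=0.0), members)
        for p, members in groups.items()
    ]


def aggregate_knn(balls, centers, k, dist=euclidean):
    """Return the k (score, element) pairs with the smallest aggregate score."""
    # Visit balls in increasing order of their lower bound.
    bounded = sorted(
        ((ball_lower_bound(p, r, centers, dist), members) for p, r, members in balls),
        key=lambda item: item[0],
    )
    best = []  # max-heap of the current k best, stored as (-score, element)
    for bound, members in bounded:
        # Once k candidates exist and this bound cannot beat the worst of them,
        # no remaining ball can either, so nothing is falsely dismissed.
        if len(best) == k and bound >= -best[0][0]:
            break
        for x in members:
            score = aggregate_distance(x, centers, dist)
            if len(best) < k:
                heapq.heappush(best, (-score, x))
            elif score < -best[0][0]:
                heapq.heapreplace(best, (-score, x))
    return sorted((-neg, x) for neg, x in best)


if __name__ == "__main__":
    points = [(float(i), float(j)) for i in range(6) for j in range(6)]
    balls = build_balls(points, [(1.0, 1.0), (4.0, 1.0), (1.0, 4.0), (4.0, 4.0)])
    for score, x in aggregate_knn(balls, [(0.0, 0.0), (2.0, 1.0), (1.0, 3.0)], k=3):
        print(x, round(score, 3))
```

Because the balls are visited in increasing order of their lower bound, the search can stop at the first ball whose bound is no smaller than the current k-th best score. A hierarchical MAM would apply the same test recursively at each node rather than over a flat list.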

    Published In

    CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
    October 2008
    1562 pages
    ISBN:9781595939913
    DOI:10.1145/1458082

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. access methods
    2. aggregate dissimilarity
    3. similarity queries

    Qualifiers

    • Research-article

    Conference

    CIKM08
    CIKM08: Conference on Information and Knowledge Management
    October 26 - 30, 2008
    Napa Valley, California, USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Article Metrics

    • Downloads (Last 12 months): 4
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 03 Oct 2024

    Cited By

    • (2022) Storing data once in M-trees and PM-trees. Information Systems, 104:C. DOI: 10.1016/j.is.2021.101896. Online publication date: 9-Apr-2022
    • (2020) On Efficiently Monitoring Continuous Aggregate k Nearest Neighbors in Road Networks. IEEE Transactions on Mobile Computing, 19:7 (1664-1676). DOI: 10.1109/TMC.2019.2911950. Online publication date: 1-Jul-2020
    • (2019) Enhanced Privacy Preserving Group Nearest Neighbor Search. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2019.2930696. Online publication date: 2019
    • (2019) Storing Data Once in M-tree and PM-tree. Similarity Search and Applications (18-31). DOI: 10.1007/978-3-030-32047-8_2. Online publication date: 2-Oct-2019
    • (2018) Aggregate k Nearest Neighbor Queries in Metric Spaces. Web and Big Data (317-333). DOI: 10.1007/978-3-319-96893-3_24. Online publication date: 19-Jul-2018
    • (2016) Monochromatic and bichromatic reverse top-k group nearest neighbor queries. Expert Systems with Applications: An International Journal, 53:C (57-74). DOI: 10.1016/j.eswa.2016.01.012. Online publication date: 1-Jul-2016
    • (2016) Exact and approximate flexible aggregate similarity search. The VLDB Journal — The International Journal on Very Large Data Bases, 25:3 (317-338). DOI: 10.1007/s00778-015-0418-x. Online publication date: 1-Jun-2016
    • (2015) Effective and Efficient Algorithms for Flexible Aggregate Similarity Search in High Dimensional Spaces. IEEE Transactions on Knowledge and Data Engineering, 27:12 (3258-3273). DOI: 10.1109/TKDE.2015.2475740. Online publication date: 1-Dec-2015
    • (2015) Nearest neighborhood search in spatial databases. 2015 IEEE 31st International Conference on Data Engineering (699-710). DOI: 10.1109/ICDE.2015.7113326. Online publication date: Apr-2015
    • (2015) Flexible Aggregate Similarity Search in High-Dimensional Data Sets. Proceedings of the 8th International Conference on Similarity Search and Applications - Volume 9371 (15-28). DOI: 10.1007/978-3-319-25087-8_2. Online publication date: 12-Oct-2015
