Towards efficient indexing of arbitrary similarity: vision paper

Published: 16 July 2013

Abstract

The popularity of similarity search has grown with the increased interest in multimedia databases, bioinformatics, and social networks, and with the growing number of users trying to find information in huge collections of unstructured data. During exploration, users handle database objects in different ways depending on the similarity model used, which ranges from simple to complex. Efficient indexing techniques for similarity search are required, especially for growing databases.
In this paper, we study implementation possibilities for the recently announced theoretical framework SIMDEX, whose task is to algorithmically explore a given similarity space and find possibilities for efficient indexing. Instead of relying on a fixed set of indexing properties, such as the metric space axioms, SIMDEX seeks alternative properties that are valid in a particular similarity model (database) and, at the same time, provide efficient indexing. In particular, we propose to implement the fundamental parts of SIMDEX by means of genetic programming (GP), which we expect will provide a high-quality set of expressions (axioms) useful for indexing.
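
To make the idea of a property being "valid in a particular similarity model" concrete, the following is a minimal sketch and not the SIMDEX implementation. It assumes a candidate axiom can be written as a predicate over the three pairwise distances of a triplet, and it measures on what fraction of triplets from a toy database the predicate holds. The names `triangle`, `validity`, `abs_diff`, and `sq_diff` are hypothetical and chosen for illustration only.

```python
from itertools import permutations


def triangle(d_xy, d_xz, d_zy):
    """Candidate axiom: the metric triangle inequality."""
    return d_xy <= d_xz + d_zy


def validity(candidate, objects, dist):
    """Fraction of ordered triplets (x, y, z) on which `candidate` holds."""
    triplets = list(permutations(objects, 3))
    holds = sum(
        candidate(dist(x, y), dist(x, z), dist(z, y)) for x, y, z in triplets
    )
    return holds / len(triplets)


# Toy "database": ten points on a line (an assumption for illustration).
points = list(range(10))


def abs_diff(a, b):
    """A metric distance: absolute difference."""
    return abs(a - b)


def sq_diff(a, b):
    """A nonmetric dissimilarity: squared difference."""
    return (a - b) ** 2


# The triangle inequality holds on every triplet under abs_diff,
# but only on some triplets under sq_diff.
print(validity(triangle, points, abs_diff))
print(validity(triangle, points, sq_diff))
```

In this sketch the triangle inequality is fully valid for the absolute difference but fails for the squared difference whenever `z` lies between `x` and `y`. That is the kind of situation in which a search for alternative properties, as the abstract describes, becomes relevant.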

Cited By

  • (2014) Real-Time Exploration of Multimedia Collections. Databases Theory and Applications, pages 198--205. DOI: 10.1007/978-3-319-08608-8_18. Online publication date: 2014.
  • (2013) Universal indexing of arbitrary similarity models. Proceedings of the VLDB Endowment, 6(12):1392--1397. DOI: 10.14778/2536274.2536324. Online publication date: 1-Aug-2013.
  • (2013) Designing Similarity Indexes with Parallel Genetic Programming. Proceedings of the 6th International Conference on Similarity Search and Applications - Volume 8199, pages 294--299. DOI: 10.1007/978-3-642-41062-8_29. Online publication date: 2-Oct-2013.

    Published In

    ACM SIGMOD Record  Volume 42, Issue 2
    May 2013
    64 pages
    ISSN:0163-5808
    DOI:10.1145/2503792

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Column
