Similarity query processing using disk arrays

Published: 01 June 1998

Abstract

Similarity queries are fundamental operations used extensively in many modern applications, and disk arrays are powerful storage media of increasing importance. The basic trade-off in similarity query processing on such a system is that increased parallelism leads to higher resource consumption and lower throughput, whereas low parallelism leads to higher response times. Here, we propose a technique that carefully examines the data currently available during query processing in order to exploit parallelism only up to a point, while keeping response times low. The underlying access method is a variation of the R*-tree, distributed among the components of a disk array, and the system is evaluated using event-driven simulation. The experimental results demonstrate that the proposed approach outperforms, by factors, both a previous branch-and-bound algorithm and a greedy algorithm that maximizes parallelism as much as possible. Moreover, comparing the proposed algorithm with a hypothetical (non-existing) optimal one, with respect to the number of disk accesses, shows that the former is on average two times slower than the latter.
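
The trade-off described above can be illustrated with a small sketch. This is not the algorithm proposed in the paper, nor the branch-and-bound or greedy algorithms it is compared against: it is a generic best-first k-nearest-neighbor search with MINDIST pruning over a toy in-memory tree of bounding rectangles. The `Node` class, the `knn` function, and the `batch` parameter are hypothetical names introduced for illustration. `batch` stands in for the number of pages fetched from the disk array in one parallel round, the page count is a rough proxy for resource consumption, and the round count is a rough proxy for response time.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: tuple                                     # lower corner of the bounding rectangle
    hi: tuple                                     # upper corner of the bounding rectangle
    children: list = field(default_factory=list)  # child nodes (internal node)
    points: list = field(default_factory=list)    # data points (leaf node)


def mindist(q, node):
    """Squared distance from q to the closest point of the node's rectangle."""
    return sum(max(l - x, 0, x - h) ** 2 for x, l, h in zip(q, node.lo, node.hi))


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn(root, q, k, batch=1):
    """Best-first k-NN search (assumes k >= 1).

    `batch` is a hypothetical knob: how many pages are fetched per round.
    Returns (sorted list of (squared distance, point), pages read, rounds).
    """
    frontier = [(0, 0, root)]  # min-heap of (mindist, tiebreak, node)
    best = []                  # max-heap of the k best points, as (-distance, point)
    tiebreak = 1               # unique counter so nodes are never compared
    pages = rounds = 0
    while frontier:
        # Pick up to `batch` nodes that may still hold a closer neighbor.
        # Pruning uses only what was known before the round started.
        picked = []
        while frontier and len(picked) < batch:
            if len(best) == k and frontier[0][0] > -best[0][0]:
                break
            picked.append(heapq.heappop(frontier)[2])
        if not picked:
            break
        rounds += 1
        pages += len(picked)
        for node in picked:
            for p in node.points:
                d = sqdist(p, q)
                if len(best) < k:
                    heapq.heappush(best, (-d, p))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, p))
            for child in node.children:
                heapq.heappush(frontier, (mindist(q, child), tiebreak, child))
                tiebreak += 1
    return sorted((-nd, p) for nd, p in best), pages, rounds


# Toy two-level tree: one root with three leaves.
leaf_a = Node((0, 0), (2, 2), points=[(1, 1), (2, 2)])
leaf_b = Node((3, 0), (5, 2), points=[(3, 1), (5, 2)])
leaf_c = Node((8, 8), (9, 9), points=[(8, 8), (9, 9)])
root = Node((0, 0), (9, 9), children=[leaf_a, leaf_b, leaf_c])

sequential = knn(root, (1, 1), k=3, batch=1)  # one page per round
parallel = knn(root, (1, 1), k=3, batch=4)    # up to four pages per round
```

On this toy tree both calls return the same three neighbors. With `batch=1` the search reads 3 pages in 3 rounds, while with `batch=4` it reads 4 pages in 2 rounds: the extra page is the far leaf, which the sequential search prunes once the nearer leaves have been read.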

Published In

ACM SIGMOD Record  Volume 27, Issue 2
June 1998
595 pages
ISSN:0163-5808
DOI:10.1145/276305
  • SIGMOD '98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data
    June 1998
    599 pages
    ISBN:0897919955
    DOI:10.1145/276304
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article
