Divide-and-conquer scheme for strictly optimal retrieval of range queries

Published: 30 November 2009 Publication History

Abstract

Declustering distributes data among parallel disks to reduce retrieval cost through I/O parallelism. Many schemes have been proposed for single-copy declustering of spatial data. Recently, declustering using replication has gained a lot of interest, and several schemes with different properties have been proposed. Verifying the optimality of replication schemes designed for range queries is computationally expensive, and existing schemes verify optimality only for up to 50 disks. In this article, we propose a novel method to find replicated declustering schemes that render all spatial range queries optimal. The proposed scheme uses threshold-based declustering, divisibility of large queries for optimization, and an optimistic approach to computing maximum flow. The proposed scheme is generic and works for any number of dimensions. Experimental results show that, using 3 copies, there exist allocations that render all spatial range queries optimal for up to 750 disks in 2 dimensions and, with the exception of several values, for up to 100 disks in 3 dimensions. The proposed scheme significantly improves the search for strictly optimal replicated declustering schemes and will be a valuable tool for answering open problems on replicated declustering.
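
To make the notion of an optimal query concrete: a query touching b buckets on N disks is retrieved optimally when no disk serves more than ceil(b/N) of them. With replication, deciding which copy to read is an assignment problem that is commonly modeled as maximum flow. The sketch below is only an illustration of that general idea, not the article's algorithm: it uses plain augmenting paths (a special case of maximum flow) rather than the optimistic approach the abstract mentions, and the names `can_retrieve` and `retrieval_cost` as well as the sample allocations are hypothetical.

```python
from math import ceil


def can_retrieve(bucket_disks, num_disks, limit):
    """Return True if every bucket can be read from one of its copies
    while no disk serves more than `limit` buckets.

    bucket_disks[b] lists the disks that hold a copy of bucket b.
    This is a capacitated bipartite matching solved by augmenting paths.
    """
    assigned = [[] for _ in range(num_disks)]  # buckets served by each disk

    def try_assign(bucket, visited):
        for disk in bucket_disks[bucket]:
            if disk in visited:
                continue
            visited.add(disk)
            if len(assigned[disk]) < limit:
                assigned[disk].append(bucket)
                return True
            # Disk is full: try to move one of its buckets to another copy.
            for slot, other in enumerate(assigned[disk]):
                if try_assign(other, visited):
                    assigned[disk][slot] = bucket
                    return True
        return False

    return all(try_assign(b, set()) for b in range(len(bucket_disks)))


def retrieval_cost(bucket_disks, num_disks):
    """Smallest per-disk load that still allows retrieving all buckets."""
    optimal = ceil(len(bucket_disks) / num_disks)
    for limit in range(optimal, len(bucket_disks) + 1):
        if can_retrieve(bucket_disks, num_disks, limit):
            return limit
    raise ValueError("some bucket has no copy on any disk")


# Hypothetical 3-bucket query on 3 disks, two copies per bucket.
spread = [(0, 1), (0, 1), (0, 2)]     # copies cover all three disks
crowded = [(0, 1), (0, 1), (0, 1)]    # copies cover only two disks
print(retrieval_cost(spread, 3))   # 1, which equals ceil(3/3)
print(retrieval_cost(crowded, 3))  # 2, one access more than optimal
```

An allocation is strictly optimal in the abstract's sense when this cost equals ceil(b/N) for every range query, which is why checking all queries for hundreds of disks becomes expensive.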


Cited By

  • (2013) Query-Log Aware Replicated Declustering. IEEE Transactions on Parallel and Distributed Systems 24:5 (987-995). DOI: 10.1109/TPDS.2012.113. Online publication date: 1-May-2013
  • (2010) Research of Distributed Parallel Information Retrieval Based on JPPF. Proceedings of the 2010 International Conference of Information Science and Management Engineering - Volume 01 (109-111). DOI: 10.1109/ISME.2010.31. Online publication date: 7-Aug-2010

Reviews

David Gary Hill

The most common query type in a large database (say, a terabyte (TB) or larger), such as a relational or spatial database, is a range query. In a range query, a user specifies a range of values for each dimension of interest within a dataset. The user receives an output result of "the set of items in the dataset that have values within the specified range for each dimension." When efficient retrieval of all the requested items is a challenge, the time it takes to finish retrieving all of the items from a range query might be unacceptably long. Previous research has focused on efficient retrieval structures and methods that use input/output (I/O) parallelism, which involves storage techniques that access data from multiple disks. Numerous declustering schemes, which distribute data among parallel disks, have been proposed. This paper discusses overcoming the limitations of single-copy declustering schemes, and proposes using replication to achieve optimal queries. Tosun claims that the proposed replicated declustering scheme is "generic and works for any number of dimensions." The paper covers the proposed scheme in mathematical detail. Tosun includes a couple of examples that show how this approach might be integrated with commercial applications. Readers who have a technical interest in speeding up range queries for databases may find this paper useful.

Online Computing Reviews Service
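
The limitation of single-copy declustering that the review refers to can be seen in a small sketch. It assumes a 2-dimensional grid of buckets and the classic disk modulo allocation, in which bucket (i, j) goes to disk (i + j) mod N; that scheme is chosen here only as a familiar example and is not the scheme proposed in the paper. The function names are hypothetical.

```python
from collections import Counter
from math import ceil


def disk_modulo(i, j, num_disks):
    """Single-copy allocation: bucket (i, j) is stored on disk (i + j) mod N."""
    return (i + j) % num_disks


def range_query_cost(rows, cols, num_disks, allocate=disk_modulo):
    """Parallel retrieval cost of a range query: the largest number of
    requested buckets that land on any one disk."""
    load = Counter(allocate(i, j, num_disks) for i in rows for j in cols)
    return max(load.values(), default=0)


num_disks = 4
# A 2 x 2 range query: buckets (0,0), (0,1), (1,0), (1,1).
cost = range_query_cost(range(2), range(2), num_disks)
optimal = ceil(4 / num_disks)
print(cost, optimal)  # 2 1: buckets (0,1) and (1,0) share disk 1
```

With a single copy, the two buckets on disk 1 must be read one after the other. Storing additional copies on other disks gives the retrieval step a choice, which is the motivation for replication described in the review.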


Information & Contributors

Information

Published In

ACM Transactions on Storage, Volume 5, Issue 3
November 2009
153 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/1629075

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 November 2009
Accepted: 01 March 2009
Revised: 01 March 2009
Received: 01 May 2008
Published in TOS Volume 5, Issue 3


Author Tags

  1. Declustering
  2. number theory
  3. parallel I/O
  4. replication
  5. spatial range query
  6. threshold

Qualifiers

  • Research-article
  • Research
  • Refereed
