Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.5555/1316689.1316789dlproceedingsArticle/Chapter ViewAbstractPublication PagesvldbConference Proceedingsconference-collections
Article

Automated design of multidimensional clustering tables for relational databases

Published: 31 August 2004 Publication History

Abstract

The ability to physically cluster a database table on multiple dimensions is a powerful technique that offers significant performance benefits in many OLAP, warehousing, and decision-support systems. An industrial implementation of this technique for the DB2® Universal DatabaseTM (DB2 UDB) product, called multidimensional clustering (MDC), which co-exists with other classical forms of data storage and indexing methods, was described in VLDB 2003. This paper describes the first published model for automating the selection of clustering keys in single-dimensional and multidimensional relational databases that use a cell/block storage structure for MDC. For any significant dimensionality (3 or more), the possible solution space is combinatorially complex. The automated MDC design model is based on what-if query cost modeling, data sampling, and a search algorithm for evaluating a large constellation of possible combinations. The model is effective at trading the benefits of potential combinations of clustering keys against data sparsity and performance. It also effectively selects the granularity at which dimensions should be used for clustering (such as week of year versus month of year). We show results from experiments indicating that the model provides design recommendations of comparable quality to those made by human experts. The model has been implemented in the IBM® DB2 UDB for Linux®, UNIX® and Windows® Version 8.2 release.

References

[1]
{1} "Transaction Processing Performance Council" http://www.tpc.org/default.asp.
[2]
{2} "DB2 Universal Database for Linux, UNIX and Windows" http://www-306.ibm.com/software/data/db2/udb/
[3]
{3} S. Agrawal, S. Chaudhuri, V.R. Narasayya, "Automated selection of materialized views and indexes in SQL databases". Proc. VLDB 2000, Cairo, Egypt.
[4]
{4} B. Bhattacharjee, S. Padmanabhan, T. Malkemus, T. Lai, L. Cranston, M. Huras, " Efficient Query Processing for Multi-Dimensionally Clustered Tables in DB2", Proc. VLDB 2003, Berlin, Germany.
[5]
{5} S. Chaudhuri, E. Christensen, G. Graefe, V. Narasayya, M. Zwilling, " Self-Tuning Technology in Microsoft SQL Server", IEEE Data Eng. Bul. 22(2), June 1999.
[6]
{6} S. Chaudhuri, V. Narasayya, "AutoAdmin 'What-if' Index Analysis Utility", Proc. SIGMOD, 1998, Seattle, USA.
[7]
{7} S. Chaudhuri, V. Narasayya, "Microsoft Index Tuning Wizard for SQL Server 7.0", Proc. SIGMOD, 1998, Seattle, USA.
[8]
{8} M. R. Frank, E. R. Omiecinski, S.B. Navathe, "Adaptive and Automated Index Selection in RDBMS", Proc. EDBT 1992, Vienna, Austria.
[9]
{9} P. J. Haas, J. F. Naughton, S. Seshadri, L. Stokes, "Sampling Based Estimation of the Number of Distinct Values of an Attribute", Proc. VLDB 1995, Zurich, Switzerland.
[10]
{10} P. J. Haas, L. Stokes, "Estimating the number of classes in a finite population", JASA, V. 93, Dec. 1998.
[11]
{11} S. Lightstone. D. Zilio, C. Zuzarte, G. Lohman, J. Rao, K. Cheung, "DB2 Design Advisor: More than just index selection", IDUG 2004, Orlando, USA.
[12]
{12} J. H Liou, S. B. Yao, "Multi-dimensional clustering for database organizations". Information Systems, 2:187-198, 1977.
[13]
{13} G. Lohman, G. Valentin, D. Zilio, M. Zuliani, A. Skelly, "DB2 Advisor: An optimizer smart enough to recommend its own indexes", Proc. ICDE 2000, San Diego, USA.
[14]
{14} V. Markl, F. Ramsak, R. Bayer, "Improving OLAP Performance by Multi-dimensional Hierarchical Clustering", Proc. IDEAS'99, Montreal, Canada.
[15]
{15} S. Padmanabhan, B. Bhattacharjee, T. Malkemus, L. Cranston, M. Huras, "Multi-Dimensional Clustering: A New Data Layout Scheme in DB2." SIGMOD 2003, San Diego, USA.
[16]
{16} N. Pendse, R. Creeth, "The OLAP Report", http://www.olapreport.com/.
[17]
{17} J. Rao, S. Lightstone, G. Lohman, D. Zilio, A. Storm, C. Garcia-Arellano, S. Fadden "DB2 Design Advisor: integrated automated physical database design", Proc. VLDB 2004, Toronto, Canada.
[18]
{18} J. Rao, C. Zhang, N. Megiddo, G. Lohman, "Automating physical database design in a parallel database.", Proc. SIGMOD 2002, Madison, USA.
[19]
{19} B. Schiefer, G. Valentin, "DB2 Universal Database Performance Tuning", IEEE Data Eng. Bul 22(2), June 1999.
[20]
{20} T. Stöhr, H. Märtens, E. Rahm, "Multi-Dimensional Database Allocation for Parallel Data Warehouses", Proc. VLDB 2000, Cairo, Egypt.
[21]
{21} T. Stohr, E. Rahm, "WARLOCK : A Data Allocation Tool for Parallel Warehouses", Proc. VLDB 2001, Rome, Italy (Software Demonstration).

Cited By

View all
  • (2019)An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement LearningProceedings of the 2019 International Conference on Management of Data10.1145/3299869.3300085(415-432)Online publication date: 25-Jun-2019
  • (2016)Regularized Cost-Model Oblivious Database Tuning with Reinforcement LearningTransactions on Large-Scale Data- and Knowledge-Centered Systems XXVIII - Volume 994010.1007/978-3-662-53455-7_5(96-132)Online publication date: 1-Jun-2016
  • (2012)A new tool for multi-level partitioning in teradataProceedings of the 21st ACM international conference on Information and knowledge management10.1145/2396761.2398604(2214-2218)Online publication date: 29-Oct-2012
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image DL Hosted proceedings
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
August 2004
1380 pages

Sponsors

  • VLDB Endowment: Very Large Database Endowment

Publisher

VLDB Endowment

Publication History

Published: 31 August 2004

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 03 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2019)An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement LearningProceedings of the 2019 International Conference on Management of Data10.1145/3299869.3300085(415-432)Online publication date: 25-Jun-2019
  • (2016)Regularized Cost-Model Oblivious Database Tuning with Reinforcement LearningTransactions on Large-Scale Data- and Knowledge-Centered Systems XXVIII - Volume 994010.1007/978-3-662-53455-7_5(96-132)Online publication date: 1-Jun-2016
  • (2012)A new tool for multi-level partitioning in teradataProceedings of the 21st ACM international conference on Information and knowledge management10.1145/2396761.2398604(2214-2218)Online publication date: 29-Oct-2012
  • (2007)Efficient bulk deletes for multi dimensional clustered tables in DB2Proceedings of the 33rd international conference on Very large data bases10.5555/1325851.1325988(1197-1206)Online publication date: 23-Sep-2007
  • (2004)DB2 design advisorProceedings of the Thirtieth international conference on Very large data bases - Volume 3010.5555/1316689.1316783(1087-1097)Online publication date: 31-Aug-2004

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media