Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
article
Free access

R* optimizer validation and performance evaluation for local queries

Published: 15 June 1986 Publication History

Abstract

Few database query optimizer models have been validated against actual performance. This paper presents the methodology and results of a thorough validation of the optimizer and evaluation of the performance of the experimental distributed relational database management system R*, which inherited and extended to a distributed environment the optimization algorithms of System R. Optimizer estimated costs and actual R* resources consumed were written to database tables using new SQL commands, permitting automated control from SQL application programs of test data collection and reduction. A number of tests were run over a wide variety of dynamically-created test databases, SQL queries, and system parameters. The results for single-table access, sorting, and local 2-table joins are reported here. The tests confirmed the accuracy of the majority of the I/O cost model, the significant contribution of CPU cost to total cost, and the need to model CPU cost in more detail than was done in System R. The R* optimizer now retains cost components separately and estimates the number of CPU instructions, including those for applying different kinds of predicates. The sensitivity of I/O cost to buffer space motivated the development of more detailed models of buffer utilization unclustered index scans and nested-loop joins often benefit from pages remaining in the buffers, whereas concurrent scans of the data pages and the index pages for multiple tables during joins compete for buffer share. Without an index on the join column of the inner table, the optimizer correctly avoids the nested-loop join, confirming the need for merge-scan joins. When the join column of the inner is indexed, the optimizer overestimates the cost of the nested-loop join, whose actual performance is very sensitive to three parameters that are extremely difficult to estimate (1) the join (result) cardinality, (2) the outer table's cardinality, and (3) the number of buffer pages available to store the inner table. Suggestions are given for improved database statistics, prefetch and page replacement strategies for the buffer manager, and the use of temporary indexes and Bloom filters (hashed semijoins) to reduce access of unneeded data.

References

[1]
P M G Apers, A R Hevner, and S B Yao, Optnntzmg Algorithms for Dtstnbuted Queries, IEEE Trans on Software Engmeermg SE-9 (January 1983) pp 57-68]]
[2]
M M Astrahan, M Schkolmck, and W Kun, Performance of the System R Access Path Selecuon Mechamsm, Informatton Processing 80 (1980) pp 487-491]]
[3]
P A Bernstem, N Goodman, E Wong, C L Reeve, J Rothme, Query Processing m a System for D#stnbuted Databases (SDD-1), ACM Trans on Database Systems6,4 (December 1981) pp 602-625]]
[4]
M W Blasgen and K P Eswaren, Storage and Access m RelaUonal Data Bases, IBM Systems Journal 16,4 (1977) Also available as "On the EvaluaUon of Queries m a RelaUonal Data Base System", IBM Research Report RJ1745, San Jose, CA, April 1976]]
[5]
B H Bloom, Space/Trine Trade-offs m Hash Codmg wRh Allowable Errors, Commumcatlons of the A CM 13,7 (July 1970) pp 422-426]]
[6]
K Bratbergsengen, Hashmg Methods and RelaUonal Algebra Operations, Procs of the Tenth lnternattonal Conf on Very Large Data Bases (Smgapore, 1984) pp 323-333]]
[7]
M J Carey and H Lu, Some Expertmental Results on Dtstnbuted Jom Algorithms m a Local Network, Procs of the Elewnth International Conf on Very Large Data Bases (Stockholm, Sweden, August 1985)]]
[8]
D D Chamberlm, M M Astrahan, W F Kmg, R A Lone, J W Mehl, T G Price, M Sehkolmck, P Gnfflths Selmger, D R Slutz, B W Wade, and R A Yost, Support for RepeUttve TransacUons and Ad Hoc Queries m System R, A CM Trans on Database Systems 6,1 (March 1981) pp 70-94]]
[9]
J-M Chang, A Heurtstlc Approach to Distributed Query Processmg, Procs of the Etghth lnternattonal Conf on Very Large Data Bases (Mexico City, VLDB Endowment, Saratoga, pp 54-61]]
[10]
H-T Chou and D J DeWltt, An Evaluatmn of Buffer Management Strategtes for Relauonal Database Systems, Procs of the Eleventh International Conf on Very Large Data Bases (Stockholm, Sweden, September 1985) pp 127-141]]
[11]
W W Chu and P Hurley, Optunal Query Processing for D#stnbuted Database Systems, IEEE Trans on Computers C-31 (September 1982) pp 835-850]]
[12]
D Darnels, P G Selmger, L M Haas, B G Lmdsay, C Mohan, A Walker, and P Wdms, An Introducuon to D#stnbuted Query Compdataon m R*, Procs Second International Conf on D#trtbuted Databases (Berlm, September 1982) Also avatlable as IBM Research Report RJ3497, San Jose, CA, June 1982]]
[13]
D J DeWltt, R H Katz, F Olken, L D Shapiro, M R Stonebraker, and D Wood, Implementation Techmques for Mare Memory Database Systems, Proc SIGMOD 84 (Boston, MA, May 1984) pp 1-8]]
[14]
D J DeWltt and R Gerber, Multi-Processor Hash- Based Join Algortthms, Procs of the Eleventh Internatzonal Conf on Very Large Data Bases (Stockholm, Sweden, September 1985) pp 151-164]]
[15]
Effelsberg, W and Haerder, T, Principles of Database Buffer Management, A CM Trans Database Systems 9 (December 1984) pp 560-595]]
[16]
R Epstein, M Stonebraker, and E Wong, Distributed Query Processing m a Relational Data Base System, Procs ofACM-SIGMOD (Austm,TX, May 1978) pp 169-180]]
[17]
R Epstein and M Stonebraker, Analysis of Distributed Data Base Processing Strategies, Procs of the Szxth lnternatwnal Conf on Very Large Data Bases (Montreal,IEEE, October 1980) pp 92-101]]
[18]
S Fmkelstem, M Schkolmck, and P Tlbeno, DBDSGN -- a Physical Database Design Tool for System R, IEEE Database Engineering 5,1 (1982) pp 228-235]]
[19]
Graphtcal Data D#splay Manager Presentatwn Graphzcs Feature Interactive Chart Utzhty User's GuMe, Release 4, IBM Reference Manual SC33-0111-3 (IBM Corp, October 1984)]]
[20]
A R Hevner and S B Yao, Query Processing m Dlstnbuted Database Systems, IEEE Trans Software Engineering SE-5 (May 1979) pp 177-187]]
[21]
L Kerschberg, P D Tmg, and S B Yao, Query Optmuzauon m Star Computer Networks, A CM Trans on Database Systems 7,4 (December 1982) pp 678-711]]
[22]
G M Lohman, D Darnels, L M Haas, R gastler, P G Sehnger, OptlmtzaUon of Nested Quenes m a Distributed Relational Database, Procs of the Tenth lnternatwnal Conf on Very Large Data Bases (Singapore, 1984) pp 403-415 Also avadable as IBM Research Report RJ4260, San Jose, CA, Aprd 1984]]
[23]
G M Lohman, C Mohan, L M Haas, B G Lmdsay, P G Sehnger, P F Wllms, and D Darnels, Query Processing m R*, Query Processing m Database Systems, Sprmger-Verlag (Klm, Batory, & Remer (eds), 1985) pp 31-47 Also available as IBM Research Report R.14272, San Jose, CA, April 1984]]
[24]
L F Mackert and G M Lohman, Index Scans using a Flmte LRU Buffer A Validated I/O Model, IBM Research Report RJ4836 (Almaden Research Center, San Jose, CA,]]
[25]
L F Mackert and G M Lohman, R* Optmmmzer Validation and Performance Evaluation for Distributed Quenes, IBM Research Report RJ5050 (Almaden Research Center, San jose, CA,]]
[26]
E Onuegbe, S Rahum, and A R Hevner, Local Query Translation and Optmuzation m a Dmtnbuted System, Procs NCC 1983 (July 1983) pp 229-239]]
[27]
F P Palermo, A Data Base Search Problem, Informatwn Systems COINS IV (Procs of COINS-72 Symp) (Mmnu, FL, 1974) pp 67-101 Plenum Press, JuhusT Tou, ed]]
[28]
RDT Relational Design Tool, IBM Reference Manual SH20-6415 (IBM Corp, June 1984)]]
[29]
G M Sacco and M Schkolmck, A Mechamsm for Managing the Buffer Pool m a Relational Database System using the Hot Set Model, Procs of the Eighth International Conf on Very Large Data Bases (Mexico City, 1982) pp 257-262]]
[30]
M Schkolmck and P Ttbeno, Considerations m Developing a Design Tool for a Relational DBMS, Proc IEEE COMPSAC Conf (Cbacago, November 1979) pp 228-235]]
[31]
M Schkolmck and P Tlbeno, Estimating the Cost of Updates m a Relational Database, A CM Tram on Database Systems 10,2 (June 1985) pp 163-179 Also available as IBM Research Report RJ3327, San Jose, CA, 1981]]
[32]
P G Selmger, M M Astrahan, D D Chamberhn, R A Lone, and T G Price, Access Path Selection m a Relational Database Management System, Procs of ACM-SIGMOD 20 (1979) pp 23-34]]
[33]
P G Sehnger and M Adiba, Access Path Selection m Dtstnbuted Database Management Systems, Procs lnternatwnal Conf on Data Bases (Deen and Hammersly, ed, Umv of Abet pp 204-215]]
[34]
D G Severance and G M Lohman, Differential Fdes Their Apphcation to the Maintenance of Large Databases, ACM Trans on Database Systems 1,3 (September 1976) pp 256-267]]
[35]
SQL/Data System Application Programming for VM/ System Product, Release 3, IBM Reference Manual SH24-5068-0 (IBM Corp, December 1984)]]
[36]
M Stonebraker, Retrospection on a Data Base System, ACM Trans on Database Systems 5,2 (June 1980) pp225-240]]
[37]
M Stonebraker, Operating System Support for Database Management, Commumcatwns of the A CM 24 (July 1981) pp 412-418]]
[38]
M Stonebraker, J V#oodfdl, J Ranstrom, M Murphy, J Kalash, M Carey, K Arnold, Performance Analysts of Distributed Data Base Systems, Database Engineermg 5 (IEEE Computer Society, December 1982) pp 58-65]]
[39]
Network Program Products Planning (MVS, VSE, and VM), IBM Reference Manual SC23-0110-1 (IBM Corp, April 1985)]]
[40]
E Wong and K Youssefl, Decomposition E a Strategy for Query Processing, A CM Tram on Database Systems 1,3 (September 1976) pp 223-241]]
[41]
S B Yao, Optmnzation of Query Algorithms, ACM Trans on Database Systems 4,2 (June 1979) pp 133-155]]
[42]
C T Yu, and C C Chang, On the Design of a Query Processing Strategy m a D#stnbuted Database Enwronment, Proc SIGMOD 83 (San Jose, CA, May 1983) pp 30-39]]

Cited By

View all
  • (2024)Techniques Used in Time Series Databases and Their Internals2024 9th International Conference on Big Data Analytics (ICBDA)10.1109/ICBDA61153.2024.10607196(218-226)Online publication date: 16-Mar-2024
  • (2024)Energy consumption estimation and profiling for queries in distributed database systems based on a bottom-up comprehensive energy modelFuture Generation Computer Systems10.1016/j.future.2024.04.059159(379-394)Online publication date: Oct-2024
  • (2023)Time-limited Bloom FilterProceedings of the 38th ACM/SIGAPP Symposium on Applied Computing10.1145/3555776.3577791(1285-1288)Online publication date: 27-Mar-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGMOD Record
ACM SIGMOD Record  Volume 15, Issue 2
June 1986
407 pages
ISSN:0163-5808
DOI:10.1145/16856
Issue’s Table of Contents
  • cover image ACM Conferences
    SIGMOD '86: Proceedings of the 1986 ACM SIGMOD international conference on Management of data
    June 1986
    407 pages
    ISBN:0897911911
    DOI:10.1145/16894
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 June 1986
Published in SIGMOD Volume 15, Issue 2

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)147
  • Downloads (Last 6 weeks)37
Reflects downloads up to 15 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Techniques Used in Time Series Databases and Their Internals2024 9th International Conference on Big Data Analytics (ICBDA)10.1109/ICBDA61153.2024.10607196(218-226)Online publication date: 16-Mar-2024
  • (2024)Energy consumption estimation and profiling for queries in distributed database systems based on a bottom-up comprehensive energy modelFuture Generation Computer Systems10.1016/j.future.2024.04.059159(379-394)Online publication date: Oct-2024
  • (2023)Time-limited Bloom FilterProceedings of the 38th ACM/SIGAPP Symposium on Applied Computing10.1145/3555776.3577791(1285-1288)Online publication date: 27-Mar-2023
  • (2022)PreKar: A learned performance predictor for knowledge graph storesWorld Wide Web10.1007/s11280-022-01033-226:1(321-341)Online publication date: 23-Mar-2022
  • (2018)SPARTIProceedings of the International Workshop on Semantic Big Data10.1145/3208352.3208356(1-6)Online publication date: 10-Jun-2018
  • (2018)Query optimization through the looking glass, and what we found running the Join Order BenchmarkThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-017-0480-727:5(643-668)Online publication date: 1-Oct-2018
  • (2018)WORQ: Workload-Driven RDF Query ProcessingThe Semantic Web – ISWC 201810.1007/978-3-030-00671-6_34(583-599)Online publication date: 8-Oct-2018
  • (2018)SemijoinEncyclopedia of Database Systems10.1007/978-1-4614-8265-9_706(3436-3440)Online publication date: 7-Dec-2018
  • (2018)Distributed JoinEncyclopedia of Database Systems10.1007/978-1-4614-8265-9_705(1191-1196)Online publication date: 7-Dec-2018
  • (2017)Extended Adaptive Join Operator with Bind-Bloom Join for Federated SPARQL QueriesInternational Journal of Data Warehousing and Mining10.4018/IJDWM.201707010313:3(47-72)Online publication date: 1-Jul-2017
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media