Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
article
Free access

Index scans using a finite LRU buffer: a validated I/O model

Published: 01 September 1989 Publication History

Abstract

Indexes are commonly employed to retrieve a portion of a file or to retrieve its records in a particular order. An accurate performance model of indexes is essential to the design, analysis, and tuning of file management and database systems, and particularly to database query optimization. Many previous studies have addressed the problem of estimating the number of disk page fetches when randomly accessing k records out of N given records stored on T disk pages. This paper generalizes these results, relaxing two assumptions that usually do not hold in practice: unlimited buffer and unique records for each key value. Experiments show that the performance of an index scan is very sensitive to buffer size limitations and multiple records per key value. A model for these more practical situations is presented and a formula derived for estimating the performance of an index scan. We also give a closed-form approximation that is easy to compute. The theoretical results are validated using the R* distributed relational database system. Although we use database terminology throughout the paper, the model is more generally applicable whenever random accesses are made using keys.

References

[1]
ASTRAHAN, M. M., SCHKOLNICK, M., AND KIM, W. Performance of the System R access path selection mechanism. Inf. Process. 80 (1980), 487-491.
[2]
BITTON, D., AND DEWITT, D.J. Duplicate record elimination in large data files. A CM Trans. Database Syst. 8 (June 1983), 255-265.
[3]
CARDENAS, A.F. Analysis and performance of inverted data base structures. Commun. A CM 18 (May 1975), 253-263.
[4]
CHEUNG, T.-Y. Estimating block accesses and number of records in file management. Commun. ACM 25 (July 1982), 484-487.
[5]
CHOU, H.-T., AND DEWITT, D.J. An evaluation of buffer management strategies for relational database systems. In Proceedings of Eleventh International Conference on Very Large Data Bases (Stockholm, Sept. 1985), pp. 127-141.
[6]
CHRISTODOULAKIS, S. Estimating block transfers and join sizes. In Proceedings of the A CM- SIGMOD Conference (San Jose, Calif., May 1983). ACM, New York, 1983, pp. 40-54.
[7]
CHRISTODOULAKIS, S. Estimating block selectivities. Inf. Syst. 1 (1984), 69-79.
[8]
CHRISTODOULAKIS, S. Implications of certain assumptions in database performance evaluation. ACM Trans. Database Syst. 9 (June 1984), 163-186.
[9]
COMER, D. The ubiquitous B-tree. ACM Comput. Surv. 11 (June 1979), 121-137.
[10]
EFFELSBERG, W., AND HAERDER, T. Principles of database buffer management. ACM Trans. Database Syst. 9 (Dec. 1984), 560-595.
[11]
IJBEMA, A., AND BLANKEN, S. Estimating bucket accesses: A practical approach. In Proceedings of the International Conference on Data Engineering (Los Angeles, Calif., Feb. 1986), pp. 30-37.
[12]
LOHMAN, G. M., ET AL. Query processing in R*. In Query Processing in Database Systems, Springer-Verlag, (also available as IBM Res. Rep. RJ4272, April 1984).
[13]
LUK, W.S. On estimating block accesses in database organizations. Commun. ACM 11 (1983), 945-947.
[14]
MACKERT, L. F., AND LOHMAN, G.M. Index scans using a finite LRU buffer: A validated I/O Model. IBM Res. Rep. RJ4836, San Jose, Calif., Sept. 1985.
[15]
MACKERT, L. F., AND LOHMAN, G.M. R* optimizer validation and performance evaluation for local queries. In Proceedings of ACM-SIGMOD Conference (Washington, D.C., May 1986), pp. 84-95.
[16]
MACKERT, L. F., AND LOHMAN, G.M. R* optimizer validation and performance evaluation for distributed queries. In Proceedings of Twelfth International Conference on Very Large Data Bases (Kyoto, Aug. 1986), pp. 149-159.
[17]
SACCO, G. M., AND SCHKOLNICK, M. A mechanism for managing the buffer pool in a relational database system using the hot set model. In Proceedings of the Eighth International Conference on Very Large Data Bases (Mexico City, 1982), pp. 257-262.
[18]
SELINGER, P. G., ET AL. Access path selection in a relational database management system, in Proceedings of ACM-SIGMOD Conference (1979). ACM, New York, 1979.
[19]
STONEBRAKER, M. Operating system support for database management. Commun. ACM 24 (July 1981), 412-418.
[20]
VANDER ZANDEN, B. T., TAYLOR, H. M., AND BITTON, D. Estimating block accesses when attributes are correlated. In Proceedings of the Twelfth International Conference on Very Large Data Bases (Kyoto, Aug. 1986), pp. 119-127.
[21]
WATERS, S.J. Hit ratios. Computer J. 19 (1976), 21-24.
[22]
WHANG, K.-Y., WIEDERHOLD, G., AND SAGALOWICZ, D. Estimating block accesses in database organizations: A closed noniterative formula. Commun. Au~ zo (l~ov. 1983), 940-944.
[23]
YAO, S.B. Approximating block accesses in database organizations. Commun. ACM 20 {April 1977), 260-261.

Cited By

View all
  • (2014)Defining Energy Consumption Plans for Data Querying ProcessesProceedings of the 2014 IEEE Fourth International Conference on Big Data and Cloud Computing10.1109/BDCloud.2014.109(641-647)Online publication date: 3-Dec-2014
  • (2012)Making cost-based query optimization asymmetry-awareProceedings of the Eighth International Workshop on Data Management on New Hardware10.1145/2236584.2236588(24-32)Online publication date: 21-May-2012
  • (2011)A revisit of fault class hierarchies in general boolean specificationsACM Transactions on Software Engineering and Methodology10.1145/2000791.200079720:3(1-11)Online publication date: 26-Aug-2011
  • Show More Cited By

Index Terms

  1. Index scans using a finite LRU buffer: a validated I/O model

                        Recommendations

                        Reviews

                        Fazli Can

                        Most database optimizers try to estimate the I/O cost of a query in order to choose the cheapest access path (index or sequential scan) using information available during compile time. Studies on the I/O cost of random accesses using indexes usually rely on two assumptions: unlimited buffer and unique records for each key value. This paper studies the problem while relaxing these assumptions. The authors first introduce the problem with a good literature review and provide terminology, which improves the readability of the paper. In the main body of the paper, they derive an iterative formula for I/O cost in terms of the number of page fetches from disk to a finite LRU buffer when scanning an unclustered index. Next, they derive two easy-to-compute approximations of this formula that are applicable in the typical situation of low duplication of index keys. Numerous experiments validate the theoretical results. In the experiments, the authors compare their I/O cost estimates with the actual I/O cost using the R* distributed relational DBMS, with the System R estimate, and with another useful estimate. In the conclusion of the paper, the authors provide a list of problems as a pointer to future research. These problems are the effects of (a) data values drawn from a nonuniform distribution, (b) records not randomly placed on pages, and (c) a variable number of records per page. This well-written paper is useful for researchers working on physical database design and performance evaluation. Its results are applicable whenever random accesses are made using indexes.

                        Access critical reviews of Computing literature here

                        Become a reviewer for Computing Reviews.

                        Comments

                        Information & Contributors

                        Information

                        Published In

                        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                        Publisher

                        Association for Computing Machinery

                        New York, NY, United States

                        Publication History

                        Published: 01 September 1989
                        Published in TODS Volume 14, Issue 3

                        Permissions

                        Request permissions for this article.

                        Check for updates

                        Qualifiers

                        • Article

                        Contributors

                        Other Metrics

                        Bibliometrics & Citations

                        Bibliometrics

                        Article Metrics

                        • Downloads (Last 12 months)139
                        • Downloads (Last 6 weeks)12
                        Reflects downloads up to 15 Oct 2024

                        Other Metrics

                        Citations

                        Cited By

                        View all
                        • (2014)Defining Energy Consumption Plans for Data Querying ProcessesProceedings of the 2014 IEEE Fourth International Conference on Big Data and Cloud Computing10.1109/BDCloud.2014.109(641-647)Online publication date: 3-Dec-2014
                        • (2012)Making cost-based query optimization asymmetry-awareProceedings of the Eighth International Workshop on Data Management on New Hardware10.1145/2236584.2236588(24-32)Online publication date: 21-May-2012
                        • (2011)A revisit of fault class hierarchies in general boolean specificationsACM Transactions on Software Engineering and Methodology10.1145/2000791.200079720:3(1-11)Online publication date: 26-Aug-2011
                        • (2011)FlagRemoverACM Transactions on Software Engineering and Methodology10.1145/2000791.200079620:3(1-33)Online publication date: 26-Aug-2011
                        • (2011)Reducing the effort of bug report triageACM Transactions on Software Engineering and Methodology10.1145/2000791.200079420:3(1-35)Online publication date: 26-Aug-2011
                        • (2011)PrIMeACM Transactions on Software Engineering and Methodology10.1145/2000791.200079220:3(1-42)Online publication date: 26-Aug-2011
                        • (2010)Online monitoring and visualisation of database structural deteriorationInternational Journal of Autonomic Computing10.1504/IJAC.2010.0330111:3(297-323)Online publication date: 1-May-2010
                        • (2008)Diagnosing Estimation Errors in Page Counts Using Execution FeedbackProceedings of the 2008 IEEE 24th International Conference on Data Engineering10.1109/ICDE.2008.4497510(1013-1022)Online publication date: 7-Apr-2008
                        • (2007)Construction of tree-based indexes for level-contiguous buffering supportProceedings of the 12th international conference on Database systems for advanced applications10.5555/1783823.1783866(361-373)Online publication date: 9-Apr-2007
                        • (2007)Towards Adaptive Costing of Database Access MethodsProceedings of the 2007 IEEE 23rd International Conference on Data Engineering Workshop10.1109/ICDEW.2007.4401030(469-477)Online publication date: 17-Apr-2007
                        • Show More Cited By

                        View Options

                        View options

                        PDF

                        View or Download as a PDF file.

                        PDF

                        eReader

                        View online with eReader.

                        eReader

                        Get Access

                        Login options

                        Full Access

                        Media

                        Figures

                        Other

                        Tables

                        Share

                        Share

                        Share this Publication link

                        Share on social media