DOI: 10.1145/170035.170055

An instant and accurate size estimation method for joins and selections in a retrieval-intensive environment

Published: 01 June 1993

Abstract

This paper proposes a novel strategy, based on a regression model, for estimating the size of the relation that results from an equi-join and selection. An approximating series that represents the underlying data distribution and dependency is derived from the actual data. The proposed method provides an instant and accurate size estimate by evaluating the series, with no run-time overhead in page faults or space and with negligible CPU overhead. In contrast, the popular sampling methods incur run-time overhead in page faults (for sampling), CPU time, and space, which increases query response time. The results of a comprehensive experimental study are also reported; they show that the estimation accuracy of the proposed method is comparable to that of the sampling methods, which are believed to provide the most accurate estimates. The proposed method seems ideal for retrieval-intensive database and information systems. Because the overhead of deriving the approximating series is fairly moderate, we believe the method is also highly competitive when moderate or periodic updates are present.
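The abstract does not specify the form of the approximating series, so the following is only a minimal sketch of the general idea for a single-attribute range selection, not the paper's method. It assumes a least-squares polynomial fitted offline to the cumulative tuple count of one attribute; the names `fit_polynomial`, `build_model`, `cumulative`, and `estimate_range`, the polynomial degree, and the grid size are all hypothetical choices made for illustration. At query time the estimate is just two polynomial evaluations, with no data access.

```python
import bisect


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Normal equations: a[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution; coeffs[i] multiplies x^i.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def build_model(values, lo, hi, degree=3, grid_points=51):
    """Offline step: fit a polynomial to the cumulative tuple count.

    The attribute domain [lo, hi] is rescaled to [0, 1] so the normal
    equations stay well conditioned (an assumption of this sketch).
    """
    ordered = sorted(values)
    us = [k / (grid_points - 1) for k in range(grid_points)]
    counts = [bisect.bisect_right(ordered, lo + u * (hi - lo)) for u in us]
    return {"lo": lo, "hi": hi, "coeffs": fit_polynomial(us, counts, degree)}


def cumulative(model, x):
    """Estimated number of tuples with attribute value <= x."""
    u = (x - model["lo"]) / (model["hi"] - model["lo"])
    u = min(1.0, max(0.0, u))
    result = 0.0
    for c in reversed(model["coeffs"]):  # Horner evaluation
        result = result * u + c
    return result


def estimate_range(model, a, b):
    """Run-time step: estimated size of the selection a < attr <= b."""
    return max(0.0, cumulative(model, b) - cumulative(model, a))


if __name__ == "__main__":
    # Skewed synthetic data, invented for this example only.
    data = [1000.0 * (i / 10000) ** 0.5 for i in range(1, 10001)]
    model = build_model(data, 0.0, 1000.0)
    actual = sum(1 for v in data if 200.0 < v <= 500.0)
    print("estimated:", round(estimate_range(model, 200.0, 500.0)), "actual:", actual)
```

The split between `build_model` and `estimate_range` mirrors the trade-off the abstract describes: the fitting cost is paid when the data is loaded or periodically refreshed, while each query-time estimate touches only the stored coefficients.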

Published In

SIGMOD '93: Proceedings of the 1993 ACM SIGMOD international conference on Management of data
June 1993
566 pages
ISBN: 0897915925
DOI: 10.1145/170035

Publisher

Association for Computing Machinery

New York, NY, United States
