DOI: 10.1145/375663.375724

Applying the golden rule of sampling for query estimation

Published: 01 May 2001

Abstract

Query size estimation is crucial for many database system components. In particular, query optimizers need efficient and accurate query size estimation when deciding among alternative query plans. In this paper we propose a novel sampling technique based on the golden rule of sampling, introduced by von Neumann in 1947, for estimating range queries. The proposed technique randomly samples the frequency domain using the cumulative frequency distribution and yields good estimates without any a priori knowledge of the actual underlying distribution of spatial objects. We show experimentally that the proposed sampling technique gives smaller approximation error than the Min-Skew histogram-based and wavelet-based approaches for both synthetic and real datasets. Moreover, the proposed technique can be easily extended to higher-dimensional datasets.
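
As a rough illustration of the idea in the abstract, the sketch below applies inverse-transform sampling (the general form of the golden rule) to a one-dimensional frequency table and uses the sample to estimate a range query. It is not the authors' algorithm: the function names `build_cumulative`, `golden_rule_sample`, and `estimate_range_count`, the integer frequencies, and the one-dimensional domain are all assumptions made for this example.

```python
import bisect
import itertools
import random


def build_cumulative(freqs):
    """Return running totals of per-value integer frequencies (the cumulative frequency distribution)."""
    return list(itertools.accumulate(freqs))


def golden_rule_sample(cumulative, k, rng):
    """Draw k value indices by inverting the cumulative frequency distribution.

    Hypothetical helper: a uniform point is picked on the frequency axis and
    mapped back to the value whose cumulative interval contains it.
    """
    total = cumulative[-1]
    samples = []
    for _ in range(k):
        u = rng.randrange(total)  # uniform integer in [0, total)
        # First index i with cumulative[i] > u; values with zero frequency are never chosen.
        samples.append(bisect.bisect_right(cumulative, u))
    return samples


def estimate_range_count(samples, total, lo, hi):
    """Scale the fraction of samples inside [lo, hi] up to the full data size."""
    hits = sum(1 for s in samples if lo <= s <= hi)
    return total * hits / len(samples)


if __name__ == "__main__":
    freqs = [2, 40, 3, 0, 55]          # assumed toy data: count of objects per value 0..4
    cumulative = build_cumulative(freqs)
    rng = random.Random(0)             # fixed seed so runs are repeatable
    samples = golden_rule_sample(cumulative, 1000, rng)
    print(estimate_range_count(samples, cumulative[-1], 1, 2))
```

Because each value is selected with probability proportional to its frequency, the sample follows the data distribution without any prior model of it, which is the property the abstract relies on.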


Published In

SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data
May 2001
630 pages
ISBN: 1581133324
DOI: 10.1145/375663
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cumulative frequency distribution
  2. query estimation
  3. random sampling
  4. range query

Qualifiers

  • Article

Conference

SIGMOD/PODS01

Acceptance Rates

SIGMOD '01 Paper Acceptance Rate 44 of 293 submissions, 15%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 7
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 03 Oct 2024

Cited By

  • (2023) Interpolation and Prediction of Spatiotemporal XML Data Integrated With Grey Dynamic Model. Uncertain Spatiotemporal Data Management for the Semantic Web (193-210). DOI: 10.4018/978-1-6684-9108-9.ch011. Online publication date: 15-Dec-2023
  • (2021) MOSE: A Monotonic Selectivity Estimator Using Learned CDF. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2021.3112753. Online publication date: 2021
  • (2019) Selectivity estimation for range predicates using lightweight models. Proceedings of the VLDB Endowment 12:9 (1044-1057). DOI: 10.14778/3329772.3329780. Online publication date: 1-May-2019
  • (2017) Interpolation and Prediction of Spatiotemporal Data Based on XML Integrated with Grey Dynamic Model. ISPRS International Journal of Geo-Information 6:4 (113). DOI: 10.3390/ijgi6040113. Online publication date: 7-Apr-2017
  • (2011) The VC-dimension of SQL queries and selectivity estimation through sampling. Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II (661-676). DOI: 10.5555/2034117.2034160. Online publication date: 5-Sep-2011
  • (2011) The VC-dimension of SQL queries and selectivity estimation through sampling. Proceedings of the 2011th European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II (661-676). DOI: 10.1007/978-3-642-23783-6_42. Online publication date: 5-Sep-2011
  • (2009) Fast Query Point Movement Techniques for Large CBIR Systems. IEEE Transactions on Knowledge and Data Engineering 21:5 (729-743). DOI: 10.1109/TKDE.2008.188. Online publication date: 1-May-2009
  • (2009) Survey on Query Estimation in Data Streams. 2009 IEEE International Advance Computing Conference (1417-1422). DOI: 10.1109/IADCC.2009.4809224. Online publication date: Mar-2009
  • (2009) A new approach to building histogram for selectivity estimation in query processing optimization. Computers & Mathematics with Applications 57:6 (1037-1047). DOI: 10.1016/j.camwa.2008.10.056. Online publication date: 1-Mar-2009
  • (2007) Selectivity estimation by batch-query based histogram and parametric method. Proceedings of the eighteenth conference on Australasian database - Volume 63 (93-102). DOI: 10.5555/1273730.1273741. Online publication date: 30-Mar-2007
