Extended abstract, DOI: 10.1145/3299869.3300096

Scalable Reservoir Sampling on Many-Core CPUs

Published: 25 June 2019

Abstract

Database systems need to be able to convert queries into efficient execution plans. As recent research has shown, correctly estimating the cardinalities of subqueries is an important factor in the efficiency of the resulting plans [7, 8]. Many algorithms have been proposed in the literature that use a random sample to estimate cardinalities [6, 9, 13]. Consequently, some modern database systems store a materialized, uniformly random sample for each of their relations [3, 6]. Such samples are built and refreshed when statistics are gathered, by loading uniformly random tuples of the relation from disk using random I/O.
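To make the idea of a materialized uniform sample concrete, the sketch below shows two textbook pieces, under the assumption of a single-threaded setting. First, the simple reservoir algorithm (Algorithm R) described by Vitter [12] builds a fixed-size uniform sample in one sequential pass, without knowing the relation size in advance. Second, a simple scale-up estimator turns the fraction of qualifying sample tuples into a cardinality estimate for a single-table predicate. This is not the parallel, many-core method of this paper; the function names `reservoir_sample` and `estimate_cardinality` are hypothetical and chosen for illustration.

```python
import random
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Hypothetical helper: uniform sample of up to k items (Algorithm R).

    After n items have been seen, each of them is in the reservoir with
    probability k / n (or 1 if n <= k), without knowing n in advance.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):  # i is the 0-based position
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1); if kept, it
            # replaces a uniformly chosen slot of the reservoir.
            j = rng.randint(0, i)  # inclusive bounds: 0..i
            if j < k:
                reservoir[j] = item
    return reservoir


def estimate_cardinality(sample: List[T], relation_size: int,
                         predicate: Callable[[T], bool]) -> float:
    """Hypothetical helper: scale the qualifying fraction of the sample
    up to the size of the whole relation."""
    if not sample:
        return 0.0
    matches = sum(1 for t in sample if predicate(t))
    return matches / len(sample) * relation_size


if __name__ == "__main__":
    n = 100_000
    rows = range(n)  # stand-in for the tuples of one relation
    sample = reservoir_sample(rows, 1_000, random.Random(42))
    # True cardinality of "r % 10 == 0" is 10,000; the estimate varies
    # with the random sample.
    print(estimate_cardinality(sample, n, lambda r: r % 10 == 0))
```

Because Algorithm R reads the input sequentially, it avoids the random I/O mentioned above, but each item still passes through one shared reservoir; scaling that pass across many cores is the problem this paper addresses.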

References

[1] Mohammed Al-Kateb, Byung Suk Lee, and Xiaoyang Sean Wang. 2007. Adaptive-Size Reservoir Sampling over Data Streams. In SSDBM. IEEE Computer Society, 22.
[2] Gustavo Alonso. 2013. Hardware killed the software star. In ICDE. IEEE Computer Society, 1--4.
[3] Surajit Chaudhuri, Eric Christensen, Goetz Graefe, Vivek R. Narasayya, and Michael J. Zwilling. 1999. Self-Tuning Technology in Microsoft SQL Server. IEEE Data Eng. Bull., Vol. 22, 2 (1999), 20--26.
[4] Michael Greenwald. 1999. Non-blocking Synchronization and System Design. Technical Report. Stanford, CA, USA.
[5] Viktor Leis, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2014. Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In SIGMOD Conference. ACM, 743--754.
[6] Viktor Leis, Bernhard Radke, Andrey Gubichev, Alfons Kemper, and Thomas Neumann. 2017. Cardinality Estimation Done Right: Index-Based Join Sampling. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8--11, 2017, Online Proceedings.
[7] Viktor Leis, Bernhard Radke, Andrey Gubichev, Atanas Mirchev, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2018. Query optimization through the looking glass, and what we found running the Join Order Benchmark. VLDB J., Vol. 27, 5 (2018), 643--668.
[8] Guido Moerkotte, Thomas Neumann, and Gabriele Steidl. 2009. Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors. PVLDB, Vol. 2, 1 (2009), 982--993.
[9] Magnus Müller, Guido Moerkotte, and Oliver Kolb. 2018. Improved Selectivity Estimation by Combining Knowledge from Sampling and Synopses. PVLDB, Vol. 11, 9 (2018), 1016--1028.
[10] Peter Sanders, Sebastian Lamm, Lorenz Hübschle-Schneider, Emanuel Schrade, and Carsten Dachsbacher. 2018. Efficient Parallel Random Sampling - Vectorized, Cache-Efficient, and Online. ACM Trans. Math. Softw., Vol. 44, 3 (2018), 29:1--29:14.
[11] Herb Sutter. 2005. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software. Dr. Dobb's Journal, Vol. 30, 3 (2005), 202--210. http://www.gotw.ca/publications/concurrency-ddj.htm
[12] Jeffrey Scott Vitter. 1985. Random Sampling with a Reservoir. ACM Trans. Math. Softw., Vol. 11, 1 (1985), 37--57.
[13] Wentao Wu, Yun Chi, Shenghuo Zhu, Jun'ichi Tatemura, Hakan Hacigümüs, and Jeffrey F. Naughton. 2013. Predicting query execution time: Are optimizer cost models really unusable? In 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8--12, 2013. 1081--1092.

Cited By

  • (2020) Concurrent online sampling for all, for free. Proceedings of the 16th International Workshop on Data Management on New Hardware, 1--8. DOI: 10.1145/3399666.3399924. Online publication date: 15-Jun-2020.

Information

Published In

ACM Conferences
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
June 2019
2106 pages
ISBN:9781450356435
DOI:10.1145/3299869
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. main memory
  2. many-core
  3. multi-core
  4. online
  5. online sampling
  6. parallel
  7. reservoir
  8. reservoir sampling
  9. sampling
  10. scalable
  11. shared memory

Qualifiers

  • Extended-abstract

Conference

SIGMOD/PODS '19: International Conference on Management of Data
June 30 - July 5, 2019
Amsterdam, Netherlands

Acceptance Rates

SIGMOD '19 paper acceptance rate: 88 of 430 submissions, 20%
Overall acceptance rate: 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 1
Reflects downloads up to 04 Oct 2024