Machine Learning-based Cardinality Estimation in DBMS on Pre-Aggregated Data

Woltmann, Lucas; Hartmann, Claudio; Habich, Dirk; Lehner, Wolfgang

doi:10.1007/s13222-021-00400-z


arXiv:2005.09367 (cs)

[Submitted on 19 May 2020]


Authors: Lucas Woltmann (1), Claudio Hartmann (1), Dirk Habich (1), Wolfgang Lehner (1) ((1) TU Dresden)


Abstract: Cardinality estimation is a fundamental task in database query processing and optimization. Recent work has shown that machine learning (ML)-based approaches can deliver more accurate cardinality estimates than traditional approaches. However, learning a data-dependent ML model requires executing many example queries during training, which makes the training phase very time-consuming. Many of these example queries use the same base data, have the same query structure, and differ only in their predicates. At first glance, index structures appear to be an ideal optimization technique, but their benefit is limited. To speed up model training, our core idea is to determine a predicate-independent pre-aggregation of the base data and to execute the example queries over this pre-aggregated data. Based on this idea, we present a specific aggregate-enabled training phase for ML-based cardinality estimation approaches. As our evaluation with different workloads shows, the aggregate-enabled training phase achieves an average speedup of 63.
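
The core idea of the abstract, answering many count queries that differ only in their predicates from one shared pre-aggregate, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy table, the column names `a` and `b`, and the helper functions `pre_aggregate`, `count_on_base`, and `count_on_aggregate` are all assumptions made for illustration. The sketch groups the base rows once by the predicate columns, stores a count per distinct combination, and then computes the cardinality label of each example query by summing the stored counts of the matching groups.

```python
from collections import Counter
import random


def pre_aggregate(rows, columns):
    """Group rows by `columns` once and keep a count per distinct value combination."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


def count_on_base(rows, predicate):
    """Baseline: evaluate the predicate on every base row."""
    return sum(1 for row in rows if predicate(row))


def count_on_aggregate(aggregate, columns, predicate):
    """Evaluate the predicate once per group and add up the stored counts."""
    total = 0
    for key, group_count in aggregate.items():
        group_row = dict(zip(columns, key))
        if predicate(group_row):
            total += group_count
    return total


# Hypothetical toy table: 10,000 rows over two low-cardinality columns.
random.seed(0)
columns = ("a", "b")
base = [
    {"a": random.randint(0, 9), "b": random.randint(0, 4)}
    for _ in range(10_000)
]

# The aggregate does not depend on any predicate, so it is built only once.
agg = pre_aggregate(base, columns)

# Example queries share the base data and structure; only the predicates differ.
example_queries = [
    lambda r: r["a"] < 3,
    lambda r: r["a"] >= 5 and r["b"] == 2,
    lambda r: r["b"] != 0,
]

labels_base = [count_on_base(base, q) for q in example_queries]
labels_agg = [count_on_aggregate(agg, columns, q) for q in example_queries]
```

In this toy setting the aggregate holds at most 50 groups instead of 10,000 rows, so each example query touches far fewer tuples while producing the same count. How much is saved in practice depends on how many distinct value combinations the predicate columns have; the speedup reported in the abstract comes from the authors' evaluation, not from this sketch.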

Comments:	10 pages, technical report
Subjects:	Databases (cs.DB)
Cite as:	arXiv:2005.09367 [cs.DB]
	(or arXiv:2005.09367v1 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.2005.09367
Journal reference:	Datenbank-Spektrum 22 (2022) 1-13
Related DOI:	https://doi.org/10.1007/s13222-021-00400-z

Submission history

From: Lucas Woltmann
[v1] Tue, 19 May 2020 11:24:36 UTC (1,344 KB)
