

FactorJoin: A New Cardinality Estimation Framework for Join Queries

Authors:

Samuel Madden

Proceedings of the ACM on Management of Data, Volume 1, Issue 1

Article No.: 41, Pages 1 - 27

https://doi.org/10.1145/3588721

Published: 30 May 2023


Abstract

Cardinality estimation is one of the most fundamental and challenging problems in query optimization. Neither classical nor learning-based methods perform satisfactorily when estimating the cardinality of join queries. They either rely on simplifying assumptions that lead to poor cardinality estimates, or they build large models to capture complex data distributions, which leads to long planning times and poor generalizability across queries.

In this paper, we propose FactorJoin, a new framework for estimating the cardinality of join queries. FactorJoin combines the idea behind the classical join-histogram method, which handles joins efficiently, with learning-based methods, which accurately capture attribute correlations. Specifically, during an offline preparation phase, FactorJoin scans every table in the database and builds single-table conditional distributions. When a join query arrives, FactorJoin translates it into a factor graph model over the learned distributions to estimate its cardinality effectively and efficiently.
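The combination described above can be illustrated, very loosely, for the simplest case of tables equi-joined on a single key. The following is a minimal Python sketch under stated assumptions, not FactorJoin's implementation. Each table is scanned once to build per-bucket statistics of its join key, restricted to the rows that pass the query's filter; exact filtered counts stand in for the learned conditional distributions. The join is then treated as a one-variable factor graph whose variable is the key bucket: each table contributes one factor, and eliminating the variable amounts to summing over buckets. The per-bucket formula is the textbook join-histogram estimate under uniformity and containment assumptions. The function names `bucket_stats` and `estimate_join`, the equal-width binning, and the toy tables are all hypothetical.

```python
from collections import defaultdict
from math import prod
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
# Hypothetical per-table summary: bucket id -> (filtered row count, distinct keys)
BucketStats = Dict[int, Tuple[int, int]]


def bucket_stats(rows: List[Row], key: str, width: int,
                 pred: Callable[[Row], bool] = lambda r: True) -> BucketStats:
    """Single-table pass: per-bucket row count and distinct-key count,
    restricted to rows that satisfy the filter predicate `pred`.
    (Exact counts here stand in for a learned conditional distribution.)"""
    keys_per_bucket: Dict[int, List[int]] = defaultdict(list)
    for r in rows:
        if pred(r):
            # Equal-width binning of an integer join key (illustrative choice)
            keys_per_bucket[r[key] // width].append(r[key])
    return {b: (len(ks), len(set(ks))) for b, ks in keys_per_bucket.items()}


def estimate_join(tables: List[BucketStats]) -> float:
    """Join-histogram estimate for tables equi-joined on the same key.

    Viewed as a factor graph with a single variable (the key bucket),
    each table contributes a factor over buckets; summing out the
    variable gives the total estimate."""
    # Only buckets present in every table can produce join results
    shared = set.intersection(*(set(t) for t in tables))
    total = 0.0
    for b in shared:
        counts = [t[b][0] for t in tables]
        ndvs = sorted(t[b][1] for t in tables)
        # Containment assumption: divide by every distinct count except the smallest
        total += prod(counts) / prod(ndvs[1:])
    return total


if __name__ == "__main__":
    # Toy query: A JOIN B ON A.id = B.a_id WHERE A.x = 0
    A = [{"id": k, "x": k % 2} for k in range(8)]
    B = [{"a_id": k} for k in [0, 0, 1, 2, 2, 2, 5, 7]]
    stats_a = bucket_stats(A, "id", width=4, pred=lambda r: r["x"] == 0)
    stats_b = bucket_stats(B, "a_id", width=4)
    true_size = sum(1 for a in A if a["x"] == 0 for b in B if a["id"] == b["a_id"])
    print(estimate_join([stats_a, stats_b]), true_size)
```

On the toy data, the sketch estimates 6.0 rows while the true join has 5 rows; the gap comes from applying the uniformity and containment assumptions within each bucket. Because each table's statistics are computed independently, the same per-table summaries can be reused for any join that involves that table, which is the property the abstract attributes to relying only on single-table statistics.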

Unlike existing learning-based methods, FactorJoin does not need to denormalize joins upfront or require executed query workloads for training. Because it relies only on single-table statistics, FactorJoin has a small space overhead and is easy to train and maintain. In our evaluation, FactorJoin produces more effective estimates than previous state-of-the-art learning-based methods, with 40x lower estimation latency, 100x smaller model size, and 100x faster training at comparable or better accuracy. In addition, FactorJoin can estimate the cardinalities of 10,000 sub-plan queries within one second when optimizing a query plan, which is close to the speed of the traditional cardinality estimators in commercial DBMSs.

Supplemental Material

Presentation video (MP4 file, 26.41 MB): short version of "FactorJoin: A New Cardinality Estimation Framework for Join Queries"

Read me (PDF file, 66.89 KB)

Source code (ZIP file, 24.60 MB)

