STHoles: A multidimensional workload-aware histogram

N Bruno, S Chaudhuri, L Gravano - Proceedings of the 2001 ACM …, 2001 - dl.acm.org
Proceedings of the 2001 ACM SIGMOD international conference on Management of …, 2001 - dl.acm.org
Attributes of a relation are not typically independent. Multidimensional histograms can be an effective tool for accurate multiattribute query selectivity estimation. In this paper, we introduce STHoles, a “workload-aware” histogram that allows bucket nesting to capture data regions with reasonably uniform tuple density. STHoles histograms are built without examining the data sets, but rather by just analyzing query results. Buckets are allocated where needed the most as indicated by the workload, which leads to accurate query selectivity estimations. Our extensive experiments demonstrate that STHoles histograms consistently produce good selectivity estimates across synthetic and real-world data sets and across query workloads, and, in many cases, outperform the best multidimensional histogram techniques that require access to and processing of the full data sets during histogram construction.
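To make the idea of bucket nesting concrete, the following is a minimal sketch of how a nested histogram could estimate the cardinality of a range query. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the estimation formula, so the sketch assumes tuples are spread uniformly over the part of each bucket that is not covered by its child buckets (the "holes"), and that children are disjoint and lie inside their parent. The names `Bucket`, `own_volume`, and `estimate` are hypothetical. Refining buckets from query feedback, which is how the abstract says the histogram is built, is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A box is one (low, high) interval per dimension.
Box = List[Tuple[float, float]]


def volume(box: Box) -> float:
    """Volume of a box; an empty interval in any dimension gives 0."""
    v = 1.0
    for lo, hi in box:
        v *= max(0.0, hi - lo)
    return v


def intersect(a: Box, b: Box) -> Box:
    """Per-dimension intersection; may yield intervals with high < low."""
    return [(max(al, bl), min(ah, bh)) for (al, ah), (bl, bh) in zip(a, b)]


@dataclass
class Bucket:
    box: Box
    freq: float  # assumed: tuple count in the box, excluding child buckets
    children: List["Bucket"] = field(default_factory=list)

    def own_volume(self, region: Optional[Box] = None) -> float:
        """Volume of this bucket (optionally clipped to region) minus its holes."""
        clipped = self.box if region is None else intersect(self.box, region)
        v = volume(clipped)
        for child in self.children:
            v -= volume(intersect(child.box, clipped))
        return v

    def estimate(self, query: Box) -> float:
        """Estimated tuple count in query, assuming uniform density per bucket."""
        total = self.own_volume()
        est = self.freq * self.own_volume(query) / total if total > 0 else 0.0
        return est + sum(child.estimate(query) for child in self.children)


# Hypothetical example: a sparse root bucket with one dense nested bucket.
dense = Bucket(box=[(0.0, 5.0), (0.0, 5.0)], freq=900.0)
root = Bucket(box=[(0.0, 10.0), (0.0, 10.0)], freq=100.0, children=[dense])
left_half = root.estimate([(0.0, 5.0), (0.0, 10.0)])
```

In this example the query covering the left half of the domain picks up all 900 tuples of the dense nested bucket plus one third of the root's 100 tuples, because the query overlaps 25 of the 75 area units the root owns outside its hole. A single flat bucket over the same domain would instead estimate 500, which shows why nesting helps when density varies.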