The optimization of queries in relational databases
Publisher:
  • Case Western Reserve University
  • Computer Engineering and Science, 10900 Euclid Avenue, Cleveland, OH
  • United States
Order Number: AAI8109596
Pages: 168
Abstract

A fully implemented system for optimizing and executing queries for relational databases is described. The system optimizes n-table, equi-join queries written in QUEL, the query language supported by the INGRES relational database management system (DBMS). Tenfold and greater improvements in response time to complex queries have been achieved compared to INGRES.

The system models query execution plans as binary trees in which each node is one of four operator types: join, restrict-project, reformat, and disk-resident-scan. This model is shown to include many previously published algorithms in addition to many new ones. The system also uses histograms to represent information about attribute distributions. This information is used to determine query execution cost accurately, and it results in queries that run approximately thirty percent faster than when the distributional information is not used (i.e., when uniform distributions are assumed).
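
To illustrate why histogram information can change a cost estimate, the following is a minimal sketch, not the dissertation's implementation. The abstract does not specify the histogram format, so the bucket layout (integer value ranges with a row count each), the names `Bucket`, `estimate_range_rows`, and `uniform_estimate`, and the sample data are all assumptions made for illustration. It compares the estimated number of qualifying rows for a range restriction under a histogram and under the uniform assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """Hypothetical histogram bucket (layout assumed, not from the source)."""
    lo: int    # inclusive lower bound of the value range
    hi: int    # inclusive upper bound of the value range
    rows: int  # number of tuples whose attribute value lies in [lo, hi]


def estimate_range_rows(buckets, lo, hi):
    """Estimate rows with lo <= value <= hi.

    Values are assumed to be spread evenly inside each bucket, so a
    partially covered bucket contributes a proportional share of its rows.
    """
    total = 0.0
    for b in buckets:
        overlap = min(hi, b.hi) - max(lo, b.lo) + 1
        if overlap > 0:
            total += b.rows * overlap / (b.hi - b.lo + 1)
    return total


def uniform_estimate(buckets, lo, hi):
    """Same estimate using a single bucket over the whole domain,
    which is the uniform-distribution assumption."""
    whole = Bucket(
        min(b.lo for b in buckets),
        max(b.hi for b in buckets),
        sum(b.rows for b in buckets),
    )
    return estimate_range_rows([whole], lo, hi)


# Invented, skewed sample data: most rows have small attribute values.
skewed = [Bucket(1, 10, 900), Bucket(11, 50, 80), Bucket(51, 100, 20)]

hist_est = estimate_range_rows(skewed, 1, 5)  # uses per-bucket counts
unif_est = uniform_estimate(skewed, 1, 5)     # ignores the skew

print(hist_est, unif_est)
```

On this invented data the two estimates differ by a factor of nine (450 rows versus 50), the kind of gap that could lead an optimizer to pick a different plan. How the described system actually builds and uses its histograms is not stated in the abstract.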

Measurements are given comparing execution times for the new system and INGRES, comparing estimated and actual disk I/O with actual query execution time, and comparing the use of accurate distributional information stored in histograms with the use of uniform distributions.

