Query optimization in compressed database systems

Authors:

Johannes Gehrke,

Flip Korn

ACM SIGMOD Record, Volume 30, Issue 2

Pages 271 - 282

https://doi.org/10.1145/376284.375692

Published: 01 May 2001

Abstract

Over the last decades, improvements in CPU speed have outpaced improvements in main memory and disk access rates by orders of magnitude, enabling the use of data compression techniques to improve the performance of database systems. Previous work describes the benefits of compression for numerical attributes, where data is stored in compressed format on disk. Despite the abundance of string-valued attributes in relational schemas, there is little work on compression for string attributes in a database context. Moreover, none of the previous work suitably addresses the role of the query optimizer: during query execution, data is either eagerly decompressed when it is read into main memory, or it lazily stays compressed in main memory and is decompressed only on demand.
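The difference between the two strategies can be illustrated with a small sketch. It is not taken from the paper: the table layout, the function names, and the use of `zlib` as a stand-in compressor are all assumptions made for illustration. It only counts how many decompressions each strategy performs for a scan that filters on an uncompressed key and returns the compressed string attribute.

```python
import zlib

def make_table(n):
    # Hypothetical table: (integer key, zlib-compressed string attribute).
    return [(i, zlib.compress(f"customer-{i}".encode())) for i in range(n)]

def eager_scan(table, predicate):
    """Eager: decompress every string as the row is read, then filter."""
    out, decompressions = [], 0
    for key, blob in table:
        name = zlib.decompress(blob).decode()
        decompressions += 1
        if predicate(key):
            out.append(name)
    return out, decompressions

def lazy_scan(table, predicate):
    """Lazy: keep strings compressed and decompress only qualifying rows."""
    out, decompressions = [], 0
    for key, blob in table:
        if predicate(key):
            out.append(zlib.decompress(blob).decode())
            decompressions += 1
    return out, decompressions
```

For a selective filter the lazy scan decompresses far fewer values. This toy scan says nothing about plans with joins or repeated accesses to the same value, which is where the choice of decompression point becomes a question for the optimizer.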

In this paper, we present an effective approach for database compression based on lightweight, attribute-level compression techniques. We propose a Hierarchical Dictionary Encoding strategy that intelligently selects the most effective compression method for string-valued attributes. We show that eager and lazy decompression strategies produce sub-optimal plans for queries involving compressed string attributes. We then formalize the problem of compression-aware query optimization and propose one provably optimal and two fast heuristic algorithms for selecting a query plan for relational schemas with compressed attributes; our algorithms can easily be integrated into existing cost-based query optimizers. Experiments using TPC-H data demonstrate the impact of our string compression methods and show the importance of compression-aware query optimization. Our approach results in up to an order of magnitude speed-up over existing approaches.
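As a rough illustration of attribute-level dictionary compression for strings, the sketch below encodes a column as integer codes and evaluates an equality predicate directly on the codes. This is plain single-level dictionary encoding, not the Hierarchical Dictionary Encoding strategy proposed in the paper, and all names (`build_dictionary`, `encode`, `select_equal`) are hypothetical.

```python
def build_dictionary(values):
    # Map each distinct string to a dense integer code (sorted order).
    distinct = sorted(set(values))
    code_of = {v: i for i, v in enumerate(distinct)}
    return code_of, distinct  # distinct[code] decodes a code

def encode(values, code_of):
    # Replace every string in the column by its integer code.
    return [code_of[v] for v in values]

def select_equal(codes, code_of, literal):
    """Return row ids where the attribute equals `literal`,
    comparing integer codes instead of decompressed strings."""
    code = code_of.get(literal)
    if code is None:  # literal absent from the column: no row can match
        return []
    return [row for row, c in enumerate(codes) if c == code]
```

The literal is translated to its code once, so the scan compares small integers and never decodes a string; decoding through the `distinct` list is needed only for values that are actually returned.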

Published In

ACM SIGMOD Record Volume 30, Issue 2

June 2001

625 pages

ISSN:0163-5808

DOI:10.1145/376284

Editors:
Timos Sellis (National Technical Univ. of Athens),
Sharad Mehrotra (Univ. of California at Irvine)

SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data
May 2001
630 pages
ISBN:1581133324
DOI:10.1145/375663
Editors:
Timos Sellis,
Sharad Mehrotra

Copyright © 2001 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article
