Integrating compression and execution in column-oriented database systems

Published: 27 June 2006

Abstract

Column-oriented database system architectures invite a re-evaluation of how and when data in databases is compressed. Storing data in a column-oriented fashion greatly increases the similarity of adjacent records on disk and thus opportunities for compression. The ability to compress many adjacent tuples at once lowers the per-tuple cost of compression, both in terms of CPU and space overheads. In this paper, we discuss how we extended C-Store (a column-oriented DBMS) with a compression sub-system. We show how compression schemes not traditionally used in row-oriented DBMSs can be applied to column-oriented systems. We then evaluate a set of compression schemes and show that the best scheme depends not only on the properties of the data but also on the nature of the query workload.
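
The abstract's point about adjacent similarity can be illustrated with a small sketch. The abstract does not name specific compression schemes, so run-length encoding is used here only as an assumed example, and the table, column names, and helper functions (`rle_encode`, `count_equal`) are hypothetical rather than taken from C-Store. The sketch stores one attribute contiguously, compresses it into runs, and answers a count query from the run lengths without decompressing.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def count_equal(runs, target):
    """Count rows equal to target by summing run lengths (no decompression)."""
    return sum(length for value, length in runs if value == target)


# Hypothetical table, stored row by row as (order_id, region).
rows = [
    (1, "east"), (2, "east"), (3, "east"),
    (4, "west"), (5, "west"), (6, "east"),
]

# Column-oriented layout: the values of one attribute are stored together,
# so equal neighbours form runs that compress well.
region_column = [region for _, region in rows]
region_runs = rle_encode(region_column)

# The query touches 3 runs instead of 6 individual values.
east_count = count_equal(region_runs, "east")
```

How well such a scheme works depends on the data (long runs compress better) and on the query (a count can use run lengths directly, while other operations may need the individual values), which is in line with the abstract's conclusion that the best scheme depends on both data properties and workload.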




Published In

SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data
June 2006
830 pages
ISBN: 1595934340
DOI: 10.1145/1142473
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. column-oriented databases
    2. column-stores
    3. database compression
    4. query execution

    Qualifiers

    • Article

    Conference

    SIGMOD/PODS06

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%
