Article
DOI: 10.1145/583890.583901

Strategies for processing ad hoc queries on large data warehouses

Published: 08 November 2002

Abstract

As data warehousing applications grow in size, existing data organizations and access strategies, such as relational tables and B-tree indexes, are becoming increasingly ineffective. The two primary reasons are that these datasets involve many attributes and that queries on the data usually involve conditions on only small subsets of those attributes. Two strategies are known to address these difficulties well: vertical partitioning and bitmap indexes. In this paper, we summarize our experience implementing a number of bitmap index schemes on vertically partitioned data tables. One important observation is that simply scanning the vertically partitioned data tables is often more efficient than using B-tree-based indexes to answer ad hoc range queries on static datasets. For these range queries, compressed bitmap indexes are in most cases more efficient than scanning vertically partitioned tables. We evaluate the performance of two different compression schemes for bitmap indexes stored in various ways. Storing the bitmaps in plain files using the compression scheme called Word-Aligned Hybrid Code (WAH) gives the best overall performance for bitmap indexes. Tests indicate that our bitmap index strategy based on WAH is efficient not only for attributes of low cardinality (say, fewer than 100 distinct values) but also for high-cardinality attributes with 200,000 or more distinct values.
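The abstract contrasts scanning a vertically partitioned column with answering the same range query from a bitmap index, and singles out WAH compression. The sketch below illustrates both ideas on toy data; it is not the authors' implementation. It assumes an equality-encoded index (one bitmap per distinct value), Python integers as bit vectors, and a simplified 32-bit WAH layout in which a literal word has MSB 0 and 31 payload bits, while a fill word has MSB 1, a fill-value bit, and a 30-bit count of 31-bit groups. The bit order within a group and the handling of the final partial group are simplifying choices, and all function names are hypothetical.

```python
from typing import Dict, List

GROUP_BITS = 31               # payload bits per 32-bit WAH word (assumed layout)
ALL_ONES = (1 << GROUP_BITS) - 1
MAX_COUNT = (1 << 30) - 1     # largest run length a fill word can hold


def build_bitmap_index(column: List[int]) -> Dict[int, int]:
    """Equality-encoded bitmap index over one (vertically partitioned) column.
    Each bitmap is a Python int; bit i is set when row i holds that value."""
    index: Dict[int, int] = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def range_query_bitmap(index: Dict[int, int], lo: int, hi: int) -> int:
    """Answer lo <= x < hi by OR-ing the bitmaps of the qualifying values."""
    result = 0
    for value, bitmap in index.items():
        if lo <= value < hi:
            result |= bitmap
    return result


def range_query_scan(column: List[int], lo: int, hi: int) -> int:
    """Answer the same query by scanning only the projected column."""
    result = 0
    for row, value in enumerate(column):
        if lo <= value < hi:
            result |= 1 << row
    return result


def wah_compress(bitmap: int, nbits: int) -> List[int]:
    """Simplified WAH encoder (sketch). Literal word: MSB 0, low 31 bits hold
    one group. Fill word: MSB 1, bit 30 is the fill value, low 30 bits count
    31-bit groups. A trailing partial group is always a zero-padded literal."""
    words: List[int] = []
    ngroups = (nbits + GROUP_BITS - 1) // GROUP_BITS
    for g in range(ngroups):
        group = (bitmap >> (g * GROUP_BITS)) & ALL_ONES
        is_full = (g + 1) * GROUP_BITS <= nbits
        if is_full and group in (0, ALL_ONES):
            header = (1 << 31) | ((1 if group else 0) << 30)
            last = words[-1] if words else None
            if (last is not None and (last & ~MAX_COUNT) == header
                    and (last & MAX_COUNT) < MAX_COUNT):
                words[-1] = last + 1          # extend the current run
            else:
                words.append(header | 1)      # start a new run of one group
        else:
            words.append(group)               # literal word (MSB stays 0)
    return words


def wah_decompress(words: List[int], nbits: int) -> int:
    """Inverse of wah_compress, used here to check round trips."""
    bitmap, pos = 0, 0
    for w in words:
        if w >> 31:                           # fill word
            count = w & MAX_COUNT
            if (w >> 30) & 1:                 # run of 1-bits
                bitmap |= ((1 << (count * GROUP_BITS)) - 1) << pos
            pos += count * GROUP_BITS
        else:                                 # literal word
            bitmap |= w << pos
            pos += GROUP_BITS
    return bitmap & ((1 << nbits) - 1)
```

On the toy column `[3, 7, 1, 7, 0, 5, 2, 7]`, both `range_query_bitmap(index, 2, 6)` and `range_query_scan(column, 2, 6)` return 97 (rows 0, 5 and 6). A 1,000-bit bitmap with only bits 0 and 999 set compresses to three words (a literal, one zero-fill word covering 31 groups, and a literal) instead of 33 literal words. The equality encoding and unbounded Python integers keep the sketch short; the paper evaluates a number of bitmap index schemes and storage options that this sketch does not attempt to reproduce.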



Published In

DOLAP '02: Proceedings of the 5th ACM International Workshop on Data Warehousing and OLAP
November 2002
88 pages
ISBN: 1581135904
DOI: 10.1145/583890


Publisher

Association for Computing Machinery
New York, NY, United States


Conference

CIKM02

Acceptance Rates

Overall Acceptance Rate 29 of 79 submissions, 37%
