10.5555/1182635.1164200
Article

Approximate encoding for direct access and query processing over compressed bitmaps

Published: 01 September 2006

Abstract

Bitmap indices have been widely and successfully used in scientific and commercial databases. Compression techniques based on run-length encoding are used to improve storage performance. However, these techniques introduce significant overhead in query processing, even when only a few rows are queried. We propose a new bitmap encoding scheme based on multiple hashing, in which the bitmap is kept in compressed form and can be accessed directly without decompression. Any subset of rows and/or columns can be retrieved efficiently by reconstructing and processing only the necessary subset of the bitmap. The proposed scheme provides approximate results, trading space against accuracy. False misses are guaranteed not to occur, and the false positive rate can be estimated and controlled. We show that query execution is significantly faster than with WAH-compressed bitmaps, which have previously been shown to achieve the fastest query response times. The proposed scheme achieves accurate results (90%-100%) and speeds up query processing by 1 to 3 orders of magnitude compared to WAH.
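
The abstract does not spell out the encoding, so the following is only a minimal sketch of the general idea, assuming a plain Bloom-filter-style structure: each set bit of the bitmap, identified by its (row, column) position, is hashed to several positions in a smaller bit array. The class name `ApproximateBitmap`, the helper `rows_matching`, the SHA-256 double-hashing trick, and all parameter values are illustrative assumptions, not the authors' design. The sketch shows the two properties the abstract states: a set bit is never reported as unset (no false misses), and the false positive rate can be estimated from the array size, the number of hash functions, and the number of encoded bits.

```python
import hashlib
import math
from typing import Iterable, Iterator, List


class ApproximateBitmap:
    """Bloom-filter-style encoding of the set bits of a bitmap (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits          # m: size of the compressed bit array
        self.num_hashes = num_hashes      # k: hash positions per encoded cell
        self.bits = bytearray((num_bits + 7) // 8)
        self.inserted = 0                 # n: number of calls to set()

    def _positions(self, row: int, col: int) -> Iterator[int]:
        # Assumption: derive k positions from one digest via double hashing.
        digest = hashlib.sha256(f"{row}:{col}".encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def set(self, row: int, col: int) -> None:
        """Encode a 1 at (row, col) of the original bitmap."""
        for p in self._positions(row, col):
            self.bits[p >> 3] |= 1 << (p & 7)
        self.inserted += 1

    def test(self, row: int, col: int) -> bool:
        """True if (row, col) may be set; False means it is definitely not set."""
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(row, col))

    def expected_false_positive_rate(self) -> float:
        # Standard Bloom filter estimate: (1 - e^(-k*n/m))^k
        k, n, m = self.num_hashes, self.inserted, self.num_bits
        return (1.0 - math.exp(-k * n / m)) ** k


def rows_matching(ab: ApproximateBitmap, col: int, rows: Iterable[int]) -> List[int]:
    """Probe only the requested rows of one column; nothing is decompressed."""
    return [r for r in rows if ab.test(r, col)]


# Hypothetical 5-row bitmap with four set cells, given as (row, column) pairs.
cells = [(0, 2), (1, 0), (3, 2), (4, 1)]
ab = ApproximateBitmap(num_bits=1024, num_hashes=4)
for r, c in cells:
    ab.set(r, c)

candidates = rows_matching(ab, col=2, rows=range(5))
print(candidates, ab.expected_false_positive_rate())
```

In this sketch a query touches only the hash positions of the cells it asks about, which is the sense in which a subset of rows can be retrieved without decompressing the whole bitmap. Growing `num_bits` lowers the estimated false positive rate at the cost of space, mirroring the space/accuracy trade-off described above.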

Published In

VLDB '06: Proceedings of the 32nd international conference on Very large data bases
September 2006
1269 pages

Sponsors

  • SIGMOD: ACM Special Interest Group on Management of Data
  • K.I.S.S. SIG on Databases
  • AJU Information Technology Co., Ltd
  • US Army ITC-PAC Asian Research Office
  • Google Inc.
  • The Database Society of Japan
  • Samsung SOS
  • Advanced Information Technology Research Center
  • Naver
  • Microsoft: Microsoft
  • Korea Info Sci Society: Korea Information Science Society
  • SK telecom
  • Systems Applications Products
  • ORACLE: ORACLE
  • International Business Management
  • Air Force Office of Scientific Research/Asian Office of Aerospace R&D
  • Kosef
  • Kaist
  • LG Electronics
  • CCF-DBS

Publisher

VLDB Endowment

Qualifiers

  • Article

Cited By

  • (2016) An efficient method to evaluate intersections on big data sets. Theoretical Computer Science 647:C (1-21). DOI: 10.1016/j.tcs.2016.07.018. Online publication date: 27-Sep-2016
  • (2015) The hyperdyadic index and generalized indexing and query with PIQUE. Proceedings of the 27th International Conference on Scientific and Statistical Database Management (1-12). DOI: 10.1145/2791347.2791374. Online publication date: 29-Jun-2015
  • (2015) A Padded Encoding Scheme to Accelerate Scans by Leveraging Skew. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (1509-1524). DOI: 10.1145/2723372.2737787. Online publication date: 27-May-2015
  • (2010) Position list word aligned hybrid. Proceedings of the 13th International Conference on Extending Database Technology (228-239). DOI: 10.1145/1739041.1739071. Online publication date: 22-Mar-2010
  • (2009) Correlation maps. Proceedings of the VLDB Endowment 2:1 (1222-1233). DOI: 10.14778/1687627.1687765. Online publication date: 1-Aug-2009
  • (2009) Inverted indexes vs. bitmap indexes in decision support systems. Proceedings of the 18th ACM conference on Information and knowledge management (1509-1512). DOI: 10.1145/1645953.1646158. Online publication date: 2-Nov-2009
  • (2009) Secondary indexing in one dimension. Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (177-186). DOI: 10.1145/1559795.1559824. Online publication date: 29-Jun-2009
  • (2008) Dynamic data organization for bitmap indices. Proceedings of the 3rd international conference on Scalable information systems (1-10). DOI: 10.5555/1459693.1459733. Online publication date: 4-Jun-2008
  • (2008) Brighthouse. Proceedings of the VLDB Endowment 1:2 (1337-1345). DOI: 10.14778/1454159.1454174. Online publication date: 1-Aug-2008
  • (2007) Space-efficient structures for detecting port scans. Proceedings of the 18th international conference on Database and Expert Systems Applications (120-129). DOI: 10.5555/2395856.2395873. Online publication date: 3-Sep-2007