Abstract
Set intersection is a fundamental operation for evaluating conjunctive queries in scientific data analysis. The state-of-the-art approach to set intersection, compressed bitmap indexing, achieves high computational efficiency because bitwise operations are cheap; however, this gain is often nullified by the HPC I/O bottleneck, because compressed bitmap indexes typically have a heavy storage footprint. Conversely, the recently presented PForDelta-compressed index has been shown to be storage-lightweight, but it offers limited performance for set intersection. Thus, a more effective set intersection approach should be efficient in both computation and I/O.
We therefore propose a fast set intersection approach that couples the storage-lightweight PForDelta indexing format with computationally efficient bitmaps through a specialized on-the-fly conversion. The resulting challenge is to make this conversion fast enough to preserve the performance gains of both PForDelta and bitmaps. To this end, we contribute two key enhancements to PForDelta, BitRun and BitExp, which improve bitmap conversion through bulk bit-setting and a more streamlined PForDelta decoding process, respectively. Our experimental results show that our integrated PForDelta-bitmap method speeds up conjunctive queries by up to 7.7x versus the state-of-the-art approach, while using indexes that require 15%-60% less storage in most cases.
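The general idea of converting a delta-compressed index into bitmaps at query time can be illustrated with a minimal Python sketch. It is not the paper's implementation: it uses plain gap (delta) lists instead of real PForDelta bit packing with exceptions, it does not implement BitRun or BitExp, and all function names here are hypothetical. It assumes sorted, non-negative record IDs and at least one input list.

```python
def delta_encode(sorted_ids):
    """Store each ID as the gap from the previous one (first gap is from 0)."""
    deltas, prev = [], 0
    for rid in sorted_ids:
        deltas.append(rid - prev)
        prev = rid
    return deltas


def deltas_to_bitmap(deltas):
    """Decode gaps directly into a bitmap, using a Python int as the bit array."""
    bitmap, pos = 0, 0
    for d in deltas:
        pos += d              # running sum recovers the original ID
        bitmap |= 1 << pos    # set the bit for that ID
    return bitmap


def bitmap_to_ids(bitmap):
    """List the positions of set bits in ascending order."""
    ids, pos = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(pos)
        bitmap >>= 1
        pos += 1
    return ids


def intersect(delta_lists):
    """Intersect several delta-encoded ID lists via bitwise AND on bitmaps."""
    result = deltas_to_bitmap(delta_lists[0])
    for deltas in delta_lists[1:]:
        result &= deltas_to_bitmap(deltas)
    return bitmap_to_ids(result)


a = delta_encode([1, 4, 5, 9, 12])
b = delta_encode([2, 4, 9, 10, 12])
print(intersect([a, b]))  # IDs present in both lists
```

The sketch keeps only the compact gap lists in storage and pays the conversion cost at query time, so the intersection itself reduces to a single AND per additional list. How fast that conversion runs is exactly the challenge the abstract identifies.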
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Zou, X. et al. (2014). Fast Set Intersection through Run-Time Bitmap Construction over PForDelta-Compressed Indexes. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_56
DOI: https://doi.org/10.1007/978-3-319-09873-9_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9
eBook Packages: Computer Science (R0)