DOI: 10.1145/2682862.2682870
research-article

Compression, SIMD, and Postings Lists

Published: 26 November 2014

Abstract

The three generations of postings list compression strategies (Variable Byte Encoding, Word Aligned Codes, and SIMD Codecs) are examined to test whether each truly represented a generational change; each did. Some weaknesses of the current SIMD-based schemes are identified, and a new scheme, QMX, is introduced to address both space and decoding inefficiencies. Improvements are examined on multiple architectures, and it is shown that different SSE implementations (Intel and AMD) perform differently.
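
As a point of reference for the first generation named in the abstract, variable byte encoding of a postings list might be sketched as follows. This is an illustration only, not code from the paper: the function names are hypothetical, and the byte convention assumed here (seven payload bits per byte, least significant group first, high bit set on the last byte of each integer) is just one common choice; published variants differ. Document ids are first converted to differences (d-gaps) so that most values are small.

```python
def to_gaps(doc_ids):
    """Convert strictly increasing document ids to d-gaps (differences)."""
    gaps, previous = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_gaps(gaps):
    """Invert to_gaps by taking a running sum."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def vbyte_encode(values):
    """Encode non-negative integers, 7 payload bits per byte.

    Assumed convention: low-order 7-bit group first; the high bit is set
    only on the final byte of each integer (a stop bit).
    """
    out = bytearray()
    for value in values:
        while value >= 0x80:
            out.append(value & 0x7F)   # continuation byte, high bit clear
            value >>= 7
        out.append(value | 0x80)       # final byte, high bit set
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by vbyte_encode."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:                # stop bit: integer is complete
            values.append(current)
            current, shift = 0, 0
        else:
            shift += 7
    return values


doc_ids = [3, 7, 300, 301, 70000]
encoded = vbyte_encode(to_gaps(doc_ids))
decoded = from_gaps(vbyte_decode(encoded))
```

The decoder handles one byte at a time and branches on every byte. That per-byte cost is the usual motivation given for the later word-aligned and SIMD approaches, which unpack several integers per step.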

Published In

ADCS '14: Proceedings of the 19th Australasian Document Computing Symposium
November 2014
132 pages
ISBN:9781450330008
DOI:10.1145/2682862
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • RMIT University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Compression
  2. Procrastination

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ADCS '14: Australasian Document Computing Symposium
November 27-28, 2014
Melbourne, VIC, Australia

Acceptance Rates

Overall Acceptance Rate 30 of 57 submissions, 53%

Article Metrics

  • Downloads (Last 12 months): 16
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 16 Oct 2024

Cited By

  • (2023) Efficient immediate-access dynamic indexing. Information Processing & Management 60:3 (103248). DOI: 10.1016/j.ipm.2022.103248. Online publication date: May 2023.
  • (2022) Real-time and post-hoc compression for data from Distributed Acoustic Sensing. Computers & Geosciences 166:C. DOI: 10.1016/j.cageo.2022.105181. Online publication date: 1 Sep 2022.
  • (2020) Techniques for Inverted Index Compression. ACM Computing Surveys 53:6 (1-36). DOI: 10.1145/3415148. Online publication date: 6 Dec 2020.
  • (2020) JASSjr: The Minimalistic BM25 Search Engine for Teaching and Learning Information Retrieval. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (2185-2188). DOI: 10.1145/3397271.3401413. Online publication date: 25 Jul 2020.
  • (2020) On Optimally Partitioning Variable-Byte Codes. IEEE Transactions on Knowledge and Data Engineering 32:9 (1812-1823). DOI: 10.1109/TKDE.2019.2911288. Online publication date: 5 Aug 2020.
  • (2020) Compressed Data Structures for Binary Relations in Practice. IEEE Access 8 (25949-25963). DOI: 10.1109/ACCESS.2020.2970983. Online publication date: 2020.
  • (2019) Fast Dictionary-Based Compression for Inverted Indexes. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (6-14). DOI: 10.1145/3289600.3290962. Online publication date: 30 Jan 2019.
  • (2019) Optimizing partitioning strategies for faster inverted index compression. Frontiers of Computer Science: Selected Publications from Chinese Universities 13:2 (343-356). DOI: 10.1007/s11704-016-6252-5. Online publication date: 1 Apr 2019.
  • (2019) An Experimental Study of Index Compression and DAAT Query Processing Methods. Advances in Information Retrieval (353-368). DOI: 10.1007/978-3-030-15712-8_23. Online publication date: 7 Apr 2019.
  • (2019) Micro- and macro-optimizations of SaaT search. Software: Practice and Experience 49:5 (942-950). DOI: 10.1002/spe.2683. Online publication date: 20 Feb 2019.