DOI: 10.1007/978-3-540-89097-3_18
Article

Practical Rank/Select Queries over Arbitrary Sequences

Published: 10 November 2008 Publication History

Abstract

We present a practical study on the compact representation of sequences supporting rank, select, and access queries. While there are several theoretical solutions to the problem, only a few have been tried out, and little is known about how the others would perform, especially on sequences with very large alphabets. First, we present a new practical implementation of the compressed representation for bit sequences proposed by Raman, Raman, and Rao [SODA 2002], which is competitive with the existing ones when the sequences are not too compressible. It also has good local compression properties, and we show that this makes it an excellent tool for compressed text indexing in combination with the Burrows-Wheeler transform. This demonstrates the practicality of a recent theoretical proposal [Mäkinen and Navarro, SPIRE 2007], achieving space usage never seen before. Second, for general sequences, we tune wavelet trees for the case of very large alphabets by removing their pointer information. We show that this gives an excellent solution for representing a sequence within zero-order entropy space in cases where the large alphabet poses a serious challenge to typical encoding methods. We also present the first implementation of Golynski et al.'s representation [SODA 2006], which offers another interesting time/space trade-off.
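
To make the three queries concrete, here is a minimal Python sketch of a wavelet tree stored level by level, which is one common way to avoid per-node pointers. It is an illustration under stated assumptions, not the paper's implementation: the class name `LevelwiseWaveletTree`, its methods, and the 0-based conventions (`rank(c, i)` counts occurrences of `c` in the first `i` positions) are all hypothetical, and the plain prefix-sum arrays stand in for a compressed bitmap such as the Raman, Raman, and Rao structure, so the sketch is not space-efficient.

```python
class LevelwiseWaveletTree:
    """Wavelet tree over symbols in [0, sigma), stored as one bitmap per level.

    Node boundaries are recomputed from rank queries while descending,
    so no child pointers are kept.
    """

    def __init__(self, seq, sigma):
        self.n = len(seq)
        self.bits = max(1, (sigma - 1).bit_length())
        # ones[level][i] = number of 1 bits among the first i bits of that level.
        self.ones = []
        cur = list(seq)
        for level in range(self.bits):
            shift = self.bits - 1 - level
            prefix = [0]
            for x in cur:
                prefix.append(prefix[-1] + ((x >> shift) & 1))
            self.ones.append(prefix)
            # Stable sort by the top (level + 1) bits lays out the next level.
            cur = sorted(cur, key=lambda x: x >> shift)

    def _rank1(self, level, i):
        return self.ones[level][i]

    def _rank0(self, level, i):
        return i - self.ones[level][i]

    def access(self, i):
        """Return the symbol at position i (0-based)."""
        start, end, symbol = 0, self.n, 0
        for level in range(self.bits):
            zeros = self._rank0(level, end) - self._rank0(level, start)
            bit = self.ones[level][i + 1] - self.ones[level][i]
            if bit == 0:
                i = start + self._rank0(level, i) - self._rank0(level, start)
                end = start + zeros
            else:
                i = (start + zeros
                     + self._rank1(level, i) - self._rank1(level, start))
                start += zeros
            symbol = (symbol << 1) | bit
        return symbol

    def rank(self, c, i):
        """Return the number of occurrences of symbol c in positions [0, i)."""
        start, end, count = 0, self.n, i
        for level in range(self.bits):
            zeros = self._rank0(level, end) - self._rank0(level, start)
            bit = (c >> (self.bits - 1 - level)) & 1
            if bit == 0:
                count = (self._rank0(level, start + count)
                         - self._rank0(level, start))
                end = start + zeros
            else:
                count = (self._rank1(level, start + count)
                         - self._rank1(level, start))
                start += zeros
        return count
```

For a sequence `seq`, `LevelwiseWaveletTree(seq, sigma).access(i)` should agree with `seq[i]`, and `rank(c, i)` with `seq[:i].count(c)`. A select query can be answered by walking the same levels bottom-up with bitmap select, which this sketch omits.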

References

[1] Barbay, J., He, M., Munro, I., Srinivasa Rao, S.: Succinct indexes for strings, binary relations and multi-labeled trees. In: 18th SODA, pp. 680-689 (2007)
[2] Brisaboa, N., Fariña, A., Ladra, S., Navarro, G.: Reorganizing compressed text. In: SIGIR (to appear, 2008)
[3] Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Tech. Rep. 124, December (1994)
[4] Clark, D.: Compact Pat Trees. Ph.D. thesis, University of Waterloo (1996)
[5] Claude, F., Navarro, G.: A fast and compact Web graph representation. In: Ziviani, N., Baeza-Yates, R. (eds.) SPIRE 2007. LNCS, vol. 4726, pp. 118-129. Springer, Heidelberg (2007)
[6] Ferragina, P., González, R., Navarro, G., Venturini, R.: Compressed text indexes: From theory to practice (manuscript, 2007), http://pizzachili.dcc.uchile.cl
[7] Ferragina, P., Manzini, G.: Indexing compressed texts. J. ACM 52(4), 552-581 (2005)
[8] Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM TALG 3(2) article 20 (2007)
[9] Golynski, A., Munro, I., Rao, S.: Rank/select operations on large alphabets: a tool for text indexing. In: SODA, pp. 368-373 (2006)
[10] González, R., Grabowski, S., Mäkinen, V., Navarro, G.: Practical implementation of rank and select queries. Posters WEA, pp. 27-38 (2005)
[11] Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: SODA, pp. 841-850 (2003)
[12] Mäkinen, V., Navarro, G.: Implicit compression boosting with applications to self-indexing. In: SPIRE, pp. 214-226 (2007)
[13] Munro, I., Raman, R., Raman, V., Srinivasa Rao, S.: Succinct representations of permutations. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 345-356. Springer, Heidelberg (2003)
[14] Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comp. Surv. 39(1) article 2 (2007)
[15] Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comp. Surv. 39(1) article 2 (2007)
[16] Okanohara, D., Sadakane, K.: Practical entropy-compressed rank/select dictionary. In: ALENEX (2007)
[17] Raman, R., Raman, V., Srinivasa Rao, S.: Succinct dynamic data structures. In: Dehne, F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, pp. 426-437. Springer, Heidelberg (2001)
[18] Raman, R., Raman, V., Srinivasa Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: SODA, pp. 233-242 (2002)

Published In

SPIRE '08: Proceedings of the 15th International Symposium on String Processing and Information Retrieval
November 2008
293 pages
ISBN: 9783540890966
Editors: Amihood Amir, Andrew Turpin, Alistair Moffat

Publisher

Springer-Verlag, Berlin, Heidelberg

Cited By

• (2024) CoCo-trie. Information Systems 120:C. DOI: 10.1016/j.is.2023.102316. Online publication date: 1-Feb-2024
• (2024) Compressed and queryable self-indexes for RDF archives. Knowledge and Information Systems 66:1 (381-417). DOI: 10.1007/s10115-023-01967-7. Online publication date: 1-Jan-2024
• (2022) A Learned Approach to Design Compressed Rank/Select Data Structures. ACM Transactions on Algorithms 18:3 (1-28). DOI: 10.1145/3524060. Online publication date: 11-Oct-2022
• (2022) Using Compressed Suffix-Arrays for a compact representation of temporal-graphs. Information Sciences: an International Journal 465:C (459-483). DOI: 10.1016/j.ins.2018.07.023. Online publication date: 21-Apr-2022
• (2021) Practical Wavelet Tree Construction. ACM Journal of Experimental Algorithmics 26 (1-67). DOI: 10.1145/3457197. Online publication date: 9-Jul-2021
• (2018) Library and Function Identification by Optimized Pattern Matching on Compressed Databases. Proceedings of the 2nd Reversing and Offensive-oriented Trends Symposium (1-12). DOI: 10.1145/3289595.3289598. Online publication date: 29-Nov-2018
• (2018) Log(graph). Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (1-13). DOI: 10.1145/3243176.3243198. Online publication date: 1-Nov-2018
• (2017) Parallel construction of wavelet trees on multicore architectures. Knowledge and Information Systems 51:3 (1043-1066). DOI: 10.1007/s10115-016-1000-6. Online publication date: 1-Jun-2017
• (2016) Compressed k^d-tree for temporal graphs. Knowledge and Information Systems 49:2 (553-595). DOI: 10.1007/s10115-015-0908-6. Online publication date: 1-Nov-2016
• (2015) SNT-index. Proceedings of the 1st International ACM SIGSPATIAL Workshop on Smart Cities and Urban Analytics (1-8). DOI: 10.1145/2835022.2835023. Online publication date: 3-Nov-2015