DOI: 10.5555/2791188.2791194
Research article

Practical entropy-compressed rank/select dictionary

Published: 06 January 2007

Abstract

    Rank/Select dictionaries are data structures for an ordered set S ⊂ {0, 1, ..., n − 1} that compute rank(x, S) (the number of elements in S that are no greater than x) and select(i, S) (the i-th smallest element in S). These operations are fundamental components of succinct data structures for strings, trees, graphs, etc. For these data structures, however, only asymptotic behavior has been considered, and their performance on real data is not satisfactory. In this paper, we propose four novel Rank/Select dictionaries: esp, recrank, vcode and sdarray. Each is small if the number of elements in S is small, and indeed close to nH0(S) in practice, where H0(S) ≤ 1 is the zeroth-order empirical entropy of S. Furthermore, their query times are superior to those of existing structures. Experimental results reveal the characteristics of our data structures and also show that they are superior to existing implementations in terms of both size and query time.
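To make the two queries and the space target concrete, here is a minimal Python sketch. It is not one of the four structures proposed in the paper: it stores S as a plain sorted list and answers rank by binary search, so it illustrates only the definitions. The function names, the 1-based indexing of select, and the reading of H0(S) as the entropy of the length-n bit vector with ones at the positions in S are assumptions made for illustration.

```python
from bisect import bisect_right
from math import log2


def rank(x, s):
    """Number of elements of the sorted list s that are no greater than x."""
    return bisect_right(s, x)


def select(i, s):
    """The i-th smallest element of s, counting from 1 (assumed convention)."""
    if not 1 <= i <= len(s):
        raise IndexError("i must satisfy 1 <= i <= len(s)")
    return s[i - 1]


def zeroth_order_entropy(s, n):
    """H0 of the length-n bit vector whose ones are at the positions in s.

    Assumed reading of H0(S): with m = len(s) and p = m / n,
    H0 = -(p * log2(p) + (1 - p) * log2(1 - p)), which is at most 1.
    """
    m = len(s)
    if m == 0 or m == n:
        return 0.0  # a constant bit vector carries no information
    p = m / n
    return -(p * log2(p) + (1 - p) * log2(1 - p))


if __name__ == "__main__":
    S = [2, 3, 5, 7, 11]   # example set, chosen arbitrarily
    n = 16                 # universe {0, 1, ..., 15}
    print(rank(6, S))                      # elements <= 6 are 2, 3, 5
    print(select(4, S))                    # fourth smallest element
    print(n * zeroth_order_entropy(S, n))  # the nH0(S) space target, in bits
```

With these conventions the two queries are inverses on members of S: select(rank(x, S), S) returns x whenever x is in S. The sorted list itself takes far more than nH0(S) bits, which is the gap the structures in the paper aim to close.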

    Published In

    Proceedings of the Meeting on Algorithm Engineering & Experiments
    January 2007
    163 pages

    Publisher

    Society for Industrial and Applied Mathematics

    United States

    Cited By

    • (2024) Compressed and queryable self-indexes for RDF archives. Knowledge and Information Systems 66:1 (381-417). DOI: 10.1007/s10115-023-01967-7. Online publication date: 1-Jan-2024
    • (2020) The PGM-index. Proceedings of the VLDB Endowment 13:8 (1162-1175). DOI: 10.14778/3389133.3389135. Online publication date: 3-May-2020
    • (2019) Better External Memory LCP Array Construction. ACM Journal of Experimental Algorithmics 24 (1-27). DOI: 10.1145/3297723. Online publication date: 14-Feb-2019
    • (2019) Fixed Block Compression Boosting in FM-Indexes. Algorithmica 81:4 (1370-1391). DOI: 10.1007/s00453-018-0475-9. Online publication date: 1-Apr-2019
    • (2018) Morton filters. Proceedings of the VLDB Endowment 11:9 (1041-1055). DOI: 10.14778/3213880.3213884. Online publication date: 1-May-2018
    • (2018) Log(graph). Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (1-13). DOI: 10.1145/3243176.3243198. Online publication date: 1-Nov-2018
    • (2017) Practical Compact Indexes for Top-k Document Retrieval. ACM Journal of Experimental Algorithmics 22 (1-37). DOI: 10.1145/3043958. Online publication date: 2-Mar-2017
    • (2017) Compressed double-array tries for string dictionaries supporting fast lookup. Knowledge and Information Systems 51:3 (1023-1042). DOI: 10.1007/s10115-016-0999-8. Online publication date: 1-Jun-2017
    • (2016) Succinct Data Structures in Information Retrieval. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval (1231-1233). DOI: 10.1145/2911451.2914802. Online publication date: 7-Jul-2016
    • (2016) Efficient dynamic range minimum query. Theoretical Computer Science 656:PB (108-117). DOI: 10.1016/j.tcs.2016.07.002. Online publication date: 20-Dec-2016