Practical linear-time O(1)-workspace suffix sorting for constant alphabets

Published: 05 August 2013
    Abstract

    This article presents an O(n)-time algorithm called SACA-K for sorting the suffixes of an input string T[0, n-1] over an alphabet A[0, K-1]. The problem of sorting the suffixes of T is also known as constructing the suffix array (SA) for T. The theoretical memory usage of SACA-K is n log K + n log n + K log n bits. We also provide a practical implementation of SACA-K that uses n bytes + (n + 256) words and is suitable for strings over any alphabet up to full ASCII, where a word is log n bits. In our experiment, SACA-K outperforms SA-IS, which was previously the most time- and space-efficient linear-time SA construction algorithm (SACA). SACA-K is around 33% faster and uses a smaller deterministic workspace of K words, where the workspace is the space needed beyond the input string and the output SA. Given K = O(1), SACA-K runs in linear time and O(1) workspace. To the best of our knowledge, such a result is the first reported in the literature with practical source code publicly available.
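
    To make the terms in the abstract concrete, the following minimal Python sketch shows what a suffix array is and how the stated practical footprint (n bytes + (n + 256) words) breaks down. It is not SACA-K: the construction here is a naive comparison sort that runs in superlinear time and uses far more than O(1) workspace. The function names, and the assumption of a 4-byte word, are illustrative and do not come from the article.

```python
from __future__ import annotations


def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    This is only a reference definition of the suffix array. It is NOT the
    SACA-K algorithm: sorting materialised suffixes is superlinear in time
    and needs much more than O(1) workspace.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def practical_footprint_bytes(n: int, word_bytes: int = 4, k: int = 256) -> dict[str, int]:
    """Split the abstract's 'n bytes + (n + 256) words' into its three parts.

    Assumption (not stated in the article): a word occupies `word_bytes`
    bytes, 4 by default. `k` is the alphabet size, 256 for byte alphabets.
    """
    input_bytes = n                  # input string T, one byte per character
    sa_bytes = n * word_bytes        # output suffix array, one word per suffix
    workspace = k * word_bytes       # workspace beyond input and output: K words
    return {
        "input": input_bytes,
        "suffix_array": sa_bytes,
        "workspace": workspace,
        "total": input_bytes + sa_bytes + workspace,
    }


if __name__ == "__main__":
    print(naive_suffix_array("banana"))          # [5, 3, 1, 0, 4, 2]
    print(practical_footprint_bytes(1_000_000))  # workspace stays at 1024 bytes
```

    For "banana" the sorted suffixes are a, ana, anana, banana, na, nana, which gives the positions 5, 3, 1, 0, 4, 2. In the footprint breakdown only the input and the suffix array grow with n; the workspace term depends on K alone, which is the sense in which the workspace is O(1) for a constant alphabet.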

    Published In

    ACM Transactions on Information Systems  Volume 31, Issue 3
    July 2013
    202 pages
    ISSN:1046-8188
    EISSN:1558-2868
    DOI:10.1145/2493175
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 August 2013
    Accepted: 01 March 2013
    Revised: 01 January 2013
    Received: 01 June 2012
    Published in TOIS Volume 31, Issue 3

    Author Tags

    1. O(1)-workspace
    2. Suffix array
    3. linear time
    4. sorting algorithm

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2024) Suffix sorting via matching statistics. Algorithms for Molecular Biology 19:1. DOI: 10.1186/s13015-023-00245-z. Online publication date: 12-Mar-2024
    • (2024) Generic Non-recursive Suffix Array Construction. ACM Transactions on Algorithms 20:2 (1-42). DOI: 10.1145/3641854. Online publication date: 13-Apr-2024
    • (2024) Efficient Sorting Suffixes of Big Alphabets. 2024 Data Compression Conference (DCC) (273-282). DOI: 10.1109/DCC58796.2024.00035. Online publication date: 19-Mar-2024
    • (2024) Efficient construction and utilization of k-ordered FM-indexes with kISS for ultra-fast read mapping in large genomes. Bioinformatics 40:7. DOI: 10.1093/bioinformatics/btae409. Online publication date: 19-Jun-2024
    • (2024) Algorithm design and performance evaluation of sparse induced suffix sorting. Information Processing & Management 61:5 (103777). DOI: 10.1016/j.ipm.2024.103777. Online publication date: Oct-2024
    • (2023) Efficient construction of the BWT for repetitive text using string compression. Information and Computation 294 (105088). DOI: 10.1016/j.ic.2023.105088. Online publication date: Oct-2023
    • (2023) Tunnel: Parallel-inducing sort for large string analytics. Future Generation Computer Systems 149 (650-663). DOI: 10.1016/j.future.2023.08.009. Online publication date: Dec-2023
    • (2022) Foundations of Differentially Oblivious Algorithms. Journal of the ACM 69:4 (1-49). DOI: 10.1145/3555984. Online publication date: 26-Aug-2022
    • (2022) Grammar Compression by Induced Suffix Sorting. ACM Journal of Experimental Algorithmics 27 (1-33). DOI: 10.1145/3549992. Online publication date: 26-Aug-2022
    • (2022) Building and Checking Suffix Array Simultaneously by Induced Sorting Method. IEEE Transactions on Computers 71:4 (756-765). DOI: 10.1109/TC.2021.3061709. Online publication date: 1-Apr-2022