Breaking a Time-and-Space Barrier in Constructing Full-Text Indices

Published: 01 February 2009

Abstract

Suffix trees and suffix arrays are the most prominent full-text indices, and their construction algorithms are well studied. In the literature, the fastest algorithm runs in $O(n)$ time but requires $O(n\log n)$-bit working space, where $n$ denotes the length of the text. On the other hand, the most space-efficient algorithm requires $O(n)$-bit working space but runs in $O(n\log n)$ time. It was open whether these indices can be constructed in both $o(n\log n)$ time and $o(n\log n)$-bit working space. This paper breaks this time-and-space barrier under the unit-cost word RAM model. We give an algorithm for constructing the suffix array that takes $O(n)$ time and $O(n)$-bit working space for texts with constant-size alphabets. Both the time and the space bounds are optimal. For constructing the suffix tree, our algorithm requires $O(n\log^{\epsilon}n)$ time and $O(n)$-bit working space for any $0<\epsilon<1$. In addition, our algorithm can be adapted to build other existing full-text indices, such as the compressed suffix tree, the compressed suffix array, and the FM-index. We also study the general case where the size of the alphabet $\Sigma$ is not constant. Our algorithm can construct a suffix array and a suffix tree using optimal $O(n\log|\Sigma|)$-bit working space while running in $O(n\log\log|\Sigma|)$ time and $O(n(\log^{\epsilon}n+\log|\Sigma|))$ time, respectively. These are the first algorithms that achieve $o(n\log n)$ time with optimal working space. Moreover, for the special case where $\log|\Sigma|=O((\log\log n)^{1-\epsilon})$, we can speed up our suffix array construction algorithm to the optimal $O(n)$ time.
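For readers unfamiliar with the objects being indexed, the following minimal Python sketch shows what a suffix array is and why $O(n\log n)$ bits and $O(n\log|\Sigma|)$ bits differ in practice. It is not the construction algorithm of this paper: it sorts suffixes naively, and the function names (`naive_suffix_array`, `plain_suffix_array_bits`, `packed_text_bits`) are hypothetical names chosen for illustration only.

```python
from math import ceil, log2


def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Illustration only: sorting explicit suffix copies is far slower and far
    more space-hungry than the bounds discussed in the abstract.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def plain_suffix_array_bits(n: int) -> int:
    """Bits needed to store n suffix positions explicitly, ceil(log2 n) bits each."""
    return n * ceil(log2(n))


def packed_text_bits(n: int, sigma: int) -> int:
    """Bits needed to store the text itself, ceil(log2 sigma) bits per symbol."""
    return n * ceil(log2(sigma))


if __name__ == "__main__":
    text = "banana$"  # '$' is assumed to be a terminator smaller than every letter
    for start in naive_suffix_array(text):
        print(start, text[start:])

    # Assumed example sizes: a text of 2^30 symbols over a 4-letter alphabet.
    n, sigma = 2**30, 4
    ratio = plain_suffix_array_bits(n) // packed_text_bits(n, sigma)
    print("explicit suffix array is", ratio, "times larger than the packed text")
```

Under these assumed sizes, an explicit array of positions costs 30 bits per symbol while the packed text costs 2 bits per symbol, which is the gap between $O(n\log n)$-bit and $O(n\log|\Sigma|)$-bit working space that the abstract refers to.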

Published In

SIAM Journal on Computing, Volume 38, Issue 6
February 2009
413 pages

Publisher

Society for Industrial and Applied Mathematics

United States

Author Tags

  1. preprocessing
  2. suffix arrays
  3. suffix trees
  4. text indexing

Qualifiers

  • Article
