4.1. Adaptation of the Single-Pass Algorithm
Having an -bits representation of gives us hope to find an algorithm computing in a total working space of bits. Indeed, we can adapt our algorithm devised for the reversed LZ factorization to compute . For that, we just have to promote all leaves to phrase leaves such that the condition in Line 7 of Algorithm 1 is always true. Consequently, Player 1 performs a root-to-leaf traversal for finding the lowest node marked in of each leaf. By doing so, however, the time complexity becomes , since we visit at most many nodes during the root-to-leaf traversals (there are strings like for which this sum becomes ).
To lower this time bound, we follow the same strategy as in [22] ([Section 3.5]) or [34] ([Lemma 6]) using and Lemma 2: After Player 1 has computed for w being the lowest ancestor marked in of the leaf with suffix number , we cache for the next turn of Player 1 such that Player 1 can start the root-to-leaf traversal to the leaf with suffix number i directly from and thus skips the nodes from the root to . This works because is the ancestor of with , and must have been marked in since . See Figure 7 for a visualization. By skipping the nodes from the root to , we visit only many nodes during the i-th turn of Player 1. A telescoping sum together with Lemma 2 shows that Player 1 visits nodes in total.
The final bottleneck for the CST consists of the n evaluations of to compute the actual values of (cf. Line 15 of Algorithm 1). Here, we use a support data structure on the CST for [34] ([Lemma 6]), which can be constructed in time, uses bits of space, and answers in time. This finally gives Theorem 2.
4.2. Algorithm of Crochemore et al.
We can also run the algorithm of Crochemore et al. [14] ([Algorithm 2]) with our suffix tree representations to obtain the same space and time bounds as stated in Theorem 2. For that, let us explain this algorithm in suffix tree terminology: For each leaf with suffix number i, the idea for computing is to scan the leaves for the leaf with being the referred position, and hence the string depth of is . To compute , we approach from the left and from the right to find (resp. ) having the deepest LCA with among all leaves to the left (resp. right) side of whose suffix numbers are greater than . Then either or is . Let and . Then , and the referred position is either or , depending on whose respective LCA has the deeper string depth. Note that the referred positions in this algorithm are not necessarily the leftmost possible ones.
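As a rough illustration of the two-sided scan, the following Python sketch works on the suffix array, its inverse, and the LCP array instead of on suffix tree leaves. It is the plain linear scan without any of the later improvements. The parameter `threshold` is a placeholder for the bound on the suffix numbers, and the naive array constructions as well as all function names are assumptions made for this example only.

```python
def suffix_array(text):
    """Naive suffix array: text positions sorted by their suffixes (toy sizes only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def inverse(sa):
    """Inverse suffix array: isa[p] is the leaf-rank of the suffix starting at p."""
    isa = [0] * len(sa)
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    return isa


def lcp_array(text, sa):
    """lcp[k] = longest common prefix length of the suffixes at ranks k-1 and k."""
    def lcp(a, b):
        length = 0
        while max(a, b) + length < len(text) and text[a + length] == text[b + length]:
            length += 1
        return length
    return [0] + [lcp(sa[k - 1], sa[k]) for k in range(1, len(sa))]


def scan_both_sides(sa, isa, lcp, i, threshold):
    """Return (length, referred position) for text position i, or (0, None).

    Only leaves whose suffix number exceeds `threshold` qualify
    (`threshold` is a hypothetical stand-in for the bound in the text).
    """
    rank = isa[i]
    best_len, best_pos = 0, None

    # Approach from the left: the string depth of the LCA with the leaf
    # at rank k is the minimum of lcp[k+1 .. rank].
    depth = None
    for k in range(rank - 1, -1, -1):
        depth = lcp[k + 1] if depth is None else min(depth, lcp[k + 1])
        if sa[k] > threshold:  # closest qualifying leaf on the left
            if depth > best_len:
                best_len, best_pos = depth, sa[k]
            break

    # Approach from the right: the string depth of the LCA with the leaf
    # at rank k is the minimum of lcp[rank+1 .. k].
    depth = None
    for k in range(rank + 1, len(sa)):
        depth = lcp[k] if depth is None else min(depth, lcp[k])
        if sa[k] > threshold:  # closest qualifying leaf on the right
            if depth > best_len:
                best_len, best_pos = depth, sa[k]
            break

    return best_len, best_pos
```

The scan stops at the first qualifying leaf on each side because the string depth of the LCA can only shrink when moving further away in leaf-rank order, so a more distant qualifying leaf cannot give a longer common prefix.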
Correctness. Let j be the referred position of the leaf with suffix number i such that and have the LCP F of length . Due to Lemma 1, there is a suffix tree node w whose string label is F. Consequently, and the leaf with suffix number are in the subtree rooted at w. Now suppose that we have computed and according to the algorithm described above. On the one hand, let us first assume that (the case is treated symmetrically). By definition of , there is a descendant of w with the string depth , and has both and in its subtree. However, this means that and have a common prefix longer than , a contradiction to storing the length of the longest such LCP. On the other hand, let us assume that . Then w is a descendant of the node being the LCA of and . Without loss of generality, let us stipulate that the leaf with suffix number is to the right of (the other case to the left of works with by symmetry). Then is to the left of , i.e., is between and . Since , this contradicts the selection of to be the closest leaf on the right-hand side of with a suffix number larger than .
Finding the Starting Points. Finally, to find the starting points of and being initially the leaves with the maximal suffix number to the left and to the right of , respectively, we use a data structure for answering a query returning the leaf with the maximum suffix number among all leaves whose leaf-ranks are in a given range.
We can modify the data structure computing in Section 3.3 to return the leaf-rank instead of the suffix number (the data structure used for first computes the leaf-rank and then the respective suffix number). Finally, we need to take the border case into account that is the leftmost leaf or the rightmost leaf in the suffix tree, in which case we only need to approach from the right side or from the left side, respectively.
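To make the range query concrete, here is a minimal Python sketch that returns the leaf-rank holding the maximum suffix number within a range of leaf-ranks, together with the border-case handling for the leftmost and rightmost leaf. It uses a textbook sparse table, which takes far more space than the compact structures discussed in the text; it is swapped in purely for readability, and the function names are assumptions for this example.

```python
def build_argmax_table(sa):
    """Sparse table: table[j][k] is the rank of the largest suffix number
    among the leaves with ranks k .. k + 2**j - 1."""
    n = len(sa)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev, half = table[-1], 1 << (j - 1)
        row = []
        for k in range(n - (1 << j) + 1):
            a, b = prev[k], prev[k + half]
            row.append(a if sa[a] >= sa[b] else b)
        table.append(row)
        j += 1
    return table


def max_suffix_leaf(sa, table, lo, hi):
    """Leaf-rank in [lo, hi] (inclusive, lo <= hi) with the maximum suffix number."""
    j = (hi - lo + 1).bit_length() - 1  # largest power of two fitting in the range
    a, b = table[j][lo], table[j][hi - (1 << j) + 1]
    return a if sa[a] >= sa[b] else b


def starting_points(sa, table, rank):
    """Starting leaf-ranks to the left and to the right of the leaf at `rank`.

    A side is None when `rank` is the leftmost or the rightmost leaf,
    in which case only the other side has to be approached.
    """
    left = max_suffix_leaf(sa, table, 0, rank - 1) if rank > 0 else None
    right = max_suffix_leaf(sa, table, rank + 1, len(sa) - 1) if rank < len(sa) - 1 else None
    return left, right
```

Returning the leaf-rank rather than the suffix number mirrors the modification described above: the suffix number is then a single lookup `sa[rank]` away.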
The algorithm explained up to now already computes correctly, but visits leaves per entry, or leaves in total. To improve this bound to leaves, we apply two tricks. To ease the explanation of these tricks, let us focus on the right-hand side of ; the left-hand side is treated symmetrically.
Overview for Algorithmic Improvements. Given that we want to compute , we start with a pointer to a leaf to the right of with suffix number larger than , and approach with from the right until there is no leaf closer to on its right side with a suffix number larger than . Then is , and we can compute being the string depth of the LCA of and . If we linearly scan the suffix tree leaves to reach with the pointer , this gives us leaves to process. Now the first trick lets us reduce the number of these leaves to at most many for computing . The broad idea is that with the operation we can find a leaf closer to whose LCA is at least one string depth deeper than the LCA with the previously processed leaf. In total, the first trick helps us to compute by processing at most many leaves. In the second trick, we show that we can reuse the already computed neighboring leaves and by following their suffix links such that we process at most many leaves (instead of ) for computing . Finally, by a telescoping sum, we obtain a linear number of leaves to process.
First Trick. The first trick is to jump over leaves whose respective suffixes all share the same longest common prefix with . We start with being the leaf on the right-hand side of with the largest suffix number. As long as , we search for the leftmost leaf between and (to be more precise: ) with . Having , we consider two cases:
If (meaning is to the right of and there is no leaf between and ), we terminate.
Otherwise, we set to the leaf with the largest suffix number among the leaves with leaf-ranks in the range . If , we set and recurse. Otherwise we terminate.
On termination, because there is no leaf on the right of closer to than with and . Hence, is the referred position, and we continue with the computation of . See Figure 8 for a visualization.
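The jumping procedure of the first trick can be sketched in Python as follows, again on the suffix array and the LCP array rather than on tree leaves, and only for the right-hand side. This is one possible reading of the steps above, not a definitive implementation: `threshold` is a placeholder for the bound on the suffix numbers, both range queries are answered by naive scans instead of the compact data structures, and all names are assumptions for this example.

```python
def suffix_array(text):
    """Naive suffix array: text positions sorted by their suffixes (toy sizes only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[k] = longest common prefix length of the suffixes at ranks k-1 and k."""
    def lcp(a, b):
        length = 0
        while max(a, b) + length < len(text) and text[a + length] == text[b + length]:
            length += 1
        return length
    return [0] + [lcp(sa[k - 1], sa[k]) for k in range(1, len(sa))]


def leftmost_min(lcp, lo, hi):
    """Naive range minimum query: index of the leftmost minimum of lcp[lo..hi]."""
    return min(range(lo, hi + 1), key=lambda k: lcp[k])


def max_suffix_rank(sa, lo, hi):
    """Naive range query: leaf-rank in [lo, hi] with the largest suffix number."""
    return max(range(lo, hi + 1), key=lambda k: sa[k])


def approach_from_right(sa, lcp, rank, threshold):
    """Return (leaf-rank, LCA string depth) of the leaf selected on the right
    of the leaf at `rank`, or None if no leaf there exceeds `threshold`."""
    n = len(sa)
    if rank == n - 1:
        return None  # rightmost leaf: nothing to approach from the right
    right = max_suffix_rank(sa, rank + 1, n - 1)
    if sa[right] <= threshold:
        return None  # no qualifying leaf on the right-hand side
    while True:
        # All leaves with ranks m..right share the same LCA with `rank`;
        # the leaves strictly between `rank` and m have a deeper LCA.
        m = leftmost_min(lcp, rank + 1, right)
        if m == rank + 1:
            break  # no leaf lies between `rank` and m
        candidate = max_suffix_rank(sa, rank + 1, m - 1)
        if sa[candidate] <= threshold:
            break  # no closer leaf has a large enough suffix number
        right = candidate  # jump; the LCA depth strictly increases
    return right, lcp[m]
```

Because `m` is the leftmost minimum, every LCP value to its left within the range is strictly larger, so each jump increases the string depth of the LCA by at least one. This is why the number of jumps is bounded by the final string depth rather than by the number of leaves in between.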
Broadly speaking, the idea is that the closer gets to , the deeper the string depth of becomes. However, we have to stop when there is no closer leaf with a suffix number larger than . So we first scan until reaching a having the same lowest common ancestor with , and then search within the interval of leaves between and for the remaining leaf with the largest suffix number. We search for because we can jump from to with a range minimum query on the LCP array returning the index of the leftmost minimum in a given range. We can answer this query with an -bits data structure in or time for the SST or the CST, respectively, and build it in time or time (cf. [22] ([Section 3.3]) and [41] ([Lemma 3]) for details). However, with this algorithm, we may visit as many leaves as since each jump from to via brings us at least one value closer to . To lower this bound to leaf-visits, we again make use of Lemma 2 (cf. Section 4.1), but exchange with (or respectively ) in the statement of the lemma.
Second Trick. Assume that we have computed with . We subsequently set , but also . Now has suffix number i. If , then the string depth of the is , and is lexicographically larger than ; hence is to the right of with (generally speaking, given two leaves and whose LCA is not the root, then if and only if ). Otherwise (), we reset . By doing so, we ensure that is always a leaf to the right of with (if such a leaf exists), and that we have already skipped string depths for the search of with . Since , the telescoping sum shows that we visit leaves in total.
In total, we obtain an algorithm that visits leaves, and spends or time per leaf when using the SST or the CST, respectively. We need bits of working space on top of since we only need the values , , , and to compute . We note that Crochemore et al. [14] do not need the suffix tree topology, since they only access the suffix array, its inverse, and the LCP array, which we translated to leaves and the string depths of their LCAs.