Local Search for String Problems:
Brute Force is Essentially Optimal
Jiong Guo¹⋆, Danny Hermelin², Christian Komusiewicz³
¹ Universität des Saarlandes, Campus E 1.7, D-66123 Saarbrücken, Germany. jguo@mmci.uni-saarland.de
² Industrial Management and Engineering Department, Ben-Gurion University, Beer Sheva, Israel. hermelin@bgu.ac.il
³ Institut für Softwaretechnik und Theoretische Informatik, Technische Universität Berlin, D-10587 Berlin, Germany. christian.komusiewicz@tu-berlin.de
Abstract. We address the problem of whether the brute-force procedure for the local improvement
step in a local search algorithm can substantially be improved when applied to classical NP-hard
string problems. We examine four of the more prominent problems in this domain: Closest String,
Longest Common Subsequence, Shortest Common Supersequence, and Shortest Common
Superstring. Herein, we consider arguably the most fundamental string distance measure, namely
the Hamming distance, which has been applied in practical local search implementations for string
problems. Our results indicate that for all four problems, the brute-force algorithm cannot be considerably improved.
Key words: Local Search, Parameterized Complexity, Parameterized Intractability, Closest String, Longest Common
Subsequence, Shortest Common Supersequence, Shortest Common Superstring.
1 Introduction
Local search is a universal algorithmic approach for coping with computationally hard optimization
problems. It is typically applied on problems which can be formulated as finding a solution maximizing or minimizing a criterion among a number of feasible solutions. The main idea is to start
with some solution, then search inside the local neighborhood of this solution for a better solution
until a locally optimal solution has been found. The hope is then that the locally optimal solution
is almost as good as a globally optimal one. See the book by Aarts and Lenstra [1] for further
background and results concerning local search.
There are two main theoretical approaches to study local search: PLS-completeness [12] and
parameterized local search [15, 6]. PLS-completeness can be used to show that finding a locally
optimal solution is computationally hard since a lot of improvement steps might be needed until
it has been found. In contrast, parameterized local search is concerned with the parameterized
complexity of the problem of searching the local neighborhood of a solution in order to find a
better solution. Usually the size of the neighborhood is n^{O(k)}, where n is the total input length,
and k is a parameter measuring the “radius” of the neighborhood; that is, the maximum distance to
the current solution. It is therefore natural to ask whether n^{O(k)} time is required for searching this
neighborhood, or whether f(k) · poly(n) time can be achieved. This is precisely the main question
underlying the theory of parameterized complexity [5]. In this context, this question translates to
determining whether the improvement step problem is fixed-parameter tractable or not with respect
to k.
⋆ Supported by the DFG excellence cluster MMCI.
There is substantial work in parameterized local search. For example, concerning the Traveling Salesman problem, Balas [2] showed that one can find, if it exists, a better tour with
“shift” distance at most k to the old one in 4^k · poly(n) time. Marx [15] proved the non-existence
of such an algorithm for the edge-exchange neighborhood. Subsequently, the complexity of local
search for further neighborhood measures of Traveling Salesman was examined [10]. Fellows et
al. [6] provided parameterized algorithms for local search variants of diverse graph problems such
as r-Center, Vertex Cover, Odd Cycle Transversal, Max Cut, and Min-Bisection on
planar graphs and proved W[1]-hardness for the general case. Fomin et al. [8] considered the Feedback Arc Set in Tournaments problem and presented a subexponential-time algorithm for
its edge-exchange local search version. Further, Marx and Schlotter [16] studied a variant of Stable Marriage with respect to local search. Further results concerning parameterized local search
have been achieved for clustering problems [4], Boolean Constraint Satisfaction [13], and
Satisfiability [18].
In this paper, we add a new realm to the study of parameterized local search by considering
string problems. Stringology is one of the most widely studied areas in computer science, particularly motivated by direct applications in text mining and computational biology. Here, we
consider four prominent NP-hard string problems: Closest String, Longest Common Subsequence, Shortest Common Supersequence, and Shortest Common Superstring. Local
search seems to be a natural approach for dealing with string problems. For instance, a local search
heuristic using the Hamming distance neighborhood has been implemented and evaluated with
real-world data for a closely related variant of Closest String [7, 17].
We examine all four string problems above in the framework of parameterized local search.
Herein, we consider the Hamming distance neighborhood of a temporary solution and prove that
the local search versions of all these problems are W[1]-hard even on alphabets of constant size,
with the Hamming distance k between the old and new solutions as parameter. Since the Hamming
distance is arguably the simplest distance between strings, our results could serve as the basis for
proving hardness for other distance neighborhoods. Moreover, for all problems except Shortest
Common Supersequence, we can exclude the existence of algorithms with running time n^{o(k)}.
Thus, for these three problems, the n^{O(k)}-time brute-force algorithm cannot be substantially improved and is
essentially optimal.
2 Preliminaries
For a string S we write |S| to denote the length of S. We use S[i], 1 ≤ i ≤ |S|, to denote the letter
at position i in S and use S[i, j], 1 ≤ i < j ≤ |S|, to denote the substring S[i] · · · S[j] of S from
position i to position j. A substring of the form S[i, |S|] is called a suffix of S, and a substring S[1, j]
is called a prefix. For a given suffix T of S, we write S − T to denote the string S[1, |S| − |T|].
We use S− as a shorthand for S − S[|S|]. A string T is a subsequence of S if T can be obtained
from S by deleting some letters; that is, if there exists a sequence of positions i_1 < · · · < i_{|T|} with
S[i_j] = T[j] for all j ∈ {1, . . . , |T|}. If T is a subsequence of S, then S is called a supersequence
of T. The Hamming distance d_H(S, T) := |{i : S[i] ≠ T[i]}| between two strings S and T of equal
length is defined as the number of positions in which the two strings differ. We define the Hamming
distance of a string S to a set T of strings as d_H(S, T) := max_{T′ ∈ T} d_H(S, T′).
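The three notions just defined can be sketched in a few lines of Python. This is only an illustration; the function names `hamming`, `hamming_to_set` and `is_subsequence` are hypothetical and do not come from the paper.

```python
def hamming(s, t):
    """Number of positions in which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(s, t))


def hamming_to_set(s, strings):
    """Distance of s to a set of strings: the maximum over its members."""
    return max(hamming(s, t) for t in strings)


def is_subsequence(t, s):
    """True if t can be obtained from s by deleting some letters."""
    remaining = iter(s)
    # 'letter in remaining' consumes the iterator up to the first match,
    # so matched positions are strictly increasing.
    return all(letter in remaining for letter in t)
```

Note that the distance to a set is a maximum, not a sum, which is what makes Closest String a "radius" type of problem.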
We will analyze our local search string problems using the framework of parameterized complexity [5]. In parameterized complexity, problem instances are ordered pairs (x, k) ∈ {0, 1}∗ × N,
where x is a binary string that encodes the combinatorial input of the problem (in our case, a set
of strings and a few integers), and k is a non-negative integer which is referred to as the parameter.
For this paper, the main notion needed from parameterized complexity theory is the concept of reduction:
A parameterized reduction from a parameterized problem L to another parameterized problem L′
is an algorithm with running time f(k) · poly(|x|) for some computable function f, that maps an instance
(x, k) of L to an instance (x′, k′) of L′ such that:
(i) k′ ≤ g(k) for some computable function g, and
(ii) (x, k) ∈ L ⇐⇒ (x′, k′) ∈ L′.
Note that the running time of the reduction is allowed to be super-polynomial in k, yet the exponent
of the polynomial in |x| in this expression is required to be independent of k. Thus,
a running time of e.g. O(2^k · |x|^5) is allowed, while O(|x|^k) is not. If g is linearly bounded, i.e.
g(k) = O(k), then we say that such a reduction is a linear parameterized reduction.
Parameterized intractability is defined via circuit satisfiability problems. A constant-depth circuit is said to have weft t if the maximum number of gates with unbounded fan-in on any input-output path is t. The parameterized Weft-t Weighted SAT(k) problem asks to determine, given
a circuit of weft t and a parameter k, whether the circuit can be satisfied by an assignment of Hamming weight k (i.e. by setting k of its input gates to 1). For every t ≥ 1, the class W[t] is defined
as the set of all problems reducible, via a (not necessarily linear) parameterized reduction, to the
Weft-t Weighted SAT(k) problem. It is not difficult to show that W[1] ⊆ W[2] ⊆ · · ·, and it is
widely believed that all these inclusions are proper. In this paper we will only be concerned with
W[1] and W[2]. If there is a parameterized reduction from a W[1]-hard (W[2]-hard) problem to a
parameterized problem L, then this implies W[1]-hardness (W[2]-hardness) of L.
The hardness results in this paper are obtained by parameterized reductions from the following
three well-known problems which all have the solution size k as parameter: In the W[2]-hard
Multicolored Hitting Set(k) (MHS(k)) problem, the input is a hypergraph (V, E) and a
coloring function c : V → {c1 , . . . , ck } which assigns one of k colors to each vertex in V . The goal
is to determine whether there exists a size-k subset H ⊆ V with H ∩ E ≠ ∅ for all E ∈ E, such
that H is multicolored, that is, |{v ∈ H : c(v) = c_i}| = 1 for all c_i ∈ {c_1, . . . , c_k}. In the W[1]-hard Multicolored Independent Set(k) (MIS(k)) problem, the input is a graph (V, E) and
a coloring function c : V → {c1 , . . . , ck }, and the goal is to determine if (V, E) has a multicolored
independent set I ⊆ V . The W[1]-hard Multicolored Clique(k) (MC(k)) is defined similarly,
except the goal is to determine the existence of a multicolored clique instead of a multicolored
independent set.
While parameterized reductions can show hardness in the parameterized sense (e.g. W[1]-hardness or W[2]-hardness), a well-known result of Chen et al. [3] shows that linear reductions
can be used to obtain seemingly stronger results related to hardness in the classical world. This
is done via the so-called Exponential Time Hypothesis (ETH) of Impagliazzo and Paturi [11] which
states that the 3-SAT problem cannot be solved in 2^{o(n)} time. The most relevant part of the paper
by Chen et al. [3] to our work is stated in the lemma below. We refer the reader to [14] for a recent
survey on ETH-based running time lower bounds.
Lemma 1. Let L be a parameterized problem with parameter k, and assume that there is a linear
parameterized reduction from either MHS(k)¹, MIS(k), or MC(k) to L. Then unless ETH fails,
L cannot be solved by an algorithm running in n^{o(k)} time.
¹ For MHS(k), Chen et al. [3] do not explicitly make this statement, but it can be inferred using the standard simple reduction from Dominating Set.
3 Closest String
The first local search string problem we consider is a local search variant of the Closest String
problem. Let Σ denote some arbitrary alphabet, and n be a positive integer. In Closest String,
the input is a set T ⊆ Σ^n of strings and an integer d, and the goal is to determine whether there is
a string S ∈ Σ^n such that d_H(S, T) ≤ d. The local search variant of this problem that we consider
is defined as follows:
Local Search Closest String (LSCS):
Input: A set T := {T_1, . . . , T_m} ⊆ Σ^n of input strings, a temporary solution string S ∈ Σ^n
with d := d_H(S, T), and a nonnegative integer k.
Question: Is there a string S̃ of length n such that d_H(S̃, T) < d and d_H(S̃, S) ≤ k?
Thus, we are given a temporary solution string S, and we want to find a better solution S̃ in the
k-neighborhood of S, where this neighborhood is defined w.r.t. Hamming distance.
We denote the different parameterizations of this problem by appending the parameters to the
problem name in parentheses. Thus, LSCS(k), for instance, is the LSCS problem parameterized
by k. Observe that LSCS can be solved by a brute-force algorithm in O(n^{k+1} · m) time. It is also
not difficult to devise a bounded search tree algorithm with running time d^k · poly(n, m) using the
following observation: As long as S (or an intermediate solution) differs from some input string in at
least d positions, one of these positions in S must be changed. Achieving an f(m) · poly(n)-time
algorithm by modifying the Integer Linear Programming-based algorithm of Gramm et al. [9] is
also possible. Below, we will show that for the parameter k, one cannot substantially improve on
the brute-force algorithm in general, even in the case where the strings are binary.
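As an illustration of the brute-force algorithm mentioned above, the following Python sketch enumerates every string within Hamming distance k of S and tests whether it is strictly closer to the input set. The function name `lscs_brute_force` and its signature are assumptions made for this sketch; for an alphabet of size |Σ| it tries up to C(n, r) · (|Σ| − 1)^r candidates for each r ≤ k, and each candidate is checked in O(n · m) time.

```python
from itertools import combinations, product


def hamming(s, t):
    """Number of positions in which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))


def lscs_brute_force(strings, s, k, alphabet):
    """Return a string within distance k of s that is strictly closer to
    the input set than s, or None if no such string exists."""
    d = max(hamming(s, t) for t in strings)
    n = len(s)
    for r in range(1, k + 1):
        for positions in combinations(range(n), r):
            # every chosen position must really change its letter
            choices = [[a for a in alphabet if a != s[p]] for p in positions]
            for letters in product(*choices):
                cand = list(s)
                for p, a in zip(positions, letters):
                    cand[p] = a
                cand = "".join(cand)
                if max(hamming(cand, t) for t in strings) < d:
                    return cand
    return None
```

For example, with the inputs 1110 and 1101 and the temporary solution 0000, a single flip already improves the distance from 3 to 2, whereas for the inputs 0011 and 1100 no string at all beats distance 2.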
Theorem 1. There is a linear parameterized reduction from MHS(k) to LSCS(k + |Σ|).
Proof. Given an instance (V := {1, . . . , |V |}, E, c) of MHS(k) with c : V → {c1 , . . . , ck }, we create
an instance (T, S, k) of LSCS(k + |Σ|) as follows: For each E ∈ E, we create a string T_E of length
|V| for which T_E[v] := c(v) for all v ∈ E, and T_E[v] := 0 for all v ∈ V \ E. For each c ∈ {c_1, . . . , c_k},
we construct a string T_c with T_c = c^{|V|}. The set of strings T is then defined as T := {T_E : E ∈
E} ∪ {T_c : c ∈ {c_1, . . . , c_k}}, and the string S is set to S := 1^{|V|}. Thus, the input strings are over
the (k + 2)-letter alphabet Σ := {0, 1, c_1, . . . , c_k}, and d_H(S, T) = |V|.
Clearly, the above construction can be carried out in polynomial time. To see its correctness,
observe that given a multicolored hitting set H ⊆ V of size k for (V, E), we can construct a solution
string S̃ for (T, S, k) by setting S̃[v] = c(v) for all v ∈ H, and S̃[v] = 1 for all v ∈ V \ H. Indeed,
d_H(S̃, S) = k, and it is not difficult to verify that d_H(S̃, T) < |V| = d_H(S, T) for every T ∈ T.
Conversely, a solution string S̃ for (T, S, k) must include at least one occurrence of the letter c for
every color c ∈ {c_1, . . . , c_k}, since otherwise d_H(S̃, T_c) = |V|. Since d_H(S̃, S) ≤ k, this implies that
S̃ must include exactly one such occurrence for each c ∈ {c_1, . . . , c_k}. Furthermore, for every E ∈ E
we have d_H(S̃, T_E) < |V|, which implies that there is a position v ∈ V for which S̃[v] = T_E[v], which
in turn implies that v ∈ E. It therefore follows that the set H := {v : S̃[v] ≠ 1} is a multicolored
hitting set of size k for (V, E). ⊓⊔
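The construction in the proof of Theorem 1 is simple enough to sketch directly. The Python code below is an illustrative assumption-laden version: vertices are the integers 1, ..., |V|, strings are tuples of letters, color labels are arbitrary strings other than "0" and "1", and the names `theorem1_instance` and `solution_from_hitting_set` are hypothetical.

```python
def hamming(s, t):
    """Number of positions in which two equal-length tuples differ."""
    return sum(a != b for a, b in zip(s, t))


def theorem1_instance(num_vertices, hyperedges, color):
    """Build the input strings and the temporary solution S = 1^{|V|}.

    color maps each vertex 1..num_vertices to a label distinct from
    the letters "0" and "1" (an assumption of this sketch).
    """
    vertices = range(1, num_vertices + 1)
    # T_E carries c(v) on the vertices of E and 0 elsewhere
    edge_strings = [tuple(color[v] if v in e else "0" for v in vertices)
                    for e in hyperedges]
    # T_c is the constant string c^{|V|}
    color_strings = [tuple(c for _ in vertices)
                     for c in sorted(set(color.values()))]
    s = tuple("1" for _ in vertices)
    return edge_strings + color_strings, s


def solution_from_hitting_set(s, hitting_set, color):
    """Write c(v) at every position v of the hitting set, keep 1 elsewhere."""
    return tuple(color[v] if v in hitting_set else s[v - 1]
                 for v in range(1, len(s) + 1))
```

On a hypergraph with V = {1, 2, 3, 4}, hyperedges {2, 3}, {1}, {2, 4}, {1, 3}, and two colors, the multicolored hitting set {1, 2} yields a string at distance 2 from S whose distance to the input set drops from 4 to 3.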
We now modify the construction in the proof of Theorem 1 so that the input strings are all over
the binary alphabet Σ := {0, 1}. The difficulty for inputs with a binary alphabet is that flipping a
character from 0 to 1 not only means “moving towards” the input strings with a 1 at this position,
but also “moving away” from all other input strings. Let (V := {1, . . . , |V |}, E, c) be an instance
of MHS(k), and assume w.l.o.g. that |E| ≤ |V| − k for each E ∈ E. Set the individual input string
length to n := |V| + |V| · |E| + 2^k · |V| (here |E| denotes the number of hyperedges), and set the temporary solution S to 0^n. For each E ∈ E create
a string T_E of length n. For each v ∈ {1, . . . , |V|}, set T_E[v] := 1 if v ∈ E and T_E[v] := 0 otherwise.
Note that the Hamming distance between T_E[1, |V|] and S[1, |V|] is exactly |E| ≤ |V| − k. The
remaining positions are used to “pad” the distance between T_E and S to |V| − k. To this end,
assign each hyperedge E a unique number i between 1 and the number of hyperedges, and use the substring T_E[i · |V| + 1, (i + 1) · |V|] to pad the
distance between T_E and S; that is, set the first |V| − k − |E| positions in this substring to 1 and
all other positions in T_E[|V| + 1, n] to 0.
Next, add an additional set of strings which enforce that for each proper subset of colors C ⊂
{c_1, . . . , c_k}, the set of colors used by a solution string, C(S̃) := {c(v) : S̃[v] = 1}, is not C. Since we
enforce this for each proper subset, it will follow that C(S̃) = {c_1, . . . , c_k}. For each proper C ⊂
{c_1, . . . , c_k}, construct a string T_C such that, for each v ∈ {1, . . . , |V|}, we have T_C[v] = 0 if c(v) ∈ C
and T_C[v] = 1 otherwise. Note that the distance between S[1, |V|] and T_C[1, |V|] equals the number
of vertices in V not colored by a color in C. Pad the distance between T_C and S to |V| − |C|
by assigning T_C a unique number i ∈ {1, . . . , 2^k − 1}, and let x denote the number of positions v
in T_C[1, |V|] with T_C[v] = 0. Note that x ≥ |C| since for each color c ∈ C there is at least one vertex
colored c. Consequently, set the first x − |C| positions in T_C[|V| · (|E| + i) + 1, |V| · (|E| + i + 1)] to 1,
and all remaining unspecified positions to 0. Observe that in this way T_∅ = 1^{|V|} 0^{n−|V|}.
This concludes the construction of the set T of input strings, and the instance (T , S, k) of
LSCS(k). Clearly this construction can be performed in 2^k · poly(n, m) time, and therefore it is
a parameterized reduction. Furthermore, observe that dH (S, T ) = |V |, and that this distance is
obtained by the distance between S and T∅ . We prove the correctness of our reduction using the
following two lemmas.
Example 1. Suppose the hypergraph in the input instance to MHS(k) is given by V = {1, 2, 3, 4}
and E = {E1 = {2, 3}, E2 = {1}, E3 = {2, 4}, E4 = {1, 3}}. Also assume that k = 2, and vertices
{2, 3} are colored r (for red), and vertices {1, 4} are colored b (for blue). The set of strings T is
constructed as:
– T_1 = 0110 0000000000000000000000000000
– T_2 = 1000 0000100000000000000000000000
– T_3 = 0101 0000000000000000000000000000
– T_4 = 1010 0000000000000000000000000000
– T_∅ = 1111 0000000000000000000000000000
– T_{r} = 1001 0000000000000000000010000000
– T_{b} = 0110 0000000000000000000000001000
For readability purposes, we inserted above an artificial gap between the prefix and padding-suffix
of each string.
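The binary construction can be replayed on Example 1 with a short Python sketch. The function name `binary_lscs_instance`, the dictionary keys, and the order in which the proper color subsets are numbered (by size, then by the order of the `colors` list) are assumptions of this sketch. One observation: the strings printed in Example 1 have length 32, whereas the formula n := |V| + |V| · |E| + 2^k · |V| gives 36 for this instance; the sketch follows the formula, so its strings carry one additional all-zero block of length |V| at the end.

```python
from itertools import combinations


def binary_lscs_instance(num_vertices, hyperedges, color, colors):
    """Build the binary input strings (lists of 0/1) and S = 0^n.

    hyperedges: list of sets of vertices from 1..num_vertices
    color:      dict vertex -> color label
    colors:     list of the k color labels
    """
    nv, m, k = num_vertices, len(hyperedges), len(colors)
    n = nv + nv * m + 2 ** k * nv
    strings = {}
    for i, edge in enumerate(hyperedges, start=1):
        if len(edge) > nv - k:
            raise ValueError("each hyperedge needs at most |V| - k vertices")
        t = [0] * n
        for v in edge:
            t[v - 1] = 1
        # pad the distance to S up to |V| - k inside block number i
        for p in range(nv - k - len(edge)):
            t[i * nv + p] = 1
        strings[f"E{i}"] = t
    proper_subsets = [c for size in range(k) for c in combinations(colors, size)]
    for i, subset in enumerate(proper_subsets, start=1):
        t = [0] * n
        for v in range(1, nv + 1):
            if color[v] not in subset:
                t[v - 1] = 1
        x = sum(1 for v in range(1, nv + 1) if color[v] in subset)
        # pad the distance to S up to |V| - |C| inside block number m + i
        for p in range(x - len(subset)):
            t[nv * (m + i) + p] = 1
        strings["C{" + ",".join(subset) + "}"] = t
    return strings, [0] * n
```

Running it on the hypergraph of Example 1 reproduces the listed strings on their first 32 positions, and the string with 1's exactly at the positions of the multicolored hitting set {1, 2} has distance 3 < |V| to the input set.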
Lemma 2. If (V, E, c) ∈ MHS(k) then (T, S, k) ∈ LSCS(k).
Proof. Let H ⊆ V be a multicolored hitting set of size k for (V, E). Consider the string S̃ with S̃[v] =
1 if v ∈ H and S̃[v] = 0 otherwise. Clearly, d_H(S̃, S) = k. We show that d_H(S̃, T) < |V|; that is,
that S̃ is a solution to the instance (T, S, k). First, consider an arbitrary input string T_E ∈ T
corresponding to E ∈ E. Since d_H(T_E, S) = |V| − k and d_H(S̃, S) = k, the distance between S̃
and T_E can be at most |V|. This is the case if T_E has a 0 at each position v with S̃[v] = 1. Since H
is a hitting set, there is a vertex v ∈ H with v ∈ E, and for this position v we have S̃[v] = T_E[v] = 1.
Hence, d_H(T_E, S̃) < |V|.
Next, consider an arbitrary string T_C ∈ T corresponding to a proper subset C ⊂ {c_1, . . . , c_k}.
Clearly d_H(T_∅, S̃) = |V| − k < |V|. Assume thus that C ≠ ∅. By construction, d_H(S, T_C) = |V| − |C|.
As H is multicolored and of size k, for any color c ∈ {c_1, . . . , c_k} there is exactly one v ∈ H with
c(v) = c for which 0 = S[v] ≠ S̃[v] = 1. Thus, restricted to the positions that correspond to a
color c ∈ C, the distance between S̃ and T_C is increased by 1 compared to the distance between S
and T_C, since T_C[v] = 0 for all such positions v. For a color c ∉ C, this distance is decreased by
1, as T_C[v] = 1 for all such positions v ∈ {1, . . . , |V|} with c(v) = c. Since C is a proper subset of
{c_1, . . . , c_k}, it follows that the distance between T_C and S̃ is at most |V| − |C| + |C| − 1 < |V|. ⊓⊔
Lemma 3. If (T, S, k) ∈ LSCS(k) then (V, E, c) ∈ MHS(k).
Proof. Let S̃ be a solution string for (T, S, k) with d_H(S̃, T) < |V| and d_H(S̃, S) ≤ k. Then the
set of positions H := {v ∈ {1, . . . , n} : S̃[v] = 1} which are set to 1 in S̃ is of size at most k. Since
for each i ∈ {|V| + 1, . . . , n} there is at most one string T ∈ T with T[i] = 1, we can also assume
that H ⊆ V, since any 1 in the substring S̃[|V| + 1, n] can be matched only to a single T ∈ T, and
can thus be shifted to the substring S̃[1, |V|] while making sure that the distance from T is less
than |V|. (Note that if T[1, |V|] = S̃[1, |V|], then replacing all 1’s with 0’s in S̃[|V| + 1, n] always
results in a string of distance less than |V| to T.) We next argue that H is a multicolored subset of
V, and has size precisely k.
Assume not. Then the set C := {c(v) : v ∈ H} of colors in H is a proper subset of {c_1, . . . , c_k},
and so consider the string T_C ∈ T. By construction of T_C, it follows that the difference d_H(S, T_C) −
d_H(S̃, T_C) is decreased by 1 for every v ∈ H with c(v) ∈ C, and increased by 1 for every v ∈ H
with c(v) ∉ C. Since the color set of H is precisely C, we only have vertices of the first type, and so
d_H(S̃, T_C) = d_H(S, T_C) + |H| = |V| − |C| + |H| ≥ |V|,
contradicting the assumption that d_H(S̃, T) < |V|.
Thus, we can assume that H ⊆ V is multicolored, and has precisely k elements. Furthermore, H
is also a hitting set for (V, E): Since d_H(S̃, T_E) < |V| for each T_E, there is at least one position v ∈ H
with T_E[v] = 1 (otherwise, the distance between T_E and S̃ is at least |V|). ⊓⊔
The two lemmas above combined prove the correctness of our reduction. We therefore have the
following theorem.
Theorem 2. There is a linear parameterized reduction from MHS(k) to LSCS(k) for binary strings.
By combining Lemma 1 and Theorem 2, we obtain the following lower bound for LSCS restricted
to binary strings.
Corollary 1. LSCS(k) restricted to binary strings is W[2]-hard. Moreover, the problem has no
n^{o(k)}-time algorithm unless ETH fails.
4 Longest Common Subsequence
The Longest Common Subsequence (LCS) problem asks to determine whether an input set T
of strings has a string S of some specified length ℓ such that S is a subsequence of each string T ∈ T .
In this section we consider the following local search variant of this problem:
Local Search Longest Common Subsequence (LSLCS):
Input: A set T := {T_1, . . . , T_m} of input strings over an alphabet Σ, a temporary solution
string S such that S is a subsequence of each string in T, and a nonnegative integer k.
Question: Is there a letter σ ∈ Σ and a string S̃ of length |S| such that S̃σ is a subsequence
of each string in T and d_H(S̃, S) ≤ k?
Observe that LSLCS can be solved in n^{O(k)} time by brute-force. We show that substantial
improvements on this algorithm are unlikely, even in the case of constant-size alphabets. As a warm-up,
we begin with the very easy case of unbounded alphabets.
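The brute-force algorithm for LSLCS can be sketched as follows: try every string S̃ within Hamming distance k of S, append every possible letter σ, and test the subsequence condition against all input strings. The function name `lslcs_brute_force` and its signature are assumptions made for this illustration.

```python
from itertools import combinations, product


def is_subsequence(t, s):
    """True if t can be obtained from s by deleting some letters."""
    remaining = iter(s)
    return all(letter in remaining for letter in t)


def lslcs_brute_force(strings, s, k, alphabet):
    """Return a common subsequence of length |s| + 1 whose first |s| letters
    differ from s in at most k positions, or None if there is none."""
    n = len(s)
    for r in range(k + 1):
        for positions in combinations(range(n), r):
            choices = [[a for a in alphabet if a != s[p]] for p in positions]
            for letters in product(*choices):
                cand = list(s)
                for p, a in zip(positions, letters):
                    cand[p] = a
                for sigma in alphabet:
                    extended = cand + [sigma]
                    if all(is_subsequence(extended, t) for t in strings):
                        return "".join(extended)
    return None
```

The outer loops visit at most C(n, r) · (|Σ| − 1)^r · |Σ| candidates for each r ≤ k, which matches the n^{O(k)} bound for alphabets of size polynomial in n.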
Lemma 4. There is a linear parameterized reduction from LCS(ℓ) to LSLCS(k).
Proof. Let T := {T_1, . . . , T_m} be an input set of strings to the LCS(ℓ) problem. We create an
instance for LSLCS(k) by setting S := $^{ℓ−1}, where $ is a letter not appearing in any string of
T, and then setting T′ := {ST_1, T_2S, . . . , T_mS}. It can easily be verified that (T′, S) has a
solution S̃σ with d_H(S̃, S) ≤ k = ℓ − 1 iff T has a common subsequence of length ℓ. ⊓⊔
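The reduction of Lemma 4 is a one-liner per string. The sketch below uses the hypothetical name `lcs_to_lslcs`; it additionally assumes at least two input strings, so that the filler block sits in front of the first string and behind the others, as in the definition of T′.

```python
def lcs_to_lslcs(strings, ell, filler="$"):
    """Map an LCS instance (strings, ell) to an LSLCS instance.

    Returns (new_strings, s, k) with s = filler^(ell - 1) and k = ell - 1.
    """
    if len(strings) < 2:
        raise ValueError("this sketch assumes at least two input strings")
    if any(filler in t for t in strings):
        raise ValueError("the filler letter must not occur in the input")
    s = filler * (ell - 1)
    # S is prepended to T_1 and appended to all other strings
    new_strings = [s + strings[0]] + [t + s for t in strings[1:]]
    return new_strings, s, ell - 1
```

Placing the filler on opposite sides means that a common subsequence of length ℓ cannot mix filler letters with original letters, and the filler alone supplies only ℓ − 1 letters.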
We next proceed to the more involved case where |Σ| is part of the parameter. We present a
reduction from MIS(k) to LSLCS(k + |Σ|). Let (G = (V, E), c) denote an instance of MIS(k),
where G is a graph and c is a coloring function c : V → {c_1, . . . , c_k}. By padding (G, c), we can
assume w.l.o.g. that each color class in G has precisely n vertices, that is, |{v : c(v) = c_i}| = n for
each i ∈ {1, . . . , k}. The general idea of the reduction is to construct an instance (T , S, k ′ = k) of
LSLCS(k + |Σ|), with two enforcement strings T1 , T2 ∈ T , such that adding any letter at the end
of S forces modifications in S that correspond to the selection of k vertices of different colors in
G. We then add edge strings Te to T corresponding to edges e ∈ E in order to ensure that these
selected vertices form an independent set in G. The complete details are given below.
We begin by describing the solution string S. The string S consists of a suffix S* := ($£^{k+1}$)^{k+1},
where $ and £ are two letters of the alphabet that do not appear elsewhere in S. The prefix of S
consists of k substrings, or blocks, one for each color class. The substring S(c_i) corresponding to c_i
is defined as the string S(c_i) := →c_i (0#)^n ←c_i, where →c_i and ←c_i denote the two letters written as c_i with a right arrow and a left arrow on top. The whole string S is thus constructed as
S := S(c_1) · · · S(c_k) S*.
Next we construct the two enforcement strings T_1, T_2 ∈ T. The string T_1 contains the string S
as its suffix. Its prefix contains k blocks, one for each color class of G, where the i’th block T_1(c_i)
is defined as T_1(c_i) := →c_i (0#1#)^{n−1} ←c_i. The prefix of T_1 is separated from its suffix with the string
S* to form the string
T_1 := T_1(c_1) · · · T_1(c_k) S* S.
The string T_2 also contains k blocks, each corresponding to a color of G, where the block corresponding to c_i is constructed as T_2(c_i) := →c_i (01#)^n ←c_i. We concatenate all these blocks with the
suffix S* $ to obtain the string
T_2 := T_2(c_1) · · · T_2(c_k) S* $.
Finally, for each edge e ∈ E, we construct an input string T_e as follows. Assume that the vertices
in each color class are ordered. Let e be an edge between the x’th vertex of color c_i and the y’th
vertex of color c_j, where i < j. The string T_e consists of two blocks for each color class of G, defined
by
– T_e^1(c_i) := →c_i (01#)^{x−1} 0# (01#)^{n−x} ←c_i,
– T_e^2(c_j) := →c_j (01#)^{y−1} 0# (01#)^{n−y} ←c_j,
– T_e^2(c_i) := →c_i (01#)^n ←c_i,
– T_e^1(c_j) := →c_j (01#)^n ←c_j,
– T_e^1(c_ℓ) := T_e^2(c_ℓ) := →c_ℓ (01#)^n ←c_ℓ, for all ℓ ≠ i, j.
We then construct T_e by concatenating all these blocks, along with the suffix S* $ to form the string
T_e := T_e^1(c_1) · · · T_e^1(c_k) T_e^2(c_1) · · · T_e^2(c_k) S* $.
Setting T := {T1 , T2 } ∪ {Te : e ∈ E} completes the construction of our LSLCS(k + |Σ|) instance
(T, S, k). Observe that S is indeed a subsequence of all strings in T, and that Σ is an alphabet
of size 2k + 5 consisting of the letters →c_1, ←c_1, . . ., →c_k, ←c_k, 0, 1, #, $, and £. We now make two
observations that lead to the soundness and completeness of our reduction.
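To make the construction concrete, here is a hedged Python sketch that builds S, T_1, T_2 and the edge strings as lists of letters. The encoding is an assumption of the sketch: the arrow letters of color i are represented by the tuples (">", i) and ("<", i), colors are numbered 1, ..., k, a vertex is a pair (color, rank), and the helper names are hypothetical.

```python
def is_subsequence(t, s):
    """True if the letter list t can be obtained from s by deleting letters."""
    remaining = iter(s)
    return all(letter in remaining for letter in t)


def block(i, body):
    """A color block: opening arrow letter, body, closing arrow letter."""
    return [(">", i)] + body + [("<", i)]


def lslcs_instance(k, n, edges):
    """edges: pairs ((i, x), (j, y)) with colors i < j and ranks x, y in 1..n."""
    colors = range(1, k + 1)
    s_star = (["$"] + ["£"] * (k + 1) + ["$"]) * (k + 1)
    full = ["0", "1", "#"] * n

    def punctured(z):
        # (01#)^(z-1) 0# (01#)^(n-z): the z'th unit has lost its 1
        return ["0", "1", "#"] * (z - 1) + ["0", "#"] + ["0", "1", "#"] * (n - z)

    def row(bodies):
        # one block per color; colors not listed in bodies get (01#)^n
        return [a for i in colors for a in block(i, bodies.get(i, full))]

    s = [a for i in colors for a in block(i, ["0", "#"] * n)] + s_star
    t1 = [a for i in colors
          for a in block(i, ["0", "#", "1", "#"] * (n - 1))] + s_star + s
    t2 = row({}) + s_star + ["$"]
    edge_strings = [row({i: punctured(x)}) + row({j: punctured(y)}) + s_star + ["$"]
                    for (i, x), (j, y) in edges]
    return [t1, t2] + edge_strings, s


def candidate(s, n, picks):
    """Replace the x'th 0 of block i by 1 for each (i, x) in picks; append $."""
    cand = list(s)
    for i, x in picks.items():
        cand[(i - 1) * (2 * n + 2) + 1 + 2 * (x - 1)] = "1"
    return cand + ["$"]
```

On a toy instance with k = 2, n = 2 and a single edge between the first vertices of both colors, selecting the two second vertices gives a valid solution string, while selecting the two adjacent first vertices yields a string that is not a subsequence of the corresponding edge string.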
Lemma 5. Suppose that S̃σ is a solution string for the constructed instance (T, S, k). Then S̃σ =
S̃(c_1) · · · S̃(c_k) S* σ, where for each i ∈ {1, . . . , k}, the substring S̃(c_i) is obtained from S(c_i) by
replacing exactly one occurrence of the letter 0 with the letter 1.
Proof. Observe that no matter what σ is chosen to be, the string S* σ is not a subsequence of
S. Furthermore, it is not difficult to see that modifying any k letters in S* σ results in a string
which is still not a subsequence of S. Since S̃σ is a subsequence of T_1, this means that the suffix of
length |S*| + 1 in S̃σ must be a subsequence of S* S. Thus, the remaining prefix of S̃σ must be a
subsequence of T_1(c_1) · · · T_1(c_k). But as each T_1(c_i) contains one less occurrence of the letter 0 than
S(c_i), the only way for this prefix to be a subsequence of T_1(c_1) · · · T_1(c_k) is if S̃σ can be written as
S̃(c_1) · · · S̃(c_k) S* σ, where each S̃(c_i) is obtained from S(c_i) by replacing exactly one occurrence of
the letter 0 in S(c_i). Clearly, the only two possibilities are to replace the 0 with either 1 or #, since
otherwise S̃(c_i) would not be a subsequence of T_1(c_i). However, since S̃σ is also a subsequence of
T_2, the letter # cannot be chosen since T_2 does not contain the subsequence →c_i #^{n+1} ←c_i. ⊓⊔
According to Lemma 5 above, we can think of the positions in which S̃(c_1) · · · S̃(c_k) differs from
S(c_1) · · · S(c_k) as an encoding of the selection of k vertices, one for each color class of G. We refer to
these vertices as the set of vertices selected by S̃.
Lemma 6. The set I ⊆ V(G) of vertices selected by S̃ is a multicolored independent set in G.
Proof. Suppose that I contains two vertices which are adjacent by an edge e in G, and assume that
these vertices are the x’th vertex of color class c_i and the y’th vertex of color class c_j. Then by
Lemma 5, the solution string S̃σ contains the subsequence
→c_i (0#)^{x−1} 1# (0#)^{n−x} ←c_i →c_j (0#)^{y−1} 1# (0#)^{n−y} ←c_j.
However, this string is not a subsequence of T_e ∈ T, a contradiction. ⊓⊔
Lemma 6 implies that if (T, S, k) has a solution string S̃σ with d_H(S̃, S) ≤ k then G has a
multicolored independent set of size k. Conversely, if G has a multicolored independent set I of size
k, then it can readily be verified that by choosing S̃ such that the k vertices it selects are precisely
I, we have a solution string S̃$ for (T, S, k). Thus, since our construction can be carried out in
polynomial time, and since MIS(k) is W[1]-complete, we obtain Theorem 3 below.
Theorem 3. There is a linear parameterized reduction from MIS(k) to LSLCS(k + |Σ|).
We next sketch how to reduce the alphabet in our construction to constant size. For each
i ∈ {1, . . . , k}, replace the letters →c_i and ←c_i with the substrings p^{α(i)} and q^{α(i)} respectively, where
α(k) := 1 and α(i) := α(k) + · · · + α(i + 1) + 1 for i < k. The new alphabet is of size 7. It is not
difficult to verify that Lemma 5 still holds under this modification. The rest of the proof remains
unchanged.
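The recurrence for α can be evaluated directly, and doing so shows that it unfolds to α(i) = 2^{k−i}. The following Python sketch computes the values and applies the replacement to a string of letters; it assumes, purely for illustration, that the arrow letters of color i are represented by the tuples (">", i) and ("<", i), and the names `alpha` and `shrink_alphabet` are hypothetical.

```python
def alpha(k):
    """Values alpha(1..k) with alpha(k) = 1 and
    alpha(i) = alpha(k) + ... + alpha(i + 1) + 1 for i < k."""
    values = {k: 1}
    for i in range(k - 1, 0, -1):
        values[i] = sum(values[j] for j in range(i + 1, k + 1)) + 1
    return values


def shrink_alphabet(letters, k):
    """Replace (">", i) by p^alpha(i) and ("<", i) by q^alpha(i);
    all other letters are copied unchanged."""
    a = alpha(k)
    out = []
    for letter in letters:
        if isinstance(letter, tuple):
            direction, i = letter
            out.extend(("p" if direction == ">" else "q") * a[i])
        else:
            out.append(letter)
    return out
```

Since α(1) = 2^{k−1}, the replacement blows up each arrow letter by a factor that depends only on k, which is compatible with the f(k) · poly(|x|) time bound of a parameterized reduction.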
Corollary 2. LSLCS(k) restricted to strings over a constant-size alphabet is W[1]-hard. Moreover,
the problem has no n^{o(k)}-time algorithm unless ETH fails.
5 Shortest Common Supersequence
In this section, we consider a local search version of Shortest Common Supersequence (SCSeq). In SCSeq, the input is a set of strings T and an integer ℓ, and the question is whether there
exists a string S of length ℓ which is a supersequence of all strings in T . The local search variant
of this problem that we consider is given by:
Local Search Shortest Common Supersequence (LSSCSeq):
Input: A set T = {T1 , . . . , Tm } of strings over an alphabet Σ, a string S which is a supersequence of all Ti ’s, and a positive integer k.
Question: Is there a string S̃ of length |S| − 1 which is a supersequence of all T_i’s such
that d_H(S−, S̃) ≤ k?
In other words, the new solution supersequence S̃ is created from S by removing the last position
of S and modifying at most k remaining positions. The main result of this section is the theorem
below.
Theorem 4. There is a linear parameterized reduction from MIS(k) to LSSCSeq(k) restricted to
strings over an alphabet of constant size.
Let (G = (V, E), c) denote an arbitrary input of MIS(k) with c : V → {c1 , . . . , ck }. We assume
w.l.o.g. that there are n vertices colored ci , for each color ci ∈ {c1 , . . . , ck }, and that any pair of
vertices with equal color are adjacent in G. Furthermore, to ease our presentation, we assume that
the edges in G are directed; that is, E contains the two ordered pairs (u, v) and (v, u) for every
pair of adjacent vertices u and v in G.
In our construction, we use the first part of the supersequence S to encode the k color classes
of the vertex set of G. The main idea of the reduction is to construct an enforcement string T1
which is a suffix of S and, after the removal of the last letter of S, can only be matched to the
first part of S. This match forces that three positions of each color class are changed in the first
part of S. These positions correspond to the vertex of this color class that is in the independent
set. The remaining input strings are used to encode the edges of G, and to force that the changed
positions of each color class represent the same vertex from this class. For presentation purposes,
we construct strings over an alphabet of size O(k). The reduction could be adapted to the constant
alphabet case in a similar way as is done in Section 4. The details are as follows.
We begin by constructing the temporary solution S. First we create three substrings for each
color c_i ∈ {c_1, . . . , c_k}, which we refer to as selection blocks:
– S^1(c_i) := →c_i (01#)^n ←c_i,
– S^2(c_i) := →c_i (00#)^n ←c_i, and
– S^3(c_i) := →c_i (01#)^n ←c_i.
We construct S by concatenating the selection blocks, using the letter & to separate the three sets
of selection blocks. We then add a suffix to S: the string S* := ($£^n$)^{k+1}, followed by the
input string T_1 ∈ T which will be specified later. The string S is then given by
S := S^1(c_1) · · · S^1(c_k) & S^2(c_1) · · · S^2(c_k) & S^3(c_1) · · · S^3(c_k) S* T_1.
Next, we construct the input string T1 which is the first of two input strings that will act
as enforcement strings, enforcing the changes in S to occur in its selection blocks in a controlled
fashion. For ci ∈ {c1 , . . . , ck }, define
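– $T_1^1(c_i) := \overrightarrow{c_i}\,0^{n+1}\,\overleftarrow{c_i}$,
– $T_1^2(c_i) := \overrightarrow{c_i}\,1\,\overleftarrow{c_i}$, and
– $T_1^3(c_i) := \overrightarrow{c_i}\,0^{n+1}\,\overleftarrow{c_i}$.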
We construct $T_1$ using these substrings, the separation letter &, and the suffix $S^*$:
$T_1 := T_1^1(c_1) \cdots T_1^1(c_k) \,\&\, T_1^2(c_1) \cdots T_1^2(c_k) \,\&\, T_1^3(c_1) \cdots T_1^3(c_k)\, S^*$.
The second enforcement string T2 is constructed using the following substrings corresponding
to a color ci ∈ {c1 , . . . , ck }:
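– $T_2^1(c_i) := \overrightarrow{c_i}\,(0\#)^n\,\overleftarrow{c_i}$,
– $T_2^2(c_i) := \overrightarrow{c_i}\,(0\#)^n\,\overleftarrow{c_i}$, and
– $T_2^3(c_i) := \overrightarrow{c_i}\,(0\#)^n\,\overleftarrow{c_i}$.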
The string $T_2$ is then constructed as
$T_2 := T_2^1(c_1) \cdots T_2^1(c_k) \,\&\, T_2^2(c_1) \cdots T_2^2(c_k) \,\&\, T_2^3(c_1) \cdots T_2^3(c_k)\, S^* T_1^-$.
To complete the construction of T , we construct a string Te for each e ∈ E. These strings are
composed of substrings that correspond to vertices of G. Let v ∈ V with c(v) := ci , and assume v
is the x’th vertex of color ci . The string T (v) is defined by
$T(v) := \overrightarrow{c_i}\,(0\#)^{x-1}\,01\,(0\#)^{n-x}\,\overleftarrow{c_i}$.
The string Te is constructed as Te := T (u) & T (v) if e := (u, v) (recall that we assume that the
edges are directed, and that any pair of vertices with the same color are adjacent).
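The construction so far can be sketched in a few lines of Python on a toy graph. This is only an illustrative sketch, not part of the reduction: strings are modeled as lists of tokens, the tokens `>ci` and `<ci` stand for the two arrow-decorated color letters, and the names `build_instance`, `is_subsequence`, and the toy graph are assumptions made for the example. It checks that the temporary solution S is indeed a common supersequence of $T_1$, $T_2$, and all edge strings.

```python
def is_subsequence(t, s):
    """True if token list t is a subsequence of token list s (greedy scan)."""
    it = iter(s)
    return all(tok in it for tok in t)

def build_instance(k, n, edges):
    """Toy LSSCSeq instance. A vertex is a pair (i, x): the x-th vertex of
    color i (1-based). `edges` holds ordered vertex pairs.
    Returns (S, [T1, T2, edge strings...])."""
    def part(body):
        # concatenation over all colors of: >ci, body, <ci
        return [tok for i in range(1, k + 1)
                for tok in [f">c{i}"] + body + [f"<c{i}"]]
    amp = ["&"]
    s_star = (["$"] + ["£"] * n + ["$"]) * (k + 1)
    t1 = (part(["0"] * (n + 1)) + amp + part(["1"]) + amp
          + part(["0"] * (n + 1)) + s_star)
    s = (part(list("01#") * n) + amp + part(list("00#") * n) + amp
         + part(list("01#") * n) + s_star + t1)
    # t1[:-1] plays the role of T1 without its last letter
    t2 = (part(list("0#") * n) + amp + part(list("0#") * n) + amp
          + part(list("0#") * n) + s_star + t1[:-1])
    def t_vertex(v):
        i, x = v
        return ([f">c{i}"] + list("0#") * (x - 1) + list("01")
                + list("0#") * (n - x) + [f"<c{i}"])
    edge_strings = [t_vertex(u) + amp + t_vertex(v) for u, v in edges]
    return s, [t1, t2] + edge_strings

# Toy graph: k = 2 colors, n = 2 vertices per color, one cross-color edge,
# plus the assumed edges between equal-colored vertices, in both directions.
pairs = [((1, 1), (1, 2)), ((2, 1), (2, 2)), ((1, 1), (2, 1))]
edges = [e for a, b in pairs for e in ((a, b), (b, a))]
S, inputs = build_instance(2, 2, edges)
print(all(is_subsequence(t, S) for t in inputs))
```

Token lists are used instead of Python strings only so that multi-character letters such as `>c1` stay atomic.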
To finalize our construction, we set the parameter k ′ of the LSSCSeq instance to 3k. Clearly
the instance (T , S, k ′ ) can be constructed in polynomial time. We proceed to show that this instance
is equivalent to the MIS(k) instance. The first crucial step is given by the following lemma.
Lemma 7. If $(\mathcal{T}, S, k') \in$ LSSCSeq, then there exists a solution string $\tilde{S}$ for $(\mathcal{T}, S, k')$ where $\tilde{S}$
can be written as $\tilde{S} := S' S^* T_1^-$ with
$S' := \tilde{S}^1(c_1) \cdots \tilde{S}^1(c_k) \,\&\, \tilde{S}^2(c_1) \cdots \tilde{S}^2(c_k) \,\&\, \tilde{S}^3(c_1) \cdots \tilde{S}^3(c_k)$,
such that for each i ∈ {1, . . . , k} we have:
– $\tilde{S}^1(c_i)$ is obtained from $S^1(c_i)$ by replacing exactly one occurrence of 01 by 00,
– $\tilde{S}^2(c_i)$ is obtained from $S^2(c_i)$ by replacing exactly one occurrence of 00 by 01, and
– $\tilde{S}^3(c_i)$ is obtained from $S^3(c_i)$ by replacing exactly one occurrence of 01 by 00.
Proof. Consider the suffix $T_1^-$ of $S^-$. A careful examination shows that no matter what k changes
are made to $T_1^-$, the string $S^*$ will not be a subsequence of the resulting string. This means that
$S^*$ is not a subsequence of the length-$|T_1^-|$ suffix of $\tilde{S}$, and so it must be a subsequence of the
length-$|S^* T_1^-|$ suffix of $\tilde{S}$. Since $S^*$ is contained as a substring in the length-$|S^* T_1^-|$ suffix of
$\tilde{S}$, we can assume that $\tilde{S}$ can be written as $\tilde{S} := S' S^* T_1^-$, and that $S'$ contains as a subsequence
the prefix $T_1 - S^*$ of $T_1$. Since for each i ∈ {1, . . . , k}, this prefix contains three copies of each of $\overrightarrow{c_i}$
and $\overleftarrow{c_i}$, there must be three substrings $\tilde{S}^1(c_i)$, $\tilde{S}^2(c_i)$, and $\tilde{S}^3(c_i)$ in $S'$ that all begin with $\overrightarrow{c_i}$ and
end with $\overleftarrow{c_i}$. Moreover, all of these 3k substrings must be disjoint in $S'$. It is now not difficult to
see that the lemma follows due to our construction of the prefixes $T_1 - S^*$ and $T_2 - (S^* T_1^-)$ of our
two enforcement strings $T_1, T_2 \in \mathcal{T}$. □
Let $\tilde{S}$ be a solution string for $(\mathcal{T}, S, k')$ as in Lemma 7. We interpret the positions in $S'$ that differ
from $S^-$ as a set of selected vertices $\{v_1^1, v_1^2, v_1^3, \ldots, v_k^1, v_k^2, v_k^3\}$ of G, where for each i ∈ {1, . . . , k},
the vertex $v_i^1$ (resp. $v_i^2$, $v_i^3$) is the x-th vertex in $c_i$ if the x-th substring 01 (resp. 00, 01) in $S(c_i)$
is modified in $\tilde{S}(c_i)$. The next lemma shows that the set of selected vertices in fact includes only k
vertices.
Lemma 8. $v_i^1 = v_i^2 = v_i^3$ for each i ∈ {1, . . . , k}.
Proof. Consider some arbitrary i ∈ {1, . . . , k}, and assume that $v_i^1$, $v_i^2$, and $v_i^3$ are the x-th, y-th,
and z-th vertices colored $c_i$ in G, respectively. Let $\tilde{S}_i := \tilde{S}^1(c_i) \,\&\, \tilde{S}^2(c_i) \,\&\, \tilde{S}^3(c_i)$, where $\tilde{S}^1(c_i)$,
$\tilde{S}^2(c_i)$, and $\tilde{S}^3(c_i)$ are the substrings of $\tilde{S}$ given in Lemma 7. Since $\tilde{S}_i$ contains the only occurrences
of $\overrightarrow{c_i}$ and $\overleftarrow{c_i}$ in $\tilde{S}$, it must contain as a subsequence any input string $T_e \in \mathcal{T}$, with e being an edge
between two vertices colored $c_i$ in G. Suppose that $x \ne y$. Consider the string $T(v_i^1) \,\&\, T(v_i^2) \in \mathcal{T}$.
By construction, the only way this string can be a subsequence of $\tilde{S}_i$ is if y = z. But then the string
$T(v_i^2) \,\&\, T(v_i^1) \in \mathcal{T}$ is not a subsequence of $\tilde{S}_i$. It follows that x = y. A similar argument can be
used to show that x = z, and so $v_i^1 = v_i^2 = v_i^3$. □
According to Lemma 8, we let $v_i$ be the single vertex corresponding to $v_i^1 = v_i^2 = v_i^3$, giving
us a multicolored set $\{v_1, \ldots, v_k\}$ of vertices in G. The next lemma shows that this set must be
independent in G.
Lemma 9. The set of vertices $I := \{v_1, \ldots, v_k\}$ forms an independent set in G.
Proof. Suppose that there is an edge $e := \{v_i, v_j\}$ in G, i < j, and consider the substring
$\tilde{S}' := \tilde{S}^1(c_i)\tilde{S}^1(c_j) \,\&\, \tilde{S}^2(c_i)\tilde{S}^2(c_j) \,\&\, \tilde{S}^3(c_i)\tilde{S}^3(c_j)$. Since $\tilde{S}'$ contains the only occurrences of $\overrightarrow{c_i}$, $\overleftarrow{c_i}$, $\overrightarrow{c_j}$
and $\overleftarrow{c_j}$ in $\tilde{S}$, it must contain as a subsequence any input string $T_e \in \mathcal{T}$, with e being an edge
between two vertices colored $c_i$ and $c_j$ in G. But by construction, the string $\tilde{S}'$ does not contain as
a subsequence the string $T_e := T(v_i) \,\&\, T(v_j) \in \mathcal{T}$, a contradiction. □
Combining Lemmas 7, 8, and 9 shows that if $(\mathcal{T}, S, k')$ has a solution $\tilde{S}$, then G has a multicolored
independent set of size k. Conversely, starting at a multicolored independent set I in G, we can
construct a string $\tilde{S}$ corresponding to I as in Lemma 7 which can be easily verified to be a solution
string for $(\mathcal{T}, S, k')$. Thus our reduction is correct, and by reducing the alphabet size as in Section 4,
we obtain Theorem 4.
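The role of the edge strings in the converse direction can be illustrated on a toy instance. The sketch below is an assumption-laden illustration, not the authors' code: it builds only the prefix $S'$ (the suffix $S^* T_1^-$ is omitted), models strings as token lists with `>ci` and `<ci` standing for the arrow-decorated color letters, and uses hypothetical helper names. A selection is a map from each color to the index of its chosen vertex.

```python
def is_subsequence(t, s):
    """True if token list t is a subsequence of token list s."""
    it = iter(s)
    return all(tok in it for tok in t)

def part(k, n, pattern, changed, chosen):
    """One of the three parts of S': for each color i the block
    >ci (pattern #)^n <ci, where the chosen[i]-th copy of `pattern`
    is replaced by `changed` (no replacement if i is not in `chosen`)."""
    out = []
    for i in range(1, k + 1):
        out.append(f">c{i}")
        for x in range(1, n + 1):
            out += list(changed if chosen.get(i) == x else pattern) + ["#"]
        out.append(f"<c{i}")
    return out

def prefix(k, n, chosen):
    """The prefix S' for a selection; chosen={} gives the prefix of S itself."""
    return (part(k, n, "01", "00", chosen) + ["&"]
            + part(k, n, "00", "01", chosen) + ["&"]
            + part(k, n, "01", "00", chosen))

def t_vertex(v, n):
    i, x = v
    return ([f">c{i}"] + list("0#") * (x - 1) + list("01")
            + list("0#") * (n - x) + [f"<c{i}"])

def covers_all_edges(k, n, edges, chosen):
    s_mod = prefix(k, n, chosen)
    return all(is_subsequence(t_vertex(u, n) + ["&"] + t_vertex(v, n), s_mod)
               for u, v in edges)

def hamming(a, b):
    return sum(p != q for p, q in zip(a, b))

k, n = 2, 2
pairs = [((1, 1), (1, 2)), ((2, 1), (2, 2)), ((1, 1), (2, 1))]
edges = [e for a, b in pairs for e in ((a, b), (b, a))]
independent = {1: 1, 2: 2}   # vertices (1,1) and (2,2) are not adjacent
adjacent = {1: 1, 2: 1}      # vertices (1,1) and (2,1) are adjacent
print(hamming(prefix(k, n, independent), prefix(k, n, {})))
print(covers_all_edges(k, n, edges, independent),
      covers_all_edges(k, n, edges, adjacent))
```

On this toy instance the independent selection changes exactly 3k positions and keeps every edge string as a subsequence, whereas the adjacent selection loses the edge string of the edge between the two selected vertices.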
Corollary 3. LSSCSeq(k) restricted to strings over a constant-size alphabet is W[1]-hard, and
has no $n^{o(k)}$-time algorithm unless ETH fails.
6
Shortest Common Superstring
In this section we deal with a local search variant of the classical Shortest Common Superstring. In this problem, the input is a set of strings T and an integer ℓ, and the question is
whether there is a string S of length at most ℓ which is a superstring of all strings in T . The local
search version of Shortest Common Superstring is defined as follows:
Local Search Shortest Common Superstring (LSSCStr):
Input: A set T = {T1 , . . . , Tm } of strings over an alphabet Σ, a string S which is a superstring of all Ti ’s, and a positive integer k.
Question: Is there a string $\tilde{S}$ of length |S| − 1 which is a superstring of all $T_i$'s such that
$d_H(\tilde{S}, S^-) \le k$?
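For reference, the brute-force local improvement step for this problem can be sketched as follows. This is a minimal illustration under stated assumptions: $S^-$ is taken to be S without its last letter (the convention used in the reductions), and the function name `local_improvement` and its signature are hypothetical. It tries every way of changing at most k positions of $S^-$, so its running time is roughly $(|S| \cdot |\Sigma|)^{k}$ times a polynomial.

```python
from itertools import combinations, product

def local_improvement(strings, S, k, alphabet):
    """Brute-force LSSCStr: return a string of length |S|-1 that contains
    every element of `strings` as a substring and differs from S^- in at
    most k positions, or None if no such string exists."""
    base = S[:-1]                      # S^-: S without its last letter (assumed)
    letters = sorted(alphabet)
    for d in range(k + 1):             # exact number of changed positions
        for positions in combinations(range(len(base)), d):
            for repl in product(letters, repeat=d):
                if any(base[p] == c for p, c in zip(positions, repl)):
                    continue           # skip replacements that change nothing
                cand = list(base)
                for p, c in zip(positions, repl):
                    cand[p] = c
                cand = "".join(cand)
                if all(t in cand for t in strings):
                    return cand
    return None

print(local_improvement(["ab", "bc"], "abbc", 1, "abc"))
```

Here "abbc" is a superstring of "ab" and "bc"; changing one letter of its length-3 prefix "abb" yields the shorter superstring "abc".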
Theorem 5. LSSCStr(k) is W[1]-hard, even with an alphabet of constant size.
For ease of presentation, we describe here only the case that the alphabet size |Σ| is part of
the parameter. The case with constant-size alphabets can be handled by the method introduced
in Section 4. The reduction is from the W[1]-complete Multicolored Clique problem, where,
given a graph G = (V, E) and a coloring function c : V → {c1 , . . . , ck }, we ask for a multicolored
clique of size k. We assume w.l.o.g. that c is a proper coloring, i.e. there is no edge {u, v} between
vertices u and v with c(u) = c(v).
The basic idea is as follows. We construct an input string T0 , which is a suffix of the superstring S. After the removal of the last letter of S, T0 has to be matched to a substring S ′ in S which
encodes a vertex v from the first color class; this vertex represents the clique vertex selected from
this color class. In order to match T0 , some modifications have to be performed in S ′ . These modifications then force k input strings, which were previously matched to substrings of S ′ , to be matched
somewhere else in S. One of these input strings then forces modifications in another substring S ′′
of S, which corresponds to a vertex of the second color class. The other k − 1 input strings encode
pairs of color classes. Now, the modifications of S ′′ result in the selection of a vertex from the
second color class, which then triggers a selection of a vertex of the third color class and so on.
After “selecting” one vertex from each color class, there are in total k(k − 1) input strings which
encode color pairs, two strings for each color pair. If the selected vertices form a clique, then the
two strings that correspond to the same pair can be matched to a substring of S which encodes
the edge between the two selected vertices. In this case, one modification is sufficient for each pair
of color classes; otherwise, two modifications are necessary. Hence, these k(k − 1) color pair strings
can be matched with at most k(k − 1)/2 modifications if and only if the selected vertices form a
clique. The details are as follows.
The alphabet Σ consists of $k^* \cdot k(k-1) + 4k + 4$ letters with $k^* := 2^k (k^2 + k)$. The letters \$ and \#
are separating letters, where \$ does not occur in the input strings. The letters 0 and 1 are encoding
letters. The other letters correspond to colors and color pairs. For each color $c_i$, we have 4 letters: $a_i$,
$b_i$, $c_i$, and $d_i$. For each ordered color pair $(c_i, c_j)$ with $i \ne j$, there are $k^*$ letters, namely, $c_{i,j}^1, \ldots, c_{i,j}^{k^*}$.
Assume that each color class in G contains n vertices. The LSSCStr(k)-instance consists of the
superstring S and a set $\mathcal{T}$ of $1 + (k-1)k \cdot n + (k-1) \cdot n$ input strings: one special input string $T_0$, k
input strings for each vertex from color classes $c_1$ to $c_{k-1}$, and k − 1 input strings for each vertex
from the color class $c_k$.
To construct these strings, we first introduce some strings, which are used as “building blocks”
in the construction. First, we describe the “separating blocks”. For each color $c_i$ with 2 ≤ i ≤ k, we
introduce two such blocks: $A_i := a_i^{g(i)}$ and $B_i := b_i^{g(i)}$, where $g(i) := 2^{k-i} \cdot (k^2 + k)$. For each ordered
pair of colors $c_i$ and $c_j$ with $i \ne j$, we construct one separating block: $C_{i,j} := (c_{i,j}^1 \#)^n \cdots (c_{i,j}^{k^*} \#)^n$.
Moreover, we construct two “color-pair matching” blocks for each color ci :
– $M_i^1 := 0C_{i,1} \cdots 0C_{i,i-1}\, C_{i,i+1} \cdots C_{i,k}$, and
– $M_i^2 := C_{i,1} \cdots C_{i,i-1}\, C_{i,i+1} 0 \cdots C_{i,k} 0$.
Finally, for every vertex v we construct an “identifying block”. Let $c_i := c(v)$. Here we distinguish i = 1 and i > 1. Assume v is the x'th vertex colored $c_i$. The identifying block for v is
constructed as
– $I(v) := d_1\, 0^{x-1}\, 1\, 0^{n-x}\, d_1$ for i = 1, and
– $I(v) := d_i\, (0A_i)^{x-1}\, 1\, A_i\, (0A_i)^{n-x}\, d_i\, d_{i-1}$ for i > 1.
We are now ready to describe the set of input strings $\mathcal{T}$ in our LSSCStr instance. First, for
each vertex v colored $c_i$ with 1 ≤ i < k, we construct one “triggering” input string. If v is the x'th
vertex colored $c_i$, its triggering input string T(v) is constructed as:
$T(v) := c_{i+1}\, M_{i+1}^1\, d_{i+1}\, (0A_{i+1})^{x-1}\, 0\, B_{i+1}\, (0A_{i+1})^{n-x}\, d_{i+1}\, d_i$.
Then, for each vertex v colored $c_i$ with 1 ≤ i ≤ k, we add k − 1 “pairing” input strings, each
corresponding to a color class $c_j$ with $j \ne i$. Here, we distinguish i < j and i > j:
– $T(v, c_j) := C_{i,j+1} \cdots C_{i,k}\, I(v)\, C_{i,1} \cdots C_{i,i-1}\, C_{i,i+1} 0 \cdots C_{i,j} 0$ (i < j),
– $T(v, c_j) := 0C_{i,j} \cdots 0C_{i,i-1}\, C_{i,i+1} \cdots C_{i,k}\, I(v)\, C_{i,1} \cdots C_{i,j-1}$ (i > j).
To finalize our construction of T , we set the special input string T0 :
$T_0 := c_1\, M_1^1\, d_1\, 0^n\, d_1$.
Now, it remains to describe the temporary solution S. To this end, we introduce some further
building blocks. For each edge e = {u, v} ∈ E, where u is colored ci , and v is colored cj with i < j,
we construct one “edge block” S(e) for S as:
$S(e) := T(u, c_j)^-\; 1\; {}^-T(v, c_i)$,
where $T(u, c_j)^-$ as usual denotes the prefix of the pairing input string $T(u, c_j)$ without the last 0,
and ${}^-T(v, c_i)$ denotes the suffix of $T(v, c_i)$ without the first 0. Furthermore, for each vertex v ∈ V
colored $c_i$ in G we construct the selection block S(v) of v by:
– $S(v) := T(v)\, M_i^1\, I(v)\, M_i^2$ for i < k, and
– $S(v) := M_k^1\, I(v)\, M_k^2$ for i = k.
The solution S then consists of three parts, S := S(V )S(E)T0 , where the first part S(V ) is the
concatenation of the selection blocks S(v) separated by $’s in any arbitrary order, the second part
S(E) is the concatenation of edge blocks S(e) separated by $’s, and T0 is the special input string
described above.
Finally, we set the parameter for the LSSCStr-instance to $k' := 2k + k(k-1)/2 + (2^{k-1} - 1)(k^2 + k)$. It is easy to verify that S is a superstring of all input strings: The string $T_0$ occurs at the
end of S. Furthermore, for each vertex v ∈ V, the triggering input string T(v) is a prefix of S(v),
while the pairing strings $T(v, c_j)$ are clearly substrings of $M_i^1\, I(v)\, M_i^2$. We next turn to showing
the equivalence of the two instances.
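The substring claims made for the temporary solution can be spot-checked mechanically. The following sketch is only an illustration under assumptions: strings are token lists, a vertex is a pair (i, x) meaning the x'th vertex colored $c_i$, and all function names (`build`, `contains`, `edge_block`) are hypothetical. It builds the blocks defined above and checks that T(v) is a prefix of S(v), that each pairing string occurs in S(v), and that an edge block contains both of its pairing strings once its single 1 is changed to 0.

```python
def contains(s, t):
    """True if token list t occurs as a contiguous block of token list s."""
    return any(s[p:p + len(t)] == t for p in range(len(s) - len(t) + 1))

def build(k, n):
    g = lambda i: 2 ** (k - i) * (k * k + k)
    k_star = 2 ** k * (k * k + k)
    A = lambda i: [f"a{i}"] * g(i)
    B = lambda i: [f"b{i}"] * g(i)
    def C(i, j):       # (c^1_{i,j} #)^n ... (c^{k*}_{i,j} #)^n
        return [t for l in range(1, k_star + 1)
                for t in [f"c{i},{j}^{l}", "#"] * n]
    def M1(i):
        return ([t for j in range(1, i) for t in ["0"] + C(i, j)]
                + [t for j in range(i + 1, k + 1) for t in C(i, j)])
    def M2(i):
        return ([t for j in range(1, i) for t in C(i, j)]
                + [t for j in range(i + 1, k + 1) for t in C(i, j) + ["0"]])
    def I(i, x):       # identifying block
        if i == 1:
            return ["d1"] + ["0"] * (x - 1) + ["1"] + ["0"] * (n - x) + ["d1"]
        return ([f"d{i}"] + (["0"] + A(i)) * (x - 1) + ["1"] + A(i)
                + (["0"] + A(i)) * (n - x) + [f"d{i}", f"d{i - 1}"])
    def trigger(i, x):  # T(v), only defined for i < k
        j = i + 1
        return ([f"c{j}"] + M1(j) + [f"d{j}"] + (["0"] + A(j)) * (x - 1)
                + ["0"] + B(j) + (["0"] + A(j)) * (n - x) + [f"d{j}", f"d{i}"])
    def pairing(i, x, j):  # T(v, c_j)
        if i < j:
            return ([t for m in range(j + 1, k + 1) for t in C(i, m)] + I(i, x)
                    + [t for m in range(1, i) for t in C(i, m)]
                    + [t for m in range(i + 1, j + 1) for t in C(i, m) + ["0"]])
        return ([t for m in range(j, i) for t in ["0"] + C(i, m)]
                + [t for m in range(i + 1, k + 1) for t in C(i, m)] + I(i, x)
                + [t for m in range(1, j) for t in C(i, m)])
    def selection(i, x):   # S(v)
        core = M1(i) + I(i, x) + M2(i)
        return (trigger(i, x) if i < k else []) + core
    return trigger, pairing, selection

def edge_block(p, q, middle="1"):
    """T(u,c_j) without its last 0, then `middle`, then T(v,c_i) without its first 0."""
    return p[:-1] + [middle] + q[1:]

trigger, pairing, selection = build(3, 2)
p, q = pairing(1, 1, 2), pairing(2, 2, 1)
print(contains(edge_block(p, q, "0"), p), contains(edge_block(p, q, "0"), q))
```

The parameters k = 3 and n = 2 are arbitrary small values chosen so that all index ranges in the definitions are exercised.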
Lemma 10. If G has a multicolored clique K, then $(\mathcal{T}, S, k')$ has a solution string $\tilde{S}$.
Proof. Let $K := \{v_1, \ldots, v_k\}$, where $v_i$ is the vertex colored $c_i$ in K, and assume $v_i$ is the $x_i$'th
vertex colored $c_i$ in G. First, we change the only 1 in the identifying block $I(v_1)$ of the selection
block $S(v_1)$ to 0 and the last character, i.e. $d_1$, of the triggering block $T(v_1)$ of $S(v_1)$ to $c_1$. Now $T_0$
is a substring of the resulting selection block $\tilde{S}(v_1)$. Then, for every 1 < i ≤ k, we change the
only 1 in the identifying block $I(v_i)$ in $S(v_i)$ to 0 and change the $x_{i-1}$'th $A_i$-block in $I(v_i)$
to a $B_i$-block. Moreover, the letter $d_i$ at the end of the triggering block $T(v_i)$ is changed to $c_i$.
Now, the triggering input string $T(v_{i-1})$ created for $v_{i-1}$ is a substring of the resulting $\tilde{S}(v_i)$. Next,
consider the edges $e := \{v_i, v_j\}$, i < j, between the vertices in K. We change the only 1 in the
edge block S(e) in S to 0. After this, the pairing input strings $T(v_i, c_j)$ and $T(v_j, c_i)$ are substrings
of the resulting $\tilde{S}(e)$. After the modifications specified above, we get a string $\tilde{S}$ of length |S| − 1
which is a superstring of all input strings. Summing up, we performed two modifications for $v_1$,
$2 + |A_i|$ modifications for every i > 1, and one modification for each edge in K. This amounts
to $k' = 2k + (2^{k-1} - 1)(k^2 + k) + k(k-1)/2$. □
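The total stated at the end of the proof can be checked by summing the individual counts, using $|A_i| = g(i) = 2^{k-i}(k^2+k)$ and the geometric sum $\sum_{i=2}^{k} 2^{k-i} = 2^{k-1} - 1$:

```latex
\begin{align*}
2 + \sum_{i=2}^{k}\bigl(2 + |A_i|\bigr) + \binom{k}{2}
  &= 2k + (k^2+k)\sum_{i=2}^{k} 2^{k-i} + \frac{k(k-1)}{2} \\
  &= 2k + (2^{k-1}-1)(k^2+k) + \frac{k(k-1)}{2} \;=\; k'.
\end{align*}
```

The leading 2 accounts for $v_1$, the sum for the vertices $v_2, \ldots, v_k$, and the binomial coefficient for the edges of K.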
We next consider the reverse direction. Suppose that a solution $\tilde{S}$ exists for $(\mathcal{T}, S, k')$. We
use $\tilde{S}(v)$ to denote the substring of $\tilde{S}$ corresponding to the selection block S(v) of S.
Lemma 11. If there is a solution $\tilde{S}$ for the instance $(\mathcal{T}, S, k')$ constructed above, then the input string $T_0$ is a
substring of $\tilde{S}(v_1)$ for some $v_1 \in V$ with $c(v_1) = c_1$.
Proof. Since $M_1^1$ is a substring of $T_0$ and $M_1^1$ contains $(k-1) \cdot k^* > k'$ different letters $c_{1,j}^{\ell}$ with
$1 \le \ell \le k^*$ and $2 \le j \le k$, $T_0$ cannot be a suffix of $\tilde{S}$. For the same reason, $T_0$ cannot be a substring
of $\tilde{S}(v)$ for any vertex v colored $c_i$ with 2 ≤ i ≤ k. Moreover, since $0^n$ is a substring of $T_0$, $T_0$ cannot
be a substring of the edge blocks $\tilde{S}(e)$ of $\tilde{S}$. As \$ does not appear in $T_0$, this implies that $T_0$ has to
be a substring of some $\tilde{S}(v_1)$ with $c(v_1) = c_1$. □
Let $v_1$ be the vertex in Lemma 11. By construction, we have to match $M_1^1$ of $T_0$ to the $M_1^1$-substring
of $\tilde{S}(v_1)$. This implies that the letter 1 in the corresponding identifying block has to be
changed to 0. Moreover, the last letter $d_1$ of the corresponding triggering block must be changed
to $c_1$. These two changes force the pairing input strings and the triggering string for $v_1$ to be
matched somewhere else in $\tilde{S}$ than in S. We consider first the triggering string $T(v_1)$.
Lemma 12. The triggering input string $T(v_1)$ can only be matched to a substring of $\tilde{S}(v_2)$ for some
vertex $v_2$ with $c(v_2) = c_2$.
Proof. As in the proof of Lemma 11, the existence of $M_2^1$ in $T(v_1)$ prevents it from being matched to $\tilde{S}(v)$
corresponding to a vertex colored $c_i$, i > 2, or to the edge blocks in $\tilde{S}$. So the only possibilities
are the triggering blocks T(v) of $\tilde{S}(v)$ with $c(v) = c_1$ and $v \ne v_1$, or $\tilde{S}(u)$ for some vertex u
colored $c_2$. Note that in the first case, we have one $B_2$-block in T(v) and one $B_2$-block in $T(v_1)$, but
they have different positions. This implies that in order to match T(v) to $T(v_1)$, we have to perform at
least $2|B_2| = 2 \cdot 2^{k-2}(k^2 + k)$ modifications, more than the allowed number $k'$. □
As argued in the other proof direction, the matching of $T(v_1)$ to some $\tilde{S}(v_2)$ causes $2 + |A_2|$
modifications after which the triggering input string $T(v_2)$ and k − 1 pairing input strings are
unmatched. With a similar proof as for Lemma 12, $T(v_2)$ has to be matched to some $\tilde{S}(v_3)$ with
$c(v_3) = c_3$: after performing the $2 + |A_2| = 2 + 2^{k-2}(k^2 + k)$ changes to match $T(v_1)$, one cannot
afford to perform $2|B_3| = 2 \cdot 2^{k-3}(k^2 + k) = 2^{k-2}(k^2 + k)$ changes which are necessary for matching
$T(v_2)$ to the selection block of another vertex colored $c_2$. The same argument applies inductively
for all i > 2. In this way, the string $\tilde{S}$ differs from S in exactly k selection blocks corresponding to
a multicolored set of vertices $\{v_1, \ldots, v_k\}$ in G, and the Hamming distance between the remaining
suffixes of $S^-$ and $\tilde{S}$ is at most k(k − 1)/2.
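The counting behind Lemmas 11 and 12 and the inductive step can be checked numerically for small k. The sketch below is an illustrative sanity check only; the helper names `g`, `budget`, and `spent_through` are assumptions. It verifies that $(k-1) \cdot k^* > k'$, that the intended matching uses exactly $k'$ modifications once the k(k − 1)/2 edge modifications are added, and that at every stage a wrong match of a triggering string (costing at least $2|B_{i+1}|$) would exceed the budget.

```python
def g(i, k):
    """Length of the blocks A_i and B_i."""
    return 2 ** (k - i) * (k * k + k)

def budget(k):
    """The parameter k' of the constructed LSSCStr instance."""
    return 2 * k + k * (k - 1) // 2 + (2 ** (k - 1) - 1) * (k * k + k)

def spent_through(i, k):
    """Modifications used once T_0 and T(v_1), ..., T(v_{i-1}) are matched
    as intended: 2 for v_1 and 2 + |A_j| for each j = 2, ..., i."""
    return 2 + sum(2 + g(j, k) for j in range(2, i + 1))

for k in range(2, 13):
    k_star = 2 ** k * (k * k + k)
    assert (k - 1) * k_star > budget(k)                      # used in Lemma 11
    assert spent_through(k, k) + k * (k - 1) // 2 == budget(k)
    for i in range(1, k):
        # a wrong match of T(v_i) costs at least 2|B_{i+1}| on top
        assert spent_through(i, k) + 2 * g(i + 1, k) > budget(k)

print(budget(2), budget(3))
```

The range of k values is arbitrary; the inequalities follow algebraically for all k ≥ 2, and the loop only confirms them on small cases.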
Lemma 13. The set of vertices $\{v_1, \ldots, v_k\}$ specified above forms a clique in G.
Proof. As stated above, to get $T_0$ and the k − 1 triggering input strings matched with $\tilde{S}$, we
have to perform $(2^{k-1} - 1)(k^2 + k) + 2k$ modifications. Only k(k − 1)/2 modifications remain for
the k(k − 1) unmatched pairing strings. By construction, no pairing string is a substring of another
pairing string. Consequently, the only possibility is to group them into pairs, where each pair has
to form an edge, since then the pair can be matched to the corresponding edge block with only one
modification, namely, changing the 1 to 0 in the edge block. This implies that the vertices whose
selection blocks are changed from S to $\tilde{S}$ must form a clique. □
Combining all lemmas above completes the proof of Theorem 5 in the case that |Σ| is part of the parameter.
A similar modification to the one done in Section 4 provides the proof for the constant-size alphabet
case. We mention that in the presented reduction, there are parts of the superstring S that are not
matched to any input string, which is clearly not a feature of any minimal solution of Shortest
Common Superstring. The reduction can be adapted such that this is not the case: First, replace
each occurrence of \$ in S by one distinct string, for example, $\$1^i\$$ for the i'th occurrence, and add
an input string equal to this string. By doing this, all letters in S occur in the input strings. Next,
increase the length of T such that it contains all edge blocks as a prefix. Then, add a block in the
beginning of S that contains some unique starting letter followed by the edge blocks. Finally, add
one further input string that contains this unique starting letter and the edge blocks.
Corollary 4. Unless ETH fails, there is no $n^{o(\lg k)}$-time algorithm for LSSCStr(k), even when all
strings are over an alphabet of constant size.
7
Conclusions
In this paper we addressed the question of whether the brute-force procedure for the local improvement step in a local search algorithm can substantially be improved when applied to four
classical NP-hard string problems. In all cases we considered the Hamming distance as our metric
for defining the local neighborhood where the search is performed, and showed that under this
metric, the brute-force algorithm cannot be considerably improved for any of the four problems. There are,
however, many other interesting metrics that deserve consideration. Furthermore, there are many
other NP-hard string problems where local search can be applied. Finally, we mention that the
local neighborhood defined for LSLCS and LSSCSub can also be defined by removing a letter at
an arbitrary position from the old solution string $S_{\mathrm{old}}$ (and this is perhaps a more natural definition). However, our hardness results do not indicate a lower bound on the running time of the local
improvement step in this case; while we believe that such a lower bound holds, a proof is currently
missing.
References
1. E. H. L. Aarts and J. K. Lenstra. Local Search in Combinatorial Optimization. Wiley-Interscience, 1997.
2. E. Balas. New classes of efficiently solvable generalized traveling salesman problems. Annals of Operations
Research, 86:529–558, 1999.
3. J. Chen, B. Chor, M. Fellows, X. Huang, D. W. Juedes, I. A. Kanj, and G. Xia. Tight lower bounds for certain
parameterized NP-hard problems. Information and Computation, 201(2):216–231, 2005.
4. M. Dörnfelder, J. Guo, C. Komusiewicz, and M. Weller. On the parameterized complexity of consensus clustering.
In Proc. 22nd ISAAC, volume 7074 of Lecture Notes in Computer Science, pages 624–633. Springer, 2011.
5. R. G. Downey and M. R. Fellows. Parameterized Complexity. Springer, 1999.
6. M. R. Fellows, F. A. Rosamond, F. V. Fomin, D. Lokshtanov, S. Saurabh, and Y. Villanger. Local search: Is
brute-force avoidable? In Proc. 21st IJCAI, pages 486–491, 2009.
7. P. Festa and P. M. Pardalos. Efficient solutions for the far from most string problem. Annals of Operations
Research. To appear, available online.
8. F. V. Fomin, D. Lokshtanov, V. Raman, and S. Saurabh. Fast local search algorithm for weighted feedback arc
set in tournaments. In Proc. 24th AAAI. AAAI Press, 2010.
9. J. Gramm, R. Niedermeier, and P. Rossmanith. Fixed-parameter algorithms for closest string and related problems. Algorithmica, 37(1):25–42, 2003.
10. J. Guo, S. Hartung, R. Niedermeier, and O. Suchý. The parameterized complexity of local search for TSP, more
refined. In Proc. 22nd ISAAC, volume 7074 of Lecture Notes in Computer Science, pages 614–623. Springer,
2011.
11. R. Impagliazzo and R. Paturi. Complexity of k-SAT. In Proc. 14th CCC, pages 237–240, 1999.
12. D. S. Johnson, C. H. Papadimitriou, and M. Yannakakis. How easy is local search? J. Comput. Syst. Sci.,
37(1):79–100, 1988.
13. A. Krokhin and D. Marx. On the hardness of losing weight. ACM Transactions on Algorithms, 2012. To appear.
14. D. Lokshtanov, D. Marx, and S. Saurabh. Lower bounds based on the Exponential Time Hypothesis. Bulletin
of the EATCS, 105:41–71, 2011.
15. D. Marx. Searching the k-change neighborhood for TSP is W[1]-hard. Oper. Res. Lett., 36(1):31–36, 2008.
16. D. Marx and I. Schlotter. Stable assignment with couples: Parameterized complexity and local search. Discrete
Optimization, 8(1):25–40, 2011.
17. C. Meneses, C. A. S. Oliveira, and P. M. Pardalos. Optimization techniques for string selection and comparison
problems in genomics. IEEE Engineering in Medicine and Biology Magazine, 24(3):81–87, 2005.
18. S. Szeider. The parameterized complexity of k-flip local search for SAT and MAX SAT. Discrete Optimization,
8(1):139–145, 2011.