Local search for string problems: Brute-force is essentially optimal

Danny Hermelin

Jiong Guo^{1,⋆}, Danny Hermelin^2, Christian Komusiewicz^3

1 Universität des Saarlandes, Campus E 1.7, D-66123 Saarbrücken, Germany. jguo@mmci.uni-saarland.de
2 Industrial Management and Engineering Department, Ben-Gurion University, Beer Sheva, Israel. hermelin@bgu.ac.il
3 Institut für Softwaretechnik und Theoretische Informatik, Technische Universität Berlin, D-10587 Berlin, Germany. christian.komusiewicz@tu-berlin.de

Abstract. We address the problem of whether the brute-force procedure for the local improvement step in a local search algorithm can be substantially improved when applied to classical NP-hard string problems. We examine four of the more prominent problems in this domain: Closest String, Longest Common Subsequence, Shortest Common Supersequence, and Shortest Common Superstring. Herein, we consider arguably the most fundamental string distance measure, namely the Hamming distance, which has been applied in practical local search implementations for string problems. Our results indicate that for all four problems, the brute-force algorithm cannot be considerably improved.

Key words: Local Search, Parameterized Complexity, Parameterized Intractability, Closest String, Longest Common Subsequence, Shortest Common Supersequence, Shortest Common Superstring.

1 Introduction

Local search is a universal algorithmic approach for coping with computationally hard optimization problems. It is typically applied to problems which can be formulated as finding a solution maximizing or minimizing a criterion among a number of feasible solutions. The main idea is to start with some solution, then search inside the local neighborhood of this solution for a better solution until a locally optimal solution has been found. The hope is then that the locally optimal solution is almost as good as a globally optimal one.
See the book by Aarts and Lenstra [1] for further background and results concerning local search.

There are two main theoretical approaches to studying local search: PLS-completeness [12] and parameterized local search [15, 6]. PLS-completeness can be used to show that finding a locally optimal solution is computationally hard since many improvement steps might be needed until it has been found. In contrast, parameterized local search is concerned with the parameterized complexity of the problem of searching the local neighborhood of a solution in order to find a better solution. Usually the size of the neighborhood is n^{O(k)}, where n is the total input length, and k is a parameter measuring the “radius” of the neighborhood; that is, the maximum distance to the current solution. It is therefore natural to ask whether n^{O(k)} time is required for searching this neighborhood, or whether f(k) · poly(n) time can be achieved. This is precisely the main question underlying the theory of parameterized complexity [5]. In this context, this question translates to determining whether the improvement step problem is fixed-parameter tractable or not with respect to k.

⋆ Supported by the DFG excellence cluster MMCI.

There is substantial work in parameterized local search. For example, concerning the Traveling Salesman problem, Balas [2] showed that one can find, if it exists, a better tour with “shift” distance at most k to the old one in 4^k · poly(n) time. Marx [15] proved the non-existence of such an algorithm for the edge-exchange neighborhood. Subsequently, the complexity of local search for further neighborhood measures of Traveling Salesman was examined [10]. Fellows et al. [6] provided parameterized algorithms for local search variants of diverse graph problems such as r-Center, Vertex Cover, Odd Cycle Transversal, Max Cut, and Min-Bisection on planar graphs and proved W[1]-hardness for the general case. Fomin et al.
[8] considered the Feedback Arc Set in Tournaments problem and presented a subexponential-time algorithm for its edge-exchange local search version. Further, Marx and Schlotter [16] studied a variant of Stable Marriage with respect to local search. Further results concerning parameterized local search have been achieved for clustering problems [4], Boolean Constraint Satisfaction [13], and Satisfiability [18].

In this paper, we add a new realm to the study of parameterized local search by considering string problems. Stringology is one of the most widely studied areas in computer science, particularly motivated by direct applications in text mining and computational biology. Here, we consider four prominent NP-hard string problems: Closest String, Longest Common Subsequence, Shortest Common Supersequence, and Shortest Common Superstring. Local search seems to be a natural approach for dealing with string problems. For instance, a local search heuristic using the Hamming distance neighborhood has been implemented and evaluated with real-world data for a closely related variant of Closest String [7, 17].

We examine all four string problems above in the framework of parameterized local search. Herein, we consider the Hamming distance neighborhood of a temporary solution and prove that the local search versions of all these problems are W[1]-hard even on alphabets of constant size, with the Hamming distance k between the old and new solutions as parameter. Since the Hamming distance seems to be the simplest distance between strings, our results could serve as the basis for proving hardness for other distance neighborhoods. Moreover, for all problems except Shortest Common Supersequence, we can exclude the existence of algorithms with running time n^{o(k)}. Thus, for these three problems, the n^{O(k)}-time brute-force algorithm cannot be substantially improved and is essentially optimal.

2 Preliminaries

For a string S we write |S| to denote the length of S.
We use S[i], 1 ≤ i ≤ |S|, to denote the letter at position i in S and use S[i, j], 1 ≤ i < j ≤ |S|, to denote the substring S[i] ··· S[j] of S from position i to position j. A substring of the form S[i, |S|] is called a suffix of S, and a substring S[1, j] is called a prefix. For a given suffix T of S, we write S − T to denote the string S[1, |S| − |T|]. We use S− as a shorthand for S − S[|S|]. A string T is a subsequence of S if T can be obtained from S by deleting some letters; that is, if there exists a sequence of positions i_1 < ··· < i_{|T|} with S[i_j] = T[j] for all j ∈ {1, ..., |T|}. If T is a subsequence of S, then S is called a supersequence of T. The Hamming distance d_H(S, T) := |{i : S[i] ≠ T[i]}| between two strings S and T of equal length is defined as the number of positions in which the two strings differ. We define the Hamming distance of a string S to a set 𝒯 of strings as d_H(S, 𝒯) := max_{T ∈ 𝒯} d_H(S, T).

We will analyze our local search string problems using the framework of parameterized complexity [5]. In parameterized complexity, problem instances are ordered pairs (x, k) ∈ {0, 1}* × ℕ, where x is a binary string that encodes the combinatorial input of the problem (in our case, a set of strings and a few integers), and k is a non-negative integer which is referred to as the parameter. For this paper, the only notion from parameterized complexity theory that we need is the concept of reduction: A parameterized reduction from a parameterized problem L to another parameterized problem L′ is an algorithm with running time f(k) · poly(|x|) for some computable f, that maps an instance (x, k) of L to an instance (x′, k′) of L′ such that: (i) k′ ≤ g(k) for some computable g, and (ii) (x, k) ∈ L ⇐⇒ (x′, k′) ∈ L′. Note that the running time of the reduction is allowed to be super-polynomial in k, yet the exponent of the polynomial which depends on n in this expression is required to be independent of k. Thus, a running time of e.g.
O(2^k · |x|^5) is allowed, while O(|x|^k) is not. If g is linearly bounded, i.e. g(k) = O(k), then we say that such a reduction is a linear parameterized reduction.

Parameterized intractability is defined via circuit satisfiability problems. A constant-depth circuit is said to have weft t if the maximum number of gates with unbounded fan-in on any input-output path is t. The parameterized Weft-t Weighted SAT(k) problem asks to determine, given a circuit of weft t and a parameter k, whether the circuit can be satisfied by an assignment of Hamming weight k (i.e. by setting k of its input gates to 1). For every t ≥ 1, the class W[t] is defined as the set of all problems reducible, via a (not necessarily linear) parameterized reduction, to the Weft-t Weighted SAT(k) problem. It is not difficult to show that W[1] ⊆ W[2] ⊆ ···, and it is widely believed that all these inclusions are proper. In this paper we will only be concerned with W[1] and W[2]. If there is a parameterized reduction from a W[1]-hard (W[2]-hard) problem to a parameterized problem L, then this implies W[1]-hardness (W[2]-hardness) of L.

The hardness results in this paper are obtained by parameterized reductions from the following three well-known problems which all have the solution size k as parameter: In the W[2]-hard Multicolored Hitting Set(k) (MHS(k)) problem, the input is a hypergraph (V, ℰ) and a coloring function c : V → {c_1, ..., c_k} which assigns one of k colors to each vertex in V. The goal is to determine whether there exists a size-k subset H ⊆ V with H ∩ E ≠ ∅ for all E ∈ ℰ, such that H is multicolored, that is, |{v ∈ H : c(v) = c_i}| = 1 for all c_i ∈ {c_1, ..., c_k}. In the W[1]-hard Multicolored Independent Set(k) (MIS(k)) problem, the input is a graph (V, E) and a coloring function c : V → {c_1, ..., c_k}, and the goal is to determine if (V, E) has a multicolored independent set I ⊆ V.
The W[1]-hard Multicolored Clique(k) (MC(k)) problem is defined similarly, except the goal is to determine the existence of a multicolored clique instead of a multicolored independent set.

While parameterized reductions can show hardness in the parameterized sense (e.g. W[1]-hardness or W[2]-hardness), a well-known result of Chen et al. [3] shows that linear reductions can be used to obtain seemingly stronger results related to hardness in the classical world. This is done via the so-called Exponential Time Hypothesis (ETH) of Impagliazzo and Paturi [11], which states that the 3-SAT problem cannot be solved in 2^{o(n)} time. The most relevant part of the paper by Chen et al. [3] to our work is stated in the lemma below. We refer the reader to [14] for a recent survey on ETH-based running time lower bounds.

Lemma 1. Let L be a parameterized problem with parameter k, and assume that there is a linear parameterized reduction from either MHS(k)^1, MIS(k), or MC(k) to L. Then unless ETH fails, L cannot be solved by an algorithm running in n^{o(k)} time.

1 For MHS(k), Chen et al. [3] do not explicitly make this statement, but it can be inferred using the standard simple reduction from Dominating Set.

3 Closest String

The first local search string problem we consider is a local search variant of the Closest String problem. Let Σ denote some arbitrary alphabet, and n be a positive integer. In Closest String, the input is a set 𝒯 ⊆ Σ^n of strings and an integer d, and the goal is to determine whether there is a string S ∈ Σ^n such that d_H(S, 𝒯) ≤ d. The local search variant of this problem that we consider is defined as follows:

Local Search Closest String (LSCS):
Input: A set 𝒯 := {T_1, ..., T_m} ⊆ Σ^n of input strings, a temporary solution string S ∈ Σ^n with d := d_H(S, 𝒯), and a nonnegative integer k.
Question: Is there a string S̃ of length n such that d_H(S̃, 𝒯) < d and d_H(S̃, S) ≤ k?

Thus, we are given a temporary solution string S, and we want to find a better solution S̃ in the k-neighborhood of S, where this neighborhood is defined w.r.t. Hamming distance. We denote the different parameterizations of this problem by appending the parameters to the problem name in parentheses. Thus, LSCS(k), for instance, is the LSCS problem parameterized by k.

Observe that LSCS can be solved by a brute-force algorithm in O(n^{k+1} · m) time. It is also not difficult to devise a bounded search tree algorithm with running time d^k · poly(n, m) using the following observation: As long as S (or an intermediate solution) differs from some input string in at least d positions, one of these positions in S must be changed. Achieving an f(m) · poly(n)-time algorithm by modifying the Integer Linear Programming-based algorithm of Gramm et al. [9] is also possible. Below, we will show that for the parameter k, one cannot substantially improve on the brute-force algorithm in general, even in the case where the strings are binary.

Theorem 1. There is a linear parameterized reduction from MHS(k) to LSCS(k + |Σ|).

Proof. Given an instance (V := {1, ..., |V|}, ℰ, c) of MHS(k) with c : V → {c_1, ..., c_k}, we create an instance (𝒯, S, k) of LSCS(k + |Σ|) as follows: For each E ∈ ℰ, we create a string T_E of length |V| for which T_E[v] := c(v) for all v ∈ E, and T_E[v] := 0 for all v ∈ V \ E. For each c ∈ {c_1, ..., c_k}, we construct a string T_c with T_c = c^{|V|}. The set of strings 𝒯 is then defined as 𝒯 := {T_E : E ∈ ℰ} ∪ {T_c : c ∈ {c_1, ..., c_k}}, and the string S is set to S := 1^{|V|}. Thus, the input strings are over the (k + 2)-letter alphabet Σ := {0, 1, c_1, ..., c_k}, and d_H(S, 𝒯) = |V|. Clearly, the above construction can be carried out in polynomial time.
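The construction above, together with the brute-force search mentioned earlier in this section, can be sketched in Python as follows. This is only an illustrative sketch under our own assumptions: the function names, the representation of strings as tuples of symbols, and the use of the characters 'r' and 'b' as color letters are hypothetical choices and do not appear in the paper.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions in which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def radius(s, strings):
    """d_H(s, T): the maximum Hamming distance from s to a string in T."""
    return max(hamming(s, t) for t in strings)


def brute_force_lscs(strings, s, k, alphabet):
    """Search the Hamming k-neighborhood of s for a string of smaller radius.

    Enumerates every choice of at most k positions and every way of
    overwriting them, so it inspects n^{O(k)} candidates.
    """
    d = radius(s, strings)
    for r in range(1, k + 1):
        for positions in combinations(range(len(s)), r):
            for letters in product(alphabet, repeat=r):
                cand = list(s)
                for p, a in zip(positions, letters):
                    cand[p] = a
                if radius(cand, strings) < d:
                    return tuple(cand)
    return None


def mhs_to_lscs(num_vertices, hyperedges, color):
    """Build the LSCS instance of the construction above (assumed encoding).

    Vertices are 1..num_vertices; `color` maps each vertex to a color letter
    that must differ from the letters 0 and 1.
    """
    vertices = range(1, num_vertices + 1)
    colors = sorted(set(color.values()))
    strings = [tuple(color[v] if v in e else 0 for v in vertices)
               for e in hyperedges]                       # strings T_E
    strings += [tuple(c for _ in vertices) for c in colors]  # strings T_c
    s = tuple(1 for _ in vertices)                        # S = 1^{|V|}
    return strings, s, len(colors), [0, 1] + colors


if __name__ == "__main__":
    edges = [{2, 3}, {1}, {2, 4}, {1, 3}]
    coloring = {1: "b", 2: "r", 3: "r", 4: "b"}
    T, S, k, sigma = mhs_to_lscs(4, edges, coloring)
    print(brute_force_lscs(T, S, k, sigma))
```

On this small hypergraph the only multicolored hitting set is {1, 2}, and the positions in which the returned string differs from S are exactly these two vertices.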
To see its correctness, observe that given a multicolored hitting set H ⊆ V of size k for (V, ℰ), we can construct a solution string S̃ for (𝒯, S, k) by setting S̃[v] = c(v) for all v ∈ H, and S̃[v] = 1 for all v ∈ V \ H. Indeed, d_H(S̃, S) = k, and it is not difficult to verify that d_H(S̃, T) < |V| = d_H(S, T) for every T ∈ 𝒯. Conversely, a solution string S̃ for (𝒯, S, k) must include at least one occurrence of the letter c for every color c ∈ {c_1, ..., c_k}, since otherwise d_H(S̃, T_c) = |V|. Since d_H(S, S̃) ≤ k, this implies that S̃ must include exactly one such occurrence for each c ∈ {c_1, ..., c_k}. Furthermore, for every E ∈ ℰ we have d_H(S̃, T_E) < |V|, which implies that there is a position v ∈ V for which S̃[v] = T_E[v], which in turn implies that v ∈ E. It therefore follows that the set H := {v : S̃[v] ≠ 1} is a multicolored hitting set of size k for (V, ℰ). □

We now modify the construction in the proof of Theorem 1 so that the input strings are all over the binary alphabet Σ := {0, 1}. The difficulty for inputs with a binary alphabet is that flipping a character from 0 to 1 not only means “moving towards” the input strings with a 1 at this position, but also “moving away” from all other input strings.

Let (V := {1, ..., |V|}, ℰ, c) be an instance of MHS(k), and assume w.l.o.g. that |E| ≤ |V| − k for each E ∈ ℰ. Set the individual input string length to n := |V| + |V| · |ℰ| + 2^k · |V|, and set the temporary solution S to 0^n. For each E ∈ ℰ create a string T_E of length n. For each v ∈ {1, ..., |V|}, set T_E[v] := 1 if v ∈ E and T_E[v] := 0 otherwise. Note that the Hamming distance between T_E[1, |V|] and S[1, |V|] is exactly |E| ≤ |V| − k. The remaining positions are used to “pad” the distance between T_E and S to |V| − k. To this end, assign T_E a unique number i ∈ {1, . . .
, |ℰ|}, and use the substring T_E[i · |V| + 1, (i + 1) · |V|] to pad the distance between T_E and S; that is, set the first |V| − k − |E| positions in this substring to 1 and all other positions in T_E[|V| + 1, n] to 0.

Next, add an additional set of strings which enforce that for each proper subset of colors C ⊂ {c_1, ..., c_k}, the set of colors used by a solution string, C(S̃) := {c(v) : S̃[v] = 1}, is not C. Since we enforce this for each proper subset, it will follow that C(S̃) = {c_1, ..., c_k}. For each proper C ⊂ {c_1, ..., c_k}, construct a string T_C such that, for each v ∈ {1, ..., |V|}, we have T_C[v] = 0 if c(v) ∈ C and T_C[v] = 1 otherwise. Note that the distance between S[1, |V|] and T_C[1, |V|] equals the number of vertices in V not colored by a color in C. Pad the distance between T_C and S to |V| − |C| by assigning T_C a unique number i ∈ {1, ..., 2^k − 1}, and let x denote the number of positions v in T_C[1, |V|] with T_C[v] = 0. Note that x ≥ |C| since for each color c ∈ C there is at least one vertex colored c. Consequently, set the first x − |C| positions in T_C[|V| · (|ℰ| + i) + 1, |V| · (|ℰ| + i + 1)] to 1, and all remaining unspecified positions to 0. Observe that in this way T_∅ = 1^{|V|} 0^{n−|V|}.

This concludes the construction of the set 𝒯 of input strings, and the instance (𝒯, S, k) of LSCS(k). Clearly this construction can be performed in 2^k · poly(n, m) time, and therefore it is a parameterized reduction. Furthermore, observe that d_H(S, 𝒯) = |V|, and that this distance is attained by the distance between S and T_∅. We prove the correctness of our reduction using the following two lemmas.

Example 1. Suppose the hypergraph in the input instance to MHS(k) is given by V = {1, 2, 3, 4} and ℰ = {E_1 = {2, 3}, E_2 = {1}, E_3 = {2, 4}, E_4 = {1, 3}}. Also assume that k = 2, and vertices {2, 3} are colored r (for red), and vertices {1, 4} are colored b (for blue).
The set of strings 𝒯 is constructed as:

– T_1   = 0110 0000000000000000000000000000
– T_2   = 1000 0000100000000000000000000000
– T_3   = 0101 0000000000000000000000000000
– T_4   = 1010 0000000000000000000000000000
– T_∅   = 1111 0000000000000000000000000000
– T_{r} = 1001 0000000000000000000010000000
– T_{b} = 0110 0000000000000000000000001000

For readability purposes, we inserted above an artificial gap between the prefix and padding-suffix of each string.

Lemma 2. If (V, ℰ, c) ∈ MHS(k) then (𝒯, S, k) ∈ LSCS(k).

Proof. Let H ⊆ V be a multicolored hitting set of size k for (V, ℰ). Consider the string S̃ with S̃[v] = 1 if v ∈ H and S̃[v] = 0 otherwise. Clearly, d_H(S̃, S) = k. We show that d_H(S̃, 𝒯) < |V|; that is, that S̃ is a solution to the instance (𝒯, S, k). First, consider an arbitrary input string T_E ∈ 𝒯 corresponding to E ∈ ℰ. Since d_H(T_E, S) = |V| − k and d_H(S̃, S) = k, the distance between S̃ and T_E can be at most |V|. Equality holds only if T_E has a 0 at each position v with S̃[v] = 1. Since H is a hitting set, there is a vertex v ∈ H with v ∈ E, and for this position v we have S̃[v] = T_E[v] = 1. Hence, d_H(T_E, S̃) < |V|.

Next, consider an arbitrary string T_C ∈ 𝒯 corresponding to a proper subset C ⊂ {c_1, ..., c_k}. By construction, d_H(S, T_C) = |V| − |C|. Clearly d_H(T_∅, S̃) = |V| − k < |V|. Assume thus that C ≠ ∅. As H is multicolored and of size k, for any color c ∈ {c_1, ..., c_k} there is exactly one v ∈ H with c(v) = c for which 0 = S[v] ≠ S̃[v] = 1. Thus, restricted to the positions that correspond to a color c ∈ C, the distance between S̃ and T_C is increased by 1 compared to the distance between S and T_C, since T_C[v] = 0 for all such positions v. For a color c ∉ C, this distance is decreased by 1, as T_C[v] = 1 for all such positions v ∈ {1, ..., |V|} with c(v) = c. Since C is a proper subset of {c_1, ..., c_k}, it follows that the distance between T_C and S̃ is at most |V| − |C| + |C| − 1 < |V|. □

Lemma 3.
If (𝒯, S, k) ∈ LSCS(k) then (V, ℰ, c) ∈ MHS(k).

Proof. Let S̃ be a solution string for (𝒯, S, k) with d_H(S̃, 𝒯) < |V| and d_H(S̃, S) ≤ k. Then the set of positions H := {v ∈ {1, ..., n} : S̃[v] = 1} which are set to 1 in S̃ is of size at most k. Since for each i ∈ {|V| + 1, ..., n} there is at most one string T ∈ 𝒯 with T[i] = 1, we can also assume that H ⊆ V, since any 1 in the substring S̃[|V| + 1, n] can be matched only to a single T ∈ 𝒯, and can thus be shifted to the substring S̃[1, |V|] while making sure that the distance from T is less than |V|. (Note that if T[1, |V|] = S̃[1, |V|], then replacing all 1’s with 0’s in S̃[|V| + 1, n] always results in a string of distance less than |V| to T.)

We next argue that H is a multicolored subset of V, and has size precisely k. Assume not. Then the set C := {c(v) : v ∈ H} of colors in H is a proper subset of {c_1, ..., c_k}, and so consider the string T_C ∈ 𝒯. By construction of T_C, it follows that the difference d_H(S, T_C) − d_H(S̃, T_C) is decreased by 1 for every v ∈ H with c(v) ∈ C, and increased by 1 for every v ∈ H with c(v) ∉ C. Since the color set of H is precisely C, we only have vertices of the first type, and so

d_H(S̃, T_C) = d_H(S, T_C) + |H| = |V| − |C| + |H| ≥ |V|,

contradicting the assumption that d_H(S̃, 𝒯) < |V|.

Thus, we can assume that H ⊆ V is multicolored, and has precisely k elements. Furthermore, H is also a hitting set for (V, ℰ): Since d_H(S̃, T_E) < |V| for each T_E, there is at least one position v ∈ H with T_E[v] = 1 (otherwise, the distance between T_E and S̃ is at least |V|). □

The two lemmas above combined prove the correctness of our reduction. We therefore have the following theorem.

Theorem 2. There is a linear parameterized reduction from MHS(k) to LSCS(k) for binary strings.

By combining Lemma 1 and Theorem 2, we obtain the following lower bound for LSCS restricted to binary strings.

Corollary 1.
LSCS(k) restricted to binary strings is W[2]-hard. Moreover, the problem has no n^{o(k)}-time algorithm unless ETH fails.

4 Longest Common Subsequence

The Longest Common Subsequence (LCS) problem asks to determine whether an input set 𝒯 of strings has a string S of some specified length ℓ such that S is a subsequence of each string T ∈ 𝒯. In this section we consider the following local search variant of this problem:

Local Search Longest Common Subsequence (LSLCS):
Input: A set 𝒯 := {T_1, ..., T_m} of input strings over an alphabet Σ, a temporary solution string S such that S is a subsequence of each string in 𝒯, and a nonnegative integer k.
Question: Is there a letter σ ∈ Σ and a string S̃ of length |S| such that S̃σ is a subsequence of each string in 𝒯 and d_H(S̃, S) ≤ k?

Observe that LSLCS can be solved in n^{O(k)} time by brute force. We show that it is unlikely that this algorithm can be substantially improved, even in the case of constant-size alphabets. As a warm-up, we begin with the very easy case of unbounded alphabets.

Lemma 4. There is a linear parameterized reduction from LCS(ℓ) to LSLCS(k).

Proof. Let 𝒯 := {T_1, ..., T_m} be an input set of strings to the LCS(ℓ) problem. We create an instance for LSLCS(k) by setting S := $^{ℓ−1}, where $ is a letter not appearing in any string of 𝒯, and then setting 𝒯′ := {S T_1, T_2 S, ..., T_m S}. It can easily be verified that (𝒯′, S) has a solution S̃σ with d_H(S̃, S) ≤ k = ℓ − 1 iff 𝒯 has a common subsequence of length ℓ. □

We next proceed to the more involved case where |Σ| is part of the parameter. We present a reduction from MIS(k) to LSLCS(k + |Σ|). Let (G = (V, E), c) denote an instance of MIS(k), where G is a graph and c is a coloring function c : V → {c_1, ..., c_k}. By padding (G, c), we can assume w.l.o.g. that each color class in G has precisely n vertices, that is, |{v : c(v) = c_i}| = n for each i ∈ {1, ..., k}.
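Returning briefly to the warm-up reduction of Lemma 4, the following Python sketch builds the LSLCS instance from an LCS instance and checks it against brute-force solvers for both problems. It is meant only as an illustration: all function names are hypothetical, strings are ordinary Python strings, and we assume at least two input strings and that the filler letter '$' does not occur in them.

```python
from itertools import combinations, product


def is_subsequence(t, s):
    """True if t can be obtained from s by deleting letters."""
    it = iter(s)
    return all(ch in it for ch in t)


def has_common_subsequence(strings, length):
    """Brute-force LCS test: try each length-`length` subsequence of strings[0]."""
    first = strings[0]
    for idx in combinations(range(len(first)), length):
        cand = "".join(first[i] for i in idx)
        if all(is_subsequence(cand, t) for t in strings):
            return True
    return False


def lcs_to_lslcs(strings, length, filler="$"):
    """Lemma 4 construction: S = filler^(length-1), T' = {S T_1, T_2 S, ..., T_m S}."""
    assert len(strings) >= 2 and all(filler not in t for t in strings)
    s = filler * (length - 1)
    new_strings = [s + strings[0]] + [t + s for t in strings[1:]]
    return new_strings, s, length - 1


def brute_force_lslcs(strings, s, k, alphabet):
    """Try every string within Hamming distance k of s and every appended letter."""
    for r in range(k + 1):
        for positions in combinations(range(len(s)), r):
            for letters in product(alphabet, repeat=r):
                cand = list(s)
                for p, a in zip(positions, letters):
                    cand[p] = a
                for sigma in alphabet:
                    ext = "".join(cand) + sigma
                    if all(is_subsequence(ext, t) for t in strings):
                        return ext
    return None


if __name__ == "__main__":
    inputs = ["abcab", "bacb", "acbb"]
    for ell in (3, 4):
        new_strings, s, k = lcs_to_lslcs(inputs, ell)
        print(ell, has_common_subsequence(inputs, ell),
              brute_force_lslcs(new_strings, s, k, "abc$"))
```

The reason the filler is placed in front of T_1 but behind the other strings is that a common subsequence of the new strings cannot mix filler letters with ordinary letters, so any improved solution has to replace all ℓ − 1 fillers.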
The general idea of the reduction is to construct an instance (𝒯, S, k′ = k) of LSLCS(k + |Σ|), with two enforcement strings T_1, T_2 ∈ 𝒯, such that adding any letter at the end of S forces modifications in S that correspond to the selection of k vertices of different colors in G. We then add edge strings T_e to 𝒯 corresponding to edges e ∈ E in order to ensure that these selected vertices form an independent set in G. The complete details are given below.

We begin by describing the solution string S. The string S consists of a suffix S* := ($£^{k+1}$)^{k+1}, where $ and £ are two letters of the alphabet that do not appear elsewhere in S. The prefix of S consists of k substrings, or blocks, one for each color class. The substring S(c_i) corresponding to c_i is defined as the string S(c_i) := →c_i (0#)^n ←c_i. The whole string S is thus constructed as

S := S(c_1) ··· S(c_k) S*.

Next we construct the two enforcement strings T_1, T_2 ∈ 𝒯. The string T_1 contains the string S as its suffix. Its prefix contains k blocks, one for each color class of G, where the i-th block T_1(c_i) is defined as T_1(c_i) := →c_i (0#1#)^{n−1} ←c_i. The prefix of T_1 is separated from its suffix with the string S* to form the string

T_1 := T_1(c_1) ··· T_1(c_k) S* S.

The string T_2 also contains k blocks, each corresponding to a color of G, where the block corresponding to c_i is constructed as T_2(c_i) := →c_i (01#)^n ←c_i. We concatenate all these blocks with the suffix S*$ to obtain the string

T_2 := T_2(c_1) ··· T_2(c_k) S* $.

Finally, for each edge e ∈ E, we construct an input string T_e as follows. Assume that the vertices in each color class are ordered. Let e be an edge between the x-th vertex of color c_i and the y-th vertex of color c_j, where i < j.
The string T_e consists of two blocks for each color class of G, defined by

– T_e^1(c_i) := →c_i (01#)^{x−1} 0# (01#)^{n−x} ←c_i,
– T_e^2(c_j) := →c_j (01#)^{y−1} 0# (01#)^{n−y} ←c_j,
– T_e^2(c_i) := →c_i (01#)^n ←c_i,
– T_e^1(c_j) := →c_j (01#)^n ←c_j,
– T_e^1(c_ℓ) := T_e^2(c_ℓ) := →c_ℓ (01#)^n ←c_ℓ, for all ℓ ≠ i, j.

We then construct T_e by concatenating all these blocks, along with the suffix S*$, to form the string

T_e := T_e^1(c_1) ··· T_e^1(c_k) T_e^2(c_1) ··· T_e^2(c_k) S* $.

Setting 𝒯 := {T_1, T_2} ∪ {T_e : e ∈ E} completes the construction of our LSLCS(k + |Σ|) instance (𝒯, S, k). Observe that S is indeed a subsequence of all strings in 𝒯, and that Σ is an alphabet of size 2k + 5 consisting of the letters →c_1, ←c_1, ..., →c_k, ←c_k, 0, 1, #, $, and £. We now make two observations that lead to the soundness and completeness of our reduction.

Lemma 5. Suppose that S̃σ is a solution string for the constructed instance (𝒯, S, k). Then S̃σ = S̃(c_1) ··· S̃(c_k) S* σ, where for each i ∈ {1, ..., k}, the substring S̃(c_i) is obtained from S(c_i) by replacing exactly one occurrence of the letter 0 with the letter 1.

Proof. Observe that no matter what σ is chosen to be, the string S*σ is not a subsequence of S. Furthermore, it is not difficult to see that modifying any k letters in S*σ results in a string which is still not a subsequence of S. Since S̃σ is a subsequence of T_1, this means that the suffix of length |S*| + 1 in S̃σ must be a subsequence of S*S. Thus, the remaining prefix of S̃σ must be a subsequence of T_1(c_1) ··· T_1(c_k). But as each T_1(c_i) contains one less occurrence of the letter 0 than S(c_i), the only way for this prefix to be a subsequence of T_1(c_1) ··· T_1(c_k) is if S̃σ can be written as S̃(c_1) ··· S̃(c_k) S* σ, where each S̃(c_i) is obtained from S(c_i) by replacing exactly one occurrence of the letter 0 in S(c_i).
Clearly, the only two possibilities are to replace the 0 with either 1 or #, since otherwise S̃(c_i) would not be a subsequence of T_1(c_i). However, since S̃σ is also a subsequence of T_2, the letter # cannot be chosen since T_2 does not contain the subsequence →c_i #^{n+1} ←c_i. □

According to Lemma 5 above, we can think of the positions in which S̃(c_1) ··· S̃(c_k) differs from S(c_1) ··· S(c_k) as encoding the selection of k vertices, one for each color class of G. We refer to these vertices as the set of vertices selected by S̃.

Lemma 6. The set I ⊆ V(G) of vertices selected by S̃ is a multicolored independent set in G.

Proof. Suppose that I contains two vertices which are adjacent by an edge e in G, and assume that these vertices are the x-th vertex of color class c_i and the y-th vertex of color class c_j. Then by Lemma 5, the solution string S̃σ contains the subsequence

→c_i (0#)^{x−1} 1# (0#)^{n−x} ←c_i →c_j (0#)^{y−1} 1# (0#)^{n−y} ←c_j.

However, this string is not a subsequence of T_e ∈ 𝒯, a contradiction. □

Lemma 6 implies that if (𝒯, S, k) has a solution string S̃σ with d_H(S̃, S) ≤ k then G has a multicolored independent set of size k. Conversely, if G has a multicolored independent set I of size k, then it can readily be verified that by choosing S̃ such that the k vertices it selects are precisely I, we have a solution string S̃$ for (𝒯, S, k). Thus, since our construction can be carried out in polynomial time, and since MIS(k) is W[1]-complete, we obtain Theorem 3 below.

Theorem 3. There is a linear parameterized reduction from MIS(k) to LSLCS(k + |Σ|).

We next sketch how to reduce the alphabet in our construction to constant size. For each i ∈ {1, ..., k}, replace the letters →c_i and ←c_i with the substrings p^{α(i)} and q^{α(i)} respectively, where α(k) := 1 and α(i) := α(k) + ··· + α(i + 1) + 1 for i < k. The new alphabet is of size 7. It is not difficult to verify that Lemma 5 still holds under this modification.
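As a small check on the exponents used in this alphabet reduction, the recurrence for α unfolds to a closed form. The derivation below uses only the definition of α given above; reading the result as "each marker block is longer than all later marker blocks combined" is our own interpretation and is not stated in the paper.

```latex
\begin{align*}
\alpha(k) &= 1 = 2^{0},\\
\alpha(i) &= 1 + \sum_{j=i+1}^{k} \alpha(j)
           = 1 + \sum_{j=i+1}^{k} 2^{k-j}
           = 1 + \bigl(2^{k-i} - 1\bigr)
           = 2^{k-i} \qquad (i < k),
\end{align*}
% where the second equality uses the induction hypothesis
% \alpha(j) = 2^{k-j} for all j > i. In particular,
% \alpha(i) > \sum_{j=i+1}^{k} \alpha(j) for every i.
```

For example, with k = 3 the markers for c_1, c_2, c_3 become p^4, p^2, p^1 (and q^4, q^2, q^1), and the seven letters of the new alphabet are p, q, 0, 1, #, $, and £.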
The rest of the proof remains unchanged.

Corollary 2. LSLCS(k) restricted to strings over a constant-size alphabet is W[1]-hard. Moreover, the problem has no n^{o(k)}-time algorithm unless ETH fails.

5 Shortest Common Supersequence

In this section, we consider a local search version of Shortest Common Supersequence (SCSeq). In SCSeq, the input is a set of strings 𝒯 and an integer ℓ, and the question is whether there exists a string S of length ℓ which is a supersequence of all strings in 𝒯. The local search variant of this problem that we consider is given by:

Local Search Shortest Common Supersequence (LSSCSeq):
Input: A set 𝒯 = {T_1, ..., T_m} of strings over an alphabet Σ, a string S which is a supersequence of all T_i’s, and a positive integer k.
Question: Is there a string S̃ of length |S| − 1 which is a supersequence of all T_i’s such that d_H(S−, S̃) ≤ k?

In other words, the new solution supersequence S̃ is created from S by removing the last position of S and modifying at most k remaining positions. The main result of this section is the theorem below.

Theorem 4. There is a linear parameterized reduction from MIS(k) to LSSCSeq(k) restricted to strings over an alphabet of constant size.

Let (G = (V, E), c) denote an arbitrary input of MIS(k) with c : V → {c_1, ..., c_k}. We assume w.l.o.g. that there are n vertices colored c_i, for each color c_i ∈ {c_1, ..., c_k}, and that any pair of vertices with equal color are adjacent in G. Furthermore, to ease our presentation, we assume that the edges in G are directed; that is, E contains the two ordered pairs (u, v) and (v, u) for every pair of adjacent vertices u and v in G.

In our construction, we use the first part of the supersequence S to encode the k color classes of the vertex set of G. The main idea of the reduction is to construct an enforcement string T_1 which is a suffix of S and, after the removal of the last letter of S, can only be matched to the first part of S.
This match forces three positions of each color class to be changed in the first part of S. These positions correspond to the vertex of this color class that is in the independent set. The remaining input strings are used to encode the edges of G, and to force the changed positions of each color class to represent the same vertex from this class. For presentation purposes, we construct strings over an alphabet of size O(k). The reduction could be adapted to the constant alphabet case in a similar way as is done in Section 4. The details are as follows.

We begin by constructing the temporary solution S. First we create three substrings for each color c_i ∈ {c_1, ..., c_k}, which we refer to as selection blocks:

– S^1(c_i) := →c_i (01#)^n ←c_i,
– S^2(c_i) := →c_i (00#)^n ←c_i, and
– S^3(c_i) := →c_i (01#)^n ←c_i.

We construct S by concatenating the selection blocks, using the letter & to separate the three sets of selection blocks. We then add a suffix to S: the string S* := ($£^n$)^{k+1} concatenated to the input string T_1 ∈ 𝒯, which will be specified later. The string S is then given by

S := S^1(c_1) ··· S^1(c_k) & S^2(c_1) ··· S^2(c_k) & S^3(c_1) ··· S^3(c_k) S* T_1.

Next, we construct the input string T_1, which is the first of two input strings that will act as enforcement strings, enforcing the changes in S to occur in its selection blocks in a controlled fashion. For c_i ∈ {c_1, ..., c_k}, define

– T_1^1(c_i) := →c_i 0^{n+1} ←c_i,
– T_1^2(c_i) := →c_i 1 ←c_i, and
– T_1^3(c_i) := →c_i 0^{n+1} ←c_i.

We construct T_1 using these substrings, the separation letter &, and the suffix S*:

T_1 := T_1^1(c_1) ··· T_1^1(c_k) & T_1^2(c_1) ··· T_1^2(c_k) & T_1^3(c_1) ··· T_1^3(c_k) S*.

The second enforcement string T_2 is constructed using the following substrings corresponding to a color c_i ∈ {c_1, ..., c_k}:

– T_2^1(c_i) := →c_i (0#)^n ←c_i,
– T_2^2(c_i) := →c_i (0#)^n ←c_i, and
– T_2^3(c_i) := →c_i (0#)^n ←c_i.
The string $T_2$ is then constructed as

\[ T_2 := T_2^1(c_1) \cdots T_2^1(c_k) \;\&\; T_2^2(c_1) \cdots T_2^2(c_k) \;\&\; T_2^3(c_1) \cdots T_2^3(c_k) \; S^* \; T_1^- . \]

To complete the construction of $\mathcal{T}$, we construct a string $T_e$ for each $e \in E$. These strings are composed of substrings that correspond to vertices of $G$. Let $v \in V$ with $c(v) := c_i$, and assume $v$ is the $x$'th vertex of color $c_i$. The string $T(v)$ is defined by

\[ T(v) := \overrightarrow{c_i}\,(0\#)^{x-1}\,01\,(0\#)^{n-x}\,\overleftarrow{c_i} . \]

The string $T_e$ is constructed as $T_e := T(u) \,\&\, T(v)$ if $e := (u, v)$ (recall that we assume that the edges are directed, and that any pair of vertices with the same color are adjacent). To finalize our construction, we set the parameter $k'$ of the LSSCSeq instance to $3k$. Clearly the instance $(\mathcal{T}, S, k')$ can be constructed in polynomial time. We proceed to show that this instance is equivalent to the MIS(k) instance. The first crucial step is given by the following lemma.

Lemma 7. If $(\mathcal{T}, S, k') \in$ LSSCSeq, then there exists a solution string $\tilde{S}$ for $(\mathcal{T}, S, k')$ where $\tilde{S}$ can be written as $\tilde{S} := S' S^* T_1^-$ with

\[ S' := \tilde{S}^1(c_1) \cdots \tilde{S}^1(c_k) \;\&\; \tilde{S}^2(c_1) \cdots \tilde{S}^2(c_k) \;\&\; \tilde{S}^3(c_1) \cdots \tilde{S}^3(c_k), \]

such that for each $i \in \{1, \ldots, k\}$ we have:

– $\tilde{S}^1(c_i)$ is obtained from $S^1(c_i)$ by replacing exactly one occurrence of 01 with 00,
– $\tilde{S}^2(c_i)$ is obtained from $S^2(c_i)$ by replacing exactly one occurrence of 00 with 01, and
– $\tilde{S}^3(c_i)$ is obtained from $S^3(c_i)$ by replacing exactly one occurrence of 01 with 00.

Proof. Consider the suffix $T_1^-$ of $S^-$. A careful examination shows that no matter what $k$ changes are made to $T_1^-$, the string $S^*$ will not be a subsequence of the resulting string. This means that $S^*$ is not a subsequence of the length-$|T_1^-|$ suffix of $\tilde{S}$, and so it must be a subsequence of the length-$|S^* T_1^-|$ suffix of $\tilde{S}$. Since $S^*$ is contained as a substring in the length-$|S^* T_1^-|$ suffix of $S^-$, we can assume that $\tilde{S}$ can be written as $\tilde{S} := S' S^* T_1^-$, and that $S'$ contains as a subsequence the prefix $T_1 - S^*$ of $T_1$ (that is, $T_1$ without its suffix $S^*$). Since for each $i \in \{1, \ldots, k\}$, this prefix contains three copies of each of $\overrightarrow{c_i}$ and $\overleftarrow{c_i}$, there must be three substrings $\tilde{S}^1(c_i)$, $\tilde{S}^2(c_i)$, and $\tilde{S}^3(c_i)$ in $S'$ that all begin with $\overrightarrow{c_i}$ and end with $\overleftarrow{c_i}$. Moreover, all of these $3k$ substrings must be disjoint in $S'$. It is now not difficult to see that the lemma follows due to our construction of the prefixes $T_1 - S^*$ and $T_2 - (S^* T_1^-)$ of our two enforcement strings $T_1, T_2 \in \mathcal{T}$. □

Let $\tilde{S}$ be a solution string for $(\mathcal{T}, S, k')$ as in Lemma 7. We interpret the positions in $S'$ that differ from $S^-$ as a set of selected vertices $\{v_1^1, v_1^2, v_1^3, \ldots, v_k^1, v_k^2, v_k^3\}$ of $G$, where for each $i \in \{1, \ldots, k\}$, the vertex $v_i^1$ (resp. $v_i^2$, $v_i^3$) is the $x$-th vertex in $c_i$ if the $x$-th substring 01 (resp. 00, 01) in $S(c_i)$ is modified in $\tilde{S}(c_i)$. The next lemma shows that the set of selected vertices in fact includes only $k$ vertices.

Lemma 8. $v_i^1 = v_i^2 = v_i^3$ for each $i \in \{1, \ldots, k\}$.

Proof. Consider some arbitrary $i \in \{1, \ldots, k\}$, and assume that $v_i^1$, $v_i^2$, and $v_i^3$ are the $x$-th, $y$-th, and $z$-th vertices colored $c_i$ in $G$, respectively. Let $\tilde{S}_i := \tilde{S}^1(c_i) \,\&\, \tilde{S}^2(c_i) \,\&\, \tilde{S}^3(c_i)$, where $\tilde{S}^1(c_i)$, $\tilde{S}^2(c_i)$, and $\tilde{S}^3(c_i)$ are the substrings of $\tilde{S}$ given in Lemma 7. Since $\tilde{S}_i$ contains the only occurrences of $\overrightarrow{c_i}$ and $\overleftarrow{c_i}$ in $\tilde{S}$, it must contain as a subsequence any input string $T_e \in \mathcal{T}$, with $e$ being an edge between two vertices colored $c_i$ in $G$. Suppose that $x \ne y$. Consider the string $T(v_i^1) \,\&\, T(v_i^2) \in \mathcal{T}$. By construction, the only way this string can be a subsequence of $\tilde{S}_i$ is if $y = z$. But then the string $T(v_i^2) \,\&\, T(v_i^1) \in \mathcal{T}$ is not a subsequence of $\tilde{S}_i$. It follows that $x = y$. A similar argument can be used to show that $x = z$, and so $v_i^1 = v_i^2 = v_i^3$.
□

According to Lemma 8, we let $v_i$ be the single vertex corresponding to $v_i^1 = v_i^2 = v_i^3$, giving us a multicolored set $\{v_1, \ldots, v_k\}$ of vertices in $G$. The next lemma shows that this set must be independent in $G$.

Lemma 9. The set of vertices $I := \{v_1, \ldots, v_k\}$ forms an independent set in $G$.

Proof. Suppose that there is an edge $e := \{v_i, v_j\}$ in $G$, $i < j$, and consider the substring $\tilde{S}' := \tilde{S}^1(c_i)\tilde{S}^1(c_j) \,\&\, \tilde{S}^2(c_i)\tilde{S}^2(c_j) \,\&\, \tilde{S}^3(c_i)\tilde{S}^3(c_j)$. Since $\tilde{S}'$ contains the only occurrences of $\overrightarrow{c_i}$, $\overleftarrow{c_i}$, $\overrightarrow{c_j}$, and $\overleftarrow{c_j}$ in $\tilde{S}$, it must contain as a subsequence any input string $T_e \in \mathcal{T}$, with $e$ being an edge between two vertices colored $c_i$ and $c_j$ in $G$. But by construction, the string $\tilde{S}'$ does not contain as a subsequence the string $T_e := T(v_i) \,\&\, T(v_j) \in \mathcal{T}$, a contradiction. □

Combining Lemmas 7, 8, and 9 shows that if $(\mathcal{T}, S, k')$ has a solution $\tilde{S}$, then $G$ has a multicolored independent set of size $k$. Conversely, starting from a multicolored independent set $I$ in $G$, we can construct a string $\tilde{S}$ corresponding to $I$ as in Lemma 7, which can be easily verified to be a solution string for $(\mathcal{T}, S, k')$. Thus our reduction is correct, and by reducing the alphabet size as in Section 4, we obtain Theorem 4.

Corollary 3. LSSCSeq(k) restricted to strings over a constant-size alphabet is W[1]-hard, and has no $n^{o(k)}$-time algorithm unless ETH fails.

6 Shortest Common Superstring

In this section we deal with a local search variant of the classical Shortest Common Superstring. In this problem, the input is a set of strings $\mathcal{T}$ and an integer $\ell$, and the question is whether there is a string $S$ of length at most $\ell$ which is a superstring of all strings in $\mathcal{T}$. The local search version of Shortest Common Superstring is defined as follows:

Local Search Shortest Common Superstring (LSSCStr):
Input: A set $\mathcal{T} = \{T_1, \ldots, T_m\}$ of strings over an alphabet $\Sigma$, a string $S$ which is a superstring of all $T_i$'s, and a positive integer $k$.
Question: Is there a string $\tilde{S}$ of length $|S| - 1$ which is a superstring of all $T_i$'s such that $d_H(\tilde{S}, S^-) \le k$?

Theorem 5. LSSCStr(k) is W[1]-hard, even with an alphabet of constant size.

For ease of presentation, we describe here only the case that the alphabet size $|\Sigma|$ is part of the parameter. The case of constant-size alphabets can be handled with the method introduced in Section 4. The reduction is from the W[1]-complete Multicolored Clique problem, where, given a graph $G = (V, E)$ and a coloring function $c : V \to \{c_1, \ldots, c_k\}$, we ask for a multicolored clique of size $k$. We assume w.l.o.g. that $c$ is a proper coloring, i.e. there is no edge $\{u, v\}$ between vertices $u$ and $v$ with $c(u) = c(v)$.

The basic idea is as follows. We construct an input string $T_0$, which is a suffix of the superstring $S$. After the removal of the last letter of $S$, $T_0$ has to be matched to a substring $S'$ in $S$ which encodes a vertex $v$ from the first color class; this vertex represents the clique vertex selected from this color class. In order to match $T_0$, some modifications have to be performed in $S'$. These modifications then force $k$ input strings, which were previously matched to substrings of $S'$, to be matched elsewhere in $S$. One of these input strings then forces modifications in another substring $S''$ of $S$, which corresponds to a vertex of the second color class. The other $k - 1$ input strings encode pairs of color classes. Now, the modifications of $S''$ result in the selection of a vertex from the second color class, which then triggers a selection of a vertex of the third color class and so on. After "selecting" one vertex from each color class, there are in total $k(k - 1)$ input strings which encode color pairs, two strings for each color pair. If the selected vertices form a clique, then the two strings that correspond to the same pair can be matched to a substring of $S$ which encodes the edge between the two selected vertices.
In this case, one modification is sufficient for each pair of color classes; otherwise, two modifications are necessary. Hence, these $k(k - 1)$ color pair strings can be matched with at most $k(k - 1)/2$ modifications if and only if the selected vertices form a clique.

The details are as follows. The alphabet $\Sigma$ consists of $k^* \cdot k(k - 1) + 4k + 4$ letters with $k^* := 2^k (k^2 + k)$. The letters \$ and \# are separating letters, where \$ does not occur in the input strings. The letters 0 and 1 are encoding letters. The other letters correspond to colors and color pairs. For each color $c_i$, we have 4 letters: $a_i$, $b_i$, $c_i$, and $d_i$. For each ordered color pair $(c_i, c_j)$ with $i \ne j$, there are $k^*$ letters, namely, $c^1_{i,j}, \ldots, c^{k^*}_{i,j}$.

Assume that each color class in $G$ contains $n$ vertices. The LSSCStr(k)-instance consists of the superstring $S$ and a set $\mathcal{T}$ of $1 + (k - 1)k \cdot n + (k - 1) \cdot n$ input strings: one special input string $T_0$, $k$ input strings for each vertex from color classes $c_1$ to $c_{k-1}$, and $k - 1$ input strings for each vertex from the color class $c_k$. To construct these strings, we first introduce some strings, which are used as "building blocks" in the construction.

First, we describe the "separating blocks". For each color $c_i$ with $2 \le i \le k$, we introduce two such blocks: $A_i := a_i^{g(i)}$ and $B_i := b_i^{g(i)}$, where $g(i) := 2^{k-i} \cdot (k^2 + k)$. For each ordered pair of colors $c_i$ and $c_j$ with $i \ne j$, we construct one separating block:

\[ C_{i,j} := (c^1_{i,j}\#)^n \cdots (c^{k^*}_{i,j}\#)^n . \]

Moreover, we construct two "color-pair matching" blocks for each color $c_i$:

– $M_i^1 := 0C_{i,1} \cdots 0C_{i,i-1}\, C_{i,i+1} \cdots C_{i,k}$, and
– $M_i^2 := C_{i,1} \cdots C_{i,i-1}\, C_{i,i+1}0 \cdots C_{i,k}0$.

Finally, for every vertex $v$ we construct an "identifying block". Let $c_i := c(v)$. Here we distinguish $i = 1$ and $i > 1$. Assume $v$ is the $x$'th vertex colored $c_i$. The identifying block for $v$ is constructed as

– $I(v) := d_1\, 0^{x-1}\, 1\, 0^{n-x}\, d_1$ for $i = 1$, and
– $I(v) := d_i\, (0A_i)^{x-1}\, 1A_i\, (0A_i)^{n-x}\, d_i\, d_{i-1}$ for $i > 1$.
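The sizes involved in these building blocks are easy to get wrong by hand, so the following minimal sketch recomputes them for a small $k$. It is an illustration only: letters are modeled as hypothetical string tokens (for example `c1,2^7` for $c^7_{1,2}$), and the function names `g`, `k_star`, `alphabet`, and `identifying_block` are assumptions, not notation from the construction beyond $g(i)$ and $k^*$.

```python
def g(i, k):
    """Length g(i) = 2^(k-i) * (k^2 + k) of the separating blocks A_i and B_i."""
    return 2 ** (k - i) * (k * k + k)


def k_star(k):
    """Number k* = 2^k * (k^2 + k) of letters per ordered color pair."""
    return 2 ** k * (k * k + k)


def alphabet(k):
    """Enumerate the alphabet as string tokens (token names are illustrative)."""
    letters = ["$", "#", "0", "1"]
    for i in range(1, k + 1):
        letters += [f"a{i}", f"b{i}", f"c{i}", f"d{i}"]
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            if i != j:
                letters += [f"c{i},{j}^{l}" for l in range(1, k_star(k) + 1)]
    return letters


def identifying_block(i, x, n, k):
    """I(v) for the x-th vertex (1-based) of color c_i, as a list of letters."""
    if i == 1:
        return ["d1"] + ["0"] * (x - 1) + ["1"] + ["0"] * (n - x) + ["d1"]
    a_block = [f"a{i}"] * g(i, k)
    zero_a = ["0"] + a_block
    return ([f"d{i}"] + zero_a * (x - 1) + ["1"] + a_block
            + zero_a * (n - x) + [f"d{i}", f"d{i - 1}"])


k, n = 3, 4
print(len(alphabet(k)), k_star(k) * k * (k - 1) + 4 * k + 4)    # alphabet size, two ways
print(sum(g(i, k) for i in range(2, k + 1)), (2 ** (k - 1) - 1) * (k * k + k))
print(len(identifying_block(2, 2, n, k)), n * (1 + g(2, k)) + 3)
```

Each printed pair agrees: the enumerated alphabet has $k^* \cdot k(k-1) + 4k + 4$ letters, the lengths $|A_2| + \cdots + |A_k|$ sum to $(2^{k-1} - 1)(k^2 + k)$, and an identifying block for $i > 1$ has length $n(1 + g(i)) + 3$.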
We are now ready to describe the set of input strings $\mathcal{T}$ in our LSSCStr instance. First, for each vertex $v$ colored $c_i$ with $1 \le i < k$, we construct one "triggering" input string. If $v$ is the $x$'th vertex colored $c_i$, its triggering input string $T(v)$ is constructed as:

\[ T(v) := c_{i+1}\, M^1_{i+1}\, d_{i+1}\, (0A_{i+1})^{x-1}\, 0B_{i+1}\, (0A_{i+1})^{n-x}\, d_{i+1}\, d_i . \]

Then, for each vertex $v$ colored $c_i$ with $1 \le i \le k$, we add $k - 1$ "pairing" input strings, each corresponding to a color class $c_j$ with $j \ne i$. Here, we distinguish $i < j$ and $i > j$:

– $T(v, c_j) := C_{i,j+1} \cdots C_{i,k}\; I(v)\; C_{i,1} \cdots C_{i,i-1}\, C_{i,i+1}0 \cdots C_{i,j}0$ for $i < j$, and
– $T(v, c_j) := 0C_{i,j} \cdots 0C_{i,i-1}\, C_{i,i+1} \cdots C_{i,k}\; I(v)\; C_{i,1} \cdots C_{i,j-1}$ for $i > j$.

To finalize our construction of $\mathcal{T}$, we set the special input string $T_0$:

\[ T_0 := c_1\, M_1^1\, d_1\, 0^n\, d_1 . \]

Now, it remains to describe the temporary solution $S$. To this end, we introduce some further building blocks. For each edge $e = \{u, v\} \in E$, where $u$ is colored $c_i$ and $v$ is colored $c_j$ with $i < j$, we construct one "edge block" $S(e)$ for $S$ as:

\[ S(e) := T(u, c_j)^{-}\; 1\; {}^{-}T(v, c_i), \]

where $T(u, c_j)^{-}$ as usual denotes the prefix of the pairing input string $T(u, c_j)$ without the last 0, and ${}^{-}T(v, c_i)$ denotes the suffix of $T(v, c_i)$ without the first 0. Furthermore, for each vertex $v \in V$ colored $c_i$ in $G$ we construct the selection block $S(v)$ of $v$ by:

– $S(v) := T(v)\, M_i^1\, I(v)\, M_i^2$ for $i < k$, and
– $S(v) := M_k^1\, I(v)\, M_k^2$ for $i = k$.

The solution $S$ then consists of three parts, $S := S(V)\,S(E)\,T_0$, where the first part $S(V)$ is the concatenation of the selection blocks $S(v)$ separated by \$'s in an arbitrary order, the second part $S(E)$ is the concatenation of edge blocks $S(e)$ separated by \$'s, and $T_0$ is the special input string described above. Finally, we set the parameter for the LSSCStr-instance to

\[ k' := 2k + k(k - 1)/2 + (2^{k-1} - 1)(k^2 + k). \]

It is easy to verify that $S$ is a superstring of all input strings: The string $T_0$ occurs at the end of $S$.
Furthermore, for each vertex $v \in V$, the triggering input string $T(v)$ is a prefix of $S(v)$, while the pairing strings $T(v, c_j)$ are clearly substrings of $M_i^1\, I(v)\, M_i^2$. We next turn to showing the equivalence of the two instances.

Lemma 10. If $G$ has a multicolored clique $K$, then $(S, \mathcal{T}, k')$ has a solution string $\tilde{S}$.

Proof. Let $K := \{v_1, \ldots, v_k\}$, where $v_i$ is the vertex colored $c_i$ in $K$, and assume $v_i$ is the $x_i$'th vertex colored $c_i$ in $G$. First, we change the only 1 in the identifying block $I(v_1)$ of the selection block $S(v_1)$ to 0, and the last character, i.e. $d_1$, of the triggering block $T(v_1)$ of $S(v_1)$ to $c_1$. Now $T_0$ is a substring of the resulting selection block $\tilde{S}(v_1)$. Then, for every $1 < i \le k$, we change the only 1 in the identifying block $I(v_i)$ in $S(v_i)$ to 0, and we change the $x_{i-1}$'th $A_i$-block in $I(v_i)$ to a $B_i$-block. Moreover, the letter $d_i$ at the end of the triggering block $T(v_i)$ is changed to $c_i$. Now, the triggering input string $T(v_{i-1})$ created for $v_{i-1}$ is a substring of the resulting $\tilde{S}(v_i)$. Next, consider the edges $e := \{v_i, v_j\}$, $i < j$, between the vertices in $K$. We change the only 1 in the edge block $S(e)$ in $S$ to 0. After this, the pairing input strings $T(v_i, c_j)$ and $T(v_j, c_i)$ are substrings of the resulting $\tilde{S}(e)$. After the modifications specified above, we get a string $\tilde{S}$ of length $|S| - 1$ which is a superstring of all input strings. Summing up, we performed two modifications for $v_1$, $2 + |A_i|$ modifications for every $i > 1$, and one modification for each edge in $K$. This amounts to $k' = 2k + (2^{k-1} - 1)(k^2 + k) + k(k - 1)/2$. □

We next consider the reverse direction. Suppose that a solution $\tilde{S}$ exists for $(\mathcal{T}, S, k')$. We use $\tilde{S}(v)$ to denote the substring of $\tilde{S}$ corresponding to the selection block $S(v)$ of $S$.

Lemma 11. If there is a solution $\tilde{S}$ for the instance $(\mathcal{T}, S, k')$ constructed above, then the input string $T_0$ is a substring of $\tilde{S}(v_1)$ for some $v_1 \in V$ with $c(v_1) = c_1$.

Proof.
Since $M_1^1$ is a substring of $T_0$ and $M_1^1$ contains $(k - 1) \cdot k^* > k'$ different letters $c^{\ell}_{1,j}$ with $1 \le \ell \le k^*$ and $2 \le j \le k$, $T_0$ cannot be a suffix of $\tilde{S}$. For the same reason, $T_0$ cannot be a substring of $\tilde{S}(v)$ for any vertex $v$ colored $c_i$ with $2 \le i \le k$. Moreover, since $0^n$ is a substring of $T_0$, $T_0$ cannot be a substring of the edge blocks $\tilde{S}(e)$ of $\tilde{S}$. As \$ does not appear in $T_0$, this implies that $T_0$ has to be a substring of some $\tilde{S}(v_1)$ with $c(v_1) = c_1$. □

Let $v_1$ be the vertex in Lemma 11. By construction, we have to match $M_1^1$ of $T_0$ to the $M_1^1$ substring of $\tilde{S}(v_1)$. This implies that the letter 1 in the corresponding identifying block has to be changed to 0. Moreover, the last letter $d_1$ of the corresponding triggering block must be changed to $c_1$. These two changes force the pairing input strings and the triggering string for $v_1$ to be matched somewhere else in $\tilde{S}$ than in $S$. We consider first the triggering string $T(v_1)$.

Lemma 12. The triggering input string $T(v_1)$ can only be matched to a substring of $\tilde{S}(v_2)$ for some vertex $v_2$ with $c(v_2) = c_2$.

Proof. As in the proof of Lemma 11, the existence of $M_2^1$ in $T(v_1)$ prevents it from being matched to $\tilde{S}(v)$ corresponding to a vertex colored $c_i$, $i > 2$, or to the edge blocks in $\tilde{S}$. So the only possibilities are the triggering blocks $T(v)$ of $\tilde{S}(v)$ with $c(v) = c_1$ and $v \ne v_1$, or $\tilde{S}(u)$ for some vertex $u$ colored $c_2$. Note that in the first case, we have one $B_2$-block in $T(v)$ and one $B_2$-block in $T(v_1)$, but they have different positions. This implies that in order to match $T(v)$ to $T(v_1)$, we have to perform at least $2|B_2| = 2 \cdot 2^{k-2}(k^2 + k)$ modifications, more than the allowed number $k'$. □

As argued in the other proof direction, the matching of $T(v_1)$ to some $\tilde{S}(v_2)$ causes $2 + |A_2|$ modifications, after which the triggering input string $T(v_2)$ and $k - 1$ pairing input strings are unmatched. With a similar proof as for Lemma 12, $T(v_2)$ has to be matched to some $\tilde{S}(v_3)$ with $c(v_3) = c_3$: after performing the $2 + |A_2| = 2 + 2^{k-2}(k^2 + k)$ changes to match $T(v_1)$, one cannot afford to perform the $2|B_3| = 2 \cdot 2^{k-3}(k^2 + k) = 2^{k-2}(k^2 + k)$ changes which are necessary for matching $T(v_2)$ to the selection block of another vertex colored $c_2$. The same argument applies inductively for all $i > 2$. In this way, the string $\tilde{S}$ differs from $S$ in exactly $k$ selection blocks corresponding to a multicolored set of vertices $\{v_1, \ldots, v_k\}$ in $G$, and the Hamming distance between the remaining suffixes of $S^-$ and $\tilde{S}$ is at most $k(k - 1)/2$.

Lemma 13. The set of vertices $\{v_1, \ldots, v_k\}$ specified above forms a clique in $G$.

Proof. As stated above, to get $T_0$ and the $k - 1$ triggering input strings matched with $\tilde{S}$, we have to perform $(2^{k-1} - 1)(k^2 + k) + 2k$ modifications. Only $k(k - 1)/2$ modifications remain for the $k(k - 1)$ unmatched pairing strings. By construction, no pairing string is a substring of another pairing string. Consequently, the only possibility is to group them into pairs, where each pair has to form an edge, since then the pair can be matched to the corresponding edge block with only one modification, namely, changing the 1 to 0 in the edge block. This implies that the vertices whose selection blocks are changed from $S$ to $\tilde{S}$ must form a clique. □

Combining all lemmas above completes the proof of the theorem in the case that $|\Sigma|$ is part of the parameter. A modification similar to the one done in Section 4 provides the proof for the constant-size alphabet case. We mention that in the presented reduction, there are parts of the superstring $S$ that are not matched to any input string, which is clearly not a feature of any minimal solution of Shortest Common Superstring.
The reduction can be adapted such that this is not the case: First, replace each occurrence of \$ in $S$ by one distinct string, for example, $\$\,1^i\,\$$ for the $i$'th occurrence, and add an input string equal to this string. By doing this, all letters in $S$ occur in the input strings. Next, increase the length of $T_0$ such that it contains all edge blocks as a prefix. Then, add a block in the beginning of $S$ that contains some unique starting letter followed by the edge blocks. Finally, add one further input string that contains this unique starting letter and the edge blocks.

Corollary 4. Unless ETH fails, there is no $n^{o(\lg k)}$-time algorithm for LSSCStr(k), even when all strings are over an alphabet of constant size.

7 Conclusions

In this paper we addressed the question of whether the brute-force procedure for the local improvement step in a local search algorithm can substantially be improved when applied to four classical NP-hard string problems. In all cases we considered the Hamming distance as our metric for defining the local neighborhood where the search is performed, and showed that under this metric, the brute-force algorithm cannot be considerably improved for any of the four problems. There are, however, many other interesting metrics that deserve consideration. Furthermore, there are many other NP-hard string problems where local search can be applied. Finally, we mention that the local neighborhood defined for LSLCS and LSSCSub can also be defined by removing a letter at an arbitrary position from the old solution string $S_{old}$ (and this is perhaps a more natural definition). However, our hardness results do not imply a lower bound on the running time of the local improvement step in this case; while we believe that such a lower bound holds, a proof is currently missing.

References

1. E. H. L. Aarts and J. K. Lenstra. Local Search in Combinatorial Optimization. Wiley-Interscience, 1997.
2. E. Balas. New classes of efficiently solvable generalized traveling salesman problems. Annals of Operations Research, 86:529–558, 1999.
3. J. Chen, B. Chor, M. Fellows, X. Huang, D. W. Juedes, I. A. Kanj, and G. Xia. Tight lower bounds for certain parameterized NP-hard problems. Information and Computation, 201(2):216–231, 2005.
4. M. Dörnfelder, J. Guo, C. Komusiewicz, and M. Weller. On the parameterized complexity of consensus clustering. In Proc. 22nd ISAAC, volume 7074 of Lecture Notes in Computer Science, pages 624–633. Springer, 2011.
5. R. G. Downey and M. R. Fellows. Parameterized Complexity. Springer, 1999.
6. M. R. Fellows, F. A. Rosamond, F. V. Fomin, D. Lokshtanov, S. Saurabh, and Y. Villanger. Local search: Is brute-force avoidable? In Proc. 21st IJCAI, pages 486–491, 2009.
7. P. Festa and P. M. Pardalos. Efficient solutions for the far from most string problem. Annals of Operations Research. To appear, available online.
8. F. V. Fomin, D. Lokshtanov, V. Raman, and S. Saurabh. Fast local search algorithm for weighted feedback arc set in tournaments. In Proc. 24th AAAI. AAAI Press, 2010.
9. J. Gramm, R. Niedermeier, and P. Rossmanith. Fixed-parameter algorithms for closest string and related problems. Algorithmica, 37(1):25–42, 2003.
10. J. Guo, S. Hartung, R. Niedermeier, and O. Suchý. The parameterized complexity of local search for TSP, more refined. In Proc. 22nd ISAAC, volume 7074 of Lecture Notes in Computer Science, pages 614–623. Springer, 2011.
11. R. Impagliazzo and R. Paturi. Complexity of k-SAT. In Proc. 14th CCC, pages 237–240, 1999.
12. D. S. Johnson, C. H. Papadimitriou, and M. Yannakakis. How easy is local search? J. Comput. Syst. Sci., 37(1):79–100, 1988.
13. A. Krokhin and D. Marx. On the hardness of losing weight. ACM Transactions on Algorithms, 2012. To appear.
14. D. Lokshtanov, D. Marx, and S. Saurabh. Lower bounds based on the Exponential Time Hypothesis. Bulletin of the EATCS, 105:41–71, 2011.
15. D. Marx. Searching the k-change neighborhood for TSP is W[1]-hard. Oper. Res. Lett., 36(1):31–36, 2008.
16. D. Marx and I. Schlotter. Stable assignment with couples: Parameterized complexity and local search. Discrete Optimization, 8(1):25–40, 2011.
17. C. Meneses, C. A. S. Oliveira, and P. M. Pardalos. Optimization techniques for string selection and comparison problems in genomics. IEEE Engineering in Medicine and Biology Magazine, 24(3):81–87, 2005.
18. S. Szeider. The parameterized complexity of k-flip local search for SAT and MAX SAT. Discrete Optimization, 8(1):139–145, 2011.
