Abstract
In this work, we consider the following NP-hard combinatorial optimization problem from computational biology. Given a set of input strings of equal length, the goal is to identify a maximum cardinality subset of strings that differ in at most a pre-defined number of positions. First, we introduce an integer linear programming model for this problem. Second, two variants of a rather simple greedy strategy are proposed. Finally, a large neighborhood search algorithm is presented. A comprehensive experimental comparison among the proposed techniques shows, first, that large neighborhood search generally outperforms both greedy strategies. Second, while large neighborhood search is competitive with the stand-alone application of CPLEX for small- and medium-sized problem instances, it outperforms CPLEX on larger instances.
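To make the problem statement concrete, the following minimal Python sketch counts the positions ("bad columns") at which a set of strings disagree and finds a largest feasible subset by exhaustive enumeration. It is only an illustration under stated assumptions: the function names `bad_columns` and `best_subset_brute_force`, the bound `k`, and the example strings are hypothetical, and the enumeration is not the ILP model, the greedy strategies, or the large neighborhood search proposed in the article.

```python
from itertools import combinations


def bad_columns(strings):
    """Count positions at which the given equal-length strings do not all agree."""
    return sum(1 for column in zip(*strings) if len(set(column)) > 1)


def best_subset_brute_force(strings, k):
    """Return a largest subset with at most k bad columns.

    Exhaustive enumeration, exponential in the number of strings;
    only usable on tiny illustrative instances (hypothetical helper).
    """
    for size in range(len(strings), 0, -1):
        for subset in combinations(strings, size):
            if bad_columns(subset) <= k:
                return list(subset)
    return []


# Hypothetical toy instance: four strings of length four.
strings = ["ACGT", "ACGA", "TCGT", "GGCA"]
print(best_subset_brute_force(strings, 1))  # ['ACGT', 'ACGA']
```

With `k = 1`, only two of the example strings can be kept, because any three of them disagree in at least two positions. Raising the bound to `k = 2` admits a subset of three strings.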
Acknowledgments
All experiments were executed in the High Performance Cluster managed by the Research and Development Lab (RDlab) of the Computer Science Dept. at the Universitat Politècnica de Catalunya (http://rdlab.cs.upc.edu). We thank all the RDlab staff for their support. A preliminary version of this work appeared at the IEEE 2015 International Symposium on INnovations in Intelligent SysTems and Applications (INISTA), September 2–4, 2015, Madrid, Spain. This work was supported by project TIN2012-37930-C02-02 (Spanish Ministry for Economy and Competitiveness, FEDER funds from the European Union) and project SGR 2014-1034 (AGAUR, Generalitat de Catalunya). Additionally, Christian Blum acknowledges support from IKERBASQUE. Evelia Lizárraga acknowledges support from the Mexican National Council for Science and Technology (CONACYT, Doctoral Grant Number 253787).
Ethics declarations
Conflict of interest
Evelia Lizárraga, Maria J. Blesa, Christian Blum, and Günther R. Raidl declare that they have no conflict of interest.
Ethical standard
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by C. Analide.
Cite this article
Lizárraga, E., Blesa, M.J., Blum, C. et al. Large neighborhood search for the most strings with few bad columns problem. Soft Comput 21, 4901–4915 (2017). https://doi.org/10.1007/s00500-016-2379-4
DOI: https://doi.org/10.1007/s00500-016-2379-4