Abstract
The detection of transcription factor binding sites (TFBS) plays an important role in bioinformatics. Their correct identification in the promoter regions of co-expressed genes is a crucial step for understanding gene expression mechanisms and creating new drugs and vaccines. The motif discovery problem consists of searching for conserved patterns in datasets of biological sequences using unsupervised learning algorithms. It is considered one of the classic problems of computational biology, and its simplest formulation has been proven to be NP-hard. Heuristic and meta-heuristic algorithms, in turn, have shown promise in solving combinatorial problems with very large search spaces. In this work, we evaluate the performance of three heuristic and meta-heuristic approaches: Variable Neighborhood Search (VNS), Expectation Maximization (EM) and Iterated Local Search (ILS). For each of them, two sets of experiments were carried out: in the first, the heuristics were run alone; in the second, a constructive procedure was introduced to improve the quality of the initial solutions. Finally, the resulting metrics were compared with those of the state-of-the-art MEME algorithm, which is widely used in biological motif discovery. The results suggest that the heuristics are more efficient when used together, and that the constructive procedure is promising, improving the performance metrics of the evaluated heuristics in most experiments. In addition, the combination of the constructive procedure with EM proved to be quite competitive, outperforming the MEME algorithm on several datasets.
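To make the search problem concrete, the following is a minimal Python sketch of Iterated Local Search applied to motif discovery. It is not the authors' implementation. It assumes one motif occurrence per sequence, a fixed motif width w, and a simple consensus score (the sum, over alignment columns, of the count of the most frequent letter) as the objective. The function names, the perturbation (re-drawing one site at random) and the acceptance rule (accept if not worse) are illustrative choices only.

```python
import random
from collections import Counter


def consensus_score(seqs, starts, w):
    """Sum over the w alignment columns of the count of the most frequent letter."""
    score = 0
    for col in range(w):
        column = [s[p + col] for s, p in zip(seqs, starts)]
        score += Counter(column).most_common(1)[0][1]
    return score


def local_search(seqs, starts, w):
    """Hill climbing: move one site at a time while the score strictly improves."""
    starts = list(starts)
    best = consensus_score(seqs, starts, w)
    improved = True
    while improved:
        improved = False
        for i, s in enumerate(seqs):
            for p in range(len(s) - w + 1):
                cand = starts[:i] + [p] + starts[i + 1:]
                sc = consensus_score(seqs, cand, w)
                if sc > best:
                    starts, best, improved = cand, sc, True
    return starts, best


def iterated_local_search(seqs, w, iters=50, seed=0):
    """ILS: random start, then repeat perturb -> local search -> accept."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    starts, best = local_search(seqs, starts, w)
    for _ in range(iters):
        # Perturbation (assumed): re-draw the site of one random sequence.
        cand = list(starts)
        i = rng.randrange(len(seqs))
        cand[i] = rng.randrange(len(seqs[i]) - w + 1)
        cand, sc = local_search(seqs, cand, w)
        # Acceptance (assumed): keep the candidate if it is not worse.
        if sc >= best:
            starts, best = cand, sc
    return starts, best


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy = ["TTACGTAC", "GACGTAGG", "CCCACGTA", "ACGTATTT"]
    sites, score = iterated_local_search(toy, w=5, iters=20, seed=1)
    print(sites, score)
```

A constructive procedure, as described in the abstract, would replace the random initial `starts` with a better-informed starting solution; the rest of the loop would stay the same.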
Notes
- 1. Text segment of size w.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Garbelini, J.M.C., Sanches, D.S., Pozo, A.T.R. (2022). Towards a Better Understanding of Heuristic Approaches Applied to the Biological Motif Discovery. In: Xavier-Junior, J.C., Rios, R.A. (eds) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol 13653. Springer, Cham. https://doi.org/10.1007/978-3-031-21686-2_13
DOI: https://doi.org/10.1007/978-3-031-21686-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21685-5
Online ISBN: 978-3-031-21686-2
eBook Packages: Computer Science (R0)