Abstract
The paper presents a method of interactive construction of global Hidden Markov Models (HMMs) based on local sequence patterns discovered in data. The method is based on finding interesting sequences whose frequency in the database differs from that predicted by the model. The patterns are then presented to the user who updates the model using their intelligence and their understanding of the modelled domain. It is demonstrated that such an approach leads to more understandable models than automated approaches. Two variants of the problem are considered: mining patterns occurring only at the beginning of sequences and mining patterns occurring at any position; both practically meaningful. For each variant, algorithms have been developed allowing for efficient discovery of all sequences with given minimum interestingness. Applications to modelling webpage visitors behavior and to modelling protein secondary structure are presented, validating the proposed approach.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: ACM SIGMOD conference on management of data, pp 207–216
Asai K, Hayamizu S, Handa K (1993) Prediction of protein secondary structure by the Hidden Markov Model. Bioinformatics 9(2): 141–146
Balasubramanian V (1993) Equivalence and reduction of Hidden Markov Models. A.I. Technical Report 1370, Artificial Intelligence Laboratory, MIT
Bouchaffra D, Tan J (2006) Protein fold recognition using a structural Hidden Markov Model. In: International conference on pattern recognition, vol 3. Los Alamitos, CA, pp 186–189
Dunham MH (2003) Data mining, introductory and advanced topics, Part III. http://engr.smu.edu/~mhd/dmbook/part3.ppt
Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 15(1): 55–86
Hand D (2002) Pattern detection and discovery. In: Hand D, Adams N, Bolton R (eds) Pattern detection and discovery. Springer, Berlin, pp 1–12
Hunter L (1993) Molecular biology for computer scientists. In: Hunter L (eds) Artificial intelligence and molecular biology, chap 1. AAAI Press, Menlo Park, pp 1–46
Huo Q, Lee CH (1997) On-line adaptive learning of the continuous density Hidden Markov Model based on approximate recursive Bayes estimate. IEEE Trans Speech Audio Process 5(2): 161–172
Huo Q, Chan C, Lee CH (1995) Bayesian adaptive learning of the parameters of hidden Markov model for speech recognition. IEEE Trans Speech Audio Process 3(5): 334–345
Inoue M, Ueda N (2003) Exploitation of unlabeled sequences in Hidden Markov Models. IEEE Trans Pattern Anal Mach Intell 25: 1570–1581
Jaroszewicz S (2008) Interactive HMM construction based on interesting sequences. In: Proceedings of local patterns to global models (LeGo’08) workshop at the 12th European conference on principles and practice of knowledge discovery in databases (PKDD’08), Antwerp, Belgium, pp 82–91
Jaroszewicz S, Scheffer T (2005) Fast discovery of unexpected patterns in data, relative to a Bayesian network. In: 11th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2005), Chicago, IL, pp 118–127
Jaroszewicz S, Simovici D (2004) Interestingness of frequent itemsets using Bayesian networks as background knowledge. In: 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2004), Seattle, WA, pp 178–186
Jaroszewicz S, Scheffer T, Simovici D (2009) Scalable pattern mining with Bayesian networks as background knowledge. Data Min Knowl Discov 18(1): 56–100
Ji S, Watson L, Carin L (2009) Semisupervised learning of Hidden Markov Models via a homotopy method. IEEE Trans Pattern Anal Mach Intell 31: 275–287
Laxman S, Sastry P (2005) Discovering frequent episodes and learning Hidden Markov Models: a formal connection. IEEE Trans Knowl Data Eng 17(11): 1505–1517
Laxman S, Sastry P (2006) A survey of temporal data mining. Sadhana 31(2): 173–198
Lee CH, Gauvin JL (1996) Bayesian adaptive learning and map estimation of HMM. In: Lee CH, Soong F, Paliwal K (eds) Automatic speech and speaker recognition: advanced topics. Springer, New York, pp 83–108
Li D, Laurent A, Poncelet P (2007) Mining unexpected sequential patterns and rules. Technical Report RR-07027, Laboratoire d’Informatique de Robotique et de Microélectronique de Montpellier
Low-Kam C, Mas A, Teisseire M (2009) Mining for unexpected sequential patterns given a Markov model. http://www.math.univ-montp2.fr/~mas/lmt_siam09.pdf
Meyer C (2001) Matrix analysis and applied linear algebra book and solutions manual. SIAM, Philadelphia
Prum B, Rodolphe F, de Turckheim E (1995) Finding words with unexpected frequencies in DNA sequences. J R Stat Soc Ser B 57: 205–220
Qian N, Sejnowski T (1988) Predicting the secondary structure of globular proteins using neural network models. J Mol Biol 202: 865–884
Rabiner LR (1989) A tutorial on Hidden Markov Models and selected applications in speech recognition. Proc IEEE 77(2): 257–286
Robbins H (1956) An empirical bayes approach to statistics. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability, vol 1: contributions to the theory of statistics, University of California Press, pp 157–163
Satsangi A, Zaiane O (2007) Contrasting the contrast sets: an alternative approach. In: Eleventh international database engineering and applications symposium (IDEAS 2007), Banff, Canada
Spiliopoulou M (1999) Managing interesting rules in sequence mining. In: Proceedings of the third European conference on principles of data mining and knowledge discovery, pp 554–560
Uhlig F (2001) Transform linear algebra. Prentice Hall, Upper Saddle River
van Loan C, Golub G (1996) Matrix computations. Johns Hopkins University Press, London
Welch L (2003) Hidden Markov Models and the Baum-Welch algorithm. IEEE Inform Theory Soc Newsl 53(4): 1,10–13
Zaiane O, Yacef K, Kay J (2007) Finding top-n emerging sequences to contrast sequence sets. Tech. Rep. TR07-03, Department of Computing Science, University of Alberta, Edmonton, AB, Canada
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Johannes Fürnkranz and Arno Knobbe.
Rights and permissions
About this article
Cite this article
Jaroszewicz, S. Using interesting sequences to interactively build Hidden Markov Models. Data Min Knowl Disc 21, 186–220 (2010). https://doi.org/10.1007/s10618-010-0171-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-010-0171-0