Using interesting sequences to interactively build Hidden Markov Models

Jaroszewicz, Szymon

doi:10.1007/s10618-010-0171-0

Using interesting sequences to interactively build Hidden Markov Models

Published: 07 April 2010

Volume 21, pages 186–220, (2010)
Cite this article

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Szymon Jaroszewicz¹

338 Accesses
Explore all metrics

Abstract

The paper presents a method of interactive construction of global Hidden Markov Models (HMMs) based on local sequence patterns discovered in data. The method is based on finding interesting sequences whose frequency in the database differs from that predicted by the model. The patterns are then presented to the user who updates the model using their intelligence and their understanding of the modelled domain. It is demonstrated that such an approach leads to more understandable models than automated approaches. Two variants of the problem are considered: mining patterns occurring only at the beginning of sequences and mining patterns occurring at any position; both practically meaningful. For each variant, algorithms have been developed allowing for efficient discovery of all sequences with given minimum interestingness. Applications to modelling webpage visitors behavior and to modelling protein secondary structure are presented, validating the proposed approach.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Artificial Intelligence

References

Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: ACM SIGMOD conference on management of data, pp 207–216
Asai K, Hayamizu S, Handa K (1993) Prediction of protein secondary structure by the Hidden Markov Model. Bioinformatics 9(2): 141–146
Article Google Scholar
Balasubramanian V (1993) Equivalence and reduction of Hidden Markov Models. A.I. Technical Report 1370, Artificial Intelligence Laboratory, MIT
Bouchaffra D, Tan J (2006) Protein fold recognition using a structural Hidden Markov Model. In: International conference on pattern recognition, vol 3. Los Alamitos, CA, pp 186–189
Dunham MH (2003) Data mining, introductory and advanced topics, Part III. http://engr.smu.edu/~mhd/dmbook/part3.ppt
Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 15(1): 55–86
Article MathSciNet Google Scholar
Hand D (2002) Pattern detection and discovery. In: Hand D, Adams N, Bolton R (eds) Pattern detection and discovery. Springer, Berlin, pp 1–12
Chapter Google Scholar
Hunter L (1993) Molecular biology for computer scientists. In: Hunter L (eds) Artificial intelligence and molecular biology, chap 1. AAAI Press, Menlo Park, pp 1–46
Google Scholar
Huo Q, Lee CH (1997) On-line adaptive learning of the continuous density Hidden Markov Model based on approximate recursive Bayes estimate. IEEE Trans Speech Audio Process 5(2): 161–172
Article Google Scholar
Huo Q, Chan C, Lee CH (1995) Bayesian adaptive learning of the parameters of hidden Markov model for speech recognition. IEEE Trans Speech Audio Process 3(5): 334–345
Article Google Scholar
Inoue M, Ueda N (2003) Exploitation of unlabeled sequences in Hidden Markov Models. IEEE Trans Pattern Anal Mach Intell 25: 1570–1581
Article Google Scholar
Jaroszewicz S (2008) Interactive HMM construction based on interesting sequences. In: Proceedings of local patterns to global models (LeGo’08) workshop at the 12th European conference on principles and practice of knowledge discovery in databases (PKDD’08), Antwerp, Belgium, pp 82–91
Jaroszewicz S, Scheffer T (2005) Fast discovery of unexpected patterns in data, relative to a Bayesian network. In: 11th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2005), Chicago, IL, pp 118–127
Jaroszewicz S, Simovici D (2004) Interestingness of frequent itemsets using Bayesian networks as background knowledge. In: 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2004), Seattle, WA, pp 178–186
Jaroszewicz S, Scheffer T, Simovici D (2009) Scalable pattern mining with Bayesian networks as background knowledge. Data Min Knowl Discov 18(1): 56–100
Article Google Scholar
Ji S, Watson L, Carin L (2009) Semisupervised learning of Hidden Markov Models via a homotopy method. IEEE Trans Pattern Anal Mach Intell 31: 275–287
Article Google Scholar
Laxman S, Sastry P (2005) Discovering frequent episodes and learning Hidden Markov Models: a formal connection. IEEE Trans Knowl Data Eng 17(11): 1505–1517
Article Google Scholar
Laxman S, Sastry P (2006) A survey of temporal data mining. Sadhana 31(2): 173–198
Article MATH MathSciNet Google Scholar
Lee CH, Gauvin JL (1996) Bayesian adaptive learning and map estimation of HMM. In: Lee CH, Soong F, Paliwal K (eds) Automatic speech and speaker recognition: advanced topics. Springer, New York, pp 83–108
Google Scholar
Li D, Laurent A, Poncelet P (2007) Mining unexpected sequential patterns and rules. Technical Report RR-07027, Laboratoire d’Informatique de Robotique et de Microélectronique de Montpellier
Low-Kam C, Mas A, Teisseire M (2009) Mining for unexpected sequential patterns given a Markov model. http://www.math.univ-montp2.fr/~mas/lmt_siam09.pdf
Meyer C (2001) Matrix analysis and applied linear algebra book and solutions manual. SIAM, Philadelphia
Google Scholar
Prum B, Rodolphe F, de Turckheim E (1995) Finding words with unexpected frequencies in DNA sequences. J R Stat Soc Ser B 57: 205–220
MATH Google Scholar
Qian N, Sejnowski T (1988) Predicting the secondary structure of globular proteins using neural network models. J Mol Biol 202: 865–884
Article Google Scholar
Rabiner LR (1989) A tutorial on Hidden Markov Models and selected applications in speech recognition. Proc IEEE 77(2): 257–286
Article Google Scholar
Robbins H (1956) An empirical bayes approach to statistics. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability, vol 1: contributions to the theory of statistics, University of California Press, pp 157–163
Satsangi A, Zaiane O (2007) Contrasting the contrast sets: an alternative approach. In: Eleventh international database engineering and applications symposium (IDEAS 2007), Banff, Canada
Spiliopoulou M (1999) Managing interesting rules in sequence mining. In: Proceedings of the third European conference on principles of data mining and knowledge discovery, pp 554–560
Uhlig F (2001) Transform linear algebra. Prentice Hall, Upper Saddle River
Google Scholar
van Loan C, Golub G (1996) Matrix computations. Johns Hopkins University Press, London
MATH Google Scholar
Welch L (2003) Hidden Markov Models and the Baum-Welch algorithm. IEEE Inform Theory Soc Newsl 53(4): 1,10–13
MathSciNet Google Scholar
Zaiane O, Yacef K, Kay J (2007) Finding top-n emerging sequences to contrast sequence sets. Tech. Rep. TR07-03, Department of Computing Science, University of Alberta, Edmonton, AB, Canada

Download references

Author information

Authors and Affiliations

National Institute of Telecommunications, Warsaw, Poland
Szymon Jaroszewicz

Authors

Szymon Jaroszewicz
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Szymon Jaroszewicz.

Additional information

Responsible editor: Johannes Fürnkranz and Arno Knobbe.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Jaroszewicz, S. Using interesting sequences to interactively build Hidden Markov Models. Data Min Knowl Disc 21, 186–220 (2010). https://doi.org/10.1007/s10618-010-0171-0

Download citation

Received: 20 March 2009
Accepted: 06 March 2010
Published: 07 April 2010
Issue Date: July 2010
DOI: https://doi.org/10.1007/s10618-010-0171-0

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Using interesting sequences to interactively build Hidden Markov Models

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Interactive Data Exploration Using Pattern Mining

Frequent Sequence Mining in Web Log Data

Introduction to Pattern Mining

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Using interesting sequences to interactively build Hidden Markov Models

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Interactive Data Exploration Using Pattern Mining

Frequent Sequence Mining in Web Log Data

Introduction to Pattern Mining

Explore related subjects

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation