Speeding Up HMM Decoding and Training by Exploiting Sequence Repetitions

Mozes, Shay; Weimann, Oren; Ziv-Ukelson, Michal

doi:10.1007/978-3-540-73437-6_4

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4580)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

We present a method to speed up the dynamic programming algorithms used for solving the HMM decoding and training problems for discrete time-independent HMMs. We discuss the application of our method to Viterbi’s decoding and training algorithms [21], as well as to the forward-backward and Baum-Welch [4] algorithms. Our approach is based on identifying repeated substrings in the observed input sequence. We describe three algorithms based alternatively on byte pair encoding (BPE) [19], run length encoding (RLE) and Lempel-Ziv (LZ78) parsing [22]. Compared to Viterbi’s algorithm, we achieve a speedup of \(\Omega(r)\) using BPE, a speedup of \(\Omega(\frac{r}{\log r})\) using RLE, and a speedup of \(\Omega(\frac{\log n}{k})\) using LZ78, where k is the number of hidden states, n is the length of the observed sequence and r is its compression ratio (under each compression scheme). Our experimental results demonstrate that our new algorithms are indeed faster in practice. Furthermore, unlike Viterbi’s algorithm, our algorithms are highly parallelizable.
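
The abstract does not spell out the construction, so the following is only a minimal sketch of how run-length encoding can help. It assumes the standard reformulation of Viterbi's recurrence as vector-matrix products in the (max, ×) semiring: a run of l identical symbols then corresponds to the l-th power of one k × k matrix, which repeated squaring computes with about log l matrix products instead of l vector-matrix products. All function names are hypothetical, the sketch returns only the probability of the best state path rather than the path itself, and it is not claimed to match the authors' algorithm or to attain the bounds stated above.

```python
from itertools import groupby

def maxtimes_matmul(A, B):
    """Product of two k x k matrices in the (max, *) semiring."""
    k = len(A)
    return [[max(A[i][m] * B[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]

def maxtimes_power(M, p):
    """M raised to the p-th power (p >= 1) by repeated squaring."""
    result, base = None, M
    while p > 0:
        if p & 1:
            result = base if result is None else maxtimes_matmul(result, base)
        p >>= 1
        if p:
            base = maxtimes_matmul(base, base)
    return result

def step_matrices(trans, emit):
    """M[s][i][j] = trans[i][j] * emit[j][s]: move i -> j, then emit s in j."""
    k, n_sym = len(trans), len(emit[0])
    return [[[trans[i][j] * emit[j][s] for j in range(k)] for i in range(k)]
            for s in range(n_sym)]

def vec_mat(v, P):
    """Row vector times matrix in the (max, *) semiring."""
    k = len(v)
    return [max(v[i] * P[i][j] for i in range(k)) for j in range(k)]

def viterbi_prob_plain(init, trans, emit, obs):
    """Baseline: one vector-matrix product per observed symbol."""
    M = step_matrices(trans, emit)
    v = [init[j] * emit[j][obs[0]] for j in range(len(init))]
    for s in obs[1:]:
        v = vec_mat(v, M[s])
    return max(v)

def viterbi_prob_rle(init, trans, emit, obs):
    """Same value, but each run of equal symbols uses one matrix power."""
    M = step_matrices(trans, emit)
    v = [init[j] * emit[j][obs[0]] for j in range(len(init))]
    for s, run in groupby(obs[1:]):
        run_length = sum(1 for _ in run)
        v = vec_mat(v, maxtimes_power(M[s], run_length))
    return max(v)

if __name__ == "__main__":
    # Toy two-state model; the numbers are made up for illustration.
    init = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.2, 0.8]]
    emit = [[0.7, 0.3], [0.1, 0.9]]
    obs = [0] * 6 + [1] * 5 + [0] * 3
    print(viterbi_prob_plain(init, trans, emit, obs))
    print(viterbi_prob_rle(init, trans, emit, obs))
```

In this sketch a run of length l costs O(k³ log l) rather than O(k² l), so it only pays off when runs are long relative to the number of states. The two functions multiply the same factors in a different order, so their results may differ by floating-point rounding.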

References

1. Amir, A., Benson, G., Farach, M.: Let sleeping files lie: Pattern matching in Z-compressed files. Journal of Comp. and Sys. Sciences 52(2), 299–307 (1996)
2. Agazzi, O., Kuo, S.: HMM based optical character recognition in the presence of deterministic transformations. Pattern Recognition 26, 1813–1826 (1993)
3. Apostolico, A., Landau, G.M., Skiena, S.: Matching for run length encoded strings. Journal of Complexity 15(1), 4–16 (1999)
4. Baum, L.E.: An inequality and associated maximization technique in statistical estimation for probabilistic functions of a Markov process. Inequalities 3, 1–8 (1972)
5. Bird, A.P.: CpG-rich islands as gene markers in the vertebrate nucleus. Trends in Genetics 3, 342–347 (1987)
6. Buchsbaum, A.L., Giancarlo, R.: Algorithmic aspects in speech recognition: An introduction. ACM Journal of Experimental Algorithms 2(1) (1997)
7. Bunke, H., Csirik, J.: An improved algorithm for computing the edit distance of run length coded strings. Information Processing Letters 54, 93–96 (1995)
8. Chan, T.M.: All-pairs shortest paths with real weights in O(n³/log n) time. In: Proc. 9th Workshop on Algorithms and Data Structures, pp. 318–324 (2005)
9. Churchill, G.A.: Hidden Markov chains and the analysis of genome structure. Computers Chem. 16, 107–115 (1992)
10. Coppersmith, D., Winograd, S.: Matrix multiplication via arithmetical progressions. Journal of Symbolic Computation 9, 251–280 (1990)
11. Crochemore, M., Landau, G., Ziv-Ukelson, M.: A sub-quadratic sequence alignment algorithm for unrestricted cost matrices. In: Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 679–688 (2002)
12. Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis. Cambridge University Press, Cambridge (1998)
13. Kärkkäinen, J., Navarro, G., Ukkonen, E.: Approximate string matching over Ziv-Lempel compressed text. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 195–209. Springer, Heidelberg (2000)
14. Kärkkäinen, J., Ukkonen, E.: Lempel-Ziv parsing and sublinear-size index structures for string matching. In: Proc. Third South American Workshop on String Processing (WSP), pp. 141–155 (1996)
15. Mäkinen, V., Navarro, G., Ukkonen, E.: Approximate matching of run-length compressed strings. In: Proc. 12th Annual Symposium on Combinatorial Pattern Matching (CPM). LNCS, vol. 1645, pp. 1–13. Springer, Heidelberg (1999)
16. Manber, U.: A text compression scheme that allows fast searching directly in the compressed file. In: CPM 2001. LNCS, vol. 2089, pp. 31–49. Springer, Heidelberg (2001)
17. Manning, C., Schütze, H.: Statistical Natural Language Processing. MIT Press, Cambridge (1999)
18. Navarro, G., Kida, T., Takeda, M., Shinohara, A., Arikawa, S.: Faster approximate string matching over compressed text. In: Proc. Data Compression Conference (DCC), pp. 459–468 (2001)
19. Shibata, Y., Kida, T., Fukamachi, S., Takeda, M., Shinohara, A., Shinohara, T., Arikawa, S.: Speeding up pattern matching by text compression. In: Bongiovanni, G., Petreschi, R., Gambosi, G. (eds.) CIAC 2000. LNCS, vol. 1767, pp. 306–315. Springer, Heidelberg (2000)
20. Strassen, V.: Gaussian elimination is not optimal. Numerische Mathematik 13, 354–356 (1969)
21. Viterbi, A.: Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Transactions on Information Theory IT-13, 260–269 (1967)
22. Ziv, J., Lempel, A.: On the complexity of finite sequences. IEEE Transactions on Information Theory 22(1), 75–81 (1976)

Author information

Authors and Affiliations

MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar Street, Cambridge, MA 02139, USA
Shay Mozes & Oren Weimann
School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel
Michal Ziv-Ukelson

Editor information

Bin Ma, Kaizhong Zhang

About this paper

Cite this paper

Mozes, S., Weimann, O., Ziv-Ukelson, M. (2007). Speeding Up HMM Decoding and Training by Exploiting Sequence Repetitions. In: Ma, B., Zhang, K. (eds) Combinatorial Pattern Matching. CPM 2007. Lecture Notes in Computer Science, vol 4580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73437-6_4

DOI: https://doi.org/10.1007/978-3-540-73437-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73436-9
Online ISBN: 978-3-540-73437-6
eBook Packages: Computer Science, Computer Science (R0)