DOI: 10.5555/3618408.3619559
research-article

Learning hidden Markov models when the locations of missing observations are unknown

Published: 23 July 2023

Abstract

The Hidden Markov Model (HMM) is one of the most widely used statistical models for sequential data analysis. One of the key reasons for this versatility is the ability of HMMs to deal with missing data. However, standard HMM learning algorithms rely crucially on the assumption that the positions of the missing observations within the observation sequence are known. In the natural sciences, where this assumption is often violated, special variants of HMMs, commonly known as Silent-state HMMs (SHMMs), are used. Despite their widespread use, SHMM learning algorithms rely strongly on specific structural assumptions about the underlying chain, such as acyclicity, which limits their applicability. Moreover, even in the acyclic case, it has been shown that these methods can lead to poor reconstruction. In this paper, we consider the general problem of learning an HMM from data with unknown missing observation locations. We provide reconstruction algorithms that do not require any assumptions about the structure of the underlying chain and, unlike SHMMs, can also be used with limited prior knowledge. We evaluate and compare the algorithms in a variety of scenarios, measuring their reconstruction precision and their robustness under model misspecification. Notably, we show that, under proper specifications, one can reconstruct the process dynamics as well as if the positions of the missing observations were known.
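To make the distinction between known and unknown missing-observation locations concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes a small discrete HMM with made-up parameters (`A`, `B`, `pi`) and an independent per-step deletion probability of 0.3; the helper names `sample_hmm`, `drop_known`, and `drop_unknown` are hypothetical. With known locations, each gap is marked in place; with unknown locations, the gaps disappear and only a shorter sequence remains.

```python
import random

def sample_hmm(T, A, B, pi, rng):
    """Draw T observations from a discrete HMM.

    A: state transition matrix, B: emission matrix, pi: initial distribution.
    """
    states = range(len(A))
    symbols = range(len(B[0]))
    s = rng.choices(states, weights=pi)[0]
    obs = []
    for _ in range(T):
        obs.append(rng.choices(symbols, weights=B[s])[0])
        s = rng.choices(states, weights=A[s])[0]
    return obs

def drop_known(obs, keep_mask):
    """Missing positions known: each gap stays in place, marked as None."""
    return [o if k else None for o, k in zip(obs, keep_mask)]

def drop_unknown(obs, keep_mask):
    """Missing positions unknown: gaps vanish and the sequence just gets shorter."""
    return [o for o, k in zip(obs, keep_mask) if k]

# Illustrative parameters only (assumptions, not taken from the paper).
rng = random.Random(0)
A = [[0.9, 0.1], [0.2, 0.8]]            # 2 hidden states
B = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]  # 3 observation symbols
pi = [0.5, 0.5]

full = sample_hmm(12, A, B, pi, rng)
keep_mask = [rng.random() < 0.7 for _ in full]  # assumed i.i.d. deletions

known = drop_known(full, keep_mask)      # same length as full, with None gaps
unknown = drop_unknown(full, keep_mask)  # shorter, with no trace of the gaps
```

In the second case a learner sees only `unknown`, so consecutive entries may be separated by any number of unobserved steps; this is the setting the abstract describes.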



Published In

ICML'23: Proceedings of the 40th International Conference on Machine Learning
July 2023
43479 pages

Publisher

JMLR.org

Qualifiers

  • Research-article
  • Research
  • Refereed limited
