Abstract
Statistical learning problems in many fields involve sequential data. This paper formalizes the principal learning tasks and describes the methods that have been developed within the machine learning research community for addressing these problems. These methods include sliding window methods, recurrent sliding windows, hidden Markov models, conditional random fields, and graph transformer networks. The paper also discusses some open research issues.
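Of the methods surveyed, the sliding window approach is the simplest: it reduces sequence labeling to ordinary supervised classification by predicting each output element from a fixed-width window of inputs centered on it. A minimal sketch of the windowing step (the half-width, padding symbol, and function name here are illustrative assumptions, not taken from the paper):

```python
def make_windows(xs, half_width=1, pad="_"):
    """Turn a sequence xs into one fixed-length feature window per position.

    Each window covers positions t - half_width .. t + half_width,
    with a padding symbol substituted beyond the sequence boundaries.
    """
    padded = [pad] * half_width + list(xs) + [pad] * half_width
    return [tuple(padded[i:i + 2 * half_width + 1]) for i in range(len(xs))]


# Example: a letter-to-phoneme style task on the word "data".
# Each window becomes one training example, paired with the label y_t.
windows = make_windows("data", half_width=1)
# windows[0] == ('_', 'd', 'a'); windows[3] == ('t', 'a', '_')
```

Any standard classifier (decision tree, neural network, support vector machine) can then be trained on the resulting (window, label) pairs; the recurrent sliding window variant additionally feeds earlier predictions back in as inputs to later windows.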
References
G. Bakiri and T. G. Dietterich. Achieving high-accuracy text-to-speech with machine learning. In R. I. Damper, editor, Data Mining Techniques in Speech Synthesis. Chapman and Hall, New York, NY, 2002.
Y. Bengio and P. Frasconi. Input-output HMMs for sequence processing. IEEE Transactions on Neural Networks, 7(5):1231–1249, September 1996.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14:462–467, 1968.
N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc., B39:1–38, 1977.
J. L. Elman. Finding structure in time. Cognitive Science, 14:179–211, 1990.
T. Fawcett and F. Provost. Adaptive fraud detection. Knowledge Discovery and Data Mining, 1:291–316, 1997.
C. L. Giles, G. M. Kuhn, and R. J. Williams. Special issue on dynamic recurrent neural networks. IEEE Transactions on Neural Networks, 5(2), 1994.
A. E. Hoerl and R. W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12:55–67, 1970.
M. I. Jordan. Serial order: A parallel distributed processing approach. ICS Rep. 8604, Inst. for Cog. Sci., UC San Diego, 1986.
Ron Kohavi and George H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1–2):273–324, 1997.
Daphne Koller and Mehran Sahami. Toward optimal feature selection. In Proc. 13th Int. Conf. Machine Learning, pages 284–292. Morgan Kaufmann, 1996.
Igor Kononenko, Edvard Šimec, and Marko Robnik-Šikonja. Overcoming the myopia of inductive learning algorithms with RELIEFF. Applied Intelligence, 7(1):39–55, 1997.
John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Int. Conf. Machine Learning, San Francisco, CA, 2001. Morgan Kaufmann.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
Oded Maron and Andrew W. Moore. Hoeffding races: Accelerating model selection search for classification and function approximation. In Adv. Neural Inf. Proc. Sys. 6, pages 59–66. Morgan Kaufmann, 1994.
Andrew McCallum, Dayne Freitag, and Fernando Pereira. Maximum entropy Markov models for information extraction and segmentation. In Int. Conf. Machine Learning, pages 591–598. Morgan Kaufmann, 2000.
Thomas M. Mitchell. Machine Learning. McGraw-Hill, New York, 1997.
N. Qian and T. J. Sejnowski. Predicting the secondary structure of globular proteins using neural network models. J. Molecular Biology, 202:865–884, 1988.
J. R. Quinlan. C4.5: Programs for machine learning. Morgan Kaufmann, 1993.
D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, chapter 8, pages 318–362. MIT Press, 1986.
T. J. Sejnowski and C. R. Rosenberg. Parallel networks that learn to pronounce English text. Complex Systems, 1(1):145–168, February 1987.
A. S. Weigend, D. E. Rumelhart, and B. A. Huberman. Generalization by weight-elimination with application to forecasting. In Adv. Neural Inf. Proc. Sys. 3, pages 875–882. Morgan Kaufmann, 1991.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Dietterich, T.G. (2002). Machine Learning for Sequential Data: A Review. In: Caelli, T., Amin, A., Duin, R.P.W., de Ridder, D., Kamel, M. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR/SPR 2002. Lecture Notes in Computer Science, vol 2396. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-70659-3_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44011-6
Online ISBN: 978-3-540-70659-5