Abstract
We present an approach to inductive concept learning that uses multiple models for time series. Our objective is to improve the efficiency and accuracy of concept learning by decomposing learning tasks that admit multiple types of learning architectures and mixture estimation methods. The decomposition method adapts attribute subset selection and constructive induction (cluster definition) to define new subproblems. To these subproblem definitions we apply metric-based model selection, choosing from a database of learning components to produce a specification for supervised learning with a mixture model. We report positive learning results using temporal artificial neural networks (ANNs) on a synthetic multiattribute learning problem and on a real-world time series monitoring application.
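The core idea of combining specialists trained on different attribute subsets can be illustrated with a minimal mixture-of-experts sketch. This is an assumption-laden toy, not the paper's actual architecture (which uses temporal ANNs and metric-based selection over a component database): the two linear "experts" and the gating weights below are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable row-wise softmax.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Two hypothetical experts, each specialized to a different
# attribute subset of the input (toy stand-ins for the paper's
# temporal ANN components).
def expert_a(x):
    return x[:, : x.shape[1] // 2].mean(axis=1)   # first half of attributes

def expert_b(x):
    return x[:, x.shape[1] // 2 :].mean(axis=1)   # second half of attributes

def mixture_predict(x, gate_weights):
    # Gating network: softmax scores over the full input decide
    # how much weight each expert's prediction receives.
    g = softmax(x @ gate_weights)                 # (n, 2) mixing coefficients
    preds = np.stack([expert_a(x), expert_b(x)], axis=1)
    return (g * preds).sum(axis=1)                # convex combination

x = rng.normal(size=(4, 6))
w = rng.normal(size=(6, 2))     # hypothetical gating weights
y = mixture_predict(x, w)
print(y.shape)
```

Because the gate outputs sum to one, each mixture prediction is a convex combination of the expert outputs; in the paper's setting the analogous choice among components is made by metric-based model selection rather than fixed random weights.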
Cite this article
Hsu, W.H., Ray, S.R. & Wilkins, D.C. A Multistrategy Approach to Classifier Learning from Time Series. Machine Learning 38, 213–236 (2000). https://doi.org/10.1023/A:1007694209216