Abstract
A higher order recurrent neural network architecture learns to recognize and generate languages after being "trained" on categorized exemplars. Studying these networks from the perspective of dynamical systems yields two interesting discoveries. First, a longitudinal examination of the learning process illustrates a new form of mechanical inference: induction by phase transition. A small weight adjustment causes a "bifurcation" in the limit behavior of the network, and this phase transition corresponds to the onset of the network's capacity for generalizing to arbitrary-length strings. Second, a study of the automata resulting from the acquisition of previously published training sets indicates that while the architecture is not guaranteed to find a minimal finite automaton consistent with the given exemplars (which is an NP-hard problem), it does appear capable of generating non-regular languages by exploiting fractal and chaotic dynamics. I end the paper with a hypothesis relating linguistic generative capacity to the behavioral regimes of non-linear dynamical systems.
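To make the dynamical-systems view concrete, the following is a minimal sketch (not the paper's trained model) of a second-order, i.e. "higher order", recurrent recognizer: each symbol of the input string selects a weight matrix that maps the current state vector to the next, so processing a string iterates a nonlinear map. All dimensions, the weight tensor `W`, the initial state, and the acceptance threshold here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 4 state units, 2 input symbols ("0" and "1", one-hot).
N_STATE, N_SYMBOLS = 4, 2

# Second-order weight tensor W[i, j, k]: the next value of state unit i
# depends multiplicatively on current state unit j and input bit k.
W = rng.normal(scale=1.0, size=(N_STATE, N_STATE, N_SYMBOLS))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run(string, w):
    """Iterate the state map over a binary string.

    The update s <- sigmoid(sum_{j,k} w[i,j,k] * s[j] * x[k]) is a
    discrete-time dynamical system on the unit hypercube; each input
    symbol picks out one slice of w and hence one map to apply.
    """
    s = np.full(N_STATE, 0.5)           # fixed, arbitrary initial state
    for ch in string:
        x = np.eye(N_SYMBOLS)[int(ch)]  # one-hot encoding of the symbol
        s = sigmoid(np.einsum('ijk,j,k->i', w, s, x))
    return s

def accepts(string, w):
    # Accept iff a designated state unit exceeds 0.5 after the whole string.
    return bool(run(string, w)[0] > 0.5)

print(accepts("0110", W))
```

With random weights this machine accepts an arbitrary language; the paper's point is that gradient training of `W` on categorized exemplars can push these iterated maps through a bifurcation, after which the limit behavior of the dynamics, rather than a finite state table, decides arbitrarily long strings.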
Pollack, J.B. The induction of dynamical recognizers. Mach Learn 7, 227–252 (1991). https://doi.org/10.1007/BF00114845