Learning internal representations by error propagation. In D. E. Rumelhart, J. L. McClelland, and the PDP Research Group (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. MIT Press, 1986.

D. E. Rumelhart, G. E. Hinton, R. J. Williams
Figure 1. A multilayer network. In this case the information coming to the input units is recoded into an internal representation, and the outputs are generated by the internal representation rather than by the original pattern. Input patterns can always be encoded, if there are enough hidden units, in a form such that the appropriate output pattern can be generated from any input pattern.

There have been three basic responses to this lack. One response is represented by competitive learning (Chapter 5), in which simple unsupervised learning rules are employed so that useful hidden units develop. Although these approaches are promising, there is no external force to ensure that hidden units appropriate for the required mapping are developed. The second response is simply to assume an internal representation that, on some a priori grounds, seems reasonable. This is the tack taken in the chapter on verb learning (Chapter 18) and in the interactive activation model of word perception (McClelland & Rumelhart, 1981; Rumelhart & McClelland, 1982). The third approach is to attempt to develop a learning procedure capable of learning an internal representation adequate for performing the task at hand. One such development is presented in the discussion of Boltzmann machines in Chapter 7. As we have seen, that procedure involves the use of stochastic units, requires the network to reach equilibrium in two different phases, and is limited to symmetric networks. Another recent approach, also employing stochastic units, has been developed by Barto (1985) and various of his colleagues (cf. Barto & Anandan, 1985).

In this chapter we present another alternative that works with deterministic units, involves only local computations, and is a clear generalization of the delta rule. We call this the generalized delta rule. From other considerations, Parker (1985) has independently derived a similar generalization, which he calls learning-logic. Le Cun (1985) has also studied a roughly similar learning scheme. In the remainder of this chapter we first derive the generalized delta rule, then illustrate its use with some results from our simulations, and finally indicate some further generalizations of the basic idea.
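To make the idea concrete before the derivation, the following is a minimal sketch of the generalized delta rule (error back-propagation) applied to a small layered network of deterministic logistic units. It is an illustration under assumed settings, not the chapter's exact procedure or simulation parameters: the task (XOR), network size, learning rate, and weight initialization are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Input patterns and target outputs for XOR (an illustrative task).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])

# Weights and biases: input -> hidden (2 units) and hidden -> output.
W1 = rng.normal(0.0, 1.0, (2, 2))
b1 = np.zeros(2)
W2 = rng.normal(0.0, 1.0, (2, 1))
b2 = np.zeros(1)
eta = 0.5  # learning rate (illustrative)

for epoch in range(20000):
    # Forward pass: deterministic logistic units.
    H = sigmoid(X @ W1 + b1)   # hidden activations
    O = sigmoid(H @ W2 + b2)   # output activations

    # Backward pass: each delta is a local computation, formed from a
    # unit's own activation and the deltas of the units it feeds.
    delta_O = (T - O) * O * (1.0 - O)           # output-layer error signal
    delta_H = (delta_O @ W2.T) * H * (1.0 - H)  # propagated to hidden units

    # Weight changes take the delta-rule form: eta * delta * input.
    W2 += eta * H.T @ delta_O
    b2 += eta * delta_O.sum(axis=0)
    W1 += eta * X.T @ delta_H
    b1 += eta * delta_H.sum(axis=0)

# With a favorable initialization the outputs approach [0, 1, 1, 0];
# gradient descent can also settle in a local minimum, so some random
# starts fail.
print(np.round(O.ravel(), 2))
```

When the output units are linear and there are no hidden units, the backward pass reduces to the ordinary delta rule, which is the sense in which this procedure is its generalization.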