Google Scholar

[PDF][PDF] Convolutional networks for images, speech, and time series

Y LeCun, Y Bengio - The handbook of brain theory and neural networks, 1995 - Citeseer

The handbook of brain theory and neural networks, 1995•Citeseer

The ability of multilayer back-propagation networks to learn complex, high-dimensional, nonlinear mappings from large collections of examples makes them obvious candidates for image recognition or speech recognition tasks (see PATTERN RECOGNITION AND NEURAL NETWORKS). In the traditional model of pattern recognition, a hand-designed feature extractor gathers relevant information from the input and eliminates irrelevant variabilities. A trainable classi er then categorizes the resulting feature vectors (or strings of symbols) into classes. In this scheme, standard, fully-connected multilayer networks can be used as classi ers. A potentially more interesting scheme is to eliminate the feature extractor, feeding the network with\raw" inputs (eg normalized images), and to rely on backpropagation to turn the rst few layers into an appropriate feature extractor. While this can be done with an ordinary fully connected feed-forward network with some success for tasks such as character recognition, there are problems.

Firstly, typical images, or spectral representations of spoken words, are large, often with several hundred variables. A fully-connected rst layer with, say a few 100 hidden units, would already contain several 10,000 weights. Over tting problems may occur if training data is scarce. In addition, the memory requirement for that many weights may rule out certain hardware implementations. But, the main de ciency of unstructured nets for image or speech aplications is that they have no built-in invariance with respect to translations, or

Citeseer

Show moreShow less

Save Cite Cited by 8000 Related articles All 20 versions View as HTML

Cite

Advanced search

Saved to My library

[PDF][PDF] Convolutional networks for images, speech, and time series