
Learning Deep Architectures for AI

Published: 01 January 2009
Abstract

    Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g., in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This monograph discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
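    The training principle the abstract describes — unsupervised learning of a single-layer model such as a Restricted Boltzmann Machine, whose hidden representation then serves as input to the next layer — can be sketched in a few lines. This is an illustrative toy, not code from the monograph: the class, the learning rate, and the use of one-step contrastive divergence (CD-1) with mean-field reconstructions are assumptions made for brevity.

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class RBM:
        """Minimal binary RBM trained with one-step contrastive divergence (CD-1)."""

        def __init__(self, n_visible, n_hidden, lr=0.5, seed=0):
            self.rng = np.random.default_rng(seed)
            self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
            self.b = np.zeros(n_visible)  # visible biases
            self.c = np.zeros(n_hidden)   # hidden biases
            self.lr = lr

        def hidden_probs(self, v):
            return sigmoid(v @ self.W + self.c)

        def visible_probs(self, h):
            return sigmoid(h @ self.W.T + self.b)

        def cd1_update(self, v0):
            # Positive phase: hidden probabilities and a sample given the data.
            ph0 = self.hidden_probs(v0)
            h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
            # Negative phase: one Gibbs step back to the visibles and up again.
            pv1 = self.visible_probs(h0)
            ph1 = self.hidden_probs(pv1)
            # CD-1 approximation to the log-likelihood gradient.
            n = len(v0)
            self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / n
            self.b += self.lr * (v0 - pv1).mean(axis=0)
            self.c += self.lr * (ph0 - ph1).mean(axis=0)
            return ((v0 - pv1) ** 2).mean()  # reconstruction error, for monitoring

    # Greedy layer-wise stacking: train one RBM, then feed its hidden
    # representation to the next layer as if it were data.
    data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 50, dtype=float)
    layer1 = RBM(n_visible=4, n_hidden=3)
    for _ in range(300):
        err = layer1.cd1_update(data)
    layer2_input = layer1.hidden_probs(data)  # input to the next RBM in the stack
    ```

    Each layer is trained in isolation on the representation produced by the layer below; only after this unsupervised pre-training would the whole stack typically be fine-tuned for a supervised task.
    
    
    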

    References

    [1]
    D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, "A learning algorithm for boltzmann machines," Cognitive Science, vol. 9, pp. 147-169, 1985.
    [2]
    A. Ahmed, K. Yu, W. Xu, Y. Gong, and E. P. Xing, "Training hierarchical feed-forward visual recognition models using transfer learning from pseudo tasks," in Proceedings of the 10th European Conference on Computer Vision (ECCV'08), pp. 69-82, 2008.
    [3]
    E. L. Allgower and K. Georg, Numerical Continuation Methods. An Introduction. No. 13 in Springer Series in Computational Mathematics, Springer-Verlag, 1980.
    [4]
    C. Andrieu, N. de Freitas, A. Doucet, and M. Jordan, "An introduction to MCMC for machine learning," Machine Learning, vol. 50, pp. 5-43, 2003.
    [5]
    D. Attwell and S. B. Laughlin, "An energy budget for signaling in the grey matter of the brain," Journal of Cerebral Blood Flow And Metabolism, vol. 21, pp. 1133-1145, 2001.
    [6]
    J. A. Bagnell and D. M. Bradley, "Differentiable sparse coding," in Advances in Neural Information Processing Systems 21 (NIPS'08), (D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, eds.), NIPS Foundation, 2009.
    [7]
    J. Baxter, "Learning internal representations," in Proceedings of the 8th International Conference on Computational Learning Theory (COLT'95), pp. 311- 320, Santa Cruz, California: ACM Press, 1995.
    [8]
    J. Baxter, "A Bayesian/information theoretic model of learning via multiple task sampling," Machine Learning, vol. 28, pp. 7-40, 1997.
    [9]
    M. Belkin, I. Matveeva, and P. Niyogi, "Regularization and semi-supervised learning on large graphs," in Proceedings of the 17th International Conference on Computational Learning Theory (COLT'04), (J. Shawe-Taylor and Y. Singer, eds.), pp. 624-638, Springer, 2004.
    [10]
    M. Belkin and P. Niyogi, "Using manifold structure for partially labeled classification," in Advances in Neural Information Processing Systems 15 (NIPS'02), (S. Becker, S. Thrun, and K. Obermayer, eds.), Cambridge, MA: MIT Press, 2003.
    [11]
    A. J. Bell and T. J. Sejnowski, "An information maximisation approach to blind separation and blind deconvolution," Neural Computation, vol. 7, no. 6, pp. 1129-1159, 1995.
    [12]
    Y. Bengio and O. Delalleau, "Justifying and generalizing contrastive divergence," Neural Computation, vol. 21, no. 6, pp. 1601-1621, 2009.
    [13]
    Y. Bengio, O. Delalleau, and N. Le Roux, "The Curse of highly variable functions for local kernel machines," in Advances in Neural Information Processing Systems 18 (NIPS'05), (Y. Weiss, B. Schölkopf, and J. Platt, eds.), pp. 107- 114, Cambridge, MA: MIT Press, 2006.
    [14]
    Y. Bengio, O. Delalleau, and C. Simard, "Decision trees do not generalize to new variations," Computational Intelligence, To appear, 2009.
    [15]
    Y. Bengio, R. Ducharme, and P. Vincent, "A neural probabilistic language model," in Advances in Neural Information Processing Systems 13 (NIPS'00), (T. Leen, T. Dietterich, and V. Tresp, eds.), pp. 933-938, MIT Press, 2001.
    [16]
    Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137- 1155, 2003.
    [17]
    Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," in Advances in Neural Information Processing Systems 19 (NIPS'06), (B. Schölkopf, J. Platt, and T. Hoffman, eds.), pp. 153- 160, MIT Press, 2007.
    [18]
    Y. Bengio, N. Le Roux, P. Vincent, O. Delalleau, and P. Marcotte, "Convex neural networks," in Advances in Neural Information Processing Systems 18 (NIPS'05), (Y. Weiss, B. Schölkopf, and J. Platt, eds.), pp. 123-130, Cambridge, MA: MIT Press, 2006.
    [19]
    Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," in Large Scale Kernel Machines, (L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.), MIT Press, 2007.
    [20]
    Y. Bengio, J. Louradour, R. Collobert, and J. Weston, "Curriculum learning," in Proceedings of the Twenty-sixth International Conference on Machine Learning (ICML09), (L. Bottou and M. Littman, eds.), pp. 41-48, Montreal: ACM, 2009.
    [21]
    Y. Bengio, M. Monperrus, and H. Larochelle, "Non-local estimation of manifold structure," Neural Computation, vol. 18, no. 10, pp. 2509-2528, 2006.
    [22]
    Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.
    [23]
    J. Bergstra and Y. Bengio, "Slow, decorrelated features for pretraining complex cell-like networks," in Advances in Neural Information Processing Systems 22 (NIPS'09), (D. Schuurmans, Y. Bengio, C. Williams, J. Lafferty, and A. Culotta, eds.), December 2010.
    [24]
    B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," in Fifth Annual Workshop on Computational Learning Theory, pp. 144-152, Pittsburgh: ACM, 1992.
    [25]
    H. Bourlard and Y. Kamp, "Auto-association by multilayer perceptrons and singular value decomposition," Biological Cybernetics, vol. 59, pp. 291-294, 1988.
    [26]
    M. Brand, "Charting a manifold," in Advances in Neural Information Processing Systems 15 (NIPS'02), (S. Becker, S. Thrun, and K. Obermayer, eds.), pp. 961-968, MIT Press, 2003.
    [27]
    L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
    [28]
    L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
    [29]
    L. D. Brown, Fundamentals of Statistical Exponential Families. 1986. Vol. 9, Inst. of Math. Statist. Lecture Notes Monograph Series.
    [30]
    E. Candes and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 15, no. 12, pp. 4203-4215, 2005.
    [31]
    M. A. Carreira-Perpiñan and G. E. Hinton, "On contrastive divergence learning," in Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS'05), (R. G. Cowell and Z. Ghahramani, eds.), pp. 33-40, Society for Artificial Intelligence and Statistics, 2005.
    [32]
    R. Caruana, "Multitask connectionist learning," in Proceedings of the 1993 Connectionist Models Summer School, pp. 372-379, 1993.
    [33]
    P. Clifford, "Markov random fields in statistics," in Disorder in Physical Systems: A Volume in Honour of John M. Hammersley, (G. Grimmett and D. Welsh, eds.), pp. 19-32, Oxford University Press, 1990.
    [34]
    D. Cohn, Z. Ghahramani, and M. I. Jordan, "Active learning with statistical models," in Advances in Neural Information Processing Systems 7 (NIPS'94), (G. Tesauro, D. Touretzky, and T. Leen, eds.), pp. 705-712, Cambridge MA: MIT Press, 1995.
    [35]
    T. F. Coleman and Z. Wu, "Parallel continuation-based global optimization for molecular conformation and protein folding," Technical Report Cornell University, Dept. of Computer Science, 1994.
    [36]
    R. Collobert and S. Bengio, "Links between perceptrons, MLPs and SVMs," in Proceedings of the Twenty-first International Conference on Machine Learning (ICML'04), (C. E. Brodley, ed.), p. 23, New York, NY, USA: ACM, 2004.
    [37]
    R. Collobert and J. Weston, "A unified architecture for natural language processing: Deep neural networks with multitask learning," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 160-167, ACM, 2008.
    [38]
    C. Cortes, P. Haffner, and M. Mohri, "Rational kernels: Theory and algorithms," Journal of Machine Learning Research, vol. 5, pp. 1035-1062, 2004.
    [39]
    C. Cortes and V. Vapnik, "Support vector networks," Machine Learning, vol. 20, pp. 273-297, 1995.
    [40]
    N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola, "On kerneltarget alignment," in Advances in Neural Information Processing Systems 14 (NIPS'01), (T. Dietterich, S. Becker, and Z. Ghahramani, eds.), pp. 367-373, 2002.
    [41]
    F. Cucker and D. Grigoriev, "Complexity lower bounds for approximation algebraic computation trees," Journal of Complexity, vol. 15, no. 4, pp. 499- 512, 1999.
    [42]
    P. Dayan, G. E. Hinton, R. Neal, and R. Zemel, "The Helmholtz machine," Neural Computation, vol. 7, pp. 889-904, 1995.
    [43]
    S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, "Indexing by latent semantic analysis," Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391-407, 1990.
    [44]
    O. Delalleau, Y. Bengio, and N. L. Roux, "Efficient non-parametric function induction in semi-supervised learning," in Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, (R. G. Cowell and Z. Ghahramani, eds.), pp. 96-103, Society for Artificial Intelligence and Statistics, January 2005.
    [45]
    G. Desjardins and Y. Bengio, "Empirical evaluation of convolutional RBMs for vision," Technical Report 1327, Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, 2008.
    [46]
    E. Doi, D. C. Balcan, and M. S. Lewicki, "A theoretical analysis of robust coding over noisy overcomplete channels," in Advances in Neural Information Processing Systems 18 (NIPS'05), (Y. Weiss, B. Schölkopf, and J. Platt, eds.), pp. 307-314, Cambridge, MA: MIT Press, 2006.
    [47]
    D. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, 2006.
    [48]
    S. Duane, A. Kennedy, B. Pendleton, and D. Roweth, "Hybrid Monte Carlo," Phys. Lett. B, vol. 195, pp. 216-222, 1987.
    [49]
    J. L. Elman, "Learning and development in neural networks: The importance of starting small," Cognition, vol. 48, pp. 781-799, 1993.
    [50]
    D. Erhan, P.-A. Manzagol, Y. Bengio, S. Bengio, and P. Vincent, "The difficulty of training deep architectures and the effect of unsupervised pretraining," in Proceedings of The Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS'09), pp. 153-160, 2009.
    [51]
    Y. Freund and D. Haussler, "Unsupervised learning of distributions on binary vectors using two layer networks," Technical Report UCSC-CRL-94-25, University of California, Santa Cruz, 1994.
    [52]
    Y. Freund and R. E. Schapire, "Experiments with a new boosting algorithm," in Machine Learning: Proceedings of Thirteenth International Conference, pp. 148-156, USA: ACM, 1996.
    [53]
    B. J. Frey, G. E. Hinton, and P. Dayan, "Does the wake-sleep algorithm learn good density estimators?," in Advances in Neural Information Processing Systems 8 (NIPS'95), (D. Touretzky, M. Mozer, and M. Hasselmo, eds.), pp. 661-670, Cambridge, MA: MIT Press, 1996.
    [54]
    K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biological Cybernetics, vol. 36, pp. 193-202, 1980.
    [55]
    P. Gallinari, Y. LeCun, S. Thiria, and F. Fogelman-Soulie, "Memoires associatives distribuees," in Proceedings of COGNITIVA 87, Paris, La Villette, 1987.
    [56]
    T. Gärtner, "A survey of kernels for structured data," ACM SIGKDD Explorations Newsletter, vol. 5, no. 1, pp. 49-58, 2003.
    [57]
    S. Geman and D. Geman, "Stochastic relaxation, gibbs distributions, and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 721-741, November 1984.
    [58]
    R. Grosse, R. Raina, H. Kwong, and A. Y. Ng, "Shift-invariant sparse coding for audio classification," in Proceedings of the Twenty-third Conference on Uncertainty in Artificial Intelligence (UAI'07), 2007.
    [59]
    R. Hadsell, S. Chopra, and Y. LeCun, "Dimensionality reduction by learning an invariant mapping," in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR'06), pp. 1735-1742, IEEE Press, 2006.
    [60]
    R. Hadsell, A. Erkan, P. Sermanet, M. Scoffier, U. Muller, and Y. LeCun, "Deep belief net learning in a long-range vision system for autonomous offroad driving," in Proc. Intelligent Robots and Systems (IROS'08), pp. 628-633, 2008.
    [61]
    J. M. Hammersley and P. Clifford, "Markov field on finite graphs and lattices," Unpublished manuscript, 1971.
    [62]
    J. Håstad, "Almost optimal lower bounds for small depth circuits," in Proceedings of the 18th annual ACM Symposium on Theory of Computing, pp. 6-20, Berkeley, California: ACM Press, 1986.
    [63]
    J. Håstad and M. Goldmann, "On the power of small-depth threshold circuits," Computational Complexity, vol. 1, pp. 113-129, 1991.
    [64]
    T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu, "The entire regularization path for the support vector machine," Journal of Machine Learning Research, vol. 5, pp. 1391-1415, 2004.
    [65]
    K. A. Heller and Z. Ghahramani, "A nonparametric bayesian approach to modeling overlapping clusters," in Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS'07), pp. 187-194, San Juan, Porto Rico: Omnipress, 2007.
    [66]
    K. A. Heller, S. Williamson, and Z. Ghahramani, "Statistical models for partial membership," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 392-399, ACM, 2008.
    [67]
    G. Hinton and J. Anderson, Parallel Models of Associative Memory. Hillsdale, NJ: Lawrence Erlbaum Assoc., 1981.
    [68]
    G. E. Hinton, "Learning distributed representations of concepts," in Proceedings of the Eighth Annual Conference of the Cognitive Science Society, pp. 1-12, Amherst: Lawrence Erlbaum, Hillsdale, 1986.
    [69]
    G. E. Hinton, "Products of experts," in Proceedings of the Ninth International Conference on Artificial Neural Networks (ICANN), vol. 1, pp. 1-6, Edinburgh, Scotland: IEE, 1999.
    [70]
    G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, vol. 14, pp. 1771-1800, 2002.
    [71]
    G. E. Hinton, "To recognize shapes, first learn to generate images," Technical Report UTML TR 2006-003, University of Toronto, 2006.
    [72]
    G. E. Hinton, P. Dayan, B. J. Frey, and R. M. Neal, "The wake-sleep algorithm for unsupervised neural networks," Science, vol. 268, pp. 1558-1161, 1995.
    [73]
    G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, pp. 1527-1554, 2006.
    [74]
    G. E. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
    [75]
    G. E. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, pp. 504-507, 2006.
    [76]
    G. E. Hinton and T. J. Sejnowski, "Learning and relearning in Boltzmann machines," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, (D. E. Rumelhart and J. L. McClelland, eds.), pp. 282-317, Cambridge, MA: MIT Press, 1986.
    [77]
    G. E. Hinton, T. J. Sejnowski, and D. H. Ackley, "Boltzmann machines: Constraint satisfaction networks that learn," Technical Report TR-CMU-CS-84-119, Carnegie-Mellon University, Dept. of Computer Science, 1984.
    [78]
    G. E. Hinton, M. Welling, Y. W. Teh, and S. Osindero, "A new view of ICA," in Proceedings of 3rd International Conference on Independent Component Analysis and Blind Signal Separation (ICA'01), pp. 746-751, San Diego, CA, 2001.
    [79]
    G. E. Hinton and R. S. Zemel, "Autoencoders, minimum description length, and Helmholtz free energy," in Advances in Neural Information Processing Systems 6 (NIPS'93), (D. Cowan, G. Tesauro, and J. Alspector, eds.), pp. 3-10, Morgan Kaufmann Publishers, Inc., 1994.
    [80]
    T. K. Ho, "Random decision forest," in 3rd International Conference on Document Analysis and Recognition (ICDAR'95), pp. 278-282, Montreal, Canada, 1995.
    [81]
    S. Hochreiter Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Institut für Informatik, Lehrstuhl Prof. Brauer, Technische Universität München, 1991.
    [82]
    H. Hotelling, "Analysis of a complex of statistical variables into principal components," Journal of Educational Psychology, vol. 24, pp. 417-441, 498-520, 1933.
    [83]
    D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex," Journal of Physiology (London), vol. 160, pp. 106-154, 1962.
    [84]
    A. Hyvärinen, "Estimation of non-normalized statistical models using score matching," Journal of Machine Learning Research, vol. 6, pp. 695-709, 2005.
    [85]
    A. Hyvärinen, "Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables," IEEE Transactions on Neural Networks, vol. 18, pp. 1529-1531, 2007.
    [86]
    A. Hyvärinen, "Some extensions of score matching," Computational Statistics and Data Analysis, vol. 51, pp. 2499-2512, 2007.
    [87]
    A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. Wiley-Interscience, May 2001.
    [88]
    N. Intrator and S. Edelman, "How to make a low-dimensional representation suitable for diverse tasks," Connection Science, Special issue on Transfer in Neural Networks, vol. 8, pp. 205-224, 1996.
    [89]
    T. Jaakkola and D. Haussler, "Exploiting generative models in discriminative classifiers," Available from http://www.cse.ucsc.edu/haussler/pubs.html, Preprint, Dept.of Computer Science, Univ. of California. A shorter version is in Advances in Neural Information Processing Systems 11, 1998.
    [90]
    N. Japkowicz, S. J. Hanson, and M. A. Gluck, "Nonlinear autoassociation is not equivalent to PCA," Neural Computation, vol. 12, no. 3, pp. 531-545, 2000.
    [91]
    M. I. Jordan, Learning in Graphical Models. Dordrecht, Netherlands: Kluwer, 1998.
    [92]
    K. Kavukcuoglu, M. Ranzato, and Y. LeCun, "Fast inference in sparse coding algorithms with applications to object recognition," Technical Report, Computational and Biological Learning Lab, Courant Institute, NYU, Technical Report CBLL-TR-2008-12-01, 2008.
    [93]
    S. Kirkpatrick, C. D. G. Jr., and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671-680, 1983.
    [94]
    U. Köster and A. Hyvärinen, "A two-layer ICA-like model estimated by score matching," in Int. Conf. Artificial Neural Networks (ICANN'2007), pp. 798- 807, 2007.
    [95]
    K. A. Krueger and P. Dayan, "Flexible shaping: How learning in small steps helps," Cognition, vol. 110, pp. 380-394, 2009.
    [96]
    G. Lanckriet, N. Cristianini, P. Bartlett, L. El Gahoui, and M. Jordan, "Learning the kernel matrix with semi-definite programming," in Proceedings of the Nineteenth International Conference on Machine Learning (ICML'02), (C. Sammut and A. G. Hoffmann, eds.), pp. 323-330, Morgan Kaufmann, 2002.
    [97]
    H. Larochelle and Y. Bengio, "Classification using discriminative restricted Boltzmann machines," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 536-543, ACM, 2008.
    [98]
    H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, "Exploring strategies for training deep neural networks," Journal of Machine Learning Research, vol. 10, pp. 1-40, 2009.
    [99]
    H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, "An empirical evaluation of deep architectures on problems with many factors of variation," in Proceedings of the Twenty-fourth International Conference on Machine Learning (ICML'07), (Z. Ghahramani, ed.), pp. 473-480, ACM, 2007.
    [100]
    J. A. Lasserre, C. M. Bishop, and T. P. Minka, "Principled hybrids of generative and discriminative models," in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR'06), pp. 87-94, Washington, DC, USA, 2006. IEEE Computer Society.
    [101]
    Y. Le Cun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
    [102]
    N. Le Roux and Y. Bengio, "Representational power of restricted boltzmann machines and deep belief networks," Neural Computation, vol. 20, no. 6, pp. 1631-1649, 2008.
    [103]
    Y. LeCun, "Modèles connexionistes de l'apprentissage," PhD thesis, Universit é de Paris VI, 1987.
    [104]
    Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541-551, 1989.
    [105]
    Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, "Efficient BackProp," in Neural Networks: Tricks of the Trade, (G. B. Orr and K.-R. Müller, eds.), pp. 9-50, Springer, 1998.
    [106]
    Y. LeCun, S. Chopra, R. M. Hadsell, M.-A. Ranzato, and F.-J. Huang, "A tutorial on energy-based learning," in Predicting Structured Data, pp. 191- 246, G. Bakir and T. Hofman and B. Scholkopf and A. Smola and B. Taskar: MIT Press, 2006.
    [107]
    Y. LeCun and F. Huang, "Loss functions for discriminative training of energy-based models," in Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS'05), (R. G. Cowell and Z. Ghahramani, eds.), 2005.
    [108]
    Y. LeCun, F.-J. Huang, and L. Bottou, "Learning methods for generic object recognition with invariance to pose and lighting," in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR'04), vol. 2, pp. 97- 104, Los Alamitos, CA, USA: IEEE Computer Society, 2004.
    [109]
    H. Lee, A. Battle, R. Raina, and A. Ng, "Efficient sparse coding algorithms," in Advances in Neural Information Processing Systems 19 (NIPS'06), (B. Schölkopf, J. Platt, and T. Hoffman, eds.), pp. 801-808, MIT Press, 2007.
    [110]
    H. Lee, C. Ekanadham, and A. Ng, "Sparse deep belief net model for visual area V2," in Advances in Neural Information Processing Systems 20 (NIPS'07), (J. Platt, D. Koller, Y. Singer, and S. P. Roweis, eds.), Cambridge, MA: MIT Press, 2008.
    [111]
    H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proceedings of the Twenty-sixth International Conference on Machine Learning (ICML'09), (L. Bottou and M. Littman, eds.), Montreal (Qc), Canada: ACM, 2009.
    [112]
    T.-S. Lee and D. Mumford, "Hierarchical bayesian inference in the visual cortex," Journal of Optical Society of America, A, vol. 20, no. 7, pp. 1434- 1448, 2003.
    [113]
    P. Lennie, "The cost of cortical computation," Current Biology, vol. 13, pp. 493-497, Mar 18 2003.
    [114]
    I. Levner, Data Driven Object Segmentation. 2008. PhD thesis, Department of Computer Science, University of Alberta.
    [115]
    M. Lewicki and T. Sejnowski, "Learning nonlinear overcomplete representations for efficient coding," in Advances in Neural Information Processing Systems 10 (NIPS'97), (M. Jordan, M. Kearns, and S. Solla, eds.), pp. 556-562, Cambridge, MA, USA: MIT Press, 1998.
    [116]
    M. S. Lewicki and T. J. Sejnowski, "Learning overcomplete representations," Neural Computation, vol. 12, no. 2, pp. 337-365, 2000.
    [117]
    M. Li and P. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications. New York, NY: Springer, Second ed., 1997.
    [118]
    P. Liang and M. I. Jordan, "An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 584-591, New York, NY, USA: ACM, 2008.
    [119]
    T. Lin, B. G. Horne, P. Tino, and C. L. Giles, "Learning long-term dependencies is not as difficult with NARX recurrent neural networks," Technical Report UMICAS-TR-95-78, Institute for Advanced Computer Studies, University of Mariland, 1995.
    [120]
    G. Loosli, S. Canu, and L. Bottou, "Training invariant support vector machines using selective sampling," in Large Scale Kernel Machines, (L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.), pp. 301-320, Cambridge, MA: MIT Press, 2007.
    [121]
    J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, "Supervised dictionary learning," in Advances in Neural Information Processing Systems 21 (NIPS'08), (D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, eds.), pp. 1033-1040, 2009. NIPS Foundation.
    [122]
    J. L. McClelland and D. E. Rumelhart, "An interactive activation model of context effects in letter perception," Psychological Review, pp. 375-407, 1981.
    [123]
    J. L. McClelland and D. E. Rumelhart, Explorations in parallel distributed processing. Cambridge: MIT Press, 1988.
    [124]
    J. L. McClelland, D. E. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2. Cambridge: MIT Press, 1986.
    [125]
    W. S. McCulloch and W. Pitts, "A logical calculus of ideas immanent in nervous activity," Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133, 1943.
    [126]
    R. Memisevic and G. E. Hinton, "Unsupervised learning of image transformations," in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR'07), 2007.
    [127]
    E. Mendelson, Introduction to Mathematical Logic, 4th ed. 1997. Chapman & Hall.
    [128]
    R. Miikkulainen and M. G. Dyer, "Natural language processing with modular PDP networks and distributed lexicon," Cognitive Science, vol. 15, pp. 343-399, 1991.
    [129]
    A. Mnih and G. E. Hinton, "Three new graphical models for statistical language modelling," in Proceedings of the Twenty-fourth International Conference on Machine Learning (ICML'07), (Z. Ghahramani, ed.), pp. 641-648, ACM, 2007.
    [130]
    A. Mnih and G. E. Hinton, "A scalable hierarchical distributed language model," in Advances in Neural Information Processing Systems 21 (NIPS'08), (D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, eds.), pp. 1081-1088, 2009.
    [131]
    H. Mobahi, R. Collobert, and J. Weston, "Deep Learning from temporal coherence in video," in Proceedings of the 26th International Conference on Machine Learning, (L. Bottou and M. Littman, eds.), pp. 737-744, Montreal: Omnipress, June 2009.
    [132]
    J. More and Z. Wu, "Smoothing techniques for macromolecular global optimization," in Nonlinear Optimization and Applications, (G. D. Pillo and F. Giannessi, eds.), Plenum Press, 1996.
    [133]
    I. Murray and R. Salakhutdinov, "Evaluating probabilities under highdimensional latent variable models," in Advances in Neural Information Processing Systems 21 (NIPS'08), vol. 21, (D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, eds.), pp. 1137-1144, 2009.
    [134]
    J. Mutch and D. G. Lowe, "Object class recognition and localization using sparse features with limited receptive fields," International Journal of Computer Vision, vol. 80, no. 1, pp. 45-57, 2008.
    [135]
    R. M. Neal, "Connectionist learning of belief networks," Artificial Intelligence, vol. 56, pp. 71-113, 1992.
    [136]
    R. M. Neal, "Bayesian learning for neural networks," PhD thesis, Department of Computer Science, University of Toronto, 1994.
    [137]
    A. Y. Ng and M. I. Jordan, "On Discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes," in Advances in Neural Information Processing Systems 14 (NIPS'01), (T. Dietterich, S. Becker, and Z. Ghahramani, eds.), pp. 841-848, 2002.
    [138]
    J. Niebles and L. Fei-Fei, "A hierarchical model of shape and appearance for human action classification," in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR'07), 2007.
    [139]
    B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1?," Vision Research, vol. 37, pp. 3311-3325, December 1997.
    [140]
    P. Orponen, "Computational complexity of neural networks: a survey," Nordic Journal of Computing, vol. 1, no. 1, pp. 94-110, 1994.
    [141]
    S. Osindero and G. E. Hinton, "Modeling image patches with a directed hierarchy of Markov random field," in Advances in Neural Information Processing Systems 20 (NIPS'07), (J. Platt, D. Koller, Y. Singer, and S. Roweis, eds.), pp. 1121-1128, Cambridge, MA: MIT Press, 2008.
    [142]
    B. Pearlmutter and L. C. Parra, "A context-sensitive generalization of ICA," in International Conference On Neural Information Processing, (L. Xu, ed.), pp. 151-157, Hong-Kong, 1996.
    [143]
    E. Pérez and L. A. Rendell, "Learning despite concept variation by finding structure in attribute-based data," in Proceedings of the Thirteenth International Conference on Machine Learning (ICML'96), (L. Saitta, ed.), pp. 391-399, Morgan Kaufmann, 1996.
    [144]
    G. B. Peterson, "A day of great illumination: B. F. Skinner's discovery of shaping," Journal of the Experimental Analysis of Behavior, vol. 82, no. 3, pp. 317-328, 2004.
    [145]
    N. Pinto, J. DiCarlo, and D. Cox, "Establishing good benchmarks and baselines for face recognition," in ECCV 2008 Faces in 'Real-Life' Images Workshop, 2008. Marseille France, Erik Learned-Miller and Andras Ferencz and Frédéric Jurie.
    [146]
    J. B. Pollack, "Recursive distributed representations," Artificial Intelligence, vol. 46, no. 1, pp. 77-105, 1990.
    [147]
    L. R. Rabiner and B. H. Juang, "An Introduction to hidden Markov models," IEEE ASSP Magazine, pp. 257-285, January 1986.
    [148]
    R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, "Self-taught learning: transfer learning from unlabeled data," in Proceedings of the Twenty-fourth International Conference on Machine Learning (ICML'07), (Z. Ghahramani, ed.), pp. 759-766, ACM, 2007.
    [149]
    M. Ranzato, Y. Boureau, S. Chopra, and Y. LeCun, "A unified energy-based framework for unsupervised learning," in Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS'07), San Juan, Porto Rico: Omnipress, 2007.
    [150]
    M. Ranzato, Y.-L. Boureau, and Y. LeCun, "Sparse feature learning for deep belief networks," in Advances in Neural Information Processing Systems 20 (NIPS'07), (J. Platt, D. Koller, Y. Singer, and S. Roweis, eds.), pp. 1185- 1192, Cambridge, MA: MIT Press, 2008.
    [151]
    M. Ranzato, F. Huang, Y. Boureau, and Y. LeCun, "Unsupervised learning of invariant feature hierarchies with applications to object recognition," in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR'07), IEEE Press, 2007.
    [152]
    M. Ranzato and Y. LeCun, "A sparse and locally shift invariant feature extractor applied to document images," in International Conference on Document Analysis and Recognition (ICDAR'07), pp. 1213-1217, Washington, DC, USA: IEEE Computer Society, 2007.
    [153]
    M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun, "Efficient learning of sparse representations with an energy-based model," in Advances in Neural Information Processing Systems 19 (NIPS'06), (B. Schölkopf, J. Platt, and T. Hoffman, eds.), pp. 1137-1144, MIT Press, 2007.
    [154]
    M. Ranzato and M. Szummer, "Semi-supervised learning of compact document representations with deep networks," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), vol. 307, (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 792-799, ACM, 2008.
    [155]
    S. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323-2326, 2000.
    [156]
    D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533-536, 1986.
    [157]
    D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1, Cambridge: MIT Press, 1986.
    [158]
    R. Salakhutdinov and G. E. Hinton, "Learning a nonlinear embedding by preserving class neighbourhood structure," in Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS'07), San Juan, Puerto Rico: Omnipress, 2007.
    [159]
    R. Salakhutdinov and G. E. Hinton, "Semantic hashing," in Proceedings of the 2007 Workshop on Information Retrieval and applications of Graphical Models (SIGIR 2007), Amsterdam: Elsevier, 2007.
    [160]
    R. Salakhutdinov and G. E. Hinton, "Using deep belief nets to learn covariance kernels for Gaussian processes," in Advances in Neural Information Processing Systems 20 (NIPS'07), (J. Platt, D. Koller, Y. Singer, and S. Roweis, eds.), pp. 1249-1256, Cambridge, MA: MIT Press, 2008.
    [161]
    R. Salakhutdinov and G. E. Hinton, "Deep Boltzmann machines," in Proceedings of The Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS'09), vol. 5, pp. 448-455, 2009.
    [162]
    R. Salakhutdinov, A. Mnih, and G. E. Hinton, "Restricted Boltzmann machines for collaborative filtering," in Proceedings of the Twenty-fourth International Conference on Machine Learning (ICML'07), (Z. Ghahramani, ed.), pp. 791-798, New York, NY, USA: ACM, 2007.
    [163]
    R. Salakhutdinov and I. Murray, "On the quantitative analysis of deep belief networks," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 872-879, ACM, 2008.
    [164]
    L. K. Saul, T. Jaakkola, and M. I. Jordan, "Mean field theory for sigmoid belief networks," Journal of Artificial Intelligence Research, vol. 4, pp. 61-76, 1996.
    [165]
    M. Schmitt, "Descartes' rule of signs for radial basis function neural networks," Neural Computation, vol. 14, no. 12, pp. 2997-3011, 2002.
    [166]
    B. Schölkopf, C. J. C. Burges, and A. J. Smola, Advances in Kernel Methods -- Support Vector Learning. Cambridge, MA: MIT Press, 1999.
    [167]
    B. Schölkopf, S. Mika, C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. Smola, "Input space versus feature space in kernel-based methods," IEEE Trans. Neural Networks, vol. 10, no. 5, pp. 1000-1017, 1999.
    [168]
    B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, pp. 1299-1319, 1998.
    [169]
    H. Schwenk, "Efficient training of large neural networks for language modeling," in International Joint Conference on Neural Networks (IJCNN), pp. 3050-3064, 2004.
    [170]
    H. Schwenk and J.-L. Gauvain, "Connectionist language modeling for large vocabulary continuous speech recognition," in International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 765-768, Orlando, Florida, 2002.
    [171]
    H. Schwenk and J.-L. Gauvain, "Building continuous space language models for transcribing European languages," in Interspeech, pp. 737-740, 2005.
    [172]
    H. Schwenk and M. Milgram, "Transformation invariant autoassociation with application to handwritten character recognition," in Advances in Neural Information Processing Systems 7 (NIPS'94), (G. Tesauro, D. Touretzky, and T. Leen, eds.), pp. 991-998, MIT Press, 1995.
    [173]
    T. Serre, G. Kreiman, M. Kouh, C. Cadieu, U. Knoblich, and T. Poggio, "A quantitative theory of immediate visual recognition," Progress in Brain Research, Computational Neuroscience: Theoretical Insights into Brain Function, vol. 165, pp. 33-56, 2007.
    [174]
    S. H. Seung, "Learning continuous attractors in recurrent networks," in Advances in Neural Information Processing Systems 10 (NIPS'97), (M. Jordan, M. Kearns, and S. Solla, eds.), pp. 654-660, MIT Press, 1998.
    [175]
    P. Y. Simard, D. Steinkraus, and J. C. Platt, "Best practices for convolutional neural networks," in International Conference on Document Analysis and Recognition (ICDAR'03), p. 958, Washington, DC, USA: IEEE Computer Society, 2003.
    [176]
    P. Y. Simard, Y. LeCun, and J. Denker, "Efficient pattern recognition using a new transformation distance," in Advances in Neural Information Processing Systems 5 (NIPS'92), (C. Giles, S. Hanson, and J. Cowan, eds.), pp. 50-58, Morgan Kaufmann, San Mateo, 1993.
    [177]
    B. F. Skinner, "Reinforcement today," American Psychologist, vol. 13, pp. 94-99, 1958.
    [178]
    P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory," in Parallel Distributed Processing, vol. 1, (D. E. Rumelhart and J. L. McClelland, eds.), pp. 194-281, Cambridge: MIT Press, 1986. ch. 6.
    [179]
    E. B. Sudderth, A. Torralba, W. T. Freeman, and A. S. Willsky, "Describing visual scenes using transformed objects and parts," International Journal of Computer Vision, vol. 77, pp. 291-330, 2007.
    [180]
    I. Sutskever and G. E. Hinton, "Learning multilevel distributed representations for high-dimensional sequences," in Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS'07), San Juan, Puerto Rico: Omnipress, 2007.
    [181]
    R. Sutton and A. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
    [182]
    G. Taylor and G. Hinton, "Factored conditional restricted Boltzmann machines for modeling motion style," in Proceedings of the 26th International Conference on Machine Learning (ICML'09), (L. Bottou and M. Littman, eds.), pp. 1025-1032, Montreal: Omnipress, June 2009.
    [183]
    G. Taylor, G. E. Hinton, and S. Roweis, "Modeling human motion using binary latent variables," in Advances in Neural Information Processing Systems 19 (NIPS'06), (B. Schölkopf, J. Platt, and T. Hoffman, eds.), pp. 1345-1352, Cambridge, MA: MIT Press, 2007.
    [184]
    Y. Teh, M. Welling, S. Osindero, and G. E. Hinton, "Energy-based models for sparse overcomplete representations," Journal of Machine Learning Research, vol. 4, pp. 1235-1260, 2003.
    [185]
    J. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319-2323, 2000.
    [186]
    S. Thrun, "Is learning the n-th thing any easier than learning the first?," in Advances in Neural Information Processing Systems 8 (NIPS'95), (D. Touretzky, M. Mozer, and M. Hasselmo, eds.), pp. 640-646, Cambridge, MA: MIT Press, 1996.
    [187]
    T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 1064-1071, ACM, 2008.
    [188]
    T. Tieleman and G. Hinton, "Using fast weights to improve persistent contrastive divergence," in Proceedings of the Twenty-sixth International Conference on Machine Learning (ICML'09), (L. Bottou and M. Littman, eds.), pp. 1033-1040, New York, NY, USA: ACM, 2009.
    [189]
    I. Titov and J. Henderson, "Constituent parsing with incremental sigmoid belief networks," in Proc. 45th Meeting of Association for Computational Linguistics (ACL'07), pp. 632-639, Prague, Czech Republic, 2007.
    [190]
    A. Torralba, R. Fergus, and Y. Weiss, "Small codes and large databases for recognition," in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR'08), pp. 1-8, 2008.
    [191]
    P. E. Utgoff and D. J. Stracuzzi, "Many-layered learning," Neural Computation, vol. 14, pp. 2497-2539, 2002.
    [192]
    L. van der Maaten and G. E. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579-2605, November 2008.
    [193]
    V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer, 1995.
    [194]
    R. Vilalta, G. Blix, and L. Rendell, "Global data analysis and the fragmentation problem in decision tree induction," in Proceedings of the 9th European Conference on Machine Learning (ECML'97), pp. 312-327, Springer-Verlag, 1997.
    [195]
    P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 1096-1103, ACM, 2008.
    [196]
    L. Wang and K. L. Chan, "Learning kernel parameters by using class separability measure," 6th kernel machines workshop, in conjunction with Neural Information Processing Systems (NIPS), 2002.
    [197]
    M. Weber, M. Welling, and P. Perona, "Unsupervised learning of models for recognition," in Proc. 6th Europ. Conf. Comp. Vis., ECCV2000, pp. 18-32, Dublin, 2000.
    [198]
    I. Wegener, The Complexity of Boolean Functions. John Wiley & Sons, 1987.
    [199]
    Y. Weiss, "Segmentation using eigenvectors: a unifying view," in Proceedings IEEE International Conference on Computer Vision (ICCV'99), pp. 975-982, 1999.
    [200]
    M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Advances in Neural Information Processing Systems 17 (NIPS'04), (L. Saul, Y. Weiss, and L. Bottou, eds.), pp. 1481-1488, Cambridge, MA: MIT Press, 2005.
    [201]
    M. Welling, R. Zemel, and G. E. Hinton, "Self-supervised boosting," in Advances in Neural Information Processing Systems 15 (NIPS'02), (S. Becker, S. Thrun, and K. Obermayer, eds.), pp. 665-672, MIT Press, 2003.
    [202]
    J. Weston, F. Ratle, and R. Collobert, "Deep learning via semi-supervised embedding," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 1168-1175, New York, NY, USA: ACM, 2008.
    [203]
    C. K. I. Williams and C. E. Rasmussen, "Gaussian processes for regression," in Advances in neural information processing systems 8 (NIPS'95), (D. Touretzky, M. Mozer, and M. Hasselmo, eds.), pp. 514-520, Cambridge, MA: MIT Press, 1996.
    [204]
    L. Wiskott and T. J. Sejnowski, "Slow feature analysis: Unsupervised learning of invariances," Neural Computation, vol. 14, no. 4, pp. 715-770, 2002.
    [205]
    D. H. Wolpert, "Stacked generalization," Neural Networks, vol. 5, pp. 241-249, 1992.
    [206]
    Z. Wu, "Global continuation for distance geometry problems," SIAM Journal of Optimization, vol. 7, pp. 814-836, 1997.
    [207]
    P. Xu, A. Emami, and F. Jelinek, "Training connectionist models for the structured language model," in Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP'2003), vol. 10, pp. 160-167, 2003.
    [208]
    A. Yao, "Separating the polynomial-time hierarchy by oracles," in Proceedings of the 26th Annual IEEE Symposium on Foundations of Computer Science, pp. 1-10, 1985.
    [209]
    D. Zhou, O. Bousquet, T. Navin Lal, J. Weston, and B. Schölkopf, "Learning with local and global consistency," in Advances in Neural Information Processing Systems 16 (NIPS'03), (S. Thrun, L. Saul, and B. Schölkopf, eds.), pp. 321-328, Cambridge, MA: MIT Press, 2004.
    [210]
    X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in Proceedings of the Twentieth International Conference on Machine Learning (ICML'03), (T. Fawcett and N. Mishra, eds.), pp. 912-919, AAAI Press, 2003.
    [211]
    M. Zinkevich, "Online convex programming and generalized infinitesimal gradient ascent," in Proceedings of the Twentieth International Conference on Machine Learning (ICML'03), (T. Fawcett and N. Mishra, eds.), pp. 928-936, AAAI Press, 2003.



      Published In

      Foundations and Trends® in Machine Learning, Volume 2, Issue 1, January 2009, 130 pages
      ISSN: 1935-8237; EISSN: 1935-8245

      Publisher

      Now Publishers Inc., Hanover, MA, United States
