
Learning Deep Architectures for AI

Published: 01 January 2009
Abstract

    Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g., in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This monograph discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
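    The training principle the abstract describes — unsupervised learning of a single-layer model such as a Restricted Boltzmann Machine, whose hidden representation then serves as input to the next layer — can be sketched in a few lines. This is an illustrative toy, not code from the monograph: the class, the learning rate, and the use of one-step contrastive divergence (CD-1) with mean-field reconstructions are assumptions made for brevity.

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class RBM:
        """Minimal binary RBM trained with one-step contrastive divergence (CD-1)."""

        def __init__(self, n_visible, n_hidden, lr=0.5, seed=0):
            self.rng = np.random.default_rng(seed)
            self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
            self.b = np.zeros(n_visible)  # visible biases
            self.c = np.zeros(n_hidden)   # hidden biases
            self.lr = lr

        def hidden_probs(self, v):
            return sigmoid(v @ self.W + self.c)

        def visible_probs(self, h):
            return sigmoid(h @ self.W.T + self.b)

        def cd1_update(self, v0):
            # Positive phase: hidden probabilities and a sample given the data.
            ph0 = self.hidden_probs(v0)
            h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
            # Negative phase: one Gibbs step back to the visibles and up again.
            pv1 = self.visible_probs(h0)
            ph1 = self.hidden_probs(pv1)
            # CD-1 approximation to the log-likelihood gradient.
            n = len(v0)
            self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / n
            self.b += self.lr * (v0 - pv1).mean(axis=0)
            self.c += self.lr * (ph0 - ph1).mean(axis=0)
            return ((v0 - pv1) ** 2).mean()  # reconstruction error, for monitoring

    # Greedy layer-wise stacking: train one RBM, then feed its hidden
    # representation to the next layer as if it were data.
    data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 50, dtype=float)
    layer1 = RBM(n_visible=4, n_hidden=3)
    for _ in range(300):
        err = layer1.cd1_update(data)
    layer2_input = layer1.hidden_probs(data)  # input to the next RBM in the stack
    ```

    Each layer is trained in isolation on the representation produced by the layer below; only after this unsupervised pre-training would the whole stack typically be fine-tuned for a supervised task.
    
    
    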

    References

    [1]
    D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, "A learning algorithm for boltzmann machines," Cognitive Science, vol. 9, pp. 147-169, 1985.
    [2]
    A. Ahmed, K. Yu, W. Xu, Y. Gong, and E. P. Xing, "Training hierarchical feed-forward visual recognition models using transfer learning from pseudo tasks," in Proceedings of the 10th European Conference on Computer Vision (ECCV'08), pp. 69-82, 2008.
    [3]
    E. L. Allgower and K. Georg, Numerical Continuation Methods. An Introduction. No. 13 in Springer Series in Computational Mathematics, Springer-Verlag, 1980.
    [4]
    C. Andrieu, N. de Freitas, A. Doucet, and M. Jordan, "An introduction to MCMC for machine learning," Machine Learning, vol. 50, pp. 5-43, 2003.
    [5]
    D. Attwell and S. B. Laughlin, "An energy budget for signaling in the grey matter of the brain," Journal of Cerebral Blood Flow And Metabolism, vol. 21, pp. 1133-1145, 2001.
    [6]
    J. A. Bagnell and D. M. Bradley, "Differentiable sparse coding," in Advances in Neural Information Processing Systems 21 (NIPS'08), (D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, eds.), NIPS Foundation, 2009.
    [7]
    J. Baxter, "Learning internal representations," in Proceedings of the 8th International Conference on Computational Learning Theory (COLT'95), pp. 311- 320, Santa Cruz, California: ACM Press, 1995.
    [8]
    J. Baxter, "A Bayesian/information theoretic model of learning via multiple task sampling," Machine Learning, vol. 28, pp. 7-40, 1997.
    [9]
    M. Belkin, I. Matveeva, and P. Niyogi, "Regularization and semi-supervised learning on large graphs," in Proceedings of the 17th International Conference on Computational Learning Theory (COLT'04), (J. Shawe-Taylor and Y. Singer, eds.), pp. 624-638, Springer, 2004.
    [10]
    M. Belkin and P. Niyogi, "Using manifold structure for partially labeled classification," in Advances in Neural Information Processing Systems 15 (NIPS'02), (S. Becker, S. Thrun, and K. Obermayer, eds.), Cambridge, MA: MIT Press, 2003.
    [11]
    A. J. Bell and T. J. Sejnowski, "An information maximisation approach to blind separation and blind deconvolution," Neural Computation, vol. 7, no. 6, pp. 1129-1159, 1995.
    [12]
    Y. Bengio and O. Delalleau, "Justifying and generalizing contrastive divergence," Neural Computation, vol. 21, no. 6, pp. 1601-1621, 2009.
    [13]
    Y. Bengio, O. Delalleau, and N. Le Roux, "The Curse of highly variable functions for local kernel machines," in Advances in Neural Information Processing Systems 18 (NIPS'05), (Y. Weiss, B. Schölkopf, and J. Platt, eds.), pp. 107- 114, Cambridge, MA: MIT Press, 2006.
    [14]
    Y. Bengio, O. Delalleau, and C. Simard, "Decision trees do not generalize to new variations," Computational Intelligence, To appear, 2009.
    [15]
    Y. Bengio, R. Ducharme, and P. Vincent, "A neural probabilistic language model," in Advances in Neural Information Processing Systems 13 (NIPS'00), (T. Leen, T. Dietterich, and V. Tresp, eds.), pp. 933-938, MIT Press, 2001.
    [16]
    Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," Journal of Machine Learning Research, vol. 3, pp. 1137- 1155, 2003.
    [17]
    Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," in Advances in Neural Information Processing Systems 19 (NIPS'06), (B. Schölkopf, J. Platt, and T. Hoffman, eds.), pp. 153- 160, MIT Press, 2007.
    [18]
    Y. Bengio, N. Le Roux, P. Vincent, O. Delalleau, and P. Marcotte, "Convex neural networks," in Advances in Neural Information Processing Systems 18 (NIPS'05), (Y. Weiss, B. Schölkopf, and J. Platt, eds.), pp. 123-130, Cambridge, MA: MIT Press, 2006.
    [19]
    Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," in Large Scale Kernel Machines, (L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.), MIT Press, 2007.
    [20]
    Y. Bengio, J. Louradour, R. Collobert, and J. Weston, "Curriculum learning," in Proceedings of the Twenty-sixth International Conference on Machine Learning (ICML09), (L. Bottou and M. Littman, eds.), pp. 41-48, Montreal: ACM, 2009.
    [21]
    Y. Bengio, M. Monperrus, and H. Larochelle, "Non-local estimation of manifold structure," Neural Computation, vol. 18, no. 10, pp. 2509-2528, 2006.
    [22]
    Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.
    [23]
    J. Bergstra and Y. Bengio, "Slow, decorrelated features for pretraining complex cell-like networks," in Advances in Neural Information Processing Systems 22 (NIPS'09), (D. Schuurmans, Y. Bengio, C. Williams, J. Lafferty, and A. Culotta, eds.), December 2010.
    [24]
    B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," in Fifth Annual Workshop on Computational Learning Theory, pp. 144-152, Pittsburgh: ACM, 1992.
    [25]
    H. Bourlard and Y. Kamp, "Auto-association by multilayer perceptrons and singular value decomposition," Biological Cybernetics, vol. 59, pp. 291-294, 1988.
    [26]
    M. Brand, "Charting a manifold," in Advances in Neural Information Processing Systems 15 (NIPS'02), (S. Becker, S. Thrun, and K. Obermayer, eds.), pp. 961-968, MIT Press, 2003.
    [27]
    L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
    [28]
    L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984.
    [29]
    L. D. Brown, Fundamentals of Statistical Exponential Families. 1986. Vol. 9, Inst. of Math. Statist. Lecture Notes Monograph Series.
    [30]
    E. Candes and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 15, no. 12, pp. 4203-4215, 2005.
    [31]
    M. A. Carreira-Perpiñan and G. E. Hinton, "On contrastive divergence learning," in Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS'05), (R. G. Cowell and Z. Ghahramani, eds.), pp. 33-40, Society for Artificial Intelligence and Statistics, 2005.
    [32]
    R. Caruana, "Multitask connectionist learning," in Proceedings of the 1993 Connectionist Models Summer School, pp. 372-379, 1993.
    [33]
    P. Clifford, "Markov random fields in statistics," in Disorder in Physical Systems: A Volume in Honour of John M. Hammersley, (G. Grimmett and D. Welsh, eds.), pp. 19-32, Oxford University Press, 1990.
    [34]
    D. Cohn, Z. Ghahramani, and M. I. Jordan, "Active learning with statistical models," in Advances in Neural Information Processing Systems 7 (NIPS'94), (G. Tesauro, D. Touretzky, and T. Leen, eds.), pp. 705-712, Cambridge MA: MIT Press, 1995.
    [35]
    T. F. Coleman and Z. Wu, "Parallel continuation-based global optimization for molecular conformation and protein folding," Technical Report Cornell University, Dept. of Computer Science, 1994.
    [36]
    R. Collobert and S. Bengio, "Links between perceptrons, MLPs and SVMs," in Proceedings of the Twenty-first International Conference on Machine Learning (ICML'04), (C. E. Brodley, ed.), p. 23, New York, NY, USA: ACM, 2004.
    [37]
    R. Collobert and J. Weston, "A unified architecture for natural language processing: Deep neural networks with multitask learning," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 160-167, ACM, 2008.
    [38]
    C. Cortes, P. Haffner, and M. Mohri, "Rational kernels: Theory and algorithms," Journal of Machine Learning Research, vol. 5, pp. 1035-1062, 2004.
    [39]
    C. Cortes and V. Vapnik, "Support vector networks," Machine Learning, vol. 20, pp. 273-297, 1995.
    [40]
    N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola, "On kerneltarget alignment," in Advances in Neural Information Processing Systems 14 (NIPS'01), (T. Dietterich, S. Becker, and Z. Ghahramani, eds.), pp. 367-373, 2002.
    [41]
    F. Cucker and D. Grigoriev, "Complexity lower bounds for approximation algebraic computation trees," Journal of Complexity, vol. 15, no. 4, pp. 499- 512, 1999.
    [42]
    P. Dayan, G. E. Hinton, R. Neal, and R. Zemel, "The Helmholtz machine," Neural Computation, vol. 7, pp. 889-904, 1995.
    [43]
    S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, "Indexing by latent semantic analysis," Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391-407, 1990.
    [44]
    O. Delalleau, Y. Bengio, and N. L. Roux, "Efficient non-parametric function induction in semi-supervised learning," in Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, (R. G. Cowell and Z. Ghahramani, eds.), pp. 96-103, Society for Artificial Intelligence and Statistics, January 2005.
    [45]
    G. Desjardins and Y. Bengio, "Empirical evaluation of convolutional RBMs for vision," Technical Report 1327, Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, 2008.
    [46]
    E. Doi, D. C. Balcan, and M. S. Lewicki, "A theoretical analysis of robust coding over noisy overcomplete channels," in Advances in Neural Information Processing Systems 18 (NIPS'05), (Y. Weiss, B. Schölkopf, and J. Platt, eds.), pp. 307-314, Cambridge, MA: MIT Press, 2006.
    [47]
    D. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, 2006.
    [48]
    S. Duane, A. Kennedy, B. Pendleton, and D. Roweth, "Hybrid Monte Carlo," Phys. Lett. B, vol. 195, pp. 216-222, 1987.
    [49]
    J. L. Elman, "Learning and development in neural networks: The importance of starting small," Cognition, vol. 48, pp. 781-799, 1993.
    [50]
    D. Erhan, P.-A. Manzagol, Y. Bengio, S. Bengio, and P. Vincent, "The difficulty of training deep architectures and the effect of unsupervised pretraining," in Proceedings of The Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS'09), pp. 153-160, 2009.
    [51]
    Y. Freund and D. Haussler, "Unsupervised learning of distributions on binary vectors using two layer networks," Technical Report UCSC-CRL-94-25, University of California, Santa Cruz, 1994.
    [52]
    Y. Freund and R. E. Schapire, "Experiments with a new boosting algorithm," in Machine Learning: Proceedings of Thirteenth International Conference, pp. 148-156, USA: ACM, 1996.
    [53]
    B. J. Frey, G. E. Hinton, and P. Dayan, "Does the wake-sleep algorithm learn good density estimators?," in Advances in Neural Information Processing Systems 8 (NIPS'95), (D. Touretzky, M. Mozer, and M. Hasselmo, eds.), pp. 661-670, Cambridge, MA: MIT Press, 1996.
    [54]
    K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biological Cybernetics, vol. 36, pp. 193-202, 1980.
    [55]
    P. Gallinari, Y. LeCun, S. Thiria, and F. Fogelman-Soulie, "Memoires associatives distribuees," in Proceedings of COGNITIVA 87, Paris, La Villette, 1987.
    [56]
    T. Gärtner, "A survey of kernels for structured data," ACM SIGKDD Explorations Newsletter, vol. 5, no. 1, pp. 49-58, 2003.
    [57]
    S. Geman and D. Geman, "Stochastic relaxation, gibbs distributions, and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 721-741, November 1984.
    [58]
    R. Grosse, R. Raina, H. Kwong, and A. Y. Ng, "Shift-invariant sparse coding for audio classification," in Proceedings of the Twenty-third Conference on Uncertainty in Artificial Intelligence (UAI'07), 2007.
    [59]
    R. Hadsell, S. Chopra, and Y. LeCun, "Dimensionality reduction by learning an invariant mapping," in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR'06), pp. 1735-1742, IEEE Press, 2006.
    [60]
    R. Hadsell, A. Erkan, P. Sermanet, M. Scoffier, U. Muller, and Y. LeCun, "Deep belief net learning in a long-range vision system for autonomous offroad driving," in Proc. Intelligent Robots and Systems (IROS'08), pp. 628-633, 2008.
    [61]
    J. M. Hammersley and P. Clifford, "Markov field on finite graphs and lattices," Unpublished manuscript, 1971.
    [62]
    J. Håstad, "Almost optimal lower bounds for small depth circuits," in Proceedings of the 18th annual ACM Symposium on Theory of Computing, pp. 6-20, Berkeley, California: ACM Press, 1986.
    [63]
    J. Håstad and M. Goldmann, "On the power of small-depth threshold circuits," Computational Complexity, vol. 1, pp. 113-129, 1991.
    [64]
    T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu, "The entire regularization path for the support vector machine," Journal of Machine Learning Research, vol. 5, pp. 1391-1415, 2004.
    [65]
    K. A. Heller and Z. Ghahramani, "A nonparametric bayesian approach to modeling overlapping clusters," in Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS'07), pp. 187-194, San Juan, Porto Rico: Omnipress, 2007.
    [66]
    K. A. Heller, S. Williamson, and Z. Ghahramani, "Statistical models for partial membership," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 392-399, ACM, 2008.
    [67]
    G. Hinton and J. Anderson, Parallel Models of Associative Memory. Hillsdale, NJ: Lawrence Erlbaum Assoc., 1981.
    [68]
    G. E. Hinton, "Learning distributed representations of concepts," in Proceedings of the Eighth Annual Conference of the Cognitive Science Society, pp. 1-12, Amherst: Lawrence Erlbaum, Hillsdale, 1986.
    [69]
    G. E. Hinton, "Products of experts," in Proceedings of the Ninth International Conference on Artificial Neural Networks (ICANN), vol. 1, pp. 1-6, Edinburgh, Scotland: IEE, 1999.
    [70]
    G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, vol. 14, pp. 1771-1800, 2002.
    [71]
    G. E. Hinton, "To recognize shapes, first learn to generate images," Technical Report UTML TR 2006-003, University of Toronto, 2006.
    [72]
    G. E. Hinton, P. Dayan, B. J. Frey, and R. M. Neal, "The wake-sleep algorithm for unsupervised neural networks," Science, vol. 268, pp. 1558-1161, 1995.
    [73]
    G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, pp. 1527-1554, 2006.
    [74]
    G. E. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
    [75]
    G. E. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, pp. 504-507, 2006.
    [76]
    G. E. Hinton and T. J. Sejnowski, "Learning and relearning in Boltzmann machines," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, (D. E. Rumelhart and J. L. McClelland, eds.), pp. 282-317, Cambridge, MA: MIT Press, 1986.
    [77]
    G. E. Hinton, T. J. Sejnowski, and D. H. Ackley, "Boltzmann machines: Constraint satisfaction networks that learn," Technical Report TR-CMU-CS-84-119, Carnegie-Mellon University, Dept. of Computer Science, 1984.
    [78]
    G. E. Hinton, M. Welling, Y. W. Teh, and S. Osindero, "A new view of ICA," in Proceedings of 3rd International Conference on Independent Component Analysis and Blind Signal Separation (ICA'01), pp. 746-751, San Diego, CA, 2001.
    [79]
    G. E. Hinton and R. S. Zemel, "Autoencoders, minimum description length, and Helmholtz free energy," in Advances in Neural Information Processing Systems 6 (NIPS'93), (D. Cowan, G. Tesauro, and J. Alspector, eds.), pp. 3-10, Morgan Kaufmann Publishers, Inc., 1994.
    [80]
    T. K. Ho, "Random decision forest," in 3rd International Conference on Document Analysis and Recognition (ICDAR'95), pp. 278-282, Montreal, Canada, 1995.
    [81]
    S. Hochreiter Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Institut für Informatik, Lehrstuhl Prof. Brauer, Technische Universität München, 1991.
    [82]
    H. Hotelling, "Analysis of a complex of statistical variables into principal components," Journal of Educational Psychology, vol. 24, pp. 417-441, 498-520, 1933.
    [83]
    D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex," Journal of Physiology (London), vol. 160, pp. 106-154, 1962.
    [84]
    A. Hyvärinen, "Estimation of non-normalized statistical models using score matching," Journal of Machine Learning Research, vol. 6, pp. 695-709, 2005.
    [85]
    A. Hyvärinen, "Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables," IEEE Transactions on Neural Networks, vol. 18, pp. 1529-1531, 2007.
    [86]
    A. Hyvärinen, "Some extensions of score matching," Computational Statistics and Data Analysis, vol. 51, pp. 2499-2512, 2007.
    [87]
    A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. Wiley-Interscience, May 2001.
    [88]
    N. Intrator and S. Edelman, "How to make a low-dimensional representation suitable for diverse tasks," Connection Science, Special issue on Transfer in Neural Networks, vol. 8, pp. 205-224, 1996.
    [89]
    T. Jaakkola and D. Haussler, "Exploiting generative models in discriminative classifiers," Available from http://www.cse.ucsc.edu/haussler/pubs.html, Preprint, Dept.of Computer Science, Univ. of California. A shorter version is in Advances in Neural Information Processing Systems 11, 1998.
    [90]
    N. Japkowicz, S. J. Hanson, and M. A. Gluck, "Nonlinear autoassociation is not equivalent to PCA," Neural Computation, vol. 12, no. 3, pp. 531-545, 2000.
    [91]
    M. I. Jordan, Learning in Graphical Models. Dordrecht, Netherlands: Kluwer, 1998.
    [92]
    K. Kavukcuoglu, M. Ranzato, and Y. LeCun, "Fast inference in sparse coding algorithms with applications to object recognition," Technical Report, Computational and Biological Learning Lab, Courant Institute, NYU, Technical Report CBLL-TR-2008-12-01, 2008.
    [93]
    S. Kirkpatrick, C. D. G. Jr., and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671-680, 1983.
    [94]
    U. Köster and A. Hyvärinen, "A two-layer ICA-like model estimated by score matching," in Int. Conf. Artificial Neural Networks (ICANN'2007), pp. 798- 807, 2007.
    [95]
    K. A. Krueger and P. Dayan, "Flexible shaping: How learning in small steps helps," Cognition, vol. 110, pp. 380-394, 2009.
    [96]
    G. Lanckriet, N. Cristianini, P. Bartlett, L. El Gahoui, and M. Jordan, "Learning the kernel matrix with semi-definite programming," in Proceedings of the Nineteenth International Conference on Machine Learning (ICML'02), (C. Sammut and A. G. Hoffmann, eds.), pp. 323-330, Morgan Kaufmann, 2002.
    [97]
    H. Larochelle and Y. Bengio, "Classification using discriminative restricted Boltzmann machines," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 536-543, ACM, 2008.
    [98]
    H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, "Exploring strategies for training deep neural networks," Journal of Machine Learning Research, vol. 10, pp. 1-40, 2009.
    [99]
    H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, "An empirical evaluation of deep architectures on problems with many factors of variation," in Proceedings of the Twenty-fourth International Conference on Machine Learning (ICML'07), (Z. Ghahramani, ed.), pp. 473-480, ACM, 2007.
    [100]
    J. A. Lasserre, C. M. Bishop, and T. P. Minka, "Principled hybrids of generative and discriminative models," in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR'06), pp. 87-94, Washington, DC, USA, 2006. IEEE Computer Society.
    [101]
    Y. Le Cun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
    [102]
    N. Le Roux and Y. Bengio, "Representational power of restricted boltzmann machines and deep belief networks," Neural Computation, vol. 20, no. 6, pp. 1631-1649, 2008.
    [103]
    Y. LeCun, "Modèles connexionistes de l'apprentissage," PhD thesis, Universit é de Paris VI, 1987.
    [104]
    Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541-551, 1989.
    [105]
    Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, "Efficient BackProp," in Neural Networks: Tricks of the Trade, (G. B. Orr and K.-R. Müller, eds.), pp. 9-50, Springer, 1998.
    [106]
    Y. LeCun, S. Chopra, R. M. Hadsell, M.-A. Ranzato, and F.-J. Huang, "A tutorial on energy-based learning," in Predicting Structured Data, pp. 191- 246, G. Bakir and T. Hofman and B. Scholkopf and A. Smola and B. Taskar: MIT Press, 2006.
    [107]
    Y. LeCun and F. Huang, "Loss functions for discriminative training of energy-based models," in Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS'05), (R. G. Cowell and Z. Ghahramani, eds.), 2005.
    [108]
    Y. LeCun, F.-J. Huang, and L. Bottou, "Learning methods for generic object recognition with invariance to pose and lighting," in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR'04), vol. 2, pp. 97- 104, Los Alamitos, CA, USA: IEEE Computer Society, 2004.
    [109]
    H. Lee, A. Battle, R. Raina, and A. Ng, "Efficient sparse coding algorithms," in Advances in Neural Information Processing Systems 19 (NIPS'06), (B. Schölkopf, J. Platt, and T. Hoffman, eds.), pp. 801-808, MIT Press, 2007.
    [110]
    H. Lee, C. Ekanadham, and A. Ng, "Sparse deep belief net model for visual area V2," in Advances in Neural Information Processing Systems 20 (NIPS'07), (J. Platt, D. Koller, Y. Singer, and S. P. Roweis, eds.), Cambridge, MA: MIT Press, 2008.
    [111]
    H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proceedings of the Twenty-sixth International Conference on Machine Learning (ICML'09), (L. Bottou and M. Littman, eds.), Montreal (Qc), Canada: ACM, 2009.
    [112]
    T.-S. Lee and D. Mumford, "Hierarchical bayesian inference in the visual cortex," Journal of Optical Society of America, A, vol. 20, no. 7, pp. 1434- 1448, 2003.
    [113]
    P. Lennie, "The cost of cortical computation," Current Biology, vol. 13, pp. 493-497, Mar 18 2003.
    [114]
    I. Levner, Data Driven Object Segmentation. 2008. PhD thesis, Department of Computer Science, University of Alberta.
    [115]
    M. Lewicki and T. Sejnowski, "Learning nonlinear overcomplete representations for efficient coding," in Advances in Neural Information Processing Systems 10 (NIPS'97), (M. Jordan, M. Kearns, and S. Solla, eds.), pp. 556-562, Cambridge, MA, USA: MIT Press, 1998.
    [116]
    M. S. Lewicki and T. J. Sejnowski, "Learning overcomplete representations," Neural Computation, vol. 12, no. 2, pp. 337-365, 2000.
    [117]
    M. Li and P. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications. New York, NY: Springer, Second ed., 1997.
    [118]
    P. Liang and M. I. Jordan, "An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 584-591, New York, NY, USA: ACM, 2008.
    [119]
    T. Lin, B. G. Horne, P. Tino, and C. L. Giles, "Learning long-term dependencies is not as difficult with NARX recurrent neural networks," Technical Report UMICAS-TR-95-78, Institute for Advanced Computer Studies, University of Mariland, 1995.
    [120]
    G. Loosli, S. Canu, and L. Bottou, "Training invariant support vector machines using selective sampling," in Large Scale Kernel Machines, (L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.), pp. 301-320, Cambridge, MA: MIT Press, 2007.
    [121]
    J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, "Supervised dictionary learning," in Advances in Neural Information Processing Systems 21 (NIPS'08), (D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, eds.), pp. 1033-1040, 2009. NIPS Foundation.
    [122]
    J. L. McClelland and D. E. Rumelhart, "An interactive activation model of context effects in letter perception," Psychological Review, pp. 375-407, 1981.
    [123]
    J. L. McClelland and D. E. Rumelhart, Explorations in parallel distributed processing. Cambridge: MIT Press, 1988.
    [124]
    J. L. McClelland, D. E. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2. Cambridge: MIT Press, 1986.
    [125]
    W. S. McCulloch and W. Pitts, "A logical calculus of ideas immanent in nervous activity," Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133, 1943.
    [126]
    R. Memisevic and G. E. Hinton, "Unsupervised learning of image transformations," in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR'07), 2007.
    [127]
    E. Mendelson, Introduction to Mathematical Logic, 4th ed. 1997. Chapman & Hall.
    [128]
    R. Miikkulainen and M. G. Dyer, "Natural language processing with modular PDP networks and distributed lexicon," Cognitive Science, vol. 15, pp. 343-399, 1991.
    [129]
    A. Mnih and G. E. Hinton, "Three new graphical models for statistical language modelling," in Proceedings of the Twenty-fourth International Conference on Machine Learning (ICML'07), (Z. Ghahramani, ed.), pp. 641-648, ACM, 2007.
    [130]
    A. Mnih and G. E. Hinton, "A scalable hierarchical distributed language model," in Advances in Neural Information Processing Systems 21 (NIPS'08), (D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, eds.), pp. 1081-1088, 2009.
    [131]
    H. Mobahi, R. Collobert, and J. Weston, "Deep Learning from temporal coherence in video," in Proceedings of the 26th International Conference on Machine Learning, (L. Bottou and M. Littman, eds.), pp. 737-744, Montreal: Omnipress, June 2009.
    [132]
    J. More and Z. Wu, "Smoothing techniques for macromolecular global optimization," in Nonlinear Optimization and Applications, (G. D. Pillo and F. Giannessi, eds.), Plenum Press, 1996.
    [133]
    I. Murray and R. Salakhutdinov, "Evaluating probabilities under highdimensional latent variable models," in Advances in Neural Information Processing Systems 21 (NIPS'08), vol. 21, (D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, eds.), pp. 1137-1144, 2009.
    [134]
    J. Mutch and D. G. Lowe, "Object class recognition and localization using sparse features with limited receptive fields," International Journal of Computer Vision, vol. 80, no. 1, pp. 45-57, 2008.
    [135]
    R. M. Neal, "Connectionist learning of belief networks," Artificial Intelligence, vol. 56, pp. 71-113, 1992.
    [136]
    R. M. Neal, "Bayesian learning for neural networks," PhD thesis, Department of Computer Science, University of Toronto, 1994.
    [137]
    A. Y. Ng and M. I. Jordan, "On Discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes," in Advances in Neural Information Processing Systems 14 (NIPS'01), (T. Dietterich, S. Becker, and Z. Ghahramani, eds.), pp. 841-848, 2002.
    [138]
    J. Niebles and L. Fei-Fei, "A hierarchical model of shape and appearance for human action classification," in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR'07), 2007.
    [139]
    B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1?," Vision Research, vol. 37, pp. 3311-3325, December 1997.
    [140]
    P. Orponen, "Computational complexity of neural networks: a survey," Nordic Journal of Computing, vol. 1, no. 1, pp. 94-110, 1994.
    [141]
    S. Osindero and G. E. Hinton, "Modeling image patches with a directed hierarchy of Markov random field," in Advances in Neural Information Processing Systems 20 (NIPS'07), (J. Platt, D. Koller, Y. Singer, and S. Roweis, eds.), pp. 1121-1128, Cambridge, MA: MIT Press, 2008.
    [142]
    B. Pearlmutter and L. C. Parra, "A context-sensitive generalization of ICA," in International Conference On Neural Information Processing, (L. Xu, ed.), pp. 151-157, Hong-Kong, 1996.
    [143]
    E. Pérez and L. A. Rendell, "Learning despite concept variation by finding structure in attribute-based data," in Proceedings of the Thirteenth International Conference on Machine Learning (ICML'96), (L. Saitta, ed.), pp. 391-399, Morgan Kaufmann, 1996.
    [144]
    G. B. Peterson, "A day of great illumination: B. F. Skinner's discovery of shaping," Journal of the Experimental Analysis of Behavior, vol. 82, no. 3, pp. 317-328, 2004.
    [145]
    N. Pinto, J. DiCarlo, and D. Cox, "Establishing good benchmarks and baselines for face recognition," in ECCV 2008 Faces in 'Real-Life' Images Workshop, 2008. Marseille France, Erik Learned-Miller and Andras Ferencz and Frédéric Jurie.
    [146]
    J. B. Pollack, "Recursive distributed representations," Artificial Intelligence, vol. 46, no. 1, pp. 77-105, 1990.
    [147]
    L. R. Rabiner and B. H. Juang, "An Introduction to hidden Markov models," IEEE ASSP Magazine, pp. 257-285, January 1986.
    [148]
    R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, "Self-taught learning: transfer learning from unlabeled data," in Proceedings of the Twenty-fourth International Conference on Machine Learning (ICML'07), (Z. Ghahramani, ed.), pp. 759-766, ACM, 2007.
    [149]
    M. Ranzato, Y. Boureau, S. Chopra, and Y. LeCun, "A unified energy-based framework for unsupervised learning," in Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS'07), San Juan, Porto Rico: Omnipress, 2007.
    [150]
    M. Ranzato, Y.-L. Boureau, and Y. LeCun, "Sparse feature learning for deep belief networks," in Advances in Neural Information Processing Systems 20 (NIPS'07), (J. Platt, D. Koller, Y. Singer, and S. Roweis, eds.), pp. 1185- 1192, Cambridge, MA: MIT Press, 2008.
    [151]
    M. Ranzato, F. Huang, Y. Boureau, and Y. LeCun, "Unsupervised learning of invariant feature hierarchies with applications to object recognition," in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR'07), IEEE Press, 2007.
    [152]
    M. Ranzato and Y. LeCun, "A sparse and locally shift invariant feature extractor applied to document images," in International Conference on Document Analysis and Recognition (ICDAR'07), pp. 1213-1217, Washington, DC, USA: IEEE Computer Society, 2007.
    [153]
    M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun, "Efficient learning of sparse representations with an energy-based model," in Advances in Neural Information Processing Systems 19 (NIPS'06), (B. Schölkopf, J. Platt, and T. Hoffman, eds.), pp. 1137-1144, MIT Press, 2007.
    [154]
    M. Ranzato and M. Szummer, "Semi-supervised learning of compact document representations with deep networks," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), vol. 307, (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 792-799, ACM, 2008.
    [155]
    S. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323-2326, 2000.
    [156]
    D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533-536, 1986.
    [157]
    D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1, Cambridge: MIT Press, 1986.
    [158]
    R. Salakhutdinov and G. E. Hinton, "Learning a nonlinear embedding by preserving class neighbourhood structure," in Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS'07), San Juan, Puerto Rico: Omnipress, 2007.
    [159]
    R. Salakhutdinov and G. E. Hinton, "Semantic hashing," in Proceedings of the 2007 Workshop on Information Retrieval and applications of Graphical Models (SIGIR 2007), Amsterdam: Elsevier, 2007.
    [160]
    R. Salakhutdinov and G. E. Hinton, "Using deep belief nets to learn covariance kernels for Gaussian processes," in Advances in Neural Information Processing Systems 20 (NIPS'07), (J. Platt, D. Koller, Y. Singer, and S. Roweis, eds.), pp. 1249-1256, Cambridge, MA: MIT Press, 2008.
    [161]
    R. Salakhutdinov and G. E. Hinton, "Deep Boltzmann machines," in Proceedings of The Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS'09), vol. 5, pp. 448-455, 2009.
    [162]
    R. Salakhutdinov, A. Mnih, and G. E. Hinton, "Restricted Boltzmann machines for collaborative filtering," in Proceedings of the Twenty-fourth International Conference on Machine Learning (ICML'07), (Z. Ghahramani, ed.), pp. 791-798, New York, NY, USA: ACM, 2007.
    [163]
    R. Salakhutdinov and I. Murray, "On the quantitative analysis of deep belief networks," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 872-879, ACM, 2008.
    [164]
    L. K. Saul, T. Jaakkola, and M. I. Jordan, "Mean field theory for sigmoid belief networks," Journal of Artificial Intelligence Research, vol. 4, pp. 61-76, 1996.
    [165]
    M. Schmitt, "Descartes' rule of signs for radial basis function neural networks," Neural Computation, vol. 14, no. 12, pp. 2997-3011, 2002.
    [166]
    B. Schölkopf, C. J. C. Burges, and A. J. Smola, Advances in Kernel Methods -- Support Vector Learning. Cambridge, MA: MIT Press, 1999.
    [167]
    B. Schölkopf, S. Mika, C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. Smola, "Input space versus feature space in kernel-based methods," IEEE Trans. Neural Networks, vol. 10, no. 5, pp. 1000-1017, 1999.
    [168]
    B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, pp. 1299-1319, 1998.
    [169]
    H. Schwenk, "Efficient training of large neural networks for language modeling," in International Joint Conference on Neural Networks (IJCNN), pp. 3050-3064, 2004.
    [170]
    H. Schwenk and J.-L. Gauvain, "Connectionist language modeling for large vocabulary continuous speech recognition," in International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 765-768, Orlando, Florida, 2002.
    [171]
    H. Schwenk and J.-L. Gauvain, "Building continuous space language models for transcribing European languages," in Interspeech, pp. 737-740, 2005.
    [172]
    H. Schwenk and M. Milgram, "Transformation invariant autoassociation with application to handwritten character recognition," in Advances in Neural Information Processing Systems 7 (NIPS'94), (G. Tesauro, D. Touretzky, and T. Leen, eds.), pp. 991-998, MIT Press, 1995.
    [173]
    T. Serre, G. Kreiman, M. Kouh, C. Cadieu, U. Knoblich, and T. Poggio, "A quantitative theory of immediate visual recognition," Progress in Brain Research, Computational Neuroscience: Theoretical Insights into Brain Function, vol. 165, pp. 33-56, 2007.
    [174]
    S. H. Seung, "Learning continuous attractors in recurrent networks," in Advances in Neural Information Processing Systems 10 (NIPS'97), (M. Jordan, M. Kearns, and S. Solla, eds.), pp. 654-660, MIT Press, 1998.
    [175]
    P. Y. Simard, D. Steinkraus, and J. C. Platt, "Best practices for convolutional neural networks," in International Conference on Document Analysis and Recognition (ICDAR'03), p. 958, Washington, DC, USA: IEEE Computer Society, 2003.
    [176]
    P. Y. Simard, Y. LeCun, and J. Denker, "Efficient pattern recognition using a new transformation distance," in Advances in Neural Information Processing Systems 5 (NIPS'92), (C. Giles, S. Hanson, and J. Cowan, eds.), pp. 50-58, Morgan Kaufmann, San Mateo, 1993.
    [177]
    B. F. Skinner, "Reinforcement today," American Psychologist, vol. 13, pp. 94-99, 1958.
    [178]
    P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory," in Parallel Distributed Processing, vol. 1, (D. E. Rumelhart and J. L. McClelland, eds.), pp. 194-281, Cambridge: MIT Press, 1986. ch. 6.
    [179]
    E. B. Sudderth, A. Torralba, W. T. Freeman, and A. S. Willsky, "Describing visual scenes using transformed objects and parts," International Journal of Computer Vision, vol. 77, pp. 291-330, 2007.
    [180]
    I. Sutskever and G. E. Hinton, "Learning multilevel distributed representations for high-dimensional sequences," in Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS'07), San Juan, Puerto Rico: Omnipress, 2007.
    [181]
    R. Sutton and A. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
    [182]
    G. Taylor and G. Hinton, "Factored conditional restricted Boltzmann machines for modeling motion style," in Proceedings of the 26th International Conference on Machine Learning (ICML'09), (L. Bottou and M. Littman, eds.), pp. 1025-1032, Montreal: Omnipress, June 2009.
    [183]
    G. Taylor, G. E. Hinton, and S. Roweis, "Modeling human motion using binary latent variables," in Advances in Neural Information Processing Systems 19 (NIPS'06), (B. Schölkopf, J. Platt, and T. Hoffman, eds.), pp. 1345-1352, Cambridge, MA: MIT Press, 2007.
    [184]
    Y. Teh, M. Welling, S. Osindero, and G. E. Hinton, "Energy-based models for sparse overcomplete representations," Journal of Machine Learning Research, vol. 4, pp. 1235-1260, 2003.
    [185]
    J. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319-2323, 2000.
    [186]
    S. Thrun, "Is learning the n-th thing any easier than learning the first?," in Advances in Neural Information Processing Systems 8 (NIPS'95), (D. Touretzky, M. Mozer, and M. Hasselmo, eds.), pp. 640-646, Cambridge, MA: MIT Press, 1996.
    [187]
    T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 1064-1071, ACM, 2008.
    [188]
    T. Tieleman and G. Hinton, "Using fast weights to improve persistent contrastive divergence," in Proceedings of the Twenty-sixth International Conference on Machine Learning (ICML'09), (L. Bottou and M. Littman, eds.), pp. 1033-1040, New York, NY, USA: ACM, 2009.
    [189]
    I. Titov and J. Henderson, "Constituent parsing with incremental sigmoid belief networks," in Proc. 45th Meeting of Association for Computational Linguistics (ACL'07), pp. 632-639, Prague, Czech Republic, 2007.
    [190]
    A. Torralba, R. Fergus, and Y. Weiss, "Small codes and large databases for recognition," in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR'08), pp. 1-8, 2008.
    [191]
    P. E. Utgoff and D. J. Stracuzzi, "Many-layered learning," Neural Computation, vol. 14, pp. 2497-2539, 2002.
    [192]
    L. van der Maaten and G. E. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579-2605, November 2008.
    [193]
    V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer, 1995.
    [194]
    R. Vilalta, G. Blix, and L. Rendell, "Global data analysis and the fragmentation problem in decision tree induction," in Proceedings of the 9th European Conference on Machine Learning (ECML'97), pp. 312-327, Springer-Verlag, 1997.
    [195]
    P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 1096-1103, ACM, 2008.
    [196]
    L. Wang and K. L. Chan, "Learning kernel parameters by using class separability measure," 6th kernel machines workshop, in conjunction with Neural Information Processing Systems (NIPS), 2002.
    [197]
    M. Weber, M. Welling, and P. Perona, "Unsupervised learning of models for recognition," in Proc. 6th Europ. Conf. Comp. Vis., ECCV2000, pp. 18-32, Dublin, 2000.
    [198]
    I. Wegener, The Complexity of Boolean Functions. John Wiley & Sons, 1987.
    [199]
    Y. Weiss, "Segmentation using eigenvectors: a unifying view," in Proceedings IEEE International Conference on Computer Vision (ICCV'99), pp. 975-982, 1999.
    [200]
    M. Welling, M. Rosen-Zvi, and G. E. Hinton, "Exponential family harmoniums with an application to information retrieval," in Advances in Neural Information Processing Systems 17 (NIPS'04), (L. Saul, Y. Weiss, and L. Bottou, eds.), pp. 1481-1488, Cambridge, MA: MIT Press, 2005.
    [201]
    M. Welling, R. Zemel, and G. E. Hinton, "Self-supervised boosting," in Advances in Neural Information Processing Systems 15 (NIPS'02), (S. Becker, S. Thrun, and K. Obermayer, eds.), pp. 665-672, MIT Press, 2003.
    [202]
    J. Weston, F. Ratle, and R. Collobert, "Deep learning via semi-supervised embedding," in Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML'08), (W. W. Cohen, A. McCallum, and S. T. Roweis, eds.), pp. 1168-1175, New York, NY, USA: ACM, 2008.
    [203]
    C. K. I. Williams and C. E. Rasmussen, "Gaussian processes for regression," in Advances in neural information processing systems 8 (NIPS'95), (D. Touretzky, M. Mozer, and M. Hasselmo, eds.), pp. 514-520, Cambridge, MA: MIT Press, 1996.
    [204]
    L. Wiskott and T. J. Sejnowski, "Slow feature analysis: Unsupervised learning of invariances," Neural Computation, vol. 14, no. 4, pp. 715-770, 2002.
    [205]
    D. H. Wolpert, "Stacked generalization," Neural Networks, vol. 5, pp. 241-249, 1992.
    [206]
    Z. Wu, "Global continuation for distance geometry problems," SIAM Journal of Optimization, vol. 7, pp. 814-836, 1997.
    [207]
    P. Xu, A. Emami, and F. Jelinek, "Training connectionist models for the structured language model," in Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP'2003), vol. 10, pp. 160-167, 2003.
    [208]
    A. Yao, "Separating the polynomial-time hierarchy by oracles," in Proceedings of the 26th Annual IEEE Symposium on Foundations of Computer Science, pp. 1-10, 1985.
    [209]
    D. Zhou, O. Bousquet, T. Navin Lal, J. Weston, and B. Schölkopf, "Learning with local and global consistency," in Advances in Neural Information Processing Systems 16 (NIPS'03), (S. Thrun, L. Saul, and B. Schölkopf, eds.), pp. 321-328, Cambridge, MA: MIT Press, 2004.
    [210]
    X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in Proceedings of the Twentieth International Conference on Machine Learning (ICML'03), (T. Fawcett and N. Mishra, eds.), pp. 912-919, AAAI Press, 2003.
    [211]
    M. Zinkevich, "Online convex programming and generalized infinitesimal gradient ascent," in Proceedings of the Twentieth International Conference on Machine Learning (ICML'03), (T. Fawcett and N. Mishra, eds.), pp. 928-936, AAAI Press, 2003.



      Published In

      Foundations and Trends® in Machine Learning, Volume 2, Issue 1, January 2009, 130 pages
      ISSN: 1935-8237; EISSN: 1935-8245

      Publisher

      Now Publishers Inc., Hanover, MA, United States
