DOI: 10.1145/3313276.3316372
Research article | Open access

Learning restricted Boltzmann machines via influence maximization

Published: 23 June 2019

Abstract

Graphical models are a rich language for describing high-dimensional distributions in terms of their dependence structure. While there are algorithms with provable guarantees for learning undirected graphical models in a variety of settings, there has been much less progress in the important scenario when there are latent variables. Here we study Restricted Boltzmann Machines (or RBMs), which are a popular model with wide-ranging applications in dimensionality reduction, collaborative filtering, topic modeling, feature extraction and deep learning.
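For orientation, here is a minimal sketch in our own notation and NumPy (illustrative only, not taken from the paper): an RBM is a bipartite Ising-type model in which ±1 visible units and ±1 hidden units interact only across the bipartition, so each layer is conditionally independent given the other and can be resampled in one block Gibbs step.

    # Minimal RBM sketch (illustrative; names and implementation are ours).
    # Visible units v and hidden units h take values in {-1, +1}; the joint
    # distribution is proportional to exp(v^T W h + b.v + c.h).
    # "Ferromagnetic" corresponds to all interaction weights being nonnegative.
    import numpy as np

    rng = np.random.default_rng(0)

    def gibbs_step(v, W, b, c):
        # P(h_j = +1 | v) = sigmoid(2 * ((v @ W)_j + c_j)), independently over j.
        p_h = 1.0 / (1.0 + np.exp(-2.0 * (v @ W + c)))
        h = np.where(rng.random(c.size) < p_h, 1.0, -1.0)
        # P(v_i = +1 | h) = sigmoid(2 * ((W @ h)_i + b_i)), by symmetry.
        p_v = 1.0 / (1.0 + np.exp(-2.0 * (W @ h + b)))
        v = np.where(rng.random(b.size) < p_v, 1.0, -1.0)
        return v, h

Running many such steps from a random start gives approximate samples of the visible layer, which is the only data the learning problem gets to see.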
The main message of our paper is a strong dichotomy in the feasibility of learning RBMs, depending on the nature of the interactions between variables: ferromagnetic models can be learned efficiently, while general models cannot. In particular, we give a simple greedy algorithm based on influence maximization to learn ferromagnetic RBMs with bounded degree. In fact, we learn a description of the distribution on the observed variables as a Markov Random Field. Our analysis is based on tools from mathematical physics that were developed to show the concavity of magnetization. Our algorithm extends straightforwardly to general ferromagnetic Ising models with latent variables.
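To convey the flavor of the greedy approach, the sketch below uses our own simplifications; the influence proxy and every name in it are hypothetical, not the paper's actual algorithm or influence functional. The idea it illustrates: grow a candidate neighborhood for each observed variable by repeatedly adding the variable whose conditioning most increases an estimated influence, and stop once the marginal gain is small. Ferromagneticity is what makes greedy choices of this kind reliable, via concavity and submodularity-type properties of the sort referenced above.

    # Greedy neighborhood selection in the spirit of influence maximization.
    # Illustrative sketch only: the influence proxy (a conditional mean
    # estimated from samples) and all names here are ours, not the paper's.
    import numpy as np

    def influence(samples, u, S):
        # Estimate E[X_u | X_v = +1 for all v in S] from rows of +/-1 samples
        # over the observed variables.
        if not S:
            return samples[:, u].mean()
        rows = np.all(samples[:, sorted(S)] == 1, axis=1)
        return samples[rows, u].mean() if rows.any() else 0.0

    def greedy_neighborhood(samples, u, max_size, eps):
        # Add the candidate with the largest marginal gain in influence on u;
        # stop when the gain drops below eps or the size cap is reached.
        S = set()
        current = influence(samples, u, S)
        while len(S) < max_size:
            gains = {v: influence(samples, u, S | {v}) - current
                     for v in range(samples.shape[1]) if v != u and v not in S}
            if not gains:
                break
            best = max(gains, key=gains.get)
            if gains[best] < eps:
                break
            S.add(best)
            current += gains[best]
        return S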
Conversely, we show that even for a constant number of latent variables with constant degree, without ferromagneticity the problem is as hard as sparse parity with noise. This hardness result is based on a sharp and surprising characterization of the representational power of bounded degree RBMs: the distribution on their observed variables can simulate any bounded order MRF. This result is of independent interest since RBMs are the building blocks of deep belief networks.
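The mechanism behind this simulation can be glimpsed from a standard calculation (in our notation; the paper's construction is sharper and more delicate): marginalizing out one hidden unit attached to a set $S$ of observed spins leaves a log-cosh potential in those spins,
\[
\sum_{y \in \{\pm 1\}} \exp\!\Big( y \Big( \sum_{i \in S} w_i x_i + c \Big) \Big)
  = 2 \cosh\!\Big( \sum_{i \in S} w_i x_i + c \Big),
\]
so the marginal distribution of the observed spins acquires the potential $\log \cosh\big(\sum_{i \in S} w_i x_i + c\big)$. Over $\{\pm 1\}$-valued variables this is a polynomial in the $x_i$ of degree at most $|S|$, i.e. an MRF interaction whose order is bounded by the degree of the hidden unit; the characterization above runs this correspondence in the reverse direction, showing that bounded degree RBMs can encode any bounded order MRF.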


Published In

STOC 2019: Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing
June 2019
1258 pages
ISBN:9781450367059
DOI:10.1145/3313276

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Graphical models
  2. Restricted Boltzmann Machines
  3. submodularity
  4. unsupervised learning

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

Article Metrics

  • Downloads (last 12 months): 119
  • Downloads (last 6 weeks): 17
Reflects downloads up to 01 Jan 2025

Cited By

  • (2024) A Unified Approach to Learning Ising Models: Beyond Independence and Bounded Width. Proceedings of the 56th Annual ACM Symposium on Theory of Computing, 503-514. DOI: 10.1145/3618260.3649674
  • (2022) Mean estimation in high-dimensional binary Markov Gaussian mixture models. Proceedings of the 36th International Conference on Neural Information Processing Systems, 19673-19686. DOI: 10.5555/3600270.3601700
  • (2022) Graph Learning Over Partially Observed Diffusion Networks: Role of Degree Concentration. IEEE Open Journal of Signal Processing 3, 335-371. DOI: 10.1109/OJSP.2022.3189315
  • (2022) Chow-Liu++: Optimal Prediction-Centric Learning of Tree Ising Models. 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), 417-426. DOI: 10.1109/FOCS52979.2021.00049
  • (2021) Learning latent causal graphs via mixture oracles. Proceedings of the 35th International Conference on Neural Information Processing Systems, 18087-18101. DOI: 10.5555/3540261.3541645
  • (2021) Submodular + concave. Proceedings of the 35th International Conference on Neural Information Processing Systems, 11577-11591. DOI: 10.5555/3540261.3541146
  • (2020) Learning restricted Boltzmann machines with sparse latent variables. Proceedings of the 34th International Conference on Neural Information Processing Systems, 7020-7030. DOI: 10.5555/3495724.3496313
  • (2020) From Boltzmann machines to neural networks and back again. Proceedings of the 34th International Conference on Neural Information Processing Systems, 6354-6365. DOI: 10.5555/3495724.3496257
  • (2020) Graph Learning Under Partial Observability. Proceedings of the IEEE 108(11), 2049-2066. DOI: 10.1109/JPROC.2020.3013432
