DOI: 10.1145/3313276.3316372
Research article | Open access

Learning restricted Boltzmann machines via influence maximization

Published: 23 June 2019

Abstract

Graphical models are a rich language for describing high-dimensional distributions in terms of their dependence structure. While there are algorithms with provable guarantees for learning undirected graphical models in a variety of settings, there has been much less progress in the important scenario when there are latent variables. Here we study Restricted Boltzmann Machines (or RBMs), which are a popular model with wide-ranging applications in dimensionality reduction, collaborative filtering, topic modeling, feature extraction and deep learning.
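For orientation, here is a minimal sketch in our own notation and NumPy (illustrative only, not taken from the paper): an RBM is a bipartite Ising-type model in which ±1 visible units and ±1 hidden units interact only across the bipartition, so each layer is conditionally independent given the other and can be resampled in one block Gibbs step.

    # Minimal RBM sketch (illustrative; names and implementation are ours).
    # Visible units v and hidden units h take values in {-1, +1}; the joint
    # distribution is proportional to exp(v^T W h + b.v + c.h).
    # "Ferromagnetic" corresponds to all interaction weights being nonnegative.
    import numpy as np

    rng = np.random.default_rng(0)

    def gibbs_step(v, W, b, c):
        # P(h_j = +1 | v) = sigmoid(2 * ((v @ W)_j + c_j)), independently over j.
        p_h = 1.0 / (1.0 + np.exp(-2.0 * (v @ W + c)))
        h = np.where(rng.random(c.size) < p_h, 1.0, -1.0)
        # P(v_i = +1 | h) = sigmoid(2 * ((W @ h)_i + b_i)), by symmetry.
        p_v = 1.0 / (1.0 + np.exp(-2.0 * (W @ h + b)))
        v = np.where(rng.random(b.size) < p_v, 1.0, -1.0)
        return v, h

Running many such steps from a random start gives approximate samples of the visible layer, which is the only data the learning problem gets to see.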
The main message of our paper is a strong dichotomy in the feasibility of learning RBMs, depending on the nature of the interactions between variables: ferromagnetic models can be learned efficiently, while general models cannot. In particular, we give a simple greedy algorithm based on influence maximization to learn ferromagnetic RBMs with bounded degree. In fact, we learn a description of the distribution on the observed variables as a Markov Random Field. Our analysis is based on tools from mathematical physics that were developed to show the concavity of magnetization. Our algorithm extends straightforwardly to general ferromagnetic Ising models with latent variables.
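To convey the flavor of the greedy approach, the sketch below uses our own simplifications; the influence proxy and every name in it are hypothetical, not the paper's actual algorithm or influence functional. The idea it illustrates: grow a candidate neighborhood for each observed variable by repeatedly adding the variable whose conditioning most increases an estimated influence, and stop once the marginal gain is small. Ferromagneticity is what makes greedy choices of this kind reliable, via concavity and submodularity-type properties of the sort referenced above.

    # Greedy neighborhood selection in the spirit of influence maximization.
    # Illustrative sketch only: the influence proxy (a conditional mean
    # estimated from samples) and all names here are ours, not the paper's.
    import numpy as np

    def influence(samples, u, S):
        # Estimate E[X_u | X_v = +1 for all v in S] from rows of +/-1 samples
        # over the observed variables.
        if not S:
            return samples[:, u].mean()
        rows = np.all(samples[:, sorted(S)] == 1, axis=1)
        return samples[rows, u].mean() if rows.any() else 0.0

    def greedy_neighborhood(samples, u, max_size, eps):
        # Add the candidate with the largest marginal gain in influence on u;
        # stop when the gain drops below eps or the size cap is reached.
        S = set()
        current = influence(samples, u, S)
        while len(S) < max_size:
            gains = {v: influence(samples, u, S | {v}) - current
                     for v in range(samples.shape[1]) if v != u and v not in S}
            if not gains:
                break
            best = max(gains, key=gains.get)
            if gains[best] < eps:
                break
            S.add(best)
            current += gains[best]
        return S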
Conversely, we show that even for a constant number of latent variables with constant degree, without ferromagneticity the problem is as hard as sparse parity with noise. This hardness result is based on a sharp and surprising characterization of the representational power of bounded degree RBMs: the distribution on their observed variables can simulate any bounded order MRF. This result is of independent interest since RBMs are the building blocks of deep belief networks.
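The mechanism behind this simulation can be glimpsed from a standard calculation (in our notation; the paper's construction is sharper and more delicate): marginalizing out one hidden unit attached to a set $S$ of observed spins leaves a log-cosh potential in those spins,
\[
\sum_{y \in \{\pm 1\}} \exp\!\Big( y \Big( \sum_{i \in S} w_i x_i + c \Big) \Big)
  = 2 \cosh\!\Big( \sum_{i \in S} w_i x_i + c \Big),
\]
so the marginal distribution of the observed spins acquires the potential $\log \cosh\big(\sum_{i \in S} w_i x_i + c\big)$. Over $\{\pm 1\}$-valued variables this is a polynomial in the $x_i$ of degree at most $|S|$, i.e. an MRF interaction whose order is bounded by the degree of the hidden unit; the characterization above runs this correspondence in the reverse direction, showing that bounded degree RBMs can encode any bounded order MRF.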


Published In

STOC 2019: Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing
June 2019
1258 pages
ISBN:9781450367059
DOI:10.1145/3313276

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Graphical models
  2. Restricted Boltzmann Machines
  3. submodularity
  4. unsupervised learning

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

Article Metrics

  • Downloads (last 12 months): 119
  • Downloads (last 6 weeks): 17
Reflects downloads up to 01 Jan 2025

Cited By

  • (2024) A Unified Approach to Learning Ising Models: Beyond Independence and Bounded Width. Proceedings of the 56th Annual ACM Symposium on Theory of Computing, 503-514. DOI: 10.1145/3618260.3649674
  • (2022) Mean estimation in high-dimensional binary Markov Gaussian mixture models. Proceedings of the 36th International Conference on Neural Information Processing Systems, 19673-19686. DOI: 10.5555/3600270.3601700
  • (2022) Graph Learning Over Partially Observed Diffusion Networks: Role of Degree Concentration. IEEE Open Journal of Signal Processing 3, 335-371. DOI: 10.1109/OJSP.2022.3189315
  • (2022) Chow-Liu++: Optimal Prediction-Centric Learning of Tree Ising Models. 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), 417-426. DOI: 10.1109/FOCS52979.2021.00049
  • (2021) Learning latent causal graphs via mixture oracles. Proceedings of the 35th International Conference on Neural Information Processing Systems, 18087-18101. DOI: 10.5555/3540261.3541645
  • (2021) Submodular + concave. Proceedings of the 35th International Conference on Neural Information Processing Systems, 11577-11591. DOI: 10.5555/3540261.3541146
  • (2020) Learning restricted Boltzmann machines with sparse latent variables. Proceedings of the 34th International Conference on Neural Information Processing Systems, 7020-7030. DOI: 10.5555/3495724.3496313
  • (2020) From Boltzmann machines to neural networks and back again. Proceedings of the 34th International Conference on Neural Information Processing Systems, 6354-6365. DOI: 10.5555/3495724.3496257
  • (2020) Graph Learning Under Partial Observability. Proceedings of the IEEE 108(11), 2049-2066. DOI: 10.1109/JPROC.2020.3013432
