Abstract
The increasing use of domain-specific computing hardware and architectures has led to growing demand for unconventional computing approaches. One such approach is the Ising machine, which is designed to solve combinatorial optimization problems. Here we show that a probabilistic-bit (p-bit)-based Ising machine can be used to train deep Boltzmann networks. Using hardware-aware network topologies on field-programmable gate arrays (FPGAs), we train the full Modified National Institute of Standards and Technology (MNIST) and Fashion MNIST datasets without downsampling, as well as a reduced version of the Canadian Institute for Advanced Research, 10 classes (CIFAR-10) dataset. For the MNIST dataset, our machine, which has 4,264 nodes (p-bits) and about 30,000 parameters, achieves the same classification accuracy (90%) as an optimized software-based restricted Boltzmann machine with approximately 3.25 million parameters. Similar results are achieved for the Fashion MNIST and CIFAR-10 datasets. The sparse deep Boltzmann network can also generate new handwritten digits and fashion products, a task at which the software-based restricted Boltzmann machine fails. Our hybrid computer performs a measured 50 to 64 billion probabilistic flips per second and can run the contrastive divergence algorithm (CD-n) with up to n = 10 million sweeps per update, which is beyond the capabilities of existing software implementations.
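For readers unfamiliar with p-bit dynamics, the following is a minimal software sketch of the standard bipolar p-bit update rule, m_i = sgn(tanh(beta * I_i) - r) with r drawn uniformly from [-1, 1], which is the stochastic building block the FPGA evaluates massively in parallel. This is an illustrative aid, not the authors' implementation: the function name `pbit_sweep`, the dense-matrix representation of the couplings and the sequential update order are our assumptions for readability.

```python
import numpy as np

def pbit_sweep(m, J, h, beta=1.0, rng=None):
    """One sequential Gibbs sweep over all p-bits (illustrative sketch).

    m    : bipolar p-bit states in {-1.0, +1.0}, updated in place
    J, h : symmetric coupling matrix (zero diagonal) and bias vector
    """
    rng = rng or np.random.default_rng()
    for i in rng.permutation(len(m)):
        I_i = J[i] @ m + h[i]  # local synaptic input to p-bit i
        # Standard p-bit update: m_i = sgn(tanh(beta * I_i) - r), r ~ U(-1, 1)
        m[i] = np.sign(np.tanh(beta * I_i) - rng.uniform(-1.0, 1.0))
    return m
```

In hardware, p-bits that share no direct connection can be updated simultaneously (for example, by graph coloring of the sparse topology), which is how sampling rates on the order of the reported 50 to 64 billion flips per second become possible; the sequential loop above is only for clarity.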
Data availability
The data that support the plots within this paper and other findings of this study are available from the corresponding authors upon reasonable request.
Code availability
The computer code used in this study is available from the corresponding authors upon reasonable request.
Acknowledgements
We gratefully acknowledge discussions with J. Kaiser. We thank the AMD (Xilinx) University Programme (XUP) for the FPGA development boards and G. Eschemann for useful discussions on airhdl. This work is partially supported by an Office of Naval Research Young Investigator Program grant and a National Science Foundation CCF 2106260 grant. Part of this material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under the Air Force Research Laboratory (AFRL) Agreement No. FA8650-23-3-7313. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the US Government.
Author information
Contributions
S.N. and K.Y.C. conceived the study. K.Y.C. supervised the study. S.N. and N.A.A. developed the hybrid FPGA–CPU implementation. S.N. and S.C. performed the benchmark RBM training. S.N. and N.A.A. performed the FPGA experiments to train sparse DBMs. S.N., N.A.A., M.M., S.C., Y.Q. and K.Y.C. discussed and analysed the experiments and participated in writing the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Electronics thanks Suhas Kumar and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Training sparse DBMs.
Pseudocode of the algorithm used in this work to train sparse DBMs.
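The authors' actual pseudocode is given in Extended Data Fig. 1. As a rough sketch only, the following shows how one CD-n parameter update for a sparse Boltzmann network could look in software, using the standard Boltzmann learning rule (change in J_ij proportional to the difference between clamped and free pairwise correlations) and the `pbit_sweep` helper sketched above. The clamping scheme, single-sample correlation estimates, masking and hyperparameters here are our assumptions, not the paper's method.

```python
import numpy as np

def cd_n_update(J, h, v_data, clamp_idx, n_sweeps=10, lr=0.01, rng=None):
    """One CD-n update for a sparse Boltzmann network (illustrative sketch).

    Positive phase: Gibbs-sample with visible p-bits clamped to a data
    vector. Negative phase: unclamp and run n more sweeps (the Ising
    machine's role; the paper reports up to n = 10 million sweeps).
    Assumes absent couplings are stored as exact zeros in J.
    """
    rng = rng or np.random.default_rng()
    m = rng.choice([-1.0, 1.0], size=len(h))  # random initial state
    m[clamp_idx] = v_data                     # clamp visibles to data

    for _ in range(n_sweeps):                 # positive (clamped) phase
        m = pbit_sweep(m, J, h, rng=rng)
        m[clamp_idx] = v_data                 # re-apply clamp each sweep
    pos_corr, pos_mean = np.outer(m, m), m.copy()

    for _ in range(n_sweeps):                 # negative (free) phase
        m = pbit_sweep(m, J, h, rng=rng)
    neg_corr, neg_mean = np.outer(m, m), m.copy()

    topology = J != 0                         # never create new edges
    J += lr * topology * (pos_corr - neg_corr)
    h += lr * (pos_mean - neg_mean)
    return J, h
```

In practice, the correlation estimates would be averaged over a minibatch of data vectors, and in the hybrid scheme the paper describes, the sweeps run on the FPGA while a CPU applies the weight updates.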
Supplementary information
Supplementary Information
Supplementary Discussion, Figs. 1–13 and Tables 1–3.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Niazi, S., Chowdhury, S., Aadit, N. A. et al. Training deep Boltzmann networks with sparse Ising machines. Nat. Electron. 7, 610–619 (2024). https://doi.org/10.1038/s41928-024-01182-4