

Training deep Boltzmann networks with sparse Ising machines

Abstract

The increasing use of domain-specific computing hardware and architectures has led to a growing demand for unconventional computing approaches. One such approach is the Ising machine, which is designed to solve combinatorial optimization problems. Here we show that a probabilistic-bit (p-bit)-based Ising machine can be used to train deep Boltzmann networks. Using hardware-aware network topologies on field-programmable gate arrays, we train the full Modified National Institute of Standards and Technology (MNIST) and Fashion MNIST datasets without downsampling, as well as a reduced version of the Canadian Institute for Advanced Research, 10 classes (CIFAR-10) dataset. For the MNIST dataset, our machine, which has 4,264 nodes (p-bits) and about 30,000 parameters, achieves the same classification accuracy (90%) as an optimized software-based restricted Boltzmann machine with approximately 3.25 million parameters. Similar results are achieved for the Fashion MNIST and CIFAR-10 datasets. The sparse deep Boltzmann network can also generate new handwritten digits and fashion products, a task at which the software-based restricted Boltzmann machine fails. Our hybrid computer performs a measured 50 to 64 billion probabilistic flips per second and can perform the contrastive divergence algorithm (CD-n) with up to n = 10 million sweeps per update, which is beyond the capabilities of existing software implementations.
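
The sampling primitive behind such a p-bit Ising machine can be illustrated in a few lines. The Python sketch below is a minimal, illustrative Gibbs sampler for bipolar p-bits on a sparse graph; it is a software toy, not the paper's FPGA design, and the chain topology, sizes and names (n_bits, J, h, sweep) are assumptions made for the example.

```python
# Minimal, illustrative p-bit Gibbs sampler on a sparse Ising graph.
# This is a software sketch, not the paper's FPGA implementation; the
# chain topology and all names (n_bits, J, h, sweep) are assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_bits = 16                                # toy number of p-bits
J = np.zeros((n_bits, n_bits))             # symmetric couplings
for i in range(n_bits - 1):                # simple sparse chain topology
    J[i, i + 1] = J[i + 1, i] = 0.5
h = np.zeros(n_bits)                       # biases
m = rng.choice([-1, 1], size=n_bits).astype(float)

def sweep(m, beta=1.0):
    """One Gibbs sweep: resample each p-bit from its local field."""
    for i in range(n_bits):
        I = beta * (J[i] @ m + h[i])       # local input to p-bit i
        # p-bit update rule: m_i = sgn(tanh(I_i) - r), with r ~ U(-1, 1)
        m[i] = 1.0 if np.tanh(I) > rng.uniform(-1, 1) else -1.0
    return m

for _ in range(1_000):                     # repeated sweeps approach equilibrium
    m = sweep(m)
print(m)
```

With enough sweeps, the states m are distributed according to the Boltzmann distribution of the Ising energy E(m) = -Σ_{i<j} J_ij m_i m_j - Σ_i h_i m_i, which is the statistical workhorse that both the sampling and the learning phases of a hybrid scheme like this rely on.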

Fig. 1: Hybrid computing scheme for machine learning with sparse deep BMs.
Fig. 2: MNIST dataset accuracy with sparse DBMs and RBMs.
Fig. 3: Image generation with sparse DBM and RBM.
Fig. 4: Mixing time analysis.
Fig. 5: Randomization of indices for accuracy improvement.
Fig. 6: Architecture of p-computer.

Data availability

The data that support the plots within this paper and other findings of this study are available from the corresponding authors upon reasonable request.

Code availability

The computer code used in this study is available from the corresponding authors upon reasonable request.

Acknowledgements

We gratefully acknowledge discussions with J. Kaiser. We thank the AMD (Xilinx) University Programme (XUP) for the FPGA development boards and G. Eschemann for useful discussions on airhdl. This work is partially supported by an Office of Naval Research Young Investigator Program grant and a National Science Foundation CCF 2106260 grant. Part of this material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under the Air Force Research Laboratory (AFRL) Agreement No. FA8650-23-3-7313. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the US Government.

Author information

Contributions

S.N. and K.Y.C. conceived the study. K.Y.C. supervised the study. S.N. and N.A.A. developed the hybrid FPGA–CPU implementation. S.N. and S.C. performed the benchmark RBM training. S.N. and N.A.A. performed the FPGA experiments to train sparse DBMs. S.N., N.A.A., M.M., S.C., Y.Q. and K.Y.C. discussed and analysed the experiments and participated in writing the paper.

Corresponding authors

Correspondence to Shaila Niazi or Kerem Y. Camsari.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Electronics thanks Suhas Kumar and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Training sparse DBMs.

A pseudocode of the algorithm used in this work to train sparse DBMs.
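
The pseudocode itself appears only in the extended data and is not reproduced on this page. As a hedged stand-in, the Python sketch below shows a generic contrastive-divergence (CD-n) update for a small bipolar Boltzmann machine, in the spirit of the training loop described in the abstract; the layer sizes, initialization and hyperparameters (n_vis, n_hid, lr, n_sweeps) are illustrative assumptions, not the authors' settings.

```python
# Hedged sketch of generic CD-n training for a small bipolar Boltzmann
# machine: a clamped (data) phase minus a free (model) phase. This is not
# the authors' sparse-DBM FPGA pipeline; all sizes and hyperparameters
# below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_vis, n_hid = 8, 4                        # toy layer sizes (assumption)
n = n_vis + n_hid
W = 0.01 * rng.standard_normal((n, n))     # dense here; sparse in the paper
W = (W + W.T) / 2                          # symmetric couplings
np.fill_diagonal(W, 0.0)                   # no self-coupling
b = np.zeros(n)

def gibbs(state, clamp_vis=None, sweeps=10, beta=1.0):
    """Run `sweeps` full Gibbs sweeps; optionally clamp the visible units."""
    s = state.astype(float).copy()
    for _ in range(sweeps):
        for i in range(n):
            if clamp_vis is not None and i < n_vis:
                s[i] = clamp_vis[i]        # visible units held at the data
                continue
            I = beta * (W[i] @ s + b[i])   # local field on unit i
            s[i] = 1.0 if np.tanh(I) > rng.uniform(-1, 1) else -1.0
    return s

lr, n_sweeps = 0.05, 10                    # learning rate and the n in CD-n
v_data = rng.choice([-1, 1], size=n_vis)   # stand-in for one training image
for step in range(100):
    s0 = rng.choice([-1, 1], size=n)
    s_pos = gibbs(s0, clamp_vis=v_data, sweeps=n_sweeps)   # data phase
    s_neg = gibbs(s_pos, clamp_vis=None, sweeps=n_sweeps)  # model phase
    # single-sample estimate of <s_i s_j>_data - <s_i s_j>_model
    W += lr * (np.outer(s_pos, s_pos) - np.outer(s_neg, s_neg))
    np.fill_diagonal(W, 0.0)
    b += lr * (s_pos - s_neg)
```

In practice the correlation estimates are averaged over minibatches of data and over many Monte Carlo samples; the point of the paper's hardware is that it can supply those samples at rates (up to n = 10 million sweeps per update) far beyond what a software loop like this one achieves.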

Extended Data Table 1 Comparison of sampling throughput

Supplementary information

Supplementary Information

Supplementary Discussion, Figs. 1–13 and Tables 1–3.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Niazi, S., Chowdhury, S., Aadit, N.A. et al. Training deep Boltzmann networks with sparse Ising machines. Nat Electron 7, 610–619 (2024). https://doi.org/10.1038/s41928-024-01182-4
