Deep Learning and the Schrödinger Equation
Kyle Mills*
Department of Physics, University of Ontario Institute of Technology, Oshawa, Ontario, Canada L1H 7K4
Michael Spanner
National Research Council of Canada, Ottawa, Ontario, Canada K1A 0R6
Isaac Tamblyn†
Department of Physics, University of Ontario Institute of Technology, Oshawa, Ontario, Canada L1H 7K4
and National Research Council of Canada, Ottawa, Ontario, Canada K1A 0R6
(Received 6 February 2017; revised manuscript received 8 June 2017; published 18 October 2017)
We have trained a deep (convolutional) neural network to predict the ground-state energy of an electron in
four classes of confining two-dimensional electrostatic potentials. On randomly generated potentials, for which
there is no analytic form for either the potential or the ground-state energy, the model was able to predict the
ground-state energy to within chemical accuracy, with a median absolute error of 1.49 mHa. We also investigated
the performance of the model in predicting other quantities such as the kinetic energy and the first excited-state
energy.
DOI: 10.1103/PhysRevA.96.042113
FIG. 2. In this work, we use the machinery of deep learning to learn the mapping between potential and energy, bypassing the need to
numerically solve the Schrödinger equation and the need for computing wave functions. The architecture we used (shown here) consisted
primarily of convolutional layers capable of extracting relevant features of the input potentials. Two fully connected layers at the end serve as
a decision layer, mapping the automatically extracted features to the desired output quantity. No manual feature selection is necessary; this is a
featureless-learning approach.
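For illustration, a minimal sketch of such a network is shown below, written with the TensorFlow library the authors cite [52]. The layer counts, filter sizes, and strides are placeholder assumptions rather than the architecture used in this work; the sketch only shows how convolutional feature extraction followed by two fully connected "decision" layers can regress a 256 × 256 potential onto a single scalar energy.

```python
# Illustrative sketch only: a small convolutional regressor mapping a gridded
# potential V(x, y) to a scalar energy. Layer sizes and strides are assumptions.
import tensorflow as tf

def build_model(grid=256):
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, strides=2, activation="relu",
                               padding="same", input_shape=(grid, grid, 1)),
        tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu", padding="same"),
        tf.keras.layers.Conv2D(64, 3, strides=2, activation="relu", padding="same"),
        tf.keras.layers.Conv2D(64, 3, strides=2, activation="relu", padding="same"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation="relu"),  # fully connected "decision" layers
        tf.keras.layers.Dense(1),                        # predicted ground-state energy
    ])
    # The paper cites the Adadelta optimizer [51]; mean-squared error is an assumed loss.
    model.compile(optimizer=tf.keras.optimizers.Adadelta(), loss="mse")
    return model
```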
FIG. 4. Histograms of the true vs predicted energies for each example in the test set indicate the performance of the various models: (a) simple harmonic oscillator, (b) infinite well, (c) DIG potential, (d) random potential, and (e) DIG potentials evaluated with the model trained on random potentials. The insets show the distribution of error away from the diagonal line representing perfect predictions. A 1-mHa² square bin is used for the main histograms and a 1-mHa bin size for the inset histograms. During training, the neural network was not exposed to the examples on which these plots are based. The higher error at high energies in (d) is due to fewer training examples being present in the data set at these energies. The histogram shown in (d) is for the further-trained model described in the text.
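This style of evaluation can be reproduced for any trained model with a few lines of NumPy/Matplotlib. The sketch below (variable names are assumptions) bins true against predicted test-set energies with 1-mHa bins on each axis and reports the median absolute error, the statistic quoted in the abstract.

```python
# Sketch of a Fig. 4-style evaluation; energies are assumed to be in mHa.
import numpy as np
import matplotlib.pyplot as plt

def plot_true_vs_predicted(E_true, E_pred):
    print("median absolute error: %.2f mHa" % np.median(np.abs(E_pred - E_true)))
    lo, hi = 0.0, 400.0                      # energy window of the data sets
    bins = np.arange(lo, hi + 1.0, 1.0)      # 1-mHa bins -> 1-mHa^2 square cells
    plt.hist2d(E_true, E_pred, bins=[bins, bins], cmin=1)
    plt.plot([lo, hi], [lo, hi], "k--")      # diagonal = perfect prediction
    plt.xlabel("true energy (mHa)")
    plt.ylabel("predicted energy (mHa)")
    plt.show()
```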
IV. CONCLUSION

We note that many other machine learning algorithms exist and have traditionally seen great success, such as kernel ridge regression [18,20,32,53–55] and random forests [18,56]. Like these algorithms, convolutional deep neural networks have the ability to learn relevant features and form a nonlinear input-to-output mapping without prior formulation of an input representation [47,57]. In our tests, these methods performed more poorly and scaled such that a large number of training examples is infeasible. We have included a comparison of these alternative machine learning methods in the Appendixes, justifying our decision to use a deep convolutional neural network. One notable limitation of our approach is that efficient training and evaluation of the deep neural network require uniformity in the input size. Future work should focus on an approach that would allow transferability to variable input sizes.

Additionally, an electrostatic potential defined on a finite grid can be rotated in integer multiples of 90° without a change to the electrostatic energies. Convolutional deep neural networks do not natively capture such rotational invariance. This problem arises in many applications of deep neural networks (e.g., image classification), and various techniques are used to build in the desired invariance. The common approach is to train the network on an augmented data set consisting of both the original training set and rotated copies of the training data [58]. In this way, the network learns a rotationally invariant set of features.

To demonstrate this technique, we tuned the model trained on the random potentials by training it further on an augmented data set of rotated random potentials. We then tested the model on the original test set as well as a rotated copy of the test set. The median absolute error in both cases was less than 1.6 mHa. The median absolute difference in predicted energy between the rotated and unaltered test sets
was however larger, at 1.7 mHa. This approach to training the
deep neural network is not absolutely rotationally invariant,
however the numerical error experienced due to a rotation
was on the same order as the error of the method itself.
Recent proposals to modify the network architecture itself to
make it rotationally invariant are promising, as the additional
training cost incurred with using an augmented data set could
be avoided [59,60].
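A minimal sketch of this augmentation step, assuming the potentials are stored as an (N, 256, 256) NumPy array with one energy label per image, is:

```python
# Enlarge a training set with 90-degree rotated copies. Because the
# ground-state energy is invariant under these rotations, each rotated
# potential keeps the label of the original image.
import numpy as np

def augment_with_rotations(potentials, energies):
    """potentials: (N, H, W) array of gridded V(x, y); energies: (N,) targets."""
    rotated = [np.rot90(potentials, k=k, axes=(1, 2)) for k in range(4)]
    return np.concatenate(rotated, axis=0), np.tile(energies, 4)
```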
In summary, convolutional deep neural networks are
promising candidates for application to electronic structure
calculations as they are designed for data that have a spatial
encoding of information. As the number of electrons in
a system increases, the computational complexity grows
polynomially. Accurate electronic structure methods (e.g.,
coupled cluster) exhibit a scaling with respect to the number of
particles of N⁷, and even the popular Kohn-Sham formalism of density-functional theory scales as N³ [61,62]. The evaluation of a convolutional neural network exhibits no such scaling, and while the training process for more complicated systems would be more expensive, this is a one-time cost.

In this work we have taken a simple problem (one electron in a confining potential) and demonstrated that a convolutional neural network can automatically extract features and learn the mapping between V(r) and the ground-state energy ε0, as well as the kinetic energy T̂ and the first-excited-state energy ε1. Although our focus here has been on a particular type of problem, namely, an electron in a confining 2D well, the concepts here are directly applicable to many problems in physics and engineering. Ultimately, we have demonstrated the ability of a deep neural network to learn, through example alone, how to rapidly approximate the solution to a set of partial differential equations. A generalizable, transferable deep learning approach to solving partial differential equations would impact all fields of theoretical physics and mathematics.

ACKNOWLEDGMENTS

The authors would like to acknowledge fruitful discussions with P. Bunker, P. Darancet, D. Klug, and D. Prendergast. K.M. and I.T. acknowledge funding from NSERC and SOSCIP. Compute resources were provided by SOSCIP, Compute Canada, the National Research Council of Canada, and an NVIDIA Faculty Hardware Grant.

APPENDIX A: COMPARISON OF MACHINE LEARNING METHODS

One might question the use of a convolutional deep neural network over other, more traditional machine learning approaches. After all, kernel ridge regression (KRR), random forests (RF), and artificial neural networks (ANN) have proven to be quite useful (see the main text for references to appropriate work). Here we compare the use of our convolutional deep neural network approach to kernel ridge regression and random forests, the latter two implemented through Scikit-learn [63].

1. Kernel ridge regression

We trained a kernel ridge regression model on a training set of simple-harmonic-oscillator images, recording the wall time
(real-world time) taken to train the model. Then we evaluated the trained model on a test set (the same test set was used throughout). We recorded both the evaluation wall time and the MAE of the trained model. We then trained our deep neural network on the same training data set, allowing it the same training wall time as the KRR model. We then evaluated the deep neural network on the same testing set, again recording the MAE and the evaluation wall time. This process was repeated for various training set sizes and on training data from both the simple-harmonic-oscillator and random data sets. The results are presented in Figs. 6 and 7.

FIG. 6. Kernel ridge regression on simple-harmonic-oscillator potentials. When few training examples are provided, kernel ridge regression performs better; however, with a larger number of training examples, both methods perform comparably, with the DNN slightly better. The training time for kernel ridge regression scales quadratically. The evaluation time for a fixed number of testing examples scales linearly with the number of training examples for kernel ridge regression; for the deep neural network, the training set size does not affect the testing-set evaluation time.

FIG. 7. Kernel ridge regression on random potentials. When few training examples are present, kernel ridge regression performs better (at constant training time). This is likely because the DNN is only given 10 s to run. At larger training set sizes, the deep neural network performs much better; kernel ridge regression barely improves as the training set size increases, while its training wall time increases dramatically.
2. Random forests

We carried out an identical process, training a random forest regressor. The results are presented in Figs. 8 and 9.

FIG. 9. Random forests on random potentials. On the more complicated random potentials, random forests perform significantly worse than the deep neural network. This, combined with the extremely high training time, suggests that the deep neural network is much better equipped to handle these more varied potentials.

3. Discussion

The timing comparison is not quantitatively fair: the random forest algorithm is not parallelized and uses only one CPU core, the kernel ridge regression algorithm is parallelized and runs across all available cores, and the deep neural network is highly parallelized via GPU optimization and runs across thousands of cores. Nevertheless, this investigation gives useful insight into the time-to-solution advantages of deep neural networks. The error rates, however, are quantitatively comparable, as the KRR and random forest (RF) algorithms were permitted to run until convergence. The DNN was able to perform better in most cases given the same amount of wall time.

We see that for all but the simplest cases, our deep neural network is vastly superior to both kernel ridge regression and random forests. For very simple potentials, it is understandable that the machinery of the deep neural network is unnecessary and that the traditional methods perform well. For more complicated potentials with more variation in the input data, the deep neural network was able to provide significantly better accuracy in the same amount of time.
To construct a random potential, we begin with a 16 × 16 binary grid of 1's and 0's and upscale it to 256 × 256. We then generate a second 16 × 16 binary grid and upscale it to 128 × 128. We center the smaller grid within the larger grid and then subtract them elementwise. We then apply a Gaussian blur with standard deviation σ1 to the resulting image, where σ1 is generated uniformly within the range given in Table III. The potential is now random and smooth, but does not achieve a maximum at the boundary.

To achieve this, we generate a mask that smoothly goes to 0 at the boundary and 1 in the interior. We wish the mask to be random, e.g., a randomly generated blob. To generate the blob, we generate k² random coordinate pairs on a 200 × 200 grid, where k is an integer between 2 and 7, inclusive. We then throw away all points that lie inside the convex hull of these points and smoothly interpolate the remaining points with cubic splines. We then form a binary mask by filling the inside of this closed blob with 1's and the outside with 0's. Resizing the blob to a resolution of R × R and applying a Gaussian blur with standard deviation σ2, we arrive at the final mask. Here R and σ2 are generated uniformly within the ranges given in Table III.

Elementwise multiplication of the mask with the randomly blurred image gives a random potential that approaches zero at the boundary. We randomize the "sharpness" of the potential by then exponentiating by d = 0.1, 0.5, 1.0, or 2.0, chosen at random with equal probabilities (i.e., V := V^d). We then subtract the result from its maximum to invert the well.

This process, while lengthy, produces very random potentials, of which no two are alike. The energy range of 0–400 mHa is appropriate for producing wave functions that span a moderate portion of the domain, as shown in Fig. 10. Examples of all classes of potentials can be seen in Fig. 11.
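A condensed sketch of this generator is given below. The σ1, σ2, and resize defaults are assumptions (Table III is not reproduced here), and the blob mask is resized directly to the full 256 × 256 grid rather than to an intermediate R × R resolution.

```python
# Rough sketch of the random-potential generator described above.
import numpy as np
from scipy.ndimage import zoom, gaussian_filter
from scipy.spatial import ConvexHull
from scipy.interpolate import splprep, splev
from matplotlib.path import Path

rng = np.random.default_rng()

def random_blob_mask(n=256, sigma2=4.0):
    k = rng.integers(2, 8)                           # k in [2, 7]
    pts = rng.uniform(0, 200, size=(k ** 2, 2))      # k^2 points on a 200 x 200 grid
    hull = pts[ConvexHull(pts).vertices]             # keep only the convex-hull vertices
    deg = min(3, len(hull) - 1)                      # cubic spline where possible
    tck, _ = splprep([hull[:, 0], hull[:, 1]], s=0, k=deg, per=True)
    bx, by = splev(np.linspace(0, 1, 400), tck)      # smooth, closed blob outline
    yy, xx = np.mgrid[0:200, 0:200]
    inside = Path(np.column_stack([bx, by])).contains_points(
        np.column_stack([xx.ravel(), yy.ravel()])).reshape(200, 200)
    return gaussian_filter(zoom(inside.astype(float), n / 200.0), sigma2)

def random_potential(n=256, sigma1=8.0, sigma2=4.0):
    big = zoom(rng.integers(0, 2, (16, 16)).astype(float), n / 16.0)    # -> 256 x 256
    small = zoom(rng.integers(0, 2, (16, 16)).astype(float), n / 32.0)  # -> 128 x 128
    pad = (n - small.shape[0]) // 2
    centered = np.zeros_like(big)
    centered[pad:pad + small.shape[0], pad:pad + small.shape[1]] = small
    v = gaussian_filter(big - centered, sigma1)       # random and smooth
    v = np.abs(v) * random_blob_mask(n, sigma2)       # non-negative, ~0 at the edges
    v = v ** rng.choice([0.1, 0.5, 1.0, 2.0])         # randomize the "sharpness"
    return v.max() - v                                # invert to obtain a confining well
```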
[1] P. A. M. Dirac, Proc. R. Soc. London Ser. A 123, 714 (1929).
[2] M. J. Cherukara, B. Narayanan, A. Kinaci, K. Sasikumar, S. K. Gray, M. K. Chan, and S. K. R. S. Sankaranarayanan, J. Phys. Chem. Lett. 7, 3752 (2016).
[3] M. Riera, A. W. Götz, and F. Paesani, Phys. Chem. Chem. Phys. 18, 30334 (2016).
[4] A. Jaramillo-Botero, S. Naserifar, and W. A. Goddard, J. Chem. Theory Comput. 10, 1426 (2014).
[5] B. W. H. van Beest, G. J. Kramer, and R. A. van Santen, Phys. Rev. Lett. 64, 1955 (1990).
[6] J. W. Ponder and D. A. Case, Adv. Protein Chem. 66, 27 (2003).
[7] V. Hornak, R. Abel, A. Okur, B. Strockbine, A. Roitberg, and C. Simmerling, Proteins 65, 712 (2006).
[8] D. J. Cole, M. C. Payne, G. Csányi, S. Mark Spearing, and L. Colombi Ciacchi, J. Chem. Phys. 127, 204704 (2007).
[9] J. D. van der Waals, De continuiteit van den gas- en vloeistoftoestand, Ph.D. thesis, University of Leiden, 1873.
[10] H. Z. Li, L. Li, Z. Y. Zhong, Y. Han, L. Hu, and Y. H. Lu, Math. Prob. Eng. 2013, 860357 (2013).
[11] J. Behler and M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007).
[12] T. Morawietz and J. Behler, J. Phys. Chem. A 117, 7356 (2013).
[13] J. Behler, R. Martonák, D. Donadio, and M. Parrinello, Phys. Status Solidi B 245, 2618 (2008).
[14] P. E. Dolgirev, I. A. Kruglov, and A. R. Oganov, AIP Adv. 6, 085318 (2016).
[15] N. Artrith and A. Urban, Comput. Mater. Sci. 114, 135 (2016).
[16] W. Tian, F. Meng, L. Liu, Y. Li, and F. Wang, Sci. Rep. 7, 40827 (2017).
[17] M. Rupp, A. Tkatchenko, K. R. Müller, and O. A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012).
[18] F. A. Faber, L. Hutchison, B. Huang, J. Gilmer, S. S. Schoenholz, G. E. Dahl, O. Vinyals, S. Kearnes, P. F. Riley, and O. A. von Lilienfeld, J. Chem. Theory Comput. (2017), doi: 10.1021/acs.jctc.7b00577.
[19] G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K. R. Müller, and O. Anatole von Lilienfeld, New J. Phys. 15, 095003 (2013).
[20] A. Lopez-Bezanilla and O. A. von Lilienfeld, Phys. Rev. B 89, 235411 (2014).
[21] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, arXiv:1408.5093.
[22] S. Curtarolo, D. Morgan, K. Persson, J. Rodgers, and G. Ceder, Phys. Rev. Lett. 91, 135503 (2003).
[23] G. Hautier, C. C. Fischer, A. Jain, T. Mueller, and G. Ceder, Chem. Mater. 22, 3762 (2010).
[24] Y. Saad, D. Gao, T. Ngo, S. Bobbitt, J. R. Chelikowsky, and W. Andreoni, Phys. Rev. B 85, 104104 (2012).
[25] L. Wang, Phys. Rev. B 94, 195105 (2016).
[26] C. Monterola and C. Saloma, Opt. Commun. 222, 331 (2003).
[27] Y. Shirvany, M. Hayati, and R. Moradian, Commun. Nonlinear Sci. Numer. Simul. 13, 2132 (2008).
[28] C. Caetano, J. L. Reis, J. Amorim, M. R. Lemes, and A. D. Pino, Int. J. Quantum Chem. 111, 2732 (2011).
[29] B. P. van Milligen, V. Tribaldos, and J. A. Jiménez, Phys. Rev. Lett. 75, 3594 (1995).
[30] G. Carleo and M. Troyer, Science 355, 602 (2017).
[31] J. C. Snyder, M. Rupp, K. Hansen, K.-R. Müller, and K. Burke, Phys. Rev. Lett. 108, 253002 (2012).
[32] F. Brockherde, L. Vogt, L. Li, M. E. Tuckerman, K. Burke, and K.-R. Müller, arXiv:1609.02815.
[33] K. Yao and J. Parkhill, J. Chem. Theory Comput. 12, 1139 (2016).
[34] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Proc. IEEE 86, 2278 (1998).
[35] P. Simard, D. Steinkraus, and J. Platt, in Proceedings of the Seventh International Conference on Document Analysis and Recognition (IEEE, Piscataway, 2003), Vol. 1, pp. 958–963.
[36] D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, and J. Schmidhuber, in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (AAAI, Palo Alto, 2011), Vol. 22, pp. 1237–1242.
[37] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE, Piscataway, 2015), pp. 1–9.
[38] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, Nature (London) 529, 484 (2016).
[39] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, Nature (London) 518, 529 (2015).
[40] K. I. Funahashi, Neural Networks 2, 183 (1989).
[41] J. L. Castro, C. J. Mantas, and J. M. Benítez, Neural Networks 13, 561 (2000).
[42] A. Krizhevsky, I. Sutskever, and G. E. Hinton, in Proceedings of the 25th International Conference on Neural Information Processing Systems (Curran, Red Hook, 2012), pp. 1097–1105.
[43] D. H. Hubel and T. N. Wiesel, J. Physiol. 195, 215 (1968).
[44] Y. Bengio, Found. Trends Mach. Learn. 2, 1 (2009).
[45] P. Mehta and D. J. Schwab, arXiv:1410.3831.
[46] H. W. Lin, M. Tegmark, and D. Rolnick, J. Stat. Phys. 168, 1223 (2017).
[47] K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, and A. Tkatchenko, Nat. Commun. 8, 13890 (2017).
[48] P. Frolkovič, Acta Applicandae Mathematicae, 3rd ed. (Cambridge University Press, New York, 1990), Vol. 19, pp. 297–299.
[49] A. Gharaati and R. Khordad, Superlatt. Microstruct. 48, 276 (2010).
[50] S. S. Gomez and R. H. Romero, Cent. Eur. J. Phys. 7, 12 (2009).
[51] M. D. Zeiler, arXiv:1212.5701.
[52] M. Abadi et al., arXiv:1603.04467 (2015).
[53] L. F. Arsenault, A. Lopez-Bezanilla, O. A. von Lilienfeld, and A. J. Millis, Phys. Rev. B 90, 155136 (2014).
[54] L. Li, J. C. Snyder, I. M. Pelaschier, J. Huang, U.-N. Niranjan, P. Duncan, M. Rupp, K.-R. Müller, and K. Burke, Int. J. Quantum Chem. 116, 819 (2016).
[55] T. Suzuki, R. Tamura, and T. Miyazaki, Int. J. Quantum Chem. 117, 33 (2017).
[56] L. Ward, A. Agrawal, A. Choudhary, and C. Wolverton, Npj Comput. Mater. 2, 16028 (2016).
[57] S. Kearnes, K. McCloskey, M. Berndl, V. Pande, and P. Riley, J. Comput.-Aided Mol. Des. 30, 595 (2016).
[58] S. Dieleman, K. W. Willett, and J. Dambre, Mon. Not. R. Astron. Soc. 450, 1441 (2015).
[59] D. E. Worrall, S. J. Garbin, D. Turmukhambetov, and G. J. Brostow, arXiv:1612.04642.
[60] S. Dieleman, J. De Fauw, and K. Kavukcuoglu, arXiv:1602.02660.
[61] W. Kohn, Int. J. Quantum Chem. 56, 229 (1995).
[62] S. A. Kucharski and R. J. Bartlett, J. Chem. Phys. 97, 4282 (1992).
[63] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, G. Louppe, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and É. Duchesnay, J. Mach. Learn. Res. 12, 2825 (2011).