Deep Learning and the Schrödinger Equation
Kyle Mills*
Department of Physics, University of Ontario Institute of Technology, Oshawa, Ontario, Canada L1H 7K4
Michael Spanner
National Research Council of Canada, Ottawa, Ontario, Canada K1A 0R6
Isaac Tamblyn†
Department of Physics, University of Ontario Institute of Technology, Oshawa, Ontario, Canada L1H 7K4
and National Research Council of Canada, Ottawa, Ontario, Canada K1A 0R6
(Received 6 February 2017; revised manuscript received 8 June 2017; published 18 October 2017)
We have trained a deep (convolutional) neural network to predict the ground-state energy of an electron in
four classes of confining two-dimensional electrostatic potentials. On randomly generated potentials, for which
there is no analytic form for either the potential or the ground-state energy, the model was able to predict the
ground-state energy to within chemical accuracy, with a median absolute error of 1.49 mHa. We also investigated
the performance of the model in predicting other quantities such as the kinetic energy and the first excited-state
energy.
DOI: 10.1103/PhysRevA.96.042113
FIG. 2. In this work, we use the machinery of deep learning to learn the mapping between potential and energy, bypassing the need to
numerically solve the Schrödinger equation and the need for computing wave functions. The architecture we used (shown here) consisted
primarily of convolutional layers capable of extracting relevant features of the input potentials. Two fully connected layers at the end serve as
a decision layer, mapping the automatically extracted features to the desired output quantity. No manual feature selection is necessary; this is a
featureless-learning approach.
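For illustration, a minimal sketch of such a network is shown below, written with the TensorFlow library the authors cite [52]. The layer counts, filter sizes, and strides are placeholder assumptions rather than the architecture used in this work; the sketch only shows how convolutional feature extraction followed by two fully connected "decision" layers can regress a 256 × 256 potential onto a single scalar energy.

```python
# Illustrative sketch only: a small convolutional regressor mapping a gridded
# potential V(x, y) to a scalar energy. Layer sizes and strides are assumptions.
import tensorflow as tf

def build_model(grid=256):
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, strides=2, activation="relu",
                               padding="same", input_shape=(grid, grid, 1)),
        tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu", padding="same"),
        tf.keras.layers.Conv2D(64, 3, strides=2, activation="relu", padding="same"),
        tf.keras.layers.Conv2D(64, 3, strides=2, activation="relu", padding="same"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation="relu"),  # fully connected "decision" layers
        tf.keras.layers.Dense(1),                        # predicted ground-state energy
    ])
    # The paper cites the Adadelta optimizer [51]; mean-squared error is an assumed loss.
    model.compile(optimizer=tf.keras.optimizers.Adadelta(), loss="mse")
    return model
```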
FIG. 4. Histograms of the true vs predicted energies for each example in the test set indicate the performance of the various models: (a) simple harmonic oscillator, (b) infinite well, (c) DIG potential, (d) random potential, and (e) DIG potentials evaluated with the model trained on random potentials. The insets show the distribution of error away from the diagonal line representing perfect predictions. A 1-mHa² square bin is used for the main histograms and a 1-mHa bin size for the inset histograms. During training, the neural network was not exposed to the examples on which these plots are based. The higher error at high energies in (d) is due to fewer training examples being present in the data set at these energies. The histogram shown in (d) is for the further-trained model described in the text.
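This style of evaluation can be reproduced for any trained model with a few lines of NumPy/Matplotlib. The sketch below (variable names are assumptions) bins true against predicted test-set energies with 1-mHa bins on each axis and reports the median absolute error, the statistic quoted in the abstract.

```python
# Sketch of a Fig. 4-style evaluation; energies are assumed to be in mHa.
import numpy as np
import matplotlib.pyplot as plt

def plot_true_vs_predicted(E_true, E_pred):
    print("median absolute error: %.2f mHa" % np.median(np.abs(E_pred - E_true)))
    lo, hi = 0.0, 400.0                      # energy window of the data sets
    bins = np.arange(lo, hi + 1.0, 1.0)      # 1-mHa bins -> 1-mHa^2 square cells
    plt.hist2d(E_true, E_pred, bins=[bins, bins], cmin=1)
    plt.plot([lo, hi], [lo, hi], "k--")      # diagonal = perfect prediction
    plt.xlabel("true energy (mHa)")
    plt.ylabel("predicted energy (mHa)")
    plt.show()
```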
IV. CONCLUSION

We note that many other machine learning algorithms exist and have traditionally seen great success, such as kernel ridge regression [18,20,32,53–55] and random forests [18,56]. Like these algorithms, convolutional deep neural networks have the ability to learn relevant features and form a nonlinear input-to-output mapping without prior formulation of an input representation [47,57]. In our tests, these methods performed more poorly and scaled such that a large number of training examples is infeasible. We have included a comparison of these alternative machine learning methods in the Appendixes, justifying our decision to use a deep convolutional neural network. One notable limitation of our approach is that efficient training and evaluation of the deep neural network require uniformity in the input size. Future work should focus on an approach that would allow transferability to variable input sizes.

Additionally, an electrostatic potential defined on a finite grid can be rotated in integer multiples of 90° without a change to the electrostatic energies. Convolutional deep neural networks do not natively capture such rotational invariance. This problem arises in many applications of deep neural networks (e.g., image classification), and various techniques are used to build in the desired invariance. The common approach is to train the network on an augmented data set consisting of both the original training set and rotated copies of the training data [58]. In this way, the network learns a rotationally invariant set of features.

To demonstrate this technique, we tuned the model trained on the random potentials by training it further on an augmented data set of rotated random potentials. We then tested the model on the original test set as well as a rotated copy of the test set. The median absolute error in both cases was less than 1.6 mHa. The median absolute difference in predicted energy between the rotated and unaltered test sets
was however larger, at 1.7 mHa. This approach to training the
deep neural network is not absolutely rotationally invariant,
however the numerical error experienced due to a rotation
was on the same order as the error of the method itself.
Recent proposals to modify the network architecture itself to
make it rotationally invariant are promising, as the additional
training cost incurred with using an augmented data set could
be avoided [59,60].
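A minimal sketch of this augmentation step, assuming the potentials are stored as an (N, 256, 256) NumPy array with one energy label per image, is:

```python
# Enlarge a training set with 90-degree rotated copies. Because the
# ground-state energy is invariant under these rotations, each rotated
# potential keeps the label of the original image.
import numpy as np

def augment_with_rotations(potentials, energies):
    """potentials: (N, H, W) array of gridded V(x, y); energies: (N,) targets."""
    rotated = [np.rot90(potentials, k=k, axes=(1, 2)) for k in range(4)]
    return np.concatenate(rotated, axis=0), np.tile(energies, 4)
```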
In summary, convolutional deep neural networks are
promising candidates for application to electronic structure
calculations as they are designed for data that have a spatial
encoding of information. As the number of electrons in
a system increases, the computational complexity grows
polynomially. Accurate electronic structure methods (e.g.,
coupled cluster) exhibit a scaling with respect to the number of
particles of N⁷, and even the popular Kohn-Sham formalism of density-functional theory scales as N³ [61,62]. The evaluation of a convolutional neural network exhibits no such scaling, and while the training process for more complicated systems would be more expensive, this is a one-time cost.

In this work we have taken a simple problem (one electron in a confining potential) and demonstrated that a convolutional neural network can automatically extract features and learn the mapping between V(r) and the ground-state energy ε0, as well as the kinetic energy T̂ and the first-excited-state energy ε1. Although our focus here has been on a particular type of problem, namely, an electron in a confining 2D well, the concepts here are directly applicable to many problems in physics and engineering. Ultimately, we have demonstrated the ability of a deep neural network to learn, through example alone, how to rapidly approximate the solution to a set of partial differential equations. A generalizable, transferable deep learning approach to solving partial differential equations would impact all fields of theoretical physics and mathematics.

ACKNOWLEDGMENTS

The authors would like to acknowledge fruitful discussions with P. Bunker, P. Darancet, D. Klug, and D. Prendergast. K.M. and I.T. acknowledge funding from NSERC and SOSCIP. Compute resources were provided by SOSCIP, Compute Canada, the National Research Council of Canada, and an NVIDIA Faculty Hardware Grant.

APPENDIX A: COMPARISON OF MACHINE LEARNING METHODS

One might question the use of a convolutional deep neural network over other, more traditional machine learning approaches. After all, kernel ridge regression (KRR), random forests (RF), and artificial neural networks (ANN) have proven to be quite useful (see the main text for references to appropriate work). Here we compare the use of our convolutional deep neural network approach to kernel ridge regression and random forests, the latter two implemented through Scikit-learn [63].

1. Kernel ridge regression

We trained a kernel ridge regression model on a training set of simple-harmonic-oscillator images, recording the wall time
(real-world time) taken to train the model. Then we evaluated the trained model on a test set (the same test set was used throughout). We recorded both the evaluation wall time and the MAE of the trained model. We then trained our deep neural network on the same training data set, allowing it the same training wall time as the KRR model. We then evaluated the deep neural network on the same testing set, again recording the MAE and the evaluation wall time. This process was repeated for various training set sizes and on training data from both the simple-harmonic-oscillator and random data sets. The results are presented in Figs. 6 and 7.

FIG. 6. Kernel ridge regression on simple-harmonic-oscillator potentials. When few training examples are provided, kernel ridge regression performs better; however, with a larger number of training examples, both methods perform comparably, with the DNN slightly better. The training time for kernel ridge regression scales quadratically. The evaluation time for a fixed number of testing examples scales linearly with the number of training examples for kernel ridge regression; for the deep neural network, the training set size does not affect the testing-set evaluation time.

FIG. 7. Kernel ridge regression on random potentials. When few training examples are present, kernel ridge regression performs better (at constant training time). This is likely because the DNN is only given 10 s to run. At larger training set sizes, the deep neural network performs much better; kernel ridge regression barely improves as the training set size increases, while its training wall time increases dramatically.
2. Random forests

We carried out an identical process, training a random forest regressor. The results are presented in Figs. 8 and 9.

FIG. 9. Random forests on random potentials. On the more complicated random potentials, random forests perform significantly worse than the deep neural network. This, combined with the extremely high training time, suggests that the deep neural network is much better equipped to handle these more varied potentials.

3. Discussion

The timing comparison is not quantitatively fair: the random forest algorithm is not parallelized and uses only one CPU core, the kernel ridge regression algorithm is parallelized and runs across all available cores, and the deep neural network is highly parallelized via GPU optimization and runs across thousands of cores. Nevertheless, this investigation gives useful insight into the time-to-solution advantages of deep neural networks. The error rates, however, are quantitatively comparable, as the KRR and random forest (RF) algorithms were permitted to run until convergence. The DNN was able to perform better in most cases given the same amount of wall time.

We see that for all but the simplest cases, our deep neural network is vastly superior to both kernel ridge regression and random forests. For very simple potentials, it is understandable that the machinery of the deep neural network is unnecessary and that the traditional methods perform well. For more complicated potentials with more variation in the input data, the deep neural network was able to provide significantly better accuracy in the same amount of time.
To construct a random potential, we begin with a 16 × 16 binary grid of 1's and 0's and upscale it to 256 × 256. We then generate a second 16 × 16 binary grid and upscale it to 128 × 128. We center the smaller grid within the larger grid and then subtract them elementwise. We then apply a Gaussian blur with standard deviation σ1 to the resulting image, where σ1 is generated uniformly within the range given in Table III. The potential is now random and smooth, but does not achieve a maximum at the boundary.

To achieve this, we generate a mask that smoothly goes to 0 at the boundary and 1 in the interior. We wish the mask to be random, e.g., a randomly generated blob. To generate the blob, we generate k² random coordinate pairs on a 200 × 200 grid, where k is an integer between 2 and 7, inclusive. We then throw away all points that lie inside the convex hull of these points and smoothly interpolate the remaining points with cubic splines. We then form a binary mask by filling the inside of this closed blob with 1's and the outside with 0's. Resizing the blob to a resolution of R × R and applying a Gaussian blur with standard deviation σ2, we arrive at the final mask. Here R and σ2 are generated uniformly within the ranges given in Table III.

Elementwise multiplication of the mask with the randomly blurred image gives a random potential that approaches zero at the boundary. We randomize the "sharpness" of the potential by then exponentiating by d = 0.1, 0.5, 1.0, or 2.0, chosen at random with equal probabilities (i.e., V := V^d). We then subtract the result from its maximum to invert the well.

This process, while lengthy, produces very random potentials, of which no two are alike. The energy range of 0–400 mHa is appropriate for producing wave functions that span a moderate portion of the domain, as shown in Fig. 10. Examples of all classes of potentials can be seen in Fig. 11.
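A condensed sketch of this generator is given below. The σ1, σ2, and resize defaults are assumptions (Table III is not reproduced here), and the blob mask is resized directly to the full 256 × 256 grid rather than to an intermediate R × R resolution.

```python
# Rough sketch of the random-potential generator described above.
import numpy as np
from scipy.ndimage import zoom, gaussian_filter
from scipy.spatial import ConvexHull
from scipy.interpolate import splprep, splev
from matplotlib.path import Path

rng = np.random.default_rng()

def random_blob_mask(n=256, sigma2=4.0):
    k = rng.integers(2, 8)                           # k in [2, 7]
    pts = rng.uniform(0, 200, size=(k ** 2, 2))      # k^2 points on a 200 x 200 grid
    hull = pts[ConvexHull(pts).vertices]             # keep only the convex-hull vertices
    deg = min(3, len(hull) - 1)                      # cubic spline where possible
    tck, _ = splprep([hull[:, 0], hull[:, 1]], s=0, k=deg, per=True)
    bx, by = splev(np.linspace(0, 1, 400), tck)      # smooth, closed blob outline
    yy, xx = np.mgrid[0:200, 0:200]
    inside = Path(np.column_stack([bx, by])).contains_points(
        np.column_stack([xx.ravel(), yy.ravel()])).reshape(200, 200)
    return gaussian_filter(zoom(inside.astype(float), n / 200.0), sigma2)

def random_potential(n=256, sigma1=8.0, sigma2=4.0):
    big = zoom(rng.integers(0, 2, (16, 16)).astype(float), n / 16.0)    # -> 256 x 256
    small = zoom(rng.integers(0, 2, (16, 16)).astype(float), n / 32.0)  # -> 128 x 128
    pad = (n - small.shape[0]) // 2
    centered = np.zeros_like(big)
    centered[pad:pad + small.shape[0], pad:pad + small.shape[1]] = small
    v = gaussian_filter(big - centered, sigma1)       # random and smooth
    v = np.abs(v) * random_blob_mask(n, sigma2)       # non-negative, ~0 at the edges
    v = v ** rng.choice([0.1, 0.5, 1.0, 2.0])         # randomize the "sharpness"
    return v.max() - v                                # invert to obtain a confining well
```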
[1] P. A. M. Dirac, Proc. R. Soc. London Ser. A 123, 714 (1929).
[2] M. J. Cherukara, B. Narayanan, A. Kinaci, K. Sasikumar, S. K. Gray, M. K. Chan, and S. K. R. S. Sankaranarayanan, J. Phys. Chem. Lett. 7, 3752 (2016).
[3] M. Riera, A. W. Götz, and F. Paesani, Phys. Chem. Chem. Phys. 18, 30334 (2016).
[4] A. Jaramillo-Botero, S. Naserifar, and W. A. Goddard, J. Chem. Theory Comput. 10, 1426 (2014).
[5] B. W. H. van Beest, G. J. Kramer, and R. A. van Santen, Phys. Rev. Lett. 64, 1955 (1990).
[6] J. W. Ponder and D. A. Case, Adv. Protein Chem. 66, 27 (2003).
[7] V. Hornak, R. Abel, A. Okur, B. Strockbine, A. Roitberg, and C. Simmerling, Proteins 65, 712 (2006).
[8] D. J. Cole, M. C. Payne, G. Csányi, S. Mark Spearing, and L. Colombi Ciacchi, J. Chem. Phys. 127, 204704 (2007).
[9] J. D. van der Waals, De continuiteit van den gas- en vloeistoftoestand, Ph.D. thesis, University of Leiden, 1873.
[10] H. Z. Li, L. Li, Z. Y. Zhong, Y. Han, L. Hu, and Y. H. Lu, Math. Prob. Eng. 2013, 860357 (2013).
[11] J. Behler and M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007).
[12] T. Morawietz and J. Behler, J. Phys. Chem. A 117, 7356 (2013).
[13] J. Behler, R. Martonák, D. Donadio, and M. Parrinello, Phys. Status Solidi B 245, 2618 (2008).
[14] P. E. Dolgirev, I. A. Kruglov, and A. R. Oganov, AIP Adv. 6, 085318 (2016).
[15] N. Artrith and A. Urban, Comput. Mater. Sci. 114, 135 (2016).
[16] W. Tian, F. Meng, L. Liu, Y. Li, and F. Wang, Sci. Rep. 7, 40827 (2017).
[17] M. Rupp, A. Tkatchenko, K. R. Müller, and O. A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012).
[18] F. A. Faber, L. Hutchison, B. Huang, J. Gilmer, S. S. Schoenholz, G. E. Dahl, O. Vinyals, S. Kearnes, P. F. Riley, and O. A. von Lilienfeld, J. Chem. Theory Comput. (2017), doi: 10.1021/acs.jctc.7b00577.
[19] G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K. R. Müller, and O. Anatole von Lilienfeld, New J. Phys. 15, 095003 (2013).
[20] A. Lopez-Bezanilla and O. A. von Lilienfeld, Phys. Rev. B 89, 235411 (2014).
[21] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, arXiv:1408.5093.
[22] S. Curtarolo, D. Morgan, K. Persson, J. Rodgers, and G. Ceder, Phys. Rev. Lett. 91, 135503 (2003).
[23] G. Hautier, C. C. Fischer, A. Jain, T. Mueller, and G. Ceder, Chem. Mater. 22, 3762 (2010).
[24] Y. Saad, D. Gao, T. Ngo, S. Bobbitt, J. R. Chelikowsky, and W. Andreoni, Phys. Rev. B 85, 104104 (2012).
[25] L. Wang, Phys. Rev. B 94, 195105 (2016).
[26] C. Monterola and C. Saloma, Opt. Commun. 222, 331 (2003).
[27] Y. Shirvany, M. Hayati, and R. Moradian, Commun. Nonlinear Sci. Numer. Simul. 13, 2132 (2008).
[28] C. Caetano, J. L. Reis, J. Amorim, M. R. Lemes, and A. D. Pino, Int. J. Quantum Chem. 111, 2732 (2011).
[29] B. P. van Milligen, V. Tribaldos, and J. A. Jiménez, Phys. Rev. Lett. 75, 3594 (1995).
[30] G. Carleo and M. Troyer, Science 355, 602 (2017).
[31] J. C. Snyder, M. Rupp, K. Hansen, K.-R. Müller, and K. Burke, Phys. Rev. Lett. 108, 253002 (2012).
[32] F. Brockherde, L. Vogt, L. Li, M. E. Tuckerman, K. Burke, and K.-R. Müller, arXiv:1609.02815.
[33] K. Yao and J. Parkhill, J. Chem. Theory Comput. 12, 1139 (2016).
[34] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Proc. IEEE 86, 2278 (1998).
[35] P. Simard, D. Steinkraus, and J. Platt, in Proceedings of the Seventh International Conference on Document Analysis and Recognition (IEEE, Piscataway, 2003), Vol. 1, pp. 958–963.
[36] D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, and J. Schmidhuber, in Proceedings of the 22nd International Joint Conference on Artificial Intelligence (AAAI, Palo Alto, 2011), Vol. 22, pp. 1237–1242.
[37] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE, Piscataway, 2015), pp. 1–9.
[38] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, Nature (London) 529, 484 (2016).
[39] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, Nature (London) 518, 529 (2015).
[40] K. I. Funahashi, Neural Networks 2, 183 (1989).
[41] J. L. Castro, C. J. Mantas, and J. M. Benítez, Neural Networks 13, 561 (2000).
[42] A. Krizhevsky, I. Sutskever, and G. E. Hinton, in Proceedings of the 25th International Conference on Neural Information Processing Systems (Curran, Red Hook, 2012), pp. 1097–1105.
[43] D. H. Hubel and T. N. Wiesel, J. Physiol. 195, 215 (1968).
[44] Y. Bengio, Found. Trends Mach. Learn. 2, 1 (2009).
[45] P. Mehta and D. J. Schwab, arXiv:1410.3831.
[46] H. W. Lin, M. Tegmark, and D. Rolnick, J. Stat. Phys. 168, 1223 (2017).
[47] K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, and A. Tkatchenko, Nat. Commun. 8, 13890 (2017).
[48] P. Frolkovič, Acta Applicandae Mathematicae, 3rd ed. (Cambridge University Press, New York, 1990), Vol. 19, pp. 297–299.
[49] A. Gharaati and R. Khordad, Superlatt. Microstruct. 48, 276 (2010).
[50] S. S. Gomez and R. H. Romero, Cent. Eur. J. Phys. 7, 12 (2009).
[51] M. D. Zeiler, arXiv:1212.5701.
[52] M. Abadi et al., arXiv:1603.04467 (2015).
[53] L. F. Arsenault, A. Lopez-Bezanilla, O. A. von Lilienfeld, and A. J. Millis, Phys. Rev. B 90, 155136 (2014).
[54] L. Li, J. C. Snyder, I. M. Pelaschier, J. Huang, U.-N. Niranjan, P. Duncan, M. Rupp, K.-R. Müller, and K. Burke, Int. J. Quantum Chem. 116, 819 (2016).
[55] T. Suzuki, R. Tamura, and T. Miyazaki, Int. J. Quantum Chem. 117, 33 (2017).
[56] L. Ward, A. Agrawal, A. Choudhary, and C. Wolverton, Npj Comput. Mater. 2, 16028 (2016).
[57] S. Kearnes, K. McCloskey, M. Berndl, V. Pande, and P. Riley, J. Comput.-Aided Mol. Des. 30, 595 (2016).
[58] S. Dieleman, K. W. Willett, and J. Dambre, Mon. Not. R. Astron. Soc. 450, 1441 (2015).
[59] D. E. Worrall, S. J. Garbin, D. Turmukhambetov, and G. J. Brostow, arXiv:1612.04642.
[60] S. Dieleman, J. De Fauw, and K. Kavukcuoglu, arXiv:1602.02660.
[61] W. Kohn, Int. J. Quantum Chem. 56, 229 (1995).
[62] S. A. Kucharski and R. J. Bartlett, J. Chem. Phys. 97, 4282 (1992).
[63] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, G. Louppe, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and É. Duchesnay, J. Mach. Learn. Res. 12, 2825 (2011).