
Accelerating materials property predictions using machine learning

Ghanshyam Pilania1, Chenchen Wang1, Xun Jiang2, Sanguthevar Rajasekaran3 & Ramamurthy Ramprasad1

1Department of Materials Science and Engineering, University of Connecticut, 97 North Eagleville Road, Storrs, Connecticut 06269, 2Department of Statistics, University of Connecticut, 215 Glenbrook Road, Storrs, Connecticut 06269, 3Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road, Storrs, Connecticut 06269.

SUBJECT AREAS: POLYMERS; ELECTRONIC STRUCTURE; COMPUTATIONAL METHODS; ELECTRONIC PROPERTIES AND MATERIALS

Received 25 June 2013; Accepted 9 September 2013; Published 30 September 2013.

Correspondence and requests for materials should be addressed to R.R. (rampi@uconn.edu).

The materials discovery process can be significantly expedited and simplified if we can learn effectively from
available knowledge and data. In the present contribution, we show that efficient and accurate prediction of
a diverse set of properties of material systems is possible by employing machine (or statistical) learning
methods trained on quantum mechanical computations in combination with the notions of chemical
similarity. Using a family of one-dimensional chain systems, we present a general formalism that allows us
to discover decision rules that establish a mapping between easily accessible attributes of a system and its
properties. It is shown that fingerprints based on either chemo-structural (compositional and
configurational information) or the electronic charge density distribution can be used to make ultra-fast, yet
accurate, property predictions. Harnessing such learning paradigms extends recent efforts to systematically
explore and mine vast chemical spaces, and can significantly accelerate the discovery of new
application-specific materials.

Owing to the staggering compositional and configurational degrees of freedom possible in materials, it is
fair to assume that the chemical space of even a restricted subclass of materials (say, involving just two
elements) is far from being exhausted, and an enormous number of new materials with useful properties
are yet to be discovered. Given this formidable chemical landscape, a fundamental bottleneck to an efficient
materials discovery process is the lack of suitable methods to rapidly and accurately predict the properties of a vast
array (within a subclass) of new yet-to-be-synthesized materials. The standard approaches adopted thus far
involve either expensive and lengthy Edisonian synthesis-testing experimental cycles, or laborious and time-
intensive computations, performed in a case-by-case manner. Moreover, neither of these approaches is able to
readily unearth Hume-Rothery-like ‘‘hidden’’ semi-empirical rules that govern materials behavior.
The present contribution, aimed at materials property predictions, falls under a radically different paradigm1,2,
namely, machine (or statistical) learning—a topic central to network theory3, cognitive game theory4,5, pattern
recognition6–8, artificial intelligence9,10, and event forecasting11. We show that such learning methods may be used
to establish a mapping between a suitable representation of a material (i.e., its ‘fingerprint’ or its ‘profile’) and any
or all of its properties using known historic, or intentionally generated, data. The material fingerprint or profile
can be coarse-level chemo-structural descriptors, or something as fundamental as the electronic charge density,
both of which are explored here. Subsequently, once the profile u property mapping has been established, the
properties of a vast number of new materials within the same subclass may then be directly predicted (and
correlations between properties may be unearthed) at negligible computational cost, thereby completely by-
passing the conventional laborious approaches towards material property determination alluded to above. In its
most simplified form, this scheme is inspired by the intuition that (dis)similar materials will have (dis)similar
properties. Needless to say, training of this intuition requires a critical amount of prior diverse information/
results12–16 and robust learning devices12,17–22.
The central problem in learning approaches is to come up with decision rules that will allow us to establish a
mapping between measurable (and easily accessible) attributes of a system and its properties. Quantum
mechanics (here employed within the framework of density functional theory, DFT)23,24, provides us with such
a decision rule that connects the wave function (or charge density) with properties via the Schrödinger (or the
Kohn-Sham) equation. Here, we hope to replace the rather cumbersome rule based on the Schrödinger or Kohn-
Sham equation with a module based on similarity-based machine learning. The essential ingredients of the
proposed scheme are captured schematically in Figure 1.


Figure 1 | The machine (or statistical) learning methodology. First, material motifs within a class are reduced to numerical fingerprint vectors. Next, a
suitable measure of chemical (dis)similarity, or chemical distance, is used within a learning scheme—in this case, kernel ridge regression—to map the
distances to properties.

Results

The ideal testing ground for such a paradigm is a case where a parent material is made to undergo systematic chemical and/or configurational variations, for which controlled initial training and test data can be generated. In the present investigation, we consider infinite polymer chains—quasi 1-d material motifs (Figure 1)—with their building blocks drawn from a pool of the following seven possibilities: CH2, SiF2, SiCl2, GeF2, GeCl2, SnF2, and SnCl2. Setting all the building blocks of a chain to be CH2 leads to polyethylene (PE), a common, inexpensive polymeric insulator. The rationale for introducing the other Group IV halides is to interrogate the beneficial effects (if any) these blocks may have on various properties when introduced in a base polymer such as PE. The properties that we will focus on include: the atomization energy, the formation energy, the lattice constant, the spring constant, the band gap, the electron affinity, and the optical and static components of the dielectric constant. The initial dataset for 175 such material motifs containing 4 building blocks per repeat unit was generated using DFT.

The first step in the mapping process prescribed in the panels of Figure 1 is to reduce each material system under inquiry to a string of numbers—we refer to this string as the fingerprint vector. For the specific case under consideration here, namely, polymeric chains composed of seven possible building blocks, the following coarse-level chemo-structural fingerprint vector was considered first: ⟨f1, …, f6, g1, …, g7, h1, …, h7⟩, where fi, gi and hi are, respectively, the number of building blocks of type i, the number of i–i pairs, and the number of i–i–i triplets, normalized to the total number of units (note that f7 is missing in the above vector as it is not an independent quantity, owing to the relation $f_7 = 1 - \sum_{i=1}^{6} f_i$). One may generalize the above vector to include all possible i–j pairs, i–j–k triplets, i–j–k–l quadruplets, etc., but such extensions were found to be unnecessary, as the chosen 20-component vector was able to satisfactorily codify the information content of the polymeric chains.

Next, a suitable measure of chemical distance is defined to allow for a quantification of the degree of (dis)similarity between any two fingerprint vectors. Consider two systems a and b with fingerprint vectors $\vec{F}_a$ and $\vec{F}_b$. The similarity of the two vectors may be measured in many ways, e.g., using the Euclidean norm of the difference between the two vectors, $|\vec{F}_a - \vec{F}_b|$, or the dot product of the two vectors, $\vec{F}_a \cdot \vec{F}_b$. In the present work, we use the former, which we refer to as $|\vec{F}_{ab}|$ (Figure 1). Clearly, if $|\vec{F}_{ab}| = 0$, materials a and b are equivalent (insofar as we can conclude based on the fingerprint vectors), and their property values Pa and Pb are the same. When $|\vec{F}_{ab}| \neq 0$, materials a and b are not equivalent, and Pa − Pb is not necessarily zero, and depends on $|\vec{F}_{ab}|$. This observation may be formally quantified when we have a prior materials-property dataset, in which case we can determine the parametric dependence of the property values on $|\vec{F}_{ab}|$.

In the present work, we apply the machine learning algorithm referred to as kernel ridge regression (KRR)25,26 to our family of polymeric chains. Technical details on the KRR methodology are provided in the Methods section of the manuscript.
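To make the fingerprint construction concrete, the sketch below (our own illustration in Python/NumPy; the block labels, helper names, and example chains are not taken from the paper) builds the 20-component vector ⟨f1, …, f6, g1, …, g7, h1, …, h7⟩ for a periodic repeat unit and evaluates the chemical distance as the Euclidean norm of the difference between two such vectors.

```python
import numpy as np

BLOCKS = ["CH2", "SiF2", "SiCl2", "GeF2", "GeCl2", "SnF2", "SnCl2"]  # the 7 building units

def fingerprint(chain):
    """Chemo-structural fingerprint <f1..f6, g1..g7, h1..h7> of a periodic chain.

    chain: list of block labels making up one repeat unit; periodic boundary
    conditions are applied when counting pairs and triplets.
    """
    n = len(chain)
    idx = [BLOCKS.index(b) for b in chain]
    f = np.zeros(7)  # fraction of each block type (the last component is dropped below)
    g = np.zeros(7)  # fraction of i-i nearest-neighbour pairs
    h = np.zeros(7)  # fraction of i-i-i triplets
    for j in range(n):
        f[idx[j]] += 1
        if idx[j] == idx[(j + 1) % n]:
            g[idx[j]] += 1
        if idx[j] == idx[(j + 1) % n] == idx[(j + 2) % n]:
            h[idx[j]] += 1
    f, g, h = f / n, g / n, h / n
    # f7 is omitted: it is fixed by f7 = 1 - (f1 + ... + f6)
    return np.concatenate([f[:6], g, h])  # 6 + 7 + 7 = 20 components

def chemical_distance(chain_a, chain_b):
    """Euclidean norm |F_a - F_b| used as the (dis)similarity measure."""
    return np.linalg.norm(fingerprint(chain_a) - fingerprint(chain_b))

# Example: a polyethylene-like repeat unit vs. one with two contiguous SnF2 blocks
print(chemical_distance(["CH2", "CH2", "CH2", "CH2"],
                        ["CH2", "SnF2", "SnF2", "CH2"]))
```

Because the counts are normalized by the number of blocks in the repeat unit, fingerprints of 4-block and 8-block chains live on the same scale, which is presumably what allows a model trained on 4-block systems to be applied to the larger 8-block systems discussed below.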


Figure 2 | Learning performance of chemo-structural fingerprint vectors. Parity plots comparing property values computed using DFT against
predictions made using learning algorithms trained using chemo-structural fingerprint vectors. Pearson’s correlation index is indicated in each of the
panels to quantify the agreement between the two schemes.

As mentioned above, the initial dataset was generated using DFT for systems with repeat units containing 4 distinct building blocks. Of the total 175 such systems, 130 were classified to be in the ‘training’ set (used in the training of the KRR model, Equation (1)), and the remainder in the ‘test’ set. Figure 2 shows the agreement between the predictions of the learning model and the DFT results for the training and the test sets, for each of the 8 properties examined. Furthermore, we considered several chains composed of 8-block repeat units (in addition to the 175 4-block systems), performed DFT computations on these, and compared the DFT predictions of the 8-block systems with those predicted using our learning scheme. As can be seen, the level of agreement between the DFT and the learning schemes is uniformly good for all properties across the 4-block training and test sets, as well as the somewhat out-of-sample 8-block test set (regardless of the variance in the property values). Moreover, properties controlled by the local environment (e.g., the lattice parameter), as well as those controlled by nonlocal global effects (e.g., the electronic part of the dielectric constant), are well-captured. We do note that the agreement is more spectacular for the energies than for the other properties (as the former are most well-converged, and the latter are derived or extrapolated properties; see Methods). Overall, the high-fidelity nature of the learning predictions is particularly impressive, given that these calculations take a minuscule fraction of the time necessitated by a typical DFT computation.

While the favorable agreement between the machine learning and the DFT results for a variety of properties is exciting in and of itself, the real power of this prediction paradigm lies in the possibility of exploring a much larger chemical-configurational space than is practically possible using DFT computations (or laborious experimentation). For instance, merely expanding into a family of 1-d systems with 8-block repeat units leads to 29,365 symmetry unique cases (an extremely small fraction of this class was scrutinized above for validation purposes). Not only can the learning approach make the study of this staggeringly large number of cases possible, it also allows for a search for correlations between properties in a systematic manner. In order to unearth such correlations, we first determined the properties of the 29,365 systems using our machine learning methodology, followed by the estimation of Pearson’s correlation coefficient for each pair of properties. The Pearson correlation coefficient (r) used to quantify a correlation between two given property datasets {Xi} and {Yi} for a class of n material systems is defined as follows:

$$ r = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}} \qquad (1) $$

Here, $\bar{X}$ and $\bar{Y}$ represent the average values of the properties over the respective datasets. Figure 3a shows a matrix of the correlation coefficients, color-coded to allow for immediate identification of pairs of properties that are most correlated.

It can be seen from Figure 3a that the band gap is most strongly correlated with many of the properties. Panels p1–p6 of Figure 3b explicitly show the correlation between the band gap and six of the remaining seven properties. Most notably, the band gap is inversely correlated with the atomization energy (p1), size (p2), electron affinity (p4), and the dielectric constants (p5 and p6), and directly correlated with the spring constant (p3). The relationships captured in panels p1–p3 follow from stability and bond strength arguments. The interesting inverse relationship between the band gap and the electron affinity is a consequence of the uniform shift of the conduction band minimum (due to changes in the band gap) with respect to the vacuum level. The inverse correlation of the band gap with the electronic part of the dielectric constant follows from the quantum mechanical picture of electronic polarization being due to electronic excitations. As no such requirement is expected for the ionic part of the dielectric constant, it is rather surprising that a rough inverse correlation is seen between the total dielectric constant and the band gap, although clear deviations from this inverse behavior can be seen. Finally, we note that the formation energy is uncorrelated with all the other seven properties, including the band gap. This is particularly notable as it is a common tendency to assume that the formation energy (indicative of thermodynamic stability) is inversely correlated with the band gap (indicative of electronic stability).
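The correlation mining step above amounts to evaluating Equation (1) for every pair of property columns across the predicted dataset. A minimal sketch of that step is given below; the property names and the randomly generated placeholder array are illustrative only and are not the published data. NumPy's corrcoef computes exactly the Pearson coefficient of Equation (1).

```python
import numpy as np

# Hypothetical array of ML-predicted properties: one row per system
# (e.g., the 29,365 eight-block chains), one column per property.
properties = ["E_atomization", "E_formation", "lattice_const", "spring_const",
              "band_gap", "electron_affinity", "eps_electronic", "eps_total"]
P = np.random.rand(29365, len(properties))  # placeholder data for illustration

# Pearson correlation matrix, Equation (1), for every pair of columns.
r = np.corrcoef(P, rowvar=False)            # 8 x 8 symmetric matrix

for i, name_i in enumerate(properties):
    for j, name_j in enumerate(properties):
        if j > i:
            print(f"r({name_i}, {name_j}) = {r[i, j]:+.2f}")
```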


Figure 3 | High throughput predictions and correlations from machine learning. (a) The upper triangle presents a schematic of the atomistic
model composed of repeat units with 8 building blocks. Populating each of the 8 blocks with one of the seven units leads to 29,365 systems. The matrix in
the lower triangle depicts the Pearson’s correlation index for each pair of the eight properties of the 8-block systems predicted using machine learning.
(b) Panels p1 to p6 show the correlations between the band gap and six properties. The panel labels are also appropriately indexed in (a). The circle in
panel p6 indicates systems with a simultaneously large dielectric constant and band gap.

Discussion

Correlation diagrams such as the ones in Figure 3b offer a pathway to ‘design’ systems that meet a given set of property requirements. For instance, a search for insulators with high dielectric constant and large band gap would lead to those systems that are at the top part of panel p6 of Figure 3b (corresponding to the ‘deviations’ from the inverse correlation alluded to above, and indicated by a circle in panel p6). These are systems that contain 2 or more contiguous SnF2 units, but with an overall CH2 mole fraction of at least 25%. Such organo-tin systems may be particularly appropriate for applications requiring high-dielectric constant polymers. Furthermore, such diagrams can aid in the extraction of knowledge from data, eventually leading to Hume-Rothery-like semi-empirical rules that dictate materials behavior. For instance, panel p3 reveals a well known correspondence between mechanical strength and chemical stability27, while panels p5 and p6 capture an inverse relationship between the dielectric constant and the band gap, also quite familiar to the semiconductor physics community28.

The entire discussion thus far has focused on fingerprint vectors defined in terms of coarse-level chemo-structural descriptors. This brings up a question as to whether other more fundamental quantities may be used as a fingerprint to profile a material. The first Hohenberg-Kohn theorem of DFT29 proves that the electronic charge density of a system is a universal descriptor containing the sum total of the information about the system, including all its properties. The shape30 and the holographic31 electron density theorems constitute further extensions of the original Hohenberg-Kohn theorem. Inspired by these theorems, we propose that machine learning methods may be used to establish a mapping between the electronic charge density and various properties.

A fundamental issue related to this perspective deals with defining a (dis)similarity criterion that can enable a fair comparison between the charge density of two different systems. Note that any such measure has to be invariant with respect to relative translations and/or rotations of the systems. In the present work, we have employed Fourier coefficients of the 1-d charge density of our systems (averaged along the plane normal to the chain axis). The Fourier coefficients are invariant to translations of the systems along the chain axis, and consideration of the 1-d planar averaged charge density makes the rotational degrees of freedom irrelevant. Figure 4 shows a comparison of the predictions of the learning model based on charge density with the corresponding DFT results. While the agreement between the learning scheme and DFT is not as remarkable as with the chemo-structural fingerprint approach adopted earlier, this can most likely be addressed by the utilization of the actual 3-d charge density. Nevertheless, we believe that the performance of the learning scheme is satisfactory, and heralds the possibility of arriving at a ‘universal’ approach for property predictions solely using the electronic charge density.

A second issue with the charge density based materials profiling relates to determining the charge density in the first place. If indeed a mapping between charge density and the properties can be made for the training set, how do we obtain the charge density of a new system without explicitly performing a DFT computation? We suggest that the ‘atoms in molecules’ concept may be exploited to create a patched-up charge density distribution32. Needless to say, barring some studies in the area of atoms and molecules33, these concepts are in a state of infancy, and there is much room available for both fundamental developments and innovative applications.

To conclude, we have shown that the efficient and accurate prediction of a diverse set of unrelated properties of material systems is possible by combining the notions of chemical (dis)similarity and machine (or statistical) learning methods. Using a family of 1-d chain systems, we have presented a general formalism that allows us to discover decision rules that establish a mapping between easily accessible attributes of a system and its various properties. We have unambiguously shown that simple fingerprint vectors based on either compositional and configurational information, or the electronic charge density distribution, can be used to profile a material and make property predictions at an enormously small cost compared either with quantum mechanical calculations or laborious experimentation. The methodology presented here is of direct relevance in identifying (or screening) undiscovered materials in a targeted class with a desired combination of properties in an efficient manner with high fidelity.


Figure 4 | Learning performance of electron charge density-based fingerprint vectors. Parity plots comparing property values computed using DFT
against predictions made using learning algorithms trained using electron density-based fingerprint vectors. The Fourier coefficients of the planar-
averaged Kohn-Sham charge density are used to construct the fingerprint vector. Pearson’s correlation index is indicated in each of the panels to quantify
the agreement between the two schemes.
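One simple way to realize the charge-density-based fingerprint described above and in Figure 4 is sketched below: the density is averaged over the plane normal to the chain axis, and the magnitudes of the leading Fourier coefficients of the resulting 1-d profile are collected into a vector. The grid shape, the number of coefficients retained, and the use of coefficient magnitudes (which are strictly translation-invariant, since a rigid shift along the chain only changes their phases) are our assumptions for illustration; the paper does not spell out these implementation details.

```python
import numpy as np

def charge_density_fingerprint(rho, n_coeff=16):
    """Fingerprint from the planar-averaged 1-d charge density.

    rho: 3-d array of the charge density on an (nx, ny, nz) real-space grid,
         with z taken along the chain axis.
    n_coeff: number of low-frequency Fourier components to keep (assumed value).
    """
    rho_1d = rho.mean(axis=(0, 1))   # average over the plane normal to the chain axis
    coeffs = np.fft.rfft(rho_1d)     # Fourier transform along the chain axis
    # A rigid translation along the chain only changes the phases of the
    # coefficients, so their magnitudes give a translation-invariant profile.
    return np.abs(coeffs[:n_coeff])

# Example with a synthetic density on a coarse grid (illustrative only)
rho = np.random.rand(24, 24, 96)
print(charge_density_fingerprint(rho))
```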
 
Methods

First principles computations. The quantum mechanical computations were performed using density functional theory (DFT)23,24 as implemented in the Vienna ab initio software package34,35. The generalized gradient approximation (GGA) functional parametrized by Perdew, Burke and Ernzerhof (PBE)36 to treat the electronic exchange-correlation interaction, the projector augmented wave (PAW)37 potentials, and plane-wave basis functions up to a kinetic energy cutoff of 500 eV were employed.

Our 1-d systems were composed of all-trans infinitely long isolated chains containing 4 independent building units in a supercell geometry (with periodic boundary conditions along the axial direction). One CH2 unit was always retained in the backbone (to break the extent of σ-conjugation along the backbone), and the three other units were drawn from a ‘‘pool’’ of seven possibilities: CH2, SiF2, SiCl2, GeF2, GeCl2, SnF2 and SnCl2, in a combinatorial and exhaustive manner. This scheme resulted in 175 symmetry unique systems after accounting for translational periodicity and inversion symmetry. A Monkhorst-Pack k-point mesh of 1 × 1 × k (with kc > 50) was used to produce converged results for a supercell of length c Å along the chain direction (i.e., the z direction). The supercells were relaxed using a conjugate gradient algorithm until the forces on all atoms were <0.02 eV/Å and the stress component along the z direction was <1.0 × 10⁻² GPa. Sufficiently large grids were used to avoid numerical errors in fast Fourier transforms. A small number of cases involving 8 building units were also performed for validation purposes.

The calculated atomization energies and formation energies are referenced to the isolated atoms and homo-polymer chains of the constituents, respectively. While the lattice parameters, spring constants, band gaps and electron affinities of the systems are readily accessible through DFT computations, the calculations of the optical and static components of the dielectric constant require particular care. The dielectric permittivity of the isolated polymer chains placed in a large supercell were first computed within the density functional perturbation theory (DFPT)38,39 formalism, which includes contributions from the polymer as well as from the surrounding vacuum region of the supercell. Next, treating the supercell as a vacuum-polymer composite, effective medium theory40 was used to estimate the dielectric constant of just the polymer chains using methods described recently13,41. Table 1 of the Supporting Information contains the DFT computed atomization energies, formation energies, c lattice parameters, spring constants, electron affinities, bandgaps, and dielectric permittivities for the 175 symmetry unique polymeric systems.

Machine learning details. Within the present similarity-based learning model, a property of a system in the test set is given by a sum of weighted Gaussians over the entire training set, as

$$ P_b = \sum_{a=1}^{N} \alpha_a \exp\!\left(-\frac{1}{2\sigma^2}\big|\vec{F}_{ab}\big|^2\right) \qquad (2) $$

where a runs over the systems in the previously known dataset. The coefficients $\alpha_a$ and the parameter $\sigma$ are obtained by ‘training’ the above form on the systems a in the previously known dataset. The training (or learning) process is built on minimizing the expression $\sum_{a=1}^{N}\left(P^{a}_{\mathrm{Est}} - P^{a}_{\mathrm{DFT}}\right)^2 + \lambda \sum_{a=1}^{N} \alpha_a^2$, with $P^{a}_{\mathrm{Est}}$ being the estimated property value, $P^{a}_{\mathrm{DFT}}$ the DFT value, and $\lambda$ a regularization parameter25,26. The explicit solution to this minimization problem is $\vec{\alpha} = (K + \lambda I)^{-1}\vec{P}_{\mathrm{DFT}}$, where I is the identity matrix, and $K_{ab} = \exp\!\left(-\frac{1}{2\sigma^2}\big|\vec{F}_{ab}\big|^2\right)$ are the kernel matrix elements of all polymers in the training set. The parameters $\lambda$, $\sigma$ and $\alpha_a$ are determined in an inner loop of fivefold cross validation using a logarithmically scaling fine grid.
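A compact sketch of the KRR workflow just described — assembling the Gaussian kernel from fingerprint distances, solving $\vec{\alpha} = (K + \lambda I)^{-1}\vec{P}_{\mathrm{DFT}}$, and predicting new property values via Equation (2) — is given below. The NumPy implementation, variable names, and the simple hold-out selection of $\lambda$ and $\sigma$ are our own illustrative simplifications on placeholder data; the published work selects these hyperparameters by fivefold cross-validation on a logarithmically scaled fine grid.

```python
import numpy as np

def gaussian_kernel(FA, FB, sigma):
    """K_ab = exp(-|F_a - F_b|^2 / (2 sigma^2)) for all pairs of fingerprints."""
    d2 = np.sum((FA[:, None, :] - FB[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def krr_train(F_train, P_train, sigma, lam):
    """Solve alpha = (K + lam * I)^(-1) P_DFT on the training set."""
    K = gaussian_kernel(F_train, F_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(F_train)), P_train)

def krr_predict(F_new, F_train, alpha, sigma):
    """Equation (2): P_b = sum_a alpha_a exp(-|F_ab|^2 / (2 sigma^2))."""
    return gaussian_kernel(F_new, F_train, sigma) @ alpha

# Illustrative use with placeholder fingerprints (130 training / 45 test systems)
rng = np.random.default_rng(0)
F_train, P_train = rng.random((130, 20)), rng.random(130)
F_test = rng.random((45, 20))

# Simple hold-out selection of (sigma, lambda); the paper instead uses fivefold
# cross-validation on a logarithmically scaled fine grid.
F_fit, P_fit, F_val, P_val = F_train[:100], P_train[:100], F_train[100:], P_train[100:]
best = None
for sigma in np.logspace(-2, 2, 9):
    for lam in np.logspace(-8, 0, 9):
        alpha = krr_train(F_fit, P_fit, sigma, lam)
        err = np.mean((krr_predict(F_val, F_fit, alpha, sigma) - P_val) ** 2)
        if best is None or err < best[0]:
            best = (err, sigma, lam)

_, sigma, lam = best
alpha = krr_train(F_train, P_train, sigma, lam)   # refit on the full training set
print("predicted test properties:", krr_predict(F_test, F_train, alpha, sigma)[:5])
```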


1. Poggio, T., Rifkin, R., Mukherjee, S. & Niyogi, P. General conditions for predictivity in learning theory. Nature 428, 419–422 (2004).
2. Tomasi, C. Past performance and future results. Nature 428, 378 (2004).
3. Rehmeyer, J. Influential few predict behavior of the many. Nature News, http://dx.doi.org/10.1038/nature.2013.12447.
4. Holland, J. H. Emergence: from Chaos to order (Cambridge, Perseus, 1998).
5. Jones, N. Quiz-playing computer system could revolutionize research. Nature News, http://dx.doi.org/10.1038/news.2011.95.
6. MacLeod, N., Benfield, M. & Culverhouse, P. Time to automate identification. Nature 467, 154–155 (2010).
7. Crutchfield, J. P. Between order and chaos. Nature Physics 8, 17–24 (2012).
8. Chittka, L. & Dyer, A. Cognition: Your face looks familiar. Nature 481, 154–155 (2012).
9. Abu-Mostafa, Y. S. Machines that Learn from Hints. Sci. Am. 272, 64–69 (1995).
10. Abu-Mostafa, Y. S. Machines that Think for Themselves. Sci. Am. 307, 78–81 (2012).
11. Silver, N. The Signal and the Noise: Why So Many Predictions Fail but Some Don’t (Penguin Press, New York, 2012).
12. Curtarolo, S. et al. The high-throughput highway to computational materials design. Nature Mater. 12, 191–201 (2013).
13. Pilania, G. et al. New group IV chemical motifs for improved dielectric permittivity of polyethylene. J. Chem. Inf. Modeling 53, 879–886 (2013).
14. Levy, O., Hart, G. L. W. & Curtarolo, S. Uncovering compounds by synergy of cluster expansion and high-throughput methods. J. Am. Chem. Soc. 132, 4830–4833 (2010).
15. Jain, A. et al. A high-throughput infrastructure for density functional theory calculations. Comp. Mater. Sci. 50, 2295–2310 (2011).
16. Hart, G. L. W., Blum, V., Walorski, M. J. & Zunger, A. Evolutionary approach for determining first-principles hamiltonians. Nature Mater. 4, 391–394 (2005).
17. Fischer, C. C., Tibbetts, K. J., Morgan, D. & Ceder, G. Predicting crystal structure by merging data mining with quantum mechanics. Nature Mater. 5, 641–646 (2006).
18. Saad, Y. et al. Data mining for materials: Computational experiments with AB compounds. Phys. Rev. B 85, 104104 (2012).
19. Rupp, M., Tkatchenko, A., Muller, K. & von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).
20. Snyder, J. C., Rupp, M., Hansen, K., Müller, K.-R. & Burke, K. Finding Density Functionals with Machine Learning. Phys. Rev. Lett. 108, 253002 (2012).
21. Montavon, G. et al. Machine Learning of Molecular Electronic Properties in Chemical Compound Space. Accepted to New J. Phys.
22. Hautier, G., Fisher, C. C., Jain, A., Mueller, T. & Ceder, G. Finding nature’s missing ternary oxide compounds using machine learning and density functional theory. Chem. Mater. 22, 3762 (2010).
23. Kohn, W. Electronic structure of matter–wave functions and density functionals. Rev. Mod. Phys. 71, 1253 (1999).
24. Martin, R. Electronic Structure: Basic Theory and Practical Methods (Cambridge University Press, New York, 2004).
25. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, New York, 2009).
26. Muller, K.-R., Mika, S., Ratsch, G., Tsuda, K. & Scholkopf, B. An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks 12, 181 (2001).
27. Gilman, J. J. Physical chemistry of intrinsic hardness. Mater. Sci. and Eng. A209, 74–81 (1996).
28. Zhu, H., Tang, C., Fonseca, L. R. C. & Ramprasad, R. Recent progress in ab initio simulations of hafnia-based gate stacks. J. Mater. Sci. 47, 7399–7416 (2012).
29. Hohenberg, P. & Kohn, W. Inhomogeneous electron gas. Phys. Rev. 136, B864–B871 (1964).
30. Geerlings, P., Boon, G., Van Alsenoy, C. & De Proft, F. Density functional theory and quantum similarity. Int. J. Quantum. Chem. 101, 722 (2005).
31. Mezey, P. G. Holographic electron density shape theorem and its role in drug design and toxicological risk assessment. J. Chem. Inf. Comput. Sci. 39, 224 (1999).
32. Bader, R. F. W. Atoms in molecules: a quantum theory (Oxford University Press, Oxford, 1990).
33. Bultinck, P., Girones, X. & Carbo-Dorca, R. Molecular quantum similarity: theory and applications. Reviews in Computational Chemistry, Volume 21 (2005).
34. Kresse, G. & Furthmuller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169–11186 (1996).
35. Kresse, G. & Furthmuller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. J. Comput. Mater. Sci. 6, 15–50 (1996).
36. Perdew, J., Burke, K. & Ernzerhof, M. Generalized Gradient Approximation Made Simple. Phys. Rev. Lett. 77, 3865–3868 (1996).
37. Blöchl, P. Projector augmented-wave method. Phys. Rev. B 50, 17953–17979 (1994).
38. Baroni, S., de Gironcoli, S. & Dal Corso, A. Phonons and related crystal properties from density-functional perturbation theory. Rev. Mod. Phys. 73, 515–562 (2001).
39. Gonze, X. Dynamical matrices, Born effective charges, dielectric permittivity tensors, and interatomic force constants from density-functional perturbation theory. Phys. Rev. B 55, 10355–10368 (1997).
40. Choy, T. C. Effective medium theory: principles and applications (Oxford University Press Inc., Oxford, 1999).
41. Wang, C. C., Pilania, G. & Ramprasad, R. Dielectric properties of carbon-, silicon-, and germanium-based polymers: A first-principles study. Phys. Rev. B 87, 035103 (2013).

Acknowledgements
This paper is based upon work supported by a Multidisciplinary University Research Initiative (MURI) grant from the Office of Naval Research. Computational support was provided by the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation. Discussions with Kenny Lipkowitz, Ganpati Ramanath and Gerbrand Ceder are gratefully acknowledged.

Author contributions
R.R., C.W. and G.P. conceived the statistical learning model, with input from S.R. and X.J. The DFT computations were performed by G.P. The initial implementation of the statistical learning framework was performed by C.W. and extended by G.P. The manuscript was written by G.P., S.R. and R.R.

Additional information
Supplementary information accompanies this paper at http://www.nature.com/scientificreports
Competing financial interests: The authors declare no competing financial interests.
How to cite this article: Pilania, G., Wang, C., Jiang, X., Rajasekaran, S. & Ramprasad, R. Accelerating materials property predictions using machine learning. Sci. Rep. 3, 2810; DOI:10.1038/srep02810 (2013).

This work is licensed under a Creative Commons Attribution 3.0 Unported license. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0
