Materials and Design 193 (2020) 108835
Machine learning reveals the importance of the formation enthalpy and
atom-size difference in forming phases of high entropy alloys
Lei Zhang a,d, Hongmei Chen b, Xiaoma Tao b, Hongguo Cai a,d, Jingneng Liu c, Yifang Ouyang b,⁎,
Qing Peng e, Yong Du f
a School of Mathematics and Information Science, Guangxi College of Education, Nanning 530023, China
b School of Physical Science and Technology, Guangxi University, Nanning 530004, China
c Maritime College, Beibu Gulf University, Qinzhou 535011, China
d Institute for Intelligent Computing and Simulation Research, Guangxi College of Education, Nanning 530023, China
e Physics Department, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia
f State Key Laboratory of Powder Metallurgy, Central South University, Changsha 410083, China
Highlights
• Accurate thermodynamic properties of alloys can be calculated quickly by the extended Miedema theory.
• The feature variables are optimized by Kernel Principal Component Analysis.
• The phases of multi-principal element alloys are distinguished well by a support vector machine model.
Article info
Article history:
Received 1 April 2020
Received in revised form 14 May 2020
Accepted 28 May 2020
Available online 30 May 2020
Keywords:
Multi-principal element alloys
Miedema theory
Machine learning
Feature selection
Alloy phase prediction
Abstract
Despite their outstanding and unique properties, the structure-property relationship of high entropy alloys (HEAs) is not well established. Machine learning (ML) is used here to scrutinize the effect of nine physical quantities on four phases. The nine parameters include the formation enthalpies determined by the extended Miedema theory and the mixing entropy. Although these quantities are highly related to phase formation, common ML methods cannot distinguish the phases accurately. In this paper, feature selection and feature-variable transformation based on Kernel Principal Component Analysis (KPCA) are proposed: the feature variables are optimized, and the phases are distinguished by a support vector machine (SVM) model. The results indicate that the elastic energy and the atom-size difference contribute significantly to the formation of the different phases. The accuracy on the testing set predicted by SVM based on four feature variables and KPCA (4V-KPCA) is 0.9743. The F1-scores predicted by SVM based on 4V-KPCA for the considered alloy phases are 0.9787, 0.9463, 0.9863 and 0.8103, corresponding to solid solution, amorphous, the mixture of solid solution and intermetallic, and intermetallic, respectively. The extended Miedema theory provides accurate thermodynamic properties for the design of HEAs, and ML methods (especially SVM combined with KPCA) are powerful for the prediction of alloy phases.
© 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://
creativecommons.org/licenses/by-nc-nd/4.0/).
⁎ Corresponding author. E-mail address: ouyangyf@gxu.edu.cn (Y. Ouyang).

1. Introduction
The conventional alloys were designed primarily on the basis of one
or two principal constituent elements and a few other alloying elements
https://doi.org/10.1016/j.matdes.2020.108835
to adjust their microstructures and properties. However, the addition of alloying elements may result in the formation of brittle intermetallic compounds and the degradation of the mechanical properties. Discovering the relationships among the alloying elements, their compositions, and the performance of the resulting alloys is a desirable and ongoing effort. Alloys composed of several constituent elements in equal atomic proportions are named multi-principal element alloys (MPEAs). The seminal papers of Yeh et al. [1] and Cantor et al. [2] proposed this new class of materials and reshaped the understanding of alloy design. High-entropy alloys (HEAs), a subset of MPEAs, typically comprise five or more constituent elements (for brevity, the term HEA is used for both HEAs and MPEAs in this paper). They can possess a single phase with a face-centered cubic (FCC), body-centered cubic (BCC), or hexagonal close-packed (HCP) structure [3]. Since then, HEAs have attracted considerable attention and research interest [4–7]. Owing to the high-entropy effect, the lattice distortion effect, and sluggish diffusion among the alloying elements, HEAs exhibit characteristics distinct from conventional alloys in mechanical properties [8–10], high-temperature properties [11–13], corrosion resistance [14–16], and magnetic properties [17–19].
A few theoretical methods [20] have been used to design HEAs, such as the CALPHAD method [21,22], ab initio calculations [23,24], and Monte Carlo simulation [25,26]. These methods have a reliable theoretical basis for the design of HEAs; however, they are limited to simple cases because of their complexity, time-consuming nature, and/or low efficiency. Alternatively, Zhang et al. [27,28] and Senkov et al. [29] proposed empirical methods to predict the formation of phases in HEAs. Poletti et al. [30] proposed electronic parameters for alloys (e.g., electronegativity, valence electron concentration (VEC), itinerant electron concentration) to improve the formulation of HEAs. However, the accuracy in distinguishing the different phases is far from satisfactory. Experimentally, the formation of the different phases depends on the preparation process. The preparation methods [31] for HEAs include the melt-casting process, powder metallurgy, melt spinning, and deposition techniques [32,33]. In alloy preparation, the cost, processing ability, and experimental complexity need to be considered. Despite these difficulties, a considerable amount of meaningful data for HEAs has been obtained by theoretical and/or experimental methods.
Computer simulation technology has been widely applied to the design of complex material systems. The long-term accumulation of high-throughput calculations and experiments provides a meaningful materials database. Data processing methods that collate the existing data and discover complex predictive relationships among multiple variables to evaluate the properties of new materials have become an important new path for materials design. In the past few years, machine learning (ML), one such data processing method, has been used to design new materials and predict various material properties [34–40]. Pilania et al. [41] demonstrated a systematic feature-engineering approach and a robust ML learning framework for accurate predictions of the electronic bandgaps of
double perovskites. Ubaru et al. [42] used ML methods such as sensitivity analysis, least absolute shrinkage and selection operator (LASSO) based methods, and support vector machines to predict the formation enthalpies of binary intermetallic compounds. Choudhury et al. [43] classified HEAs with several ML algorithms, including K-nearest neighbors (KNN), support vector machine (SVM), logistic regression, the naïve Bayes approach, decision trees, and neural networks. Gong et al. [44] classified superheavy elements based on ML and found the relationship between atomic data and the classification of elements. Huang et al. and Islam et al. of Zhuang's group [45,46] predicted the phases of HEAs with ML methods including KNN, SVM, and artificial neural networks (ANN), and concluded that the trained ANN model is the best and thus the most useful in predicting the formation of new HEAs. Zhou et al. [47] compared the sensitivity measures of 13 design parameters based on the results of an ANN model. As noted above, many initial studies on materials design by ML have been made. However, there is ample room for improvement in the construction of data samples for alloy systems, the generalization ability, the learning efficiency, and the accuracy of the models.
In the implementation of ML methods, an important question is how to select relevant and effective features of the alloys. A feature represents a basic attribute of the alloy or of the constituent elements of the alloy system. Such properties include the thermodynamic properties (e.g., enthalpy, entropy), the atomic radius, the VEC [48], the parameter Ω, and the atom-size difference δ [28], etc. In the empirical prediction of alloy phases, the empirical rules are summarized as Ω ≥ 1.1 and δ ≤ 6.6% for the solid solution phase. However, the discrimination of the phases is not good enough, especially for mixtures of different phases. In this work, the thermodynamic properties of HEAs calculated with Miedema theory, together with atomic attributes, were used to establish a dataset for HEAs. We considered nine parameters: the mixing enthalpy of the amorphous phase (HAM), the formation enthalpy of the intermetallic compound phase (HIM), the formation enthalpy of the solid solution phase (HSS), the elastic energy of the alloy (HE), the mixing enthalpy of the liquid phase (HL), the mixing entropy of the alloy (Smix), the weighted melting temperature of the alloy (Tm), the atomic size difference δ, and the parameter Ω. In addition, several ML algorithms were applied to select features, train models, and predict the different phases of HEAs.
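For concreteness, the empirical rule quoted above can be expressed as a small predicate; the function name is our own illustration, while the thresholds are those of Refs. [27,28]:

```python
def empirical_ss_rule(omega: float, delta: float) -> bool:
    """Empirical criterion for solid solution formation:
    Omega >= 1.1 and atom-size difference delta <= 6.6 %
    (delta given in percent)."""
    return omega >= 1.1 and delta <= 6.6

# An alloy with Omega = 1.5 and delta = 4.0 % satisfies the rule:
print(empirical_ss_rule(1.5, 4.0))  # True
```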
2. Methods
2.1. Establishing the dataset
The dataset was first built from Refs. [15,28,29,49–53] and contained 556 entries. After removal of the duplicated data, the new dataset comprises 407 HEAs: 215 solid solutions (SS), 12 intermetallic compounds (IM), 142 mixtures of solid solutions and intermetallic compounds (SS + IM), and 38 amorphous alloys (AM). The nine corresponding properties of each HEA were used as feature variables in ML.
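The deduplication step described above can be sketched with pandas; the column names and numeric values here are hypothetical placeholders, not the actual data of Refs. [15,28,29,49–53]:

```python
import pandas as pd

# Hypothetical rows: alloy name, two of the nine features, and the phase label.
raw = pd.DataFrame(
    [("CoCrFeMnNi", -4.2, 1.3, "SS"),
     ("CoCrFeMnNi", -4.2, 1.3, "SS"),   # duplicated entry from a second source
     ("ZrTiCuNiBe", -28.5, 10.1, "AM")],
    columns=["alloy", "H_AM", "delta", "phase"],
)

# Removing duplicates mirrors the reduction from 556 entries to 407 HEAs.
dataset = raw.drop_duplicates().reset_index(drop=True)
print(len(dataset))  # 2
```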
Ouyang's model [54] based on Miedema theory [55], which predicts the formation enthalpies of multicomponent alloys well [56–59], was used to calculate the thermodynamic properties HAM, HIM, HSS and HL. The formation or mixing enthalpy of a binary alloy is calculated by Miedema theory (Eqs. (1)–(5) of Ref. [55]), where V, φ, and n are the molar volume, the electron chemical potential, and the electron density at the Wigner-Seitz cell boundary, respectively. P, Q, R, α, γ and a are empirical parameters, with Q/P = 9.4; α = 0.73 for a liquid alloy and α = 1 for a solid alloy; and γ = 0 for the random state (i.e., the liquid and solid solution phases), γ = 5 for the amorphous phase, and γ = 8 for the intermetallic phase. All of these parameters are described in Ref. [55].
For a binary alloy, the elastic energy was estimated by Eqs. (6) and (7), where B and G are the bulk modulus and the shear modulus, respectively. On the basis of the properties of the binary alloys, the thermodynamic properties of the HEAs were calculated by the extended geometric model [54].
The features Smix, Tm, δ and Ω are calculated as follows:

Smix = −R Σi xi ln xi (8)

Tm = Σi xi (Tm)i (9)

δ = [Σi xi (1 − ri/r̄)²]^(1/2), with r̄ = Σi xi ri (10)

Ω = Tm Smix / |HSS| (11)

where xi, (Tm)i, and ri refer to the atomic concentration, melting temperature and atomic radius of the ith element, respectively. R in Eq. (8) denotes the gas constant, and HSS is the formation enthalpy of the solid solution phase of the HEA.

2.3.1. Multilayer perceptron
The multilayer perceptron (MLP) [65] is considered a feed-forward neural network because all data flow in only one direction, from the input to the output units. The MLP, viewed as a universal approximator, is very fast and easy to use. A single unit of the MLP is given as

y = φ(Σi wi xi − θ) (13)

where xi, for i = 1, 2, …, n, are the inputs, y is the output, wi is the weight of the ith input, and θ is a threshold. Most often φ(x) = (1 + e^(−x))^(−1).
This neural network consists of multiple processing units. Each unit computes a biased weighted sum of its inputs and passes this activation level through a transfer function to generate its output, and the units are arranged in a layered feed-forward topology. The learning of the MLP is accomplished by adjusting the weights of the connections between neurons. The MLP is a nonlinear classifier and is suitable for handling discrete data.
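As a sketch, the four features Smix, Tm, δ and Ω can be computed with numpy. The helper name and the element data below are illustrative, and we assume the definition Ω = Tm·Smix/|HSS| implied in Section 3.1:

```python
import numpy as np

R_GAS = 8.314  # gas constant, J/(mol K)

def hea_features(x, Tm_i, r_i, H_ss):
    """Mixing entropy, weighted melting temperature, atom-size
    difference delta, and Omega = Tm * Smix / |H_SS| (H_ss in J/mol)."""
    x, Tm_i, r_i = map(np.asarray, (x, Tm_i, r_i))
    S_mix = -R_GAS * np.sum(x * np.log(x))                 # mixing entropy
    T_m = np.sum(x * Tm_i)                                 # weighted melting temperature
    r_bar = np.sum(x * r_i)                                # mean atomic radius
    delta = np.sqrt(np.sum(x * (1.0 - r_i / r_bar) ** 2))  # atom-size difference
    omega = T_m * S_mix / abs(H_ss)
    return S_mix, T_m, delta, omega

# Equiatomic five-component alloy with illustrative element data
# (melting points in K, radii in nm, H_ss in J/mol):
x = [0.2] * 5
S_mix, T_m, delta, omega = hea_features(
    x, [1728, 1811, 1768, 2180, 1519], [0.125, 0.124, 0.125, 0.128, 0.127], -5000.0)
print(round(S_mix, 3))  # 13.381, i.e. R ln 5 for an equiatomic quinary alloy
```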
2.2. Feature selection
Feature selection is often used to reduce the dimensionality of the feature space and to remove noisy and redundant features [60–62]. It aims to select a small subset of the original features that minimizes redundancy and maximizes relevance, and it is superior in terms of readability and interpretability.
The Pearson correlation coefficient (PCC) [63], a statistical index, is expressed as:

rx,y = Σi (xi − x̄)(yi − ȳ) / [Σi (xi − x̄)² Σi (yi − ȳ)²]^(1/2) (12)

where x̄ and ȳ are the mean values of x and y. The coefficient rx,y ranges from −1 to 1 and is invariant under linear transformations of either variable. The Pearson correlation coefficient represents the strength of the linear relationship between two random variables x and y. A positive (negative) sign of the coefficient means that the variables correlate directly (inversely), and rx,y = 0 means they are uncorrelated. The closer |rx,y| is to 1, the stronger the linear relationship, because the coefficient measures the common trend of the paired values of the two variables.
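Eq. (12) is straightforward to implement; a minimal numpy version:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of Eq. (12)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2))

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))      # 1.0: exact linear relation
print(pearson_r([1, 2, 3, 4], [-1, -2, -3, -4]))  # -1.0: inverse relation
```

Note that a linear transformation of either variable (e.g. y → 2y + 3) leaves the coefficient unchanged, as stated above.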
Univariate feature selection [62] determines the strength of the relationship between each feature and the target variable through statistical methods or ML algorithms such as the chi-square test, the F-test, and mutual information. The features are ranked and extracted according to the strength of this relationship, and the screened feature variables are then used for the training, testing and validation of the ML models. Therefore, univariate feature selection is often used as a preprocessing step before applying the estimation model to the dataset. Compared with the Pearson correlation coefficient, univariate feature selection performs better for discrete data.
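A minimal sketch of the F-test score behind univariate selection, with our own implementation of the one-way ANOVA F-statistic and synthetic data standing in for the alloy features:

```python
import numpy as np

def anova_f(feature, labels):
    """One-way ANOVA F-statistic: between-group over within-group variance."""
    groups = [feature[labels == g] for g in np.unique(labels)]
    n, k = len(feature), len(groups)
    grand = feature.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
informative = labels + 0.1 * rng.normal(size=200)  # tracks the class label
noise = rng.normal(size=200)                       # unrelated to the label

# The informative feature scores far higher and would be ranked first:
print(anova_f(informative, labels) > anova_f(noise, labels))  # True
```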
Stability selection [64], which is based on subsampling in combination with selection algorithms (e.g., regression, SVM), is a relatively novel method for feature selection. The high-level idea is to apply a feature selection algorithm to different subsets of the data and different subsets of the features. After repeating the process several times, the selection results are aggregated: strong features will have scores close to 100%, weakly relevant features will have non-zero scores, and irrelevant features will have scores (close to) zero. In this paper, the randomized Lasso algorithm was used to estimate the stability.
2.3. Machine learning algorithms
To predict the phases of HEAs, the key task is to find the relationship between properties of the alloys and the corresponding phases, so that the different alloy phases can be distinguished. This is, in fact, a classification problem. The relationship may be explicit (e.g., a functional expression) or implicit (e.g., a mapping matrix). The aforementioned empirical methods belong to the explicit functions; by contrast, an ML algorithm is able to obtain the implicit mapping. The classification algorithms multilayer perceptron (MLP), SVM and gradient boosting decision tree (GBDT) were used for the predictions.
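A single unit of the MLP described in Section 2.3.1 can be sketched directly from Eq. (13), using the logistic transfer function:

```python
import numpy as np

def mlp_unit(x, w, theta):
    """One MLP unit: logistic transfer of the biased weighted sum,
    y = phi(sum_i w_i x_i - theta) with phi(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) - theta)))

# When the weighted sum equals the threshold, the unit outputs exactly 0.5:
out = mlp_unit(np.array([1.0, 1.0]), np.array([0.5, 0.5]), 1.0)
print(out)  # 0.5
```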
2.3.2. Support vector machine
The support vector machine [66] is a binary classifier based on the maximum-margin strategy; it is a concise and effective classification method grounded in statistical learning theory. The SVM maps input vectors into a high-dimensional feature space to obtain the optimal separating hyperplane. The SVM was originally used for linear classification with a margin and was later extended to nonlinear cases by transforming the nonlinear separation problem into a high-dimensional feature space. The separating hyperplane is determined by the support vectors, so the method is robust to outliers and more suitable than other classification algorithms for dealing with unbalanced classes.
In the linearly separable case, the decision surface of the separating hyperplane can be written [67] as

wTx + b = 0 (14)

where x is the input vector, w is an adjustable weight vector, and b is the bias.
In the nonlinearly separable case, the decision surface of the separating hyperplane can be written [67] as

Σi αi di k(x, xi) = 0 (15)

where αi is a Lagrange multiplier, di is the desired response, and k(x, xi) is the kernel function. Different kernel functions, including the linear kernel (xTx'), the polynomial kernel ((xTx' + 1)^d), the RBF kernel (exp(−γ||x − x'||²)), and the sigmoid kernel (tanh(γxTx' + C)), can be used in SVMs for nonlinear problems.
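As an illustration of the kernel choice (with scikit-learn and synthetic data, not the alloy dataset), a linear kernel fails on a circularly separable problem that the RBF kernel handles easily:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Circular (nonlinear) decision boundary: class 1 outside the unit circle.
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=1.0).fit(X, y)  # k(x, x') = exp(-gamma ||x - x'||^2)

# The RBF kernel separates the circle far better than the linear kernel:
print(linear.score(X, y), rbf.score(X, y))
```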
2.3.3. Gradient boosting decision tree
The gradient boosting decision tree [68,69] is an ensemble of decision trees that are trained in sequence, each learned by fitting the negative gradient of the loss. GBDT is a boosting algorithm originally used for regression tasks, but it can also be used for classification with suitable loss functions. To avoid over-fitting, it is very important to choose the correct number of boosting iterations: setting it too high may result in over-fitting, and setting it too low may result in under-fitting. Over-fitting in GBDT can also be greatly reduced by random subsampling of the training data.
With the training dataset {xi, yi}, the approximation function can be expressed as [69]:

F(x) = Σk γk hk(x) (16)

where hk is the kth decision tree, trained on the residuals (negative gradients) left by the previous model, and γk is its weight, which sets the update rate of GBDT. As trees are added, each new tree corrects the errors left by the previous model, improving the prediction of GBDT.
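A hedged scikit-learn sketch of the ideas above, on synthetic data: `learning_rate` plays the role of the update rate γk, `n_estimators` is the number of boosting iterations, and `subsample` < 1 draws a random fraction of the training data for each tree to curb over-fitting.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)  # a nonlinear target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                  subsample=0.8, random_state=0).fit(X_tr, y_tr)
print(gbdt.score(X_te, y_te))  # held-out accuracy
```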
2.3.4. Kernel principal component analysis
Kernel Principal Component Analysis integrates a kernel function into traditional linear Principal Component Analysis (PCA), which helps to solve nonlinear problems. PCA is a powerful technique for extracting structure from potentially high-dimensional datasets, and KPCA [70,71] computes principal components in a high-dimensional feature space that is nonlinearly related to the input space. By adjusting its parameters, KPCA can achieve either dimensionality reduction or dimensional expansion of the input data. The kernel functions in KPCA are similar to those used in the SVM. In this paper, the polynomial kernel function was used for the classification prediction.
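The dimensional expansion described here can be sketched with scikit-learn's KernelPCA; the 4 → 13 expansion mirrors the 4V-KPCA setting used later in Section 3.2 (the data are random placeholders):

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))  # four feature variables, as in 4V-KPCA

# With a polynomial kernel, KPCA can return more components than input
# dimensions, here expanding 4 features to 13.
kpca = KernelPCA(n_components=13, kernel="poly", degree=3)
X_expanded = kpca.fit_transform(X)
print(X.shape, "->", X_expanded.shape)  # (100, 4) -> (100, 13)
```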
3. Results and discussion
3.1. Feature selection
PCCs have been calculated to analyze the correlation between the properties of the HEAs (i.e., HAM, HIM, HSS, HE, HL, Smix, Tm, δ, Ω) and the different phases, as shown in Fig. 1. In the first column of Fig. 1, the PCCs for HSS and Ω are close to zero, indicating that HSS and Ω are irrelevant to the phases. The PCC for Tm is very small, so it is also irrelevant. The PCCs for HAM, HIM, HL and Smix are around 0.3, which indicates a certain degree of relevance. The absolute values of the PCCs for HE and δ are larger than 0.5, which shows that HE and δ are strongly relevant.
On the other hand, the mutual PCCs for HAM, HIM, HSS and HL are very close to 1, indicating that they are strongly correlated with one another. These four formation enthalpies were calculated on the basis of Miedema theory (Eqs. (1)–(5)) and the extended geometric model. In these equations, the expressions are similar, and only a few parameters (i.e., α and γ) change between phases. The effect of these parameter changes is small, so the expressions are strongly correlated in the mathematical sense. From the above correlation analysis of the PCCs of the nine feature variables, the parameters HSS, Tm and Ω are redundant variables.
However, the PCC can only describe the linear dependency between variables; if the correlation between variables is nonlinear, the PCC performs poorly. Therefore, other methods should be used to further evaluate the correlations of the feature variables.
The correlations between the feature variables and the phases are illustrated in Fig. 2, evaluated by the PCC, univariate feature selection, and stability selection. For univariate feature selection and stability selection, larger values indicate stronger correlation. For univariate feature selection, the values of HSS, Tm and Ω are significantly smaller than the rest, which indicates that these feature variables are only weakly relevant to the phases. For stability selection, the values of HSS, Tm and Ω are almost zero, and the value of HIM is very close to zero. This indicates that the corresponding feature variables are irrelevant to the phases, while the remaining variables are strongly relevant.
Comparing the above three feature selection methods, HE and δ consistently show strong relevance. According to the Hume-Rothery rules [72] for solubility in binary alloy systems, the atomic size and the formation enthalpy affect the formation of the solid solution phase. First, if the atomic-size difference between the constituent elements of an alloy is >15%, a solid solution is very unlikely to form. Second, the more negative the formation enthalpy is compared with that of the solid solution, the more likely the alloy is to form an intermetallic compound. On the one hand, the parameter δ proposed by Zhang et al. [27,28] captures the size effect of the alloy components, with δ ≤ 6.6% for solid solutions; Takeuchi and Inoue [73,74] proposed a similar criterion. A larger difference in atomic size between the components of an alloy can result in a disordered arrangement of atoms and favors amorphous formation. On the other hand, the elastic energy is positive, and the formation enthalpy is the sum of the chemical enthalpy and the elastic energy; thus the elastic energy shifts the formation enthalpy toward positive values. A formation enthalpy of small magnitude favors the formation of a solid solution. In view of this, we attempt to classify the different phases using the two variables HE and δ, as follows.
Fig. 1. Pearson correlation coefficients of the nine thermodynamic variables. The nine parameters examined by the extended Miedema theory are mixing enthalpy of amorphous phase
(HAM), formation enthalpy of intermetallic compound phase (HIM), formation enthalpy of solid solution phase (HSS), elastic energy of alloy (HE), mixing enthalpy of liquid phase (HL),
mixing entropy of alloy (Smix), weighted melting temperature of alloy (Tm), atom-size difference (δ), and parameter Ω.
Fig. 2. The correlations of the feature variables evaluated by the Pearson correlation coefficient, univariate feature selection, and stability selection.

Fig. 3. The relationship between HE and δ for different alloy phases. HE indicates the elastic energy of the HEA, and δ indicates the atom-size difference.

The relationship between HE and δ for the different alloy phases is displayed in Fig. 3. Surprisingly, neither the 3D scatter plot nor the projection of HE and δ is able to classify the phases. The solid solution phase (i.e., HEA) and the mixture of solid solution and intermetallic phases overlap each other, suggesting that the parameters HE and δ alone are not enough to distinguish the different phases. Some other properties also contribute to the formation of the phases, so a number of properties of the HEAs must be used to establish an efficient dataset of feature variables.
Furthermore, the feature variables HAM, HE, HL, Smix and δ are relevant to the phases, so these thermodynamic properties are efficient feature variables; the small variance of δ indicates high consistency. HSS, Tm and Ω are irrelevant to the phases, and these properties are redundant. This result is contrary to Zhang's point of view [27,28], in which the parameter Ω is important for the formation of a solid solution and a larger Ω facilitates its formation. The discrepancy may be attributed to the fact that the parameter Ω is calculated indirectly from Tm, Smix and HSS: Tm and HSS are redundant, and Smix is not strongly relevant to the phases, so the parameter Ω turns out to be irrelevant. The results for HIM differ between the selection methods, so it cannot be eliminated. Therefore, the efficient subset can consist of the feature variables HAM, HIM, HE, HL, Smix and δ.

3.2. Classification by machine learning algorithms
The MLP, SVM and GBDT models were used to predict the phases with the new subset of six feature variables. Learning curves with k-fold cross-validation were used to select the models and evaluate the performance of the fitted models. 30% of the dataset was extracted randomly as the testing set (hereinafter the proportion of the testing set is 30%), and k = 10 (hereinafter k = 10 for cross-validation). The learning curves based on the six feature variables (6V), obtained by adjusting the parameters of the different models, are shown in Fig. 4(a).
From Fig. 4(a), all learning curves converge, and there is slight overfitting in the models. However, the curves are still steep, indicating that the models have a fast learning rate. The evaluated prediction of SVM is better than those of MLP and GBDT: the learning curve of MLP is not stable, and that of GBDT converges slowly. The accuracy of SVM, with its faster convergence rate and higher stability, is about 0.75 on the testing data. In addition, the dataset has only 407 samples, which leads to the instability of the learning curve of MLP, whereas the SVM and GBDT models are not sensitive to the number of samples.
The accuracy of 0.75 is obviously not satisfactory, and improving the predictive accuracy is challenging. So far, reducing the number of feature variables has been used to optimize the data structure, but the resulting accuracy is not sufficient to predict the different phases of HEAs. Conversely, does increasing the number of feature variables improve the predictive accuracy? Zhou et al. [47] used thirteen parameters to build the subset and train the ML model, Tancret et al. [75] trained an ML model based on nine physical parameters, and Zhang et al. [76] used fourteen empirical materials descriptors to train their ML model. The more feature variables there are, the more information can be provided; however, redundant feature variables still must not be added. Kernel Principal Component Analysis (KPCA) can be used to expand the dimensionality and increase the number of feature variables. The KPCA model was optimized by adjusting its parameters, and the six-dimensional feature vector was expanded to 11 dimensions. The preprocessed data were then used to train the prediction models. The learning curves of the prediction models based on six feature variables and KPCA (6V-KPCA) are displayed in Fig. 4(b). Compared with Fig. 4(a), the predictive accuracies for both training and testing increase substantially and converge to higher values; the testing accuracy of the SVM model is close to 0.975. Thus, expanding the dimensionality helps to distinguish the different phases, and the KPCA preprocessing provides more valuable information for the classification in the ML models.
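The 6V-KPCA → SVM workflow described above can be sketched end-to-end with scikit-learn. The dataset here is a synthetic 4-class surrogate (`make_classification`), not the 407-alloy dataset, and the hyperparameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Surrogate data: 407 samples, 6 features, 4 phase-like classes.
X, y = make_classification(n_samples=407, n_features=6, n_informative=4,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Standardize, expand 6 features to 11 dimensions with a polynomial
# kernel (6V-KPCA), then classify with an SVM.
model = make_pipeline(StandardScaler(),
                      KernelPCA(n_components=11, kernel="poly"),
                      SVC(kernel="rbf"))
cv_acc = cross_val_score(model, X_tr, y_tr, cv=10).mean()  # 10-fold CV
test_acc = model.fit(X_tr, y_tr).score(X_te, y_te)         # 30% hold-out
print(round(cv_acc, 3), round(test_acc, 3))
```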
Our results above suggest that expanding the dimensionality can improve the predictive accuracy; from a certain perspective, it is equivalent to increasing the number of variables. Does that mean more variables lead to better predictive results when the dimensionality is expanded? In the following, we attempt to predict the phases using different ML models based on nine feature variables and KPCA (9V-KPCA). The KPCA model was optimized by adjusting its parameters, and the nine-dimensional feature vector was expanded to 20 dimensions. In Fig. 4(c), the learning curves perform well, but the results are still not as good as those in Fig. 4(b), and the predictive accuracy of the SVM model is below 0.975. Therefore, feature selection is necessary: redundant feature variables can interfere with the ML model and degrade the predictive accuracy.
The strongly relevant feature variables have a great influence on the prediction, so we further analyze the above six feature variables in Fig. 2. The variables HE and δ are consistently relevant across the different feature selection models. HL and Smix must be preserved because of their relevance. In contrast, HAM and HIM show a certain relevance in the PCCs but little or even no correlation in the other evaluation metrics. The PCC performs better in the evaluation of linear correlations; for the present nonlinear problem, the evaluations by univariate feature selection and stability selection might have smaller errors. Thus, the efficient subset is further refined to consist of the feature variables HE, HL, Smix and δ.

Fig. 4. The learning curves for different predictive models: (a) the model based on six feature variables (6V); (b) the model based on six feature variables and KPCA (6V-KPCA); (c) the model based on nine feature variables and KPCA (9V-KPCA); (d) the model based on four feature variables and KPCA (4V-KPCA).
The KPCA model was optimized by adjusting its parameters, and the four-dimensional feature vector was expanded to 13 dimensions. The learning curves of the prediction models based on four feature variables and KPCA (4V-KPCA) are illustrated in Fig. 4(d). Compared with Fig. 4(b) and (c), the learning curve of 4V-KPCA performs best in predictive accuracy, convergence, and learning rate. In particular, the predictive accuracy of the SVM model reaches as much as 0.98.
In order to reduce the random error, the learning processes were carried out 30 times and the results were averaged. The predictive accuracies for the different phases with the MLP, SVM and GBDT models are illustrated in Fig. 5. The overall performance is consistent with the methodologies above: the results for SVM and GBDT are better than that for MLP. Among the preprocessing methods, 4V-KPCA, with its optimized feature variables and KPCA, performs best, while the worst predictive results were obtained with the six unprocessed feature variables. The accuracies on the testing set predicted by SVM, GBDT and MLP for 4V-KPCA are 0.9743, 0.9780 and 0.9396, and those on the training set are 0.9952, 1 and 0.994, respectively. The worst fit, that of MLP, may be caused by its instability. The difference between the training and testing accuracies predicted by GBDT based on 4V-KPCA is 0.022, bigger than that for SVM, so the SVM model shows better convergence. From the variance perspective, a smaller variance means a more stable model; the variances predicted by MLP based on 4V-KPCA for both the training set and the testing set are not good enough.

Fig. 5. The predictive accuracies of different preprocessing methods and different models. 6V indicates that the six feature variables are unprocessed; 6V-KPCA indicates that the six feature variables are processed by kernel PCA; 4V-KPCA indicates that the four feature variables are processed by kernel PCA. MLP indicates the multilayer perceptron model; SVM the support vector machine model; and GBDT the gradient boosting decision tree model.
Figs. 4 and 5 show that increasing the number of precise feature variables improves the predictive accuracy, whereas the negative effect of redundant feature variables is magnified by the dimensional expansion of KPCA. The more dimensions KPCA adds, the more complex the relationships between variables become, and a model with too many variables ultimately degrades the results. It is therefore essential to select the most important feature variables and to expand the dimensions appropriately.
The predictive accuracies of SVM and GBDT in Figs. 4(d) and 5 differ only slightly. Accuracy is a convenient and fast measure of predictive performance, but it is too coarse for imbalanced data; the F1-score [77], which combines precision and recall, is a more sensitive measure. The F1-scores of the classification predictions for each phase with the 6V, 6V-KPCA and 4V-KPCA methods are given in Fig. 6.
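The difference between accuracy and F1 on imbalanced data can be seen in a toy example; the class counts below are made up to mimic a rare class and are not the paper's dataset.

```python
from sklearn.metrics import accuracy_score, f1_score

# Four phase classes with one rare class (label 3), mimicking the IM imbalance.
y_true = [0] * 40 + [1] * 30 + [2] * 25 + [3] * 5
# A classifier that never predicts the rare class still looks good on accuracy.
y_pred = [0] * 40 + [1] * 30 + [2] * 25 + [0] * 5

print(accuracy_score(y_true, y_pred))          # 0.95
print(f1_score(y_true, y_pred, average=None))  # per-class F1; the rare class scores 0.0
```

The per-class F1 exposes the complete failure on the minority class that the 0.95 overall accuracy hides.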
There are remarkable differences in the predicted F1-scores between phases and methods. As with the predictive accuracies, the F1-scores for 4V-KPCA are still the best. Analyzing Fig. 6(c) in detail, the F1-score of the IM phase is the lowest, and its variance is larger than those of the other phases. The predicted F1-score for the amorphous phase ranges from 0.8877 to 0.9463, and that for the solid solution phase from 0.9567 to 0.9787. The F1-score for the mixed solid solution and intermetallic phase performs best, ranging from 0.9603 to 0.9863. These results indicate that the amorphous phase, the solid solution phase, and the mixed solid solution and intermetallic phase can be separated by the ML models. The predicted F1-score for the intermetallic compound ranges from 0.3613 to 0.8103, significantly lower than those of the other phases, probably for the following reasons. On the one hand, the number of intermetallic-compound samples is too small (only 12), a situation known in ML as sample imbalance; too few samples lead to underfitting of the model, and the largest variance also reflects this imbalance. On the other hand, Fig. 4(d) shows that the model has low predictive accuracy when the number of samples is small. It is noteworthy that the SVM model still fits better than the other two at very small sample sizes, which is inseparable from the preprocessing of the feature variables by KPCA. Since the intermetallic phase often precipitates in small amounts within the solid solution matrix, the present model does not fit it well; nevertheless, the predictive F1-score of 0.8103 for SVM remains significant for the prediction of the intermetallic phase.
From the thermodynamic point of view, the formation enthalpy plays an important role in phase formation. According to Miedema theory, the enthalpy is calculated differently for different phases, but the enthalpies HAM and HIM are both dominated by the chemical enthalpy (i.e. HL). The chemical enthalpy represents the combined effect of interatomic interactions during alloying for the different atoms and structures in the alloy. Although HAM contains both chemical enthalpy and topological energy, the expression for HIM is very similar to that for HL; the small difference between them indicates that the interatomic interactions in the different phases are similar. The chemical enthalpy thus contributes significantly to phase formation, so it is important that HL is retained in the feature selection. The elastic energy HE and the atomic-size difference δ are both closely related to the atomic radius and play similar roles; HE involves the effect of atomic size as well as interactions between atoms of the same structure. Both HE and δ are important. Smix is a basic thermodynamic property of HEAs (the chemical mixing entropy) and is therefore also retained in the feature selection.

Fig. 6. The predictive F1-scores of different preprocessing methods and different models. (a) F1-scores predicted from the six unprocessed feature variables (6V); (b) F1-scores predicted from the six feature variables processed by Kernel PCA (6V-KPCA); (c) F1-scores predicted from the four feature variables processed by Kernel PCA (4V-KPCA). MLP indicates the multilayer perceptron model; SVM the support vector machine model; GBDT the gradient boosting decision tree model.
Takeuchi and Inoue [73] proposed an empirical rule including the chemical enthalpy ΔHC and the normalized mismatch entropy Sσ/kB. Bhatt et al. [78] and Rao et al. [79] further developed empirical criteria including ΔHC, Sσ/kB, the configurational entropy Sc/R and the product ΔHC(Sσ/kB)(Sc/R). In our previous work [57], ΔHC(Sσ/kB)(Sc/R) was used to predict the amorphous-forming composition ranges of the Al-Fe-Nd-Zr system. Furthermore, the chemical enthalpy is in fact the mixing enthalpy of the liquid phase (HL). The mismatch entropy Sσ can be calculated with the equation proposed by Mansoori [80]; in other words, Sσ is determined by the atomic radii and is strongly correlated with δ, while Sc is in fact Smix. Therefore, the thermodynamic properties of chemical enthalpy ΔHC, mismatch entropy Sσ and configurational entropy Sc can be used to distinguish different phases.
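The configurational (mixing) entropy referred to here has the standard ideal-solution form S_mix = -R Σ c_i ln c_i, which reduces to R ln n for an equiatomic n-component alloy; a minimal sketch:

```python
import math

R = 8.314  # gas constant, J/(mol K)

def mixing_entropy(fractions):
    """Ideal configurational entropy: S_mix = -R * sum(c_i * ln(c_i))."""
    assert abs(sum(fractions) - 1.0) < 1e-9, "mole fractions must sum to 1"
    return -R * sum(c * math.log(c) for c in fractions if c > 0)

# Equiatomic five-component alloy: S_mix = R * ln(5).
s = mixing_entropy([0.2] * 5)
print(round(s, 3))  # 13.381 J/(mol K)
```

The mismatch entropy Sσ, by contrast, requires the hard-sphere expressions of Mansoori [80] and the atomic radii, so it is not reproduced here.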
Beyond distinguishing between crystalline and amorphous states, ΔHC, Sσ and Sc play important roles in phase formation. This is consistent with the previously optimized feature variables HE, HL, Smix and δ. It is worth noting, however, that the empirical criteria from Refs. [73,78,79] could not predict the different phases well: their functional forms are still simple and cannot distinguish more complex phases. The deviations of the results in Refs. [57,79] indicate that the empirical criterion is not robust, and a better criterion should be developed to predict the different phases of HEAs.
CRediT authorship contribution statement
Lei Zhang: Conceptualization, Investigation, Methodology, Data curation, Formal analysis, Writing - original draft. Hongmei Chen: Supervision, Writing - review & editing. Xiaoma Tao: Supervision, Writing - review & editing, Funding acquisition. Hongguo Cai: Methodology, Data curation. Jingneng Liu: Methodology, Data curation. Yifang Ouyang: Supervision, Writing - review & editing, Funding acquisition. Qing Peng: Supervision, Writing - review & editing. Yong Du: Supervision, Writing - review & editing.
Declaration of competing interest
The authors declare no competing financial interest.
Acknowledgements
This work is financially supported by the National Natural Science
Foundation of China (Grant Nos. 51531009, 61962006 and
51961007), the Guangxi Natural Science Foundation (Grant Nos.
2016GXNSFBA380166 and 2018GXNSFAA281291), “BAGUI Scholar”
Program of Guangxi Zhuang Autonomous Region of China (201979)
and the Science Foundation of Guangxi Education Department (Grant No. 2017KY1472).
4. Conclusions
The relationship between the phases and nine thermodynamic properties of HEAs was examined using machine-learning methods together with the extended Miedema theory. The thermodynamic properties of HEAs were calculated with the Miedema theory and a geometric model, and these data were used as feature variables to build the feature dataset for ML. The relative importance of the nine feature properties was evaluated by feature selection with the Pearson correlation coefficient, univariate feature selection and stability selection. The parameters HE and δ are strongly correlated with the phases, but more parameters are needed to distinguish the different phases. After removing irrelevant feature variables, the new subset consists of HAM, HIM, HE, HL, Smix and δ.
The ML models MLP, SVM and GBDT were used to build implicit mappings and classify the dataset. The RBF-kernel SVM, evaluated by learning curves with k-fold cross validation, fits best. The predictive accuracy based on the six feature variables is only 0.7528, which is unsatisfactory. The four optimized feature variables HE, HL, Smix and δ were therefore preprocessed by expanding their dimensions with Kernel PCA. Comparing the different KPCA preprocessing methods and the MLP, SVM and GBDT models, the SVM model with 4V-KPCA is the best overall owing to its higher stability and convergence.
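The KPCA-plus-SVM workflow summarized above might be sketched as a scikit-learn pipeline on synthetic four-class data; the data, kernel settings and fold count are placeholders, not the paper's dataset or tuned hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 4-class data standing in for the (HE, HL, Smix, delta) descriptors.
X, y = make_classification(n_samples=300, n_features=4, n_informative=4,
                           n_redundant=0, n_classes=4, random_state=0)

# Scale, expand 4 -> 13 dimensions with RBF-kernel PCA, then classify with RBF SVM.
pipe = make_pipeline(StandardScaler(),
                     KernelPCA(n_components=13, kernel="rbf"),
                     SVC(kernel="rbf"))
scores = cross_val_score(pipe, X, y, cv=5)  # 5-fold cross validation
print(scores.shape)  # (5,)
```

Wrapping the steps in a pipeline keeps the KPCA transform fitted only on each training fold, avoiding leakage into the validation folds.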
Sample imbalance leads to the worst fitting of the intermetallic phase for all models; the predictive accuracy and F1-score could be improved by increasing the number of samples and of effective, relevant feature variables. Expanding the dimensions with Kernel PCA improves the predictive results: the predictive accuracies of SVM and GBDT for 4V-KPCA exceed 0.97. The F1-score of the HEA (i.e. SS) phase for 4V-KPCA is 0.9787, and those for the amorphous phase, the mixture of solid solution and intermetallic phase, and the intermetallic phase are 0.9463, 0.9863 and 0.8103, respectively, all higher than those obtained with MLP and GBDT. Therefore, SVM combined with KPCA is the best ML model for phase selection of HEAs in the present dataset, with HE, HL, Smix and δ as the effective and relevant variables. The present ML model helps to distinguish the different phases of HEAs and is beneficial for the discovery of new HEAs.
References
[1] J.W. Yeh, S.K. Chen, S.J. Lin, J.Y. Gan, T.S. Chin, T.T. Shun, C.H. Tsau, S.Y. Chang, Nanostructured high-entropy alloys with multiple principal elements: novel alloy design
concepts and outcomes, Adv. Eng. Mater. 6 (2004) 299–303, https://doi.org/10.
1002/adem.200300567.
[2] B. Cantor, I.T.H. Chang, P. Knight, A.J.B. Vincent, Microstructural development in
equiatomic multicomponent alloys, Mater. Sci. Eng. A 375-377 (2004) 213–218,
https://doi.org/10.1016/j.msea.2003.10.257.
[3] S.H. Joo, H. Kato, M.J. Jang, J. Moon, E.B. Kim, S.J. Hong, H.S. Kim, Structure and properties of ultrafine-grained CoCrFeMnNi high-entropy alloys produced by mechanical
alloying and spark plasma sintering, J. Alloy. Compd. 698 (2017) 591–604, https://
doi.org/10.1016/j.jallcom.2016.12.010.
[4] W. Zhang, P.K. Liaw, Y. Zhang, Science and technology in high-entropy alloys, Sci.
China Mater. 61 (2018) 2–22, https://doi.org/10.1007/s40843-017-9195-8.
[5] Z. Lei, X. Liu, H. Wang, Y. Wu, S. Jiang, Z. Lu, Development of advanced materials via
entropy engineering, Scripta Mater 165 (2019) 164–169, https://doi.org/10.1016/j.
scriptamat.2019.02.015.
[6] M. Vaidya, G.M. Muralikrishna, B.S. Murty, High-entropy alloys by mechanical
alloying: a review, J. Mater. Res. 34 (2019) 664–686, https://doi.org/10.1557/jmr.
2019.37.
[7] E.P. George, D. Raabe, R.O. Ritchie, High-entropy alloys, Nat. Rev. Mater. 4 (2019)
515–534, https://doi.org/10.1038/s41578-019-0121-4.
[8] Y. Zhang, Z.P. Lu, S.G. Ma, P.K. Liaw, Z. Tang, Y.Q. Cheng, M.C. Gao, Guidelines in
predicting phase formation of high-entropy alloys, MRS Commun 4 (2014) 57–62,
https://doi.org/10.1557/mrc.2014.11.
[9] T. Yang, Y.L. Zhao, Y. Tong, Z.B. Jiao, J. Wei, J.X. Cai, X.D. Han, D. Chen, A. Hu, J.J. Kai, K.
Lu, Y. Liu, C.T. Liu, Multicomponent intermetallic nanoparticles and superb mechanical behaviors of complex alloys, Science 362 (2018) 933–937, https://doi.org/10.
1126/science.aas8815.
[10] L. Lilensten, J.P. Couzinie, L. Perriere, A. Hocini, C. Keller, G. Dirras, I. Guillot, Study of
a bcc multi-principal element alloy: tensile and simple shear properties and underlying deformation mechanisms, Acta Mater. 142 (2018) 131–141, https://doi.org/
10.1016/j.actamat.2017.09.062.
[11] O.N. Senkov, G.B. Wilks, J.M. Scott, D.B. Miracle, Mechanical properties of
Nb25Mo25Ta25W25 and V20Nb20Mo20Ta20W20 refractory high entropy alloys, Intermetallics 19 (2011) 698–706, https://doi.org/10.1016/j.intermet.2011.01.004.
[12] B. Gludovatz, A. Hohenwarter, D. Catoor, E.H. Chang, E.P. George, R.O. Ritchie, A
fracture-resistant high-entropy alloy for cryogenic applications, Science 345
(2014) 1153–1158, https://doi.org/10.1126/science.1254581.
[13] V. Shivam, Y. Shadangi, J. Basu, N.K. Mukhopadhyay, Alloying behavior and thermal
stability of mechanically alloyed nano AlCoCrFeNiTi high-entropy alloy, J. Mater.
Res. 34 (2019) 787–795, https://doi.org/10.1557/jmr.2019.5.
[14] Y.L. Chou, J.W. Yeh, H.C. Shih, The effect of molybdenum on the corrosion behaviour
of the high-entropy alloys Co1.5CrFeNi1.5Ti0.5Mox in aqueous environments, Corros.
Sci. 52 (2010) 2571–2581, https://doi.org/10.1016/j.corsci.2010.04.004.
[15] Y. Shi, B. Yang, P.K. Liaw, Corrosion-resistant high-entropy alloys: a review, Metals 7
(2017) 43, https://doi.org/10.3390/met7020043.
[16] R.K. Mishra, P.P. Sahay, R.R. Shahi, Alloying, magnetic and corrosion behavior of
AlCrFeMnNiTi high entropy alloy, J. Mater. Sci. 54 (2019) 4433–4443, https://doi.
org/10.1007/s10853-018-3153-z.
[17] Y. Zhang, T.T. Zuo, Y.Q. Cheng, P.K. Liaw, High-entropy alloys with high saturation
magnetization, electrical resistivity, and malleability, Sci. Rep. 3 (2013) 1455,
https://doi.org/10.1038/srep01455.
[18] U. Roy, H. Roy, H. Daoud, U. Glatzel, K.K. Ray, Fracture toughness and fracture
micromechanism in a cast AlCoCrCuFeNi high entropy alloy system, Mater. Lett.
132 (2014) 186–189, https://doi.org/10.1016/j.matlet.2014.06.067.
[19] O. Schneeweiss, M. Friák, M. Dudová, D. Holec, M. Šob, D. Kriegner, V. Holý, P. Beran,
E.P. George, J. Neugebauer, A. Dlouhý, Magnetic properties of the CrMnFeCoNi highentropy alloy, Phys. Rev. B 96 (2017) 014437, https://doi.org/10.1103/PhysRevB.96.
014437.
[20] M.C. Gao, J.W. Yeh, P.K. Liaw, Y. Zhang, High-Entropy Alloys: Fundamentals and Applications, Springer Press, Cham, 2016https://doi.org/10.1007/978-3-319-27013-5.
[21] W.M. Choi, S. Jung, Y.H. Jo, S. Lee, B.J. Lee, Design of new face-centered cubic high entropy alloys by thermodynamic calculation, Met. Mater. Int. 23 (2017) 839–847,
https://doi.org/10.1007/s12540-017-6701-1.
[22] J.E. Saal, I.S. Berglund, J.T. Sebastian, P.K. Liaw, Equilibrium high entropy alloy phase
stability from experiments and thermodynamic modeling, Scripta Mater 146 (2018)
5–8, https://doi.org/10.1016/j.scriptamat.2017.10.027.
[23] C. Jiang, B.P. Uberuaga, Efficient ab initio modeling of random multicomponent alloys, Phys. Rev. Lett. 116 (2016), 105501. https://doi.org/10.1103/PhysRevLett.
116.105501.
[24] Y. Lederer, C. Toher, K.S. Vecchio, S. Curtarolo, The search for high entropy alloys: a
high-throughput ab-initio approach, Acta Mater. 159 (2018) 364–383, https://doi.
org/10.1016/j.actamat.2018.07.042.
[25] Z. Liu, Y. Lei, C. Gray, G. Wang, Examination of solid-solution phase formation rules
for high entropy alloys from atomistic Monte Carlo simulations, JOM 67 (2015)
2364–2374, https://doi.org/10.1007/s11837-015-1508-3.
[26] C. Niu, W. Windl, M. Ghazisaeidi, Multi-cell Monte Carlo relaxation method for
predicting phase stability of alloys, Scripta Mater 132 (2017) 9–12, https://doi.
org/10.1016/j.scriptamat.2017.01.001.
[27] Y. Zhang, Y.J. Zhou, J.P. Lin, G.L. Chen, P.K. Liaw, Solid-solution phase formation rules
for multi-component alloys, Adv. Eng. Mater. 10 (2008) 534–538, https://doi.org/10.
1002/adem.200700240.
[28] X. Yang, Y. Zhang, Prediction of high-entropy stabilized solid-solution in multicomponent alloys, Mater. Chem. Phys. 132 (2012) 233–238, https://doi.org/10.
1016/j.matchemphys.2011.11.021.
[29] O.N. Senkov, D.B. Miracle, A new thermodynamic parameter to predict formation of
solid solution or intermetallic phases in high entropy alloys, J. Alloy. Compd. 658
(2016) 603–607, https://doi.org/10.1016/j.jallcom.2015.10.279.
[30] M.G. Poletti, L. Battezzati, Electronic and thermodynamic criteria for the occurrence
of high entropy alloys in metallic systems, Acta Mater. 75 (2014) 297–306, https://
doi.org/10.1016/j.actamat.2014.04.033.
[31] Y. Zhang, High-Entropy Materials: A Brief Introduction, Springer Press, 2019https://
doi.org/10.1007/978-981-13-8526-1.
[32] F. Zhang, H. Lou, B. Cheng, Z. Zeng, Q. Zeng, High-pressure induced phase transitions
in high-entropy alloys: a review, Entropy 21 (2019) 239, https://doi.org/10.3390/
e21030239.
[33] Y.J. An, L. Zhu, S.H. Jin, J.J. Lu, X.Y. Liu, Laser-ignited self-propagating sintering of
AlCrFeNiSi high-entropy alloys: an improved technique for preparing highentropy alloys, Metals 9 (2019) 438, https://doi.org/10.3390/met9040438.
[34] P. Raccuglia, K.C. Elbert, P.D.F. Adler, C. Falk, M.B. Wenny, A. Mollo, M. Zeller, S.A.
Friedler, J. Schrier, A.J. Norquist, Machine-learning-assisted materials discovery
using failed experiments, Nature 533 (2016) 73, https://doi.org/10.1038/
nature17439.
[35] Z.K. Liu, Ocean of data: integrating first-principles calculations and CALPHAD modeling with machine learning, J. Phase Equilib. Diff. 39 (2018) 635–649, https://doi.org/
10.1007/s11669-018-0654-z.
[36] K.T. Butler, D.W. Davies, H. Cartwright, O. Isayev, A. Walsh, Machine learning for molecular and materials science, Nature 559 (2018) 547, https://doi.org/10.1038/
s41586-018-0337-2.
[37] J.E. Gubernatis, T. Lookman, Machine learning in materials design and discovery: examples from the present and suggestions for the future, Phys. Rev. Mater. 2 (2018)
120301, https://doi.org/10.1103/PhysRevMaterials.2.120301.
[38] C. Wen, Y. Zhang, C. Wang, D. Xue, Y. Bai, S. Antonov, L. Dai, T. Lookman, Y. Su, Machine learning assisted design of high entropy alloys with desired property, Acta
Mater. 170 (2019) 109–117, https://doi.org/10.1016/j.actamat.2019.03.010.
[39] S.P. Ong, Accelerating materials science with high-throughput computations and
machine learning, Comput. Mater. Sci. 161 (2019) 143–150, https://doi.org/10.
1016/j.commatsci.2019.01.013.
[40] L. Himanen, A. Geurts, A.S. Foster, P. Rinke, Data-driven materials science: status,
challenges, and perspectives, Adv. Sci. 6 (2019) 1900808, https://doi.org/10.1002/
advs.201900808.
[41] G. Pilania, A. Mannodi-Kanakkithodi, B.P. Uberuaga, R. Ramprasad, J.E. Gubernatis, T.
Lookman, Machine learning bandgaps of double perovskites, Sci. Rep. 6 (2016),
19375. https://doi.org/10.1038/srep19375.
[42] S. Ubaru, A. Międlar, Y. Saad, J.R. Chelikowsky, Formation enthalpies for transition
metal alloys using machine learning, Phys. Rev. B 95 (2017), 214102. https://doi.
org/10.1103/PhysRevB.95.214102.
[43] A. Choudhury, T. Konnur, P.P. Chattopadhyay, S. Pal, Structure prediction of multiprincipal element alloys using ensemble learning, Eng. Comput. 37 (2019)
1003–1022, https://doi.org/10.1108/EC-04-2019-0151.
[44] S. Gong, W. Wu, F.Q. Wang, J. Liu, Y. Zhao, Y. Shen, S. Wang, Q. Sun, Q. Wang, Classifying superheavy elements by machine learning, Phys. Rev. A 99 (2019),
022110. https://doi.org/10.1103/PhysRevA.99.022110.
[45] N. Islam, W. Huang, H.L. Zhuang, Machine learning for phase selection in multiprincipal element alloys, Comput. Mater. Sci. 150 (2018) 230–235, https://doi.org/
10.1016/j.commatsci.2018.04.003.
[46] W. Huang, P. Martin, H.L. Zhuang, Machine-learning phase prediction of highentropy alloys, Acta Mater. 169 (2019) 225–236, https://doi.org/10.1016/j.
actamat.2019.03.012.
[47] Z.Q. Zhou, Y. Zhou, Q. He, Z. Ding, F. Li, Y. Yang, Machine learning guided appraisal and exploration of phase design for high entropy alloys, npj Comput. Mater. 5 (2019) 1–9, https://doi.org/10.1038/s41524-019-0265-1.
[48] S. Guo, C. Ng, J. Lu, C.T. Liu, Effect of valence electron concentration on stability of fcc
or bcc phase in high entropy alloys, J. Appl. Phys. 109 (2011), 103505. https://doi.
org/10.1063/1.3587228.
[49] Y.F. Ye, Q. Wang, J. Lu, C.T. Liu, Y. Yang, High-entropy alloy: challenges and prospects, Mater. Today 19 (2016) 349–362, https://doi.org/10.1016/j.mattod.2015.11.
026.
[50] D.B. Miracle, O.N. Senkov, A critical review of high entropy alloys and related concepts, Acta Mater. 122 (2017) 448–511, https://doi.org/10.1016/j.actamat.2016.08.
081.
[51] O.N. Senkov, D.B. Miracle, K.J. Chaput, J.P. Couzinie, Development and exploration of
refractory high entropy alloys-a review, J. Mater. Res. 33 (2018) 3092–3128, https://
doi.org/10.1557/jmr.2018.153.
[52] J. Chen, X. Zhou, W. Wang, B. Liu, Y. Lv, W. Yang, D. Xu, Y. Liu, A review on fundamental of high entropy alloys with promising high-temperature properties, J.
Alloy. Compd. 760 (2018) 15–30, https://doi.org/10.1016/j.jallcom.2018.05.067.
[53] F. He, Z. Wang, C. Ai, J. Li, J. Wang, J.J. Kai, Grouping strategy in eutectic multiprincipal-component alloys, Mater. Chem. Phys. 221 (2019) 138–143, https://doi.
org/10.1016/j.matchemphys.2018.09.044.
[54] Y.F. Ouyang, X.P. Zhong, Y. Du, Y.P. Feng, Y.H. He, Enthalpies of formation for the AlCu-Ni-Zr quaternary alloys calculated via a combined approach of geometric model
and Miedema theory, J. Alloy. Compd. 420 (2016) 175–181, https://doi.org/10.1016/
j.jallcom.2005.10.047.
[55] F.R. De Boer, W.C.M. Mattens, R. Boom, A.R. Miedema, A.K. Niessen, Cohesion in
Metals, North-Holland, Amsterdam, 1988.
[56] Z. Śniadecki, J.W. Narojczyk, B. Idzikowski, Calculation of glass forming ranges in the
ternary Y-Cu-Al system and its sub-binaries based on geometric and Miedema's
models, Intermetallics 26 (2012) 72–77, https://doi.org/10.1016/j.intermet.2012.
03.003.
[57] L. Zhang, H.M. Chen, Y.F. Ouyang, Y. Du, Amorphous forming ranges of Al-Fe-Nd-Zr
system predicted by Miedema and geometrical models, J. Rare Earth. 32 (2014)
343–351, https://doi.org/10.1016/S1002-0721(14)60077-6.
[58] L. Zhang, R.C. Wang, X.M. Tao, H. Guo, H.M. Chen, Y.F. Ouyang, Formation enthalpies
of Al-Fe-Zr-Nd system calculated by using geometric and Miedema's models,
Physica B 463 (2015) 82–87, https://doi.org/10.1016/j.physb.2015.01.023.
[59] L. Zhang, H.M. Chen, X.M. Tao, Y.F. Ouyang, Thermodynamics study of Al-based high
entropy quinary alloys, Chin. J. Nonferr. Met. 29 (2019) 2601–2608, https://doi.org/
10.19476/j.ysxb.1004.0609.2019.11.17.
[60] A.L. Blum, P. Langley, Selection of relevant features and examples in machine learning, Artif. Intell. 97 (1997) 245–271, https://doi.org/10.1016/S0004-3702(97)
00063-5.
[61] J. Reunanen, Overfitting in making comparisons between variable selection
methods, J. Mach. Learn. Res. 3 (2003) 1371–1382.
[62] A. Jović, K. Brkić, N. Bogunović, A review of feature selection methods with applications, 2015 38th International Convention on Information and Communication
Technology, Electronics and Microelectronics (MIPRO), IEEE 2015, pp. 1200–1205,
https://doi.org/10.1109/MIPRO.2015.7160458.
[63] J. Lee Rodgers, W.A. Nicewander, Thirteen ways to look at the correlation coefficient,
Am. Stat. 42 (1988) 59–66, https://doi.org/10.2307/2685263.
[64] N. Meinshausen, P. Bühlmann, Stability selection, J. R. Stat. Soc. B 72 (2010)
417–473, https://doi.org/10.1111/j.1467-9868.2010.00740.x.
[65] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning internal representations by error
propagation, Readings in Cognitive Science, A Perspective from Psychology and Artificial Intelligence 1988, pp. 399–421, https://doi.org/10.1016/B978-1-4832-14467.50035-2.
[66] V. Vapnik, Estimation of Dependences Based on Empirical Data, Springer, New York,
2006https://doi.org/10.1007/0-387-34239-7.
[67] S.S. Haykin, Neural Networks and Learning Machines, Prentice Hall, New Jersey,
2009.
[68] J.H. Friedman, Stochastic gradient boosting, Comp. Stat. Data Anal. 38 (2002)
367–378, https://doi.org/10.1016/S0167-9473(01)00065-2.
[69] J.H. Friedman, Greedy function approximation: a gradient boosting machine, Ann.
Stat. 29 (2001) 1189–1232, https://doi.org/10.1214/aos/1013203451.
[70] B. Schölkopf, A. Smola, K.R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. 10 (1998) 1299–1319, https://doi.org/10.1162/
089976698300017467.
[71] K.R. Muller, S. Mika, G. Ratsch, K. Tsuda, B. Scholkopf, An introduction to kernelbased learning algorithms, IEEE T. Neural Networ. 12 (2001) 181–201, https://doi.
org/10.1109/72.914517.
[72] R. Abbaschian, R.E. Reed-Hill, Physical Metallurgy Principles, PWS Publishing Company, Boston, 1994.
[73] A. Takeuchi, A. Inoue, Calculations of mixing enthalpy and mismatch entropy for ternary amorphous alloys, Mater. Trans. JIM 41 (2000) 1372–1378, https://doi.org/10.
2320/matertrans1989.41.1372.
[74] A. Takeuchi, A. Inoue, Classification of bulk metallic glasses by atomic size difference,
heat of mixing and period of constituent elements and its application to characterization of the main alloying element, Mater. Trans. JIM 46 (2005) 2817–2829,
https://doi.org/10.2320/matertrans.46.2817.
[75] F. Tancret, I. Toda-Caraballo, E. Menou, P.E.J.R. Díaz-Del, Designing high entropy alloys employing thermodynamics and Gaussian process statistical analysis, Mater.
Design 115 (2017) 486–497, https://doi.org/10.1016/j.matdes.2016.11.049.
[76] Y. Zhang, C. Wen, C. Wang, S. Antonov, D. Xue, Y. Bai, Y. Su, Phase prediction in high
entropy alloys with a rational selection of materials descriptors and machine learning models, Acta Mater. 185 (2020) 528–539, https://doi.org/10.1016/j.actamat.
2019.11.067.
[77] C.J. Van Rijsbergen, Information Retrieval, Butterworth-Heinemann, London, 1979.
[78] J. Bhatt, W. Jiang, X. Junhai, W. Qing, C. Dong, B.S. Murty, Optimization of bulk metallic glass forming compositions in Zr-Cu-Al system by thermodynamic modeling,
Intermetallics 15 (2007) 716–721, https://doi.org/10.1016/j.intermet.2006.10.018.
[79] B.R. Rao, M. Srinivas, A.K. Shah, A.S. Gandhi, B.S. Murty, A new thermodynamic parameter to predict glass forming ability in iron based multi-component systems
containing zirconium, Intermetallics 35 (2013) 73–81, https://doi.org/10.1016/j.
intermet.2012.11.020.
[80] G.A. Mansoori, N.F. Carnahan, K.E. Starling, T.W. Leland Jr, Equilibrium thermodynamic properties of the mixture of hard spheres, J. Chem. Phys. 54 (1971)
1523–1525, https://doi.org/10.1063/1.1675048.