
Neural Network Kinetics: Diffusion Multiplicity and B2 Ordering in Compositionally Complex Alloys



Bin Xing1,2, Timothy J. Rupert1,2,3, Xiaoqing Pan1,2, Penghui Cao1,3,2*

1Center for Complex and Active Materials, University of California, Irvine, Irvine, California
92697, United States
2Department of Materials Science and Engineering, University of California, Irvine, Irvine,
California 92697, United States
3Department of Mechanical and Aerospace Engineering, University of California, Irvine, Irvine,
California 92697, United States
* Email: caoph@uci.edu

Diffusion, the transport of atoms from one location to another, governs many important
processes and behaviors, such as precipitation and phase nucleation. Local chemical
complexity in compositionally complex alloys poses challenges for modeling atomic diffusion
and the resulting formation of chemically ordered structures. Here, we introduce a neural
network kinetics (NNK) scheme that predicts and simulates diffusion-induced chemical and
structural evolution in complex concentrated chemical environments. The framework is
grounded in an efficient on-lattice structure and chemistry representation combined with
neural networks, enabling precise prediction of all path-dependent migration barriers and
individual atom jumps. Using this method, we study the temperature-dependent local
chemical ordering in a refractory Nb-Mo-Ta alloy and reveal a critical temperature at which
the B2 order reaches a maximum. Our atomic jump randomness map exhibits the highest
diffusion heterogeneity (multiplicity) in the vicinity of this characteristic temperature, which
is closely related to chemical ordering and B2 structure formation. The scalable NNK
framework provides a promising new avenue for exploring diffusion-related properties in the
vast compositional space within which extraordinary properties are hidden.

Introduction
Diffusion in materials dictates the kinetics of precipitation1, new phase formation2, and
microstructure evolution3, and strongly influences mechanical and physical properties4. For
example, altering nanoprecipitate size and dispersion by thermal processing enables substantial
increases in strength and good ductility in multicomponent alloys5,6. Essentially rooted in diffusion
kinetics, predicting how fast local composition and microstructure evolve is a fundamental goal of
material science. In metals and alloys, diffusion processes are connected with vacancies, point
defects that mediate atom jumps in the crystal lattice. Molecular dynamics (MD)7 modeling based
on force fields or density functional theory, which probes the atomic mechanisms of diffusion on
nanosecond timescales, is often unable to access microstructure changes induced by slow diffusion
kinetics. To circumvent this time limitation inherent in MD, the kinetic Monte Carlo (kMC) method
is commonly adopted to model diffusion-mediated structure evolution, for instance, the early stage
of precipitation in dilute alloys8,9. In kMC simulations, the crucial parameter (the vacancy
migration energy) is generally parameterized from simplified lattice models such as the cluster
expansion10 and the Ising model11, owing to the high computational cost of transition state
searches. The rise of compositionally complex alloys (CCAs), commonly known as high-entropy
alloys, brings many intriguing kinetic behaviors, including chemical short-range ordering12,
precipitation6, segregation13, and radiation defect annihilation14, which have yet to be
fundamentally understood and ultimately predicted. The chemical complexity of such events,
however, poses a new challenge for modeling diffusion-mediated processes, because local chemical
fluctuations lead to diverse activation barriers (i.e., a wide barrier spectrum)15.

The emergence of machine learning methods has demonstrated the potential for addressing
computationally complex problems in materials science that involve nonlinear interactions and
vast combinatorial spaces16. One of the most promising examples is machine-learned
interatomic potentials that map a three-dimensional (3D) atomic configuration to its
conformational energy with high accuracy at a substantially reduced computational cost17. The
key step in applying machine learning to molecular science is converting atomistic structure into
numerical values (descriptors18) that represent the individual local chemical and structural
environments. Two successful atomic environment descriptors are atom-centered symmetry
functions19 and the smooth overlap of atomic positions20. Because they are invariant to rotation,
they are ideal for predicting scalar site properties, such as atomic energy and segregation
energy21, but they cannot directly predict vector quantities, for instance, the path-dependent
activation energy of diffusion. More notably, the dimension of these local structure descriptors
(which account for all neighboring atoms within a cutoff distance) increases quadratically with
the number of constituent elements22, which drastically escalates the number of parameters and
the training time when applying machine learning to chemically complex CCAs.

In this study, we introduce a neural network kinetics (NNK) scheme for predicting atomic diffusion
and its resulting microstructure evolution in compositionally complex materials. Grounded in an
efficient on-lattice atomic representation that converts individual atoms to neurons (while
maintaining the atomic structure), the NNK precisely describes atom (interneuron) interactions
through a neural network model and predicts neuron kinetics evolution, embodying physical atom
diffusion and microstructure evolution. Using refractory NbMoTa as a model system, we explore
chemical ordering and B2 phase formation mediated by diffusion kinetics and reveal the
anomalous diffusion (diffusion multiplicity) that is inherent in CCAs.

Figure 1. Schematic illustration of neural network kinetics (NNK). a, The on-lattice structure
and chemistry representation. A vacancy and its local atomic environment are encoded into a
digital matrix (neuron map). b, The NNK framework consists of a neural network that outputs vacancy
migration barriers, and a neuron kinetics module implementing neuron jumps (diffusion jumps).
Vacancy diffusion and resultant chemical evolution are fully mirrored by efficient neuron map
evolution.

Results
Neural network kinetics scheme. Figure 1a shows the on-lattice structure and chemistry
representation, where the initial atomic configuration with a vacancy is encoded into a digital
matrix, or neuron map. The digits (1, 2, and 3) represent the corresponding atom types, and 0
denotes the vacancy (refer to Figure S1 for conversion and visualization of 3D crystals). This
digital matrix, which captures structure and composition features, offers several advantages that
are important for a descriptor18. The map dimension, 𝑂(𝑁), scales linearly with the number of
atoms 𝑁, the lowest dimensionality possible for a descriptor. Importantly, constructing the map
is simple and involves no intensive calculation or painstaking parameter tuning. Essential for
diffusion, the representation is rotationally non-invariant, which enables prediction of
path-dependent diffusion activation barriers (vector quantities). These vectorized digits are then passed to the
NNK model and serve as input neurons.

Figure 1b depicts the schematic of the NNK which consists of an artificial neural network and a
neuron kinetics module. The introduced neural network (with more than two hidden layers) is
designed to learn the nonlinear interactions between input neurons (i.e., atoms and vacancy), and
to predict the diffusion energy barriers. Notably, the network only uses the vacancy and its
neighboring neurons as inputs, resulting in a low and constant computational cost (independent of
system size) without sacrificing accuracy (see Supplementary section 3 for details). With the
available barriers associated with each individual diffusion path, the neuron kinetics module adopts
the kinetic Monte Carlo method to carry out diffusion kinetics evolution (see Methods). Two
features give the NNK high computational efficiency and scalability with system size.
First, the descriptor map is calculated only once for the initial atomic configuration, because
atomic diffusion and local chemical evolution are carried out directly on the neuron map.
Second, since atomic diffusion depends solely on the local chemical environment, the NNK trained
on small configurations can be directly applied to large systems for diffusion modeling.

Predicting a path-dependent diffusion barrier spectrum in multidimensional composition space. Diffusion in crystals occurs through elementary atomic jumps between a vacancy and its
neighboring lattice sites (vacancy mechanism4,23). In body-centered cubic (bcc) CCAs, a vacancy
is associated with eight different jump directions, and the variation in the jumping atoms and
surrounding chemical environment can result in eight distinct migration barriers15,24. By utilizing
the rotational non-invariant lattice representation, it is possible to predict the jump path-dependent
barriers (a vector quantity) from a single chemical configuration. Specifically, by aligning each
diffusion path to a constant reference orientation through rotation and/or mirroring operations (see
Table S1), a unique neuron map and digital vector, 𝐷𝑖, can be generated for each individual
diffusion path 𝑖 without breaking the structural symmetry, as demonstrated in Figure 2a.
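The path alignment can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the neuron map is a periodic 3D integer array on the 𝑎/2 voxel grid (so the eight jump vectors are (±1, ±1, ±1)), and it uses diagonal mirror matrices (sign flips of the axes) as one valid choice of symmetry operation; the operations actually used are those listed in Table S1. The names `mirror_to_reference` and `path_vector` are hypothetical.

```python
import itertools

import numpy as np

# Eight first-neighbour jump vectors of a bcc lattice, in units of a/2.
JUMPS = [np.array(s) for s in itertools.product((1, -1), repeat=3)]
REFERENCE = np.array([1, 1, 1])  # fixed reference orientation for every path


def mirror_to_reference(jump):
    """Diagonal mirror matrix M with M @ jump == REFERENCE.

    Sign flips of the Cartesian axes are symmetry operations of the cubic
    lattice, so applying them does not break the bcc structure.
    """
    return np.diag(jump * REFERENCE)


def path_vector(neuron_map, vac, jump, offsets):
    """Digit vector D_i for one jump path.

    offsets: neighbour offsets (a/2 units) listed once, in the reference frame.
    M is its own inverse, so reference offset r sits at lattice offset M @ r;
    every path is therefore read out in the same, path-aligned order.
    """
    m = mirror_to_reference(jump)
    shape = np.array(neuron_map.shape)
    return np.array([neuron_map[tuple((vac + m @ r) % shape)] for r in offsets])
```

Because each path is read in the same reference-frame order, the first entry of every 𝐷𝑖 in this sketch is the digit of the atom that would jump into the vacancy along path 𝑖 when `offsets` starts with `REFERENCE`.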

The neural network takes as input 𝐷𝑖, which encodes the local atomic environment surrounding the
vacancy. The data (atomic digits) then flow through hidden layers to the output layer,
which predicts the associated diffusion activation barrier, 𝐸𝑖. The first hidden layer of the neural
network characterizes the linear contribution of the input neurons (atoms and vacancy) to the
migration barrier, while the following hidden layers capture the nonlinear and high-order
interactions that impact vacancy jump. With just four hidden layers and 112 neighboring atoms
(up to the 8th nearest neighbor shell) of the vacancy, the neural network achieves a high level of
accuracy in predicting the path-dependent diffusion barrier (Supplementary section 4 and Figures
S12-14 for the testing of different neural network structures). Figure 2b presents the evaluation of
machine learning model performance for two different concentrations (one concentrated and one
dilute), where the predicted energy barrier value is compared with the ground truth (see Methods).
The predicted and true values exhibit the same spectrum of barriers, and the mean absolute error
(MAE) is less than 1.2% of the average true migration barrier for the two alloys, concentrated
solution Ta33Nb33Mo33 and dilute solution Ta90Nb5Mo5 (see Figure S3 for new compositions and
different system sizes).
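The quoted input size can be checked by counting bcc neighbor shells. The sketch below assumes that shells are grouped by exact interatomic distance and that the 112 inputs are the atoms in shells 1 to 8 around the vacancy. Coordinates are integers in units of 𝑎/2, where bcc sites have all-even or all-odd coordinates, and the function name is hypothetical.

```python
import itertools
from collections import Counter


def bcc_shells(n_shells=8, reach=4):
    """Return (squared distance, site count) for the first n_shells bcc shells.

    Squared distances are in units of (a/2)^2. reach=4 covers every site with
    squared distance <= 20, i.e. all sites through the 8th shell.
    """
    counts = Counter()
    span = range(-reach, reach + 1)
    for x, y, z in itertools.product(span, repeat=3):
        if (x, y, z) == (0, 0, 0) or not (x % 2 == y % 2 == z % 2):
            continue  # skip the vacancy site and non-lattice voxels
        counts[x * x + y * y + z * z] += 1
    return [(d2, counts[d2]) for d2 in sorted(counts)[:n_shells]]


shells = bcc_shells()
print(shells)  # [(3, 8), (4, 6), (8, 12), (11, 24), (12, 8), (16, 6), (19, 24), (20, 24)]
print(sum(count for _, count in shells))  # 112
```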

Figure 2. Predicting diffusion barrier spectra in the entire composition space of Nb-Mo-Ta.
a, Creation of unique neuron maps and feature vectors for each individual diffusion path, which
enables the prediction of eight path-dependent barriers from a vacancy. b, Performance of the neural
network in predicting diffusion barrier spectra in concentrated, Ta33Nb33Mo33, and dilute,
Ta90Nb5Mo5, solutions. c, Diffusion barrier diagram generated by the neural network. The
nonequimolar Nb15Mo65Ta20 alloy exhibits the highest barrier in the Nb-Mo-Ta system.

After training on only tens of compositions (Supplementary section 5 and Figures S15-16), the
neural network covers the complete composition space of the ternary Ta-Nb-Mo system,
establishing the relationship between composition and diffusion barrier spectrum. Figure 2c
shows the diffusion barrier diagram generated by the neural network, from which the alloy
(Nb15Mo65Ta20) with the highest mean barrier is quickly identified. While research efforts have
focused primarily on equimolar or near-equimolar compositions, our results indicate that anomalous
behavior can originate from non-equimolar compositions hidden in the vast composition space.
The accurate barrier predictions for new, unseen compositions imply that the neural network
captures the complex local chemical variations and links them to diffusion properties.

Diffusion kinetics-induced local chemical order. Driven by the attractive and repulsive
interactions among the constituent elements of CCAs, atomic diffusion leads to the emergence of
local chemical order on a short- to medium-range scale. To uncover diffusion-mediated chemical
ordering and its dependence on annealing temperature, we employ the NNK model to simulate the
equimolar NbMoTa system, using a model containing 1,024 lattice sites, at temperatures
ranging from 100 to 3000 K. Because the NNK resolves individual atomic jumps at low
computational cost, 20 million diffusion jumps are carried out at each temperature.

Figure 3. Diffusion kinetics-mediated local chemical order in the equimolar NbMoTa alloy.
a, Variation of chemical order 𝛿𝑖𝑗 obtained at different annealing temperatures displays a critical
temperature that divides the map into two characteristic regimes, denoted as diffusion-favored (I)
and diffusion-limited (II). b, Development of Mo-Ta order, 𝛿Mo−Ta, as a function of diffusion
jumps from 2 × 10⁴ to 2 × 10⁷. The inset shows that the jump-number dependence of the peak
temperature suggests a critical temperature of ~800 K, below which chemical ordering is
suppressed.

Figure 3a shows the change of the local chemical order 𝛿𝑖𝑗 as a function of temperature. Here the
non-proportional order parameter, 𝛿𝑖𝑗, quantifies the chemical order between a pair of atom types
𝑖 and 𝑗 in the first nearest-neighbor shell (see Methods). A positive 𝛿𝑖𝑗 indicates more 𝑖-𝑗
pairs than in a random solid solution, suggesting that element 𝑖 prefers to bond with
element 𝑗 (favored pairing), while a negative value suggests an unfavored pairing. At a high
temperature (3000 K), the system approaches a random solid solution, as reflected by
the small values of 𝛿𝑖𝑗. As the temperature decreases, the magnitudes of 𝛿𝑖𝑗 for the Mo-Ta, Ta-Ta,
and Mo-Mo pairs increase monotonically until they reach a turning point (around 800 K), below which
the trend reverses. The chemical order falls rapidly as the temperature is further lowered and, at
400 K, it nearly vanishes. Note that the system underwent the same number of jumps (20 million)
at all temperatures. These results suggest the existence of a critical temperature at which the
diffusion-favored ordering reaches a maximum (Regime I in Figure 3a). Below the critical value
(Regime II), diffusion jumps barely develop or enhance chemical order.

To better understand this critical temperature and how the number of diffusion jumps affects it, we
present the 𝛿Mo−Ta order parameter values obtained over a wide range of jumps, from 2 × 10⁴ to
2 × 10⁷, in Figure 3b. As the number of jumps increases, the characteristic temperature 𝑇(𝛿max)
corresponding to the maximum order gradually shifts to lower values. The inset of Figure 3b
illustrates the variation of 𝑇(𝛿max) with diffusion jumps, again revealing the critical temperature
below which diffusion-mediated ordering is substantially limited.

Figure 4. Jump randomness and diffusion multiplicity of an equimolar NbMoTa alloy. a,
Schematics of two limiting lattice jump modes. One of the eight paths predominates in a
directional jump (jump randomness 𝑅 = 0), while all eight paths have the same hopping
probability in a random jump (𝑅 = 1). b, Spatial and statistical distributions of lattice jump
randomness, 𝑅, at three representative temperatures. At 3000 K the distribution of 𝑅 (𝑅peak = 0.7)
indicates highly random diffusion, while at 400 K the lattice jumps transform to a directional
diffusion mode (𝑅peak = 0.0). Lattice jumps at 800 K exhibit highly heterogeneous diffusion
modes, shown by the broad distribution of 𝑅. c, Diffusion multiplicity Var(𝑅) as a function of
temperature reveals a critical temperature (~850 K) at which diffusion is most heterogeneous
(widest distribution of 𝑅). Toward the two ends, diffusion approaches simple random or
directional modes at the highest and lowest temperatures, respectively.

Jump randomness and diffusion multiplicity in CCAs. In monoatomic crystals, vacancy diffusion
can be described as purely random, with each possible jump path having an equal
probability of occurrence. In CCAs, however, local variations in chemical composition give rise
to distinct, path-dependent energy barriers, resulting in a non-uniform distribution of jump
probabilities. For example, in bcc CCAs, the jump probability for each of the eight possible paths
associated with a vacancy site can be expressed as $p_i = \exp(-E_i/k_B T) / \sum_{j=1}^{8} \exp(-E_j/k_B T)$,
where $E_i$ is the energy barrier of path $i$, $k_B$ is the Boltzmann constant, and $T$ is temperature.
This can lead to various diffusion modes, as illustrated in Figure 4a, with the two limiting cases being
purely random jumps (where all jump paths have the same probability of occurrence) and non-random,
directional lattice jumps (where one path predominates). To quantify the degree of lattice jump
randomness, we define an order parameter $R = 1 - \sigma(\mathbf{p})/\max(\sigma)$, where $\sigma(\mathbf{p})$ is the standard
deviation of the jump probabilities $\mathbf{p}$, and $\max(\sigma)$ is the maximum standard deviation, which occurs for a
directional jump. The randomness parameter $R$ ranges from 0 to 1, with $R = 1$ and $R = 0$
representing the limiting cases of random diffusion and directional diffusion, respectively.
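The definitions of 𝑝𝑖 and 𝑅, together with the site-to-site variance Var(𝑅) used for the system-level multiplicity in Figure 4c, translate directly into a few lines of NumPy. This is an illustrative sketch: the barrier arrays are inputs one would take from the neural network, and the Gaussian barriers in the usage lines are synthetic placeholders, not NbMoTa data. The function names are hypothetical.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant, eV/K


def jump_probabilities(barriers, T):
    """Boltzmann jump probabilities p_i over the last axis (the eight paths)."""
    e = np.asarray(barriers, dtype=float)
    # Subtracting the minimum barrier leaves p unchanged but avoids underflow.
    w = np.exp(-(e - e.min(axis=-1, keepdims=True)) / (K_B * T))
    return w / w.sum(axis=-1, keepdims=True)


def randomness(barriers, T):
    """R = 1 - std(p) / max(std): 1 for random jumps, 0 for one dominant path."""
    p = jump_probabilities(barriers, T)
    sigma_max = np.std(np.eye(p.shape[-1])[0])  # std of a one-hot (directional) p
    return 1.0 - p.std(axis=-1) / sigma_max


def multiplicity(site_barriers, T):
    """Var(R) over lattice sites; site_barriers has shape (n_sites, 8)."""
    return float(np.var(randomness(site_barriers, T)))


# Usage with synthetic (hypothetical) barriers: mean 1.0 eV, spread 0.2 eV.
rng = np.random.default_rng(0)
site_barriers = rng.normal(1.0, 0.2, size=(5000, 8))
for T in (1.0, 1000.0, 1.0e6):
    print(T, multiplicity(site_barriers, T))
```

With such synthetic barriers, Var(𝑅) should be small at both temperature extremes, where nearly every site is either random-like or directional, and larger in between; this mirrors only the qualitative shape of Figure 4c, not its values.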

Figure 4b shows spatial and statistical distributions of lattice jump randomness 𝑅 at three
representative temperatures. The spatial maps display color-coded lattices based on their
respective 𝑅 values. At a high temperature of 3000 K, the thermal energy 𝑘B𝑇 becomes comparable to
the barrier differences between paths and largely smears them out, leading to a peak 𝑅 value of 0.7,
which indicates highly random jumps. It is tempting to speculate that random atomic diffusion is
insufficient to build and develop a B2 ordered phase, which would be consistent with the low order
observed at high temperatures (Figure 3a). At a low temperature of 400 K, the lattice jumps transform into
directional diffusion, as demonstrated by the 𝑅 distribution having a peak value of 0. This implies
that only one of the eight diffusion pathways is active at each lattice site. Presumably, this one-
dimensional directional diffusion predominating at low temperatures (< 400 K) limits and
suppresses the nucleation and growth of three-dimensional B2 structure. Intriguingly, at an
intermediate temperature (~800 K), the lattice jump randomness 𝑅 exhibits a broad distribution,
spanning from 0.0 to 0.7, indicating highly heterogeneous diffusion modes.

To assess the system-level diffusion multiplicity (heterogeneity) and its temperature dependence,
we calculate the variance of diffusion randomness, Var(𝑅), at temperatures ranging from 100
to 3000 K, as illustrated in Figure 4c. Close to the high- or low-temperature ends, there is a
rapid change in Var(𝑅), implying that diffusion approaches a random or directional mode. The
temperature variation of Var(𝑅) reveals a peak value of diffusion multiplicity at around 850 K.
Random and directional-type lattice jumps are spatially interspersed throughout the entire system,
as shown in the spatial map of Figure 4b. The observation of the highest diffusion multiplicity
(Figure 4c) and maximum B2 order (Figure 3a) occurring in the same intermediate temperature
range suggests a strong correlation between diffusion heterogeneity and the formation of local
chemical order.

Figure 5. B2 structure nucleation and growth kinetics during annealing in NbMoTa. a, B2
cluster size evolution with the number of diffusion jumps. b-d, Spatial distributions of growing
B2 clusters at 1 × 10⁶, 5 × 10⁶, and 1 × 10⁷ diffusion jumps. Clusters are color coded by their
size.

B2 structure nucleation and growth kinetics. Determining the formation kinetics of chemically
ordered structures in a complex solid solution has been a challenge due to local chemical
fluctuations and the enormous number of distinct diffusion barriers. The NNK framework, which
efficiently and precisely predicts diffusion barriers in any chemical environment, is intended to
address this issue. To demonstrate the efficacy of the model, we perform aging simulations of
NbMoTa consisting of 120,000 atoms. Figure 5a-c shows the spatial-temporal nucleation and
evolution of the B2 structure induced by diffusion. After 1 × 10⁶ diffusion jumps, a considerable
number of B2 clusters emerge in the system (Figure 5b), most of which are small clusters (size < 8
atoms). As the number of diffusion jumps further increases (1 × 10⁷), large clusters begin to
appear and continue to grow, accompanied by annihilation and reduction of small ones (Figure 5a).
The decrease in spatially isolated small clusters results from their attachment to, or absorption
by, nearby growing large ones. Apart from the absorption of small clusters, another essential
kinetic process underlying growth is large-cluster interaction and coalescence. When two spreading
clusters approach each other, they merge into a larger one mediated by diffusion (Figure S7).
Figure 5d reveals the spatial distribution of the formed B2 clusters, colored by their size, in the
aged material. In contrast to the precipitation of ordered nanoparticles in dilute solutions, the
more heterogeneous growth of chemically ordered structures signifies the substantial role of
diffusion multiplicity in governing the complex chemical ordering in concentrated solutions.

Discussion
Diffusion kinetics in the emergent compositionally complex materials25,26 (often called
high-entropy alloys and high-entropy oxides) gives rise to many intriguing rate-controlling phenomena and
properties, such as chemical short-range order12, chemically ordered nanoparticle formation27,
decomposition28, superionic conductivity29, and extraordinary radiation tolerance14,30, to name a few.
These behaviors are controlled by the underlying atomic diffusion, which occurs in a chemical
environment with a high degree of local composition fluctuations. Uncovering the kinetic
processes and predicting structure evolution in these materials requires novel computational
techniques that can disentangle their chemical complexity and connect it with individual atomic
jumps. The NNK scheme introduced here aims to tackle the kinetic behaviors arising from
diffusion processes, with a particular focus on this novel class of materials. Underpinned by an
interpretable chemistry and structure representation (neuron map), the neural network precisely
predicts the diffusion-path dependent energy barriers governing individual atomic jumps. The
atomic diffusion and structure variations are effectively modeled on the neuron map through
neuron digit exchange (Figure 1b). This framework possesses three key advantages that give both
high computational efficiency and accuracy. Firstly, the interpretable on-lattice representation,
which converts chemistry and structure to physically equivalent neuron maps, yields an ultra-small
feature size, critical for machine learning models. Secondly, the neuron map is determined only
once, because it is subsequently updated to replicate atomic diffusion jumps and structure
evolution. Crucially, the rotational non-invariance of the neuron map enables the prediction of

vector values from a single neuron map (vacancy configuration). Thirdly, the NNK trained on
small models can be applied directly to investigate the kinetic behavior of large systems without
sacrificing accuracy. This size scalability is demonstrated, for instance, by accurate barrier
predictions (see Figures S3 and S17) and ordered phase growth in large NbMoTa systems (Figure 5).

Stemming from attractive/repulsive interactions between solutes, atomic diffusion inevitably leads
to nucleation of chemically ordered structures in CCAs during annealing. Using the NNK and bcc
NbMoTa as a model system, we uncover the existence of a critical temperature at which the B2
order reaches its maximum value. This temperature dependence of chemical order is closely
related to the underlying lattice jump randomness, as shown by the randomness maps (Figure 4).
At high temperatures close to melting point, diffusion jumps ultimately approach a purely random
process, corresponding to a low propensity for order formation. At low temperatures, lattice
diffusion becomes dominated by the lowest barrier path, manifesting as directional jumping and
restricting the nucleation of chemically ordered structure. At the critical temperature in the
intermediate range, random-like and directional-type lattice jumps are interspersed throughout the
entire system, exhibiting the highest diffusion heterogeneity (multiplicity, Figure 4c). By tracking
individual B2 clusters during annealing, we find that their nucleation and growth are intermittent
and non-uniform, accompanied by the reduction and annihilation of small clusters (Figure 5 and
Supplementary Video 1). This salient feature of the kinetic growth of B2 structures is not captured
by thermodynamics-based modeling using fictitious random atom-type swaps (see Methods and
Figure S8), which shows more uniform growth (Figure S9). These results highlight the complex and
multitudinous kinetic pathways in CCAs towards stable states, where many processes, such as ordered
structure nucleation, annihilation, growth, and rearrangement, interplay and are coordinated.

The neural network trained on dozens of compositions demonstrates remarkable predictability for
unseen compositions, unveiling the entire ternary space of Nb-Mo-Ta (Figure 2c). With a practically
limitless compositional design space, compositionally complex materials formed by mixing multiple
elements open a new frontier waiting to be explored. Traditional structure-property calculations
relying on density functional theory and molecular dynamics work well for small datasets but fall
short in harnessing the vast composition space. Recent advances in the rapidly growing field of
machine learning create fertile ground for computational materials science31,32 and have led to the
discovery of alloys with optimal properties33. By directly connecting multidimensional composition
with diffusion barrier spectra, the NNK offers a path to explore the vast compositional space of
CCAs, where extraordinary kinetic properties lie hidden.

Methods
Material system and diffusion barrier calculation

We focus on the emergent refractory CCA, Nb-Mo-Ta, as the study system to demonstrate the
neural network kinetics (NNK) scheme. When generating diffusion datasets for training the neural
networks, we use relatively small atomic models that have 10 × 10 × 10 unit cells (containing
2000 atoms). The climbing image nudged elastic band (CI-NEB)34 method is adopted to compute
the vacancy diffusion energy barriers in the Nb-Mo-Ta system using a state-of-the-art machine
learning potential35. For one initial configuration of a vacancy, the eight final configurations are
prepared by swapping the vacancy with its first nearest neighbor atoms. By labeling each diffusion
path, the path-dependent diffusion energy barriers are therefore generated. Before CI-NEB
calculation, both initial and final configurations are optimized to their local energy minimum
states. The CI-NEB inter-replica spring constant is set to 5 eV/Å², and the energy tolerance and
force tolerance are 0 eV and 0.01 eV/Å, respectively. These parameters were chosen to optimize
convergence; using a smaller tolerance and a larger spring constant yields essentially the same
energy barriers.

Structure representation and neural networks


The on-lattice representation converts the atomic structure into a digit matrix, which is then
deciphered by the neural networks. The conversion is done through a voxel grid that divides the 3D
material model into uniform cubes. Each grid cell (voxel) acquires a digit value according to its
enclosed atom type or vacancy. For the bcc structure, the largest grid spacing that can still fully
distinguish all lattice sites, and thus yields the smallest voxel grid dimensionality, is 𝑎/2, where
𝑎 is the lattice constant of the crystal (see Supplementary section 2).
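A minimal sketch of the voxel encoding, assuming periodic boundaries and atoms sitting within 𝑎/4 of their ideal bcc sites, so that rounding assigns each atom to the correct voxel. The filler value `EMPTY = -1` for voxels that host no lattice site, the lattice constant in the usage lines, and the function name `encode_bcc` are hypothetical choices, not taken from the paper.

```python
import numpy as np

EMPTY = -1  # hypothetical filler for voxels that host no lattice site


def encode_bcc(positions, digits, a, n_cells):
    """Map the sites of an n_cells^3 bcc supercell onto an a/2 voxel grid.

    positions: (N, 3) Cartesian coordinates in the same length unit as a.
    digits: (N,) integer labels (species 1-3, 0 for the vacancy).
    """
    grid = np.full((2 * n_cells,) * 3, EMPTY, dtype=int)
    # Nearest voxel index for each site, wrapped into the periodic box.
    idx = np.rint(np.asarray(positions) / (a / 2)).astype(int) % (2 * n_cells)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = digits
    return grid


# Usage: an ideal 2 x 2 x 2-cell bcc supercell (16 sites) with one vacancy.
a, n = 3.2, 2  # illustrative lattice constant (Angstrom) and cell count
corners = np.array(
    [[i, j, k] for i in range(n) for j in range(n) for k in range(n)], dtype=float
) * a
positions = np.vstack([corners, corners + a / 2])  # corner and body-centre sites
digits = np.random.default_rng(1).integers(1, 4, size=len(positions))
digits[0] = 0  # place the vacancy on the first corner site
grid = encode_bcc(positions, digits, a, n)
```

Only one voxel in four of this grid holds a lattice site; the remaining voxels are simply the price of using a cubic grid fine enough to separate corner and body-center sites.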

The neural network takes the representative structure and chemistry digits (neurons) as input,
processes them through the hidden layers, and outputs the energy barriers. The connections between
neurons in the hidden layers imitate the physical atom-atom and atom-vacancy interactions.
The weights associated with the connections, which represent the interaction strength (contribution
to the migration barrier), are adjusted during training. To understand the influence of network
architecture on prediction performance, we train a series of neural networks with varying numbers
of layers and numbers of neurons in each layer (Supplementary section 4). As the number of
neurons in each hidden layer increases from 16 to 32, 64, 128, and 256, the testing MAE decreases
rapidly and converges at 128 neurons, which is enough to explicitly describe all the local neighbors
of a vacancy (Figure S11). After testing different numbers of layers, we select the final network
structure, with 4 hidden layers and 128 neurons in each layer, as offering the best overall balance. In
addition, we separately train a convolutional neural network (CNN) for comparison with the simple
neural network. The CNN comprises four convolutional layers that compress the 3D neuron map
to a 1 × 128 dimension for barrier prediction. The architecture of the CNN is depicted in Figure S14
and described in Supplementary section 4. Likely because it adaptively learns spatial
hierarchies of features from the input 3D atomic structure, the CNN exhibits slightly better predictive
performance (Figure S17).
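The fully connected architecture described above (4 hidden layers of 128 neurons, one barrier output) can be sketched as a NumPy forward pass. The 112-element input, the ReLU activation, and the He-style random initialization are assumptions made for illustration; training (fitting the weights to CI-NEB barriers) and the CNN variant are omitted, and the function names are hypothetical.

```python
import numpy as np


def init_mlp(n_in=112, hidden=(128, 128, 128, 128), seed=0):
    """Random (untrained) weights for an n_in -> 4 x 128 -> 1 fully connected net."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, *hidden, 1]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def predict_barrier(params, d):
    """Forward pass mapping digit vectors d of shape (..., n_in) to barriers."""
    h = np.asarray(d, dtype=float)
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # hidden layer with ReLU (assumed activation)
    w, b = params[-1]
    return (h @ w + b)[..., 0]  # linear output: one barrier per input vector


# Usage: the eight path vectors D_i of one vacancy, batched together.
params = init_mlp()
d = np.random.default_rng(2).integers(0, 4, size=(8, 112))
barriers = predict_barrier(params, d)  # shape (8,); meaningless until trained
```

Batching the eight aligned path vectors of a vacancy into one array lets a single forward pass return the full path-dependent barrier set needed by the kinetics module.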
The training data are generated from 46 different compositions which uniformly sample the Nb-
Mo-Ta diagram (Figure S15). In Supplementary section 5, we carefully study and discuss the
number of compositions required to train a highly accurate network for predicting the complete
ternary space. Each composition model contains 2000 atoms, giving rise to 16,000 diffusion
barriers. The total of 736,000 data points is split into a training dataset (95% of the data) and a
validation dataset (5%). After validation, the neural network is tested for barrier prediction in unseen
compositions and in atomic configurations of different sizes. For example, Figure S3 shows the
testing results for the new compositions Nb10Mo10Ta80, Nb20Mo60Ta20, and Nb40Mo30Ta30, with an
average MAE of around 0.018 eV. Notably, the neural network maintains consistently high
accuracy for systems of different sizes containing 512, 2000, and 6750 atoms, indicating scalability.
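As a quick arithmetic check, the dataset sizes above are mutually consistent if, as the numbers imply, each of the 2,000 lattice sites in a composition model hosts the vacancy once and all eight jump paths are labeled:

$$
\begin{aligned}
2000 \times 8 &= 16{,}000 \ \text{barriers per composition}, & 46 \times 16{,}000 &= 736{,}000 \ \text{barriers in total},\\
0.95 \times 736{,}000 &= 699{,}200 \ \text{(training)}, & 0.05 \times 736{,}000 &= 36{,}800 \ \text{(validation)}.
\end{aligned}
$$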

Neuron kinetics. Atom diffusion and kinetic evolution are carried out on the neuron map using the
kinetic Monte Carlo (kMC) algorithm. Diffusion occurs through vacancy (vacancy neuron) jumps to
nearest-neighbor sites, and each jump path has a rate $k_i = k_0 \exp(-E_i / k_B T)$, where $E_i$ is
the energy barrier along jump path $i$, $k_B$ is the Boltzmann constant, $T$ is temperature, and
$k_0$ is an attempt frequency. The vacancy diffusion barriers associated with the eight jump paths
are obtained from the neural network. The total jump rate for the current vacancy configuration is
$R = \sum_{i=1}^{8} k_i$, i.e., the sum of all individual elementary rates. To simulate kinetic
evolution, we first draw a uniform random number $u \in (0,1]$ and select the diffusion path $p$
that satisfies the condition36 $\sum_{i=1}^{p-1} k_i / R < u \le \sum_{i=1}^{p} k_i / R$. The vacancy
jump along path $p$ is then executed by exchanging the vacancy with the selected neighboring neuron
(neuron digit swapping), resulting in an updated neuron map for the next iteration.
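The path-selection rule above is the standard rejection-free (residence-time) kMC step. A minimal Python sketch follows. It assumes barriers in eV and an attempt frequency $k_0 = 10^{13}$ s⁻¹, a typical value that is not stated in the text. The time increment $\Delta t = -\ln(u')/R$, drawn with a second random number $u'$, is the usual Gillespie/BKL choice and is also an assumption here. `kmc_step` is a hypothetical name.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant (eV/K)


def kmc_step(barriers_eV, T, k0=1.0e13, rng=None):
    """One rejection-free kMC step for a vacancy with 8 candidate jump paths.

    barriers_eV : barriers E_i (eV), e.g. as predicted by the neural network
    T           : temperature (K)
    k0          : attempt frequency (1/s); 1e13 is an assumed, typical value
    Returns (p, dt): 0-based index of the selected path and the time increment (s).
    """
    rng = np.random.default_rng() if rng is None else rng
    rates = k0 * np.exp(-np.asarray(barriers_eV, dtype=float) / (K_B * T))
    R = rates.sum()                        # total escape rate
    cumulative = np.cumsum(rates) / R      # partial sums sum_{i<=p} k_i / R
    u = 1.0 - rng.random()                 # uniform on (0, 1]
    # smallest p with cumulative[p] >= u, i.e. sum_{i<p} k_i/R < u <= sum_{i<=p} k_i/R
    p = int(np.searchsorted(cumulative, u, side="left"))
    p = min(p, len(rates) - 1)             # guard against round-off when u == 1
    # residence-time increment (assumption: standard BKL/Gillespie choice)
    dt = -np.log(1.0 - rng.random()) / R
    return p, dt
```

In a full simulation, the returned index `p` would select the neighbor whose digit is swapped with the vacancy in the neuron map. The eight barriers are then re-predicted for the new vacancy environment.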

Hybrid Monte Carlo and molecular dynamics simulation


We perform static Monte Carlo (MC) simulations coupled with molecular dynamics (MD) to reveal the
chemical order determined by enthalpy (i.e., mainly by thermodynamics). In each MC trial, a pair of
atoms is randomly selected and their types are swapped. The swap is accepted with probability
$\min[1, \exp(-\Delta H / k_B T)]$ following the Metropolis algorithm37, where $\Delta H$ is the
enthalpy change upon swapping. The chemical evolution and ordering are therefore predominantly
controlled by enthalpy. The MC swaps are followed by MD equilibration. For systems consisting of
1024 atoms, we perform 18,000 swap attempts (each atom is subjected to about 18 swap attempts on
average) and 600 ps of MD equilibration. Figure S8 shows the local order as a function of MC step
for temperatures from 100 to 3000 K. To study B2 cluster growth, we perform the MC/MD simulation
in a large model (128,000 atoms), with a total of 135,000 swaps coupled with 150 ps of MD
equilibration. Unlike diffusion-mediated B2 cluster growth, the clusters here grow in a uniform and
homogeneous manner (Figure S9).
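The swap-and-accept loop described above can be sketched as follows. This is a minimal illustration, not the authors' code. `energy_fn` is a hypothetical callable that stands in for the interatomic-potential enthalpy. The toy 1D chain energy exists only so the example runs, and the interleaved MD equilibration is omitted.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant (eV/K)


def metropolis_accept(dH, T, rng):
    """Accept a trial move with probability min(1, exp(-dH / (k_B T)))."""
    if dH <= 0.0:
        return True
    return rng.random() < np.exp(-dH / (K_B * T))


def mc_swap_sweep(types, energy_fn, T, n_attempts, rng):
    """Attempt n_attempts random type swaps; energy_fn(types) -> total enthalpy (eV).

    energy_fn is a hypothetical stand-in for the potential; MD relaxation is not modeled.
    """
    types = types.copy()
    E = energy_fn(types)
    n_accepted = 0
    for _ in range(n_attempts):
        i, j = rng.choice(len(types), size=2, replace=False)
        if types[i] == types[j]:
            continue                                   # identical species: no change
        types[i], types[j] = types[j], types[i]
        E_new = energy_fn(types)
        if metropolis_accept(E_new - E, T, rng):
            E = E_new
            n_accepted += 1
        else:
            types[i], types[j] = types[j], types[i]    # reject: undo the swap
    return types, E, n_accepted


def toy_chain_energy(types, eps):
    """Hypothetical nearest-neighbor pair energy on a periodic 1D chain."""
    return float(eps[types, np.roll(types, -1)].sum())
```

Because swaps only exchange labels, the composition is conserved exactly. Ordering emerges solely through which swaps the Metropolis criterion accepts.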

Local chemical order parameter


To quantify the degree of chemical order, we use the non-proportional parameter38
$\delta_{ij} = N_{ij} - N_{0,ij}$, where $N_{ij}$ denotes the actual number of pairs between atoms of
types $i$ and $j$ in the first-nearest-neighbor shell, and $N_{0,ij}$ represents the average number of
such pairs in a random solution. A positive $\delta_{ij}$ means an increased (favored) number of i-j
pairs, indicating that element $i$ tends to bond with element $j$. A negative $\delta_{ij}$ indicates
an unfavored pair, meaning that $i$ and $j$ repel each other. A random solid solution has
$\delta_{ij} = 0$.
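The order parameter can be computed from a first-neighbor pair list. The sketch below assumes that $N_{0,ij}$ is the expected pair count of an ideal random solution with the same composition. That is $M c_i^2$ for like pairs and $2 M c_i c_j$ for unlike pairs, where $M$ is the number of unordered first-shell pairs and $c_i$ is the concentration. The text defines $N_{0,ij}$ only as the random-solution average, so this formula is an assumption. `pair_order_parameters` is a hypothetical name.

```python
import numpy as np


def pair_order_parameters(types, pairs, n_species):
    """delta_ij = N_ij - N0_ij over unordered first-shell pairs.

    types : (N,) integer species labels 0..n_species-1
    pairs : (M, 2) unique unordered first-nearest-neighbor pairs (i, j)
    N0_ij is assumed to be the ideal-random-solution expectation:
    M * c_i**2 for i == j and M * 2 * c_i * c_j for i != j.
    """
    types = np.asarray(types)
    pairs = np.asarray(pairs)
    a = types[pairs[:, 0]]
    b = types[pairs[:, 1]]
    N = np.zeros((n_species, n_species))
    np.add.at(N, (np.minimum(a, b), np.maximum(a, b)), 1)   # upper triangle + diagonal
    N = N + np.triu(N, 1).T                                  # symmetrize off-diagonal counts
    c = np.bincount(types, minlength=n_species) / len(types)
    M = len(pairs)
    N0 = M * 2.0 * np.outer(c, c)
    np.fill_diagonal(N0, M * c**2)
    return N - N0
```

For a bcc crystal of $N$ atoms, each site has eight first-shell neighbors, so $M = 4N$.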

B2 cluster analysis
Mo and Ta tend to attract each other and form the B2 structure. The B2 unit cell is a bcc-based
(CsCl-type) cell comprising two species, Ta and Mo, orderly located at the cube corners and center.
The unit cell can be either Ta- or Mo-centered. Because of the high concentration of Nb in the
equimolar NbMoTa alloy, we characterize a unit as B2 when at least 3/4 (six of eight) of the Ta
nearest neighbors are Mo, or at least 3/4 of the Mo nearest neighbors are Ta. To analyze B2 clusters,
the identified individual B2 units are grouped according to a distance criterion. Two B2 units share
volume, a face, an edge, or a vertex at separations of $\sqrt{3}a/2$, $a$, $\sqrt{2}a$, and $\sqrt{3}a$
(i.e., the 5th shell), respectively, where $a$ is the lattice constant (illustrated in Figure S6).
Choosing the cutoff distance midway between the 5th and 6th shells, i.e., $(\sqrt{3}+2)a/2 \approx 1.87a$,
the spatial distribution and size of all B2 clusters can be characterized. During kinetic annealing,
clusters can shrink or be annihilated, so individual clusters appear and disappear from time to time.
These fluctuations hinder visualization and analysis of stable B2 cluster evolution. To address this
issue, we identify the persistent clusters that exist throughout annealing. Focusing on the persistent
clusters provides a clear picture of cluster growth (Figure 5).
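A minimal sketch of the B2 identification and clustering steps follows. The species labels, the reading of the 3/4 criterion as a lower bound, and the function names are all hypothetical. The union-find grouping is one straightforward way to apply the single distance cutoff $(\sqrt{3}+2)a/2$, not necessarily the authors' implementation. Tracking persistent clusters across snapshots is not shown.

```python
import numpy as np

MO, TA = 1, 2   # hypothetical species labels (Nb = 0)


def is_b2_center(center, nn_types, threshold=0.75):
    """True if a Ta (Mo) site has at least `threshold` of its 8 first neighbors Mo (Ta)."""
    nn = np.asarray(nn_types)
    if center == TA:
        return np.mean(nn == MO) >= threshold
    if center == MO:
        return np.mean(nn == TA) >= threshold
    return False


def group_b2_units(centers, a, box=None):
    """Group B2 unit centers into clusters via union-find with one distance cutoff.

    centers : (n, 3) Cartesian positions of B2 unit centers
    a       : lattice constant; cutoff sits midway between the 5th (sqrt(3) a)
              and 6th (2 a) bcc shells
    box     : optional (3,) orthorhombic box lengths for the minimum-image convention
    Returns an (n,) array of integer cluster labels.
    """
    centers = np.asarray(centers, dtype=float)
    cutoff = 0.5 * (np.sqrt(3.0) + 2.0) * a
    parent = list(range(len(centers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(centers)):
        d = centers[i + 1:] - centers[i]
        if box is not None:
            box_arr = np.asarray(box, dtype=float)
            d -= box_arr * np.round(d / box_arr)
        close = np.nonzero(np.linalg.norm(d, axis=1) < cutoff)[0] + i + 1
        for j in close:
            ri, rj = find(i), find(int(j))
            if ri != rj:
                parent[rj] = ri
    roots = [find(i) for i in range(len(centers))]
    _, labels = np.unique(roots, return_inverse=True)
    return labels
```

Since vertex sharing ($\sqrt{3}a$) is the farthest contact that still counts, any cutoff strictly between the 5th and 6th shells gives the same grouping on an ideal lattice. The midpoint simply leaves room for thermal displacement.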

Author contributions
P.C. conceived the research idea, wrote the manuscript, and generated the figures with inputs from
B.X. B.X. developed the model, implemented the code, and performed the simulations and modeling.
T.J.R. and X.P. reviewed and edited the manuscript. All authors contributed to data analysis and
project discussion.

References
1. Sun, W., Zhu, Y., Marceau, R., Wang, L., Zhang, Q., Gao, X. & Hutchinson, C.
Precipitation strengthening of aluminum alloys by room-temperature cyclic plasticity.
Science 363, 972–975 (2019).
2. Kim, S.-H., Kim, H. & Kim, N. J. Brittle intermetallic compound makes ultrastrong low-
density steel with large ductility. Nature 518, 77–79 (2015).
3. Ardell, A. J. & Ozolins, V. Trans-interface diffusion-controlled coarsening. Nat. Mater. 4,
309–316 (2005).
4. Mehrer, H. Diffusion in solids: fundamentals, methods, materials, diffusion-controlled
processes. Springer Series in Solid-State Sciences (Springer, 2007).
5. Jiang, S. et al. Ultrastrong steel via minimal lattice misfit and high-density
nanoprecipitation. Nature 544, 460–464 (2017).
6. Yang, Y., Chen, T., Tan, L., Poplawsky, J. D., An, K., Wang, Y., Samolyuk, G. D.,
Littrell, K., Lupini, A. R., Borisevich, A. & George, E. P. Bifunctional nanoprecipitates
strengthen and ductilize a medium-entropy alloy. Nature 595, 245–249 (2021).
7. Allen, M. P. & Tildesley, D. J. Computer simulation of liquids. (Oxford University Press,
2017).
8. Mao, Z., Sudbrack, C. K., Yoon, K. E., Martin, G. & Seidman, D. N. The mechanism of
morphogenesis in a phase-separating concentrated multicomponent alloy. Nat. Mater. 6,
210–216 (2007).
9. Caillard, D., Bienvenu, B. & Clouet, E. Anomalous slip in body-centred cubic metals.
Nature 609, 936–941 (2022).
10. Mayer, J. E. & Montroll, E. Molecular Distribution. J. Chem. Phys. 9, 2–16 (1941).
11. Ising, E. Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Phys. 31, 253–258
(1925).
12. Zhang, R., Zhao, S., Ding, J., Chong, Y., Jia, T., Ophus, C., Asta, M., Ritchie, R. O. &
Minor, A. M. Short-range order and its impact on the CrCoNi medium-entropy alloy.
Nature 581, 283–287 (2020).
13. Li, L., Li, Z., Kwiatkowski da Silva, A., Peng, Z., Zhao, H., Gault, B. & Raabe, D.
Segregation-driven grain boundary spinodal decomposition as a pathway for phase
nucleation in a high-entropy alloy. Acta Mater. 178, 1–9 (2019).
14. Du, J., Jiang, S., Cao, P., Xu, C., Wu, Y., Chen, H., Fu, E. & Lu, Z. Superior radiation
tolerance via reversible disordering–ordering transition of coherent superlattices. Nat.
Mater. (2022).
15. Xing, B., Wang, X., Bowman, W. J. & Cao, P. Short-range order localizing diffusion in
multi-principal element alloys. Scr. Mater. 210, 114450 (2022).
16. Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for
molecular and materials science. Nature 559, 547–555 (2018).
17. Friederich, P., Häse, F., Proppe, J. & Aspuru-Guzik, A. Machine-learned potentials for
next-generation matter simulations. Nat. Mater. 20, 750–761 (2021).
18. Ghiringhelli, L. M., Vybiral, J., Levchenko, S. V, Draxl, C. & Scheffler, M. Big Data of
Materials Science: Critical Role of the Descriptor. Phys. Rev. Lett. 114, 105503 (2015).
19. Behler, J. & Parrinello, M. Generalized Neural-Network Representation of High-
Dimensional Potential-Energy Surfaces. Phys. Rev. Lett. 98, 146401 (2007).
20. Bartók, A. P., Kondor, R. & Csányi, G. On representing chemical environments. Phys.
Rev. B 87, 184115 (2013).
21. Wagih, M. & Schuh, C. A. Learning Grain-Boundary Segregation: From First Principles
to Polycrystals. Phys. Rev. Lett. 129, 046102 (2022).
22. Schmidt, J., Marques, M. R. G., Botti, S. & Marques, M. A. L. Recent advances and
applications of machine learning in solid-state materials science.
23. Was, G. S. Radiation Materials Science. (Springer, 2007).
24. Fan, Z., Xing, B. & Cao, P. Predicting path-dependent diffusion barrier spectra in vast
compositional space of multi-principal element alloys via convolutional neural networks.
Acta Mater. 237, 118159 (2022).
25. George, E. P., Raabe, D. & Ritchie, R. O. High-entropy alloys. Nat. Rev. Mater. 4, 515–
534 (2019).
26. Oses, C., Toher, C. & Curtarolo, S. High-entropy ceramics. Nat. Rev. Mater. 5, 295–309
(2020).
27. Yang, T. et al. Multicomponent intermetallic nanoparticles and superb mechanical
behaviors of complex alloys. http://science.sciencemag.org/.
28. Otto, F., Dlouhý, A., Pradeep, K. G., Kuběnová, M., Raabe, D., Eggeler, G. & George, E.
P. Decomposition of the single-phase high-entropy alloy CrMnFeCoNi after prolonged
anneals at intermediate temperatures. Acta Mater. 112, 40–52 (2016).
29. Zeng, Y., Ouyang, B., Liu, J., Byeon, Y. W., Cai, Z., Miara, L. J., Wang, Y. & Ceder, G.
High-entropy mechanism to boost ionic conductivity. Science 378, 1320–1324 (2022).
30. El-Atwani, O., Li, N., Li, M., Devaraj, A., Baldwin, J. K. S., Schneider, M. M., Sobieraj,
D., Wróbel, J. S., Nguyen-Manh, D., Maloy, S. A. & Martinez, E. Outstanding radiation
resistance of tungsten-based high-entropy alloys. Sci. Adv. 5, eaav2002 (2019).
31. Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine
learning: Generative models for matter engineering. Science 361, 360–365 (2018).
32. Hart, G. L. W., Mueller, T., Toher, C. & Curtarolo, S. Machine learning for alloys. Nat.
Rev. Mater. 6, 730–755 (2021).
33. Rao, Z. et al. Machine learning–enabled high-entropy alloy discovery. Science 378, 78–85
(2022).
34. Henkelman, G., Uberuaga, B. P. & Jónsson, H. A climbing image nudged elastic band
method for finding saddle points and minimum energy paths. J. Chem. Phys. 113, 9901–
9904 (2000).
35. Yin, S., Zuo, Y., Abu-Odeh, A., Zheng, H., Li, X.-G., Ding, J., Ong, S. P., Asta, M. &
Ritchie, R. O. Atomistic simulations of dislocation mobility in refractory high-entropy
alloys and the effect of chemical short-range order. Nat. Commun. 12, 4873 (2021).
36. Gillespie, D. T. A general method for numerically simulating the stochastic time evolution
of coupled chemical reactions. J. Comput. Phys. 22, 403–434 (1976).
37. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. Equation
of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953).
38. Ding, J., Yu, Q., Asta, M. & Ritchie, R. O. Tunable stacking fault energies by tailoring
local chemical order in CrCoNi medium-entropy alloys. Proc. Natl. Acad. Sci. 115, 8919–
8924 (2018).

Supplementary Information
Neural Network Kinetics: Diffusion Multiplicity and B2 Ordering in
Compositionally Complex Alloys

Table of Contents

1. Supplementary Figures 1-9

2. On-lattice structure and chemistry representation

3. Determining the cutoff distance

4. Architecture of neural network and convolutional neural network

5. Number of compositions for predicting the entire ternary composition space

1. Supplementary Figures

Figure S1. On-lattice representation of local atomic environments in equimolar NbMoTa alloy.
(a) Atom plane containing a vacancy (colored black). (b) Enlarged view of the circled region in
(a), with a cutoff distance of 7.5 Å. (c-d) Digit matrix (neuron map) converted from the atomic
structure. (e-g) 3D illustration of the atomic configuration within and below the vacancy-containing
layer. (h-j) The vacancy, its first-nearest-neighbor atoms, and the corresponding neuron map.

Table S1. Operations of aligning eight diffusion pathways with the reference direction.

Path   Rotation   Mirror      Path   Rotation   Mirror
V-1    0          No          V-5    0          Yes
V-2    0.5π       No          V-6    0.5π       Yes
V-3    π          No          V-7    π          Yes
V-4    -0.5π      No          V-8    -0.5π      Yes

[Figure S2 panels: Path 2 (0.5π rotation), Path 3 (π rotation), Path 7 (π rotation + mirror), and Path 8 (-0.5π rotation + mirror); each panel shows the vacancy (V), its neighbors (P), and the encoded digit sequences.]

Figure S2. Aligning diffusion pathways 2, 3, 7 and 8 with the reference direction.
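The operations in Table S1 can be applied directly to a 3D neuron map with NumPy, so that every jump path is presented to the network along one reference direction. The sketch below rests on several assumptions:

- The vacancy sits at the center voxel of a cubic grid with pixel size $a/2$, so first neighbors are one voxel away along each axis.
- Rotations are about the z axis.
- The mirror is $z \to -z$.

The mapping of V-1 to V-8 onto lattice directions is a hypothetical convention chosen to be consistent with the table.

```python
import numpy as np

# (number of counterclockwise 0.5*pi rotations about z, mirror z -> -z), per Table S1
ALIGN_OPS = {1: (0, False), 2: (1, False), 3: (2, False), 4: (-1, False),
             5: (0, True), 6: (1, True), 7: (2, True), 8: (-1, True)}

# Hypothetical assignment of paths to first-neighbor directions (voxel offsets)
PATH_DIRS = {1: (1, 1, 1), 2: (1, -1, 1), 3: (-1, -1, 1), 4: (-1, 1, 1),
             5: (1, 1, -1), 6: (1, -1, -1), 7: (-1, -1, -1), 8: (-1, 1, -1)}


def align_neuron_map(grid, path):
    """Rotate/mirror a cubic 3D neuron map (vacancy at the center voxel) so that
    the given jump path points along the reference direction of V-1."""
    k, mirror = ALIGN_OPS[path]
    # np.rot90 with axes=(0, 1) rotates from axis 0 toward axis 1, i.e. (x, y) -> (-y, x)
    out = np.rot90(grid, k=k, axes=(0, 1))
    if mirror:
        out = np.flip(out, axis=2)   # z -> -z
    return out
```

With this convention, one network trained on V-1-oriented maps can predict all eight barriers of a vacancy by aligning each path before inference.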

Figure S3. Performance of neural network in predicting diffusion barrier spectrum in unseen
compositions and varying system sizes (scalability). Three compositions, including
Nb10Mo10Ta80, Nb20Mo60Ta20, Nb40Mo30Ta30, and three systems containing 512, 2,000, and 6750
atoms are shown.

Figure S4. Diffusion and chemical ordering in NbMoTa alloy from NNK simulation at 1,000
K. (a) Accumulated diffusion time as a function of the number of jumps. (b) Variation of chemical
order parameters with the number of jumps. (c) Initial atomic configuration (random solid solution)
and (d) aged structure showing B2-ordered clusters.

Figure S5. Variation of the chemical order parameter as a function of the number of diffusion
jumps obtained from NNK simulations. The simulations are conducted at twenty different
temperatures, ranging from 3,000 K to 100 K, as indicated in the labels.

Figure S6. B2 cluster identification. (a-e) Clusters consisting of two B2 cells that share volume, a
face, an edge, or a vertex. (f) The corresponding separation distance between the two B2 cells.

Figure S7. Formation and coalescence of B2 clusters in a small equimolar NbMoTa model.
(a) Variation in B2-centred atoms as the number of jumps increases. (b) The same configuration
displayed with the entire B2 cells visible; after 10⁵ jumps, the two clusters combine into one, shown
in green. (c) B2 cluster size distribution obtained after varying numbers of jumps. (d) Number of B2
cells as a function of atomic jumps, and (e) decrease in the number of isolated B2 cells with
increasing atomic jumps.

Figure S8. Variation of chemical order obtained at different annealing temperatures using
random-swap MC/MD simulations (see Methods).

Figure S9. B2 structure morphology generated from a random-swap MC/MD simulation,
exhibiting a more uniform distribution.

2. On-lattice structure and chemistry representation
We use an on-lattice representation to convert local atomic environments into digit matrices in
which each value represents one atom or a vacancy. To achieve this, we follow two rules: divide the
material model into a grid of pixels, and place each atom at the center of one pixel. Owing to the
periodicity of crystalline structures, these rules provide a systematic way to digitize the material
model.

Figure S10. On-lattice representation and pixel size determination. (a) A unit cell with lattice
constant $a$. (b-c) The grid dividing the atomic model into uniform cells (pixels), with pixel size
$s = a/2$ (b) and $s = a/4$ (c).

Figure S10 schematically illustrates the on-lattice representation, which converts a 2D atomic
structure into a matrix. The conversion uses a pixel grid that divides the structure into uniform cells,
or pixels. For the bcc structure, the largest pixel size that still distinguishes all lattice sites, and
therefore yields the smallest voxel-grid dimensionality, is $s = a/2$, where $a$ is the lattice constant
of the crystal (Figure S10b). In general, the structure domain can be divided equally into pixels of
size $s = a/(2n)$, where $n = 1, 2, 3, \ldots$; for instance, Figure S10c shows the representation
with pixel size $s = a/4$. Once the model is converted into pixels, each pixel is encoded based on
the local atom type, as illustrated in the main text. The pixel size depends on the material structure
alone and involves none of the hyperparameters that typically appear in other structure descriptors,
so no hyperparameter tuning is needed. Furthermore, it lets us use the largest pixel that fully
captures the local structural and chemical information, reducing storage requirements and
accelerating the training of machine learning models. Figure S1 illustrates the process of
converting local atomic environments into digit matrices for a 3D crystal.
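The two rules above translate into a few lines of NumPy for a cubic bcc supercell: divide the positions by the pixel size $s = a/(2n)$ and round to the nearest integer. The sketch makes three assumptions:

- An orthorhombic periodic box aligned with the cube axes.
- Atoms at ideal lattice sites.
- A hypothetical type-to-digit mapping, with nonzero digits for atoms and 0 for empty pixels.

Relaxed positions would need the same rounding to land on the right pixel.

```python
import numpy as np


def bcc_positions(a, n_cells):
    """Corner and body-center sites of an n_cells^3 cubic bcc supercell."""
    r = np.arange(n_cells)
    corners = np.array(np.meshgrid(r, r, r, indexing="ij")).reshape(3, -1).T * a
    return np.vstack([corners, corners + 0.5 * a])


def voxelize_bcc(positions, species_codes, a, n_cells, n=1):
    """Map atoms of a cubic bcc supercell onto a 3D integer grid with pixel size s = a/(2n).

    positions     : (N, 3) Cartesian coordinates in [0, n_cells * a)
    species_codes : (N,) nonzero integer digits (hypothetical type-to-digit mapping)
    Empty pixels keep the value 0.
    """
    s = a / (2 * n)
    size = 2 * n * n_cells
    grid = np.zeros((size, size, size), dtype=np.int8)
    idx = np.rint(np.asarray(positions) / s).astype(int) % size   # periodic wrap
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = species_codes
    return grid
```

Every occupied pixel has three indices of equal parity (all even for corner sites, all odd for body centers). This is why $s = a/2$ is the coarsest grid that still separates all bcc sites. A vacancy-centered patch, as in Figure S1, would then be cropped from this grid.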

3. Determining the cutoff distance
Vacancy diffusion and the associated activation barrier depend on the local atomic environment. The
influence of surrounding atoms on vacancy diffusion should decay with distance and, beyond a
certain critical distance, become negligible. To determine this critical distance, we examine how
model prediction performance depends on the cutoff distance. Figure S11a presents the radial
distribution function g(r) of a NbMoTa alloy, which shows that atoms within 7.5 Å fall into eight
shells. A larger cutoff distance includes atoms in higher-order shells and thus more atoms. Figure
S11b shows the dependence of the number of atoms on the cutoff distance: it increases from 8 to
112 as the cutoff distance increases from 3.0 to 7.5 Å (i.e., atoms in more shells are included),
yielding a more informative representation of the local environment. For each cutoff distance, we
create a dataset from four alloys, Nb33Mo33Ta33, Nb50Mo25Ta25, Nb25Mo50Ta25, and Nb25Mo25Ta50,
each containing 2000 atoms. For each alloy, we compute the activation barriers at all lattice sites
along all directions, resulting in 64,000 barriers and corresponding images in the dataset. The
dataset is split into two parts, with 80% used for training and 20% for validation. For each cutoff
distance, we train a neural network with 4 hidden layers and 128 units per hidden layer. Figure S11c
shows the mean absolute errors (MAEs) of prediction on both the training and validation datasets at
different cutoff distances. The validation error decreases from 0.117 eV to 0.036 eV as the cutoff
distance increases from 3.0 to 7.5 Å and nearly converges between 7.0 and 7.5 Å, indicating that
7.5 Å is an effective cutoff distance for representing the local atomic environment. However, we
note that for most cutoff distances the network and dataset are not well balanced, as evidenced by
the gap between training and validation errors (i.e., overfitting). Further tuning of the network
architecture could address this. Nonetheless, our goal here is solely to demonstrate how the cutoff
distance influences diffusion barrier prediction, using an identical neural network model for all
cases. We expect the conclusion to remain unchanged if the network were further tuned for each
case.
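The growth of the neighbor count with cutoff can be reproduced by counting ideal bcc lattice sites. The sketch assumes a lattice constant of 3.25 Å for NbMoTa and a perfect, unrelaxed lattice. This lattice constant is an approximate value; the text does not give it. For any $a$ between roughly 3.07 and 3.35 Å, the count is 8 neighbors at 3.0 Å and 112 at 7.5 Å, as stated above.

```python
import numpy as np


def bcc_neighbor_count(cutoff, a):
    """Number of ideal bcc lattice sites within `cutoff` of a site (the site itself excluded)."""
    m = int(np.ceil(cutoff / a)) + 1
    r = np.arange(-m, m + 1)
    corners = np.array(np.meshgrid(r, r, r, indexing="ij")).reshape(3, -1).T.astype(float)
    sites = np.vstack([corners, corners + 0.5]) * a   # corner + body-center sublattices
    d = np.linalg.norm(sites, axis=1)
    return int(np.count_nonzero((d > 1e-9) & (d <= cutoff)))


A_NBMOTA = 3.25   # assumed lattice constant (Å); replace with the relaxed value
for rc in (3.0, 4.0, 5.0, 6.0, 7.0, 7.5):
    print(rc, bcc_neighbor_count(rc, A_NBMOTA))
```

Shell-by-shell, this counts 8, 6, 12, 24, 8, 6, 24, and 24 sites, which sum to 112 for the eight shells inside 7.5 Å.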


Figure S11. Effect of cutoff distance on neural network prediction. (a) Radial distribution
function g(r) of bcc NbMoTa. (b) Number of neighboring atoms surrounding a vacancy as a
function of cutoff distance. (c) Machine learning prediction error on the training and validation
datasets as a function of cutoff distance, converging at 7.5 Å.

4. Architecture of neural network and convolutional neural network
Two critical parameters determine the architecture of a neural network: the number of layers and
the number of neurons in each layer. To understand the influence of architecture on prediction
performance, we train different neural networks using a dataset containing 46 compositions
(Figure S12a), with 16,000 barrier data points each. The dataset is split into two parts, 95% for
training and 5% for validation. We train a set of neural networks with different numbers of hidden
layers (1-4) and hidden-layer units (16-256), using 69,920 data points (10% of the whole training
dataset), and then compare the performance of the models on the validation dataset. Figure S12b
shows the mean absolute errors of prediction for these models, and Figure S13 presents a direct
comparison between true and predicted values from the different models. The prediction error
decreases as either the number of layers or the number of neurons increases, and begins to converge
for the model with 2 layers of 128 neurons. This suggests that the second-order interactions
captured by two hidden layers are sufficient to describe the vacancy-atom interactions. Additionally,
convergence at 128 neurons may have a physical meaning, as 128 neurons are enough to explicitly
capture the 112 neighboring atoms of a vacancy.
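A forward pass of the fully connected network discussed above, with two hidden layers of 128 ReLU units and one linear output, is sketched below in NumPy. The input length of 112 (one digit per neighbor site within 7.5 Å) is an assumption made for illustration. He initialization and the function names are also assumptions, and training (loss, optimizer) is omitted.

```python
import numpy as np


def init_mlp(n_in, hidden=(128, 128), seed=0):
    """He-initialized weights for a fully connected regressor (sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, *hidden, 1]
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x):
    """ReLU hidden layers and a linear output: one predicted barrier (eV) per input row."""
    h = np.asarray(x, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return (h @ W + b)[:, 0]


def n_parameters(params):
    """Total count of trainable weights and biases."""
    return sum(W.size + b.size for W, b in params)
```

With these sizes (112 → 128 → 128 → 1), the network has 31,105 trainable parameters.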

In addition to the classic neural network, we have also trained a convolutional neural network
(CNN) using the same datasets. Figure S14 depicts the structure of the CNN, which comprises one
input layer, four convolutional layers, and one output layer. The 3D neuron map (image) is fed to
the input layer, and each of the four convolutional layers applies filters of size 3 × 3 × 3. The
numbers of filters in the convolutional layers are 32, 64, 128, and 128, equal to the numbers of
channels of the generated feature maps. Consequently, the data dimension reduces from the original
9 × 9 × 9 × 1 to 1 × 1 × 1 × 128. Each convolution is followed by batch normalization (before the
activation function), which reduces sensitivity to parameter initialization and provides
regularization. The Rectified Linear Unit (ReLU) serves as the activation function, and the output of
the final convolutional layer is flattened into a one-dimensional vector of length 128 before being
passed to the output layer. The output layer, comprising a single neuron, predicts the diffusion
barrier. The CNN model is trained for 100 epochs using the Adam optimizer with an initial learning
rate of 0.001 and a batch size of 32. After each epoch, the model is evaluated on the validation
dataset to monitor the loss, measured by the mean squared error. If the validation loss fails to
decrease for 10 consecutive epochs, the learning rate is reduced by a factor of 10. The smaller
learning rate reduces oscillation, avoids divergence of the optimization, and helps convergence to a
nearby minimum. The minimum allowed learning rate is 1 × 10⁻⁵, since a learning rate that is too
low can greatly slow down training and waste computational resources. Once the learning rate
reaches this minimum, it remains constant for the rest of training. Training ends after 100 epochs
regardless of whether the learning rate has reached the minimum value. The parameters of the
best-performing model are saved for further use.
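Two details of the CNN paragraph can be checked numerically:

1. Four stride-1, unpadded 3 × 3 × 3 convolutions shrink each spatial axis by 2 (9 → 7 → 5 → 3 → 1). This is consistent with the stated reduction from 9 × 9 × 9 × 1 to 1 × 1 × 1 × 128. The padding and stride are not stated, so this is an inference.
2. The learning-rate rule can be written as a small plateau scheduler. The class below is a hypothetical hand-written version. Keras `ReduceLROnPlateau` and PyTorch `torch.optim.lr_scheduler.ReduceLROnPlateau` offer similar behavior, though their improvement thresholds and patience counting differ slightly. The text does not say which framework was used.

```python
import math


def conv_out(n, k=3, stride=1, pad=0):
    """Output length along one axis for a convolution with kernel size k."""
    return (n + 2 * pad - k) // stride + 1


def cnn_shapes(n_in=9, channels=(32, 64, 128, 128), k=3):
    """Spatial sizes after each conv layer and per-layer conv parameters (weights + bias)."""
    sizes, n_params, c_in = [n_in], [], 1
    for c_out in channels:
        sizes.append(conv_out(sizes[-1], k))
        n_params.append(k**3 * c_in * c_out + c_out)
        c_in = c_out
    return sizes, n_params


class PlateauLR:
    """Multiply the learning rate by `factor` after `patience` epochs without a
    decrease in validation loss, never going below `min_lr`."""

    def __init__(self, lr=1e-3, factor=0.1, patience=10, min_lr=1e-5):
        self.lr, self.factor, self.patience, self.min_lr = lr, factor, patience, min_lr
        self.best, self.wait = math.inf, 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


sizes, n_params = cnn_shapes()
print(sizes, n_params)   # [9, 7, 5, 3, 1] [896, 55360, 221312, 442496]
```

If the validation loss never improves, this rule drops the rate to 10⁻⁴ after epoch 10 and to the 10⁻⁵ floor after epoch 20, where it stays for the remaining epochs.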

Figure S12. Neural network prediction performance with different numbers of hidden layers
and units. (a) The 46 compositions selected from the NbMoTa compositional space for training the
neural network models. (b) Prediction errors of the different neural network models on the
validation dataset.

Figure S13. Prediction performance of neural network models with varying numbers of hidden
layers and units. The prediction accuracy increases with the number of hidden layers and units.

Figure S14. Architecture of the convolutional neural network, consisting of one input layer,
four convolutional layers, and one output layer.

5. Number of compositions for predicting the entire ternary composition space
We train both neural network and CNN models on different numbers of compositions to understand
how many compositions are required to predict the entire compositional space. Figure S15a depicts
the 46 compositions uniformly distributed in the compositional space. Figure S15b-d illustrate 1, 4,
and 10 compositions (red points), respectively, located in the center region of the compositional
space. Each composition comprises 16,000 barrier data points. For the 1-composition dataset, 80%
and 20% of the data are used for training and validation, respectively. For the 4-composition dataset,
90% and 10% are used for training and validation, respectively. For the 10-composition and
46-composition datasets, 95% and 5% are used for training and validation, respectively.

Figure S16 presents the prediction performance (i.e., MAEs) in an unseen equimolar NbMoTa alloy
as a function of the number of compositions used for training. The prediction error decreases
rapidly as the number of training compositions increases from 1 to 4. When the number of
compositions increases from 4 to 46, the prediction error decreases further by a small amount. This
trend indicates that adding data from dilute solutions (i.e., the corners of the compositional space)
can improve prediction performance, but less significantly than data from concentrated solutions.
The CNN trained on the 4-composition dataset performs remarkably well (MAE = 0.026 eV,
compared with an average barrier of around 1.5 eV), implying that it has deciphered the chemical
complexity and successfully linked it to diffusion barriers. For the neural network models, the
prediction errors barely change as the number of hidden layers increases, suggesting that 4 layers
are sufficient for barrier prediction. Compared with the neural network, the CNN achieves lower
MAEs, implying that the added convolutional layers capture the larger-scale atomic patterns that
contribute to vacancy migration. In Figure S17, we present the performance of the CNN model
(trained on the 46-composition dataset) for unseen compositions and different model sizes. The
networks trained on small configurations accurately predict diffusion barriers in larger atomic
configurations (computational scalability).

[Figure S15 panels (a-d): Nb-Mo-Ta ternary composition diagrams; axes Nb (at. %), Mo (at. %), and Ta (at. %), with ticks at 25, 50, and 75.]

Figure S15. Compositions used for building the different training datasets. (a) Forty-six
compositions uniformly covering the NbMoTa compositional space. (b-d) One, four, and ten
compositions (red points), respectively, located at the center of the compositional space,
representing concentrated alloys.

[Figure S16 legend: NN with 4, 6, and 10 hidden layers; CNN.]

Figure S16. Prediction error of neural network (NN) and CNN as a function of the number
of compositions used during training. The evaluation is done on previously unseen compositions
in Nb-Mo-Ta, and the results indicate that including more than four compositions leads to a rapid
convergence of the network's performance.

[Figure S17 panels: CNN prediction MAE for each composition and system size]
Composition      512 atoms    2000 atoms    6750 atoms
Nb10Mo10Ta80     0.013 eV     0.013 eV      0.014 eV
Nb20Mo60Ta20     0.018 eV     0.018 eV      0.019 eV
Nb40Mo30Ta30     0.019 eV     0.021 eV      0.021 eV

Figure S17. Performance of CNN in predicting the diffusion barrier spectrum in unseen
compositions and for varying system sizes (scalability). Three compositions, Nb10Mo10Ta80,
Nb20Mo60Ta20, and Nb40Mo30Ta30, and three systems containing 512, 2,000, and 6,750 atoms are
shown. The architecture of the convolutional neural network is illustrated in Figure S14.
