Master Thesis

X-ray line profile analysis using artificial intelligence algorithms

Dajka Bence

Physics MSc, Scientific Data Analytics and Modeling specialization

Consultant:

Dr. Ribárik Gábor

ELTE, Department of Materials Physics


Abstract
X-ray line profile analysis is a valuable technique for determining the structural properties of crystalline materials. However, the manual analysis methods traditionally used in this field are time-consuming and subjective. This master thesis focuses on the development and application of artificial intelligence (AI) algorithms to predict the structural parameters of X-ray diffraction (XRD) samples, aiming to enhance the efficiency and accuracy of the analysis process. The models leverage advanced machine learning techniques, including deep neural networks, to extract relevant features and accurately estimate size and strain parameters. The thesis also highlights different approaches to dealing with a large dataset, such as out-of-core learning and dimension reduction. The Convolutional Multiple Whole Profile (CMWP) software is utilized to generate synthetic datasets for training and validating the AI models, enabling comprehensive evaluations of their performance. The thesis also proposes an extension to this software to accelerate the serial evaluation of XRD peak profiles. The outcomes of this research demonstrate the potential of AI algorithms for predicting the structural parameters of XRD samples.

Acknowledgement
I would like to thank Gábor Ribárik for helping me to prepare and write
the thesis.

Budapest, 31 May 2021.

Contents

1 Theory of X-ray Diffraction
  1.1 Geometry of Crystalline Materials
  1.2 Elements of X-ray Diffraction
  1.3 X-ray Line Broadening
    1.3.1 Size broadening
    1.3.2 Strain broadening

2 Data Generation and Preprocessing
  2.1 CMWP
  2.2 Preprocessing
    2.2.1 Normalization
    2.2.2 Standardization

3 Statistical Description of Multi-Output Regression
  3.1 Overview of Regression in Statistics
  3.2 Multi-Output Regression
  3.3 Loss Functions
    3.3.1 Mean Squared Error (MSE)
    3.3.2 Mean Absolute Error (MAE)
  3.4 Evaluation Metrics
    3.4.1 R-squared (R²)
    3.4.2 Root Mean Squared Error (RMSE)
    3.4.3 Relative Root Mean Squared Error (RRMSE)

4 Machine Learning Approaches for Multi-Output Regression
  4.1 Tree-Based Models
    4.1.1 Random Forest Regressor
    4.1.2 Extremely Random Trees Regressor
  4.2 K-nearest Neighbors model
  4.3 Ensemble methods
    4.3.1 Averaging
    4.3.2 Bagging (Bootstrap Aggregating)
    4.3.3 Boosting
  4.4 Neural Networks
    4.4.1 Structure of Neural Networks
    4.4.2 Working Principles of Neural Networks
    4.4.3 Backpropagation and Stochastic Gradient Descent

5 Results
  5.1 Out-of-core learning
  5.2 Dimensionality reduction
  5.3 Out of the box methods
  5.4 Ensemble methods
  5.5 Neural networks
  5.6 Serial evaluation

6 Conclusion

List of Figures

1.1 Profile function for large number of atoms
1.2 Size function with fixed σ
1.3 Size function with fixed m
1.4 Dislocation configurations
1.5 Strain profile
2.1 Al6Mg6 XRD peak pattern
2.2 CMWP fit example

List of Tables

2.1 Parameters for the theoretical XRD pattern generation.
1 Theory of X-ray Diffraction
This chapter lays the groundwork for understanding the fundamental principles
and concepts that underlie the research presented in this thesis. Its purpose is to
provide a comprehensive theoretical framework that enables a deeper understanding
of the phenomena, models, and methodologies employed throughout the study.

1.1 Geometry of Crystalline Materials


In the field of solid-state physics, our focus is on crystalline materials, which exhibit long-range periodicity in their structure. A key characteristic of these materials is their translational symmetry, which implies the existence of an elementary cell within the material. When this elementary cell is translated, it fills the entire sample without any gaps. Mathematically, this translation can be described as follows:

r′ = r + n1 a1 + n2 a2 + n3 a3 = r + R,   (1.1)

where r and r′ represent identical positions. Here, a1, a2, and a3 are the primitive lattice vectors, and the R vectors are the lattice vectors formed by combining the primitive lattice vectors as R = Σ_{i=1}^{3} ni ai, where the ni are integers.

The elementary cell, which is a parallelepiped spanned by the primitive lattice vectors, represents the smallest repeating unit of the material. Its volume is given by:

V0 = a1 · (a2 × a3).   (1.2)
Within the elementary cell, the positions, types, and numbers of atoms (ions,
molecules) are determined. These can range from one to several thousand, depend-
ing on the complexity of the crystal. The combination of the elementary cells and
the atoms within them is referred to as the basis. The basis repeats periodically
in crystals, and the endpoints of the lattice vectors define a lattice structure. By knowing the lattice structure and the basis, we can define the crystal structure.
The repeated spatial patterns present in the crystal structure are referred to as the symmetries of the crystal. Symmetry elements, such as rotations and reflections, are used to characterize these patterns. In two dimensions, there are ten such
elements, which give rise to plane point groups. By combining these elements and considering the symmetries arising from glide (sliding) operations, a total of 17 plane groups can be obtained. Expanding the discussion of symmetries to three dimensions, the 17 plane groups are replaced by 230 space groups, and the ten plane point groups by 32 point groups. Additionally, in place of the five distinct plane lattices there are 14 distinct three-dimensional lattice types, known as Bravais lattices. The space groups characterize the symmetry of the structure, while the Bravais lattices describe the elementary cells. To distinguish these symmetry groups, they are denoted by letter or number symbols. For example, m originates from the word mirror and denotes reflection for symmetry elements, while p refers to primitive lattices for lattice types.
The 14 distinct Bravais lattices can be classified into 7 crystal systems: the cubic, tetragonal, orthorhombic, rhombohedral, hexagonal, monoclinic, and triclinic systems. The specific crystal system to which a crystal belongs is determined by the lengths of the sides of its elementary cell and the angles between them.
Materials with a crystalline structure exhibit long-range periodicity in their
atomic arrangement. To characterize the lattice structure, we introduce a concept
called the reciprocal lattice. Instead of using the lattice vectors a1 , a2 , and a3 , we
define reciprocal lattice vectors b1, b2, and b3 with a unique correspondence such
that

ai · bj = 2πδij , (1.3)
where δij is the Kronecker delta. The reciprocal lattice vectors can be obtained
as follows:

b1 = 2π (a2 × a3) / V0,   (1.4)
b2 = 2π (a3 × a1) / V0,   (1.5)
b3 = 2π (a1 × a2) / V0,   (1.6)

where V0 is the volume of the lattice cell given by

V0 = a1 · (a2 × a3 ). (1.7)
The volume of the reciprocal lattice cell, denoted as VB , is given by

VB = b1 · (b2 × b3) = 8π³ / V0.   (1.8)
The reciprocal lattice vectors can be expressed as linear combinations of the
reciprocal lattice basis vectors:
K = Σ_{j=1}^{3} Kj bj,   where Kj = 0, ±1, ±2, . . .   (1.9)

The reciprocal lattice vectors span a reciprocal lattice in the reciprocal space,
and their scalar products with the corresponding lattice vectors are integer multiples
of 2π .
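As a minimal numerical check of equations (1.3)-(1.8), the reciprocal lattice vectors can be computed for an arbitrarily chosen set of primitive vectors (the values below are purely illustrative) and verified against the defining relation ai · bj = 2πδij:

import numpy as np

# Illustrative primitive lattice vectors (arbitrary, non-orthogonal choice).
a1, a2, a3 = np.array([3.0, 0.0, 0.0]), np.array([0.0, 4.0, 0.0]), np.array([1.0, 1.0, 5.0])

V0 = np.dot(a1, np.cross(a2, a3))            # cell volume, Eq. (1.7)
b1 = 2 * np.pi * np.cross(a2, a3) / V0       # Eq. (1.4)
b2 = 2 * np.pi * np.cross(a3, a1) / V0       # Eq. (1.5)
b3 = 2 * np.pi * np.cross(a1, a2) / V0       # Eq. (1.6)

A = np.vstack([a1, a2, a3])
B = np.vstack([b1, b2, b3])
# The matrix of scalar products a_i . b_j must equal 2*pi times the identity, Eq. (1.3).
assert np.allclose(A @ B.T, 2 * np.pi * np.eye(3))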
Lattice planes are crucial for characterizing the lattice structure. Lattice planes that are equidistant and parallel to each other form a lattice plane set. These plane sets are characterized by Miller indices, denoted as (hkl). To determine the Miller indices, we consider the three lattice vectors ai (i = 1, 2, 3) as a coordinate system whose axes are intersected by the lattice planes. The reciprocals of the distances at which a lattice plane intersects the axes, multiplied by the least common multiple of these distances, yield the Miller indices (hkl).
The reciprocal lattice vectors and lattice plane sets are uniquely related. Specifically, the vector (a1/h − a2/k) lies within the (hkl) lattice plane. The scalar product of this vector with the reciprocal lattice vector Khkl is zero, which also applies to the vectors (a2/k − a3/l) and (a3/l − a1/h):

Khkl · (a1/h − a2/k) = Σ_{j=1}^{3} Kj bj · (a1/h − a2/k) = 0.   (1.10)

The distance dhkl between two neighboring (hkl) lattice planes can be obtained by expressing the distance from the origin of the plane intersecting the axes at a1/h, a2/k, and a3/l. This can be calculated by finding the projection of the vector a1/h onto the normal of the (hkl) plane, denoted as Khkl:

dhkl = (a1/h) · Khkl / |Khkl| = (a1/h) · (hb1 + kb2 + lb3) / |Khkl| = 2π / |Khkl|.   (1.11)
Thus, the reciprocal lattice vectors and lattice plane sets uniquely determine
each other in reciprocal space. They provide essential information about the lattice
structure in solid-state physics.

1.2 Elements of X-ray Diffraction


X-ray radiation, which has a wavelength on the order of 10⁻¹⁰ meters, is well suited for studying the structure of crystalline materials due to its similarity in magnitude to the lattice constants. When X-rays interact with a solid, they diffract, and by analyzing the spatial positions and relative intensities of the resulting intensity maxima, we can determine the crystal structure of unknown materials. The description of X-ray scattering is based on the following assumptions:

• X-ray photons have a constant wavelength, resulting in elastic scattering.

• The phase shift during diffraction is consistent among different scattering centers, ensuring coherent scattering.

• Multiple scattering events can be neglected.

Coherent scattering gives rise to an interference pattern determined by the spatial


arrangement of scattering centers. Let's consider two scattering centers: one is chosen
as the origin (O) of a coordinate system, and the other center (P ) is at a distance
r from it. A beam of X-rays with wavelength λ, represented by the wave vector k0
(|k0 | = 2π/λ), is incident on these points. The wave vector of the scattered radiation
is denoted as k, and due to elastic scattering, its wavelength is also λ (|k| = 2π/λ).
The following relation can be established:

(k0 − k) · r = −K · r,   (1.12)

where K = k − k0 is the scattering vector. The amplitude of the radiation in the
direction of k is given by:

A(K) = A0 e^{−iK·r},   (1.13)
where A0 depends on the scattering strength and the intensity of the incident
beam. When considering multiple scattering centers, the resulting amplitude can be
expressed as the sum of the individual amplitudes:
A(K) = Σ_n ( Σ_p fp e^{−iK·rp} ) e^{−iK·Rn},   (1.14)

where Rn is the vector from the origin O to the nth lattice cell, and rp is the
position vector of the pth atom in the primitive cell. The term within the parentheses
is known as the structure factor, which describes the scattering of a cell. It can be
written as:

F(K) = Σ_p fp e^{−iK·rp},   (1.15)

where fp represents the scattering contribution of the pth atom in the primitive
cell, depending on the atom type and the directions of the incident and scattered
beams. Substituting this into the equation for A(K), we have:

A(K) = F(K) Σ_n e^{−iK·Rn}.   (1.16)

The intensity of the radiation can be obtained by squaring the absolute value of
the amplitude:

I(G) = |A(G)|² = |F(G)|² · |Σ_n e^{−iG·Rn}|².   (1.17)


Next, let's examine the conditions for obtaining intensity maxima. For the second term on the right-hand side of the expression for I(G) to yield a maximum, the dot product of the vector K with every lattice vector Rn must be an integer multiple of 2π. As discussed in the section on the reciprocal lattice, the vectors satisfying this condition are exactly the reciprocal lattice vectors. Thus, we can state that intensity maxima occur in the directions of k where:

k − k0 = Khkl,   (1.18)

where Khkl = hb1 + kb2 + lb3 is the scattering vector corresponding to the (hkl) Miller indices. During X-ray diffraction measurements, the detector rotates
around the sample to investigate the scattered rays, allowing determination of the reciprocal lattice vectors Khkl, which characterize the lattice structure.
The equation k − k0 = Khkl can be expressed in another form. The difference between the wave vectors k and k0, drawn from a common starting point, is equal to the reciprocal lattice vector Khkl. The magnitudes of the two wave vectors are 2π/λ, while the magnitude of the reciprocal lattice vector is 2π/dhkl. In the isosceles triangle formed by these three vectors, the following relationship holds:

2dhkl sin Θ = λ, (1.19)


where Θ is half the angle between k and k0 . This equation demonstrates that for
radiation with wavelength λ and lattice spacing dhkl , we observe maximum intensities
at angles 2Θ relative to the incident direction.
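As a brief worked example of equation (1.19), the sketch below assumes a cubic lattice, where dhkl = a/√(h² + k² + l²), an aluminium-like lattice parameter a ≈ 4.05 Å, and Cu Kα radiation with λ ≈ 1.5406 Å; these values are illustrative and not tied to the thesis dataset:

import numpy as np

a, lam = 4.05, 1.5406                 # lattice parameter and wavelength in angstrom (assumed values)
h, k, l = 1, 1, 1
d_hkl = a / np.sqrt(h**2 + k**2 + l**2)          # cubic d-spacing
theta = np.arcsin(lam / (2 * d_hkl))             # Bragg angle from Eq. (1.19)
print(f"d_111 = {d_hkl:.3f} A, 2*theta = {np.degrees(2 * theta):.2f} deg")
# For these assumed values: d_111 is about 2.34 A and 2*theta is about 38.5 deg.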

1.3 X-ray Line Broadening


In X-ray diffraction measurements, the intensity profiles I(2θ) of different reflections are obtained. To simplify the analysis, the profiles are often transformed to a more convenient variable g, also denoted as K, which represents the reciprocal space. The variable g is given by g = 2 sin θ / λ, where θ is the scattering angle and λ is the wavelength of the X-ray radiation. The difference between g and a reference value gB at the exact Bragg position can be approximated as:

s = g − gB ≈ (2 cos θB / λ) ∆θ,   (1.20)


where θB is the Bragg angle and ∆θ is the angular deviation from θB . The
important parameters characterizing the I(s) intensity function corresponding to
the Bragg peak at 2θB are:

• The maximum intensity:

I0 = max{I(s) | s ∈ ℝ}.   (1.21)

• The Full Width at Half Maximum (FWHM):

FWHM(I(s)) = s2 − s1,   where s1 < s2 and I(s1) = I(s2) = I0/2.   (1.22)

• The integral breadth (the area under the intensity curve normalized by the maximum intensity):

β = ( ∫_{−∞}^{∞} I(s) ds ) / I0.   (1.23)

1.3.1 Size broadening


As the scattering volume decreases, diffraction profiles exhibit broadening, known as size broadening. X-ray measurements are used to determine the size of coherently scattering domains, or crystallites. Let's consider an infinite plane crystallite with a thickness of N atoms. According to the theory of kinematical X-ray scattering, the line profile of this specific crystallite is described by the function:

I(s) ∼ sin²(Nx) / sin²(x),   (1.24)

where x = πGa, G = g + ∆g, g is the diffraction vector, ∆g is a small vector, and a is the unit cell vector perpendicular to the plane of the crystallite. The function sin²(Nx)/sin²(x) represents the shape and position of the peaks in this particular case. Figure 1.1 illustrates this function for different values of N. It reaches a maximum value at positions x = nπ, where n ∈ ℤ; this condition is equivalent to the Laue equations. The maximum value of this function is given by lim_{x→0} sin²(Nx)/sin²(x) = N².

For large values of N, the profile function can be approximated by the following simple expression:

sin²(Nx) / sin²(x) ≈ N² ( sin(Nx) / (Nx) )².   (1.25)
The theoretical description requires the Fourier transform of the intensity profile function sin²(Nx)/x² as well:

π(N − π|L|)   if |L| ≤ N/π,
0             if |L| > N/π.
Figure 1.1: The sin²(Nx)/sin²(x) function, close to its first maximum, for different N values.

For an infinite plane crystallite with a thickness of N atoms, the Fourier transform of the size function I(s) is given by:

A^S(L) = (N/L_{G,a}) (N − (N/L_{G,a}) |L|)   if |L| ≤ L_{G,a},
A^S(L) = 0                                   if |L| > L_{G,a},
where L_{G,a} can be determined from the initial slope of the Fourier transform. The size parameter L0 is generally defined for an arbitrary I(s) intensity profile through the initial slope of its Fourier transform A^S(L):

−A^S(0) / L0 = dA^S(L)/dL |_{L=0}.   (1.26)

A polycrystalline or fine powder sample consists of multiple crystallites with varying sizes, which can be characterized by a size distribution function. Theoretical calculations of size-broadened profiles can be performed by selecting an appropriate size distribution and assuming a realistic crystallite shape. Several distribution functions have been proposed to describe the size distribution of crystallites, but one of the most flexible options is the lognormal size distribution.
It has been observed that a milling procedure leads to a lognormal size distribution, making it widely employed in microstructural investigations. In this distribution, the logarithm of the crystallite size follows a normal distribution. The density function of the lognormal size distribution can be expressed as:

f(x) = 1/(√(2π) σ x) · exp( −(log(x/m))² / (2σ²) ),   (1.27)

where m and σ are the distribution parameters. The quantity log m is the median and σ² is the variance of the underlying normal distribution. The parameters m and σ are referred to as the median and the variance of the lognormal size distribution, respectively.

By assuming a specific shape for the crystallites and a size distribution, it is possible to determine the theoretical size profile. One method is to calculate the size profile of a powder sample consisting of crystallites with arbitrary sizes and shapes. The procedure involves dividing the crystallites into columns parallel to the diffraction vector g and obtaining the size intensity profile as the volume-weighted sum of the intensity profiles, normalized by their integral intensities, for each column. The normalized intensity profile for a column with an area Ai and height Mi is given by:

sin²(Mi πs) / (Mi (πs)²).   (1.28)

By summing the contributions from all columns of all crystallites, using the volume of the column as a weight, the intensity distribution becomes:

I(s) ∼ Σ_i Ai Mi sin²(Mi πs) / (Mi (πs)²).   (1.29)

The volume sum of the columns with a height between M and M + dM from all
crystallites can be expressed as:

g(M) dM = Σ_j dVj(M, dM).   (1.30)

Using this quantity, the intensity distribution can be written as:


I(s) ∼ ∫ ( sin²(Mπs) / (M (πs)²) ) g(M) dM.   (1.31)

Thus, the size profile can be obtained by determining g(M) dM, which depends on the shape and size distribution of the crystallites.
For spherical crystallites and the lognormal size distribution, g(M) dM can be approximated by the volume of the part of a sphere with a column length between M and M + dM:

g(M) dM ≈ −2πy dy M,   (1.32)

where y is the distance of the column from the center of the sphere. For a sphere of diameter x the column length satisfies y² + (M/2)² = (x/2)², and differentiating this relation gives 2y dy = −(M/2) dM. Therefore, for one crystallite:

g(M) dM ∼ M² dM.   (1.33)


Since f (x)dx is proportional to the number of crystallites with a diameter be-
tween x and x + dx, and all crystallites with a diameter x ≥ M contain the column
length M , g(M )dM can be expressed as:
Z ∞
g(M )dM ∼ f (x)dx · M 2 dM (1.34)
M

Using the distribution density function given in equation 1.27, this integral can be expressed as:

∫_M^∞ f(x) dx = (1/2) erfc( log(M/m) / (√2 σ) ),   (1.35)

where erfc(x) is the complementary error function, defined as:

erfc(x) = (2/√π) ∫_x^∞ e^{−t²} dt.   (1.36)

Thus, for all crystallites, g(M) dM can be written as:

g(M) dM ∼ M² erfc( log(M/m) / (√2 σ) ) dM.   (1.37)

Using equation 1.31, the following size function is obtained:

I^S(s) = ∫_0^∞ ( sin²(Mπs) / (πs)² ) M erfc( log(M/m) / (√2 σ) ) dM.   (1.38)
This size function is plotted for different values of m and σ in Figures 1.2 and 1.3.

Figure 1.2: The size function for spherical crystallites with lognormal distribution with a fixed value of σ = 0.71, as a function of s.
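For illustration, the size function of equation 1.38 can also be evaluated numerically. The sketch below approximates the M integral by a finite sum on a uniform grid; the values of m, σ, and the integration cutoff are arbitrary illustrative choices, and this is not the implementation used by CMWP:

import numpy as np
from scipy.special import erfc

def size_profile(s_values, m=10.0, sigma=0.3, m_max=200.0, n_grid=4000):
    # Numerical evaluation of Eq. (1.38); m and sigma are in the same length
    # unit as M, and m_max truncates the integral (assumed large enough).
    M = np.linspace(1e-6, m_max, n_grid)
    dM = M[1] - M[0]
    weight = M * erfc(np.log(M / m) / (np.sqrt(2.0) * sigma))
    profile = np.empty(len(s_values))
    for i, s in enumerate(np.asarray(s_values, dtype=float)):
        if np.isclose(s, 0.0):
            kernel = M ** 2            # limit of sin^2(M*pi*s)/(pi*s)^2 as s -> 0
        else:
            kernel = np.sin(M * np.pi * s) ** 2 / (np.pi * s) ** 2
        profile[i] = np.sum(kernel * weight) * dM
    return profile

# Example usage: profile = size_profile(np.linspace(-0.5, 0.5, 201))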

1.3.2 Strain broadening


In a real crystal, lattice defects cause atoms to deviate from their ideal positions, leading to distortion in reciprocal space. This distortion results in strain broadening, where diffraction occurs not only at the ideal positions of reciprocal lattice points but also within a finite volume around them. To consider simultaneous size and strain effects, we can express the Fourier transform of the X-ray line profile.
The total Fourier coefficient, A(L), can be obtained as the product of the size Fourier coefficient, A^S(L), and the strain Fourier coefficient, A^D(L):

A(L) = A^S(L) · A^D(L).   (1.39)

The strain Fourier coefficient, A^D(L), takes the following form:

A^D(L) = exp( −2π² g² L² ⟨ε²_{g,L}⟩ ).   (1.40)

Here, g represents the absolute value of the diffraction vector, and ⟨ε²_{g,L}⟩ denotes the mean square strain, which depends on the atom displacements from their ideal positions. The spatial averaging is denoted by the brackets.

Figure 1.3: The size function for spherical crystallites with lognormal distribution with a fixed value of m = 2.72 nm, as a function of s.
Wilkens introduced the effective outer cutoff radius of dislocations, denoted as R*_e, as a length parameter instead of the crystal diameter. This modification eliminates the logarithmic singularity in the expression for the mean square strain. The crystal is divided into separate regions with a diameter of R*_e, where randomly distributed screw dislocations exist. Within each region, the distribution of dislocations is completely random with a density of ρ, and there is no interaction between dislocations outside these regions.
To characterize the dislocation arrangement, we can introduce the dimensionless parameter M*:

M* = R*_e √ρ.   (1.41)
The value of M* indicates the strength of correlation between dislocations: a small M* value implies a strong correlation, while a large M* value indicates a random distribution of dislocations within the crystallite. Figure 1.4 shows two dislocation configurations: in the first, the dislocations are strongly correlated and the value of M* is small; in the second, the correlation is weak and M* is large. Figure 1.5 presents the strain profile for fixed ρ and variable M* values.

Figure 1.4: Schematic representation of dislocation configurations and the corresponding strain profile.

The phenomenon of strain anisotropy refers to the non-monotonic behavior of profile broadening with respect to the hkl indices: the width of the diffraction profiles does not depend solely on the length of the diffraction vector or its square. This anisotropy arises from the anisotropic nature of the mean square strain ⟨ε²_{g,L}⟩, which depends on the hkl indices. The dependence is described by contrast factors, denoted as C, which are influenced by the elastic constants of the material and the relative orientation of the various vectors associated with dislocations.
In polycrystalline samples, only the average contrast factors can be observed, whereas in single crystals, individual contrast factors can be determined experimentally. The contrast factors determine the visibility, or broadening effect, of dislocations in diffraction experiments. A dislocation with a Burgers vector b perpendicular to the diffraction vector g has a negligible broadening effect (b · g = 0), while other combinations can contribute to profile broadening. The strain anisotropy in such materials can be effectively accounted for by considering the average contrast factors.
Figure 1.5: The shape of the strain profile for fixed ρ and variable M* values.

For cubic crystals, the average contrast factor can be expressed as a fourth-order polynomial of the hkl indices:

C = C_{h00} (1 − q H²),   (1.42)

where

H² = (h²k² + h²l² + k²l²) / (h² + k² + l²)².   (1.43)
For hexagonal crystals, the average contrast factor is given by:

C = C_{hk0} ( 1 + a1 H1² / (1 + a2 H2²) ),   (1.44)

where

H1² = (h² + k² + (h + k)²) l² / ( h² + k² + (h + k)² + (3/2)(a/c)² l² )²,   (1.45)

H2² = l⁴ / ( h² + k² + (h + k)² + (3/2)(a/c)² l² )².   (1.46)

These expressions provide a means to describe and understand the strain anisotropy observed in diffraction experiments for different crystal structures.

2 Data Generation and Preprocessing


The diffraction patterns used in this thesis are derived from powder diffraction measurements. The essence of powder diffraction lies in the presence of numerous randomly oriented crystalline particles within the irradiated volume. Consequently, the obtained intensity distributions are independent of the relative positions of the crystals. During the measurements, the scattered radiation intensity distribution is examined as a function of 2θ and transformed to k. An example of such a measurement result is shown in Figure 2.1.

Figure 2.1: Diffraction pattern of an Al6Mg6 sample

The peaks in the diffraction patterns correspond to individual lattice planes (hkl). The peak intensities depend, among other factors, on the incident X-ray intensity, the crystallographic structure factor, and the multiplicity. The multiplicity arises from the fact that different Miller-indexed lattice planes can correspond to the same dhkl value, and therefore the scattered beam appears at the same diffraction angle θ for these planes; consequently, the corresponding peaks in the diffraction pattern are more prominent. To ensure that each lattice plane has only one corresponding peak, it is important for the incident beam to be monochromatic, meaning that only a single-wavelength beam is scattered and reaches the detector. The next section discusses the CMWP method, which was used to generate the theoretical Al6Mg6 XRD peak patterns.

2.1 CMWP
The Convolutional Multiple Whole Profile (CMWP) fitting procedure, developed as a computer program, enables the determination of microstructural parameters from diffraction profiles of materials with cubic, hexagonal, or orthorhombic crystal lattices; however, its principles can be applied to all crystal systems. The method operates directly on the measured intensity pattern without requiring profile separation. Additionally, there is no need to correct the measured data for instrumental effects. Instead, the instrumental effect is incorporated into the theoretical pattern through convolution, thereby avoiding the numerical division of small numbers.
The fitting process involves directly modeling the entire measured powder diffraction pattern as the sum of a background function and profile functions obtained through convolution. These profile functions are derived from ab-initio theoretical functions representing size, strain, and planar faults, combined with the measured instrumental profiles. The convolutional equation for the theoretical pattern can be expressed as:

I_theoretical(2θ) = BG(2θ) + Σ_hkl I^hkl_MAX I^hkl(2θ − 2θ^hkl_0).   (2.1)

Here, BG(2θ) represents the background, I^hkl_MAX is the peak intensity, and 2θ^hkl_0 denotes the 2θ value at the peak center. The theoretical profile for the hkl reflection, I^hkl, is obtained by convolving the measured instrumental profile (I^instr.) with the ab-initio profile functions, including the size profile (I^size), the theoretical strain profile for dislocations (I^disl.), and the theoretical profile function for planar faults (I^pl. faults):

I^hkl = I^instr. ∗ I^size ∗ I^disl. ∗ I^pl. faults.   (2.2)
The CMWP method performs the convolution in Fourier space, utilizing the analytical form of the Fourier transforms of the profiles. This approach is advantageous in terms of computational efficiency compared to directly applying the definition of convolution. By inverse Fourier-transforming the product of the theoretical Fourier transforms and the complex Fourier transform of the corresponding measured instrumental profiles, the theoretical intensity profiles (I^hkl) are obtained. The fitting procedure can provide various microstructural parameters depending on the effects included, such as the median and variance of the size distribution, the ellipticity of crystallites, the density and arrangement of dislocations, strain anisotropy parameters, and the probability of planar faults. An example of a profile fit with the CMWP software can be seen in Figure 2.2.

Figure 2.2: XRD pattern fit with the CMWP software of an Al6Mg6 sample
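The Fourier-space convolution idea described above can be illustrated with a short sketch that multiplies the discrete Fourier transforms of sampled profiles. This is only a schematic illustration of the principle, not the CMWP implementation, and it assumes all profiles are sampled on the same equidistant grid:

import numpy as np

def convolve_profiles(*profiles):
    # Linear convolution of several sampled peak profiles via the product of
    # their zero-padded FFTs (illustrative sketch only).
    n_out = sum(len(p) for p in profiles) - len(profiles) + 1
    spectrum = np.ones(n_out, dtype=complex)
    for p in profiles:
        spectrum *= np.fft.fft(p, n_out)
    return np.real(np.fft.ifft(spectrum))

# Quick check against direct convolution of two Gaussian-like profiles:
# x = np.linspace(-5, 5, 201)
# g1, g2 = np.exp(-x**2), np.exp(-(x / 2)**2)
# assert np.allclose(convolve_profiles(g1, g2), np.convolve(g1, g2), atol=1e-8)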

For the thesis dataset, 1,591,200 theoretical Al6Mg6 XRD patterns were created. The parameters used for the data generation are listed in Table 2.1. These are arbitrary values, derived from experience with similar material samples, and they can be changed for other tasks.

2.2 Preprocessing
Machine learning methods require careful handling of the training data, as values on different scales can result in inaccurate models and wrong conclusions. Therefore, it is essential to preprocess the XRD peak intensities before training any model with them. In addition, certain data exploration methods, such as dimension reduction, may require the data to have zero mean and unit variance, which can also be achieved through preprocessing. In this thesis, the two most important data handling concepts are normalization and standardization.
Table 2.1: Parameters for the theoretical XRD pattern generation.
a b c d e
-3 5 0.05 0.1 0.01
-2.5 10 0.1 0.25 0.025
-2 15 0.15 0.5 0.05
-1.5 20 0.2 0.75 0.075
-1 25 0.25 1 0.1
-0.5 30 0.3 2.5 0.25
0 35 0.35 5 0.5
0.25 40 0.4 7.5 0.75
0.5 45 0.45 10 1
0.75 50 0.5 25 1.25
1 60 0.6 50 1.5
1.25 70 0.7 75 1.75
1.5 80 0.8 100 2
1.75 90 0.9 250
2 100 1 500
2.25 110 1.1 750
2.5 120 1.2 1000
2.75 130 1.3
140 1.4
150 1.5


2.2.1 Normalization
Normalization, also known as min-max scaling, is a data preprocessing technique that scales the features of a dataset to a fixed range, usually between 0 and 1. This is achieved by subtracting the minimum value from each feature and then dividing it by the range (i.e., the difference between the maximum and minimum values). Mathematically, the normalization of a feature x is given by:

x_norm = (x − min(x)) / (max(x) − min(x)),   (2.3)

where x_norm represents the normalized value of x. The resulting normalized values will lie within the range [0, 1].

Normalization is particularly useful when the absolute values of the features do
not matter, but their relative positions or proportions are important. It ensures that
all features have a similar scale, preventing any particular feature from dominating
the others during model training.
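A minimal sketch of equation 2.3 applied column-wise to a feature matrix (for example, XRD intensities sampled on a common 2θ grid; the matrix layout is an assumption made for illustration):

import numpy as np

def min_max_normalize(X):
    # Column-wise min-max scaling, Eq. (2.3); constant columns would need
    # special handling to avoid division by zero.
    X = np.asarray(X, dtype=float)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)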

2.2.2 Standardization
Standardization, also known as z-score normalization or feature scaling, is a data preprocessing technique that transforms the features of a dataset to have zero mean and unit variance. This is achieved by subtracting the mean value from each feature and then dividing it by the standard deviation. Mathematically, the standardization of a feature x is given by:

x_std = (x − mean(x)) / std(x),   (2.4)

where x_std represents the standardized value of x. The resulting standardized values will have a mean of 0 and a standard deviation of 1.


Standardization is useful when the absolute values of the features are meaningful and contribute to the analysis, but their scales differ significantly. It ensures that all features have a comparable scale, allowing models to interpret the relative importance of each feature correctly.
Both normalization and standardization aim to bring the features of a dataset to a common scale, but they differ in their scaling mechanisms. Normalization rescales the features to a fixed range, while standardization transforms the features to have zero mean and unit variance.
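Both scalings are also available as ready-made transformers in scikit-learn. The sketch below applies them to a random stand-in for the XRD intensity matrix (the data, shapes, and distribution are illustrative assumptions, not the thesis dataset):

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Random stand-in for the intensity matrix (rows: patterns, columns: 2-theta bins).
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=50.0, size=(100, 500))

X_std = StandardScaler().fit_transform(X)    # Eq. (2.4): zero mean, unit variance per feature
X_norm = MinMaxScaler().fit_transform(X)     # Eq. (2.3): each feature rescaled to [0, 1]

assert np.allclose(X_std.mean(axis=0), 0.0, atol=1e-9)
assert np.allclose(X_norm.min(axis=0), 0.0) and np.allclose(X_norm.max(axis=0), 1.0)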

3 Statistical Description of Multi-Output Regression
Regression analysis is a fundamental statistical technique used to model the
relationship between a set of predictor variables and a response variable. In its
traditional form, regression aims to estimate the conditional mean of the response
variable given the predictors. However, in many real-world scenarios, there is a need
to model and predict multiple response variables simultaneously. This gives rise to
the concept of multi-output regression, which extends the regression framework to
handle multiple dependent variables.

3.1 Overview of Regression in Statistics


In statistical regression, the goal is to understand and quantify the relationship
between a set of explanatory variables, often denoted as X , and a response variable,
denoted as Y . The objective is to estimate a regression function, denoted as f (X),
that can accurately predict the values of Y based on the observed values of X .
The regression function f (X) represents the underlying relationship between the
predictors and the response, which can be linear or nonlinear.
In a univariate regression setting, where there is a single response variable, the
regression function can be expressed as:

Y = f (X) + ε (3.1)
where Y is the response variable, f (X) is the regression function, and ε is a
random error term representing the variability that is not explained by the predic-
tors. The regression function f (X) is typically estimated using various statistical
methods, such as ordinary least squares (OLS) or maximum likelihood estimation
(MLE), to minimize the discrepancy between the observed values of Y and the
predicted values from f (X).

3.2 Multi-Output Regression


Multi-output regression, also known as multi-target regression or multivariate
regression, extends the regression framework to situations where there are multiple
response variables. In this setting, the goal is to model the joint relationship between
a set of predictors X and a set of response variables Y = (Y1 , Y2 , . . . , Yp ), where p is
the number of response variables.
The multi-output regression model can be formulated as:

Y = f (X) + ε (3.2)
where f (X) = (f1 (X), f2 (X), . . . , fp (X)) represents the vector-valued regression
function that maps the predictors to the response variables. Each component func-

20
tion fi (X) estimates the relationship between the predictors and the ith response
variable Yi . The error term ε captures the unexplained variability in the response
variables.

3.3 Loss Functions


To estimate the vector-valued regression function f (X), appropriate loss functions need to be defined. Loss functions quantify the discrepancy between the pre-
dicted values and the observed values of the response variables. Commonly used loss
functions for multi-output regression include:

3.3.1 Mean Squared Error (MSE)


The mean squared error is a widely used loss function for regression problems. For multi-output regression, the MSE can be defined as:

MSE(Y, Ŷ) = (1/p) Σ_{i=1}^{p} (1/n) Σ_{j=1}^{n} (Yij − Ŷij)²,   (3.3)

where Y represents the true values of the response variables, Ŷ represents the
predicted values, n is the number of observations, and p is the number of response
variables. The MSE measures the average squared difference between the predicted
and true values of the response variables.

3.3.2 Mean Absolute Error (MAE)


The mean absolute error is another commonly used loss function for regression.
In the multi-output regression setting, the MAE can be defined as:

MAE(Y, Ŷ) = (1/p) Σ_{i=1}^{p} (1/n) Σ_{j=1}^{n} |Yij − Ŷij|.   (3.4)

The MAE measures the average absolute difference between the predicted and true values of the response variables.
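A compact sketch of equations 3.3 and 3.4; since both average over all outputs and all observations, they reduce to a single mean over the prediction-error matrix (Y and Ŷ are assumed to be arrays of shape (n, p)):

import numpy as np

def mse(Y, Y_hat):
    # Multi-output mean squared error, Eq. (3.3).
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    return np.mean((Y - Y_hat) ** 2)

def mae(Y, Y_hat):
    # Multi-output mean absolute error, Eq. (3.4).
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    return np.mean(np.abs(Y - Y_hat))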

3.4 Evaluation Metrics
In addition to loss functions, evaluation metrics are used to assess the perfor-
mance of multi-output regression models. These metrics provide a comprehensive
understanding of the model's predictive capability. Common evaluation metrics for
multi-output regression include:

3.4.1 R-squared (R²)

The R² metric measures the proportion of variance in the response variables that can be explained by the predictors. For multi-output regression, R² can be computed as the average R² across all response variables:

R²(Y, Ŷ) = (1/p) Σ_{i=1}^{p} R²(Yi, Ŷi),   (3.5)

where R²(Yi, Ŷi) represents the R² value for the ith response variable.

3.4.2 Root Mean Squared Error (RMSE)


Root Mean Squared Error (RMSE) is a commonly used metric to assess the accuracy of a regression model. It measures the average magnitude of the residuals, which are the differences between the predicted values and the actual values. RMSE provides a measure of how well the model fits the observed data points.
Let's consider a regression problem with n data points, denoted as (xi, yi) for i = 1, 2, . . . , n, where xi represents the input variables and yi represents the corresponding true output values. Given a regression model that predicts the output values ŷi, the RMSE is calculated as:

RMSE = √( (1/n) Σ_{i=1}^{n} (ŷi − yi)² ).   (3.6)

The RMSE quantifies the average discrepancy between the predicted values and the actual values. A lower RMSE indicates a better fit of the model to the data, with smaller residuals.

3.4.3 Relative Root Mean Squared Error (RRMSE)
The Relative Root Mean Squared Error (RRMSE) is a normalized version of the RMSE and is often used to compare the performance of different models or to evaluate the accuracy of a model in relation to the scale of the target variable. RRMSE is calculated by dividing the RMSE by the range of the target variable.
Let's denote the range of the target variable as R = max(yi) − min(yi). The RRMSE is then given by:

RRMSE = RMSE / R.   (3.7)

By normalizing the RMSE with respect to the range of the target variable, the RRMSE allows for a standardized comparison across different datasets and target variable scales. A lower RRMSE indicates a better predictive accuracy of the model relative to the range of the target variable.
Both RMSE and RRMSE are widely used in regression analysis to evaluate the performance of models and compare different approaches. These metrics provide quantitative measures of the model's predictive accuracy and can help researchers and practitioners in selecting the most suitable model for their specific regression tasks.
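The metrics of equations 3.5 to 3.7 can be computed per output as in the following sketch (Y and Ŷ are assumed to be arrays of shape (n, p); the per-output layout is an assumption made for illustration):

import numpy as np

def multioutput_metrics(Y, Y_hat):
    # Average R^2 over outputs (Eq. 3.5), plus per-output RMSE (Eq. 3.6) and
    # range-normalized RRMSE (Eq. 3.7).
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    ss_res = np.sum((Y - Y_hat) ** 2, axis=0)
    ss_tot = np.sum((Y - Y.mean(axis=0)) ** 2, axis=0)
    r2 = np.mean(1.0 - ss_res / ss_tot)
    rmse = np.sqrt(np.mean((Y - Y_hat) ** 2, axis=0))
    rrmse = rmse / (Y.max(axis=0) - Y.min(axis=0))
    return r2, rmse, rrmse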

4 Machine Learning Approaches for Multi-Output Regression
Machine learning techniques offer powerful tools for solving multi-output re-
gression problems. Various algorithms can be leveraged to model the vector-valued
regression function f (X) and make predictions for multiple response variables si-
multaneously.
Tree-based models, such as random forests, can handle multi-output regression
naturally by extending their single-output counterparts. These models can capture
complex interactions and non-linear relationships between the predictors and the
response variables.
Neural networks, particularly feed-forward neural networks and deep learning
architectures, are also effective in multi-output regression tasks. By utilizing multi-
ple output nodes in the network's architecture, neural networks can learn the joint
representation of the predictors and generate predictions for multiple response vari-
ables.
Additionally, ensemble methods, such as stacking and bagging, can be employed
to combine the predictions from multiple models and enhance the overall predictive
performance in multi-output regression.

4.1 Tree-Based Models


Tree-based machine learning models have gained significant attention in various domains due to their ability to handle complex, nonlinear relationships and perform well in both regression and classification tasks. These models are based on the concept of decision trees, which are hierarchical structures that recursively partition the input space based on feature values.
The Random Forest Regressor (RFR) and the Extremely Random Trees Regres-
sor (ETR) are two prominent ensemble methods that leverage the power of decision
trees. Ensemble methods combine multiple individual models to make predictions,
thereby enhancing the overall performance and robustness of the model.

4.1.1 Random Forest Regressor


The Random Forest Regressor is an ensemble model that consists of a collection
of decision trees. It operates by constructing a multitude of decision trees during
the training phase and making predictions by averaging the predictions of all the
individual trees. This averaging process helps to reduce overfitting and improve
generalization.
Let's denote the training dataset as (X, Y), where X represents the input features
and Y represents the corresponding target outputs. The Random Forest Regressor
builds a set of decision trees {T1 , T2 , ..., Tn }, where each tree is constructed using a
random subset of the training data.
The prediction of the Random Forest Regressor can be computed as follows:
Ŷ = (1/n) Σ_{i=1}^{n} Ti(X),   (4.1)

where Ŷ represents the predicted output and Ti (X) denotes the prediction of
the ith decision tree.
The Random Forest Regressor introduces randomness in two ways. First, during
the construction of each tree, a random subset of features is considered at each
split point. This random feature selection reduces the correlation among the trees
and improves the diversity of the ensemble. Second, the bootstrap aggregation (or
bagging) technique is used to randomly sample the training data with replacement.
This sampling strategy further enhances the model's robustness.
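A minimal multi-output sketch with scikit-learn's RandomForestRegressor on synthetic stand-in data; the feature and target shapes are assumptions for illustration and not the thesis dataset, and ExtraTreesRegressor is a drop-in replacement:

import numpy as np
from sklearn.ensemble import RandomForestRegressor   # ExtraTreesRegressor can be swapped in directly
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: X mimics preprocessed peak intensities, Y two target parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 200))
Y = np.column_stack([X[:, 0] + 0.1 * rng.normal(size=2000),
                     X[:, 1] ** 2 + 0.1 * rng.normal(size=2000)])

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)
model = RandomForestRegressor(n_estimators=100, max_features="sqrt",
                              n_jobs=-1, random_state=0)
model.fit(X_train, Y_train)        # multi-output targets are handled natively
Y_pred = model.predict(X_test)     # shape (n_test, 2)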

4.1.2 Extremely Random Trees Regressor


The Extremely Random Trees Regressor is another ensemble method that ex-
tends the concept of random forests. Similar to the Random Forest Regressor, it
builds an ensemble of decision trees. However, the key difference lies in how the
splits are generated in the individual trees.
In the Extremely Random Trees Regressor, the splits are determined randomly
without considering any optimal thresholds. Instead of searching for the best split
based on certain impurity measures, the splitting thresholds are selected randomly
within the range of feature values. This randomization leads to increased diversity
among the trees and reduces the overall variance of the model.
The prediction of the Extremely Random Trees Regressor can be represented as:
Ŷ = (1/n) Σ_{i=1}^{n} Ti(X),   (4.2)

where Ŷ represents the predicted output and Ti (X) denotes the prediction of
the ith decision tree.
The main advantage of the Extremely Random Trees Regressor is its computational efficiency. Since the splits are generated randomly without any optimization, the training process is faster compared to other tree-based models. Additionally, the ETR model tends to be less prone to overfitting, especially when the dataset contains noisy or irrelevant features.
Both the Random Forest Regressor and the Extremely Random Trees Regressor
are powerful tree-based machine learning models for multi-output regression tasks.
They leverage the ensemble approach to combine the predictions of multiple decision
trees, thereby improving the overall accuracy and robustness of the model. The
Random Forest Regressor selects the best split based on optimal thresholds, while the Extremely Random Trees Regressor generates splits randomly. These models offer a flexible and effective solution for capturing complex relationships in XRD peak intensities and can be a valuable tool in various scientific and engineering applications.

4.2 K-nearest Neighbors model


K-Nearest Neighbors (KNN) regression is a non-parametric algorithm that pre-
dicts the value of a data point by considering the average of the target values of its
K nearest neighbors in the feature space. It operates based on the assumption that
similar instances in the feature space tend to have similar target values.
Let's denote the training dataset as (X, Y), where X represents the input features
and Y represents the corresponding target outputs. The KNN regression algorithm
works as follows:
1. Distance Calculation: For a given query point X_query, the first step is to calculate the distances between X_query and all the points in the training dataset. The commonly used distance metric is the Euclidean distance, given by:

d(X_query, Xi) = √( Σ_{j=1}^{m} (X_query,j − Xi,j)² ),   (4.3)

where m is the number of features, X_query represents the query point, and Xi represents the ith training point.
2. Nearest Neighbor Selection: The K nearest neighbors of the query point X_query are selected based on the calculated distances. These K neighbors are the data points in the training dataset that have the smallest distances to X_query.
3. Weighted Averaging: Once the K nearest neighbors are identified, the predicted output for the query point is calculated as the average of the target values of these neighbors. The weights are typically assigned based on the inverse of the distances, giving more weight to the closer neighbors; with uniform weights, the predicted output can be expressed as:

Ŷ_query = (1/K) Σ_{i=1}^{K} Yi,   (4.4)

where Ŷquery represents the predicted output for the query point and Yi represents
the target value of the ith nearest neighbor.
One crucial aspect of KNN regression is the selection of the optimal value for
K, which determines the number of neighbors considered during prediction. A small
value of K may result in a more flexible and locally adaptive model but can be
sensitive to noise. On the other hand, a large value of K may lead to a smoother but
less responsive model. The choice of K depends on the dataset and the underlying
problem, and it is often determined through cross-validation or other model selection
techniques.
KNN regression is a versatile algorithm that can capture complex relationships
in the data without assuming a specific functional form. However, it can be compu-
tationally expensive, especially for large datasets, as it requires calculating distances
for all training points during prediction.
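A minimal distance-weighted KNN regression sketch with scikit-learn, including a small cross-validated search over K; the data and the parameter grid are synthetic, illustrative assumptions:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-ins for peak intensities (X) and structural parameters (Y).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 50))
Y = np.column_stack([X[:, 0], X[:, 1] + X[:, 2]]) + 0.05 * rng.normal(size=(1000, 2))

search = GridSearchCV(KNeighborsRegressor(weights="distance"),
                      param_grid={"n_neighbors": [3, 5, 10, 20]},
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X, Y)                      # selects K by cross-validation
print(search.best_params_, search.best_score_)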

4.3 Ensemble methods


Ensemble methods are powerful machine learning techniques that combine the
predictions of multiple individual models to make more accurate and robust pre-
dictions. These methods are particularly effective when dealing with complex and
noisy datasets, as they leverage the diversity of the individual models to improve
overall performance. Averaging, bagging, and boosting are three popular ensemble
techniques that have been widely studied and applied in various domains.

4.3.1 Averaging
Averaging, also known as model averaging or democratic voting, is a simple
yet eective ensemble technique. In this approach, multiple individual models are
trained independently on the same dataset, and their predictions are averaged to
obtain the nal ensemble prediction. Averaging can be applied to both regression
and classication problems.
Mathematically, for a regression problem, the ensemble prediction ŷ is obtained by averaging the predictions yi of the individual models:

ŷ = (1/N) Σ_{i=1}^{N} yi,

where N is the number of individual models.
For a classification problem, the ensemble prediction is typically determined by
majority voting. Each individual model predicts a class label, and the most common
class label among the models is selected as the ensemble prediction.
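A small sketch of the averaging rule above, combining a few independently trained regressors on synthetic stand-in data (all shapes and model choices are illustrative assumptions):

import numpy as np
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 50))
Y = np.column_stack([X[:, 0] * X[:, 1], np.sin(X[:, 2])]) + 0.05 * rng.normal(size=(1000, 2))

models = [RandomForestRegressor(random_state=0),
          ExtraTreesRegressor(random_state=0),
          KNeighborsRegressor(n_neighbors=10, weights="distance")]
for m in models:
    m.fit(X, Y)                       # each member is trained independently

def ensemble_predict(X_new):
    # Average the member predictions, following the averaging rule above.
    return np.mean([m.predict(X_new) for m in models], axis=0)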

4.3.2 Bagging (Bootstrap Aggregating)


Bagging is an ensemble technique that combines the concept of bootstrapping
and averaging. It involves training multiple individual models on different bootstrap samples obtained by resampling the original training data with replacement. Each individual model is trained independently, and their predictions are averaged to form the final ensemble prediction.
Bagging reduces the variance of the individual models and improves the stability of the ensemble prediction. It is particularly effective for reducing overfitting
and handling noisy data. The individual models in bagging can be trained using
any learning algorithm, such as decision trees, neural networks, or support vector
machines.
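A bagging sketch using scikit-learn's BaggingRegressor with KNN base learners; it is wrapped in MultiOutputRegressor so that each target parameter gets its own bagged ensemble (synthetic stand-in data and hyperparameters, chosen only for illustration):

import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 50))
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 2] ** 2]) + 0.05 * rng.normal(size=(1000, 2))

# 25 KNN models, each fitted on a bootstrap resample; one bagged model per output.
bagged_knn = MultiOutputRegressor(
    BaggingRegressor(KNeighborsRegressor(n_neighbors=5), n_estimators=25, random_state=0))
bagged_knn.fit(X, Y)
Y_pred = bagged_knn.predict(X)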

4.3.3 Boosting
Boosting is an ensemble technique that aims to improve the performance of
individual models by sequentially training them in a stage-wise manner. Unlike
bagging, boosting focuses on creating a strong model by iteratively learning from
the mistakes made by the previous models.
Boosting algorithms work by assigning higher weights to misclassified instances in each iteration, forcing subsequent models to focus more on these difficult instances. The final ensemble prediction is obtained by aggregating the predictions of all individual models, with each model's contribution weighted based on its performance.
One popular boosting algorithm is AdaBoost (Adaptive Boosting), which assigns
weights to each instance in the training data and adjusts these weights based on the misclassification rate of the previous model. AdaBoost places more emphasis on misclassified instances in subsequent iterations, allowing the ensemble to focus on
challenging data points.
Gradient boosting is another widely used boosting algorithm that constructs an ensemble of decision trees, with each tree trained to minimize the residual errors of the previous trees. Gradient boosting incorporates gradient information and employs optimization techniques to find the best splitting points, resulting in a strong and accurate model.
Both bagging and boosting techniques offer improvements over individual models by leveraging the diversity and collective wisdom of multiple models. Bagging reduces variance and improves stability, while boosting focuses on reducing bias and increasing predictive power.
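A gradient boosting sketch; GradientBoostingRegressor models one target at a time, so MultiOutputRegressor fits an independent boosted model per output (synthetic stand-in data and illustrative hyperparameters):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 50))
Y = np.column_stack([X[:, 0] + X[:, 1] ** 2, np.abs(X[:, 2])]) + 0.05 * rng.normal(size=(1000, 2))

booster = MultiOutputRegressor(
    GradientBoostingRegressor(n_estimators=200, learning_rate=0.05,
                              max_depth=3, random_state=0))
booster.fit(X, Y)                 # one boosted ensemble per target column
Y_pred = booster.predict(X)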

4.4 Neural Networks


Neural networks are powerful models that consist of interconnected layers of artificial neurons. They are widely used in machine learning and have achieved re-
markable success in various domains, including image recognition, natural language
processing, and time series analysis.

4.4.1 Structure of Neural Networks


A neural network is composed of an input layer, one or more hidden layers, and
an output layer. Each layer consists of artificial neurons (also known as nodes) that
perform computations on the input data.
The output of a neuron in layer j is calculated using the weighted sum of its
inputs and an activation function. Mathematically, the output aj is computed as
follows:
aj = σ(zj )

where zj represents the weighted sum of inputs to the neuron:

zj = Σ_{i=1}^{n} wji ai−1 + bj.

Here, n is the number of neurons in the previous layer, wji denotes the weight
connecting the i-th neuron in the previous layer to the j -th neuron in the current
layer, ai−1 is the output of the i-th neuron in the previous layer, and bj is the bias
term for the j -th neuron.
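A minimal sketch of the layer computation above, applying the sigmoid activation to every layer exactly as in the equations (the layer sizes are arbitrary; in practice the output layer of a regression network is often left linear):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # Forward propagation: each layer computes a = sigma(W a_prev + b).
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Example: 200 inputs -> 32 hidden neurons -> 4 outputs (illustrative sizes).
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(32, 200)), rng.normal(scale=0.1, size=(4, 32))]
biases = [np.zeros(32), np.zeros(4)]
y = forward(rng.normal(size=200), weights, biases)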

4.4.2 Working Principles of Neural Networks
Neural networks employ a process called forward propagation to generate pre-
dictions or outputs. During forward propagation, the input data is passed through
the network, and each neuron computes its output based on the weighted sum of
its inputs and an activation function. The output of the last layer represents the
network's prediction.
Activation functions introduce non-linearity to the network, allowing it to learn
complex patterns and relationships in the data. Commonly used activation functions
include sigmoid, tanh, and Rectified Linear Unit (ReLU).

4.4.3 Backpropagation and Stochastic Gradient Descent


Backpropagation is an algorithm used to train neural networks. It involves two
phases: forward propagation and backward propagation of errors. During forward
propagation, input data is processed through the network, and the predicted output
is compared with the actual output using a loss function.
The goal of backpropagation is to minimize the loss function by adjusting the
weights of the network. This is done using the stochastic gradient descent (SGD)
optimization algorithm. SGD updates the weights iteratively based on the gradients
of the loss function with respect to the weights. The weight update rule can be
expressed as:
wji^new = wji^old − η · ∂L/∂wji,

where wji^old and wji^new are the old and updated weights, η is the learning rate, and ∂L/∂wji is the partial derivative of the loss function with respect to the weight wji.
When working on multi-output regression problems, the overall loss function is typically a combination of the individual loss functions for each target variable. During training, the backpropagation algorithm adjusts the weights to minimize the overall loss across all output variables by computing the gradients of the loss function with respect to the weights.
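A self-contained sketch of backpropagation with stochastic gradient descent for a one-hidden-layer network and a squared-error loss, following the weight update rule above; the tanh hidden activation, linear outputs, data, and all sizes are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
# Synthetic multi-output regression data (stand-in for intensities and parameters).
X = rng.normal(size=(512, 20))
Y = np.column_stack([X[:, 0] ** 2, X[:, 1] * X[:, 2]])

n_in, n_hidden, n_out, eta = X.shape[1], 16, Y.shape[1], 0.01
W1, b1 = rng.normal(scale=0.1, size=(n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(scale=0.1, size=(n_hidden, n_out)), np.zeros(n_out)

for epoch in range(100):
    for i in rng.permutation(len(X)):          # SGD: one sample at a time
        x, y = X[i:i+1], Y[i:i+1]
        h = np.tanh(x @ W1 + b1)               # forward pass, hidden layer
        y_hat = h @ W2 + b2                    # linear output layer
        # Backward pass for L = 0.5 * ||y_hat - y||^2
        delta_out = y_hat - y                  # dL/dz_out
        delta_hid = (delta_out @ W2.T) * (1.0 - h ** 2)   # dL/dz_hidden (tanh derivative)
        W2 -= eta * h.T @ delta_out            # w_new = w_old - eta * dL/dw
        b2 -= eta * delta_out.ravel()
        W1 -= eta * x.T @ delta_hid
        b1 -= eta * delta_hid.ravel()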

5 Results
5.1 Out-of-core learning
5.2 Dimensionality reduction
5.3 Out of the box methods
5.4 Ensemble methods
5.5 Neural networks
5.6 Serial evaluation
6 Conclusion
