Seminar Report
on
Machine Learning
Bachelor of Engineering
in
Mechanical Engineering
Submitted by
Seggoju Raja Havish (1SJ11ME090)
Certificate
This is to certify that the technical seminar on Machine Learning Techniques in Mechanical Engineering was presented at the Mechanical Engineering department by Seggoju Raja Havish, bearing U.S.N. 1SJ11ME090, in partial fulfilment of the award of the B.E. degree in Mechanical Engineering of Visvesvaraya Technological University, Belgaum, during 2014-15. The seminar report has been approved, as it satisfies the academic requirements in respect of the technical seminar prescribed for the B.E. degree.
Maximum Marks: 50
Marks Obtained:
Date:
Contents
1 Introduction
1.1 What is Machine Learning
1.2 Types of Machine Learning
1.2.1 Supervised Learning
1.2.2 Unsupervised Learning
1.2.3 Recommender Systems
4 Conclusion
References
1 Introduction
1.1 What is Machine Learning
It is a scientific discipline which deals with the construction of algorithms which can learn from data.
What this means is that, given a data set {X^(1), X^(2), X^(3), ..., X^(m)}, where X^(i) = {x_1, x_2, x_3, ..., x_n}, and dependent variables {y^(1), y^(2), y^(3), ..., y^(m)}, we can construct a hypothesis such that, for a given X^(m+1), we can predict the behavior of y^(m+1). Here {x_1, x_2, x_3, ..., x_n} are called the features of X^(i).
Example:

Specimen              x1: Length (mm)   x2: Diameter (mm)   x3: Max. Load by F.E.M. (kN)   y: Actual Max. Load (kN)
X^(1)  Specimen 1          200                20                    5.1                           5.4
X^(2)  Specimen 2          350                15                    4.9                           5.2
X^(3)  Specimen 3          180                22                    6.2                           6.9
 ...      ...              ...               ...                    ...                           ...
X^(m)  Specimen m          350                20                    5.4                           5.6
Now let's say we want to know the maximum allowable load on Specimen m+1. Knowing the values of its length, diameter and maximum load as determined by F.E.M., we can use the algorithm to predict the actual maximum load.
a) Regression: In this type of machine learning problem the output is continuous-valued, as in the above example, where the maximum load can vary continuously.
b) Classification: Here the output is discrete-valued, i.e., there exists a finite number of possible answers, e.g. the problem of classifying a given picture as that of a man or a horse, given a previous data set.
1.2.3 Recommender Systems
These are information filtering systems that seek to predict the 'rating' of, or 'preference' towards, an item. The YouTube™ 'related videos' feature is an example of this type of learning.
Now, for simplicity, let us consider only one feature for the data set: the ratio of the length to the diameter.
hΘ (x1 ) = θ0 + θ1 x1
Here θ0 and θ1 are the parameters and they are chosen such that the value of hΘ (x1 ) is
as close to the value of y as possible.
This function is used to calculate the error of predictions for a given value of theta. The
optimal value of theta is at the minimum value of the cost function. The cost function for
univariate linear regression is given as:
J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} [h_θ(x^(i)) − y^(i)]^2
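As a minimal sketch (not part of the original report), the cost function above can be computed directly; the data values below are the length-to-diameter ratios and actual loads from the earlier example table.

```python
# Cost function for univariate linear regression:
# J(theta0, theta1) = 1/(2m) * sum((h(x_i) - y_i)^2),
# where the hypothesis is h(x) = theta0 + theta1 * x.
def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Length-to-diameter ratios and actual maximum loads (kN) from the table.
xs = [10.0, 23.33, 8.18, 17.5]
ys = [5.4, 5.2, 6.9, 5.6]

print(cost(0.0, 0.0, xs, ys))  # cost with all-zero parameters
```

Gradient descent, described next, searches for the (θ0, θ1) pair that makes this value as small as possible.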
2.1.2 Gradient Descent Algorithm
It is an algorithm devised to solve the linear regression problem. It calculates the optimal
value of the parameter set Θ = {θ0 , θ1 , θ2 ...θn }. This algorithm basically minimizes the value
of the cost function J(Θ).
The mathematical expression is given by:
θ_j := θ_j − α (∂/∂θ_j) J(Θ)

(∂/∂θ_j) J(Θ) = (∂/∂θ_j) [ (1/2m) Σ_{i=1}^{m} [h_Θ(x^(i)) − y^(i)]^2 ]
Consider the given data, represented in matrix format for computational simplicity:

        [ 1  x_1^(1) ]   [ 1  10.00 ]
        [ 1  x_1^(2) ]   [ 1  23.33 ]
[X]  =  [    ...     ] = [ 1   8.18 ]
        [ 1  x_1^(m) ]   [    ...   ]
                         [ 1  17.50 ]

[Θ] = [ θ0  θ1 ]

[y] = [ 5.4  5.2  6.9  ...  5.6 ]^T

h_Θ(X) = [X][Θ]^T
(∂/∂θ_j) J(Θ) = (∂/∂θ_j) (1/2m) Σ_{i=1}^{m} [ [X][Θ]^T − [y] ]^2

Θ^T := Θ^T − (α/m) [X]^T ( [X][Θ]^T − [y] )
From graph 1 we can see that the partial-derivative term steers the updates towards the minimum point even if the value overshoots.
The value of α need not be varied manually: the size of each decrement gradually decreases along the curve, because the gradient itself shrinks near the minimum, as seen from graph 2.
Take the same example as we have been using, with categories {0, 1}: let us apply a load on the specimen, and if the specimen doesn't fail it is classified as 0, else as 1.
y ∈ {0, 1}
Let us say we apply a load 'c'; failure occurs when the actual maximum load (or allowable load) is less than 'c'.

y = 0 if (allowable load) > c
y = 1 if (allowable load) < c, i.e., failure
Here {0, 1} represent failure and non-failure of the specimen. We arbitrarily assume a threshold value of 0.5 in this case: if the predicted probability of failure is greater than the threshold, the specimen is assumed to fail. In this case we can't use the expression for hΘ as it is, since we need to calculate the probability of the specimen's failure.
The sigmoid function gives an estimated probability of y belonging to any of the given
classifications.
g(z) = 1 / (1 + exp(−z))

h_Θ(X) = g([X][Θ]^T) = 1 / (1 + exp(−[X][Θ]^T))
Here g(z) is called the sigmoid function. A graph of the sigmoid function is given below.
5
For large positive values of z, g(z) is approximately equal to 1; for large negative values it is approximately 0; and g(0) = 0.5.
E.g., if applying the above formulas gives hΘ(X) = 0.7, it implies there is a 70% chance of the specimen failing.
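A small sketch of the sigmoid function, illustrating the limiting behavior just described:

```python
import math

# Sigmoid (logistic) function: g(z) = 1 / (1 + exp(-z)).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))   # exactly 0.5: the decision boundary
print(sigmoid(6))   # close to 1
print(sigmoid(-6))  # close to 0
```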
Using the squared-error cost function from linear regression here would lead to a non-convex curve, whose minima are difficult to calculate. For this reason we use a simplified cost function, given by:
J(Θ) = (1/m) Σ_{i=1}^{m} cost(h_Θ(x^(i)), y^(i))
cost(h_Θ(x^(i)), y^(i)) = −log(h_Θ(x^(i)))      if y = 1
cost(h_Θ(x^(i)), y^(i)) = −log(1 − h_Θ(x^(i)))  if y = 0

The cost equation can also be written as:

cost(h_Θ(x^(i)), y^(i)) = −y log(h_Θ(x^(i))) − (1 − y) log(1 − h_Θ(x^(i)))
J(Θ) = (1/m) Σ_{i=1}^{m} [ −y^(i) log(h_Θ(x^(i))) − (1 − y^(i)) log(1 − h_Θ(x^(i))) ]
We use the same gradient descent algorithm to solve for optimum values of Θ.
Differentiating and substituting into the gradient descent update, we get:
Θ := Θ − α Σ_{i=1}^{m} [ h_Θ(x^(i)) − y^(i) ] x^(i)
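The cost function and the update rule above can be sketched together on a toy data set; the specimens, loads and learning rate below are invented for illustration (specimens with a small allowable load fail, y = 1).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic-regression cost J(Theta) from the equation above.
def logistic_cost(theta, X, y):
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Gradient-descent update: Theta := Theta - alpha * sum((h - y) * x).
def fit_logistic(X, y, alpha, iterations):
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        theta -= alpha * (X.T @ (sigmoid(X @ theta) - y))
    return theta

# Toy data: bias column plus allowable load; small loads fail (y = 1).
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])

theta = fit_logistic(X, y, alpha=0.05, iterations=10000)
print(sigmoid(X @ theta))  # predicted failure probabilities
```

The fitted probabilities are high for the failing specimens and low for the surviving ones, and the cost at the fitted Θ is lower than at Θ = 0.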
2. In each new training set, assign the 'y' value of the group to be determined as 1 and the others as 0.
3. Run the logistic regression algorithm and get independent values of Θ for each classification:
Θ ∈ R^{k×n}.
4. When a new specimen is given to examine, the values of the sigmoid function for all 'k' sets of Θ values are calculated, and the set where the maximum probability occurs is assigned to it.
The graph on the right represents the given data; the data is then relabeled as shown in graph 2 to calculate the values of Θ for the first group. We repeat the same for each group.
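The one-vs-all steps above can be sketched as follows; the three well-separated 2-feature clusters are an invented illustration, not data from the report.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One-vs-all classification: train one logistic-regression parameter
# vector per class (steps 2-3), then assign a new point to the class
# whose sigmoid output is highest (step 4).
def train_one_vs_all(X, y, num_classes, alpha=0.1, iterations=20000):
    m, n = X.shape
    thetas = np.zeros((num_classes, n))      # Theta in R^{k x n}
    for k in range(num_classes):
        yk = (y == k).astype(float)          # relabel: this class 1, others 0
        for _ in range(iterations):          # plain gradient descent per class
            thetas[k] -= (alpha / m) * (X.T @ (sigmoid(X @ thetas[k]) - yk))
    return thetas

def predict(thetas, X):
    return np.argmax(sigmoid(X @ thetas.T), axis=1)

# Bias column plus two features; three separable groups of two points each.
X = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0],
              [1.0, 5.0, 0.0], [1.0, 6.0, 0.0],
              [1.0, 2.5, 4.0], [1.0, 3.0, 4.0]])
y = np.array([0, 0, 1, 1, 2, 2])

thetas = train_one_vs_all(X, y, num_classes=3)
print(predict(thetas, X))
```

On this separable toy set the learned classifiers recover the original group labels.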
Artificial neural networks (ANNs) are a family of models inspired by biological neural networks, and are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks are generally presented as systems of interconnected "neurons" which can compute values from inputs, and are capable of machine learning as well as pattern recognition.
ANNs essentially consist of hidden layers; these act like new features which the network calculates for itself from the outputs of the training set. The values of the hidden layer can later be utilized for predictions.
On the left side is a single neuron element, where x0, x1 and x2 are the inputs or the features, and using the given algorithm the output is calculated.
On the right side is shown a full neural network with two hidden layers.
a_1^(2) = g( θ_10^(1) x_0 + θ_11^(1) x_1 + θ_12^(1) x_2 )
a_2^(2) = g( θ_20^(1) x_0 + θ_21^(1) x_1 + θ_22^(1) x_2 )
...
Here,
a_i^(j) - activation unit, where the subscript i represents the activation-unit number within the layer, and j represents the layer number.
θ_{i1,i2}^(j) - weight controlling the function mapping, where i1 and i2 are the activation-unit number and the weight number respectively, and j is the layer number.
g(z) - sigmoid function.
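The activation equations above amount to a forward pass; a minimal sketch with one hidden layer is shown below, where the weight values are arbitrary illustrative numbers (not from the report).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward propagation through one hidden layer, following the
# activation equations a_i^(2) = g(sum_j theta_ij^(1) x_j).
def forward(x, theta1, theta2):
    x = np.concatenate(([1.0], x))    # add the bias unit x0 = 1
    a2 = sigmoid(theta1 @ x)          # hidden-layer activations a_i^(2)
    a2 = np.concatenate(([1.0], a2))  # bias unit for the next layer
    return sigmoid(theta2 @ a2)       # output h_Theta(x)

theta1 = np.array([[0.5, 1.0, -1.0],   # weights into hidden unit 1
                   [-0.5, 2.0, 1.0]])  # weights into hidden unit 2
theta2 = np.array([[1.0, -2.0, 2.0]])  # weights into the output unit

print(forward(np.array([1.0, 0.0]), theta1, theta2))
```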
hΘ (x) = g (θ0 + θ1 x1 + θ2 x2 )
where:
[θ0, θ1, θ2] = [−30, 20, 20]
Putting these two together, this clearly indicates how neural networks can be used to implement a logical AND gate. Similarly, logical OR, NOT, XNOR and XOR gates can also be implemented.
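The AND neuron can be verified directly with the weights given above:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A single sigmoid neuron implementing logical AND with the weights
# from the text: theta = [-30, 20, 20].
def and_gate(x1, x2):
    return sigmoid(-30 + 20 * x1 + 20 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(and_gate(x1, x2)))  # prints the AND truth table
```

Only when both inputs are 1 does the weighted sum (−30 + 20 + 20 = 10) become positive, so the sigmoid output is near 1 in that case and near 0 otherwise.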
Both steps are explained using the image shown. In the first step, 'k' random readings are marked as cluster centroids, and the other (n − k) readings are assigned to one of these clusters based on their proximity to the centroids (with respect to the features).
In the second step, the mean of all the members of a cluster is calculated and the cluster centroid is reassigned to it. This is done for all k clusters.
These two steps are repeated until the optimal condition is reached.
The above example takes six iterations of the algorithm to find the optimal clusters. These
two steps can be mathematically represented as -
Repeat {
    1. for i = 1 to m:
       c^(i) := index (from 1 to K) of the cluster centroid closest to x^(i),
                i.e. min_k ||x^(i) − μ_k||^2
    2. for k = 1 to K:
       μ_k := average (mean) of the points assigned to cluster k
}
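The two repeated steps can be sketched directly; the two clusters of readings below are synthetic, purely for illustration.

```python
import numpy as np

# K-means as described above: (1) assign each point to the nearest
# centroid; (2) move each centroid to the mean of its assigned points.
def kmeans(X, k, iterations=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # k random readings
    for _ in range(iterations):
        # step 1: c^(i) = argmin_k ||x^(i) - mu_k||^2
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 2: mu_k = mean of the points assigned to cluster k
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Two well-separated clusters of synthetic readings.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
centroids, labels = kmeans(X, k=2)
print(labels)
```

On well-separated data the assignments stabilize after a few iterations, mirroring the six-iteration example in the text.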
3 Applied Machine Learning
In this section, applications of machine learning and its problem-solving techniques in mechanical engineering are discussed. The two applications discussed are:
3.2 Feature Extraction and Selection
The vibration signals acquired cannot be used directly; useful information which represents the signal needs to be extracted from them. Here, statistical features are extracted. Descriptive statistics for a particular signal give a wide range of parameters, namely mean, standard error, median, mode, standard deviation, sample variance, kurtosis, skewness, range, minimum, maximum, sum and count.
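A sketch of extracting these descriptive statistics, using a synthetic signal in place of a real vibration recording (the signal and the standard-error formula, stdev/√n, are assumptions for illustration):

```python
import math
import statistics

# Descriptive statistical features of a signal, as listed in the text.
def extract_features(signal):
    n = len(signal)
    mean = statistics.mean(signal)
    pstd = statistics.pstdev(signal)  # population std, used for moments
    return {
        "mean": mean,
        "standard error": statistics.stdev(signal) / math.sqrt(n),
        "median": statistics.median(signal),
        "standard deviation": statistics.stdev(signal),
        "sample variance": statistics.variance(signal),
        "skewness": sum((x - mean) ** 3 for x in signal) / (n * pstd ** 3),
        "kurtosis": sum((x - mean) ** 4 for x in signal) / (n * pstd ** 4),
        "range": max(signal) - min(signal),
        "minimum": min(signal),
        "maximum": max(signal),
        "sum": sum(signal),
        "count": n,
    }

# Synthetic vibration-like signal (illustrative only).
signal = [math.sin(0.3 * t) + 0.1 * ((-1) ** t) for t in range(200)]
features = extract_features(signal)
print(features["count"], features["range"])
```

Each signal is thereby reduced to a fixed-length feature vector that a classifier can consume.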
Springback occurred after the experiment. It is the result of elastic angular changes and sidewall curl, whose sum is expressed in the corner angle. For three different materials, three parameters, namely tool radius, restraint force and friction, were analyzed.
Furthermore, a common ML model was created. More attributes (four material parameters) and data sets were available for this procedure, and despite the consideration of all data sets a well-learned model was obtained. This can be seen in the figure below, where correlation coefficients for all six ML algorithms are presented. Looking at the correlation coefficients, we can see that the last three tested methods (SMO, Gaussian Processes and Multilayer Perceptron) are more suitable for springback prediction. One reason is certainly the nonlinearity of the complex springback phenomenon, which these methods are able to model. Moreover, the combination with FEM raises the correlation coefficient to a higher level compared to the FEM method alone.
4 Conclusion
Machine learning and its various algorithms can provide an alternative approach to solving existing problems in mechanical engineering. The large amounts of data required for the application of M.L. are available. In a world where computational intelligence is rapidly growing, many branches of mechanical engineering can flourish with the support of M.L. Being a relatively new field, it also offers many research opportunities.
References
[1] Prof. Andrew Ng, Machine Learning course materials, Stanford CS department. http://cs229.stanford.edu/materials.html
[3] Anish Bahri, V. Sugumaran, S. Babu Devasenapati, International Journal of Research in Mechanical Engineering, 2013. 12 pages, 8 figures, 4 tables.
[5] Wikipedia