
Big Data Analytics

Unit 5
Data Analysis

Dr. Vandana Bhatia


Contents

❑ Overview of R programming language
❑ Regression Modelling
❑ Multivariate Analysis
❑ Bayesian Modelling
❑ Inference and Bayesian Networks
❑ Support Vector and Kernel Methods
❑ Analysis of Time Series
❑ Linear Systems Analysis
❑ Nonlinear Dynamics
❑ Rule Induction
❑ Neural Networks
❑ Learning and Generalization
❑ Competitive Learning
❑ Principal Component Analysis and Neural Networks
❑ Fuzzy Logic: Extracting Fuzzy Models from Data
❑ Fuzzy Decision Trees
❑ Stochastic Search Methods
❑ Emerging Trends
Overview of R programming language

# Print "Hello, World!"
myString <- "Hello, World!"
print(myString)

# Create a vector.
apple <- c('red', 'green', 'yellow')
print(apple)

# Get the class of the vector.
print(class(apple))

# Create a list.
list1 <- list(c(2, 5, 3), 21.3, sin)
# Print the list.
print(list1)

# Create a matrix.
M <- matrix(c('a', 'a', 'b', 'c', 'b', 'a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)

# Create an array.
a <- array(c('green', 'yellow'), dim = c(3, 3, 2))
print(a)

# Create a data frame.
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  age = c(42, 38, 26))
print(BMI)
Regression Versus Classification (comparison figures omitted)
Regression Modelling

Regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables', or 'features').
Regression Modeling Steps

1. Define problem or question
2. Specify model
3. Collect data
4. Do descriptive data analysis
5. Estimate unknown parameters
6. Evaluate model
7. Use model for prediction
Simple vs. Multiple

Simple regression:
• β represents the unit change in Y per unit change in X.
• Does not take into account any other variable besides the single independent variable.

Multiple regression:
• βi represents the unit change in Y per unit change in Xi.
• Takes into account the effect of the other βi's.
• βi is a "net regression coefficient."
Assumptions

• Linearity - the Y variable is linearly related to the value of the X variable.
• Independence of Error - the error (residual) is independent for each value of X.
• Homoscedasticity - the variation around the line of regression is constant for all values of X.
• Normality - the values of Y are normally distributed at each value of X.
Goal

Develop a statistical model that can predict the values of a dependent (response) variable based upon the values of the independent (explanatory) variables.
Simple Linear Regression

◼ Managerial decisions often are based on the relationship between two or more variables.
◼ Regression analysis can be used to develop an equation showing how the variables are related.
◼ The variable being predicted is called the dependent variable and is denoted by y.
◼ The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x.
◼ Simple linear regression involves one independent variable and one dependent variable.
◼ The relationship between the two variables is approximated by a straight line.
◼ Regression analysis involving two or more independent variables is called multiple regression.
Simple Linear Regression Model

◼ The equation that describes how y is related to x and an error term is called the regression model.
◼ The simple linear regression model is:

    y = β0 + β1x + ε

where:
    β0 and β1 are called parameters of the model,
    ε is a random variable called the error term.
Simple Linear Regression Equation

The simple linear regression equation is:

    E(y) = β0 + β1x

• The graph of the regression equation is a straight line.
• β0 is the y-intercept of the regression line.
• β1 is the slope of the regression line.
• E(y) is the expected value of y for a given x value.
Simple Linear Regression Equation (graphs omitted)

• Positive linear relationship: the slope β1 is positive; the regression line rises from the intercept β0 as x increases.
• Negative linear relationship: the slope β1 is negative; the regression line falls from the intercept β0 as x increases.
• No relationship: the slope β1 is 0; the regression line is horizontal at the intercept β0.
Least Squares Method

• Least Squares Criterion:

    min Σ(yi − ŷi)²

where:
    yi = observed value of the dependent variable for the ith observation
    ŷi = estimated value of the dependent variable for the ith observation
Least Squares Method

• Slope for the Estimated Regression Equation:

    b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

where:
    xi = value of the independent variable for the ith observation
    yi = value of the dependent variable for the ith observation
    x̄ = mean value of the independent variable
    ȳ = mean value of the dependent variable

• y-Intercept for the Estimated Regression Equation:

    b0 = ȳ − b1x̄

Practice dataset:
    x: 2  5  3  5  1  6
    y: 4  7  6  8  4  9
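As a quick illustration (not from the slides), the slope and intercept formulas above can be applied to the practice dataset in base R:

# Least squares slope and intercept computed by hand on the practice data.
x <- c(2, 5, 3, 5, 1, 6)
y <- c(4, 7, 6, 8, 4, 9)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # y-intercept
cat("Estimated equation: y-hat =", b0, "+", b1, "* x\n")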
Simple Linear Regression

Example: Reed Auto Sales

Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below.

    Number of TV Ads (x)    Number of Cars Sold (y)
            1                       14
            3                       24
            2                       18
            1                       17
            3                       27

    Σx = 10, Σy = 100
    x̄ = 2,  ȳ = 20
Estimated Regression Equation

• Slope for the Estimated Regression Equation:

    b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 20/4 = 5

• y-Intercept for the Estimated Regression Equation:

    b0 = ȳ − b1x̄ = 20 − 5(2) = 10

• Estimated Regression Equation:

    ŷ = 10 + 5x
Coefficient of Determination

• Relationship among SST, SSR, SSE:

    SST = SSR + SSE
    Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²

where:
    SST = total sum of squares
    SSR = sum of squares due to regression
    SSE = sum of squares due to error
• The coefficient of determination is:

    r² = SSR/SST

• For the Reed Auto example:

    r² = SSR/SST = 100/114 = 0.8772

The regression relationship is very strong; 87.72% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
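As a sketch (not part of the original slides), the Reed Auto results can be reproduced with R's built-in lm() function; the variable names ads and cars are illustrative:

# Fit the Reed Auto model and verify b0 = 10, b1 = 5, r^2 = 0.8772.
ads  <- c(1, 3, 2, 1, 3)        # number of TV ads (x)
cars <- c(14, 24, 18, 17, 27)   # number of cars sold (y)
fit <- lm(cars ~ ads)
coef(fit)                          # intercept 10, slope 5
summary(fit)$r.squared             # 0.8772 (= SSR/SST = 100/114)
predict(fit, data.frame(ads = 4))  # predicted cars sold for 4 ads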
Multivariate Analysis

• Multivariate means involving multiple dependent variables resulting in one outcome.
• The majority of problems in the real world are multivariate.
• For example, we cannot predict the weather of any given day based on the season alone; multiple factors such as pollution, humidity, and precipitation also play a role.
Multivariate analysis techniques: dependence vs. interdependence

Dependence methods
• Dependence methods are used when one or some of the variables are dependent on others.
• Dependence looks at cause and effect; in other words, can the values of two or more independent variables be used to explain, describe, or predict the value of another, dependent variable?
• Example: the dependent variable of "weight" might be predicted by independent variables such as "height" and "age."
• In machine learning, dependence techniques are used to build predictive models. The analyst enters input data into the model, specifying which variables are independent and which ones are dependent - in other words, which variables they want the model to predict, and which variables they want the model to use to make those predictions.
Multivariate analysis techniques: dependence vs. interdependence

Interdependence methods
• Interdependence methods are used to understand the structural makeup and underlying patterns within a dataset. In this case, no variables are dependent on others, so you're not looking for causal relationships. Rather, interdependence methods seek to give meaning to a set of variables or to group them together in meaningful ways.
• So: one is about the effect of certain variables on others, while the other is all about the structure of the dataset.
• With that in mind, some useful multivariate analysis techniques are:
  • Multiple linear regression
  • Multiple logistic regression
  • Multivariate analysis of variance (MANOVA)
  • Factor analysis
  • Cluster analysis
Multivariate analysis of variance (MANOVA)

• Multivariate analysis of variance (MANOVA) is used to measure the effect of multiple independent variables on two or more dependent variables.
• With MANOVA, it's important to note that the independent variables are categorical, while the dependent variables are metric in nature.
• A categorical variable is a variable that belongs to a distinct category - for example, the variable "employment status" could be categorized into certain units, such as "employed full-time," "employed part-time," "unemployed," and so on. A metric variable is measured quantitatively and takes on a numerical value.

Example of MANOVA:
• Let's imagine you work for an engineering company that is on a mission to build a super-fast, eco-friendly rocket. You could use MANOVA to measure the effect that various design combinations have on both the speed of the rocket and the amount of carbon dioxide it emits. In this scenario, your categorical independent variables could be:
  o Engine type, categorized as E1, E2, or E3
  o Material used for the rocket exterior, categorized as M1, M2, or M3
  o Type of fuel used to power the rocket, categorized as F1, F2, or F3
• Your metric dependent variables are speed in kilometers per hour and carbon dioxide measured in parts per million. Using MANOVA, you'd test different combinations (e.g. E1, M1, and F1 vs. E1, M2, and F1 vs. E1, M3, and F1, and so on) to calculate the effect of all the independent variables. This should help you to find the optimal design solution for your rocket. A sketch in R follows below.
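A minimal sketch of this rocket example in R, assuming simulated (made-up) speed and CO2 measurements; manova() is part of base R's stats package:

# Simulate two replicates of every engine/material/fuel combination.
set.seed(42)
rocket <- expand.grid(engine   = c("E1", "E2", "E3"),
                      material = c("M1", "M2", "M3"),
                      fuel     = c("F1", "F2", "F3"))
rocket <- rocket[rep(1:27, times = 2), ]
rocket$speed <- rnorm(54, mean = 30000, sd = 500)  # km/h (simulated)
rocket$co2   <- rnorm(54, mean = 400,   sd = 20)   # ppm  (simulated)

# MANOVA: effect of the categorical factors on both metric outcomes at once.
fit <- manova(cbind(speed, co2) ~ engine + material + fuel, data = rocket)
summary(fit)   # Pillai's trace test for each factor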
Bayesian Modelling

Learning may be supervised or unsupervised:

• Supervised learning - the classifier is trained on examples with known labels. E.g., a fruit classifier trained on labelled oranges, apples, and bananas.
• Unsupervised learning - the learner sees lots of examples but no proper labels and must group them itself, like clustering. E.g., a fruit classifier that discovers groups such as "fruit with soft skin" or "red fruits".

Supervised learning divides into:

• Classification - draw conclusions with a discrete output, like spam or not, red or blue. Algorithms: Naïve Bayes, decision trees, SVM.
• Regression - the output variable takes real or continuous values, like marks or weight. Algorithms: linear regression, polynomial regression, SVM regression.
What Is Naive Bayes?

• Naive Bayes is among the simplest, but most powerful, algorithms for classification, based on Bayes' Theorem with an assumption of independence among predictors.
• The Naive Bayes classifier assumes that the presence of a feature in a class is unrelated to any other feature.
• It is a very intuitive classification algorithm.

Why "Naive" Bayes?

• It makes the assumption that the features of a measurement are independent of each other.
• Even if these features depend on each other or upon the existence of the other features, all of these properties independently contribute to the probability that a particular fruit is an apple or an orange or a banana, and that is why it is known as "Naive."
Things we would like to do…

Spam Classification
• Given an email, predict whether it is spam or not

Medical Diagnosis
• Given a list of symptoms, predict whether a patient has disease X or not

Weather
• Based on temperature, humidity, etc., predict if it will rain tomorrow
Bayesian Classification

Problem statement:
• Given features X1, X2, …, Xn
• Predict a label Y
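Though not written out on the slides, the rule being applied is Bayes' Theorem combined with the naive independence assumption (a standard result):

    P(Y | X1, …, Xn) ∝ P(Y) · P(X1 | Y) · P(X2 | Y) · … · P(Xn | Y)

The classifier predicts the label Y that maximizes this product of the class prior P(Y) and the per-feature likelihoods.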
Naïve Bayes Example:

To predict days suitable for a football match based on weather conditions (figures omitted):
• Smaller circle - low probability to play (P < 0.5)
• Big circle - high probability to play (P > 0.5)
• Combining both conditions, we get an outlook by combining both sets of data.
• Comparing the states when more information is added, Naïve Bayes tries to understand such interaction of probabilities.
Naïve Bayes Classifier

• A probabilistic classifier based on Bayes' Theorem.
• Takes independence assumptions between the features.

Worked exercise (dataset and probability tables omitted): from the related dataset, compute each attribute probability and the prior probability of the class (e.g. P(play) = 0.60), then predict the likelihood of playing football when Season = Winter, Sunny = No, Windy = Yes, and the probability of the match not being played.
Applications of Naïve Bayes:
• Face recognition
• Mail classification
• Handwriting analysis
• Salary prediction
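A minimal sketch of the play-football exercise, assuming the e1071 package (which provides naiveBayes()) is installed; the toy weather table below is made up for illustration:

library(e1071)

# A made-up training table: weather conditions and whether the match was played.
weather <- data.frame(
  season = c("Winter", "Summer", "Winter", "Rainy", "Summer",
             "Winter", "Rainy", "Summer", "Winter", "Rainy"),
  sunny  = c("No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "No"),
  windy  = c("Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "No", "Yes"),
  play   = c("No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes", "No"),
  stringsAsFactors = TRUE)

model <- naiveBayes(play ~ season + sunny + windy, data = weather)
model$apriori   # class prior counts, giving e.g. P(play = Yes)

# Predict the likelihood of playing when Season = Winter, Sunny = No, Windy = Yes.
newday <- data.frame(season = "Winter", sunny = "No", windy = "Yes")
predict(model, newday, type = "raw")   # posterior probabilities for No / Yes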
Statistical Learning: Bayesian Networks

A Bayesian network is a simple graphical representation for a joint probability distribution.
• Nodes are random variables.
• Directed edges between nodes reflect dependence.

Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ "directly influences")
– if there is a link from X to Y, X is said to be a parent of Y
– a conditional distribution for each node given its parents: P(Xi | Parents(Xi))

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.
Step 3: Now choose the parents for each variable by evaluating conditional independencies (the fire alarm network; earlier steps and figure omitted).

✓ Fire is the first variable in the ordering, X1. It does not have parents.
✓ Tampering is independent of Fire (learning that one is true would not change your beliefs about the probability of the other).
✓ Alarm depends on both Fire and Tampering: it could be caused by either or both.
✓ Smoke is caused by Fire, and so is independent of Tampering and Alarm given whether there is a Fire.
✓ Leaving is caused by Alarm, and thus is independent of the other variables given Alarm.
✓ Report is caused by Leaving, and thus is independent of the other variables given Leaving.
Bayesian Networks: Representing and Reasoning about Uncertainty

Example: Earthquake

I am at work; my neighbor John calls to say that my alarm went off, but my neighbor Mary doesn't call. Sometimes the alarm is set off by a minor earthquake. Is there a burglar?

Find the probability that John calls and Mary calls and the alarm went off, and there is no burglary and no earthquake:

    P(J, M, A, ¬B, ¬E)
    = P(J | A) · P(M | A) · P(A | ¬B, ¬E) · P(¬B) · P(¬E)
    = 0.90 × 0.70 × 0.001 × 0.999 × 0.998
    ≈ 0.0006
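A small sketch (not from the slides) that evaluates this joint probability in base R, using the CPT values quoted above:

# CPT entries for the burglary/earthquake alarm network (from the slide).
p_J_given_A    <- 0.90    # P(John calls | Alarm)
p_M_given_A    <- 0.70    # P(Mary calls | Alarm)
p_A_given_nBnE <- 0.001   # P(Alarm | no Burglary, no Earthquake)
p_nB           <- 0.999   # P(no Burglary)
p_nE           <- 0.998   # P(no Earthquake)

# Chain rule over the network: P(J, M, A, not-B, not-E).
p_joint <- p_J_given_A * p_M_given_A * p_A_given_nBnE * p_nB * p_nE
print(p_joint)   # approximately 0.00063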
Inference and Bayesian Networks

• Bayesian networks are a type of probabilistic graphical model that uses Bayesian inference for probability computations.
• Bayesian networks aim to model conditional dependence, and therefore causation, by representing conditional dependence by edges in a directed graph.
• Through these relationships, one can efficiently conduct inference on the random variables in the graph through the use of factors.
• A Bayesian network is a directed acyclic graph in which each edge corresponds to a conditional dependency, and each node corresponds to a unique random variable.
• Formally, if an edge (A, B) exists in the graph connecting random variables A and B, it means that P(B|A) is a factor in the joint probability distribution, so we must know P(B|A) for all values of B and A in order to conduct inference.
• In the Rain/WetGrass example (figure omitted), since Rain has an edge going into WetGrass, P(WetGrass|Rain) will be a factor, whose probability values are specified next to the WetGrass node in a conditional probability table.
Support Vector and Kernel Methods

• Support Vector Machine, abbreviated as SVM, can be used for both regression and classification tasks, but it is widely used for classification objectives.
• The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N = the number of features) that distinctly classifies the data points.
• To separate the two classes of data points, there are many possible hyperplanes that could be chosen.
Support Vector Machines

• The objective is to find a plane that has the maximum margin, i.e. the maximum distance between data points of both classes.
• Maximizing the margin distance provides some reinforcement so that future data points can be classified with more confidence.
Hyperplane as Decision Surface
• Hyperplanes are decision boundaries that help
classify the data points.
• Data points falling on either side of the
hyperplane can be attributed to different classes.
It is a sort of binary classification
• The dimension of the hyperplane depends upon
the number of features. If the number of input
features is 2, then the hyperplane is just a line. If
the number of input features is 3, then the
hyperplane becomes a two-dimensional plane.
Support Vectors

• Support vectors are data points that are closer to the hyperplane and influence the position and orientation of the hyperplane.
• Using these support vectors, we maximize the margin of the classifier.
• Deleting the support vectors will change the position of the hyperplane. These are the points that help us build our SVM.
Maximizing the Margin

• In logistic regression, we take the output of the linear function and squash the value within the range of [0,1] using the sigmoid function.
• If the squashed value is greater than a threshold value (0.5), we assign it the label 1; else we assign it the label 0.
• In SVM, we take the output of the linear function and if that output is greater than 1, we identify it with one class, and if the output is less than -1, we identify it with another class.
• Since the threshold values are changed to 1 and -1 in SVM, we obtain this reinforcement range of values ([-1,1]) which acts as the margin.
Support Vector Machine (SVM)

• SVMs maximize the margin around the separating hyperplane - a.k.a. large margin classifiers.
• The decision function is fully specified by a subset of training samples, the support vectors.
• Solving SVMs is a quadratic programming problem.
• Seen by many as the most successful current text classification method.
Maximum Margin: Formalization

• w: decision hyperplane normal vector
• xi: data point i
• yi: class of data point i (+1 or -1; note: not 1/0)
• Classifier is: f(xi) = sign(wTxi + b)
• Functional margin of xi is: yi(wTxi + b)
  (But note that we can increase this margin simply by scaling w, b….)
Geometric Margin

• Distance from an example to the separator is r = y(wTx + b)/||w||.
• Examples closest to the hyperplane are support vectors.
• Margin ρ of the separator is the width of separation between support vectors of the classes.

Derivation of r (figure omitted): the line x′−x is perpendicular to the decision boundary, so it is parallel to w. The unit vector is w/||w||, so the line is rw/||w||, giving x′ = x − yrw/||w||. Since x′ satisfies wTx′ + b = 0, we have wT(x − yrw/||w||) + b = 0. Recalling that ||w|| = sqrt(wTw), this gives wTx − yr||w|| + b = 0, and solving for r yields:

    r = y(wTx + b)/||w||
Linear SVM Mathematically: the linearly separable case

• Assume that all data are at least distance 1 from the hyperplane; then the following two constraints follow for a training set {(xi, yi)}:

    wTxi + b ≥ 1   if yi = 1
    wTxi + b ≤ −1  if yi = −1

• For support vectors, the inequality becomes an equality.
• Then, since each example's distance from the hyperplane is r = y(wTx + b)/||w||, the margin is:

    ρ = 2/||w||
Linear Support Vector Machine (SVM)

• Hyperplane: wTx + b = 0, with margin hyperplanes wTxa + b = 1 and wTxb + b = −1.
• Extra scale constraint: min over i = 1,…,n of |wTxi + b| = 1.
• This implies:

    wT(xa − xb) = 2
    ρ = ||xa − xb||₂ = 2/||w||₂
Solving the Optimization Problem

Find w and b such that Φ(w) = ½wTw is minimized, and for all {(xi, yi)}: yi(wTxi + b) ≥ 1.

• This is now optimizing a quadratic function subject to linear constraints.
• Quadratic optimization problems are a well-known class of mathematical programming problems, and many (intricate) algorithms exist for solving them (with many special ones built for SVMs).
• The solution involves constructing a dual problem where a Lagrange multiplier αi is associated with every constraint in the primal problem:

Find α1…αN such that Q(α) = Σαi − ½ΣΣαiαjyiyjxiTxj is maximized, and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
The Optimization Problem Solution

• The solution has the form:

    w = Σαiyixi        b = yk − wTxk for any xk such that αk ≠ 0

• Each non-zero αi indicates that the corresponding xi is a support vector.
• Then the classifying function will have the form:

    f(x) = ΣαiyixiTx + b

• Notice that it relies on an inner product between the test point x and the support vectors xi; we will return to this later.
• Also keep in mind that solving the optimization problem involved computing the inner products xiTxj between all pairs of training points.
Classification with SVMs

• Given a new point x, we can score its projection onto the hyperplane normal:
  • i.e., compute the score: wTx + b = ΣαiyixiTx + b
  • decide the class based on whether the score is < or > 0
• Can set a confidence threshold t:
  • Score > t: yes
  • Score < −t: no
  • Else: don't know
Linear SVMs: Summary

• The classifier is a separating hyperplane.
• The most "important" training points are the support vectors; they define the hyperplane.
• Quadratic optimization algorithms can identify which training points xi are support vectors with non-zero Lagrange multipliers αi.
• Both in the dual formulation of the problem and in the solution, training points appear only inside inner products:

Find α1…αN such that Q(α) = Σαi − ½ΣΣαiαjyiyjxiTxj is maximized, and
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi

    f(x) = ΣαiyixiTx + b
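A minimal sketch of a linear SVM in R, assuming the e1071 package (an interface to libsvm); the two-class data are simulated:

library(e1071)
set.seed(1)

# Two simulated Gaussian clusters, one per class.
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 3), ncol = 2))
dat <- data.frame(x1 = x[, 1], x2 = x[, 2],
                  y = factor(rep(c(-1, 1), each = 20)))

# A large cost approximates the hard-margin formulation above.
fit <- svm(y ~ x1 + x2, data = dat, kernel = "linear", cost = 10)
fit$index                                      # indices of the support vectors
predict(fit, data.frame(x1 = 0.5, x2 = 0.5))   # classify a new point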
Non-linear SVMs

• Datasets that are linearly separable (with some noise) work out great.
• But what are we going to do if the dataset is just too hard to separate in the original space?
• How about mapping the data to a higher-dimensional space, e.g. from x to (x, x²)?
Non-linear SVMs: Feature Spaces

• General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable:

    Φ: x → φ(x)
The "Kernel Trick"

• The linear classifier relies on an inner product between vectors: K(xi, xj) = xiTxj.
• If every datapoint is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the inner product becomes:

    K(xi, xj) = φ(xi)Tφ(xj)

• A kernel function is some function that corresponds to an inner product in some expanded feature space.
• Example: 2-dimensional vectors x = [x1 x2]; let K(xi, xj) = (1 + xiTxj)². We need to show that K(xi, xj) = φ(xi)Tφ(xj):

    K(xi, xj) = (1 + xiTxj)²
              = 1 + xi1²xj1² + 2xi1xj1xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
              = [1  xi1²  √2 xi1xi2  xi2²  √2 xi1  √2 xi2]T [1  xj1²  √2 xj1xj2  xj2²  √2 xj1  √2 xj2]
              = φ(xi)Tφ(xj), where φ(x) = [1  x1²  √2 x1x2  x2²  √2 x1  √2 x2]
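A quick numerical check (not in the slides) of the identity above: the kernel (1 + xTz)² evaluated in the input space matches the inner product of the explicit 6-dimensional maps.

# Explicit feature map for the quadratic kernel (1 + x'z)^2 in 2 dimensions.
phi <- function(v) c(1, v[1]^2, sqrt(2) * v[1] * v[2], v[2]^2,
                     sqrt(2) * v[1], sqrt(2) * v[2])
x <- c(1, 2)
z <- c(3, 4)

K_direct <- (1 + sum(x * z))^2    # kernel computed in the input space
K_mapped <- sum(phi(x) * phi(z))  # inner product in the feature space
c(K_direct, K_mapped)             # both equal 144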
Kernels

Why use kernels?
• Make a non-separable problem separable.
• Map data into a better representational space.

Common kernels:
• Linear
• Polynomial: K(x, z) = (1 + xTz)^d - gives feature conjunctions
• Radial basis function (infinite-dimensional space) - hasn't been very useful in text classification
Analysis of Time Series

A time series is a sequence of data points that occur in successive order over a given period of time.

Objectives:
• To understand how a time series works and what factors affect a certain variable (or variables) at different points of time.
• Time series analysis provides consequences and insights about features of the given dataset that change over time.
• It supports predicting the future values of the time series variable.
• Assumption: there is one and only one assumption, "stationarity", which means that the origin of time does not affect the statistical properties of the process.
How to analyze Time Series?

• Collecting the data and cleaning it


• Preparing Visualization with respect to time vs key feature
• Observing the stationarity of the series
• Developing charts to understand its nature.
• Model building – AR, MA, ARMA and ARIMA
• Extracting insights from prediction
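Following the model-building step above, a minimal ARIMA sketch on R's built-in AirPassengers series, using only base R's stats package (the model order here is an illustrative choice, not a tuned one):

data(AirPassengers)
plot(AirPassengers)              # visualize trend and seasonality first

lp  <- log(AirPassengers)        # log transform to stabilize the variance
fit <- arima(lp, order = c(1, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fc  <- predict(fit, n.ahead = 12)  # forecast the next 12 months
exp(fc$pred)                       # back-transform to passenger counts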
Significance of Time Series and its types
• TSA is the backbone for prediction and forecasting analysis, specific to the time-based
problem statements.
• Analyzing the historical dataset and its patterns
• Understanding and matching the current situation with patterns derived from the previous
stage.
• Understanding the factor or factors influencing certain variable(s) in different periods.
• With help of “Time Series” we can prepare numerous time-based analyses and results.
o Forecasting
o Segmentation
o Classification
o Descriptive analysis
o Intervention analysis
Components of Time Series Analysis

• Trend: there is no fixed interval, and any divergence within the given dataset runs over a continuous timeline. The trend can be negative, positive, or null.
• Seasonality: regular or fixed-interval shifts within the dataset over a continuous timeline; the pattern may look like a bell curve or a saw tooth.
• Cyclical: there is no fixed interval; there is uncertainty in the movement and its pattern.
• Irregularity: unexpected situations/events/scenarios and spikes in a short time span.
Data Types of Time Series

Stationary: a dataset should follow the thumb rules below, without having the trend, seasonality, cyclical, or irregularity components of a time series:
• The MEAN value should be completely constant in the data during the analysis.
• The VARIANCE should be constant with respect to the time frame.
• The COVARIANCE (which measures the relationship between two values of the series) should depend only on the lag between them, not on the time at which it is measured.

Non-stationary: just the opposite of stationary.
Fuzzy Logic: Extracting Fuzzy Models from Data

• Fuzzy logic works on the levels of possibilities of input to achieve a definite output.

Implementation
• It can be implemented in systems of various sizes and capabilities, ranging from small micro-controllers to large, networked, workstation-based control systems.
• It can be implemented in hardware, software, or a combination of both.

Why Fuzzy Logic?
• Fuzzy logic is useful for commercial and practical purposes.
• It can control machines and consumer products.
• It may not give accurate reasoning, but acceptable reasoning.
• Fuzzy logic helps to deal with uncertainty in engineering.

Fuzzy Logic: Membership Function
• Membership functions allow you to quantify linguistic terms and represent a fuzzy set graphically. A membership function for a fuzzy set A on the universe of discourse X is defined as μA: X → [0, 1].
• Here, each element of X is mapped to a value between 0 and 1, called the membership value or degree of membership. It quantifies the degree of membership of the element of X to the fuzzy set A.
• The x-axis represents the universe of discourse; the y-axis represents the degrees of membership in the [0, 1] interval.
• There can be multiple membership functions applicable to fuzzify a numerical value. Simple membership functions are used, as the use of complex functions does not add more precision to the output.
Fuzzy Logic: Extracting Fuzzy Models from Data

• In a Boolean system, the truth value 1.0 represents absolute truth and 0.0 represents absolute falsehood.
• The fuzzy system is not restricted to absolute truth and absolute falsehood: in fuzzy logic there are intermediate values too, which are partially true and partially false.
Fuzzy Logic: Extracting Fuzzy Models from Data

Classical set
1. A classical set is a collection of distinct objects, for example, a set of students with passing grades.
2. Each individual entity in a set is called a member or an element of the set.
3. The classical set is defined in such a way that the universe of discourse is split into two groups: members and non-members. Hence, in the case of classical sets, no partial membership exists.
4. Let A be a given set. The membership function used to define the set A is its characteristic function, which maps each element of the universe to 1 (member) or 0 (non-member).
Fuzzy Logic: Extracting Fuzzy Models from Data

Fuzzy set:
1. A fuzzy set is a set having degrees of membership between 1 and 0. Fuzzy sets are represented with the tilde character (~). For example, the number of cars following traffic signals at a particular time, out of all cars present, will have a membership value in [0, 1].
2. Partial membership exists when a member of one fuzzy set can also be a part of other fuzzy sets in the same universe.
3. The degree of membership or truth is not the same as probability; fuzzy truth represents membership in vaguely defined sets.
4. A fuzzy set A~ in the universe of discourse, U, can be defined as a set of ordered pairs (x, μA(x)), where μA(x) is the degree of membership of x.
Fuzzy Logic: Extracting Fuzzy Models from Data

Common operations on fuzzy sets, given two fuzzy sets A~ and B~ (the defining formulas were shown as images in the original; the standard definitions in terms of the membership functions μA and μB are given here):

❑ Union: fuzzy set C~ is the union of A~ and B~: μC(x) = max(μA(x), μB(x))
❑ Intersection: fuzzy set D~ is the intersection of A~ and B~: μD(x) = min(μA(x), μB(x))
❑ Complement: fuzzy set E~ is the complement of A~: μE(x) = 1 − μA(x)
❑ Algebraic sum: μA(x) + μB(x) − μA(x)·μB(x)
❑ Algebraic product: μA(x)·μB(x)
❑ Bounded sum: min(1, μA(x) + μB(x))
❑ Bounded difference: max(0, μA(x) + μB(x) − 1)
Fuzzy Set Example
• A = (x1, 0.2)(x2, 0.3)(x3, 0.5)
• B = (x1, 0.3)(x2, 0.4)(x3, 0.1)
• Find A union B. Using μA∪B(x) = max(μA(x), μB(x)): A ∪ B = (x1, 0.3)(x2, 0.4)(x3, 0.5). (A base-R check follows below.)
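A base-R check of these operations on the example sets (pointwise max for union, min for intersection):

muA <- c(x1 = 0.2, x2 = 0.3, x3 = 0.5)   # membership values of A~
muB <- c(x1 = 0.3, x2 = 0.4, x3 = 0.1)   # membership values of B~

pmax(muA, muB)   # union:        (x1, 0.3) (x2, 0.4) (x3, 0.5)
pmin(muA, muB)   # intersection: (x1, 0.2) (x2, 0.3) (x3, 0.1)
1 - muA          # complement of A~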
Fuzzy Logic: ARCHITECTURE

It has four main parts, as follows:
• Fuzzification Module − It transforms the system
inputs, which are crisp numbers, into fuzzy sets.
• Knowledge Base − It stores IF-THEN rules
provided by experts.
• Inference Engine − It simulates the human
reasoning process by making fuzzy inference on
the inputs and IF-THEN rules.
• Defuzzification Module − It transforms the
fuzzy set obtained by the inference engine into a
crisp value.
Fuzzy Logic: Example- Air Conditioner
Fuzzy Logic: Extracting Fuzzy Models from
Data
Algorithm
• Define linguistic Variables and terms (start)
• Construct membership functions for them. (start)
• Construct knowledge base of rules (start)
• Convert crisp data into fuzzy data sets using membership functions.
(fuzzification)
• Evaluate rules in the rule base. (Inference Engine)
• Combine results from each rule. (Inference Engine)
• Convert output data into non-fuzzy values. (defuzzification)
Fuzzy Logic: Extracting Fuzzy Models from
Data
• Step 1 − Define linguistic variables and terms
• Linguistic variables are input and output variables in the
form of simple words or sentences. For room
temperature, cold, warm, hot, etc., are linguistic terms.
• Temperature (t) = {very-cold, cold, warm, very-warm,
hot}
• Every member of this set is a linguistic term and it can
cover some portion of overall temperature values.
• Step 2 − Construct membership functions for them
• The membership functions of temperature variable are
as shown −
Fuzzy Logic: Extracting Fuzzy Models from Data

• Step 3 − Construct knowledge base rules
• Create a matrix of room temperature values versus target temperature values that an air conditioning system is expected to provide:

Room Temp. \ Target   Very_Cold    Cold         Warm         Hot          Very_Hot
Very_Cold             No_Change    Heat         Heat         Heat         Heat
Cold                  Cool         No_Change    Heat         Heat         Heat
Warm                  Cool         Cool         No_Change    Heat         Heat
Hot                   Cool         Cool         Cool         No_Change    Heat
Very_Hot              Cool         Cool         Cool         Cool         No_Change

• Build a set of rules into the knowledge base in the form of IF-THEN-ELSE structures:

Sr. No.   Condition                                                     Action
1         IF temperature = (Cold OR Very_Cold) AND target = Warm THEN   Heat
2         IF temperature = (Hot OR Very_Hot) AND target = Warm THEN     Cool
3         IF (temperature = Warm) AND (target = Warm) THEN              No_Change
Fuzzy Logic: Extracting Fuzzy Models from Data

• Step 4 − Obtain fuzzy value


• Fuzzy set operations perform evaluation of
rules. The operations used for OR and AND
are Max and Min respectively. Combine all
results of evaluation to form a final result.
This result is a fuzzy value.
• Step 5 − Perform defuzzification
• Defuzzification is then performed according
to membership function for output variable.
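A minimal sketch of steps 4 and 5 in R; the triangular membership functions, breakpoints, and rule set here are illustrative assumptions, not the slide's exact values:

# Triangular membership function with feet a, c and peak b.
triangle <- function(x, a, b, c) pmax(0, pmin((x - a) / (b - a), (c - x) / (c - b)))

temp <- 22   # crisp room temperature (degrees Celsius)

# Step 4: fuzzify the crisp input against three linguistic terms (assumed ranges).
mu <- c(cold = triangle(temp,  0, 10, 20),
        warm = triangle(temp, 15, 22, 29),
        hot  = triangle(temp, 24, 35, 45))

# Evaluate two illustrative rules, then a simple weighted-average
# defuzzification (step 5) over action "centres": heat = +1, cool = -1.
heat <- mu["cold"]   # IF temperature = Cold THEN Heat
cool <- mu["hot"]    # IF temperature = Hot  THEN Cool
action <- (heat * 1 + cool * (-1)) / max(heat + cool, 1e-9)
unname(action)       # > 0 means heat, < 0 means cool, near 0 means no change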
Fuzzy Logic: Extracting Fuzzy Models from
Data
Advantages of Fuzzy Logic System
• This system can work with any type of inputs whether it is imprecise, distorted or noisy input
information.
• The construction of Fuzzy Logic Systems is easy and understandable.
• Fuzzy logic comes with mathematical concepts of set theory and the reasoning of that is quite simple.
• It provides a very efficient solution to complex problems in all fields of life as it resembles human
reasoning and decision-making.
• The algorithms can be described with little data, so little memory is required.
Disadvantages of Fuzzy Logic Systems
• Many researchers proposed different ways to solve a given problem through fuzzy logic which leads to
ambiguity. There is no systematic approach to solve a given problem through fuzzy logic.
• Proof of its characteristics is difficult or impossible in most cases because every time we do not get a
mathematical description of our approach.
• As fuzzy logic works on precise as well as imprecise data, most of the time accuracy is compromised.
Fuzzy Logic: Extracting Fuzzy Models from
Data
Application
• It is used in the aerospace field for altitude control of spacecraft and satellites.
• It has been used in the automotive system for speed control, traffic control.
• It is used for decision-making support systems and personal evaluation in the large
company business.
• It has application in the chemical industry for controlling the pH, drying, chemical
distillation process.
• Fuzzy logic is used in Natural language processing and various intensive
applications in Artificial Intelligence.
• Fuzzy logic is extensively used in modern control systems such as expert systems.
• Fuzzy Logic is used with Neural Networks as it mimics how a person would make
decisions, only much faster. It is done by Aggregation of data and changing it into
more meaningful data by forming partial truths as Fuzzy sets.