ML Using Python Unit 3

1) Give an introduction to Regression?

Regression: Regression algorithms are used if there is a relationship between the input
variable and the output variable.
Regression is a process of finding the correlations between dependent and independent
variables.
It helps in predicting continuous variables, such as market trends, house prices, and the weather.
The task of the Regression algorithm is to find the mapping function to map the input
variable(x) to the continuous output variable(y).
Example: Suppose we want to do weather forecasting. For this we use a regression algorithm: the model is trained on past data, and once training is complete, it can predict the weather for future days.
Types of Regression Algorithm:
1. Simple Linear Regression
2. Multiple Linear Regression
3. Polynomial Regression
4. Support Vector Regression
5. Decision Tree Regression
6. Random Forest Regression
7. Ridge Regression
8. Lasso Regression
2) Explain Linear Regression?
Linear regression is one of the easiest and most popular Machine Learning algorithms.
It is a statistical method that is used for predictive analysis. Linear regression makes
predictions for continuous/real or numeric variables such as sales, salary, age, product price,
etc.
The linear regression algorithm shows a linear relationship between a dependent variable (y) and one or more independent variables (x), hence it is called linear regression.
Because linear regression models a linear relationship, it finds how the value of the dependent variable changes with the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship
between the variables.
Linear Regression Line:

A linear line showing the relationship between the dependent and independent variables is
called a regression line. A regression line can show two types of relationship:
Positive Linear Relationship: If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is termed a positive linear relationship.

Negative Linear Relationship: If the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is called a negative linear relationship.
Types of Linear Regression:
a) Simple Linear Regression
b) Multiple Linear Regression
a) Simple Linear Regression: If a single independent variable is used to predict the value of
a numerical dependent variable, then such a Linear Regression algorithm is called Simple
Linear Regression.

Simple Linear Regression is a type of regression algorithm that models the relationship
between a dependent variable and a single independent variable. The relationship shown by a
Simple Linear Regression model is linear or a sloped straight line, hence it is called Simple
Linear Regression.

The key point in Simple Linear Regression is that the dependent variable must be a continuous/real value. The independent variable, however, can be either continuous or categorical.

Simple Linear regression algorithm has mainly two objectives:


o Model the relationship between the two variables. Such as the relationship between
Income and expenditure, experience and Salary, etc.
o Forecasting new observations. Such as Weather forecasting according to temperature,
Revenue of a company according to the investments in a year, etc.
Simple Linear Regression Model:
y = a0 + a1x + ε

Where,

a0 = the intercept of the regression line (obtained by putting x = 0)
a1 = the slope of the regression line, which tells whether the line is increasing or decreasing
ε = the error term (for a good model it will be negligible)
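The estimates of a0 and a1 can be computed directly with the closed-form least-squares formulas. Below is a minimal Python sketch; the experience/salary numbers are invented purely for illustration:

```python
import numpy as np

# Hypothetical data: x = years of experience, y = salary (in $1000s)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([30, 35, 42, 48, 53], dtype=float)

# Least-squares estimates: a1 = Sxy / Sxx, a0 = mean(y) - a1 * mean(x)
a1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a0 = y.mean() - a1 * x.mean()

print("intercept a0:", a0)  # where the fitted line crosses x = 0
print("slope a1:", a1)      # how much y changes per unit of x
```

The same estimates can be obtained with scikit-learn's LinearRegression; the closed-form version is shown here to make the roles of a0 and a1 explicit.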
b) Multiple Linear Regression: If more than one independent variable is used to predict the
value of a numerical dependent variable, then such a Linear Regression algorithm is called
Multiple Linear Regression.

Moreover, Multiple Linear Regression is an extension of Simple Linear Regression, as it takes more than one predictor variable to predict the response variable. We can define it as:
Multiple Linear Regression is one of the important regression algorithms which models the
linear relationship between a single dependent continuous variable and more than one
independent variable.

Ex: Prediction of CO2 emission based on the engine size and the number of cylinders of a car.

Some key points about MLR:


o For MLR, the dependent or target variable (Y) must be continuous/real, but the predictor or independent variables may be continuous or categorical.
o Each feature variable must model a linear relationship with the dependent variable.
o MLR tries to fit a regression line through a multidimensional space of data points.
MLR equation:

In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple predictor variables x1, x2, x3, ..., xn. Since it is an enhancement of Simple Linear Regression, the same form applies, and the equation becomes:
Y = b0 + b1x1 + b2x2 + b3x3 + ...... + bnxn ............... (a)

Where,

Y = Output/Response variable

b0, b1, b2, b3, ..., bn = Coefficients of the model.

x1, x2, x3, ..., xn = Various independent/feature variables.
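As a minimal sketch of equation (a), the CO2 example above can be fitted with NumPy's least-squares solver. The engine-size/cylinder/CO2 numbers below are hypothetical:

```python
import numpy as np

# Hypothetical data: [engine size (L), cylinders] -> CO2 emission (g/km)
X = np.array([[1.6, 4], [2.0, 4], [2.4, 6], [3.0, 6], [3.5, 8]])
y = np.array([140.0, 160.0, 200.0, 230.0, 275.0])

# Prepend a column of ones so the intercept b0 is estimated too
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coef

# Predict CO2 for a 2.8 L, 6-cylinder engine: Y = b0 + b1*x1 + b2*x2
pred = b0 + b1 * 2.8 + b2 * 6
```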

3) Explain Polynomial Regression?


Polynomial Regression is a regression algorithm that models the relationship between a dependent variable (y) and an independent variable (x) as an nth-degree polynomial. The Polynomial Regression equation is given below:
y = b0 + b1x + b2x^2 + b3x^3 + ...... + bnx^n
o It is also called the special case of Multiple Linear Regression in ML. Because we add
some polynomial terms to the Multiple Linear regression equation to convert it into
Polynomial Regression.
o It is a linear model with some modification in order to increase the accuracy.
o The dataset used in Polynomial regression for training is of non-linear nature.
o It makes use of a linear regression model to fit the complicated and non-linear functions
and datasets.
o Hence, "In Polynomial regression, the original features are converted into
Polynomial features of required degree (2,3,..,n) and then modeled using a linear
model."

Need for Polynomial Regression:

The need of Polynomial Regression in ML can be understood in the below points:


o If we apply a linear model to a linear dataset, it gives a good result, as we have seen in Simple Linear Regression. But if we apply the same model, without any modification, to a non-linear dataset, it produces a poor fit: the loss function increases, the error rate is high, and accuracy decreases.
o So for such cases, where the data points are arranged in a non-linear fashion, we need the Polynomial Regression model. We can understand this better by comparing a linear dataset with a non-linear dataset.

o In such a comparison, the dataset is arranged non-linearly. If we try to cover it with a linear model, we can clearly see that the line hardly covers any data point. A curve, on the other hand, covers most of the data points, and that curve comes from the Polynomial model.
o Hence, if the datasets are arranged in a non-linear fashion, then we should use the
Polynomial Regression model instead of Simple Linear Regression.
Steps for Polynomial Regression:

The main steps involved in Polynomial Regression are given below:


o Data Pre-processing
o Build a Linear Regression model and fit it to the dataset
o Build a Polynomial Regression model and fit it to the dataset
o Visualize the results of the Linear Regression and Polynomial Regression models.
o Predict the output.
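The steps above can be sketched with scikit-learn, where PolynomialFeatures converts the original feature into polynomial features and a plain LinearRegression is fitted on top. The data below is generated from y = 2 + 3x + x^2, purely for illustration:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical non-linear data following y = 2 + 3x + x^2
x = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 2.0, 6.0, 12.0, 20.0])

# Degree-2 polynomial features, then an ordinary linear model on top
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)

pred = model.predict(np.array([[4.0]]))[0]  # should be close to 2 + 12 + 16 = 30
```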

4) Explain Logistic Regression?


Logistic Regression in Machine Learning
o Logistic regression is one of the most popular Machine Learning algorithms, which
comes under the Supervised Learning technique. It is used for predicting the categorical
dependent variable using a given set of independent variables.
o Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value, such as Yes or No, 0 or 1, True or False. But instead of giving exact values of 0 and 1, it gives probabilistic values that lie between 0 and 1.
o Logistic Regression is quite similar to Linear Regression, except in how they are used: Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.
o In Logistic regression, instead of fitting a regression line, we fit an "S" shaped logistic
function, which predicts two maximum values (0 or 1).
o The curve from the logistic function indicates the likelihood of something such as
whether the cells are cancerous or not, a mouse is obese or not based on its weight, etc.
o Logistic Regression is a significant machine learning algorithm because it has the
ability to provide probabilities and classify new data using continuous and discrete
datasets.
o Logistic Regression can be used to classify the observations using different types of
data and can easily determine the most effective variables used for the classification.
The logistic function is described below:

Note: Logistic regression uses the same concept of predictive modeling as regression, which is why it is called logistic regression; but because it is used to classify samples, it falls under the classification algorithms.
Logistic Function (Sigmoid Function):
o The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
o It maps any real value into another value within a range of 0 and 1.
o The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms an "S"-shaped curve. This S-form curve is called the sigmoid function or the logistic function.
o In logistic regression, we use the concept of a threshold value, which defines the boundary between predicting 0 and 1: values above the threshold tend to 1, and values below the threshold tend to 0.
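The sigmoid mapping and the threshold rule can be sketched in a few lines of Python:

```python
import numpy as np

def sigmoid(z):
    """Map any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Probabilities for a few raw scores; sigmoid(0) is exactly 0.5
probs = sigmoid(np.array([-3.0, 0.0, 3.0]))

# Threshold rule: probabilities >= 0.5 tend to class 1, below to class 0
labels = (probs >= 0.5).astype(int)
```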
Assumptions for Logistic Regression:
o The dependent variable must be categorical in nature.
o The independent variable should not have multi-collinearity.
Logistic Regression Equation:

The Logistic regression equation can be obtained from the Linear Regression equation. The
mathematical steps to get Logistic Regression equations are given below:
o We know the equation of a straight line can be written as:

y = b0 + b1x1 + b2x2 + ... + bnxn

o In Logistic Regression, y can be between 0 and 1 only, so let's divide y by (1 - y):

y / (1 - y)   (0 for y = 0, and infinity for y = 1)

o But we need a range between -infinity and +infinity, so we take the logarithm of the expression, and it becomes:

log[y / (1 - y)] = b0 + b1x1 + b2x2 + ... + bnxn

The above equation is the final equation for Logistic Regression.
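A minimal classification sketch using scikit-learn's LogisticRegression; the tumour-size data below is invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: tumour size (cm) -> cancerous (1) or not (0)
X = np.array([[1.0], [1.5], [2.0], [4.0], [4.5], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# The model returns probabilities between 0 and 1, not just class labels
probs = clf.predict_proba(np.array([[1.2], [4.8]]))[:, 1]
preds = clf.predict(np.array([[1.2], [4.8]]))
```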

5) Discuss Maximum Likelihood Estimation in detail?

The likelihood of a given set of observations is the probability of obtaining that particular set of data under a chosen probability distribution model.
MLE is carried out by writing an expression known as the likelihood function for a set of observations. This expression contains an unknown parameter, say θ, of the model. We obtain the value of this parameter that maximizes the likelihood of the observations; this value is called the maximum likelihood estimate.
Think of likelihood as the flip side of probability: while a probability function gives the probability of the samples for given parameter values, the likelihood treats the observed samples as fixed and measures how plausible different parameter values are.

Properties of Maximum Likelihood Estimates:

Maximum likelihood estimates are very efficient for testing hypotheses about models and parameters. As the sample size increases, they become unbiased minimum-variance estimators, and they have approximately normal distributions.
Deriving the Likelihood Function:
Assuming a random sample x1, x2, x3, … ,xn which has a joint probability density denoted by:

L(θ) = f(x1, x2, x3, … ,xn|θ)

where θ is a parameter of the distribution with unknown value.


We need to find the most likely value of the parameter θ given the set of observations. To do this, we use a likelihood function.
The likelihood function is defined as:
L(θ) = f(x1, x2, x3, … ,xn|θ)
which is considered as a function of θ
The maximum likelihood estimate for θ is the value of θ that maximizes L(θ), that is, the value of θ that makes the observed data set most likely.
If we assume the observations are independent, we can split the function f(x1, x2, x3, … ,xn|θ) into a product of univariate densities:
L(θ) = f(x1, x2, x3, … ,xn|θ) = f(x1|θ) · f(x2|θ) · f(x3|θ) · … · f(xn|θ)
which gives us the same results.

So the question is: what value of θ maximizes the likelihood of the given observations? This can be found by maximizing this product using calculus methods, which are not covered in this lesson.

Log Likelihood:

Maximizing the likelihood function derived above can be a complex operation. To work around this, we use the fact that the logarithm is a monotonically increasing function, so maximizing the logarithm of the likelihood function is equivalent to maximizing the likelihood function itself.
This is given as:

log L(θ) = log f(x1|θ) + log f(x2|θ) + … + log f(xn|θ)

The value of θ that maximizes this function is known as the 'maximum likelihood estimate' for the given data.
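The log-likelihood can also be maximized numerically. Here is a minimal sketch, assuming the data come from a Normal(θ, 1) model with a small made-up sample; a grid search over θ recovers the MLE, which for this model is the sample mean:

```python
import numpy as np

# Hypothetical sample assumed to come from a Normal(theta, 1) distribution
data = np.array([4.8, 5.1, 5.3, 4.9, 5.4])

def log_likelihood(theta, x):
    """log L(theta) = sum of log f(x_i | theta) for Normal(theta, 1)."""
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - theta) ** 2)

# Evaluate log L(theta) over a grid of candidate values and take the best
grid = np.linspace(4.0, 6.0, 2001)
mle = grid[np.argmax([log_likelihood(t, data) for t in grid])]
# Calculus gives the same answer: for Normal(theta, 1) the MLE is mean(x)
```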
Applications of Maximum Likelihood Estimation:
MLE can be applied in different statistical models, including:
• linear and generalized linear models,
• exploratory and confirmatory analysis,
• communication systems,
• econometrics and signal detection.
