Outliers and Influential Points


Welcome to the presentation on the topic:

Outliers and Influential Points



Contents

1. Definition of Outliers
2. Graphical Display of Outliers
3. Leverage Points
4. Influential Points
5. Necessity and Situation of Detecting Outliers
6. Detecting Outliers in Univariate Case
7. Detecting Outliers in Bivariate Case
8. Detecting Outliers in Multivariate Case

Fazlur Rahman
MS 2nd Semester
Reg: 2017124009
Department of Statistics,
Shahjalal University of Science and Technology, Sylhet-3114

Outliers
Outliers are points that fall away from the cloud of points.

Data points that diverge substantially from the overall pattern are called outliers.

Broadly, there are four ways that a data point might be considered an outlier:

1. It could have an extreme X value compared to other data points.
2. It could have an extreme Y value compared to other data points.
3. It could have extreme X and Y values.
4. It might be distant from the rest of the data, even without extreme X or Y values.
Graphically

[Figure: four scatter plots illustrating the cases above: (1) extreme X values, (2) extreme Y values, (3) extreme X and Y values, (4) distant data point]

Leverage Points

Outliers that fall horizontally away from the center of the cloud (extreme x-values) are called leverage points.
A leverage point that lies close to the overall trend has little effect on the slope of the regression line.

Influential Points

Outliers that influence the slope of the regression line are called influential points.

[Figure: two scatter plots of Y against X]

Therefore, an influential point will typically have high leverage, but a high-leverage point is not necessarily an influential point.
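The distinction between leverage and influence can be checked numerically. The sketch below uses made-up data (all values and variable names are hypothetical, not taken from the presentation): the same extreme x-value is added twice, once on the existing trend and once far from it, and the fitted slopes are compared.

```r
# Hypothetical data: 10 points lying near the line y = 2 + 3x
x <- 1:10
y <- 2 + 3 * x + c(0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.1)

# Case A: extreme x, y follows the same trend (high leverage, little influence)
x_a <- c(x, 30)
y_a <- c(y, 2 + 3 * 30)

# Case B: same extreme x, y far from the trend (high leverage and influential)
x_b <- c(x, 30)
y_b <- c(y, 20)

fit   <- lm(y ~ x)
fit_a <- lm(y_a ~ x_a)
fit_b <- lm(y_b ~ x_b)

# Slopes of the three fitted lines
slopes <- c(base      = unname(coef(fit)[2]),
            on_trend  = unname(coef(fit_a)[2]),
            off_trend = unname(coef(fit_b)[2]))
print(round(slopes, 2))

# Leverage (hat value) of the added point: it depends only on the x-values,
# so it is the same in both cases
lev_a <- unname(hatvalues(fit_a)[11])
lev_b <- unname(hatvalues(fit_b)[11])
print(c(lev_a, lev_b))
```

In this sketch the added point has the same leverage in both cases, yet only the off-trend version changes the slope noticeably.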
Detecting Outliers (Influential Points)

Influential points:

1. Greatly affect the slope of the regression line
2. Can substantially increase or decrease the coefficient of determination
3. Produce misleading results and strongly affect predictions

Therefore, detecting influential points is essential in model fitting and prediction.

Situations for detecting outliers (influential points):

There are three main situations in which outliers are detected:

1. Univariate
2. Bivariate
3. Multivariate
Method of Detecting Outliers (Influential Points)

1. Univariate

Theoretically: Points that lie more than three standard deviations away from the mean. This can be checked simply by using the Z-score.

Decision: Values with a Z-score less than -3 or greater than +3 are considered outliers.

Graphically: Box-and-whisker plot
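A minimal sketch of both univariate checks in base R, assuming a made-up sample (the vector `values` is hypothetical). `boxplot.stats()` applies the usual box-plot rule of 1.5 times the interquartile range beyond the hinges, which is a different criterion from the Z-score rule.

```r
# Hypothetical sample: 29 typical values plus one extreme value (95)
values <- c(48, 52, 50, 49, 51, 47, 53, 50, 52, 48,
            49, 51, 50, 46, 54, 50, 49, 51, 52, 48,
            50, 47, 53, 49, 51, 50, 52, 48, 50, 95)

# Z-score: distance from the mean in units of the sample standard deviation
z <- (values - mean(values)) / sd(values)

# Decision rule: |z| > 3 marks an outlier
z_outliers <- values[abs(z) > 3]
print(z_outliers)

# Box-and-whisker rule: points beyond 1.5 * IQR from the hinges
box_outliers <- boxplot.stats(values)$out
print(box_outliers)

# boxplot(values) draws the corresponding plot
```

With this made-up sample both rules flag only the value 95. In very small samples the Z-score rule can miss outliers, because the extreme value itself inflates the standard deviation.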


Method of Detecting Outliers (Influential Points)

2. Bivariate

A scatter plot is a simple and useful way to check for the existence of outliers.

One way to test the influence of an outlier is to compute the regression equation with and without the outlier.
                               Without Outlier       With Outlier
Regression equation            ŷ = 104.78 - 4.10x    ŷ = 97.51 - 3.32x
Coefficient of determination   R² = 0.94             R² = 0.55
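The with-and-without comparison can be sketched in base R as follows. The data here are invented for illustration and are not the data behind the equations above, so the fitted numbers will differ; only the procedure is the same.

```r
# Hypothetical data with a clear negative trend (NOT the slide's data)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(100, 97, 92, 89, 84, 80, 77, 72, 68, 64)

# Add one hypothetical outlier far from the trend
x_out <- c(x, 12)
y_out <- c(y, 110)

# Fit the regression without and with the outlier
fit_without <- lm(y ~ x)
fit_with    <- lm(y_out ~ x_out)

# Collect intercept, slope and R-squared of both fits side by side
comparison <- data.frame(
  model     = c("without outlier", "with outlier"),
  intercept = c(unname(coef(fit_without)[1]), unname(coef(fit_with)[1])),
  slope     = c(unname(coef(fit_without)[2]), unname(coef(fit_with)[2])),
  r_squared = c(summary(fit_without)$r.squared, summary(fit_with)$r.squared)
)
print(comparison)
```

A large change in the slope or in R² between the two rows suggests that the point is influential.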
Method of Detecting Outliers (Influential Points)

3. Multivariate
1. Cook’s D Bar Plot
2. Cook’s D Chart
3. DFBETAs Panel
4. DFFITs Plot
5. Studentized Residual Plot
6. Standardized Residual Chart
7. Studentized Residuals vs Leverage Plot
8. Deleted Studentized Residual vs Fitted Values Plot
9. Hadi Plot
10. Potential Residual Plot
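The quantities behind several of these plots are also available in base R. The sketch below computes DFFITS and DFBETAS with `dffits()` and `dfbetas()` from the `stats` package on the built-in `mtcars` data. The cut-offs 2·sqrt(p/n) and 2/sqrt(n) are commonly quoted rules of thumb and are an assumption of this sketch, not something stated in the presentation.

```r
# Linear model on the built-in mtcars data
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

n <- nrow(mtcars)           # number of observations
p <- length(coef(model))    # number of coefficients, intercept included

# DFFITS: scaled change in an observation's fitted value when it is deleted
dffits_values <- dffits(model)

# DFBETAS: scaled change in each coefficient when an observation is deleted
dfbetas_values <- dfbetas(model)

# Rule-of-thumb cut-offs (assumed here, not from the slides)
dffits_cut  <- 2 * sqrt(p / n)
dfbetas_cut <- 2 / sqrt(n)

# Observations exceeding the cut-offs
flag_dffits  <- names(dffits_values)[abs(dffits_values) > dffits_cut]
flag_dfbetas <- rownames(dfbetas_values)[
  apply(abs(dfbetas_values) > dfbetas_cut, 1, any)
]

print(flag_dffits)
print(flag_dfbetas)
```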
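Cook’s D Bar Plot

> library(olsrr)
> # Model: mpg regressed on four predictors; data: built-in mtcars
> model1 <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
> # Bar plot of Cook's distance for each observation
> ols_plot_cooksd_bar(model1)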
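For readers without `olsrr`, a rough base-R sketch of the same idea: compute Cook's distance with `cooks.distance()` and list the observations above a threshold. The threshold 4/n is a common rule of thumb and is an assumption here.

```r
# Same model as above, base R only
model1 <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# Cook's distance: how much the fitted values change when one observation is removed
cooks_d <- cooks.distance(model1)

# Rule-of-thumb threshold 4 / n (an assumption of this sketch)
threshold <- 4 / nrow(mtcars)

# Observations above the threshold, largest first
flagged <- sort(cooks_d[cooks_d > threshold], decreasing = TRUE)
print(round(flagged, 3))

# Base-R counterpart of the bar plot, with the threshold as a dashed line:
# barplot(cooks_d, las = 2, cex.names = 0.6); abline(h = threshold, lty = 2)
```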
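Standardized Residual Chart

> library(olsrr)
> # Model: mpg regressed on four predictors; data: built-in mtcars
> model2 <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
> # Chart of standardized residuals for each observation
> ols_plot_resid_stand(model2)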
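Studentized Residuals vs Leverage Plot

> library(olsrr)
> # Model: reading score regressed on three other scores; data: hsb (shipped with olsrr)
> model3 <- lm(read ~ write + math + science, data = hsb)
> # Studentized residuals plotted against leverage
> ols_plot_resid_lev(model3)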
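The classification that such a plot conveys can be sketched in base R with `hatvalues()` and `rstudent()`. This sketch uses `mtcars` rather than `hsb` so that it runs without `olsrr`, and the cut-offs (leverage above 2p/n, absolute studentized residual above 2) are assumed rules of thumb that may differ from those used by the plotting function.

```r
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

n <- nrow(mtcars)
p <- length(coef(model))        # coefficients including the intercept

leverage <- hatvalues(model)    # diagonal of the hat matrix
stud_res <- rstudent(model)     # externally studentized (deleted) residuals

# Assumed cut-offs: leverage above 2p/n, |studentized residual| above 2
high_leverage <- leverage > 2 * p / n
large_resid   <- abs(stud_res) > 2

# Label each observation by which cut-offs it exceeds
status <- ifelse(high_leverage & large_resid, "outlier & leverage",
          ifelse(high_leverage, "leverage",
          ifelse(large_resid, "outlier", "normal")))

print(table(status))
print(names(status)[status != "normal"])

# plot(leverage, stud_res) gives a basic version of the plot itself
```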

Thank You!
Any Questions?
