6.outlier Code - Jupyter Notebook

The document discusses various techniques to remove outliers from a dataset. It loads mba data from a CSV file and calculates the lower and upper boundaries to identify outliers. It then removes outliers by filtering rows that are below the lower or above the upper boundary. Alternatively, it replaces outlier values with the lower or upper boundary values or the mean to handle outliers.


In [1]: import matplotlib.pyplot as plt

import seaborn as sns
import pandas as pd
import numpy as np

In [2]: mba = pd.read_csv("D:\\Course\\Python\\Datasets\\mba.csv")

In [3]: mba.boxplot()

...

In [4]: mba.boxplot(column='gmat')

Out[4]: <matplotlib.axes._subplots.AxesSubplot at 0x204a6a40f88>

In [5]: # Another way to draw the boxplot, using seaborn

import seaborn as sns
sns.boxplot(x=mba['gmat'])  # pass the column as x= so it is not read as the data argument

...

In [9]: # Process to Remove Outliers

In [6]: mba.describe()

...

In [6]: #Q1 = 690

#Q3 = 730
#IQR = Q3 - Q1
#low = Q1 - 1.5 * IQR
#high = Q3 + 1.5 * IQR

In [7]: q1 = mba['gmat'].quantile(0.25)

In [8]: q1

Out[8]: 690.0

In [9]: q3 = mba['gmat'].quantile(0.75)
#q3 = 730

In [10]: q3

Out[10]: 730.0

In [13]: iqr = q3-q1

iqr

Out[13]: 40.0

In [14]: # Lower Boundary

low = q1-1.5*iqr
low

Out[14]: 630.0

In [15]: # Upper Boundary

high = q3+1.5*iqr
high

Out[15]: 790.0
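
The boundary calculation above can be wrapped in a small helper so it is reusable for other columns. This is only a sketch: the name `iqr_bounds`, the parameter `k`, and the sample scores are illustrative assumptions and are not part of the original notebook or of mba.csv.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5):
    """Return (low, high) as Q1 - k*IQR and Q3 + k*IQR."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return float(q1 - k * iqr), float(q3 + k * iqr)


# Hypothetical scores, not taken from mba.csv.
scores = pd.Series([600, 690, 700, 710, 720, 730, 800])
low_demo, high_demo = iqr_bounds(scores)
print(low_demo, high_demo)
```

With the notebook's own quartiles (Q1 = 690, Q3 = 730) the same formula gives 630 and 790, matching Out[14] and Out[15].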

In [16]: mba.shape

Out[16]: (773, 3)

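In [17]: # Keep rows whose gmat is strictly greater than the lower boundary and strictly less than the upper boundary
# Note: the strict comparisons also drop rows exactly equal to low or high

mba1 = mba.loc[(mba['gmat'] > low) & (mba['gmat'] < high)]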

In [15]: mba.shape

Out[15]: (773, 3)

In [14]: mba1.shape # 15 values are removed

Out[14]: (758, 3)

In [20]: import seaborn as sns
sns.boxplot(x=mba1['gmat'])

...
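
The filter in In [17] uses strict comparisons, so a score exactly equal to a boundary is removed along with the true outliers. The sketch below contrasts that with an inclusive filter built on `Series.between`, which includes both endpoints by default. The data frame `demo` and the names `low_demo` and `high_demo` are hypothetical stand-ins, not values read from mba.csv.

```python
import pandas as pd

# Hypothetical data; 630 and 790 stand in for the boundaries computed earlier.
demo = pd.DataFrame({"gmat": [620, 630, 700, 790, 800]})
low_demo, high_demo = 630.0, 790.0

# Strict comparison: rows equal to a boundary are dropped too.
strict = demo.loc[(demo["gmat"] > low_demo) & (demo["gmat"] < high_demo)]

# Series.between is inclusive on both ends by default: boundary rows are kept.
inclusive = demo.loc[demo["gmat"].between(low_demo, high_demo)]

print(len(strict), len(inclusive))
```

Which variant is appropriate depends on whether a value sitting exactly on a boundary should count as an outlier; under the usual 1.5 × IQR rule it does not.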

In [18]: # Method 2 to Remove the outlier

In [23]: out = mba[(mba['gmat'] < low) | (mba['gmat']> high)].index

In [24]: out

Out[24]: Int64Index([189, 337, 392, 403, 478, 491, 653, 766, 768, 770, 771, 772], dtype='int64')

In [25]: mba2 = mba.drop(out)

In [26]: mba2

...

In [27]: mba.drop(out,inplace=True)

In [28]: mba

...
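
Method 2 collects the index labels of the outlier rows and passes them to `drop`. As a rough illustration on made-up data (the frame `demo` and the boundary values are assumptions, not the notebook's dataset), the sketch below shows that `drop` returns a new frame and leaves the original untouched unless `inplace=True` is used, and that an equivalent result can be obtained with a boolean mask on the index.

```python
import pandas as pd

# Hypothetical data and boundaries for illustration only.
demo = pd.DataFrame({"gmat": [620, 700, 710, 800]})
low_demo, high_demo = 630.0, 790.0

# Index labels of rows outside the boundaries.
outlier_index = demo[(demo["gmat"] < low_demo) | (demo["gmat"] > high_demo)].index

dropped = demo.drop(outlier_index)               # new frame; demo is unchanged
masked = demo[~demo.index.isin(outlier_index)]   # same rows via a boolean mask

print(list(outlier_index), len(demo), len(dropped))
```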

Method 3
Replace the Outlier

A lower outlier is replaced with the lower boundary.

An upper outlier is replaced with the upper boundary.

In [37]: mba = pd.read_csv("D:\\Course\\Python\\Datasets\\mba.csv")

mba

...

In [30]: mba[(mba['gmat'] < low)]

...

In [33]: low = q1-1.5*iqr

low

Out[33]: 630.0

In [34]: mba[(mba["gmat"]<low)]

...

In [41]: # take only the gmat values of the low outliers, not every column of those rows
out1 = mba.loc[mba["gmat"] < low, "gmat"].values
out1

...

In [42]: mba['gmat'] = mba['gmat'].replace(out1, low)  # assign back instead of inplace on a column selection

In [43]: mba

...

In [49]: sns.boxplot(x=mba['gmat'])

...
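
Method 3 caps outliers at the boundaries. The notebook does this for the lower side with `replace`; a different technique that handles both sides in one step is `Series.clip`. The sketch below is an alternative, not the notebook's method, and the sample values and the names `demo`, `low_demo`, and `high_demo` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical scores; 630 and 790 stand in for the computed boundaries.
demo = pd.DataFrame({"gmat": [600.0, 640.0, 700.0, 780.0, 800.0]})
low_demo, high_demo = 630.0, 790.0

# Values below low_demo become low_demo; values above high_demo become high_demo.
capped = demo["gmat"].clip(lower=low_demo, upper=high_demo)

print(capped.tolist())
```

Values already inside the boundaries pass through unchanged, so the number of rows stays the same.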

To replace with mean or median

In [46]: mean = mba['gmat'].mean()

In [47]: mean

Out[47]: 711.4230271668823

In [50]: mba = pd.read_csv("D:\\Course\\Python\\Datasets\\mba.csv")

mba

...

In [55]: mba['gmat'] = mba['gmat'].replace(out1, mean)  # assign back instead of inplace on a column selection

In [56]: mba

Out[56]:
     Datasrno  workex        gmat
0           1      21  720.000000
1           2     107  640.000000
2           3      57  740.000000
3           4      99  690.000000
4           5     208  710.000000
..        ...     ...         ...
768       769      88  711.423027
769       770     132  670.000000
770       771      28  711.423027
771       772      10  711.423027
772       773      52  711.423027

773 rows × 3 columns
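
The heading mentions the median, but the cells above only show the mean. A minimal sketch of the median variant follows, using `Series.mask` to overwrite values on both sides of the boundaries. Computing the median from the non-outlier rows only is a design choice made here, not something the notebook states, and the data and the names `demo`, `is_outlier`, and `fill_value` are hypothetical.

```python
import pandas as pd

# Hypothetical scores; 630 and 790 stand in for the computed boundaries.
demo = pd.DataFrame({"gmat": [600.0, 690.0, 700.0, 710.0, 800.0]})
low_demo, high_demo = 630.0, 790.0

# True for rows outside the boundaries.
is_outlier = (demo["gmat"] < low_demo) | (demo["gmat"] > high_demo)

# Median of the rows that are not outliers (assumption: outliers are excluded).
fill_value = demo.loc[~is_outlier, "gmat"].median()

# mask() replaces values where the condition is True and keeps the rest.
demo["gmat_filled"] = demo["gmat"].mask(is_outlier, fill_value)

print(demo["gmat_filled"].tolist())
```

Swapping `.median()` for `.mean()` gives the mean-based variant.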
