1.3 Data Analysis With Python- Data Wrangling 1

Uploaded by

nhut.an41004

Data Wrangling

Objectives
 Pre-processing Data in Python
 Describe how to handle missing values
 Describe data formatting techniques
 Describe data normalization
 Demonstrate the use of binning
 Demonstrate the use of categorical variables

Pre-processing Data in Python
 Data preprocessing is a necessary step in data analysis.
 It is the process of converting or mapping data from its raw form into another format to make it ready for further analysis.
 Data preprocessing is often called data cleaning or data wrangling. Typical steps:
   Identify and handle missing values
   Data formatting
   Data normalization (centering/scaling)
   Data binning
   Turning categorical values into numeric variables

Dealing with Missing Values
 A missing value occurs whenever a data entry is left empty, that is, when no data value is stored for a variable in an observation.
 In a data set, a missing value may appear as a question mark, a zero, or just a blank cell.
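
As a minimal sketch of how such placeholders can be detected with pandas: the CSV text and column names below are made up for illustration, and "?" is assumed to be the marker the data source uses for missing entries.

```python
import io

import pandas as pd

# Hypothetical CSV text: "?" and a blank cell both stand for missing data.
raw = io.StringIO(
    "make,price,horsepower\n"
    "audi,13950,102\n"
    "bmw,?,101\n"
    "volvo,16845,\n"
)

# na_values tells pandas to treat "?" as missing;
# blank cells are read as NaN by default.
df = pd.read_csv(raw, na_values=["?"])

# Count the missing entries in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```

A zero needs more care: it should only be treated as missing when zero cannot be a legitimate value for that variable.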

Dealing with Missing Values
 How to deal with missing data?

--> Students give their opinions

Dealing with Missing Values
 How to deal with missing data?
   Go back and find what the actual value should be
   Remove the data where the missing value is found
     Drop the whole variable
     Drop the single data entry with the missing value
     If only a few observations have missing data, dropping those particular entries is usually the best option.
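
The two ways of dropping data can be sketched with pandas as follows. The data frame and its column names are hypothetical and serve only as an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one price is missing.
df = pd.DataFrame({
    "make": ["audi", "bmw", "volvo", "mazda"],
    "price": [13950.0, np.nan, 16845.0, 10245.0],
    "horsepower": [102.0, 101.0, 114.0, 68.0],
})

# Drop the single data entries (rows) that contain a missing value.
rows_dropped = df.dropna(axis=0)

# Drop the whole variable (column) that contains a missing value.
columns_dropped = df.dropna(axis=1)

# Drop rows only when one specific column is missing.
price_known = df.dropna(subset=["price"])

print(rows_dropped.shape, list(columns_dropped.columns))
```

Dropping the row here loses one observation, while dropping the column loses the price of every car, which is why row-wise dropping is usually preferred when few entries are missing.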

Dealing with Missing Values
 How to deal with missing data?
   Replace the missing values
     Replace it with an average
     Replace it by frequency (the most common value)
     Replace it based on other functions
   Leave it as missing data
     It may be useful to keep an observation even if some features are missing
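
A minimal sketch of the first two replacement strategies, assuming a numeric column is filled with its mean and a categorical column with its most frequent value. The column names and values are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame({
    "fuel": ["gas", "gas", None, "diesel"],
    "losses": [164.0, np.nan, 158.0, 170.0],
})

# Replace with an average: mean() skips missing values by default.
mean_losses = df["losses"].mean()
df["losses"] = df["losses"].fillna(mean_losses)

# Replace by frequency: mode() returns the most common value(s).
most_common_fuel = df["fuel"].mode()[0]
df["fuel"] = df["fuel"].fillna(most_common_fuel)

print(df)
```

The mean only makes sense for numeric variables; for categorical variables such as the fuel type, the most frequent value is the usual substitute.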

Dealing with Missing Values
 Use dataframe.dropna() to drop missing data
 inplace=True: modifies the data frame itself instead of returning a new one
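
The effect of the inplace argument can be sketched as follows, using a small hypothetical data frame.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second price is missing.
df = pd.DataFrame({"price": [13950.0, np.nan, 16845.0]})

# Without inplace, dropna() returns a new data frame and leaves df unchanged.
cleaned = df.dropna()
rows_before = len(df)

# With inplace=True, df itself is modified and the call returns None.
result = df.dropna(subset=["price"], inplace=True)
rows_after = len(df)

print(rows_before, rows_after, result)
```

Assigning the returned frame back, as in df = df.dropna(), is an equally valid way to keep the result.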

Dealing with Missing Values
 Use dataframe.replace(missingValue, newValue) to replace missing data with another value
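
As an illustration, the sketch below replaces NaN in a hypothetical "losses" column with the column mean. The column name and values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing entry.
df = pd.DataFrame({"losses": [164.0, np.nan, 158.0, 170.0]})

# The mean is computed over the non-missing values only.
mean_losses = df["losses"].mean()

# replace(missingValue, newValue): here NaN is replaced by the mean.
df["losses"] = df["losses"].replace(np.nan, mean_losses)

print(df["losses"].tolist())
```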

Dealing with Missing Values
 How to deal with missing data?
   Go back and find what the actual value should be
   Leave it as missing data
 You can always check for a higher-quality data set or source.
 In some cases, you may want to leave the missing data as missing data.

Data Formatting in Python
 Data are usually collected from different places and stored in different formats.
 What is data formatting? Bringing data into a common standard of expression so that users can make meaningful comparisons.
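
Two common formatting tasks can be sketched as follows: unifying different spellings of the same value and converting a column to a common unit. The city names, the "city-mpg" column and the mapping are hypothetical; the factor 235 is the approximate conversion from miles per US gallon to litres per 100 km.

```python
import pandas as pd

# Hypothetical data: the same city is written in several ways.
df = pd.DataFrame({
    "city": ["N.Y.", "NY", "New York", "Ny"],
    "city-mpg": [21, 19, 24, 30],
})

# Map every variant to one common spelling.
variants = {"N.Y.": "New York", "NY": "New York", "Ny": "New York"}
df["city"] = df["city"].replace(variants)

# Convert fuel consumption to a common unit: L/100km is roughly 235 / mpg.
df["city-L/100km"] = 235 / df["city-mpg"]

print(df)
```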

Data Formatting in Python
 Data types in Python and Pandas
   object: “B”, “HoaDNT”
   int64: 0, 2, 4
   float64: 1.345, 78.9
 To identify data types: dataframe.dtypes (an attribute, so it is used without parentheses).
 To convert data types: dataframe.astype().
 Example: convert the data type of the column “price” to integer
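
A minimal sketch of that conversion, assuming the "price" column was read in as text. The data frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical data: prices were stored as strings.
df = pd.DataFrame({
    "make": ["audi", "bmw"],
    "price": ["13950", "16430"],
})

# dtypes is an attribute that lists the type of each column.
print(df.dtypes)

# astype() converts the column to the requested type.
df["price"] = df["price"].astype("int64")

print(df.dtypes)
```

The conversion fails if the column still contains missing values or non-numeric text, so missing data should be handled first.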

Summary
 Pre-processing Data in Python
 Describe how to handle missing values
 Describe data formatting techniques

Q&A
