4 Data Preprocessing
• X = df.iloc[:, :-1].values  # features: every column except the last
• y = df.iloc[:, -1].values  # target: the last column only
• print(X)
• print(y)
4) Handling Missing data
• There are mainly two ways to handle missing data, which are:
• By deleting the particular row or column: This first way is commonly
used to deal with null values.
• Here, we simply delete the specific row or column that
contains null values.
• However, this approach is not very efficient: removing data can lose
information, which may make the output less accurate.
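The deletion approach described above can be sketched with pandas. This is a minimal illustration, not the course's own code: the toy DataFrame, its column names, and its values are hypothetical. The last step, filling nulls with the column mean, is a commonly used alternative that the text above does not name, so treat it as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "Age": [44.0, 27.0, np.nan, 38.0],
    "Salary": [72000.0, np.nan, 54000.0, 61000.0],
    "Purchased": ["No", "Yes", "No", "Yes"],
})

# Delete every row that contains at least one null value.
rows_dropped = df.dropna(axis=0)

# Delete every column that contains at least one null value.
cols_dropped = df.dropna(axis=1)

print(len(df), "rows before,", len(rows_dropped), "rows after dropping rows")
print("columns left after dropping columns:", list(cols_dropped.columns))

# Assumed alternative (not named in the text above): keep all rows and
# replace each null with the mean of its numeric column.
filled = df.fillna(df.mean(numeric_only=True))
print(filled)
```

On this toy data, dropping rows discards half of the records and dropping columns discards both numeric features, which shows the loss of information that deletion can cause.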
Ways to handle missing data:
• Dummy Variables:
• Dummy variables are variables that take only the
values 0 or 1.
• A value of 1 marks the presence of a category in
its column, and the remaining dummy columns for that row are 0.
• With dummy encoding, the number of new columns
equals the number of categories.
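The dummy-variable idea above can be sketched with `pandas.get_dummies`. This is a minimal example under stated assumptions: the `Country` column and its values are hypothetical, and the slides may use a different encoder.

```python
import pandas as pd

# Hypothetical categorical column; the values are illustrative only.
df = pd.DataFrame({"Country": ["France", "Spain", "Germany", "Spain"]})

# One 0/1 column per category; dtype=int gives 0/1 instead of booleans.
dummies = pd.get_dummies(df["Country"], dtype=int)
print(dummies)

# The number of dummy columns equals the number of distinct categories,
# and each row has exactly one 1 (its own category).
print(dummies.shape[1] == df["Country"].nunique())
print((dummies.sum(axis=1) == 1).all())
```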
5) Encoding Categorical data: