Exploratory Data Analysis (EDA)
PART OF DATA PRE-PROCESSING
Kamalesh K B
@kamalesh12
What is Exploratory Data Analysis (EDA)?
Exploratory Data Analysis (EDA) is an essential
step in any data science project. It involves
investigating and analyzing datasets to
understand their characteristics, identify
patterns, detect outliers, and uncover
relationships between variables. EDA helps in
gaining initial insights into the data before
diving into more complex analyses.
The Foremost Goals of EDA
1. Descriptive Statistics
2. Data Visualization
3. Feature Engineering
4. Correlation and Relationships
5. Data Segmentation
6. Hypothesis Generation
7. Data Quality Assessment
Types of EDA
1. Bivariate Analysis
2. Multivariate Analysis
3. Time Series Analysis
4. Missing Data Analysis
5. Outlier Analysis
6. Data Visualization
EDA Using Python Libraries
Python libraries such as Pandas and
Matplotlib are commonly used for EDA.
They cover the usual steps: reading
data, computing summary statistics,
converting data types, handling
missing values, and visualizing data.
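The first of these steps can be sketched as follows. This is a minimal illustration, not a fixed recipe: the CSV content, the column names, and the variable names are all made up for the example, and the data is inlined so the snippet runs without an external file.

```python
import io

import pandas as pd

# Hypothetical CSV content, inlined so the sketch needs no external file.
csv_text = """name,age,city,signup_date
Asha,34,Chennai,2023-01-15
Ben,29,Mumbai,2023-02-20
Chitra,41,Chennai,2023-03-05
Dev,29,Delhi,2023-03-18
"""

# Data reading: pd.read_csv accepts a file path or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Structure of the dataset: (number of rows, number of columns).
shape = df.shape

# Summary statistics for the numeric columns (count, mean, std, quartiles).
summary = df.describe()

# Data type conversion: parse date strings and mark a text column as categorical.
df["signup_date"] = pd.to_datetime(df["signup_date"])
df["city"] = df["city"].astype("category")

print(shape)
print(summary)
print(df.dtypes)
```

With a real dataset, the inlined text would typically be replaced by a file path passed to `pd.read_csv`.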
Handling Missing Values
Missing data can bias or weaken an
analysis. Common ways to handle it are
dropping rows with missing data and
filling (imputing) missing values, for
example with a column's mean or median.
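A small sketch of both approaches with Pandas is shown here. The DataFrame, its column names, and the choice of the median and an "Unknown" placeholder as fill values are assumptions made for illustration; the right strategy depends on the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [34, np.nan, 41, 29],
    "city": ["Chennai", "Mumbai", None, "Delhi"],
})

# Count the missing values in each column before deciding what to do.
missing_per_column = df.isna().sum()

# Option 1: drop every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: fill (impute) instead of dropping.
filled = df.copy()
# Numeric column: use the median of the observed values.
filled["age"] = filled["age"].fillna(filled["age"].median())
# Text column: use an explicit placeholder label.
filled["city"] = filled["city"].fillna("Unknown")

print(missing_per_column)
print(dropped)
print(filled)
```

Dropping is simple but discards data (here, half of the rows), while filling keeps every row at the cost of introducing estimated values.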
Data Encoding
Categorical data may need to be
encoded into numerical columns for
certain models. Techniques like Label
Encoding and One-hot Encoding can
be used for this purpose.
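Both encodings can be sketched with Pandas alone. The `city` column and its values are hypothetical, and using category codes for label encoding is one possible approach among several (scikit-learn's encoders are a common alternative).

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"city": ["Chennai", "Mumbai", "Chennai", "Delhi"]})

# Label encoding: each category is mapped to an integer code.
# Categories are sorted, so Chennai=0, Delhi=1, Mumbai=2.
df["city_label"] = df["city"].astype("category").cat.codes

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["city"], prefix="city")

print(df)
print(one_hot)
```

Label encoding is compact but implies an order between categories, which some models may treat as meaningful. One-hot encoding avoids that at the cost of extra columns.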
Data Visualization Techniques
Various visualization techniques such
as histograms, box plots, scatter plots,
and pair plots are used to explore data
visually and understand trends and
patterns.
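The plot types above can be sketched with Matplotlib and Pandas as follows. The data is randomly generated and the column names are invented for the example; `pd.plotting.scatter_matrix` is used here as a simple stand-in for a pair plot.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: two related numeric columns, generated with a fixed seed.
rng = np.random.default_rng(0)
df = pd.DataFrame({"height_cm": rng.normal(170, 10, 200)})
df["weight_kg"] = 0.5 * df["height_cm"] - 20 + rng.normal(0, 5, 200)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Histogram: distribution of a single variable.
axes[0].hist(df["height_cm"], bins=20)
axes[0].set_title("Histogram")

# Box plot: median, quartiles, and potential outliers.
axes[1].boxplot(df["weight_kg"])
axes[1].set_title("Box plot")

# Scatter plot: relationship between two variables.
axes[2].scatter(df["height_cm"], df["weight_kg"], s=10)
axes[2].set_title("Scatter plot")

fig.tight_layout()

# Pair plot: every numeric column plotted against every other.
pair_axes = pd.plotting.scatter_matrix(df, figsize=(6, 6))
```

Calling `plt.show()` afterwards displays the figures in an interactive session, and `fig.savefig(...)` writes one to a file.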
Handling Outliers
Outliers are data points that deviate
significantly from the rest, and they
can distort an analysis. Techniques
such as the Interquartile Range (IQR)
method are used to detect and remove
them.
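A minimal sketch of the IQR method is given below. The sample values are made up, and the 1.5 multiplier is the conventional rule of thumb rather than something specified here; other thresholds are possible.

```python
import pandas as pd

# Hypothetical measurements with one obviously extreme value.
values = pd.Series([10, 12, 11, 13, 12, 11, 95], name="value")

# First and third quartiles, and the range between them.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR below Q1 and above Q3.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Flag points outside the fences, then separate them from the rest.
is_outlier = (values < lower) | (values > upper)
outliers = values[is_outlier]
cleaned = values[~is_outlier]

print(outliers.tolist())
print(cleaned.tolist())
```

Removing outliers is only one option. Depending on the analysis, it may be more appropriate to inspect them first, since they can be valid observations.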