Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a technique for analyzing and visualizing datasets to summarize their main characteristics, developed by John Tukey in the 1970s. Key components of EDA include visualization techniques, summary statistics, pairwise correlation, and class breakdowns to understand data distributions and relationships. Common visualization methods include scatter plots, histograms, and box plots, which help in interpreting data and identifying patterns.

Uploaded by

chandramaryt

EXPLORATORY DATA ANALYSIS

Exploratory data analysis (EDA) is a technique for investigating a data set and summarizing its main characteristics, often in visual form.

Exploratory data analysis is a set of techniques principally developed by the American mathematician John Tukey in the 1970s.

It analyzes data sets using a variety of numerical methods and graphical tools.
VISUALIZATION TECHNIQUES
Visualization techniques are essential in Exploratory Data Analysis (EDA); they help to understand the data and communicate insights. Here are some common techniques:
1. Scatter Plots
2. Pie Charts
3. Bar Charts
4. Histograms
5. Line Plots
6. Heatmaps
7. Violin Plots
8. Distribution Plots
CONFRONTING A NEW DATA SET
It refers to the initial encounter with, and examination of, a previously unseen data collection.
Answer the basic questions:
Learn about the data's origin, purpose, and any relevant background information.
• Who constructed this data set, when, and why?
Eg: National Health and Nutrition Examination Survey 2009-2010
• How big is it?
Eg: The data set has 4978 records, each with seven data fields.
• What do the fields mean?
Eg: The lengths and weights were measured using the metric system, i.e. in centimetres and kilograms respectively.
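The "how big is it?" and "what do the fields mean?" questions can be partly answered in code before any analysis starts. The following is a minimal sketch using only the Python standard library; the inline CSV text and the field names `height_cm`, `weight_kg`, and `age` are hypothetical stand-ins, not the survey data mentioned above.

```python
import csv
import io

# Hypothetical stand-in for a newly received file. A real data set would
# be read with open("some_file.csv", newline="") instead of io.StringIO.
RAW = """height_cm,weight_kg,age
170.2,68.5,34
158.9,54.1,27
181.0,,45
"""

def describe_shape(text):
    """Return (record count, field names, missing-value count per field)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    fields = reader.fieldnames or []
    # An empty string in a CSV cell is treated as a missing value here.
    missing = {f: sum(1 for r in rows if r[f] == "") for f in fields}
    return len(rows), fields, missing

n_records, field_names, missing_counts = describe_shape(RAW)
print(f"{n_records} records, {len(field_names)} fields: {field_names}")
print("missing values per field:", missing_counts)
```

Units are carried in the field names in this sketch; when a real file does not state them, they have to come from the data set's documentation.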
LOOK FOR FAMILIAR OR INTERPRETABLE RECORDS:
It refers to the process of identifying data points, patterns, or features that are:

1. Recognizable: Easily identifiable due to prior knowledge or experience.
2. Understandable: Can be comprehended without extensive additional explanation.
3. Relatable: Connected to existing knowledge or real-world concepts.
4. Meaningful: Possess a clear interpretation or significance.
SUMMARY STATISTICS:
Summary statistics are numerical measures that summarize and describe the basic features of a data set.

Common summary statistics used in EDA include:

1. Measures of central tendency:
• Mean, Median, Mode
2. Measures of variability:
• Range, Variance, Standard deviation
3. Measures of distribution:
• Skewness, Kurtosis, Quantiles
4. Counts and frequencies:
• Number of observations (n), Missing values, Frequency distributions
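Most of the measures listed above can be computed with Python's standard `statistics` module. The sketch below assumes a small hypothetical list of ages in which `None` marks a missing value; skewness and kurtosis are left out because the standard library does not provide them.

```python
import statistics

# Hypothetical sample; None marks a missing value.
ages = [25, 31, 42, 31, None, 58, 36]

def summarize(values):
    """Compute basic summary statistics, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, q2, q3 = statistics.quantiles(present, n=4)
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
        "range": max(present) - min(present),
        "variance": statistics.variance(present),  # sample variance
        "stdev": statistics.stdev(present),        # sample standard deviation
        "quartiles": (q1, q2, q3),
    }

summary = summarize(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

`statistics.variance` and `statistics.stdev` use the sample (n - 1) formulas; `pvariance` and `pstdev` are the population counterparts.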
PAIRWISE CORRELATION
It refers to the process of calculating and analyzing the correlation coefficients between all possible pairs of variables in a dataset. It helps to:

1. Identify relationships: Discover how variables are related to each other.
2. Understand interactions: Reveal how variables change together (correlation alone does not show that one variable causes changes in another).
3. Detect patterns: Uncover hidden patterns and correlations.
4. Guide feature selection: Inform the selection of relevant variables for further analysis.
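As an illustration of "all possible pairs", the sketch below computes the Pearson correlation coefficient for every pair of columns. Pearson's coefficient is one common choice, assumed here because the text does not name a specific one; the column names and values are hypothetical.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def pairwise_correlations(columns):
    """Map each unordered pair of column names to its correlation."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }

# Hypothetical numeric columns of equal length.
data = {
    "age": [25, 31, 42, 36, 58],
    "income": [50000, 60000, 70000, 65000, 90000],
    "purchase": [100, 200, 50, 150, 300],
}

correlations = pairwise_correlations(data)
for (a, b), r in correlations.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

With k variables there are k(k - 1)/2 pairs, so the result is often shown as a correlation matrix or heatmap rather than a list.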
CLASS BREAKDOWNS:
Class breakdowns in Exploratory Data Analysis (EDA) refer to the process of analyzing and summarizing the distribution of categorical variables, also known as class variables or target variables.
It involves:

1. Identifying unique classes: Determining the distinct categories or groups within a categorical variable.
2. Counting observations: Calculating the number of observations (records) in each class.
3. Calculating frequencies: Determining the proportion or percentage of observations in each class.
4. Visualizing distributions: Using plots like bar charts, pie charts, or histograms to illustrate the class breakdowns.
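The four steps above can be sketched with `collections.Counter` from the standard library. The `genders` list is hypothetical, and the text bar chart is only a stand-in for a real plot.

```python
from collections import Counter

# Hypothetical categorical column.
genders = ["Male", "Female", "Male", "Male", "Female"]

def class_breakdown(labels):
    """Return {class: (count, proportion)}, most frequent class first."""
    counts = Counter(labels)          # steps 1 and 2: unique classes and counts
    total = len(labels)
    return {cls: (n, n / total) for cls, n in counts.most_common()}  # step 3

breakdown = class_breakdown(genders)
for cls, (count, share) in breakdown.items():
    # Step 4: a crude text bar chart, one '#' per observation.
    print(f"{cls:<8} {'#' * count} {count} ({share:.0%})")
```

A strongly unbalanced breakdown is worth noting early, since it affects how later models and accuracy figures should be read.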
PLOT OF DISTRIBUTIONS

Plotting distributions is a crucial step in EDA to understand the shape, central tendency, and variability of your data.

Common distribution plots in EDA include:

1. Histograms: Visualize frequency distributions.
2. Box Plots: Show median, quartiles, and outliers.
3. Density Plots: Smooth, continuous representations.
4. Q-Q Plots (Quantile-Quantile): Compare the quantiles of two distributions, typically a sample against a reference distribution.
5. Violin Plots: Combine box plots and density plots.
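To make the idea of a histogram concrete, the sketch below bins a hypothetical list of ages into equal-width intervals and prints the counts as text bars. It uses no plotting library; a plotting package would normally draw the same counts as a chart.

```python
def histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning min..max.

    Assumes the values are not all identical (otherwise the bin width is 0).
    Returns (bin edges, counts); the last bin includes the maximum value.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that v == hi falls in the last bin, not past it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

# Hypothetical sample of ages.
sample_ages = [22, 25, 27, 31, 33, 36, 38, 42, 47, 58, 64]

edges, counts = histogram(sample_ages, bins=4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} - {right:5.1f} | {'#' * count}")
```

Here most values fall in the lower bins and a few stretch toward higher ages, which is the shape usually described as skewed to the right.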
Example (sample rows only; the figures in the steps below are illustrative and cannot be derived from these three rows):
| Customer ID | Age | Gender | Income | Purchase Amount |
| 1 | 25 | Male | 50000 | 100 |
| 2 | 31 | Female | 60000 | 200 |
| 3 | 42 | Male | 70000 | 50 |

EDA Steps:
1. Summary Statistics:
- Mean Age: 35
- Median Income: 55000
- Average Purchase Amount: 150

2. Data Visualization:
- Histogram of Age: skewed to the right
- Scatter plot of Income vs. Purchase Amount: positive correlation
- Bar chart of Gender vs. Purchase Amount: males spend more

3. Familiar or Interpretable Records:
- Customers with high income (>75000) tend to make larger purchases
- Males aged 25-40 have higher purchase amounts

4. Pairwise Correlation:
- Strong correlation between Income and Purchase Amount (0.8)
- Moderate correlation between Age and Income (0.5)

5. Class Breakdown:
- 60% of customers are males
- 40% of customers have income above 60000
