Initial Data Analysis
• Data Processing
• Data Description
• Data Scrutiny and Cleaning
Data Quality
• Accuracy. All entries must be correct.
• Completeness. Data should be as complete as possible.
• Reliability. Reliable data means that variables conform to, and do not contradict, other related variables.
• Timeliness. All entries should be up to date and timely, especially for variables that change constantly and when the observation/analysis is longitudinal.
• Relevance. Data might be accurate and complete, but some variables are not relevant to a particular analysis/research inquiry.
• Availability. Data must always be organized, useful, and readily available for possible future analysis.
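Some of the accuracy, completeness, and reliability criteria above can be checked automatically. The sketch below is a minimal illustration only; it assumes a hypothetical respondent file with `age` and `years_employed` fields, a plausible age range of 0 to 120, and a hypothetical minimum working age of 15. None of these names or thresholds come from the source.

```python
# Minimal data-quality checks on hypothetical survey records.
# Field names, the valid age range, and the minimum working age are assumptions.

# Each record is one respondent; None marks a missing value.
records = [
    {"id": 1, "age": 34, "years_employed": 10},
    {"id": 2, "age": None, "years_employed": 5},
    {"id": 3, "age": 25, "years_employed": 12},  # contradicts age
    {"id": 4, "age": 150, "years_employed": 3},  # impossible age
]


def completeness(rows, field):
    """Share of rows with a non-missing value for `field`."""
    present = sum(1 for r in rows if r[field] is not None)
    return present / len(rows)


def out_of_range(rows, field, low, high):
    """Accuracy check: ids whose value for `field` lies outside [low, high]."""
    return [r["id"] for r in rows
            if r[field] is not None and not (low <= r[field] <= high)]


def contradictions(rows, min_working_age=15):
    """Reliability check: ids whose years employed is impossible for their age."""
    return [r["id"] for r in rows
            if r["age"] is not None and r["years_employed"] is not None
            and r["years_employed"] > r["age"] - min_working_age]


print(completeness(records, "age"))          # 0.75
print(out_of_range(records, "age", 0, 120))  # [4]
print(contradictions(records))               # [3]
```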
INITIAL DATA ANALYSIS
• Data Processing
• Validity and Reliability (if survey questionnaire)
• Ethical Consideration
Data Processing
Data processing involves coding the data and entering them into a dataset in a format suitable for further data analysis. Each variable must be numerically coded. Document and observe the following:
• Number of observations/respondents
• Number of variables
• Scale of measurement for each variable
• For nominal and ordinal variables, the labels/categories/characteristics that each numerical code represents. The numerical codes must be consistent.
• For ratio and interval variables, a consistent number of decimal places for the values of each variable
• Codes for missing values (no value recorded), out-of-range values (a value recorded but known to be impossible), and “don’t know” and “not applicable” responses
• Missing values must be recorded as no value, not as zeroes. A zero is a genuine value and should still be recorded as zero, especially for interval and ratio variables. Hence, zero should not be used as the code for missing values. Some analysts keep a missing value as an empty cell (no value encoded).
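A small sketch of how such a coding scheme might look in Python. The special codes (999, 998, 997, 996) and the `SEX_LABELS` mapping are hypothetical choices, picked outside any valid range so they can never be confused with real values. The point it illustrates is the one above: special codes are converted to missing before analysis, while a genuine zero is kept.

```python
# Hypothetical codebook entries; the codes below are illustrative, not a standard.
SEX_LABELS = {1: "Male", 2: "Female"}  # nominal variable: code -> label

# Special codes kept well outside the valid range (never 0).
SPECIAL_CODES = {
    999: "missing (no value recorded)",
    998: "out of range (recorded but impossible)",
    997: "don't know",
    996: "not applicable",
}


def clean_numeric(values):
    """Turn special codes into None; genuine zeros are kept as 0."""
    return [None if v in SPECIAL_CODES else v for v in values]


def mean_ignoring_missing(values):
    """Mean of the non-missing values, or None if nothing is left."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


# Hypothetical responses to "number of children" from five respondents.
children = [2, 0, 999, 3, 997]
cleaned = clean_numeric(children)
print(cleaned)                             # [2, 0, None, 3, None]
print(mean_ignoring_missing(cleaned))      # 1.6666666666666667
print([SEX_LABELS[c] for c in (1, 2, 2)])  # ['Male', 'Female', 'Female']
```

Had 0 been used as the missing-value code, a respondent who truly has no children could not be told apart from a non-respondent, and the mean would be distorted either way.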
Data Scrutiny and Cleaning
Ungrouped data
16 18 12
30 7 13
8 10 10
9 18 11
24 28 23
16 20 27
13 21 16
10 6 19
24 16 6
8 8 27
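Grouped data
CI        f    TCB           CM (xi)   RF (%)   <cF   >cF   Rank
6 - 10    10   5.5 – 10.5    8         33.33    10    30    5
11 - 15   4    10.5 – 15.5   13        13.33    14    20    4
16 - 20   8    15.5 – 20.5   18        26.67    22    16    3
21 - 25   4    20.5 – 25.5   23        13.33    26    8     2
26 - 30   4    25.5 – 30.5   28        13.33    30    4     1
CI = class interval, f = frequency, TCB = true class boundaries, CM (xi) = class mark, RF = relative frequency, <cF = less-than cumulative frequency, >cF = greater-than cumulative frequency.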
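The grouped frequency distribution can be rebuilt from the 30 raw values with a short script. This is a sketch under stated assumptions: the data are whole numbers (so each true class boundary is the class limit minus or plus 0.5), and the classes have equal width 5 starting at 6. The Rank column is not computed here.

```python
# Rebuild the grouped frequency table from the raw data.
# Assumes whole-number data (unit = 1) and equal class widths.
DATA = [16, 18, 12, 30, 7, 13, 8, 10, 10, 9, 18, 11, 24, 28, 23,
        16, 20, 27, 13, 21, 16, 10, 6, 19, 24, 16, 6, 8, 8, 27]


def frequency_table(data, start, width, k, unit=1):
    n = len(data)
    rows = []
    less_cf = 0
    for i in range(k):
        lower = start + i * width
        upper = lower + width - unit  # class limits, e.g. 6 - 10
        f = sum(1 for x in data if lower <= x <= upper)
        less_cf += f
        rows.append({
            "ci": (lower, upper),
            "f": f,
            "tcb": (lower - unit / 2, upper + unit / 2),  # true class boundaries
            "cm": (lower + upper) / 2,                    # class mark
            "rf": round(100 * f / n, 2),                  # relative frequency (%)
            "less_cf": less_cf,
        })
    # Greater-than cumulative frequency: observations at or above each class.
    remaining = n
    for row in rows:
        row["greater_cf"] = remaining
        remaining -= row["f"]
    return rows


for row in frequency_table(DATA, start=6, width=5, k=5):
    print(row)
```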
SUMMARY MEASURES
Visualization using Box and Whisker Plot
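Data (n = 30)
16 18 12
30 7 13
8 10 10
9 18 11
24 28 23
16 20 27
13 21 16
10 6 19
24 16 6
8 8 27
Mean 15.8
Median 16
Mode 16
StDev 7.20345
IQR 10.75 (Q1 = 10, Q3 = 20.75, by linear interpolation as in Excel's QUARTILE.INC)
[Figure: box and whisker plot of the data, marking the lowest value, 25th percentile, median (line), mean (x), mode, 75th percentile, IQR, and highest value]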
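Python's standard `statistics` module can reproduce these summary measures and the five-number summary that a box and whisker plot draws. Quartile conventions differ between tools: `method="inclusive"` interpolates on (n − 1)p like Excel's QUARTILE.INC, while the module's default `"exclusive"` method gives Q1 = 9.75 and Q3 = 21.5 (IQR = 11.75) for the same data, so a stated IQR should always name its method.

```python
import statistics

# The 30 observations listed above.
DATA = [16, 18, 12, 30, 7, 13, 8, 10, 10, 9, 18, 11, 24, 28, 23,
        16, 20, 27, 13, 21, 16, 10, 6, 19, 24, 16, 6, 8, 8, 27]

mean = statistics.mean(DATA)
median = statistics.median(DATA)
mode = statistics.mode(DATA)
stdev = statistics.stdev(DATA)  # sample standard deviation (divisor n - 1)

# Quartiles by linear interpolation on (n - 1) * p, like Excel's QUARTILE.INC.
q1, _, q3 = statistics.quantiles(DATA, n=4, method="inclusive")
iqr = q3 - q1

# Five-number summary drawn by a box and whisker plot.
five_number = (min(DATA), q1, median, q3, max(DATA))

print(mean, median, mode, round(stdev, 5))  # 15.8 16.0 16 7.20345
print(five_number, iqr)                     # (6, 10.0, 16.0, 20.75, 30) 10.75
```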
Visualization using Histogram and Frequency Polygon
MEASURE OF SKEWNESS
Grouped-data measures (from the frequency distribution):
Mean 16
Median 16.125
Mode 9
StDev 7.1438
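The grouped-data measures can be computed from the class limits and frequencies alone. This sketch uses the usual interpolation formulas for the median and mode and the n − 1 divisor for the standard deviation; those conventions are assumptions, since the source does not state its formulas. With them, the mean is 16, the median 16.125, and the standard deviation 7.1438; the mode formula gives 8.625, which rounds to the 9 listed above.

```python
import math

# Grouped data from the frequency table: (lower limit, upper limit, frequency).
CLASSES = [(6, 10, 10), (11, 15, 4), (16, 20, 8), (21, 25, 4), (26, 30, 4)]


def grouped_measures(classes, unit=1):
    """Mean, median, mode, and sample SD of grouped data.

    Assumes equal class widths and whole-number data (unit = 1), so each
    true lower boundary is the lower limit minus unit / 2.
    """
    freqs = [f for _, _, f in classes]
    marks = [(lo + hi) / 2 for lo, hi, _ in classes]
    n = sum(freqs)
    width = classes[0][1] - classes[0][0] + unit

    mean = sum(f * x for f, x in zip(freqs, marks)) / n

    # Median: interpolate inside the class holding the (n / 2)-th observation.
    median, cum = None, 0
    for lo, _, f in classes:
        if cum + f >= n / 2:
            median = (lo - unit / 2) + (n / 2 - cum) / f * width
            break
        cum += f

    # Mode: interpolate inside the modal class using neighbouring frequencies.
    k = freqs.index(max(freqs))
    d1 = freqs[k] - (freqs[k - 1] if k > 0 else 0)
    d2 = freqs[k] - (freqs[k + 1] if k + 1 < len(freqs) else 0)
    mode = (classes[k][0] - unit / 2) + d1 / (d1 + d2) * width

    # Sample standard deviation from class marks (divisor n - 1).
    ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks))
    sd = math.sqrt(ss / (n - 1))
    return mean, median, mode, sd


mean, median, mode, sd = grouped_measures(CLASSES)
print(mean, median, mode, round(sd, 4))  # 16.0 16.125 8.625 7.1438
```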
MEASURE OF KURTOSIS
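Two common shape measures can be computed from the same grouped data: Pearson's coefficients of skewness, which use the mean, median, mode, and standard deviation listed above, and the moment coefficient of kurtosis β2 = m4 / m2², for which 3 marks a normal (mesokurtic) curve. Choosing these particular measures is an assumption; the source does not say which ones it uses. For this data the two Pearson coefficients disagree in sign (the mean lies above the mode but slightly below the median), and β2 ≈ 1.83, below 3 (platykurtic).

```python
import math

# Grouped data from the frequency table: (class mark, frequency).
GROUPS = [(8, 10), (13, 4), (18, 8), (23, 4), (28, 4)]
MEDIAN, MODE = 16.125, 9  # grouped median and mode as listed above

n = sum(f for _, f in GROUPS)
mean = sum(x * f for x, f in GROUPS) / n


def central_moment(r):
    """r-th central moment about the mean, with divisor n."""
    return sum(f * (x - mean) ** r for x, f in GROUPS) / n


m2, m4 = central_moment(2), central_moment(4)
s = math.sqrt(m2 * n / (n - 1))  # sample standard deviation, about 7.1438

# Pearson's first (mode-based) and second (median-based) coefficients.
sk_mode = (mean - MODE) / s
sk_median = 3 * (mean - MEDIAN) / s

# Moment coefficient of kurtosis: 3 for a normal curve, below 3 is platykurtic.
beta2 = m4 / m2 ** 2

print(round(sk_mode, 3), round(sk_median, 3), round(beta2, 3))
```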
DATA PRESENTATIONS
TABULAR. Tabular presentation is a systematic way of arranging data in columns and rows according to classifications or categories.
Meana et al. (2019), MJSIR
GRAPHICAL. A graph or chart is a pictorial presentation of a set of data.
Bar Graph vs. Column Chart
https://depictdatastudio.com/when-to-use-horizontal-bar-charts-vs-vertical-column-charts/
Clustered vs. Stacked
https://www.xelplus.com/excel-clustered-column-and-stacked-chart/
Histogram vs. Pareto
https://study.com/academy/lesson/difference-between-a-pareto-chart-histogram.html
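A Pareto chart differs from a histogram mainly in ordering: its bars are sorted by descending frequency, and a cumulative-percentage line is overlaid. Pareto charts are normally used for categorical data (for example, complaint or defect types); reusing the classes of the grouped table here is only to make the contrast concrete, and `pareto` is a hypothetical helper name.

```python
# Contrast a histogram's ordering with a Pareto chart's ordering.
# Classes and frequencies come from the grouped table (illustration only).
FREQ = {"6 - 10": 10, "11 - 15": 4, "16 - 20": 8, "21 - 25": 4, "26 - 30": 4}


def pareto(freq):
    """Sort categories by descending frequency and attach cumulative percent."""
    total = sum(freq.values())
    # sorted() is stable, so tied categories keep their original order.
    ordered = sorted(freq.items(), key=lambda kv: kv[1], reverse=True)
    cum = 0
    rows = []
    for label, f in ordered:
        cum += f
        rows.append((label, f, round(100 * cum / total, 2)))
    return rows


# Histogram: bars stay in numeric class order.
print(list(FREQ.items()))
# Pareto: tallest bar first, plus a cumulative-percentage line.
for row in pareto(FREQ):
    print(row)
```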
Line graph. A line graph is used to show fluctuations and trends in the components of the total over time, or the pattern of changes in the data.
https://www.excel-easy.com/examples/depreciation.html
Frequency polygon. A frequency polygon is obtained by plotting the class mark against the frequency of that class.
https://www.sciencedirect.com/topics/mathematics/frequency-polygon
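The frequency polygon's points follow directly from the grouped table. This sketch pairs each class mark with its frequency and, optionally, adds a zero-frequency point one class width beyond each end so the polygon closes on the horizontal axis; that closing step is a common textbook convention, not something the source specifies, and `polygon_points` is a hypothetical helper.

```python
# Points of a frequency polygon: (class mark, frequency) for each class.
# Grouped data from the frequency table: (lower limit, upper limit, frequency).
CLASSES = [(6, 10, 10), (11, 15, 4), (16, 20, 8), (21, 25, 4), (26, 30, 4)]


def polygon_points(classes, close=True):
    width = classes[1][0] - classes[0][0]  # assumes equal class widths
    points = [((lo + hi) / 2, f) for lo, hi, f in classes]
    if close:
        # Optional convention: zero-frequency marks just outside both ends.
        first_mark, last_mark = points[0][0], points[-1][0]
        points = [(first_mark - width, 0)] + points + [(last_mark + width, 0)]
    return points


print(polygon_points(CLASSES))
# [(3.0, 0), (8.0, 10), (13.0, 4), (18.0, 8), (23.0, 4), (28.0, 4), (33.0, 0)]
```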
https://www.think-cell.com/en/resources/manual/pie.html
https://community.plotly.com/t/nested-pie-charts/24011
Pictograph. A pictograph makes use of symbols and is used to compare a few discrete quantitative values, usually of one kind.
Statistical map. A statistical map shows geographical location and may contain different symbols on the map. The legend, which tells what the symbols represent, is essential.
https://psa.gov.ph/content/philippine-population-density-based-2015-census-population
https://old.pcij.org/stories/stats-on-the-state-of-the-regionsland-population-population-density/
https://www.pinterest.ph/pin/540924605225268792/
https://www.anychart.com/products/anychart/gallery/Radar_Charts_(Spiderweb)/
https://support.microsoft.com/en-us/office/available-chart-types-in-office-a6187218-807e-4103-9e0a-27cdb19afb90
https://www.anychart.com/products/anychart/gallery/Tree_Map_Charts/
https://www.excel-exercise.com/sunburst-chart/
https://towardsdatascience.com/scattered-boxplots-graphing-experimental-results-with-matplotlib-seaborn-and-pandas-81f9fa8a1801?gi=c1f8591680e3
INFOGRAPHIC
“A chart, diagram, or illustration (as in a book or
magazine, or on a website) that uses graphic elements
to present information in a visually striking way”
(Merriam-Webster Dictionary)