Chapter Five: Analysis and Interpretation of Data
01/28/2022 1
The analysis of data requires a number of closely related operations, such as:
a) Editing
b) Coding and entering the data into Excel, SPSS, or STATA
c) Presentation of data via tabulation and the like
d) Descriptive analysis
e) Drawing statistical inferences
Demographic Profile of Respondents
Research Objective (Major Issues of Analysis)
Present your data using tables, graphs, and text; then interpret and analyze it: show the implications, and support them with additional data from interviews and evidence in the literature.
Example …taken from my unpublished manuscript
Percentage
Measure of dispersion
Data analysis…
(iii) Mode
Mean: the sum of all measurements divided by the number of observations in the data set.
Median: the middle value when the numbers in a set of data are arranged in ascending or descending order.
Mode: the value that occurs most frequently in a set of data.
◦ This is the only measure of central tendency that can be used with nominal data, which have purely qualitative category assignments.
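The three definitions above can be illustrated with a minimal Python sketch. The `scores` and `departments` lists are hypothetical values invented for illustration; they are not data from this chapter.

```python
from statistics import mean, median, multimode

# Hypothetical scores, invented for illustration only.
scores = [4, 7, 7, 8, 10, 12, 15]

central = {
    "mean": mean(scores),        # sum of the values divided by their count
    "median": median(scores),    # middle value of the sorted data
    "modes": multimode(scores),  # most frequent value(s), returned as a list
}

# The mode also works for nominal (category) data,
# where a mean or median would be meaningless.
departments = ["HR", "Sales", "Sales", "Finance", "Sales", "HR"]
most_common_department = multimode(departments)
```

`multimode` returns a list because a data set may have more than one most frequent value.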
Example: Measures of Central Tendency (Arithmetic Mean)
The arithmetic mean is the average of all the values under consideration.
Branch    Revenue
1         50,000,000
2         150,000,000
3         40,000,000
4         60,000,000
Total     300,000,000
Mean = 300,000,000 / 4 = 75,000,000
[Figure: histogram; y-axis: Frequency (0 to 10), x-axis: values 1 to 9]
The shape of the distribution is said to be skewed if the
observations are not symmetrically distributed around
the center.
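A positively skewed distribution has its longer tail in the direction of positive values.
[Figure: Positively Skewed Distribution; y-axis: Frequency (0 to 12), x-axis: values 1 to 9]
A negatively skewed distribution has its longer tail in the direction of negative values.
[Figure: Negatively Skewed Distribution; y-axis: Frequency (0 to 12), x-axis: values 1 to 9]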
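The direction of skew can also be checked numerically. The sketch below uses the moment coefficient of skewness, which is one common definition among several and is an assumption here, since the slides do not name a formula. The three data sets are hypothetical.

```python
from statistics import mean

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    center = mean(values)
    m2 = sum((v - center) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - center) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets, invented for illustration only.
right_tailed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 9]   # long tail toward large values
left_tailed = [10 - v for v in right_tailed]    # mirror image of the above
symmetric = [1, 2, 3, 4, 5]

skew_right = skewness(right_tailed)  # positive
skew_left = skewness(left_tailed)    # negative
skew_sym = skewness(symmetric)       # zero
```

A positive coefficient indicates a positively skewed distribution, a negative one a negatively skewed distribution, and a value near zero a roughly symmetric one.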
Measurement of Dispersion
Dispersion measures how the values of a variable are scattered around the average.
It is a measurement of how far the values of the variable lie from the average value.
Important measures of dispersion are:
Range: the difference between the maximum and minimum values of an observed variable.
Mean deviation: the average of the absolute deviations of the observations from the mean.
Standard deviation: the square root of the average of the squared deviations from the mean.
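The three measures of dispersion can be computed as in the following sketch. The `values` list is hypothetical. The sketch uses `pstdev`, which divides by n and so matches the "average of squares of deviations" wording; `statistics.stdev` divides by n - 1 instead and is the usual choice for sample data.

```python
from statistics import mean, pstdev

# Hypothetical observations, invented for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: maximum minus minimum.
value_range = max(values) - min(values)

# Mean deviation: average absolute distance from the mean.
center = mean(values)
mean_deviation = sum(abs(v - center) for v in values) / len(values)

# Standard deviation (population form): square root of the
# average squared deviation from the mean.
std_dev = pstdev(values)
```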
When the distribution of item in a series
happens to be perfectly symmetrical
Data Presentation
• Data in raw form are usually not easy to use for decision
making
• Some type of organization is needed
• Table
• Graph
• Data presentation: the process of transforming a mass of raw data into tables and charts as part of making sense of the data.
• Refers to the preparation of data in a form that can be used by a general audience
• Tables:
◦ They can be used with just about all types of numerical data.
• Graphical
• The type of graph to use depends on the variable being
summarized
Data presentation: The Frequency Distribution Table
Summarize data by category (variables are categorical)
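A frequency distribution table for a categorical variable can be built as in the sketch below. The `responses` list (education level of respondents) is hypothetical and only illustrates the technique.

```python
from collections import Counter

# Hypothetical categorical responses, invented for illustration only.
responses = [
    "Diploma", "Degree", "Degree", "Masters",
    "Degree", "Diploma", "Degree", "Masters",
]

counts = Counter(responses)   # category -> frequency
total = sum(counts.values())

# One row per category: (category, frequency, percentage of the total).
frequency_table = [
    (category, count, round(100 * count / total, 1))
    for category, count in counts.most_common()
]

for category, count, percent in frequency_table:
    print(f"{category:<10}{count:>5}{percent:>8.1f}%")
```

Reporting the percentage next to each count makes categories comparable across samples of different sizes.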
Simple Bar Chart Example
[Figure: bar chart "Hospital Patients by Unit"; units shown: Cardiac Care, Emergency, Intensive Care, Maternity, Surgery; value axis from 0 to 5000]
A Multiple Bar Chart Example
Sales by quarter for three sales territories:
[Figure: grouped bar chart; series: East, West, North; x-axis: 1st Qtr, 2nd Qtr, 3rd Qtr, 4th Qtr; y-axis: 0 to 60]
A Component bar Chart Example
• Sales by quarter for three sales territories:
Pie chart:
[Figure: pie chart; legible slices: Intensive Care 4%, Maternity 6%. Percentages are rounded to the nearest percent.]
Tests of significance
There are two general classes of significance tests:
Parametric hypothesis testing
• When the data are interval- or ratio-scaled (gross national product, industry sales volume) and sample size is large
• It assumes that the data in the study are drawn from a population with normal (bell-shaped) distributions and/or normal sampling distribution
Non-parametric hypothesis testing
• When data are either ordinal or nominal
• Examples: Chi-square, Kolmogorov-Smirnov test
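As an illustration of a non-parametric test on nominal data, the sketch below computes the chi-square statistic for a contingency table by hand. The `observed` counts are hypothetical, and the helper name `chi_square_statistic` is introduced here only for illustration. The critical value 3.841 is the standard chi-square cutoff for 1 degree of freedom at the 0.05 level.

```python
def chi_square_statistic(observed):
    """Chi-square test of independence for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

# Hypothetical 2x2 table, e.g. group (rows) by response category (columns).
observed = [[10, 20], [20, 10]]
chi2, df = chi_square_statistic(observed)

CRITICAL_0_05_DF1 = 3.841  # chi-square critical value, df = 1, alpha = 0.05
reject_independence = chi2 > CRITICAL_0_05_DF1
```

Statistical packages such as SPSS or STATA report an exact p-value instead of a comparison against a table value.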
complete population.
Seek to determine the relationship between
variables and test statistical significance.
Inferential Analysis…
Measures of relationship:
Need to determine whether there is a
relationship between variables
Correlation
• Magnitude
• Direction
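Magnitude and direction are both captured by the Pearson correlation coefficient r, sketched below. Pearson's r is one common correlation measure and is an assumption here, since the slides do not name a specific coefficient. The `advertising` and `sales` values are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations, invented for illustration only.
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson_r(advertising, sales)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
magnitude = abs(r)  # 0 = no linear relationship, 1 = perfect linear relationship
```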
(ii) Is there any cause and effect relationship between the two variables? If yes, of what degree and in which direction?
This question is answered by the technique of regression.
There are different techniques of regression.
◦ In the case of a bivariate population, the cause and effect relationship can be studied through simple regression.
◦ In the case of a multivariate population, the causal relationship can be studied through multiple regression analysis.
Linear/Multiple regression
Regression is used to predict the value of a dependent variable (DV) from the value of one or more independent variables (IVs).
Simple linear regression finds the best-fitting line through the data. It deals with one DV and one IV.
Multiple regression is the extension of simple linear regression. It deals with two or more independent variables:
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + e$
where $\beta_0$ and $\beta_i$ are the regression coefficients and $e$ is the error term.
Linear/Multiple regression…
$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$
$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$
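The least-squares formulas for the slope b and the intercept a translate directly into code. The sketch below is a minimal illustration: the helper name `fit_line` and the data are hypothetical, and the data were chosen to lie exactly on the line Y = 1 + 2X so the result is easy to verify.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for the line Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # b = [n*sum(XY) - sum(X)*sum(Y)] / [n*sum(X^2) - (sum(X))^2]
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # a = sum(Y)/n - b * sum(X)/n
    a = sum_y / n - b * sum_x / n
    return a, b

# Hypothetical data lying exactly on Y = 1 + 2X.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = fit_line(xs, ys)
predicted_at_6 = a + b * 6  # prediction of the DV for a new value of the IV
```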
Multivariate Assumptions
• Linear relationship: the relationship between each IV and the DV should be linear; this can best be checked with a scatter plot
• Multivariate normality: the residuals should be normally distributed
• No multicollinearity: the independent variables should not be highly correlated with each other; this can be tested using the Variance Inflation Factor (VIF), the correlation matrix, Tolerance, and the Condition Index
• Homoscedasticity: the variance of the error terms should be similar across the values of the independent variables. A plot of standardized residuals versus predicted values can show whether points are equally distributed across all values of the independent variables.
• No autocorrelation: autocorrelation of the residuals is a common problem in time series data; it can best be tested with the Durbin-Watson test
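As an illustration of the autocorrelation check, the Durbin-Watson statistic can be computed from the regression residuals as sketched below. The residual sequences are hypothetical. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals, ordered in time.
alternating = [1, -1, 1, -1]  # sign flips every period
clustered = [1, 1, -1, -1]    # same-sign residuals follow each other

dw_alternating = durbin_watson(alternating)
dw_clustered = durbin_watson(clustered)
```

In practice the residuals would come from a fitted regression model, and the statistic is compared against tabulated lower and upper bounds.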
T-test
A t-test helps to compare whether two groups have
different average values
◦ for example, whether men and women have different
average heights.
This analysis is appropriate whenever you want to
compare the means of two groups.
Figure 1. Idealized distributions for treated and comparison group posttest values.
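One-sample t-test
Compares the mean of a single sample of the dependent/response variable with a known or hypothesized value.
◦ E.g., whether the mean weekly number of patients, or the mean weekly sales of the store, differs from a hypothesized value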
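A one-sample t-test is usually computed by dividing the difference between the sample mean and a hypothesized mean by the standard error of the mean. The sketch below follows that definition. The `weekly_sales` figures and the hypothesized mean of 50 are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, hypothesized_mean):
    """t statistic and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 divisor
    t = (mean(sample) - hypothesized_mean) / standard_error
    return t, n - 1

# Hypothetical weekly sales of a store (in thousands), for illustration only.
weekly_sales = [52, 54, 55, 50, 51, 53, 49, 52]

t_stat, df = one_sample_t(weekly_sales, hypothesized_mean=50)
```

The t statistic is then compared with the critical value of the t distribution for `df` degrees of freedom, or converted to a p-value by statistical software.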
Paired samples
Paired samples t-tests typically consist of a sample of matched pairs of similar units, or one group of units that has been tested twice (a "repeated measures" t-test).
◦ The paired sample t-test is used in ‘before-after’ studies.
A typical example of the repeated measures t-test would be where subjects are tested prior to a treatment, say for high blood pressure, and the same subjects are tested again after treatment with a blood-pressure lowering medication.
By comparing the same patient's numbers before and after treatment, each patient effectively serves as their own control.
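The before-after logic can be sketched as a one-sample t-test on the per-patient differences. The blood-pressure readings below are hypothetical and serve only to show the calculation.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic and degrees of freedom for a paired samples t-test."""
    # Work on the change for each subject, not on the two groups separately.
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    t = mean(differences) / standard_error
    return t, n - 1

# Hypothetical systolic blood pressure for the same five patients.
before = [150, 160, 145, 155, 170]
after = [140, 152, 143, 147, 160]

t_stat, df = paired_t(before, after)  # negative t: readings fell after treatment
```

Because each difference comes from one patient, variation between patients cancels out, which is why the paired design is more sensitive than comparing two unrelated groups.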
Independent/unpaired samples: the two-sample t-test
The independent samples t-test is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared.
It tests for significant differences in the means of two distinct populations.
◦ For example, we can use this test to see if there are significant differences in how men and women score the new concept.
◦ E.g., two teachers teach different sections of the same course; the test checks the effect of the teacher on students’ grades/scores.
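The two-section example can be sketched with the pooled-variance (Student's) form of the independent samples t-test. This form assumes equal variances in the two populations; Welch's t-test is the usual alternative when that assumption is doubtful. The scores for the two sections are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: average of the two sample variances, weighted by df.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, n_a + n_b - 2

# Hypothetical scores of students taught by two different teachers.
section_a = [70, 75, 80, 85, 90]
section_b = [60, 65, 70, 75, 80]

t_stat, df = independent_t(section_a, section_b)
```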
Analysis of variance/ANOVA
ANOVA is a technique from statistical inference that allows us to deal with several populations.
It is a hypothesis test that compares the means of more than two populations.
The ANOVA test assumes three things:
◦ Normality: each sample must be drawn from a normally distributed population
◦ Independence: the observations must be independent in each sample
◦ Homogeneity: the variances of the groups should be approximately equal.
These assumptions can be tested using statistical
software.
◦ The assumption of homogeneity of variance can be tested
using tests such as Levene’s test or the Brown-Forsythe Test.
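The comparison of group means that ANOVA performs can be sketched as a one-way F statistic: the variance between the group means divided by the variance within the groups. The three groups below are hypothetical, and the helper name `one_way_anova_f` is introduced here only for illustration.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(group) * (mean(group) - grand_mean) ** 2 for group in groups
    )
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum(
        (v - mean(group)) ** 2 for group in groups for v in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups, invented for illustration only.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

f_stat, df_between, df_within = one_way_anova_f(groups)
```

A large F relative to the critical value for (`df_between`, `df_within`) degrees of freedom indicates that at least one group mean differs from the others.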