Module 3 - Lesson 3.2 Quantitative Data Analysis
Lesson 3.2
Quantitative Data Analysis
ELVIRA E. ONGY
Department of Business and Management
Creating a Data Set
• The use of rows and columns on a spreadsheet is the
most common technique.
• A row is given to each record or case and each column is given to a variable, so that each cell contains the data for that case/variable.
• The data might be in the form of integers (whole
numbers), real numbers (numbers with decimal points) or
categories (nominal units e.g. gender, of which ‘male’ and
‘female’ are the elements).
• Missing data also need to be indicated, distinguishing
between genuine missing data and a ‘don’t know’
response.
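A minimal sketch of such a layout, assuming Python with pandas (the column names and values here are purely illustrative):

import numpy as np
import pandas as pd

# One row per case, one column per variable; NaN marks genuinely missing data,
# while "don't know" is kept as a category in its own right.
data = pd.DataFrame({
    "case_id": [1, 2, 3, 4],                                  # integer identifiers
    "age":     [34.0, 27.5, np.nan, 45.0],                    # real numbers, one missing
    "gender":  ["male", "female", "female", "male"],          # nominal categories
    "opinion": ["agree", "don't know", "disagree", np.nan],   # "don't know" vs missing
})
print(data)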
Data Spreadsheet
• It is easy to make mistakes
in the rather tedious
process of transferring
data.
• It is therefore important
to check on the accuracy
of the data entry.
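A rough sketch of such a check, again assuming pandas; the small illustrative data set below deliberately contains a typing error and a misspelled category.

import numpy as np
import pandas as pd

data = pd.DataFrame({
    "age":    [34.0, 27.5, np.nan, 145.0],           # one value mistyped as 145
    "gender": ["male", "female", "femal", "male"],    # one misspelled category
})

print(data.describe())                                # min/max expose out-of-range numbers
print(data["gender"].value_counts(dropna=False))      # misspelled categories show up here
suspect = data[(data["age"] < 0) | (data["age"] > 120)]   # flag implausible ages
print(suspect)                                        # rows to re-check against the originals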
Parametric & Non-Parametric Tests
• The two major classes of statistics are:
Parametric and Non-parametric statistics
• A parameter of a population (i.e. the things or
people you are surveying) is a constant feature
that it shares with other populations.
• The most common one is the ‘bell’ or ‘Gaussian’
curve of normal frequency distribution (see
figure).
Parametric & Non-Parametric Tests
• This parameter reveals that most populations display
a large number of more or less ‘average’ cases with
extreme cases tailing off at each end.
• Although the shape of this curve varies from case to
case (e.g. flatter or steeper, lopsided to the left or
right) this feature is so common amongst
populations that statisticians take it as a constant – a
basic parameter.
• Calculations of parametric statistics are based on
this feature.
• Not all data are parametric, i.e. populations sometimes do not behave in the form of a Gaussian curve (see the normality check sketched below).
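One quick way of judging whether data are roughly parametric is a normality test; a minimal sketch, assuming SciPy, using the Shapiro-Wilk test on illustrative data:

import numpy as np
from scipy import stats

values = np.random.default_rng(0).normal(loc=50, scale=10, size=200)  # illustrative sample
stat, p = stats.shapiro(values)
print(f"Shapiro-Wilk statistic = {stat:.3f}, p = {p:.3f}")
# A small p-value (e.g. below 0.05) suggests the data depart from a normal
# distribution, in which case non-parametric methods may be more appropriate.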
Parametric & Non-Parametric Tests
• Data measured by nominal and ordinal methods will not
be organized in a curve form.
• Nominal data tend to be in groups (e.g. this is a cow or a sheep or a goat).
• Ordinal data can be displayed in the form of a set of steps
(e.g. the first, second and third positions on a winners’
podium).
• For those cases where this parameter is absent, non-parametric statistics may be applicable.
Parametric & Non-Parametric Tests
• Non-parametric statistical tests have been
devised to recognize the particular
characteristics of non-curve data and to
take into account these singular
characteristics by specialized methods.
• In general, these types of test are less sensitive and less powerful than parametric tests; they need larger samples in order to generate the same level of significance (see the comparison sketched below).
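A sketch of this contrast, assuming SciPy: the same two illustrative samples are compared with a parametric t-test and a common non-parametric counterpart, the Mann-Whitney U test.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(50, 10, 30)     # illustrative measurements, group A
group_b = rng.normal(56, 10, 30)     # illustrative measurements, group B

t_stat, t_p = stats.ttest_ind(group_a, group_b)      # parametric: independent-samples t-test
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)   # non-parametric: rank-based comparison
print(f"t-test p = {t_p:.3f}, Mann-Whitney U p = {u_p:.3f}")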
Parametric Statistical Tests
• There are two classes of parametric
statistical tests: descriptive and inferential.
• Descriptive tests will reveal the ‘shape’ of
the data in the sense of how the values of a
variable are distributed.
• Inferential tests will suggest (i.e. infer)
results from a sample in relation to a
population.
Parametric Statistical Tests
• Univariate analysis – analyses the qualities of
one variable at a time. Only descriptive tests
can be used in this type of analysis.
• Bivariate analysis – considers the properties
of two variables in relation to each other.
Inferences can be drawn from this type of
analysis.
• Multivariate analysis – looks at the
relationships between more than two
variables. Again, inferences can be drawn
from results.
Univariate Analysis
• Univariate analysis – a range of properties of
one variable can be examined using the
following measures:
1. FREQUENCY DISTRIBUTION
2. MEASURE OF CENTRAL TENDENCY
3. MEASURES OF DISPERSION (OR VARIABILITY)
Univariate Analysis – Frequency Distribution
Usually presented as a table, a frequency distribution simply shows how often each value of a variable occurs, expressed as a count and as a percentage of the total number of cases (see figure).
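A minimal frequency distribution table, assuming pandas (the variable and its values are illustrative):

import pandas as pd

gender = pd.Series(["male", "female", "female", "male", "female"])
counts = gender.value_counts()                           # number of cases per value
percent = gender.value_counts(normalize=True) * 100      # share of the total cases
print(pd.DataFrame({"count": counts, "percent": percent.round(1)}))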
Univariate Analysis – Measure of Central Tendency
• A measure of central tendency is a single number that denotes the ‘average’ of the values for a variable.
• There are several measures that can be used, such as the arithmetic mean (the average), the median (the middle value when the data are arranged in order) and the mode (the most frequently occurring value).
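The three measures computed side by side, assuming pandas and a small illustrative set of values:

import pandas as pd

values = pd.Series([2, 3, 3, 4, 5, 5, 5, 7, 9])
print("mean:  ", values.mean())             # arithmetic average
print("median:", values.median())           # middle value of the ordered data
print("mode:  ", values.mode().tolist())    # most frequently occurring value(s)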
Univariate Analysis – Measure of Central Tendency
• Normal distribution is
when the mean, median
and mode are located at
the same value.
• This produces a
symmetrical curve.
Univariate Analysis – Measure of Central Tendency
• Skewness occurs when the
mean is pushed to one side of
the median.
• If there are two modes, one to each side of the mean and median, then it is a bimodal distribution.
• The curve will have two peaks
and a valley between.
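Skewness can also be quantified directly; a brief sketch, assuming SciPy, on an illustrative right-skewed sample:

import numpy as np
from scipy import stats

values = np.random.default_rng(2).exponential(scale=2.0, size=500)  # right-skewed data
print("skewness:", round(stats.skew(values), 3))
# Values near 0 indicate a roughly symmetrical distribution; positive values a
# right skew (longer upper tail), negative values a left skew.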
Univariate Analysis – Measures of Dispersion (variability)
• Computer programs provide a choice of display options to illustrate these measures.
• The most basic is a summary table of descriptive statistics which gives figures for all of the measures (a minimal sketch follows below).
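Such a summary table can be produced in one call; a minimal sketch, assuming pandas, with rows as cases and columns as variables (the data are illustrative):

import pandas as pd

df = pd.DataFrame({
    "age":    [34, 27, 19, 45, 52],
    "income": [210, 180, 90, 300, 260],
})
print(df.describe())   # count, mean, standard deviation, min, quartiles and max per column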
Bivariate Analysis
• Bivariate analysis – considers the properties of
two variables in relation to each other.
• The relationship between two variables is of
common interest in the social sciences, e.g.
• Does social status influence academic achievement?
• Are boys more likely to be delinquents than girls?
• Does age have an effect on community involvement?
There are various methods for investigating the
relationships between two variables.
Bivariate Analysis
• An important aspect is how these relationships are measured, for example by assessing the direction and degree of association, expressed statistically as correlation coefficients.
• The commonly used coefficients assume
that there is a linear relationship
between the two variables, either
positive or negative.
• In reality, a perfectly linear relationship is seldom found, but the degree of correlation can be computed to show how near the relationship is to a straight line.
Bivariate Analysis
• A positive relationship is one where more of
one variable is related to more of another, or
less of one is related to less of another.
• For example, more income relates to more political
power, or less income relates to less political
power.
• A negative relationship is one where more of
one variable is related to less of another or the
other way round.
• For example, more income relates to less anxiety,
less income relates to more anxiety. Note that
detecting a relationship does not mean you have
found an influence or a cause, although it may be
that this is the case.
• Again, graphical displays help to show the
analysis.
Bivariate Analysis
• Scattergrams are a useful type of diagram that graphically
shows the relationship between two variables by plotting
variable data from cases on a two-dimensional matrix.
• If the resulting plotted points appear in a scattered and random arrangement, then no association is indicated.
• If however they fall into a linear arrangement, a
relationship can be assumed, either positive or negative.
• The closer the points are to a perfect line, the stronger the
association.
• A line that is drawn to trace this notional line is called the
line of best fit or regression line. This line can be used to
predict one variable value on the basis of the other (see
figure).
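A minimal scattergram with a line of best fit, assuming numpy and matplotlib (the plotted data are generated purely for illustration):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + rng.normal(0, 2, 50)        # a roughly linear, positive relationship

slope, intercept = np.polyfit(x, y, 1)    # least-squares estimate of the regression line
plt.scatter(x, y, label="cases")
plt.plot(x, slope * x + intercept, color="red", label="line of best fit")
plt.xlabel("variable X")
plt.ylabel("variable Y")
plt.legend()
plt.show()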
Bivariate Analysis
• Cross tabulations (contingency tables) are a simple way to display the relationship between variables that have only a few categories.
• They show the relationship between the categories of the two variables, both as numbers of responses and as percentages.
• In addition, the column and row totals and percentages are shown (see
Figure).
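A simple cross tabulation with totals and percentages, assuming pandas (the two variables and their values are illustrative):

import pandas as pd

df = pd.DataFrame({
    "gender":  ["male", "female", "female", "male", "female", "male"],
    "opinion": ["agree", "agree", "disagree", "disagree", "agree", "agree"],
})
counts = pd.crosstab(df["gender"], df["opinion"], margins=True)            # counts with totals
percent = pd.crosstab(df["gender"], df["opinion"], normalize="all") * 100  # share of all cases
print(counts)
print(percent.round(1))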
Bivariate Analysis
• The choice of appropriate statistical tests that do bivariate
analysis depends on the levels of measurement used in the
variables.
• These tests are called by exotic names, e.g.
• Pearson’s correlation coefficient (r), used for examining
relationships between interval/ratio variables,
• Spearman’s rho (ρ), which is used when both variables are ordinal, or when one is ordinal and the other is interval/ratio.
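Both coefficients can be computed with SciPy; a sketch on illustrative data:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 40)
y = 3.0 * x + rng.normal(0, 4, 40)      # two related interval/ratio variables

r, r_p = stats.pearsonr(x, y)           # Pearson's r (interval/ratio)
rho, rho_p = stats.spearmanr(x, y)      # Spearman's rho (rank-based)
print(f"Pearson r = {r:.2f} (p = {r_p:.3f}); Spearman rho = {rho:.2f} (p = {rho_p:.3f})")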
Bivariate Analysis – Statistical Significance
Are the results simply occasioned by chance or are they truly
representative, i.e. are they statistically significant?
• To estimate the likelihood that the results are relevant to the
population as a whole one has to use statistical inference.
• The most common statistical tool for this is known as the chi-
square test.
• This measures the degree of association or linkage between two variables by comparing the differences between the observed values and the values expected if no association were present, i.e. those that would result from pure chance.
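A sketch of the chi-square test on a small contingency table of observed counts, assuming SciPy (the counts are invented for illustration):

from scipy.stats import chi2_contingency

observed = [[30, 10],    # e.g. rows = gender, columns = agree / disagree
            [20, 25]]
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}, degrees of freedom = {dof}")
# A small p-value suggests the observed counts differ from those that pure
# chance (no association) would produce.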
Bivariate Analysis – Analysis of Variance
• The previously mentioned tests are all designed to look for relationships between variables.
• Another common requirement is to look for differences
between values obtained under two or more different
conditions, e.g.
• a group before and after a training course
• three groups after different training courses
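Sketches of both situations, assuming SciPy and invented scores: a paired t-test for one group measured before and after a course, and a one-way analysis of variance for three groups that followed different courses.

from scipy import stats

# One group, before and after a training course (related samples).
before = [62, 70, 58, 65, 73]
after  = [68, 75, 61, 72, 80]
print(stats.ttest_rel(before, after))

# Three groups after different training courses (independent samples).
course_a = [68, 75, 61, 72, 80]
course_b = [70, 66, 74, 69, 71]
course_c = [59, 63, 60, 65, 58]
print(stats.f_oneway(course_a, course_b, course_c))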
Walliman, Nicholas (2011). Research Methods: The Basics. Routledge, Taylor & Francis Group.