Module 3 - Lesson 3.2 Quantitative Data Analysis

This document provides an overview of quantitative data analysis techniques. It discusses creating data sets using rows and columns in spreadsheets, with each row representing a record and each column representing a variable. It describes univariate analysis to analyze one variable at a time through measures of central tendency, frequency distributions, and dispersion. Bivariate analysis considers the relationship between two variables through correlation coefficients, scatterplots, cross-tabulation, and tests of statistical significance. Parametric and non-parametric statistical tests are also introduced.


Module 3

Lesson 3.2
Quantitative Data Analysis
ELVIRA E. ONGY
Department of Business and Management
Creating a Data Set
• The use of rows and columns on a spreadsheet is the
most common technique.
• A row is given to each record or case and each column is
given to a variable, allowing each cell to contain the data
for the case/variable.
• The data might be in the form of integers (whole
numbers), real numbers (numbers with decimal points) or
categories (nominal units e.g. gender, of which ‘male’ and
‘female’ are the elements).
• Missing data also need to be indicated, distinguishing
between genuine missing data and a ‘don’t know’
response.
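The row-and-column layout above, including the distinction between genuinely missing data and a "don't know" response, can be sketched as follows. This is a minimal illustration, not part of the original lesson: the variable names and the convention of coding missing values as None and "don't know" as the string "DK" are assumptions.

```python
# A minimal sketch of a data set: each inner list is a row (one
# record/case); each position is a column (one variable).
# Assumed coding: None = genuinely missing, "DK" = "don't know".

columns = ["id", "age", "gender", "satisfaction"]
rows = [
    [1, 34, "female", 4],    # complete record
    [2, 41, "male", "DK"],   # respondent answered "don't know"
    [3, None, "female", 5],  # age genuinely missing
]

# Count the two kinds of non-response per column, keeping them distinct.
for i, name in enumerate(columns):
    missing = sum(1 for r in rows if r[i] is None)
    dont_know = sum(1 for r in rows if r[i] == "DK")
    print(f"{name}: missing={missing}, don't know={dont_know}")
```

Distinguishing the two codes at entry time makes the later accuracy check on the spreadsheet easier, since a blank cell is never ambiguous.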
Data Spreadsheet
• It is easy to make mistakes
in the rather tedious
process of transferring
data.
• It is therefore important
to check on the accuracy
of the data entry.
Parametric & Non-Parametric Tests
• The two major classes of statistics are:
Parametric and Non-parametric statistics
• A parameter of a population (i.e. the things or
people you are surveying) is a constant feature
that it shares with other populations.
• The most common one is the ‘bell’ or ‘Gaussian’
curve of normal frequency distribution (see
figure).
Parametric & Non-Parametric Tests
• This parameter reveals that most populations display
a large number of more or less ‘average’ cases with
extreme cases tailing off at each end.
• Although the shape of this curve varies from case to
case (e.g. flatter or steeper, lopsided to the left or
right) this feature is so common amongst
populations that statisticians take it as a constant – a
basic parameter.
• Calculations of parametric statistics are based on
this feature.
• Not all data are parametric, i.e. populations
sometimes do not behave in the form of a Gaussian
curve
Parametric & Non-Parametric Tests
• Data measured by nominal and ordinal methods will not
be organized in a curve form.
• Nominal data tend to be in groups (e.g. this is a cow or a
sheep or goat)
• Ordinal data can be displayed in the form of a set of steps
(e.g. the first, second and third positions on a winners’
podium).
• For those cases where this parameter is absent,
non-parametric statistics may be applicable.
Parametric & Non-Parametric Tests
• Non-parametric statistical tests have been
devised to recognize the particular
characteristics of non-curve data and to
take into account these singular
characteristics by specialized methods.
• In general, these types of test are less
sensitive and powerful than parametric
tests; they need larger samples in order to
generate the same level of significance
Parametric Statistical Tests
• There are two classes of parametric
statistical tests: descriptive and inferential.
• Descriptive tests will reveal the ‘shape’ of
the data in the sense of how the values of a
variable are distributed.
• Inferential tests will suggest (i.e. infer)
results from a sample in relation to a
population.
Parametric Statistical Tests
• Univariate analysis – analyses the qualities of
one variable at a time. Only descriptive tests
can be used in this type of analysis.
• Bivariate analysis – considers the properties
of two variables in relation to each other.
Inferences can be drawn from this type of
analysis.
• Multivariate analysis – looks at the
relationships between more than two
variables. Again, inferences can be drawn
from results.
Univariate Analysis
• Univariate analysis – a range of properties of
one variable can be examined using the
following measures:

1. FREQUENCY DISTRIBUTION
2. MEASURE OF CENTRAL TENDENCY
3. MEASURES OF DISPERSION (OR VARIABILITY)
Univariate Analysis – Frequency Distribution
Usually presented as a
table, a frequency
distribution simply shows
the values for each variable
expressed as a number and
as a percentage of the total
of cases (see figure)
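A frequency table of this kind can be produced with a few lines of code; the responses below are illustrative, not from the lesson's figure.

```python
# Frequency distribution for one nominal variable, shown as counts
# and as percentages of the total number of cases.
from collections import Counter

responses = ["yes", "no", "yes", "yes", "don't know", "no", "yes", "no"]

counts = Counter(responses)
total = len(responses)
for value, n in counts.most_common():
    print(f"{value:12s} {n:3d}  {100 * n / total:5.1f}%")
```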
Univariate Analysis – Measure of Central Tendency
• Central tendency is one number that
denotes various ‘averages’ of the
values for a variable.
• There are several measures that can be
used, such as the arithmetic mean
(the average), the median (the middle
value when the data are arranged in
order) and the mode (the most
frequently occurring value).
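All three measures of central tendency are available in the Python standard library; the scores below are illustrative.

```python
# Mean, median, and mode for one variable, using the standard-library
# statistics module.
import statistics

scores = [3, 5, 5, 6, 7, 8, 15]

print("mean  :", statistics.mean(scores))    # arithmetic average -> 7
print("median:", statistics.median(scores))  # middle value of the ordered data -> 6
print("mode  :", statistics.mode(scores))    # most frequently occurring value -> 5
```

Note that the mean (7) is pulled above the median (6) by the extreme value 15, a small example of the skewness discussed on the next slides.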
Univariate Analysis – Measure of Central Tendency

• Normal distribution is
when the mean, median
and mode are located at
the same value.
• This produces a
symmetrical curve.
Univariate Analysis – Measure of Central Tendency
• Skewness occurs when the
mean is pushed to one side of
the median.
• If there are two modes to each
side of the mean and median
points, then it is a bimodal
distribution.
• The curve will have two peaks
and a valley between.
Univariate Analysis – Measures of Dispersion
(variability)
• Computer programs provide a choice of
display options to illustrate these measures.
• The most basic is a summary table of descriptive
statistics which gives figures for all of the
measures.
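The summary table of descriptive statistics mentioned above can be sketched in code; the scores are illustrative, and the sample (n − 1) formulas are assumed, as most statistics packages report these by default.

```python
# A summary of dispersion (variability) measures: range, variance,
# and standard deviation for a single variable.
import statistics

scores = [3, 5, 5, 6, 7, 8, 15]

data_range = max(scores) - min(scores)   # spread between extremes
variance = statistics.variance(scores)   # sample variance (n - 1 divisor)
stdev = statistics.stdev(scores)         # sample standard deviation

print(f"range={data_range}, variance={variance:.2f}, stdev={stdev:.2f}")
```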
Bivariate Analysis
• Bivariate analysis – considers the properties of
two variables in relation to each other.
• The relationship between two variables is of
common interest in the social sciences, e.g.
• Does social status influence academic achievement?
• Are boys more likely to be delinquents than girls?
• Does age have an effect on community involvement?
There are various methods for investigating the
relationships between two variables.
Bivariate Analysis
• An important aspect is the different
measurement of these relationships,
such as assessing the direction and
degree of association, statistically
termed correlation coefficients
• The commonly used coefficients assume
that there is a linear relationship
between the two variables, either
positive or negative.
• In reality, a perfect linear relationship is
seldom found, but the degree of correlation
(how near to a straight line the relationship
is) can be computed.
Bivariate Analysis
• A positive relationship is one where more of
one variable is related to more of another, or
less of one is related to less of another.
• For example, more income relates to more political
power, or less income relates to less political
power.
• A negative relationship is one where more of
one variable is related to less of another or the
other way round.
• For example, more income relates to less anxiety,
less income relates to more anxiety. Note that
detecting a relationship does not mean you have
found an influence or a cause, although it may be
that this is the case.
• Again, graphical displays help to show the
analysis.
Bivariate Analysis
• Scattergrams are a useful type of diagram that graphically
shows the relationship between two variables by plotting
variable data from cases on a two-dimensional matrix.
• If the resulting plotted points appear in a scattered and
random arrangement, then no association is indicated.
• If however they fall into a linear arrangement, a
relationship can be assumed, either positive or negative.
• The closer the points are to a perfect line, the stronger the
association.
• A line that is drawn to trace this notional line is called the
line of best fit or regression line. This line can be used to
predict one variable value on the basis of the other (see
figure).
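The line of best fit and its use for prediction can be sketched with the least-squares formulas; the x/y values below are illustrative.

```python
# Least-squares line of best fit (regression line) for two variables,
# then a prediction of y from a new x value.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# slope = covariance(x, y) / variance(x); the intercept places the
# line through the point of means.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"line of best fit: y = {intercept:.2f} + {slope:.2f}x")
print("predicted y at x = 6:", round(intercept + slope * 6, 2))
```

The closer the plotted points sit to this line, the more reliable such predictions are, which is exactly the "strength of association" idea above.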
Bivariate Analysis
• Cross tabulation (contingency tables) is a simple way to display the
relationship between variables that have only a few categories.
• They show the relationships between each of the categories of the variables
in both number of responses and percentages.
• In addition, the column and row totals and percentages are shown (see
Figure).
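A small cross tabulation can be built directly from a list of cases; the gender/answer data below are illustrative, with row totals shown as in the lesson's figure.

```python
# Cross tabulation (contingency table) for two nominal variables:
# count the cases in each cell, then print counts with row totals.
from collections import Counter

cases = [
    ("male", "yes"), ("male", "no"), ("female", "yes"),
    ("female", "yes"), ("male", "no"), ("female", "no"),
    ("female", "yes"), ("male", "yes"),
]

table = Counter(cases)                 # (row category, column category) -> count
row_cats = sorted({g for g, _ in cases})
col_cats = sorted({a for _, a in cases})

print("          " + "  ".join(f"{c:>5s}" for c in col_cats) + "  total")
for r in row_cats:
    row_counts = [table[(r, c)] for c in col_cats]
    print(f"{r:10s}" + "  ".join(f"{n:5d}" for n in row_counts)
          + f"  {sum(row_counts):5d}")
```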
Bivariate Analysis
• The choice of appropriate statistical tests that do bivariate
analysis depends on the levels of measurement used in the
variables.
• These tests are called by exotic names, e.g.
• Pearson’s correlation coefficient (r), used for examining
relationships between interval/ratio variables,
• Spearman’s rho (ρ), which is used when both variables are
ordinal, or when one is ordinal and the other is
interval/ratio.
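Spearman's rho can be sketched as "rank each variable, then apply the Pearson formula to the ranks"; the two judges' orderings below are illustrative and contain no tied ranks (ties need an averaged-rank adjustment not shown here).

```python
# Spearman's rho for two ordinal variables: convert each variable to
# ranks, then compute Pearson's correlation on the ranks.
def ranks(values):
    # rank 1 = smallest value (assumes no ties in this illustration)
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

judge_a = [1, 2, 3, 4, 5]   # one judge's ordering of five entries
judge_b = [2, 1, 4, 3, 5]   # a second, slightly different ordering
rho = pearson(ranks(judge_a), ranks(judge_b))
print("Spearman's rho:", round(rho, 2))
```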
Bivariate Analysis – Statistical Significance
Are the results simply occasioned by chance or are they truly
representative, i.e. are they statistically significant?
• To estimate the likelihood that the results are relevant to the
population as a whole one has to use statistical inference.
• The most common statistical tool for this is known as the
chi-square test.
• This measures the degree of association or linkage between two
variables by comparing the differences between the observed
values and the expected values if no association were present,
i.e. those that would result from pure chance.
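The observed-versus-expected comparison can be sketched for a 2×2 table; the counts below are illustrative. (Judging significance would additionally require comparing the statistic against a chi-square distribution with the appropriate degrees of freedom, which is not shown.)

```python
# Chi-square statistic for a contingency table: for each cell,
# compare the observed count with the count expected if the two
# variables were unrelated, i.e. expected = row total * column
# total / grand total.
observed = [
    [30, 10],   # e.g. group A: yes / no
    [20, 40],   # e.g. group B: yes / no
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

print("chi-square:", round(chi_square, 2))
```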
Bivariate Analysis – Analysis of Variance
• The previous mentioned tests are all designed to look for
relationships between variables.
• Another common requirement is to look for differences
between values obtained under two or more different
conditions, e.g.
• a group before and after a training course
• three groups after different training courses

There are a range of tests that can be applied to discern the


variance depending on the number of groups.
Bivariate Analysis – Analysis of Variance
• For a single group, say
the performance of
students on a particular
course compared with the
mean results of all the other
courses in the university,
you can use the chi-square
test or the one-group t-test.
Bivariate Analysis – Analysis of Variance
• For two groups, e.g. comparing the
results from the same course at two
different universities, you can use the
two-group t-test, which compares the
means of two groups. There are two
types of test:
• one for paired scores, i.e. where the
same persons provided scores under
each condition;
• one for unpaired scores, where this is
not the case.
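The unpaired two-group comparison can be sketched with the pooled-variance t formula; the two sets of scores are illustrative, and the pooled-variance (equal-variances-assumed) version of the test is an assumption here.

```python
# Unpaired (independent-samples) t-test: compare the means of two
# groups using the pooled-variance formula.
import statistics

group_a = [82, 75, 90, 68, 85]
group_b = [70, 65, 72, 60, 74]

na, nb = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)

# Pooled variance combines the two sample variances, weighted by
# their degrees of freedom.
pooled_var = ((na - 1) * statistics.variance(group_a)
              + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
t = (mean_a - mean_b) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

print(f"t = {t:.2f} with {na + nb - 2} degrees of freedom")
```

For paired scores, one would instead take the per-person differences and test whether their mean differs from zero.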
Bivariate Analysis – Analysis of Variance
• For three or more
groups, e.g. the performance
of three different age groups in
a test, it is necessary to
identify the dependent and
independent variables that will
be tested. A simple test using
SPSS is ANOVA (analysis of
variance).
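The F statistic that an ANOVA table reports can be sketched from first principles; the three groups of scores below are illustrative.

```python
# One-way ANOVA F statistic for three groups: ratio of between-group
# variance to within-group variance.
groups = [
    [23, 25, 21, 27],   # e.g. youngest age group
    [30, 28, 32, 26],   # e.g. middle age group
    [35, 33, 38, 34],   # e.g. oldest age group
]

all_scores = [s for g in groups for s in g]
grand_mean = sum(all_scores) / len(all_scores)
k = len(groups)       # number of groups
n = len(all_scores)   # total number of cases

# Between-group sum of squares: how far each group mean sits from the
# grand mean; within-group: spread of scores around their own mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((s - sum(g) / len(g)) ** 2 for g in groups for s in g)

f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f"F({k - 1}, {n - k}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the spread within groups would explain, which is what the test then checks for significance.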
The results of the various statistical tests
described previously are expressed as
numbers which indicate the characteristic
and strength of the associations.
Multivariate Analysis
• Multivariate analysis looks at the relationships
between more than two variables.
• Types:
• ELABORATION ANALYSIS
• MULTIPLE REGRESSION
• LOGISTIC REGRESSION
Multivariate Analysis – Elaboration Analysis
• This tests the effect of a third variable in the relationship
between two variables,
• for example, the effect of gender on the income and level of education
of a group of people. This uses a simple tabular comparison by
generating two tables and comparing them.
• You can continue the process of producing tables for fourth and fifth
variables, but this quickly becomes unwieldy, and it is difficult to
gather enough data in each table to achieve significant results.
• There are better ways to understand the interactions between
large numbers of variables and the relative strength of their
influence using regression techniques: multiple regression and
logistic regression.
Multivariate Analysis – Multiple Regression
• This is a technique used to measure the effects of two or more
independent variables on a single dependent variable measured
on interval or ratio scales,
• e.g. the effect on income due to age, education, ethnicity, area
of living, and gender.
Multivariate Analysis – Logistic Regression
• This method is a development of multiple regression, which has the added
advantage of holding certain variables constant in order to assess the
independent influence of key variables of interest.
• It is suitable for assessing the influence of independent variables on
dependent variables measured on a nominal scale
• (e.g. whether a candidate’s decision to accept a job was determined by a range of
considerations such as amount of income, future promotion prospects, level of
enjoyment of corporate life, amount of interest in the duties etc.).
Non-parametric Statistical Tests
• Statistical tests built around
discovering the means,
standard deviations etc. of
the typical characteristics of
a Gaussian curve are clearly
inappropriate for analyzing
non-parametric data that
do not follow this pattern.
Non-parametric Statistical Tests
• Non-parametric statistical
tests are used when:
• the sample size is very small;
• few assumptions can be made
about the data;
• data are rank ordered or
nominal;
• samples are taken from several
different populations
Non-parametric Statistical Tests
Some of the tests that you may encounter are:
1. Kolmogorov-Smirnov (used to test a two-sample case with
independent samples, the values of which are ordinal),
2. Kruskal-Wallis (an equivalent of the analysis of variance on
independent samples, with variables measured on the ordinal
scale),
3. Cramer coefficient (gives measures of association of variables
with nominal categories) and
4. Spearman and Kendall (which provide a range of tests to
measure association, such as rank order correlation
coefficient, coefficient of concordance and agreement, for
variables measured at the ordinal or interval levels).
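As one worked non-parametric example, the Kruskal-Wallis test from the list above can be sketched: pool all the scores, rank them, and compare the groups' rank sums. The three samples are illustrative and contain no tied scores (ties would need averaged ranks and a correction, not shown here).

```python
# Kruskal-Wallis H statistic for independent samples measured on an
# ordinal scale: H = (12 / (N(N+1))) * sum(R_i^2 / n_i) - 3(N+1),
# where R_i is the rank sum of group i over the pooled ranking.
groups = [
    [12, 15, 18],
    [20, 25, 14],
    [30, 28, 22],
]

pooled = sorted(s for g in groups for s in g)
rank_of = {score: i + 1 for i, score in enumerate(pooled)}  # rank 1 = smallest
n = len(pooled)

h = (12 / (n * (n + 1))) * sum(
    sum(rank_of[s] for s in g) ** 2 / len(g) for g in groups
) - 3 * (n + 1)
print("H =", round(h, 2))
```

Because it works on ranks rather than raw values, the test makes no assumption of a Gaussian curve, which is exactly why it belongs in this non-parametric family.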
REFERENCE:

Walliman, Nicholas (2011). Research Methods: The Basics. Routledge, Taylor & Francis Group.
