
Analysis of Data and Interpretation of the Results of Statistical Computations



By: REYNALDO H. DALAYAP, JR., Ph.D.


Statistics has two major areas

1. Descriptive Statistics
• Gives numerical and graphic procedures to
summarize a collection of data in a clear
and understandable way.
2. Inferential Statistics
• Provides procedures to draw inferences
about a population from a sample
The statements of the problem are arranged
according to the major areas of statistics
to be used.

Problems that need descriptive statistics
should be written first, followed by problems
that require inferential statistics for analysis.
Example:
This study seeks to answer the following
questions:
1. What is the level of (employee’s
performance)?
2. What is the extent of (implementation of
the program)?
3. Is there a significant relationship
between (employee’s performance and
trainings attended)?
4. Is there a significant difference
between (employee’s performance when
classified according to gender)?
Which problems need a corresponding
hypothesis?
Statements of the problem that require
inferential statistics have corresponding
hypotheses.
These are problems stated as:
a) Is there a significant relationship
between (employee’s performance and
trainings attended)?
b) Is there a significant difference between
(employee’s performance when classified
according to gender)?
The SKSU thesis style uses the null hypothesis (Ho).
Example:
The null hypotheses for the preceding statements
of the problem are the following.
a) There is no significant relationship
between (employee’s performance and
trainings attended).
b) There is no significant difference
between (employee’s performance when
classified according to gender).
Typically, two general types of statistics are
used to describe data (under descriptive statistics):
• Measures of central tendency: these are ways of
describing the central position of a frequency distribution
for a group of data. We can describe this central
position using a number of statistics, including the
mode, median, and mean.
• Measures of spread: these are ways of summarizing
a group of data by describing how spread out the
scores are. To describe this spread, a number of
statistics are available to us, including the range,
quartiles, absolute deviation, variance and standard
deviation.
• The presentation of a mean requires a
researcher to indicate standard deviation.
• Reporting a mean might be insufficient
information because mean is sometimes
affected by extreme scores.
• How small is Standard Deviation for us to
say that a set of scores is homogeneous?
(not widely scattered)
• How large is Standard Deviation for us to
say that a set of scores is not
homogeneous? (heterogeneous or widely
scattered)
Coefficient of Variation (CV)

CV = (SD / mean) × 100%

If CV is less than or equal to 20%, the scores are homogeneous.
If CV is greater than 20%, the scores are scattered (heterogeneous).
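As a rough illustration of the CV rule, the sketch below computes CV with the sample standard deviation and applies the 20% cutoff. The function names and the scores are illustrative assumptions, not part of the original slides.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Return CV = (SD / mean) * 100, using the sample standard deviation."""
    return stdev(scores) / mean(scores) * 100


def describe_spread(scores, cutoff=20.0):
    """Apply the slide's rule: CV <= 20% is homogeneous, otherwise scattered."""
    return "homogeneous" if coefficient_of_variation(scores) <= cutoff else "scattered"


grades = [79, 83, 85, 88, 78, 95, 87, 83]  # illustrative scores
cv = coefficient_of_variation(grades)
print(f"CV = {cv:.2f}% -> {describe_spread(grades)}")  # CV = 6.39% -> homogeneous
```

A CV this small suggests the mean is a fair summary of the scores. A set such as 1, 2, 30 would be flagged as scattered.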


• Inferential statistics are used to draw
conclusions about a population by examining
a sample.
A population is the totality of all subjects
possessing certain common characteristics
that are being studied.
A sample is a subset of the population.
Sample size from a population
To find the sample size, use the formula
below.
Slovin’s formula
n = N / (1 + N e^2), where n = sample size
N = total population
e = error tolerance
Example: When the population is a group of 3000
employees and e = 0.05, find n.
n = 3000 / [1 + 3000(0.05)^2]
• n = 352.94, so n = 353 employees belong to the
sample
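The computation above can be sketched in a few lines. The function name is an assumption for illustration. Rounding up is used so that the error tolerance is not exceeded.

```python
import math


def slovin_sample_size(population, error_tolerance):
    """Slovin's formula: n = N / (1 + N * e**2).

    Returns the raw value and the value rounded up to a whole subject.
    """
    n = population / (1 + population * error_tolerance ** 2)
    return n, math.ceil(n)


raw_n, sample_size = slovin_sample_size(3000, 0.05)
print(f"n = {raw_n:.2f}, use {sample_size}")  # n = 352.94, use 353
```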
Sampling Techniques
Simple Random Sampling
Random samples are selected by using
chance methods or random numbers.
- The name of each member of the population
is written on a small sheet of paper.
- The papers are rolled, placed in a box, and
mixed thoroughly.
- The sample is drawn without replacement.
Stratified Sampling
- The population is divided into groups (called
strata).
- The number of samples from each stratum is
computed in proportion to the size of the stratum.
Systematic Sampling
- Each subject of the population is numbered.
- A subject is picked at random as the random start,
and then every kth subject after it is selected.
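The three techniques above can be sketched roughly as follows. All function names, the numbered employee list, and the strata sizes are hypothetical. The proportional allocation uses plain rounding, so the stratum counts may need a manual adjustment when they do not add up exactly to n.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n subjects by chance, without replacement."""
    return random.Random(seed).sample(population, n)


def systematic_sample(population, n, seed=None):
    """Pick a random start within the first interval, then every k-th subject."""
    k = len(population) // n                  # sampling interval
    start = random.Random(seed).randrange(k)  # random start: 0 .. k-1
    return [population[start + i * k] for i in range(n)]


def proportional_allocation(strata_sizes, n):
    """Share n among the strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}


employees = list(range(1, 3001))  # hypothetical numbered list of 3000 employees
srs = simple_random_sample(employees, 353, seed=1)
sys_sample = systematic_sample(employees, 353, seed=1)
# Hypothetical strata sizes, for illustration only
allocation = proportional_allocation(
    {"rank and file": 2400, "supervisors": 450, "managers": 150}, 353
)
print(allocation)  # {'rank and file': 282, 'supervisors': 53, 'managers': 18}
```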
Cluster sampling
- Intact groups called clusters are used to select
samples.
- Cluster sampling may be used when it is either
impossible or impractical to compile an
exhaustive list of the elements that make up
the target population.
• To conduct a cluster sample, the researcher
first selects groups or clusters and then from
each cluster, selects the individual subjects
either by simple random sampling or
systematic random sampling.
• Or, if the cluster is small enough, the
researcher may choose to include the entire
cluster in the final sample rather than a subset
from it.
• Multi-stage sampling represents a more
complicated form of cluster sampling in which
larger clusters are further subdivided into
smaller, more targeted groupings for the
purposes of surveying.
• Researchers used a multi-stage sampling
design to survey teachers in Region XII in
order to examine whether socio-demographic
characteristics determine teachers’ attitudes
towards adolescent sexuality education. First-
stage sampling included a simple random
sample to select 15 secondary schools in the
region. The second stage of sampling selected
10 teachers from each of these schools, who
were then administered questionnaires.
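A two-stage design like the one described above might be sketched as follows. The sampling frame (40 schools with 25 teachers each) and all names are hypothetical. Only the 15 schools and 10 teachers per school come from the example.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly select clusters. Stage 2: randomly select subjects in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical sampling frame: 40 secondary schools with 25 teachers each
schools = {
    f"School {i:02d}": [f"T{i:02d}-{j:02d}" for j in range(1, 26)]
    for i in range(1, 41)
}
sample = two_stage_sample(schools, n_clusters=15, n_per_cluster=10, seed=42)
total_teachers = sum(len(teachers) for teachers in sample.values())
print(len(sample), total_teachers)  # 15 150
```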
• Note:
Non-probability sampling should be avoided:
• volunteer samples
• haphazard (convenience) samples
Methods of collecting data
1. direct – interview
2. indirect – use of a validated questionnaire
(a twice-validated instrument is required)
3. registration – data taken from records
4. observation – carefully watching and
recording data
5. experimentation – setting up an experiment
• A good instrument is needed for research. The
instrument must be pretested to know
whether the questions flow smoothly, whether the
instructions are in the proper place, whether the
questions are suitably and clearly worded, and
whether the instrument is reliable and valid.

The attributes of an instrument that
researchers typically pay close attention to are
its validity and reliability.
• Validity
The validity of a test or instrument refers to the
degree to which it measures what it
intends to measure.
• Reliability
Consistency and accuracy together make up
the reliability of a test. Consistency means
that the instrument yields similar results on
two or more testing occasions under identical
circumstances.
• Ways to reinforce validity of an instrument
1. Item inspection
Have the initial draft of the instrument
inspected by a group of evaluators such as
thesis advisers, test construction experts, and
teachers or professionals whose
specializations are related to the subject
matter.
2. Inter-judge consistency
You may collate the data gathered from the
evaluators for analysis. If you have requested
three persons to inspect your first draft, you
will have to look for the agreement or
consistency of their judgment.
3. First trial run
4. Item analysis
Two Basic Types of Inferential Statistics
Parametric statistics – techniques which assume
that you are working with a normal
distribution and that the sample is random

Nonparametric statistics – techniques which make
few if any assumptions about the nature of the
population from which the sample is taken
Scales/levels of measurement
Nominal level
- Numbers are used to express identity
- Ex. Sex, political party, religion, race

Ordinal level
- Numbers are used to express rank
- Ex. Rank in the military, results of contests, students’
year level
Interval level
- Numbers indicate size and magnitude
- There is an arbitrary zero
- Ex. Temperature, scores in a quiz
- Researchers can form differences (addition and subtraction apply)
- Multiplication and division are generally not
meaningful because the zero is arbitrary
Ratio level
- The highest level of measurement
- It has all the characteristics of the interval level
- It has an absolute zero
- Addition, subtraction, multiplication, and division are
applicable
Alternative and Null Hypotheses
• Alternative Hypothesis, symbolized by Ha, is a
statistical hypothesis that states that there is a
difference or relationship between two
parameters.
• Null Hypothesis, symbolized by Ho, is a statistical
hypothesis that states that there is no difference
or relationship between two parameters.

• If the .05 level is achieved (p is equal to or less
than .05), then a researcher rejects Ho and
accepts Ha
• If the .05 significance level is not achieved, then
Ho is retained
Degrees of Freedom
• Degrees of freedom (df) are the way in which the
scientific tradition accounts for variation due to
error
• it specifies how many values vary within a
statistical test
• scientists recognize that collecting data can never be
error-free
• each piece of data collected can vary, or carry error that
we cannot account for
• by including df in statistical computations, scientists
help account for this error
• there are clear rules for how to calculate df for each
statistical test and it is used to determine the critical
value from the table.
Testing Hypothesis
• Type I error -- rejecting Ho when it is true (it
should have been accepted)
– its probability is equal to alpha (α)
– if α = .05, then there is a 5% chance of a Type I error

• Type II error -- accepting Ho when it should have
been rejected
– If you increase α, you will decrease the chance of a
Type II error
Level of significance (α) - the maximum probability of
committing a Type I error in hypothesis testing
α = .05 level of significance
- It means that there are about 5 chances in 100
that we would reject the hypothesis when it
should be accepted.
- We are about 95% confident that we have made
the right decision.
If the hypothesis is rejected at the α = .05 level of
significance, we could be wrong with probability
0.05.
If the hypothesis is rejected at the α = .01 level of
significance, we could be wrong with probability
0.01.
Steps Involved in Inferential Statistics
1. State Hypothesis
2. Level of Significance (α)
α = .05 level of significance (for social research)
α = .01 level of significance (for pure science)
3. Obtain Critical Value
– a criterion used based on df and alpha level (.05
or .01) is compared to the calculated value to
determine if findings are significant and therefore
reject Ho
4. Decision Rule
Based on the CALCULATED value compared to the
CRITICAL value to determine if the difference is
significant enough to reject Ho at the predetermined
level of significance
• If CALCULATED value is less than CRITICAL value ,
accept Ho
• If CALCULATED value is greater than or equal to
CRITICAL value reject Ho
• When a computer program is used to analyze the
data, reject the null hypothesis if and only if the
p-value (probability) is less than or equal to the
level of significance
5. Computing Test Value
– Use a statistical test to derive some calculated value
(e.g., t value or F value)
6. Decision: decide to reject or to accept the null hypothesis
7. Interpretation of results of the analysis
Step 1. describe the analysis
- for relationship of variables
- for the differences between variables
Step 2. State the result of computation of the
test value
Step 3. Indicate the probability to reject Ho or
critical value
Step 4. State the level of significance
Step 5. Write your claim based on the result (do
not repeat the null hypothesis)
Step 6. Provide implications for your claim
Step 7. Deepen the discussion by giving more
meaning to the result of the analysis.
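The decision rule from the steps above can be written as two small helpers, one for the critical-value approach and one for the p-value approach. This is only a sketch. The function names and the sample numbers are illustrative, and for a two-tailed test the absolute value of the calculated statistic should be passed in.

```python
def decide_by_critical_value(calculated, critical):
    """Reject Ho when the calculated value reaches or exceeds the critical value."""
    return "reject Ho" if calculated >= critical else "retain Ho"


def decide_by_p_value(p_value, alpha=0.05):
    """Reject Ho when the p-value reported by software is at most alpha."""
    return "reject Ho" if p_value <= alpha else "retain Ho"


print(decide_by_critical_value(3.60, 1.833))  # reject Ho
print(decide_by_p_value(0.18, alpha=0.05))    # retain Ho
```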
Equivalent statements
There is a significant relationship between
employee’s performance and trainings
attended.
The employee’s performance is associated
with trainings attended.
Trainings attended may likely affect
employee’s performance.
There is a possibility that as the number of
trainings attended increases, employee’s
performance increases.
Equivalent statements
There is a significant difference between
employee’s performance when classified
according to gender.
The difference between employee’s
performance when classified according to
gender is greater than expected by chance.
It is possible that male employees performed
better than female employees.
It is possible that female employees performed
better than male employees.
• t-test of Dependent Means
Tests the difference between two means from a single
group.
• Example:
• Researchers want to test a new anti-hunger
weight loss pill. They have 10 people rate their
hunger both before and after taking the pill.
Does the pill do anything? Use alpha = 0.05
• ∑d = 17, ∑d^2 = 49, n = 10, where d is the difference
between each person’s two ratings
• Using the formula
t = (∑d / n) / sqrt{ [∑d^2 − (∑d)^2 / n] / [n(n − 1)] },
t = 3.60, t-tab. = 1.833 (df = 9)
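The t value in this example can be checked from the two sums alone. The sketch below uses the usual dependent-means computation (mean difference divided by its standard error). The function name is an assumption for illustration.

```python
import math


def paired_t_from_sums(sum_d, sum_d2, n):
    """t for dependent means, computed from the sum of d and the sum of d squared."""
    mean_d = sum_d / n
    ss_d = sum_d2 - sum_d ** 2 / n    # sum of squared deviations of d
    sd_d = math.sqrt(ss_d / (n - 1))  # sample standard deviation of d
    se = sd_d / math.sqrt(n)          # standard error of the mean difference
    return mean_d / se


t = paired_t_from_sums(sum_d=17, sum_d2=49, n=10)
print(f"t = {t:.2f}, df = {10 - 1}")  # t = 3.60, df = 9
```

Since 3.60 exceeds the tabulated 1.833, Ho is rejected at the .05 level.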
Interpretation of results of the analysis

Step 1. Describe the analysis
The analysis on the comparison of mean hunger ratings
before and after taking the anti-hunger weight loss pill
resulted in
Step 2. State the result of computation of the
test value.
a computed t = 3.60,
Step 3. Indicate the critical value to reject Ho
which is greater than the critical value of 1.833
Step 4. State the level of significance
at the α = .05 level of significance.
Step 5. Write your claim based on the result (do not
repeat the null hypothesis)
There is enough evidence to support the claim that
people’s hunger before is not the same as their hunger
after taking the weight loss pill.
Step 6. Provide implications for your claim
The results imply that the weight loss pill has
something to do with people’s hunger.
Step 7. Deepen the discussion by giving more
meaning to the result of the analysis
t-test of Independent Means
It is used to compare two means from two independent
groups.
• Example:
To determine whether active participation in
extra-curricular activities is detrimental to
one’s grades, the following grade point
averages were collected:
Active: 79 83 85 88 78 95 87 83
Not active: 90 87 84 91 88 93 82 89
Assuming the populations to be normally
distributed, test at the 5% level of significance
whether active participation in school
extra-curricular activities is detrimental to
one’s grades.
• Results of the analysis on the comparison of the students’
grades using t-test.

Group            Mean    SD     t      Probability
A (active)       84.75   5.42   1.41   0.18
B (not active)   88.00   3.63

α = .05 level of significance

Null Hypothesis: Students’ active participation in
extra-curricular activities is not detrimental to their
grades.
Mean difference = 88 − 84.75 = 3.25
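The table values can be reproduced with a pooled-variance t-test, sketched below with the standard library only. The function name is illustrative. The two-tailed critical value 2.145 (df = 14, α = .05) is taken from a standard t table and is not part of the slides. The probability of 0.18 needs the t distribution, so it is not recomputed here.

```python
import math
from statistics import mean, stdev, variance


def pooled_t(group_a, group_b):
    """Independent-means t with pooled variance. Returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df


active = [79, 83, 85, 88, 78, 95, 87, 83]
not_active = [90, 87, 84, 91, 88, 93, 82, 89]

t, df = pooled_t(not_active, active)
print(f"means: {mean(active):.2f} vs {mean(not_active):.2f}")     # 84.75 vs 88.00
print(f"SDs:   {stdev(active):.2f} vs {stdev(not_active):.2f}")   # 5.42 vs 3.63
print(f"t = {t:.2f}, df = {df}")                                  # t = 1.41, df = 14

T_CRITICAL = 2.145  # assumed table value: two-tailed, df = 14, alpha = .05
print("reject Ho" if abs(t) >= T_CRITICAL else "retain Ho")       # retain Ho
```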
Interpretation of results of the analysis

Step 1. Describe the analysis
The analysis of students’ grades classified
according to their participation in extra-curricular
activities resulted in
Step 2. State the result of computation of the
test value.
a computed t = 1.41
Step 3. Indicate the probability to reject Ho or
critical value.
with a probability of 0.18,
Step 4. State the level of significance
which is greater than α = .05.
Step 5. Write your claim based on the result (do not
repeat the null hypothesis)
There is not enough evidence to support the claim that
students’ active participation in extra-curricular activities
is detrimental to their grades.
Step 6. Provide implications for your claim
The results imply that active students are likely to
have grades comparable to those of students who are not
active in extra-curricular activities.
Step 7. Deepen the discussion by giving more
meaning to the result of the analysis
Analysis of variance (ANOVA)
It is used to compare three or more means.

Example:
Workers are randomly assigned to three
machines on an assembly line. The number of
defective parts produced by each worker for
one day is recorded. The data are shown below.
At .05 level of significance can one conclude
that the mean number of defective parts
produced by the workers is the same?
Machine A : 3 2 0 6 4 3 5

Machine B: 8 6 2 0 1 9 7

Machine C: 10 8 9 12 11 15 17
• Results of the analysis on the comparison of
mean number of defective parts produced by
the workers.

Source of variation   Sum of squares   df   Mean square   F       Prob.
Between               284.86            2   142.43        15.42   0.0001
Within                166.29           18     9.238
Total                 451.14           20

α = .05 level of significance
• The analysis on the comparison of the defective
parts produced by the workers resulted in
F = 15.42
with a probability of 0.0001, which is less than α = .05.
There is enough evidence to claim that the mean
number of defective parts is not the same across the
three groups of workers.
The results imply that at least one group of workers
produced a greater number of defective parts
than the others.
The machines and the skills of the workers assigned to
them differ significantly in the production of defective parts.
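The sums of squares and the F value for the three machines can be recomputed from the raw data as a check. This is a minimal one-way ANOVA sketch using only the standard library. The function name and dictionary keys are illustrative, and the probability for F is not computed here since it needs the F distribution.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, mean squares, and F."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


machine_a = [3, 2, 0, 6, 4, 3, 5]
machine_b = [8, 6, 2, 0, 1, 9, 7]
machine_c = [10, 8, 9, 12, 11, 15, 17]

result = one_way_anova([machine_a, machine_b, machine_c])
print(f"SS between = {result['ss_between']:.2f}, SS within = {result['ss_within']:.2f}")
print(f"df = ({result['df_between']}, {result['df_within']}), F = {result['F']:.2f}")
# SS between = 284.86, SS within = 166.29
# df = (2, 18), F = 15.42
```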
Results of Tukey-Kramer comparison test

Comparison   Mean diff.   q       Prob.
A vs B       -1.429       1.244   p > .05
A vs C       -8.429       7.337   p < .05
B vs C       -7.000       6.093   p < .05

• A and B: comparable
• A and C: significantly different
• B and C: significantly different; workers assigned to machine
C produced a larger number of defectives than the
workers assigned to B and A.

Machine   Mean
A          3.29a
B          4.71a
C         11.71b
Means followed by the same letter are not significantly different.
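The q statistics for the pairwise comparisons can be recomputed from the raw data as q = |mean difference| / sqrt(MS within / n), which is the equal-group-size case of the Tukey-Kramer procedure. The sketch below is illustrative only. The critical value 3.61 for three groups and 18 degrees of freedom at α = .05 is an assumption taken from a studentized range table and should be checked against your own table.

```python
import math
from itertools import combinations
from statistics import mean

machines = {
    "A": [3, 2, 0, 6, 4, 3, 5],
    "B": [8, 6, 2, 0, 1, 9, 7],
    "C": [10, 8, 9, 12, 11, 15, 17],
}
n = 7  # workers per machine (equal group sizes)

# Within-group mean square, as in the ANOVA table
df_within = sum(len(v) for v in machines.values()) - len(machines)
ms_within = sum((x - mean(v)) ** 2 for v in machines.values() for x in v) / df_within
se = math.sqrt(ms_within / n)

Q_CRITICAL = 3.61  # assumed table value: k = 3 groups, df = 18, alpha = .05

q = {}
for a, b in combinations(machines, 2):
    diff = mean(machines[a]) - mean(machines[b])
    q[(a, b)] = abs(diff) / se
    verdict = "significant" if q[(a, b)] >= Q_CRITICAL else "comparable"
    print(f"{a} vs {b}: mean diff = {diff:.3f}, q = {q[(a, b)]:.3f} ({verdict})")
```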
Correlation analysis
• Correlation analysis is a statistical technique used to
determine the strength or degree of linear
relationship between two variables. A measure of
the degree of linear relationship is called the
correlation coefficient (r).
• r > 0: positive correlation, when both variables
increase or decrease at the same time.
• r < 0: negative correlation, when as one variable
increases, the other variable decreases, and vice
versa (ex. volume of production and price).
• r = 0: no linear relationship; the two variables vary separately.
Example of correlation
• Problem
The following data show the scores (Y) which
students got in an examination and the number of
hours (X) they studied for the examination.
X: 9, 3, 10, 12, 11, 10, 6, 8, 7
Y: 85, 78, 90, 92, 87, 89, 80, 82, 81
Find the value of r. Discuss the result of the
computation.
Result of the correlation analysis

Pair of variables                   r      t-comp   t-tab
Score vs. no. of hours of study     0.93   6.65     2.365

α = .05 level of significance
• The analysis resulted in a correlation coefficient of r
= .93, which indicates a very strong linear relationship
between students’ scores and the number of hours they
studied. The test for significance of r resulted in
t-comp = 6.65, which is greater than t-tab = 2.365 at the .05 level of
significance. There is enough evidence to claim that
the students’ scores are related to the number of
hours they studied for the examination. It implies
that students with high scores devoted their time to
study. It means that students who performed well in
the examination spent much time studying their
lessons.
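The correlation coefficient for the study-hours data can be recomputed as sketched below, together with the usual significance test t = r·sqrt[(n − 2)/(1 − r^2)]. The function names are illustrative. Because the test here uses the unrounded r, the t it prints may differ slightly from the tabulated value, but it is far above the critical value either way.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient from the raw sums."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    syy = sum(b * b for b in y) - sum(y) ** 2 / n
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """Test statistic for Ho: no linear relationship, with df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


hours = [9, 3, 10, 12, 11, 10, 6, 8, 7]
scores = [85, 78, 90, 92, 87, 89, 80, 82, 81]

r = pearson_r(hours, scores)
t = t_for_r(r, len(hours))
print(f"r = {r:.2f}, df = {len(hours) - 2}")  # r = 0.93, df = 7
print("reject Ho" if abs(t) >= 2.365 else "retain Ho")  # reject Ho
```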
Linear Regression
• Linear regression is the most basic and commonly
used predictive analysis. Regression estimates are
used to describe data and to explain the relationship
between one dependent variable and one or more
independent variables.
• At the center of the regression analysis is the task of
fitting a single line through a scatter plot. The
simplest form with one dependent and one
independent variable is defined by the formula y’ = A
+ Bx, where y’ = estimated value of the dependent
variable, A = constant, B = regression coefficient,
and x = independent variable.
• Sometimes the dependent variable is also called a
criterion variable, endogenous variable, prognostic
variable, or regressand. The independent variables
are also called exogenous variables, predictor
variables or regressors.
Example of Simple Linear Regression

• Last year, five randomly selected students took a
math aptitude test before they began their statistics
course. The data are the following:
Scores in math:       95 85 80 70 60
Scores in statistics: 85 95 70 65 70
The Statistics Department has three questions.
1. What linear regression equation best predicts
statistics performance, based on math aptitude
scores?
2. If a student made an 80 on the aptitude test, what
grade would we expect her to make in statistics?
3. How well does the regression equation fit the data?
1. The regression equation is a linear equation of the form
y’ = A + Bx. To conduct a regression analysis, we need
to solve for A and B.
Using a scientific calculator, we get A = 26.78 and B = 0.644.
The regression equation is
y’ = 26.78 + 0.644x (note: x is the independent variable, the
score in the math aptitude test; y’ is the estimated score in statistics).
2. How to Use the Regression Equation
Once you have the regression equation, using it is a
snap. Choose a value for the independent variable (x),
perform the computation, and you have an estimated
value (y’) for the dependent variable.
In our example, the independent variable is the student's
score on the aptitude test. The dependent variable is the
student's statistics grade. If a student made an 80 on the
aptitude test, the estimated statistics grade would be:
y’ = 26.78 + 0.644x = 26.78 + 0.644 * 80 = 26.78 +
51.52 = 78.30
• Warning: When you use a regression equation,
do not use values for the independent variable
that are outside the range of values used to
create the equation. That is called
extrapolation, and it can produce unreasonable
estimates.
• In this example, the aptitude test scores used to
create the regression equation ranged from 60
to 95. Therefore, only use values inside that
range to estimate statistics grades. Using values
outside that range (less than 60 or greater than
95) is problematic.
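The fit and the prediction can be sketched with the least-squares formulas B = Sxy/Sxx and A = mean(y) − B·mean(x). The function names are illustrative, and the range check is one possible way to guard against extrapolation. The code keeps unrounded coefficients, so the estimate prints as 78.29.

```python
def fit_line(x, y):
    """Least-squares intercept A and slope B for y' = A + B*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new, x_min, x_max):
    """Estimate y' for x_new, refusing values outside the fitted range."""
    if not x_min <= x_new <= x_max:
        raise ValueError(f"{x_new} is outside [{x_min}, {x_max}]: extrapolation")
    return intercept + slope * x_new


math_scores = [95, 85, 80, 70, 60]
stat_scores = [85, 95, 70, 65, 70]

A, B = fit_line(math_scores, stat_scores)
print(f"y' = {A:.2f} + {B:.3f}x")  # y' = 26.78 + 0.644x
estimate = predict(A, B, 80, min(math_scores), max(math_scores))
print(f"{estimate:.2f}")           # 78.29
```

Calling `predict` with an aptitude score such as 50 raises a `ValueError` instead of returning an unreliable estimate.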
3. Find the Coefficient of Determination and
explain its role in the problem
How to Find the Coefficient of Determination
• Whenever you use a regression equation, you
should ask how well the equation fits the data.
One way to assess fit is to check the coefficient of
determination, denoted by R^2, which can be
computed from the following formula.
R^2 = (r)^2
where r is the coefficient of correlation

When r = 0.69,
R^2 = (0.69)^2
R^2 = 0.4761
• A coefficient of determination equal to 0.4761
indicates that about 47.61% of the variation in
statistics grades (the dependent variable) can
be explained by the relationship to math
aptitude scores (the independent variable).
This would be considered a good fit to the
data, in the sense that it would substantially
improve an educator's ability to predict
student performance in statistics class.
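As a check, R^2 can also be computed directly from the data. The sketch below is illustrative and the function name is an assumption. With the unrounded r the result is about 0.48, slightly above the 0.4761 obtained from the rounded r = 0.69.

```python
import math


def correlation_and_r_squared(x, y):
    """Return (r, R^2) for simple linear regression, where R^2 = r**2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r, r ** 2


math_scores = [95, 85, 80, 70, 60]
stat_scores = [85, 95, 70, 65, 70]

r, r2 = correlation_and_r_squared(math_scores, stat_scores)
print(f"r = {r:.2f}, R^2 = {r2:.2f}")  # r = 0.69, R^2 = 0.48
```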
THANK YOU AND GOOD DAY!
