Fds Unit 4 FINSH
UNIT IV
ANALYSIS OF VARIANCE
Sampling Distribution Of T
T-Test Procedure
P-Value
Statistical Significance
F-Test
ANOVA
Three F-Tests
Two-Factor ANOVAs
LIST OF IMPORTANT QUESTIONS
UNIT IV
ANALYSIS OF VARIANCE
PART A (2 marks)
4. What are the 3 types of t tests when do you use each one?
PART B(16 marks)
UNIT IV
PART A(2-MARKS)
value of the null hypothesis. A paired t-test takes paired observations (like before and
after), subtracts one from the other, and conducts a 1-sample t-test on the difference.
4. What are the 3 types of t tests when do you use each one?
If you are studying one group, use a paired t-test to compare the group mean
over time or after an intervention, or use a one-sample t-test to compare the group
mean to a standard value. If you are studying two groups, use a two-sample t-test. If you
5. What are the properties of the chi-square distribution?
A chi-square distribution is a continuous probability distribution. The shape of a chi-
square distribution depends on its degrees of freedom, k. The mean of a chi-square
distribution is equal to its degrees of freedom (k) and the variance is 2k. The range is 0 to
∞.
Two-tailed (non-directional)
Is there a statistically significant difference between the mean value of the sample and
the population?
One-tailed (directional)
Is the mean value of the sample significantly larger (or smaller) than the mean value of
the population?
9. How do you find the distribution of T?
The t statistic is calculated by dividing the mean difference by the standard error
of the mean (both are reported in the One-Sample Statistics box). The degrees of freedom
(df) for a one-sample t test are df = n - 1; so here, with n = 408, df = 408 - 1 = 407.
11. What is the formula for sampling distribution?
If a random sample of n observations is taken from a binomial population with
parameter p, the sampling distribution of the sample proportion (i.e. over all possible
samples of size n taken from the population) will have a standard deviation of
σp = √(pq/n), where q = 1 - p.
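As a quick check of the formula σp = √(pq/n), the following minimal Python sketch computes the standard deviation of the sample proportion. The function name and the values p = 0.5, n = 100 are hypothetical and chosen only for illustration.

```python
import math


def proportion_std_dev(p: float, n: int) -> float:
    """Standard deviation of the sample proportion: sqrt(p * q / n), with q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    q = 1.0 - p
    return math.sqrt(p * q / n)


# Hypothetical illustration: p = 0.5 and n = 100 are assumed values.
sigma_p = proportion_std_dev(0.5, 100)
print(f"sigma_p = {sigma_p:.4f}")
```

Larger samples shrink σp in proportion to 1/√n, so quadrupling n halves the standard deviation.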
12. What are the four types of sampling?
Probability sampling methods include simple random sampling, systematic
sampling, stratified sampling, and cluster sampling.
It compares the mean of the data of the two groups.
95% confidence interval for the mean difference is calculated.
The data is taken from subjects who have been measured twice.
95% confidence interval is obtained from the difference between the two sets of
joined observations.
20. What does the F-test tell you?
The F-test is used in regression analysis to test the hypothesis that all model
parameters are zero. It is also used in statistical analysis when comparing statistical
models that have been fitted using the same underlying factors and data set to determine
the model with the best fit.
Here,
The chi-squared test is done to check if there is any difference between the
observed value and the expected value.
For a Chi-square test, a p-value that is less than or equal to the specified
significance level indicates sufficient evidence to conclude that the observed distribution is
not the same as the expected distribution. Here, we can conclude that a relationship exists
between the given categorical variables.
26. How to Perform a Two-Way ANOVA by Hand
Step 1: Calculate Sum of Squares for First Factor (Watering Frequency) ...
Step 2: Calculate Sum of Squares for Second Factor (Sunlight Exposure) ...
Step 3: Calculate Sum of Squares Within (Error) ...
Step 4: Calculate Total Sum of Squares. ...
Step 5: Calculate Sum of Squares Interaction.
PART B (16- MARKS)
The alternate hypothesis can also state that the sample mean is greater than or less
than the comparison mean.
Use the degree of freedom and the alpha level (0.05) to find the T-critical.
Step 4: Determine if the computed test statistic falls in the rejection region.
Alternately, simply compute the P-value. If it is less than the significance level (0.05 or
0.01), reject the null hypothesis.
2. We have the potato yield from 12 different farms. We know that the standard
potato yield for the given variety is µ = 20. The observed yields are
x = [21.5, 24.5, 18.5, 17.2, 14.5, 23.2, 22.1, 20.5, 19.4, 18.1, 24.1, 18.5].
Test if the potato yield from these farms is significantly better than the standard yield.
Solution:
H0: µ = 20
H1: µ > 20
n = 12. Since this is a one-sample t test, the degrees of freedom = n - 1 = 12 - 1 = 11.
Step 4: Does it fall in rejection region?
Since the computed T Statistic is less than the T-critical, it does not fall in the rejection region.
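The arithmetic behind this decision can be sketched in Python with the standard library only. The critical value 1.796 (one-tailed, alpha = 0.05, df = 11) is taken from a standard t-table; the variable names are illustrative and not part of the original solution.

```python
import math
import statistics

# Potato yields from the 12 farms, as given in the question.
yields = [21.5, 24.5, 18.5, 17.2, 14.5, 23.2, 22.1, 20.5, 19.4, 18.1, 24.1, 18.5]
mu0 = 20.0                       # standard yield under H0

n = len(yields)
mean = statistics.mean(yields)
sd = statistics.stdev(yields)    # sample standard deviation (n - 1 in the denominator)

# One-sample t statistic: (sample mean - hypothesized mean) / standard error.
t_stat = (mean - mu0) / (sd / math.sqrt(n))

T_CRITICAL = 1.796               # t-table value, one-tailed, alpha = 0.05, df = 11
reject_h0 = t_stat > T_CRITICAL

print(f"mean = {mean:.3f}, sd = {sd:.3f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The sample mean is 20.175 and the t statistic is about 0.20, well below 1.796, so the null hypothesis is not rejected.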
3. For example, imagine a company wants to test the claim that their batteries last
more than 40 hours. Using a simple random sample of 15 batteries yielded a mean
of 44.9 hours, with a standard deviation of 8.9 hours. Test this claim using a
significance level of 0.05.
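A minimal sketch of this test from the summary statistics, assuming a one-tailed one-sample t test (H0: µ = 40 against H1: µ > 40). The critical value 1.761 (alpha = 0.05, df = 14) is read from a standard t-table; the variable names are illustrative.

```python
import math

n = 15              # number of batteries sampled
sample_mean = 44.9  # hours
sample_sd = 8.9     # hours
mu0 = 40.0          # claimed lower bound on battery life under H0

# One-sample t statistic computed from summary statistics.
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu0) / standard_error
df = n - 1

T_CRITICAL = 1.761  # t-table value, one-tailed, alpha = 0.05, df = 14
reject_h0 = t_stat > T_CRITICAL

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

Here t is about 2.13, which exceeds 1.761, so under these assumptions the data support the claim that the batteries last more than 40 hours.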
4. According to the American Psychological Assoc., members with a doctorate and
a full-time teaching job earn, on the avg, $82,500 per year, with a sd of $6,000. An
investigator wishes to determine whether $82,500 is also the mean salary for all
female members with a doctorate and a full-time teaching appointment. Salaries
are obtained for a random sample of 100 women from this population, and mean
salary equals $80,100. A - What form should Ho and H1 take if the investigator is
concerned only about salary discrimination against female members? B - If this
hypothesis test supports the conclusion of the female salary discrimination a
costly suit will be initiated. Would you recommend using the .05 or the .01 level of
significance and why?
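(A) Because the investigator is concerned only about discrimination against female
members, the test is one-tailed (lower tail):
Ho: μ = $82,500
H1: μ < $82,500 (One Tail)
Test statistic:
z = (80,100 − 82,500) / (6,000/√100) = −2,400 / 600
z = −4
Decision rule:
Reject H0 if z ≤ −1.645 (α = 0.05) or if z ≤ −2.33 (α = 0.01).
Otherwise retain H0.
p-value = P(z < −4) ≈ 0.00003
Decision:
When α = 0.05, p-value ≈ 0.00003 < α = 0.05, so H0 is rejected.
When α = 0.01, p-value ≈ 0.00003 < α = 0.01, so H0 is rejected.
Conclusion:
There is sufficient evidence that the mean salary of female members is below $82,500,
which is consistent with salary discrimination against female members.
(B) The p-value is below both 0.05 and 0.01, so the conclusion is the same at either
level. Because wrongly rejecting H0 (a Type I error) would trigger a costly suit, the
stricter 0.01 level is the better recommendation: a lower significance level makes it
more difficult to reject the null hypothesis by chance, while a higher significance level
makes it easier to do so.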
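The z statistic and its tail probabilities for question 4 can be checked with a short standard-library sketch. The helper normal_cdf is defined here for illustration; both the lower-tail and the two-tailed p-values are printed so they can be compared against either form of H1.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability P(Z <= z) for a standard normal variable."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


mu0 = 82_500.0     # hypothesized mean salary
sigma = 6_000.0    # population standard deviation
n = 100            # sample size
x_bar = 80_100.0   # observed sample mean

# z statistic for a sample mean with known population standard deviation.
z = (x_bar - mu0) / (sigma / math.sqrt(n))

p_lower_tail = normal_cdf(z)        # P(Z < z), for H1: mu < 82,500
p_two_tailed = 2.0 * p_lower_tail   # for H1: mu != 82,500

print(f"z = {z:.2f}")
print(f"lower-tail p = {p_lower_tail:.6f}, two-tailed p = {p_two_tailed:.6f}")
```

Either p-value is far below 0.01, so the decision does not depend on whether α = 0.05 or α = 0.01 is used.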
5. Calculate a paired t test by hand for the following data (t test for paired samples):
Step 2: Add up all of the values from Step 1 then set this number aside for a
moment.
Step 5: Use the following formula to calculate the t-score:
Step 6: Subtract 1 from the sample size to get the degrees of freedom. We have 11 items.
So 11 – 1 = 10.
Step 7: Find the critical t-value in the t-table, using the degrees of freedom in Step 6.
If you don't have a specified alpha level, use 0.05 (5%).
So for this example t test problem, with df = 10, the two-tailed critical t-value is 2.228.
Step 8: In conclusion, compare your t-table value from Step 7 (2.228) to the absolute value
of your calculated t-value (-2.74). Since |-2.74| = 2.74 is greater than the table value at
an alpha level of .05, the p-value is less than the alpha level: p < .05. So we can reject
the null hypothesis that there is no difference between means.
Note that you can ignore the minus sign when comparing the two t-values, as the sign only
indicates the direction of the difference; the two-tailed p-value is the same for both
directions.
6. 256 visual artists were surveyed to find out their zodiac sign. The results were:
Aries (29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19),
Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius (20), Pisces (23). Test the
hypothesis that zodiac signs are evenly distributed across visual artists.
Step 2: Fill in your categories. Categories should be given to you in the question.
There are 12 zodiac signs, so:
Step 3: Write your counts. Counts are the number of each items in each category in
column 2. You’re given the counts in the question:
Step 4: Calculate your expected value for column 3. In this question, we would expect
the 12 zodiac signs to be evenly distributed for all 256 people, so 256/12=21.333.
Write this in column 3.
Step 5: Subtract the expected value (Step 4) from the Observed value (Step 3) and place
the result in the "Residual" column. For example, the first row is Aries:
29 - 21.333 = 7.667.
Step 6: Square your results from Step 5 and place the amounts in the (Obs - Exp)² column.
Step 7: Divide the amounts in Step 6 by the expected value (Step 4) and place those
results in the final column.
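The steps above build the table one column at a time; the whole calculation, including the final sum of the last column that gives the chi-square statistic, can be sketched as follows. The critical value 19.675 (alpha = 0.05, df = 11) is read from a standard chi-square table, and alpha = 0.05 is an assumption since the question does not state a level.

```python
# Observed counts for the 12 zodiac signs, in the order given in the question.
observed = {
    "Aries": 29, "Taurus": 24, "Gemini": 22, "Cancer": 19,
    "Leo": 21, "Virgo": 18, "Libra": 19, "Scorpio": 20,
    "Sagittarius": 23, "Capricorn": 18, "Aquarius": 20, "Pisces": 23,
}

total = sum(observed.values())        # 256 artists
expected = total / len(observed)      # even distribution: 256 / 12

# Sum of (Obs - Exp)^2 / Exp over all categories.
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())
df = len(observed) - 1

CHI2_CRITICAL = 19.675                # chi-square table, alpha = 0.05 (assumed), df = 11
reject_h0 = chi_square > CHI2_CRITICAL

print(f"expected = {expected:.3f}, chi-square = {chi_square:.3f}, df = {df}")
print(f"reject H0 (even distribution): {reject_h0}")
```

The statistic comes to about 5.09, far below 19.675, so the hypothesis that zodiac signs are evenly distributed across visual artists is not rejected.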
7. A group of secondary education student teachers were given 2 1/2 days of training in
interpersonal communication group work. The effect of such a training session on the
dogmatic nature of the student teachers was measured by the difference of scores on the
"Rokeach Dogmatism" test given before and after the training session. The difference "post
minus pre score" was recorded as follows:
16, -5, 4, 19, -40, -16, -29, 15, -2, 0, 5, -23, -3, 16, -8, 9, -14, -33, -64, -33
Can we conclude from this evidence that the training session makes student
teachers less dogmatic (at the 5% level of significance) ?
This is of course the same example as before, where we incorrectly used the normal
distribution to compute the probability in the last step. This time, we will do it correctly,
which is fortunately almost identical to the previous case (except that we use TDIST instead
of NORMDIST):
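The same test can be sketched outside a spreadsheet. This Python version uses only the standard library, so instead of a TDIST-style p-value it compares the t statistic with the table value 1.729 (one-tailed, alpha = 0.05, df = 19); the variable names are illustrative.

```python
import math
import statistics

# "Post minus pre" differences for the 20 student teachers.
diffs = [16, -5, 4, 19, -40, -16, -29, 15, -2, 0,
         5, -23, -3, 16, -8, 9, -14, -33, -64, -33]

n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)     # sample standard deviation

# H0: mean difference = 0; H1: mean difference < 0 (less dogmatic after training).
t_stat = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

T_CRITICAL = 1.729                    # t-table value, one-tailed, alpha = 0.05, df = 19
reject_h0 = t_stat < -T_CRITICAL      # lower-tail rejection region

print(f"mean = {mean_diff:.2f}, sd = {sd_diff:.2f}, t = {t_stat:.3f}, df = {df}")
print(f"reject H0: {reject_h0}")
```

The mean difference is -9.3 and t is about -1.88, which falls below -1.729, so at the 5% level the data support the conclusion that the training makes student teachers less dogmatic.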
8. Explain the "Analyze Sample Data" step of a chi-square test for independence.
Using sample data, find the degrees of freedom, expected frequencies, test statistic,
and the P- value associated with the test statistic. The approach described in this section is
illustrated in the sample problem at the end of this lesson.
Degrees of freedom:
DF = (r - 1) * (c - 1)
where r is the number of levels for one categorical variable, and c is the number of
levels for the other categorical variable.
Expected frequencies:
Er,c = (nr * nc) / n
where Er,c is the expected frequency count for level r of Variable A and level c of
Variable B, nr is the total number of sample observations at level r of Variable A, nc
is the total number of sample observations at level c of Variable B, and n is the
total sample size.
Test statistic:
Χ² = Σ [ (Or,c - Er,c)² / Er,c ]
where Or,c is the observed frequency count at level r of Variable A and level c of
Variable B, and Er,c is the expected frequency count at level r of Variable A and
level c of Variable B.
9. A public opinion poll surveyed a simple random sample of 1000 voters. Respondents were
classified by gender (male or female) and by voting preference (Republican, Democrat, or
Independent). Results are shown in the contingency table below.
[Contingency table: rows are gender (male, female); columns are voting preference
(Republican, Democrat, Independent) and row total.]
Is there a gender gap? Do the men's voting preferences differ significantly from the
women's preferences? Use a 0.05 level of significance.
Solution
The solution to this problem takes four steps:
(1) state the hypotheses,
(2) formulate an analysis plan,
(3) analyze sample data, and
(4) interpret results.
We work through those steps below:
State the hypotheses. The first step is to state the null hypothesis and an alternative
hypothesis.
Formulate an analysis plan. For this analysis, the significance level is 0.05. Using sample
data, we will conduct a chi-square test for independence.
Analyze sample data. Applying the chi-square test for independence to
sample data, we compute the degrees of freedom, the expected frequency
counts, and the chi-square test statistic. Based on the chi-square statistic and
the degrees of freedom, we determine the P-value.
DF = (r - 1) * (c - 1) = (2 - 1) * (3 - 1) = 2
Er,c = (nr * nc) / n
Χ² = Σ [ (Or,c - Er,c)² / Er,c ] = 16.2
where DF is the degrees of freedom, r is the number of levels of gender, c is the number of
levels of the voting preference, nr is the number of observations from level r of gender, nc is
the number of observations from level c of voting preference, n is the number of
observations in the sample, Er,c is the expected frequency count when gender is level r and
voting preference is level c, and Or,c is the observed frequency count when gender is level r
and voting preference is level c.
The P-value is the probability that a chi-square statistic having 2 degrees of freedom
is more extreme than 16.2.
We use the Chi-Square Distribution Calculator to find P(Χ² > 16.2) = 0.0003.
Interpret results. Since the P-value (0.0003) is less than the significance level (0.05),
we reject the null hypothesis. Thus, we conclude that there is a relationship
between gender and voting preference.
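The chi-square test for independence can be sketched in Python as below. The contingency table is not reproduced in this text, so the counts in the sketch are assumed values, chosen only because they total 1000 voters and give a statistic of about 16.2. The closed-form p-value exp(-Χ²/2) holds only for 2 degrees of freedom.

```python
import math

# ASSUMED counts (not taken from the text): rows = gender, columns = preference.
# They sum to 1000 and reproduce a chi-square statistic of about 16.2.
observed = {
    "male":   {"Republican": 200, "Democrat": 150, "Independent": 50},
    "female": {"Republican": 250, "Democrat": 300, "Independent": 50},
}

row_totals = {g: sum(cols.values()) for g, cols in observed.items()}
col_names = list(next(iter(observed.values())))
col_totals = {c: sum(observed[g][c] for g in observed) for c in col_names}
n = sum(row_totals.values())

# Expected count E = (row total * column total) / n; statistic = sum (O - E)^2 / E.
chi_square = 0.0
for g in observed:
    for c in col_names:
        expected = row_totals[g] * col_totals[c] / n
        chi_square += (observed[g][c] - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_names) - 1)

# For df = 2 only, the upper-tail probability is exactly exp(-chi_square / 2).
p_value = math.exp(-chi_square / 2.0)

print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.4f}")
```

With these assumed counts the p-value is about 0.0003, below the 0.05 significance level, which matches the decision to reject independence.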
10. A certain group of welfare recipients receives SNAP benefits of $110 per week with a
standard deviation of $20. If a random sample of 25 people is taken, what is the probability
their mean benefit will be greater than $120 per week?
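Step 1: Insert the information into the z-formula:
z = (x̄ - μ) / (σ/√n) = (120 - 110) / (20/√25) = 10 / (20/5) = 10/4 = 2.5.
Step 2: Look up the z-score in a table (or calculate it using technology). A z-score of
2.5 has an area of roughly 49.38% between the mean and z. Adding 50% (for the left half of
the curve), we get 99.38%, which is the probability that the sample mean is less than $120.
Step 3: Subtract from 100% to get the probability that the mean benefit is greater than
$120 per week: 100% - 99.38% = 0.62%.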
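Because the question asks for the probability of a sample mean greater than $120, the quantity needed is the upper tail, 1 - 0.9938. A minimal standard-library sketch of the calculation, where the helper normal_cdf is defined for illustration:

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability P(Z <= z) for a standard normal variable."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


mu = 110.0       # population mean benefit per week
sigma = 20.0     # population standard deviation
n = 25           # sample size
threshold = 120.0

# z-score of the sample mean, using the standard error sigma / sqrt(n).
z = (threshold - mu) / (sigma / math.sqrt(n))

p_below = normal_cdf(z)     # P(sample mean < 120)
p_above = 1.0 - p_below     # P(sample mean > 120)

print(f"z = {z:.2f}, P(mean < 120) = {p_below:.4f}, P(mean > 120) = {p_above:.4f}")
```

The upper-tail probability is about 0.0062, or 0.62%.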
11. Conduct an F-Test on the following samples: Sample-1 having variance = 109.63, sample
size = 41. Sample-2 having variance = 65.99, sample size = 21.
Solution:
Step-2:- Calculate the F value. Here take the highest variance as the numerator and
the lowest variance as the denominator:
F Value = σ1² / σ2²
F Value = 109.63 / 65.99
F Value = 1.66
Step-3:- Calculate the degrees of freedom as:
The degrees of freedom in the table will be the sample size -1, so for sample-1 it is 40 and for
sample-2 it is 20.
Step-4:- Choose the alpha level. As, no alpha level was given in the question, so we may use
the standard level of 0.05. This needs to be halved for the test, so use 0.025.
Step-5:- We will find the critical F-Value using the F-Table. We will use the table with 0.025.
Critical-F for (40,20) at alpha (0.025) is 2.287.
Step-6:- Compare the calculated value to the standard table value. If our calculated value is
higher than the table value, then we may reject the null hypothesis. Here, 1.66 < 2 .287. So,
we cannot reject the null hypothesis.
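Steps 2 to 6 can be sketched as a short Python calculation. The second sample's variance (65.99) and size (21) are read off the solution steps (the ratio 109.63/65.99 and the 20 degrees of freedom), and the critical value 2.287 is the table value quoted in Step 5; the variable names are illustrative.

```python
# Sample variances and sizes; sample 2 values are inferred from the solution steps.
var_1, n_1 = 109.63, 41
var_2, n_2 = 65.99, 21

# Put the larger variance in the numerator so that F >= 1.
if var_1 >= var_2:
    f_value = var_1 / var_2
    df_num, df_den = n_1 - 1, n_2 - 1
else:
    f_value = var_2 / var_1
    df_num, df_den = n_2 - 1, n_1 - 1

F_CRITICAL = 2.287   # F-table value for (40, 20) at alpha/2 = 0.025, as quoted in Step 5
reject_h0 = f_value > F_CRITICAL

print(f"F = {f_value:.2f}, df = ({df_num}, {df_den}), reject H0: {reject_h0}")
```

Since F is about 1.66 and below 2.287, the hypothesis of equal variances is not rejected.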
A lower p-value means a more significant difference between the observed values and the
population value that was hypothesized in the beginning. The results are highly significant
if the p-value is very small, i.e. well below the commonly used level of 0.05.
When measuring the level of statistical significance of the result, the researcher first
needs to evaluate the p-value. It is the probability of obtaining an outcome at least as
extreme as the one observed, assuming that the null hypothesis is true. If the p-value is
less than the level of significance (α), the null hypothesis is rejected. If the p-value
observed is equal to or greater than the significance level α, then the null hypothesis is
not rejected. In practice, the sample size may be increased to check whether the
significance level is reached. In general practice, the p-value is read as evidence against
the null hypothesis on the following scale:
If p > 0.1, there is no real evidence against the null hypothesis.
If p > 0.05 and ≤ 0.1, there is weak evidence against the null hypothesis.
If p > 0.01 and ≤ 0.05, there is strong evidence against the null hypothesis.
If p ≤ 0.01, there is very strong evidence against the null hypothesis.
The rejection rule of the null hypothesis is as follows:
If p < α, then one must reject the null hypothesis
If p ≥ α, then one should not reject the null hypothesis
Rejection Region
The set of values of the test statistic for which the null hypothesis is rejected is
called the rejection region.
Non-Rejection Region
The set of all potential outcomes for which the null hypothesis is not rejected is
called the non-rejection region.
13. Explain about General Steps for an F Test
If you’re running an F Test, you should use Excel, SPSS, Minitab or some other kind
of technology to run the test. Why? Calculating the F test by hand, including variances, is
tedious and time-consuming. Therefore you’ll probably make some errors along the way.
If you’re running an F Test using technology (for example, an F Test two sample for
variances in Excel), the only steps you really need to do are Step 1 and 4 (dealing with the
null hypothesis). Technology will calculate Steps 2 and 3 for you.
Decision rule: If F > F critical value then reject the null hypothesis.
To determine the critical value of an ANOVA F test the degrees of freedom are given
by df1 = K - 1 and df2 = N - K, where N is the overall sample size and K is the
number of groups.
15. A research team wants to study the effects of a new drug on insomnia. 8 tests were
conducted with a variance of 600 initially. After 7 months 6 tests were conducted with a variance
of 400. At a significance level of 0.05 was there any improvement in the results after 7 months?
Solution: As the variances need to be compared, the F test needs to be used.
H0: s1² = s2²
H1: s1² > s2²
n1 = 8, n2 = 6
df1 = 8 - 1 = 7
df2 = 6 - 1 = 5
s1² = 600, s2² = 400
The F test formula is given as follows:
F = s1² / s2² = 600 / 400
F = 1.5
Now from the F table the critical value F(0.05, 7, 5) = 4.88
As 1.5 < 4.88, the null hypothesis cannot be rejected and there is not enough
evidence to conclude that there was an improvement in insomnia after using the new drug.
Answer: Fail to reject the null hypothesis.
16. Three sets of five mice were randomly selected to be placed in a standard maze but
with different color doors. The response is the time required to complete the maze as seen
below. Perform the appropriate analysis to test if there is an effect due to door color.
(Use α = 0.01)
Color   Time
Red      9  11  10   9  15
Green   20  21  23  17  30
Black    6   5   8  14   7
Step 0 : Check Assumptions
Step 1 : Hypotheses H0: µRed = µGreen = µBlack Ha: at least one inequality
Step 3 : Critical Value and Rejection Region
Critical Value: F(α, df1 = k − 1, df2 = N − k) = F(0.01, df1 = 2, df2 = 12) = 6.93
Reject the null hypothesis if F ≥ 6.93.
SSE = Σi Σj xij² − Σi (Ti² / ni) = 3537 − 3367.4 = 169.6
Source       df   SS         MS = SS/df   F-statistic   p-value
Treatments    2   565.7333   282.8667     20.0142       p-value < 0.001
Error        12   169.6       14.1333
Step 5 : Decision Since 20.0142 ≥ 6.93 (p-value ≤ 0.01), we shall reject the null hypothesis.
Step 6 : State conclusion in words At the α = 0.01 level of significance, there exists enough
evidence to conclude that there is an effect due to door color.
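The sums of squares and the F statistic for the maze data can be reproduced with a short standard-library sketch. The critical value 6.93 is the table value F(0.01; 2, 12) used in Step 3; the variable names are illustrative.

```python
# Maze completion times for each door color, as given in the question.
groups = {
    "Red":   [9, 11, 10, 9, 15],
    "Green": [20, 21, 23, 17, 30],
    "Black": [6, 5, 8, 14, 7],
}

all_times = [x for times in groups.values() for x in times]
N = len(all_times)                 # total observations
k = len(groups)                    # number of treatments

correction = sum(all_times) ** 2 / N                       # (grand total)^2 / N
raw_sum_sq = sum(x * x for x in all_times)                 # sum of x^2
group_term = sum(sum(t) ** 2 / len(t) for t in groups.values())  # sum of Ti^2 / ni

ss_treatments = group_term - correction
ss_error = raw_sum_sq - group_term

ms_treatments = ss_treatments / (k - 1)
ms_error = ss_error / (N - k)
f_stat = ms_treatments / ms_error

F_CRITICAL = 6.93                  # F-table value, alpha = 0.01, df = (2, 12)
reject_h0 = f_stat >= F_CRITICAL

print(f"SS treatments = {ss_treatments:.4f}, SS error = {ss_error:.4f}")
print(f"F = {f_stat:.4f}, reject H0: {reject_h0}")
```

The script gives SS treatments ≈ 565.7333, SS error = 169.6 and F ≈ 20.0142, matching the ANOVA table.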
Step 1 : In two-way ANOVA we have two pairs of hypotheses, one for treatments and one
for the blocks.
Framing Hypotheses
Null Hypotheses
H01: There is no significant difference among the population means of different
groups (Treatments)
H02: There is no significant difference among the population means of different Blocks
Alternative Hypotheses
Step 2 : Data is presented in a rectangular table form as described in the previous section.
To find the test statistic we have to find the following intermediate values.
Step 6 : Critical values
Step 7 : Decision
For Treatments: If the calculated F0t value is greater than the corresponding critical
value, then we reject the null hypothesis and conclude that there is significant difference
among the treatment means, in at least one pair.
For Blocks: If the calculated F0b value is greater than the corresponding critical value, then
we reject the null hypothesis and conclude that there is significant difference among the
block means, in at least one pair.
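The intermediate values and the two F ratios for a two-way ANOVA with one observation per cell can be sketched as below. The function name two_way_anova and the 3 x 3 data table are hypothetical, chosen only to keep the arithmetic easy to follow; rows are treated as treatments and columns as blocks.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication; rows = treatments, columns = blocks."""
    k = len(table)                   # number of treatments
    m = len(table[0])                # number of blocks
    n = k * m
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n

    ss_total = sum(x * x for row in table for x in row) - correction
    ss_treat = sum(sum(row) ** 2 for row in table) / m - correction
    ss_block = sum(sum(col) ** 2 for col in zip(*table)) / k - correction
    ss_error = ss_total - ss_treat - ss_block

    df_treat, df_block = k - 1, m - 1
    df_error = df_treat * df_block
    ms_error = ss_error / df_error

    return {
        "F_treatments": (ss_treat / df_treat) / ms_error,
        "F_blocks": (ss_block / df_block) / ms_error,
        "df": (df_treat, df_block, df_error),
    }


# HYPOTHETICAL data for illustration only (3 treatments x 3 blocks).
data = [
    [4, 6, 8],
    [5, 9, 7],
    [9, 8, 10],
]
result = two_way_anova(data)
print(result)
```

Each F ratio is then compared with its own critical value: F(df_treat, df_error) for treatments and F(df_block, df_error) for blocks.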
18. A reputed marketing agency in India has three different training programs for its
salesmen. The three programs are Method – A, B, C. To assess the success of the
programs, 4 salesmen from each of the programs were sent to the field. Their
performances in terms of sales are given in the following table.
Test whether there is significant difference among methods and among salesmen.
Solution:
Step 1 : Hypotheses
That is, there is no significant difference among the three programs in their mean sales.
Alternative Hypotheses:
H11 : At least one average is different from the other, among the three programs.
H12 : At least one average is different from the other, among the four salesmen.
Step 2 : Data
ANOVA Table (two-way)
f(3, 6),0.05 = 4.7571 (for treatments)
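Step 7 : Decision
For the programs, the null hypothesis is rejected, and we conclude that there is
significant difference in the mean sales among the three programs.
For the salesmen, the null hypothesis is not rejected, and we conclude that there does
not exist significant difference in the mean sales among the four salesmen.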