
AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS
II YEAR / IV SEMESTER (B.Tech- ARTIFICIAL INTELLIGENCE AND DATA SCIENCE)

UNIT – IV
ANALYSIS OF VARIANCE
PREPARED BY
S.SANTHI PRIYA, M.E., (AP/ AI&DS)
VERIFIED BY
HOD PRINCIPAL CEO/CORRESPONDENT
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
SENGUNTHAR COLLEGE OF ENGINEERING, TIRUCHENGODE-637 205.
UNIT IV
• T-Test For One Sample
• Sampling Distribution Of T
• T-Test Procedure
• T-Test For Two Independent Samples
• P-Value
• Statistical Significance
• T-Test For Two Related Samples
• F-Test
• ANOVA
• Two-Factor Experiments
• Three F-Tests
• Two-Factor ANOVAs
• Introduction To Chi-Square Tests
LIST OF IMPORTANT QUESTIONS
UNIT IV
PART A (2 marks)
1. What is a one-sample t-test and when is it used?
2. What is the formula for a one-sample t-test?
3. What is the difference between a one-sample t-test and a paired t-test?
4. What are the three types of t-tests, and when do you use each one?
5. What are the properties of the chi-square distribution?
6. What is the difference between a two-tailed and a one-tailed test?
7. What is the one-sample t-test formula?
8. What is the sampling distribution of T?
9. How do you find the distribution of T?
10. How do you find the sample T?
PART B (16 marks)
UNIT IV
PART A (2 MARKS)
1. What is a one-sample t-test and when is it used?

The one-sample t-test is used when we want to know whether our sample
comes from a particular population but we do not have full population
information available to us. For instance, we may want to know if a particular
sample of college students is similar to or different from college students in general.
2. What is the formula for a one-sample t-test?
T = (X̄ − μ) / (S / √n)
Figure 1: One-sample t-test formula

where
• X̄ is the sample mean,
• μ is the hypothesized population mean,
• S is the standard deviation of the sample,
• n is the number of sample observations.
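A minimal Python sketch of the formula above, using only the standard library (`statistics.stdev` uses the n − 1 denominator, matching the sample standard deviation S). The function name `one_sample_t` and the three-value sample are illustrative assumptions, not part of the original notes.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """Return (t, df) for H0: population mean == mu."""
    n = len(sample)
    x_bar = statistics.mean(sample)          # sample mean X̄
    s = statistics.stdev(sample)             # sample standard deviation S (n - 1 denominator)
    t = (x_bar - mu) / (s / math.sqrt(n))    # T = (X̄ - μ) / (S / √n)
    return t, n - 1                          # degrees of freedom = n - 1


# Hypothetical sample, for illustration only
t, df = one_sample_t([4.0, 5.0, 6.0], mu=4.0)
print(f"t = {t:.3f}, df = {df}")
```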
3. What is the difference between a one-sample t-test and a paired t-test?

You use a 1-sample t-test to assess the difference between a sample mean and the
value of the null hypothesis. A paired t-test takes paired observations (like before and
after), subtracts one from the other, and conducts a 1-sample t-test on the difference.
4. What are the three types of t-tests, and when do you use each one?
If you are studying one group, use a paired t-test to compare the group mean
over time or after an intervention, or use a one-sample t-test to compare the group
mean to a standard value. If you are studying two groups, use a two-sample t-test. If you
want to know only whether a difference exists, use a two-tailed test.
5. What are the properties of the chi-square distribution?
A chi-square distribution is a continuous probability distribution. Its shape depends on its degrees of freedom, k. The mean of a chi-square distribution is equal to its degrees of freedom (k), and its variance is 2k. Its range is 0 to ∞.
6. What is the difference between a two-tailed and a one-tailed test?
Two-tailed (non-directional)
Is there a statistically significant difference between the mean value of the sample and
the population?
One-tailed (directional)
Is the mean value of the sample significantly larger (or smaller) than the mean value of
the population?
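7. What is the one-sample t-test formula?

You can calculate the t-test either with statistics software such as DATAtab or by hand. For the calculation by hand, you first need the test statistic t, which for the one-sample t-test is calculated with the formula
t = (x̄ − μ) / (s / √n)
where x̄ is the sample mean, μ is the hypothesized population mean, s is the sample standard deviation and n is the sample size.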
8. What is the sampling distribution of T?

The t-distribution describes a set of observations where most observations fall close to the mean and the rest make up the tails on either side. It is symmetric and bell-shaped like the normal distribution, but it has heavier tails. It is used for smaller sample sizes when the population variance is unknown, and it approaches the standard normal distribution as the degrees of freedom increase.
9. How do you find the distribution of T?
Figure 2: Formula for the t-distribution
The t-statistic (the basis of Student's t-distribution) is calculated by subtracting the population mean μ from the sample mean x̄, giving (x̄ − μ), and dividing the result by the standard error of the mean, s/√n:
t = (x̄ − μ) / (s / √n)
10. How do you find the sample T?
t is calculated by dividing the mean difference (the sample mean minus the test value) by the standard error of the mean (reported in the One-Sample Statistics output). Degrees of freedom: for a one-sample t-test, df = n − 1; for example, with a sample of 408 observations, df = 408 − 1 = 407.
11. What is the formula for the sampling distribution?
If a random sample of n observations is taken from a binomial population with parameter p, the sampling distribution of the sample proportion (i.e. its distribution over all possible samples taken from the population) will have a standard deviation of
σp = √(pq/n), where q = 1 − p.
12. What are the four types of probability sampling?
Probability sampling methods include simple random sampling, systematic
sampling, stratified sampling, and cluster sampling.
13. What is the procedure for a t-test?

Mathematically, the two-sample t-test takes a sample from each of the two sets and establishes the problem statement. It assumes a null hypothesis that the two means are equal. The test statistic is calculated with the formulas and compared against critical values from the t-table, and the null hypothesis is rejected or not rejected accordingly.
14. What is meant by paired vs. unpaired t-test?

The similarity between the paired and unpaired t-tests is that both assume the data come from a normal distribution.
Characteristics of Unpaired T-Test:
• The two groups taken should be independent.
• The sample sizes of the two groups need not be equal.
• It compares the means of the data of the two groups.
• A 95% confidence interval for the mean difference is calculated.
Characteristics of Paired T-Test:
• The data are taken from subjects who have been measured twice.
• The 95% confidence interval is obtained from the differences between the two sets of paired observations.
15. What are the four steps in testing procedures?

Step 1: State the hypotheses.
Step 2: Set the criteria for a decision.
Step 3: Compute the test statistic.
Step 4: Make a decision.
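16. What does a P value tell you?
The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true, and (1 − the p-value) is not the probability that the alternative hypothesis is true. A low p-value shows that the observed data would be unlikely if the null hypothesis were true; on its own it does not show that the results are replicable, that the effect is large, or that the result is of major theoretical, clinical or practical importance.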
17. How is the p-value calculated?

The p-value is calculated using the sampling distribution of the test statistic under the null hypothesis, the sample data, and the type of test being done (lower-tailed, upper-tailed, or two-sided). With ts the observed value of the test statistic TS, the p-value for:
• a lower-tailed test is p-value = P(TS ≤ ts | H0 is true) = cdf(ts)
• an upper-tailed test is p-value = P(TS ≥ ts | H0 is true) = 1 − cdf(ts)
• a two-sided test is p-value = 2 × min{cdf(ts), 1 − cdf(ts)}
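The three tail cases can be sketched in Python as follows. This assumes the null distribution of the test statistic is the standard normal (a z-test), using `statistics.NormalDist` from the standard library; for a t statistic you would substitute a t-distribution CDF such as SciPy's `scipy.stats.t.cdf`, which the standard library does not provide. The helper name `p_value` is hypothetical.

```python
from statistics import NormalDist


def p_value(ts, tail):
    """p-value for an observed statistic ts, assuming TS ~ N(0, 1) under H0."""
    cdf = NormalDist().cdf(ts)               # cdf(ts) = P(TS <= ts | H0)
    if tail == "lower":
        return cdf
    if tail == "upper":
        return 1.0 - cdf                     # P(TS >= ts | H0)
    if tail == "two-sided":
        return 2.0 * min(cdf, 1.0 - cdf)     # double the smaller tail
    raise ValueError("tail must be 'lower', 'upper' or 'two-sided'")


print(p_value(1.96, "two-sided"))            # close to 0.05
```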
18. What is a statistically significant p-value?

In most sciences, a p-value below 0.05 is conventionally considered statistically significant, and results with a p-value close to 0.05 are considered borderline. A p-value below 0.01 is often described as highly significant, and p < 0.005 has been proposed as a stricter threshold.
19. What is F-test in ANOVA?

The F-test in one-way analysis of variance (ANOVA) is used to assess whether the
expected values of a quantitative variable within several pre-defined groups differ
from each other.
20. What does the F-test tell you?
In regression analysis, the overall F-test is used to test the hypothesis that all slope coefficients (every model parameter except the intercept) are zero. It is also used to compare nested statistical models that have been fitted to the same data set, to determine which model fits best.
21. What is the chi-square test? Write its formula.

A chi-square test is a hypothesis test whose test statistic follows a chi-squared distribution when the null hypothesis is true. The formula for the chi-square statistic is:
χ² = Σ (O_i − E_i)² / E_i
Here,
O_i = Observed value
E_i = Expected value
22. What is a chi-square test used for?
The chi-squared test is done to check whether there is a significant difference between the observed frequencies and the expected frequencies.
23. How do you interpret a chi-square test?
For a chi-square test, a p-value that is less than or equal to the specified significance level indicates sufficient evidence to conclude that the observed distribution is not the same as the expected distribution. In a chi-square test of independence, this means we can conclude that a relationship exists between the given categorical variables.
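24. What is a good chi-square value?
There is no single "good" chi-square statistic: whether a value is large enough to be significant depends on the degrees of freedom and the significance level. The number 5 that is often quoted refers to the expected frequencies: for the chi-square approach to be valid, each expected frequency should be at least 5.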
25. What is two-way ANOVA with example?

A two-way ANOVA has two independent variables (factors). For example, a two-way ANOVA allows a company to compare worker productivity based on two independent variables, such as department and gender. It tests the effect of both factors at the same time and is used to observe the interaction between the two factors.
26. How to Perform a Two-Way ANOVA by Hand
Step 1: Calculate Sum of Squares for First Factor (Watering Frequency) ...
Step 2: Calculate Sum of Squares for Second Factor (Sunlight Exposure) ...
Step 3: Calculate Sum of Squares Within (Error) ...
Step 4: Calculate Total Sum of Squares. ...
Step 5: Calculate Sum of Squares Interaction.
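The five hand-calculation steps can be sketched in Python for a balanced design with replication (every cell holds the same number r of observations). The function `two_way_ss` and the small 2 × 2 data set are hypothetical illustrations; here the interaction sum of squares (Step 5) is obtained by subtracting the other components from the total, which is valid for balanced designs.

```python
import statistics


def two_way_ss(data):
    """Sums of squares for a balanced two-way layout with replication.

    data[i][j] is the list of replicate observations for level i of the
    first factor and level j of the second factor (all cells the same size).
    """
    a, b, r = len(data), len(data[0]), len(data[0][0])
    values = [y for row in data for cell in row for y in cell]
    grand = statistics.mean(values)

    # Step 1: first factor (row means)
    row_means = [statistics.mean([y for cell in row for y in cell]) for row in data]
    ss_a = b * r * sum((m - grand) ** 2 for m in row_means)

    # Step 2: second factor (column means)
    col_means = [statistics.mean([y for i in range(a) for y in data[i][j]]) for j in range(b)]
    ss_b = a * r * sum((m - grand) ** 2 for m in col_means)

    # Step 3: within (error), deviations from each cell mean
    ss_within = sum((y - statistics.mean(cell)) ** 2
                    for row in data for cell in row for y in cell)

    # Step 4: total, deviations from the grand mean
    ss_total = sum((y - grand) ** 2 for y in values)

    # Step 5: interaction = what is left over (balanced designs only)
    ss_inter = ss_total - ss_a - ss_b - ss_within
    return {"A": ss_a, "B": ss_b, "within": ss_within,
            "total": ss_total, "interaction": ss_inter}


# Hypothetical 2 x 2 data with 2 replicates per cell
ss = two_way_ss([[[1, 3], [5, 7]], [[2, 4], [10, 12]]])
print(ss)
```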
PART B (16 MARKS)
1. Write the procedure for a one-sample t-test.

Step 1: Define the null hypothesis (H0) and the alternate hypothesis (H1)
H0: Population mean (µ) = Hypothesized population mean (µ0)
H1: Population mean (µ) ≠ Hypothesized population mean (µ0)
The alternate hypothesis can also state that the population mean is greater than or less than the comparison mean.
Step 2: Compute the test statistic (T)
Step 3: Find the T-critical value from the T-table
Use the degrees of freedom (n − 1) and the alpha level (for example 0.05) to find the T-critical value.
Step 4: Determine whether the computed test statistic falls in the rejection region.
Alternatively, compute the p-value. If it is less than the significance level (0.05 or 0.01), reject the null hypothesis.
2. We have the potato yield from 12 different farms. The standard potato yield for the given variety is µ = 20, and the observed yields are x = [21.5, 24.5, 18.5, 17.2, 14.5, 23.2, 22.1, 20.5, 19.4, 18.1, 24.1, 18.5]. Test whether the potato yield from these farms is significantly better than the standard yield.
Solution:
Step 1: Define the Null and Alternate Hypothesis
H0: µ = 20
H1: µ > 20
n = 12. Since this is a one-sample t-test, the degrees of freedom = n − 1 = 12 − 1 = 11.
Step 2: Calculate the Test Statistic (T)
1. Calculate the sample mean: x̅ = 242.1 / 12 = 20.175
2. Calculate the sample standard deviation: s = √(100.4025 / 11) ≈ 3.021
3. Substitute in the T statistic formula: T = (20.175 − 20) / (3.021 / √12) ≈ 0.175 / 0.872 ≈ 0.20
Step 3: Find the T-Critical

Confidence level = 0.95, alpha=0.05. For one tailed test, look under 0.05 column.
For d.o.f = 12 – 1 = 11, T-Critical = 1.796.
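Step 4: Does it fall in the rejection region?
Figure 3: Graph for a one-tailed test
Since the computed T statistic (≈ 0.20) is less than the T-critical value (1.796), it does not fall in the rejection region. We therefore fail to reject H0: there is not enough evidence that the yield from these farms is better than the standard yield.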
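The hand calculation for the potato example can be checked with the Python standard library. The critical value 1.796 is taken from the t-table as in Step 3; the variable names are illustrative.

```python
import math
import statistics

yields = [21.5, 24.5, 18.5, 17.2, 14.5, 23.2, 22.1, 20.5, 19.4, 18.1, 24.1, 18.5]
mu = 20                                   # standard yield under H0
n = len(yields)                           # 12 farms, df = 11
x_bar = statistics.mean(yields)           # about 20.175
s = statistics.stdev(yields)              # about 3.021
t = (x_bar - mu) / (s / math.sqrt(n))     # about 0.20
t_critical = 1.796                        # one-tailed, alpha = 0.05, df = 11
print(f"mean = {x_bar:.3f}, s = {s:.3f}, t = {t:.3f}, reject H0: {t > t_critical}")
```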
3. A company wants to test the claim that its batteries last more than 40 hours. A simple random sample of 15 batteries yielded a mean of 44.9 hours, with a standard deviation of 8.9 hours. Test this claim using a significance level of 0.05.
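The notes do not show a worked solution for this problem. One way to work it, assuming H0: µ = 40 and H1: µ > 40 (a one-tailed one-sample t-test with df = 14, whose t-table critical value at α = 0.05 is 1.761), is sketched below in Python. On these figures t ≈ 2.13 exceeds 1.761, so H0 would be rejected in favour of the company's claim.

```python
import math

x_bar, mu0, s, n = 44.9, 40, 8.9, 15
t = (x_bar - mu0) / (s / math.sqrt(n))    # one-sample t statistic
df = n - 1                                # 14
t_critical = 1.761                        # one-tailed, alpha = 0.05, df = 14
print(f"t = {t:.3f}, df = {df}, reject H0: {t > t_critical}")
```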
4. According to the American Psychological Association, members with a doctorate and a full-time teaching job earn, on average, $82,500 per year, with a standard deviation of $6,000. An investigator wishes to determine whether $82,500 is also the mean salary for all female members with a doctorate and a full-time teaching appointment. Salaries are obtained for a random sample of 100 women from this population, and the mean salary equals $80,100. (A) What form should H0 and H1 take if the investigator is concerned only about salary discrimination against female members? (B) If this hypothesis test supports the conclusion of female salary discrimination, a costly suit will be initiated. Would you recommend using the .05 or the .01 level of significance, and why?
(A) Let X be the salary of female members with a doctorate and a full-time teaching appointment, and let X̅ = 80,100 be the average salary of the n = 100 women sampled. Because the investigator is concerned only about discrimination against female members, i.e. a mean salary below $82,500, the test is one-tailed (lower tail).
So, the hypothesis statements are:
H0: μ = $82,500
H1: μ < $82,500 (one tail)
Test statistic (σ is known, so a z-test is used):
z = (X̅ − μ0) / (σ/√n)
z = (80,100 − 82,500) / (6,000/√100)
z = −2,400 / 600 = −4
Decision rule:
Reject the null hypothesis if p-value ≤ α; otherwise, do not reject H0.
We can use a z-distribution table, or the Excel command "=NORMSDIST(-4)":
p-value = P(z < −4) ≈ 0.000032
(A two-tailed test of H1: μ ≠ $82,500 would give p-value = 2P(z > 4) ≈ 0.000063, with the same decision.)
Decision:
When α = 0.05, p-value ≈ 0.000032 < α = 0.05.
When α = 0.01, p-value ≈ 0.000032 < α = 0.01.
Decision: reject the null hypothesis.
Conclusion:
There is sufficient evidence to say that there is salary discrimination against female members.
(B) Because rejecting H0 will trigger a costly lawsuit, a Type I error (concluding that there is discrimination when there is none) would be expensive, so the stricter .01 level is the safer recommendation: a lower significance level makes it more difficult to reject the null hypothesis, whereas a higher significance level makes it easier. In this case the p-value is below both 0.05 and 0.01, so the conclusion that there is a pay difference holds at either level.
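A short Python check of the z-test in part (A), using `statistics.NormalDist` from the standard library in place of the z-table or Excel's `NORMSDIST`. Both the one-tailed and the two-tailed p-values are computed so the two forms of H1 can be compared.

```python
import math
from statistics import NormalDist

x_bar, mu0, sigma, n = 80_100, 82_500, 6_000, 100
z = (x_bar - mu0) / (sigma / math.sqrt(n))    # (80,100 - 82,500) / 600 = -4.0
p_one = NormalDist().cdf(z)                   # H1: mu < 82,500 (lower tail)
p_two = 2 * NormalDist().cdf(-abs(z))         # H1: mu != 82,500 (both tails)

for alpha in (0.05, 0.01):
    decision = "reject H0" if p_one <= alpha else "do not reject H0"
    print(f"alpha = {alpha}: p = {p_one:.6f} -> {decision}")
```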
5. Calculate a paired t-test by hand for the following data (t-test for paired samples).
Step 1: Subtract each Y score from each X score.
Step 2: Add up all of the values from Step 1 then set this number aside for a
moment.
Step 3: Square the differences from Step 1.
Step 4: Add up all of the squared differences from Step 3.
Step 5: Use the following formula to calculate the t-score:
t = (ΣD / N) / √[ (ΣD² − (ΣD)² / N) / ((N − 1) × N) ]
where N is the number of pairs and:
1. ΣD is the sum of the differences X − Y (from Step 2).
2. ΣD² is the sum of the squared differences (from Step 4).
3. (ΣD)² is the sum of the differences (from Step 2), squared.
The Σ notation simply means "add everything up".
Step 6: Subtract 1 from the sample size to get the degrees of freedom. We have 11 items, so 11 − 1 = 10.
Step 7: Find the critical t-value in the t-table, using the degrees of freedom from Step 6. If you don't have a specified alpha level, use 0.05 (5%).
So for this example t-test problem, with df = 10 and a two-tailed alpha of 0.05, the table t-value is 2.228.
Step 8: Compare the t-table value from Step 7 (2.228) with the calculated t-value (−2.74). The absolute value of the calculated t-value (2.74) is greater than the table value at an alpha level of .05, so p < .05 and we can reject the null hypothesis that there is no difference between the means.
The minus sign can be ignored when comparing the two t-values, because the sign only indicates the direction of the difference; the p-value is the same for both directions.
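The data table for this example is not reproduced in these notes, so the Python sketch below uses a small hypothetical set of paired scores. It applies the Step 5 formula with ΣD, ΣD² and (ΣD)², and cross-checks it against the equivalent form mean(D) / (sd(D) / √N); the function names are illustrative.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic using the by-hand formula from Step 5."""
    d = [xi - yi for xi, yi in zip(x, y)]        # Step 1: differences X - Y
    n = len(d)                                   # number of pairs N
    sum_d = sum(d)                               # Step 2: ΣD
    sum_d2 = sum(di ** 2 for di in d)            # Steps 3-4: ΣD²
    t = (sum_d / n) / math.sqrt((sum_d2 - sum_d ** 2 / n) / ((n - 1) * n))
    return t, n - 1                              # Step 6: df = N - 1


def paired_t_check(x, y):
    """Equivalent form: mean(D) / (sd(D) / √N), used as a cross-check."""
    d = [xi - yi for xi, yi in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


# Hypothetical paired scores, for illustration only
x = [3, 5, 7, 9, 11]
y = [2, 3, 4, 5, 6]
t, df = paired_t(x, y)
print(f"t = {t:.4f}, df = {df}, cross-check = {paired_t_check(x, y):.4f}")
```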
6. 256 visual artists were surveyed to find out their zodiac sign. The results were:
Aries (29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19),
Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius (20), Pisces (23). Test the
hypothesis that zodiac signs are evenly distributed across visual artists.
Step 1: Make a table with columns for “Categories,” “Observed,” “Expected,” “Residual (Obs − Exp),” “(Obs − Exp)²” and “Component (Obs − Exp)² / Exp.” Don’t worry about what these mean right now; we’ll cover them in the following steps.
Step 2: Fill in your categories. Categories should be given to you in the question.
There are 12 zodiac signs, so:
Step 3: Write your counts in column 2. Counts are the number of items in each category; you’re given the counts in the question:
Step 4: Calculate your expected value for column 3. In this question, we would expect
the 12 zodiac signs to be evenly distributed for all 256 people, so 256/12=21.333.
Write this in column 3.
Step 5: Subtract the expected value (Step 4) from the observed value (Step 3) and place the result in the “Residual” column. For example, the first row is Aries: 29 − 21.333 = 7.667.
Step 6: Square your results from Step 5 and place the amounts in the (Obs − Exp)² column.
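Step 7: Divide the amounts in Step 6 by the expected value (Step 4) and place those results in the final column.
Step 8: Add up (sum) all the values in the last column.
This is the chi-square statistic: 5.094.
With 12 − 1 = 11 degrees of freedom, the critical value at the 0.05 level is 19.675. Since 5.094 < 19.675, we fail to reject the null hypothesis: the data are consistent with zodiac signs being evenly distributed across visual artists.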
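Steps 4 to 8 of the zodiac example can be reproduced in a few lines of Python. The critical value 19.675 (α = 0.05, df = 11) is a standard chi-square table value, hard-coded here because the standard library has no chi-square distribution.

```python
observed = {
    "Aries": 29, "Taurus": 24, "Gemini": 22, "Cancer": 19,
    "Leo": 21, "Virgo": 18, "Libra": 19, "Scorpio": 20,
    "Sagittarius": 23, "Capricorn": 18, "Aquarius": 20, "Pisces": 23,
}
n = sum(observed.values())                  # 256 artists
expected = n / len(observed)                # Step 4: 256 / 12 = 21.333 per sign
chi2 = sum((o - expected) ** 2 / expected   # Steps 5-8: sum of (Obs - Exp)² / Exp
           for o in observed.values())
df = len(observed) - 1                      # 12 categories - 1 = 11
critical = 19.675                           # chi-square table: alpha = 0.05, df = 11
print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {chi2 > critical}")
```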
7. A group of secondary education student teachers were given 2 1/2 days of training in interpersonal communication group work. The effect of such a training session on the dogmatic nature of the student teachers was measured by the difference of scores on the "Rokeach Dogmatism Test" given before and after the training session. The difference "post minus pre score" was recorded as follows:
16, -5, 4, 19, -40, -16, -29, 15, -2, 0, 5, -23, -3, 16, -8, 9, -14, -33, -64, -33
Can we conclude from this evidence that the training session makes student
teachers less dogmatic (at the 5% level of significance) ?
This example is sometimes worked by (incorrectly) using the normal distribution to compute the probability in the last step. Here it is done correctly, which is fortunately almost identical, except that we use TDIST instead of NORMDIST:
• Null Hypothesis: there is no difference in dogmatism, i.e. mean = 0
• Alternative Hypothesis: dogmatism is different, i.e. mean not equal to 0
• Test statistic: sample mean = −10.9, standard deviation = 21.33, sample size = 20. Compute
T = (−10.9 − 0) / (21.33 / √20) = −2.28
(These summary statistics come from the original worked example. Summing the 20 differences exactly as listed above gives a mean of −9.3; the quoted mean and standard deviation are reproduced if one of the two values of 16 is −16, so one sign appears to have been lost in transcription.)
• Rejection Region: We use Excel to compute p = TDIST(2.28, 19, 2) = 0.034, or 3.4%. That probability is less than 0.05, so we reject the null hypothesis. (Because the question asks whether the training makes teachers less dogmatic, a one-tailed test with H1: mean < 0 could also be used; its p-value, TDIST(2.28, 19, 1), is half as large, so the decision is the same.)
Computing the probability with the normal distribution instead gives p ≈ 2.2%; with the t-distribution it is 3.4%. The difference is small, but can matter in special situations. Thus, to be safe:
• if N > 30, use the Z-test based on the standard normal distribution (NORMSDIST)
• if N < 30, use the T-test based on the t-distribution (TDIST)
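A minimal Python sketch of the test statistic, using the summary figures quoted in the text. The two-tailed critical value 2.093 (α = 0.05, df = 19) comes from a standard t-table and stands in for the TDIST p-value, which Python's standard library cannot compute. The last lines recompute the mean from the 20 differences exactly as printed in the question, which gives −9.3 rather than the quoted −10.9.

```python
import math
import statistics

# Summary statistics as quoted in the worked solution
mean_d, s_d, n = -10.9, 21.33, 20
t = (mean_d - 0) / (s_d / math.sqrt(n))       # about -2.28
t_critical = 2.093                            # two-tailed, alpha = 0.05, df = 19
print(f"t = {t:.2f}, reject H0: {abs(t) > t_critical}")

# The 20 differences exactly as printed in the question
diffs = [16, -5, 4, 19, -40, -16, -29, 15, -2, 0,
         5, -23, -3, 16, -8, 9, -14, -33, -64, -33]
print(statistics.mean(diffs))                 # -9.3, not the quoted -10.9
```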
8. Explain the "Analyze Sample Data" step of a chi-square test for independence.
Using sample data, find the degrees of freedom, expected frequencies, test statistic, and the P-value associated with the test statistic. The approach described here is illustrated in the next problem.
• Degrees of freedom. The degrees of freedom (DF) are equal to:
DF = (r - 1) * (c - 1)
where r is the number of levels for one categorical variable, and c is the number of levels for the other categorical variable.
• Expected frequencies. The expected frequency counts are computed separately for each level of one categorical variable at each level of the other categorical variable. Compute r * c expected frequencies, according to the following formula:
Er,c = (nr * nc) / n
where Er,c is the expected frequency count for level r of Variable A and level c of Variable B, nr is the total number of sample observations at level r of Variable A, nc is the total number of sample observations at level c of Variable B, and n is the total sample size.
• Test statistic. The test statistic is a chi-square random variable (Χ²) defined by the following equation:
Χ² = Σ [ (Or,c - Er,c)² / Er,c ]
where Or,c is the observed frequency count at level r of Variable A and level c of Variable B, and Er,c is the expected frequency count at level r of Variable A and level c of Variable B.
9. A public opinion poll surveyed a simple random sample of 1000 voters. Respondents were
classified by gender (male or female) and by voting preference (Republican, Democrat, or
Independent). Results are shown in the contingency table below.
Voting Preferences
                Republican   Democrat   Independent   Row total
Male                200          150          50           400
Female              250          300          50           600
Column total        450          450         100          1000
Is there a gender gap? Do the men's voting preferences differ significantly from the
women's preferences? Use a 0.05 level of significance.
Solution
The solution to this problem takes four steps:
(1) state the hypotheses,
(2) formulate an analysis plan,
(3) analyze sample data, and
(4) interpret results.
We work through those steps below:
State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.
H0: Gender and voting preferences are independent.
Ha: Gender and voting preferences are not independent.
Formulate an analysis plan. For this analysis, the significance level is 0.05. Using sample data, we will conduct a chi-square test for independence.
• Analyze sample data. Applying the chi-square test for independence to sample data, we compute the degrees of freedom, the expected frequency counts, and the chi-square test statistic. Based on the chi-square statistic and the degrees of freedom, we determine the P-value.
DF = (r - 1) * (c - 1) = (2 - 1) * (3 - 1) = 2
Er,c = (nr * nc) / n
E1,1 = (400 * 450) / 1000 = 180000/1000 = 180
E1,2 = (400 * 450) / 1000 = 180000/1000 = 180
E1,3 = (400 * 100) / 1000 = 40000/1000 = 40
E2,1 = (600 * 450) / 1000 = 270000/1000 = 270
E2,2 = (600 * 450) / 1000 = 270000/1000 = 270
E2,3 = (600 * 100) / 1000 = 60000/1000 = 60
Χ² = (200 - 180)²/180 + (150 - 180)²/180 + (50 - 40)²/40 + (250 - 270)²/270 + (300 - 270)²/270 + (50 - 60)²/60
Χ² = 400/180 + 900/180 + 100/40 + 400/270 + 900/270 + 100/60
Χ² = 2.22 + 5.00 + 2.50 + 1.48 + 3.33 + 1.67 = 16.2
where DF is the degrees of freedom, r is the number of levels of gender, c is the number of levels of voting preference, nr is the number of observations from level r of gender, nc is the number of observations from level c of voting preference, n is the number of observations in the sample, Er,c is the expected frequency count when gender is level r and voting preference is level c, and Or,c is the observed frequency count when gender is level r and voting preference is level c.
The P-value is the probability that a chi-square statistic having 2 degrees of freedom
is more extreme than 16.2.
We use the Chi-Square Distribution Calculator to find P(Χ2 > 16.2) = 0.0003.
• Interpret results. Since the P-value (0.0003) is less than the significance level (0.05), we reject the null hypothesis. Thus, we conclude that there is a relationship between gender and voting preference.
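The whole calculation for the voting-preference table can be sketched in Python. For 2 degrees of freedom the chi-square upper-tail probability has the closed form P(Χ² > x) = e^(−x/2), so the P-value can be obtained with `math.exp` instead of a chi-square calculator; for other degrees of freedom a library function such as SciPy's `scipy.stats.chi2.sf` would be needed.

```python
import math

# Rows: Male, Female; columns: Republican, Democrat, Independent
observed = [
    [200, 150, 50],
    [250, 300, 50],
]
row_totals = [sum(row) for row in observed]            # 400, 600
col_totals = [sum(col) for col in zip(*observed)]      # 450, 450, 100
n = sum(row_totals)                                    # 1000

# Er,c = (nr * nc) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(col_totals) - 1)      # (2 - 1) * (3 - 1) = 2

# Valid only for df = 2: P(X² > x) = exp(-x / 2)
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```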
10. A certain group of welfare recipients receives SNAP benefits of $110 per week with a standard deviation of $20. If a random sample of 25 people is taken, what is the probability that their mean benefit will be greater than $120 per week?
Step 1: Insert the information into the z-formula:
z = (120 − 110) / (20/√25) = 10 / (20/5) = 10/4 = 2.5
Step 2: Look up the z-score in a table (or calculate it using technology). A z-score of 2.5 has an area of roughly 49.38% between the mean and z. Adding 50% (for the left half of the curve) gives 99.38%, which is the probability that the mean benefit is less than $120. The probability that the mean benefit is greater than $120 per week is therefore 100% − 99.38% = 0.62%.
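The same probability can be computed directly with `statistics.NormalDist` from the Python standard library instead of a z-table; this sketch simply restates the two steps above.

```python
import math
from statistics import NormalDist

mu, sigma, n, x = 110, 20, 25, 120
se = sigma / math.sqrt(n)            # standard error = 20 / 5 = 4
z = (x - mu) / se                    # (120 - 110) / 4 = 2.5
p_greater = 1 - NormalDist().cdf(z)  # P(sample mean > 120)
print(f"z = {z}, P(mean > 120) = {p_greater:.4f}")
```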
11. Conduct an F-Test on the following samples: Sample-1 having variance = 109.63, sample
size = 41.
Sample-2 having Variance = 65.99, sample size = 21.
Solution:
Step-1:- First write the hypothesis statements as:
H_0: No difference in variances.
H_a: Difference in variances.
Step-2:- Calculate the F value (the test statistic). Here, take the larger variance as the numerator and the smaller variance as the denominator:
F = s1² / s2²
F = 109.63 / 65.99
F = 1.66
Step-3:- Calculate the degrees of freedom:
The degrees of freedom in the table will be the sample size − 1, so for sample-1 it is 40 and for sample-2 it is 20.
Step-4:- Choose the alpha level. As no alpha level was given in the question, we may use the standard level of 0.05. Because this is a two-tailed test, alpha is halved, so use 0.025.
Step-5:- Find the critical F-value using the F-table for 0.025. The critical F for (40, 20) at alpha = 0.025 is 2.287.
Step-6:- Compare the calculated value to the table value. If the calculated value is higher than the table value, we may reject the null hypothesis. Here, 1.66 < 2.287, so
we cannot reject the null hypothesis.
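A small Python sketch of Steps 2, 3 and 6. The critical value 2.287 is copied from Step 5 rather than computed, since the Python standard library has no F-distribution.

```python
var1, n1 = 109.63, 41
var2, n2 = 65.99, 21

# Step 2: larger variance in the numerator, so F >= 1
f_value = max(var1, var2) / min(var1, var2)
# Step 3: df follow the samples whose variances are in the numerator / denominator
df_num, df_den = (n1 - 1, n2 - 1) if var1 >= var2 else (n2 - 1, n1 - 1)
# Step 5: critical value copied from the F-table (alpha/2 = 0.025, df = 40, 20)
f_critical = 2.287
print(f"F = {f_value:.2f}, df = ({df_num}, {df_den}), reject H0: {f_value > f_critical}")
```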
12. Explain the definition of the significance level.

The level of significance is the probability of wrongly rejecting the null hypothesis when it is actually true. It is the Type I error probability, fixed by the statistician before the data are collected. It is the threshold used to decide whether the null hypothesis is retained or rejected, i.e. whether a result is statistically significant. The lower the level of significance, the stronger the evidence required to reject the null hypothesis.
P-Value and Significance Level
• The level of significance is represented by the Greek symbol α (alpha). Here, level of significance = P(Type I error) = α.
• Less likely values of the observations lie farther from the mean value. Results are said to be "significant at x%".
• The p-value is the probability of obtaining an effect at least as extreme as the one in the test data, assuming the null hypothesis is true.
• For example, a result significant at 7% means that p < 0.07. Correspondingly, a result significant at 2% means that p < 0.02.
A type I error occurs when the null hypothesis is rejected even though it is true. Such a false positive can be controlled only by choosing an appropriate level of significance. For research purposes, the 5% significance level is the most commonly used.
A lower p-value indicates a more significant difference between the observed values and the population value that was hypothesized at the beginning. Results are considered highly significant when the p-value is very small, e.g. below 0.01.
When measuring the statistical significance of a result, the researcher first needs to evaluate the p-value, the probability of obtaining an outcome at least as extreme as the one observed if the null hypothesis is true. If the p-value is less than the level of significance (α), the null hypothesis is rejected. If the observed p-value is equal to or greater than the significance level α, the null hypothesis is not rejected. In practice, a larger sample size makes it easier to reach a given significance level when a real effect exists. A common way of grading the evidence against the null hypothesis, using 10% as the loosest level, is:
• If p > 0.1, there is no real evidence against the null hypothesis.
• If 0.05 < p ≤ 0.1, there is weak evidence against the null hypothesis.
• If 0.01 < p ≤ 0.05, there is strong evidence against the null hypothesis.
• If p ≤ 0.01, there is very strong evidence against the null hypothesis.
The rejection rule for the null hypothesis is as follows:
• If p < α, reject the null hypothesis.
• If p ≥ α, do not reject the null hypothesis.
Rejection Region
The values of the test statistic for which the null hypothesis is rejected form the rejection region.
Non-Rejection Region
The set of all potential outcomes for which the null hypothesis is not rejected is
called the non-rejection region.
13. Explain the general steps for an F test.
If you’re running an F test, you should use Excel, SPSS, Minitab or some other kind of technology. Why? Calculating the F test by hand, including the variances, is tedious and time-consuming, and you’ll probably make some errors along the way.
If you’re running an F test using technology (for example, the F-Test Two-Sample for Variances tool in Excel), the only steps you really need to do are Steps 1 and 4 (dealing with the null hypothesis). The technology will calculate Steps 2 and 3 for you.
1. State the null hypothesis and the alternate hypothesis.
2. Calculate the F value. When comparing a restricted model with an unrestricted model, F = [(SSE1 − SSE2) / m] / [SSE2 / (n − k)], where SSE1 and SSE2 are the residual sums of squares of the restricted and unrestricted models, m = number of restrictions, n = number of observations and k = number of parameters in the unrestricted model (so n − k is its residual degrees of freedom).
3. Find the critical value for this test in the F-table, using the numerator and denominator degrees of freedom and the significance level. (In a one-way ANOVA with equal group sizes n, the F statistic itself is n × variance of the group means / mean of the within-group variances.)
4. Reject or fail to reject the null hypothesis.
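Step 2's formula can be written as a small Python function. The SSE values, n and k in the usage line are hypothetical numbers chosen only to illustrate the arithmetic.

```python
def nested_f(sse_restricted, sse_full, m, n, k):
    """F = [(SSE1 - SSE2) / m] / [SSE2 / (n - k)].

    sse_restricted: SSE1, residual sum of squares of the restricted model
    sse_full:       SSE2, residual sum of squares of the unrestricted model
    m: number of restrictions; n: observations;
    k: parameters in the unrestricted model (n - k = its residual df)
    """
    return ((sse_restricted - sse_full) / m) / (sse_full / (n - k))


# Hypothetical values: dropping 2 terms raises SSE from 80 to 120 with n = 30, k = 4
f_value = nested_f(120.0, 80.0, m=2, n=30, k=4)
print(f_value)
```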
14. Explain the ANOVA F-test.
The one-way ANOVA is an example of an F-test. ANOVA stands for analysis of variance. It is used to compare the variability of the group means with the variability of the observations within the groups. The F test statistic is used to conduct the ANOVA test. The hypotheses are as follows:
H0: The means of all groups are equal.
H1: At least one group mean is different from the others.
Test Statistic: F = explained variance / unexplained variance
Decision rule: If F > F critical value, reject the null hypothesis.
To determine the critical value of an ANOVA F-test, the degrees of freedom are df1 = K − 1 and df2 = N − K, where N is the overall sample size and K is the number of groups.
15. A research team wants to study the effects of a new drug on insomnia. 8 tests were
conducted with a variance of 600 initially. After 7 months 6 tests were conducted with a variance
of 400. At a significance level of 0.05 was there any improvement in the results after 7 months?
Solution: As the variances need to be compared, the F-test is used.
H0: s1² = s2²
H1: s1² > s2²
n1 = 8, n2 = 6
df1 = 8 − 1 = 7
df2 = 6 − 1 = 5
s1² = 600, s2² = 400
The F-test formula is given as follows:
F = s1² / s2² = 600 / 400
F = 1.5
Now, from the F-table, the critical value F(0.05, 7, 5) = 4.88.
As 1.5 < 4.88, thus, the null hypothesis cannot be rejected and there is not enough
evidence to conclude that there was an improvement in insomnia after using the new drug.
Answer: Fail to reject the null hypothesis.
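The arithmetic for this example in Python; the critical value 4.88 is taken from the F-table as in the solution above.

```python
s1_sq, n1 = 600, 8    # initial tests
s2_sq, n2 = 400, 6    # tests after 7 months
f_value = s1_sq / s2_sq                 # F = s1² / s2² = 1.5
df1, df2 = n1 - 1, n2 - 1               # 7, 5
f_critical = 4.88                       # F-table: alpha = 0.05, df = (7, 5)
decision = "reject H0" if f_value > f_critical else "fail to reject H0"
print(f"F = {f_value}, df = ({df1}, {df2}): {decision}")
```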
16. Three sets of five mice were randomly selected to be placed in a standard maze but
with different color doors. The response is the time required to complete the maze as seen
below. Perform the appropriate analysis to test if there is an effect due to door color (use α = 0.01).
Color    Time
Red      9    11   10   9    15
Green    20   21   23   17   30
Black    6    5    8    14   7
Step 0 : Check Assumptions
Step 1 : Hypotheses H0: µRed = µGreen = µBlack; Ha: at least one inequality
Step 2 : Significance Level α = 0.01
Step 3 : Critical Value and Rejection Region. Critical value: F(α; df1 = k − 1, df2 = N − k) = F(0.01; 2, 12) = 6.93. Reject the null hypothesis if F ≥ 6.93.
Step 4 : Construct the One-way ANOVA Table
Σ Ti²/ni = (9 + 11 + 10 + 9 + 15)²/5 + (20 + 21 + 23 + 17 + 30)²/5 + (6 + 5 + 8 + 14 + 7)²/5 = 54²/5 + 111²/5 + 40²/5 = 3367.4
T²/N = (9 + 11 + 10 + 9 + 15 + 20 + 21 + 23 + 17 + 30 + 6 + 5 + 8 + 14 + 7)²/15 = 205²/15 = 2801.6667
Σ Σ yij² = 9² + 11² + 10² + 9² + 15² + 20² + 21² + 23² + 17² + 30² + 6² + 5² + 8² + 14² + 7² = 3537
SSTr = Σ Ti²/ni − T²/N = 3367.4 − 2801.6667 = 565.7333
SSE = Σ Σ yij² − Σ Ti²/ni = 3537 − 3367.4 = 169.6

Source       df   SS         MS = SS/df   F-statistic   p-value
Treatments    2   565.7333   282.8667     20.0142       < 0.001
Error        12   169.6000    14.1333
Total        14   735.3333
Step 5 : Decision Since 20.0142 ≥ 6.93 (p-value ≤ 0.01), we shall reject the null hypothesis.
Step 6 : State conclusion in words At the α = 0.01 level of significance, there exists enough
evidence to conclude that there is an effect due to door color.
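The one-way ANOVA quantities in Step 4 can be reproduced in Python with the same shortcut formulas (Σ Ti²/ni, T²/N and Σ Σ yij²); the critical value 6.93 is copied from Step 3.

```python
groups = {
    "Red": [9, 11, 10, 9, 15],
    "Green": [20, 21, 23, 17, 30],
    "Black": [6, 5, 8, 14, 7],
}
all_y = [y for ys in groups.values() for y in ys]
N, k = len(all_y), len(groups)                                      # 15, 3

sum_ti2_ni = sum(sum(ys) ** 2 / len(ys) for ys in groups.values())  # Σ Ti²/ni = 3367.4
t2_n = sum(all_y) ** 2 / N                                          # T²/N = 2801.6667
sum_y2 = sum(y * y for y in all_y)                                  # Σ Σ yij² = 3537

ss_tr = sum_ti2_ni - t2_n        # treatments
ss_e = sum_y2 - sum_ti2_ni       # error
ms_tr = ss_tr / (k - 1)
ms_e = ss_e / (N - k)
f_stat = ms_tr / ms_e
f_critical = 6.93                # F(0.01; 2, 12) from Step 3
print(f"SSTr = {ss_tr:.4f}, SSE = {ss_e:.4f}, F = {f_stat:.4f}, reject H0: {f_stat >= f_critical}")
```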
17. Write the test procedure for the two-way ANOVA test.
Steps involved in two-way ANOVA are:
Step 1 : In two-way ANOVA we have two pairs of hypotheses, one for treatments and one
for the blocks.
Framing Hypotheses
Null Hypotheses
H01: There is no significant difference among the population means of different
groups (Treatments)
H02: There is no significant difference among the population means of different Blocks
Alternative Hypotheses
H11: At least one pair of treatment means differs significantly
H12: At least one pair of block means differs significantly
Step 2 : Data is presented in a rectangular table form as described in the previous section.
Step 3 : Level of significance α.
Step 4 : Test Statistic
F0t (treatments) = MST / MSE
F0b (block) = MSB / MSE
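To find the test statistic we have to find the following intermediate values (G = grand total of all N = mk observations, T_j = total for the j-th treatment, B_i = total for the i-th block):
i) Correction Factor: CF = G²/N
ii) Total Sum of Squares: TSS = ∑∑ y_ij² − CF
iii) Sum of Squares due to Treatments: SST = (∑ T_j²)/m − CF
iv) Sum of Squares due to Blocks: SSB = (∑ B_i²)/k − CF
v) Sum of Squares due to Error: SSE = TSS − SST − SSB
vi) Degrees of freedom: treatments k − 1, blocks m − 1, error (m − 1)(k − 1), total mk − 1
vii) Mean Sum of Squares: MST = SST/(k − 1), MSB = SSB/(m − 1), MSE = SSE/((m − 1)(k − 1))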
Step 5 : Calculation of the Test Statistic
ANOVA Table (two-way)
Step 6 : Critical values
Critical value for treatments = f(k-1,(m-1)(k-1)),α
Critical value for blocks = f(m-1, (m-1)(k-1)),α
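The critical values in Step 6 are normally read from an F table, or obtained from a library routine such as `scipy.stats.f.ppf`. As a dependency-free sketch (standard-library `math` only), the code below evaluates the F upper-tail probability through the regularized incomplete beta function, using the continued-fraction method described in Numerical Recipes, and inverts it by bisection. The helper names `f_sf` and `f_critical` are hypothetical; treat the output as a cross-check on table values rather than a replacement for a tested statistics library.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:   # converged
            break
    return h


def regularized_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges quickly,
    # otherwise use the symmetry I_x(a, b) = 1 - I_(1-x)(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_sf(f_value, df1, df2):
    """Upper-tail area P(F >= f_value) of the F(df1, df2) distribution (the p-value)."""
    return regularized_beta(df2 / 2.0, df1 / 2.0, df2 / (df1 * f_value + df2))


def f_critical(alpha, df1, df2):
    """Critical value f(df1, df2),alpha: the point with upper-tail area alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while f_sf(hi, df1, df2) > alpha:   # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):                # bisection: f_sf decreases as F increases
        mid = (lo + hi) / 2.0
        if f_sf(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(f_critical(0.01, 2, 12), 2))
print(round(f_critical(0.05, 3, 6), 4), round(f_critical(0.05, 2, 6), 4))
```

The first line should print 6.93, the critical value used in the one-way door colour example, and the second the 5% points of F(3, 6) and F(2, 6), about 4.7571 and 5.1433. Likewise, `f_sf(20.0142, 2, 12)` gives the p-value for the one-way example, which is well below 0.001.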
Step 7 : Decision
For Treatments: If the calculated F0t value is greater than the corresponding critical value, then we reject the null hypothesis and conclude that there is a significant difference among the treatment means, in at least one pair. Otherwise, the null hypothesis is not rejected.
For Blocks: If the calculated F0b value is greater than the corresponding critical value, then we reject the null hypothesis and conclude that there is a significant difference among the block means, in at least one pair. Otherwise, the null hypothesis is not rejected.
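The intermediate values and the two F-statistics of this procedure can be sketched in Python as follows. The layout is an assumption: `table[i][j]` holds the single observation for block i (one of m rows) under treatment j (one of k columns), matching the k treatments and m blocks used in the Step 6 critical values. The function name `two_way_anova` is hypothetical.

```python
def two_way_anova(table):
    """Two-way ANOVA with one observation per cell.

    table[i][j] is the observation for block i under treatment j.
    """
    m = len(table)        # number of blocks (rows)
    k = len(table[0])     # number of treatments (columns)
    n = m * k
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / n                                   # correction factor
    tss = sum(y * y for row in table for y in row) - cf         # total sum of squares
    treatment_totals = [sum(table[i][j] for i in range(m)) for j in range(k)]
    block_totals = [sum(row) for row in table]
    sst = sum(t * t for t in treatment_totals) / m - cf         # SS due to treatments
    ssb = sum(b * b for b in block_totals) / k - cf             # SS due to blocks
    sse = tss - sst - ssb                                       # SS due to error
    df_t, df_b, df_e = k - 1, m - 1, (m - 1) * (k - 1)
    mst, msb, mse = sst / df_t, ssb / df_b, sse / df_e          # mean sums of squares
    return {
        "TSS": tss, "SST": sst, "SSB": ssb, "SSE": sse,
        "df": (df_t, df_b, df_e),
        "F_treatments": mst / mse,   # F0t = MST / MSE
        "F_blocks": msb / mse,       # F0b = MSB / MSE
    }


# Illustrative numbers only (not data from this document): 2 blocks x 3 treatments.
example = two_way_anova([[1, 2, 4], [3, 4, 5]])
print(example["df"], round(example["F_treatments"], 4), round(example["F_blocks"], 4))
```

For this made-up table the sketch gives F0t = 19 on (2, 2) degrees of freedom and F0b = 25 on (1, 2) degrees of freedom; each value is then compared with its critical value as in Step 7. The sketch does not guard against SSE = 0, which would make both ratios undefined.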
18. A reputed marketing agency in India has three different training programs for its salesmen. The three programs are Method A, B and C. To assess the success of the programs, 4 salesmen from each of the programs were sent to the field. Their performances in terms of sales are given in the following table.
Test whether there is a significant difference among methods and among salesmen.
Solution:
Step 1 : Hypotheses
Null Hypotheses: H01 : μM1= μM2 = μM3 (for treatments)
That is, there is no significant difference among the three programs in their mean sales.
H02 : μS1 = μS2 = μS3 = μS4 (for blocks)
That is, there is no significant difference among the four salesmen in their mean sales.
Alternative Hypotheses:
H11 : At least one average is different from the others, among the three programs.
H12 : At least one average is different from the others, among the four salesmen.
Step 2 : Data
Step 3 : Level of significance α = 5%
Step 4 : Test Statistic
F0t = MST / MSE (for treatments) and F0b = MSB / MSE (for blocks)
Step 5 : Calculation of the Test Statistic
ANOVA Table (two-way)
Step 6 : Critical values
With k = 3 programs (treatments) and m = 4 salesmen (blocks), the degrees of freedom are k − 1 = 2 for treatments, m − 1 = 3 for blocks and (m − 1)(k − 1) = 6 for error.
f(2, 6),0.05 = 5.1433 (for treatments)
f(3, 6),0.05 = 4.7571 (for blocks)
Step 7 : Decision
(i) Treatments: the calculated F0t = 5.39 > f(2, 6),0.05 = 5.1433, so the null hypothesis H01 is rejected and we conclude that there is a significant difference in the mean sales among the three programs.
(ii) Blocks: the calculated F0b = 3.40 < f(3, 6),0.05 = 4.7571, so the null hypothesis H02 is not rejected and we conclude that there is no significant difference in the mean sales among the four salesmen.

