Midterm Fall 2019
MAE 301 Midterm Fall 2019
Instructions:
There are 100 points possible. I will provide points for correct ideas, even if calculations
are in error. If in doubt, provide rationale. Each of these problems is similar to those you
have done each week for homework, and they are in kind with those included on the
midterm practice test. You have the knowledge, just relax and show what you know.
1. Define the following terms and state why they are important for testing hypotheses
(15 pts, 5 points each (2 for correct definition, 3 for stating importance correctly))
a. Central Limit Theorem
The CLT is the most important idea in all of statistics. It states that, if one repeatedly
samples, with replacement, many samples of the same size from the same population and
computes the mean of each: 1) the distribution of the sample means will approach a Normal
distribution as the sample size increases, no matter what the shape of the parent
population; 2) the mean of the sampling distribution will approach the population mean,
and 3) The standard deviation of the sampling distribution, the Standard Error of the Mean,
will get smaller as a function of $1/\sqrt{n}$.
It is important in that we can know with certainty the shape of a sampling distribution if we
know its parameters. Using sample statistics to estimate parameters allows us to compare
statistics to the known probabilities associated with the Normal distribution. THIS allows
us to estimate the likelihood that a sample could have come from a population due to
random chance. Moreover, if our sample size is large, we are almost assured of more
precise estimates due to the relationship of the standard error of the mean with sample
size.
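The three claims above can be checked with a small simulation. This is only a sketch: the
exponential parent population, the seed, and the helper name `sample_means` are assumptions
chosen for illustration, not part of the exam.

```python
import random
import statistics


def sample_means(population, n, num_samples, rng):
    """Draw num_samples samples of size n (with replacement) and return their means."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(num_samples)]


rng = random.Random(301)  # fixed seed so the run is repeatable

# Hypothetical, strongly right-skewed parent population (exponential).
population = [rng.expovariate(1.0) for _ in range(20_000)]
sigma = statistics.pstdev(population)

results = {}
for n in (4, 16, 64):
    means = sample_means(population, n, 3_000, rng)
    observed_se = statistics.stdev(means)   # spread of the sample means
    predicted_se = sigma / n ** 0.5         # sigma / sqrt(n)
    results[n] = (observed_se, predicted_se)
    print(f"n={n:3d}  observed SE={observed_se:.3f}  sigma/sqrt(n)={predicted_se:.3f}")
```

The observed standard deviation of the sample means should track sigma/sqrt(n) closely and
shrink as n grows, even though the parent population is far from Normal.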
b. Power
Power is the probability that we will correctly reject a null hypothesis that is false. More
than just making a correct decision, this is the probability that, given the Null IS false,
we make the correct decision in our hypothesis test. In other words, it is 1 minus the
probability of making a Type II error.
It is important in that correctly rejecting the Null Hypothesis lends support to the
Alternative Hypothesis—our real reason for conducting the study. Power varies directly
with sample size—the larger our sample, the greater the power we have to reject the null
hypothesis. But since sample size is costly, we generally want to find an optimal power for
a given Type I error rate when we plan our sampling procedure.
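To make the link between power and sample size concrete, here is a minimal sketch for a
one-tailed, one-sample Z-test. The function name `z_test_power`, the effect size of 0.5 and
the sigma of 1.0 are assumptions for illustration only.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of an upper-tailed one-sample Z-test when the true mean shift is `effect`."""
    z = NormalDist()                      # standard Normal
    z_crit = z.inv_cdf(1 - alpha)         # rejection threshold under the null
    # Under the alternative, the test statistic is shifted by effect * sqrt(n) / sigma.
    return 1 - z.cdf(z_crit - effect * n ** 0.5 / sigma)


for n in (5, 15, 30, 60):
    print(f"n={n:2d}  power={z_test_power(effect=0.5, sigma=1.0, n=n):.3f}")
```

With no true effect the function returns alpha itself, and for a fixed effect the power rises
with n, which is the trade-off against sampling cost described above.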
c. Conditional Probability
Conditional Probability is the probability that a certain outcome will happen, given another
outcome has already occurred.
Symbolized p(A|B), conditional probability is used whenever we test hypotheses. If two
outcomes, A and B are independent—i.e., unrelated, then p(A|B) = p(A). The null
hypothesis is a statement of a conditional probability, so if p(A|B) is significantly different
from p(A), then we can reject the null hypothesis.
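A small sketch of the independence check p(A|B) = p(A), using a fair six-sided die. The die
and the event names are assumptions chosen only to illustrate the definition.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # faces of a fair six-sided die (hypothetical example)


def prob(event):
    """Probability of an event as a fraction of equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))


def cond_prob(a, b):
    """p(A|B) = p(A and B) / p(B)."""
    return prob(a & b) / prob(b)


even = {2, 4, 6}
at_most_4 = {1, 2, 3, 4}
at_least_4 = {4, 5, 6}

print(prob(even), cond_prob(even, at_most_4))   # equal: the events are independent
print(prob(even), cond_prob(even, at_least_4))  # different: the events are related
```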
2. The measured probability of failure of an RFID chip to recognize radio signal
transmitted at 3 inches away is 0.03. A manufacturing facility randomly tests 20
RFID chips. If they found that 2 or more of the 20 failed the test, should they be
concerned? (10 pts).
This is a binomial probability. We are looking to find the probability that 2 or more failed,
so we have to calculate a number of probabilities and add them together to get the
cumulative probability:
$$P(X = x \mid p, n) = \binom{n}{x} p^{x} (1 - p)^{n - x}$$
= P(2) + P(3) + … + P(20)
Or (much easier)
= 1 – [P(0) + P(1)]
= 1 – [0.5438 + 0.3364]
= 1 – [0.8802]
= 0.1198
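The arithmetic above can be reproduced in a few lines. This is a sketch using only the
numbers given in the problem (n = 20, p = 0.03); the helper name `binom_pmf` is mine.

```python
from math import comb


def binom_pmf(x, n, p):
    """Binomial probability of exactly x failures in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 20, 0.03
p0 = binom_pmf(0, n, p)
p1 = binom_pmf(1, n, p)
p_two_or_more = 1 - (p0 + p1)                                   # complement rule
p_direct = sum(binom_pmf(x, n, p) for x in range(2, n + 1))     # long way, as a check

print(f"P(0)={p0:.4f}  P(1)={p1:.4f}  P(X>=2)={p_two_or_more:.4f}")
```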
3. A researcher is studying the performance of her newly designed suspension system
for off-road vehicles in high speed turns. She tests a random sample of 15 vehicles,
equipped with the new suspension under controlled conditions. She gets a mean
turning radius of 6 meters, with a standard deviation of +/- 1.2 meters. She wants to
compare her sample to published reports of typical performance for off-road
vehicles. The published results show a mean turning radius of 7.5 meters +/- 1 m.
a. Perform the appropriate test to answer her question. Use the confidence
interval approach, and interpret your results. Assume alpha is .05 and the
critical value of the test statistic (2-tailed) is 1.96 (10 pts).
This has only one sample, and she has a published mean and standard deviation, so this is a
1-sample Z-test:
$$95\%\,CI = \bar{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
$$95\%\,CI = 6 \pm 1.96 \cdot \frac{1}{\sqrt{15}}$$
$$95\%\,CI = 6 \pm 0.5061$$
$$5.4939 \le \mu \le 6.5061$$
Because the published population mean, 𝜇 = 7.5 m, is not in the interval, we can reject the null hypothesis and
say the new suspension system made a difference, and has a shorter turning radius, on
average, than the published system.
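A short sketch of the same confidence-interval calculation, using the values from the
problem (sample mean 6, published sigma 1, n = 15, Z = 1.96). The variable names are mine.

```python
from math import sqrt

x_bar = 6.0          # sample mean turning radius (m)
sigma = 1.0          # published population standard deviation (m)
n = 15               # sample size
z_crit = 1.96        # two-tailed critical value for alpha = .05
mu_published = 7.5   # published mean turning radius (m)

margin = z_crit * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

# Reject the null when the published mean falls outside the interval.
reject_null = not (lower <= mu_published <= upper)
print(f"95% CI: {lower:.4f} to {upper:.4f}  reject null: {reject_null}")
```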
4. Your capstone team wants to compare the quality of two different detent pins with
slightly different designs. You randomly sample a total of 40 pins, 20 from Type A
and 20 from Type B and count the number of pins that fail versus those that hold up
under testing conditions. Your data are summarized in the following table:
Pin quality    Company A    Company B    Total
Fail                   7           10       17
Succeed               13           10       23
Total                 20           20       40
a. Perform the appropriate test and determine the appropriate conclusion. The
critical value of the test statistic is 6.635, for alpha = .01 (20 pts).
This is a Chi-square test of independence. I have two dichotomous variables: Company is
the independent variable, and Success is the dependent variable. The Null Hypothesis is
that Success/Failure rate is independent of Company.
First, I need to calculate my expected frequencies:
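$$\frac{17}{40} \cdot 20 = 8.5; \quad \frac{17}{40} \cdot 20 = 8.5; \quad \frac{23}{40} \cdot 20 = 11.5; \quad \frac{23}{40} \cdot 20 = 11.5$$
$$\chi^2 = \sum_{i=1}^{row} \sum_{j=1}^{col} \frac{(f_{observed} - f_{expected})^2}{f_{expected}}$$
$$\chi^2 = 0.9207$$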
Since 0.9207 is far less than the critical value of 6.635, we fail to reject the Null Hypothesis
that success/failure rate is independent of company.
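The expected frequencies and the test statistic can be reproduced from the table. A minimal
sketch in plain Python; the variable names are mine, and the critical value 6.635 is the one
given in the problem.

```python
# Observed counts: rows are Fail / Succeed, columns are Company A / Company B.
observed = [[7, 10],
            [13, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical_value = 6.635  # given in the problem for alpha = .01
print(f"chi-square={chi_sq:.4f}  df={df}  reject null: {chi_sq > critical_value}")
```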
5. Your professor is testing the performance of his automatic tennis-ball throwing
machine for his dog. His hypothesis is that using a torsion spring will throw the
ball further than a coil spring, keeping all other design elements constant. He has
enough material for two samples of 15 machines each. Describe how he should
go about studying this problem.
a. State the null and alternative hypotheses (10 pts)
This is a one-tailed test (he wants the torsion spring to throw farther than the coil spring)
$$H_0: \mu_{torsion} \le \mu_{coil}$$
$$H_1: \mu_{torsion} > \mu_{coil}$$
b. Show how the data should be analyzed (10 points)
Since we have two samples, and we are looking at distance, a continuous random variable,
the 2-sample t-test is the most appropriate. First, we would have to sample, randomly, from
the population of coil springs and then from the population of torsion springs. Once these
have been sampled, he should install them in randomly selected ball throwers, OR install
one spring, test it, and then install another spring and test it if he only has one thrower. At
any rate, we have two samples of springs as our experimental conditions.
After the data have been collected he needs to calculate the sample statistics for both
samples and run a 2-sample t-test. The Type I error rate need not be especially stringent
because the consequences of a false positive are not too egregious. I would choose 0.05.
$$t_{(\alpha,\, df = n_1 + n_2 - 2)} = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
If tcalculated is greater than tcritical (0.05, df=28), he should reject the null hypothesis in favor of the
alternative and have a beer because his torsion spring mechanism outperformed the coil
spring.
If, however, tcalculated is less than tcritical (0.05, df=28), he should have a beer to console himself
that his test did not reveal any significant improvement by using the torsion spring.
c. Suppose he only had 10 pieces per sample but got the same means and
smaller than the original analysis, the denominator of the test will be larger, thus reducing
the overall scaled difference in means. Size matters!
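A sketch of the pooled 2-sample t statistic, showing how the same means and standard
deviations give a smaller t with 10 per sample than with 15. The means (12.0 and 11.0) and
standard deviations (1.5) are hypothetical, since the exam gives no data; the one-tailed
critical values are standard t-table entries for alpha = 0.05.

```python
from math import sqrt


def pooled_t(mean1, mean2, s1, s2, n1, n2):
    """Two-sample t statistic using the pooled standard deviation."""
    s_p = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean1 - mean2) / (s_p * sqrt(1 / n1 + 1 / n2))


# Hypothetical summary statistics (throwing distance): torsion vs. coil.
t_15 = pooled_t(12.0, 11.0, 1.5, 1.5, 15, 15)   # df = 28
t_10 = pooled_t(12.0, 11.0, 1.5, 1.5, 10, 10)   # df = 18

T_CRIT_DF28 = 1.701  # one-tailed, alpha = 0.05, from a t table
T_CRIT_DF18 = 1.734  # one-tailed, alpha = 0.05, from a t table

print(f"n=15: t={t_15:.3f}  reject null: {t_15 > T_CRIT_DF28}")
print(f"n=10: t={t_10:.3f}  reject null: {t_10 > T_CRIT_DF18}")
```

With these made-up numbers the difference is significant at 15 per sample but not at 10,
because the smaller n inflates the denominator and also raises the critical value.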
6. Suppose you are studying the relationship between cross sectional area of a beam
(in mm^2), and its bending moment (in Mpa). The scatterplot below shows data
beams tested. (15 pts):
Discuss the method you would go through to determine the best mathematical
relationship between area and bending moment for this sample.
This relationship looks quadratic, and I remember back from beam theory, that it is in fact
quadratic, so I need to transform my data. If I didn’t already know it was quadratic, the
shape of the scatterplot makes me worry a bit: it appears both curved, and heteroscedastic.
In that case, I would run a linear regression, and examine my residuals for violations of the
assumptions of normality, and random residuals.
The Null hypothesis I am testing is that β1 is equal to zero (that there is no one-to-one
relationship between area and bending moment). The alternative is that there IS a
one-to-one relationship that is linear in its parameters. This is a pretty standard analysis,
and I don’t really have any context behind it (thanks, Middleton…), so I will just use a
default Type I error rate of 0.05. I don’t need to be conservative because the consequences
of making a Type I error are not severe.
So now I will transform the data using y=sqrt(y). This should linearize it. I will then do a
scatterplot of the transformed data to make sure.
Next, if the scatterplot looks reasonably linear, I will perform Least Squares Regression on
the transformed data:
$$\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
This gives me my parameters for a line of best fit:
$$\hat{y}_i = \beta_0 + \beta_1 x_i$$
Once I have this, I can compute the residuals and perform a residual analysis:
I first plot the residuals against the independent variable, cross-sectional area. If there is a
distinct pattern, I assume my model is incorrect. If there is no distinct pattern, I can assume
that I have little heteroscedasticity. Then I run a Q-Q plot to see if the residuals are normally
distributed. This is a plot of the quantiles predicted if the data fit a normal distribution
against the actual quantile values. The points should all lie on a straight line. If they show
severe displacement from the line, I would assume the residuals are not normally distributed
and attempt to perform a transformation to make a better model, or to linearize the data.
Now I can compute $R^2$, to assess my goodness of fit.
$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{SS_{regression}}{SS_{total}}$$
I want the proportion of variation the analysis explains to be high, maybe 0.80 or higher.
Then I perform a t-test of the slope to determine if the relationship modeled is significantly
different from that expected due to random chance.
$$t_{(n-2,\,\alpha)} = \frac{\beta_1}{SE_{\beta_1}} = \frac{\beta_1}{\sqrt{\frac{1}{n-2} \cdot \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}}$$
If t is greater than the critical value of t at $t_{(n-2,\,\alpha)}$, then I reject the null hypothesis that
$\beta_1$ is equal to zero.
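The workflow above (square-root transform, least squares fit, R squared, t-test of the slope)
can be sketched as follows. The area and bending-moment values are made up for
illustration, since the scatterplot data are not reproduced here, and the function name
`fit_line` is mine.

```python
from math import sqrt


def fit_line(x, y):
    """Least squares fit of y on x; returns intercept, slope, R^2 and t for the slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    r_squared = ss_regression / ss_total
    se_b1 = sqrt(ss_error / (n - 2) / s_xx)   # standard error of the slope
    return b0, b1, r_squared, b1 / se_b1


# Hypothetical data: roughly quadratic growth of bending moment with area.
area = [100, 150, 200, 250, 300, 350, 400, 450]
moment = [38, 70, 125, 178, 260, 335, 450, 545]

# Linearize with the square-root transform, then fit a straight line.
sqrt_moment = [sqrt(m) for m in moment]
b0, b1, r_squared, t_slope = fit_line(area, sqrt_moment)

T_CRIT_DF6 = 2.447  # two-tailed, alpha = 0.05, df = n - 2 = 6, from a t table
print(f"b0={b0:.3f}  b1={b1:.4f}  R^2={r_squared:.4f}  t={t_slope:.1f}")
print("reject null (slope = 0):", t_slope > T_CRIT_DF6)
```

A residual plot and a Q-Q plot of `sqrt_moment` minus the fitted values would still be needed
before trusting the fit; the sketch covers only the calculations.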
EXTRA CREDIT (5 points)
What are the assumptions of the 2-sample t-test?
Formulas for Midterm
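Sample Statistics
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Effect Size
$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}$$

Probability
$$P(\sim x) = 1 - P(x)$$
$$P(A \cup B) = P(A) + P(B)$$
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

F-test of Equal Variances
$$F_{(n_1 - 1,\, n_2 - 1)} = \frac{s_1^2}{s_2^2}$$

Normal Distribution
$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
$$p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$$
$$z_i = \frac{x_i - \mu}{\sigma} \text{ for a population, and } z_i = \frac{x_i - \bar{x}}{s} \text{ for a sample.}$$
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
$$SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
Two-tailed: $CI = \bar{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
One-tailed (right): $CI \le \bar{x} + Z_{\alpha} \frac{\sigma}{\sqrt{n}}$
One-tailed (left): $CI \ge \bar{x} - Z_{\alpha} \frac{\sigma}{\sqrt{n}}$

Binomial Distribution
$$\mu = np$$
$$\sigma = \sqrt{np(1 - p)}$$
$$P(X = x \mid p, n) = \binom{n}{x} p^{x} (1 - p)^{n - x}$$
$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$

Hypergeometric Distribution
$$\mu = np$$
$$\sigma = \sqrt{\frac{np(1 - p)(N - n)}{N - 1}}$$
$$p(X = x \mid n, N, S) = \frac{\binom{S}{x} \binom{N - S}{n - x}}{\binom{N}{n}}$$

Poisson Distribution
$$\mu = np$$
$$\sigma_x = \sqrt{\mu}$$
$$P(x \mid \mu) = \frac{e^{-\mu} \mu^{x}}{x!}$$

X2 Test of Independence
$$\chi^2 = \sum_{i=1}^{row} \sum_{j=1}^{col} \frac{(f_{observed} - f_{expected})^2}{f_{expected}}$$
$$df = (row - 1)(col - 1)$$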
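t Distribution

One-Sample t-test
$$t_{(\alpha,\, df = n - 1)} = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
$$(1 - \alpha)\%\,CI = \bar{x} \pm t_{(\alpha,\, df = n - 1)} \frac{s}{\sqrt{n}}$$

2-Sample t-test
$$t_{(\alpha,\, df = n_1 + n_2 - 2)} = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$
$$s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$$
$$SE_{\bar{x}} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$$(1 - \alpha)\%\,CI = \bar{x}_1 - \bar{x}_2 \pm t_{(\alpha,\, df = n_1 + n_2 - 2)}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Paired-sample t-test
$$t_{(\alpha,\, df = n_d - 1)} = \frac{\bar{d}}{s_d / \sqrt{n_d}}$$

Simple Linear Regression
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
$$\hat{y}_i = \beta_0 + \beta_1 x_i$$
$$\varepsilon_i = y_i - \hat{y}_i$$
$$SS_{\varepsilon} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$\bar{y} = \beta_0 + \beta_1 \bar{x}$$
$$\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
$$SS_{total} = SS_{regression} + SS_{\varepsilon}$$
$$SS_{total} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
$$SS_{regression} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$t_{(n-2,\,\alpha)} = \frac{\beta_1}{SE_{\beta_1}} = \frac{\beta_1}{\sqrt{\frac{1}{n-2} \cdot \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}}$$
$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{SS_{regression}}{SS_{total}}$$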