Practice Final
11-12. A machining process produces screws for which the population proportion of screws with
defective threads is 0.10. A new technology-enhanced process has become available and the developer
claims that the population proportion of screws with defective threads will be less than 0.10. To test the
validity of this claim, 900 screws produced by the new process are selected at random and the sample
proportion of defective screws is computed. This result will be used to decide whether or not the
manufacturer should invest in the new process.
11. Which of the following should be used as the null and alternative hypotheses?
12. If a type II error is made in carrying out this test, what are the consequences to the manufacturer?
(a) They will purchase the new process when it is no better than the current process being used
(b) They will not purchase the new process when it is no better than the current process being used
(c) They will purchase the new process when it is an improvement over the process currently being
used
(d) They will not purchase the new process when it is an improvement over the process currently
being used
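The test in questions 11-12 can be sketched numerically. The snippet below is a minimal illustration, assuming the hypotheses are H0: p ≥ 0.10 against Ha: p < 0.10 (the direction of the developer's claim) and using a made-up count of 72 defective screws; neither the count nor the function name comes from the exam.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_proportion_test(defective, n, p0):
    """One-sided z-test of H0: p >= p0 against Ha: p < p0."""
    p_hat = defective / n
    se = sqrt(p0 * (1 - p0) / n)   # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = NormalDist().cdf(z)  # lower-tail area under the standard normal
    return p_hat, se, z, p_value


# 72 defective screws is a hypothetical count, not a figure from the exam.
p_hat, se, z, p_value = lower_tail_proportion_test(defective=72, n=900, p0=0.10)
print(f"p_hat={p_hat:.3f}, se={se:.4f}, z={z:.2f}, p-value={p_value:.4f}")
```

With n = 900 the standard error under H0 is 0.01, so a sample proportion must fall well below 0.10 before H0 is rejected. A Type II error corresponds to failing to reject H0 even though the new process really has a lower defect rate.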
13-14. A chemical plant is allowed to dispose of a certain amount of waste water into a river. The EPA takes
samples of the water in the river at several locations downstream from the plant. If the plant is in
compliance with the EPA regulations, the population mean amount of toxic particles in the water is µ=50
parts per million (ppm). Each day, the EPA selects 36 water samples and computes the average amount of
toxic particles, and then uses this information to determine if they should conclude that the population
mean amount of toxic particles is greater than 50 ppm. If so, the owner of the plant will be fined. Use the
hypotheses 𝐻0 : µ ≤ 50 ppm and 𝐻𝑎 : µ > 50 ppm.
13. The manager of the plant has been asked by the EPA which value for α would be acceptable for the
test described above: α=0.01, 0.025, or 0.05. The manager tells you that he would like to keep the chance of
being fined at a minimum. What would your advice to the manager be?
(a) Use α=0.01 (b) Use α=0.05
(c) The probability of being fined is not affected by the choice of α.
(d) There is no chance of being fined as long as the plant is in compliance with the EPA regulations.
14. The EPA informs the manager that they will use α=0.025. Given this decision, the manager asks the
EPA to increase the number of water samples selected each day to 64. What effect will this increase in
sample size have?
(a) The probability of a Type I error will move closer to a value of 0.01
(b) The probability of a Type II error will move closer to a value of 0.05
(c) The probability of a Type I error will be unchanged
(d) The probability that H0 is true will increase.
15. A manufacturer packages cement into bags labeled to weigh 50 kilograms. Each hour at the packaging
plant, a laborer pulls a bag from those just filled and weighs the bag on a scale to determine the net
weight. The following figure summarizes the measured weights for 200 bags.
[Figure: measured weights (kg) of the 200 bags; reported mean 51.0]
Based on the information shown in the figure, the production manager should conclude that the process
that fills the bags is
16-18. A movie studio is developing a film. The film will be either an action film or a mystery-suspense
thriller. To help come to a decision, the studio gathered data reporting the worldwide sales (in millions of
US dollars) for recent films in the two categories. Analyses of these data follow.
Means and Std Deviations
Film Type n Mean Std Dev SE Mean Lower 95% Upper 95%
t Test
16. If we accept that these data are simple random samples from the populations of action and
mystery/suspense films, and take the significance level to be 0.05, then the studio should conclude that
17. The p-value shown in the output indicates that, if we accept the assumptions of the shown statistical
analysis, there is a
(a) 5% chance for a Type I error when using the associated interval.
(b) 4% chance for a Type I error if we reject H0 for this t-statistic.
(c) 96% chance for a Type II error when using this t-statistic.
(d) 4% chance that the means of the two populations are equal.
18. Had the US dollar amounts been converted to HK dollars (using an exchange rate of 7.8 HK dollars
per US dollar), then the 95% confidence interval for the mean difference in earnings (a – m) would have
been
19-20. At an assembly plant which produces computer chips, each worker is responsible for the entire
assembly of a chip. The production manager would like to compare the workers' productivity (the number
of chips produced per month) in two different work environments, A and B. A group of 50 experienced
workers is selected at random and placed in environment A for a month and the sample mean number of
chips produced is determined. The same 50 workers are then placed in environment B for a month and
again the sample mean number of chips produced is determined. The manager would then like to use
these results to determine if there is statistically significant evidence that the population mean number of
chips is different for these two working environments. The results are given as follows: ̅̅̅ 𝑥𝐴 = 150.2 chips,
𝑥𝐵 = 151.3 chips, 𝑠𝑑 =7.4 chips.
̅̅̅
19. Determine a 95% confidence interval for the difference in the population mean number of chips
(µ𝐴 − µ𝐵 ) produced in each environment.
(a) [-3.2, 1.0] (b) [-4.2, 2.0] (c) [-2.15, 0.05] (d) [-1.4, -0.8]
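Because each of the 50 workers is measured in both environments, the interval in question 19 is a paired-difference interval: (x̄A − x̄B) ± t · s_d / √n. The sketch below works through that arithmetic; the critical value 2.0096 is an assumed t-table value for 49 degrees of freedom, hard-coded to keep the example free of external libraries.

```python
from math import sqrt

n = 50
mean_a, mean_b = 150.2, 151.3   # sample means from the problem statement
s_d = 7.4                       # standard deviation of the paired differences
t_crit = 2.0096                 # assumed table value: t(0.975) with n - 1 = 49 df

d_bar = mean_a - mean_b         # point estimate of mu_A - mu_B
se = s_d / sqrt(n)              # standard error of the mean difference
margin = t_crit * se
lower, upper = d_bar - margin, d_bar + margin

print(f"difference = {d_bar:.1f}, margin = {margin:.2f}")
print(f"95% CI: [{lower:.1f}, {upper:.1f}]")
```

The interval contains zero, so at the 5% level the data do not show a statistically significant difference between the two environments.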
20. Suppose that a 98% confidence interval for (µ𝐴 − µ𝐵 ) is [-3.5, 1.3]. Which of the following is an
appropriate interpretation of this result?
(a) Based on this result, we should conclude that µ𝐴 = µ𝐵 .
(b) There is statistically significant evidence that the two population means are different.
(c) There is only 2% chance that the difference in the two population means is outside of this
interval.
(d) We have not observed a statistically significant difference in the two sample means.
Answers:
11-15: a, d, a, c, c
16-20: b, b, a, a, d
Part II: Regression
Ans: b
observations. Then
A. RMSE is around 0.
B. RMSE is around 40
Answer: C
3-4. An insurance agent has selected a sample of drivers that she insures whose ages are in the range from 16
to 42 years. For each driver, she records the age of the driver and the dollar amount of claims that the
driver filed in the previous 12 months. A scatterplot showing the dollar amount of claims as the response
variable and the age as the predictor shows a linear trend. The least squares regression line is determined
to be: 𝑦̂ = 3715 − 75.4𝑥. A plot of the residuals versus age of the drivers showed no pattern, and the
following were reported: 𝑟² = 0.822, standard deviation of the residuals 𝑆𝑒 = 312.1.
Ans: (a)
4. Which of the following is not correct?
(a) 82.2% of the variation in the dollar amounts of claims is explained by the age of the driver.
(b) The correlation 𝑟 between the response variable and the predictor is 0.907
(c) If the histogram of the residuals is symmetric around zero and bell-shaped, then about 68% of the
dollar amounts of claims are within 312.1 dollars of the regression line.
(d) A driver in the data set whose age is 25 years had a residual of -$150 using the fitted line above;
this means his dollar amount of claims is $1680.
Ans: (b)
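Two of the statements in question 4 can be checked with a few lines of arithmetic. The sketch below assumes only the values quoted in the problem: the correlation takes the sign of the fitted slope, and an observed value equals the fitted value plus the residual.

```python
from math import copysign, sqrt

b0, b1 = 3715.0, -75.4   # fitted intercept and slope from the problem
r_squared = 0.822

# In simple regression, r has the same sign as the slope.
r = copysign(sqrt(r_squared), b1)

# Statement (d): observed value = fitted value + residual.
fitted_25 = b0 + b1 * 25
observed_25 = fitted_25 + (-150)

print(f"r = {r:.3f}")
print(f"fitted at age 25 = {fitted_25:.0f}, observed = {observed_25:.0f}")
```

The slope is negative, so r is about −0.907 rather than 0.907, which is why statement (b) is the incorrect one, while the claim amount in statement (d) works out to $1680 as stated.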
5-7. Suppose that in the population the annual salary (Salary) of a CEO i measured in million dollars is
related to the annual sales of the company (Sales) measured in million dollars according to the following
regression model:
5. What is the standard deviation of CEO salaries in million dollars for CEOs of firms with annual
6. What is the expected difference in million dollars between the salary of the CEO of a firm with five
million dollars in annual sales and the CEO of a firm with annual sales of eight million dollars?
7. What is the probability that the salary of a CEO is greater than 7 million dollars if sales are 10 million
dollars?
Answer (d)
a) A linear model is okay because the association between the two variables is fairly strong.
Ans: d
9-10. Use the following to answer questions 9 and 10. Suppose you run a linear least squares
regression of Y on X. The estimated regression line is 𝑌̂ = 3 + 2𝑋. The t-statistic for testing the null
hypothesis that 𝛽1 = 1 is 3.
9. You get an additional data point with X = 2 and Y = 7 and run the regression again including the new
data point. What happens to the estimated slope coefficient?
A. It increases.
B. It decreases.
Answer (c)
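The reasoning behind question 9 is that the new point (X = 2, Y = 7) lies exactly on the fitted line, since 3 + 2(2) = 7, so refitting does not move the line. The sketch below illustrates this with a small made-up data set chosen so that its least squares line is 3 + 2X; the data and the helper `fit_line` are hypothetical, not from the exam.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data whose least squares line is Y = 3 + 2X.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [4.0, 3.0, 8.0, 9.0, 11.0]

before = fit_line(xs, ys)
after = fit_line(xs + [2.0], ys + [7.0])  # add a point that sits on the line

print("before:", before)
print("after: ", after)
```

A point with zero residual contributes nothing to the least squares normal equations for the old line, so the same intercept and slope remain optimal.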
10. What happens to the standard error of regression in the new regression run using the new data point
relative to the standard error of regression in the original regression?
A. It increases.
B. It decreases.
C. It remains the same.
Answer (c)
B. the data suggests that the predictor x is not helpful in predicting the response y.
Answer: D
Linear Fit
Summary of Fit
RSquare 0.091185
Root Mean Square Error 39.41486
Mean of Response 187.2359
Parameter Estimates
Term Estimate Std Error t Ratio Prob>|t|
12. According to the fitted model, the average price per square foot for a house of 2500 square feet is
approximately
A. $160
B. $180
C. $200
D. $220
Answer: B
13. If a realtor wants to give a potential buyer a rough price range for a house of 2500 square feet,
then it is
Answer: A
14. The information given in the parameter estimate table about the slope implies that
B. Fixed costs per square foot are significantly different from zero.
Answer: A
15. Suppose the price per square foot of similar houses in Hong Kong is around HK$1500 per square foot
(1$ = 7.8HK$). Then
Answer: B
16. The following results were obtained from a simple regression analysis:
r² = 0.6744
s² = 0.2934
For each unit change in the independent variable x, the estimated change in the mean value of the
dependent variable y is equal to
(a) A predicted value of the response variable computed from a value of the explanatory variable that
is outside the range of values used to determine the regression equation
(b) A predicted value of the response variable that has a very small residual
(c) An observation in the data that stands apart from the rest of the data
(d) An outlier that is near the minimum or maximum value of the explanatory variable
Ans: (d)
Answer: C
20-23. The following is the regression output of the sales prices of 100 homes on the square footage and number of
bedrooms. The price is measured in 1000's of dollars. Square footage is measured in 1000's of square feet.
Answer (a)
Answer (c)
22. Test if the beta coefficient of the variable “Number of Bedrooms” is negative by
significance level.
Answer (d)
23. Compute the 95% confidence interval for the beta coefficient of the variable “square footage”.
(a) [3.669, 5.023] (b) [3.722, 5.902] (c) [3.555, 6.089] (d) [3.555, 5.023]
Answer (b)
24 – 26. The scatter plot of sales (y) of half-gallon orange juice versus the price (x) is given below. We
apply a log transform to both y and x to fit the nonlinear pattern.
Summary of Fit
RSquare 0.755335
RSquare Adj 0.750238
Root Mean Square Error 0.385788
Parameter Estimates
Term Estimate Std Error t Ratio Prob>|t|
26. The optimal price equals (C × elasticity) /(1+ elasticity), where C equals the cost. Suppose the cost of
a half-gallon juice is $1.5. Then the 95% confidence interval for optimal price is
Answer: A
27. The heights (y) of 50 men and their shoe sizes (x) were obtained. The variable height is measured in
centimetres (cm) and the shoe sizes of these 50 men ranged from 8 to 11. From these 50 pairs of
observations, the least squares regression line predicting height from shoe size was computed to be 𝑦̂ =
130.455 + 4.7498𝑥. What height would you predict for a man with a shoe size of 13?
(a) 130.46cm (b) 192.20cm (c) 182.70cm (d) I would not use this regression line to predict the height of
a man with a shoe size of 13.
Answer (d).
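The point of question 27 is that the line can be evaluated at a shoe size of 13, but the result is an extrapolation. The sketch below takes the intercept as 130.455 (the reading consistent with answer choice (b)) and flags predictions outside the observed range of 8 to 11; the function name and its range arguments are illustrative assumptions.

```python
def predict_height(shoe_size, x_min=8.0, x_max=11.0):
    """Return (predicted height in cm, whether shoe_size is inside the data range)."""
    b0, b1 = 130.455, 4.7498           # intercept assumed to be 130.455
    within_range = x_min <= shoe_size <= x_max
    return b0 + b1 * shoe_size, within_range


height, within_range = predict_height(13)
print(f"predicted height = {height:.2f} cm, within observed range: {within_range}")
```

Plugging in 13 gives about 192.20 cm, which is choice (b), but the size lies outside the range used to fit the line, so the prediction should not be trusted.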
28-31. A large national bank charges local companies for using its services. A bank official reported
the results of a regression analysis designed to predict the bank's charges (Y) -- measured in dollars per
month -- for services rendered to local companies. One explanatory variable used to predict service
charge to a company is the company's sales revenue (X) -- measured in millions of dollars. Data for 21
companies who use the bank's services were used to fit the model. The results of the simple linear
regression are provided below.
𝑦̂ = −2,700 + 20𝑥, RMSE = 65, p-value for testing 𝛽1 = 0 is 0.034.
c) About 95% of the observed service charges fall within $2,700 of the least squares line.
d) For every $1 million increase in sales revenue, we expect a service charge to decrease $2,700.
Ans: b
29. Interpret the estimate of 𝜎є , the standard deviation of the error term in the model.
a) About 95% of the observed service charges fall within $65 of the least squares line.
b) About 95% of the observed service charges equal their corresponding predicted values.
c) About 95% of the observed service charges fall within $130 of the least squares line.
d) For every $1 million increase in sales revenue, we expect a service charge to increase $65.
Ans: c
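The interpretation in question 29 rests on the rule of thumb that about 95% of observations fall within two standard deviations of the regression line. The sketch below checks that figure under the assumption that the errors are normally distributed with standard deviation equal to the reported RMSE of 65.

```python
from statistics import NormalDist

rmse = 65.0                  # estimate of the error standard deviation
half_width = 2 * rmse        # two standard deviations on either side of the line

errors = NormalDist(mu=0.0, sigma=rmse)   # assumed normal error distribution
coverage = errors.cdf(half_width) - errors.cdf(-half_width)

print(f"band: +/- ${half_width:.0f}, coverage = {coverage:.4f}")
```

Two standard deviations is $130, and the normal model puts roughly 95% of the service charges inside that band.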
a) There is sufficient evidence (at α = 0.05) to conclude that sales revenue (X) is a useful linear
predictor of service charge (Y).
b) There is insufficient evidence (at α = 0.10) to conclude that sales revenue (X) is a useful linear
predictor of service charge (Y).
c) Sales revenue (X) is a poor predictor of service charge (Y).
d) For every $1 million increase in sales revenue, we expect a service charge to increase $0.034.
Ans: a
31. A 95% confidence interval for 𝛽1 is [15, 30]. Interpret the interval.
a) We are 95% confident that the mean service charge will fall between $15 and $30 per month.
b) We are 95% confident that the sales revenue (X) will increase between $15 and $30 million for
every $1 increase in service charge (Y).
c) We are 95% confident that average service charge (Y) will increase between $15 and $30 for
every $1 million increase in sales revenue (X).
d) At the α= 0.05 level, there is no evidence of a linear relationship between service charge (Y) and
sales revenue (X).
Ans: c
32-33. A medium-sized business has a policy that keeps its weekly advertising budget within the range
from $2000 to $6000. The marketing manager has collected data from a sample of weeks, recording the
amount spent on advertising (ADV) and the revenue (REV) for each week. The amounts spent on
advertising are recorded in thousands of dollars (for example, an actual amount of $3500 corresponds to
ADV = 3.5). Revenue amounts are in actual dollars. After examining the data, the manager decides to use
a (natural) log transformation on both variables in order to derive a regression line. The log-log equation
is determined to be: 𝐸𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑑 log 𝑅𝐸𝑉 = 6.8 + 2.2 log 𝐴𝐷𝑉
32. In the context of this application, elasticity refers to which of the following
(b) How the absolute change in REV relates to the absolute change in ADV
(d) The fact that the original data collected for revenue and advertising dollars did not meet the linear
condition for regression
Ans: (a)
33. What percentage increase in revenue would be predicted for a 0.5% increase in dollars spent on
advertising?
(a) 1.1% (b) 1.78% (c) 0.354% (d) 0.72%
Ans: (a)
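In a log-log model the slope is the elasticity, so a small percentage change in ADV is multiplied by 2.2 to get the approximate percentage change in REV. The sketch below compares that approximation with the exact change implied by the fitted equation; the variable names are illustrative.

```python
elasticity = 2.2        # slope of log REV on log ADV
pct_change_adv = 0.5    # percent increase in advertising

# Elasticity approximation: percent change in REV ~ elasticity * percent change in ADV.
approx_pct = elasticity * pct_change_adv

# Exact change implied by the log-log equation: REV scales by (1 + 0.005) ** 2.2.
exact_pct = ((1 + pct_change_adv / 100) ** elasticity - 1) * 100

print(f"approximate: {approx_pct:.2f}%, exact: {exact_pct:.3f}%")
```

For a change this small the two figures agree to within a hundredth of a percentage point, which supports answer (a).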
(a) how small percentage changes in x are associated with small percentage changes in y
(b) how small percentage changes in y are associated with small percentage changes in x.
Ans: (a)
35-36. It is believed that the average number of hours spent studying per day (HOURS) during
undergraduate education should have a positive linear relationship with the starting salary (SALARY,
measured in thousands of dollars per month) after graduation. Given below is the output from regressing
starting salary on number of hours spent studying per day for a sample of 51 students.
R Square 0.7845
Observations 51
35. What's the value of the t-test statistic to test whether average SALARY depends linearly on HOURS?
Ans: d
36. The 90% confidence interval for the average change in SALARY (in thousands of dollars) as a result
of spending an extra hour per day studying is
a) wider than [-2.70, -1.09].
Ans: d
37-40. A large company employs several thousand people in the manufacture of keyboards, equipment cases,
and cables for the small-computer industry. The personnel manager of the company would like to find ways
to forecast the absentee rate among the company employees. An effective method of forecasting would
greatly strengthen the ability to plan properly. He took a sample of 40 employees and recorded the number of
absent days (Y) during the last fiscal year along with employee age (X). The computer output of a regression
analysis is as follows.
37. An employee, John, is 30 years old. According to the regression equation, what is his expected number of
absent days in the coming fiscal year?
Answer (a)
38. Test whether the regression coefficient 𝛽1 of age is larger than 0.2 using a 5% significance level.
39. Find a 95% confidence interval for the regression coefficient of age.
(a) [0.1056, 0.4286] (b) [0.2006, 0.3104] (c) [0.1979, 0.3096] (d) [0.0056, 0.4286]
Answer (c)
40. The sample mean and sample standard deviation for age are 37.87 and 10.39, respectively. Find a 95%
confidence interval for the mean number of absent days of 30-year-old employees.
(a) [2.907, 3.773] (b) [2.907, 4.025] (c) [1.998, 4.025] (d) [1.998, 3.773]
Answer (a)