
Practice Final


Part I: hypothesis test

11-12. A machining process produces screws for which the population proportion of screws with
defective threads is 0.10. A new technology-enhanced process has become available and the developer
claims that the population proportion of screws with defective threads will be less than 0.10. To test the
validity of this claim, 900 screws produced by the new process are selected at random and the sample
proportion of defective is computed. This result will be used to make a decision as to whether or not the
manufacturer should invest in the new process.

11. Which of the following should be used as the null and alternative hypotheses?

(a) H0: p ≥ 0.10 versus Ha: p < 0.10


(b) H0: p ≤ 0.10 versus Ha: p > 0.10
(c) H0: p̂ ≥ 0.10 versus Ha: p̂ < 0.10
(d) H0: p̂ ≤ 0.10 versus Ha: p̂ > 0.10

12. If a type II error is made in carrying out this test, what are the consequences to the manufacturer?
(a) They will purchase the new process when it is no better than the current process being used
(b) They will not purchase the new process when it is no better than the current process being used
(c) They will purchase the new process when it is an improvement over the process currently being
used
(d) They will not purchase the new process when it is an improvement over the process currently
being used

13-14. A chemical plant is allowed to dispose of a certain amount of waste water into a river. The EPA takes
samples of the water in the river at several locations downstream from the plant. If the plant is in
compliance with the EPA regulations, the population mean amount of toxic particles in the water is µ=50
parts per million (ppm). Each day, the EPA selects 36 water samples and computes the average amount of
toxic particles, and then uses this information to determine if they should conclude that the population
mean amount of toxic particles is greater than 50 ppm. If so, the owner of the plant will be fined. Use the
hypotheses H0: µ ≤ 50 ppm and Ha: µ > 50 ppm.

13. The manager of the plant has been asked by the EPA which value for α would be acceptable for the
test described above: α=0.01, 0.025, or 0.05. The manager tells you that he would like to keep the chance of
being fined at a minimum. What would your advice to the manager be?
(a) Use α=0.01 (b) Use α=0.05
(c) The probability of being fined is not affected by the choice of α.
(d) There is no chance of being fined as long as the plant is in compliance with the EPA regulations.
14. The EPA informs the manager that they will use α=0.025. Given this decision, the manager asks the
EPA to increase the number of water samples selected each day to 64. What effect will this increase in
sample size have?
(a) The probability of a Type I error will move closer to a value of 0.01
(b) The probability of a Type II error will move closer to a value of 0.05
(c) The probability of a Type I error will be unchanged
(d) The probability that H0 is true will increase.

15. A manufacturer packages cement into bags labeled to weigh 50 kilograms. Each hour at the packaging
plant, a laborer pulls a bag from those just filled and weighs the bag on a scale to determine the net
weight. The following figure summarizes the measured weights for 200 bags.

Weight (kg)

Mean 51.0

Std Dev 0.5

Based on the information shown in the figure, the production manager should conclude that the process
that fills the bags is

(a) Filling bags as designed.


(b) Unstable over time because the weights start out low, rise, and then fall.
(c) Over-filling the bags on average by a statistically significant amount
(d) Under-filling the bags on average

16-18. A movie studio is developing a film. The film will be either an action film or a mystery-suspense
thriller. To help come to a decision, the studio gathered data reporting the worldwide sales (in millions of
US dollars) for recent films in the two categories. Analyses of these data follow.
Means and Std Deviations

Film Type n Mean Std Dev SE Mean Lower 95% Upper 95%

Action 56 350.180 164.290 21.954 306.18 394.18

MysterySuspense 29 287.883 108.975 20.236 246.43 329.33

t Test

Difference -62.30 t Ratio -2.079

Std Err Dif 29.97 DF 77.8

Prob > |t| 0.040

16. If we accept that these data are simple random samples from the populations of action and
mystery/suspense films, and take the significance level to be 0.05, then the studio should conclude that

(a) there is not a statistically significant difference in average worldwide sales.


(b) action films earn significantly higher average worldwide sales.
(c) mystery/suspense films earn significantly higher average worldwide sales.
(d) the variation in the groups is too different to allow comparison of means.

17. The p-value shown in the output indicates that, if we accept the assumptions of the shown statistical
analysis, there is a

(a) 5% chance for a Type I error when using the associated interval.
(b) 4% chance for a Type I error if we reject H0 for this t-statistic.
(c) 96% chance for a Type II error when using this t-statistic.
(d) 4% chance that the means of the two populations are equal.

18. Had the US dollar amounts been converted to HK dollars (using an exchange rate of 7.8 HK dollars
per US dollar), then the 95% confidence interval for the mean difference in earnings (a – m) would have
been

(a) 18 million to 953 million.


(b) 474 million to 498 million.
(c) 252 million to 720 million.
(d) Cannot be determined from the shown results.

19-20. At an assembly plant which produces computer chips, each worker is responsible for the entire
assembly of a chip. The production manager would like to compare the workers' productivity (the number
of chips produced per month) in two different work environments, A and B. A group of 50 experienced
workers is selected at random and placed in environment A for a month and the sample mean number of
chips produced is determined. The same 50 workers are then placed in environment B for a month and
again the sample mean number of chips produced is determined. The manager would then like to use
these results to determine if there is statistically significant evidence that the population mean number of
chips is different for these two working environments. The results are given as follows: x̄_A = 150.2 chips,
x̄_B = 151.3 chips, s_d = 7.4 chips.

19. Determine a 95% confidence interval for the difference in the population mean number of chips
(µ𝐴 − µ𝐵 ) produced in each environment.
(a) [-3.2, 1.0], (b) [-4.2, 2.0], (c) [ -2.15, 0.05], (d) [-1.4, -0.8]

20. Suppose that a 98% confidence interval for (µ𝐴 − µ𝐵 ) is [-3.5, 1.3]. Which of the following is an
appropriate interpretation of this result?
(a) Based on this result, we should conclude that µ𝐴 = µ𝐵 .
(b) There is statistically significant evidence that the two population means are different.
(c) There is only 2% chance that the difference in the two population means is outside of this
interval.
(d) We have not observed a statistically significant difference in the two sample means.

Answers:
11 (a), 12 (d), 13 (a), 14 (c), 15 (c)
16 (b), 17 (b), 18 (a), 19 (a), 20 (d)
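
Worked check (not part of the original exam): the two interval questions, 18 and 19, can be verified numerically. This is a minimal sketch that assumes t critical values read from a table (about 1.99 for 77.8 degrees of freedom, about 2.01 for 49); the helper name `t_interval` is illustrative only.

```python
import math

def t_interval(estimate, std_err, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_err."""
    margin = t_crit * std_err
    return estimate - margin, estimate + margin

# Question 18: difference (action - mystery) in millions of US dollars,
# using the Difference and Std Err Dif reported in the t Test output.
# Assumption: t critical value for DF = 77.8 is about 1.99 (t table).
lo_usd, hi_usd = t_interval(62.30, 29.97, 1.99)
HKD_PER_USD = 7.8
# Rescaling the data rescales both endpoints by the same factor.
lo_hkd, hi_hkd = lo_usd * HKD_PER_USD, hi_usd * HKD_PER_USD

# Question 19: the same 50 workers are measured twice (paired design),
# so the standard error of the mean difference is s_d / sqrt(n).
# Assumption: t critical value for 49 degrees of freedom is about 2.01.
n = 50
d_bar = 150.2 - 151.3
se_d = 7.4 / math.sqrt(n)
lo_d, hi_d = t_interval(d_bar, se_d, 2.01)

print(f"Q18 (HK$ millions): {lo_hkd:.0f} to {hi_hkd:.0f}")
print(f"Q19 (chips): [{lo_d:.1f}, {hi_d:.1f}]")
```

With these assumed critical values, both results land closest to option (a); small differences from the printed options come from rounding the critical value.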
Part II: Regression

1. The slope (𝑏1 ) represents

(a) predicted value of Y when X = 0.

(b) the estimated average change in Y per unit change in X.

(c) the predicted value of Y.

(d) variation around the line of regression.

Ans: b

2. The residual scatter plot on the right consists of 104 observations. Then

A. RMSE is around 0.

B. RMSE is around 40

C. RMSE is around 25.

D. RMSE cannot be estimated from the plot.

Answer: C

3-4. An insurance agent has selected a sample of drivers that she insures whose ages range from 16
to 42 years. For each driver, she records the age of the driver and the dollar amount of claims that the
driver filed in the previous 12 months. A scatterplot showing the dollar amount of claims as the response
variable and the age as the predictor shows a linear trend. The least squares regression line is determined
to be: ŷ = 3715 − 75.4x. A plot of the residuals versus age of the drivers showed no pattern, and the
following were reported: r² = 0.822, standard deviation of the residuals s_e = 312.1.

3. Which of the following is correct?


(a) If the age of a driver increases from 20 to 21, the dollar amount of claims is predicted to be
decreased by $75.4
(b) If the age of a driver increases by one year, the dollar amount of claims is predicted to be
increased by $3715
(c) One can use the least squares regression line to obtain a reliable prediction of the dollar amount
of claims for a driver whose age is 55 years
(d) The dollar amount of claims for a 10-year-old driver is expected to be $2961.

Ans: (a)
4. Which of the following is not correct?
(a) 82.2% of the variation in the dollar amounts of claims is explained by the age of the driver.
(b) The correlation 𝑟 between the response variable and the predictor is 0.907
(c) If the histogram of the residuals is symmetric around zero and bell-shaped, then about 68% of the
dollar amounts of claims are within 312.1 dollars of the regression line.
(d) A driver in the data set whose age is 25 years had a residual of -$150 using the fitted line above;
this means his dollar amount of claims is $1680.

Ans: (b)
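
Worked check (not part of the original exam): the arithmetic behind questions 3 and 4 can be reproduced from the fitted line. This is a small sketch; the function name `predicted_claims` is illustrative only.

```python
import math

B0, B1 = 3715.0, -75.4   # fitted intercept and slope from the question
R_SQUARED = 0.822

def predicted_claims(age):
    """Predicted dollar amount of claims from the fitted line."""
    return B0 + B1 * age

# Question 3(a): change in the prediction when age goes from 20 to 21.
change = predicted_claims(21) - predicted_claims(20)

# Question 4(b): the correlation r takes the sign of the slope,
# so with a negative slope r is the negative square root of r^2.
r = math.copysign(math.sqrt(R_SQUARED), B1)

# Question 4(d): observed value = predicted value + residual.
observed_at_25 = predicted_claims(25) + (-150)

print(round(change, 1), round(r, 3), round(observed_at_25))
```

The sign of r is the point of question 4: statement (b) gives the magnitude of the correlation but with the wrong sign.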

5-7. Suppose that in the population the annual salary (Salary) of a CEO i measured in million dollars is
related to the annual sales of the company (Sales) measured in million dollars according to the following
regression model:

5. What is the standard deviation of CEO salaries in million dollars for CEOs of firms with annual sales of five million dollars?

(a) 9 (b) 10 (c) 19 (d) 13.45


Answer (a)

6. What is the expected difference in million dollars between the salary of the CEO of a firm with five
million dollars in annual sales and that of the CEO of a firm with annual sales of eight million dollars?

(a) -0.3 (b) -0.5 (c) 8 (d) 5


Answer (a)

7. What is the probability that the salary of a CEO is greater than 7 million dollars if the sales are 10 million
dollars?

(a) 0.0438 (b) 0.1110 (c) 0.3890 (d) 0.4562

Answer (d)

8. The residual plot for a linear regression model is shown below.


Which of the following is true?

a) A linear model is okay because the association between the two variables is fairly strong.

b) The linear model is no good because the correlation is near 0.

c) The linear model is no good because some residuals are large.

d) The linear model is no good because of the curve in the residuals.

Ans: d

9-10. Use the following to answer questions 9 and 10. Suppose you run a linear least squares
regression of Y on X. The estimated regression line is 𝑌̂ = 3 + 2𝑋. The t-statistic for testing the null
hypothesis that 𝛽1 = 1 is 3.

9. You get an additional data point with X = 2 and Y = 7 and run the regression again including the new
data point. What happens to the estimated slope coefficient?

A. It increases.

B. It decreases.

C. It remains the same.

D. Cannot tell based on the information given.

Answer (c)
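
Worked check (not part of the original exam): the new point (X = 2, Y = 7) lies exactly on the fitted line, since 3 + 2(2) = 7. The sketch below illustrates, on hypothetical data that are not from the exam, that refitting after adding a point on the fitted line leaves the least squares coefficients unchanged. The helper `fit_line` is illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.2, 6.7, 9.4, 10.8, 13.1]
b0, b1 = fit_line(xs, ys)

# Add one point that lies exactly on the fitted line, then refit.
x_new = 2.0
xs2 = xs + [x_new]
ys2 = ys + [b0 + b1 * x_new]
b0_new, b1_new = fit_line(xs2, ys2)

print(b1, b1_new)  # the two slopes agree up to floating-point error
```

A point with zero residual adds nothing to the sum of squared errors of the old line, so the old line still minimizes it.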

10. What happens to the standard error of regression in the new regression run using the new data point
relative to the standard error of regression in the original regression?

A. It increases.

B. It decreases.
C. It remains the same.

D. Cannot tell based on the information given.

Answer (b)

11. The p-value of the slope in a simple regression is 0.45. Then

A. H0: β1 = 0 should be accepted.

B. the data suggests that the predictor x is not helpful in predicting the response y.

C. the slope is less than 1 SE from zero.

D. all the above are correct

Answer: D

12 - 15. Data on 94 houses in the US yield the following regression output.

Linear Fit

Price per Square Foot ($/SqFt) = 157.75299 + 53886.738*1/Sq Ft

Summary of Fit

RSquare 0.091185
Root Mean Square Error 39.41486
Mean of Response 187.2359
Parameter Estimates
Term Estimate Std Error t Ratio Prob>|t|

Intercept 157.75299 10.52119 14.99 <.0001*


1/Sq Ft 53886.738 17736.35 3.04 0.0031*

12. According to the fitted model, the average price per square foot for a house of 2500 square feet is
approximately

A. $160

B. $180

C. $200

D.$ 220

Answer: B

13. If a realtor wants to provide a potential buyer with a rough price range for a house of 2500 square feet,
then it is

A. between $250,500 to $644,500 with probability 95%

B. between $220,500 to $ 674,500 with probability 95%

C. between $280,500 to $614,500 with probability 95%

D. between $310,500 to $584,500 with probability 95%

Answer: A
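
Worked check (not part of the original exam): questions 12 and 13 follow from the fitted equation and the RMSE. The sketch below uses the rough rule "fitted value ± 2 × RMSE" for a 95% range, which ignores the uncertainty in the estimated coefficients; that shortcut is an assumption about how the printed options were obtained.

```python
B0 = 157.75299      # intercept, $ per square foot
B1 = 53886.738      # coefficient on 1/SqFt
RMSE = 39.41486     # root mean square error, $ per square foot

def price_per_sqft(sqft):
    """Fitted price per square foot for a house of the given size."""
    return B0 + B1 / sqft

sqft = 2500
fitted = price_per_sqft(sqft)

# Rough 95% range for one house: fitted value +/- 2 * RMSE,
# converted to a total price by multiplying by the square footage.
low_total = (fitted - 2 * RMSE) * sqft
high_total = (fitted + 2 * RMSE) * sqft

print(f"fitted $/SqFt: {fitted:.1f}")
print(f"rough 95% range: ${low_total:,.0f} to ${high_total:,.0f}")
```

The fitted value is close to $180 per square foot, and the total price range is closest to option A; the printed option appears to round the fitted value before multiplying.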

14. The information given in the parameter estimate table about the slope implies that

A. Fixed costs of houses are significantly different from zero.

B. Fixed costs per square foot are significantly different from zero.

C. Fixed costs decrease as the number of square feet increases.

D. Fixed costs cannot be estimated from the model.

Answer: A
15. Suppose the price per square foot of similar houses in Hong Kong is around HK$1500 per square foot
(1$ = 7.8HK$). Then

A. The cost in Hong Kong is higher but not significantly higher

B. The cost in Hong Kong is significantly higher

C. The cost in Hong Kong is lower but not significantly lower

D. The cost in Hong Kong is significantly lower.

Answer: B

16. The following results were obtained from a simple regression analysis:

r2 = 0.6744
s2 = 0.2934

For each unit change in the independent variable x, the estimated change in the mean value of the
dependent variable y is equal to

(a) –1.2024 (b) 0.6774 (c) 37.2895 (d) 0.2934


Answer (a)

17. A leveraged outlier refers to

(a) A predicted value of the response variable computed from a value of the explanatory variable that
is outside the range of values used to determine the regression equation
(b) A predicted value of the response variable that has a very small residual
(c) An observation in the data that stands apart from the rest of the data
(d) An outlier that is near the minimum or maximum value of the explanatory variable

Ans: (d)

19. Which assumption of the SRM is violated in the residual plot at the right?

A. the relationship is linear

B. the random errors are independent

C. the random errors have equal variance


D. the random errors are normally distributed.

Answer: C

20-23. The following is the regression output for the sales prices of 100 homes regressed on square footage and number of
bedrooms. The price is measured in 1000's of dollars. Square footage is measured in 1000's of square feet.

20. Compute the unknown (A).

(a) 9.53 (b) 90.9 (c) 8.66 (d) 0.81

Answer (a)

21. Compute the unknown (B).

(a) 1 (b) 0.905 (c) 0.58 (d) 0

Answer (c)

22. Test if the beta coefficient of the variable “Number of Bedrooms” is negative by first computing the corresponding P-value. State your decision for the 5% significance level.

(a) p-value is 0.905 and we cannot reject the null H0: coefficient_number ≥ 0.
(b) p-value is 0.095 and we cannot reject the null H0: coefficient_number ≥ 0.
(c) p-value is 0.0478 and we can reject the null H0: coefficient_number ≥ 0.
(d) p-value is 0.4522 and we cannot reject the null H0: coefficient_number ≥ 0.

Answer (d)
23. Compute the 95% confidence interval for the beta coefficient of the variable “square footage”.

(a) [3.669, 5.023] (b) [3.722, 5.902] (c) [3.555, 6.089] (d) [3.555, 5.023]

Answer (b)

24 – 26. The scatter plot of sales (y) of half-gallon orange juice versus the price (x) is given below. We
apply log transform on both y and x to fit the nonlinear pattern.

Transformed Fit Log to Log

Log(Sales) = 4.811646 - 1.7523832*Log(Price)

Summary of Fit

RSquare 0.755335
RSquare Adj 0.750238
Root Mean Square Error 0.385788

Mean of Response 3.136468


Observations (or Sum Wgts) 50

Parameter Estimates
Term Estimate Std Error t Ratio Prob>|t|

Intercept 4.811646 0.148033 32.50 <.0001*


Log(Price) -1.752383 0.143954 -12.17 <.0001*

24. Assume the transformed x and y agree with the SRM.

A. As the price increases by 1%, the sales decrease by 1.75% on average.
B. As the price increases by $1, the sales decrease by 1.75 units on average.
C. As the price increases by 1%, the sales decrease by 1.75 units.
D. As the price increases by $1, the sales decrease by 175%.

25. The t ratio of the slope shows that

A. The elasticity of demand with respect to the price < 0 insignificantly


B. The elasticity < 0 significantly and < −1 not significantly.
C. The elasticity < −1 significantly.

D. None of the above is correct.


Answer: C

26. The optimal price equals (C × elasticity) /(1+ elasticity), where C equals the cost. Suppose the cost of
a half-gallon juice is $1.5. Then the 95% confidence interval for optimal price is

A. between $2.9 and $4.7


B. between $2.0 and $4.0
C. between $2.5 and $ 4.5
D. between $3.1 and $5.1

Answer: A
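
Worked check (not part of the original exam): the interval in question 26 can be reproduced by building a 95% confidence interval for the elasticity and pushing its endpoints through the optimal price formula. This sketch assumes a t critical value of about 2.01 for 48 degrees of freedom (from a t table); `optimal_price` is an illustrative name.

```python
COST = 1.5                 # cost of a half-gallon, in dollars
ELASTICITY = -1.752383     # estimated slope of Log(Sales) on Log(Price)
STD_ERR = 0.143954         # standard error of that slope
T_CRIT = 2.01              # assumed table value for 48 degrees of freedom

def optimal_price(elasticity, cost=COST):
    """Optimal price = cost * elasticity / (1 + elasticity)."""
    return cost * elasticity / (1 + elasticity)

# 95% confidence interval for the elasticity.
e_low = ELASTICITY - T_CRIT * STD_ERR
e_high = ELASTICITY + T_CRIT * STD_ERR

# For elasticities below -1 the price formula is monotone, so the
# interval endpoints for the elasticity map to the price endpoints.
price_low, price_high = sorted((optimal_price(e_low), optimal_price(e_high)))

print(f"elasticity CI: [{e_low:.3f}, {e_high:.3f}]")
print(f"optimal price CI: ${price_low:.2f} to ${price_high:.2f}")
```

Both elasticity endpoints are below −1, which is also what question 25 relies on.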

27. The heights (y) of 50 men and their shoe sizes (x) were obtained. The variable height is measured in
centimetres (cm) and the shoe sizes of these 50 men ranged from 8 to 11. From these 50 pairs of
observations, the least squares regression line predicting height from shoe size was computed to be ŷ =
130.455 + 4.7498x. What height would you predict for a man with a shoe size of 13?

(a) 130.46cm (b) 192.20cm (c) 182.70cm (d) I would not use this regression line to predict the height of
a man with a shoe size of 13.

Answer (d).
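
Worked check (not part of the original exam): the reasoning behind answer (d) is that a shoe size of 13 lies outside the observed range of 8 to 11. The sketch below assumes the intercept is 130.455, the value consistent with options (a) and (b); the function name `predict_height` is illustrative only.

```python
B0, B1 = 130.455, 4.7498   # assumed intercept and reported slope
X_MIN, X_MAX = 8, 11       # observed range of shoe sizes

def predict_height(shoe_size):
    """Return the fitted height in cm, or None outside the observed range."""
    if not X_MIN <= shoe_size <= X_MAX:
        return None   # extrapolation: the data do not support the line here
    return B0 + B1 * shoe_size

print(predict_height(10))   # inside the range: a fitted height in cm
print(predict_height(13))   # outside the range: None
```

Plugging 13 into the line mechanically gives option (b), but nothing in the data says the linear pattern continues that far.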

28-31. A large national bank charges local companies for using their services. A bank official reported
the results of a regression analysis designed to predict the bank's charges (Y) -- measured in dollars per
month -- for services rendered to local companies. One explanatory variable used to predict service
charge to a company is the company's sales revenue (X) -- measured in millions of dollars. Data for 21
companies who use the bank's services were used to fit the model. The results of the simple linear
regression are provided below.
ŷ = −2,700 + 20x, RMSE = 65, p-value for testing β1 = 0 is 0.034.

28. Interpret the estimate of 𝛽0 , the intercept of the line.

a) All companies will be charged at least $2,700 by the bank.

b) There is no practical interpretation since a sales revenue of $0 is a nonsensical value.

c) About 95% of the observed service charges fall within $2,700 of the least squares line.

d) For every $1 million increase in sales revenue, we expect a service charge to decrease $2,700.

Ans: b

29. Interpret the estimate of σε, the standard deviation of the error term in the model.

a) About 95% of the observed service charges fall within $65 of the least squares line.

b) About 95% of the observed service charges equal their corresponding predicted values.

c) About 95% of the observed service charges fall within $130 of the least squares line.

d) For every $1 million increase in sales revenue, we expect a service charge to increase $65.

Ans: c

30. Interpret the p-value for testing whether 𝛽1 =0.

a) There is sufficient evidence (at the α = 0.05) to conclude that sales revenue (X) is a useful linear
predictor of service charge (Y).

b) There is insufficient evidence (at the α = 0.10) to conclude that sales revenue (X) is a useful linear
predictor of service charge (Y).
c) Sales revenue (X) is a poor predictor of service charge (Y).

d) For every $1 million increase in sales revenue, we expect a service charge to increase $0.034.

Ans: a

31. A 95% confidence interval for 𝛽1 is [15, 30]. Interpret the interval.

a) We are 95% confident that the mean service charge will fall between $15 and $30 per month.

b) We are 95% confident that the sales revenue (X) will increase between $15 and $30 million for
every $1 increase in service charge (Y).

c) We are 95% confident that average service charge (Y) will increase between $15 and $30 for
every $1 million increase in sales revenue (X).

d) At the α= 0.05 level, there is no evidence of a linear relationship between service charge (Y) and
sales revenue (X).

Ans: c

32-33. A medium-sized business has a policy that keeps its weekly advertising budget within the range
from $2000 to $6000. The marketing manager has collected data from a sample of weeks, recording the
amount spent on advertising (ADV) and the revenue (REV) for each week. The amounts spent on
advertising are recorded in thousands of dollars (for example, an actual amount of $3500 corresponds to
ADV = 3.5). Revenue amounts are in actual dollars. After examining the data, the manager decides to use
a (natural) log transformation on both variables in order to derive a regression line. The log-log equation
is determined to be: 𝐸𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑑 log 𝑅𝐸𝑉 = 6.8 + 2.2 log 𝐴𝐷𝑉

32. In the context of this application, elasticity refers to which of the following

(a) The slope of the line 𝑏1 = 2.2

(b) How the absolute change in REV relates to the absolute change in ADV

(c) The size of the intercept 𝑏0 = 6.8

(d) The fact that the original data collected for revenue and advertising dollars did not meet the linear
condition for regression

Ans: (a)

33. What percentage increase in revenue would be predicted for a 0.5% increase in dollars spent on
advertising?
(a) 1.1% (b) 1.78% (c) 0.354% (d) 0.72%

Ans: (a)
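
Worked check (not part of the original exam): in a log-log fit the slope acts as an elasticity, so a small percentage change in ADV is multiplied by 2.2 to get the predicted percentage change in REV. The sketch below compares that rule of thumb with the exact change implied by the fitted equation; `pct_change_revenue` is an illustrative name.

```python
ELASTICITY = 2.2   # slope b1 of the log-log fit

def pct_change_revenue(pct_change_adv, elasticity=ELASTICITY):
    """Exact predicted % change in REV implied by the log-log line."""
    ratio = (1 + pct_change_adv / 100) ** elasticity
    return (ratio - 1) * 100

approx = ELASTICITY * 0.5        # elasticity rule of thumb
exact = pct_change_revenue(0.5)  # exact change from the fitted equation

print(f"approximate: {approx:.2f}%  exact: {exact:.3f}%")
```

For a change as small as 0.5% the two values agree to one decimal place, which is why the approximation is used.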

34. The elasticity of y with respect to x describes

(a) how small percentage changes in x are associated with small percentage changes in y

(b) how small percentage changes in y are associated with small percentage changes in x.

(c) how changes in y affect changes in x

(d) how changes in x affect changes in y

Ans: (a)

35-36. It is believed that the average number of hours spent studying per day (HOURS) during
undergraduate education should have a positive linear relationship with the starting salary (SALARY,
measured in thousands of dollars per month) after graduation. Given below is the output from regressing
starting salary on number of hours spent studying per day for a sample of 51 students.

R Square 0.7845

Standard Error 1.3704

Observations 51

Coefficients Standard Error t Stat P-value

Intercept -1.8940 0.4018 -4.7134 2.051E-05

Hours 0.9795 0.0733 13.3561 5.944E-18

35. What's the value of the t-test statistic to test whether average SALARY depends linearly on HOURS?

a) -4.7134 b) -1.8940 c) 0.9795 d) 13.3561

Ans: d

36. The 90% confidence interval for the average change in SALARY (in thousands of dollars) as a result
of spending an extra hour per day studying is
a) wider than [-2.70, -1.09].

b) narrower than [-2.70, -1.09].

c) wider than [0.83, 1.13].

d) narrower than [0.83, 1.13].

Ans: d
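
Worked check (not part of the original exam): question 36 compares a 90% interval for the slope with the 95% interval. This sketch assumes t critical values for 49 degrees of freedom read from a t table (about 2.0096 for 95% and 1.6766 for 90%); `slope_interval` is an illustrative name.

```python
B1 = 0.9795        # estimated coefficient on Hours
SE_B1 = 0.0733     # its standard error
# Assumed t critical values for 51 - 2 = 49 degrees of freedom.
T_95 = 2.0096
T_90 = 1.6766

def slope_interval(t_crit):
    """Confidence interval for the slope: estimate +/- t_crit * SE."""
    margin = t_crit * SE_B1
    return B1 - margin, B1 + margin

ci95 = slope_interval(T_95)
ci90 = slope_interval(T_90)

print(f"95% CI: [{ci95[0]:.2f}, {ci95[1]:.2f}]")
print(f"90% CI: [{ci90[0]:.2f}, {ci90[1]:.2f}]")
```

The 95% interval matches [0.83, 1.13] after rounding, and the 90% interval sits strictly inside it because it uses a smaller critical value.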

37-40 A large company employs several thousand people in the manufacture of keyboards, equipment cases,
and cables for the small-computer industry. The personnel manager of the company would like to find ways
to forecast the absentee rate among the company employees. An effective method of forecasting would
greatly strengthen the ability to plan properly. He took a sample of 40 employees and recorded the number of
absent days (Y) during the last fiscal year along with employee age (X). The computer output of a regression
analysis is as follows.

37. An employee, John, is 30 years old. According to the regression equation, what is his expected number of
absent days in the coming fiscal year?

(a) 3.34 (b) 4.56 (c) 5.34 (d) 6.56

Answer (a)

38. Test whether the regression coefficient β1 of age is larger than 0.2 using a 5% significance level.

(a) p-value is 0.002 and we reject the null 𝐻0 : 𝛽1 ≤ 0.2


(b) p-value is 0.998 and we cannot reject the null 𝐻0 : 𝛽1 ≤ 0.2
(c) p-value is 0.029 and we reject the null 𝐻0 : 𝛽1 ≤ 0.2
(d) p-value is 0.47 and we cannot reject the null 𝐻0 : 𝛽1 ≤ 0.2
Answer (c)

39. Find a 95% confidence interval for the regression coefficient of age.

(a) [0.1056, 0.4286] (b) [0.2006, 0.3104] (c)[0.1979, 0.3096 ] (d) [0.0056, 0.4286]

Answer (c)

40. The sample mean and sample standard deviation for age are 37.87 and 10.39, respectively. Find a 95%
confidence interval for the mean absent days of 30-year-old employees.

(a) [2.907, 3.773] (b) [2.907, 4.025] (c) [1.998, 4.025] (d) [1.998, 3.773]

Answer (a)
