Reading 1 Multiple Regression
Phillip Lee works for Song Bank as a quantitative analyst. He is currently working on a model
to explain the returns (in %) of 20 hedge funds for the past year. He includes three
independent variables:
Estimated model: hedge fund return = 3.2 + 0.22 market return + 1.65 closed – 0.11 prior
period alpha
Lee is concerned about the impact of outliers on the estimated regression model and
collects the following information:
Observation 1 2 3 4 5 6 7 8 9 10
Cook's D 0.332 0.219 0.115 0.212 0.376 0.232 0.001 0.001 0.233 0.389
Observation 11 12 13 14 15 16 17 18 19 20
Cook's D 0.089 0.112 0.001 0.001 0.219 0.001 0.112 0.044 0.517 0.212
Additionally, Lee wants to estimate the probability of a hedge fund closing to new investors,
and he uses two variables:
Intercept –3.76
What is the correct interpretation of the coefficient of closed in the first regression?
A) A closed fund is estimated to have an extra return of 1.65% relative to funds that
are not closed.
B) A closed fund is likely to generate a return of 1.65%.
C) If a model is closed to new investors, the expected excess fund return is 1.65%.
To check for only the outliers in the sample, Lee should most appropriately use:
A) Cook’s D.
B) Studentized residuals.
C) leverage.
A) Observation 19.
B) Observations 1, 10, and 11.
C) Observations 10 and 19.
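As an illustration of how the Cook's D values in the vignette can be screened, the sketch below applies a common rule of thumb: an observation is likely influential when its Cook's D exceeds sqrt(k/n), here with k = 3 independent variables and n = 20 funds. The cutoff rule and the variable names are assumptions of this sketch; the vignette itself does not state a threshold.

```python
import math

# Cook's D values for observations 1..20, copied from the vignette.
cooks_d = [0.332, 0.219, 0.115, 0.212, 0.376, 0.232, 0.001, 0.001, 0.233, 0.389,
           0.089, 0.112, 0.001, 0.001, 0.219, 0.001, 0.112, 0.044, 0.517, 0.212]

k = 3             # independent variables in the hedge fund return model
n = len(cooks_d)  # 20 funds

# Assumed rule of thumb: flag observations with Cook's D > sqrt(k / n).
threshold = math.sqrt(k / n)

influential = [obs for obs, d in enumerate(cooks_d, start=1) if d > threshold]
print(f"threshold = {threshold:.3f}, flagged observations = {influential}")
```

Observation 5 (0.376) falls just under this cutoff, which shows how sensitive the screen is to the rule chosen.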
What is the change in probability of fund closure for a 1% increase in Ln(assets under
management)?
A) 4.83%.
B) 2.33%.
C) 5.08%.
Which of the following conditions will least likely affect the statistical inference about
regression parameters by itself?
A) Unconditional heteroskedasticity.
B) Multicollinearity.
C) Model misspecification.
A high-yield bond analyst is trying to develop an equation using financial ratios to estimate
the probability of a company defaulting on its bonds. A technique that can be used to
develop this equation is:
The management of a large restaurant chain believes that revenue growth is dependent
upon the month of the year. Using a standard 12 month calendar, how many dummy
variables must be used in a regression model that will test whether revenue growth differs
by month?
A) 11.
B) 12.
C) 13.
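The dummy-variable count can be illustrated with a small sketch. It assumes a regression that includes an intercept and, as an arbitrary choice made here, treats December as the omitted base month; the helper name `month_dummies` is hypothetical.

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
BASE = "December"  # assumed omitted (reference) month, captured by the intercept

def month_dummies(month):
    """Return the 0/1 dummy values for one observation, omitting the base month."""
    return [1 if month == m else 0 for m in MONTHS if m != BASE]

print(len(month_dummies("March")))  # number of dummy variables in the model
print(month_dummies("December"))    # base month: every dummy is zero
```

Including a dummy for every month alongside the intercept would make the dummies sum to one for each observation, which is perfectly collinear with the intercept.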
Which of the following questions is least likely answered by using a qualitative dependent
variable?
A) Based on the following company-specific financial ratios, will company ABC enter
bankruptcy?
B) Based on the following executive-specific and company-specific variables, how many
shares will be acquired through the exercise of executive stock options?
C) Based on the following subsidiary and competition variables, will company XYZ
divest itself of a subsidiary?
A regression with three independent variables has VIF values of 3, 4, and 2 for the first,
second, and third independent variables, respectively. Which of the following conclusions is
most appropriate?
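For reference, the variance inflation factor for variable j is VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing variable j on the other independent variables. The sketch below backs out the implied R_j² for the three VIFs in the question and applies cutoffs of 5 and 10. Those cutoffs are commonly quoted rules of thumb and are an assumption here, as are the labels X1 to X3.

```python
def implied_r2(vif):
    """R-squared of one independent variable on the others, from VIF = 1 / (1 - R2)."""
    return 1.0 - 1.0 / vif

def vif_flag(vif):
    # Assumed rule-of-thumb cutoffs; they are not given in the question.
    if vif > 10:
        return "serious multicollinearity"
    if vif > 5:
        return "warrants further investigation"
    return "no concern"

vifs = {"X1": 3, "X2": 4, "X3": 2}  # hypothetical labels for the three variables
for name, v in vifs.items():
    print(name, round(implied_r2(v), 3), vif_flag(v))
```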
with a PI calculated t-statistic of 0.45, a TEEN calculated t-statistic of 2.2, and an INS
calculated t-statistic of 0.63.
The equation was estimated over 40 companies. The predicted value of AUTO if PI is 4, TEEN
is 0.30, and INS = 0.6 is closest to:
A) 17.50.
B) 14.10.
C) 14.90.
Toni Williams, CFA, has determined that commercial electric generator sales in the Midwest
U.S. for Self-Start Company are a function of several factors in each area: the cost of heating
oil, the temperature, snowfall, and housing starts. Using data for the most recent
available year, she runs a cross-sectional regression where she regresses the deviation of
sales from the historical average in each area on the deviation of each explanatory variable
from the historical average of that variable for that location. She feels this is the most
appropriate method since each geographic area will have different average values for the
inputs, and the model can show how current conditions explain why generator sales are
higher or lower than the historical average in each area. In summary, she regresses current
sales for each area minus its respective historical average on the following variables for each
area.
The difference between the retail price of heating oil and its historical average.
The mean number of degrees the temperature is below normal in Chicago.
The amount of snowfall above the average.
The percentage of housing starts above the average.
Total 25 941.60
df1
df2 1 2 4 10 20
One of her goals is to forecast the sales of the Chicago metropolitan area next year. For that
area and for the upcoming year, Williams obtains the following projections: heating oil prices
will be $0.10 above average, the temperature in Chicago will be 5 degrees below normal,
snowfall will be 3 inches above average, and housing starts will be 3% below average.
In addition to making forecasts and testing the significance of the estimated coefficients, she
plans to perform diagnostic tests to verify the validity of the model's results.
Williams proceeds to test the hypothesis that none of the independent variables has
significant explanatory power. Using the joint F-test for the significance of all slope
coefficients, at a 5% level of significance:
With respect to testing the validity of the model's results, Williams may wish to perform:
When Williams ran the model, the computer said the R2 is 0.233. She examines the other
output and concludes that this is the:
A) adjusted R2 value.
B) neither the unadjusted nor adjusted R2 value, nor the coefficient of correlation.
C) unadjusted R2 value.
In preparing and using this model, Williams has least likely relied on which of the following
assumptions?
For next year, Hilton estimates the following parameters: (1) the population under 20 will be
120 million, (2) disposable income will be $300,000,000, and (3) advertising expenditures will
be $100,000,000. Based on these estimates and the regression equation, what are predicted
sales for the industry for next year?
A) $509,980,000.
B) $557,143,000.
C) $656,991,000.
Washburn has estimated a regression equation in which 160 quarterly returns on the S&P
500 are explained by three macroeconomic variables: employment growth (EMP) as
measured by nonfarm payrolls, gross domestic product (GDP) growth, and private
investment (INV). The results of the regression analysis are as follows:
Coefficient Estimates
Parameter    Coefficient    Standard Error of Coefficient
Other Data:
n dl du dl du dl du dl du dl du
20 1.20 1.41 1.10 1.54 1.00 1.68 0.90 1.83 0.79 1.99
50 1.50 1.59 1.46 1.63 1.42 1.67 1.38 1.72 1.34 1.77
>100 1.65 1.69 1.63 1.72 1.61 1.74 1.59 1.76 1.57 1.78
How many of the three independent variables (not including the intercept term) are
statistically significant in explaining quarterly stock returns at the 5.0% level?
Can the null hypothesis that the GDP growth coefficient is equal to 3.50 be rejected at the
1.0% significance level versus the alternative that it is not equal to 3.50? The null hypothesis
is:
The percentage of the total variation in quarterly stock returns explained by the
independent variables is closest to:
A) 32%.
B) 47%.
C) 42%.
A) 4.7%.
B) 5.0%.
C) 4.4%.
A) 1.71.
B) 1.31.
C) 0.81.
A) serial correlation.
B) conditional heteroskedasticity.
C) multicollinearity.
An analyst regresses the return of an S&P 500 index fund against the S&P 500, and also
regresses the return of an active manager against the S&P 500. The analyst uses the last five
years of data in both regressions. Without making any other assumptions, which of the
following is most accurate? The index fund:
Consider the following estimated regression equation, with calculated t-statistics of the
estimates as indicated:
with a PI calculated t-statistic of 0.45, a TEEN calculated t-statistic of 2.2, and an INS
calculated t-statistic of 0.63.
The equation was estimated over 40 companies. Using a 5% level of significance, which of
the independent variables are significantly different from zero?
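A minimal sketch of the comparison this question calls for: each calculated t-statistic is compared in absolute value with the two-tailed critical t for n − k − 1 degrees of freedom. The critical value of about 2.03 is an assumption taken from a standard t-table; it does not appear in the question.

```python
n, k = 40, 3     # companies and independent variables (PI, TEEN, INS)
df = n - k - 1   # degrees of freedom for each t-test

# Assumption: two-tailed 5% critical t for 36 df, read from a standard t-table.
T_CRIT = 2.03

t_stats = {"PI": 0.45, "TEEN": 2.2, "INS": 0.63}
significant = [name for name, t in t_stats.items() if abs(t) > T_CRIT]
print(df, significant)
```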
When two or more of the independent variables in a multiple regression are correlated with
each other, the condition is called:
A) serial correlation.
B) multicollinearity.
C) conditional heteroskedasticity.
Consider the following estimated regression equation, with the standard errors of the slope
coefficients as noted:
Sales_i = 10.0 + 1.25 R&D_i + 1.0 ADV_i – 2.0 COMP_i + 8.0 CAP_i
where the standard error for the estimated coefficient on R&D is 0.45, the
standard error for the estimated coefficient on ADV is 2.2, the standard error for
the estimated coefficient on COMP is 0.63, and the standard error for the
estimated coefficient on CAP is 2.5.
The equation was estimated over 40 companies. Using a 5% level of significance, which of
the estimated coefficients are significantly different from zero?
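Because this question gives standard errors rather than t-statistics, each t-statistic has to be computed first as the coefficient divided by its standard error. The sketch below does that and then compares against a two-tailed 5% critical value of about 2.03 for 35 degrees of freedom. The critical value comes from a standard t-table and is an assumption of the sketch.

```python
coefs = {"R&D": 1.25, "ADV": 1.0, "COMP": -2.0, "CAP": 8.0}
std_errs = {"R&D": 0.45, "ADV": 2.2, "COMP": 0.63, "CAP": 2.5}

n, k = 40, 4
df = n - k - 1  # 35 degrees of freedom

# Assumption: two-tailed 5% critical t for 35 df, read from a standard t-table.
T_CRIT = 2.03

# t-statistic for H0: coefficient = 0 is the estimate divided by its standard error.
t_stats = {name: coefs[name] / std_errs[name] for name in coefs}
significant = [name for name, t in t_stats.items() if abs(t) > T_CRIT]

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
print("significant:", significant)
```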
Consider a study of 100 university endowment funds that was conducted to determine if the
funds' annual risk-adjusted returns could be explained by the size of the fund and the
percentage of fund assets that are managed to an indexing strategy. The equation used to
model this relationship is:
Where:
ARAR_i = the average annual risk-adjusted percent returns for the fund i over the
1998-2002 time period.
Size_i = the natural logarithm of the average assets under management for fund i.
The table below contains a portion of the regression results from the study.
Which of the following is the most accurate interpretation of the slope coefficient for size?
ARAR:
A) will change by 0.6% when the natural logarithm of assets under management
changes by 1.0, holding index constant.
B) will change by 1.0% when the natural logarithm of assets under management
changes by 0.6, holding index constant.
C) and index will change by 1.1% when the natural logarithm of assets under
management changes by 1.0.
Which of the following is the estimated standard error of the regression coefficient for
index?
A) 2.31.
B) 1.91.
C) 0.52.
A) 0.30.
B) 3.33.
C) 0.70.
A) −2.86.
B) −9.45.
C) −0.11.
Which of the following statements is most accurate regarding the significance of the
regression parameters at a 5% level of significance?
A) The parameter estimates for the intercept are significantly different than zero. The
slope coefficients for index and size are not significant.
B) All of the parameter estimates are significantly different than zero at the 5% level of
significance.
C) The parameter estimates for the intercept and the independent variable size are
significantly different than zero. The coefficient for index is not significant.
Which of the following is NOT a required assumption for multiple linear regression?
A) It is possible for the adjusted-R2 to decline as more variables are added to the
multiple regression.
Jacob Warner, CFA, is evaluating a regression analysis recently published in a trade journal
that hypothesizes that the annual performance of the S&P 500 stock index can be explained
by movements in the Federal Funds rate and the U.S. Producer Price Index (PPI). Which of
the following statements regarding his analysis is most accurate?
A) If the p-value of a variable is less than the significance level, the null hypothesis
cannot be rejected.
B) If the t-value of a variable is less than the significance level, the null hypothesis
cannot be rejected.
C) If the p-value of a variable is less than the significance level, the null hypothesis can
be rejected.
Autumn Voiku is attempting to forecast sales for Brookfield Farms based on a multiple
regression model. Voiku has constructed the following model:
Where:
Voiku uses monthly data from the previous 180 months of sales data and for the
independent variables. The model estimates (with coefficient standard errors in
parentheses) are:
The sum of squared errors is 140.3 and the total sum of squares is 368.7.
Voiku calculates the unadjusted R2, the adjusted R2, and the standard error of estimate to
be 0.592, 0.597, and 0.910, respectively.
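The three statistics can be checked from the sums of squares with the standard formulas R² = 1 − SSE/SST, adjusted R² = 1 − [(n − 1)/(n − k − 1)](1 − R²), and SEE = sqrt(SSE/(n − k − 1)). The sketch below is illustrative only. The model equation is not reproduced in this excerpt, so the number of independent variables, `K_ASSUMED = 3`, is a hypothetical input; only the unadjusted R² is independent of it.

```python
import math

sse = 140.3  # sum of squared errors, from the vignette
sst = 368.7  # total sum of squares, from the vignette
n = 180      # monthly observations

def r_squared(sse, sst):
    return 1.0 - sse / sst

def adjusted_r_squared(r2, n, k):
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)

def standard_error_of_estimate(sse, n, k):
    return math.sqrt(sse / (n - k - 1))

K_ASSUMED = 3  # hypothetical: the number of regressors is not shown in this excerpt

r2 = r_squared(sse, sst)
adj_r2 = adjusted_r_squared(r2, n, K_ASSUMED)
see = standard_error_of_estimate(sse, n, K_ASSUMED)
print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}, SEE = {see:.4f}")
```

One consistency check needs no assumption about k: with at least one independent variable, adjusted R² cannot exceed the unadjusted R².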
Voiku is concerned that one or more of the assumptions underlying multiple regression has
been violated in her analysis. In a conversation with Dave Grimbles, CFA, a colleague who is
considered by many in the firm to be a quant specialist, Voiku says, "It is my understanding
that there are five assumptions of a multiple regression model:"
Voiku tests and fails to reject each of the following four null hypotheses at the 99%
confidence level:
Figure 2: Partial F-Table critical values for right-hand tail area equal to 0.05
Figure 3: Partial F-Table critical values for right-hand tail area equal to 0.025
A) incorrect to agree with Voiku’s list of assumptions because two of the assumptions
are stated incorrectly.
B) incorrect to agree with Voiku’s list of assumptions because one of the assumptions
is stated incorrectly.
C) correct to agree with Voiku’s list of assumptions.
For which of the four hypotheses did Voiku incorrectly fail to reject the null, based on the
data given in the problem?
A) Hypothesis 2.
B) Hypothesis 4.
C) Hypothesis 3.
The most appropriate decision with regard to the F-statistic for testing the null hypothesis
that all of the independent variables are simultaneously equal to zero at the 5 percent
significance level is to:
A) reject the null hypothesis because the F-statistic is larger than the critical F-value of
3.19.
B) fail to reject the null hypothesis because the F-statistic is smaller than the critical
F-value of 2.66.
C) reject the null hypothesis because the F-statistic is larger than the critical F-value of
2.66.
Regarding Voiku's calculations of R2 and the standard error of estimate, she is:
A) correct in her calculation of the unadjusted R2 but incorrect in her calculation of the
standard error of estimate.
B) incorrect in her calculation of the unadjusted R2 but correct in her calculation of the
standard error of estimate.
C) incorrect in her calculation of both the unadjusted R2 and the standard error of
estimate.
A) heteroskedasticity.
B) serial correlation of the error terms.
C) multicollinearity.
A) –1.5 to 20.0.
B) –1.9 to 19.6.
C) 0.5 to 22.9.
In preparing an analysis of HB Inc., Jack Stumper is asked to look at the company's sales in
relation to broad based economic indicators. Stumper's analysis indicates that HB's monthly
sales are related to changes in housing starts (H) and changes in the mortgage interest rate
(M). The analysis covers the past ten years for these variables. The regression equation is:
S = 1.76 + 0.23H - 0.08M
F statistic: 9.80
Variable Descriptions
A) $55,000.
B) $36,000.
C) $44,000.
Will Stumper conclude that the housing starts coefficient is statistically different from zero,
and how will he interpret it at the 5% significance level?
A) different from zero; sales will rise by $23 for every 100 house starts.
B) not different from zero; sales will rise by $0 for every 100 house starts.
C) different from zero; sales will rise by $100 for every 23 house starts.
Is the regression coefficient of changes in mortgage interest rates different from zero at the
5 percent level of significance?
The regression statistics above indicate that for the period under study, the independent
variables (housing starts, mortgage interest rate) together explained approximately what
percentage of the variation in the dependent variable (sales)?
A) 77.00.
B) 9.80.
C) 67.00.
In this multiple regression, if Stumper discovers that the residuals exhibit positive serial
correlation, the most likely effect is:
Ben Sasse is a quantitative analyst at Gurnop Asset Managers. Sasse is interviewing Victor
Sophie for a junior analyst position. Sasse mentions that the firm currently uses several
proprietary multiple regression models and wants Sophie's opinion about regression
models.
Sasse then discusses a model that the firm uses to forecast credit spread on investment-
grade corporate bonds. Sasse states that while the current model parameters are a secret,
the following is an older version of the model:
Based on the credit spread model, if an issuer gets included in the CDX index and assuming
everything else the same, which of the following statements most accurately describes the
model's forecast?
Miles Mason, CFA, works for ABC Capital, a large money management company based in
New York. Mason has several years of experience as a financial analyst, but is currently
working in the marketing department developing materials to be used by ABC's sales team
for both existing and prospective clients. ABC Capital's client base consists primarily of large
net worth individuals and Fortune 500 companies. ABC invests its clients' money in both
publicly traded mutual funds as well as its own investment funds that are managed in-
house. Five years ago, roughly half of its assets under management were invested in the
publicly traded mutual funds, with the remaining half in the funds managed by ABC's
investment team. Currently, approximately 75% of ABC's assets under management are
invested in publicly traded funds, with the remaining 25% being distributed among ABC's
private funds. The managing partners at ABC would like to shift more of its clients' assets
away from publicly traded funds into ABC's proprietary funds, ultimately returning to a
50/50 split of assets between publicly traded funds and ABC funds. There are three key
reasons for this shift in the firm's asset base. First, ABC's in-house funds have outperformed
other funds consistently for the past five years. Second, ABC can offer its clients a reduced
fee structure on funds managed in-house relative to other publicly traded funds. Lastly, ABC
has recently hired a top fund manager away from a competing investment company and
would like to increase his assets under management.
ABC Capital's upper management requested that current clients be surveyed in order to
determine the cause of the shift of assets away from ABC funds. Results of the survey
indicated that clients feel there is a lack of information regarding ABC's funds. Clients would
like to see extensive information about ABC's past performance, as well as a sensitivity
analysis showing how the funds will perform in varying market scenarios. Mason is part of a
team that has been charged by upper management to create a marketing program to
present to both current and potential clients of ABC. He needs to be able to demonstrate a
history of strong performance for the ABC funds, and, while not promising any measure of
future performance, project possible return scenarios. He decides to conduct a regression
analysis on all of ABC's in-house funds. He is going to use 12 independent economic
variables in order to predict each particular fund's return. Mason is very aware of the many
factors that could minimize the effectiveness of his regression model, and if any are present,
he knows he must determine if any corrective actions are necessary. Mason is using a
sample size of 121 monthly returns.
A) Durbin-Watson.
B) Breusch-Godfrey.
C) Breusch-Pagan.
If a regression equation shows that no individual t-tests are significant, but the F-statistic is
significant, the regression probably exhibits:
A) serial correlation.
B) multicollinearity.
C) heteroskedasticity.
George Smith, an analyst with Great Lakes Investments, has created a comprehensive report
on the pharmaceutical industry at the request of his boss. The Great Lakes portfolio
currently has a significant exposure to the pharmaceuticals industry through its large equity
position in the top two pharmaceutical manufacturers. His boss requested that Smith
determine a way to accurately forecast pharmaceutical sales in order for Great Lakes to
identify further investment opportunities in the industry as well as to minimize their
exposure to downturns in the market. Smith realized that there are many factors that could
possibly have an impact on sales, and he must identify a method that can quantify their
effect. Smith used a multiple regression analysis with five independent variables to predict
industry sales. His goal is to not only identify relationships that are statistically significant,
but economically significant as well. The assumptions of his model are fairly standard: a
linear relationship exists between the dependent and independent variables, the
independent variables are not random, and the expected value of the error term is zero.
Smith is confident with the results presented in his report. He has already done some
hypothesis testing for statistical significance, including calculating a t-statistic and
conducting a two-tailed test where the null hypothesis is that the regression coefficient is
equal to zero versus the alternative that it is not. He feels that he has done a thorough job
on the report and is ready to answer any questions posed by his boss.
However, Smith's boss, John Sutter, is concerned that in his analysis, Smith has ignored
several potential problems with the regression model that may affect his conclusions. He
knows that when any of the basic assumptions of a regression model are violated, any
results drawn for the model are questionable. He asks Smith to go back and carefully
examine the effects of heteroskedasticity, multicollinearity, and serial correlation on his
model. Specifically, he wants Smith to make suggestions regarding how to detect these errors
and to correct problems that he encounters.
Sutter has detected the presence of conditional heteroskedasticity in Smith's report. This is
evidence that:
Suppose there is evidence that the variance of the error term is correlated with the values of
the independent variables. The most likely effect on the statistical inferences Smith can
make from the regression results using financial data is to commit a:
A) Type I error by incorrectly failing to reject the null hypothesis that the regression
parameters are equal to zero.
B) Type II error by incorrectly failing to reject the null hypothesis that the regression
parameters are equal to zero.
C) Type I error by incorrectly rejecting the null hypothesis that the regression
parameters are equal to zero.
Which of the following is most likely to indicate that two or more of the independent
variables, or linear combinations of independent variables, may be highly correlated with
each other? Unless otherwise noted, significant and insignificant mean significantly different
from zero and not significantly different from zero, respectively.
B) The R2 is high, the F-statistic is significant and the t-statistics on the individual slope
coefficients are insignificant.
C) The R2 is high, the F-statistic is significant and the t-statistics on the individual slope
coefficients are significant.
Using the Durbin-Watson test statistic, Smith rejects the null hypothesis suggested by the
test. This is evidence that:
A) two or more of the independent variables are highly correlated with each other.
B) the error term is normally distributed.
C) the error terms are correlated with each other.
In preparing an analysis of Treefell Company, Jack Lumber is asked to look at the company's
sales in relation to broad-based economic indicators. Lumber's analysis indicates that
Treefell's monthly sales are related to changes in housing starts (H) and changes in the
mortgage interest rate (M). The analysis covers the past 10 years for these variables. The
regression equation is:
F-statistic: 9.80
Variable Descriptions
Degrees of freedom    Chi-square critical value (5% right-hand tail)
1     3.84
2     5.99
3     7.81
4     9.49
5    11.07
6    12.59
Using the regression model developed, the closest prediction of sales for December 20X6 is:
A) $55,000.
B) $44,000.
C) $36,000.
Will Jack conclude that the housing starts coefficient is statistically different from zero and
how will he interpret it at the 5% significance level?
A) Different from zero; sales will rise by $100 for every 23 house starts.
B) Different from zero; sales will rise by $23 for every 100 house starts.
C) Not different from zero; sales will rise by $0 for every 100 house starts.
A) deviation of the estimated values from the actual values of the dependent variable.
B) degree of correlation between the independent variables.
C) the joint significance of the independent variables.
The regression statistics indicate that for the period under study, the independent variables
(housing starts, mortgage interest rate) together explain approximately what percentage of
the variation in the dependent variable (sales)?
A) 9.80.
B) 77.00.
C) 67.00.
For this question only, assume that the regression of squared residuals on the independent
variables has R2 = 11%. At a 5% level of significance, which of the following conclusions is
most accurate?
A) With a test statistic of 0.22, we cannot reject the null hypothesis of no conditional
heteroskedasticity.
B) Because the critical value is 3.84, we reject the null hypothesis of no conditional
heteroskedasticity.
C) With a test statistic of 13.53, we can conclude the presence of conditional
heteroskedasticity.
Jill Wentraub is an analyst with the retail industry. She is modeling a company's sales over
time and has noticed a quarterly seasonal pattern. If she includes dummy variables to
represent the seasonality component of the sales she must use:
One possible problem that could jeopardize the validity of the employment growth rate
model is multicollinearity. Which of the following would most likely suggest the existence of
multicollinearity?
Wilson estimated a regression that produced the following analysis of variance (ANOVA)
table:
Total 400 41
The values of R2 and the F-statistic to test the null hypothesis that slope coefficients on all
variables are equal to zero are:
Binod Salve, CFA, is investigating the application of the Fama-French three-factor model
(Model 1) for the Indian stock market for the period 2001–2011 (120 months). Using the
dependent variable as annualized return (%), the results of the analysis are shown in Indian
Equities—Fama-French Model
R-squared 0.36
SSE 38.00
BG (lag 1) 2.11
BG (lag 2) 1.67
Degrees of freedom    Chi-square critical value (5% right-hand tail)
1     3.84
2     5.99
3     7.81
4     9.49
5    11.07
6    12.59
A) Because the test statistic of 7.20 is higher than the critical value of 3.84, we reject
the null hypothesis of no conditional heteroskedasticity in residuals.
B) Because the test statistic of 7.20 is lower than the critical value of 7.81, we reject the
null hypothesis of no conditional heteroskedasticity in residuals.
C) Because the test statistic of 3.60 is lower than the critical value of 3.84, we reject the
null hypothesis of no conditional heteroskedasticity in residuals.
A) Yes, and Salve should exclude variable Rm-Rf from the model.
B) No.
C) Yes, and Salve should exclude either variable SMB or HML from the model.
An analyst is estimating whether company sales is related to three economic variables. The
regression exhibits conditional heteroskedasticity, serial correlation, and multicollinearity.
The analyst uses White and Newey-West corrected standard errors. Which of the following is
most accurate?
The regression will still exhibit multicollinearity, but the heteroskedasticity and serial
A)
correlation problems will be solved.
The regression will still exhibit heteroskedasticity and multicollinearity, but the serial
B)
correlation problem will be solved.
The regression will still exhibit serial correlation and multicollinearity, but the
C)
heteroskedasticity problem will be solved.
Source        SS    df    MS
Regression    20     1    20
Error         80    40     2
Total        100    41
The F-statistic for the test of the fit of the model is closest to:
A) 0.25.
B) 0.10.
C) 10.00.
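The F-statistic for overall fit is the mean square regression divided by the mean square error. A short sketch using the ANOVA figures above (regression sum of squares 20 on 1 degree of freedom, error sum of squares 80 on 40 degrees of freedom) follows; the function name `f_statistic` is a hypothetical helper introduced for illustration.

```python
def f_statistic(ss_regression, df_regression, ss_error, df_error):
    """F = MSR / MSE, where each mean square is a sum of squares over its df."""
    msr = ss_regression / df_regression
    mse = ss_error / df_error
    return msr / mse

f = f_statistic(ss_regression=20, df_regression=1, ss_error=80, df_error=40)
r2 = 20 / (20 + 80)  # regression SS over total SS
print(f, r2)
```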
During the course of a multiple regression analysis, an analyst has observed several items
that she believes may render incorrect conclusions. For example, the coefficient standard
errors are too small, although the estimated coefficients are accurate. She believes that
these small standard error terms will result in the computed t-statistics being too big,
resulting in too many Type I errors. The analyst has most likely observed which of the
following assumption violations in her regression analysis?
A) Multicollinearity.
B) Homoskedasticity.
C) Positive serial correlation.
When interpreting the results of a multiple regression analysis, which of the following terms
represents the value of the dependent variable when the independent variables are all equal
to zero?
A) Intercept term.
B) p-value.
C) Slope coefficient.
Raul Gloucester, CFA, is analyzing the returns of a fund that his company offers. He tests the
fund's sensitivity to a small capitalization index and a large capitalization index, as well as to
whether the January effect plays a role in the fund's performance. He uses two years of
monthly returns data, and runs a regression of the fund's return on the indexes and a
January-effect qualitative variable. The "January" variable is 1 for the month of January and
zero for all other months. The results of the regression are shown in the tables below.
Regression Statistics
Multiple R 0.817088
R2 0.667632
Adjusted R2 0.617777
Observations 24
ANOVA
df SS MS
Total 23 164.9963
Gloucester plans to test for serial correlation and conditional and unconditional
heteroskedasticity.
The percent of the variation in the fund's return that is explained by the regression is:
A) 66.76%.
B) 81.71%.
C) 61.78%.
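The three fit statistics in the regression output are linked: R² is the square of Multiple R, and adjusted R² = 1 − [(n − 1)/(n − k − 1)](1 − R²). The sketch below checks the reported figures against each other, taking k = 3 from the vignette's description of the model (small-cap index, large-cap index, and the January dummy).

```python
multiple_r = 0.817088
r2 = 0.667632
n = 24  # two years of monthly returns
k = 3   # small-cap index, large-cap index, January dummy

# Adjusted R2 penalizes R2 for the number of independent variables.
adj_r2 = 1 - (n - 1) / (n - k - 1) * (1 - r2)

print(f"Multiple R squared = {multiple_r ** 2:.6f}")
print(f"adjusted R2        = {adj_r2:.6f}")
```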
A) No, because the BG statistic is less than the critical test statistic of 3.49, we don't
have evidence of serial correlation.
B) No, because the BG statistic is less than the critical test statistic of 3.55, we don't
have evidence of serial correlation.
C) Yes, because the BG statistic exceeds the critical test statistic of 3.16, there is
evidence of serial correlation.
Gloucester subsequently revises the model to exclude the small cap index and finds that the
revised model has a RSS of 106.332. Which of the following statements is most accurate? At
a 5% level of significance, the test statistic:
A) of 13.39 indicates that we cannot reject the hypothesis that the coefficient of
small-cap index is significantly different from 0.
B) of 1.30 indicates that we cannot reject the hypothesis that the coefficient of
small-cap index is not significantly different from 0.
C) of 4.35 indicates that we cannot reject the hypothesis that the coefficient of
small-cap index is significantly different from 0.
In the month of January, if both the small and large capitalization index have a zero return,
we would expect the fund to have a return equal to:
A) 2.322.
B) 2.799.
C) 2.561.
Assuming (for this question only) that the F-test was significant but that the t-tests of the
independent variables were insignificant, this would most likely suggest:
A) serial correlation.
B) conditional heteroskedasticity.
C) multicollinearity.
Jessica Jenkins, CFA, is looking at the retail property sector for her manager. She is
undertaking a top down review as she feels this is the best way to analyze the industry
segment. To predict U.S. property starts (housing), she has used regression analysis.
Given these variables, the following output was generated from 30 years of data:
Exhibit 1 – Results from regressing housing starts (in millions) on interest rates and
GDP per capita
ANOVA df SS MSS F
Total 29 6.327
Observations 30
Durbin-Watson 1.27
Interest rate = 7%
Using the regression model represented in Exhibit 1, what is the predicted number of
housing starts for 20X7?
A) 1,751,000.
B) 1,394.
C) 1,394,420.
A) The residual standard error of only 0.3 indicates that the regression equation is a
good fit for the sample data.
B) The independent variables explain 61.58% of the variation in housing starts.
C) The large F-statistic indicates that both independent variables help explain changes
in housing starts.
Which of the following is the least appropriate statement in relation to R-square and
adjusted R-square?
A) Adjusted R-square decreases when the added independent variable adds little value
to the regression model.
B) Adjusted R-square can be higher than the coefficient of determination for a model
with a good fit.
C) R-square typically increases when new independent variables are added to the
regression.
One of the underlying assumptions of a multiple regression is that the variance of the
residuals is constant for various levels of the independent variables. This quality is referred
to as:
A) a linear relationship.
B) homoskedasticity.
C) a normal distribution.
A) heteroskedasticity.
B) positive serial correlation.
C) unstable remnant deviation.
Consider the following model of earnings (EPS) regressed against dummy variables for the
quarters:
where:
Which of the following statements regarding this model is most accurate? The:
A) coefficient on each dummy tells us about the difference in earnings per share
between the respective quarter and the one left out (first quarter in this case).
B) significance of the coefficients cannot be interpreted in the case of dummy
variables.
C) EPS for the first quarter is represented by the residual.
A multiple regression model has included independent variables that are not linearly related
to the dependent variable. The model is most likely misspecified due to:
A) incorrect data pooling.
B) incorrect variable scaling.
C) incorrect variable form.
When pooling the samples over multiple economic environments in a multiple regression
model, which of the following errors is most likely to occur?
A) Multicollinearity.
B) Heteroskedasticity.
C) Model misspecification.
A real estate agent wants to develop a model to predict the selling price of a home. The
agent believes that the most important variables in determining the price of a house are its
size (in square feet) and the number of bedrooms. Accordingly, he takes a random sample of
32 homes that have recently been sold. The results of the regression are:
R2 = 0.56; F = 40.73
df     1      2
28   4.20   3.34
29   4.18   3.33
30   4.17   3.32
32   4.15   3.29
(Degrees of freedom for the numerator in columns; degrees of freedom for the
denominator in rows)
The predicted price of a house that has 2,000 square feet of space and 4 bedrooms is closest
to:
A) $114,000.
B) $256,000.
C) $185,000.
The conclusion from the hypothesis test of H0: b1 = b2 = 0, is that the null hypothesis should:
A) not be rejected as the calculated F of 40.73 is greater than the critical value of 3.29.
B) be rejected as the calculated F of 40.73 is greater than the critical value of 3.33.
C) be rejected as the calculated F of 40.73 is greater than the critical value of 3.29.
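The decision rule behind this joint test can be sketched in Python. This is a minimal illustration rather than part of the original question: it assumes n = 32 observations and k = 2 slope coefficients (size and bedrooms), so the denominator degrees of freedom are n - k - 1 = 29, and it reads the critical value from the F-table excerpt supplied with the question. The names `f_table` and `joint_f_test` are hypothetical.

```python
# F-table excerpt from the question: {denominator df: {numerator df: critical F}}
f_table = {
    28: {1: 4.20, 2: 3.34},
    29: {1: 4.18, 2: 3.33},
    30: {1: 4.17, 2: 3.32},
    32: {1: 4.15, 2: 3.29},
}

def joint_f_test(f_stat, n, k, table):
    """Return (critical value, reject H0?) for H0: all slope coefficients = 0."""
    df_num = k            # one restriction per slope coefficient
    df_den = n - k - 1    # observations minus slopes minus the intercept
    critical = table[df_den][df_num]
    return critical, f_stat > critical

critical, reject = joint_f_test(f_stat=40.73, n=32, k=2, table=f_table)
print(critical, reject)  # 3.33 True
```

Looking up the row for 32 instead of 29 is the usual slip: 32 is the sample size, not the denominator degrees of freedom.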
Which of the following is most likely to present a problem in using this regression for
forecasting?
A) Heteroskedasticity.
B) Multicollinearity.
C) Autocorrelation.
Som Muttney has been asked to forecast the level of operating profit for a proposed new
branch of a tire store. His forecast is one component in forecasting operating profit for the
entire company for the next fiscal year. Muttney decides to conduct multiple regression
analysis using "branch store operating profit" as the dependent variable and three
independent variables. The three independent variables are "population within 5 miles of
the branch," "operating hours per week," and "square footage of the facility." Muttney used
data on the company's existing 23 branches to develop the model (n=23).
Two-tailed Significance
In his research report, Muttney claims that when the square footage of the store is
increased by 1%, operating profit will increase by more than 5%.
A) 0.081 − 8.66
B) −0.81 − 9.56
C) −0.086 − 8.83
The probability of finding a value of t for variable X1 that is as large as or larger than |2.133|
when the null hypothesis is true is:
The correlation between the actual values of operating profit and the predicted value of
operating profit is closest to:
A) 0.53
B) 0.76
C) 0.36
B) H0: b3 ≤ 5    Reject H0
C) H0: b3 ≤ 5    Fail to reject H0
A) 239.42
B) 15.47
C) 0.42
Salesi = 10.0 + 1.25 R&Di + 1.0 ADVi – 2.0 COMPi + 8.0 CAPi
A) If R&D and advertising expenditures are $1 million each, there are 5 competitors,
and capital expenditures are $2 million, expected Sales are $8.25 million.
B) One more competitor will mean $2 million less in Sales (holding everything else
constant).
C) If a company spends $1 million more on capital expenditures (holding everything
else constant), Sales are expected to increase by $8.0 million.
Which of the following statements least accurately describes one of the fundamental
multiple regression assumptions?
A) homoskedasticity.
B) autocorrelation.
C) heteroskedasticity.
A) If R&D and advertising expenditures are $1 million each and there are 5
competitors, expected sales are $9.5 million.
B) One more competitor will mean $3 million less in sales (holding everything else
constant).
C) If a company spends $1 more on R&D (holding everything else constant), sales are
expected to increase by $1.5 million.
An analyst is trying to determine whether fund return performance is persistent. The analyst
divides funds into three groups based on whether their return performance was in the top
third (group 1), middle third (group 2), or bottom third (group 3) during the previous year.
The analyst then creates the following equation: R = a + b1D1 + b2D2 + b3D3 + ε, where R is
return premium on the fund (the return minus the return on the S&P 500 benchmark) and Di
is equal to 1 if the fund is in group i. Assuming no other information, this equation will suffer
from:
A) multicollinearity.
B) heteroskedasticity.
C) serial correlation.
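Why an intercept plus one dummy per group causes trouble can be seen with a small, made-up sample. The sketch below is illustrative only: the six group assignments are hypothetical, and the point is simply that the three dummies always sum to the intercept column.

```python
# Hypothetical group membership for six funds (1 = top, 2 = middle, 3 = bottom third).
groups = [1, 2, 3, 1, 2, 3]

# Build one row per fund: [intercept, D1, D2, D3].
rows = [[1] + [1 if g == i else 0 for i in (1, 2, 3)] for g in groups]

# Every fund belongs to exactly one group, so D1 + D2 + D3 equals the intercept
# column in every row: the four columns are perfectly linearly dependent.
dummy_sums_match_intercept = all(r[1] + r[2] + r[3] == r[0] for r in rows)
print(dummy_sums_match_intercept)  # True

# Dropping one dummy (here D3) removes the exact dependence; the omitted group
# becomes the reference category captured by the intercept.
reduced_rows = [r[:3] for r in rows]
```

With n categories, the usual remedy is to include n - 1 dummies alongside the intercept.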
Jason Fye, CFA, wants to check for seasonality in monthly stock returns (i.e., the January
effect) after controlling for market cap and systematic risk. The type of model that Fye would
most appropriately select is:
Henry Hilton, CFA, is undertaking an analysis of the bicycle industry. He hypothesizes that
bicycle sales (SALES) are a function of three factors: the population under 20 (POP), the level
of disposable income (INCOME), and the number of dollars spent on advertising (ADV). All
data are measured in millions of units. Hilton gathers data for the last 20 years. Which of the
following regression equations correctly represents Hilton's hypothesis?
Which of the following statements regarding serial correlation that might be encountered in
regression analysis is least accurate?
A) Positive serial correlation and heteroskedasticity can both lead to Type I errors.
B) Serial correlation occurs least often with time series data.
C) Serial correlation does not affect consistency of regression coefficients.
A) The F-statistic for the test of the fit of the model is the ratio of the mean squared
regression to the mean squared error.
B) The R2 of a regression will be greater than or equal to the adjusted-R2 for the same
regression.
C) The R2 is the ratio of the unexplained variation to the explained variation of the
dependent variable.
Werner Baltz, CFA, has regressed 30 years of data to forecast future sales for National Motor
Company based on the percent change in gross domestic product (GDP) and the change in
retail price of a U.S. gallon of fuel. The results are presented below.
Predictor    Coefficient    Standard Error of the Coefficient
Intercept    78             13.710
Regression 291.30
Error 27 132.12
Total 29 423.42
If GDP rises 2.2% and the price of fuels falls $0.15, Baltz's model will predict Company sales
to be (in $ millions) closest to:
A) $82.00.
B) $128.00.
C) $206.00.
Baltz proceeds to test the hypothesis that none of the independent variables has significant
explanatory power. He concludes that, at a 5% level of significance:
A) all of the independent variables have explanatory power, because the calculated F-
statistic exceeds its critical value.
B) none of the independent variables has explanatory power, because the calculated F-
statistic does not exceed its critical value.
C) at least one of the independent variables has explanatory power, because the
calculated F-statistic exceeds its critical value.
A) computed F-statistic.
B) coefficient estimates.
C) computed t-statistic.
Regression 20 1 20
Error 80 20 4
Total 100 21
The F-statistic for a test of joint significance of all the slope coefficients is closest to:
A) 0.2.
B) 5.
C) 0.05.
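The arithmetic linking an ANOVA table to the joint F-statistic can be sketched as follows. The header row is missing from the table above, so reading its columns as sum of squares, degrees of freedom, and mean square is an assumption (it is consistent with 20/1 = 20 and 80/20 = 4). The function name `anova_summary` is hypothetical.

```python
def anova_summary(ss_regression, df_regression, ss_error, df_error):
    """Return (R-squared, F-statistic) from the sums of squares and their df."""
    ss_total = ss_regression + ss_error
    r_squared = ss_regression / ss_total
    msr = ss_regression / df_regression   # mean square regression
    mse = ss_error / df_error             # mean square error
    return r_squared, msr / mse

r2, f_stat = anova_summary(ss_regression=20, df_regression=1, ss_error=80, df_error=20)
print(r2, f_stat)  # 0.2 5.0
```

The same function applies to any ANOVA table of this shape; only the four inputs change.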
Manuel Mercado, CFA has performed the following two regressions on sales data for a given
industry. He wants to forecast sales for each quarter of the upcoming year.
Model ONE
Regression Statistics
Multiple R 0.941828
R2 0.887039
Adjusted R2 0.863258
Observations 24
ANOVA
df SS MS F Significance F
Total 23 1087.9583
Model TWO
Regression Statistics
Multiple R 0.941796
R2 0.886979
Adjusted R2 0.870026
Observations 24
df SS MS F Significance F
Total 23 1087.9584
Using Model ONE, what is the sales forecast for the second quarter of the next year?
A) $56.02 million.
B) $46.31 million.
C) $51.09 million.
Mercado probably did not include a fourth dummy variable Q4, which would have had 0, 0,
0, 1 as its first four observations because:
Vijay Shapule, CFA, is investigating the application of the Fama-French three-factor model
(Model 1) for the Indian stock market for the period 2001–2011 (120 months). Using the
dependent variable as annualized return (%), the results of the analysis are shown in Indian
Equities—Fama-French Model.
R-squared 0.36
SSE 38.00
AIC –129.99
BIC –118.84
Shapule then modifies the model to include a liquidity factor. Results for this four-factor
model (Model 2) are shown in Revised Fama-French Model With Liquidity Factor.
R-squared 0.39
SSE 34.00
AIC –141.34
BIC –127.40
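The AIC and BIC figures reported for the two models can be reproduced from n, k, and SSE. The sketch below assumes the forms AIC = n ln(SSE/n) + 2(k + 1) and BIC = n ln(SSE/n) + ln(n)(k + 1), with n = 120 months, k = 3 for Model 1, and k = 4 for Model 2; the function names are hypothetical.

```python
import math

def aic(n, k, sse):
    """AIC = n * ln(SSE / n) + 2 * (k + 1)."""
    return n * math.log(sse / n) + 2 * (k + 1)

def bic(n, k, sse):
    """BIC = n * ln(SSE / n) + ln(n) * (k + 1)."""
    return n * math.log(sse / n) + math.log(n) * (k + 1)

n = 120  # months in the sample
model_1 = {"k": 3, "sse": 38.00}  # three Fama-French factors
model_2 = {"k": 4, "sse": 34.00}  # adds the liquidity factor

for name, m in (("Model 1", model_1), ("Model 2", model_2)):
    print(name, round(aic(n, m["k"], m["sse"]), 2), round(bic(n, m["k"], m["sse"]), 2))
```

Under these formulas the results match the reported values to two decimals. Lower values are preferred on both criteria, and BIC penalizes the extra parameter more heavily than AIC whenever ln(n) exceeds 2.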
A) 0.36.
B) 0.39.
C) 0.37.
The F-statistic for testing H0: coefficient of LIQ = 0 versus Ha: coefficient of LIQ ≠ 0 is closest
to:
A) 2.11.
B) 13.33.
C) 5.45.
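A test of a single added variable can be framed as a comparison of nested models. The sketch below assumes the standard form F = [(SSE_restricted - SSE_unrestricted) / q] / [SSE_unrestricted / (n - k - 1)], with Model 1 as the restricted model, Model 2 as the unrestricted model, q = 1 excluded variable, n = 120, and k = 4 in the unrestricted model. The function name `nested_f` is hypothetical.

```python
def nested_f(sse_restricted, sse_unrestricted, q, n, k_unrestricted):
    """F-statistic for q restrictions, comparing nested regression models."""
    numerator = (sse_restricted - sse_unrestricted) / q
    denominator = sse_unrestricted / (n - k_unrestricted - 1)
    return numerator / denominator

f_liq = nested_f(38.00, 34.00, q=1, n=120, k_unrestricted=4)
print(round(f_liq, 2))  # 13.53
```

Unrounded, the statistic is about 13.53, which is nearest to the 13.33 choice.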
What is the predicted return for a stock using Model 1 when SMB = 3.30, HML = 1.25 and
Rm-Rf = 5?
A) 9.58%.
B) 6.80%.
C) 7.88%.
Assume that in a particular multiple regression model, it is determined that the error terms
are uncorrelated with each other. Which of the following statements is most accurate?
A) Serial correlation may be present in this multiple regression model, and can be
confirmed only through a Durbin-Watson test.
B) Unconditional heteroskedasticity present in this model should not pose a problem,
but can be corrected by using robust standard errors.
C) This model is in accordance with the basic assumptions of multiple regression
analysis because the errors are not serially correlated.
Alex Wade, CFA, is analyzing the result of a regression analysis comparing the performance
of gold stocks versus a broad equity market index. Wade believes that first lag serial
correlation may be present and, in order to prove his theory, should use which of the
following methods to detect its presence?
May Jones estimated a regression that produced the following analysis of variance (ANOVA)
table:
Regression 20 1 20
Error 80 40 2
Total 100 41
The values of R2 and the F-statistic for joint test of significance of all the slope coefficients
are:
Henry Hilton, CFA, is undertaking an analysis of the bicycle industry. He hypothesizes that
bicycle sales (SALES) are a function of three factors: the population under 20 (POP), the level
of disposable income (INCOME), and the number of dollars spent on advertising (ADV). All
data are measured in millions of units. Hilton gathers data for the last 20 years and
estimates the following equation (standard errors in parentheses):
The critical t-statistic for a 95% confidence level is 2.120. Which of the independent variables
is statistically different from zero at the 95% confidence level?
A fund has changed managers twice during the past 10 years. An analyst wishes to measure
whether either of the changes in managers has had an impact on performance. R is the
return on the fund, and M is the return on a market index. Which of the following regression
equations can appropriately measure the desired impacts?
Turner plans to use the result in the analysis of two investments. WLK Corp. has twelve
analysts following it and a market capitalization of $2.33 billion. NGR Corp. has two analysts
following it and a market capitalization of $47 million.
Variable               Coefficient    Standard Error of the Coefficient    t-statistic    p-value
Ln(No. of Analysts)    −0.027         0.00466                              −5.80          < 0.001
Table 2: ANOVA
The 95% confidence interval (use a t-stat of 1.96 for this question only) of the estimated
coefficient for the independent variable Ln(Market Value) is closest to:
A) 0.014 to -0.009.
B) -0.018 to -0.036.
C) 0.011 to 0.001.
If the number of analysts on NGR Corp. were to double to 4, the change in the forecast of
NGR would be closest to:
A) −0.055.
B) −0.019.
C) −0.035.
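Because the regressor is the natural log of the number of analysts, a doubling changes it by ln(4) - ln(2) = ln(2), whatever the starting level. A minimal sketch using the coefficient of -0.027 from the table (the function name is hypothetical):

```python
import math

coef_ln_analysts = -0.027  # coefficient on Ln(No. of Analysts) from the table

def forecast_change(coef, old_count, new_count):
    """Change in the fitted value when only the logged regressor changes."""
    return coef * (math.log(new_count) - math.log(old_count))

change = forecast_change(coef_ln_analysts, old_count=2, new_count=4)
print(round(change, 3))  # -0.019
```

The market-value term drops out of the difference because it is held constant.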
Based on a R2 calculated from the information in Table 2, the analyst should conclude that
the number of analysts and ln(market value) of the firm explain:
What is the F-statistic from the regression? And, what can be concluded from its value at a
1% level of significance?
A) F = 1.97, fail to reject a hypothesis that both of the slope coefficients are equal to
zero.
B) F = 5.80, reject a hypothesis that both of the slope coefficients are equal to zero.
C) F = 17.00, reject a hypothesis that both of the slope coefficients are equal to zero.
Upon further analysis, Turner concludes that multicollinearity is a problem. What might have
prompted this further analysis and what is the intuition behind the conclusion?
A) At least one of the t-statistics was not significant, the F-statistic was significant, and
an intercept not significantly different from zero would be expected.
B) At least one of the t-statistics was not significant, the F-statistic was not significant,
and a positive relationship between the number of analysts and the size of the firm
would be expected.
C) At least one of the t-statistics was not significant, the F-statistic was significant, and a
positive relationship between the number of analysts and the size of the firm would
be expected.
Consider the following estimated regression equation, with standard errors of the
coefficients as indicated:
Salesi = 10.0 + 1.25 R&Di + 1.0 ADVi − 2.0 COMPi + 8.0 CAPi
where the standard error for R&D is 0.45, the standard error for ADV is 2.2, the
standard error for COMP is 0.63, and the standard error for CAP is 2.5.
Sales are in millions of dollars. An analyst is given the following predictions on the
independent variables: R&D = 5, ADV = 4, COMP = 10, and CAP = 40.
A) $320.25 million.
B) $310.25 million.
C) $300.25 million.
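The question stem for these choices is missing, but they appear to ask for a point forecast of Sales. Assuming so, the forecast is the intercept plus each coefficient times its predicted input, as in this short sketch (the variable names are illustrative):

```python
coefficients = {"R&D": 1.25, "ADV": 1.0, "COMP": -2.0, "CAP": 8.0}
intercept = 10.0
inputs = {"R&D": 5, "ADV": 4, "COMP": 10, "CAP": 40}

# Point forecast: intercept plus the sum of coefficient * input over all regressors.
predicted_sales = intercept + sum(coefficients[name] * inputs[name] for name in coefficients)
print(predicted_sales)  # 320.25 (in $ millions)
```

The standard errors play no role in the point forecast; they matter only for inference about the individual coefficients.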
Peter Pun, an enrolled candidate for the CFA Level II examination, has decided to perform a
calendar test to examine whether there is any abnormal return associated with investments
and disinvestments made in blue-chip stocks on particular days of the week. As a proxy for
blue-chips, he has decided to use the S&P 500 Index. The analysis will involve the use of
dummy variables and is based on the past 780 trading days. Here are selected findings of his
study:
RSS 0.0039
SSE 0.9534
SST 0.9573
R-squared 0.004
SEE 0.035
Jessica Jones, CFA, a friend of Peter, overhears that he is interested in regression analysis
and warns him that whenever heteroskedasticity is present in multiple regression, it could
undermine the regression results. She mentions that one easy way to spot conditional
heteroskedasticity is through a scatter plot, but she adds that there is a more formal test.
Unfortunately, she can't quite remember its name. Jessica believes that heteroskedasticity
can be rectified using White-corrected standard errors. Her son Jonathan, who has also taken
part in the discussion, hears this comment and argues that White corrections would typically
reduce the number of Type I errors in financial data.
What can be said of the overall explanatory power of the model at the 5% significance level?
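The reported summary statistics can be cross-checked with a few lines of Python. In this table RSS denotes the regression sum of squares (RSS + SSE = SST). The coefficient table is not reproduced above, so the number of dummy variables is an assumption: the sketch uses k = 4 (five trading days with one omitted), which is consistent with the reported SEE of 0.035.

```python
rss, sse, sst = 0.0039, 0.9534, 0.9573
n = 780
k = 4  # ASSUMPTION: four weekday dummies (five trading days, one omitted)

r_squared = rss / sst                      # explained share of total variation
see = (sse / (n - k - 1)) ** 0.5           # standard error of estimate
f_stat = (rss / k) / (sse / (n - k - 1))   # joint test of all slope coefficients
print(round(r_squared, 3), round(see, 3), round(f_stat, 2))  # 0.004 0.035 0.79
```

Under that assumption the F-statistic is below 1, far short of the 5% critical value of roughly 2.4 for these degrees of freedom.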
Are Jessica and her son Jonathan correct in terms of the method used to correct for
heteroskedasticity and the likely effects?
A) Neither is correct.
B) Both are correct.
C) One is correct.
Which of the following statements regarding the results of a regression analysis is least
accurate? The:
A) slope coefficient in a multiple regression is the value of the dependent variable for a
given value of the independent variable.
B) slope coefficient in a multiple regression is the change in the dependent variable for
a one-unit change in the independent variable, holding all other variables constant.
C) slope coefficients in the multiple regression are referred to as partial betas.
A) Scatter plot.
B) Breusch-Godfrey test.
C) Breusch-Pagan test.
An analyst is trying to estimate the beta for a fund. The analyst estimates a regression
equation in which the fund returns are the dependent variable and the Wilshire 5000 is the
independent variable, using monthly data over the past five years. The analyst finds that the
correlation between the square of the residuals of the regression and the Wilshire 5000 is
0.2. Which of the following is most accurate, assuming a 0.05 level of significance? There is:
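One way to work this setup is the Breusch-Pagan approach: regress the squared residuals on the independent variable and compare n times the R-squared of that regression with a chi-square critical value. The sketch below assumes 60 monthly observations (five years) and uses the fact that, with a single regressor, the R-squared equals the squared correlation. The variable names are illustrative.

```python
n = 60                      # monthly observations over five years
corr = 0.2                  # correlation of squared residuals with the index return
r_squared_aux = corr ** 2   # R-squared of the one-regressor auxiliary regression
bp_stat = n * r_squared_aux

chi2_critical_5pct_1df = 3.84  # chi-square critical value, 1 df, 5% level
print(round(bp_stat, 2), bp_stat > chi2_critical_5pct_1df)  # 2.4 False
```

A statistic below the critical value means the null of no conditional heteroskedasticity is not rejected.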
Quin Tan Liu, CFA, is looking at the retail property sector for her manager. She is
undertaking a top down review as she feels this is the best way to analyze the industry
segment. To predict U.S. property starts (housing), she has used regression analysis.
Given these variables the following output was generated from 30 years of data:
Exhibit 1 – Results from Regressing Housing Starts (in Millions) on Interest Rates and
GDP Per Capita
ANOVA df SS MSS F
Observations 30
Durbin-Watson 1.22
Interest rate = 7%
Using the regression model represented in Exhibit 1, what is the predicted number of
housing starts for 20X7?
A) 1,394
B) 1,394,420
C) 1,751,000
A) The residual standard error of only 0.3 indicates that the regression equation is a
good fit for the sample data.
B) The independent variables explain 61.58% of the variation in housing starts.
C) The large F-statistic indicates that both independent variables help explain changes
in housing starts.
Which of the following is the least appropriate statement in relation to R-square and
adjusted R-square:
A) Adjusted R-square decreases when the added independent variable adds little value
to the regression model.
B) Adjusted R-square is a value between 0 and 1 and can be interpreted as a
percentage.
C) R-square typically increases when new independent variables are added to the
regression regardless of their explanatory power.
William Brent, CFA, is the chief financial officer for Mega Flowers, one of the largest
producers of flowers and bedding plants in the Western United States. Mega Flowers grows
its plants in three large nursery facilities located in California. Its products are sold in its
company-owned retail nurseries as well as in large home and garden "super centers." For its
retail stores, Mega Flowers has designed and implemented marketing plans each season
that are aimed at its consumers in order to generate additional sales for certain high-margin
products. To fully implement the marketing plan, additional contract salespeople are
seasonally employed.
For the past several years, these marketing plans seemed to be successful, providing a
significant boost in sales to those specific products highlighted by the marketing efforts.
However, for the past year, revenues have been flat, even though marketing expenditures
increased slightly. Brent is concerned that the expensive seasonal marketing campaigns are
simply no longer generating the desired returns, and should either be significantly modified
or eliminated altogether. He proposes that the company hire additional, permanent
salespeople to focus on selling Mega Flowers' high-margin products all year long. The chief
operating officer, David Johnson, disagrees with Brent. He believes that although last year's
results were disappointing, the marketing campaign has demonstrated impressive results
for the past five years, and should be continued. His belief is that the prior years'
performance can be used as a gauge for future results, and that a simple increase in the
sales force will not bring about the desired results.
Brent gathers information regarding quarterly sales revenue and marketing expenditures for
the past five years. Based upon historical data, Brent derives the following regression
equation for Mega Flowers (stated in millions of dollars):
Brent shows the equation to Johnson and tells him, "This equation shows that a $1 million
increase in marketing expenditures will increase the independent variable by $1.6 million,
all other factors being equal." Johnson replies, "It also appears that sales will equal $12.6
million if all independent variables are equal to zero."
Using data from the past 20 quarters, Brent calculates the t-statistic for marketing
expenditures to be 3.68 and the t-statistic for salespeople at 2.19. At a 5% significance level,
the two-tailed critical values are tc = +/- 2.127. This most likely indicates that:
Brent calculated that the sum of squared errors (SSE) for the variables is 267. The mean
squared error (MSE) would be:
A) 14.831.
B) 14.055.
C) 15.706.
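The mean squared error divides SSE by the error degrees of freedom, n - k - 1. A minimal sketch, taking n = 20 quarters and k = 2 independent variables (marketing expenditures and salespeople) from the vignette:

```python
sse = 267
n = 20   # quarters of data
k = 2    # independent variables: marketing expenditures and salespeople

mse = sse / (n - k - 1)   # error degrees of freedom = 17
print(round(mse, 3))  # 15.706
```

Dividing by n - 1 or n - k instead produces the other two answer choices.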
Brent is trying to explain the concept of the standard error of estimate (SEE) to Johnson. In
his explanation, Brent makes three points about the SEE:
Point 1: The SEE is the standard deviation of the differences between the estimated
values for the independent variables and the actual observations for the independent
variable.
Point 2: Any violation of the basic assumptions of a multiple regression model is going
to affect the SEE.
Point 3: If there is a strong relationship between the variables and the SSE is small, the
individual estimation errors will also be small.
Assuming that next year's marketing expenditures are $3,500,000 and there are five
salespeople, predicted sales for Mega Flowers will be:
A) $24,200,000.
B) $11,600,000.
C) $24,000,000.
Brent would like to further investigate whether at least one of the independent variables can
explain a significant portion of the variation of the dependent variable. Which of the
following methods would be best for Brent to use?
A) An ANOVA table.
B) The multiple coefficient of determination.
C) The F-statistic.
Which of the following statements most accurately interprets the following regression
results at the given significance level?
Variable p-value
Intercept 0.0201
X1 0.0284
X2 0.0310
X3 0.0143
A) The variables X1 and X2 are statistically significantly different from zero at the 2%
significance level.
B) The variable X3 is statistically significantly different from zero at the 2% significance
level.
C) The variable X2 is statistically significantly different from zero at the 3% significance
level.
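The p-value rule is mechanical: a coefficient is significant at level alpha when its p-value is below alpha. A short sketch using the p-values from the table (the function name `significant_at` is hypothetical):

```python
p_values = {"Intercept": 0.0201, "X1": 0.0284, "X2": 0.0310, "X3": 0.0143}

def significant_at(alpha, p_values):
    """Names whose p-value is below alpha (reject H0: coefficient = 0)."""
    return [name for name, p in p_values.items() if p < alpha]

print(significant_at(0.02, p_values))  # ['X3']
print(significant_at(0.03, p_values))  # ['Intercept', 'X1', 'X3']
```

X2 misses the 3% cutoff narrowly, since 0.0310 is greater than 0.03.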
Lynn Carter, CFA, is an analyst in the research department for Smith Brothers in New York.
She follows several industries, as well as the top companies in each industry. She provides
research materials for both the equity traders for Smith Brothers as well as their retail
customers. She routinely performs regression analysis on those companies that she follows
to identify any emerging trends that could affect investment decisions.
Due to recent layoffs at the company, there has been some consolidation in the research
department. Two research analysts have been laid off, and their workload will now be
distributed among the remaining four analysts. In addition to her current workload, Carter
will now be responsible for providing research on the airline industry. Pinnacle Airlines, a
leader in the industry, represents a large holding in Smith Brothers' portfolio. Looking back
over past research on Pinnacle, Carter recognizes that the company historically has been a
strong performer in what is considered to be a very competitive industry. The stock price
over the last 52-week period has outperformed that of other industry leaders, although
Pinnacle's net income has remained flat. Carter wonders if the stock price of Pinnacle has
become overvalued relative to its peer group in the market, and wants to determine if the
timing is right for Smith Brothers to decrease its position in Pinnacle.
Carter decides to run a regression analysis, using the monthly returns of Pinnacle stock as
the dependent variable and monthly returns of the airlines industry as the independent
variable.
Source    df (Degrees of Freedom)    SS (Sum of Squares)    Mean Square (SS/df)
A) 0.916, indicating that the variability of industry returns explains about 91.6% of the
variability of company returns.
B) 0.084, indicating that the variability of industry returns explains about 8.4% of the
variability of company returns.
C) 0.839, indicating that company returns explain about 83.9% of the variability of
industry returns.
Based upon her analysis, Carter has derived the following regression equation: Ŷ = 1.75 +
3.25X1. The predicted value of the Y variable equals 50.50, if the:
Carter realizes that although regression analysis is a useful tool when analyzing investments,
there are certain limitations. Carter made a list of points describing limitations that Smith
Brothers equity traders should be aware of when applying her research to their investment
decisions.
When reviewing Carter's list, one of the Smith Brothers' equity traders points out that not all
of the points describe regression analysis limitations. Which of Carter's points most
accurately describes the limitations to regression analysis?
A) Points 1, 3, and 4.
B) Points 2, 3, and 4.
C) Points 1, 2, and 3.
Using a recent analysis of salaries (in $1,000) of financial analysts, Timbadia runs a
regression of salaries on education, experience, and gender. (Gender equals one for men
and zero for women.) The regression results from a sample of 230 financial analysts are
presented below, with t-statistics in parentheses.
Timbadia also runs a multiple regression to gain a better understanding of the relationship
between lumber sales, housing starts, and commercial construction. The regression uses a
large data set of lumber sales as the dependent variable with housing starts and commercial
construction as the independent variables. The results of the regression are:
Finally, Timbadia runs a regression between the returns on a stock and its industry index
with the following results:
What is the expected salary (in $1,000) of a woman with 16 years of education and 10 years
of experience?
A) 65.48.
B) 54.98.
C) 59.18.
If the return on the industry index is 4%, the stock's expected return would be:
A) 7.6%.
B) 11.2%.
C) 9.7%.
The percentage of the variation in the stock return explained by the variation in the industry
index return is closest to:
A) 84.9%.
B) 72.1%.
C) 63.2%.
Suppose the analyst wants to add a dummy variable for whether a person has a business
college degree and an engineering degree. What is the CORRECT representation if a person
has both degrees?
     Business Degree     Engineering Degree
     Dummy Variable      Dummy Variable
A)   1                   1
B)   0                   1
C)   0                   0