Stoc2517 ch10 343-350 C1
CHAPTER 10
Conducting a Regression Study Using Economic Data
time efficiently and to avoid some potentially frustrating pitfalls. This chapter
provides suggestions for how to conduct a regression study using economic data
and how to report the results of that study.
One reason that this book repeatedly uses the California test score data is
to illustrate the steps involved in undertaking a serious empirical application:
becoming familiar with the data, developing and estimating a base regression
specification, thinking through potential omitted variables, modeling relevant
nonlinearities, performing sensitivity analysis, assessing the internal and external
validity of the findings, and reporting the results and their limitations. This
chapter steps back from the details of the test score application and describes
the main steps in conducting an empirical analysis and reporting the results.
It is important to approach an empirical analysis with an open mind. It is
tempting to think that your goal should be a high adjusted R² or an estimated
coefficient of interest that is economically large and statistically significant. But
this is not the purpose of a thoughtful empirical analysis; instead, the purpose is
to answer a specific question while using your best judgment and being honest
about what the data do and do not tell you. The coefficient of interest might be
large and well estimated; it might be small and well estimated; or it might just be
imprecisely estimated because of limitations of the data or because the question
being asked is a very difficult one. Reaching any one of these conclusions,
whether it confirms your prior suspicions or not, is interesting and helps you
better understand the topic you are researching.
In our analysis of the California test score data, if our objective had been to
find a large coefficient, we might have stopped at the regression of test scores
against the student-teacher ratio and never included any control variables. But,
upon reflection, it became clear that that estimate was subject to considerable
omitted variable bias, which was addressed by including the control variables.
By the end of the analysis, we had concluded that the class size effect, while
statistically significant, is economically small, a conclusion confirmed using a
different observational data set (the Massachusetts data). The distinction between
trying to measure an effect as reliably as possible and trying to prove that the
effect is important is subtle, but it can make the difference between a
study that is credible and one that is not.
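The omitted variable bias described above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the data are simulated rather than taken from the California data set, the variable names (`str_`, `pct_el`, `score`) and every numerical value are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients of y on X (X must already contain a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 5000

# Hypothetical district data (simulated; all numbers are assumptions).
pct_el = rng.uniform(0, 50, n)                       # percent English learners
str_ = 18 + 0.08 * pct_el + rng.normal(0, 1.5, n)    # class size, correlated with pct_el
score = 700 - 1.0 * str_ - 0.6 * pct_el + rng.normal(0, 10, n)

const = np.ones(n)
short = ols(score, np.column_stack([const, str_]))           # omits pct_el
long_ = ols(score, np.column_stack([const, str_, pct_el]))   # includes the control

print(f"Class size coefficient, no controls:  {short[1]:.2f}")
print(f"Class size coefficient, with control: {long_[1]:.2f}")
```

Because the omitted variable is positively correlated with class size and lowers scores, the regression without controls attributes part of its effect to class size and overstates the magnitude of the class size coefficient. Adding the control should bring the estimate close to the value of -1.0 used to generate the data.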
10.2 Collecting Data
10.3
methods require some extensions and modifications. It is beyond the scope of this
brief edition to go into those extensions in detail. However, if you have mastered
the material in this book, it is not a big step to learn the necessary modifications
to handle panel data and time series data. If you are interested in learning more
about such data sets, see Chapter 10 (panel data) or Chapters 14 and 15 (time
series data) in the full edition of this book.
guidelines for whether to include a variable in a regression are given in Key Concept 9.2. Some of the alternative specifications might investigate possible nonlinearities, as illustrated in the analysis in Section 8.4 of the California test score data.
After you have some regression results, it is useful to go through the checklist
of threats to internal validity in Key Concept 9.7. Are there arguably important
threats to the internal validity of your study? If so, can you address them using
multiple regression analysis of your data?
At this point, it is useful to share your findings with a classmate or instructor.
The process of explaining what you have done and what you have found will help
you think through any shortcomings of the analysis (your classmate or instructor
can help with this too), and this in turn will point to additional specifications and
additional sensitivity analysis to undertake. In this way, conducting an empirical
analysis is a process that goes through multiple iterations.
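One way to organize this iteration is to estimate the base and alternative specifications side by side and compare what each implies for the same real-world change. The following is a minimal sketch, assuming simulated data, NumPy, and three hypothetical specifications (a base regression, one with a control variable, and one that adds a quadratic term); none of the names or numbers come from the test score application.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Simulated, hypothetical data: the true relation is linear in class size.
pct_el = rng.uniform(0, 50, n)
str_ = 18 + 0.08 * pct_el + rng.normal(0, 1.5, n)
score = 700 - 1.0 * str_ - 0.6 * pct_el + rng.normal(0, 10, n)

def design(str_vals, el_vals, spec):
    """Build the regressor matrix for one of three hypothetical specifications."""
    cols = [np.ones_like(str_vals), str_vals]
    if spec in ("with control", "control + quadratic"):
        cols.append(el_vals)
    if spec == "control + quadratic":
        cols.append(str_vals ** 2)
    return np.column_stack(cols)

def effect_of_cut(beta, spec, start=20.0, end=18.0):
    """Predicted change in score when class size moves from start to end,
    holding the control variable fixed."""
    fixed_el = np.array([25.0])
    x_start = design(np.array([start]), fixed_el, spec)
    x_end = design(np.array([end]), fixed_el, spec)
    return float(((x_end - x_start) @ beta)[0])

effects = {}
for spec in ("base", "with control", "control + quadratic"):
    beta, *_ = np.linalg.lstsq(design(str_, pct_el, spec), score, rcond=None)
    effects[spec] = effect_of_cut(beta, spec)
    print(f"{spec:<22} effect of cutting class size 20 -> 18: {effects[spec]:+.2f}")
```

Comparing predicted effects rather than raw coefficients keeps the specifications comparable once nonlinear terms are included, since the coefficient on class size alone no longer summarizes the effect in the quadratic specification.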
10.4
a careful discussion of the results, including assessments both of statistical significance and of economic significance, that is, the magnitude of the estimated
relations in a real-world sense. Present full disclosure results: report results
that you consider to be an honest and complete summary of what the data say
concerning your question of interest, including results that raise doubts about
or suggest limitations of your interpretation.
The empirical analyses of the test score data in Sections 7.6 and 8.4 provide examples of discussions of base and alternative specifications. This section of your paper should also contain a discussion of the potential threats to
the validity of your analysis. Key Concept 9.7 provides a list of five potential
threats to internal validity of regression studies using observational data. Some
of those threats might not be relevant to your study, and this section should
focus on the most salient threats. All empirical analyses have limitations, and
it is important to provide a concise statement of what you consider to be the
most substantial limitations of your analysis.
5. Summary and Discussion. This section summarizes your main empirical findings and discusses their implications for the original question of interest.
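The distinction between statistical and economic significance drawn in the results discussion above can be made concrete with a short calculation. This is a sketch under stated assumptions: the function `summarize_effect` and the numbers passed to it (coefficient, standard error, size of the class size change, and standard deviation of test scores) are hypothetical, and the 1.96 critical value assumes a large-sample normal approximation at the 5% level.

```python
def summarize_effect(beta_hat, se, delta_x, sd_y, crit=1.96):
    """Summarize a coefficient both statistically and in real-world terms."""
    t_stat = beta_hat / se
    effect = beta_hat * delta_x              # predicted change in Y for the change in X
    return {
        "t": t_stat,
        "ci": (beta_hat - crit * se, beta_hat + crit * se),
        "significant": abs(t_stat) > crit,   # statistical significance
        "effect": effect,
        "effect_in_sd": effect / sd_y,       # economic significance, in SDs of Y
    }

# Hypothetical values: a coefficient of -1.0 (standard error 0.4), a cut of
# 2 students per teacher, and a test score standard deviation of 19 points.
result = summarize_effect(beta_hat=-1.0, se=0.4, delta_x=-2.0, sd_y=19.0)
for key, value in result.items():
    print(key, value)
```

With these assumed inputs the coefficient is statistically significant, yet the predicted gain from the class size cut is only about a tenth of a standard deviation of test scores. Results of this kind are statistically significant but economically small.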
The guidelines in this chapter for conducting an empirical study are summarized in Key Concept 10.1.
KEY CONCEPT 10.1
AN
The following guidelines can help you be efficient when you undertake an empirical study.
1. Choose a topic that interests you personally.
2. Develop a few narrow questions and think through an empirical analysis that
would answer them. For each question, what base specification would you use?
What is the key regressor and what is the regression coefficient of interest?
What might be important sources of omitted variable bias?
3. Learn about relevant data sets by consulting a data librarian or the Web (see
www.aw-bc.com/stock_watson).
4. Narrow your question further. Will your candidate data set plausibly help you
to estimate the parameter of interest?
5. Format the data so that they can be read into your statistical software.
6. Compute summary statistics, scatterplots, and other data diagnostics. Correct
or discard outliers arising from data entry or computer errors.
7. Conduct your regression analysis:
a. Estimate your base regression.
b. Estimate alternative specifications that address potential nonlinearity
and omitted variable bias.
c. Assess the threats to the internal validity of your analysis using the list in
Key Concept 9.7.
d. Explain to a classmate or instructor what you have done, why you have
done it, and what you have found.
e. Repeat steps a through d until you are satisfied that you have addressed, as best
you can, the main threats to the internal validity of your analysis.
8. Write up your results using the outline in Section 10.4. Discuss the statistical
and economic (real-world) significance of your results, report full disclosure
results, and discuss any remaining threats to internal and external validity.
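The data diagnostics in guideline 6 can be sketched with the Python standard library. The example below is illustrative only: the values in `str_values` are invented, the helper names are hypothetical, and the rule of flagging points more than three interquartile ranges outside the quartiles is one assumed convention among several. Flagged observations should be inspected by hand before anything is corrected or discarded.

```python
import statistics

def summarize(values):
    """Basic summary statistics for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def flag_outliers(values, k=3.0):
    """Return indices of observations more than k IQRs outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical student-teacher ratios; 195.0 mimics a misplaced decimal point.
str_values = [19.5, 21.0, 18.2, 20.4, 17.9, 195.0, 22.1, 19.0, 20.7, 18.8]

summary = summarize(str_values)
flagged = flag_outliers(str_values)
print(summary)
print("Observations to inspect:", [(i, str_values[i]) for i in flagged])
```

A mean far above the median, as in this invented sample, is an early sign that a few extreme values deserve a closer look before any regression is run.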