The document discusses the assumptions of the Gauss-Markov theorem and the linear regression model. It outlines the key assumptions required for ordinary least squares (OLS) estimators to be best linear unbiased estimators: that the regression is linear in parameters, the regressors are fixed, and the errors have zero mean, constant variance, and zero correlation with one another. It also defines R² as a measure of goodness of fit, noting that it ranges from 0 to 1 and is generally higher for time-series than for cross-sectional data. Excel regression summary outputs, such as the total, explained, and residual sums of squares, are also presented.
Consider the following simple linear regression equation: Yᵢ = β₁ + β₂Xᵢ + uᵢ
The Gauss-Markov assumptions are the following:
1. The regression equation is linear in the parameters (the β's).
2. X is non-random/non-stochastic (fixed in repeated samples).
3. The error term has zero mean: E(uᵢ) = 0.
4. The error term is homoscedastic: E(uᵢ²) = σ².
5. Zero autocorrelation between errors: Cov(uᵢ, uⱼ) = E(uᵢuⱼ) = 0 for i ≠ j.
6. Zero covariance between u and X: Cov(u, X) = 0.
7. The number of observations is greater than the number of parameters to be estimated: n > k, where k is the number of regressors including the intercept.
8. Not all values of X are the same: Var(X) > 0.
Gauss-Markov theorem: Given the above assumptions, the OLS estimators are Best Linear Unbiased Estimators (BLUE).
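The setup above can be sketched in code. This is a minimal illustration (not part of the original notes): it simulates data satisfying the assumptions, with true parameters β₁ = 1 and β₂ = 2 chosen here for the example, and computes the OLS estimates from the normal equations b = (X′X)⁻¹X′Y.

```python
import numpy as np

# Simulate data consistent with the Gauss-Markov assumptions
# (illustrative values; the true parameters here are beta1 = 1, beta2 = 2).
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)      # X fixed in repeated samples, Var(X) > 0
u = rng.normal(0, 1, n)        # errors: zero mean, constant variance, uncorrelated
y = 1.0 + 2.0 * x + u

# OLS via the normal equations: b = (X'X)^{-1} X'Y
X = np.column_stack([np.ones(n), x])    # design matrix with intercept column
b = np.linalg.solve(X.T @ X, X.T @ y)   # (beta1_hat, beta2_hat)
print(b)
```

Under the assumptions listed above, repeating this simulation with fresh errors gives estimates centered on the true (β₁, β₂), which is the unbiasedness part of BLUE.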
R² is a measure of goodness of fit (also known as the coefficient of determination).
Remember the following:
1. R² lies between 0 and 1.
2. R² = r²(Y, Ŷ), the squared correlation between Y and the fitted values Ŷ.
3. R² is generally high in time-series data.
4. R² is generally low in cross-sectional data.
5. R² is most meaningful in OLS estimation with an intercept (it can lie outside the [0, 1] interval if estimated without an intercept).
6. R² values from two regression equations with different dependent variables cannot be compared.
Excel output
TSS = Σ(Yᵢ − Ȳ)²   [Total sum of squares, df = n − 1]
ESS = Σ(Ŷᵢ − Ȳ)²   [Regression sum of squares, df = k − 1];  MS = ESS / (k − 1)
RSS = Σûᵢ² = Σ(Yᵢ − Ŷᵢ)²   [Residual sum of squares, df = n − k];  MS = RSS / (n − k)
R² = ESS / TSS = 1 − RSS / TSS
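The sums-of-squares decomposition above can be verified numerically. This is a minimal sketch (not an actual Excel output) on simulated data with illustrative parameter values: it checks that TSS = ESS + RSS, computes the mean squares, and confirms that R² equals both ESS/TSS and the squared correlation between Y and Ŷ.

```python
import numpy as np

# Simulated data (illustrative values), k = 2 regressors including the intercept
rng = np.random.default_rng(1)
n, k = 40, 2
x = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)

# Fit by OLS and compute fitted values
X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b

tss = np.sum((y - y.mean()) ** 2)      # total SS, df = n - 1
ess = np.sum((y_hat - y.mean()) ** 2)  # regression SS, df = k - 1
rss = np.sum((y - y_hat) ** 2)         # residual SS, df = n - k
ms_reg = ess / (k - 1)                 # regression mean square
ms_res = rss / (n - k)                 # residual mean square
r2 = ess / tss                         # with an intercept, equals 1 - rss/tss
```

Because the regression includes an intercept, TSS decomposes exactly into ESS + RSS, and R² falls in [0, 1].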