
Problem Set 7 (With Instructions) : Regression Statistics


Problem Set 7 (With instructions)

Chapter 12

File: SWA_PS7_S14 in the PS7 Laulima folder. Group or individual.

1. Obtain the equation to predict Revenue (Y) by using Flights as an independent (X) variable and
answer the following questions. Use the data for Flights and Revenues in the Data1 tab. (3 pts)
Use PHStat – Regression - Simple Linear Regression as shown below to answer the
questions. Use the Coefficients table in the “COMPUTE” tab.

Simple Linear Regression Analysis

Regression Statistics
Multiple R             0.8091
R Square               0.6547
Adjusted R Square      0.6414
Standard Error         390.4261
Observations           28

ANOVA
              df    SS               MS               F         Significance F
Regression     1    7514368.4793     7514368.4793     49.2964   0.0000
Residual      26    3963246.1992     152432.5461
Total         27    11477614.6786

              Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%    Lower 95%
Intercept     -4465.1587     1023.5165        -4.3626   0.0002    -6569.0269   -2361.2905   -6569.0269
Flights       0.0252         0.0036           7.0211    0.0000    0.0178       0.0325       0.0178

a. What is the equation that predicts revenue?

Ŷ = b0 + b1X
Predicted Revenue ($M) = -4465.1587 + 0.0252 × Flights
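A minimal Python sketch that evaluates the fitted equation with the rounded coefficients from the table above (the names `B0`, `B1` and `predict_revenue` are hypothetical, chosen for illustration). Because the slope is rounded to 0.0252, it gives about 3624.04 $M at 321,000 flights, a little above the 3610.73 $M that PHStat reports using the unrounded slope.

```python
# Rounded coefficients copied from the PHStat Coefficients table.
B0 = -4465.1587  # intercept, $M
B1 = 0.0252      # slope, $M per flight (rounded to 4 decimals)


def predict_revenue(flights: float) -> float:
    """Return predicted revenue ($M) for a given number of flights."""
    return B0 + B1 * flights


if __name__ == "__main__":
    # The rounded slope gives about 3624.04 $M at 321,000 flights;
    # PHStat's unrounded slope gives 3610.73 $M.
    print(round(predict_revenue(321_000), 4))
```

Rounding matters here: a slope error of only 0.00004 $M per flight shifts the prediction by roughly 13 $M at this X value.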

b. What is the value of R2 ? Interpret its meaning.

R² = 0.6547. This means that 65.47% of the variation in Revenue is explained by the variation in the number
of Flights; the remaining 34.53% of the variation is due to factors not included in the model.
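As a quick cross-check (a sketch, not part of the PHStat output), R² and adjusted R² can be recomputed from the ANOVA sums of squares, since R² = SSR / SST and adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1):

```python
# ANOVA sums of squares from the PHStat output.
SSR = 7514368.4793   # regression sum of squares
SST = 11477614.6786  # total sum of squares
N = 28               # observations
K = 1                # number of independent variables (Flights)

r_square = SSR / SST
adj_r_square = 1 - (1 - r_square) * (N - 1) / (N - K - 1)

print(f"R Square = {r_square:.4f}")
print(f"Adjusted R Square = {adj_r_square:.4f}")
```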

c. How do you find the correlation coefficient for revenue and Flights? Interpret its
meaning.
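The correlation coefficient is the square root of R², taking the sign of the slope: r = +√0.6547 = 0.8091
(the Multiple R value, positive because the slope 0.0252 is positive). This indicates a strong positive
linear relationship between Revenue and Flights.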
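A small sketch of that rule in Python: Multiple R is always non-negative, so the sign of r is taken from the slope, here with `math.copysign`.

```python
import math

R_SQUARE = 0.6547  # from Regression Statistics
B1 = 0.0252        # slope; its sign gives the sign of r

r = math.copysign(math.sqrt(R_SQUARE), B1)
print(f"r = {r:.4f}")
```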

d. What is the standard error in the regression stats section? Interpret its meaning.

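The standard error of the estimate is S_YX = 390.4261 ($M). It measures the typical distance between actual
revenue and the revenue predicted by the regression line, so actual revenue typically differs from the
predicted value by about $390.43 million.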
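A minimal sketch of where the 390.4261 comes from: in simple linear regression S_YX = √(SSE / (n - 2)), the square root of the residual MS in the ANOVA table.

```python
import math

SSE = 3963246.1992  # residual sum of squares (ANOVA)
N = 28              # observations

syx = math.sqrt(SSE / (N - 2))  # standard error of the estimate, $M
print(f"S_YX = {syx:.4f}")
```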

2. Check the 4 L.I.N.E. assumptions for making inferences in regression. You need 2 plots here as well
as skewness/kurtosis and the Durbin-Watson statistic. For each assumption state whether you feel
the assumption is OK or not OK and state why you decided the way you did. (3 pts)

Use the one-page regression handout or the Excel files in your Laulima support files folder, the
lecture notes, or the text to help you interpret whether or not these 4 assumptions are
satisfied. Include Excel results in your answers!

a. Residuals against Flights (X) to check linearity. This plot is given in the Residual Plot sheet.

[Figure: Residual Plot, Residuals vs. Flights (X); X axis 0 to 400,000, Residuals axis -600 to 1600; trendline R² = 7.7E-32]

b. Residuals against Predicted Y to check constant variability. Do you have constant variability?
The data is in the Residuals sheet. Copy and paste the Predicted Y and the Residuals values
next to each other with the Predicted Values series first. Highlight both of those data series
and use Insert - Scatter Plot.

[Figure: Residuals plotted against Predicted Y; Predicted Y axis 0 to 5000, Residuals axis -600 to 1600]

The points appear randomly scattered with no discernible pattern, so the constant variability (equal variance) assumption appears OK.

c. Skewness and kurtosis to check normality of residuals; report conclusion.


Descriptive Summary

                        Flights            Residuals
Mean                    284895.5357        -4.22265E-13
Median                  287211             -125.6581272
Mode                    #N/A               #N/A
Minimum                 249119             -473.9149332
Maximum                 349191             1316.93538
Range                   100072             1790.850313
Variance                439701915.3690     146786.8963
Standard Deviation      20969.0704         383.1278
Coeff. of Variation     7.36%              -90731512241726900.00%
Skewness                1.0484             1.6206
Kurtosis                2.4813             3.8009
Count                   28                 28
Standard Error          3962.7818          72.4043

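The residuals' skewness (1.6206) and kurtosis (3.8009) are both outside the range -1 to +1, so the residuals do not appear to be normally distributed. The normality assumption is not OK.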
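A sketch of how these two statistics are typically computed, assuming PHStat's Descriptive Summary uses the same bias-adjusted formulas as Excel's SKEW() and KURT() (KURT reports excess kurtosis, so a normal distribution gives about 0). The function names are hypothetical, and the ±1 check is the rule of thumb used in this problem set.

```python
from statistics import mean, stdev


def excel_skew(xs):
    """Sample skewness with the same adjustment as Excel's SKEW()."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def excel_kurt(xs):
    """Sample excess kurtosis with the same adjustment as Excel's KURT()."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    total = sum(((x - m) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * total
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def looks_normal(skew, kurt, limit=1.0):
    """Rule of thumb: both skewness and kurtosis within -limit to +limit."""
    return abs(skew) <= limit and abs(kurt) <= limit


# Residual skewness and kurtosis from the Descriptive Summary.
print(looks_normal(1.6206, 3.8009))  # False
```

Applied to the 28 residuals from the Residuals sheet, `excel_skew` and `excel_kurt` should reproduce the 1.6206 and 3.8009 shown above if the formula assumption holds.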

d. The Durbin-Watson statistic to check independence of residuals. Are the residuals independent?
Read the D-W value in the “DurbinWatson” sheet.

Durbin-Watson Calculations
Sum of Squared Difference of Residuals    3271417.7756
Sum of Squared Residuals                  3963246.1992
Durbin-Watson Statistic                   0.8254

The D-W value is 0.8254, which is below 1.3, so there is evidence of positive autocorrelation: the residuals
are not independent, and the independence assumption is not OK.
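The D-W statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch (the function name is hypothetical); the last lines reproduce 0.8254 from the two totals in the DurbinWatson sheet.

```python
def durbin_watson(residuals):
    """D = sum of (e_t - e_(t-1))^2 over sum of e_t^2, residuals in time order."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Using the two totals from the DurbinWatson sheet directly:
d = 3271417.7756 / 3963246.1992
print(f"D = {d:.4f}")
```

Values near 2 suggest no autocorrelation; values well below 2 (like 0.8254) mean neighboring residuals tend to have the same sign, which is positive autocorrelation.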

e. Given your answers above, report on the validity and reliability of making predictions using this
regression model.

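The residuals do not appear to be normally distributed (part c), and with a sample of only 28 observations (fewer
than 30) we cannot rely on a large sample to offset this. Also, since the D-W value (0.8254) is below 1.3, there is
significant positive autocorrelation, so the independence assumption is violated. Predictions from this model are
therefore of questionable validity and reliability, and a simple linear regression model is probably not the best
model to use.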

3. Answer the following inference questions. (2 pts)

You can use the results in the COMPUTE tab.

a. Is there a significant linear relationship between Revenue and Flights? Use a level of significance
of 0.05. Make sure to include hypotheses and a statement on how confident you are in H1.

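H0: β1 = 0 (no linear relationship between Flights and Revenue)
H1: β1 ≠ 0 (there is a linear relationship between Flights and Revenue)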

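t STAT = 7.0211 with p-value = 0.0000 < 0.05 (and t STAT exceeds the critical value 2.0555 for 26 degrees of
freedom), so reject H0. There is a significant linear relationship between Revenue and Flights. Since the p-value
is essentially 0, we are more than 99.99% confident in H1.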
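A sketch of the same test computed from the ANOVA table, relying on the fact that in simple linear regression t² = F for the slope test. The critical value 2.055529439 is the t Value for 26 degrees of freedom from the CIE and PI sheet; no statistics library is assumed.

```python
import math

SSR = 7514368.4793    # regression sum of squares
SSE = 3963246.1992    # residual sum of squares
N = 28
T_CRIT = 2.055529439  # two-tailed t critical value, df = 26, alpha = 0.05

msr = SSR / 1          # regression MS (1 df)
mse = SSE / (N - 2)    # residual MS (26 df)
f_stat = msr / mse
t_stat = math.sqrt(f_stat)  # t^2 = F; sign follows the slope, which is positive

reject_h0 = abs(t_stat) > T_CRIT
print(f"F = {f_stat:.4f}, t = {t_stat:.4f}, reject H0: {reject_h0}")
```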

b. State and interpret the 95% interval estimate for the population slope. Make sure you include the
units (using revenue ($M) and number of flights).

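The 95% confidence interval for the population slope is 0.0178 to 0.0325 ($M per flight). We are 95% confident that each
additional flight increases mean revenue by between $0.0178 million and $0.0325 million (about $17,800 to $32,500),
with a point estimate of 0.0252 $M ($25,200) per flight.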
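A sketch that rebuilds the slope interval b1 ± t·S_b1 without the 4-decimal rounding. It assumes the Sum of Squared Differences from XBar is 11871951715 (consistent with the Flights variance 439701915.3690 × 27), uses SSR = b1²·SSX (true in simple regression) to recover the unrounded slope, and S_b1 = S_YX / √SSX.

```python
import math

SSR = 7514368.4793     # regression sum of squares
SSX = 11871951715      # sum of squared differences of Flights from XBar
SYX = 390.4261084      # standard error of the estimate
T_CRIT = 2.055529439   # t critical value, df = 26

b1 = math.sqrt(SSR / SSX)      # unrounded slope (positive, as in the output)
s_b1 = SYX / math.sqrt(SSX)    # standard error of the slope
half_width = T_CRIT * s_b1
lower, upper = b1 - half_width, b1 + half_width
print(f"b1 = {b1:.4f}, 95% CI: {lower:.4f} to {upper:.4f} ($M per flight)")
```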

c. State and interpret the estimate for the population intercept. Make sure you include the units
using revenue ($M). Discuss if the intercept is significant and relevant/valid as a predictor.

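The estimate for the population intercept is -4465.1587 $M (standard error 1023.5165 $M), with a 95% confidence interval
of -6569.0269 to -2361.2905 $M. Since t STAT = -4.3626 and the p-value = 0.0002 < 0.05, the intercept is statistically
significant. Literally, it is the predicted revenue when there are 0 flights, but 0 flights is far outside the observed
range (249,119 to 349,191 flights) and negative revenue is impossible, so the intercept has no meaningful interpretation
and is not valid as a predictor on its own.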
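A small sketch of the intercept test and interval from the Coefficients table, plus a check (with hypothetical names) that X = 0 lies outside the observed Flights range, which is why interpreting the intercept is an extrapolation.

```python
B0 = -4465.1587       # intercept, $M
S_B0 = 1023.5165      # standard error of the intercept
T_CRIT = 2.055529439  # t critical value, df = 26
X_MIN, X_MAX = 249119, 349191  # observed Flights range (Descriptive Summary)

t_b0 = B0 / S_B0
lower = B0 - T_CRIT * S_B0
upper = B0 + T_CRIT * S_B0
extrapolating = not (X_MIN <= 0 <= X_MAX)  # True: 0 flights was never observed

print(f"t = {t_b0:.4f}, 95% CI: {lower:.4f} to {upper:.4f} $M")
print(f"X = 0 is an extrapolation: {extrapolating}")
```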

d. Using your results in (1),(2) and (a) above, explain how good this model is for predicting revenue.
If there are problems, explain what might be done to address them.

This model is only moderately useful for predicting revenue. Problem 1 showed a strong positive relationship between
Revenue and Flights (r = 0.8091, R² = 0.6547), and part (a) confirmed that the linear relationship is significant.
However, problem 2 showed that the residuals are not normally distributed and are not independent (D-W = 0.8254), so
the confidence and prediction intervals from this model may not be reliable. To address these problems, the sample size
could be increased to get a more accurate understanding.

4. Obtain and interpret a 95% interval estimate for Revenue in q1 2012 when Flights = 321000 (1 pt)

Use the CIE and PI sheet.

The 95% prediction interval is given at the bottom of the CIE and PI tab under “For Individual Response Y”.
Write a one-sentence conclusion interpreting your interval. Make sure you include revenue and Flights and
the correct units of the problem (Revenue is in millions of dollars). Include Excel results.

Confidence Interval Estimate

Data
X Value                                    321000
Confidence Level                           95%

Intermediate Calculations
Sample Size                                28
Degrees of Freedom                         26
t Value                                    2.055529439
XBar, Sample Mean of X                     284895.5357
Sum of Squared Differences from XBar       11871951715
Standard Error of the Estimate             390.4261084
h Statistic                                0.145513615
Predicted Y (YHat)                         3610.72794

For Average Y
Interval Half Width                        306.1360
Confidence Interval Lower Limit            3304.5920
Confidence Interval Upper Limit            3916.863915

For Individual Response Y
Interval Half Width                        858.9397
Prediction Interval Lower Limit            2751.7882
Prediction Interval Upper Limit            4469.667649

If there are 321,000 flights in Q1 2012, we are 95% confident that revenue for that quarter will be between
$2,751.7882 million and $4,469.6676 million.
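A sketch of how the CIE and PI sheet arrives at these limits, using the usual formulas h = 1/n + (X - XBar)²/SSX, confidence half width t·S_YX·√h and prediction half width t·S_YX·√(1 + h). All inputs are copied from the sheet; only the variable names are illustrative.

```python
import math

N = 28
X_VALUE = 321000
XBAR = 284895.5357
SSX = 11871951715      # sum of squared differences from XBar
SYX = 390.4261084      # standard error of the estimate
T_CRIT = 2.055529439   # df = 26
YHAT = 3610.72794      # PHStat's predicted Y at X = 321000

h = 1 / N + (X_VALUE - XBAR) ** 2 / SSX
ci_half = T_CRIT * SYX * math.sqrt(h)       # interval for mean revenue
pi_half = T_CRIT * SYX * math.sqrt(1 + h)   # interval for one quarter's revenue

print(f"h = {h:.6f}")
print(f"CI: {YHAT - ci_half:.4f} to {YHAT + ci_half:.4f} $M")
print(f"PI: {YHAT - pi_half:.4f} to {YHAT + pi_half:.4f} $M")
```

The extra 1 under the square root is why the prediction interval for a single quarter is much wider than the confidence interval for average revenue.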
