
Using Excel

This document provides instructions for using basic statistical analysis tools in Microsoft Excel, including histograms, descriptive statistics, simple linear regression, and hypothesis testing. It includes step-by-step explanations and examples using sample housing price and square footage data. Exercises guide the reader to construct histograms, frequency distributions, descriptive statistics, scatter plots, linear regression models, and t-tests for linear relationships using the Excel Data Analysis tools (Analysis ToolPak).

Basic Business Statistics:

Using Microsoft Excel

1. Histograms in Excel

• Step 1: Select Tools / Data Analysis
• Step 2: Choose Histogram
• Step 3: Input data and bin ranges, then select Chart Output

Exercise 1:
• Given below are the heights, in centimetres, of 50 students:

164 155 160 162 172 171 162 160 162 159
160 158 166 172 158 163 165 164 161 158
160 170 168 157 168 166 160 162 163 167
171 164 167 158 159 160 163 167 168 159
160 162 170 168 164 160 168 165 165 160

Exercise 1 (continued)
• 1. Place the data in an ordered array.
• 2. Set up a stem-and-leaf display for these data.
• 3. Construct the frequency distribution and the percentage distribution for these data.
• 4. Construct a grouped frequency distribution table with a class interval width of 5 cm.
• 5. From this frequency distribution table, construct the bar graphs and the pie charts.
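
As a cross-check for step 4, the grouped frequency distribution can be sketched outside Excel. The snippet below is a minimal illustration, not part of the original exercise: the function name `grouped_frequencies` and the choice to start the first class at 155 cm (the smallest height) are assumptions. In Excel's Histogram tool, bin values of 159, 164, 169 and 174 should give the same classes, since each bin counts values up to and including the bin value.

```python
from collections import Counter

# Heights (cm) of the 50 students, copied from Exercise 1.
heights = [
    164, 155, 160, 162, 172, 171, 162, 160, 162, 159,
    160, 158, 166, 172, 158, 163, 165, 164, 161, 158,
    160, 170, 168, 157, 168, 166, 160, 162, 163, 167,
    171, 164, 167, 158, 159, 160, 163, 167, 168, 159,
    160, 162, 170, 168, 164, 160, 168, 165, 165, 160,
]


def grouped_frequencies(values, start, width):
    """Return (lower, upper, count, percent) rows for equal-width integer classes."""
    # Class index 0 covers start..start+width-1, index 1 the next class, and so on.
    counts = Counter((v - start) // width for v in values)
    rows = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        upper = lower + width - 1  # inclusive upper limit for integer data
        freq = counts.get(k, 0)
        rows.append((lower, upper, freq, 100.0 * freq / len(values)))
    return rows


# Assumption: the first class starts at 155 cm; the width of 5 cm is from the exercise.
table = grouped_frequencies(heights, start=155, width=5)
for lower, upper, freq, pct in table:
    print(f"{lower}-{upper} cm  frequency={freq:2d}  percentage={pct:4.1f}%")
```

The frequency column feeds the bar graph and the percentage column feeds the pie chart in step 5.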

2. Descriptive statistics
• Use menu choice: Tools / Data Analysis / Descriptive Statistics
• Enter details in the dialog box
• Check the box for Summary statistics
• Click OK

2. Descriptive statistics (continued)
Microsoft Excel
descriptive statistics output,
using the house price data:
House Prices:

$2,000,000
500,000
300,000
100,000
100,000
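
The original Excel output for these five prices was an image and did not survive extraction. As a rough stand-in, the sketch below computes a few of the summary measures with Python's standard `statistics` module. The dictionary keys are illustrative names, not Excel's labels, and only a subset of Excel's summary statistics is covered.

```python
import statistics

# House prices from the slide, in dollars.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

summary = {
    "mean": statistics.mean(house_prices),
    "median": statistics.median(house_prices),
    "mode": statistics.mode(house_prices),
    # Sample standard deviation (divides by n - 1), the usual choice for sample data.
    "sample_std_dev": statistics.stdev(house_prices),
    "range": max(house_prices) - min(house_prices),
    "sum": sum(house_prices),
    "count": len(house_prices),
}

for name, value in summary.items():
    print(f"{name:>15}: {value:,.2f}")
```

The mean is pulled far above the median by the single $2,000,000 house, which is why the median is often preferred for skewed price data.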

3. Simple Linear Regression
Sample Data for House Price Model:

House Price in $1000s (y)    Square Feet (x)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

3. Simple linear regression
• Tools / Data Analysis / Regression

3. Simple linear regression: Excel Output

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)
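
The intercept and slope in the Coefficients column can be reproduced from the ten sample observations with the ordinary least-squares formulas. This is a minimal sketch for checking the output, not a description of how Excel computes it internally; the helper name `fit_line` is an assumption.

```python
# Sample data from the House Price Model slide.
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]            # y, in $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]   # x


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


b0, b1 = fit_line(sqft, price)
print(f"house price = {b0:.5f} + {b1:.5f} (square feet)")
```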
3. Regression Excel Output (continued) (Graphical Presentation)
• House price model: scatter plot and regression line

[Figure: scatter plot of House Price ($1000s) against Square Feet (axis from 0 to 3000) with the fitted regression line; slope = 0.10977, intercept = 98.248]

house price = 98.24833 + 0.10977 (square feet)

3. Simple Linear Regression (Interpretation of b0, b1)

house price = 98.24833 + 0.10977 (square feet)

– Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet
– Here, b1 = 0.10977 tells us that the value of a house increases by 0.10977 ($1000) = $109.77, on average, for each additional square foot of size

3. Regression Excel Output: Estimate
Estimated Regression Equation:

house price = 98.25 + 0.1098 (sq. ft.)

Predict the price for a house with 2000 square feet:

house price = 98.25 + 0.1098 (sq. ft.)
            = 98.25 + 0.1098 (2000)
            = 317.85

The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850
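
The prediction step can be written as a small function. The sketch below uses the rounded coefficients from this slide and also flags whether the input lies inside the observed range of square footage (1100 to 2450 in the sample data), because the fitted line is only supported by houses within that range. The function name and the range flag are illustrative additions.

```python
# Rounded coefficients from the estimated regression equation on this slide.
B0 = 98.25    # intercept, in $1000s
B1 = 0.1098   # slope, in $1000s per square foot

# Smallest and largest square footage in the sample data.
X_MIN, X_MAX = 1100, 2450


def predict_price(square_feet):
    """Return (predicted price in $1000s, True if the size is within the observed range)."""
    within_range = X_MIN <= square_feet <= X_MAX
    return B0 + B1 * square_feet, within_range


predicted, within_range = predict_price(2000)
print(f"Predicted price: {predicted:.2f} ($1000s) = ${predicted * 1000:,.0f}")
print(f"Within observed range of sizes: {within_range}")
```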
3. Regression Excel Output: Coefficient of Determination, R²

From the Excel output:
R Square               0.58082
Regression SS (SSR)    18934.9348
Total SS (SST)         32600.5000

$R^2 = \dfrac{SSR}{SST} = \dfrac{18934.9348}{32600.5000} = 0.58082$

58.08% of the variation in house prices is explained by variation in square feet
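
The sums of squares behind R² can be recomputed from the sample data. The sketch below is one way to do it under the usual definitions (SST = SSR + SSE); the variable names are illustrative, and the adjusted R² line uses the standard formula for a model with one independent variable.

```python
# Sample data from the House Price Model slide.
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]            # y, in $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]   # x

n = len(price)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

# Least-squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in sqft]

sst = sum((y - y_bar) ** 2 for y in price)               # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in fitted)              # regression sum of squares
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))   # residual sum of squares

r_squared = ssr / sst
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)

print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}, SST = {sst:.4f}")
print(f"R Square = {r_squared:.5f}, Adjusted R Square = {adjusted_r_squared:.5f}")
```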
3. Simple Linear Regression: t Test for the Slope, b1
• t test for a population slope
– Is there a linear relationship between x and y?
• Hypotheses
– H0: β1 = 0 (no linear relationship)
– H1: β1 ≠ 0 (linear relationship does exist)
• Test statistic
– $t = \dfrac{b_1 - \beta_1}{s_{b_1}}$
– d.f. = n − 2
where:
$b_1$ = sample regression slope coefficient
$\beta_1$ = hypothesized slope
$s_{b_1}$ = estimator of the standard error of the slope

3. t Test for the Slope: the standard error in the Excel Output

sε = 41.33032    (Regression Statistics: Standard Error)
sb1 = 0.03297    (Square Feet row: Standard Error)

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

3. t Test for the Slope (continued)

H0: β1 = 0
H1: β1 ≠ 0

From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039

Test Statistic: t = b1 / sb1 = 3.329
d.f. = 10 − 2 = 8
α/2 = .025 in each tail; critical values −tα/2 = −2.3060 and tα/2 = 2.3060

[Figure: t distribution with rejection regions below −2.3060 and above 2.3060; the test statistic 3.329 falls in the upper rejection region]

Decision: Reject H0
Conclusion: There is sufficient evidence that square footage affects house price
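
The standard errors and the t statistic can also be derived from the raw data. The following is a sketch under the standard simple-regression formulas; the critical value 2.3060 is taken from the slide rather than computed, and the variable names are illustrative.

```python
import math

# Sample data from the House Price Model slide.
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]            # y, in $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]   # x

n = len(price)
x_bar = sum(sqft) / n
y_bar = sum(price) / n
sxx = sum((x - x_bar) ** 2 for x in sqft)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate: sqrt(SSE / (n - 2)).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
s_e = math.sqrt(sse / (n - 2))

# Standard error of the slope, then t for H0: beta1 = 0.
s_b1 = s_e / math.sqrt(sxx)
t_stat = (b1 - 0) / s_b1

T_CRITICAL = 2.3060  # two-tailed critical value for alpha = .05, d.f. = 8 (from the slide)
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"s_e = {s_e:.5f}, s_b1 = {s_b1:.5f}, t = {t_stat:.5f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```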
4. Multiple Regression: Example

• A distributor of frozen dessert pies wants to evaluate factors thought to influence demand
– Dependent variable: Pie sales (units per week)
– Independent variables: Price (in $), Advertising ($100's)
• Data are collected for 15 weeks

Multiple Linear Regression Equation

Regression Statistics
Multiple R           0.72213
R Square             0.52148
Adjusted R Square    0.44172
Standard Error       47.46341
Observations         15

ANOVA
              df    SS           MS           F         Significance F
Regression    2     29460.027    14730.013    6.53861   0.01201
Residual      12    27033.306    2252.776
Total         14    56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
4. Multiple Regression: (continued)
Multiple Linear Regression Equation

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

where
Sales is in number of pies per week
Price is in $
Advertising is in $100's.

• b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising
• b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price

4. Multiple Regression: (continued)
Using The Model to Make Predictions

Predict sales for a week in which the selling price is $5.50 and advertising is $350:

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62

Note that Advertising is in $100's, so $350 means that x2 = 3.5

Predicted sales is 428.62 pies
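
The unit conversion is the easiest place to slip in this prediction, so the sketch below takes advertising in plain dollars and converts it to $100's before applying the equation. The function name and its dollar-based interface are assumptions made for illustration; the coefficients are the rounded values from the slide.

```python
# Rounded coefficients from the multiple regression equation on the slide.
B0 = 306.526
B_PRICE = -24.975        # pies per week per $1 of price
B_ADVERTISING = 74.131   # pies per week per $100 of advertising


def predict_pie_sales(price_dollars, advertising_dollars):
    """Predicted weekly pie sales; advertising is given in dollars and converted to $100's."""
    advertising_hundreds = advertising_dollars / 100.0  # $350 becomes x2 = 3.5
    return B0 + B_PRICE * price_dollars + B_ADVERTISING * advertising_hundreds


predicted_sales = predict_pie_sales(price_dollars=5.50, advertising_dollars=350)
print(f"Predicted sales: {predicted_sales:.2f} pies per week")
```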
4. Multiple Regression: (continued)
Multiple Coefficient of Determination

From the Excel output:
R Square               0.52148
Regression SS (SSR)    29460.027
Total SS (SST)         56493.333

$R^2 = \dfrac{SSR}{SST} = \dfrac{29460.0}{56493.3} = 0.52148$

52.1% of the variation in pie sales is explained by the variation in price and advertising
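
R Square and Adjusted R Square can both be recovered from the ANOVA sums of squares. This sketch copies SSR and SSE from the Excel output rather than recomputing them from the data, and uses the standard adjustment for k independent variables; the variable names are illustrative.

```python
# Values copied from the ANOVA table in the Excel output.
SSR = 29460.027   # regression sum of squares
SSE = 27033.306   # residual sum of squares
n = 15            # observations (weeks)
k = 2             # independent variables (Price, Advertising)

sst = SSR + SSE   # total sum of squares
r_squared = SSR / sst
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"SST = {sst:.3f}")
print(f"R Square = {r_squared:.5f}")
print(f"Adjusted R Square = {adjusted_r_squared:.5f}")
```

The adjusted value is lower because it penalizes the model for the number of independent variables relative to the sample size.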
4. Multiple Regression: (continued)
Correlation matrix

Week   Pie Sales   Price ($)   Advertising ($100s)
1      350         5.50        3.3
2      460         7.50        3.3
3      350         8.00        3.0
4      430         8.00        4.5
5      350         6.80        3.0
6      380         7.50        4.0
7      430         4.50        3.0
8      470         6.40        3.7
9      450         7.00        3.5
10     490         5.00        4.0
11     340         7.20        3.5
12     300         7.90        3.2
13     440         5.90        4.0
14     450         5.00        3.5
15     300         7.00        2.7

Multiple regression model:
Sales = b0 + b1 (Price) + b2 (Advertising)

Correlation matrix:
              Pie Sales   Price     Advertising
Pie Sales     1
Price         -0.44327    1
Advertising   0.55632     0.03044   1
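
With only two independent variables, b0, b1 and b2 can be found by solving the two normal equations in centered form with Cramer's rule, which avoids any matrix library. This is a sketch for checking the Excel coefficients against the 15 weeks of data; it is not how Excel's Regression tool is implemented, and the helper names are assumptions.

```python
# (pie sales, price in $, advertising in $100s) for weeks 1 to 15, from the data table.
weeks = [
    (350, 5.50, 3.3), (460, 7.50, 3.3), (350, 8.00, 3.0), (430, 8.00, 4.5),
    (350, 6.80, 3.0), (380, 7.50, 4.0), (430, 4.50, 3.0), (470, 6.40, 3.7),
    (450, 7.00, 3.5), (490, 5.00, 4.0), (340, 7.20, 3.5), (300, 7.90, 3.2),
    (440, 5.90, 4.0), (450, 5.00, 3.5), (300, 7.00, 2.7),
]
sales = [w[0] for w in weeks]
price = [w[1] for w in weeks]
advertising = [w[2] for w in weeks]


def mean(values):
    return sum(values) / len(values)


def centered_sum(u, v):
    """Sum of products of deviations from the means of u and v."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


s11 = centered_sum(price, price)
s22 = centered_sum(advertising, advertising)
s12 = centered_sum(price, advertising)
s1y = centered_sum(price, sales)
s2y = centered_sum(advertising, sales)

# Solve  s11*b1 + s12*b2 = s1y  and  s12*b1 + s22*b2 = s2y  by Cramer's rule.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(sales) - b1 * mean(price) - b2 * mean(advertising)

print(f"Sales = {b0:.3f} + ({b1:.3f})(Price) + {b2:.3f}(Advertising)")
```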
4. Multiple Regression: (continued)
Correlation matrix

              Pie Sales   Price     Advertising
Pie Sales     1
Price         -0.44327    1
Advertising   0.55632     0.03044   1

• Price vs. Sales: r = -0.44327
– There is a negative association between price and sales
• Advertising vs. Sales: r = 0.55632
– There is a positive association between advertising and sales
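
Each entry of the correlation matrix is a Pearson correlation coefficient between two columns of the weekly data. The sketch below recomputes the three off-diagonal entries; `pearson_r` is a hand-written helper, used here as an assumption in place of Excel's Correlation tool.

```python
import math

# Weekly data (weeks 1 to 15) from the data table.
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00,
         7.20, 7.90, 5.90, 5.00, 7.00]
advertising = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
               3.5, 3.2, 4.0, 3.5, 2.7]


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


r_price_sales = pearson_r(price, sales)
r_adv_sales = pearson_r(advertising, sales)
r_price_adv = pearson_r(price, advertising)

print(f"Price vs. Sales:        r = {r_price_sales:.5f}")
print(f"Advertising vs. Sales:  r = {r_adv_sales:.5f}")
print(f"Price vs. Advertising:  r = {r_price_adv:.5f}")
```

The near-zero correlation between price and advertising suggests the two independent variables carry largely separate information.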
4. Multiple Regression: t-tests of individual variable slopes, b1 and b2

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888

t-value for Price is t = -2.306, with p-value .0398
t-value for Advertising is t = 2.855, with p-value .0145
4. Multiple Regression: t-tests of individual variable slopes, b1 and b2

H0: βi = 0
H1: βi ≠ 0

From Excel output:
              Coefficients   Standard Error   t Stat     P-value
Price         -24.97509      10.83213         -2.30565   0.03979
Advertising   74.13096       25.96732         2.85478    0.01449

d.f. = 15 − 2 − 1 = 12
α = .05 (α/2 = .025 in each tail)
tα/2 = 2.1788

[Figure: t distribution with rejection regions below −2.1788 and above 2.1788]

The test statistic for each variable falls in the rejection region (p-values < .05)
Decision: Reject H0 for each variable
Conclusion: There is evidence that both Price and Advertising affect pie sales at α = .05
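
The decision rule can be applied mechanically to each slope. In the sketch below, the coefficients, standard errors and the critical value 2.1788 are copied from the slide rather than computed, and the dictionary layout is only an illustrative choice.

```python
# Coefficient and standard error for each independent variable, from the Excel output.
excel_output = {
    "Price": (-24.97509, 10.83213),
    "Advertising": (74.13096, 25.96732),
}

T_CRITICAL = 2.1788  # two-tailed critical value for alpha = .05, d.f. = 12 (from the slide)

results = {}
for variable, (coefficient, std_error) in excel_output.items():
    t_stat = coefficient / std_error          # test statistic for H0: beta_i = 0
    reject_h0 = abs(t_stat) > T_CRITICAL      # two-tailed rejection rule
    results[variable] = (t_stat, reject_h0)
    decision = "Reject H0" if reject_h0 else "Do not reject H0"
    print(f"{variable}: t = {t_stat:.5f} -> {decision}")
```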
4. Multiple Regression: Standard Deviation of the Regression Model

From the Excel output (Regression Statistics):
Standard Error    47.46341

The standard deviation of the regression model is 47.46
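
The Standard Error reported under Regression Statistics is the square root of the residual mean square. A short sketch using the residual sum of squares from the ANOVA table (variable names are illustrative):

```python
import math

SSE = 27033.306   # residual sum of squares, from the ANOVA table
n = 15            # observations
k = 2             # independent variables

mse = SSE / (n - k - 1)      # residual mean square, d.f. = 12
std_error = math.sqrt(mse)   # standard deviation of the regression model

print(f"MSE = {mse:.3f}, Standard Error = {std_error:.5f}")
```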
4. Multiple Regression: F-Test for Overall Significance of the Model

$F = \dfrac{MSR}{MSE} = \dfrac{14730.0}{2252.8} = 6.5386$

With 2 and 12 degrees of freedom. The P-value for the F-Test is the Significance F in the ANOVA table:

ANOVA
              df    SS           MS           F         Significance F
Regression    2     29460.027    14730.013    6.53861   0.01201
Residual      12    27033.306    2252.776
Total         14    56493.333

4. Multiple Regression: (continued)
F-Test for Overall Significance of the Model

H0: β1 = β2 = 0
H1: β1 and β2 not both zero
α = .05
df1 = 2, df2 = 12

Test Statistic: $F = \dfrac{MSR}{MSE} = 6.5386$
Critical value: F.05 = 3.885

[Figure: F distribution with the rejection region (α = .05) to the right of F.05 = 3.885]

Decision: Reject H0 at α = 0.05
Conclusion: The regression model does explain a significant portion of the variation in pie sales (there is evidence that at least one independent variable affects y)
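
The F statistic follows directly from the ANOVA sums of squares and their degrees of freedom. In this sketch the sums of squares and the critical value F.05 = 3.885 are copied from the slides, not computed from a distribution; the variable names are illustrative.

```python
# Values copied from the ANOVA table in the Excel output.
SSR = 29460.027   # regression sum of squares
SSE = 27033.306   # residual sum of squares
n = 15            # observations
k = 2             # independent variables

df1 = k           # numerator degrees of freedom
df2 = n - k - 1   # denominator degrees of freedom

msr = SSR / df1
mse = SSE / df2
f_stat = msr / mse

F_CRITICAL = 3.885  # F.05 with 2 and 12 degrees of freedom (from the slide)
reject_h0 = f_stat > F_CRITICAL

print(f"MSR = {msr:.3f}, MSE = {mse:.3f}, F = {f_stat:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```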
5. ANOVA
• Tools / Data Analysis / Anova: Single Factor
