Excel Guide
Michael Pahlow
Green River College
December 2019
Green River College students can download and install a free copy of Office 365. For
instructions on how to do this, go to https://libguides.greenriver.edu/logon/msoffice.
Another helpful add-in is XLMiner Data Visualization. In Office 365, go to the Insert tab
and click the Get Add-ins button.
In the Office Add-ins window that pops up, search for xlminer, then add the XLMiner Data
Visualization add-in. You can also add the Analysis ToolPak add-in this way.
On the Insert tab, there are now additional chart options.
Descriptive Statistics
Enter data or copy and paste data into a column in Excel. The data below is from
https://seattlecentral.edu/qelp/sets/005/005.html
Next, click Data Analysis on the Data tab and choose Descriptive Statistics.
In the Descriptive Statistics dialog box, select the Input Range by highlighting the desired
column of data values. In this example, it is the column of numbers below ozone (ppm). Then set
the output options: select the Output Range option, click in the box to its right, and select a
blank cell to the right of the data. Check the Summary Statistics box and click OK.
Here are the results.
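The Summary Statistics table can be cross-checked outside Excel. The Python sketch below recomputes the same rows for one column of values. The sample values and the function name summary_statistics are hypothetical (they are not the ozone data), and the skewness and kurtosis lines assume the sample-adjusted formulas documented for Excel's SKEW and KURT functions.

```python
import math
from statistics import mean, median, multimode, stdev, variance


def summary_statistics(data):
    """Recompute the rows of Excel's Summary Statistics table for one column.

    Needs at least 4 values that are not all equal (kurtosis divides by n - 3).
    """
    n = len(data)
    m = mean(data)
    s = stdev(data)  # sample standard deviation, n - 1 in the denominator
    z = [(x - m) / s for x in data]
    # Sample-adjusted skewness and excess kurtosis (assumed to match SKEW and KURT)
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    modes = multimode(data)
    return {
        "Mean": m,
        "Standard Error": s / math.sqrt(n),
        "Median": median(data),
        "Mode": modes[0] if len(modes) < n else None,  # None where Excel shows #N/A
        "Standard Deviation": s,
        "Sample Variance": variance(data),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


sample = [0.031, 0.045, 0.052, 0.038, 0.060, 0.045, 0.071, 0.049]  # hypothetical values
for name, value in summary_statistics(sample).items():
    print(f"{name:<20}{value}")
```

For the values 1, 2, 3, 4, 5 this gives a kurtosis of -1.2 and a skewness of 0, which is a quick way to compare against Excel's output.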
1. Bar Charts
Highlight the column of categories and the column of totals: highlight one of the lists, then hold
Ctrl while highlighting the second list. Then select the bar/column charts option on the Insert tab
and choose the 2-D Clustered Column option.
The following chart appears (left). Double-clicking on a chart feature allows you to edit that
feature. Changing the chart title and widening the bars gives the chart on the right.
2. Pie Charts
Highlight the column of categories and the column of totals: highlight one of the lists, then hold
Ctrl while highlighting the second list. Then select the pie charts option on the Insert tab and
choose the standard 2-D Pie option. The title and legend were edited to obtain the pie chart below.
(Figure: pie chart with legend entries Strong Democrat; Democrat; Independent, near Dem.;
Independent; Independent, near Rep.; Republican; Strong Republican; Other Party.)
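Each slice of a pie chart is that category's share of the total, so its angle is the share times 360 degrees. The sketch below shows that arithmetic. Because the data behind the chart is not reproduced here, it borrows the party totals from the GSS table in the χ² section of this guide as stand-in data; the function name pie_slices is hypothetical.

```python
# Stand-in data: row totals from the GSS political party table later in this guide
party_totals = {
    "Strong Democrat": 407,
    "Democrat": 511,
    "Independent, near Dem.": 267,
    "Independent": 527,
    "Independent, near Rep.": 198,
    "Republican": 446,
    "Strong Republican": 314,
    "Other Party": 48,
}


def pie_slices(counts):
    """Return (label, percent of total, slice angle in degrees) for each category."""
    total = sum(counts.values())
    return [(label, 100 * c / total, 360 * c / total) for label, c in counts.items()]


for label, pct, angle in pie_slices(party_totals):
    print(f"{label:<24}{pct:6.1f}%{angle:8.1f} deg")
```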
3. Pareto Chart
Highlight the column of categories and the column of totals: highlight one of the lists, then hold
Ctrl while highlighting the second list. Then select the statistical charts option on the Insert tab
and choose the Pareto chart option.
This is the result.
After editing the chart (removing the Pareto line and rescaling so that all the labels can be
read), the result looks like this.
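A Pareto chart is a bar chart sorted from the largest category to the smallest, with a line showing the cumulative percentage. The sketch below builds that sorted table. Since the chart's own data is not reproduced here, it uses the party totals from the GSS table in the χ² section as stand-in data, and the function name pareto_table is hypothetical.

```python
# Stand-in data: row totals from the GSS political party table later in this guide
party_totals = {
    "Strong Democrat": 407,
    "Democrat": 511,
    "Independent, near Dem.": 267,
    "Independent": 527,
    "Independent, near Rep.": 198,
    "Republican": 446,
    "Strong Republican": 314,
    "Other Party": 48,
}


def pareto_table(counts):
    """Sort categories by count (largest first) and attach cumulative percentages."""
    total = sum(counts.values())
    rows, running = [], 0
    for label, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((label, count, 100 * running / total))
    return rows


for label, count, cum_pct in pareto_table(party_totals):
    print(f"{label:<24}{count:>6}{cum_pct:>8.1f}%")
```

The cumulative column is what the Pareto line plots; removing that line, as in the edited chart, leaves only the sorted bars.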
Quantitative Variable Plots
Example:
1. Histograms
Select a column of data. (I chose the hits (H) column
from the Edgar Martinez data above.) Then select
Histogram from the statistical charts button on the
Insert tab.
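Excel's histogram chart groups the values into equal-width bins. Microsoft documents the default Automatic bin width as Scott's normal reference rule; the sketch below uses that rule with the common constant 3.49 and counts values in right-closed bins starting at the minimum. The bin-edge convention, the function names, and the sample values are assumptions for illustration, so Excel's exact bins may differ.

```python
import math
from statistics import stdev


def scott_bin_width(data):
    """Scott's normal reference rule: h = 3.49 * s * n^(-1/3)."""
    return 3.49 * stdev(data) * len(data) ** (-1 / 3)


def histogram_counts(data, width):
    """Count values in equal-width bins starting at min(data).

    Bin k covers (lo + k*width, lo + (k+1)*width]; the first bin also includes lo.
    """
    if width <= 0:
        raise ValueError("bin width must be positive")
    lo = min(data)
    nbins = max(1, math.ceil((max(data) - lo) / width))
    counts = [0] * nbins
    for x in data:
        k = math.ceil((x - lo) / width) - 1
        counts[min(max(k, 0), nbins - 1)] += 1  # clamp guards the end points
    edges = [lo + k * width for k in range(nbins + 1)]
    return edges, counts


values = [23, 35, 41, 47, 52, 58, 60, 63, 69, 74, 88, 95]  # hypothetical data
width = scott_bin_width(values)
edges, counts = histogram_counts(values, width)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"({left:6.1f}, {right:6.1f}]  {c}")
```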
2. Boxplots
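Select a column of data, then select Box and Whisker from the statistical charts button on the
Insert tab. Here is the result after editing the chart title and removing the mean marker.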
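A box plot is drawn from the quartiles, with points more than 1.5 × IQR beyond the box usually shown as outliers. The sketch below computes those pieces. It uses Python's exclusive quartile method (the (n + 1)p rule, as in QUARTILE.EXC); Excel's chart has its own inclusive/exclusive median options, so its quartiles can differ slightly. The 1.5 × IQR whisker rule, the function name, and the data are assumptions for illustration.

```python
from statistics import quantiles


def box_plot_summary(data):
    """Quartiles, 1.5*IQR fences, whisker ends, and outliers for a box plot."""
    q1, q2, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "Q1": q1,
        "Median": q2,
        "Q3": q3,
        "Whisker low": min(inside),   # smallest value within the fences
        "Whisker high": max(inside),  # largest value within the fences
        "Outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40]  # hypothetical values with one outlier
print(box_plot_summary(data))
```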
3. Time Plots
Select a column with time (days, years, etc.), then hold Ctrl while selecting another column.
Then select the scatter plot option with connecting lines.
Here is the result after changing the title and deleting the legend.
GeoGebra
Another option for making statistical plots and calculating basic descriptive statistics is
GeoGebra (www.geogebra.org). This free software makes nicer graphs but is less customizable
than Excel. GeoGebra is one of the few software options that creates a stem and leaf plot.
4. Scatterplots
Here is the result after editing the chart titles and axes titles.
(Figures: two scatterplots with Hits on the horizontal axis.)
Hypothesis Testing
This guide will not cover hypothesis testing for one or two proportions. Excel has built-in
functions for each calculation in these tests but no single built-in command for a proportion
test. Use your graphing calculator or GeoGebra for these tests.
1. t Test
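Excel does not have a built-in function for a one-sample t test, but there is a trick. First, type the
hypothesized mean twice in a new column. Then select t-Test: Two-Sample Assuming Unequal
Variances from the Data Analysis dialog box. Because the two copies of the hypothesized mean have
zero variance, this test reduces to the one-sample t test with n – 1 degrees of freedom.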
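In the next window, select the data as Variable 1 Range and the hypothesized mean as Variable 2
Range. Then choose a blank cell as the output range and click OK.
Here are the results. Excel reports a one-tailed and a two-tailed p-value. The one-tailed value,
labeled P(T<=t) one-tail, is the area in the tail on the same side as the observed t statistic: it is
the left-tailed p-value when t is negative and the right-tailed p-value when t is positive. If your
alternative hypothesis points the other way, the p-value is 1 – (one-tailed p-value).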
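To see why the trick works, the sketch below computes the Welch (unequal variances) statistic with the hypothesized mean entered twice and compares it with the textbook one-sample t statistic. The data and function names are hypothetical. The p-value is not computed here; in Excel it could come from T.DIST.2T(ABS(t), df) for a two-tailed test.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch (unequal variances) t statistic and Welch-Satterthwaite df."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def one_sample_t(x, mu0):
    """Textbook one-sample t statistic with df = n - 1."""
    n = len(x)
    return (mean(x) - mu0) / (math.sqrt(variance(x)) / math.sqrt(n)), n - 1


data = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4]  # hypothetical sample
mu0 = 5.0  # hypothesized mean, typed twice as in the Excel trick
print(welch_t(data, [mu0, mu0]))
print(one_sample_t(data, mu0))
```

Because the second "sample" has zero variance, its terms drop out of both the standard error and the Welch-Satterthwaite formula, leaving s/√n and n – 1.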
2. Two-Sample t Test
Example: The data below is from the Online Stat Book in a case study prepared by David Lane
(http://onlinestatbook.com/2/case_studies/stroop.html). Is there a difference between “words”
and “colors”?
Select t-Test: Two-Sample Assuming Unequal Variances in the Data Analysis dialog box.
Select the two columns for the two variable ranges and select an output range.
Here are the results.
To assume equal variances instead, select t-Test: Two-Sample Assuming Equal Variances in the
Data Analysis dialog box. Select the two columns for the two variable ranges and select an output range.
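The two Excel options differ only in how the standard error and degrees of freedom are computed. The sketch below computes both statistics for the same hypothetical data (not the Stroop data); the function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Unequal variances: t statistic and Welch-Satterthwaite df."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def pooled_t(x, y):
    """Equal variances: t statistic from the pooled variance, df = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical times in seconds for two groups
group_a = [15.2, 17.8, 16.1, 18.4, 14.9, 16.7]
group_b = [19.6, 21.3, 18.2, 22.8, 20.1, 19.9]
print("Welch: ", welch_t(group_a, group_b))
print("Pooled:", pooled_t(group_a, group_b))
```

With equal group sizes the two t statistics coincide; only the degrees of freedom differ, and the Welch value is never larger than n1 + n2 – 2.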
3. Matched-Pairs t Test
Here is an example from WAMAP. Select t-Test: Paired Two Sample for Means from the Data Analysis
dialog box. Select the two columns for the two variable ranges and select an output range.
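A matched-pairs t test is a one-sample t test on the differences within each pair. The sketch below shows this with hypothetical before/after values (not the WAMAP data), taking the differences as Variable 1 minus Variable 2; the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic: one-sample t test on the differences first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


before = [10, 12, 9, 11]  # hypothetical measurements
after = [8, 11, 9, 8]
t_stat, df = paired_t(before, after)
print(t_stat, df)
```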
4. One-Way ANOVA
Here is an example from WAMAP comparing four levels of depression after taking a treatment.
Select the ANOVA: Single Factor option from the Data Analysis dialog box, then select all
the columns for the input range. Here I selected the labels as well as the data and then checked
the Labels in first row box. Select an output range and then click OK.
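The F statistic in the ANOVA: Single Factor output compares the variation between group means with the variation within groups. The sketch below computes it for small hypothetical groups; the function name is hypothetical. Excel's P-value column corresponds to the right-tail area, which F.DIST.RT(F, df between, df within) returns.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for three groups
print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```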
5. χ² Goodness-of-Fit Test
Excel has a built-in chi-square test function, CHISQ.TEST. The following example also shows how
to calculate the χ² contributions and the χ² test statistic.
Example: Data was collected on dog owners (only people who have a dog as a pet were invited
to respond to the survey). Each respondent was placed in one of three categories: owns no other
pets, owns other pets including at least one cat, or owns other pets but no cats. Based on past
experience, you believe the percentages for these categories are 50%, 40%, and 10%,
respectively. You collect data from 60 people: 33 respondents own no other pets, 20 own
other pets including at least one cat, and 7 own other pets but no cats.
First, enter a column of observed counts and another column of expected counts. Each expected
count is the sample size times the hypothesized proportion: 60 × 0.50 = 30, 60 × 0.40 = 24, and
60 × 0.10 = 6. In a third column, labeled Chi-Squared Contribution below, in the cell next to the
first row of data, enter the formula =(A2-B2)^2/B2. Then fill down this column.
Next, sum the Chi-Squared Contribution column.
Finally, calculate the p-value by using the CHISQ.TEST formula. Select the observed and
expected columns as the two inputs for this formula.
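The same steps can be checked with a short script. The sketch below rebuilds the expected counts, contributions, and χ² statistic for the dog-owner example, with df = (number of categories) – 1 = 2. The p-value helper chi2_sf is a hypothetical name; it uses the standard closed-form series for integer degrees of freedom instead of Excel's CHISQ.TEST, so tiny differences in the last digits are possible.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-squared variable with integer df >= 1.

    Even df: exp(-x/2) times a finite Poisson-type sum.
    Odd df: a normal tail term plus a finite sum.
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # builds chi**(2r-1) / (1*3*...*(2r-1))
        total += term
    return math.erfc(chi / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


observed = [33, 20, 7]
proportions = [0.50, 0.40, 0.10]
n = sum(observed)
expected = [n * p for p in proportions]
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)
df = len(observed) - 1
p_value = chi2_sf(chi2, df)
print("expected:", expected)
print("contributions:", contributions)
print("chi-square:", chi2, "df:", df, "p-value:", p_value)
```

With two degrees of freedom the right-tail area is simply exp(-χ²/2), which makes this example easy to verify by hand.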
6. χ² Test of Independence
Excel's CHISQ.TEST function can also be used for a χ² test of independence, but it needs a range
of expected counts, which Excel does not compute for a two-way table automatically. All the steps
for a χ² test can be achieved with simple calculations in Excel. For an automated version of this
test, you can use Minitab, which is available on most Green River College campus computers (or
visit www.minitab.com for a free trial).
Example: Data from the 2002 General Social Survey (GSS) on political party and age group is
shown below.
                          Age Group
Political Party           18-30   31-40   41-55   55-89   Total
Strong Democrat           60      83      113     151     407
Democrat                  99      126     138     148     511
Independent, near Dem.    72      56      77      62      267
Independent               152     124     149     102     527
Independent, near Rep.    53      41      50      54      198
Republican                90      85      133     138     446
Strong Republican         42      56      89      127     314
Other Party               9       12      14      13      48
Total                     577     583     763     795     2718
After copying and pasting the table into Excel, insert two rows below each row category. These
rows are labeled expected and chi-squared contribution.
In the first cell of the expected row, enter the formula (row total)*(column total)/(table total).
Here I fixed the row total cell with an absolute reference (using $) and typed the value of the
table total instead of referencing the table total cell. This makes it possible to fill across the row.
Next, in the first cell of the chi-squared contribution row, enter the formula =(data value -
expected)^2/expected, then fill across the row.
After this has been done for all of the expected rows and chi-squared contribution rows, sum all
of the values in the chi-squared contribution rows.
Finally, calculate the p-value using the CHISQ.DIST formula. This formula calculates the area in
the left tail, but the χ² test is right-tailed, so use 1 – (left-tail area) to get the right-tail area.
The three arguments for this formula are the chi-squared value, the degrees of freedom,
(# rows – 1)(# columns – 1), which here is (8 – 1)(4 – 1) = 21, and TRUE for the area. (FALSE
gives the pdf value instead of the area.) Alternatively, CHISQ.DIST.RT(chi-squared value,
degrees of freedom) returns the right-tail area directly.
                          Age Group
Political Party           18-30        31-40        41-55        55-89        Total
Strong Democrat           60           83           113          151          407
expected                  86.40139809  87.29985283  114.2534952  119.0452539
chi-squared contribution  8.067390533  0.211784256  0.013752317  8.577459139
Democrat                  99           126          138          148          511
expected                  108.4793966  109.6074319  143.4484915  149.4646799
chi-squared contribution  0.828350479  2.451624703  0.20694578   0.014353138
Independent, near Dem.    72           56           77           62           267
expected                  56.68101545  57.27041943  74.95253863  78.09602649
chi-squared contribution  4.140209657  0.028181486  0.055930034  3.317480804
Independent               152          124          149          102          527
expected                  111.8760118  113.0393672  147.9400294  154.1445916
chi-squared contribution  14.39034522  1.062775516  0.007594548  17.63966161
Independent, near Rep.    53           41           50           54           198
expected                  42.03311258  42.47019868  55.58278146  57.91390728
chi-squared contribution  2.861377905  0.050894138  0.560739279  0.264507628
Republican                90           85           133          138          446
expected                  94.68064753  95.665195    125.2016188  130.4525386
chi-squared contribution  0.231393235  1.189004886  0.485734524  0.436665884
Strong Republican         42           56           89           127          314
expected                  66.65857248  67.35172921  88.1464312   91.84326711
chi-squared contribution  9.121785453  1.913265741  0.008265561  13.45766442
Other Party               9            12           14           13           48
expected                  10.18984547  10.29580574  13.47461369  14.0397351
chi-squared contribution  0.138935596  0.282083613  0.020485246  0.07699925
Total                     577          583          763          795          2718
chi-square                92.11364157
p-value                   6.95299E-11
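The spreadsheet calculation can be cross-checked with a short script. The sketch below rebuilds the expected counts (row total × column total / table total), the χ² statistic, and df = (8 – 1)(4 – 1) = 21 from the GSS counts above. The helper chi2_sf is a hypothetical name that uses the standard closed-form series for integer degrees of freedom rather than Excel's CHISQ.DIST, so the last digits of the p-value may differ slightly.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # builds chi**(2r-1) / (1*3*...*(2r-1))
        total += term
    return math.erfc(chi / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


# Observed counts: rows are party categories, columns are the four age groups
observed = [
    [60, 83, 113, 151],
    [99, 126, 138, 148],
    [72, 56, 77, 62],
    [152, 124, 149, 102],
    [53, 41, 50, 54],
    [90, 85, 133, 138],
    [42, 56, 89, 127],
    [9, 12, 14, 13],
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)
expected = [[rt * ct / grand for ct in col_totals] for rt in row_totals]
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(col_totals) - 1)
p_value = chi2_sf(chi2, df)
print("chi-square:", chi2, "df:", df, "p-value:", p_value)
```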
Regression Test and Multiple Regression
Example: The following data is from Redfin listings for Edmond, Oklahoma, retrieved on
December 19, 2019. The vacant lot listings were deleted from the data. Select Regression from
the Data Analysis dialog box.
Select one column of data for the Y Range. Here I chose sales price as the response variable.
Then select one or more columns for the X Range. Here I chose square footage and lot size.
Here are the results.
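Excel's Regression tool fits an ordinary least-squares model. The sketch below solves the normal equations (XᵀX)b = Xᵀy by Gaussian elimination for an intercept plus any number of predictor columns, and reports R², shown as R Square in Excel's output. The data are made up so that y is exactly 10 + 2·x1 + 3·x2 (they are not the Redfin listings), and the function names are hypothetical. For real data, a numerical library's least-squares routine is more robust than forming XᵀX directly.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(y, *predictors):
    """Intercept and slopes from the normal equations, plus R squared."""
    rows = [[1.0, *vals] for vals in zip(*predictors)]  # leading 1 for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1 - sse / sst


# Hypothetical data: y = 10 + 2*x1 + 3*x2 exactly
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [18, 17, 28, 27, 38]
coef, r_squared = least_squares(y, x1, x2)
print("intercept and slopes:", coef, "R^2:", r_squared)
```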