
C747 Transcripts Part1


Introduction to SAS Essentials

Mastering SAS for Data Analytics

Alan Elliott and Wayne Woodward


1 SAS ESSENTIALS -- Elliott & Woodward
Chapter 10: ANALYZING COUNTS AND
TABLES



LEARNING OBJECTIVES
• To be able to use PROC FREQ to create one-way frequency tables
• To be able to use PROC FREQ to create two-way (cross-
tabulation) tables
• To be able to use two-by-two contingency tables to calculate
relative risk measures
• To be able to use Cohen's kappa to calculate inter-rater reliability



10.1 USING PROC FREQ
• PROC FREQ is a multipurpose SAS procedure for analyzing
count data. It can be used to obtain frequency counts for
one or more individual variables or to create two-way
tables (cross-tabulations) from two variables.
• A simplified syntax is

PROC FREQ <option(s)>;
<statements>
TABLES requests </options>;



Table 10.1 Common Options for PROC FREQ
Option           Meaning
DATA=dataname    Specifies which data set to use
ORDER=option     Specifies the order in which results are listed in the
                 output table. Options are DATA, FORMATTED, FREQ, and
                 INTERNAL. This is illustrated in an upcoming example.
PAGE             Specifies that only one table will appear per page (not
                 applicable to HTML output)
ALPHA=n          Sets the level for confidence limits (default 0.05)
COMPRESS         Begins the next table on the same page when possible
                 (not applicable to HTML output)
NOPRINT          Used when you want to capture output but not display
                 tables

Table 10.2 Common Statements for PROC FREQ
Statement                  Meaning
EXACT                      Produces exact p-values for tests. Fisher's Exact
                           Test is automatically calculated for a 2x2 table.
OUTPUT OUT=dataname        Creates an output data set containing statistics
                           from an analysis
WEIGHT variable            Identifies a weight variable that contains
                           summarized counts
TABLES <variable-combinations/options>;
                           Specifies which tables will be displayed. More
                           information about this statement is given below.
TEST                       Specifies which statistical tests will be performed
                           (requires a TABLES statement)
BY, FORMAT, LABEL, WHERE   These statements are common to most
                           procedures and may be used here.

The TABLES Statement
• The TABLES statement is required for all of the examples
in this chapter. Its format is:

TABLES <variable-combinations/options>;

where variable-combinations specifies frequency or
cross-tabulation tables. Options for the TABLES statement
follow a slash (/).
• For example,

TABLES A*B / CHISQ;

requests that the chi-square and related statistics be
reported for the cross-tabulation A*B.

More about TABLES
• To obtain counts of the number of subjects observed in each
category of group (GP), use the following:
PROC FREQ; TABLES GP; RUN;
• To produce a cross-tabulation of GENDER by treatment GP:
PROC FREQ; TABLES GENDER*GP; RUN;
• The variables specified in the TABLES statement can be either
categorical/character or numeric.
• To request chi-square statistics for a table, include the option
/CHISQ at the end of the TABLES statement. For example,
PROC FREQ; TABLES GENDER*GP / CHISQ; RUN;



Table 10.3 Sample TABLES Statements
Table Specification         Description
TABLES A;                   Specifies frequencies for a single variable
TABLES A*B;                 Specifies a cross-tabulation between two variables
TABLES A*B B*C X*Y;         Specifies several cross-tabulation tables
  Also, TABLES A*(B C D);
  is the same as
  TABLES A*B A*C A*D;
TABLES (A -- C)*X;          Uses a range of variables in a TABLES statement
  is the same as
  TABLES A*X B*X C*X;



Table 10.4 Options for the TABLES Statement
Option         Description
AGREE          Requests the kappa statistic (inter-rater reliability)
RELRISK        Requests relative risk calculations
FISHER         Requests Fisher's Exact test for tables greater than 2 x 2
SPARSE         Requests all possible combinations of levels
MISSING        Requests that missing values be treated as nonmissing
CELLCHI2       Displays each cell's contribution to the chi-square statistic
NOCOL          Suppresses column percentages for each cell
NOCUM          Suppresses cumulative frequencies
NOFREQ         Suppresses the frequency count for each cell
NOPERCENT      Suppresses the overall percentage for each cell in
               cross-tabulation tables
NOPRINT        Suppresses tables but displays statistics
NOROW          Suppresses row percentages for each cell
TESTP=(list)   Specifies a test based on percentages (goodness-of-fit
               test)



10.2 ANALYZING ONE-WAY FREQUENCY TABLES
• When count data are collected, you can use PROC FREQ
to produce tables of the counts by category as well as to
perform statistical analyses on the counts.
• This section describes how to create tables of counts by
category and how to perform a goodness-of-fit test.
• Do Hands On Example p 246 (AFREQ1.SAS) (Frequencies)



ORDER= Option
• The ORDER=FORMATTED option for PROC FREQ displays
the categories in the table in order of their formatted
values.
• You must first create a custom format in a PROC FORMAT
step to define the labels that set the order you want in
your output table. For example:
PROC FORMAT;
VALUE $FMTRACE "AA"="African American"
               "H"="Hispanic"
               "OTH"="Other"
               "C"="White";
RUN;



Apply Your Created Format
• To make PROC FREQ display categories in formatted
order, apply the format you created:

PROC FREQ ORDER=FORMATTED DATA="C:\SASDATA\SURVEY";
TABLES RACE;
FORMAT RACE $FMTRACE.;
RUN;

• The ORDER= option specifies the order in which the
categories are displayed. In this case, they are displayed
in FORMATTED order.
• You must also apply the format to the variable (the
FORMAT statement) for it to be used correctly.
• Do Hands On Exercise p 248. (AFREQ2.SAS)
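
A self-contained sketch of the formatted-order technique follows. The data set name SURVEYDEMO and its seven RACE values are hypothetical, made up only so the step can run on its own; they are not the contents of the SURVEY data set used in the hands-on exercise.

```sas
/* Sketch only: SURVEYDEMO and its values are hypothetical. */
PROC FORMAT;
   VALUE $FMTRACE "AA"  = "African American"
                  "H"   = "Hispanic"
                  "OTH" = "Other"
                  "C"   = "White";
RUN;

DATA SURVEYDEMO;
   INPUT RACE $;
   DATALINES;
C
AA
H
C
OTH
AA
C
;
RUN;

/* ORDER=FORMATTED sorts the rows by the formatted labels,  */
/* so White is listed last even though its code C sorts     */
/* ahead of H and OTH.                                      */
PROC FREQ DATA=SURVEYDEMO ORDER=FORMATTED;
   TABLES RACE;
   FORMAT RACE $FMTRACE.;
RUN;
```

Removing ORDER=FORMATTED from the PROC FREQ statement and rerunning is a quick way to see the difference: the rows then follow the internal codes (AA, C, H, OTH) rather than the labels.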



10.3 CREATING ONE-WAY FREQUENCY TABLES FROM
SUMMARIZED DATA
• The following example illustrates how to summarize counts
from a data set into a frequency table.
• Suppose your data are in this summarized form:

CENTS 152
CENTS 100
NICKELS 49
DIMES 59
QUARTERS 21
HALF 44
DOLLARS 21

• Each row gives a category and its count. For example,
NICKELS 49 means there are 49 nickels.
• Do the Hands On Exercise p 250 (AFREQ3.SAS)
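
One way to read these summarized counts into SAS and tabulate them is sketched below. The data set name COINS and the variable names CATEGORY and NUMBER are assumptions chosen for illustration; the hands-on file may use different names.

```sas
/* Sketch only: COINS, CATEGORY and NUMBER are assumed names. */
DATA COINS;
   INPUT CATEGORY $ NUMBER;
   DATALINES;
CENTS 152
CENTS 100
NICKELS 49
DIMES 59
QUARTERS 21
HALF 44
DOLLARS 21
;
RUN;

/* WEIGHT tells PROC FREQ that NUMBER holds a count for each  */
/* row rather than one observation per row. ORDER=DATA keeps  */
/* the categories in the order they appear in the data.       */
PROC FREQ DATA=COINS ORDER=DATA;
   WEIGHT NUMBER;
   TABLES CATEGORY;
RUN;
```

Because the weights are summed within each category, the two CENTS rows are combined into a single CENTS row with a frequency of 152 + 100 = 252.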



Testing Goodness of Fit in a One-Way Table
• A goodness-of-fit test of a single population is a test to
determine whether the distribution of observed frequencies
in the sample data closely matches the expected number of
occurrences under a hypothetical distribution for the
population.
• The hypotheses being tested are as follows:

H0: The population follows the hypothesized distribution.
Ha: The population does not follow the hypothesized
distribution.



Goodness-of-fit using PROC FREQ
• A chi-square statistic is calculated, and a decision can be
made based on the p-value associated with that statistic.
A low p-value indicates that the data do not follow the
hypothesized, or theoretical, distribution. If the p-value is
sufficiently low (usually <0.05), you will reject the null
hypothesis. The syntax to perform a goodness-of-fit test is
as follows:

PROC FREQ; TABLES variable / CHISQ
TESTP=(list of ratios);



Goodness-of-fit Example
• As an example, we will use data from an experiment
conducted by the nineteenth-century monk Gregor
Mendel. According to a genetic theory, crossbred pea
plants show the four pea types below in a 9:3:3:1 ratio.
From 556 plants, you expect

(9/16) x 556 = 312.75 yellow smooth peas (56.25%)
(3/16) x 556 = 104.25 yellow wrinkled peas (18.75%)
(3/16) x 556 = 104.25 green smooth peas (18.75%)
(1/16) x 556 = 34.75 green wrinkled peas (6.25%)



Actual Observed Data
• After growing 556 of these pea plants, Mendel observed
the following:

315 have yellow smooth peas
108 have yellow wrinkled peas
101 have green smooth peas
32 have green wrinkled peas
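
The chi-square statistic for this test can be checked by hand from the observed counts above and the expected counts under the 9:3:3:1 ratio. The symbols O_i and E_i are notation introduced here for the observed and expected count in category i; they do not appear on the slides.

```latex
\chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}
       = \frac{(315 - 312.75)^2}{312.75}
       + \frac{(108 - 104.25)^2}{104.25}
       + \frac{(101 - 104.25)^2}{104.25}
       + \frac{(32 - 34.75)^2}{34.75}
       \approx 0.016 + 0.135 + 0.101 + 0.218
       \approx 0.47
```

With 4 - 1 = 3 degrees of freedom, the 0.05 critical value is about 7.81, so a statistic near 0.47 gives no evidence against the hypothesized 9:3:3:1 distribution.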



The Goodness-of-Fit Code
• Hypothesizing a 9:3:3:1 ratio:

PROC FREQ ORDER=DATA; WEIGHT NUMBER;
TITLE 'GOODNESS OF FIT ANALYSIS';
TABLES COLORTYPE / NOCUM CHISQ
TESTP=(0.5625 0.1875 0.1875 0.0625);
RUN;

• Note that these proportions are in a 9:3:3:1 ratio.
• Do Hands on Example p 252 (AFREQ4.SAS)
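
A complete version of this analysis, including a DATA step for the observed counts, might look like the sketch below. COLORTYPE and NUMBER are the variable names used on the slide; the data set name PEAS and the category codes (YSMOOTH, YWRINKLE, GSMOOTH, GWRINKLE) are assumptions made for illustration.

```sas
/* Sketch only: PEAS and the category codes are assumed names. */
DATA PEAS;
   INPUT COLORTYPE $ NUMBER;
   DATALINES;
YSMOOTH 315
YWRINKLE 108
GSMOOTH 101
GWRINKLE 32
;
RUN;

/* ORDER=DATA keeps the categories in data order, so the TESTP */
/* proportions line up with them: 9/16, 3/16, 3/16, 1/16.      */
PROC FREQ DATA=PEAS ORDER=DATA;
   WEIGHT NUMBER;
   TITLE 'GOODNESS OF FIT ANALYSIS';
   TABLES COLORTYPE / NOCUM CHISQ
          TESTP=(0.5625 0.1875 0.1875 0.0625);
RUN;
```

The ORDER=DATA option matters here: without it the categories would be sorted alphabetically and the hypothesized proportions would be matched to the wrong categories.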



10.4 ANALYZING TWO-WAY TABLES
• To create a cross-tabulation table using PROC FREQ for
relating two variables, use the TABLES statement with
both variables listed and separated by an asterisk (*),
(e.g., A*B).
• A cross-tabulation table is formed by counting the
number of occurrences in a sample across two grouping
variables.
• The number of columns in a table is usually denoted by c
and the number of rows by r. Thus, a table is said to be an
r x c table, that is, it has r x c cells.



Test of Independence
• The hypotheses associated with a test of independence
are as follows:

H0: The variables are independent (no association between
them).
Ha: The variables are not independent.

For example, a null hypothesis could be that there is no
association between handedness (left- or right-handed)
and hair color.



Test of Homogeneity
• The null hypothesis is that the populations have the same
distribution (they are homogeneous). In this case, the
hypotheses are as follows:

H0: The populations are homogeneous.
Ha: The populations are not homogeneous.

For example, a null hypothesis could be that the
distribution of handedness is the same in two populations
(male and female).



Testing These Hypotheses
• The chi-square test of independence or homogeneity is
reported by PROC FREQ (the tests are mathematically
equivalent) by the use of the /CHISQ option in the
TABLES statement.
• Use the same code to test for either independence or
homogeneity. For example,

PROC FREQ; TABLES GENDER*GP / CHISQ; RUN;

• Do Hands On Exercise p 254. (AFREQ5.SAS)


Example code to perform a Chi-Square Test on an r x c
contingency table (Crime Example)

Note that WEIGHT COUNT; is needed since the data are in
summary form.

PROC FREQ DATA=DRINKERS; WEIGHT COUNT;
TABLES CRIME*DRINKER / CHISQ;
TITLE 'Chi Square Analysis of a Contingency Table';
RUN;



Summarized Data for the Crime Case
CRIME      DRINKER   COUNT
Arson      1         50
Arson      0         43
Rape       1         88
Rape       0         62
Violence   1         155
Violence   0         110
Stealing   1         379
Stealing   0         300
Coining    1         18
Coining    0         14
Fraud      1         63
Fraud      0         144

Notice how the data are in summarized form. For Arson,
there were 50 "Drinkers" (DRINKER=1) and 43
"Non-Drinkers" (DRINKER=0).
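
The summarized crime data can be read in and analyzed in one program, as in the sketch below. The DATA step layout is an assumption about how the DRINKERS data set is built; the variable names, the counts, and the PROC FREQ statements are taken from the slides.

```sas
/* Sketch only: the DATA step layout is an assumption. */
DATA DRINKERS;
   INPUT CRIME $ DRINKER COUNT;
   DATALINES;
Arson 1 50
Arson 0 43
Rape 1 88
Rape 0 62
Violence 1 155
Violence 0 110
Stealing 1 379
Stealing 0 300
Coining 1 18
Coining 0 14
Fraud 1 63
Fraud 0 144
;
RUN;

/* WEIGHT COUNT is required because each row is a cell total, */
/* not a single subject.                                       */
PROC FREQ DATA=DRINKERS;
   WEIGHT COUNT;
   TABLES CRIME*DRINKER / CHISQ;
   TITLE 'Chi Square Analysis of a Contingency Table';
RUN;
```

This is a 6 x 2 table, so the chi-square test has (6 - 1) x (2 - 1) = 5 degrees of freedom. A hand calculation from these counts agrees with the chi-square value of 49.73 reported in the results.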



Results of Chi Square Analysis
• Observe the statistics table. The Chi-Square value is
49.73 and the p-value is p < 0.0001.
• Thus, you reject the null hypothesis of no association
(independence) and conclude that there is evidence of a
relationship between drinking status and type of crime
committed.



Creating a Contingency Table from Raw Data, the 2 x 2 Case
• In the previous example (CRIME) the data were in
summary form, and you needed to use the WEIGHT
COUNT; statement to reflect that.
• If your data are in raw form, with one record per
observation, you do not need the WEIGHT statement.
• For the data in this example, each subject has one record,
so you have one record per observation.
• Do Hands on Example p 257 (AFREQ6.SAS)



Output from 2x2 Chi-Square Analysis
The resulting table of counts:
Statistical results: note in particular the Chi-Square and
Fisher Two-Sided Pr<=P values.

The chi-square statistic, 8.29, p = 0.004, indicates an
association between CLEANER and RASH (the null
hypothesis is rejected). The two-sided Fisher result,
p = 0.0095, leads to the same decision.



Tables with Small Counts in Cells
• When you summarize counts in tables, and there are
small numbers in one or more cells, a typical chi-square
statistical analysis may not be valid.
• Do Hands On Example p 259 (AFREQ7.SAS)
• Observe the warning message "WARNING: 50% of the
cells have expected counts <5. Chi-square may not be a
valid test."
• In this case, Fisher's Exact test (given in Table 10.13) is
the more reliable test and should be used instead of the
Chi-Square test.



10.5 GOING DEEPER: CALCULATING RELATIVE RISK
MEASURES
• Two-by-two contingency tables are often used when
examining a measure of risk.
• A measure of this risk in a retrospective (case-control)
study is called the odds ratio (OR). In a case-control
study, a researcher takes a sample of subjects and looks
back in time for exposure (or nonexposure).
• If the data are collected prospectively, where subjects are
selected by presence or absence of a risk and then
observed over time to see if they develop an outcome,
the measure of risk is called relative risk (RR).
• Either way, RR = 1 or OR = 1 means that no increase or
decrease in risk was observed.



Testing Relative Risk in PROC FREQ
• In PROC FREQ, the option to calculate the values for OR
or RR is RELRISK and appears as an option to the TABLES
statement as shown here (for the RASH data):

TABLES CLEANER*RASH / RELRISK;

• In the results, a risk measure > 1 indicates that exposure
is harmful and a risk measure < 1 implies that exposure is
a benefit.
• Do Hands On Example p 261 (AFREQ6.SAS)
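
The two risk measures can be written out for a generic 2 x 2 table. The cell labels a, b, c, and d are notation assumed here (they do not appear on the slides): row 1 holds a and b, row 2 holds c and d, and column 1 is taken to be the outcome of interest.

```latex
% Assumed layout:      Column 1   Column 2
%             Row 1       a          b
%             Row 2       c          d
\mathrm{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}
\qquad
\mathrm{RR} = \frac{a/(a+b)}{c/(c+d)}
```

Under this layout the odds ratio compares the odds of the column 1 outcome in row 1 with the odds in row 2, which is why the order of the rows and columns in the table determines whether a harmful exposure shows up as a value above or below 1.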



Results of Risk Analysis
• Typically, the odds ratio is the statistic of interest.
• The OR = 0.1346 specifies the odds of Row1/Row2, that is,
for cleaner 1 versus cleaner 2. Because OR is < 1, this
indicates that the odds of a person's having a rash are
lower when the person is using cleaner 1 than when the
person is using cleaner 2.



10.6 GOING DEEPER: INTER-RATER RELIABILITY (KAPPA)
• A method for assessing the degree of agreement between
two raters is Cohen's kappa coefficient.
• For example, kappa is useful for analyzing the consistency
of two raters who evaluate subjects on the basis of a
categorical measurement. Below, two raters, A and B, are
compared.

Data for inter-rater reliability analysis
                      RATER A
Rater B      Psych.   Neuro.   Organic
Psych.       75       1        4
Neuro.       5        4        1
Organic      0        0        10
Total        80       5        15
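
Kappa can be worked out by hand from this table. The symbols p_o (observed agreement) and p_e (agreement expected by chance) are notation introduced here; the row totals 80, 10, and 10 are obtained by summing each row of the table, and N = 100 is the total count.

```latex
p_o = \frac{75 + 4 + 10}{100} = 0.89
\qquad
p_e = \frac{(80)(80) + (10)(5) + (10)(15)}{100^2} = 0.66
\qquad
\kappa = \frac{p_o - p_e}{1 - p_e} = \frac{0.89 - 0.66}{1 - 0.66} \approx 0.68
```

The diagonal cells supply p_o, and each term in p_e multiplies a row total by the matching column total, so kappa measures how far the raters' agreement exceeds what their marginal totals alone would produce.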



Code used to Calculate Kappa

PROC FREQ;
WEIGHT WT;
TABLES RATER1*RATER2 / AGREE;
TEST KAPPA;
TITLE 'KAPPA EXAMPLE FROM FLEISS';
RUN;

• The AGREE option and the TEST KAPPA statement provide
the kappa test results.
• Do Hands on Exercise p 262 (AKAPPA1.SAS)
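
A runnable version with the rater data entered as cell counts might look like the sketch below. The data set name KAPPADEMO, the category codes, and the choice to store Rater A in RATER1 and Rater B in RATER2 are assumptions made for illustration; the hands-on file may be organized differently.

```sas
/* Sketch only: KAPPADEMO, the category codes, and the        */
/* RATER1 = Rater A / RATER2 = Rater B mapping are assumed.   */
DATA KAPPADEMO;
   INPUT RATER1 $ RATER2 $ WT;
   DATALINES;
Psych Psych 75
Neuro Psych 1
Organic Psych 4
Psych Neuro 5
Neuro Neuro 4
Organic Neuro 1
Psych Organic 0
Neuro Organic 0
Organic Organic 10
;
RUN;

/* AGREE requests the kappa statistics; TEST KAPPA adds the   */
/* test of the null hypothesis that kappa equals zero.        */
PROC FREQ DATA=KAPPADEMO;
   WEIGHT WT;
   TABLES RATER1*RATER2 / AGREE;
   TEST KAPPA;
   TITLE 'KAPPA EXAMPLE FROM FLEISS';
RUN;
```

Both variables use the same three category codes, so the table is square and the diagonal cells line up as agreements. Kappa is symmetric in the two raters, so swapping RATER1 and RATER2 does not change its value.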



Results of Kappa Analysis
• Kappa is the primary statistic of interest.
• See text for additional explanation of these results.

How to interpret kappa:

Interpretation of kappa statistic
Kappa Value   Interpretation
<0            No agreement
0.0–0.20      Poor agreement
0.21–0.40     Fair agreement
0.41–0.60     Moderate agreement
0.61–0.80     Substantial agreement
0.81–1.00     Almost perfect agreement



Calculating Weighted Kappa
• For the case in which rated categories are ordinal (that is,
the categories of interest are in a meaningful order), it is
appropriate to use the weighted kappa statistic, because
it is designed to give partial credit to ratings that are close
to but not on the diagonal.
• For example, in a test of recognition of potentially
dangerous airline passengers, suppose a procedure is
devised that classifies passengers into three categories:

1 = No threat/Pass
2 = Concern/Recheck
3 = Potential threat/Detain

• Note how these 3 categories are ascending in danger: they
have a definite order (they are ordinal).
• Do Hands on Exercise p 265 (AKAPPA2.SAS)



Weighted Kappa Results
• The example uses this code to calculate the weighted
kappa:

TABLE RATER1*RATER2 / AGREE;
TEST WTKAP;

• For this analysis, report the "Weighted Kappa" value,
kappa = 0.7413.
• Use the same kappa interpretation table as before.
• See text for the interpretation of other results.



10.7 SUMMARY
• This chapter discusses the capabilities of PROC FREQ for
creating one- and two-way frequency tables, analyzing
contingency tables, calculating measures of risk, and
measuring inter-rater reliability (using KAPPA).
• Continue to Chapter 11: COMPARING MEANS USING
T-TESTS
