Descriptive Statistics Using SAS
PROC MEANS
See www.stattutorials.com/SASDATA for files mentioned in this tutorial
TexaSoft, 2007
These SAS statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for medical, pharmaceutical, clinical trials, marketing, or scientific research. The examples include how-to instructions for SAS software.
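PROC MEANS can report the following statistics. You request them by keyword on the PROC MEANS statement. By default it reports N, MEAN, STD, MIN, and MAX.
N      - Number of nonmissing observations
NMISS  - Number of missing observations
MEAN   - Arithmetic average
STD    - Standard deviation
MIN    - Minimum (smallest value)
MAX    - Maximum (largest value)
RANGE  - Range (MAX - MIN)
SUM    - Sum of observations
VAR    - Variance
USS    - Uncorrected sum of squares
CSS    - Corrected sum of squares
STDERR - Standard error of the mean
T      - Student's t value for testing H0: population mean = 0
PRT    - P-value associated with the t-test above
SUMWGT - Sum of the WEIGHT variable values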
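For reference, the formulas below sketch how several of these keywords relate to one another. They assume a variable with nonmissing values x_1, ..., x_n, the default divisor n - 1 (the VARDEF=DF default), and no WEIGHT statement:

```latex
\begin{aligned}
\text{MEAN}   &= \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \\
\text{USS}    &= \sum_{i=1}^{n} x_i^2 \\
\text{CSS}    &= \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\text{VAR}    &= \frac{\text{CSS}}{n-1}, \qquad \text{STD} = \sqrt{\text{VAR}} \\
\text{STDERR} &= \frac{\text{STD}}{\sqrt{n}} \\
\text{T}      &= \frac{\bar{x}}{\text{STDERR}}, \qquad \text{PRT} = 2\,P\!\left(t_{n-1} \ge |\text{T}|\right) \\
\text{RANGE}  &= \text{MAX} - \text{MIN}
\end{aligned}
```

As a check, take the WEIGHT variable in the CHILDREN data below. It has n = 12 and CSS = 888.25. That gives VAR = 888.25/11 = 80.75, STD = sqrt(80.75) ≈ 8.986 and STDERR ≈ 8.986/sqrt(12) ≈ 2.59. These match the Std Dev and Std Error values that PROC MEANS reports for WEIGHT.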
***************************************************************
* Data on weight, height, and age of a random sample of 12 *
* nutritionally deficient children *
***************************************************************;
DATA CHILDREN;
INPUT WEIGHT HEIGHT AGE;
DATALINES;
64 57 8
71 59 10
53 49 6
67 62 11
55 51 8
58 50 8
77 55 10
57 48 9
56 42 10
51 42 6
76 61 12
68 57 9
;
ODS RTF;   * Send the output that follows to an RTF file;
PROC MEANS DATA=CHILDREN;
TITLE 'Example 1a - PROC MEANS, simplest use';
RUN;
PROC MEANS DATA=CHILDREN MAXDEC=2;
VAR WEIGHT HEIGHT;
TITLE 'Example 1b - PROC MEANS, limit decimals, specify variables';
RUN;
PROC MEANS DATA=CHILDREN MAXDEC=2 N MEAN STDERR MEDIAN;
VAR WEIGHT HEIGHT;
TITLE 'Example 1c - PROC MEANS, specify statistics to report';
RUN;
ODS RTF CLOSE;
Example 1a - PROC MEANS, simplest use

Variable   N        Mean     Std Dev     Minimum     Maximum
WEIGHT    12  62.7500000   8.9861004  51.0000000  77.0000000
HEIGHT    12  52.7500000   6.8240884  42.0000000  62.0000000
AGE       12   8.9166667   1.8319554   6.0000000  12.0000000
Example 1b - PROC MEANS, limit decimals, specify variables

Variable   N   Mean   Std Dev   Minimum   Maximum
WEIGHT    12  62.75      8.99     51.00     77.00
HEIGHT    12  52.75      6.82     42.00     62.00
Example 1c - PROC MEANS, specify statistics to report

Variable   N   Mean   Std Error   Median
WEIGHT    12  62.75        2.59    61.00
HEIGHT    12  52.75        1.97    53.00
FEEDTYPE=1

Analysis Variable : WEIGHTGAIN
      Mean     Std Dev     Minimum     Maximum
51.4571429   4.7475808  44.8000000  56.0000000

FEEDTYPE=2

Analysis Variable : WEIGHTGAIN
      Mean     Std Dev     Minimum     Maximum
54.9666667   4.7944412  51.3000000  64.3000000
In this first version of the output, the BY statement creates two tables, one for each value of the BY variable. It is used together with PROC SORT, because BY-group processing requires the data to be sorted by FEEDTYPE first. In the next example, the CLASS statement produces a single table broken down by group (FEEDTYPE).
Analysis Variable : WEIGHTGAIN
FEEDTYPE        Mean     Std Dev     Minimum     Maximum
1         51.4571429   4.7475808  44.8000000  56.0000000
2         54.9666667   4.7944412  51.3000000  64.3000000
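The BY and CLASS approaches described above can be sketched as follows. The program that produced the FEEDTYPE tables is not shown here. This sketch therefore uses the SASHELP.CLASS sample data set that ships with SAS as a stand-in, with SEX as the grouping variable and WEIGHT as the analysis variable. The output data set name CLASS_SORTED is hypothetical; substitute your own data set and variables.

```sas
* Approach 1: BY-group processing requires data sorted by the BY variable;
* SASHELP is usually read-only, so sort into a WORK copy with OUT=;
PROC SORT DATA=SASHELP.CLASS OUT=CLASS_SORTED;
  BY SEX;
RUN;

PROC MEANS DATA=CLASS_SORTED MAXDEC=2;
  BY SEX;      * one separate table per value of SEX;
  VAR WEIGHT;
  TITLE 'BY statement: one table per group';
RUN;

* Approach 2: CLASS does not require sorted data;
PROC MEANS DATA=SASHELP.CLASS MAXDEC=2;
  CLASS SEX;   * a single table with one row per value of SEX;
  VAR WEIGHT;
  TITLE 'CLASS statement: one table broken down by group';
RUN;
```

The CLASS version avoids the extra sort step. The BY version is convenient when later steps also process the data group by group.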
Hands on Exercise:
1. Modify the above program to output the following statistics:
N MEAN MEDIAN MIN MAX
2. Use MAXDEC=2 to limit the number of decimal places in the output.
PROC MEANS is a quick way to find large or small values in your data set that may be outliers (see also PROC UNIVARIATE). This example shows how the MINIMUM and MAXIMUM reported by PROC MEANS can identify unusual values in the data set. (PROCMEANS3.SAS)
DATA WEIGHT;
INPUT TREATMENT LOSS @@;   * @@ holds the line so several TREATMENT/LOSS pairs are read from each data line;
DATALINES;
2 1.0 1 3.0 1 -1.0 1 1.5 1 0.5 1 3.5 1 -99
2 4.5 3 6.0 2 3.5 2 7.5 2 7.0 2 6.0 2 5.5
1 1.5 3 -2.5 3 -0.5 3 1.0 3 .5 3 78 1 .6 2 3 2 4 3 9 1 7 2 2
;
ODS RTF;
PROC MEANS DATA=WEIGHT; VAR LOSS;
TITLE 'Find largest and smallest values';
RUN;
ODS RTF CLOSE;
Notice that in this output, PROC MEANS reports a minimum of -99 (which could be a missing-value code) and a maximum of 78 (which could be a miscoded number). This is a quick way to find outliers in your data set.
Analysis Variable : LOSS
 N        Mean      Std Dev      Minimum      Maximum
26   2.0423077   25.4650062  -99.0000000   78.0000000
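Suppose -99 turns out to be a missing-value code. One way to handle it is to recode it to a SAS missing value in a DATA step, then rerun PROC MEANS with NMISS so the recoded values are counted. This sketch assumes -99 never occurs as a real LOSS value. The value 78 is left unchanged because its correct value cannot be recovered from the data; check it against the source records. The data set name WEIGHT2 is hypothetical.

```sas
DATA WEIGHT;
INPUT TREATMENT LOSS @@;
DATALINES;
2 1.0 1 3.0 1 -1.0 1 1.5 1 0.5 1 3.5 1 -99
2 4.5 3 6.0 2 3.5 2 7.5 2 7.0 2 6.0 2 5.5
1 1.5 3 -2.5 3 -0.5 3 1.0 3 .5 3 78 1 .6 2 3 2 4 3 9 1 7 2 2
;

* Hypothetical cleaned copy: treat -99 as a missing-value code;
DATA WEIGHT2;
  SET WEIGHT;
  IF LOSS = -99 THEN LOSS = .;
RUN;

* NMISS shows how many values are now missing;
PROC MEANS DATA=WEIGHT2 N NMISS MEAN STD MIN MAX;
  VAR LOSS;
  TITLE 'LOSS after recoding -99 to missing';
RUN;
```

After this change, N drops to 25, NMISS is 1 and the minimum becomes -2.5. The value 78 is still reported as the maximum.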
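PROC MEANS can also perform a one-sample t-test through the T and PRT statistics. Applied to the differences between paired measurements, this gives a paired t-test. For example, the following code performs a paired t-test on before-and-after weight data:
(PROCMEANS4.SAS)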
DATA WEIGHT;
INPUT WBEFORE WAFTER;
* Calculate WLOSS in the DATA step (negative values mean weight was lost) *;
WLOSS=WAFTER-WBEFORE;
DATALINES;
200 190
175 154
188 176
198 193
197 198
310 240
245 204
202 178
;
ODS RTF;
PROC MEANS N MEAN T PRT; VAR WLOSS;
TITLE 'Paired t-test example using PROC MEANS';
RUN;
ODS RTF CLOSE;
Notice that the test is performed on the new variable WLOSS, which is why it is the only variable listed in the VAR statement. This is essentially a one-sample t-test on the differences. The statistics of interest are the mean of WLOSS, the t-statistic for the null hypothesis that the mean of WLOSS is 0, and the associated p-value. The SAS output is as follows:
Analysis Variable : WLOSS
N          Mean    t Value    Pr > |t|
8   -22.7500000      -2.79      0.0270
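As a cross-check, the same paired comparison can be run with PROC TTEST and its PAIRED statement. PAIRED analyzes the difference WAFTER - WBEFORE directly, so no WLOSS variable is needed. By hand, the mean of the eight differences is -182/8 = -22.75 and their standard deviation is sqrt(532.5) ≈ 23.08. That gives t = -22.75 / (23.08 / sqrt(8)) ≈ -2.79 with 7 degrees of freedom, matching the PROC MEANS output. The sketch repeats the data so it runs on its own.

```sas
DATA WEIGHT;
INPUT WBEFORE WAFTER;
DATALINES;
200 190
175 154
188 176
198 193
197 198
310 240
245 204
202 178
;

* PAIRED WAFTER*WBEFORE tests whether the mean of WAFTER - WBEFORE is 0;
PROC TTEST DATA=WEIGHT;
  PAIRED WAFTER*WBEFORE;
  TITLE 'Paired t-test using PROC TTEST';
RUN;
```

PROC TTEST reports the same t value and p-value. By default it also shows a confidence interval for the mean difference, which PROC MEANS does not print unless you request it.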
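PROC MEANS can also save its statistics in a data set for use in later calculations. The program below uses the NOPRINT option and an OUTPUT statement to store the mean and standard deviation of WEIGHT. It then uses them to compute the difference from the mean and a standardized score (z-score) for each observation.
DATA WT;
INPUT WEIGHT;
DATALINES;
64
71
53
67
55
58
77
57
56
51
76
68
;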
* NOPRINT suppresses the printed table; OUTPUT saves the statistics;
PROC MEANS NOPRINT DATA=WT;
VAR WEIGHT;
OUTPUT OUT=WTMEANS MEAN=WTMEAN STDDEV=WTSD;
RUN;
DATA WTDIFF;
SET WT;
IF _N_=1 THEN SET WTMEANS;
DIFF=WEIGHT-WTMEAN;
Z=DIFF/WTSD; * CREATES STANDARDIZED SCORE (Z-SCORE);
RUN;
ODS RTF;
PROC PRINT DATA=WTDIFF;
VAR WEIGHT DIFF Z;
RUN;
ODS RTF CLOSE;
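The statement

OUTPUT OUT=WTMEANS MEAN=WTMEAN STDDEV=WTSD;

creates a new data set, WTMEANS, containing a single observation with the mean (WTMEAN) and standard deviation (WTSD) of WEIGHT. In the DATA step, the first SET statement (SET WT) reads the observations of the WT data set one at a time. The statement

IF _N_=1 THEN SET WTMEANS;

reads the first (and only) observation from the WTMEANS data set on the first iteration of the DATA step. It adds WTMEAN and WTSD to the new WTDIFF data set, along with the automatic variables _TYPE_ and _FREQ_. Because variables read with a SET statement are retained, WTMEAN and WTSD keep their values for every observation. That is what allows you to calculate the DIFF and Z values. The _N_=1 condition matters: if this SET statement also ran on the second iteration, it would reach the end of WTMEANS and the DATA step would stop after one observation.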
The resulting data set contains the following information:

Obs   WEIGHT     DIFF          Z
  1       64     1.25    0.13910
  2       71     8.25    0.91808
  3       53    -9.75   -1.08501
  4       67     4.25    0.47295
  5       55    -7.75   -0.86244
  6       58    -4.75   -0.52859
  7       77    14.25    1.58578
  8       57    -5.75   -0.63988
  9       56    -6.75   -0.75116
 10       51   -11.75   -1.30757
 11       76    13.25    1.47450
 12       68     5.25    0.58424
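A different technique, offered here as an alternative rather than as the tutorial's method, is PROC SQL. PROC SQL can compute summary functions such as MEAN and STD and "remerge" them with each row in a single step, with no separate WTMEANS data set. This is a sketch. SAS writes a NOTE to the log saying that it is remerging summary statistics, and the row order of the result is not guaranteed. The output table name WTDIFF2 is hypothetical.

```sas
DATA WT;
INPUT WEIGHT @@;
DATALINES;
64 71 53 67 55 58 77 57 56 51 76 68
;

* MEAN() and STD() are computed over all rows, then remerged with each row;
PROC SQL;
  CREATE TABLE WTDIFF2 AS
  SELECT WEIGHT,
         WEIGHT - MEAN(WEIGHT) AS DIFF,
         (WEIGHT - MEAN(WEIGHT)) / STD(WEIGHT) AS Z
  FROM WT;
QUIT;

PROC PRINT DATA=WTDIFF2;
RUN;
```

STD() in PROC SQL uses the n - 1 divisor, so the DIFF and Z values should agree with the DATA step version.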
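NOTE: You could also get standardized values using PROC STANDARD. In the output data set ZSCORES, each WEIGHT value is replaced by its standardized value (mean 0, standard deviation 1):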
PROC STANDARD DATA=WT
MEAN=0 STD=1 OUT=ZSCORES;
VAR WEIGHT;
RUN;
PROC PRINT DATA=ZSCORES;
RUN;
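A quick way to confirm that PROC STANDARD did what was asked is to run PROC MEANS on its output. The mean of the standardized WEIGHT should be 0 apart from rounding, and its standard deviation should be 1. The sketch repeats the WT data so it runs on its own.

```sas
DATA WT;
INPUT WEIGHT @@;
DATALINES;
64 71 53 67 55 58 77 57 56 51 76 68
;

PROC STANDARD DATA=WT MEAN=0 STD=1 OUT=ZSCORES;
  VAR WEIGHT;
RUN;

* WEIGHT in ZSCORES now holds z-scores: expect mean 0 and std dev 1;
PROC MEANS DATA=ZSCORES N MEAN STD MIN MAX MAXDEC=4;
  VAR WEIGHT;
RUN;
```

The minimum and maximum should match the smallest and largest Z values listed earlier (-1.30757 and 1.58578), rounded to four decimal places.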
End of tutorial
See http://www.stattutorials.com/SAS