The MEANS/SUMMARY Procedure: Getting Started and Doing More

Arthur L. Carpenter
California Occidental Consultants

ABSTRACT

The MEANS/SUMMARY procedure is a workhorse for most data analysts. It is used to create tables of summary statistics as well as complex summary data sets. The user has a great many options which can be used to customize what the procedure is to produce. Unfortunately most analysts rely on only a few of the simpler basic ways of setting up the PROC step, never realizing that a number of less commonly used options and statements exist that can greatly simplify the procedure code, the analysis steps, and the resulting output. This tutorial begins with the basic statements of the MEANS/SUMMARY procedure and follows up with introductions to a number of important and useful options and statements that can provide the analyst with much needed tools. With this practical knowledge, you can greatly enhance the usability of the procedure and then you too will be doing more with MEANS/SUMMARY.

KEY WORDS
OUTPUT, MEANS, SUMMARY, AUTONAME, _TYPE_, WAYS, LEVELS, MAXID, IDGROUP, preloaded formats

INTRODUCTION
PROC MEANS is one of SAS's original procedures, and its initial mandate was to create printed tables of summary statistics. Later PROC SUMMARY was introduced to create summary data sets. Although these two procedures grew up on opposite sides of the tracks, over time both have evolved so that under the current version of SAS they actually use the same software behind the scenes. These two procedures completely share capabilities. In fact neither can do anything that the other cannot do. Only some of the defaults are different (as they reflect each procedure's original roots). For the analyst faced with creating statistical summaries, the MEANS/SUMMARY procedure is indispensable. While it is fairly simple to generate a straightforward statistical summary, these procedures allow a complex list of options and statements that give the analyst a great deal of control. Because of the similarity of these two procedures, examples will tend to show one or the other but not both. When I use MEANS or SUMMARY, I tend to select the procedure based on the primary objective of the step (SUMMARY for a summary data set and MEANS for a printed table). Even that rule, however, is rather lax, as MEANS has the further advantage of only having 5 letters in the procedure name.

BASIC STATEMENTS
The MEANS/SUMMARY procedure is so powerful that just a few simple statements and options can produce fairly complex and useful summary tables.

Differences Between MEANS and SUMMARY

Originally MEANS was used to generate printed tables and SUMMARY a summary data set. While both procedures can now create either type of output, the defaults for both tend to reflect the original roots of the procedure. One of the primary differences in defaults is seen by looking at the way each procedure creates printed tables. Printed tables are routed through the Output Delivery System to a destination such as LISTING or HTML. By default MEANS always creates a table to be printed. If you do not want a printed table you must explicitly turn it off (NOPRINT option). On the other hand, the SUMMARY procedure never creates a printed table unless it is specifically requested (PRINT option). There are a few other differences between MEANS and SUMMARY. In each case the difference reflects default behaviors, and these will be pointed out in the appropriate sections of this paper.
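The difference in printing defaults can be sketched in a few lines. This is a minimal illustration, assuming the SASHELP.CLASS sample data set; the data set name WORK.WT_STATS is a hypothetical name chosen for the example. The first two steps should produce equivalent printed tables, while the third writes only a data set.

```sas
/* MEANS prints a table by default */
proc means data=sashelp.class;
   var weight;
run;

/* SUMMARY prints only when the PRINT option is requested */
proc summary data=sashelp.class print;
   var weight;
run;

/* MEANS with NOPRINT: no printed table, only the summary data set */
/* (WORK.WT_STATS is a hypothetical name used for illustration)    */
proc means data=sashelp.class noprint;
   var weight;
   output out=work.wt_stats;
run;
```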

Creating a Basic Summary Table

Very little needs to be done to create a simple summary table. The DATA= option in the PROC statement identifies the data set to be summarized and the VAR statement lists one or more numeric variables to be analyzed.

   proc means data=sashelp.class;
      var weight;
   run;

We can see that the mean weight of the 19 students in the CLASS data set is just over 100 pounds. Because we left the selection of the statistics to the defaults, the table contains N, mean, standard deviation, minimum, and maximum.

   A Simple Printed Table
   The MEANS Procedure
   Analysis Variable : Weight

    N            Mean         Std Dev         Minimum         Maximum
   19     100.0263158      22.7739335      50.5000000     150.0000000

Selecting Statistics
Generally we want more control over which statistics are to be selected. When you want to specifically select statistics, they are listed as options on the PROC statement.

   title1 'The First Two Statistical Moments';
   proc means data=sashelp.class n mean var std stderr;
      var weight;
   run;

   The First Two Statistical Moments
   The MEANS Procedure
   Analysis Variable : Weight

    N            Mean        Variance         Std Dev       Std Error
   19     100.0263158     518.6520468      22.7739335       5.2246987

The list of available statistics is fairly comprehensive. A subset includes:

   - n          number of observations used to calculate the statistics
   - nmiss      number of observations with missing values
   - min        minimum value taken on by the data
   - max        maximum value taken on by the data
   - range      difference between the min and the max
   - sum        total of the data
   - mean       arithmetic mean
   - std        standard deviation
   - stderr     standard error
   - var        variance
   - skewness   symmetry of the data's distribution
   - kurtosis   peakedness of the data's distribution

A number of statistics having to do with percentiles and quantiles are also available, including:

   - median       50th percentile
   - p50          50th percentile (or second quartile)
   - p25 | q1     25th percentile (or first quartile)
   - p75 | q3     75th percentile (or third quartile)
   - p1 p5 p10    other percentiles
   - p90 p95 p99  other percentiles

Starting in SAS 9.2 the MODE statistic is also available. Statistics listed on the PROC statement are only applied to the printed table and have NOTHING to do with any summary data sets that are also created.
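As a brief illustration of mixing the basic and percentile keywords from the lists above, the following sketch requests a five-number summary plus the range. It assumes the SASHELP.CLASS sample data set; the title text is only an illustrative choice.

```sas
title1 'Quartiles and Range of Weight';
/* Statistic keywords are listed as options on the PROC statement */
proc means data=sashelp.class n nmiss min q1 median q3 max range;
   var weight;
run;
```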

Creating a Summary Data Set

Both procedures can also be used to create a summary data set through the use of the OUTPUT statement. Without using ODS, a summary data set will not be created unless the OUTPUT statement is present. This is true for both the MEANS and SUMMARY procedures.

   title1 'A Simple Summary Data Set';
   proc means data=sashelp.class noprint;
      var weight;
      output out=summrydat;
   run;

The NOPRINT option is used with MEANS, because a printed table is not wanted. A PROC PRINT of the summary data set (WORK.SUMMRYDAT) shows the following:

   A Simple Summary Data Set

   Obs   _TYPE_   _FREQ_   _STAT_    Weight
    1       0       19      N         19.000
    2       0       19      MIN       50.500
    3       0       19      MAX      150.000
    4       0       19      MEAN     100.026
    5       0       19      STD       22.774

Again, since statistics were not specified, the same default list of statistics that was used in the MEANS printed table appears here.

Selecting the Statistics and Naming the Variables in the Summary Data Set
Usually when you create a summary data set, you will want to specifically select the statistics. These are specified on the OUTPUT statement. Remember statistics listed on the PROC statement only apply to printed tables and have nothing to do with the statistics that you want in the summary data set. The techniques shown below can be combined - experiment.

Selecting Statistics
Statistics are selected by using their names as options in the OUTPUT statement. The name of each statistic is followed by an equal sign. The following OUTPUT statement requests that the mean weight be calculated and saved in the data set SUMMRYDAT.

   title1 'Selected Statistics';
   proc summary data=sashelp.class;
      var weight;
      output out=summrydat mean=;
   run;

The mean weight will be stored in a variable named WEIGHT. Because the summary variable takes the name of the analysis variable, this form lets you pick only a single statistic. That is limiting on its own, but when combined with the techniques shown below it can be very flexible.

Explicit Naming
By following the equal sign with a name, you can provide names for the new variables. This allows you to name more than one statistic on the OUTPUT statement.

   title1 'Selecting Multiple Statistics';
   proc summary data=sashelp.class;
      var weight;
      output out=summrydat n=number mean=average std=std_deviation;
   run;

   Selecting Multiple Statistics

   Obs   _TYPE_   _FREQ_   number   average   std_deviation
    1       0       19       19     100.026      22.7739

You can also name multiple analysis variables. Here both HEIGHT and WEIGHT are specified.

   title1 'Multiple Analysis Variables';
   proc summary data=sashelp.class;
      var height weight;
      output out=summrydat
             n    = ht_n wt_n
             mean = mean_ht mean_wt
             std  = sd_ht sd_wt;
   run;

Be careful here, as the order of the variables in the VAR statement determines which variable is for height and which is for weight. You should also be smart about naming conventions. In this example the statistics for N are not consistently named relative to those for the MEAN and STD. This technique does not allow you to skip statistics. If you did not want the mean for HEIGHT, but only the mean for WEIGHT, this would not be possible, because HEIGHT is first on the VAR statement. To get around this you can use the techniques on naming the statistics shown in the next section.

Selected Naming
When there is more than one variable in the VAR statement, but you do not want every statistic calculated for every analysis variable, you can selectively associate statistics with analysis variables.

   title1 'Selective Associations';
   proc summary data=sashelp.class;
      var height weight;
      output out=summrydat
             n            = ht_n wt_n
             mean(weight) = wt_mean
             std(height)  = ht_std;
   run;

   Selective Associations

   Obs   _TYPE_   _FREQ_   ht_n   wt_n   wt_mean    ht_std
    1       0       19      19     19    100.026   5.12708

Alternate forms of the statistic selections (in this case for the MEAN) could have included the following:

   mean(weight height)=wt_mean ht_mean
   mean(weight)=wt_mean mean(height)=ht_mean

Automatic Naming of Summary Variables
When you do not NEED to control the naming of the new summary variables, the AUTONAME and AUTOLABEL options can be used on the OUTPUT statement. The AUTONAME option allows you to select statistics without picking a name for the resulting variable in the OUTPUT table. This eliminates naming conflicts. The AUTOLABEL option creates a label for variables added to the OUT= data set.

   title1 'Using AUTONAME';
   proc summary data=sashelp.class;
      var height weight;
      output out=summrydat n= mean= std= / autoname;
   run;

   Using AUTONAME

   Obs  _TYPE_  _FREQ_  Height_N  Weight_N  Height_Mean  Weight_Mean  Height_StdDev  Weight_StdDev
    1      0      19       19        19       62.3368      100.026       5.12708        22.7739

Notice that the names are in the form of variable_statistic. This is a nicely consistent, dependable, and usable naming convention.
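The AUTOLABEL option mentioned above is not shown in the example, so here is a minimal sketch of using it together with AUTONAME. It assumes the SASHELP.CLASS sample data set; the data set name WORK.AUTOSTATS is a hypothetical name used only for illustration. PROC CONTENTS is included simply as one way to inspect the generated names and labels.

```sas
title1 'Using AUTONAME and AUTOLABEL';
proc summary data=sashelp.class;
   var height weight;
   /* AUTONAME builds the variable names, AUTOLABEL builds the labels */
   output out=work.autostats n= mean= std= / autoname autolabel;
run;

/* List the generated variable names and their labels */
proc contents data=work.autostats;
run;
```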

Using the CLASS Statement

The CLASS statement can be used to create subgroups. Unlike the BY statement the data do not have to be sorted prior to its use. As in most other procedures that utilize the CLASS statement, there can be one or more classification variables.

In a Printed Table
When the resulting table is to be printed, CLASS creates one summary for each combination of classification variables.

   title1 'CLASS and a Printed Table';
   proc means data=sashelp.class(where=(age in(12,13,14)))
              n mean std;
      class age sex;
      var height;
   run;

   CLASS and a Printed Table
   The MEANS Procedure
   Analysis Variable : Height

   Age   Sex   N Obs    N          Mean       Std Dev
    12    F      2      2    58.0500000     2.4748737
          M      3      3    60.3666667     3.9323445
    13    F      2      2    60.9000000     6.2225397
          M      1      1    62.5000000             .
    14    F      2      2    63.5500000     1.0606602
          M      2      2    66.2500000     3.8890873

In a Summary Data Set
When creating a summary data set, one can get not only the classification variable interaction statistics, but the main factor statistics as well. This can be very helpful to the statistician.

   title1 'CLASS and a Summary Data Set';
   proc summary data=sashelp.class(where=(age in(12,13,14)));
      class age sex;
      var height;
      output out=clsummry n=ht_n mean=ht_mean std=ht_sd;
   run;

A PROC PRINT of the data set CLSUMMRY shows:

   CLASS and a Summary Data Set

   Obs   Age   Sex   _TYPE_   _FREQ_   ht_n   ht_mean     ht_sd
     1     .            0       12      12    61.7583   3.97868
     2     .    F       1        6       6    60.8333   3.90470
     3     .    M       1        6       6    62.6833   4.18637
     4    12            2        5       5    59.4400   3.29742
     5    13            2        3       3    61.4333   4.49592
     6    14            2        4       4    64.9000   2.80119
     7    12    F       3        2       2    58.0500   2.47487
     8    12    M       3        3       3    60.3667   3.93234
     9    13    F       3        2       2    60.9000   6.22254
    10    13    M       3        1       1    62.5000         .
    11    14    F       3        2       2    63.5500   1.06066
    12    14    M       3        2       2    66.2500   3.88909

Two additional variables have been added to the summary data set: _TYPE_ (which is described below in more detail) and _FREQ_ (which counts observations). Although not apparent in this example, _FREQ_ counts all observations, while the N statistic only counts observations with non-missing values. If you only want the statistics for the highest order interaction, you can use the NWAY option on the PROC statement.

   proc summary data=sashelp.class(where=(age in(12,13,14))) nway;

Understanding _TYPE_
The _TYPE_ variable in the output data set helps us track the level of summarization, and can be used to distinguish the sets of statistics. Notice in the previous example that _TYPE_ changes for each level of summarization.

   _TYPE_ = 0   Summarize across all classification variables
   _TYPE_ = 1   Summarize as if the right most classification variable (SEX) was the only one
   _TYPE_ = 2   Summarize as if the next to the right most classification variable (AGE) was the only one
   _TYPE_ = 3   Interaction of the two classification variables
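The difference between _FREQ_ and N, and the effect of NWAY, can be made visible with a small sketch. It assumes the SASHELP.CLASS sample data set; the data set names WORK.CLASS_MISS and WORK.FREQ_VS_N, and the choice to blank out one student's height, are hypothetical and purely illustrative.

```sas
/* Copy SASHELP.CLASS and set one height to missing (illustration only) */
data work.class_miss;
   set sashelp.class;
   if name = 'Alfred' then height = .;
run;

/* NWAY keeps only the highest order interaction (here, one row per SEX) */
proc summary data=work.class_miss nway;
   class sex;
   var height;
   output out=work.freq_vs_n n=ht_n mean=ht_mean;
run;

/* For the group containing the missing height, _FREQ_ exceeds HT_N by one */
proc print data=work.freq_vs_n;
run;
```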

In the following example there are three CLASS variables and _TYPE_ ranges from 0 to 7.

   title1 'Understanding _TYPE_';
   proc summary data=advrpt.demog(where=(race in('1','4') &
                                         12 le edu le 15 &
                                         symp in('01','02','03')));
      class race edu symp;
      var ht;
      output out=stats mean= meanHT;
   run;

   Understanding _TYPE_

   Obs   RACE   EDU   SYMP   _TYPE_   _FREQ_   meanHT
     1            .             0        8      66.25
     2            .    01       1        2      64.00
     3            .    02       1        4      66.50
     4            .    03       1        2      68.00
     5           12             2        4      67.50
     6           14             2        2      64.00
     7           15             2        2      66.00
     8           12    02       3        2      67.00
     9           12    03       3        2      68.00
    10           14    01       3        2      64.00
    11           15    02       3        2      66.00
    12    1       .             4        6      67.00
    13    4       .             4        2      64.00
    14    1       .    02       5        4      66.50
    15    1       .    03       5        2      68.00
    16    4       .    01       5        2      64.00
    17    1      12             6        4      67.50
    18    1      15             6        2      66.00
    19    4      14             6        2      64.00
    20    1      12    02       7        2      67.00
    21    1      12    03       7        2      68.00
    22    1      15    02       7        2      66.00
    23    4      14    01       7        2      64.00

When calculating the value of _TYPE_, assign a zero (0) when summarizing over a CLASS variable and assign a one (1) when summarizing for the CLASS variable. In the table below the zeros and ones associated with the class variables form a binary value. This binary value can be converted to decimal to obtain _TYPE_.
                      CLASS VARIABLES
   Observations    RACE      EDU      SYMP     Binary Value   _TYPE_
                  (2^2=4)  (2^1=2)  (2^0=1)
   1                 0        0        0            0            0
   2 - 4             0        0        1            1            1
   5 - 7             0        1        0           10            2
   8 - 11            0        1        1           11            3
   12 - 13           1        0        0          100            4
   14 - 16           1        0        1          101            5
   17 - 19           1        1        0          110            6
   20 - 23           1        1        1          111            7

A binary value of 110 = 1*2^2 + 1*2^1 + 0*2^0 = 1*4 + 1*2 + 0*1 = 6 = _TYPE_. Some SAS programmers find converting binary values to decimal values a bit tedious. Fortunately the developers at SAS Institute have provided us with alternatives.
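The binary-to-decimal arithmetic above can be reproduced with a short DATA step. This is only a sketch of the calculation, not something the procedure requires: the data set name WORK.TYPE_VALUES and the 0/1 flag variables RACE, EDU, and SYMP are hypothetical stand-ins for "summarized over" (0) and "used in the summary" (1).

```sas
data work.type_values;
   do race = 0 to 1;
      do edu = 0 to 1;
         do symp = 0 to 1;
            /* Left-most CLASS variable carries the highest power of 2 */
            type = race*2**2 + edu*2**1 + symp*2**0;
            output;
         end;
      end;
   end;
run;

/* Eight rows, with TYPE running from 0 through 7 */
proc print data=work.type_values;
run;
```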

Using CHARTYPE
The CHARTYPE option causes _TYPE_ to be displayed as a character variable in binary form rather than as a decimal value.

   title1 'Understanding _TYPE_ Using CHARTYPE';
   proc summary data=advrpt.demog(where=(race in('1','4') &
                                         12 le edu le 15 &
                                         symp in('01','02','03')))
                chartype;
      class race edu symp;
      var ht;
      output out=stats mean= meanHT;
   run;

   Understanding _TYPE_ Using CHARTYPE

   Obs   RACE   EDU   SYMP   _TYPE_   _FREQ_   meanHT
     1            .            000       8      66.25
     2            .    01      001       2      64.00
     3            .    02      001       4      66.50
     4            .    03      001       2      68.00
     5           12            010       4      67.50
     6           14            010       2      64.00
     7           15            010       2      66.00
     8           12    02      011       2      67.00
     9           12    03      011       2      68.00

. . . . portions of the table not shown . . . .

CREATING SUMMARY DATA SUBSETS

Once you have started to create summary data sets with MEANS/SUMMARY, you will soon discover how very useful they can be. Of course you will often find that you do not need all the information contained in the summary data set and that you need to create a data subset. As with most things in SAS there are multiple ways to do this. We have already seen the use of the NWAY option to subset for only the highest order interaction. This is fine but not very flexible. Let's look at some techniques that are a bit more useful.

Select Rows Using _TYPE_

Once you understand and can predict the value of _TYPE_, it can be used to provide subsetting information in a followup DATA step. Suppose that in the previous example we would like to have only those rows for which EDU is a factor. Our DATA step might be written something like:

   data edufactor;
      set stats;
      where _type_ in(2,3,6,7);
   run;
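When the CHARTYPE option is in effect, the same kind of subsetting can be written against the character form of _TYPE_ by testing one position of the string. The following is a hedged sketch using the SASHELP.CLASS sample data set instead of the paper's data; the data set names WORK.STATS and WORK.AGEFACTOR are hypothetical.

```sas
proc summary data=sashelp.class chartype;
   class sex age;
   var height;
   output out=work.stats mean=meanHT;
run;

/* AGE is the second CLASS variable, so it is a factor whenever */
/* the second character of the character _TYPE_ is '1'         */
data work.agefactor;
   set work.stats;
   where substr(_type_, 2, 1) = '1';
run;
```

This avoids listing the decimal values by hand, which becomes more attractive as the number of classification variables grows.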

Using the WAYS and LEVELS Options

The _TYPE_ variable is only one of several ways to identify levels of summarization in the summary data set. The WAYS and LEVELS options on the OUTPUT statement provide additional discrimination capabilities. These options add the variables _LEVEL_ and _WAY_ to the summary data table.

   title1 'Using LEVELS and WAYS Options';
   proc summary data=advrpt.demog;
      class race edu;
      var ht;
      output out=stats mean= meanHT
             / levels ways;
   run;

   Using LEVELS and WAYS Options

   Obs   RACE   EDU   _WAY_   _TYPE_   _LEVEL_   _FREQ_    meanHT
     1            .     0       0         1        75     67.5200
     2           10     1       1         1        11     71.3636
     3           12     1       1         2        18     66.8889
     4           13     1       1         3         4     70.0000
     5           14     1       1         4        11     64.1818
     6           15     1       1         5         7     65.2857
     7           16     1       1         6        10     70.4000
     8           17     1       1         7        10     65.2000
     9           18     1       1         8         4     69.0000
    10    1       .     1       2         1        41     68.4390
    11    2       .     1       2         2        17     67.6471
    12    3       .     1       2         3         9     64.8889
    13    4       .     1       2         4         4     64.5000
    14    5       .     1       2         5         4     66.5000
    15    1      10     2       3         1        11     71.3636
    16    1      12     2       3         2        15     67.0667
    17    1      13     2       3         3         4     70.0000
    18    1      15     2       3         4         5     64.2000
    19    1      16     2       3         5         2     71.0000
    20    1      17     2       3         6         2     63.0000
    21    1      18     2       3         7         2     73.0000
    22    2      12     2       3         8         3     66.0000
    23    2      16     2       3         9         6     71.0000
    24    2      17     2       3        10         8     65.7500
    25    3      14     2       3        11         7     64.0000
    26    3      15     2       3        12         2     68.0000
    27    4      14     2       3        13         4     64.5000
    28    5      16     2       3        14         2     68.0000
    29    5      18     2       3        15         2     65.0000

LEVELS option
Adds the variable _LEVEL_ to the OUT= data table. This numeric variable counts the observations within _TYPE_. This means that when FIRST._TYPE_ is true _LEVEL_ will equal 1.

WAYS option
Adds the variable _WAY_ to the OUT= data table. This numeric variable equals the number of classification variables that were used to calculate each observation, e.g. for a three way interaction _WAY_ will equal 3.
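Once _WAY_ is on the summary data set it can be used for subsetting in the same way as _TYPE_. The following minimal sketch keeps only the one-way (main effect) summaries; it assumes the SASHELP.CLASS sample data set, and WORK.STATS is a hypothetical data set name.

```sas
proc summary data=sashelp.class;
   class sex age;
   var height;
   output out=work.stats mean=meanHT / ways levels;
run;

/* _WAY_ = 1 selects every summary based on exactly one CLASS variable */
proc print data=work.stats;
   where _way_ = 1;
run;
```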

Using the WAYS and TYPES Statements

In addition to the WAYS and LEVELS options on the OUTPUT statement there are also the WAYS and TYPES statements that can be used to control what information is written to the summary data set. These have the further advantage of controlling what is actually calculated and can therefore also save computer resources when there are a large number of classification variables.

Controlling Summary Subsets Using WAYS
The WAYS statement can be used to specify a list of combinations of class variables, which are to be displayed. Combinations of the WAYS statement for three classification variables include the following summarizations:

   ways 0;     across all class variables
   ways 1;     each classification variable (no cross products)
   ways 2;     each two way combination of the classification variables
   ways 3;     three way combination for three classification variables; this is the same as using the NWAY option when there are three classification variables
   ways 0,3;   lists of numbers are acceptable

When the number of classification variables becomes large the WAYS statement can utilize an incremental list.

   ways 0 to 9 by 3;

In the following example, the main effect summaries (_TYPE_ = 1, 2, and 4) are not even calculated.

   title1 'Using the WAYS Statement';
   proc summary data=advrpt.demog;
      class race edu symp;
      var ht;
      ways 0,2;
      output out=stats mean= meanHT;
   run;

   Using the WAYS Statement

   Obs   RACE   EDU   SYMP   _TYPE_   _FREQ_    meanHT
     1            .             0       64     67.1875
     2           10    04       3        6     74.0000
     3           10    10       3        3     69.0000
     4           12    02       3        2     67.0000
   . . . . portions of the table not shown . . . .

Controlling Summary Subsets Using TYPES
The TYPES statement can be used to select and limit the data roll up summaries. The TYPES statement eliminates much of your need to understand the automatic variable _TYPE_. The TYPES statement is used to list those combinations of the classification variables that are desired. Like the WAYS statement this also can be used to limit the number of calculations that need to be performed.

   title1 'Using the TYPES Statement';
   proc summary data=advrpt.demog;
      class race edu symp;
      var ht;
      types edu race*symp;
      output out=stats mean= meanHT;
   run;

   Using the TYPES Statement

   Obs   RACE   EDU   SYMP   _TYPE_   _FREQ_    meanHT
     1           10             2        9     72.3333
     2           12             2       15     66.2667
     3           13             2        2     68.0000
     4           14             2        9     64.0000
     5           15             2        7     65.2857
     6           16             2       10     70.4000
     7           17             2       10     65.2000
     8           18             2        2     65.0000
     9    1       .    01       5        2     71.0000
    10    1       .    02       5        4     66.5000
    11    1       .    03       5        2     68.0000
   . . . . portions of the table not shown . . . .

For the following CLASS statement

   class race edu symp;

variations of the TYPES statement could include:

   types ();
   types race*edu edu*symp;
   types race*(edu symp);
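For readers without the paper's data, the TYPES statement can be tried on the SASHELP.CLASS sample data set. This is a minimal sketch; WORK.STATS is a hypothetical data set name. With the CLASS statement below, the request should yield the overall row (_TYPE_ = 0), the SEX main effect (_TYPE_ = 2), and the SEX by AGE interaction (_TYPE_ = 3), while the AGE main effect (_TYPE_ = 1) is never calculated.

```sas
proc summary data=sashelp.class;
   class sex age;
   var height;
   /* () requests the overall summary; AGE alone is not requested */
   types () sex sex*age;
   output out=work.stats mean=meanHT;
run;

proc print data=work.stats;
run;
```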

Using the CLASSDATA= and EXCLUSIVE Options

You can specify which combinations of levels of the classification variables are to appear in the report by creating a data set that contains the combinations of interest. These can include levels that do not exist in the data itself, but that are nonetheless to appear in the data set or report. The EXCLUSIVE option forces only those levels in the CLASSDATA= data set to appear in the report. The following example builds the data set that is to be used with the CLASSDATA= option. It also adds a level for each classification variable that does not exist in the data.

   title1 'Using the CLASSDATA and EXCLUSIVE Options';
   data selectlevels(keep=race edu symp);
      set advrpt.demog(where=(race in('1','4') &
                              12 le edu le 15 &
                              symp in('01','02','03')));
      output;
      * For fun add some nonexistent levels;
      if _n_=1 then do;
         edu=0;
         race='0';
         symp='00';
         output;
      end;
   run;

   proc summary data=advrpt.demog
                classdata=selectlevels
                exclusive;
      class race edu symp;
      var ht;
      output out=stats mean= meanHT;
   run;

   Using the CLASSDATA and EXCLUSIVE Options

   Obs   RACE   EDU   SYMP   _TYPE_   _FREQ_   meanHT
     1            .             0        8      66.25
     2            .    00       1        0        .
     3            .    01       1        2      64.00
     4            .    02       1        4      66.50
     5            .    03       1        2      68.00
     6            0             2        0        .
     7           12             2        4      67.50
     8           14             2        2      64.00

The summary lines for observations 2 and 6 represent levels of the classification variables that do not appear in the data. They were generated through a combination of the CLASSDATA= data set and the EXCLUSIVE option.

. . . . portions of the table not shown . . . .

Using the COMPLETETYPES Option

All combinations of the classification variables may not exist in the data and therefore those combinations will not appear in the summary table. If all possible combinations are desired, regardless of whether or not they exist in the data, use the COMPLETETYPES option on the PROC statement.

   title1 'Using the COMPLETETYPES Option';
   proc summary data=advrpt.demog(where=(race in('1','4') &
                                         12 le edu le 15 &
                                         symp in('01','02','03')))
                completetypes;
      class race edu symp;
      var ht;
      output out=stats mean= meanHT;
   run;

In the data there are no observations with both EDU=12 and SYMP='01'. However, since both levels exist somewhere in the data, the COMPLETETYPES option causes the combination to appear in the summary data set (obs=8).

   Using the COMPLETETYPES Option

   Obs   RACE   EDU   SYMP   _TYPE_   _FREQ_   meanHT
     1            .             0        8      66.25
     2            .    01       1        2      64.00
     3            .    02       1        4      66.50
     4            .    03       1        2      68.00
     5           12             2        4      67.50
     6           14             2        2      64.00
     7           15             2        2      66.00
     8           12    01       3        0        .
     9           12    02       3        2      67.00

. . . . portions of the table not shown . . . .

FINDING THE EXTREME VALUES

When working with data, it is not at all unusual to want to be able to identify the observations that contain the highest or lowest values of the analysis variables. These extreme values are automatically displayed in PROC UNIVARIATE output, but must be requested in MEANS/SUMMARY. As was shown earlier, the MIN and MAX statistics show the extreme values, but they do not identify the observation that contains the extreme. Fortunately there are a couple of ways to do this.

Using MAXID and MINID

The MAXID and MINID options in the OUTPUT statement can be used to identify the observations with the maximum and minimum values. The general form of the statement is:

   MAXID(analysis var(ID var))=PDV var

A new variable is added to the OUTPUT data set which takes on the value of the ID variable for the maximum observation.

   title1 'Using MAXID';
   proc summary data=advrpt.demog;
      class race edu;
      var ht wt;
      output out=stats
             mean= meanHT MeanWT
             max=maxHt maxWT
             maxid(ht(subject) wt(subject))=maxHtSubject MaxWtSubject
             ;
   run;

   Using MAXID

   Obs  RACE  EDU  _TYPE_  _FREQ_   meanHT    MeanWT   maxHt  maxWT  maxHtSubject  MaxWtSubject
    1          .     0       75    67.5200   160.267    74     240       110           137
    2         10     1       11    71.3636   194.091    74     215       110           109
    3         12     1       18    66.8889   167.722    70     240       106           137
    4         13     1        4    70.0000   197.000    72     215       148           117

. . . . portions of the table not shown . . . .

The OUTPUT statement could also have been written as:

   output out=stats
          mean= meanHT MeanWT
          max=maxHt maxWT
          maxid(ht(subject))=maxHtSubject
          maxid(wt(subject))=maxWtSubject
          ;

When more than one variable is needed to identify the observation with the extreme value, MAXID supports a list. As before when specifying lists, there is a one-to-one correspondence between the two lists. In the following OUTPUT statement both SUBJECT and SSN are used in the list of identification variables. Consequently a new variable is created for each in the summary data set.

   output out=stats
          mean= meanHT MeanWT
          max=maxHt maxWT
          maxid(ht(subject ssn))=MaxHtSubject MaxHtSSN
          maxid(wt(subject ssn))=MaxWtSubject MaxWtSSN
          ;

The MAXID and MINID options allow you to capture only a single extreme. It is also possible to display a group of the extreme values using the IDGROUP option.
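The examples above show only MAXID; MINID follows the same pattern. The sketch below pairs the two on the SASHELP.CLASS sample data set, using NAME as the ID variable. The data set name WORK.EXTREMES and the new variable names LIGHTEST and HEAVIEST are hypothetical choices for illustration.

```sas
proc summary data=sashelp.class;
   class sex;
   var weight;
   output out=work.extremes
          min=minWT max=maxWT
          /* NAME of the student holding each extreme weight */
          minid(weight(name))=lightest
          maxid(weight(name))=heaviest;
run;

proc print data=work.extremes;
run;
```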

Using the IDGROUP Option

Like the MAXID and MINID options, the IDGROUP option allows you to capture the maximum or minimum value and associated ID variable. More importantly, however, you may select more than just the single extreme value.

   title1 'Using IDGROUP';
   proc summary data=advrpt.demog;
      class race edu;
      var ht wt;
      output out=stats
             mean= MeanHT MeanWT
             max(wt)=maxWT
             idgroup(max(wt) out[2] (wt subject race)=maxval)
             ;
   run;

   Using IDGROUP

   Obs  RACE  EDU  _TYPE_  _FREQ_   MeanHT    MeanWT   maxWT  maxval_1  maxval_2  subject_1  subject_2  RACE_1  RACE_2
    1          .     0       75    67.5200   160.267    240     240       215        137        109        2       1
    2         10     1       11    71.3636   194.091    215     215       215        109        143        1       1
    3         12     1       18    66.8889   167.722    240     240       185        137        119        2       1
    4         13     1        4    70.0000   197.000    215     215       215        117        163        1       1
    5         14     1       11    64.1818   108.091    115     115       115        131        141        4       4
   . . . . portions of the table not shown . . . .

The elements of this OUTPUT statement are:

   - max(wt)=maxWT: The MAX statistic is superfluous in this example, and is included only for comparison. It provides only one value and not the next highest.
   - max(wt): We are asking for the maximum of WT. IDGROUP is also available for MIN, therefore in this example we could have also specified:
        idgroup(min(ht) out[3] (ht subject race)=minht minsub minrace)
   - out[2]: The top 2 values are to be shown.
   - (wt subject race): This is a list of variables that will be shown as observation identifiers. The analysis variable is usually included.
   - =maxval: You can choose the prefix of the ID variable or you can let the procedure do it for you. In either case, a number is appended to the variable name.

In this example we can see that the second heaviest subject in the study was subject 109 with a weight of 215 pounds and a RACE of 1.
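A self-contained version of the same idea can be run against the SASHELP.CLASS sample data set. This is a sketch under stated assumptions: WORK.TOP2 and the prefixes MAXWT and MAXNAME are hypothetical names, and NWAY is added only to keep the output to one row per SEX. Because OUT[2] is requested, each prefix should yield two variables with _1 and _2 appended.

```sas
proc summary data=sashelp.class nway;
   class sex;
   var weight;
   output out=work.top2
          /* two heaviest students per SEX, with their names */
          idgroup(max(weight) out[2] (weight name)=maxwt maxname);
run;

proc print data=work.top2;
run;
```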

DOING MORE WITH MEANS/SUMMARY

Using Options on CLASS Statements

The CLASS statement can now accept options. These include:

   - ASCENDING / DESCENDING
   - GROUPINTERNAL
   - MISSING
   - MLF
   - ORDER
   - preloaded format options (discussed below)

Most of the following discussion applies to virtually all SAS procedures that accept the CLASS statement.

ASCENDING / DESCENDING
Normally output (in tables or a summary data set) is placed in ascending order for each classification variable. You can change this by using the DESCENDING option on the CLASS statement.

   title1 'Using the DESCENDING CLASS Option';
   proc summary data=advrpt.demog;
      class race/descending;
      var ht wt;
      output out=stats
             mean= MeanHT MeanWT
             ;
   run;

   Using the DESCENDING CLASS Option

   Obs   RACE   _TYPE_   _FREQ_    MeanHT    MeanWT
    1             0        76     67.5526   160.461
    2     5       1         4     66.5000   147.000
    3     4       1         4     64.5000   113.500
    4     3       1         9     64.8889   111.222
    5     2       1        17     67.6471   162.000
    6     1       1        42     68.4762   176.143

GROUPINTERNAL
When a classification variable is associated with a format, the format is used when forming groups.

   proc format;
      value edulevel
         0-12    = 'High School'
         13-16   = 'College'
         17-high = 'Post Graduate';
   run;

   title1 'Without Using the GROUPINTERNAL CLASS Option';
   proc summary data=advrpt.demog;
      class edu;
      var ht wt;
      output out=stats
             mean= MeanHT MeanWT
             ;
      format edu edulevel.;
   run;

The resulting table will show at most three levels for EDU. To use the original data values (internal values), the GROUPINTERNAL option is added to the CLASS statement.

   class edu/groupinternal;

MISSING
When a classification variable takes on a missing value, that observation is eliminated from the analysis. If a missing value is OK or if the analyst needs to have it included in the summary, the MISSING option can be used. Most procedures that have either an implicit or explicit CLASS statement also have a MISSING option. However, when the MISSING option is used on the PROC statement it is applied to all the classification variables and this may not be acceptable. By using the MISSING option on the CLASS statement you can control which classification variables are to be handled differently. In the following example there are three classification variables. However, the MISSING option has only been applied to two of them.

   title1 'Using the MISSING CLASS Option';
   proc means data=advrpt.demog n mean std;
      class race;
      class edu symp / missing;
      var ht wt;
   run;

ORDER
When classification variables are displayed or written to a table the values are ordered according to one of several possible schemes. These include:

   data          order is based on the order of the incoming data
   formatted     values are formatted and then ordered (default when the variable is formatted)
   freq          the order is based on the frequency of the class level
   unformatted   same as INTERNAL or GROUPINTERNAL

Using the order=freq option on the CLASS statement causes the table to be ordered according to the most common levels of education.

   class edu/order=freq;

   Using the ORDER CLASS Option
   The MEANS Procedure

   years of
   education   N Obs   Variable   Label                N           Mean       Std Dev
      12         19    HT         height in inches    19     66.9473684     2.7582942
                       WT         weight in pounds    19    171.5263158    32.2703311
      14         11    HT         height in inches    11     64.1818182     0.4045199
                       WT         weight in pounds    11    108.0909091     4.3921417
                       HT         height in inches    11     71.3636364     3.2022719
                       WT         weight in pounds    11    194.0909091    19.0811663
                       HT         height in inches    10     65.2000000     2.3475756
                       WT         weight in pounds    10    145.2000000    25.0900600

. . . . portions of the table not shown . . . .

Using Multiple CLASS Statements

Because CLASS statements now accept options, and because those options may not apply to all the classification variables, it is often necessary to specify multiple CLASS statements, each with its own set of options. With or without options, when multiple CLASS statements are specified, the order of the statements themselves becomes important. The following CLASS statement

   class race edu;

could be rewritten as

   class race;
   class edu;

PRELOADED FORMATS
Several options and techniques are available to control which levels of classification variables are to appear in the summary. Those that were discussed earlier in this paper include the CLASSDATA and COMPLETETYPES options. Also discussed were the WAYS and TYPES statements, as well as the WAYS and LEVELS options on the OUTPUT statement. A related set of options come under the general topic of Preloaded Formats. Variations of these options are available for most of the procedures that utilize classification variables. Like the others listed above these techniques/options are used to control the relationship of levels of classification variables that may not appear in the data and how those levels are to appear (or not appear) in the summary. Generally speaking when a level of a classification variable is not included in the data, the associated row will not appear in the table. This behavior relative to the missing levels can be controlled through the use of preloaded formats. For the MEANS/SUMMARY procedures, options used to preload formats include: PRELOADFMT Loads the format levels prior to execution. This option will always be present when you want to use a preloaded format. EXCLUSIVE COMPLETETYPES Only data levels that are included in the format definition are to appear in summary table All levels representing format levels are to appear in the summary

It is the interaction of these three options that gives us a wide range of possible outcomes. In each case the option PRELOADFMT will be present. As the name of the technique implies, the control is maintained through the use of user defined formats. For the examples that follow, the format $SYMPX. has been created, and it contains one level, '00', that is not in the data. In the data the values of SYMP range from '01' to '10'.

   proc format;
      value $sympx
         '01' = 'Sleepiness'
         '02' = 'Coughing'
         '00' = 'Bad Code'
         ;
   run;
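Before preloading a format it can help to confirm exactly which levels it defines. The following minimal sketch assumes that $SYMPX. was created in the default WORK format catalog, as in the PROC FORMAT step above; it lists the contents of that one format.

```sas
/* Sketch: list the levels defined in the $SYMPX. format          */
/* (assumes the format is stored in the default WORK catalog)     */
proc format fmtlib;
   select $sympx;
run;
```

The listing should show the three defined levels ('00', '01', and '02'), which are the levels that PRELOADFMT makes available to the summary.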

PRELOADFMT with EXCLUSIVE

Preloading with the CLASS statement options PRELOADFMT and EXCLUSIVE limits the levels of the classification variable to those that are both on the format and in the data. Essentially the format acts as a filter without resorting to either a subsetting IF or a WHERE clause.

   title1 'Preloading and the EXCLUSIVE Option';
   proc summary data=advrpt.demog;
      class symp / preloadfmt exclusive;
      var ht;
      output out=stats mean=meanHT;
      format symp $sympx.;
   run;

Symptoms that are not both on the format $SYMPX. and in the data are not included on the summary table.

Preloading and the EXCLUSIVE Option

Obs    SYMP          _TYPE_    _FREQ_    meanHT
 1                      0        14       67.0
 2     Sleepiness       1         4       67.5
 3     Coughing         1        10       66.8
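For comparison, the filtering that PRELOADFMT and EXCLUSIVE perform could also be written with an explicit WHERE statement. This is only a sketch: it assumes that SYMP is a character variable holding the codes '01' through '10', and the output data set name STATS_WHERE is hypothetical.

```sas
/* Sketch: the same subset requested with an explicit WHERE statement */
/* (STATS_WHERE is a hypothetical output data set name)               */
proc summary data=advrpt.demog;
   where symp in ('01', '02');   /* keep only levels that are both in the format and in the data */
   class symp;
   var ht;
   output out=stats_where mean=meanHT;
   format symp $sympx.;
run;
```

The advantage of the preloaded format is that the list of levels lives in one place, the format definition, rather than being repeated in each step that needs the subset.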

PRELOADFMT with the COMPLETETYPES Option

The COMPLETETYPES option requests that all combinations of levels appear in the summary. When it is used with preloaded formats, the complete list of levels comes from the format rather than from the data itself. In this example the format $SYMPX. is again preloaded; however, rather than using the EXCLUSIVE CLASS statement option, the COMPLETETYPES option appears on the PROC statement.

   title1 'Preloading and the COMPLETETYPES Option';
   proc summary data=advrpt.demog completetypes;
      class symp / preloadfmt;
      var ht;
      output out=stats mean=meanHT;
      format symp $sympx.;
   run;

The summary now contains an observation for each SYMP in the data as well as each in the format $SYMPX.

Preloading and the COMPLETETYPES Option

Obs    SYMP          _TYPE_    _FREQ_     meanHT
 1                      0        65       67.2000
 2     Bad Code         1         0         .
 3     Sleepiness       1         4       67.5000
 4     Coughing         1        10       66.8000
 5     03               1         4       66.5000
 6     04               1        13       68.6923

. . . . portions of the table not shown . . . .
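The effect of COMPLETETYPES is easier to see with more than one classification variable. The sketch below is an assumption-laden illustration rather than an example from the original analysis: it adds RACE as a second classification variable, uses NWAY to keep only the highest _TYPE_, and writes to a hypothetical data set named STATS2.

```sas
/* Sketch: COMPLETETYPES with two classification variables          */
/* (RACE as a second class variable and STATS2 are illustrative)    */
proc summary data=advrpt.demog completetypes nway;
   class race;                   /* levels taken from the data            */
   class symp / preloadfmt;      /* levels taken from data and $SYMPX.    */
   var ht;
   output out=stats2 mean=meanHT;
   format symp $sympx.;
run;
```

Under these assumptions STATS2 should contain one row for every combination of RACE and SYMP, including combinations that never occur in the data; those rows have a _FREQ_ of 0 and a missing meanHT.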

PRELOADFMT with the COMPLETETYPES and the EXCLUSIVE Options

When a preloaded format is used with both the COMPLETETYPES and the EXCLUSIVE options, the summary includes all levels of the format, but not necessarily all levels in the data.

   title1 'Preloading With Both';
   title2 'the COMPLETETYPES and EXCLUSIVE Options';
   proc summary data=advrpt.demog completetypes;
      class symp / preloadfmt exclusive;
      var ht;
      output out=stats mean=meanHT;
      format symp $sympx.;
   run;

Preloading With Both
the COMPLETETYPES and EXCLUSIVE Options

Obs    SYMP          _TYPE_    _FREQ_    meanHT
 1                      0        14       67.0
 2     Bad Code         1         0         .
 3     Sleepiness       1         4       67.5
 4     Coughing         1        10       66.8

SUMMARY
The MEANS/SUMMARY procedure produces a wide variety of summary reports and summary data tables. It is very flexible and, while it can be quite complex, a few basic statements allow the user to create useful summaries. As you develop a deeper knowledge of the MEANS/SUMMARY procedure, you will find that the generation of highly sophisticated summarizations is possible from within a single step.

ABOUT THE AUTHOR

Art Carpenter's publications list includes four books, and numerous papers and posters presented at SUGI, SAS Global Forum, and other user group conferences. Art has been using SAS since 1976 and has served in various leadership positions in local, regional, national, and international user groups. He is a SAS Certified Professional™ and through California Occidental Consultants he teaches SAS courses and provides contract SAS programming support nationwide.

AUTHOR CONTACT
Arthur L. Carpenter
California Occidental Consultants
10606 Ketch Circle
Anchorage, AK 99515
(907) 865-9167
art@caloxy.com
www.caloxy.com

TRADEMARK INFORMATION
SAS, SAS Certified Professional, SAS Certified Advanced Programmer, and all other SAS Institute Inc. product or service names are registered trademarks of SAS Institute, Inc. in the USA and other countries. ® indicates USA registration.
