
SAS Sort Accum Total

The document describes how to create an accumulating total variable in SAS. It explains that, by default, variables created with an assignment statement are reset to missing at the start of each DATA step iteration. The RETAIN statement prevents this reinitialization so that the previous value is kept. The document shows code that retains a variable called Mth2Dte, sets its initial value to 0, and adds each day's sales amount to produce a running monthly total; the output shows the total updating with each observation. It also covers the sum statement and BY-group processing with PROC SORT and First./Last. variables.
Copyright
© Attribution Non-Commercial (BY-NC)

Sort and Accumulating Totals

Last Updated: 29 June 2004

Center of Excellence
Data Warehousing
Objectives

• Understand how the SAS System initializes the value of a variable in the program data vector (PDV).
• Prevent reinitialization of a variable in the PDV.
• Create an accumulating variable.
Creating an Accumulating Variable

The SAS data set prog2.daysales contains daily sales data for a retail store. There is one observation for each day in April showing the date (SaleDate) and the total receipts for that day (SaleAmt).

SaleDate     SaleAmt
01APR2001     498.49
02APR2001     946.50
03APR2001     994.97
04APR2001     564.59
05APR2001     783.01
06APR2001     228.82
07APR2001     930.57
08APR2001     211.47
09APR2001     156.23
10APR2001     117.69
11APR2001     374.73
12APR2001     252.73
Creating an Accumulating Variable

• The store manager also wants to see a running total of sales for the month as of each day.

Partial Output

SaleDate     SaleAmt    Mth2Dte
01APR2001     498.49     498.49
02APR2001     946.50    1444.99
03APR2001     994.97    2439.96
04APR2001     564.59    3004.55
05APR2001     783.01    3787.56
Creating Mth2Dte

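• By default, variables created with an assignment statement are initialized to missing at the top of the DATA step. In the statement

Mth2Dte=Mth2Dte+SaleAmt;

Mth2Dte is therefore missing on every iteration before SaleAmt is added, and an arithmetic operation with a missing operand produces a missing result.
• An accumulating variable must retain its value from one observation to the next.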
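The reset can be illustrated with a small Python model of the DATA step loop. This is a sketch, not SAS: the function names are hypothetical, `None` stands in for a SAS missing value, and the PDV is modeled as a dictionary that is rebuilt at the top of each iteration.

```python
from typing import List, Optional


def sas_add(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Model the SAS '+' operator: a missing operand gives a missing result."""
    if a is None or b is None:
        return None
    return a + b


def run_without_retain(sale_amts: List[float]) -> List[Optional[float]]:
    """Hypothetical model of Mth2Dte=Mth2Dte+SaleAmt; with no RETAIN."""
    output = []
    for amt in sale_amts:
        # Top of the DATA step: Mth2Dte comes from an assignment statement,
        # so it is reinitialized to missing on every iteration.
        pdv = {"SaleAmt": amt, "Mth2Dte": None}
        pdv["Mth2Dte"] = sas_add(pdv["Mth2Dte"], pdv["SaleAmt"])
        output.append(pdv["Mth2Dte"])  # implicit OUTPUT
    return output


print(run_without_retain([498.49, 946.50, 994.97]))  # [None, None, None]
```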
The RETAIN Statement

• General form of the RETAIN statement:

RETAIN variable-name <initial-value> …;

• The RETAIN statement prevents SAS from reinitializing the values of new variables at the top of the DATA step.
• Previous values of retained variables are available for processing across iterations of the DATA step.
The RETAIN Statement

• The RETAIN statement
  - retains the value of the variable in the PDV across iterations of the DATA step
  - initializes the retained variable to missing before the first execution of the DATA step if an initial value is not specified
  - is a compile-time-only statement.
Retain Mth2Dte and Set an Initial Value

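• If you do not supply an initial value, all the values of Mth2Dte will be missing: the first iteration computes missing + 498.49, which is missing, and that missing result is then retained into every later iteration.

retain Mth2Dte 0;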
Creating an Accumulating Variable

data mnthtot;
set prog2.daysales;
retain Mth2Dte 0;
Mth2Dte=Mth2Dte+SaleAmt;
run;
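Continuing the same kind of Python sketch (hypothetical names, `None` as the SAS missing value), RETAIN amounts to setting the variable once, before the first iteration, and never resetting it inside the loop. Running the model with and without an initial value reproduces both outcomes described for the mnthtot DATA step.

```python
from typing import List, Optional


def sas_add(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Model the SAS '+' operator: a missing operand gives a missing result."""
    if a is None or b is None:
        return None
    return a + b


def run_with_retain(sale_amts: List[float],
                    initial: Optional[float] = None) -> List[Optional[float]]:
    """Hypothetical model of:
         retain Mth2Dte <initial>;
         Mth2Dte=Mth2Dte+SaleAmt;
    """
    mth2dte = initial  # set once, before the first iteration
    output = []
    for amt in sale_amts:
        # No reset at the top of the loop: the retained value carries over.
        mth2dte = sas_add(mth2dte, amt)
        output.append(mth2dte)  # implicit OUTPUT
    return output


amts = [498.49, 946.50, 994.97, 564.59, 783.01]
print([round(v, 2) for v in run_with_retain(amts, initial=0.0)])
# [498.49, 1444.99, 2439.96, 3004.55, 3787.56]
print(run_with_retain(amts))  # no initial value
# [None, None, None, None, None]
```

Rounding to two decimals mimics the printed output; the running sums themselves are binary floating-point values and can carry tiny representation errors.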
Compile

SAS compiles the DATA step and builds the PDV. SaleDate and SaleAmt come from the descriptor portion of prog2.daysales, and Mth2Dte comes from the RETAIN statement. SaleDate is stored as a SAS date value (the number of days since 1 January 1960), so the first five observations look like this:

SaleDate    SaleAmt
15066        498.49
15067        946.50
15068        994.97
15069        564.59
15070        783.01

Execute

Before the first iteration, SaleDate and SaleAmt are set to missing and Mth2Dte is set to its initial value, 0. Each iteration then proceeds as follows:
1. The SET statement reads the next observation into the PDV.
2. The assignment statement adds SaleAmt to the retained value of Mth2Dte.
3. The implicit OUTPUT at the end of the DATA step writes the observation to mnthtot.
4. The implicit RETURN sends control back to the top of the DATA step. Mth2Dte is retained, so it is not reset to missing; SaleDate and SaleAmt, which are read with SET, also keep their values until the next observation is read.

Iteration   SaleDate   SaleAmt   Mth2Dte
(start)     .          .         0
1           15066      498.49    0 + 498.49 = 498.49
2           15067      946.50    498.49 + 946.50 = 1444.99
3           15068      994.97    1444.99 + 994.97 = 2439.96

Processing continues until the end of the SAS data set is reached.
Creating an Accumulating Variable

proc print data=mnthtot noobs;
   format SaleDate date9.;
run;

Partial PROC PRINT Output

SaleDate     SaleAmt    Mth2Dte
01APR2001     498.49     498.49
02APR2001     946.50    1444.99
03APR2001     994.97    2439.96
04APR2001     564.59    3004.55
05APR2001     783.01    3787.56
Accumulating Totals: Missing Values

• What happens if there are missing values for SaleAmt?

data mnthtot;
   set prog2.daysales2;
   retain Mth2Dte 0;
   Mth2Dte=Mth2Dte+SaleAmt;
run;

Undesirable Output

SaleDate     SaleAmt    Mth2Dte
01APR2001     498.49     498.49
02APR2001          .          .
03APR2001     994.97          .
04APR2001     564.59          .
05APR2001     783.01          .

The missing value of SaleAmt on 02APR2001 makes Mth2Dte missing, and because Mth2Dte is retained, all subsequent values of Mth2Dte are missing.
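In the same style of Python sketch (hypothetical function, `None` for a SAS missing value), one missing SaleAmt is enough: the addition returns missing, and because the variable is retained, that missing value is carried into every later iteration.

```python
from typing import List, Optional


def run_with_retain(sale_amts: List[Optional[float]],
                    initial: float = 0.0) -> List[Optional[float]]:
    """Hypothetical model of: retain Mth2Dte 0; Mth2Dte=Mth2Dte+SaleAmt;"""
    mth2dte: Optional[float] = initial
    output = []
    for amt in sale_amts:
        # SAS '+' propagates missing values, and RETAIN keeps the result.
        mth2dte = None if (mth2dte is None or amt is None) else mth2dte + amt
        output.append(mth2dte)
    return output


print(run_with_retain([498.49, None, 994.97, 564.59, 783.01]))
# [498.49, None, None, None, None]
```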
The Sum Statement

• When creating an accumulating variable, an alternative to the RETAIN statement is the sum statement.
• General form of the sum statement:

variable + expression;
The Sum Statement

• The sum statement
  - creates the variable on the left side of the plus sign if it does not already exist
  - initializes the variable to zero before the first iteration of the DATA step
  - automatically retains the variable
  - adds the value of the expression to the variable at execution
  - ignores missing values (a missing SaleAmt, for example, adds nothing to the total).
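The behavior listed above can be modeled with a short Python sketch (hypothetical function name, `None` for a missing value). The variable starts at 0, is never reset, and a missing expression value contributes nothing, which is why the sum statement avoids the problem shown in the undesirable output.

```python
from typing import List, Optional


def run_sum_statement(sale_amts: List[Optional[float]]) -> List[float]:
    """Hypothetical model of the sum statement  Mth2Dte+SaleAmt;"""
    mth2dte = 0.0  # initialized to 0 before the first iteration, then retained
    output = []
    for amt in sale_amts:
        # A missing value of the expression is ignored (adds nothing).
        if amt is not None:
            mth2dte += amt
        output.append(mth2dte)
    return output


totals = run_sum_statement([498.49, None, 994.97, 564.59, 783.01])
print([round(v, 2) for v in totals])
# [498.49, 498.49, 1493.46, 2058.05, 2841.06]
```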
Accumulating Totals: Missing Values

data mnthtot2;
set prog2.daysales2;
Mth2Dte+SaleAmt;
run;
Accumulating Totals: Missing Values

proc print data=mnthtot2 noobs;
   format SaleDate date9.;
run;

Partial PROC PRINT Output

SaleDate     SaleAmt    Mth2Dte
01APR2001     498.49     498.49
02APR2001          .     498.49
03APR2001     994.97    1493.46
04APR2001     564.59    2058.05
05APR2001     783.01    2841.06
c03s1d1.sas
Objectives

• Define First. and Last. processing.
• Calculate an accumulating total for groups of data.
• Use a subsetting IF statement to output selected observations.
Accumulating Totals for Groups
The SAS data set prog2.empsals contains each employee's identification number (EmpID), salary (Salary), and division (Div). There is one observation for each employee.

EmpID     Salary   Div
E00004     42000   HUMRES
E00009     34000   FINACE
E00011     27000   FLTOPS
E00036     20000   FINACE
E00037     19000   FINACE
E00048     19000   FLTOPS
E00077     27000   APTOPS
E00097     20000   APTOPS
E00107     31000   FINACE
E00123     20000   APTOPS
E00155     27000   APTOPS
E00171     44000   SALES
Desired Output

• Human resources wants a new data set that shows the total salary paid for each division.

Div       DivSal
APTOPS    410000
FINACE    163000
FLTOPS    318000
HUMRES    181000
SALES     373000
Grouping the Data
You must group the data in the SAS data set before you can process it by group.
Review of the SORT Procedure

You can rearrange the observations into groups using the SORT procedure.

General form of a PROC SORT step:

PROC SORT DATA=input-SAS-data-set
          <OUT=output-SAS-data-set>;
   BY <DESCENDING> BY-variable ...;
RUN;
The SORT Procedure

• The SORT procedure
  - rearranges the observations in a SAS data set
  - can sort on multiple variables
  - creates a SAS data set that is a sorted copy of the input SAS data set when OUT= is specified
  - replaces the input data set with the sorted data when OUT= is omitted (the default).
Sorting by Div

proc sort data=prog2.empsals out=salsort;
   by Div;
run;
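As a rough illustration of what this step does to the twelve observations listed for prog2.empsals, the Python sketch below sorts them by Div. It relies on Python's `sorted` being stable; PROC SORT's default EQUALS option similarly keeps observations with the same BY value in their original relative order. It also assumes an ASCII-style collating sequence, under which these uppercase division codes sort alphabetically.

```python
# (EmpID, Salary, Div) rows as listed on the prog2.empsals slide (partial data).
empsals = [
    ("E00004", 42000, "HUMRES"), ("E00009", 34000, "FINACE"),
    ("E00011", 27000, "FLTOPS"), ("E00036", 20000, "FINACE"),
    ("E00037", 19000, "FINACE"), ("E00048", 19000, "FLTOPS"),
    ("E00077", 27000, "APTOPS"), ("E00097", 20000, "APTOPS"),
    ("E00107", 31000, "FINACE"), ("E00123", 20000, "APTOPS"),
    ("E00155", 27000, "APTOPS"), ("E00171", 44000, "SALES"),
]

# Stable sort by Div: ties keep their input order, like PROC SORT with EQUALS.
salsort = sorted(empsals, key=lambda row: row[2])

for emp_id, salary, div in salsort:
    print(div, emp_id, salary)
```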
Processing Data in Groups

Div       Salary    DivSal
APTOPS     20000
APTOPS    100000
APTOPS     50000    170000
FINACE     25000
FINACE     20000
FINACE     23000
FINACE     27000     95000
SALES      10000
SALES      12000     22000
BY-Group Processing
• General form of a BY statement used with the SET statement:

DATA output-SAS-data-set;
   SET input-SAS-data-set;
   BY BY-variable …;
   <additional SAS statements>
RUN;

• The BY statement in the DATA step enables you to process your data in groups.
BY-Group Processing

data divsals(keep=Div DivSal);
   set salsort;
   by Div;
   /* additional SAS statements */
run;
BY-Group Processing

• A BY statement in a DATA step creates temporary variables for each variable listed in the BY statement.
• General form of the names of these temporary variables in a DATA step:

First.BY-variable
Last.BY-variable
First. and Last. Values
• The First. variable has a value of 1 for the first observation in a BY group; otherwise, it equals 0.
• The Last. variable has a value of 1 for the last observation in a BY group; otherwise, it equals 0.
• Use these temporary variables to conditionally process sorted, grouped, or indexed data.
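The two definitions above can be written as a short Python sketch (a model of the rules, not of SAS internals; the function name is hypothetical). First. compares each observation with the previous one, while Last. has to look ahead to the next one. The Div values are those of the grouped APTOPS, FINACE, and SALES example.

```python
from typing import List, Tuple


def first_last(values: List[str]) -> Tuple[List[int], List[int]]:
    """Compute First. and Last. flags for one BY variable (data already grouped)."""
    n = len(values)
    # First. is 1 when the value differs from the previous observation.
    first = [1 if i == 0 or values[i] != values[i - 1] else 0 for i in range(n)]
    # Last. needs a look-ahead: 1 when the next value differs or the data ends.
    last = [1 if i == n - 1 or values[i] != values[i + 1] else 0 for i in range(n)]
    return first, last


divs = ["APTOPS"] * 3 + ["FINACE"] * 4 + ["SALES"] * 2
first_div, last_div = first_last(divs)
print(first_div)  # [1, 0, 0, 1, 0, 0, 0, 1, 0]
print(last_div)   # [0, 0, 1, 0, 0, 0, 1, 0, 1]
```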
First. / Last. Example

To set Last.Div, SAS looks ahead to the next observation: Last.Div is 1 when the next observation has a different value of Div or when there is no next observation.

Div       Salary    First.Div   Last.Div
APTOPS     20000        1           0
APTOPS    100000        0           0
APTOPS     50000        0           1
FINACE     25000        1           0
FINACE     20000        0           0
FINACE     23000        0           0
FINACE     27000        0           1
SALES      10000        1           0
SALES      12000        0           1
What Must Happen When?

• There is a three-step process for accumulating totals:
  1. Set the accumulating variable to 0 at the start of each BY group.
  2. Increment the accumulating variable with a sum statement (which automatically retains it).
  3. Output only the last observation of each BY group.

Accumulating Totals for Groups
1. Set the accumulating variable to 0 at the
start of each BY group.

data divsals(keep=Div DivSal);
   set salsort;
   by Div;
   if First.Div then DivSal=0;
   /* additional SAS statements */
run;
Accumulating Totals for Groups
2. Increment the accumulating variable with a
sum statement (automatically retains).

data divsals(keep=Div DivSal);
   set salsort;
   by Div;
   if First.Div then DivSal=0;
   DivSal+Salary;
   /* additional SAS statements */
run;
First. / Last. Example

Div       Salary    DivSal
APTOPS     20000     20000
APTOPS    100000    120000
APTOPS     50000    170000
FINACE     25000     25000
FINACE     20000     45000
FINACE     23000     68000
FINACE     27000     95000
SALES      10000     10000
SALES      12000     22000
Subsetting IF Statement

• The subsetting IF defines a condition that the observation must meet to be further processed by the DATA step.
• General form of the subsetting IF statement:

IF expression;

• If the expression is true, the DATA step continues processing the current observation.
• If the expression is false, SAS returns to the top of the DATA step without writing the current observation to the output data set.
Accumulating Totals for Groups
3. Output only the last observation of each BY group.
data divsals(keep=Div DivSal);
set salsort;
by Div;
if First.Div then DivSal=0;
DivSal+Salary;
if Last.Div;
run;
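Putting the three steps together, the Python sketch below models the divsals DATA step on the small grouped example (APTOPS, FINACE, SALES). The function name is hypothetical, it assumes no missing salaries, and it models only the logic: `continue` plays the role of the subsetting IF returning to the top of the DATA step without output.

```python
from typing import List, Tuple


def div_totals(rows: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Hypothetical model of the divsals DATA step.

    rows: (Div, Salary) pairs already sorted by Div.
    Returns one (Div, DivSal) pair per BY group.
    """
    out = []
    divsal = 0
    n = len(rows)
    for i, (div, salary) in enumerate(rows):
        first_div = i == 0 or div != rows[i - 1][0]
        last_div = i == n - 1 or div != rows[i + 1][0]
        if first_div:
            divsal = 0              # 1. if First.Div then DivSal=0;
        divsal += salary            # 2. DivSal+Salary; (retained across rows)
        if not last_div:
            continue                # 3. if Last.Div; false: back to top, no output
        out.append((div, divsal))   # implicit OUTPUT, keep=Div DivSal
    return out


salsort = [("APTOPS", 20000), ("APTOPS", 100000), ("APTOPS", 50000),
           ("FINACE", 25000), ("FINACE", 20000), ("FINACE", 23000),
           ("FINACE", 27000), ("SALES", 10000), ("SALES", 12000)]
print(div_totals(salsort))
# [('APTOPS', 170000), ('FINACE', 95000), ('SALES', 22000)]
```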
Subsetting IF Statement (Review)
1. Initialize PDV.
2. Execute program statements.
3. Evaluate the subsetting IF (IF condition;). Is the condition true?
   - NO: return to the top of the DATA step without writing the observation.
   - YES: continue with step 4.
4. Execute additional program statements.
5. Output observation to SAS data set.
Accumulating Totals for Groups

Partial Log

NOTE: There were 39 observations read from the data set WORK.SALSORT.
NOTE: The data set WORK.DIVSALS has 5 observations and 2 variables.
NOTE: DATA statement used:
      real time           0.74 seconds
      cpu time            0.33 seconds
Accumulating Totals for Groups

proc print data=divsals noobs;
run;

PROC PRINT Output

Div       DivSal
APTOPS    410000
FINACE    163000
FLTOPS    318000
HUMRES    181000
SALES     373000

c03s2d1.sas
Input Data

The SAS data set prog2.regsals contains each employee's ID number (EmpID), salary (Salary), region (Region), and division (Div). There is one observation for each employee.

EmpID     Salary   Region   Div
E00004     42000   E        HUMRES
E00009     34000   W        FINACE
E00011     27000   W        FLTOPS
E00036     20000   W        FINACE
E00037     19000   E        FINACE
E00077     27000   C        APTOPS
E00097     20000   E        APTOPS
E00107     31000   E        FINACE
E00123     20000   NC       APTOPS
E00155     27000   W        APTOPS
E00171     44000   W        SALES
E00188     37000   W        HUMRES
E00196     43000   C        APTOPS
E00210     31000   E        APTOPS
E00222    250000   NC       SALES
E00236     41000   W        APTOPS
Desired Output

• Human resources wants a new data set that shows the total salary paid and the total number of employees for each division in each region.
• Partial Output

Region   Div       DivSal   NumEmps
C        APTOPS     70000      2
E        APTOPS     83000      3
E        FINACE    109000      4
E        FLTOPS    122000      3
E        HUMRES    178000      5
NC       APTOPS     37000      2
NC       FLTOPS     28000      1
Sorting by Region and Div

• The data must be sorted by Region and Div. Region is the primary sort variable. Div is the secondary sort variable.

proc sort data=prog2.regsals out=regsort;
   by Region Div;
run;
Sorting by Region and Div

proc print data=regsort noobs;
run;

Partial PROC PRINT Output

Region   Div       Salary
C        APTOPS     27000
C        APTOPS     43000
E        APTOPS     20000
E        APTOPS     31000
E        APTOPS     32000
E        FINACE     19000
E        FINACE     31000
Multiple BY Variables

data regdivsals;
   set regsort;
   by Region Div;
   /* additional SAS statements */
run;
Multiple BY Variables: Example

SAS looks ahead to the next observation to set the Last. variables.

Region   Div       First.Region   Last.Region   First.Div   Last.Div
C        APTOPS         1              0            1          0
C        APTOPS         0              0            0          0
C        APTOPS         0              1            0          1
E        APTOPS         1              0            1          1
E        FINACE         0              0            1          0
E        FINACE         0              1            0          1
NC       FINACE         1              0            1          1
NC       SALES          0              0            1          0
NC       SALES          0              0            0          0
NC       SALES          0              0            0          0
NC       SALES          0              1            0          1
Multiple BY Variables
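When you use more than one variable in the BY statement, a change in the primary variable forces Last.BY-variable=1 for the secondary variable on the last observation before the change, and First.BY-variable=1 on the first observation after it, even when the value of the secondary variable stays the same. Below, Div is APTOPS on both sides of the change from Region C to Region E.

Region   Div       First.Region   Last.Region   First.Div   Last.Div
C        APTOPS         1              0            1          0
C        APTOPS         0              1            0          1
E        APTOPS         1              0            1          0
E        APTOPS         0              0            0          0
E        APTOPS         0              0            0          1
E        FINACE         0              0            1          0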
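A Python sketch of this rule (hypothetical function name; it models the flag definitions, not SAS internals) treats a Div group as a run of identical (Region, Div) pairs, so a Region change always starts or ends a Div group. The input is the Region/Div listing from the Multiple BY Variables example; note the E FINACE to NC FINACE transition, where Div does not change but Last.Div and First.Div are still 1.

```python
from typing import List, Tuple


def by_flags(rows: List[Tuple[str, str]]) -> List[Tuple[int, int, int, int]]:
    """Return (First.Region, Last.Region, First.Div, Last.Div) for each row.

    rows: (Region, Div) pairs sorted by Region, then Div.
    """
    flags = []
    n = len(rows)
    for i, (region, div) in enumerate(rows):
        prev = rows[i - 1] if i > 0 else None
        nxt = rows[i + 1] if i < n - 1 else None
        first_region = int(prev is None or prev[0] != region)
        last_region = int(nxt is None or nxt[0] != region)
        # The Div group is defined by the whole key (Region, Div), so a change
        # in Region also starts or ends a Div group.
        first_div = int(prev is None or prev != (region, div))
        last_div = int(nxt is None or nxt != (region, div))
        flags.append((first_region, last_region, first_div, last_div))
    return flags


rows = ([("C", "APTOPS")] * 3 + [("E", "APTOPS")] + [("E", "FINACE")] * 2
        + [("NC", "FINACE")] + [("NC", "SALES")] * 4)
for (region, div), f in zip(rows, by_flags(rows)):
    print(region, div, *f)
```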
Multiple BY Variables
/*Summarize salaries by division*/
data regdivsals(keep=Region Div
DivSal NumEmps);
set regsort;
by Region Div;
if First.Div then do;
DivSal=0;
NumEmps=0;
end;
DivSal+Salary;
NumEmps+1;
if Last.Div;
run;
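To check the logic by hand, the Python sketch below (hypothetical function name, assuming no missing salaries) applies the same steps to the first five rows of the partial regsort listing, which form two complete groups (C/APTOPS and E/APTOPS). The E/FINACE rows are left out because the listing shows only part of that group. The results match the first two rows of the desired output.

```python
from typing import List, Tuple


def region_div_totals(
        rows: List[Tuple[str, str, int]]) -> List[Tuple[str, str, int, int]]:
    """Hypothetical model of the regdivsals DATA step.

    rows: (Region, Div, Salary) sorted by Region, then Div.
    Returns (Region, Div, DivSal, NumEmps) for the last row of each group.
    """
    out = []
    divsal = numemps = 0
    n = len(rows)
    for i, (region, div, salary) in enumerate(rows):
        key = (region, div)
        if i == 0 or rows[i - 1][:2] != key:       # if First.Div then do; ... end;
            divsal = numemps = 0
        divsal += salary                           # DivSal+Salary;
        numemps += 1                               # NumEmps+1;
        if i == n - 1 or rows[i + 1][:2] != key:   # if Last.Div;
            out.append((region, div, divsal, numemps))
    return out


regsort = [("C", "APTOPS", 27000), ("C", "APTOPS", 43000),
           ("E", "APTOPS", 20000), ("E", "APTOPS", 31000),
           ("E", "APTOPS", 32000)]
print(region_div_totals(regsort))
# [('C', 'APTOPS', 70000, 2), ('E', 'APTOPS', 83000, 3)]
```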
Multiple BY Variables
Partial Log

NOTE: There were 39 observations read from the data set WORK.REGSORT.
NOTE: The data set WORK.REGDIVSALS has 14 observations and 4 variables.
NOTE: DATA statement used:
      real time           0.07 seconds
      cpu time            0.07 seconds
Multiple BY Variables

proc print data=regdivsals noobs;
run;

Partial PROC PRINT Output

Region   Div       DivSal
C        APTOPS     70000
E        APTOPS     83000
E        FINACE    109000
E        FLTOPS    122000

c03s2d2.sas
Questions
