SAS Sort Accum Total
Accumulating Totals
Center of Excellence
Data Warehousing
Objectives
The SAS data set prog2.daysales contains daily sales data for a retail store. There is one observation for each day in April showing the date (SaleDate) and the total receipts for that day (SaleAmt).
SaleDate SaleAmt
01APR2001 498.49
02APR2001 946.50
03APR2001 994.97
04APR2001 564.59
05APR2001 783.01
06APR2001 228.82
07APR2001 930.57
08APR2001 211.47
09APR2001 156.23
10APR2001 117.69
11APR2001 374.73
12APR2001 252.73
Creating an Accumulating Variable
Partial Output
SaleDate SaleAmt Mth2Dte
Mth2Dte=Mth2Dte+SaleAmt;
RETAIN variable-name <initial-value> ...;
retain Mth2Dte 0;
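The following is a minimal sketch of why the RETAIN statement matters for the assignment Mth2Dte=Mth2Dte+SaleAmt. The data set work.daysales_demo and the variable NoRetain are hypothetical names introduced only for this illustration; the three observations are copied from the first rows of the daily sales listing.

```sas
/* Hypothetical demo copy of the first three April observations. */
data work.daysales_demo;
   input SaleDate :date9. SaleAmt;
   format SaleDate date9.;
   datalines;
01APR2001 498.49
02APR2001 946.50
03APR2001 994.97
;
run;

/* Without RETAIN: a variable created by an assignment statement is   */
/* reset to missing at the top of every iteration, so missing plus    */
/* SaleAmt stays missing on every observation.                        */
data work.noretain;
   set work.daysales_demo;
   NoRetain=NoRetain+SaleAmt;
run;

/* With RETAIN: Mth2Dte starts at 0 and keeps its value across        */
/* iterations, giving 498.49, 1444.99, 2439.96.                       */
data work.withretain;
   set work.daysales_demo;
   retain Mth2Dte 0;
   Mth2Dte=Mth2Dte+SaleAmt;
run;

proc print data=work.noretain noobs;
run;

proc print data=work.withretain noobs;
run;
```

The SAS log for the first DATA step should also carry a note that missing values were generated, which is a useful hint that an accumulator was not retained.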
Creating an Accumulating Variable
data mnthtot;
   set prog2.daysales;
   retain Mth2Dte 0;
   Mth2Dte=Mth2Dte+SaleAmt;
run;
Compile
Input data set prog2.daysales:
SaleDate SaleAmt
15066 498.49
15067 946.50
15068 994.97
15069 564.59
15070 783.01
...

Execute
PDV after each step (R = retained):
Step                            SaleDate  SaleAmt  Mth2Dte (R)
Initialize PDV                  .         .        0
Iteration 1: SET reads obs 1    15066     498.49   0
  Mth2Dte=0+498.49              15066     498.49   498.49
  Implicit output
Iteration 2: SET reads obs 2    15067     946.50   498.49
  Mth2Dte=498.49+946.50         15067     946.50   1444.99
  Implicit output
Iteration 3: SET reads obs 3    15068     994.97   1444.99
  Mth2Dte=1444.99+994.97        15068     994.97   2439.96
  Implicit output
...
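The execution walk-through can be reproduced in the SAS log with PUTLOG statements. This is only a sketch: work.daysales_demo and work.mnthtot_trace are hypothetical names, and SaleDate is deliberately left unformatted so that the log shows raw SAS date values such as 15066.

```sas
/* Hypothetical demo data; no FORMAT statement, so dates print as numbers. */
data work.daysales_demo;
   input SaleDate :date9. SaleAmt;
   datalines;
01APR2001 498.49
02APR2001 946.50
03APR2001 994.97
;
run;

data work.mnthtot_trace;
   set work.daysales_demo;
   retain Mth2Dte 0;
   /* PDV right after the SET statement has read the observation:   */
   /* Mth2Dte still holds the value retained from the last pass.    */
   putlog 'After SET:     ' _N_= SaleDate= SaleAmt= Mth2Dte=;
   Mth2Dte=Mth2Dte+SaleAmt;
   /* PDV just before the implicit output at the end of the step.   */
   putlog 'Before output: ' _N_= SaleDate= SaleAmt= Mth2Dte=;
run;
```

On the first iteration the "After SET" line should show Mth2Dte=0, and on each later iteration it should show the total carried over from the previous observation.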
data mnthtot;
   set prog2.daysales;
   retain Mth2Dte 0;
   Mth2Dte=Mth2Dte+SaleAmt;
run;
Undesirable Output
SaleDate SaleAmt Mth2Dte
variable + expression;
The Sum Statement
data mnthtot2;
   set prog2.daysales2;
   Mth2Dte+SaleAmt;
run;
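A short sketch of how the sum statement differs from the RETAIN-plus-assignment approach when a value is missing. It assumes an input with one missing SaleAmt; work.daysales_miss, AssignTot, and SumTot are hypothetical names used only here.

```sas
/* Hypothetical demo data with a missing SaleAmt on 02APR2001. */
data work.daysales_miss;
   input SaleDate :date9. SaleAmt;
   format SaleDate date9.;
   datalines;
01APR2001 498.49
02APR2001 .
03APR2001 994.97
;
run;

data work.compare;
   set work.daysales_miss;
   /* Assignment with the + operator: a missing operand makes the   */
   /* result missing, and the missing total is then retained.       */
   retain AssignTot 0;
   AssignTot=AssignTot+SaleAmt;
   /* Sum statement: initializes SumTot to 0, retains it            */
   /* automatically, and ignores a missing SaleAmt.                 */
   SumTot+SaleAmt;
run;

proc print data=work.compare noobs;
run;
```

With this input, AssignTot should be 498.49 and then missing for the remaining observations, while SumTot should be 498.49, 498.49, and 1493.46.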
Accumulating Totals: Missing Values
Div DivSal
APTOPS 410000
FINACE 163000
FLTOPS 318000
HUMRES 181000
SALES 373000
Grouping the Data
You must group the data in the SAS data set before you can perform processing.
Review of the SORT Procedure
PROC SORT DATA=input-SAS-data-set
          <OUT=output-SAS-data-set>;
   BY <DESCENDING> BY-variable ...;
RUN;
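As an illustration of the syntax above, the sketch below sorts a small, hypothetical employee data set (work.emps_demo, with made-up row order) into new data sets, leaving the input untouched because OUT= is specified.

```sas
/* Hypothetical unsorted demo data. */
data work.emps_demo;
   input Div $ Salary;
   datalines;
FINACE 25000
APTOPS 20000
SALES 10000
APTOPS 100000
FINACE 20000
;
run;

/* Ascending order of Div; the sorted copy goes to work.salsort_demo. */
proc sort data=work.emps_demo out=work.salsort_demo;
   by Div;
run;

/* DESCENDING applies only to the variable that follows it:          */
/* Div ascending, Salary descending within each Div.                 */
proc sort data=work.emps_demo out=work.bysal_demo;
   by Div descending Salary;
run;
```

Omitting OUT= would instead replace the input data set with its sorted version.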
The SORT Procedure
BY-Group Processing
General form of a BY statement used with the SET
statement:
DATA output-SAS-data-set;
   SET input-SAS-data-set;
   BY BY-variable ...;
   <additional SAS statements>
RUN;
First.BY-variable
Last.BY-variable
First. and Last. Values
The First. variable has a value of 1 for the first observation in
a BY group; otherwise, it equals 0.
The Last. variable has a value of 1 for the last observation in a
BY group; otherwise, it equals 0.
Use these temporary variables to conditionally process
sorted, grouped, or indexed data.
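Because First. and Last. variables are temporary, they are not written to the output data set. One way to inspect them, sketched here with hypothetical names (work.salsort_demo, FirstDiv, LastDiv), is to copy them into ordinary variables.

```sas
/* Hypothetical demo data, already in Div order. */
data work.salsort_demo;
   input Div $ Salary;
   datalines;
APTOPS 20000
APTOPS 100000
APTOPS 50000
FINACE 25000
FINACE 20000
SALES 10000
;
run;

data work.flags;
   set work.salsort_demo;
   by Div;
   /* Copy the temporary BY-group flags into permanent variables. */
   FirstDiv=First.Div;
   LastDiv=Last.Div;
run;

proc print data=work.flags noobs;
run;
```

In this sketch the single SALES observation should show both flags equal to 1, since it is simultaneously the first and the last observation of its BY group.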
First. / Last. Example
Div Salary
APTOPS 20000
APTOPS 100000
APTOPS 50000
FINACE 25000
FINACE 20000
FINACE 23000
FINACE 27000
SALES 10000
SALES 12000
...
Observation 1 (APTOPS 20000; look ahead: APTOPS): First.Div=1, Last.Div=0
Observation 2 (APTOPS 100000; look ahead: APTOPS): First.Div=0, Last.Div=0
...
IF expression;
1. Execute program statements.
2. Evaluate the condition in the IF statement (IF condition;).
3. NO (the condition is false): stop processing the current observation without writing it to the output data set.
   YES (the condition is true): execute additional program statements, then output the observation to the SAS data set.
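A small sketch of the subsetting IF behavior described above, using hypothetical names (work.daysales_demo, work.bigdays, Flag) and an arbitrary threshold of 900 chosen only for the illustration.

```sas
/* Hypothetical demo copy of the first three April observations. */
data work.daysales_demo;
   input SaleDate :date9. SaleAmt;
   format SaleDate date9.;
   datalines;
01APR2001 498.49
02APR2001 946.50
03APR2001 994.97
;
run;

data work.bigdays;
   set work.daysales_demo;
   /* Subsetting IF: when the condition is false, SAS stops          */
   /* processing this observation and does not output it.            */
   if SaleAmt > 900;
   /* Reached only for observations that passed the condition.       */
   Flag='over 900';
run;

proc print data=work.bigdays noobs;
run;
```

With these three rows, work.bigdays should contain the two observations for 02APR2001 and 03APR2001.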
Accumulating Totals for Groups
Partial Log
NOTE: There were 39 observations read
from the data set WORK.SALSORT.
NOTE: The data set WORK.DIVSALS has 5
observations and 2 variables.
NOTE: DATA statement used:
real time 0.74 seconds
cpu time 0.33 seconds
Accumulating Totals for Groups
Div DivSal
APTOPS 410000
FINACE 163000
FLTOPS 318000
HUMRES 181000
SALES 373000
c03s2d1.sas
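The program behind this log is not reproduced in the text, so the following is only a sketch of one way to accumulate a total per division with a single BY variable. The data set names work.salsort_demo and work.divsals_demo are hypothetical, and the nine observations are the partial rows shown in the First. / Last. example, so the totals differ from the full listing.

```sas
/* Hypothetical demo data, already sorted by Div. */
data work.salsort_demo;
   input Div $ Salary;
   datalines;
APTOPS 20000
APTOPS 100000
APTOPS 50000
FINACE 25000
FINACE 20000
FINACE 23000
FINACE 27000
SALES 10000
SALES 12000
;
run;

data work.divsals_demo(keep=Div DivSal);
   set work.salsort_demo;
   by Div;
   /* Reset the accumulator at the start of each division. */
   if First.Div then DivSal=0;
   /* Sum statement: DivSal is retained automatically. */
   DivSal+Salary;
   /* Output only the last observation of each division. */
   if Last.Div;
run;

proc print data=work.divsals_demo noobs;
run;
```

For these rows the result should be one observation per division: APTOPS 170000, FINACE 95000, and SALES 22000.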
Input Data
Region Div DivSal NumEmps
C APTOPS 70000 2
E APTOPS 83000 3
E FINACE 109000 4
E FLTOPS 122000 3
E HUMRES 178000 5
NC APTOPS 37000 2
NC FLTOPS 28000 1
Sorting by Region and Div
Region Div Salary
C APTOPS 27000
C APTOPS 43000
E APTOPS 20000
E APTOPS 31000
E APTOPS 32000
E FINACE 19000
E FINACE 31000
Multiple BY Variables
data regdivsals;
   set regsort;
   by Region Div;
   <additional SAS statements>
run;
Multiple BY Variables: Example
Region Div
C APTOPS
C APTOPS
C APTOPS
E APTOPS
E FINACE
E FINACE
NC FINACE
NC SALES
NC SALES
NC SALES
NC SALES
...
Observation 1 (C APTOPS; look ahead: C APTOPS): First.Region=1, First.Div=1, Last.Region=0, Last.Div=0
Observation 2 (C APTOPS; look ahead: C APTOPS): First.Region=0, First.Div=0, Last.Region=0, Last.Div=0
Observation 3 (C APTOPS; look ahead: E APTOPS): First.Region=0, First.Div=0, Last.Region=1, Last.Div=1
...
Multiple BY Variables: Example
Region Div First.Region Last.Region First.Div Last.Div
C APTOPS 1 0 1 0
C APTOPS 0 1 0 1
E APTOPS 1 0 1 0
E APTOPS 0 0 0 0
E APTOPS 0 0 0 1
E FINACE 0 0 1 0
Multiple BY Variables
/* Summarize salaries by division within each region */
data regdivsals(keep=Region Div
                DivSal NumEmps);
   set regsort;
   by Region Div;
   if First.Div then do;
      DivSal=0;
      NumEmps=0;
   end;
   DivSal+Salary;
   NumEmps+1;
   if Last.Div;
run;
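To try the program end to end, the sketch below builds a small input from the seven sorted rows shown under "Sorting by Region and Div" and runs the same logic. The names work.emps_demo, work.regsort_demo, and work.regdivsals_demo are hypothetical; only part of the original data is used, so later groups have smaller totals than the full listing.

```sas
/* Hypothetical demo input, deliberately out of order. */
data work.emps_demo;
   input Region $ Div $ Salary;
   datalines;
E FINACE 19000
C APTOPS 27000
E APTOPS 20000
E APTOPS 31000
C APTOPS 43000
E FINACE 31000
E APTOPS 32000
;
run;

/* Group the data by Region, then by Div within Region. */
proc sort data=work.emps_demo out=work.regsort_demo;
   by Region Div;
run;

data work.regdivsals_demo(keep=Region Div DivSal NumEmps);
   set work.regsort_demo;
   by Region Div;
   /* First.Div is also 1 whenever Region changes, so the   */
   /* counters restart for every Region/Div combination.    */
   if First.Div then do;
      DivSal=0;
      NumEmps=0;
   end;
   DivSal+Salary;
   NumEmps+1;
   if Last.Div;
run;

proc print data=work.regdivsals_demo noobs;
run;
```

For these rows the output should be C APTOPS 70000 2, E APTOPS 83000 3, and E FINACE 50000 2.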
Multiple BY Variables
Partial Log
NOTE: There were 39 observations read
from the data set WORK.REGSORT.
NOTE: The data set WORK.REGDIVSALS has
14 observations and 4 variables.
NOTE: DATA statement used:
real time 0.07 seconds
cpu time 0.07 seconds
Multiple BY Variables
C APTOPS 70000
E APTOPS 83000
E FINACE 109000
E FLTOPS 122000
c03s2d2.sas
Questions