
Transforming SAS Data Sets

The document discusses how to create new SAS data sets, transform existing variables, add new variables, and subset observations and variables using SAS data steps. It also covers the basic structure and components of a SAS data step.

The document explains that SAS data sets can be modified within a DATA step by selecting subsets of observations, transforming variables, and creating new variables. It discusses using statements like DATA, SET, OUTPUT, and RETURN to perform these tasks.

The document outlines that a basic SAS data step has five components - the DATA statement to start and name the output data set, the SET statement to read the input data set, programming statements to perform processing, the OUTPUT statement to write observations, and the RETURN statement to end processing of each observation.

5. TRANSFORMING SAS DATA SETS

a. Creating new SAS data sets
b. Creating and transforming variables
c. Subsetting observations
d. Subsetting variables

Reading Assignment:

Selected SAS Documentation for Bios111 Part 3: Transforming SAS Data Sets

REVISED FALL 2000


Creating New SAS Data Sets


It will often be desirable to modify an existing SAS data set in some way: selecting only a subset of the original observations, transforming variables, creating new variables, etc. These kinds of modifications are accomplished within a DATA step.

• A DATA step:
    • Reads one or more input data sets (SAS and/or non-SAS)
    • Performs processing (transformations, selections, etc.), if specified
    • Creates one or more output data sets (SAS or non-SAS)
• In this chapter we will only discuss reading a single input SAS data set and creating a single output SAS data set. The other possibilities will be covered in subsequent chapters.
• All of the modification statements we will discuss can be used with any combination of input and output sources.

Structure of A DATA Step


A DATA step that creates a single output SAS data set by modifying a single input SAS data set has a five-part structure:

1. A DATA statement to start the step and name the output data set
2. A SET statement to read an observation from the input data set
3. Programming statements to perform the processing required for this observation
4. An OUTPUT statement to write the observation to the output data set
5. A RETURN statement to end processing of this observation and return to the top of the step

The DATA Statement


The DATA statement has two functions:
• It defines the start of a DATA step
• It names the SAS data sets to be created

Syntax:
    DATA Libref.Dataset;

where
    Dataset   is the name of the SAS data set to be created
    Libref    is the libref for a SAS data library in which the data set will be stored

The SET Statement


• The SET statement reads an observation from an input SAS data set each time it is executed.
• All variables in the input SAS data set are automatically passed to the new SAS data set (unless otherwise directed with programming statements).
• All observations in the input SAS data set are automatically passed to the new SAS data set (unless otherwise directed with programming statements).
• New variables may be added with assignment statements.
• Note that reading a data set does not modify it in any way.

Syntax:
    SET Libref.Dataset;

where
    Dataset   is the name of an existing SAS data set to be read
    Libref    is the libref for a SAS data library in which the data set is stored

The OUTPUT Statement


• The OUTPUT statement controls when the values in the program data vector (PDV) are written to the output SAS data set.
• The OUTPUT statement is optional.
• When an OUTPUT statement appears in the DATA step, there is no automatic output at the end of the step.
• When no OUTPUT statement appears in the DATA step, SAS outputs the values of the PDV at the end of the step.
• When an OUTPUT statement is executed, SAS immediately outputs the current PDV values to a SAS data set.
• Execution of the OUTPUT statement does not return control to the beginning of the DATA step.

Syntax:
    OUTPUT;
or
    OUTPUT SASdataset(s);
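The difference between explicit and automatic output can be sketched as follows. This is a minimal illustration, not part of the original notes: the data set WORK.DEMO and its values are hypothetical, and it is read with an INPUT/DATALINES step only so that the example is self-contained.

```sas
/* Hypothetical input data; not part of the course library. */
data work.demo;
   input name $ age;
   datalines;
ANN 37
BOB 31
CAT .
DAN 14
;
run;

/* Explicit OUTPUT: because an OUTPUT statement appears in the step,
   there is no automatic output, so only AGE > 30 is written. */
data work.over30;
   set work.demo;
   if age > 30 then output;
run;

/* No OUTPUT statement: every observation is written automatically
   at the end of each pass through the step. */
data work.copy;
   set work.demo;
run;
```

Under these assumptions WORK.OVER30 should hold two observations (ANN and BOB) and WORK.COPY all four.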

The RETURN Statement


• The RETURN statement is usually the last statement in the DATA step. It indicates that processing of the current observation is finished. SAS then returns to the DATA statement at the beginning of the step and processes the next observation.
• The RETURN statement is optional. If the RETURN statement is omitted, execution returns to the top of the DATA step when a RUN or a PROC statement is encountered.

Syntax:
    RETURN;

Processing of a DATA Step


The processing of every DATA step involves two distinct phases.
• First, SAS compiles the statements within the step, creating a program to perform the processing requested.
• Second, the program created is executed, processing the data and creating the new data set.

An example DATA step:

    DATA WORK.MYCLASS;
       SET CLASSLIB.CLASS;
       OUTPUT;
       RETURN;
    RUN;

The Compilation Phase


During the compilation phase, the DATA step compiler:
• Reads the descriptor portion of the existing SAS data set named in the SET statement
• Creates the descriptor part of the output data set
• Creates the program data vector (PDV), which will contain all of the variables found in the existing SAS data set plus any new variables created with assignment statements
• Creates a machine language program to perform the processing
• Detects syntax errors

The Execution Phase


During the execution phase:
• The SET statement is executed once for each observation in the existing SAS data set.
• Each time the SET statement is executed, it reads an observation from the existing SAS data set and writes the current observation to the PDV.
• Any program statements in the DATA step are executed once for each observation in the input data set.
• The values in the PDV are written to the new SAS data set after the last executable statement in the DATA step or when an OUTPUT statement is executed.

Flowchart of Execution:

    Step                                               Statement
    1. Initialize PDV to missing                       DATA WORK.MYCLASS;
    2. End of input?
         Yes: finish the output data set and go to the next step
         No:  continue with 3
    3. Read next observation into PDV                  SET CLASSLIB.CLASS;
    4. Modify data values in PDV
    5. Write values from PDV to output data set        OUTPUT;
    6. Return to 1                                     RETURN;

SAS data set CLASSLIB.CLASS

    NAME           SEX      AGE     HT      WT
    CHAR 12        CHAR 1   NUM 5   NUM 8   NUM 8
    ------------   ------   -----   -----   -----
    CHRISTIANSEN   M        37      71      195
    HOSKING J      M        31      70      160
    HELMS R        M        41      74      195
    PIGGY M        F         .      48        .
    FROG K         M         3      12        1
    GONZO                   14      25       45

Program data vector

    NAME | SEX | AGE | HT | WT

SAS data set WORK.MYCLASS

    NAME           SEX      AGE     HT      WT
    CHAR 12        CHAR 1   NUM 3   NUM 8   NUM 8

Summary: Creating New SAS Data Sets

The four statements just described (DATA, SET, OUTPUT, RETURN) are used whenever we want to create a new SAS data set from an existing one. Other statements are added to the step in order to make the output data set a modified version of the input data set, rather than an exact copy.

In this chapter, we only discuss creating SAS data sets from other, already existing SAS data sets. Creating a SAS data set from a non-SAS data set (e.g., an ASCII or dBase file) is a more complex task, which will be covered in detail later in the course.

Creating a new data set does not delete or modify the input data set; it is still available for use in subsequent steps.

Creating and Transforming Variables


In many cases, the reason for creating a new SAS data set will be to create new variables that are some combination of existing variables, or to transform an existing variable. For example, we might want to add a new variable to the class data set called RELWT (for relative weight) whose value for each observation is defined by the algebraic formula RELWT = WT/HT; that is, the person's weight divided by their height.

An example of transforming an existing variable would be recoding the values of height from English units (inches) to metric units (centimeters). The formula in this case is HT = 2.54*HT; that is, take each person's current value of height, multiply it by 2.54, and use that result to replace the original value.

These kinds of operations are performed in a DATA step using assignment statements.

The Assignment Statement


• The assignment statement is used to create new variables or to transform existing variables.
• Syntax:
      variable = expression;
  where
      variable     is the name of a variable in (or to be added to) the data set
      expression   is an arithmetic expression, as defined below
• Examples:
      RELWT = WT/HT;
      HT = 2.54*HT;
• Notes:
    • The assignment statement is one (of two) exceptions to the rule that every SAS statement begins with a keyword.
    • If "variable" is the name of an already existing variable, the value of "expression" replaces the previous value; if "variable" is a new name, the assignment statement creates a new variable, which is added to the output data set.
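The two assignment statements above can be tried in a complete step. This is a sketch under stated assumptions: WORK.DEMO is a hypothetical stand-in for the class data set, with HT in inches and WT in pounds.

```sas
/* Hypothetical stand-in for the class data set. */
data work.demo;
   input name $ ht wt;
   datalines;
ANN 71 195
BOB 70 160
CAT 48 .
;
run;

data work.demo2;
   set work.demo;
   relwt = wt / ht;     /* new variable: weight divided by height in inches */
   ht    = 2.54 * ht;   /* existing variable: inches replaced by centimeters */
run;
```

The order of the two statements matters here: RELWT is computed before HT is overwritten, so it still uses the height in inches. The observation with a missing WT gets a missing RELWT, and SAS writes a note about missing values to the log.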

Expressions
• An expression consists of one or more constants, variables, and functions, combined by operators.
• A constant is a number (e.g., 1, -23.6, .001) or a character string (e.g., 'JOHN', 'MALE', 'X#!'); character constants must be enclosed in single quotes (apostrophes). (SAS also allows other, specialized types of constants; we will discuss some of them later in the course.)
• A function is a program "built in" to SAS that performs some computation on character or numeric values.
• An operator is a mathematical, logical, or character operation or manipulation that combines, compares, or transforms numeric or character values.

Arithmetic operators perform basic arithmetic calculations.

    Symbol   Action
    +        addition
    -        subtraction
    *        multiplication
    /        division
    **       exponentiation

Comparison operators look at the relationship between two quantities.

    Symbol   Mnemonic Equivalent   Action
    =        EQ                    equal to
    ^=       NE                    not equal to
    >        GT                    greater than
    <        LT                    less than
    >=       GE                    greater than or equal to
    <=       LE                    less than or equal to
             IN                    equal to one of a list

Logical or Boolean operators are used to link sequences of comparisons.

    Symbol   Mnemonic Equivalent
    &        AND
    |        OR
    ^        NOT

Other operators

    Symbol   Description
    <>       Maximum
    ><       Minimum
    ||       Concatenation

• Examples:

  Assigning constants:
      N=4;                  numeric constant
      SEX='FEMALE';         character constant

  Basic arithmetic operators:
      X2=X;                 copy the value
      SUM=X+Y;              addition
      DIF=X-Y;              subtraction
      TWICE=X*2;            multiplication
      HALF=X/2;             division
      CUBIC=X**3;           exponentiation
      Y=-X;                 change sign

  Comparison operators:
      IF X<Y THEN C=5;
      IF (X<Y);
      IF NAME='PAT';
      IF 5<=AGE<=20;
      IF AGE IN(10,20,30) THEN X=5;
      IF SEX IN('M','F') THEN S=1;

  Logical operators:
      IF A<B AND C>0;
      IF X=2 OR X=4;
      IF NOT(X=2);
      IF NOT(NAME='SMITH');

  Other operators:
      X=A><B;
      NAME=FIRST || LAST;

Complex expressions
Priority of evaluation (highest first):

    ( )
    **
    * /
    + -

Examples:

    A=X+Y+Z;               left to right
    A=X+Y*Z;               operator precedence
    A=X/Y/Z;               left to right
    A=X/(Y/Z);             parenthetical
    Z=ABS(SQRT(X)-2);      parenthetical
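The evaluation order can be checked with concrete numbers. This is a small sketch: the data set name WORK.PREC and the values 8, 4, and 2 are arbitrary choices made for illustration.

```sas
data work.prec;
   x = 8; y = 4; z = 2;
   a1 = x + y + z;               /* left to right: (8 + 4) + 2 = 14   */
   a2 = x + y * z;               /* * before +:    8 + (4 * 2) = 16   */
   a3 = x / y / z;               /* left to right: (8 / 4) / 2 = 1    */
   a4 = x / (y / z);             /* parentheses:   8 / (4 / 2) = 4    */
   a5 = abs(sqrt(x + 1) - 5);    /* innermost first: |3 - 5| = 2      */
   put a1= a2= a3= a4= a5=;      /* writes the results to the log     */
run;
```

When in doubt about precedence, adding parentheses as in A4 makes the intended order explicit and costs nothing.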

Examples of Assignment Statement


    DATA WORK.NEWCLASS;
       SET CLASSLIB.CLASS;
       AGE=AGE*12;
       QUETELET=((WT/2.2) / ((HT*2.54)**2)) * 10000;
       OUTPUT;
       RETURN;
    RUN;

    NOTE: Missing values were generated as a result of performing an
          operation on missing values.
          Each place is given by: (Number of times) at (Line):(Column).
          1 at 16:6   1 at 17:13   2 at 17:23
    NOTE: The data set WORK.NEWCLASS has 6 observations and 6 variables.
    NOTE: The DATA statement used 1.00 seconds.

    PROC PRINT DATA=WORK.NEWCLASS;
       TITLE1 'CREATING VARIABLES WITH ASSIGNMENT STATEMENTS';
    RUN;

    NOTE: The PROCEDURE PRINT used 1.00 seconds.

CREATING VARIABLES WITH ASSIGNMENT STATEMENTS

    OBS   NAME           SEX   AGE   HT   WT    QUETELET
     1    CHRISTIANSEN   M     444   71   195   27.2538
     2    HOSKING J      M     372   70   160   23.0056
     3    HELMS R        M     492   74   195   25.0889
     4    PIGGY M        F       .   48     .     .
     5    FROG K         M      36   12     1    4.8927
     6    GONZO                168   25    45   50.7274

Functions:
• General form of a SAS function:
      variable = function-name(argument1, argument2, . . .);
• Each argument is separated from the others by a comma. Most functions accept arguments that are constants, variables, expressions, or functions.
• Examples:
      S=SQRT(X);
      A=ABS(X);
      B=MAX(2,7);
      C=SUBSTR('INSIDE',3,4);
      D=MIN(X,7,A+B);
• Types of functions:
    • Arithmetic (absolute value, square root, mean, variance, ...)
    • Trigonometric (cosine, sine, arc cosine, ...)
    • Other mathematical and statistical (natural logarithm, exponential, ...)
    • Pseudo-random number generators
    • Character string functions
• Selected functions that compute simple statistics:

      Function   Result
      SUM        sum
      MEAN       mean
      VAR        variance
      MIN        minimum
      MAX        maximum
      STD        standard deviation
      N          number non-missing
      NMISS      number missing

• Simple statistics functions compute statistics for each observation (row) in the SAS data set (functions operate across rows).
• Procedures produce statistics for variables (columns) in the SAS data set (procedures operate down columns).
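The row-versus-column distinction can be illustrated with a short sketch. The data set WORK.SCORES and the variables X1-X3 are hypothetical names introduced only for this example.

```sas
/* Hypothetical data: three scores per person, one of them missing. */
data work.scores;
   input id x1 x2 x3;
   datalines;
1 10 20 30
2 5 . 15
;
run;

/* Functions work across the variables within each observation (row). */
data work.rowstats;
   set work.scores;
   total   = sum(x1, x2, x3);    /* ignores missing arguments           */
   average = mean(x1, x2, x3);   /* mean of the non-missing arguments   */
   nvalid  = n(x1, x2, x3);      /* number of non-missing arguments     */
   plain   = x1 + x2 + x3;       /* missing if any operand is missing   */
run;

/* A procedure works down each variable (column) across observations. */
proc means data=work.scores n mean;
   var x1 x2 x3;
run;
```

For the second observation, TOTAL is 20 and AVERAGE is 10 because the functions skip the missing value, while PLAIN is missing because the + operator propagates it.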


Subsetting Observations
A common type of transformation is subsetting observations: creating a new SAS data set with the same variables as the input data set, but only those observations that satisfy some selection criterion. The subsetting IF statement can be used to accomplish this.

Syntax:
    IF logical expression;

where logical expression is given by one of the following:

    1. expression  {GT | LT | GE | LE | EQ | NE}  expression
    2. logical expression1  {OR | AND}  logical expression2

and "expression" can be any of the forms discussed for assignment statements.

• If the expression is true, execution of the step continues for this observation.
• If the expression is false:
    • SAS stops executing statements for this observation immediately, and
    • returns to the top of the DATA step and begins processing the next observation.
• Examples:
      IF AGE GT 35;
      IF DEPT EQ 'FURS';
• Complex logical expressions can be constructed by combining simple logical expressions with the operators AND and/or OR.
• Examples:
      IF (HT GT 70) AND (WT GT 180);
      IF (DEPT EQ 'FURS') OR (CLERK EQ 'ABLE');
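A complete step built around one of the expressions above might look like the following sketch. WORK.DEMO and the variable RATIO are hypothetical and exist only to make the example self-contained.

```sas
/* Hypothetical input data. */
data work.demo;
   input name $ sex $ ht wt;
   datalines;
ANN F 64 130
BOB M 71 195
CAL M 70 160
DON M 74 195
;
run;

data work.tall_heavy;
   set work.demo;
   if (ht gt 70) and (wt gt 180);   /* false: stop here, go to the next observation */
   ratio = wt / ht;                 /* executed only for observations that passed   */
run;
```

With these values only BOB and DON satisfy both conditions, so WORK.TALL_HEAVY should contain two observations. Statements placed after the subsetting IF never run for the rejected observations.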


Execution of a Data Step With Subsetting IFs


    DATA WORK.MALES;
       SET CLASSLIB.CLASS;
       IF SEX EQ 'M';
       OUTPUT;
       RETURN;
    RUN;

CLASSLIB.CLASS (excerpt)

    NAME    SEX   AGE   HT   WT
    PIGGY   F       .   48    .
    FROG    M       3   12    1
    GONZO          14   25   45

Program data vector

    NAME | SEX | AGE | HT | WT

WORK.MALES (excerpt)

    NAME    SEX   AGE   HT   WT
    FROG    M       3   12    1

Examples of Subsetting Observations


    DATA WORK.FURS;
       SET CLASSLIB.SALES;
       IF DEPT EQ 'FURS';
       OUTPUT;
       RETURN;
    RUN;

    NOTE: The data set WORK.FURS has 6 observations and 6 variables.
    NOTE: The DATA statement used 2.00 seconds.

    PROC PRINT DATA=WORK.FURS;
       TITLE1 'SELECTING OBSERVATIONS USING SUBSETTING IF';
    RUN;

    NOTE: The PROCEDURE PRINT used 1.00 seconds.

SELECTING OBSERVATIONS USING SUBSETTING IF

    OBS   DEPT   CLERK    PRICE    COST     WEEKDAY   DAY
     1    FURS   BURLEY   599.95   180.01   THR         5
     2    FURS   BURLEY   800.00   240.00   MON         9
     3    FURS   AGILE    590.00   182.00   SAT        14
     4    FURS   BURLEY   499.95   200.01   SAT        14
     5    FURS   BURLEY   700.00   210.00   THR        19
     6    FURS   AGILE    700.00   210.00   WED        25

Comparison Operators

    Mnemonic   Symbol
    GT         >
    LT         <
    GE         >=
    LE         <=
    EQ         =
    NE         ^=
    IN

EXAMPLES:

    if age > 35 ;            if age gt 35 ;
    if age < 35 ;            if age lt 35 ;
    if age >= 35 ;           if age ge 35 ;
    if age <= 35 ;           if age le 35 ;
    if age=35 ;              if age = 35 ;           if age eq 35 ;
    if age ne 35 ;           if age ^= 35 ;
    if sex > name ;
    if sex='female' ;        if sex='FEMALE' ;
    if sex=' ' ;             if sex='' ;
    if ht < wt ;
    if ht <= .z ;
    if sex in('MALE','FEMALE') ;
    if age in(30,34) ;
    if 30 <= age <= 40 ;
    if .z < age <= 50 ;
    if 20 < age < 50 ;

LOGICAL BOOLEAN OPERATORS AND EXPRESSIONS

    Symbol   Mnemonic
    &        AND
    |        OR
    ^        NOT

EXAMPLES:

    IF AGE=35 AND HT=40 ;
    IF (AGE=35) & (HT=40) ;
    IF SEX EQ 'FEMALES' AND AGE IN(30,35) ;
    IF AGE>=16 AND AGE<=65 ;
    IF 16<= AGE <=65 ;
    IF HT>WT OR AGE=40 ;
    IF (HT>WT) | (AGE=40) ;
    IF AGE=20 OR AGE=30 OR AGE=40 ;
    IF AGE IN(20,30,40) ;
    IF NOT(SEX='MALE') ;
    IF SEX NE 'MALES' ;

BOOLEAN NUMERIC EXPRESSIONS


In SAS, a numeric value other than 0 or missing is true; a value of 0 or missing is false. Therefore a numeric variable or expression can stand alone as a condition. If its value is a number other than 0 or missing, the condition is true; if its value is 0 or missing, the condition is false. A comparison, in turn, returns 1 when it is true and 0 when it is false, so its result can be assigned to a variable.

    IF AGE ;
    IF (HT > WT) ;
    IF (AGE) & (HT > WT) ;
    NEWVAR=(HT>WT) ;
    NEWVAR=(AGE=40) ;
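A brief sketch of both uses, a comparison stored as a 0/1 variable and a numeric variable standing alone as a condition. WORK.DEMO, TALLER, and IS40 are hypothetical names chosen for the illustration.

```sas
/* Hypothetical data: one missing AGE and one AGE of 0. */
data work.demo;
   input name $ age ht wt;
   datalines;
ANN 37 71 195
BOB . 48 .
CAL 0 12 1
DON 40 65 45
;
run;

data work.flags;
   set work.demo;
   taller = (ht > wt);   /* 1 when the comparison is true, 0 when false */
   is40   = (age = 40);
   if age;               /* keeps only nonzero, nonmissing AGE          */
run;
```

Under these assumptions BOB (missing AGE) and CAL (AGE of 0) are dropped, leaving ANN with TALLER=0 and IS40=0 and DON with TALLER=1 and IS40=1.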


Example

    PROC PRINT DATA=CLASSLIB.CLASS ;
       TITLE 'PRINT OUT CLASS DATA SET' ;
    RUN;

    NOTE: The PROCEDURE PRINT used 1.00 seconds.

    DATA CLASS2 ;
       SET CLASSLIB.CLASS ;
       IF AGE ;
       OUTPUT ;
       RETURN ;
    RUN ;

    NOTE: The data set WORK.CLASS2 has 5 observations and 5 variables.
    NOTE: The DATA statement used 2.00 seconds.

    PROC PRINT DATA=CLASS2 ;
       TITLE 'PRINT OUT CLASS2 DATA SET' ;
    RUN;

    NOTE: The PROCEDURE PRINT used 1.00 seconds.

PRINT OUT CLASS DATA SET

    OBS   NAME           SEX   AGE   HT   WT
     1    CHRISTIANSEN   M      37   71   195
     2    HOSKING J      M      31   70   160
     3    HELMS R        M      41   74   195
     4    PIGGY M        F       .   48     .
     5    FROG K         M       3   12     1
     6    GONZO                 14   25    45

PRINT OUT CLASS2 DATA SET

    OBS   NAME           SEX   AGE   HT   WT
     1    CHRISTIANSEN   M      37   71   195
     2    HOSKING J      M      31   70   160
     3    HELMS R        M      41   74   195
     4    FROG K         M       3   12     1
     5    GONZO                 14   25    45

THE CONCATENATION OPERATOR


• The concatenation operator ( || ) concatenates character values.
• The results of a concatenation are usually stored in a variable using an assignment statement.
• The length of the resulting variable is the sum of the lengths of each variable or constant in the concatenation operation.
• The concatenation operator does not trim trailing or leading blanks.
• The TRIM function can be used to trim trailing blanks from values before concatenating them.
• The LEFT function left-aligns a value, moving its leading blanks to the end, where TRIM can then remove them.
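The effect of blanks on concatenation can be seen in a small sketch. The data set WORK.NAMES and all variable names in it are hypothetical.

```sas
data work.names;
   length first last $ 8;
   first = 'PAT';                 /* stored as PAT plus five trailing blanks    */
   last  = '  SMITH';             /* two leading blanks, one trailing blank     */
   raw   = first || last;         /* padding kept: PAT, seven blanks, SMITH     */
   tidy  = trim(first) || ' ' || trim(left(last));   /* PAT SMITH               */
   rawlen  = length(raw);         /* position of the last nonblank character    */
   tidylen = length(tidy);
   put raw= tidy= rawlen= tidylen=;
run;
```

In this sketch RAWLEN should be 15 and TIDYLEN 9. Note that TRIM and LEFT change the value being concatenated, not the declared length of the result variable, which is still fixed at compile time.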


CONCATENATION EXAMPLE
    data one ;
       set classlib.class ;

       c1 = 'dept' ;
       c2 = 'bios' ;
       c3 = c1 || c2 ;

       length c4 $ 8 ;
       c4 = 'dept' ;
       c5 = c4 || c2 ;

       c6 = c1 || ' of ' || c2 ;

       keep c1-c6 ;
    run;

    NOTE: The data set WORK.ONE has 6 observations and 6 variables.
    NOTE: The DATA statement used 2.00 seconds.

    title 'concatenation example' ;
    proc print ;
    run;

    NOTE: The PROCEDURE PRINT used 1.00 seconds.

    proc contents ;
    run;

    NOTE: The PROCEDURE CONTENTS used 1.00 seconds.

CONCATENATION EXAMPLE

    OBS   C1     C2     C3         C4     C5             C6
     1    dept   bios   deptbios   dept   dept    bios   dept of bios
     2    dept   bios   deptbios   dept   dept    bios   dept of bios
     3    dept   bios   deptbios   dept   dept    bios   dept of bios
     4    dept   bios   deptbios   dept   dept    bios   dept of bios
     5    dept   bios   deptbios   dept   dept    bios   dept of bios
     6    dept   bios   deptbios   dept   dept    bios   dept of bios

CONTENTS PROCEDURE

    Data Set Name: WORK.ONE        Type:
    Observations:  6               Record Len: 52
    Variables:     6

    -----Alphabetic List of Variables and Attributes-----

    #   Variable   Type   Len   Pos   Label
    1   C1         Char     4     4
    2   C2         Char     4     8
    3   C3         Char     8    12
    4   C4         Char     8    20
    5   C5         Char    12    28
    6   C6         Char    12    40

WHERE STATEMENT
• The WHERE statement allows you to select observations from an existing SAS data set that meet a particular condition before the SAS system brings observations into a data set.
• WHERE selection is the first operation the SAS system performs in each execution of a SET, MERGE, or UPDATE operation.
• The WHERE statement is not executable; that is, it can't be used as part of an IF/THEN statement.
• The WHERE statement is not a replacement for the IF statement; the two work differently and can produce different output data sets. A DATA step can use either statement, both, or neither.
• SYNTAX:
      WHERE where_expression ;
  in which where_expression is an arithmetic or logical expression.
• EXAMPLES:
      where age>50 ;
      where sex='FEMALE' and ht=. ;

WHERE EXPRESSIONS

• A WHERE expression is a sequence of operands and operators. You cannot use variables created within the DATA step or variables created in assignment statements.
• A WHERE expression can use the following operators:

      Arithmetic operators:   *  /  +  -
      Comparison operators:   =  ^=  >  <  >=  <=  IN
      Logical operators:      &  |  ^

WHERE vs IF
• The WHERE statement works before observations are brought into the DATA step (that is, into the program data vector).
• The IF statement works on observations that are already in the DATA step.
• The WHERE statement is not executable, but the IF statement is.
• The WHERE statement operates only on observations in SAS data sets, whereas the IF statement can operate either on observations from existing SAS data sets or on observations created with an INPUT statement.
• If a BY statement does not accompany a SET or MERGE statement, the WHERE and IF statements usually produce the same result.
• In almost all cases a WHERE statement is more efficient than an IF statement (observations do not have to be moved into the PDV).
• The WHERE statement, but not the IF statement, can be used in SAS PROCs.
• EXAMPLES:

      DATA ONE ;
         SET TWO ;
         WHERE AGE>35 ;
      RUN;

      DATA ONE ;
         SET TWO ;
         WHERE AGE ;
      RUN;

      PROC PRINT DATA=CLASSLIB.CLASS ;
         WHERE SEX='FEMALE' ;
      RUN;

      PROC MEANS DATA=CLASSLIB.CLASS ;
         WHERE 25 < AGE <= 35 AND SEX='MALE' ;
      RUN;
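One practical consequence of these rules can be sketched as follows: WHERE can only test variables stored in the input data set, while IF can also test a variable created earlier in the same step. WORK.DEMO and MONTHS are hypothetical names used only for this illustration.

```sas
/* Hypothetical input data. */
data work.demo;
   input name $ age;
   datalines;
ANN 37
BOB 31
CAL 41
DON 14
;
run;

/* WHERE: tests AGE before the observation is moved into the PDV. */
data work.where_age;
   set work.demo;
   where age > 35;
run;

/* IF: MONTHS exists only inside this step, so WHERE could not test it. */
data work.if_months;
   set work.demo;
   months = age * 12;
   if months > 420;      /* 420 months = 35 years */
run;
```

Both steps should keep the same two observations here (ANN and CAL); the second simply arrives at them through a computed variable.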


USING A WHERE STATEMENT


    proc print data=classlib.class ;
       title1 'print classlib.sales' ;
       title2 'no WHERE statement' ;
    run;

    NOTE: The PROCEDURE PRINT used 1.00 seconds.

    proc print data=classlib.class ;
       WHERE SEX='F' ;
       title1 'print classlib.sales' ;
       title2 'using a WHERE statement' ;
    run;

    NOTE: The PROCEDURE PRINT used 1.00 seconds.

    proc print data=classlib.class ;
       WHERE AGE < 20 OR SEX=' ' ;
       title1 'print classlib.sales' ;
       title2 'using a WHERE statement' ;
    run;

    NOTE: The PROCEDURE PRINT used 1.00 seconds.

print classlib.sales
no WHERE statement

    OBS   NAME           SEX   AGE   HT   WT
     1    CHRISTIANSEN   M      37   71   195
     2    HOSKING J      M      31   70   160
     3    HELMS R        M      41   74   195
     4    PIGGY M        F       .   48     .
     5    FROG K         M       3   12     1
     6    GONZO                 14   25    45

print classlib.sales
using a WHERE statement

    OBS   NAME      SEX   AGE   HT   WT
     4    PIGGY M   F       .   48    .

print classlib.sales
using a WHERE statement

    OBS   NAME      SEX   AGE   HT   WT
     4    PIGGY M   F       .   48    .
     5    FROG K    M       3   12    1
     6    GONZO            14   25   45

Subsetting Variables
Another type of transformation that can be performed with a DATA step is to create a data set containing a subset of the variables from the input data set. The DROP or KEEP statement can be used to accomplish this.

• Syntax:
      DROP variable list;
  or
      KEEP variable list;
• Notes:
    • Use only one of the two statements in a step:
        • if the DROP statement is used, the variables listed are not included in the output data set
        • if the KEEP statement is used, the variables listed are the only ones included in the output data set
    • The KEEP or DROP statement only defines which values are written from the program data vector to the output data set; all values are available during the execution of the step.
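The last note can be demonstrated with a short sketch: variables named in a DROP statement are still usable in calculations during the step. WORK.DEMO and RELWT are hypothetical names chosen for the example.

```sas
/* Hypothetical input data. */
data work.demo;
   input name $ ht wt;
   datalines;
ANN 71 195
BOB 70 160
;
run;

data work.ratio;
   set work.demo;
   drop ht wt;          /* HT and WT are left out of WORK.RATIO ...     */
   relwt = wt / ht;     /* ... but are still available in the PDV here  */
run;
```

WORK.RATIO should contain only NAME and RELWT, with RELWT computed from the dropped variables.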

Examples of Subsetting Variables


    DATA WORK.NAMELESS;
       SET CLASSLIB.CLASS;
       DROP NAME;
       OUTPUT;
       RETURN;
    RUN;

    NOTE: The data set WORK.NAMELESS has 6 observations and 4 variables.
    NOTE: The DATA statement used 1.00 seconds.

    PROC PRINT DATA=WORK.NAMELESS;
       TITLE1 'SUBSETTING VARIABLES WITH THE DROP STATEMENT';
    RUN;

    NOTE: The PROCEDURE PRINT used 1.00 seconds.

SUBSETTING VARIABLES WITH THE DROP STATEMENT

    OBS   SEX   AGE   HT   WT
     1    M      37   71   195
     2    M      31   70   160
     3    M      41   74   195
     4    F       .   48     .
     5    M       3   12     1
     6           14   25    45

Examples of Subsetting Variables


    DATA WORK.NAMEONLY;
       SET CLASSLIB.CLASS;
       OUTPUT;
       KEEP NAME;
       RETURN;
    RUN;

    NOTE: The data set WORK.NAMEONLY has 6 observations and 1 variable.
    NOTE: The DATA statement used 2.00 seconds.

    PROC PRINT DATA=WORK.NAMEONLY;
       TITLE1 'SUBSETTING VARIABLES WITH THE KEEP STATEMENT';
    RUN;

    NOTE: The PROCEDURE PRINT used 0.00 seconds.

SUBSETTING VARIABLES WITH THE KEEP STATEMENT

    OBS   NAME
     1    CHRISTIANSEN
     2    HOSKING J
     3    HELMS R
     4    PIGGY M
     5    FROG K
     6    GONZO

The LENGTH statement:


The LENGTH statement is used to control the number of bytes allocated for a variable. It may also be used to establish whether the variable is character or numeric. When used as described below, the LENGTH statement can be used with numeric variables that take on only integer values, thus decreasing the disk space required to store SAS data sets.

• SYNTAX:
      LENGTH var-list ($) N ...;
  where
      var-list   is a list of variables of the same type and length
      $          (if present) denotes that the variables in the preceding var-list are character, not numeric
      N          is the number of bytes (length) to be assigned to the variables in the preceding var-list.
                 For character variables, 1 <= N <= 200 (1-32,767 starting with version 7.0).
                 For numeric variables, on the ACS mainframe (MVS), 2 <= N <= 8; on the PC (DOS), 3 <= N <= 8.

• Notes for character variables:
    • The length of a character variable is determined by the first statement in which the compiler sees the variable. When used, the LENGTH statement should precede any assignment or SET statement involving the variable in question.
    • When character variables of different lengths are compared, the shorter value is padded with blanks on the right to match the length of the longer variable (in memory only).
• Notes for numeric variables:
    • The valid length of a numeric variable is 2-8 bytes on the mainframe and 3-8 bytes on the PC.
    • The default length for numeric variables is 8 bytes; you should specify shorter lengths ONLY FOR INTEGERS, being sure to take into account the maximum integer that can be stored in a given number of bytes as specified in the length table on the next page. Nonintegers stored in fewer than 8 bytes will lose precision because they will be truncated.
    • In the PDV, all numbers are stored in 8 bytes.

LENGTH TABLE FOR MVS AND PC ENVIRONMENTS

(Largest Integer by Length for SAS Numeric Variables under MVS and PC)

    Length      Largest Integer Represented Exactly
    in Bytes    MVS                           PC
    2           256                           -
    3           65,536                        8,192
    4           16,777,216                    2,097,152
    5           4,294,967,296                 536,870,912
    6           1,099,511,627,776             137,438,953,472
    7           281,474,976,710,656           35,184,372,088,832
    8           72,057,594,037,927,936        9,007,199,254,740,992

LENGTH STATEMENT
USAGE NOTES:
• The placement of the LENGTH statement in the DATA step determines its effectiveness. If placed before the first reference to a variable, it will store the variable in the indicated number of bytes. If it is placed after the step's first reference to a variable, it will have no effect, nor will SAS produce an error message.
• There is no correspondence between the number of columns used for a numeric variable and the number of bytes specified in the LENGTH statement.
• For numeric variables, lengths of less than eight should only be used for integers.
• It is usually a good idea to specify lengths for all calculated or assigned character variables.

EXAMPLES:

    Length a b c 3 d 4 e $ 8 ;
    Length v1-v5 7 c1-c5 $ 8 ;

    Data one ;
       set two ;
       length size $ 6 ;
       if ht<10 then size='small' ;
       if ht>=10 then size='medium' ;
    run;

    Data new ;
       length v1 5 region $ 3 ;
       set old ;
    run;
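Why the LENGTH statement matters for assigned character variables can be seen by running the SIZE example with and without it. This is a sketch: the data set names WORK.NOLENGTH and WORK.WITHLENGTH and the single HT value are made up for the illustration.

```sas
/* Without LENGTH: the first assignment the compiler sees ('small')
   fixes SIZE at 5 bytes, so 'medium' is stored as 'mediu'. */
data work.nolength;
   ht = 12;
   if ht < 10 then size = 'small';
   else size = 'medium';
run;

/* With LENGTH placed before any other reference to SIZE,
   the full 6-character value fits. */
data work.withlength;
   length size $ 6;
   ht = 12;
   if ht < 10 then size = 'small';
   else size = 'medium';
run;
```

SAS does not issue an error for the truncation in the first step, which is why the problem is easy to miss.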


ASSIGNING CONSTANTS: EXAMPLE


    data one ;
       set classlib.class ;

       ** create c1 as a character constant ** ;
       c1 = 'CSCC' ;

       ** create c2 as a character constant ** ;
       c2 = 'cscc' ;

       ** create c2/c3 as a character constant ** ;
       c3 = c2 ;
       c3 = ' cscc' ;

       ** create n1 as a numeric constant ** ;
       n1 = 100 ;
       n2 = 100.00 ;
       n3 = 100 ;
       n4 = 1e2 ;
    RUN;

    NOTE: The data set WORK.ONE has 6 observations and 12 variables.
    NOTE: The DATA statement used 2.00 seconds.

    TITLE 'ASSIGNING CONSTANTS' ;
    PROC PRINT ;
    RUN;

    NOTE: The PROCEDURE PRINT used 1.00 seconds.

    PROC CONTENTS ;
    RUN;

    NOTE: The PROCEDURE CONTENTS used 2.00 seconds.

ASSIGNING CONSTANTS
    OBS   NAME           SEX   AGE   HT   WT    C1     C2     C3     N1    N2    N3    N4
     1    CHRISTIANSEN   M      37   71   195   CSCC   cscc    csc   100   100   100   100
     2    HOSKING J      M      31   70   160   CSCC   cscc    csc   100   100   100   100
     3    HELMS R        M      41   74   195   CSCC   cscc    csc   100   100   100   100
     4    PIGGY M        F       .   48     .   CSCC   cscc    csc   100   100   100   100
     5    FROG K         M       3   12     1   CSCC   cscc    csc   100   100   100   100
     6    GONZO                 14   25    45   CSCC   cscc    csc   100   100   100   100

CONTENTS PROCEDURE

    Data Set Name: WORK.ONE        Type:
    Observations:  6               Record Len: 85
    Variables:     12
    Label:

    -----Alphabetic List of Variables and Attributes-----

     #   Variable   Type   Len   Pos   Label
     3   AGE        Num      8    17
     6   C1         Char     4    41
     7   C2         Char     4    45
     8   C3         Char     4    49
     4   HT         Num      8    25
     9   N1         Num      8    53
    10   N2         Num      8    61
    11   N3         Num      8    69
    12   N4         Num      8    77
     1   NAME       Char    12     4
     2   SEX        Char     1    16
     5   WT         Num      8    33

SAS Date, Time, and Date-Time Values


• Although dates and times are typically written in a numeric form (12/25/83, 12:15), they are not truly numeric in that we cannot directly perform arithmetic operations on them.
• SAS allows date and time data to be stored as numeric variables.

      TYPE        UNITS     DEFINITION
      Date        Days      Days since January 1, 1960
      Time        Seconds   Number of seconds and hundredths of seconds
      Date-time   Seconds   Seconds since midnight, January 1, 1960

SAS Date Values

      Calendar date    SAS date value
      Jan 1, 1953      -2556
      Jan 1, 1960          0
      Nov 24, 1983      8728

Notes:
• The baseline of January 1, 1960 is arbitrary.
• Any dates from 1582 A.D. to 20,000 A.D. are valid.
• SAS accounts for leap years, century, and fourth-century adjustments.
• Although date and date-time values have implied baseline times, differences in these values are directly interpretable. For example, the number of days from January 1, 1953 to November 24, 1983 is:
      8728 - (-2556) = 11284 days
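The values in the table can be reproduced with a short step. This is a sketch: WORK.DATES and its variable names are hypothetical, and the dates are written as SAS date constants of the form 'ddmmmyyyy'd.

```sas
data work.dates;
   d1953   = '01JAN1953'd;       /* stored as -2556                    */
   d1960   = '01JAN1960'd;       /* the baseline: stored as 0          */
   d1983   = '24NOV1983'd;       /* stored as 8728                     */
   elapsed = d1983 - d1953;      /* plain subtraction gives 11284 days */
   put d1953= d1960= d1983= elapsed=;
   put d1983= date9.;            /* same value shown with a date format */
run;
```

The stored values are ordinary numbers; a format such as DATE9. only changes how they are displayed, not what is stored.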


Date Constants and Functions


• You can use SAS date constants to generate a SAS date from a specific date entered in your SAS code.
• Date constants are special constants that SAS converts into date values.
• The syntax of the date constant is:
      'ddmmmyyyy'd
  where
      dd     is a one or two digit value for day
      mmm    is a three letter abbreviation for month (JAN, FEB, ...)
      yyyy   is a two or four digit value for year (4 is recommended)

  The d at the end of the constant ensures that SAS does not confuse the string with a character constant.
• Examples:
      Date1 = '07OCT1999'd;
      If evdate <= '21JUL1987'd;
      If '01JUL1990'd <= bdate <= '30JUL1990'd ;
• There are several useful functions available for handling SAS dates:

      YEAR(SAS-date)        extracts the year from a SAS date and returns a 4-digit year value.
      MONTH(SAS-date)       extracts the month from a SAS date and returns a number between 1 and 12.
      DAY(SAS-date)         extracts the day from a SAS date and returns a number between 1 and 31.
      TODAY()               extracts the date from the computer system's clock and stores the value as a SAS date. This function does not require any arguments.
      MDY(month,day,year)   creates a SAS date from separate month, day, and year variables. Arguments can be SAS numeric variables or constants. A missing or out of range value creates a missing value.
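The date functions can be exercised together in a short sketch. WORK.PARTS and the variable names are hypothetical; the date is the one used in the Date1 example above.

```sas
data work.parts;
   evdate = '07OCT1999'd;
   yr   = year(evdate);          /* 1999                                 */
   mon  = month(evdate);         /* 10                                   */
   dy   = day(evdate);           /* 7                                    */
   same = mdy(mon, dy, yr);      /* rebuilds the same SAS date value     */
   bad  = mdy(2, 30, 1999);      /* no such date: the result is missing  */
   put evdate= yr= mon= dy= same= bad=;
run;
```

The out-of-range call also makes SAS write a note about an invalid argument to the log, which is a useful signal when building dates from separately stored month, day, and year variables.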


Simple Calculations
Using SAS date variables you can easily find the time elapsed between two dates. Simply subtract the dates to find the number of elapsed days, then, if necessary, divide the number to scale it to months, years, weeks, or any other unit of interest.

    Days   = date2 - date1 ;
    Months = (date2 - date1) / 30.4 ;
    Years  = (date2 - date1) / 365.25 ;

ASSIGNING DATE CONSTANTS: EXAMPLE


    data one ;
       set classlib.class ;

       ** create date1 as a date constant ** ;
       date1 = '01jul1993'd ;

       ** create date2 as a date constant ** ;
       date2 = mdy(07,01,1993) ;

       ** create date3 as a date constant ** ;
       date3 = '01jul1943'd ;

       KEEP NAME DATE1-DATE3 ;
    RUN;

    NOTE: The data set WORK.ONE has 6 observations and 4 variables.
    NOTE: The DATA statement used 2.00 seconds.

    TITLE 'ASSIGNING DATE CONSTANTS' ;
    PROC PRINT ;
    RUN;

    NOTE: The PROCEDURE PRINT used 1.00 seconds.

    PROC CONTENTS ;
    RUN;

    NOTE: The PROCEDURE CONTENTS used 1.00 seconds.

ASSIGNING DATE CONSTANTS


    OBS   NAME           DATE1   DATE2   DATE3
     1    CHRISTIANSEN   12235   12235   -6028
     2    HOSKING J      12235   12235   -6028
     3    HELMS R        12235   12235   -6028
     4    PIGGY M        12235   12235   -6028
     5    FROG K         12235   12235   -6028
     6    GONZO          12235   12235   -6028

CONTENTS PROCEDURE

    Data Set Name: WORK.ONE        Type:
    Observations:  6               Record Len: 40
    Variables:     4
    Label:

    -----Alphabetic List of Variables and Attributes-----

    #   Variable   Type   Len   Pos   Label
    2   DATE1      Num      8    16
    3   DATE2      Num      8    24
    4   DATE3      Num      8    32
    1   NAME       Char    12     4

ARITHMETIC OPERATIONS: EXAMPLE


192  data one ;
193    set classlib.class ;
194
195    n1 = min(ht,wt) ;
196    n2 = min(age,35) ;
197    n3 = min(ht,sex) ;
198    n4 = mean(sex,name) ;
199  RUN;

NOTE: Invalid numeric data, SEX=M , at line 197 column 15.
NOTE: Invalid numeric data, SEX=M , at line 198 column 17.
NOTE: Invalid numeric data, NAME=CHRISTIANSEN , at line 198 column 17.
NAME=CHRISTIANSEN SEX=M AGE=37 HT=71 WT=195 N1=71 N2=35 N3=71 N4=. _ERROR_=1 _N_=1
NOTE: Invalid numeric data, SEX=M , at line 197 column 15.
NOTE: Invalid numeric data, SEX=M , at line 198 column 17.
NOTE: Invalid numeric data, NAME=HOSKING J , at line 198 column 17.
NAME=HOSKING J SEX=M AGE=31 HT=70 WT=160 N1=70 N2=31 N3=70 N4=. _ERROR_=1 _N_=2
NOTE: Invalid numeric data, SEX=M , at line 197 column 15.
NOTE: Invalid numeric data, SEX=M , at line 198 column 17.
NOTE: Invalid numeric data, NAME=HELMS R , at line 198 column 17.
NAME=HELMS R SEX=M AGE=41 HT=74 WT=195 N1=74 N2=35 N3=74 N4=. _ERROR_=1 _N_=3
NOTE: Invalid numeric data, SEX=F , at line 197 column 15.
NOTE: Invalid numeric data, SEX=F , at line 198 column 17.
NOTE: Invalid numeric data, NAME=PIGGY M , at line 198 column 17.
NAME=PIGGY M SEX=F AGE=. HT=48 WT=. N1=48 N2=35 N3=48 N4=. _ERROR_=1 _N_=4
NOTE: Invalid numeric data, SEX=M , at line 197 column 15.
NOTE: Invalid numeric data, SEX=M , at line 198 column 17.
NOTE: Invalid numeric data, NAME=FROG K , at line 198 column 17.
NAME=FROG K SEX=M AGE=3 HT=12 WT=1 N1=1 N2=3 N3=12 N4=. _ERROR_=1 _N_=5
NOTE: Invalid numeric data, NAME=GONZO , at line 198 column 17.
NAME=GONZO SEX= AGE=14 HT=25 WT=45 N1=25 N2=14 N3=25 N4=. _ERROR_=1 _N_=6
NOTE: Character values have been converted to numeric values at the places given by:
      (Number of times) at (Line):(Column).
      6 at 197:15   12 at 198:17
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      6 at 198:17
NOTE: The data set WORK.ONE has 6 observations and 9 variables.
NOTE: The DATA statement used 3.00 seconds.

202
203  TITLE ARITHMETIC OPERATIONS ;
204  PROC PRINT LABEL ;
205  RUN;

NOTE: The PROCEDURE PRINT used 1.00 seconds.

ARITHMETIC OPERATIONS

OBS  NAME          SEX  AGE  HT   WT  N1  N2  N3  N4
 1   CHRISTIANSEN   M    37  71  195  71  35  71   .
 2   HOSKING J      M    31  70  160  70  31  70   .
 3   HELMS R        M    41  74  195  74  35  74   .
 4   PIGGY M        F     .  48    .  48  35  48   .
 5   FROG K         M     3  12    1   1   3  12   .
 6   GONZO               14  25   45  25  14  25   .
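The output above shows that MIN and MEAN skip missing arguments: for PIGGY M, N1 is 48 even though WT is missing, and N3 always equals HT because SEX cannot be converted to a number. The sketch below contrasts that behavior with the + operator, which returns a missing value whenever an operand is missing. It is only an illustration: the data set DEMO, its three rows, and the variables S1 and NM are hypothetical stand-ins, not part of classlib.class.

```sas
* Hypothetical three-row stand-in for classlib.class ;
data demo ;
   input name $ sex $ age ht wt ;
   datalines ;
ALPHA M 37 71 195
BETA F . 48 .
GAMMA M 3 12 1
;
run ;

data two ;
   set demo ;
   n1 = min(ht, wt) ;         * function skips the missing WT for BETA ;
   n2 = min(age, 35) ;        * function skips the missing AGE for BETA ;
   s1 = ht + wt ;             * operator returns missing for BETA ;
   nm = nmiss(age, ht, wt) ;  * number of missing numeric arguments ;
run ;

proc print data=two ;
run ;
```

Passing only numeric variables to these functions, as here, avoids the "Invalid numeric data" notes seen in the log above.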


ARITHMETIC OPERATIONS: EXAMPLE


35   data one ;
36     set classlib.class ;
37
38     ** use a character variable with a numeric operand ** ;
39     c1 = sex + 2 ;
40
41     ** numeric var on left , char var on right ** ;
42     length n1 8 ;
43     n1 = sex ;
44
45     ** char var on left , num var on right ** ;
46     length c2 $ 3 ;
47     c2 = AGE ;
48
49   RUN;

NOTE: Invalid numeric data, SEX=M , at line 39 column 9.
NOTE: Invalid numeric data, SEX=M , at line 43 column 9.
NAME=CHRISTIANSEN SEX=M AGE=37 HT=71 WT=195 C1=. N1=. C2=37 _ERROR_=1 _N_=1
NOTE: Invalid numeric data, SEX=M , at line 39 column 9.
NOTE: Invalid numeric data, SEX=M , at line 43 column 9.
NAME=HOSKING J SEX=M AGE=31 HT=70 WT=160 C1=. N1=. C2=31 _ERROR_=1 _N_=2
NOTE: Invalid numeric data, SEX=M , at line 39 column 9.
NOTE: Invalid numeric data, SEX=M , at line 43 column 9.
NAME=HELMS R SEX=M AGE=41 HT=74 WT=195 C1=. N1=. C2=41 _ERROR_=1 _N_=3
NOTE: Invalid numeric data, SEX=F , at line 39 column 9.
NOTE: Invalid numeric data, SEX=F , at line 43 column 9.
NAME=PIGGY M SEX=F AGE=. HT=48 WT=. C1=. N1=. C2=. _ERROR_=1 _N_=4
NOTE: Invalid numeric data, SEX=M , at line 39 column 9.
NOTE: Invalid numeric data, SEX=M , at line 43 column 9.
NAME=FROG K SEX=M AGE=3 HT=12 WT=1 C1=. N1=. C2=3 _ERROR_=1 _N_=5
NOTE: Character values have been converted to numeric values at the places given by:
      (Number of times) at (Line):(Column).
      6 at 39:9   6 at 43:9
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      6 at 39:9
NOTE: Numeric values have been converted to character values at the places given by:
      (Number of times) at (Line):(Column).
      6 at 47:9
NOTE: The data set WORK.ONE has 6 observations and 8 variables.
NOTE: The DATA statement used 3.00 seconds.

50   PROC CONTENTS ;
51   RUN;

NOTE: The PROCEDURE CONTENTS used 1.00 seconds.

52   PROC PRINT ;
53   RUN;

NOTE: The PROCEDURE PRINT used 1.00 seconds.


CONTENTS PROCEDURE
Data Set Name: WORK.ONE      Type:
Observations:  6             Record Len: 60
Variables:     8
Label:

-----Alphabetic List of Variables and Attributes-----

#  Variable  Type  Len  Pos  Label
3  AGE       Num     8   17
6  C1        Num     8   41
8  C2        Char    3   57
4  HT        Num     8   25
7  N1        Num     8   49
1  NAME      Char   12    4
2  SEX       Char    1   16
5  WT        Num     8   33

OBS  NAME          SEX  AGE  HT   WT  C1  N1  C2
 1   CHRISTIANSEN   M    37  71  195   .   .  37
 2   HOSKING J      M    31  70  160   .   .  31
 3   HELMS R        M    41  74  195   .   .  41
 4   PIGGY M        F     .  48    .   .   .   .
 5   FROG K         M     3  12    1   .   .   3
 6   GONZO               14  25   45   .   .  14
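The conversions in this example are done automatically by SAS, which is why the log fills with conversion notes. One possible alternative is to convert explicitly with the PUT function (numeric to character) and the INPUT function (character to numeric). The sketch below is an illustration under assumptions: the data set DEMO, the character variable AGETXT, and the new variables N2 and MALE are hypothetical and do not appear in classlib.class.

```sas
* Hypothetical stand-in data with AGETXT holding digits as text ;
data demo ;
   input name $ sex $ age agetxt $ ;
   datalines ;
ALPHA M 37 37
BETA F . .
GAMMA M 3 3
;
run ;

data three ;
   set demo ;

   * numeric to character with an explicit format ;
   length c2 $ 3 ;
   c2 = put(age, 3.) ;

   * character to numeric with an explicit informat ;
   * the ?? modifier suppresses invalid data notes ;
   n2 = input(agetxt, ?? 8.) ;

   * recode a non-numeric character value with IF-THEN logic ;
   if sex = 'M' then male = 1 ;
   else if sex = 'F' then male = 0 ;
   else male = . ;
run ;

proc print data=three ;
run ;
```

A value such as SEX=M has no numeric equivalent, so it is recoded rather than converted; INPUT suits character values that actually hold digits.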

