Merging Data Seven Different Ways

This document discusses 7 different methods for merging data in SAS:
1. Using the MERGE statement in a DATA step, which is the most common approach.
2. Using PROC SQL to perform the merge.
3. Using two SET statements and the KEY= option to find matches.
4. Creating a format from one dataset and using it to add variables when reading the other dataset.
5. Using hash tables, available since SAS 9.1, to perform the merge.
6. Loading one dataset into an array, then using the array to attach variables when reading the second dataset.
7. Using a MODIFY statement to match one dataset against the other and outputting the result to a new dataset.
Copyright
© Attribution Non-Commercial (BY-NC)

David Franklin, Independent Consultant, New Hampshire, USA

Abstract

Merging data is one of the fundamental tasks carried out when manipulating data to bring it into a form suitable for storage or analysis. Using the MERGE statement inside a DATA step is the most common way to do this in the SAS language, but there are others. This paper looks at seven methods, including the MERGE statement, for one-to-one and one-to-many merges, and introduces the SAS code needed to combine the data. No one method is better than another, but some pointers are given on choosing a method for your data.

Introduction

Merging variables from one dataset into another is one of the basic data manipulation tasks that a SAS programmer has to do. The most common way to merge data is with the MERGE statement in the DATA step, but there are six other ways that can help. First, though, some data:

Dataset: PATDATA
SUBJECT   TRT_CODE
124263    A
124264    A
124265    B
124266    B

Dataset: ADVERSE
SUBJECT   EVENT
124263    HEADACHE
124266    FEVER
124266    NAUSEA
124267    FRACTURE

These datasets are used throughout the poster for each method described. ADVERSE has two records for subject 124266 (a one-to-many match) and one record for subject 124267, who has no record in PATDATA. In the code below, SUBJECT is treated as a numeric variable.

MERGE Statement

1. The most common way to merge data. It gives maximum control, but both datasets must be sorted or indexed by the BY variable before the DATA step.

  DATA alldata0;
    MERGE adverse (in=a) patdata (in=b);
    BY subject;
    IF a;    /* keep every ADVERSE record, matched or not */
  RUN;

SQL

2. Another common approach, but it can be resource-hungry when large datasets are involved. The data do not need to be sorted beforehand.

  PROC SQL;
    CREATE TABLE alldata0 AS
      SELECT a.*, b.trt_code
      FROM adverse a LEFT JOIN patdata b
        ON a.subject=b.subject;
  QUIT;

KEY= Option

3. A method that uses two SET statements: each record is read from ADVERSE, and the matching record is then looked up in PATDATA. KEY= requires an index on SUBJECT in PATDATA. The /UNIQUE option starts each search at the top of the index, which is needed because a subject can appear more than once in ADVERSE. When no match is found, _IORC_ is non-zero, so _ERROR_ is reset and TRT_CODE is blanked to stop a value from an earlier match being carried forward.

  PROC DATASETS LIBRARY=work NOLIST;
    MODIFY patdata;
    INDEX CREATE subject;    /* KEY= needs an index on SUBJECT */
  QUIT;

  DATA alldata0;
    SET adverse;
    SET patdata KEY=subject /UNIQUE;
    IF _IORC_ THEN DO;       /* no match found in PATDATA */
      _ERROR_=0;
      trt_code='';
    END;
  RUN;

FORMAT Procedure

4. Creates a format from PATDATA, then adds TRT_CODE to ADVERSE using that format. Because SUBJECT is numeric, the format is built as a numeric format (TYPE 'N'). An OTHER range (HLO='O') returns a blank for subjects that are not in PATDATA.

  DATA fmt;
    RETAIN fmtname 'TRT_FMT' type 'N';
    SET patdata END=eof;
    RENAME subject=start trt_code=label;
    OUTPUT;
    IF eof THEN DO;          /* catch-all range for unmatched subjects */
      hlo='O';
      trt_code=' ';
      OUTPUT;
    END;
  RUN;

  PROC FORMAT CNTLIN=fmt;
  RUN;

  DATA alldata0;
    SET adverse;
    ATTRIB trt_code LENGTH=$1 LABEL='Treatment Code';
    trt_code=PUT(subject,trt_fmt.);
  RUN;

Hash Tables

5. Since SAS version 9.1, hash tables have been available for merging data. On the first iteration of the DATA step, PATDATA is loaded into the hash table _H1, keyed on SUBJECT. The FIND method then looks up each ADVERSE record, and a non-zero return code means no match was found. Neither dataset needs to be sorted.

  DATA alldata0;
    IF _n_=0 THEN SET patdata;   /* never executes; defines SUBJECT and TRT_CODE at compile time */
    IF _n_=1 THEN DO;
      DECLARE HASH _h1 (dataset: "PATDATA");
      rc=_h1.definekey("SUBJECT");
      rc=_h1.definedata("TRT_CODE");
      rc=_h1.definedone();
      CALL MISSING(subject,trt_code);
    END;
    SET adverse;
    rc=_h1.find();                /* rc=0 when SUBJECT is found */
    IF rc^=0 THEN trt_code=" ";
    DROP rc;
  RUN;

Loading Unique Dataset into an Array

6. PATDATA, with one record per subject, is loaded into a temporary array first. The array is then used to attach TRT_CODE to ADVERSE as ADVERSE is read in. Before that, a DATA _NULL_ step stores the number of observations in each dataset in the macro variables XPATDATA and XADVERSE, which size the array and the loops.

  DATA _null_;
    SET sashelp.vtable;
    WHERE libname='WORK';
    WHERE ALSO memname IN ('PATDATA','ADVERSE');
    CALL SYMPUTX(CATS('X',memname), nobs);   /* creates XPATDATA and XADVERSE */
  RUN;

  DATA alldata0;
    LENGTH trt_code $1;
    ARRAY f{&xpatdata.,2} $6 _TEMPORARY_;
    /* load PATDATA: column 1 = SUBJECT, column 2 = TRT_CODE */
    DO i=1 TO &xpatdata.;
      SET patdata (RENAME=(trt_code=trt_code_dict));
      f{i,1}=PUT(subject,6.);
      f{i,2}=trt_code_dict;
    END;
    /* read ADVERSE and look up each SUBJECT in the array */
    DO i=1 TO &xadverse.;
      SET adverse;
      trt_code='';
      DO j=1 TO &xpatdata.;
        IF subject=INPUT(f{j,1},best.) THEN DO;
          trt_code=f{j,2};
          OUTPUT;
        END;
        IF ^MISSING(trt_code) THEN LEAVE;
      END;
      IF MISSING(trt_code) THEN OUTPUT;   /* no match: output with a blank TRT_CODE */
    END;
    DROP i j trt_code_dict;
  RUN;

MODIFY Statement

7. Uses PATDATA, with its unique records, to locate the matching records in ADVERSE through a MODIFY statement, then writes them out to a new dataset, ADVERSE2. The MODIFY statement cannot add TRT_CODE to the ADVERSE dataset itself, but the new dataset ADVERSE2 contains it. MODIFY with KEY= needs an index on SUBJECT in ADVERSE. Unlike the other methods, only ADVERSE records that have a match in PATDATA are written out.

  PROC DATASETS LIBRARY=work NOLIST;
    MODIFY adverse;
    INDEX CREATE subject;    /* MODIFY ... KEY= needs an index on SUBJECT */
  QUIT;

  DATA adverse adverse2;
    DO p=1 TO totobs;
      _iorc_=0;
      SET patdata POINT=p NOBS=totobs;
      DO WHILE(_iorc_=%sysrc(_sok));
        MODIFY adverse KEY=subject;
        SELECT (_iorc_);
          WHEN (%sysrc(_sok)) DO;              /* match found */
            SET patdata POINT=p;
            OUTPUT adverse2;
          END;
          WHEN (%sysrc(_dsenom)) _error_=0;    /* no (more) matches */
          OTHERWISE DO;                        /* a major problem somewhere */
            PUT 'error: _iorc_ = ' _iorc_ / 'program halted.';
            _error_=0;
            STOP;
          END;
        END;
      END;
    END;
    STOP;
  RUN;

Conclusion

There are a number of methods that can be used to merge data beyond the MERGE statement in a DATA step. No one method is better than another, and the methods shown here are by no means exhaustive. Only by trying these methods at your site will you see the differences in resource efficiency between them.

Contact Information
Your comments and questions are valued and encouraged.
David Franklin
16 Roberts Road, Litchfield, NH 03052
Tel/Fax: 603-262-9160 Email: 100316.3451@compuserve.com
Web: http://ourworld.compuserve.com/homepages/dfranklinuk
