This document discusses 7 different methods for merging data in SAS:
1. Using the MERGE statement in a DATA step, which is the most common approach.
2. Using PROC SQL to perform the merge.
3. Using two SET statements and the KEY= option to loop through and find matches.
4. Creating a format from one dataset and using it to add variables when reading the other dataset.
5. Using hash tables (available since SAS 9.1) to perform the merge.
6. Loading one dataset into an array, then using the array to attach variables when reading the second dataset.
7. Using a MODIFY statement to update one dataset with variables from the other and outputting to a new dataset.
David Franklin, Independent Consultant, New Hampshire, USA
Abstract

Merging data is one of the fundamental functions carried out when manipulating data to bring it into a form for either storage or analysis. The MERGE statement inside a DATA step is the most common way this task is done within the SAS language, but there are others. This paper looks at seven possible methods, including the MERGE statement, for a one-to-one or one-to-many merge, introducing the SAS code needed to combine the data. No one method is better than another, but some pointers are given on choosing a method for your data.

Introduction

Merging variables from one dataset into another is one of the basic data manipulation tasks that a SAS programmer has to do. The most common way to merge data is the MERGE statement in the DATA step, but there are six other ways that can help. First though, some data:

Dataset: PATDATA

    SUBJECT   TRT_CODE
    124263    A
    124264    A
    124265    B
    124266    B

Dataset: ADVERSE

    SUBJECT   EVENT
    124263    HEADACHE
    124266    FEVER
    124266    NAUSEA
    124267    FRACTURE

This data will be used throughout the poster for each method described.

1. MERGE Statement

The most common code used to merge data. It gives maximum control, but the data must be sorted or indexed before the DATA step.

    DATA alldata0;
      MERGE adverse (in=a) patdata (in=b);
      BY subject;
      IF a;
    RUN;

2. SQL

Another common way, but it can be resource hungry if large datasets are involved.

    PROC SQL;
      CREATE TABLE alldata0 AS
        SELECT a.*, b.trt_code
        FROM adverse a
        LEFT JOIN patdata b
        ON a.subject=b.subject;
    QUIT;

3. KEY= Option

A method that uses two SET statements, reading each record from ADVERSE and then looking up the matching record in PATDATA.

    /* PATDATA must be indexed on SUBJECT for KEY= to work */
    DATA alldata0;
      SET adverse;
      SET patdata KEY=subject /UNIQUE;
      IF _IORC_ THEN DO;   /* No match found */
        _ERROR_=0;
        trt_code='';
      END;
    RUN;

4. FORMAT Procedure

Creates a format from PATDATA and then adds TRT_CODE to ADVERSE using the format.

    /* TYPE 'C' assumes SUBJECT is character; for a numeric SUBJECT */
    /* use TYPE 'N' and PUT(subject,trt_fmt.) instead               */
    DATA fmt;
      RETAIN fmtname 'TRT_FMT' type 'C';
      SET patdata;
      RENAME subject=start trt_code=label;
    RUN;
    PROC FORMAT CNTLIN=fmt;
    RUN;
    DATA alldata0;
      SET adverse;
      ATTRIB trt_code LENGTH=$1 LABEL='Treatment Code';
      /* Subjects not in the format return the start of SUBJECT itself */
      trt_code=PUT(subject,$trt_fmt.);
    RUN;

5. Hash Tables

Since SAS version 9.1, hash tables have been available to merge data.

    DATA alldata0;
      IF _n_=0 THEN SET patdata;   /* Never executes; defines the variables */
      IF _n_=1 THEN DO;
        DECLARE HASH _h1 (dataset: "PATDATA");
        rc=_h1.definekey("SUBJECT");
        rc=_h1.definedata("TRT_CODE");
        rc=_h1.definedone();
        call missing(SUBJECT,TRT_CODE);
      END;
      SET adverse;
      rc=_h1.find();
      IF rc^=0 THEN trt_code=" ";   /* No match found */
      DROP rc;
    RUN;

6. Loading Unique Dataset into an Array

The unique dataset PATDATA is loaded into an array first; the array is then used to attach TRT_CODE to each record as ADVERSE is read in.

    /* Store the observation counts in macro variables XPATDATA and XADVERSE */
    DATA _null_;
      SET sashelp.vtable;
      WHERE libname='WORK';
      WHERE ALSO memname in('PATDATA','ADVERSE');
      CALL SYMPUT('X'||memname,put(nobs,8.));
    RUN;
    DATA alldata0;
      LENGTH trt_code $1;
      ARRAY f{&xpatdata.,2} $6 _TEMPORARY_;
      DO i=1 TO &xpatdata.;
        SET patdata (RENAME=(trt_code=trt_code_dict));
        f{i,1}=PUT(subject,6.);
        f{i,2}=trt_code_dict;
      END;
      DO i=1 TO &xadverse.;
        SET adverse;
        trt_code='';
        DO j=1 TO &xpatdata.;
          IF subject=INPUT(f{j,1},best.) THEN DO;
            trt_code=f{j,2};
            OUTPUT;
          END;
          IF ^MISSING(trt_code) THEN LEAVE;
        END;
        IF MISSING(trt_code) THEN OUTPUT;
      END;
      DROP i j trt_code_dict;
    RUN;

7. MODIFY Statement

Uses PATDATA, with its unique records, to update ADVERSE through a MODIFY statement, then writes the data out to a new dataset ADVERSE2. The MODIFY statement will not add TRT_CODE to the ADVERSE dataset, but the new dataset ADVERSE2 contains it:

    DATA adverse adverse2;
      DO p=1 TO totobs;
        _iorc_=0;
        SET patdata point=p nobs=totobs;
        DO WHILE(_iorc_=%sysrc(_sok));
          MODIFY adverse KEY=subject;
          SELECT (_iorc_);
            WHEN (%sysrc(_sok)) DO;   /*Match Found*/
              SET patdata POINT=p;
              OUTPUT adverse2;
            END;
            WHEN (%sysrc(_dsenom)) _error_=0;   /*No Match*/
            OTHERWISE DO;   /*A major problem somewhere*/
              PUT 'error: _iorc_ = ' _iorc_ /
                  'program halted.';
              _error_ = 0;
              STOP;
            END;
          END;
        END;
      END;
      STOP;
    RUN;

Conclusion

There are a number of methods that can be used to merge data, beyond the MERGE statement within a DATA step. No one method is better than another, and the methods shown here are by no means exhaustive. It is only through trying these different methods at your site that you will see resource efficiencies between the methods.

Contact Information

Your comments and questions are valued and encouraged.

David Franklin
16 Roberts Road, Litchfield, NH 03052
Tel/Fax: 603-262-9160
Email: 100316.3451@compuserve.com
Web: http://ourworld.compuserve.com/homepages/dfranklinuk
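Setup sketch

The poster lists the PATDATA and ADVERSE records but not the code that builds them. The following is a minimal sketch, not the author's code, of how the two datasets and the prerequisites of the sorted and keyed methods might be prepared. It assumes SUBJECT is numeric (as the PUT(subject,6.) call in the array method suggests), that EVENT fits in 8 characters, and that the datasets live in the WORK library; the index definitions are likewise an assumption about what the KEY= and MODIFY methods need.

```sas
/* Example data as listed in the poster; SUBJECT is assumed numeric */
DATA patdata;
  LENGTH subject 8 trt_code $1;
  INPUT subject trt_code $;
  DATALINES;
124263 A
124264 A
124265 B
124266 B
;
RUN;

DATA adverse;
  LENGTH subject 8 event $8;
  INPUT subject event $;
  DATALINES;
124263 HEADACHE
124266 FEVER
124266 NAUSEA
124267 FRACTURE
;
RUN;

/* MERGE with BY needs both inputs sorted (or indexed) by SUBJECT */
PROC SORT DATA=patdata; BY subject; RUN;
PROC SORT DATA=adverse; BY subject; RUN;

/* KEY= lookups need an index on the dataset being searched:       */
/* PATDATA for the two-SET method, ADVERSE for the MODIFY method   */
PROC DATASETS LIBRARY=work NOLIST;
  MODIFY patdata;
    INDEX CREATE subject / UNIQUE;
  MODIFY adverse;
    INDEX CREATE subject;
QUIT;
```

With this setup, each method should return the four ADVERSE records with TRT_CODE attached where a matching SUBJECT exists in PATDATA; subject 124267 has no match.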