SAS Programming Basics
SAS is a powerful and flexible statistical package that runs on many platforms, including Windows and Unix. This class is designed for anyone interested in learning how to write basic SAS programs. Some familiarity with SAS is recommended. If you are new to SAS you may want to review our Introduction to SAS Seminar. It is expected that those attending this course have the ability to navigate to and access data files on their own operating system. The students in the class will have hands-on experience using SAS for data manipulation, including use of arithmetic operators, conditional processing, using SAS built-in functions, merging, appending, formatting and different options for modifying SAS output. It is our hope that after this seminar you will be able to:
Comfortably navigate the SAS window environment
Subset and create new datasets
Create new variables
Write and debug basic SAS programs
Use SAS functions for basic data management tasks
Merge and append data
Modify SAS output for presentation
Please note that since we are using data files provided by SAS, we are unable to make these available on our website. Thus, this seminar page includes output from the SAS procedures used in the seminar.
For clarity, all SAS keywords will be in CAPITAL letters in order to distinguish them from the information that you as the user will provide.
Note: This seminar was developed in SAS 9.4.
1.0 SAS Refresher
1.1 Libname
We will start by setting our libname, which points SAS to the directory where our SAS data files are stored.
*assign libname;
LIBNAME idre 'C:\';
SAS also allows you to clear a particular libname or use the _ALL_ keyword to clear all assigned libnames.
*clear libname;
LIBNAME idre CLEAR;
LIBNAME _ALL_ CLEAR;
* reassign library;
LIBNAME idre 'C:\';
1.1 SAS Windowing environment
Let's briefly review the SAS windowing environment. The five main windows in SAS are the Explorer, Results, Program Editor, Log, and Output/Results Viewer windows. In general, when you start SAS, the windows that initially appear are the Log, Editor and Explorer windows. Other windows can be found under the View menu in the toolbar.
The SAS Explorer window allows you to manage files associated with your current SAS session, including viewing, deleting, moving, and copying files. The Editor window, which is literally just a text editor, permits you to enter, edit, submit and save SAS programs. The Log window allows the user to view information about their current session, including messages about submitted SAS programs such as successful execution, errors or warnings. The Results window enables you to view a list of results from executed SAS programs. The Results Viewer allows you to view HTML results of executed SAS procedures. In SAS 9.4, the default output format is HTML.
1.2 Creating new SAS datasets
As we will be using several different datasets in the seminar today, let's also cover how to create new permanent and temporary datasets from the data files you have been provided.
*permanent dataset;
DATA idre.new;
SET idre.charities;
RUN;
*temporary dataset;
DATA new;
SET idre.charities;
RUN;
1.3 SAS Options
SAS includes a large suite of system options that will affect your SAS session. Specific options are invoked by default when you open SAS. The options can vary depending on what computing environment you are using (e.g. Windows, Unix). The OPTIONS procedure lists the current settings of SAS system options in the SAS log.
PROC OPTIONS;
RUN;
SAS includes two types of options: portable and host. Portable (or non-host) options are the same regardless of the operating system. Host options differ depending on which operating system you are using.
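As a minimal sketch, the two groups can be listed separately with the HOST and NOHOST options of PROC OPTIONS, and a single option can be looked up with OPTION=. The choice of FMTERR below is only for illustration.

```sas
/* List only the host (operating system specific) options in the log */
PROC OPTIONS HOST;
RUN;

/* List only the portable (non-host) options */
PROC OPTIONS NOHOST;
RUN;

/* Look up the current setting of one option */
PROC OPTIONS OPTION=FMTERR;
RUN;
```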
Below are some examples of common options and what they do.
The AUTOCORRECT option is turned on by default and allows SAS to correct small syntax mistakes such as a misspelled keyword. In the first example below, the DATA keyword is misspelled as DATE. When the option is in effect, you will see in the Log (shown below) that SAS issues a warning that it assumed the keyword was misspelled and continues executing the procedure. However, in the second example, where automatic correction is turned off, SAS issues an error and stops executing the procedure.
*autocorrect option;
OPTIONS AUTOCORRECT; /*default*/
PROC FREQ DATE=idre.charities;
TABLE code;
RUN;
OPTIONS NOAUTOCORRECT;
PROC FREQ DATE=idre.charities;
TABLE code;
RUN;
The FMTERR option controls whether SAS issues an error when a variable is assigned a format that cannot be found. The default is for SAS to issue an error and stop processing the executed procedure. In the first example, the default option is in effect and, as you can see below, SAS issues an error that the format used could not be found. However, in the second example, where we tell SAS not to issue an error (NOFMTERR), SAS ignores the missing format and executes the command without it.
*format error;
OPTIONS FMTERR;/*default*/
PROC PRINT DATA=idre.charities;
FORMAT code $code.;
RUN;
OPTIONS NOFMTERR;
PROC PRINT DATA=idre.charities;
FORMAT code $code.;
RUN;
2.0 Diagnosing and Correcting Syntax Errors
A main challenge in learning a new programming language is being able to identify and address coding errors. There are several ways that SAS will notify you of syntax errors.
2.1 Color Coded Syntax
When writing code in the SAS Enhanced Editor you will notice some color coding. Color coding program components will help you more easily diagnose syntax errors, and when you first start with SAS you will make many mistakes. Take a look at the example syntax below copied from the Enhanced Editor window. Here you will see 5 different colors automatically generated by SAS. For example, you will see that keywords like DATA, CLASS, MODEL are all highlighted in blue. If you use the wrong keyword with a procedure, the keyword will often remain black like the variable names because SAS does not recognize it. Options like SOLUTION are also considered keywords. As we will discuss later, the way to indicate a format is to put a period at the end and, once you do this, it will turn green. Anything in quotation marks turns red. In the second set of code, you will see that we are missing an end quote, thus all of the syntax is red. Thus we would know to correct the missing double quote.
2.2 Log File
The log file will also let you know when you have syntax errors. Below is an example using PROC MEANS:
In the syntax shown, we are attempting to run the MEANS procedure with a couple of options. We have added the "average" and "min" options to our statement to indicate that we only want the average and the minimum values for salary. As we described in the previous section, options should be colored in blue, and in this example "average" remains black, indicating that SAS is not recognizing it as a keyword.
Below we see what happens when we attempt to execute the syntax as written. An error appears in the log file indicating that the keyword "average" was not a recognized option. Additionally, in this instance, SAS provides a list of alternate options you may have wanted. If you look carefully, you will see that "mean" is one of them. If we replace the unrecognized keyword "average" with "mean" the procedure will execute as expected.
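The corrected call described above might look like the sketch below. The original syntax is not reproduced on this page, so the dataset (idre.sales) and the VAR statement are assumptions based on the mention of salary.

```sas
/* Assumed dataset and analysis variable; MEAN replaces the unrecognized "average" */
PROC MEANS DATA=idre.sales MEAN MIN;
  VAR Salary;
RUN;
```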
3.0 Data Step vs. Proc Step
SAS programs are composed of two distinct kinds of steps: data steps and proc steps. Data steps are written by you, while procedures are prewritten programs that are built in. In general, Data steps are used to read, modify and create data files and always begin with a "DATA" statement. You saw an example of a data step in section 1.2. From a statistical standpoint, a Proc step is typically used to analyze a dataset in SAS without making changes to the data. There are exceptions to this. Proc steps always start with the familiar "PROC" statement. You have seen several examples of Proc steps in the preceding sections including PROC PRINT, PROC MEANS, and PROC FREQ. Each procedure enables us to analyze and process the data in a specific way.
In the following sections we will demonstrate how to use these two types of steps.
4.0 Manipulating Datasets
4.1 Operators
An operator in SAS is a symbol representing a comparison, logical operation or mathematical function.
4.1.1 Comparison Operator
These are operators that compare a variable with some specified value or another variable. They are typically represented as symbols such as =, <, > but also have mnemonic equivalents like EQ, LT, or GT, respectively. The operators can be used within a data or proc step depending on your needs.
One of the simplest ways to use a comparison operator is in a WHERE statement. In the "sales" data file, we have information on sales associates from Australia (AU) and the United States (US). If we only wanted to output records for Australian sales associates we could use the = or EQ operator. Since the variable country contains character rather than numeric information, we need to put single quotes around 'AU'.
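A minimal sketch of the WHERE statement just described, assuming the sales data is stored as idre.sales (as in the later examples) and that PROC PRINT is used to display the records:

```sas
/* Symbol form of the comparison operator */
PROC PRINT DATA=idre.sales;
  WHERE Country = 'AU';
RUN;

/* Mnemonic form; selects the same records */
PROC PRINT DATA=idre.sales;
  WHERE Country EQ 'AU';
RUN;
```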
Symbol    Mnemonic
^ ~       NOT
&         AND
|         OR
In the previous section we learned that we cannot use two WHERE statements, but we can use the AND operator to combine the information contained in those two statements to achieve the desired result.
Below we use AND to output observations representing Australian sales associates that make less than $30,000 a year. Both the symbol and mnemonic are used and they give the same result.
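The two forms might be written as follows. This is a sketch that assumes the idre.sales dataset and a PROC PRINT step; the original syntax is not shown on this page.

```sas
/* Symbol form of the logical operator */
PROC PRINT DATA=idre.sales;
  WHERE Country = 'AU' & Salary < 30000;
RUN;

/* Mnemonic form; same result */
PROC PRINT DATA=idre.sales;
  WHERE Country = 'AU' AND Salary LT 30000;
RUN;
```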
As with comparison operators, you can also combine AND, OR, and NOT with the IN operator. In the example below the variable job_title includes seven different job types. We want to obtain frequencies of all of them except for two, Sales Manager and Sales Rep. IV. So we use NOT combined with IN since we have more than one value we are trying to exclude.
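A sketch of this request, assuming the idre.sales dataset and the job title spellings used in the later Data step examples:

```sas
/* Frequencies of every job title except the two listed */
PROC FREQ DATA=idre.sales;
  TABLES Job_Title;
  WHERE Job_Title NOT IN ('Sales Manager', 'Sales Rep. IV');
RUN;
```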
Operator                 Description                                  Char or Num
Between-And              Allows for an inclusive range                Both
Contains                 Includes a character string or substring     Character Only
Is Null or Is Missing    Identifies missing values                    Both
Like                     Matches a pattern                            Character Only
=*                       Sounds like                                  Character Only
Same And or Also         Augments an existing WHERE clause without    Both
                         having to retype the original one
For example, here are three ways of specifying that we want SAS to output all sales associate records with salaries that range from $28,000 to $30,000. As in any good programming language, there are always multiple ways of doing the same thing.
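Three equivalent WHERE conditions for this range are sketched below. The original syntax is not shown on this page, so the dataset (idre.sales) and the use of PROC PRINT are assumptions.

```sas
/* 1. Special WHERE operator (inclusive range) */
PROC PRINT DATA=idre.sales;
  WHERE Salary BETWEEN 28000 AND 30000;
RUN;

/* 2. Two comparison operators chained around the variable */
PROC PRINT DATA=idre.sales;
  WHERE 28000 <= Salary <= 30000;
RUN;

/* 3. Two comparisons joined with a logical operator */
PROC PRINT DATA=idre.sales;
  WHERE Salary >= 28000 AND Salary <= 30000;
RUN;
```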
Let's suppose we are interested in identifying product names that include the word "Woman's". How could we do that? The "Like" operator could help us do this. It works by comparing character values to some given pattern. It uses two special characters, a percent sign (%) and an underscore (_). The percent sign denotes that any number of characters may occupy a position, whereas the underscore specifies that exactly one character may occupy a position. If we are only interested in products that start with "Woman's", then we don't care how many characters come after "Woman's":
I also could ask SAS to output any name that includes "Men's" anywhere in the title. This would require multiple % signs because any product name with "Men's" may have characters before and after.
For more information check out SAS Help and Documentation on special WHERE operators.
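The two LIKE patterns described above can be sketched as follows. The dataset name idre.products and the variable Product_Name are hypothetical, since the page does not say which file holds the product names. Single quotes with a doubled apostrophe are used so that the % signs are not read as macro triggers.

```sas
/* Hypothetical dataset and variable names: idre.products, Product_Name */

/* Names that start with Woman's, followed by any number of characters */
PROC PRINT DATA=idre.products;
  WHERE Product_Name LIKE 'Woman''s%';
RUN;

/* Names that contain Men's anywhere; note this pattern also matches Women's */
PROC PRINT DATA=idre.products;
  WHERE Product_Name LIKE '%Men''s%';
RUN;
```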
4.1.4 Arithmetic operators
Arithmetic operators, as you can probably tell from the name, allow you to perform arithmetic calculations in SAS. Below is a table of the operators and their symbols used in SAS.
Symbol    Description
**        Exponentiation
*         Multiplication
/         Division
+         Addition
-         Subtraction
A few things to note about using these operators. First, if you are calculating values using a variable (or variables) with missing data, the resulting value will also be missing. Second, expressions are evaluated with respect to the traditional order of operations, with exponentiation taking the highest priority level, then multiplication/division and last addition/subtraction. This ordering can be modified by using parentheses. Third, as is the case with the other operators we have discussed, arithmetic operators can be used in conjunction with both logical and comparison operators.
Let's try a few examples. Below we will use a Data step to create a new temporary dataset called "sales_subset" from the "sales" data. This dataset will contain only observations from Australian employees whose job title contains the word "Rep". So we are using a logical and a special WHERE operator. Additionally, we are creating a new variable called "Bonus" which is calculated by multiplying "Salary" by .10.
DATA sales_subset;
SET idre.sales;
WHERE Country='AU' & Job_Title contains 'Rep';
Bonus=Salary*.10;
RUN;
Below we output the first 20 records of our new dataset.
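One way to produce such a listing is the OBS= dataset option on PROC PRINT. This is a sketch; the original listing may have been produced differently.

```sas
/* Print only the first 20 observations of the temporary dataset */
PROC PRINT DATA=sales_subset (OBS=20);
RUN;
```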
In this second example let's use parentheses to change the order of evaluation of a compound (more than one operator) expression. We will use a Data step to create two new variables, profit1 and profit2.
DATA profit;
SET idre.order_fact;
profit1 = total_retail_price - costPrice_per_unit * quantity;
profit2 = (total_retail_price - costPrice_per_unit) * quantity;
RUN;
Let's see how the use of parentheses has changed our values.
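To make the difference concrete, here is a small worked sketch with made-up values (a retail price of 100, a unit cost of 30 and a quantity of 2). These numbers are illustrative only and do not come from the order_fact data.

```sas
DATA _NULL_;
  /* Made-up values for illustration */
  total_retail_price = 100;
  costPrice_per_unit = 30;
  quantity = 2;
  /* Multiplication is evaluated first: 100 - (30 * 2) = 40 */
  profit1 = total_retail_price - costPrice_per_unit * quantity;
  /* Parentheses force the subtraction first: (100 - 30) * 2 = 140 */
  profit2 = (total_retail_price - costPrice_per_unit) * quantity;
  /* Write both values to the log */
  PUT profit1= profit2=;
RUN;
```

The log should show profit1=40 and profit2=140, so the placement of parentheses changes the result substantially.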
For more examples check out the SAS 9.4 Help and Documentation page on arithmetic operators.
4.2 Conditional Processing
4.2.1 WHERE and IF statements
Conditional processing in SAS allows the user to manipulate and output portions of data instead of the whole file. In previous sections you have seen several examples of the WHERE statement. Alternatively, SAS also allows for the use of IF statements. Both can accomplish similar tasks; however, while both WHERE and IF can be used within a Data step, only WHERE is allowed in a Proc step. For example, if we add an IF statement to the PROC MEANS command from earlier we will see that the IF turns red. This indicates that the syntax is incorrect.
However, if you use WHERE the statement is blue.
If you attempt to execute the PROC MEANS using the incorrect IF statement SAS will produce an error, but SAS will execute the command using the WHERE statement.
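A sketch of the working version is shown below. The dataset, analysis variable and condition are assumptions, since the original syntax is not reproduced on this page.

```sas
/* WHERE is valid inside a Proc step */
PROC MEANS DATA=idre.sales MEAN MIN;
  VAR Salary;
  WHERE Country = 'AU';
  /* Replacing WHERE with IF on the line above would produce a syntax error */
RUN;
```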
Data steps will accept both WHERE and IF statements; however, only an IF can be used for assignment statements. Below is an example of an assignment statement. Assignment in this case means we are taking observations with values for salary that are greater than $30,000 and assigning them, using THEN OUTPUT, to a new dataset called "highsales".
DATA highsales ;
SET idre.sales;
IF salary GT 30000 THEN OUTPUT highsales;
RUN;
You can also combine WHERE and IF in the same Data step as demonstrated below. We use an IF and a WHERE statement to subset the data. Can you think of equivalent ways of subsetting the data?
DATA emps;
SET idre.sales;
WHERE Country='AU';
Bonus=Salary*.10;
IF Bonus>=3000;
RUN;
Moreover, you will notice that SAS allows you to create a variable and use it in an IF statement in the same Data step. This is something you can accomplish with an IF but not a WHERE. The reason for this has to do with when SAS evaluates conditional statements. A WHERE condition is applied as observations are read from the input dataset, so SAS only brings in the observations that meet the condition and then continues executing the other operations in the Data step. This makes for more efficient processing, especially with large amounts of data. But in this instance, if we had used a WHERE statement to subset the data using "Bonus", SAS would have given us an error saying the "Bonus" variable is not in the dataset. An IF condition, by contrast, is evaluated at the point where it appears in the Data step, after the observation has been read and any preceding statements have run. Thus, SAS will apply the WHERE statement, create "Bonus" and then assess whether the IF condition is true.
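As one possible answer to the question of equivalent ways to subset, the sketch below shows two alternatives to the "emps" step. The output dataset names emps2 and emps3 are made up for illustration.

```sas
/* Alternative 1: two subsetting IF statements, no WHERE */
DATA emps2;
  SET idre.sales;
  IF Country='AU';
  Bonus=Salary*.10;
  IF Bonus>=3000;
RUN;

/* Alternative 2: a single WHERE= dataset option that uses only   */
/* variables already in the input data (Bonus does not exist yet) */
DATA emps3;
  SET idre.sales (WHERE=(Country='AU' AND Salary*.10 >= 3000));
  Bonus=Salary*.10;
RUN;
```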
4.2.2 If-Then statement
An IF-THEN statement is a commonly used conditional statement that is typically carried out within the context of a Data step. It executes a SAS statement for observations that fulfill a certain condition.
We will once again create a variable called "Bonus", but assign the values based on a certain set of conditions that are defined by an employee's job title.
DATA comp1;
SET idre.sales;
IF Job_Title='Sales Rep. IV' THEN Bonus=1000;
IF Job_Title='Sales Manager' THEN Bonus=1500;
IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
RUN;
You will see in the output above that several observations have missing values. This is due to the fact that we did not assign values for "Bonus" for all of the job titles.
A related statement to IF-THEN is the ELSE statement, which can be used when creating conditional statements around mutually exclusive groups.
DATA comp2;
SET idre.sales;
IF Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE IF Job_Title='Sales Manager' THEN Bonus=1500;
ELSE IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
ELSE IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
RUN;
SAS processes the first IF statement and, if it is not true, moves to the next and so on. SAS continues to test the IF-THEN statements until it finds one that is true, at which point it stops and will not test the remaining conditions. Once again, this can speed up the processing of large datasets. However, as was the case with the first IF-THEN example, we will end up with a lot of missing values using this syntax.
What if we had a scenario where we wanted to give all the remaining categories, those that did not fulfill the prescribed conditions, one bonus value? We can do that using a final ELSE statement with no IF-THEN. In the SAS code below, we add an additional ELSE statement assigning all of the remaining job titles a bonus value of 500.
DATA comp3;
SET idre.sales;
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE IF Job_Title='Sales Manager' THEN Bonus=1500;
ELSE IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
ELSE IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
ELSE Bonus=500;
RUN;
Now, we have complete data for all observations.
A second related statement to IF-THEN is the DELETE statement. In all of the previous examples we have used the IF-THEN statement to add information, but you can also use IF-THEN to delete as well. Using the IF-THEN DELETE syntax we can specify that observations fitting our condition be left out of the dataset being created (the input dataset is not changed). In the example below, we delete all observations associated with three specific job titles.
DATA drop;
SET idre.sales;
IF Job_Title IN('Sales Manager', 'Senior Sales Manager', 'Chief Sales Officer') THEN DELETE;
RUN;
4.2.3 Using Do
Typically with an IF-THEN statement only one executable statement is allowed. When an expression is true the associated statement is executed. But what happens if you want more than one statement executed for each expression? For example, let's imagine that for each bonus value, I also want to create a variable called freq that denotes how many times a year the sales associate can receive the bonus (e.g. once a year, twice a year). So we might try the following code using a logical operator.
DATA freq1;
SET idre.sales;
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000 & Freq = "once a year";
ELSE Bonus=500 & Freq = "twice a year";
RUN;
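While this syntax appears reasonable, SAS will execute the statement and then issue a note in the log that "Variable Freq is uninitialized". SAS prints this message when a Data step uses a variable that has never been assigned a value. Here the & is treated as a logical operator inside a single assignment to Bonus, so Freq is only compared and never assigned. If you look in the freq1 SAS dataset you will see that SAS created the variable but set all of its values to missing, which is undesirable. It appears that creating "Freq" will require a separate statement instead of just a simple "&". You could try this: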
DATA freq2;
SET idre.sales;
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE Bonus=500;
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Freq = "once a year";
ELSE Freq = "twice a year";
RUN;
But this code could get fairly long if you have a lot of variables to create. A better way to do this would be through the use of a DO group, which allows for multiple statements.
DATA bonus;
SET idre.sales;
IF Country='US' THEN DO;
Bonus=500;
Freq='Once a Year';
END;
ELSE DO;
Bonus=300;
Freq='Twice a Year';
END;
RUN;
While the syntax looks similar to a traditional IF-THEN, there are some important differences. First, the IF expression now ends with THEN DO. This is followed by a set of statements to be executed. Second, each DO block ends with an END statement. Third, instead of just ELSE we now have ELSE DO, which also has an END statement. If you are missing an END, SAS will issue an error in the log and fail to execute the Data step.
4.3 SAS Functions
Functions accept arguments and then produce a particular value (numeric or character) based on those arguments. Arguments are enclosed within parentheses and each argument is separated by a comma. SAS has a wide array of different functions depending on the needs of the user, and they can be used in a Data step. We will cover a few examples of basic mathematical functions, common date functions, and some additional functions useful for specific data management tasks.
4.3.1 Arithmetic Functions
In the first example, we will use the "Oldbudget" data file to calculate the total and average amount budgeted for business operations over a five-year period.
DATA budget;
SET idre.oldbudget;
sum1 = yr2003 + yr2004 + yr2005 + yr2006 + yr2007;
sum2 = SUM(yr2003, yr2004, yr2005, yr2006, yr2007);
sum3 = SUM( of yr2003-yr2007);
mean1 = (yr2003 + yr2004 + yr2005 + yr2006 + yr2007)/5;
mean2 = MEAN(yr2003, yr2004, yr2005, yr2006, yr2007);
mean3 = MEAN( of yr2003-yr2007);
RUN;
There are many different ways of creating the sum and mean variables that we need. We create "sum1" using an arithmetic operator to add the 5 budget amounts together. Alternatively, we can use the SUM() function, whose arguments are the variables you wish to sum together. The difference between using the function versus manually adding together each variable is the treatment of missing values. When we add items using "+", a case with missing values on any of the variables listed will have a missing value for the resulting variable. If we use the SUM() function, any missing values will be treated as though they were zero, and the new variable will be equal to missing only if all of the variables listed are missing. Which method is most appropriate depends on the situation and what you are trying to achieve. Last, if you have a lot of variables to be summed you can specify a SAS variable list. This syntax works because the variable names share a prefix and end in consecutive numbers (a numbered range list). Check the SAS documentation page on SAS variable lists on how to use this shortcut in other circumstances. We also use similar syntax to demonstrate how to calculate the average or mean of the budget variables.
All the values produced for "sum1-sum3" and "mean1-mean3" are the same since we do not have any missing data. SAS has a number of additional mathematical functions including absolute value, maximum, minimum and square root that can be used in a similar manner.
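Because the budget data has no missing values, the difference between "+" and the functions is not visible there. The sketch below uses three made-up values, one of them missing, to illustrate the behavior and to show a few of the other mathematical functions mentioned. None of these values come from the seminar data.

```sas
DATA _NULL_;
  /* Made-up values; b is missing */
  a = 10; b = .; c = 5;
  plus_total = a + b + c;        /* missing, because b is missing */
  sum_total  = SUM(a, b, c);     /* 15, the missing value is skipped */
  mean_value = MEAN(a, b, c);    /* 7.5, the mean of the two nonmissing values */
  largest    = MAX(a, b, c);     /* 10 */
  smallest   = MIN(a, b, c);     /* 5, missing values are ignored */
  distance   = ABS(c - a);       /* 5 */
  root       = SQRT(a + c + 1);  /* 4 */
  /* Write the results to the log */
  PUT plus_total= sum_total= mean_value= largest= smallest= distance= root=;
RUN;
```

SAS also writes a note to the log that missing values were generated for the "+" calculation.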
4.3.2 Date Functions
One of the more challenging data types to deal with in any data analysis package is date values. Thankfully, SAS has some built-in functions that can assist users with managing this data type. SAS stores date information as numeric values representing the number of days before or after Jan 1, 1960. SAS can also recognize 2 or 4 digit year values. We will use the "Sales" dataset, which includes information on date of birth and hiring date for each employee, to demonstrate some date functions.
DATA comp;
SET idre.sales;
Hire_Month=MONTH(Hire_Date);
Birth_Day = WEEKDAY(Birth_date);
Day_Dif = DATDIF(Birth_date,Hire_Date, 'actual');
Year_dif = INTCK('year',Birth_date,Hire_Date); /* counts years, so named Year_dif */
Bonus_1 = INTNX('month', Hire_Date, 6);
RUN;
The MONTH function pulls the month from "Hire_date" and puts it in a variable called "Hire_month". The WEEKDAY function figures out what day of the week (1-7, where 1 is Sunday) the date fell on and outputs this.
DATDIF calculates the difference in days between two dates given in the first two arguments. The third argument specifies the method for calculating the days. In this example we specify we want the 'actual' number of days, but we could choose other methods of calculation, such as assuming that each month has 30 days and that a year always has 360 days.
INTCK counts the number of intervals between two dates. In our example we asked SAS to output the number of years between an employee's date of birth and when they were hired, which approximates the employee's age at the time of hire. By default INTCK counts the interval boundaries crossed (here, each January 1) rather than completed years.
INTNX is used to calculate the variable bonus_1. Here we want to calculate when an employee will be eligible for their next bonus. The arguments for this function are the unit of time, the variable representing the start date/time and the number of increments. In our example, employees are eligible 6 months after their hire date. By default INTNX returns the first day of the resulting month.
Below is the output of the first 10 observations of the "comp" dataset, with and without date formats. As mentioned before, SAS stores date information as numeric information in days. Thus if you do not format a date with a format statement (discussed further in the next section), it will display as just a number.
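A sketch of how the formatted listing might be produced is shown below. The choice of the DATE9. format and of the variables printed is an assumption; the original output may have used a different date format.

```sas
/* First 10 observations with the date variables displayed as dates */
PROC PRINT DATA=comp (OBS=10);
  VAR Hire_Date Birth_Date Bonus_1;
  FORMAT Hire_Date Birth_Date Bonus_1 DATE9.;
RUN;
```

Bonus_1 is created by INTNX without a format, so without the FORMAT statement it would print as a plain number of days.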
More examples of SAS date functions can be found on the SAS Help and Documentation website.
4.3.3 Other Functions
SAS includes several other types of functions designed for specific needs. Many of these functions are helpful for data management of character or string information. For example, LENGTH tells the user the length of a character string, while COMPRESS will compress string values and remove unwanted blanks and specific character values like dashes. Additionally, in a similar way to extracting date information with the MONTH function, SAS has several functions, including SCAN and SUBSTR, that allow you to extract words from a phrase.
Let's demonstrate these. Below is a sample of data from a dataset called "Shoes_eclipse" where all the variables have character information. Our task of interest is to obtain the length of product_name, compress product_name to remove the blanks, and create a variable that extracts the brand name "Eclipse" from product_group.
DATA shoes;
SET idre.shoes_eclipse;
length_name = LENGTH(product_name);
comp_product = COMPRESS(product_name);
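brand = SUBSTR(product_group, 1, 7);
brand_scan = SCAN(product_group, 1, ' '); /* SCAN call described in the text; the variable name is assumed */
RUN;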
You will notice a few things about the output above. First, for the variable length_name, if you counted the number of letters and spaces in product_name you would end up with the same values displayed above. Second, the compressed version of product_name now includes no spaces. Third, both the SCAN and SUBSTR functions produced the same output. The SUBSTR function takes 3 arguments: the name of the variable with the information you want to extract, the character position you want to start from and then the number of characters to extract. In our example we are telling SAS that we want to extract a character string of length 7 starting at the first character position of "product_group", which would be the "E" in Eclipse. Unfortunately, this means whatever value we are extracting must always be of the same length. What if we have product names of different lengths? Then you might want to use the SCAN function, which works very similarly to SUBSTR except that the second argument is the number of the word to extract and the last argument is a delimiter rather than a length. The syntax above indicates that the string of interest is the first word, which continues until a blank/space is encountered. This function works with many types of delimiters including < ( + & ! $ * ) ^ / , and %.
In the previous examples, we were extracting values from a string, but what if we wanted to combine string variables? A useful function would be CATX. Below we want SAS to combine the character string information in first_name and last_name into one full name variable. The first argument of the function specifies the delimiter of choice, which here is supplied through a variable. In the first example, the delimiter is just a blank while in the second example the delimiter is a comma.
DATA salesquiz;
SET idre.salesquiz;
sep = " ";
fullname = CATX(sep, first_name, last_name);
sep1 = ",";
fullname1 = CATX(sep1, last_name, first_name);
RUN;
The new variables are displayed above.
A list of all SAS functions, by category, can be found on the SAS website.
Note: The order in which the variables are specified in the CATX function governs the order in which they will be combined.
4.4 Sorting, Merging and Appending
4.4.1 Sorting
There are many instances when having your data sorted in a particular way will be helpful for visualizing your data. Additionally, certain types of data management needs, like merging datasets or grouping observations by a particular characteristic, require sorting.
Sorting data by a single variable in SAS is the simplest case. By default SAS sorts data in ascending order with the smaller values first.
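A minimal sketch of a single-variable sort, assuming the idre.sales data and using OUT= so the permanent file is left untouched. The output name sales_sorted is made up for illustration.

```sas
/* Sort by Salary in ascending order (the default) */
PROC SORT DATA=idre.sales OUT=sales_sorted;
  BY Salary;
RUN;
```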
Sorting can also be done using more than one variable.
As you can see, the data is sorted in ascending order by "Salary" first and then, when there are tied salaries from different countries, AU comes before US alphabetically. We can change this sorting behavior by flipping the ordering of our variables and/or adding the DESCENDING option, which reverses the sort order for the variable that immediately follows it.
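The two-variable sort and the DESCENDING option might be written as in the sketch below. The output dataset names are hypothetical, and the original syntax is not shown on this page.

```sas
/* Salary first, then Country within tied salaries */
PROC SORT DATA=idre.sales OUT=sales_sorted2;
  BY Salary Country;
RUN;

/* Highest salaries first; Country is still ascending because */
/* DESCENDING applies only to the variable that follows it    */
PROC SORT DATA=idre.sales OUT=sales_sorted3;
  BY DESCENDING Salary Country;
RUN;
```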
4.4.2 Merging
One data management task that requires proper sorting is merging. Merging involves matching one observation in a dataset to one observation (One-to-One) or multiple observations (One-to-Many) in a second dataset. In order for this to be done properly in SAS, the datasets to be merged must be sorted by the same variable(s). In the example below, we will merge a dataset that has employee payroll information with a second dataset with employee addresses. Since an employee's ID number (employee_id) is a unique identifier of each observation, we will use this variable to match observations.
First, we need to sort each dataset by employee_id.
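A sketch of the two sorts follows. It assumes the source files are stored in the idre library as idre.payroll and idre.addresses, and writes temporary copies named payroll and addresses to match the MERGE statement used in this section.

```sas
/* Library location of the source files is an assumption */
PROC SORT DATA=idre.payroll OUT=payroll;
  BY Employee_ID;
RUN;

PROC SORT DATA=idre.addresses OUT=addresses;
  BY Employee_ID;
RUN;
```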
A few things to take note of. First, you will notice that datasets "addresses" and "payroll" do not share any of the same variables except Employee_ID. In general, you do not want to merge datasets that include variables with the same names. SAS can only keep one set of values and will use the values from the last dataset read. Thus, you should rename variables before attempting the merge. Second, Employee_ID is unique in each dataset, so this will be a One-to-One merge.
Merging is done in a Data step similar to what we have been executing, except instead of the SET statement we now have a MERGE statement. Additionally, the BY statement is used to tell SAS which variable will be used to match records. The variable after the BY statement is the same unique identifier that we just used for sorting.
DATA payadd;
MERGE payroll addresses;
BY Employee_ID;
RUN;
Below is a subset of variables from the newly merged data. As you can see, Employee_Name is from the "addresses" data and Birth_date and Salary are from the "payroll" data.
Now let's take a look at an example of a One-to-Many merge.
The first set of data provides information on order and delivery dates. In the second set of data we have information on the product or products ordered. Because more than one item can be associated with a particular Order_ID, it is not unique in this dataset. Thus, we will need to conduct a one-to-many merge where each row in our "orders" data could be merged with multiple rows in the "order_item" data. Again, we will begin by sorting both sets of data by Order_ID.
Below is our syntax to merge the two datasets. Notice we also used a KEEP statement. This allows us to merge the data and control the number of variables present in the final merged dataset.
DATA allorders;
MERGE orders order_item;
BY Order_ID;
KEEP Order_ID Order_Item_Num Order_Type Order_Date Quantity Total_Retail_Price;
RUN;
Above is a selected portion of the merged data. SAS executed the merge without an error but it appears that we have some missing data as a result. The two variables that have missing information were both from the "orders" data. This is an indication that we perhaps have some nonmatches. If we go back and look at the "orders" data we would see that there is no information for the order identifier "1243854878" but there is information in "order_item"; thus when you merge the datasets together all the variables from "orders" will have missing values for this particular order. There are a couple of ways you can deal with this issue. First, you can leave the data as is with missing information for nonmatches. Alternatively, you can choose to control the observations output to the new merged dataset by using the IN option on the MERGE statement. The IN option creates a variable indicating which dataset(s) contributed to forming the observation in the final merged dataset. It is a temporary variable used in the merging process that is given a value of 0 if the dataset did not provide information or 1 if it did. We could then use this variable to select observations in the newly merged data that come from one dataset or both. Let's take a look at how we could apply this option in our previous merge.
DATA allorders2;
MERGE orders (in=a)
order_item (in=b);
BY Order_ID;
KEEP Order_ID Order_Item_Num Order_Type Order_Date Quantity Total_Retail_Price;
IF a;
RUN;
Using the IN option with an IF statement selects observations matched by order_ID that are present in "orders". If you have a value for order_ID that is in "order_item" but not "orders" then it will not be used to construct observations for the "allorders2" dataset. Note: Using IF a is equivalent to saying IF a=1. Thus, you will not end up with missing values caused by nonmatching "order_item" records.
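The IN= variables can also be combined to separate matches from nonmatches. The sketch below is one way to do that; the output dataset names matched and nomatch are made up for illustration.

```sas
DATA matched nomatch;
  MERGE orders (IN=a)
        order_item (IN=b);
  BY Order_ID;
  /* Order_ID found in both datasets */
  IF a AND b THEN OUTPUT matched;
  /* Order_ID found in only one of the two datasets */
  ELSE OUTPUT nomatch;
RUN;
```

Inspecting the nomatch dataset is a quick way to see which order identifiers are missing from one of the files.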
Now, one thing you generally want to avoid is many-to-many merges. When neither dataset has a unique identifier that will allow for proper matching of records, the result is a somewhat unpredictable and often undesirable assortment of observations.
4.4.3 Appending
Appending or concatenating observations is the process of adding rows or observations to a dataset, as opposed to merging, which adds variables. This can also be accomplished using a Data step. SAS will stack the columns together by matching the names across datasets.
We will append three datasets that include information on orders from 3 consecutive months (July-September) in 2011. Below is a snapshot of the first two records from each of the datasets to be appended.
The syntax to conduct the append is quite simple. All you have to do is list the datasets to be appended on the SET statement line. The number of datasets does not matter, and the observations are stacked in the order in which the datasets are listed.
DATA mnth7_8_9_2011 ;
SET idre.mnth7_2011 idre.mnth8_2011 idre.mnth9_2011;
RUN;
A portion of the newly appended dataset is below.
Now you can see that all 3 datasets have been appended or "stacked" together. This example worked perfectly because the three datasets shared the exact same variables. But what happens when you append datasets that do not contain the same variables?
Take a look back at our "shoe" data. Below we have two sets of data, one for Eclipse shoes and one for Tracker shoes. You will notice they share all the same variables except two, product_id and supplier_name.
What will happen when we attempt to append the data?
DATA shoes;
SET idre.shoes_eclipse idre.shoes_tracker;
RUN;
The append still executes without error. However, in the new "shoes" data we created, all the records from the Eclipse dataset will be missing on the variables that were only in the Tracker dataset (and likewise for any variable found only in the Eclipse dataset).
5.0 Modifying SAS Output
5.1 Titles and Footnotes
As you have probably already noticed, SAS provides a lot of output from many of its procedures. As a researcher it is important to know how to manipulate and change your output to convey important information to your audience. One of the procedures we have been using to obtain output from our various Data steps is PROC PRINT. We will begin by exploring some ways of enhancing the output from this procedure.
Whenever you are presenting tables of information, the first item most people look for is a title. This is easily implemented with a TITLE statement in SAS. It is also possible to add multiple titles to output in SAS, as well as footnotes, by just adding a numeric suffix to the statement indicating the desired ordering. SAS allows for up to 10 different titles and/or footnotes.
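A sketch of numbered titles and a footnote on a PROC PRINT listing follows. The title and footnote text and the choice of variables are made up for illustration.

```sas
/* The numeric suffix controls the order of the lines */
TITLE1 'Sales Associates';
TITLE2 'Australia and United States';
FOOTNOTE1 'Source: sales data file';

PROC PRINT DATA=idre.sales (OBS=10);
  VAR Job_Title Salary Country;
RUN;
```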
5.2 Label Options
Additionally, you may also find it useful to label your variables for added readability. You can do this with the LABEL statement. You will notice that some of the variable labels are longer than others. When you only have three variables it may not seem important, but if you have to label 10 variables, available space in a table may need to be a consideration. One way to deal with this is to use the SPLIT option on the PROC PRINT line. This allows the user to control the display of the label so that instead of the label being one line, you can split it into two lines.
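The LABEL statement and SPLIT option might be combined as in the sketch below. The label text and the choice of * as the split character are assumptions for illustration.

```sas
/* SPLIT= names the character that marks a line break in a label; */
/* specifying it also tells PROC PRINT to display labels          */
PROC PRINT DATA=idre.sales (OBS=10) SPLIT='*';
  VAR Job_Title Salary Country;
  LABEL Job_Title = 'Sales*Title'
        Salary    = 'Annual*Salary'
        Country   = 'Country of*Employment';
RUN;
```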
The variable names have been replaced with labels. Note that we did not have to resubmit the TITLE and FOOTNOTE statements. These are considered global statements and remain in effect until you cancel them or you end your SAS session. To cancel these you must issue a blank statement for each:
TITLE;
FOOTNOTE;
5.3 Formats
Beyond just labeling variables, you may also want to properly label the values of those variables. This is carried out in SAS using a FORMAT statement. Formatting values changes the appearance of those values in output, but the underlying values do NOT change.
SAS has predefined formats for certain types of variables like dates and allows users to create their own formats for specific situations. Earlier we saw some examples of predefined formats when we covered date functions. Here, we will focus on how to create and apply user-defined formats.
In SAS, the FORMAT procedure is used to define formats. For example, take a look at the syntax below:
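PROC FORMAT;
    * character format: the name begins with a dollar sign;
    VALUE $ctryfmt 'AU'  = 'Australia'
                   'US'  = 'United States'
                   OTHER = 'Miscoded';
    * numeric format: each range of values receives one label;
    VALUE tiers 0-49999       = 'Tier 1'
                50000-99999   = 'Tier 2'
                100000-250000 = 'Tier 3';
RUN;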
Each format is defined in its own VALUE statement. The name you choose is up to you. Notice that character formats must be defined with a "$" in front of them. You then provide a value label for each level or range of values. The first format we are creating labels the countries with their full names instead of abbreviations. VALUE statements can also use keywords. In this example OTHER specifies that any values other than AU or US will be labeled as "Miscoded". For numeric formats, you can label a single value or a range of values. Once the formats are created we can apply them to the variables of interest.
In both Data steps and Proc steps, SAS distinguishes formats from variables by ending them in a period, which then turns the text green.
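A sketch of applying the two formats inside a procedure follows. It assumes the PROC FORMAT step above has already been run in the current session, and that idre.sales contains a character variable named country and a numeric variable named salary; both variable names are assumptions.

```sas
* The trailing period marks $ctryfmt. and tiers. as formats, not variables;
PROC PRINT DATA=idre.sales (OBS=10);
    VAR country salary;
    FORMAT country $ctryfmt. salary tiers.;
RUN;
```

Because the FORMAT statement sits inside the Proc step, it affects only this procedure's output.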
Above you can compare the appearance of the table with unformatted values to the one with formatted values. If you only want to use the formatted values for certain procedures in SAS, then you can just add a FORMAT statement as we did above. If you want these formats to be permanently applied to a variable, then you can use the same FORMAT statement in a Data step.
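A hedged sketch of the Data step approach is shown here. The output dataset name sales_fmt and the variable names country and salary are hypothetical, and the formats are assumed to have been created by the PROC FORMAT step earlier in this section.

```sas
* Store the format assignment with the dataset itself;
DATA sales_fmt;
    SET idre.sales;
    FORMAT country $ctryfmt. salary tiers.;
RUN;

* PROC CONTENTS lists the format now attached to each variable;
PROC CONTENTS DATA=sales_fmt;
RUN;
```

Keep in mind that user-defined formats are stored in the temporary WORK library by default, so the PROC FORMAT step has to be rerun in each new session unless the formats are saved to a permanent library.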
5.4 Output Delivery System (ODS) Basics
Besides customizing the SAS default output, you may want to output results to different file types. By default SAS 9.4 outputs results as HTML, and this is what you see in the "Results Viewer" window. If you would like to change this behavior, you will need to use the Output Delivery System (ODS) statement. This will allow for output in several different formats including listing/text, rtf, pdf and .xls.
ODS statements are also global statements and remain in effect until closed. The statement comes before running the procedure. Once the procedure(s) is executed, you will then close the ODS destination. The basic syntax is shown below:
ODS LISTING;
PROC FREQ DATA=idre.sales;
TABLES gender;
RUN;
ODS LISTING CLOSE;
As a user, you can also customize the style or look of the output when selecting an HTML, PDF or RTF destination. This ability is often useful when formatting results for presentations or publications. Below are some examples of the different options available in SAS:
RUN;
ODS PDF CLOSE;
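A fuller sketch of this open, run, close pattern with a style applied might look like the following. The file names are hypothetical and are written to the current working directory; Journal and Statistical are two of the styles supplied with SAS and are chosen here only as examples.

```sas
* Hypothetical file name and style choice, for illustration only;
ODS PDF FILE='gender_freq.pdf' STYLE=Journal;
PROC FREQ DATA=idre.sales;
    TABLES gender;
RUN;
ODS PDF CLOSE;

* The same pattern with an RTF destination and a different style;
ODS RTF FILE='gender_freq.rtf' STYLE=Statistical;
PROC FREQ DATA=idre.sales;
    TABLES gender;
RUN;
ODS RTF CLOSE;
```

The file is not complete until the matching CLOSE statement has been submitted.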
Just to be safe we will go ahead and close all of the open destinations. However, be mindful that this will also close the HTML default, so you will need to reissue the global statement turning it back on. Otherwise the next time you issue a procedure that generates output, SAS will issue the warning "No output destinations active".
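In code, closing everything and then restoring the default HTML destination can be sketched as:

```sas
* Close every open ODS destination, including the default HTML one;
ODS _ALL_ CLOSE;

* Reopen the HTML destination so later procedures produce visible output;
ODS HTML;
```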
6.0 Special Issues
6.1 Dealing with Duplicates
An issue that comes up a lot in data management is how to handle duplicates. There are several ways in SAS to identify duplicate records.
One way is to use some of the options available to us with the FREQ procedure. In the "nonsales" data file, we should have 235 unique employee identification numbers. We can use the ORDER=FREQ option to determine if this is true. This option displays the frequency of each unique identification number in descending order.
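A minimal sketch of this check is given here, assuming the data are stored as idre.nonsales and the identifier is named employee_id (the variable name used in the Data step later in this section).

```sas
* ORDER=FREQ sorts the table so the most frequent IDs appear first;
* Any ID with a frequency above 1 is a duplicate;
PROC FREQ DATA=idre.nonsales ORDER=FREQ;
    TABLES employee_id;
RUN;
```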
Above you can see that the employee ID #120108 has two records associated with it, indicating that we have a duplicate problem. Another useful option is NLEVELS, which displays the number of distinct values for each variable.
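The NLEVELS option could be requested as in the sketch below, again assuming the dataset idre.nonsales and the variable employee_id. The NOPRINT option on the TABLES statement is an optional addition that suppresses the long frequency table so only the count of levels is shown.

```sas
* NLEVELS reports the number of distinct values of each TABLES variable;
PROC FREQ DATA=idre.nonsales NLEVELS;
    TABLES employee_id / NOPRINT;
RUN;
```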
The "nonsales" data should contain 235 unique employees, but there are only 234 distinct levels of the ID variable, meaning that one employee ID # is duplicated.
Once the presence of duplicate ID numbers has been confirmed, you will most likely want to examine them to determine if they are indeed duplicate records or if the employee ID number is incorrect. In our dataset only one ID is duplicated, making assessment fairly easy. However, what do you do when several IDs or records are duplicated? Let's separate the records with unique IDs from the duplicates using an IF statement.
RUN;
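DATA dupes nodupes;
    SET ids2;
    * ids2 must already be sorted by employee_id for BY-group processing;
    BY employee_id;
    * an ID that is not both first and last in its group occurs more than once;
    IF NOT (FIRST.employee_id AND LAST.employee_id) THEN OUTPUT dupes;
    ELSE OUTPUT nodupes;
RUN;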
Above we are using the keywords FIRST. and LAST. These keywords identify the first and last record in each group of the variable named in the BY statement. When an employee ID is unique, the first and last record will be the same row. Thus our code outputs employee IDs where the first and last records are not the same to a dataset called "dupes", and all the other unique records are put in a dataset called "nodupes".
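The same FIRST./LAST. logic can be tried without the seminar files on a small made-up dataset. Everything in this sketch (the datasets toy_ids, toy_sorted, toy_dupes and toy_nodupes, and their values) is hypothetical; only the technique mirrors the step above.

```sas
* Hypothetical toy data: ID 102 appears twice;
DATA toy_ids;
    INPUT employee_id name $;
    DATALINES;
101 Ann
102 Bob
103 Cy
102 Bobby
;
RUN;

* BY-group processing requires data sorted by the BY variable;
PROC SORT DATA=toy_ids OUT=toy_sorted;
    BY employee_id;
RUN;

DATA toy_dupes toy_nodupes;
    SET toy_sorted;
    BY employee_id;
    * A unique ID is both the first and the last row of its BY group;
    IF FIRST.employee_id AND LAST.employee_id THEN OUTPUT toy_nodupes;
    ELSE OUTPUT toy_dupes;
RUN;
```

Here toy_dupes receives both rows for ID 102, while toy_nodupes receives the rows for IDs 101 and 103.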
"Dupes"orduplicatedemployeeIDnumbers:
"NoDupes"oruniqueemployeeIDnumbers.
6.2 Identifying Outliers
Another issue that comes up a lot in dealing with data is outliers. The simplest way in SAS to identify outliers is to use the UNIVARIATE procedure. By default, the UNIVARIATE procedure outputs the 5 highest and 5 lowest extreme observations. Let's examine outliers for product prices in the "price_new" dataset.
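A minimal sketch of this step is shown here. It assumes the data are stored as idre.price_new, and the price variable name unit_price is hypothetical; replace it with the actual price variable in your data.

```sas
* The Extreme Observations table in the output lists the 5 lowest and 5 highest values;
PROC UNIVARIATE DATA=idre.price_new;
    VAR unit_price;
RUN;
```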
You can override this default by specifying the option NEXTROBS= on the procedure line and indicating the number of outliers to display. You can specify any number between 0 and half of the total observations. You will also notice that along with the extreme values, SAS also provides an observation or row number that corresponds to this value. Additionally, you can also use the ID statement to identify observations. This statement specifies one or more variables to be included in the outliers table. Let's try adding the product identifier Product_ID to each of our extreme values.
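Both options can be combined as in the following sketch. The dataset idre.price_new and the variable unit_price are assumptions, the value 10 is an arbitrary choice, and the ODS SELECT line is an optional addition that limits the output to the extreme observations table.

```sas
* Show only the extreme observations table from the procedure output;
ODS SELECT ExtremeObs;

* NEXTROBS=10 requests the 10 lowest and 10 highest values;
PROC UNIVARIATE DATA=idre.price_new NEXTROBS=10;
    VAR unit_price;
    * ID adds the product identifier next to each extreme value;
    ID product_id;
RUN;
```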
Now each extreme value is associated with its ID number.
7.0 Wrapping things up
As we stated in the beginning, SAS is a very flexible program with great features for data management.
This seminar only scratches the surface in describing all of the programming options available to users.
For more information on the topics discussed here please explore our website.
Additionally, SAS has a host of courses designed to improve your programming skills aimed at users of all levels.
The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.
2016 UC Regents