SAS Programming Basics

The document discusses an introductory SAS programming basics seminar. It covers navigating the SAS interface, creating and modifying datasets using data steps and proc steps, and manipulating data using operators and functions. Common SAS options and syntax errors are also examined.

Uploaded by

Junaid Faruqui
Copyright
© All Rights Reserved

SAS Programming Basics
SAS is a powerful and flexible statistical package that runs on many platforms, including Windows and Unix. This class is designed for anyone interested in
learning how to write basic SAS programs. Some familiarity with SAS is recommended. If you are new to SAS you may want to review our Introduction to
SAS Seminar. It is expected that those attending this course have the ability to navigate to and access data files on their own operating system. The
students in the class will have hands-on experience using SAS for data manipulation, including use of arithmetic operators, conditional processing, SAS
built-in functions, merging, appending, formatting, and different options for modifying SAS output. It is our hope that after this seminar you will be able to:

Comfortably navigate the SAS window environment
Subset and create new datasets
Create new variables
Write and debug basic SAS programs
Use SAS functions for basic data management tasks
Merge and append data
Modify SAS output for presentation

Please note that since we are using data files provided by SAS, we are unable to make these available on our website. Thus, this seminar page includes
output from the SAS procedures used in the seminar.
For clarity, all SAS keywords will be in CAPITAL letters in order to distinguish them from the information that you as the user will provide.
Note: This seminar was developed in SAS 9.4.

1.0 SAS Refresher
1.1 Libname
We will start by setting our libname, which points SAS to the directory where our SAS data files are stored.

*assign libname;
LIBNAME idre 'C:\';
SAS also allows you to clear a particular libname or use the _ALL_ keyword to clear all assigned libnames.

*clear libname;
LIBNAME idre CLEAR;
LIBNAME _ALL_ CLEAR;
* reassign library;
LIBNAME idre 'C:\';
1.1 SAS Windowing environment
Let's briefly review the SAS windowing environment. The five main windows in SAS are the Explorer, Results, Program Editor, Log, and Output/Results
Viewer windows. In general, when you start SAS, the windows that initially appear are the Log, Editor and Explorer windows. Other windows can be found
under the View menu in the toolbar.
The SAS Explorer window allows you to manage files associated with your current SAS session, including viewing, deleting, moving, and copying files. The
Editor window, which is literally just a text editor, permits you to enter, edit, submit and save SAS programs. The Log window allows the user to view
information about their current session, including messages about submitted SAS programs such as successful execution, errors or warnings. The Results
window enables you to view a list of results from executed SAS programs. The Results Viewer allows you to view HTML results of executed SAS
procedures. In SAS 9.4, the default output format is HTML.
1.2 Creating new SAS datasets
As we will be using several different datasets in the seminar today, let's also cover how to create new permanent and temporary datasets from the data files
you have been provided.

*permanent dataset;
DATA idre.new;
SET idre.charities;
RUN;
*temporary dataset;
DATA new;
SET idre.charities;
RUN;

1.3 SAS Options
SAS includes a large suite of system options that will affect your SAS session. Specific options are invoked by default when you open SAS. The options can
vary depending on what computing environment you are using (e.g. Windows, Unix). The OPTIONS procedure lists the current settings of SAS system options
in the SAS log.

PROC OPTIONS;
RUN;
SAS includes two types of options: portable and host. Portable (or non-host) options are the same regardless of the operating system. Host options are
different depending on which operating system you are using.
Below are some examples of common options and what they are responsible for doing.
The AUTOCORRECT option is turned on by default and allows SAS to correct syntax with small mistakes like a misspelled keyword. In the first example
below, the DATA keyword is misspelled as DATE. When the option is invoked, you will see that in the Log (shown below), SAS issues a warning that it assumed
the keyword was misspelled and continues executing the procedure. However, in the second example, where the automatic correction option is turned
off, SAS issues an error and stops executing the procedure.

*autocorrect option;
OPTIONS AUTOCORRECT; /*default*/
PROC FREQ DATE=idre.charities;
TABLE code;
RUN;
OPTIONS NOAUTOCORRECT;
PROC FREQ DATE=idre.charities;
TABLE code;
RUN;

The FMTERR option controls how SAS responds when a format used for a variable cannot be found. The default is for SAS to issue an error
and stop processing the executed procedure. In the first example, the default option is invoked and, as you can see below, SAS issues an error stating that the
format used could not be found. However, in the second example, where we tell SAS not to issue an error (NOFMTERR), SAS ignores the incorrectly used
format and will then execute the command without the format.

*format error;
OPTIONS FMTERR;/*default*/
PROC PRINT DATA=idre.charities;
FORMAT code $code.;
RUN;
OPTIONS NOFMTERR;
PROC PRINT DATA=idre.charities;
FORMAT code $code.;
RUN;
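
Since PROC OPTIONS on its own prints every option, it can help to check just one setting at a time. Below is a minimal sketch of two ways to do that, followed by a statement that puts the two options changed above back to their defaults. It only uses the FMTERR and AUTOCORRECT options already discussed.

```sas
/* Show the current setting of a single system option in the log */
PROC OPTIONS OPTION=FMTERR;
RUN;

/* The same check using the GETOPTION function from macro code */
%PUT FMTERR setting: %SYSFUNC(GETOPTION(FMTERR));

/* Restore the defaults that were changed in the examples above */
OPTIONS AUTOCORRECT FMTERR;
```

Restoring the defaults at the end of a program keeps later steps from silently running with NOAUTOCORRECT or NOFMTERR still in effect.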


2.0 Diagnosing and Correcting Syntax Errors
A main issue with learning a new programming language is learning to identify and address coding errors. There are several ways that SAS will notify you
of syntax errors.
2.1 Color Coded Syntax
When writing code in the SAS Enhanced Editor you will notice some color coding. Color coding program components will help you more easily
diagnose syntax errors, and when you first start with SAS you will make many mistakes. Take a look at the example syntax below copied from the Enhanced
Editor window. Here you will see 5 different colors automatically generated by SAS. For example, you will see that keywords like DATA, CLASS, MODEL are
all highlighted in blue. If you use the wrong keyword with a procedure, the keyword will often remain black like the variable names because SAS does not
recognize it. Options like SOLUTION are also considered keywords. As we will discuss later, the way to indicate a format is to put a period at the end and,
once you do this, it will turn green. Anything in quotation marks turns red. In the second set of code, you will see that we are missing an end quote, thus all of
the syntax is red. Thus we would know to correct the missing double quote.

2.2 Log File
The log file will also let you know when you have syntax errors. Below is an example using PROC MEANS:

In the syntax shown, we are attempting to run the MEANS procedure with a couple of options. We have added "average" and "min" options to our statement to
indicate that we only want the average and the minimum values for salary. As we described in the previous section, options should be colored in blue and
in this example "average" remains black, indicating that SAS is not recognizing it as a keyword.
Below we see what happens when we attempt to execute the syntax as written. An error appears in the log file indicating that the keyword "average" was not
a recognized option. Additionally, in this instance, SAS provides a list of alternate options you may have wanted. If you look carefully, you will see that "mean"
is one of them. If we replace the unrecognized keyword "average" with "mean" the procedure will execute as expected.
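
The screenshots of this syntax are not reproduced on this page, so here is a rough reconstruction of the idea rather than the original code. Because the seminar data cannot be shared, the sketch uses sashelp.class and its Height variable as a stand-in for the sales data and salary; that substitution is an assumption made only so the code can be run.

```sas
/* Fails: AVERAGE is not a statistic keyword that PROC MEANS recognizes */
PROC MEANS DATA=sashelp.class AVERAGE MIN;
  VAR Height;
RUN;

/* Runs: MEAN is the recognized keyword for the average */
PROC MEANS DATA=sashelp.class MEAN MIN;
  VAR Height;
RUN;
```

Submitting the first step and reading the log before the second is a quick way to see the error message and the list of accepted keywords for yourself.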

3.0 Data Step vs. Proc Step
SAS programs are composed of two distinct kinds of steps: data steps and proc steps. Data steps are written by you, while procedures are prewritten programs that
are built in. In general, Data steps are used to read, modify and create data files and always begin with a "DATA" statement. You saw an example of a data
step in section 1.2. From a statistical standpoint, a Proc step is typically used to analyze a dataset in SAS without making changes to the data. There are
exceptions to this. Proc steps always start with the familiar "PROC" statement. You have seen several examples of Proc steps in the preceding sections
including PROC PRINT, PROC MEANS, and PROC FREQ. Each procedure enables us to analyze and process the data in a specific way.
In the following sections we will demonstrate how to use these two types of steps.

4.0 Manipulating Datasets
4.1 Operators
An operator in SAS is a symbol representing a comparison, logical operation or mathematical function.
4.1.1 Comparison Operator
These are operators that compare a variable with some specified value or another variable. They are typically represented as symbols such as =, <, > but
also have mnemonic equivalents like EQ, LT, or GT, respectively. The operators can be used within a data or proc step depending on your needs.
One of the simplest ways to use a comparison operator is in a WHERE statement. In the "sales" data file, we have information on sales associates from
Australia (AU) and the United States (US). If we only wanted to output records for Australian sales associates we could use the = or EQ operator. Since the
variable Country contains character information, not numeric, we need to put single quotes around 'AU'.

PROC PRINT DATA=idre.sales;


WHERE Country='AU';
RUN;
The IN operator can be used if you are trying to specify a list or range of values, as demonstrated below.

PROC PRINT DATA=idre.sales;


WHERE Country IN ('AU', 'US');
RUN;
We can also tell SAS to output only certain ranges of values for numeric variables. In the first example below we ask SAS to output salary values that
are less than (<) $30,000. In the second example, we output salary values greater than or equal to (GE) $30,000.

PROC PRINT DATA=idre.sales;


WHERE Salary<30000;
RUN;
PROC PRINT DATA=idre.sales;
WHERE Salary ge 30000;
RUN;
One limitation of using a WHERE statement is that more than one cannot be used simultaneously, except in special cases. If you attempt to submit the
following syntax, SAS will issue a note in the Log stating "WHERE clause has been replaced." It will then execute the following syntax omitting the first
WHERE statement. However, in the next section we will demonstrate how to combine comparison operators with logical operators to achieve the desired
output.

PROC PRINT DATA=idre.sales;


WHERE Country='AU';
WHERE Salary<30000;
RUN;
For more examples, check out the SAS 9.4 Help and Documentation page on comparison operators.
4.1.2 Logical Operators
The logical or Boolean operators include AND, OR, and NOT. They are often used to link a series of comparisons. Just like the comparison operators, these
can be written as either symbols or mnemonics. Below is a table showing each symbol and its mnemonic alternative.

Symbol      Mnemonic
^ or ~      NOT
&           AND
|           OR

In the previous section we learned that we cannot use two WHERE statements, but we can use the AND operator to combine the information contained in
those two statements to achieve the desired result.
Below we use AND to output observations representing Australian sales associates that make less than $30,000 a year. Both the symbol and mnemonic are
used and they give the same result.

PROC PRINT DATA=idre.sales;


WHERE Country='AU' AND Salary<30000;
RUN;
PROC PRINT DATA=idre.sales;
WHERE Country='AU' & Salary<30000;
RUN;

As with comparison operators, you can also combine AND, OR, and NOT with the IN operator. In the example below the variable job_title includes seven
different job types. We want to obtain frequencies of all of them except for two, Sales Manager and Sales Rep. IV. So we use NOT combined with IN since
we have more than one value we are trying to exclude.

PROC FREQ DATA=idre.sales;


TABLES Job_Title;
WHERE Job_Title NOT IN ('Sales Manager','Sales Rep. IV');
RUN;
For more examples, check out the SAS 9.4 Help and Documentation page on logical operators.
4.1.3 Where Operators
We have just covered several examples using the comparison and logical operators. However, SAS does include a set of special operators that can be used
only in WHERE expressions. Some of these have similar functions to comparison operators.

Operator                 Description                                                                    Char or Num
Between-And              Allows for an inclusive range                                                  Both
Contains                 Includes a character string or substring                                       Character only
Is Null or Is Missing    Identifies missing values                                                      Both
Like                     Matches a pattern                                                              Character only
=*                       Sounds like                                                                    Character only
Same And or Also         Augments an existing WHERE clause without having to retype the original one    Both

For example, here are three ways of specifying that we want SAS to output all sales associate records with salaries that range from $28,000 to $30,000. As
in any good programming language, there are always multiple ways of doing the same thing.

* We can use only comparison operators;


PROC PRINT DATA=idre.sales;
WHERE 28000<=Salary<=30000;
RUN;
*We can use a mix of comparison and logical operators;
PROC PRINT DATA=idre.sales;
WHERE Salary>=28000 & Salary<=30000;
RUN;
*We can use only the special WHERE operators;
PROC PRINT DATA=idre.sales;
WHERE Salary BETWEEN 28000 AND 30000;
RUN;
Earlier, we discussed that, in general, SAS does not allow you to use more than one WHERE statement in the same data or proc step. The exceptions to this
are the special operators "same and" and "also". These will allow you to update or augment an existing WHERE statement to add additional conditions. In
the example below the first condition subsets the data to Australian sales associates that make less than $26,000, and then we add the additional clause that
they must also be female.

*Using Same and;


PROC PRINT DATA=idre.sales;
WHERE Country='AU' and Salary<26000;
WHERE SAME AND Gender='F';
VAR First_Name Last_Name Gender Salary Country;
RUN;
*Using Also;
PROC PRINT DATA=idre.sales;
WHERE Country='AU' & Salary<26000;
WHERE ALSO Gender='F';
VAR First_Name Last_Name Gender Salary Country;
RUN;
Now while some of these special operators are fairly self-explanatory, like "Is Null", some may be less so, such as "=*" and "Like". These operators can be
helpful for identifying issues such as misspelled information, incorrectly entered information, or identifying related names or titles that vary. For example,
below is a dataset called "shoes_eclipse" that includes several different product names:

Let's suppose we are interested in identifying product names that include the word "Woman's". How could we do that? The "Like" operator could help us do
this. It works by comparing character values to some given pattern. It uses two special characters, a percent (%) sign and an underscore (_). The percent
denotes that any number of characters may occupy a position. However, the underscore specifies that exactly one character may occupy a position. If we are
only interested in products that start with "Woman's", then we don't care how many characters come after "Woman's":

PROC PRINT DATA=idre.shoes_eclipse;


VAR product_name;
WHERE product_name LIKE "Woman's %";
RUN;

We could also ask SAS to output any name that includes "Men's" anywhere in the title. This would require multiple % signs because any product name with
"Men's" may have characters before and after it.

PROC PRINT DATA=idre.shoes_eclipse;


VAR product_name;
WHERE product_name LIKE "% Men's %";
RUN;

For more information, check out SAS Help and Documentation on special WHERE operators.
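
The two LIKE examples above only use the percent sign, so here is a small sketch of the underscore as well. The product names are made up for illustration, since the "shoes_eclipse" data cannot be shared, and the dataset name work.like_demo is likewise hypothetical.

```sas
/* Hypothetical product names, invented for illustration only */
DATA work.like_demo;
  INPUT product_name $40.;
  DATALINES;
Woman's Air Amend Mid
Womans Air Converge
Big Guy Men's Air Deschutz
Children's Mitten
;
RUN;

/* The underscore stands for exactly one character, so this pattern */
/* matches the first row but not the second, which has no character */
/* between the n and the s                                          */
PROC PRINT DATA=work.like_demo;
  WHERE product_name LIKE 'Woman_s%';
RUN;
```

Replacing the underscore with a percent sign would match both of the first two rows, because the percent sign also accepts zero characters.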
4.1.4 Arithmetic operators
Arithmetic operators, as you can probably tell from the name, allow you to perform arithmetic calculations in SAS. Below is a table of the operators and their
symbols used in SAS.

Symbol      Description
**          Exponentiation
*           Multiplication
/           Division
+           Addition
-           Subtraction

A few things to note about using these operators. First, if you are calculating values using a variable(s) with missing data, the resulting value will also be
missing. Second, expressions are evaluated with respect to the traditional order of operations, with exponentiation taking the highest priority level, then
multiplication/division and last addition/subtraction. This ordering can be modified by using parentheses. Third, as is the case with the other operators we
have discussed, arithmetic operators can be used in conjunction with both logical and comparison operators.
Let's try a few examples. Below we will use a Data step to create a new temporary dataset called "sales_subset" from the "sales" data. This dataset will
contain only observations from Australian employees whose job title contains the word "Rep". So we are using a logical and a special WHERE operator.
Additionally, we are creating a new variable called "Bonus" which is calculated by multiplying "Salary" by .10.

DATA sales_subset;
SET idre.sales;
WHERE Country='AU' & Job_Title contains 'Rep';
Bonus=Salary*.10;
RUN;
Below we output the first 20 records of our new dataset.

In this second example let's use parentheses to change the evaluation of a compound (more than one operator) expression. We will use a Data step to
create two new variables, profit1 and profit2.

DATA profit;
SET idre.order_fact;
profit1 = total_retail_price - costPrice_per_unit * quantity;
profit2 = (total_retail_price - costPrice_per_unit) * quantity;
RUN;
Let's see how the use of parentheses has changed our values.
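
To make the difference concrete, here is a small worked sketch with one invented order. The values (a retail price of 100, a unit cost of 30 and a quantity of 2) are assumptions chosen only so the arithmetic is easy to check by hand; they do not come from the "order_fact" data.

```sas
/* Hypothetical order values, chosen only to make the arithmetic easy to check */
DATA work.profit_demo;
  total_retail_price = 100;
  costPrice_per_unit = 30;
  quantity = 2;
  /* Multiplication is done first: 100 - (30 * 2) = 40 */
  profit1 = total_retail_price - costPrice_per_unit * quantity;
  /* Parentheses are done first: (100 - 30) * 2 = 140 */
  profit2 = (total_retail_price - costPrice_per_unit) * quantity;
RUN;

PROC PRINT DATA=work.profit_demo;
RUN;
```

Which of the two is the "right" profit depends on whether total_retail_price already covers the whole quantity, so it is worth checking how a price variable is defined before choosing.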

For more examples, check out the SAS 9.4 Help and Documentation page on arithmetic operators.
4.2 Conditional Processing
4.2.1 WHERE and IF statements
Conditional processing in SAS allows the user to manipulate and output portions of data instead of the whole file. In previous sections you have seen several
examples of the WHERE statement. Alternatively, SAS also allows for the use of IF statements. Both can accomplish similar tasks; however, while both
WHERE and IF can be used within a Data step, only WHERE is allowed in a Proc step. For example, if we add an IF statement to the PROC MEANS
command from earlier we will see that the IF turns red. This indicates that the syntax is incorrect.

However, if you use WHERE the statement is blue.

If you attempt to execute the PROC MEANS using the incorrect IF statement SAS will produce an error, but SAS will execute the command using the
WHERE statement.

Data steps will accept both WHERE and IF statements; however, only an IF can be paired with THEN to carry out another statement for the selected observations. Below is an
example. Here we are taking observations with values for salary that are greater than $30,000 and assigning them, using THEN
OUTPUT, to a new dataset called "highsales".

DATA highsales ;
SET idre.sales;
IF salary GT 30000 THEN OUTPUT highsales;
RUN;
You can also combine WHERE and IF in the same Data step as demonstrated below. We use an IF and a WHERE statement to subset the data. Can you think
of equivalent ways of subsetting the data?

DATA emps;
SET idre.sales;
WHERE Country='AU';
Bonus=Salary*.10;
IF Bonus>=3000;
RUN;
Moreover, you will notice that SAS allows you to create a variable and use it in an IF statement in the same Data step. This is something you can accomplish
with an IF but not a WHERE. The reason for this has to do with when SAS evaluates each condition. A WHERE condition is applied as observations are read
from the input dataset, so SAS only brings in the observations that meet the condition and then continues executing the other statements in the Data step. This makes for more efficient
processing, especially with large amounts of data. But in this instance, if we had used a WHERE statement to subset the data using "Bonus", SAS
would have given us an error saying the "Bonus" variable is not in the dataset. An IF condition, however, is evaluated at the point where it appears in the Data step, after the
observation has been read and any preceding statements have run. Thus, SAS will apply the WHERE statement, create "Bonus", and then assess whether the IF condition is true.
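
As one possible answer to the question of equivalent ways of subsetting, the sketch below shows two variants that should select the same rows as the "emps" step. The small work.sales_demo dataset and its values are hypothetical stand-ins for the "sales" data, and the names emps_v1 and emps_v2 are made up for this illustration.

```sas
/* Hypothetical stand-in for the sales data (values invented) */
DATA work.sales_demo;
  INPUT Country $ Salary;
  DATALINES;
AU 26000
AU 32000
US 45000
AU 35000
;
RUN;

/* Variant 1: WHERE= on the input dataset, then IF ... THEN DELETE */
DATA work.emps_v1;
  SET work.sales_demo (WHERE=(Country='AU'));
  Bonus = Salary*.10;
  IF Bonus < 3000 THEN DELETE;
RUN;

/* Variant 2: WHERE= on the output dataset, where Bonus already exists */
DATA work.emps_v2 (WHERE=(Bonus >= 3000));
  SET work.sales_demo;
  WHERE Country='AU';
  Bonus = Salary*.10;
RUN;
```

Both variants keep the two Australian rows with salaries of 32000 and 35000. The second one works because a WHERE= option on the output dataset is applied when observations are written, by which time "Bonus" has been created.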
4.2.2 If-Then statement
An If-Then statement is a commonly used conditional statement that is typically carried out within the context of a Data step. It executes a SAS statement for
observations that fulfill a certain condition.
We will once again create a variable called "Bonus", but assign the values based on a certain set of conditions that are defined by an employee's job title.

DATA comp1;
SET idre.sales;
IF Job_Title='Sales Rep. IV' THEN Bonus=1000;
IF Job_Title='Sales Manager' THEN Bonus=1500;
IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
RUN;

You will see in the output above that several observations have missing values. This is due to the fact that we did not assign values for "Bonus" for all of the
job titles.
A related statement to IF-THEN is the ELSE statement, which can be used when creating conditional statements around mutually exclusive groups.

DATA comp2;
SET idre.sales;
IF Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE IF Job_Title='Sales Manager' THEN Bonus=1500;
ELSE IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
ELSE IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
RUN;
SAS processes the first IF statement and if it is not true it moves to the next, and so on. SAS continues to test the IF-THEN statements until it finds one that is
true, at which point it stops and will not test the remaining conditions. Once again, this can speed up the processing of large datasets. However, as was
the case with the first IF-THEN example, we will end up with a lot of missing values using this syntax.
What if we had a scenario where we wanted to give all the remaining categories, which did not fulfill the prescribed conditions, one bonus value? We can do
that using a final ELSE statement with no IF-THEN. In the SAS code below, we add an additional ELSE statement assigning all of the remaining job titles a bonus value
of 500.

DATA comp3;
SET idre.sales;
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE IF Job_Title='Sales Manager' THEN Bonus=1500;
ELSE IF Job_Title='Senior Sales Manager' THEN Bonus=2000;
ELSE IF Job_Title='Chief Sales Officer' THEN Bonus=2500;
ELSE Bonus=500;
RUN;

Now, we have complete data for all observations.
A second related statement to IF-THEN is the DELETE statement. In all of the previous examples we have used the IF-THEN statement to add information
but you can also use IF-THEN to delete as well. Using the IF-THEN DELETE syntax we can specify that certain observations fitting our condition be
left out of the dataset being created (the original data are not changed). In the example below, we delete all observations associated with three specific job titles.

DATA drop;
SET idre.sales;
IF Job_Title IN('Sales Manager', 'Senior Sales Manager', 'Chief Sales Officer') THEN DELETE;
RUN;
4.2.3 Using Do
Typically with an IF-THEN statement only one executable statement is allowed. When an expression is true the associated statement is executed. But what
happens if you want more than one statement executed for each expression? For example, let's imagine that for each bonus value, we also want to create a
variable called freq that denotes how many times a year the sales associate can receive the bonus (e.g. once a year, twice a year). So we might try the
following code using a logical operator.

DATA freq1;
SET idre.sales;
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000 & Freq = "once a year";
ELSE Bonus=500 & Freq = "twice a year";
RUN;
While this syntax appears reasonable, SAS will execute the statement and then issue a note in the log that "Variable Freq is uninitialized". When SAS is
unable to locate a variable in a DATA step, SAS prints this message. If you look in the freq1 SAS dataset you will see that SAS created the variable but set
all of its values to missing, which is undesirable. It appears that creating "Freq" will require a separate statement instead of just a simple "&". You could try
this:

DATA freq2;
SET idre.sales;
*set the length so that the longer value is not truncated;
LENGTH Freq $ 12;
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Bonus=1000;
ELSE Bonus=500;
IF Job_Title='Sales Rep. III' or Job_Title='Sales Rep. IV' THEN Freq = "once a year";
ELSE Freq = "twice a year";
RUN;
But this code could get fairly long if you have a lot of variables to create. A better way to do this would be through the use of a DO group, which allows for
multiple statements.

DATA bonus;
SET idre.sales;
*set the length so that the longer value is not truncated;
LENGTH Freq $ 12;
IF Country='US' THEN DO;
Bonus=500;
Freq='Once a Year';
END;
ELSE DO;
Bonus=300;
Freq='Twice a Year';
END;
RUN;
While the syntax looks similar to a traditional IF-THEN, there are some important differences. First, the IF expression now ends with THEN DO. This is
followed by a set of statements to be executed. Second, each DO block ends with an END statement. Third, instead of just ELSE we now have ELSE DO,
which also has an END statement. If you are missing an END, SAS will issue an error in the log and fail to execute the Data step.
4.3 SAS Functions
Functions accept arguments and then produce a particular value (numeric or character) based on those arguments. Arguments are enclosed within
parentheses and each argument is separated by a comma. SAS has a wide array of different functions depending on the needs of the user, and they can be used
in a Data step. We will cover a few examples of basic mathematical functions, common date functions, and some additional functions useful for specific data
management tasks.
4.3.1 Arithmetic Functions
In the first example, we will use the "Oldbudget" data file to calculate the total and average amount budgeted for business operations over a five year period.

DATA budget;
SET idre.oldbudget;
sum1 = yr2003 + yr2004 + yr2005 + yr2006 + yr2007;
sum2 = SUM(yr2003, yr2004, yr2005, yr2006, yr2007);
sum3 = SUM( of yr2003-yr2007);
mean1 = (yr2003 + yr2004 + yr2005 + yr2006 + yr2007)/5;
mean2 = MEAN(yr2003, yr2004, yr2005, yr2006, yr2007);
mean3 = MEAN( of yr2003-yr2007);
RUN;
There are many different ways of creating the sum and mean variables that we need. We create "sum1" using an arithmetic operator to add the 5 budget
amounts together. Alternatively, we can use the SUM() function; the arguments are the variables you wish to sum together. The difference between using the
function versus manually adding together each variable is the treatment of missing values. When we add items using "+", a case with a missing value on any of the
variables listed will have a missing value for the resulting variable. If we use the SUM() function, any missing values will be ignored (treated as though they were zero),
and the new variable will be equal to missing only if all of the variables listed are missing. Which method is most appropriate depends on the situation and
what you are trying to achieve. Last, if you have a lot of variables to be summed you can specify a SAS variable list. This syntax works because the variable
names share a common prefix and end in consecutive numbers (a numbered range list). Check the SAS documentation page on SAS variable lists on how to use this shortcut in other circumstances. We
also use similar syntax to demonstrate how to calculate the average or mean of the budget variables.

All the values produced for "sum1-sum3" and "mean1-mean3" are the same since we do not have any missing data. SAS has a number of additional
mathematical functions, including absolute value, maximum, minimum and square root, that can be used in a similar manner.
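
Because the budget data have no missing values, the difference between "+" and the functions is not visible in the output. The sketch below uses a tiny made-up dataset (work.missing_demo, with invented values) in which the second row has one missing year.

```sas
/* Hypothetical budget values; the second row has a missing yr2004 */
DATA work.missing_demo;
  INPUT yr2003 yr2004 yr2005;
  total_plus = yr2003 + yr2004 + yr2005;   /* row 2: missing        */
  total_sum  = SUM(OF yr2003-yr2005);      /* row 2: 10 + 30 = 40   */
  avg_mean   = MEAN(OF yr2003-yr2005);     /* row 2: 40 / 2 = 20    */
  DATALINES;
10 20 30
10 . 30
;
RUN;

PROC PRINT DATA=work.missing_demo;
RUN;
```

Note that MEAN divides by the number of non-missing values (2 in the second row), which is not the same as dividing the SUM result by 3. SAS also writes a note to the log that missing values were generated for the "+" expression.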
4.3.2 Date Functions
One of the more challenging data types to deal with in any data analysis package is date values. Thankfully, SAS has some built-in functions that can assist
users with managing this data type. SAS stores date information as numeric values representing the number of days before or after Jan 1, 1960. SAS can also
recognize 2 or 4 digit year values. We will use the "Sales" dataset, which includes information on date of birth and hiring date for each employee, to
demonstrate some date functions.

DATA comp;
SET idre.sales;
Hire_Month=MONTH(Hire_Date);
Birth_Day = WEEKDAY(Birth_date);
Day_Dif = DATDIF(Birth_date,Hire_Date, 'actual');
Month_dif= INTCK('years',Birth_date,Hire_Date);
Bonus_1 = INTNX('month', Hire_Date, 6);
RUN;
The MONTH function pulls the month from "Hire_date" and puts it in a variable called "Hire_month". The WEEKDAY function figures out what day of the
week (1-7) the date would have fallen on and outputs this.
DATDIF calculates the difference in days between two dates given in the first two arguments. The third argument specifies the method for calculating the
days. In this example we specify we want the 'actual' number of days, but we could choose other methods of calculation, such as assuming that each month has
30 days and that a year always has 360 days.
INTCK counts the number of intervals between two dates. In our example we asked SAS to output the number of years between an employee's date of birth
and when they were hired, which would be roughly equivalent to an employee's age at the time of hire. By default INTCK counts the year boundaries crossed, so the result can be one higher than the age in completed years.

INTNX is used to calculate the variable bonus_1. Here we want to calculate when an employee will be eligible for their next bonus. The
arguments for this function are the unit of time, the variable representing the start date/time and the number of increments. In our example, employees are
eligible 6 months after their hire date. By default INTNX returns the first day of the resulting month.
Below is the output of the first 10 observations of the "comp" dataset, with and without date formats. As mentioned before, SAS stores date information as
numeric information in days. Thus if you do not format a date with a format statement (discussed further in the next section), it will display as just a number.

PROC PRINT DATA=comp (OBS=10);


VAR Employee_ID Hire_date Hire_Month Birth_date Birth_Day Day_dif Month_dif Bonus_1;
*FORMAT Hire_date Birth_date Bonus_1 mmddyy10.;
RUN;

More examples of SAS date functions can be found on the SAS Help and Documentation website.
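
To see how the boundary counting in INTCK and the alignment in INTNX play out, here is a small sketch that uses two made-up dates instead of the "Sales" data. The dates, the variable names and the dataset name work.date_demo are all hypothetical; the optional fourth arguments ('continuous' for INTCK and 'same' for INTNX) are standard options of those functions in SAS 9.4.

```sas
/* Hypothetical dates, written as SAS date literals */
DATA work.date_demo;
  hire_date  = '15MAR2010'D;
  birth_date = '20JUN1985'D;
  hire_month = MONTH(hire_date);                                   /* 3  */
  /* Default method counts the 1 January boundaries crossed */
  years_cnt  = INTCK('year', birth_date, hire_date);               /* 25 */
  /* Continuous method counts completed years from birth_date */
  age_hire   = INTCK('year', birth_date, hire_date, 'continuous'); /* 24 */
  /* Default alignment returns the first day of the target month */
  bonus_beg  = INTNX('month', hire_date, 6);            /* 01SEP2010 */
  /* SAME alignment keeps the day of the month */
  bonus_same = INTNX('month', hire_date, 6, 'same');    /* 15SEP2010 */
  FORMAT hire_date birth_date bonus_beg bonus_same DATE9.;
RUN;

PROC PRINT DATA=work.date_demo;
RUN;
```

If the goal is an age in completed years or an eligibility date exactly six months after hire, the variants with the fourth argument are probably closer to what is wanted.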
4.3.3 Other Functions
SAS includes several other types of functions designed for specific types of needs. Many of these functions are helpful for data management of character or
string information. For example, LENGTH tells the user the length of a character string while COMPRESS will compress string values and remove unwanted
blanks and specific character values like dashes. Additionally, in a similar way to extracting date information with the MONTH function, SAS has several
functions, including SCAN and SUBSTR, that allow you to extract words from a phrase.
Let's demonstrate these. Below is a sample of data from a dataset called "Shoes_eclipse" where all the variables have character information. Our task of
interest is to obtain the length of product_name, compress product_name to remove the blanks, and create a variable that extracts the brand name
"Eclipse" from product_group.

DATA shoes;
SET idre.shoes_eclipse;
length_name = LENGTH(product_name);
comp_product = COMPRESS(product_name);
brand = SUBSTR(product_group, 1, 7);

brand2 = SCAN(product_group, 1, " ");


RUN;

You will notice a few things about the output above. First, for the variable length_name, if you counted the number of letters and spaces in product_name
you would end up with the same values displayed above. Second, the compressed version of product_name now includes no spaces. Third, both the SCAN
and SUBSTR functions produced the same output. The SUBSTR function takes 3 arguments: the name of the variable with the information you want to extract,
the character position you want to start from, and the number of characters to extract. In our example we are telling SAS that we want to extract a
character string of length 7 starting at the first character position of "product_group", which would be the "E" in Eclipse. Unfortunately, this means whatever
value we are extracting must always be of the same length. What if we have product names of different lengths? Then you might want to use the SCAN
function, which works similarly to SUBSTR except that the second argument is the number of the word to extract and the last argument is the delimiter that separates words. The syntax above
indicates that we want the first word, which runs from the start of the string until a blank/space is encountered. This function works with many types
of delimiters including <(+&!$*)^/,%.
In the previous examples, we were extracting values from a string, but what if we wanted to combine string variables? A useful function would be CATX.
Below we want SAS to combine the character string information in first_name and last_name into one full name variable. The first argument of the function
specifies the delimiter of choice, which we store here in a variable. In the first example, the delimiter is just a blank while in the second
example the delimiter is a comma.

DATA salesquiz;
SET idre.salesquiz;
sep = " ";
fullname = CATX(sep, first_name, last_name);
sep1 = ",";
fullname1 = CATX(sep1, last_name, first_name);
RUN;

The new variables are displayed above.
A list of all SAS functions, by category, can be found here on the SAS website.
Note: The order in which the variables are specified in the CATX function governs the order in which they will be combined.
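
Since the output tables are not reproduced here, the sketch below runs the same functions on a couple of literal values so the results can be checked by eye. The values, variable names and the dataset name work.string_demo are invented for illustration.

```sas
/* Hypothetical values, invented for illustration only */
DATA work.string_demo;
  LENGTH product_group $30 first_name last_name $10;
  product_group = 'Eclipse Shoes';
  first_name = 'Ana';
  last_name  = 'Lee';
  name_len  = LENGTH(product_group);             /* 13           */
  no_blanks = COMPRESS(product_group);           /* EclipseShoes */
  brand_sub = SUBSTR(product_group, 1, 7);       /* Eclipse      */
  brand_scn = SCAN(product_group, 1, ' ');       /* Eclipse      */
  last_word = SCAN(product_group, -1, ' ');      /* Shoes        */
  full_name = CATX(',', last_name, first_name);  /* Lee,Ana      */
RUN;

PROC PRINT DATA=work.string_demo;
RUN;
```

The delimiter is passed to CATX here as a literal rather than through a separate variable, and a negative word number tells SCAN to count words from the end of the string.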
4.4 Sorting, Merging and Appending
4.4.1 Sorting
There are many instances when having your data sorted in a particular way will be helpful for visualizing your data. Additionally, certain types of data
management needs, like merging datasets or grouping observations by a particular characteristic, require sorting.
Sorting data by a single variable in SAS is the simplest case. By default SAS sorts data in ascending order, with the smaller values first.

PROC SORT DATA=idre.sales OUT=sales; *OUT= is optional;


BY Salary;
RUN;

Sorting can also be done using more than one variable.

PROC SORT DATA=idre.sales OUT=sales;


BY Salary Country;
RUN;

As you can see, the data is sorted in ascending order by "Salary" first and then, when there are tied salaries from different countries, AU comes before US
alphabetically. We can change this sorting behavior by flipping the ordering of our variables and/or adding in the DESCENDING option, which reverses the sort
order for the variable that immediately follows it.

PROC SORT DATA=idre.sales OUT=sales;


BY DESCENDING Salary DESCENDING Country;
RUN;

4.4.2 Merging
One data management task that requires proper sorting is merging. Merging involves matching one observation in a dataset to one observation (One-to-One)
or multiple observations (One-to-Many) in a second dataset. In order for this to be done properly in SAS, the datasets to be merged must be sorted by the
same variable(s). In the example below, we will merge a dataset that has employee payroll information with a second dataset with employee addresses.
Since an employee's ID number (employee_id) is a unique identifier of each observation, we will use this variable to match observations.
First, we need to sort each dataset by employee_id.

PROC SORT DATA=idre.employee_payroll OUT=payroll;


BY Employee_ID;
RUN;

PROC SORT DATA=idre.employee_addresses OUT=addresses;


BY Employee_ID;
RUN;

A few things to take note of. First, you will notice that datasets "addresses" and "payroll" do not share any of the same variables except Employee_ID. In
general, you do not want to merge datasets that include variables with the same names. SAS can only keep one set of values and will use the
values from the last dataset read. Thus, you should rename variables before attempting the merge. Second, Employee_ID is unique in each dataset, so
this will be a One-to-One merge.
Merging is done in a Data step similar to what we have been executing, except instead of the SET statement we now have a MERGE statement. Additionally,
the BY statement is used to tell SAS which variable will be used to match records. The variable after the BY statement is the same unique identifier that we just used
for sorting.

DATA payadd;
  MERGE payroll addresses;
  BY Employee_ID;
RUN;

Below is a subset of variables from the newly merged data. As you can see, Employee_Name is from the "addresses" data and Birth_date and Salary are from the "payroll" data.
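
The earlier advice about renaming shared variables before a merge can be sketched with the RENAME= dataset option. The toy data below is invented purely for illustration (the seminar datasets do not share a variable other than Employee_ID); the dataset names home, office and phones and the variable Phone are all hypothetical.

```sas
/* Toy data invented for illustration: both datasets carry a variable named Phone. */
DATA home;
  INPUT Employee_ID Phone $;
  DATALINES;
101 555-0100
102 555-0101
;
RUN;

DATA office;
  INPUT Employee_ID Phone $;
  DATALINES;
101 555-0200
102 555-0201
;
RUN;

/* RENAME= gives each copy its own name, so neither set of values is overwritten. */
/* Both toy datasets are already in Employee_ID order, so no PROC SORT is needed.  */
DATA phones;
  MERGE home   (RENAME=(Phone=Home_Phone))
        office (RENAME=(Phone=Office_Phone));
  BY Employee_ID;
RUN;
```

Without the two RENAME= options, the merged dataset would contain a single Phone variable holding only the values from "office", the last dataset listed.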

Now let's take a look at an example of a One to Many merge.
The first set of data provides information on order and delivery dates. In the second set of data we have information on the product or products ordered. Because more than one item can be associated with a particular Order_ID, it is not unique in this dataset. Thus, we will need to conduct a one-to-many merge where each row in our "orders" data could be merged with multiple rows in the "order_item" data. Again, we will begin by sorting both sets of data by Order_ID.

PROC SORT DATA=idre.orders OUT=orders;
  BY Order_id;
RUN;

PROC SORT DATA=idre.order_item OUT=order_item;
  BY Order_id;
RUN;

Below is our syntax to merge the two datasets. Notice we also used a KEEP statement. This allows us to merge the data and control which variables are present in the final merged dataset.

DATA allorders;
  MERGE orders order_item;
  BY Order_ID;
  KEEP Order_ID Order_Item_Num Order_Type Order_Date Quantity Total_Retail_Price;
RUN;

Above is a selected portion of the merged data. SAS executed the merge without an error, but it appears that we have some missing data as a result. The two variables that have missing information were both from the "orders" data. This is an indication that we perhaps have some nonmatches. If we go back and look at the "orders" data, we would see that there is no information for the order identifier "1243854878" but there is information in "order_item". Thus, when you merge the datasets together, all the variables from "orders" will have missing values for this particular order. There are a couple of ways you can deal with this issue. First, you can leave the data as is, with missing information for nonmatches. Alternatively, you can choose to control the observations output to the new merged dataset by using the IN= option on the MERGE statement. The IN= option creates a variable indicating which dataset(s) contributed to forming the observation in the final merged dataset. It is a temporary variable used in the merging process that is given a value of 0 if the dataset did not provide information or 1 if it did. We could then use this variable to select observations in the newly merged data that come from one dataset or both. Let's take a look at how we could apply this option in our previous merge.

DATA allorders2;
  MERGE orders (in=a)
        order_item (in=b);
  BY Order_ID;
  KEEP Order_ID Order_Item_Num Order_Type Order_Date Quantity Total_Retail_Price;
  IF a;
RUN;

Using the IN= option with an IF statement selects observations, matched by Order_ID, that are present in "orders". If you have a value for Order_ID that is in "order_item" but not in "orders", then it will not be used to construct observations for the "allorders2" dataset. Note: Using IF a is equivalent to saying IF a=1. Thus, the variables from "orders" will no longer be missing because of nonmatches.
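
The same IN= variables can also route matches and nonmatches to separate datasets, which makes the unmatched orders easy to inspect. This is a minimal sketch assuming the sorted "orders" and "order_item" datasets from above; the output names matched and items_only are hypothetical.

```sas
DATA matched items_only;
  MERGE orders     (IN=a)
        order_item (IN=b);
  BY Order_ID;
  IF a AND b THEN OUTPUT matched;               /* Order_ID found in both datasets    */
  ELSE IF b AND NOT a THEN OUTPUT items_only;   /* Order_ID found only in order_item  */
  /* Rows found only in "orders" (a AND NOT b) are not written to either dataset.     */
RUN;
```

Printing items_only should list the order items whose Order_ID has no counterpart in "orders", such as the order discussed above.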
Now, one thing you generally want to avoid is many-to-many merges. When neither dataset has a unique identifier that will allow for proper matching of records, the result is a somewhat unpredictable and often undesirable pairing of observations.
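
To see why many-to-many merges are discouraged, here is a small self-contained sketch. The datasets one and two and their values are invented for illustration; the variable id is repeated in both.

```sas
/* Toy data invented for illustration: id is repeated in BOTH datasets. */
DATA one;
  INPUT id x $;
  DATALINES;
1 A
1 B
;
RUN;

DATA two;
  INPUT id y $;
  DATALINES;
1 P
1 Q
1 R
;
RUN;

/* Within a BY group, MERGE pairs rows by position; it does not form every combination. */
DATA many_to_many;
  MERGE one two;
  BY id;
RUN;

PROC PRINT DATA=many_to_many;
RUN;
```

In a match-merge like this, the shorter group's last values are carried forward, so the result should contain three rows (A with P, B with Q, and B again with R) rather than all six combinations. The log should also carry a note that the MERGE statement has more than one dataset with repeats of BY values, which is a useful warning sign to look for.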
4.4.3 Appending
Appending or concatenating observations is the process of adding rows or observations to a dataset, as opposed to merging, which adds variables. This can also be accomplished using a Data step. SAS will stack the columns together by matching the names across datasets.
We will append three datasets that include information on orders from 3 consecutive months (July to September) in 2011. Below is a snapshot of the first two records from each of the datasets to be appended.

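The syntax to conduct the append is quite simple. All you have to do is list the datasets to be appended on the SET statement line. Any number of datasets can be listed, in any order; the observations are stacked in the order in which the datasets appear.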

DATA mnth7_8_9_2011;
  SET idre.mnth7_2011 idre.mnth8_2011 idre.mnth9_2011;
RUN;

A portion of the newly appended dataset is below.

Now you can see that all 3 datasets have been appended or "stacked" together. This example worked perfectly because the three datasets shared the exact same variables. But what happens when you append datasets that do not contain the same variables?
Take a look back at our "shoe" data. Below we have two sets of data, one for Eclipse shoes and one for Tracker shoes. You will notice they share all the same variables except two, product_id and supplier_name.

What will happen when we attempt to append the data?

DATA shoes;
  SET idre.shoes_eclipse idre.shoes_tracker;
RUN;

The append still executes without error. However, in the new "shoes" data we created, all the records from the Eclipse dataset will be missing on the variables that were only in the Tracker dataset.
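
If the Tracker-only variables are not needed, one option is to leave them out while appending. The sketch below uses the DROP= dataset option and assumes, as described above, that product_id and supplier_name exist only in the Tracker data; shoes_common is a hypothetical output name.

```sas
/* Assumption: product_id and supplier_name are present only in idre.shoes_tracker. */
/* DROP= removes them as the Tracker data is read, so no all-missing columns remain. */
DATA shoes_common;
  SET idre.shoes_eclipse
      idre.shoes_tracker (DROP=product_id supplier_name);
RUN;
```

If those variables are wanted for the Tracker rows, the original append is the right choice and the missing values for Eclipse rows are simply expected.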

5.0 Modifying SAS Output
5.1 Titles and Footnotes
As you have probably already noticed, SAS provides a lot of output from many of its procedures. As a researcher, it is important to know how to manipulate and change your output to convey important information to your audience. One of the procedures we have been using to obtain output from our various Data steps is PROC PRINT. We will begin by exploring some ways of enhancing the output from this procedure.
Whenever you are presenting tables of information, the first item most people look for is a title. This is easily implemented with a TITLE statement in SAS. It is also possible to add multiple titles, as well as footnotes, to output in SAS by adding a numeric suffix to the statement indicating the desired ordering. SAS allows for up to 10 titles and up to 10 footnotes.

TITLE1 'Orion Star Sales Staff';
TITLE2 'Salary Report';
FOOTNOTE1 'Confidential';
PROC PRINT DATA=idre.sales (OBS=5);
  VAR Employee_ID Last_Name Salary;
RUN;
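
When titles are revised later in a session, it helps to know how numbered titles interact: a new TITLEn statement replaces the existing title on line n and cancels any titles with higher numbers. A brief sketch follows; the third title line and the revised text are invented for illustration.

```sas
TITLE1 'Orion Star Sales Staff';
TITLE2 'Salary Report';
TITLE3 'First Five Employees';     /* invented third title line */

/* Re-issuing TITLE2 replaces line 2 and cancels TITLE3; TITLE1 is left unchanged. */
TITLE2 'Revised Salary Report';

PROC PRINT DATA=idre.sales (OBS=5);
  VAR Employee_ID Last_Name Salary;
RUN;
```

The printed table should therefore carry two title lines, not three.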


5.2 Label Options
You may also find it useful to label your variables for added readability. You can do this with the LABEL statement. You will notice that some of the variable labels are longer than others. When you only have three variables it may not seem important, but if you have to label 10 variables, available space in a table may need to be a consideration. One way to deal with this is to use the SPLIT= option on the PROC PRINT line. This allows the user to control the display of the label so that instead of the label being on one line, you can split it into two lines at the split character.

PROC PRINT DATA=idre.sales (OBS=5) SPLIT='*';
  VAR Employee_ID Last_Name Salary;
  LABEL Employee_ID = 'Sales ID'
        Last_Name   = 'Last*Name'
        Salary      = 'Annual*Salary';
RUN;

The variable names have been replaced with labels. Note that we did not have to resubmit the TITLE and FOOTNOTE statements. These are considered global statements and remain in effect until you cancel them or you end your SAS session. To cancel them, you must issue a blank statement for each:

TITLE;
FOOTNOTE;
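
Returning to the LABEL statement from this section: a LABEL statement inside a Proc step lasts only for that step, whereas one placed in a Data step is stored with the dataset. PROC PRINT displays stored labels only when the LABEL option (or SPLIT=, which implies it) is given. A minimal sketch, assuming idre.sales; sales_labeled is a hypothetical dataset name.

```sas
/* Store the labels with a new dataset; sales_labeled is a hypothetical name. */
DATA sales_labeled;
  SET idre.sales;
  LABEL Employee_ID = 'Sales ID'
        Salary      = 'Annual Salary';
RUN;

/* The LABEL option asks PROC PRINT to use labels as column headings. */
PROC PRINT DATA=sales_labeled (OBS=5) LABEL;
  VAR Employee_ID Last_Name Salary;
RUN;
```

Any variable without a stored label, such as Last_Name here unless idre.sales already labels it, is printed under its variable name.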
5.3 Formats
Beyond just labeling variables, you may also want to properly label the values of those variables. This is carried out in SAS using a FORMAT statement. Formatting values changes the appearance of those values in output, but the underlying values do NOT change.
SAS has predefined formats for certain types of variables like dates and allows users to create their own formats for specific situations. Earlier we saw some examples of predefined formats when we covered date functions. Here, we will focus on how to create and apply user-defined formats.
In SAS, PROC FORMAT is used to define formats. For example, take a look at the syntax below:

PROC FORMAT;
  VALUE $ctryfmt 'AU'='Australia'
                 'US'='United States'
                 other='Miscoded';
  VALUE tiers 0-49999='Tier 1'
              50000-99999='Tier 2'
              100000-250000='Tier 3';
RUN;
Each format is defined in a VALUE statement. The name you choose is up to you. Notice that character formats must be defined with a "$" in front of them. You then provide a value label for each level or range of values. The first format we are creating is to label the countries with their full names instead of abbreviations. VALUE statements can also use keywords. In this example, other specifies that any values other than AU or US will be labeled as "Miscoded". For numeric formats, you can label a single value or a range of values. Once the formats are created we can apply them to the variables of interest.

In both Data steps and Proc steps, SAS distinguishes formats from variables by ending them in a period, which then turns the text green in the editor.

PROC PRINT DATA=idre.sales (OBS=5);
  VAR Employee_ID Salary Country Birth_Date Hire_Date;
  FORMAT Salary tiers. Birth_Date Hire_Date monyy7. Country $ctryfmt.;
RUN;

Above you can compare the appearance of the table with unformatted values to the one with formatted values. If you only want to use the formatted values for certain procedures in SAS, then you can just add a FORMAT statement as we did above. If you want these formats to be permanently applied to a variable, then you can use the same FORMAT statement in a Data step.
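
Attaching formats in a Data step might look like the sketch below. It assumes the PROC FORMAT step above has already been run in the current session; sales_formatted is a hypothetical dataset name.

```sas
/* Assumes the tiers. and $ctryfmt. formats were created earlier in this session. */
DATA sales_formatted;
  SET idre.sales;
  FORMAT Salary tiers. Country $ctryfmt.;
RUN;

/* No FORMAT statement is needed here; the formats are stored with the dataset. */
PROC PRINT DATA=sales_formatted (OBS=5);
  VAR Employee_ID Salary Country;
RUN;
```

The stored values of Salary and Country are unchanged; only the way they are displayed differs.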
5.4 Output Delivery System (ODS) Basics
Besides customizing the SAS default output, you may want to output results to different file types. By default, SAS 9.4 outputs results as HTML, and this is what you see in the "Results Viewer" window. If you would like to change this behavior, you will need to use Output Delivery System (ODS) statements. This will allow for output in several different formats including listing/text, rtf, pdf and xls.
ODS statements are also global statements and remain in effect until closed. The statement comes before running the procedure. Once the procedure(s) is executed, you will then close the ODS destination. The basic syntax is shown below:

ODS PDF FILE="&path\example.pdf";
ODS RTF FILE="&path\example.rtf";
PROC FREQ DATA=<data>;
  TABLES <variable>;
RUN;
ODS PDF CLOSE;
ODS RTF CLOSE;

In this case, the output from the PROC FREQ will be saved to a pdf file and an rtf file. You must also specify the path or location where you want these documents saved. Once complete, you should then close the output destination; otherwise SAS will keep sending your results to these documents.
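
The syntax above refers to a macro variable named path. If it has not already been defined in your session, one way to supply it is a %LET statement, as in this sketch. The folder C:\sas_output is a hypothetical location, and the example borrows idre.sales and gender from elsewhere in this seminar.

```sas
/* Hypothetical folder; replace it with a location that exists on your machine. */
%LET path = C:\sas_output;

/* Double quotes are required so that &path is resolved inside the file name. */
ODS PDF FILE="&path\example.pdf";
PROC FREQ DATA=idre.sales;
  TABLES gender;
RUN;
ODS PDF CLOSE;
```

With single quotes around the file name, &path would be treated as literal text and the file would not be written where intended.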
Before SAS 9.3 the default output destination was listing. With the listing destination open, you can see the results from the PROC FREQ in the Output window and see a new icon in the Results tab.

ODS LISTING;
PROC FREQ DATA=idre.sales;
  TABLES gender;
RUN;
ODS LISTING CLOSE;

As a user, you can also customize the style or look of the output when selecting either an html, pdf or rtf destination. This ability is often useful when formatting results for presentations or publications. Below are some examples of the different options available in SAS:

ODS HTML FILE="C:\myreport.html" STYLE=sasweb;
PROC FREQ DATA=idre.sales;
  TABLES gender;
RUN;
ODS HTML CLOSE;

/* Printer was the default PDF style in releases before SAS 9.4 */
ODS PDF FILE="C:\myreport.pdf" STYLE=printer;
PROC FREQ DATA=idre.sales;
  TABLES gender;
RUN;
ODS PDF CLOSE;

ODS PDF FILE="C:\myreport1.pdf" STYLE=journal;
PROC FREQ DATA=idre.sales;
  TABLES gender;
RUN;
ODS PDF CLOSE;

Just to be safe we will go ahead and close all of the open destinations. However, be mindful that this will also close the html default, so you will need to reissue the global statement turning it back on. Otherwise, the next time you issue a procedure that generates output, SAS will issue the warning "No output destinations active".

ODS _ALL_ CLOSE;
ODS HTML;

6.0 Special Issues
6.1 Dealing with Duplicates
An issue that comes up a lot in data management is how to handle duplicates. There are several ways in SAS to identify duplicate records.
One way is to use some of the options available to us with PROC FREQ. In the "nonsales" data file, we should have 235 unique employee identification numbers. We can use the ORDER=FREQ option to determine if this is true. This option displays the frequency of each unique identification number in descending order.

PROC FREQ DATA=idre.nonsales ORDER=FREQ;
  TABLES Employee_ID;
RUN;

Above you can see that the employee ID #120108 has two records associated with it, indicating that we have a duplicate problem. Another useful option is NLEVELS, which displays the number of distinct values for each variable.

PROC FREQ DATA=idre.nonsales NLEVELS;
  TABLES Employee_ID / NOPRINT;
RUN;

There are 235 employee records in the "nonsales" data but only 234 unique levels of Employee_ID, meaning that one employee ID # is duplicated.
Once the presence of duplicate ID numbers has been confirmed, you will most likely want to examine them to determine if they are indeed duplicate records or if the employee ID number is incorrect. In our dataset only one ID is duplicated, making assessment fairly easy. However, what do you do when several IDs or records are duplicated? Let's separate the records with unique IDs from the duplicates using an IF statement.

PROC SORT DATA=idre.nonsales OUT=ids2;
  BY employee_id;
RUN;

DATA dupes nodupes;
  SET ids2;
  BY employee_id;
  IF NOT (FIRST.employee_id AND LAST.employee_id) THEN OUTPUT dupes;
  ELSE OUTPUT nodupes;
RUN;

Above we are using the FIRST. and LAST. variables, which SAS creates automatically for each variable named in the BY statement. They flag the first and last record within each group of the BY variable. When an employee ID is unique, the first and last record will be the same row. Thus our code outputs employee IDs where the first and last records are not the same to a dataset called "dupes", and all the other unique records are put in a dataset called "nodupes".
"Dupes" or duplicated employee ID numbers:

"NoDupes" or unique employee ID numbers:

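As an alternative sketch (not the approach used above), PROC SORT can separate repeated IDs in a single step with its NODUPKEY and DUPOUT= options. The output names first_per_id and extra_records are hypothetical.

```sas
/* NODUPKEY keeps the first record found for each employee_id in first_per_id. */
/* DUPOUT= collects the additional records that share an already-seen ID.      */
PROC SORT DATA=idre.nonsales OUT=first_per_id NODUPKEY DUPOUT=extra_records;
  BY employee_id;
RUN;
```

Note the difference from the Data step approach: there, every record belonging to a duplicated ID goes to "dupes", whereas here one record per ID is retained and only the extra copies are set aside.
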
6.2 Identifying Outliers
Another issue that comes up a lot in dealing with data is outliers. The simplest way in SAS to identify outliers is to use the UNIVARIATE procedure.
By default, the UNIVARIATE procedure outputs the 5 highest and 5 lowest observations. Let's examine outliers for product prices in the "price_new" dataset.

PROC UNIVARIATE DATA=idre.price_new;
  VAR unit_cost_price;
RUN;

You can override this default by specifying the option NEXTROBS= on the procedure line and indicating the number of extreme observations to display. You can specify any number between 0 and half of the total observations. You will also notice that along with the extreme values, SAS also provides an observation or row number that corresponds to each value. Additionally, you can use the ID statement to identify observations. This statement specifies one or more variables to be included in the extreme observations table. Let's try adding the product identifier Product_ID to each of our extreme values.

PROC UNIVARIATE DATA=idre.price_new NEXTROBS=3;
  VAR unit_cost_price;
  ID Product_ID;
RUN;

Now each extreme value is associated with its ID number.
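
PROC UNIVARIATE prints several tables, and when only the extreme values are of interest the rest can be suppressed with an ODS SELECT statement. The sketch below assumes the extreme observations table goes by the ODS name ExtremeObs, which is the name listed in the PROC UNIVARIATE documentation.

```sas
/* ODS SELECT applies to the next procedure step only, then resets automatically. */
ODS SELECT ExtremeObs;
PROC UNIVARIATE DATA=idre.price_new NEXTROBS=3;
  VAR unit_cost_price;
  ID Product_ID;
RUN;
```

If the table name differs in your release, submitting ODS TRACE ON; before the procedure writes the names of all output tables to the log.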

7.0 Wrapping things up
As we stated in the beginning, SAS is a very flexible program with great features for data management.
This seminar only scratches the surface in describing all of the programming options available to users.
For more information on the topics discussed here please explore our website.
Additionally, SAS has a host of courses, aimed at users of all levels, designed to improve your programming skills.

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.

2016 UC Regents
