SAS Notes
%let path=/folders/myfolders/ecprg193;
libname orion "&path";
DATA steps: read input data and create a SAS data set.
PROC steps: process a SAS data set, typically to produce a report.
RUN statement: explicitly ends each step.
Global statements (e.g. TITLE) are not part of any step.
SAS statements are free-format.
Unbalanced quotation marks: SAS gives no error or warning message. You have to stop execution manually and correct the code.
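Accessing SAS data sets.
WORK is a temporary library; its contents are deleted at the end of the session.
A data set name has two levels: libref.data-set-name.
work.newssalesman is a two-level name; the one-level name newssalesman refers to the same data set, because a one-level name defaults to the temporary WORK library.
LIBNAME libref 'sas-library' <options>;
The libref must be 8 characters or fewer.
You can specify the LIBNAME statement at the top of the code.
e.g. libname orion 'filepath';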
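A macro variable is referenced with "&" (e.g. "&path").
To list the contents of a library:
proc contents data=libref._ALL_;
run;
The NODS option suppresses the descriptor details of the individual data sets. It is valid only with _ALL_ and follows it after a space: proc contents data=libref._ALL_ nods;
Make sure the libref is assigned before the code that uses it.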
A row is called an observation in SAS, a column a variable, and a table a data set.
Missing values are displayed by default as a blank for character variables and a period (.) for numeric variables.
Variable attributes include format, informat, and label.
SAS variable names can start with a letter (A-Z) or an underscore (_), but not with a digit.
By default PROC PRINT displays all variables; you can override this with a VAR statement.
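sum Salary; prints a total for Salary (sum var1 var2 var3; for several variables).
Subset observations with a WHERE statement.
Not equal: ^=
Exponentiation: **
If a step has two WHERE statements, the second replaces the first. Use logical operators (AND, OR, NOT) to combine conditions in one WHERE statement.
Add NOOBS to the PROC PRINT statement to suppress the observation numbers: proc print data=orion.sales noobs;
The CONTAINS operator (?) selects values that include a substring:
where country = 'au' and
job ? 'rep';
The Obs column can be replaced by a unique identifier with the ID statement:
var var1 var2;
id Customer_Id;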
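The PROC PRINT statements above can be combined in one step. This is only a sketch: work.sales_demo and its values are made up here so the step runs without the orion library, and the variable names are assumptions.

```sas
/* Hypothetical sample data standing in for orion.sales */
data work.sales_demo;
  input Employee_ID Country $ Job_Title :$20. Salary;
  datalines;
101 AU Sales_Rep 26000
102 US Sales_Rep 31000
103 AU Manager 85000
104 AU Sales_Rep 27500
;
run;

proc print data=work.sales_demo;
  id Employee_ID;                               /* replaces the Obs column */
  var Country Job_Title Salary;                 /* only these variables are printed */
  where Country = 'AU' and Job_Title ? 'Rep';   /* comparisons are case-sensitive */
  sum Salary;                                   /* adds a total line for Salary */
run;
```

Only employees 101 and 104 satisfy both conditions, so the total line should show 53000 + 500 = 53500.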
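PROC SORT sorts the data set and replaces the original unless you use OUT=:
proc sort data=input-SAS-data-set out=output-SAS-data-set;
by <descending> var;
run;
Ascending order is the default. BY COUNTRY DESCENDING SALARY; sorts by country and, within each country, by descending salary, so the observations are grouped by country.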
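The default title is "The SAS System".
Titles and footnotes are global: they stay in effect until changed or cancelled.
A title1 statement issued after title3 replaces title 1 and cancels titles 2 and 3.
At the end you can cancel all titles and footnotes with null statements:
title;
footnote;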
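A small sketch of how titles replace and cancel each other. It assumes the sample data set sashelp.class that ships with SAS; the title text is invented.

```sas
title1 'Sales Report';
title2 'Australia';
title3 'Draft';
footnote 'Confidential';

proc print data=sashelp.class(obs=3);   /* shows three titles and the footnote */
run;

title1 'Revised Report';   /* replaces title 1 and cancels titles 2 and 3 */

proc print data=sashelp.class(obs=3);   /* shows one title; the footnote is still in effect */
run;

title;      /* cancels all titles */
footnote;   /* cancels all footnotes */
```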
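Use label var = 'new name of variable'; and add the LABEL option to the PROC PRINT statement so the labels are displayed.
SPLIT='*' in the PROC PRINT statement also displays labels and breaks them at each asterisk.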
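SAS formats:
Formats control how values are displayed, e.g. date formats and adding $ signs.
Use the FORMAT statement:
format salary dollar8. Hire_date mmddyy10.;
General form: <$>format<w>.<d> ($ for character formats, w = total width, d = decimal places).
Character values are truncated if the width is too small.
SAS dates are stored as the number of days since 1 January 1960: that date is 0, and earlier dates are negative.
mmddyy6. displays 010160
mmddyy8. displays 01/01/60
mmddyy10. displays 01/01/1960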
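To see the stored date values and the three MMDDYY widths side by side, a DATA _NULL_ step can write them to the log. A minimal sketch; the variable names base and before are arbitrary.

```sas
data _null_;
  base   = '01jan1960'd;   /* day 0 of the SAS calendar */
  before = '31dec1959'd;   /* one day earlier, so a negative value */
  put base= before=;       /* writes the raw numeric values */
  put base mmddyy6. +1 base mmddyy8. +1 base mmddyy10.;   /* same date, three widths */
run;
```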
Create your own formats with PROC FORMAT:
proc format;
value format-name value-or-range1 = 'formatted-value1' ...;
The keyword OTHER matches all values not listed; write it without quotation marks, e.g. other = 'AUSTRALIA'.
PROC FORMAT;
VALUE $CONTRYFMT 'AU' = 'AUSTRALIA'
'US' = 'USA'
OTHER = 'MISCODE';
VALUE $SPORTS
'FB' = 'FOOTBALL';
RUN;
proc print data = orion.sales label;
format salary dollar10. birth_date Hire_date monyy7. country $contryfmt.;
run;
The keywords LOW and HIGH can be used in value ranges, e.g. in a tiers format for salary.
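A sketch of what a tiers format with LOW and HIGH might look like. The cut-off values, the tier labels, and the data set work.pay are all assumptions made for illustration; the notes do not give the real definition.

```sas
proc format;
  value tiers low-<50000    = 'Tier 1'    /* everything below 50000 */
              50000-<100000 = 'Tier 2'    /* 50000 up to, not including, 100000 */
              100000-high   = 'Tier 3';   /* 100000 and above */
run;

/* Hypothetical salaries to try the format on */
data work.pay;
  input Salary;
  datalines;
26000
48000
75000
120000
;
run;

proc freq data=work.pay;
  tables Salary;
  format Salary tiers.;   /* values are counted by tier, not individually */
run;
```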
Reading a SAS data set.
Creating a subset with a DATA step:
data work.subset1;
set orion.sales;
where country ='AU';
run;
work.subset1 now contains the observations with country = 'AU'.
SAS date constant: 'ddmmmyyyy'd, e.g. hire_date < '01jan2000'd; for subsetting on a date variable.
If an operand in an arithmetic expression has a missing value, the result is a missing value.
DROP and KEEP statements select the variables written to the output data set; use whichever needs the shorter list (KEEP when fewer variables are kept than dropped).
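The date constant, missing-value arithmetic, and KEEP can be tried together in one DATA step. A minimal sketch; work.staff and its values are invented so the step is self-contained.

```sas
/* Hypothetical input data; Bob has a missing Bonus */
data work.staff;
  input Name $ Salary Bonus Hire_Date :date9.;
  format Hire_Date date9.;
  datalines;
Ann 30000 500 15mar1998
Bob 42000 . 01jun1999
Cy 51000 700 20feb2005
;
run;

data work.early;
  set work.staff;
  where Hire_Date < '01jan2000'd;   /* SAS date constant */
  Total = Salary + Bonus;           /* missing for Bob because Bonus is missing */
  keep Name Hire_Date Total;        /* shorter than dropping Salary and Bonus would be only in wider data sets */
run;

proc print data=work.early;
run;
```

Cy is filtered out by the WHERE statement, and Bob's Total prints as a period.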
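Compilation phase:
checks the code for syntax errors
creates the program data vector (PDV), an area of memory where SAS builds one observation; it includes the automatic variables _N_ and _ERROR_
adds a slot to the PDV for each variable
creates the descriptor portion of the output data set, with the data set and variable names
After successful compilation comes the execution phase:
reads observations into the PDV and writes them from the PDV to the output data set.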
PDV contains only one observation at any time.
The WHERE statement selects observations as they are read from the input data set into the PDV.
Subsetting IF statements (a DATA step may contain several):
if expr1;
if expr2;
if expr3;
IF cannot be used in a PROC step; WHERE can.
In a DATA step we can use both IF and WHERE. WHERE can test only variables that are in the input data set; it cannot test variables created by assignment statements, because they are not in the input data set.
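The difference between WHERE and the subsetting IF shows up as soon as a new variable is involved. A sketch with made-up data (work.emp) and an assumed 10 percent bonus rule:

```sas
/* Hypothetical input data */
data work.emp;
  input Name $ Country $ Salary;
  datalines;
Ann AU 30000
Bob US 42000
Cy AU 51000
;
run;

data work.big_bonus;
  set work.emp;
  where Country = 'AU';    /* Country exists in the input data set, so WHERE works */
  Bonus = Salary * 0.10;   /* Bonus is created by this assignment statement */
  if Bonus > 4000;         /* Bonus is not in work.emp, so it is tested with IF */
run;
```

Replacing the IF with where Bonus > 4000; would fail, because WHERE looks for Bonus in work.emp.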
If you use a LABEL statement in a PROC step, the labels are temporary. To write the labels to the descriptor portion of the data set, add them in the DATA step.
Similarly, we can permanently assign a format to a variable in the DATA step.
To display the labels you still have to add the LABEL option to the PROC PRINT statement.
Reading spreadsheet data and printing it with PROC PRINT.
This uses SAS/ACCESS Interface to PC Files.
With the SAS/ACCESS LIBNAME statement we can connect to third-party data sources, e.g. xls, Oracle.
libname libref <engine> "workbook-name" <options>;
The bitness of SAS and Excel should be the same. If it is not, specify an engine such as PCFILES:
libname orionx pcfiles path="&path/sales.xls";
proc contents data=orionx._all_;
run;
This lists everything in the library orionx.
A name that ends with $ is a worksheet from the xls file.
libref.'worksheetname$'n is the correct way to reference a worksheet; the name literal allows the special character in the worksheet name.
Printing a worksheet: proc print data=orionx.'worksheet$'n noobs;
It is important to disassociate the data source, because the SAS libref puts a lock on the xls file.
libname orionx clear; to disassociate.
Creating a SAS data set from an xls worksheet:
data work.subset;
set orionx.'Austali$'n;
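Combining data sets by merging:
data sasdataset;
merge sasdataset1 sasdataset2;
by <descending> commonvariable;
run;
The input data sets must be sorted by the BY variable.
During a merge SAS does not reinitialize the variables read from the input data sets at every iteration; their values are retained within a BY group.
When the BY value changes, for example on reaching a record with no match, SAS reinitializes the PDV before reading the next record.
You can use the IN= data set option to identify which data sets contributed to each observation.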
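A match-merge with IN= might look like the sketch below. The two input data sets and the output names are invented; both inputs are already in ID order, so no PROC SORT is needed here.

```sas
/* Hypothetical inputs, already sorted by ID */
data work.emps;
  input ID Name $;
  datalines;
1 Ann
2 Bob
3 Cy
;
run;

data work.phones;
  input ID Phone $;
  datalines;
1 555-0101
3 555-0103
4 555-0104
;
run;

data work.both work.no_phone;
  merge work.emps(in=inEmp) work.phones(in=inPhone);
  by ID;
  if inEmp and inPhone then output work.both;   /* ID found in both inputs */
  else if inEmp then output work.no_phone;      /* employee without a phone record */
run;
```

inEmp and inPhone are temporary 0/1 variables; they are not written to the output data sets. ID 4 appears only in work.phones and is written to neither output.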
SAS reporting:
You can create reports with PROC FREQ, PROC MEANS, and PROC UNIVARIATE.
Output can be sent to external files with the Output Delivery System (ODS).
proc freq data=orion.sales;
tables gender country;
where country ='AU';
run;
By default each table has four columns: frequency, percent, cumulative frequency, and cumulative percent.
You can suppress the cumulative columns with NOCUM in the TABLES statement:
tables gender / nocum;
To drop the percentages as well:
tables gender / nocum nopercent;
We can use formats to group numeric values, e.g. tiers:
proc freq data = orion.sales;
tables salary;
format salary tiers.;
run;
proc freq data = orion.sales;
tables salary gender;
run;
You get two separate tables for the whole data set, which is not useful. To get them for each country, add a BY statement after the TABLES statement:
proc freq data = orion.sales;
tables salary gender;
by country;
run;
Before this works the data set must be sorted by country; this is required whenever you use a BY statement.
e.g.:
proc sort data = orion.sales out = sorted;
by country;
run;
proc freq data = sorted;
tables salary gender;
by country;
run;
Crosstabulations use * in the TABLES statement of PROC FREQ:
proc freq data = orion.sales;
tables gender*country;
run;
Here gender specifies the rows and country specifies the columns of the output table.
In crosstabulation results each cell contains four values: frequency, percent, row percent, and column percent.
To list them in separate columns use
tables gender*country / crosslist;
NOCUM has no effect on crosstabulations, but NOPERCENT can be used.
Options that suppress cell values:
/nopercent;
/nofreq;
/nocol;
/no
We can apply formats created with PROC FORMAT in PROC FREQ tables.
We can validate data with PROC FREQ.
To find duplicates, list the most frequent values first:
proc freq data = orion.nonsales2 order=freq;
tables employee_Id / nocum nopercent;
run;
We can also specify NLEVELS instead of ORDER=FREQ. It reports the number of distinct values; fewer levels than observations means there are duplicates.
proc freq data = orion.nonsales2 nlevels;
tables employee_Id / nocum nopercent;
run;
STATISTICS:
Using PROC MEANS:
proc means data=sasdataset;
var analysis-variables;
run;
By default you get N, mean, standard deviation, minimum, and maximum.
We can group the statistics with a CLASS statement:
proc means data = orion.sales;
var salary;
class gender country;
run;
Requesting specific statistics, e.g. just N and mean:
proc means data=sasdataset n mean;
var analysis-variables;
run;
proc means data=sasdataset min max sum;
var analysis-variables;
run;
Control the number of decimal places with MAXDEC=, e.g. maxdec=0 in the PROC MEANS statement.
Use NMISS to count missing values:
proc means data = sasdataset min nmiss max;
Detect data outliers with PROC UNIVARIATE.
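The notes give no PROC UNIVARIATE code, so the following is only a sketch with an invented data set (work.pay_check) in which one salary is deliberately far out of range.

```sas
/* Hypothetical salaries; 2600000 is a likely data-entry error */
data work.pay_check;
  input Salary;
  datalines;
26000
27500
31000
48000
2600000
;
run;

proc univariate data=work.pay_check;
  var Salary;   /* analysis variable to inspect */
run;
```

In the output, the Extreme Observations table lists the lowest and highest values with their observation numbers, which is where a suspicious value such as 2600000 stands out.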