Notes For SAS Programming Fall2009
Econ424
Fall 2009
Why SAS?
Roadmap
Thinking in SAS
Basic rules
Read in data
Data cleaning commands
Summary statistics
Combine two or more datasets
Hypothesis testing
Regression
Thinking in SAS
What is a program?
Algorithm, recipe, set of instructions
Thinking in SAS
Creating a program
What is your problem? (take project 3 as an example)
How can you find a solution?
What steps need to be taken to find an answer?
Do I need to read in data?
What variables do I need?
Where is the data?
What format is the data in?
How do I need to clean the data?
Are there outliers?
Are there any unexpected values in the data?
How do I need to transform the data?
Are the variables in the form that I need?
Basic rules
SAS keywords and names are case insensitive.
Comments:
* this is a comment;
/* this is also a comment */
Variable names:
<=32 characters if SAS 7 or above
<=8 characters if SAS 6 or below
case insensitive
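A minimal sketch of these rules, using a made-up data set (demo) and variable (MyVar): SAS ignores both comment styles, and MyVar, myvar and MYVAR all name the same variable.

```sas
/* Block comment: may span
   several lines */
* Statement comment: ends at the next semicolon;

data demo;               /* demo is a made-up data set name */
  MyVar = 5;
  doubled = myvar * 2;   /* myvar refers to the same variable as MyVar */
run;

PROC PRINT DATA=DEMO;    /* keywords and data set names are case insensitive too */
  VAR MYVAR doubled;
RUN;
```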
DATA newdata;
set proj3rawdata;  * use the data set called proj3rawdata in the temporary (WORK) library;
fracuninsured=uninsured/total;
percentuninsured=fracuninsured*100;
run;
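Define new variables
The DATA step reads the input data proj3rawdata one observation at a time (obs 1, obs 2, ..., obs n), defines fracuninsured and then percentuninsured for that observation, and writes it to the output data newdata (obs 1, ..., obs n).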
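The same DATA step can be tried on a toy version of proj3rawdata. The states S1 and S2 and their numbers are made up; only the variable names come from the notes.

```sas
/* Toy stand-in for proj3rawdata (made-up values) */
data proj3rawdata;
  input state $ year uninsured total;
  datalines;
S1 2009 50 1000
S2 2009 150 1000
;
run;

data newdata;
  set proj3rawdata;                      /* one observation per pass through the step */
  fracuninsured = uninsured/total;       /* computed for the current observation */
  percentuninsured = fracuninsured*100;  /* uses the value just computed */
run;                                     /* each observation is written to newdata */

proc print data=newdata;
run;
```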
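Comparison operators (symbol or mnemonic)
=    or EQ  means equals
~=   or NE  means not equal
>    or GT  means greater than
<    or LT  means less than
>=   or GE  means greater than or equal
<=   or LE  means less than or equal
in          means "is one of" a list of values, useful for subsetting, e.g. state in ('CA','WA','OR')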
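A short sketch of subsetting with these operators. The data set toy and its values are made up for illustration. Character values inside in (...) are case sensitive, so 'ca' would not match 'CA'.

```sas
data toy;   /* made-up values */
  input state $ year fracuninsured;
  datalines;
CA 2008 0.18
CA 2009 0.19
MA 2009 0.05
VA 2009 .
;
run;

/* Subsetting IF: keep only observations where the condition is true */
data recent;
  set toy;
  if year GE 2009;   /* same as: if year >= 2009; */
run;

data west;
  set toy;
  if state in ('CA','WA','OR') and fracuninsured ~= .;  /* West, non-missing */
run;
```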
Save data
* Save in sas format;
libname mylib 'M:\';
data mylib.proj3rawdata3;
set proj3rawdata3;
run;
* Export data to excel;
Proc export data=proj3rawdata3   /* no ; here: the statement continues */
outfile='M:\proj3data-fromsas.xls'
dbms=excel replace;
Run;
You can also export a SAS data set into a comma-delimited text file if you write dbms=csv.
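A sketch of the CSV variant. The toy data set and the output file name are hypothetical; point outfile at your own folder (for example M:\ as above).

```sas
data toy;   /* made-up values */
  input state $ year fracuninsured;
  datalines;
S1 2009 0.05
S2 2009 0.12
;
run;

proc export data=toy
  outfile='proj3data-fromsas.csv'   /* hypothetical file name */
  dbms=csv replace;                 /* replace overwrites an existing file */
run;
```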
proc sort
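proc sort data=proj3rawdata3;   * sorts the data set in place;
by year state;
run;
proc sort data=proj3rawdata3
out=proj3rawdata3_sorted;       * writes the sorted copy to a new data set;
by year descending fracuninsured;
run;
* note that a missing value is always sorted as the smallest value,
  so it comes first in ascending order and last under descending;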
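A small check of the missing-value rule on made-up data: in ascending order the state with a missing fracuninsured comes first, and with descending it comes last.

```sas
data toy;   /* made-up values; VA is missing */
  input state $ year fracuninsured;
  datalines;
CA 2009 0.19
MA 2009 0.05
VA 2009 .
;
run;

proc sort data=toy out=toy_asc;
  by fracuninsured;             /* VA (missing) comes first */
run;

proc sort data=toy out=toy_desc;
  by descending fracuninsured;  /* VA (missing) comes last */
run;
```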
proc freq
* Remember we already generated a variable called newgrp to indicate categories of fraction uninsured and a variable called popgrp to indicate categories of population size;
proc freq data=proj3rawdata3;
tables newgrp popgrp      /* one-dimension frequency tables */
       newgrp*popgrp;     /* two-dimension (cross) table */
run;
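The notes assume newgrp and popgrp already exist. One way they could have been built is sketched below; the cutoffs (0.10, 0.15 and 5,000,000) and the toy data are hypothetical, not the ones used in class. Note the explicit check for missing values: a missing fracuninsured is smaller than any number, so without that check the state would land in the lowest group.

```sas
data toy;   /* made-up values; S4 has a missing fraction */
  input state $ totalpop fracuninsured;
  datalines;
S1 6000000 0.05
S2 800000 0.12
S3 9000000 0.20
S4 500000 .
;
run;

data toygrp;
  set toy;
  length popgrp $ 4;
  /* hypothetical cutoffs, for illustration only */
  if fracuninsured = . then newgrp = .;   /* test for missing first */
  else if fracuninsured < 0.10 then newgrp = 1;
  else if fracuninsured < 0.15 then newgrp = 2;
  else newgrp = 3;
  if totalpop >= 5000000 then popgrp = 'high';
  else popgrp = 'low';
run;

proc freq data=toygrp;
  tables newgrp popgrp newgrp*popgrp / missing;  /* MISSING: count missing as a level */
run;
```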
summary1 (group averages, one row):
newgrp  popgrp  avguninsure  avgfracuninsured
1       high    7500000      0.073
merged:
year  state  totalpop  fracuninsured  newgrp  popgrp  avguninsure  avgfracuninsured
2009  MA     6420947   0.0548         1       high    7500000      0.073
appended:
year  state  totalpop  fracuninsured  newgrp  popgrp  avguninsure  avgfracuninsured
2009  MA     6420947   0.0548         1       high    .            .
.            .         .              1       high    7500000      0.073
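Question: which observations does if one=1 OR two=1; keep, where one and two are the in= flags of the two merged data sets?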
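A sketch of a match-merge with in= flags, on made-up data (the data set names states and summary are hypothetical). Because every row a merge produces comes from at least one input, if one=1 OR two=1; keeps all rows; if one=1 and two=1; would keep only matched rows. The flags are copied into myone and mytwo because in= variables are not written to the output data set.

```sas
/* Made-up state-level data and group averages, for illustration only */
data states;
  input year state $ fracuninsured newgrp popgrp $;
  datalines;
2009 S1 0.05 1 high
2009 S2 0.12 2 low
2009 S3 0.20 3 high
;
run;

data summary;   /* hypothetical group-level averages */
  input newgrp popgrp $ avgfracuninsured;
  datalines;
1 high 0.06
2 low 0.11
;
run;

/* MERGE with BY needs both inputs sorted by the BY variables */
proc sort data=states;  by newgrp popgrp; run;
proc sort data=summary; by newgrp popgrp; run;

data merged;
  merge states(in=one) summary(in=two);
  by newgrp popgrp;
  if one=1 OR two=1;   /* keeps every row (full join) */
  myone = one;         /* in= flags are temporary, so save copies */
  mytwo = two;
run;

proc freq data=merged;
  tables myone*mytwo;  /* how many rows matched, and from which side */
run;
```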
append
data appended;
set proj3rawdata3 summary1;
run;
proc print data=appended;
run;
proc print data=merged;
run;
Goal: a wide data set with one row per state and one column per year, e.g. fracuninsured2008, with values such as 0.0536 and 0.075.
Step 1 to reshape: generate a sub-sample for each year, so that we have:
subsample2009, subsample2008, ..., subsample2003
focus on 2009
data subsample2009;
set proj3rawdata3;
if year=2009;   * subsetting IF: keep only the 2009 observations;
run;
data subsample2009;
set subsample2009;
rename totalpop=totalpop2009;
rename insured=insured2009;
rename uninsured=uninsured2009;
rename fracuninsured=fracuninsured2009;
drop year;   * year is constant within the subsample;
run;
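A sketch of the whole reshape on made-up data with two states and two years: build one renamed subsample per year, then merge them side by side by state. The data set names long, sub2008, sub2009, wide and wide2 are hypothetical. PROC TRANSPOSE is shown as an alternative that builds the same year-specific columns in one step.

```sas
data long;   /* made-up state-year data, already sorted by state */
  input state $ year fracuninsured;
  datalines;
S1 2008 0.05
S1 2009 0.06
S2 2008 0.12
S2 2009 0.11
;
run;

/* Step 1: one subsample per year, with year-specific names */
data sub2008;
  set long;
  if year=2008;
  rename fracuninsured=fracuninsured2008;
  drop year;
run;

data sub2009;
  set long;
  if year=2009;
  rename fracuninsured=fracuninsured2009;
  drop year;
run;

/* Step 2: merge the subsamples side by side, one row per state */
proc sort data=sub2008; by state; run;
proc sort data=sub2009; by state; run;

data wide;
  merge sub2008 sub2009;
  by state;
run;

/* Alternative: prefix= plus id year gives fracuninsured2008, fracuninsured2009 */
proc transpose data=long out=wide2(drop=_name_) prefix=fracuninsured;
  by state;
  id year;
  var fracuninsured;
run;
```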
Check observations in reshaped_withavg:
proc freq data=reshaped_withavg;
tables myone*mytwo;
run;
Comparing more than two groups (as matched pairs) requires specific tests on regression coefficients.
H0: 2003 = 2004: test that the coefficient of dummy2004 = 0, because 2003 is set as the benchmark.
H0: 2004 = 2005: test that the coefficient of dummy2004 = the coefficient of dummy2005.
Test fracuninsured2008=fracuninsured2009;
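A sketch of these two tests with proc reg on a made-up panel of three states and three years. Year dummies use 2003 as the benchmark; the hand-made state dummies (stateS2, stateS3, hypothetical names) hold each state fixed, which is what "matched" means here. The TEST statement checks linear restrictions on the coefficients.

```sas
data panel;   /* made-up values: 3 states x 3 years */
  input state $ year fracuninsured;
  datalines;
S1 2003 0.10
S1 2004 0.11
S1 2005 0.13
S2 2003 0.15
S2 2004 0.15
S2 2005 0.18
S3 2003 0.08
S3 2004 0.10
S3 2005 0.10
;
run;

data panel;
  set panel;
  dummy2004 = (year=2004);   /* 1 if true, 0 otherwise; 2003 is the benchmark */
  dummy2005 = (year=2005);
  stateS2 = (state='S2');    /* state dummies; S1 is the benchmark state */
  stateS3 = (state='S3');
run;

proc reg data=panel;
  model fracuninsured = dummy2004 dummy2005 stateS2 stateS3;
  test dummy2004=0;          /* H0: 2003 = 2004 */
  test dummy2004=dummy2005;  /* H0: 2004 = 2005 */
run;
quit;
```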
In-class exercise for mean comparison
Main question:
compare fracuninsured in west, midatlantic and everywhere else, where
west = CA, WA, OR
midatlantic = DE, DC, MD, VA
step 0: define west, midatlantic and everywhere else
exercise 1: compare west and midatlantic
1(a). as two independent samples
1(b). consider the match by year
exercise 2: compare west, midatlantic, and everywhere else
2(a). as three independent samples;
2(b). consider the match by year;
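A possible start on step 0 and exercise 1(a), on made-up data (the data set names toy and regions and the variable region are hypothetical). The LENGTH statement matters: without it, region would take the length of its first assigned value, 'west', and longer names would be truncated.

```sas
data toy;   /* made-up state-year values */
  input state $ year fracuninsured;
  datalines;
CA 2008 0.18
CA 2009 0.19
WA 2008 0.12
WA 2009 0.13
OR 2008 0.16
OR 2009 0.17
MD 2008 0.13
MD 2009 0.12
VA 2008 0.14
VA 2009 0.13
DC 2008 0.07
DC 2009 0.06
TX 2008 0.25
TX 2009 0.26
;
run;

/* Step 0: define the three regions (state codes must be upper case to match) */
data regions;
  set toy;
  length region $ 15;
  if state in ('CA','WA','OR') then region = 'west';
  else if state in ('DE','DC','MD','VA') then region = 'midatlantic';
  else region = 'everywhereelse';
run;

/* 1(a): west vs. midatlantic as two independent samples */
proc ttest data=regions;
  where region in ('west','midatlantic');
  class region;
  var fracuninsured;
run;
```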
regression in SAS
Question: how does fracuninsured vary with the total population of a state?
* model: fracuninsured=a+b*totalpop+error;
proc reg data=proj3rawdata3;
model fracuninsured=totalpop;
run;
* Add year fixed effects;
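* Model: fracuninsured=a+b*totalpop+c1*dummy2004+c2*dummy2005+...+c6*dummy2009+error,
  with 2003 as the benchmark year (the solution output of proc glm instead sets the last year, 2009, to zero; the fit is the same);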
proc glm data=proj3rawdata3;
class year;
model fracuninsured=totalpop year/solution;
run;
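As a cross-check of the year fixed effects model, the sketch below (made-up data for 2003 to 2005 only) fits it twice: with hand-made dummies in proc reg and with class year in proc glm. The coefficient on totalpop should be identical; only the year coefficients differ, because proc glm uses the last year as its reference level.

```sas
data panel;   /* made-up values */
  input year totalpop fracuninsured;
  datalines;
2003 1000 0.10
2003 3000 0.14
2004 2000 0.12
2004 4000 0.15
2005 1500 0.13
2005 3500 0.17
;
run;

data panel;
  set panel;
  dummy2004 = (year=2004);   /* 2003 is the benchmark year */
  dummy2005 = (year=2005);
run;

/* Same model two ways: the coefficient on totalpop should agree */
proc reg data=panel;
  model fracuninsured = totalpop dummy2004 dummy2005;
run;
quit;

proc glm data=panel;
  class year;
  model fracuninsured = totalpop year / solution;
run;
quit;
```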
A comprehensive example
A review of
1. read in data
2. summary statistics
3. mean comparison
4. regression
reg-cityreg-simple.sas in N:\share\
A grade if score of 90 to 100
B grade if score of 80 to 89
C grade if score of 70 to 79
Diagram: regulation (by county, by city) leads to better quality (hygiene scores)
Data complications
Unit of analysis:
individual restaurant? city? zipcode? census tract?
Unit of time:
each inspection? per month? per quarter? per year?
Define information:
county regulation? city regulation? the date of passing the regulation?
days since passing the regulation? % of days under regulation?
Define quality:
average hygiene score? the number of A restaurants? % of A restaurants?
real test
reg-cityreg-simple.sas in N:\share\
Questions
How many observations in the sample?
Answer: check the log of the first data step, or the output from proc contents.
Questions
Economic theories suggest that quality should be higher after the regulation if regulation gives consumers better information. Is that true?
The summary statistics reported in proc means
(class citym_g or ctym_g) show the average
percentage of A restaurants in different
regulation environments.
Rigorous mean comparison tests are done in proc glm, using the waller or lsd option of the means statement.
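A sketch of both steps on made-up data. The outcome name pct_a (percentage of A restaurants) and the 0/1/2 coding of citym_g are hypothetical stand-ins; check reg-cityreg-simple.sas for the real names and coding.

```sas
data toy;   /* made-up: citym_g coded 0, 1, 2 for three regulation environments */
  input citym_g pct_a;
  datalines;
0 40
0 45
0 42
1 55
1 60
1 58
2 70
2 68
2 72
;
run;

/* Summary statistics by regulation environment */
proc means data=toy mean std n;
  class citym_g;
  var pct_a;
run;

/* Formal pairwise comparisons of the group means */
proc glm data=toy;
  class citym_g;
  model pct_a = citym_g;
  means citym_g / lsd waller;
run;
quit;
```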
Questions
Summary statistics often reflect many economic factors, not
only the one in our mind. That is why we need regressions.
Does more regulation lead to higher quality?
Is the coefficient of city regulation positive and significantly different from zero? (proc reg)
Is the coefficient of county regulation positive and significantly different from zero? (proc reg)
Do we omit other sensible explanations for quality
changes? What are they? (proc glm, year, quarter, city)
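A sketch of these regressions. All variable names are hypothetical stand-ins (pct_a for the percentage of A restaurants, cityreg and ctyreg for the share of days under city and county regulation), as is the toy city-quarter data; the real names are in reg-cityreg-simple.sas.

```sas
data toy;   /* made-up city-quarter data */
  input city $ year quarter cityreg ctyreg pct_a;
  datalines;
A 2008 1 0   0   40
A 2008 2 0   0.5 44
A 2009 1 0.5 1   55
A 2009 2 1   1   60
B 2008 1 0   0   35
B 2008 2 0   0.5 37
B 2009 1 0   1   45
B 2009 2 0   1   47
C 2008 1 0   0   50
C 2008 2 1   0.5 58
C 2009 1 1   1   66
C 2009 2 1   1   67
;
run;

/* Simple version: regulation only */
proc reg data=toy;
  model pct_a = cityreg ctyreg;
run;
quit;

/* Control for other explanations: year, quarter and city effects */
proc glm data=toy;
  class year quarter city;
  model pct_a = cityreg ctyreg year quarter city / solution;
run;
quit;
```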
course evaluation
University wide:
www.CourseEvalUM.umd.edu
TTclass in particular: (password plstt)
www.surveyshare.com/survey/take/?sid=81087