0% found this document useful (0 votes)

81 views

Notes For SAS Programming Fall2009

This document provides an overview and introduction to SAS programming. It discusses why SAS is useful, where to access SAS, and outlines the basic structure and logic of SAS programs. Key points covered include being able to process large datasets, generate summary statistics and graphs, and the widespread use of SAS in government and private sectors. The document also reviews setting up SAS programs, reading in and cleaning data, performing summary analyses, and combining and analyzing multiple datasets.

Uploaded by

Gowtham Sp

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

81 views

Notes For SAS Programming Fall2009

Uploaded by

Gowtham Sp

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

You are on page 1/ 88

Notes for SAS programming

Econ424
Fall 2009

Why SAS?

Able to process large data set(s)

Easy to cope with multiple variables

Able to track all the operations on the data set(s)

Generate systematic output

Summary statistics
Graphs
Regression results

Most government agencies and private sectors use

SAS

Where to find SAS?

The MS Window version of SAS is only available on campus

class website, computer resources, On-campus computer lab
list computer lab location, hours and software
You can access SAS remotely via glue.umd.edu, but you cannot use the
interactive windows
use a secured telnet (e.g. ssh) to remotely login glue.umd.edu with your
directory ID and password
type tap sas to tell the system that you want to access SAS
use any text editor (say pico) to edit your sas program (say myprog.sas). You
can also create the text-only .sas file in your PC and (securely) ftp it into
glue.
In glue, type sas myprog.sas & will send the sas program to run in the
background.
The computer will automatically generate myprog.log to tell you how each
command runs in sas. If your program produces any output, the output will
be automatically saved in myprog.lst. All in the same directory as your .sas
file.

Roadmap

Thinking in SAS
Basic rules
Read in data
Data cleaning commands
Summary statistics
Combine two or more datasets
Hypothesis testing
Regression

Thinking in SAS
What is a program?
Algorithm, recipe, set of instructions

How is programming done in SAS?

SAS is like programming in any language:
Step by step instructions
Can create your own routines to process data
Most instructions are set up in a logical manner

SAS is NOT like other languages:

Some syntax is peculiar to SAS
Written specifically for statistics so it isnt all-purpose
Canned processes that you cannot edit nor can you see the
code

Thinking in SAS

Creating a program
What is your problem? (take project 3 as an example)
How can you find a solution?
What steps need to be taken to find an answer?
Do I need to read in data?
What variables do I need?
Where is the data?
What format is the data in?
How do I need to clean the data?
Are there outliers?
Are there any unexpected values in the data?
How do I need to transform the data?
Are the variables in the form that I need?

Basic rules (1) organize files

.sas program file

.log notes, errors, warnings
.lst output
.sas7bdat data file
library a cabinet to put data in
Default: Work library
temporary, erased after you close the session
Permanent library
libname mylib m:\;
mylib.mydata
= a sas data file named mydata in library mylib

run and recall .sas

Basic rules (2) -- program

every command ends with ;
format does not matter
if x=1 then y=1; else y=2; is the same as
if x=1
then y=1;
else y=2;

case insensitive
comment
* this is comment;
/* this is comment */;

Basic rule (3) variable

Type
numeric (default, 8 digit, . stands for missing value)
character ($, default 8 digit, blank stands for missing)

Variable names
<=32 characters if SAS 9.0 or above
<=8 characters if SAS 8 or below
case insensitive

Must start with letter or _

_name, my_name, zip5, u_and_me
-name, my-name, 5zip, per%, u&me, my@w, my$sign

Basic rules (4) data step

Data step

create a new data set called newdata in

the temporary library

DATA newdata;
use the data set called proj3rawdata
set proj3rawdata;
in the temporary library
fracuninsured=uninsured/total;
percentuninsured=fracuninsured*100;
run;
Define new variables
input data
proj3rawdata
obs1
obs2

obs n

define
fracuninsured
define
percentuninsured

output data
newdata
obs1

obs n

Basic rules (5) proc step

PROC step
Action

PROC PRINT data=newdata;

var fracuninsured percentuninsured;
title print out new data;
Signal the end of PROC step,
run;
could be ignored if this is
followed by a Data or Proc step

Data
source

Read in data (1) table editor

Tools table editor
choose an existing library or define a new
library
rename variable
save as

Read in data (2) by program

Data format

Datalines (enter the data in the program)

Existing data text, comma or tab delimited
Existing data text, fixed width
From an existing excel file

The following several slides wont be

covered in class. But you are welcome to
use them by yourselves.

Read in data (2) -- datalines

data testdata1;
infile datalines;
input id height weight gender $ age;
datalines;
1 68 144 M 23
2 78 . M 34
No ; until you finish all
3 62 99 F 37
the data lines
;
/* you only need one semicolon at the end of all data lines, but the
semicolon must stand alone in one line */
proc contents data=testdata1; run;
proc print data=testdata1; run;

Read in data (3) more datalines

/* read in data in fixed columns */
data testdata1;
infile datalines;
input id 1 height 2-3 weight 4-6 gender $7 age 8-9;
datalines;
168144M23
278 M34
36299 F37
;

Read in data (4) more datalines

data testdata1;
infile datalines;
input id : 1. height : 2. weight : 3. gender : $1. age : 2.;
datalines;
1 68 144 M 23
2 78 . M 34
3 62 99 F 37
;

Read in data (4) more datalines

*alternatively;
data testdata1;
infile datalines;
informat id 1. height 2. weight 3. gender $1. age 2.;
input id height weight gender age;
datalines;
1 68 144 M 23
2 78 . M 34
3 62 99 F 37
;

Read in data (5) .csv

data testdata1;
infile datalines dlm=, dsd missover;
/* what if you do not have dsd and/or missover */
input id height weight gender $ age ;
datalines;
1, 68, 144, M, 23
2, 78, , M, 34
3, 62, 99, F, 37
; /* what if you forget to type 23 */
run;

Read in data (6) .csv file

/* save the project 3 cleaned raw data in a comma delimited file (.csv) in M:\
before running the following codes */
libname mylib "M:\";
Restrict your filename <=8 characters
filename mycsv "M:\project3-rawdata-cleaned.csv";
data mylib.proj3rawdata;
infile mycsv firstobs=2 dlm=',' dsd missover lrecl=900;
Be consistent with yourself
input year: 4. state: $2. total insured uninsured;
run;
proc contents data=mylib.proj3rawdata;
run;
Be consistent with yourself
proc print data=mylib.proj3rawdata;
run;

Note the options in the infile command!

Read in data (7) from excel

filename myexcel M:\project3-rawdata-cleaned.xls;
proc import datafile=myexcel out=proj3rawdata2
DBMS=excel replace;
run;
No ; until
data proj3rawdata3;
the end of
Be
consistent
with
yourself
set proj3rawdata2;
the whole
fracuninsured=uninsured/total;
sentence
percentuninsured=fracuninsured*100;
run;
proc contents data=proj3rawdata2; run;
proc print data=proj3rawdata2; run;
proc contents data=proj3rawdata3; run;
proc print data=proj3rawdata3; run;

Read in data (7) from excel

Be careful !
SAS will read the first line as variable names, and assume
the raw data start from the second row.
SAS assigns numeric and character type automatically.
Sometime it does make mistake.

Data cleaning (1) if then

Format:
IF condition THEN action;
ELSE IF condition THEN action;
ELSE action;
Note:
(1) the if-then-else can be nested as many as you
want
(2) if you need multiple actions instead of one
action, use DO; action1; action2; END;

Data cleaning (1) if then

=
~=
>
<
>=
<=
in

or
EQ means equals
or
NE means not equal
or
GT means greater than
or
LT means less than
or
GE means greater than or equal
or
LE means less than or equal
means subset

if gender in (M, F) then ..;

Multiple conditions: AND (&), OR(|)

Data cleaning (1) if then

*reading in program of proj3rawdata3 is on page 21;
data proj3rawdata3;
set proj3rawdata3;
IF fracuninsured<0.15 THEN uninsuregrp=0;
ELSE uninsuregrp=1;
run;
proc contents data=proj3rawdata3; run;
proc print data=proj3rawdata3; run;
Note: (1) the code is less efficient if you replace ELSE ..; with
IF fracuninsured>=0.15 THEN ..;
(2) missing value is always counted as the smallest negative, so
fracuninsured=. will satisfy the condition fracuinsured<0.15. If you want
to ignore the missing obs set the condition as 0<=fracuninsured<0.15.

Data cleaning (1) if then

* Multiple actions in each branch;
data proj3rawdata3;
set proj3rawdata3;
IF fracuninsured<0.15 AND uninsured>1000000 THEN DO;
uninsuredgrp=0; uninsuredpop=over 1 million';
END;
the do-end pair
ELSE DO;
acts as brackets
uninsuredgrp=1; uninsuredpop=less than 1 million';
END;
run;
proc print data=proj3rawdata3; run;

Data cleaning (1) if then

*Use if commands to choose a subsample;
data proj3subsample; /* note here we generate a new data set */
set proj3rawdata3;
IF fracuninsured=. Then delete;
If fracuninsured<=0.1;
run;
proc print data=proj3subsample; run;

Data cleaning (1) exercise

still use proj3rawdata.
define newgrp = 1 if fracuninsured <0.1 (low)
2 if 0.1<=fracuninsured<0.15 (mid-low)
3 if 0.15<=fracuninsured<0.2 (mid-high)
4 if fracuninsured>=0.2 (high).

Data cleaning (1) exercise answer

data proj3rawdata3;
set proj3rawdata3;
if fracuninsured<0.1 then newgrp=1;
else if fracuninsured<0.15 then newgrp=2;
else if fracuninsured<0.2 then newgrp=3;
else newgrp=4;
run;
proc contents data=proj3rawdata3; run;
proc print data=proj3rawdata3; run;
Question: What if one observation has fracuninsured=.?

Save data
* Save in sas format;
libname mylib M:\;
data mylib,proj3rawdata3;
set proj3rawdata3;
run;
* Export data to excel;
Proc export data=proj3rawdata3
outfile=M:\proj3data-fromsas.xls
dbms=excel replace;
Run;

No ; here

You can also export a sas data file into a comma delimited text file if you write
dbms=csv.

Pages 31-34 are optional material for

data cleaning.
They are not required, but you may
find them useful in the future.
We skip them in the regular class.

Data cleaning (2)

convert variable type
Numeric to character:
age1=put(age, $2.);
age and age1 have the same contents but different formats
Character to numeric:
age2=input(age1, 1.);
now age and age2 are both numeric, but age2 is chopped at the first
digit
Take a sub string of a character
age3=substr(age1,2,1);
now age3 is a sub string of age1, starting from the second digit of
age1 (the meaning of 2) and having one digit in total (the meaning
of 1).

Data cleaning (2) - example

* we want to convert studid 012345678 to 012-345-678;
data testdata2;
infile datalines;
input studid : 9. studname : $1.;
datalines;
012345678 A
135792468 B
009876543 C
;
proc print; run;
data testdata2;
set testdata2;
if studid LT 1E+7 then studid1= '00||compress(put(studid, $9.));
else if 1E+7 LE studid LT 1E+8 then studid1='0'||compress(put(studid,
$9.));
else studid1= put(studid, $9.);
studid2=substr(studid1,1,3)||'-'||substr(studid1,4,3)||'-'||
substr(studid1,7,3);
proc print; run;

Data cleaning (2) - exercise

You have the following data, variables in sequence are
SSN, score1, score2, score3, score4, score5:
123-45-6789 100 98 96 95 92
344-56-7234 69 79 82 65 88
898-23-1234 80 80 82 86 92
Calculate the average and standard deviation of the five
scores for each individual. Use if-then command to find
out who has the highest average score, and report his SSN
without dashes.

Data summary roadmap

Proc contents variable definitions

Proc print raw data
Proc format make your print look nicer
Proc sort sort the data
Proc means basic summary statistics
Proc univariate detailed summary stat
Proc freq frequency count
Proc chart histogram
proc plot scatter plot

proc format (1)

*Continue the data cleaning exercise on page 29;
Defining group as a
data proj3rawdata3;
numeric variable will
save space
set proj3rawdata3;
if fracuninsured<0.1then newgrp=1;
else if fracuninsured<0.15 then newgrp=2;
else if fracuninsured<0.2 then newgrp=3;
else newgrp=4;
run;

proc format (2)

proc format;
value newgroup 1=low
no ; here until
2=mid-low
3=mid-high
the end of the
4=high;
value command
run;
proc print data=proj3rawdata3;
format newgrp newgroup. fracuninsured 4.2;
title report new group in words;
no ; here until
label fracuninsured=fraction uninsured
the end of the
newgrp=new group 1-4;
label command
var fracuninsured newgrp;
run;

proc sort
proc sort data=proj3rawdata3;
by year state;
run;
proc sort data=proj3rawdata3
out=proj3rawdata3_sorted;
by year descending fracuninsured;
run;
* note that missing value is always counted as the
smallest;

proc means and proc univariate

proc means data=proj3rawdata3;
class newgrp;
var insured uninsured fracuninsured;
run;

By default, proc means report

mean, stdev, min, max
Could choose what to report:
proc means data=proj3rawdata3
n mean median;

proc sort data=proj3rawdata3; by newgrp; run;

proc univariate data=proj3rawdata3;
by newgrp;
var insured uninsured fracuninsured;
run;

By default, proc univariate

report median, and many
other statistics

Notes on proc means and proc univariate

*if you do not use class or by command, the
statistics are based on the full sample. If you use
class or by var x, the statistics are based on the
subsample defined by each value of var x.
*You can use class or by in proc means, but only by
in proc univariate;
*whenever you use by var x, the data set should
be sorted by var x beforehand;

proc means and proc univariate

allow multiple groups
data proj3rawdata3;
set proj3rawdata3;
if totalpop<6000000 then popgrp=low;
else popgrp=high;
run;
proc means data=proj3rawdata3;
class newgrp popgrp;
var fracuninsured;
run;
proc sort data=proj3rawdata3;
by newgrp popgrp;
run;
proc univariate data=proj3rawdata3;
by newgrp popgrp;
var fracuninsured;
run;

proc freq
* Remember we already generate a variable called newgrp to
indicate categories of fraction uninsured and a variable called
popgrp to indicate categories of population size;
proc freq data=proj3rawdata3;
tables newgrp
One dimension frequency table
popgrp
newgrp*popgrp;
run;

Two-dimension frequency table

proc chart histogram for

categorical variables
proc chart data=proj3rawdata3;
title histogram for newgrp;
vbar newgrp;
run;
proc chart data=proj3rawdata3;
title frequency by two variables;
vbar newgrp / group=popgrp;
run;

proc chart histogram for

continuous variable
proc chart data=proj3rawdata3;
title histogram for continuous variable;
vbar fracuninsured;
run;
proc chart data=proj3rawdata3;
title histogram with specific midpoints;
vbar fracuninsured / midpoints=0 to 1 by 0.05;
run;

proc plot scatter plot

proc plot data=proj3rawdata3;
title scatter plot of fracuninsured and
totalpop;
plot fracuninsured*totalpop;
run;

scatter plot is less informative for

categorical variables
proc plot data=proj3rawdata3;
title scatter plot of newgrp and popgrp;
plot newgrp*popgrp;
run;

fancy proc means

proc means data=proj3rawdata3;
class newgrp popgrp;
var uninsured fracuninsured;
output
out = summary1
mean = avguninsured avgfracuninsured;
run;
proc print data=summary1;
run;

The following page may be

useful in practice, but I am not
going to cover it in class.

some summary stat. in proc print

* Assume we have already defined newgrp and
popgrp in proj3rawdata3;
proc sort data=proj3rawdata3; by popgrp; run;
proc print data=proj3rawdata3 n;
where fracuninsured>=0.1;
by popgrp;
sum totalpop;
var totalpop insured uninsured fracuninsured;
run;

How to handle multiple data sets?

Add more observations to an existing data
and the new observations follow the same
data structure as the old one append
Add more variables to an existing data and
the new variables refer to the same
subjects as in the old data merge
Sometimes we may need to change data
structure to fit in append or merge .

merge and append

proj3rawdata3: year state totalpop fracuninsured newgrp popgrp
2009 MA 6420947 0.0548
summary1:

newgrp
1

high

popgrp avguninsured avgfracuninsured

high

7500000

0.073

merged:
year state totalpop .fracuninsured newgrp popgrp avguninsure avgfracuninsured
2009 MA 6420947 0.0548

high

7500000

0.073

appended:
year state totalpop .fracuninsured newgrp popgrp avguninsure avgfracuninsured
2009 MA 6420947 0.0548
1
high
.
.
.
.
.
1
high
7500000
0.073

merge two datasets

proc sort data=proj3rawdata3;
by newgrp popgrp;
run;
proc sort data=summary1;
by newgrp popgrp;
run;
data merged;
merge proj3rawdata3 (in=one) summary1 (in=two);
by newgrp popgrp;
if one=1 & two=1;
run;
What if this line is

if one=1 OR two=1;?

Keep track of matched and

unmatched records
data allrecords;
merge proj3rawdata3 (in=one) summary1 (in=two);
by newgrp popgrp;
myone=one;
SAS will drop variables
mytwo=two;
one and two
if one=1 or two=1;
automatically at the end of
run;
the DATA step. If you want
proc freq data=allrecords;
to keep them, you can copy
them into new variables
tables myone*mytwo;
myone and mytwo
run;

be careful about merge!

always put the merged data into a new data set
must sort by the key variables before merge
ok for one-to-one, multi-to-one, one-to-multi, but
no good for multi-to-multi
be careful of what records you want to keep, and
what records you want to delete
what if variable x appears in both datasets, but x is
not in the by statement?
after the merge x takes the value defined in the last
dataset of the merge statement

append
data appended;
set proj3rawdata3 summary1;
run;
proc print data=appended;
run;
proc print data=merged;
run;

Class example of merge and

append: reshape and summarize
Task1: reshape proj3rawdata3 from long to wide
Task2: generate average fracuninsured per state and merge it
back to the main data
Source format of Proj3rawdata3 (long):
year state totalpop fracuninsured .
2009 MA 6420947 0.0548
.
2009 HI 1257622 0.078
.
.
2008 MA 6339513 0.0536
.
2008 HI 1267409 0.075
..
Target format (wide)
state totalpop2009 fracuninsured2009 . .. Totalpop2008
MA 6420947
0.0548 .
6339513
HI 1257622
0.078
.
1267409

fracuninsured2008 ..
0.0536
0.075 ..

Main data issues

From long to wide, variable names are
different except for the key variable (state)
We can generate average fracuninsured per
state either before or after the reshape, but
the merge code will be different depending
on when we compute the average
fracuninsured

Step 1 to reshape:
generate a sub-sample for each year,
so that we have:
subsample2009
subsample2008
.
subsample2003

focus on 2009
data subsample2009;
set proj3rawdata3;
if year=2009;
run;
data subsample2009;
set subsample2009;
rename totalpop=totalpop2009;
rename insured=insured2009;
rename uninsured=uninsured2009;
rename fracuninsured=fracuninsured2009;
drop year;
run;

Same for 2008

data subsample2008;
set proj3rawdata3;
if year=2008;
run;
data subsample2008;
set subsample2008;
rename totalpop=totalpop2008;
rename insured=insured2008;
rename uninsured=uninsured2008;
rename fracuninsured=fracuninsured2008;
drop year;
run;

Step 2 to reshape: merge each years

subsample by state
proc sort data=subsample2009; by state; run;

proc sort data=subsample2003; by state; run;

data reshaped;
merge subsample2009 subsample2008 subsample2007
subsample2006 subsample2005 subsample2004
subsample2003;
by state;
run;
proc print data=reshaped; run;

Generate average fracuninsured

by state
proc means data=proj3rawdata3;
class state;
var fracuninsured;
output out=avgperstate
mean=avgfrac_bystate;
run;

Merge the data file avgperstate

back to reshaped
proc sort data=reshaped;
by state;
run;
proc sort data=avgperstate;
by state;
run;
data reshaped_withavg;
merge reshaped (in=one) avgperstate (in=two);
by state;
myone=one;
mytwo=two;
if one=1 | two=1;
run;
proc print data=reshaped_withavg; run;

Check observations in
reshaped_withavg
proc freq data=reshaped_withavg;
tables myone*mytwo;
run;

mean comparison: two groups

Example: does the average fracuninsured differ between 2008 and 2009?
H0: mean of fracuninsured2008 = mean of fracuninsured2009.
H1: mean of fracuninsured2008 not equal to mean of fracuninsured2009.
This is a two-tail mean-comparison test between the 2008 sample and the
2009 sample.
The test result will be different if
(1)
(2)

we treat 2008 and 2009 as two independent samples; or

We treat 2008 and 2009 as matched pairs (matched by state).

mean comparison: two groups

SAS performs mean comparison in a regression framework.
H0: mean of fracuninsured2008 = mean of fracuninsured2009.
H1: mean of fracuninsured2008 not equal to mean of fracuninsured2009.
Step 1: focus on the subsample that has 2008 and 2009 data only.
Step 2: create a binary variable dummy2009=1 if year=2009, 0 if year=2008.
Step 3: depend on whether 2008 and 2009 are independent samples or matched pairs.
If independent samples, regress fracuninsured as:
fracuninsured = a + b* dummy2009 + error
If matched pairs, regress fracuninsured as:
fracuninsured = a + b * dummy2009
+ c1* dummy_AL + c2* dummy_AK
+ +c51*dummy_WY + error

mean comparison: two groups as

two independent samples
* Focus on 2008 and 2009 data only;
Data subsample0809;
Set proj3rawdata3;
If year=2008 or year=2009;
If year=2009 then dummy2009=1; else dummy2009=0;
Run;
* Treat 2008 and 2009 as two independent samples;
Proc reg data=subsample0809;
Model fracuninsured=dummy2009;
Run;
The difference between 2008 and 2009 is captured in the coefficient of dummy2009.
So the hypothesis test is equivalent to:
H0: coefficient of dummy2009=0.
H1: coefficient of dummy2009 not equal to 0.

mean comparison: two groups as

matched pairs (matched by state)
* Focus on 2008 and 2009 data only;
data subsample0809;
set proj3rawdata3;
if year=2008 or year=2009;
if year=2009 then dummy2009=1; else dummy2009=0;
run;
* Treat 2008 and 2009 as matched pairs (matched by state);
proc glm data=subsample0809;
class state;
model fracuninsured =dummy2009 state/solution;
run;
The within-state difference between 2008 and 2009 is captured in the coefficient of dummy2009.
So the hypothesis test is equivalent to:
H0: coefficient of dummy2009=0.
H1: coefficient of dummy2009 not equal to 0.

mean comparison: more than two groups

Example: does the average fracuninsured differ between any two years
in our data? (here year = 2003, 2004, 2009)
The test result will be different if
(1)
(2)

we treat year t and year t as two independent samples; or

We treat year t and year t as matched pairs (matched by state).

mean comparison: every year as an

independent sample
* Treat every year as an independent sample;
* Now we need to use the whole data of proj3rawdata3;
proc glm data=proj3rawdata3;
class year;
model fracuninsured=year;
means year/waller;
means year/lsd cldiff;
run;

mean comparison: every year as

matched pairs (matched by state)
* Treat every year as matched by state;
* First define dummies for each year;
data proj3rawdata3;
set proj3rawdata3;
if year=2004 then dummy2004=1; else dummy2004=0;
if year=2005 then dummy2005=1; else dummy2005=0;

if year=2009 then dummy2009=1; else dummy2009=0;

run;
proc glm data=proj3rawdata3;
class state;
model fracuninsured=dummy2004 dummy2005 dummy2006 dummy2007 dummy2008
dummy2009 state/solution;
run;
The coefficient of dummy2004 captures the difference between 2003 and 2004.
The coefficient of dummy2005 captures the difference between 2003 and 2005.

notes on mean comparison

The logic of mean comparison is the same as in Excel

Be careful about one-tail and two-tail tests.

The standard SAS output of coefficient t-stat and p-value are based on a two-tail test of H0: coeff=0,
but could be used for a one-tail test if we compare 1-alpha vs. p-value/2 instead of p-value.

Comparison across more than two groups (as independent samples)

H0: all groups have the same mean
F-test of whole regression
OR
H0: group x and group y has the same mean
waller or lsd statistics

Comparison across more than two groups (as matched pairs) requires specific test on regression
coefficients.
H0: 2003 = 2004 test the coefficient of dummy2004=0 because 2003 is set as the benchmark
H0: 2004=2005 test the coefficient of dummy2004 = coefficient of dummy2005.

Explicit hypothesis tests

in proc reg and proc glm
* Treating each year as independent samples;
Proc reg data=proj3rawdata3;
Model fracuninsured=dummy2004 dummy2005 dummy2006 dummy2007 dummy2008
dummy2009;
Test dummy2008=0;
Test fracuninsured2003=fracuninsured2008;
Test dummy2008=dummy2009;
Run;
Test fracuninsured2008=fracuninsured2009;
* Treating each year as matched by state;
Proc glm data=proj3rawdata3;
Class year state;
Model fracuninsured=year state/solution;
Contrast test 2003 vs. 2008
year 1 0 0 0 -1 0;
Contrast test 2008 vs. 2009
year 0 0 0 0 1 -1;
Test fracuninsured2003=fracuninsured2008;
Run;

Test fracuninsured2008=fracuninsured2009;

In class exercise
for mean comparison
Main question:
compare fracuninsured in west, midatlantic and everywhere else, where
west = CA, WA, OR
midatlantic = DE, DC, MD, VA
step 0: define west, midatlantic and everywhereelse
exercise 1: compare west and midatlantic
1(a). as two independent samples
1(b). consider the match by year
exercise 2: compare west, midatlantic, and everywhere else
2(a). as three independent samples;
2(b). consider the match by year;

regression in SAS
Question: how do fracuninsured vary by total population of a state?
* model: fracuninsured=a+b*totalpop+error;
proc reg data=proj3rawdata3;
model fracuninsured=totalpop;
run;
* Add year fixed effects;
* Model: fracuninsured=a+b*totalpop+c1*dummy2004
+c2*dummy2005 + +c51*dummy2009+error;
proc glm data=proj3rawdata3;
class year;
model fracuninsured=totalpop year/solution;
run;

A comprehensive example
A review of
1. readin data
2. summary statistics
3. mean comparison
4. regression
reg-cityreg-simple.sas in N:\share\

A Case Study of Los Angeles Restaurants

Nov. 16-18, 1997 CBS 2 News Behind the Kitchen Door

January 16, 1998, LA county inspectors start issuing hygiene

grade cards

A grade if score of 90 to 100

B grade if score of 80 to 89

C grade if score of 70 to 79

score below 70 actual score shown

Grade cards are prominently displayed

in restaurant windows

Score not shown on grade cards

Take the idea to data

Research Question:
Does better information lead to better hygiene quality?
better
Information

better
quality

regulation

hygiene
scores

by county
by city

Data complications
(blue font indicates our final choices)
Unit of analysis:
individual restaurant? city? zipcode? census tract?
Unit of time:
each inspection? per month? per quarter? per year?
Define information:
county regulation? city regulation? the date of passing the regulation?
days since passing the regulation? % of days under regulation?
Define quality:
average hygiene score? the number of A restaurants? % of A
restaurants?

How to test the idea?

Regression:
quality = +*information+error
+something else?
Something else could be:
year trend, seasonality, city specific effects, .

real test
reg-cityreg-simple.sas in N:\share\

Questions
How many observations in the sample?
log of the first data step, or output from proc contents

How many variables in the sample? How many

are numerical, how many are characters?
Output from proc contents

How many percentage of restaurants have A

quality in a typical city-month?
Output from proc means, on per_A

Questions

What is the difference between cityreg and ctyreg? We know county

regulation came earlier than city regulation, is that reflected in our
data?
Yes, cityreg<=ctyreg in every observation
We can check this in proc means for cityreg and ctyreg, or add a proc
print to eyeball each obs

What is the difference between cityreg and citymper? What is the

mean of cityreg? What is the mean of citymper? Are they consistent
with their definitions?
The unit of cityreg is # of days, so it should be a non-negative integer
The unit of citymper is % of days, so it should be a real number between
0 and 1
To check this, we can add a proc means for cityreg and citymper

Questions
Economic theories suggest quality be higher after
the regulation if regulation gives consumers better
information. Is that true?
The summary statistics reported in proc means
(class citym_g or ctym_g) show the average
percentage of A restaurants in different
regulation environments.
Rigorous mean comparison tests are done in
proc glm with waller or lsd options.

Questions
Summary statistics often reflect many economic factors, not
only the one in our mind. That is why we need regressions.
Does more regulation lead to higher quality?
is the coefficient of city regulation positive and
significantly different from zero? (proc reg)
is the coefficient of county regulation positive and
significantly different from zero? (proc reg)
Do we omit other sensible explanations for quality
changes? What are they? (proc glm, year, quarter, city)

Count duplicates (not required)

http://support.sas.com/ctx/samples/index.jsp?sid=67&tab=c
ode
data dups nodups ;
set clasdata ;
by name class ;
/* If the combination of NAME and CLASS is in the data
set once, output NODUPS, else output DUPS. */
if first.class and last.class then output nodups ;
else output dups ;
run;

course evaluation
University wide:
www.CourseEvalUM.umd.edu
TTclass in particular: (password plstt)
www.surveyshare.com/survey/take/?sid=81087

54 Marketo Questions
No ratings yet
54 Marketo Questions
23 pages
Learning SAS by Example A Programmers Guide Answers
50% (2)
Learning SAS by Example A Programmers Guide Answers
42 pages
Oracle CPQ 2021: Replace (STR, Old, New, (N) )
No ratings yet
Oracle CPQ 2021: Replace (STR, Old, New, (N) )
30 pages
Sas Cheat Sheet
No ratings yet
Sas Cheat Sheet
3 pages
Applied Statistics and The SAS Programming 5th Edition
0% (2)
Applied Statistics and The SAS Programming 5th Edition
44 pages
Sas 1
100% (2)
Sas 1
50 pages
Step by Step Procedure - Yuktdhara
No ratings yet
Step by Step Procedure - Yuktdhara
51 pages
Notes For SAS Programming Fall2009
No ratings yet
Notes For SAS Programming Fall2009
88 pages
Notes
No ratings yet
Notes
163 pages
Sascheatsheet 170401221255
100% (1)
Sascheatsheet 170401221255
29 pages
2 - Getting Your Data Into SAS The Basics
No ratings yet
2 - Getting Your Data Into SAS The Basics
54 pages
SAS 201 - Copy - Copy
No ratings yet
SAS 201 - Copy - Copy
17 pages
SAS 101 - Copy (2)
No ratings yet
SAS 101 - Copy (2)
17 pages
Week 03 - Chapters 5 & 6
No ratings yet
Week 03 - Chapters 5 & 6
60 pages
Here Is The Output Produced by The Proc Print Statement Above
No ratings yet
Here Is The Output Produced by The Proc Print Statement Above
6 pages
SAS 101 - Copy (3)
No ratings yet
SAS 101 - Copy (3)
17 pages
SAS 201 - Copy - Copy (4)
No ratings yet
SAS 201 - Copy - Copy (4)
17 pages
SAS 101 - Copy (5) - Copy
No ratings yet
SAS 101 - Copy (5) - Copy
17 pages
SAS: What You Need To Know To Write A SAS Program: Data Definition and Options Data Step Procedure(s)
No ratings yet
SAS: What You Need To Know To Write A SAS Program: Data Definition and Options Data Step Procedure(s)
8 pages
Summary Syntax SAS
No ratings yet
Summary Syntax SAS
6 pages
Sas Programming
No ratings yet
Sas Programming
30 pages
SAS 101 - Copy (4)
No ratings yet
SAS 101 - Copy (4)
17 pages
Extra Interview Question On Programs
No ratings yet
Extra Interview Question On Programs
11 pages
SAS 101 - Copy (6) - Copy
No ratings yet
SAS 101 - Copy (6) - Copy
17 pages
Write SAS® Code To Generate Another SAS® Program A Dynamic Way To Get Your Data Into SAS®
No ratings yet
Write SAS® Code To Generate Another SAS® Program A Dynamic Way To Get Your Data Into SAS®
8 pages
Day 1
No ratings yet
Day 1
13 pages
S A S Guide
No ratings yet
S A S Guide
33 pages
SAS Material
No ratings yet
SAS Material
75 pages
Rsudio Problems
No ratings yet
Rsudio Problems
27 pages
SAS Programming by Example (1) : Chapter 1 Input and Infile
No ratings yet
SAS Programming by Example (1) : Chapter 1 Input and Infile
10 pages
Reading Data in SAS
No ratings yet
Reading Data in SAS
9 pages
DATA Cars1
No ratings yet
DATA Cars1
6 pages
5 Creating SAS Data Sets From External Files
No ratings yet
5 Creating SAS Data Sets From External Files
10 pages
Debugging Code & Data Import/Export
No ratings yet
Debugging Code & Data Import/Export
16 pages
Download Complete Cody s Data Cleaning Techniques Using SAS 2nd ed Edition Ron Cody PDF for All Chapters
No ratings yet
Download Complete Cody s Data Cleaning Techniques Using SAS 2nd ed Edition Ron Cody PDF for All Chapters
51 pages
Business Econometrics Using SAS Tools (BEST) : Class II - Introducing SAS Code
No ratings yet
Business Econometrics Using SAS Tools (BEST) : Class II - Introducing SAS Code
15 pages
SAS Programming Skills
100% (1)
SAS Programming Skills
19 pages
CH 2. DATA Step
No ratings yet
CH 2. DATA Step
30 pages
Sas Data Statement
No ratings yet
Sas Data Statement
17 pages
base-programming-ref-sheet
No ratings yet
base-programming-ref-sheet
4 pages
Sas Sample Syntax
No ratings yet
Sas Sample Syntax
34 pages
A Hands-On Introduction To SAS Programming: Casey Cantrell, Clarion Consulting, Los Angeles, CA
No ratings yet
A Hands-On Introduction To SAS Programming: Casey Cantrell, Clarion Consulting, Los Angeles, CA
21 pages
SAS Tips
No ratings yet
SAS Tips
34 pages
Introduction To SAS
No ratings yet
Introduction To SAS
6 pages
SAS Training: SAS Environment and Concepts of Libraries
No ratings yet
SAS Training: SAS Environment and Concepts of Libraries
99 pages
SAS Basics - Part 1
No ratings yet
SAS Basics - Part 1
30 pages
HPH 562 Data Management & Informatics: Lecture 2: Getting Your Data Into SAS
No ratings yet
HPH 562 Data Management & Informatics: Lecture 2: Getting Your Data Into SAS
23 pages
Sas
No ratings yet
Sas
36 pages
Cody s Data Cleaning Techniques Using SAS 2nd ed Edition Ron Cody - The ebook is ready for instant download and access
100% (1)
Cody s Data Cleaning Techniques Using SAS 2nd ed Edition Ron Cody - The ebook is ready for instant download and access
46 pages
Examination Content: Required Exam
No ratings yet
Examination Content: Required Exam
18 pages
SaS Notes
No ratings yet
SaS Notes
8 pages
LBSIM Business Analytics Slides - Day 6 (1)
No ratings yet
LBSIM Business Analytics Slides - Day 6 (1)
33 pages
DATALINES, Sequential Files, CSV, HTML and More - Using INFILE and INPUT Statements To Introduce External Data Into The SAS System
No ratings yet
DATALINES, Sequential Files, CSV, HTML and More - Using INFILE and INPUT Statements To Introduce External Data Into The SAS System
18 pages
SAS Ex
No ratings yet
SAS Ex
34 pages
Algorithms and Data Structures: An Easy Guide to Programming Skills
From Everand
Algorithms and Data Structures: An Easy Guide to Programming Skills
Rigdon Jonathan
No ratings yet
Amazing Java: Learn Java Quickly
From Everand
Amazing Java: Learn Java Quickly
Andrei Besedin
No ratings yet
Inspiring Powershell Articles
From Everand
Inspiring Powershell Articles
Murat Yildirimoglu
No ratings yet
SAS Programming Guidelines Interview Questions You'll Most Likely Be Asked
From Everand
SAS Programming Guidelines Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
SAS Interview Questions You'll Most Likely Be Asked
From Everand
SAS Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
Rust Package 100 Knocks: One-Hour Mastery Series 2024 Edition
From Everand
Rust Package 100 Knocks: One-Hour Mastery Series 2024 Edition
Kanto
No ratings yet
SAP interface programming with RFC and VBA: Edit SAP data with MS Access
From Everand
SAP interface programming with RFC and VBA: Edit SAP data with MS Access
Karl Josef Hensel
No ratings yet
Footprinting, Reconnaissance, Scanning and Enumeration Techniques of Computer Networks
From Everand
Footprinting, Reconnaissance, Scanning and Enumeration Techniques of Computer Networks
Dr. Hidaia Mahmood Alassouli
No ratings yet
C# Package Mastery: 100 Essentials in 1 Hour - 2024 Edition
From Everand
C# Package Mastery: 100 Essentials in 1 Hour - 2024 Edition
Tenko
No ratings yet
Spanish Language Influences On Written English: Betty Rizzo and Santiago Villafane
No ratings yet
Spanish Language Influences On Written English: Betty Rizzo and Santiago Villafane
10 pages
200 Prep Questions: Downloads/200 - PMP - Sample - Questions PDF
No ratings yet
200 Prep Questions: Downloads/200 - PMP - Sample - Questions PDF
1 page
Fusion - Wikipedia
No ratings yet
Fusion - Wikipedia
3 pages
Very Large Scale Integration
No ratings yet
Very Large Scale Integration
3 pages
Hubble Space Telescope - Wikipedia
No ratings yet
Hubble Space Telescope - Wikipedia
3 pages
Lambda-CDM Model - Wikipedia
No ratings yet
Lambda-CDM Model - Wikipedia
2 pages
Semiconductor - Wikipedia
No ratings yet
Semiconductor - Wikipedia
2 pages
Diode - Wikipedia
No ratings yet
Diode - Wikipedia
3 pages
శ్రీ గురు చరిత్ర - రెండవ భాగం
No ratings yet
శ్రీ గురు చరిత్ర - రెండవ భాగం
36 pages
Planck Time - Wikipedia
No ratings yet
Planck Time - Wikipedia
2 pages
Nanomaterials For Nonlinear Optical (Nlo) Applications: A Review
No ratings yet
Nanomaterials For Nonlinear Optical (Nlo) Applications: A Review
11 pages
Currency
No ratings yet
Currency
3 pages
Comparative Study On Low-Power High-Performance Standard-Cell Flip-Flops
No ratings yet
Comparative Study On Low-Power High-Performance Standard-Cell Flip-Flops
9 pages
Low-Power and Area-Efficient Shift Register Using Pulsed Latches
No ratings yet
Low-Power and Area-Efficient Shift Register Using Pulsed Latches
16 pages
22.doc Front End
No ratings yet
22.doc Front End
58 pages
Low-Power and Area-Efficient Shift Register Using Pulsed Latch
No ratings yet
Low-Power and Area-Efficient Shift Register Using Pulsed Latch
5 pages
Analysis of Low-Power and Area-Efficient Shift Registers Using Pulsed Latch
No ratings yet
Analysis of Low-Power and Area-Efficient Shift Registers Using Pulsed Latch
5 pages
Windrose Pro User'S Guide: Windrose Pro Is A Software For Analysing and Drawing Directional Data
No ratings yet
Windrose Pro User'S Guide: Windrose Pro Is A Software For Analysing and Drawing Directional Data
98 pages
Syslib rm014 - en e
No ratings yet
Syslib rm014 - en e
52 pages
Data Merge InDesign
No ratings yet
Data Merge InDesign
14 pages
Paper 2 Databases
100% (2)
Paper 2 Databases
41 pages
Konect User Manual V2.3
No ratings yet
Konect User Manual V2.3
76 pages
GSTR 1
No ratings yet
GSTR 1
69 pages
Summer Internship Report
No ratings yet
Summer Internship Report
25 pages
Essential Admin MySQL5.5 1
No ratings yet
Essential Admin MySQL5.5 1
7 pages
Page-2
No ratings yet
Page-2
9 pages
Incident Report NG and Investigation: Oaky Creek
No ratings yet
Incident Report NG and Investigation: Oaky Creek
17 pages
LTMC 2308 Latest
No ratings yet
LTMC 2308 Latest
49 pages
GSTR 1
No ratings yet
GSTR 1
70 pages
Sales Force Data Loader
No ratings yet
Sales Force Data Loader
41 pages
DMV Lab 12
No ratings yet
DMV Lab 12
8 pages
9626_s24_qp_02
No ratings yet
9626_s24_qp_02
4 pages
12_computer_science_sp_09_solution
No ratings yet
12_computer_science_sp_09_solution
17 pages
Class 6 Pandas
No ratings yet
Class 6 Pandas
13 pages
CUSTOM
No ratings yet
CUSTOM
40 pages
PET User Manual
No ratings yet
PET User Manual
24 pages
External Tables in Oracle Database 11g
No ratings yet
External Tables in Oracle Database 11g
9 pages
How To Export An ASCII File Form The TSC2 Using Survey Controller
No ratings yet
How To Export An ASCII File Form The TSC2 Using Survey Controller
5 pages
Workforce Int MNGR v8 Advanced Int Prog (Practice)
No ratings yet
Workforce Int MNGR v8 Advanced Int Prog (Practice)
38 pages
GOHFER-3D-Basic-Frac-Design-Workflow
No ratings yet
GOHFER-3D-Basic-Frac-Design-Workflow
32 pages
Wildlife Sanctuary 1
No ratings yet
Wildlife Sanctuary 1
25 pages
Cbse Computer Science (3) (1)
No ratings yet
Cbse Computer Science (3) (1)
25 pages
CSCA08H Assignment 3
No ratings yet
CSCA08H Assignment 3
12 pages