0% found this document useful (0 votes)

127 views

Notes For SAS Programming Fall2009

The document provides notes on SAS programming. It discusses why SAS is useful for processing large datasets, summarizing data, and performing regression. It outlines where to find SAS on campus and how to access it remotely. It presents a roadmap for learning SAS, covering basic rules, reading in data, data cleaning, summary statistics, combining datasets, and regression. The document provides guidance on setting up SAS programs and working with variables, data steps, and proc steps.

Uploaded by

Puli Sreenivasulu

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

127 views

Notes For SAS Programming Fall2009

Uploaded by

Puli Sreenivasulu

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

You are on page 1/ 88

Notes for SAS programming

Econ424 Fall 2009

Why SAS?
Able to process large data set(s)

Easy to cope with multiple variables

Able to track all the operations on the data set(s) Generate systematic output Summary statistics Graphs

Regression results
Most government agencies and private sectors use SAS

Where to find SAS?

The MS Window version of SAS is only available on campus class website, computer resources, On-campus computer lab list computer lab location, hours and software You can access SAS remotely via glue.umd.edu, but you cannot use the interactive windows use a secured telnet (e.g. ssh) to remotely login glue.umd.edu with your directory ID and password type tap sas to tell the system that you want to access SAS use any text editor (say pico) to edit your sas program (say myprog.sas). You can also create the text-only .sas file in your PC and (securely) ftp it into glue. In glue, type sas myprog.sas & will send the sas program to run in the background. The computer will automatically generate myprog.log to tell you how each command runs in sas. If your program produces any output, the output will be automatically saved in myprog.lst. All in the same directory as your .sas file.

Roadmap
Thinking in SAS Basic rules Read in data Data cleaning commands Summary statistics Combine two or more datasets Hypothesis testing Regression

Thinking in SAS
What is a program?
Algorithm, recipe, set of instructions

How is programming done in SAS?

SAS is like programming in any language:
Step by step instructions Can create your own routines to process data Most instructions are set up in a logical manner

SAS is NOT like other languages:

Some syntax is peculiar to SAS Written specifically for statistics so it isnt all-purpose Canned processes that you cannot edit nor can you see the code

Thinking in SAS
Creating a program What is your problem? (take project 3 as an example) How can you find a solution? What steps need to be taken to find an answer? Do I need to read in data? What variables do I need? Where is the data? What format is the data in? How do I need to clean the data? Are there outliers? Are there any unexpected values in the data? How do I need to transform the data? Are the variables in the form that I need?

Basic rules (1) organize files

.sas program file .log notes, errors, warnings .lst output .sas7bdat data file library a cabinet to put data in
Default: Work library temporary, erased after you close the session Permanent library libname mylib m:\; mylib.mydata
= a sas data file named mydata in library mylib

run and recall .sas

Basic rules (2) -- program

every command ends with ; format does not matter
if x=1 then y=1; else y=2; is the same as if x=1 then y=1; else y=2;

case insensitive comment

* this is comment; /* this is comment */;

Basic rule (3) variable

Type
numeric (default, 8 digit, . stands for missing value) character ($, default 8 digit, blank stands for missing)

Variable names
<=32 characters if SAS 9.0 or above <=8 characters if SAS 8 or below case insensitive

Must start with letter or _

_name, my_name, zip5, u_and_me -name, my-name, 5zip, per%, u&me, my@w, my$sign

Basic rules (4) data step

Data step
create a new data set called newdata in the temporary library

DATA newdata; use the data set called proj3rawdata in set proj3rawdata; the temporary library fracuninsured=uninsured/total; percentuninsured=fracuninsured*100; run; Define new variables
input data proj3rawdata define fracuninsured define percentuninsured output data newdata

obs1
obs2 obs n

obs1
obs n

Basic rules (5) proc step

PROC step
Action Data source

PROC PRINT data=newdata; var fracuninsured percentuninsured; title print out new data; run; Signal the end of PROC step, could
be ignored if this is followed by a Data or Proc step

Read in data (1) table editor

Tools table editor choose an existing library or define a new library rename variable save as

Read in data (2) by program

Data format
Datalines (enter the data in the program) Existing data text, comma or tab delimited Existing data text, fixed width From an existing excel file

The following several slides wont be covered in class. But you are welcome to use them by yourselves.

Read in data (2) -- datalines

data testdata1; infile datalines; input id height weight gender $ age; datalines; 1 68 144 M 23 2 78 . M 34 No ; until you finish all 3 62 99 F 37 the data lines ; /* you only need one semicolon at the end of all data lines, but the semicolon must stand alone in one line */ proc contents data=testdata1; run; proc print data=testdata1; run;

Read in data (3) more datalines

/* read in data in fixed columns */ data testdata1; infile datalines; input id 1 height 2-3 weight 4-6 gender $7 age 8-9; datalines; 168144M23 278 M34 36299 F37 ;

Read in data (4) more datalines

data testdata1; infile datalines; input id : 1. height : 2. weight : 3. gender : $1. age : 2.; datalines; 1 68 144 M 23 2 78 . M 34 3 62 99 F 37 ;

Read in data (4) more datalines

*alternatively; data testdata1; infile datalines; informat id 1. height 2. weight 3. gender $1. age 2.; input id height weight gender age; datalines; 1 68 144 M 23 2 78 . M 34 3 62 99 F 37 ;

Read in data (5) .csv

data testdata1; infile datalines dlm=, dsd missover; /* what if you do not have dsd and/or missover */ input id height weight gender $ age ; datalines; 1, 68, 144, M, 23 2, 78, , M, 34 3, 62, 99, F, 37 ; /* what if you forget to type 23 */ run;

Read in data (6) .csv file

/* save the project 3 cleaned raw data in a comma delimited file (.csv) in M:\ before running the following codes */ Restrict your filename <=8 characters libname mylib "M:\"; filename mycsv "M:\project3-rawdata-cleaned.csv"; data mylib.proj3rawdata; Be consistent with yourself infile mycsv firstobs=2 dlm=',' dsd missover lrecl=900; input year: 4. state: $2. total insured uninsured; Be consistent with yourself run;
proc contents data=mylib.proj3rawdata; run; proc print data=mylib.proj3rawdata; run;

Note the options in the infile command!

Read in data (7) from excel

filename myexcel M:\project3-rawdata-cleaned.xls; proc import datafile=myexcel out=proj3rawdata2 DBMS=excel replace; run; No ; until data proj3rawdata3; the end of Be consistent with yourself set proj3rawdata2; the whole fracuninsured=uninsured/total; sentence percentuninsured=fracuninsured*100; run; proc contents data=proj3rawdata2; run; proc print data=proj3rawdata2; run; proc contents data=proj3rawdata3; run; proc print data=proj3rawdata3; run;
21

Read in data (7) from excel

Be careful ! SAS will read the first line as variable names, and assume the raw data start from the second row. SAS assigns numeric and character type automatically. Sometime it does make mistake.

Data cleaning (1) if then

Format: IF condition THEN action; ELSE IF condition THEN action; ELSE action; Note: (1) the if-then-else can be nested as many as you want (2) if you need multiple actions instead of one action, use DO; action1; action2; END;

Data cleaning (1) if then

= ~= > < >= <= in or or or or or or EQ NE GT LT GE LE means equals means not equal means greater than means less than means greater than or equal means less than or equal means subset

if gender in (M, F) then ..;

Multiple conditions: AND (&), OR(|)

Data cleaning (1) if then

*reading in program of proj3rawdata3 is on page 21; data proj3rawdata3; set proj3rawdata3; IF fracuninsured<0.15 THEN uninsuregrp=0; ELSE uninsuregrp=1; run; proc contents data=proj3rawdata3; run; proc print data=proj3rawdata3; run; Note: (1) the code is less efficient if you replace ELSE ..; with IF fracuninsured>=0.15 THEN ..; (2) missing value is always counted as the smallest negative, so fracuninsured=. will satisfy the condition fracuinsured<0.15. If you want to ignore the missing obs set the condition as 0<=fracuninsured<0.15.

Data cleaning (1) if then

* Multiple actions in each branch;
data proj3rawdata3; set proj3rawdata3; IF fracuninsured<0.15 AND uninsured>1000000 THEN DO; uninsuredgrp=0; uninsuredpop=over 1 million'; END; the do-end pair ELSE DO; acts as brackets uninsuredgrp=1; uninsuredpop=less than 1 million'; END; run; proc print data=proj3rawdata3; run;

Data cleaning (1) if then

*Use if commands to choose a subsample;
data proj3subsample; /* note here we generate a new data set */ set proj3rawdata3; IF fracuninsured=. Then delete; If fracuninsured<=0.1; run; proc print data=proj3subsample; run;

Data cleaning (1) exercise

still use proj3rawdata.
define newgrp = 1 if fracuninsured <0.1 (low) 2 if 0.1<=fracuninsured<0.15 (mid-low) 3 if 0.15<=fracuninsured<0.2 (mid-high) 4 if fracuninsured>=0.2 (high).

Data cleaning (1) exercise answer

data proj3rawdata3; set proj3rawdata3; if fracuninsured<0.1 then newgrp=1; else if fracuninsured<0.15 then newgrp=2; else if fracuninsured<0.2 then newgrp=3; else newgrp=4; run; proc contents data=proj3rawdata3; run; proc print data=proj3rawdata3; run; Question: What if one observation has fracuninsured=.?

Save data
* Save in sas format; libname mylib M:\; data mylib,proj3rawdata3; set proj3rawdata3; run; * Export data to excel; Proc export data=proj3rawdata3 outfile=M:\proj3data-fromsas.xls dbms=excel replace; Run;

No ; here

You can also export a sas data file into a comma delimited text file if you write dbms=csv.

Pages 31-34 are optional material for data cleaning.

They are not required, but you may find them useful in the future. We skip them in the regular class.

Data cleaning (2) convert variable type

Numeric to character: age1=put(age, $2.); age and age1 have the same contents but different formats Character to numeric: age2=input(age1, 1.); now age and age2 are both numeric, but age2 is chopped at the first digit Take a sub string of a character age3=substr(age1,2,1); now age3 is a sub string of age1, starting from the second digit of age1 (the meaning of 2) and having one digit in total (the meaning of 1).

Data cleaning (2) - example

* we want to convert studid 012345678 to 012-345-678; data testdata2; infile datalines; input studid : 9. studname : $1.; datalines; 012345678 A 135792468 B 009876543 C ; proc print; run; data testdata2; set testdata2; if studid LT 1E+7 then studid1= '00||compress(put(studid, $9.)); else if 1E+7 LE studid LT 1E+8 then studid1='0'||compress(put(studid, $9.)); else studid1= put(studid, $9.); studid2=substr(studid1,1,3)||'-'||substr(studid1,4,3)||''||substr(studid1,7,3); proc print; run;

Data cleaning (2) - exercise

You have the following data, variables in sequence are SSN, score1, score2, score3, score4, score5: 123-45-6789 100 98 96 95 92 344-56-7234 69 79 82 65 88 898-23-1234 80 80 82 86 92 Calculate the average and standard deviation of the five scores for each individual. Use if-then command to find out who has the highest average score, and report his SSN without dashes.

Data summary roadmap

Proc contents variable definitions Proc print raw data Proc format make your print look nicer Proc sort sort the data Proc means basic summary statistics Proc univariate detailed summary stat Proc freq frequency count Proc chart histogram proc plot scatter plot

proc format (1)

*Continue the data cleaning exercise on page 29; Defining group as a data proj3rawdata3; numeric variable will save space set proj3rawdata3; if fracuninsured<0.1then newgrp=1; else if fracuninsured<0.15 then newgrp=2; else if fracuninsured<0.2 then newgrp=3; else newgrp=4; run;

proc format (2)

proc format; value newgroup 1=low 2=mid-low no ; here until 3=mid-high the end of the 4=high; value command run; proc print data=proj3rawdata3; format newgrp newgroup. fracuninsured 4.2; title report new group in words; no ; here until label fracuninsured=fraction uninsured the end of the newgrp=new group 1-4; var fracuninsured newgrp; label command run;

proc sort
proc sort data=proj3rawdata3; by year state; run; proc sort data=proj3rawdata3 out=proj3rawdata3_sorted; by year descending fracuninsured; run; * note that missing value is always counted as the smallest;

proc means and proc univariate

proc means data=proj3rawdata3; class newgrp; var insured uninsured fracuninsured; run;
By default, proc means report mean, stdev, min, max Could choose what to report: proc means data=proj3rawdata3 n mean median;

proc sort data=proj3rawdata3; by newgrp; run;

proc univariate data=proj3rawdata3; by newgrp; var insured uninsured fracuninsured; run;

By default, proc univariate report median, and many other statistics

Notes on proc means and proc univariate

*if you do not use class or by command, the statistics are based on the full sample. If you use class or by var x, the statistics are based on the subsample defined by each value of var x. *You can use class or by in proc means, but only by in proc univariate; *whenever you use by var x, the data set should be sorted by var x beforehand;

proc means and proc univariate allow multiple groups

data proj3rawdata3; set proj3rawdata3; if totalpop<6000000 then popgrp=low; else popgrp=high; run; proc means data=proj3rawdata3; class newgrp popgrp; var fracuninsured; run; proc sort data=proj3rawdata3; by newgrp popgrp; run; proc univariate data=proj3rawdata3; by newgrp popgrp; var fracuninsured; run;

proc freq
* Remember we already generate a variable called newgrp to indicate categories of fraction uninsured and a variable called popgrp to indicate categories of population size; proc freq data=proj3rawdata3; tables newgrp One dimension frequency table popgrp newgrp*popgrp; run; Two-dimension frequency table

proc chart histogram for categorical variables

proc chart data=proj3rawdata3; title histogram for newgrp; vbar newgrp; run; proc chart data=proj3rawdata3; title frequency by two variables; vbar newgrp / group=popgrp; run;

proc chart histogram for continuous variable

proc chart data=proj3rawdata3; title histogram for continuous variable; vbar fracuninsured; run; proc chart data=proj3rawdata3; title histogram with specific midpoints; vbar fracuninsured / midpoints=0 to 1 by 0.05; run;

proc plot scatter plot

proc plot data=proj3rawdata3; title scatter plot of fracuninsured and totalpop; plot fracuninsured*totalpop; run;

scatter plot is less informative for categorical variables

proc plot data=proj3rawdata3; title scatter plot of newgrp and popgrp; plot newgrp*popgrp; run;

fancy proc means

proc means data=proj3rawdata3; class newgrp popgrp; var uninsured fracuninsured; output out = summary1 mean = avguninsured avgfracuninsured; run; proc print data=summary1; run;

The following page may be useful in practice, but I am not going to cover it in class.

some summary stat. in proc print

* Assume we have already defined newgrp and popgrp in proj3rawdata3; proc sort data=proj3rawdata3; by popgrp; run; proc print data=proj3rawdata3 n; where fracuninsured>=0.1; by popgrp; sum totalpop; var totalpop insured uninsured fracuninsured; run;

How to handle multiple data sets?

Add more observations to an existing data and the new observations follow the same data structure as the old one append Add more variables to an existing data and the new variables refer to the same subjects as in the old data merge Sometimes we may need to change data structure to fit in append or merge .

merge and append

proj3rawdata3: year state totalpop fracuninsured newgrp popgrp

2009 MA 6420947 0.0548

summary1: newgrp 1 merged:

high

popgrp avguninsured avgfracuninsured high 7500000 0.073

year state totalpop .fracuninsured newgrp popgrp avguninsure avgfracuninsured 2009 MA 6420947 0.0548 1 high 7500000 0.073

appended:
year state totalpop .fracuninsured newgrp popgrp avguninsure avgfracuninsured 2009 MA 6420947 0.0548 1 high . . . . . 1 high 7500000 0.073

merge two datasets

proc sort data=proj3rawdata3; by newgrp popgrp; run; proc sort data=summary1; by newgrp popgrp; run; data merged; merge proj3rawdata3 (in=one) summary1 (in=two); by newgrp popgrp; if one=1 & two=1; What if this line is run; if one=1 OR two=1;?

Keep track of matched and unmatched records

data allrecords; merge proj3rawdata3 (in=one) summary1 (in=two); by newgrp popgrp; myone=one; SAS will drop variables mytwo=two; one and two if one=1 or two=1; automatically at the end of run; the DATA step. If you want proc freq data=allrecords; to keep them, you can copy tables myone*mytwo; them into new variables myone and mytwo run;

be careful about merge!

always put the merged data into a new data set must sort by the key variables before merge ok for one-to-one, multi-to-one, one-to-multi, but no good for multi-to-multi be careful of what records you want to keep, and what records you want to delete what if variable x appears in both datasets, but x is not in the by statement?
after the merge x takes the value defined in the last dataset of the merge statement

append
data appended; set proj3rawdata3 summary1; run; proc print data=appended; run; proc print data=merged; run;

Class example of merge and append: reshape and summarize

Task1: reshape proj3rawdata3 from long to wide

Task2: generate average fracuninsured per state and merge it back to the main data
Source format of Proj3rawdata3 (long): year state totalpop fracuninsured . 2009 MA 6420947 0.0548 . 2009 HI 1257622 0.078 . . 2008 MA 6339513 0.0536 . 2008 HI 1267409 0.075 .. Target format (wide) state totalpop2009 fracuninsured2009 . .. Totalpop2008 MA 6420947 0.0548 . 6339513 HI 1257622 0.078 . 1267409

fracuninsured2008 .. 0.0536 0.075 ..

Main data issues

From long to wide, variable names are different except for the key variable (state) We can generate average fracuninsured per state either before or after the reshape, but the merge code will be different depending on when we compute the average fracuninsured

Step 1 to reshape: generate a sub-sample for each year, so that we have:

subsample2009 subsample2008 . subsample2003

focus on 2009
data subsample2009; set proj3rawdata3; if year=2009; run;

data subsample2009; set subsample2009; rename totalpop=totalpop2009; rename insured=insured2009; rename uninsured=uninsured2009; rename fracuninsured=fracuninsured2009; drop year; run;

Same for 2008

data subsample2008; set proj3rawdata3; if year=2008; run;

data subsample2008; set subsample2008; rename totalpop=totalpop2008; rename insured=insured2008; rename uninsured=uninsured2008; rename fracuninsured=fracuninsured2008; drop year; run;

Step 2 to reshape: merge each years subsample by state

proc sort data=subsample2009; by state; run; proc sort data=subsample2003; by state; run; data reshaped; merge subsample2009 subsample2008 subsample2007 subsample2006 subsample2005 subsample2004 subsample2003; by state; run; proc print data=reshaped; run;

Generate average fracuninsured by state

proc means data=proj3rawdata3; class state; var fracuninsured; output out=avgperstate mean=avgfrac_bystate; run;

Merge the data file avgperstate back to reshaped

proc sort data=reshaped; by state; run; proc sort data=avgperstate; by state; run; data reshaped_withavg; merge reshaped (in=one) avgperstate (in=two); by state; myone=one; mytwo=two; if one=1 | two=1; run; proc print data=reshaped_withavg; run;

Check observations in reshaped_withavg

proc freq data=reshaped_withavg; tables myone*mytwo; run;

mean comparison: two groups

Example: does the average fracuninsured differ between 2008 and 2009? H0: mean of fracuninsured2008 = mean of fracuninsured2009. H1: mean of fracuninsured2008 not equal to mean of fracuninsured2009. This is a two-tail mean-comparison test between the 2008 sample and the 2009 sample. The test result will be different if (1) (2) we treat 2008 and 2009 as two independent samples; or We treat 2008 and 2009 as matched pairs (matched by state).

mean comparison: two groups

SAS performs mean comparison in a regression framework.

H0: mean of fracuninsured2008 = mean of fracuninsured2009. H1: mean of fracuninsured2008 not equal to mean of fracuninsured2009.
Step 1: focus on the subsample that has 2008 and 2009 data only. Step 2: create a binary variable dummy2009=1 if year=2009, 0 if year=2008. Step 3: depend on whether 2008 and 2009 are independent samples or matched pairs. If independent samples, regress fracuninsured as: fracuninsured = a + b* dummy2009 + error If matched pairs, regress fracuninsured as: fracuninsured = a + b * dummy2009 + c1* dummy_AL + c2* dummy_AK + +c51*dummy_WY + error

mean comparison: two groups as two independent samples

* Focus on 2008 and 2009 data only; Data subsample0809; Set proj3rawdata3; If year=2008 or year=2009; If year=2009 then dummy2009=1; else dummy2009=0; Run; * Treat 2008 and 2009 as two independent samples; Proc reg data=subsample0809; Model fracuninsured=dummy2009; Run; The difference between 2008 and 2009 is captured in the coefficient of dummy2009. So the hypothesis test is equivalent to: H0: coefficient of dummy2009=0. H1: coefficient of dummy2009 not equal to 0.

mean comparison: two groups as matched pairs (matched by state)

* Focus on 2008 and 2009 data only; data subsample0809; set proj3rawdata3; if year=2008 or year=2009; if year=2009 then dummy2009=1; else dummy2009=0; run; * Treat 2008 and 2009 as matched pairs (matched by state); proc glm data=subsample0809; class state; model fracuninsured =dummy2009 state/solution; run; The within-state difference between 2008 and 2009 is captured in the coefficient of dummy2009. So the hypothesis test is equivalent to: H0: coefficient of dummy2009=0. H1: coefficient of dummy2009 not equal to 0.

mean comparison: more than two groups

Example: does the average fracuninsured differ between any two years in our data? (here year = 2003, 2004, 2009) The test result will be different if (1) (2) we treat year t and year t as two independent samples; or We treat year t and year t as matched pairs (matched by state).

mean comparison: every year as an independent sample

* Treat every year as an independent sample; * Now we need to use the whole data of proj3rawdata3; proc glm data=proj3rawdata3; class year; model fracuninsured=year; means year/waller; means year/lsd cldiff; run;

mean comparison: every year as matched pairs (matched by state)

* Treat every year as matched by state; * First define dummies for each year; data proj3rawdata3; set proj3rawdata3; if year=2004 then dummy2004=1; else dummy2004=0; if year=2005 then dummy2005=1; else dummy2005=0; if year=2009 then dummy2009=1; else dummy2009=0; run; proc glm data=proj3rawdata3; class state; model fracuninsured=dummy2004 dummy2005 dummy2006 dummy2007 dummy2008 dummy2009 state/solution; run; The coefficient of dummy2004 captures the difference between 2003 and 2004. The coefficient of dummy2005 captures the difference between 2003 and 2005.

notes on mean comparison

1. The logic of mean comparison is the same as in Excel

Be careful about one-tail and two-tail tests. The standard SAS output of coefficient t-stat and p-value are based on a two-tail test of H0: coeff=0, but could be used for a one-tail test if we compare 1-alpha vs. pvalue/2 instead of p-value.
Comparison across more than two groups (as independent samples) H0: all groups have the same mean F-test of whole regression OR H0: group x and group y has the same mean waller or lsd statistics

Comparison across more than two groups (as matched pairs) requires specific test on regression coefficients. H0: 2003 = 2004 test the coefficient of dummy2004=0 because 2003 is set as the benchmark
H0: 2004=2005 test the coefficient of dummy2004 = coefficient of dummy2005.

Explicit hypothesis tests in proc reg and proc glm

* Treating each year as independent samples; Proc reg data=proj3rawdata3; Model fracuninsured=dummy2004 dummy2005 dummy2006 dummy2007 dummy2008 dummy2009; Test fracuninsured2003=fracuninsured2008; Test dummy2008=0; Test dummy2008=dummy2009; Test fracuninsured2008=fracuninsured2009; Run; * Treating each year as matched by state; Proc glm data=proj3rawdata3; Class year state; Model fracuninsured=year state/solution; Contrast test 2003 vs. 2008 year 1 0 0 0 -1 0; Test fracuninsured2003=fracuninsured2008; Contrast test 2008 vs. 2009 year 0 0 0 0 1 -1; Test fracuninsured2008=fracuninsured2009; Run;

In class exercise for mean comparison

Main question: compare fracuninsured in west, midatlantic and everywhere else, where west = CA, WA, OR midatlantic = DE, DC, MD, VA step 0: define west, midatlantic and everywhereelse exercise 1: compare west and midatlantic 1(a). as two independent samples 1(b). consider the match by year exercise 2: compare west, midatlantic, and everywhere else 2(a). as three independent samples; 2(b). consider the match by year;

regression in SAS
Question: how do fracuninsured vary by total population of a state? * model: fracuninsured=a+b*totalpop+error; proc reg data=proj3rawdata3; model fracuninsured=totalpop; run; * Add year fixed effects; * Model: fracuninsured=a+b*totalpop+c1*dummy2004 +c2*dummy2005 + +c51*dummy2009+error; proc glm data=proj3rawdata3; class year; model fracuninsured=totalpop year/solution; run;

A comprehensive example
A review of 1. readin data 2. summary statistics 3. mean comparison 4. regression
reg-cityreg-simple.sas in N:\share\

A Case Study of Los Angeles Restaurants

Nov. 16-18, 1997 CBS 2 News Behind the Kitchen Door January 16, 1998, LA county inspectors start issuing hygiene grade cards

A grade if score of 90 to 100 B grade if score of 80 to 89 C grade if score of 70 to 79 score below 70 actual score shown

Grade cards are prominently displayed

in restaurant windows

Score not shown on grade cards

Take the idea to data

Research Question: Does better information lead to better hygiene quality? better Information better quality

regulation

by county
by city

hygiene scores

Data complications
(blue font indicates our final choices)
Unit of analysis:
individual restaurant? city? zipcode? census tract? Unit of time: each inspection? per month? per quarter? per year? Define information: county regulation? city regulation? the date of passing the regulation? days since passing the regulation? % of days under regulation? Define quality: average hygiene score? the number of A restaurants? % of A restaurants?

How to test the idea?

Regression: quality = +*information+error +something else? Something else could be: year trend, seasonality, city specific effects, .

real test
reg-cityreg-simple.sas in N:\share\

Questions
How many observations in the sample?
log of the first data step, or output from proc contents

How many variables in the sample? How many are numerical, how many are characters?
Output from proc contents

How many percentage of restaurants have A quality in a typical city-month?

Output from proc means, on per_A

Questions
What is the difference between cityreg and ctyreg? We know county regulation came earlier than city regulation, is that reflected in our data?
Yes, cityreg<=ctyreg in every observation We can check this in proc means for cityreg and ctyreg, or add a proc print to eyeball each obs

What is the difference between cityreg and citymper? What is the mean of cityreg? What is the mean of citymper? Are they consistent with their definitions?
The unit of cityreg is # of days, so it should be a non-negative integer The unit of citymper is % of days, so it should be a real number between 0 and 1 To check this, we can add a proc means for cityreg and citymper

Questions
Economic theories suggest quality be higher after the regulation if regulation gives consumers better information. Is that true? The summary statistics reported in proc means (class citym_g or ctym_g) show the average percentage of A restaurants in different regulation environments. Rigorous mean comparison tests are done in proc glm with waller or lsd options.

Questions
Summary statistics often reflect many economic factors, not only the one in our mind. That is why we need regressions. Does more regulation lead to higher quality? is the coefficient of city regulation positive and significantly different from zero? (proc reg) is the coefficient of county regulation positive and significantly different from zero? (proc reg) Do we omit other sensible explanations for quality changes? What are they? (proc glm, year, quarter, city)

Count duplicates (not required)

http://support.sas.com/ctx/samples/index.jsp?sid=67&tab=co de data dups nodups ; set clasdata ; by name class ; /* If the combination of NAME and CLASS is in the data set once, output NODUPS, else output DUPS. */ if first.class and last.class then output nodups ; else output dups ; run;

course evaluation
University wide: www.CourseEvalUM.umd.edu TTclass in particular: (password plstt) www.surveyshare.com/survey/take/?sid=81087

Apache Cassandra Administrator Associate - Exam Practice Tests
From Everand
Apache Cassandra Administrator Associate - Exam Practice Tests
Cristian Scutaru
No ratings yet
SAS Administration from the Ground Up: Running the SAS9 Platform in a Metadata Server Environment
From Everand
SAS Administration from the Ground Up: Running the SAS9 Platform in a Metadata Server Environment
Anja Fischer
5/5 (1)
Sas 1
100% (2)
Sas 1
50 pages
SAS Programming with Medicare Administrative Data
From Everand
SAS Programming with Medicare Administrative Data
Matthew Gillingham
5/5 (1)
Sas Adv 130 PDF
No ratings yet
Sas Adv 130 PDF
60 pages
Base Sas Certification Exercise
No ratings yet
Base Sas Certification Exercise
47 pages
Carpenter's Complete Guide To The SAS (R) Macro Language
No ratings yet
Carpenter's Complete Guide To The SAS (R) Macro Language
55 pages
SAS® 9.4 Companion For UNIX EnvironmentsThird EditionSAS
No ratings yet
SAS® 9.4 Companion For UNIX EnvironmentsThird EditionSAS
534 pages
Sas Course Content Porak Technologies
No ratings yet
Sas Course Content Porak Technologies
8 pages
Men Aka 2
No ratings yet
Men Aka 2
92 pages
(Ebook) Remote Mental Influence of Human Electrodermal Activity
No ratings yet
(Ebook) Remote Mental Influence of Human Electrodermal Activity
16 pages
SAS Interview Questions You'll Most Likely Be Asked
From Everand
SAS Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
Introduction To SEM Using SAS
No ratings yet
Introduction To SEM Using SAS
50 pages
The Ins and Outs of SAS® Data Integration Studio
No ratings yet
The Ins and Outs of SAS® Data Integration Studio
16 pages
Sample
No ratings yet
Sample
24 pages
3.reading Importing Raw Data Into Sas
No ratings yet
3.reading Importing Raw Data Into Sas
5 pages
Practice SAS Base Exam Solutions 2014
50% (6)
Practice SAS Base Exam Solutions 2014
27 pages
(Pronounced ": History of Sas SAS
No ratings yet
(Pronounced ": History of Sas SAS
141 pages
SAS Project Solution Snapshots
No ratings yet
SAS Project Solution Snapshots
7 pages
Base Sas Certification Exercise
No ratings yet
Base Sas Certification Exercise
47 pages
YLP SAS Advanced Certification Slides
0% (3)
YLP SAS Advanced Certification Slides
114 pages
CH 12 Producing Summary Reports - Exercises
No ratings yet
CH 12 Producing Summary Reports - Exercises
17 pages
David Franklin, SAS Programmer - Consultant - Useful SAS Tips and Other Code
No ratings yet
David Franklin, SAS Programmer - Consultant - Useful SAS Tips and Other Code
11 pages
SAS Proc Report at Work
No ratings yet
SAS Proc Report at Work
43 pages
SAS Training - Day1
No ratings yet
SAS Training - Day1
36 pages
Base Sas Advance SAS Interview Questions by Keli Technologies
No ratings yet
Base Sas Advance SAS Interview Questions by Keli Technologies
12 pages
SAS DI Interview Questions
50% (2)
SAS DI Interview Questions
3 pages
SAS (Software) : Navigation Search
No ratings yet
SAS (Software) : Navigation Search
10 pages
SAS Clinical Training AND Placement Program
No ratings yet
SAS Clinical Training AND Placement Program
6 pages
Ten Good Reasons To Learn Sas Software'S SQL Procedure: Sigurd W. Hermansen, Westat, Rockville, MD
No ratings yet
Ten Good Reasons To Learn Sas Software'S SQL Procedure: Sigurd W. Hermansen, Westat, Rockville, MD
5 pages
A00-221 Certification Guide and How To Clear Exam On SAS Big Data Programming and Loading
0% (1)
A00-221 Certification Guide and How To Clear Exam On SAS Big Data Programming and Loading
15 pages
Sas Studymaterial
No ratings yet
Sas Studymaterial
3 pages
Base SAS Interview Questions
No ratings yet
Base SAS Interview Questions
14 pages
SAS Vs SQL
No ratings yet
SAS Vs SQL
52 pages
Bayesian Notes
100% (3)
Bayesian Notes
2 pages
Using Arrays in SAS Programming
No ratings yet
Using Arrays in SAS Programming
24 pages
Proc Report
No ratings yet
Proc Report
32 pages
JMP
No ratings yet
JMP
8 pages
Practice Exam. SAS Pre-Advanced
50% (2)
Practice Exam. SAS Pre-Advanced
50 pages
SAS Macro Tutorial
100% (1)
SAS Macro Tutorial
6 pages
1-Reading Raw Data in SAS Week1
No ratings yet
1-Reading Raw Data in SAS Week1
75 pages
Sas Input
No ratings yet
Sas Input
15 pages
Data Warehousing - Analysis - Reporting
No ratings yet
Data Warehousing - Analysis - Reporting
11 pages
Managing Data Integrity As Part of Master Data Management: Katri Vihavainen
No ratings yet
Managing Data Integrity As Part of Master Data Management: Katri Vihavainen
60 pages
Introduction To Tables and Graphs in SAS
No ratings yet
Introduction To Tables and Graphs in SAS
8 pages
Sas fAQ'S
No ratings yet
Sas fAQ'S
112 pages
Chapter 1 and 2
No ratings yet
Chapter 1 and 2
61 pages
What Is PDV (Program Data Vector)
No ratings yet
What Is PDV (Program Data Vector)
4 pages
SAS Macro Language Reference
No ratings yet
SAS Macro Language Reference
385 pages
SAS Access 92
No ratings yet
SAS Access 92
984 pages
Sas
No ratings yet
Sas
2 pages
2019 Migration Guide
No ratings yet
2019 Migration Guide
20 pages
SAS Enterprise Guide 4.1 A Basic Guide For Banner Finance Users
No ratings yet
SAS Enterprise Guide 4.1 A Basic Guide For Banner Finance Users
83 pages
Instant Download (eTextbook PDF) for Applied Regression Analysis and Other Multivariable Methods 5th Edition PDF All Chapters
100% (11)
Instant Download (eTextbook PDF) for Applied Regression Analysis and Other Multivariable Methods 5th Edition PDF All Chapters
65 pages
Read Text: Sas My Code
No ratings yet
Read Text: Sas My Code
6 pages
End Points
No ratings yet
End Points
22 pages
Sas Base Quiz Questions
No ratings yet
Sas Base Quiz Questions
3 pages
The Essential PROC SQL
No ratings yet
The Essential PROC SQL
10 pages
Tata SAS
No ratings yet
Tata SAS
285 pages
Sas A00-211
No ratings yet
Sas A00-211
50 pages
The Simple Guide to SAS: From Null to Novice
From Everand
The Simple Guide to SAS: From Null to Novice
Kirby Thomas
No ratings yet
SAS Viya: The Python Perspective
From Everand
SAS Viya: The Python Perspective
Kevin D. Smith
No ratings yet
SAS Workbook Solutiojns by Ron Cody - Chapter1
No ratings yet
SAS Workbook Solutiojns by Ron Cody - Chapter1
6 pages
Appendix: Answers To Selected Exercises: /user
No ratings yet
Appendix: Answers To Selected Exercises: /user
8 pages
Example Code and Data For Learning SAS by Example
No ratings yet
Example Code and Data For Learning SAS by Example
2 pages
SAS 9.2 - Configaration Guide
No ratings yet
SAS 9.2 - Configaration Guide
40 pages
SAS 9.2 Companion For Windows, Second Edition
No ratings yet
SAS 9.2 Companion For Windows, Second Edition
684 pages
SAS 9.2 Companion For zOS
No ratings yet
SAS 9.2 Companion For zOS
719 pages
Net PHD Appli Card 2011
No ratings yet
Net PHD Appli Card 2011
3 pages
SAS 9.2 Companion For UNIX Environments
No ratings yet
SAS 9.2 Companion For UNIX Environments
518 pages
Encryption in SAS 9.2
No ratings yet
Encryption in SAS 9.2
84 pages
Base SAS 9.2 Language Reference Dictonary Third Edition
No ratings yet
Base SAS 9.2 Language Reference Dictonary Third Edition
2,230 pages
SAS 9.2 CDISC Procedure - User's Guide
No ratings yet
SAS 9.2 CDISC Procedure - User's Guide
101 pages
SAS 9.2 National Language Support (NLS) - Reference Guide
No ratings yet
SAS 9.2 National Language Support (NLS) - Reference Guide
683 pages
SAS 9.2 Language Interfaces To Metadata
No ratings yet
SAS 9.2 Language Interfaces To Metadata
229 pages
Panel
100% (1)
Panel
93 pages
NAVR 611 Theory
100% (1)
NAVR 611 Theory
20 pages
Assumption of Normality Transformations Assumption of Normality Script Practice Problems
No ratings yet
Assumption of Normality Transformations Assumption of Normality Script Practice Problems
27 pages
BU Syllabus
No ratings yet
BU Syllabus
72 pages
CMS 810 Assignment
No ratings yet
CMS 810 Assignment
3 pages
Service Quality Perception and Customers' Satisfaction in Internet Banking Service
No ratings yet
Service Quality Perception and Customers' Satisfaction in Internet Banking Service
10 pages
Dependent T Test
No ratings yet
Dependent T Test
38 pages
K.vishnu Priya
No ratings yet
K.vishnu Priya
48 pages
Statistics and Probability
100% (1)
Statistics and Probability
27 pages
Study Guide Ch8 Multiple Regressions
No ratings yet
Study Guide Ch8 Multiple Regressions
13 pages
Nonparametric Methods
No ratings yet
Nonparametric Methods
11 pages
Capital Budgeting and Risk Management
0% (1)
Capital Budgeting and Risk Management
12 pages
Gender Difference: Work and Family Conflicts and Family-Work Conflicts
No ratings yet
Gender Difference: Work and Family Conflicts and Family-Work Conflicts
17 pages
Hypothesis Testing Mean
100% (1)
Hypothesis Testing Mean
26 pages
Meta-Mar Free Online Meta-Analysis Service!
No ratings yet
Meta-Mar Free Online Meta-Analysis Service!
8 pages
Lesson 10
No ratings yet
Lesson 10
28 pages
Final Exam Experimental Psych
No ratings yet
Final Exam Experimental Psych
17 pages
10 Mar - AssQ
No ratings yet
10 Mar - AssQ
2 pages
Biostat07 H
No ratings yet
Biostat07 H
17 pages
W4A1
No ratings yet
W4A1
4 pages
Assignment 2 Fin 534
No ratings yet
Assignment 2 Fin 534
24 pages
Luna MAT 243 Project Two Summary Report(done)
No ratings yet
Luna MAT 243 Project Two Summary Report(done)
5 pages
Computational Stats Aiml Notes
No ratings yet
Computational Stats Aiml Notes
36 pages
Using The Excel Spreadsheet Program To Calculate The Independent T
No ratings yet
Using The Excel Spreadsheet Program To Calculate The Independent T
4 pages
Stat130 Assignment 4 On Unit - 5
No ratings yet
Stat130 Assignment 4 On Unit - 5
8 pages
Normality Test in Clinical Research
No ratings yet
Normality Test in Clinical Research
7 pages
Econometrics for Accounting Students
No ratings yet
Econometrics for Accounting Students
132 pages
M.G University MBA Second Sem Syllabus
No ratings yet
M.G University MBA Second Sem Syllabus
11 pages
Quiz 5 Chap 6
No ratings yet
Quiz 5 Chap 6
5 pages