SAS Essentials 1
Essentials
Course Notes
SAS® Programming 1: Essentials Course Notes was developed by Stacey Syphus and Beth Hardin.
Additional contributions were made by Bruce Dawless, Brian Gayle, Anita Hillhouse, Marty Hultgren,
Mark Jordan, Eva-Maria Kegelmann, Gina Repole, Gemma Robson, Samantha Rowland, Allison
Saito, Prem Shah, Charu Shankar, Kristin Snyder, Peter Styliadis, Su Chee Tay, and Kitty Tjaris.
Instructional design, editing, and production support was provided by the Learning Design and
Development team.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or
trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
Copyright © 2020 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States
of America. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise,
without the prior written permission of the publisher, SAS Institute Inc.
Book code E71640, course code LWPG1V2/PG1V2, prepared date 26Mar2020. LWPG1V2_001
ISBN 978-1-64295-937-6
To learn more…
For information about other courses in the curriculum, contact the
SAS Education Division at 1-800-333-7660, or send e-mail to
training@sas.com. You can also find this information on the web at
http://support.sas.com/training/ as well as in the Training Course
Catalog.
For a list of SAS books (including e-books) that relate to the topics
covered in this course notes, visit https://www.sas.com/sas/books.html or
call 1-800-727-0025. US customers receive free shipping to US
addresses.
Lesson 1 Essentials
1.1 The SAS Programming Process
Demonstration: SAS Programming Process
Copyright © 2020, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.1 The SAS Programming Process
It is impossible to understand data without using tools that help you derive meaning from numbers
and text. SAS offers a huge collection of tools and solutions to handle all your data needs. At the
core of all that SAS offers is the SAS programming language. Regardless of the SAS suite of tools
that you licensed, the Base SAS programming language is included. This course teaches you how
to write SAS code to handle the most common data processing tasks.
[Slide: Access data > Explore data > Prepare data > Analyze and report on data > Export results]
As you go through the process of making data meaningful and actionable, you will likely follow these
basic steps: access, explore, prepare, analyze and report, and export. SAS has programming tools
for each of these steps in the process. You follow this process as you learn the fundamentals of the
SAS programming language.
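As a rough sketch of how these steps can map to code, the program below walks the sample table sashelp.class through the process. The table name work.myclass, the CSV file name, and the output location are placeholders chosen for illustration and are not course files. The procedures shown are only one possible choice for each step.

```sas
/* Access: read the sample table into a temporary table in the Work library */
data work.myclass;
    set sashelp.class;
run;

/* Explore: list the columns and their attributes */
proc contents data=work.myclass;
run;

/* Prepare: compute a new column (height in centimeters) */
data work.myclass;
    set work.myclass;
    HeightCM=Height*2.54;
run;

/* Analyze and report: summary statistics for two columns */
proc means data=work.myclass mean min max maxdec=1;
    var Age HeightCM;
run;

/* Export: write the prepared table to a CSV file.
   The output location (the folder behind the Work library) is an assumption.
   Replace it with a folder that you can write to. */
proc export data=work.myclass
    outfile="%sysfunc(pathname(work))/myclass.csv"
    dbms=csv replace;
run;
```

Each of these statements is covered in more detail in later lessons, so the sketch is meant only to show the overall shape of a program.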
In this class, we analyze mainly international storm data, which is real data about storms such as
hurricanes, typhoons, and cyclones that has been collected since 1980. This data is stored in a
variety of formats, and the first thing you learn to do is write a SAS program to access the data.
As you continue through the programming process, you learn to write SAS programs that turn this
data into informative reports, tables, and graphics.
Scenario
Examine the international storm data that is used in course demonstrations. Open and run a SAS
program that follows the SAS programming process. The code included in the program is covered
throughout this course.
Files
• p101d01.sas
• Storm.xlsx – a Microsoft Excel workbook containing detail and summary data about international
storms
b. Enterprise Guide: Click the Results tab and double-click the Excel file to open the new file.
You can also right-click on the file and select Open.
[Slide: data used in the course: class, cars, shoes, US National Park data, international storm and weather data, and Europe tourism and trade data]
1.2 Using SAS Programming Tools
SAS provides several programming interfaces that can be used to interactively write and submit
code.
• SAS Studio – a web-based interface to SAS that you can use on any computer. SAS Studio is
the interface that is used in SAS OnDemand for Academics. SAS OnDemand for Academics is
cloud-based software. For more information, visit
https://www.sas.com/en_us/learn/academic-programs/software.html.
• SAS Enterprise Guide – a Windows client application that runs on your PC and accesses SAS
on a local or remote server.
• SAS windowing environment – a legacy interface that is part of SAS.
Note: This course uses SAS Enterprise Guide and SAS Studio because these are the
SAS interfaces that have the most modern programming tools.
To program, you need some basics: an editor to write and submit code, a way to read messages
related to the code that you submit (this is called the log in SAS), and a way to view the reports and
data that your programs create. Although they look different and are organized differently, all SAS
interfaces have these interactive programming tools. In addition, SAS Studio and Enterprise Guide
have an editor that is smart about SAS code, with features such as code completion and syntax
coloring.
Programs can also be submitted to the operating environment behind the scenes. This is referred to
as batch processing or background submit. The log and results are saved by default as separate
files in the same location as the SAS program. Background submission is often used for programs
that run regular jobs on a routine basis. These programs have typically been tested and can run
unattended.
SAS Studio also enables you to submit programs by right-clicking a .sas file in the Navigation pane
and selecting Background Submit. You can view the status of background programs and access
the associated log and results files by clicking the More application options icon and selecting
Background Job Status.
Scenario
Write and submit a simple SAS program in SAS Enterprise Guide and examine the log and results.
Files
• sashelp.class – a sample table provided by SAS that includes information about 19 students
Notes
• Programs can be submitted by clicking Run or pressing the F3 key.
• A program generates a log. Depending on the code, a program might also generate results and
output data.
• To run a subset of a program, highlight the desired code. Then click Run or press F3.
Demo
1. View Sashelp sample tables.
Note: Sashelp is a collection of sample data files provided by SAS that are useful for testing
and practicing. This course references various data files in Sashelp to illustrate
programming syntax.
a. Open Enterprise Guide. On the Start Page, click Create a new program.
Note: In Enterprise Guide, your work can be organized in projects. To do so, select Create
a new project. As you open tables and programs or create new programs, you will
notice shortcuts added to your project in the Project pane. The project can be saved
by selecting File > Save Project.
b. In the Servers pane in the lower left corner, expand Servers > Local > Libraries >
SASHELP.
c. Double-click the CLASS table to open and view the data. You do not need to close the table.
2. Write and submit a program in SAS Enterprise Guide.
a. Type or copy and paste the program below on the Program tab and click Run.
Note: If the Program tab is not open, select File > New > Program, or click Create a new
item on the toolbar and select Program.
Note: If you copy and paste the program, click the Format code button to improve the
program spacing.
data myclass;
    set sashelp.class;
run;

proc print data=myclass;
run;
b. Click the Log tab and toggle on the Notes button (if necessary). The log includes the
program and messages that are returned from SAS. The Log Summary is displayed by
default at the top of the window. You can click any of the messages in the Log Summary to
find the message in the log.
Note: If the Log Summary is closed, click the drop-down arrow to the right of the Errors,
Warnings, and Notes tab to expand the Log Summary.
c. Click the Output Data and Results tabs to examine the output.
d. Return to the Code tab. Highlight the PROC PRINT and RUN statements and click Run or
press F3.
Note: In this course, you often need to run only a portion of a SAS program.
e. Frequently, it is helpful to view multiple tabs at the same time. For example, you might want
to view a program and the results, or possibly compare two tables. By default, SAS
Enterprise Guide separates a program's tabs. To view more tabs at the same time, right-click
the tab that you would like to view and select either Float, New vertical tab group, or New
horizontal tab group. You can also drag and drop tabs outside of Enterprise Guide in any
location that you prefer.
Note: To return a tab to the original location, right-click the top bar of the tab and select
Docked as tabbed document for a Float window, or select Move to previous tab
group for a vertical or horizontal tabbed group. You can also set defaults for program
sub-tabs by going to View > Program tab presets and selecting the preset of your
choice.
Scenario
Write and submit a simple SAS program in SAS Studio and examine the log and results.
Files
• sashelp.class – a sample table provided by SAS that includes information about 19 students
Notes
• Programs can be submitted by clicking Run or pressing the F3 key.
• A program generates a log. Depending on the code, a program might also generate results and
output data.
• To run a subset of a program, highlight the desired code and click Run or press F3.
• When you rerun a program, the existing log, results, and output data are replaced.
Demo
1. View Sashelp sample tables.
Note: Sashelp is a collection of sample data files provided by SAS that are useful for testing
and practicing. This course references various data files in Sashelp to illustrate
programming syntax.
a. Open SAS Studio. In the navigation pane on the left side of the window, select Libraries.
Expand My Libraries > SASHELP.
b. Double-click the CLASS table to open and view the data. A panel to the left of the data lists
the columns in the table. The Column panel can be collapsed by clicking the left-pointing
arrow.
c. Close the SASHELP.CLASS tab.
2. Write and submit a program in SAS Studio.
a. A new program window labeled Program 1 is open. Notice that there are tabs labeled CODE,
LOG, and RESULTS.
Note: If you do not have a new program window, press F4 or click New in the Files
and Folders pane and select SAS Program.
b. Type or copy and paste the program below on the CODE tab and click Run.
Note: If you copy and paste the program, click the Format code button to improve the
program spacing.
data myclass;
    set sashelp.class;
run;

proc print data=myclass;
run;
Practice
Note: Please choose either the SAS Enterprise Guide or SAS Studio practice to further explore
your interface of choice.
b. Select File > New > Program (or click the Create a new item tool and select
Program) to start writing a SAS program. On the Code tab, type or copy and paste the
following code. This is a simple SAS program called a DATA step.
Note: If you copy and paste the program, click the Format code button to improve the
program spacing.
data work.shoes;
set sashelp.shoes;
NetSales=Sales-Returns;
run;
c. Click Run or press F3 to submit the code. Examine the Log and Output Data tabs.
d. Click the Log tab. Notice that there are additional statements included before and after the
DATA step. This is called wrapper code, and it includes statements added by Enterprise
Guide to set up the environment and results. To make the log easier to read, the wrapper
code statements can be hidden. Select Tools > Options > Results > General and clear
the Show generated wrapper code in SAS log check box. Click OK.
e. Return to the Code tab and rerun the program. Examine the log.
f. On the Code tab, add code to compute summary statistics. At the end of the program, begin
by typing pr. Notice that a prompt appears with valid keywords. Press the Enter key
or the spacebar to add the word proc to the program. Press the spacebar and type me.
Press Enter again to add means to the program.
g. Press the spacebar, use the prompts to select data=work.shoes, and press the spacebar
again. Notice that the prompt lists all valid options. Type or select options in the window to
complete the following statement:
proc means data=work.shoes mean sum maxdec=2;
Note: Autocomplete prompts can be modified or disabled by selecting Program >
Editor options and then clicking the Autocomplete tab. On the tab, you can adjust
the prompts.
h. Complete the program by adding the highlighted statements below. Notice that after VAR
and CLASS, the autocomplete prompt includes a list of the columns from the work.shoes
table.
proc means data=work.shoes mean sum maxdec=2;
var NetSales;
class region;
run;
i. Highlight the code from PROC MEANS through RUN, and select Run or press F3.
Note: The default output format in SAS Enterprise Guide is HTML.
j. By default, the tabs are a vertical split. To change the default layout view of the program tabs,
go to View > Program tab presets. You can also right-click a tab and select Float, New
vertical tab group, New horizontal tab group, or (if it is available) Move to previous tab
group.
Note: You can also select a tab and drag it to a location of your choice.
k. To return to a single window for all program tabs, select View > Program tab presets >
Standard.
l. In addition to creating HTML output, you can create other output types. Click the Code tab
and click the properties icon. Select Results > Customize result formats, styles, and
behavior. Clear any selected check boxes and then select the PDF and Excel check boxes.
Click OK.
m. Run the program again. An Excel file and a PDF file are created on the Results tab.
Note: PowerPoint, Excel, PDF, and RTF results must be viewed outside of Enterprise
Guide. Double-click the Excel or PDF file to open it. You can also right-click the file
and select Open.
n. To save the program, return to the Code tab and click the Save "Program" As icon.
Navigate to the output folder in the course files. Enter shoesprogram in the File name field
and click Save. The .sas file extension is automatically added to the file name.
b. Options are available in the banner area to customize your SAS Studio environment.
New options: new program, new import data, new query, close all tabs, and
maximize view.
c. On the Program 1 tab, type or copy and paste the following code. This is a simple SAS
program called a DATA step.
Note: If you copy and paste the program, click Format code to improve the program
spacing.
data work.shoes;
set sashelp.shoes;
NetSales=Sales-Returns;
run;
d. Click Run or press F3 to submit the code. Examine the LOG and OUTPUT DATA tabs.
The RESULTS tab is empty because the program did not create a report.
e. On the CODE tab, add code to compute summary statistics. At the end of the program, begin
by typing pr. Notice that a prompt appears with valid keywords and syntax help. Press Enter
to add the word proc to the program. Press the spacebar and type me, and press Enter
again to add means to the program.
Note: The Autocomplete prompts also include a window with syntax Help and links to
documentation and examples.
f. Press the spacebar, use the prompt to select data=, and then type work.shoes. Press the
spacebar and notice that the prompt lists all valid options. Type or select options in the
window to complete the following statement:
proc means data=work.shoes mean sum;
g. Autocomplete prompts can be disabled by clicking More application options and then
clicking Preferences > Code and Log. Clear the Enable autocomplete check box and
click Save.
h. Return to the CODE tab and press the spacebar after the SUM option and before the
semicolon. Notice that a prompt does not appear. Type MAXDEC=2 to round statistics
to two decimal places.
Note: If autocomplete is turned off, you can temporarily toggle it on at any point by holding
down the Ctrl key and pressing the spacebar to view the autocomplete prompt.
i. Complete the program by adding the following statements:
proc means data=work.shoes mean sum maxdec=2;
var NetSales;
class region;
run;
j. Highlight the code from PROC MEANS through RUN and click Run or press F3
to run only the selected portion. Confirm the results.
Note: The default output format in SAS Studio is HTML.
k. To view multiple tabs at the same time, click the RESULTS tab and drag it to the right side
of the work area until a highlighted region appears. To return to a single window, drag the
RESULTS tab back to the main tab area.
l. On the RESULTS tab, click the HTML, PDF, or Word icon to open results in the
corresponding file format. You are prompted to open or save the file in the browser.
Note: Additional options for the output formats are available by clicking More
application options and selecting Preferences > Results.
m. To save the program, return to the CODE tab and click the Save As toolbar button.
Navigate to the output folder in the course files. Enter shoesprogram in the Name field and
click Save. The .sas file extension is automatically added to the file name.
[Slide: the course files folder contains the activities, data, demos, practices, and output subfolders. Make a note of the location of your course files folder.]
[Slide: program naming convention. Example: p104d01.sas is Programming 1, Lesson 4, demo 1.]
[Slide: the course files folder, which includes the cre8data.sas program]
1.3 Understanding SAS Syntax
A DATA step can contain a variety of data manipulations, including filtering rows, computing new
columns, and joining tables. In this program, the DATA step is creating a copy of an existing SAS
table and adding a new column to convert height from inches to centimeters.
data myclass;
set sashelp.class;
heightcm=height*2.54;
run;
A PROC, or procedure, step typically processes a SAS data set. SAS has dozens of procedures that
generate reports and graphs, manage data, or perform complex statistical analysis. This program
has two PROC steps: PROC PRINT generates a list of all the rows and columns in the data, and
PROC MEANS calculates basic summary statistics for age and heightcm.
If a RUN or QUIT statement is not used at the end of a step, the beginning of a new step implies the
end of the previous step. If a RUN or QUIT statement is not used at the end of the last step, SAS
Studio and Enterprise Guide automatically submit a RUN and QUIT statement after the submitted
code.
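A minimal sketch of an implied step boundary, using the sashelp.class sample table: the DATA step below has no RUN statement, so the PROC statement that follows marks its end. This layout is shown only to illustrate the rule. Writing an explicit RUN statement after every step is usually easier to read and debug.

```sas
data myclass;
    set sashelp.class;
    /* No RUN statement here: the PROC statement below ends the DATA step */
proc print data=myclass;
run;
```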
data myclass;
    set sashelp.class;
    heightcm=height*2.54;
run;

proc print data=myclass;
run;

proc means data=myclass;
    var age heightcm;
run;
Most statements begin with an identifying keyword. In addition to DATA, PROC, and RUN
statements, this program also includes SET and VAR statements. The one statement that does not
begin with a keyword is the one that is creating the new column heightcm. The most important thing
to remember here is that all statements end with a semicolon.
Global Statements
TITLE . . . ;
OPTIONS . . . ;
LIBNAME . . . ;
In addition to DATA and PROC steps, a SAS program can also contain global statements. These
statements can be outside DATA and PROC steps, and they typically define some option or setting
for the SAS session. Global statements do not need a RUN statement after them.
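As an illustration, the short program below places two global statements before a PROC step and one after it. The title text and the choice of the NODATE and NONUMBER options are examples picked for this sketch, not settings that the course requires.

```sas
/* Global statements are outside of any step and do not need a RUN statement */
options nodate nonumber;
title "Student Listing";

proc print data=sashelp.class;
run;

/* A TITLE statement with no text clears the title for later steps */
title;
```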
1.03 Activity
Open p101a03.sas from the activities folder and perform the following tasks:
1. View the code. How many steps are in the program?
2. How many statements are in the PROC PRINT step?
3. How many global statements are in the program?
4. Run the program and view the log.
5. How many observations were read by the PROC PRINT step?
These two programs have exactly the same code. Spacing does not matter to SAS, but it does
matter to people reading your code. You can use spaces and extra lines to make your program easy
to read and understand. There are also tools in your editor that format code for you. Click Format
code on the toolbar or right-click in the program and select Format code to format a SAS
program.
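The following sketch shows the idea with the sashelp.class sample table. Both versions contain the same statements and produce the same table and report, and only the spacing differs.

```sas
/* Version 1: the statements run together on as few lines as possible */
data myclass; set sashelp.class; heightcm=height*2.54; run; proc print data=myclass; run;

/* Version 2: the same statements with line breaks and indentation */
data myclass;
    set sashelp.class;
    heightcm=height*2.54;
run;

proc print data=myclass;
run;
```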
data under13;
set sashelp.class;
where AGE<13;
drop heIGht Weight;
run;
Unquoted values can
be in any case.
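By contrast, values inside quotation marks are compared exactly as typed. The sketch below assumes the sashelp.class sample table, in which the Sex column stores the uppercase values F and M. The table names girls and nomatch are made up for the example.

```sas
/* Keywords and column names can be in any case */
DATA girls;
    SET sashelp.class;
    WHERE SEX="F";    /* SEX, Sex, and sex all refer to the same column */
run;

/* Quoted values are case sensitive: the column stores uppercase F,
   so a lowercase "f" selects no rows */
data nomatch;
    set sashelp.class;
    where sex="f";
run;
```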
Another thing that you can do to make your code more understandable is to add comments. Any
commented text is ignored when the program executes. Comments are also useful when you are
testing code because you can suppress a portion of the code from execution.
To comment out a block of code using the /* */ technique in the SAS interfaces, you can highlight
the code and then press Ctrl+/ (forward slash).
• To uncomment a block of code in SAS Studio, highlight the block and then press Ctrl+/ again.
• To uncomment a block of code in SAS Enterprise Guide, highlight the block and then press
Ctrl+Shift+/.
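A small sketch of both comment styles, again using the sashelp.class sample table. The PROC PRINT step is commented out to show how a portion of a program can be suppressed during testing.

```sas
/* Block comment: create a copy of the sample table */
data myclass;
    set sashelp.class;
    * Statement comment: begins with an asterisk and ends with a semicolon;
    heightcm=height*2.54;
run;

/* This step is commented out, so it does not execute
proc print data=myclass;
run;
*/

proc means data=myclass;
    var heightcm;
run;
```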
Scenario
Examine program statements, improve program spacing, and add comments.
Files
• p101d02.sas
• sashelp.cars – a sample table provided by SAS that includes basic information about 428 cars
Syntax
/*comment*/
*comment;
Notes
• All statements end with a semicolon.
• Spacing does not matter in a SAS program.
• Values not enclosed in quotation marks can be lowercase, uppercase, or mixed case.
• Consistent program spacing is a good practice to make programs legible.
• Use the automatic spacing feature Format code to improve the spacing in a program.
• Comments can be added to prevent text in the program from executing.
Demo
1. Open the p101d02.sas program from the demos folder. Run the program. Does it run
successfully?
2. Use the Format code feature to improve the program spacing. Use one of the following methods:
• Click Format code.
• Right-click in the program and select Format code.
3. Add the following text as a comment before the DATA statement: Program created by
<your-name>
Note: Select the comment text and press Ctrl+/ to surround it with /* and */.
4. Comment out the first TITLE statement and the WHERE statement in PROC PRINT.
Run the code and verify that 428 rows are included in the results.
/*Program created by <name>*/
data mycars;
set sashelp.cars;
AvgMPG=mean(mpg_city, mpg_highway);
run;
title;
Syntax errors are a fact of programming life. As a programmer, it is incredibly valuable to be able to
identify, diagnose, and fix syntax errors in your code. A syntax error is an error in the spelling or
grammar of a SAS statement. Examples of syntax errors include misspelled keywords, unmatched
quotation marks, missing semicolons, and invalid options. You can catch some syntax errors, such
as an unmatched quotation mark, by paying attention to the color-coded syntax. When SAS finds a
syntax error in your submitted program, a warning or error message is written in the log.
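To make this concrete, the sketch below shows a short program with two typical syntax errors (a misspelled keyword and missing semicolons) next to a corrected version. The broken version is placed inside a comment so that the example runs cleanly. The program itself is an illustration and is not one of the course files.

```sas
/* Broken version, shown inside a comment so that it does not execute:

   daat mycars
       set sashelp.cars;
   run;

   proc print data=mycars(obs=5)
   run;
*/

/* Corrected version: DATA is spelled correctly,
   and every statement ends with a semicolon */
data mycars;
    set sashelp.cars;
run;

proc print data=mycars(obs=5);
run;
```

If you submit the broken version on its own, the log is the place to look for the resulting warnings and errors.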
Scenario
Find and resolve some common syntax errors.
Files
• p101d03.sas
• sashelp.cars – a sample table provided by SAS that includes basic information about 428 cars
Notes
• Some common syntax errors are unmatched quotation marks, missing semicolons, misspelled
keywords, and invalid options.
• Syntax errors might result in a warning or error in the log.
• Refer to the log to help diagnose and resolve syntax errors.
Demo
1. Open the p101d03.sas program from the demos folder. Identify the three syntax errors but do
not fix them. Run the program.
2. Carefully review the messages in the log.
Note: The Log Summary is available to view the notes, warnings, and errors.
3. Fix the code and rerun the program.
data mycars;
set sashelp.cars;
AvgMPG=mean(mpg_city, mpg_highway);
run;
title;
1.04 Activity
Open p101a04.sas from the activities folder and perform the following tasks:
1. Format the program to improve the spacing. What syntax error is
detected? Fix the error and run the program.
2. Read the log and identify any additional syntax errors or warnings. Correct
the program and format the code again.
3. Add a comment to describe the changes that you made to the program.
4. Run the program and examine the log and results. How many rows are in
the canadashoes data?
data canadashoes; set sashelp.shoes;
where region="Canada;
Profit=Sales-Returns;run;
The Extended Learning page is designed to supplement your learning for SAS Programming 1.
The Extended Learning page includes the following resources:
• PDF version of the course notes in English and other languages
• course files
• case studies for additional practice and application
• links to papers, videos, blogs, and other resources to learn more about related topics
Links
• Watch the video Getting Started with SAS Studio.
• View additional free video tutorials about using SAS Studio tasks.
• Watch the video Writing and Submitting SAS Code: Choosing an Editor.
• Visit the Learn SAS Enterprise Guide page for videos, tutorials, and training.
• Take the SAS Enterprise Guide 1: Querying and Reporting course.
Links
• Read the blog post How to run SAS programs in Jupyter Notebook.
• Read the instructions and download the Jupyter kernel for SAS on the SAS GitHub page.
• Watch the video An Introduction to SAS Viya Programming for SAS 9 Programmers.
• Take the Programming for SAS Viya course after SAS Programming 1.
• Take the free SAS Programming for R Users course.
• Use the Getting Started with SAS Viya for R documentation.
1.4 Solutions
Solutions to Activities and Questions
Confirm that 22 SAS tables were created.
Lesson 2 Accessing Data
2.1 Understanding SAS Data
Figure: The SAS programming process: access data, explore data, prepare data, analyze and report on data, export results.
Accessing data is the first step in the SAS programming process. There are many types of data files,
and SAS makes it easy to access data and use it for reporting and analysis.
Types of Data
Structured data Unstructured data
SAS has engines to enable it to understand and read various types of structured data.
Unstructured data must be imported into SAS before you can analyze or report on it. SAS makes
importing data easy too.
A SAS table is a structured data file that has defined columns and rows. SAS tables have the file
extension .sas7bdat.
SAS table
descriptor portion
• table name
• number of rows
• date created
• column names
• column attributes
data portion
• data values
There are two parts to a SAS table: a descriptor portion and a data portion. The descriptor portion
contains information about the attributes of the table, or metadata. The metadata includes general
properties such as the table name, the number of rows, and the date and time that the table was
created. The descriptor portion also includes the column definitions. The data portion of a SAS table
contains the data values, stored in columns.
SAS Terminology
• SAS table or data set
• column or variable
• row or observation
Column Name
• 1 – 32 characters
• starts with a letter or underscore
• can be uppercase, lowercase, or mixed case
Column names are stored in the case that you use when you create the column, and that is the way
the column name appears in reports. After a column has been created, it can be typed in any case in
your code without affecting the way that it is stored.
Note: These same naming conventions should be followed for SAS table names.
Depending on the environment used to submit your SAS code, SAS might allow for spaces and
special symbols other than underscores in column and table names. If you use data sources other
than SAS that have flexible column-name rules, SAS can make allowances for that. However, for
simplicity and consistency, it is recommended to follow the standard SAS naming conventions.
a. month6
b. 6month
c. month#6
d. month 6
e. month_6
f. Month6
SAS Dates (stored as numeric values)
01Jan1959 = -365
01Jan1960 = 0
01Jan1961 = 366
Storing dates as numbers makes calculations easy!
SAS date values represent the number of days between January 1, 1960, and a specified date.
SAS can perform calculations on dates ranging from A.D. 1582 to A.D. 19,900.
SAS time values represent the number of seconds since midnight of the current day.
SAS datetime values represent the number of seconds between midnight on January 1, 1960,
and an hour/minute/second within a specified date.
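The following DATA step is a minimal sketch of how these values behave. The variable names are illustrative and not part of the course files. It writes the stored numbers to the log, which should match the values on the slide above (-365, 0, and 366), and shows that subtracting two dates gives a number of days.

```sas
data _null_;
    /* Date literals: 'ddMONyyyy'd is stored as days since 01JAN1960 */
    d1959 = '01JAN1959'd;
    d1960 = '01JAN1960'd;
    d1961 = '01JAN1961'd;

    /* Subtracting two dates gives the number of days between them */
    days_between = d1961 - d1959;

    /* Time literal: seconds since midnight */
    t = '00:01:00't;

    /* Datetime literal: seconds since midnight on 01JAN1960 */
    dt = '01JAN1960:00:01:00'dt;

    /* Write the stored numeric values to the log */
    put d1959= d1960= d1961= days_between= t= dt=;

    /* A format changes how a value is displayed, not how it is stored */
    put d1961= date9.;
run;
```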
Type and Length
Numeric: 8 bytes (~16 significant digits)
Character: 1 - 32,767 bytes (1 byte = one character)
The column length is the number of bytes allocated to store column values. The length is related to
the column type. By default, numeric columns are stored as 8 bytes, which is enough for
about 16 significant digits. Character columns can be any length between 1 and 32,767 bytes, and a
byte stores one character. A column such as Country Code that is always a two-letter code might be
assigned a length of 2. A column such as Country Name that could have a varying number of
characters must have a length at least as long as the longest country name.
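As a rough illustration of column length, the sketch below creates a small table and reports its attributes. It uses the LENGTH statement, which has not been introduced at this point in the notes. The table name, column names, and values are hypothetical.

```sas
data work.length_demo;
    /* LENGTH sets the number of bytes for each column; $ marks a character column */
    length CountryCode $ 2 CountryName $ 30;
    CountryCode = "CA";
    CountryName = "Canada";
    /* Numeric columns are 8 bytes unless specified otherwise */
    Amount = 100;
run;

/* The variable list in the report shows each column's type and length */
proc contents data=work.length_demo;
run;
```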
SAS uses floating-point representation to store numeric values. Floating-point representation
supports a wide range of values (very large or very small numbers) with an adequate amount of
numerical accuracy. For more information about how SAS stores numeric values, visit SAS 9.4
Language Reference: Concepts.
2.02 Activity
1. Navigate to the location of your course files and open the data folder.
Enterprise Guide: Expand Servers > Local > Files.
SAS Studio: Expand Files and Folders.
2. Double-click the storm_summary.sas7bdat SAS table to view it.
How are missing character and numeric values represented in the data?
2.03 Question
Click Table Properties above the storm_summary data to view the table
and column attributes. Examine the length of the Basin column. Could East
Pacific be properly stored as a data value in the Basin column?
Yes
No
PROC CONTENTS creates a report about the descriptor portion of the data.
The path provided in the program must be relative to where SAS is running. If SAS is on a remote
server, the path points to the server, not the local machine.
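A minimal sketch of such a step is shown here. The folder s:/workshop/data is assumed from the paths used elsewhere in these notes, so adjust it to the location of your course files on the machine where SAS runs.

```sas
/* The path is an assumption; it must be visible to the machine where SAS runs */
proc contents data="s:/workshop/data/class_birthdate.sas7bdat";
run;
```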
The output of PROC CONTENTS is a listing of the information in the descriptor portion of the table.
You can also think of this as the metadata or properties of the table. The first two sections of the
report give general information about the table, including where the table is stored, when it was
created and modified, and the number of rows and columns. The variable list provides the column
names along with their type and length.
In the class_birthdate table, there are the numeric columns Age, Birthdate, Height, and Weight
that all have a length of 8. There are also two character columns. Name is 8 bytes, meaning it can
store names with up to eight characters. Sex has a length of 1, which is appropriate because it
contains only one-letter codes.
2.04 Activity
Open p102a04.sas from the activities folder and perform the following tasks:
1. Write a PROC CONTENTS step to generate a report of the
storm_summary.sas7bdat table properties. Highlight the step and run
only the selected code.
2. How many observations are in the table?
3. How is the table sorted?
2.2 Accessing Data through Libraries
So far we have used a hardcoded file path to the SAS table that we want to access, and that file
path has the two pieces of information that are required for SAS to read the file: where the data is
located and what type of data it is. Because we have been reading a SAS table, providing a path to
the data and file name in quotation marks works perfectly.
Discussion
What challenges might arise if you use
a fixed path in your program?
• "I hope I don’t need to write that file path again and again in a long program!"
• "Think of the editing I’ll need to do if the data changes location!"
• "What if I want to access another type of data?"
libref
• eight-character maximum
• starts with a letter or underscore
• continues with letters, numbers, or underscores
SAS libraries provide a way to specify the two required pieces of information – the location and file
type – in a very simple and efficient way. You can think of a library as a collection of data files that
are the same type and in the same location.
A library is created with the LIBNAME statement. This is a global statement, and it does not need a
RUN statement at the end.
• The statement begins with the keyword LIBNAME, followed by what is referred to as a library
reference, or libref. The libref is the name of the library. The libref must be eight characters or
fewer, must start with either a letter or underscore, and can include only letters, numbers, and
underscores.
• After the libref, specify the engine, which is related to the type of data being accessed. The
engine is a behind-the-scenes set of instructions that enables SAS to read structured data files
directly, without having to do a separate, manual import into SAS. There are dozens of engines
available, including Base for SAS tables, Excel, Teradata, Hadoop, and many others.
• Finally, provide the location or connection information for the data to be read. That can be a
physical path or directory, or other options to connect to a database.
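The three parts described above can be sketched as follows. The libref mylib and the folder come from the example later in this section. The second libref, mylib2, is hypothetical and shows that the engine name can be left out when the library points to a folder of SAS tables, in which case SAS normally uses the Base engine.

```sas
/* LIBNAME libref engine "location"; */
libname mylib base "s:/workshop/data";

/* Engine omitted: for a folder of SAS tables, SAS normally uses the Base engine */
libname mylib2 "s:/workshop/data";
```

After submitting either statement, check the log for a note confirming that the libref was successfully assigned.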
SAS complies with operating system permissions that are assigned to the data files referenced by
the library. If you have Write access to the files, you are able to use SAS code to add, modify, or
delete data files. If you have Read access but do not have Write access, you can read data files
via the library, but you cannot make any changes to the files with SAS code.
To prevent SAS from making changes to tables in a library, add ACCESS=READONLY
at the end of the LIBNAME statement.
libname mylib base "s:/workshop/data" access=readonly;
The path specified must be relative to where SAS is running. If SAS is local, you can specify a path
to a folder of files on your own machine. If SAS is on a remote server, the path or folder must be to a
location known to the server.
When the LIBNAME statement is submitted, all the information about the location and file type is
associated with the library name, or libref.
libref.table-name
Use the mylib library:
proc contents data=mylib.class;
run;
After the library is defined, it can be used as a shortcut to access tables in the program. To do this,
specify the libref, a period, and the table name.
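Putting the two steps together, a short sketch might look like this. It assumes that a SAS table named class exists in the s:/workshop/data folder. The PROC PRINT step with the OBS= option is an addition for illustration only.

```sas
/* Create the library */
libname mylib base "s:/workshop/data";

/* Use the library: libref, a period, and the table name */
proc contents data=mylib.class;
run;

/* The same two-level name works in any step that reads a table */
proc print data=mylib.class (obs=5);
run;
```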
By default, a libref remains active until it is deleted or the SAS session ends. Remember that the
libref is simply a pointer or shortcut to existing data, so although the libref might be deleted when
SAS shuts down, the data remains in the same place. When SAS restarts, re-establish the library
and libref by submitting the LIBNAME statement again before accessing the data. This is why SAS
programs often begin with one or more LIBNAME statements to connect to the various data sources
that are used in the code.
2. Run the code and verify that the library was successfully assigned
in the log.
3. Go back to your program and save it as libname.sas in the main course
files folder. Replace the file if it exists.
Note: The log might indicate that the pg1 libref refers to the same physical library as another libref,
such as TMP0001 or _TEMP0. When a table is opened to view in the data grid, SAS creates
a library that points to the folder where the table is located. You do not need to clear the libref
that is created by SAS.
2.06 Activity
1. Enterprise Guide: Select Libraries in the resources pane and click to
refresh.
SAS Studio: Select Libraries in the navigation pane and expand
My Libraries.
2. Expand the PG1 library. Why are the Excel and text files in the data folder
not included in the library?
Work
• temporary library
• contents deleted at the end of the SAS session
• data=work.test is equivalent to data=test
Work and Sashelp are also known as SAS system libraries. For more information about system
libraries, access this page in SAS Help.
The Work library is a temporary library that is automatically defined by SAS at the beginning of each
SAS session. We say that the Work library is temporary because any tables written to the Work
library are deleted at the end of each SAS session. This library is commonly used in SAS programs
because it is a great way to create working files that you do not need to save permanently.
The Work library is also considered to be the default library. If a libref is not provided in front of a
table name, SAS assumes that the library is Work. For example, test and work.test both reference
the temporary table named test in the Work library.
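The short program below is one way to see this behavior. It copies the sashelp.class sample table to a table named test without a libref, and then describes that table using both the one-level and the two-level name.

```sas
/* No libref is given, so the table is written to the Work library */
data test;
    set sashelp.class;
run;

/* Both steps describe the same temporary table */
proc contents data=test;
run;

proc contents data=work.test;
run;
```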
Sashelp
• includes sample data that you can use
• data=sashelp.cars
Another library that SAS automatically defines is the Sashelp library. Sashelp contains a collection
of sample tables and other files. We use several of the sample tables in Sashelp in the examples in
this course.
Figure: A SAS administrator defines libraries such as sales and research. Tables in those libraries are referenced as sales.quarter1 and research.field_trial9.
If your SAS Platform has an administrator, other automatic libraries might be defined when you open
your SAS interface. If libraries are defined for you, you do not need to submit a LIBNAME statement.
You can use the libref that was created by your administrator and the table name to reference data in
your program.
Scenario
Use the Work and Sashelp libraries that are automatically created by SAS. Determine what
happens with libraries and tables when SAS restarts.
Files
• p102d01.sas
• sashelp.class – a sample table provided by SAS that includes basic information about
19 students
Notes
• Work and Sashelp are system libraries that are automatically defined by SAS.
• Tables stored in the Work library are deleted at the end of each SAS session.
• Work is the default library, so if a table name without a libref is provided in the program, the table
is read from or written to the Work library.
• Sashelp contains a collection of sample tables and other files that include information about your
SAS session.
Demo
1. Open the p102d01.sas program from the demos folder and find the Demo section. Run the
demo program and use the navigation pane to examine the contents of the Work and out
libraries.
2. Which table is in the Work library? Which table is in the out library?
3. Restart SAS.
a. Enterprise Guide: In the Servers list, select Local, right-click, and select Disconnect.
Expand Local to start SAS again, and then expand Libraries.
b. SAS Studio: Select More application options > Reset SAS Session.
4. Discuss the following questions:
a. What is in the Work library?
b. Why are the out and pg1 libraries not available?
c. Is class_copy2 saved permanently?
d. What must be done to re-establish the out library?
5. To re-establish the pg1 library, open and run the libname.sas program that was saved
previously in the main course files folder.
Note: Whenever you restart SAS Studio or SAS Enterprise Guide, you need to run the
libname.sas program to re-establish the pg1 library.
In addition to SAS data, libraries can be used to access many other types of data. For example, a
library using the XLSX engine can read data directly from Excel spreadsheets.
Remember that when SAS reads or writes data in a program, it must know where the data is located
and what format it is in. The only change to the LIBNAME statement syntax is that we specify the
XLSX engine, and a path that includes the complete Excel workbook file name and extension. You
can think of the Excel workbook as a collection of tables. Each individual worksheet or named range
is one table in the collection.
Note: The XLSX engine requires a license for SAS/ACCESS Interface to PC Files, and it also
requires SAS 9.4M2 or later.
There are two extra statements that are often used when reading Excel data. The first is the
OPTIONS statement, a global statement for specifying system options. Excel does not have any
rules for column headings, so they can be longer than 32 characters and include spaces or other
special symbols. When SAS reads the Excel data, we can force column names to follow strict SAS
naming conventions by using the VALIDVARNAME=V7 system option. Technically, this enforces the
column naming rules established with SAS 7. With this option set, SAS replaces any spaces or
special symbols in column names with underscores, and names greater than 32 characters are
truncated.
In SAS Studio and Enterprise Guide, the VALIDVARNAME= option is set to ANY by default. ANY
enables column names to contain special characters, including spaces. If a column name contains
special characters, the column name must be expressed as a SAS name literal.
"var-name"n
The default value for VALIDVARNAME can also be changed in the interface options.
Enterprise Guide: Select Tools > Options > Data > General and change Valid variable names to
Basic variable names.
SAS Studio: Select More application options > Preferences > Tables and change SAS variable
name policy to V7.
Note: The SAS windowing environment sets VALIDVARNAME=V7 by default.
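To make the name literal syntax concrete, here is a small sketch. The table name hem_demo and the data values are hypothetical. The column names Hem NS and Hem EW are borrowed from the Storm_Summary worksheet described in the demo later in this section.

```sas
/* Allow spaces and special characters in column names */
options validvarname=any;

data work.hem_demo;
    /* A name literal is the name in quotation marks followed by the letter n */
    'Hem NS'n = "N";
    'Hem EW'n = "W";
run;

proc print data=work.hem_demo;
    var 'Hem NS'n 'Hem EW'n;
run;

/* Optional: return to strict SAS naming rules for the rest of the session */
options validvarname=v7;
```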
When a connection is defined to a data source such as Excel or another database, it is a good practice
to clear, or delete, the libref at the end of the program. While the library is active, it might create a
lock on the data that prevents others from accessing the file, or it could maintain an unnecessary
active connection to the data source. To clear the library reference, use the LIBNAME statement
again, name the libref, and use the keyword CLEAR.
options validvarname=v7;
libname xlclass xlsx "s:/workshop/data/class.xlsx";
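A fuller sketch of the pattern described above follows: set the option, assign the library, use a worksheet as a table, and clear the libref. The worksheet name class_birthdate is an assumption. Substitute the name of a worksheet that exists in your workbook.

```sas
/* Force column names read from Excel to follow SAS naming conventions */
options validvarname=v7;

/* The path includes the complete workbook file name and extension */
libname xlclass xlsx "s:/workshop/data/class.xlsx";

/* Reference a worksheet as libref.worksheet-name (worksheet name assumed) */
proc contents data=xlclass.class_birthdate;
run;

/* Release the connection to the workbook */
libname xlclass clear;
```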
Scenario
Create a library to connect to an Excel workbook and reference an Excel worksheet in the program.
Files
• p102d02.sas
• Storm.xlsx – an Excel workbook with multiple worksheets that contain storm data
Syntax
OPTIONS VALIDVARNAME=V7;
LIBNAME libref XLSX "path/file-name.xlsx";
LIBNAME libref CLEAR;
Notes
• The XLSX engine enables you to read data directly from Excel workbooks. The XLSX engine
requires the SAS/ACCESS Interface to PC Files license.
• The VALIDVARNAME=V7 system option forces table and column names read from Excel to follow
SAS naming conventions. Spaces and special symbols are replaced with underscores, and names
greater than 32 characters are truncated.
• Date values are automatically converted to numeric SAS date values and formatted for easy
interpretation.
• Worksheets from the Excel workbook can be referenced in a SAS program as libref.worksheet-name.
• When you define a connection to a data source other than a SAS data source, such as Excel or
other databases, it is a good practice to delete the libref at the end of your program with the
CLEAR option.
Demo
1. Open the Storm.xlsx file in Excel to view the data. Notice that, in the Storm_Summary
worksheet, there are spaces in the Hem NS and Hem EW column headings. Close the Excel file
after you finish viewing it.
Note: The file must be closed before you assign a library to the file.
2. Open p102d02.sas from the demos folder and find the Demo section. Complete the OPTIONS
statement to ensure that column names follow SAS naming conventions.
3. Complete the LIBNAME statement to define a library named xlstorm that connects to the
Storm.xlsx workbook.
4. Highlight the OPTIONS and LIBNAME statements and run the selected code. Use the navigation
area to find the xlstorm library. Open the storm_summary table. Notice that the Hem_NS and
Hem_EW columns include underscores. Close the storm_summary table.
*Complete the OPTIONS statement;
options validvarname=v7;
2.07 Activity
Open p102a07.sas from the activities folder and perform the following tasks:
1. If necessary, update the path of the course files in the LIBNAME
statement.
2. Complete the PROC CONTENTS step to read the parks table in the NP
library.
3. Run the program. Navigate to your list of libraries and expand the NP
library. Confirm that three tables are included: Parks, Species, and Visits.
4. Examine the log. Which column names were modified to follow SAS
naming conventions?
5. Uncomment the final LIBNAME statement and run it to clear the NP
library.
2.3 Importing Data into SAS
Libraries are an efficient and elegant way to directly access data and use it in a program. However,
sometimes you need to access unstructured data and to do that, you need to import the file and
create a copy as a SAS table.
Let’s start with text files as an example. Text files are simply strings of characters to your computer.
SAS cannot read text files directly with an engine. We must import the data into a structured format,
such as a SAS table, in order to use the data in a program.
There are a number of ways to import data. If you are interested in a point-and-click approach,
Enterprise Guide, SAS Studio, and the SAS windowing environment all offer an Import Wizard that
enables you to read various file types, specify options, and create a new SAS table. But because
this is a programming class, we are going to teach you a simple programming option: the IMPORT
procedure.
PROC IMPORT reads data from an external data source and writes it to a SAS table. SAS can
import delimited files with any character acting as the delimiter. To import a comma-delimited file, use
the DATAFILE= option to provide the path and complete file name, the DBMS= option to define the
file type as CSV, and the OUT= option to provide the library and name of the SAS output table. By
default, SAS assumes that column names are found in the first row of the file.
Here are some common DBMS identifiers that are included with Base SAS:
• CSV – comma-separated values.
• JMP – JMP files, JMP 7 or later.
• TAB – tab-delimited values.
• DLM – delimited files, default delimiter is a space. To use a different delimiter, use
the DELIMITER= statement.
Here are additional DBMS identifiers included with SAS/ACCESS Interface to PC Files:
• XLSX – Microsoft Excel 2007, 2010 and later
• ACCESS – Microsoft Access 2000 and later
Other DBMS identifiers are listed in the SAS Help Center.
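As an illustration of the DLM identifier, the sketch below imports a file that uses a vertical bar as the delimiter. The file sample_pipe.txt and the output table name are hypothetical and are not part of the course files.

```sas
/* sample_pipe.txt is a hypothetical pipe-delimited text file */
proc import datafile="s:/workshop/data/sample_pipe.txt" dbms=dlm
            out=work.sample_pipe replace;
    /* Override the default delimiter (a space) */
    delimiter="|";
run;
```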
The REPLACE option indicates that the SAS output table should be replaced if it already exists.
By default, SAS scans the first 20 rows of the data to make its best guess for the column attributes,
including type and length. It is possible that SAS might incorrectly assume a column’s type or length
based on the values found in those initial rows. Use the GUESSINGROWS= option to provide a set
number or use the keyword MAX to examine all rows. SAS scans the number of rows that you
specify to determine type and length of each column in the imported table.
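A sketch of these options used together is shown here, based on the storm_damage.csv file from the demo that follows. The path assumes the s:/workshop/data folder used elsewhere in these notes.

```sas
proc import datafile="s:/workshop/data/storm_damage.csv" dbms=csv
            out=work.storm_damage_import replace;
    /* Scan every row before deciding each column's type and length */
    guessingrows=max;
run;

/* Check the resulting column types and lengths */
proc contents data=work.storm_damage_import;
run;
```

Scanning all rows can take longer on large files, so a specific number of rows is a reasonable alternative when the file is big.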
Scenario
Using PROC IMPORT, import a comma-delimited file and create a new SAS table.
Files
• p102d03.sas
• storm_damage.csv – a comma-delimited file that includes a description and damage estimates
for storms in the US with damages greater than one billion dollars
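Syntax
PROC IMPORT DATAFILE="path/file-name.csv" DBMS=CSV OUT=output-table <REPLACE>;
<GUESSINGROWS=n | MAX;>
RUN;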
Notes
• The IMPORT procedure can be used to read delimited text files.
• The DBMS= option identifies the file type. The CSV value is included with Base SAS.
• The OUT= option provides the library and name of the SAS output table.
• The REPLACE option is necessary to overwrite the SAS output table if it exists.
• SAS assumes that column names are in the first line of the text file and data begins on the
second line.
• Date values are automatically converted to numeric SAS date values and formatted for easy
interpretation.
• The GUESSINGROWS= option can be used to increase the number of rows that SAS scans to
determine each column’s type and length beyond the default of 20 rows. Specify a number of rows, or
use the keyword MAX to scan all rows.
Demo
The storm_damage.csv file is in the data folder. In this display of the data, notice that column
names are in the first row, the data is comma-delimited, and there is a Date column. Data values
that include commas are enclosed in quotation marks.
1. Open the p102d03.sas program in the demos folder and find the Demo section. Complete the
PROC IMPORT step to read storm_damage.csv and create a temporary SAS table named
storm_damage_import. Replace the table if it exists.
2. Complete the PROC CONTENTS step to examine the properties of storm_damage_import.
3. Highlight the demo program and submit the selected code.
*Complete the PROC IMPORT step;
proc import datafile="s:/workshop/data/storm_damage.csv" dbms=csv
            out=storm_damage_import replace;
run;

*Complete the PROC CONTENTS step;
proc contents data=storm_damage_import;
run;
2.08 Activity
Open p102a08.sas from the activities folder and perform the following tasks:
1. This program imports a tab-delimited file. Run the program twice and
carefully read the log. What is different about the second submission?
2. Fix the program and rerun it to confirm that the import is successful.
The XLSX library engine can read and write Excel data directly, but you might prefer to import a copy
of your Excel data as a SAS table and use that SAS table in your program. If SAS/ACCESS Interface to PC
Files is licensed, PROC IMPORT can accomplish this. Change the DATAFILE= value and the DBMS=
option to reference XLSX and use the SHEET= option to tell SAS which worksheet you want to read.
PROC IMPORT can read only one spreadsheet at a time, and by default it reads the first worksheet.
Note: If the Excel file is open when PROC IMPORT runs, an error occurs.
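The description above can be sketched as follows. The workbook name (class_data.xlsx), the worksheet name (Sheet1), and the output table name are hypothetical placeholders, and the step assumes that SAS/ACCESS to PC Files is licensed.

```sas
/* Sketch: import one worksheet from an Excel workbook.               */
/* class_data.xlsx, Sheet1, and class_import are hypothetical names.  */
proc import datafile="s:/workshop/data/class_data.xlsx" dbms=xlsx
            out=work.class_import replace;
    sheet="Sheet1";   /* omit SHEET= to read the first worksheet */
run;
```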
Links
• SAS/ACCESS courses (http://support.sas.com/training/us/paths/dmgt.html#acc )
• SAS/ACCESS documentation (https://support.sas.com/documentation/onlinedoc/access/ )
Practice
If you restarted your SAS session, open and submit the libname.sas program in the course files.
Level 1
1. Importing Excel Data from a Single Worksheet
Create a table that contains a copy of the data that is in an Excel workbook. The Excel workbook
contains a single worksheet.
a. Open p102p01.sas from the practices folder. Complete the PROC IMPORT step to read
eu_sport_trade.xlsx. Create a SAS table named eu_sport_trade and replace the table
if it exists.
b. Modify the PROC CONTENTS code to display the descriptor portion of the eu_sport_trade
table. Submit the program, and then view the output data and the results.
Level 2
2. Importing Data from a CSV File
Create a table from a comma-delimited CSV file.
np_traffic.csv
ParkName,UnitCode,ParkType,Region,TrafficCounter,ReportingDate,TrafficCount
Big Hole NB,BIHO,National Battlefield,Pacific West,TRAFFIC COUNT AT BATTLE ROAD,31JAN2016,0
Big Hole NB,BIHO,National Battlefield,Pacific West,TRAFFIC COUNT AT BATTLE ROAD,29FEB2016,0
Big Hole NB,BIHO,National Battlefield,Pacific West,TRAFFIC COUNT AT BATTLE ROAD,31MAR2016,0
Big Hole NB,BIHO,National Battlefield,Pacific West,TRAFFIC COUNT AT BATTLE ROAD,30APR2016,183
Big Hole NB,BIHO,National Battlefield,Pacific West,TRAFFIC COUNT AT BATTLE ROAD,31MAY2016,289
a. Create a new program. Write a PROC IMPORT step to read the np_traffic.csv file and
create the traffic SAS table. Add a PROC CONTENTS step to view the descriptor portion of
the newly created table. Submit the program.
b. Examine the data interactively. Scroll down to row 37. Notice that the values of ParkName
and TrafficCounter seem to be truncated. Modify the program to resolve this issue.
c. Submit the program and verify that ParkName and TrafficCounter are no longer truncated.
Challenge
3. Importing Data with a Specific Delimiter
Create a table from np_traffic.dat. The values in the text file are delimited with a pipe (that is,
a vertical bar).
ParkName|UnitCode|ParkType|Region|TrafficCounter|ReportingDate|TrafficCount
Big Hole NB|BIHO|National Battlefield|Pacific West|TRAFFIC COUNT AT BATTLE ROAD|31JAN2016|0
Big Hole NB|BIHO|National Battlefield|Pacific West|TRAFFIC COUNT AT BATTLE ROAD|29FEB2016|0
Big Hole NB|BIHO|National Battlefield|Pacific West|TRAFFIC COUNT AT BATTLE ROAD|31MAR2016|0
Big Hole NB|BIHO|National Battlefield|Pacific West|TRAFFIC COUNT AT BATTLE ROAD|30APR2016|183
Big Hole NB|BIHO|National Battlefield|Pacific West|TRAFFIC COUNT AT BATTLE ROAD|31MAY2016|289
a. Access the SAS Procedures Guide. Expand Procedures and find the IMPORT Procedure
section. Review the syntax and examples to determine how to read a file that is delimited
with a specific symbol.
b. Use PROC IMPORT to import the np_traffic.dat file and create the temporary traffic2 SAS
table.
Partial Results (rows 37-46 of 2,784)
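One possible approach to the challenge, offered as a sketch rather than the official solution, uses DBMS=DLM with the DELIMITER= statement. The path is assumed to match the other course data files.

```sas
/* Sketch: read a pipe-delimited text file with PROC IMPORT.        */
/* The path is an assumption; adjust it to your course data folder. */
proc import datafile="s:/workshop/data/np_traffic.dat" dbms=dlm
            out=traffic2 replace;
    delimiter="|";
    guessingrows=max;   /* scan all rows so long values are not truncated */
run;
```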
c. To test the library, select More application options > Reset SAS Session. Expand the
Libraries section of the navigation pane and verify that the pg1 library is available.
5. SAS Enterprise Guide: Assigning a Library Automatically at Start-Up
Recall that when SAS shuts down, library references are deleted. It might be helpful to have
certain libraries that are automatically defined when SAS starts.
a. Select Tools > Options > SAS Programs. Select the Submit SAS code when server
is connected check box and click Edit. You can include any SAS code that you want to
execute each time that SAS starts. Enter a LIBNAME statement, click Save, and then
click OK.
Note: Change the path if necessary to match the location of your course data.
libname pg1 base "s:/workshop/data";
b. To test the library, select Local in the Servers list, right-click, select Disconnect, and then
click Yes. Expand Local to start SAS again, and then expand Libraries to confirm that pg1
is available.
2.4 Solutions
Solutions to Practices
1. Importing Excel Data from a Single Worksheet
*Modify the path if necessary;
proc import datafile="s:/workshop/data/eu_sport_trade.xlsx"
dbms=xlsx
out=eu_sport_trade
replace;
run;
a. month6
b. 6month
c. month#6
d. month 6
e. month_6
f. Month6
Month6 and month6 are actually the same column name.
How are missing character and numeric values represented in the data?
Yes
No
Basin is two bytes, so East Pacific would be truncated, and the value would be Ea.
2. Run the code and verify that the library was successfully assigned
in the log.
25 libname pg1 base "s:/workshop/data";
NOTE: Libref PG1 was successfully assigned as follows:
Engine: BASE
Physical Name: s:\workshop\data
2. Fix the program and rerun it to confirm that the import is successful.
proc import datafile="s:/workshop/data/storm_damage.tab"
dbms=tab out=storm_damage_tab replace;
run;
Lesson 3 Exploring and Validating Data
3.1 Exploring Data
Demonstration: Exploring Data with SAS Procedures
Practice
Exploring data can include learning about the columns and values that you have, as well as
validating data to look for incorrect or inconsistent values. In this lesson, you learn to use some
procedures that give you some of this insight. You also learn to subset the data so that you can
focus on particular segments, format data so that you can easily understand it, sort data, and identify
and clean up duplicate values.
After accessing data, the next step is to understand it. PROC CONTENTS can be used to confirm
column attributes, but often the data is too large or complex for a visual review to be sufficient.
The PRINT, MEANS, UNIVARIATE, and FREQ procedures can be used to quickly and easily explore
data.
PRINT Procedure
a. BY
b. ID
c. SUM
d. VAR
PRINT Procedure
proc print data=sashelp.cars (obs=10);
var Make Model Type MSRP;
run;
This PROC PRINT step lists the first 10 rows, or observations, from the sashelp.cars table and
displays only the Make, Model, Type, and MSRP columns.
MEANS Procedure
PROC MEANS DATA=input-table;
VAR col-name(s);
RUN;
MEANS Procedure
proc means data=sashelp.cars;
var EngineSize Horsepower MPG_City MPG_Highway;
run;
This PROC MEANS step calculates the default statistics – frequency count (N), mean, standard
deviation, minimum, and maximum – for each of the columns that is listed in the VAR statement.
By examining the PROC MEANS results, you can identify average values or values that might be
outside of an expected range.
UNIVARIATE Procedure
PROC UNIVARIATE DATA=input-table;
VAR col-name(s);
RUN;
UNIVARIATE Procedure
proc univariate data=sashelp.cars;
var MPG_Highway;
run;
This PROC UNIVARIATE step analyzes MPG_Highway and provides several summary statistics,
including the five lowest and highest extreme values and their observation numbers.
FREQ Procedure
PROC FREQ DATA=input-table;
TABLES col-name(s);
RUN;
FREQ Procedure
proc freq data=sashelp.cars;
tables Origin Type DriveTrain;
run;
This PROC FREQ step creates a separate table for Origin, Type, and DriveTrain. Each table
includes a list of the distinct values for the column along with a frequency count, percent, and
cumulative frequency and percent. This is a great way to validate the data in your columns. For
example, you might notice unexpected values or values that appear in both uppercase and
lowercase.
Scenario
Use the PRINT, MEANS, UNIVARIATE, and FREQ procedures to explore and validate data.
Files
• p103d01.sas
• storm_summary – a SAS table that contains one row per storm for the 1980 through 2016 storm
seasons
Syntax
Notes
• PROC PRINT lists all columns and rows in the input table by default. The OBS= data set option
limits the number of rows read from the input data. The VAR statement limits and orders the
columns that are listed.
• PROC MEANS generates simple summary statistics for each numeric column in the input data
by default. The VAR statement limits the columns to analyze.
• PROC UNIVARIATE also generates summary statistics for each numeric column in the data
by default, but it includes more detailed statistics related to distribution and extreme values.
The VAR statement limits the columns to analyze.
• PROC FREQ creates a frequency table for each column in the input table by default. You can limit
the columns that are analyzed by using the TABLES statement.
Demo
1. Open p103d01.sas from the demos folder and find the Demo section of the program. Complete
the PROC PRINT statement to list the data in pg1.storm_summary. Print the first 10
observations. Highlight the step and run the selected code.
proc print data=pg1.storm_summary (obs=10);
run;
2. Add a VAR statement to include only the following columns: Season, Name, Basin,
MaxWindMPH, MinPressure, StartDate, and EndDate. Add "list first 10 rows" as a comment
before the PROC PRINT statement. Highlight the step and run the selected code.
Enterprise Guide Note: To easily add column names, use the autocomplete prompts to view
and select columns. You can either double-click on a column to add it in the program,
or start to type the column name and press the spacebar when the correct column is
highlighted.
SAS Studio Note: To easily add column names, place your cursor after the keyword VAR.
Use the Library section of the navigation pane to find the pg1 library. Expand the
storm_summary table to see a list of column names. Hold down the Ctrl key and select
the columns in the order in which you want them to appear in the statement. Drag the
selected columns to the VAR statement.
/*list first 10 rows*/
proc print data=pg1.storm_summary(obs=10);
var Season Name Basin MaxWindMPH MinPressure StartDate
EndDate;
run;
3. Copy the PROC PRINT step and paste it at the end of the program. Change PRINT to MEANS.
Remove the OBS= data set option to analyze all observations. Modify the VAR statement to
calculate summary statistics for MaxWindMPH and MinPressure. Add "calculate summary
statistics" as a comment before the PROC MEANS statement. Highlight the step and run the
selected code.
/*calculate summary statistics*/
proc means data=pg1.storm_summary;
var MaxWindMPH MinPressure;
run;
4. Copy the PROC MEANS step and paste it at the end of the program. Change MEANS to
UNIVARIATE. Add "examine extreme values" as a comment before the PROC UNIVARIATE
statement. Highlight the step and run the selected code.
/*examine extreme values*/
proc univariate data=pg1.storm_summary;
var MaxWindMPH MinPressure;
run;
5. Copy the PROC UNIVARIATE step and paste it at the end of the program. Change UNIVARIATE
to FREQ. Change the VAR statement to a TABLES statement to produce frequency tables for
Basin, Type, and Season. Add "list unique values and frequencies" as a comment before the
PROC FREQ statement. Highlight the step and run the selected code.
/*list unique values and frequencies*/
proc freq data=pg1.storm_summary;
tables Basin Type Season;
run;
Practice
If you restarted your SAS session, open and submit the libname.sas program in the course files.
Level 1
1. Exploring Data with Procedures
The pg1.np_summary table contains public use statistics from the National Park Service.
Use the PRINT, MEANS, UNIVARIATE, and FREQ procedures to explore the data for possible
inconsistencies.
a. Open p103p01.sas from the practices folder. Complete the PROC PRINT statement to list
the first 20 observations in pg1.np_summary.
b. Add a VAR statement to include only the following variables: Reg, Type, ParkName,
DayVisits, TentCampers, and RVCampers. Highlight the step and run the selected code.
Do you observe any possible inconsistencies in the data?
c. Copy the PROC PRINT step and paste it at the end of the program. Change PRINT to
MEANS and remove the OBS= data set option. Modify the VAR statement to calculate
summary statistics for DayVisits, TentCampers, and RVCampers. Highlight the step and
run the selected code.
What is the minimum value for tent campers? Is that value unexpected?
d. Copy the PROC MEANS step and paste it at the end of the program. Change MEANS
to UNIVARIATE. Highlight the step and run the selected code.
Are there negative values for any of the columns?
e. Copy the PROC UNIVARIATE step and paste it at the end of the program. Change
UNIVARIATE to FREQ. Change the VAR statement to a TABLES statement to produce
frequency tables for Reg and Type. Highlight the step and run the selected code.
Are there any lowercase codes? Are there any codes that occur only once in the table?
f. Add comments before each step to document the program. Save the program as
np_validate.sas in the output folder.
Level 2
2. Using Procedures to Validate Data
The pg1.np_summary table contains information about US national parks, monuments,
preserves, rivers, and seashores. Valid values for the columns Reg and Type are as follows:
Reg   Description         Type  Description
A     Alaska              NM    National Monument
IM    Intermountain       NP    National Park
MW    Midwest             NS    National Seashore
NC    National Capital    PRE   National Preserve
NE    Northeast           RVR   National River
PW    Pacific West
SE    Southeast
a. Create a new program. Write a PROC FREQ step to produce frequency tables for the Reg
and Type columns in the pg1.np_summary table. Submit the step and look for invalid
values.
b. Write a PROC UNIVARIATE step to generate statistics for the Acres column in the
pg1.np_summary table. Notice the observation numbers for the smallest park and the
largest park.
c. View the pg1.np_summary table to identify the names of the smallest and largest parks.
Challenge
3. Generating Extreme Observations Output
The pg1.eu_occ table includes monthly occupancy counts for European countries between
January 2004 and September 2017.
The SAS Output Delivery System (ODS) gives you options for controlling the type and format
of the output that is generated by SAS code. The ODS SELECT statement is used to specify
output objects for results. The ODS SELECT statement can be used to generate a report
containing only the Extreme Observations output.
Note: To specify an output object, you need to know which output objects your SAS program
produces. The ODS TRACE statement writes to the SAS log a trace record that includes
the path, the label, and other information about each output object that your SAS
program produces. You can find documentation about the ODS TRACE and ODS
SELECT statements in the SAS Help Facility and in the online documentation.
a. Create a new program. Write a PROC UNIVARIATE step to examine Camp in the
pg1.eu_occ table.
b. Add the ODS TRACE statements before and after PROC UNIVARIATE as follows.
ods trace on;
proc univariate data=pg1.eu_occ;
var camp;
run;
ods trace off;
c. Submit the program and notice the trace information in the SAS log. Determine the name
of the Extreme Observations output object.
d. Delete the ODS TRACE statements. Add an ODS SELECT statement immediately before
the PROC UNIVARIATE step and provide the name of the Extreme Observations output
object.
Note: This method can be used with other procedures that create multiple tables (such as
PROC CONTENTS) to select a portion of the output.
e. Using the SAS documentation or the syntax Help in the editor, identify the option that
specifies the number of extreme observations that are listed in the table. Use the option
to change the number of extreme observations from five to 10. Submit the program.
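One way that steps d and e might come together is sketched below. It assumes that the trace record in your log reports the output object name as ExtremeObs and that the PROC UNIVARIATE option that you identify is NEXTROBS=. Confirm both against your own log and the documentation before relying on them.

```sas
/* Sketch: keep only the Extreme Observations table and list 10 rows */
/* at each end instead of the default five.                          */
ods select ExtremeObs;
proc univariate data=pg1.eu_occ nextrobs=10;
    var camp;
run;
```

The ODS SELECT statement applies to the step that follows it, so it is placed immediately before PROC UNIVARIATE.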
PROC procedure-name . . . ;
    WHERE expression;
RUN;
The WHERE statement filters rows in the results based on the expression. If expression is true,
include the row in the results.
What if you want to filter the rows that appear in a PROC PRINT report? Or what if you want to
calculate summary statistics for only a subset of the data based on a condition? You can use the
powerful and flexible WHERE statement to subset your data. The WHERE statement can be used in
PROC PRINT, MEANS, FREQ, UNIVARIATE, and many others.
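As an illustration that the same statement works outside PROC PRINT, the sketch below subsets sashelp.cars in PROC MEANS and PROC FREQ. The choice of columns and conditions is only an example.

```sas
/* Sketch: summary statistics for SUVs only. */
proc means data=sashelp.cars;
    where Type="SUV";
    var MSRP MPG_City MPG_Highway;
run;

/* Sketch: frequency tables for cars priced at or below 30000. */
proc freq data=sashelp.cars;
    where MSRP <= 30000;
    tables Origin Type;
run;
```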
= or EQ            < or LT
^= or ~= or NE     >= or GE
> or GT            <= or LE
Examples:
Type = "SUV"
Type EQ "SUV"
MSRP <= 30000
MSRP LE 30000
The WHERE statement consists of the keyword WHERE followed by one or more expressions. An
expression tests the value of a column against a condition. The expression evaluates as true or false
for each row.
Note: Either the symbol or letters can be used to represent these operators in an expression.
Character values are case sensitive and must be enclosed in double or single quotation marks.
Type = "SUV"
Type = 'Wagon'
Numeric values must be standard numeric (that is, no symbols).
MSRP <= 30000
Dates are stored as numeric values, so the expression is evaluated based on a numeric comparison.
If you want to compare a date column to a fixed date, then you can use the SAS date constant
notation. SAS turns the string date into the numeric equivalent in order to evaluate the expression.
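A SAS date constant in a WHERE statement might look like the following sketch. It uses the sashelp.stocks sample table, which is an assumption about what is installed with your SAS software. Any table with a date column works the same way.

```sas
/* Sketch: compare a date column to a fixed date with a date constant. */
/* "01jan2005"d is converted to the numeric SAS date value for         */
/* January 1, 2005 before the comparison is made.                      */
proc print data=sashelp.stocks;
    where Date >= "01jan2005"d;
    var Stock Date Close;
run;
```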
Combining Expressions
proc print data=sashelp.cars;
var Make Model Type MSRP MPG_City MPG_Highway;
where Type="SUV" and MSRP <= 30000;
run;
Expressions can be
combined with AND or OR.
The OR keyword can be used to provide multiple values, such as in this example. Notice that each
condition has to include TYPE=. This can be tedious if there are several valid values that must be
listed. A more efficient approach in this scenario is to use the IN operator to compare to a list of
values.
The IN operator works with both numeric and character values. Remember that character values are
case sensitive and must be enclosed in quotation marks. The keyword NOT can be used to reverse
the logic of the IN operator.
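The comparison between OR and IN can be sketched with sashelp.cars. The particular Type values in the list are chosen only for illustration.

```sas
/* Sketch: repeated OR conditions, each of which must include Type=. */
proc print data=sashelp.cars;
    where Type="SUV" or Type="Truck" or Type="Wagon";
    var Make Model Type MSRP;
run;

/* Sketch: the same subset written with the IN operator. */
proc print data=sashelp.cars;
    where Type in ("SUV","Truck","Wagon");
    var Make Model Type MSRP;
run;

/* Sketch: NOT reverses the logic and keeps every other Type value. */
proc print data=sashelp.cars;
    where Type not in ("SUV","Truck","Wagon");
    var Make Model Type MSRP;
run;
```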
Scenario
Use the WHERE statement and basic operators to subset rows in a procedure.
Files
• p103d02.sas
• storm_summary – a SAS table that contains one row per storm for the 1980 through 2016 storm
seasons
Syntax
WHERE expression;
Basic Operators:
=, EQ
^= , ~= , NE
> , GT
< , LT
>= , GE
<= , LE
IN(value1, …, valuen)
Notes
• The WHERE statement is used to filter rows. If the expression is true, rows are read.
If the expression is false, they are not.
• Character values are case sensitive and must be enclosed in quotation marks.
• Numeric values are not in quotation marks and must include only digits, decimal points,
and negative signs.
• Compound conditions can be created with AND or OR.
• The logic of an operator can be reversed with the NOT keyword.
• When an expression includes a fixed date value, use the SAS date constant syntax:
"ddmmmyyyy"d.
− dd represents a one- or two-digit day
− mmm represents a three-letter month in uppercase, lowercase, or mixed case
− yyyy represents a two- or four-digit year
Demo
1. Open p103d02.sas from the demos folder and find the Demo section of the program.
Write a PROC PRINT step to list the data in pg1.storm_summary.
2. Write a WHERE statement to include rows with MaxWindMPH values greater than or equal to
156 (Category 5 storms). Highlight the PROC PRINT step and run the selected code.
where MaxWindMPH >= 156;
3. Modify the WHERE statement for each of the conditions below. Highlight the PROC PRINT step
and run the selected code after each condition.
a. Basin equal to WP (West Pacific)
where Basin = "WP";
b. Basin equal to SI or NI (South Indian or North Indian)
where Basin in ("SI" "NI");
c. StartDate on or after January 1, 2010
where StartDate >= "01jan2010"d;
d. Type equal to TS (tropical storm) and Hem_EW equal to W (west)
where Type = "TS" and Hem_EW = "W";
e. MaxWindMPH greater than 156 or MinPressure less than 920
where MaxWindMPH > 156 or MinPressure < 920;
4. In the final WHERE statement, are missing values included for MinPressure? How can you
exclude missing values?
where MaxWindMPH>156 or 0<MinPressure<920;
IS NULL is another special operator that can be used with DBMS data. It distinguishes between null
and missing values. IS NULL and IS MISSING are the same when they are used with a SAS table.
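The missing-value operators can be sketched against pg1.storm_summary as shown below. A missing numeric value compares lower than any number, so a condition such as MinPressure < 920 on its own also selects rows where MinPressure is missing. The second step is one possible way to exclude them.

```sas
/* Sketch: list only the rows where MinPressure is missing. */
proc print data=pg1.storm_summary;
    where MinPressure is missing;
run;

/* Sketch: exclude missing MinPressure values from the comparison. */
proc print data=pg1.storm_summary;
    where MaxWindMPH > 156 or
          (MinPressure is not missing and MinPressure < 920);
run;
```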
3.02 Activity
Open p103a02.sas from the activities folder and perform the following tasks:
1. Uncomment each WHERE statement one at a time and run the step to
observe the rows that are included in the results.
2. Comment all previous WHERE statements. Add a new WHERE statement
to print storms that begin with Z. How many storms are included in the
results?
Suppose you have a program with multiple procedures, and you want to filter each where the value
of Type is Wagon. After you look at the results, you decide that you want similar reports where
Type=SUV and Type=Sedan. Find and replace is an option, but it would be preferable to change
that repeating value in one place.
A SAS macro variable stores text that is substituted in your code when it runs. It's like
an automatic find-and-replace.
The SAS macro language enables you to design dynamic programs that are easy to update
or modify. A macro variable enables you to store text and use it in a program.
The first step is to create the macro variable, and we do that with the %LET statement. All macro
statements begin with a % sign.
The next step is to use the macro variable in the program. In each place where Wagon is specified,
replace it with a reference to the CarType macro variable, which holds that value. To reference
a macro variable in a program, precede the name with an ampersand.
Note: It is recommended that you do not include quotation marks when you define the macro
variable value. Use quotation marks when necessary after the macro variable is resolved.
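The two steps described above can be sketched as follows. The CarType name and the Wagon value come from the text. The procedures and columns are chosen only for illustration.

```sas
/* Step 1: create the macro variable. No quotation marks around the value. */
%let CarType=Wagon;

/* Step 2: reference it with an ampersand wherever Wagon was specified. */
proc print data=sashelp.cars;
    where Type="&CarType";
    var Type Make Model MSRP;
run;

proc means data=sashelp.cars;
    where Type="&CarType";
    var MSRP MPG_Highway;
run;

proc freq data=sashelp.cars;
    where Type="&CarType";
    tables Origin Make;
run;
```

To produce the same reports for SUV or Sedan, only the %LET statement needs to change.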
The ampersand triggers SAS to look up the text string stored in the CarType macro variable and
replace it with Wagon before it executes the code.
Like library references, macro variables are temporary, so when your SAS session ends, they are deleted. If
macro variable references are included in a program, the macro variables must be created before
they are referenced.
Scenario
Modify a program to use SAS macro variables to filter data in multiple procedures.
Files
• p103d03.sas
• storm_summary – a SAS table that contains one row per storm for the 1980 through 2016 storm
seasons
Syntax
%LET macrovar=value;
WHERE numvar=&macrovar;
WHERE charvar="&macrovar";
WHERE datevar="&macrovar"d;
Notes
• A macro variable stores a text string that can be substituted into a SAS program.
• The %LET statement defines the macro variable name and assigns a value.
• Macro variable names must follow SAS naming rules.
• Macro variables can be referenced in a program by preceding the macro variable name with an &
(ampersand).
• If a macro variable reference is used inside quotation marks, double quotation marks must be
used.
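The last note can be checked with a short sketch. The %PUT statement writes text to the log, so you can see how each kind of quotation mark is handled. The BasinCode name and SP value are borrowed from this lesson's demo and activity.

```sas
%let BasinCode=SP;

/* The reference inside double quotation marks is resolved to SP. */
%put Double quotation marks: "&BasinCode";

/* The reference inside single quotation marks is left as typed. */
%put Single quotation marks: '&BasinCode';

/* In a WHERE statement, only the first form compares Basin to SP.    */
/*    where Basin="&BasinCode";   compares Basin to SP                */
/*    where Basin='&BasinCode';   compares Basin to the literal text  */
```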
Demo
1. Open p103d03.sas from the demos folder and find the Demo section of the program. Highlight
the demo program and run the selected code.
2. Write three %LET statements to create macro variables named WindSpeed, BasinCode, and
Date. Set the initial values of the variables to match the WHERE statement.
3. Modify the WHERE statement to reference the macro variables. Highlight the demo program and
run the selected code. Verify that the same results are produced.
%let WindSpeed=156;
%let BasinCode=NA;
%let Date=01JAN2000;
proc print data=pg1.storm_summary;
where MaxWindMPH>=&WindSpeed and Basin="&BasinCode" and
StartDate>="&Date"d;
var Basin Name StartDate EndDate MaxWindMPH;
run;
4. Change the values of the macro variables to values that you select. Possible values for Basin
include NA, WP, SP, NI, and SI. Highlight the demo program and run the selected code.
3.03 Activity
Open p103a03.sas from the activities folder and perform the following tasks:
1. Change the value in the %LET statement from NA to SP.
2. Run the program and carefully read the log.
Which procedure did not produce a report?
What is different about the WHERE statement in that step?
Practice
If you restarted your SAS session, open and submit the libname.sas program in the course files.
Level 1
4. Filtering Rows in a Listing Report Using Character Data
The pg1.np_summary table contains public use statistics from the National Park Service.
The park type codes are inconsistent for national preserves. Examine these inconsistencies
by producing a report that lists any national preserve.
a. Open p103p04.sas from the practices folder. Add a WHERE statement to print only the
rows where ParkName includes Preserve.
Note: ParkName contains character values. These values are case sensitive.
b. Submit the program and view the results. Which codes are used for preserves?
Note: If you use double quotation marks in the WHERE statement, you receive a warning
in the log. To eliminate the warning, use single quotation marks.
Level 2
6. Using Macro Variables to Subset Data in Procedures
a. Create a new program. Write a PROC FREQ step to analyze rows from pg1.np_species.
Include only rows where Species_ID starts with YOSE (Yosemite National Park) and
Category equals Mammal. Generate frequency tables for Abundance and
Conservation_Status.
b. Write a PROC PRINT step to list the same subset of rows from pg1.np_species. Include
Species_ID, Category, Scientific_Name, and Common_Names in the report. Run the
program.
c. Create a macro variable named ParkCode to store YOSE, and another macro variable
named SpeciesCat to store Mammal. Modify the code to reference the macro variables.
Run the program and confirm that the same results are generated.
Note: The macro variable values are case sensitive when they are used in a WHERE
statement.
d. Change the values of the macro variables to ZION (Zion National Park) and Bird. Run the
program.
Challenge
7. Eliminating Case Sensitivity in WHERE Conditions
Character comparisons in a WHERE statement are case sensitive. Use SAS functions to make
comparisons case insensitive.
a. Open pg1.np_traffic. Notice that the case of Location values is inconsistent.
b. Create a new program. Write a PROC PRINT step that lists ParkName, Location, and
Count. Print rows where Count is not equal to 0 and Location includes MAIN ENTRANCE.
Submit the program. Use the log to confirm that 38 rows are listed.
Note: If you use double quotation marks in the WHERE statement, you receive a warning
in the log. To eliminate the warning, use single quotation marks.
c. The UPCASE function can be used to eliminate case sensitivity in character WHERE
expressions. Use the UPCASE function on the Location column to include any case of
MAIN ENTRANCE. Run the program and verify that 40 rows are listed.
UPCASE(column)
Note: The UPCASE function in a WHERE statement does not permanently convert the
values of the column to uppercase.
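As an illustration of the UPCASE technique on a different table, the sketch below assumes the sashelp.class sample table, in which Name is stored in mixed case (for example, Alfred). It is only an example of the pattern, not the solution to the practice.

```sas
/* Without UPCASE, the uppercase constant does not match the stored value Alfred */
proc print data=sashelp.class;
   where Name='ALFRED';
run;

/* UPCASE converts each Name value for the comparison only, so Alfred matches */
proc print data=sashelp.class;
   where upcase(Name)='ALFRED';
   var Name Age;
run;
```

The second report still displays Alfred in mixed case, because the function affects only the comparison and not the stored value.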
3.3 Formatting Columns
Formats such as $w. and w.d change how values appear, which makes it easier to interpret them.
Sometimes when you are exploring data, it can be difficult to interpret the raw values in the data. For
example, it is impossible to visually evaluate SAS date values such as HireDate in their raw form.
Therefore, in your report, you might want to display the value in a date format that is easy to
understand. Numeric columns such as Salary store only digits and decimal points, but you might
want to display those numbers with commas or currency symbols to make them easier to interpret
quickly.
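The difference between a stored value and its formatted display can be seen in the log. This is a minimal sketch with made-up values for Salary and HireDate; it uses a DATA _NULL_ step and the PUT statement, which are not covered in this part of the course, purely as a convenient way to write values to the log.

```sas
data _null_;
   Salary=52350.5;
   HireDate='15MAR2018'd;   /* stored as the number of days since 01JAN1960 */

   /* Raw values, as they are stored */
   put Salary= HireDate=;

   /* The same values, displayed with the DOLLAR and DATE formats */
   put Salary= dollar12.2 HireDate= date9.;
run;
```

The first log line shows the raw numbers, and the second shows the same numbers with a currency symbol, commas, and a readable date.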
Note: International formats just add the symbol to the values. The formats do not convert values
from one currency to another.
3.04 Activity
1. Go to support.sas.com/documentation. Click Programming: SAS 9.4 and
Viya.
2. In the Syntax - Quick Links section, under Language Elements, select
Formats.
3. What does the Zw.d format do?
In this example, PROC PRINT lists the class_birthdate table. You can format several columns in a
single FORMAT statement, using either the same format or different formats. Here the Height and
Weight columns are formatted with 3., which displays each value rounded to the nearest whole
number, and Birthdate is formatted with the DATE9. format. These formats affect the way that the
values are displayed in the procedure results, but they do not change the raw data values themselves.
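A step matching this description might look like the following. This is a sketch that assumes the pg1 library is assigned and that the table is stored as pg1.class_birthdate with columns named Height, Weight, and Birthdate, as the text suggests; the original slide code is not reproduced here.

```sas
proc print data=pg1.class_birthdate;
   /* One format (3.) is shared by two columns; DATE9. applies to Birthdate */
   format Height Weight 3. Birthdate date9.;
run;
```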
Scenario
Use the FORMAT statement in a procedure to display data values as dates and currency.
Files
• p103d04.sas
• storm_damage – a SAS table that contains a description and damage estimates for storms in the
US with damages greater than one billion dollars
Syntax
<$>format-name<w>.<d>
Notes
• Formats are used to change how values are displayed in data and reports.
• Formats do not change the underlying data values.
• Formats can be applied in a procedure using the FORMAT statement.
• Visit SAS Language Elements documentation to access a list of available SAS formats.
Demo
1. Open p103d04.sas from the demos folder and find the Demo section of the program. Write
a PROC PRINT step to list the data in pg1.storm_damage. Highlight the step and run the
selected code.
2. Add a FORMAT statement to apply the MMDDYY10. format to Date and DOLLAR16. to Cost.
Highlight the step and run the selected code.
proc print data=pg1.storm_damage;
format Date mmddyy10. Cost dollar16.;
run;
3. Change the width of MMDDYY to 8 and DOLLAR to 14. Highlight the step and run the selected
code. Change MMDDYY to 6 and DOLLAR to 10. Highlight the step and run the selected code
again. What happens to the formatted values?
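To experiment with format widths outside the demo program, you can write a single pair of values to the log with several widths. This is a minimal sketch with made-up values for Date and Cost; the DATA _NULL_ step and the PUT statement are assumptions used only to display the values.

```sas
data _null_;
   Date='29AUG2005'd;
   Cost=1250000000;

   /* Same values, three different widths for each format */
   put Date= mmddyy10. Cost= dollar16.;
   put Date= mmddyy8.  Cost= dollar14.;
   put Date= mmddyy6.  Cost= dollar10.;
run;
```

Compare the three log lines to see how each format adjusts the display as the width becomes too small for the full value.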
3.05 Activity
Open p103a05.sas from the activities folder and perform the following tasks:
1. Highlight the PROC PRINT step and run the selected code. Notice how
the values of Lat, Lon, StartDate, and EndDate are displayed in the report.
2. Change the width of the DATE format to 7 and run the PROC PRINT step.
How does the display of StartDate and EndDate change?
3. Change the width of the DATE format to 11 and run the PROC PRINT
step. How does the display of StartDate and EndDate change?
4. Highlight the PROC FREQ step and run the selected code. Notice that the
report includes the number of storms for each StartDate.
5. Add a FORMAT statement to apply the MONNAME. format to StartDate
and run the PROC FREQ step. How many rows are in the report?
3.4 Sorting Data and Removing Duplicates
Sorting Data
Sorting data can be a helpful or necessary step in exploring your data. You might want to sort on
groups or measures so that you can visually examine the high or low values. You might use sorting
as a way to identify and remove duplicate rows. Also, sorting might be required for certain data
processing steps.
Sorting Data
PROC SORT DATA=input-table <OUT=output-table>;
BY <DESCENDING> col-name(s);
RUN;
So how does PROC SORT work? First, SAS rearranges the rows in the input table. Then SAS
creates a table that contains the rearranged rows either by replacing the original table or by creating
a new table. By default, SAS replaces the original SAS table unless the OUT= option specifies an
output table. Keep in mind that PROC SORT does not generate printed output, so you have to open
or print the sorted table if you want to look at it.
Similar to PROC PRINT and other procedures, use the DATA= option to specify the input table. Next,
use the OUT= option in the PROC SORT statement to prevent permanently sorting the input table. If
you do not include the OUT= option, PROC SORT changes the sort order of the input table.
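The two points above (OUT= protects the input table, and PROC SORT produces no report) can be sketched as follows. The example assumes the sashelp.class sample table; the output table name work.class_sorted is hypothetical.

```sas
/* OUT= writes the sorted rows to a new table, so the input table keeps its order */
proc sort data=sashelp.class out=work.class_sorted;
   by descending Height;
run;

/* PROC SORT does not generate printed output, so print the sorted table to view it */
proc print data=work.class_sorted;
run;
```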
Sorting Data
proc sort data=pg1.class_test2 out=test_sort;
by Name;
run;
The rows are sorted in ascending order by Name.
Sorting Data
proc sort data=pg1.class_test2 out=test_sort;
by Name TestScore;
run;
The rows are sorted in ascending order by Name and then, within Name, by ascending TestScore.
Sorting Data
proc sort data=pg1.class_test2 out=test_sort;
by Subject descending TestScore;
run;
The rows are sorted in ascending order by Subject and then, within Subject, by descending TestScore.
3.06 Activity
Open p103a06.sas from the activities folder and perform the following tasks:
1. Modify the OUT= option in the PROC SORT statement to create
a temporary table named storm_sort.
2. Complete the WHERE and BY statements to answer the following
question: Which storm in the North Atlantic basin (NA or na) had
the strongest MaxWindMPH?
NODUPKEY keeps only the first occurrence of each unique value of the BY variable. This removes
duplicate values of the column listed in the BY statement.
Scenario
Use the NODUPKEY option in PROC SORT to identify and remove duplicates.
Files
• p103d05.sas
• storm_detail – a SAS table that contains multiple rows per storm for the 2000 through 2016 storm
seasons. Each row represents one measurement for each six hours of a storm.
Syntax
Remove duplicate rows:
Notes
• The NODUPKEY option keeps only the first row for each unique value of the column or columns
listed in the BY statement.
• Using _ALL_ in the BY statement sorts by all columns and ensures that duplicate rows are
adjacent in the sorted table and are removed.
• The DUPOUT= option creates an output table that contains the duplicate rows that were removed.
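The notes above can be tried on a small table. The following is a minimal sketch with made-up data; the table names work.scores, work.scores_clean, work.scores_dups, and work.one_per_name are hypothetical.

```sas
/* Four rows: the first two are exact duplicates */
data work.scores;
   input Name $ Subject $ Score;
   datalines;
Ann Math 90
Ann Math 90
Ann Reading 85
Bob Math 70
;
run;

/* BY _ALL_ compares entire rows: 3 rows are kept and 1 duplicate goes to scores_dups */
proc sort data=work.scores out=work.scores_clean
          nodupkey dupout=work.scores_dups;
   by _all_;
run;

/* BY Name compares only Name: the first row for each name is kept (2 rows) */
proc sort data=work.scores out=work.one_per_name nodupkey;
   by Name;
run;
```

The log notes for each step report how many rows had duplicate key values and were deleted.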
Demo
1. Open p103d05.sas from the demos folder and find the Demo section of the program. Modify the
first PROC SORT step to sort by all columns and remove any duplicate rows. Write the removed
rows to a table named storm_dups. Highlight the step and run the selected code. Confirm that
there are 50,757 rows in storm_clean and 7 rows in storm_dups.
proc sort data=pg1.storm_detail out=storm_clean
nodupkey dupout=storm_dups;
by _all_;
run;
2. The second PROC SORT step is filtering for nonmissing values of Name and Pressure and then
sorting by descending Season, Basin, Name, and Pressure. Run the second PROC SORT step
and confirm that the first row for each storm represents the minimum value of Pressure.
Note: Because storm names can be reused in multiple years and basins, unique storms are
grouped by sorting by Season, Basin, and Name.
3. Modify the third PROC SORT step to sort the min_pressure table from the previous PROC
SORT step, and keep the first row for each storm. You do not need to keep the removed
duplicates. Highlight the step and run the selected code.
proc sort data=min_pressure nodupkey;
by descending Season Basin Name;
run;
Links
• Visit the SAS 9.4 Procedures Help page.
• Browse or ask questions in the SAS Procedures community and see responses from other SAS
programmers.
• Take the SAS Macro 1 course.
• Read the SAS Macro Programming Made Easy book.
• Learn about PROC FORMAT in SAS Help.
• Take the SAS Programming 2 course.
Practice
If you restarted your SAS session, open and submit the libname.sas program in the course files.
Level 1
8. Sorting Data and Creating an Output Table
Create the np_sort table that contains data for national parks ordered by regional code
and decreasing numbers of daily visitors.
a. Open p103p08.sas from the practices folder. Modify the PROC SORT step to read
pg1.np_summary and create a temporary sorted table named np_sort.
b. Add a BY statement to order the data by Reg and descending DayVisits.
c. Add a WHERE statement to select Type equal to NP. Submit the program.
Level 2
9. Sorting Data to Remove Duplicate Rows
The pg1.np_largeparks table contains gross acreage for large national parks. There are
duplicate rows for some locations.
a. Open and review the pg1.np_largeparks table. Notice that there are exact duplicate rows
for some parks.
b. Create a new program. Write a PROC SORT step that creates two tables (park_clean and
park_dups), and removes the duplicate rows. Submit the program.
Challenge
10. Creating a Lookup Table from a Detailed Table
The pg1.eu_occ table includes multiple rows for each country code and country name.
Create a lookup table that includes a single row for each country code and name.
a. Create a new program. Write a PROC SORT step to sort pg1.eu_occ and create an output
table named countrylist. Remove duplicate key values. Sort by Geo and then Country.
b. To read only Geo and Country from the pg1.eu_occ table, you can use the KEEP= data set
option. Add the KEEP= option immediately after the input table and list Geo and Country.
data-set (KEEP=varlist)
c. Run the program and verify that only one row per country is included.
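The KEEP= data set option pattern can be sketched on a different table. This example assumes the sashelp.class sample table, and the output table name work.sex_age is hypothetical; it illustrates the technique rather than the solution to this practice.

```sas
/* KEEP= is placed in parentheses immediately after the input table name,
   so only Sex and Age are read into the sort */
proc sort data=sashelp.class(keep=Sex Age) out=work.sex_age nodupkey;
   by Sex Age;
run;
```

The output table contains one row for each unique combination of Sex and Age, and no other columns.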
3.5 Solutions
Solutions to Practices
1. Exploring Data with Procedures
/*Parts A and B*/
/*list first 20 rows*/
proc print data=pg1.np_summary(obs=20);
var Reg Type ParkName DayVisits TentCampers RVCampers;
run;
/*Part C*/
/*calculate summary statistics*/
proc means data=pg1.np_summary;
var DayVisits TentCampers RVCampers;
run;
/*Part D*/
/*examine extreme values*/
proc univariate data=pg1.np_summary;
var DayVisits TentCampers RVCampers;
run;
/*Part E*/
/*list unique values and frequency counts*/
proc freq data=pg1.np_summary;
tables Reg Type;
run;
b. Do you observe any possible inconsistencies in the data?
Yes. The Type column has inconsistencies. Notice that national preserve locations
have the code PRES and PRESERVE.
c. What is the minimum value for tent campers? Is that value unexpected?
The minimum value is zero. No, because it is possible that a park had zero tent
campers.
d. Are there negative values for any of the columns?
No
e. Are there any lowercase codes? Are there any codes that occur only once in the table?
There are no lowercase codes. NC, NPRE, and RIVERWAYS occur once in the table.
*Part B;
proc univariate data=pg1.np_summary;
var Acres;
run;
a. What invalid values exist for Reg? None
What invalid values exist for Type? NPRE, PRESERVE, RIVERWAYS
c. What are the smallest and largest parks? Observation 78 (African Burial Ground
Monument, .35 acres) and observation 6 (Noatak National Preserve, 6,587,071.39
acres)
3. Generating Extreme Observations Output
*Part A and B;
ods trace on;
proc univariate data=pg1.eu_occ;
var camp;
run;
ods trace off;
*Part D and E;
ods select extremeobs;
proc univariate data=pg1.eu_occ nextrobs=10;
var camp;
run;
4. Filtering Rows in a Listing Report Using Character Data
proc print data=pg1.np_summary;
var Type ParkName;
where ParkName like '%Preserve%';
run;
5. Creating a Listing Report for Missing Data
*Part A;
proc print data=pg1.eu_occ;
where Hotel is missing and ShortStay is missing and
Camp is missing;
run;
*Part B;
proc print data=pg1.eu_occ;
where Hotel > 40000000;
run;
a. How many rows are included? 101
b. Which months are included in the report? The months are July or August.
6. Using Macro Variables to Subset Data in Procedures
%let ParkCode=ZION;
%let SpeciesCat=Bird;
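The two %LET statements above can be combined with the steps described in practice 6 roughly as follows. This is a sketch reconstructed from the practice instructions, not necessarily the published solution; it assumes that the pg1 library is assigned and that the column names match those listed in the practice.

```sas
%let ParkCode=ZION;
%let SpeciesCat=Bird;

/* Frequency tables for the selected park and category */
proc freq data=pg1.np_species;
   tables Abundance Conservation_Status;
   where Species_ID like "&ParkCode%" and Category="&SpeciesCat";
run;

/* Listing of the same subset of rows */
proc print data=pg1.np_species;
   var Species_ID Category Scientific_Name Common_Names;
   where Species_ID like "&ParkCode%" and Category="&SpeciesCat";
run;
```

Double quotation marks are required around both values so that the macro variable references resolve.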
a. BY
b. ID
c. SUM
d. VAR
NOTE: There were 24 observations read from the data set PG1.STORM_SUMMARY.
WHERE name like 'Z%';
With the Z8. format, the value 1350 is displayed as 00001350.
3.05 Activity – Correct Answer
2. Change the width of the DATE format to 7 and run the PROC PRINT step.
How does the display of StartDate and EndDate change?
3. Change the width of the DATE format to 11 and run the PROC PRINT
step. How does the display of StartDate and EndDate change?
The new report has 12 rows.
Formats are an easy way to group data in procedures!
Lesson 4 Preparing Data
4.1 Reading and Filtering Data
Practice
4.1 Reading and Filtering Data
The SAS programming process: Access data, Explore data, Prepare data, Analyze and report on data, Export results.
After you explore your data, you likely want to make some adjustments based on what you find and
what you need. In this lesson, you learn various ways to subset data, and you use expressions and
functions to compute new columns. You also learn how to use conditional processing to obtain the
results that you want in your output data.
DATA Step
The DATA step is a robust, yet simple programming tool that can do everything from simple querying
to providing structure to messy weblogs. In this class, you become familiar with the most common
data manipulation actions, such as filtering rows and columns, computing new columns, and
performing conditional processing. Beyond these features, the DATA step also enables you to merge
or join tables, read complex raw data, and perform repetitive processing with DO loops or arrays.
These topics and many others are covered in SAS Programming 2: Data Manipulation Techniques
and other advanced programming courses.
DATA output-table;
SET input-table;
RUN;
The DATA statement specifies the table to create. The SET statement specifies the table to read.
When you work with data, you want to preserve your existing data and create a copy that you can
work on, so let’s start with a simple DATA step that does just that.
• The DATA statement names the table that you want to create, or the output table. This can be a
temporary table if you use the Work library or a permanent table if you use any other library. Be
aware that if the table you list in the DATA statement exists and you have Write access to it, the
DATA step overwrites that table.
• The SET statement names the existing table that you are reading from, or the input table. When you
reference a data source as libref.table, SAS uses the previous LIBNAME statement to determine
where to find the data source and how to read it.
• The DATA step ends with a RUN statement.
What happens behind the scenes when a DATA step runs?
Compilation
• Check syntax for errors.
• Identify column attributes.
• Establish new table metadata.
Execution
• Read and write data.
• Perform data manipulations, calculations, and so on.
How does the DATA step work behind the scenes? In this course, you need to have only a high-level
understanding of the process. The DATA step has two phases: compilation and execution. In the
compilation phase, SAS checks for syntax errors in the program and establishes the table metadata,
such as column name, type, and length. In the execution phase, the data is read, processed, and
written one row at a time.
data myclass;
set sashelp.class;
...other statements...
run;
Execution
1) Read a row from the input table.
2) Sequentially process statements.
3) At the end, write the row to the output table.
4) Loop back to the top of the DATA step to read the next row from the input table.
Automatic looping makes processing data easy!
DATA step execution is like an automatic loop. The first time through the DATA step, the SET
statement reads the first row from the input table and then processes any other statements in
sequence, manipulating the values within that row. When SAS reaches the RUN statement, there is
an implied OUTPUT action, and the new row is written to the output table. The DATA step then
automatically loops back to the top and executes the statements in order again, this time reading,
manipulating, and outputting the second row. That implicit loop continues until all rows are read from
the input table.
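The implicit loop can be made visible in the log. The following is a minimal sketch that assumes the sashelp.class sample table; it uses the PUTLOG statement and the automatic variable _N_, neither of which is introduced at this point in the course.

```sas
data work.myclass;
   set sashelp.class;
   /* _N_ counts the iterations of the DATA step; one line is written per row read */
   putlog 'Iteration ' _N_ Name=;
run;
```

Each log line shows the iteration number and the Name value that was read during that pass through the DATA step.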
As you learn more about the DATA step, it is helpful to have a deep understanding of this behind-the-
scenes processing. The SAS Programming 2: Data Manipulation Techniques course addresses
more complex DATA step code and covers the details of the compile and execute phases.
4.01 Activity
Open p104a01.sas from the activities folder and perform the following tasks:
1. Complete the DATA step to create a temporary table named storm_new
and read pg1.storm_summary. Run the program and read the log.
2. Define a library named out pointing to the output folder in the main
course files folder.
3. Change the program to save a permanent version of storm_new
in the out library. Run the modified program.
LIBNAME libref "path";
DATA output-table;
SET input-table;
RUN;
Keep this program open for the next activity.
DATA output-table;
SET input-table;
WHERE expression;
RUN;
The WHERE statement filters rows based on the expression. The DATA step reads rows only from the
input table where the expression is true.
Program: p104d01
The same WHERE syntax that works in a procedure to subset data for a report or analysis works in
the DATA step to filter rows. Only those rows from the input table that meet the criteria in the
WHERE statement are processed by the DATA step and written to the output table.
data myclass;
set sashelp.class;
where age >= 15;
run;
To specify the columns to include in the output data, use either the DROP statement or the KEEP
statement followed by the column names from the input table to drop or keep.
data myclass;
set sashelp.class;
keep name age height;
/* or, equivalently: drop sex weight; */
run;
The KEEP statement and the DROP statement shown here have the same result in the output table.
4.03 Activity
Modify the program that you opened in the previous activity or open
p104a03.sas from the activities folder and perform the following tasks:
1. Change the name of the output table to storm_cat5.
2. Include only Category 5 storms (MaxWindMPH greater than or equal
to 156) with StartDate on or after 01JAN2000.
3. Add a statement to include the following columns in the output data:
Season, Basin, Name, Type, and MaxWindMPH. How many Category 5
storms occurred since January 1, 2000?
DATA output-table;
SET input-table;
FORMAT col-name format;
RUN;
col-name is the name of the column that you want to format, and format is the name of the format
that you want to apply. Formats in the DATA step are permanently assigned to the columns.
The FORMAT statement is used in procedures to change how data values are displayed in a report
or analysis.
We can use the same FORMAT statement in the DATA step, but the impact is a little different. A
FORMAT statement in the DATA step permanently assigns a format to a column in the properties of
the new table. The raw data values are still stored in the table, but anytime you view the data or use
it in procedures, the formats are automatically applied.
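The effect of a FORMAT statement in the DATA step can be sketched as follows. The example assumes the sashelp.class sample table; the format choices and the table name work.myclass are illustrative.

```sas
data work.myclass;
   set sashelp.class;
   /* The formats are stored in the properties of work.myclass */
   format Height 4.1 Weight 3.;
run;

/* No FORMAT statement is needed here: the stored formats are applied automatically */
proc print data=work.myclass;
run;

/* The variable list in the PROC CONTENTS report shows the format assigned to each column */
proc contents data=work.myclass;
run;
```

The values of Height and Weight in sashelp.class are unchanged, and the stored values in work.myclass are unchanged as well. Only the display is affected.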
Practice
If you restarted your SAS session, open and submit the libname.sas program in the course files.
Level 1
1. Creating a SAS Table
The pg1.eu_occ SAS table contains monthly occupancy rates for European countries from
January 2004 through September 2017.
a. Open the pg1.eu_occ table and examine the column names and values.
b. Open p104p01.sas from the practices folder. Modify the code to create a temporary table
named eu_occ2016 and read pg1.eu_occ.
c. Complete the WHERE statement to select only the stays that were reported in 2016. Notice
that YearMon is a character column and the first four positions represent the year.
d. Complete the FORMAT statement in the DATA step to apply the COMMA17. format to the
Hotel, ShortStay, and Camp columns.
e. Complete the DROP statement to exclude Geo from the output table.
Level 2
2. Creating a Permanent SAS Table
The np_species table includes one row for each species that is found in each national park.
a. Create a new program. Write a DATA step to read the pg1.np_species table and create a
new permanent table named fox. Write the new table to the output folder.
b. Include only the rows where Category is Mammal and Common_Names includes Fox.
c. Exclude the Category, Record_Status, Occurrence, and Nativeness columns. Run the
program.
d. Notice that Fox Squirrels are included in the output table. Add a condition in the WHERE
statement to exclude rows that include Squirrel.
Challenge
3. Creating a SAS Table Using Macro Variables
The np_species table includes one row for each species that is found in each national park.
a. Write a new program that creates a temporary table named Mammal that includes only the
mammals from the pg1.np_species table. Do not include Abundance, Seasonality, or
Conservation_Status in the output table.
b. Use PROC FREQ to determine how many species there are for each unique value of
Record_Status.
c. Modify the program to use a macro variable to change Mammal to other values of Category.
Change the macro variable value to Bird and run the program.
Note: Use PROC FREQ to determine the unique values of Category.
4.2 Computing New Columns
DATA output-table;
SET input-table;
new-column = expression;
RUN;
The statement new-column = expression; is an assignment statement. The expression can be an
arithmetic expression or a constant. The assignment statement can create or update a column.
Often your data does not have all the columns that you need, and you might want to calculate or
derive new columns from existing columns. Fortunately, this is easy to do in the DATA step. To create
new columns, you use an assignment statement. You simply type the name of the new column, an
equal sign, and then the expression that creates a new data value.
4.2 Computing New Columns 4-15
In this example, the WHERE statement includes rows where Origin is not equal to USA. The first
assignment statement creates the new column Profit using a simple arithmetic expression. SAS
creates the numeric column Profit and generates a value for every row in the output table by
subtracting Invoice from MSRP. The second assignment statement creates a column named
Source and assigns the character string Non-US Cars. Notice that because there is a KEEP
statement, you must explicitly list the new columns so that they are included in the cars_new table.
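The example described above could be sketched roughly as follows. This is a reconstruction from the description, not necessarily the exact p104d02 program; in particular, the list of columns in the KEEP statement is an assumption.

```sas
data cars_new;
    set sashelp.cars;
    /* Include only rows where Origin is not equal to USA */
    where Origin ne "USA";
    /* New numeric column created from an arithmetic expression */
    Profit=MSRP-Invoice;
    /* New character column created from a constant */
    Source="Non-US Cars";
    /* Assumed KEEP list: the new columns must be listed to appear in cars_new */
    keep Make Model MSRP Invoice Profit Source;
run;
```

If Profit and Source were left out of the KEEP statement, they would still be calculated for each row, but they would not be written to cars_new.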
Files
• p104d02.sas
• storm_summary – a SAS table that contains one row per storm for the 1980 through 2016 storm
seasons
Syntax
DATA output-table;
SET input-table;
new-column = expression;
RUN;
Notes
• The name of the column to be created or updated is listed on the left side of the equal sign.
• Provide an expression on the right side of the equal sign.
• SAS automatically defines the required attributes (name, type, and length) if the column is new.
• A new numeric column has a length of 8.
• The length of a new character column is determined based on the length of the assigned string.
• Character strings must be enclosed in quotation marks and are case sensitive.
Demo
1. Open p104d02.sas from the demos folder and find the Demo section of the program. Add
an assignment statement to create a numeric column named MaxWindKM by multiplying
MaxWindMPH by 1.60934.
2. Add a FORMAT statement to round MaxWindKM to the nearest whole number.
3. Add an assignment statement to create a new character column named StormType that is equal
to Tropical Storm. Highlight the DATA step and run the selected code.
data tropical_storm;
set pg1.storm_summary;
drop Hem_EW Hem_NS Lat Lon;
where Type="TS";
*Add assignment and FORMAT statements;
MaxWindKM=MaxWindMPH*1.60934;
format MaxWindKM 3.;
StormType="Tropical Storm";
run;
4.04 Activity
Open p104a04.sas from the activities folder and perform the following tasks:
1. Add an assignment statement to create StormLength that represents
the number of days between StartDate and EndDate.
2. Run the program. In 1980, how long did the storm named Agatha last?
data storm_length;
set pg1.storm_summary;
drop Hem_EW Hem_NS Lat Lon;
*Add assignment statement;
run;
Functions
A function is a routine that returns a value.
Arithmetic calculations and character constants are a good start for creating new columns, but often
you need more elaborate or flexible methods for generating the new data values. SAS offers
hundreds of functions that can be used in countless ways to manipulate numeric, character, and
date values.
The syntax for a function is the function name, followed by the arguments enclosed in parentheses.
The arguments are separated by commas. The arguments consist of the input that the function
needs to perform its specific routine and return a value.
Functions
DATA output-table;
SET input-table;
new-column=function(arguments);
RUN;
Numeric Functions
Functions
SUM (num1, num2, ...)
SAS has a collection of summary statistics functions, including SUM, MEAN, MEDIAN, and RANGE.
Each of these functions can have an unlimited number of arguments, and each argument provides
either a numeric constant or numeric column in the data. The function calculates the summary
statistic from the values of the arguments for each row in the data. One interesting note about these
summary functions is that if any of the input values are missing, the missing value or values are
ignored, and the calculation is based on the known values.
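As a small illustration of how missing values are handled, the following sketch compares the SUM and MEAN functions with the plus operator. The column names a, b, c, Total_Func, Total_Plus, and Avg are made up for this example, and the comparison with the plus operator is an addition that goes beyond the text above.

```sas
data _null_;
    a=10;
    b=.;     /* a missing numeric value */
    c=5;
    /* The SUM function ignores the missing value: 10+5=15 */
    Total_Func=sum(a, b, c);
    /* The plus operator does not ignore it: the result is missing */
    Total_Plus=a+b+c;
    /* MEAN is based on the two known values: (10+5)/2=7.5 */
    Avg=mean(a, b, c);
    put Total_Func= Total_Plus= Avg=;
run;
```

The PUT statement writes the three values to the log, so you can compare the function results with the result of the arithmetic expression.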
Numeric Functions
data cars_new;
set sashelp.cars;
MPG_Mean=mean(MPG_City, MPG_Highway);
format MPG_Mean 4.1;
keep Make Model MPG_City MPG_Highway MPG_Mean;
run;
In this example code, an assignment statement creates a column named MPG_Mean. The MEAN
function is used with the arguments MPG_City and MPG_Highway to supply values for
MPG_Mean. Notice that the FORMAT statement rounds the displayed values of MPG_Mean to one
decimal place.
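Keep in mind that a format changes only how a value is displayed; the stored value keeps its full precision. The sketch below uses made-up input values and the ROUND function, which is not covered in this section, to contrast a displayed value with a stored one.

```sas
data _null_;
    /* The mean of 26 and 33.25 is stored as 29.625 */
    MPG_Mean=mean(26, 33.25);
    /* ROUND changes the stored value itself (nearest 0.1) */
    Rounded=round(MPG_Mean, 0.1);
    put "Displayed with 4.1: " MPG_Mean 4.1;
    put "Stored value:       " MPG_Mean;
    put "ROUND function:     " Rounded;
run;
```

If later calculations should use the rounded number, a function such as ROUND is the appropriate tool. If only the report should look tidy, a format is enough.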
4.05 Activity
Open p104a05.sas from the activities folder and perform the following tasks:
1. Open the pg1.storm_range table and examine the columns. Notice that
each storm has four wind speed measurements.
2. Create a new column named WindAvg that is the mean of Wind1, Wind2,
Wind3, and Wind4.
3. Create a new column WindRange that is the range of Wind1, Wind2,
Wind3, and Wind4.
data storm_windavg;
set pg1.storm_range;
*Add assignment statements;
run;
Character Functions
Note: The default delimiters for the PROPCASE function are a blank, forward slash, hyphen, open
parenthesis, period, and tab. To use a different list of delimiters, specify a list of characters in
a single set of quotation marks as the second argument in the function.
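To see the effect of the optional second argument, consider this sketch. The sample name and the column names Default and Custom are assumptions made for illustration.

```sas
data _null_;
    Name="MARY-ANN o'BRIEN";
    /* Default delimiters: the hyphen starts a new word, the apostrophe does not */
    Default=propcase(Name);
    /* Custom delimiters: only a blank and an apostrophe start a new word */
    Custom=propcase(Name, " '");
    put Default= Custom=;
run;
```

With the default list, the letter after the hyphen is capitalized but the letter after the apostrophe is not. With the custom list, the reverse happens, because the hyphen is no longer treated as a delimiter.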
Character Functions
data cars_new;
set sashelp.cars;
Type=upcase(Type);   /* Type is an existing column. */
keep Make Model Type;
run;
As a simple example, let’s look at the UPCASE function. It requires one argument: a character
column. The UPCASE function returns the uppercase equivalent of the input data values. In this
case, we are not creating a new column in the output data. We are converting the values in the Type
column to uppercase in the cars_new data.
Files
• p104d03.sas
• storm_summary – a SAS table that contains one row per storm for the 1980 through 2016 storm
seasons
Syntax
UPCASE(char)
PROPCASE(char, <delimiters>)
CATS(char1, char2, ...)
SUBSTR(char, position, <length>)
Notes
• The UPCASE function converts character values to uppercase.
• The PROPCASE function changes the first letter of each word to uppercase and other letters
to lowercase.
• The CATS function concatenates character values and removes any leading or trailing blanks.
• The SUBSTR function extracts a string from a character value.
Demo
1. Open p104d03.sas from the demos folder and find the Demo section of the program.
Add an assignment statement to convert Basin to all uppercase letters using the UPCASE
function.
2. Add an assignment statement to convert Name to proper case using the PROPCASE function.
3. Add an assignment statement to create Hemisphere, which concatenates Hem_NS and
Hem_EW using the CATS function.
4. Add an assignment statement to create Ocean, which extracts the second letter of Basin using
the SUBSTR function. Highlight the DATA step and run the selected code.
data storm_new;
set pg1.storm_summary;
drop Type Hem_EW Hem_NS MinPressure Lat Lon;
*Add assignment statements;
Basin=upcase(Basin);
Name=propcase(Name);
Hemisphere=cats(Hem_NS, Hem_EW);
Ocean=substr(Basin,2,1);
run;
4.06 Activity
Open p104a06.sas from the activities folder and perform the following tasks:
1. Add a WHERE statement that uses the SUBSTR function to include rows
where the second letter of Basin is P (Pacific ocean storms).
2. Run the program and view the log and data. How many storms were in
the Pacific basin?
data pacific;
set pg1.storm_summary;
drop Type Hem_EW Hem_NS MinPressure Lat Lon;
*Add a WHERE statement that uses the SUBSTR function;
run;
Date Functions
Note: The optional third argument in the YRDIF function is called the basis. The basis value
describes how SAS calculates a date difference or a person’s age. When calculating the age
of a person or event, 'AGE' should be used as the basis. Visit the SAS documentation for the
YRDIF function to learn about other values for the basis.
Files
• p104d04.sas
• storm_damage – a SAS table that contains a description and damage estimates for storms in the
US with damages greater than one billion dollars
Syntax
YEAR(SAS-date)
MONTH(SAS-date)
DAY(SAS-date)
WEEKDAY(SAS-date)
TODAY()
MDY(month, day, year)
YRDIF(startdate, enddate, 'AGE')
Notes
• The YEAR, MONTH, DAY, and WEEKDAY functions return a numeric value. For WEEKDAY, 1
represents Sunday.
• The TODAY function returns the current date based on the system clock as a SAS date value.
• The MDY function creates a SAS date based on numeric month, day, and year values.
• The YRDIF function calculates a precise age between two dates. There are various values for the
third argument. However, 'AGE' should be used for accuracy.
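The functions listed above can be tried on a single date constant. The following is a minimal sketch; the dates and column names are chosen only for illustration.

```sas
data _null_;
    d='15mar2020'd;
    y=year(d);          /* 2020 */
    m=month(d);         /* 3 */
    dd=day(d);          /* 15 */
    wd=weekday(d);      /* 1, because March 15, 2020, was a Sunday */
    /* MDY rebuilds the same SAS date from its numeric parts */
    rebuilt=mdy(m, dd, y);
    /* Exactly 20 years lie between the two dates */
    age=yrdif('15mar2000'd, d, 'age');
    put y= m= dd= wd= rebuilt= date9. age=;
run;
```

The DATE9. format in the PUT statement displays rebuilt as a readable date instead of the underlying numeric SAS date value.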
Demo
1. Open p104d04.sas from the demos folder and find the Demo section of the program. Create
the column YearsPassed and use the YRDIF function. The difference in years should be based
on each Date value and today’s date.
2. Create Anniversary as the day and month of each storm in the current year.
3. Format YearsPassed to round the value to one decimal place, and Date and Anniversary as
MM/DD/YYYY. Highlight the DATA step and run the selected code.
data storm_damage2;
set pg1.storm_damage;
drop Summary;
*Add assignment and FORMAT statements;
YearsPassed=yrdif(Date,today(),'age');
Anniversary=mdy(month(Date),day(Date),year(today()));
format YearsPassed 4.1 Date Anniversary mmddyy10.;
run;
Note: Values for YearsPassed and Anniversary will be different based on the current date.
Practice
If you restarted your SAS session, open and submit the libname.sas program in the course files.
Level 1
4. Creating New Columns
Create a new table named np_summary_update from pg1.np_summary. Create two new
columns: SqMiles and Camping.
a. Open p104p04.sas from the practices folder. Create a new column named SqMiles by
multiplying Acres by .0015625.
b. Create a new column named Camping as the sum of OtherCamping, TentCampers,
RVCampers, and BackcountryCampers.
c. Format SqMiles and Camping to include commas and zero decimal places.
d. Modify the KEEP statement to include the new columns. Run the program.
Level 2
5. Creating New Columns with Character and Date Functions
The pg1.eu_occ table contains individual columns for nights spent at hotels, short stay
accommodations, or camps for each year and month. The YearMon column is character.
a. Open a new program. Write a DATA step to create a temporary table named eu_occ_total
based on the pg1.eu_occ table. Create the following new columns:
• Year – the four-digit year extracted from YearMon.
• Month – the two-digit month extracted from YearMon.
• ReportDate – the first day of the reporting month.
Note: Use the MDY function and the new Year and Month columns.
• Total – the total nights spent at any establishment. Format the new column to display
the values with commas.
b. Format Hotel, ShortStay, Camp, and Total with commas. Format ReportDate to display
the values in the form JAN2018.
c. Keep Country, Hotel, ShortStay, Camp, ReportDate, and Total in the new table.
Challenge
6. Creating a New Column with the SCAN Function
a. Access SAS Help to learn about the SCAN function.
b. Create a new program. Create a new temporary table named np_summary2 based on the
pg1.np_summary table. Use the SCAN function to create a new column named ParkType
that is the last word of the ParkName column.
Note: Use a negative number for the second argument to count words from right to left
in the character string.
c. Keep Reg, Type, ParkName, and ParkType in the output table.
Often in the DATA step, we need to process data conditionally. In other words, if some condition is
met, then execute one statement. If a different condition is met, then execute another statement. We
can accomplish this using IF-THEN logic.
data cars2;
set sashelp.cars;
if MSRP<30000 then Cost_Group=1;
if MSRP>=30000 then Cost_Group=2;
keep Make Model Type MSRP Cost_Group;
run;
4.3 Conditional Processing 4-31
Files
• p104d05.sas
• storm_summary – a SAS table that contains one row per storm for the 1980 through 2016 storm
seasons
Syntax
IF expression THEN statement;
Notes
• The expression following IF defines a condition that is evaluated as true or false for each row.
• If the condition is true, the statement following THEN is executed.
• Only one statement is permitted after THEN.
Demo
1. Open p104d05.sas from the demos folder and find the Demo section of the program.
Create a column named PressureGroup that is based on the following assignments:
Condition            PressureGroup
MinPressure<=920     1
MinPressure>920      0
data storm_new;
set pg1.storm_summary;
keep Season Name Basin MinPressure PressureGroup;
*Add IF-THEN statements;
if MinPressure<=920 then PressureGroup=1;
if MinPressure>920 then PressureGroup=0;
run;
2. Highlight the DATA step, run the selected code, and examine the data. What value is assigned
to PressureGroup when MinPressure is missing?
3. Add a new IF-THEN statement before the existing IF-THEN statements to assign
PressureGroup=. if MinPressure is missing.
data storm_new;
set pg1.storm_summary;
keep Season Name Basin MinPressure PressureGroup;
*Add IF-THEN statements;
if MinPressure=. then PressureGroup=.;
if MinPressure<=920 then PressureGroup=1;
if MinPressure>920 then PressureGroup=0;
run;
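4. Highlight the DATA step and run the selected code. What value is assigned to PressureGroup
when MinPressure is missing?
When MinPressure is missing, the first two IF conditions are true, because SAS treats a missing
numeric value as smaller than any number. The last assignment statement that executes
(PressureGroup=1) determines the value of PressureGroup.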
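The behavior for a missing value can be checked in isolation with a short sketch like the one below. It uses a single hard-coded missing value instead of the storm_summary table, which is an assumption made to keep the example self-contained.

```sas
data _null_;
    MinPressure=.;
    /* True: MinPressure is missing, so PressureGroup is set to missing */
    if MinPressure=. then PressureGroup=.;
    /* Also true: a missing numeric value compares as less than 920 */
    if MinPressure<=920 then PressureGroup=1;
    /* False: this assignment does not execute */
    if MinPressure>920 then PressureGroup=0;
    put MinPressure= PressureGroup=;
run;
```

The log shows PressureGroup=1, because the second IF-THEN statement overwrites the missing value that was assigned by the first one.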
When you have multiple IF-THEN statements, SAS tests all conditions in sequence for every row of
the data. The last true condition executes the statement that determines the value in the output
table. Suppose you want to treat these conditions as a hierarchy so that when a true condition is
found, SAS simply executes the statement following THEN and skips the subsequent IF statements.
If you want to enforce this type of sequential testing, be sure to use the keyword ELSE.
data cars2;
set sashelp.cars;
if MSRP<20000 then Cost_Group=1;
else if MSRP<40000 then Cost_Group=2;
else if MSRP<60000 then Cost_Group=3;
else Cost_Group=4;
keep Make Model Type MSRP Cost_Group;
run;
The keyword ELSE is not in the first statement, but it has been added in the three statements that
follow. This tells SAS to test the conditions only until a true expression is found.
Example: MSRP=35000
data cars2;
set sashelp.cars;
if MSRP<20000 then Cost_Group=1;        /* false */
else if MSRP<40000 then Cost_Group=2;
else if MSRP<60000 then Cost_Group=3;
else Cost_Group=4;
keep Make Model Type MSRP Cost_Group;
run;
Let's look at an example where MSRP is equal to 35000. The first IF-THEN statement is false, so
SAS moves to the next statement.
Example: MSRP=35000
data cars2;
set sashelp.cars;
if MSRP<20000 then Cost_Group=1;        /* false */
else if MSRP<40000 then Cost_Group=2;   /* true: execute */
else if MSRP<60000 then Cost_Group=3;   /* skip */
else Cost_Group=4;                      /* skip */
keep Make Model Type MSRP Cost_Group;
run;
Example: MSRP=75000
data cars2;
set sashelp.cars;
if MSRP<20000 then Cost_Group=1;        /* false */
else if MSRP<40000 then Cost_Group=2;   /* false */
else if MSRP<60000 then Cost_Group=3;   /* false */
else Cost_Group=4;                      /* execute */
keep Make Model Type MSRP Cost_Group;
run;
The final ELSE statement executes if all previous conditions were false.
For a row with MSRP equal to 75000, none of the stated MSRP conditions are true, so the last
assignment statement is executed. Notice in this final ELSE statement that there is no condition, just
an assignment statement. There is no reason to test that final condition because if the preceding
conditions are all false, we know Cost_Group should be 4.
4.07 Activity
Open p104a07.sas from the activities folder and perform the following tasks:
1. Add the ELSE keyword to test conditions sequentially until a true
condition is met.
2. Change the final IF-THEN statement to an ELSE statement.
3. How many storms are in PressureGroup 1?
It is important to know that the first occurrence of a column in the DATA step defines the name, type,
and length of the column. So, if you have an assignment statement that defines a character column
and assigns the value Basic, the column is created with a length of 5, the number of characters in
the word Basic. You can see from the output that Luxury is truncated because it has six characters.
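A program along the following lines would show the truncation. It is a sketch based on the description above; the PROC FREQ step is added here only as one convenient way to inspect the distinct values.

```sas
data cars2;
    set sashelp.cars;
    /* First occurrence of CarType: created as character with a length of 5 */
    if MSRP<60000 then CarType="Basic";
    /* Luxury has six characters, so it is stored as Luxur */
    else CarType="Luxury";
    keep Make Model MSRP CarType;
run;

/* List the distinct values of CarType to see the truncated value */
proc freq data=cars2;
    tables CarType;
run;
```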
LENGTH char-column $ length;
length: the number of bytes or characters
One way to avoid this problem is to explicitly define a character column in the DATA step with a
LENGTH statement. The syntax for this statement is the keyword LENGTH followed by the name of
the column, a dollar sign to indicate a character column, and the length that you want to assign.
data cars2;
set sashelp.cars;
length CarType $ 6;   /* explicitly creates a new character column with a length of 6 */
if MSRP<60000 then CarType="Basic";
else CarType="Luxury";
keep Make Model MSRP CarType;
run;
4.08 Activity
Open p104a08.sas from the activities folder and perform the following tasks:
1. Run the program and examine the results. Why is Ocean truncated?
What value is assigned when Basin='na'?
2. Modify the program to add a LENGTH statement to declare the name,
type, and length of Ocean before the column is created.
data cars2;
set sashelp.cars;
if MPG_City>26 and MPG_Highway>30 then Efficiency=1;
else if MPG_City>20 and MPG_Highway>25 then Efficiency=2;
else Efficiency=3;
keep Make Model MPG_City MPG_Highway Efficiency;
run;
AND: Both conditions must be true.
OR: At least one condition must be true.
data cars2;
set sashelp.cars;
length Cost_Type $ 4;
if MSRP<20000 then Cost_Group=1 and Cost_Type="Low";
else if MSRP<40000 then Cost_Group=2 and Cost_Type="Mid";
else Cost_Group=3 and Cost_Type="High";
run;
Compound statements are not allowed. This program doesn't work because only one statement is
permitted after THEN.
If you can specify a compound condition to evaluate, can you do the same after the keyword THEN
to execute multiple statements? If you attempt to use AND between two statements, the program
does not work as intended because you are allowed only one executable statement following THEN.
SAS offers alternate syntax that you can use when you want to execute multiple statements for a
given condition. We call this syntax IF-THEN/DO. After a condition, you type THEN DO and a
semicolon. After that statement, you can list as many statements as you need to process, and then
close the block with an END statement. This is repeated for each of the ELSE IF or ELSE DO
blocks.
In this example, we use the DATA step to create not one, but two tables. In the DATA statement, we
can list more than one output table. In the first condition, if MSRP is less than 20000, we assign
Cost_Group a value of 1, and then use the explicit OUTPUT statement to tell SAS which of the two
tables to write that row to. Just remember that because these statements execute in sequence, we
must first assign a value to Cost_Group and then output the row to a particular table. The remaining
conditions also include statements to assign a different value to Cost_Group and output to either
the under40 or over40 table.
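The example described above might look like the following sketch. It is reconstructed from the description; the cutoff values other than 20000, the Cost_Group values other than 1, and the KEEP list are assumptions.

```sas
/* Two output tables are listed in the DATA statement */
data under40 over40;
    set sashelp.cars;
    keep Make Model MSRP Cost_Group;
    if MSRP<20000 then do;
        /* Assign the value first, then write the row */
        Cost_Group=1;
        output under40;
    end;
    else if MSRP<40000 then do;
        Cost_Group=2;
        output under40;
    end;
    else do;
        Cost_Group=3;
        output over40;
    end;
run;
```

If the OUTPUT statement came before the assignment in a DO block, the row would be written before Cost_Group received its value for that row.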
4.09 Activity
Open p104a09.sas from the activities folder. Run the program. Why does the
program fail?
Files
• p104d07.sas
• storm_summary – a SAS table that contains one row per storm for the 1980 through 2016 storm
seasons
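Syntax
IF expression THEN DO;
<executable statements>
END;
ELSE IF expression THEN DO;
<executable statements>
END;
ELSE DO;
<executable statements>
END;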
Notes
• After the IF-THEN/DO statement, list any number of executable statements.
• Close each DO block with an END statement.
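As one illustration of these notes, the Cost_Group and Cost_Type program from earlier in this lesson could be rewritten with DO blocks so that both assignments execute for each condition. This is a sketch; the KEEP statement is an assumption added to limit the output columns.

```sas
data cars2;
    set sashelp.cars;
    /* Declare the character column first so that High is not truncated */
    length Cost_Type $ 4;
    if MSRP<20000 then do;
        Cost_Group=1;
        Cost_Type="Low";
    end;
    else if MSRP<40000 then do;
        Cost_Group=2;
        Cost_Type="Mid";
    end;
    else do;
        Cost_Group=3;
        Cost_Type="High";
    end;
    keep Make Model MSRP Cost_Group Cost_Type;
run;
```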
Demo
Open p104d07.sas from the demos folder and find the Demo section of the program. Modify the
IF-THEN statements to use IF-THEN/DO syntax to write rows to either the indian, atlantic, or
pacific table based on the value of Ocean. Highlight the DATA step and run the selected code.
data indian atlantic pacific;
set pg1.storm_summary;
length Ocean $ 8;
keep Basin Season Name MaxWindMPH Ocean;
Basin=upcase(Basin);
OceanCode=substr(Basin,2,1);
*Modify the program to use IF-THEN-DO syntax;
if OceanCode="I" then do;
Ocean="Indian";
output indian;
end;
else if OceanCode="A" then do;
Ocean="Atlantic";
output atlantic;
end;
else do;
Ocean="Pacific";
output pacific;
end;
run;
indian Table
atlantic Table
pacific Table
Links
• SAS Programming 2: Data Manipulation Techniques
• SAS Programming 3: Advanced Techniques and Efficiencies
• Stick around for the last lesson!
• Take the SAS SQL 1 course.
• Read this blog post: Reasons to love PROC DS2.
• Take the DS2 Programming course.
• Look for Reading Text Files with the DATA Step on the Extended Learning page.
Links
• Take the SAS SQL 1 course.
• Read this blog post: Reasons to love PROC DS2.
• Take the DS2 Programming course.
Practice
If you restarted your SAS session, open and submit the libname.sas program in the course files.
Level 1
7. Processing Statements Conditionally with IF-THEN/ELSE
The pg1.np_summary table contains public use statistics from the National Park Service. The
values of the Type column represent park type as a code. Create a new column, ParkType,
that contains full descriptive values.
a. Open p104p07.sas from the practices folder. Submit the program and view the generated
output.
b. In the DATA step, use IF-THEN/ELSE statements to create a new column, ParkType,
based on the value of Type.
Type ParkType
NM Monument
NP Park
NS Seashore
c. Modify the PROC FREQ step to generate a frequency report for ParkType.
Level 2
8. Processing Statements Conditionally with DO Groups
Use conditional processing to split pg1.np_summary into two tables: parks and monuments.
a. Create a new program. Write a DATA step to create two temporary tables named parks and
monuments based on the pg1.np_summary table. Read only national parks or monuments
from the input table. (Type is either NP or NM.)
b. Create a new column named Campers that is the sum of all columns containing counts of
campers. Format the column to include commas.
c. When Type is NP, create a new column named ParkType that is equal to Park, and write the
row to the parks table. When Type is NM, assign ParkType as Monument and write the row
to the monuments table.
d. Keep Reg, ParkName, DayVisits, OtherLodging, Campers, and ParkType in both output
tables.
parks Table
monuments Table
Challenge
9. Processing Statements Conditionally with SELECT-WHEN Groups
SELECT and WHEN statements can be used in a DATA step as an alternative to IF-THEN
statements to process code conditionally.
a. Use SAS Help or online documentation to read about using SELECT and WHEN statements
in the DATA step.
b. Repeat Practice 8 using SELECT groups and WHEN statements.
4.4 Solutions
Solutions to Practices
1. Creating a SAS Table
data eu_occ2016;
set pg1.eu_occ;
where YearMon like "2016%";
format Hotel ShortStay Camp comma17.;
drop geo;
run;
2. Creating a Permanent SAS Table
libname out "s:/workshop/output";
data out.fox;
set pg1.np_species;
where Category='Mammal' and Common_Names like '%Fox%'
and Common_Names not like '%Squirrel%';
drop Category Record_Status Occurrence Nativeness;
run;
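3. Creating a SAS Table Using Macro Variables
%let cat=Bird;
data &cat;
set pg1.np_species;
where Category="&cat";
drop Abundance Seasonality Conservation_Status;
run;
proc freq data=&cat;
tables Record_Status;
run;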
data storm_new;           /* temporary table */
set pg1.storm_summary;    /* permanent table */
run;
data out.storm_cat5;
set pg1.storm_summary;
where StartDate>="01jan2000"d and MaxWindMPH>=156;
keep Season Basin Name Type MaxWindMPH;
run;
There were 18 Category 5 storms since January 1, 2000.
NOTE: There were 18 observations read from the data set PG1.STORM_SUMMARY.
WHERE (StartDate>='01JAN2000'D) and (MaxWindMPH>=156);
How is the KEEP statement different from the VAR statement in PROC PRINT?
data storm_length;
set pg1.storm_summary;
drop Hem_EW Hem_NS Lat Lon;
StormLength = EndDate-StartDate;
run;
data storm_windavg;
set pg1.storm_range;
WindAvg=mean(wind1, wind2, wind3, wind4);
WindRange=range(of wind1-wind4);
run;
OF col1 - coln
That's a good shortcut for listing a range of columns!
NOTE: There were 1958 observations read from the data set
PG1.STORM_SUMMARY.
WHERE SUBSTR(basin, 2, 1)='P';
NOTE: The data set WORK.PACIFIC has 1958 observations and 6 variables.
data storm_cat;
set pg1.storm_summary;
keep Name Basin MinPressure StartDate PressureGroup;
*add ELSE keyword and remove final condition;
if MinPressure=. then PressureGroup=.;
else if MinPressure<=920 then PressureGroup=1;
else PressureGroup=0;
run;
Lesson 5 Analyzing and Reporting on Data
5.1 Enhancing Reports with Titles, Footnotes, and Labels
Demonstration: Enhancing Reports
5.1 Enhancing Reports with Titles, Footnotes, and Labels
Slide: The SAS programming process: access data, explore data, prepare data, analyze and report on data, export results. This lesson covers the MEANS and FREQ procedures and the TITLE, FOOTNOTE, and LABEL statements.
Now that data access, validation, and manipulation are behind you, you are ready to address the
peak of the programming process: analyzing and reporting on the data. Analyzing your data can
mean a lot of different things. It could be basic summarization to examine what happened in the
past, or it could be complex data mining or machine learning algorithms to predict what might
happen in the future. In this lesson, you concentrate on summarizing data. Specifically, you explore
in more depth the procedures that you can use for exploration: PRINT, MEANS, and FREQ.
TITLE is a global statement that establishes a title for all reports created in your SAS session. The
title stays in effect until you change it, clear it, or end the session. The syntax is the keyword TITLE
followed by the title text enclosed in quotation marks. You can have up to 10 titles. Specify a number
from 1 through 10 immediately after the keyword TITLE to indicate the line number. TITLE and TITLE1
are equivalent.
You can also add footnotes to any report with the FOOTNOTE statement. The same rules for titles
apply to footnotes.
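As a minimal sketch of the statements described above, the following program assigns two title lines and one footnote before a PROC MEANS step. The SASHELP.CARS table and the title and footnote text are assumptions chosen for illustration; they are not taken from the course files.

```sas
/* TITLE and TITLE1 are equivalent: both set the first title line */
title1 "Vehicle Report";
title2 "Summary Statistics for MSRP";
/* Footnotes follow the same numbering rules as titles */
footnote1 "Source: SASHELP.CARS";

proc means data=sashelp.cars;
    var MSRP;
run;
```

Because the statements are global, the same titles and footnote would also appear on any report that is created later in the session, until they are replaced or cleared.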
5.01 Activity
Open p105a01.sas from the activities folder and perform the following tasks:
1. In the program, notice that there is a TITLE statement followed by two
procedures. Run the program. Where does the title appear in the output?
2. Add a TITLE2 statement above PROC MEANS to print a second line:
Summary Statistics for MaxWind and MinPressure
3. Add another TITLE2 statement above PROC FREQ with this title:
Frequency Report for Basin
4. Run the program. Which titles appear above each report?
5.02 Activity
Open p105a02.sas from the activities folder. Notice that there are no TITLE
statements in the code. Run the program. Does the report have the same
titles assigned in the previous activity?
Yes
No
Remember that TITLE and FOOTNOTE are global statements, and they remain active as long as
the SAS session is active. To clear the titles or footnotes that you have specified, use the keyword
TITLE or FOOTNOTE with no text. This is called a null TITLE (or FOOTNOTE) statement. The null
TITLE statement clears the titles on all lines. Client applications such as SAS Studio submit a null
TITLE statement for you at the end of your code, but it is a good habit to submit the statement
yourself at the end of your program.
Some procedures include the name of the procedure in a title above the results. You can turn this off
by submitting an ODS statement with the NOPROCTITLE option. You do more with ODS in another
lesson.
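A short sketch of this technique, assuming the SASHELP.CARS sample table and an illustrative title: the procedure title is turned off before the report, and both the title and the ODS setting are reset afterward.

```sas
/* Suppress the line that names the procedure above the results */
ods noproctitle;
title "Frequency of Origin";

proc freq data=sashelp.cars;
    tables Origin;
run;

/* Clear the title and turn procedure titles back on */
title;
ods proctitle;
```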
%let age=13;
title;
footnote;
LABEL col-name="label-text";
In PROC PRINT, you must use either the LABEL or SPLIT= option in the PROC PRINT statement
to display labels in the report. When you use the LABEL option, SAS determines whether to split the
labels to multiple lines, and if so, where to make the split. The SPLIT= option enables you to define
a character that forces labels to split in specific locations.
proc print data=sashelp.cars split="*";
var Make Model MSRP MPG_Highway MPG_City;
label MSRP="Manufacturer Suggested*Retail Price"
MPG_Highway="Highway Miles*per Gallon";
run;
Segmenting Reports
You can use the BY statement in a reporting procedure to segment a report based on the unique
values of one or more columns. For example, what if you want to generate a separate frequency
report for each value of Origin? You must sort the table by Origin first, and then use the BY
statement in PROC FREQ. Then SAS treats the rows for each value of Origin as a separate table
and runs the frequency report.
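The Origin example can be sketched as follows, assuming that Origin refers to the column of that name in SASHELP.CARS. The output table name cars_sort and the choice of Type as the analysis column are hypothetical.

```sas
/* Sort a copy of the table by the grouping column first */
proc sort data=sashelp.cars out=cars_sort;
    by Origin;
run;

/* One frequency table for Type is produced for each value of Origin */
proc freq data=cars_sort;
    by Origin;
    tables Type;
run;
```

If the input table is not sorted by Origin, the PROC FREQ step stops with an error that the BY variables are not properly sorted, which is why the PROC SORT step comes first.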
Enhancing Reports
Scenario
Use titles, footnotes, labels, and grouping to enhance a report.
Files
• p105d01.sas
• storm_final – a SAS table that contains one row per storm for the 1980 through 2017 storm
seasons. The data was cleaned and prepared previously using the DATA step.
Syntax
TITLEn "title-text";
FOOTNOTEn "footnote-text";
LABEL col-name="label-text"
col-name="label-text";
Notes
• TITLE is a global statement that establishes a title for all reports that are created in your SAS
session. The title stays in effect until you change it, clear it, or end the session.
• You can have a maximum of 10 titles. You use a number 1 through 10 after the keyword TITLE
to indicate the line number. TITLE and TITLE1 are equivalent.
• Titles can be replaced with an additional TITLE statement with the same number. TITLE; clears
all titles.
• You can also add footnotes to any report with the FOOTNOTE statement. The same rules for titles
apply to footnotes.
• Labels can be used to provide more descriptive column headings. A label can contain any text,
up to a maximum of 256 characters.
• Procedures display labels automatically, with the exception of PROC PRINT. In PROC PRINT, you
must add the LABEL or SPLIT= option in the PROC PRINT statement.
• To create a grouped report, first use PROC SORT to arrange the data by the grouping column,
and then use the BY statement in the reporting procedure.
Demo
1. Open p105d01.sas from the demos folder and find the Demo section of the program. Add a
PROC SORT step before PROC PRINT to sort pg1.storm_final by BasinName and descending
MaxWindMPH. Create a temporary table named storm_sort. Filter the rows to include only
MaxWindMPH>156.
proc sort data=pg1.storm_final out=storm_sort;
by BasinName descending MaxWindMPH;
where MaxWindMPH > 156;
run;
2. Modify the PROC PRINT step to read the storm_sort table and group the report
by BasinName.
3. Add the following title: Category 5 Storms. Clear the title for future results.
4. Add labels for the following columns and ensure that PROC PRINT displays the labels:
Column         Label
MaxWindMPH     Max Wind (MPH)
MinPressure    Min Pressure
StartDate      Start Date
StormLength    Length of Storm (days)
5. Add the NOOBS option in the PROC PRINT statement to suppress the OBS column. Highlight
the demo program and run the selected code.
title "Category 5 Storms";
proc print data=storm_sort label noobs;
by BasinName;
var Season Name MaxWindMPH MinPressure StartDate StormLength;
label MaxWindMPH="Max Wind (MPH)"
MinPressure="Min Pressure"
StartDate="Start Date"
StormLength="Length of Storm (days)";
run;
title;
When the LABEL statement is used in a DATA step, labels are assigned as permanent attributes
in the descriptor portion of the table. When procedures create reports using that data, labels are
automatically displayed. Notice that the LABEL option is still required in PROC PRINT.
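The difference between the two kinds of procedures can be seen in a small sketch. It assumes the SASHELP.CLASS sample table; the table name class_labeled and the label text are made up for illustration.

```sas
/* Labels assigned in a DATA step are stored with the new table */
data class_labeled;
    set sashelp.class;
    label Height="Student Height"
          Weight="Student Weight";
run;

/* PROC MEANS picks up the stored labels without any option */
proc means data=class_labeled;
    var Height Weight;
run;

/* PROC PRINT shows the labels only when the LABEL option is added */
proc print data=class_labeled label;
    var Name Height Weight;
run;
```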
5.03 Activity
Open p105a03.sas from the activities folder and perform the following tasks:
1. Modify the LABEL statement in the DATA step to label the Invoice column
as Invoice Price.
2. Run the program. Why do the labels appear in the PROC MEANS report
but not in the PROC PRINT report? Fix the program and run it again.
Slide: PROC FREQ options can report the number of unique values, change the statistics that are displayed, and add graphs to view the distribution.
Earlier in the course, PROC FREQ was used with the TABLES statement for data validation. However,
many more statements and options are available in PROC FREQ to customize the output and include
additional statistics.
5.2 Creating Frequency Reports
A basic frequency report is based on individual columns. By default, each column listed in the
TABLES statement generates a separate frequency table that includes the number and percentage
of rows for each value in the data, as well as a cumulative frequency and percent. The numbers
included in this report can be customized using options in the PROC FREQ and TABLES
statements.
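A minimal sketch of a default one-way report next to a customized one, assuming the SASHELP.CARS sample table rather than the course data:

```sas
/* Default report: a separate table for each column listed, with
   frequency, percent, cumulative frequency, and cumulative percent */
proc freq data=sashelp.cars;
    tables Origin Type;
run;

/* Customized report: rows ordered by descending frequency and the
   cumulative columns suppressed */
proc freq data=sashelp.cars order=freq;
    tables Origin Type / nocum;
run;
```

ORDER=FREQ is an option in the PROC FREQ statement and applies to every table in the step, whereas NOCUM follows the slash in the TABLES statement and applies only to the tables requested there.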
Files
• p105d02.sas
• storm_final – a SAS table that contains one row per storm for the 1980 through 2017 storm
seasons. The data was cleaned and prepared previously using the DATA step.
Syntax
Notes
• One or more TABLES statements can be used to define frequency tables and options.
• ODS Graphics enables graph options to be used in the TABLES statement.
• WHERE, FORMAT, LABEL, and BY statements can be used in PROC FREQ to customize
the report.
Demo
Note: Highlight the demo program and run the selected code after each step.
1. Open p105d02.sas from the demos folder and find the Demo section of the program.
Highlight the PROC FREQ step and run the selected code. Examine the default results.
2. In the PROC FREQ statement, add the ORDER=FREQ option to sort results by descending
frequency. Add the NLEVELS option to include a table with the number of distinct values.
proc freq data=pg1.storm_final order=freq nlevels;
3. Add the NOCUM option in the TABLES statement to suppress the cumulative columns.
tables BasinName Season / nocum;
4. Change Season to StartDate in the TABLES statement. Add a FORMAT statement to display
StartDate as the month name (MONNAME.).
proc freq data=pg1.storm_final order=freq nlevels;
tables BasinName StartDate / nocum;
format StartDate monname.;
run;
5. Add the ODS GRAPHICS ON statement before PROC FREQ. Use the PLOTS=FREQPLOT
option in the TABLES statement to create a bar chart. Add the chart options
ORIENT=HORIZONTAL and SCALE=PERCENT.
ods graphics on;
proc freq data=pg1.storm_final order=freq nlevels;
tables BasinName StartDate /
nocum plots=freqplot(orient=horizontal scale=percent) ;
format StartDate monname.;
run;
6. Add the title Frequency Report for Basin and Storm Month. Turn off the procedure title with
the ODS NOPROCTITLE statement. Add a LABEL statement to display BasinName as Basin
and StartDate as Storm Month. Clear the titles and turn the procedure titles back on.
ods graphics on;
ods noproctitle;
title "Frequency Report for Basin and Storm Month";
proc freq data=pg1.storm_final order=freq nlevels;
tables BasinName StartDate /
nocum plots=freqplot(orient=horizontal scale=percent);
format StartDate monname.;
label BasinName="Basin"
StartDate="Storm Month";
run;
title;
ods proctitle;
5.04 Activity
Open p105a04.sas from the activities folder and perform the following tasks:
1. Create a temporary output table named storm_count by completing the
OUT= option in the TABLES statement.
2. Add the NOPRINT option in the PROC FREQ statement to suppress the
printed report.
3. Run the program. Which statistics are included in the output table?
Which month has the highest number of storms?
When you place an asterisk between two columns in the TABLES statement, PROC FREQ produces
a two-way frequency or crosstabulation report. A two-way frequency report can use some of the
same options that we have seen with the one-way frequency report, including NLEVELS to create
the number of levels table, ORDER= to control the sequence of rows, and OUT= to create an output
table. But there are additional options unique to the two-way frequency report that enable you to
apply different layouts to the results or include new statistics or analyses.
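As a brief sketch, assuming the SASHELP.CARS sample table, the step below crosses Origin (rows) with Type (columns) and then requests the same table in a list layout.

```sas
proc freq data=sashelp.cars;
    /* Crosstabulation: Origin values form the rows, Type values the columns.
       NOROW and NOCOL drop the row and column percentages from each cell. */
    tables Origin*Type / norow nocol;
    /* Same combination of columns, displayed as a list instead of a grid */
    tables Origin*Type / list;
run;
```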
Files
• p105d03.sas
• storm_final – a SAS table that contains one row per storm for the 1980 through 2017 storm
seasons. The data was cleaned and prepared previously using the DATA step.
Syntax
Notes
• When you place an asterisk between two columns in the TABLES statement, PROC FREQ
produces a two-way frequency or crosstabulation report. The values of the first listed column are
the rows of the report, and the values of the second column are the columns.
• Use options in the TABLES statement to customize the table structure and the statistics that are
included in the output.
Demo
Note: Highlight the PROC FREQ step and run the selected code after each step.
1. Open p105d03.sas from the demos folder and find the Demo section of the program.
Highlight the PROC FREQ step, run the selected code, and examine the default results.
2. Add the NOPERCENT, NOROW, and NOCOL options in the TABLES statement.
tables StartDate*BasinName / norow nocol nopercent;
3. Delete the options in the TABLES statement and add the CROSSLIST option.
tables StartDate*BasinName / crosslist;
4. Change the CROSSLIST option to the LIST option in the TABLES statement.
tables StartDate*BasinName / list;
5. Delete the previous options and add OUT=STORMCOUNTS. Add NOPRINT to the PROC FREQ
statement to suppress the report.
proc freq data=pg1.storm_final noprint;
tables StartDate*BasinName / out=stormcounts;
format StartDate monname.;
label BasinName="Basin"
StartDate="Storm Month";
run;
Practice
If you restarted your SAS session, open and submit the libname.sas program in the course files.
Level 1
1. Creating One-Way Frequency Reports
The pg1.np_species table provides a detailed species list for selected national parks.
Use this table to analyze categories of reported species.
a. Create a new program. Write a PROC FREQ step to analyze rows from pg1.np_species.
1) Use the TABLES statement to generate a frequency table for Category.
2) Use the NOCUM option to suppress the cumulative columns.
3) Use the ORDER=FREQ option in the PROC FREQ statement to order the results
by descending frequency.
4) Use Categories of Reported Species as the report title.
5) Run the program and review the results.
3) Add "in the Everglades" as a second title. Run the program and review the results.
Level 2
2. Creating Two-Way Frequency Reports
The pg1.np_codelookup table is primarily used to look up a park name or park code. However,
the table also includes columns for the park type and park region. Use this table to analyze the
frequency of park types by the various regions.
a. Create a new program. Write a PROC FREQ step to analyze rows from
pg1.np_codelookup. Generate a two-way frequency table for Type by Region. Exclude any
park type that contains the word Other. The levels with the most rows should come first in
the order. Suppress the display of column percentages. Use Park Types by Region as the
report title.
b. Run the program and review the results. Identify the top three park types based on total
frequency count.
Note: Statistics labels appear in the main table in Enterprise Guide if SAS Report is the
output format.
c. Modify the PROC FREQ step by limiting the park types to the three that were determined in
the previous step. In addition to suppressing the display of column percentages, display the
table using the CROSSLIST option. Add a frequency plot that groups the bars by the row
variable, displays row percentages, and has a horizontal orientation. Use Selected Park
Types by Region as the report title. Run the program and review the results.
Note: Use SAS documentation to learn how the GROUPBY=, SCALE=, and ORIENT=
options can be used to control the appearance of the plot.
Challenge
3. Creating a Customized Graph of a Two-Way Frequency Table
The SGPLOT procedure can be used to create statistical graphics such as histograms and
regression plots, in addition to simple graphics such as bar charts and line plots. Statements and
options enable you to control the appearance of your graph and add additional features such as
legends and reference lines.
a. Open p105p03.sas from the practices folder. Highlight the first TITLE statement and PROC
FREQ step, run the selected code, and examine the generated plot. The program subsets
the pg1.np_codelookup table for three park types: National Historic Site, National
Monument, and National Park. The plot uses a stacked layout with a horizontal orientation.
b. To create a more customized frequency bar chart, the SGPLOT procedure can be used with
the pg1.np_codelookup table. Examine the PROC SGPLOT step in the demo program.
1) The HBAR statement creates a horizontal bar chart with separate bars for each value
of Region. The GROUP= option segments each bar by the distinct values of Type.
2) The KEYLEGEND statement customizes the appearance and position of the legend.
3) The XAXIS statement adds reference lines on the horizontal axis.
c. Use SAS Help or autocomplete prompts to look for additional options in the HBAR statement
to customize the appearance of the chart.
1) Display labels on each segment of the bars.
2) Change the fill attributes for each bar to make the color 50% transparent.
3) Apply different values for the DATASKIN option to change the color effect on the bars.
5.3 Creating Summary Statistics Reports
PROC MEANS makes it easy to summarize your data in reports or tables!
PROC MEANS is a very useful procedure for calculating basic summary statistics and looking for
numeric values that might be outside of an expected range. Now that you are beyond validation, you
can use PROC MEANS to generate complex reports that include various statistics and groupings
within the data.
Files
• p105d04.sas
• storm_final – a SAS table that contains one row per storm for the 1980 through 2017 storm
seasons. The data was cleaned and prepared previously using the DATA step.
Syntax
Notes
• Options in the PROC MEANS statement control the statistics that are included in the report.
• The CLASS statement specifies columns to group the data before calculating statistics.
• The WAYS statement specifies the number of ways to make unique combinations of class
columns.
Demo
Note: Highlight the PROC MEANS step and run the selected code after each step.
1. Open p105d04.sas from the demos folder and find the Demo section of the program.
Run the step and examine the starting report.
2. List the following statistics in the PROC MEANS statement: MEAN, MEDIAN, MIN, and MAX.
Add the MAXDEC=0 option to round statistics to the nearest integer.
proc means data=pg1.storm_final mean median min max maxdec=0;
3. The CLASS statement can be used to calculate statistics for groups. Add a CLASS statement
and list the BasinName column.
Note: The CLASS statement does not require the data to be sorted.
proc means data=pg1.storm_final mean median min max maxdec=0;
var MaxWindMPH;
class BasinName;
run;
4. Add StormType as an additional column in the CLASS statement. Run the program and notice
that one report is created with statistics that are calculated for the combination of BasinName
and StormType values.
class BasinName StormType;
5. The WAYS statement can be used to indicate the combinations of class columns to use for
creating the report. Add the WAYS statement and provide a value of 1.
proc means data=pg1.storm_final mean median min max maxdec=0;
var MaxWindMPH;
class BasinName StormType;
ways 1;
run;
6. Change the WAYS statement to list 0, 1, and 2.
proc means data=pg1.storm_final mean median min max maxdec=0;
var MaxWindMPH;
class BasinName StormType;
ways 0 1 2;
run;
5.05 Activity
Open p105a05.sas from the activities folder and perform the following tasks:
1. Add options to include N (count), MEAN, and MIN statistics. Round each
statistic to the nearest integer.
2. Add a CLASS statement to group the data by Season and Ocean. Run the
program.
3. Modify the program to add the WAYS statement so that separate reports
are created for Season and Ocean statistics. Run the program.
Which ocean had the lowest mean for minimum pressure?
Which season had the lowest mean for minimum pressure?
When you analyze detailed data, you might want to create a SAS table that summarizes the data for
further analysis. PROC MEANS is a great way to create summary tables. The OUTPUT statement
offers several options to customize the table that is generated. You use the OUT= option to name the
output table. The OUTPUT statement also enables you to generate output statistics and name a
column to store them in.
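A minimal sketch of the OUTPUT statement, assuming the SASHELP.CARS sample table. The output table name msrp_stats and the column names AvgMSRP and MaxMSRP are hypothetical.

```sas
/* NOPRINT suppresses the report so that only the table is created */
proc means data=sashelp.cars noprint;
    var MSRP;
    class Origin;
    /* Keep only the rows that summarize each value of Origin */
    ways 1;
    /* OUT= names the table; statistic=column-name names each statistic column */
    output out=msrp_stats mean=AvgMSRP max=MaxMSRP;
run;

proc print data=msrp_stats;
run;
```

In addition to the class column and the named statistics, the output table contains the automatic _TYPE_ and _FREQ_ columns. Without the WAYS statement, the table would also include a row that summarizes the entire input table.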
5.06 Activity
Open p105a06.sas from the activities folder and perform the following tasks:
1. Run the PROC MEANS step and compare the report and the wind_stats
table. Are the same statistics in the report and table? What do the first
five rows in the table represent?
2. Uncomment the WAYS statement. Delete the statistics listed in the PROC
MEANS statement and add the NOPRINT option. Run the program. Notice
that a report is not generated and the first five rows from the previous
table are excluded.
3. Add the following options in the OUTPUT statement and run the program
again. How many rows are in the output table?
output out=wind_stats mean=AvgWind max=MaxWind;
5.07 Activity
Open p105a07.sas from the activities folder. Run the program and examine
the results to see examples of other procedures that analyze and report
on the data.
The map is created using the SGMAP procedure, which requires SAS 9.4M5 or later.
Links
• Review the SAS 9.4 ODS Graphics documentation.
• Take the ODS Graphics: Essentials course.
• Use this ODS Graphics tip sheet as a reference.
• Take the free e-learning Statistics 1: Introduction to ANOVA, Regression, and Logistic Regression
course.
• Check out other training options for advanced analytics.
• Learn to use PROC REPORT and PROC TABULATE in the SAS Report Writing 1: Essentials
course.
• Read PROC REPORT by Example: Techniques for Building Professional Reports Using SAS.
Practice
If you restarted your SAS session, open and submit the libname.sas program in the course files.
Level 1
4. Producing a Descriptive Statistic Report
The pg1.np_westweather table contains weather-related information for four national parks:
Death Valley National Park, Grand Canyon National Park, Yellowstone National Park,
and Zion National Park. Use the MEANS procedure to analyze the data in this table.
a. Create a new program. Write a PROC MEANS step to analyze rows from
pg1.np_westweather with the following specifications:
1) Generate the mean, minimum, and maximum statistics for the Precip, Snow, TempMin,
and TempMax columns.
2) Use the MAXDEC= option to display the values with a maximum of two decimal
positions.
3) Use the CLASS statement to group the data by Year and Name.
4) Use Weather Statistics by Year and Park as the report title. Run the program
and review the results.
Level 2
5. Creating an Output Table with Custom Columns
The pg1.np_westweather table contains weather-related information for four national parks:
Death Valley National Park, Grand Canyon National Park, Yellowstone National Park,
and Zion National Park. Use the MEANS procedure to analyze the data in this table.
a. Create a new program. Write a PROC MEANS step to analyze rows from
pg1.np_westweather where values for Precip are not equal to zero. Analyze precipitation
amounts grouped by Name and Year. Create only an output table, named rainstats, with
columns for the N and SUM statistics. Name the columns RainDays and TotalRain
respectively. Keep only those rows that are the combination of Year and Name.
b. Write a PROC PRINT step to print the rainstats table. Suppress the printing of observation
numbers, and display column labels. Display the columns in the following order: Name, Year,
RainDays, and TotalRain. Label Name as Park Name, RainDays as Number of Days
Raining, and TotalRain as Total Rain Amount (inches). Use Rain Statistics by Year and
Park as the report title.
c. Run the program and review the results.
Challenge
6. Identifying the Top Three Extreme Values with the Output Statistics
a. Create a new program. Write a PROC MEANS step to analyze rows from pg1.np_multiyr
and create a table named top3parks with the following attributes:
1) Suppress the display of the PROC MEANS report.
2) Analyze Visitors grouped by Region and Year.
3) Drop the _FREQ_ and _TYPE_ columns from top3parks and keep only rows that are
a result of a combination of Region and Year.
4) Create a column for TotalVisitors in the output table.
5) Include in the output table the top three parks in terms of the number of visitors.
Automatically resolve conflicts in the column names when names are assigned
to the new columns in the output table.
Note: Use SAS Help to learn about the IDGROUP option in the OUTPUT statement.
5.4 Solutions
Solutions to Practices
1. Creating One-Way Frequency Reports
/*part a*/
title1 "Categories of Reported Species";
proc freq data=pg1.np_species order=freq;
tables Category / nocum;
run;
/*part b*/
ods graphics on;
ods noproctitle;
title1 "Categories of Reported Species";
title2 "in the Everglades";
proc freq data=pg1.np_species order=freq;
tables Category / nocum plots=freqplot;
where Species_ID like "EVER%" and
Category ne "Vascular Plant";
run;
title;
2. Creating Two-Way Frequency Reports
What are the top three park types based on total frequency?
National Historic Site, National Monument, and National Park
/*part a, b*/
title1 'Park Types by Region';
proc freq data=pg1.np_codelookup order=freq;
tables Type*Region / nocol;
where Type not like '%Other%';
run;
/*part c*/
title1 'Selected Park Types by Region';
ods graphics on;
proc freq data=pg1.np_codelookup order=freq;
tables Type*Region / nocol crosslist
plots=freqplot(groupby=row scale=grouppercent
orient=horizontal);
where Type in ('National Historic Site', 'National Monument',
'National Park');
run;
title;
3. Creating a Customized Graph of a Two-Way Frequency Table
/*part b */
title1 'Counts of Selected Park Types by Park Region';
ods graphics on;
proc freq data=pg1.np_codelookup order=freq noprint;
tables Type*Region / out=park_freq;
where Type in ('National Historic Site', 'National Monument',
'National Park');
run;
/*part c*/
proc sgplot data=pg1.np_codelookup;
where Type in ('National Historic Site', 'National Monument',
'National Park');
hbar region / group=type;
keylegend / opaque across=1 position=bottomright
location=inside;
xaxis grid;
run;
/*part d*/
proc sgplot data=pg1.np_codelookup;
where Type in ('National Historic Site', 'National Monument',
'National Park');
hbar region / group=type seglabel
fillattrs=(transparency=0.5) dataskin=crisp;
keylegend / opaque across=1 position=bottomright
location=inside;
xaxis grid;
run;
title;
4. Producing a Descriptive Statistic Report
title1 'Weather Statistics by Year and Park';
proc means data=pg1.np_westweather mean min max maxdec=2;
var Precip Snow TempMin TempMax;
class Year Name;
run;
5.03 Activity – Correct Answer
1. Modify the LABEL statement in the DATA step to label the Invoice column
as Invoice Price.
data cars_update;
    set sashelp.cars;
    keep Make Model MSRP Invoice AvgMPG;
    AvgMPG=mean(MPG_Highway, MPG_City);
    label MSRP="Manufacturer Suggested Retail Price"
          AvgMPG="Average Miles per Gallon"
          Invoice="Invoice Price";
run;
5.06 Activity – Correct Answer
1. Run the PROC MEANS step and compare the report and the wind_stats
table. Are the same statistics in the report and table? What do the first
five rows in the table represent?
The statistics are different. The first five rows in the table summarize the
entire input table.
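The activity program is not reproduced in these notes. As a rough sketch of why the first rows of the output table are overall summaries, assume a step like the one below. Basin as the CLASS column and MaxWindMPH as the analysis column are assumptions, not the activity's actual code. Because the OUTPUT statement lists no statistics, the table receives the default statistics (N, MIN, MAX, MEAN, and STD), one per row, regardless of the statistics requested for the report.

```sas
/* Assumed columns: Basin (CLASS) and MaxWindMPH (VAR) */
proc means data=pg1.storm_final mean median max;
    class Basin;
    var MaxWindMPH;
    output out=wind_stats;   /* no statistics listed: N, MIN, MAX, MEAN, STD are written */
run;

/* Look at the first five rows of the output table */
proc print data=wind_stats(obs=5);
run;
```

In the resulting table, rows where _TYPE_ is 0 summarize the entire input table, and rows where _TYPE_ is 1 summarize each value of the CLASS column.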
View SAS documentation for more options to customize the output table.
Lesson 6 Exporting Results
6.1 Exporting Data
Demonstration: Exporting Data to an Excel Workbook
[Figure: the SAS programming process: Access data, Explore data, Prepare data, Analyze and report on data, Export results.]
You have clean data and accurate, interesting reports. Now you need to share what you created with
others. You realize that not everyone who needs access to your results uses SAS, so you need
methods to save the data and reports in formats that are easy to view.
If you want to export data using a manual process, each of the SAS programming environments
includes point-and-click tools for exporting data to various delimited text formats, such as comma-
separated values (CSV), tab-delimited values (TAB) and space-delimited (DLM) files.
• In Enterprise Guide, you can start this process by selecting Share Output Data from the
toolbar.
• In SAS Studio, you can right-click a table in the Library panel and select Export.
You can also export data programmatically. By writing a program to export data, you can integrate
the export into your overall program and automate the final export step.
PROC EXPORT can export a SAS table to a variety of external formats.
The DATA= option specifies the data source. The OUTFILE= option specifies the fully qualified path
and file name of the exported data file. The DBMS= option tells SAS how to format the output.
Here are common DBMS identifiers that are included with Base SAS:
• CSV – comma-separated values.
• JMP – JMP files, JMP 7 or later.
• TAB – tab-delimited values.
• DLM – delimited files. The default delimiter is a space. To use a different delimiter,
use the DELIMITER= statement.
Here are additional DBMS identifiers that are included with SAS/ACCESS Interface to PC Files:
• XLSX – Microsoft Excel 2007, 2010, and later
• ACCESS – Microsoft Access 2000 and later
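As a minimal sketch of how these options fit together, the step below writes a SAS table to a pipe-delimited text file. The input table sashelp.class, the placeholder value of the outpath macro variable, the file name class.txt, and the REPLACE option are assumptions chosen for illustration. They are not part of the course files.

```sas
/* Assumption: replace this placeholder with a folder that SAS can write to */
%let outpath=/path/to/output;

proc export data=sashelp.class            /* table to export */
            outfile="&outpath/class.txt"  /* full path and file name */
            dbms=dlm                      /* delimited text file */
            replace;                      /* overwrite the file if it exists */
    delimiter='|';                        /* pipe instead of the default space */
run;
```

Changing DBMS=DLM to DBMS=CSV or DBMS=TAB, and dropping the DELIMITER= statement, produces a comma-separated or tab-delimited file instead.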
In this code example, PROC EXPORT creates a tab-delimited text file that has column names in the
first row of the file. Remember that the path in the OUTFILE= option must be relative to the location
of SAS. In other words, if SAS is running on a server, the path must be accessible from the server
location.
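The slide code is not reproduced in these notes. A step of the kind described might look like the following sketch, which assumes that the outpath macro variable already holds a folder that SAS can reach. The input table sashelp.cars and the file name cars.txt are illustrative choices.

```sas
/* Assumes outpath was defined earlier, for example with a %LET statement */
proc export data=sashelp.cars
            outfile="&outpath/cars.txt"
            dbms=tab       /* tab-delimited output */
            replace;       /* overwrite an existing file */
run;
```

Column names are written to the first row by default, so no extra option is needed for the header row.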
If SAS Studio or Enterprise Guide is configured to connect to SAS on a remote server, both
interfaces provide a method to download files from the remote server to your local machine.
• SAS Studio – Select the file in the Files and Folders section of the navigation pane and click
Download.
• Enterprise Guide – Click Open a task and select Browse > Data > Copy Files.
Copyright © 2020, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
6-6 Lesson 6 Exporting Results
6.01 Activity
1. Open the libname.sas program in the course files folder.
2. Create a macro variable named outpath that stores the location
of the output folder in your course files location.
3. Run the code and save the program.
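A macro variable of this kind can be created with a %LET statement. The sketch below uses a hypothetical folder path. Substitute the output folder from your own course files.

```sas
/* Hypothetical path: replace with the output folder in your course files */
%let outpath=/home/student/EPG1V2/output;

/* Write the name and value of the macro variable to the log */
%put &=outpath;
```

When you reference the macro variable in a path, enclose the path in double quotation marks, as in "&outpath/storm_final.csv". Macro variables are not resolved inside single quotation marks.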
6.02 Activity
Open p106a02.sas from the activities folder and perform the following tasks:
1. Complete the PROC EXPORT step to read the pg1.storm_final SAS table
and create a comma-delimited file named storm_final.csv. Use &outpath
to substitute the path of the output folder.
2. Run the program and view the text file:
Another easy way to export data is to use a SAS/ACCESS Interface LIBNAME engine. You simply
create the data in the desired format directly from a SAS process. For example, a DATA step or
a procedure's OUTPUT statement can write results directly to the target data source. You do not
have to create a SAS table first and then export it in a separate step. Of course, you need Write
permission to the target destination.
For example, this program uses the XLSX engine from SAS/ACCESS Interface to PC Files to
define a library that points to an Excel workbook named cars. The DATA step references the library
and an output worksheet named asiacars. The code extracts data about cars manufactured in Asia
from sashelp.cars and writes the result directly into the worksheet asiacars.
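The program itself is not reproduced in these notes. A minimal sketch of what it describes could look like this, assuming the libref xlout, the file name cars.xlsx, and an outpath macro variable that points to a writable folder.

```sas
/* Assumed libref (xlout) and file name (cars.xlsx) */
libname xlout xlsx "&outpath/cars.xlsx";

/* The table name becomes the worksheet name */
data xlout.asiacars;
    set sashelp.cars;
    where Origin='Asia';
run;

/* Release the workbook so that other applications can open it */
libname xlout clear;
```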
Files
• p106d01.sas
• storm_final – a SAS table that contains one row per storm for the 1980 through 2017 storm
seasons. The data was cleaned and prepared previously using the DATA step.
Syntax
Notes
• The XLSX engine requires a license for SAS/ACCESS Interface to PC Files.
• The XLSX engine can read and write data in Excel files.
• To write data to a new or existing Excel workbook, use the LIBNAME statement to assign a libref
that points to the Excel file. Use the libref when you name output tables. The table name is the
worksheet label in the Excel file.
Demo
1. Open p106d01.sas from the demos folder and find the Demo section of the program. Examine
the DATA and PROC MEANS steps and identify the temporary SAS tables that will be created.
Highlight the demo program and run the selected code.
2. Add a LIBNAME statement to create a library named xlout that points to an Excel file named
southpacific.xlsx in the output folder of the course data.
Note: Use the outpath macro variable to substitute the path of the output folder. If you did not
define the outpath macro variable, run the libname.sas program that was completed in
Activity 6.01.
libname xlout xlsx "&outpath/southpacific.xlsx";
3. Modify the DATA and PROC steps to write output tables to the xlout library.
libname xlout xlsx "&outpath/southpacific.xlsx";
data xlout.South_Pacific;
set pg1.storm_final;
where Basin="SP";
run;
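The listing above shows only the DATA step. As a sketch of how a PROC MEANS step can write to the same library, the step below sends a summary table to a second worksheet. The worksheet name Season_Stats and the output column names are hypothetical, and the demo program's own PROC MEANS step might differ.

```sas
/* Hypothetical worksheet and column names; assumes the xlout library is assigned */
proc means data=pg1.storm_final noprint;
    where Basin="SP";
    class Season;
    var MaxWindMPH;
    ways 1;                        /* one summary row per season */
    output out=xlout.Season_Stats
           n=Count mean=AvgMaxWind max=StrongestWind;
run;

/* Clear the library to release the workbook */
libname xlout clear;
```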
6.03 Activity
Open p106a03.sas from the activities folder and perform the following tasks:
1. Complete the LIBNAME statement using the XLSX engine to create
an Excel workbook named storm.xlsx in the output folder.
2. Modify the DATA step to write the storm_final table to the storm.xlsx file.
3. After the DATA step, write a statement to clear the library.
4. Run the program and view the log to confirm that storm.xlsx was exported
with 3092 rows.
5. If possible, open the storm.xlsx file. How do dates appear in the
storm_final worksheet?
6.2 Exporting Reports
SAS provides the Output Delivery System (ODS) to create customized output in a variety of formats.
In SAS, procedures that generate reports actually generate output objects. These can easily be
rendered in one or more output formats that are designed to be viewed in SAS or in other software
applications. In ODS terminology, each of these formats is called a destination. Some ODS
destinations produce very simple output files, such as text files that follow the comma-separated
values (CSV) convention. Others produce complex output files that are designed to be viewed and
manipulated using external software applications. Common destinations of this type include Microsoft
Excel (XLSX), Microsoft Word (RTF), Microsoft PowerPoint (PPTX), and Adobe PDF. Many other
destinations are available in SAS.
Directing output to these destinations is like making a sandwich. The SAS procedure code that
creates the output is the “filling” for the sandwich, and the ODS statements that precede and follow
the procedure code are the “bread” that makes the output easy to consume outside of SAS. Here are
some common destinations:
• EXCEL
• CSVALL (comma-delimited text file)
• RTF (Rich Text Format for viewing in word processors such as Microsoft Word)
• POWERPOINT
• HTML
• PDF
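The sandwich pattern can be sketched with the CSVALL destination. The input table sashelp.cars, the file name cars.csv, and the outpath macro variable are assumptions for illustration.

```sas
/* Top slice: open the destination and name the output file */
ods csvall file="&outpath/cars.csv";

/* Filling: any procedure that produces a report */
proc print data=sashelp.cars noobs;
    var Make Model Type MSRP;
    where Origin='Europe';
run;

/* Bottom slice: close the destination so that the file is written */
ods csvall close;
```

The same three-part structure applies to the other destinations. Only the destination keyword and its options change.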
The ODS EXCEL destination provides an enormous amount of flexibility. You can specify a style for
the output by using the STYLE= option. There are many different styles that are built in to SAS. You
can list additional options in the ODS statement by using the OPTIONS keyword and enclosing
option-value pairs in parentheses. The SHEET_NAME= option customizes the tab names in the
workbook.
Note: ODS Excel was experimental in SAS 9.4M1 and M2. It is fully supported in SAS 9.4M3 and
later releases.
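As a small sketch of these options, the program below writes two procedure results to separate, named worksheets. The file name cars.xlsx, the sheet names, the choice of the sasdocprinter style, and the outpath macro variable are assumptions for illustration.

```sas
/* Open the destination, pick a style, and name the first worksheet */
ods excel file="&outpath/cars.xlsx" style=sasdocprinter
          options(sheet_name='MSRP Summary');

proc means data=sashelp.cars mean min max maxdec=0;
    class Origin;
    var MSRP;
run;

/* Change the sheet name before the next procedure runs */
ods excel options(sheet_name='Type by Origin');

proc freq data=sashelp.cars;
    tables Type*Origin / nocol norow nopercent;
run;

ods excel close;
```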
Files
• p106d02.sas
• storm_final – a SAS table that contains one row per storm for the 1980 through 2017 storm
seasons. The data was cleaned and prepared previously using the DATA step.
Syntax
Notes
• The ODS EXCEL destination creates an XLSX file.
• By default, each procedure output is written to a separate worksheet with a default worksheet
name. The default style is also applied.
• Use the STYLE= option in the ODS EXCEL statement to apply a different style.
• Use the OPTIONS(SHEET_NAME='label') option in the ODS EXCEL statement to provide
a custom label for each worksheet.
Demo
1. Open p106d02.sas from the demos folder and find the Demo section in the program. Add an
ODS statement to create an Excel file named wind.xlsx in the output folder of the course files.
Close the Excel destination at the end of the program. Highlight the demo program and run the
selected code.
Note: Use the outpath macro variable to substitute the path of the output folder. If you did not
define the outpath macro variable, run the libname.sas program that was completed in
Activity 6.01.
Note: If you are using Enterprise Guide 8.1 or later, you receive a warning in the log. By
default, Enterprise Guide uses the graph format Default, which allows the Output Delivery
System (ODS) to decide on the best graph format. To adjust the default settings, go to
Tools > Results > Graphs and change the graph format. You can also use the statement
GOPTIONS DEV=PNG before the ODS statement.
6.04 Activity
Open p106a04.sas from the activities folder and perform the following tasks:
1. Add ODS statements to create an Excel file named pressure.xlsx
in the output folder. Be sure to close the ODS destination at the end
of the program. Run the program and open the Excel file.
• SAS Studio: Navigate to the output folder in the Files and Folders section
of the navigation pane. Select pressure.xlsx and click Download.
• Enterprise Guide: Click the Results tab. Then, under Open with Default
Application, double-click the Excel icon.
The Output Delivery System also enables you to export reports to common formats that you use in
everyday business, such as PowerPoint, by using the POWERPOINT destination, and Microsoft
Word, by using the RTF destination. Rich Text Format (RTF) is a software-neutral file type that
is made for word processing programs such as Microsoft Word. Each of these destinations has its
own options that you can use to customize your output.
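The sketch below sends one report to the POWERPOINT destination and two reports to the RTF destination. The file names, the title text, the powerpointdark style, and the outpath macro variable are assumptions for illustration.

```sas
/* One procedure result on a PowerPoint slide */
ods powerpoint file="&outpath/cars.pptx" style=powerpointdark;
title "Average MSRP by Origin";
proc means data=sashelp.cars mean maxdec=0;
    class Origin;
    var MSRP;
run;
ods powerpoint close;

/* Two procedure results in one RTF document without a page break between them */
ods rtf file="&outpath/cars.rtf" startpage=no;
proc means data=sashelp.cars mean maxdec=0;
    class Origin;
    var MSRP;
run;
proc freq data=sashelp.cars;
    tables Origin;
run;
ods rtf close;
title;
```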
6.05 Activity
Open p106a05.sas from the activities folder and perform the following tasks:
1. Run the program and open the pressure.pptx file.
2. Modify the ODS statements to change the output destination to RTF.
Change the style to sapphire.
3. Add the STARTPAGE=NO option in the first ODS RTF statement
to eliminate a page break between the procedure results.
4. Rerun the program and open the pressure.rtf file.
Finally, let’s look at the Portable Document Format (PDF) destination. PDF files are used extensively
for reporting because the layout can be precisely controlled, and you can guarantee that the
document will look just as you intended it to when the receiver opens it.
In SAS ODS, PDF is one of the PRINTER destinations, meaning that you have a lot of programmatic
control over the document’s appearance. You can use the PDFTOC= option to control the level of
bookmarks that are open. You can use the ODS PROCLABEL statement to label the bookmark for
the procedure.
Files
• p106d03.sas
• storm_final – a SAS table that contains one row per storm for the 1980 through 2017 storm
seasons. The data was cleaned and prepared previously using the DATA step.
Syntax
Notes
• The ODS PDF destination creates a PDF file.
• The PDFTOC=n option controls the level of the expansion of the table of contents in PDF
documents.
• The ODS PROCLABEL statement enables you to change a procedure label.
Demo
1. Open p106d03.sas from the demos folder and find the Demo section of the program. Run the
program and open the PDF file to examine the results. Notice that bookmarks are created, and
they are linked to each procedure’s output.
Note: Use the outpath macro variable to substitute the path of the output folder. If you did not
define the outpath macro variable, run the libname.sas program that was completed in
Activity 6.01.
2. Add the STARTPAGE=NO option to eliminate page breaks between procedures. Add the
STYLE=JOURNAL option.
ods pdf file="&outpath/wind.pdf" startpage=no style=journal;
3. To customize the PDF bookmarks, add the PDFTOC=1 option to ensure that bookmarks are
expanded only one level when the PDF is opened. To customize the bookmark labels, add the
ODS PROCLABEL statement before each PROC with custom text. Run the program and open
the PDF file.
ods pdf file="&outpath/wind.pdf" startpage=no style=journal
pdftoc=1;
ods noproctitle;
ods proctitle;
ods pdf close;
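The listing above shows only the ODS statements. As a sketch of where the ODS PROCLABEL statements fit, the program below places one before each procedure. The procedure steps, bookmark text, and titles are assumptions, and the steps in the demo program might differ.

```sas
ods pdf file="&outpath/wind.pdf" startpage=no style=journal
        pdftoc=1;
ods noproctitle;

/* Bookmark label for the next procedure (hypothetical text) */
ods proclabel "Wind Statistics";
title "Wind Statistics by Basin";
proc means data=pg1.storm_final min mean median max maxdec=0;
    class Basin;
    var MaxWindMPH;
run;

/* Bookmark label for the graph (hypothetical text) */
ods proclabel "Wind Distribution";
title "Distribution of Maximum Wind";
proc sgplot data=pg1.storm_final;
    histogram MaxWindMPH;
    density MaxWindMPH;
run;
title;

ods proctitle;
ods pdf close;
```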
Links
• Take the Exporting SAS Data Sets and Creating ODS Files for Microsoft Excel course.
• View the following Help pages:
– Base SAS EXPORT Procedure
– SAS Output Delivery System: User’s Guide
– SAS/ACCESS Interface to PC Files: Reference
• Take the SAS Report Writing 1: Essentials course.
• Explore the SAS Output Delivery System resource page.
Practice
If you restarted your SAS session, open and submit the libname.sas program in the course files.
Level 1
1. Creating an Excel File Using ODS EXCEL
Create an Excel workbook named StormStats.xlsx that includes the results of SAS procedures.
Customize the names of the Excel worksheets.
a. Open p106p01.sas from the practices folder. Before the PROC MEANS step, add an ODS
EXCEL statement to do the following:
1) Write the output file to "&outpath/StormStats.xlsx".
Note: If you did not define the outpath macro variable, run the libname.sas program
that was completed in Activity 6.01.
2) Set the style for the Excel file to snow.
3) Set the sheet name for the first tab to South Pacific Summary.
b. Turn off the procedure titles and report titles at the start of the program. Turn the procedure
titles on at the end of the program.
c. Immediately before the PROC PRINT step, add an ODS EXCEL statement to set the sheet
name to Detail.
d. At the end of the program, add an ODS EXCEL statement to close the Excel destination.
e. Submit the program. If possible, open the StormStats.xlsx workbook in Excel.
Level 2
2. Creating a Word Document with ODS RTF
Generate an RTF file that can be opened in Microsoft Word. The file should include the results
of three procedures and use different styles to change the appearance.
a. Open p106p02.sas from the practices folder. Modify the program to write the output file
to &outpath/ParkReport.rtf. Set the style for the output file to Journal and remove page
breaks between procedure results. Suppress the printing of procedure titles.
Note: If you did not define the outpath macro variable, run the libname.sas program that
was completed in Activity 6.01.
b. Run the program. Open the output file in Microsoft Word. Notice that the Journal style is
applied to the results, but the graph is now grayscale instead of color. Also notice that the
date and time that the program ran are printed in the upper right corner of the page. Close
Microsoft Word.
c. Modify your SAS program so that both tables are created using the Journal style, but the
graph is created using the SASDOCPRINTER style.
Note: An ODS destination statement enables you to specify a style without requiring you
to redefine the output file location.
d. Add an OPTIONS statement with the NODATE option at the beginning of the program
to suppress the date and time in the RTF file. Restore the option for future submissions
by adding an OPTIONS statement with the DATE option at the end of the program.
e. Run the program. Open the new output file using Microsoft Word. Ensure that the style for
both tables is the same, but that the graph is now displayed in color. Close the report.
Challenge
3. Creating a Landscape Report with ODS PDF
Generate a PDF document in landscape orientation. Print a report and map side by side.
a. Open p106p03.sas from the practices folder. Run the program and examine the output.
The program produces a table and map for North Atlantic region storms in the 2016 season.
b. Modify the program to produce a PDF file named StormSummary.pdf in the output folder
in the course files. Set the output style to Journal.
c. Use SAS Help to find a SAS system option that changes the page layout to landscape.
d. Use SAS Help to learn about the ODS LAYOUT GRIDDED statement as a way that you can
control the layout of multiple result objects. Force the results to be arranged in one row and
two columns.
e. Reset the system option at the end of the program so that future results have a portrait
layout.
f. Run the program and open the StormSummary.pdf file to confirm the results.
Note: SAS Studio generates a warning in the log because the wrapper code is creating
an RTF file behind the scenes. LAYOUT is not supported in RTF. The warning can
be ignored because it does not impact the PDF results.
6.3 Solutions
Solutions to Practices
1. Creating an Excel File Using ODS EXCEL
ods excel file="&outpath/StormStats.xlsx"
          style=snow
          options(sheet_name='South Pacific Summary');
ods noproctitle;
title;

proc means data=pg1.storm_detail maxdec=0 median max;
    class Season;
    var Wind;
    where Basin='SP' and Season in (2014,2015,2016);
run;

/* Name the worksheet that receives the PROC PRINT output */
ods excel options(sheet_name='Detail');

proc print data=pg1.storm_detail noobs;
    where Basin='SP' and Season in (2014,2015,2016);
    by Season;
run;

ods excel close;
ods proctitle;
2. Creating a Word Document with ODS RTF
ods rtf file="&outpath/ParkReport.rtf" style=Journal startpage=no;
ods noproctitle;
options nodate;
title "US National Park Regional Usage Summary";
proc freq data=pg1.np_final;
tables Region / nocum;
run;
proc means data=pg1.np_final mean median max nonobs maxdec=0;
class Region;
var DayVisits Campers;
run;
ods region;
proc print data=pg1.storm_final noobs;
var name StartDate MaxWindMPH StormLength;
where Basin="NA" and Season=2016;
format StartDate monyy7.;
run;
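The listing for the landscape PDF practice is incomplete in these notes. As a rough sketch of the overall structure, the program below arranges two results in one row and two columns. The bar chart is only a stand-in for the map, which is not reproduced here, and the outpath macro variable is assumed to be defined.

```sas
options orientation=landscape;
ods pdf file="&outpath/StormSummary.pdf" style=Journal;

/* One row, two columns; each ODS REGION statement starts the next cell */
ods layout gridded columns=2 rows=1;

ods region;
/* Stand-in for the map: a bar chart of maximum wind by storm */
proc sgplot data=pg1.storm_final;
    hbar Name / response=MaxWindMPH;
    where Basin="NA" and Season=2016;
run;

ods region;
proc print data=pg1.storm_final noobs;
    var Name StartDate MaxWindMPH StormLength;
    where Basin="NA" and Season=2016;
    format StartDate monyy7.;
run;

ods layout end;
ods pdf close;

/* Restore portrait layout for future results */
options orientation=portrait;
```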
The STARTPAGE= option controls page breaks in the file.
Lesson 7 Using SQL in SAS®
7.1 Using Structured Query Language (SQL) in SAS
Demonstration: Reading and Filtering Data with SQL
[Figure: SAS shown with other languages and APIs: Python, REST, SQL, Java, R, and Lua. In addition to working with other types of data, SAS also enables you to use other programming languages and APIs.]
One of the great strengths of SAS has always been the ability to integrate with other types of data.
We have seen in this course how SAS integrates with Excel and other Microsoft Office products. You
can also read and write data from many other databases that are not part of SAS, including Oracle
and Hadoop.
In addition to enabling you to use data from other sources, SAS also supports other common
programming languages and APIs. You can take advantage of your knowledge and the strengths of
these other languages in the code that you submit in the SAS Platform.
To learn more about how these languages and APIs can be integrated on the SAS Platform,
visit http://developer.sas.com.
[Figure: the SAS programming process (Access data, Explore data, Prepare data, Analyze and report on data, Export results), annotated with Structured Query Language (SQL).]
Structured Query Language (SQL) is a common language that is used by many programmers in a
wide variety of software. SAS enables you to write SQL code as part of a SAS program. It is likely
that you will encounter SQL as you progress as a SAS programmer, so it is important to understand
how SQL can be a beneficial tool and how it compares to the SAS code that you have written so far.
SQL is available in Base SAS. Because SQL is a separate language, it is implemented in SAS as
a procedure. Many programmers who are new to SAS have prior experience with SQL, so it
provides an easy, familiar entry point for programming on the SAS Platform.
There are two procedures to choose from for executing SQL in Base SAS: PROC SQL and PROC
FedSQL. Each has different extensions and strengths. PROC SQL is more tightly integrated with the
SAS System and has several unique extensions that are useful when processing on the SAS
Platform. PROC FedSQL is written to a more recent ANSI SQL standard and is more ANSI
compliant, which means that it has fewer SAS extensions. Because PROC SQL has been available
longer, it is more commonly encountered in existing SAS code, so PROC SQL was chosen for
executing SQL programs in this class.
The PROC SQL statement invokes the SQL language processor, and subsequent statements are
interpreted and executed as SQL until a QUIT statement is encountered.
SELECT is the most commonly used SQL statement and is usually referred to as a query. A query
consists of clauses that describe the desired result. At a minimum, a query must specify a list of
column names to retrieve in the SELECT clause and the name of the table that contains the columns
in the FROM clause. By default, an SQL query creates a report.
Note: Each SQL statement executes immediately and independently.
proc sql;
select Name, Age, Height, Birthdate format=date9.
from pg1.class_birthdate;
quit;
This simple query selects columns from the class_birthdate table and generates a report. The
SELECT clause specifies the columns that you want to appear in the result, and the FROM clause
specifies the table containing the source data. Notice that lists, such as column names, are always
separated with commas. Also note the syntax applying a format to the Birthdate column. Although
this is not standard SQL syntax, this SAS extension to the SQL language makes it easier to create
more useful and polished reports.
proc sql;
select Name, Age, Height*2.54 as HeightCM format=5.1,
Birthdate format=date9.
from pg1.class_birthdate;
quit;
7.01 Activity
Open p107a01.sas from the activities folder.
1. What are the similarities and differences in the syntax of the two steps?
2. Run the program. What are the similarities and differences in the results?
proc sql;
select Name, Age, Height*2.54 as HeightCM format=5.1,
Birthdate format=date9.
from pg1.class_birthdate;
quit;
WHERE expression
proc sql;
select Name, Age, Height, Birthdate format=date9.
from pg1.class_birthdate
where age > 14;
quit;
The WHERE clause is used to subset rows in the query. The same WHERE syntax that worked in
other SAS procedures and the DATA step works in SQL too. However, remember that the WHERE
expression is not a separate statement in SQL, but instead it is a clause added to the SELECT
statement. Only those rows from the input table that meet the criterion provided are included in the
result.
proc sql;
select Name, Age, Height, Birthdate format=date9.
from pg1.class_birthdate
where age > 14
order by Height desc;
quit;
The default sort order is ascending.
In traditional SAS syntax, if you want a report produced in a particular order, you must perform two
separate steps. First sort the data, and then execute a reporting procedure. In SQL, you can do it all
in one query by adding an ORDER BY clause that describes the order in which you want the results
arranged. If you want the rows ordered with the tallest person listed first (descending order),
add the DESC keyword after the column name in the ORDER BY clause.
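For comparison, a two-step version of the same report in traditional SAS syntax might look like the sketch below. The output table name class_sorted is an assumption.

```sas
/* Step 1: filter and sort into a temporary table (hypothetical name) */
proc sort data=pg1.class_birthdate out=class_sorted;
    where Age > 14;
    by descending Height;
run;

/* Step 2: print the sorted rows */
proc print data=class_sorted noobs;
    var Name Age Height Birthdate;
    format Birthdate date9.;
run;
```

The SQL query produces a similar listing in a single step and does not create an intermediate table.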
Files
• p107d01.sas
• storm_final - a SAS table that contains one row per storm for the 1980 through 2017 storm
seasons. The data was cleaned and prepared previously using the DATA step.
Syntax
PROC SQL;
SELECT col-name, col-name FORMAT=fmt
FROM input-table
WHERE expression
ORDER BY col-name <DESC>;
QUIT;
Notes
• PROC SQL creates a report by default.
• The SELECT statement describes the query. After the SELECT keyword, list columns to include in
the results, separated by commas.
• Computed columns can be included in the SELECT clause.
• The FROM clause lists one or more input tables.
• The ORDER BY clause arranges rows based on the listed columns. The default order is
ascending. Use DESC after a column name to reverse the sort sequence.
• PROC SQL ends with a QUIT statement.
Demo
1. Open p107d01.sas from the demos folder and find the Demo section of the program. Add a
SELECT statement to retrieve all columns from pg1.storm_final. Highlight the step and run the
selected code. Examine the log and results.
proc sql;
select *
from pg1.storm_final;
quit;
2. Modify the query to retrieve only the Season, Name, StartDate, and MaxWindMPH columns.
Format StartDate with MMDDYY10. Highlight the step and run the selected code.
proc sql;
select Season, Name, StartDate format=mmddyy10., MaxWindMPH
from pg1.storm_final;
quit;
3. Modify Name in the SELECT clause to convert the values to proper case.
proc sql;
select Season, propcase(Name) as Name,
StartDate format=mmddyy10., MaxWindMPH
from pg1.storm_final;
quit;
4. Add a WHERE clause to include storms during or after the 2000 season with MaxWindMPH
greater than 156.
5. Add an ORDER BY clause to arrange rows by descending MaxWindMPH, and then by Name.
6. Add TITLE statements to describe the report. Highlight the step and run the selected code.
title "International Storms since 2000";
title2 "Category 5 (Wind>156)";
proc sql;
select Season, propcase(Name) as Name,
StartDate format=mmddyy10., MaxWindMPH
from pg1.storm_final
where MaxWindMPH > 156 and Season >= 2000
order by MaxWindMPH desc, Name;
quit;
title;
7.02 Activity
Open p107a02.sas from the activities folder and perform the following tasks:
1. Complete the SQL query to display Event and Cost from
pg1.storm_damage. Format the values of Cost.
2. Add a new column named Season that extracts the year from Date.
3. Add a WHERE clause to return rows where Cost is greater than 25 billion.
4. Add an ORDER BY clause to arrange rows by descending Cost.
Which storm had the highest cost?
PROC SQL;
SELECT col-name, col-name <FORMAT=fmt.>, expression AS col-name
FROM input-table
WHERE expression
ORDER BY col-name <DESC>;
QUIT;
proc sql;
create table work.myclass as
select Name, Age, Height
from pg1.class_birthdate
where age > 14
order by Height desc;
quit;
Adding CREATE TABLE at the beginning of the query turns a report into a table.
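Because CREATE TABLE replaces the report with a table, a second query is one way to look at the result. The sketch below assumes the pg1.class_birthdate table from the slide and runs both statements in the same PROC SQL step.

```sas
proc sql;
/* No report is produced here. SAS writes a note to the log */
/* stating that the table was created.                      */
create table work.myclass as
    select Name, Age, Height
        from pg1.class_birthdate
        where age > 14
        order by Height desc;

/* A second query in the same step reports on the new table. */
select *
    from work.myclass;
quit;
```

A single PROC SQL step can contain several statements. Each one runs when its closing semicolon is reached, and the step ends at QUIT.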
proc sql;
drop table work.myclass;
quit;
This is helpful if you are working with DBMS tables that don’t allow you to overwrite existing tables.
For those writing SQL code for SAS to process in other database environments, you might need to
drop or delete a table before updating it. If you have appropriate permission to make such changes
within the database, you can use the DROP TABLE statement.
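The drop-then-create pattern can be sketched as follows. This example uses the work.myclass table from the slide so that it can run without a database connection. With a DBMS, the library reference would point to the database instead, which is an assumption not shown here.

```sas
proc sql;
/* Remove the existing table first ... */
drop table work.myclass;

/* ... and then create it again from the query. */
create table work.myclass as
    select Name, Age, Height
        from pg1.class_birthdate
        where age > 14;
quit;
```

If work.myclass does not exist when the step runs, expect a message in the log for the DROP TABLE statement. The CREATE TABLE statement still runs.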
7.2 Joining Tables Using SQL in SAS
[Slide: the inner join result, class_combine. Only students in both input tables are included.]
Joining tables is a very common requirement when working with data. There are multiple methods
available in SAS to join tables. The most common are SQL and the DATA step. In this course, we
introduce the SQL inner join. The SAS Programming 2: Data Manipulation Techniques course
addresses the DATA step merge.
In this example, we have information about students in the class_update table, and each student’s
assigned grade and teacher in the class_teachers table. Notice that the Name column is common
in both tables. We would like to join the tables so that all information for each student is contained in
a single result. An inner join will create a new report or table that includes students found in both
tables. Notice that David is in only class_update, and Carol is in only class_teachers, so they are
not included in the inner join result.
proc sql;
select Grade, Age, Teacher
from pg1.class_update inner join pg1.class_teachers
on class_update.Name = class_teachers.Name;
quit;
What is the syntax required to combine the matching rows from two tables? We can modify the
FROM clause to add INNER JOIN, followed by the second table.
proc sql;
select Grade, Age, Teacher
from pg1.class_update inner join pg1.class_teachers
on class_update.Name = class_teachers.Name;
quit;
The ON clause specifies the matching criteria.
Following the table names, this join syntax requires an ON clause to describe the criteria for
matching rows in the tables. Omitting the ON clause produces a syntax error.
The join in this example is a specific type of inner join, referred to as an equijoin, where only rows
with identical values in the Name column produce a match. The ON condition could also use other
comparison operators, such as greater than or less than.
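As a sketch of an ON condition that uses comparison operators other than equality, the following example matches each student to a height band. Both tables, all of their rows, and the names work.demo_heights and work.demo_bands are hypothetical and exist only for this illustration.

```sas
/* Hypothetical sample rows, for illustration only */
data work.demo_heights;
    input Name $ Height;
    datalines;
Alice 56.5
James 62.8
Carol 69.0
;
run;

data work.demo_bands;
    input Band $ MinHeight MaxHeight;
    datalines;
Short 0 60
Medium 60 66
Tall 66 100
;
run;

/* A row matches when Height falls inside the band's range */
proc sql;
select Name, Height, Band
    from work.demo_heights inner join work.demo_bands
    on demo_heights.Height >= demo_bands.MinHeight
       and demo_heights.Height < demo_bands.MaxHeight;
quit;
```

With these sample rows, each student falls into exactly one band: Alice in Short, James in Medium, and Carol in Tall.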
Although not illustrated in this course, outer joins enable you to include nonmatching rows in the
results. This is accomplished by changing the keyword INNER to FULL (all nonmatching rows from
both tables) or to LEFT or RIGHT (all rows from one table).
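A minimal sketch of a left join, assuming the same pg1.class_update and pg1.class_teachers tables, looks like this. Name is qualified with a table name because it exists in both tables.

```sas
proc sql;
/* LEFT JOIN keeps every row from the first (left) table */
select class_update.Name, Grade, Age, Teacher
    from pg1.class_update left join pg1.class_teachers
    on class_update.Name = class_teachers.Name;
quit;
```

Based on the tables described earlier, David would now appear in the results with missing values for Grade and Teacher, because he has no matching row in class_teachers. Carol would still be excluded, because she is only in the right table.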
proc sql;
select class_update.Name, Grade, Age, Teacher
from pg1.class_update inner join pg1.class_teachers
on class_update.Name = class_teachers.Name;
quit;
Because Name occurs in both tables, you must use the table prefix to indicate which column you want to select.
Note that the Name column is prefixed by one of the table names. This is known as qualifying the
column names, and it is necessary when you have columns with the same name from more than
one table. Qualifying the column name avoids creating an ambiguous column reference, where SAS
does not know which Name column to read.
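To try the inner join without the course data, the following self-contained sketch builds two small tables and joins them. The table names work.demo_update and work.demo_teachers and every data value are hypothetical. They mimic the situation described in the text, where David is in only one table and Carol is in only the other.

```sas
/* Hypothetical sample rows, for illustration only */
data work.demo_update;
    input Name $ Age;
    datalines;
Alice 13
David 12
James 12
;
run;

data work.demo_teachers;
    input Name $ Grade Teacher $;
    datalines;
Alice 7 Thomas
Carol 8 Evans
James 6 Evans
;
run;

proc sql;
/* Name is qualified because it exists in both tables */
select demo_update.Name, Grade, Age, Teacher
    from work.demo_update inner join work.demo_teachers
    on demo_update.Name = demo_teachers.Name;
quit;
```

The report contains rows only for Alice and James. Removing the demo_update. prefix from Name in the SELECT clause should reproduce the ambiguous column reference error.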
Files
• p107d02.sas
• storm_summary – a SAS table that contains one row per storm for the 1980 through 2016 storm
seasons
• storm_basincodes – a SAS table that includes each two-letter basin code and the corresponding
full basin name
Syntax
PROC SQL;
SELECT col-name, col-name
FROM input-table1 INNER JOIN input-table2
ON table1.col-name=table2.col-name;
QUIT;
Notes
• An SQL inner join combines matching rows between two tables.
• The two tables to be joined are listed in the FROM clause separated by INNER JOIN.
• The ON expression indicates how rows should be matched. The column names must be qualified
as table-name.col-name.
Demo
1. Open pg1.storm_summary and pg1.storm_basincodes and compare the columns. Identify
the matching column.
2. Open the p107d02.sas program in the demos folder and find the Demo section of the program.
Add pg1.storm_basincodes to the FROM clause to perform an inner join on Basin. Qualify the
Basin columns as table-name.col-name in the ON expression only.
3. Add the BasinName column to the query after Basin. Highlight the step, run the selected code,
and examine the log. Why does the program fail?
proc sql;
select Season, Name, Basin, BasinName, MaxWindMPH
from pg1.storm_summary inner join pg1.storm_basincodes
on storm_summary.basin=storm_basincodes.basin
order by Season desc, Name;
quit;
4. Modify the query to qualify the Basin column in the SELECT clause. Highlight the step and run
the selected code.
proc sql;
select Season, Name, storm_summary.Basin, BasinName, MaxWindMPH
from pg1.storm_summary inner join pg1.storm_basincodes
on storm_summary.basin=storm_basincodes.basin
order by Season desc, Name;
quit;
proc sql;
select u.Name, Grade, Age, Teacher
from pg1.class_update as u
inner join pg1.class_teachers as t
on u.Name=t.Name;
quit;
Typing the full table names to qualify columns can be tedious. SQL enables you to assign an alias
(or nickname) to a table in the FROM clause by adding the keyword AS and the alias of your choice.
Then you can use the alias in place of the full table name to qualify columns in the other clauses of a
query. In this example, the aliases for the two tables are the letters U and T.
7.03 Activity
Open p107a03.sas from the activities folder and perform the following tasks:
1. Define aliases for storm_summary and storm_basincodes in the FROM
clause.
2. Use one table alias to qualify Basin in the SELECT clause.
3. Complete the ON expression to match rows when Basin is equal in the
two tables. Use the table aliases to qualify Basin in the expression. Run
the step.
The DATA step and SQL each provide rich syntax designed to solve our data processing
requirements. But each has its own strengths, and therefore it is helpful to know both as well as the
situations in which one might be easier or more efficient than the other.
The DATA step provides very detailed and customizable control over how data is read, processed,
and written. It includes the ability to create multiple tables simultaneously in a single DATA step,
which requires reading the input table only once. It also includes syntax for creating loops and
processing data in arrays.
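As a sketch of the multiple-table capability, the DATA step below reads pg1.class_birthdate once and writes each row to one of two output tables. The table names work.teens and work.preteens and the age cutoff of 13 are assumptions made for this illustration.

```sas
data work.teens work.preteens;
    set pg1.class_birthdate;
    /* Route each row to exactly one output table */
    if Age >= 13 then output work.teens;
    /* Rows with a missing Age also land here */
    else output work.preteens;
run;
```

Producing the same two tables with SQL would take two CREATE TABLE queries, each reading the input table separately.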
SQL has the distinct advantage of being a standardized language that is used in most databases.
Some SQL syntax can be more streamlined than the equivalent statements in a DATA or PROC
step. And as we have seen, SQL can sometimes do in one query what can require multiple steps in
traditional SAS syntax, such as creating a report in sorted order.
Ultimately, it is a great benefit to know both native SAS syntax and SQL and use them when
appropriate in your SAS programs.
To learn more about the DATA step, take the SAS Programming 2: Data Manipulation Techniques
course. To learn more about SQL, take the SAS SQL 1: Essentials course. In both courses, we teach
how the DATA step or PROC SQL runs behind the scenes so that you can control the processing of
your data with appropriate syntax. This enables you to take advantage of the best features in each
approach.
Links
• Take the SAS SQL 1 course.
• Read PROC SQL by Example.
• Take the SAS SQL Methods and More course.
• Read Practical and Efficient SAS Programming.
• Take the DS2 Programming Essentials course.
• Read Mastering the SAS DS2 Procedure.
https://communities.sas.com/sas-training
7.3 Solutions
Solutions to Activities and Questions
7.03 Activity – Correct Answer
proc sql;
select Season, Name, s.Basin, BasinName, MaxWindMPH
from pg1.storm_summary as s
inner join pg1.storm_basincodes as b
on s.basin=b.basin
order by Season desc, Name;
quit;
The storm_summary table includes some lowercase Basin values. Are they in the results?
proc sql;
select Season, Name, s.Basin, BasinName, MaxWindMPH
from pg1.storm_summary as s
inner join pg1.storm_basincodes as b
on upcase(s.basin)=b.basin
order by Season desc, Name;
quit;
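One way to confirm which Basin values are affected is to list the values that change when they are converted to uppercase. This check is a sketch added for illustration and is not part of the activity solution. It assumes only the pg1.storm_summary table and uses the DISTINCT keyword, which is not covered in this lesson.

```sas
proc sql;
/* List each Basin value that is not already all uppercase */
select distinct Basin
    from pg1.storm_summary
    where Basin ne upcase(Basin);
quit;
```

Any value listed here does not match the uppercase codes in storm_basincodes unless UPCASE is applied in the ON expression.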