Managing Memory in SAS
I/O
Memory
CPU time
Definitions
Performance Statistics
are measurements of the total input and output operations (I/O), memory, and
CPU time used to process individual DATA or PROC steps. You can obtain these
statistics by using SAS system options, which help you measure your job's initial
performance and determine how to improve it.
System Performance
is measured by the overall amount of I/O, memory, and CPU time that your
system uses to process SAS programs. By using certain techniques and SAS
system options you can reduce or reallocate your usage of these three critical
resources to improve system performance. While you may not be able to take
advantage of every technique for every situation, you can choose the ones that
are best suited for a particular situation.
System Options
Options STIMER;

NOTE: DATA statement used:
      real time           1.16 seconds
      cpu time            0.09 seconds

Options FULLSTIMER;

NOTE: The SAS System used:
      real time                        0.16 seconds
      user cpu time                    0.01 seconds
      system cpu time                  0.02 seconds
      Memory                           1162k
      Page Faults                      0
      Page Reclaims                    2619
      Page Swaps                       0
      Voluntary Context Switches       81
      Involuntary Context Switches     6
      Block Input Operations           0
      Block Output Operations          0
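
The two options above can be switched on from within a program. The following is a minimal sketch, assuming the SASHELP.CLASS sample data set is available; the WORK data set names are hypothetical, and the statistics written to the log depend on your operating environment.

```sas
/* Brief statistics: real time and CPU time for each step */
options stimer;

data work.class_copy;
    set sashelp.class;
run;

/* Full statistics: memory, paging, and context-switch counts as well */
options fullstimer;

proc sort data=sashelp.class out=work.class_sorted;
    by age;
run;

/* Return to the brief statistics */
options nofullstimer stimer;
```

Compare the NOTE lines that follow each step in the SAS log to see what each option adds.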
Real time
represents the clock time it took to execute a job or step; it is heavily dependent on the
capacity of the system and the current load. As more users share a particular resource,
less of that resource is available to you.
CPU time
represents the actual processing time required by the CPU to execute the job, exclusive
of capacity and load factors. If you must wait longer for a resource, your CPU time will
not increase, but your real time will increase. It is not advisable to use real time as the
only criterion for the efficiency of your program because you cannot always control the
capacity and load demands on your system. A more accurate assessment of system
performance is CPU time, which decreases more predictably as you modify your
program to become more efficient.
Statistic                     Description
Real Time                     the amount of time spent to process the SAS job.
                              Real time is also referred to as elapsed time.
User CPU                      the CPU time spent to execute your SAS code.
System CPU                    the CPU time spent to perform system overhead
                              tasks on behalf of the SAS process.
Memory                        the amount of memory required to run a step.
Page Faults                   the number of pages that SAS tried to access but
                              were not in main memory and required I/O activity.
Page Reclaims                 the number of pages that can be accessed without
                              I/O activity.
Page Swaps                    the number of times a process was swapped out of
                              main memory.
Voluntary Context Switches    the number of times that the SAS process had to
                              give up the CPU because of a resource constraint
                              such as a disk drive.
Involuntary Context Switches  the number of times that the operating system
                              forced the SAS process to give up the CPU.
Optimizing I/O
I/O is one of the most important factors for optimizing performance. Most SAS jobs
consist of repeated cycles of reading a particular set of data to perform various data
analysis and data manipulation tasks. To improve the performance of a SAS job, you
must reduce the number of times SAS accesses disk or tape devices.
Process more data each time a device is accessed by tuning the following options:
BUFNO=
SAS uses the BUFNO= option to adjust the number of open page buffers when it
processes a SAS data set. Increasing this option's value can improve your application's
performance by allowing SAS to read more data with fewer passes; however, your
memory usage increases. Experiment with different values for this option to determine
the optimal value for your needs.
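
One way to run that experiment is sketched below. WORK.BIG is a hypothetical test data set, and the value 20 is only an illustrative starting point, not a recommendation.

```sas
options fullstimer;

/* Hypothetical test data: 1,000,000 observations */
data work.big;
    do id = 1 to 1000000;
        amount = mod(id, 500);
        output;
    end;
run;

/* Baseline read with the session default for BUFNO= */
data _null_;
    set work.big;
run;

/* Same read with 20 page buffers */
data _null_;
    set work.big (bufno=20);
run;
```

Compare the real time and memory reported for the two DATA _NULL_ steps, then repeat with other BUFNO= values.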
BUFSIZE=
When the Base SAS engine creates a data set, it uses the BUFSIZE= option to set the
permanent page size for the data set. The page size is the amount of data that can be
transferred for an I/O operation to one buffer. The default value for BUFSIZE= is
determined by your operating environment. Note that the default is set to optimize the
sequential access method. To improve performance for direct (random) access, you
should change the value for BUFSIZE=.
Whether you use your operating environment's default value or specify a value, the
engine always writes complete pages regardless of how full or empty those pages are.
If you know that the total amount of data is going to be small, you can set a small
page size with the BUFSIZE= option, so that the total data set size remains small and
you minimize the amount of wasted space on a page. In contrast, if you know that you
are going to have many observations in a data set, you should optimize BUFSIZE= so
that as little overhead as possible is needed. Note that each page requires some
additional overhead. Large data sets that are accessed sequentially benefit from
larger page sizes because a larger page size reduces the number of system calls that
are required to read the data set. Note that because observations cannot span pages,
there is typically some unused space on each page.
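
As a rough sketch of setting the page size when a data set is created (the data set names and the 128K value are assumptions chosen for illustration):

```sas
/* Hypothetical source data */
data work.source;
    do id = 1 to 500000;
        value = ranuni(1);
        output;
    end;
run;

/* Create a copy with a 128K page size; the page size is stored
   permanently with the new data set */
data work.big_pages (bufsize=128k);
    set work.source;
run;

/* The engine/host section of the output reports the data set
   page size and the number of pages */
proc contents data=work.big_pages;
run;
```

Running PROC CONTENTS on both data sets lets you compare page size and page count before and after the change.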
COMPRESS=
One further technique that can reduce I/O processing is to store your data as
compressed data sets by using the COMPRESS= data set option. However, storing your
data this way means that more CPU time is needed to decompress the observations as
they are made available to SAS. But if your concern is I/O, and not CPU usage,
compressing your data may improve the I/O performance of your application.
Long Character Values Dataset

Resource    Uncompressed    Compressed    Change
CPU         4.27 sec        27.46 sec     +23.19 sec
Space       235 MB          54 MB         -181 MB

Resource    Uncompressed    Compressed    Change
CPU         1.17 sec        14.68 sec     +13.51 sec
Space       52 MB           39 MB         -13 MB
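
A comparison like the ones above can be set up with the COMPRESS= data set option. This is only a sketch: the data set and variable names are hypothetical, and the savings you see depend entirely on your data.

```sas
options fullstimer;

/* Hypothetical data with a long, mostly blank character variable */
data work.long_text;
    length comment $ 200;
    do id = 1 to 100000;
        comment = 'short value';
        output;
    end;
run;

/* Compressed copy; SAS writes a note to the log reporting how
   much compression changed the size of the data set */
data work.long_text_c (compress=yes);
    set work.long_text;
run;

/* Read both copies and compare CPU time in the log */
data _null_;
    set work.long_text;
run;

data _null_;
    set work.long_text_c;
run;
```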
By increasing the memory available to SAS, you can decrease processing time
because less time is spent on paging, that is, on reading pages of data into
memory.
MEMSIZE=
Specifies the limit on the total amount of memory to be used by the SAS System.
SAS does not automatically reserve or allocate the amount of memory that you specify
in the MEMSIZE option. SAS will only use as much memory as it needs to complete a
process. For example, a DATA step might only require 20M of memory, so even though
MEMSIZE is set to 500M, SAS will use only 20M of memory.
Setting MEMSIZE to 0 is not recommended except for debugging and testing purposes.
To determine an optimal value for MEMSIZE, run the SAS procedure or DATA step with
MEMSIZE set to 0 and the FULLSTIMER option turned on. Note the amount of memory that
the process uses, and then set MEMSIZE to a larger amount.
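
The measurement run described above might look like the sketch below. MEMSIZE is set when SAS starts (on the command line or in a configuration file), not with an OPTIONS statement, so the invocation appears only as a comment; the program name in it is hypothetical.

```sas
/* Assumed UNIX-style invocation for the test run:
     sas -memsize 0 -fullstimer myprogram.sas      */

/* Inside the program: confirm the MEMSIZE setting in effect */
proc options option=memsize value;
run;

/* Make sure full statistics are reported for each step */
options fullstimer;

/* The step being measured; check the memory line in its log note */
proc sort data=sashelp.class out=work.class_sorted;
    by age;
run;
```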
SORTSIZE=
Specifies the amount of memory that is available to the SORT procedure.
SUMSIZE=
Specifies a limit on the amount of memory that is available for data summarization
procedures such as the MEANS, OLAP, REPORT, SUMMARY, SURVEYFREQ,
SURVEYLOGISTIC, SURVEYMEANS, and TABULATE procedures.
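
Both limits can be set for the session with an OPTIONS statement, and SORTSIZE= can also be given on an individual PROC SORT statement. The values below are illustrative assumptions only; suitable values depend on your MEMSIZE setting and your data.

```sas
/* Session-wide limits (illustrative values) */
options sortsize=32M sumsize=16M;

proc sort data=sashelp.class out=work.by_age;
    by age;
run;

proc means data=sashelp.class;
    class sex;
    var height weight;
run;

/* Override the sort memory limit for a single sort */
proc sort data=sashelp.class out=work.by_name sortsize=24M;
    by name;
run;
```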
MVARSIZE=
Specifies the maximum size for in-memory macro variable values.
MEXECSIZE=
Specifies the maximum macro size that can be executed in memory. Use the
MEXECSIZE option to control the maximum size macro that will be executed in memory
as opposed to being executed from a file. The MEXECSIZE option value is the compiled
size of the macro. Memory is allocated only when the macro is executed. After the
macro completes, the memory is released. If memory is not available to execute the
macro, an out-of-memory message is written to the SAS log. Use the MCOMPILENOTE
option to write to the SAS log the size of the compiled macro. The MEMSIZE option does
not affect the MEXECSIZE option.
MSYMTABMAX=
Specifies the maximum amount of memory available to the macro variable symbol
table. Once the maximum value is reached, additional macro variables are written out
to disk. A value of 0 causes all macro symbol tables to be written to disk. If this option
is set too low and the application frequently reaches the specified memory limit, then
disk I/O increases. If this option is set too high (on some operating environments) and
the application frequently reaches the specified memory limit, then less memory is
available for the application, and CPU usage increases.
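
The macro-related options above can be combined as in this sketch. The macro name and the option values are assumptions for illustration; check the defaults at your site before changing them.

```sas
options mcompilenote=all   /* report each compiled macro's size in the log */
        mexecsize=128K     /* largest compiled macro to execute in memory  */
        msymtabmax=8M;     /* memory for macro symbol tables before disk   */

/* Hypothetical macro used only to trigger the compile note */
%macro demo;
    %local i;
    %do i = 1 %to 3;
        %put NOTE: iteration &i;
    %end;
%mend demo;

%demo
```

After the %MEND statement, the log note written by MCOMPILENOTE shows the compiled size, which you can compare with the MEXECSIZE value.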
REALMEMSIZE=
Indicates the amount of real memory SAS can expect to allocate. Use the
REALMEMSIZE system option to optimize the performance of SAS procedures that alter
their algorithms and memory usage. Setting the value of REALMEMSIZE too low might
result in less than optimal performance. For better performance, set REALMEMSIZE to
the amount of memory (excluding swap space) that is available to the SAS session at
invocation.
PROC OPTIONS;
Lists the current values of all SAS system options.
For example, on our system:
MEMSIZE=67108864 (64 MB)    Specifies the limit on the total amount of memory to be used by SAS
                            (maximum was found to be 4 GB)
SUMSIZE=0                   Upper limit for data-dependent memory usage during summarization
SORTSIZE=50331648 (48 MB)   Specifies the amount of memory that is available to the SORT procedure
MEXECSIZE=65536 (64 KB)     Maximum size for a macro to execute in memory
MSYMTABMAX=4194304 (4 MB)   Maximum amount of memory allocated for the macro table
MVARSIZE=32768 (32 KB)      Maximum length of value of macro variable
REALMEMSIZE=0               Limit on the total amount of real memory to be used by the SAS System
BUFNO=1                     Number of buffers for each SAS data set
BUFSIZE=0                   Size of buffer for page of SAS data set
For some of these options, a value of 0 causes the maximum allowable amount of memory to be used.
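
Rather than listing every option, you can narrow the output to the settings discussed here. A minimal sketch:

```sas
/* All options in the MEMORY group */
proc options group=memory;
run;

/* A single option, with its description and how its value was set */
proc options option=bufsize define value;
run;

/* Retrieve one value in code, for example to write it to the log */
%put MEMSIZE is %sysfunc(getoption(memsize));
```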
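Executing a single stream of code takes approximately the same amount of CPU time
each time that code is executed. Optimizing CPU performance in these instances is
usually a tradeoff against another resource. For example, you might reduce CPU time
by using more memory, but then less memory is available to other processes.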