The Essential PROC SQL
A little history
SQL is a powerful, flexible, fourth-generation sublanguage that enables complex processing through a few simple statements. Because SQL is a nonprocedural language, you need only indicate the desired outcome rather than outline each of the steps necessary to reach it. SQL statements allow for the complete creation, maintenance, and reporting of relational database systems using English-like statements.
In June 1970, Dr. E. F. Codd, a researcher with IBM, published his mathematical theory of data management in a paper entitled "A Relational Model of Data for Large Shared Data Banks." His ideas resulted in the definition of a new form of data storage structure, a table consisting of rows and columns. The relational database model was thus born from tables and the relationships between tables. In the mid-1970s, IBM researchers in San Jose, California, developed the Structured Query Language (SQL) to support this new relational model. SQL was designed to enable access to data stored in a relational database. It allows you to create, alter, and delete tables as well as modify or delete existing records or add new records to tables.
By the late 1980s and early 1990s, each database vendor had its own version of SQL. In an effort to minimize the inconsistencies and provide portability of SQL statements, the American National Standards Institute (ANSI) developed a set of international standards to be applied to the language. Several standards have been published by ANSI since 1986, including SQL-89, SQL-92, SQL-99, and SQL-2003. Each successive release extends functionality, but the foundations of the language have remained mostly unchanged. Vendors that are compliant with the ANSI SQL-92 standard, for example, are also compliant with the SQL-99 core function standards.
The power and ease of use of SQL have resulted in its use in hundreds of database products today. Companies such as Oracle, Microsoft, Sybase, and IBM depend heavily on SQL in their database products regardless of operating system. As a result, anyone working with databases today must be proficient in SQL. The ANSI standards have resulted in a set of more or less common statements with agreed-upon functionality from each vendor. However, many different ideas and syntactical differences are found in each flavor of SQL.
PROC SQL underwent major changes in SAS Version 8, resulting in a more versatile procedure that is also more closely in line with the ANSI SQL-92 standard. The new version extends the functionality of the SQL language with elements from Base SAS.
The compact nature of an SQL statement allows for quick updates of programming code if the data changes. SQL statements can also include macro variables, allowing for a generic program that can be dynamically updated.
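As a minimal sketch of a macro variable inside an SQL statement, the query below filters the SASHELP.CLASS sample table that ships with SAS. The macro variable name min_age and its value are assumptions chosen for illustration.

```sas
/* Hypothetical macro variable: the minimum age to include in the report */
%let min_age = 13;

proc sql;
   /* Double quotation marks let the macro variable resolve in the title */
   title "Students aged &min_age and older";
   select name, sex, age
      from sashelp.class
      where age >= &min_age
      order by age, name;
quit;
title;
```

Changing the %LET value is enough to rerun the same query against a different threshold; the SELECT statement itself is untouched.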
Views provide the ability to selectively allow users to see columns and rows of information in one or more tables and data sets. Views can be generated from a complex SELECT statement that retrieves information from one or more tables or data sets. However, you issue only a simple SELECT statement against the view, and it retrieves up-to-date information because the view is re-created each time it is queried. The dynamic nature of views makes them invaluable for reporting applications.
Tip: Most table creation in a database can be accomplished using PROC SQL statements. If additional database-specific statements are required, they can be passed directly to the relational database management system (RDBMS) for processing using the SQL Pass-Through facility. Only those users with appropriate database privileges can successfully submit CREATE statements.
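The behavior of views described above might be sketched as follows. The view name work.teen_heights is hypothetical, and SASHELP.CLASS is used only because it is available in every SAS installation.

```sas
proc sql;
   /* The view stores the query definition, not a copy of the data */
   create view work.teen_heights as
      select name, age, height
         from sashelp.class
         where age >= 13;

   /* A simple SELECT against the view; the stored query runs    */
   /* each time the view is referenced, so results stay current  */
   select name, height
      from work.teen_heights
      order by height desc;
quit;
```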
Ease of maintenance
PROC SQL statements such as ALTER, INSERT, UPDATE, DROP, and DELETE provide for the addition of new data and the modification or update of existing data in a table. From within a SAS session, tables in a database as well as tables and data sets stored in native SAS libraries can be maintained by users with appropriate database privileges. New rows can be added to tables directly using the INSERT statement, or they may be taken from one or more tables or data sets in any active library, including a database. One or more criteria can be set, limiting the rows taken from the other sources. In PROC SQL, a single UPDATE statement applies the changes to existing rows in a table without creating a new table or data set. The rows modified by the UPDATE statement can be limited through WHERE clause criteria based on values within the same table or other tables. Moreover, updates can be easily applied from records in one or more other data sets or tables. Existing tables and data sets can also be altered using the ALTER statement to include additional columns and add column modifiers such as labels or formats. In each case, the work is done on the existing table without the need to create a new table or data set.
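The maintenance statements discussed above can be illustrated with a short sketch. The table work.seniors, the inserted row, and the new column bmi are all hypothetical; the source rows come from the SASHELP.CLASS sample table.

```sas
proc sql;
   /* Empty table with the same columns as the sample table */
   create table work.seniors like sashelp.class;

   /* Add rows taken from another table, limited by a criterion */
   insert into work.seniors
      select * from sashelp.class
         where age >= 15;

   /* Add a single new row directly */
   insert into work.seniors (name, sex, age, height, weight)
      values ('Zoe', 'F', 15, 62.5, 105);

   /* Change existing rows in place, limited by a WHERE clause */
   update work.seniors
      set weight = weight * 1.05
      where sex = 'M';

   /* Add a column, then attach a label and format to an existing one */
   alter table work.seniors
      add bmi num label='Body Mass Index' format=5.1;
   alter table work.seniors
      modify height label='Height (inches)' format=6.1;
quit;
```

Each statement operates on work.seniors itself; no intermediate table or data set is created along the way.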
Security
The special keyword USER, when added to a PROC SQL INSERT or UPDATE statement, stores the user ID associated with the action in a table. Such statements can be triggered by specific events to execute in the background of applications, allowing for the creation of an effective audit table. Dates and other information collected from the SAS session can also be added to the entry.
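A minimal sketch of an audit entry built with the USER keyword follows. The table work.audit_log and its column names are assumptions made for illustration, and the value that USER returns depends on the operating environment.

```sas
proc sql;
   /* Hypothetical audit table */
   create table work.audit_log
      (userid    char(32),
       action    char(20),
       logged_at num format=datetime20.);

   /* USER supplies the ID of the person who submitted the statement; */
   /* DATETIME() adds the current session date and time               */
   insert into work.audit_log
      set userid    = user,
          action    = 'UPDATE',
          logged_at = datetime();

   select * from work.audit_log;
quit;
```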
Database security is also maintained. A user must have the appropriate privileges to create tables, views, and indexes within a database. Only those users with appropriate database privileges can successfully submit INSERT, UPDATE, ALTER, and DELETE statements. In addition, only those users with read access to tables may use the SELECT statement to report from them or from views built on them.
Optimized performance
The PROC SQL optimizer automatically works out a plan for executing SQL statements in the most efficient manner possible. Indexes can be built to provide better performance for your queries. In addition, directions for the optimizer may be added as statement options. Performance optimization is important when tables are stored in one or more databases as well as native SAS libraries. The SQL optimizer determines whether processing should be transferred to the database or be handled by SAS. Options such as DBMASTER, new in SAS 9, assist the optimizer in the efficient handling of queries involving tables residing in different database locations.
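As a hedged sketch of building an index, the example below copies the SASHELP.CLASS sample table into a hypothetical table work.class and indexes one column. The _METHOD option on the PROC SQL statement is included on the assumption that you want the chosen execution plan written to the SAS log; whether the optimizer actually uses the index on a table this small is not guaranteed.

```sas
proc sql _method;
   /* Work on a copy so the sample table is left untouched */
   create table work.class as
      select * from sashelp.class;

   /* A simple index must have the same name as its column */
   create index age on work.class(age);

   /* A query whose WHERE clause can take advantage of the index */
   select name, age
      from work.class
      where age = 14;
quit;
```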
Adaptability
Flexibility in a changing environment
We all know how unstable the computing environment in companies is today. Your company may decide to implement a new data warehouse using Sybase instead of DB2. A new package may be introduced to generate reports from your Oracle database. If you are using a SAS/ACCESS LIBNAME statement to connect to a database, the only change needed regardless of the database is to the LIBNAME connection string required to establish a connection to the database. Using the familiar SAS interface, you can easily create new tables and update and retrieve your data. There is no need to learn another product or interface such as Oracle SQL*Plus or ISQL. Moreover, the information extracted using SQL is directly available for further processing in a SAS DATA step or other SAS procedures. SAS PROC SQL allows you to apply your standard SAS output formats and labeling options and almost all of the SAS functions. In fact, if you are a SAS programmer, you already know more about SQL statements than most other programmers!
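The point about the LIBNAME statement can be sketched as follows. Every connection value here (the libref dw, the user name, password, data source, server, and database names, and the table orders) is a placeholder, and the statements assume the corresponding SAS/ACCESS interface is licensed and a database is reachable.

```sas
/* Original connection: DB2 (hypothetical connection values) */
libname dw db2 datasource=dwdb user=myuser password=XXXXXX;

/* After a move to Sybase, only the LIBNAME statement changes:          */
/* libname dw sybase server=dwsrv database=dwdb                         */
/*            user=myuser password=XXXXXX;                              */

/* The query itself is unchanged because it refers only to the libref */
proc sql;
   select count(*) as order_count
      from dw.orders;
quit;
```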
Fuzzy logic
A wide range of criteria can be applied to SQL queries, thereby limiting the rows retrieved or manipulated by the query. However, the criteria we wish to apply often cannot be written in a simple fashion using comparison operators such as =, <, or >.
PROC SQL WHERE clauses may include conditions that require fuzzy logic or inexact matching. Fuzzy logic can be applied to pattern-matching criteria in an SQL query. The SAS SCAN function and the CONTAINS operator allow us to parse a string for the inclusion of various characters. The LIKE operator can be used in conjunction with wildcard symbols to restrict character string matches to a particular position within a column value. For criteria based on a range, the BETWEEN operator can be used to set the bounds. PROC SQL may also include the SAS SOUNDEX function in the WHERE clause of a query. This function makes it possible to match column values that sound similar to a given value.
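The inexact-matching criteria described above might look like this in practice. The search values are arbitrary and the queries run against the SASHELP.CLASS sample table; the last query uses the sounds-like operator (=*), which is not mentioned in the text above but is the operator form of SOUNDEX-style matching in PROC SQL.

```sas
proc sql;
   /* LIKE with % matches any run of characters: names starting with J */
   select name from sashelp.class where name like 'J%';

   /* LIKE with _ matches exactly one character: "a" in second position */
   select name from sashelp.class where name like '_a%';

   /* CONTAINS finds a substring anywhere in the value */
   select name from sashelp.class where name contains 'ar';

   /* BETWEEN sets inclusive bounds on a range */
   select name, age from sashelp.class where age between 12 and 14;

   /* SOUNDEX compares phonetic codes rather than exact spellings */
   select name from sashelp.class
      where soundex(name) = soundex('Jon');

   /* The sounds-like operator applies the same idea in operator form */
   select name from sashelp.class where name =* 'Jon';
quit;
```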
Although the PROC SQL INSERT statement may be used to add rows to a table or data set, a VALUES keyword is required for each row and the complete record must be enclosed in parentheses. In addition, all character values, including missing character values, must be enclosed in quotation marks. Because many imported files contain, at best, a delimiter between fields, the added syntax required by the INSERT statement can add significantly to the workload of data imports. The SAS DATA step is your only solution if you are attempting to import fixed-width columnar data into SAS, because PROC SQL does not allow positional column references in the INSERT statement.
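The syntax overhead described above is easy to see in a small sketch. The table work.staff and its rows are hypothetical. Note that a missing character value is written as a quoted blank, while a missing numeric value is written as a period.

```sas
proc sql;
   /* Hypothetical table used only for this illustration */
   create table work.staff
      (empid    num,
       lastname char(20),
       dept     char(10),
       salary   num);

   /* One VALUES clause per row, each record in parentheses */
   insert into work.staff
      values (101, 'Nguyen', 'Finance', 52000)
      values (102, 'Okafor', ' ', .)
      values (103, 'Silva', 'IT', 61000);

   select * from work.staff;
quit;
```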