Writing Good SQL
OVERVIEW
SQL WHAT?
The Essence of SQL: A Guide to Learning Most of SQL in the Least Amount of Time by
David Rozenshtein in SQL Forum Vol 4 No 4, whole issue (later a book) [Excellent
short introduction to SQL]
Performance Tips for Transact-SQL, slides from a presentation by Jeff Lichtman
Subquery Processing Performance Improvements, slides from a presentation by Jeff
Lichtman
Oracle PL/SQL Tips & Techniques (Oracle Series) by Joseph C. Trezzo (the definitive
'Tips' book on PL/SQL)
Oracle8 PL/SQL Programming by Urman and Rinaldi
1001 SQL Tips by Konrad King (due out in April 2001)
SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL by M. J.
Hernandez
SQL for Smarties: Advanced SQL Programming by Joe Celko
SQL: The Complete Reference (Osborne's complete reference series) by Groff and
Weinberg
The Guru's Guide to Transact-SQL by Kenneth W. Henderson
Understand the data objects you'll use (esp. tables, views, indexes, keys)
Understand the relationships between the data objects you'll use (esp. join
opportunities, cardinality)
Understand the meaning of the data you'll use (meaning, values, NULLs, missing
data, uniqueness)
Understand how much data you are querying or generating
Understand the SQL syntax
Understand the report requirements or the SQL processing that is needed
Understand how the SQL optimizer works
Understand how your SQL articulates with a wrapping language, if present
HOW TO WRITE SQL - GOOD WORKING METHODS
PARSE: Read the SQL supplied by the client and create an internal data structure
called a parse tree. Perform syntax checks.
NORMALIZE: The parse tree is reorganized for greater efficiency. Existence checks
and permissions checks are performed.
COMPILE: Resolve views, flatten subqueries, process constraints; optimize queries and
form query plan.
EXECUTE: Run query plan and send results to client.
PHASES OF OPTIMIZER:
QUERY ANALYSIS
o Find SEARCH clauses
o Find OR clauses
o Find JOIN clauses
INDEX SELECTION
o Select best index for each clause
JOIN SELECTION
o Determine JOIN ORDER
o Estimate costs and pick best plan
The appendices of Client/Server Data Design with Sybase by G. Anderson have long
lists of dbcc and trace commands. These may not be supported; test in development!
In most cases, an index should be used for each table in the query. Generally, when
an index isn't used the entire table is scanned. This is bad :>(
Know what tables are indexed and how. Understand how/when indexes are used.
[More slides on this.]
Understand how certain predicate constructions prevent use of an index. [More slides
on this.]
Use showplan to confirm expectations about index usage.
If an apparently obvious index was not used, understand why.
If you think your query should be using an index and showplan indicates that it is
not, ask a DBA to review the problem.
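A minimal sketch of checking index usage with showplan, assuming Sybase-style session options and an isql-style client (where go ends a batch). The table, columns and literal value are hypothetical.

```sql
-- Turn on plan display, then suppress execution so only the plan is shown.
set showplan on
go
set noexec on
go

-- Hypothetical query: the showplan output should name the index chosen
-- for Acct, or report a table scan if no index was used.
select acct, sub
from Acct
where acct = 1000
go

-- Restore normal execution for the session.
set noexec off
go
set showplan off
go
```

If the plan reports a table scan where an index was expected, compare the predicate with the index definition before escalating to a DBA.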
UNREALISTIC EXPECTATIONS:
Some properly formed queries legitimately ask the RDBMS server to do a lot of work
and may take time to execute.
Know the size of the objects in the query and try to understand how much work is
being requested.
Start with simpler queries (fewer tables and/or simpler or more restrictive
predicates) to develop a performance baseline.
COMMON SQL PROBLEMS - 4
USE OF CURSORS:
When using cursors, acquire a large stock of garlic, crucifixes and wooden stakes.
Use cursors ONLY when absolutely necessary. There are always unpredictable
performance consequences to the use of cursors.
Never use cursors when set SQL will suffice.
If cursors must be used, be attentive to transaction blocking issues.
Some cursor operations require specific types of indexes to support them (such as
unique or clustered).
Keep it simple: never use cursors.
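As an illustration of "set SQL will suffice", the sketch below shows a single set-based statement doing the work that a cursor loop would otherwise do row by row. The table titles and its columns are hypothetical.

```sql
-- Hypothetical table: titles(title_id, type, price).
-- One set-based UPDATE replaces a cursor that would fetch each
-- 'business' row and update its price individually.
update titles
set price = price * 1.10
where type = 'business'
```

The single statement leaves row access to the optimizer and is typically shorter-lived than the equivalent cursor loop, which matters for the blocking concerns noted above.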
SARG EXAMPLES
There should always be indexes on keys and the optimizer will usually use them for
joins between tables on their primary or foreign keys.
There are caveats:
o The datatypes of columns being joined MUST BE THE SAME. If a column
must be converted (even implicitly in some cases) to process the join, an
index on that column cannot be used.
o Functions, expressions or concatenations used against a join column will
prevent an index on that column from being used.
In cases where one of the columns must be converted in a join, try moving the
datatype conversion (the CONVERT statement) to the join column associated with the
SMALLER of the two tables. That is, force the scan to the smaller table.
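A sketch of placing the conversion on the smaller table, assuming two hypothetical tables whose join columns have different datatypes: big_orders.cust_id is int and small_region.cust_id is char(10).

```sql
-- The CONVERT is applied to the column of the smaller table, so an index
-- on big_orders.cust_id remains usable; small_region is the table scanned.
select o.order_id, r.region
from big_orders o, small_region r
where o.cust_id = convert(int, r.cust_id)
```

Writing the predicate as convert(char(10), o.cust_id) = r.cust_id would instead push the scan onto the larger table.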
JOIN ORDER
Join order can make a big difference in query performance. Info regarding join order
for a query can be obtained through showplan and the 310 trace.
If there are more than 4 tables in a join, the Sybase optimizer may not determine the
best join order. In these cases, the optimizer determines the best join order by
costing 4 table subsets and progressively selects the outermost join tables.
A Sybase SET option can increase the number of tables considered when costing
joins - up to 8 tables. Optimization time may increase significantly. This SET option
is rarely needed and should be used with care.
In rare cases when needed, join order can be hard wired with SET forceplan on. This
is rarely needed and should be used with care.
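The commands below sketch how the options mentioned above might be used in a session. This assumes Sybase-style syntax: set table count is my assumption for the unnamed SET option, trace flag 3604 is assumed to route trace output to the client, and tables A, B and C are hypothetical. Try it in development first.

```sql
-- Report join-order costing (310); 3604 sends the output to this session.
dbcc traceon(3604, 310)
go

-- Assumed name of the SET option: cost up to 8 tables at a time.
set table count 8
go

-- Force the join order to follow the order of tables in the FROM clause.
set forceplan on
go

select A.id, B.val, C.val
from A, B, C
where A.id = B.id
  and B.id = C.id
go

-- Always switch the overrides off again.
set forceplan off
go
dbcc traceoff(3604, 310)
go
```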
SPECIFIC T-SQL TIPS - 1
The optimizer won't use a composite index UNLESS you have a valid SARG against
(at least) the first component of the composite index.
o NOTE: A composite index is a single index constructed across more than one
table column.
EXAMPLE:
o CREATE index Acct_Ind on Acct (acct, sub, l)
o Select * from Acct where sub = 5400 [This query will NOT use the
composite index Acct_Ind because sub is not the first column of the index.]
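For contrast, a sketch of predicates that do supply a SARG on the leading column of Acct_Ind, so the optimizer can consider the index. The literal values are made up.

```sql
-- SARG on the first component (acct) only.
select * from Acct where acct = 1000

-- SARGs on the first and second components.
select * from Acct where acct = 1000 and sub = 5400
```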
Provide as much valid information in WHERE clauses as possible for the optimizer to
consider:
o Provide all possible joins for the optimizer to review. For example, when
joining three tables A, B, and C; specify the join from A to B, the join from B
to C AND the join from A to C if valid.
o Supply as many valid SARGs as possible. In particular, it may be useful to
provide redundant SARGs for the same column present in each of two joined
tables.
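A sketch of supplying every valid join and a redundant SARG, assuming three hypothetical tables A, B and C that share a key column id.

```sql
select A.id, B.val, C.val
from A, B, C
where A.id = B.id        -- join A to B
  and B.id = C.id        -- join B to C
  and A.id = C.id        -- join A to C: redundant, but valid
  and A.id = 42          -- SARG on A
  and B.id = 42          -- redundant SARG on the same key in B
```

The extra predicates do not change the result; they only give the optimizer more access paths and join orders to cost.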
Avoid datatype mismatches between join columns or across the <operator> in SARGs.
An index will often not be used in these cases.
When using the LIKE operator, make sure the pattern starts with at least one
literal character before the first wildcard character.
o The predicate: president LIKE '%linton' will NOT use an index.
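The two forms side by side, as a sketch against a hypothetical table leaders with an index on the president column.

```sql
-- Leading literal characters: the index on president can be used.
select * from leaders where president like 'Clin%'

-- Leading wildcard: every value must be examined, so no index is used.
select * from leaders where president like '%linton'
```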
SPECIFIC T-SQL TIPS - 6
TRANSACTION BLOCKS:
UPDATES:
Include only the columns needed in the SELECT LIST (as opposed to SELECT *). This
reduces the data sent back to the client and provides the possibility of index
covering.
o A SELECT query is covered by an index when a composite index exists on the
table that includes all the columns in the SELECT LIST.
o Modest sized high volume projections can often be made to perform better by
creating a covering index.
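A sketch of a covered query, assuming a hypothetical employee table. The index name and columns are illustrative only.

```sql
-- Composite index containing every column the query needs.
create index emp_dept_name_ind on employee (dept_id, last_name)

-- Only dept_id and last_name are referenced, so the query can be answered
-- from the index pages without reading the data pages.
select dept_id, last_name
from employee
where dept_id = 20
```

Replacing the select list with SELECT * would pull in columns that are not in the index and lose the covering benefit.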
The use of >= or <= can provide I/O advantages relative to > or <, especially
when constructing SARGs against columns with low selectivity (a relatively small
number of distinct values). In SARGs of the form col > constant, the index finds
constant quickly, but then sequentially scans pages until the next higher value is
found. If an equivalent col >= constant is used (for an integer column, col >= 4
instead of col > 3), fewer pages will be scanned.
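A sketch of the rewrite, assuming a hypothetical orders table with an indexed integer column status that has few distinct values. The two queries return the same rows only because status is an integer.

```sql
-- The index positions on the first 3, then scans past every row with
-- status = 3 before reaching qualifying rows.
select * from orders where status > 3

-- The index positions directly on the first row with status = 4.
select * from orders where status >= 4
```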
SPECIFIC T-SQL TIPS - 10
PARAMETERS TO SPs:
A parameter to an SP (Stored Procedure) that gets used within the SP in the form
<column> <operator> <@param> must be the same datatype as the column in order to
be useful as a SARG.
Parameter values passed to SPs are known when the SP is compiled and optimized.
However, the values of variables declared within SPs are not known at that point, so
the optimizer has to guess at their selectivity. Sometimes, this situation can be
improved as in the following example (from Paulsell's P&T book):
o Split the SP so that the variable is passed as a parameter to a second SP:
CREATE PROCEDURE p AS
DECLARE @x int
EXEC select_proc @x
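The original example is incomplete here, so the following is only a sketch of what such a split might look like. The tables source_tab and lookup_tab, their columns, and the body of select_proc are assumptions, not the book's example.

```sql
-- Second SP: @x arrives as a parameter, so its value is known when the
-- query inside is optimized.
create procedure select_proc @x int
as
    select * from lookup_tab where col2 = @x
go

-- First SP: computes the value in a local variable, then hands it on
-- instead of using it directly in a query.
create procedure p
as
    declare @x int
    select @x = max(col1) from source_tab
    exec select_proc @x
go
```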
SELECT INTO:
The SELECT INTO operation (SELECT * INTO new_table FROM ...) is very fast, much
faster than creating a table followed by an INSERT...SELECT statement. SELECT INTO
creates a new table (on the fly) based on the columns in the SELECT LIST and the
restrictions in the predicate.
SELECT INTO is minimally logged. When it is turned on, DBs cannot recover from
transaction log dumps. SELECT INTO is enabled in most of our DSS environments
(but not in our transaction processing environments).
SELECT INTO can populate either # temporary tables (which last only for the
current session) or regular tables.
Use SELECT INTO to quickly move a subset of data into a smaller table for
further SQL processing. It can be part of a strategy for breaking large
complex SQL into stepwise parts.
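A sketch of the stepwise approach, assuming a hypothetical ledger table. The temporary table name and the threshold are made up.

```sql
-- Step 1: pull the subset of interest into a session-local temporary table.
select acct, sub, amount
into #big_entries
from ledger
where amount > 100000

-- Step 2: run the heavier processing against the much smaller table.
select acct, sum(amount)
from #big_entries
group by acct
```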
SPECIFIC T-SQL TIPS - 12
Rewrite SQL to use EXISTS and IN in subqueries and IF statements instead of NOT
EXISTS and NOT IN. In cases where the table must be scanned because there are no
appropriate SARGs or indexes, Sybase can return TRUE as soon as a single row
matches for EXISTS and IN, but must read all values for the negations.
EXISTENCE CHECKS:
THEN do something.
Don't use the COUNT aggregate to perform an existence check as in: SELECT * FROM
table WHERE 0 < (SELECT COUNT(*) FROM table2 WHERE ...). Instead perform an
existence check: SELECT * FROM table WHERE EXISTS (SELECT 1 FROM table2
WHERE ...). The COUNT may cause a table scan or index scan.
Using SELECT 1 in existence subqueries is better than SELECT * since it may result
in less locking of system tables.
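A sketch of both uses of an existence check, in an IF statement and in a subquery, assuming hypothetical customers and orders tables joined on cust_id.

```sql
-- Existence check in an IF statement: stops at the first matching row.
if exists (select 1 from orders where cust_id = 42)
begin
    print 'customer 42 has orders'
end

-- Existence check in a subquery, instead of 0 < (select count(*) ...).
select c.cust_id, c.name
from customers c
where exists (select 1 from orders o where o.cust_id = c.cust_id)
```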
Use of OR between SARGs or join clauses can be expensive. Use them only if really
needed.
SARGs can be combined with OR in two ways:
o col1 = <val1> OR col1 = <val2>. IN clauses are always reduced to this.
o col1 = <val> OR col2 = <val>
In the 2nd form of OR, a table scan must be used unless ALL of the columns
are indexed and ALL of the SARGs are properly formed. If indexes can be used
on all the columns in the OR, the optimizer uses a special OR STRATEGY or multiple
matching index scans. The OR STRATEGY entails creation of a special sorted
worktable.
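The two forms of OR, sketched against the Acct table from the earlier composite index example. The literal values are made up.

```sql
-- First form: OR on the same column.
select * from Acct where sub = 5400 or sub = 5500

-- An IN list is treated like the OR above.
select * from Acct where sub in (5400, 5500)

-- Second form: OR across different columns. Each column needs a usable
-- index and a well-formed SARG, otherwise the table is scanned.
select * from Acct where acct = 1000 or sub = 5400
```

With only Acct_Ind (acct, sub, l) in place, sub is not a leading index column, so the last query would be expected to scan the table.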
Avoid long OR lists or long IN (val1, val2, ..., valn) lists since all pages will be
locked for the duration of the statement execution.
REWRITE AS:
Special optimizations apply to the MIN and MAX aggregates applied to columns.
These optimizations can't be used if:
o The column is part of an expression or function.
o There is another aggregate in the query.
o The column is not the first column of an index.
o A GROUP BY clause is used.
A query that applies both MIN and MAX to a column can NOT use the special MIN/MAX
optimizations because more than one aggregate is being used. Split the query into a
separate MIN query and MAX query.
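A sketch of the split, assuming a hypothetical ledger table with an index whose first column is amount. This is an illustration, not the query from the original slide.

```sql
-- Two aggregates in one query: the special MIN/MAX optimization is lost.
select min(amount), max(amount) from ledger

-- Split into two queries so each aggregate can be answered from one
-- end of the index.
select min(amount) from ledger
select max(amount) from ledger
```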
SUBQUERIES:
Some queries with subqueries can be rewritten as joins with better performance.
Review the following example:
Review issues of uniqueness and dups when flattening subqueries into the main
query.
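A sketch of flattening a subquery into a join, assuming hypothetical orders and customers tables. This stands in for the kind of rewrite described and is not the original example.

```sql
-- Subquery form.
select o.order_id
from orders o
where o.cust_id in (select c.cust_id
                    from customers c
                    where c.region = 'EAST')

-- Join form: equivalent only if customers.cust_id is unique.
select o.order_id
from orders o, customers c
where o.cust_id = c.cust_id
  and c.region = 'EAST'
```

If cust_id could repeat in customers, the join form would return duplicate order rows that the subquery form does not, which is the uniqueness issue noted above.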
BOOLEAN EXPRESSIONS:
When ANDing Boolean expressions (like @variable = 'Lower Interest Rates'), put the
expression MOST LIKELY TO FAIL first. This saves time in evaluating the others.
When ORing Boolean expressions, put the expression MOST LIKELY TO SUCCEED
first.
These considerations are most likely to be important in the context of conditional
testing within SQL WHILE loops.
Create Stored Procedures for queries which will be used repeatedly with only slight
variation (that is, can be parameterized). A Stored Procedure is precompiled - the
query tree is prepared when the procedure is first used and available for use
thereafter.
Stored Procedures reduce network traffic, can help with security (data access) issues,
can be available to all clients and have other advantages.
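A minimal sketch of parameterizing a repeated query as a Stored Procedure, using the Acct table from the earlier example. The procedure name is hypothetical.

```sql
-- The query varies only in the account number, so it becomes a parameter.
create procedure get_acct_rows @acct int
as
    select acct, sub
    from Acct
    where acct = @acct
go

-- Clients send only the procedure name and the parameter value.
exec get_acct_rows 1000
go
```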
Accurate density and distribution statistics are essential for accurate optimization of
queries.
If substantial modification has been made to a table (update, insert, delete), make
sure UPDATE STATISTICS is run for the table.
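A sketch of refreshing statistics, assuming Sybase-style syntax and the Acct table and Acct_Ind index from the earlier example.

```sql
-- Refresh statistics for all indexes on the table.
update statistics Acct
go

-- Or refresh statistics for a single index.
update statistics Acct Acct_Ind
go
```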
SQL TOPICS FOR ANOTHER DAY...