SQL Code Smells
Introduction
Once you’ve done a number of SQL code-reviews,
you’ll be able to spot signs in the code that indicate
all might not be well. These ‘code smells’ are
coding styles that, while not bugs, suggest design
problems with the code.
Kent Beck and Massimo Arnoldi seem to have coined the term ‘Code Smell’
in the ‘Once And Only Once’ page of www.C2.com, where Kent also said that
code ‘wants to be simple’. Kent Beck and Martin Fowler expand on the issue
of code challenges in their essay ‘Bad Smells in Code’, published as Chapter
3 of the book Refactoring: Improving the Design of Existing Code
(ISBN 978-0201485677).
Although there are generic code smells, SQL has its own particular habits that
will alert the programmer to the need to refactor code. (For grounding in code
smells in C#, see ‘Exploring Smelly Code’ and ‘Code Deodorants for Code
Smells’ by Nick Harrison.) Plamen Ratchev’s wonderful article ‘Ten Common
SQL Programming Mistakes’ lists some of these code smells along with out-
and-out mistakes, but there are more. The use of nested transactions, for
example, isn’t entirely incorrect, even though the database engine ignores
all but the outermost, but their use does flag the possibility that the
programmer thinks nested transactions are supported.
If you are moving towards continuous delivery of database applications, you
should automate as much as possible the preliminary SQL code-review. It’s a
lot easier to trawl through your code automatically to pick out problems, than
to do so manually. Imagine having something like the classic ‘lint’ tools used
for C, or better still, a tool similar to Jonathan ‘Peli’ de Halleux’s Code Metrics
plug-in for .NET Reflector, which finds code smells in .NET code.
One can be a bit defensive about SQL code smells. I will cheerfully write very
long stored procedures, even though they are frowned upon. I’ll even use
dynamic SQL on occasion. You should use code smells only as an aid. It is
fine to ‘sign them off’ as being inappropriate in certain circumstances. In fact,
whole classes of code smells may be irrelevant for a particular database. The
use of proprietary SQL, for example, is only a code smell if there is a chance
that the database will be ported to another RDBMS. The use of dynamic SQL is
a risk only with certain security models. Ultimately, you should rely on your own
judgment. As the saying goes, a code smell is a hint of possible bad practice
to a pragmatist, but a sure sign of bad practice to a purist.
1 Problems with
Database Design
1.1 Packing lists, complex data, or
other multivariate attributes into
a table column
It is permissible to put a list or data document in a column only if
it is, from the database perspective, ‘atomic’, that is, never likely
to be shredded into individual values; in other words, it is fine as
long as the value remains in the format in which it started. You
should never need to split an ‘atomic’ value. We can deal with
values that contain more than a single item of information: We
store strings, after all, and a string is hardly atomic in the sense
that it consists of an ordinally significant collection of characters
or words. However, the string shouldn’t represent a list of
values. If you need to parse the value of a column to access
values within it, it is likely to need to be normalised, and it will
certainly be slow. Occasionally, a data object is too
complicated, peripheral, arcane or ephemeral
to be worth integrating with the database’s
normalised structure. It is fair to then
take an arm’s-length approach and
store it as XML, but in this case it will
need to be encapsulated by views
and table-valued functions so that
the SQL Programmer can easily
access the contents.
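As a sketch of the smell and its normalised alternative, using hypothetical table and column names:

```sql
-- Smell: a multivalued attribute packed into one column
CREATE TABLE dbo.Customer
  (CustomerID INT PRIMARY KEY,
   PhoneList VARCHAR(200)); -- e.g. '0117 496 0813;0114 496 0207'

-- Normalised alternative: one row per phone number
CREATE TABLE dbo.CustomerPhone
  (CustomerID INT NOT NULL REFERENCES dbo.Customer,
   PhoneNumber VARCHAR(30) NOT NULL,
   PRIMARY KEY (CustomerID, PhoneNumber));
```

With the second design, no parsing is ever needed: a customer's numbers are retrieved with an ordinary join.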
1.2 Using inappropriate data types
Although a business may choose to represent a date as a single
string of numbers or require codes that mix text with numbers, it is
unsatisfactory to store such data in columns that don’t match the
actual data type. This confuses the presentation of data with its
storage. Dates, money, codes and other business data can be
represented in a human-readable ‘presentation’ form, in their
storage form, or in their data-interchange form. Storing data
in the wrong form, as strings, leads to major issues
with coding, indexing, sorting, and other operations. Put the data into
the appropriate ‘storage’ data type at all times.
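A small illustration of the difference between presentation and storage forms:

```sql
-- Smell: a date stored as a string in its presentation form
DECLARE @InvoiceDate VARCHAR(10) = '03/04/2017'; -- 3rd April or 4th March?

-- Storage form: the DATE type sorts, indexes and compares correctly
DECLARE @InvoiceDateTyped DATE = '20170403'; -- unambiguous ISO format
SELECT DATEADD(MONTH, 1, @InvoiceDateTyped); -- date arithmetic just works
```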
1.4 Using an Entity Attribute Value
(EAV) model
The use of an EAV model is almost never justified and leads to very
tortuous SQL code that is extraordinarily difficult to apply any sort
of constraint to. When faced with providing a ‘persistence layer’ for
an application that doesn’t understand the nature of the data, use
XML instead. That way, you can use XSD to enforce data constraints,
create indexes on the data, and use XPath to query specific elements
within the XML. It is then, at least, a reliable database, even though it
isn’t relational!
1.6 Creating tables as ‘God Objects’
‘God Tables’ are usually the result of an attempt to encapsulate
a large part of the data for the business domain in a single wide
table. This is usually a normalization error, or rather, a rash and over-
ambitious attempt to ‘denormalise’ the database structure. If you have
a table with many columns, it is likely that you have come to grief on
the third normal form. It could also be the result of believing, wrongly,
that all joins come at great and constant cost. Normally they can be
replaced by views or table-valued functions. Indexed views can have
maintenance overhead but are greatly superior to denormalisation.
1.8 Using command-line and OLE
automation to access server-based
resources
In designing a database application, there is sometimes functionality
that cannot be done purely in SQL, usually when other server-based,
or network-based, resources must be accessed. Now that SQL
Server’s integration with PowerShell is so much more mature, it
is better to use that, rather than xp_cmdshell or sp_OACreate (or
similar), to access the file system or other server-based resources.
This needs some thought and planning: You should also use SQL
Agent jobs when possible to schedule your server-related tasks. This
requires up-front design to prevent them from becoming
unmanageable monsters, prey to ad-hoc growth.
2 Problems with Table
Design
2.1 Using constraints to restrict
values in a column
You can use a constraint to restrict the values permitted in a
column, but it is usually better to define the values in a separate
‘lookup’ table and enforce the data restrictions with a foreign key
constraint. This makes it much easier to maintain and will also
avoid a code-change every time a new value is added to the
permitted range, as is the case with constraints.
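A sketch of the two approaches, with hypothetical tables; with the lookup table, a new status is just an inserted row rather than an ALTER TABLE:

```sql
-- Restrictive: adding a status means altering the constraint
CREATE TABLE dbo.Orders
  (OrderID INT PRIMARY KEY,
   Status VARCHAR(10) NOT NULL
     CONSTRAINT Chk_Status CHECK (Status IN ('Open', 'Shipped', 'Closed')));

-- Preferred: new statuses are just rows in the lookup table
CREATE TABLE dbo.OrderStatus
  (Status VARCHAR(10) PRIMARY KEY);
CREATE TABLE dbo.CustomerOrders
  (OrderID INT PRIMARY KEY,
   Status VARCHAR(10) NOT NULL
     CONSTRAINT FK_Orders_Status REFERENCES dbo.OrderStatus (Status));
```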
2.4 Using too many or too few
indexes
A table in a well-designed database with an
appropriate clustered index will have an optimum
number of non-clustered indexes, depending on
usage. Indexes incur a cost to the system since they
must be maintained if data in the table changes. The
presence of duplicate indexes and almost-duplicate
indexes is a bad sign. So is the presence of unused
indexes. SQL Server lets you create completely
redundant and totally duplicate indexes. Sometimes
this is done in the mistaken belief that the order of
‘included’ (non-key) columns is significant. It isn’t!
For your clustered index, you are likely to choose a ‘narrow’ index
which is stored economically because this value has to be held in
every index leaf-level pointer. This can be an interesting tradeoff
because the clustered index key is automatically included in
all non-clustered indexes as the row locator so non-clustered
indexes will cover queries that need only the non-clustered index
key and the clustered index key.
2.7 Misusing NULL values
The three-valued logic required to handle NULL values can cause
problems in reporting, computed values and joins. A NULL value
means ‘unknown’, so any sort of mathematics or concatenation
will result in an unknown (NULL) value. Table columns should
be nullable only when they really need to be. Although it can
be useful to signify that the value of a column is unknown or
irrelevant for a particular row, NULLs should be permitted only
when they’re legitimate for the data and application, and fenced
around to avoid subsequent problems.
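The propagation of NULLs through expressions is easy to demonstrate:

```sql
DECLARE @Bonus MONEY = NULL;
SELECT 500 + @Bonus;              -- NULL: arithmetic with NULL is NULL
SELECT 'Dear ' + NULL + ',';      -- NULL: so is string concatenation
SELECT 500 + COALESCE(@Bonus, 0); -- 500: fence the NULL off explicitly
```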
2.9 Creating a table without specifying
a schema
If you’re creating tables from a script, they must, like views and
routines, always be defined with two-part names. It is possible for
different schemas to contain the same table name, and there are
some perfectly legitimate reasons for doing this. Don’t rely on
dbo being the default schema for the login that executes the
create script: The default can be changed.
A ‘table’ without a clustered index is actually a heap, which is
a particularly bad idea when its data is usually returned in an
aggregated form, or in a sorted order. Paradoxically, though, it
can be rather good for implementing a log or a ‘staging’ table
used for bulk inserts, since it is read very infrequently, and there
is less overhead in writing to it. A table with a non-clustered
index, but without a clustered index, can sometimes perform well
even though the index has to reference individual rows via a Row
Identifier rather than a more meaningful clustered index. The
arrangement can be effective for a table that isn’t often updated
if the table is always accessed by a non-clustered index and there
is no good candidate for a clustered index.
2.12 Defining a table column without
explicitly specifying whether it is
nullable
In a CREATE TABLE DDL script, a column definition that has
not specified that a column is NULL or NOT NULL is a risk. The
default nullability for a database’s columns can be altered by the
‘ANSI_NULL_DFLT_ON’ setting. Therefore one cannot assume
whether a column will default to NULL or NOT NULL. It is safest
to specify it in the column definition for non-computed columns,
and it is essential if you need any portability of your table design.
Sparse columns must always allow NULL.
3 Problems with Data
Types
3.1 Using VARCHAR(1),
VARCHAR(2), etc.
Columns of a short or fixed length should have a fixed size
because variable-length types have a disproportionate storage
overhead. For a large table, this could be significant. The narrower
a table, the faster it can be accessed. In addition, columns of
variable length are stored after all columns of fixed length, which
can have performance implications. For short strings, use a fixed
length type, such as CHAR, NCHAR, and BINARY.
3.3 Using deprecated language
elements such as the TEXT/NTEXT
data types
There is no good reason to use TEXT or NTEXT. They were a first,
flawed attempt at BLOB storage and are there only for backward
compatibility. Likewise, the WRITETEXT, UPDATETEXT and
READTEXT statements are also deprecated. All this complexity
has been replaced by the VARCHAR(MAX) and NVARCHAR(MAX)
data types, which work with all of SQL Server’s string functions.
3.6 Mixing parameter data types in a
COALESCE expression
The result of the COALESCE expression (which is shorthand for
a CASE statement) is the first non-NULL expression in the list of
expressions provided as arguments. Mixing data types can result in
errors or data truncation.
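For example, when an INT argument is mixed with a string fallback, the whole expression takes the higher-precedence type (INT), and the string can no longer be converted:

```sql
DECLARE @Code INT = NULL;
SELECT COALESCE(@Code, 'none');
-- error: conversion failed converting the varchar value 'none' to int

SELECT COALESCE(CAST(@Code AS VARCHAR(10)), 'none'); -- 'none'
```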
3.10 The length of the VARCHAR,
VARBINARY and NVARCHAR
datatype in a CAST or CONVERT
clause wasn’t explicitly specified
When you convert a value to a VARCHAR, you do not have
to specify the length. If you don’t, SQL Server applies a default
length of 30 in a CAST or CONVERT, silently truncating anything
longer. It is better to specify the length, because SQL Server has
no idea what length you may subsequently need.
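The default length of 30 for CAST and CONVERT makes the truncation easy to see:

```sql
DECLARE @s VARCHAR(100) = REPLICATE('x', 40);
SELECT LEN(CAST(@s AS VARCHAR));      -- 30: default length, silently truncated
SELECT LEN(CAST(@s AS VARCHAR(100))); -- 40: length stated explicitly
```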
3.12 Using VARCHAR(MAX) or
NVARCHAR(MAX) when it isn’t
necessary
VARCHAR types that specify a number rather than MAX have a
finite maximum length and can be stored in-page, whereas MAX
types are treated as BLOBs and stored off-page when large, which
in older versions of SQL Server prevented online reindexing. Use
MAX only when you need more than 8000 bytes
(4000 characters for NVARCHAR, 8000 characters for VARCHAR).
4 Problems with
expressions
4.1 Excessive use of parentheses
Some developers use parentheses even when they aren’t
necessary, as a safety net when they’re not sure of precedence.
This makes the code more difficult to maintain and understand.
4.3 Injudicious use of the LTRIM and
RTRIM functions
These don’t work as they do in any other computer language. They
only trim ASCII space rather than any whitespace character. Use a
scalar user-defined function instead.
4.6 Relying on data being implicitly
converted between types
Implicit conversions can have unexpected results, such as
truncating data or reducing performance. It is not always clear in
expressions how differences in data types are going to be
resolved. If data is implicitly converted in a join operation, the
database engine is more likely to build a poor execution plan. More
often than not, you should explicitly define your conversions to
avoid unintentional consequences.
See: SR0014: Data loss might occur when casting from {Type1} to
{Type2}
4.8 Using BETWEEN for DATETIME
ranges
You never get complete accuracy if you specify dates when using
the BETWEEN logical operator with DATETIME values, due to the
inclusion of both the date and time values in the range. It is
better to first use a date function such as DATEPART to convert
the DATETIME value into the necessary granularity (such as day,
month, year, or day of year) and store this in a column (or columns),
which can then be indexed and used as a filtering or grouping value. This can be
done by using a persisted computed column to store the required
date part as an integer, or via a trigger.
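A minimal sketch of the persisted-computed-column approach, with a hypothetical Sales table (DATEPART is deterministic for these date parts, so the columns can be persisted and indexed):

```sql
CREATE TABLE dbo.Sales
  (SaleID INT IDENTITY PRIMARY KEY,
   SaleTime DATETIME NOT NULL,
   SaleYear AS DATEPART(YEAR, SaleTime) PERSISTED,
   SaleDayOfYear AS DATEPART(DAYOFYEAR, SaleTime) PERSISTED);
CREATE INDEX IX_Sales_Day ON dbo.Sales (SaleYear, SaleDayOfYear);

-- Filter on the indexed date parts instead of a BETWEEN over DATETIMEs
SELECT SaleID FROM dbo.Sales
WHERE SaleYear = 2017 AND SaleDayOfYear = 93;
```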
4.10 INSERT without column list
The INSERT statement need not have a column list, but omitting
it assumes certain columns in a particular order. It is likely to cause
errors if the table into which the inserts will be made is changed,
particularly with table variables where insertions are not checked.
Column lists also make code more intelligible.
5 Difficulties with
Query Syntax
5.1 Creating UberQueries (God-like
Queries)
Always avoid overweight queries (e.g., a single query with four
inner joins, eight left joins, four derived tables, ten subqueries, eight
clustered GUIDs, two UDFs and six case statements).
5.4 Using the old Sybase JOIN syntax
The deprecated syntax (which includes defining the join condition
in the WHERE clause) is not standard SQL and is more difficult to
inspect and maintain. Parts of this syntax are completely
unsupported in SQL Server 2012 or higher.
The “old style” Microsoft/Sybase JOIN style for SQL, which uses
the =* and *= syntax, has been deprecated and is no longer used.
Queries that use this syntax will fail when the database
compatibility level is 100 (SQL Server 2008) or later.
The ANSI-89 table citation list (FROM tableA, tableB) is still ISO
standard for INNER JOINs only. Neither of these styles is worth
using. It is always better to specify the type of join you require,
INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS,
which has been standard since ANSI SQL-92 was published. While
you can choose any supported JOIN style, without affecting the
query plan used by SQL Server, using the ANSI-standard syntax
will make your code easier to understand, more consistent, and
portable to other relational database systems.
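The contrast, sketched with hypothetical tables (the first query fails outright on a modern compatibility level):

```sql
-- Old Sybase outer-join syntax: unsupported at compatibility level 100+
SELECT c.Name, o.OrderID
FROM dbo.Customer c, dbo.Orders o
WHERE c.CustomerID *= o.CustomerID;

-- ANSI SQL-92 equivalent: explicit, portable and unambiguous
SELECT c.Name, o.OrderID
FROM dbo.Customer c
  LEFT OUTER JOIN dbo.Orders o
    ON c.CustomerID = o.CustomerID;
```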
5.5 Using correlated subqueries
instead of a join
Correlated subqueries, queries that run once for each row returned
by the main query, sometimes seem an intuitive approach, but
they are merely disguised cursors needed only in exceptional
circumstances. Window functions will usually perform the same
operations much faster. Most usages of correlated subqueries are
accidental and can be replaced with a much simpler and faster
JOIN query.
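A typical rewrite, using a hypothetical Orders table:

```sql
-- Correlated subquery: re-evaluated for each outer row
SELECT o.OrderID, o.Amount,
       (SELECT MAX(Amount) FROM dbo.Orders
         WHERE CustomerID = o.CustomerID) AS MaxForCustomer
FROM dbo.Orders o;

-- Window function: the same result in one pass over the data
SELECT OrderID, Amount,
       MAX(Amount) OVER (PARTITION BY CustomerID) AS MaxForCustomer
FROM dbo.Orders;
```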
5.8 Not using two-part object names
for object references
The compiler can interpret a two-part object name more quickly than a
one-part name. This applies particularly to tables, views, procedures
and functions. The same name can be used in different schemas,
so it pays to make your queries unambiguous.
It is a very good idea to get into the habit of qualifying the names of
procedures with their schema. It not only makes your code more
resilient and maintainable, but as Microsoft introduces new
features that use schemas, such as auditing mechanisms, your
code contains no ambiguities that could cause problems.
5.9 Using INSERT INTO without
specifying the columns and their
order
Not specifying column names is fine for interactive work, but if you
write code that relies on the hope that nothing will ever change,
then refactoring could prove to be impossible. It is much better
to trigger an error now than to risk corrupted results after the SQL
code has changed. Column lists also make code more intelligible.
5.11 Including complex conditionals in
the WHERE clause
It is tempting to produce queries in routines that have complex
conditionals in the WHERE clause where variables are used for
filtering rows. Usually this is done so that a range of filtering
conditions can be passed as parameters to a stored procedure or
table-valued function. If a variable is set to NULL instead of a search
term, the OR logic or a COALESCE disables the condition. If this is
used in a routine, very different queries are performed according
to the combination of parameters used or set to null. As a result,
the query optimizer must use table scans, and you end up with
slow-running queries that are hard to understand or refactor. This is
a variety of UberQuery which is usually found when some complex
processing is required to achieve the final result from the filtered
rows.
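A minimal sketch of such a ‘catch-all’ procedure, with hypothetical table and column names; OPTION (RECOMPILE) is one common mitigation, at the cost of compiling a fresh plan on each call:

```sql
-- Any parameter set to NULL disables its condition, so one cached
-- plan would otherwise have to serve every combination
CREATE PROCEDURE dbo.GetOrders
  @CustomerID INT = NULL,
  @Status VARCHAR(10) = NULL
AS
SELECT OrderID
FROM dbo.Orders
WHERE (@CustomerID IS NULL OR CustomerID = @CustomerID)
  AND (@Status IS NULL OR Status = @Status)
OPTION (RECOMPILE);
```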
5.13 Assuming that SELECT statements
all have roughly the same execution
time
Few programmers admit to this superstition, but it is apparent
by the strong preference for hugely long SELECT statements
(sometimes called UberQueries). A simple SELECT statement runs
in just a few milliseconds. A process runs faster if the individual
SQL queries are clear enough to be easily processed by the query
optimizer. Otherwise, you will get a poor query plan that performs
slowly and won’t scale.
5.15 Referencing an unindexed column
within the IN predicate of a WHERE
clause
A WHERE clause that references an unindexed column in the IN
predicate causes a table scan and is therefore likely to run far more
slowly than necessary.
See: SR0005: Avoid using patterns that start with a ‘%’ in LIKE
predicates
5.18 Supplying object names without
specifying the schema
Object names need only to be unique within a schema. However,
when referencing an object in a SELECT, UPDATE, DELETE, MERGE
or EXECUTE statement, or when calling the OBJECT_ID function,
the database engine can find the objects more easily if the
names are qualified with the schema name.
5.20 Not using NOCOUNT ON in stored
procedures and triggers
Unless you need to return messages that give you the row count
of each statement, you should specify the NOCOUNT ON option to
explicitly turn off this feature. This option is not likely to be a
significant performance factor one way or the other. Whenever
you execute a query, a short message is returned to the client with
the number of rows that are affected by that T-SQL statement.
When you use SET NOCOUNT ON, this message is not sent. This
can improve performance by reducing network traffic slightly. It is
best to use SET NOCOUNT ON in SQL Server triggers and stored
procedures, unless one or more of the applications using the
stored procedures require it to be OFF, because they are reading the
value in the message.
5.21 Using the NOT IN predicate in the
WHERE clause
Your queries will often perform poorly if your WHERE clause
includes a NOT IN predicate that references a subquery. The
optimizer will likely have to use a table scan instead of an index
seek, even if there is a suitable index. You can almost always get a
better-performing query by using a left outer join and checking for a
NULL in a suitable NOT NULLable column on the right-hand side.
Even if you’re not joining the two tables via the primary and foreign
keys, with a table of any size, an index is usually necessary to
check changes to PRIMARY KEY constraints against referencing
FOREIGN KEY constraints in other tables to verify that changes to
the primary key are reflected in the foreign key.
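The rewrite, sketched with hypothetical Product and OrderDetail tables:

```sql
-- NOT IN with a subquery: often forces a scan
SELECT p.ProductID
FROM dbo.Product p
WHERE p.ProductID NOT IN
  (SELECT o.ProductID FROM dbo.OrderDetail o);

-- Left outer join, testing a NOT NULLable column on the right for NULL
SELECT p.ProductID
FROM dbo.Product p
  LEFT OUTER JOIN dbo.OrderDetail o ON o.ProductID = p.ProductID
WHERE o.OrderID IS NULL;
```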
5.23 Using a non-SARGable (Search
ARGument..able) expression in a
WHERE clause
In the WHERE clause of a query it is good to avoid having a column
reference or variable embedded within an expression, or used as
a parameter of a function. A column reference or variable is best
used as a single element on one side of the comparison operator,
otherwise it will most probably trigger a table scan, which is
expensive in a table of any size.
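The classic example, assuming a hypothetical Orders table with an index on OrderDate:

```sql
-- Non-SARGable: the function wrapped around the column forces a scan
SELECT OrderID FROM dbo.Orders
WHERE YEAR(OrderDate) = 2017;

-- SARGable: the bare column on one side of the comparison can use the index
SELECT OrderID FROM dbo.Orders
WHERE OrderDate >= '20170101' AND OrderDate < '20180101';
```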
5.25 Using an unverified scalar user-
defined function as a constant.
The incorrect use of a non-schema bound scalar UDF, as a global
database constant, is a major performance problem and must be
winkled out of any production code. The problem arises because
SQL Server doesn’t trust non-schema verified scalar functions as
being precise and deterministic, and so chooses the safest, though
slowest, option when executing them. It’s a slightly insidious
problem because it doesn’t really show its full significance in the
execution plan, though an Extended Events session will reveal what
is really going on.
5.27 Using NOT IN with an expression
that allows null values
If you are using a NOT IN predicate to select only those rows that
match the results returned by a subquery or expression, make sure
there are no NULL values in those results. Otherwise, your outer
query won’t return the results you expect. In the case of both IN and
NOT IN, it is better to use an appropriate outer join.
5.29 An UPDATE statement has omitted
the WHERE clause, which would
update every row in the table
It is very easy to update an entire table, over-writing the data in it,
when you mean to update just one or more rows. At the console,
Delete or Update statements should also be in a transaction so
you can check the result before committing.
6 Problems with
naming
6.1 Excessively long or short identifiers
Identifiers should help to make SQL readable as if it were English.
Short names like t1 or gh might make typing easier but can cause
errors and don’t help teamwork. At the same time, names should
be names and not long explanations. Remember that these are
names, not documentation. Long names can be frustrating to the
person using SQL interactively, unless that person is using SQL
Prompt or some other IntelliSense system, though you can’t rely
on it.
6.4 Using reserved words in names
Using reserved words makes code more difficult to read, can cause
problems to code formatters, and can cause errors when writing
code.
6.7 Using square brackets
unnecessarily for object names
If object names are valid and not reserved words, there is no need
to use square brackets. Using invalid characters in object names
is a code smell anyway, so there is little point in using them. If you
can’t avoid brackets, use them only for invalid names.
7 Problems with
routines
7.1 Including few or no comments
Being antisocial is no excuse. Neither is being in a hurry. Your scripts
should be filled with relevant comments, 30% at a minimum. This is
not just to help your colleagues, but also to help you-in-the-future.
What seems obvious today will be as clear as mud tomorrow,
unless you comment your code properly. In a routine, comments
should include intro text in the header as well as examples of
usage.
7.3 Excessively ‘overloading’ routines
Stored procedures and functions are compiled with query plans.
If your routine includes multiple queries and you use a parameter
to determine which query to run, the query optimizer cannot come
up with an efficient execution plan. Instead, break the code into a
series of procedures with one ‘wrapper’ procedure that determines
which of the others to run.
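A sketch of the wrapper pattern, with hypothetical procedure and table names; each inner procedure keeps its own well-fitted plan:

```sql
CREATE PROCEDURE dbo.GetCustomerByID @ID INT
AS
SELECT Name FROM dbo.Customer WHERE CustomerID = @ID;
GO
CREATE PROCEDURE dbo.GetCustomerByName @Name VARCHAR(50)
AS
SELECT Name FROM dbo.Customer WHERE Name LIKE @Name;
GO
-- The wrapper merely decides which single-purpose procedure to run
CREATE PROCEDURE dbo.GetCustomer
  @ID INT = NULL, @Name VARCHAR(50) = NULL
AS
IF @ID IS NOT NULL
  EXEC dbo.GetCustomerByID @ID;
ELSE
  EXEC dbo.GetCustomerByName @Name;
```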
7.6 Creating a Multi-statement table-
valued function, or a scalar
function when an inline function is
possible
Inline table-valued Functions run much quicker than a Multi-
statement table-valued function, and are also quicker than scalar
functions. Obviously, they are only possible where a process can be
resolved into a single query.
7.9 High cyclomatic complexity
Sometimes it is important to have long procedures, maybe with
many code routes. However, if a high proportion of your procedures
or functions are excessively complex, you’ll likely have trouble
identifying the atomic processes within your application. A high
average cyclomatic complexity in routines is a good sign of
technical debt.
7.12 Using Cursors
SQL Server originally supported cursors to more easily port dBase
II applications to SQL Server, but even then, you can sometimes use
a WHILE loop as an effective substitute. However, modern versions
of SQL Server provide window functions and the CROSS/OUTER
APPLY syntax to cope with most of the traditional valid uses of the
cursor.
7.15 Excessive use of the WHILE loop
A WHILE loop is really a type of cursor. Although a WHILE loop can
be useful for several inherently procedural tasks, you can usually
find a better relational way of achieving the same results. The
database engine is heavily optimised to perform set-based
operations rapidly. Don’t fight it!
7.18 Forgetting to set an output variable
The values of the output parameters must be explicitly set in all
code paths, otherwise the value of the output variable will be NULL.
This can result in the accidental propagation of NULL values. Good
defensive coding requires that you initialize the output parameters
to a default value at the start of the procedure body.
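A small sketch of the defensive pattern, with a hypothetical Orders table; the SET at the top guarantees a non-NULL value on every code path:

```sql
CREATE PROCEDURE dbo.GetLatestOrder
  @CustomerID INT,
  @OrderID INT OUTPUT
AS
SET @OrderID = 0; -- initialise the output so no code path leaves it NULL
IF @CustomerID IS NOT NULL
  SELECT @OrderID = COALESCE(MAX(OrderID), 0)
  FROM dbo.Orders
  WHERE CustomerID = @CustomerID;
```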
7.21 Use of a Hardcoded current
database name in a procedure call
You only need to specify the database when calling a procedure
in a different database. It is better to avoid using hardcoded
references to the current database as this causes problems if
you later do the inconceivable by changing the database’s name
or cut-and-pasting a routine. There is no performance advantage
whatsoever in specifying the current database if the procedure is in
the same database.
7.23 Creating a routine with ANSI_
NULLS or QUOTED_IDENTIFIER
options set to OFF
At the time the routine is created (parse time), both options should
normally be set to ON. They are ignored on execution. The reason
for keeping Quoted Identifiers ON is that it is necessary when you
are creating or changing indexes on computed columns or indexed
views. If set to OFF, then CREATE, UPDATE, INSERT, and DELETE
statements on tables with indexes on computed columns or
indexed views will fail. SET QUOTED_IDENTIFIER must be ON when
you are creating a filtered index or when you invoke XML data type
methods. ANSI_NULLS will eventually be set to ON and this ISO
compliant treatment of NULLS will not be switchable to OFF.
7.26 Using the CHARINDEX function in a
WHERE Clause
Avoid using CHARINDEX in a WHERE clause to match strings if you
can use LIKE (without a leading wildcard expression) to achieve the
same results.
7.29 Using SET ROWCOUNT to specify
how many rows should be returned
We had to use this option until the TOP clause (with ORDER BY)
was implemented. The TOP option is much easier for the query
optimizer.
CREATE FUNCTION dbo.CurrencyTable (@Region VARCHAR(20) = '%')
--returns the currency for the region, supports wildcards
--SELECT * FROM dbo.CurrencyTable(DEFAULT) returns all
--SELECT * FROM dbo.CurrencyTable('%Slov%')
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
  (
  SELECT TOP 100 PERCENT CountryRegion.Name AS country,
         Currency.Name AS currency
  FROM Person.CountryRegion
    INNER JOIN Sales.CountryRegionCurrency
      ON CountryRegion.CountryRegionCode
         = CountryRegionCurrency.CountryRegionCode
    INNER JOIN Sales.Currency
      ON CountryRegionCurrency.CurrencyCode = Currency.CurrencyCode
  WHERE CountryRegion.Name LIKE @Region
  ORDER BY Currency.Name
  );
7.32 Duplicating names of objects of
different types
Although it is sometimes necessary to use the same name for the
same type of object in different schemas, it is never necessary to
do it for different object types and it can be very confusing. You
would never want a SalesStaff table and SalesStaff view and
SalesStaff stored procedure.
7.35 SELECT statement in trigger that
returns data to the client
Although it is possible to do, it is unwise. A trigger should never
return data to a client. It is possible to place a SELECT statement in
a trigger, but it serves no useful practical purpose and can have
unexpected effects. A trigger behaves much like a stored
procedure in that, when the trigger fires, results can be returned
to the calling application. This requires special handling because
these returned results would have to be handled in some way,
and this would have to be written into every application in which
modifications to the trigger table are allowed.
7.38 Using EXECUTE(string)
Don’t use EXEC to run dynamic SQL. It is there only for backward
compatibility and is a commonly used vector for SQL injection. Use
sp_executesql instead because it allows parameter substitutions
for both inputs and outputs and also because the execution plan
that sp_executesql produces is more likely to be reused.
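The contrast, sketched with a hypothetical Customer table:

```sql
-- Injection-prone: the search term is concatenated into the statement
DECLARE @Name NVARCHAR(50) = N'O''Brien';
EXECUTE (N'SELECT * FROM dbo.Customer WHERE Name = '''
         + @Name + N'''');

-- Parameterised: the value never becomes part of the SQL text,
-- and the plan for the statement can be reused
EXECUTE sp_executesql
  N'SELECT * FROM dbo.Customer WHERE Name = @Name',
  N'@Name NVARCHAR(50)',
  @Name = @Name;
```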
7.41 Using unnecessary three-part and
four-part column references in a
select list
Sometimes, when a table is referenced in another database or
server, programmers believe that the two or three-part table name
needs to be applied to the columns. This is unnecessary and
meaningless. Just the table name is required for the columns.
Three-part column names might be necessary in a join if you have
duplicate table names, with duplicate column names, in different
schemas, in which case, you ought to be using aliases. The same
goes for cross-database joins.
7.44 Use of BEGIN TRANSACTION
without ROLLBACK TRANSACTION
ROLLBACK TRANSACTION rolls back a transaction to the beginning
of it, or to a savepoint inside the transaction. You don’t need a
ROLLBACK TRANSACTION statement within a transaction, but if
there isn’t one, then it may be a sign that error handling has not
been refined to production standards.
7.47 Not defining a default value for a
SET assignment that is the result of
a query
If a variable’s SET assignment is based on a query result and the
query returns no rows, the variable is set to NULL. In this case, you
should assign a default value to the variable unless you want it to
be NULL.
7.50 Not putting all the DDL statements
at the beginning of the batch
Don’t mix data manipulation language (DML) statements with
data definition language (DDL) statements. Instead, put all the DDL
statements at the beginning of your procedures or batches.
7.53 Literal type is not fully compatible
with procedure parameter type
A parameter passed to a procedure can be a literal (e.g. 1, ‘03
jun 2017’ or ‘hello world’) but it must be possible to cast it
unambiguously to the variable datatype declared for that parameter
in the body of the routine.
7.56 Use of the position notation after
the named notation for parameters
when calling a procedure
Parameters can be passed by position in a comma-delimited list, or
by name, but it is a bad idea to mix the two methods even when it is
possible. If a parameter has a default value assigned to it, it can be
left out of the parameter list, and it is difficult to check whether the
values you supply are for the parameters you intend.
7.58 Procedure parameter is not defined
as OUTPUT, but marked as OUTPUT
in procedure call statement
Output scalar parameters for procedures are passed to the
procedure, and can have their value altered within the procedure.
This allows procedures to return scalar output. The formal
parameter must be declared as an OUTPUT parameter if the actual
parameter that is passed has the OUTPUT keyword; otherwise an
error is raised.
7.60 Number of passed parameters
exceeds the number of procedure
parameters
Parameters can be passed to procedures and functions in an
ordered, delimited list, but the list can never have more members
than there are parameters. For a function, it must have the same
number of members as there are parameters; for a procedure, you
can pass fewer if defaults are declared for the missing parameters.
8 Security Loopholes
8.3 Authentication set to Mixed Mode
Ensure that Windows Authentication Mode is used wherever
possible. SQL Server authentication is necessary only when a
server is remote or outside the domain, or if third-party software
requires SQL authentication for remote maintenance. Windows
Authentication is less vulnerable, and avoids having to transmit
passwords over the network or store them in connection strings.
Acknowledgements
For a booklet like this, it is best to go with the
established opinion of what constitutes a SQL Code
Smell. There is little room for creativity. In order to
identify only those SQL coding habits that could, in some
circumstances, lead to problems, I must rely on the help
of experts, and I am very grateful for the help, support
and writings of the following people in particular.