SQL Performance Tuning
Technical Journal for the PASS Community
INTRODUCTION TO PERFORMANCE TUNING ON SQL SERVER . . . 6
USING T-SQL CODEGEN FOR AUDIT TRAILS . . . 14
PERFORMANCE TUNING USING SQL PROFILER . . . 18
UNDERSTANDING QUERY EXECUTION PLANS TO OPTIMIZE PERFORMANCE . . . 22
SQL CLR - HOW, WHY AND WHY NOT . . . 27
DBA 101 - PERFORMANCE TUNING 101 . . . 33
THE PROFESSIONAL ASSOCIATION FOR SQL SERVER (PASS) PROVIDES A WIDE ARRAY OF YEAR-ROUND BENEFITS
JOIN A THRIVING COMMUNITY WITH MORE THAN 11,000 MEMBERS WORLDWIDE!
As a member of PASS, you receive a number of benefits that support your SQL Server user needs. Whether you're looking for substantial savings on SQL Server-related products, services to support current business initiatives, or educational opportunities, PASS membership benefits you on a number of SQL Server-focused fronts.
Educational Value
Discount to the 2007 PASS Community Summit: PASS Premium members receive a $200 USD discount to the LARGEST event of the year dedicated exclusively to SQL Server education and training: the 2007 PASS Community Summit, September 18-21 in Denver, Colorado. Register early to save!
Subscription to SQL Server Standard Magazine: With articles that appeal to developers, DBAs and Business Intelligence professionals, PASS members have access to the information and tools to help them develop their careers. Members receive 6 issues per year. International members are able to access the most current editions online.
Networking Value
Chapters (Regional User Groups): PASS provides a network of 100 official chapters/affiliated groups worldwide that offer valuable education and networking opportunities on a local level. For more information on finding a chapter in your area or starting a chapter, please contact pass_chapters@sqlpass.org.
Special Interest Groups (SIGs): PASS members have the option to join a variety of Special Interest Groups (SIGs) including DBA, AppDev and BI. SIGs connect PASS members from around the globe who have similar interests and face similar challenges. Visit http://sigs.sqlpass.org
Online Value
Access to Online Conference Proceedings: Only PASS members have access to the extensive source of SQL Server information from MVPs, Microsoft developers and user-experts who have presented at previous PASS user events.
Job Target: This new online career resource helps connect employers looking for qualified employees and professionals looking for a new opportunity.
PASS offers two levels of membership: Premier ($150/year) and Basic (FREE online membership). For more information or to join, visit http://www.sqlpass.org/membership/. Check out the new PASS SIG Web site today! Visit http://sigs.sqlpass.org
Register for the SIG Web (free of charge) and gain access to all of these great tools:
Book Reviews to keep you up-to-speed on useful industry resources
SQL Server articles and interviews with influential industry leaders
Blogs allowing you to voice your opinions and share information
Educational Webcasts
And much more!
How can you get involved? Submit an article, script, link, write a book review or volunteer. There are many ways to get involved! Register on the SIG Web site and find out how!
PERFORMANCE TUNING
When it comes to working with databases, performance tuning is one of those things that everyone eventually needs to think about. My personal opinion is that I'd rather think about it well before my systems go into production. I make sure that I am thinking about performance when I am designing my data structures, when I am writing my queries, and as I am going through quality assurance.

Even with keeping all of this in perspective, there are always things that are missed. While the structures and queries that are designed for applications might be optimal when the system rolls out, time changes the performance curve of a system. Unexpected usage patterns, increased data volumes and hardware restrictions all come into play in a production system. How often do you intentionally induce index fragmentation into a test environment to see how your application is going to perform? If you don't, try it some time. It can be a real eye-opener.

If your primary focus is that of a database developer, you need to make sure that you work with your production staff to ensure that they have the knowledge they need to support your systems in a production environment. If, on the other hand, your primary focus is that of a production database administrator, make sure that you engage your developers early in the course of a project so that your physical environment can be factored in when the system is designed. Communication is the key here.

Tuning databases is as much an art as it is a science. If you have the luxury of a test environment that closely matches your production environment from a specification standpoint, my suggestion would be to try out a few different ways of doing things. Indexes usually give you the most bang for the tuning buck, but they do come with a price: additional overhead when writing data. Rewriting queries can completely change the way that things perform. If you have a slow query, write it a few different ways and compare the performance. These are just a few ideas to get you started.

I hope that some of the ideas put forth in this issue will help you walk a little further down the database tuning road. If you have any comments, please send them to me at editorial@sqlpass.org. Happy tuning!
Editor In Chief: Chuck Heinzelman
Managing Editor: Susan Page
Copy Editor: Susan Page
Tech Editors: Darren Lacy, Kathi Kellenberger, Frank Scafidi, Adam Machanic
Graphic Design: Erin Agresta
Printing: NBS-NorthBrook Services
Advertising: Lesley MacDonald (lesley@ccevent.com)
Subscriptions and address changes: Wayne Snyder (wayne.snyder@sqlpass.org)
Feedback: Chuck Heinzelman (chuck.heinzelman@sqlpass.org)
Copyright: Unless otherwise noted, all programming code and articles in this issue are the exclusive copyright of The Professional Association for SQL Server (PASS). Permission to photocopy for internal or personal use is granted to the purchaser of the magazine. SQL Server Standard is an independent publication and is not affiliated with Microsoft Corporation. Microsoft Corporation is not responsible in any way for the editorial policy or other contents of this publication. SQL Server, ADO.NET, Windows, Windows NT, Windows 2000 and Visual Studio are registered trademarks of Microsoft Corporation. Rather than put a trademark symbol in each occurrence of other trademarked names, we state that we are using the names only in an editorial fashion with no intention of infringement of the trademark. Although all reasonable attempts are made to ensure accuracy, the publisher does not assume any liability for errors or omissions anywhere in this publication. It is the reader's responsibility to ensure that the procedures are acceptable in the reader's environment and that proper backup is created before implementing any procedures.
Chuck Heinzelman
PASS Director of Technical Publications
If you are interested in writing an article for the SQL Server Standard, please contact me at editorial@sqlpass.org. I'll get you a copy of the editorial calendar, which includes the editorial focus for each of the next few issues and the deadlines for article submissions.
Database Maintenance for SQL Server 2005
Presented by Andrew Kelly
This session is intended to walk the attendees through all of the different aspects of database maintenance for SQL Server 2005. It will cover not only the types of tasks but also solid examples of the code and techniques used to perform them. This will include, but not be limited to, integrity checks, reindexing, backups, clean up, monitoring, updating stats, and error and log checking.
TABLE OF CONTENTS
INTRODUCTION TO PERFORMANCE TUNING ON SQL SERVER . . . . . . . . . . . . . . . . . . . . . . . . . . .6
If you are new to performance tuning, it can seem quite a daunting task. This article will give you a good idea of where to look to determine the root cause of your performance problems, as well as some methods for solving them. By Wayne Fillis
Introduction
Performance Tuning in SQL Server is an art, and so involved that some DBAs specialize in this field. Every system developed has scope for improvement, and it is usually the DBA's job to fix the inevitable performance problems that develop over time. I can't say that I am a SQL Server guru, or even an expert, but I hope to share with you some of the things I have learned over the past 7 years that I have been involved with SQL Server. Most of what I have to say focuses on SQL Server 2000, but it is also relevant for SQL Server 2005. Arguably, the 2 main areas of performance tuning that a DBA would focus on are: 1. Hardware tuning (the Big Three); 2. Query tuning. Let's start with The Big Three: Memory, IO and CPU.
Memory (RAM)

Memory Allocation
SQL Server allocates memory on an as-needed basis, but will basically take as much memory as it can. Memory is primarily used for the data buffer cache, as well as a cache for compiled queries and stored procedures. Memory speeds up data access, because if SQL Server can find the requested data in memory it does not need to generate an expensive disk access to retrieve the data. Memory is also used for a variety of other processes.

32-bit hardware
Each Windows application running on 32-bit architecture is only able to access 4 GB of RAM. Two GB is reserved for the Operating System (OS), and 2 GB is used by the application, which in our case is SQL Server. By using the /3GB switch in the boot.ini file, you can change this behavior and reserve 3 GB for SQL Server and 1 GB for the OS.

AWE
SQL Server 2000 Enterprise and Developer Edition is AWE enabled. AWE stands for Address Windowing Extensions, and is available on Windows 2000/2003 Advanced Server and Datacenter Server. By enabling AWE on the OS (using the /PAE switch in boot.ini), and in SQL Server (in the configuration options), you will be able to access 8 GB of RAM with Windows 2000 Advanced Server (32 GB with Windows 2003 Advanced Server) and 64 GB with Windows 2000 Datacenter Server. For more information, see the article "Enabling AWE Memory for SQL Server" at http://msdn2.microsoft.com/en-us/library/ms190673.aspx.

You also need to bear in mind that there is a performance overhead to using AWE, as the memory addresses need to be mapped to use the higher range of RAM. Also, only the relational engine data cache can use AWE memory. This can be avoided completely by using 64-bit hardware.
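For reference, here is a minimal sketch of what turning AWE on from T-SQL looks like. The article only mentions "the configuration options", so the exact statements below are my illustration, and the instance still needs a restart before AWE memory is actually used:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'awe enabled', 1
RECONFIGURE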
64-bit hardware
64-bit Itanium 2 hardware running Windows Server 2003 can directly address up to 1,024 GB of physical memory, and 512 GB of addressable memory. SQL Server 2000 (64-bit edition) can access up to 512 GB. Furthermore, all parts of SQL Server can use this memory. A Microsoft article entitled "SQL Server 2000 Enterprise Edition (64-bit): Advantages of a 64-Bit Environment" can help you make a choice between 32-bit and 64-bit platforms. Note that in the past few years, SQL Server 2005's increased compatibility with newer non-Itanium 64-bit platforms has opened up huge possibilities for scaling out your systems. Watch this emerging market carefully; it could take your applications to new levels.

Performance Monitor
You can use Performance Monitor (perfmon) from Control Panel / Administrative Tools to monitor the Buffer Cache Hit Ratio counter (under Buffer Manager) to see if data pages are being flushed from memory because there is simply not enough memory to store the data pages for a long enough period. If you monitor the perfmon counters Batch Requests/sec, SQL Compilations/sec, and SQL Re-Compilations/sec (under SQL Statistics), you can see if your stored procedures are being flushed from the procedure cache too quickly due to memory constraints. You can also look for recompilations of stored procedures in the system table syscacheobjects.

Adding more memory
Adding more memory to your server can often solve many performance problems, but try to first improve your worst queries and any application design issues. Treat the cause, not the symptom.
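As a rough illustration of that last point, a query along these lines (a sketch of mine, not from the article) shows what is sitting in the procedure cache on SQL Server 2000. Many single-use ad hoc entries and few high-usecount plans suggest plans are not being reused:

select cacheobjtype, objtype, usecounts, sql
from master..syscacheobjects
order by usecounts desc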
IO

Upgrading your disks
Disk usage, or IO, problems are possibly the easiest to diagnose and at the same time the most difficult part of your system to remedy. If you are not using a disk array or a SAN (Storage Area Network), adding or upgrading your physical disks can be a time consuming task, involving considerable application down-time. Even with a disk array or SAN, down-time is often inevitable.

Tricks to minimize impact
However, there are some tricks you can use to minimize the hit on your disk subsystem. A company I used to work at had recurrent disk problems on the same day of every week: the day reporting processes were run. The first thing we did was to run perfmon and monitor the counters called % Idle Time and Avg. Disk Queue Length (under Physical Disk). You will always see some queue length (this indicates that the disks cannot handle requests fast enough), but if the idle time or queue length is worse than normal then you can assume your disks are being thrashed. This is not good, as it will have a ripple effect throughout your applications. You may even notice your CPU dropping as SQL Server is throttled back by the slow disks. At this point, I think it is a good time to note that you should ideally run perfmon on your live servers frequently, so you can see what statistics you get under normal load. Then, when the system is behaving poorly, you will recognize anomalies more clearly.

Idle Time
At this company we noticed the idle time was consistently 100% on the data disks. We had already moved the transaction logs onto separate drives as a performance enhancement, but the disk being hammered was the main disk where our data files were being stored. A DBA noted that the tempdb database was also stored on these same disks, and the reporting processes made heavy use of sorting; hence, they used tempdb. We realized that tempdb was growing steadily in size. A quick check revealed that, to our horror, the auto growth size for our tempdb log file was set to 1 MB. It is an expensive IO process to grow a log or data file, and this was causing our idle time to hit the roof. We went through all our data and log files, and ensured the autogrow was set to 20%. If you have inherited a system, you might assume that the environment is configured correctly; but be warned, it is not always so.

Tempdb optimization
If your applications make heavy use of tempdb, you can optimize by moving the database onto its own mirrored drives, and by adding data and log files so they equal the number of logical processors (CPUs). SQL Server will spread the load across the data files, thus reducing overhead when creating new objects. I have been told you can also get around the bottleneck of creating new objects by allocating a tempdb data file of greater than 4 GB, though I have not tested this. Be warned, though: there is a bug in SQL Server 2000 when you have multiple tempdb data files. Whenever SQL Server is restarted the autogrow is set to 1 MB. You can remedy this with a SQL Agent job which runs when SQL Server starts and resets the auto grow to 20% (a sketch of such a statement appears at the end of this IO section).

The problem is with scans
Queries which do Index Scans, Table Scans and Clustered Index Scans on large tables or indexes also use a lot of IO and memory (the data read needs to be stored in memory). Occasionally a scan is the best technique for reading data, but a new index can often remove the need for the scan. If your query scans a very large table, you are performing a large amount of potentially unnecessary IO. You can monitor the level of scans on your system by using the perfmon counter Access Methods / Full Scans/sec, and by using SQL Profiler events Scan:Started and Scan:Stopped. I will cover this in more detail later in the article.
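Picking up the autogrowth fix mentioned above, a SQL Agent startup job could run statements along these lines. This is only a minimal sketch: the logical file names tempdev and templog are the installation defaults and may well differ on your server.

ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILEGROWTH = 20%)
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILEGROWTH = 20%)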
CPU
CPU intensive
SQL Server is also CPU intensive, though this largely depends on the nature of your code and the access method chosen by SQL Server.

Compiles
One thing I have noticed that uses a significant amount of CPU is compiles. Whenever a stored procedure is run for the first time the code is compiled and the access method (for example, what indexes to use) is determined. This is a very CPU-intensive process. I once saw a stored procedure take 18 seconds to compile, and 1 second to run. SQL Server will cache compiled plans and reuse them. Using sp_executesql as a parameterized query can often result in the plan being reused, but SQL code embedded in your application will almost always compile every time it runs. Monitoring the perfmon counters Batch Requests/sec, SQL Compilations/sec, and SQL Re-Compilations/sec (under SQL Statistics) will show if you have an excessive number of compiles taking place. Try converting embedded SQL to stored procedures, and you will find your number of compiles will drop.
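To illustrate the sp_executesql point, a parameterized call looks something like the sketch below; the table and column names are purely illustrative. Because the statement text stays identical from call to call, SQL Server can reuse the cached plan for different parameter values:

DECLARE @CustomerID int
SET @CustomerID = 42
EXEC sp_executesql
    N'SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = @CustomerID',
    N'@CustomerID int',
    @CustomerID = @CustomerID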
many rows as possible for each page read. This reduces your IO. Fill Factor is used to keep free space available on the data pages, which is used by inserts and updates. If there is no free space on a data page and you need to insert a new row, then the page will split to make room for the new data. This is an IO intensive operation and causes fragmentation. The balance between a low fill factor (more free space) and a high fill factor is a balance between performance for reading, and performance for inserts and updates. One company I used to work at got the fill factor incorrect in their weekly Maintenance Plan to defragment the database's indexes. Instead of a fill factor of 90% (this means 10% free), the fill factor was accidentally set to 10% (90% free). The next morning the system was performing poorly. In fact, the users could not perform simple transactions against the database. A DBA noticed that the size of the database had grown much larger, and this was due to the increased amount of free space on the data pages. The number of IOs against the database had significantly increased, as multiple IOs were needed to retrieve the same data one IO would have taken before the defragmentation took place. I recommend the use of fill factor, but try to keep it around 80-90%. Alternatively, leave it set at zero for SQL Server to maintain fill factor itself.

Covering Index
When SQL Server processes a query it uses the index to find the data page, then reads the page to access the remainder of the columns that the query needs. A Covering Index is used to radically improve performance of a query by including all these extra columns in the index itself. The result is a bloated index that can at times be almost the same size as the table itself. There will be an overhead for deletes, updates and inserts to the table, but the select query will usually be improved substantially. When adding covering indexes, you will need to make a judgment call regarding performance between updates vs. reads.

Avoiding Scans
As mentioned previously, a scan could either be a Table Scan, Index Scan, or a Clustered Index Scan. Scans involve reading all or part of a table or index from start to finish. Sometimes, if the table being scanned is small, a scan is a more efficient way to read the data than using an index. The same holds true if your query selection criteria includes a high percentage of the rows in the table (for example, a search on Gender = 'Male'). A Scan can often generate a high volume of IO, though this depends on the size of the table or index being scanned. You can identify when scans are running by monitoring the Scan:Started and Scan:Stopped events in SQL Profiler, and by using the perfmon counter SQL Server:Access Methods / Full Scans/sec. You can see the impact of a heavy scan by looking at the perfmon counters Physical Disk / Avg. Disk Queue Length and % Idle Time. If the Queue Length increases, and the % Idle Time drops significantly at the same time that Full Scans/sec increases, then the scan is likely to be causing an IO problem. The scan could also be flushing data out of your data cache (which resides in memory), and causing a subsequent memory overload. The trick is to identify which queries are causing the scan. To do this, run a SQL Profiler trace at the same time you run perfmon, and try to limit the trace by filtering on CPU or Duration (in milliseconds) and Reads to pick up the expensive queries. The query that is running at the exact time you see a significant dip in % Idle Time is potentially causing IO problems.

Common pitfalls
A colleague of mine once advised me that before I run execution plans or look at runtimes of queries, to just take a look at the query code and check for some obvious mistakes. Here is a list of potential problems to look out for:
1. Remove NOT IN. If the query inside the NOT IN returns a large number of rows, you are going to hit the tempdb database heavily and invoke excessive IOs. I recently saw a junior DBA write a NOT IN which grew tempdb to 20 GB, and it would have continued to run if the disk drive did not run out of disk space and crash the query. Replace your NOT IN with a LEFT OUTER JOIN (see the sketch after this list);
2. Remove User Defined Functions and system functions in the WHERE clause. While UDFs in the SELECT clause can provide excellent performance enhancements, a UDF in the WHERE clause can kill your query. The worst part of this is that the Estimated Execution Plan does not show the cost of UDFs in the total subtree cost. What is going to happen if you use a UDF or system function in your WHERE clause is that potentially every row is going to be passed through the function, and the result will be used to filter your data. If this equates to thousands of rows, your query is going to run for a while. You may get better performance by applying the function in a pre-select, placing the result into an indexed table variable, and using the table variable in your query;
3. Unnecessary ORDER BY can cause excessive IOs by sorting. Remove ORDER BY if you do not need it;
4. Limit the number of columns you are returning. This will reduce IO and network traffic;
5. Search queries with LIKE should be avoided where possible, especially LIKE on varchar(2000) fields, for example. Rather, set up a clustered index on the field you are searching on, or configure Full-Text Search;
6. LEFT OUTER JOIN can potentially slow down your query, especially if the field you are joining on contains many rows of NULL data. If you can use INNER JOIN, then use it instead of LEFT OUTER JOINs or RIGHT OUTER JOINs;
7. If you are joining on a field that contains many rows of NULL data, you could see your query performing badly. Try not to join on that field.
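As a sketch of the rewrite suggested in pitfall 1 (the table names here are hypothetical), the LEFT OUTER JOIN form returns the same rows as the NOT IN form as long as the joined column is not nullable:

-- NOT IN version
SELECT c.CustomerID
FROM dbo.Customers AS c
WHERE c.CustomerID NOT IN (SELECT o.CustomerID FROM dbo.Orders AS o)

-- LEFT OUTER JOIN version
SELECT c.CustomerID
FROM dbo.Customers AS c
LEFT OUTER JOIN dbo.Orders AS o ON o.CustomerID = c.CustomerID
WHERE o.CustomerID IS NULL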
and the columns being retrieved. SQL Server 2005 Management Studio shows more detailed information than SQL Server 2000's Query Analyzer. A thick line joining icons indicates large volumes of data being moved around, and could highlight places where Scans are taking place. Hash Joins are bad, and can often be caused by a Scan at a previous step in the plan. An important part of the plan is that each icon shows the percentage of cost that the step takes up in the entire query. You can quickly see which sections of the query form the most expensive and time consuming portion, and this helps to resolve the most important issues first. To see the overall cost of the query, hover the mouse pointer over the top-left-most icon. If the subtree cost displayed is less than one, then this query will be effective for a front-end (GUI) and web-based application requiring quick access times. Costs of between 1 and 3 are adequate, but could be tuned. A cost over 3 or 5 is potentially bad. For a back-end process, I have seen costs over 30 or 80 for search or reporting stored procedures. The real-time runtime of the query depends on the hardware it runs on, so a cost of 5 on one server could run for the same time that a lower cost query runs on a slower server. It all depends; trial and error is the name of the game here.

Figure 1: Estimated Execution Plan

The Database Engine Tuning Advisor (DTA) is a great tool, but it seems to have some bugs. When running the tool against a SQL 2000 database I get some error messages, but it still seems to do the job correctly. When run against a large trace file, the tool doesn't work at all. However, the DTA tool will take a query or short profiler trace and quickly recommend indexes and statistics to improve your query. The quickest way to use the tool is to open your query or EXEC statement in SQL Server 2005 Management Studio (SSMS), highlight the query, right-click and select Analyze Query in Database Engine Tuning Advisor. Once the tool opens, select your database in the Database for Workload Analysis drop-down, tick your database in the list of available databases, and click Start Analysis. When analysis has completed, you can select the recommendations one after the other and copy them to the clipboard. Recommendations will be indexes or statistics you can add to improve the query. The trick here is to run the recommendations one after the other, and to check the Estimated Execution Plan between each recommendation. Some recommendations make no difference to the cost of the query and can be ignored.

SET SHOWPLAN

The SET SHOWPLAN_ALL ON command is an alternative to the Estimated Execution Plan. To use the command, enter it in the Query Analyzer or SSMS query window just before your query. When you run the commands, your query is not actually run. Instead, you are presented with a detailed analysis of how the statements will be executed. I like to cut and paste the results into Excel, as I find it easier to navigate than the Query Results window.
Once the results are in Excel, you can easily see the same information as the Estimated Execution Plan, but in text format. I sometimes find SHOWPLAN easier to work through for complicated queries than the graphical Estimated Execution Plan.
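For reference, the usage pattern looks like this minimal sketch (the query itself is illustrative; SET SHOWPLAN_ALL must be the only statement in its batch, hence the GO separators):

SET SHOWPLAN_ALL ON
GO
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = 42
GO
SET SHOWPLAN_ALL OFF
GO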
SET STATISTICS

The SET STATISTICS IO ON command works in a similar fashion to the SHOWPLAN commands described previously, except that in this case your query is actually run. This command shows the disk activity taking place when your query is run. SET STATISTICS TIME ON is a command which shows the time required to parse, compile and execute your query.

Profiler

The SQL Profiler is a great tool for tracing your database and for seeing what queries are actually running. You can filter the trace on database, user, and a number of other criteria. There are a variety of different events and columns you can log, but generally the default settings are good enough. If you are looking for the worst queries, experiment with filtering on CPU, Duration and Reads. A colleague of mine uses CPU, but I generally filter on Duration greater than 1 second. This is not always indicative of a problem, as a query that runs for less than one second on the test server could run for 5 seconds on the production server under heavy load, or 20 seconds under abnormal load. If you experiment with the settings, you will be able to identify the queries causing the most load on your system. These are the queries you should be tuning.

Figure 5: Profiler

Monitoring Locks

An article on performance tuning is never complete without a discussion on locks and blocks. Generally speaking, locks are good and very necessary, but blocks are usually bad. Run the command "select * from master..sysprocesses where blocked > 0 and spid <> blocked" to see if blocks are taking place, and use sp_who2 to see more information about the spid. DBCC INPUTBUFFER (spid) will show you the command being executed by the spid. I am not going to go into too much detail here. What I will say is that blocking is sometimes an indication of a performance problem on your system. If your system is not letting queries through fast enough and they start blocking, then you need to treat the cause. Blocking is generally just the symptom, but it is a good indication that something is going wrong somewhere.
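As a quick sketch of that workflow (the spid value below is a placeholder you would take from the first query's output):

SELECT spid, blocked, waittime, lastwaittype
FROM master..sysprocesses
WHERE blocked > 0 AND spid <> blocked

EXEC sp_who2             -- more detail on each spid involved
DBCC INPUTBUFFER (53)    -- 53 is a placeholder spid taken from the output above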
Conclusion
Sometimes when you have identified a performance problem, you just need more powerful hardware. I would take a step back first and take a look at the Big Three discussed previously, as well as your queries. If you can tune your worst queries by 10%, then you have been able to improve the overall performance of your system. Each query that performs badly impacts negatively on every other query running at the same time, whether that query is part of your application or another application running on the same server. Query tuning works on the 80/20 rule: 80% of the work is used to tune 20% of the code, but once you have done that 20% your systems should be running better than ever before. Lastly, check out the article "SQL Server 2005 Waits and Queues"; this is my bible for performance tuning. You will find the link under the References section.
References
Professional SQL Server 2000 Programming by Robert Vieira (WROX - ISBN 1-861004-48-6)
Microsoft SQL Server 2000 Performance Tuning Technical Reference by Edward Whalen, Marcilina Garcia, Steve Adrien DeLuca, and Dean Thompson (Microsoft Press - ISBN 0-7356-1270-6)
SQL Server 2005 Waits and Queues by Tom Davidson; updated by Danny Tambs; technical reviewer: Sanjay Mishra. Available at http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/performance_tuning_waits_queues.mspx
Wayne Fillis' passion for computers started twenty years ago at the age of 14, when his parents bought him a ZX Spectrum. Running an impressive (in those days) 48K of RAM, he used the computer to write games (what else does a 14-year-old want to do besides play games?). After graduating with a computing diploma he started work as a DB2 Mainframe Cobol programmer at a large insurance company, eventually moving to Visual Basic, PC Cobol and SQL Server 2000. He learned to love the power of set-based operations and was able to move his focus solely to SQL Server in 2004. He is passionate about technology and continues to enhance his knowledge and skills in whatever way possible.
From time to time over the years I've obsessed on writing better audit trails. Maybe it's my history as a Data Systems Tech in the Navy, maybe I just like having proof that the database worked as advertised, but seeing the full history for any row makes good sense to me. There are several types of audit trails. The most basic uses a copy of the transactional table and simply stores the last values. A better audit trail writes a full history from the creation to the deletion of the row. I've also seen systems that keep a full history inside the transactional table by never updating a row; they just insert a new version of the row with the current values and flag that row as the active row.
There are no non-clustered indexes on the audit table, in order to optimize the table for inserts. If the application makes it easy to view the audit history, a nightly process could flush the Audit table to an Audit History table that includes non-clustered indexes and would speed retrieval.

Audit Strategies

The strategic question is which data operations to audit. Updates should certainly be audited, but inserts and deletes raise a question. It can be argued that Insert operations don't need to be audited because the original data can be viewed in the base tables, and updates are captured by the audit table. Not auditing inserts significantly reduces the size and performance load of auditing. The advantage of auditing insert operations is that it captures the full pedigree, or source, of the data. Without auditing insert operations, it becomes necessary to store user creation metadata in the base table rows. There are two options for auditing delete operations: writing a simple delete operation flag to the audit table and relying on the last insert or update audit to recreate the deleted data, or storing a verbose snapshot of the deleted row in the audit table. The first option, writing just a deleted timestamp and operation to the audit table, is an elegant solution. However, it carries a risk if the audit table contains less than a complete picture of the data. Not auditing insert operations beginning with the table's first row means that the delete flag option is insufficient. Because most databases see more inserts than deletes, and it's highly likely that the audit system will be applied to databases already in production, it makes sense to me to not audit insert operations, and to store a verbose record of delete operations. For the sake of options, however, I'll include the insert operation trigger and both types of delete in the sample code.
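For orientation, the audit table the generated triggers write to looks roughly like the sketch below. This is reconstructed from the column list used in the triggers shown later in the article; the data types and the identity primary key are my assumptions, and the actual table created by the AutoAudit script may differ.

CREATE TABLE dbo.Audit (
    AuditID        int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    AuditDate      datetime     NOT NULL,
    SysUser        varchar(128) NOT NULL,
    Application    varchar(128) NULL,
    HostName       varchar(128) NULL,
    TableName      varchar(261) NOT NULL,
    Operation      char(1)      NOT NULL,  -- 'i', 'u' or 'd'
    PrimaryKey     varchar(50)  NULL,
    RowDescription varchar(50)  NULL,
    SecondaryRow   varchar(50)  NULL,
    ColumnName     varchar(128) NULL,
    OldValue       varchar(50)  NULL,
    NewValue       varchar(50)  NULL
)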
the procedure and easily keep the audit trail triggers up-to-date. You get the speed of the fixed audit trigger without the pain. Also, code-gened code is terribly consistent. I've been meaning to write this for about a year, since I wanted to include it in the SQL Server 2005 Bible, but just didn't have the time. Four events coalesced for me recently regarding audit trail triggers. First, I saw a blog entry stating that the only way to build a dynamic audit trail trigger was to use CLR. UGHGG!?!?! Second, the client I mentioned really needs a good audit trail and I don't want to write the code by hand. Third, a reader wrote asking for more information and advice on implementing the T-SQL dynamic audit trail trigger that I wrote about in the SQL Server 2000 Bible. I wanted to give him a better solution than my old dynamic trigger. And fourth, I'm starting to plan the next edition of SQL Server Bible, and on the list of new chapters was this idea for a better audit trail method.
AutoAudit
So, the AutoAudit stored procedure accepts a schema and table name, and then it adds created and modified columns to the table and code-generates audit triggers for the table. The stored procedure also creates the audit table (if it's not already there). The AutoAuditAll stored procedure simply calls the AutoAudit stored procedure for every table in the database (except, of course, the audit table). You can download the AutoAudit scripts from www.SQLServerBible.com. The main script includes all the stored procedures, and a test script executes AutoAudit against a couple of tables in AdventureWorks. I have to point out that AdventureWorks tables include a column called ModifiedDate. Running AutoAudit on AdventureWorks means that it has two audit systems in place. Since AutoAudit automatically audits every base table column (with certain data type exceptions), you'll see the ModifiedDate column in the triggers even though it's not a smart idea to audit changes to the column used to audit last modified date. If AdventureWorks was a production database, the fix would be to manually edit the trigger and remove the extra code. At the time of this writing, AutoAudit is up to version 1.07. This version is limited to tables with single-column primary keys, but I'm working on a version that will handle composite primary keys that may be complete by the time you read this. So here's what a code-generated trigger looks like. This trigger was generated for the Production.Culture table in AdventureWorks. I chose this table to save space in the article because it has only 3 columns.
ALTER TRIGGER [Production].[Culture_Audit_Update]
ON [Production].[Culture]
AFTER Update
NOT FOR REPLICATION
AS
SET NoCount On
-- generated by AutoAudit on Feb 6 2007 2:09PM
-- created by Paul Nielsen
-- www.SQLServerBible.com

DECLARE @AuditTime DATETIME
SET @AuditTime = GetDate()

Begin Try
  IF UPDATE([CultureID])
    INSERT dbo.Audit (AuditDate, SysUser, Application, HostName, TableName, Operation,
      PrimaryKey, RowDescription, SecondaryRow, ColumnName, OldValue, NewValue)
    SELECT @AuditTime, suser_sname(), APP_NAME(), Host_Name(),
      'Production.Culture', 'u', Inserted.[CultureID],
      NULL, -- Row Description (e.g. Order Number)
      NULL, -- Secondary Row Value (e.g. Order Number for an Order Detail Line)
      '[CultureID]',
      Cast(Deleted.[CultureID] as VARCHAR(50)),
      Cast(Inserted.[CultureID] as VARCHAR(50))
    FROM Inserted
      JOIN Deleted
        ON Inserted.[CultureID] = Deleted.[CultureID]
        AND isnull(Inserted.[CultureID],'') <> isnull(Deleted.[CultureID],'')

  IF UPDATE([Name])
    INSERT dbo.Audit (AuditDate, SysUser, Application, HostName, TableName, Operation,
      PrimaryKey, RowDescription, SecondaryRow, ColumnName, OldValue, NewValue)
    SELECT @AuditTime, suser_sname(), APP_NAME(), Host_Name(),
      'Production.Culture', 'u', Inserted.[CultureID],
      NULL, -- Row Description (e.g. Order Number)
      NULL, -- Secondary Row Value (e.g. Order Number for an Order Detail Line)
      '[Name]',
      Cast(Deleted.[Name] as VARCHAR(50)),
      Cast(Inserted.[Name] as VARCHAR(50))
    FROM Inserted
      JOIN Deleted
        ON Inserted.[CultureID] = Deleted.[CultureID]
        AND isnull(Inserted.[Name],'') <> isnull(Deleted.[Name],'')

  IF UPDATE([ModifiedDate])
    INSERT dbo.Audit (AuditDate, SysUser, Application, HostName, TableName, Operation,
      PrimaryKey, RowDescription, SecondaryRow, ColumnName, OldValue, NewValue)
    SELECT @AuditTime, suser_sname(), APP_NAME(), Host_Name(),
      'Production.Culture', 'u', Inserted.[CultureID],
      NULL, -- Row Description (e.g. Order Number)
      NULL, -- Secondary Row Value (e.g. Order Number for an Order Detail Line)
      '[ModifiedDate]',
      Cast(Deleted.[ModifiedDate] as VARCHAR(50)),
      Cast(Inserted.[ModifiedDate] as VARCHAR(50))
    FROM Inserted
      JOIN Deleted
        ON Inserted.[CultureID] = Deleted.[CultureID]
        AND isnull(Inserted.[ModifiedDate],'') <> isnull(Deleted.[ModifiedDate],'')
End Try
Begin Catch
  Raiserror('error in [Production].[Culture_audit_update] trigger', 16, 1) with log
End Catch

The AutoAudit stored procedure does quite a bit; here's the section of the code that generates the update trigger. The first SET command starts to build the @SQL variable by setting it to the static opening portion of the trigger.

SET @SQL =
  'CREATE TRIGGER ' + @SchemaName + '.' + @TableName + '_Audit_Update'
    + ' ON ' + @SchemaName + '.' + @TableName + Char(13) + Char(10)
  + 'AFTER Update' + Char(13) + Char(10)
  + 'NOT FOR REPLICATION AS' + Char(13) + Char(10)
  + 'SET NoCount On' + Char(13) + Char(10)
  + '-- generated by AutoAudit on ' + Convert(VARCHAR(30), GetDate(), 100) + Char(13) + Char(10)
  + '-- created by Paul Nielsen' + Char(13) + Char(10)
  + '-- www.SQLServerBible.com' + Char(13) + Char(10)
  + Char(13) + Char(10)
  + 'DECLARE @AuditTime DATETIME' + Char(13) + Char(10)
  + 'SET @AuditTime = GetDate()' + Char(13) + Char(10)
  + Char(13) + Char(10)
  + 'Begin Try' + Char(13) + Char(10)

-- for each column
select @SQL = @SQL
  + ' IF UPDATE([' + c.name + '])' + Char(13) + Char(10)
  + ' INSERT dbo.Audit (AuditDate, SysUser, Application, HostName, TableName, Operation, PrimaryKey, RowDescription, SecondaryRow, ColumnName, OldValue, NewValue)' + Char(13) + Char(10)
  + ' SELECT @AuditTime, suser_sname(), APP_NAME(), Host_Name(), '
    + '''' + @SchemaName + '.' + @TableName + ''', ''u'', Inserted.[' + @PKColumnName + '], ' + Char(13) + Char(10)
  + ' NULL, -- Row Description (e.g. Order Number)' + Char(13) + Char(10)
  + ' NULL, -- Secondary Row Value (e.g. Order Number for an Order Detail Line)' + Char(13) + Char(10)
  + ' ''[' + c.name + ']'', Cast(Deleted.[' + c.name + '] as VARCHAR(50)), Cast(Inserted.[' + c.name + '] as VARCHAR(50))' + Char(13) + Char(10)
  + ' FROM Inserted' + Char(13) + Char(10)
  + ' JOIN Deleted' + Char(13) + Char(10)
  + ' ON Inserted.[' + @PKColumnName + '] = Deleted.[' + @PKColumnName + ']' + Char(13) + Char(10)
  + ' AND isnull(Inserted.[' + c.name + '],'''') <> isnull(Deleted.[' + c.name + '],'''')' + Char(13) + Char(10)
  + Char(13) + Char(10)
from sys.tables as t
  join sys.columns as c on t.object_id = c.object_id
  join sys.schemas as s on s.schema_id = t.schema_id
  join sys.types as ty on ty.user_type_id = c.user_type_id
  join sys.types st on ty.system_type_id = st.user_type_id
where t.name = @TableName
  AND s.name = @SchemaName
  AND c.name NOT IN ('created', 'modified', 'RowVersion')
  AND c.is_computed = 0
  AND st.name IN ('tinyint', 'smallint', 'int', 'money', 'smallmoney', 'decimal', 'bigint',
                  'datetime', 'smalldatetime', 'numeric', 'varchar', 'nvarchar', 'char', 'nchar', 'bit')
order by c.column_id

select @SQL = @SQL
  + 'End Try' + Char(13) + Char(10)
  + 'Begin Catch' + Char(13) + Char(10)
  + ' Raiserror(''error in [' + @SchemaName + '].[' + @TableName + '_audit_update] trigger'', 16, 1 ) with log' + Char(13) + Char(10)
  + 'End Catch'

EXEC (@SQL)
The second section of the code uses the multiple assignment variable technique to append the column-dependent portion of the trigger. The select simply finds the column names for the table by referencing the sys.columns table. Each row (representing a column in the table) returned by the select is concatenated with the rest of the code needed for the trigger and then appended to @SQL. The final part of the code concatenates the conclusion of the trigger. Once @SQL is fully defined, a simple EXEC (@SQL) runs the code and creates the trigger. So, I invite you to download the AutoAudit script and try it for yourself. Admittedly, telling your friends that you're using T-SQL for code generation will get you some strange looks. But it's worked well in this situation and AutoAudit easily creates consistent, fast, fixed audit trail triggers. And if you do use it, let me know how it works for you.
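For reference, invoking the procedures looks something like the following. The parameter names are inferred from the code-generation snippet above and the dbo schema is assumed; check the downloaded script for the exact signature.

EXEC dbo.AutoAudit @SchemaName = 'Production', @TableName = 'Culture'
-- or, to set up auditing for every table in the database:
EXEC dbo.AutoAuditAll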
Paul Nielsen is the PASS Director of Global Community Development and author of SQL Server 2005 Bible and Total Training's SQL Server Development video. His website is www.SQLServerBible.com and you can meet him at SQLTeach in Montreal, where he's speaking about Nordic (O/R DBMS for SQL Server) and giving the pre-con on Database Design and Optimization Best Practices. He also leads Smart Database Design Seminars around the country.
Performance Monitor, a free tool with the Microsoft Windows operating systems, is an invaluable tool when locating performance bottlenecks. While this may be the primary tool of the system administrator and a valuable resource for the SQL Server DBA, there are other tools a SQL Server DBA should be aware of which go beyond hardware bottlenecks. Being able to see query execution plans with Query Analyzer, SQL Server Management Studio, or our favorite 3rd party product can be equally useful to tune poorly performing queries. Traces, however, whether server-side or through a tool like SQL Profiler, can be the most important tool of all. This article introduces the use of SQL Profiler and server-side traces for performance monitoring/tuning.
adage - to put the most effort into where we'll get the biggest impact. A query may take a while to run, but if it only runs once a day while another query runs relatively quickly, but thousands of times an hour, we may improve performance more by concentrating on that second query.
Figure 2: SQL Profiler Trace of All Queries

Figure 2 shows a trace which uses the SQLProfilerTSQL trace template. What is shown is when a batch starts and what that batch contains. We now know what queries are being run and how often they occur. Again, this is all very useful information. However, we're not done with using Profiler to research performance issues. In my experience, one of the biggest banes to performance has been where locks on database objects prevent other data operations from being carried out. Profiler can help us see these issues, too, but first we must be able to construct our own Profiler trace templates.
I've used two different trace templates to show two different things: long running queries and all queries which do run. While each of these templates records information of value, neither of them (or any of the other prepackaged trace templates) may capture the right mix
of events and the appropriate columns for those events for your purposes. All is not lost, as SQL Profiler gives us the ability to build our own trace templates. There are two ways to approach building your own trace template. You can start from scratch with nothing selected or you can begin by modifying an existing template. Once you've built your trace template, you can save it to be re-used again. Creating a trace template is different between SQL Profiler for SQL Server 2000 and for SQL Server 2005. For SQL Server 2000, to start one from scratch, begin with File | New | Trace Template. To copy from an existing one, start with File | Open | Trace Template. Be sure to click the Save As button before starting your edits, however. When working with SQL Server 2005, start with File | Templates and then, depending on whether you want to start from scratch or start from an existing template, choose New Template or Edit Template, respectively. Again, if you choose Edit Template and you aren't editing one of your own, click the Save As button on the General tab.
what our blocking problems are being caused by. We can also add Lock:Acquired, but chances are we'll receive too much information, even if we filter for the specific object, such as by using the ObjectID filter. Figure 3 shows just such an example with a single query against a table.
Figure 3: Lock:Acquired Events firing from Querying a Single Table

Therefore, monitoring Lock:Acquired is of very limited value unless we can carefully control what's being executed on the SQL Server. While SQL Profiler is good for profiling deadlocks and lock timeouts, monitoring the locks themselves is probably better done as described in Microsoft KB article 271509, "How to Monitor Blocking in SQL Server 2005 and in SQL Server 2000." This article describes using a stored procedure to monitor and report blocking periodically (sp_blocker_pss80), taking snapshots in time. While the information to parse through provided by this method can still be substantial, it is far smaller than if we tried to monitor individual locking with SQL Profiler or a server-side trace. Combining a SQL Profiler trace focusing on deadlocks and lock timeouts along with the steps in 271509 can help identify the source of blocking issues quickly. Figure 4 shows just a small excerpt of the output of the sp_blocker_pss80 stored procedure when a blocking situation is captured:
As you can see, the script even tells us what queries are causing the blocking. Compare this with the results from SQL Server Profiler in Figure 5:
Figure 5: Blocking in SQL Server Profiler

As the results show, unless we're dealing with a deadlock situation or the client has set a lock timeout value (by default this value isn't set), we're not going to see as much information in SQL Server Profiler as we would with the stored procedure and methodology provided in 271509.

One of the first things for which most DBAs use SQL Profiler is to capture what queries are running too long on SQL Server. In the case of SQL Server 2000, these are the available events, whether using the SQL Profiler from SQL Server 2000 or 2005. Of course, the SQL Server 2005 events are available only with the SQL Profiler available from SQL Server 2005. If we try to connect with the SQL Profiler from SQL Server 2000, we get an error. The SQL Server 2005 showplan-related events are:
Showplan All
Showplan All For Query Compile
Showplan Statistics Profile
Showplan Text
Showplan Text (Unencoded)
Showplan XML
Showplan XML For Query Compile
Showplan XML Statistics Profile

If we are connecting to a SQL Server 2000 server, in order to obtain execution plan (show plan) information we must include the BinaryData column in order to get anything back. However, in doing so, the data is stored in a format which is unusable outside of SQL Profiler. If we convert it to a trace table, the BinaryData column is typed as an image column and, so far as I am aware, Microsoft has not published publicly how to translate this information into a usable form. This is where the SQL Server 2005 options with XML are a huge boon. For instance, using the Showplan XML Statistics Profile event allows us to see the execution plan along with the values related to cost, etc. Since the results are in XML we can take this information and transform it as we need to in order to evaluate the various execution plans as they occurred. We can also take the whole conglomerate of execution plans and look for key objects which would potentially indicate poor performance, such as table or index scans. By collecting all the results and filtering through the data in this fashion, we're not stuck going through each potential query one-by-one to see where the execution plans indicate performance tuning is needed.

One of the hardest things to learn how to do is take the settings in SQL Profiler for events and columns and write a server-side trace that does the same thing. There are certainly benefits to a server-side trace. First and foremost, we don't have to have a client actively running to capture such information. Second, we're not going to miss events because of too much activity on the server. We can configure "Server processes SQL Server trace data" (SQL Server 2000) or "Server processes trace data" (SQL Server 2005), which forces the trace server side. Without this setting, events may not be passed to the SQL Profiler client when the SQL Server is under heavy stress. This setting forces the trace handling back to the server, and if that's what's required we might as well build a server-side trace in any case. In actuality, this is what SQL Profiler is doing, except it has a mechanism to get the information from SQL Server to be able to display it visually. Extracting a server-side trace isn't all that difficult because we can let SQL Profiler do most of the work for us. Once we get a trace set up just the way we want it, we can export the trace to a SQL script. This script can then be run on SQL Server to set up the trace so that we don't have to have Profiler up and running. In SQL Server 2000's version of SQL Profiler, this can be accomplished by File | Script Trace | and either For SQL Server 2000 or SQL Server 7.0. With SQL Server 2005's version, we can script a trace by File | Export | Script Trace Definition | and either For SQL Server 2000 or For SQL Server 2005. Here is an excerpt from a trace definition of the Standard template (SQL Server 2005):
/****************************************************/
/* Created by: SQL Server Profiler 2005             */
/* Date: 04/22/2007 10:56:08 PM                     */
/****************************************************/

-- Create a Queue
declare @rc int
declare @TraceID int
declare @maxfilesize bigint
set @maxfilesize = 5

-- Please replace the text InsertFileNameHere, with an appropriate
-- filename prefixed by a path, e.g., c:\MyFolder\MyTrace. The .trc extension
-- will be appended to the filename automatically. If you are writing from
-- remote server to local drive, please use UNC path and make sure server has
-- write access to your network share

exec @rc = sp_trace_create @TraceID output, 0, N'InsertFileNameHere', @maxfilesize, NULL
if (@rc != 0) goto error

-- Client side File and Table cannot be scripted

-- Set the events
declare @on bit
set @on = 1
exec sp_trace_setevent @TraceID, 14, 1, @on
exec sp_trace_setevent @TraceID, 14, 9, @on
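The excerpt stops before the trace is actually started. For reference, a server-side trace defined this way is controlled with sp_trace_setstatus; a minimal sketch (the @TraceID value comes from the sp_trace_create call above):

exec sp_trace_setstatus @TraceID, 1   -- 1 = start the trace
exec sp_trace_setstatus @TraceID, 0   -- 0 = stop the trace
exec sp_trace_setstatus @TraceID, 2   -- 2 = close the trace and delete its definition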
As you can see, this trace definition includes the appropriate variable declarations that are needed for the trace, the actual commands to be run, as well as guidance on what we need to change in order to customize the script for our purposes.

Concluding Thoughts

SQL Profiler is an excellent tool for helping to diagnose performance issues with our SQL Servers. We can go a step further and run server-side traces, eliminating the need for this client tool to be up and open all the time. This article touches on how to begin using SQL Profiler and traces to help in our performance tuning, but due to the flexibility of traces in general there's a lot more I didn't cover. If you're new to using this tool, I suggest breaking it out in a development or test environment and trying to cause the types of performance issues DBAs are called on to diagnose. Take the time to get SQL Profiler set up to show those events and get accustomed to how the results will appear. If you support both SQL Server 2000 and SQL Server 2005, spend some time with both versions of SQL Profiler, as there are some substantial differences between the capabilities of the two tools based on how the SQL Server database engines are instrumented in the two versions. Finally, when you are comfortable with using SQL Profiler, use it to generate your first few sets of server-side traces. As you get more comfortable with the stored procedures and functions related to traces, you may find you won't need SQL Profiler very often, except to view data. However, in a crunch, remember how to let SQL Profiler generate the guts of a trace for you. It can be a great time saver.
Brian Kelley is a Systems Architect and DBA with AgFirst Farm Credit Bank and the regular security columnist for SQLServerCentral.com. He is also the author of Start to Finish Guide to SQL Server Performance Monitoring and the President of the Midlands PASS Chapter for South Carolina. You can contact him at brian_kelley@sqlpass.org.
SET SHOWPLAN_TEXT
The SHOWPLAN_TEXT option returns detailed execution information about the T-SQL query statement, in row format with a single column but multiple rows. The rows returned are in tree-style, hierarchical format, each row in the row set detailing the steps in the execution plan which was, or will be, performed by SQL Server. Figure 1 below shows the query execution information using the SHOWPLAN_TEXT option. In the top pane, the SHOWPLAN_TEXT option is set ON, followed by the query, then the SHOWPLAN_TEXT option being set OFF. Notice in Figure 1 that the query results are not being returned as mentioned earlier, but what you do see is a single column with four rows (each row beginning with |) which contains the query execution information.
Figure 1
With the SHOWPLAN_TEXT option, each node in the output tree is an individual step that the SQL Server query processor has taken (or will take) for each query step. So how does this read? This type of output is read right-to-left, top down. Thus, the operators that are most indented produce rows consumed by their parent operator, and so on all the way up the tree. In the example above, the two most inner nodes are at the same level because they are the product of a join, with the upper node being the outer table and the lower node being the inner table. The upper node is executed first, with the lower node being executed over and over for each row, trying to match rows to the upper node.
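For reference, the pattern shown in Figure 1 boils down to something like this sketch, using the same Products/Suppliers query the article works with later on; only the plan rows come back, not the query results, and the SET statement must sit in its own batch:

SET SHOWPLAN_TEXT ON
GO
SELECT P.ProductName, P.UnitsInStock, S.CompanyName
FROM Products AS P
INNER JOIN Suppliers AS S ON P.SupplierID = S.SupplierID
WHERE P.UnitPrice > 20.00 AND P.Discontinued = 0
ORDER BY P.ProductName
GO
SET SHOWPLAN_TEXT OFF
GO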
The SHOWPLAN_ALL option provides much more information and is easier to read. For example, notice the Node and Parent columns in Figure 2. The Node column is the ID of the node in the current query, while the Parent column is the Node ID of the parent step, meaning, hierarchically which node belongs to which parent node. This makes reading the output much easier. The other columns are defined as follows: PhysicalOp - For rows of type PLAN_ROWS (see definition of Type column below), this column contains the type of physical implementation of the node, such as Sort, Nested Loops, etc. LogicalOp - Also for rows of type PLAN_ROWS, this column contains the type of relational operator of this node, such as Sort, Inner Join, etc. Argument - Contains additional information about the type of operation being performed, and are based on the type of physical operator (the PhysicalOp column). DefinedValues - Lists the values added by this operator. In this example, the values include the column names returned by the query (in other words, the values returned by the listed columns). EstimateRows - This column contains the estimated number of rows that this operation will produce. EstimateIO - Contains the I/O cost (estimated) for this operation. Obviously, this value should be as low as you can get it. EstimateCPU - Contains the CPU cost (estimated) for this operation. AvgRowSize - Contains the average size (estimated) of the rows being returned by the current operator. TotalSubtreeCost - Contains the total cost of the current operation and its child operations. OutputList - Lists the columns being protected by the current operation. Type - Contains the Node type. EstimateExecutions - Contains the number of times (estimated) that this operator will be executed during the currently running query.
SET SHOWPLAN_ALL
The SHOWPLAN_ALL option is very similar to the SHOWPLAN_TEXT option as to the manner of output format but differs in that the SHOWPLAN_ALL option returns more detailed information that that of the SHOWPLAN_TEXT option. While the SHOWPLAN_TEXT option returned a single column, the SHOWPLAN_ALL option returns seventeen additional columns that help better explain the execution output. Figure 2 below shows the same query using the SHOWPLAN_ALL option. For better readability in this example, the results were returned to a grid rather than text.
Figure 2 Figure 3 below shows most of the other columns provided by the SHOWPLAN_ALL option.
Figure 3
SET SHOWPLAN_XML
It should be stated that, for now, the SHOWPLAN_TEXT and SHOWPLAN_ALL options are available and work. However, in future versions of SQL Server both the SHOWPLAN_TEXT and SHOWPLAN_ALL options will be removed. Microsoft suggests that you start using the SHOWPLAN_XML option as soon as possible so that you can start getting familiar with it. The SHOWPLAN_XML option returns the query execution information nicely formatted as an XML document and contains the exact same information as the SHOWPLAN_ALL option. In Figure 4 below, the top pane shows the utilization of the SHOWPLAN_XML option using the same query as the previous examples, while the lower pane shows the execution results.
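For reference, the top pane in Figure 4 was produced with the same pattern as before, only the option name changes; the query is again the assumed Northwind-style example used throughout this article.

-- Sketch of the SHOWPLAN_XML pattern behind Figure 4.
SET SHOWPLAN_XML ON;
GO

SELECT P.ProductName, P.UnitsInStock, S.CompanyName
FROM Products AS P
INNER JOIN Suppliers AS S ON P.SupplierID = S.SupplierID
WHERE P.UnitPrice > 20.00 AND P.Discontinued = 0
ORDER BY P.ProductName;
GO

SET SHOWPLAN_XML OFF;
GO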
Figure 4
Obviously the execution results returned in Figure 4 above are hard to read, but simply clicking on the link in the Results window displays a new query window with the entire XML document nicely formatted, as shown below in Figure 5.
Figure 5
Granted, reading SHOWPLAN output is not the easiest thing in the world, but after a while it will become second nature and you'll be surprised at the information you are able to glean from the returned plan output. One last comment on the SHOWPLAN options: you need proper permissions to use and set the SHOWPLAN options, as well as sufficient permissions on the objects that the statement being analyzed will access and execute.
You should be able to tell by these descriptions that these tables contain vital information. It should be noted that these tables are updated when a query is optimized by the query optimizer, not every time a query is run. A nice feature of these tables is that they are state consistent, meaning that if a query is stopped during execution, or a transaction is rolled back, the missing index information will remain. Sweet. These tables keep missing index information for the current instance of SQL Server, with no way of resetting the data in the tables, such as deleting the current data. The only way to reset (remove) the data from the missing index system tables is to restart SQL Server; restarting SQL Server will drop all data from the missing index system tables. Also, the missing index feature is enabled by default, and there is no mechanism to disable it while SQL Server is running. The feature can be disabled by stopping SQL Server and then restarting it with the -x argument. As a note, however, this feature does have its limitations. For example, it does not gather information for more than 500 index groups. Nor does it specify any ordering of the columns that need to be included in an index. Also, the information regarding the columns on which the indexes are missing is minimal. Even with these (and a few more) limitations, the missing indexes feature is still extremely valuable.
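As a rough illustration, the accumulated suggestions can be pulled back with a query along these lines. The DMV and column names come from the SQL Server 2005 missing-index views; treat the join shape as a minimal sketch rather than a finished report.

-- Minimal sketch: list missing-index suggestions recorded since the last restart.
SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
INNER JOIN sys.dm_db_missing_index_groups AS g
        ON d.index_handle = g.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS s
        ON g.index_group_handle = s.group_handle
ORDER BY s.user_seeks DESC;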
On the query toolbar in SQL Server Management Studio is the Display Estimated Execution Plan button. When this button is pressed, the query is evaluated and the estimated execution plan is displayed.
SELECT P.ProductName, P.UnitsInStock, S.CompanyName
FROM Products AS P
INNER JOIN Suppliers AS S ON P.SupplierID = S.SupplierID
WHERE P.UnitPrice > 20.00 AND P.Discontinued = 0
ORDER BY P.ProductName
Using the same query as the previous sections (shown above), the query execution plan is graphically displayed on the Execution Plan tab when the query is executed in the query window. This is the exact same information discussed previously in this article, but it is nicely laid out using icons to represent each specific execution step. The flow is easily readable via the arrows (again, read right-to-left), and additional information, such as the cost and plan type, is included with each icon. You can see in Figure 6 that the information shown is much easier to read than the output of the SHOWPLAN options.
Figure 6
Additionally, by moving your mouse over each icon in the execution plan, a window pops up detailing the information for that operation, as shown in Figure 7. Figure 7 contains the same information that is listed earlier in this article under the SHOWPLAN_ALL section, plus some additional information, yet you will likely agree that it is much easier to read. Based on the operation type, the information in the popup window will change slightly. This view also contains a few more pieces of information that are not included in the SHOWPLAN_ALL option. The Estimated Operator Cost contains the cost to the query optimizer of executing the operation. Your goal when writing a query is to try and get this number as low as possible. By getting the individual operator costs down, you will subsequently get the Estimated Subtree Cost down as well.
Figure 7
You should also notice that there is a Display Actual Execution Plan button on the same toolbar (exactly four buttons to the right of the Estimated button). The Estimated button parses the query and estimates an execution plan; the Actual button executes the query before the plan is generated. Once you have all of this information, it is up to you to decide what to do with it. Is the information you are getting back good? Based on the information returned from the execution plan, you could try rewriting portions of the query to get the operation execution costs down.
Summary
Query tuning can be difficult, but it doesn't have to be. Knowing what tools are available to you and how to properly use them can make the task of improving the performance of your queries much more enjoyable. The intent of this article was to show you exactly that, by discussing several of the tools and features included with SQL Server 2005 that you have at your disposal and how to understand the information these tools provide.
Scott Klein is an independent consultant with passions for all things SQL Server, .NET and XML. He is the author of Professional SQL Server 2005 XML and Professional WCF Programming: .NET Development with the Windows Communication Foundation, both by Wrox, writes the bi-weekly feature article for the SQL PASS Community Connector, and has contributed articles to both Wrox (www.Wrox.com) and TopXML (www.TopXML.com). He frequently speaks at SQL Server and .NET user groups. When he is not sitting in front of a computer or spending time with his family he can usually be found aboard his Yamaha at the local motocross track. He can be reached at ScottKlein@SqlXml.com
Ever since the announcement that the CLR was going to be integrated into SQL Server, I've noticed two general reactions among database professionals. Those who would classify themselves as database administrators generally had a reaction along the lines of "there is no way that I'm going to let CLR code run on my server." Those who would classify themselves as developers generally responded by saying something like, "now I'll never need to write T-SQL again!" Of course, both reactions are just that: reactions. The reality of the situation lies somewhere in the middle. Database administrators should be willing to allow CLR-based code to run in their databases in certain controlled circumstances. Database developers should realize that T-SQL is still the king for accessing data, and CLR-based stored procedures and functions should be used in situations where corresponding T-SQL functionality either doesn't exist or is too complicated to be written efficiently. In this article I will attempt to explain the what, the whys and the how: what is the CLR, why would you want to use it, why wouldn't you want to use it, and how do you use it.
referred to as managed. You'll sometimes hear the term managed code when someone is referring to a .NET application. You can still write unmanaged applications for Windows using languages such as C++ and Visual Basic 6.0. Managed and unmanaged code can even co-exist in the same application, but that is well beyond the scope of this article. The .NET platform provides a common type system that can be used across all .NET languages. Therefore, C# applications can call VB.NET class libraries without having to convert data types. You could, if you wanted to, even write a data access layer in COBOL that is used from a Python front end, provided you have the appropriate .NET compilers. This interoperability provides a major advantage, since you can leverage your resources where they can work best. If the majority of your system is going to be written in C#, you can still bring in a user interface developer for your web site who only knows how to develop in VB.NET, and things will work perfectly together. Over the course of the article, I will delve back into the CLR realm where necessary to explain some of the key concepts that I am covering. I promise that I'll try not to get too .NET-ish.
using when I get into actually writing the CLR-based code in this article is a User Defined Function that evaluates regular expressions. Another area within SQL Server that can benefit from CLR integration is the type system. You have been able to add user-defined types to SQL Server for quite a while now, but you were limited as to what you could do with them. Now, with CLR integration, you can create rich user-defined types to help suit your business needs and greatly expand SQL Server's built-in type system. Yet another area of possible use that I would like to touch on is proprietary calculations. Let's say that you are working for a company that does financial analysis and has a proprietary calculation for determining someone's credit risk. That calculation takes as input a number of different parameters and returns a rating from 1 to 5 on their risk. Since the calculation is so mission critical, the source code is kept under lock and key and the compiled versions of it are closely guarded. Up until now, the calculation has only been included in the compiled Windows-based applications that are used by the financial analysts. Members of senior management are asking for summary reports containing potential clients and their associated credit risk. Without CLR integration, the only way to get that risk score was to have the user interface code write it into the database as a static field. The problem with this approach is that a change in someone's underlying information would require an extra step to ensure that the credit risk field was kept up-to-date. Now that the CLR has been integrated into SQL Server, the .NET code that performs the proprietary calculation can be integrated as a user-defined scalar-valued function. It could be called inline in a query, and the results can be used in a report that senior management can run at any time. Through CLR integration, you accomplish your goal without significantly expanding the risk of exposing your proprietary calculation any further than it has already been exposed through your applications.
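To make the scenario concrete, a CLR scalar-valued function surfaced this way is called inline like any other UDF. The function, table and column names below (dbo.GetCreditRisk, Clients and its columns) are hypothetical placeholders, not objects defined anywhere in this article.

-- Hypothetical sketch: the proprietary .NET calculation exposed as a
-- CLR scalar-valued UDF and called inline in a reporting query.
SELECT C.ClientID,
       C.ClientName,
       dbo.GetCreditRisk(C.Income, C.Debt, C.PaymentHistoryScore) AS CreditRisk
FROM Clients AS C
ORDER BY CreditRisk DESC;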
I'm not saying that there are never good reasons to embed business logic into the database tier; I have personally done it many times. There are situations, such as when you need to do a manual compare on two very large data sets, where embedding that logic in the database will perform better than pulling those large data sets over the network, processing them, and then returning the results to the database. As a rule, though, I don't like the idea of embedding all of the business logic into the database. Having said that, there is a place where the CLR can help to clean up situations that you inherit. Say, for example, that you inherit a system where all of the business logic is embedded in the database in T-SQL stored procedures. When you approach your project manager for approval to extract all of this into a business layer, they deny the request due to the amount of rework that would be required to make use of the new business layer. Your counter-proposal could be for them to allow you to break the T-SQL stored procedures into CLR-based stored procedures. This would provide the following benefits:
Procedural Logic Optimization - Let's face it, while T-SQL allows you to perform procedural logic such as looping and flow control, it isn't the best at it. That procedural logic can be moved into a CLR-based stored procedure that makes calls into smaller T-SQL based stored procedures to perform the set-based work.
Additional Development Resources - If you work in environments that are similar to the ones that I work in, there are many more people who know VB.NET or C# than know T-SQL, or at least know it well enough to write complex stored procedures using it. By moving the business logic stored procedures into a CLR language, you are expanding the pool of people who can provide support and maintenance.
Potential Future Reuse - If written correctly, CLR-based stored procedures could be reworked into a dedicated business tier when the time comes.
No Changes to Calling Applications - To the world outside of SQL Server, a stored procedure is a stored procedure, whether it is CLR-based or written in T-SQL. This means that calling applications do not need to change at all when you convert over from a T-SQL stored procedure to a CLR stored procedure (see the sketch after this list).
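To illustrate the last point, a CLR-based procedure is registered and called with ordinary T-SQL. The assembly, class and procedure names below are hypothetical placeholders that follow the same pattern used for the function example later in this article.

-- Hypothetical sketch: registering a CLR-based stored procedure and calling
-- it exactly like a T-SQL procedure. The assembly would have been created
-- with CREATE ASSEMBLY beforehand; all names here are placeholders.
CREATE PROCEDURE dbo.ProcessOrderBatch
    @BatchId int
AS EXTERNAL NAME OrderLogic.StoredProcedures.ProcessOrderBatch;
GO

-- The calling application sees no difference from a T-SQL procedure.
EXEC dbo.ProcessOrderBatch @BatchId = 42;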
The first thing that you need to know is that CLR integration is disabled by default. This, I think, will cause more support calls from developers trying to implement CLR-based stored procedures than anything else. To turn on CLR integration, run the script in Figure 1. Alternately, you can turn off CLR integration by running the script in Figure 2.
USE MASTER;
GO
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'clr enabled', 1;
GO
RECONFIGURE;
GO
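The Figure 2 script that turns CLR integration back off is not reproduced in this layout; it is presumably the same pattern with the option value set back to 0, roughly as follows.

-- Assumed sketch of the Figure 2 disable script: same pattern, value set to 0.
sp_configure 'clr enabled', 0;
GO
RECONFIGURE;
GO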
Figure 2 Script to turn off CLR integration
The next thing that you will need to know is that SQL Server data types are mapped to specific CLR data types. The table in Figure 3 (taken from SQL Server Books Online) lists the SQL Server data types and their corresponding CLR data types. When passing data into and out of CLR-based procedures, you should use the types in the column CLR data type (SQL Server). When passing data around within the CLR-based code, you should convert them to the types in the column CLR data type (.NET Framework).

SQL Server data type | CLR data type (SQL Server) | CLR data type (.NET Framework)
varbinary | SqlBytes, SqlBinary | Byte[]
binary | SqlBytes, SqlBinary | Byte[]
varbinary(1), binary(1) | SqlBytes, SqlBinary | byte, Byte[]
image | None | None
varchar | None | None
char | None | None
nvarchar(1), nchar(1) | SqlChars, SqlString | Char, String, Char[]
nvarchar | SqlChars, SqlString | String, Char[]
nchar | SqlChars, SqlString | String, Char[]
text | None | None
ntext | None | None
uniqueidentifier | SqlGuid | Guid
rowversion | None | Byte[]
bit | SqlBoolean | Boolean
tinyint | SqlByte | Byte
smallint | SqlInt16 | Int16
int | SqlInt32 | Int32
bigint | SqlInt64 | Int64
smallmoney | SqlMoney | Decimal
money | SqlMoney | Decimal
numeric | SqlDecimal | Decimal
decimal | SqlDecimal | Decimal
real | SqlSingle | Single
float | SqlDouble | Double
smalldatetime | SqlDateTime | DateTime
datetime | SqlDateTime | DateTime
sql_variant | None | Object
User-defined type (UDT) | None | Same class that is bound to the defined type in the same assembly or a dependent assembly
table | None | None
cursor | None | None
Note: for nvarchar, SqlChars is a better match for data transfer and access, while SqlString is a better match for performing String operations.
Figure 3 SQL Server and CLR Data Types
Last, but not least, let's get into the example. Regular expressions can be extremely handy when performing string comparisons. They make it easy to determine if a string is a valid e-mail address, web address, or postal code, just to name a few of the comparisons that can be done. Regular expressions have been around for years, but there isn't any direct support for them within SQL Server. The .NET Framework, however, has a great set of classes for working with regular expressions. This example will show you how to create a user-defined function that will compare an input string to a regular expression and will tell you whether or not the string matches the expression. I would suggest creating a new database to use for these examples. All of the example code will reference a database that I have created called SQLServerStandard. To get started, launch Visual Studio 2005. Create a new project by selecting File -> New -> Project. You should be presented with a dialog similar to that shown in Figure 4. The dialog will show different options, depending on what languages and features you have installed.
Figure 4 Visual Studio New Project Dialog
In the Project types: tree, expand the tree for Visual C# and click Database. In the Templates: list, click SQL Server Project. For this demonstration, please enter the following information into the appropriate areas of the form:
Name: RegExUDF
Location: C:\temp\SQLServerStandard
Solution Name: RegExUDF
Also, please check the Create directory for solution checkbox and uncheck the Add to Source Control checkbox (unless you want to put this demo code into source control, that is). Once you have entered all of the information, click OK. You may be prompted with a dialog asking if you want to enable SQL/CLR debugging. For the examples in this article, you can answer no. After you click OK, Visual Studio will create a new Database project. You will be prompted to either select or create a database reference. If you have created a reference to your target database in the past (through earlier use of Visual Studio to manage databases), you can select it from the list. Otherwise, click the Add New Reference button. You will be presented with a dialog similar to the one shown in Figure 5.
Figure 5 New Database Reference Dialog
Enter (or select) the name of the server where your database resides, select your desired authentication mode (and enter the user name and password, if necessary), select the target database, and click OK. You should be returned to the Add Database Reference dialog, where the reference that you just created should be shown in the list. Select the reference and click OK. When the solution opens, you should see a Solution Explorer window that looks similar to that shown in Figure 6. If you do not see a Solution Explorer window, you can display it by selecting View -> Solution Explorer.
Figure 6 Solution Explorer Window
The next thing that we need to do is to actually add the CLR-based User Defined Function to the project. To do this, right-click on the project name in the Solution Explorer window and choose Add -> User-Defined Function from the context menu. You should be presented with a dialog similar to the one shown in Figure 7. Ensure that User-Defined
Function is selected in the Templates: list, enter RegEx.cs in the Name: text box, and click Add.
Figure 7 Add New Item Dialog
A code window similar to that shown in Figure 8 should be opened, although the color and font scheme will most likely differ from the one shown here!
Figure 8 Initial RegEx.cs Code Window
Now I need to take a few minutes and dive into the CLR to explain what is going on in the code above. The first thing to note is that C# is a case-sensitive language; therefore SqlString is not the same as sqlstring. If we were writing this code in Visual Basic .NET, SqlString and sqlstring would be the same, although the editor would most likely make them the same case in the code to make things more readable. The next thing to point out is the structure of the code. In the code above there is a single class, and that class contains a single method. The class is named UserDefinedFunctions, but that name could be almost anything you want it to be. Note that the name of the class and the name of the code file do not need to match. The class has two modifiers: public and partial. Public indicates that there are no inherent restrictions on who can access this class (restrictions could be imposed by security privileges outside of the class, but that is
beyond the scope of this article). Partial indicates to the compiler that the class can be split among multiple files, but it does not need to be. The single method in the class is called RegEx. It has three modifiers associated with it: public, static and SqlString. Public indicates that the method can be called by anyone who has access to the class. Static indicates that the class does not need to be instantiated (turned into an object) to call the method. SqlString indicates the return type of the method. The text in the square brackets before the name of the method is called an attribute. Attributes are used as a way to extend the functionality provided by the .NET Framework. You can think of attributes as meta-data for the methods in your code. If all of this seems daunting to you, don't worry. If there is something that is required for your code to work within SQL Server, it will be included in the Visual Studio template. You might need to extend what is provided in the template to meet your needs, but the basics provided in the template should get you going.
My general opinion is that CLR use in the database engine should be limited to only where it is needed; everyone will have their own interpretation of what "needed" is.
If we were to build, deploy and call the code as it stands right now, calling the function would return the text Hello to the calling query. We want the function to do a little more than that. Replace the entire RegEx method with the code in Figure 9. When you are done, you should have a code window similar to that shown in Figure 10. The new version of the method is still declared as public and static, but the return type has been changed to SqlBoolean, which will turn into a bit in SQL Server. Unlike the original version, which did not accept any parameters, the new version accepts two parameters of type SqlString, which translates into a Unicode character type in SQL Server (nvarchar or nchar). The first parameter (SearchString) is the string that you want to compare to a regular expression. The second parameter (Expression) is the actual regular expression to be applied in the comparison. Since the regular expression syntax is quite involved, I'm not going to dig into it in any depth. There are many resources on the Internet that can help you build regular expressions. The one that I am using in my example actually comes from the MSDN library.
public static SqlBoolean RegEx(SqlString SearchString, SqlString Expression)
{
    return new SqlBoolean(
        System.Text.RegularExpressions.Regex.IsMatch(SearchString.ToString(), Expression.ToString()));
}
The code within the method is actually only one line of C# code, even though it is broken up over four lines in the editor window shown in Figure 10. The semicolon, not a carriage return or line feed, marks the end of a line of code in C#. The following pseudocode describes what is done in this line of code:
Compare the search string to the expression
If there is a match, indicate true; otherwise, indicate false
Convert the .NET bool to a new SqlBoolean
Return the SqlBoolean
One thing to note is that we need to convert the SQL Server CLR data types to their corresponding .NET CLR data types in order to work with them, and then convert back to the SQL Server CLR data types to be passed back to SQL Server. That is why we need to call the ToString() method when we use the input parameters (which are of type SqlString). ToString() converts the variables of type SqlString into their corresponding string type. On the way back out, we are taking the bool output from the IsMatch method and converting it into a new SqlBoolean. Now that we have the code, it needs to be built and deployed before we can actually use it. To build code in Visual Studio, you first need to select the appropriate build mode, either Debug or Release, from the dropdown list in the toolbar. Unless you are planning to step through the code with a debugger, I would recommend building the code in Release mode. Next, select Build -> Build XXX, where XXX is the name of your project (RegExUDF in our case). If there are build errors, the error list will be displayed. If the build is successful, you should see Build succeeded on the left side of the Visual Studio status bar. I am going to show you two methods for deploying your code to SQL Server: through T-SQL scripts and through Visual Studio itself.
When deploying through script, there are two things that you need to do: create the assembly, then create the function. An assembly is the .NET term for a compiled unit of code, whether it is a Dynamic Link Library (dll) or an Executable (exe). The assemblies that we will be registering with SQL Server will be dlls. To create the assembly, execute the script in Figure 11 against your target database. Remember to change the path if you used a different target path than the one outlined in this example.
CREATE ASSEMBLY RegExUDF
FROM 'C:\temp\SQLServerStandard\RegExUDF\RegExUDF\bin\Release\RegExUDF.dll'
WITH PERMISSION_SET = SAFE;
Figure 11 Code to Create Assembly
The Create Assembly call takes the name of the assembly, the path to the .dll, and a PERMISSION_SET. The PERMISSION_SET is extremely important, as it defines what access the code within the assembly has when run. The default mode is SAFE, which is the most restrictive method of access; SAFE assemblies cannot access resources outside of SQL Server. There are two other modes that are available. EXTERNAL_ACCESS means that the code can access external resources on the server, such as the file system and the registry. UNSAFE is the least restrictive PERMISSION_SET, and the code is virtually unrestricted. As a rule, you should use the least open PERMISSION_SET when you register an assembly. In other words, if your assembly doesn't require access to resources outside of SQL Server, mark it as SAFE, not EXTERNAL_ACCESS. The next step in deploying CLR-based code through script is to create the object, a function in our case. To create the function, execute the script in Figure 12 against your target database.

CREATE FUNCTION RegEx (@SearchString AS nvarchar(255), @Expression AS nvarchar(255))
RETURNS bit
AS EXTERNAL NAME RegExUDF.UserDefinedFunctions.RegEx;

Figure 12 Code to Create Function
As with a standard T-SQL function, the Create Function call takes the name of the function, any parameters required, a return type, and the function body. In our case, the function body refers to the method in the assembly that we registered. The name RegExUDF.UserDefinedFunctions.RegEx is the full name of the RegEx method that we created. There is, as I mentioned before, a way to deploy this function through Visual Studio. To deploy through Visual Studio, select Build -> Deploy XXX, where XXX is the name of the project you are deploying. The deployment handles both creating the assembly and creating the function. You can set the PERMISSION_SET as a property of the project. As with the scripting option, it will default to SAFE. Now that the function is deployed, you can use it just like you would use any other UDF within SQL Server. The code shown in Figure 13 will execute the function twice. In each case, the expression is looking for a valid e-mail address. The first select statement will pass a valid e-mail address as the search string, and the second will pass an invalid e-mail address as the search string. The output is shown in Figure 14.

SELECT dbo.RegEx('chuck_heinzelman@sqlpass.org', '^([\w\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$');
GO
SELECT dbo.RegEx('chuck_heinzelman.sqlpass.org', '^([\w\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$');
GO

Conclusion
CLR integration in SQL Server has caused both excitement and fear, depending on your point of view. In my opinion, the excitement is warranted and the fear is unfounded. Given the fact that code can be deployed with extremely restrictive rights, there is actually little chance that something could go drastically wrong. The example that we have gone through here is one of the easiest ways to use the CLR in SQL Server. You can also create custom aggregations and user-defined types, which in my opinion are the most difficult CLR objects to create. You can download the code used in this example at http://www.sqlserverstandard.com/downloads/200705/heinzelman.zip.
The PASS Editorial Committee would like to welcome you to a new regular column for the SQL Server Standard magazine. With DBA 101, we will attempt to offer an introductory look at the editorial focus for each issue. If you have any comments, please send an e-mail to editorial@sqlpass.org.
When you are just starting out in the database world, performance tuning seems like a daunting task. There are so many places where you might need to go to tune something. Should I look at the table structures, storage, memory, indexes, or even the query itself? The point of this article is not to make you a tuning expert, nor is it meant to represent the perfect way to do things. My goal is to give you some ideas on where to start whether you are a seasoned DBA or someone just getting started.
The application must have a turn-around time of 300 milliseconds from the time that the user submits a request for data to the time that data is returned.
The batch processing must be able to handle sustained throughput of 10,000 records per hour, with periodic bursts of up to 100,000 records per hour.
An important thing to derive from performance targets is what the real expectations are. In the first example, the requirement is that the data is returned in 300 milliseconds, but it doesn't mention if that is the first piece of data or the last. In the second example, the goal states that the system must be able to handle bursts of up to 100,000 records per hour, but it does not state how frequent those bursts are or if there can be some lag time in the processing of those bursts. Often you will find that performance goals need some additional clarification to make sure that they are met correctly. Don't be afraid to ask for clarification if a goal seems ambiguous to you. And, if the goals don't exist, ask the appropriate people for help in defining what those goals should be.
Many of these techniques can be implemented even if you are not designing an application from the ground up. You can take an existing or off-the-shelf application and spread objects across multiple filegroups on multiple volumes in an attempt to improve performance. At a minimum, you should put your log files on a separate volume from your data files.
Monitoring
Once a system is up and running, you should monitor its performance periodically to make sure that things are running optimally. To make this an effective venture, you should have a performance baseline to work against. To get this baseline, you should monitor the servers and your applications when they are running optimally. Once you have the baseline, you can perform periodic server monitoring to look for deviations from that baseline. If you have access to the archived sessions from prior PASS conferences, our president Kevin Kline has given a great session on baselining that is worth checking out.
When it comes to tools for monitoring performance, I use the tools that are available right out of the box: Windows Performance Monitor and SQL Server Profiler. There are also some DBCC commands that can be very useful, such as DBCC SHOWCONTIG for checking index fragmentation. I personally tend to monitor disk utilization, memory and processor utilization, and a subset of the SQL Server performance counters from within PerfMon. I also look at query execution times and I/O statistics from within Profiler. When I see things that don't look normal (an abnormally high number of reads for a given query, for example), I will dig deeper into the root cause of the problem.
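As a quick illustration of the fragmentation check mentioned above, DBCC SHOWCONTIG can be pointed at a single table; the table name below is borrowed from the Products example used elsewhere in this issue and is only a placeholder.

-- Sketch: report fragmentation for all indexes on one table as a result set.
-- 'Products' is a placeholder table name.
DBCC SHOWCONTIG ('Products') WITH ALL_INDEXES, TABLERESULTS;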
Tuning
Once you have monitored the situation and have realized that you are not meeting your performance goals, you can begin to tune. When many people think of tuning, they automatically jump to indexes. This is often a good leap to make. However, there are other things that need to be investigated. You might have appropriate indexes on all of your tables, but they are not being used because of the way that the query is written. If this is the case, you could rewrite the query in a different way so that the optimizer will choose an existing index. In some cases, the optimizer will choose a bad query plan because your statistics are out of date. Periodically updating the statistics will help to solve this problem. Also, indexes that are overly fragmented can cause physical I/O to be slow.
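A minimal sketch of those two maintenance steps, again using the placeholder Products table; the FULLSCAN option and rebuilding all indexes at once are assumptions for illustration, and in SQL Server 2005 ALTER INDEX ... REBUILD takes the place of the older DBCC DBREINDEX.

-- Sketch: refresh statistics and rebuild fragmented indexes on one table.
UPDATE STATISTICS Products WITH FULLSCAN;
GO
ALTER INDEX ALL ON Products REBUILD;
GO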
Beyond index tuning and query rewrites, you can look at the performance of your physical hardware. Does your server have enough memory to keep data and query plans cached for a sufficient period of time? Are you over-taxing one of your drive arrays? Are you having networking issues? All of these are possible causes of poor system performance.
Conclusion
One thing that you need to remember is that in many cases perception is everything. Your server could be performing perfectly, but other factors, such as network difficulties or poorly written application code, could be the real problem. Many times, the problem will come down to several small issues that combine to make one bigger issue. When you need to deal with other departments on performance-related issues, such as application development and network support, make sure that you take your tact and your facts with you. When approaching others about performance issues, people can sometimes get defensive. My overall advice is to get all of your information straight before approaching other departments and to approach them in a constructive manner. That way, you stand the best chance of improving the overall situation.