
New Performance Tuning Deck


Performance Tuning:

A Repeatable Process –
Defining What The Issues Are
By
Frank McBath
frankmcb@microsoft.com
frankmcb@hotmail.com
Finding Queries Kill You
• See:
– www.computationpress.com
A Few Ideas
• To me, 95% of problems in databases are
driven by poor queries (SELECTs)
• 70 to 80% of escalation calls are disk drive
bottlenecks.
• I rarely start off by looking at queries…
– I look at a pile of numbers…
Define the Problem(s)
• If you took your car to a mechanic…
• If you can’t define it… you can’t solve it…
• You have to get beyond “it’s slow”
Let the Computer Tell You What’s
Wrong with Itself
• Run profiler for 24 hours and search for long
running queries (LRQ)
– Surgical
• Take these queries and run them through
Database Tuning Advisor (DTA)
• Run the DMV script for missing indexes
– Shotgun
– Good… but you don’t know what originated them…
and what happens if they are one-offs and ad hoc?
Break The Problem Up
• 50,000 lines of anything is pretty meaningless
– You rapidly get lost in the details and white noise.
• Think of things in repeating patterns
• Look for classic “80/20”
• Define things in “Top 10”
Getting A Baseline: Macro
• If you don’t have a baseline, you are just
guessing.
• Take a profiler trace. Run it for 24 hours (filter
of course!)
• Save it to a table and then take the SUM of Reads,
Writes, Duration, and CPU
• This is a “Day in the life of your system.”
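A minimal sketch of the "save it to a table and sum it" step. An in-memory SQLite table stands in for the saved Profiler trace; the table name `trace` and the use of SQLite are assumptions for illustration only, and the three sample rows are copied from the Top 10 LRQ by Writes listing later in this deck.

```python
import sqlite3

# Hypothetical stand-in for a Profiler trace saved to a table.
# Columns mirror the trace columns named on the slide.
rows = [
    # (RowNumber, Duration, Reads, Writes, CPU)
    (5631, 12453, 1161768, 3366, 10672),
    (1440, 1436, 144266, 2499, 640),
    (1441, 1233, 143768, 2497, 625),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trace "
    "(RowNumber INT, Duration INT, Reads INT, Writes INT, CPU INT)"
)
conn.executemany("INSERT INTO trace VALUES (?, ?, ?, ?, ?)", rows)

# One row of totals: the "day in the life" numbers to compare against later.
baseline = conn.execute(
    "SELECT SUM(Reads), SUM(Writes), SUM(Duration), SUM(CPU) FROM trace"
).fetchone()

print("reads, writes, duration, cpu =", baseline)
```

Keeping the totals from each run makes it possible to say whether a later change moved the numbers, rather than guessing.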
Get a Baseline: Micro
• Look at your repeating patterns and your
LRQ’s.
• Get the stats IO and stats profile on each of
these.
Baselining: Future Vision
• Run a PERFMON trace 24x7, sample every 15
minutes. Look at disk speed, RAM
consumption, and CPU.
• This can be used as a retroactive baseline.
• More importantly, you can plot this data and
“see the future” for when you need to
upgrade.
• Use this data for FY budgeting purposes.
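One way to "see the future" from PERFMON samples is a straight-line fit. The sketch below is only an illustration: the weekly disk-usage figures and the 90% threshold are invented, and a linear trend is an assumption that may not hold for a real system.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(threshold, slope, intercept, last_x):
    """Periods left before the fitted line reaches threshold (None if flat or falling)."""
    if slope <= 0:
        return None
    return max(0.0, (threshold - intercept) / slope - last_x)


# Hypothetical samples: week number and percent of disk used.
weeks = [0, 1, 2, 3, 4, 5]
disk_used_pct = [60, 62, 64, 66, 68, 70]

slope, intercept = linear_fit(weeks, disk_used_pct)
weeks_left = periods_until(90, slope, intercept, weeks[-1])
print(f"growth {slope:.1f}%/week, about {weeks_left:.0f} weeks until 90% full")
```

The same fit can be applied to RAM or CPU samples; the projected date is what feeds the budgeting conversation.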
Repro Query
• You HAVE to be able to reliably reproduce a
query outside of the application environment.
• For stored procedures, this is pretty easy.
• For ERP & CRM systems, it gets very tricky.
– SP_CURSORFETCH
– Implicit Cursor Conversions, etc…
Stats IO, Stats Profile
• SET STATISTICS IO [ON | OFF]
– Tells you, table by table, where you are getting
hit.
– Look for “Read Aheads” (a.k.a. scanning large
amounts of data)
• SET STATISTICS PROFILE [ON | OFF]
– This will show you at a statement by statement
level which lines are the most expensive.
– This is as accurate as you can get.
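When a query touches many tables, the STATISTICS IO messages can be ranked by read-ahead reads to find the scans quickly. The sketch below parses the message text; the sample messages use table names from this deck, but the numbers are invented for illustration.

```python
import re

# Matches the leading part of one STATISTICS IO message line.
STATS_IO_LINE = re.compile(
    r"Table '(?P<table>[^']+)'\. Scan count (?P<scans>\d+), "
    r"logical reads (?P<logical>\d+), physical reads (?P<physical>\d+), "
    r"read-ahead reads (?P<read_ahead>\d+)"
)


def parse_stats_io(text):
    """Return one dict per table, largest read-ahead count first."""
    results = []
    for match in STATS_IO_LINE.finditer(text):
        row = {"table": match.group("table")}
        for key in ("scans", "logical", "physical", "read_ahead"):
            row[key] = int(match.group(key))
        results.append(row)
    return sorted(results, key=lambda r: r["read_ahead"], reverse=True)


# Invented sample output, for illustration only.
sample = (
    "Table 'S_ORDER_ITEM'. Scan count 1, logical reads 12, physical reads 0, "
    "read-ahead reads 0, lob logical reads 0, lob physical reads 0, "
    "lob read-ahead reads 0.\n"
    "Table 'S_SRC'. Scan count 1, logical reads 250000, physical reads 3, "
    "read-ahead reads 249000, lob logical reads 0, lob physical reads 0, "
    "lob read-ahead reads 0.\n"
)

ranked = parse_stats_io(sample)
for row in ranked:
    print(row["table"], row["logical"], row["read_ahead"])
```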
Collecting The Data
• Health Check Script
• Profiler Traces
– Keep it minimal
• Perfmon Traces
– Keep it minimal
• Don’t get lost in the white noise
• It’s like a NASCAR mechanic’s garage…
Health Check Script (I)
• Look at physical RAM, number of cores
Index Name                Internal_Value Character_Value
----- ------------------- -------------- ------------------------------
1     ProductName         NULL           Microsoft SQL Server
2     ProductVersion      589824         9.00.3239.00
3     Language            1033           English (United States)
4     Platform            NULL           NT AMD64
5     Comments            NULL           NT AMD64
6     CompanyName         NULL           Microsoft Corporation
7     FileDescription     NULL           SQL Server Windows NT - 64 Bit
8     FileVersion         NULL           2005.090.3239.00
9     InternalName        NULL           SQLSERVR
10    LegalCopyright      NULL           © Microsoft Corp. All rights reserved.
11    LegalTrademarks     NULL           Microsoft® is a registered trademark of Microsoft Corporation. Windows(TM) is a trademark of Microsoft Corporation
12    OriginalFilename    NULL           SQLSERVR.EXE
13    PrivateBuild        NULL           NULL
14    SpecialBuild        212271104      NULL
15    WindowsVersion      248381957      5.2 (3790)
16    ProcessorCount      16             16
17    ProcessorActiveMask 16             ffff
18    ProcessorType       8664           NULL
19    PhysicalMemory      32765          32765 (34356162560)
20    Product ID          NULL           NULL
Health Check Script (II)
• SP_CONFIGURE
• I’m looking at these values…
– Compare these to the previous screen… what do
you see?

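name                       minimum  maximum     config_value  run_value
-------------------------  -------  ----------  ------------  ---------
awe enabled                0        1           0             0
max degree of parallelism  0        64          1             1
max server memory (MB)     16       2147483647  14336         14336
min server memory (MB)     0        2147483647  4196          4196
xp_cmdshell                0        1           0             0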
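A quick way to work the comparison is to put the configured values next to the hardware values from Health Check Script (I). The arithmetic below uses only numbers from the two listings; whether the resulting settings are right depends on the workload and on what else runs on the box, so treat the output as a prompt for questions, not a verdict.

```python
# Hardware, from the Health Check Script (I) output.
physical_memory_mb = 32765
processor_count = 16

# Configuration, from the SP_CONFIGURE run values.
max_server_memory_mb = 14336
min_server_memory_mb = 4196
maxdop = 1

memory_share = max_server_memory_mb / physical_memory_mb
memory_outside_sql_mb = physical_memory_mb - max_server_memory_mb

print(f"max server memory is {memory_share:.1%} of physical RAM")
print(f"{memory_outside_sql_mb} MB is left outside the SQL Server cap")
print(f"MAXDOP {maxdop} on {processor_count} cores: each query runs on one core")
```

A cap well under half of RAM and a MAXDOP of 1 on 16 cores may both be deliberate, but they are worth asking about.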
Health Check (III)
Distribution of the Workload
• 95 to 98% Reads…
– How would this impact our disk architecture?
Number of SELECT Bytes Read                              Bytes Written
---------------- --------------------------------------- -----------------
2946             491296440320                            644440064

Number of INSERT Bytes Read                              Bytes Written
---------------- --------------------------------------- -----------------
74               3868090368                              8216576

Number of UPDATE Bytes Read                              Bytes Written
---------------- --------------------------------------- -----------------
5                213155840                               24576

Number of DELETE Bytes Read                              Bytes Written
---------------- --------------------------------------- -----------------
47               10464018432                             184049664

Number of INDEX Hints
---------------------
0
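The read/write split can be checked directly from the listing. The sketch below computes it two ways, by statement count and by bytes moved, using only the numbers shown above.

```python
# statement type -> (statement count, bytes read, bytes written), from the listing
workload = {
    "SELECT": (2946, 491296440320, 644440064),
    "INSERT": (74, 3868090368, 8216576),
    "UPDATE": (5, 213155840, 24576),
    "DELETE": (47, 10464018432, 184049664),
}

total_statements = sum(count for count, _, _ in workload.values())
bytes_read = sum(read for _, read, _ in workload.values())
bytes_written = sum(written for _, _, written in workload.values())

select_share = workload["SELECT"][0] / total_statements
read_share = bytes_read / (bytes_read + bytes_written)

print(f"SELECT share of statements: {select_share:.1%}")
print(f"read share of bytes moved:  {read_share:.1%}")
```

By statement count the SELECT share lands in the 95 to 98% range; by bytes the workload is even more read-heavy, which is the number that matters for the disk layout question.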
Health Check (IV)
• Look at the number of files to the number of
cores… Same on TEMPDB…
• Look at the size of the files and the growth
characteristics…
name             db_size        owner   ty_level
---------------- -------------- ------- ------------
SIEBEL_OLTP_PRD  1133575.75 MB  SIEBEL  =Latin1_Gene

name                   fileid  filename                                       filegroup  size          growth
---------------------  ------  ---------------------------------------------  ---------  ------------  ---------
SIEBEL_OLTP_PRD_Data1  1       E:\PRD-EU\DATA1\SIEBEL_OLTP_PRD_EUR_Data1.MDF  PRIMARY    251658240 KB  512000 KB
SIEBEL_OLTP_PRD_Log    2       E:\PRD-EU\LOG\SIEBEL_OLTP_PRD_Log.LDF          NULL       154148608 KB  0 KB
SIEBEL_OLTP_PRD_Data2  3       E:\PRD-EU\DATA2\SIEBEL_OLTP_PRD_EUR_Data2.NDF  PRIMARY    251658240 KB  512000 KB
SIEBEL_OLTP_PRD_Data3  4       E:\PRD-EU\DATA3\SIEBEL_OLTP_PRD_EUR_Data3.NDF  PRIMARY    251658240 KB  512000 KB
SIEBEL_OLTP_PRD_Data4  6       E:\PRD-EU\DATA4\SIEBEL_OLTP_UAT_Data4.ndf      PRIMARY    251658240 KB  512000 KB
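The files-to-cores and growth questions reduce to a little arithmetic on the listing. The sketch below uses the file sizes shown above and the ProcessorCount from Health Check Script (I); it makes no claim about what the right ratio is.

```python
processor_count = 16  # ProcessorCount from the health check output

# (logical name, filegroup, size in KB, growth in KB) from the listing
db_files = [
    ("SIEBEL_OLTP_PRD_Data1", "PRIMARY", 251658240, 512000),
    ("SIEBEL_OLTP_PRD_Log", None, 154148608, 0),
    ("SIEBEL_OLTP_PRD_Data2", "PRIMARY", 251658240, 512000),
    ("SIEBEL_OLTP_PRD_Data3", "PRIMARY", 251658240, 512000),
    ("SIEBEL_OLTP_PRD_Data4", "PRIMARY", 251658240, 512000),
]

# The log file has no filegroup, so this keeps only the data files.
data_files = [f for f in db_files if f[1] is not None]
cores_per_data_file = processor_count / len(data_files)
print(f"{len(data_files)} data files for {processor_count} cores "
      f"({cores_per_data_file:.0f} cores per file)")

for name, _, size_kb, growth_kb in data_files:
    size_gb = size_kb / 1024 / 1024
    growth_mb = growth_kb / 1024
    growth_pct = 100 * growth_kb / size_kb
    print(f"{name}: {size_gb:.0f} GB, grows {growth_mb:.0f} MB "
          f"({growth_pct:.2f}%) at a time")
```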
Look at the Layout…
• All on the same drive?
• TempDB separated off from Data & Log?
• RAID Levels of devices
– Number of physical spindles
• Is RCSI used?
• What does this mean?
– HA & Performance implications
name             db_size        owner   ty_level
---------------- -------------- ------- ------------
SIEBEL_OLTP_PRD  1133575.75 MB  SIEBEL  =Latin1_Gene

name                   fileid  filename                                       filegroup  size          growth
---------------------  ------  ---------------------------------------------  ---------  ------------  ---------
SIEBEL_OLTP_PRD_Data1  1       E:\PRD-EU\DATA1\SIEBEL_OLTP_PRD_EUR_Data1.MDF  PRIMARY    251658240 KB  512000 KB
SIEBEL_OLTP_PRD_Log    2       E:\PRD-EU\LOG\SIEBEL_OLTP_PRD_Log.LDF          NULL       154148608 KB  0 KB
SIEBEL_OLTP_PRD_Data2  3       E:\PRD-EU\DATA2\SIEBEL_OLTP_PRD_EUR_Data2.NDF  PRIMARY    251658240 KB  512000 KB
SIEBEL_OLTP_PRD_Data3  4       E:\PRD-EU\DATA3\SIEBEL_OLTP_PRD_EUR_Data3.NDF  PRIMARY    251658240 KB  512000 KB
SIEBEL_OLTP_PRD_Data4  6       E:\PRD-EU\DATA4\SIEBEL_OLTP_UAT_Data4.ndf      PRIMARY    251658240 KB  512000 KB
Look at the Response Times
• Remember, the DMV is just showing an
Average over time…
DbId Name   FileId ms/io
---- ------ ------ -----
1    master 1      3
1    master 2      0
2    tempdb 1      1
2    tempdb 2      2
2    tempdb 3      1
2    tempdb 4      1
2    tempdb 5      2
3    model  1      2
3    model  2      0
4    msdb   1      5
4    msdb   2      0
5    PRD    1      4
5    PRD    2      1
5    PRD    3      65
5    PRD    4      70
5    PRD    5      72
5    PRD    6      73
5    PRD    7      75
5    PRD    8      78
5    PRD    9      81
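A short sketch for picking the slow files out of the listing and for getting around the "average over time" caveat. The 20 ms threshold is an assumption chosen for illustration, and the snapshot tuples passed to `interval_ms_per_io` are hypothetical cumulative counters (total stall in ms, total IO count) captured at two points in time.

```python
# (database, file id, average ms per IO) as listed on the slide
file_latency = [
    ("master", 1, 3), ("master", 2, 0),
    ("tempdb", 1, 1), ("tempdb", 2, 2), ("tempdb", 3, 1),
    ("tempdb", 4, 1), ("tempdb", 5, 2),
    ("model", 1, 2), ("model", 2, 0),
    ("msdb", 1, 5), ("msdb", 2, 0),
    ("PRD", 1, 4), ("PRD", 2, 1), ("PRD", 3, 65), ("PRD", 4, 70),
    ("PRD", 5, 72), ("PRD", 6, 73), ("PRD", 7, 75), ("PRD", 8, 78),
    ("PRD", 9, 81),
]

SLOW_MS_PER_IO = 20  # assumed threshold, not from the deck

slow_files = [row for row in file_latency if row[2] > SLOW_MS_PER_IO]
for db, file_id, ms in slow_files:
    print(f"{db} file {file_id}: {ms} ms/io")


def interval_ms_per_io(before, after):
    """ms/io between two snapshots of cumulative (stall_ms, io_count)."""
    stall_ms = after[0] - before[0]
    io_count = after[1] - before[1]
    return stall_ms / io_count if io_count else 0.0
```

Differencing two snapshots gives the latency for just that interval, which is more useful during a suspected bottleneck than the long-run average.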
Top 10’s (I)
• What are you looking at when you see high
reads and writes in the same line?
• What is high duration and low reads?
• Train yourself to see these things and just
logically work through them in your head.
• Recognize bottlenecks when you see them in
the data.
Top 10 (II)
• Look at the Top 10 biggest tables… if you are
going to table scan… here are the real
problems…
– What else are you seeing?
• Then, by table…
Top 10 Largest Tables

table            index               indid  dpages    reserved  bytes used    slack space
---------------  ------------------  -----  --------  --------  ------------  ---------------------
S_SRC            S_SRC_P1            1      12369003  30523536  250048806912  0.5947716214792414615
S_EVT_ACT        S_EVT_ACT_P1        1      2855597   7709130   63153192960   0.6295824561266965274
S_PROD_SHIPMENT  S_PROD_SHIPMENT_P1  1      3357636   6246911   51174694912   0.4625125922235805825
S_ORDER_ITEM     S_ORDER_ITEM_P1     1      2906128   5935460   48623288320   0.5103786395662678209
S_STORE_COND     S_STORE_COND_P1     1      4028383   4990831   40884887552   0.1928432359260411744
S_ORG_PROD       S_ORG_PROD_P1       1      1043565   1420237   11634581504   0.2652177066222046039
EIM_SRC          EIM_SRC_U1          1      848859    1153958   9453223936
table indid index name indid dpages reserved bytes used slack space
------- ------ ------------------ ----------- ----------- --------------- -------------------------
S_SRC 1 S_SRC_P1 1 12369003 30523536 250048806912 0.5947716214792414615
S_SRC 51 S_SRC_M8 51 949800 970926 7953825792 0.0217586098219637748
S_SRC 46 S_SRC_M4 46 917821 937101 7676731392 0.0205740896658951384
S_SRC 50 S_SRC_M7 50 917830 936909 7675158528 0.0203637706543538380
S_SRC 38 S_SRC_F6 38 219584 226289 1853759488 0.0296302515809429535
S_SRC 31 S_SRC_F35 31 219584 226288 1853751296 0.0296259633741073323
S_SRC 33 S_SRC_F37 33 219584 226287 1853743104 0.0296216751293711084
S_SRC 37 S_SRC_F51 37 219584 226287 1853743104 0.0296216751293711084
S_SRC 17 S_SRC_F22 17 219584 226287 1853743104 0.0296216751293711084
S_SRC 23 S_SRC_F28 23 219584 226287 1853743104 0.0296216751293711084
S_SRC 29 S_SRC_F33 29 219584 226287 1853743104 0.0296216751293711084
S_SRC 4 S_SRC_F10 4 219584 226287 1853743104 0.0296216751293711084
S_SRC 5 S_SRC_F11 5 219584 226287 1853743104 0.0296216751293711084
S_SRC 21 S_SRC_F26 21 219584 226287 1853743104 0.0296216751293711084
S_SRC 19 S_SRC_F24 19 219584 226287 1853743104 0.0296216751293711084
S_SRC 12 S_SRC_F18 12 219584 226287 1853743104 0.0296216751293711084
S_SRC 13 S_SRC_F19 13 219584 226287 1853743104 0.0296216751293711084
S_SRC 45 S_SRC_M3 45 219584 226287 1853743104 0.0296216751293711084
S_SRC 47 S_SRC_M5 47 219584 226287 1853743104 0.0296216751293711084
S_SRC 55 S_SRC_V2 55 219584 226286 1853734912 0.0296173868467337794
S_SRC 39 S_SRC_F7 39 219584 226286 1853734912 0.0296173868467337794
S_SRC 40 S_SRC_F8 40 219584 226286 1853734912 0.0296173868467337794
S_SRC 41 S_SRC_F9 41 219584 226286 1853734912 0.0296173868467337794
S_SRC 15 S_SRC_F20 15 219584 226286 1853734912 0.0296173868467337794
S_SRC 16 S_SRC_F21 16 219584 226286 1853734912 0.0296173868467337794
S_SRC 22 S_SRC_F27 22 219584 226286 1853734912 0.0296173868467337794
S_SRC 6 S_SRC_F12 6 219584 226286 1853734912 0.0296173868467337794
S_SRC 7 S_SRC_F13 7 219584 226286 1853734912 0.0296173868467337794
S_SRC 10 S_SRC_F16 10 219584 226286 1853734912 0.0296173868467337794
S_SRC 11 S_SRC_F17 11 219584 226286 1853734912 0.0296173868467337794
S_SRC 30 S_SRC_F34 30 219584 226286 1853734912 0.0296173868467337794
S_SRC 26 S_SRC_F30 26 219584 226286 1853734912 0.0296173868467337794
S_SRC 27 S_SRC_F31 27 219584 226286 1853734912 0.0296173868467337794
S_SRC 28 S_SRC_F32 28 219584 226286 1853734912 0.0296173868467337794
S_SRC 36 S_SRC_F50 36 219584 226286 1853734912 0.0296173868467337794
S_SRC 32 S_SRC_F36 32 219584 226286 1853734912 0.0296173868467337794
S_SRC 57 S_SRC_V4 57 199400 205222 1681178624 0.0283692781475670250
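The "bytes used" and "slack space" columns can be reproduced from the page counts. The formulas below are inferred from the numbers in the listings, not taken from the script that produced them: bytes used matches reserved pages times 8192, and slack space matches the fraction of reserved pages that are not data pages.

```python
PAGE_BYTES = 8192  # SQL Server page size


def bytes_used(reserved_pages):
    """Reserved pages converted to bytes."""
    return reserved_pages * PAGE_BYTES


def slack_space(dpages, reserved_pages):
    """Fraction of reserved pages that are not data pages."""
    return (reserved_pages - dpages) / reserved_pages


# (index, dpages, reserved) copied from the listings above
samples = [
    ("S_SRC_P1", 12369003, 30523536),
    ("S_EVT_ACT_P1", 2855597, 7709130),
    ("S_SRC_M8", 949800, 970926),
]

for index, dpages, reserved in samples:
    print(f"{index}: {bytes_used(reserved)} bytes, "
          f"slack {slack_space(dpages, reserved):.4f}")
```

Read this way, a slack value near 0.6 on the clustered index means most of the reserved space is something other than data pages, which is worth a closer look.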
Take a good quick first look
• Repeating patterns
– Uniform numbers
• Anomalies in the data
– High duration, low reads
– Lots of LRQs
Repeating Patterns
• Things that are too uniform happen for a
reason…
– Table Scans
– 100% of 1 value
• They are finger prints
• Train yourself to look for them fast
Look for 80/20
• What are you looking at here?
--------------------
Top 10 LRQ by Writes

RowNumber Duration Reads   Writes CPU
--------- -------- ------- ------ -----
5631      12453    1161768 3366   10672
1440      1436     144266  2499   640
1441      1233     143768  2497   625
1462      1686     144112  2497   672
1445      1343     144701  2495   781
1439      1233     143810  2495   672
1463      1263     143976  2494   594
1438      1093     143828  2494   625
1453      1360     143956  2494   750
1470      1250     143654  2493   672
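The repeating pattern in this listing can be picked out mechanically: group the rows whose Reads sit close to the median and treat the rest as outliers. This is one simple way to do it, not a standard method; the 2% tolerance is an assumption chosen for illustration.

```python
from statistics import median

# (RowNumber, Duration, Reads, Writes, CPU) from the listing above
lrq = [
    (5631, 12453, 1161768, 3366, 10672),
    (1440, 1436, 144266, 2499, 640),
    (1441, 1233, 143768, 2497, 625),
    (1462, 1686, 144112, 2497, 672),
    (1445, 1343, 144701, 2495, 781),
    (1439, 1233, 143810, 2495, 672),
    (1463, 1263, 143976, 2494, 594),
    (1438, 1093, 143828, 2494, 625),
    (1453, 1360, 143956, 2494, 750),
    (1470, 1250, 143654, 2493, 672),
]

READS = 2  # column position of Reads in each tuple


def uniform_cluster(rows, col, tolerance=0.02):
    """Rows whose value in col is within tolerance of the median."""
    mid = median(r[col] for r in rows)
    return [r for r in rows if abs(r[col] - mid) <= tolerance * mid]


cluster = uniform_cluster(lrq, READS)
outliers = [r[0] for r in lrq if r not in cluster]

print(f"{len(cluster)} of {len(lrq)} rows read about the same amount")
print("outlier RowNumbers:", outliers)
```

Nine of the ten rows do nearly identical work, so they are very likely the same statement repeated; fixing that one statement addresses most of the list.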
Top 10’s (III)
• You will always have a “top 10”. It’s just a
matter of whether the queries are running in 8 minutes,
8 seconds or 8 milliseconds.
• Look at the TOP 10 by Reads, Writes, Duration
& CPU.
• I look at Reads the most, since reads make up 90 to
95% of all database IO.
Easy, Hard, Difficult
• Easy
– Queries running in 10 seconds
– Queries doing 50GB
• Hard
– Queries running in 500 milliseconds, but run 100K
times a day
• Difficult
– Queries doing 5 logical reads and run 500K times a day
• Master..syscacheobjects!
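The reason the "hard" and "difficult" cases matter shows up when per-execution cost is multiplied by execution count. A quick worked calculation using the figures from this slide, assuming the standard 8 KB page per logical read:

```python
PAGE_BYTES = 8192  # one logical read touches one 8 KB page


def daily_seconds(ms_per_execution, executions_per_day):
    """Total elapsed seconds per day spent in one query."""
    return ms_per_execution * executions_per_day / 1000


def daily_read_bytes(logical_reads, executions_per_day):
    """Total bytes touched per day by one query's logical reads."""
    return logical_reads * executions_per_day * PAGE_BYTES


hard_seconds = daily_seconds(500, 100_000)
difficult_bytes = daily_read_bytes(5, 500_000)

print(f"500 ms x 100K/day = {hard_seconds / 3600:.1f} hours of query time per day")
print(f"5 reads x 500K/day = {difficult_bytes / 1e9:.2f} GB touched per day")
```

Neither query would appear in a list of long-running queries, which is why execution counts have to be part of the picture.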
Know Your Data
• How many know what this is and what the
problems are?

   WHERE
      ((T1.TEST_ORDER_FLG = ''N'') AND
      (T2.NAME = ''Sale'' OR T2.NAME = ''Return'' OR T2.NAME = @P2 OR
       T2.NAME = ''Exchange'' OR T2.NAME = ''Invoice'' OR
       T2.NAME = ''RMA - Over Shipped'') AND
      (T1.X_CUST_ACC = @P3 OR T1.X_CUST_ACC = @P4 OR T1.X_CUST_ACC = @P5 OR
       T1.X_CUST_ACC IS NULL OR T1.X_CUST_ACC IS NULL OR
       T1.X_CUST_ACC IS NULL OR T1.X_CUST_ACC IS NULL)) AND
      (T1.STATUS_CD LIKE @P6)
   ORDER BY
      T4.BU_ID DESC, T4.ORDER_DT DESC
Summary
• At this point, you have defined your:
– Top 10 by table, index, LRQ
– Looked over the system configuration RAM,
MAXDOP
– Looked over the disk architecture at a logical level,
and physical (ms/io, HA & Performance)
• Now the issues have been defined.
– Use DTA to start fixing them.
