
NEELAKANTH NADGIR'S BLOG



Tuesday May 26, 2009

MySQL Innodb ZFS Best Practices

One of the cool things about talking about MySQL performance with ZFS is that there is not much tuning to be done. Tuning with ZFS is considered evil, but a necessity at times. In this blog I will describe some of the tunings that you can apply to get better performance with ZFS, as well as point out performance bugs which, when fixed, will nullify the need for some of these tunings.

For the impatient, here is the summary. See below for the reasoning behind these recommendations and some gotchas.
1. Match ZFS recordsize with Innodb page size (16KB for Innodb datafiles, and 128KB for Innodb log files).
2. If you have a write heavy workload, use a separate ZFS intent log (slog).
3. If your database working set size does not fit in memory, you can get a big boost by using an SSD as L2ARC.
4. While using storage devices with battery backed caches, or while comparing ZFS with other filesystems, turn off the cache flush.
5. Prefer to cache within MySQL/Innodb over the ZFS Adaptive Replacement Cache (ARC).
6. Disable ZFS prefetch.
7. Disable the Innodb doublewrite buffer.

Let's look at all of them in detail.
WHAT  Match ZFS recordsize with Innodb page size (16KB for Innodb datafiles, and 128KB for Innodb log files).

HOW   zfs set recordsize=16k tank/db

WHY   The biggest boost in performance can be obtained by matching the ZFS recordsize with the size of the IO. Since an Innodb page is 16KB in size, most read IO is of size 16KB (except for some prefetch IO, which can get coalesced). The default recordsize for ZFS is 128KB. The mismatch between the read size and the ZFS recordsize can result in severely inflated IO: if you issue a 16KB read and the data is not already in the ARC, you have to read 128KB of data to get it. ZFS cannot do a small read because the checksum is calculated for the whole block, and you have to read it all to verify data integrity. The other reason to match the IO size and the ZFS recordsize is the read-modify-write penalty. With a ZFS recordsize of 128KB, when Innodb modifies a page, if the ZFS record is not already in memory it needs to be read in from disk and modified before writing to disk. This increases the IO latency significantly. Luckily, matching the ZFS recordsize with the IO size removes all the problems mentioned above.

For Innodb log files, the writes are usually sequential and varying in size. By using a ZFS recordsize of 128KB you amortize the cost of read-modify-write.

NOTE  You need to set the recordsize before creating the database files. If you have already created the files, you need to copy the files to get the new recordsize. You can use the stat command to check the recordsize (look for "IO Block:").
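As a concrete sketch (the pool and dataset names below are placeholders, not from the post), you could keep datafiles and log files on separate datasets so each gets the matching recordsize, and then verify with stat:

    # hypothetical pool/dataset layout; set the recordsize before creating any database files
    zfs create -o recordsize=16k tank/db/data     # Innodb datafiles
    zfs create -o recordsize=128k tank/db/log     # Innodb log files (ib_logfile*)
    # check what an existing file actually uses (look for the "IO Block:" field)
    stat /tank/db/data/ibdata1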
WHAT  If you have a write heavy workload, use a separate intent log (slog).

HOW   zpool add tank log c4t0d0 c4t1d0

WHY   Write latency is extremely critical for many MySQL workloads. Typically, a query will read some data, do some calculations, update some data and then commit the transaction. To commit, the Innodb log has to be updated. Many transactions can be committing at the same time, and it is very important that this "wait" for commit be fast. Luckily, in ZFS synchronous writes can be accelerated by using the separate intent log. In our tests with Sysbench read-write, we have seen around 10-20% improvement with the slog.

NOTE  - If your query execution involves a physical read from disk, the time for the write may not be that important. Be sure to check this suggestion with your real workload.
      - Until Bug 6574286 is fixed, you cannot remove a slog.
      - Innodb actually issues multiple kinds of writes (log write, dataspace write, insert buffer write). Of these, the most critical one is the Innodb log write. The slog feature is pool wide, and thus some writes (like dataspace writes) which need not go to the slog still do. This will be fixed via Bug 6832481 "ZFS separate intent log bypass property".
      - It is also possible that during ZFS transaction sync time, the ZFS IO queue (35 deep) can get full. This means that a write has to wait for a slot to become empty. Bug 6471212 "need reserved I/O scheduler slots to improve I/O latency of critical ops" solves this using reserved slots. Bug 6721168 "slog latency impacted by I/O scheduler during spa_sync" is also worth checking out.
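A minimal sketch, assuming a pool named tank and two spare devices (the device names are placeholders): mirroring the slog protects recently committed transactions if one log device fails.

    # attach a mirrored separate intent log to the pool
    zpool add tank log mirror c4t0d0 c4t1d0
    # confirm the log vdev shows up under the pool
    zpool status tank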
WHAT  L2ARC (or Level 2 ARC).

HOW   zpool add tank cache c4t0d0

WHY   If your database does not fit in memory, every time you miss the database cache you have to read a block from disk. This cost is quite high with regular disks. You can minimize the database cache miss latency by using one (or multiple) SSDs as a level-2 cache, or L2ARC. Depending on your database working set size, memory, and L2ARC size, you may see several orders of magnitude improvement in performance.
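A sketch with placeholder pool and device names: add the SSD as a cache device, then watch how much of the read traffic it absorbs as the cache warms up.

    # add an SSD as an L2ARC cache device
    zpool add tank cache c4t2d0
    # per-vdev IO statistics every 5 seconds; the cache device is listed separately
    zpool iostat -v tank 5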

WHAT  When it is safe, turn off the ZFS cache flush.

HOW   The ZFS Evil Tuning Guide has more information about setting this tunable. Refer to it for the best way to achieve this.

WHY   ZFS is designed to work reliably with disks with caches. Every time it needs data to be stored persistently on disk, it issues a cache flush command to the disk. Disks with battery-backed caches need not do anything (i.e., the cache flush command is a no-op). Many storage devices interpret this correctly and do the right thing when they receive a cache flush command. However, there are still a few storage systems which do not interpret the cache flush command correctly. For such storage systems, preventing ZFS from sending the cache flush command results in a big reduction in IO latency. In our tests with the Sysbench read-write test we saw a 30% improvement in performance.

NOTE  Setting this tunable on a system without a battery backed cache can cause inconsistencies in case of a crash. When comparing ZFS with filesystems that blindly enable the write cache, be sure to set this to get a fair comparison.
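For reference, the knob the Evil Tuning Guide described on Solaris at the time was the zfs_nocacheflush tunable; the following is a sketch based on that guide rather than on this post, and it is only safe when every device behind the pool has a non-volatile (battery backed) write cache:

    # /etc/system -- stop ZFS from issuing cache flush commands (requires a reboot)
    # ONLY safe with battery backed / non-volatile write caches
    set zfs:zfs_nocacheflush = 1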

WHAT  Prefer to cache within MySQL/Innodb over the ARC.

HOW   Via my.cnf, and by limiting the ARC size.

WHY   You have multiple levels of caching when you are using MySQL/Innodb with ZFS: Innodb has its own buffer pool and ZFS has the ARC. Both of them make independent decisions on what to cache and what to flush, and it is possible for both of them to cache the same data. By caching inside Innodb, you get a much shorter (and faster) code path to the data. Moreover, when the Innodb buffer cache is full, a miss in the Innodb buffer cache can lead to flushing of a dirty buffer even if the data was cached in the ARC. This leads to unnecessary writes. Even though the ARC dynamically shrinks and expands in response to memory pressure, it is more efficient to just limit it. In our tests, we have found that it is better (7-200%) to cache inside Innodb rather than in ZFS.

NOTE  The ARC can be tuned to cache everything, just metadata, or nothing on a per-filesystem basis. See below for tuning advice about this.
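A hedged sketch of what this can look like; the sizes are invented for a machine with roughly 32GB of RAM and the dataset name is a placeholder, so tune both to your own system:

    # my.cnf -- give the bulk of memory to the Innodb buffer pool
    [mysqld]
    innodb_buffer_pool_size = 24G

    # /etc/system -- cap the ARC at 4GB (value in bytes; requires a reboot)
    set zfs:zfs_arc_max = 0x100000000

    # optionally keep only metadata in the ARC for the datafile dataset
    zfs set primarycache=metadata tank/db/data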

WHAT  Disable ZFS prefetch.

HOW   In /etc/system: set zfs:zfs_prefetch_disable = 1

WHY   Most filesystems implement some kind of prefetch. ZFS prefetch detects linear (increasing and decreasing), strided, and multiblock strided IO streams, and issues prefetch IO when it will help performance. These prefetch IOs have a lower priority than regular reads and are generally very beneficial. ZFS also has a lower level prefetch (commonly called vdev prefetch) to help with spatial locality of data.

In Innodb, rows are stored in the order of the primary index. Innodb issues two kinds of prefetch requests: one is triggered while accessing sequential pages, and the other is triggered by random access within an extent. While issuing prefetch IO, Innodb assumes that the file is laid out in the order of the primary key. This is not true for ZFS. We are yet to investigate the impact of Innodb prefetch.

It is well known that OLTP workloads access data in a random order and hence do not benefit from prefetch. Thus we recommend that you turn off ZFS prefetch.

NOTE  - If you have changed the primary cache caching strategy to cache just metadata, you will not trigger file level prefetch.
      - If you have set the recordsize to 16k, you will not trigger the lower level prefetch.
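The /etc/system entry takes effect only after a reboot; on a running Solaris kernel the same tunable could also be flipped with mdb (an assumption based on the usual practice of the era, not something stated in this post):

    # flip the tunable on the live kernel (root required; effective immediately, lost on reboot)
    echo "zfs_prefetch_disable/W0t1" | mdb -kw
    # read back the current value
    echo "zfs_prefetch_disable/D" | mdb -k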

WHAT  Disable the Innodb doublewrite buffer.

HOW   skip-innodb_doublewrite in my.cnf

WHY   Innodb uses a doublewrite buffer for safely updating pages in a tablespace. Innodb first writes the changes to the doublewrite buffer before updating the data page, to prevent partial writes. Since ZFS does not allow partial writes, you can safely turn off the doublewrite buffer. In our tests with Sysbench read-write, we saw a 5% improvement in performance.
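A minimal my.cnf sketch; this only makes sense when the Innodb datafiles actually live on ZFS, and the verification step shown in the comment is my addition rather than something from the post:

    # my.cnf -- rely on ZFS copy-on-write semantics instead of the doublewrite buffer
    [mysqld]
    skip-innodb_doublewrite

    # after restarting mysqld, this should report OFF:
    #   mysql> SHOW VARIABLES LIKE 'innodb_doublewrite';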
Posted at 01:21PM May 26, 2009 by Neelakanth Nadgir in MySQL
