I/O schedulers in Linux reorder and group I/O requests to improve throughput while balancing latency. Different schedulers take different approaches, and there is no single best scheduler for all situations. For Oracle databases on Linux, Oracle recommends using the Deadline scheduler for HDD storage to prioritize I/O requests, while the none scheduler may be best for SSD/NVMe storage. When selecting a scheduler, it is important to consider the storage media and I/O characteristics of the workload.
I/O Schedulers (Elevator) concept and its effect on database performance
I/O schedulers in Linux
I/O schedulers attempt to improve throughput by reordering request access into a linear order
based on the logical addresses of the data and trying to group these together. While this may
increase overall throughput it may lead to some I/O requests waiting for too long, causing latency
issues. I/O schedulers attempt to balance the need for high throughput while trying to fairly share
I/O requests amongst processes.
Different approaches have been taken for the various I/O schedulers, and each has its own set of
strengths and weaknesses. The general rule is that there is no perfect default I/O scheduler for
the full range of I/O demands a system may experience.
I/O Elevator or Scheduler Types in Debian-based distributions such as
Ubuntu
Multiqueue I/O schedulers
Note: These are the only I/O schedulers available in Ubuntu Eoan Ermine 19.10 and onwards.
The following I/O schedulers are designed for multiqueue devices. These map I/O requests to
multiple queues and these are handled by kernel threads that are distributed across multiple
CPUs.
bfq (Budget Fair Queuing) (Multiqueue)
Designed to provide good interactive response, especially for slower I/O devices. This is a
complex I/O scheduler and has a relatively high per-operation overhead so it is not ideal for
devices with slow CPUs or high throughput I/O devices. Fair sharing is based on the number of
sectors requested and heuristics rather than a time slice. Desktop users may like to experiment
with this I/O scheduler as it can be advantageous when loading large applications.
kyber (Multiqueue)
Designed for fast multi-queue devices and is relatively simple. Has two request queues:
• Synchronous requests (e.g. blocked reads)
• Asynchronous requests (e.g. writes)
There are strict limits on the number of request operations sent to the queues. In theory this limits
the time waiting for requests to be dispatched, and hence should provide quick completion time
for requests that are high priority.
none (Multiqueue)
The multi-queue no-op I/O scheduler. Does no reordering of requests, minimal overhead. Ideal for
fast random I/O devices such as NVMe.
mq-deadline (Multiqueue)
This is an adaptation of the deadline I/O scheduler, designed for multiqueue devices. A good
all-rounder with fairly low CPU overhead.
Non-multiqueue I/O schedulers
NOTE: Non-multiqueue I/O schedulers have been deprecated in Ubuntu Eoan Ermine 19.10 onwards as
they are no longer supported in the Linux 5.3 kernel.
deadline
This fixes starvation issues seen in other schedulers. It uses 3 queues for I/O requests:
• Sorted
• Read FIFO - read requests stored chronologically
• Write FIFO - write requests stored chronologically
Requests are issued from the sorted queue unless the request at the head of the read or write FIFO
expires. Read requests are preferred over write requests. Read requests have a 500ms expiration
time, write requests have a 5s expiration time.
cfq (Completely Fair Queueing)
• Per-process sorted queues for synchronous I/O requests.
• Fewer queues for asynchronous I/O requests.
• Priorities from ionice are taken into account.
Each queue is allocated a time slice for fair queuing. There may be wasteful idle time if a time slice
quantum has not expired.
noop (No-operation)
Performs merging of I/O requests but no sorting. Good for random access devices (flash,
ramdisk, etc.) and for devices that sort I/O requests such as advanced storage controllers.
Selecting I/O Schedulers
Prior to Ubuntu 19.04 with Linux 5.0 or Ubuntu 18.04.3 with Linux 4.15, the multiqueue I/O
scheduling was not enabled by default and just the deadline, cfq and noop I/O schedulers were
available by default.
For Ubuntu 19.04 with Linux 5.0 or Ubuntu 18.04.3 with Linux 5.0 onwards, multiqueue is enabled
by default providing the bfq, kyber, mq-deadline and none I/O schedulers.
For Ubuntu 19.10 with Linux 5.3 the deadline, cfq and noop I/O schedulers are deprecated.
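To see which of these schedulers a particular system actually offers, you can read each block device's scheduler file under sysfs. The following is a minimal sketch only: the function name list_schedulers and its optional directory argument are assumptions made for illustration and are not part of any Ubuntu tool.

```shell
#!/bin/sh
# list_schedulers [SYSFS_BLOCK_DIR]
# Print "<device>: <contents of queue/scheduler>" for every block device.
# The optional argument (default /sys/block) is a hypothetical override
# that lets the function be tried against a test directory.
list_schedulers() {
    root=${1:-/sys/block}
    for f in "$root"/*/queue/scheduler; do
        # Skip devices without a readable scheduler file
        # (this also covers the case where the glob matched nothing).
        [ -r "$f" ] || continue
        dev=${f#"$root"/}     # strip the leading directory
        dev=${dev%%/*}        # keep only the device name
        printf '%s: %s\n' "$dev" "$(cat "$f")"
    done
}

list_schedulers
```

In each output line the scheduler shown in square brackets is the one currently active for that device; the other names are the alternatives the running kernel offers.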
Disk I/O Scheduler on Red Hat-based distributions such as
Oracle Linux
The following are the common starting recommendations for an I/O scheduler for a RHEL based
virtual guest based upon kernel version and disk type.
• RHEL 8,9 : mq-deadline is default I/O scheduler unless otherwise changed [FN.1]
◦ Virtual disks: keep current io scheduler setting (mq-deadline)
◦ Physical disks: keep current io scheduler setting (mq-deadline, or for NVMe none)
• RHEL 7.5+ : deadline is default io scheduler unless otherwise changed
◦ Virtual disks: keep current io scheduler setting (deadline)
✦ 6.2. I/O Scheduling with Red Hat Enterprise Linux as a Virtualization Guest
◦ Physical disks: keep current io scheduler setting (deadline)
• RHEL 4,5,6,(7.0-7.4) : cfq is default I/O scheduler unless otherwise changed
◦ Virtual disks: change to noop scheduler [FN.2]
✦ 6.3. I/O Scheduling with Red Hat Enterprise Linux as a Virtualization Guest
◦ Physical disks: keep current io scheduler setting
Red Hat Enterprise Linux and Oracle Linux: Disk I/O Scheduler
There are several choices for the type of disk scheduler used by the operating system for Red Hat
Enterprise Linux 8.
The function of the disk scheduler is to order, delay, or merge I/O requests to storage to
achieve better throughput and latency. Each disk scheduler provides a different method for
managing storage requests:
None: Implements a first-in first-out (FIFO) scheduling algorithm. The none scheduler is
recommended for systems that have high performance storage like Solid State Drives (SSD) or
Non-volatile Memory Express (NVMe) drives. The default scheduler for NVMe is none.
Mq-deadline: This scheduler groups queued I/O requests into batches. The I/O in the batches are
organized by logical block addressing so that writes are organized by location to storage. The mq-
deadline is suited for traditional HDD storage.
Bfq: This scheduler prioritizes latency rather than maximum throughput. The bfq disk scheduler is
recommended for desktop or interactive tasks and traditional HDD storage.
Kyber: This scheduler tunes itself by analyzing I/O requests. With each I/O request a calculation is
done to determine if the I/O can be satisfied with the least amount of latency. Kyber is
recommended for high performance storage like SSDs and NVMe drives.
In this best practice, we followed Red Hat’s recommendation of using the “none” disk I/O
scheduler, as the PowerStore contains NVMe drives and the database accesses the storage volumes
as virtual devices through a virtual guest OS and through host bus adapters (HBAs).
Best I/O Scheduler for Oracle Database?
For best performance for Oracle ASM, Oracle recommends
that you use the Deadline I/O scheduler when using bare-metal
servers with SAS or SATA HDDs as shared storage or local disks.
When using NVMe or enterprise SSD disks, the none scheduler
may be a good option.
Note: there are many models of SSD disks on the market with different characteristics and prices.
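As a rough illustration of the guidance above, the kernel's queue/rotational flag (1 for spinning disks, 0 for SSD/NVMe) can be mapped to a starting choice. This is a sketch under stated assumptions: suggest_scheduler is a hypothetical helper, and its result is only a starting point to validate against your own workload, not an Oracle-provided rule.

```shell
#!/bin/sh
# suggest_scheduler ROTATIONAL_FLAG
# Hypothetical helper: maps the value of /sys/block/<dev>/queue/rotational
# to the starting recommendation described in the text.
suggest_scheduler() {
    case $1 in
        1) echo "deadline" ;;   # HDD; named mq-deadline on newer kernels
        0) echo "none" ;;       # SSD/NVMe; may be a good option
        *) echo "unknown" ;;    # unexpected value, make no suggestion
    esac
}

# Example usage (sdb is a placeholder device name):
#   suggest_scheduler "$(cat /sys/block/sdb/queue/rotational)"
```

Be aware that some virtual disks and SAN LUNs report a rotational value that does not reflect the physical media, so the flag alone should not decide the setting.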
Disk I/O schedulers reorder, delay, or merge requests for disk I/O to achieve better throughput
and lower latency.
Oracle Linux has multiple disk I/O schedulers available, including Deadline, Noop, Anticipatory,
and Completely Fair Queuing (CFQ).
On each cluster node, enter the following command to verify that the Deadline disk I/O scheduler
is configured for use:
# cat /sys/block/${ASM_DISK}/queue/scheduler
noop [deadline] cfq
Note:
On newer kernels, 'deadline' changes to 'mq-deadline'.
In this example, the default disk I/O scheduler is Deadline and ASM_DISK is the Oracle Automatic
Storage Management (Oracle ASM) disk device.
On some virtual environments (VM) and special devices such as fast storage devices, the output
of the above command may be none.
The operating system or VM bypasses the kernel I/O scheduling and submits all I/O requests
directly to the device.
Do not change the I/O Scheduler settings on such environments.
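When scripting this check, the active scheduler is the name inside the square brackets. A minimal sketch of extracting it follows; active_scheduler is a hypothetical function name, and the fallback for unbracketed output (such as a bare none) is an assumption about how you may want to treat that case.

```shell
#!/bin/sh
# active_scheduler LINE
# Print the bracketed (active) scheduler from a line such as
# "noop [deadline] cfq". If the line has no brackets, print it unchanged.
active_scheduler() {
    case $1 in
        *\[*\]*) printf '%s\n' "$1" | sed 's/.*\[\([^]]*\)].*/\1/' ;;
        *)       printf '%s\n' "$1" ;;
    esac
}

active_scheduler "noop [deadline] cfq"
active_scheduler "[mq-deadline] kyber bfq none"
```

On a real system you would pass it the file contents, for example active_scheduler "$(cat /sys/block/${ASM_DISK}/queue/scheduler)".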
If the default disk I/O scheduler is not Deadline, then set it using a rules file:
Note: You can modify the value of the scheduler file manually with the echo command, but the
following rule is easier to maintain.
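For reference, the manual approach mentioned in the note amounts to writing the scheduler name into the device's sysfs file as root. The sketch below wraps that in a hypothetical set_scheduler function; the optional third argument is an assumption added only so the function can be tried against a test directory. A change made this way is lost at the next reboot, which is why a rules file is easier to maintain.

```shell
#!/bin/sh
# set_scheduler DEVICE SCHEDULER [SYSFS_BLOCK_DIR]
# Write SCHEDULER into DEVICE's queue/scheduler file (requires root on a
# real system). SYSFS_BLOCK_DIR defaults to /sys/block and is a
# hypothetical override for testing.
set_scheduler() {
    file=${3:-/sys/block}/$1/queue/scheduler
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi
    printf '%s\n' "$2" > "$file"
}

# Example usage as root (sdb is a placeholder device name):
#   set_scheduler sdb deadline
```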
1. Using a text editor, create a UDEV rules file for the Oracle ASM devices:
# vi /etc/udev/rules.d/60-oracle-schedulers.rules
2. Add the following line to the rules file and save it:
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="deadline"
3. On clustered systems, copy the rules file to all other nodes on the cluster. For example:
$ scp 60-oracle-schedulers.rules root@node2:/etc/udev/rules.d/
4. Load the rules file and restart the UDEV service. For example:
◦ Oracle Linux and Red Hat Enterprise Linux
# udevadm control --reload-rules
◦ SUSE Linux Enterprise Server
# /etc/init.d/boot.udev restart
5. Verify that the disk I/O scheduler is set as Deadline.
When using ASMLib:
Get the details of the disk:
# /etc/init.d/oracleasm querydisk -p DISK1
Disk "DISK1" is a valid ASM disk
/dev/sdb1: LABEL="DISK1" TYPE="oracleasm"
Check that the deadline scheduler is configured at the disk level:
# more /sys/block/sdb/queue/scheduler
noop [deadline] cfq <<<< Here in this case deadline is configured.
Check the other ASM disks in the same way.
This confirms that the parent disks which are part of the ASM diskgroup are using the "deadline" I/O
scheduler. It can also be confirmed by running the latest orachk utility.
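Repeating the check by hand for every ASM disk is tedious, so it can be looped. The following is a sketch only: check_deadline is a hypothetical helper, the device names in the usage line are placeholders, and it accepts either [deadline] or [mq-deadline] on the assumption that newer kernels report the latter.

```shell
#!/bin/sh
# check_deadline SYSFS_BLOCK_DIR DEVICE...
# Report, for each device, whether the active (bracketed) scheduler is
# deadline or mq-deadline. Returns non-zero if any device fails the check.
check_deadline() {
    root=$1
    shift
    rc=0
    for dev in "$@"; do
        if ! line=$(cat "$root/$dev/queue/scheduler" 2>/dev/null); then
            echo "$dev: unreadable"
            rc=1
            continue
        fi
        case $line in
            *"[deadline]"*|*"[mq-deadline]"*) echo "$dev: OK" ;;
            *) echo "$dev: not deadline ($line)"; rc=1 ;;
        esac
    done
    return $rc
}

# Example usage (sdb and sdc are placeholder device names):
#   check_deadline /sys/block sdb sdc
```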
Which scheduler is recommended on Linux, 'CFQ Scheduler' or 'DEADLINE
Scheduler'?
The CFQ scheduler is suitable for a wide variety of applications and provides a good compromise
between throughput and latency, hence it is the default.
In comparison to the CFQ algorithm, the
Deadline scheduler caps the maximum latency per request and maintains good disk throughput, which
is best for disk-intensive database applications. If your application is I/O intensive, you would
want to use the Deadline scheduler; for all other applications, the Completely Fair Queuing (CFQ)
scheduler is better.
On newer kernels, 'deadline' changes to 'mq-deadline'.
Earlier, the CFQ scheduler was exposed to an issue (5041764 / bugzilla#151368) when OCFS2 was used;
the DEADLINE scheduler did not encounter the problem.
Regards
Alireza Kamrani
Senior RDBMS Consultant