Commit f11e8be

Make commit_delay much smarter.

Instead of letting every backend participating in a group commit wait independently, have the first one that becomes ready to flush WAL wait for the configured delay, and let all the others wait just long enough for that first process to complete its flush. This greatly increases the chances of being able to configure a commit_delay setting that actually improves performance.

As a side consequence of this change, commit_delay now affects all WAL flushes, rather than just commits. There was some discussion on pgsql-hackers about whether to rename the GUC to, say, wal_flush_delay, but in the absence of consensus I am leaving it alone for now.

Peter Geoghegan, with some changes, mostly to the documentation, by me.

1 parent f83b599 commit f11e8be

4 files changed, with 58 additions and 59 deletions.

doc/src/sgml/config.sgml

Lines changed: 19 additions & 16 deletions
@@ -1866,23 +1866,26 @@ SET ENABLE_SEQSCAN TO OFF;
       </indexterm>
       <listitem>
        <para>
-        When the commit data for a transaction is flushed to disk, any
-        additional commits ready at that time are also flushed out.
         <varname>commit_delay</varname> adds a time delay, set in
-        microseconds, before a transaction attempts to
-        flush the WAL buffer out to disk. A nonzero delay can allow more
-        transactions to be committed with only one flush operation, if
-        system load is high enough that additional transactions become
-        ready to commit within the given interval. But the delay is
-        just wasted if no other transactions become ready to
-        commit. Therefore, the delay is only performed if at least
-        <varname>commit_siblings</varname> other transactions are
-        active at the instant that a server process has written its
-        commit record.
-        The default <varname>commit_delay</> is zero (no delay).
-        Since all pending commit data will be written at every flush
-        regardless of this setting, it is rare that adding delay
-        by increasing this parameter will actually improve performance.
+        microseconds, before a WAL flush is initiated. This can improve
+        group commit throughput by allowing a larger number of transactions
+        to commit via a single WAL flush, if system load is high enough
+        that additional transactions become ready to commit within the
+        given interval. However, it also increases latency by up to
+        <varname>commit_delay</varname> microseconds for each WAL
+        flush. Because the delay is just wasted if no other transactions
+        become ready to commit, it is only performed if at least
+        <varname>commit_siblings</varname> other transactions are active
+        immediately before a flush would otherwise have been initiated.
+        In <productname>PostgreSQL</> releases prior to 9.3,
+        <varname>commit_delay</varname> behaved differently and was much
+        less effective: it affected only commits, rather than all WAL flushes,
+        and waited for the entire configured delay even if the WAL flush
+        was completed sooner. Beginning in <productname>PostgreSQL</> 9.3,
+        the first process that becomes ready to flush waits for the configured
+        interval, while subsequent processes wait only until the leader
+        completes the flush. The default <varname>commit_delay</> is zero
+        (no delay).
        </para>
       </listitem>
      </varlistentry>

doc/src/sgml/wal.sgml

Lines changed: 1 addition & 3 deletions
@@ -376,9 +376,7 @@
    <acronym>WAL</acronym> to disk, in the hope that a single flush
    executed by one such transaction can also serve other transactions
    committing at about the same time. Setting <varname>commit_delay</varname>
-   can only help when there are many concurrently committing transactions,
-   and it is difficult to tune it to a value that actually helps rather
-   than hurt throughput.
+   can only help when there are many concurrently committing transactions.
   </para>
 
  </sect1>

src/backend/access/transam/xact.c

Lines changed: 0 additions & 19 deletions
@@ -68,9 +68,6 @@ bool XactDeferrable;
 
 int synchronous_commit = SYNCHRONOUS_COMMIT_ON;
 
-int CommitDelay = 0; /* precommit delay in microseconds */
-int CommitSiblings = 5; /* # concurrent xacts needed to sleep */
-
 /*
  * MyXactAccessedTempRel is set when a temporary relation is accessed.
  * We don't allow PREPARE TRANSACTION in that case. (This is global
@@ -1123,22 +1120,6 @@ RecordTransactionCommit(void)
     if ((wrote_xlog && synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
         forceSyncCommit || nrels > 0)
     {
-        /*
-         * Synchronous commit case:
-         *
-         * Sleep before flush! So we can flush more than one commit records
-         * per single fsync. (The idea is some other backend may do the
-         * XLogFlush while we're sleeping. This needs work still, because on
-         * most Unixen, the minimum select() delay is 10msec or more, which is
-         * way too long.)
-         *
-         * We do not sleep if enableFsync is not turned on, nor if there are
-         * fewer than CommitSiblings other backends with active transactions.
-         */
-        if (CommitDelay > 0 && enableFsync &&
-            MinimumActiveBackends(CommitSiblings))
-            pg_usleep(CommitDelay);
-
         XLogFlush(XactLastRecEnd);
 
         /*

src/backend/access/transam/xlog.c

Lines changed: 38 additions & 21 deletions
@@ -80,6 +80,8 @@ bool fullPageWrites = true;
 bool log_checkpoints = false;
 int sync_method = DEFAULT_SYNC_METHOD;
 int wal_level = WAL_LEVEL_MINIMAL;
+int CommitDelay = 0; /* precommit delay in microseconds */
+int CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 
 #ifdef WAL_DEBUG
 bool XLOG_DEBUG = false;
@@ -2098,34 +2100,49 @@ XLogFlush(XLogRecPtr record)
              */
             continue;
         }
-        /* Got the lock */
+
+        /* Got the lock; recheck whether request is satisfied */
         LogwrtResult = XLogCtl->LogwrtResult;
-        if (!XLByteLE(record, LogwrtResult.Flush))
+        if (XLByteLE(record, LogwrtResult.Flush))
+            break;
+
+        /*
+         * Sleep before flush! By adding a delay here, we may give further
+         * backends the opportunity to join the backlog of group commit
+         * followers; this can significantly improve transaction throughput, at
+         * the risk of increasing transaction latency.
+         *
+         * We do not sleep if enableFsync is not turned on, nor if there are
+         * fewer than CommitSiblings other backends with active transactions.
+         */
+        if (CommitDelay > 0 && enableFsync &&
+            MinimumActiveBackends(CommitSiblings))
+            pg_usleep(CommitDelay);
+
+        /* try to write/flush later additions to XLOG as well */
+        if (LWLockConditionalAcquire(WALInsertLock, LW_EXCLUSIVE))
         {
-            /* try to write/flush later additions to XLOG as well */
-            if (LWLockConditionalAcquire(WALInsertLock, LW_EXCLUSIVE))
-            {
-                XLogCtlInsert *Insert = &XLogCtl->Insert;
-                uint32 freespace = INSERT_FREESPACE(Insert);
+            XLogCtlInsert *Insert = &XLogCtl->Insert;
+            uint32 freespace = INSERT_FREESPACE(Insert);
 
-                if (freespace == 0) /* buffer is full */
-                    WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
-                else
-                {
-                    WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
-                    WriteRqstPtr -= freespace;
-                }
-                LWLockRelease(WALInsertLock);
-                WriteRqst.Write = WriteRqstPtr;
-                WriteRqst.Flush = WriteRqstPtr;
-            }
+            if (freespace == 0) /* buffer is full */
+                WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
             else
             {
-                WriteRqst.Write = WriteRqstPtr;
-                WriteRqst.Flush = record;
+                WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
+                WriteRqstPtr -= freespace;
             }
-            XLogWrite(WriteRqst, false, false);
+            LWLockRelease(WALInsertLock);
+            WriteRqst.Write = WriteRqstPtr;
+            WriteRqst.Flush = WriteRqstPtr;
         }
+        else
+        {
+            WriteRqst.Write = WriteRqstPtr;
+            WriteRqst.Flush = record;
+        }
+        XLogWrite(WriteRqst, false, false);
+
         LWLockRelease(WALWriteLock);
         /* done */
         break;
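
To illustrate why sleeping in the flush leader can reduce the number of flushes, here is a minimal toy model in C. It is not PostgreSQL code: the function `count_flushes` and its parameters are hypothetical names chosen for this sketch. It assumes that the leader sleeps for the full delay after it obtains the write lock, that the flush then covers every record inserted by the time the flush begins, and that the `commit_siblings` and `enableFsync` checks always pass.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not PostgreSQL code): count how many WAL flushes
 * are needed to satisfy n backends whose commit records are inserted at
 * arrival_us[0..n-1] (sorted ascending, non-negative, in microseconds).
 *
 * Assumptions of this sketch: the first unsatisfied backend becomes the
 * leader as soon as the previous flush has finished, sleeps delay_us, and
 * then flushes everything inserted up to that moment. Each flush takes
 * flush_cost_us. Followers never sleep on their own.
 */
int
count_flushes(const long *arrival_us, size_t n, long delay_us,
              long flush_cost_us)
{
    int     flushes = 0;
    long    lock_free_at = 0;   /* time at which the previous flush ended */
    size_t  i = 0;              /* first backend not yet covered by a flush */

    assert(delay_us >= 0 && flush_cost_us >= 0);

    while (i < n)
    {
        /* The leader starts once it has arrived and the lock is free. */
        long    leader_start = arrival_us[i] > lock_free_at
            ? arrival_us[i] : lock_free_at;
        long    flush_begin = leader_start + delay_us;

        /* Every record inserted by flush_begin rides along with this flush. */
        while (i < n && arrival_us[i] <= flush_begin)
            i++;

        lock_free_at = flush_begin + flush_cost_us;
        flushes++;
    }

    return flushes;
}
```

In this model, four backends arriving at 0, 100, 200 and 300 microseconds with a 50-microsecond flush need four flushes when the delay is 0 and two flushes when the delay is 250. The price is the extra latency that the leader's sleep adds to each flush.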
