</indexterm>
</term>
<listitem>
- <para>
- Specifies the delay between activity rounds for the WAL writer.
- In each round the writer will flush WAL to disk. It then sleeps for
- <varname>wal_writer_delay</> milliseconds, and repeats. The default
- value is 200 milliseconds (<literal>200ms</>). Note that on many
- systems, the effective resolution of sleep delays is 10 milliseconds;
- setting <varname>wal_writer_delay</> to a value that is not a multiple
- of 10 might have the same results as setting it to the next higher
- multiple of 10. This parameter can only be set in the
+ <para>
+ Specifies how often the WAL writer flushes WAL. After flushing WAL it
+ sleeps for <varname>wal_writer_delay</> milliseconds, unless woken up
+ by an asynchronously committing transaction. In case the last flush
+ happened less than <varname>wal_writer_delay</> milliseconds ago and
+ less than <varname>wal_writer_flush_after</> bytes of WAL have been
+ produced since, WAL is only written to the OS, not flushed to disk.
+ The default value is 200 milliseconds (<literal>200ms</>). Note that
+ on many systems, the effective resolution of sleep delays is 10
+ milliseconds; setting <varname>wal_writer_delay</> to a value that is
+ not a multiple of 10 might have the same results as setting it to the
+ next higher multiple of 10. This parameter can only be set in the
+ <filename>postgresql.conf</> file or on the server command line.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-wal-writer-flush-after" xreflabel="wal_writer_flush_after">
+ <term><varname>wal_writer_flush_after</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>wal_writer_flush_after</> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies how often the WAL writer flushes WAL. In case the last flush
+ happened less than <varname>wal_writer_delay</> milliseconds ago and
+ less than <varname>wal_writer_flush_after</> bytes of WAL have been
+ produced since, WAL is only written to the OS, not flushed to disk.
+ If <varname>wal_writer_flush_after</> is set to <literal>0</> WAL is
+ flushed everytime the WAL writer has written WAL. The default is
+ <literal>1MB</literal>. This parameter can only be set in the
<filename>postgresql.conf</> file or on the server command line.
</para>
</listitem>
commit to minimize the window in which the filesystem change has been made
but the transaction isn't guaranteed committed.
-Every wal_writer_delay milliseconds, the walwriter process performs an
-XLogBackgroundFlush(). This checks the location of the last completely
-filled WAL page. If that has moved forwards, then we write all the changed
-buffers up to that point, so that under full load we write only whole
-buffers. If there has been a break in activity and the current WAL page is
-the same as before, then we find out the LSN of the most recent
-asynchronous commit, and flush up to that point, if required (i.e.,
-if it's in the current WAL page). This arrangement in itself would
-guarantee that an async commit record reaches disk during at worst the
-second walwriter cycle after the transaction completes. However, we also
-allow XLogFlush to flush full buffers "flexibly" (ie, not wrapping around
-at the end of the circular WAL buffer area), so as to minimize the number
-of writes issued under high load when multiple WAL pages are filled per
-walwriter cycle. This makes the worst-case delay three walwriter cycles.
+The walwriter regularly wakes up (via wal_writer_delay) or is woken up
+(via its latch, which is set by backends committing asynchronously) and
+performs an XLogBackgroundFlush(). This checks the location of the last
+completely filled WAL page. If that has moved forwards, then we write all
+the changed buffers up to that point, so that under full load we write
+only whole buffers. If there has been a break in activity and the current
+WAL page is the same as before, then we find out the LSN of the most
+recent asynchronous commit, and write up to that point, if required (i.e.
+if it's in the current WAL page). If more than wal_writer_delay has
+passed, or more than wal_writer_flush_after blocks have been written, since
+the last flush, WAL is also flushed up to the current location. This
+arrangement in itself would guarantee that an async commit record reaches
+disk after at most two times wal_writer_delay after the transaction
+completes. However, we also allow XLogFlush to write/flush full buffers
+"flexibly" (ie, not wrapping around at the end of the circular WAL buffer
+area), so as to minimize the number of writes issued under high load when
+multiple WAL pages are filled per walwriter cycle. This makes the worst-case
+delay three wal_writer_delay cycles.
There are some other subtle points to consider with asynchronous commits.
First, for each page of CLOG we must remember the LSN of the latest commit
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/bgwriter.h"
+#include "postmaster/walwriter.h"
#include "postmaster/startup.h"
#include "replication/basebackup.h"
#include "replication/logical.h"
}
/*
- * Flush xlog, but without specifying exactly where to flush to.
+ * Write & flush xlog, but without specifying exactly where to.
*
- * We normally flush only completed blocks; but if there is nothing to do on
- * that basis, we check for unflushed async commits in the current incomplete
- * block, and flush through the latest one of those. Thus, if async commits
- * are not being used, we will flush complete blocks only. We can guarantee
- * that async commits reach disk after at most three cycles; normally only
- * one or two. (When flushing complete blocks, we allow XLogWrite to write
- * "flexibly", meaning it can stop at the end of the buffer ring; this makes a
- * difference only with very high load or long wal_writer_delay, but imposes
- * one extra cycle for the worst case for async commits.)
+ * We normally write only completed blocks; but if there is nothing to do on
+ * that basis, we check for unwritten async commits in the current incomplete
+ * block, and write through the latest one of those. Thus, if async commits
+ * are not being used, we will write complete blocks only.
+ *
+ * If, based on the above, there's anything to write we do so immediately. But
+ * to avoid calling fsync, fdatasync et. al. at a rate that'd impact
+ * concurrent IO, we only flush WAL every wal_writer_delay ms, or if there's
+ * more than wal_writer_flush_after unflushed blocks.
+ *
+ * We can guarantee that async commits reach disk after at most three
+ * wal_writer_delay cycles. (When flushing complete blocks, we allow XLogWrite
+ * to write "flexibly", meaning it can stop at the end of the buffer ring;
+ * this makes a difference only with very high load or long wal_writer_delay,
+ * but imposes one extra cycle for the worst case for async commits.)
*
* This routine is invoked periodically by the background walwriter process.
*
- * Returns TRUE if we flushed anything.
+ * Returns TRUE if there was any work to do, even if we skipped flushing due
+ * to wal_writer_delay/wal_flush_after.
*/
bool
XLogBackgroundFlush(void)
{
- XLogRecPtr WriteRqstPtr;
+ XLogwrtRqst WriteRqst;
bool flexible = true;
- bool wrote_something = false;
+ static TimestampTz lastflush;
+ TimestampTz now;
+ int flushbytes;
/* XLOG doesn't need flushing during recovery */
if (RecoveryInProgress())
/* read LogwrtResult and update local state */
SpinLockAcquire(&XLogCtl->info_lck);
LogwrtResult = XLogCtl->LogwrtResult;
- WriteRqstPtr = XLogCtl->LogwrtRqst.Write;
+ WriteRqst = XLogCtl->LogwrtRqst;
SpinLockRelease(&XLogCtl->info_lck);
/* back off to last completed page boundary */
- WriteRqstPtr -= WriteRqstPtr % XLOG_BLCKSZ;
+ WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
/* if we have already flushed that far, consider async commit records */
- if (WriteRqstPtr <= LogwrtResult.Flush)
+ if (WriteRqst.Write <= LogwrtResult.Flush)
{
SpinLockAcquire(&XLogCtl->info_lck);
- WriteRqstPtr = XLogCtl->asyncXactLSN;
+ WriteRqst.Write = XLogCtl->asyncXactLSN;
SpinLockRelease(&XLogCtl->info_lck);
flexible = false; /* ensure it all gets written */
}
* holding an open file handle to a logfile that's no longer in use,
* preventing the file from being deleted.
*/
- if (WriteRqstPtr <= LogwrtResult.Flush)
+ if (WriteRqst.Write <= LogwrtResult.Flush)
{
if (openLogFile >= 0)
{
return false;
}
+ /*
+ * Determine how far to flush WAL, based on the wal_writer_delay and
+ * wal_writer_flush_after GUCs.
+ */
+ now = GetCurrentTimestamp();
+ flushbytes =
+ WriteRqst.Write / XLOG_BLCKSZ - LogwrtResult.Flush / XLOG_BLCKSZ;
+
+ if (WalWriterFlushAfter == 0 || lastflush == 0)
+ {
+ /* first call, or block based limits disabled */
+ WriteRqst.Flush = WriteRqst.Write;
+ lastflush = now;
+ }
+ else if (TimestampDifferenceExceeds(lastflush, now, WalWriterDelay))
+ {
+ /*
+ * Flush the writes at least every WalWriteDelay ms. This is important
+ * to bound the amount of time it takes for an asynchronous commit to
+ * hit disk.
+ */
+ WriteRqst.Flush = WriteRqst.Write;
+ lastflush = now;
+ }
+ else if (flushbytes >= WalWriterFlushAfter)
+ {
+ /* exceeded wal_writer_flush_after blocks, flush */
+ WriteRqst.Flush = WriteRqst.Write;
+ lastflush = now;
+ }
+ else
+ {
+ /* no flushing, this time round */
+ WriteRqst.Flush = 0;
+ }
+
#ifdef WAL_DEBUG
if (XLOG_DEBUG)
- elog(LOG, "xlog bg flush request %X/%X; write %X/%X; flush %X/%X",
- (uint32) (WriteRqstPtr >> 32), (uint32) WriteRqstPtr,
+ elog(LOG, "xlog bg flush request write %X/%X; flush: %X/%X, current is write %X/%X; flush %X/%X",
+ (uint32) (WriteRqst.Write >> 32), (uint32) WriteRqst.Write,
+ (uint32) (WriteRqst.Flush >> 32), (uint32) WriteRqst.Flush,
(uint32) (LogwrtResult.Write >> 32), (uint32) LogwrtResult.Write,
(uint32) (LogwrtResult.Flush >> 32), (uint32) LogwrtResult.Flush);
#endif
START_CRIT_SECTION();
/* now wait for any in-progress insertions to finish and get write lock */
- WaitXLogInsertionsToFinish(WriteRqstPtr);
+ WaitXLogInsertionsToFinish(WriteRqst.Write);
LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
LogwrtResult = XLogCtl->LogwrtResult;
- if (WriteRqstPtr > LogwrtResult.Flush)
+ if (WriteRqst.Write > LogwrtResult.Write ||
+ WriteRqst.Flush > LogwrtResult.Flush)
{
- XLogwrtRqst WriteRqst;
-
- WriteRqst.Write = WriteRqstPtr;
- WriteRqst.Flush = WriteRqstPtr;
XLogWrite(WriteRqst, flexible);
- wrote_something = true;
}
LWLockRelease(WALWriteLock);
*/
AdvanceXLInsertBuffer(InvalidXLogRecPtr, true);
- return wrote_something;
+ /*
+ * If we determined that we need to write data, but somebody else
+ * wrote/flushed already, it should be considered as being active, to
+ * avoid hibernating too early.
+ */
+ return true;
}
/*