Commit ce6a71f

Use vectored I/O to fill new WAL segments.

Instead of making many block-sized write() calls to fill a new WAL file
with zeroes, make a smaller number of pwritev() calls (or various
emulations). The actual number depends on the OS's IOV_MAX, which
PG_IOV_MAX currently caps at 32. That means we'll write 256kB per call
on typical systems. We may want to tune the number later with more
experience.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJA%2Bu-220VONeoREBXJ9P3S94Y7J%2BkqCnTYmahvZJwM%3Dg%40mail.gmail.com

1 parent 13a021f commit ce6a71f

1 file changed: +22 -6 lines changed

src/backend/access/transam/xlog.c

Lines changed: 22 additions & 6 deletions
@@ -48,6 +48,7 @@
 #include "pg_trace.h"
 #include "pgstat.h"
 #include "port/atomics.h"
+#include "port/pg_iovec.h"
 #include "postmaster/bgwriter.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
@@ -3270,7 +3271,6 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 	XLogSegNo installed_segno;
 	XLogSegNo max_segno;
 	int fd;
-	int nbytes;
 	int save_errno;
 
 	XLogFilePath(path, ThisTimeLineID, logsegno, wal_segment_size);
@@ -3317,6 +3317,9 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 	save_errno = 0;
 	if (wal_init_zero)
 	{
+		struct iovec iov[PG_IOV_MAX];
+		int blocks;
+
 		/*
 		 * Zero-fill the file. With this setting, we do this the hard way to
 		 * ensure that all the file space has really been allocated. On
@@ -3326,15 +3329,28 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 		 * indirect blocks are down on disk. Therefore, fdatasync(2) or
 		 * O_DSYNC will be sufficient to sync future writes to the log file.
 		 */
-		for (nbytes = 0; nbytes < wal_segment_size; nbytes += XLOG_BLCKSZ)
+
+		/* Prepare to write out a lot of copies of our zero buffer at once. */
+		for (int i = 0; i < lengthof(iov); ++i)
 		{
-			errno = 0;
-			if (write(fd, zbuffer.data, XLOG_BLCKSZ) != XLOG_BLCKSZ)
+			iov[i].iov_base = zbuffer.data;
+			iov[i].iov_len = XLOG_BLCKSZ;
+		}
+
+		/* Loop, writing as many blocks as we can for each system call. */
+		blocks = wal_segment_size / XLOG_BLCKSZ;
+		for (int i = 0; i < blocks;)
+		{
+			int iovcnt = Min(blocks - i, lengthof(iov));
+			off_t offset = i * XLOG_BLCKSZ;
+
+			if (pg_pwritev_with_retry(fd, iov, iovcnt, offset) < 0)
 			{
-				/* if write didn't set errno, assume no disk space */
-				save_errno = errno ? errno : ENOSPC;
+				save_errno = errno;
 				break;
 			}
+
+			i += iovcnt;
 		}
 	}
 	else
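
For readers who want to try the technique outside the PostgreSQL tree, the zero-fill loop in the last hunk can be approximated by the standalone sketch below. It is not the committed code: BLOCK_SIZE and MAX_IOV are assumed stand-ins for XLOG_BLCKSZ and PG_IOV_MAX (8kB and 32, which gives the 256kB per call mentioned in the commit message), and pwritev_all(), zero_fill() and create_zeroed_file() are hypothetical helpers. pwritev_all() only illustrates the kind of short-write handling a retrying wrapper needs and makes no claim to match pg_pwritev_with_retry(). The sketch also assumes a platform with a native pwritev(), such as Linux or the BSDs.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE			/* asks glibc to declare pwritev() */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define BLOCK_SIZE 8192			/* assumption: stand-in for XLOG_BLCKSZ */
#define MAX_IOV 32				/* assumption: stand-in for PG_IOV_MAX */

static char zero_block[BLOCK_SIZE];	/* static storage, so all zeroes */

/*
 * Hypothetical helper: write everything described by iov[0..iovcnt) at
 * offset, continuing after short writes.  Returns 0, or -1 with errno set.
 */
static int
pwritev_all(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
	struct iovec pending[MAX_IOV];	/* private copy we may adjust */
	struct iovec *cur = pending;

	assert(iovcnt >= 0 && iovcnt <= MAX_IOV);
	memcpy(pending, iov, sizeof(struct iovec) * (size_t) iovcnt);

	while (iovcnt > 0)
	{
		ssize_t written = pwritev(fd, cur, iovcnt, offset);

		if (written < 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted before writing: try again */
			return -1;
		}
		if (written == 0)
		{
			errno = ENOSPC;		/* no progress: assume no disk space */
			return -1;
		}
		offset += written;

		/* Drop the entries that were written completely. */
		while (iovcnt > 0 && (size_t) written >= cur->iov_len)
		{
			written -= (ssize_t) cur->iov_len;
			cur++;
			iovcnt--;
		}
		/* Trim the entry that was written only partially, if any. */
		if (iovcnt > 0)
		{
			cur->iov_base = (char *) cur->iov_base + written;
			cur->iov_len -= (size_t) written;
		}
	}
	return 0;
}

/* Fill the first file_size bytes of fd with zeroes, up to MAX_IOV blocks per call. */
int
zero_fill(int fd, size_t file_size)
{
	struct iovec iov[MAX_IOV];
	size_t blocks = file_size / BLOCK_SIZE;

	assert(file_size % BLOCK_SIZE == 0);

	/* Every entry points at the same zero buffer. */
	for (int i = 0; i < MAX_IOV; i++)
	{
		iov[i].iov_base = zero_block;
		iov[i].iov_len = BLOCK_SIZE;
	}

	for (size_t i = 0; i < blocks;)
	{
		size_t remaining = blocks - i;
		int iovcnt = remaining < MAX_IOV ? (int) remaining : MAX_IOV;

		if (pwritev_all(fd, iov, iovcnt, (off_t) i * BLOCK_SIZE) < 0)
			return -1;
		i += (size_t) iovcnt;
	}
	return 0;
}

/* Hypothetical wrapper: create path, zero-fill it and flush it to disk. */
int
create_zeroed_file(const char *path, size_t file_size)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	int rc;

	if (fd < 0)
		return -1;
	rc = zero_fill(fd, file_size);
	if (rc == 0 && fsync(fd) < 0)
		rc = -1;
	if (close(fd) < 0)
		rc = -1;
	return rc;
}
```

With these assumed sizes, a 16MB file takes 2048 / 32 = 64 pwritev() calls instead of 2048 block-sized write() calls. Copying the iovec array before adjusting it lets zero_fill() reuse the same prepared array for every call.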