
Commit 443b482

Add new replication mode synchronous_commit = 'write'.
Replication occurs only to memory on the standby, not to disk, so it provides additional performance if the user wishes to reduce the durability level slightly. Adds the concept of multiple independent sync rep queues.
Fujii Masao and Simon Riggs
1 parent 89dda5f commit 443b482


8 files changed: +124 additions, -52 deletions
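To see why waiting for the standby's write position can shorten commit latency, here is a minimal C sketch; it is not PostgreSQL code. It assumes an LSN can be modeled as a plain 64-bit integer, and the names `StandbyReply`, `first_releasing_reply`, `WAIT_WRITE` and `WAIT_FLUSH` are hypothetical. Because a standby's flush position never runs ahead of its write position, a commit waiting in write mode is released by the same standby reply as in flush mode, or by an earlier one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplification: an LSN modeled as a 64-bit byte position. */
typedef uint64_t Lsn;

/* One status reply from the standby (hypothetical type). */
typedef struct
{
    Lsn write;   /* WAL the standby reports as written (received into memory) */
    Lsn flush;   /* WAL flushed to durable storage; never ahead of write */
} StandbyReply;

enum
{
    WAIT_WRITE = 0,   /* analogous to SYNC_REP_WAIT_WRITE */
    WAIT_FLUSH = 1    /* analogous to SYNC_REP_WAIT_FLUSH */
};

/*
 * Return the index of the first reply that would release a commit whose
 * commit record ends at commit_lsn under the given wait mode, or -1 if no
 * reply in the sequence acknowledges that far.
 */
static int
first_releasing_reply(const StandbyReply *replies, int n, Lsn commit_lsn,
                      int mode)
{
    for (int i = 0; i < n; i++)
    {
        Lsn acked = (mode == WAIT_WRITE) ? replies[i].write : replies[i].flush;

        assert(replies[i].flush <= replies[i].write);
        /* Released once commit_lsn <= acknowledged position (cf. XLByteLE). */
        if (commit_lsn <= acked)
            return i;
    }
    return -1;
}
```

For example, with (write, flush) replies of (100, 0), (200, 100) and (200, 200), a commit ending at LSN 150 is released by the second reply in write mode but only by the third in flush mode.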


doc/src/sgml/config.sgml

+13-5
@@ -1560,7 +1560,7 @@ SET ENABLE_SEQSCAN TO OFF;
 <para>
 Specifies whether transaction commit will wait for WAL records
 to be written to disk before the command returns a <quote>success</>
-indication to the client. Valid values are <literal>on</>,
+indication to the client. Valid values are <literal>on</>, <literal>write</>,
 <literal>local</>, and <literal>off</>. The default, and safe, value
 is <literal>on</>. When <literal>off</>, there can be a delay between
 when success is reported to the client and when the transaction is
@@ -1580,11 +1580,19 @@ SET ENABLE_SEQSCAN TO OFF;
 If <xref linkend="guc-synchronous-standby-names"> is set, this
 parameter also controls whether or not transaction commit will wait
 for the transaction's WAL records to be flushed to disk and replicated
-to the standby server. The commit wait will last until a reply from
-the current synchronous standby indicates it has written the commit
-record of the transaction to durable storage. If synchronous
+to the standby server. When <literal>write</>, the commit wait will
+last until a reply from the current synchronous standby indicates
+it has received the commit record of the transaction to memory.
+Normally this causes no data loss at the time of failover. However,
+if both primary and standby crash, and the database cluster of
+the primary gets corrupted, recent committed transactions might
+be lost. When <literal>on</>, the commit wait will last until a reply
+from the current synchronous standby indicates it has flushed
+the commit record of the transaction to durable storage. This
+avoids any data loss unless the database cluster of both primary and
+standby gets corrupted simultaneously. If synchronous
 replication is in use, it will normally be sensible either to wait
-both for WAL records to reach both the local and remote disks, or
+for both local flush and replication of WAL records, or
 to allow the transaction to commit asynchronously. However, the
 special value <literal>local</> is available for transactions that
 wish to wait for local flush to disk, but not synchronous replication.
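The documentation above distinguishes four levels. The sketch below is a hypothetical summary, not anything in PostgreSQL, that encodes what a commit waits for at each level, following the doc text and the comments on `SyncCommitLevel` in xact.h. It assumes `on` corresponds to `SYNCHRONOUS_COMMIT_REMOTE_FLUSH` (consistent with the documented behavior), and it ignores `max_wal_senders` and whether a synchronous standby is actually connected. `CommitWait`, `StandbyAck` and `commit_wait_for` are invented names.

```c
#include <assert.h>
#include <stdbool.h>

/* Same ordering as SyncCommitLevel in xact.h after this commit. */
typedef enum
{
    SYNCHRONOUS_COMMIT_OFF,          /* asynchronous commit */
    SYNCHRONOUS_COMMIT_LOCAL_FLUSH,  /* wait for local flush only */
    SYNCHRONOUS_COMMIT_REMOTE_WRITE, /* wait for local flush and remote write */
    SYNCHRONOUS_COMMIT_REMOTE_FLUSH  /* wait for local and remote flush */
} SyncCommitLevel;

/* Hypothetical: which standby acknowledgement a commit waits for. */
typedef enum
{
    STANDBY_ACK_NONE,
    STANDBY_ACK_WRITE,   /* standby has received the commit record into memory */
    STANDBY_ACK_FLUSH    /* standby has flushed it to durable storage */
} StandbyAck;

typedef struct
{
    bool       waits_local_flush;
    StandbyAck standby_ack;
} CommitWait;

/* Summarize the documented behavior of one synchronous_commit level. */
static CommitWait
commit_wait_for(SyncCommitLevel level, bool sync_standby_names_set)
{
    CommitWait w;

    assert(level <= SYNCHRONOUS_COMMIT_REMOTE_FLUSH);

    /* Every level except off waits for the local WAL flush. */
    w.waits_local_flush = (level != SYNCHRONOUS_COMMIT_OFF);
    w.standby_ack = STANDBY_ACK_NONE;

    /* Standby acknowledgements only apply when standby names are configured. */
    if (sync_standby_names_set)
    {
        if (level == SYNCHRONOUS_COMMIT_REMOTE_WRITE)
            w.standby_ack = STANDBY_ACK_WRITE;
        else if (level == SYNCHRONOUS_COMMIT_REMOTE_FLUSH)
            w.standby_ack = STANDBY_ACK_FLUSH;
    }
    return w;
}
```

In this model `write` differs from `on` only in the standby acknowledgement it waits for; both still wait for the local flush.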

doc/src/sgml/high-availability.sgml

+13-3
@@ -1010,6 +1010,16 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
 standby servers using cascaded replication.
 </para>
 
+<para>
+Setting <varname>synchronous_commit</> to <literal>write</> will
+cause each commit to wait for confirmation that the standby has received
+the commit record to memory. This provides a lower level of durability
+than <literal>on</> does. However, it's a practically useful setting
+because it can decrease the response time for the transaction, and causes
+no data loss unless both the primary and the standby crashes and
+the database of the primary gets corrupted at the same time.
+</para>
+
 <para>
 Users will stop waiting if a fast shutdown is requested. However, as
 when using asynchronous replication, the server will does not fully
@@ -1065,13 +1075,13 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
 
 <para>
 Commits made when <varname>synchronous_commit</> is set to <literal>on</>
-will wait until the sync standby responds. The response may never occur
-if the last, or only, standby should crash.
+or <literal>write</> will wait until the synchronous standby responds. The response
+may never occur if the last, or only, standby should crash.
 </para>
 
 <para>
 The best solution for avoiding data loss is to ensure you don't lose
-your last remaining sync standby. This can be achieved by naming multiple
+your last remaining synchronous standby. This can be achieved by naming multiple
 potential synchronous standbys using <varname>synchronous_standby_names</>.
 The first named standby will be used as the synchronous standby. Standbys
 listed after this will take over the role of synchronous standby if the
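The selection rule in the documentation (the first named standby is the synchronous one, and standbys listed after it can take over the role) can be sketched as a lowest-priority-wins scan over the connected standbys. This is a hypothetical illustration: `standby_priority` and `choose_sync_standby` are invented names, standbys are identified by plain strings compared with case-sensitive `strcmp`, and any wildcard or list-parsing rules of the real `synchronous_standby_names` handling are left out.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical: 1-based position of app_name in the configured list of
 * standby names, or 0 if it is not listed (an asynchronous standby).
 */
static int
standby_priority(const char *const *names, int nnames, const char *app_name)
{
    for (int i = 0; i < nnames; i++)
    {
        if (strcmp(names[i], app_name) == 0)
            return i + 1;
    }
    return 0;
}

/*
 * Hypothetical: among the connected standbys, pick the one with the lowest
 * non-zero priority, i.e. the earliest-listed standby that is connected.
 * Returns an index into connected[], or -1 if none is eligible.
 */
static int
choose_sync_standby(const char *const *names, int nnames,
                    const char *const *connected, int nconnected)
{
    int best = -1;
    int best_priority = 0;

    for (int i = 0; i < nconnected; i++)
    {
        int priority = standby_priority(names, nnames, connected[i]);

        if (priority > 0 && (best < 0 || priority < best_priority))
        {
            best = i;
            best_priority = priority;
        }
    }
    return best;
}
```

With names `s1`, `s2`, `s3` configured and only `s3` and `s2` connected, this sketch chooses `s2`; a standby that is not listed at all is never chosen.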

src/backend/replication/syncrep.c

+75-37
@@ -20,8 +20,8 @@
  * per-transaction state information.
  *
  * Replication is either synchronous or not synchronous (async). If it is
- * async, we just fastpath out of here. If it is sync, then in 9.1 we wait
- * for the flush location on the standby before releasing the waiting backend.
+ * async, we just fastpath out of here. If it is sync, then we wait for
+ * the write or flush location on the standby before releasing the waiting backend.
  * Further complexity in that interaction is expected in later releases.
  *
  * The best performing way to manage the waiting backends is to have a
@@ -67,13 +67,15 @@ char *SyncRepStandbyNames;
 
 static bool announce_next_takeover = true;
 
-static void SyncRepQueueInsert(void);
+static int SyncRepWaitMode = SYNC_REP_NO_WAIT;
+
+static void SyncRepQueueInsert(int mode);
 static void SyncRepCancelWait(void);
 
 static int SyncRepGetStandbyPriority(void);
 
 #ifdef USE_ASSERT_CHECKING
-static bool SyncRepQueueIsOrderedByLSN(void);
+static bool SyncRepQueueIsOrderedByLSN(int mode);
 #endif
 
 /*
@@ -120,7 +122,7 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 	 * be a low cost check.
 	 */
 	if (!WalSndCtl->sync_standbys_defined ||
-		XLByteLE(XactCommitLSN, WalSndCtl->lsn))
+		XLByteLE(XactCommitLSN, WalSndCtl->lsn[SyncRepWaitMode]))
 	{
 		LWLockRelease(SyncRepLock);
 		return;
@@ -132,8 +134,8 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 	 */
 	MyProc->waitLSN = XactCommitLSN;
 	MyProc->syncRepState = SYNC_REP_WAITING;
-	SyncRepQueueInsert();
-	Assert(SyncRepQueueIsOrderedByLSN());
+	SyncRepQueueInsert(SyncRepWaitMode);
+	Assert(SyncRepQueueIsOrderedByLSN(SyncRepWaitMode));
 	LWLockRelease(SyncRepLock);
 
 	/* Alter ps display to show waiting for sync rep. */
@@ -267,18 +269,19 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 }
 
 /*
- * Insert MyProc into SyncRepQueue, maintaining sorted invariant.
+ * Insert MyProc into the specified SyncRepQueue, maintaining sorted invariant.
  *
  * Usually we will go at tail of queue, though it's possible that we arrive
  * here out of order, so start at tail and work back to insertion point.
  */
 static void
-SyncRepQueueInsert(void)
+SyncRepQueueInsert(int mode)
 {
 	PGPROC *proc;
 
-	proc = (PGPROC *) SHMQueuePrev(&(WalSndCtl->SyncRepQueue),
-								   &(WalSndCtl->SyncRepQueue),
+	Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
+	proc = (PGPROC *) SHMQueuePrev(&(WalSndCtl->SyncRepQueue[mode]),
+								   &(WalSndCtl->SyncRepQueue[mode]),
 								   offsetof(PGPROC, syncRepLinks));
 
 	while (proc)
@@ -290,15 +293,15 @@ SyncRepQueueInsert(void)
 		if (XLByteLT(proc->waitLSN, MyProc->waitLSN))
 			break;
 
-		proc = (PGPROC *) SHMQueuePrev(&(WalSndCtl->SyncRepQueue),
+		proc = (PGPROC *) SHMQueuePrev(&(WalSndCtl->SyncRepQueue[mode]),
 									   &(proc->syncRepLinks),
 									   offsetof(PGPROC, syncRepLinks));
 	}
 
 	if (proc)
 		SHMQueueInsertAfter(&(proc->syncRepLinks), &(MyProc->syncRepLinks));
 	else
-		SHMQueueInsertAfter(&(WalSndCtl->SyncRepQueue), &(MyProc->syncRepLinks));
+		SHMQueueInsertAfter(&(WalSndCtl->SyncRepQueue[mode]), &(MyProc->syncRepLinks));
 }
 
 /*
@@ -368,7 +371,8 @@ SyncRepReleaseWaiters(void)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	volatile WalSnd *syncWalSnd = NULL;
-	int numprocs = 0;
+	int numwrite = 0;
+	int numflush = 0;
 	int priority = 0;
 	int i;
 
@@ -419,20 +423,28 @@ SyncRepReleaseWaiters(void)
 		return;
 	}
 
-	if (XLByteLT(walsndctl->lsn, MyWalSnd->flush))
+	/*
+	 * Set the lsn first so that when we wake backends they will release
+	 * up to this location.
+	 */
+	if (XLByteLT(walsndctl->lsn[SYNC_REP_WAIT_WRITE], MyWalSnd->write))
 	{
-		/*
-		 * Set the lsn first so that when we wake backends they will release
-		 * up to this location.
-		 */
-		walsndctl->lsn = MyWalSnd->flush;
-		numprocs = SyncRepWakeQueue(false);
+		walsndctl->lsn[SYNC_REP_WAIT_WRITE] = MyWalSnd->write;
+		numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE);
+	}
+	if (XLByteLT(walsndctl->lsn[SYNC_REP_WAIT_FLUSH], MyWalSnd->flush))
+	{
+		walsndctl->lsn[SYNC_REP_WAIT_FLUSH] = MyWalSnd->flush;
+		numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH);
 	}
 
 	LWLockRelease(SyncRepLock);
 
-	elog(DEBUG3, "released %d procs up to %X/%X",
-		 numprocs,
+	elog(DEBUG3, "released %d procs up to write %X/%X, %d procs up to flush %X/%X",
+		 numwrite,
+		 MyWalSnd->write.xlogid,
+		 MyWalSnd->write.xrecoff,
+		 numflush,
 		 MyWalSnd->flush.xlogid,
 		 MyWalSnd->flush.xrecoff);
 
@@ -507,40 +519,42 @@ SyncRepGetStandbyPriority(void)
 }
 
 /*
- * Walk queue from head. Set the state of any backends that need to be woken,
- * remove them from the queue, and then wake them. Pass all = true to wake
- * whole queue; otherwise, just wake up to the walsender's LSN.
+ * Walk the specified queue from head. Set the state of any backends that
+ * need to be woken, remove them from the queue, and then wake them.
+ * Pass all = true to wake whole queue; otherwise, just wake up to
+ * the walsender's LSN.
  *
  * Must hold SyncRepLock.
  */
 int
-SyncRepWakeQueue(bool all)
+SyncRepWakeQueue(bool all, int mode)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	PGPROC *proc = NULL;
 	PGPROC *thisproc = NULL;
 	int numprocs = 0;
 
-	Assert(SyncRepQueueIsOrderedByLSN());
+	Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
+	Assert(SyncRepQueueIsOrderedByLSN(mode));
 
-	proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue),
-								   &(WalSndCtl->SyncRepQueue),
+	proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
+								   &(WalSndCtl->SyncRepQueue[mode]),
 								   offsetof(PGPROC, syncRepLinks));
 
 	while (proc)
 	{
 		/*
 		 * Assume the queue is ordered by LSN
 		 */
-		if (!all && XLByteLT(walsndctl->lsn, proc->waitLSN))
+		if (!all && XLByteLT(walsndctl->lsn[mode], proc->waitLSN))
 			return numprocs;
 
 		/*
 		 * Move to next proc, so we can delete thisproc from the queue.
 		 * thisproc is valid, proc may be NULL after this.
 		 */
 		thisproc = proc;
-		proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue),
+		proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
 									   &(proc->syncRepLinks),
 									   offsetof(PGPROC, syncRepLinks));
 
@@ -588,7 +602,12 @@ SyncRepUpdateSyncStandbysDefined(void)
 		 * wants synchronous replication, we'd better wake them up.
 		 */
 		if (!sync_standbys_defined)
-			SyncRepWakeQueue(true);
+		{
+			int i;
+
+			for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+				SyncRepWakeQueue(true, i);
+		}
 
 		/*
 		 * Only allow people to join the queue when there are synchronous
@@ -605,16 +624,18 @@ SyncRepUpdateSyncStandbysDefined(void)
 
 #ifdef USE_ASSERT_CHECKING
 static bool
-SyncRepQueueIsOrderedByLSN(void)
+SyncRepQueueIsOrderedByLSN(int mode)
 {
 	PGPROC *proc = NULL;
 	XLogRecPtr lastLSN;
 
+	Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
+
 	lastLSN.xlogid = 0;
 	lastLSN.xrecoff = 0;
 
-	proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue),
-								   &(WalSndCtl->SyncRepQueue),
+	proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
+								   &(WalSndCtl->SyncRepQueue[mode]),
 								   offsetof(PGPROC, syncRepLinks));
 
 	while (proc)
@@ -628,7 +649,7 @@ SyncRepQueueIsOrderedByLSN(void)
 
 		lastLSN = proc->waitLSN;
 
-		proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue),
+		proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
 									   &(proc->syncRepLinks),
 									   offsetof(PGPROC, syncRepLinks));
 	}
@@ -675,3 +696,20 @@ check_synchronous_standby_names(char **newval, void **extra, GucSource source)
 
 	return true;
 }
+
+void
+assign_synchronous_commit(int newval, void *extra)
+{
+	switch (newval)
+	{
+		case SYNCHRONOUS_COMMIT_REMOTE_WRITE:
+			SyncRepWaitMode = SYNC_REP_WAIT_WRITE;
+			break;
+		case SYNCHRONOUS_COMMIT_REMOTE_FLUSH:
+			SyncRepWaitMode = SYNC_REP_WAIT_FLUSH;
+			break;
+		default:
+			SyncRepWaitMode = SYNC_REP_NO_WAIT;
+			break;
+	}
+}
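The core of the syncrep.c change is that there is now one LSN-ordered wait queue and one released-up-to LSN per wait mode, indexed by `SYNC_REP_WAIT_WRITE` or `SYNC_REP_WAIT_FLUSH`. The self-contained model below mirrors that logic with plain arrays instead of shared-memory `SHMQueue...` lists, PGPROC entries and locking. `SyncRepModel`, `queue_insert`, `queue_wake`, `release_waiters` and `MAX_WAITERS` are hypothetical names, and LSNs are simplified to `uint64_t`. As in `SyncRepQueueInsert`, insertion works back from the tail; as in `SyncRepWakeQueue`, waking stops at the first waiter whose LSN is beyond the acknowledged position.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SYNC_REP_WAIT_WRITE    0
#define SYNC_REP_WAIT_FLUSH    1
#define NUM_SYNC_REP_WAIT_MODE 2
#define MAX_WAITERS            16   /* hypothetical fixed capacity */

typedef uint64_t Lsn;               /* simplified stand-in for XLogRecPtr */

typedef struct
{
    /* Highest standby position already acknowledged, per wait mode. */
    Lsn lsn[NUM_SYNC_REP_WAIT_MODE];
    /* Waiting commit LSNs per mode, kept in ascending order. */
    Lsn queue[NUM_SYNC_REP_WAIT_MODE][MAX_WAITERS];
    int len[NUM_SYNC_REP_WAIT_MODE];
} SyncRepModel;

/* Insert wait_lsn into the queue for mode, working back from the tail. */
static void
queue_insert(SyncRepModel *s, int mode, Lsn wait_lsn)
{
    assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
    assert(s->len[mode] < MAX_WAITERS);

    int i = s->len[mode];

    /* Usually the new LSN is the largest, so this loop rarely runs. */
    while (i > 0 && s->queue[mode][i - 1] >= wait_lsn)
    {
        s->queue[mode][i] = s->queue[mode][i - 1];
        i--;
    }
    s->queue[mode][i] = wait_lsn;
    s->len[mode]++;
}

/* Remove waiters from the head whose LSN is <= lsn[mode]; return the count. */
static int
queue_wake(SyncRepModel *s, int mode)
{
    assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);

    int n = 0;

    /* The queue is sorted, so stop at the first waiter still ahead. */
    while (n < s->len[mode] && s->queue[mode][n] <= s->lsn[mode])
        n++;
    memmove(s->queue[mode], s->queue[mode] + n,
            (size_t) (s->len[mode] - n) * sizeof(Lsn));
    s->len[mode] -= n;
    return n;
}

/*
 * Process one standby reply: advance each mode's LSN first, then wake
 * that mode's queue, mirroring the two blocks in SyncRepReleaseWaiters.
 */
static void
release_waiters(SyncRepModel *s, Lsn write, Lsn flush,
                int *numwrite, int *numflush)
{
    *numwrite = 0;
    *numflush = 0;
    if (s->lsn[SYNC_REP_WAIT_WRITE] < write)
    {
        s->lsn[SYNC_REP_WAIT_WRITE] = write;
        *numwrite = queue_wake(s, SYNC_REP_WAIT_WRITE);
    }
    if (s->lsn[SYNC_REP_WAIT_FLUSH] < flush)
    {
        s->lsn[SYNC_REP_WAIT_FLUSH] = flush;
        *numflush = queue_wake(s, SYNC_REP_WAIT_FLUSH);
    }
}
```

Because the queues are independent, a reply that advances only the write position releases write-mode waiters without touching flush-mode ones, which matches the two separate `if` blocks in the new `SyncRepReleaseWaiters` code.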

src/backend/replication/walsender.c

+2-1
@@ -1410,7 +1410,8 @@ WalSndShmemInit(void)
 		/* First time through, so initialize */
 		MemSet(WalSndCtl, 0, WalSndShmemSize());
 
-		SHMQueueInit(&(WalSndCtl->SyncRepQueue));
+		for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
+			SHMQueueInit(&(WalSndCtl->SyncRepQueue[i]));
 
 		for (i = 0; i < max_wal_senders; i++)
 		{

src/backend/utils/misc/guc.c

+3-2
@@ -370,11 +370,12 @@ static const struct config_enum_entry constraint_exclusion_options[] = {
 };
 
 /*
- * Although only "on", "off", and "local" are documented, we
+ * Although only "on", "off", "write", and "local" are documented, we
  * accept all the likely variants of "on" and "off".
  */
 static const struct config_enum_entry synchronous_commit_options[] = {
 	{"local", SYNCHRONOUS_COMMIT_LOCAL_FLUSH, false},
+	{"write", SYNCHRONOUS_COMMIT_REMOTE_WRITE, false},
 	{"on", SYNCHRONOUS_COMMIT_ON, false},
 	{"off", SYNCHRONOUS_COMMIT_OFF, false},
 	{"true", SYNCHRONOUS_COMMIT_ON, true},
@@ -3164,7 +3165,7 @@ static struct config_enum ConfigureNamesEnum[] =
 		},
 		&synchronous_commit,
 		SYNCHRONOUS_COMMIT_ON, synchronous_commit_options,
-		NULL, NULL, NULL
+		NULL, assign_synchronous_commit, NULL
 	},
 
 	{
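Adding the `"write"` row to `synchronous_commit_options` is what lets the parser accept the new value, and registering `assign_synchronous_commit` as the assign hook keeps `SyncRepWaitMode` in step with the setting. The sketch below shows how such a name-to-value table can be searched; it is not guc.c's lookup code. `EnumEntry`, `lookup_enum` and `names_equal_ci` are hypothetical, the case-insensitive comparison is an assumption of this sketch, only a subset of the real table is reproduced, and it assumes `on` maps to the remote-flush level as the documentation describes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in with the same shape as guc.c's config_enum_entry. */
typedef struct
{
    const char *name;
    int         val;
    bool        hidden;     /* accepted as input, but not documented */
} EnumEntry;

typedef enum
{
    SYNCHRONOUS_COMMIT_OFF,
    SYNCHRONOUS_COMMIT_LOCAL_FLUSH,
    SYNCHRONOUS_COMMIT_REMOTE_WRITE,
    SYNCHRONOUS_COMMIT_REMOTE_FLUSH
} SyncCommitLevel;

/* Assumption: "on" means waiting for remote flush, as documented. */
#define SYNCHRONOUS_COMMIT_ON SYNCHRONOUS_COMMIT_REMOTE_FLUSH

/* Subset of synchronous_commit_options, terminated by a NULL name. */
static const EnumEntry sync_commit_options[] = {
    {"local", SYNCHRONOUS_COMMIT_LOCAL_FLUSH, false},
    {"write", SYNCHRONOUS_COMMIT_REMOTE_WRITE, false},
    {"on", SYNCHRONOUS_COMMIT_ON, false},
    {"off", SYNCHRONOUS_COMMIT_OFF, false},
    {"true", SYNCHRONOUS_COMMIT_ON, true},
    {NULL, 0, false}
};

/* Case-insensitive string equality. */
static bool
names_equal_ci(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++)
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return false;
    }
    return *a == *b;        /* both strings must end together */
}

/* Look up value in a NULL-terminated table; store its code in *result. */
static bool
lookup_enum(const EnumEntry *options, const char *value, int *result)
{
    for (const EnumEntry *e = options; e->name != NULL; e++)
    {
        if (names_equal_ci(e->name, value))
        {
            *result = e->val;
            return true;
        }
    }
    return false;
}
```

The `hidden` flag is what separates documented spellings such as `write` from accepted aliases such as `true`, which, per the comment in guc.c, are accepted but not documented.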

src/include/access/xact.h

+1
@@ -55,6 +55,7 @@ typedef enum
 {
 	SYNCHRONOUS_COMMIT_OFF, /* asynchronous commit */
 	SYNCHRONOUS_COMMIT_LOCAL_FLUSH, /* wait for local flush only */
+	SYNCHRONOUS_COMMIT_REMOTE_WRITE, /* wait for local flush and remote write */
 	SYNCHRONOUS_COMMIT_REMOTE_FLUSH /* wait for local and remote flush */
 } SyncCommitLevel;
 
src/include/replication/syncrep.h

+12-1
@@ -15,6 +15,16 @@
 
 #include "utils/guc.h"
 
+#define SyncRepRequested() \
+	(max_wal_senders > 0 && synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH)
+
+/* SyncRepWaitMode */
+#define SYNC_REP_NO_WAIT -1
+#define SYNC_REP_WAIT_WRITE 0
+#define SYNC_REP_WAIT_FLUSH 1
+
+#define NUM_SYNC_REP_WAIT_MODE 2
+
 /* syncRepState */
 #define SYNC_REP_NOT_WAITING 0
 #define SYNC_REP_WAITING 1
@@ -37,8 +47,9 @@ extern void SyncRepReleaseWaiters(void);
 extern void SyncRepUpdateSyncStandbysDefined(void);
 
 /* called by various procs */
-extern int SyncRepWakeQueue(bool all);
+extern int SyncRepWakeQueue(bool all, int mode);
 
 extern bool check_synchronous_standby_names(char **newval, void **extra, GucSource source);
+extern void assign_synchronous_commit(int newval, void *extra);
 
 #endif /* _SYNCREP_H */
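Two details of the header changes interact. `SyncRepRequested()` tests `synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH`, so placing `SYNCHRONOUS_COMMIT_REMOTE_WRITE` after `SYNCHRONOUS_COMMIT_LOCAL_FLUSH` in xact.h makes `write` request synchronous replication without any change to the macro, and `assign_synchronous_commit` then selects which queue a commit joins. The sketch below restates both as plain functions that take the GUC values as parameters; `sync_rep_requested` and `wait_mode_for` are hypothetical names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Same ordering as SyncCommitLevel in xact.h after this commit. */
typedef enum
{
    SYNCHRONOUS_COMMIT_OFF,
    SYNCHRONOUS_COMMIT_LOCAL_FLUSH,
    SYNCHRONOUS_COMMIT_REMOTE_WRITE,
    SYNCHRONOUS_COMMIT_REMOTE_FLUSH
} SyncCommitLevel;

/* Values copied from syncrep.h in this commit. */
#define SYNC_REP_NO_WAIT    -1
#define SYNC_REP_WAIT_WRITE  0
#define SYNC_REP_WAIT_FLUSH  1

/* Same test as the SyncRepRequested() macro, with the GUCs as parameters. */
static bool
sync_rep_requested(int max_wal_senders, SyncCommitLevel synchronous_commit)
{
    /* Relies on REMOTE_WRITE and REMOTE_FLUSH sorting after LOCAL_FLUSH. */
    return max_wal_senders > 0 &&
           synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH;
}

/* Same mapping as the new assign_synchronous_commit() hook. */
static int
wait_mode_for(SyncCommitLevel synchronous_commit)
{
    switch (synchronous_commit)
    {
        case SYNCHRONOUS_COMMIT_REMOTE_WRITE:
            return SYNC_REP_WAIT_WRITE;
        case SYNCHRONOUS_COMMIT_REMOTE_FLUSH:
            return SYNC_REP_WAIT_FLUSH;
        default:
            return SYNC_REP_NO_WAIT;
    }
}
```

The chosen position for `SYNCHRONOUS_COMMIT_REMOTE_WRITE` also keeps the enum in order of increasing durability, so a single `>` comparison continues to mean "at least this strong".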
