Commit bca8b7f

Hot Standby feedback for avoidance of cleanup conflicts on standby.
Standby optionally sends back information about oldestXmin of queries which is then checked and applied to the WALSender's proc->xmin. GetOldestXmin() is modified slightly to agree with GetSnapshotData(), so that all backends on primary include WALSender within their snapshots. Note this does nothing to change the snapshot xmin on either master or standby. Feedback piggybacks on the standby reply message.

vacuum_defer_cleanup_age is no longer used on standby, though parameter still exists on primary, since some use cases still exist.

Simon Riggs, review comments from Fujii Masao, Heikki Linnakangas, Robert Haas
1 parent 6507626 commit bca8b7f


11 files changed: +245 -50 lines changed

doc/src/sgml/config.sgml

Lines changed: 20 additions & 0 deletions
@@ -2006,6 +2006,10 @@ SET ENABLE_SEQSCAN TO OFF;
         This parameter can only be set in the <filename>postgresql.conf</>
         file or on the server command line.
        </para>
+       <para>
+        You should also consider setting <varname>hot_standby_feedback</>
+        as an alternative to using this parameter.
+       </para>
       </listitem>
      </varlistentry>
     </variablelist>
@@ -2121,6 +2125,22 @@ SET ENABLE_SEQSCAN TO OFF;
       </listitem>
      </varlistentry>

+     <varlistentry id="guc-hot-standby-feedback" xreflabel="hot_standby_feedback">
+      <term><varname>hot_standby_feedback</varname> (<type>boolean</type>)</term>
+      <indexterm>
+       <primary><varname>hot_standby_feedback</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Specifies whether or not a hot standby will send feedback to the primary
+        about queries currently executing on the standby. This parameter can
+        be used to eliminate query cancels caused by cleanup records, though
+        it can cause database bloat on the primary for some workloads.
+        The default value is <literal>off</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
    </sect1>

doc/src/sgml/high-availability.sgml

Lines changed: 21 additions & 23 deletions
@@ -1483,23 +1483,6 @@ if (!triggered)
     on the primary, if it delays application of WAL records.
    </para>

-   <para>
-    The most common reason for conflict between standby queries and WAL replay
-    is <quote>early cleanup</>. Normally, <productname>PostgreSQL</> allows
-    cleanup of old row versions when there are no transactions that need to
-    see them to ensure correct visibility of data according to MVCC rules.
-    However, this rule can only be applied for transactions executing on the
-    master. So it is possible that cleanup on the master will remove row
-    versions that are still visible to a transaction on the standby.
-   </para>
-
-   <para>
-    Experienced users should note that both row version cleanup and row version
-    freezing will potentially conflict with standby queries. Running a manual
-    <command>VACUUM FREEZE</> is likely to cause conflicts even on tables with
-    no updated or deleted rows.
-   </para>
-
    <para>
     Once the delay specified by <varname>max_standby_archive_delay</> or
     <varname>max_standby_streaming_delay</> has been exceeded, conflicting
@@ -1526,6 +1509,23 @@ if (!triggered)
     as a result of being unable to keep up with a heavy update load.
    </para>

+   <para>
+    The most common reason for conflict between standby queries and WAL replay
+    is <quote>early cleanup</>. Normally, <productname>PostgreSQL</> allows
+    cleanup of old row versions when there are no transactions that need to
+    see them to ensure correct visibility of data according to MVCC rules.
+    However, this rule can only be applied for transactions executing on the
+    master. So it is possible that cleanup on the master will remove row
+    versions that are still visible to a transaction on the standby.
+   </para>
+
+   <para>
+    Experienced users should note that both row version cleanup and row version
+    freezing will potentially conflict with standby queries. Running a manual
+    <command>VACUUM FREEZE</> is likely to cause conflicts even on tables with
+    no updated or deleted rows.
+   </para>
+
    <para>
     Users should be clear that tables that are regularly and heavily updated
     on the primary server will quickly cause cancellation of longer running
@@ -1537,12 +1537,10 @@ if (!triggered)

    <para>
     Remedial possibilities exist if the number of standby-query cancellations
-    is found to be unacceptable. The first option is to connect to the
-    primary server and keep a query active for as long as needed to
-    run queries on the standby. This prevents <command>VACUUM</> from removing
-    recently-dead rows and so cleanup conflicts do not occur.
-    This could be done using <xref linkend="dblink"> and
-    <function>pg_sleep()</>, or via other mechanisms. If you do this, you
+    is found to be unacceptable. The first option is to set the parameter
+    <varname>hot_standby_feedback</>, which prevents <command>VACUUM</> from
+    removing recently-dead rows and so cleanup conflicts do not occur.
+    If you do this, you
     should note that this will delay cleanup of dead rows on the primary,
     which may result in undesirable table bloat. However, the cleanup
     situation will be no worse than if the standby queries were running

src/backend/access/transam/xlog.c

Lines changed: 54 additions & 4 deletions
@@ -158,6 +158,11 @@ static XLogRecPtr LastRec;
  * known, need to check the shared state".
  */
 static bool LocalRecoveryInProgress = true;
+/*
+ * Local copy of SharedHotStandbyActive variable. False actually means "not
+ * known, need to check the shared state".
+ */
+static bool LocalHotStandbyActive = false;

 /*
  * Local state for XLogInsertAllowed():
@@ -405,6 +410,12 @@ typedef struct XLogCtlData
      */
     bool        SharedRecoveryInProgress;

+    /*
+     * SharedHotStandbyActive indicates if Hot Standby is active, i.e. whether
+     * read-only connections are allowed yet. Protected by info_lck.
+     */
+    bool        SharedHotStandbyActive;
+
     /*
      * recoveryWakeupLatch is used to wake up the startup process to
      * continue WAL replay, if it is waiting for WAL to arrive or failover
@@ -4917,6 +4928,7 @@ XLOGShmemInit(void)
      */
     XLogCtl->XLogCacheBlck = XLOGbuffers - 1;
     XLogCtl->SharedRecoveryInProgress = true;
+    XLogCtl->SharedHotStandbyActive = false;
     XLogCtl->Insert.currpage = (XLogPageHeader) (XLogCtl->pages);
     SpinLockInit(&XLogCtl->info_lck);
     InitSharedLatch(&XLogCtl->recoveryWakeupLatch);
@@ -6790,8 +6802,6 @@ StartupXLOG(void)
 static void
 CheckRecoveryConsistency(void)
 {
-    static bool backendsAllowed = false;
-
     /*
      * Have we passed our safe starting point?
      */
@@ -6811,11 +6821,19 @@ CheckRecoveryConsistency(void)
      * enabling connections.
      */
     if (standbyState == STANDBY_SNAPSHOT_READY &&
-        !backendsAllowed &&
+        !LocalHotStandbyActive &&
         reachedMinRecoveryPoint &&
         IsUnderPostmaster)
     {
-        backendsAllowed = true;
+        /* use volatile pointer to prevent code rearrangement */
+        volatile XLogCtlData *xlogctl = XLogCtl;
+
+        SpinLockAcquire(&xlogctl->info_lck);
+        xlogctl->SharedHotStandbyActive = true;
+        SpinLockRelease(&xlogctl->info_lck);
+
+        LocalHotStandbyActive = true;
+
         SendPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY);
     }
 }
@@ -6862,6 +6880,38 @@ RecoveryInProgress(void)
     }
 }

+/*
+ * Is HotStandby active yet? This is only important in special backends
+ * since normal backends won't ever be able to connect until this returns
+ * true. Postmaster knows this by way of signal, not via shared memory.
+ *
+ * Unlike testing standbyState, this works in any process that's connected to
+ * shared memory.
+ */
+bool
+HotStandbyActive(void)
+{
+    /*
+     * We check shared state each time only until Hot Standby is active. We
+     * can't de-activate Hot Standby, so there's no need to keep checking after
+     * the shared variable has once been seen true.
+     */
+    if (LocalHotStandbyActive)
+        return true;
+    else
+    {
+        /* use volatile pointer to prevent code rearrangement */
+        volatile XLogCtlData *xlogctl = XLogCtl;
+
+        /* spinlock is essential on machines with weak memory ordering! */
+        SpinLockAcquire(&xlogctl->info_lck);
+        LocalHotStandbyActive = xlogctl->SharedHotStandbyActive;
+        SpinLockRelease(&xlogctl->info_lck);
+
+        return LocalHotStandbyActive;
+    }
+}
+
 /*
  * Is this process allowed to insert new WAL records?
  *
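
The caching pattern used by HotStandbyActive() can be tried outside the server. The following is only a minimal sketch, not code from this commit: it swaps the info_lck spinlock for a C11 atomic with release/acquire ordering, and every name in it (shared_standby_active, local_standby_active, sketch_set_standby_active, sketch_standby_active) is hypothetical. It relies on the same property the patch depends on, namely that the flag only ever goes from false to true.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for XLogCtl->SharedHotStandbyActive (hypothetical name). */
static atomic_bool shared_standby_active = false;

/* Per-process cache; false means "not known yet, check shared state". */
static bool local_standby_active = false;

/* Publisher side: flip the shared flag once, then remember it locally. */
void
sketch_set_standby_active(void)
{
    atomic_store_explicit(&shared_standby_active, true, memory_order_release);
    local_standby_active = true;
}

/* Reader side: consult shared state only until the flag has been seen true. */
bool
sketch_standby_active(void)
{
    if (local_standby_active)
        return true;

    local_standby_active =
        atomic_load_explicit(&shared_standby_active, memory_order_acquire);
    return local_standby_active;
}
```

Because the flag is one-way, a stale false in the local cache only costs another look at shared state, while a cached true never needs rechecking.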

src/backend/replication/walreceiver.c

Lines changed: 33 additions & 3 deletions
@@ -38,13 +38,15 @@
 #include <signal.h>
 #include <unistd.h>

+#include "access/transam.h"
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "replication/walprotocol.h"
 #include "replication/walreceiver.h"
 #include "storage/ipc.h"
 #include "storage/pmsignal.h"
+#include "storage/procarray.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -56,6 +58,7 @@ bool am_walreceiver;

 /* GUC variable */
 int         wal_receiver_status_interval;
+bool        hot_standby_feedback;

 /* libpqreceiver hooks to these when loaded */
 walrcv_connect_type walrcv_connect = NULL;
@@ -610,16 +613,43 @@ XLogWalRcvSendReply(void)
                                    wal_receiver_status_interval * 1000))
         return;

-    /* Construct a new message. */
+    /* Construct a new message */
     reply_message.write = LogstreamResult.Write;
     reply_message.flush = LogstreamResult.Flush;
     reply_message.apply = GetXLogReplayRecPtr();
     reply_message.sendTime = now;

-    elog(DEBUG2, "sending write %X/%X flush %X/%X apply %X/%X",
+    /*
+     * Get the OldestXmin and its associated epoch
+     */
+    if (hot_standby_feedback && HotStandbyActive())
+    {
+        TransactionId nextXid;
+        uint32      nextEpoch;
+
+        reply_message.xmin = GetOldestXmin(true, false);
+
+        /*
+         * Get epoch and adjust if nextXid and oldestXmin are different
+         * sides of the epoch boundary.
+         */
+        GetNextXidAndEpoch(&nextXid, &nextEpoch);
+        if (nextXid < reply_message.xmin)
+            nextEpoch--;
+        reply_message.epoch = nextEpoch;
+    }
+    else
+    {
+        reply_message.xmin = InvalidTransactionId;
+        reply_message.epoch = 0;
+    }
+
+    elog(DEBUG2, "sending write %X/%X flush %X/%X apply %X/%X xmin %u epoch %u",
          reply_message.write.xlogid, reply_message.write.xrecoff,
          reply_message.flush.xlogid, reply_message.flush.xrecoff,
-         reply_message.apply.xlogid, reply_message.apply.xrecoff);
+         reply_message.apply.xlogid, reply_message.apply.xrecoff,
+         reply_message.xmin,
+         reply_message.epoch);

     /* Prepend with the message type and send it. */
     buf[0] = 'r';
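
The epoch adjustment in XLogWalRcvSendReply() is easier to follow with concrete numbers. This is a standalone sketch of the same arithmetic, not server code: epoch_for_xmin is a hypothetical helper, and TransactionId is redeclared here as a plain 32-bit unsigned integer. If the next xid is numerically smaller than the oldest xmin, the counter has wrapped since that xmin was assigned, so the xmin belongs to the previous epoch.

```c
#include <stdint.h>

typedef uint32_t TransactionId;     /* redeclared for this sketch only */

/*
 * Return the epoch that xmin belongs to, given the current next xid and
 * its epoch. A raw (non-circular) comparison is deliberate: xmin is
 * never logically ahead of next_xid, so a numerically larger xmin means
 * the xid counter has wrapped around since xmin was assigned.
 */
uint32_t
epoch_for_xmin(TransactionId next_xid, uint32_t next_epoch, TransactionId xmin)
{
    if (next_xid < xmin)
        return next_epoch - 1;
    return next_epoch;
}
```

For example, with next_xid = 100 in epoch 5 and xmin = 4294967000, the helper reports epoch 4, which matches the epoch the sender-side check accepts for a wrapped xmin.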

src/backend/replication/walsender.c

Lines changed: 69 additions & 2 deletions
@@ -53,6 +53,7 @@
 #include "storage/ipc.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
+#include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
@@ -502,6 +503,7 @@ ProcessStandbyReplyMessage(void)
 {
     StandbyReplyMessage reply;
     char        msgtype;
+    TransactionId newxmin = InvalidTransactionId;

     resetStringInfo(&reply_message);

@@ -531,10 +533,12 @@ ProcessStandbyReplyMessage(void)

     pq_copymsgbytes(&reply_message, (char *) &reply, sizeof(StandbyReplyMessage));

-    elog(DEBUG2, "write %X/%X flush %X/%X apply %X/%X ",
+    elog(DEBUG2, "write %X/%X flush %X/%X apply %X/%X xmin %u epoch %u",
          reply.write.xlogid, reply.write.xrecoff,
          reply.flush.xlogid, reply.flush.xrecoff,
-         reply.apply.xlogid, reply.apply.xrecoff);
+         reply.apply.xlogid, reply.apply.xrecoff,
+         reply.xmin,
+         reply.epoch);

     /*
      * Update shared state for this WalSender process
@@ -550,6 +554,69 @@ ProcessStandbyReplyMessage(void)
         walsnd->apply = reply.apply;
         SpinLockRelease(&walsnd->mutex);
     }
+
+    /*
+     * Update the WalSender's proc xmin to allow it to be visible
+     * to snapshots. This will hold back the removal of dead rows
+     * and thereby prevent the generation of cleanup conflicts
+     * on the standby server.
+     */
+    if (TransactionIdIsValid(reply.xmin))
+    {
+        TransactionId nextXid;
+        uint32      nextEpoch;
+        bool        epochOK = false;
+
+        GetNextXidAndEpoch(&nextXid, &nextEpoch);
+
+        /*
+         * Epoch of oldestXmin should be same as standby or
+         * if the counter has wrapped, then one less than reply.
+         */
+        if (reply.xmin <= nextXid)
+        {
+            if (reply.epoch == nextEpoch)
+                epochOK = true;
+        }
+        else
+        {
+            if (nextEpoch > 0 && reply.epoch == nextEpoch - 1)
+                epochOK = true;
+        }
+
+        /*
+         * Feedback from standby must not go backwards, nor should it go
+         * forwards further than our most recent xid.
+         */
+        if (epochOK && TransactionIdPrecedesOrEquals(reply.xmin, nextXid))
+        {
+            if (!TransactionIdIsValid(MyProc->xmin))
+            {
+                TransactionId oldestXmin = GetOldestXmin(true, true);
+                if (TransactionIdPrecedes(oldestXmin, reply.xmin))
+                    newxmin = reply.xmin;
+                else
+                    newxmin = oldestXmin;
+            }
+            else
+            {
+                if (TransactionIdPrecedes(MyProc->xmin, reply.xmin))
+                    newxmin = reply.xmin;
+                else
+                    newxmin = MyProc->xmin; /* stay the same */
+            }
+        }
+    }
+
+    /*
+     * Grab the ProcArrayLock to set xmin, or invalidate for bad reply
+     */
+    if (MyProc->xmin != newxmin)
+    {
+        LWLockAcquire(ProcArrayLock, LW_SHARED);
+        MyProc->xmin = newxmin;
+        LWLockRelease(ProcArrayLock);
+    }
 }

 /* Main loop of walsender process */
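
The validation that ProcessStandbyReplyMessage() applies to the standby's xmin can be restated as a pure function. This is a hedged sketch rather than the server's implementation: choose_new_xmin, epoch_ok and xid_precedes are hypothetical names, TransactionId is redeclared locally, and the circular comparison assumes both xids are normal (it omits the special xid values that PostgreSQL's TransactionIdPrecedes handles separately). The current proc xmin and the primary's oldest xmin are passed in as parameters instead of being read from shared state.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;     /* redeclared for this sketch only */
#define InvalidTransactionId ((TransactionId) 0)

/* Circular "a is older than b"; assumes both are normal xids. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* The reply's epoch must match ours, or be one less if the counter wrapped. */
static bool
epoch_ok(TransactionId xmin, uint32_t epoch,
         TransactionId next_xid, uint32_t next_epoch)
{
    if (xmin <= next_xid)
        return epoch == next_epoch;
    return next_epoch > 0 && epoch == next_epoch - 1;
}

/*
 * Pick the xmin the walsender should advertise. Returns
 * InvalidTransactionId for a missing or implausible reply; otherwise
 * never moves backwards from the lower bound already in effect.
 */
TransactionId
choose_new_xmin(TransactionId reply_xmin, uint32_t reply_epoch,
                TransactionId next_xid, uint32_t next_epoch,
                TransactionId current_xmin, TransactionId oldest_xmin)
{
    TransactionId lower_bound;

    if (reply_xmin == InvalidTransactionId)
        return InvalidTransactionId;
    if (!epoch_ok(reply_xmin, reply_epoch, next_xid, next_epoch))
        return InvalidTransactionId;
    if (reply_xmin != next_xid && !xid_precedes(reply_xmin, next_xid))
        return InvalidTransactionId;

    /* With no xmin set yet, start from the primary's own oldest xmin. */
    lower_bound = (current_xmin != InvalidTransactionId)
        ? current_xmin : oldest_xmin;

    return xid_precedes(lower_bound, reply_xmin) ? reply_xmin : lower_bound;
}
```

Keeping the decision separate from the ProcArrayLock update makes the "never go backwards, never run ahead of the next xid" rule checkable in isolation.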
