Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit d122387

Browse files
committed
Defend against bad relfrozenxid/relminmxid/datfrozenxid/datminmxid values.
In commit a61daa1, we fixed pg_upgrade so that it would install sane relminmxid and datminmxid values, but that does not cure the problem for installations that were already pg_upgraded to 9.3; they'll initially have "1" in those fields. This is not a big problem so long as 1 is "in the past" compared to the current nextMultiXact counter. But if an installation were more than halfway to the MXID wrap point at the time of upgrade, 1 would appear to be "in the future" and that would effectively disable tracking of oldest MXIDs in those tables/databases, until such time as the counter wrapped around. While in itself this isn't worse than the situation pre-9.3, where we did not manage MXID wraparound risk at all, the consequences of premature truncation of pg_multixact are worse now; so we ought to make some effort to cope with this. We discussed advising users to fix the tracking values manually, but that seems both very tedious and very error-prone. Instead, this patch adopts two amelioration rules. First, a relminmxid value that is "in the future" is allowed to be overwritten with a full-table VACUUM's actual freeze cutoff, ignoring the normal rule that relminmxid should never go backwards. (This essentially assumes that we have enough defenses in place that wraparound can never occur anymore, and thus that a value "in the future" must be corrupt.) Second, if we see any "in the future" values then we refrain from truncating pg_clog and pg_multixact. This prevents loss of clog data until we have cleaned up all the broken tracking data. In the worst case that could result in considerable clog bloat, but in practice we expect that relfrozenxid-driven freezing will happen soon enough to fix the problem before clog bloat becomes intolerable. (Users could do manual VACUUM FREEZEs if not.) Note that this mechanism cannot save us if there are already-wrapped or already-truncated-away MXIDs in the table; it's only capable of dealing with corrupt tracking values. But that's the situation we have with the pg_upgrade bug. For consistency, apply the same rules to relfrozenxid/datfrozenxid. There are not known mechanisms for these to get messed up, but if they were, the same tactics seem appropriate for fixing them.
1 parent 4bcdfb9 commit d122387

File tree

1 file changed

+99
-26
lines changed

1 file changed

+99
-26
lines changed

src/backend/commands/vacuum.c

+99-26
Original file line numberDiff line numberDiff line change
@@ -66,7 +66,10 @@ static BufferAccessStrategy vac_strategy;
6666

6767
/* non-export function prototypes */
6868
static List *get_rel_oids(Oid relid, const RangeVar *vacrel);
69-
static void vac_truncate_clog(TransactionId frozenXID, MultiXactId minMulti);
69+
static void vac_truncate_clog(TransactionId frozenXID,
70+
MultiXactId minMulti,
71+
TransactionId lastSaneFrozenXid,
72+
MultiXactId lastSaneMinMulti);
7073
static bool vacuum_rel(Oid relid, VacuumStmt *vacstmt, bool do_toast,
7174
bool for_wraparound);
7275

@@ -733,19 +736,33 @@ vac_update_relstats(Relation relation,
733736
}
734737

735738
/*
736-
* relfrozenxid should never go backward. Caller can pass
737-
* InvalidTransactionId if it has no new data.
739+
* Update relfrozenxid, unless caller passed InvalidTransactionId
740+
* indicating it has no new data.
741+
*
742+
* Ordinarily, we don't let relfrozenxid go backwards: if things are
743+
* working correctly, the only way the new frozenxid could be older would
744+
* be if a previous VACUUM was done with a tighter freeze_min_age, in
745+
* which case we don't want to forget the work it already did. However,
746+
* if the stored relfrozenxid is "in the future", then it must be corrupt
747+
* and it seems best to overwrite it with the cutoff we used this time.
748+
* See vac_update_datfrozenxid() concerning what we consider to be "in the
749+
* future".
738750
*/
739751
if (TransactionIdIsNormal(frozenxid) &&
740-
TransactionIdPrecedes(pgcform->relfrozenxid, frozenxid))
752+
pgcform->relfrozenxid != frozenxid &&
753+
(TransactionIdPrecedes(pgcform->relfrozenxid, frozenxid) ||
754+
TransactionIdPrecedes(GetOldestXmin(NULL, true),
755+
pgcform->relfrozenxid)))
741756
{
742757
pgcform->relfrozenxid = frozenxid;
743758
dirty = true;
744759
}
745760

746-
/* relminmxid must never go backward, either */
761+
/* Similarly for relminmxid */
747762
if (MultiXactIdIsValid(minmulti) &&
748-
MultiXactIdPrecedes(pgcform->relminmxid, minmulti))
763+
pgcform->relminmxid != minmulti &&
764+
(MultiXactIdPrecedes(pgcform->relminmxid, minmulti) ||
765+
MultiXactIdPrecedes(GetOldestMultiXactId(), pgcform->relminmxid)))
749766
{
750767
pgcform->relminmxid = minmulti;
751768
dirty = true;
@@ -772,8 +789,8 @@ vac_update_relstats(Relation relation,
772789
* truncate pg_clog and pg_multixact.
773790
*
774791
* We violate transaction semantics here by overwriting the database's
775-
* existing pg_database tuple with the new value. This is reasonably
776-
* safe since the new value is correct whether or not this transaction
792+
* existing pg_database tuple with the new values. This is reasonably
793+
* safe since the new values are correct whether or not this transaction
777794
* commits. As with vac_update_relstats, this avoids leaving dead tuples
778795
* behind after a VACUUM.
779796
*/
@@ -786,7 +803,10 @@ vac_update_datfrozenxid(void)
786803
SysScanDesc scan;
787804
HeapTuple classTup;
788805
TransactionId newFrozenXid;
806+
TransactionId lastSaneFrozenXid;
789807
MultiXactId newMinMulti;
808+
MultiXactId lastSaneMinMulti;
809+
bool bogus = false;
790810
bool dirty = false;
791811

792812
/*
@@ -795,13 +815,13 @@ vac_update_datfrozenxid(void)
795815
* committed pg_class entries for new tables; see AddNewRelationTuple().
796816
* So we cannot produce a wrong minimum by starting with this.
797817
*/
798-
newFrozenXid = GetOldestXmin(NULL, true);
818+
newFrozenXid = lastSaneFrozenXid = GetOldestXmin(NULL, true);
799819

800820
/*
801821
* Similarly, initialize the MultiXact "min" with the value that would be
802822
* used on pg_class for new tables. See AddNewRelationTuple().
803823
*/
804-
newMinMulti = GetOldestMultiXactId();
824+
newMinMulti = lastSaneMinMulti = GetOldestMultiXactId();
805825

806826
/*
807827
* We must seqscan pg_class to find the minimum Xid, because there is no
@@ -828,6 +848,21 @@ vac_update_datfrozenxid(void)
828848
Assert(TransactionIdIsNormal(classForm->relfrozenxid));
829849
Assert(MultiXactIdIsValid(classForm->relminmxid));
830850

851+
/*
852+
* If things are working properly, no relation should have a
853+
* relfrozenxid or relminmxid that is "in the future". However, such
854+
* cases have been known to arise due to bugs in pg_upgrade. If we
855+
* see any entries that are "in the future", chicken out and don't do
856+
* anything. This ensures we won't truncate clog before those
857+
* relations have been scanned and cleaned up.
858+
*/
859+
if (TransactionIdPrecedes(lastSaneFrozenXid, classForm->relfrozenxid) ||
860+
MultiXactIdPrecedes(lastSaneMinMulti, classForm->relminmxid))
861+
{
862+
bogus = true;
863+
break;
864+
}
865+
831866
if (TransactionIdPrecedes(classForm->relfrozenxid, newFrozenXid))
832867
newFrozenXid = classForm->relfrozenxid;
833868

@@ -839,6 +874,10 @@ vac_update_datfrozenxid(void)
839874
systable_endscan(scan);
840875
heap_close(relation, AccessShareLock);
841876

877+
/* chicken out if bogus data found */
878+
if (bogus)
879+
return;
880+
842881
Assert(TransactionIdIsNormal(newFrozenXid));
843882
Assert(MultiXactIdIsValid(newMinMulti));
844883

@@ -852,21 +891,30 @@ vac_update_datfrozenxid(void)
852891
dbform = (Form_pg_database) GETSTRUCT(tuple);
853892

854893
/*
855-
* Don't allow datfrozenxid to go backward (probably can't happen anyway);
856-
* and detect the common case where it doesn't go forward either.
894+
* As in vac_update_relstats(), we ordinarily don't want to let
895+
* datfrozenxid go backward; but if it's "in the future" then it must be
896+
* corrupt and it seems best to overwrite it.
857897
*/
858-
if (TransactionIdPrecedes(dbform->datfrozenxid, newFrozenXid))
898+
if (dbform->datfrozenxid != newFrozenXid &&
899+
(TransactionIdPrecedes(dbform->datfrozenxid, newFrozenXid) ||
900+
TransactionIdPrecedes(lastSaneFrozenXid, dbform->datfrozenxid)))
859901
{
860902
dbform->datfrozenxid = newFrozenXid;
861903
dirty = true;
862904
}
905+
else
906+
newFrozenXid = dbform->datfrozenxid;
863907

864-
/* ditto */
865-
if (MultiXactIdPrecedes(dbform->datminmxid, newMinMulti))
908+
/* Ditto for datminmxid */
909+
if (dbform->datminmxid != newMinMulti &&
910+
(MultiXactIdPrecedes(dbform->datminmxid, newMinMulti) ||
911+
MultiXactIdPrecedes(lastSaneMinMulti, dbform->datminmxid)))
866912
{
867913
dbform->datminmxid = newMinMulti;
868914
dirty = true;
869915
}
916+
else
917+
newMinMulti = dbform->datminmxid;
870918

871919
if (dirty)
872920
heap_inplace_update(relation, tuple);
@@ -875,12 +923,13 @@ vac_update_datfrozenxid(void)
875923
heap_close(relation, RowExclusiveLock);
876924

877925
/*
878-
* If we were able to advance datfrozenxid, see if we can truncate
879-
* pg_clog. Also do it if the shared XID-wrap-limit info is stale, since
880-
* this action will update that too.
926+
* If we were able to advance datfrozenxid or datminmxid, see if we can
927+
* truncate pg_clog and/or pg_multixact. Also do it if the shared
928+
* XID-wrap-limit info is stale, since this action will update that too.
881929
*/
882930
if (dirty || ForceTransactionIdLimitUpdate())
883-
vac_truncate_clog(newFrozenXid, newMinMulti);
931+
vac_truncate_clog(newFrozenXid, newMinMulti,
932+
lastSaneFrozenXid, lastSaneMinMulti);
884933
}
885934

886935

@@ -890,31 +939,38 @@ vac_update_datfrozenxid(void)
890939
* Scan pg_database to determine the system-wide oldest datfrozenxid,
891940
* and use it to truncate the transaction commit log (pg_clog).
892941
* Also update the XID wrap limit info maintained by varsup.c.
942+
* Likewise for datminmxid.
893943
*
894-
* The passed XID is simply the one I just wrote into my pg_database
895-
* entry. It's used to initialize the "min" calculation.
944+
* The passed frozenXID and minMulti are the updated values for my own
945+
* pg_database entry. They're used to initialize the "min" calculations.
946+
* The caller also passes the "last sane" XID and MXID, since it has
947+
* those at hand already.
896948
*
897949
* This routine is only invoked when we've managed to change our
898-
* DB's datfrozenxid entry, or we found that the shared XID-wrap-limit
899-
* info is stale.
950+
* DB's datfrozenxid/datminmxid values, or we found that the shared
951+
* XID-wrap-limit info is stale.
900952
*/
901953
static void
902-
vac_truncate_clog(TransactionId frozenXID, MultiXactId minMulti)
954+
vac_truncate_clog(TransactionId frozenXID,
955+
MultiXactId minMulti,
956+
TransactionId lastSaneFrozenXid,
957+
MultiXactId lastSaneMinMulti)
903958
{
904959
TransactionId myXID = GetCurrentTransactionId();
905960
Relation relation;
906961
HeapScanDesc scan;
907962
HeapTuple tuple;
908963
Oid oldestxid_datoid;
909964
Oid minmulti_datoid;
965+
bool bogus = false;
910966
bool frozenAlreadyWrapped = false;
911967

912-
/* init oldest datoids to sync with my frozen values */
968+
/* init oldest datoids to sync with my frozenXID/minMulti values */
913969
oldestxid_datoid = MyDatabaseId;
914970
minmulti_datoid = MyDatabaseId;
915971

916972
/*
917-
* Scan pg_database to compute the minimum datfrozenxid
973+
* Scan pg_database to compute the minimum datfrozenxid/datminmxid
918974
*
919975
* Note: we need not worry about a race condition with new entries being
920976
* inserted by CREATE DATABASE. Any such entry will have a copy of some
@@ -936,6 +992,19 @@ vac_truncate_clog(TransactionId frozenXID, MultiXactId minMulti)
936992
Assert(TransactionIdIsNormal(dbform->datfrozenxid));
937993
Assert(MultiXactIdIsValid(dbform->datminmxid));
938994

995+
/*
996+
* If things are working properly, no database should have a
997+
* datfrozenxid or datminmxid that is "in the future". However, such
998+
* cases have been known to arise due to bugs in pg_upgrade. If we
999+
* see any entries that are "in the future", chicken out and don't do
1000+
* anything. This ensures we won't truncate clog before those
1001+
* databases have been scanned and cleaned up. (We will issue the
1002+
* "already wrapped" warning if appropriate, though.)
1003+
*/
1004+
if (TransactionIdPrecedes(lastSaneFrozenXid, dbform->datfrozenxid) ||
1005+
MultiXactIdPrecedes(lastSaneMinMulti, dbform->datminmxid))
1006+
bogus = true;
1007+
9391008
if (TransactionIdPrecedes(myXID, dbform->datfrozenxid))
9401009
frozenAlreadyWrapped = true;
9411010
else if (TransactionIdPrecedes(dbform->datfrozenxid, frozenXID))
@@ -969,6 +1038,10 @@ vac_truncate_clog(TransactionId frozenXID, MultiXactId minMulti)
9691038
return;
9701039
}
9711040

1041+
/* chicken out if data is bogus in any other way */
1042+
if (bogus)
1043+
return;
1044+
9721045
/*
9731046
* Truncate CLOG to the oldest computed value. Note we don't truncate
9741047
* multixacts; that will be done by the next checkpoint.

0 commit comments

Comments
 (0)