Commit 78e1220

Fix pg_upgrade failure from servers older than 9.3
When upgrading from a server of version 9.2 or older in which MultiXactIds had been used beyond the first page (that is, 2048 multis or more in the default 8kB-page build), pg_upgrade would set the next multixact offset beyond what had been allocated in the new cluster. This would cause a failure the first time the new cluster needed to use this value, because the pg_multixact/offsets/ file wouldn't exist or wouldn't be large enough.

To fix, ensure that the transient server instances launched by pg_upgrade extend the file as necessary.

Per report from Jesse Denardo in CANiVXAj4c88YqipsyFQPboqMudnjcNTdB3pqe8ReXqAFQ=HXyA@mail.gmail.com
1 parent 1bc5935 commit 78e1220
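
To make the 2048-multi threshold in the message concrete, here is a minimal sketch of the page arithmetic. It assumes the default 8 kB block size and a 4-byte offset entry per MultiXactId, which is what yields 2048 entries per page. The SKETCH_ constants and sketch_offset_page are illustrative names, not PostgreSQL identifiers; the patch itself relies on the existing MultiXactIdToOffsetPage macro.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed defaults: 8 kB pages, 4-byte offset entries (8192 / 4 = 2048). */
#define SKETCH_BLCKSZ            8192u
#define SKETCH_OFFSET_ENTRY_SIZE 4u

/*
 * Hypothetical helper: number of the offsets page that holds the entry
 * for the given MultiXactId, for the given page and entry sizes.
 */
static int
sketch_offset_page(uint32_t multi, uint32_t page_size, uint32_t entry_size)
{
    uint32_t entries_per_page;

    assert(entry_size > 0 && page_size >= entry_size);
    entries_per_page = page_size / entry_size;

    return (int) (multi / entries_per_page);
}
```

Under these assumptions, multis 0 through 2047 map to page 0, and multi 2048 is the first one whose entry lives on page 1, the page that may be missing from the new cluster's pg_multixact/offsets/ file.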

3 files changed

+92
-0
lines changed


src/backend/access/transam/multixact.c

+47
@@ -1722,6 +1722,46 @@ ZeroMultiXactMemberPage(int pageno, bool writeXlog)
     return slotno;
 }
 
+/*
+ * MaybeExtendOffsetSlru
+ *      Extend the offsets SLRU area, if necessary
+ *
+ * After a binary upgrade from <= 9.2, the pg_multixact/offset SLRU area might
+ * contain files that are shorter than necessary; this would occur if the old
+ * installation had used multixacts beyond the first page (files cannot be
+ * copied, because the on-disk representation is different).  pg_upgrade would
+ * update pg_control to set the next offset value to be at that position, so
+ * that tuples marked as locked by such MultiXacts would be seen as visible
+ * without having to consult multixact.  However, trying to create and use a
+ * new MultiXactId would result in an error because the page on which the new
+ * value would reside does not exist.  This routine is in charge of creating
+ * such pages.
+ */
+static void
+MaybeExtendOffsetSlru(void)
+{
+    int pageno;
+
+    pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+
+    LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE);
+
+    if (!SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, pageno))
+    {
+        int slotno;
+
+        /*
+         * Fortunately for us, SimpleLruWritePage is already prepared to deal
+         * with creating a new segment file even if the page we're writing is
+         * not the first in it, so this is enough.
+         */
+        slotno = ZeroMultiXactOffsetPage(pageno, false);
+        SimpleLruWritePage(MultiXactOffsetCtl, slotno);
+    }
+
+    LWLockRelease(MultiXactOffsetControlLock);
+}
+
 /*
  * This must be called ONCE during postmaster or standalone-backend startup.
  *
@@ -1742,6 +1782,13 @@ StartupMultiXact(void)
     int entryno;
     int flagsoff;
 
+    /*
+     * During a binary upgrade, make sure that the offsets SLRU is large
+     * enough to contain the next value that would be created.
+     */
+    if (IsBinaryUpgrade)
+        MaybeExtendOffsetSlru();
+
     /* Clean up offsets state */
     LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE);
 
src/backend/access/transam/slru.c

+44
@@ -563,6 +563,50 @@ SimpleLruWritePage(SlruCtl ctl, int slotno)
     SlruInternalWritePage(ctl, slotno, NULL);
 }
 
+/*
+ * Return whether the given page exists on disk.
+ *
+ * A false return means that either the file does not exist, or that it's not
+ * large enough to contain the given page.
+ */
+bool
+SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno)
+{
+    int segno = pageno / SLRU_PAGES_PER_SEGMENT;
+    int rpageno = pageno % SLRU_PAGES_PER_SEGMENT;
+    int offset = rpageno * BLCKSZ;
+    char path[MAXPGPATH];
+    int fd;
+    bool result;
+    off_t endpos;
+
+    SlruFileName(ctl, path, segno);
+
+    fd = OpenTransientFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
+    if (fd < 0)
+    {
+        /* expected: file doesn't exist */
+        if (errno == ENOENT)
+            return false;
+
+        /* report error normally */
+        slru_errcause = SLRU_OPEN_FAILED;
+        slru_errno = errno;
+        SlruReportIOError(ctl, pageno, 0);
+    }
+
+    if ((endpos = lseek(fd, 0, SEEK_END)) < 0)
+    {
+        slru_errcause = SLRU_OPEN_FAILED;
+        slru_errno = errno;
+        SlruReportIOError(ctl, pageno, 0);
+    }
+
+    result = endpos >= (off_t) (offset + BLCKSZ);
+
+    CloseTransientFile(fd);
+    return result;
+}
 
 /*
  * Physical read of a (previously existing) page into a buffer slot
src/include/access/slru.h

+1
@@ -145,6 +145,7 @@ extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
 extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
 extern void SimpleLruFlush(SlruCtl ctl, bool checkpoint);
 extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
+extern bool SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno);
 
 typedef bool (*SlruScanCallback) (SlruCtl ctl, char *filename, int segpage,
                                   void *data);
