Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit b6d9060

Browse files
committed
Fix multiple portability issues in pg_upgrade's rewriteVisibilityMap().
This is new code in 9.6, and evidently we missed out testing it as thoroughly as it should have been. Bugs fixed here: 1. Use binary not text mode to open the files on Windows. Before, if the visibility map chanced to contain two bytes that looked like \r\n, Windows' read() would convert that to \n, which both corrupts the map data and causes the file to look shorter than it should. Unless you were *very* unlucky and had an exact multiple of 8K such occurrences in each VM file, this would cause pg_upgrade to report a failure, though with a rather obscure error message. 2. The code for copying rebuilt bytes into the output was simply wrong. It chanced to work okay on little-endian machines but would emit the bytes in the wrong order on big-endian, leading to silent corruption of the visibility map data. 3. The code was careless about alignment of the working buffers. Given all three of an alignment-picky architecture, a compiler that chooses to put the new_vmbuf[] local variable at an odd starting address, and a checksum-enabled database, pg_upgrade would dump core. Point one was reported by Thomas Kellerer, the other two detected by code-reading. Point two is much the nastiest of these issues from an impact standpoint, though fortunately it affects only a minority of users. The Windows issue will definitely bite people, but it seems quite unlikely that there would be undetected corruption from that. In addition, I failed to resist the temptation to do some minor cosmetic adjustments, mostly improving the comments. It would be a good idea to try to improve the error reporting here, but that seems like material for a separate patch. Discussion: <nsjrbh$8li$1@blaine.gmane.org>
1 parent 41d58e9 commit b6d9060

File tree

1 file changed

+49
-42
lines changed

1 file changed

+49
-42
lines changed

src/bin/pg_upgrade/file.c

+49-42
Original file line numberDiff line numberDiff line change
@@ -18,8 +18,6 @@
1818
#include <sys/stat.h>
1919
#include <fcntl.h>
2020

21-
#define BITS_PER_HEAPBLOCK_OLD 1
22-
2321

2422
#ifndef WIN32
2523
static int copy_file(const char *fromfile, const char *tofile);
@@ -84,10 +82,11 @@ copy_file(const char *srcfile, const char *dstfile)
8482
return -1;
8583
}
8684

87-
if ((src_fd = open(srcfile, O_RDONLY, 0)) < 0)
85+
if ((src_fd = open(srcfile, O_RDONLY | PG_BINARY, 0)) < 0)
8886
return -1;
8987

90-
if ((dest_fd = open(dstfile, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR)) < 0)
88+
if ((dest_fd = open(dstfile, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
89+
S_IRUSR | S_IWUSR)) < 0)
9190
{
9291
save_errno = errno;
9392

@@ -153,31 +152,30 @@ copy_file(const char *srcfile, const char *dstfile)
153152
* version, we could refuse to copy visibility maps from the old cluster
154153
* to the new cluster; the next VACUUM would recreate them, but at the
155154
* price of scanning the entire table. So, instead, we rewrite the old
156-
* visibility maps in the new format. That way, the all-visible bit
157-
* remains set for the pages for which it was set previously. The
158-
* all-frozen bit is never set by this conversion; we leave that to
159-
* VACUUM.
155+
* visibility maps in the new format. That way, the all-visible bits
156+
* remain set for the pages for which they were set previously. The
157+
* all-frozen bits are never set by this conversion; we leave that to VACUUM.
160158
*/
161159
const char *
162160
rewriteVisibilityMap(const char *fromfile, const char *tofile)
163161
{
164-
int src_fd = 0;
165-
int dst_fd = 0;
166-
char buffer[BLCKSZ];
167-
ssize_t bytesRead;
162+
int src_fd;
163+
int dst_fd;
164+
char *buffer;
165+
char *new_vmbuf;
168166
ssize_t totalBytesRead = 0;
169167
ssize_t src_filesize;
170168
int rewriteVmBytesPerPage;
171169
BlockNumber new_blkno = 0;
172170
struct stat statbuf;
173171

174-
/* Compute we need how many old page bytes to rewrite a new page */
172+
/* Compute number of old-format bytes per new page */
175173
rewriteVmBytesPerPage = (BLCKSZ - SizeOfPageHeaderData) / 2;
176174

177175
if ((fromfile == NULL) || (tofile == NULL))
178176
return "Invalid old file or new file";
179177

180-
if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0)
178+
if ((src_fd = open(fromfile, O_RDONLY | PG_BINARY, 0)) < 0)
181179
return getErrorText();
182180

183181
if (fstat(src_fd, &statbuf) != 0)
@@ -186,7 +184,8 @@ rewriteVisibilityMap(const char *fromfile, const char *tofile)
186184
return getErrorText();
187185
}
188186

189-
if ((dst_fd = open(tofile, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR)) < 0)
187+
if ((dst_fd = open(tofile, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
188+
S_IRUSR | S_IWUSR)) < 0)
190189
{
191190
close(src_fd);
192191
return getErrorText();
@@ -195,14 +194,22 @@ rewriteVisibilityMap(const char *fromfile, const char *tofile)
195194
/* Save old file size */
196195
src_filesize = statbuf.st_size;
197196

197+
/*
198+
* Malloc the work buffers, rather than making them local arrays, to
199+
* ensure adequate alignment.
200+
*/
201+
buffer = (char *) pg_malloc(BLCKSZ);
202+
new_vmbuf = (char *) pg_malloc(BLCKSZ);
203+
198204
/*
199205
* Turn each visibility map page into 2 pages one by one. Each new page
200-
* has the same page header as the old one. If the last section of last
201-
* page is empty, we skip it, mostly to avoid turning one-page visibility
202-
* maps for small relations into two pages needlessly.
206+
* has the same page header as the old one. If the last section of the
207+
* last page is empty, we skip it, mostly to avoid turning one-page
208+
* visibility maps for small relations into two pages needlessly.
203209
*/
204210
while (totalBytesRead < src_filesize)
205211
{
212+
ssize_t bytesRead;
206213
char *old_cur;
207214
char *old_break;
208215
char *old_blkend;
@@ -225,61 +232,59 @@ rewriteVisibilityMap(const char *fromfile, const char *tofile)
225232
/*
226233
* These old_* variables point to old visibility map page. old_cur
227234
* points to current position on old page. old_blkend points to end of
228-
* old block. old_break points to old page break position for
229-
* rewriting a new page. After wrote a new page, old_break proceeds
230-
* rewriteVmBytesPerPage bytes.
235+
* old block. old_break is the end+1 position on the old page for the
236+
* data that will be transferred to the current new page.
231237
*/
232238
old_cur = buffer + SizeOfPageHeaderData;
233239
old_blkend = buffer + bytesRead;
234240
old_break = old_cur + rewriteVmBytesPerPage;
235241

236-
while (old_blkend >= old_break)
242+
while (old_break <= old_blkend)
237243
{
238-
char new_vmbuf[BLCKSZ];
239-
char *new_cur = new_vmbuf;
244+
char *new_cur;
240245
bool empty = true;
241246
bool old_lastpart;
242247

243-
/* Copy page header in advance */
248+
/* First, copy old page header to new page */
244249
memcpy(new_vmbuf, &pageheader, SizeOfPageHeaderData);
245250

246-
/* Rewrite the last part of the old page? */
247-
old_lastpart = old_lastblk && (old_blkend == old_break);
251+
/* Rewriting the last part of the last old page? */
252+
old_lastpart = old_lastblk && (old_break == old_blkend);
248253

249-
new_cur += SizeOfPageHeaderData;
254+
new_cur = new_vmbuf + SizeOfPageHeaderData;
250255

251256
/* Process old page bytes one by one, and turn it into new page. */
252-
while (old_break > old_cur)
257+
while (old_cur < old_break)
253258
{
259+
uint8 byte = *(uint8 *) old_cur;
254260
uint16 new_vmbits = 0;
255261
int i;
256262

257263
/* Generate new format bits while keeping old information */
258264
for (i = 0; i < BITS_PER_BYTE; i++)
259265
{
260-
uint8 byte = *(uint8 *) old_cur;
261-
262-
if (byte & (1 << (BITS_PER_HEAPBLOCK_OLD * i)))
266+
if (byte & (1 << i))
263267
{
264268
empty = false;
265-
new_vmbits |= 1 << (BITS_PER_HEAPBLOCK * i);
269+
new_vmbits |=
270+
VISIBILITYMAP_ALL_VISIBLE << (BITS_PER_HEAPBLOCK * i);
266271
}
267272
}
268273

269-
/* Copy new visibility map bit to new format page */
270-
memcpy(new_cur, &new_vmbits, BITS_PER_HEAPBLOCK);
274+
/* Copy new visibility map bytes to new-format page */
275+
new_cur[0] = (char) (new_vmbits & 0xFF);
276+
new_cur[1] = (char) (new_vmbits >> 8);
271277

272-
old_cur += BITS_PER_HEAPBLOCK_OLD;
278+
old_cur++;
273279
new_cur += BITS_PER_HEAPBLOCK;
274280
}
275281

276-
/* If the last part of the old page is empty, skip writing it */
282+
/* If the last part of the last page is empty, skip writing it */
277283
if (old_lastpart && empty)
278284
break;
279285

280-
/* Set new checksum for a visibility map page (if enabled) */
281-
if (old_cluster.controldata.data_checksum_version != 0 &&
282-
new_cluster.controldata.data_checksum_version != 0)
286+
/* Set new checksum for visibility map page, if enabled */
287+
if (new_cluster.controldata.data_checksum_version != 0)
283288
((PageHeader) new_vmbuf)->pd_checksum =
284289
pg_checksum_page(new_vmbuf, new_blkno);
285290

@@ -290,17 +295,19 @@ rewriteVisibilityMap(const char *fromfile, const char *tofile)
290295
return getErrorText();
291296
}
292297

298+
/* Advance for next new page */
293299
old_break += rewriteVmBytesPerPage;
294300
new_blkno++;
295301
}
296302
}
297303

298-
/* Close files */
304+
/* Clean up */
305+
pg_free(buffer);
306+
pg_free(new_vmbuf);
299307
close(dst_fd);
300308
close(src_fd);
301309

302310
return NULL;
303-
304311
}
305312

306313
void

0 commit comments

Comments
 (0)