Fix overflow check and comment in GIN posting list encoding.

hlinnaka · hlinnaka · commit 6b7819a0bd01 · 2019-08-28T12:56:03.000+03:00
The comment did not match what the code actually did for integers with the 43rd bit set. You get an integer like that, if you have a posting list with two adjacent TIDs that are more than 2^31 blocks apart. According to the comment, we would store that in 6 bytes, with no continuation bit on the 6th byte, but in reality, the code encodes it using 7 bytes, with a continuation bit on the 6th byte as normal. The decoding routine also handled these 7-byte integers correctly, except for an overflow check that assumed that one integer needs at most 6 bytes. Fix the overflow check, and fix the comment to match what the code actually does. Also fix the comment that claimed that there are 17 unused bits in the 64-bit representation of an item pointer. In reality, there are 64-32-11=21. Fitting any item pointer into max 6 bytes was an important property when this was written, because in the old pre-9.4 format, item pointers were stored as plain arrays, with 6 bytes for every item pointer. The maximum of 6 bytes per integer in the new format guaranteed that we could convert any page from the old format to the new format after upgrade, so that the new format was never larger than the old format. But we hardly need to worry about that anymore, and running into that problem during upgrade, where an item pointer is expanded from 6 to 7 bytes such that the data doesn't fit on a page anymore, is implausible in practice anyway. Backpatch to all supported versions. This also includes a little test module to test these large distances between item pointers, without requiring a 16 TB table. It is not backpatched, I'm including it more for the benefit of future development of new posting list formats. Discussion: https://www.postgresql.org/message-id/33bfc20a-5c86-f50c-f5a5-58e9925d05ff%40iki.fi Reviewed-by: Masahiko Sawada, Alexander Korotkov
diff --git a/src/backend/access/gin/ginpostinglist.c b/src/backend/access/gin/ginpostinglist.c
@@ -23,25 +23,32 @@
 /*
  * For encoding purposes, item pointers are represented as 64-bit unsigned
  * integers. The lowest 11 bits represent the offset number, and the next
- * lowest 32 bits are the block number. That leaves 17 bits unused, i.e.
+ * lowest 32 bits are the block number. That leaves 21 bits unused, i.e.
  * only 43 low bits are used.
  *
+ * 11 bits is enough for the offset number, because MaxHeapTuplesPerPage <
+ * 2^11 on all supported block sizes. We are frugal with the bits, because
+ * smaller integers use fewer bytes in the varbyte encoding, saving disk
+ * space. (If we get a new table AM in the future that wants to use the full
+ * range of possible offset numbers, we'll need to change this.)
+ *
  * These 43-bit integers are encoded using varbyte encoding. In each byte,
  * the 7 low bits contain data, while the highest bit is a continuation bit.
  * When the continuation bit is set, the next byte is part of the same
- * integer, otherwise this is the last byte of this integer.  43 bits fit
- * conveniently in at most 6 bytes when varbyte encoded (the 6th byte does
- * not need a continuation bit, because we know the max size to be 43 bits):
+ * integer, otherwise this is the last byte of this integer. 43 bits need
+ * at most 7 bytes in this encoding:
  *
  * 0XXXXXXX
  * 1XXXXXXX 0XXXXYYY
  * 1XXXXXXX 1XXXXYYY 0YYYYYYY
  * 1XXXXXXX 1XXXXYYY 1YYYYYYY 0YYYYYYY
  * 1XXXXXXX 1XXXXYYY 1YYYYYYY 1YYYYYYY 0YYYYYYY
- * 1XXXXXXX 1XXXXYYY 1YYYYYYY 1YYYYYYY 1YYYYYYY YYYYYYYY
+ * 1XXXXXXX 1XXXXYYY 1YYYYYYY 1YYYYYYY 1YYYYYYY 0YYYYYYY
+ * 1XXXXXXX 1XXXXYYY 1YYYYYYY 1YYYYYYY 1YYYYYYY 1YYYYYYY 0uuuuuuY
  *
  * X = bits used for offset number
  * Y = bits used for block number
+ * u = unused bit
  *
  * The bytes are in stored in little-endian order.
  *
@@ -73,6 +80,9 @@
  */
 #define MaxHeapTuplesPerPageBits		11
 
+/* Max. number of bytes needed to encode the largest supported integer. */
+#define MaxBytesPerInteger				7
+
 static inline uint64
 itemptr_to_uint64(const ItemPointer iptr)
 {
@@ -126,33 +136,40 @@ decode_varbyte(unsigned char **ptr)
 	unsigned char *p = *ptr;
 	uint64		c;
 
+	/* 1st byte */
 	c = *(p++);
 	val = c & 0x7F;
 	if (c & 0x80)
 	{
+		/* 2nd byte */
 		c = *(p++);
 		val |= (c & 0x7F) << 7;
 		if (c & 0x80)
 		{
+			/* 3rd byte */
 			c = *(p++);
 			val |= (c & 0x7F) << 14;
 			if (c & 0x80)
 			{
+				/* 4th byte */
 				c = *(p++);
 				val |= (c & 0x7F) << 21;
 				if (c & 0x80)
 				{
+					/* 5th byte */
 					c = *(p++);
 					val |= (c & 0x7F) << 28;
 					if (c & 0x80)
 					{
+						/* 6th byte */
 						c = *(p++);
 						val |= (c & 0x7F) << 35;
 						if (c & 0x80)
 						{
-							/* last byte, no continuation bit */
+							/* 7th byte, should not have continuation bit */
 							c = *(p++);
 							val |= c << 42;
+							Assert((c & 0x80) == 0);
 						}
 					}
 				}
@@ -208,15 +225,15 @@ ginCompressPostingList(const ItemPointer ipd, int nipd, int maxsize,
 
 		Assert(val > prev);
 
-		if (endptr - ptr >= 6)
+		if (endptr - ptr >= MaxBytesPerInteger)
 			encode_varbyte(delta, &ptr);
 		else
 		{
 			/*
-			 * There are less than 6 bytes left. Have to check if the next
+			 * There are less than 7 bytes left. Have to check if the next
 			 * item fits in that space before writing it out.
 			 */
-			unsigned char buf[6];
+			unsigned char buf[MaxBytesPerInteger];
 			unsigned char *p = buf;
 
 			encode_varbyte(delta, &p);