Use memcpy instead of a byte loop in pglz_decompress

tvondra · tvondra · commit c60e520f6e0e · 2019-11-28T23:29:30.000+01:00
The byte loop used in pglz_decompress() because of possible overlap may
be quite inefficient, so this commit replaces it with memcpy. The gains
do depend on the data (compressibility) and hardware, but seem to be
quite significant.

Author: Andrey Borodin
Reviewed-by: Michael Paquier, Konstantin Knizhnik, Tels
Discussion: https://postgr.es/m/469C9ED9-348C-4FE7-A7A7-B0FA671BEE4C@yandex-team.ru
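
The overlap handling can be checked in isolation. The following is a minimal standalone sketch, not code from the patch: the helper names copy_match_bytewise() and copy_match_chunked() are hypothetical, and the buffers are assumed to be large enough for the expanded match. It compares the removed byte loop with the offset-doubling memcpy() approach on the example used in the new code comment (prefix "11234", match with len=16, off=4).

```c
#include <assert.h>
#include <string.h>

/*
 * Reference behaviour: the byte-at-a-time loop that the commit removes.
 * Copies len bytes from dp - off to dp and returns the advanced pointer.
 */
static char *
copy_match_bytewise(char *dp, int off, int len)
{
    assert(off > 0 && len >= 0);
    while (len--)
    {
        *dp = dp[-off];
        dp++;
    }
    return dp;
}

/*
 * Chunked copy in the style of the patch (hypothetical helper name).
 * While the regions would overlap (off < len), copy one period of off
 * bytes, which never overlaps its source, then double the offset so the
 * next copy can reuse everything written so far.
 */
static char *
copy_match_chunked(char *dp, int off, int len)
{
    assert(off > 0 && len >= 0);
    while (off < len)
    {
        memcpy(dp, dp - off, off);
        len -= off;
        dp += off;
        off += off;
    }
    /* Here len <= off, so source and destination cannot overlap. */
    memcpy(dp, dp - off, len);
    return dp + len;
}
```

Called as copy_match_chunked(buf + 5, 4, 16) on a buffer that starts with "11234", the loop performs memcpy() calls of 4 and 8 bytes, and the final call copies the remaining 4 bytes. Under these assumptions both helpers should leave identical buffer contents.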
diff --git a/src/common/pg_lzcompress.c b/src/common/pg_lzcompress.c
@@ -714,11 +714,13 @@ pglz_decompress(const char *source, int32 slen, char *dest,
 			if (ctrl & 1)
 			{
 				/*
-				 * Otherwise it contains the match length minus 3 and the
-				 * upper 4 bits of the offset. The next following byte
-				 * contains the lower 8 bits of the offset. If the length is
-				 * coded as 18, another extension tag byte tells how much
-				 * longer the match really was (0-255).
+				 * Set control bit means we must read a match tag. The match
+				 * is coded with two bytes. First byte uses lower nibble to
+				 * code length - 3. Higher nibble contains upper 4 bits of the
+				 * offset. The next following byte contains the lower 8 bits
+				 * of the offset. If the length is coded as 18, another
+				 * extension tag byte tells how much longer the match really
+				 * was (0-255).
 				 */
 				int32		len;
 				int32		off;
@@ -731,16 +733,44 @@ pglz_decompress(const char *source, int32 slen, char *dest,
 
 				/*
 				 * Now we copy the bytes specified by the tag from OUTPUT to
-				 * OUTPUT. It is dangerous and platform dependent to use
-				 * memcpy() here, because the copied areas could overlap
-				 * extremely!
+				 * OUTPUT (copy len bytes from dp - off to dp). The copied
+				 * areas could overlap; to prevent possible uncertainty, we
+				 * copy only non-overlapping regions.
 				 */
 				len = Min(len, destend - dp);
-				while (len--)
+				while (off < len)
 				{
-					*dp = dp[-off];
-					dp++;
+					/*---------
+					 * When the offset is smaller than the length, the source
+					 * and destination regions overlap. memmove() resolves
+					 * this overlap in a way incompatible with pglz, so we
+					 * resort to memcpy()-ing non-overlapping regions.
+					 *
+					 * Consider input: 112341234123412341234
+					 * At byte 5       here ^ we have match with length 16 and
+					 * offset 4.       11234M(len=16, off=4)
+					 * We are decoding first period of match and rewrite match
+					 *                 112341234M(len=12, off=8)
+					 *
+					 * The same match is now at position 9, it points to the
+					 * same start byte of output, but from another position:
+					 * the offset is doubled.
+					 *
+					 * We iterate through this offset growth until we can
+					 * proceed to a plain memcpy(). If we tried to decode
+					 * the match at byte 5 (len=16, off=4) with memmove(), we
+					 * would issue memmove(5, 1, 16), which would produce
+					 * 112341234XXXXXXXXXXXX, where the series of X is 12
+					 * undefined bytes that were at bytes [5:17].
+					 *---------
+					 */
+					memcpy(dp, dp - off, off);
+					len -= off;
+					dp += off;
+					off += off;
 				}
+				memcpy(dp, dp - off, len);
+				dp += len;
 			}
 			else
 			{