Reduce memory block size for decoded tuple storage to 8kB.

MasahikoSawada · MasahikoSawada · commit eef9cc4dc28f · 2024-10-16T12:08:02.000-07:00
Commit a4ccc1c introduced the Generation Context and modified the logical decoding process to use a Generation Context with a fixed block size of 8MB for storing tuple data decoded during logical decoding (i.e., rb->tup_context). Several reports have indicated that the logical decoding process can be terminated due to out-of-memory (OOM) situations caused by excessive memory usage in rb->tup_context. This issue can occur when decoding a workload involving several concurrent transactions, including a long-running transaction that modifies tuples. By design, the Generation Context does not free a memory block until all chunks within that block are released. Consequently, if tuples modified by the long-running transaction are stored across multiple memory blocks, these blocks remain allocated until the long-running transaction completes, leading to substantial memory fragmentation. The memory usage during logical decoding, tracked by rb->size, does not account for memory fragmentation, resulting in potentially much higher memory consumption than the value of the logical_decoding_work_mem parameter. Various improvement strategies were discussed in the relevant thread. This change reduces the block size of the Generation Context used in rb->tup_context from 8MB to 8kB. This modification significantly decreases the likelihood of substantial memory fragmentation occurring and is relatively straightforward to backport. Performance testing across multiple platforms has confirmed that this change will not introduce any performance degradation that would impact actual operation. Backport to all supported branches. Reported-by: Alex Richman, Michael Guissine, Avi Weinberg Reviewed-by: Amit Kapila, Fujii Masao, David Rowley Tested-by: Hayato Kuroda, Shlok Kyal Discussion: https://postgr.es/m/CAD21AoBTY1LATZUmvSXEssvq07qDZufV4AF-OHh9VD2pC0VY2A%40mail.gmail.com Backpatch-through: 12
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
@@ -337,15 +337,20 @@ ReorderBufferAllocate(void)
 											sizeof(ReorderBufferTXN));
 
 	/*
-	 * XXX the allocation sizes used below pre-date generation context's block
-	 * growing code.  These values should likely be benchmarked and set to
-	 * more suitable values.
+	 * To minimize memory fragmentation caused by long-running transactions
+	 * with changes spanning multiple memory blocks, we use a single
+	 * fixed-size memory block for decoded tuple storage. The performance
+	 * testing showed that the default memory block size maintains logical
+	 * decoding performance without causing fragmentation due to concurrent
+	 * transactions. One might think that we can use the max size as
+	 * SLAB_LARGE_BLOCK_SIZE but the test also showed it doesn't help resolve
+	 * the memory fragmentation.
 	 */
 	buffer->tup_context = GenerationContextCreate(new_ctx,
 												  "Tuples",
-												  SLAB_LARGE_BLOCK_SIZE,
-												  SLAB_LARGE_BLOCK_SIZE,
-												  SLAB_LARGE_BLOCK_SIZE);
+												  SLAB_DEFAULT_BLOCK_SIZE,
+												  SLAB_DEFAULT_BLOCK_SIZE,
+												  SLAB_DEFAULT_BLOCK_SIZE);
 
 	hash_ctl.keysize = sizeof(TransactionId);
 	hash_ctl.entrysize = sizeof(ReorderBufferTXNByIdEnt);