Wake up for latches in CheckpointWriteDelay().
author Thomas Munro <tmunro@postgresql.org>
Wed, 16 Mar 2022 00:37:58 +0000 (13:37 +1300)
committer Thomas Munro <tmunro@postgresql.org>
Wed, 16 Mar 2022 00:57:07 +0000 (13:57 +1300)
The checkpointer shouldn't ignore its latch.  Other backends may be
waiting for it to drain the request queue.  Hopefully real systems don't
have a full queue often, but the condition is reached easily when
shared_buffers is small.

This involves defining a new wait event, which will appear in the
pg_stat_activity view often due to spread checkpoints.

Back-patch only to 14.  Even though the problem exists in earlier
branches too, it's hard to hit there.  In 14 we stopped using signal
handlers for latches on Linux, *BSD and macOS, which were previously
hiding this problem by interrupting the sleep (though not reliably, as
the signal could arrive before the sleep begins; precisely the problem
latches address).
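
The race described above, and why a latch avoids it, can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyLatch` and the `toy_latch_*` functions are hypothetical names built on a plain POSIX pipe and `poll()`, and they are not PostgreSQL's `Latch`, `SetLatch()`, `WaitLatch()` or `ResetLatch()`. The point it tries to show is that a wakeup requested before the wait begins stays pending, whereas a fixed sleep would run for its full duration regardless.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for a latch; not PostgreSQL's Latch struct. */
typedef struct ToyLatch
{
    int         read_fd;    /* the waiter polls this end */
    int         write_fd;   /* the setter writes one byte here */
} ToyLatch;

/* Create the pipe behind the latch.  Returns false on failure. */
static bool
toy_latch_init(ToyLatch *latch)
{
    int         fds[2];

    assert(latch != NULL);
    if (pipe(fds) != 0)
        return false;

    /* Non-blocking ends, so setting and resetting never stall. */
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0 ||
        fcntl(fds[1], F_SETFL, O_NONBLOCK) != 0)
    {
        close(fds[0]);
        close(fds[1]);
        return false;
    }

    latch->read_fd = fds[0];
    latch->write_fd = fds[1];
    return true;
}

/* Request a wakeup.  The byte stays in the pipe until it is drained. */
static void
toy_latch_set(ToyLatch *latch)
{
    char        byte = 0;
    ssize_t     rc = write(latch->write_fd, &byte, 1);

    /* A full pipe means a wakeup is already pending, so errors are ignored. */
    (void) rc;
}

/* Discard all pending wakeups. */
static void
toy_latch_reset(ToyLatch *latch)
{
    char        buf[64];

    /* read() stops returning data once the non-blocking pipe is empty. */
    while (read(latch->read_fd, buf, sizeof(buf)) > 0)
        ;
}

/*
 * Wait up to timeout_ms milliseconds.  Returns true if a wakeup is pending,
 * false on timeout (or if poll() fails, for example when interrupted).
 */
static bool
toy_latch_wait(ToyLatch *latch, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = latch->read_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    return poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN) != 0;
}
```

If `toy_latch_set()` runs before `toy_latch_wait()`, the wait returns at once instead of sleeping for the whole timeout. After `toy_latch_reset()`, the next wait lasts the full timeout unless the latch is set again, which is the behaviour the commit wants from the checkpointer's write delay.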

Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de

doc/src/sgml/monitoring.sgml
src/backend/postmaster/checkpointer.c
src/backend/utils/activity/wait_event.c
src/include/utils/wait_event.h

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 15e51f926803a76c9e6811d6ffbe69ff22f639c5..b8ffc210a4a4b11a786187873a54c8dad3a903e2 100644
@@ -2223,6 +2223,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry><literal>BaseBackupThrottle</literal></entry>
       <entry>Waiting during base backup when throttling activity.</entry>
      </row>
+     <row>
+      <entry><literal>CheckpointWriteDelay</literal></entry>
+      <entry>Waiting between writes while performing a checkpoint.</entry>
+     </row>
      <row>
       <entry><literal>PgSleep</literal></entry>
       <entry>Waiting due to a call to <function>pg_sleep</function> or
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 75a95f3de7a8fe1cd0f5bb9cb0986d2112dbbb5a..86996750dcd2dc9441883fa4c34f0e3645a64d20 100644
@@ -490,6 +490,9 @@ CheckpointerMain(void)
            }
 
            ckpt_active = false;
+
+           /* We may have received an interrupt during the checkpoint. */
+           HandleCheckpointerInterrupts();
        }
 
        /* Check for archive_timeout and switch xlog files if necessary. */
@@ -732,7 +735,10 @@ CheckpointWriteDelay(int flags, double progress)
         * Checkpointer and bgwriter are no longer related so take the Big
         * Sleep.
         */
-       pg_usleep(100000L);
+       WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH | WL_TIMEOUT,
+                 100,
+                 WAIT_EVENT_CHECKPOINT_WRITE_DELAY);
+       ResetLatch(MyLatch);
    }
    else if (--absorb_counter <= 0)
    {
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 6baf67740c7dd4823554abc27697ee00cd210340..affbcf25db60d282c51a9b638695239e10c6a24f 100644
@@ -473,6 +473,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
        case WAIT_EVENT_BASE_BACKUP_THROTTLE:
            event_name = "BaseBackupThrottle";
            break;
+       case WAIT_EVENT_CHECKPOINT_WRITE_DELAY:
+           event_name = "CheckpointWriteDelay";
+           break;
        case WAIT_EVENT_PG_SLEEP:
            event_name = "PgSleep";
            break;
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 6c6ec2e7118fa35c846bfd580a837e00b31f1ab7..1fb6f640138609fad88ba8a14d5b136b346ad4c9 100644
@@ -140,7 +140,8 @@ typedef enum
    WAIT_EVENT_PG_SLEEP,
    WAIT_EVENT_RECOVERY_APPLY_DELAY,
    WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL,
-   WAIT_EVENT_VACUUM_DELAY
+   WAIT_EVENT_VACUUM_DELAY,
+   WAIT_EVENT_CHECKPOINT_WRITE_DELAY
 } WaitEventTimeout;
 
 /* ----------