Postmaster fails to shut down right after crash restart
From | Sergey Shinderuk |
---|---|
Subject | Postmaster fails to shut down right after crash restart |
Date | |
Msg-id | 63dcad16-22de-4326-a395-5310bc7e05ff@postgrespro.ru |
List | pgsql-hackers |
Hello,

While developing a patch and running regression tests I noticed that the postmaster could fail to shut down right after crash restart. It could get stuck in the PM_WAIT_BACKENDS state forever.

As far as I understand, the problem occurs when a shutdown signal is received before getting PMSIGNAL_RECOVERY_STARTED from the startup process. In that case the FatalError flag is not cleared, and the postmaster is stuck in PM_WAIT_BACKENDS waiting for the checkpointer, which ignores SIGTERM.

To easily reproduce the problem I added pg_usleep in xlogrecovery.c just before SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED). See the patch attached.

Then I run a script that simulates a crash and does pg_ctl stop:

    $ ./init.sh
    [...]
    $ ./stop-after-crash.sh
    waiting for server to start.... done
    server started
    waiting for server to shut down............................................................... failed
    pg_ctl: server does not shut down

Some processes are still alive:

    $ ps uf -C postgres
    USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    sergey    279874  0.0  0.0 222816 28560 ?        Ss   14:25   0:00 /home/sergey/pgwork/devel/install/bin/postgres -D data
    sergey    279887  0.0  0.0 222772  5664 ?        Ss   14:25   0:00  \_ postgres: io worker 0
    sergey    279888  0.0  0.0 222772  5664 ?        Ss   14:25   0:00  \_ postgres: io worker 1
    sergey    279889  0.0  0.0 222772  5664 ?        Ss   14:25   0:00  \_ postgres: io worker 2
    sergey    279891  0.0  0.0 222884  8480 ?        Ss   14:25   0:00  \_ postgres: checkpointer

Here is an excerpt from the debug log:

    postmaster[279874] LOG: all server processes terminated; reinitializing
    startup[279890] LOG: database system was interrupted; last known up at 2025-04-24 14:25:58 MSK
    startup[279890] LOG: database system was not properly shut down; automatic recovery in progress
    postmaster[279874] DEBUG: postmaster received shutdown request signal
    postmaster[279874] LOG: received fast shutdown request
    postmaster[279874] DEBUG: updating PMState from PM_STARTUP to PM_STOP_BACKENDS
    postmaster[279874] DEBUG: sending signal 15/SIGTERM to background writer process with pid 279892
    postmaster[279874] DEBUG: sending signal 15/SIGTERM to checkpointer process with pid 279891
    postmaster[279874] DEBUG: sending signal 15/SIGTERM to startup process with pid 279890
    postmaster[279874] DEBUG: sending signal 15/SIGTERM to io worker process with pid 279889
    postmaster[279874] DEBUG: sending signal 15/SIGTERM to io worker process with pid 279888
    postmaster[279874] DEBUG: sending signal 15/SIGTERM to io worker process with pid 279887
    postmaster[279874] DEBUG: updating PMState from PM_STOP_BACKENDS to PM_WAIT_BACKENDS
    startup[279890] LOG: invalid record length at 0/175A4D8: expected at least 24, got 0
    postmaster[279874] DEBUG: postmaster received pmsignal signal
    startup[279890] LOG: redo is not required
    checkpointer[279891] LOG: checkpoint starting: end-of-recovery immediate wait
    checkpointer[279891] LOG: checkpoint complete: wrote 0 buffers (0.0%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.007 s, sync=0.002 s, total=0.026 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/175A4D8, redo lsn=0/175A4D8
    startup[279890] DEBUG: exit(0)
    postmaster[279874] DEBUG: updating PMState from PM_WAIT_BACKENDS to PM_WAIT_BACKENDS
    checkpointer[279891] DEBUG: checkpoint skipped because system is idle
    checkpointer[279891] DEBUG: checkpoint skipped because system is idle

I don't know how to fix this, but thought it's worth reporting.

Best regards,

--
Sergey Shinderuk
https://postgrespro.com/
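The ordering problem described in the report can be illustrated with a small toy model in C. This is only a sketch under stated assumptions, not PostgreSQL source code: every `Toy*` and `toy_*` name is hypothetical, and the two rules it encodes (the flag is cleared only if the recovery-started signal arrives before any shutdown request, and a set flag makes the postmaster wait for the checkpointer to exit) are taken from the report's own description of the behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not PostgreSQL identifiers. */
typedef enum {
    TOY_STARTUP,            /* crash restart, startup process launched */
    TOY_RECOVERY,           /* recovery-started signal was processed */
    TOY_WAIT_BACKENDS,      /* shutdown requested, waiting for children */
    TOY_PAST_WAIT_BACKENDS  /* shutdown sequence moved on */
} ToyState;

typedef enum { TOY_EV_RECOVERY_STARTED, TOY_EV_SHUTDOWN_REQUEST } ToyEvent;

typedef struct {
    ToyState state;
    bool fatal_error;         /* models the FatalError flag */
    bool shutdown_requested;
    bool checkpointer_alive;
} ToyPostmaster;

static void toy_handle_event(ToyPostmaster *pm, ToyEvent ev)
{
    switch (ev) {
    case TOY_EV_RECOVERY_STARTED:
        /* Per the report: the flag is cleared only if no shutdown came first. */
        if (pm->state == TOY_STARTUP && !pm->shutdown_requested) {
            pm->fatal_error = false;
            pm->state = TOY_RECOVERY;
        }
        break;
    case TOY_EV_SHUTDOWN_REQUEST:
        pm->shutdown_requested = true;
        /* SIGTERM goes to the children, but the checkpointer ignores it,
         * so checkpointer_alive is deliberately left unchanged. */
        pm->state = TOY_WAIT_BACKENDS;
        break;
    }
}

static void toy_advance(ToyPostmaster *pm)
{
    if (pm->state != TOY_WAIT_BACKENDS)
        return;
    /* Model assumption: with the flag still set, the checkpointer must
     * exit before this state can be left. It never does, so we stay put. */
    if (pm->fatal_error && pm->checkpointer_alive)
        return;
    pm->state = TOY_PAST_WAIT_BACKENDS;
}

/* Returns true if the shutdown sequence gets past the wait-for-backends step. */
bool toy_shutdown_progresses(const ToyEvent *events, size_t n)
{
    /* Initial state right after a crash restart: flag set, checkpointer running. */
    ToyPostmaster pm = { TOY_STARTUP, true, false, true };

    for (size_t i = 0; i < n; i++) {
        toy_handle_event(&pm, events[i]);
        toy_advance(&pm);
    }
    /* Sanity check: the wait state is reachable only via a shutdown request. */
    assert(pm.shutdown_requested || pm.state != TOY_WAIT_BACKENDS);
    return pm.state == TOY_PAST_WAIT_BACKENDS;
}
```

Calling `toy_shutdown_progresses` with the events in the order recovery-started, then shutdown-request, returns true in this model, while the reversed order (the one visible in the debug log, where the pmsignal arrives after the fast shutdown request) returns false, which corresponds to the hang in PM_WAIT_BACKENDS.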