
Commit 94ae6ba

Send keepalives from walsender even when busy sending WAL.
If walsender doesn't hear from the client for the time specified by wal_sender_timeout, it will conclude the connection or client is dead, and disconnect. When half of wal_sender_timeout has elapsed, it sends a ping to the client, leaving it the remaining half of wal_sender_timeout to respond. However, it only checked if half of wal_sender_timeout had elapsed when it was about to sleep, so if it was busy sending WAL to the client for long enough, it would not send the ping request in time. Then the client would not know it needs to send a reply, and the walsender would disconnect even though the client is still alive. Fix that.

Andres Freund, reviewed by Robert Haas, and some further changes by me. Backpatch to 9.3. Earlier versions relied on the client to send the keepalives on its own, and hence didn't have this problem.
1 parent bf4052f commit 94ae6ba
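
The failure mode described in the commit message can be illustrated with a small stand-alone model. This is only a sketch, not PostgreSQL code: `simulate_ping_time` and all of its parameters are hypothetical names, time is modelled as plain milliseconds, and the last client reply is assumed to arrive at time 0. The sender counts as "busy" until `busy_until_ms` and only reaches the sleep path afterwards.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the walsender loop (not PostgreSQL code).
 * Time advances in fixed steps of step_ms (must be > 0); the sender is
 * "busy" until busy_until_ms and reaches the sleep path only after that.
 *
 * check_when_busy = false: ping check runs only on the sleep path (old).
 * check_when_busy = true:  ping check runs on every iteration (new).
 *
 * Returns the model time at which the ping is sent, or -1 if it never is.
 * The last client reply is assumed to have arrived at time 0.
 */
long
simulate_ping_time(long timeout_ms, long busy_until_ms, long step_ms,
                   long end_ms, bool check_when_busy)
{
    const long last_reply_ms = 0;

    for (long now = 0; now <= end_ms; now += step_ms)
    {
        bool busy = now < busy_until_ms;

        /* Old behaviour: the check is skipped entirely while busy. */
        if (busy && !check_when_busy)
            continue;

        /* Ping once half of the timeout has passed since the last reply. */
        if (timeout_ms > 0 && now >= last_reply_ms + timeout_ms / 2)
            return now;
    }
    return -1;
}
```

With a 60000 ms timeout and a sender that stays busy for 90000 ms, the model sends the ping at 30000 ms when the check runs on every iteration, but only at 90000 ms when the check is tied to the sleep path, which is already past the 60000 ms disconnect deadline.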

1 file changed, +29 -24 lines changed
src/backend/replication/walsender.c

@@ -1258,6 +1258,27 @@ WalSndLoop(void)
 			}
 		}
 
+		/*
+		 * If half of wal_sender_timeout has elapsed without receiving any
+		 * reply from standby, send a keep-alive message requesting an
+		 * immediate reply.
+		 */
+		if (wal_sender_timeout > 0 && !ping_sent)
+		{
+			TimestampTz timeout;
+
+			timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
+												  wal_sender_timeout / 2);
+			if (GetCurrentTimestamp() >= timeout)
+			{
+				WalSndKeepalive(true);
+				ping_sent = true;
+				/* Try to flush pending output to the client */
+				if (pq_flush_if_writable() != 0)
+					goto send_failure;
+			}
+		}
+
 		/*
 		 * We don't block if not caught up, unless there is unsent data
 		 * pending in which case we'd better block until the socket is
@@ -1267,7 +1288,7 @@ WalSndLoop(void)
 		 */
 		if ((caughtup && !streamingDoneSending) || pq_is_send_pending())
 		{
-			TimestampTz timeout = 0;
+			TimestampTz timeout;
 			long		sleeptime = 10000;		/* 10 s */
 			int			wakeEvents;
 
@@ -1276,32 +1297,14 @@ WalSndLoop(void)
 
 			if (pq_is_send_pending())
 				wakeEvents |= WL_SOCKET_WRITEABLE;
-			else if (wal_sender_timeout > 0 && !ping_sent)
-			{
-				/*
-				 * If half of wal_sender_timeout has lapsed without receiving
-				 * any reply from standby, send a keep-alive message to
-				 * standby requesting an immediate reply.
-				 */
-				timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
-													  wal_sender_timeout / 2);
-				if (GetCurrentTimestamp() >= timeout)
-				{
-					WalSndKeepalive(true);
-					ping_sent = true;
-					/* Try to flush pending output to the client */
-					if (pq_flush_if_writable() != 0)
-						goto send_failure;
-				}
-			}
 
-			/* Determine time until replication timeout */
+			/*
+			 * If wal_sender_timeout is active, sleep in smaller increments
+			 * to not go over the timeout too much. XXX: Why not just sleep
+			 * until the timeout has elapsed?
+			 */
 			if (wal_sender_timeout > 0)
-			{
-				timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
-													  wal_sender_timeout);
 				sleeptime = 1 + (wal_sender_timeout / 10);
-			}
 
 			/* Sleep until something happens or we time out */
 			ImmediateInterruptOK = true;
@@ -1315,6 +1318,8 @@ WalSndLoop(void)
 			 * possibility that the client replied just as we reached the
 			 * timeout ... he's supposed to reply *before* that.
 			 */
+			timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
+												  wal_sender_timeout);
 			if (wal_sender_timeout > 0 && GetCurrentTimestamp() >= timeout)
 			{
 				/*
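
The diff works with three time values derived from wal_sender_timeout: the ping deadline (half the timeout after the last reply), the disconnect deadline (the full timeout after the last reply), and the sleep increment of `1 + (wal_sender_timeout / 10)` milliseconds. The sketch below restates that arithmetic in isolation. The helper names and the `ts_us` type are hypothetical, and the sketch assumes timestamps are 64-bit integer microseconds; it does not use PostgreSQL's own `TimestampTz` or `TimestampTzPlusMilliseconds`.

```c
#include <stdint.h>

/* Hypothetical stand-in for a timestamp: integer microseconds (assumption). */
typedef int64_t ts_us;

/* Add a millisecond offset to a microsecond timestamp. */
ts_us
plus_ms(ts_us t, int ms)
{
    return t + (int64_t) ms * 1000;
}

/* Time at which a keepalive requesting a reply should be sent. */
ts_us
ping_deadline(ts_us last_reply, int timeout_ms)
{
    return plus_ms(last_reply, timeout_ms / 2);
}

/* Time at which the connection is considered dead. */
ts_us
disconnect_deadline(ts_us last_reply, int timeout_ms)
{
    return plus_ms(last_reply, timeout_ms);
}

/*
 * Sleep increment in milliseconds: 10 s when the timeout is disabled,
 * otherwise a tenth of the timeout plus one, as in the diff.
 */
long
sleep_increment_ms(int timeout_ms)
{
    long sleeptime = 10000;     /* 10 s */

    if (timeout_ms > 0)
        sleeptime = 1 + (timeout_ms / 10);
    return sleeptime;
}
```

For an example timeout of 60000 ms and a last reply at time 0, this gives a ping deadline of 30 s, a disconnect deadline of 60 s, and a sleep increment of 6001 ms, so the loop wakes up several times before either deadline passes.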
