Commit a8458f5

Fix lost Windows socket EOF events.
Winsock only signals an FD_CLOSE event once if the other end of the socket shuts down gracefully. Because each WaitLatchOrSocket() call constructs and destroys a new event handle every time, with unlucky timing we can lose it and hang. We get away with this only if the other end disconnects non-gracefully, because FD_CLOSE is repeatedly signaled in that case.

To fix this design flaw in our Windows socket support fundamentally, we'd probably need to rearchitect it so that a single event handle exists for the lifetime of a socket, or switch to completely different multiplexing or async I/O APIs. That's going to be a bigger job and probably wouldn't be back-patchable.

This brute force kludge closes the race by explicitly polling with MSG_PEEK before sleeping.

Back-patch to all supported releases. This should hopefully clear up some random build farm and CI hang failures reported over the years. It might also allow us to try using graceful shutdown in more places again (reverted in commit 29992a6) to fix instability in the transmission of FATAL error messages, but that isn't done by this commit.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/176008.1715492071%40sss.pgh.pa.us
1 parent 291c420 commit a8458f5
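
The lost-notification race described in the commit message can be illustrated with a toy model. This is only a sketch: it does not use Winsock, and every name in it (`ToySocket`, `ToyEvent`, `toy_second_wait_wakes`, and so on) is hypothetical. It assumes, as the message states, that a graceful close is signaled once to whichever event handle is associated with the socket at that moment, while the EOF condition itself stays observable by peeking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in Winsock or PostgreSQL. */
typedef struct ToyEvent
{
	bool		signaled;
} ToyEvent;

typedef struct ToySocket
{
	bool		at_eof;		/* level state: stays true once the peer closes */
	ToyEvent   *listener;	/* event handle currently associated, if any */
} ToySocket;

static void
toy_attach(ToySocket *sock, ToyEvent *ev)
{
	assert(sock->listener == NULL);
	ev->signaled = false;
	sock->listener = ev;
}

static void
toy_detach(ToySocket *sock)
{
	sock->listener = NULL;
}

/* Graceful close: the notification is delivered once, to the current listener. */
static void
toy_peer_close(ToySocket *sock)
{
	sock->at_eof = true;
	if (sock->listener != NULL)
		sock->listener->signaled = true;
}

/*
 * Replays the unlucky ordering.  Returns true if the second wait would wake
 * up, false if it would sleep forever.
 */
static bool
toy_second_wait_wakes(bool peek_before_sleep)
{
	ToySocket	sock = {false, NULL};
	ToyEvent	first;
	ToyEvent	second;
	bool		wakes;

	/* First wait: handle created; the wait ends for an unrelated reason. */
	toy_attach(&sock, &first);
	/* The peer closes before the first handle is torn down. */
	toy_peer_close(&sock);
	toy_detach(&sock);		/* first.signaled is never looked at again */

	/* Second wait: a fresh handle that never sees the one-shot close. */
	toy_attach(&sock, &second);
	wakes = second.signaled || (peek_before_sleep && sock.at_eof);
	toy_detach(&sock);

	return wakes;
}
```

Under this model, `toy_second_wait_wakes(false)` returns false (the hang) and `toy_second_wait_wakes(true)` returns true, which is the effect the MSG_PEEK poll is meant to have.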

1 file changed: +32 -0 lines changed

src/backend/storage/ipc/latch.c

@@ -1999,6 +1999,38 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			cur_event->reset = false;
 		}
 
+		/*
+		 * We associate the socket with a new event handle for each
+		 * WaitEventSet.  FD_CLOSE is only generated once if the other end
+		 * closes gracefully.  Therefore we might miss the FD_CLOSE
+		 * notification, if it was delivered to another event after we stopped
+		 * waiting for it.  Close that race by peeking for EOF after setting
+		 * up this handle to receive notifications, and before entering the
+		 * sleep.
+		 *
+		 * XXX If we had one event handle for the lifetime of a socket, we
+		 * wouldn't need this.
+		 */
+		if (cur_event->events & WL_SOCKET_READABLE)
+		{
+			char		c;
+			WSABUF		buf;
+			DWORD		received;
+			DWORD		flags;
+
+			buf.buf = &c;
+			buf.len = 1;
+			flags = MSG_PEEK;
+			if (WSARecv(cur_event->fd, &buf, 1, &received, &flags, NULL, NULL) == 0)
+			{
+				occurred_events->pos = cur_event->pos;
+				occurred_events->user_data = cur_event->user_data;
+				occurred_events->events = WL_SOCKET_READABLE;
+				occurred_events->fd = cur_event->fd;
+				return 1;
+			}
+		}
+
 		/*
 		 * Windows does not guarantee to log an FD_WRITE network event
 		 * indicating that more data can be sent unless the previous send()
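
The peek-before-sleep idea in the added hunk can be tried outside Windows. The following is a minimal sketch of a POSIX-style analogue, not the Winsock code from the patch: `socket_readable_or_eof` is a hypothetical helper name, and `MSG_DONTWAIT` (available on Linux and the BSDs) is used here so the peek never blocks, which is an assumption standing in for the non-blocking behavior the patched code presumably depends on. As in the patch, a successful peek covers both pending data and EOF, and any failure simply means the caller should go on to sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): report whether a read on
 * "fd" would return immediately, either with data or with EOF.
 *
 * MSG_PEEK leaves any pending byte in the receive queue, so a later real
 * read still sees it.  MSG_DONTWAIT keeps the call from blocking when
 * nothing is pending.
 */
static bool
socket_readable_or_eof(int fd)
{
	char		c;
	ssize_t		rc;

	assert(fd >= 0);

	do
	{
		rc = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
	} while (rc < 0 && errno == EINTR);

	/*
	 * rc > 0: data is pending.  rc == 0: the peer shut down gracefully.
	 * rc < 0: nothing to report yet (or an error); the caller should wait.
	 */
	return rc >= 0;
}
```

A caller would run this check after registering for notifications and before sleeping, and report the socket as readable right away when it returns true. Because EOF is a persistent condition rather than a one-shot event, repeating the peek after the peer has closed keeps returning true.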
