Commit faf1324

Fix race in dsm_attach() when handles are reused.

DSM handle values can be reused as soon as the underlying shared memory object has been destroyed. That means that for a brief moment we might have two DSM slots with the same handle. While trying to attach, if we encounter a slot with refcnt == 1, meaning that it is currently being destroyed, we should continue our search in case the same handle exists in another slot.

The race manifested as a rare "dsa_area could not attach to segment" error, and was more likely in 10 and 11 due to the lack of distinct seed for random() in parallel workers. It was made very unlikely in master by commit 197e4af, and older releases don't usually create new DSM segments in background workers so it was also unlikely there.

This fixes the root cause of bug report #15585, in which the error could also sometimes result in a self-deadlock in the error path. It's not yet clear if further changes are needed to avoid that failure mode.

Back-patch to 9.4, where dsm.c arrived.

Author: Thomas Munro
Reported-by: Justin Pryzby, Sergei Kornilov
Discussion: https://postgr.es/m/20190207014719.GJ29720@telsasoft.com
Discussion: https://postgr.es/m/15585-324ff6a93a18da46@postgresql.org

1 parent b8386b0 commit faf1324

1 file changed, +8 -10 lines changed
src/backend/storage/ipc/dsm.c

Lines changed: 8 additions & 10 deletions
@@ -584,22 +584,20 @@ dsm_attach(dsm_handle h)
 	nitems = dsm_control->nitems;
 	for (i = 0; i < nitems; ++i)
 	{
-		/* If the reference count is 0, the slot is actually unused. */
-		if (dsm_control->item[i].refcnt == 0)
+		/*
+		 * If the reference count is 0, the slot is actually unused.  If the
+		 * reference count is 1, the slot is still in use, but the segment is
+		 * in the process of going away; even if the handle matches, another
+		 * slot may already have started using the same handle value by
+		 * coincidence so we have to keep searching.
+		 */
+		if (dsm_control->item[i].refcnt <= 1)
 			continue;

 		/* If the handle doesn't match, it's not the slot we want. */
 		if (dsm_control->item[i].handle != seg->handle)
 			continue;

-		/*
-		 * If the reference count is 1, the slot is still in use, but the
-		 * segment is in the process of going away.  Treat that as if we
-		 * didn't find a match.
-		 */
-		if (dsm_control->item[i].refcnt == 1)
-			break;
-
 		/* Otherwise we've found a match. */
 		dsm_control->item[i].refcnt++;
 		seg->control_slot = i;
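
The effect of replacing `break` with `continue` can be illustrated outside PostgreSQL with a small standalone sketch. Everything below is hypothetical and simplified: `slot_t`, `find_slot_old()` and `find_slot_new()` are illustrative names, not part of dsm.c, and the real code runs under a lock and also increments the reference count of the slot it finds. The sketch assumes only the refcnt convention described in the commit message and comments (0 = unused, 1 = being destroyed, 2 or more = live).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one DSM control slot. */
typedef struct
{
	uint32_t	handle;		/* handle value; may be reused after destroy */
	uint32_t	refcnt;		/* 0 = unused, 1 = going away, >= 2 = live */
} slot_t;

/*
 * Pre-fix search: a matching handle in a slot that is being destroyed
 * ends the search, even if a later slot holds a live segment that
 * reuses the same handle value.  Returns the slot index, or -1.
 */
int
find_slot_old(const slot_t *items, size_t nitems, uint32_t handle)
{
	assert(items != NULL || nitems == 0);

	for (size_t i = 0; i < nitems; ++i)
	{
		if (items[i].refcnt == 0)
			continue;			/* unused slot */
		if (items[i].handle != handle)
			continue;			/* different segment */
		if (items[i].refcnt == 1)
			break;				/* dying slot: give up (the bug) */
		return (int) i;
	}
	return -1;
}

/*
 * Post-fix search: unused and dying slots are both skipped before the
 * handle is compared, so the scan carries on to later slots.
 */
int
find_slot_new(const slot_t *items, size_t nitems, uint32_t handle)
{
	assert(items != NULL || nitems == 0);

	for (size_t i = 0; i < nitems; ++i)
	{
		if (items[i].refcnt <= 1)
			continue;			/* unused or going away: keep looking */
		if (items[i].handle != handle)
			continue;			/* different segment */
		return (int) i;
	}
	return -1;
}
```

With a slot array in which slot 0 holds handle 42 with refcnt 1 and slot 1 holds handle 42 with refcnt 2, the old search reports no match while the new search returns slot 1, which is the situation the commit message describes.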
