Re: Physical replication slot advance is not persistent
From | Alexey Kondratov |
---|---|
Subject | Re: Physical replication slot advance is not persistent |
Date | |
Msg-id | 175c2760666a78205e053207794c0f8f@postgrespro.ru |
In response to | Re: Physical replication slot advance is not persistent (Alexey Kondratov <a.kondratov@postgrespro.ru>) |
Responses | Re: Physical replication slot advance is not persistent |
List | pgsql-hackers |
On 2019-12-26 16:35, Alexey Kondratov wrote:
>
> Another concern is that ReplicationSlotIsDirty is added with only
> one user. It also cannot be used by SaveSlotToPath because of the
> simultaneous use of both the dirty and just_dirtied flags there.
>
> Given that, I think we should call ReplicationSlotSave
> unconditionally in pg_replication_slot_advance, so that the slot is
> saved or not automatically, based on the slot->dirty flag. At the same
> time, ReplicationSlotsComputeRequiredXmin and
> ReplicationSlotsComputeRequiredLSN should be called by anyone who
> modifies the xmin and LSN fields in the slot. Otherwise, we currently
> end up with leaky abstractions.
>

It seems that there was even a race in the order of actions inside
pg_replication_slot_advance. It did the following:

-    ReplicationSlotMarkDirty();
-    ReplicationSlotsComputeRequiredXmin(false);
-    ReplicationSlotsComputeRequiredLSN();
-    ReplicationSlotSave();

1) Mark the slot as dirty, which does nothing immediately except set
   the dirty flag;
2) Compute the new global required LSN;
3) Flush the slot state to disk.

If the old WAL is recycled after step 2) and a crash happens before
step 3), then we start up with the old value of restart_lsn, but
without the WAL it requires. I do not know how to reproduce this
properly without gdb and a power-off, so the chance is pretty low, but
it could still happen.

Logical slots were, again, not affected, since
LogicalConfirmReceivedLocation has the proper order of operations (with
comments) and does its own slot flushing.

Thus, in the attached patch I have decided not to flush the slot in
pg_replication_slot_advance at all, and to do it in
pg_physical_replication_slot_advance instead, as is done in
LogicalConfirmReceivedLocation.

Since this bugfix has not moved forward during the week, I will put it
on the 01.2020 commitfest. Kyotaro, if you do not object, I will add you
as a reviewer, as you have already given a lot of feedback. Thank you
for that!
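The ordering hazard described above can be illustrated with a toy model. This is only a sketch and not PostgreSQL code: the ToySlot structure, the ToyLsn type and every toy_* function are hypothetical names invented for the illustration, and the model assumes a single slot whose restart_lsn alone determines which WAL may be recycled. It "crashes" after the first of the two final steps and reports whether the restart_lsn read back from disk still points at retained WAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLsn;            /* hypothetical stand-in for XLogRecPtr */

typedef struct
{
    ToyLsn mem_restart_lsn;         /* in-memory restart_lsn of the slot */
    ToyLsn disk_restart_lsn;        /* value that survives a crash */
    ToyLsn required_lsn;            /* WAL older than this may be recycled */
    bool   dirty;                   /* set by the "mark dirty" step */
} ToySlot;

typedef enum { COMPUTE_THEN_SAVE, SAVE_THEN_COMPUTE } ToyOrder;

/* Models the flush step: writes the slot out only if it is dirty. */
static void
toy_save(ToySlot *slot)
{
    if (slot->dirty)
    {
        slot->disk_restart_lsn = slot->mem_restart_lsn;
        slot->dirty = false;
    }
}

/* Models recomputing the global required LSN for a single slot. */
static void
toy_compute_required_lsn(ToySlot *slot)
{
    slot->required_lsn = slot->mem_restart_lsn;
}

/*
 * Advance the slot from old_lsn to target_lsn, run only the FIRST of the
 * two final steps, then "crash". Returns true if the on-disk restart_lsn
 * still points at WAL that has not become eligible for recycling.
 */
static bool
toy_crash_is_safe(ToyOrder order, ToyLsn old_lsn, ToyLsn target_lsn)
{
    ToySlot slot = {old_lsn, old_lsn, old_lsn, false};

    assert(target_lsn >= old_lsn);  /* a slot only moves forward */

    /* Step 1: update the in-memory value and mark the slot dirty. */
    slot.mem_restart_lsn = target_lsn;
    slot.dirty = true;

    if (order == COMPUTE_THEN_SAVE)
        toy_compute_required_lsn(&slot);    /* old WAL may now be recycled */
    else
        toy_save(&slot);                    /* new position is durable first */

    /* Crash: the remaining step never runs. */
    return slot.disk_restart_lsn >= slot.required_lsn;
}
```

In this model, toy_crash_is_safe(COMPUTE_THEN_SAVE, 100, 200) returns false, because the disk still holds 100 while WAL below 200 is no longer protected, whereas toy_crash_is_safe(SAVE_THEN_COMPUTE, 100, 200) returns true. Flushing before recomputing the global value is the order the message describes as the proper one for logical slots.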
Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company