Re: out-of-order XID insertion in KnownAssignedXids
From | Konstantin Knizhnik |
---|---|
Subject | Re: out-of-order XID insertion in KnownAssignedXids |
Date | |
Msg-id | 26dce67d-88c6-8abc-828c-6023828729e2@postgrespro.ru |
In reply to | Re: out-of-order XID insertion in KnownAssignedXids (Michael Paquier <michael@paquier.xyz>) |
List | pgsql-hackers |
On 05.10.2018 11:04, Michael Paquier wrote:
> On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
>> As you can notice, XID 2004495308 is encountered twice, which causes an
>> error in KnownAssignedXidsAdd:
>>
>>     if (head > tail &&
>>         TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1], from_xid))
>>     {
>>         KnownAssignedXidsDisplay(LOG);
>>         elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
>>     }
>>
>> The probability of this error is very small, but it can quite easily be
>> reproduced: you should just set a breakpoint in the debugger after calling
>> MarkAsPrepared in twophase.c and then try to prepare any transaction.
>> MarkAsPrepared will add GXACT to proc array and at this moment there will
>> be two entries in procarray with the same XID:
>>
>> [snip]
>>
>> Now the generated RUNNING_XACTS record contains duplicated XIDs.
> So, I have been doing exactly that, and if you trigger a manual
> checkpoint then things happen quite correctly if you let the first
> session finish:
> rmgr: Standby len (rec/tot): 58/ 58, tx: 0, lsn:
> 0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
> latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606
>
> If you still maintain the debugger after calling MarkAsPrepared, then
> the manual checkpoint would block. Now if you actually keep the
> debugger, and wait for a checkpoint timeout to happen, then I can see
> the incorrect record. It is impressive that your customer has been able
> to see that first, and then that you have been able to get into that
> state with simple steps.

There are about 1000 active clients performing 2PC transactions, so if you
perform a backup (which does a checkpoint), the probability of hitting this
seems to be large enough.

I have reproduced this problem without using gdb, just by running many 2PC
transactions and checkpoints in parallel:

    for ((i=1;i<10;i++))
    do
        pgbench -n -T 300000 -M prepared -f t$i.sql postgres > t$i.log &
    done
    pgbench -n -T 300000 -f checkpoint.sql postgres > checkpoint.log &
    wait

    ------------------------------
    tN.sql:
    begin;
    update t set val=val+1 where pk=N;
    prepare transaction 'tN';
    commit prepared 'tN';
    ------------------------------
    checkpoint.sql:
    checkpoint;

>
>> I want to ask the opinion of the community about the best way of fixing
>> this problem. Should we avoid storing duplicated XIDs in procarray (by
>> invalidating the XID in the original pgxact) or eliminate/change the
>> check for duplicates in KnownAssignedXidsAdd (for example, just ignore
>> duplicates)?
> Hmmmmm... Please let me think through that first. It seems to me that
> the record should not be generated to begin with. At least I am able to
> confirm what you see.
> --
> Michael

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
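
To make the failure mode concrete, here is a minimal standalone sketch of the check quoted above and of the "just ignore duplicates" option raised in the question. It is not PostgreSQL source: `SketchKnownXids`, `sketch_add`, `sketch_replay` and `xid_follows_or_equals` are hypothetical names invented for illustration, the comparison handles only normal XIDs, and the sketch assumes XIDs arrive in ascending order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define SKETCH_MAX_XIDS 64

/* Hypothetical stand-in for the KnownAssignedXids array and its indexes. */
typedef struct
{
    TransactionId xids[SKETCH_MAX_XIDS];
    int           tail;
    int           head;
} SketchKnownXids;

/*
 * Wraparound-aware "id1 >= id2" using modulo-2^32 arithmetic.
 * Simplification: special (non-normal) XIDs are not handled here.
 */
static bool
xid_follows_or_equals(TransactionId id1, TransactionId id2)
{
    int32_t diff = (int32_t) (id1 - id2);

    return diff >= 0;
}

/*
 * Append one XID. Returns false in the situation where the quoted check
 * would raise "out-of-order XID insertion in KnownAssignedXids".
 */
static bool
sketch_add(SketchKnownXids *k, TransactionId xid)
{
    assert(k->head < SKETCH_MAX_XIDS);

    if (k->head > k->tail &&
        xid_follows_or_equals(k->xids[k->head - 1], xid))
        return false;           /* last stored XID >= new XID */

    k->xids[k->head++] = xid;
    return true;
}

/*
 * Feed an ascending list of XIDs, as if taken from a RUNNING_XACTS record.
 * With skip_duplicates set, an XID equal to its predecessor is ignored.
 * Returns the index of the first rejected XID, or -1 if all were accepted.
 */
static int
sketch_replay(SketchKnownXids *k, const TransactionId *xids, int n,
              bool skip_duplicates)
{
    for (int i = 0; i < n; i++)
    {
        if (skip_duplicates && i > 0 && xids[i - 1] == xids[i])
            continue;
        if (!sketch_add(k, xids[i]))
            return i;
    }
    return -1;
}
```

With the list `{606, 607, 607}`, the second `607` is rejected because the last stored XID equals the incoming one, which is the condition the check treats as out-of-order. With `skip_duplicates` enabled the same list is accepted and only two XIDs are stored. Whether skipping on the replay side is the right fix, rather than never generating the duplicate, is exactly the open question in this thread.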