Re: Is pg_control file crashsafe?
From | Alex Ignatov |
---|---|
Subject | Re: Is pg_control file crashsafe? |
Date | |
Msg-id | 572C5231.800@postgrespro.ru |
In reply to | Re: Is pg_control file crashsafe? (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On 05.05.2016 7:16, Amit Kapila wrote:
> On Wed, May 4, 2016 at 8:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Amit Kapila <amit.kapila16@gmail.com> writes:
> > > On Wed, May 4, 2016 at 4:02 PM, Alex Ignatov <a.ignatov@postgrespro.ru>
> > > wrote:
> > >> On 03.05.2016 2:17, Tom Lane wrote:
> > >>> Writing a single sector ought to be atomic too.
> >
> > >> pg_control is 8k long (I think it is the length of one page in default PG
> > >> compile settings).
> >
> > > The actual data written is always sizeof(ControlFileData) which should be
> > > less than one sector.
> >
> > Yes. We don't care what happens to the rest of the file as long as the
> > first sector's worth is updated atomically. See the comments for
> > PG_CONTROL_SIZE and the code in ReadControlFile/WriteControlFile.
> >
> > We could change to a different PG_CONTROL_SIZE pretty easily, and there's
> > certainly room to argue that reducing it to 512 or 1024 would be more
> > efficient. I think the motivation for setting it at 8K was basically
> > "we're already assuming that 8K writes are efficient, so let's assume
> > it here too". But since the file is only written once per checkpoint,
> > efficiency is not really a key selling point anyway. If you could make
> > an argument that some other size would reduce the risk of failures,
> > it would be interesting --- but I suspect any such argument would be
> > very dependent on the quirks of a specific file system.
> >
>
> How about using 512 bytes as a write size and perform direct writes
> rather than going via OS buffer cache for control file? Alex, is the
> issue reproducible (to ensure that if we try to solve it in some way, do
> we have a way to test it as well)?
>
> >
> > One point worth considering is that on most file systems, rewriting
> > a fraction of a page is *less* efficient than rewriting a full page,
> > because the kernel first has to read in the old contents to fill
> > the disk buffer it's going to partially overwrite with new data.
> > This motivates against trying to reduce the write size too much.
> >
>
> Yes, you are very much right and I have observed that recently during my
> work on WAL Re-Writes [1]. However, I think that won't be the issue if
> we use direct writes for control file.
>
>
> [1] -
> http://www.postgresql.org/message-id/CAA4eK1+=O33dZZ=jBtjXBFyD67R5dLcqFyOMj4f-qmFXBP1OOQ@mail.gmail.com
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

Hi!

No, the issue happened only once. Also, attempts to reproduce it have not been successful yet.
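
To make the scheme discussed above concrete, here is a rough sketch of the idea as I understand it: the meaningful record is much smaller than one sector, it carries a checksum, and it is written at the start of a larger zero-padded buffer, so the reader only ever trusts the first sector's worth. This is not the PostgreSQL code. The record layout, the names write_control/read_control, and the sizes are assumptions made up for illustration, and zlib's CRC-32 merely stands in for whatever checksum the real code uses. It also goes through the OS buffer cache followed by fsync, not the direct writes suggested in the thread.

```python
import os
import struct
import zlib

SECTOR_SIZE = 512         # assumed sector size
CONTROL_FILE_SIZE = 8192  # stands in for PG_CONTROL_SIZE (assumption)

# Hypothetical record: version (uint32), checkpoint counter (uint64),
# followed by a CRC-32 of those two fields. Not the real ControlFileData.
PAYLOAD = struct.Struct("<IQ")
RECORD = struct.Struct("<IQI")


def write_control(path, version, checkpoint):
    """Write the record padded to the full file size, then fsync."""
    payload = PAYLOAD.pack(version, checkpoint)
    record = payload + struct.pack("<I", zlib.crc32(payload))
    # The whole scheme relies on the record fitting in the first sector.
    if len(record) > SECTOR_SIZE:
        raise ValueError("record does not fit in one sector")
    buf = record.ljust(CONTROL_FILE_SIZE, b"\0")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        if os.write(fd, buf) != len(buf):
            raise OSError("short write to control file")
        os.fsync(fd)  # force the data out of the OS cache
    finally:
        os.close(fd)


def read_control(path):
    """Read only the record and verify its checksum; ignore the padding."""
    with open(path, "rb") as f:
        data = f.read(RECORD.size)
    if len(data) < RECORD.size:
        raise ValueError("control file too short")
    version, checkpoint, crc = RECORD.unpack(data)
    if zlib.crc32(data[:PAYLOAD.size]) != crc:
        raise ValueError("control record checksum mismatch")
    return version, checkpoint
```

With this layout, garbage in the padding after the first sector does not affect read_control, while a damaged record is at least detected by the checksum. Whether the first-sector write is really atomic still depends on the disk and the file system, which is exactly the open question here.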