Comparing changes

These are fairly basic, but better than nothing. While there are several opportunities to link to these entries, this patch does not add any. They will however be referenced by future patches. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/20250326183102.92.nmisch@google.com

The new view lists all IO handles that are currently in use and is mainly useful for PG developers, but may also be useful when tuning PG. FIXME: - catversion bump before commit Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt

Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m

To make the tests possible, a few functions from bufmgr.c/localbuf.c had to be exported, via buf_internals.h. Reviewed-by: Noah Misch <noah@leadboat.com> Co-authored-by: Andres Freund <andres@anarazel.de> Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt

mdreadv() has a codepath to zero out buffers when a read returns zero bytes, guarded by a check for zero_damaged_pages || InRecovery. The InRecovery codepath to zero out buffers in mdreadv() appears to be unreachable. The only known paths to reach mdreadv()/mdstartreadv() in recovery are XLogReadBufferExtended(), vm_readbuf(), and fsm_readbuf(), each of which takes care to extend the relation if necessary. This looks to either have been the case for a long time, or the code was never reachable. The zero_damaged_pages path is incomplete, as as missing segments are not created. Putting blocks into the buffer-pool that do not exist on disk is rather problematic, as such blocks will, at least initially, not be found by scans that rely on smgrnblocks(), as they are beyond EOF. It also can cause weird problems with relation extension, as relation extension does not expect blocks beyond EOF to exist. Therefore we would like to remove that path. mdstartreadv(), which I added in e5fe570b51c, does not implement this zeroing logic. I had started a discussion about that a while ago (linked below), but forgot to act on the conclusion of the discussion, namely to disable the in-memory-zeroing behavior. We could certainly implement equivalent zeroing logic in mdstartreadv(), but it would have to be more complicated due to potential differences in the zero_damaged_pages setting between the definer and completor of IO. Given that we want to remove the logic, that does not seem worth implementing the necessary logic. For now, put an Assert(false) comments documenting this choice into mdreadv() and comments documenting the deprecation of the path in mdreadv() and the non-implementation of it in mdstartreadv(). If we, during testing, discover that we do need the path, we can implement it at that time. Discussion: https://postgr.es/m/postgr.es/m/20250330024513.ac.nmisch@google.com Discussion: https://postgr.es/m/postgr.es/m/3qxxsnciyffyf3wyguiz4besdp5t5uxvv3utg75cbcszojlz7p@uibfzmnukkbd

Author: Reviewed-by: Discussion: https://postgr.es/m/ Backpatch:

Push an ErrorContextCallback adding additional detail about the process performing the I/O and the owner of the I/O when those are not the same. For io_method worker, this adds context specifying which process owns the I/O that the I/O worker is processing. For io_method io_uring, this adds context only when a backend is *completing* I/O for another backend. It specifies the pid of the owning process. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20250325141120.8e.nmisch%40google.com

Author: Reviewed-by: Discussion: https://postgr.es/m/ Backpatch:

TODO: - Right now the sync.c integration with smgr.c/md.c isn't properly safe to use in a critical section The only reason it doesn't immediately fail is that it's reasonably rare that RegisterSyncRequest() fails *and* either: - smgropen()->hash_search(HASH_ENTER) decides to resize the hash table, even though the lookup is guaranteed to succeed for io_method=worker. - an io_method=uring completion is run in a different backend and smgropen() needs to build a new entry and thus needs to allocate memory For a bit I thought this could be worked around easily enough by not doing an smgropen() in mdsyncfiletag(), or adding a "fallible" smgropen() and instead just opening the file directly. That actually does kinda solve the problem, but only because the memory allocation in PathNameOpenFile() uses malloc(), not palloc() and thus doesn't trigger - temp_file_limit implementation

As of this commit there are no users of these AIO facilities, that'll come in later commits. Problems with AIO writes: - Write logic needs to be rebased on-top of the patch series to not hit bit dirty buffers while IO is going on The performance impact of doing the memory copies is rather substantial, as on intel memory bandwidth is *the* IO bottleneck even just for the checksum computation, without a copy. That makes the memory copy for something like bounce buffers hurt really badly. And the memory usage of bounce buffers is also really concerning. And even without checksums, several filesystems *really* don't like buffers getting modified during DIO writes. Which I think would mean we ought to use bounce buffers for *all* writes, which would impose a *very* substantial overhead (basically removing the benefit of DMA happening off-cpu). - I think it requires new lwlock.c infrastructure (as v1 of aio had), to make LockBuffer(BUFFER_LOCK_EXCLUSIVE) etc wait in a concurrency safe manner for in-progress writes I can think of ways to solve this purely in bufmgr.c, but only in ways that would cause other problems (e.g. setting BM_IO_IN_PROGRESS before waiting for an exclusive lock) and/or expensive.

This is likely never going to anywhere - Thomas Munro is working on something more complete. But I needed a way to exercise aio for checkpointer / bgwriter.

This is far from ready - just included to be able to exercise AIO writes and get some preliminary numbers. In all likelihood this will instead be based on-top of work by Thomas Munro instead of the preceding commit. TODO; - This doesn't implement bgwriter_flush_after, checkpointer_flush_after I think that's not too hard to do, it's mainly round tuits. - The queuing logic doesn't carefully respect pin limits That might be ok for checkpointer and bgwriter, but the infrastructure should be usable outside of this as well.

Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi

Without this we only can execute very little AIO for sequential scans, as there's just not enough buffers in the ring. This isn't the right fix, as just increasing the ring size can have negative performance implications in workloads where the kernel has all the data cached. Author: Reviewed-By: Discussion: https://postgr.es/m/ Backpatch:

For benchmarking it's quite annoying that the first time a memory is touched has completely different perf characteristics than subsequent accesses. Using MAP_POPULATE reduces that substantially.

This branch was automatically generated by a robot using patches from an email thread registered at: https://commitfest.postgresql.org/patch/5227 The branch will be overwritten each time a new patch version is posted to the thread, and also periodically to check for bitrot caused by changes on the master branch. Patch(es): https://www.postgresql.org/message-id/7s6fclfekpcxoaaorwrq67v4vmgixf2dcjcgznuj7vxs3ie3wq@okqbi5vabqbx Author(s): Andres Freund

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Comparing changes

Open a pull request

Commits on Apr 1, 2025

This comparison is taking too long to generate.

Uh oh!