Comparing changes

When determining whether an allocated chunk is still reachable, Valgrind will consider only pointers within what it believes to be allocated chunks. Normally, all of a block obtained from malloc() would be considered "allocated" --- but it turns out that if we use VALGRIND_MEMPOOL_ALLOC to designate sub-section(s) of a malloc'ed block as allocated, all the rest of that malloc'ed block is ignored. This leads to lots of false positives of course. In particular, in any multi-malloc-block context, all but the primary block were reported as leaked. We also had a problem with context "ident" strings, which were reported as leaked unless there was some other pointer to them besides the one in the context header. To fix, we need to use VALGRIND_MEMPOOL_ALLOC to designate a context's management structs (the context struct itself and any per-block headers) as allocated chunks. That forces moving the VALGRIND_CREATE_MEMPOOL/VALGRIND_DESTROY_MEMPOOL calls into the per-context-type code, so that the pool identifier can be made as soon as we've allocated the initial block, but otherwise it's fairly straightforward. Note that in Valgrind's eyes there is no distinction between these allocations and the allocations that the mmgr modules hand out to user code. That's fine for now, but perhaps someday we'll want to do better yet. When reading this patch, it's helpful to start with the comments added at the head of mcxt.c. Author: Andres Freund <andres@anarazel.de> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us Discussion: https://postgr.es/m/20210317181531.7oggpqevzz6bka3g@alap3.anarazel.de

Arrange that only the "aligned chunk" part of the allocated space is included in a Valgrind vchunk. This suppresses complaints about that vchunk being possibly lost because PG is retaining only pointers to the aligned chunk. Also make sure that trailing wasted space is marked NOACCESS. As a tiny performance improvement, arrange that MCXT_ALLOC_ZERO zeroes only the returned "aligned chunk", not the wasted padding space. In passing, fix GetLocalBufferStorage to use MemoryContextAllocAligned instead of rolling its own implementation, which was equally broken according to Valgrind. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

Because dynahash.c never frees hashtable storage except by deleting the whole hashtable context, it doesn't bother to track the individual blocks of elements allocated by element_alloc(). This results in "possibly lost" complaints from Valgrind except when the first element of each block is actively in use. (Otherwise it'll be on a freelist, but very likely only reachable via "interior pointers" within element blocks, which doesn't satisfy Valgrind.) To fix, if we're building with USE_VALGRIND, expend an extra pointer's worth of space in each element block so that we can chain them all together from the HTAB header. Skip this in shared hashtables though: Valgrind doesn't track those, and we'd need additional locking to make it safe to manipulate a shared chain. While here, update a comment obsoleted by 9c911ec. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

Put the dlist_node fields of catctup and catclist structs first. This ensures that the dlist pointers point to the starts of these palloc blocks, and thus that Valgrind won't consider them "possibly lost". The postmaster's PMChild structs and the autovac launcher's avl_dbase structs also have the dlist_node-is-not-first problem, but putting it first still wouldn't silence the warning because we bulk-allocate those structs in an array, so that Valgrind sees a single allocation. Commonly the first array element will be pointed to only from some later element, so that the reference would be an interior pointer even if it pointed to the array start. (This is the same issue fixed in the previous patch for dynahash elements.) Since these are pretty simple data structures, I don't feel too bad about faking out Valgrind by just keeping a static pointer to the array start. This is all quite hacky, and it's not hard to imagine usages where we'd need some other idea in order to have reasonable leak tracking of structures that are only accessible via dlist_node lists. But this patch seems to be enough to silence this class of leakage complaints for the moment. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

Valgrind seems not to consider the global "environ" variable as a valid root pointer; so when we allocate a new environment array, it claims that data is leaked. To fix that, keep our own statically-allocated copy of the pointer, similarly to the previous patch. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

This is the first actual leakage bug fix in this patch series. The amount of memory regained is quite negligible of course. But we don't want Valgrind whining about this in every session. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

Free a couple of data structures manually near the end of the run when USE_VALGRIND, and ensure that the final vac_update_datfrozenxid() call is done in a non-permanent context. This doesn't have any real effect on the process's total memory consumption, since we're going to exit as soon as that last transaction is done. But it does pacify Valgrind. In combination with commit 02502c1, these fixes reduce reported leakage in autovacuum workers to zero in my tests. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

The amount of storage involved here is quite negligible, but without this change Valgrind will complain about every parallel worker process. While at it, move mq_putmessage's "Assert(pq_mq_handle != NULL)" to someplace where it's not trivially useless. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

format_procedure leaks memory, so run it in a short-lived context not the session-lifespan cache context for the PL/pgSQL function. parse_datatype called the core parser in the function's cache context, thus leaking potentially a lot of storage into that context. We were also being a bit careless with the TypeName structures made in that code path and others. Most of the time we don't need to retain the TypeName, so make sure it is made in the short-lived temp context, and copy it only if we do need to retain it. These are far from the only leaks in PL/pgSQL compilation, but they're the biggest as far as I've seen, and further improvement looks like it'd require delicate and bug-prone surgery. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

PL/pgSQL and SQL-function parsing leak some stuff into the long-lived function cache context. This isn't really a huge practical problem, since it's not a large amount of data and the cruft will be recovered if we have to re-parse the function. It's not clear that it's worth working any harder than the previous patch did to eliminate these leak complaints, so instead silence them with a suppression rule. This suppression rule also hides the fact that CachedFunction structs are intentionally leaked in some cases because we're unsure if any fn_extra pointers remain. That might be nice to do something about eventually, but it's not clear how. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

Like the situation with function cache loading, text search dictionary loading functions tend to leak some cruft into the dictionary's long-lived cache context. To judge by the examples in the core regression tests, not very many bytes are at stake. Moreover, I don't see a way to prevent such leaks without changing the API for TS template initialization functions: right now they do not have to worry about making sure that their results are long-lived. Hence, I think we should install a suppression rule rather than trying to fix this completely. However, I did grab some low-hanging fruit: several places were leaking the result of get_tsearch_config_filename. This seems worth doing mostly because they are inconsistent with other dictionaries that were freeing it already. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

load_domaintype_info() intentionally leaks some intermediate cruft into the long-lived DomainConstraintCache's memory context, reasoning that the amount of leakage will typically not be much so it's not worth doing a copyObject() of the final tree to avoid that. But Valgrind knows nothing of engineering tradeoffs and complains anyway. On the whole, the copyObject doesn't cost that much and this is surely not a performance-critical code path, so let's do it the clean way. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

CompleteCachedPlan intentionally doesn't worry about small leaks from PlanCacheComputeResultDesc. However, Valgrind knows nothing of engineering tradeoffs and complains anyway. Silence it by doing things the hard way if USE_VALGRIND. I don't really love this patch, because it makes the handling of plansource->resultDesc different from the handling of query dependencies and search_path just above, which likewise are willing to accept small leaks into the cached plan's context. However, those cases aren't provoking Valgrind complaints. (Perhaps in a CLOBBER_CACHE_ALWAYS build, they would?) For the moment, this makes the src/pl/plpgsql tests leak-free according to Valgrind. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

MarkGUCPrefixReserved didn't bother to clean up removed placeholder GUCs at all, which shows up as a leak in one regression test. It seems appropriate for it to do as much cleanup as define_custom_variable does when replacing placeholders, so factor that code out into a helper function. Testing showed that define_custom_variable's logic was one brick shy of a load too: it forgot to free the separate allocation for the placeholder's name. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

This runs in a long-lived context, so Valgrind complains about it. It's not clear to me that very much memory can actually be lost. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

If the presented array is toasted, this neglected to free the detoasted copy, which was then leaked into EventTriggerCacheContext. I'm a bit distressed by the amount of code that BuildEventTriggerCache is willing to run while switched into a long-lived cache context. Although the detoasted array is the only leak that Valgrind reports, let's tighten things up while we're here. (DecodeTextArrayToBitmapset is still run in the cache context, so doing that doesn't remove the need for the detoast fix. But it reduces the surface area for other leaks.) Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

Valgrind complains about the postmaster's socket-files and lock-files lists being leaked, which we can silence by just not nulling out the static pointers to them. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

These leaks are of absolutely no real-world consequence, since they are small one-time leaks in a one-time process. But this is the last step to get to zero reported leaks in a run of the core regression tests, so let's do it. Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us

This branch was automatically generated by a robot using patches from an email thread registered at: https://commitfest.postgresql.org/patch/5748 The branch will be overwritten each time a new patch version is posted to the thread, and also periodically to check for bitrot caused by changes on the master branch. Patch(es): https://www.postgresql.org/message-id/2884224.1748035274@sss.pgh.pa.us Author(s): Tom Lane

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Comparing changes

Open a pull request

Commits on May 31, 2025

This comparison is taking too long to generate.

Uh oh!