
Commit f2e3f62 (1 parent: e40492e)

Update implementation notes for new memory management logic.


src/backend/utils/mmgr/README

Lines changed: 52 additions & 41 deletions
@@ -1,9 +1,9 @@
-Proposal for memory allocation fixes, take 2		21-Jun-2000
---------------------------------------------
+Notes about memory allocation redesign			14-Jul-2000
+--------------------------------------
 
-We know that Postgres has serious problems with memory leakage during
-large queries that process a lot of pass-by-reference data. There is
-no provision for recycling memory until end of query. This needs to be
+Up through version 7.0, Postgres has serious problems with memory leakage
+during large queries that process a lot of pass-by-reference data. There
+is no provision for recycling memory until end of query. This needs to be
 fixed, even more so with the advent of TOAST which will allow very large
 chunks of data to be passed around in the system. So, here is a proposal.
 

@@ -193,30 +193,53 @@ pathnodes; this will allow it to release the bulk of its temporary space
 usage (which can be a lot, for large joins) at completion of planning.
 The completed plan tree will be in TransactionCommandContext.
 
-The executor will have contexts with lifetime similar to plan nodes
-(I'm not sure at the moment whether there's need for one such context
-per plan level, or whether a single context is sufficient). These
-contexts will hold plan-node-local execution state and related items.
-There will also be a context on each plan level that is reset at the start
-of each tuple processing cycle. This per-tuple context will be the normal
-CurrentMemoryContext during evaluation of expressions and so forth. By
-resetting it, we reclaim transient memory that was used during processing
-of the prior tuple. That should be enough to solve the problem of running
-out of memory on large queries. We must have a per-tuple context in each
-plan node, and we must reset it at the start of a tuple cycle rather than
-the end, so that each plan node can use results of expression evaluation
-as part of the tuple it returns to its parent node.
-
-By resetting the per-tuple context, we will be able to free memory after
-each tuple is processed, rather than only after the whole plan is
-processed. This should solve our memory leakage problems pretty well;
-yet we do not need to add very much new bookkeeping logic to do it.
-In particular, we do *not* need to try to keep track of individual values
-palloc'd during expression evaluation.
-
-Note we assume that resetting a context is a cheap operation. This is
-true already, and we can make it even more true with a little bit of
-tuning in aset.c.
+The top-level executor routines, as well as most of the "plan node"
+execution code, will normally run in TransactionCommandContext. Much
+of the memory allocated in these routines is intended to live until end
+of query, so this is appropriate for those purposes. We already have
+a mechanism --- "tuple table slots" --- for avoiding leakage of tuples,
+which is the major kind of short-lived data handled by these routines.
+This still leaves a certain amount of explicit pfree'ing needed by plan
+node code, but that code largely exists already and is probably not worth
+trying to remove. I looked at the possibility of running in a shorter-
+lived context (such as a context that gets reset per-tuple), but this
+seems fairly impractical. The biggest problem with it is that code in
+the index access routines, as well as some other complex algorithms like
+tuplesort.c, assumes that palloc'd storage will live across tuples.
+For example, rtree uses a palloc'd state stack to keep track of an index
+scan.
+
+The main improvement needed in the executor is that expression evaluation
+--- both for qual testing and for computation of targetlist entries ---
+needs to not leak memory. To do this, each ExprContext (expression-eval
+context) created in the executor will now have a private memory context
+associated with it, and we'll arrange to switch into that context when
+evaluating expressions in that ExprContext. The plan node that owns the
+ExprContext is responsible for resetting the private context to empty
+when it no longer needs the results of expression evaluations. Typically
+the reset is done at the start of each tuple-fetch cycle in the plan node.
+
+Note that this design gives each plan node its own expression-eval memory
+context. This appears necessary to handle nested joins properly, since
+an outer plan node might need to retain expression results it has computed
+while obtaining the next tuple from an inner node --- but the inner node
+might execute many tuple cycles and many expressions before returning a
+tuple. The inner node must be able to reset its own expression context
+more often than once per outer tuple cycle. Fortunately, memory contexts
+are cheap enough that giving one to each plan node doesn't seem like a
+problem.
+
+A problem with running index accesses and sorts in TransactionMemoryContext
+is that these operations invoke datatype-specific comparison functions,
+and if the comparators leak any memory then that memory won't be recovered
+till end of query. The comparator functions all return bool or int32,
+so there's no problem with their result data, but there could be a problem
+with leakage of internal temporary data. In particular, comparator
+functions that operate on TOAST-able data types will need to be careful
+not to leak detoasted versions of their inputs. This is annoying, but
+it appears a lot easier to make the comparators conform than to fix the
+index and sort routines, so that's what I propose to do for 7.1. Further
+cleanup can be left for another day.
 
 There will be some special cases, such as aggregate functions. nodeAgg.c
 needs to remember the results of evaluation of aggregate transition
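
A minimal C sketch of the per-ExprContext arrangement described in the added
paragraphs above. It leans on the generic memory-context API
(AllocSetContextCreate, MemoryContextSwitchTo, MemoryContextReset); the
struct, field, and function names are illustrative placeholders, not the
actual executor code.

    #include "postgres.h"
    #include "utils/memutils.h"

    /* Illustrative only: stands in for the executor's ExprContext. */
    typedef struct SketchExprContext
    {
        MemoryContext per_tuple_cxt;   /* private context for expression results */
    } SketchExprContext;

    /* Give the ExprContext its private memory context at node startup. */
    static void
    sketch_init_expr_context(SketchExprContext *econtext, MemoryContext parent)
    {
        econtext->per_tuple_cxt =
            AllocSetContextCreate(parent,
                                  "PerTupleExprContext",
                                  ALLOCSET_DEFAULT_MINSIZE,
                                  ALLOCSET_DEFAULT_INITSIZE,
                                  ALLOCSET_DEFAULT_MAXSIZE);
    }

    /* One tuple-fetch cycle in the plan node that owns the ExprContext. */
    static void
    sketch_tuple_cycle(SketchExprContext *econtext)
    {
        MemoryContext oldcxt;

        /*
         * Reset at the START of the cycle, not the end, so that expression
         * results computed below survive long enough to be handed up to the
         * parent node as part of this node's output tuple.
         */
        MemoryContextReset(econtext->per_tuple_cxt);

        oldcxt = MemoryContextSwitchTo(econtext->per_tuple_cxt);
        /* ... evaluate quals and targetlist expressions here; palloc'd
         * temporaries land in per_tuple_cxt and are reclaimed next cycle ... */
        MemoryContextSwitchTo(oldcxt);
    }
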
@@ -365,15 +388,3 @@ chunk of memory is allocated in (by checking the required standard chunk
 header), so nodeAgg can determine whether or not it's safe to reset
 its working context; it doesn't have to rely on the transition function
 to do what it's expecting.
-
-It might be that the executor per-run contexts described above should
-be tied directly to executor "EState" nodes, that is, one context per
-EState. I'm not real clear on the lifespan of EStates or the situations
-where we have just one or more than one, so I'm not sure. Comments?
-
-It would probably be possible to adapt the existing "portal" memory
-management mechanism to do what we need. I am instead proposing setting
-up a totally new mechanism, because the portal code strikes me as
-extremely crufty and unwieldy. It may be that we can eventually remove
-portals entirely, or perhaps reimplement them with this mechanism
-underneath.
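
A sketch of what making a comparator "conform" could look like for a
TOAST-able varlena type, under the new-style fmgr conventions where
PG_GETARG_TEXT_P detoasts its argument and PG_FREE_IF_COPY releases the
detoasted copy. The byte-wise comparison is simplified and the function name
is made up; only the cleanup pattern at the end is the point.

    #include "postgres.h"
    #include "fmgr.h"

    /*
     * Simplified comparator for a varlena type such as text. The essential
     * part is the cleanup: detoasted copies made by PG_GETARG_TEXT_P are
     * pfree'd before returning, so repeated comparisons during a sort or
     * index build don't accumulate garbage in a long-lived context.
     */
    Datum
    sketch_text_cmp(PG_FUNCTION_ARGS)
    {
        text   *a = PG_GETARG_TEXT_P(0);   /* detoasts if necessary */
        text   *b = PG_GETARG_TEXT_P(1);
        int     len_a = VARSIZE(a) - VARHDRSZ;
        int     len_b = VARSIZE(b) - VARHDRSZ;
        int     result;

        result = memcmp(VARDATA(a), VARDATA(b), Min(len_a, len_b));
        if (result == 0)
            result = (len_a < len_b) ? -1 : (len_a > len_b) ? 1 : 0;

        /* Free detoasted copies; these are no-ops if no copy was made. */
        PG_FREE_IF_COPY(a, 0);
        PG_FREE_IF_COPY(b, 1);

        PG_RETURN_INT32(result);
    }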
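
A sketch of the chunk-header check referred to in the last hunk: before
resetting its working context, the aggregate node can determine which context
a pass-by-reference transition value was allocated in and copy it somewhere
safer if needed. MemoryContextContains() and datumCopy() are existing backend
routines; the surrounding function, parameter, and context names here are
hypothetical, not nodeAgg's actual code.

    #include "postgres.h"
    #include "utils/datum.h"
    #include "utils/memutils.h"

    /*
     * If the transition value lives in the working context that is about to
     * be reset (determinable from the standard chunk header), copy it into
     * the longer-lived aggregate context first, so the reset cannot clobber
     * it regardless of what the transition function did.
     */
    static Datum
    sketch_preserve_transvalue(Datum transValue, bool transtypeByVal,
                               int transtypeLen,
                               MemoryContext workcontext,
                               MemoryContext aggcontext)
    {
        if (!transtypeByVal &&
            MemoryContextContains(workcontext, DatumGetPointer(transValue)))
        {
            MemoryContext oldcxt = MemoryContextSwitchTo(aggcontext);

            transValue = datumCopy(transValue, transtypeByVal, transtypeLen);
            MemoryContextSwitchTo(oldcxt);
        }
        MemoryContextReset(workcontext);
        return transValue;
    }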
