@@ -1,9 +1,9 @@
-Proposal for memory allocation fixes, take 2        21-Jun-2000
---------------------------------------------
+Notes about memory allocation redesign        14-Jul-2000
+--------------------------------------
 
-We know that Postgres has serious problems with memory leakage during
-large queries that process a lot of pass-by-reference data. There is
-no provision for recycling memory until end of query. This needs to be
+Up through version 7.0, Postgres has serious problems with memory leakage
+during large queries that process a lot of pass-by-reference data. There
+is no provision for recycling memory until end of query. This needs to be
 fixed, even more so with the advent of TOAST which will allow very large
 chunks of data to be passed around in the system. So, here is a proposal.
 
@@ -193,30 +193,53 @@ pathnodes; this will allow it to release the bulk of its temporary space
 usage (which can be a lot, for large joins) at completion of planning.
 The completed plan tree will be in TransactionCommandContext.
 
-The executor will have contexts with lifetime similar to plan nodes
-(I'm not sure at the moment whether there's need for one such context
-per plan level, or whether a single context is sufficient). These
-contexts will hold plan-node-local execution state and related items.
-There will also be a context on each plan level that is reset at the start
-of each tuple processing cycle. This per-tuple context will be the normal
-CurrentMemoryContext during evaluation of expressions and so forth. By
-resetting it, we reclaim transient memory that was used during processing
-of the prior tuple. That should be enough to solve the problem of running
-out of memory on large queries. We must have a per-tuple context in each
-plan node, and we must reset it at the start of a tuple cycle rather than
-the end, so that each plan node can use results of expression evaluation
-as part of the tuple it returns to its parent node.
-
-By resetting the per-tuple context, we will be able to free memory after
-each tuple is processed, rather than only after the whole plan is
-processed. This should solve our memory leakage problems pretty well;
-yet we do not need to add very much new bookkeeping logic to do it.
-In particular, we do *not* need to try to keep track of individual values
-palloc'd during expression evaluation.
-
-Note we assume that resetting a context is a cheap operation. This is
-true already, and we can make it even more true with a little bit of
-tuning in aset.c.
+The top-level executor routines, as well as most of the "plan node"
+execution code, will normally run in TransactionCommandContext. Much
+of the memory allocated in these routines is intended to live until end
+of query, so this is appropriate for those purposes. We already have
+a mechanism --- "tuple table slots" --- for avoiding leakage of tuples,
+which is the major kind of short-lived data handled by these routines.
+This still leaves a certain amount of explicit pfree'ing needed by plan
+node code, but that code largely exists already and is probably not worth
+trying to remove. I looked at the possibility of running in a shorter-
+lived context (such as a context that gets reset per-tuple), but this
+seems fairly impractical. The biggest problem with it is that code in
+the index access routines, as well as some other complex algorithms like
+tuplesort.c, assumes that palloc'd storage will live across tuples.
+For example, rtree uses a palloc'd state stack to keep track of an index
+scan.
+
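To make the "storage must live across tuples" point in the hunk above concrete,
here is a minimal sketch in the style of the backend code. The names
(ScanStackEntry, push_scan_page) are made up for illustration, not actual rtree
routines; the point is only that the stack is palloc'd once and then consulted
by many successive tuple fetches, so it cannot live in a context that is reset
per tuple.

    #include "postgres.h"
    #include "storage/block.h"

    typedef struct ScanStackEntry
    {
        BlockNumber blkno;              /* index page still to be visited */
        struct ScanStackEntry *next;
    } ScanStackEntry;

    static ScanStackEntry *scan_stack = NULL;   /* lives for the whole scan */

    static void
    push_scan_page(BlockNumber blkno)
    {
        /* palloc'd in whatever context is current when the scan starts
         * (e.g. TransactionCommandContext); it must survive across many
         * tuple-fetch calls, which is why a per-tuple-reset context cannot
         * be the CurrentMemoryContext here. */
        ScanStackEntry *entry = (ScanStackEntry *) palloc(sizeof(ScanStackEntry));

        entry->blkno = blkno;
        entry->next = scan_stack;
        scan_stack = entry;
    }
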
+The main improvement needed in the executor is that expression evaluation
+--- both for qual testing and for computation of targetlist entries ---
+needs to not leak memory. To do this, each ExprContext (expression-eval
+context) created in the executor will now have a private memory context
+associated with it, and we'll arrange to switch into that context when
+evaluating expressions in that ExprContext. The plan node that owns the
+ExprContext is responsible for resetting the private context to empty
+when it no longer needs the results of expression evaluations. Typically
+the reset is done at the start of each tuple-fetch cycle in the plan node.
+
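A minimal sketch of the reset-and-switch discipline described above, assuming
the obvious shape for the private context: the variable econtext_memory and the
routines around it are placeholders rather than the real ExprContext fields, and
the five-argument AllocSetContextCreate() form is the one used in the 7.x
sources.

    #include "postgres.h"
    #include "utils/memutils.h"

    /*
     * Sketch of the per-ExprContext reset pattern.  "econtext_memory" stands
     * in for whatever field the real ExprContext carries for its private
     * context; the expression-evaluation call itself is elided.
     */
    static MemoryContext econtext_memory = NULL;

    static void
    plan_node_init(void)
    {
        /* One small private context per ExprContext/plan node; contexts
         * are cheap, so this is fine even for deep plan trees. */
        econtext_memory = AllocSetContextCreate(CurrentMemoryContext,
                                                "PerTupleExprMemory",
                                                ALLOCSET_DEFAULT_MINSIZE,
                                                ALLOCSET_DEFAULT_INITSIZE,
                                                ALLOCSET_DEFAULT_MAXSIZE);
    }

    static void
    plan_node_fetch_one_tuple(void)
    {
        MemoryContext oldcxt;

        /* Start of a tuple cycle: reclaim everything the previous cycle's
         * expression evaluations palloc'd.  Resetting is cheap. */
        MemoryContextReset(econtext_memory);

        /* Evaluate quals and targetlist with the private context current,
         * so transient pallocs are recovered by the next reset. */
        oldcxt = MemoryContextSwitchTo(econtext_memory);
        /* ... expression evaluation happens here ... */
        MemoryContextSwitchTo(oldcxt);

        /* Values built in econtext_memory stay valid until the next reset,
         * i.e. long enough to return the result tuple to the parent node. */
    }
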
+Note that this design gives each plan node its own expression-eval memory
+context. This appears necessary to handle nested joins properly, since
+an outer plan node might need to retain expression results it has computed
+while obtaining the next tuple from an inner node --- but the inner node
+might execute many tuple cycles and many expressions before returning a
+tuple. The inner node must be able to reset its own expression context
+more often than once per outer tuple cycle. Fortunately, memory contexts
+are cheap enough that giving one to each plan node doesn't seem like a
+problem.
+
+A problem with running index accesses and sorts in TransactionMemoryContext
+is that these operations invoke datatype-specific comparison functions,
+and if the comparators leak any memory then that memory won't be recovered
+till end of query. The comparator functions all return bool or int32,
+so there's no problem with their result data, but there could be a problem
+with leakage of internal temporary data. In particular, comparator
+functions that operate on TOAST-able data types will need to be careful
+not to leak detoasted versions of their inputs. This is annoying, but
+it appears a lot easier to make the comparators conform than to fix the
+index and sort routines, so that's what I propose to do for 7.1. Further
+cleanup can be left for another day.
 
 There will be some special cases, such as aggregate functions. nodeAgg.c
 needs to remember the results of evaluation of aggregate transition
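
As an illustration of the comparator discipline proposed in the last added
paragraph of the hunk above, here is a sketch of a TOAST-aware comparison
function in the fmgr style of that era. The function name is hypothetical;
PG_GETARG_TEXT_P() fetches (detoasting if necessary) a text argument, and
PG_FREE_IF_COPY() frees the detoasted copy, if one was made, before returning.

    #include "postgres.h"
    #include "fmgr.h"

    /*
     * Hypothetical comparator for a varlena ("text"-like) type.
     * PG_GETARG_TEXT_P detoasts the argument if needed; PG_FREE_IF_COPY
     * pfrees the detoasted copy (and only the copy), so nothing is left
     * behind in the long-lived context during an index build or sort.
     */
    PG_FUNCTION_INFO_V1(hypothetical_textcmp);

    Datum
    hypothetical_textcmp(PG_FUNCTION_ARGS)
    {
        text   *a = PG_GETARG_TEXT_P(0);
        text   *b = PG_GETARG_TEXT_P(1);
        int     len_a = VARSIZE(a) - VARHDRSZ;
        int     len_b = VARSIZE(b) - VARHDRSZ;
        int32   result;

        result = memcmp(VARDATA(a), VARDATA(b), Min(len_a, len_b));
        if (result == 0)
            result = (len_a < len_b) ? -1 : (len_a > len_b) ? 1 : 0;

        /* Free detoasted copies, if any were made. */
        PG_FREE_IF_COPY(a, 0);
        PG_FREE_IF_COPY(b, 1);

        PG_RETURN_INT32(result);
    }
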
@@ -365,15 +388,3 @@ chunk of memory is allocated in (by checking the required standard chunk
|
365 | 388 | header), so nodeAgg can determine whether or not it's safe to reset
|
366 | 389 | its working context; it doesn't have to rely on the transition function
|
367 | 390 | to do what it's expecting.
|
368 |
| - |
369 |
| -It might be that the executor per-run contexts described above should |
370 |
| -be tied directly to executor "EState" nodes, that is, one context per |
371 |
| -EState. I'm not real clear on the lifespan of EStates or the situations |
372 |
| -where we have just one or more than one, so I'm not sure. Comments? |
373 |
| - |
374 |
| -It would probably be possible to adapt the existing "portal" memory |
375 |
| -management mechanism to do what we need. I am instead proposing setting |
376 |
| -up a totally new mechanism, because the portal code strikes me as |
377 |
| -extremely crufty and unwieldy. It may be that we can eventually remove |
378 |
| -portals entirely, or perhaps reimplement them with this mechanism |
379 |
| -underneath. |
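
The chunk-ownership test mentioned in the context lines of the final hunk can be
sketched roughly as below, assuming the MemoryContextContains() helper (the
backend routine that inspects the standard chunk header to report which context
owns a chunk) and the existing datumCopy() routine; the surrounding names
(preserve_trans_value, workcontext, and so on) are placeholders rather than
actual nodeAgg code.

    #include "postgres.h"
    #include "utils/datum.h"
    #include "utils/memutils.h"

    /*
     * Sketch of the nodeAgg-style safety check: before resetting its working
     * context, make sure the current transition value does not live in that
     * context.  If it does (the transition function handed back a pointer
     * into memory we are about to free), copy it into the longer-lived
     * aggregate context first.
     */
    static Datum
    preserve_trans_value(Datum transValue,
                         bool transtypeByVal, int transtypeLen,
                         MemoryContext aggcontext, MemoryContext workcontext)
    {
        if (!transtypeByVal &&
            MemoryContextContains(workcontext, DatumGetPointer(transValue)))
        {
            /* Copy the value out of the context we are about to reset. */
            MemoryContext oldcxt = MemoryContextSwitchTo(aggcontext);

            transValue = datumCopy(transValue, transtypeByVal, transtypeLen);
            MemoryContextSwitchTo(oldcxt);
        }

        MemoryContextReset(workcontext);
        return transValue;
    }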