|
1 |
| -$Header: /cvsroot/pgsql/src/backend/utils/mmgr/README,v 1.3 2001/02/15 21:38:26 tgl Exp $ |
| 1 | +$Header: /cvsroot/pgsql/src/backend/utils/mmgr/README,v 1.4 2003/04/30 19:04:12 tgl Exp $ |
2 | 2 |
|
3 | 3 | Notes about memory allocation redesign
|
4 | 4 | --------------------------------------
|
@@ -110,109 +110,121 @@ children of a given context, but don't reset or delete that context
|
110 | 110 | itself".
|
111 | 111 |
|
112 | 112 |
|
113 |
| -Top-level contexts |
114 |
| ------------------- |
| 113 | +Globally known contexts |
| 114 | +----------------------- |
115 | 115 |
|
116 |
| -There will be several top-level contexts --- these contexts have no parent |
117 |
| -and will be referenced by global variables. At any instant the system may |
| 116 | +There will be several widely-known contexts that will typically be |
| 117 | +referenced through global variables. At any instant the system may |
118 | 118 | contain many additional contexts, but all other contexts should be direct
|
119 |
| -or indirect children of one of the top-level contexts to ensure they are |
120 |
| -not leaked in event of an error. I presently envision these top-level |
121 |
| -contexts: |
122 |
| - |
123 |
| -TopMemoryContext --- allocating here is essentially the same as "malloc", |
124 |
| -because this context will never be reset or deleted. This is for stuff |
125 |
| -that should live forever, or for stuff that you know you will delete |
126 |
| -at the appropriate time. An example is fd.c's tables of open files, |
127 |
| -as well as the context management nodes for memory contexts themselves. |
128 |
| -Avoid allocating stuff here unless really necessary, and especially |
129 |
| -avoid running with CurrentMemoryContext pointing here. |
| 119 | +or indirect children of one of these contexts to ensure they are not |
| 120 | +leaked in event of an error. |
| 121 | + |
| 122 | +TopMemoryContext --- this is the actual top level of the context tree; |
| 123 | +every other context is a direct or indirect child of this one. Allocating |
| 124 | +here is essentially the same as "malloc", because this context will never |
| 125 | +be reset or deleted. This is for stuff that should live forever, or for |
| 126 | +stuff that the controlling module will take care of deleting at the |
| 127 | +appropriate time. An example is fd.c's tables of open files, as well as |
| 128 | +the context management nodes for memory contexts themselves. Avoid |
| 129 | +allocating stuff here unless really necessary, and especially avoid |
| 130 | +running with CurrentMemoryContext pointing here. |
130 | 131 |
|
131 | 132 | PostmasterContext --- this is the postmaster's normal working context.
|
132 | 133 | After a backend is spawned, it can delete PostmasterContext to free its
|
133 | 134 | copy of memory the postmaster was using that it doesn't need. (Anything
|
134 | 135 | that has to be passed from postmaster to backends will be passed in
|
135 |
| -TopMemoryContext. The postmaster will probably have only TopMemoryContext, |
136 |
| -PostmasterContext, and possibly ErrorContext --- the remaining top-level |
137 |
| -contexts will be set up in each backend during startup.) |
| 136 | +TopMemoryContext. The postmaster will have only TopMemoryContext, |
| 137 | +PostmasterContext, and ErrorContext --- the remaining top-level contexts |
| 138 | +will be set up in each backend during startup.) |
138 | 139 |
|
139 | 140 | CacheMemoryContext --- permanent storage for relcache, catcache, and
|
140 | 141 | related modules. This will never be reset or deleted, either, so it's
|
141 | 142 | not truly necessary to distinguish it from TopMemoryContext. But it
|
142 | 143 | seems worthwhile to maintain the distinction for debugging purposes.
|
143 |
| -(Note: CacheMemoryContext may well have child-contexts with shorter |
144 |
| -lifespans. For example, a child context seems like the best place to |
145 |
| -keep the subsidiary storage associated with a relcache entry; that way |
146 |
| -we can free rule parsetrees and so forth easily, without having to depend |
147 |
| -on constructing a reliable version of freeObject().) |
148 |
| - |
149 |
| -QueryContext --- this is where the storage holding a received query string |
150 |
| -is kept, as well as storage that should live as long as the query string, |
151 |
| -notably the parsetree constructed from it. This context will be reset at |
152 |
| -the top of each cycle of the outer loop of PostgresMain, thereby freeing |
153 |
| -the old query and parsetree. We must keep this separate from |
154 |
| -TopTransactionContext because a query string might need to live either a |
155 |
| -longer or shorter time than a transaction, depending on whether it |
156 |
| -contains begin/end commands or not. (This'll also fix the nasty bug that |
157 |
| -"vacuum; anything else" crashes if submitted as a single query string, |
158 |
| -because vacuum's xact commit frees the memory holding the parsetree...) |
| 144 | +(Note: CacheMemoryContext will have child-contexts with shorter lifespans. |
| 145 | +For example, a child context is the best place to keep the subsidiary |
| 146 | +storage associated with a relcache entry; that way we can free rule |
| 147 | +parsetrees and so forth easily, without having to depend on constructing |
| 148 | +a reliable version of freeObject().) |
| 149 | + |
| 150 | +MessageContext --- this context holds the current command message from the |
| 151 | +frontend, as well as any derived storage that need only live as long as |
| 152 | +the current message (for example, in simple-Query mode the parse and plan |
| 153 | +trees can live here). This context will be reset, and any children |
| 154 | +deleted, at the top of each cycle of the outer loop of PostgresMain. This |
| 155 | +is kept separate from per-transaction and per-portal contexts because a |
| 156 | +query string might need to live either a longer or shorter time than any |
| 157 | +single transaction or portal. |
159 | 158 |
|
160 | 159 | TopTransactionContext --- this holds everything that lives until end of
|
161 | 160 | transaction (longer than one statement within a transaction!). An example
|
162 | 161 | of what has to be here is the list of pending NOTIFY messages to be sent
|
163 | 162 | at xact commit. This context will be reset, and all its children deleted,
|
164 |
| -at conclusion of each transaction cycle. Note: presently I envision that |
165 |
| -this context will NOT be cleared immediately upon error; its contents |
166 |
| -will survive anyway until the transaction block is exited by |
167 |
| -COMMIT/ROLLBACK. This seems appropriate since we want to move in the |
168 |
| -direction of allowing a transaction to continue processing after an error. |
169 |
| - |
170 |
| -TransactionCommandContext --- this is really a child of |
171 |
| -TopTransactionContext, not a top-level context, but we'll probably store a |
172 |
| -link to it in a global variable anyway for convenience. All the memory |
173 |
| -allocated during planning and execution lives here or in a child context. |
174 |
| -This context is deleted at statement completion, whether normal completion |
175 |
| -or error abort. |
176 |
| - |
177 |
| -ErrorContext --- this permanent context will be switched into |
178 |
| -for error recovery processing, and then reset on completion of recovery. |
179 |
| -We'll arrange to have, say, 8K of memory available in it at all times. |
180 |
| -In this way, we can ensure that some memory is available for error |
181 |
| -recovery even if the backend has run out of memory otherwise. This should |
182 |
| -allow out-of-memory to be treated as a normal ERROR condition, not a FATAL |
183 |
| -error. |
184 |
| - |
185 |
| -If we ever implement nested transactions, there may need to be some |
186 |
| -additional levels of transaction-local contexts between |
187 |
| -TopTransactionContext and TransactionCommandContext, but that's beyond |
188 |
| -the scope of this proposal. |
| 163 | +at conclusion of each transaction cycle. Note: this context is NOT |
| 164 | +cleared immediately upon error; its contents will survive until the |
| 165 | +transaction block is exited by COMMIT/ROLLBACK. |
| 166 | +(If we ever implement nested transactions, TopTransactionContext may need |
| 167 | +to be split into a true "top" pointer and a "current transaction" pointer.) |
| 168 | + |
| 169 | +QueryContext --- this is not actually a separate context, but a global |
| 170 | +variable pointing to the context that holds the current command's parse |
| 171 | +and plan trees. (In simple-Query mode this points to MessageContext; |
| 172 | +when executing a prepared statement it will point at the prepared |
| 173 | +statement's private context.) Generally it is not appropriate for any |
| 174 | +code to use QueryContext as an allocation target --- from the point of |
| 175 | +view of any code that would be referencing the QueryContext variable, |
| 176 | +it's a read-only context. |
| 177 | + |
| 178 | +PortalContext --- this is not actually a separate context either, but a |
| 179 | +global variable pointing to the per-portal context of the currently active |
| 180 | +execution portal. This can be used if it's necessary to allocate storage |
| 181 | +that will live just as long as the execution of the current portal requires. |
| 182 | + |
| 183 | +ErrorContext --- this permanent context will be switched into for error |
| 184 | +recovery processing, and then reset on completion of recovery. We'll |
| 185 | +arrange to have, say, 8K of memory available in it at all times. In this |
| 186 | +way, we can ensure that some memory is available for error recovery even |
| 187 | +if the backend has run out of memory otherwise. This allows out-of-memory |
| 188 | +to be treated as a normal ERROR condition, not a FATAL error. |
| 189 | + |
| 190 | + |
| 191 | +Contexts for prepared statements and portals |
| 192 | +-------------------------------------------- |
| 193 | + |
| 194 | +A prepared-statement object has an associated private context, in which |
| 195 | +the parse and plan trees for its query are stored. Because these trees |
| 196 | +are read-only to the executor, the prepared statement can be re-used many |
| 197 | +times without further copying of these trees. QueryContext points at this |
| 198 | +private context while executing any portal built from the prepared |
| 199 | +statement. |
| 200 | + |
| 201 | +An execution-portal object has a private context that is referenced by |
| 202 | +PortalContext when the portal is active. In the case of a portal created |
| 203 | +by DECLARE CURSOR, this private context contains the query parse and plan |
| 204 | +trees (there being no other object that can hold them). Portals created |
| 205 | +from prepared statements simply reference the prepared statements' trees, |
| 206 | +and won't actually need any storage allocated in their private contexts. |
189 | 207 |
|
190 | 208 |
|
191 | 209 | Transient contexts during execution
|
192 | 210 | -----------------------------------
|
193 | 211 |
|
194 |
| -The planner will probably have a transient context in which it stores |
195 |
| -pathnodes; this will allow it to release the bulk of its temporary space |
196 |
| -usage (which can be a lot, for large joins) at completion of planning. |
197 |
| -The completed plan tree will be in TransactionCommandContext. |
| 212 | +When creating a prepared statement, the parse and plan trees will be built |
| 213 | +in a temporary context that's a child of MessageContext (so that it will |
| 214 | +go away automatically upon error). On success, the finished plan is |
| 215 | +copied to the prepared statement's private context, and the temp context |
| 216 | +is released; this allows planner temporary space to be recovered before |
| 217 | +execution begins. (In simple-Query mode we'll not bother with the extra |
| 218 | +copy step, so the planner temp space stays around till end of query.) |
198 | 219 |
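The prepared-statement flow described above (build in a temporary child of MessageContext, copy the result to the statement's private context, release the temporary context) can be illustrated with a toy context tree. This is only a sketch under stated assumptions: ToyContext and the toy_* functions are hypothetical stand-ins, not the PostgreSQL API, and the private context is assumed to hang directly under the top context.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: hypothetical names, NOT the PostgreSQL types or functions. */
typedef struct Chunk { struct Chunk *next; } Chunk;

typedef struct ToyContext {
    const char *name;
    struct ToyContext *parent;
    struct ToyContext *first_child;
    struct ToyContext *next_sibling;
    Chunk *chunks;                      /* everything allocated in this context */
} ToyContext;

static int live_contexts = 0;           /* bookkeeping for the demo */

static void toy_delete(ToyContext *cxt);

static ToyContext *toy_create(ToyContext *parent, const char *name)
{
    ToyContext *cxt = calloc(1, sizeof(ToyContext));
    if (cxt == NULL) abort();
    cxt->name = name;
    cxt->parent = parent;
    if (parent != NULL) {               /* link into the parent's child list */
        cxt->next_sibling = parent->first_child;
        parent->first_child = cxt;
    }
    live_contexts++;
    return cxt;
}

static void *toy_alloc(ToyContext *cxt, size_t size)
{
    Chunk *c = malloc(sizeof(Chunk) + size);
    if (c == NULL) abort();
    c->next = cxt->chunks;
    cxt->chunks = c;
    return (void *) (c + 1);
}

/* Reset: delete all children, then free this context's own allocations. */
static void toy_reset(ToyContext *cxt)
{
    while (cxt->first_child != NULL)
        toy_delete(cxt->first_child);
    while (cxt->chunks != NULL) {
        Chunk *next = cxt->chunks->next;
        free(cxt->chunks);
        cxt->chunks = next;
    }
}

/* Delete: reset, unlink from the parent, free the context node itself. */
static void toy_delete(ToyContext *cxt)
{
    toy_reset(cxt);
    if (cxt->parent != NULL) {
        ToyContext **link = &cxt->parent->first_child;
        while (*link != cxt)
            link = &(*link)->next_sibling;
        *link = cxt->next_sibling;
    }
    free(cxt);
    live_contexts--;
}

static char *toy_strdup(ToyContext *cxt, const char *s)
{
    char *copy = toy_alloc(cxt, strlen(s) + 1);
    strcpy(copy, s);
    return copy;
}

/* Returns 1 if the copied "plan" outlives the temp and message contexts. */
int toy_demo(void)
{
    ToyContext *top = toy_create(NULL, "Top");
    ToyContext *message = toy_create(top, "Message");
    ToyContext *stmt = toy_create(top, "PreparedStmt");   /* assumed placement */
    ToyContext *temp = toy_create(message, "PlanTemp");

    char *scratch = toy_strdup(temp, "plan");   /* built in the temp context */
    char *plan = toy_strdup(stmt, scratch);     /* copied out on success */
    toy_delete(temp);                           /* planner scratch recovered */
    toy_reset(message);                         /* next message-loop cycle */

    int ok = strcmp(plan, "plan") == 0 && live_contexts == 3;
    toy_delete(top);                            /* tears down the whole tree */
    return ok && live_contexts == 0;
}
```

If an error interrupted the sketch before the copy, resetting the message context would still delete the temporary child, which is the point of parenting it there.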
|
199 | 220 | The top-level executor routines, as well as most of the "plan node"
|
200 |
| -execution code, will normally run in a context with command lifetime. |
201 |
| -(This will be TransactionCommandContext for normal queries, but when |
202 |
| -executing a cursor, it will be a context associated with the cursor.) |
203 |
| -Most of the memory allocated in these routines is intended to live until |
204 |
| -end of query, so this is appropriate for those purposes. We already have |
205 |
| -a mechanism --- "tuple table slots" --- for avoiding leakage of tuples, |
206 |
| -which is the major kind of short-lived data handled by these routines. |
207 |
| -This still leaves a certain amount of explicit pfree'ing needed by plan |
208 |
| -node code, but that code largely exists already and is probably not worth |
209 |
| -trying to remove. I looked at the possibility of running in a shorter- |
210 |
| -lived context (such as a context that gets reset per-tuple), but this |
211 |
| -seems fairly impractical. The biggest problem with it is that code in |
212 |
| -the index access routines, as well as some other complex algorithms like |
213 |
| -tuplesort.c, assumes that palloc'd storage will live across tuples. |
214 |
| -For example, rtree uses a palloc'd state stack to keep track of an index |
215 |
| -scan. |
| 221 | +execution code, will normally run in a context that is created by |
| 222 | +ExecutorStart and destroyed by ExecutorEnd; this context also holds the |
| 223 | +"plan state" tree built during ExecutorStart. Most of the memory |
| 224 | +allocated in these routines is intended to live until end of query, |
| 225 | +so this is appropriate for those purposes. The executor's top context |
| 226 | +is a child of PortalContext, that is, the per-portal context of the |
| 227 | +portal that represents the query's execution. |
216 | 228 |
|
217 | 229 | The main improvement needed in the executor is that expression evaluation
|
218 | 230 | --- both for qual testing and for computation of targetlist entries ---
|
@@ -277,7 +289,7 @@ be released on error. Currently it does that through a "portal",
|
277 | 289 | which is essentially a child context of TopMemoryContext. While that
|
278 | 290 | way still works, it's ugly since xact abort needs special processing
|
279 | 291 | to delete the portal. Better would be to use a context that's a child
|
280 |
| -of QueryContext and hence is certain to go away as part of normal |
| 292 | +of PortalContext and hence is certain to go away as part of normal |
281 | 293 | processing. (Eventually we might have an even better solution from
|
282 | 294 | nested transactions, but this'll do fine for now.)
|
283 | 295 |
|
@@ -371,12 +383,14 @@ the relcache's per-relation contexts).
|
371 | 383 | Also, it will be possible to specify a minimum context size. If this
|
372 | 384 | value is greater than zero then a block of that size will be grabbed
|
373 | 385 | immediately upon context creation, and cleared but not released during
|
374 |
| -context resets. This feature is needed for ErrorContext (see above). |
375 |
| -It is also useful for per-tuple contexts, which will be reset frequently |
376 |
| -and typically will not allocate very much space per tuple cycle. We can |
377 |
| -save a lot of unnecessary malloc traffic if these contexts hang onto one |
378 |
| -allocation block rather than releasing and reacquiring the block on |
379 |
| -each tuple cycle. |
| 386 | +context resets. This feature is needed for ErrorContext (see above), |
| 387 | +but will most likely not be used for other contexts. |
| 388 | + |
| 389 | +We expect that per-tuple contexts will be reset frequently and typically |
| 390 | +will not allocate very much space per tuple cycle. To make this usage |
| 391 | +pattern cheap, the first block allocated in a context is not given |
| 392 | +back to malloc() during reset, but just cleared. This avoids malloc |
| 393 | +thrashing. |
380 | 394 |
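The first-block behavior described above can be sketched with a toy block allocator. This is a minimal illustration, not the actual allocation-set code: ToyArena, TOY_BLOCK_SIZE and the toy_arena_* functions are hypothetical names, and "cleared" is modeled simply as marking the block empty.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BLOCK_SIZE 1024             /* hypothetical fixed block size */

/* Toy model only: hypothetical names, NOT the PostgreSQL implementation. */
typedef struct ToyBlock {
    struct ToyBlock *next;
    size_t used;
    char data[TOY_BLOCK_SIZE];
} ToyBlock;

typedef struct ToyArena {
    ToyBlock *first;                    /* oldest block; kept across resets */
    ToyBlock *current;                  /* block currently being filled */
    int mallocs;                        /* number of blocks obtained from malloc */
} ToyArena;

static ToyBlock *toy_new_block(ToyArena *a)
{
    ToyBlock *b = malloc(sizeof(ToyBlock));
    if (b == NULL) abort();
    b->next = NULL;
    b->used = 0;
    a->mallocs++;
    return b;
}

void *toy_arena_alloc(ToyArena *a, size_t size)
{
    size = (size + 7) & ~(size_t) 7;    /* keep allocations 8-byte aligned */
    if (size > TOY_BLOCK_SIZE)
        return NULL;                    /* toy limitation: no oversize chunks */
    if (a->current == NULL) {
        a->first = a->current = toy_new_block(a);
    } else if (a->current->used + size > TOY_BLOCK_SIZE) {
        a->current->next = toy_new_block(a);
        a->current = a->current->next;
    }
    void *p = a->current->data + a->current->used;
    a->current->used += size;
    return p;
}

/* Reset: free every block except the first, which is only marked empty. */
void toy_arena_reset(ToyArena *a)
{
    if (a->first == NULL)
        return;
    ToyBlock *b = a->first->next;
    while (b != NULL) {
        ToyBlock *next = b->next;
        free(b);
        b = next;
    }
    a->first->next = NULL;
    a->first->used = 0;
    a->current = a->first;
}

void toy_arena_destroy(ToyArena *a)
{
    toy_arena_reset(a);
    free(a->first);
    a->first = a->current = NULL;
}

/* Simulates many per-tuple cycles; returns how many blocks were malloc'd. */
int toy_arena_demo(void)
{
    ToyArena a = {NULL, NULL, 0};
    for (int tuple = 0; tuple < 1000; tuple++) {
        char *buf = toy_arena_alloc(&a, 64);
        memset(buf, 0, 64);
        toy_arena_reset(&a);            /* end of tuple cycle */
    }
    int n = a.mallocs;
    toy_arena_destroy(&a);
    return n;
}
```

In this sketch a thousand allocate-and-reset cycles touch malloc only once, because the reset hands the same first block back to the next cycle.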
|
381 | 395 |
|
382 | 396 | Other notes
|
|