- Proposal for function-manager redesign 24-May-2000
+ Proposal for function-manager redesign 19-Nov-2000
--------------------------------------

We know that the existing mechanism for calling Postgres functions needs
@@ -24,10 +24,6 @@ can be done on an incremental file-by-file basis --- we won't need a
written in the old style can be left in place indefinitely, to provide
backward compatibility for user-written C functions.

- Note that neither the old function manager nor the redesign are intended
- to handle functions that accept or return sets. Those sorts of functions
- need to be handled by special querytree structures.
-

Changes in pg_proc (system data about a function)
-------------------------------------------------
@@ -37,7 +33,8 @@ This is a boolean value which will be TRUE if the function is "strict",
that is it always returns NULL when any of its inputs are NULL. The
function manager will check this field and skip calling the function when
it's TRUE and there are NULL inputs. This allows us to remove explicit
- NULL-value tests from many functions that currently need them. A function
+ NULL-value tests from many functions that currently need them (not to
+ mention fixing many more that need them but don't have them). A function
that is not marked "strict" is responsible for checking whether its inputs
are NULL or not. Most builtin functions will be marked "strict".
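The strictness check described above can be sketched as follows. This is a minimal illustration, not the fmgr code: SketchCallInfo, SketchFunction, sketch_call, sketch_add and SKETCH_MAX_ARGS are hypothetical stand-ins, the isnull result flag is an assumption of the sketch, and the strictness flag is flattened into the call struct for brevity (in the proposal it lives in FmgrInfo).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_ARGS 4	/* hypothetical; stands in for FUNC_MAX_ARGS */

/* Simplified stand-in for the per-call data handed to a function. */
typedef struct SketchCallInfo
{
	bool		fn_strict;	/* copy of the pg_proc strictness flag */
	bool		isnull;		/* assumed result flag: true means NULL result */
	short		nargs;		/* number of arguments actually passed */
	long		arg[SKETCH_MAX_ARGS];		/* argument values */
	bool		argnull[SKETCH_MAX_ARGS];	/* true where an argument is NULL */
} SketchCallInfo;

typedef long (*SketchFunction) (SketchCallInfo *fcinfo);

/* Call fn unless it is strict and at least one input is NULL. */
static long
sketch_call(SketchFunction fn, SketchCallInfo *fcinfo)
{
	assert(fcinfo->nargs >= 0 && fcinfo->nargs <= SKETCH_MAX_ARGS);

	if (fcinfo->fn_strict)
	{
		for (short i = 0; i < fcinfo->nargs; i++)
		{
			if (fcinfo->argnull[i])
			{
				fcinfo->isnull = true;	/* NULL in => NULL out */
				return 0;	/* value is meaningless when isnull is set */
			}
		}
	}

	fcinfo->isnull = false;
	return fn(fcinfo);
}

/* A strict function needs no NULL tests of its own. */
static long
sketch_add(SketchCallInfo *fcinfo)
{
	return fcinfo->arg[0] + fcinfo->arg[1];
}
```

A caller fills in the struct, invokes sketch_call(sketch_add, &fcinfo), and inspects isnull before using the returned value.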

@@ -67,7 +64,9 @@ typedef struct
    Oid fn_oid; /* OID of function (NOT of handler, if any) */
    short fn_nargs; /* 0..FUNC_MAX_ARGS, or -1 if variable arg count */
    bool fn_strict; /* function is "strict" (NULL in => NULL out) */
+     bool fn_retset; /* function returns a set (over multiple calls) */
    void *fn_extra; /* extra space for use by handler */
+     MemoryContext fn_mcxt; /* memory context to store fn_extra in */
} FmgrInfo;

For an ordinary built-in function, fn_addr is just the address of the C
@@ -79,8 +78,9 @@ to denote a not-yet-initialized FmgrInfo struct. fn_extra will always
be NULL when an FmgrInfo is first filled by the function lookup code, but
a function handler could set it to avoid making repeated lookups of its
own when the same FmgrInfo is used repeatedly during a query.) fn_nargs
- is the number of arguments expected by the function, and fn_strict is
- its strictness flag.
+ is the number of arguments expected by the function, fn_strict is its
+ strictness flag, and fn_retset shows whether it returns a set; all of
+ these values come from the function's pg_proc entry.

FmgrInfo already exists in the current code, but has fewer fields. This
change should be transparent at the source-code level.
@@ -109,15 +109,17 @@ context is NULL for an "ordinary" function call, but may point to additional
info when the function is called in certain contexts. (For example, the
trigger manager will pass information about the current trigger event here.)
If context is used, it should point to some subtype of Node; the particular
- kind of context can then be indicated by the node type field. (A callee
- should always check the node type before assuming it knows what kind of
- context is being passed.) fmgr itself puts no other restrictions on the use
- of this field.
+ kind of context is indicated by the node type field. (A callee should
+ always check the node type before assuming it knows what kind of context is
+ being passed.) fmgr itself puts no other restrictions on the use of this
+ field.

resultinfo is NULL when calling any function from which a simple Datum
result is expected. It may point to some subtype of Node if the function
- returns more than a Datum. Like the context field, resultinfo is a hook
- for expansion; fmgr itself doesn't constrain the use of the field.
+ returns more than a Datum. (For example, resultinfo is used when calling a
+ function that returns a set, as discussed below.) Like the context field,
+ resultinfo is a hook for expansion; fmgr itself doesn't constrain the use
+ of the field.

nargs, arg[], and argnull[] hold the arguments being passed to the function.
Notice that all the arguments passed to a function (as well as its result
@@ -257,27 +259,15 @@ types. Modules or header files that define specialized SQL datatypes
(eg, timestamp) should define appropriate macros for those types, so that
functions manipulating the types can be coded in the standard style.

- For non-primitive data types (particularly variable-length types) it
- probably won't be very practical to hide the pass-by-reference nature of
- the data type, so the PG_GETARG and PG_RETURN macros for those types
- probably won't do more than DatumGetPointer/PointerGetDatum plus the
- appropriate typecast. Functions returning such types will need to
- palloc() their result space explicitly. I recommend naming the GETARG
- and RETURN macros for such types to end in "_P", as a reminder that they
+ For non-primitive data types (particularly variable-length types) it won't
+ be very practical to hide the pass-by-reference nature of the data type,
+ so the PG_GETARG and PG_RETURN macros for those types won't do much more
+ than DatumGetPointer/PointerGetDatum plus the appropriate typecast (but see
+ TOAST discussion, below). Functions returning such types will need to
+ palloc() their result space explicitly. I recommend naming the GETARG and
+ RETURN macros for such types to end in "_P", as a reminder that they
produce or take a pointer. For example, PG_GETARG_TEXT_P yields "text *".

- For TOAST-able data types, the PG_GETARG macro will deliver a de-TOASTed
- data value. There might be a few cases where the still-toasted value is
- wanted, but I am having a hard time coming up with examples. For the
- moment I'd say that any such code could use a lower-level macro that is
- just ((struct varlena *) DatumGetPointer(fcinfo->arg[n])).
-
- Note: the above examples assume that arguments will be counted starting at
- zero. We could have the ARG macros subtract one from the argument number,
- so that arguments are counted starting at one. I'm not sure if that would be
- more or less confusing. Does anyone have a strong feeling either way about
- it?
-
When a function needs to access fcinfo->flinfo or one of the other auxiliary
fields of FunctionCallInfo, it should just do it. I doubt that providing
syntactic-sugar macros for these cases is useful.
@@ -319,10 +309,6 @@ that this style of coding cannot pass a NULL input value nor cope with
a NULL result (it couldn't before, either!). We can make the helper
routines elog an error if they see that the function returns a NULL.

- (Note: direct calls like this will have to be changed at the same time
- that their called routines are changed to the new style. But that will
- still be a lot less of a constraint than a "big bang" conversion.)
-
When invoking a function that has a known argument signature, we have
usually written either
    result = fmgr(targetfuncOid, ... args ... );
@@ -349,6 +335,68 @@ have to change in the first step of implementation, but they can
continue to support the same external appearance.


+ Support for TOAST-able data types
+ ---------------------------------
+
+ For TOAST-able data types, the PG_GETARG macro will deliver a de-TOASTed
+ data value. There might be a few cases where the still-toasted value is
+ wanted, but the vast majority of cases want the de-toasted result, so
+ that will be the default. To get the argument value without causing
+ de-toasting, use PG_GETARG_RAW_VARLENA_P(n).
+
+ Some functions require a modifiable copy of their input values. In these
+ cases, it's silly to do an extra copy step if we copied the data anyway
+ to de-TOAST it. Therefore, each toastable datatype has an additional
+ fetch macro, for example PG_GETARG_TEXT_P_COPY(n), which delivers a
+ guaranteed-fresh copy, combining this with the detoasting step if possible.
+
+ There is also a PG_FREE_IF_COPY(ptr,n) macro, which pfree's the given
+ pointer if and only if it is different from the original value of the n'th
+ argument. This can be used to free the de-toasted value of the n'th
+ argument, if it was actually de-toasted. Currently, doing this is not
+ necessary for the majority of functions because the core backend code
+ releases temporary space periodically, so that memory leaked in function
+ execution isn't a big problem. However, as of 7.1 memory leaks in
+ functions that are called by index searches will not be cleaned up until
+ end of transaction. Therefore, functions that are listed in pg_amop or
+ pg_amproc should be careful not to leak detoasted copies, and so these
+ functions do need to use PG_FREE_IF_COPY() for toastable inputs.
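The "free only if it is a copy" rule can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: sketch_detoast pretends that a string starting with '*' is a toasted value, SKETCH_FREE_IF_COPY compares against the original pointer passed in explicitly (rather than looking up the n'th argument), and malloc/free stand in for palloc/pfree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int	sketch_frees = 0;	/* counts how often a copy was released */

/*
 * Toy "detoast": a leading '*' marks a toasted value, which is expanded
 * into a fresh copy.  Plain values are handed back unchanged.
 */
static char *
sketch_detoast(char *value)
{
	assert(value != NULL);

	if (value[0] != '*')
		return value;			/* same pointer: nothing was copied */

	size_t		len = strlen(value + 1) + 1;
	char	   *copy = malloc(len);

	if (copy == NULL)
		abort();
	memcpy(copy, value + 1, len);
	return copy;
}

/* Free ptr if and only if it differs from the original argument. */
#define SKETCH_FREE_IF_COPY(ptr, original) \
	do { \
		if ((void *) (ptr) != (void *) (original)) \
		{ \
			free(ptr); \
			sketch_frees++; \
		} \
	} while (0)

/* A function that must not leak the detoasted copy of its input. */
static size_t
sketch_length(char *arg)
{
	char	   *val = sketch_detoast(arg);
	size_t		n = strlen(val);

	SKETCH_FREE_IF_COPY(val, arg);
	return n;
}
```

Calling sketch_length on a plain value releases nothing, while calling it on a "toasted" value releases exactly the copy made by the detoast step.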
+
+ A function should never try to re-TOAST its result value; it should just
+ deliver an untoasted result that's been palloc'd in the current memory
+ context. When and if the value is actually stored into a tuple, the
+ tuple toaster will decide whether toasting is needed.
+
+
+ Functions accepting or returning sets
+ -------------------------------------
+
+ As of 7.1, Postgres has limited support for functions returning sets;
+ this is presently handled only in SELECT output expressions, and the
+ behavior is to generate a separate output tuple for each set element.
+ There is no direct support for functions accepting sets; instead, the
+ function will be called multiple times, once for each element of the
+ input set. This behavior will very likely be changed in future releases,
+ but here is how it works now:
+
+ If a function is marked in pg_proc as returning a set, then it is called
+ with fcinfo->resultinfo pointing to a node of type ReturnSetInfo. A
+ function that desires to return a set should raise an error "called in
+ context that does not accept a set result" if resultinfo is NULL or does
+ not point to a ReturnSetInfo node. ReturnSetInfo contains a single field
+ "isDone", which should be set to one of these values:
+
+     ExprSingleResult /* expression does not return a set */
+     ExprMultipleResult /* this result is an element of a set */
+     ExprEndResult /* there are no more elements in the set */
+
+ A function returning set returns one set element per call, setting
+ fcinfo->resultinfo->isDone to ExprMultipleResult for each element.
+ After all elements have been returned, the next call should set
+ isDone to ExprEndResult and return a null result. (Note it is possible
+ to return an empty set by doing this on the first call.)
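The one-element-per-call protocol can be sketched with simplified stand-in types. SketchDoneCond, SketchReturnSetInfo, SketchCallInfo, sketch_series and sketch_sum_all are hypothetical names; the sketch keeps its iteration state and an error string directly in the call struct and skips the node-type check, whereas a real function would keep state elsewhere (for instance via fn_extra) and report the error through elog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
	ExprSingleResult,		/* expression does not return a set */
	ExprMultipleResult,		/* this result is an element of a set */
	ExprEndResult			/* there are no more elements in the set */
} SketchDoneCond;

typedef struct
{
	SketchDoneCond isDone;
} SketchReturnSetInfo;

typedef struct
{
	SketchReturnSetInfo *resultinfo;	/* NULL if caller cannot take a set */
	bool		isnull;		/* assumed result flag: true means NULL result */
	int			next;		/* sketch-only iteration state */
	const char *error;		/* sketch-only substitute for raising an error */
} SketchCallInfo;

/* Returns the set {1, 2, 3}, one element per call. */
static int
sketch_series(SketchCallInfo *fcinfo)
{
	assert(fcinfo != NULL);

	if (fcinfo->resultinfo == NULL)
	{
		fcinfo->error = "called in context that does not accept a set result";
		return 0;
	}

	if (fcinfo->next < 3)
	{
		fcinfo->resultinfo->isDone = ExprMultipleResult;
		fcinfo->isnull = false;
		return ++fcinfo->next;
	}

	/* All elements delivered: signal the end with a null result. */
	fcinfo->resultinfo->isDone = ExprEndResult;
	fcinfo->isnull = true;
	return 0;
}

/* Caller side: keep calling until the function reports ExprEndResult. */
static int
sketch_sum_all(void)
{
	SketchReturnSetInfo rsi = {ExprSingleResult};
	SketchCallInfo fcinfo = {.resultinfo = &rsi};
	int			sum = 0;

	for (;;)
	{
		int			value = sketch_series(&fcinfo);

		if (rsi.isDone == ExprEndResult)
			break;
		sum += value;
	}
	return sum;
}
```

An empty set falls out of the same loop: if the very first call sets ExprEndResult, the caller consumes no elements.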
+
+
Notes about function handlers
-----------------------------

@@ -361,49 +409,91 @@ function is invoked many times. (fn_extra can only be used as a hint,
since callers are not required to re-use an FmgrInfo struct.
But in performance-critical paths they normally will do so.)

- Issue: in what context should a handler allocate memory that it intends
- to use for fn_extra data? The current palloc context when the handler
- is actually called might be considerably shorter-lived than the FmgrInfo
- struct, which would lead to dangling-pointer problems at the next use
- of the FmgrInfo. Perhaps FmgrInfo should also store a memory context
- identifier that the handler could use to allocate space of the right
- lifespan. (Having fmgr_info initialize this to CurrentMemoryContext
- should work in nearly all cases, though a few places might have to
- set it differently.) At the moment I have not done this, since the
- existing PL handlers only need to set fn_extra to point at long-lived
- structures (data in their own caches) and don't really care which
- context the FmgrInfo is in anyway.
-
- Are there any other things needed by the call handlers for PL/pgsql and
- other languages?
-
- During the conversion process, support for old-style builtin functions
- and old-style user-written C functions will be provided by appropriate
- function handlers. For example, the handler for old-style builtins
- looks roughly like fmgr_c() used to.
-
-
- System table updates
- --------------------
-
- In the initial phase, two new entries will be added to pg_language
- for language types "newinternal" and "newC", corresponding to
- builtin and dynamically-loaded functions having the new calling
- convention.
-
- There will also be a change to pg_proc to add the new "proisstrict"
- column.
-
- Then pg_proc entries will be changed from language code "internal" to
- "newinternal" piecemeal, as the associated routines are rewritten.
- (This will imply several rounds of forced initdbs as the contents of
- pg_proc change, but I think we can live with that.)
-
- The old language names "internal" and "C" will continue to refer to
- functions with the old calling convention. We should deprecate
- old-style functions because of their portability problems, but the
- support for them will only be one small function handler routine,
- so we can leave them in place for as long as necessary.
-
- The expected calling convention for PL call handlers will need to change
- all-at-once, but fortunately there are not very many of them to fix.
+ If the handler wants to allocate memory to hold fn_extra data, it should
+ NOT do so in CurrentMemoryContext, since the current context may well be
+ much shorter-lived than the context where the FmgrInfo is. Instead,
+ allocate the memory in context flinfo->fn_mcxt, or in a long-lived cache
+ context. fn_mcxt normally points at the context that was
+ CurrentMemoryContext at the time the FmgrInfo structure was created;
+ in any case it is required to be a context at least as long-lived as the
+ FmgrInfo itself.
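The lifetime rule for fn_extra can be demonstrated with a toy allocator. The names SketchContext, sketch_alloc, sketch_reset, SketchFmgrInfo, sketch_current_context and sketch_handler are all hypothetical; the toy context only tracks chunks so that a reset frees them, which is enough to show why per-call scratch data and fn_extra data belong in different contexts.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy memory context: a list of chunks that are all freed on reset. */
typedef struct SketchChunk
{
	struct SketchChunk *next;
} SketchChunk;

typedef struct SketchContext
{
	SketchChunk *chunks;
	int			live;		/* number of chunks currently allocated */
} SketchContext;

static void *
sketch_alloc(SketchContext *cxt, size_t size)
{
	SketchChunk *chunk = malloc(sizeof(SketchChunk) + size);

	if (chunk == NULL)
		abort();
	chunk->next = cxt->chunks;
	cxt->chunks = chunk;
	cxt->live++;
	return chunk + 1;		/* usable space follows the chunk header */
}

static void
sketch_reset(SketchContext *cxt)
{
	while (cxt->chunks != NULL)
	{
		SketchChunk *next = cxt->chunks->next;

		free(cxt->chunks);
		cxt->chunks = next;
	}
	cxt->live = 0;
}

/* Stand-in for the two FmgrInfo fields that matter here. */
typedef struct
{
	void	   *fn_extra;
	SketchContext *fn_mcxt;
} SketchFmgrInfo;

/* Stand-in for CurrentMemoryContext: short-lived, reset between calls. */
static SketchContext *sketch_current_context;

/* Handler that caches a call counter in fn_extra. */
static int
sketch_handler(SketchFmgrInfo *flinfo)
{
	assert(sketch_current_context != NULL);

	/* Scratch space may live in the current (short-lived) context. */
	(void) sketch_alloc(sketch_current_context, 16);

	int		   *calls = flinfo->fn_extra;

	if (calls == NULL)
	{
		/* Data reachable from fn_extra must outlive the current context. */
		calls = sketch_alloc(flinfo->fn_mcxt, sizeof(int));
		*calls = 0;
		flinfo->fn_extra = calls;
	}
	return ++(*calls);
}
```

Resetting the short-lived context between calls leaves the counter intact because it was allocated in fn_mcxt; had it been allocated in sketch_current_context, the second call would read freed memory.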
+
+
+ Telling the difference between old- and new-style functions
+ -----------------------------------------------------------
+
+ During the conversion process, we carried two different pg_language
+ entries, "internal" and "newinternal", for internal functions. The
+ function manager used the language code to distinguish which calling
+ convention to use. (Old-style internal functions were supported via
+ a function handler.) As of Nov. 2000, no old-style internal functions
+ remain, so we can drop support for them. We will remove the old "internal"
+ pg_language entry and rename "newinternal" to "internal".
+
+ The interim solution for dynamically-loaded compiled functions has been
+ similar: two pg_language entries "C" and "newC". This naming convention
+ is not desirable for the long run, and yet we cannot stop supporting
+ old-style user functions. Instead, it seems better to use just one
+ pg_language entry "C", and require the dynamically-loaded library to
+ provide additional information that identifies new-style functions.
+ This avoids compatibility problems --- for example, existing dump
+ scripts will identify PL language handlers as being in language "C",
+ which would be wrong under the "newC" convention. Also, this approach
+ should generalize more conveniently for future extensions to the function
+ interface specification.
+
+ Given a dynamically loaded function named "foo" (note that the name being
+ considered here is the link-symbol name, not the SQL-level function name),
+ the function manager will look for another function in the same dynamically
+ loaded library named "pg_finfo_foo". If this second function does not
+ exist, then foo is assumed to be called old-style, thus ensuring backwards
+ compatibility with existing libraries. If the info function does exist,
+ it is expected to have the signature
+
+     Pg_finfo_record * pg_finfo_foo (void);
+
+ The info function will be called by the fmgr, and must return a pointer
+ to a Pg_finfo_record struct. (The returned struct will typically be a
+ statically allocated constant in the dynamic-link library.) The current
+ definition of the struct is just
+
+     typedef struct {
+         int api_version;
+     } Pg_finfo_record;
+
+ where api_version is 0 to indicate old-style or 1 to indicate new-style
+ calling convention. In future releases, additional fields may be defined
+ after api_version, but these additional fields will only be used if
+ api_version is greater than 1.
+
+ These details will be hidden from the author of a dynamically loaded
+ function by using a macro. To define a new-style dynamically loaded
+ function named foo, write
+
+     PG_FUNCTION_INFO_V1(foo);
+
+     Datum
+     foo(PG_FUNCTION_ARGS)
+     {
+         ...
+     }
+
+ The function itself is written using the same conventions as for new-style
+ internal functions; you just need to add the PG_FUNCTION_INFO_V1() macro.
+ Note that old-style and new-style functions can be intermixed in the same
+ library, depending on whether or not you write a PG_FUNCTION_INFO_V1() for
+ each one.
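One plausible shape for such a macro, together with the lookup it enables, is sketched here. This is an assumption-laden illustration, not the definition shipped with Postgres: SKETCH_FUNCTION_INFO_V1, SketchInfoFunc, SketchSymbol, sketch_library, sketch_lookup and sketch_api_version are hypothetical, and a static table stands in for the dynamic linker's symbol lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
	int			api_version;	/* 0 = old-style, 1 = new-style */
} Pg_finfo_record;

/*
 * Hypothetical macro: emits pg_finfo_<funcname>(), which returns a
 * statically allocated record with api_version 1.  The trailing extern
 * declaration only exists to absorb the semicolon after the macro call.
 */
#define SKETCH_FUNCTION_INFO_V1(funcname) \
	Pg_finfo_record *pg_finfo_##funcname(void); \
	Pg_finfo_record * \
	pg_finfo_##funcname(void) \
	{ \
		static Pg_finfo_record my_finfo = { 1 }; \
		return &my_finfo; \
	} \
	extern int sketch_finfo_unused_##funcname

SKETCH_FUNCTION_INFO_V1(foo);

/* Stand-in for the symbols exported by one dynamically loaded library. */
typedef Pg_finfo_record *(*SketchInfoFunc) (void);

typedef struct
{
	const char *name;
	SketchInfoFunc func;
} SketchSymbol;

static const SketchSymbol sketch_library[] = {
	{"pg_finfo_foo", pg_finfo_foo},
};

static SketchInfoFunc
sketch_lookup(const char *symbol)
{
	size_t		n = sizeof(sketch_library) / sizeof(sketch_library[0]);

	for (size_t i = 0; i < n; i++)
	{
		if (strcmp(sketch_library[i].name, symbol) == 0)
			return sketch_library[i].func;
	}
	return NULL;
}

/* Returns the calling convention for a link symbol; 0 if no info function. */
static int
sketch_api_version(const char *funcname)
{
	char		symbol[128];

	assert(funcname != NULL);
	snprintf(symbol, sizeof(symbol), "pg_finfo_%s", funcname);

	SketchInfoFunc info = sketch_lookup(symbol);

	if (info == NULL)
		return 0;			/* no info function: assume old-style */
	return info()->api_version;
}
```

With this table, a lookup for "foo" finds pg_finfo_foo and reports version 1, while a lookup for a function without an info function falls back to old-style.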
+
+ The SQL declaration for a dynamically-loaded function is CREATE FUNCTION
+ foo ... LANGUAGE 'C' regardless of whether it is old- or new-style.
+
+ New-style dynamic functions will be invoked directly by fmgr, and will
+ therefore have the same performance as internal functions after the initial
+ pg_proc lookup overhead. Old-style dynamic functions will be invoked via
+ a handler, and will therefore have a small performance penalty.
+
+ To allow old-style dynamic functions to work safely on toastable datatypes,
+ the handler for old-style functions will automatically detoast toastable
+ arguments before passing them to the old-style function. A new-style
+ function is expected to take care of toasted arguments by using the
+ standard argument access macros defined above.