
Commit 3db72eb

Generate code for query jumbling through gen_node_support.pl
This commit changes the query jumbling code in queryjumblefuncs.c so that it is generated automatically by gen_node_support.pl, based on the node definitions in the headers of src/include/nodes/. This approach offers many advantages:

- Query jumbling is supported for all utility statements, based on the state of their parsed Nodes rather than only on their query string. This will greatly ease a future switch to normalizing the information of some DDLs, like SET or CALL (this is left unchanged here and should be part of a separate discussion). With this feature, the number of entries stored for utilities in pg_stat_statements is reduced; for example, "CHECKPOINT" and "checkpoint" now mean the same thing and share the same query ID.
- Query jumbling is documented directly in the structure definitions of the nodes. Since this code was introduced in pg_stat_statements and then moved to core, the reasons behind the choices of what should be included in the jumble have been documented only sparsely. Some explanation is added for the most relevant parts, as a start.
- Overall code reduction, and more consistency with the other parts that generate read, write and copy support from the nodes.

The query jumbling is controlled by a few new node attributes, documented in nodes/nodes.h:

- custom_query_jumble, to mark a Node as having a custom implementation.
- no_query_jumble, to ignore a Node entirely.
- query_jumble_ignore, to ignore a field in a Node.
- query_jumble_location, to mark a location in a Node, for normalization. This can apply only to int fields whose name ends in "location" (only Const as of this commit).

There should be no compatibility impact on pg_stat_statements, as the new code applies the jumbling to the same fields for each node (for one, its regression tests need no changes beyond the DROP case that now uses two differently-cased strings).

Some benchmarks of the query jumbling between HEAD and this commit for SELECT and DMLs have shown that the new code does not cause a performance regression, with close computation times for both methods. For utility queries, the new method is slower than the previous method of calculating a hash of the query string, though based on what I measured the extra cost is at the nanosecond level, which is unnoticeable even for OLTP workloads, as a query ID is calculated only once per query, after parse analysis.

Author: Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/Y5BHOUhX3zTH/ig6@paquier.xyz
1 parent 8c1cd72 commit 3db72eb

10 files changed (+503 -832 lines)

contrib/pg_stat_statements/expected/pg_stat_statements.out (+2 -1)

@@ -571,8 +571,9 @@ DROP TABLE test \;
 DROP TABLE IF EXISTS test \;
 DROP FUNCTION PLUS_ONE(INTEGER);
 NOTICE: table "test" does not exist, skipping
+-- This DROP query uses two different strings, still they count as one entry.
 DROP TABLE IF EXISTS test \;
-DROP TABLE IF EXISTS test \;
+Drop Table If Exists test \;
 DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER);
 NOTICE: table "test" does not exist, skipping
 NOTICE: table "test" does not exist, skipping

contrib/pg_stat_statements/sql/pg_stat_statements.sql (+2 -1)

@@ -265,8 +265,9 @@ CREATE INDEX test_b ON test(b);
 DROP TABLE test \;
 DROP TABLE IF EXISTS test \;
 DROP FUNCTION PLUS_ONE(INTEGER);
+-- This DROP query uses two different strings, still they count as one entry.
 DROP TABLE IF EXISTS test \;
-DROP TABLE IF EXISTS test \;
+Drop Table If Exists test \;
 DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER);
 DROP FUNCTION PLUS_TWO(INTEGER);
 

src/backend/nodes/README (+1)

@@ -51,6 +51,7 @@ FILES IN THIS DIRECTORY (src/backend/nodes/)
 readfuncs.c - convert text representation back to a node tree (*)
 makefuncs.c - creator functions for some common node types
 nodeFuncs.c - some other general-purpose manipulation functions
+queryjumblefuncs.c - compute a node tree for query jumbling (*)
 
 (*) - Most functions in these files are generated by
 gen_node_support.pl and #include'd there.

src/backend/nodes/gen_node_support.pl (+113 -1)

@@ -121,6 +121,8 @@ sub elem
 my @no_copy;
 # node types we don't want equal support for
 my @no_equal;
+# node types we don't want jumble support for
+my @no_query_jumble;
 # node types we don't want read support for
 my @no_read;
 # node types we don't want read/write support for
@@ -155,12 +157,13 @@ sub elem
 # This is a regular node, but we skip parsing it from its header file
 # since we won't use its internal structure here anyway.
 push @node_types, qw(List);
-# Lists are specially treated in all four support files, too.
+# Lists are specially treated in all five support files, too.
 # (Ideally we'd mark List as "special copy/equal" not "no copy/equal".
 # But until there's other use-cases for that, just hot-wire the tests
 # that would need to distinguish.)
 push @no_copy, qw(List);
 push @no_equal, qw(List);
+push @no_query_jumble, qw(List);
 push @special_read_write, qw(List);
 
 # Nodes with custom copy/equal implementations are skipped from
@@ -170,6 +173,9 @@ sub elem
 # Similarly for custom read/write implementations.
 my @custom_read_write;
 
+# Similarly for custom query jumble implementation.
+my @custom_query_jumble;
+
 # Track node types with manually assigned NodeTag numbers.
 my %manual_nodetag_number;
 
@@ -319,6 +325,10 @@ sub elem
                 {
                     push @custom_read_write, $in_struct;
                 }
+                elsif ($attr eq 'custom_query_jumble')
+                {
+                    push @custom_query_jumble, $in_struct;
+                }
                 elsif ($attr eq 'no_copy')
                 {
                     push @no_copy, $in_struct;
@@ -332,6 +342,10 @@ sub elem
                     push @no_copy, $in_struct;
                     push @no_equal, $in_struct;
                 }
+                elsif ($attr eq 'no_query_jumble')
+                {
+                    push @no_query_jumble, $in_struct;
+                }
                 elsif ($attr eq 'no_read')
                 {
                     push @no_read, $in_struct;
@@ -457,6 +471,8 @@ sub elem
                 equal_as_scalar
                 equal_ignore
                 equal_ignore_if_zero
+                query_jumble_ignore
+                query_jumble_location
                 read_write_ignore
                 write_only_relids
                 write_only_nondefault_pathtarget
@@ -1225,6 +1241,102 @@ sub elem
 close $rfs;
 
 
+# queryjumblefuncs.c
+
+push @output_files, 'queryjumblefuncs.funcs.c';
+open my $jff, '>', "$output_path/queryjumblefuncs.funcs.c$tmpext" or die $!;
+push @output_files, 'queryjumblefuncs.switch.c';
+open my $jfs, '>', "$output_path/queryjumblefuncs.switch.c$tmpext" or die $!;
+
+printf $jff $header_comment, 'queryjumblefuncs.funcs.c';
+printf $jfs $header_comment, 'queryjumblefuncs.switch.c';
+
+print $jff $node_includes;
+
+foreach my $n (@node_types)
+{
+    next if elem $n, @abstract_types;
+    next if elem $n, @nodetag_only;
+    my $struct_no_query_jumble = (elem $n, @no_query_jumble);
+
+    print $jfs "\t\t\tcase T_${n}:\n"
+      . "\t\t\t\t_jumble${n}(jstate, expr);\n"
+      . "\t\t\t\tbreak;\n"
+      unless $struct_no_query_jumble;
+
+    next if elem $n, @custom_query_jumble;
+
+    print $jff "
+static void
+_jumble${n}(JumbleState *jstate, Node *node)
+{
+\t${n} *expr = (${n} *) node;\n
+" unless $struct_no_query_jumble;
+
+    # print instructions for each field
+    foreach my $f (@{ $node_type_info{$n}->{fields} })
+    {
+        my $t = $node_type_info{$n}->{field_types}{$f};
+        my @a = @{ $node_type_info{$n}->{field_attrs}{$f} };
+        my $query_jumble_ignore = $struct_no_query_jumble;
+        my $query_jumble_location = 0;
+
+        # extract per-field attributes
+        foreach my $a (@a)
+        {
+            if ($a eq 'query_jumble_ignore')
+            {
+                $query_jumble_ignore = 1;
+            }
+            elsif ($a eq 'query_jumble_location')
+            {
+                $query_jumble_location = 1;
+            }
+        }
+
+        # node type
+        if (($t =~ /^(\w+)\*$/ or $t =~ /^struct\s+(\w+)\*$/)
+            and elem $1, @node_types)
+        {
+            print $jff "\tJUMBLE_NODE($f);\n"
+              unless $query_jumble_ignore;
+        }
+        elsif ($t eq 'int' && $f =~ 'location$')
+        {
+            # Track the node's location only if directly requested.
+            if ($query_jumble_location)
+            {
+                print $jff "\tJUMBLE_LOCATION($f);\n"
+                  unless $query_jumble_ignore;
+            }
+        }
+        elsif ($t eq 'char*')
+        {
+            print $jff "\tJUMBLE_STRING($f);\n"
+              unless $query_jumble_ignore;
+        }
+        else
+        {
+            print $jff "\tJUMBLE_FIELD($f);\n"
+              unless $query_jumble_ignore;
+        }
+    }
+
+    # Some nodes have no attributes like CheckPointStmt,
+    # so tweak things for empty contents.
+    if (scalar(@{ $node_type_info{$n}->{fields} }) == 0)
+    {
+        print $jff "\t(void) expr;\n"
+          unless $struct_no_query_jumble;
+    }
+
+    print $jff "}
+" unless $struct_no_query_jumble;
+}
+
+close $jff;
+close $jfs;
+
 # now rename the temporary files to their final names
 foreach my $file (@output_files)
 {

src/backend/nodes/meson.build (+1 -1)

@@ -10,7 +10,6 @@ backend_sources += files(
   'nodes.c',
   'params.c',
   'print.c',
-  'queryjumblefuncs.c',
   'read.c',
   'tidbitmap.c',
   'value.c',
@@ -21,6 +20,7 @@ backend_sources += files(
 nodefunc_sources = files(
   'copyfuncs.c',
   'equalfuncs.c',
+  'queryjumblefuncs.c',
   'outfuncs.c',
   'readfuncs.c',
 )
