
Commit 00b0e72

ecpg: clean up documentation of parse.pl, and add more input checking.

README.parser is the user's manual, such as it is, for parse.pl. It's rather poorly written if you ask me; so try to improve it. (More could be written here, but this at least covers the same info in a more organized fashion.)

Also, the single solitary line of usage info in parse.pl itself was a lie. Replace.

Add some error checks that the ecpg.addons entries meet the syntax rules set forth in README.parser. One of them didn't, but accidentally worked anyway because the logic in include_addon is such that 'block' is the default behavior.

Also add a cross-check that each ecpg.addons entry is matched exactly once in the backend grammar. This exposed that there are two dead entries there --- they are dead because the %replace_types table in parse.pl causes their nonterminals to be ignored altogether. Removing them doesn't change the generated preproc.y file.

(This implies that check_rules.pl is completely worthless and should be nuked: it adds build cycles and maintenance effort while failing to reliably accomplish its one job of detecting dead rules. I'll do that separately.)

Discussion: https://postgr.es/m/2011420.1713493114@sss.pgh.pa.us

1 parent 7be4ba4 commit 00b0e72

3 files changed: +123 -65 lines changed
src/interfaces/ecpg/preproc/README.parser

Lines changed: 77 additions & 42 deletions
@@ -1,42 +1,77 @@
-ECPG modifies and extends the core grammar in a way that
-1) every token in ECPG is <str> type. New tokens are
-   defined in ecpg.tokens, types are defined in ecpg.type
-2) most tokens from the core grammar are simply converted
-   to literals concatenated together to form the SQL string
-   passed to the server, this is done by parse.pl.
-3) some rules need side-effects, actions are either added
-   or completely overridden (compared to the basic token
-   concatenation) for them, these are defined in ecpg.addons,
-   the rules for ecpg.addons are explained below.
-4) new grammar rules are needed for ECPG metacommands.
-   These are in ecpg.trailer.
-5) ecpg.header contains common functions, etc. used by
-   actions for grammar rules.
-
-In "ecpg.addons", every modified rule follows this pattern:
-       ECPG: dumpedtokens postfix
-where "dumpedtokens" is simply tokens from core gram.y's
-rules concatenated together. e.g. if gram.y has this:
-       ruleA: tokenA tokenB tokenC {...}
-then "dumpedtokens" is "ruleAtokenAtokenBtokenC".
-"postfix" above can be:
-a) "block" - the automatic rule created by parse.pl is completely
-    overridden, the code block has to be written completely as
-    it were in a plain bison grammar
-b) "rule" - the automatic rule is extended on, so new syntaxes
-    are accepted for "ruleA". E.g.:
-      ECPG: ruleAtokenAtokenBtokenC rule
-          | tokenD tokenE { action_code; }
-          ...
-    It will be substituted with:
-      ruleA: <original syntax forms and actions up to and including
-                    "tokenA tokenB tokenC">
-             | tokenD tokenE { action_code; }
-             ...
-c) "addon" - the automatic action for the rule (SQL syntax constructed
-    from the tokens concatenated together) is prepended with a new
-    action code part. This code part is written as is's already inside
-    the { ... }
-
-Multiple "addon" or "block" lines may appear together with the
-new code block if the code block is common for those rules.
+ECPG's grammar (preproc.y) is built by parse.pl from the
+backend's grammar (gram.y) plus various add-on rules.
+Some notes:
+
+1) Most input matching core grammar productions is simply converted
+   to strings and concatenated together to form the SQL string
+   passed to the server. parse.pl can automatically build the
+   grammar actions needed to do this.
+2) Some grammar rules need special actions that are added to or
+   completely override the default token-concatenation behavior.
+   This is controlled by ecpg.addons as explained below.
+3) Additional grammar rules are needed for ECPG's own commands.
+   These are in ecpg.trailer, as is the "epilogue" part of preproc.y.
+4) ecpg.header contains the "prologue" part of preproc.y, including
+   support functions, Bison options, etc.
+5) Additional terminals added by ECPG must be defined in ecpg.tokens.
+   Additional nonterminals added by ECPG must be defined in ecpg.type.
+
+ecpg.header, ecpg.tokens, ecpg.type, and ecpg.trailer are just
+copied verbatim into preproc.y at appropriate points.
+
+ecpg.addons contains entries that begin with a line like
+       ECPG: concattokens ruletype
+and typically have one or more following lines that are the code
+for a grammar action. Any line not starting with "ECPG:" is taken
+to be part of the code block for the preceding "ECPG:" line.
+
+"concattokens" identifies which gram.y production this entry affects.
+It is simply the target nonterminal and the tokens from the gram.y rule
+concatenated together. For example, to modify the action for a gram.y
+rule like this:
+      target: tokenA tokenB tokenC {...}
+"concattokens" would be "targettokenAtokenBtokenC". If we want to
+modify a non-first alternative for a nonterminal, we still write the
+nonterminal. For example, "concattokens" should be "targettokenDtokenE"
+to affect the second alternative in:
+      target: tokenA tokenB tokenC {...}
+              | tokenD tokenE {...}
+
+"ruletype" is one of:
+
+a) "block" - the automatic action that parse.pl would create is
+   completely overridden. Instead the entry's code block is emitted.
+   The code block must include the braces ({}) needed for a Bison action.
+
+b) "addon" - the entry's code block is inserted into the generated
+   action, ahead of the automatic token-concatenation code.
+   In this case the code block need not contain braces, since
+   it will be inserted within braces.
+
+c) "rule" - the automatic action is emitted, but then the entry's
+   code block is added verbatim afterwards. This typically is
+   used to add new alternatives to a nonterminal of the core grammar.
+   For example, given the entry:
+      ECPG: targettokenAtokenBtokenC rule
+          | tokenD tokenE { custom_action; }
+   what will be emitted is
+      target: tokenA tokenB tokenC { automatic_action; }
+          | tokenD tokenE { custom_action; }
+
+Multiple "ECPG:" entries can share the same code block, if the
+same action is needed for all. When an "ECPG:" line is immediately
+followed by another one, it is not assigned an empty code block;
+rather the next nonempty code block is assumed to apply to all
+immediately preceding "ECPG:" entries.
+
+In addition to the modifications specified by ecpg.addons,
+parse.pl contains some tables that list backend grammar
+productions to be ignored or modified.
+
+Nonterminals that construct strings (as described above) should be
+given <str> type, which is parse.pl's default assumption for
+nonterminals found in gram.y. That can be overridden at need by
+making an entry in parse.pl's %replace_types table. %replace_types
+can also be used to suppress output of a nonterminal's rules
+altogether (in which case ecpg.trailer had better provide replacement
+rules, since the nonterminal will still be referred to elsewhere).
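
As a rough illustration of the rewritten README text, the following Perl sketch shows how a "concattokens" key could be formed and how each "ruletype" changes what gets emitted for one alternative. The helpers concattokens() and emit_alternative() and the toy %addons table are hypothetical names introduced only for this example; this is not parse.pl's actual code, which handles considerably more.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of parse.pl): build the "concattokens"
# key for one alternative of a gram.y nonterminal.
sub concattokens
{
    my ($target, @tokens) = @_;
    return join('', $target, @tokens);
}

# Hypothetical emitter: combine the automatic action with an
# ecpg.addons entry according to its ruletype.
sub emit_alternative
{
    my ($tokens, $auto_action, $entry) = @_;
    my $rhs = join(' ', @$tokens);

    # No entry: plain automatic token-concatenation action.
    return "$rhs { $auto_action }" unless $entry;

    my $code = join("\n", @{ $entry->{lines} });
    if ($entry->{type} eq 'block')
    {
        # The entry's code replaces the action and carries its own braces.
        return "$rhs $code";
    }
    elsif ($entry->{type} eq 'addon')
    {
        # The entry's code goes inside the braces, ahead of the automatic code.
        return "$rhs { $code $auto_action }";
    }

    # 'rule': automatic action first, then the entry's text verbatim.
    return "$rhs { $auto_action }\n$code";
}

# Toy table mirroring the README's "rule" example.
my %addons = (
    targettokenAtokenBtokenC => {
        type  => 'rule',
        lines => ['| tokenD tokenE { custom_action; }'],
    },
);

my @tokens = qw(tokenA tokenB tokenC);
my $key    = concattokens('target', @tokens);
my $out    = emit_alternative(\@tokens, 'automatic_action;', $addons{$key});
print "target: $out\n";
```

With the "rule" entry shown, the printed text has the same shape as the README's example: the automatic alternative followed by the added `| tokenD tokenE` alternative.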

src/interfaces/ecpg/preproc/ecpg.addons

Lines changed: 1 addition & 10 deletions
@@ -497,7 +497,7 @@ ECPG: opt_array_boundsopt_array_bounds'['']' block
             $$.index2 = mm_strdup($3);
             $$.str = cat_str(4, $1.str, mm_strdup("["), $3, mm_strdup("]"));
         }
-ECPG: opt_array_bounds
+ECPG: opt_array_bounds block
     {
         $$.index1 = mm_strdup("-1");
         $$.index2 = mm_strdup("-1");
@@ -510,15 +510,6 @@ ECPG: IconstICONST block
 ECPG: AexprConstNULL_P rule
     | civar { $$ = $1; }
     | civarind { $$ = $1; }
-ECPG: ColIdcol_name_keyword rule
-    | ECPGKeywords { $$ = $1; }
-    | ECPGCKeywords { $$ = $1; }
-    | CHAR_P { $$ = mm_strdup("char"); }
-    | VALUES { $$ = mm_strdup("values"); }
-ECPG: type_function_nametype_func_name_keyword rule
-    | ECPGKeywords { $$ = $1; }
-    | ECPGTypeName { $$ = $1; }
-    | ECPGCKeywords { $$ = $1; }
 ECPG: VariableShowStmtSHOWALL block
     {
         mmerror(PARSE_ERROR, ET_ERROR, "SHOW ALL is not implemented");
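
The removed entries were found by the new "matched exactly once" cross-check. A minimal sketch of that idea follows, assuming a toy %addons table and a hypothetical @seen list standing in for the production keys that the grammar walk actually visits. The helper check_addons() is invented for this example and collects messages instead of calling die, so the result can be inspected.

```perl
use strict;
use warnings;

# Toy %addons table; the real one is loaded from ecpg.addons.
my %addons = (
    'IconstICONST'          => { type => 'block', used => 0 },
    'AexprConstNULL_P'      => { type => 'rule',  used => 0 },
    'ColIdcol_name_keyword' => { type => 'rule',  used => 0 },
);

# Hypothetical list of production keys visited while reading gram.y.
# The ColId production is left out on purpose, to mimic a nonterminal
# whose rules are ignored altogether.
my @seen = ('IconstICONST', 'AexprConstNULL_P');

# Count each lookup that hits an entry.
foreach my $key (@seen)
{
    $addons{$key}{used}++ if exists $addons{$key};
}

# Hypothetical checker: report entries that were not used exactly once.
sub check_addons
{
    my ($addons) = @_;
    my @problems;
    foreach my $key (sort keys %$addons)
    {
        my $used = $addons->{$key}{used};
        push @problems, "addon rule $key was never used" if $used == 0;
        push @problems, "addon rule $key was matched multiple times"
          if $used > 1;
    }
    return @problems;
}

my @problems = check_addons(\%addons);
print "$_\n" for @problems;
```

Here only the ColId entry is reported, because nothing in @seen ever looked it up.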

src/interfaces/ecpg/preproc/parse.pl

Lines changed: 45 additions & 13 deletions
@@ -1,7 +1,13 @@
 #!/usr/bin/perl
 # src/interfaces/ecpg/preproc/parse.pl
-# parser generator for ecpg version 2
-# call with backend parser as stdin
+# parser generator for ecpg
+#
+# See README.parser for some explanation of what this does.
+#
+# Command-line options:
+#   --srcdir: where to find ecpg-provided input files (default ".")
+#   --parser: the backend gram.y file to read (required, no default)
+#   --output: where to write preproc.y (required, no default)
 #
 # Copyright (c) 2007-2024, PostgreSQL Global Development Group
 #
@@ -148,6 +154,14 @@
 
 close($parserfh);
 
+# Cross-check that we don't have dead or ambiguous addon rules.
+foreach (keys %addons)
+{
+    die "addon rule $_ was never used\n" if $addons{$_}{used} == 0;
+    die "addon rule $_ was matched multiple times\n" if $addons{$_}{used} > 1;
+}
+
+
 sub main
 {
   line: while (<$parserfh>)
@@ -487,7 +501,10 @@ sub include_addon
     my $rec = $addons{$block};
     return 0 unless $rec;
 
-    my $rectype = (defined $rec->{type}) ? $rec->{type} : '';
+    # Track usage for later cross-check
+    $rec->{used}++;
+
+    my $rectype = $rec->{type};
     if ($rectype eq 'rule')
     {
         dump_fields($stmt_mode, $fields, ' { ');
@@ -668,10 +685,10 @@ sub dump_line
 }
 
 =top
-    load addons into cache
+    load ecpg.addons into %addons hash. The result is something like
     %addons = {
-        stmtClosePortalStmt => { 'type' => 'block', 'lines' => [ "{", "if (INFORMIX_MODE)" ..., "}" ] },
-        stmtViewStmt => { 'type' => 'rule', 'lines' => [ "| ECPGAllocateDescr", ... ] }
+        stmtClosePortalStmt => { 'type' => 'block', 'lines' => [ "{", "if (INFORMIX_MODE)" ..., "}" ], 'used' => 0 },
+        stmtViewStmt => { 'type' => 'rule', 'lines' => [ "| ECPGAllocateDescr", ... ], 'used' => 0 }
     }
 
 =cut
@@ -681,17 +698,25 @@ sub preload_addons
     my $filename = $srcdir . "/ecpg.addons";
     open(my $fh, '<', $filename) or die;
 
-    # there may be multiple lines starting ECPG: and then multiple lines of code.
-    # the code need to be add to all prior ECPG records.
-    my (@needsRules, @code, $record);
+    # There may be multiple "ECPG:" lines and then multiple lines of code.
+    # The block of code needs to be added to each of the consecutively-
+    # preceding "ECPG:" records.
+    my (@needsRules, @code);
 
-    # there may be comments before the first ECPG line, skip them
+    # there may be comments before the first "ECPG:" line, skip them
     my $skip = 1;
     while (<$fh>)
     {
-        if (/^ECPG:\s(\S+)\s?(\w+)?/)
+        if (/^ECPG:\s+(\S+)\s+(\w+)\s*$/)
         {
+            # Found an "ECPG:" line, so we're done skipping the header
             $skip = 0;
+            # Validate record type and target
+            die "invalid record type $2 in addon rule for $1\n"
+              unless ($2 eq 'block' or $2 eq 'addon' or $2 eq 'rule');
+            die "duplicate addon rule for $1\n" if (exists $addons{$1});
+            # If we had some preceding code lines, attach them to all
+            # as-yet-unfinished records.
             if (@code)
             {
                 for my $x (@needsRules)
@@ -701,20 +726,27 @@ sub preload_addons
                 @code = ();
                 @needsRules = ();
             }
-            $record = {};
+            my $record = {};
             $record->{type} = $2;
             $record->{lines} = [];
-            if (exists $addons{$1}) { die "Ga! there are dups!\n"; }
+            $record->{used} = 0;
             $addons{$1} = $record;
             push(@needsRules, $record);
         }
+        elsif (/^ECPG:/)
+        {
+            # Complain if preceding regex failed to match
+            die "incorrect syntax in ECPG line: $_\n";
+        }
         else
         {
+            # Non-ECPG line: add to @code unless we're still skipping
             next if $skip;
             push(@code, $_);
         }
     }
     close($fh);
+    # Deal with final code block
     if (@code)
     {
         for my $x (@needsRules)
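
To see why the untyped "ECPG: opt_array_bounds" line used to slip through, the two patterns from the preload_addons diff can be compared directly. The sketch below copies both regexes verbatim; the wrapper functions parse_old() and parse_new() are hypothetical and exist only for this comparison.

```perl
use strict;
use warnings;

# Pre-commit pattern: the record type is optional, so it may come back undef.
sub parse_old
{
    my ($line) = @_;
    return $line =~ /^ECPG:\s(\S+)\s?(\w+)?/ ? [ $1, $2 ] : undef;
}

# Post-commit pattern: the record type is mandatory and must end the line.
sub parse_new
{
    my ($line) = @_;
    return $line =~ /^ECPG:\s+(\S+)\s+(\w+)\s*$/ ? [ $1, $2 ] : undef;
}

my $bare = "ECPG: opt_array_bounds\n";          # entry as it was before the fix
my $full = "ECPG: opt_array_bounds block\n";    # entry after the fix

my $old_bare = parse_old($bare);
my $new_bare = parse_new($bare);
my $new_full = parse_new($full);

printf "old pattern, bare line: type is %s\n",
  defined($old_bare->[1]) ? $old_bare->[1] : '(undef)';
printf "new pattern, bare line: %s\n", $new_bare ? 'matched' : 'rejected';
printf "new pattern, full line: %s %s\n", @$new_full;
```

The old pattern accepts the bare line with an undefined type, which the commit message says only worked because 'block' was the default behavior in include_addon. The new pattern does not match the bare line at all, so in parse.pl it would fall through to the `elsif (/^ECPG:/)` branch and be reported as a syntax error.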
