
Commit a542d56

ecpg: re-implement preprocessor's string management.
Most productions in the preprocessor grammar construct strings representing SQL or C statements or fragments thereof. Instead of returning these as <str> results of the productions, return them as "location" values, taking advantage of Bison's flexibility about what a location is. We aren't really giving up anything thereby, since ecpg's error reports have always just given line numbers, and that's tracked separately. The advantage of this is that a single instance of the YYLLOC_DEFAULT macro can perform all the work needed by the vast majority of productions, including all the ones made automatically by parse.pl. This avoids having large numbers of effectively-identical productions, which tickles an optimization inefficiency in recent versions of clang. (This patch reduces the compilation time for preproc.o by more than 100-fold with clang 16, and is visibly helpful with gcc too.) The compiled parser is noticeably smaller as well.

A disadvantage of this approach is that YYLLOC_DEFAULT is applied before running the production's semantic action (if any). This means it cannot use the method favored by cat_str() of free'ing all the input strings; if the action needs to look at the input strings, it'd be looking at dangling storage. As this stands, therefore, it leaks memory like a sieve. This is already a big patch though, and fixing the memory management seems like a separable problem, so let's leave that for the next step. (This does remove some free() calls that I'd have had to touch anyway, in the expectation that the next step will manage memory reclamation quite differently.)

Most of the changes here are mindless substitution of "@n" for "$N" in grammar rules; see the changes to README.parser for an explanation.

Discussion: https://postgr.es/m/2011420.1713493114@sss.pgh.pa.us
1 parent 6b00549 commit a542d56
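
The commit message describes one YYLLOC_DEFAULT instance doing the token concatenation for most productions. The following is a minimal standalone sketch of that idea in plain C, not the code from this commit. `concat_rhs` and `SKETCH_LLOC_DEFAULT` are hypothetical names, and joining with single spaces while skipping empty pieces is an assumption about the concatenation rule. Only the argument order (Current, Rhs, N) and the 1-based indexing are taken from Bison's documented YYLLOC_DEFAULT and YYRHSLOC conventions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Join rhs[1..n] with single spaces, skipping NULL or empty pieces.
 * Index 0 is unused, mirroring Bison's 1-based YYRHSLOC(Rhs, K).
 * Returns a freshly malloc'd string (or NULL on allocation failure).
 * The inputs are deliberately left untouched: a semantic action that
 * runs afterwards may still want to read them.
 */
static char *
concat_rhs(const char *const *rhs, int n)
{
    size_t  needed = 1;     /* room for the terminating NUL */
    char   *result;
    char   *p;
    int     i;

    assert(n >= 0);

    for (i = 1; i <= n; i++)
        if (rhs[i] != NULL && rhs[i][0] != '\0')
            needed += strlen(rhs[i]) + 1;   /* piece plus one separator */

    result = malloc(needed);
    if (result == NULL)
        return NULL;

    p = result;
    for (i = 1; i <= n; i++)
    {
        size_t  piece_len;

        if (rhs[i] == NULL || rhs[i][0] == '\0')
            continue;
        if (p != result)
            *p++ = ' ';     /* separator only between non-empty pieces */
        piece_len = strlen(rhs[i]);
        memcpy(p, rhs[i], piece_len);
        p += piece_len;
    }
    *p = '\0';
    return result;
}

/* Hypothetical macro with the same argument order as YYLLOC_DEFAULT. */
#define SKETCH_LLOC_DEFAULT(Current, Rhs, N) \
    ((Current) = concat_rhs((Rhs), (N)))
```

Because one macro body serves every rule, the generated parser would not need a near-identical action per production, which is the effect the commit message attributes to the real change. Not freeing the inputs here mirrors the constraint the message describes, and it is also why such a scheme leaks unless memory is reclaimed some other way.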

9 files changed: +752 -1216 lines changed
src/interfaces/ecpg/preproc/README.parser

Lines changed: 26 additions & 6 deletions
@@ -4,8 +4,8 @@ Some notes:
 
 1) Most input matching core grammar productions is simply converted
 to strings and concatenated together to form the SQL string
-passed to the server. parse.pl can automatically build the
-grammar actions needed to do this.
+passed to the server. This is handled mostly automatically,
+as described below.
 2) Some grammar rules need special actions that are added to or
 completely override the default token-concatenation behavior.
 This is controlled by ecpg.addons as explained below.
@@ -14,11 +14,31 @@ Some notes:
 4) ecpg.header contains the "prologue" part of preproc.y, including
 support functions, Bison options, etc.
 5) Additional terminals added by ECPG must be defined in ecpg.tokens.
-Additional nonterminals added by ECPG must be defined in ecpg.type.
+Additional nonterminals added by ECPG must be defined in ecpg.type,
+but only if they have non-void result type, which most don't.
 
 ecpg.header, ecpg.tokens, ecpg.type, and ecpg.trailer are just
 copied verbatim into preproc.y at appropriate points.
 
+
+In the pre-v18 implementation of ecpg, the strings constructed
+by grammar rules were returned as the Bison result of each rule.
+This led to a large number of effectively-identical rule actions,
+which caused compilation-time problems with some versions of clang.
+Now, rules that need to return a string are declared as having
+void type (which in Bison means leaving out any %type declaration
+for them). Instead, we abuse Bison's "location tracking" mechanism
+to carry the string results, which allows a single YYLLOC_DEFAULT
+call to handle the standard token-concatenation behavior for the
+vast majority of the rules. Rules that don't need to do anything
+else can omit a semantic action altogether. Rules that need to
+construct an output string specially can do so, but they should
+assign it to "@$" rather than the usual "$$"; also, to reference
+the string value of the N'th input token, write "@N" not "$N".
+(But rules that return something other than a simple string
+continue to use the normal Bison notations.)
+
+
 ecpg.addons contains entries that begin with a line like
 ECPG: concattokens ruletype
 and typically have one or more following lines that are the code
@@ -69,9 +89,9 @@ parse.pl contains some tables that list backend grammar
 productions to be ignored or modified.
 
 Nonterminals that construct strings (as described above) should be
-given <str> type, which is parse.pl's default assumption for
-nonterminals found in gram.y. That can be overridden at need by
-making an entry in parse.pl's %replace_types table. %replace_types
+given void type, which is parse.pl's default assumption for
+nonterminals found in gram.y. If the result should be of some other
+type, make an entry in parse.pl's %replace_types table. %replace_types
 can also be used to suppress output of a nonterminal's rules
 altogether (in which case ecpg.trailer had better provide replacement
 rules, since the nonterminal will still be referred to elsewhere).
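
The README text in this diff says that a rule needing a special output string assigns it to "@$" and reads its inputs as "@N", while the commit message notes that the default concatenation runs before the rule's action. The sketch below imitates that ordering in plain C so the consequence is visible. It is an illustration under stated assumptions, not ecpg or Bison code: `rule_action`, `default_concat`, `reduce_rule`, and `parenthesize_second` are all hypothetical names. Freeing the default result inside the action is a choice made for this sketch only; the commit itself defers memory reclamation to a later step.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical action type: may overwrite *result after reading rhs[1..n]. */
typedef void (*rule_action) (char **result, const char *const *rhs, int n);

/* Default step: join rhs[1..n] with spaces; the inputs are not freed. */
static char *
default_concat(const char *const *rhs, int n)
{
    size_t  needed = 1;     /* terminating NUL */
    char   *out;
    int     i;

    for (i = 1; i <= n; i++)
        needed += strlen(rhs[i]) + 1;
    out = malloc(needed);
    if (out == NULL)
        return NULL;
    out[0] = '\0';
    for (i = 1; i <= n; i++)
    {
        if (i > 1)
            strcat(out, " ");
        strcat(out, rhs[i]);
    }
    return out;
}

/* One reduction: the default step always runs first, then the optional action. */
static char *
reduce_rule(const char *const *rhs, int n, rule_action action)
{
    char   *result = default_concat(rhs, n);    /* plays the role of "@$" */

    if (result != NULL && action != NULL)
        action(&result, rhs, n);    /* rhs[] must still be valid here */
    return result;
}

/* Example action: replace the default result with "(<second input>)". */
static void
parenthesize_second(char **result, const char *const *rhs, int n)
{
    size_t  needed;
    char   *custom;

    assert(n >= 2);
    needed = strlen(rhs[2]) + 3;    /* "(" + text + ")" + NUL */
    custom = malloc(needed);
    if (custom == NULL)
        return;                     /* keep the default result on failure */
    snprintf(custom, needed, "(%s)", rhs[2]);
    free(*result);                  /* sketch-only cleanup of the default string */
    *result = custom;
}
```

Calling `reduce_rule(rhs, 2, NULL)` models a rule with no semantic action, and passing `parenthesize_second` models a rule that builds its string specially. If `default_concat` had freed its inputs, the action's read of `rhs[2]` would touch dangling storage, which is the problem the commit message points out.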
