Commit 10d5822

Doc: add a little about LACON execution to src/backend/regex/README.
I wrote this while thinking about a possible optimization, but it's a useful description of the existing code regardless of whether the optimization ever happens. So push it separately.
1 parent 375aed3 commit 10d5822

1 file changed: +33 -0 lines changed

src/backend/regex/README

@@ -438,3 +438,36 @@ BOS/BOL/EOS/EOL adjacent to the pre-state and post-state. So a finished
 NFA for a pattern without anchors or adjacent-character constraints will
 have pre-state outarcs for RAINBOW (all possible character colors) as well
 as BOS and BOL, and likewise post-state inarcs for RAINBOW, EOS, and EOL.
+Also note that LACON arcs will never connect to the pre-state
+or post-state.
+
+
+Look-around constraints (LACONs)
+--------------------------------
+
+The regex compiler doesn't have much intelligence about LACONs; it just
+constructs a sub-NFA representing the pattern that the constraint says to
+match or not match, and puts a LACON arc referencing that sub-NFA into the
+main NFA.  At runtime, the executor applies the sub-NFA at each point in
+the string where the constraint is relevant, and then traverses or doesn't
+traverse the arc.  ("Traversal" means including the arc's to-state in the
+set of NFA states that are considered active at the next character.)
+
+The actual basic matching cycle of the executor is
+1. Identify the color of the next input character, then advance over it.
+2. Apply the DFA to follow all the matching "plain" arcs of the NFA.
+   (Notionally, the previous DFA state represents the set of states the
+   NFA could have been in before the character, and the new DFA state
+   represents the set of states the NFA could be in after the character.)
+3. If there are any LACON arcs leading out of any of the new NFA states,
+   apply each LACON constraint starting from the new next input character
+   (while not actually consuming any input).  For each successful LACON,
+   add its to-state to the current set of NFA states.  If any such
+   to-state has outgoing LACON arcs, process those in the same way.
+   (Mathematically speaking, we compute the transitive closure of the
+   set of states reachable by successful LACONs.)
+
+Thus, LACONs are always checked immediately after consuming a character
+via a plain arc.  This is okay because the NFA's "pre" state only has
+plain out-arcs, so we can always consume a character (possibly a BOS
+pseudo-character as described above) before we need to worry about LACONs.
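
The matching cycle described in the added README text can be illustrated
with a toy simulation.  This is only a sketch in Python, not the C code in
src/backend/regex: it tracks sets of NFA states directly instead of cached
DFA states, uses literal characters in place of colors, models each LACON as
a predicate rather than a sub-NFA, and leaves out EOS/EOL handling.  All
names here (run, lacon_closure, plain_arcs, lacon_arcs, BOS) are
hypothetical and chosen for the illustration.

```python
from collections import deque

BOS = object()  # stand-in for the beginning-of-string pseudo-character


def lacon_closure(states, lacon_arcs, text, pos):
    """Add the to-state of every LACON that succeeds at pos, transitively."""
    active = set(states)
    work = deque(active)
    while work:
        state = work.popleft()
        for check, to_state in lacon_arcs.get(state, ()):
            # check() only looks at text around pos; it consumes no input.
            if to_state not in active and check(text, pos):
                active.add(to_state)
                work.append(to_state)  # its own LACON out-arcs get processed too
    return active


def run(plain_arcs, lacon_arcs, pre, post, text):
    """Return True if the toy NFA reaches post after consuming all of text."""
    active = {pre}
    pos = 0  # index of the next unconsumed real character
    for sym in [BOS] + list(text):
        # Steps 1 and 2: consume one symbol and follow the matching plain arcs.
        active = {to
                  for state in active
                  for label, to in plain_arcs.get(state, ())
                  if label == sym}
        if sym is not BOS:
            pos += 1
        # Step 3: check LACONs at the new next-input position.
        active = lacon_closure(active, lacon_arcs, text, pos)
    return post in active


# Toy NFA for "a, then a lookahead requiring 'b', then b":
#   0 --BOS--> 1 --'a'--> 2 --LACON(next char is 'b')--> 3 --'b'--> 4
plain_arcs = {0: [(BOS, 1)], 1: [("a", 2)], 3: [("b", 4)]}
lacon_arcs = {2: [(lambda text, pos: text[pos:pos + 1] == "b", 3)]}

matched = run(plain_arcs, lacon_arcs, pre=0, post=4, text="ab")      # True
not_matched = run(plain_arcs, lacon_arcs, pre=0, post=4, text="ac")  # False
```

Because the pre-state (state 0) has only a plain out-arc for BOS, the loop
always consumes a symbol before lacon_closure is first called, which mirrors
the argument in the final paragraph of the added text.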
