@@ -438,3 +438,36 @@ BOS/BOL/EOS/EOL adjacent to the pre-state and post-state. So a finished
NFA for a pattern without anchors or adjacent-character constraints will
have pre-state outarcs for RAINBOW (all possible character colors) as well
as BOS and BOL, and likewise post-state inarcs for RAINBOW, EOS, and EOL.
+ Also note that LACON arcs will never connect to the pre-state
+ or post-state.
+
+
+ Look-around constraints (LACONs)
+ --------------------------------
+
+ The regex compiler doesn't have much intelligence about LACONs; it just
+ constructs a sub-NFA representing the pattern that the constraint says to
+ match or not match, and puts a LACON arc referencing that sub-NFA into the
+ main NFA.  At runtime, the executor applies the sub-NFA at each point in
+ the string where the constraint is relevant, and then traverses or doesn't
+ traverse the arc.  ("Traversal" means including the arc's to-state in the
+ set of NFA states that are considered active at the next character.)
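
The traverse-or-not decision described above can be illustrated with a small
Python model. This is only a sketch under stated assumptions, not the engine's
C code: LaconArc, lacon_holds and traverse are hypothetical names, the
referenced sub-NFA is stood in for by a Python `re` pattern, and only the
look-ahead direction is modelled.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LaconArc:
    """Hypothetical model of one LACON arc in the main NFA."""
    from_state: int
    to_state: int
    sub_pattern: str  # stand-in for the sub-NFA the arc references
    positive: bool    # True: sub-pattern must match; False: must not match


def lacon_holds(arc: LaconArc, text: str, pos: int) -> bool:
    """Test the constraint at pos; the caller's position is left unchanged."""
    matched = re.compile(arc.sub_pattern).match(text, pos) is not None
    return matched == arc.positive


def traverse(arc: LaconArc, text: str, pos: int, active: set) -> set:
    """Return the active set, plus the arc's to-state if the arc is traversed."""
    if arc.from_state in active and lacon_holds(arc, text, pos):
        return active | {arc.to_state}
    return set(active)
```

For instance, with arc = LaconArc(1, 2, "bar", True), calling
traverse(arc, "foobar", 3, {1}) adds state 2 to the active set, because "bar"
matches at offset 3 while the position itself stays where it was.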
+
+ The actual basic matching cycle of the executor is
+ 1. Identify the color of the next input character, then advance over it.
+ 2. Apply the DFA to follow all the matching "plain" arcs of the NFA.
+    (Notionally, the previous DFA state represents the set of states the
+    NFA could have been in before the character, and the new DFA state
+    represents the set of states the NFA could be in after the character.)
+ 3. If there are any LACON arcs leading out of any of the new NFA states,
+    apply each LACON constraint starting from the new next input character
+    (while not actually consuming any input).  For each successful LACON,
+    add its to-state to the current set of NFA states.  If any such
+    to-state has outgoing LACON arcs, process those in the same way.
+    (Mathematically speaking, we compute the transitive closure of the
+    set of states reachable by successful LACONs.)
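
The three steps above can be sketched as a toy Python model. Everything named
here (Lacon, color_of, lacon_closure, step, PLAIN, LACONS) is hypothetical and
chosen for illustration. The sketch tracks sets of NFA states directly rather
than building DFA states, represents each LACON as a predicate instead of a
sub-NFA, and does not model the BOS pseudo-character.

```python
from collections import deque
from typing import Callable, NamedTuple


class Lacon(NamedTuple):
    src: int
    dst: int
    holds: Callable[[str, int], bool]  # checked at a position, consumes nothing


def color_of(ch: str) -> str:
    """Hypothetical two-color map: digits versus everything else."""
    return "digit" if ch.isdigit() else "other"


def lacon_closure(states, lacons, text, pos):
    """Step 3: add every state reachable through successful LACON arcs."""
    result = set(states)
    work = deque(result)
    while work:
        s = work.popleft()
        for arc in lacons:
            if arc.src == s and arc.dst not in result and arc.holds(text, pos):
                result.add(arc.dst)
                work.append(arc.dst)  # its own LACON out-arcs get checked too
    return result


def step(states, plain, lacons, text, pos):
    """Run one matching cycle; return (new state set, new position)."""
    color = color_of(text[pos])  # step 1: color the character ...
    pos += 1                     # ... then advance over it
    after = set()
    for s in states:             # step 2: follow the matching plain arcs
        after |= plain.get((s, color), set())
    return lacon_closure(after, lacons, text, pos), pos


# Toy NFA: 0 --other--> 1, 1 --LACON(digit ahead)--> 2,
#          2 --LACON(not at end)--> 4, 2 --digit--> 3
PLAIN = {(0, "other"): {1}, (2, "digit"): {3}}
LACONS = [
    Lacon(1, 2, lambda t, p: p < len(t) and t[p].isdigit()),
    Lacon(2, 4, lambda t, p: p < len(t)),
]
```

Calling step({0}, PLAIN, LACONS, "a1", 0) consumes "a" and yields the state
set {1, 2, 4} at position 1: state 2 is added by the first LACON, and state 4
only because the closure goes on to check the LACON arc leaving state 2.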
+
+ Thus, LACONs are always checked immediately after consuming a character
+ via a plain arc.  This is okay because the NFA's "pre" state only has
+ plain out-arcs, so we can always consume a character (possibly a BOS
+ pseudo-character as described above) before we need to worry about LACONs.