author    Tom Lane 2012-07-10 18:54:37 +0000
committer Tom Lane 2012-07-10 18:54:37 +0000
commit    628cbb50ba80c83917b07a7609ddec12cda172d0
tree      7008492921c90e6de7c431633e33624a597a8416 /src/backend/regex/README
parent    00dac6000d422033c3e8d191f01ee0e6525794c2
Re-implement extraction of fixed prefixes from regular expressions.
To generate btree-indexable conditions from regex WHERE conditions (such as WHERE indexed_col ~ '^foo'), we need to be able to identify any fixed prefix that a regex might have; that is, find any string that must be a prefix of all strings satisfying the regex. We used to do that with entirely ad-hoc code that looked at the source text of the regex. It didn't know very much about regex syntax, which mostly meant that it would fail to identify some optimizable cases; but Viktor Rosenfeld reported that it would produce actively wrong answers for quantified parenthesized subexpressions, such as '^(foo)?bar'. Rather than trying to extend the ad-hoc code to cover this, let's get rid of it altogether in favor of identifying prefixes by examining the compiled form of a regex.

To do this, I've added a new entry point "pg_regprefix" to the regex library; hopefully it is defined in a sufficiently general fashion that it can remain in the library when/if that code gets split out as a standalone project.

Since this bug has been there for a very long time, this fix needs to get back-patched. However, it depends on some other recent commits (particularly the addition of wchar-to-database-encoding conversion), so I'll commit this separately and then go to work on back-porting the necessary fixes.
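To see why the compiled form is the better thing to inspect, consider '^(foo)?bar'. A scan of the pattern text that collects literal characters after '^' and steps over the parentheses reports "foo", yet the string "bar" satisfies the regex, so the only safe prefix is the empty string. The C sketch below contrasts the two approaches under stated assumptions: the State struct, the two hand-built automata, and both function names are hypothetical; the automata are toy DFAs, not the representation the regex library actually compiles to; and naive_text_prefix only illustrates the class of mistake described above, it is not the removed PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy DFA, not PostgreSQL's compiled-regex representation.
 * It is anchored at the start of the string; reaching an accepting state
 * means the pattern has matched, whatever text follows. */
#define MAX_ARCS 4

typedef struct
{
    int  narcs;             /* number of outgoing arcs */
    char label[MAX_ARCS];   /* character consumed by each arc */
    int  target[MAX_ARCS];  /* state reached by each arc */
    int  accepting;         /* nonzero once the pattern has matched */
} State;

/* Compiled-form approach: walk from the start state while the path is
 * forced (state not accepting, exactly one outgoing arc). Every matching
 * string must begin with the characters collected on that walk. The
 * buffer size also bounds the walk if a forced path loops. */
size_t
automaton_prefix(const State *states, int start, char *buf, size_t bufsize)
{
    size_t len = 0;
    int    s = start;

    assert(bufsize > 0);
    while (!states[s].accepting && states[s].narcs == 1 && len + 1 < bufsize)
    {
        buf[len++] = states[s].label[0];
        s = states[s].target[0];
    }
    buf[len] = '\0';
    return len;
}

/* Source-text approach (hypothetical, for contrast only): after '^',
 * collect literal characters, step over '(' and ')', and stop at the first
 * other metacharacter. It never checks whether the text already collected
 * is governed by a following quantifier, so "^(foo)?bar" yields "foo". */
size_t
naive_text_prefix(const char *patt, char *buf, size_t bufsize)
{
    size_t len = 0;

    assert(bufsize > 0);
    if (*patt == '^')
    {
        for (patt++; *patt != '\0' && len + 1 < bufsize; patt++)
        {
            if (*patt == '(' || *patt == ')')
                continue;
            if (strchr(".*+?[]{}|\\$^", *patt) != NULL)
                break;
            buf[len++] = *patt;
        }
    }
    buf[len] = '\0';
    return len;
}

/* DFA for "^foo": f -> o -> o -> match. */
const State dfa_foo[] = {
    {1, {'f'}, {1}, 0},
    {1, {'o'}, {2}, 0},
    {1, {'o'}, {3}, 0},
    {0, {0}, {0}, 1},
};

/* DFA for "^(foo)?bar": the start state branches on 'f' and 'b'. */
const State dfa_opt_foo_bar[] = {
    {2, {'f', 'b'}, {1, 4}, 0},  /* 0: start */
    {1, {'o'}, {2}, 0},          /* 1: after "f" */
    {1, {'o'}, {3}, 0},          /* 2: after "fo" */
    {1, {'b'}, {4}, 0},          /* 3: after "foo" */
    {1, {'a'}, {5}, 0},          /* 4: after "b" or "foob" */
    {1, {'r'}, {6}, 0},          /* 5: after "ba" */
    {0, {0}, {0}, 1},            /* 6: match */
};
```

On the '^(foo)?bar' automaton the walk stops immediately because the start state branches, so automaton_prefix reports an empty prefix, whereas naive_text_prefix reports "foo". Stopping at accepting states matters as well: once the pattern has matched, no further characters are forced.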
Diffstat (limited to 'src/backend/regex/README')
-rw-r--r-- src/backend/regex/README 4
1 file changed, 3 insertions, 1 deletion
diff --git a/src/backend/regex/README b/src/backend/regex/README
index 89ba6a62ea2..c5d21e8c99d 100644
--- a/src/backend/regex/README
+++ b/src/backend/regex/README
@@ -7,12 +7,13 @@ So this file is an attempt to reverse-engineer some docs.
General source-file layout
--------------------------
-There are four separately-compilable source files, each exposing exactly
+There are five separately-compilable source files, each exposing exactly
one exported function:
regcomp.c: pg_regcomp
regexec.c: pg_regexec
regerror.c: pg_regerror
regfree.c: pg_regfree
+ regprefix.c: pg_regprefix
(The pg_ prefixes were added by the Postgres project to distinguish this
library version from any similar one that might be present on a particular
system. They'd need to be removed or replaced in any standalone version
@@ -44,6 +45,7 @@ regexec.c Top-level regex execution code
rege_dfa.c DFA creation and execution
regerror.c pg_regerror: generate text for a regex error code
regfree.c pg_regfree: API to free a no-longer-needed regex_t
+regprefix.c Code for extracting a common prefix from a regex_t
The locale-specific code is concerned primarily with case-folding and with
expanding locale-specific character classes, such as [[:alnum:]]. It
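
The prefix that regprefix.c extracts is what makes a condition such as WHERE indexed_col ~ '^foo' btree-indexable: every matching value lies in the range indexed_col >= 'foo' AND indexed_col < 'fop'. The C sketch below computes that exclusive upper bound. It assumes plain byte-wise (C collation) ordering; prefix_upper_bound is a hypothetical helper, and PostgreSQL's planner additionally has to cope with multibyte encodings and non-C collations, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compute the smallest string greater than every string that starts with
 * prefix, under plain byte-wise ordering. Trailing 0xFF bytes cannot be
 * incremented, so they are dropped before incrementing. Returns 1 and
 * writes the bound to out on success; returns 0 (out = "") if no bound
 * exists, i.e. the prefix is empty or consists only of 0xFF bytes. */
int
prefix_upper_bound(const char *prefix, char *out, size_t outsize)
{
    size_t len = strlen(prefix);

    assert(outsize > len);          /* room for prefix plus terminator */
    memcpy(out, prefix, len + 1);

    while (len > 0)
    {
        unsigned char last = (unsigned char) out[len - 1];

        if (last < 0xFF)
        {
            out[len - 1] = (char) (last + 1);
            out[len] = '\0';        /* cut off any dropped 0xFF bytes */
            return 1;
        }
        len--;                      /* drop an un-incrementable byte */
    }
    out[0] = '\0';
    return 0;
}
```

The range is only a necessary condition, so rows fetched through the index are still checked against the original regex; with an empty prefix there is no useful range and the function returns 0.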