
Commit ca8217c

Add a test module for the regular expression package.

This module provides a function test_regex() that is functionally rather like regexp_matches(), but with additional debugging-oriented options and additional output. The debug options are somewhat obscure; they are chosen to match the API of the test harness that Henry Spencer wrote way-back-when for use in Tcl. With this, we can import all the test cases that Spencer wrote originally, even for regex functionality that we don't currently expose in Postgres. This seems necessary because we can no longer rely on Tcl to act as upstream and verify any fixes or improvements that we make.

In addition to Spencer's tests, I added a few for lookbehind constraints (which we added in 2015, and Tcl still hasn't absorbed) that are modeled on his tests for lookahead constraints. After looking at code coverage reports, I also threw in a couple of tests to more fully exercise our "high colormap" logic.

According to my testing, this brings the check-world coverage for src/backend/regex/ from 71.1% to 86.7% of lines. (coverage.postgresql.org shows a slightly different number, which I think is because it measures a non-assert build.)

Discussion: https://postgr.es/m/2873268.1609732164@sss.pgh.pa.us

1 parent 4656e3d commit ca8217c

12 files changed, 7264 additions and 0 deletions

src/test/modules/Makefile

+1
@@ -22,6 +22,7 @@ SUBDIRS = \
 		  test_pg_dump \
 		  test_predtest \
 		  test_rbtree \
+		  test_regex \
 		  test_rls_hooks \
 		  test_shm_mq \
 		  unsafe_tests \
+4
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/

src/test/modules/test_regex/Makefile

+23
@@ -0,0 +1,23 @@
+# src/test/modules/test_regex/Makefile
+
+MODULE_big = test_regex
+OBJS = \
+	$(WIN32RES) \
+	test_regex.o
+PGFILEDESC = "test_regex - test code for backend/regex/"
+
+EXTENSION = test_regex
+DATA = test_regex--1.0.sql
+
+REGRESS = test_regex test_regex_utf8
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_regex
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif

src/test/modules/test_regex/README

+78
@@ -0,0 +1,78 @@
+test_regex is a module for testing the regular expression package.
+It is mostly meant to allow us to absorb Tcl's regex test suite.
+Therefore, there are provisions to exercise regex features that
+aren't currently exposed at the SQL level by PostgreSQL.
+
+Currently, one function is provided:
+
+test_regex(pattern text, string text, flags text) returns setof text[]
+
+Reports an error if the pattern is an invalid regex. Otherwise,
+the first row of output contains the number of subexpressions,
+followed by words reporting set bit(s) in the regex's re_info field.
+If the pattern doesn't match the string, that's all.
+If the pattern does match, the next row contains the whole match
+as the first array element. If there are parenthesized subexpression(s),
+following array elements contain the matches to those subexpressions.
+If the "g" (glob) flag is set, then additional row(s) of output similarly
+report any additional matches.
+
+The "flags" argument is a string of zero or more single-character
+flags that modify the behavior of the regex package or the test
+function. As described in Tcl's reg.test file:
+
+	The flag characters are complex and a bit eclectic. Generally speaking,
+	lowercase letters are compile options, uppercase are expected re_info
+	bits, and nonalphabetics are match options, controls for how the test is
+	run, or testing options. The one small surprise is that AREs are the
+	default, and you must explicitly request lesser flavors of RE. The flags
+	are as follows. It is admitted that some are not very mnemonic.
+
+	-	no-op (placeholder)
+	0	report indices not actual strings
+		(This substitutes for Tcl's -indices switch)
+	!	expect partial match, report start position anyway
+	%	force small state-set cache in matcher (to test cache replace)
+	^	beginning of string is not beginning of line
+	$	end of string is not end of line
+	*	test is Unicode-specific, needs big character set
+	+	provide fake xy equivalence class and ch collating element
+		(Note: the equivalence class is implemented, the
+		collating element is not; so references to [.ch.] fail)
+	,	set REG_PROGRESS (only useful in REG_DEBUG builds)
+	.	set REG_DUMP (only useful in REG_DEBUG builds)
+	:	set REG_MTRACE (only useful in REG_DEBUG builds)
+	;	set REG_FTRACE (only useful in REG_DEBUG builds)
+
+	&	test as both ARE and BRE
+		(Not implemented in Postgres, we use separate tests)
+	b	BRE
+	e	ERE
+	a	turn advanced-features bit on (error unless ERE already)
+	q	literal string, no metacharacters at all
+
+	g	global match (find all matches)
+	i	case-independent matching
+	o	("opaque") do not return match locations
+	p	newlines are half-magic, excluded from . and [^ only
+	w	newlines are half-magic, significant to ^ and $ only
+	n	newlines are fully magic, both effects
+	x	expanded RE syntax
+	t	incomplete-match reporting
+	c	canmatch (equivalent to "t0!", in Postgres implementation)
+	s	match only at start (REG_BOSONLY)
+
+	A	backslash-_a_lphanumeric seen
+	B	ERE/ARE literal-_b_race heuristic used
+	E	backslash (_e_scape) seen within []
+	H	looka_h_ead constraint seen
+	I	_i_mpossible to match
+	L	_l_ocale-specific construct seen
+	M	unportable (_m_achine-specific) construct seen
+	N	RE can match empty (_n_ull) string
+	P	non-_P_OSIX construct seen
+	Q	{} _q_uantifier seen
+	R	back _r_eference seen
+	S	POSIX-un_s_pecified syntax seen
+	T	prefers shortest (_t_iny)
+	U	saw original-POSIX botch: unmatched right paren in ERE (_u_gh)
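
As a rough illustration of the README's rule of thumb for flag characters (lowercase letters are compile options, uppercase letters are expected re_info bits, and everything else is a match or test control), the sketch below sorts a flags string into those three groups. It is not part of the commit: classify_flags is a hypothetical helper written in Python purely for illustration, and it applies only the general rule, which the README itself qualifies with "generally speaking".

```python
# Illustrative sketch only; classify_flags is a hypothetical helper and
# is not part of the test_regex module or of PostgreSQL.

def classify_flags(flags):
    """Group flag characters by the README's general rule of thumb."""
    groups = {"compile": [], "re_info": [], "other": []}
    for ch in flags:
        if ch.isascii() and ch.islower():
            # Lowercase letters: generally compile options (e.g. "i", "x").
            groups["compile"].append(ch)
        elif ch.isascii() and ch.isupper():
            # Uppercase letters: expected re_info bits (e.g. "P", "Q").
            groups["re_info"].append(ch)
        else:
            # Digits and punctuation: match options or test controls.
            groups["other"].append(ch)
    return groups


if __name__ == "__main__":
    print(classify_flags("gi0!PQ"))
    # {'compile': ['g', 'i'], 're_info': ['P', 'Q'], 'other': ['0', '!']}
```

The split is deliberately coarse. Per the list above, some lowercase flags such as "g" or "o" affect how matches are reported rather than how the pattern is compiled, so a real harness would need a per-character table rather than a case check.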
