| 1 | +test_regex is a module for testing the regular expression package. |
| 2 | +It is mostly meant to allow us to absorb Tcl's regex test suite. |
| 3 | +Therefore, there are provisions to exercise regex features that |
| 4 | +aren't currently exposed at the SQL level by PostgreSQL. |
| 5 | + |
| 6 | +Currently, one function is provided: |
| 7 | + |
| 8 | +test_regex(pattern text, string text, flags text) returns setof text[] |
| 9 | + |
| 10 | +Reports an error if the pattern is an invalid regex. Otherwise, |
| 11 | +the first row of output contains the number of subexpressions, |
| 12 | +followed by words reporting set bit(s) in the regex's re_info field. |
| 13 | +If the pattern doesn't match the string, no further rows are returned. |
| 14 | +If the pattern does match, the next row contains the whole match |
| 15 | +as the first array element. If there are parenthesized subexpression(s), |
| 16 | +following array elements contain the matches to those subexpressions. |
| 17 | +If the "g" (global) flag is set, then additional row(s) of output similarly |
| 18 | +report any additional matches. |
| 19 | + |
| 20 | +The "flags" argument is a string of zero or more single-character |
| 21 | +flags that modify the behavior of the regex package or the test |
| 22 | +function. As described in Tcl's reg.test file: |
| 23 | + |
| 24 | +The flag characters are complex and a bit eclectic. Generally speaking, |
| 25 | +lowercase letters are compile options, uppercase are expected re_info |
| 26 | +bits, and nonalphabetics are match options, controls for how the test is |
| 27 | +run, or testing options. The one small surprise is that AREs are the |
| 28 | +default, and you must explicitly request lesser flavors of RE. The flags |
| 29 | +are as follows. It is admitted that some are not very mnemonic. |
| 30 | + |
| 31 | + - no-op (placeholder) |
| 32 | + 0 report indices not actual strings |
| 33 | + (This substitutes for Tcl's -indices switch) |
| 34 | + ! expect partial match, report start position anyway |
| 35 | + % force small state-set cache in matcher (to test cache replace) |
| 36 | + ^ beginning of string is not beginning of line |
| 37 | + $ end of string is not end of line |
| 38 | + * test is Unicode-specific, needs big character set |
| 39 | + + provide fake xy equivalence class and ch collating element |
| 40 | + (Note: the equivalence class is implemented, the |
| 41 | + collating element is not; so references to [.ch.] fail) |
| 42 | + , set REG_PROGRESS (only useful in REG_DEBUG builds) |
| 43 | + . set REG_DUMP (only useful in REG_DEBUG builds) |
| 44 | + : set REG_MTRACE (only useful in REG_DEBUG builds) |
| 45 | + ; set REG_FTRACE (only useful in REG_DEBUG builds) |
| 46 | + |
| 47 | + & test as both ARE and BRE |
| 48 | + (Not implemented in Postgres, we use separate tests) |
| 49 | + b BRE |
| 50 | + e ERE |
| 51 | + a turn advanced-features bit on (error unless ERE already) |
| 52 | + q literal string, no metacharacters at all |
| 53 | + |
| 54 | + g global match (find all matches) |
| 55 | + i case-independent matching |
| 56 | + o ("opaque") do not return match locations |
| 57 | + p newlines are half-magic, excluded from . and [^ only |
| 58 | + w newlines are half-magic, significant to ^ and $ only |
| 59 | + n newlines are fully magic, both effects |
| 60 | + x expanded RE syntax |
| 61 | + t incomplete-match reporting |
| 62 | + c canmatch (equivalent to "t0!", in Postgres implementation) |
| 63 | + s match only at start (REG_BOSONLY) |
| 64 | + |
| 65 | + A backslash-_a_lphanumeric seen |
| 66 | + B ERE/ARE literal-_b_race heuristic used |
| 67 | + E backslash (_e_scape) seen within [] |
| 68 | + H looka_h_ead constraint seen |
| 69 | + I _i_mpossible to match |
| 70 | + L _l_ocale-specific construct seen |
| 71 | + M unportable (_m_achine-specific) construct seen |
| 72 | + N RE can match empty (_n_ull) string |
| 73 | + P non-_P_OSIX construct seen |
| 74 | + Q {} _q_uantifier seen |
| 75 | + R back _r_eference seen |
| 76 | + S POSIX-un_s_pecified syntax seen |
| 77 | + T prefers shortest (_t_iny) |
| 78 | + U saw original-POSIX botch: unmatched right paren in ERE (_u_gh) |
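The general rule quoted from reg.test (lowercase letters are compile options, uppercase letters are expected re_info bits, nonalphabetics are everything else) can be sketched as a small helper. This is only an illustration of that rule of thumb, not code from the module: the name classify_flags and the group labels are hypothetical, and the rule is stated as holding only "generally speaking", so it is not an exact description of every flag in the list above.

```python
def classify_flags(flags):
    """Split a test_regex-style flags string into rough groups.

    Hypothetical helper: it applies only the rule of thumb from
    reg.test and does not know what any individual flag means.
    """
    groups = {"compile": [], "re_info": [], "other": []}
    for ch in flags:
        if ch.isascii() and ch.islower():
            # Lowercase letter: generally a compile option.
            groups["compile"].append(ch)
        elif ch.isascii() and ch.isupper():
            # Uppercase letter: generally an expected re_info bit.
            groups["re_info"].append(ch)
        else:
            # Digits and punctuation: match options, test-run
            # controls, or testing options.
            groups["other"].append(ch)
    return groups


if __name__ == "__main__":
    print(classify_flags("gi0!NP"))
```

For the flags string "gi0!NP", the helper puts "g" and "i" in the compile group, "N" and "P" in the re_info group, and "0" and "!" in the remaining group.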