Commit 938236a

The fti.pl supplied with the fulltextindex module generates ALL possible
substrings of two characters or greater, and is case-sensitive. This patch makes it work correctly: it generates only the suffixes of each word and lowercases them, as specified by the README file. This brings it into line with the fti.c function, makes it properly case-insensitive, removes the problem with duplicate rows being returned from an fti search, and greatly reduces the size of the generated index table. It was written by my co-worker, Brett Toolin.

Christopher Kings-Lynne
1 parent 8c6761a commit 938236a

1 file changed, +13 -12 lines changed

contrib/fulltextindex/fti.pl

@@ -1,6 +1,6 @@
 #!/usr/bin/perl
 #
-# This script substracts all substrings out of a specific column in a table
+# This script substracts all suffixes of all words in a specific column in a table
 # and generates output that can be loaded into a new table with the
 # psql '\copy' command. The new table should have the following structure:
 #
@@ -52,27 +52,28 @@
 $PGRES_NONFATAL_ERROR = 6 ;
 $PGRES_FATAL_ERROR = 7 ;
 
+# the minimum length of word to include in the full text index
+$MIN_WORD_LENGTH = 2;
+
+# the minimum length of the substrings in the full text index
+$MIN_SUBSTRING_LENGTH = 2;
+
 $[ = 0; # make sure string offsets start at 0
 
 sub break_up {
     my $string = pop @_;
 
+    # convert strings to lower case
+    $string = lc($string);
     @strings = split(/\W+/, $string);
     @subs = ();
 
     foreach $s (@strings) {
         $len = length($s);
-        next if ($len < 4);
-
-        $lpos = $len-1;
-        while ($lpos >= 3) {
-            $fpos = $lpos - 3;
-            while ($fpos >= 0) {
-                $sub = substr($s, $fpos, $lpos - $fpos + 1);
-                push(@subs, $sub);
-                $fpos = $fpos - 1;
-            }
-            $lpos = $lpos - 1;
+        next if ($len <= $MIN_WORD_LENGTH);
+        for ($i = 0; $i <= $len - $MIN_SUBSTRING_LENGTH; $i++) {
+            $tmp = substr($s, $i);
+            push(@subs, $tmp);
         }
     }
 
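To illustrate what the patched loop produces, here is a minimal standalone sketch of the suffix generation. The sub name `suffixes` and the sample input are hypothetical and are not part of fti.pl; the two thresholds and the loop bounds are copied from the hunk above. The rest of `break_up` is not shown in this diff, so the sketch simply returns the list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Thresholds as set in the patch.
my $MIN_WORD_LENGTH      = 2;
my $MIN_SUBSTRING_LENGTH = 2;

# Hypothetical standalone version of the patched loop in break_up():
# lowercase the input, split it into words, and collect every suffix
# of each word that is at least $MIN_SUBSTRING_LENGTH characters long.
sub suffixes {
    my ($string) = @_;
    my @subs;
    foreach my $s (split /\W+/, lc $string) {
        my $len = length $s;
        # As in the patch, "<=" also skips words that are exactly
        # $MIN_WORD_LENGTH characters long.
        next if $len <= $MIN_WORD_LENGTH;
        for (my $i = 0; $i <= $len - $MIN_SUBSTRING_LENGTH; $i++) {
            push @subs, substr($s, $i);
        }
    }
    return @subs;
}

print join(' ', suffixes('Perl is GOOD')), "\n";
# prints: perl erl rl good ood od
```

A word of length n contributes n - 1 entries here, whereas the removed nested loops emitted every substring of four or more characters, which grows quadratically with word length. That is consistent with the smaller index table the commit message describes.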