
Commit dfd8e6c

Fix an issue with index scan using pg_trgm due to char signedness on different architectures.

GIN and GiST indexes utilizing pg_trgm's opclasses store sorted trigrams within index tuples. When comparing and sorting each trigram, pg_trgm treats each character as a 'char[3]' type in C. However, the char type in C can be interpreted as either signed char or unsigned char, depending on the platform, if the signedness is not explicitly specified. Consequently, during replication between different CPU architectures, there was an issue where index scans on standby servers could not locate matching index tuples due to the differing treatment of character signedness.

This change introduces comparison functions for trgm that explicitly handle signed char and unsigned char. The appropriate comparison function will be dynamically selected based on the character signedness stored in the control file. Therefore, upgraded clusters can utilize the indexes without rebuilding, provided the cluster upgrade occurs on platforms with the same character signedness as the original cluster initialization.

The default char signedness information was introduced in 44fe30f, so no backpatch.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
1 parent 1aab680 commit dfd8e6c

2 files changed, 45 additions and 4 deletions

contrib/pg_trgm/trgm.h

Lines changed: 1 addition & 4 deletions
@@ -40,15 +40,12 @@
 
 typedef char trgm[3];
 
-#define CMPCHAR(a,b) ( ((a)==(b)) ? 0 : ( ((a)<(b)) ? -1 : 1 ) )
-#define CMPPCHAR(a,b,i) CMPCHAR( *(((const char*)(a))+i), *(((const char*)(b))+i) )
-#define CMPTRGM(a,b) ( CMPPCHAR(a,b,0) ? CMPPCHAR(a,b,0) : ( CMPPCHAR(a,b,1) ? CMPPCHAR(a,b,1) : CMPPCHAR(a,b,2) ) )
-
 #define CPTRGM(a,b) do { \
     *(((char*)(a))+0) = *(((char*)(b))+0); \
     *(((char*)(a))+1) = *(((char*)(b))+1); \
     *(((char*)(a))+2) = *(((char*)(b))+2); \
 } while(0)
+extern int (*CMPTRGM) (const void *a, const void *b);
 
 #define ISWORDCHR(c) (t_isalnum(c))
 #define ISPRINTABLECHAR(a) ( isascii( *(unsigned char*)(a) ) && (isalnum( *(unsigned char*)(a) ) || *(unsigned char*)(a)==' ') )

contrib/pg_trgm/trgm_op.c

Lines changed: 44 additions & 0 deletions
@@ -42,6 +42,9 @@ PG_FUNCTION_INFO_V1(strict_word_similarity_commutator_op);
 PG_FUNCTION_INFO_V1(strict_word_similarity_dist_op);
 PG_FUNCTION_INFO_V1(strict_word_similarity_dist_commutator_op);
 
+static int CMPTRGM_CHOOSE(const void *a, const void *b);
+int (*CMPTRGM) (const void *a, const void *b) = CMPTRGM_CHOOSE;
+
 /* Trigram with position */
 typedef struct
 {
@@ -107,6 +110,47 @@ _PG_init(void)
     MarkGUCPrefixReserved("pg_trgm");
 }
 
+#define CMPCHAR(a,b) ( ((a)==(b)) ? 0 : ( ((a)<(b)) ? -1 : 1 ) )
+
+/*
+ * Functions for comparing two trgms while treating each char as "signed char" or
+ * "unsigned char".
+ */
+static inline int
+CMPTRGM_SIGNED(const void *a, const void *b)
+{
+#define CMPPCHAR_S(a,b,i) CMPCHAR( *(((const signed char*)(a))+i), *(((const signed char*)(b))+i) )
+
+    return CMPPCHAR_S(a, b, 0) ? CMPPCHAR_S(a, b, 0)
+        : (CMPPCHAR_S(a, b, 1) ? CMPPCHAR_S(a, b, 1)
+           : CMPPCHAR_S(a, b, 2));
+}
+
+static inline int
+CMPTRGM_UNSIGNED(const void *a, const void *b)
+{
+#define CMPPCHAR_UNS(a,b,i) CMPCHAR( *(((const unsigned char*)(a))+i), *(((const unsigned char*)(b))+i) )
+
+    return CMPPCHAR_UNS(a, b, 0) ? CMPPCHAR_UNS(a, b, 0)
+        : (CMPPCHAR_UNS(a, b, 1) ? CMPPCHAR_UNS(a, b, 1)
+           : CMPPCHAR_UNS(a, b, 2));
+}
+
+/*
+ * This gets called on the first call. It replaces the function pointer so
+ * that subsequent calls are routed directly to the chosen implementation.
+ */
+static int
+CMPTRGM_CHOOSE(const void *a, const void *b)
+{
+    if (GetDefaultCharSignedness())
+        CMPTRGM = CMPTRGM_SIGNED;
+    else
+        CMPTRGM = CMPTRGM_UNSIGNED;
+
+    return CMPTRGM(a, b);
+}
+
 /*
  * Deprecated function.
  * Use "pg_trgm.similarity_threshold" GUC variable instead of this function.
