
Commit da11977
Reduce memory usage of tsvector type analyze function.

compute_tsvector_stats() detoasted and kept in memory every tsvector value in the sample, but that can be a lot of memory. The original bug report described a case using over 10 gigabytes, with statistics target of 10000 (the maximum).

To fix, allocate a separate copy of just the lexemes that we keep around, and free the detoasted tsvector values as we go. This adds some palloc/pfree overhead, when you have a lot of distinct lexemes in the sample, but it's better than running out of memory.

Fixes bug #14654 reported by James C. Reviewed by Tom Lane. Backport to all supported versions.

Discussion: https://www.postgresql.org/message-id/20170514200602.1451.46797@wrigleys.postgresql.org
1 parent ca793c5 commit da11977

File tree

1 file changed: +17 −4 lines


src/backend/tsearch/ts_typanalyze.c

+17 −4

@@ -232,17 +232,20 @@ compute_tsvector_stats(VacAttrStats *stats,
 
 		/*
 		 * We loop through the lexemes in the tsvector and add them to our
-		 * tracking hashtable. Note: the hashtable entries will point into
-		 * the (detoasted) tsvector value, therefore we cannot free that
-		 * storage until we're done.
+		 * tracking hashtable.
 		 */
 		lexemesptr = STRPTR(vector);
 		curentryptr = ARRPTR(vector);
 		for (j = 0; j < vector->size; j++)
 		{
 			bool		found;
 
-			/* Construct a hash key */
+			/*
+			 * Construct a hash key. The key points into the (detoasted)
+			 * tsvector value at this point, but if a new entry is created, we
+			 * make a copy of it. This way we can free the tsvector value
+			 * once we've processed all its lexemes.
+			 */
 			hash_key.lexeme = lexemesptr + curentryptr->pos;
 			hash_key.length = curentryptr->len;
 
@@ -261,6 +264,9 @@ compute_tsvector_stats(VacAttrStats *stats,
 				/* Initialize new tracking list element */
 				item->frequency = 1;
 				item->delta = b_current - 1;
+
+				item->key.lexeme = palloc(hash_key.length);
+				memcpy(item->key.lexeme, hash_key.lexeme, hash_key.length);
 			}
 
 			/* lexeme_no is the number of elements processed (ie N) */
@@ -276,6 +282,10 @@ compute_tsvector_stats(VacAttrStats *stats,
 			/* Advance to the next WordEntry in the tsvector */
 			curentryptr++;
 		}
+
+		/* If the vector was toasted, free the detoasted copy. */
+		if (TSVectorGetDatum(vector) != value)
+			pfree(vector);
 	}
 
 	/* We can only compute real stats if we found some non-null values. */
@@ -447,9 +457,12 @@ prune_lexemes_hashtable(HTAB *lexemes_tab, int b_current)
 	{
 		if (item->frequency + item->delta <= b_current)
 		{
+			char	   *lexeme = item->key.lexeme;
+
 			if (hash_search(lexemes_tab, (const void *) &item->key,
 							HASH_REMOVE, NULL) == NULL)
 				elog(ERROR, "hash table corrupted");
+			pfree(lexeme);
 		}
 	}
 }
