Commit bbe0a81

Allow configurable LZ4 TOAST compression.
There is now a per-column COMPRESSION option which can be set to pglz (the default, and the only option up until now) or lz4. Or, if you like, you can set the new default_toast_compression GUC to lz4, and then that will be the default for new table columns for which no value is specified. We don't have lz4 support in the PostgreSQL code, so to use lz4 compression, PostgreSQL must be built --with-lz4.

In general, TOAST compression means compression of individual column values, not the whole tuple, and those values can either be compressed inline within the tuple or compressed and then stored externally in the TOAST table, so those properties also apply to this feature.

Prior to this commit, a TOAST pointer has two unused bits as part of the va_extsize field, and a compressed datum has two unused bits as part of the va_rawsize field. These bits are unused because the length of a varlena is limited to 1GB; we now use them to indicate the compression type that was used. This means we only have bit space for 2 more built-in compression types, but we could work around that problem, if necessary, by introducing a new vartag_external value for any further types we end up wanting to add. Hopefully, it won't be too important to offer a wide selection of algorithms here, since each one we add not only takes more coding but also adds a build dependency for every packager. Nevertheless, it seems worth doing at least this much, because LZ4 gets better compression than PGLZ with less CPU usage.

It's possible for LZ4-compressed datums to leak into composite type values stored on disk, just as it is for PGLZ. It's also possible for LZ4-compressed attributes to be copied into a different table via SQL commands such as CREATE TABLE AS or INSERT ... SELECT. It would be expensive to force such values to be decompressed, so PostgreSQL has never done so. For the same reasons, we also don't force recompression of already-compressed values even if the target table prefers a different compression method than was used for the source data. These architectural decisions are perhaps arguable, but revisiting them is well beyond the scope of what seemed possible to do as part of this project. However, it's relatively cheap to recompress as part of VACUUM FULL or CLUSTER, so this commit adjusts those commands to do so, if the configured compression method of the table happens not to match what was used for some column value stored therein.

Dilip Kumar. The original patches on which this work was based were written by Ildus Kurbangaliev, and those patches were based on even earlier work by Nikita Glukhov, but the design has since changed very substantially, since allowing a potentially large number of compression methods that could be added and dropped on a running system proved too problematic given some of the architectural issues mentioned above; the choice of which specific compression method to add first is now different; and a lot of the code has been heavily refactored. More recently, Justin Pryzby helped quite a bit with testing and reviewing, and this version also includes some code contributions from him. Other design input and review from Tomas Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander Korotkov, and me.

Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
1 parent e589c48 commit bbe0a81
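
The commit message describes storing the compression type in the two bits of a 32-bit size field that a 1GB varlena limit leaves unused. A minimal sketch of that packing, assuming the low 30 bits hold the size and the top 2 bits hold a method ID, might look like the following. All names here (`SKETCH_SIZE_BITS`, `sketch_pack`, the `SKETCH_*` IDs and their numeric values) are hypothetical illustrations and are not PostgreSQL's actual macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 30 low bits for the size, 2 high bits for the method. */
#define SKETCH_SIZE_BITS 30
#define SKETCH_SIZE_MASK ((UINT32_C(1) << SKETCH_SIZE_BITS) - 1)

/* Assumed method IDs for illustration only; two of the four values stay free. */
enum sketch_method_id
{
    SKETCH_PGLZ_ID = 0,
    SKETCH_LZ4_ID = 1
};

/* Combine a size below 1GB and a 2-bit method ID into one 32-bit word. */
static inline uint32_t
sketch_pack(uint32_t size, uint32_t method)
{
    assert(size <= SKETCH_SIZE_MASK);
    assert(method <= 3);
    return (method << SKETCH_SIZE_BITS) | size;
}

/* Recover the size by masking off the two method bits. */
static inline uint32_t
sketch_get_size(uint32_t word)
{
    return word & SKETCH_SIZE_MASK;
}

/* Recover the method ID from the two high bits. */
static inline uint32_t
sketch_get_method(uint32_t word)
{
    return word >> SKETCH_SIZE_BITS;
}
```

Under this layout, code that reads the field directly would see the method bits mixed into the size, which is presumably why the diff below replaces a raw `va_extsize` read in verify_heapam.c with an accessor macro.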

61 files changed, +2261 -160 lines changed

configure (+170)

@@ -699,6 +699,9 @@ with_gnu_ld
 LD
 LDFLAGS_SL
 LDFLAGS_EX
+LZ4_LIBS
+LZ4_CFLAGS
+with_lz4
 with_zlib
 with_system_tzdata
 with_libxslt
@@ -864,6 +867,7 @@ with_libxml
 with_libxslt
 with_system_tzdata
 with_zlib
+with_lz4
 with_gnu_ld
 with_ssl
 with_openssl
@@ -891,6 +895,8 @@ ICU_LIBS
 XML2_CONFIG
 XML2_CFLAGS
 XML2_LIBS
+LZ4_CFLAGS
+LZ4_LIBS
 LDFLAGS_EX
 LDFLAGS_SL
 PERL
@@ -1569,6 +1575,7 @@ Optional Packages:
 --with-system-tzdata=DIR
 use system time zone data in DIR
 --without-zlib do not use Zlib
+--with-lz4 build with LZ4 support
 --with-gnu-ld assume the C compiler uses GNU ld [default=no]
 --with-ssl=LIB use LIB for SSL/TLS support (openssl)
 --with-openssl obsolete spelling of --with-ssl=openssl
@@ -1596,6 +1603,8 @@ Some influential environment variables:
 XML2_CONFIG path to xml2-config utility
 XML2_CFLAGS C compiler flags for XML2, overriding pkg-config
 XML2_LIBS linker flags for XML2, overriding pkg-config
+LZ4_CFLAGS C compiler flags for LZ4, overriding pkg-config
+LZ4_LIBS linker flags for LZ4, overriding pkg-config
 LDFLAGS_EX extra linker flags for linking executables only
 LDFLAGS_SL extra linker flags for linking shared libraries only
 PERL Perl program
@@ -8563,6 +8572,137 @@ fi
 
 
 
+#
+# LZ4
+#
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with LZ4 support" >&5
+$as_echo_n "checking whether to build with LZ4 support... " >&6; }
+
+
+
+# Check whether --with-lz4 was given.
+if test "${with_lz4+set}" = set; then :
+withval=$with_lz4;
+case $withval in
+yes)
+
+$as_echo "#define USE_LZ4 1" >>confdefs.h
+
+;;
+no)
+:
+;;
+*)
+as_fn_error $? "no argument expected for --with-lz4 option" "$LINENO" 5
+;;
+esac
+
+else
+with_lz4=no
+
+fi
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_lz4" >&5
+$as_echo "$with_lz4" >&6; }
+
+
+if test "$with_lz4" = yes; then
+
+pkg_failed=no
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for liblz4" >&5
+$as_echo_n "checking for liblz4... " >&6; }
+
+if test -n "$LZ4_CFLAGS"; then
+pkg_cv_LZ4_CFLAGS="$LZ4_CFLAGS"
+elif test -n "$PKG_CONFIG"; then
+if test -n "$PKG_CONFIG" && \
+{ { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"liblz4\""; } >&5
+($PKG_CONFIG --exists --print-errors "liblz4") 2>&5
+ac_status=$?
+$as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+test $ac_status = 0; }; then
+pkg_cv_LZ4_CFLAGS=`$PKG_CONFIG --cflags "liblz4" 2>/dev/null`
+test "x$?" != "x0" && pkg_failed=yes
+else
+pkg_failed=yes
+fi
+else
+pkg_failed=untried
+fi
+if test -n "$LZ4_LIBS"; then
+pkg_cv_LZ4_LIBS="$LZ4_LIBS"
+elif test -n "$PKG_CONFIG"; then
+if test -n "$PKG_CONFIG" && \
+{ { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"liblz4\""; } >&5
+($PKG_CONFIG --exists --print-errors "liblz4") 2>&5
+ac_status=$?
+$as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+test $ac_status = 0; }; then
+pkg_cv_LZ4_LIBS=`$PKG_CONFIG --libs "liblz4" 2>/dev/null`
+test "x$?" != "x0" && pkg_failed=yes
+else
+pkg_failed=yes
+fi
+else
+pkg_failed=untried
+fi
+
+
+
+if test $pkg_failed = yes; then
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+
+if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+_pkg_short_errors_supported=yes
+else
+_pkg_short_errors_supported=no
+fi
+if test $_pkg_short_errors_supported = yes; then
+LZ4_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "liblz4" 2>&1`
+else
+LZ4_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "liblz4" 2>&1`
+fi
+# Put the nasty error message in config.log where it belongs
+echo "$LZ4_PKG_ERRORS" >&5
+
+as_fn_error $? "Package requirements (liblz4) were not met:
+
+$LZ4_PKG_ERRORS
+
+Consider adjusting the PKG_CONFIG_PATH environment variable if you
+installed software in a non-standard prefix.
+
+Alternatively, you may set the environment variables LZ4_CFLAGS
+and LZ4_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details." "$LINENO" 5
+elif test $pkg_failed = untried; then
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+{ { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error $? "The pkg-config script could not be found or is too old. Make sure it
+is in your PATH or set the PKG_CONFIG environment variable to the full
+path to pkg-config.
+
+Alternatively, you may set the environment variables LZ4_CFLAGS
+and LZ4_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details.
+
+To get pkg-config, see <http://pkg-config.freedesktop.org/>.
+See \`config.log' for more details" "$LINENO" 5; }
+else
+LZ4_CFLAGS=$pkg_cv_LZ4_CFLAGS
+LZ4_LIBS=$pkg_cv_LZ4_LIBS
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+
+fi
+LIBS="$LZ4_LIBS $LIBS"
+CFLAGS="$LZ4_CFLAGS $CFLAGS"
+fi
+
 #
 # Assignments
 #
@@ -13379,6 +13519,36 @@ Use --without-zlib to disable zlib support." "$LINENO" 5
 fi
 
 
+fi
+
+if test "$with_lz4" = yes; then
+for ac_header in lz4/lz4.h
+do :
+ac_fn_c_check_header_mongrel "$LINENO" "lz4/lz4.h" "ac_cv_header_lz4_lz4_h" "$ac_includes_default"
+if test "x$ac_cv_header_lz4_lz4_h" = xyes; then :
+cat >>confdefs.h <<_ACEOF
+#define HAVE_LZ4_LZ4_H 1
+_ACEOF
+
+else
+for ac_header in lz4.h
+do :
+ac_fn_c_check_header_mongrel "$LINENO" "lz4.h" "ac_cv_header_lz4_h" "$ac_includes_default"
+if test "x$ac_cv_header_lz4_h" = xyes; then :
+cat >>confdefs.h <<_ACEOF
+#define HAVE_LZ4_H 1
+_ACEOF
+
+else
+as_fn_error $? "lz4.h header file is required for LZ4" "$LINENO" 5
+fi
+
+done
+
+fi
+
+done
+
 fi
 
 if test "$with_gssapi" = yes ; then

configure.ac (+20)

@@ -986,6 +986,21 @@ PGAC_ARG_BOOL(with, zlib, yes,
 [do not use Zlib])
 AC_SUBST(with_zlib)
 
+#
+# LZ4
+#
+AC_MSG_CHECKING([whether to build with LZ4 support])
+PGAC_ARG_BOOL(with, lz4, no, [build with LZ4 support],
+[AC_DEFINE([USE_LZ4], 1, [Define to 1 to build with LZ4 support. (--with-lz4)])])
+AC_MSG_RESULT([$with_lz4])
+AC_SUBST(with_lz4)
+
+if test "$with_lz4" = yes; then
+PKG_CHECK_MODULES(LZ4, liblz4)
+LIBS="$LZ4_LIBS $LIBS"
+CFLAGS="$LZ4_CFLAGS $CFLAGS"
+fi
+
 #
 # Assignments
 #
@@ -1410,6 +1425,11 @@ failure. It is possible the compiler isn't looking in the proper directory.
 Use --without-zlib to disable zlib support.])])
 fi
 
+if test "$with_lz4" = yes; then
+AC_CHECK_HEADERS(lz4/lz4.h, [],
+[AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is required for LZ4])])])
+fi
+
 if test "$with_gssapi" = yes ; then
 AC_CHECK_HEADERS(gssapi/gssapi.h, [],
 [AC_CHECK_HEADERS(gssapi.h, [], [AC_MSG_ERROR([gssapi.h header file is required for GSSAPI])])])
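
The configure checks above define `USE_LZ4` when `--with-lz4` is given, and then `HAVE_LZ4_LZ4_H` or `HAVE_LZ4_H` depending on where the header is found. As a rough sketch of how C code could consume those three macros, and not a copy of what the PostgreSQL sources actually do, one possible arrangement is shown below. The helper `sketch_lz4_available` is a hypothetical name introduced only for illustration.

```c
#include <stdbool.h>

/*
 * Pick whichever header location configure detected. This block is only
 * compiled when the build was configured --with-lz4.
 */
#ifdef USE_LZ4
#if defined(HAVE_LZ4_LZ4_H)
#include <lz4/lz4.h>
#elif defined(HAVE_LZ4_H)
#include <lz4.h>
#else
#error "USE_LZ4 is defined but configure found no lz4 header"
#endif
#endif

/* Hypothetical helper: report whether this build has LZ4 compiled in. */
static inline bool
sketch_lz4_available(void)
{
#ifdef USE_LZ4
    return true;
#else
    return false;
#endif
}
```

A build without `--with-lz4` compiles this to a function that always returns false, so callers can reject a request for lz4 at run time instead of failing to link.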

contrib/amcheck/verify_heapam.c (+1 -1)

@@ -1069,7 +1069,7 @@ check_tuple_attribute(HeapCheckContext *ctx)
 */
 VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 
-ctx->attrsize = toast_pointer.va_extsize;
+ctx->attrsize = VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer);
 ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
 ctx->totalchunks = ctx->endchunk + 1;
 

doc/src/sgml/catalogs.sgml (+12)

@@ -1355,6 +1355,18 @@
 </para></entry>
 </row>
 
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>attcompression</structfield> <type>char</type>
+</para>
+<para>
+The current compression method of the column. If it is an invalid
+compression method (<literal>'\0'</literal>) then column data will not
+be compressed. Otherwise, <literal>'p'</literal> = pglz compression or
+<literal>'l'</literal> = lz4 compression.
+</para></entry>
+</row>
+
 <row>
 <entry role="catalog_table_entry"><para role="column_definition">
 <structfield>attacl</structfield> <type>aclitem[]</type>
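
The `attcompression` documentation above gives three single-character codes: `'\0'` for no valid method, `'p'` for pglz, and `'l'` for lz4. A small sketch of decoding that column value into a readable name could look like this. The function name `sketch_attcompression_name` and the `"unknown"` fallback are assumptions made for illustration and do not appear in the commit.

```c
#include <stddef.h>

/*
 * Map an attcompression code to a method name.
 * Returns NULL for '\0', meaning the column data will not be compressed,
 * and "unknown" for any code this sketch does not recognize.
 */
const char *
sketch_attcompression_name(char attcompression)
{
    switch (attcompression)
    {
        case 'p':
            return "pglz";
        case 'l':
            return "lz4";
        case '\0':
            return NULL;
        default:
            return "unknown";
    }
}
```

Keeping the "not compressed" case distinct from the unrecognized case lets a caller tell an uncompressed column apart from a code written by a newer version.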

doc/src/sgml/func.sgml (+16 -2)

@@ -25992,8 +25992,8 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 <para>
 The functions shown in <xref linkend="functions-admin-dbsize"/> calculate
 the disk space usage of database objects, or assist in presentation
-of usage results.
-All these functions return sizes measured in bytes. If an OID that does
+or understanding of usage results. <literal>bigint</literal> results
+are measured in bytes. If an OID that does
 not represent an existing object is passed to one of these
 functions, <literal>NULL</literal> is returned.
 </para>
@@ -26028,6 +26028,20 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 </para></entry>
 </row>
 
+<row>
+<entry role="func_table_entry"><para role="func_signature">
+<indexterm>
+<primary>pg_column_compression</primary>
+</indexterm>
+<function>pg_column_compression</function> ( <type>"any"</type> )
+<returnvalue>integer</returnvalue>
+</para>
+<para>
+Shows the compression algorithm that was used to compress a
+an individual variable-length value.
+</para></entry>
+</row>
+
 <row>
 <entry role="func_table_entry"><para role="func_signature">
 <indexterm>

doc/src/sgml/ref/alter_table.sgml (+16)

@@ -54,6 +54,7 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="parameter">name</replaceable>
 ALTER [ COLUMN ] <replaceable class="parameter">column_name</replaceable> SET ( <replaceable class="parameter">attribute_option</replaceable> = <replaceable class="parameter">value</replaceable> [, ... ] )
 ALTER [ COLUMN ] <replaceable class="parameter">column_name</replaceable> RESET ( <replaceable class="parameter">attribute_option</replaceable> [, ... ] )
 ALTER [ COLUMN ] <replaceable class="parameter">column_name</replaceable> SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
+ALTER [ COLUMN ] <replaceable class="parameter">column_name</replaceable> SET COMPRESSION <replaceable class="parameter">compression_method</replaceable>
 ADD <replaceable class="parameter">table_constraint</replaceable> [ NOT VALID ]
 ADD <replaceable class="parameter">table_constraint_using_index</replaceable>
 ALTER CONSTRAINT <replaceable class="parameter">constraint_name</replaceable> [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
@@ -103,6 +104,7 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
 GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( <replaceable>sequence_options</replaceable> ) ] |
 UNIQUE <replaceable class="parameter">index_parameters</replaceable> |
 PRIMARY KEY <replaceable class="parameter">index_parameters</replaceable> |
+COMPRESSION <replaceable class="parameter">compression_method</replaceable> |
 REFERENCES <replaceable class="parameter">reftable</replaceable> [ ( <replaceable class="parameter">refcolumn</replaceable> ) ] [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ]
 [ ON DELETE <replaceable class="parameter">referential_action</replaceable> ] [ ON UPDATE <replaceable class="parameter">referential_action</replaceable> ] }
 [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
@@ -383,6 +385,20 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
 </listitem>
 </varlistentry>
 
+<varlistentry>
+<term>
+<literal>SET COMPRESSION <replaceable class="parameter">compression_method</replaceable></literal>
+</term>
+<listitem>
+<para>
+This sets the compression method for a column. The supported compression
+methods are <literal>pglz</literal> and <literal>lz4</literal>.
+<literal>lz4</literal> is available only if <literal>--with-lz4</literal>
+was used when building <productname>PostgreSQL</productname>.
+</para>
+</listitem>
+</varlistentry>
+
 <varlistentry>
 <term><literal>ADD <replaceable class="parameter">table_constraint</replaceable> [ NOT VALID ]</literal></term>
 <listitem>
