Commit 4c49d8f

Doc: clean up verify_heapam() documentation.

I started with the intention of just suppressing a PDF build warning by removing the example output, but ended up doing more: correcting factual errors in the function's signature, moving a bunch of generalized handwaving into the "Using amcheck Effectively" section which seemed a better place for it, and improving wording and markup a little bit.

Discussion: https://postgr.es/m/732904.1603728748@sss.pgh.pa.us
1 parent 66f8687 commit 4c49d8f

1 file changed, 65 insertions(+), 92 deletions(-)

doc/src/sgml/amcheck.sgml

@@ -83,7 +83,7 @@ AND c.relpersistence != 't'
 -- Function may throw an error when this is omitted:
 AND c.relkind = 'i' AND i.indisready AND i.indisvalid
 ORDER BY c.relpages DESC LIMIT 10;
-bt_index_check | relname | relpages
+bt_index_check | relname | relpages
 ----------------+---------------------------------+----------
 | pg_depend_reference_index | 43
 | pg_depend_depender_index | 40
@@ -208,104 +208,32 @@ SET client_min_messages = DEBUG1;
 verify_heapam(relation regclass,
 on_error_stop boolean,
 check_toast boolean,
-skip cstring,
+skip text,
 startblock bigint,
 endblock bigint,
 blkno OUT bigint,
 offnum OUT integer,
 attnum OUT integer,
 msg OUT text)
-returns record
+returns setof record
 </function>
 </term>
 <listitem>
 <para>
 Checks a table for structural corruption, where pages in the relation
 contain data that is invalidly formatted, and for logical corruption,
 where pages are structurally valid but inconsistent with the rest of the
-database cluster. Example usage:
-<screen>
-test=# select * from verify_heapam('mytable', check_toast := true);
-blkno | offnum | attnum | msg
--------+--------+--------+--------------------------------------------------------------------------------------------------
-17 | 12 | | xmin 4294967295 precedes relation freeze threshold 17:1134217582
-960 | 4 | | data begins at offset 152 beyond the tuple length 58
-960 | 4 | | tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
-960 | 5 | | tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
-960 | 6 | | tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
-960 | 7 | | tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
-1147 | 2 | | number of attributes 2047 exceeds maximum expected for table 3
-1147 | 10 | | tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
-1147 | 15 | | number of attributes 67 exceeds maximum expected for table 3
-1147 | 16 | 1 | attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
-1147 | 18 | 2 | final toast chunk number 0 differs from expected value 6
-1147 | 19 | 2 | toasted value for attribute 2 missing from toast table
-1147 | 21 | | tuple is marked as only locked, but also claims key columns were updated
-1147 | 22 | | multitransaction ID 1775655 is from before relation cutoff 2355572
-(14 rows)
-</screen>
-As this example shows, the Tuple ID (TID) of the corrupt tuple is given
-in the (<literal>blkno</literal>, <literal>offnum</literal>) columns, and
-for corruptions specific to a particular attribute in the tuple, the
-<literal>attnum</literal> field shows which one.
-</para>
-<para>
-Structural corruption can happen due to faulty storage hardware, or
-relation files being overwritten or modified by unrelated software.
-This kind of corruption can also be detected with
-<link linkend="app-initdb-data-checksums"><application>data page
-checksums</application></link>.
-</para>
-<para>
-Relation pages which are correctly formatted, internally consistent, and
-correct relative to their own internal checksums may still contain
-logical corruption. As such, this kind of corruption cannot be detected
-with <application>checksums</application>. Examples include toasted
-values in the main table which lack a corresponding entry in the toast
-table, and tuples in the main table with a Transaction ID that is older
-than the oldest valid Transaction ID in the database or cluster.
-</para>
-<para>
-Multiple causes of logical corruption have been observed in production
-systems, including bugs in the <productname>PostgreSQL</productname>
-server software, faulty and ill-conceived backup and restore tools, and
-user error.
-</para>
-<para>
-Corrupt relations are most concerning in live production environments,
-precisely the same environments where high risk activities are least
-welcome. For this reason, <function>verify_heapam</function> has been
-designed to diagnose corruption without undue risk. It cannot guard
-against all causes of backend crashes, as even executing the calling
-query could be unsafe on a badly corrupted system. Access to <link
-linkend="catalogs-overview">catalog tables</link> are performed and could
-be problematic if the catalogs themselves are corrupted.
-</para>
-<para>
-The design principle adhered to in <function>verify_heapam</function> is
-that, if the rest of the system and server hardware are correct, under
-default options, <function>verify_heapam</function> will not crash the
-server due merely to structural or logical corruption in the target
-table.
-</para>
-<para>
-The <literal>check_toast</literal> attempts to reconcile the target
-table against entries in its corresponding toast table. This option is
-disabled by default and is known to be slow.
-If the target relation's corresponding toast table or toast index is
-corrupt, reconciling the target table against toast values could
-conceivably crash the server, although in many cases this would
-just produce an error.
+database cluster.
 </para>
 <para>
 The following optional arguments are recognized:
 </para>
 <variablelist>
 <varlistentry>
-<term>on_error_stop</term>
+<term><literal>on_error_stop</literal></term>
 <listitem>
 <para>
-If true, corruption checking stops at the end of the first block on
+If true, corruption checking stops at the end of the first block in
 which any corruptions are found.
 </para>
 <para>
@@ -314,23 +242,29 @@ test=# select * from verify_heapam('mytable', check_toast := true);
 </listitem>
 </varlistentry>
 <varlistentry>
-<term>check_toast</term>
+<term><literal>check_toast</literal></term>
 <listitem>
 <para>
-If true, toasted values are checked gainst the corresponding
+If true, toasted values are checked against the target relation's
 TOAST table.
 </para>
+<para>
+This option is known to be slow. Also, if the toast table or its
+index is corrupt, checking it against toast values could conceivably
+crash the server, although in many cases this would just produce an
+error.
+</para>
 <para>
 Defaults to false.
 </para>
 </listitem>
 </varlistentry>
 <varlistentry>
-<term>skip</term>
+<term><literal>skip</literal></term>
 <listitem>
 <para>
 If not <literal>none</literal>, corruption checking skips blocks that
-are marked as all-visible or all-frozen, as given.
+are marked as all-visible or all-frozen, as specified.
 Valid options are <literal>all-visible</literal>,
 <literal>all-frozen</literal> and <literal>none</literal>.
 </para>
@@ -340,7 +274,7 @@ test=# select * from verify_heapam('mytable', check_toast := true);
 </listitem>
 </varlistentry>
 <varlistentry>
-<term>startblock</term>
+<term><literal>startblock</literal></term>
 <listitem>
 <para>
 If specified, corruption checking begins at the specified block,
@@ -349,12 +283,12 @@ test=# select * from verify_heapam('mytable', check_toast := true);
 target table.
 </para>
 <para>
-By default, does not skip any blocks.
+By default, checking begins at the first block.
 </para>
 </listitem>
 </varlistentry>
 <varlistentry>
-<term>endblock</term>
+<term><literal>endblock</literal></term>
 <listitem>
 <para>
 If specified, corruption checking ends at the specified block,
@@ -363,7 +297,7 @@ test=# select * from verify_heapam('mytable', check_toast := true);
 table.
 </para>
 <para>
-By default, does not skip any blocks.
+By default, all blocks are checked.
 </para>
 </listitem>
 </varlistentry>
@@ -374,23 +308,23 @@ test=# select * from verify_heapam('mytable', check_toast := true);
 </para>
 <variablelist>
 <varlistentry>
-<term>blkno</term>
+<term><literal>blkno</literal></term>
 <listitem>
 <para>
 The number of the block containing the corrupt page.
 </para>
 </listitem>
 </varlistentry>
 <varlistentry>
-<term>offnum</term>
+<term><literal>offnum</literal></term>
 <listitem>
 <para>
 The OffsetNumber of the corrupt tuple.
 </para>
 </listitem>
 </varlistentry>
 <varlistentry>
-<term>attnum</term>
+<term><literal>attnum</literal></term>
 <listitem>
 <para>
 The attribute number of the corrupt column in the tuple, if the
@@ -399,10 +333,10 @@ test=# select * from verify_heapam('mytable', check_toast := true);
 </listitem>
 </varlistentry>
 <varlistentry>
-<term>msg</term>
+<term><literal>msg</literal></term>
 <listitem>
 <para>
-A human readable message describing the corruption in the page.
+A message describing the problem detected.
 </para>
 </listitem>
 </varlistentry>
@@ -460,7 +394,7 @@ test=# select * from verify_heapam('mytable', check_toast := true);
 <filename>amcheck</filename> can be effective at detecting various types of
 failure modes that <link
 linkend="app-initdb-data-checksums"><application>data page
-checksums</application></link> will always fail to catch. These include:
+checksums</application></link> will fail to catch. These include:
 
 <itemizedlist>
 <listitem>
@@ -557,6 +491,45 @@ test=# select * from verify_heapam('mytable', check_toast := true);
 </para>
 </listitem>
 </itemizedlist>
+</para>
+
+<para>
+Structural corruption can happen due to faulty storage hardware, or
+relation files being overwritten or modified by unrelated software.
+This kind of corruption can also be detected with
+<link linkend="app-initdb-data-checksums"><application>data page
+checksums</application></link>.
+</para>
+
+<para>
+Relation pages which are correctly formatted, internally consistent, and
+correct relative to their own internal checksums may still contain
+logical corruption. As such, this kind of corruption cannot be detected
+with <application>checksums</application>. Examples include toasted
+values in the main table which lack a corresponding entry in the toast
+table, and tuples in the main table with a Transaction ID that is older
+than the oldest valid Transaction ID in the database or cluster.
+</para>
+
+<para>
+Multiple causes of logical corruption have been observed in production
+systems, including bugs in the <productname>PostgreSQL</productname>
+server software, faulty and ill-conceived backup and restore tools, and
+user error.
+</para>
+
+<para>
+Corrupt relations are most concerning in live production environments,
+precisely the same environments where high risk activities are least
+welcome. For this reason, <function>verify_heapam</function> has been
+designed to diagnose corruption without undue risk. It cannot guard
+against all causes of backend crashes, as even executing the calling
+query could be unsafe on a badly corrupted system. Access to <link
+linkend="catalogs-overview">catalog tables</link> are performed and could
+be problematic if the catalogs themselves are corrupted.
+</para>
+
+<para>
 In general, <filename>amcheck</filename> can only prove the presence of
 corruption; it cannot prove its absence.
 </para>
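
For orientation, a minimal sketch of how the function documented in this diff might be called, using the corrected signature (skip as text, result as a set of rows). The table name mytable is taken from the example output this commit removed; the argument values are chosen only for illustration, and it is an assumption that the amcheck extension is available to install in the current database.

```sql
-- Assumption: amcheck is available and the caller may create extensions.
CREATE EXTENSION IF NOT EXISTS amcheck;

-- verify_heapam returns setof record, so it is called in the FROM clause.
-- Each returned row describes one detected problem; no rows are returned
-- when nothing is detected.
SELECT blkno, offnum, attnum, msg
FROM verify_heapam('mytable',
                   on_error_stop := false,  -- keep checking after the first bad block
                   check_toast   := true,   -- documented as slow
                   skip          := 'none') -- check all-visible and all-frozen blocks too
ORDER BY blkno, offnum;
```

As the removed text explained, (blkno, offnum) together identify the TID of the affected tuple, and attnum is filled in only when the problem is specific to one column. An empty result does not prove the table is sound, since amcheck can only prove the presence of corruption.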
