
Commit 6b0be33

Update WAL configuration discussion to reflect post-7.1 tweaking.
Minor copy-editing.
1 parent 8394e47 commit 6b0be33

File tree

1 file changed (+80, -39 lines)


doc/src/sgml/wal.sgml

Lines changed: 80 additions & 39 deletions
@@ -1,4 +1,4 @@
-<!-- $Header: /cvsroot/pgsql/doc/src/sgml/wal.sgml,v 1.11 2001/09/29 04:02:19 tgl Exp $ -->
+<!-- $Header: /cvsroot/pgsql/doc/src/sgml/wal.sgml,v 1.12 2001/10/26 23:10:21 tgl Exp $ -->
 
 <chapter id="wal">
 <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
@@ -88,8 +88,11 @@
 transaction identifiers. Once UNDO is implemented,
 <filename>pg_clog</filename> will no longer be required to be
 permanent; it will be possible to remove
-<filename>pg_clog</filename> at shutdown, split it into segments
-and remove old segments.
+<filename>pg_clog</filename> at shutdown. (However, the urgency
+of this concern has decreased greatly with the adoption of a segmented
+storage method for <filename>pg_clog</filename> --- it is no longer
+necessary to keep old <filename>pg_clog</filename> entries around
+forever.)
 </para>
 
 <para>
@@ -116,6 +119,18 @@
 copying the data files (operating system copy commands are not
 suitable).
 </para>
+
+<para>
+A difficulty standing in the way of realizing these benefits is that they
+require saving <acronym>WAL</acronym> entries for considerable periods
+of time (eg, as long as the longest possible transaction if transaction
+UNDO is wanted). The present <acronym>WAL</acronym> format is
+extremely bulky since it includes many disk page snapshots.
+This is not a serious concern at present, since the entries only need
+to be kept for one or two checkpoint intervals; but to achieve
+these future benefits some sort of compressed <acronym>WAL</acronym>
+format will be needed.
+</para>
 </sect2>
 </sect1>
 
@@ -133,8 +148,8 @@
 <para>
 <acronym>WAL</acronym> logs are stored in the directory
 <Filename><replaceable>$PGDATA</replaceable>/pg_xlog</Filename>, as
-a set of segment files, each 16 MB in size. Each segment is
-divided into 8 kB pages. The log record headers are described in
+a set of segment files, each 16MB in size. Each segment is
+divided into 8KB pages. The log record headers are described in
 <filename>access/xlog.h</filename>; record content is dependent on
 the type of event that is being logged. Segment files are given
 ever-increasing numbers as names, starting at
@@ -147,8 +162,8 @@
 The <acronym>WAL</acronym> buffers and control structure are in
 shared memory, and are handled by the backends; they are protected
 by lightweight locks. The demand on shared memory is dependent on the
-number of buffers; the default size of the <acronym>WAL</acronym>
-buffers is 64 kB.
+number of buffers. The default size of the <acronym>WAL</acronym>
+buffers is 8 8KB buffers, or 64KB.
 </para>
 
 <para>
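The buffer sizing in this hunk is simple arithmetic; a minimal Python sketch of it (illustrative only; the function name is made up and is not anything in PostgreSQL):

    # Shared-memory demand of the WAL buffers, per the hunk above:
    # each buffer holds one 8KB log page, and the default is 8 buffers.
    WAL_PAGE_BYTES = 8 * 1024

    def wal_buffer_memory(wal_buffers=8):
        """Return bytes of shared memory consumed by the WAL buffers."""
        return wal_buffers * WAL_PAGE_BYTES

    print(wal_buffer_memory())    # 65536, i.e. the 64KB default
    print(wal_buffer_memory(32))  # 262144 if WAL_BUFFERS were raised to 32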
@@ -166,8 +181,8 @@
 disk drives that falsely report a successful write to the kernel,
 when, in fact, they have only cached the data and not yet stored it
 on the disk. A power failure in such a situation may still lead to
-irrecoverable data corruption; administrators should try to ensure
-that disks holding <productname>PostgreSQL</productname>'s data and
+irrecoverable data corruption. Administrators should try to ensure
+that disks holding <productname>PostgreSQL</productname>'s
 log files do not make such false reports.
 </para>
 
@@ -179,11 +194,12 @@
 checkpoint's position is saved in the file
 <filename>pg_control</filename>. Therefore, when recovery is to be
 done, the backend first reads <filename>pg_control</filename> and
-then the checkpoint record; next it reads the redo record, whose
-position is saved in the checkpoint, and begins the REDO operation.
-Because the entire content of the pages is saved in the log on the
-first page modification after a checkpoint, the pages will be first
-restored to a consistent state.
+then the checkpoint record; then it performs the REDO operation by
+scanning forward from the log position indicated in the checkpoint
+record.
+Because the entire content of data pages is saved in the log on the
+first page modification after a checkpoint, all pages changed since
+the checkpoint will be restored to a consistent state.
 </para>
 
 <para>
@@ -217,9 +233,9 @@
 buffers. This is undesirable because <function>LogInsert</function>
 is used on every database low level modification (for example,
 tuple insertion) at a time when an exclusive lock is held on
-affected data pages and the operation is supposed to be as fast as
-possible; what is worse, writing <acronym>WAL</acronym> buffers may
-also cause the creation of a new log segment, which takes even more
+affected data pages, so the operation needs to be as fast as
+possible. What is worse, writing <acronym>WAL</acronym> buffers may
+also force the creation of a new log segment, which takes even more
 time. Normally, <acronym>WAL</acronym> buffers should be written
 and flushed by a <function>LogFlush</function> request, which is
 made, for the most part, at transaction commit time to ensure that
@@ -230,7 +246,7 @@
 one should increase the number of <acronym>WAL</acronym> buffers by
 modifying the <varname>WAL_BUFFERS</varname> parameter. The default
 number of <acronym>WAL</acronym> buffers is 8. Increasing this
-value will have an impact on shared memory usage.
+value will correspondingly increase shared memory usage.
 </para>
 
 <para>
@@ -243,34 +259,28 @@
 log (known as the redo record) it should start the REDO operation,
 since any changes made to data files before that record are already
 on disk. After a checkpoint has been made, any log segments written
-before the undo records are removed, so checkpoints are used to free
-disk space in the <acronym>WAL</acronym> directory. (When
-<acronym>WAL</acronym>-based <acronym>BAR</acronym> is implemented,
-the log segments can be archived instead of just being removed.)
-The checkpoint maker is also able to create a few log segments for
-future use, so as to avoid the need for
-<function>LogInsert</function> or <function>LogFlush</function> to
-spend time in creating them.
+before the undo records are no longer needed and can be recycled or
+removed. (When <acronym>WAL</acronym>-based <acronym>BAR</acronym> is
+implemented, the log segments would be archived before being recycled
+or removed.)
 </para>
 
 <para>
-The <acronym>WAL</acronym> log is held on the disk as a set of 16
-MB files called <firstterm>segments</firstterm>. By default a new
-segment is created only if more than 75% of the current segment is
-used. One can instruct the server to pre-create up to 64 log segments
+The checkpoint maker is also able to create a few log segments for
+future use, so as to avoid the need for
+<function>LogInsert</function> or <function>LogFlush</function> to
+spend time in creating them. (If that happens, the entire database
+system will be delayed by the creation operation, so it's better if
+the files can be created in the checkpoint maker, which is not on
+anyone's critical path.)
+By default a new 16MB segment file is created only if more than 75% of
+the current segment has been used. This is inadequate if the system
+generates more than 4MB of log output between checkpoints.
+One can instruct the server to pre-create up to 64 log segments
 at checkpoint time by modifying the <varname>WAL_FILES</varname>
 configuration parameter.
 </para>
 
-<para>
-For faster after-crash recovery, it would be better to create
-checkpoints more often. However, one should balance this against
-the cost of flushing dirty data pages; in addition, to ensure data
-page consistency, the first modification of a data page after each
-checkpoint results in logging the entire page content, thus
-increasing output to log and the log's size.
-</para>
-
 <para>
 The postmaster spawns a special backend process every so often
 to create the next checkpoint. A checkpoint is created every
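One way to read the 4MB figure in the new text above: a segment is only made in advance once the current 16MB segment is more than 75% full, leaving roughly a quarter of a segment of pre-created headroom. A rough Python sketch of that arithmetic (the helper and the headroom model are assumptions for illustration, not PostgreSQL code):

    SEGMENT_MB = 16
    DEFAULT_HEADROOM_MB = 0.25 * SEGMENT_MB  # the ~4MB mentioned in the text

    def should_raise_wal_files(log_mb_between_checkpoints, wal_files=0):
        """Rough check: will log output outrun the pre-created segment space?"""
        pre_created_mb = DEFAULT_HEADROOM_MB + wal_files * SEGMENT_MB
        return log_mb_between_checkpoints > pre_created_mb

    print(should_raise_wal_files(3))      # False: the default behavior suffices
    print(should_raise_wal_files(20))     # True: backends would create segments themselves
    print(should_raise_wal_files(20, 2))  # False once WAL_FILES pre-creates 2 segments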
@@ -281,6 +291,35 @@
 <command>CHECKPOINT</command>.
 </para>
 
+<para>
+Reducing <varname>CHECKPOINT_SEGMENTS</varname> and/or
+<varname>CHECKPOINT_TIMEOUT</varname> causes checkpoints to be
+done more often. This allows faster after-crash recovery (since
+less work will need to be redone). However, one must balance this against
+the increased cost of flushing dirty data pages more often. In addition,
+to ensure data page consistency, the first modification of a data page
+after each checkpoint results in logging the entire page content.
+Thus a smaller checkpoint interval increases the volume of output to
+the log, partially negating the goal of using a smaller interval, and
+in any case causing more disk I/O.
+</para>
+
+<para>
+The number of 16MB segment files will always be at least
+<varname>WAL_FILES</varname> + 1, and will normally not exceed
+<varname>WAL_FILES</varname> + 2 * <varname>CHECKPOINT_SEGMENTS</varname>
++ 1. This may be used to estimate space requirements for WAL. Ordinarily,
+when an old log segment file is no longer needed, it is recycled (renamed
+to become the next sequential future segment). If, due to a short-term
+peak of log output rate, there are more than <varname>WAL_FILES</varname> +
+2 * <varname>CHECKPOINT_SEGMENTS</varname> + 1 segment files, then unneeded
+segment files will be deleted instead of recycled until the system gets
+back under this limit. (If this happens on a regular basis,
+<varname>WAL_FILES</varname> should be increased to avoid it. Deleting log
+segments that will only have to be created again later is expensive and
+pointless.)
+</para>
+
 <para>
 The <varname>COMMIT_DELAY</varname> parameter defines for how many
 microseconds the backend will sleep after writing a commit
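The segment-count bounds added in this hunk translate directly into a disk-space estimate for pg_xlog. A minimal Python sketch (illustrative only; the example settings are hypothetical, not defaults taken from the patch):

    SEGMENT_MB = 16

    def pg_xlog_space_mb(wal_files, checkpoint_segments):
        """Return (minimum, normal maximum) pg_xlog size in MB, per the bounds above."""
        min_segments = wal_files + 1
        max_segments = wal_files + 2 * checkpoint_segments + 1
        return min_segments * SEGMENT_MB, max_segments * SEGMENT_MB

    # Example: WAL_FILES = 0, CHECKPOINT_SEGMENTS = 3
    print(pg_xlog_space_mb(0, 3))   # (16, 112): 16MB minimum, about 112MB normal maximum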
@@ -294,6 +333,8 @@
 Note that on most platforms, the resolution of a sleep request is
 ten milliseconds, so that any nonzero <varname>COMMIT_DELAY</varname>
 setting between 1 and 10000 microseconds will have the same effect.
+Good values for these parameters are not yet clear; experimentation
+is encouraged.
 </para>
 
 <para>
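The resolution note in this hunk suggests that a nonzero COMMIT_DELAY is effectively rounded up to the next 10-millisecond tick. A tiny sketch of that reading (the round-up behavior is an assumption for illustration, not taken from the patch):

    import math

    SLEEP_RESOLUTION_US = 10_000  # ten milliseconds, as stated in the text

    def effective_commit_delay_us(commit_delay_us):
        """Model: a nonzero sleep request is rounded up to the next 10ms tick."""
        if commit_delay_us == 0:
            return 0
        return math.ceil(commit_delay_us / SLEEP_RESOLUTION_US) * SLEEP_RESOLUTION_US

    print(effective_commit_delay_us(1))      # 10000: same effect as...
    print(effective_commit_delay_us(10000))  # 10000: ...any setting up to 10000
    print(effective_commit_delay_us(25000))  # 30000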
