297 | 297 | transaction processing. Briefly, <acronym>WAL</acronym>'s central
298 | 298 | concept is that changes to data files (where tables and indexes
299 | 299 | reside) must be written only after those changes have been logged,
300 |     | - that is, after log records describing the changes have been flushed
    | 300 | + that is, after WAL records describing the changes have been flushed
301 | 301 | to permanent storage. If we follow this procedure, we do not need
302 | 302 | to flush data pages to disk on every transaction commit, because we
303 | 303 | know that in the event of a crash we will be able to recover the
304 | 304 | database using the log: any changes that have not been applied to
305 |     | - the data pages can be redone from the log records. (This is
    | 305 | + the data pages can be redone from the WAL records. (This is
306 | 306 | roll-forward recovery, also known as REDO.)
307 | 307 | </para>
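
Since the point of the rule is the gap between what has been inserted into WAL and what has already been flushed to permanent storage, it can help to look at both positions on a live server. A minimal sketch, assuming PostgreSQL 10 or later where both functions exist:

    -- Current WAL insert position vs. the position already flushed to
    -- permanent storage; only the span between the two can be lost in a crash.
    SELECT pg_current_wal_insert_lsn() AS insert_lsn,
           pg_current_wal_flush_lsn()  AS flushed_lsn;
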
308 | 308 |

323 | 323 |
324 | 324 | <para>
325 | 325 | Using <acronym>WAL</acronym> results in a
326 |     | - significantly reduced number of disk writes, because only the log
    | 326 | + significantly reduced number of disk writes, because only the WAL
327 | 327 | file needs to be flushed to disk to guarantee that a transaction is
328 | 328 | committed, rather than every data file changed by the transaction.
329 |     | - The log file is written sequentially,
330 |     | - and so the cost of syncing the log is much less than the cost of
    | 329 | + The WAL file is written sequentially,
    | 330 | + and so the cost of syncing the WAL is much less than the cost of
331 | 331 | flushing the data pages. This is especially true for servers
332 | 332 | handling many small transactions touching different parts of the data
333 | 333 | store. Furthermore, when the server is processing many small concurrent
334 |     | - transactions, one <function>fsync</function> of the log file may
    | 334 | + transactions, one <function>fsync</function> of the WAL file may
335 | 335 | suffice to commit many transactions.
336 | 336 | </para>
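
To check whether WAL syncs really are being shared across many commits on a given workload, the cumulative WAL statistics can be compared with the commit rate. A rough sketch, assuming a release where the pg_stat_wal view exposes these counters (14 or later; the column set varies between versions):

    -- Cumulative WAL activity since the statistics were last reset.
    -- A wal_sync count well below the number of commits suggests that
    -- individual fsync calls are covering many transactions.
    SELECT wal_records, wal_bytes, wal_write, wal_sync
    FROM pg_stat_wal;
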
337 | 337 |

341 | 341 | linkend="continuous-archiving"/>. By archiving the WAL data we can support
342 | 342 | reverting to any time instant covered by the available WAL data:
343 | 343 | we simply install a prior physical backup of the database, and
344 |     | - replay the WAL log just as far as the desired time. What's more,
    | 344 | + replay the WAL just as far as the desired time. What's more,
345 | 345 | the physical backup doesn't have to be an instantaneous snapshot
346 | 346 | of the database state — if it is made over some period of time,
347 |     | - then replaying the WAL log for that period will fix any internal
    | 347 | + then replaying the WAL for that period will fix any internal
348 | 348 | inconsistencies.
349 | 349 | </para>
350 | 350 | </sect1>
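
When continuous archiving is in use, the archiver's progress can be checked from SQL before relying on the archive for point-in-time recovery. A minimal sketch, assuming archiving is configured:

    -- How far the archiver has gotten and whether recent attempts failed.
    SELECT archived_count, last_archived_wal, last_archived_time,
           failed_count, last_failed_wal
    FROM pg_stat_archiver;
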
497 | 497 | that the heap and index data files have been updated with all
498 | 498 | information written before that checkpoint. At checkpoint time, all
499 | 499 | dirty data pages are flushed to disk and a special checkpoint record is
500 |     | - written to the log file. (The change records were previously flushed
    | 500 | + written to the WAL file. (The change records were previously flushed
501 | 501 | to the <acronym>WAL</acronym> files.)
502 | 502 | In the event of a crash, the crash recovery procedure looks at the latest
503 |     | - checkpoint record to determine the point in the log (known as the redo
    | 503 | + checkpoint record to determine the point in the WAL (known as the redo
504 | 504 | record) from which it should start the REDO operation. Any changes made to
505 | 505 | data files before that point are guaranteed to be already on disk.
506 |     | - Hence, after a checkpoint, log segments preceding the one containing
    | 506 | + Hence, after a checkpoint, WAL segments preceding the one containing
507 | 507 | the redo record are no longer needed and can be recycled or removed. (When
508 |     | - <acronym>WAL</acronym> archiving is being done, the log segments must be
    | 508 | + <acronym>WAL</acronym> archiving is being done, the WAL segments must be
509 | 509 | archived before being recycled or removed.)
510 | 510 | </para>
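
Checkpoint behaviour can be observed directly: a checkpoint can be requested by hand and the cumulative counters inspected afterwards. A sketch, assuming a session with sufficient privileges and a release up to version 16 (later releases move these counters into pg_stat_checkpointer):

    -- Request an immediate checkpoint.
    CHECKPOINT;

    -- Checkpoints triggered by timeout vs. requested explicitly or forced
    -- by WAL volume, counted since the last statistics reset.
    SELECT checkpoints_timed, checkpoints_req
    FROM pg_stat_bgwriter;
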
511 | 511 |

544 | 544 | another factor to consider. To ensure data page consistency,
545 | 545 | the first modification of a data page after each checkpoint results in
546 | 546 | logging the entire page content. In that case,
547 |     | - a smaller checkpoint interval increases the volume of output to the WAL log,
    | 547 | + a smaller checkpoint interval increases the volume of output to the WAL,
548 | 548 | partially negating the goal of using a smaller interval,
549 | 549 | and in any case causing more disk I/O.
550 | 550 | </para>
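
The full-page-image overhead described here is visible in the statistics, since full-page images are counted separately from ordinary records. A sketch, assuming PostgreSQL 14 or later for pg_stat_wal:

    -- Full-page images (wal_fpi) vs. ordinary records; shortening the
    -- checkpoint interval tends to push wal_fpi up.
    SELECT wal_records, wal_fpi, pg_size_pretty(wal_bytes) AS wal_volume
    FROM pg_stat_wal;

    -- The settings that bound the checkpoint interval.
    SHOW checkpoint_timeout;
    SHOW max_wal_size;
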
614 | 614 | <para>
615 | 615 | The number of WAL segment files in <filename>pg_wal</filename> directory depends on
616 | 616 | <varname>min_wal_size</varname>, <varname>max_wal_size</varname> and
617 |     | - the amount of WAL generated in previous checkpoint cycles. When old log
    | 617 | + the amount of WAL generated in previous checkpoint cycles. When old WAL
618 | 618 | segment files are no longer needed, they are removed or recycled (that is,
619 | 619 | renamed to become future segments in the numbered sequence). If, due to a
620 |     | - short-term peak of log output rate, <varname>max_wal_size</varname> is
    | 620 | + short-term peak of WAL output rate, <varname>max_wal_size</varname> is
621 | 621 | exceeded, the unneeded segment files will be removed until the system
622 | 622 | gets back under this limit. Below that limit, the system recycles enough
623 | 623 | WAL files to cover the estimated need until the next checkpoint, and

650 | 650 | which are similar to checkpoints in normal operation: the server forces
651 | 651 | all its state to disk, updates the <filename>pg_control</filename> file to
652 | 652 | indicate that the already-processed WAL data need not be scanned again,
653 |     | - and then recycles any old log segment files in the <filename>pg_wal</filename>
    | 653 | + and then recycles any old WAL segment files in the <filename>pg_wal</filename>
654 | 654 | directory.
655 | 655 | Restartpoints can't be performed more frequently than checkpoints on the
656 | 656 | primary because restartpoints can only be performed at checkpoint records.

676 | 676 | insertion) at a time when an exclusive lock is held on affected
677 | 677 | data pages, so the operation needs to be as fast as possible. What
678 | 678 | is worse, writing <acronym>WAL</acronym> buffers might also force the
679 |     | - creation of a new log segment, which takes even more
    | 679 | + creation of a new WAL segment, which takes even more
680 | 680 | time. Normally, <acronym>WAL</acronym> buffers should be written
681 | 681 | and flushed by an <function>XLogFlush</function> request, which is
682 | 682 | made, for the most part, at transaction commit time to ensure that
683 | 683 | transaction records are flushed to permanent storage. On systems
684 |     | - with high log output, <function>XLogFlush</function> requests might
    | 684 | + with high WAL output, <function>XLogFlush</function> requests might
685 | 685 | not occur often enough to prevent <function>XLogInsertRecord</function>
686 | 686 | from having to do writes. On such systems
687 | 687 | one should increase the number of <acronym>WAL</acronym> buffers by

724 | 724 | <varname>commit_delay</varname>, so this value is recommended as the
725 | 725 | starting point to use when optimizing for a particular workload. While
726 | 726 | tuning <varname>commit_delay</varname> is particularly useful when the
727 |     | - WAL log is stored on high-latency rotating disks, benefits can be
    | 727 | + WAL is stored on high-latency rotating disks, benefits can be
728 | 728 | significant even on storage media with very fast sync times, such as
729 | 729 | solid-state drives or RAID arrays with a battery-backed write cache;
730 | 730 | but this should definitely be tested against a representative workload.
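
Both settings can be changed without a server restart, which makes such testing straightforward. A hedged example of how one might experiment with group commit; the values are purely illustrative, not recommendations:

    -- commit_delay is in microseconds; commit_siblings is the minimum
    -- number of concurrently open transactions before the delay applies.
    ALTER SYSTEM SET commit_delay = 1000;
    ALTER SYSTEM SET commit_siblings = 5;
    SELECT pg_reload_conf();

    SHOW commit_delay;
    SHOW commit_siblings;
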
828 | 828 | <para>
829 | 829 | <acronym>WAL</acronym> is automatically enabled; no action is
830 | 830 | required from the administrator except ensuring that the
831 |     | - disk-space requirements for the <acronym>WAL</acronym> logs are met,
    | 831 | + disk-space requirements for the <acronym>WAL</acronym> files are met,
832 | 832 | and that any necessary tuning is done (see <xref
833 | 833 | linkend="wal-configuration"/>).
834 | 834 | </para>
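
The current on-disk footprint of WAL is easy to check from SQL when estimating those disk-space requirements. A sketch, assuming PostgreSQL 10 or later where pg_ls_waldir() is available:

    -- Number of WAL segment files and their total size under pg_wal.
    SELECT count(*) AS segment_files,
           pg_size_pretty(sum(size)) AS total_size
    FROM pg_ls_waldir();
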
835 | 835 |
836 | 836 | <para>
837 | 837 | <acronym>WAL</acronym> records are appended to the <acronym>WAL</acronym>
838 |     | - logs as each new record is written. The insert position is described by
    | 838 | + files as each new record is written. The insert position is described by
839 | 839 | a Log Sequence Number (<acronym>LSN</acronym>) that is a byte offset into
840 |     | - the logs, increasing monotonically with each new record.
    | 840 | + the WAL, increasing monotonically with each new record.
841 | 841 | <acronym>LSN</acronym> values are returned as the datatype
842 | 842 | <link linkend="datatype-pg-lsn"><type>pg_lsn</type></link>. Values can be
843 | 843 | compared to calculate the volume of <acronym>WAL</acronym> data that
846 | 846 | </para>
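
For example, the volume of WAL generated by a piece of work can be measured by capturing an LSN before it runs and subtracting afterwards. A minimal sketch; the literal LSN below is only a placeholder for a value captured earlier:

    -- Record a reference point.
    SELECT pg_current_wal_insert_lsn();   -- suppose this returns 0/16B3A38

    -- Later: bytes of WAL generated since that point.
    SELECT pg_size_pretty(
             pg_wal_lsn_diff(pg_current_wal_insert_lsn(), '0/16B3A38'));
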
847 | 847 |
848 | 848 | <para>
849 |     | - <acronym>WAL</acronym> logs are stored in the directory
    | 849 | + <acronym>WAL</acronym> files are stored in the directory
850 | 850 | <filename>pg_wal</filename> under the data directory, as a set of
851 | 851 | segment files, normally each 16 MB in size (but the size can be changed
852 | 852 | by altering the <option>--wal-segsize</option> <application>initdb</application> option). Each segment is
853 | 853 | divided into pages, normally 8 kB each (this size can be changed via the
854 |     | - <option>--with-wal-blocksize</option> configure option). The log record headers
    | 854 | + <option>--with-wal-blocksize</option> configure option). The WAL record headers
855 | 855 | are described in <filename>access/xlogrecord.h</filename>; the record
856 | 856 | content is dependent on the type of event that is being logged. Segment
857 | 857 | files are given ever-increasing numbers as names, starting at
861 | 861 | </para>
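
An LSN can be mapped to the segment file that holds it, which is often handy when looking at the contents of pg_wal. A sketch, assuming a recent release:

    -- The segment file containing the current insert position.
    SELECT pg_walfile_name(pg_current_wal_insert_lsn());

    -- The segment size this cluster was initialized with
    -- (16 MB unless initdb --wal-segsize was used).
    SHOW wal_segment_size;
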
862 | 862 |
863 | 863 | <para>
864 |     | - It is advantageous if the log is located on a different disk from the
    | 864 | + It is advantageous if the WAL is located on a different disk from the
865 | 865 | main database files. This can be achieved by moving the
866 | 866 | <filename>pg_wal</filename> directory to another location (while the server
867 | 867 | is shut down, of course) and creating a symbolic link from the

877 | 877 | on the disk. A power failure in such a situation might lead to
878 | 878 | irrecoverable data corruption. Administrators should try to ensure
879 | 879 | that disks holding <productname>PostgreSQL</productname>'s
880 |     | - <acronym>WAL</acronym> log files do not make such false reports.
    | 880 | + <acronym>WAL</acronym> files do not make such false reports.
881 | 881 | (See <xref linkend="wal-reliability"/>.)
882 | 882 | </para>
883 | 883 |
884 | 884 | <para>
885 |     | - After a checkpoint has been made and the log flushed, the
    | 885 | + After a checkpoint has been made and the WAL flushed, the
886 | 886 | checkpoint's position is saved in the file
887 | 887 | <filename>pg_control</filename>. Therefore, at the start of recovery,
888 | 888 | the server first reads <filename>pg_control</filename> and
889 | 889 | then the checkpoint record; then it performs the REDO operation by
890 |     | - scanning forward from the log location indicated in the checkpoint
    | 890 | + scanning forward from the WAL location indicated in the checkpoint
891 | 891 | record. Because the entire content of data pages is saved in the
892 |     | - log on the first page modification after a checkpoint (assuming
    | 892 | + WAL on the first page modification after a checkpoint (assuming
893 | 893 | <xref linkend="guc-full-page-writes"/> is not disabled), all pages
894 | 894 | changed since the checkpoint will be restored to a consistent
895 | 895 | state.
896 | 896 | </para>
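
The checkpoint information that recovery reads from pg_control can also be inspected on a running server. A sketch using pg_control_checkpoint(), available in 9.6 and later:

    -- The latest checkpoint's location, the redo point where REDO would
    -- start, and whether full-page writes were in effect.
    SELECT checkpoint_lsn, redo_lsn, redo_wal_file, full_page_writes
    FROM pg_control_checkpoint();
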
897 | 897 |
898 | 898 | <para>
899 | 899 | To deal with the case where <filename>pg_control</filename> is
900 |     | - corrupt, we should support the possibility of scanning existing log
    | 900 | + corrupt, we should support the possibility of scanning existing WAL
901 | 901 | segments in reverse order — newest to oldest — in order to find the
902 | 902 | latest checkpoint. This has not been implemented yet.
903 | 903 | <filename>pg_control</filename> is small enough (less than one disk page)