Commit d9bae53

Implement streaming xlog for backup tools
Add option for parallel streaming of the transaction log while a base backup is running, to get the logfiles before the server has removed them. Also add a tool called pg_receivexlog, which streams the transaction log into files, creating a log archive without having to wait for segments to complete, thus decreasing the window of data loss without having to waste space using archive_timeout. This works best in combination with archive_command - suggested usage docs etc coming later.
1 parent 2b64f3f commit d9bae53

13 files changed

+1805
-165
lines changed

doc/src/sgml/ref/allfiles.sgml

+1
Original file line number | Diff line number | Diff line change
@@ -172,6 +172,7 @@ Complete list of usable sgml source files in this directory.
172172
<!ENTITY pgCtl SYSTEM "pg_ctl-ref.sgml">
173173
<!ENTITY pgDump SYSTEM "pg_dump.sgml">
174174
<!ENTITY pgDumpall SYSTEM "pg_dumpall.sgml">
175+
<!ENTITY pgReceivexlog SYSTEM "pg_receivexlog.sgml">
175176
<!ENTITY pgResetxlog SYSTEM "pg_resetxlog.sgml">
176177
<!ENTITY pgRestore SYSTEM "pg_restore.sgml">
177178
<!ENTITY postgres SYSTEM "postgres-ref.sgml">

doc/src/sgml/ref/pg_basebackup.sgml

+53-12
@@ -143,8 +143,8 @@ PostgreSQL documentation
143143
</varlistentry>
144144

145145
<varlistentry>
146-
<term><option>-x</option></term>
147-
<term><option>--xlog</option></term>
146+
<term><option>-x <replaceable class="parameter">method</replaceable></option></term>
147+
<term><option>--xlog=<replaceable class="parameter">method</replaceable></option></term>
148148
<listitem>
149149
<para>
150150
Includes the required transaction log files (WAL files) in the
@@ -154,16 +154,43 @@ PostgreSQL documentation
154154
to consult the log archive, thus making this a completely standalone
155155
backup.
156156
</para>
157-
<note>
158-
<para>
159-
The transaction log files are collected at the end of the backup.
160-
Therefore, it is necessary for the
161-
<xref linkend="guc-wal-keep-segments"> parameter to be set high
162-
enough that the log is not removed before the end of the backup.
163-
If the log has been rotated when it's time to transfer it, the
164-
backup will fail and be unusable.
165-
</para>
166-
</note>
157+
<para>
158+
The following methods for collecting the transaction logs are
159+
supported:
160+
161+
<variablelist>
162+
<varlistentry>
163+
<term><literal>f</literal></term>
164+
<term><literal>fetch</literal></term>
165+
<listitem>
166+
<para>
167+
The transaction log files are collected at the end of the backup.
168+
Therefore, it is necessary for the
169+
<xref linkend="guc-wal-keep-segments"> parameter to be set high
170+
enough that the log is not removed before the end of the backup.
171+
If the log has been rotated when it's time to transfer it, the
172+
backup will fail and be unusable.
173+
</para>
174+
</listitem>
175+
</varlistentry>
176+
177+
<varlistentry>
178+
<term><literal>s</literal></term>
179+
<term><literal>stream</literal></term>
180+
<listitem>
181+
<para>
182+
Stream the transaction log while the backup is created. This will
183+
open a second connection to the server and start streaming the
184+
transaction log in parallel while running the backup. Therefore,
185+
it will use up two slots configured by the
186+
<xref linkend="guc-max-wal-senders"> parameter. As long as the
187+
client can keep up with transaction log received, using this mode
188+
requires no extra transaction logs to be saved on the master.
189+
</para>
190+
</listitem>
191+
</varlistentry>
192+
</variablelist>
193+
</para>
167194
</listitem>
168195
</varlistentry>
169196

@@ -260,6 +287,20 @@ PostgreSQL documentation
260287
The following command-line options control the database connection parameters.
261288

262289
<variablelist>
290+
<varlistentry>
291+
<term><option>-s <replaceable class="parameter">interval</replaceable></option></term>
292+
<term><option>--statusint=<replaceable class="parameter">interval</replaceable></option></term>
293+
<listitem>
294+
<para>
295+
Specifies the number of seconds between status packets sent back to the
296+
server. This is required when streaming the transaction log (using
297+
<literal>--xlog=stream</literal>) if replication timeout is configured
298+
on the server, and allows for easier monitoring. The default value is
299+
10 seconds.
300+
</para>
301+
</listitem>
302+
</varlistentry>
303+
263304
<varlistentry>
264305
<term><option>-h <replaceable class="parameter">host</replaceable></option></term>
265306
<term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
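
The two collection methods and the status interval documented in this hunk can be combined on one command line. The following is a minimal sketch, not part of the commit: the helper names basebackup_argv and wal_senders_needed are hypothetical, and --pgdata is assumed from the existing pg_basebackup options since it does not appear in this diff. The slot count for fetch (one) is inferred from the statement that stream mode opens a second connection and uses two slots.

```python
from typing import List, Optional

# Spellings accepted by -x/--xlog according to the patched documentation.
XLOG_METHODS = {"f": "fetch", "fetch": "fetch", "s": "stream", "stream": "stream"}


def _normalize(method: str) -> str:
    """Map a short or long method spelling to its long form."""
    if method not in XLOG_METHODS:
        raise ValueError("unknown xlog method: %r" % (method,))
    return XLOG_METHODS[method]


def wal_senders_needed(method: str) -> int:
    """Slots taken from max_wal_senders: stream mode opens a second connection."""
    return 2 if _normalize(method) == "stream" else 1


def basebackup_argv(target_dir: str, method: str,
                    host: Optional[str] = None,
                    status_interval: Optional[int] = None) -> List[str]:
    """Hypothetical helper: assemble a pg_basebackup command line."""
    # --pgdata is assumed from the existing tool; it is not shown in this hunk.
    argv = ["pg_basebackup", "--pgdata=" + target_dir,
            "--xlog=" + _normalize(method)]
    if host is not None:
        argv.append("--host=" + host)
    if status_interval is not None:
        # Seconds between status packets sent back to the server.
        argv.append("--statusint=%d" % status_interval)
    return argv
```

The returned list could be handed to subprocess.run; checking wal_senders_needed against the server's max_wal_senders setting beforehand is one way to catch a stream-mode backup that would run out of slots.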

doc/src/sgml/ref/pg_receivexlog.sgml

+270
@@ -0,0 +1,270 @@
1+
<!--
2+
doc/src/sgml/ref/pg_receivexlog.sgml
3+
PostgreSQL documentation
4+
-->
5+
6+
<refentry id="app-pgreceivexlog">
7+
<refmeta>
8+
<refentrytitle>pg_receivexlog</refentrytitle>
9+
<manvolnum>1</manvolnum>
10+
<refmiscinfo>Application</refmiscinfo>
11+
</refmeta>
12+
13+
<refnamediv>
14+
<refname>pg_receivexlog</refname>
15+
<refpurpose>streams transaction logs from a <productname>PostgreSQL</productname> cluster</refpurpose>
16+
</refnamediv>
17+
18+
<indexterm zone="app-pgreceivexlog">
19+
<primary>pg_receivexlog</primary>
20+
</indexterm>
21+
22+
<refsynopsisdiv>
23+
<cmdsynopsis>
24+
<command>pg_receivexlog</command>
25+
<arg rep="repeat"><replaceable>option</></arg>
26+
</cmdsynopsis>
27+
</refsynopsisdiv>
28+
29+
<refsect1>
30+
<title>
31+
Description
32+
</title>
33+
<para>
34+
<application>pg_receivexlog</application> is used to stream transaction log
35+
from a running <productname>PostgreSQL</productname> cluster. The transaction
36+
log is streamed using the streaming replication protocol, and is written
37+
to a local directory of files. This directory can be used as the archive
38+
location for doing a restore using point-in-time recovery (see
39+
<xref linkend="continuous-archiving">).
40+
</para>
41+
42+
<para>
43+
<application>pg_receivexlog</application> streams the transaction
44+
log in real time as it's being generated on the server, and does not wait
45+
for segments to complete like <xref linkend="guc-archive-command"> does.
46+
For this reason, it is not necessary to set
47+
<xref linkend="guc-archive-timeout"> when using
48+
<application>pg_receivexlog</application>.
49+
</para>
50+
51+
<para>
52+
The transaction log is streamed over a regular
53+
<productname>PostgreSQL</productname> connection, and uses the
54+
replication protocol. The connection must be
55+
made with a user having <literal>REPLICATION</literal> permissions (see
56+
<xref linkend="role-attributes">), and the user must be granted explicit
57+
permissions in <filename>pg_hba.conf</filename>. The server must also
58+
be configured with <xref linkend="guc-max-wal-senders"> set high enough
59+
to leave at least one session available for the stream.
60+
</para>
61+
</refsect1>
62+
63+
<refsect1>
64+
<title>Options</title>
65+
66+
<para>
67+
The following command-line options control the location and format of the
68+
output.
69+
70+
<variablelist>
71+
<varlistentry>
72+
<term><option>-D <replaceable class="parameter">directory</replaceable></option></term>
73+
<term><option>--dir=<replaceable class="parameter">directory</replaceable></option></term>
74+
<listitem>
75+
<para>
76+
Directory to write the output to.
77+
</para>
78+
<para>
79+
This parameter is required.
80+
</para>
81+
</listitem>
82+
</varlistentry>
83+
</variablelist>
84+
</para>
85+
<para>
86+
The following command-line options control the running of the program.
87+
88+
<variablelist>
89+
<varlistentry>
90+
<term><option>-v</option></term>
91+
<term><option>--verbose</option></term>
92+
<listitem>
93+
<para>
94+
Enables verbose mode.
95+
</para>
96+
</listitem>
97+
</varlistentry>
98+
99+
</variablelist>
100+
</para>
101+
102+
<para>
103+
The following command-line options control the database connection parameters.
104+
105+
<variablelist>
106+
<varlistentry>
107+
<term><option>-s <replaceable class="parameter">interval</replaceable></option></term>
108+
<term><option>--statusint=<replaceable class="parameter">interval</replaceable></option></term>
109+
<listitem>
110+
<para>
111+
Specifies the number of seconds between status packets sent back to the
112+
server. This is required if replication timeout is configured on the
113+
server, and allows for easier monitoring. The default value is
114+
10 seconds.
115+
</para>
116+
</listitem>
117+
</varlistentry>
118+
119+
<varlistentry>
120+
<term><option>-h <replaceable class="parameter">host</replaceable></option></term>
121+
<term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
122+
<listitem>
123+
<para>
124+
Specifies the host name of the machine on which the server is
125+
running. If the value begins with a slash, it is used as the
126+
directory for the Unix domain socket. The default is taken
127+
from the <envar>PGHOST</envar> environment variable, if set,
128+
else a Unix domain socket connection is attempted.
129+
</para>
130+
</listitem>
131+
</varlistentry>
132+
133+
<varlistentry>
134+
<term><option>-p <replaceable class="parameter">port</replaceable></option></term>
135+
<term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
136+
<listitem>
137+
<para>
138+
Specifies the TCP port or local Unix domain socket file
139+
extension on which the server is listening for connections.
140+
Defaults to the <envar>PGPORT</envar> environment variable, if
141+
set, or a compiled-in default.
142+
</para>
143+
</listitem>
144+
</varlistentry>
145+
146+
<varlistentry>
147+
<term><option>-U <replaceable>username</replaceable></option></term>
148+
<term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
149+
<listitem>
150+
<para>
151+
User name to connect as.
152+
</para>
153+
</listitem>
154+
</varlistentry>
155+
156+
<varlistentry>
157+
<term><option>-w</></term>
158+
<term><option>--no-password</></term>
159+
<listitem>
160+
<para>
161+
Never issue a password prompt. If the server requires
162+
password authentication and a password is not available by
163+
other means such as a <filename>.pgpass</filename> file, the
164+
connection attempt will fail. This option can be useful in
165+
batch jobs and scripts where no user is present to enter a
166+
password.
167+
</para>
168+
</listitem>
169+
</varlistentry>
170+
171+
<varlistentry>
172+
<term><option>-W</option></term>
173+
<term><option>--password</option></term>
174+
<listitem>
175+
<para>
176+
Force <application>pg_receivexlog</application> to prompt for a
177+
password before connecting to a database.
178+
</para>
179+
180+
<para>
181+
This option is never essential, since
182+
<application>pg_receivexlog</application> will automatically prompt
183+
for a password if the server demands password authentication.
184+
However, <application>pg_receivexlog</application> will waste a
185+
connection attempt finding out that the server wants a password.
186+
In some cases it is worth typing <option>-W</> to avoid the extra
187+
connection attempt.
188+
</para>
189+
</listitem>
190+
</varlistentry>
191+
</variablelist>
192+
</para>
193+
194+
<para>
195+
Other, less commonly used, parameters are also available:
196+
197+
<variablelist>
198+
<varlistentry>
199+
<term><option>-V</></term>
200+
<term><option>--version</></term>
201+
<listitem>
202+
<para>
203+
Print the <application>pg_receivexlog</application> version and exit.
204+
</para>
205+
</listitem>
206+
</varlistentry>
207+
208+
<varlistentry>
209+
<term><option>-?</></term>
210+
<term><option>--help</></term>
211+
<listitem>
212+
<para>
213+
Show help about <application>pg_receivexlog</application> command line
214+
arguments, and exit.
215+
</para>
216+
</listitem>
217+
</varlistentry>
218+
219+
</variablelist>
220+
</para>
221+
222+
</refsect1>
223+
224+
<refsect1>
225+
<title>Environment</title>
226+
227+
<para>
228+
This utility, like most other <productname>PostgreSQL</> utilities,
229+
uses the environment variables supported by <application>libpq</>
230+
(see <xref linkend="libpq-envars">).
231+
</para>
232+
233+
</refsect1>
234+
235+
<refsect1>
236+
<title>Notes</title>
237+
238+
<para>
239+
When using <application>pg_receivexlog</application> instead of
240+
<xref linkend="guc-archive-command">, the server will continue to
241+
recycle transaction log files even if the logs are not properly
242+
archived, since there is no command that fails. This can be worked
243+
around by having an <xref linkend="guc-archive-command"> that fails
244+
when the file has not been properly archived yet.
245+
</para>
246+
247+
</refsect1>
248+
249+
<refsect1>
250+
<title>Examples</title>
251+
252+
<para>
253+
To stream the transaction log from the server at
254+
<literal>mydbserver</literal> and store it in the local directory
255+
<filename>/usr/local/pgsql/archive</filename>:
256+
<screen>
257+
<prompt>$</prompt> <userinput>pg_receivexlog -h mydbserver -D /usr/local/pgsql/archive</userinput>
258+
</screen>
259+
</para>
260+
</refsect1>
261+
262+
<refsect1>
263+
<title>See Also</title>
264+
265+
<simplelist type="inline">
266+
<member><xref linkend="APP-PGBASEBACKUP"></member>
267+
</simplelist>
268+
</refsect1>
269+
270+
</refentry>
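
The Notes section above suggests an archive_command that fails until a file has been properly archived. The following is a rough sketch of one such check, not something this commit provides. It rests on an assumption: a segment in the pg_receivexlog directory is treated as complete once a later segment on the same timeline exists there, on the reasoning that the tool only starts a new segment after finishing the previous one. The function names are hypothetical, and non-segment files (for example timeline history files) are not handled.

```python
import os
import re

# WAL segment file names are 24 upper-case hex digits; the first 8 are the timeline.
SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def segment_is_archived(archive_dir: str, segment_name: str) -> bool:
    """Return True if segment_name looks fully received in archive_dir.

    Assumption (not from the commit): a segment counts as complete once a
    later segment with the same timeline prefix is present in the directory.
    """
    if not SEGMENT_RE.match(segment_name):
        return False  # history/backup files are outside this sketch
    try:
        names = os.listdir(archive_dir)
    except OSError:
        return False
    segments = {n for n in names if SEGMENT_RE.match(n)}
    if segment_name not in segments:
        return False
    timeline = segment_name[:8]
    # Fixed-width hex names sort lexicographically in segment order.
    return any(n[:8] == timeline and n > segment_name for n in segments)


def archive_exit_code(archive_dir: str, segment_name: str) -> int:
    """Exit status for an archive_command wrapper: 0 lets the server recycle."""
    return 0 if segment_is_archived(archive_dir, segment_name) else 1
```

A wrapper script would pass the archive directory and the file name the server supplies to archive_command, then exit with the value of archive_exit_code. Because the newest segment never has a successor, this check always lags by one segment, which is the conservative direction for a guard of this kind.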

doc/src/sgml/reference.sgml

+1
@@ -220,6 +220,7 @@
220220
&pgConfig;
221221
&pgDump;
222222
&pgDumpall;
223+
&pgReceivexlog;
223224
&pgRestore;
224225
&psqlRef;
225226
&reindexdb;

src/bin/pg_basebackup/.gitignore

+1
@@ -1 +1,2 @@
11
/pg_basebackup
2+
/pg_receivexlog
