Commit 0cc59cc

Add current WAL end (as seen by walsender, i.e., GetWriteRecPtr() result) and current server clock time to SR data messages. These are not currently used on the slave side but seem likely to be useful in future, and it'd be better not to change the SR protocol after release. Per discussion.

Also do some minor code review and cleanup on walsender.c, and improve the protocol documentation.
1 parent 572ec5a commit 0cc59cc

5 files changed: +337 -207 lines changed

doc/src/sgml/protocol.sgml

Lines changed: 168 additions & 115 deletions
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/protocol.sgml,v 1.87 2010/04/03 07:22:55 petere Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/protocol.sgml,v 1.88 2010/06/03 22:17:32 tgl Exp $ -->
 
 <chapter id="protocol">
 <title>Frontend/Backend Protocol</title>
@@ -1284,6 +1284,173 @@
 </sect2>
 </sect1>
 
+<sect1 id="protocol-replication">
+<title>Streaming Replication Protocol</title>
+
+<para>
+To initiate streaming replication, the frontend sends the
+<literal>replication</> parameter in the startup message. This tells the
+backend to go into walsender mode, wherein a small set of replication commands
+can be issued instead of SQL statements. Only the simple query protocol can be
+used in walsender mode.
+
+The commands accepted in walsender mode are:
+
+<variablelist>
+<varlistentry>
+<term>IDENTIFY_SYSTEM</term>
+<listitem>
+<para>
+Requests the server to identify itself. Server replies with a result
+set of a single row, containing two fields:
+</para>
+
+<para>
+<variablelist>
+<varlistentry>
+<term>
+systemid
+</term>
+<listitem>
+<para>
+The unique system identifier identifying the cluster. This
+can be used to check that the base backup used to initialize the
+slave came from the same cluster.
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>
+timeline
+</term>
+<listitem>
+<para>
+Current TimelineID. Also useful to check that the slave is
+consistent with the master.
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>START_REPLICATION <replaceable>XXX</>/<replaceable>XXX</></term>
+<listitem>
+<para>
+Instructs server to start streaming WAL, starting at
+WAL position <replaceable>XXX</>/<replaceable>XXX</>.
+The server can reply with an error, e.g. if the requested section of WAL
+has already been recycled. On success, server responds with a
+CopyOutResponse message, and then starts to stream WAL to the frontend.
+WAL will continue to be streamed until the connection is broken;
+no further commands will be accepted.
+</para>
+
+<para>
+WAL data is sent as a series of CopyData messages. (This allows
+other information to be intermixed; in particular the server can send
+an ErrorResponse message if it encounters a failure after beginning
+to stream.) The payload in each CopyData message follows this format:
+</para>
+
+<para>
+<variablelist>
+<varlistentry>
+<term>
+XLogData (B)
+</term>
+<listitem>
+<para>
+<variablelist>
+<varlistentry>
+<term>
+Byte1('w')
+</term>
+<listitem>
+<para>
+Identifies the message as WAL data.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+Byte8
+</term>
+<listitem>
+<para>
+The starting point of the WAL data in this message, given in
+XLogRecPtr format.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+Byte8
+</term>
+<listitem>
+<para>
+The current end of WAL on the server, given in
+XLogRecPtr format.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+Byte8
+</term>
+<listitem>
+<para>
+The server's system clock at the time of transmission,
+given in TimestampTz format.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+Byte<replaceable>n</replaceable>
+</term>
+<listitem>
+<para>
+A section of the WAL data stream.
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+</para>
+<para>
+A single WAL record is never split across two CopyData messages.
+When a WAL record crosses a WAL page boundary, and is therefore
+already split using continuation records, it can be split at the page
+boundary. In other words, the first main WAL record and its
+continuation records can be sent in different CopyData messages.
+</para>
+<para>
+Note that all fields within the WAL data and the above-described header
+will be in the sending server's native format. Endianness, and the
+format for the timestamp, are unpredictable unless the receiver has
+verified that the sender's system identifier matches its own
+<filename>pg_control</> contents.
+</para>
+<para>
+If the WAL sender process is terminated normally (during postmaster
+shutdown), it will send a CommandComplete message before exiting.
+This might not happen during an abnormal shutdown, of course.
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+
+</para>
+
+</sect1>
+
 <sect1 id="protocol-message-types">
 <title>Message Data Types</title>
 
@@ -4137,120 +4304,6 @@ not line breaks.
 
 </sect1>
 
-<sect1 id="protocol-replication">
-<title>Streaming Replication Protocol</title>
-
-<para>
-To initiate streaming replication, the frontend sends the "replication"
-parameter in the startup message. This tells the backend to go into
-walsender mode, where a small set of replication commands can be issued
-instead of SQL statements. Only the simple query protocol can be used in
-walsender mode.
-
-The commands accepted in walsender mode are:
-
-<variablelist>
-<varlistentry>
-<term>IDENTIFY_SYSTEM</term>
-<listitem>
-<para>
-Requests the server to identify itself. Server replies with a result
-set of a single row, and two fields:
-
-systemid: The unique system identifier identifying the cluster. This
-can be used to check that the base backup used to initialize the
-slave came from the same cluster.
-
-timeline: Current TimelineID. Also used to check that the slave is
-consistent with the master.
-</para>
-</listitem>
-</varlistentry>
-
-<varlistentry>
-<term>START_REPLICATION XXX/XXX</term>
-<listitem>
-<para>
-Instructs backend to start streaming WAL, starting at point XXX/XXX.
-Server can reply with an error e.g if the requested piece of WAL has
-already been recycled. On success, server responds with a
-CopyOutResponse message, and backend starts to stream WAL as CopyData
-messages.
-The payload in CopyData message consists of the following format.
-</para>
-
-<para>
-<variablelist>
-<varlistentry>
-<term>
-XLogData (B)
-</term>
-<listitem>
-<para>
-<variablelist>
-<varlistentry>
-<term>
-Byte1('w')
-</term>
-<listitem>
-<para>
-Identifies the message as WAL data.
-</para>
-</listitem>
-</varlistentry>
-<varlistentry>
-<term>
-Int32
-</term>
-<listitem>
-<para>
-The log file number of the LSN, indicating the starting point of
-the WAL in the message.
-</para>
-</listitem>
-</varlistentry>
-<varlistentry>
-<term>
-Int32
-</term>
-<listitem>
-<para>
-The byte offset of the LSN, indicating the starting point of
-the WAL in the message.
-</para>
-</listitem>
-</varlistentry>
-<varlistentry>
-<term>
-Byte<replaceable>n</replaceable>
-</term>
-<listitem>
-<para>
-Data that forms part of WAL data stream.
-</para>
-</listitem>
-</varlistentry>
-</variablelist>
-</para>
-</listitem>
-</varlistentry>
-</variablelist>
-</para>
-<para>
-A single WAL record is never split across two CopyData messages. When
-a WAL record crosses a WAL page boundary, however, and is therefore
-already split using continuation records, it can be split at the page
-boundary. In other words, the first main WAL record and its
-continuation records can be split across different CopyData messages.
-</para>
-</listitem>
-</varlistentry>
-</variablelist>
-
-</para>
-
-</sect1>
-
 <sect1 id="protocol-changes">
 <title>Summary of Changes since Protocol 2.0</title>
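
Reviewer sketch (not part of the commit): the documentation added in this file describes START_REPLICATION as taking a two-part WAL position written XXX/XXX. As a rough illustration, a client might build the command string as below. The helper name `sketch_format_start_replication` is hypothetical, and printing each half in hexadecimal is an assumption based on how PostgreSQL conventionally displays WAL locations; the documentation text itself only shows the XXX/XXX placeholder.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from the commit: writes
 * "START_REPLICATION <xlogid>/<xrecoff>" into buf.
 * Assumes both halves of the WAL position are printed in hex.
 * Returns 0 on success, -1 if the buffer is too small or formatting fails.
 */
int
sketch_format_start_replication(char *buf, size_t buflen,
                                uint32_t xlogid, uint32_t xrecoff)
{
    int n = snprintf(buf, buflen, "START_REPLICATION %X/%X",
                     (unsigned int) xlogid, (unsigned int) xrecoff);

    /* snprintf reports the length it wanted; treat truncation as an error */
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return 0;
}
```

Per the documentation above, the resulting string would be sent with the simple query protocol on a connection that was opened with the `replication` startup parameter; on success the server answers with CopyOutResponse and starts streaming.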

src/backend/replication/walreceiver.c

Lines changed: 9 additions & 8 deletions
@@ -29,7 +29,7 @@
  *
  *
  * IDENTIFICATION
- *    $PostgreSQL: pgsql/src/backend/replication/walreceiver.c,v 1.10 2010/04/20 22:55:03 tgl Exp $
+ *    $PostgreSQL: pgsql/src/backend/replication/walreceiver.c,v 1.11 2010/06/03 22:17:32 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -41,6 +41,7 @@
 #include "access/xlog_internal.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
+#include "replication/walprotocol.h"
 #include "replication/walreceiver.h"
 #include "storage/ipc.h"
 #include "storage/pmsignal.h"
@@ -393,18 +394,18 @@ XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len)
     {
         case 'w':               /* WAL records */
             {
-                XLogRecPtr recptr;
+                WalDataMessageHeader msghdr;
 
-                if (len < sizeof(XLogRecPtr))
+                if (len < sizeof(WalDataMessageHeader))
                     ereport(ERROR,
                             (errcode(ERRCODE_PROTOCOL_VIOLATION),
                              errmsg_internal("invalid WAL message received from primary")));
+                /* memcpy is required here for alignment reasons */
+                memcpy(&msghdr, buf, sizeof(WalDataMessageHeader));
+                buf += sizeof(WalDataMessageHeader);
+                len -= sizeof(WalDataMessageHeader);
 
-                memcpy(&recptr, buf, sizeof(XLogRecPtr));
-                buf += sizeof(XLogRecPtr);
-                len -= sizeof(XLogRecPtr);
-
-                XLogWalRcvWrite(buf, len, recptr);
+                XLogWalRcvWrite(buf, len, msghdr.dataStart);
                 break;
             }
         default:
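
Reviewer sketch (not part of the commit): the hunk above swaps the bare XLogRecPtr prefix for a WalDataMessageHeader that is copied out of the message buffer with memcpy. The standalone code below illustrates the same length-check-then-copy pattern. All type and function names are hypothetical stand-ins. Only the `dataStart` field appears in this diff; the `walEnd` and `sendTime` names, the two-part record pointer, and the 64-bit integer timestamp are assumptions that follow the three Byte8 fields described in protocol.sgml, because walprotocol.h is not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for XLogRecPtr: log file number plus byte offset. */
typedef struct
{
    uint32_t xlogid;
    uint32_t xrecoff;
} SketchRecPtr;

/*
 * Hypothetical stand-in for WalDataMessageHeader. Only dataStart is
 * confirmed by the diff; the other field names and the integer
 * timestamp representation are assumptions.
 */
typedef struct
{
    SketchRecPtr dataStart;     /* start of the WAL data in this message */
    SketchRecPtr walEnd;        /* current end of WAL on the server */
    int64_t      sendTime;      /* server clock at time of transmission */
} SketchWalDataHeader;

/*
 * Split the body of a 'w' message into its header and the WAL bytes.
 * Returns 0 on success, -1 for an unknown type or a too-short message.
 */
int
sketch_split_wal_message(unsigned char type, const char *buf, size_t len,
                         SketchWalDataHeader *hdr,
                         const char **wal, size_t *wal_len)
{
    if (type != 'w' || len < sizeof(SketchWalDataHeader))
        return -1;

    /* buf may not be aligned for the struct, so copy instead of casting */
    memcpy(hdr, buf, sizeof(SketchWalDataHeader));
    *wal = buf + sizeof(SketchWalDataHeader);
    *wal_len = len - sizeof(SketchWalDataHeader);
    return 0;
}
```

Because the header is sent in the server's native format, a copy like this is only meaningful once the receiver has confirmed that the sender's system identifier matches its own, as the documentation change in this commit points out.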
