- <!-- $PostgreSQL: pgsql/doc/src/sgml/high-availability.sgml,v 1.18 2007/11/08 19:16:30 momjian Exp $ -->
+ <!-- $PostgreSQL: pgsql/doc/src/sgml/high-availability.sgml,v 1.19 2007/11/08 19:18:23 momjian Exp $ -->
<chapter id="high-availability">
 <title>High Availability, Load Balancing, and Replication</title>
<variablelist>
- <varlistentry>
- <term>Shared Disk Failover</term>
- <listitem>
-
- <para>
- Shared disk failover avoids synchronization overhead by having only one
- copy of the database. It uses a single disk array that is shared by
- multiple servers. If the main database server fails, the standby server
- is able to mount and start the database as though it was recovering from
- a database crash. This allows rapid failover with no data loss.
- </para>
-
- <para>
- Shared hardware functionality is common in network storage devices.
- Using a network file system is also possible, though care must be
- taken that the file system has full POSIX behavior (see <xref
- linkend="creating-cluster-nfs">). One significant limitation of this
- method is that if the shared disk array fails or becomes corrupt, the
- primary and standby servers are both nonfunctional. Another issue is
- that the standby server should never access the shared storage while
- the primary server is running.
- </para>
-
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>File System Replication</term>
- <listitem>
-
- <para>
- A modified version of shared hardware functionality is file system
- replication, where all changes to a file system are mirrored to a file
- system residing on another computer. The only restriction is that
- the mirroring must be done in a way that ensures the standby server
- has a consistent copy of the file system — specifically, writes
- to the standby must be done in the same order as those on the master.
- DRBD is a popular file system replication solution for Linux.
- </para>
+ <varlistentry>
+ <term>Shared Disk Failover</term>
+ <listitem>
+
+ <para>
+ Shared disk failover avoids synchronization overhead by having only one
+ copy of the database. It uses a single disk array that is shared by
+ multiple servers. If the main database server fails, the standby server
+ is able to mount and start the database as though it was recovering from
+ a database crash. This allows rapid failover with no data loss.
+ </para>
+
+ <para>
+ Shared hardware functionality is common in network storage devices.
+ Using a network file system is also possible, though care must be
+ taken that the file system has full POSIX behavior (see <xref
+ linkend="creating-cluster-nfs">). One significant limitation of this
+ method is that if the shared disk array fails or becomes corrupt, the
+ primary and standby servers are both nonfunctional. Another issue is
+ that the standby server should never access the shared storage while
+ the primary server is running.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>File System Replication</term>
+ <listitem>
+
+ <para>
+ A modified version of shared hardware functionality is file system
+ replication, where all changes to a file system are mirrored to a file
+ system residing on another computer. The only restriction is that
+ the mirroring must be done in a way that ensures the standby server
+ has a consistent copy of the file system — specifically, writes
+ to the standby must be done in the same order as those on the master.
+ DRBD is a popular file system replication solution for Linux.
+ </para>
<!--
https://forge.continuent.org/pipermail/sequoia/2006-November/004070.html
@@ -128,150 +128,150 @@ only committed once to disk and there is a distributed locking
protocol to make nodes agree on a serializable transactional order.
-->
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Warm Standby Using Point-In-Time Recovery (<acronym>PITR</>)</term>
- <listitem>
-
- <para>
- A warm standby server (see <xref linkend="warm-standby">) can
- be kept current by reading a stream of write-ahead log (WAL)
- records. If the main server fails, the warm standby contains
- almost all of the data of the main server, and can be quickly
- made the new master database server. This is asynchronous and
- can only be done for the entire database server.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Master-Slave Replication</term>
- <listitem>
-
- <para>
- A master-slave replication setup sends all data modification
- queries to the master server. The master server asynchronously
- sends data changes to the slave server. The slave can answer
- read-only queries while the master server is running. The
- slave server is ideal for data warehouse queries.
- </para>
-
- <para>
- Slony-I is an example of this type of replication, with per-table
- granularity, and support for multiple slaves. Because it
- updates the slave server asynchronously (in batches), there is
- possible data loss during fail over.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Statement-Based Replication Middleware</term>
- <listitem>
-
- <para>
- With statement-based replication middleware, a program intercepts
- every SQL query and sends it to one or all servers. Each server
- operates independently. Read-write queries are sent to all servers,
- while read-only queries can be sent to just one server, allowing
- the read workload to be distributed.
- </para>
-
- <para>
- If queries are simply broadcast unmodified, functions like
- <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
- sequences would have different values on different servers.
- This is because each server operates independently, and because
- SQL queries are broadcast (and not actual modified rows). If
- this is unacceptable, either the middleware or the application
- must query such values from a single server and then use those
- values in write queries. Also, care must be taken that all
- transactions either commit or abort on all servers, perhaps
- using two-phase commit (<xref linkend="sql-prepare-transaction"
- endterm="sql-prepare-transaction-title"> and <xref
- linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">.
- Pgpool and Sequoia are an example of this type of replication.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Asynchronous Multi-Master Replication</term>
- <listitem>
-
- <para>
- For servers that are not regularly connected, like laptops or
- remote servers, keeping data consistent among servers is a
- challenge. Using asynchronous multi-master replication, each
- server works independently, and periodically communicates with
- the other servers to identify conflicting transactions. The
- conflicts can be resolved by users or conflict resolution rules.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Synchronous Multi-Master Replication</term>
- <listitem>
-
- <para>
- In synchronous multi-master replication, each server can accept
- write requests, and modified data is transmitted from the
- original server to every other server before each transaction
- commits. Heavy write activity can cause excessive locking,
- leading to poor performance. In fact, write performance is
- often worse than that of a single server. Read requests can
- be sent to any server. Some implementations use shared disk
- to reduce the communication overhead. Synchronous multi-master
- replication is best for mostly read workloads, though its big
- advantage is that any server can accept write requests —
- there is no need to partition workloads between master and
- slave servers, and because the data changes are sent from one
- server to another, there is no problem with non-deterministic
- functions like <function>random()</>.
- </para>
-
- <para>
- <productname>PostgreSQL</> does not offer this type of replication,
- though <productname>PostgreSQL</> two-phase commit (<xref
- linkend="sql-prepare-transaction"
- endterm="sql-prepare-transaction-title"> and <xref
- linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">)
- can be used to implement this in application code or middleware.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Data Partitioning</term>
- <listitem>
-
- <para>
- Data partitioning splits tables into data sets. Each set can
- be modified by only one server. For example, data can be
- partitioned by offices, e.g. London and Paris, with a server
- in each office. If queries combining London and Paris data
- are necessary, an application can query both servers, or
- master/slave replication can be used to keep a read-only copy
- of the other office's data on each server.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Commercial Solutions</term>
- <listitem>
-
- <para>
- Because <productname>PostgreSQL</> is open source and easily
- extended, a number of companies have taken <productname>PostgreSQL</>
- and created commercial closed-source solutions with unique
- failover, replication, and load balancing capabilities.
- </para>
- </listitem>
- </varlistentry>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Warm Standby Using Point-In-Time Recovery (<acronym>PITR</>)</term>
+ <listitem>
+
+ <para>
+ A warm standby server (see <xref linkend="warm-standby">) can
+ be kept current by reading a stream of write-ahead log (WAL)
+ records. If the main server fails, the warm standby contains
+ almost all of the data of the main server, and can be quickly
+ made the new master database server. This is asynchronous and
+ can only be done for the entire database server.
+ </para>
+ </listitem>
+ </varlistentry>
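The WAL shipping described in the entry above is configured through the primary's archive_command and a restore script on the standby, but the base backup that seeds the standby can be taken with plain SQL. The following is only a minimal sketch of that one step; it assumes WAL archiving is already enabled, and the backup label and the copy mechanism are placeholders.

<programlisting>
-- Sketch: seed a warm standby while the primary stays online
-- (assumes archive_command is already shipping WAL; the label is illustrative).
SELECT pg_start_backup('standby base backup');
-- ... copy the data directory to the standby host with an external tool ...
SELECT pg_stop_backup();
</programlisting>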
+
+ <varlistentry>
+ <term>Master-Slave Replication</term>
+ <listitem>
+
+ <para>
+ A master-slave replication setup sends all data modification
+ queries to the master server. The master server asynchronously
+ sends data changes to the slave server. The slave can answer
+ read-only queries while the master server is running. The
+ slave server is ideal for data warehouse queries.
+ </para>
+
+ <para>
+ Slony-I is an example of this type of replication, with per-table
+ granularity, and support for multiple slaves. Because it
+ updates the slave server asynchronously (in batches), there is
+ possible data loss during fail over.
+ possible data loss during failover.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Statement-Based Replication Middleware</term>
+ <listitem>
+
+ <para>
+ With statement-based replication middleware, a program intercepts
+ every SQL query and sends it to one or all servers. Each server
+ operates independently. Read-write queries are sent to all servers,
+ while read-only queries can be sent to just one server, allowing
+ the read workload to be distributed.
+ </para>
+
+ <para>
+ If queries are simply broadcast unmodified, functions like
+ <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
+ sequences would have different values on different servers.
+ This is because each server operates independently, and because
+ SQL queries are broadcast (and not actual modified rows). If
+ this is unacceptable, either the middleware or the application
+ must query such values from a single server and then use those
+ values in write queries. Also, care must be taken that all
+ transactions either commit or abort on all servers, perhaps
+ using two-phase commit (<xref linkend="sql-prepare-transaction"
+ endterm="sql-prepare-transaction-title"> and <xref
+ linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">).
+ Pgpool and Sequoia are examples of this type of replication.
+ </para>
+ </listitem>
+ </varlistentry>
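To make the workaround described in the entry above concrete, here is a hypothetical sketch; the orders table and its sequence are invented names. The middleware or application first reads the non-deterministic values from one designated server, then broadcasts a statement containing only literal values.

<programlisting>
-- Step 1: fetch the values from a single, designated server
-- (orders and orders_id_seq are hypothetical names).
SELECT nextval('orders_id_seq') AS new_id, CURRENT_TIMESTAMP AS created_at;

-- Step 2: broadcast a write containing only literals, so every
-- server stores an identical row.
INSERT INTO orders (id, created_at) VALUES (1001, '2007-11-08 19:18:23');
</programlisting>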
+
+ <varlistentry>
+ <term>Asynchronous Multi-Master Replication</term>
+ <listitem>
+
+ <para>
+ For servers that are not regularly connected, like laptops or
+ remote servers, keeping data consistent among servers is a
+ challenge. Using asynchronous multi-master replication, each
+ server works independently, and periodically communicates with
+ the other servers to identify conflicting transactions. The
+ conflicts can be resolved by users or conflict resolution rules.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Synchronous Multi-Master Replication</term>
+ <listitem>
+
+ <para>
+ In synchronous multi-master replication, each server can accept
+ write requests, and modified data is transmitted from the
+ original server to every other server before each transaction
+ commits. Heavy write activity can cause excessive locking,
+ leading to poor performance. In fact, write performance is
+ often worse than that of a single server. Read requests can
+ be sent to any server. Some implementations use shared disk
+ to reduce the communication overhead. Synchronous multi-master
+ replication is best for mostly read workloads, though its big
+ advantage is that any server can accept write requests —
+ there is no need to partition workloads between master and
+ slave servers, and because the data changes are sent from one
+ server to another, there is no problem with non-deterministic
+ functions like <function>random()</>.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> does not offer this type of replication,
+ though <productname>PostgreSQL</> two-phase commit (<xref
+ linkend="sql-prepare-transaction"
+ endterm="sql-prepare-transaction-title"> and <xref
+ linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">)
+ can be used to implement this in application code or middleware.
+ </para>
+ </listitem>
+ </varlistentry>
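A minimal sketch of the two-phase commit commands referenced in the entry above, as middleware might drive them on each node; the accounts table and the transaction identifier are made up for illustration.

<programlisting>
-- Requires max_prepared_transactions > 0 on every node.
-- On each node, do the work and prepare it instead of committing:
BEGIN;
UPDATE accounts SET balance = balance - 100.00 WHERE id = 42;
PREPARE TRANSACTION 'xfer-42';

-- Once every node has prepared successfully, commit everywhere:
COMMIT PREPARED 'xfer-42';
-- If any node failed to prepare, abort the others instead:
-- ROLLBACK PREPARED 'xfer-42';
</programlisting>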
+
+ <varlistentry>
+ <term>Data Partitioning</term>
+ <listitem>
+
+ <para>
+ Data partitioning splits tables into data sets. Each set can
+ be modified by only one server. For example, data can be
+ partitioned by offices, e.g. London and Paris, with a server
+ in each office. If queries combining London and Paris data
+ are necessary, an application can query both servers, or
+ master/slave replication can be used to keep a read-only copy
+ of the other office's data on each server.
+ </para>
+ </listitem>
+ </varlistentry>
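The office example in the entry above might look like the following hypothetical layout (table and column names are invented): each office's server is the only writer for its own table, the other table is a read-only replica, and a combined report unions the two.

<programlisting>
-- Hypothetical per-office partitioning: the London server writes only
-- sales_london; sales_paris is a read-only copy kept up to date by replication.
CREATE TABLE sales_london (id integer PRIMARY KEY, amount numeric);
CREATE TABLE sales_paris  (id integer PRIMARY KEY, amount numeric);

-- A cross-office report queries both data sets together:
SELECT 'London' AS office, sum(amount) FROM sales_london
UNION ALL
SELECT 'Paris'  AS office, sum(amount) FROM sales_paris;
</programlisting>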
+
+ <varlistentry>
+ <term>Commercial Solutions</term>
+ <listitem>
+
+ <para>
+ Because <productname>PostgreSQL</> is open source and easily
+ extended, a number of companies have taken <productname>PostgreSQL</>
+ and created commercial closed-source solutions with unique
+ failover, replication, and load balancing capabilities.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>