
Commit 75f0655

Add missing file for documentation section on failover, replication,
load balancing, and clustering options.
1 parent 2cbdb55 commit 75f0655

File tree

1 file changed: +210 -0 lines changed


doc/src/sgml/failover.sgml

Lines changed: 210 additions & 0 deletions
@@ -0,0 +1,210 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/failover.sgml,v 1.1 2006/10/26 15:32:45 momjian Exp $ -->

<chapter id="failover">
<title>Failover, Replication, Load Balancing, and Clustering Options</title>

<indexterm><primary>failover</></>
<indexterm><primary>replication</></>
<indexterm><primary>load balancing</></>
<indexterm><primary>clustering</></>

<para>
Database servers can work together to allow a backup server to
quickly take over if the primary server fails (failover), or to
allow several computers to serve the same data (load balancing).
Ideally, database servers could work together seamlessly. Web
servers serving static web pages can be combined quite easily by
merely load-balancing web requests to multiple machines. In
fact, read-only database servers can be combined relatively easily
too. Unfortunately, most database servers have a read/write mix
of requests, and read/write servers are much harder to combine.
This is because, although read-only data needs to be placed on each
server only once, a write to any server has to be propagated to
all servers so that future read requests to those servers return
consistent results.
</para>

<para>
This synchronization problem is the fundamental difficulty for servers
working together. Because there is no single solution that eliminates
the impact of the sync problem for all use cases, there are multiple
solutions. Each solution addresses this problem in a different way, and
minimizes its impact for a specific workload.
</para>

<para>
Some failover and load balancing solutions are synchronous, meaning that
a data-modifying transaction is not considered committed until all
servers have committed the transaction. This guarantees that a failover
will not lose any data and that all load-balanced servers will return
consistent results with no propagation delay. Asynchronous updating has
a small delay between the time of commit and its propagation to the
other servers, opening the possibility that some transactions might be
lost in the switch to a backup server, and that load-balanced servers
might return slightly stale results. Asynchronous communication is used
when synchronous would be too slow.
</para>

<para>
Solutions can also be categorized by their granularity. Some solutions
can deal only with an entire database server, while others allow control
at the per-table or per-database level.
</para>

<para>
Performance must be considered in any failover or load balancing
choice. There is usually a tradeoff between functionality and
performance. For example, a fully synchronous solution over a slow
network might cut performance by more than half, while an asynchronous
one might have a minimal performance impact.
</para>

<para>
The remainder of this section outlines various failover, replication,
and load balancing solutions.
</para>

<sect1 id="shared-disk-failover">
<title>Shared Disk Failover</title>

<para>
Shared disk failover avoids synchronization overhead by having only one
copy of the database. It uses a single disk array that is shared by
multiple servers. If the main database server fails, the backup server
is able to mount and start the database as though it were recovering from
a database crash. This allows rapid failover with no data loss.
</para>

<para>
Shared hardware functionality is common in network storage devices. One
significant limitation of this method is that if the shared disk array
fails or becomes corrupt, the primary and backup servers are both
nonfunctional.
</para>
</sect1>

<sect1 id="warm-standby-using-point-in-time-recovery">
<title>Warm Standby Using Point-In-Time Recovery</title>

<para>
A warm standby server (see <xref linkend="warm-standby">) can
be kept current by reading a stream of write-ahead log (WAL)
records. If the main server fails, the warm standby contains
almost all of the data of the main server, and can be quickly
made the new master database server. This is asynchronous and
can only be done for the entire database server.
</para>
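
<para>
As a minimal sketch, the standby can be seeded with an online base
backup of the primary; the backup label shown is arbitrary, and copying
the data directory to the standby machine happens outside the database:
</para>

<programlisting>
-- Sketch: seed a warm standby with an online base backup of the primary.
SELECT pg_start_backup('standby_base');   -- 'standby_base' is an arbitrary label
-- ... copy the primary's data directory to the standby machine ...
SELECT pg_stop_backup();
-- The standby then continuously restores WAL segments from the primary's
-- archive as they arrive, staying almost current with the primary.
</programlisting>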
</sect1>

<sect1 id="continuously-running-replication-server">
<title>Continuously Running Replication Server</title>

<para>
A continuously running replication server allows the backup server to
answer read-only queries while the master server is running. It
receives a continuous stream of write activity from the master server.
Because the backup server can be used for read-only database requests,
it is ideal for data warehouse queries.
</para>

<para>
Slony is an example of this type of replication, with per-table
granularity. It updates the backup server in batches, so the replication
is asynchronous and might lose data during a failover.
</para>
</sect1>

<sect1 id="data-partitioning">
<title>Data Partitioning</title>

<para>
Data partitioning splits tables into data sets. Each set can be
modified by only one server. For example, data can be partitioned by
office, such as London and Paris. While the London and Paris servers have all
data records, only London can modify London records, and Paris can only
modify Paris records.
</para>

<para>
Such partitioning implements both failover and load balancing. Failover
is achieved because the data resides on both servers, and this is an
ideal way to enable failover if the servers share a slow communication
channel. Load balancing is possible because read requests can go to any
of the servers, and write requests are split among the servers. Of
course, the communication to keep all the servers up-to-date adds
overhead, so ideally the write load should be low, or localized as in
the London/Paris example above.
</para>

<para>
Data partitioning is usually handled by application code, though rules
and triggers can be used to keep the read-only data sets current. Slony
can also be used in such a setup. While Slony replicates only entire
tables, the London and Paris data can be placed in separate tables, and
inheritance can be used to access both tables using a single table name.
</para>
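
<para>
As a minimal sketch of such a layout, using a hypothetical
<structname>orders</> table, each office writes only to its own child
table while queries against the parent table name see the rows of both:
</para>

<programlisting>
-- One child table per office, inheriting from a common parent.
CREATE TABLE orders (order_id integer, office text, amount numeric);
CREATE TABLE orders_london (CHECK (office = 'London')) INHERITS (orders);
CREATE TABLE orders_paris  (CHECK (office = 'Paris'))  INHERITS (orders);

-- The London server modifies only its own child table ...
INSERT INTO orders_london VALUES (1, 'London', 100.00);

-- ... while read-only queries use the single parent table name.
SELECT office, count(*) FROM orders GROUP BY office;
</programlisting>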
</sect1>

<sect1 id="query-broadcast-load-balancing">
<title>Query Broadcast Load Balancing</title>

<para>
Query broadcast load balancing is accomplished by having a program
intercept every query and send it to all servers. Read-only queries can
be sent to a single server because there is no need for all servers to
process them. This is unusual because most replication solutions have
each write server propagate its changes to the other servers. With
query broadcasting, each server operates independently.
</para>

<para>
This can be complex to set up because functions like random()
and CURRENT_TIMESTAMP will have different values on different
servers, and sequences should be consistent across servers.
Care must also be taken that all transactions either commit or
abort on all servers. Pgpool is an example of this type of
replication.
</para>
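
<para>
For example, a statement like the hypothetical one below cannot be
broadcast verbatim, because each server would evaluate the function
independently; the middleware must first resolve such values into
constants:
</para>

<programlisting>
-- Problematic when broadcast: every server computes its own timestamp.
INSERT INTO audit_log (event, created_at) VALUES ('login', CURRENT_TIMESTAMP);

-- Safe to broadcast: the value is resolved once and sent as a constant,
-- so all servers store exactly the same row.
INSERT INTO audit_log (event, created_at)
VALUES ('login', TIMESTAMP WITH TIME ZONE '2006-10-26 15:32:45+00');
</programlisting>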
</sect1>

<sect1 id="clustering-for-load-balancing">
<title>Clustering For Load Balancing</title>

<para>
In clustering, each server can accept write requests, and these
write requests are broadcast from the original server to all
other servers before each transaction commits. Under heavy
load, this can cause excessive locking and performance degradation.
It is implemented by <productname>Oracle</> in its
<productname><acronym>RAC</></> product. <productname>PostgreSQL</>
does not offer this type of load balancing, though
<productname>PostgreSQL</> two-phase commit can be used to
implement this in application code or middleware.
</para>
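
<para>
A minimal sketch of how such middleware might use two-phase commit,
with a hypothetical <structname>accounts</> table and an arbitrary
transaction identifier, run against every server:
</para>

<programlisting>
-- Phase one: perform the same write on every server and prepare it.
BEGIN;
UPDATE accounts SET balance = balance - 100.00 WHERE account_id = 1;
PREPARE TRANSACTION 'txn_42';

-- Phase two: once every server has prepared successfully, commit everywhere.
COMMIT PREPARED 'txn_42';

-- If any server fails to prepare, abort on all of them instead:
-- ROLLBACK PREPARED 'txn_42';
</programlisting>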
</sect1>

<sect1 id="clustering-for-parallel-query-execution">
<title>Clustering For Parallel Query Execution</title>

<para>
This allows multiple servers to work on a single query. One
possible approach is to split the data among the servers, have
each server execute its part of the query, and then send the
partial results to a central server to be combined and returned
to the user. There is currently no open source
<productname>PostgreSQL</> solution for this.
</para>
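
<para>
As a hand-rolled sketch of the idea rather than a general solution, the
<filename>contrib/dblink</> module can pull partial results from several
servers and combine them on a central one; the table name and connection
strings here are hypothetical:
</para>

<programlisting>
-- Each remote server counts its own slice of the data; the central
-- server combines the partial results.
SELECT sum(partial) AS total_orders
FROM (
    SELECT * FROM dblink('host=server1 dbname=sales',
                         'SELECT count(*) FROM orders') AS t1(partial bigint)
    UNION ALL
    SELECT * FROM dblink('host=server2 dbname=sales',
                         'SELECT count(*) FROM orders') AS t2(partial bigint)
) AS parts;
</programlisting>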
</sect1>

<sect1 id="commercial-solutions">
<title>Commercial Solutions</title>

<para>
Because <productname>PostgreSQL</> is open source and easily
extended, a number of companies have taken <productname>PostgreSQL</>
and created commercial closed-source solutions with unique
failover, replication, and load balancing capabilities.
</para>
</sect1>

</chapter>

0 commit comments
