Commit e788e84

doc: Add best practises section to partitioning docs
A few questionable partitioning designs have been cropping up lately around the mailing lists. Generally, these cases have been partitioning using too many partitions which have caused performance or OOM problems for the users. Since we have very little else to guide users into good design, here we add a new section to the partitioning documentation with some best practise guidelines for good design.

Reviewed-by: Justin Pryzby, Amit Langote, Alvaro Herrera
Discussion: https://postgr.es/m/CAKJS1f-2rx+E9mG3xrCVHupefMjAp1+tpczQa9SEOZWyU7fjEA@mail.gmail.com
Backpatch-through: 10
1 parent 6f34fcb commit e788e84

1 file changed

+84
-2
lines changed

doc/src/sgml/ddl.sgml

Lines changed: 84 additions & 2 deletions
@@ -3450,8 +3450,9 @@ VALUES ('Albany', NULL, NULL, 'NY');
 </listitem>
 </itemizedlist>

-These deficiencies will probably be fixed in some future release,
-but in the meantime considerable care is needed in deciding whether
+Some functionality not implemented for inheritance hierarchies is
+implemented for declarative partitioning.
+Considerable care is needed in deciding whether partitioning with legacy
 inheritance is useful for your application.
 </para>

@@ -4674,6 +4675,87 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
 </itemizedlist>
 </para>
 </sect2>
+
+<sect2 id="ddl-partitioning-declarative-best-practices">
+<title>Declarative Partitioning Best Practices</title>
+
+<para>
+The choice of how to partition a table should be made carefully as the
+performance of query planning and execution can be negatively affected by
+poor design.
+</para>
+
+<para>
+One of the most critical design decisions will be the column or columns
+by which you partition your data. Often the best choice will be to
+partition by the column or set of columns which most commonly appear in
+<literal>WHERE</literal> clauses of queries being executed on the
+partitioned table. <literal>WHERE</literal> clause items that match and
+are compatible with the partition key can be used to prune unneeded
+partitions. However, you may be forced into making other decisions by
+requirements for the <literal>PRIMARY KEY</literal> or a
+<literal>UNIQUE</literal> constraint. Removal of unwanted data is also a
+factor to consider when planning your partitioning strategy. An entire
+partition can be detached fairly quickly, so it may be beneficial to
+design the partition strategy in such a way that all data to be removed
+at once is located in a single partition.
+</para>
+
+<para>
+Choosing the target number of partitions that the table should be divided
+into is also a critical decision to make. Not having enough partitions
+may mean that indexes remain too large and that data locality remains poor
+which could result in low cache hit ratios. However, dividing the table
+into too many partitions can also cause issues. Too many partitions can
+mean longer query planning times and higher memory consumption during both
+query planning and execution. When choosing how to partition your table,
+it's also important to consider what changes may occur in the future. For
+example, if you choose to have one partition per customer and you
+currently have a small number of large customers, consider the
+implications if in several years you instead find yourself with a large
+number of small customers. In this case, it may be better to choose to
+partition by <literal>HASH</literal> and choose a reasonable number of
+partitions rather than trying to partition by <literal>LIST</literal> and
+hoping that the number of customers does not increase beyond what it is
+practical to partition the data by.
+</para>
+
+<para>
+Sub-partitioning can be useful to further divide partitions that are
+expected to become larger than other partitions, although excessive
+sub-partitioning can easily lead to large numbers of partitions and can
+cause the same problems mentioned in the preceding paragraph.
+</para>
+
+<para>
+It is also important to consider the overhead of partitioning during
+query planning and execution. The query planner is generally able to
+handle partition hierarchies with up to a few thousand partitions fairly well,
+provided that typical queries allow the query planner to prune all but a
+small number of partitions. Planning times become longer and memory
+consumption becomes higher when more partitions remain after the planner
+performs partition pruning. This is particularly true for the
+<command>UPDATE</command> and <command>DELETE</command> commands. Another
+reason to be concerned about having a large number of partitions is that
+the server's memory consumption may grow significantly over a period of
+time, especially if many sessions touch large numbers of partitions.
+That's because each partition requires its metadata to be loaded into the
+local memory of each session that touches it.
+</para>
+
+<para>
+With data warehouse type workloads, it can make sense to use a larger
+number of partitions than with an <acronym>OLTP</acronym> type workload.
+Generally, in data warehouses, query planning time is less of a concern as
+the majority of processing time is spent during query execution. With
+either of these two types of workload, it is important to make the right
+decisions early, as re-partitioning large quantities of data can be
+painfully slow. Simulations of the intended workload are often beneficial
+for optimizing the partitioning strategy. Never assume that more
+partitions are better than fewer partitions and vice-versa.
+</para>
+</sect2>
+
 </sect1>

 <sect1 id="ddl-foreign-data">
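
The guidelines added by this commit can be illustrated with a short SQL sketch. It is not part of the commit. It assumes PostgreSQL 11 or later, since hash partitioning and primary keys on partitioned tables are not available in version 10. The `measurement` table is modelled on the example named in the second hunk header; the `orders` table and all partition names are hypothetical.

```sql
-- Illustrative sketch only; assumes PostgreSQL 11+.

-- Partition by the column that most often appears in WHERE clauses,
-- so the planner can prune partitions that cannot match.
CREATE TABLE measurement (
    city_id   int  NOT NULL,
    logdate   date NOT NULL,
    peaktemp  int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2008m01 PARTITION OF measurement
    FOR VALUES FROM ('2008-01-01') TO ('2008-02-01');
CREATE TABLE measurement_y2008m02 PARTITION OF measurement
    FOR VALUES FROM ('2008-02-01') TO ('2008-03-01');

-- Inspect the plan to see which partitions remain after pruning.
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-02-01';

-- Data that is removed together lives in one partition, so it can be
-- detached and dropped instead of deleted row by row.
ALTER TABLE measurement DETACH PARTITION measurement_y2008m01;
DROP TABLE measurement_y2008m01;

-- Hypothetical orders table: HASH keeps the partition count fixed no
-- matter how many customers appear later, unlike one LIST partition
-- per customer. The primary key must include the partition key column.
CREATE TABLE orders (
    customer_id  bigint      NOT NULL,
    order_id     bigint      NOT NULL,
    placed_at    timestamptz NOT NULL,
    PRIMARY KEY (customer_id, order_id)
) PARTITION BY HASH (customer_id);

CREATE TABLE orders_p0 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```

The modulus of 4 is an arbitrary choice for the sketch. As the new section advises, a suitable partition count is best found by simulating the intended workload.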
