Commit 9243664

This patch updates pg_autovacuum in several ways:
* A few bug fixes
* fixes solaris compile and crash issue
* decouple vacuum analyze and analyze thresholds
* detach from tty (daemonize)
* improved logging layout
* more conservative default configuration
* improved, expanded and updated README

Please apply at 1st convenience, or before code freeze, whichever comes first :-)

At this point I think I have brought pg_autovacuum and its client side design as far as I think it should go. It works, keeps file sizes in check, helps performance and gives the administrator a fair amount of flexibility in configuring it. Next up is to do the FSM based design that is integrated into the back end.

p.s. Thanks to Christopher Browne for his help.

Matthew T. O'Connor
1 parent 4e1f986 commit 9243664

4 files changed, +1049 -656 lines changed

pg_autovacuum README (+128 -52)
@@ -1,80 +1,156 @@
 pg_autovacuum README
+--------------------
 
-pg_autovacuum is a libpq client program that monitors all the databases of a
-postgresql server. It uses the stats collector to monitor insert, update and
-delete activity. When an individual table exceeds it's insert or delete
-threshold (more detail on thresholds below) then that table is vacuumed or
-analyzed. This allows postgresql to keep the fsm and table statistics up to
-date without having to schedule periodic vacuums with cron regardless of need.
+pg_autovacuum is a libpq client program that monitors all the
+databases associated with a postgresql server. It uses the stats
+collector to monitor insert, update and delete activity.
 
-The primary benefit of pg_autovacuum is that the FSM and table statistic information
-are updated as needed. When a table is actively changed pg_autovacuum performs the
-necessary vacuums and analyzes, when a table is inactive, no cycles are wasted
-performing vacuums and analyzes that are not needed.
+When a table exceeds its insert or delete threshold (more detail
+on thresholds below) then that table will be vacuumed or analyzed.
+
+This allows postgresql to keep the fsm and table statistics up to
+date, and eliminates the need to schedule periodic vacuums.
+
+The primary benefit of pg_autovacuum is that the FSM and table
+statistic information are updated as needed. When a table is actively
+changing, pg_autovacuum will perform the necessary vacuums and
+analyzes, whereas if a table remains static, no cycles will be wasted
+performing unnecessary vacuums/analyzes.
+
+A secondary benefit of pg_autovacuum is that it ensures that a
+database wide vacuum is performed prior to xid wraparound. This is an
+important, if rare, problem, as failing to do so can result in major
+data loss.
+
+
+KNOWN ISSUES:
+-------------
+pg_autovacuum has been tested under Redhat Linux (by me) and Solaris (by
+Christopher B. Browne) and all known bugs have been resolved. Please report
+any problems to the hackers list.
+
+pg_autovacuum does not get started automatically by either the postmaster or
+by pg_ctl. Along the same lines, when the postmaster exits no one tells
+pg_autovacuum. The result is that at the start of the next loop,
+pg_autovacuum fails to connect to the server and exits. Any time it fails
+to connect pg_autovacuum exits.
+
+pg_autovacuum requires that the stats system be enabled and reporting row
+level stats. The overhead of the stats system has been shown to be
+significant under certain workloads. For instance a tight loop of queries
+performing "select 1" was nearly 30% slower with stats enabled. However,
+in practice with more realistic workloads, the stats system overhead is
+usually nominal.
 
-A secondary benefit of pg_autovacuum is that it guarantees that a database wide
-vacuum is performed prior to xid wraparound. This is important as failing to do
-so can result in major data loss.
 
 INSTALL:
-To use pg_autovacuum, uncompress the tar.gz into the contrib directory and modify the
-contrib/Makefile to include the pg_autovacuum directory. pg_autovacuum will then be made as
-part of the standard postgresql install.
+--------
+
+As of postgresql v7.4 pg_autovacuum is included in the main source tree
+under contrib. Therefore you just make && make install (similar to most other
+contrib modules) and it will be installed for you.
+
+If you are using an earlier version of postgresql just uncompress the tar.gz
+into the contrib directory and modify the contrib/Makefile to include the pg_autovacuum
+directory. pg_autovacuum will then be made as part of the standard
+postgresql install.
 
 make sure that the following are set in postgresql.conf
-stats_start_collector = true
-stats_row_level = true
 
-start up the postmaster
-then, just execute the pg_autovacuum executable.
+stats_start_collector = true
+stats_row_level = true
+
+start up the postmaster, then execute the pg_autovacuum executable.
 
 
 Command line arguments:
+-----------------------
+
 pg_autovacuum has the following optional arguments:
+
 -d debug: 0 silent, 1 basic info, 2 more debug info, etc...
+-D daemonize: Detach from tty and run in background.
 -s sleep base value: see "Sleeping" below.
 -S sleep scaling factor: see "Sleeping" below.
--t tuple base threshold: see Vacuuming.
--T tuple scaling factor: see Vacuuming.
--U username: Username pg_autovacuum will use to connect with, if not specified the
-current username is used
+-v vacuum base threshold: see Vacuum and Analyze.
+-V vacuum scaling factor: see Vacuum and Analyze.
+-a analyze base threshold: see Vacuum and Analyze.
+-A analyze scaling factor: see Vacuum and Analyze.
+-L log file: Name of file to which output is written, otherwise STDERR
+-U username: Username pg_autovacuum will use to connect with, if not
+specified the current username is used.
 -P password: Password pg_autovacuum will use to connect with.
 -H host: host name or IP to connect to.
 -p port: port used for connection.
 -h help: list of command line options.
 
-All arguments have default values defined in pg_autovacuum.h. At the time of this
-writing they are:
-#define AUTOVACUUM_DEBUG 1
-#define BASETHRESHOLD 100
-#define SCALINGFACTOR 2
-#define SLEEPVALUE 3
-#define SLEEPSCALINGFACTOR 2
-#define UPDATE_INTERVAL 2
+All arguments have default values defined in pg_autovacuum.h. At the
+time of writing they are:
+
+-d 1
+-v 1000
+-V 2
+-a 500 (half of -v if not specified)
+-A 1 (half of -V if not specified)
+-s 300 (5 minutes)
+-S 2
 
 
 Vacuum and Analyze:
-pg_autovacuum performes either a vacuums analyze or just analyze depending on the table activity.
-If the number of (inserts + updates) > insertThreshold, then an only an analyze is performed.
-If the number of (deletes + updates ) > deleteThreshold, then a vacuum analyze is performed.
-deleteThreshold is equal to: tuple_base_value + (tuple_scaling_factor * "number of tuples in the table")
-insertThreshold is equal to: 0.5 * tuple_base_value + (tuple_scaling_factor * "number of tuples in the table")
-The insertThreshold is half the deleteThreshold because it's a much lighter operation (approx 5%-10% of vacuum),
-so running it more often costs us little in performance degredation.
+-------------------
+
+pg_autovacuum performs either a vacuum analyze or just analyze depending
+on the quantity and type of table activity (insert, update, or delete):
+
+- If the number of (inserts + updates + deletes) > AnalyzeThreshold, then
+only an analyze is performed.
+
+- If the number of (deletes + updates) > VacuumThreshold, then a
+vacuum analyze is performed.
+
+VacuumThreshold is equal to:
+vacuum_base_value + (vacuum_scaling_factor * "number of tuples in the table")
+
+AnalyzeThreshold is equal to:
+analyze_base_value + (analyze_scaling_factor * "number of tuples in the table")
+
+The AnalyzeThreshold defaults to half of the VacuumThreshold since it
+represents a much less expensive operation (approx 5%-10% of vacuum), and
+running it more often should not substantially degrade system performance.
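
As an illustration of the two threshold formulas above, here is a small Python sketch (not part of this patch, and not how pg_autovacuum itself is written). The default values come from the README's list of defaults; the function names, and the choice to test the vacuum threshold before the analyze threshold, are assumptions made for the example.

```python
# Illustrative sketch only: names and evaluation order are assumptions.
# Defaults mirror the README's documented values for -v, -V, -a and -A.
VACUUM_BASE = 1000
VACUUM_SCALE = 2
ANALYZE_BASE = 500
ANALYZE_SCALE = 1


def vacuum_threshold(reltuples, base=VACUUM_BASE, scale=VACUUM_SCALE):
    """vacuum_base_value + (vacuum_scaling_factor * number of tuples)."""
    return base + scale * reltuples


def analyze_threshold(reltuples, base=ANALYZE_BASE, scale=ANALYZE_SCALE):
    """analyze_base_value + (analyze_scaling_factor * number of tuples)."""
    return base + scale * reltuples


def choose_action(inserts, updates, deletes, reltuples):
    """Return which maintenance step the README's rules would call for."""
    # Assumption: vacuum analyze is checked first, since it already
    # includes an analyze.
    if deletes + updates > vacuum_threshold(reltuples):
        return "vacuum analyze"
    if inserts + updates + deletes > analyze_threshold(reltuples):
        return "analyze"
    return None


if __name__ == "__main__":
    # A table with 10000 tuples and the default settings.
    print(vacuum_threshold(10000))    # 21000
    print(analyze_threshold(10000))   # 10500
    print(choose_action(12000, 0, 0, 10000))      # analyze
    print(choose_action(0, 15000, 7000, 10000))   # vacuum analyze
```

With the defaults, a 10000-tuple table needs more than 21000 updates plus deletes before a vacuum analyze is triggered, which reflects the "more conservative default configuration" mentioned in the commit message.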
 
 Sleeping:
-pg_autovacuum sleeps after it is done checking all the databases. It does this so as
-to limit the amount of system resources it consumes. This also allows the system
-administrator to configure pg_autovacuum to be more or less aggressive. Reducing the
-sleep time will cause pg_autovacuum to respond more quickly to changes, be they database
-addition / removal, table addition / removal, or just normal table activity. However,
-setting these values to high can have a negative net effect on the server. If a table
-gets vacuumed 5 times during the course of a large update, it might take much longer
-than if it was vacuumed only once.
+---------
+
+pg_autovacuum sleeps for a while after it is done checking all the
+databases. It does this in order to limit the amount of system
+resources it consumes. This also allows the system administrator to
+configure pg_autovacuum to be more or less aggressive.
+
+Reducing the sleep time will cause pg_autovacuum to respond more
+quickly to changes, whether they be database addition/removal, table
+addition/removal, or just normal table activity.
+
+On the other hand, setting pg_autovacuum sleep values too aggressively
+(for too short a period of time) can have a negative effect on server
+performance. If a table gets vacuumed 5 times during the course of a
+large update, this is likely to take much longer than if the table was
+vacuumed only once, at the end.
+
 The total time it sleeps is equal to:
-base_sleep_value + sleep_scaling_factor * "duration of the previous loop"
 
-What it monitors:
-pg_autovacuum dynamically generates a list of databases and tables to monitor, in
-addition it will dynamically add and remove databases and tables that are
-removed from the database server while pg_autovacuum is running.
+base_sleep_value + sleep_scaling_factor * "duration of the previous
+loop"
+
+Note that timing measurements are made in seconds; specifying
+"pg_autovacuum -s 1" means pg_autovacuum could poll the database up to 60 times
+a minute. In a system with large tables where vacuums may run for several
+minutes, longer times between vacuums are likely to be appropriate.
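
To make the sleep formula concrete, here is a short Python sketch (again illustrative, not code from this patch). The defaults are the README's -s 300 and -S 2; the function names are hypothetical, and the polls-per-minute estimate assumes every loop takes the same amount of time.

```python
# Illustrative sketch only: function names are hypothetical.
# Defaults mirror the README's documented values for -s and -S.


def sleep_seconds(previous_loop_seconds, base=300, scale=2):
    """base_sleep_value + sleep_scaling_factor * duration of previous loop."""
    return base + scale * previous_loop_seconds


def polls_per_minute(loop_seconds, base=300, scale=2):
    """Rough polling rate, assuming each loop takes loop_seconds."""
    cycle = loop_seconds + sleep_seconds(loop_seconds, base, scale)
    return 60 / cycle


if __name__ == "__main__":
    # A 45 second loop with the defaults leads to a 390 second sleep.
    print(sleep_seconds(45))            # 390
    # With -s 1 and a near-instant loop, the server is polled about
    # once a second, i.e. 60 times a minute.
    print(polls_per_minute(0, base=1))  # 60.0
```

Because the previous loop's duration is multiplied by the scaling factor, a loop that spends a long time vacuuming automatically backs off before polling again.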
+
+What pg_autovacuum monitors:
+----------------------------
+
+pg_autovacuum dynamically generates a list of all databases and tables that
+exist on the server. It will dynamically add and remove databases and
+tables that are added to or removed from the database server while pg_autovacuum is
+running. Overhead is fairly small per object. For example: 10 databases
+with 10 tables each appears to use less than 10k of memory on my Linux box.

contrib/pg_autovacuum/TODO (+9 -2)
@@ -1,6 +1,5 @@
 Todo Items for pg_autovacuum client
-
-_Allow it to detach from the tty
+--------------------------------------------------------------------------
 
 _create a FSM export function and see if I can use it for pg_autovacuum
 
@@ -9,6 +8,7 @@ _look into possible benefits of pgstattuple contrib work
 _Continue trying to reduce server load created by polling.
 
 Done:
+--------------------------------------------------------------------------
 _Check if required pg_stats are enabled, if not exit with error
 
 _Reduce the number of connections and queries to the server
@@ -34,3 +34,10 @@ _change name to pg_autovacuum
 
 _Add proper table and database removal functions so that we can properly
 clear up before we exit, and make sure we don't leak memory when removing tables and such.
+
+_Decouple insert and delete thresholds
+
+_Fix Vacuum debug routine to include the database name.
+
+_Allow it to detach from the tty
+