PostgreSQL DBA Architecture
Architecture overview (figure)
Memory: WAL_Buffers, Temp_Buffers, CLOG_Buffers, Work_Mem, Memory_Locks, Vacuum_Buffers
Processes: Postgres, DB Writer, Stats Collector, Checkpointer, WAL Writer, Archiver, Logger, Autovacuum
Physical data files: Data files, WAL files, Temp files, CLOG files, Log files
Shared Buffers (figure: 8 KB buffers on LRU and MRU lists, with dirty buffers and pinned buffers marked)
By default shared_buffers is 128 MB. On a dedicated database server, 25% of physical memory is a common starting value for shared_buffers.
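A minimal sketch of the 25% rule of thumb, assuming a dedicated server and the default 8 KB block size. The function names are hypothetical, and keeping the 128 MB default as a floor is an assumption of this sketch, not a PostgreSQL rule. It also converts the size into a count of 8 KB buffers, the unit in which pg_settings reports shared_buffers.

```python
BLOCK_SIZE = 8 * 1024                        # default PostgreSQL page size (BLCKSZ)
DEFAULT_SHARED_BUFFERS = 128 * 1024 * 1024   # 128 MB default

def suggest_shared_buffers(total_ram_bytes: int) -> int:
    """Roughly 25% of RAM, rounded down to whole 8 KB pages.

    Assumption of this sketch: never go below the 128 MB default.
    """
    quarter = total_ram_bytes // 4
    quarter -= quarter % BLOCK_SIZE          # align to whole pages
    return max(quarter, DEFAULT_SHARED_BUFFERS)

def as_buffers(size_bytes: int) -> int:
    """Number of 8 KB buffers that fit in size_bytes."""
    return size_bytes // BLOCK_SIZE

GiB = 1024 ** 3
size = suggest_shared_buffers(16 * GiB)
print(size // (1024 * 1024), "MB =", as_buffers(size), "buffers")  # 4096 MB = 524288 buffers
```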
Shared buffers are allocated at server startup, sized by the shared_buffers parameter.
Dirty Buffers: buffers holding pages that have been modified in memory but not yet written to disk are called dirty buffers.
Pinned Buffers: buffers currently being accessed by a transaction (backend process) are called pinned buffers; a pinned buffer cannot be evicted.
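Algorithms: buffer replacement follows an LRU-style policy. Since PostgreSQL 8.1 it is implemented as a clock-sweep algorithm, which approximates LRU using a usage count kept for each buffer.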
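To make the replacement policy concrete, here is a simplified Python model of a clock sweep. It is not PostgreSQL's C buffer manager: the class and method names are hypothetical, and it ignores locking, ring buffers for bulk reads, and the free list. It only shows the core idea: the "hand" walks the pool, skips pinned buffers, decrements usage counts, and evicts the first unpinned buffer whose count reaches zero, writing it out first if it is dirty.

```python
from dataclasses import dataclass
from typing import Optional

MAX_USAGE_COUNT = 5  # PostgreSQL caps a buffer's usage count at 5

@dataclass
class Buffer:
    page: Optional[str] = None
    usage_count: int = 0
    pin_count: int = 0      # > 0 while a backend is using the buffer
    dirty: bool = False

class BufferPool:
    def __init__(self, size: int) -> None:
        self.buffers = [Buffer() for _ in range(size)]
        self.hand = 0        # the clock hand
        self.lookup = {}     # page -> buffer index
        self.flushed = []    # dirty pages written out before their buffer was reused

    def access(self, page: str, modify: bool = False) -> int:
        idx = self.lookup.get(page)
        if idx is None:                      # miss: find a victim buffer
            idx = self._clock_sweep()
            victim = self.buffers[idx]
            if victim.page is not None:
                if victim.dirty:
                    self.flushed.append(victim.page)
                del self.lookup[victim.page]
            self.buffers[idx] = Buffer(page=page)
            self.lookup[page] = idx
        buf = self.buffers[idx]
        buf.usage_count = min(buf.usage_count + 1, MAX_USAGE_COUNT)
        buf.dirty = buf.dirty or modify
        return idx

    def _clock_sweep(self) -> int:
        # Enough steps for every unpinned buffer to decay to zero and be found.
        for _ in range(len(self.buffers) * (MAX_USAGE_COUNT + 1)):
            idx = self.hand
            self.hand = (self.hand + 1) % len(self.buffers)
            buf = self.buffers[idx]
            if buf.pin_count > 0:
                continue                     # pinned buffers are never evicted
            if buf.usage_count == 0:
                return idx                   # victim found
            buf.usage_count -= 1             # recently used: spare it this lap
        raise RuntimeError("no unpinned buffer available")
```

Frequently used pages keep a high usage count and survive several laps of the hand, which is why the scheme behaves like LRU without maintaining a strictly ordered list.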
Changing shared_buffers requires a database restart for the new value to take effect.
Non-repeatable read: a transaction re-reads data it has previously read and finds that the data has been modified by another transaction that committed since the initial read.
Phantom read: a transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently committed transaction.
WAL Buffers
To ensure that no data is lost on server failure, PostgreSQL uses the WAL (write-ahead log) mechanism. WAL data (also referred to as XLOG records) is PostgreSQL's transaction log, and the WAL buffer is a buffering area for WAL data before it is written to persistent storage.
WAL buffers are allocated at cluster startup, sized by the wal_buffers parameter in the configuration file.
Changing this parameter requires a cluster restart to take effect.
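The default wal_buffers setting is -1, which auto-tunes the size from shared_buffers: 1/32 of shared_buffers, but not less than 64 kB and not more than one WAL segment (typically 16 MB); an explicit positive value below 32 kB is treated as 32 kB. The Python sketch below restates that documented rule; the function name is hypothetical and a 16 MB segment size is assumed.

```python
KB = 1024
MB = 1024 * KB
WAL_BLOCK_SIZE = 8 * KB         # default WAL block size
WAL_SEGMENT_SIZE = 16 * MB      # assumption: default segment size

def effective_wal_buffers(shared_buffers_bytes: int, wal_buffers_setting: int = -1) -> int:
    """Bytes of WAL buffer that a given wal_buffers setting produces."""
    if wal_buffers_setting != -1:
        return max(wal_buffers_setting, 32 * KB)    # small explicit values are raised to 32 kB
    auto = shared_buffers_bytes // 32               # 1/32 of shared_buffers
    auto = min(auto, WAL_SEGMENT_SIZE)              # at most one WAL segment
    return max(auto, 8 * WAL_BLOCK_SIZE)            # at least 64 kB

print(effective_wal_buffers(128 * MB) // MB, "MB")  # default shared_buffers -> 4 MB
```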
CLOG Buffers
• CLOG stands for "commit log", and the CLOG buffers are an area of shared memory dedicated to holding commit log pages (stored on disk in pg_xact, named pg_clog before PostgreSQL 10).
The commit log pages hold transaction status metadata and are distinct from the WAL data. The commit log records the status of every transaction, indicating whether it is still in progress, committed, or aborted.
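• Example: raising maintenance_work_mem, the memory used by maintenance operations such as CREATE INDEX, for the current session before building an index:
postgres=# SET maintenance_work_mem TO '384MB';
SET
postgres=# CREATE INDEX personen_vorname_nachname_idx ON persons ("last_name");
CREATE INDEX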
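Going back to the CLOG: each transaction's status occupies 2 bits, so one 8 KB CLOG page covers 32768 transaction IDs, with status codes 00 = in progress, 01 = committed, 10 = aborted, 11 = sub-committed. The Python sketch below mirrors that addressing arithmetic; function names are hypothetical and it works on an in-memory page rather than the real pg_xact files.

```python
BLOCK_SIZE = 8192                              # CLOG pages use the 8 KB block size
BITS_PER_XACT = 2                              # 2 status bits per transaction
XACTS_PER_BYTE = 8 // BITS_PER_XACT            # 4
XACTS_PER_PAGE = BLOCK_SIZE * XACTS_PER_BYTE   # 32768

STATUS = {0b00: "in progress", 0b01: "committed", 0b10: "aborted", 0b11: "sub-committed"}

def clog_location(xid: int) -> tuple[int, int, int]:
    """Return (page number, byte offset within page, bit shift) for xid's status."""
    page = xid // XACTS_PER_PAGE
    byte = (xid % XACTS_PER_PAGE) // XACTS_PER_BYTE
    shift = (xid % XACTS_PER_BYTE) * BITS_PER_XACT
    return page, byte, shift

def read_status(page_bytes: bytes, xid: int) -> str:
    """Decode xid's status; page_bytes must be the CLOG page that contains xid."""
    _, byte, shift = clog_location(xid)
    return STATUS[(page_bytes[byte] >> shift) & 0b11]

print(clog_location(100000))   # (3, 424, 0)
```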
WORK_MEM
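work_mem sets the memory a single query operation (such as a sort or a hash table) may use before it spills to temporary files on disk. It matters mainly for expensive operations such as sorting, hash joins, and hash-based aggregation.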
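• ORDER BY, DISTINCT, and merge joins require memory for sort operations.
• Hash joins, hash-based aggregation, and hash-based processing of IN subqueries require memory for hash tables.
• Bitmap index scans require memory for the internal bitmap.
• Parameter: work_mem, default value 4MB. The limit applies to each sort or hash operation, so one complex query, and many concurrent sessions, can use several multiples of it.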
postgres=# SET work_mem TO '8MB';
SET
postgres=# SET trace_sort TO on;
SET
postgres=# SET client_min_messages TO DEBUG;
SET
postgres=# SELECT relname, relkind, relpages FROM pg_class c WHERE relkind = 'r'
postgres-# ORDER BY relpages DESC, relname ASC
postgres-# LIMIT 1;
LOG: begin tuple sort: nkeys = 2, workMem = 8192, randomAccess = f
LOG: starting performsort: CPU 0.00s/0.00u sec elapsed 0.00 sec
LOG: performsort done: CPU 0.00s/0.00u sec elapsed 0.00 sec
LOG: internal sort ended, 31 KB used: CPU 0.00s/0.00u sec elapsed 0.00 sec
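Because work_mem is a per-operation limit, capacity planning multiplies it out. A rough worst-case estimate in Python, assuming every session runs the same query shape; the function name and inputs are hypothetical, and hash_mem_multiplier (default 2.0 since PostgreSQL 15, 1.0 in 13 and 14) scales the limit for hash tables.

```python
MB = 1024 * 1024

def worst_case_bytes(work_mem: int, sessions: int, sort_nodes: int, hash_nodes: int,
                     hash_mem_multiplier: float = 2.0) -> int:
    """Upper-bound estimate: every sort/hash node in every session uses its full limit."""
    per_query = sort_nodes * work_mem + hash_nodes * work_mem * hash_mem_multiplier
    return int(sessions * per_query)

# 100 sessions, each running a query with 2 sorts and 1 hash join at the 4MB default
print(worst_case_bytes(4 * MB, 100, 2, 1) // MB, "MB")
```

Real usage is usually far lower, since most operations finish well under their limit, but the estimate shows why raising work_mem globally (as opposed to per session with SET) needs care.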
VACUUM_BUFFER
• The autovacuum daemon cleans up tables and indexes, preventing bloat and poor response times.
• autovacuum_work_mem sets the maximum amount of memory used by each autovacuum worker process; its default of -1 means the value of maintenance_work_mem is used instead. The memory is allocated from operating system RAM, and the total is also governed by autovacuum_max_workers: up to autovacuum_max_workers times this amount can be allocated, so configure autovacuum_work_mem carefully. These settings apply only to autovacuum workers; they have no effect on VACUUM when it runs in other contexts, such as a manual VACUUM. This memory is private to each worker and is not shared with any other background or user process.
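The sizing rule above reduces to one multiplication. A small Python sketch using the stock defaults (autovacuum_work_mem = -1, maintenance_work_mem = 64MB, autovacuum_max_workers = 3); the function name is hypothetical and the result is an upper bound, not what the workers necessarily allocate.

```python
MB = 1024 * 1024

def autovacuum_memory(autovacuum_work_mem: int = -1,
                      maintenance_work_mem: int = 64 * MB,
                      autovacuum_max_workers: int = 3) -> int:
    """Upper bound on memory held by all autovacuum workers together."""
    # -1 means "fall back to maintenance_work_mem"
    per_worker = maintenance_work_mem if autovacuum_work_mem == -1 else autovacuum_work_mem
    return per_worker * autovacuum_max_workers

print(autovacuum_memory() // MB, "MB")                           # defaults
print(autovacuum_memory(256 * MB, autovacuum_max_workers=5) // MB, "MB")
```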
temp_buffers
• A database may have one or more temporary tables, and the data blocks (pages) of those tables need their own memory to be processed in. Temp buffers serve this purpose, using a portion of RAM sized by the temp_buffers parameter (default 8MB). Temp buffers are local to a session and are used only to access temporary tables in that session. They are unrelated to the temporary files created under the pgsql_tmp directory during large sort and hash operations.
Bg Writer Process: Default
• The background writer continuously flushes dirty pages/buffers to the data files. It scans the shared buffers for dirty pages and writes them to disk, so that backends serving queries rarely have to write dirty buffers themselves.
• By default the background writer wakes up every 200 milliseconds and writes at most 100 buffers per round.
• Parameters controlling the background writer:
- bgwriter_delay = 200ms
- bgwriter_lru_maxpages = 100 (buffers)
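Together the two parameters cap how fast the background writer can clean buffers: at most bgwriter_lru_maxpages pages per bgwriter_delay. A Python sketch of that arithmetic, assuming 8 KB pages; the function name is hypothetical.

```python
BLOCK_SIZE = 8192  # default page size in bytes

def bgwriter_max_bytes_per_second(bgwriter_delay_ms: int = 200,
                                  bgwriter_lru_maxpages: int = 100) -> int:
    """Upper bound on background-writer throughput: maxpages per round, one round per delay."""
    return bgwriter_lru_maxpages * BLOCK_SIZE * 1000 // bgwriter_delay_ms

rate = bgwriter_max_bytes_per_second()
print(rate, "bytes/s, about", round(rate / (1024 * 1024), 1), "MB/s")
```

With the defaults this is roughly 3.9 MB/s; if the workload dirties pages faster, backends or the checkpointer end up doing the remaining writes.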
Check Pointer Process: Default
This process performs checkpoints. When a checkpoint starts, all dirty pages in memory are written to the data files.
A checkpoint is a known safe starting point for recovery, since at that point all outstanding database changes have been written to disk.
The checkpoint process has two aspects: preparing for database recovery, and cleaning dirty pages in the shared buffer pool.
A checkpoint also updates the pg_control file (global/pg_control), which holds the metadata of the latest checkpoint.
Parameters:
checkpoint_timeout = 5min
checkpoint_completion_target = 0.9 (0.5 before PostgreSQL 14)
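checkpoint_completion_target spreads a checkpoint's writes over a fraction of the interval between checkpoints. A Python sketch for timed checkpoints, with hypothetical function names; it ignores checkpoints triggered by max_wal_size, which shorten the interval.

```python
def checkpoint_write_window(checkpoint_timeout_s: float = 300.0,
                            checkpoint_completion_target: float = 0.9) -> float:
    """Seconds over which a timed checkpoint spreads its writes."""
    return checkpoint_timeout_s * checkpoint_completion_target

def average_write_rate(dirty_bytes: int, window_s: float) -> float:
    """Bytes per second needed to write dirty_bytes within window_s."""
    return dirty_bytes / window_s

MB = 1024 * 1024
window = checkpoint_write_window()
print(f"{window:.0f} s window, {average_write_rate(270 * MB, window) / MB:.1f} MB/s")
```

A larger target means a lower, steadier I/O rate during checkpoints, which is why the default moved from 0.5 to 0.9.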
pg_control File:
• Because the pg_control file contains the fundamental information about the checkpoint, it is essential for database recovery. If it is broken or unreadable, recovery cannot start, because it has no starting point.
• State: the state of the database server at the time the latest checkpoint started.
• There are seven states in total:
'start up' is the state in which the system is starting up;
'shut down' is the state in which the system was shut down normally by the shutdown command;
'in production' is the state in which the system is running;
… and so on.
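The pg_controldata utility prints the contents of pg_control, including the "Database cluster state" line. A Python sketch that runs it and parses the "Label: value" lines, assuming pg_controldata is on PATH and its output is not localized; the function names are hypothetical.

```python
import subprocess

def parse_pg_controldata(output: str) -> dict[str, str]:
    """Split each 'Label:   value' line of pg_controldata output into a dict."""
    fields = {}
    for line in output.splitlines():
        label, sep, value = line.partition(":")   # split on the first colon only
        if sep:
            fields[label.strip()] = value.strip()
    return fields

def cluster_state(data_dir: str) -> str:
    """Run pg_controldata against data_dir and return the cluster state."""
    result = subprocess.run(["pg_controldata", "-D", data_dir],
                            capture_output=True, text=True, check=True)
    return parse_pg_controldata(result.stdout)["Database cluster state"]
```

For example, cluster_state("/path/to/data") (a placeholder path) returns a string such as "in production" on a running primary.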
WAL Writer Process: default
• The WAL writer is a background process that periodically checks the WAL buffer and writes all unwritten XLOG records to the WAL segments.
• Parameter:
• wal_writer_delay = 200ms
Autovacuum Launcher Process: default (autovacuum = on since PostgreSQL 8.3)
• PostgreSQL does not immediately remove deleted tuples from the data files; they are only marked as deleted. Similarly, an update is roughly equivalent to a delete plus an insert: the previous version of the row stays in the data file, and each update generates a new row version. The reason is simple: there may be active transactions that still need to see the data as it was before. As a result, unusable space accumulates in the data files. After some time these dead rows become irrelevant, because no transaction remains that could see the old data. The autovacuum launcher periodically starts autovacuum worker processes that run VACUUM (and ANALYZE) on such tables to reclaim this space.
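• In PostgreSQL 14 and earlier, the statistics collector transmits the collected information to other PostgreSQL processes through temporary files. These files are stored in the directory named by the stats_temp_directory parameter, pg_stat_tmp by default. PostgreSQL 15 replaced the collector with shared-memory statistics and removed this parameter.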
• vacuum_freeze_table_age (default 150000000)
• autovacuum_freeze_max_age (default 200000000)
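Autovacuum decides when to vacuum a table with a documented threshold: dead tuples must exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples (defaults 50 and 0.2). Separately, once a table's oldest unfrozen XID is older than autovacuum_freeze_max_age, an anti-wraparound vacuum is forced. A Python sketch of both checks; the function names are hypothetical, and insert-based triggers (PostgreSQL 13+) and per-table overrides are ignored.

```python
def autovacuum_vacuum_trigger(reltuples: float,
                              threshold: int = 50,
                              scale_factor: float = 0.2) -> float:
    """Dead-tuple count above which autovacuum will VACUUM the table."""
    return threshold + scale_factor * reltuples

def needs_vacuum(dead_tuples: int, reltuples: float,
                 threshold: int = 50, scale_factor: float = 0.2) -> bool:
    return dead_tuples > autovacuum_vacuum_trigger(reltuples, threshold, scale_factor)

def needs_wraparound_vacuum(relfrozenxid_age: int,
                            autovacuum_freeze_max_age: int = 200_000_000) -> bool:
    """Anti-wraparound vacuum is launched even if autovacuum is otherwise disabled."""
    return relfrozenxid_age > autovacuum_freeze_max_age

print(autovacuum_vacuum_trigger(10_000))   # a 10,000-row table
```

The scale factor dominates on large tables, which is why busy large tables often get a lower per-table autovacuum_vacuum_scale_factor.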
Data directory layout (figure: the DATA directory containing base, pg_tblspc and postgresql.auto.conf)
base:
• The base directory contains the database files. Each database has a dedicated subdirectory named after the database's internal object ID (OID). A freshly initialized data directory shows only three subdirectories in base, one each for template0, template1 and postgres.
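Inside a database's subdirectory, each table or index is stored in a file named after its relfilenode, split into 1 GB segment files named relfilenode, relfilenode.1, relfilenode.2, and so on. A Python sketch that builds the path for a given block, assuming the default 8 KB block and 1 GB segment sizes; the OIDs in the example are hypothetical. In SQL, pg_relation_filepath('persons') returns the first segment's path directly.

```python
BLOCK_SIZE = 8192
RELSEG_SIZE_BLOCKS = (1024 ** 3) // BLOCK_SIZE   # 131072 blocks per 1 GB segment

def relation_segment_path(db_oid: int, relfilenode: int, block: int = 0) -> str:
    """Path, relative to the data directory, of the segment file holding `block`.

    Applies to relations in the default tablespace (stored under base/).
    """
    segno = block // RELSEG_SIZE_BLOCKS
    name = str(relfilenode) if segno == 0 else f"{relfilenode}.{segno}"
    return f"base/{db_oid}/{name}"

print(relation_segment_path(16384, 16385, block=300_000))   # hypothetical OIDs
```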
pg_multixact:
Stores multitransaction status data, used mainly for shared row locks.
pg_subtrans:
Stores subtransaction status data.
pg_notify:
Stores information about LISTEN/NOTIFY operations.
pg_stat:
This directory contains the permanent files for the statistics subsystem.
subsystem.
pg_twophase:
Stores state files for prepared transactions (two-phase commit). Two-phase commit detaches a transaction from its session: after PREPARE TRANSACTION, even a different session can later COMMIT PREPARED or ROLLBACK PREPARED it.
pg_tblspc:
• This directory contains symbolic links to the tablespace locations. A tablespace is a logical name pointing to a physical location.
• Since PostgreSQL 9.2 the location is read directly from the symbolic link.
• This makes it possible to move a tablespace simply by stopping the cluster, moving the data files to the new location, recreating the symbolic link, and starting the cluster.
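The relocation steps above can be sketched in Python, assuming the cluster is already stopped, the process has permission on both locations, and new_location does not exist yet. The function name is hypothetical and there is no rollback on failure, so treat it as an illustration of the steps rather than a production tool.

```python
import os
import shutil
from pathlib import Path

def relocate_tablespace(data_dir: Path, tablespace_oid: int, new_location: Path) -> Path:
    """Move a tablespace directory and repoint pg_tblspc/<oid>. Cluster must be stopped."""
    link = data_dir / "pg_tblspc" / str(tablespace_oid)
    old_location = Path(os.readlink(link))          # current target of the symlink
    if new_location.exists():
        raise FileExistsError(new_location)
    shutil.move(str(old_location), str(new_location))  # rename, or copy+delete across filesystems
    link.unlink()                                       # removes only the symlink itself
    link.symlink_to(new_location, target_is_directory=True)
    return old_location
```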
pg_snapshots:
• This directory stores exported snapshots. Since version 9.2, PostgreSQL can export a transaction's snapshot: one session opens a transaction and exports a consistent snapshot with pg_export_snapshot().
• Other sessions can then import that snapshot (SET TRANSACTION SNAPSHOT) and all read the same consistent data.
• This feature is used, for example, by pg_dump for parallel dumps.
pg_ident.conf
• The pg_ident.conf file associates identifying usernames (for example, operating system usernames) with PostgreSQL usernames via definitions called ident maps. This is useful for users whose system usernames do not match their PostgreSQL usernames. Keep the following rules in mind when defining and using an ident map:
• Each ident map entry is defined on a single line, which associates a map name with an identifying username and a translated PostgreSQL username.
• pg_ident.conf can contain multiple map names. All lines with the same map name are treated as a single map.
• A single-line record defining an ident map entry consists of 3 tokens: the map name, the identifying username, and the translated PostgreSQL username. Tokens are separated by spaces or tabs:
map-name system-username database-username
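To show how the three-token lines group into maps, here is a simplified Python parser. It is a sketch, not PostgreSQL's own parser: it ignores regular-expression usernames (tokens starting with /), quoted tokens, and include directives, and the function names are hypothetical. The sample entries follow the style of the PostgreSQL documentation's example map.

```python
from collections import defaultdict

def parse_pg_ident(text: str) -> dict[str, list[tuple[str, str]]]:
    """Group 'map-name system-username database-username' lines by map name."""
    maps: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()     # drop comments and blank lines
        if not line:
            continue
        map_name, system_user, pg_user = line.split()  # exactly 3 tokens expected
        maps[map_name].append((system_user, pg_user))
    return dict(maps)

def is_allowed(maps: dict[str, list[tuple[str, str]]], map_name: str,
               system_user: str, pg_user: str) -> bool:
    """True if the map lets system_user connect as pg_user."""
    return (system_user, pg_user) in maps.get(map_name, [])

SAMPLE = """
# MAPNAME   SYSTEM-USERNAME   PG-USERNAME
omicron     bryanh            bryanh
omicron     ann               ann
omicron     robert            bob
"""

maps = parse_pg_ident(SAMPLE)
print(is_allowed(maps, "omicron", "robert", "bob"))      # True
print(is_allowed(maps, "omicron", "robert", "robert"))   # False
```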
postmaster.opts
• If this file exists in the data directory, pg_ctl (in restart mode) will pass
the contents of the file as options to postgres, unless overridden by the
-o option. The contents of this file are also displayed in status mode.
Vacuum:
• Tuples that are deleted or made obsolete by an update are not physically removed from their table; they remain present until a VACUUM is done.