Home
Konstantin Knizhnik edited this page Apr 17, 2018
Most enterprise systems working with Postgres use pgbouncer or a similar connection-pooling tool. But pgbouncer has the following drawbacks:
- It is an extra entity that complicates system installation and administration.
- Pgbouncer itself can be a bottleneck and a single point of failure. For example, with SSL enabled, pgbouncer's single-threaded model becomes a limiting factor when many clients try to reestablish connections simultaneously. This is why some companies build hierarchies of pgbouncers.
- Using a pool_mode other than "session" makes it impossible to use prepared statements and session variables. The lack of prepared statements alone can slow down simple queries by up to a factor of two.
This is why built-in connection pooling for Postgres can be useful. The proposed design works as follows:
- There are one or more session pools, each with a specified maximal number of workers (backends). When there are multiple session pools, each is associated with its own port, leaving the standard Postgres port (5432 by default) for normal connections with dedicated backends.
- When the number of connections exceeds the number of backends in the session pool, then instead of spawning a new backend we choose one of the existing backends and redirect the connection to it. There is a more or less portable way in Unix to pass socket descriptors between processes using Unix sockets: see, for example, https://stackoverflow.com/questions/28003921/sending-file-descriptor-by-linux-socket/. So a session is bound to a backend. Backends are chosen using a round-robin policy, which should guarantee a more or less uniform distribution of sessions among backends as long as the number of sessions is much larger than the number of backends. Skews in client application access patterns can, of course, violate this assumption.
- Rescheduling is done at the transaction level, so a single entry in the procarray per backend is enough to handle locks correctly. Transaction-level pooling also eliminates the problem of false deadlocks (caused by a lack of free executors in the pool), and it minimizes the changes needed in the Postgres core to maintain correct session context: there is no need to suspend/resume transaction state, static variables, and so on.
- In the main Postgres query loop in PostgresMain, we detect the moment when the backend is not in a transaction, perform a select on the sockets of all active sessions, and choose one of them.
- When a client disconnects, we close the session but do not terminate the backend.
- To support prepared statements, we append the session identifier to the name of the statement, so prepared statements of different sessions cannot clash. Since a session is bound to a backend, prepared statements can be used.
- Each session has its own copy of non-default GUCs, so session variables can be used with pooled connections.
- Temporary tables are supported by creating a private temporary tablespace for each session.
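The socket hand-off mentioned above relies on SCM_RIGHTS ancillary data on a Unix socket. The actual patch does this in C inside Postgres; the following is only a minimal Python sketch of the underlying mechanism, using the standard library's `socket.send_fds`/`socket.recv_fds` wrappers (Python 3.9+):

```python
import os
import socket

def pass_fd(sender: socket.socket, fd: int) -> None:
    # Send one file descriptor as SCM_RIGHTS ancillary data
    # over a Unix-domain stream socket.
    socket.send_fds(sender, [b"fd"], [fd])

def receive_fd(receiver: socket.socket) -> int:
    # Receive the descriptor; the kernel installs a working
    # duplicate of it in the receiving process.
    msg, fds, flags, addr = socket.recv_fds(receiver, 1024, 1)
    return fds[0]

# Demonstration: hand a pipe's read end across a Unix socket pair.
parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
r, w = os.pipe()
os.write(w, b"hello")
pass_fd(parent, r)
received = receive_fd(child)
assert os.read(received, 16) == b"hello"
```

In the real design the descriptor being passed is the client's connection socket, sent from the postmaster to the chosen backend.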
As a result, the built-in session pool supports:
- Prepared statements
- Session variables
- Temporary tables
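The prepared-statement support listed above rests on qualifying statement names with a session identifier. A toy sketch of that idea (the function name and name format here are assumptions for illustration, not the actual patch code):

```python
def qualify_statement_name(session_id: int, name: str) -> str:
    # Append the session identifier so that statements prepared by
    # different sessions sharing one backend cannot collide.
    # The "name.session_id" format is an illustrative assumption.
    return f"{name}.{session_id}"

# Two sessions may each prepare a statement called "fetch_user"
# without clashing in the backend's statement cache.
assert qualify_statement_name(1, "fetch_user") == "fetch_user.1"
assert qualify_statement_name(2, "fetch_user") != qualify_statement_name(1, "fetch_user")
```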
Restrictions:
- A long-lived transaction will block all other transactions scheduled on the same worker. Waiting for user actions inside a transaction is completely incompatible with this approach. The "idle_in_transaction_session_timeout" variable can be used to detect and abort such transactions.
- A session worker can manage connections only to one database under the same user; otherwise it would be necessary to clean up caches on each reschedule. Moreover, reading the startup packet and performing authentication is done by the backend, not by the postmaster, so at the moment the postmaster schedules a connection to one of the workers, it does not yet know which database and which user the connection is for.
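The scheduling described earlier — round-robin assignment of sessions to backends, and each backend selecting among its session sockets only at transaction boundaries — can be modeled with a short sketch. This is a Python toy model, not the actual C implementation; the `Backend` class and its methods are invented here for illustration:

```python
import select
import socket

class Backend:
    """Toy model of one pooled worker multiplexing several sessions."""

    def __init__(self):
        self.sessions = {}  # fileno -> socket, one entry per attached session

    def attach(self, conn: socket.socket) -> None:
        self.sessions[conn.fileno()] = conn

    def next_ready(self, timeout=0.1):
        # Called only between transactions: wait until some session has
        # sent a new command, then switch to that session's context.
        ready, _, _ = select.select(list(self.sessions.values()), [], [], timeout)
        return ready[0] if ready else None

# Round-robin assignment of four incoming connections to two backends.
backends = [Backend() for _ in range(2)]
pairs = [socket.socketpair() for _ in range(4)]
for i, (client, server) in enumerate(pairs):
    backends[i % len(backends)].attach(server)

assert all(len(b.sessions) == 2 for b in backends)

# The client of session 0 issues a command; its backend picks it up.
pairs[0][0].sendall(b"BEGIN;")
ready = backends[0].next_ready()
assert ready is backends[0].sessions[pairs[0][1].fileno()]
```

With many more sessions than backends, this round-robin assignment spreads load roughly evenly, which matches the uniformity assumption stated above.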