
pgpro_scheduler - PostgreSQL extension for job scheduling

pgpro_scheduler lets you schedule job execution and control job activity in a PostgreSQL database.

A job is a set of SQL commands. Its schedule can be described as a crontab-like string or as a JSON object, and the two methods can be combined.

Each job can calculate its own next start time. The job's SQL commands can be executed in a single transaction, or each command can be executed in its own transaction. You can also set an SQL statement to be executed if the main job transaction fails.

Installation

pgpro_scheduler is a regular PostgreSQL extension and requires no prerequisites.

Before building the extension from source, make sure that the PATH environment variable includes the path to the pg_config utility. Also make sure that you have the developer version of PostgreSQL installed or that PostgreSQL was built from source code.

Install extension as follows:

$ cd pgpro_scheduler
$ make USE_PGXS=1
$ sudo make USE_PGXS=1 install
$ psql <DBNAME> -c "CREATE EXTENSION pgpro_scheduler"
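
To confirm that the extension was created, the standard pg_extension catalog can be queried in the same database. This is a minimal sketch; the version shown depends on which installation script was used.

```sql
-- pg_extension is a standard PostgreSQL catalog listing installed extensions.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pgpro_scheduler';
```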

Configuration

The extension defines a number of PostgreSQL variables (GUC). These variables control the scheduler configuration.

schedule.enabled - boolean, whether the scheduler is enabled on this system. Default value: false.
schedule.database - text, a comma-separated list of the database names on which the scheduler is enabled. Default value: empty string.
schedule.schema - text, the name of the schema where the scheduler stores its tables and functions. Changing this value requires a restart. Normally you should not change this variable, but it can be useful if you want to run scheduled jobs on a hot-standby database: you can define a foreign data wrapper on the master system to map the default scheduler schema to another one and use it on the replica. Default value: schedule.
schedule.nodename - text, the node name of this instance. Default value: master. You should not change or use it in a single-server configuration, but you must change it if you run the scheduler on a hot-standby database.
schedule.max_workers - integer, the maximum number of simultaneously running jobs for one database. Default value: 2.
schedule.transaction_state - text, an internal variable that contains the state of the executed job. It is designed for use in the procedure that calculates a job's next start time. Possible values are:
- success - transaction has finished successfully
- failure - transaction has failed to finish
- running - transaction is in progress
- undefined - transaction has not started yet
The last two values should not normally appear inside a user procedure. If you see them, it probably indicates an internal scheduler error.
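
As a rough illustration of how a next-start-time statement might read this variable, the sketch below uses the standard current_setting() function. The intervals are hypothetical, and the sketch assumes that the statement is expected to return a single timestamp; check the description of next_time_statement before relying on that.

```sql
-- Hypothetical next-time statement: retry sooner after a failed run.
-- Assumption: the scheduler takes the single returned timestamp
-- as the next start time.
SELECT CASE current_setting('schedule.transaction_state')
         WHEN 'success' THEN now() + interval '1 day'
         ELSE now() + interval '5 minutes'
       END;
```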

Management

You can manage the scheduler through the PostgreSQL variables described above.

For example, suppose you have a fresh PostgreSQL installation with the scheduler extension installed. You are going to use the scheduler with databases called 'database1' and 'database2'. You want 'database1' to be able to run 5 jobs in parallel and 'database2' to run 3.

Add the following line to your postgresql.conf (a change to shared_preload_libraries takes effect only after a server restart):

shared_preload_libraries = 'pgpro_scheduler'

Then start psql and execute the following commands:

# ALTER SYSTEM SET schedule.enabled = true;
# ALTER SYSTEM SET schedule.database = 'database1,database2';
# ALTER DATABASE database1 SET schedule.max_workers = 5;
# ALTER DATABASE database2 SET schedule.max_workers = 3;
# SELECT pg_reload_conf();
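
After the reload, the resulting values can be checked with the standard SHOW command. This is a minimal sketch; settings made with ALTER DATABASE apply only to sessions started after the change, so run it in a new session.

```sql
-- Run in a new session connected to database1.
SHOW schedule.enabled;
SHOW schedule.database;
SHOW schedule.max_workers;
```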

If you do not need different max_workers values for each database, you can store a single value in the configuration file and then ask the server to reread its configuration. No restart is needed.

Here is an example of postgresql.conf:

shared_preload_libraries = 'pgpro_scheduler'
schedule.enabled = on
schedule.database = 'database1,database2'
schedule.max_workers = 5

The scheduler is designed as a background worker that dynamically starts other background workers. For this reason, make sure that the max_worker_processes variable is set high enough. The minimum acceptable value can be calculated with the following formula:

N_min = 1 + N_databases + MAX_WORKERS_1 + ... + MAX_WORKERS_n

where:

N_min - the minimum acceptable number of background workers in the system. Keep in mind that other subsystems, such as parallel queries, also need to start background workers, so adjust the value to cover their needs as well.
N_databases - the number of databases the scheduler works with
MAX_WORKERS_n - the value of the schedule.max_workers variable for each database
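
For the example configuration above (two databases with schedule.max_workers set to 5 and 3), the formula can be worked through as follows. The query also reads the current limit through the standard current_setting() function; the column aliases are only illustrative.

```sql
-- N_min = 1 + 2 databases + (5 + 3) workers = 11
SELECT 1 + 2 + 5 + 3 AS n_min,
       current_setting('max_worker_processes')::int AS max_worker_processes;
```

If n_min is greater than max_worker_processes, raise max_worker_processes and leave headroom for parallel queries and other background workers.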

SQL Schema

The extension uses the SQL schema schedule to store its internal tables and functions. Direct access to the tables is forbidden. All manipulations should be performed through the functions defined by the extension.

SQL Types

The scheduler defines two SQL types and uses them as return types for some of its functions.

cron_rec

This type describes information about the job to be scheduled.

CREATE TYPE schedule.cron_rec AS(
	id integer,             -- job id
	node text,              -- node name to be executed on
	name text,              -- job name 
	comments text,          -- job's comment
	rule jsonb,             -- scheduling rules
	commands text[],        -- sql commands to be executed
	run_as text,            -- name of executor user
	owner text,             -- name of owner user
	start_date timestamp,   -- lower bound of execution window
							-- NULL if unbound
	end_date timestamp,     -- upper bound of execution window
							-- NULL if unbound
	use_same_transaction boolean,   -- if true the set of sql commands 
									-- will be executed in same transaction
	last_start_available interval,  -- max time till scheduled job 
									-- can wait execution if all allowed 
									-- workers are busy
	max_instances int,		-- max number of simultaneous running instances
							-- of this job
	max_run_time interval,  -- max execution time
	onrollback text,        -- SQL command to be performed on transaction
							-- failure
	next_time_statement text,   -- SQL command to execute on main 
								-- transaction end to calculate next 
								-- start time
	active boolean,         -- true - job could be scheduled
	broken boolean          -- true - job has errors in configuration
							-- that prevent its further execution
);

cron_job

This type describes a scheduled execution of a job.

CREATE TYPE schedule.cron_job AS(
	cron integer,           -- job id
	node text,              -- node name to be executed on
	scheduled_at timestamp, -- scheduled execution time
	name text,              -- job name
	comments text,          -- job comments
	commands text[],        -- sql commands to be executed
	run_as text,            -- name of executor user
	owner text,             -- name of owner user
	use_same_transaction boolean,	-- if true the set of sql commands
									-- will be executed in same transaction
	started timestamp,      -- time when this job execution started
	last_start_available timestamp,	-- time until which the job must be started
	finished timestamp,     -- time when this job execution finished
	max_run_time interval,  -- max execution time
	max_instances int,		-- the number of instances run at the same time
	onrollback text,        -- statement on ROLLBACK
	next_time_statement text,	-- statement to calculate next start time
	status text,			-- status of this task: working, done, error 
	message text			-- error message
);