Windowing Functions
Hitoshi Harada
(umi.tanuki@gmail.com)
Head of Engineering Dept.
FORCIA, Inc.
David Fetter
(david.fetter@pgexperts.com)
PGCon2009, 21-22 May 2009 Ottawa
Windowed table
Operates on a windowed table
Returns a value for each row
Returned value is calculated from the rows in the window
 depname   | empno | salary
-----------+-------+--------
 develop   |    10 |   5200
 sales     |       |   5000
 personnel |       |   3500
 sales     |       |   4800
 sales     |       |   5500
 personnel |       |   3900
 develop   |       |   4200
 develop   |       |   4500
 sales     |       |   4800
 develop   |       |   6000
 develop   |    11 |   5200
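 depname   | empno | salary | avg  | diff
-----------+-------+--------+------+------
 develop   |       |   6000 | 5020 |  980
 sales     |       |   5500 | 5025 |  475
 personnel |       |   3900 | 3700 |  200
 develop   |    11 |   5200 | 5020 |  180
 develop   |    10 |   5200 | 5020 |  180
 sales     |       |   5000 | 5025 |  -25
 personnel |       |   3500 | 3700 | -200
 sales     |       |   4800 | 5025 | -225
 sales     |       |   4800 | 5025 | -225
 develop   |       |   4500 | 5020 | -520
 develop   |       |   4200 | 5020 | -820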
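The avg and diff columns above can be reproduced outside the database. The following is a minimal Python sketch, not PostgreSQL code, of what avg(salary) OVER (PARTITION BY depname) computes: every input row is kept and receives the average of its own partition. The names rows, avg_over_partition and result are illustrative, and empno is left out because most of its values did not survive in the slide.

```python
from collections import defaultdict

# (depname, salary) pairs taken from the result table of the slide.
rows = [
    ("develop", 5200), ("sales", 5000), ("personnel", 3500),
    ("sales", 4800), ("sales", 5500), ("personnel", 3900),
    ("develop", 4200), ("develop", 4500), ("sales", 4800),
    ("develop", 6000), ("develop", 5200),
]

def avg_over_partition(rows):
    """Emulate avg(salary) OVER (PARTITION BY depname).

    Unlike GROUP BY, the output has one row per input row.
    """
    partitions = defaultdict(list)
    for depname, salary in rows:
        partitions[depname].append(salary)
    avg = {dep: sum(vals) / len(vals) for dep, vals in partitions.items()}
    # Each row is paired with the average of its own partition and the difference.
    return [(dep, sal, avg[dep], sal - avg[dep]) for dep, sal in rows]

# Sort by diff, largest first, as in the slide's result table.
result = sorted(avg_over_partition(rows), key=lambda r: r[3], reverse=True)
```

The point of the sketch is the row count: eleven rows go in and eleven come out, which is the difference from a GROUP BY aggregate.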
Anatomy of a Window
Represents a set of rows abstractly as:
  A partition
    Specified by PARTITION BY clause
    Never moves
    Can contain:
      A frame
        Specified by ORDER BY clause and frame clause
        Defined in a partition
        Moves within a partition
        Never goes across two partitions
A partition
Specified by PARTITION BY clause in OVER()
Subdivides the table, much like a GROUP BY clause
Without a PARTITION BY clause, the whole table is a single partition
func(args) OVER ( [partition-clause] [order-clause] [frame-clause] )
partition-clause: PARTITION BY expr, ...
A frame
Specified by ORDER BY clause and frame clause in OVER()
Tells how far the set of rows extends
Also defines the ordering of the set
Without order and frame clauses, the whole partition is a single frame
func(args) OVER ( [partition-clause] [order-clause] [frame-clause] )
WINDOW window_name AS ( [partition-clause] [order-clause] [frame-clause] )
row_number()
rank()
dense_rank()
percent_rank()
cume_dist()
ntile()
lag()
lead()
first_value()
last_value()
nth_value()
row_number()
Returns number of the current row
SELECT val, row_number() OVER (ORDER BY val DESC) FROM tbl;
 val | row_number()
-----+--------------
   5 |            1
   5 |            2
   3 |            3
   1 |            4
rank()
Returns the rank of the current row, with gaps
SELECT val, rank() OVER (ORDER BY val DESC) FROM tbl;
 val | rank()
-----+--------
   5 |      1
   5 |      1
   3 |      3    (gap: rank 2 is skipped)
   1 |      4
Note: rank() OVER () returns 1 for all rows, since all rows
are peers of each other
dense_rank()
Returns the rank of the current row, without gaps
SELECT val, dense_rank() OVER (ORDER BY val DESC) FROM tbl;
 val | dense_rank()
-----+--------------
   5 |            1
   5 |            1
   3 |            2    (no gap)
   1 |            3
Note: dense_rank() OVER () returns 1 for all rows, since all rows
are peers of each other
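The difference between row_number(), rank() and dense_rank() is easiest to see side by side. The sketch below is plain Python, not the PostgreSQL implementation, and assumes tbl holds the values 5, 5, 3, 1, which is the content implied by the sums shown later in this deck. The names ranking, sample and table are illustrative.

```python
def ranking(vals):
    """Return (val, row_number, rank, dense_rank) tuples for ORDER BY val DESC."""
    ordered = sorted(vals, reverse=True)
    out = []
    rank = 0
    dense = 0
    prev = object()  # sentinel that compares unequal to every value
    for row_number, v in enumerate(ordered, start=1):
        if v != prev:
            # A new peer group starts here.
            rank = row_number  # rank jumps to the row number, leaving a gap after peers
            dense += 1         # dense_rank advances by exactly one per peer group
            prev = v
        out.append((v, row_number, rank, dense))
    return out

sample = [5, 5, 3, 1]  # assumed contents of tbl
table = ranking(sample)
```

Peers (the two rows with val 5) share a rank and a dense_rank but still get distinct row numbers.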
percent_rank()
Returns relative rank: (rank() - 1) / (total rows - 1)
SELECT val, percent_rank() OVER (ORDER BY val DESC) FROM tbl;
 val | percent_rank()
-----+-------------------
   5 |                 0
   5 |                 0
   3 | 0.666666666666667
   1 |                 1
cume_dist()
Returns relative rank: (number of rows preceding or peer with the current row) / (total rows)
SELECT val, cume_dist() OVER (ORDER BY val DESC) FROM tbl;
 val | cume_dist()
-----+-------------
   5 |        0.5    (= 2/4)
   5 |        0.5    (= 2/4)
   3 |       0.75    (= 3/4)
   1 |          1    (= 4/4)
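Both formulas can be checked with a few lines of Python. This is a sketch of the arithmetic only, again assuming tbl holds 5, 5, 3, 1 and that rows are ordered by val DESC; relative_ranks and the other names are illustrative.

```python
def relative_ranks(vals):
    """Return (val, percent_rank, cume_dist) tuples for ORDER BY val DESC."""
    ordered = sorted(vals, reverse=True)
    n = len(ordered)
    out = []
    for v in ordered:
        # rank() is one more than the number of rows that sort strictly before v.
        rank = 1 + sum(1 for x in ordered if x > v)
        # Rows that precede the current row or are its peers (the row itself included).
        preceding_or_peer = sum(1 for x in ordered if x >= v)
        percent_rank = (rank - 1) / (n - 1) if n > 1 else 0.0
        cume_dist = preceding_or_peer / n
        out.append((v, percent_rank, cume_dist))
    return out

ranks = relative_ranks([5, 5, 3, 1])  # assumed contents of tbl
```

percent_rank() starts at 0 for the first peer group, while cume_dist() is never 0 because the current row always counts itself.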
ntile()
Returns the bucket number of the current row
SELECT val, ntile(3) OVER (ORDER BY val DESC) FROM tbl;
 val | ntile(3)
-----+----------
   5 |        1
   5 |        1
   3 |        2
   1 |        3
Rows are divided into buckets as evenly as possible; if there is a
remainder, the extra rows go to the buckets at the head.
Here 4 % 3 = 1, so bucket 1 gets one extra row.
Note: ntile() OVER () returns the same values as above, since
ntile() ignores the frame and works on the partition
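The bucket assignment rule can be sketched in Python as follows. This is an illustration of the rule described on the slide, not the code in windowfuncs.c; the function name and signature are hypothetical.

```python
def ntile(num_rows, buckets):
    """Return the bucket number of each row of a partition with num_rows rows."""
    base, extra = divmod(num_rows, buckets)
    out = []
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets take one additional row each.
        size = base + (1 if bucket <= extra else 0)
        out.extend([bucket] * size)
    return out

four_rows = ntile(4, 3)  # the slide's case: 4 % 3 = 1
```

With four rows and three buckets the remainder is one, so only the first bucket holds two rows.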
lag()
Returns the value of the preceding row
SELECT val, lag(val) OVER (ORDER BY val DESC) FROM tbl;
 val | lag(val)
-----+----------
   5 | NULL
   5 |    5
   3 |    5
   1 |    3
Note: lag() acts on the partition, not on the frame.
lead()
Returns the value of the following row
SELECT val, lead(val) OVER (ORDER BY val DESC) FROM tbl;
 val | lead(val)
-----+-----------
   5 |    5
   5 |    3
   3 |    1
   1 | NULL
Note: lead() acts on the partition, not on the frame.
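lag() and lead() are plain offsets into the ordered partition. A minimal Python sketch, assuming non-negative offsets and using None in place of SQL NULL; the optional offset and default parameters mirror the optional arguments of the SQL functions, and the variable names are illustrative.

```python
def lag(ordered, offset=1, default=None):
    """Value `offset` rows before each row of the ordered partition."""
    return [ordered[i - offset] if i - offset >= 0 else default
            for i in range(len(ordered))]

def lead(ordered, offset=1, default=None):
    """Value `offset` rows after each row of the ordered partition."""
    n = len(ordered)
    return [ordered[i + offset] if i + offset < n else default
            for i in range(n)]

partition = sorted([5, 5, 3, 1], reverse=True)  # assumed tbl, ORDER BY val DESC
lagged = lag(partition)
led = lead(partition)
```

Neither function looks at a frame: the only boundary that produces the default value is the edge of the partition.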
first_value()
Returns the first value of the frame
SELECT val, first_value(val) OVER (ORDER BY val DESC) FROM tbl;
 val | first_value(val)
-----+------------------
   5 |                5
   5 |                5
   3 |                5
   1 |                5
last_value()
Returns the last value of the frame
SELECT val, last_value(val) OVER
(ORDER BY val DESC ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) FROM tbl;
 val | last_value(val)
-----+-----------------
   5 |               1
   5 |               1
   3 |               1
   1 |               1
nth_value()
Returns the n-th value of the frame
SELECT val, nth_value(val, val) OVER
(ORDER BY val DESC ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) FROM tbl;
 val | nth_value(val, val)
-----+---------------------
   5 | NULL
   5 | NULL
   3 |    3
   1 |    5
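The last_value() and nth_value() queries spell out the frame explicitly because, with an ORDER BY and no frame clause, the default frame ends at the last peer of the current row. The Python sketch below contrasts the two frames; it assumes tbl holds 5, 5, 3, 1 and that the ordering key is the value itself, so peers are simply equal values. All names are illustrative.

```python
def frames(ordered, whole_partition):
    """Yield (current value, frame) pairs for rows already sorted by the ORDER BY key."""
    for v in ordered:
        if whole_partition:
            # ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
            frame = ordered
        else:
            # Default frame: start of the partition through the last peer of the current row.
            end = max(j for j, x in enumerate(ordered) if x == v)
            frame = ordered[:end + 1]
        yield v, frame

def nth_value(frame, n):
    """n-th value of the frame (1-based), or None when the frame is shorter than n."""
    return frame[n - 1] if 1 <= n <= len(frame) else None

ordered = sorted([5, 5, 3, 1], reverse=True)
first_vals = [f[0] for _, f in frames(ordered, True)]
last_full = [f[-1] for _, f in frames(ordered, True)]
last_default = [f[-1] for _, f in frames(ordered, False)]
nth_vals = [nth_value(f, v) for v, f in frames(ordered, True)]
```

With the default frame, last_value() would just return the current row's own value here, which is rarely what the query author wants.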
aggregates (all peers)
Returns the same value for every row of the frame
SELECT val, sum(val) OVER () FROM tbl;
 val | sum(val)
-----+----------
   5 |       14
   5 |       14
   3 |       14
   1 |       14
cumulative aggregates
Returns a running value that changes along the frame
SELECT val, sum(val) OVER (ORDER BY val DESC) FROM tbl;
 val | sum(val)
-----+----------
   5 |       10
   5 |       10
   3 |       13
   1 |       14
Note: row#1 and row#2 return the same value since they are peers.
The result of row#3 is sum(val of row#1 through row#3).
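The peer behavior in the note above follows from the default frame, which runs from the start of the partition through the last peer of the current row. A small Python sketch of that rule, assuming tbl holds 5, 5, 3, 1 (the only values consistent with the sums 10, 10, 13, 14); the function name is illustrative.

```python
def cumulative_sum(vals):
    """Emulate sum(val) OVER (ORDER BY val DESC) with the default frame."""
    ordered = sorted(vals, reverse=True)
    out = []
    for v in ordered:
        # Every row that sorts at or before the current row is in the frame,
        # so peers of the current row are included.
        out.append((v, sum(x for x in ordered if x >= v)))
    return out

running = cumulative_sum([5, 5, 3, 1])
total = sum([5, 5, 3, 1])  # what sum(val) OVER () returns for every row
```

A row-by-row running total would give 5 for the first row; the window aggregate gives 10 because both peers are summed together.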
src/include/catalog/pg_proc.h
pg_proc catalog definitions
src/backend/utils/adt/windowfuncs.c
implementations of built-in window functions
WindowDef
parse node for a window definition: OVER (PARTITION BY ... ORDER BY ...)
WindowClause
parse node for a window clause: WINDOW w AS (PARTITION BY ... ORDER BY ...)
WindowAgg
plan node for window aggregate
WindowAggState
executor node for window aggregate
Final output
    TargetEntry1: depname
    TargetEntry2: empno
    TargetEntry3: salary
    TargetEntry4: avg
    TargetEntry5: rank
    WindowAgg2
        Var1: depname
        Var2: empno
        Var3: salary
        Var4: avg
        WindowFunc: rank
        Sort2
            WindowAgg1
                Var1: depname
                Var2: empno
                Var3: salary
                WindowFunc1: avg
                Sort1
                    SeqScan
                        Var1: depname
                        Var2: empno
                        Var3: salary
Sort!
[Figure: rows row#1 to row#4 flow one at a time from Table to Destination. Values are never shared among returned rows.]
[Figure: rows row#1 to row#4 flow from Table into a Window Object powered by Tuplestore, and from there to Destination.]
[Figure: rows row#1 to row#6 flow from Table through the Window Object powered by Tuplestore to Destination, with winobj->markpos marking a position in the buffered rows.]
Datum
wfunc1(PG_FUNCTION_ARGS)
{
    WindowObject winobj = PG_WINDOW_OBJECT();

    /* 1. mark pos */
    /* 2. fetch value (as many times as needed) */
    /* 3. return a value */
}
[Figure: aggregates run through the Window Object. The aggregate is initialized, the Trans func updates the aggregate trans value with each row, and the Final func turns the trans value into result1, result2, and so on.]
WinGetPartitionLocalMemory(winobj, sz)
Returns memory private to the function for the current partition.
rank() uses it to save the current rank, for example.
WinGetCurrentPosition(winobj)
Gets the current position in the partition (not in the frame).
row_number() returns this position plus one.
WinGetPartitionRowCount(winobj)
Counts the rows in the partition. Used in ntile(), for example.
WinSetMarkPosition(winobj, markpos)
Gives winobj a hint that rows preceding markpos are no longer
needed
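The purpose of the mark position can be illustrated with a toy model. The Python class below is a hypothetical stand-in for a window object that buffers one partition; it is not PostgreSQL code and only borrows the idea that rows before the mark may be thrown away. The class and method names are invented for this sketch.

```python
class ToyWindowObject:
    """Toy model of a window object buffering the rows of one partition."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._offset = 0              # partition position of self._rows[0]
        self._total = len(self._rows)

    def partition_row_count(self):
        """Number of rows in the partition, regardless of what is still buffered."""
        return self._total

    def set_mark_position(self, markpos):
        """Declare that rows before markpos are no longer needed and drop them."""
        if markpos < self._offset:
            raise ValueError("mark position cannot move backward")
        del self._rows[:markpos - self._offset]
        self._offset = markpos

    def fetch(self, pos):
        """Return the row at partition position pos, or None past the end."""
        if pos < self._offset:
            raise LookupError("row was released by an earlier mark")
        if pos >= self._total:
            return None
        return self._rows[pos - self._offset]

    def buffered(self):
        """Number of rows still held in the buffer."""
        return len(self._rows)

winobj = ToyWindowObject([10, 20, 30, 40, 50, 60])
winobj.set_mark_position(3)  # rows at positions 0..2 can be released
```

A function such as row_number() never looks back, so it can move the mark to the current row on every call and keep the buffer small.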
http://archives.postgresql.org/pgsqlhackers/2009-01/msg00562.php
With standard SQL, you have to wait for
extended frame clause support like ROWS n
(PRECEDING|FOLLOWING) to calculate curve
smoothing, but this sample does it now.
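For reference, the kind of curve smoothing that a frame of n rows before and n rows after the current row would express is a centered moving average. The Python sketch below shows only that arithmetic; it is not the sample from the mailing-list post, and the function name and data are made up for illustration.

```python
def moving_average(vals, n=1):
    """Average over a frame of n rows preceding and n rows following each row."""
    out = []
    for i in range(len(vals)):
        # The frame is clipped at the partition edges, so edge rows average fewer values.
        frame = vals[max(0, i - n): i + n + 1]
        out.append(sum(frame) / len(frame))
    return out

smoothed = moving_average([1, 2, 3, 4, 10], n=1)
```

Each output row depends on its neighbors on both sides, which is why a frame that moves with the current row is needed.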
Future work
Performance improvements
Reduce tuplestore overhead
Relocate WindowAgg node in the plan
Finally
Heikki Linnakangas
improvement of window object and its buffering strategy
Tom Lane
overall code rework and deep reading of the SQL spec
David Fetter
help with miscellaneous things such as the git repository and this session
David Rowley
many tests to catch early bugs, as well as spec investigations