How Are Analytic Functions Different From Group or Aggregate Functions?

1. How are analytic functions different from group or aggregate functions?
SELECT deptno,
COUNT(*) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30)
GROUP BY deptno;
DEPTNO DEPT_COUNT
---------------------- ----------------------
20 5
30 6
2 rows selected
Query-1
Consider Query-1 and its result. Query-1 returns each department and its employee count. Most
importantly, it groups the records into departments according to the GROUP BY clause. As such, a
column that is not in the GROUP BY clause cannot appear in the select list unless it is wrapped in an
aggregate function.
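Here is a minimal sketch of Query-1. It uses Python's built-in sqlite3 module in place of Oracle and assumes a hypothetical two-column EMP table. That table holds only the EMPNO/DEPTNO pairs visible in the query results in this section. SQLite is more permissive than Oracle here: it accepts a bare EMPNO in a grouped select list instead of raising ORA-00979. So the sketch shows only the row collapse, not the error.

```python
import sqlite3

# Hypothetical miniature EMP table: only the (EMPNO, DEPTNO) pairs shown in the
# query results in this section; all other EMP columns are omitted.
EMP = [
    (7782, 10), (7839, 10), (7934, 10),
    (7369, 20), (7566, 20), (7788, 20), (7876, 20), (7902, 20),
    (7499, 30), (7521, 30), (7654, 30), (7698, 30), (7844, 30), (7900, 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", EMP)

# Query-1: GROUP BY collapses the 11 matching rows into one row per department.
grouped = conn.execute(
    """
    SELECT deptno, COUNT(*) AS dept_count
    FROM emp
    WHERE deptno IN (20, 30)
    GROUP BY deptno
    ORDER BY deptno
    """
).fetchall()

print(grouped)  # [(20, 5), (30, 6)]
```

Eleven employees go in and two rows come out. Each output row stands for a whole department.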
SELECT empno, deptno,
COUNT(*) OVER (PARTITION BY deptno) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30);
EMPNO DEPTNO DEPT_COUNT
---------- ---------- ----------
7369 20 5
7566 20 5
7788 20 5
7902 20 5
7876 20 5
7499 30 6
7900 30 6
7844 30 6
7698 30 6
7654 30 6
7521 30 6
11 rows selected.
Query-2
Now consider the analytic function query (Query-2) and its result. Note the repeating values in the
DEPT_COUNT column.
This brings out the main difference between aggregate and analytic functions. Although analytic functions
give aggregate results, they do not group the result set. Instead, they return the group value with
each record. As such, any other non-"group by" column or expression can be present in the select clause,
for example, the EMPNO column in Query-2.
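The same hypothetical miniature EMP table can reproduce Query-2. This sketch assumes Python's sqlite3 is linked against SQLite 3.25 or later, the first release with window functions. SQLite is a stand-in for Oracle, not the environment used in the source.

```python
import sqlite3

# Hypothetical miniature EMP table (EMPNO, DEPTNO pairs from the results above).
EMP = [
    (7782, 10), (7839, 10), (7934, 10),
    (7369, 20), (7566, 20), (7788, 20), (7876, 20), (7902, 20),
    (7499, 30), (7521, 30), (7654, 30), (7698, 30), (7844, 30), (7900, 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", EMP)

# Query-2: the analytic COUNT(*) keeps every row and attaches its partition's count.
rows = conn.execute(
    """
    SELECT empno, deptno,
           COUNT(*) OVER (PARTITION BY deptno) AS dept_count
    FROM emp
    WHERE deptno IN (20, 30)
    ORDER BY deptno, empno
    """
).fetchall()

print(len(rows))  # 11
print(rows[0])    # (7369, 20, 5)

# The per-department value repeats on each row; collecting the distinct
# (deptno, dept_count) pairs recovers exactly what Query-1 returned.
print(sorted({(deptno, cnt) for _, deptno, cnt in rows}))  # [(20, 5), (30, 6)]
```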
Analytic functions are computed after all joins and after the WHERE, GROUP BY and HAVING clauses have
been evaluated. The main ORDER BY clause of the query operates after the analytic functions. So analytic
functions can appear only in the select list and in the main ORDER BY clause of the query.
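Here is a sketch of these evaluation-order rules on the hypothetical miniature EMP table. It again uses SQLite (3.25 or later assumed) instead of Oracle. SQLite follows the same rule that window functions may appear only in the select list and ORDER BY. The sketch shows three things. An analytic function in WHERE is rejected. An analytic function in a grouped query sees only the rows left after GROUP BY and HAVING. To filter on an analytic result, compute it in an inline view and filter in the outer query.

```python
import sqlite3

# Hypothetical miniature EMP table (EMPNO, DEPTNO pairs from the results above).
EMP = [
    (7782, 10), (7839, 10), (7934, 10),
    (7369, 20), (7566, 20), (7788, 20), (7876, 20), (7902, 20),
    (7499, 30), (7521, 30), (7654, 30), (7698, 30), (7844, 30), (7900, 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", EMP)

# 1. WHERE is evaluated before analytics, so an analytic function there is an error.
try:
    conn.execute("SELECT empno FROM emp WHERE COUNT(*) OVER () > 1").fetchall()
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

# 2. Analytics run after GROUP BY and HAVING: COUNT(*) OVER () counts the
#    groups that survive HAVING (departments 20 and 30), not the 14 base rows.
after_grouping = conn.execute(
    """
    SELECT deptno, COUNT(*) OVER () AS groups_left
    FROM emp
    GROUP BY deptno
    HAVING COUNT(*) > 3
    ORDER BY deptno
    """
).fetchall()

# 3. To filter on an analytic result, compute it in an inline view first.
big_depts = conn.execute(
    """
    SELECT empno, deptno
    FROM (SELECT empno, deptno,
                 COUNT(*) OVER (PARTITION BY deptno) AS dept_count
          FROM emp)
    WHERE dept_count > 5
    ORDER BY empno
    """
).fetchall()

print(where_rejected)              # True
print(after_grouping)              # [(20, 2), (30, 2)]
print([e for e, _ in big_depts])   # [7499, 7521, 7654, 7698, 7844, 7900]
```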
In the absence of any PARTITION BY or <window_clause> inside the OVER ( ) portion, the function acts on
the entire record set returned by the WHERE clause. Note the results of Query-3 and compare them with the
result of the aggregate function query, Query-4.
SELECT empno, deptno,
COUNT(*) OVER ( ) CNT
FROM emp
WHERE deptno IN (10, 20)
ORDER BY 2, 1;
EMPNO DEPTNO CNT
---------- ---------- ----------
7782 10 8
7839 10 8
7934 10 8
7369 20 8
7566 20 8
7788 20 8
7876 20 8
7902 20 8
Query-3
SELECT COUNT(*) FROM emp
WHERE deptno IN (10, 20);
COUNT(*)
----------
8
Query-4
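Here are Query-3 and Query-4 side by side in the same hypothetical SQLite setup, which assumes window function support (SQLite 3.25+). An empty OVER ( ) treats the whole filtered result as one partition. So every row carries the total that the plain aggregate returns once.

```python
import sqlite3

# Hypothetical miniature EMP table (EMPNO, DEPTNO pairs from the results above).
EMP = [
    (7782, 10), (7839, 10), (7934, 10),
    (7369, 20), (7566, 20), (7788, 20), (7876, 20), (7902, 20),
    (7499, 30), (7521, 30), (7654, 30), (7698, 30), (7844, 30), (7900, 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", EMP)

# Query-3: empty OVER ( ), so the whole filtered set is a single partition.
rows3 = conn.execute(
    """
    SELECT empno, deptno, COUNT(*) OVER () AS cnt
    FROM emp
    WHERE deptno IN (10, 20)
    ORDER BY 2, 1
    """
).fetchall()

# Query-4: the plain aggregate collapses the same 8 rows into one.
rows4 = conn.execute("SELECT COUNT(*) FROM emp WHERE deptno IN (10, 20)").fetchall()

print(len(rows3), {cnt for _, _, cnt in rows3})  # 8 {8}
print(rows3[0])                                  # (7782, 10, 8)
print(rows4)                                     # [(8,)]
```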
2. PRISTOP
What about the amount of work done by analytics? Say we don't mind the rows not being collapsed, and
instead return all rows with a column for the aggregates. Would the analytics do less work? My guess is
that they would do the same amount, because they need to perform the aggregation anyway. Something like:
select sum(sal) over (partition by emp.deptno) sal_per_dept
from emp
vs
select sum(sal), emp.deptno from emp group by emp.deptno
???
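One way to see what the two queries in this question return is to run both on a small table, plus the DISTINCT form from the followup. This sketch uses SQLite (3.25+ for window functions) instead of Oracle. The SAL figures are an assumption: they are taken to be the classic SCOTT.EMP demo values and only make the sums non-trivial. The analytic query returns one row per employee. It takes a DISTINCT to reduce that to the three rows GROUP BY produces directly.

```python
import sqlite3

# Hypothetical EMP rows (EMPNO, DEPTNO, SAL); SAL values are assumed, see above.
EMP = [
    (7782, 10, 2450), (7839, 10, 5000), (7934, 10, 1300),
    (7369, 20, 800), (7566, 20, 2975), (7788, 20, 3000),
    (7876, 20, 1100), (7902, 20, 3000),
    (7499, 30, 1600), (7521, 30, 1250), (7654, 30, 1250),
    (7698, 30, 2850), (7844, 30, 1500), (7900, 30, 950),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)

# Analytic form from the question: one output row per employee.
analytic = conn.execute(
    "SELECT SUM(sal) OVER (PARTITION BY deptno) AS sal_per_dept, deptno FROM emp"
).fetchall()

# Aggregate form: one output row per department.
grouped = conn.execute(
    "SELECT SUM(sal), deptno FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()

# DISTINCT over the analytic rows yields the same answer as GROUP BY.
deduplicated = conn.execute(
    """
    SELECT DISTINCT SUM(sal) OVER (PARTITION BY deptno), deptno
    FROM emp
    ORDER BY deptno
    """
).fetchall()

print(len(analytic), len(grouped))  # 14 3
print(grouped)                      # [(8750, 10), (10875, 20), (9400, 30)]
print(deduplicated == grouped)      # True
```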
Followup October 18, 2006 - 8am Central time zone:
Think about temp.
Say you have 100 deptnos, with an average of 100 employees per deptno.
Which would you rather have sitting in your temp? 100 rows, or 10,000?
I don't even know why we are having this discussion; it seems so blatantly obvious that
a) you use aggregation when you need to, well, AGGREGATE
b) you use analytics when you don't want to AGGREGATE
Even if they performed IDENTICALLY, I cannot understand why we would be having this discussion:
select sum(sal), deptno from emp group by deptno;
versus
select distinct sum(sal) over (partition by deptno), deptno from emp;
It just seems obvious which is the "right" approach.
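The arithmetic behind the temp argument can be made concrete with a synthetic table that matches the followup's scenario: 100 departments with 100 employees each. The data and salaries are hypothetical, and SQLite 3.25+ stands in for Oracle. The sketch counts the rows each form produces before any DISTINCT: 100 versus 10,000. It does not attempt to measure Oracle's actual temp usage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER, sal INTEGER)")

# Hypothetical data: 100 departments x 100 employees, arbitrary salaries.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    ((dept * 100 + i, dept, 1000 + i) for dept in range(100) for i in range(100)),
)

# Aggregation: one output row per department.
grouped = conn.execute("SELECT deptno, SUM(sal) FROM emp GROUP BY deptno").fetchall()

# Analytics: every employee row survives, each tagged with its department total.
windowed = conn.execute(
    "SELECT deptno, SUM(sal) OVER (PARTITION BY deptno) FROM emp"
).fetchall()

print(len(grouped), len(windowed))      # 100 10000

# Same totals either way: 100 * 1000 + (0 + 1 + ... + 99) = 104950 per department.
print(dict(grouped) == dict(windowed))  # True
print(set(dict(grouped).values()))      # {104950}
```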

select owner, count(*) from big_table group by owner
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.01          0          0          0           0
Fetch        3      0.84       0.83      12856      14465          0          26
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.84       0.84      12856      14465          0          26

Rows     Row Source Operation
-------  ---------------------------------------------------
     26  HASH GROUP BY (cr=14465 pr=12856 pw=0 time=845645 us)
1000000   TABLE ACCESS FULL BIG_TABLE (cr=14465 pr=12856 pw=0 time=1011134 us)
********************************************************************************
select distinct owner, count(*) over (partition by owner) from big_table
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        3      2.37       2.32      12856      14465          0          26
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      2.37       2.32      12856      14465          0          26

Rows     Row Source Operation
-------  ---------------------------------------------------
     26  HASH UNIQUE (cr=14465 pr=12856 pw=0 time=2325741 us)
1000000   WINDOW SORT (cr=14465 pr=12856 pw=0 time=2709348 us)
1000000    TABLE ACCESS FULL BIG_TABLE (cr=14465 pr=12856 pw=0 time=1000114 us)
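A rough way to repeat this comparison outside Oracle is to time both statements on a synthetic BIG_TABLE. Everything in this sketch is an assumption. SQLite 3.25+ replaces Oracle, and the table holds 200,000 generated rows spread over 26 hypothetical owners. Timing uses time.perf_counter rather than TKPROF. The absolute numbers, and even the ratio, will differ from the trace above. What the sketch does confirm is that both statements return the same 26 rows, so any difference in cost comes from how each one gets there.

```python
import sqlite3
import time

# Hypothetical BIG_TABLE: 200,000 rows spread over 26 owners (OWNER_A .. OWNER_Z).
OWNERS = [f"OWNER_{chr(ord('A') + i)}" for i in range(26)]
N_ROWS = 200_000

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany(
    "INSERT INTO big_table VALUES (?, ?)",
    ((i, OWNERS[i % len(OWNERS)]) for i in range(N_ROWS)),
)

QUERIES = {
    "group by": "SELECT owner, COUNT(*) FROM big_table GROUP BY owner",
    "distinct analytic": (
        "SELECT DISTINCT owner, COUNT(*) OVER (PARTITION BY owner) FROM big_table"
    ),
}

results = {}
for label, sql in QUERIES.items():
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    results[label] = sorted(rows)
    print(f"{label:>17}: {len(rows)} rows in {elapsed:.3f} s")

# Both forms return the same 26 (owner, count) pairs.
print(results["group by"] == results["distinct analytic"])  # True
```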
