
How Are Analytic Functions Different From Group or Aggregate Functions?


1. How are analytic functions different from group or aggregate functions?

SELECT deptno,
       COUNT(*) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30)
GROUP BY deptno;

DEPTNO                 DEPT_COUNT
---------------------- ----------------------
20                     5
30                     6

2 rows selected

Query-1

Consider Query-1 and its result. Query-1 returns each department and its employee count. Most
importantly, it groups the records by department in accordance with the GROUP BY clause. As a result, any
column that is not part of the GROUP BY clause cannot appear in the select list unless it is wrapped in an
aggregate function.
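
The restriction can be sketched against the same emp table. This is an illustration only: the alias FIRST_EMPNO is hypothetical, and the error number in the comment is what Oracle is expected to report rather than output taken from the original.

```sql
-- Not valid: empno is neither in the GROUP BY clause nor aggregated.
-- Oracle is expected to reject this with ORA-00979 (not a GROUP BY expression).
-- SELECT empno, deptno, COUNT(*) dept_count
-- FROM emp
-- WHERE deptno IN (20, 30)
-- GROUP BY deptno;

-- Valid: the non-grouped column is wrapped in an aggregate function.
SELECT deptno,
       MIN(empno) first_empno,   -- hypothetical alias, for illustration
       COUNT(*)   dept_count
FROM emp
WHERE deptno IN (20, 30)
GROUP BY deptno;
```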

SELECT empno, deptno,
       COUNT(*) OVER (PARTITION BY deptno) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30);

     EMPNO     DEPTNO DEPT_COUNT
---------- ---------- ----------
      7369         20          5
      7566         20          5
      7788         20          5
      7902         20          5
      7876         20          5
      7499         30          6
      7900         30          6
      7844         30          6
      7698         30          6
      7654         30          6
      7521         30          6

11 rows selected.

Query-2

Now consider the analytic function query (Query-2) and its result. Note the repeating values in the
DEPT_COUNT column.
This brings out the main difference between aggregate and analytic functions. Although analytic functions
compute an aggregate result, they do not group the result set. Instead, they return the group value with
every record in the group. As a result, columns and expressions that are not part of the partition can be
present in the select list, for example, the column EMPNO in Query-2.
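
Because the detail rows survive, a row's own value can be compared with its group value in a single pass. The following is a minimal sketch, assuming the emp table also has the SAL column used later in this document and that each department's salary total is non-zero; the aliases DEPT_SAL and PCT_OF_DEPT are hypothetical.

```sql
-- Each employee's salary next to the department total,
-- plus the employee's share of that total as a percentage.
SELECT empno, deptno, sal,
       SUM(sal) OVER (PARTITION BY deptno) dept_sal,
       ROUND(100 * sal / SUM(sal) OVER (PARTITION BY deptno), 1) pct_of_dept
FROM emp
WHERE deptno IN (20, 30);
```

A GROUP BY query alone could not return SAL and the department total on the same row; it would need a join back to the detail rows.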

Analytic functions are computed after all joins and the WHERE, GROUP BY and HAVING clauses have been
evaluated. The main ORDER BY clause of the query is applied after the analytic functions. As a result,
analytic functions can appear only in the select list and in the main ORDER BY clause of the query.
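
One consequence of this evaluation order is that an analytic result cannot be filtered in the same query block's WHERE clause. A common workaround, sketched here under the same emp data as Query-2, is to compute the function in an inline view and filter in the outer query. The error number in the comment is what Oracle is expected to raise, not output from the original.

```sql
-- Not valid: analytic functions are not allowed in the WHERE clause
-- (Oracle is expected to raise ORA-30483).
-- SELECT empno, deptno
-- FROM emp
-- WHERE COUNT(*) OVER (PARTITION BY deptno) > 5;

-- Workaround: compute the analytic function in an inline view,
-- then filter on its alias in the outer query.
SELECT empno, deptno, dept_count
FROM (SELECT empno, deptno,
             COUNT(*) OVER (PARTITION BY deptno) dept_count
      FROM emp
      WHERE deptno IN (20, 30))
WHERE dept_count > 5
ORDER BY deptno, empno;
```

With the counts shown in Query-2 (5 for department 20 and 6 for department 30), only the department 30 rows would pass the outer filter.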

In the absence of a PARTITION BY clause or <window_clause> inside the OVER( ) portion, the function acts
on the entire record set returned by the WHERE clause. Note the result of Query-3 and compare it with the
result of the aggregate function query, Query-4.

SELECT empno, deptno,
       COUNT(*) OVER ( ) CNT
FROM emp
WHERE deptno IN (10, 20)
ORDER BY 2, 1;

     EMPNO     DEPTNO        CNT
---------- ---------- ----------
      7782         10          8
      7839         10          8
      7934         10          8
      7369         20          8
      7566         20          8
      7788         20          8
      7876         20          8
      7902         20          8

Query-3

SELECT COUNT(*) FROM emp
WHERE deptno IN (10, 20);

  COUNT(*)
----------
         8

Query-4

2. PRISTOP
What about the amount of work done by analytics, say if we don't mind that the rows are not collapsed
and all rows are returned with a column for the aggregates? Would the analytics do less work? My guess
is they would do the same amount, because they need to perform the aggregation anyway.

Something like, say,

select sum(sal) over (partition by emp.dept) sal_per_Dept
from emp

vs

select sum(sal), emp.dept from emp group by emp.dept

???

Followup   October 18, 2006 - 8am Central time zone:

Think about temp.

Say you have 100 deptnos with an average of 100 employees per deptno.

Which would you rather have sitting in your temp: 100 rows, or 10,000?

I don't even know why we are having this discussion - it seems so blatantly obvious that

a) you use aggregation when you need to, well, AGGREGATE
b) you use analytics when you don't want to AGGREGATE

Even if they performed IDENTICALLY - I cannot understand why we would be having this discussion -

select sum(sal), deptno from emp group by deptno;

versus

select distinct sum(sal) over (partition by deptno), deptno from emp;

it just seems obvious which is the "right" approach.


select owner, count(*) from big_table group by owner

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.01          0          0          0           0
Fetch        3      0.84       0.83      12856      14465          0          26
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.84       0.84      12856      14465          0          26

Rows     Row Source Operation
-------  ---------------------------------------------------
     26  HASH GROUP BY (cr=14465 pr=12856 pw=0 time=845645 us)
1000000   TABLE ACCESS FULL BIG_TABLE (cr=14465 pr=12856 pw=0 time=1011134 us)
********************************************************************************
select distinct owner, count(*) over (partition by owner) from big_table

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        3      2.37       2.32      12856      14465          0          26
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      2.37       2.32      12856      14465          0          26

Rows     Row Source Operation
-------  ---------------------------------------------------
     26  HASH UNIQUE (cr=14465 pr=12856 pw=0 time=2325741 us)
1000000   WINDOW SORT (cr=14465 pr=12856 pw=0 time=2709348 us)
1000000    TABLE ACCESS FULL BIG_TABLE (cr=14465 pr=12856 pw=0 time=1000114 us)
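
The two traces read the same blocks (cr=14465, pr=12856), but the analytic version adds a WINDOW SORT over all 1,000,000 rows before HASH UNIQUE collapses them to 26, which is consistent with its higher CPU time (2.37 versus 0.84). One lightweight way to compare the two plans without a full trace is sketched below. It assumes access to the default plan table and to the big_table used above; the operations actually reported may differ by Oracle version and statistics.

```sql
-- Plan for the aggregate version.
EXPLAIN PLAN FOR
SELECT owner, COUNT(*) FROM big_table GROUP BY owner;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan for the analytic + DISTINCT version.
EXPLAIN PLAN FOR
SELECT DISTINCT owner, COUNT(*) OVER (PARTITION BY owner) FROM big_table;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

EXPLAIN PLAN shows only the optimizer's estimated plan. The row counts and timings in the traces above come from actually executing the statements.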
