
Group and Aggregation Introduction

The document discusses how to use SQL to group and aggregate data using functions like COUNT, SUM, AVG, MAX, MIN. It covers the GROUP BY clause to group query results and calculate aggregates. It also discusses the HAVING clause to filter groups after aggregation unlike the WHERE clause. Examples provided analyze business data by department sales, highest salary, payments by customer.

Uploaded by

Shahab Hassan
Copyright
© © All Rights Reserved

Grouping Data and Aggregate Functions
SQL provides powerful capabilities for
grouping query results and calculating
aggregates to analyze and summarize data.
GROUP BY Clause
The GROUP BY clause groups a result set by one or more
columns. For each distinct value in those columns it aggregates
the rows into a single summary row.

For example, to count customers per city:

SELECT city, COUNT(*) AS num_customers
FROM customers
GROUP BY city;

This groups all rows by the "city" column. For each unique city, it outputs a
single row with the city name and the count of rows for that city.

We can group by multiple columns to create subgroups, such as customers per
city within each country:

SELECT country, city, COUNT(*) AS num_customers
FROM customers
GROUP BY country, city;

Now for each country and city combination, we get a count.
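
The two groupings above can be tried out with a small sketch. It uses Python's built-in `sqlite3` module and an in-memory table; the rows and the `name` column are made up for illustration, and only `city` and `country` come from the text.

```python
import sqlite3

# Hypothetical sample data; only the city and country columns come from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ann", "Paris", "France"),
        ("Bob", "Paris", "France"),
        ("Cy", "Lyon", "France"),
        ("Dee", "Paris", "USA"),
    ],
)

# One summary row per distinct city.
per_city = conn.execute(
    "SELECT city, COUNT(*) AS num_customers "
    "FROM customers GROUP BY city ORDER BY city"
).fetchall()

# One summary row per distinct (country, city) pair.
per_country_city = conn.execute(
    "SELECT country, city, COUNT(*) AS num_customers "
    "FROM customers GROUP BY country, city ORDER BY country, city"
).fetchall()

print(per_city)          # [('Lyon', 1), ('Paris', 3)]
print(per_country_city)  # [('France', 'Lyon', 1), ('France', 'Paris', 2), ('USA', 'Paris', 1)]
```

Grouping by city alone merges the two different cities named Paris; adding country to the GROUP BY keeps them apart.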
Aggregate Functions
Common aggregate functions used with GROUP BY are:

COUNT(): Counts rows in a group. COUNT(*) counts every row; COUNT(column) counts only the rows where that column is not NULL.

SUM(): Sums a numeric column.

AVG(): Averages a numeric column.

MAX()/MIN(): Maximum and minimum values in a column.


Let's see some examples of using aggregates to gain
business insights:
SELECT MAX(salary) AS highest_salary
FROM employees;

Finds the highest salary paid.

SELECT department, SUM(sales) AS total_sales
FROM sales_data
GROUP BY department;

Gets total sales by department. Useful for department performance comparisons.
HAVING Clause
•GROUP BY groups rows by column values; HAVING then filters those
groups based on aggregate conditions. HAVING is applied after
aggregation, unlike WHERE, which filters individual rows before
aggregation.
•For example, to get high-performing departments with over
$100,000 in sales:
SELECT department, SUM(sales) AS total_sales
FROM sales_data
GROUP BY department
HAVING SUM(sales) > 100000;

The HAVING clause filters the groups after computing totals. This leaves only
departments with sales greater than the given amount.

You can combine HAVING with WHERE. WHERE filters individual rows before
aggregation. HAVING then filters whole groups after calculating summaries.
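
To make the order of WHERE and HAVING concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The rows and the `sale_year` column are assumptions added for illustration; the text only names `department` and `sales`.

```python
import sqlite3

# Hypothetical data; sale_year is an assumed column used to demonstrate WHERE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (department TEXT, sale_year INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?)",
    [
        ("Toys", 2004, 70000),
        ("Toys", 2004, 50000),
        ("Toys", 2003, 90000),
        ("Books", 2004, 60000),
        ("Books", 2003, 80000),
        ("Games", 2004, 30000),
    ],
)

# HAVING only: every row is aggregated, then groups are filtered.
all_years = conn.execute(
    "SELECT department, SUM(sales) AS total_sales FROM sales_data "
    "GROUP BY department HAVING SUM(sales) > 100000 ORDER BY department"
).fetchall()

# WHERE + HAVING: rows are filtered first, so only 2004 sales are summed.
only_2004 = conn.execute(
    "SELECT department, SUM(sales) AS total_sales FROM sales_data "
    "WHERE sale_year = 2004 "
    "GROUP BY department HAVING SUM(sales) > 100000 ORDER BY department"
).fetchall()

print(all_years)  # [('Books', 140000), ('Toys', 210000)]
print(only_2004)  # [('Toys', 120000)]
```

Books passes the threshold across all years but drops out once WHERE restricts the rows to 2004, because its 2004 total is only 60000.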
Mastering GROUP BY and aggregates like
COUNT/SUM/AVG/MIN/MAX, along with the
HAVING clause, opens up a wide range of reporting and
business analytics capabilities from database tables.
With a bit of SQL knowledge you can derive
surprisingly powerful insights.
COUNT
**QUESTION**: Report the total number of payments received before October 28, 2004.

We can use the `COUNT` function to count the number of rows returned for a query.

SELECT COUNT(*) FROM payments
WHERE paymentDate < '2004-10-28';
COUNT and DISTINCT

Another common use case involves counting the number of distinct values in a column.

**QUESTION**: Report the number of customers who have made payments before October 28, 2004.

We just need to add the `DISTINCT` keyword before a column name.

SELECT COUNT(DISTINCT customerNumber) FROM payments
WHERE paymentDate < '2004-10-28';
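
The difference between the two counts is easiest to see on a few rows. The sketch below uses Python's built-in `sqlite3` module with made-up payments; it assumes dates are stored as ISO `YYYY-MM-DD` text, which compares in chronological order.

```python
import sqlite3

# Hypothetical payments; dates are ISO text so string comparison is chronological.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (customerNumber INTEGER, paymentDate TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [
        (1, "2004-01-15", 100.0),
        (1, "2004-06-01", 50.0),
        (2, "2004-10-27", 75.0),
        (2, "2004-10-28", 20.0),   # not before the cutoff
        (3, "2004-11-05", 300.0),  # not before the cutoff
    ],
)

# Counts payment rows before the cutoff.
total_payments = conn.execute(
    "SELECT COUNT(*) FROM payments WHERE paymentDate < '2004-10-28'"
).fetchone()[0]

# Counts each paying customer once, however many payments they made.
paying_customers = conn.execute(
    "SELECT COUNT(DISTINCT customerNumber) FROM payments "
    "WHERE paymentDate < '2004-10-28'"
).fetchone()[0]

print(total_payments)    # 3
print(paying_customers)  # 2
```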
Chaining Queries
What if we wanted not just customer numbers, but all the
details? We can do this by chaining two SQL queries.

**QUESTION**: Retrieve the details of all customers who have made a payment before October 28, 2004.

We can use the result of the first query (a list of customer numbers) as an input to a second query.

SELECT * FROM customers
WHERE customerNumber IN
    (SELECT DISTINCT customerNumber FROM payments
     WHERE paymentDate < '2004-10-28');

The above question can also be answered using a join, which we'll look at later.
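
A small sketch of the chained (subquery) form, again using Python's built-in `sqlite3` module. The customer names and payment rows are invented for illustration.

```python
import sqlite3

# Hypothetical customers and payments.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerNumber INTEGER, customerName TEXT)")
conn.execute("CREATE TABLE payments (customerNumber INTEGER, paymentDate TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Alpha Co"), (2, "Beta Ltd"), (3, "Gamma Inc")],
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [
        (1, "2004-01-15", 100.0),
        (1, "2004-06-01", 50.0),
        (2, "2004-10-27", 75.0),
        (3, "2004-11-05", 300.0),  # after the cutoff, so customer 3 is excluded
    ],
)

# The inner query yields customer numbers; the outer query fetches their details.
early_payers = conn.execute(
    "SELECT * FROM customers WHERE customerNumber IN "
    "(SELECT DISTINCT customerNumber FROM payments "
    " WHERE paymentDate < '2004-10-28') "
    "ORDER BY customerNumber"
).fetchall()

print(early_payers)  # [(1, 'Alpha Co'), (2, 'Beta Ltd')]
```

Customer 1 appears once in the result even though it has two qualifying payments, because IN only tests membership.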
GROUP BY and AS
**QUESTION**: Find the total number of payments made by each customer before October 28, 2004.

While performing aggregation, we can specify a column to group rows by. We can also rename aggregate columns using the `AS` keyword.

SELECT customerNumber, COUNT(*) AS totalPayments
FROM payments
WHERE paymentDate < '2004-10-28'
GROUP BY customerNumber;
SUM
Apart from the count of rows, we can also compute the sum of values in a column.

**QUESTION**: Find the total amount paid by each customer before October 28, 2004.

SELECT customerNumber, SUM(amount) AS totalPayment
FROM payments
WHERE paymentDate < '2004-10-28'
GROUP BY customerNumber;
SUM and COUNT
**QUESTION**: Find the total number of payments and the total payment amount for each customer, for payments made before October 28, 2004.

We can create separate columns for `SUM` and `COUNT`.

SELECT customerNumber, COUNT(*) AS numberOfPayments, SUM(amount) AS totalPayment
FROM payments
WHERE paymentDate < '2004-10-28'
GROUP BY customerNumber;
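
The combined query can be checked against a handful of rows. This is a sketch with Python's built-in `sqlite3` module and hypothetical payment data.

```python
import sqlite3

# Hypothetical payments; dates are ISO text so string comparison is chronological.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (customerNumber INTEGER, paymentDate TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [
        (1, "2004-01-15", 100.0),
        (1, "2004-06-01", 50.0),
        (2, "2004-10-27", 75.0),
        (2, "2004-10-28", 20.0),  # excluded by the WHERE clause
    ],
)

# Two aggregates computed over the same groups in a single pass.
summary = conn.execute(
    "SELECT customerNumber, COUNT(*) AS numberOfPayments, SUM(amount) AS totalPayment "
    "FROM payments WHERE paymentDate < '2004-10-28' "
    "GROUP BY customerNumber ORDER BY customerNumber"
).fetchall()

print(summary)  # [(1, 2, 150.0), (2, 1, 75.0)]
```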
ORDER BY and LIMIT

**QUESTION**: Retrieve the customer number for the 10 customers who made the highest total payment in 2004.

SELECT customerNumber, SUM(amount) AS totalPayment
FROM payments
WHERE paymentDate >= '2004-01-01' AND paymentDate < '2005-01-01'
GROUP BY customerNumber
ORDER BY totalPayment DESC
LIMIT 10;
OFFSET

To get the next 10 results, we can simply add an `OFFSET` with the
number of rows to skip.

SELECT customerNumber, SUM(amount) AS totalPayment
FROM payments
WHERE paymentDate >= '2004-01-01' AND paymentDate < '2005-01-01'
GROUP BY customerNumber
ORDER BY totalPayment DESC
LIMIT 10
OFFSET 10;
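
Paging with `LIMIT` and `OFFSET` can be wrapped in a small helper. The sketch below uses Python's built-in `sqlite3` module; the function name `top_payers`, the page size of 2, and the data are all hypothetical, and the date filter is left out to keep the example short.

```python
import sqlite3

# Hypothetical payments; totals per customer are 500, 400, 300, 200, 100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (customerNumber INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [(1, 300), (1, 200), (2, 400), (3, 300), (4, 150), (4, 50), (5, 100)],
)


def top_payers(conn, page, page_size=2):
    """Return one page of customers ranked by total payment (page is 0-based)."""
    return conn.execute(
        "SELECT customerNumber, SUM(amount) AS totalPayment "
        "FROM payments GROUP BY customerNumber "
        # customerNumber is a tie-breaker so pages stay stable when totals are equal.
        "ORDER BY totalPayment DESC, customerNumber "
        "LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()


first_page = top_payers(conn, 0)
second_page = top_payers(conn, 1)
last_page = top_payers(conn, 2)

print(first_page)   # [(1, 500), (2, 400)]
print(second_page)  # [(3, 300), (4, 200)]
print(last_page)    # [(5, 100)]
```

The offset is simply the page index times the page size, so the "next 10 results" in the text correspond to page 1 with a page size of 10.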
UCASE and CONCAT

**QUESTION**: Display the full name of the point of contact for each customer in the United States in upper case, along with their phone number, sorted in alphabetical order of customer name.

SELECT customerName,
       CONCAT(UCASE(contactFirstName), ' ', UCASE(contactLastName)) AS contact,
       phone
FROM customers
WHERE country = 'USA'
ORDER BY customerName;
SUBSTRING and LCASE
**QUESTION**: Display a paginated list of customers (sorted by customer name), with a country code column. The country code is simply the first 3 letters of the country name, in lower case.

SELECT customerName, LCASE(SUBSTRING(country, 1, 3)) AS countryCode
FROM customers
ORDER BY customerName;

To page through the results, add `LIMIT` and `OFFSET` as in the earlier queries.
ROUND
**QUESTION**: Display the list of the 5 most expensive products in the "Motorcycles" product line with their price (MSRP) rounded to dollars.

SELECT productName, ROUND(MSRP) AS salePrice
FROM products
WHERE productLine = 'Motorcycles'
ORDER BY salePrice DESC
LIMIT 5;
**QUESTION**: Display the product code, product name, buy price, sale price and profit margin percentage (`(MSRP - buyPrice)*100/buyPrice`) for the 10 products with the highest profit margin. Round the profit margin to 2 decimals.

SELECT productCode, productName, buyPrice, MSRP,
       ROUND(((MSRP - buyPrice)*100/buyPrice), 2) AS profitMargin
FROM products
ORDER BY profitMargin DESC
LIMIT 10;
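
As a worked check of the profit margin formula, here is a sketch using Python's built-in `sqlite3` module. The three products and their prices are invented: a buy price of 50 with an MSRP of 80 gives (80 - 50) * 100 / 50 = 60 percent.

```python
import sqlite3

# Hypothetical products; prices are chosen so the margins are easy to verify by hand.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (productCode TEXT, productName TEXT, buyPrice REAL, MSRP REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        ("P1", "Widget", 50.0, 80.0),  # margin 60%
        ("P2", "Gadget", 30.0, 40.0),  # margin 33.33...%
        ("P3", "Gizmo", 20.0, 50.0),   # margin 150%
    ],
)

# Same expression as in the text, ranked from highest to lowest margin.
margins = conn.execute(
    "SELECT productCode, productName, buyPrice, MSRP, "
    "ROUND((MSRP - buyPrice) * 100 / buyPrice, 2) AS profitMargin "
    "FROM products ORDER BY profitMargin DESC LIMIT 10"
).fetchall()

for code, name, buy, msrp, margin in margins:
    print(code, name, buy, msrp, margin)
```

The products come back in the order P3, P1, P2, with the margin for P2 rounded to two decimal places.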
