SqlForDevs.com Next Level Database Techniques For Developers (pages 22-27)
-- MySQL
SELECT
SUM(released_at = 2001) AS released_2001,
SUM(released_at = 2002) AS released_2002,
SUM(director = 'Steven Spielberg') AS director_stevenspielberg,
SUM(director = 'James Cameron') AS director_jamescameron
FROM movies
WHERE streamingservice = 'Netflix';
-- PostgreSQL
SELECT
COUNT(*) FILTER (WHERE released_at = 2001) AS released_2001,
COUNT(*) FILTER (WHERE released_at = 2002) AS released_2002,
COUNT(*) FILTER (WHERE director = 'Steven Spielberg') AS director_stevenspielberg,
COUNT(*) FILTER (WHERE director = 'James Cameron') AS director_jamescameron
FROM movies
WHERE streamingservice = 'Netflix';
In some cases, you need to calculate multiple different statistics. Instead of executing numerous queries, you can write a single one that collects all the information in one pass through the data. Depending on your data and indexes, this can speed up or slow down execution time, so you should definitely test it on your application.
Notice: I have written a more extensive text about this topic on my database
focused website SqlForDevs.com: Multiple Aggregates in One Query
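The same single-pass technique can be sketched with Python's built-in sqlite3 module, since SQLite also evaluates boolean expressions to 0/1; the rows below are an invented sample, not data from the book:

```python
import sqlite3

# Minimal in-memory dataset to illustrate the single-pass aggregation;
# table and column names mirror the book's example.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE movies (released_at INT, director TEXT, streamingservice TEXT)")
con.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        (2001, "Steven Spielberg", "Netflix"),
        (2001, "James Cameron", "Netflix"),
        (2002, "James Cameron", "Netflix"),
        (2002, "Steven Spielberg", "Disney+"),  # excluded by the WHERE clause
    ],
)

# MySQL-style SUM(condition) works because the comparison yields 0 or 1;
# all four statistics are collected in one scan of the table.
row = con.execute("""
    SELECT
        SUM(released_at = 2001) AS released_2001,
        SUM(released_at = 2002) AS released_2002,
        SUM(director = 'Steven Spielberg') AS director_stevenspielberg,
        SUM(director = 'James Cameron') AS director_jamescameron
    FROM movies
    WHERE streamingservice = 'Netflix'
""").fetchone()
print(row)  # (2, 1, 1, 2)
```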
Limit Rows Also Including Ties
-- PostgreSQL
SELECT *
FROM teams
ORDER BY winning_games DESC
FETCH FIRST 3 ROWS WITH TIES;
Imagine you want to rank the teams of a sports league and show the top three. In rare cases, two or more teams will have the same number of won games at the end of the season. If they are tied for 3rd place, you may want to expand the limit to include all of them. The WITH TIES option does precisely that: whenever rows would be excluded despite having the same ordering values as included ones, they are included too, even though the limit is exceeded.
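Databases without WITH TIES (MySQL, for example) can emulate it with the RANK() window function, which assigns equal ranks to tied rows. A sketch using sqlite3 with invented team data:

```python
import sqlite3

# Hypothetical league table; names and numbers are illustrative.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE teams (name TEXT, winning_games INT)")
con.executemany("INSERT INTO teams VALUES (?, ?)", [
    ("A", 30), ("B", 28), ("C", 25), ("D", 25), ("E", 20),
])

# RANK() gives C and D the same position 3, so filtering on pos <= 3
# returns four rows, just like FETCH FIRST 3 ROWS WITH TIES would.
rows = con.execute("""
    SELECT name, winning_games
    FROM (
        SELECT name, winning_games,
               RANK() OVER (ORDER BY winning_games DESC) AS pos
        FROM teams
    )
    WHERE pos <= 3
    ORDER BY winning_games DESC, name
""").fetchall()
print(rows)  # [('A', 30), ('B', 28), ('C', 25), ('D', 25)]
```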
Fast Row Count Estimates
-- MySQL
EXPLAIN FORMAT=TREE
SELECT * FROM movies WHERE rating = 'NC-17' AND price < 4.99;
-- PostgreSQL
EXPLAIN SELECT * FROM movies WHERE rating = 'NC-17' AND price < 4.99;
Showing the number of matching rows is a crucial feature for most applications, but it is sometimes hard to implement for large databases: the larger a table is, the slower counting its rows will be. The query will be very slow when no index exists to help calculate the count, but even an existing index will not make counting hundreds of thousands of rows fast. However, an approximate count may be good enough for some use cases. The database's query planner always calculates an estimated row count for a query, which can be extracted by asking the database for the execution plan.
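Extracting the estimate from the plan text is a small parsing step. A sketch for PostgreSQL-style output, where the estimate appears as `rows=<n>`; the plan line below is an invented sample, not real output from the movies table:

```python
import re

# Illustrative EXPLAIN output; real plans vary by database and version.
plan = "Seq Scan on movies  (cost=0.00..2334.00 rows=1495 width=47)"

def estimated_rows(plan_line: str) -> int:
    """Pull the planner's row estimate out of an EXPLAIN plan line."""
    match = re.search(r"rows=(\d+)", plan_line)
    if match is None:
        raise ValueError("no row estimate found in plan")
    return int(match.group(1))

print(estimated_rows(plan))  # 1495
```

Keep in mind the estimate comes from table statistics and can deviate noticeably from the true count, especially right after large data changes.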
Date-Based Statistical Queries With Gap-Filling
-- MySQL
SET cte_max_recursion_depth = 4294967295;
WITH RECURSIVE dates_without_gaps(day) AS (
SELECT DATE_SUB(CURRENT_DATE, INTERVAL 14 DAY) as day
UNION ALL
SELECT DATE_ADD(day, INTERVAL 1 DAY) as day
FROM dates_without_gaps
WHERE day < CURRENT_DATE
)
SELECT dates_without_gaps.day, COALESCE(SUM(statistics.count), 0)
FROM dates_without_gaps
LEFT JOIN statistics ON(statistics.day = dates_without_gaps.day)
GROUP BY dates_without_gaps.day;
-- PostgreSQL
SELECT dates_without_gaps.day, COALESCE(SUM(statistics.count), 0)
FROM generate_series(
CURRENT_DATE - INTERVAL '14 days',
CURRENT_DATE,
'1 day'
) as dates_without_gaps(day)
LEFT JOIN statistics ON(statistics.day = dates_without_gaps.day)
GROUP BY dates_without_gaps.day;
The results of some statistical calculations will have gaps because no information was saved for specific days. But instead of back-filling these holes with application code, the database query can be restructured: a gapless sequence of dates is generated and joined to the statistical data. In PostgreSQL the generate_series function can be used to create the sequence, whereas in MySQL the same needs to be done manually with a recursive common table expression (CTE).
Notice: I have written a more extensive text about this topic on my database
focused website SqlForDevs.com: Fill Gaps in Statistical Time Series Results
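SQLite also supports recursive CTEs, so the gap-filling pattern can be demonstrated end to end with sqlite3; fixed dates replace CURRENT_DATE here to keep the result deterministic, and the statistics rows are invented:

```python
import sqlite3

# Sample statistics table with a missing day (2024-01-02).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE statistics (day TEXT, count INT)")
con.executemany("INSERT INTO statistics VALUES (?, ?)", [
    ("2024-01-01", 5),
    ("2024-01-03", 2),
])

# The recursive CTE generates every day in the range; the LEFT JOIN
# keeps days without data, which COALESCE turns into a zero count.
rows = con.execute("""
    WITH RECURSIVE dates_without_gaps(day) AS (
        SELECT '2024-01-01'
        UNION ALL
        SELECT date(day, '+1 day')
        FROM dates_without_gaps
        WHERE day < '2024-01-03'
    )
    SELECT dates_without_gaps.day, COALESCE(SUM(statistics.count), 0)
    FROM dates_without_gaps
    LEFT JOIN statistics ON statistics.day = dates_without_gaps.day
    GROUP BY dates_without_gaps.day
    ORDER BY dates_without_gaps.day
""").fetchall()
print(rows)  # [('2024-01-01', 5), ('2024-01-02', 0), ('2024-01-03', 2)]
```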
Table Joins With A For-Each Loop
-- MySQL, PostgreSQL
SELECT customers.*, recent_sales.*
FROM customers
LEFT JOIN LATERAL (
SELECT *
FROM sales
WHERE sales.customer_id = customers.id
ORDER BY created_at DESC
LIMIT 3
) AS recent_sales ON true;
When joining tables, the rows of both tables are linked together based on some condition. However, a join condition always matches all qualifying rows of the other table; it is impossible to limit the number of joined rows per row of the source table, e.g. to just the last three sales of every customer.
The special lateral join type combines a join with a subquery: the subquery is executed for every row of the join's source table. Within that subquery you can, for example, select only a customer's three most recent sales. And because the subquery already selects only matching sales for each customer, a constant ON true join condition indicates that all of its rows should be used. You can now write for-each loops within your database. You've learned the holy grail of SQL!
Notice: I have written a more extensive text about this topic on my database
focused website SqlForDevs.com: For each loops with LATERAL Joins
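Where LATERAL is unavailable (SQLite, for instance), a ROW_NUMBER() window partitioned per customer produces the same "last three sales per customer" result, though a real lateral join can stop scanning early for each customer. A sketch with invented data:

```python
import sqlite3

# Hypothetical customers and sales; names and amounts are illustrative.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INT, name TEXT)")
con.execute("CREATE TABLE sales (customer_id INT, amount INT, created_at TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, 10, "2024-01-01"), (1, 20, "2024-01-02"),
    (1, 30, "2024-01-03"), (1, 40, "2024-01-04"),
    (2, 99, "2024-01-01"),
])

# ROW_NUMBER() numbers each customer's sales from newest to oldest, so
# rn <= 3 keeps only the three most recent sales per customer.
rows = con.execute("""
    SELECT customers.name, recent_sales.amount
    FROM customers
    LEFT JOIN (
        SELECT customer_id, amount,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id ORDER BY created_at DESC
               ) AS rn
        FROM sales
    ) AS recent_sales
      ON recent_sales.customer_id = customers.id AND recent_sales.rn <= 3
    ORDER BY customers.name, recent_sales.amount
""").fetchall()
print(rows)  # [('Alice', 20), ('Alice', 30), ('Alice', 40), ('Bob', 99)]
```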
Schema
The schema is probably the most crucial part of your database. The more complex it is, the slower new developers will be able to work on your application. But the schema also provides the possibility to go new, more straightforward ways by using modern database features. Many of those features can offload a lot of custom application logic to the database and make development faster.
The schema chapter will show you, for example, how JSON documents can replace many tables, how data can be saved in ways that make querying faster, and a simpler approach for storing trees.