Sorted Tables for Faster Range-Scans
SELECT *
FROM product_comments
WHERE product_id = 2
ORDER BY comment_id ASC
LIMIT 10;

-- MySQL
CREATE TABLE product_comments (
    product_id bigint,
    comment_id bigint auto_increment UNIQUE KEY,
    message text,
    PRIMARY KEY (product_id, comment_id)
);

-- PostgreSQL
CREATE TABLE product_comments (
    product_id bigint,
    comment_id bigint GENERATED ALWAYS AS IDENTITY,
    message text,
    PRIMARY KEY (product_id, comment_id)
);
CLUSTER product_comments USING product_comments_pkey;
Every row you insert into a table is stored physically in the database's files, and this storage order determines how efficiently common tasks like selecting and updating rows execute. But the database can't know how you will read those rows. When building an e-commerce application, you want to fetch some comments for a product. Typically, these comments are identified by an auto-incrementing primary key and are scattered throughout the table in their insertion order. But you can enforce that these rows are stored physically in ascending order by the product (product_id) and the date of the comment (comment_id). The database can then efficiently find the first comment and read the next nine from the following rows instead of collecting them at 10 different locations. Whether you use an SSD or an HDD, random access to multiple locations will always be slower than a single operation fetching multiple consecutive bytes.
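One caveat, and part of what makes this optimization complex: MySQL's InnoDB always keeps rows clustered by the primary key, but in PostgreSQL, CLUSTER sorts the table only once. Rows inserted afterwards are not kept in order, so the command has to be re-run periodically (it takes an exclusive lock while it rewrites the table):

-- PostgreSQL: re-sort the table from time to time. The index
-- is remembered from the earlier CLUSTER ... USING command.
CLUSTER product_comments;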
This is the most exciting and most complex performance optimization I can teach you for large amounts of data. The article linked below shares much more information and covers the implementation obstacles.
Notice: I have already written a more extensive text about this topic on
SqlForDevs.com: Sorted Tables for Faster Range-Scans
Pre-Aggregation of Values for Faster Queries
SELECT SUM(likes_count)
FROM articles
WHERE user_id = 1 AND publish_year = 2022;
Even if your schema is well-designed and your queries all use a perfect index, they may still be slow. When analytical queries, e.g. for a dashboard, have to aggregate tens or hundreds of thousands of rows, performance suffers drastically. Such queries are constrained by how fast the data can be loaded and by the time required to extract the values from the rows or indexes and aggregate them. This operation is very fast for small amounts of data, but the bigger the dataset gets, the more you should look into storing pre-aggregated values. No intelligent indexing will beat the performance improvement of not having to aggregate tens of thousands of values.
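A minimal sketch of the idea, assuming a hypothetical articles_stats table keyed by user and publish year: every recorded like upserts a row and increments the stored counter, so the dashboard reads a single row instead of aggregating thousands.

CREATE TABLE articles_stats (
    user_id bigint,
    publish_year int,
    likes_count bigint NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, publish_year)
);

-- MySQL
INSERT INTO articles_stats (user_id, publish_year, likes_count)
VALUES (1, 2022, 1)
ON DUPLICATE KEY UPDATE likes_count = likes_count + 1;

-- PostgreSQL
INSERT INTO articles_stats (user_id, publish_year, likes_count)
VALUES (1, 2022, 1)
ON CONFLICT (user_id, publish_year)
DO UPDATE SET likes_count = articles_stats.likes_count + 1;

-- The dashboard query now reads exactly one row:
SELECT likes_count
FROM articles_stats
WHERE user_id = 1 AND publish_year = 2022;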
Indexes
Without indexes, your application would be slow, as every operation would have to scan the whole table. Consequently, indexes are the most interesting topic for developers but also the most complicated one. A lot of content has already been written about database indexing that I don't want to repeat. Therefore, I am only sharing the more extraordinary approaches and features you may not have seen before.
This chapter will show you a lot of exceptional indexing approaches, like uniqueness constraints for soft-deleted tables, simple rules for multi-column indexing, ways to find and delete unused indexes, and much more.
Indexes On Functions And Expressions
-- MySQL
CREATE INDEX users_email_lower ON users ((lower(email)));
-- PostgreSQL
CREATE INDEX users_email_lower ON users (lower(email));
Most developers are puzzled when their index on a column is not used once that column is transformed by a function or expression. A Google search turns up countless StackOverflow answers stating that you can't use an index in these cases, but this information is wrong! You can create specialized indexes on a function or expression that are used whenever the exact same transformation is applied in your WHERE clause.
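For example, a case-insensitive email lookup can now use the index in both databases, because the WHERE clause applies the identical lower() transformation:

-- Matches the indexed expression exactly, so the
-- users_email_lower index can be used.
SELECT *
FROM users
WHERE lower(email) = lower('John.Doe@example.com');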
Notice: I have written a more extensive text about this topic on my database
focused website SqlForDevs.com: Function-Based Indexes