Behind The Scenes of SQL - Understanding SQL Query Execution
When you write a query and then hit submit, run, or that little triangle
in DBeaver…
Sure, you likely understand that data is pulled from multiple tables, data is
filtered, and aggregations occur.
How does SQL go from English into the lingua franca of data?
https://seattledataguy.substack.com/p/behind-the-scenes-of-sql-understanding?utm_source=post-email-title&publication_id=21105&post_id=148891421&utm_campaign=email-post-title&isFreemail=tru… 1/21
9/26/24, 6:27 PM (4) Behind the Scenes of SQL: Understanding SQL Query Execution
My goal is to try to make this more exciting than my professor did in my database
course.
Also, I am going to use the term SQL engine instead of RDBMS, because
solutions like Trino and similar query engines are not an RDBMS.
From there, you’ll likely have two more steps, depending on the SQL engine you’re
using.
Syntax Check - This ensures that the SQL query follows the correct grammar
and structure as defined by the SQL language. It catches syntax errors such as
missing commas, misplaced keywords, or unmatched parentheses, and verifies
that keywords appear in the proper order.
Semantics Check - The SQL engine checks the query for semantic
correctness. This involves checking whether the tables, columns, and other
database objects referenced in the query exist and whether the operations are
valid (e.g., ensuring data types are compatible).
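To make the difference between the two checks concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is my pick for illustration only, and the customers table and its columns are made-up names. Other engines word their errors differently, so treat the messages as examples rather than a standard.

```python
import sqlite3

# Hypothetical in-memory database with one made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")


def check(sql):
    """Return 'ok', or the error message SQLite raises for the query."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)


# Syntax problem: FROM is misspelled, so the statement does not match the grammar.
syntax_result = check("SELECT name FORM customers")

# Semantic problems: the grammar is fine, but the referenced objects do not exist.
missing_table = check("SELECT name FROM clients")
missing_column = check("SELECT email FROM customers")

# A query that passes both checks.
valid = check("SELECT name FROM customers")

for label, outcome in [
    ("syntax", syntax_result),
    ("missing table", missing_table),
    ("missing column", missing_column),
    ("valid", valid),
]:
    print(f"{label}: {outcome}")
```

The first query never gets as far as looking up objects, while the next two parse cleanly and only fail once the engine tries to resolve the table and column names.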
Now as I said, this is dependent on what SQL engine you’re using. For example,
here is an excerpt from Understanding MySQL Internals,
MySQL’s parser, like many others, consists of two parts: the lexical scanner
and the grammar rule module. The lexical scanner breaks the entire query
into tokens (elements that are indivisible, such as column names), while
the grammar rule module finds a combination of SQL grammar rules that
produce this sequence, and executes the code associated with those rules.
In the end, a parse tree is produced, which can now be used by the
optimizer. - Understanding MySQL Internals.
The end result will be something like the parse tree below.
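As a rough illustration of the scanner-plus-grammar idea from the quote, here is a toy parser sketch. It is not MySQL's parser: it only understands SELECT col[, col...] FROM table [WHERE col = number], and the token kinds and the dictionary-shaped tree are my own assumptions.

```python
import re

# Toy token pattern: a number, a word, or any single other character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\w+)|(.))")
KEYWORDS = {"SELECT", "FROM", "WHERE"}


def tokenize(sql):
    """Lexical scanner: split the query text into (kind, value) tokens."""
    tokens = []
    for number, word, symbol in TOKEN_RE.findall(sql):
        if number:
            tokens.append(("NUMBER", number))
        elif word:
            if word.upper() in KEYWORDS:
                tokens.append(("KEYWORD", word.upper()))
            else:
                tokens.append(("IDENT", word))
        elif symbol.strip():
            tokens.append(("SYMBOL", symbol))
    return tokens


def parse(tokens):
    """Grammar rules: SELECT col[, col...] FROM table [WHERE col = number]."""
    pos = 0

    def expect(kind, value=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError(f"unexpected end of query, expected {value or kind}")
        tok_kind, tok_value = tokens[pos]
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r}")
        pos += 1
        return tok_value

    expect("KEYWORD", "SELECT")
    columns = [expect("IDENT")]
    while pos < len(tokens) and tokens[pos] == ("SYMBOL", ","):
        pos += 1
        columns.append(expect("IDENT"))
    expect("KEYWORD", "FROM")
    tree = {"type": "select", "columns": columns, "table": expect("IDENT")}
    if pos < len(tokens):
        expect("KEYWORD", "WHERE")
        left = expect("IDENT")
        expect("SYMBOL", "=")
        tree["where"] = {"op": "=", "left": left, "right": expect("NUMBER")}
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][1]!r}")
    return tree


# Hypothetical query against a made-up users table.
tree = parse(tokenize("SELECT name, city FROM users WHERE id = 7"))
print(tree)
```

A real parser covers a vastly larger grammar, but the split is the same: the scanner knows nothing about clause order, and the grammar rules never look at raw characters.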
Translator/Algebrizer/Query Transformer
The SQL engine you’re using determines what the next step in the process
looks like. For example, SQL Server has an Algebrizer component, Oracle has its
Query Transformer, and still other query engines have other names for this
component.
Circling back to Microsoft SQL Server and its Algebrizer, it’s responsible for
converting the logical SQL query into an algebraic representation (e.g.,
relational algebra).
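If relational algebra feels abstract, here is a small sketch of a query expressed as a tree of algebra operators. This is my own toy representation, not what SQL Server's Algebrizer produces internally; the users table, the class names, and the bracket notation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Relation:
    """A base table (hypothetical in-memory rows)."""
    name: str
    rows: list


@dataclass
class Select:
    """Sigma: keep only the rows that satisfy a predicate."""
    child: object
    label: str
    predicate: Callable[[dict], bool]


@dataclass
class Project:
    """Pi: keep only the listed columns."""
    child: object
    columns: list


def show(node):
    """Render the operator tree as a relational-algebra style expression."""
    if isinstance(node, Relation):
        return node.name
    if isinstance(node, Select):
        return f"σ[{node.label}]({show(node.child)})"
    return f"π[{', '.join(node.columns)}]({show(node.child)})"


def evaluate(node):
    """Naively evaluate the tree bottom-up."""
    if isinstance(node, Relation):
        return node.rows
    if isinstance(node, Select):
        return [row for row in evaluate(node.child) if node.predicate(row)]
    return [{c: row[c] for c in node.columns} for row in evaluate(node.child)]


users = Relation("users", [{"name": "Ada", "age": 36}, {"name": "Linus", "age": 28}])

# SELECT name FROM users WHERE age > 30
plan = Project(Select(users, "age > 30", lambda row: row["age"] > 30), ["name"])
expression = show(plan)
result = evaluate(plan)
print(expression)
print(result)
```

The point of the algebraic form is that it is a tree of well-defined operations, which is something later stages can rearrange and cost, unlike the original query text.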
Now if you’re like me, the only time you’ve ever seen relational algebra was in your
database course where we had to hand-write out relational algebra for far more
complex queries. But if you’re really curious about the relational algebra for your
query, you can just use this online tool.
Optimizer
The next step in the process is to map all of those logical operations to
physical operations. Now sometimes, I find that the terms logical and physical get a
little abstract.
The way you can think about it is that when we refer to logical, it's more like the
blueprint of a house. Blueprints don’t necessarily know all the implementation
details.
Whereas physical references the actual construction process and building of the
house. Now, you’re dealing with the concrete details: the materials (bricks, wood,
nails), the plumbing, wiring, and each room’s actual dimensions. The focus is on
how to practically build the house, ensuring that it stands strong and functions as
intended.
And here is the thing; just because something might have seemed to be well
planned out logically, sometimes someone puts a bolt in a weird corner that a
carpenter can’t actually hammer in.
Similarly, just because one logical operation flow might seem best, in
implementation, it might not be.
To give an example, when you write the word “JOIN”, the query engine has a few
options on how to implement it. It could be a Nested Loops Join, Merge Join, or
Hash Join.
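To show what "a few options" means in practice, here is a sketch of all three join strategies over in-memory lists of dictionaries. These are simplified textbook versions for an equality join on a single key, not any particular engine's implementation, and the customers and orders data is made up.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """Compare every left row with every right row."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Build a hash table on one input, then probe it with the other."""
    buckets = defaultdict(list)
    for r in right:
        buckets[r[key]].append(r)
    return [{**l, **r} for l in left for r in buckets.get(l[key], [])]


def merge_join(left, right, key):
    """Sort both inputs on the key, then walk them in step."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1
        elif left[i][key] > right[j][key]:
            j += 1
        else:
            # Pair the current left row with the whole run of equal right rows.
            k = j
            while k < len(right) and right[k][key] == left[i][key]:
                out.append({**left[i], **right[k]})
                k += 1
            i += 1
    return out


# Hypothetical sample data.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Linus"},
    {"customer_id": 3, "name": "Grace"},
]
orders = [
    {"customer_id": 2, "total": 40},
    {"customer_id": 1, "total": 15},
    {"customer_id": 2, "total": 5},
]


def normalize(rows):
    """Sort rows so results can be compared regardless of output order."""
    return sorted(rows, key=lambda row: (row["customer_id"], row["total"]))


nested = normalize(nested_loop_join(customers, orders, "customer_id"))
hashed = normalize(hash_join(customers, orders, "customer_id"))
merged = normalize(merge_join(customers, orders, "customer_id"))
print(nested == hashed == merged)
```

All three return the same rows. What differs is how much work each one does for a given input size and ordering, and that is what the optimizer has to weigh.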
The query optimizer generates candidate execution plans within some
constrained search space. From there, the optimizer selects the plan with the
lowest estimated cost, using statistics about the data to make those estimates.
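Here is a deliberately tiny sketch of cost-based selection. The cost formulas are rough textbook-style approximations that I am assuming for illustration, not what any real optimizer uses, and the table names and row counts stand in for the statistics an engine would keep.

```python
import math


def estimate_costs(left_rows, right_rows):
    """Very rough cost estimates, measured in 'row operations' (assumed formulas)."""
    sort_cost = (
        left_rows * math.log2(max(left_rows, 2))
        + right_rows * math.log2(max(right_rows, 2))
    )
    return {
        "nested_loop_join": left_rows * right_rows,
        "hash_join": left_rows + right_rows,
        "merge_join": sort_cost + left_rows + right_rows,
    }


def choose_plan(stats, left, right):
    """Pick the candidate with the lowest estimated cost."""
    costs = estimate_costs(stats[left], stats[right])
    best = min(costs, key=costs.get)
    return best, costs


# Hypothetical table statistics (row counts only).
stats = {"customers": 1_000, "orders": 1_000_000, "settings": 1, "regions": 5}

big_choice, big_costs = choose_plan(stats, "customers", "orders")
tiny_choice, tiny_costs = choose_plan(stats, "settings", "regions")
print(big_choice, tiny_choice)
```

With these assumed formulas the large join favors the hash join and the tiny one favors nested loops, which is the general idea: the same logical JOIN can get a different physical operator depending on the statistics.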
Execution Plans
Now I want to talk a little bit about execution plans. After all, they’re one of the
few concepts here that you can actually examine.
Once you’ve worked with SQL for a while, you’ve likely heard the term
“execution plan.” It depends somewhat on which SQL engine you’re working
with, but most of them let you see both the estimated plan your query engine
will take and the actual plan it took to query your data.
For me, this was the first type I saw from Microsoft SQL Server:
I used it a lot when I was dealing with queries that were taking longer than
expected and trying to figure out where I could optimize them.
Also, for those who might be more familiar with a more tabular version of an
execution plan, you might see something like the table below, which I pulled from
Oracle’s optimizer.
Every SQL engine I have used has allowed me to use the EXPLAIN PLAN
command, or a close variant such as EXPLAIN, to get this.
So if you’re looking for a practical takeaway from this article, here it is!
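As a quick way to try this yourself, here is a sketch using Python's built-in sqlite3 module, where the command is spelled EXPLAIN QUERY PLAN. PostgreSQL and MySQL use EXPLAIN, and Oracle uses EXPLAIN PLAN FOR. The orders table and the index name are made up, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# Hypothetical in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)


def explain(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


query = "SELECT total FROM orders WHERE customer_id = 42"

# No index on customer_id yet, so the plan should be a full table scan.
plan_before = explain(query)

# After adding an index, the optimizer can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = explain(query)

print(plan_before)
print(plan_after)
```

Comparing the plan before and after a change like this is the habit worth building: it tells you whether the engine actually picked up your index instead of leaving you to guess.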
Execution
Now that the optimizer has selected what it estimates to be the best plan, that
execution plan is sent off to the query execution engine. Perhaps in the future,
someone will build an engine-agnostic optimizer that automatically looks across
Presto/Trino, Snowflake, DuckDB, Spark, etc., and picks the best engine for the
job, depending on whether you’re looking for cost-effective or fast (honestly,
perhaps this already exists, and if it does, I’ll add it to this article if someone
wants to send it over). However, I imagine most individuals will want to
personally select which engine is used when, rather than having the optimizer
select it.
Once the execution engine has run the execution plan, it will return the results
(assuming it doesn’t run out of memory). And now we are back to where you likely
are accustomed to thinking of a query.
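One common way execution engines are described is the iterator (sometimes called Volcano-style) model, where each physical operator pulls rows from the one beneath it. The article doesn't say which model any given engine uses, so take this as one possible sketch, with made-up operator names and data.

```python
def scan(table):
    """Leaf operator: yield rows one at a time from an in-memory table."""
    for row in table:
        yield row


def filter_rows(child, predicate):
    """Pass along only the rows from the child that satisfy the predicate."""
    for row in child:
        if predicate(row):
            yield row


def project(child, columns):
    """Keep only the requested columns of each row."""
    for row in child:
        yield {c: row[c] for c in columns}


# Hypothetical table contents.
orders = [
    {"id": 1, "total": 15.0},
    {"id": 2, "total": 40.0},
    {"id": 3, "total": 25.0},
]

# Physical plan for: SELECT id FROM orders WHERE total > 20
plan = project(filter_rows(scan(orders), lambda row: row["total"] > 20), ["id"])

# Nothing runs until the top of the plan is consumed.
results = list(plan)
print(results)
```

Because each operator only asks for the next row when it needs one, rows stream through the plan instead of being fully materialized at every step.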
Side note: There are a few other steps and components that I didn’t include such
as security and the query cache.
I spent 6 hours learning how Apache Spark plans the execution for us by Vu Trinh
Database Performance Tuning Guide 11g Release
Queryparser, an Open Source Tool for Parsing and Analyzing SQL by Uber
How A Query Engine Works
The Snowflake Elastic Data Warehouse
Every once in a while, as much as I like to harp on business value, I believe it is
important to stop and dig into a subject. Enjoy learning about how things work,
even if it’s just for the sake of it.
Also, because there are points where understanding things at lower levels does
start to provide value, you can start to make more decisions based on more
compact areas of knowledge.
Join us on October 3rd and 4th for the Data Engineering And Machine Learning
Summit 2024.
Immerse yourself in a confluence of data engineering and data science where the
spotlight is on those who truly get their hands dirty - the practitioners.
🛠️ Practitioner-First Philosophy
We believe in the power of practice. At the summit, you won’t just hear high-level
theories, but practical insights, solutions, and real-world stories. Our meticulously
curated lineup includes dozens of speakers, all of whom are leading practitioners
in the world of data.
🚀 Cutting-Edge Innovations
Explore the latest tools, platforms, and methodologies that are revolutionizing
data engineering and data science.
Special Thanks to Databricks for sponsoring DEML 2024 and Accel Events for
making it so easy to manage the event.
You’ll find plenty of free resources you can access to expedite your journey as a
technical consultant as well as be able to talk to other consultants about questions
you may have!
If you’re relying on your OLTP system to provide analytics, you might be in for a
surprise. While it can work initially, these systems aren’t designed to handle
complex queries. Adding databases like MongoDB and CassandraDB only makes
matters worse, since they’re not SQL-friendly – the language most analysts and
data practitioners are used to. Over time, these systems simply can’t keep up with
the demands of performing analytics.
Why does this matter? After all, data is data, right? Well, there’s a big difference
between online transaction processing (OLTP) and online analytical processing
(OLAP). Each has its own unique requirements in terms of software and design.
That’s why solutions like Teradata and Vertica have played such a large role in
many enterprises. In fact, Teradata was one of the first data warehouse systems to
handle a TB of data for Walmart.
But here’s the thing: Why go through the trouble of duplicating your data and
creating a separate analytics system? In this article, we’ll explore the reasons why
you need to develop an analytical system if you want to answer your business
questions effectively.
I often find that people working with data don’t understand its building blocks.
Data is frequently treated as something that’s stored, queried, and cobbled together.
Consequently, data models are often an unintentional side effect or artifact
created without deliberate thought.
Before we build a data model, we must grasp the critical building blocks forming
it.
Mehran Sep 17
Thank you! It would be really valuable if you could write an article explaining
how the indexing mechanism works, for example, in PostgreSQL.
Ramona C Truta Sep 17
Excellent post, Ben!
After being a TA marking hand-written RA queries, as a teacher instituted a
LaTeX-typed one. I provided the students with a small stub with everything
they needed, and it worked really well.
I would also include Alex Xu's diagram, or yours if you have it, on how a
typical query gets evaluated, the one with the blocks etc.
Not knowing data design and not knowing these deep fundamentals on
query evaluations result in so much overhead cost. It's really great you're
bringing this into focus!
I do know of a startup working on using AI to optimize queries. I can send
that info separately, we had a chance to talk at a low-key event.
Please include index-only queries in your index post :)