SQL Lesson 12 - Order of execution of a Query

Now that we have an idea of all the parts of a query, we can now talk about how they all fit together in the context of a complete query.

Complete SELECT query

SELECT * ,
DISTINCT column, AGG_FUNC(_column_or_expression_), … 
FROM mytable 
JOIN another_table ON mytable.column = another_table.column 
WHERE _constraint_expression_ 
GROUP BY column 
HAVING _constraint_expression_ 
ORDER BY _column_ ASC/DESC 
LIMIT _count_ OFFSET _COUNT_;

Each query begins with finding the data that we need in a database, and then filtering that data down into something that can be processed and understood as quickly as possible. Because each part of the query is executed sequentially, it's important to understand the order of execution so that you know what results are accessible where.

Query order of execution

1. FROM and JOINs

The FROM clause, and subsequent JOINs are first executed to determine the total working set of data that is being queried. This includes subqueries in this clause, and can cause temporary tables to be created under the hood containing all the columns and rows of the tables being joined.

2. WHERE

Once we have the total working set of data, the first-pass WHERE constraints are applied to the individual rows, and rows that do not satisfy the constraint are discarded. Each of the constraints can only access columns directly from the tables requested in the FROM clause. Aliases in the SELECT part of the query are not accessible in most databases since they may include expressions dependent on parts of the query that have not yet executed.

3. GROUP BY

The remaining rows after the WHERE constraints are applied are then grouped based on common values in the column specified in the GROUP BY clause. As a result of the grouping, there will only be as many rows as there are unique values in that column. Implicitly, this means that you should only need to use this when you have aggregate functions in your query.

4. HAVING

If the query has a GROUP BY clause, then the constraints in the HAVING clause are then applied to the grouped rows, discard the grouped rows that don't satisfy the constraint. Like the WHERE clause, aliases are also not accessible from this step in most databases.

5. SELECT

Any expressions in the SELECT part of the query are finally computed.

6. DISTINCT

Of the remaining rows, rows with duplicate values in the column marked as DISTINCT will be discarded.

7. ORDER BY

If an order is specified by the ORDER BY clause, the rows are then sorted by the specified data in either ascending or descending order. Since all the expressions in the SELECT part of the query have been computed, you can reference aliases in this clause.

8. LIMIT / OFFSET

Finally, the rows that fall outside the range specified by the LIMIT and OFFSET are discarded, leaving the final set of rows to be returned from the query.

Conclusion

Not every query needs to have all the parts we listed above, but a part of why SQL is so flexible is that it allows developers and data analysts to quickly manipulate data without having to write additional code, all just by using the above clauses.

Exercise

Here ends our lessons on SELECT queries, congrats of making it this far! This exercise will try and test your understanding of queries, so don't be discouraged if you find them challenging. Just try your best.

1. Find the number of movies each director has directed
	1. SELECT Director, COUNT( * ) FROM movies GROUP BY Director;
2. Find the total domestic and international sales that can be attributed to each director
	1. SELECT Director, Total_Domestic_Sales + Total_International_Sales AS TOTALSALES FROM ( SELECT Director, SUM(Domestic_sales) AS Total_Domestic_Sales, SUM(International_sales) AS Total_International_Sales  FROM Movies M JOIN Boxoffice B ON M.Id = B.Movie_id GROUP BY Director ) A;

Exercise 12 — Tasks

1. Find the number of movies each director has directed ✓
2. Find the total domestic and international sales that can be attributed to each director ✓

This webpage is a lesson from SQLBolt titled "SQL Lesson 12: Order of execution of a Query". It explains the order in which the different clauses of a SQL SELECT statement are executed:

  1. FROM and JOINs: Determines the complete data set by retrieving and joining tables.
  2. WHERE: Applies constraints to filter rows.
  3. GROUP BY: Groups remaining rows by specified columns (typically used with aggregate functions).
  4. HAVING: Filters groups created by GROUP BY.
  5. SELECT: Computes the expressions and columns to return.
  6. DISTINCT: Removes duplicate rows.
  7. ORDER BY: Sorts the final result set.
  8. LIMIT / OFFSET: Restricts the number of rows returned.

The lesson includes two practice tasks:

Additionally, sample tables for movies and box office sales are provided. The exercise aims to reinforce the lesson on query order and practice writing aggregate queries.[1]

1



Here are some other ways to write the same query that counts the number of movies each director has directed:

  1. Using alias for COUNT
SELECT Director, COUNT(*) AS MovieCount FROM movies GROUP BY Director;
  1. Using COUNT on a specific column
SELECT Director, COUNT(Title) FROM movies GROUP BY Director;

If the Title column is NOT NULL, it's equivalent to COUNT().*

  1. Using subquery
SELECT Director, 
(SELECT COUNT(*) FROM movies m2 WHERE m2.Director = m1.Director) 
FROM movies m1 
GROUP BY Director;
  1. Using GROUP BY with ORDER BY
SELECT Director, COUNT(*) FROM movies GROUP BY Director ORDER BY Director;

All versions will return one row per director with the number of movies they have directed. The main difference is use of aliases, explicit columns, or subqueries—results remain the same.

1


Here are other valid ways to write the query that finds the sum of total domestic and international sales for each director:


1. Using direct SUM of both columns inside the main SELECT:

SELECT 
  Director, 
  SUM(Domestic_sales + International_sales) AS TOTALSALES
FROM 
  Movies M
JOIN 
  Boxoffice B
ON 
  M.Id = B.Movie_id
GROUP BY 
  Director;

This computes the sum in one step instead of using a derived table.


2. Using column aliases inside a subquery with arithmetic in the outer SELECT:

SELECT
  Director,
  SUM_Domestic + SUM_International AS TOTALSALES
FROM (
    SELECT 
        Director,
        SUM(Domestic_sales) AS SUM_Domestic,
        SUM(International_sales) AS SUM_International
    FROM Movies M
    JOIN Boxoffice B ON M.Id = B.Movie_id
    GROUP BY Director
) D;

Just changes alias names for clarity.


3. Using CTE (Common Table Expression, if supported):

WITH SalesByDirector AS (
    SELECT 
        Director,
        SUM(Domestic_sales) AS Total_Domestic_Sales,
        SUM(International_sales) AS Total_International_Sales
    FROM 
        Movies M
    JOIN 
        Boxoffice B
    ON 
        M.Id = B.Movie_id
    GROUP BY 
        Director
)
SELECT 
    Director, 
    Total_Domestic_Sales + Total_International_Sales AS TOTALSALES
FROM 
    SalesByDirector;

Uses a CTE instead of a subquery for better readability with complex queries.


All these queries will return one row per director and the total of their domestic and international sales. The differences are in structure and style, so you can use whichever is allowed by your SQL dialect and you find most readable!

1