SQL Topic - Subqueries

You might have noticed that even with a complete query, there are many questions that we can't answer about our data without additional post, or pre, processing. In these cases, you can either make multiple queries and process the data yourself, or you can build a more complex query using SQL subqueries.

Example: General subquery

Lets say your company has a list of all Sales Associates, with data on the revenue that each Associate brings in, and their individual salary. Times are tight, and you now want to find out which of your Associates are costing the company more than the average revenue brought per Associate.

First, you would need to calculate the average revenue all the Associates are generating:

SELECT AVG(revenue_generated) FROM sales_associates;

And then using that result, we can then compare the costs of each of the Associates against that value. To use it as a subquery, we can just write it straight into the WHERE clause of the query:

SELECT * FROM sales_associates WHERE salary > **(SELECT AVG(revenue_generated) FROM sales_associates)**;

As the constraint is executed, each Associate's salary will be tested against the value queried from the inner subquery.

A subquery can be referenced anywhere a normal table can be referenced. Inside a FROM clause, you can JOIN subqueries with other tables, inside a WHERE or HAVING constraint, you can test expressions against the results of the subquery, and even in expressions in the SELECT clause, which allow you to return data directly from the subquery. They are generally executed in the same logical order as the part of the query that they appear in, as described in the last lesson. [1] < Explained.

Because subqueries can be nested, each subquery must be fully enclosed in parentheses in order to establish proper hierarchy. Subqueries can otherwise reference any tables in the database, and make use of the constructs of a normal query (though some implementations don't allow subqueries to use LIMIT or OFFSET).

Correlated subqueries

A more powerful type of subquery is the correlated subquery in which the inner query references, and is dependent on, a column or alias from the outer query. Unlike the subqueries above, each of these inner queries need to be run for each of the rows in the outer query, since the inner query is dependent on the current outer query row.

Example: Correlated subquery

Instead of the list of just Sales Associates above, imagine if you have a general list of Employees, their departments (engineering, sales, etc.), revenue, and salary. This time, you are now looking across the company to find the employees who perform worse than average in their department.

For each employee, you would need to calculate their cost relative to the average revenue generated by all people in their department. To take the average for the department, the subquery will need to know what department each employee is in:

SELECT * FROM employees WHERE salary > (SELECT AVG(revenue_generated) FROM employees AS dept_employees **WHERE dept_employees.department = employees.department**);

These kinds of complex queries can be powerful, but also difficult to read and understand, so you should take care using them. If possible, try and give meaningful aliases to the temporary values and tables. In addition, correlated subqueries can be difficult to optimize, so performance characteristics may vary across different databases.

Existence tests

When we introduced WHERE constraints in Lesson 2: Queries with constraints, the IN operator was used to test whether the column value in the current row existed in a fixed list of values. In complex queries, this can be extended using subqueries to test whether a column value exists in a dynamic list of values.

Select query with subquery constraint

SELECT *, … FROM mytable WHERE column **IN/NOT IN** (SELECT another_column FROM another_table);

When doing this, notice that the inner subquery must select for a column value or expression to produce a list that the outer column value can be tested against. This type of constraint is powerful when the constraints are based on current data.


The current SQLBolt page explains the concept of subqueries in SQL.

The page recommends giving meaningful aliases and being cautious with correlated subqueries due to possible performance issues.

1


can you access tables inside subqueries on the outside of it

No, you cannot access tables or columns defined only inside a subquery from the outer query. The scope of any table alias or column alias created inside a subquery is limited to that subquery.

However, the reverse is possible: in a correlated subquery, the subquery can reference columns from tables in the outer query. This is what makes a subquery “correlated.” But outer queries cannot see inside subqueries.

Summary:

1



  1. A subquery in SQL is like a small query inside a bigger one. You can use it just like a table or a value, in different parts of your SQL statements. Here’s a simpler explanation of the essentials:

    • What is a subquery?
      A subquery is a query placed inside another SQL query.

    • Where can you use a subquery?
      You can put it almost anywhere in your main query:

      • FROM clause: Use it as if it’s a temporary table you can join with others.
      • WHERE or HAVING clause: Use it to compare data from your main query with the result of the subquery.
      • SELECT clause: Use it to pull a single value from a subquery and show it as part of the result.
    • How are subqueries run?
      Subqueries are usually run at the same point in the query as where you wrote them.

    Example for clarity:

    SELECT name
    FROM students
    WHERE age > (SELECT AVG(age) FROM students);
    

    Here,

    • The part in parentheses is the subquery: (SELECT AVG(age) FROM students)
    • It finds the average age and compares each student's age to this value.

    Key Points:

    • Subqueries help you to break complex logic into smaller chunks.
    • They can be used to filter, compare, or even generate new columns.

    Let me know if you’d like to see specific examples for any of these uses! ↩︎