Refactoring Subqueries from SELECT to FROM: A Better Approach for Database Performance and Readability

Subquery in SELECT: trying to move to main query

Introduction

As a database developer, we often find ourselves dealing with complex queries that involve subqueries. In this article, we’ll explore the use of subqueries in the SELECT clause and how to refactor them into the FROM clause. We’ll also discuss the errors you might encounter when trying to move a subquery out of the SELECT clause.

The Problem

Consider the following query that uses a subquery within the SELECT clause:

SELECT
    PEOPLE.ID,
    (
    SELECT MIN(R.VALUE) KEEP (DENSE_RANK FIRST ORDER BY R.DATE_OF_CREATION)
    FROM REFERENCE_NUMBERS R
    WHERE
        R.DELETED IS NULL
        AND R.PERSON_ID(+) = PEOPLE.ID
    ) AS PRIMARY_REFERENCE_NUMBER,
    PEOPLE.NAME,
    PEOPLE.DATE_OF_BIRTH
FROM
    PEOPLE
WHERE
    PEOPLE.DELETED IS NULL
    AND PEOPLE.DATE_OF_BIRTH > TO_DATE('2000-01-01','yyyy-mm-dd')

This query selects a person’s ID, the earliest reference number with the minimum value, their name, and date of birth. The subquery in the SELECT clause finds the primary reference number for each person.

The Goal

The goal is to refactor this query so that the subquery is part of the main query in the FROM clause, eliminating the need for a separate subquery. We’ll explore two approaches: using an outer join and grouping by the person ID.

Using an Outer Join

One way to achieve this is by using an outer join with the table containing the reference numbers. Here’s the modified query:

SELECT
  p.id,
  r.primary_reference_number,
  p.name,
  p.date_of_birth
FROM people p
LEFT JOIN (
  SELECT person_id, MIN(value) KEEP (DENSE_RANK FIRST ORDER BY date_of_creation)
    AS primary_reference_number
  FROM reference_numbers
  WHERE deleted IS NULL
  GROUP BY person_id
) r ON r.person_id = p.id
WHERE p.deleted is null
AND p.date_of_birth > date '2000-01-01';

This query joins the people table with a derived table containing the primary reference number for each person. The outer join ensures that we include people without any reference numbers.

Grouping by Person ID

Another approach is to group the results by the person’s ID and use an aggregate function (in this case, MIN) to find the primary reference number. Here’s the modified query:

SELECT
  p.id,
  MIN(r.value) KEEP (DENSE_RANK FIRST ORDER BY r.date_of_creation)
    AS primary_reference_number,
  p.name,
  p.date_of_birth
FROM people p
LEFT JOIN reference_numbers r ON r.person_id = p.id AND r.deleted IS NULL
WHERE p.deleted is null
AND p.date_of_birth > date '2000-01-01'
GROUP BY p.id, p.name, p.date_of_birth;

This query groups the results by person ID and uses MIN to find the primary reference number.

Oracle-Specific Considerations

In this example, we’re using Oracle as our database management system. Oracle has some specific requirements for subqueries in the SELECT clause. When refactoring a subquery out of the SELECT clause, you need to ensure that all non-aggregated columns are included in the GROUP BY clause.

For example, if you’re selecting multiple columns in your main query (not just the person’s ID), you’ll need to include those columns in the GROUP BY clause. In this case, we’ve only selected a few columns (ID, NAME, and DATE_OF_BIRTH), so we don’t need to include any additional columns in the GROUP BY clause.

However, it’s worth noting that Oracle’s SQL standard doesn’t require including all non-aggregated columns in the GROUP BY clause. According to the SQL standard, you only need to specify the columns used by aggregate functions or included in WHERE, HAVING, and ORDER BY clauses.

Unfortunately, Oracle doesn’t conform to this standard, so we must include all non-aggregated columns in the GROUP BY clause if we want to use a subquery refactored out of the SELECT clause.

Last modified on 2024-01-17