Creating a Consolidated Table That Tracks Changes in Two Tables: SQL Solution and Best Practices

Comparing and Updating Changes - SQL

In this article, we will explore a problem where you have two tables: latest and history. The latest table contains the latest data, while the history table contains all the previous data. We need to create a consolidated table that indicates when the change was made.

Background

The problem statement is similar to maintaining an audit trail or tracking changes in a database. In this case, we are dealing with two tables: one for the current state and another for the historical state. Our goal is to merge these two tables into a single table while indicating the changes made at each point in time.

Approach

To solve this problem, we will use a combination of SQL techniques:

  1. Union: We will use the UNION ALL operator to combine rows from both tables.
  2. Window Functions: Specifically, we will use the LAG function to access previous row values and compare them with current row values.
  3. Case Expressions: We will use case expressions to determine the type of change (new entry, column changed, etc.) based on comparisons between current and previous row values.

Step 1: Creating the Consolidated Table

First, we need to create a consolidated table that combines rows from both latest and history tables. We can achieve this by using the following SQL statement:

with combined as
(
    select *,'latest' as source from latest l
    union all
    select *, 'history' as source from history h
),
changes AS (
    SELECT
        ct1.id,
        ct1.col1,
        ct1.col2,
        ct1.load_date,
        CASE
            WHEN ct1.col1 <> ct2.col1 THEN 'col1 changed'
            WHEN ct1.col2 <> ct2.col2 THEN 'col2 changed'
            ELSE NULL
        END AS changes
    FROM
        combined ct1
        LEFT JOIN combined ct2
        ON ct1.id = ct2.id
)
select * from changes;

Step 2: Handling “No Show”

To handle the case where an ID does not have a corresponding row in latest, we will use a combination of SQL techniques:

  • We will create a temporary table to hold the latest rows.
  • We will then join this temporary table with the combined table using a LEFT JOIN.

Here is how you can do it:

with cte as
(
    select id, col1, col2, load_date, change = null
    from   latest
    union all
    select id, col1, col2, load_date, change = null
    from   history
    union all
    select id, col1, col2, load_date = dateadd(day, 1, load_date),
           change = 'no show'
    from   (
             select id, col1, col2, load_date,
                    rn = row_number() over (partition by id order by load_date desc)
             from history 
         ) h
    where  h.rn = 1
    and    not exists
           (
               select *
               from   latest x
               where  x.id = h.id
           )
),
cte2 as
(  
    select id, col1, col2, load_date, change, 
          prev_col1 = lag(col1) over (partition by id order by load_date),
          prev_col2 = lag(col2) over (partition by id order by load_date)  
    from   cte
)
select *,
       change = isnull(change, '')
              + case when prev_col1 is null and prev_col2 is null
                     then 'new entry'
                     else ''
                     end
              + case when prev_col1 <> col1 
                     then 'col1 changed '
                     else ''
                     end
              + case when prev_col2 <> col2 
                     then 'col2 changed '
                     else ''
                     end  
from   cte2
order by id, load_date;

Conclusion

In this article, we have discussed how to create a consolidated table that tracks changes in two tables: latest and history. We used the combination of SQL techniques such as union, window functions, and case expressions to achieve our goal. Additionally, we handled the “no show” case by creating a temporary table to hold the latest rows.

Tips and Variations

  • You can modify the script to suit your specific needs.
  • To further improve performance, you may want to consider indexing on the columns used in the JOINs and WHERE clauses.

Last modified on 2024-02-01