Creating a New Variable Based on Two Conditions
In this article, we will explore how to create a new variable in SQL based on two conditions. We have a dataset about the number of School_children attending specific online courses, monitored on a quarterly basis. The goal is to determine the +/- movements of schoolkid numbers of the courses from one Quarter to the next one for each course.
Problem Statement
We want to create a new variable called Switch with values:
yes_plus: when there was a positive change of School_children (numbers) within a specific online_course between consecutive dates (Quarter_dates).yes_neg: when there was a negative change of School_children (numbers) within a specific online_course between consecutive dates (Quarter_dates).no: when there was no change of School_children (numbers) within a specific online_course between consecutive dates (Quarter_dates).
Solution
To achieve this, we can use a combination of the LAG function and conditional aggregation. The idea is to compare the current value of school_children with its value from the previous row using the same partitioning by online_course.
Query
drop table if exists dbo.test_table;
go
create table dbo.test_table(
[Date] date not null,
school_children int not null,
online_course varchar(100) not null);
insert dbo.test_table([Date], school_children, online_course) values
('2016-10-17',1458, 'A1'),
('2016-10-18',1458, 'A1'),
('2016-10-19',10468, 'A1'),
('2016-10-20',2478, 'A1');
select
*,
school_children - lag(school_children) over (partition by online_course order by [Date]) school_chldrn_diff,
case
when school_children = lag(school_children) over (partition by online_course order by [Date])
then 'no'
when school_children > lag(school_children) over (partition by online_course order by [Date])
then 'yes_plus'
else 'yes_neg'
end Switch
from dbo.test_table;
Explanation
The query uses the following techniques:
LAGfunction: This function returns the value of a column from a previous row within the same partitioning frame.- Partitioning by
online_course: We usepartition by online_courseto group rows by theonline_coursecolumn, ensuring that comparisons are made between consecutive dates for each course separately. - Conditional aggregation: The
casestatement uses conditional logic to determine the value of the newSwitchvariable.
Results
The resulting table will have an additional Switch column with the desired values:
| Date | school_children | online_course | school_chldrn_diff | Switch |
|---|---|---|---|---|
| 2016-10-17 | 1458 | A1 | NULL | NULL |
| 2016-10-18 | 1458 | A1 | 0 | no |
| 2016-10-19 | 10468 | A1 | 9010 | yes_plus |
| 2016-10-20 | 2478 | A1 | -7990 | yes_neg |
This approach can be applied to the original dataset by modifying the query to match the column names and data types.
Conclusion
Creating a new variable in SQL based on two conditions requires using conditional aggregation techniques, such as the case statement, along with window functions like LAG. By partitioning by relevant columns and comparing values across rows within the same partitions, you can create a new column that reflects changes between consecutive dates for each course.
Last modified on 2023-10-30