Understanding Business Minutes in Pandas DataFrames for Accurate Time Tracking

Understanding the Problem

The task is to measure the time between two timestamps in a pandas DataFrame in business minutes rather than calendar minutes. The existing code fills missing values with a plain calendar difference through fillna; the goal is to replace that fill value with a business-minute calculation.

To achieve this, we need to understand how to calculate business minutes and then apply this calculation to the given DataFrame.

Business Minutes

Business hours are a fixed daily window on working days. This article uses 10am to 9pm, Monday through Friday, which matches the start and end hours set in the code. Time that falls on a weekend, on a holiday, or outside that window does not count toward business minutes.

The business-duration package (imported as business_duration) provides a businessDuration function that calculates the business duration between two dates and times. It takes the start and end timestamps, the daily start and end hours, an optional holiday list, and the unit of measurement (in this case, minutes).
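To make the definition concrete, here is a minimal standard-library sketch of the counting rule. It is not the library's implementation: business_minutes is a hypothetical helper written for illustration, and it assumes a Saturday/Sunday weekend and an optional set of holiday dates.

```python
from datetime import date, datetime, time, timedelta

def business_minutes(start, end, open_t=time(10, 0), close_t=time(21, 0),
                     holidays=frozenset()):
    """Count the minutes between start and end that fall inside working hours.

    Hypothetical helper for illustration only; weekends are Saturday and Sunday.
    """
    if end <= start:
        return 0.0
    total = timedelta()
    day = start.date()
    while day <= end.date():
        # Skip Saturdays (5), Sundays (6) and any listed holiday
        if day.weekday() < 5 and day not in holidays:
            window_start = max(start, datetime.combine(day, open_t))
            window_end = min(end, datetime.combine(day, close_t))
            if window_end > window_start:
                total += window_end - window_start
        day += timedelta(days=1)
    return total.total_seconds() / 60

start = datetime(2023, 3, 27, 19, 0)   # a Monday
end = datetime(2023, 3, 28, 19, 15)    # the following Tuesday
# 120 minutes on Monday (19:00-21:00) plus 555 on Tuesday (10:00-19:15)
print(business_minutes(start, end))    # 675.0
```

The same interval spans 1455 calendar minutes, so the two measures can differ substantially even for a gap of about one day.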

Calculating Business Minutes

To calculate business minutes, we need to define our start and end hours, as well as the holiday list for Russia.

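from datetime import datetime, time
import holidays

start_hour = time(10, 0, 0)  # 10am
end_hour = time(21, 0, 0)    # 9pm
# Russian public holidays for the current and previous year
holidaylist_RU = holidays.Russia(years=[datetime.now().year, datetime.now().year-1])
unit_min = 'min'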

Applying Business Minutes to the DataFrame

To apply business minutes to the given DataFrame, we first determine the end of each row's interval: the next set time within the same ID or, for the last row of each ID, the closed time. Then we use the business_duration library to calculate the business minutes between set and that end time.

import pandas as pd
import business_duration as bd

df['set'] = pd.to_datetime(df['set'])
df['closed'] = pd.to_datetime(df['closed'])
df = df.sort_values(['ID', 'set'])

# End of each interval: the next 'set' within the same ID, or 'closed' for the last row
df['newtime_1'] = df.groupby('ID')['set'].shift(-1).fillna(df['closed'])

# Business minutes between 'set' and the interval end
df['in work (minutes)'] = df.apply(lambda x: bd.businessDuration(x['set'], x['newtime_1'], start_hour, end_hour, holidaylist=holidaylist_RU, unit=unit_min), axis=1)
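To see what the grouped shift produces before any duration is computed, the following sketch runs the same step on a small frame. The three rows are toy data invented for illustration, and interval_end is a hypothetical column name standing in for newtime_1.

```python
import pandas as pd

# Toy data invented for illustration
df = pd.DataFrame({
    'ID': ['aaa', 'aaa', 'bbb'],
    'set': pd.to_datetime(['2023-03-27 19:00', '2023-03-28 19:15', '2023-03-27 22:00']),
    'closed': pd.to_datetime(['2023-03-28 22:00', '2023-03-28 22:00', '2023-03-28 12:00']),
})
df = df.sort_values(['ID', 'set'])

# Next 'set' within the same ID; NaT on the last row of each ID
next_set = df.groupby('ID')['set'].shift(-1)

# Fall back to 'closed' where there is no next 'set'
df['interval_end'] = next_set.fillna(df['closed'])
print(df[['ID', 'set', 'interval_end']])
```

Only the first row has a later set in its group, so it ends at that timestamp; the other two rows end at their own closed time.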

Replacing fillna with Business Minutes

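The original expression computes the calendar time to the next set within each ID and relies on fillna for the last row of each group. A business-duration calculation needs a start and an end timestamp for each row rather than a precomputed difference, so the fill value has to be built row by row from set and closed. Note that only the filled rows are then measured in business minutes; the in work (minutes) column from the previous step uses business minutes throughout.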

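# Replace the fillna value with business minutes from 'set' to 'closed'
import business_duration as bd
s = df['set']
to_close = df.apply(lambda x: bd.businessDuration(x['set'], x['closed'], start_hour, end_hour, holidaylist=holidaylist_RU, unit=unit_min), axis=1)
df['newtime'] = (s.groupby(df['ID']).diff(-1).mul(-1).dt.total_seconds().div(60)
                  .fillna(to_close))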
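The calendar part of that expression, s.groupby(df['ID']).diff(-1).mul(-1), can look opaque. The sketch below, on toy data invented for illustration and assuming s is the set column, suggests that it gives the same result as subtracting set from the next set in the group, and shows one way to convert the result to minutes.

```python
import pandas as pd

# Toy data invented for illustration
df = pd.DataFrame({
    'ID': ['aaa', 'aaa', 'aaa', 'bbb'],
    'set': pd.to_datetime(['2023-03-27 19:00', '2023-03-28 19:15',
                           '2023-03-28 20:00', '2023-03-27 22:00']),
})
s = df['set']

# diff(-1) yields current minus next, so multiplying by -1 gives next minus current
via_diff = s.groupby(df['ID']).diff(-1).mul(-1)

# The same gap written as an explicit shift and subtraction
via_shift = s.groupby(df['ID']).shift(-1) - s

# Timedelta to calendar minutes; the last row of each ID stays NaN
calendar_minutes = via_diff.dt.total_seconds().div(60)
print(calendar_minutes.tolist())
```

The NaN entries on the last row of each ID are exactly the positions that fillna fills afterwards.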

Conclusion

In this article, we discussed how to calculate business minutes and apply them to a pandas DataFrame. We used the business_duration library to perform these calculations.

By following the steps outlined in this article, you should be able to replace the existing fillna operation with business minutes in your own projects.

Code Examples

Here is the complete code example:

import pandas as pd
import business_duration as bd
import holidays as pyholidays
from datetime import time, datetime

holidaylist_RU = pyholidays.Russia(years=[datetime.now().year, datetime.now().year-1])
start_hour = time(10, 0, 0)
end_hour = time(21, 0, 0)
unit_min='min'

# Create a sample DataFrame
df = pd.DataFrame({
    'ID': ['aaa', 'aaa', 'aaa', 'bbb', 'ccc', 'ccc'],
    'closed': ['2023-03-28 22:00', '2023-03-28 22:00', '2023-03-27 22:00', '2023-03-26 22:00', '2023-03-25 22:00', '2023-03-24 22:00'],
    'set': ['2023-03-27 19:00', '2023-03-28 19:15', '2023-03-28 20:00', '2023-03-27 22:00', '2023-03-25 19:00', '2023-03-26 19:30'],
    'message_time': ['19:05', '19:40', '21:00', '22:10', '19:05', '19:40']
})

# Convert date columns to datetime
df['set'] = pd.to_datetime(df['set'])
df['closed'] = pd.to_datetime(df['closed'])

# Sort the DataFrame by ID and set
df = df.sort_values(['ID', 'set'])

# End of each interval: the next 'set' within the same ID, or 'closed' for the last row
df['newtime_1'] = df.groupby('ID')['set'].shift(-1).fillna(df['closed'])

# Business minutes between 'set' and the interval end
df['in work (minutes)'] = df.apply(lambda x: bd.businessDuration(x['set'], x['newtime_1'], start_hour, end_hour, holidaylist=holidaylist_RU, unit=unit_min), axis=1)

# Calendar minutes to the next 'set', with business minutes to 'closed' as the fill value
s = df['set']
to_close = df.apply(lambda x: bd.businessDuration(x['set'], x['closed'], start_hour, end_hour, holidaylist=holidaylist_RU, unit=unit_min), axis=1)
df['newtime'] = (s.groupby(df['ID']).diff(-1).mul(-1).dt.total_seconds().div(60)
                  .fillna(to_close))

Last modified on 2024-02-11