Understanding Time Differences Between Submissions in Contract Data

Here’s the complete code snippet, which computes the time difference in hours between consecutive submissions for each user and contract:

import pandas as pd

# Create a DataFrame
data = {
    'USER_ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'CONTRACT_REF': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
    'SUBMISSION_DATE': [
        '2022-01-01 01:00:00',
        '2022-01-02 02:00:00',
        '2022-01-03 03:00:00',
        '2022-01-04 04:00:00',
        '2022-01-05 05:00:00',
        '2022-01-06 06:00:00',
        '2022-01-07 07:00:00',
        '2022-01-08 08:00:00',
        '2022-01-09 09:30:00',
        '2022-01-10 10:00:00'
    ]
}
df = pd.DataFrame(data)

# Convert submission dates to datetime
df['SUBMISSION_DATE'] = pd.to_datetime(df['SUBMISSION_DATE'])

# Group by USER_ID and CONTRACT_REF, select SUBMISSION_DATE column
gs = df.groupby(['USER_ID', 'CONTRACT_REF'])['SUBMISSION_DATE']

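# Difference between consecutive submissions within each group
# (the first row of each group has no predecessor, so it is NaT)
diff = gs.diff()

# Fill NaT with a zero Timedelta so the column keeps its timedelta dtype
filled_diff = diff.fillna(pd.Timedelta(0))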

# Convert to hours by dividing by a one-hour Timedelta
result = filled_diff / pd.Timedelta(hours=1)

# Assign to DataFrame
df['TIME_DIFF'] = result

print(df)

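This code creates a DataFrame, converts the submission dates to datetime format, and groups the rows by USER_ID and CONTRACT_REF. Within each group, diff() subtracts the previous submission date from the current one, so the first row of every group has no predecessor and comes back as NaT. Those missing values are filled with zero, the result is divided by a one-hour Timedelta to express it as a floating-point number of hours, and the values are assigned to the TIME_DIFF column of the original DataFrame.

Note that in this sample data every USER_ID and CONTRACT_REF pair appears only once, so each group contains a single row and TIME_DIFF is 0.0 throughout. Non-zero values appear only when the same user submits the same contract more than once.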
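
To see non-zero differences, the same steps can be run on data where a user submits the same contract several times. The following is a minimal sketch rather than part of the original snippet: the five rows are hypothetical values made up for illustration, and the explicit sort is an added assumption so that diff() always compares a submission with the one immediately before it in time.

```python
import pandas as pd

# Hypothetical sample data: user 1 submits contract 'A' three times,
# user 2 submits contract 'B' twice.
df = pd.DataFrame({
    'USER_ID': [1, 1, 1, 2, 2],
    'CONTRACT_REF': ['A', 'A', 'A', 'B', 'B'],
    'SUBMISSION_DATE': pd.to_datetime([
        '2022-01-01 01:00:00',
        '2022-01-01 04:00:00',
        '2022-01-02 04:30:00',
        '2022-01-03 10:00:00',
        '2022-01-03 10:45:00',
    ]),
})

# Sort so that each row follows the previous submission of the same group
df = df.sort_values(['USER_ID', 'CONTRACT_REF', 'SUBMISSION_DATE'])

# Per-group difference between consecutive submissions (NaT for the first row)
diff = df.groupby(['USER_ID', 'CONTRACT_REF'])['SUBMISSION_DATE'].diff()

# Replace NaT with a zero Timedelta, then express the result in hours
df['TIME_DIFF'] = diff.fillna(pd.Timedelta(0)) / pd.Timedelta(hours=1)

print(df)
```

Here the first submission of each group gets 0.0, while later rows hold the gap to the previous submission: 3.0 and 24.5 hours for user 1, and 0.75 hours for user 2.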


Last modified on 2023-11-04