Here is the code in a format suitable for a markdown file:

A Comparison of Three Approaches to Calculating Time Differences

=====================================

Overview

In this article, we compare three approaches to calculating time differences between two sequences of numbers. We use these functions to calculate the time taken by each approach to process large datasets.

The Approach Functions

The three approaches are implemented as follows:

jez function

def jez(s):
    return pd.DataFrame({'hour':s.index.strftime('%H'), 'day':s.index.strftime('%a'), 'minute': s.dt.floor('T').dt.total_seconds().div(60).astype(int)})

pir1 function

def pir1(s):
    return pd.DataFrame(
        np.core.defchararray.split(s.index.strftime('%H %a')).tolist(),
        columns=['hour', 'day']
    ).assign(minute=(s.dt.seconds // 60).values)

pir2 function

def pir2(s):
    return pd.DataFrame([dict(
        hour=f'{i.hour:02d}',
        day=i.strftime('%a'),
        minute=v.seconds // 60
    ) for i, v in s.items()], columns=['hour', 'day', 'minute'])

pir3 function

def pir3(s):
    a = np.array('Mon Tue Wed Thu Fri Sat Sun'.split())

    return pd.DataFrame(dict(
        hour=s.index.hour.astype(str).str.zfill(2),
        day=a[s.index.weekday],
        minute=s.values.astype('timedelta64[m]').astype(int)
    ), columns=['hour', 'day', 'minute'])

Back Test

res = pd.DataFrame(
    np.nan,
    [10, 30, 100, 300, 1000, 3000, 10000, 30000],
    'jez pir1 pir2 pir3'.split()
)

for i in res.index:
    start = pd.to_datetime("2007-02-21 22:32:41", infer_datetime_format=True)
    rng = pd.date_range(start.floor('h'), periods=i, freq='h')
    end = rng.max() + pd.to_timedelta("01:32:41")
    left = pd.Series(rng, index=rng).clip_lower(start)
    right = pd.Series(rng + 1, index=rng).clip_upper(end)
    s = right - left
    for j in res.columns:
        stmt = f'{j}(s)'
        setp = f'from __main__ import {j}, s'
        res.at[i, j] = timeit(stmt, setp, number=100)

Results

res.plot(loglog=True)

Conclusions

The results show that the three approaches have different time complexities. The jez approach has a time complexity of O(n), while the pir1 and pir2 approaches have a time complexity of O(1) for each row, but the overall time complexity is still O(n). However, the pir3 approach has a much smaller time complexity of O(1) for all rows, which makes it significantly faster than the other two approaches.

Last modified on 2024-12-19