A Comparison of Three Approaches to Calculating Time Differences
=====================================
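Overview
--------
In this article, we compare four functions that turn a Series of time differences into a table of hour, weekday, and minute values: a baseline, `jez`, and three alternative approaches, `pir1`, `pir2`, and `pir3`. The input `s` is a Series of timedeltas, each one the length of an hourly interval, indexed by the interval's starting timestamp. Each function returns a DataFrame with three columns: `hour` (the hour of the timestamp as a two-digit string), `day` (the abbreviated weekday name), and `minute` (the whole number of minutes in the timedelta). We then time each function on inputs of increasing size, from 10 to 30,000 rows, to see how it scales to large datasets.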
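To make the input and output concrete, here is a minimal sketch for three hourly bins. It builds `s` with the same recipe as the benchmark below, written with `clip(lower=...)`, `clip(upper=...)`, and an explicit one-hour offset so it runs on current pandas. The `expected` frame is a hand-written illustration of the values each function should produce, assuming the default C locale for weekday names.

```python
import pandas as pd

# Same construction as the benchmark, with n = 3 hourly bins.
start = pd.Timestamp("2007-02-21 22:32:41")
rng = pd.date_range(start.floor("h"), periods=3, freq="h")
end = rng.max() + pd.Timedelta("01:32:41")
left = pd.Series(rng, index=rng).clip(lower=start)  # first bin starts at `start`
right = pd.Series(rng + pd.Timedelta("1h"), index=rng).clip(upper=end)
s = right - left  # timedelta64 values, indexed by the hourly timestamps

# Hand-written target values for this `s` (2007-02-21 is a Wednesday).
expected = pd.DataFrame({
    "hour": ["22", "23", "00"],
    "day": ["Wed", "Wed", "Thu"],
    "minute": [27, 60, 60],
})

print(s)
print(expected)
```

The first bin is only 27 minutes 19 seconds long because it starts at `start` rather than on the hour; the remaining bins are a full hour each.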
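The Approach Functions
----------------------
The baseline and the three alternatives are implemented as follows. Each takes a Series `s` of timedeltas indexed by a `DatetimeIndex`.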
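`jez` function

```python
import numpy as np
import pandas as pd


def jez(s):
    # Hour and weekday are formatted from the index; each timedelta is
    # rounded down to the minute and converted to whole minutes.
    # ('min' replaces the 'T' alias, which is deprecated in pandas 2.2.)
    return pd.DataFrame({
        'hour': s.index.strftime('%H'),
        'day': s.index.strftime('%a'),
        'minute': s.dt.floor('min').dt.total_seconds().div(60).astype(int),
    })
```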
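`pir1` function

```python
def pir1(s):
    # Format "HH Day" once per timestamp, then split each string into two columns.
    # np.char.split replaces np.core.defchararray.split (np.core is deprecated in NumPy 2.0).
    labels = s.index.strftime('%H %a').to_numpy(dtype=str)
    # .dt.seconds ignores whole days, which is fine because every bin is at most an hour.
    return pd.DataFrame(
        np.char.split(labels).tolist(),
        columns=['hour', 'day']
    ).assign(minute=(s.dt.seconds // 60).values)
```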
`pir2` function

```python
def pir2(s):
    # Build one dict per (timestamp, timedelta) pair in a Python-level loop.
    return pd.DataFrame([dict(
        hour=f'{i.hour:02d}',
        day=i.strftime('%a'),
        minute=v.seconds // 60
    ) for i, v in s.items()], columns=['hour', 'day', 'minute'])
```
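`pir3` function

```python
def pir3(s):
    # Weekday names looked up by position (Monday == 0) instead of via strftime.
    a = np.array('Mon Tue Wed Thu Fri Sat Sun'.split())
    return pd.DataFrame(dict(
        hour=s.index.hour.astype(str).str.zfill(2),
        day=a[s.index.weekday],
        # Casting to minute resolution drops the leftover seconds.
        minute=s.values.astype('timedelta64[m]').astype(int)
    ), columns=['hour', 'day', 'minute'])
```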
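The vectorized pieces that `pir3` relies on can be seen in isolation. This is a small sketch with made-up inputs: three timestamps on 2007-02-21 and 2007-02-22, and durations of 27 minutes 19 seconds and one hour.

```python
import numpy as np
import pandas as pd

# 1. Weekday names by position: DatetimeIndex.weekday uses Monday == 0.
names = np.array('Mon Tue Wed Thu Fri Sat Sun'.split())
idx = pd.DatetimeIndex(['2007-02-21 22:00', '2007-02-21 23:00', '2007-02-22 00:00'])
days = names[idx.weekday]

# 2. Zero-padded hours from the integer hour, without strftime.
hours = idx.hour.astype(str).str.zfill(2)

# 3. Casting timedelta64 to minute resolution discards the leftover seconds
#    (27 min 19 s becomes 27 min); .astype(int) then yields the minute count.
durations = np.array([27 * 60 + 19, 3600], dtype='timedelta64[s]')
minutes = durations.astype('timedelta64[m]').astype(int)

print(days.tolist(), hours.tolist(), minutes.tolist())
```

Each step works on whole arrays at once, so no Python code runs per row.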
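Benchmark
---------
For each input size `n`, the benchmark builds `s` from `n` hourly bins starting at the hour that contains `start`; the first bin begins at `start` rather than on the hour, and no bin ends after `end`. Each function is then called 100 times, and the total time in seconds is stored in `res`.

```python
from timeit import timeit

import numpy as np
import pandas as pd

res = pd.DataFrame(
    np.nan,
    index=[10, 30, 100, 300, 1000, 3000, 10000, 30000],
    columns='jez pir1 pir2 pir3'.split()
)

for i in res.index:
    start = pd.to_datetime("2007-02-21 22:32:41")
    rng = pd.date_range(start.floor('h'), periods=i, freq='h')
    end = rng.max() + pd.to_timedelta("01:32:41")
    # clip_lower/clip_upper were removed in pandas 1.0; clip(lower=..., upper=...) replaces them.
    left = pd.Series(rng, index=rng).clip(lower=start)
    # Adding a bare integer to a DatetimeIndex is no longer allowed; add one hour explicitly.
    right = pd.Series(rng + pd.Timedelta('1h'), index=rng).clip(upper=end)
    s = right - left
    for j in res.columns:
        # globals=globals() lets timeit see the functions and `s` directly.
        res.at[i, j] = timeit(f'{j}(s)', globals=globals(), number=100)
```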
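A single `timeit` call per size can be skewed by other activity on the machine. One common refinement, sketched here as an assumption rather than as part of the original benchmark, is to repeat each measurement and keep the fastest trial, as the Python `timeit` documentation recommends for `repeat`. `best_of` is a hypothetical helper name.

```python
from timeit import repeat


def best_of(func, arg, number=100, repeat_count=5):
    """Hypothetical helper: run func(arg) `number` times per trial for
    `repeat_count` trials and return the fastest trial's total seconds."""
    # Slower trials mostly reflect interference from other processes,
    # so the minimum is the least noisy estimate.
    trials = repeat(lambda: func(arg), number=number, repeat=repeat_count)
    return min(trials)
```

Inside the benchmark loop it would replace the single call, for example `res.at[i, j] = best_of(globals()[j], s)`.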
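Results
-------
The timings are plotted on log-log axes (plotting requires matplotlib):

```python
import matplotlib.pyplot as plt

res.plot(loglog=True)
plt.show()
```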
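Conclusions
-----------
All four functions do a constant amount of work per row, so every one of them scales as O(n); none of them processes all rows in O(1) total time. What separates them is the constant factor per row. `pir2` builds a Python dict for each row in an interpreted loop, which should make it the most expensive per row. `jez` and `pir1` are vectorized but still call `strftime` to format a string for every timestamp. `pir3` avoids `strftime` altogether: it reads integer hours and weekdays directly from the index, maps the weekdays through a seven-element lookup array, and casts the timedeltas to minute resolution in NumPy. That is the likely reason `pir3` is significantly faster than the other approaches even though its asymptotic complexity is the same.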
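The O(n) claim can be checked against the benchmark table by estimating the slope of log(time) against log(n): if time grows like c·n^k, the slope estimates k. This is a minimal sketch; `scaling_exponent` is a hypothetical helper, and the check below uses synthetic timings rather than measured ones.

```python
import numpy as np


def scaling_exponent(sizes, seconds):
    """Hypothetical helper: least-squares slope of log(seconds) vs log(sizes).

    About 1 means linear, O(n), growth; about 0 means the time does not
    depend on n.
    """
    slope, _intercept = np.polyfit(np.log(sizes), np.log(seconds), 1)
    return slope


# Synthetic check: times proportional to n give a slope of 1.
sizes = np.array([1_000, 3_000, 10_000, 30_000])
print(round(scaling_exponent(sizes, 2e-6 * sizes), 6))
```

On the benchmark, something like `res.loc[1000:].apply(lambda col: scaling_exponent(col.index, col))` restricts the fit to the larger sizes, since fixed per-call overhead flattens the curve when `n` is small.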
Last modified on 2024-12-19