Setting Batch Number for Set of Records in Python
In this article, we will explore how to set a batch number for a set of records in Python using the pandas library. We’ll start by understanding what a moving sum is and then move on to implementing it along with setting a batch number.
What is Moving Sum?
A moving sum is the total of a series of values over a specific window, often used in time-series data analysis. It’s called “moving” because the window slides forward, and the sum updates, as new data points are added.
For example, let’s say we have a daily sales data set with the following values:
| Day | Sales |
|---|---|
| 1 | 100 |
| 2 | 120 |
| 3 | 150 |
The moving sum of these sales over a period of 3 days would be 100 + 120 + 150 = 370. Dividing by the window length gives the related moving average: 370 / 3 ≈ 123.33.
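As a minimal sketch, the 3-day window over the sales table can be computed with pandas' `rolling`. The variable names here are illustrative, and the first two days have no value because the window is not yet full.

```python
import pandas as pd

# Daily sales from the table above, indexed by day
sales = pd.Series([100, 120, 150], index=[1, 2, 3], name="Sales")

# Sum and mean over a sliding window of 3 days
moving_sum = sales.rolling(window=3).sum()
moving_avg = sales.rolling(window=3).mean()

print(moving_sum)  # days 1 and 2 are NaN, day 3 is 370.0
print(moving_avg)  # day 3 is about 123.33
```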
Setting Batch Number for Set of Records
We are trying to achieve two things:
- Assign a batch number that increases each time the moving sum reaches 15.
- Reset the moving sum at that point, so that each total counts only the records in the current batch.
For instance, once the cumulative sum reaches 15, those rows should form one batch with a total of 15, and counting should start again from the next row.
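Before bringing in pandas, the two goals can be sketched on a plain list. This is an illustrative outline only: the helper name `assign_batches`, its `threshold` parameter, and the input values are assumptions made for the example.

```python
def assign_batches(values, threshold=15):
    """Return (running_sums, batch_numbers) for a list of record counts."""
    running, batch = 0, 1
    sums, batches = [], []
    for value in values:
        if running >= threshold:
            # The previous batch is full: reset the total, open a new batch
            running = 0
            batch += 1
        running += value
        sums.append(running)
        batches.append(batch)
    return sums, batches


sums, batches = assign_batches([7, 8, 4, 11, 3])
print(sums)     # [7, 15, 4, 15, 3]
print(batches)  # [1, 1, 2, 2, 3]
```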
Using pandas for Data Analysis
We will be using the pandas library in Python for this task. Pandas is a powerful data analysis library that provides data structures and functions designed to make working with structured data easy.
Let’s import the library:
import pandas as pd
Sample Data
First, let’s create our sample data set. We’ll enter the values from the question’s CSV file directly as a dictionary, so the example runs on its own.
# Create the DataFrame from an in-memory dictionary
data = {
    "id": [1, 2, 3, 4, 5, 6, 7],
    "date": ["2019-03-28 01:22:12", "2019-03-29 01:23:23", "2019-03-30 01:28:54",
             "2019-03-28 01:12:21", "2019-03-12 01:08:11", "2019-03-28 01:01:21",
             "2019-03-12 01:02:11"],
    "records": [5, 5, 5, 2, 1, 12, 1]
}
df = pd.DataFrame(data)
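If the records live in a CSV file instead, `pd.read_csv` can load them. The sketch below assumes the file has the columns `id`, `date` and `records`, and uses in-memory text in place of a real file path.

```python
import io

import pandas as pd

# In-memory CSV text standing in for a real file (illustrative only)
csv_text = """id,date,records
1,2019-03-28 01:22:12,5
2,2019-03-29 01:23:23,5
3,2019-03-30 01:28:54,5
4,2019-03-28 01:12:21,2
5,2019-03-12 01:08:11,1
6,2019-03-28 01:01:21,12
7,2019-03-12 01:02:11,1
"""

# parse_dates converts the date column from strings to datetime values
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df_csv.dtypes)
```

Parsing the dates up front is optional for batching, but it makes later sorting or filtering by date straightforward.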
Calculating Moving Sum
Next, we’ll loop over the rows to calculate the moving sum and assign a batch number to each row. The moving sum grows as rows are added and resets whenever a batch is complete.
# Lists that collect the moving sum and batch number for each row
moving = []
batch_numbers = []
cntr = 1  # current batch number
for idx, row in df.iterrows():
    if len(moving) == 0:
        # First row: there is no previous total, so it opens batch 1
        moving.append(row['records'])
        batch_numbers.append(cntr)
    elif moving[-1] < 15:
        # Batch not yet full: add the current records to the running total
        moving.append(row['records'] + moving[-1])
        batch_numbers.append(cntr)
    else:
        # Previous total reached 15: reset the sum and start a new batch
        moving.append(row['records'])
        cntr += 1
        batch_numbers.append(cntr)
# Add the calculated values to the DataFrame as new columns
df['moving_sum'] = moving
df['batch_number'] = batch_numbers
Output
Finally, let’s print the resulting DataFrame to see what it looks like.
print(df)
Explanation of Code
- `for idx, row in df.iterrows():` loops through each row of the DataFrame.
- `if len(moving) == 0:` handles the first row, when there is no previous total to add to. `moving.append(row['records'])` stores the row’s record count as the first moving sum, and `batch_numbers.append(cntr)` assigns it batch 1.
- `elif moving[-1] < 15:` applies while the current batch’s total is still below 15. `moving.append(row['records'] + moving[-1])` adds the current records to the running total, and `batch_numbers.append(cntr)` keeps the row in the current batch.
- `else:` applies once the previous total has reached 15 or more. `moving.append(row['records'])` restarts the moving sum from the current row, `cntr += 1` moves on to the next batch, and `batch_numbers.append(cntr)` assigns the new batch number.
For the sample data, this gives moving sums of 5, 10, 15, 2, 3, 15, 1 and batch numbers 1, 1, 1, 2, 2, 2, 3. Note that a batch closes only after its total reaches 15, so a large record can push a total past 15.
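A more compact alternative, not the approach used above, is to build the resetting sum with `itertools.accumulate` and derive the batch numbers from the rows where a reset happens. This is a sketch under the same rule (start a new batch once the previous total has reached 15); the names `reset_sum` and `THRESHOLD` are illustrative.

```python
from itertools import accumulate

import pandas as pd

THRESHOLD = 15  # batch size limit taken from the article's example

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6, 7],
                   "records": [5, 5, 5, 2, 1, 12, 1]})


def reset_sum(total, value):
    # Restart from the current value once the previous total hit the threshold
    return value if total >= THRESHOLD else total + value


df["moving_sum"] = list(accumulate(df["records"].tolist(), reset_sum))

# A row opens a new batch when the previous row's total reached the threshold;
# counting those rows cumulatively yields the batch number
starts_new_batch = df["moving_sum"].shift(fill_value=0) >= THRESHOLD
df["batch_number"] = starts_new_batch.cumsum() + 1

# Total records per batch, as a quick check of the grouping
batch_totals = df.groupby("batch_number")["records"].sum()
print(df)
print(batch_totals)  # batch 1 -> 15, batch 2 -> 15, batch 3 -> 1
```

This avoids `iterrows`, which tends to be slow on large frames, though the accumulation itself still runs in Python one value at a time.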
Conclusion
In this article, we explored how to calculate a moving sum and set a batch number for a set of records in Python using the pandas library. We covered what a moving sum is, created a sample data set, calculated a moving sum that resets at 15, and printed the resulting DataFrame.
Last modified on 2025-03-11