Calculating Average Amount Outstanding for Customers Live in Consecutive Months in a Time Series
In this article, we will explore how to calculate the average amount outstanding for customers who are live in consecutive months in a time series dataset. We will use Python and its popular data science library pandas to accomplish this task.
Problem Statement
Suppose you have a dataframe that sums the $ amount of money that a customer has in their account during a particular month. If the customer has no money, the amount is simply 0. The dataframe looks like the following:
| Customer | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 11/30/2015 | 0 | 1000 | 0 | 0 | 5000 | 0 |
| 12/31/2015 | 0 | 1000 | 0 | 0 | 5000 | 0 |
| 1/31/2016 | 2133 | 0 | 0 | 3000 | 5000 | 2000 |
| 2/29/2016 | 2133 | 0 | 0 | 3000 | 5000 | 2000 |
| 3/31/2016 | 2133 | 2000 | 4000 | 0 | 10000 | 2000 |
| 4/30/2016 | 0 | 2000 | 4000 | 0 | 10000 | 0 |
| 5/31/2016 | 0 | 2000 | 4000 | 0 | 10000 | 0 |
We need to calculate the average amount outstanding for customers who are live in consecutive months.
Solution
To solve this problem, we will follow these steps:
- Transform your data to Python readable form by removing ‘,’ and converting to integers.
- Check the condition whether the value is equal to 0 and shift the dataframe.
- Take negative of boolean dataframe which will only have live examples.
- Calculate the mean of the dataframe by axis one with only live values.
Step 1: Transform Your Data
First, we need to transform our data into a Python readable form. We can do this by removing ‘,’ and converting to integers.
import pandas as pd
# Create a sample dataframe
df = pd.DataFrame({
'A': ['0', '1000', '2133', '2133', '2133', '0'],
'B': ['1000', '0', '2000', '0', '2000', '2000'],
'C': ['0', '0', '0', '0', '4000', '4000'],
'D': ['0', '0', '3000', '3000', '0', '0'],
'E': ['5000', '5000', '5000', '5000', '10000', '10000'],
'F': ['0', '0', '2000', '2000', '2000', '0']
})
# Set the index to 'Customer'
df.set_index(['Customer'], inplace=True)
# Apply function to convert data to integers
df = pd.DataFrame(df.apply(lambda x: x.str.replace(',', '').astype(int), axis=1))
print(df)
Step 2: Check Condition and Shift Dataframe
Next, we need to check if the value is equal to 0 and shift the dataframe. We can do this by using the eq function to compare the values to 0 and then shifting the dataframe.
# Check condition whether the value is equal to 0 and shift dataframe
df = df[~df.eq(0).shift().fillna(False)]
print(df)
Step 3: Calculate Mean
Finally, we can calculate the mean of the dataframe by axis one with only live values. We can do this by using the apply function to calculate the sum and then dividing by the count.
# Take negative of boolean dataframe which will only have live examples
df1 = df[~df.eq(0).shift().fillna(False)]
# Calculate mean of dataframe by axis one with only live values
df1['Avg Outstanding'] = df1.apply(lambda x: sum(x)/x.ne(0).sum(), 1)
print(df1)
Example Use Case
The above code can be used to calculate the average amount outstanding for customers who are live in consecutive months. This is useful in finance and banking applications where we need to track customer accounts and calculate their average balance.
Note: The code assumes that the input dataframe has columns ‘A’, ‘B’, ‘C’, ‘D’, ‘E’, ‘F’ which represent the amounts of different types of transactions. It also assumes that the index is ‘Customer’.
Last modified on 2023-08-27