Create 48 Dataframes Based on 4 Countries and 12 Months Using Python Pandas Library

Filter Monthly Data Based on 12 Months and 4 Countries in Python

===========================================================

In this article, we will explore how to filter monthly data based on 12 months and 4 countries using Python. We will use the popular Pandas library for data manipulation and analysis.

Introduction


Data filtering is an essential step in data analysis. It allows us to extract the rows that meet specific criteria. In this article, we will focus on filtering monthly data based on 12 months and 4 countries using Python. We will use a sample dataset called “onlineretail”, which consists of several columns including InvoiceNo, Description, Quantity, Country, and Month.

Understanding the Problem


We have a dataset with the following structure (only the columns used for the split are shown; the code later also relies on a Quantity column):

| InvoiceNo | Description | Country | Month |
| --- | --- | --- | --- |
| 1 | Item 1 | France | Jan |
| 2 | Item 2 | USA | Feb |
| ... | ... | ... | ... |

We need to create 48 dataframes, one for each combination of the 4 countries and 12 months. The combinations follow the pattern shown below:

+------------+-----------+
| Country    | Month     |
+------------+-----------+
| France     | Jan       |
| France     | Feb       |
| ...        | ...       |
| USA        | Jan       |
| USA        | Feb       |
| ...        | ...       |
| Brazil     | Jan       |
| Brazil     | Feb       |
| ...        | ...       |
+------------+-----------+
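To make the target concrete, the 48 combinations can be enumerated with itertools.product. This is only a minimal sketch; the names countries, months, and combinations are illustrative and do not come from the dataset.

```python
from itertools import product

# Illustrative lists matching the 4 countries and 12 months used in this article
countries = ["France", "USA", "Mexico", "Brazil"]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Every (country, month) pair: 4 countries x 12 months
combinations = list(product(countries, months))

print(len(combinations))
print(combinations[:3])
```

Each pair in this list corresponds to one of the dataframes we want to end up with.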

Solution


To solve this problem, we will use the Pandas library in Python. We will first filter our dataframe to only include rows where the Country is one of the 4 countries and the Month is one of the 12 months.

Step 1: Filter Dataframe

We can filter our dataframe using the isin method provided by Pandas, which returns a boolean mask that is True for every row whose value appears in the given list.

# Define the list of countries
Country = ["France", "USA", "Mexico", "Brazil"]

# Define the list of months
Month = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Filter our dataframe to only include rows where Country is one of the 4 countries and Month is one of the 12 months
df = df[df['Country'].isin(Country) & df['Month'].isin(Month)]
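As a quick check of how this filter behaves, the sketch below applies the same kind of isin mask to a tiny, made-up dataframe. The rows (including the country "Germany") and the names toy, mask, and filtered are assumptions for illustration only.

```python
import pandas as pd

countries = ["France", "USA", "Mexico", "Brazil"]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Made-up rows: Germany is not in the country list, so its row should be dropped
toy = pd.DataFrame({
    "InvoiceNo": [1, 2, 3],
    "Country": ["France", "Germany", "USA"],
    "Month": ["Jan", "Jan", "Feb"],
})

# Both conditions must hold; & combines the two boolean masks element-wise
mask = toy["Country"].isin(countries) & toy["Month"].isin(months)
filtered = toy[mask]

print(filtered)
```

Only the France and USA rows should remain, because the Germany row fails the country condition.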

Step 2: Group by Country and Month

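We can group our filtered dataframe by Country and Month using the groupby method provided by Pandas. Converting the result to a dictionary gives one entry per combination, keyed by a (Country, Month) tuple.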

# Group our filtered dataframe by Country and Month
data = dict(list(df.groupby(['Country', 'Month'])))
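To see what such a dictionary holds, the sketch below builds one from a small made-up dataframe and inspects its keys and values. The sample rows and the names toy and groups are assumptions for illustration.

```python
import pandas as pd

# Made-up rows: two invoices for France in Jan, one for USA in Feb
toy = pd.DataFrame({
    "InvoiceNo": [1, 2, 3],
    "Country": ["France", "France", "USA"],
    "Month": ["Jan", "Jan", "Feb"],
})

# Each key is a (Country, Month) tuple; each value is the matching sub-dataframe
groups = dict(list(toy.groupby(["Country", "Month"])))

for key, group in groups.items():
    print(key, type(group).__name__, len(group))
```

Looking up a single combination is then a plain dictionary access, for example groups[("France", "Jan")].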

Step 3: Create Dataframes

We can create dataframes based on the grouped data. Each key in the dictionary is a tuple representing a unique combination of Country and Month, and its corresponding value is a Pandas DataFrame containing the rows for that combination.

# Create dataframes based on the grouped data, stored in a dictionary keyed by "Country Month"
dataframes = {}
for (country, month), df_country_month in data.items():
    print(f"{country} {month}")

    # df_country_month already holds only the rows for the current country and month.
    # Sum Quantity per InvoiceNo and Description, then turn each Description into a column.
    # InvoiceNo stays as the index, and missing combinations are filled with 0.
    dataframes[f"{country} {month}"] = (
        df_country_month.groupby(['InvoiceNo', 'Description'])['Quantity']
        .sum()
        .unstack()
        .fillna(0)
    )
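Note that groupby only returns combinations that actually occur in the data, so the dictionary may hold fewer than 48 entries. If an entry is needed for every combination, one possible approach (a sketch, not the only way) is to loop over all pairs and fall back to an empty dataframe. The helper name split_by_country_month and the toy rows are hypothetical.

```python
from itertools import product

import pandas as pd


def split_by_country_month(df, countries, months):
    """Return one dataframe per (country, month) pair, empty if the pair is absent."""
    groups = dict(list(df.groupby(["Country", "Month"])))
    empty = df.iloc[0:0]  # same columns as df, zero rows
    return {
        (country, month): groups.get((country, month), empty)
        for country, month in product(countries, months)
    }


countries = ["France", "USA", "Mexico", "Brazil"]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Made-up rows for illustration only
toy = pd.DataFrame({
    "InvoiceNo": [1, 2],
    "Country": ["France", "USA"],
    "Month": ["Jan", "Feb"],
})

frames = split_by_country_month(toy, countries, months)
print(len(frames))
print(frames[("Brazil", "Dec")].empty)
```

Whether empty dataframes are useful depends on what happens downstream; skipping absent combinations, as the groupby dictionary does, is often simpler.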

Example Use Case


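Let’s use a small example to demonstrate the approach end to end. The sample dataframe has only five rows, so it yields five dataframes rather than all 48; with a full dataset, every country and month combination present in the data gets its own dataframe.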

import pandas as pd

# Define the list of countries
Country = ["France", "USA", "Mexico", "Brazil"]

# Define the list of months
Month = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Create a sample dataframe (the Quantity values are made up for illustration)
sample = {
    'InvoiceNo': [1, 2, 3, 4, 5],
    'Description': ['Item 1', 'Item 2', 'Item 3', 'Item 4', 'Item 5'],
    'Quantity': [10, 5, 8, 3, 7],
    'Country': ['France', 'USA', 'Mexico', 'Brazil', 'France'],
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May']
}

df = pd.DataFrame(sample)

# Filter our dataframe to only include rows where Country is one of the 4 countries and Month is one of the 12 months
df = df[df['Country'].isin(Country) & df['Month'].isin(Month)]

# Group our filtered dataframe by Country and Month
data = dict(list(df.groupby(['Country', 'Month'])))

# Create dataframes based on the grouped data, stored in a dictionary keyed by "Country Month"
dataframes = {}
for (country, month), df_country_month in data.items():
    # Sum Quantity per InvoiceNo and Description, then turn each Description into a column.
    # InvoiceNo stays as the index, and missing combinations are filled with 0.
    dataframes[f"{country} {month}"] = (
        df_country_month.groupby(['InvoiceNo', 'Description'])['Quantity']
        .sum()
        .unstack()
        .fillna(0)
    )

# Print the dataframes
for name, frame in dataframes.items():
    print(name)
    print(frame)

Conclusion


In this article, we filtered monthly data down to 4 countries and 12 months with isin, grouped the result by Country and Month with groupby, and built one pivoted dataframe per combination. With a dataset that covers every combination, this yields the 48 dataframes we set out to create.

Last modified on 2023-08-02