Adding Columns to a Pandas DataFrame Based on Values of Another Column: A Step-by-Step Guide Using get_dummies

In this article, we’ll explore how to add new columns to a pandas DataFrame based on the values in another column. We’ll use a small sample dataset of service records, defined inline, and walk through the steps needed to count each profile’s services by category.

Background

Pandas is a Python library for manipulating and analyzing tabular data. In this article, we’ll focus on one common task: deriving new columns from the values of an existing column, here by turning a category code into one count column per category.

Solution Overview

To solve this problem, we can follow these steps:

  1. Map integer values to categories: We’ll create a mapping between the original integer values and their corresponding category labels.
  2. Use get_dummies to create dummy variables: This function will convert our categorical data into binary dummy variables that can be used for analysis or manipulation.
  3. Sum the dummy variables per profile: After creating the dummy variables, we’ll group by id_profile and sum them, which gives the number of services in each category for every profile.
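
To see why summing dummy variables produces counts, here is a minimal sketch on toy data. The names toy, owner, and label are hypothetical and are not part of the dataset used later in this article.

```python
import pandas as pd

# Toy data (hypothetical): three rows for two owners
toy = pd.DataFrame({
    'owner': ['a', 'a', 'b'],
    'label': ['x', 'y', 'x'],
})

# Each distinct label becomes its own 0/1 indicator column
dummies = pd.get_dummies(toy, columns=['label'], dtype=int)

# Summing the indicators within each owner counts how often each label occurs
toy_counts = dummies.groupby('owner').sum()
print(toy_counts)
```

Owner a ends up with one x and one y, while owner b has a single x. The same pattern is applied to the service data below.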

Step-by-Step Solution

Importing Libraries and Loading Data

First, we need to import pandas and create our data. To keep the example self-contained, the DataFrame is built inline rather than read from a file:

import pandas as pd

# Sample data: one row per service received by a profile
df = pd.DataFrame({
    'id_profile': [439, 444654, 56454, 56454, 444654, 56454, 12222, 12222, 12222, 12222],
    'ServiceDate': ['2017-12-05', '2017-01-25', '2017-12-05', '2017-01-25', '2017-03-01', '2017-01-01', '2017-01-05', '2017-01-30', '2017-03-01', '2017-03-20'],
    'PrimaryServiceCategory': [25, 25, 33, 25, 25, 25, 11, 25, 25, 25]
})

print(df)
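
If the same records were stored in a CSV file, they could be loaded with pd.read_csv instead. The sketch below assumes a file with the three column names used above; io.StringIO stands in for the file so the example runs on its own, and df_csv is a name chosen here for illustration.

```python
import io
import pandas as pd

# Stand-in for a file on disk; the column layout is an assumption
csv_text = """id_profile,ServiceDate,PrimaryServiceCategory
439,2017-12-05,25
444654,2017-01-25,25
56454,2017-12-05,33
"""

# pd.read_csv accepts a path or any file-like object;
# parse_dates converts ServiceDate from text to datetimes
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=['ServiceDate'])
print(df_csv.dtypes)
```

With a real file, the first argument would be its path, for example pd.read_csv('services.csv', ...), where the file name is hypothetical.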

Mapping Integer Values to Categories

Next, we’ll create a dictionary that maps our integer values to their corresponding category labels:

d = {11: 'eis', 33: 'ref', 25: 'her'}
df['Service'] = df['PrimaryServiceCategory'].map(d)
print(df)
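
One thing to watch with map: any code that is missing from the dictionary becomes NaN, and get_dummies ignores NaN by default, so such rows would silently drop out of the counts. The sketch below shows one way to detect this; the code 99 and the 'unknown' placeholder are assumptions made for illustration.

```python
import pandas as pd

d = {11: 'eis', 33: 'ref', 25: 'her'}

# 99 is a hypothetical code that has no entry in the mapping
codes = pd.Series([25, 11, 99])
labels = codes.map(d)

# Codes without a mapping come back as NaN
unmapped = codes[labels.isna()].unique()
print(unmapped)

# One option (an assumption, not part of the original steps): use a placeholder label
labels = labels.fillna('unknown')
print(labels.tolist())
```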

Using get_dummies to Create Dummy Variables

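Now we’ll use the get_dummies function to convert the Service column into binary dummy variables, sum them per id_profile to get counts, and join those counts back onto the original rows:

# Count services per profile: one-hot encode Service, then sum per id_profile
counts = pd.get_dummies(df[['id_profile', 'Service']], columns=['Service'], dtype=int)\
           .groupby('id_profile').sum()
print(counts)

# Attach the per-profile counts to every original row
df = df.set_index('id_profile').join(counts)
print(df)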
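
The join in this step aligns on the index, so a profile that appears in several rows receives the same counts on each of them. A minimal sketch of that behaviour, using a hypothetical two-profile miniature (rows, counts, and joined are illustrative names, with counts built by hand here):

```python
import pandas as pd

# Hypothetical miniature of the data: profile 1 has two rows, profile 2 has one
rows = pd.DataFrame({'id_profile': [1, 2, 1], 'Service': ['her', 'eis', 'her']})
counts = pd.DataFrame(
    {'Service_eis': [0, 1], 'Service_her': [2, 0]},
    index=pd.Index([1, 2], name='id_profile'),
)

# join aligns on the index, so every row of a profile receives that profile's counts
joined = rows.set_index('id_profile').join(counts)

# reset_index turns id_profile back into an ordinary column
joined = joined.reset_index()
print(joined)
```

The number of rows is unchanged by the join; only the count columns are added.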

Resulting DataFrame

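Aggregated per id_profile, the dummy columns hold the number of services in each category:

            Service_eis  Service_her  Service_ref
id_profile                                       
439                   0            1            0
12222                 1            3            0
56454                 0            2            1
444654                0            2            0

After the join, every row of the original DataFrame carries these three count columns for its profile.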
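
As a cross-check, the same per-profile counts can be computed with pd.crosstab. This is a different technique from the get_dummies approach in this article, shown only as a sketch; df_check and check are names introduced here so the example runs on its own.

```python
import pandas as pd

df_check = pd.DataFrame({
    'id_profile': [439, 444654, 56454, 56454, 444654, 56454, 12222, 12222, 12222, 12222],
    'PrimaryServiceCategory': [25, 25, 33, 25, 25, 25, 11, 25, 25, 25],
})
df_check['Service'] = df_check['PrimaryServiceCategory'].map({11: 'eis', 33: 'ref', 25: 'her'})

# crosstab counts how often each (id_profile, Service) pair occurs;
# add_prefix renames the columns to match the get_dummies naming
check = pd.crosstab(df_check['id_profile'], df_check['Service']).add_prefix('Service_')
print(check)
```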

Conclusion

In this article, we’ve demonstrated how to add new columns to a pandas DataFrame based on the values in another column. Using a small sample dataset, we mapped integer codes to category labels, converted the labels to dummy variables with get_dummies, and summed them per id_profile to get one count column per category. The same pattern applies whenever you need per-group counts of a categorical column.


Last modified on 2025-03-16