Randomly Sampling Tuples from Each Row in a Pandas DataFrame

The code below builds a dummy DataFrame and then uses apply with a lambda to draw a random sample from the tuple in each row. The number of elements to draw is read from that row's records_to_select column.

import pandas as pd
import random

# Create a dummy DataFrame: 100 rows, each holding a tuple of 6 random integers
# and the number of elements (1 to 5) to sample from that tuple
df = pd.DataFrame({
    'id': range(1, 101),
    'tups': [tuple(random.randint(1, 1000000) for _ in range(6)) for _ in range(100)],
    'records_to_select': [random.randint(1, 5) for _ in range(100)],
})

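# Use apply row by row (axis=1) to randomly sample from each tuple, without replacement;
# int() makes sure random.sample receives a plain Python integer
df['samples_from_tuple'] = df.apply(lambda x: tuple(random.sample(x['tups'], int(x['records_to_select']))), axis=1)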

# Print the first few rows of the dataframe
print(df.head())

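Running this code creates a 100-row DataFrame and stores each row's sample in a new column called samples_from_tuple. Because random.sample draws without replacement, no tuple position is selected more than once. The value in records_to_select (1 to 5 here) must not exceed the tuple length (6 here), otherwise random.sample raises a ValueError. The output differs between runs unless the random module is seeded.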
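If the requested count could ever exceed the tuple length, or if repeatable output is needed, the sampling step can be wrapped in a small helper. The following is only a sketch: the name sample_from_tuple, the clamping behavior, and the seed value 42 are illustrative assumptions, not part of the original solution.

```python
import random

def sample_from_tuple(values, k, rng=random):
    """Return a tuple of at most k items sampled without replacement from values.

    Hypothetical helper for illustration; not part of the original code.
    """
    # Clamp k to [0, len(values)] so random.sample never raises ValueError.
    k = max(0, min(int(k), len(values)))
    return tuple(rng.sample(values, k))

# A seeded generator makes repeated runs produce the same samples.
rng = random.Random(42)

tups = (10, 20, 30, 40, 50, 60)
picked = sample_from_tuple(tups, 3, rng)    # three distinct elements of tups
clamped = sample_from_tuple(tups, 10, rng)  # request too large: clamped to all six

print(picked)
print(clamped)
```

With the DataFrame above, the helper could replace the lambda body, for example df.apply(lambda x: sample_from_tuple(x['tups'], x['records_to_select'], rng), axis=1). Clamping silently returns fewer items than requested; raising an error instead may be preferable when an oversized request indicates bad data.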

Last modified on 2024-11-29