Performing Simulations Using Normal and Log-Normal Distributions in R

Performing Simulations and Combining the Data into One Data Frame

In this blog post, we will explore how to simulate a parameter X in R from either a normal or a log-normal distribution, with the choice controlled by a flag in the data. We will use the dplyr package to run the simulation for every row of a configuration table and combine the results into one data frame.

Understanding the Problem

We are given a dataset with several columns: SOURCE, NSUB, MEAN, SD, and DIST. The DIST column indicates whether the distribution is normal (0) or log-normal (1). We want to draw 1000 simulated values for each row of the dataset (a study can appear in more than one row) using the values in the MEAN and SD columns. If DIST is 0, we use a normal distribution; otherwise, we use a log-normal distribution.
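
To make the two cases concrete, here is a minimal sketch of how rnorm() and rlnorm() relate: a log-normal draw is the exponential of a normal draw on the log scale. The seed and parameter values are arbitrary illustrations, not part of the original dataset.

```r
# A log-normal variable is exp() of a normal variable on the log scale.
set.seed(1)
x_lnorm = rlnorm(5, meanlog = log(2.5), sdlog = 0.4)

set.seed(1)
x_exp_norm = exp(rnorm(5, mean = log(2.5), sd = 0.4))

# Same seed, same underlying normal draws, so the two vectors should agree
all.equal(x_lnorm, x_exp_norm)
```

This is why rlnorm() expects its parameters (meanlog, sdlog) on the log scale rather than on the scale of X itself.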

Solution Overview

To solve this problem, we need two pieces:

  1. doSim: A function that takes one row of the simulation configuration dataset as input and returns a data frame with simulated values.
  2. A dplyr pipeline: rowwise() and do() from the dplyr package apply doSim() to each row in the dataset and combine the results.

Step 1: Define the doSim Function

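The doSim() function takes one row of the simulation configuration and an optional seed value as input. It can set the seed for reproducibility, picks rnorm() or rlnorm() based on the DIST column, and uses MEAN directly in the normal case or log(MEAN) as the log-scale mean in the log-normal case. Note that rlnorm() is parameterized on the log scale (meanlog, sdlog), so SD is treated here as the standard deviation of log(X), and MEAN ends up as the median of X rather than its mean.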

Here’s how you can define this function in R:

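library(dplyr)

# Define the doSim function
# simConfig: one row of the configuration (SOURCE, NSUB, MEAN, SD, DIST)
# seed: optional; leave it NULL when calling doSim() once per row, because
#       resetting the same seed for every row makes the rows share one
#       random stream. Call set.seed() once before the loop instead.
doSim = function(simConfig, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)

  # DIST == 0 means normal; anything else means log-normal
  value = if (simConfig[["DIST"]] == 0) {
    rnorm(1000, mean = simConfig[["MEAN"]], sd = simConfig[["SD"]])
  } else {
    # rlnorm() takes log-scale parameters, so name them explicitly
    rlnorm(1000, meanlog = log(simConfig[["MEAN"]]), sdlog = simConfig[["SD"]])
  }

  data.frame(
    source = simConfig[["SOURCE"]],
    nsub = simConfig[["NSUB"]],
    value = value
  )
}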
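
If MEAN and SD in the configuration describe X on its original scale rather than log(X), the log-scale parameters can be derived by moment matching. The post does not say which scale is meant, so treat this as an assumption; the helper name lnormParams is hypothetical.

```r
# Hypothetical helper: convert an arithmetic mean (m) and SD (s) of X
# into the meanlog/sdlog parameters that rlnorm() expects.
lnormParams = function(m, s) {
  sdlog = sqrt(log(1 + (s / m)^2))
  meanlog = log(m) - sdlog^2 / 2
  list(meanlog = meanlog, sdlog = sdlog)
}

p = lnormParams(m = 2.5, s = 0.4)

set.seed(12345)
x = rlnorm(100000, meanlog = p$meanlog, sdlog = p$sdlog)

# The sample mean and SD should land close to 2.5 and 0.4
c(mean = mean(x), sd = sd(x))
```

With meanlog = log(MEAN) and sdlog = SD, as in doSim(), MEAN is the median of the simulated values instead. Which convention is right depends on how the source studies reported their summary statistics.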

Step 2: Apply doSim Function to the Dataset

We will use rowwise() together with do() from the dplyr package to call doSim() once for each row in the dataset. do() is marked as superseded in recent dplyr versions but still works.

Here’s how you can do this:

# Load required libraries
library(dplyr)

# Read simulation configuration dataset
dfX = read.table(text = "
SOURCE  NSUB   MEAN   SD   DIST
Study1  10     1.5    0.3  0
Study2  5      2.5    0.4  1
Study1  4      3.5    0.3  0
", header = TRUE, stringsAsFactors = FALSE)

# Set the seed once so the whole run is reproducible
set.seed(12345)

# Apply doSim to each row of dfX; do() stacks the per-row results
dfAll = dfX %>% 
  rowwise() %>% 
  do(doSim(.))
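
To see what the pipeline does under the hood, here is a minimal base R sketch of the same split-apply-combine idea: simulate one data frame per configuration row, then stack them with rbind(). The helper simOne is a hypothetical, compact stand-in for doSim(), and cfg repeats the three configuration rows from the post.

```r
# Same three configuration rows as in the post
cfg = data.frame(
  SOURCE = c("Study1", "Study2", "Study1"),
  NSUB = c(10, 5, 4),
  MEAN = c(1.5, 2.5, 3.5),
  SD = c(0.3, 0.4, 0.3),
  DIST = c(0, 1, 0)
)

# Hypothetical stand-in for doSim(): simulate n values for one row
simOne = function(row, n = 1000) {
  value = if (row$DIST == 0) {
    rnorm(n, mean = row$MEAN, sd = row$SD)
  } else {
    rlnorm(n, meanlog = log(row$MEAN), sdlog = row$SD)
  }
  data.frame(source = row$SOURCE, nsub = row$NSUB, value = value)
}

set.seed(12345)  # seed once, before the loop

# One data frame per row, then bind them into a single data frame
simList = lapply(seq_len(nrow(cfg)), function(i) simOne(cfg[i, ]))
simAll = do.call(rbind, simList)

dim(simAll)  # 3000 rows (3 configurations x 1000 draws), 3 columns

# Quick sanity check: the median per configuration should be near MEAN
aggregate(value ~ source + nsub, data = simAll, FUN = median)
```

The dplyr version is shorter, but the base R form makes the two steps (simulate per row, then combine) explicit.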

Step 3: Combine Simulated Data

Because doSim() returns a data frame for every configuration row, do() stacks those data frames into one. No separate combining step is needed.

The resulting data frame dfAll holds 1000 simulated values per configuration row (3000 rows here), with the columns source, nsub, and value. In an interactive session you can view it like so:

# View dfAll
View(dfAll)

Conclusion

In this article, we demonstrated how to use R's built-in rnorm() and rlnorm() functions to perform simulations driven by a simulation configuration dataset. We applied the dplyr package to run the simulation for every row and combine the results into one data frame.

Feel free to modify this function according to your requirements or experiment with different distribution types and parameters!


Last modified on 2025-02-23