Fitting a Curve to a Histogram in Python
=====================================================
In this article, we will explore how to fit a probability distribution curve to a histogram created from a pandas DataFrame. We’ll cover the Normal, Gamma, Beta, GEV, LogNormal, Weibull, and Rician distributions, with a code example for each.
Introduction
Histograms are a common visualization tool in statistics and data analysis for representing the distribution of a dataset. A histogram on its own shows only the empirical shape of the data; overlaying a fitted probability density curve lets us summarize that shape with a few parameters and judge how well a given distribution describes the data. In this article, we’ll discuss how to do this using Python’s popular libraries: NumPy, pandas, Matplotlib, and SciPy.
Understanding Probability Distributions
Before diving into the code examples, let’s briefly review some common probability distributions that can be used for fitting curves to histograms:
- Normal Distribution: The normal distribution is a continuous probability distribution with a symmetric bell-shaped curve. It’s commonly used in statistics and is often referred to as the “Gaussian” distribution.
- Gamma Distribution: The gamma distribution is a two-parameter family of continuous probability distributions that can be used to model positive random variables.
- Beta Distribution: The beta distribution is another two-parameter family of continuous probability distributions, often used in Bayesian inference and statistical modeling.
- GEV (Generalized Extreme Value) Distribution: The GEV distribution is a family that combines the Gumbel, Fréchet, and Weibull extreme value distributions into a single form. It is commonly used to model extreme value data such as block maxima.
- LogNormal Distribution: The lognormal distribution is a probability distribution of a random variable whose logarithm is normally distributed.
- Weibull Distribution: The Weibull distribution is a continuous probability distribution for positive values, defined by a shape and a scale parameter. It is widely used in reliability engineering to model time to failure.
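- Rician Distribution: The Rician (or Rice) distribution describes the magnitude of a two-dimensional vector whose components are independent normal variables with equal variance and a possibly nonzero mean. It commonly appears in signal processing.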
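For reference, each distribution used in the examples of this article corresponds to an object in scipy.stats. The short sketch below lists those objects together with the shape parameters SciPy expects; every continuous distribution additionally accepts loc and scale. The dictionary name DISTRIBUTIONS is only an illustrative choice.

```python
from scipy import stats

# Map the names used in this article to their scipy.stats objects.
DISTRIBUTIONS = {
    "Normal": stats.norm,
    "Gamma": stats.gamma,
    "Beta": stats.beta,
    "GEV": stats.genextreme,
    "LogNormal": stats.lognorm,
    "Weibull": stats.weibull_min,
    "Rician": stats.rice,
}

for name, dist in DISTRIBUTIONS.items():
    # `shapes` is a string naming the shape parameters, or None when
    # the family only has loc and scale (as for the normal).
    print(f"{name:10s} scipy.stats.{dist.name:12s} shapes: {dist.shapes}")
```

Knowing the shape parameter names helps avoid a common mistake: the positional arguments of pdf() after the shape parameters are loc and then scale, so it is usually safer to pass scale by keyword.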
Installing Required Libraries
Before we start coding, make sure you have the required libraries installed. You can install them using pip:
pip install numpy pandas matplotlib scipy
Fitting a Curve to a Histogram Using Various Distributions
1. Normal Distribution
To fit a normal distribution curve to the histogram, you can use the norm.pdf() function from SciPy’s statistics module.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
# Create a sample dataset
np.random.seed(0)
data = np.random.normal(loc=5, scale=2, size=1000)
# Calculate the mean and standard deviation of the data
mean = np.mean(data)
std_dev = np.std(data)
# Generate x values for plotting the normal distribution curve
x = np.linspace(mean - 3 * std_dev, mean + 3 * std_dev, 100)
# Plot the histogram of the data
plt.hist(data, bins=30, density=True, alpha=0.5, label='Data')
# Plot the normal distribution curve
plt.plot(x, norm.pdf(x, loc=mean, scale=std_dev), 'r-', label='Normal Distribution')
plt.legend()
plt.show()
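The example above works on a NumPy array. When the data sits in a pandas DataFrame, the same steps apply to a single column. The sketch below assumes a hypothetical column named "value". It also illustrates that norm.fit() returns the same maximum-likelihood estimates as np.mean() and np.std(), whereas the pandas Series.std() method defaults to the sample standard deviation (ddof=1) and therefore gives a slightly larger value.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Hypothetical DataFrame with one numeric column called "value"
rng = np.random.default_rng(0)
df = pd.DataFrame({"value": rng.normal(loc=5, scale=2, size=1000)})

# Drop missing values and convert the column to a NumPy array
values = df["value"].dropna().to_numpy()

# Maximum-likelihood estimates of the mean and standard deviation
mu, sigma = norm.fit(values)

# norm.fit matches np.mean / np.std (ddof=0); Series.std uses ddof=1
print(f"norm.fit:   mu={mu:.4f}, sigma={sigma:.4f}")
print(f"numpy:      mu={values.mean():.4f}, sigma={values.std():.4f}")
print(f"pandas std: sigma={df['value'].std():.4f}")

# Curve values ready to overlay on plt.hist(values, density=True)
x = np.linspace(values.min(), values.max(), 200)
pdf = norm.pdf(x, loc=mu, scale=sigma)
```

The arrays x and pdf can be passed to plt.plot() exactly as in the NumPy example.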
2. Gamma Distribution
To fit a gamma distribution curve to the histogram, you can use the gamma.pdf() function from SciPy’s statistics module.
import numpy as np
from scipy.stats import gamma
import matplotlib.pyplot as plt
# Create a sample dataset
np.random.seed(0)
data = np.random.gamma(shape=2, scale=1.5, size=1000)
# Estimate the shape and scale parameters by the method of moments,
# using mean = shape * scale and variance = shape * scale**2
shape = np.mean(data) ** 2 / np.var(data)
scale = np.var(data) / np.mean(data)
# Generate x values for plotting the gamma distribution curve
x = np.linspace(0, data.max(), 200)
# Plot the histogram of the data
plt.hist(data, bins=30, density=True, alpha=0.5, label='Data')
# Plot the gamma distribution curve (scale is passed by keyword because
# the positional argument after the shape is loc)
plt.plot(x, gamma.pdf(x, shape, scale=scale), 'r-', label='Gamma Distribution')
plt.legend()
plt.show()
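As an alternative to computing the parameters by hand, SciPy can estimate them by maximum likelihood with gamma.fit(). The following is a minimal sketch, assuming the data are strictly positive so that the location can be pinned at zero with floc=0. It prints the moment-based estimates next to the maximum-likelihood ones for comparison.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=1000)

# Maximum-likelihood fit; floc=0 keeps the location fixed at zero
shape_mle, loc_mle, scale_mle = gamma.fit(data, floc=0)

# Method-of-moments estimates for comparison
shape_mom = data.mean() ** 2 / data.var()
scale_mom = data.var() / data.mean()

print(f"MLE:     shape={shape_mle:.3f}, scale={scale_mle:.3f}")
print(f"Moments: shape={shape_mom:.3f}, scale={scale_mom:.3f}")
```

Without floc=0, gamma.fit() also estimates a location parameter, which can shift the curve away from zero and make the shape and scale harder to interpret.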
3. Beta Distribution
To fit a beta distribution curve to the histogram, you can use the beta.pdf() function from SciPy’s statistics module.
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt
# Create a sample dataset
np.random.seed(0)
data = np.random.beta(a=2, b=1, size=1000)
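# Estimate the shape parameters by the method of moments
mean = np.mean(data)
var = np.var(data)
common = mean * (1 - mean) / var - 1
a = mean * common
b = (1 - mean) * common
# Generate x values for plotting the beta distribution curve
# (stay just inside (0, 1), where the density is always finite)
x = np.linspace(0.001, 0.999, 200)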
# Plot the histogram of the data
plt.hist(data, bins=30, density=True, alpha=0.5, label='Data')
# Plot the beta distribution curve
plt.plot(x, beta.pdf(x, a, b), 'r-', label='Beta Distribution')
plt.legend()
plt.show()
4. GEV (Generalized Extreme Value) Distribution
To fit a GEV distribution curve to the histogram, you can use the genextreme.fit() and genextreme.pdf() functions from SciPy’s statistics module.
import numpy as np
from scipy.stats import genextreme
import matplotlib.pyplot as plt
# Create a sample dataset (NumPy has no GEV sampler, so draw from SciPy).
# SciPy's shape parameter c has the opposite sign of the usual xi.
data = genextreme.rvs(c=-0.1, loc=0, scale=1, size=1000, random_state=0)
# Estimate the shape, location, and scale parameters by maximum likelihood
shape, loc, scale = genextreme.fit(data)
# Generate x values for plotting the GEV distribution curve
x = np.linspace(data.min(), data.max(), 200)
# Plot the histogram of the data
plt.hist(data, bins=30, density=True, alpha=0.5, label='Data')
# Plot the GEV distribution curve
plt.plot(x, genextreme.pdf(x, shape, loc=loc, scale=scale), 'r-', label='GEV Distribution')
plt.legend()
plt.show()
5. LogNormal Distribution
To fit a lognormal distribution curve to the histogram, you can use the lognorm.pdf() function from SciPy’s statistics module.
import numpy as np
from scipy.stats import lognorm
import matplotlib.pyplot as plt
# Create a sample dataset
np.random.seed(0)
data = np.random.lognormal(mean=1.0, sigma=0.5, size=1000)
# Estimate the parameters of the underlying normal distribution from log(data)
mu = np.mean(np.log(data))
sigma = np.std(np.log(data))
# Generate x values for plotting the lognormal distribution curve
x = np.linspace(0.01, data.max(), 200)
# Plot the histogram of the data
plt.hist(data, bins=30, density=True, alpha=0.5, label='Data')
# Plot the lognormal distribution curve (SciPy uses s = sigma and scale = exp(mu))
plt.plot(x, lognorm.pdf(x, s=sigma, scale=np.exp(mu)), 'r-', label='LogNormal Distribution')
plt.legend()
plt.show()
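The lognormal case is easy to get wrong because NumPy and SciPy parameterize it differently: NumPy takes the mean and sigma of the underlying normal distribution, while SciPy's lognorm takes a shape s (equal to sigma) and a scale (equal to exp(mean)). The sketch below, which assumes the location is fixed at zero with floc=0, checks this correspondence using lognorm.fit().

```python
import numpy as np
from scipy.stats import lognorm

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5  # parameters of the underlying normal distribution
data = rng.lognormal(mean=mu, sigma=sigma, size=1000)

# With loc fixed at 0, s estimates sigma and scale estimates exp(mu)
s, loc, scale = lognorm.fit(data, floc=0)

log_data = np.log(data)
print(f"s          = {s:.3f}  (std of log(data):  {log_data.std():.3f})")
print(f"log(scale) = {np.log(scale):.3f}  (mean of log(data): {log_data.mean():.3f})")
```

The two columns should agree closely, which confirms that the fitted s and scale can be read as sigma and exp(mu).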
6. Weibull Distribution
To fit a Weibull distribution curve to the histogram, you can use the weibull_min.pdf() function from SciPy’s statistics module.
import numpy as np
from scipy.stats import weibull_min
import matplotlib.pyplot as plt
# Create a sample dataset
np.random.seed(0)
data = np.random.weibull(a=1.5, size=1000)
# Estimate the shape and scale parameters by maximum likelihood
# (floc=0 keeps the location fixed at zero)
shape, loc, scale = weibull_min.fit(data, floc=0)
# Generate x values for plotting the Weibull distribution curve
x = np.linspace(0, data.max(), 200)
# Plot the histogram of the data
plt.hist(data, bins=30, density=True, alpha=0.5, label='Data')
# Plot the Weibull distribution curve
plt.plot(x, weibull_min.pdf(x, shape, loc=loc, scale=scale), 'r-', label='Weibull Distribution')
plt.legend()
plt.show()
7. Rician Distribution
To fit a Rician distribution curve to the histogram, you can use the rice.pdf() function from SciPy’s statistics module.
import numpy as np
from scipy.stats import rice
import matplotlib.pyplot as plt
# Create a sample dataset (NumPy has no Rician sampler, so draw from SciPy)
data = rice.rvs(b=1.5, size=1000, random_state=0)
# Estimate the shape and scale parameters by maximum likelihood
# (floc=0 keeps the location fixed at zero)
shape, loc, scale = rice.fit(data, floc=0)
# Generate x values for plotting the Rician distribution curve
x = np.linspace(0, data.max(), 200)
# Plot the histogram of the data
plt.hist(data, bins=30, density=True, alpha=0.5, label='Data')
# Plot the Rician distribution curve
plt.plot(x, rice.pdf(x, shape, loc=loc, scale=scale), 'r-', label='Rician Distribution')
plt.legend()
plt.show()
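Each of these plots overlays a fitted density on a histogram drawn with density=True, so the bars and the curve share the same vertical scale. The choice of distribution depends on the characteristics of the data being modeled, such as its support (all real numbers, positive values only, or a bounded interval) and its skewness.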
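One way to make that choice less subjective is to fit several candidate distributions and compare them numerically. The sketch below is one possible approach rather than a definitive procedure: it fits each candidate by maximum likelihood and ranks the candidates by the Akaike information criterion (AIC), where a lower value indicates a better trade-off between fit and number of parameters. The helper name rank_fits and the candidate list are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def rank_fits(data, candidates):
    """Fit each candidate distribution and return (name, AIC) pairs sorted by AIC."""
    results = []
    for dist in candidates:
        params = dist.fit(data)                       # maximum-likelihood fit
        log_lik = np.sum(dist.logpdf(data, *params))  # log-likelihood at the fit
        k = len(params)                               # number of fitted parameters
        results.append((dist.name, 2 * k - 2 * log_lik))
    return sorted(results, key=lambda item: item[1])


# Sample data drawn from a gamma distribution
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=1000)

ranking = rank_fits(data, [stats.norm, stats.gamma, stats.lognorm])
for name, aic in ranking:
    print(f"{name:10s} AIC = {aic:.1f}")
```

AIC only ranks the candidates you supply, so it is still worth plotting the best-ranked curve over the histogram to confirm that the fit looks reasonable.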
Last modified on 2023-09-23