
NumPy - Visualize Distributions With Seaborn



Visualizing Distributions with Seaborn

When working with data, visualizing distributions is an important step in understanding the characteristics of the data.

Seaborn, built on top of Matplotlib, is a powerful visualization library in Python that simplifies the process of creating informative and attractive statistical plots.

In this tutorial, we will explore how to use Seaborn to visualize different types of distributions, including normal, uniform, and other probability distributions. We will also demonstrate how to enhance the visualization with customization options and styling.

What is Seaborn?

Seaborn is a Python visualization library that provides a high-level interface for creating attractive and informative statistical graphics. It integrates well with Pandas data structures and provides functions to visualize distributions, relationships, and trends in data with minimal code.

Because it builds on Matplotlib, Seaborn offers more streamlined functions for creating complex plots. It also automatically handles aesthetics, such as color schemes and labels, making your visualizations more attractive and easier to interpret.

Setting Up Seaborn

Before we start visualizing distributions with Seaborn, we need to install the necessary libraries and set up the environment. If Seaborn is not already installed, you can install it with pip, as shown below −

# Install Seaborn using pip
# (the leading "!" is for Jupyter/IPython cells; omit it in a terminal)
!pip install seaborn

In addition to Seaborn, we will use NumPy to generate data for the distributions and Matplotlib's pyplot module to add titles and axis labels. Here is the typical setup for importing these libraries −

import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

Once the libraries are imported, we can start generating and visualizing different types of distributions.
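
The examples in this tutorial draw fresh random numbers on every run, so the plots differ slightly each time. The following is a minimal sketch of one way to make the samples reproducible. It assumes that NumPy's Generator API (numpy.random.default_rng) is an acceptable substitute for the legacy numpy.random functions used in this tutorial; the seed value 42 is arbitrary −

```python
import numpy as np

# Create a seeded random number generator (the seed 42 is an arbitrary choice)
rng = np.random.default_rng(42)

# Draw samples; the same seed always yields the same values
data = rng.normal(loc=0, scale=1, size=1000)

# A second generator with the same seed reproduces the data exactly
rng_again = np.random.default_rng(42)
data_again = rng_again.normal(loc=0, scale=1, size=1000)

print(np.array_equal(data, data_again))  # True
```

Passing such a generator's output to sns.histplot() works the same way as with the arrays produced by numpy.random.normal().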

Visualizing a Normal Distribution

One of the most commonly used distributions in statistics is the normal distribution, also known as the Gaussian distribution. It is symmetric and bell-shaped, often used to model things like test scores, heights, and measurement errors.

We can generate random data from a normal distribution using NumPy's numpy.random.normal() function and then use Seaborn's seaborn.histplot() function to visualize the distribution.

Example

In the following example, the sns.histplot() function creates a histogram of the data. Setting the kde parameter to True adds a smooth Kernel Density Estimate (KDE) curve over the histogram, which approximates the shape of the underlying probability density function (PDF) −

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Generate random data from a normal distribution
data = np.random.normal(loc=0, scale=1, size=1000)

# Visualize the distribution using Seaborn
# kde=True adds a Kernel Density Estimate curve
sns.histplot(data, kde=True)  
plt.title('Normal Distribution')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

The resulting plot will show a bell-shaped curve, which is characteristic of the normal distribution −

[Image: NumPy Distribution with Seaborn]

Visualizing a Uniform Distribution

A uniform distribution is a type of distribution in which all outcomes are equally likely. In a continuous uniform distribution, the data points are spread evenly across a given range.

We can generate data from a uniform distribution using NumPy's numpy.random.uniform() function and visualize it using Seaborn.

Example

Here, the numpy.random.uniform() function generates random numbers in the half-open interval [low, high), which in this case means from 0 up to but not including 10. The histogram is approximately flat, indicating that all values within the specified range are equally likely −

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Generate random data from a uniform distribution
data_uniform = np.random.uniform(low=0, high=10, size=1000)

# Visualize the distribution using Seaborn
sns.histplot(data_uniform, kde=True)
plt.title('Uniform Distribution')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

The output shows bars of approximately equal height across the range. The KDE curve dips near 0 and 10 only because the smoothing extends past the edges of the data, not because values there are less likely −

[Image: Uniform NumPy Seaborn Distribution]
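
The flatness of the histogram can also be checked numerically. The following sketch uses only NumPy and counts the samples in 10 equal-width bins; the choice of 10 bins and the seed are assumptions made for illustration −

```python
import numpy as np

# Seeded generator so the counts are reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(0)
data_uniform = rng.uniform(low=0, high=10, size=1000)

# Count samples in 10 equal-width bins covering the range 0 to 10
counts, edges = np.histogram(data_uniform, bins=10, range=(0, 10))

# With 1000 samples and 10 bins, each bin holds about 100 samples on average
expected = data_uniform.size / len(counts)

print('Counts per bin:', counts)
print('Expected per bin:', expected)
print('Min and max:', data_uniform.min(), data_uniform.max())
```

The counts fluctuate around the expected value because the sample is random, and the maximum stays below 10 because the upper bound is excluded.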

Visualizing an Exponential Distribution

An exponential distribution is often used to model the time between events in a Poisson process. It is right-skewed, with a high frequency of small values and a long tail of larger values.

NumPy provides the numpy.random.exponential() function to generate random data from an exponential distribution.

Example

In the following example, we create a plot that shows a distribution with a peak near zero and a tail extending to the right. This is characteristic of exponential distributions, where the probability density decreases exponentially as the value increases −

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Generate random data from an exponential distribution
data_exponential = np.random.exponential(scale=1, size=1000)

# Visualize the distribution using Seaborn
sns.histplot(data_exponential, kde=True)
plt.title('Exponential Distribution')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

We get the output as shown below −

[Image: NumPy Seaborn Exponential Distribution]
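
The scale parameter of numpy.random.exponential() is the mean of the distribution, that is, the inverse of the event rate of the Poisson process. The sketch below illustrates this relationship with a hypothetical rate of 0.5 events per unit time; the rate, seed, and sample size are assumptions made for illustration −

```python
import numpy as np

# Seeded generator so the results are reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(0)

rate = 0.5          # hypothetical event rate (events per unit time)
scale = 1 / rate    # NumPy's scale parameter is the mean waiting time

data_exponential = rng.exponential(scale=scale, size=100_000)

# The sample mean should be close to scale,
# and the sample median close to scale * ln(2)
print('Sample mean:', data_exponential.mean())
print('Sample median:', np.median(data_exponential))
print('Theoretical median:', scale * np.log(2))
```

The median is smaller than the mean, which is another sign of the right skew visible in the histogram.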

Visualizing the Pareto Distribution

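The Pareto distribution follows a power law and is often used in economics to model wealth distribution. You can generate data for it using NumPy's numpy.random.pareto() function. This function draws from the Lomax (Pareto II) form, whose values start at 0; adding 1 to the samples gives the classical Pareto distribution with a minimum value of 1.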

Example

Let us visualize the Pareto distribution using Seaborn −

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Generate random data from a Pareto distribution
# Adding 1 shifts the minimum value from 0 to 1
data_pareto = np.random.pareto(a=2, size=1000) + 1

# Visualize the distribution using Seaborn
sns.histplot(data_pareto, kde=True)
plt.title('Pareto Distribution')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

The plot shows a highly skewed histogram with a long tail extending to the right, reflecting that a few large values dominate the dataset −

[Image: NumPy Seaborn Pareto Distribution]

Customizing Seaborn Plots

Seaborn allows you to customize the appearance of the plots easily. For instance, you can adjust the number of bins in the histogram, change the colors of the plot, or even modify the style of the plot. Here are a few ways to customize the appearance −

  • Change the number of bins: You can control the number of bins in the histogram by specifying the bins parameter.
  • Change the color: Use the color parameter to set a custom color for the plot.
  • Modify the style: Seaborn provides several built-in styles (such as 'darkgrid', 'whitegrid', etc.) that can be applied to the plot using sns.set_style().

Example

In the following example, we are creating a plot with 30 bins, a blue color, and a white grid background −

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Generate random data from a normal distribution
data = np.random.normal(loc=0, scale=1, size=1000)

# Customize the plot style
sns.set_style('whitegrid')

# Plot with 30 bins and a custom color
sns.histplot(data, bins=30, color='blue', kde=True)
plt.title('Customized Normal Distribution')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

The result produced is as follows −

[Image: NumPy Seaborn Customized Normal Distribution]