NumPy - Median



What is Median?

In mathematics, the median is the middle value of a set of numbers when they are arranged in order.

If the set has an odd number of values, the median is the middle one. If it has an even number of values, the median is the average of the two middle values.

The median is a robust measure of central tendency. Unlike the mean, it is barely affected by a few extreme values (outliers), so it often describes the "typical" value of skewed data better.
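
To illustrate the point about outliers, here is a minimal sketch comparing the mean and the median of the same data. The sample values are made up for illustration only −

```python
import numpy as np

# Hypothetical sample: five typical values and one extreme outlier
values = np.array([10, 12, 11, 13, 12, 500])

mean_value = np.mean(values)      # pulled upward by the outlier
median_value = np.median(values)  # stays close to the typical values

print("Mean:", mean_value)      # Mean: 93.0
print("Median:", median_value)  # Median: 12.0
```

The single value 500 moves the mean far away from the bulk of the data, while the median still reflects the typical values.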

The NumPy median() Function

The median() function in NumPy calculates the median of an array's elements. It returns the middle value of the sorted data, or the average of the two middle values if the array has an even number of elements. For example, np.median([1, 3, 2, 4]) returns 2.5.

You can also specify an axis to calculate the median along rows or columns.

Following is the basic syntax of the median() function in NumPy −

numpy.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)

Where,

  • a: The input array, or any object that can be converted to an array (such as a list).
  • axis: The axis or axes along which the median is computed. If None (default), the median is computed over the flattened array.
  • out: An optional array in which to store the result. It must have the same shape as the expected output. If None (default), a new array (or scalar) is returned.
  • overwrite_input: If True, NumPy may use the memory of the input array for its calculations. This saves memory, but the contents of the input are left in an undefined order (it may be partially sorted). Use it only when you no longer need the original data. The default is False.
  • keepdims: If True, the reduced axes are kept in the result as dimensions of size one, so the result broadcasts correctly against the original array. If False (default), the reduced axes are removed.
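
The optional parameters are easier to follow with a short example. The sketch below uses a small made-up array, and the variable names are illustrative only −

```python
import numpy as np

data = np.array([[1, 3, 5], [2, 4, 6]])

# keepdims=True leaves the reduced axis in place with length 1
row_medians = np.median(data, axis=1, keepdims=True)
print(row_medians.shape)  # (2, 1)

# The size-1 axis lets the result broadcast against the input
centered = data - row_medians

# out= writes the result into a preallocated array of the right shape
result = np.empty(3)
np.median(data, axis=0, out=result)
print(result)

# overwrite_input=True may reorder the input, so work on a copy
# and do not rely on the contents of scratch afterwards
scratch = data.copy()
overall = np.median(scratch, overwrite_input=True)
print(overall)
```

Passing a copy to overwrite_input=True defeats the memory saving, but it keeps this example safe. In practice you would pass an array you no longer need.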

Understanding the Median Calculation

The calculation of the median in a dataset follows these steps −

  • Step 1: Sort the array in ascending order.
  • Step 2: Find the middle element. If the number of elements is odd, the middle element is the median.
  • Step 3: If the number of elements is even, calculate the average of the two middle elements to get the median.
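
The steps above can be sketched as a small function for one-dimensional input. The name manual_median is hypothetical. It illustrates the definition and is not NumPy's internal implementation, which may use a different strategy −

```python
import numpy as np

def manual_median(values):
    """Hypothetical helper that follows the three steps for 1-D input."""
    # Step 1: sort the values in ascending order
    ordered = np.sort(np.asarray(values, dtype=float))
    n = ordered.size
    if n == 0:
        raise ValueError("cannot take the median of an empty array")

    mid = n // 2
    # Step 2: odd count, so the middle element is the median
    if n % 2 == 1:
        return ordered[mid]
    # Step 3: even count, so average the two middle elements
    return (ordered[mid - 1] + ordered[mid]) / 2.0

print(manual_median([1, 3, 5, 7, 9]))  # 5.0
print(manual_median([7, 1, 5, 3]))     # 4.0
```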

Example

Let us understand this concept with an example. The first array has an odd number of elements (5), so the middle element (5) is the median. NumPy returns it as the floating-point value 5.0.

The second array has an even number of elements (4), so the median is calculated by averaging the two middle elements (3 and 5), which gives 4.0 as the result −

import numpy as np

data_odd = np.array([1, 3, 5, 7, 9])
data_even = np.array([1, 3, 5, 7])

# Calculating the median for both datasets
median_odd = np.median(data_odd)
median_even = np.median(data_even)

print("Median of odd dataset:", median_odd)
print("Median of even dataset:", median_even)

Following is the output obtained −

Median of odd dataset: 5.0
Median of even dataset: 4.0

Computing Median along Different Axes

In NumPy, the axis parameter lets you compute the median along a specific axis of a multi-dimensional array. The axis you pass is the one that gets reduced. For example, in a 2D array −

  • axis=0: Compute the median down each column (across the rows), giving one value per column.
  • axis=1: Compute the median across each row (across the columns), giving one value per row.

Example

In the following example, we are computing the median along both axes of a 2D array −

import numpy as np

# Create a 2D array
data_2d = np.array([[1, 3, 5], [2, 4, 6], [7, 8, 9]])

# Calculate the median along axis 0 (columns)
median_axis_0 = np.median(data_2d, axis=0)

# Calculate the median along axis 1 (rows)
median_axis_1 = np.median(data_2d, axis=1)

print("Median along axis 0:", median_axis_0)
print("Median along axis 1:", median_axis_1)

In the output below, the median along axis 0 is computed by taking the median of each column. The median along axis 1 is calculated by taking the median of each row −

Median along axis 0: [2. 4. 6.]
Median along axis 1: [3. 4. 8.]
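
With a square array it is easy to confuse the two axes. The following sketch uses a made-up non-square array, so the shape of each result shows which axis was reduced −

```python
import numpy as np

# 2 rows, 4 columns
data = np.array([[1, 3, 5, 7],
                 [2, 4, 6, 8]])

col_medians = np.median(data, axis=0)  # one value per column
row_medians = np.median(data, axis=1)  # one value per row

print(col_medians.shape, col_medians)  # (4,) [1.5 3.5 5.5 7.5]
print(row_medians.shape, row_medians)  # (2,) [4. 5.]
```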

Median for Higher-Dimensional Arrays

The numpy.median() function also works for arrays with more than two dimensions. You can specify the axis along which to calculate the median, and the function will return the median for that axis while retaining the other dimensions. If no axis is specified, the median is calculated over the entire array.

Example

Following is an example to compute the median of a 3D array −

import numpy as np

# Create a 3D array
data_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# Median along axis 0
median_3d_axis_0 = np.median(data_3d, axis=0)

# Median along axis 1
median_3d_axis_1 = np.median(data_3d, axis=1)

# Median along axis 2
median_3d_axis_2 = np.median(data_3d, axis=2)

print("Median along axis 0:", median_3d_axis_0)
print("Median along axis 1:", median_3d_axis_1)
print("Median along axis 2:", median_3d_axis_2)

In this case, the median is calculated along each of the axes (0, 1, and 2) for the 3D array. The function returns the median values for each of the specified axes while preserving the other dimensions −

Median along axis 0: [[3. 4.]
 [5. 6.]]
Median along axis 1: [[2. 3.]
 [6. 7.]]
Median along axis 2: [[1.5 3.5]
 [5.5 7.5]]
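
The axis parameter also accepts a tuple of axes, which reduces several dimensions at once. Here is a brief sketch using the same 3D values as above −

```python
import numpy as np

data_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# No axis: median over all eight values
overall = np.median(data_3d)

# Reduce axes 1 and 2 together: one median per 2x2 block
per_block = np.median(data_3d, axis=(1, 2))

# Reduce axes 0 and 1 together: one median per position on the last axis
per_last = np.median(data_3d, axis=(0, 1))

print("Overall median:", overall)       # Overall median: 4.5
print("Median per block:", per_block)   # Median per block: [2.5 6.5]
print("Median per last index:", per_last)  # Median per last index: [4. 5.]
```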

Handling NaN (Not a Number) Values

Arrays sometimes contain NaN (Not a Number) values. If a NaN is present among the values being reduced, numpy.median() returns NaN for that result. To ignore NaN values instead, use the numpy.nanmedian() function, which computes the median of the remaining values.

Example

Following is an example to handle NaN values while calculating median in NumPy −

import numpy as np

# Create an array with NaN values
data_with_nan = np.array([1, 3, np.nan, 5, 7])

# Calculate the median while ignoring NaN values
median_without_nan = np.nanmedian(data_with_nan)

print("Median without NaN:", median_without_nan)

In this example, the np.nanmedian() function ignores the NaN value and computes the median of the remaining numbers (1, 3, 5, and 7), which is 4.0 −

Median without NaN: 4.0
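
To see the difference between the two functions side by side, here is a small sketch on a made-up 2D array. It also shows that np.nanmedian() accepts the axis parameter −

```python
import numpy as np

data_with_nan = np.array([[1.0, np.nan, 5.0],
                          [2.0, 4.0, 6.0]])

# np.median() propagates NaN into the result
plain = np.median(data_with_nan)

# np.nanmedian() ignores NaN: median of 1, 2, 4, 5, 6
ignoring = np.nanmedian(data_with_nan)

# Per-row medians, skipping the NaN in the first row
per_row = np.nanmedian(data_with_nan, axis=1)

print("np.median:", plain)                 # np.median: nan
print("np.nanmedian:", ignoring)           # np.nanmedian: 4.0
print("np.nanmedian per row:", per_row)    # np.nanmedian per row: [3. 4.]
```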