
NumPy - Removing Missing Data

Removing Missing Data from Arrays

Removing missing data from arrays involves cleaning the dataset by eliminating entries that contain NaN or other indicators of missing values.

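NaN (Not a Number) is a floating-point value that denotes an undefined or unrepresentable result. Because NaN propagates through arithmetic and through reductions such as sums and means, it is important to address NaN values before performing calculations to avoid misleading results.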
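As a small illustration of why NaN needs handling first, the sketch below shows a NaN spreading into a sum and a mean, and why an equality test cannot be used to find it. The array values are made up for the illustration.

```python
import numpy as np

# Illustrative data with one missing value
values = np.array([1.0, 2.5, np.nan, 4.7])

# A single NaN makes ordinary reductions return NaN
total = np.sum(values)
average = np.mean(values)

# NaN never compares equal to anything, including itself,
# so == cannot locate it; np.isnan() can
equal_mask = values == np.nan
isnan_mask = np.isnan(values)

print("Sum:", total)
print("Mean:", average)
print("values == np.nan:", equal_mask)
print("np.isnan(values):", isnan_mask)
```

Both reductions return nan here, and the equality comparison is False everywhere, which is why the examples on this page rely on np.isnan().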

Removing Missing Data from 1D Arrays

Removing missing data from 1D arrays involves filtering out elements that are marked as missing, usually represented by NaN (Not a Number). In a 1D array, missing values are identified using the np.isnan() function, which creates a boolean array where each "True" value corresponds to a "NaN" entry in the original array.

To remove these missing values, you invert the mask with the ~ operator so that it selects the non-NaN entries. Specifically, ~np.isnan(arr) generates a boolean array where True indicates valid data.

By using this mask to index the original array, you filter out all NaN values, resulting in a cleaned array that contains only valid entries.

Example

In the following example, we use Boolean indexing with the np.isnan() function to create a mask that identifies NaN values. We then invert this mask and use it to index the original array, which keeps only the non-NaN values −

import numpy as np

# Creating a 1D array with NaN values
arr = np.array([1.0, 2.5, np.nan, 4.7, np.nan, 6.2])

# Removing NaN values using Boolean indexing
cleaned_arr = arr[~np.isnan(arr)]

print("Original Array:\n", arr)
print("Cleaned Array (without NaN):\n", cleaned_arr)

Following is the output obtained −

Original Array:
[1.  2.5 nan 4.7 nan 6.2]
Cleaned Array (without NaN):
[1.  2.5 4.7 6.2]
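The one-line expression can also be broken into its intermediate steps. The following is a minimal sketch using the same data; the names nan_mask, valid_mask and n_removed are chosen here for illustration. It also shows that Boolean indexing returns a new array, so changing the cleaned array does not alter the original.

```python
import numpy as np

arr = np.array([1.0, 2.5, np.nan, 4.7, np.nan, 6.2])

# Step 1: True where the value is NaN
nan_mask = np.isnan(arr)

# Step 2: invert, so True marks valid entries
valid_mask = ~nan_mask

# Step 3: Boolean indexing builds a new array from the selected entries
cleaned_arr = arr[valid_mask]

# Counting the True values tells us how many entries were dropped
n_removed = int(nan_mask.sum())

# The result is a copy: editing it leaves the original untouched
cleaned_arr[0] = 100.0

print("NaN mask:", nan_mask)
print("Entries removed:", n_removed)
print("First element of original:", arr[0])
```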

Removing Missing Data from 2D Arrays

Removing missing data from 2D arrays involves eliminating rows or columns that contain NaN (Not a Number) values.

This process ensures that the dataset is cleaned and suitable for analysis or modeling. Depending on the specific requirements, you can choose to remove entire rows or columns where missing values are present.

Example

In this example, we use the np.isnan() function combined with the any() function along axis=1 to create a mask that identifies rows containing at least one NaN value. We then invert this mask to keep only the rows without NaN in the original 2D array −

import numpy as np 

# Creating a 2D array with NaN values
arr_2d = np.array([[1.0, np.nan, 3.5],
                   [np.nan, 5.1, 6.3],
                   [7.2, 8.1, 9.4]])

# Removing rows with NaN values
cleaned_arr_2d = arr_2d[~np.isnan(arr_2d).any(axis=1)]

print("Original 2D Array:\n", arr_2d)
print("Cleaned 2D Array (rows without NaN):\n", cleaned_arr_2d)

This will produce the following result −

Original 2D Array:
[[1.  nan 3.5]
 [nan 5.1 6.3]
 [7.2 8.1 9.4]]
Cleaned 2D Array (rows without NaN):
[[7.2 8.1 9.4]]
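How strict the row filter should be depends on the requirements. As a hedged sketch on made-up data, the code below compares dropping rows that contain any NaN with dropping only rows that are entirely NaN, by swapping any() for all().

```python
import numpy as np

# Illustrative data: the second row is missing entirely
arr_2d = np.array([[1.0, np.nan, 3.5],
                   [np.nan, np.nan, np.nan],
                   [7.2, 8.1, 9.4]])

nan_mask = np.isnan(arr_2d)

# Strict: drop a row if at least one value is NaN
rows_without_any_nan = arr_2d[~nan_mask.any(axis=1)]

# Lenient: drop a row only if every value is NaN
rows_not_all_nan = arr_2d[~nan_mask.all(axis=1)]

print("Rows with no NaN at all:\n", rows_without_any_nan)
print("Rows that are not entirely NaN:\n", rows_not_all_nan)
```

The lenient variant keeps partially filled rows, so the result can still contain NaN values that need to be handled in a later step.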

Removing Columns with Missing Data

Removing columns with missing data involves eliminating entire columns from a 2D array or dataset where any element is marked as missing, generally represented by NaN (Not a Number).

This is a common data cleaning step used to ensure that the dataset only includes columns with complete data, which can improve the quality of subsequent analyses.

Example

In the example below, we are creating a 2D array with some NaN values and removing columns that contain any NaN values using np.isnan() function combined with the any() function. This identifies columns with NaN values and then filters the array to exclude those columns −

import numpy as np

# Create a 2D array with some NaN values
arr_2d = np.array([[1.0, np.nan, 3.0],
                   [4.0, 5.0, 6.0],
                   [np.nan, 8.0, 9.0]])

# Remove columns with any NaN values
cleaned_arr_2d_cols = arr_2d[:, ~np.isnan(arr_2d).any(axis=0)]

print("Original 2D array:")
print(arr_2d)
print("2D array with columns containing NaN removed:")
print(cleaned_arr_2d_cols)

Following is the output of the above code −

Original 2D array:
[[ 1. nan  3.]
 [ 4.  5.  6.]
 [nan  8.  9.]]
2D array with columns containing NaN removed:
[[3.]
 [6.]
 [9.]]
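Row removal and column removal differ only in the axis that any() reduces over and in which dimension is indexed. The sketch below wraps both in one function; drop_nan_2d is a hypothetical helper written for this illustration, not a NumPy function.

```python
import numpy as np

def drop_nan_2d(arr, axis=0):
    """Hypothetical helper: axis=0 drops rows containing NaN,
    axis=1 drops columns containing NaN."""
    nan_mask = np.isnan(arr)
    if axis == 0:
        # Reduce across columns to get one flag per row
        return arr[~nan_mask.any(axis=1)]
    if axis == 1:
        # Reduce across rows to get one flag per column
        return arr[:, ~nan_mask.any(axis=0)]
    raise ValueError("axis must be 0 (rows) or 1 (columns)")

arr_2d = np.array([[1.0, np.nan, 3.0],
                   [4.0, 5.0, 6.0],
                   [np.nan, 8.0, 9.0]])

rows_kept = drop_nan_2d(arr_2d, axis=0)
cols_kept = drop_nan_2d(arr_2d, axis=1)

print("After dropping rows:\n", rows_kept)
print("After dropping columns:\n", cols_kept)
```

On this array, dropping rows keeps one row of three values, while dropping columns keeps one column of three values, so the choice determines which part of the data survives.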

Removing Missing Data from Multi-dimensional Arrays

Removing missing data from multi-dimensional arrays involves a process similar to that used for 1D and 2D arrays but applied to higher dimensions.

Multi-dimensional arrays (e.g., 3D or 4D arrays) present additional complexity because missing values may occur across multiple dimensions. The goal is to filter out slices or specific parts of the array that contain missing data.

Example

In the following example, we are creating a 3D array with some NaN values and removing slices (2D arrays) that contain any NaN values. We use the np.isnan() function combined with the any() function to identify slices with NaN values and then filter out those slices from the array −

import numpy as np 

# Creating a 3D array with NaN values
arr_3d = np.array([[[1.0, np.nan],
                    [3.5, 4.2]],
                   [[np.nan, 6.3],
                    [7.2, 8.1]]])

# Removing slices with NaN values
cleaned_arr_3d = arr_3d[~np.isnan(arr_3d).any(axis=(1, 2))]

print("Original 3D Array:\n", arr_3d)
print("Cleaned 3D Array (slices without NaN):\n", cleaned_arr_3d)

Both 2D slices in this array contain a NaN value, so every slice is removed and the result is an empty array (of shape (0, 2, 2)) −

Original 3D Array:
[[[1.  nan]
  [3.5 4.2]]

 [[nan 6.3]
  [7.2 8.1]]]
Cleaned 3D Array (slices without NaN):
[]
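To see the same technique produce a non-empty result, the sketch below uses a made-up 3D array in which one slice is complete. It also builds the tuple of axes from arr_3d.ndim, which is one way to apply the same idea to arrays with more dimensions.

```python
import numpy as np

# Illustrative data: three 2x2 slices, only the middle one is complete
arr_3d = np.array([[[1.0, np.nan],
                    [3.5, 4.2]],
                   [[5.0, 6.3],
                    [7.2, 8.1]],
                   [[np.nan, 1.1],
                    [2.2, 3.3]]])

# Reduce over every axis except the first, giving one flag per slice
other_axes = tuple(range(1, arr_3d.ndim))
keep = ~np.isnan(arr_3d).any(axis=other_axes)

cleaned_arr_3d = arr_3d[keep]

print("Slices kept:", keep)
print("Shape before:", arr_3d.shape)
print("Shape after:", cleaned_arr_3d.shape)
print(cleaned_arr_3d)
```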

Removing Missing Values from Structured Arrays

Removing missing values from structured arrays in NumPy involves handling arrays with complex data types where each element is a record or a row with multiple fields.

Structured arrays can include missing values (NaN or other placeholders) in specific fields. The goal is to filter out records that contain missing values, ensuring that only complete data is retained.

Example

In the following example, we define a structured array with fields 'name' and 'age', using 'f4' (float32) for the 'age' field to accommodate NaN values. We then create a boolean mask to identify and remove records with missing values in the 'age' field −

import numpy as np

# Define a structured array with fields 'name' and 'age'
# Use 'f4' (float32) for the 'age' field to handle NaN values
dtype = [('name', 'U10'), ('age', 'f4')]
data = [('Alice', 25.0), ('Bob', np.nan), ('Charlie', 30.0)]
structured_array = np.array(data, dtype=dtype)

# Identify missing values in the 'age' field
nan_mask = np.isnan(structured_array['age'])

# Remove records with missing values in the 'age' field
cleaned_structured_array = structured_array[~nan_mask]

print("Original structured array:")
print(structured_array)
print("Structured array with missing values removed:")
print(cleaned_structured_array)

After executing the above code, we get the following output −

Original structured array:
[('Alice', 25.) ('Bob', nan) ('Charlie', 30.)]
Structured array with missing values removed:
[('Alice', 25.) ('Charlie', 30.)]
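When more than one field can hold NaN, the per-field masks can be combined. The following is a minimal sketch under the assumption of an extra 'height' field and made-up records; np.isnan() is applied only to the floating-point fields, since it does not accept string fields such as 'name'.

```python
import numpy as np

# Assumed layout for illustration: two float fields that may hold NaN
dtype = [('name', 'U10'), ('age', 'f4'), ('height', 'f4')]
data = [('Alice', 25.0, 1.68),
        ('Bob', np.nan, 1.80),
        ('Charlie', 30.0, np.nan),
        ('Dana', 41.0, 1.75)]
people = np.array(data, dtype=dtype)

# Only floating-point fields are checked
float_fields = ['age', 'height']

# Start with "nothing missing" and OR in each field's NaN mask
nan_mask = np.zeros(len(people), dtype=bool)
for field in float_fields:
    nan_mask |= np.isnan(people[field])

complete_records = people[~nan_mask]

print("Records with a missing value:", people['name'][nan_mask])
print("Complete records:", complete_records['name'])
```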
