
NumPy - Removing Missing Data
Removing Missing Data from Arrays
Removing missing data from an array means eliminating the entries that contain NaN or another indicator of a missing value, so that only valid data remains.
NaN (Not a Number) is a special floating-point value used to denote undefined or unrepresentable results. Because NaN propagates through arithmetic, a single NaN can turn a sum or a mean into NaN, so it is important to deal with NaN values before performing calculations.
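The effect of NaN on calculations can be illustrated with a short sketch. The array values and variable names below are made up for illustration; only np.sum(), np.mean() and np.isnan() are NumPy functions −

```python
import numpy as np

# Illustrative data: one missing value among four entries
values = np.array([1.0, 2.5, np.nan, 4.7])

# Any arithmetic that touches a NaN yields NaN
total = np.sum(values)
average = np.mean(values)

# NaN never compares equal to anything, including itself,
# so an equality test cannot be used to locate it
equal_to_nan = values == np.nan

# np.isnan() is the reliable way to find NaN entries
is_missing = np.isnan(values)

print("Sum:", total)
print("Mean:", average)
print("values == np.nan:", equal_to_nan)
print("np.isnan(values):", is_missing)
```

Since the equality test never matches, the rest of this chapter relies on np.isnan() to build masks.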
Removing Missing Data from 1D Arrays
Removing missing data from a 1D array means filtering out the elements that are marked as missing, usually represented by NaN (Not a Number). The np.isnan() function identifies them by returning a boolean array in which each True value corresponds to a NaN entry in the original array.
To keep only the valid entries, you invert this mask with the ~ operator. The expression ~np.isnan(arr) produces a boolean array in which True indicates valid data.
Indexing the original array with the inverted mask filters out all NaN values and returns a new, cleaned array that contains only the valid entries.
Example
In the following example, we use the np.isnan() function to create a mask that identifies NaN values. We then invert this mask and use Boolean indexing to remove the NaN values from the original array −
import numpy as np

# Creating a 1D array with NaN values
arr = np.array([1.0, 2.5, np.nan, 4.7, np.nan, 6.2])

# Removing NaN values using Boolean indexing
cleaned_arr = arr[~np.isnan(arr)]

print("Original Array:\n", arr)
print("Cleaned Array (without NaN):\n", cleaned_arr)
Following is the output obtained −
Original Array:
 [1.  2.5 nan 4.7 nan 6.2]
Cleaned Array (without NaN):
 [1.  2.5 4.7 6.2]
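The masking steps can also be written out one at a time, which makes it easy to report how many values were dropped and where the surviving values came from. This is only a sketch; the variable names are illustrative, and np.count_nonzero() and np.flatnonzero() are used here as one possible way to obtain the count and the positions −

```python
import numpy as np

arr = np.array([1.0, 2.5, np.nan, 4.7, np.nan, 6.2])

nan_mask = np.isnan(arr)   # True where the value is NaN
valid_mask = ~nan_mask     # True where the value is usable

# Boolean indexing returns a new array; arr itself is unchanged
cleaned = arr[valid_mask]

# Number of removed entries and original positions of the kept ones
n_removed = np.count_nonzero(nan_mask)
kept_positions = np.flatnonzero(valid_mask)

print("NaN mask:", nan_mask)
print("Cleaned array:", cleaned)
print("Removed entries:", n_removed)
print("Kept positions:", kept_positions)
```

Keeping the positions is useful when the cleaned values later need to be matched back to the original data.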
Removing Missing Data from 2D Arrays
Removing missing data from 2D arrays involves eliminating rows or columns that contain NaN (Not a Number) values.
This process ensures that the dataset is cleaned and suitable for analysis or modeling. Depending on the specific requirements, you can choose to remove entire rows or columns where missing values are present.
Example
In this example, we combine the np.isnan() function with any(axis=1) to create a mask that identifies the rows containing at least one NaN value. We then invert this mask and use it to filter those rows out of the original 2D array −
import numpy as np

# Creating a 2D array with NaN values
arr_2d = np.array([[1.0, np.nan, 3.5],
                   [np.nan, 5.1, 6.3],
                   [7.2, 8.1, 9.4]])

# Removing rows with NaN values
cleaned_arr_2d = arr_2d[~np.isnan(arr_2d).any(axis=1)]

print("Original 2D Array:\n", arr_2d)
print("Cleaned 2D Array (rows without NaN):\n", cleaned_arr_2d)
This will produce the following result −
Original 2D Array:
 [[1.  nan 3.5]
 [nan 5.1 6.3]
 [7.2 8.1 9.4]]
Cleaned 2D Array (rows without NaN):
 [[7.2 8.1 9.4]]
Removing Columns with Missing Data
Removing columns with missing data involves eliminating entire columns from a 2D array or dataset where any element is marked as missing, generally represented by NaN (Not a Number).
This is a common data cleaning step used to ensure that the dataset only includes columns with complete data, which can improve the quality of subsequent analyses.
Example
In the example below, we create a 2D array with some NaN values and remove the columns that contain any NaN value. The np.isnan() function combined with any(axis=0) yields one flag per column, and the inverted flags are used to select the columns to keep −
import numpy as np

# Create a 2D array with some NaN values
arr_2d = np.array([[1.0, np.nan, 3.0],
                   [4.0, 5.0, 6.0],
                   [np.nan, 8.0, 9.0]])

# Remove columns with any NaN values
cleaned_arr_2d_cols = arr_2d[:, ~np.isnan(arr_2d).any(axis=0)]

print("Original 2D array:")
print(arr_2d)
print("2D array with columns containing NaN removed:")
print(cleaned_arr_2d_cols)
Following is the output of the above code −
Original 2D array:
[[ 1. nan  3.]
 [ 4.  5.  6.]
 [nan  8.  9.]]
2D array with columns containing NaN removed:
[[3.]
 [6.]
 [9.]]
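The row and column cases differ only in the axis passed to any() and in where the mask is placed in the index. The sketch below wraps both in a single function. Note that drop_nan_2d and its drop parameter are hypothetical names introduced here for illustration and are not part of NumPy −

```python
import numpy as np

def drop_nan_2d(arr, drop="rows"):
    """Hypothetical helper: drop the rows or columns of a 2D array that contain any NaN."""
    if arr.ndim != 2:
        raise ValueError("expected a 2D array")
    nan_mask = np.isnan(arr)
    if drop == "rows":
        # any(axis=1) collapses the columns, leaving one flag per row
        return arr[~nan_mask.any(axis=1)]
    if drop == "columns":
        # any(axis=0) collapses the rows, leaving one flag per column
        return arr[:, ~nan_mask.any(axis=0)]
    raise ValueError("drop must be 'rows' or 'columns'")

arr_2d = np.array([[1.0, np.nan, 3.0],
                   [4.0, 5.0, 6.0],
                   [np.nan, 8.0, 9.0]])

rows_kept = drop_nan_2d(arr_2d, drop="rows")
cols_kept = drop_nan_2d(arr_2d, drop="columns")

print("Rows without NaN:")
print(rows_kept)
print("Columns without NaN:")
print(cols_kept)
```

For this array, dropping rows keeps one complete row, while dropping columns keeps one complete column, so the better choice depends on which dimension carries the information you need.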
Removing Missing Data from Multi-dimensional Arrays
Removing missing data from multi-dimensional arrays involves a process similar to that used for 1D and 2D arrays but applied to higher dimensions.
Multi-dimensional arrays (e.g., 3D or 4D arrays) present additional complexity because missing values may occur across multiple dimensions. The goal is to filter out slices or specific parts of the array that contain missing data.
Example
In the following example, we are creating a 3D array with some NaN values and removing slices (2D arrays) that contain any NaN values. We use the np.isnan() function combined with the any() function to identify slices with NaN values and then filter out those slices from the array −
import numpy as np

# Creating a 3D array with NaN values
arr_3d = np.array([[[1.0, np.nan], [3.5, 4.2]],
                   [[np.nan, 6.3], [7.2, 8.1]]])

# Removing slices with NaN values
cleaned_arr_3d = arr_3d[~np.isnan(arr_3d).any(axis=(1, 2))]

print("Original 3D Array:\n", arr_3d)
print("Cleaned 3D Array (slices without NaN):\n", cleaned_arr_3d)
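Because both slices in this array contain a NaN value, every slice is removed and the cleaned array is empty. The output obtained is as shown below −
Original 3D Array:
 [[[1.  nan]
  [3.5 4.2]]

 [[nan 6.3]
  [7.2 8.1]]]
Cleaned 3D Array (slices without NaN):
 []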
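To see slice removal keep some data, the same technique can be applied to an array in which one slice is complete. The following sketch uses made-up values; passing axis=(1, 2) to any() reduces over the last two dimensions, leaving one flag per slice along the first axis −

```python
import numpy as np

# Illustrative 3D array: three 2x2 slices, only the middle one is complete
arr_3d = np.array([[[1.0, np.nan], [3.5, 4.2]],
                   [[5.0, 6.3], [7.2, 8.1]],
                   [[np.nan, 1.1], [2.2, 3.3]]])

# One flag per slice along axis 0
slice_has_nan = np.isnan(arr_3d).any(axis=(1, 2))

# Keep only the slices without NaN
cleaned = arr_3d[~slice_has_nan]

print("Slices containing NaN:", slice_has_nan)
print("Shape before:", arr_3d.shape)
print("Shape after:", cleaned.shape)
print(cleaned)
```

Comparing the shapes before and after is a quick way to check how much data a filter removed, and it makes an empty result easy to spot.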
Removing Missing Values from Structured Arrays
Removing missing values from structured arrays in NumPy involves handling arrays with complex data types where each element is a record or a row with multiple fields.
Structured arrays can include missing values (NaN or other placeholders) in specific fields. The goal is to filter out records that contain missing values, ensuring that only complete data is retained.
Example
In the following example, we define a structured array with fields 'name' and 'age', using 'f4' (float32) for the 'age' field to accommodate NaN values. We then create a boolean mask to identify and remove records with missing values in the 'age' field −
import numpy as np

# Define a structured array with fields 'name' and 'age'
# Use 'f4' (float32) for the 'age' field to handle NaN values
dtype = [('name', 'U10'), ('age', 'f4')]
data = [('Alice', 25.0), ('Bob', np.nan), ('Charlie', 30.0)]
structured_array = np.array(data, dtype=dtype)

# Identify missing values in the 'age' field
nan_mask = np.isnan(structured_array['age'])

# Remove records with missing values in the 'age' field
cleaned_structured_array = structured_array[~nan_mask]

print("Original structured array:")
print(structured_array)
print("Structured array with missing values removed:")
print(cleaned_structured_array)
After executing the above code, we get the following output −
Original structured array:
[('Alice', 25.) ('Bob', nan) ('Charlie', 30.)]
Structured array with missing values removed:
[('Alice', 25.) ('Charlie', 30.)]
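When several fields can hold missing values, the per-field masks can be combined into one. The sketch below is a possible approach under two assumptions that are not part of the example above: an extra 'height' field is added, and an empty string in the 'name' field is treated as a placeholder for a missing name −

```python
import numpy as np

# Assumed layout: 'height' is an extra field added for illustration
dtype = [('name', 'U10'), ('age', 'f4'), ('height', 'f4')]
data = [('Alice', 25.0, 1.65),
        ('Bob', np.nan, 1.80),
        ('Charlie', 30.0, np.nan),
        ('', 41.0, 1.75)]
records = np.array(data, dtype=dtype)

# Collect the floating-point fields, since only they can hold NaN
float_fields = [name for name in records.dtype.names
                if np.issubdtype(records.dtype[name], np.floating)]

# Start with "nothing missing" and OR in the NaN mask of each float field
missing = np.zeros(records.shape, dtype=bool)
for name in float_fields:
    missing |= np.isnan(records[name])

# Assumption: an empty string marks a missing name
missing |= records['name'] == ''

cleaned = records[~missing]

print("Float fields:", float_fields)
print("Records with missing data:", missing)
print("Cleaned records:")
print(cleaned)
```

Only records that are complete in every checked field survive, so it is worth deciding which fields really matter before combining the masks.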