
NumPy - Intersection



Intersection in NumPy

In NumPy, the term "intersection" refers to the elements that are common between two or more arrays.

NumPy provides a built-in function, numpy.intersect1d(), for finding the intersection of two arrays.

What is Array Intersection?

When you work with arrays, you might often need to find the elements that appear in both of them. This process is called finding the intersection.

For instance, if you have two sets of numbers and you need to determine which numbers appear in both, you can perform an intersection operation.
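To make the idea concrete, the sketch below compares Python's built-in set intersection (the & operator) with np.intersect1d() on two small, illustrative lists. One visible difference is that intersect1d() returns a sorted NumPy array, while a Python set has no order.

```python
import numpy as np

# Two small collections of numbers (illustrative data)
a = [7, 3, 9, 1]
b = [9, 2, 7, 5]

# Python's built-in set intersection: an unordered set of shared values
common_set = set(a) & set(b)

# NumPy's equivalent: a sorted 1-D array of shared values
common_arr = np.intersect1d(a, b)

print("Set intersection:", sorted(common_set))  # [7, 9]
print("NumPy intersection:", common_arr)        # [7 9]
```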

The NumPy intersect1d() Function

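In NumPy, the intersect1d() function finds the intersection of two arrays. It is designed for 1-dimensional data: inputs with more dimensions are flattened first. To intersect more than two arrays, the function is applied repeatedly, as described later in this tutorial.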

Following is the basic syntax of the NumPy intersect1d() function. It compares the two input arrays and returns a sorted 1-D array of the unique values that appear in both −

numpy.intersect1d(ar1, ar2, assume_unique=False, return_indices=False)

Where,

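  • ar1, ar2: The two input arrays whose common elements we want to find. Arrays that are not 1-D are flattened.
  • assume_unique: If set to True, the function assumes that both input arrays already contain only unique elements, which can speed up the computation. If the arrays actually contain duplicates, the result may be incorrect.
  • return_indices: If set to True, the function returns a tuple of three arrays: the intersection elements, the indices of their first occurrences in ar1, and the indices of their first occurrences in ar2.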
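Two details of the default behaviour are easy to miss: inputs that are not 1-D are flattened, and the result is always sorted with duplicates removed. The short sketch below, using illustrative arrays, shows both at once.

```python
import numpy as np

# A 2-D array with a repeated value, and a 1-D array with a repeated value
ar1 = np.array([[3, 1],
                [1, 5]])
ar2 = np.array([5, 3, 3, 7])

# ar1 is flattened, duplicates are dropped, and the result comes back sorted
result = np.intersect1d(ar1, ar2)

print(result)       # [3 5]
print(result.ndim)  # 1
```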

Example

In the following example, we are finding the common elements between two arrays using the numpy.intersect1d() function −

import numpy as np

# Define two arrays
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([4, 5, 6, 7, 8])

# Find intersection of the two arrays
intersection = np.intersect1d(array1, array2)

print("Intersection of array1 and array2:", intersection)

Following is the output obtained −

Intersection of array1 and array2: [4 5]

Assuming Unique Elements for Faster Computation

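In cases where you are sure that the input arrays contain only unique elements (i.e., no duplicates), you can pass True to the assume_unique parameter. This skips the internal step that removes duplicates, which can speed up the computation. Use it only when the assumption holds; otherwise the result may contain values that are not actually common to both arrays: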

Example

As in the previous example, the intersection is [4 5]; the only difference is that the function skips removing duplicates, which can make it faster on large inputs −

import numpy as np

# Define two arrays with unique elements
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([4, 5, 6, 7, 8])

# Find intersection assuming unique elements
intersection = np.intersect1d(array1, array2, assume_unique=True)

print("Intersection assuming unique elements:", intersection)

The output obtained is as follows −

Intersection assuming unique elements: [4 5]
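The speed-up comes with a catch. With assume_unique=True, NumPy does not remove duplicates, and its documentation warns that results can then be incorrect. In the sketch below (illustrative arrays), a value repeated in the first array is reported as "common" even though it never appears in the second array.

```python
import numpy as np

# array1 contains a duplicate, so it is NOT actually unique
array1 = np.array([1, 1, 2])
array2 = np.array([3])

# Default behaviour: duplicates are removed first, so the result is correct
correct = np.intersect1d(array1, array2)

# Wrong assumption: the duplicate 1 is mistaken for a common element
wrong = np.intersect1d(array1, array2, assume_unique=True)

print("assume_unique=False:", correct)  # []
print("assume_unique=True: ", wrong)    # [1]
```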

Returning Indices of Intersection Elements

In addition to the intersection elements, the numpy.intersect1d() function can also return the indices of these elements in the input arrays.

This is particularly useful when you want to know the exact positions of the common elements in the original arrays. To achieve this, set the return_indices parameter to True.

Example

In this example, the common elements 4 and 5 appear at indices 3 and 4 in array1 and at indices 0 and 1 in array2 −

import numpy as np

# Define two arrays
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([4, 5, 6, 7, 8])

# Find intersection and return indices
intersection, indices1, indices2 = np.intersect1d(array1, array2, return_indices=True)

print("Intersection elements:", intersection)
print("Indices in array1:", indices1)
print("Indices in array2:", indices2)

After executing the above code, we get the following output −

Intersection elements: [4 5]
Indices in array1: [3 4]
Indices in array2: [0 1]
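When a value occurs more than once, the returned index points to its first occurrence. The sketch below uses illustrative arrays with repeated values and checks that indexing each original array with the returned indices gives back the intersection.

```python
import numpy as np

# Arrays with repeated values (illustrative data)
a = np.array([5, 1, 5, 3])
b = np.array([3, 5, 5, 9])

common, idx_a, idx_b = np.intersect1d(a, b, return_indices=True)

print("Common elements:", common)  # [3 5]
print("Indices in a:", idx_a)      # [3 0]  (the first 5 in a is at index 0)
print("Indices in b:", idx_b)      # [0 1]

# The returned indices select the common values from each original array
print(np.array_equal(a[idx_a], common), np.array_equal(b[idx_b], common))  # True True
```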

Intersection of More Than Two Arrays

The numpy.intersect1d() function can also be used to find the intersection of more than two arrays.

While the function itself is designed to work with two arrays at a time, you can easily extend it to multiple arrays by using loops or the reduce() function from the functools module.

Example

As shown in the example below, the common element among all three arrays is 5, which forms the intersection −

import numpy as np
from functools import reduce

# Define multiple arrays
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([4, 5, 6, 7, 8])
array3 = np.array([5, 6, 7, 8, 9])

# Find intersection of all arrays
intersection = reduce(np.intersect1d, [array1, array2, array3])

print("Intersection of multiple arrays:", intersection)

The result produced is as follows −

Intersection of multiple arrays: [5]
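The same result can be obtained with an explicit loop, which also makes it easy to stop early once nothing is left in common. The helper name intersect_all below is hypothetical; it is a minimal sketch, not part of NumPy.

```python
import numpy as np

def intersect_all(arrays):
    """Intersect any number of arrays by applying np.intersect1d pairwise.

    intersect_all is a hypothetical helper used only for illustration.
    """
    if not arrays:
        raise ValueError("need at least one array")
    # Start from the sorted unique values of the first array
    result = np.unique(arrays[0])
    for arr in arrays[1:]:
        result = np.intersect1d(result, arr)
        if result.size == 0:  # nothing left in common; stop early
            break
    return result

array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([4, 5, 6, 7, 8])
array3 = np.array([5, 6, 7, 8, 9])

print("Intersection of multiple arrays:", intersect_all([array1, array2, array3]))  # [5]
```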

Working with Arrays of Different Data Types

NumPy's intersect1d() function can also be used with arrays of other data types, such as floats and strings.

When the two arrays have different numeric types, NumPy promotes them to a common type before comparing. Elements are therefore matched by value after promotion; for example, the integer 4 matches the float 4.0.

Example

In this example, the integer 4 in array2 matches the float 4.0 in array1, and the result is returned as a float because integers and floats are promoted to a floating-point type −

import numpy as np

# Define arrays with different data types
array1 = np.array([1.0, 2.0, 3.0, 4.0])
array2 = np.array([4, 5, 6, 7])

# Find intersection elements
intersection = np.intersect1d(array1, array2)

print("Intersection elements:", intersection)

The output obtained is as shown below −

Intersection elements: [4.]
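The sketch below, with illustrative data, shows the promotion rule on a second numeric pair and also intersects two string arrays. String comparison is exact, so "Banana" and "banana" are treated as different values.

```python
import numpy as np

# Mixed numeric types: the integers are promoted to float64 before comparing
floats = np.array([1.5, 2.0, 3.0])
ints = np.array([2, 3, 4])
num_common = np.intersect1d(floats, ints)
print(num_common, num_common.dtype)  # [2. 3.] float64

# Strings are compared exactly (case-sensitive)
fruits1 = np.array(["apple", "Banana", "cherry"])
fruits2 = np.array(["banana", "cherry", "apple"])
str_common = np.intersect1d(fruits1, fruits2)
print(str_common)  # ['apple' 'cherry']
```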

Dealing with Floating-Point Precision Issues

When working with floating-point numbers, precision issues can arise, especially when the values are very close to each other but not exactly the same due to the way floating-point arithmetic works. To avoid this, you can round the arrays before performing the intersection.
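The problem is easy to reproduce: 0.1 + 0.2 is stored as 0.30000000000000004, so it does not match 0.3 exactly. The sketch below (illustrative arrays) shows the exact intersection missing that pair and rounding both arrays recovering it.

```python
import numpy as np

# 0.1 + 0.2 is stored as 0.30000000000000004, not exactly 0.3
computed = np.array([0.1 + 0.2, 0.5])
expected = np.array([0.3, 0.5])

# Exact comparison misses the "equal-looking" pair
exact = np.intersect1d(computed, expected)
print(exact)    # [0.5]

# After rounding both arrays to 2 decimals, 0.3 matches as well
rounded = np.intersect1d(np.round(computed, 2), np.round(expected, 2))
print(rounded)  # [0.3 0.5]
```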

Example

Rounding both arrays to two decimal places before calling intersect1d() makes values that agree to two decimal places compare as equal, as shown in the example below −

import numpy as np

# Define floating-point arrays
array1 = np.array([1.234, 2.345, 3.456, 4.567])
array2 = np.array([4.567, 5.678, 6.789])

# Round arrays and find intersection
array1_rounded = np.round(array1, 2)
array2_rounded = np.round(array2, 2)

intersection = np.intersect1d(array1_rounded, array2_rounded)

print("Intersection after rounding:", intersection)

The output produced is as follows −

Intersection after rounding: [4.57]
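Rounding is not a complete fix: two values that differ by a tiny amount can still land on opposite sides of a rounding boundary. An alternative is to treat values as equal when they lie within a chosen tolerance, using np.isclose. The helper intersect_close and the tolerance values below are assumptions for illustration, not a NumPy API. Unlike intersect1d(), the result keeps the values from the first array, and the pairwise comparison matrix makes it suitable only for modest array sizes.

```python
import numpy as np

def intersect_close(a, b, atol=1e-9):
    """Return the unique values of a that lie within atol of some value in b.

    intersect_close is a hypothetical helper, not a NumPy function.
    It builds a len(a) x len(b) comparison matrix, so it suits small arrays.
    """
    a = np.unique(np.asarray(a, dtype=float))
    b = np.asarray(b, dtype=float).ravel()
    # close[i, j] is True when |a[i] - b[j]| <= atol (relative tolerance disabled)
    close = np.isclose(a[:, None], b[None, :], rtol=0.0, atol=atol)
    return a[close.any(axis=1)]

x = np.array([1.004999, 2.5])
y = np.array([1.005001, 3.0])

# Rounding to 2 decimals splits the pair: 1.0 vs 1.01
print(np.intersect1d(np.round(x, 2), np.round(y, 2)))  # []

# A tolerance keeps them together
print(intersect_close(x, y, atol=1e-5))  # [1.004999]
```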