Boolean Arrays and Masks.ipynb Colab
This section covers the use of Boolean masks to examine and manipulate values within NumPy arrays. Masking
comes up when you want to extract, modify, count, or otherwise manipulate values in an array based on some
criterion: for example, you might wish to count all values greater than a certain value, or perhaps remove all outliers
that are above some threshold. In NumPy, Boolean masking is often the most efficient way to accomplish these
types of tasks.
import numpy as np
x = np.array([1, 2, 3, 4, 5])
x == 3 # equal
As in the case of arithmetic operators, the comparison operators are implemented as ufuncs in NumPy; for example,
when you write x < 3, internally NumPy uses np.less(x, 3). A summary of the comparison operators and their
equivalent ufuncs is shown here:
Operator    Equivalent ufunc
==          np.equal
!=          np.not_equal
<           np.less
<=          np.less_equal
>           np.greater
>=          np.greater_equal
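As a quick check of this correspondence, the following minimal sketch compares each comparison operator with its ufunc on the small one-dimensional array used earlier. The variable names (pairs, all_match, less_than_3) are illustrative only and are not part of the original notebook.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Each comparison operator paired with the ufunc it corresponds to.
pairs = [
    (x == 3, np.equal(x, 3)),
    (x != 3, np.not_equal(x, 3)),
    (x < 3, np.less(x, 3)),
    (x <= 3, np.less_equal(x, 3)),
    (x > 3, np.greater(x, 3)),
    (x >= 3, np.greater_equal(x, 3)),
]

# The operator form and the ufunc form should agree element by element.
all_match = all(np.array_equal(op, uf) for op, uf in pairs)
print(all_match)

# The result of a comparison is an array of Booleans, one per element.
less_than_3 = x < 3
print(less_than_3)
```

The operator form is usually the more readable choice; the ufunc form is mainly useful when you need ufunc-specific arguments such as out.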
Just as in the case of arithmetic ufuncs, these will work on arrays of any size and shape. Here is a two-dimensional
example:
import numpy as np
x = np.array([[1, 2], [3, 4]])
x
array([[1, 2],
[3, 4]])
x > 10
array([[False, False],
[False, False]])
In each case, the result is a Boolean array, and NumPy provides a number of straightforward patterns for working
with these Boolean results.
print(x)
[[1 2]
[3 4]]
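To count the number of True entries in a Boolean array, np.count_nonzero is useful. For the array above, all four
entries are less than 6, so np.count_nonzero(x < 6) counts four. Another way to get at this information is to use
np.sum; in this case, False is interpreted as 0, and True is interpreted as 1: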
np.sum(x < 2)
# are all values less than 5?
np.all(x < 5)
True
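The counting and checking patterns described above can be sketched together on the same 2×2 array. This is a minimal illustration, not the notebook's original cells; the thresholds and variable names are chosen only for the example.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

# Count the True entries of a Boolean array in two equivalent ways.
n_small = np.count_nonzero(x < 3)   # counts entries where the condition holds
n_small_sum = np.sum(x < 3)         # same count, since True is treated as 1

# Aggregates accept an axis argument; axis=1 counts within each row.
per_row = np.sum(x < 3, axis=1)

# np.any and np.all answer "is any/every entry True?"
any_large = np.any(x > 3)
all_positive = np.all(x > 0)

# With axis=0 the check is done separately for each column.
all_small_per_col = np.all(x < 4, axis=0)

print(n_small, n_small_sum, per_row)
print(any_large, all_positive, all_small_per_col)
```

Note that these are the NumPy functions; Python's built-in sum, any, and all behave differently on multidimensional arrays, so the np. versions are the safer choice here.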
We've already seen how we might count, say, all days with rain less than four inches, or all days with rain greater than
two inches. But what if we want to know about all days with rain less than four inches and greater than one inch?
This is accomplished through Python's bitwise logic operators, &, |, ^, and ~. As with the standard arithmetic
operators, NumPy overloads these as ufuncs that work element-wise on (usually Boolean) arrays.
Combining comparison operators and Boolean operators on arrays can lead to a wide range of efficient logical
operations.
The following table summarizes the bitwise Boolean operators and their equivalent ufuncs:
Operator    Equivalent ufunc
&           np.bitwise_and
|           np.bitwise_or
^           np.bitwise_xor
~           np.bitwise_not
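To make the element-wise behavior concrete, here is a small sketch that combines comparisons with each of the four operators on the 2×2 array from earlier. The conditions and variable names are assumptions made for illustration.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

# Parentheses are needed: & and | bind more tightly than comparisons.
between = (x > 1) & (x < 4)        # greater than 1 AND less than 4
outside = (x <= 1) | (x >= 4)      # at most 1 OR at least 4
exactly_one = (x > 1) ^ (x > 3)    # true where exactly one condition holds
negated = ~between                 # element-wise NOT

# The operator form matches the explicit ufunc call.
same_as_ufunc = np.array_equal(between, np.bitwise_and(x > 1, x < 4))

print(between)
print(outside)
print(exactly_one)
print(np.array_equal(negated, outside), same_as_ufunc)
```

Without the parentheses, an expression such as x > 1 & x < 4 would be parsed as x > (1 & x) < 4, which is not the intended condition.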
x
array([[1, 2],
[3, 4]])
Suppose we want all values in the array that are less than 3. We can obtain a Boolean array for this condition
easily, as we've already seen:
x < 3
Now to select these values from the array, we can simply index on this Boolean array; this is known as a masking
operation:
x[x < 3]
array([1, 2])
What is returned is a one-dimensional array filled with all the values that meet this condition; in other words, all the
values in positions at which the mask array is True.
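Masks can be used for modifying values as well as extracting them. The sketch below shows both, under the assumption that we want to cap values above a hypothetical threshold of 3; the names mask, selected, scratch, and y are illustrative.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

# Build the mask once and reuse it.
mask = x < 3

# Indexing with the mask returns a new one-dimensional array.
selected = x[mask]

# The result is a copy, so changing it leaves x untouched.
scratch = x[mask]
scratch[0] = 99

# Assigning through a mask, by contrast, modifies the array in place.
y = x.copy()
y[y > 3] = 3   # cap everything above the (hypothetical) threshold of 3

print(selected)
print(x)
print(y)
```

Working on a copy (y) keeps the original array available for comparison; assign to x directly if in-place modification is what you want.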
By combining Boolean operations, masking operations, and aggregates, we can very quickly answer these sorts of
questions for our dataset.
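As a rough illustration of combining these three ingredients, the sketch below uses a short array of made-up measurements. The data, thresholds, and variable names are hypothetical and stand in for whatever dataset you are working with.

```python
import numpy as np

# Hypothetical measurements, invented purely for illustration.
values = np.array([0.0, 0.3, 1.2, 0.0, 2.5, 0.7, 4.1, 0.0])

# Masks built from comparisons and Boolean operators.
nonzero = values > 0
moderate = (values > 0.5) & (values < 3.0)

# Aggregates applied to the masks count matching entries.
n_nonzero = np.sum(nonzero)
n_moderate = np.sum(moderate)

# Aggregates applied to the masked values summarize only those entries.
median_nonzero = np.median(values[nonzero])
max_moderate = np.max(values[moderate])

print(n_nonzero, n_moderate)
print(median_nonzero, max_moderate)
```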
Aside: Using the Keywords and/or Versus the Operators &/|
One common point of confusion is the difference between the keywords and and or on one hand, and the
operators & and | on the other hand. When would you use one versus the other?
The difference is this: and and or gauge the truth or falsehood of an entire object, while & and | refer to bits within
each object.
When you use and or or, it's equivalent to asking Python to treat the object as a single Boolean entity. In Python, all
nonzero integers will evaluate as True. Thus:
bool(42), bool(0)
(True, False)
bool(42 and 0)
False
bool(42 or 0)
True
When you use & and | on integers, the expression operates on the bits of the element, applying the and or the or to
the individual bits making up the number:
bin(42 & 59)
'0b101010'
bin(42 | 59)
'0b111011'
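The same distinction carries over to arrays. A Boolean array can be thought of as a sequence of bits, so & and | act on each element, whereas and and or ask for the truth value of the whole array, which NumPy refuses to guess. The following is a minimal sketch with two made-up Boolean arrays, A and B.

```python
import numpy as np

# Two small Boolean arrays, invented for illustration.
A = np.array([1, 0, 1, 0, 1, 0], dtype=bool)
B = np.array([1, 1, 1, 0, 1, 1], dtype=bool)

# The operators work element by element.
elementwise_or = A | B
elementwise_and = A & B

# The keyword form needs a single truth value for the whole array,
# which is ambiguous for an array with more than one element.
try:
    A or B
    raised = False
except ValueError:
    raised = True

print(elementwise_or)
print(elementwise_and)
print(raised)
```

So for Boolean expressions on arrays, & and | are almost always the forms you want.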