vertopal.com_C1_W1_Lab_1_introduction_to_numpy_arrays
Welcome to your first notebook of this specialization! As mentioned in the lecture videos, you
will use Python for the labs and programming assignments in this specialization. In this course,
most of your work will be inside a library called NumPy. NumPy (Numerical Python) is an
open-source package that is widely used in science and engineering. You can check out the official
NumPy documentation for more details. In this notebook, you will use NumPy to create 2-D arrays
and perform mathematical operations on them. Feel free to skip this notebook if you are already
fluent with NumPy.
Instructions
Table of Contents
• About Jupyter Notebooks
• 1 - Basics of NumPy
– 1.1 - Packages
– 1.2 - Advantages of using NumPy arrays
– 1.3 - How to create NumPy arrays
– 1.4 - More on NumPy arrays
• 2 - Multidimensional arrays
– 2.1 - Finding size, shape and dimension
• 3 - Array math operations
– 3.1 - Multiplying vector with a scalar (broadcasting)
• 4 - Indexing and slicing
– 4.1 - Indexing
– 4.2 - Slicing
• 5 - Stacking
About Jupyter Notebooks
Jupyter Notebooks are interactive coding journals that integrate live code, explanatory text,
equations, visualizations and other multimedia resources, all in a single document. As a first
exercise, run the cell below to print "Hello World".
# Run this cell to print "Hello World".
test = "Hello World"
print(test)
Hello World
1 - Basics of NumPy
NumPy is the main package for scientific computing in Python. It performs a wide variety of
advanced mathematical operations with high efficiency. In this practice lab you will learn several
key NumPy functions that will help you in future assignments, such as creating arrays, slicing,
indexing, reshaping and stacking.
1.1 - Packages
Before you get started, you have to import NumPy to load its functions. Running the cell below
produces no output, but it imports the package (often referred to as a library) and makes its
functions available under the alias np. Try it for yourself and run the following cell.
import numpy as np
1.2 - Advantages of using NumPy arrays
NumPy provides an array object that is much faster and more compact than Python lists.
The library also offers many built-in functions that make computing much easier with only a
few lines of code. This can be a huge advantage when performing math operations on large
datasets.
The array object in NumPy is called ndarray meaning 'n-dimensional array'. To begin with, you
will use one of the most common array types: the one-dimensional array ('1-D'). A 1-D array
represents a standard list of values entirely in one dimension. Remember that in NumPy, all of
the elements within the array are of the same type.
print(one_dimensional_arr)
[10 12]
print(a)
[1 2 3]
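The two arrays printed above can be created with np.array(). The following is a minimal sketch, assuming definitions chosen to match those printed outputs. The variable mixed is a hypothetical addition that illustrates the same-type rule.

```python
import numpy as np

# Assumed definitions, chosen to match the outputs printed above.
one_dimensional_arr = np.array([10, 12])
a = np.array([1, 2, 3])
print(one_dimensional_arr)  # [10 12]
print(a)                    # [1 2 3]

# Hypothetical example: mixing an int and a float upcasts every element
# to float, because all elements of an array share one type.
mixed = np.array([1, 2.5])
print(mixed.dtype)          # float64
```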
Another way to create an array is using np.arange(). This function returns an array of
evenly spaced values within a given interval. To learn more about the arguments that this
function takes, you can use a powerful feature of Jupyter Notebook: pressing Shift+Tab with the
cursor on a function name opens its documentation. Give it a try for the built-in documentation
of np.arange().
print(b)
[0 1 2]
print(c)
[ 1 4 7 10 13 16 19]
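The outputs above are consistent with the two common call forms of np.arange(). This is a sketch under that assumption: the arguments below were chosen to reproduce the printed values.

```python
import numpy as np

# np.arange(stop): integers from 0 up to, but not including, stop.
b = np.arange(3)
print(b)  # [0 1 2]

# np.arange(start, stop, step): start at 1, stop before 20, step by 3.
c = np.arange(1, 20, 3)
print(c)  # [ 1  4  7 10 13 16 19]
```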
What if you wanted to create an array with five evenly spaced values in the interval from 0 to
100? For this you need to specify three parameters: the starting number (in this case 0), the
final number (100) and the number of elements in the array (5). NumPy provides np.linspace()
for exactly this purpose.
print(lin_spaced_arr)
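A minimal sketch of the call described above, assuming the array is named lin_spaced_arr as in the print statement. Note that np.linspace() includes the final value by default.

```python
import numpy as np

# Five evenly spaced values from 0 to 100, both endpoints included.
lin_spaced_arr = np.linspace(0, 100, 5)
print(lin_spaced_arr)        # the values 0, 25, 50, 75, 100 as floats
print(lin_spaced_arr.dtype)  # float64
```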
Did you notice that the output of the function is presented in floating-point form (e.g.
"... 25. 50. ...")? The reason is that the default type for values returned by np.linspace is
floating point (np.float64). You can easily specify your data type using dtype. If you access
the built-in documentation of the functions, you may notice that most functions take an
optional parameter dtype. In addition to float, NumPy has several other data types, such as int
and char.
To change the type to integers, you need to set the dtype to int. You can do so, even in the
previous functions. Feel free to try it out and modify the cells to output your desired data type.
print(lin_spaced_arr_int)
[ 0 25 50 75 100]
print(c_int)
[ 1 4 7 10 13 16 19]
print(b_float)
[0. 1. 2.]
print(char_arr)
Did you notice that the data type of the char_arr array is <U23? This means that
the string ('Welcome to Math for ML!') is a 23-character (23) Unicode string (U) on a
little-endian architecture (<). See the NumPy documentation on data types to learn more.
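The dtype examples above can be sketched as follows. The definitions are assumptions that reuse the variable names from the print statements and the arguments implied by their outputs.

```python
import numpy as np

# Assumed definitions matching the printed outputs above.
lin_spaced_arr_int = np.linspace(0, 100, 5, dtype=int)
c_int = np.arange(1, 20, 3, dtype=int)
b_float = np.arange(3, dtype=float)
char_arr = np.array(['Welcome to Math for ML!'])

print(lin_spaced_arr_int)  # integers 0, 25, 50, 75, 100
print(c_int)               # integers 1, 4, 7, ..., 19
print(b_float)             # [0. 1. 2.]
print(char_arr)
# On a little-endian machine this prints <U23: a 23-character Unicode string.
print(char_arr.dtype)
```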
1.4 - More on NumPy arrays
One of the advantages of using NumPy is that you can easily create arrays with built-in functions
such as:
print(ones_arr)
[1. 1. 1.]
print(zeros_arr)
[0. 0. 0.]
print(empt_arr)
[0. 0. 0.]
print(rand_arr)
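The arrays printed above can be produced with NumPy's built-in creation functions. This is a sketch, assuming each array has three elements as the outputs suggest. Note that np.empty() does not initialize its memory, so its contents are arbitrary and may differ from the zeros shown above.

```python
import numpy as np

ones_arr = np.ones(3)         # array of three ones
zeros_arr = np.zeros(3)       # array of three zeros
empt_arr = np.empty(3)        # uninitialized: values are arbitrary
rand_arr = np.random.rand(3)  # three random floats in the interval [0, 1)

print(ones_arr)   # [1. 1. 1.]
print(zeros_arr)  # [0. 0. 0.]
print(empt_arr)
print(rand_arr)
```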
2 - Multidimensional Arrays
With NumPy you can also create arrays with more than one dimension. In the above examples,
you dealt with 1-D arrays, where you can access their elements using a single index. A
multidimensional array has more than one axis, so each element needs one index per dimension.
Think of a 2-D array as an Excel sheet, where the rows form one dimension and the columns form
the other.
# Create a 2 dimensional array (2-D)
two_dim_arr = np.array([[1,2,3], [4,5,6]])
print(two_dim_arr)
[[1 2 3]
[4 5 6]]
An alternative way to create a multidimensional array is by reshaping the initial 1-D array. Using
np.reshape() you can rearrange elements of the previous array into a new shape.
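# 1-D array
one_dim_arr = np.array([1, 2, 3, 4, 5, 6])
# Reshape the 1-D array into a 2-D array with two rows and three columns
multi_dim_arr = np.reshape(one_dim_arr, (2, 3))
# Print the new 2-D array with two rows and three columns
print(multi_dim_arr)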
[[1 2 3]
[4 5 6]]
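2.1 - Finding size, shape and dimension
multi_dim_arr.ndim  # number of dimensions (axes)
2
multi_dim_arr.shape  # size along each dimension: (rows, columns)
(2, 3)
multi_dim_arr.size  # total number of elements
6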
3 - Array math operations
[ 3 7 11]
[1 1 1]
[ 2 12 30]
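The three outputs above are consistent with element-wise addition, subtraction and multiplication of two 1-D arrays. The following is a sketch, not the original cell: the operands arr_1 and arr_2 are assumed values, chosen because they reproduce those outputs.

```python
import numpy as np

# Assumed operands that reproduce the three outputs above.
arr_1 = np.array([2, 4, 6])
arr_2 = np.array([1, 3, 5])

# Arithmetic operators act element by element on arrays of equal shape.
addition = arr_1 + arr_2
subtraction = arr_1 - arr_2
multiplication = arr_1 * arr_2

print(addition)        # [ 3  7 11]
print(subtraction)     # [1 1 1]
print(multiplication)  # [ 2 12 30]
```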
3.1 - Multiplying vector with a scalar (broadcasting)
Suppose you need to convert miles to kilometers. To do so, you can use the NumPy array
functions that you've learned so far. You can do this by carrying out an operation between an
array (miles) and a single number (the conversion rate, which is a scalar). Since 1 mile is
approximately 1.6 km, you multiply the array by 1.6 and NumPy multiplies each element by that
value.
This concept is called broadcasting, which allows you to perform operations on
arrays of different shapes.
vector = np.array([1, 2])
vector * 1.6
array([1.6, 3.2])
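To see what broadcasting does, you can compare the scalar product with an explicit array of the same shape. This is an illustrative sketch: the names miles, km, grid and offsets are hypothetical and do not come from the notebook.

```python
import numpy as np

# Hypothetical distances in miles.
miles = np.array([1, 2])

# The scalar 1.6 is broadcast across every element of the array.
km = miles * 1.6
print(km)  # [1.6 3.2]

# Equivalent to multiplying by an array of the same shape filled with 1.6.
explicit = miles * np.full(miles.shape, 1.6)
print(np.array_equal(km, explicit))  # True

# Broadcasting also works between arrays: a 1-D row of shape (3,)
# is added to each row of a 2-D array of shape (2, 3).
grid = np.array([[1, 2, 3], [4, 5, 6]])
offsets = np.array([10, 20, 30])
shifted = grid + offsets
print(shifted)  # rows become [11 22 33] and [14 25 36]
```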
4 - Indexing and slicing
4.1 - Indexing
Let us select specific elements from the arrays as given.
# Select the third element of the array. Remember the counting starts
# from 0.
a = np.array([1, 2, 3, 4, 5])
print(a[2])
3
1
For multidimensional arrays with n dimensions, to index a specific element, you must input n
indices, one for each dimension. For a 2-D array there are two common ways to do this: either by
using two sets of brackets, or by using a single set of brackets and separating the indices with
a comma. Both methods are shown here.
# 3x3 array used in the 2-D indexing and slicing examples
two_dim = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Select element number 8 from the 2-D array using indices i, j and
# two sets of brackets
print(two_dim[2][1])
# Select element number 8 from the 2-D array, this time using i and j
# indexes in a single set of brackets, separated by a comma
print(two_dim[2,1])
8
8
4.2 - Slicing
Slicing gives you a subarray of the elements that you specify from the array. The slice notation
specifies a start and an end value, and selects the elements from start up to but not including
end (end-exclusive). Unlike slicing a Python list, basic slicing of a NumPy array returns a view
of the original data rather than a copy.
Note you can use slice notation with multi-dimensional indexing, as in a[0:2, :5]. This is the
extent of indexing you'll need for this course, but feel free to check out the official NumPy
documentation for more advanced array indexing techniques.
print(sliced_arr)
[2 3 4]
[1 2 3]
print(sliced_arr)
[3 4 5]
print(sliced_arr)
[1 3 5]
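The four outputs above are consistent with the following slices of the array a = [1, 2, 3, 4, 5] from the indexing section. This is a sketch under that assumption. The separate names for each slice, and the final view check on a copy, are hypothetical additions.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# start:end selects indices start, ..., end - 1.
middle = a[1:4]
print(middle)       # [2 3 4]

# Omitting start begins at index 0.
first_three = a[:3]
print(first_three)  # [1 2 3]

# Omitting end runs to the last element.
from_third = a[2:]
print(from_third)   # [3 4 5]

# A third value sets the step: every second element.
every_second = a[::2]
print(every_second) # [1 3 5]

# A basic slice is a view: writing to it changes the array it came from.
original = a.copy()
view = original[1:4]
view[0] = 99
print(original[1])  # 99
```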
sliced_arr_1 = two_dim[0:2]  # first two rows of the two_dim array
sliced_arr_1
array([[1, 2, 3],
[4, 5, 6]])
# Similarly, slice the two_dim array to get the last two rows
sliced_two_dim_rows = two_dim[1:3]
print(sliced_two_dim_rows)
[[4 5 6]
[7 8 9]]
# This example uses slice notation to get every row, and then pulls
# the second column.
# Notice how this example combines slice notation with the use of
# multiple indexes
sliced_two_dim_cols = two_dim[:, 1]
print(sliced_two_dim_cols)
[2 5 8]
5 - Stacking
Finally, stacking lets you build larger arrays from existing ones. It means joining two or more
arrays into a single array, either vertically (adding rows) or horizontally (adding columns).
a1 = np.array([[1,1],
[2,2]])
a2 = np.array([[3,3],
[4,4]])
print(f'a1:\n{a1}\n')
print(f'a2:\n{a2}')
a1:
[[1 1]
[2 2]]
a2:
[[3 3]
[4 4]]
print(vert_stack)
[[1 1]
[2 2]
[3 3]
[4 4]]
print(horz_stack)
[[1 1 3 3]
[2 2 4 4]]
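The stacked arrays printed above can be reproduced with np.vstack() and np.hstack(). This is a minimal sketch, assuming a1 and a2 hold the values shown in the earlier output.

```python
import numpy as np

# Assumed inputs, matching the a1 and a2 printed earlier.
a1 = np.array([[1, 1],
               [2, 2]])
a2 = np.array([[3, 3],
               [4, 4]])

# Stack vertically: the rows of a2 are placed below the rows of a1.
vert_stack = np.vstack((a1, a2))
print(vert_stack.shape)  # (4, 2)

# Stack horizontally: the columns of a2 are placed to the right of a1.
horz_stack = np.hstack((a1, a2))
print(horz_stack.shape)  # (2, 4)
```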
[array([[1, 1],
[2, 2]]), array([[3, 3],
[4, 4]])]
[array([[1],
[2]]), array([[1],
[2]]), array([[3],
[4]]), array([[3],
[4]])]
[array([[1],
[2]]), array([[1, 3, 3],
[2, 4, 4]])]
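The three lists of arrays above are consistent with splitting the horizontally stacked array using np.hsplit(). This is a sketch under that assumption. The result names are hypothetical, and the arguments were chosen to reproduce the printed lists.

```python
import numpy as np

# Assumed input: the horizontally stacked 2x4 array printed earlier.
horz_stack = np.array([[1, 1, 3, 3],
                       [2, 2, 4, 4]])

# An integer splits the columns into that many equal parts.
horz_split_two = np.hsplit(horz_stack, 2)    # two 2x2 arrays
horz_split_four = np.hsplit(horz_stack, 4)   # four 2x1 arrays

# A list of indices splits before each listed column index.
horz_split_first = np.hsplit(horz_stack, [1])  # a 2x1 and a 2x3 array

print(horz_split_two)
print(horz_split_four)
print(horz_split_first)
```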
[array([[1, 1],
[2, 2]]), array([[3, 3],
[4, 4]])]
[array([[1, 1]]), array([[2, 2]]), array([[3, 3]]), array([[4, 4]])]
# Split the vertically stacked array after the first and third row
vert_split_first_third = np.vsplit(vert_stack, [1, 3])
print(vert_split_first_third)
[array([[1, 1]]), array([[2, 2],
[3, 3]]), array([[4, 4]])]
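np.vsplit() accepts either a number of equal sections or a list of row indices. The sketch below assumes vert_stack is the 4x2 array printed earlier. The names vert_split_two and vert_split_four are hypothetical, and their arguments were chosen to match the outputs shown in this section that split the array into two and four equal parts.

```python
import numpy as np

# Assumed input: the vertically stacked 4x2 array printed earlier.
vert_stack = np.array([[1, 1],
                       [2, 2],
                       [3, 3],
                       [4, 4]])

# An integer splits the rows into that many equal parts.
vert_split_two = np.vsplit(vert_stack, 2)    # two 2x2 arrays
vert_split_four = np.vsplit(vert_stack, 4)   # four 1x2 arrays

# A list of indices splits before rows 1 and 3,
# giving parts with 1, 2 and 1 rows.
vert_split_first_third = np.vsplit(vert_stack, [1, 3])

print(vert_split_two)
print(vert_split_four)
print(vert_split_first_third)
```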