D2L CH2 Part1
Chapter 2: Preliminaries
Data Manipulation and Preprocessing
Contents (CH2)
• Tensors
• Indexing and Slicing
• Basic Properties of Tensor Arithmetic
• Saving Memory
• Conversion to Other Python Objects
• Matrix-Matrix Multiplication
• Norms
• More on Linear Algebra
• Chain Rule
• Computing the Gradient of Python Control Flow
• +6 sections
Preliminaries
• All machine learning is concerned with extracting information from data.
• Machine learning typically requires working with large datasets, which we can think of as tables
o The rows correspond to examples, and
o The columns correspond to attributes.
• Linear algebra gives us a powerful set of techniques for working with tabular data.
Preliminaries
• Additionally, deep learning is all about optimization.
o We have a model with some parameters and we want to find those that fit our data the best.
o Determining which way to move each parameter at each step of an algorithm requires a little bit of calculus.
o The autograd package automatically computes differentiation for us.
• This chapter provides a rapid introduction to basic and frequently-used mathematics to allow anyone
to understand at least most of the mathematical content of the book.
Data Manipulation
• There are two important things we need to do with data:
o Acquire them
o Process them once they are inside the computer.
Getting Started
• To start:
o Import the np and npx modules from MXNet.
o The np module includes functions supported by NumPy.
o The npx module contains a set of extensions developed to empower deep learning within a NumPy-like environment.
from mxnet import np, npx
npx.set_np()
Getting Started
• We can use np.arange to create a row vector containing the first 12 integers starting with 0.
o They are created as floats by default.
x = np.arange(12)
x
• We can access a tensor's shape (the length along each axis) by inspecting its shape property.
x.shape
• To know the total number of elements in a tensor, i.e., the product of all of the shape elements, we can inspect its size.
x.size
Getting Started
• To change the shape of a tensor without altering either the number of elements or their values, invoke the reshape function.
X = x.reshape(3, 4)
X
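As an aside, a small sketch (assuming MXNet's np.reshape follows the NumPy convention of inferring one dimension when given -1):
X = x.reshape(-1, 4)  # -1 lets reshape infer the first dimension (here, 3)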
• We can create a tensor with all elements set to 0 and a shape of (2, 3, 4) as follows:
np.zeros((2, 3, 4))
• The following snippet creates a tensor with shape (3, 4). Each of its elements is randomly sampled from a standard
Gaussian (normal) distribution with a mean of 0 and a standard deviation of 1.
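A minimal sketch of that snippet (assuming MXNet's np.random.normal mirrors the NumPy API):
np.random.normal(0, 1, size=(3, 4))  # mean 0, standard deviation 1, shape (3, 4)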
• We can also specify the exact values for each element in the desired tensor by supplying a Python list (or list of lists)
containing the numerical values.
np.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
Operations
• Some of the simplest and most useful operations are the elementwise operations.
o These apply a standard scalar operation to each element of an array.
• We would denote:
o A unary scalar operator (taking one input) by the signature f: ℝ → ℝ.
o A binary scalar operator (taking two real inputs and yielding one output) by the signature f: ℝ, ℝ → ℝ.
o Given any two vectors u and v of the same shape, and a binary operator f, we can produce a vector c = F(u, v) by setting c_i ← f(u_i, v_i) for all i, where c_i, u_i, and v_i are the i-th elements of the vectors c, u, and v.
o Here, we produced the vector-valued F: ℝ^d, ℝ^d → ℝ^d by lifting the scalar function f to an elementwise vector operation.
Operations
• In the following example, we use commas to formulate a 5-element tuple, where each element is the result of an
elementwise operation.
x = np.array([1, 2, 4, 8])
y = np.array([2, 2, 2, 2])
x + y, x - y, x * y, x / y, x ** y # The ** operator is exponentiation
• Many more operations can be applied elementwise, including unary operators like exponentiation.
np.exp(x)
• In addition to elementwise computations, we can also perform linear algebra operations, including vector dot
products and matrix multiplication.
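As a quick sketch using the vectors x and y defined above (np.dot follows the NumPy API):
np.dot(x, y)  # vector dot product: 1*2 + 2*2 + 4*2 + 8*2 = 30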
Operations
• We can also concatenate multiple tensors together, stacking them end-to-end to form a larger tensor.
• The example below shows what happens when we concatenate two matrices along:
o Rows: axis 0, the first element of the shape.
o Columns: axis 1, the second element of the shape.
X = np.arange(12).reshape(3, 4)
Y = np.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
np.concatenate([X, Y], axis=0), np.concatenate([X, Y], axis=1)
• Sometimes, we want to construct a binary tensor via logical statements. Take X == Y as an example: for each position, the result is 1 (true) if X and Y are equal there, and 0 (false) otherwise.
X == Y
• Summing all the elements in the tensor yields a tensor with only one element.
X.sum()
Broadcasting Mechanism
• Under certain conditions, even when shapes differ, we can still perform elementwise operations by invoking the
broadcasting mechanism. This mechanism works in the following way:
o Expand one or both arrays by copying elements appropriately so that the two tensors have the same shape.
o Carry out the elementwise operations on the resulting arrays.
• In most cases, we broadcast along an axis where an array initially only has length 1, such as in the following example:
a = np.arange(3).reshape(3, 1)
b = np.arange(2).reshape(1, 2)
a, b
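Since a has shape (3, 1) and b has shape (1, 2), adding them broadcasts both operands to a common (3, 2) shape:
a + b  # result has shape (3, 2)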
Indexing and Slicing
• Elements in a tensor can be accessed by index.
o The first element has index 0, and ranges include the first index but exclude the last.
[Diagram: indices 0–7 of a tensor; the slice 2:5 selects the elements at indices 2, 3, and 4.]
• We can access elements according to their relative position to the end of the list by using negative indices.
o Thus, [-1] selects the last element and [1:3] selects the second and third elements, as follows:
X[-1], X[1:3]
• To assign multiple elements the same value, we simply index all of them and then assign them the value.
X[0:2, :] = 12
X
Saving Memory
• Running operations can cause new memory to be allocated to host results.
o For example, if we write Y = X + Y, we will dereference the tensor that Y used to point to and instead point Y
at the newly allocated memory.
o In the following example, we demonstrate this with Python's id function, which gives us the exact address of the referenced object in memory.
before = id(Y)
Y = Y + X
id(Y) == before
[Diagram: after running Y = Y + X, the name Y points to newly allocated memory holding the result of Y + X, not to Y's original buffer.]
Saving Memory
• Allocating new memory each time we compute a new result for the same variable can be undesirable for two reasons:
o We do not want to run around allocating memory unnecessarily all the time.
o We might point at the same parameters from multiple variables.
• In machine learning, we might have hundreds of megabytes of parameters and update all of them multiple
times per second.
• If we do not update in place, other references will still point to the old memory location, making it possible for parts
of our code to inadvertently reference stale parameters.
Saving Memory
• Performing in-place operations is easy. We can assign the result of an operation to a previously allocated array with slice notation, e.g., Y[:] = <expression>.
o For example, we create a new matrix Z with the same shape as Y, using zeros_like to allocate a block of 0 entries.
Z = np.zeros_like(Y)
print('id(Z):', id(Z))
Z[:] = X + Y
print('id(Z):', id(Z))
• If the value of X is not reused in subsequent computations, we can also use X[:] = X + Y or X += Y to reduce the
memory overhead of the operation.
before = id(X)
X += Y
id(X) == before
Conversion to Other Python Objects
• Converting an MXNet tensor to a NumPy tensor, or vice versa, is easy. The converted result does not share memory.
A = X.asnumpy()
B = np.array(A)
type(A), type(B)
• To convert a size-1 tensor to a Python scalar, we can invoke the item function or Python's built-in functions (float, int).
a = np.array([3.5])
a, a.item(), float(a), int(a)
Summary
• The main interface to store and manipulate data for deep learning is the tensor (n-dimensional array).
It provides a variety of functionalities including basic mathematics operations, broadcasting, indexing, slicing,
memory saving, and conversion to other Python objects.
Data Preprocessing
• To apply deep learning to solving real-world problems, we often begin with preprocessing raw data.
• The pandas package is a commonly used data analysis tool.
o pandas can work together with tensors.
Reading the Dataset
• We begin by creating an artificial dataset that is stored in a csv (comma-separated values) file
../data/house_tiny.csv.
o The following mkdir_if_not_exist function ensures that the directory ../data exists.
o #@save is a special mark where the following function, class, or statements are saved in the d2l package so later they can be directly invoked (e.g., d2l.mkdir_if_not_exist) without being redefined.
import os

def mkdir_if_not_exist(path):  #@save
    """Make a directory if it does not exist."""
    if not isinstance(path, str):
        path = os.path.join(*path)
    if not os.path.exists(path):
        os.makedirs(path)
Reading the Dataset
• Below we write the dataset row by row into a csv file.
data_file = "../data/house_tiny.csv"
mkdir_if_not_exist("../data")
with open(data_file, 'w') as f:
    f.write('NumRooms,Alley,Price\n')  # Column names
    f.write('NA,Pave,127500\n')  # Each row represents a data example
    f.write('2,NA,106000\n')
    f.write('4,NA,178100\n')
    f.write('NA,NA,140000\n')
• This dataset has four rows and three columns, where each row describes the number of rooms (“NumRooms”),
the alley type (“Alley”), and the price (“Price”) of a house.
• To load the raw dataset from the created csv file, we import the pandas package and invoke the read_csv function.
# If pandas is not installed, just uncomment the following line:
# !pip install pandas
import pandas as pd
data = pd.read_csv(data_file)
print(data)
Handling Missing Data
• “NaN” entries are missing values. To handle missing data, typical methods include imputation and deletion:
o Imputation replaces missing values with substituted ones (the approach we consider here).
o Deletion ignores missing values.
• By integer-location based indexing (iloc), we split data into inputs and outputs:
o inputs takes the first two columns.
o outputs only keeps the last column.
• For numerical values in inputs that are missing, we replace the “NaN” entries with the mean value of the same column (here, (2 + 4) / 2 = 3 for “NumRooms”).
• For categorical or discrete values in inputs, we consider “NaN” as a category.
inputs, outputs = data.iloc[:,0:2], data.iloc[:,2]
inputs = inputs.fillna(inputs.mean())
print(inputs)
• Since the “Alley” column only takes two types of categorical values, “Pave” and “NaN”, pandas can automatically convert this column to two columns, “Alley_Pave” and “Alley_nan”.
o A row whose alley type is “Pave” sets “Alley_Pave” to 1 and “Alley_nan” to 0.
o A row with a missing alley type sets “Alley_Pave” to 0 and “Alley_nan” to 1.
inputs = pd.get_dummies(inputs,dummy_na=True)
print(inputs)
Conversion to the Tensor Format
• Now that all the entries in inputs and outputs are numerical, they can be converted to the tensor format.
o Once data are in this format, they can be further manipulated with those tensor functionalities.
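A minimal sketch of that conversion (using the np module imported earlier; a DataFrame's .values attribute exposes its underlying NumPy array):
X, y = np.array(inputs.values), np.array(outputs.values)
X, y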
• Like many other extension packages in the vast ecosystem of Python, pandas can work together with tensors.
• Imputation and deletion can be used to handle missing data.