Advanced Deep Learning with Python: Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch
By Ivan Vasilev
Neural Networks
Deep Learning
Machine Learning
Artificial Intelligence
Natural Language Processing
Convolutional Neural Networks
Object Detection
Computer Vision
Generative Adversarial Networks
Autonomous Vehicles
About this ebook
Gain expertise in advanced deep learning domains such as neural networks, meta-learning, graph neural networks, and memory augmented neural networks using the Python ecosystem
Key Features:
- Get to grips with building faster and more robust deep learning architectures
- Investigate and train convolutional neural network (CNN) models with GPU-accelerated libraries such as TensorFlow and PyTorch
- Apply deep neural networks (DNNs) to computer vision problems, NLP, and GANs
In order to build robust deep learning systems, you’ll need to understand everything from how neural networks work to training CNN models. In this book, you’ll discover newly developed deep learning models, methodologies used in the domain, and their implementation based on areas of application.
You’ll start by understanding the building blocks and the math behind neural networks, and then move on to CNNs and their advanced applications in computer vision. You'll also learn to apply the most popular CNN architectures in object detection and image segmentation. Further on, you’ll focus on variational autoencoders and GANs. You’ll then use neural networks to extract sophisticated vector representations of words, before going on to cover various types of recurrent networks, such as LSTM and GRU. You’ll even explore the attention mechanism to process sequential data without the help of recurrent neural networks (RNNs). Later, you’ll use graph neural networks for processing structured data, along with covering meta-learning, which allows you to train neural networks with fewer training samples. Finally, you’ll understand how to apply deep learning to autonomous vehicles.
By the end of this book, you’ll have mastered key deep learning concepts and the different applications of deep learning models in the real world.
What you will learn:
- Cover advanced and state-of-the-art neural network architectures
- Understand the theory and math behind neural networks
- Train DNNs and apply them to modern deep learning problems
- Use CNNs for object detection and image segmentation
- Implement generative adversarial networks (GANs) and variational autoencoders to generate new images
- Solve natural language processing (NLP) tasks, such as machine translation, using sequence-to-sequence models
- Understand DL techniques, such as meta-learning and graph neural networks
This book is for data scientists, deep learning engineers and researchers, and AI developers who want to further their knowledge of deep learning and build innovative and unique deep learning projects. Anyone looking to get to grips with advanced use cases and methodologies adopted in the deep learning domain using real-world examples will also find this book useful. Basic understanding of deep learning concepts and working knowledge of the Python programming language is assumed.
Advanced Deep Learning with Python
Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch
Ivan Vasilev
BIRMINGHAM - MUMBAI
Advanced Deep Learning with Python
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Pravin Dhandre
Acquisition Editor: Devika Battike
Content Development Editor: Nathanya Dias
Senior Editor: Ayaan Hoda
Technical Editor: Manikandan Kurup
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Production Designer: Nilesh Mohite
First published: December 2019
Production reference: 1111219
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78995-617-7
www.packt.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Why subscribe?
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at customercare@packtpub.com for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Contributors
About the author
Ivan Vasilev started working on the first open source Java deep learning library with GPU support in 2013. The library was acquired by a German company, where he continued to develop it. He has also worked as a machine learning engineer and researcher in the area of medical image classification and segmentation with deep neural networks. Since 2017, he has been focusing on financial machine learning. He is working on a Python-based platform that provides the infrastructure to rapidly experiment with different machine learning algorithms for algorithmic trading. Ivan holds an MSc degree in artificial intelligence from the University of Sofia, St. Kliment Ohridski.
About the reviewer
Saibal Dutta has been working as an analytical consultant in SAS Research and Development. He is also pursuing a PhD in data mining and machine learning from IIT, Kharagpur. He holds an M.Tech in electronics and communication from the National Institute of Technology, Rourkela. He has worked at TATA communications, Pune, and HCL Technologies Limited, Noida, as a consultant. In his 7 years of consulting experience, he has been associated with global players including IKEA (in Sweden) and Pearson (in the US). His passion for entrepreneurship led him to create his own start-up in the field of data analytics. His areas of expertise include data mining, artificial intelligence, machine learning, image processing, and business consultation.
Packt is searching for authors like you
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Table of Contents
Title Page
Copyright and Credits
Advanced Deep Learning with Python
About Packt
Why subscribe?
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Core Concepts
The Nuts and Bolts of Neural Networks
The mathematical apparatus of NNs
Linear algebra
Vector and matrix operations
Introduction to probability
Probability and sets
Conditional probability and the Bayes rule
Random variables and probability distributions
Probability distributions
Information theory
Differential calculus
A short introduction to NNs
Neurons
Layers as operations
NNs
Activation functions
The universal approximation theorem
Training NNs
Gradient descent
Cost functions
Backpropagation
Weight initialization
SGD improvements
Summary
Section 2: Computer Vision
Understanding Convolutional Networks
Understanding CNNs
Types of convolutions
Transposed convolutions
1×1 convolutions
Depth-wise separable convolutions
Dilated convolutions
Improving the efficiency of CNNs
Convolution as matrix multiplication
Winograd convolutions
Visualizing CNNs
Guided backpropagation
Gradient-weighted class activation mapping
CNN regularization
Introducing transfer learning
Implementing transfer learning with PyTorch
Transfer learning with TensorFlow 2.0
Summary
Advanced Convolutional Networks
Introducing AlexNet
An introduction to Visual Geometry Group
VGG with PyTorch and TensorFlow
Understanding residual networks
Implementing residual blocks
Understanding Inception networks
Inception v1
Inception v2 and v3
Inception v4 and Inception-ResNet
Introducing Xception
Introducing MobileNet
An introduction to DenseNets
The workings of neural architecture search
Introducing capsule networks
The limitations of convolutional networks
Capsules
Dynamic routing
The structure of the capsule network
Summary
Object Detection and Image Segmentation
Introduction to object detection
Approaches to object detection
Object detection with YOLOv3
A code example of YOLOv3 with OpenCV
Object detection with Faster R-CNN
Region proposal network
Detection network
Implementing Faster R-CNN with PyTorch
Introducing image segmentation
Semantic segmentation with U-Net
Instance segmentation with Mask R-CNN
Implementing Mask R-CNN with PyTorch
Summary
Generative Models
Intuition and justification of generative models
Introduction to VAEs
Generating new MNIST digits with VAE
Introduction to GANs
Training GANs
Training the discriminator
Training the generator
Putting it all together
Problems with training GANs
Types of GAN
Deep Convolutional GAN
Implementing DCGAN
Conditional GAN
Implementing CGAN
Wasserstein GAN
Implementing WGAN
Image-to-image translation with CycleGAN
Implementing CycleGAN
Building the generator and discriminator
Putting it all together
Introducing artistic style transfer
Summary
Section 3: Natural Language and Sequence Processing
Language Modeling
Understanding n-grams
Introducing neural language models
Neural probabilistic language model
Word2Vec
CBOW
Skip-gram
fastText
Global Vectors for Word Representation model
Implementing language models
Training the embedding model
Visualizing embedding vectors
Summary
Understanding Recurrent Networks
Introduction to RNNs
RNN implementation and training
Backpropagation through time
Vanishing and exploding gradients
Introducing long short-term memory
Implementing LSTM
Introducing gated recurrent units
Implementing GRUs
Implementing text classification
Summary
Sequence-to-Sequence Models and Attention
Introducing seq2seq models
Seq2seq with attention
Bahdanau attention
Luong attention
General attention
Implementing seq2seq with attention
Implementing the encoder
Implementing the decoder
Implementing the decoder with attention
Training and evaluation
Understanding transformers
The transformer attention
The transformer model
Implementing transformers
Multihead attention
Encoder
Decoder
Putting it all together
Transformer language models
Bidirectional encoder representations from transformers
Input data representation
Pretraining
Fine-tuning
Transformer-XL
Segment-level recurrence with state reuse
Relative positional encodings
XLNet
Generating text with a transformer language model
Summary
Section 4: A Look to the Future
Emerging Neural Network Designs
Introducing Graph NNs
Recurrent GNNs
Convolutional Graph Networks
Spectral-based convolutions
Spatial-based convolutions with attention
Graph autoencoders
Neural graph learning
Implementing graph regularization
Introducing memory-augmented NNs
Neural Turing machines
MANN
Summary
Meta Learning
Introduction to meta learning
Zero-shot learning
One-shot learning
Meta-training and meta-testing
Metric-based meta learning
Matching networks for one-shot learning
Siamese networks
Implementing Siamese networks
Prototypical networks
Optimization-based learning
Summary
Deep Learning for Autonomous Vehicles
Introduction to AVs
Brief history of AV research
Levels of automation
Components of an AV system
Environment perception
Sensing
Localization
Moving object detection and tracking
Path planning
Introduction to 3D data processing
Imitation driving policy
Behavioral cloning with PyTorch
Generating the training dataset
Implementing the agent neural network
Training
Letting the agent drive
Putting it all together
Driving policy with ChauffeurNet
Input and output representations
Model architecture
Training
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Preface
This book is a collection of newly evolved deep learning models, methodologies, and implementations based on the areas of their application. In the first section of the book, you will learn about the building blocks of deep learning and the math behind neural networks (NNs). In the second section, you'll focus on convolutional neural networks (CNNs) and their advanced applications in computer vision (CV). You'll learn to apply the most popular CNN architectures in object detection and image segmentation. Finally, you'll discuss variational autoencoders and generative adversarial networks.
In the third section, you'll focus on natural language and sequence processing. You'll use NNs to extract sophisticated vector representations of words. You'll discuss various types of recurrent networks, such as long short-term memory (LSTM) and gated recurrent unit (GRU). Finally, you'll cover the attention mechanism to process sequential data without the help of recurrent networks. In the final section, you'll learn how to use graph NNs to process structured data. You'll cover meta-learning, which allows you to train an NN with fewer training samples. And finally, you'll learn how to apply deep learning in autonomous vehicles.
By the end of this book, you'll have gained mastery of the key concepts associated with deep learning and the different ways in which deep learning models are applied in the real world.
Who this book is for
This book is for data scientists, deep learning engineers and researchers, and AI developers who want to master deep learning and want to build innovative and unique deep learning projects of their own. This book will also appeal to those who are looking to get well-versed with advanced use cases and the methodologies adopted in the deep learning domain using real-world examples. Basic conceptual understanding of deep learning and a working knowledge of Python is assumed.
What this book covers
Chapter 1, The Nuts and Bolts of Neural Networks, will briefly introduce what deep learning is and then discuss the mathematical underpinnings of NNs. This chapter will discuss NNs as mathematical models. More specifically, we'll focus on vectors, matrices, and differential calculus. We'll also discuss some gradient descent variations, such as Momentum, Adam, and Adadelta, in depth, as well as how to deal with imbalanced datasets.
Chapter 2, Understanding Convolutional Networks, will provide a short description of CNNs. We'll discuss CNNs and their applications in CV.
Chapter 3, Advanced Convolutional Networks, will discuss some advanced and widely used NN architectures, including VGG, ResNet, MobileNets, GoogleNet, Inception, Xception, and DenseNets. We'll also implement ResNet and Xception/MobileNets using PyTorch.
Chapter 4, Object Detection and Image Segmentation, will discuss two important vision tasks: object detection and image segmentation. We'll provide implementations for both of them.
Chapter 5, Generative Models, will begin the discussion about generative models. In particular, we'll talk about variational autoencoders, generative adversarial networks, and neural style transfer. We'll also implement several of these models.
Chapter 6, Language Modeling, will introduce word- and character-level language models. We'll also talk about word vectors (word2vec, GloVe, and fastText) and we'll use Gensim to implement them. We'll also walk through the highly technical and complex process of preparing text data for machine learning applications such as topic modeling and sentiment analysis with the help of the Natural Language Toolkit's (NLTK) text processing techniques.
Chapter 7, Understanding Recurrent Networks, will discuss the basic recurrent networks, LSTM, and GRU cells. We'll provide a detailed explanation and pure Python implementations for all of the networks.
Chapter 8, Sequence-to-Sequence Models and Attention, will discuss sequence models and the attention mechanism, including bidirectional LSTMs, and a new architecture called transformer with encoders and decoders.
Chapter 9, Emerging Neural Network Designs, will discuss graph NNs and NNs with memory, such as Neural Turing Machines (NTM), differentiable neural computers, and MANN.
Chapter 10, Meta Learning, will discuss meta learning—the way to teach algorithms how to learn. We'll also try to improve upon deep learning algorithms by giving them the ability to learn more information using fewer training samples.
Chapter 11, Deep Learning for Autonomous Vehicles, will explore the applications of deep learning in autonomous vehicles. We'll discuss how to use deep networks to help the vehicle make sense of its surrounding environment.
To get the most out of this book
To get the most out of this book, you should be familiar with Python and have some knowledge of machine learning. The book includes short introductions to the major types of NNs, but it will help if you are already familiar with the basics of NNs.
Download the example code files
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at www.packt.com.
Select the Support tab.
Click on Code Downloads.
Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Advanced-Deep-Learning-with-Python. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781789956177_ColorImages.pdf.
Conventions used
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Build the full GAN model by including the generator, discriminator, and the combined network.
A block of code is set as follows:
import matplotlib.pyplot as plt
from matplotlib.markers import MarkerStyle
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Lambda, Input, Dense
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "The collection of all possible outcomes (events) of an experiment is called the sample space."
Warnings or important notes appear like this.
Tips and tricks appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
Section 1: Core Concepts
This section will discuss some core Deep Learning (DL) concepts: what exactly DL is, the mathematical underpinnings of DL algorithms, and the libraries and tools that make it possible to develop DL algorithms rapidly.
This section contains the following chapter:
Chapter 1, The Nuts and Bolts of Neural Networks
The Nuts and Bolts of Neural Networks
In this chapter, we'll discuss some of the intricacies of neural networks (NNs)—the cornerstone of deep learning (DL). We'll talk about their mathematical apparatus, structure, and training. Our main goal is to provide you with a systematic understanding of NNs. Often, we approach them from a computer science perspective—as a machine learning (ML) algorithm (or even a special entity) composed of a number of different steps/components. We gain our intuition by thinking in terms of neurons, layers, and so on (at least I did this when I first learned about this field). This is a perfectly valid way to do things and we can still do impressive things at this level of understanding. Perhaps this is not the correct approach, though.
NNs have solid mathematical foundations and if we approach them from this point of view, we'll be able to define and understand them in a more fundamental and elegant way. Therefore, in this chapter, we'll try to underscore the analogy between the mathematical and computer science views of NNs. If you are already familiar with these topics, you can skip this chapter. Still, I hope that you'll find some interesting bits you didn't know about already (we'll do our best to keep this chapter interesting!).
In this chapter, we will cover the following topics:
The mathematical apparatus of NNs
A short introduction to NNs
Training NNs
The mathematical apparatus of NNs
In the next few sections, we'll discuss the mathematical branches related to NNs. Once we've done this, we'll connect them to NNs themselves.
Linear algebra
Linear algebra deals with linear equations, such as a1x1 + a2x2 + ... + anxn + b = 0, and linear transformations (or linear functions) and their representations, such as matrices and vectors.
Linear algebra identifies the following mathematical objects:
Scalars: A single number.
Vectors: A one-dimensional array of numbers (or components). Each component of the array has an index. In the literature, we will see vectors denoted either with an arrow above the letter or in bold (x). The following is an example of an n-dimensional vector: x = [x1, x2, ..., xn].
Throughout this book, we'll mostly use the bold (x) notation. But in some instances, we'll use formulas from different sources and we'll try to retain their original notation.
We can visually represent an n-dimensional vector as the coordinates of a point in an n-dimensional Euclidean space (equivalent to a coordinate system). In this case, the vector is referred to as Euclidean and each vector component represents the coordinate along the corresponding axis, as shown in the following diagram:
Vector representation in space
However, the Euclidean vector is more than just a point and we can also represent it with the following two properties:
Magnitude (or length) is a generalization of the Pythagorean theorem for an n-dimensional space: |x| = √(x1² + x2² + ... + xn²)
Direction is the angle of the vector along each axis of the vector space.
Matrices: This is a two-dimensional array of numbers. Each element is identified by two indices (row and column). A matrix is usually denoted with a bold capital letter; for example, A. Each matrix element is denoted with the lowercase matrix letter and a subscript index; for example, aij. Let's look at an example of the matrix notation in the following formula:
We can represent a vector as a single-column n×1 matrix (referred to as a column matrix) or a single-row 1×n matrix (referred to as a row matrix).
Tensors: Before we explain them, we have to start with a disclaimer. Tensors originally come from mathematics and physics, where they have existed long before we started using them in ML. The tensor definition in these fields differs from the ML one. For the purposes of this book, we'll only consider tensors in the ML context. Here, a tensor is a multi-dimensional array with the following properties:
Rank: Indicates the number of array dimensions. For example, a tensor of rank 2 is a matrix, a tensor of rank 1 is a vector, and a tensor of rank 0 is a scalar. However, the tensor has no limit on the number of dimensions. Indeed, some types of NNs use tensors of rank 4.
Shape: The size of each dimension.
The data type of the tensor elements. These can vary between libraries, but typically include 16-, 32-, and 64-bit float and 8-, 16-, 32-, and 64-bit integers.
Contemporary DL libraries such as TensorFlow and PyTorch use tensors as their main data structure.
You can find a thorough discussion on the nature of tensors here: https://stats.stackexchange.com/questions/198061/why-the-sudden-fascination-with-tensors. You can also check the TensorFlow (https://www.tensorflow.org/guide/tensors) and PyTorch (https://pytorch.org/docs/stable/tensors.html) tensor definitions.
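To make these properties concrete, here is a minimal sketch (assuming PyTorch is installed; TensorFlow's tf.constant exposes analogous attributes) that creates a rank-3 tensor and inspects its rank, shape, and element data type:
import torch

# Create a rank-3 tensor: 2 matrices, each with 3 rows and 4 columns
t = torch.zeros((2, 3, 4), dtype=torch.float32)

print(t.dim())    # rank (number of dimensions): 3
print(t.shape)    # shape (size of each dimension): torch.Size([2, 3, 4])
print(t.dtype)    # element data type: torch.float32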
Now that we've introduced the types of objects in linear algebra, in the next section, we'll discuss some operations that can be applied to them.
Vector and matrix operations
In this section, we'll discuss the vector and matrix operations that are relevant to NNs. Let's start:
Vector addition is the operation of adding two or more vectors together into an output vector sum. The output is another vector and is computed with the following formula: a + b = [a1 + b1, a2 + b2, ..., an + bn]
The dot (or scalar) product takes two vectors and outputs a scalar value. We can compute the dot product with the following formula: a · b = |a| |b| cos θ
Here, |a| and |b| are the vector magnitudes and θ is the angle between the two vectors. Let's assume that the two vectors are n-dimensional and that their components are a1, b1, a2, b2, and so on. Here, the preceding formula is equivalent to the following: a · b = a1b1 + a2b2 + ... + anbn (we'll illustrate this and the following operations with a short NumPy sketch at the end of this section).
The dot product of two two-dimensional vectors, a and b, is illustrated in the following diagram:
The dot product of vectors. Top: vector components; Bottom: dot product of the two vectors
The dot product acts as a kind of similarity measure between the two vectors—if the angle θ between the two vectors is small (the vectors have similar directions), then their dot product will be higher, because cos θ approaches 1.
Following this idea, we can define a cosine similarity between two vectors as follows: cos θ = (a · b) / (|a| |b|)
The cross (or vector) product takes two vectors and outputs another vector, which is perpendicular to both initial vectors. We can compute the magnitude of the cross product output vector with the following formula: |a × b| = |a| |b| sin θ
The following diagram shows an example of a cross product between two two-dimensional vectors:
Cross product of two two-dimensional vectors
As we mentioned previously, the output vector is perpendicular to the input vectors, which also means that the vector is normal to the plane containing them. The magnitude of the output vector is equal to the area of the parallelogram with the vectors a and b for sides (denoted in the preceding diagram).
We can also define a vector through a vector space, which is a collection of objects (in our case, vectors) that can be added together and multiplied by a scalar value. A vector space allows us to define a linear transformation as a function, f: V → W, which transforms each vector (point) of the vector space V into a vector (point) of another vector space, W. f has to satisfy the following requirements for any two vectors, u and v:
Additivity: f(u + v) = f(u) + f(v)
Homogeneity: f(cv) = cf(v), where c is a scalar
Matrix transpose: Here, we flip the matrix along its main diagonal (the main diagonal is the collection of matrix elements, aij, where i = j). The transpose operation is denoted with a superscript T. To clarify, the cell at row j, column i of the transposed matrix is equal to the cell aij of the original matrix.
The transpose of an m×n matrix is an n×m matrix. The following are a few transpose examples:
Matrix-scalar multiplication is the multiplication of a matrix by a scalar value: every element of the matrix is multiplied by the scalar.
Matrix-matrix addition is the element-wise addition of one matrix with another. For this operation, both matrices must have the same size. The following is an example:
Matrix-vector multiplication is the multiplication of a matrix by a vector. For this operation to be valid, the number of matrix columns must be equal to the vector length. The result of multiplying the m×n matrix and an n-dimensional vector is an m-dimensional vector. The following is an example:
We can think of each row of the matrix as a separate n-dimensional vector. Here, each element of the output vector is the dot product between the corresponding matrix row and x. The following is a numerical example:
Matrix multiplication is the multiplication of one matrix with another. To be valid, the number of columns of the first matrix has to be equal to the number of rows of the second (this is a non-commutative operation). We can think of this operation as multiple matrix-vector multiplications, where each column of the second matrix is one vector. The result of an m×n matrix multiplied by an n×p matrix is an m×p matrix. The following is an example:
If we consider two vectors as row matrices, we can represent the vector dot product as a matrix multiplication: the product of the row matrix a and the transposed (column) matrix of b is a 1×1 matrix whose single element is a · b.
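The following is a minimal NumPy sketch (NumPy is our assumption here, purely for illustration) that reproduces the vector and matrix operations we just discussed on small example arrays:
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

dot = np.dot(a, b)                                        # a1*b1 + a2*b2 + a3*b3 = 28.0
cos_sim = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # 1.0, because the vectors are parallel
cross = np.cross(a, b)                                    # [0, 0, 0]; parallel vectors span no area

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # a 2×3 matrix
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])         # a 3×2 matrix
x = np.array([1, 0, 1])        # a 3-dimensional vector

print(A.T)       # transpose: a 3×2 matrix
print(2 * A)     # matrix-scalar multiplication
print(A @ x)     # matrix-vector multiplication: [4, 10]
print(A @ B)     # matrix multiplication: a 2×2 matrix
Note that the @ operator is NumPy's shorthand for matrix multiplication, while np.dot computes the vector dot product for one-dimensional arrays.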
This concludes our introduction to linear algebra. In the next section, we'll introduce the probability theory.
Introduction to probability
In this section, we'll discuss some of the aspects of probability and statistics that are relevant to NNs.
Let's start by introducing the concept of a statistical experiment, which has the following properties:
Consists of multiple independent trials.
The outcome of each trial is non-deterministic; that is, it's determined by chance.
It has more than one possible outcome. These outcomes are known as events (we'll also discuss events in the context of sets in the following section).
All the possible outcomes of the experiment are known in advance.
One example of a statistical experiment is a coin toss, which has two possible outcomes—heads or tails. Another example is a dice throw with six possible outcomes: 1, 2, 3, 4, 5, and 6.
We'll define probability as the likelihood that some event, e, would occur and we'll denote it with P(e). The probability is a number in the range of [0, 1], where 0 indicates that the event cannot occur and 1 indicates that it will always occur. If P(e) = 0.5, there is a 50-50 chance the event would occur, and so on.
There are two ways we can approach probability:
Theoretical: The number of outcomes of the event we're interested in, compared to the total number of possible outcomes, where all the outcomes are equally likely: P(e) = (number of outcomes of e) / (total number of possible outcomes)
To understand this, let's use the coin toss example with two possible outcomes. The theoretical probability of each possible outcome is P(heads) = P(tails) = 1/2. The theoretical probability for each of the sides of a dice throw would be 1/6.
Empirical: This is the number of times the event we're interested in occurs, compared to the total number of trials: P(e) = (number of times e occurs) / (total number of trials)
The result of the experiment may show that the events aren't equally likely. For example, let's say that we toss a coin 100 times and that we observe heads 56 times. Here, the empirical probability for heads is P(heads) = 56 / 100 = 0.56. The higher the number of trials, the more accurate the calculated probability is (this is known as the law of large numbers).
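To see the law of large numbers in action, here is a minimal simulation sketch (assuming a fair coin, so the theoretical probability of heads is 0.5):
import random

random.seed(42)  # fix the seed so the simulation is reproducible

for n in (100, 10_000, 1_000_000):
    heads = sum(1 for _ in range(n) if random.random() < 0.5)
    print(n, heads / n)  # the empirical probability approaches the theoretical 0.5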
In the next section, we'll discuss probability in the context of sets.
Probability and sets
The collection of all possible outcomes (events) of an experiment is called the sample space. We can think of the sample space as a mathematical set. It is usually denoted with a capital letter and we can list all the set outcomes with {} (the same as Python sets). For example, the sample space of coin toss events is Sc = {heads, tails}, while for dice rolls it's Sd = {1, 2, 3, 4, 5, 6}. A single outcome of the set (for example, heads) is called a sample point. An event is an outcome (sample point) or a combination of outcomes (subset) of the sample space. An example of a combined event is for the dice to land on an even number, that is, {2, 4, 6}.
Let's assume that we have a sample space S = {1, 2, 3, 4, 5} and two subsets (events) A = {1, 2, 3} and B = {3, 4, 5}. Here, we can do the following operations with them:
Intersection: The result is a new set that contains only the elements found in both sets: A ∩ B = {3}
Sets whose intersections are empty sets {} are disjoint.
Complement: The result is a new set that contains all the elements of the sample space that aren't included in a given set: the complement of A is {4, 5} and the complement of B is {1, 2}
Union: The result is a new set that contains the elements that can be found in either set: A ∪ B = {1, 2, 3, 4, 5}
The following Venn diagrams illustrate these different set relationships:
Venn diagrams of the possible set relationships
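Python's built-in set type supports these operations directly; the following minimal sketch reproduces the preceding example:
S = {1, 2, 3, 4, 5}   # sample space
A = {1, 2, 3}
B = {3, 4, 5}

print(A & B)    # intersection: {3}
print(S - A)    # complement of A with respect to S: {4, 5}
print(A | B)    # union: {1, 2, 3, 4, 5}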
We can transfer the set properties to events and their probabilities. We'll assume that the events are independent—the occurrence of one event doesn't affect the probability of the occurrence of another. For example, the outcomes of the different coin tosses are independent of one another. That being said, let's learn how to translate the set operations in the events domain:
The intersection of two events is a subset of the outcomes contained in both events. The probability of the intersection is called joint probability and is computed via the following formula: P(A ∩ B) = P(A) * P(B)
Let's say that we want to compute the probability of a card being red (either hearts or diamonds) and a Jack. The probability for red is P(red) = 26/52 = 1/2. The probability for getting a Jack is P(Jack) = 4/52 = 1/13. Therefore, the joint probability is P(red, Jack) = (1/2) * (1/13) = 1/26. In this example, we assumed that the two events are independent. However, the two events occur at the same time (we draw a single card). Had they occurred successively, for example, two card draws, where one is a Jack and the other is red, we would enter the realm of conditional probability. This joint probability is also denoted as P(A, B) or P(AB).
The probability of the occurrence of a single event P(A) is also known as marginal probability (as opposed to joint probability).
Two events are disjoint (or mutually exclusive) if they don't share any outcomes. That is, their respective sample space subsets are disjoint. For example, the events of odd and even dice rolls are disjoint. The following is true for the probability of disjoint events:
The joint probability of disjoint events (the probability for these events to occur simultaneously) is P(A∩B) = 0.
The sum of the probabilities of disjoint events cannot exceed 1: P(A) + P(B) ≤ 1, where the equality holds only if the events are also jointly exhaustive.
If the subsets of multiple events contain the whole sample space between themselves, they are jointly exhaustive. Events A and B from the preceding example are jointly exhaustive because, together, they fill up the whole sample space (1 through 5). The following is true for the probability of jointly exhaustive events: P(A ∪ B) = 1
If we only have two events that are disjoint and jointly exhaustive at the same time, the events are complement. For example, odd and even dice throw events are complement.
We'll refer to outcomes coming from either A or B (not necessarily in both) as the union of A and B. The probability of this union is as follows: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
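Here is a small numerical sketch of the red card and Jack example, which combines the joint probability of independent events with the union formula:
p_red = 26 / 52                                    # probability of drawing a red card: 1/2
p_jack = 4 / 52                                    # probability of drawing a Jack: 1/13
p_red_and_jack = p_red * p_jack                    # joint probability: 1/26 (the two red Jacks)
p_red_or_jack = p_red + p_jack - p_red_and_jack    # union: 28/52, about 0.538

print(p_red_and_jack, p_red_or_jack)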
So far, we've discussed independent events. In the next section, we'll focus on dependent ones.
Conditional probability and the Bayes rule
If the occurrence of event A changes the probability of the occurrence of event B, where A occurs before B, then the two are dependent. To illustrate this concept, let's imagine that we draw multiple cards sequentially from the deck. When the deck is full, the probability to draw hearts is P(hearts) = 13/52 = 0.25. But once we've drawn the first card, the probability to pick hearts on the second turn changes. Now, we only have 51 cards and one less heart. We'll call the probability of the second draw conditional probability and we'll denote it with P(B|A). This is the probability of event B (second draw), given that event A has occurred (first draw). To continue with our example, the probability of picking hearts on the second draw becomes P(hearts2|hearts1) = 12/51 = 0.235.
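The following minimal sketch expresses the card-drawing example numerically; the joint probability in the last line uses the dependent-events formula that we introduce next:
p_hearts_1 = 13 / 52                              # hearts on the first draw: 0.25
p_hearts_2_given_1 = 12 / 51                      # hearts on the second draw, given hearts first: ~0.235
p_both_hearts = p_hearts_1 * p_hearts_2_given_1   # joint probability of two hearts in a row: ~0.059

print(p_hearts_2_given_1, p_both_hearts)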
Next, we can extend the joint probability formula (introduced in the preceding section) in terms of dependent events. The formula is as follows: P(A ∩ B) = P(A) * P(B|A)
However, the preceding equation is just a special case for two events. We can extend this further for multiple events, A1, A2, ..., An. This new generic formula is known as the chain rule of probability: P(A1 ∩ A2 ∩ ... ∩ An) = P(An|A1, A2, ..., An-1) * P(An-1|A1, A2, ..., An-2) * ... * P(A2|A1) * P(A1)
For example, the chain rule for three events is as follows: P(A1 ∩ A2 ∩ A3) = P(A3|A1, A2) * P(A2|A1) * P(A1)
We can also derive the formula for the conditional probability itself: P(B|A) = P(A ∩ B) / P(A)
This formula makes sense for the following reasons:
P(A ∩ B) states that we're interested in the occurrences of B, given that A has already occurred. In other words, we're interested in the joint occurrence of the events, hence the joint probability.
P(A) states that we're interested only in the subset of outcomes when event A has occurred. We already know that A has occurred and therefore we restrict our observations to these outcomes.
The following holds true for dependent events: P(A ∩ B) = P(B ∩ A) = P(B) * P(A|B)
Using this equation, we can replace the value of P(A ∩ B) in the conditional probability formula to come up with the following: P(B|A) = P(A|B) * P(B) / P(A)
The preceding