Advanced Deep Learning with Python: Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch
Ebook · 760 pages · 7 hours

  • Neural Networks

  • Deep Learning

  • Machine Learning

  • Artificial Intelligence

  • Natural Language Processing

  • Convolutional Neural Networks

  • Object Detection

  • Computer Vision

  • Generative Adversarial Networks

  • Autonomous Vehicles

About this ebook

Gain expertise in advanced deep learning domains such as neural networks, meta-learning, graph neural networks, and memory-augmented neural networks using the Python ecosystem

Key Features
  • Get to grips with building faster and more robust deep learning architectures
  • Investigate and train convolutional neural network (CNN) models with GPU-accelerated libraries such as TensorFlow and PyTorch
  • Apply deep neural networks (DNNs) to computer vision problems, NLP, and GANs
Book Description

In order to build robust deep learning systems, you’ll need to understand everything from how neural networks work to training CNN models. In this book, you’ll discover newly developed deep learning models, methodologies used in the domain, and their implementation based on areas of application.

You’ll start by understanding the building blocks and the math behind neural networks, and then move on to CNNs and their advanced applications in computer vision. You'll also learn to apply the most popular CNN architectures in object detection and image segmentation. Further on, you’ll focus on variational autoencoders and GANs. You’ll then use neural networks to extract sophisticated vector representations of words, before going on to cover various types of recurrent networks, such as LSTM and GRU. You’ll even explore the attention mechanism to process sequential data without the help of recurrent neural networks (RNNs). Later, you’ll use graph neural networks for processing structured data, along with covering meta-learning, which allows you to train neural networks with fewer training samples. Finally, you’ll understand how to apply deep learning to autonomous vehicles.

By the end of this book, you’ll have mastered key deep learning concepts and the different applications of deep learning models in the real world.

What you will learn
  • Cover advanced and state-of-the-art neural network architectures
  • Understand the theory and math behind neural networks
  • Train DNNs and apply them to modern deep learning problems
  • Use CNNs for object detection and image segmentation
  • Implement generative adversarial networks (GANs) and variational autoencoders to generate new images
  • Solve natural language processing (NLP) tasks, such as machine translation, using sequence-to-sequence models
  • Understand DL techniques, such as meta-learning and graph neural networks
Who this book is for

This book is for data scientists, deep learning engineers and researchers, and AI developers who want to further their knowledge of deep learning and build innovative and unique deep learning projects. Anyone looking to get to grips with advanced use cases and methodologies adopted in the deep learning domain using real-world examples will also find this book useful. A basic understanding of deep learning concepts and a working knowledge of the Python programming language are assumed.

Language: English
Release date: Dec 12, 2019
ISBN: 9781789952711
    Book preview

    Advanced Deep Learning with Python

    Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch

    Ivan Vasilev

    BIRMINGHAM - MUMBAI

    Advanced Deep Learning with Python

    Copyright © 2019 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Commissioning Editor: Pravin Dhandre

    Acquisition Editor: Devika Battike

    Content Development Editor: Nathanya Dias

    Senior Editor: Ayaan Hoda

    Technical Editor: Manikandan Kurup

    Copy Editor: Safis Editing

    Project Coordinator: Aishwarya Mohan

    Proofreader: Safis Editing

    Indexer: Tejal Daruwale Soni

    Production Designer: Nilesh Mohite

    First published: December 2019

    Production reference: 1111219

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham

    B3 2PB, UK.

    ISBN 978-1-78995-617-7

    www.packt.com

    Packt.com

Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

    Why subscribe?

    Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

    Improve your learning with Skill Plans built especially for you

    Get a free eBook or video every month

    Fully searchable for easy access to vital information

    Copy and paste, print, and bookmark content

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at customercare@packtpub.com for more details.

    At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks. 

    Contributors

    About the author

    Ivan Vasilev started working on the first open source Java deep learning library with GPU support in 2013. The library was acquired by a German company, where he continued to develop it. He has also worked as a machine learning engineer and researcher in the area of medical image classification and segmentation with deep neural networks. Since 2017, he has been focusing on financial machine learning. He is working on a Python-based platform that provides the infrastructure to rapidly experiment with different machine learning algorithms for algorithmic trading. Ivan holds an MSc degree in artificial intelligence from the University of Sofia, St. Kliment Ohridski.

    About the reviewer

    Saibal Dutta has been working as an analytical consultant in SAS Research and Development. He is also pursuing a PhD in data mining and machine learning from IIT, Kharagpur. He holds an M.Tech in electronics and communication from the National Institute of Technology, Rourkela. He has worked at TATA communications, Pune, and HCL Technologies Limited, Noida, as a consultant. In his 7 years of consulting experience, he has been associated with global players including IKEA (in Sweden) and Pearson (in the US). His passion for entrepreneurship led him to create his own start-up in the field of data analytics. His areas of expertise include data mining, artificial intelligence, machine learning, image processing, and business consultation.

    Packt is searching for authors like you

    If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

    Table of Contents

    Title Page

    Copyright and Credits

    Advanced Deep Learning with Python

    About Packt

    Why subscribe?

    Contributors

    About the author

    About the reviewer

    Packt is searching for authors like you

    Preface

    Who this book is for

    What this book covers

    To get the most out of this book

    Download the example code files

    Download the color images

    Conventions used

    Get in touch

    Reviews

    Section 1: Core Concepts

    The Nuts and Bolts of Neural Networks

    The mathematical apparatus of NNs

    Linear algebra

    Vector and matrix operations

    Introduction to probability

    Probability and sets

    Conditional probability and the Bayes rule

    Random variables and probability distributions

    Probability distributions

    Information theory

    Differential calculus

    A short introduction to NNs

    Neurons

    Layers as operations

    NNs

    Activation functions

    The universal approximation theorem

    Training NNs

    Gradient descent

    Cost functions

    Backpropagation

    Weight initialization

    SGD improvements

    Summary

    Section 2: Computer Vision

    Understanding Convolutional Networks

    Understanding CNNs

    Types of convolutions

    Transposed convolutions

    1×1 convolutions

    Depth-wise separable convolutions

    Dilated convolutions

    Improving the efficiency of CNNs

    Convolution as matrix multiplication

    Winograd convolutions

    Visualizing CNNs

    Guided backpropagation

    Gradient-weighted class activation mapping

    CNN regularization

    Introducing transfer learning

    Implementing transfer learning with PyTorch

    Transfer learning with TensorFlow 2.0

    Summary

    Advanced Convolutional Networks

    Introducing AlexNet

    An introduction to Visual Geometry Group 

    VGG with PyTorch and TensorFlow

    Understanding residual networks

    Implementing residual blocks

    Understanding Inception networks

    Inception v1

    Inception v2 and v3

    Inception v4 and Inception-ResNet

    Introducing Xception

    Introducing MobileNet

    An introduction to DenseNets

    The workings of neural architecture search

    Introducing capsule networks

    The limitations of convolutional networks

    Capsules

    Dynamic routing

    The structure of the capsule network

    Summary

    Object Detection and Image Segmentation

    Introduction to object detection

    Approaches to object detection

    Object detection with YOLOv3

    A code example of YOLOv3 with OpenCV

    Object detection with Faster R-CNN

    Region proposal network

    Detection network

    Implementing Faster R-CNN with PyTorch

    Introducing image segmentation

    Semantic segmentation with U-Net

    Instance segmentation with Mask R-CNN

    Implementing Mask R-CNN with PyTorch

    Summary

    Generative Models

    Intuition and justification of generative models

    Introduction to VAEs

    Generating new MNIST digits with VAE

    Introduction to GANs

    Training GANs

    Training the discriminator

    Training the generator

    Putting it all together

    Problems with training GANs

    Types of GAN

    Deep Convolutional GAN

    Implementing DCGAN

    Conditional GAN

    Implementing CGAN

    Wasserstein GAN

    Implementing WGAN

    Image-to-image translation with CycleGAN

    Implementing CycleGAN

    Building the generator and discriminator

    Putting it all together

    Introducing artistic style transfer

    Summary

    Section 3: Natural Language and Sequence Processing

    Language Modeling

    Understanding n-grams

    Introducing neural language models

    Neural probabilistic language model

    Word2Vec

    CBOW

    Skip-gram

    fastText

    Global Vectors for Word Representation model

    Implementing language models

    Training the embedding model

    Visualizing embedding vectors

    Summary

    Understanding Recurrent Networks

    Introduction to RNNs

    RNN implementation and training

    Backpropagation through time

    Vanishing and exploding gradients

    Introducing long short-term memory

    Implementing LSTM

    Introducing gated recurrent units

    Implementing GRUs

    Implementing text classification

    Summary

    Sequence-to-Sequence Models and Attention

    Introducing seq2seq models

    Seq2seq with attention

    Bahdanau attention

    Luong attention

    General attention

    Implementing seq2seq with attention

    Implementing the encoder

    Implementing the decoder

    Implementing the decoder with attention

    Training and evaluation

    Understanding transformers

    The transformer attention

    The transformer model

    Implementing transformers

    Multihead attention

    Encoder

    Decoder

    Putting it all together

    Transformer language models

    Bidirectional encoder representations from transformers

    Input data representation

    Pretraining

    Fine-tuning

    Transformer-XL

    Segment-level recurrence with state reuse

    Relative positional encodings

    XLNet

    Generating text with a transformer language model

    Summary

    Section 4: A Look to the Future

    Emerging Neural Network Designs

    Introducing Graph NNs

    Recurrent GNNs

    Convolutional Graph Networks

    Spectral-based convolutions

    Spatial-based convolutions with attention

    Graph autoencoders

    Neural graph learning

    Implementing graph regularization

    Introducing memory-augmented NNs

    Neural Turing machines

MANN

    Summary

    Meta Learning

    Introduction to meta learning

    Zero-shot learning

    One-shot learning

    Meta-training and meta-testing

    Metric-based meta learning

    Matching networks for one-shot learning

    Siamese networks

    Implementing Siamese networks

    Prototypical networks

    Optimization-based learning

    Summary

    Deep Learning for Autonomous Vehicles

    Introduction to AVs

    Brief history of AV research

    Levels of automation

    Components of an AV system 

    Environment perception

    Sensing

    Localization

    Moving object detection and tracking

    Path planning

    Introduction to 3D data processing

    Imitation driving policy

    Behavioral cloning with PyTorch

    Generating the training dataset

    Implementing the agent neural network

    Training

    Letting the agent drive

    Putting it all together

    Driving policy with ChauffeurNet

    Input and output representations

    Model architecture

    Training

    Summary

    Other Books You May Enjoy

    Leave a review - let other readers know what you think

    Preface

    This book is a collection of newly evolved deep learning models, methodologies, and implementations based on the areas of their application. In the first section of the book, you will learn about the building blocks of deep learning and the math behind neural networks (NNs). In the second section, you'll focus on convolutional neural networks (CNNs) and their advanced applications in computer vision (CV). You'll learn to apply the most popular CNN architectures in object detection and image segmentation. Finally, you'll discuss variational autoencoders and generative adversarial networks.

    In the third section, you'll focus on natural language and sequence processing. You'll use NNs to extract sophisticated vector representations of words. You'll discuss various types of recurrent networks, such as long short-term memory (LSTM) and gated recurrent unit (GRU). Finally, you'll cover the attention mechanism to process sequential data without the help of recurrent networks. In the final section, you'll learn how to use graph NNs to process structured data. You'll cover meta-learning, which allows you to train an NN with fewer training samples. And finally, you'll learn how to apply deep learning in autonomous vehicles.

    By the end of this book, you'll have gained mastery of the key concepts associated with deep learning and evolutionary approaches to monitoring and managing deep learning models.

    Who this book is for

This book is for data scientists, deep learning engineers and researchers, and AI developers who want to master deep learning and build innovative and unique deep learning projects of their own. This book will also appeal to those who are looking to get well-versed with advanced use cases and the methodologies adopted in the deep learning domain using real-world examples. A basic conceptual understanding of deep learning and a working knowledge of Python are assumed.

    What this book covers

    Chapter 1, The Nuts and Bolts of Neural Networks, will briefly introduce what deep learning is and then discuss the mathematical underpinnings of NNs. This chapter will discuss NNs as mathematical models. More specifically, we'll focus on vectors, matrices, and differential calculus. We'll also discuss some gradient descent variations, such as Momentum, Adam, and Adadelta, in depth. We will also discuss how to deal with imbalanced datasets.

Chapter 2, Understanding Convolutional Networks, will provide a short description of CNNs. We'll discuss CNNs and their applications in CV.

    Chapter 3, Advanced Convolutional Networks, will discuss some advanced and widely used NN architectures, including VGG, ResNet, MobileNets, GoogleNet, Inception, Xception, and DenseNets. We'll also implement ResNet and Xception/MobileNets using PyTorch.

    Chapter 4, Object Detection and Image Segmentation, will discuss two important vision tasks: object detection and image segmentation. We'll provide implementations for both of them. 

Chapter 5, Generative Models, will begin the discussion about generative models. In particular, we'll talk about generative adversarial networks and neural style transfer, and we'll implement style transfer later on.

Chapter 6, Language Modeling, will introduce word and character-level language models. We'll also talk about word vectors (word2vec, GloVe, and fastText) and we'll use Gensim to implement them. We'll then walk through the highly technical and complex process of preparing text data for machine learning applications such as topic modeling and sentiment modeling with the help of the Natural Language Toolkit's (NLTK) text processing techniques.

    Chapter 7, Understanding Recurrent Networks, will discuss the basic recurrent networks, LSTM, and GRU cells. We'll provide a detailed explanation and pure Python implementations for all of the networks.

Chapter 8, Sequence-to-Sequence Models and Attention, will discuss sequence models and the attention mechanism, including bidirectional LSTMs, and a new architecture called the transformer, with encoders and decoders.

    Chapter 9, Emerging Neural Network Designs, will discuss graph NNs and NNs with memory, such as Neural Turing Machines (NTM), differentiable neural computers, and MANN.

Chapter 10, Meta Learning, will discuss meta learning, that is, teaching algorithms how to learn. We'll also try to improve upon deep learning algorithms by giving them the ability to learn more information using fewer training samples.

    Chapter 11, Deep Learning for Autonomous Vehicles, will explore the applications of deep learning in autonomous vehicles. We'll discuss how to use deep networks to help the vehicle make sense of its surrounding environment.

    To get the most out of this book

    To get the most out of this book, you should be familiar with Python and have some knowledge of machine learning. The book includes short introductions to the major types of NNs, but it will help if you are already familiar with the basics of NNs.

    Download the example code files

    You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

    You can download the code files by following these steps:

1. Log in or register at www.packt.com.
2. Select the Support tab.
3. Click on Code Downloads.
4. Enter the name of the book in the Search box and follow the onscreen instructions.

    Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

    WinRAR/7-Zip for Windows

    Zipeg/iZip/UnRarX for Mac

    7-Zip/PeaZip for Linux

    The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Advanced-Deep-Learning-with-Python. In case there's an update to the code, it will be updated on the existing GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Download the color images

    We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781789956177_ColorImages.pdf.

    Conventions used

    There are a number of text conventions used throughout this book.

    CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Build the full GAN model by including the generator, discriminator, and the combined network.

    A block of code is set as follows:

import matplotlib.pyplot as plt
from matplotlib.markers import MarkerStyle
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Lambda, Input, Dense

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "The collection of all possible outcomes (events) of an experiment is called the sample space."

    Warnings or important notes appear like this.

    Tips and tricks appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

    Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Reviews

    Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

    For more information about Packt, please visit packt.com.

    Section 1: Core Concepts

    This section will discuss some core Deep Learning (DL) concepts: what exactly DL is, the mathematical underpinnings of DL algorithms, and the libraries and tools that make it possible to develop DL algorithms rapidly.

    This section contains the following chapter:

    Chapter 1, The Nuts and Bolts of Neural Networks

    The Nuts and Bolts of Neural Networks

    In this chapter, we'll discuss some of the intricacies of neural networks (NNs)—the cornerstone of deep learning (DL). We'll talk about their mathematical apparatus, structure, and training. Our main goal is to provide you with a systematic understanding of NNs. Often, we approach them from a computer science perspective—as a machine learning (ML) algorithm (or even a special entity) composed of a number of different steps/components. We gain our intuition by thinking in terms of neurons, layers, and so on (at least I did this when I first learned about this field). This is a perfectly valid way to do things and we can still do impressive things at this level of understanding. Perhaps this is not the correct approach, though.

NNs have solid mathematical foundations and if we approach them from this point of view, we'll be able to define and understand them in a more fundamental and elegant way. Therefore, in this chapter, we'll try to underscore the analogy between the mathematical and computer science views of NNs. If you are already familiar with these topics, you can skip this chapter. Still, I hope that you'll find some interesting bits you didn't know about already (we'll do our best to keep this chapter interesting!).

    In this chapter, we will cover the following topics:

    The mathematical apparatus of NNs

    A short introduction to NNs

    Training NNs

    The mathematical apparatus of NNs

    In the next few sections, we'll discuss the mathematical branches related to NNs. Once we've done this, we'll connect them to NNs themselves.

    Linear algebra

Linear algebra deals with linear equations, such as $a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b$, and linear transformations (or linear functions) and their representations, such as matrices and vectors.

    Linear algebra identifies the following mathematical objects:

    Scalars: A single number.

Vectors: A one-dimensional array of numbers (or components). Each component of the array has an index. In the literature, we will see vectors denoted either with a superscript arrow ($\vec{x}$) or in bold (x). The following is an example of an n-dimensional vector:

$\mathbf{x} = [x_1, x_2, \ldots, x_n]$

Throughout this book, we'll mostly use the bold (x) notation. But in some instances, we'll use formulas from different sources and we'll try to retain their original notation.

We can visually represent an n-dimensional vector as the coordinates of a point in an n-dimensional Euclidean space, $\mathbb{R}^n$ (equivalent to a coordinate system). In this case, the vector is referred to as Euclidean and each vector component represents the coordinate along the corresponding axis, as shown in the following diagram:

Vector representation in $\mathbb{R}^n$ space

    However, the Euclidean vector is more than just a point and we can also represent it with the following two properties:

Magnitude (or length) is a generalization of the Pythagorean theorem for an n-dimensional space:

$|\mathbf{a}| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}$

Direction is the angle of the vector along each axis of the vector space.

Matrices: This is a two-dimensional array of numbers. Each element is identified by two indices (row and column). A matrix is usually denoted with a bold capital letter; for example, A. Each matrix element is denoted with a lowercase letter and a subscript index; for example, $a_{ij}$. Let's look at an example of the matrix notation in the following formula:

$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$

We can represent a vector as a single-column n×1 matrix (referred to as a column matrix) or a single-row 1×n matrix (referred to as a row matrix).

    Tensors: Before we explain them, we have to start with a disclaimer. Tensors originally come from mathematics and physics, where they have existed long before we started using them in ML. The tensor definition in these fields differs from the ML one. For the purposes of this book, we'll only consider tensors in the ML context. Here, a tensor is a multi-dimensional array with the following properties:

    Rank: Indicates the number of array dimensions. For example, a tensor of rank 2 is a matrix, a tensor of rank 1 is a vector, and a tensor of rank 0 is a scalar. However, the tensor has no limit on the number of dimensions. Indeed, some types of NNs use tensors of rank 4. 

    Shape: The size of each dimension.

Data type: The type of the tensor elements. These can vary between libraries, but typically include 16-, 32-, and 64-bit floats and 8-, 16-, 32-, and 64-bit integers.

    Contemporary DL libraries such as TensorFlow and PyTorch use tensors as their main data structure.

    You can find a thorough discussion on the nature of tensors here: https://stats.stackexchange.com/questions/198061/why-the-sudden-fascination-with-tensors. You can also check the TensorFlow (https://www.tensorflow.org/guide/tensors) and PyTorch (https://pytorch.org/docs/stable/tensors.html) tensor definitions.
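
As a minimal illustration of these properties, here is a short sketch in PyTorch, one of the libraries mentioned above (our own example values, not code from the book's bundle):

import torch

# A rank-3 tensor, for example, a batch of 2 single-channel 3x4 feature maps
t = torch.zeros(2, 3, 4, dtype=torch.float32)

print(t.ndim)    # rank: 3
print(t.shape)   # shape: torch.Size([2, 3, 4])
print(t.dtype)   # element data type: torch.float32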

    Now that we've introduced the types of objects in linear algebra, in the next section, we'll discuss some operations that can be applied to them.

    Vector and matrix operations

    In this section, we'll discuss the vector and matrix operations that are relevant to NNs. Let's start:

Vector addition is the operation of adding two or more vectors together into an output vector sum. The output is another vector and is computed with the following formula:

$\mathbf{a} + \mathbf{b} = [a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n]$

The dot (or scalar) product takes two vectors and outputs a scalar value. We can compute the dot product with the following formula:

$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}||\mathbf{b}|\cos\theta$

Here, |a| and |b| are the vector magnitudes and θ is the angle between the two vectors. Let's assume that the two vectors are n-dimensional and that their components are $a_1, b_1, a_2, b_2$, and so on. Here, the preceding formula is equivalent to the following:

$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$

    The dot product of two two-dimensional vectors, a and b, is illustrated in the following diagram:

    The dot product of vectors. Top: vector components; Bottom: dot product of the two vectors

The dot product acts as a kind of similarity measure between the two vectors: if the angle θ between the two vectors is small (the vectors have similar directions), then their dot product will be higher because of the $\cos\theta$ term.

Following this idea, we can define the cosine similarity between two vectors as follows:

$\text{cosine similarity} = \cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|}$

The cross (or vector) product takes two vectors and outputs another vector, which is perpendicular to both initial vectors. We can compute the magnitude of the cross product output vector with the following formula:

$|\mathbf{a} \times \mathbf{b}| = |\mathbf{a}||\mathbf{b}|\sin\theta$

    The following diagram shows an example of a cross product between two two-dimensional vectors:

    Cross product of two two-dimensional vectors

    As we mentioned previously, the output vector is perpendicular to the input vectors, which also means that the vector is normal to the plane containing them. The magnitude of the output vector is equal to the area of the parallelogram with the vectors a and b for sides (denoted in the preceding diagram).
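
To make these operations concrete, here is a minimal NumPy sketch (our own illustrative values; it simply mirrors the formulas above):

import numpy as np

a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 1.0, 0.0])

print(a + b)                  # vector addition: [3. 3. 0.]
print(np.dot(a, b))           # dot product: 4.0
print(np.linalg.norm(a))      # magnitude: sqrt(5) ~ 2.236
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)                # cosine similarity: 0.8
print(np.cross(a, b))         # cross product: [ 0.  0. -3.], perpendicular to a and b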

We can also define a vector space, which is a collection of objects (in our case, vectors) that can be added together and multiplied by a scalar value. The vector space allows us to define a linear transformation as a function, f, which transforms each vector (point) of a vector space, V, into a vector (point) of another vector space, W: $f: V \rightarrow W$. f has to satisfy the following requirements for any two vectors, $\mathbf{u}, \mathbf{v} \in V$:

Additivity: $f(\mathbf{u} + \mathbf{v}) = f(\mathbf{u}) + f(\mathbf{v})$

Homogeneity: $f(c\mathbf{v}) = c f(\mathbf{v})$, where c is a scalar

Matrix transpose: Here, we flip the matrix along its main diagonal (the main diagonal is the collection of matrix elements, $a_{ij}$, where i = j). The transpose operation is denoted with the superscript $\top$, as in $\mathbf{A}^\top$. To clarify, the cell (i, j) of $\mathbf{A}^\top$ is equal to the cell (j, i) of $\mathbf{A}$:

$(\mathbf{A}^\top)_{ij} = a_{ji}$

The transpose of an m×n matrix is an n×m matrix. The following is a transpose example:

$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}^\top = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$

Matrix-scalar multiplication is the multiplication of a matrix by a scalar value. In the following example, $\lambda$ is a scalar:

$\lambda \mathbf{A} = \begin{bmatrix} \lambda a_{11} & \lambda a_{12} \\ \lambda a_{21} & \lambda a_{22} \end{bmatrix}$

Matrix-matrix addition is the element-wise addition of one matrix with another. For this operation, both matrices must have the same size. The following is an example:

$\mathbf{A} + \mathbf{B} = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}$

Matrix-vector multiplication is the multiplication of a matrix by a vector. For this operation to be valid, the number of matrix columns must be equal to the vector length. The result of multiplying an m×n matrix and an n-dimensional vector is an m-dimensional vector. The following is an example:

$\mathbf{y} = \mathbf{A}\mathbf{x}, \quad y_i = \sum_{j=1}^{n} a_{ij} x_j$

We can think of each row of the matrix as a separate n-dimensional vector. Here, each element of the output vector is the dot product between the corresponding matrix row and x. The following is a numerical example:

$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 14 \\ 32 \end{bmatrix}$

Matrix multiplication is the multiplication of one matrix with another. To be valid, the number of columns of the first matrix has to be equal to the number of rows of the second (this is a non-commutative operation). We can think of this operation as multiple matrix-vector multiplications, where each column of the second matrix is one vector. The result of an m×n matrix multiplied by an n×p matrix is an m×p matrix. The following is an example:

$\mathbf{C} = \mathbf{A}\mathbf{B}, \quad c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$

If we consider two vectors as row matrices, we can represent a vector dot product as matrix multiplication, that is, $\mathbf{a} \cdot \mathbf{b} = \mathbf{a}\mathbf{b}^\top$.
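
The following short NumPy sketch (our own example values, not from the book's code bundle) demonstrates these matrix operations:

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # a 2x3 matrix
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])             # a 3x2 matrix
x = np.array([1, 2, 3])            # a 3-dimensional vector

print(A.T)                         # transpose: a 3x2 matrix
print(2 * A)                       # matrix-scalar multiplication
print(A + A)                       # matrix-matrix addition (same sizes)
print(A @ x)                       # matrix-vector multiplication: [14 32]
print(A @ B)                       # matrix multiplication: a 2x2 result
a_row = x.reshape(1, -1)           # the vector as a 1x3 row matrix
print(a_row @ a_row.T)             # dot product as matrix multiplication: [[14]]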

This concludes our introduction to linear algebra. In the next section, we'll introduce probability theory.

    Introduction to probability

    In this section, we'll discuss some of the aspects of probability and statistics that are relevant to NNs.

    Let's start by introducing the concept of a statistical experiment, which has the following properties:

    Consists of multiple independent trials.

    The outcome of each trial is non-deterministic; that is, it's determined by chance.

    It has more than one possible outcome. These outcomes are known as events (we'll also discuss events in the context of sets in the following section).

    All the possible outcomes of the experiment are known in advance.

    One example of a statistical experiment is a coin toss, which has two possible outcomes—heads or tails. Another example is a dice throw with six possible outcomes: 1, 2, 3, 4, 5, and 6. 

    We'll define probability as the likelihood that some event, e, would occur and we'll denote it with P(e). The probability is a number in the range of [0, 1], where 0 indicates that the event cannot occur and 1 indicates that it will always occur. If P(e) = 0.5, there is a 50-50 chance the event would occur, and so on.

    There are two ways we can approach probability:

Theoretical: The event we're interested in compared to the total number of possible events. All the events are equally likely:

P(event) = (number of outcomes that constitute the event) / (total number of possible outcomes)

    To understand this, let's use the coin toss example with two possible outcomes. The theoretical probability of each possible outcome is P(heads) = P(tails) = 1/2. The theoretical probability for each of the sides of a dice throw would be 1/6. 

Empirical: This is the number of times an event we're interested in occurs compared to the total number of trials:

P(event) = (number of times the event occurred) / (total number of trials)

    The result of the experiment may show that the events aren't equally likely. For example, let's say that we toss a coin 100 times and that we observe heads 56 times. Here, the empirical probability for heads is P(heads) = 56 / 100 = 0.56. The higher the number of trials, the more accurate the calculated probability is (this is known as the law of large numbers).
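
We can see the law of large numbers at work with a small simulation (our own sketch; the snippet is not part of the book's code bundle):

import random

def empirical_heads_probability(n_trials):
    # Count how often a fair coin lands heads in n_trials simulated tosses
    heads = sum(1 for _ in range(n_trials) if random.random() < 0.5)
    return heads / n_trials

# The estimate gets closer to the theoretical value of 0.5 as the number of trials grows
for n in (100, 10_000, 1_000_000):
    print(n, empirical_heads_probability(n))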

    In the next section, we'll discuss probability in the context of sets.

    Probability and sets

The collection of all possible outcomes (events) of an experiment is called the sample space. We can think of the sample space as a mathematical set. It is usually denoted with a capital letter and we can list all the set outcomes with {} (the same as Python sets). For example, the sample space of coin toss events is Sc = {heads, tails}, while for dice rolls it's Sd = {1, 2, 3, 4, 5, 6}. A single outcome of the set (for example, heads) is called a sample point. An event is an outcome (sample point) or a combination of outcomes (subset) of the sample space. An example of a combined event is for the dice to land on an even number, that is, {2, 4, 6}.

    Let's assume that we have a sample space S = {1, 2, 3, 4, 5} and two subsets (events) A = {1, 2, 3} and B = {3, 4, 5}. Here, we can do the following operations with them:

Intersection: The result is a new set that contains only the elements found in both sets:

A ∩ B = {3}

    Sets whose intersections are empty sets {} are disjoint.

Complement: The result is a new set that contains all the elements of the sample space that aren't included in a given set:

A' = {4, 5}

Union: The result is a new set that contains the elements that can be found in either set:

A ∪ B = {1, 2, 3, 4, 5}

    The following Venn diagrams illustrate these different set relationships:

    Venn diagrams of the possible set relationships
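
Since the text already compares the sample space to a Python set, here is a quick sketch of the same operations in Python (our own example, mirroring the sets above):

S = {1, 2, 3, 4, 5}               # sample space
A = {1, 2, 3}
B = {3, 4, 5}

print(A & B)                      # intersection: {3}
print(S - A)                      # complement of A with respect to S: {4, 5}
print(A | B)                      # union: {1, 2, 3, 4, 5}
print(A.isdisjoint({4, 5}))       # True: the two sets share no elements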

We can transfer the set properties to events and their probabilities. We'll assume that the events are independent: the occurrence of one event doesn't affect the probability of the occurrence of another. For example, the outcomes of the different coin tosses are independent of one another. That being said, let's learn how to translate the set operations into the domain of events:

The intersection of two events is a subset of the outcomes contained in both events. The probability of the intersection is called joint probability and, for independent events, is computed via the following formula:

P(A ∩ B) = P(A) * P(B)

    Let's say that we want to compute the probability of a card being red (either hearts or diamonds) and a Jack. The probability for red is P(red) = 26/52 = 1/2. The probability for getting a Jack is P(Jack) = 4/52 = 1/13. Therefore, the joint probability is P(red, Jack) = (1/2) * (1/13) = 1/26. In this example, we assumed that the two events are independent. However, the two events occur at the same time (we draw a single card). Had they occurred successively, for example, two card draws, where one is a Jack and the other is red, we would enter the realm of conditional probability. This joint probability is also denoted as P(A, B) or P(AB).

    The probability of the occurrence of a single event P(A) is also known as marginal probability (as opposed to joint probability).
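
A quick check of the card example with Python's fractions module (our own sketch; the counts follow from a standard 52-card deck):

from fractions import Fraction

p_red = Fraction(26, 52)              # 1/2
p_jack = Fraction(4, 52)              # 1/13
p_red_and_jack = Fraction(2, 52)      # there are two red Jacks in the deck: 1/26

# For independent events, the joint probability equals the product of the marginals
print(p_red * p_jack == p_red_and_jack)   # True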

Two events are disjoint (or mutually exclusive) if they don't share any outcomes. That is, their respective sample space subsets are disjoint. For example, the events of odd or even dice rolls are disjoint. The following is true for the probability of disjoint events:

    The joint probability of disjoint events (the probability for these events to occur simultaneously) is P(A∩B) = 0.

The sum of the probabilities of disjoint events is P(A ∪ B) = P(A) + P(B).

If the subsets of multiple events contain the whole sample space between themselves, they are jointly exhaustive. Events A and B from the preceding example are jointly exhaustive because, together, they fill up the whole sample space (1 through 5). The following is true for the probability of jointly exhaustive events:

P(A ∪ B) = 1

If we only have two events that are disjoint and jointly exhaustive at the same time, the events are complements of each other. For example, the odd and even dice roll events are complements.

We'll refer to outcomes coming from either A or B (not necessarily in both) as the union of A and B. The probability of this union is as follows:

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

    So far, we've discussed independent events. In the next section, we'll focus on dependent ones.

    Conditional probability and the Bayes rule

If the occurrence of event A changes the probability of the occurrence of event B, where A occurs before B, then the two are dependent. To illustrate this concept, let's imagine that we draw multiple cards sequentially from the deck. When the deck is full, the probability of drawing hearts is P(hearts) = 13/52 = 0.25. But once we've drawn the first card, the probability of picking hearts on the second turn changes. Now, we only have 51 cards and one less heart. We'll call the probability of the second draw conditional probability and we'll denote it with P(B|A). This is the probability of event B (the second draw), given that event A has occurred (the first draw). To continue with our example, the probability of picking hearts on the second draw becomes P(hearts2|hearts1) = 12/51 = 0.235.
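
The same numbers, worked through with Python's fractions module (our own sketch, following the deck counts above):

from fractions import Fraction

p_hearts_1 = Fraction(13, 52)             # first draw from a full deck: 1/4
p_hearts_2_given_1 = Fraction(12, 51)     # second draw, given the first was a heart
p_two_hearts = p_hearts_1 * p_hearts_2_given_1   # joint probability of two hearts in a row

print(float(p_hearts_2_given_1))          # ~0.235
print(p_two_hearts, float(p_two_hearts))  # 1/17 ~ 0.059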

Next, we can extend the joint probability formula (introduced in the preceding section) in terms of dependent events. The formula is as follows:

P(A ∩ B) = P(B|A) * P(A)

However, the preceding equation is just a special case for two events. We can extend this further for multiple events, A1, A2, ..., An. This new generic formula is known as the chain rule of probability:

P(An, An-1, ..., A1) = P(An | An-1, ..., A1) * P(An-1 | An-2, ..., A1) * ... * P(A1)

For example, the chain rule for three events is as follows:

P(A3, A2, A1) = P(A3 | A2, A1) * P(A2 | A1) * P(A1)

We can also derive the formula for the conditional probability itself:

P(B|A) = P(A ∩ B) / P(A)

    This formula makes sense for the following reasons:

    P(A ∩ B) states that we're interested in the occurrences of B, given that A has already occurred. In other words, we're interested in the joint occurrence of the events, hence the joint probability.

    P(A) states that we're interested only in the subset of outcomes when event A has occurred. We already know that A has occurred and therefore we restrict our observations to these outcomes. 

The following holds true for dependent events:

P(A ∩ B) = P(A|B) * P(B)

Using this equation, we can replace the value of P(A ∩ B) in the conditional probability formula to come up with the following:

P(B|A) = P(A|B) * P(B) / P(A)

The preceding formula is known as the Bayes rule.
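
To tie this back to the card example, here is a small check of the rule with Python's fractions module (our own sketch; the numbers follow from a standard 52-card deck):

from fractions import Fraction

p_jack = Fraction(4, 52)                  # P(Jack) = 1/13
p_red = Fraction(26, 52)                  # P(red) = 1/2
p_red_given_jack = Fraction(2, 4)         # 2 of the 4 Jacks are red

# Bayes rule: P(Jack | red) = P(red | Jack) * P(Jack) / P(red)
p_jack_given_red = p_red_given_jack * p_jack / p_red
print(p_jack_given_red)                   # 1/13
print(p_jack_given_red == Fraction(2, 26))  # True: 2 red Jacks among the 26 red cards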
