
Lecture 12 - Deep Learning


Deep Learning

Santosh GSK
Industry Expert
What is Deep Learning?

• A subfield of machine learning that learns representations of data, using algorithms that attempt to learn (multiple levels of) representation through a hierarchy of multiple neural layers
• If you provide the system with tons of information, it begins to understand it and respond in useful ways.
• DL is exceptionally effective at learning patterns.
Background - Machine Learning Example

• Suppose we want to separate two categories of data by drawing a line between them in a
scatterplot.
• In the plot on the left, we represent some data using Cartesian coordinates, and the task is impossible with a straight line.
• In the plot on the right, we represent the same data with polar coordinates, and the task becomes simple to solve with a vertical line.
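This coordinate-change idea can be reproduced in a few lines. A minimal sketch in NumPy: the inner-disc and outer-ring toy data is a hypothetical stand-in for the slide's scatterplot, and the threshold 1.5 is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: class 0 is an inner disc, class 1 an outer ring.
n = 200
theta = rng.uniform(0, 2 * np.pi, n)
r_inner = rng.uniform(0.0, 1.0, n)      # class 0: radius in [0, 1)
r_outer = rng.uniform(2.0, 3.0, n)      # class 1: radius in [2, 3)

# Cartesian coordinates: no single straight line separates the classes.
x0 = np.column_stack([r_inner * np.cos(theta), r_inner * np.sin(theta)])
x1 = np.column_stack([r_outer * np.cos(theta), r_outer * np.sin(theta)])

# Polar representation: the radius alone separates the classes,
# i.e. a "vertical line" at r = 1.5 in the (r, theta) plot.
r0 = np.hypot(x0[:, 0], x0[:, 1])
r1 = np.hypot(x1[:, 0], x1[:, 1])
print(r0.max() < 1.5 < r1.min())  # True: a threshold on r separates them
```

The representation (polar vs. Cartesian), not the classifier, is what makes the task easy.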
Solution

• Use Machine Learning to discover not only the mapping from representation to output but the
representation as well
• This is called Representation Learning
• Enable AI systems to rapidly adapt to new tasks with minimal human intervention

• Manually designing features for a complex task requires a great deal of human effort
• The quintessential example of representation learning is the autoencoder.
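The autoencoder idea can be sketched with a tiny linear model trained by gradient descent. This is a minimal sketch in plain NumPy; the toy 4-D data lying on a 2-D subspace, the sizes, and the learning rate are all illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 4-D points that actually lie on a 2-D subspace.
basis = rng.normal(size=(2, 4))
codes = rng.normal(size=(200, 2))
x = codes @ basis

# Linear autoencoder: encoder w_e (4 -> 2), decoder w_d (2 -> 4).
w_e = rng.normal(scale=0.1, size=(4, 2))
w_d = rng.normal(scale=0.1, size=(2, 4))
lr = 0.01

def loss(w_e, w_d):
    recon = x @ w_e @ w_d
    return np.mean((recon - x) ** 2)

first = loss(w_e, w_d)
for _ in range(500):
    h = x @ w_e                        # 2-D learned representation
    recon = h @ w_d                    # reconstruction from the code
    err = recon - x                    # residual for the squared-error loss
    w_d -= lr * (h.T @ err) / len(x)   # gradient step on the decoder
    w_e -= lr * (x.T @ (err @ w_d.T)) / len(x)  # and on the encoder

# Reconstruction error should drop during training.
print(first, loss(w_e, w_d))
```

The encoder learns a compact representation and the decoder learns to reconstruct the input from it; no labels are needed.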
Drawbacks of traditional learning

• A major source of difficulty in many real-world AI applications is that factors of variation influence
every single example of data we observe
• Most applications require us to disentangle the factors of variation and discard the ones we do not care about
• Many of these factors of variation, such as speaker’s accent (in Speech Recognition), can be
identified only using sophisticated, nearly human-level understanding of data
• When it is nearly as difficult to obtain a representation as to solve the problem, representation
learning does not, at first glance, seem to help us
Deep Learning to the Rescue

• DL solves these problems in representation learning by introducing representations that are expressed in terms of other, simpler representations
• DL enables the computer to build complex concepts out of simpler concepts.
• Other challenges, such as modeling non-linearity, handling complex data types, and feature engineering, can also be addressed efficiently using Deep Learning
MLP vs Deep Learning
Illustrations of a Deep Learning Model

• Learning or evaluating such a mapping seems insurmountable if tackled directly
• DL solves this mapping by breaking a
complicated mapping into nested simple
mappings, each described at a layer
• A series of hidden layers extract abstract
features from the image. These layers are
called “hidden” because these values are
not given in the data
• The visualizations are a representation of
features at each hidden layer
• Given the input pixels, the first layer identifies edges
• The second layer can search for corners and contours, which are recognizable as collections of edges
Illustrations of a Deep Learning Model

• Given the second layer’s representation of edges, the third layer can detect entire parts of specific objects
• Finally, this description of the image in terms of the object parts it contains can be used to recognize the objects present in the image
Depth of a model

• The idea of learning the right representation for the data provides one perspective on deep
learning.
• Another perspective on deep learning is that depth allows the computer to learn a multi-step computer program.
• Each layer of the representation can be thought of as the state of the computer’s memory after
executing another set of instructions in parallel.
• Networks with greater depth can execute more instructions in sequence.
• Sequential instructions offer great power because later instructions can refer back to the results
of earlier instructions.
Depth of a model

• There are two main ways of measuring the depth of a model

1. The first view is based on the number of sequential instructions that must be executed to evaluate the architecture.
○ We can think of this as the length of the longest path through a flow chart that describes how to compute each of the model’s outputs given its inputs.
○ Just as two equivalent computer programs will have different lengths depending on which language the program is written in, the same function may be drawn as a flowchart with different depths depending on which functions we allow to be used as individual steps in the flowchart.
Depth of a model

2. Another approach, used by deep probabilistic models, regards the depth of a model as being not the depth of the computational graph but the depth of the graph describing how concepts are related to each other.
○ This is because the system’s understanding of the simpler concepts can be refined given information about the more complex concepts.
○ For example, an AI system observing an image of a face with one eye in shadow may initially only see one eye. After detecting that a face is present, it can then infer that a second eye is probably present as well.
○ The graph of concepts includes only two layers (a layer for eyes and a layer for faces), but the graph of computations includes 2n layers if we refine our estimate of each concept n times given the others.
Depth of a model

• Because it is not always clear which of these two views is most relevant, there is no single correct value for the depth of an architecture, just as there is no single correct value for the length of a computer program.
• Nor is there a consensus about how much depth a model requires to qualify as “deep.”

• However, deep learning can safely be regarded as the study of models that involve a greater amount of composition of learned functions or learned concepts than traditional machine learning does.

• Deep learning is a particular kind of machine learning that achieves great power and flexibility by
learning to represent the world as a nested hierarchy of concepts,
• with each concept defined in relation to simpler concepts, and
• more abstract representations computed in terms of less abstract ones
Deep Learning and AI
Flowchart of AI concepts
Convolutional Neural Networks

[Figure: convolving an input image with a 3x3 filter]
CNN – Filters
Convolution in 3 dimensions

= (1*0)+(1*0)+(1*1)+(1*1) +
(0*1)+(1*0)+(1*1)+(0*0) +
(1*1)+(2*1)+(3*1)+(4*1)

= 13

• Even a 3D convolution gives a 2D output
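The slide's arithmetic can be checked directly. A sketch in NumPy: which factor of each product is the patch value and which is the filter weight is an assumption here, since the slide shows only the products, but the sum is the same either way.

```python
import numpy as np

# One 2x2 patch per input channel (depth 3), dotted with a 2x2x3 filter.
patch = np.array([
    [[1, 1], [1, 1]],   # channel 1 values
    [[0, 1], [1, 0]],   # channel 2 values
    [[1, 2], [3, 4]],   # channel 3 values
])
filt = np.array([
    [[0, 0], [1, 1]],   # channel 1 weights
    [[1, 0], [1, 0]],   # channel 2 weights
    [[1, 1], [1, 1]],   # channel 3 weights
])

# Elementwise products summed over width, height, AND depth:
# the 3-D convolution at one position collapses to a single scalar.
out = np.sum(patch * filt)
print(out)  # 13
```

Sliding this filter over every spatial position produces one 2-D activation map, which is why even a 3-D convolution gives a 2-D output per filter.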
CNN Summary

To summarize, the Conv Layer:

● Accepts a volume of size W1×H1×D1
● Requires four hyperparameters:
○ Number of filters K,
○ their spatial extent F,
○ the stride S,
○ the amount of zero padding P.
● Produces a volume of size W2×H2×D2 where:
○ W2=(W1−F+2P)/S + 1
○ H2=(H1−F+2P)/S + 1
○ D2=K
● With parameter sharing, it introduces F⋅F⋅D1 weights per filter, for a total of (F⋅F⋅D1)⋅K weights and K biases.

A common setting of the hyperparameters is F=3, S=1, P=1, but this varies across problems and
architectures.
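The output-size and parameter formulas above can be wrapped in a small helper. A sketch: the example input volume 32×32×3 and K=10 filters are illustrative choices, not from the slides.

```python
def conv_output_shape(w1, h1, d1, k, f, s, p):
    """Output volume and parameter count for a conv layer (slide formulas)."""
    w2 = (w1 - f + 2 * p) // s + 1   # W2 = (W1 - F + 2P)/S + 1
    h2 = (h1 - f + 2 * p) // s + 1   # H2 = (H1 - F + 2P)/S + 1
    d2 = k                           # D2 = K, one output slice per filter
    weights = f * f * d1 * k         # parameter sharing: F*F*D1 weights per filter
    biases = k                       # one bias per filter
    return (w2, h2, d2), weights + biases

# The common setting F=3, S=1, P=1 preserves the spatial size:
shape, params = conv_output_shape(32, 32, 3, k=10, f=3, s=1, p=1)
print(shape, params)  # (32, 32, 10) 280
```

Note how cheap parameter sharing makes this: 280 parameters regardless of the 32×32 spatial extent, whereas a fully connected layer over the same volumes would need millions.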
Why do CNNs work?

• The networks can be large, and hence bias is minimized
• Many weights are effectively zero because of the sparse connectivity of convolution.
• ReLU activations and Dropout deactivate even more weights.
• Hence, variance is also minimized
When Context is important

• Sometimes, we need to have the contextual information to be able to perform Machine Learning
predictions

• E.g., Estimating the probabilities of words given the context


• The clouds are in the ___ (sky – no further context is needed)
• I fell in love with this French girl. My parents were against it initially, as they were worried about the
cultural differences. However, after they met her, they realized how wonderful a person she is and
agreed to our marriage. All that is left is convincing her parents. Here, I am booking my tickets to
______ (a lot of context is needed)
Recurrent Neural Network (RNN)
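The context-carrying mechanism of an RNN can be sketched as a single vanilla recurrent cell: the hidden state is updated at each step and so summarizes everything seen so far. A minimal sketch; the sizes, random inputs, and weight scales are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Vanilla RNN cell: h_t = tanh(x_t @ W_xh + h_{t-1} @ W_hh + b)
d_in, d_hid = 3, 4
w_xh = rng.normal(scale=0.1, size=(d_in, d_hid))   # input-to-hidden weights
w_hh = rng.normal(scale=0.1, size=(d_hid, d_hid))  # hidden-to-hidden weights
b = np.zeros(d_hid)

def rnn_step(x, h):
    return np.tanh(x @ w_xh + h @ w_hh + b)

h = np.zeros(d_hid)                     # empty context to start
for x in rng.normal(size=(5, d_in)):    # a sequence of 5 input vectors
    h = rnn_step(x, h)                  # context flows through h

print(h.shape)  # (4,)
```

Because `w_hh` feeds each hidden state back into the next step, a prediction at the final step can depend on words seen many steps earlier, which is exactly what the fill-in-the-blank examples above require.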
Tasks where context is useful
Thank You!
In our next session:
Optimization Models
