Unit - 3
1. Encoder:
- The encoder is the first part of the autoencoder and is responsible for
mapping the input data to a lower-dimensional representation. This lower-
dimensional representation is often called the "encoding" or "latent space."
- The encoder consists of one or more layers of neurons, and each layer
applies linear transformations (weighted sum of inputs) followed by a non-
linear activation function (such as sigmoid, tanh, or ReLU). These
transformations progressively reduce the dimensionality of the input data.
2. Latent Space:
- The output of the encoder is the compressed representation of the input
data in the latent space. This space should capture essential features of the
input data in a compact form. The size of the latent space is a crucial
hyperparameter that influences the trade-off between compression and
information retention.
3. Decoder:
- The decoder takes the compressed representation from the latent space
and attempts to reconstruct the original input data.
- Similar to the encoder, the decoder consists of one or more layers of
neurons. Each layer applies linear transformations followed by non-linear
activation functions.
- The final layer of the decoder produces the reconstructed output, which
ideally should closely resemble the input data.
4. Loss Function:
- The autoencoder is trained to minimize a loss function, which measures
the difference between the input data and the reconstructed output. The
choice of loss function depends on the nature of the data (e.g., mean
squared error for continuous data, binary cross-entropy for binary data).
5. Training:
- Autoencoders are trained using backpropagation and gradient descent or
variants like Adam. The training process aims to optimize the weights of the
encoder and decoder to minimize the reconstruction error.
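The five components above map directly onto code. Below is a minimal sketch of a fully connected autoencoder in Keras; the layer sizes, the 784-dimensional input, the 32-dimensional latent space, and the use of MSE loss are illustrative assumptions, not fixed requirements.

```python
# Minimal autoencoder sketch (illustrative sizes; assumes 784-dimensional inputs,
# e.g. flattened 28x28 images scaled to [0, 1]).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim = 784     # dimensionality of the input data (assumption)
latent_dim = 32     # size of the latent space (key hyperparameter)

# 1. Encoder: progressively reduces dimensionality down to the latent space.
inputs = keras.Input(shape=(input_dim,))
x = layers.Dense(128, activation="relu")(inputs)
latent = layers.Dense(latent_dim, activation="relu")(x)   # 2. Latent space

# 3. Decoder: reconstructs the input from the latent representation.
x = layers.Dense(128, activation="relu")(latent)
outputs = layers.Dense(input_dim, activation="sigmoid")(x)

autoencoder = keras.Model(inputs, outputs)

# 4. Loss function: mean squared error between input and reconstruction.
# 5. Training: backpropagation with the Adam optimizer.
autoencoder.compile(optimizer="adam", loss="mse")

# Dummy data just to show the training call; real data would replace this.
x_train = np.random.rand(1000, input_dim).astype("float32")
autoencoder.fit(x_train, x_train, epochs=5, batch_size=64)
```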
Keras is a high-level neural networks API (Application Programming Interface) written
in Python. It's designed to be user-friendly, modular, and extensible, making it a
popular choice for building and experimenting with deep learning models.
1. Neural Networks:
• Keras is primarily used for building neural networks. Neural networks
are computational models inspired by the way the human brain works.
They consist of layers of interconnected nodes (neurons) that process
and transform input data to produce an output.
2. High-Level API:
• Keras provides a high-level interface for defining, training, and
evaluating neural network models. This means you can create powerful
deep learning models with relatively simple and concise code.
3. Modularity:
• Keras models are built using a modular approach. You start by defining
the structure of your model as a sequence of layers. Each layer
performs a specific operation, like input processing, feature extraction,
or output generation. This modularity makes it easy to construct
complex models by stacking and connecting simpler building blocks.
4. Ease of Use:
• One of Keras' key advantages is its user-friendly syntax. It abstracts
away many of the complexities of neural network implementation,
making it accessible to both beginners and experienced deep learning
practitioners. You can quickly prototype and experiment with different
architectures.
5. Backend Support:
• Keras is designed to be backend-agnostic, which means it can run on
top of various computational backends, such as TensorFlow, Theano, or
Microsoft Cognitive Toolkit (CNTK). TensorFlow is the default backend
for Keras since version 2.3.
6. TensorFlow Integration:
• Although Keras can work with different backends, it is often used with
TensorFlow, one of the most widely used deep learning frameworks.
This integration allows users to leverage the capabilities of both Keras
and TensorFlow seamlessly.
7. Training and Evaluation:
• Keras simplifies the training and evaluation process. You compile your
model with a specified optimizer, loss function, and metrics, then train
it on your data. The training process involves adjusting the model's
weights based on the provided data to minimize the specified loss.
After training, you can evaluate the model's performance on new,
unseen data.
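As a concrete illustration of the define–compile–fit–evaluate workflow described above, here is a small sketch of a classifier built with the Keras Sequential API; the layer sizes, optimizer, and synthetic data are assumptions made purely for the example.

```python
# Sketch of the typical Keras workflow: define, compile, train, evaluate.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Define the model as a stack of layers (modularity).
model = keras.Sequential([
    keras.Input(shape=(20,)),              # 20 input features (assumption)
    layers.Dense(64, activation="relu"),   # hidden layer
    layers.Dense(3, activation="softmax")  # 3 output classes (assumption)
])

# Compile with an optimizer, a loss function, and metrics.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Synthetic data stands in for a real dataset.
x_train = np.random.rand(500, 20)
y_train = np.random.randint(0, 3, size=(500,))
x_test = np.random.rand(100, 20)
y_test = np.random.randint(0, 3, size=(100,))

model.fit(x_train, y_train, epochs=5, batch_size=32)   # training
loss, acc = model.evaluate(x_test, y_test)             # evaluation on unseen data
print(f"test loss={loss:.3f}, test accuracy={acc:.3f}")
```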
o TensorFlow
TensorFlow is a Google product, which is one of the most famous deep
learning tools widely used in the research area of machine learning and deep
neural network. It came into the market on 9th November 2015 under the
Apache License 2.0. It is built in such a way that it can easily run on multiple
CPUs and GPUs as well as on mobile operating systems. It consists of various
wrappers in distinct languages such as Java, C++, or Python.
o Theano
Theano was developed at the University of Montreal, Quebec, Canada, by the
MILA group. It is an open-source Python library that is widely used for performing mathematical operations on multi-dimensional arrays, building on SciPy and NumPy. It utilizes GPUs for faster computation and efficiently computes gradients by building symbolic graphs automatically. It is well suited to numerically unstable expressions, as it first detects them and then computes them with more stable algorithms.
o CNTK
Microsoft Cognitive Toolkit is an open-source deep learning framework. It consists of all the basic building blocks required to form a neural network. Models are trained using C++ or Python, but C# or Java can be used to load a trained model for making predictions.
TensorFlow Technical Architecture:
o Sources create loaders for Servable Versions, and then loaders are sent as
Aspired versions to the Manager, which will load and serve them to client
requests.
o The Loader contains metadata, and it needs to load the servable.
o The source uses a callback to convey the Manager of Aspired version.
o The Manager applies the effective version policy to determine the next action
to take.
o If the Manager determines that it gives the Loader to load a new version,
clients ask the Manager for the servable, and specifying a version explicitly or
requesting the current version. The Manager returns a handle for servable. The
dynamic Manager applies the version action and decides to load the newer
version of it.
o The dynamic Manager commands the Loader that there is enough memory.
o A client requests a handle for the latest version of the model, and dynamic
Manager returns a handle to the new version of servable.
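The flow above can be summarized with a purely conceptual sketch. The class and method names below are hypothetical stand-ins for illustration only and are not the actual TensorFlow Serving API (which is implemented in C++).

```python
# Conceptual illustration of the Source -> Loader -> Manager flow described above.
# All names here are hypothetical and exist only to mirror the bullet points.

class Loader:
    """Holds the metadata needed to load one servable version."""
    def __init__(self, version, metadata):
        self.version = version
        self.metadata = metadata
        self.servable = None

    def load(self):
        self.servable = f"servable v{self.version}"   # stand-in for real loading

class Manager:
    """Applies the version policy, loads aspired versions, serves handles."""
    def __init__(self):
        self.loaded = {}

    def on_aspired_version(self, loader):
        # Version policy (simplified): always load the newly aspired version.
        loader.load()
        self.loaded[loader.version] = loader.servable

    def get_handle(self, version=None):
        # Clients may ask for an explicit version or simply the latest one.
        if version is None:
            version = max(self.loaded)
        return self.loaded[version]

# A Source would create the Loader and notify the Manager via a callback.
manager = Manager()
manager.on_aspired_version(Loader(version=2, metadata={"path": "/models/2"}))
print(manager.get_handle())   # handle to the latest servable version
```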
Advantages of TensorFlow
1) Graphs:
TensorFlow offers better computational graph visualizations, which are built in, than other libraries such as Torch and Theano.
2) Library management:
Google backs it, which brings the advantages of seamless performance, quick updates, and frequent new releases with new features.
3) Scalability:
The libraries can be deployed on a wide range of hardware, from cellular devices to computers with complex setups.
4) Pipelining:
TensorFlow is designed to use various backend hardware (GPUs, ASICs, etc.) and is highly parallel.
5) Monitoring:
It has a unique approach that allows monitoring the training progress of our models and tracking several metrics.
Disadvantages of TensorFlow
1) Windows support:
There is a wide variety of users who are more comfortable in a Windows environment than in Linux, and TensorFlow does not cater to them natively. However, Windows users need not worry, as it can still be installed through conda or the Python package manager (pip).
2) Benchmark tests:
TensorFlow lags behind its competitors in both speed and usability when compared in benchmark tests.
3) GPU and language support:
Currently, the only supported GPUs are NVIDIA's, and Python is the only language with full support, which is a drawback given the rise of other languages in deep learning, such as Lua.
4) Debugging:
TensorFlow has a unique structure, so it is hard to find errors and difficult to debug.
Neural networks, also known as artificial neural networks (ANNs) or simulated neural
networks (SNNs), are a subset of machine learning and are at the heart of deep
learning algorithms. Their name and structure are inspired by the human brain,
mimicking the way that biological neurons signal to one another.
Artificial neural networks (ANNs) are composed of node layers, containing an input
layer, one or more hidden layers, and an output layer. Each node, or artificial neuron,
connects to another and has an associated weight and threshold. If the output of any
individual node is above the specified threshold value, that node is activated, sending
data to the next layer of the network. Otherwise, no data is passed along to the next
layer of the network.
Neural networks rely on training data to learn and improve their accuracy over time.
However, once these learning algorithms are fine-tuned for accuracy, they are powerful
tools in computer science and artificial intelligence, allowing us to classify and cluster
data at a high velocity. Tasks in speech recognition or image recognition can take
minutes versus hours when compared to the manual identification by human experts.
One of the most well-known neural networks is Google’s search algorithm.
Each node computes a weighted sum of its inputs plus a bias, output = Σᵢ (wᵢ × xᵢ) + b, which is then compared with the threshold (or passed through an activation function).
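A tiny sketch of this computation for a single artificial neuron; the input values, weights, bias, and threshold below are made up for illustration.

```python
# One artificial neuron: weighted sum of inputs plus bias, then a threshold.
import numpy as np

x = np.array([0.5, 0.3, 0.2])    # inputs (made-up values)
w = np.array([0.4, 0.7, 0.2])    # weights associated with each connection
b = 0.1                          # bias
threshold = 0.5                  # activation threshold (assumption)

z = np.dot(w, x) + b             # output = sum_i(w_i * x_i) + b
activated = z > threshold        # node "fires" only above the threshold
print(z, activated)              # 0.55 True -> data is passed to the next layer
```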
1. Image Recognition:
• Neural networks are widely used in image recognition tasks.
Convolutional Neural Networks (CNNs), a specialized type of neural
network, have shown remarkable success in tasks such as object
detection and facial recognition. They can learn hierarchical
representations of visual features, enabling accurate pattern
recognition in images.
2. Speech Recognition:
• Neural networks play a crucial role in speech recognition systems.
Recurrent Neural Networks (RNNs) and Long Short-Term Memory
(LSTM) networks are commonly used to model temporal dependencies
in audio sequences, making them effective in recognizing spoken
words and phrases.
3. Handwriting Recognition:
• Neural networks are applied in recognizing handwritten characters and
text. They can learn to identify patterns in various handwriting styles,
making them useful in applications like optical character recognition
(OCR).
4. Biometric Identification:
• Neural networks are employed in biometric recognition systems, such
as fingerprint recognition, iris recognition, and face recognition. They
can learn unique patterns and features from biometric data, facilitating
accurate and secure identification.
5. Gesture Recognition:
• Neural networks can be used to recognize gestures in applications like
sign language interpretation or human-computer interaction. They
learn to identify patterns associated with different gestures and
interpret them accordingly.
6. Medical Image Analysis:
• Neural networks are utilized in the analysis of medical images for tasks
like tumor detection, organ segmentation, and disease diagnosis. Deep
learning models, including convolutional neural networks, excel in
learning intricate patterns within medical images.
7. Natural Language Processing (NLP):
• In NLP applications, neural networks are employed for tasks such as
sentiment analysis, text classification, and named entity recognition.
Recurrent and transformer-based architectures can capture sequential
and contextual patterns in language data.
8. Financial Pattern Recognition:
• Neural networks are applied in financial markets for recognizing
patterns in stock prices, predicting market trends, and identifying
potential trading opportunities. They can learn from historical data to
make predictions about future market behavior.
9. Quality Control in Manufacturing:
• Neural networks are used in quality control processes to recognize
patterns associated with defects in manufactured products. They can
analyze sensor data and images to identify anomalies and ensure the
quality of the production process.
Pattern Recognition
Patterns are everywhere in this digital world. A pattern can either be seen physically or observed mathematically by applying algorithms.
Example: The colors on the clothes, speech pattern, etc. In computer science,
a pattern is represented using vector feature values.
What is Pattern Recognition?
Pattern recognition is the process of recognizing patterns by using a machine learning algorithm: data is classified based on knowledge already gained or on statistical information extracted from the patterns and/or their representation.
Example: if we consider a face, then the eyes, ears, nose, etc. are features of the face.
A set of features taken together forms a feature vector.
Example: In the above example of a face, if all the features (eyes, ears, nose, etc.) are taken together, the sequence is a feature vector ([eyes, ears, nose]). A feature vector is a sequence of features represented as a d-dimensional column vector. In the case of speech, MFCCs (Mel-Frequency Cepstral Coefficients) are the spectral features of the speech, and the sequence of the first 13 features forms a feature vector.
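A feature vector is therefore just an ordered list of numeric feature values. A tiny sketch, with made-up measurements, follows.

```python
# A feature vector is a d-dimensional column vector of feature values.
import numpy as np

# Hypothetical measurements for one face: eye distance, ear length, nose width (cm).
features = [6.2, 5.8, 3.1]
feature_vector = np.array(features).reshape(-1, 1)   # d x 1 column vector
print(feature_vector.shape)                           # (3, 1)
```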
• Training set:
The training set is used to build a model. It consists of the set of images
that are used to train the system. Training rules and algorithms are used
to give relevant information on how to associate input data with output
decisions. The system is trained by applying these algorithms to the
dataset, all the relevant information is extracted from the data, and
results are obtained. Generally, 80% of the data of the dataset is taken
for training data.
• Testing set:
Testing data is used to test the system. It is the set of data that is used
to verify whether the system is producing the correct output after being
trained or not. Generally, 20% of the data of the dataset is used for
testing. Testing data is used to measure the accuracy of the system. For example, if a system that identifies which category a particular flower belongs to classifies seven out of ten flowers correctly and the rest incorrectly, then its accuracy is 70%.
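A small sketch of the usual 80/20 split using scikit-learn; the flower-like synthetic data, the k-nearest-neighbours classifier, and the accuracy check are assumptions chosen only to illustrate the idea.

```python
# 80% training / 20% testing split and a simple accuracy measurement.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a flower dataset: 4 measurements per sample, 3 classes.
X = np.random.rand(150, 4)
y = np.random.randint(0, 3, size=150)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)   # 80% train, 20% test

model = KNeighborsClassifier().fit(X_train, y_train)       # train the system
accuracy = accuracy_score(y_test, model.predict(X_test))   # e.g. 7/10 correct -> 70%
print(f"accuracy: {accuracy:.0%}")
```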
Real-time Examples and Explanations:
A pattern is a physical object or an abstract notion. While talking about the
classes of animals, a description of an animal would be a pattern. While
talking about various types of balls, then a description of a ball is a pattern. In the case where balls are considered as patterns, the classes could be football, cricket ball, table tennis ball, etc. Given a new pattern, the class of the pattern is to be
determined. The choice of attributes and representation of patterns is a very
important step in pattern classification. A good representation is one that
makes use of discriminating attributes and also reduces the computational
burden in pattern classification.
Advantages:
• Pattern recognition solves classification problems
• Pattern recognition solves the problem of fake biometric detection.
• It is useful for cloth pattern recognition for visually impaired people.
• It helps in speaker diarization.
• We can recognize particular objects from different angles.
Disadvantages:
• The syntactic pattern recognition approach is complex to implement
and it is a very slow process.
• Sometimes to get better accuracy, a larger dataset is required.
• It cannot explain why a particular object is recognized.
Example: my face vs my friend’s face.
Applications
• Image processing, segmentation, and analysis
Pattern recognition is efficient enough to give machines human-like recognition intelligence. It is used for image processing,
segmentation, and analysis. For example, computers can detect
different types of insects better than humans.
• Computer Vision
Using a pattern recognition system one can extract important features
from images and videos. This is helpful in computer vision, which is applied in different fields, especially biomedical imaging.
• Seismic Analysis
Decision-theoretic and syntactic pattern recognition techniques are
employed to detect the physical anomalies (bright spots) and to recognize
the structural seismic patterns in two-dimensional seismograms. Here,
decision-theoretic methods include Bayes classification, linear and quadratic classifications, tree classification, partitioning methods, and sequential classification [5].
• Speech Recognition
All of us have heard the names Siri, Alexa, and Cortana. These are all the
applications of speech recognition. Pattern recognition plays a huge part in
this technique.
• Fingerprint Identification
Many approaches exist for performing fingerprint identification, but pattern recognition systems are the most widely used.
• Medical Diagnosis
Pattern recognition algorithms deal with real data. It has been found that pattern recognition plays a huge role in today's medical diagnosis. From breast cancer detection to COVID-19 screening, such algorithms are giving results with more than 90% accuracy.
Speech recognition uses a broad array of research in computer science, linguistics and
computer engineering. Many modern devices and text-focused programs have speech
recognition functions in them to allow for easier or hands-free use of a device.
Speech recognition and voice recognition are two different technologies and should not be confused: speech recognition identifies the words a person has spoken, while voice recognition is a biometric technology that identifies who is speaking.
Speech recognition software must adapt to the highly variable and context-specific nature of
human speech. The software algorithms that process and organize audio into text are trained
on different speech patterns, speaking styles, languages, dialects, accents and phrasings. The
software also separates spoken audio from background noise that often accompanies the
signal.
To meet these requirements, speech recognition systems use two types of models:
• Acoustic models. These represent the relationship between linguistic units of speech and
audio signals.
• Language models. Here, sounds are matched with word sequences to distinguish between
words that sound similar.
Speech recognition systems have quite a few applications. Here is a sampling of them.
Mobile devices. Smartphones use voice commands for call routing, speech-to-text
processing, voice dialing and voice search. Users can respond to a text without looking at
their devices. On Apple iPhones, speech recognition powers the keyboard and Siri, the virtual
assistant. Functionality is available in secondary languages, too. Speech recognition can also
be found in word processing applications like Microsoft Word, where users can dictate words
to be turned into text.
• Language weighting. This feature tells the algorithm to give special attention to certain
words, such as those spoken frequently or that are unique to the conversation or subject. For
example, the software can be trained to listen for specific product references.
• Acoustic training. The software tunes out ambient noise that pollutes spoken audio.
Software programs with acoustic training can distinguish speaking style, pace and volume
amid the din of many people speaking in an office.
• Speaker labeling. This capability enables a program to label individual participants and
identify their specific contributions to a conversation.
• Profanity filtering. Here, the software filters out undesirable words and language.
• Hidden Markov model. HMMs are used in autonomous systems where a state is partially
observable or when all of the information necessary to make a decision is not immediately
available to the sensor (in speech recognition's case, a microphone). An example of this is in
acoustic modeling, where a program must match linguistic units to audio signals using
statistical probability.
• Natural language processing. NLP eases and accelerates the speech recognition process.
• N-grams. This simple approach to language models creates a probability distribution for a
sequence. An example would be an algorithm that looks at the last few words spoken,
approximates the history of the sample of speech and uses that to determine the probability
of the next word or phrase that will be spoken (see the sketch after this list).
• Artificial intelligence. AI and machine learning methods like deep learning and neural
networks are common in advanced speech recognition software. These systems use
grammar, structure, syntax and composition of audio and voice signals to process speech.
Machine learning systems gain knowledge with each use, making them well suited for
nuances like accents.
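As a rough illustration of the n-gram idea from the list above, the following sketch estimates bigram probabilities by counting word pairs in a toy transcript; the sample text and the smoothing-free counting are assumptions made only for the example.

```python
# Toy bigram language model: estimate P(next word | previous word) by counting.
from collections import Counter, defaultdict

transcript = "turn the volume up turn the lights off turn the volume down".split()

bigram_counts = defaultdict(Counter)
for prev, nxt in zip(transcript, transcript[1:]):
    bigram_counts[prev][nxt] += 1

def next_word_probabilities(prev):
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# After "the", the model prefers "volume" (2/3) over "lights" (1/3).
print(next_word_probabilities("the"))
```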
• Inconsistent performance. The systems may be unable to capture words accurately because
of variations in pronunciation, lack of support for some languages and inability to sort
through background noise. Ambient noise can be especially challenging. Acoustic training
can help filter it out, but these programs aren't perfect. Sometimes it's impossible to isolate
the human voice.
• Speed. Some speech recognition programs take time to deploy and master. The speech
processing may feel relatively slow.
• Source file issues. Speech recognition success depends on the recording equipment used,
not just the software.
Computer vision helps to understand the complexity of the human vision system and trains
computer systems to interpret and gain a high-level understanding of digital images or videos. In the
early days, developing a machine system with human-like intelligence was just a dream, but with the advancement of artificial intelligence and machine learning, it has become possible. Intelligent systems have now been developed that can "see" and interpret the world around them, much like human eyes. The fiction of yesterday has become the fact of today.
Computer vision is one of the most important fields of artificial intelligence (AI) and
computer science engineering that makes computer systems capable of extracting
meaningful information from visual data like videos and images. Further, it also helps to
take appropriate actions and make recommendations based on the extracted information.
Further, Artificial intelligence is the branch of computer science that primarily deals with
creating a smart and intelligent system that can behave and think like the human brain. So, we
can say if artificial intelligence enables computer systems to think intelligently, computer
vision makes them capable of seeing, analyzing, and understanding.
On a certain level, computer vision is all about pattern recognition which includes the
training process of machine systems for understanding the visual data such as images and
videos, etc.
Firstly, a vast amount of labeled visual data is provided to the machine to train it. This labeled data enables the machine to analyze different patterns across all the data points and relate them to the labels. For example, suppose we provide visual data of millions of dog images. The computer learns from this data, analyzing the shapes in each photo, the distances between shapes, the colors, etc., and thereby identifies patterns common to dogs and generates a model. As a result, this computer vision model can accurately detect whether a given input image contains a dog or not.
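A minimal sketch of what such a training setup might look like with a small convolutional network in Keras; the image size, the network architecture, and the randomly generated stand-in data are all assumptions for illustration, not a prescribed recipe.

```python
# Sketch: training a tiny CNN to predict "dog" vs "not dog" from labeled images.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Stand-in for a labeled dataset: 64x64 RGB images with binary labels.
x_train = np.random.rand(200, 64, 64, 3).astype("float32")
y_train = np.random.randint(0, 2, size=(200,))

model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(16, 3, activation="relu"),   # learns edges and simple shapes
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),   # learns higher-level patterns
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),     # probability the image is a dog
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=32)

# Inference on a new image: close to 1 means "dog", close to 0 means "not dog".
new_image = np.random.rand(1, 64, 64, 3).astype("float32")
print(model.predict(new_image))
```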
These are a few important prerequisites that are essentially required to start your career in
computer vision technology. Once you are prepared with the above prerequisites, you can
easily start learning and make a career in Computer vision.
3. Robotics:
- Object Manipulation: Enabling robots to recognize and manipulate
objects based on visual input.
- Navigation: Providing robots with the ability to navigate and
understand their environment using visual information.
5. Agriculture:
- Crop Monitoring: Analyzing images to assess the health of crops and
detect diseases or pests.
- Harvesting Robots: Enabling robots to identify and harvest crops using
computer vision.
6. Entertainment:
- Gesture Recognition: Interacting with devices and games through
gestures captured by cameras.
- Content Tagging: Automatically tagging and categorizing multimedia
content based on visual features.
7. Healthcare:
- Biometric Authentication: Using facial or iris recognition for secure
access to medical records.
- Rehabilitation: Assisting in rehabilitation exercises by providing real-
time feedback based on visual analysis.
8. Environmental Monitoring:
- Wildlife Conservation: Monitoring wildlife populations and behaviors
through camera traps.
- Climate Analysis: Analyzing satellite imagery for climate and
environmental studies.
COMPUTER DEVICES
Characterized by the relationships between deep neural network instances (I)
and compute devices (D), DL computation paradigms can be classified into
three new categories beyond single instance single device (SISD), namely
multi-instance single device (MISD), single-instance multi-device (SIMD), and
multi-instance multi-device (MIMD), as shown in Figure 1.
A data scientist defines the layers of a neural network with feedforward and possibly backward connections. For large models, this network may be partitioned across multiple machines, as shown in Figure 2. A framework that supports model parallelism automatically parallelizes the computations on each machine using that machine's CPU and GPU. Google's DistBelief also manages communication, synchronization, and data transfer between the machines during both the training and inference phases.
Data parallelism
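In data parallelism, by contrast, every machine holds a full copy of the model, each training batch is split across the machines, and the gradients computed on each shard are averaged (or summed) before the shared parameters are updated. Below is a minimal NumPy sketch of that gradient-averaging step for a linear model; the two "workers", the toy data, and the single update step are assumptions made purely for illustration.

```python
# Data parallelism sketch: split one batch across workers, average their gradients.
import numpy as np

w = np.zeros(3)                      # shared parameters (replicated on each worker)
lr = 0.1                             # learning rate

X = np.random.rand(8, 3)             # one mini-batch of 8 samples
y = X @ np.array([1.0, -2.0, 0.5])   # toy regression targets

def worker_gradient(X_shard, y_shard, w):
    # Gradient of mean squared error for a linear model on this worker's shard.
    error = X_shard @ w - y_shard
    return 2 * X_shard.T @ error / len(y_shard)

# Split the batch across two "workers" (machines/devices).
shards = [(X[:4], y[:4]), (X[4:], y[4:])]
grads = [worker_gradient(Xs, ys, w) for Xs, ys in shards]

w -= lr * np.mean(grads, axis=0)     # aggregate gradients, update shared weights
print(w)
```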
2. Embeddings:
- In natural language processing and recommendation systems,
embeddings are often used to represent categorical variables.
Embeddings can be seen as a form of factorization, capturing latent
factors in the data. While the transformation may be non-linear, the idea
of capturing underlying factors is similar.
3. Principal Component Analysis (PCA) as a Linear Factor Model:
- PCA is a linear technique often used for dimensionality reduction. In
the context of deep learning, autoencoders (a type of neural network) can
be seen as non-linear extensions of PCA. Both methods aim to capture
the most important features or factors in the data (see the sketch after this section).
4. Interpretability:
- Linear factor models are known for their interpretability, as the factor
loadings directly indicate the contribution of each variable to the common
factors. In deep learning, interpretability is often a challenge due to the
complex and non-linear nature of the models. Techniques like attention
mechanisms are introduced to enhance interpretability.
It's important to note that the primary strength of deep learning lies in its
ability to model highly complex and non-linear relationships in data,
which linear factor models might struggle to capture. While some
connections exist, deep learning models are generally more powerful and
flexible, often making them the preferred choice for tasks involving
intricate patterns and representations.
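To make the PCA connection in point 3 concrete, the sketch below reduces the same toy data with scikit-learn's PCA and with a linear single-layer autoencoder in Keras; the data, the dimensions, and the training settings are assumptions chosen only for illustration.

```python
# PCA as a linear factor model vs. a linear autoencoder with the same bottleneck.
import numpy as np
from sklearn.decomposition import PCA
from tensorflow import keras
from tensorflow.keras import layers

# Toy data: 500 samples of 10 correlated features (illustrative).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(500, 10))

# Linear factor view: PCA keeps the 2 directions with the most variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Autoencoder view: with a 2-unit bottleneck and linear activations, the network
# learns (up to an invertible linear transform) the same subspace as PCA.
autoencoder = keras.Sequential([
    keras.Input(shape=(10,)),
    layers.Dense(2, activation="linear"),    # encoder / "factor loadings"
    layers.Dense(10, activation="linear"),   # decoder
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=20, batch_size=32, verbose=0)

print(X_pca.shape)   # (500, 2) compressed representation
```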
2. Distributed Computing:
- To efficiently process large datasets and train complex models,
distributed computing frameworks are often used. Technologies like
Apache Spark, TensorFlow's distributed computing capabilities, and
Apache Hadoop facilitate parallel processing across multiple machines or
clusters.
5. Distributed Training:
- Large-scale models are often trained using distributed training
methods, where different portions of the model or subsets of the data are
processed simultaneously across multiple devices or nodes. This helps
reduce training time significantly.
6. Batch and Stochastic Gradient Descent:
- Batch gradient descent involves updating model parameters based on
the average gradient computed over the entire dataset, while stochastic
gradient descent (SGD) updates parameters based on a single or a few
random samples. Large-scale deep learning often employs SGD and its
variants due to their scalability and efficiency (see the sketch after this list).
7. Model Compression:
- Given the large size of models, model compression techniques are
often applied to reduce the memory and computation requirements,
making them more feasible for deployment on resource-constrained
devices.
9. Scalability Challenges:
- Managing the scalability of infrastructure, handling communication
overhead, and ensuring efficient data distribution are challenges in large-
scale deep learning that require careful consideration.
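As a simple illustration of the batch vs. stochastic distinction in point 6, here is a NumPy sketch of both update rules for a linear regression model; the toy data, learning rate, and mini-batch size are assumptions made only for the example.

```python
# Batch gradient descent vs. stochastic gradient descent on a toy linear model.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)

def grad(Xb, yb, w):
    # Gradient of mean squared error for a linear model.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

lr = 0.1

# Batch gradient descent: every update uses the full dataset.
w_batch = np.zeros(3)
for _ in range(100):
    w_batch -= lr * grad(X, y, w_batch)

# Stochastic gradient descent: each update uses a small random mini-batch.
w_sgd = np.zeros(3)
for _ in range(100):
    idx = rng.integers(0, len(y), size=32)
    w_sgd -= lr * grad(X[idx], y[idx], w_sgd)

print(w_batch, w_sgd)   # both should approach [2.0, -1.0, 0.5]
```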