Assignment DL
Name: R. ATULIYA
USN: 21BTRCL131
Subject: DEEP LEARNING Semester: 6th SEM
Subject Code: 21CS6AM11 Total Hours: 45
Credits: 03 Hours per week: 03
Faculty Name: Prof. Sahana Shetty Academic Year: 2022-23
Due Date: 01.03.24
import numpy as np
data = np.array([
[0, 2.93, 6.63],
[0, 2.53, 7.79],
[0, 3.57, 5.65],
[0, 3.16, 5.47],
[1, 2.58, 4.46],
[1, 2.16, 6.22],
[1, 3.27, 3.52]
])
ANSWER
1. Overfitting:
2. Underfitting:
3. Balanced Fit:
Forward Pass:
Loss Function:
Gradient Descent:
Repeat:
1. Dataset: Assume a dataset of images of both healthy and diseased
crops, with each image labeled with the specific type of disease it
shows.
2. Approach:
- Collect and curate a set of images that covers different disease
conditions as well as healthy crops.
- Build a deep learning model, such as a convolutional neural
network (CNN), to classify images as either healthy or diseased.
- Fit the model to the dataset, tuning the hyperparameters and the
architecture to improve performance.
- Validate the model on an independent dataset or with
cross-validation to check that it does not overfit.
3. Deployment:
- Deploy the model in a crop monitoring tool that analyzes images
from drones, satellites, or smartphones in real time.
- Farmers can then spot early symptoms of disease in their crops
before substantial losses occur and take timely measures such as
spraying pesticides or rotating crops.
4. Benefits:
- Early Detection: Deep learning enables early detection of crop
diseases, which lets farmers act in time to keep the damage to a
minimum.
- Precision Agriculture: Accurately identifying the affected
regions within a field allows interventions to be targeted, so that
a minimal amount of pesticide is sprayed and the environmental
impact is reduced.
- Increased Yield: Timely disease management improves crop health
and yield, which in turn supports food security and economic
sustainability.
5. Challenges:
- Data Quality: Building a robust model that adapts to new
environments and disease types requires datasets that are both
diverse and of high quality.
- Interpretability: Deep learning models, CNNs in particular, are
usually perceived as black boxes, which makes it difficult to
understand the reasoning behind their predictions. Techniques such
as attention mechanisms can help bridge the gap between conceptual
understanding and practical use.
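The cross-validation step mentioned under "Approach" can be sketched as follows. This is a minimal illustration, assuming the images are addressed by integer index; the function name `k_fold_indices`, the fold count, and the sample count are hypothetical choices, not part of the original answer.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)      # shuffle once, then cut into folds
    folds = np.array_split(order, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Hypothetical usage: 20 labeled crop images split into 5 folds.
splits = list(k_fold_indices(n_samples=20, k=5))
for fold, (train_idx, val_idx) in enumerate(splits):
    print(f"fold {fold}: {len(train_idx)} train, {len(val_idx)} validation")
```

Each image is used for validation exactly once, so averaging the validation score over the folds gives a less optimistic estimate than a single train/test split.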
3. You are tasked with developing a deep learning model for facial
expression recognition. Given images of faces, the model needs
to classify the facial expressions into one of several categories
such as happiness, sadness, anger, etc. Discuss the potential
advantages and limitations of each variant in the context of
optimizing the model's performance for accurately classifying
facial expressions. (all the types discussed in class) [CO2, L3]
ANSWER
1. Convolutional Neural Networks (CNNs):
Advantages:
- CNNs are well suited to image-based tasks such as facial
expression recognition because they learn hierarchical features
directly from raw pixel data.
- They capture local spatial patterns of facial geometry, such as
the eyes, the shape of the nose, and the mouth, whose variations
are highly informative for recognizing facial expressions.
- With resizing and data augmentation, CNNs can handle input images
of varying size and orientation, which makes them adaptable to
different facial expression datasets.
Limitations:
- CNNs may need a large amount of annotated data to capture the
complexity of facial expressions, and such data can be difficult
and costly to obtain.
- They may fail to generalize to variations in lighting, pose, and
facial appearance that are not represented in the training data.
- Training a CNN from scratch to recognize facial expressions can
consume substantial computational power and time.
2. Recurrent Neural Networks (RNNs):
Advantages:
- RNNs, especially LSTM variants, can track temporal dependencies
between facial expressions in a sequence and therefore capture
dynamic expressions that unfold over multiple frames.
- They handle sequential data naturally and maintain a memory of
earlier facial landmarks or video frames.
- RNNs are a good choice when expressions evolve over time, as in
videos or live feeds.
Limitations:
- Plain RNNs can struggle to capture long-range dependencies and
may lose subtle details of facial expressions over long sequences
because of vanishing or exploding gradients.
- Careful hyperparameter tuning is necessary for stable training;
this may involve adjusting the learning rate, sequence length, and
other parameters to prevent overfitting and improve convergence.
- Running RNNs on large-scale datasets incurs a high computational
cost and long training times, especially when dealing with
high-resolution videos.
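The vanishing and exploding gradient limitation can be illustrated numerically. The sketch below is a simplification, not a full RNN: it assumes a linear recurrence h_t = W h_{t-1} (no activation function), so the gradient flowing back one step is simply multiplied by the transpose of W. The function name and the chosen spectral radii are illustrative assumptions.

```python
import numpy as np

def backprop_norms(spectral_radius, steps=50, hidden=16, seed=0):
    """Norm of a gradient propagated back through `steps` linear RNN steps."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((hidden, hidden))
    # Rescale W so that its largest eigenvalue magnitude equals spectral_radius.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    grad = np.ones(hidden) / np.sqrt(hidden)   # unit-norm gradient at the last step
    norms = []
    for _ in range(steps):
        grad = W.T @ grad                      # one step of backpropagation through time
        norms.append(np.linalg.norm(grad))
    return norms

vanishing = backprop_norms(spectral_radius=0.5)
exploding = backprop_norms(spectral_radius=1.5)
print(f"radius 0.5, after 50 steps: {vanishing[-1]:.3e}")
print(f"radius 1.5, after 50 steps: {exploding[-1]:.3e}")
```

Gated architectures such as LSTMs and techniques such as gradient clipping are the usual remedies for the two cases.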
ANSWER
The steps involved in training a Restricted Boltzmann Machine
(RBM) for a recommendation system using the given dataset are as
follows:
1. Data Preprocessing:
- Convert the dataset into a format suitable for training the RBM.
- Represent each customer's interactions (e.g., browsing history or
purchase behavior) as a one-dimensional binary vector in which each
element indicates the presence or absence of a specific item or
activity.
- Normalize or scale the data if the features are on non-uniform
scales, so that the RBM receives consistently scaled input.
2. Initialization:
- Initialize the parameters of the RBM: the weights W connecting
the visible and hidden layers, the visible layer biases a, and the
hidden layer biases b.
- Set the initial values randomly or with a scheme such as Xavier
or He initialization.
3. Training:
- Use Contrastive Divergence (CD) or Persistent Contrastive
Divergence (PCD) to train the RBM.
- Sample a mini-batch of customer interaction data from the
dataset.
- Run Gibbs sampling: update the hidden units from the values of
the visible units, then update the visible units from the values of
the hidden units.
- Compute the gradients of the RBM parameters using the contrastive
divergence estimate.
- Update the weights and biases with gradient descent or one of its
variants (e.g., stochastic gradient descent, Adam).
4. Repeat:
- Repeat the training step for multiple epochs until the
convergence criteria are met.
- Monitor training progress by tracking the reconstruction error or
other performance metrics on a validation dataset.
5. Model Evaluation:
- Assess the trained RBM's performance on a validation set to
measure how well it has learned.
- Use accuracy, precision, and recall to measure the quality of the
recommendations, or mean squared error (MSE) to measure the error
between the predicted and the actual observed values.
- Compare the RBM's performance with another recommendation method
or a baseline to determine whether it is effective.
6. Deployment:
- Once the RBM has been trained and evaluated and its performance
is good enough, integrate it into a production environment to serve
personalized recommendations to customers.
- Embed the RBM in the business's recommendation system so that it
can process customers' real-time interactions and provide relevant
content.
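The initialization and training steps above can be sketched with NumPy. This is a minimal illustration of one-step contrastive divergence (CD-1) under stated assumptions: the interaction matrix is random toy data, the layer sizes, learning rate, and epoch count are arbitrary, and the class and method names are hypothetical rather than taken from any library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.a = np.zeros(n_visible)   # visible layer biases
        self.b = np.zeros(n_hidden)    # hidden layer biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.a)

    def cd1_step(self, v0, lr=0.1):
        """One CD-1 update on a batch; returns the reconstruction error."""
        ph0 = self.hidden_probs(v0)                        # positive phase
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        pv1 = self.visible_probs(h0)                       # one Gibbs step back
        v1 = (self.rng.random(pv1.shape) < pv1).astype(float)
        ph1 = self.hidden_probs(v1)                        # negative phase
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.a += lr * (v0 - v1).mean(axis=0)
        self.b += lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - pv1) ** 2))

# Toy data (assumption): 40 customers, 6 items, 1 = interacted with the item.
rng = np.random.default_rng(1)
interactions = (rng.random((40, 6)) < 0.5).astype(float)

rbm = RBM(n_visible=6, n_hidden=4)
errors = [rbm.cd1_step(interactions) for _ in range(200)]

# Recommendation scores for the first customer: reconstructed item probabilities.
scores = rbm.visible_probs(rbm.hidden_probs(interactions[:1]))
print("final reconstruction error:", errors[-1])
print("item scores:", scores.round(2))
```

Items the customer has not interacted with but that receive a high reconstructed probability are the natural candidates to recommend.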
Theoretical Analysis:
The theoretical upper bound on the storage capacity of an N-neuron
Hopfield network is around 0.138N, a result from the analyses that
followed J. J. Hopfield's seminal work on associative memory. This
bound assumes that the stored patterns are independent and random.
In other words, there is a maximum number of patterns that the
network can store and still recall accurately.
Calculation based on Overlap:
The capacity of a Hopfield network can also be assessed by
analyzing how well it recalls patterns in terms of their overlap.
Commonly, the overlap between two patterns is computed as their
normalized dot product. Following Hopfield's original estimate, the
network's critical capacity is approximately 0.15N, where N is the
number of neurons in the network.
Practical Considerations:
In practice, the usable capacity of a Hopfield network may be lower
than these theoretical limits because of factors such as noise,
correlation between patterns, and the dynamics of the network. The
actual capacity also depends on implementation details such as the
network architecture, the learning rule, and the level of noise.
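The overlap-based view of capacity can be checked empirically. The sketch below assumes random +/-1 patterns, a Hebbian learning rule, and asynchronous updates; the network size N = 100 and the pattern counts are illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weight matrix for +/-1 patterns (one per row), zero diagonal."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, state, sweeps=10):
    """Asynchronous updates, starting from `state`, for a fixed number of sweeps."""
    state = state.copy()
    for _ in range(sweeps):
        for i in range(len(state)):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

def overlap(a, b):
    """Normalized dot product: 1.0 means the two patterns are identical."""
    return float(a @ b) / len(a)

rng = np.random.default_rng(0)
N = 100
mean_overlap = {}
for P in (5, 14, 30):                      # loads P/N below, near, and above ~0.14
    patterns = rng.choice([-1, 1], size=(P, N))
    W = train_hopfield(patterns)
    recalled = [overlap(recall(W, p), p) for p in patterns]
    mean_overlap[P] = float(np.mean(recalled))
    print(f"P={P:2d}  load={P / N:.2f}  mean overlap={mean_overlap[P]:.3f}")
```

At low load the stored patterns are fixed points and the overlap stays at or near 1; as the load passes the critical region the recalled states drift away from the stored patterns.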
DEPARTMENT OF CSE (AI/ML)
Due Date: 08.03.24
2. AlexNet:
- Accuracy: AlexNet performs moderately on CIFAR-10, with somewhat
lower accuracy than modern architectures such as ResNet and VGG.
It typically reaches accuracy in the range of 80-85%.
- Model Complexity: AlexNet has moderate model complexity compared
to ResNet. It is usually preferred for applications with limited
resources, such as small-scale machine learning models or low-cost
smartphones.
- Training Time: AlexNet generally takes less time to train than
more complex networks such as ResNet, although the hardware
capacity it needs can still be substantial.
3. VGG16:
- Accuracy: VGG16 is widely recognized for its simple yet effective
design. It performs competitively on CIFAR-10, with accuracy
slightly lower than ResNet's, in the range of 85-89%.
- Model Complexity: VGG16's model complexity comes mainly from its
deep structure of small (3x3) filters. VGG16 has more parameters
than AlexNet, and although it is shallower than ResNet, its large
fully connected layers give it a higher parameter count than
typical ResNet variants.
- Training Time: Training VGG16 is costly because of its depth and
large number of parameters. Nevertheless, its simplicity relative
to ResNet can favour faster convergence.
4. Stacked CNN:
- Accuracy: In a stacked architecture, multiple convolutional
layers are placed one after the other. The accuracy of stacked
CNNs depends on the structure and depth of the architecture. With
sufficient capacity, they can match the accuracy of VGG or ResNet
if implemented well.
- Model Complexity: Varying two parameters, the number of layers
and the filter sizes, yields models of varying complexity. As the
architectures grow deeper, they become more complex and accumulate
more parameters.
- Training Time: The depth and complexity of a stacked CNN
determine its training time. Deeper structures may need longer
training but can attain better results.
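How depth drives the parameter count of a stacked CNN can be made concrete with a short calculation. The channel widths below are illustrative assumptions (3 input channels for RGB images such as CIFAR-10), and the helper names are hypothetical.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of one 2-D convolutional layer."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

def stacked_conv_params(channels, kernel_size=3):
    """Total parameters of conv layers chained through the given channel widths."""
    return sum(
        conv_params(c_in, c_out, kernel_size)
        for c_in, c_out in zip(channels[:-1], channels[1:])
    )

shallow = stacked_conv_params([3, 32, 64])            # two 3x3 conv layers
deep = stacked_conv_params([3, 32, 64, 128, 256])     # four 3x3 conv layers
print("shallow stack:", shallow)   # 19392
print("deep stack:", deep)         # 388416
```

Doubling the number of layers here multiplies the parameter count by about twenty, because the later, wider layers dominate the total.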
5. Dilated CNN:
- Accuracy: Dilated CNNs use dilated convolutions to enlarge the
receptive field without increasing the number of parameters. They
can achieve competitive accuracy on CIFAR-10, comparable to VGG or
ResNet.
- Model Complexity: Dilated CNNs have moderate model complexity,
whereas architectures such as VGG or ResNet are considered more
complex. They tend to have fewer parameters because dilated
convolutions cover a larger area with the same filter size.
- Training Time: Training time for dilated CNNs is in most cases
comparable to that of VGG- or ResNet-style CNNs, because dilation
does not reduce the amount of computation per layer.
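The claim that dilation enlarges the receptive field without adding parameters can be verified with a small calculation. The sketch assumes stride-1 convolutions with a fixed 3x3 kernel; the dilation schedule 1, 2, 4 is an illustrative choice and the function name is hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in pixels, per axis) of stacked stride-1 conv layers."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d    # each layer adds (k - 1) * dilation
    return rf

standard = receptive_field(3, [1, 1, 1])   # three ordinary 3x3 layers
dilated = receptive_field(3, [1, 2, 4])    # same layers with growing dilation
print("standard:", standard)   # 7
print("dilated:", dilated)     # 15
```

Both stacks use the same 3x3 kernels and therefore the same number of weights, yet the dilated stack sees a 15x15 window instead of 7x7.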
7. LeNet:
- Accuracy: LeNet was one of the first CNNs introduced and may not
match the accuracy achieved by modern architectures on the CIFAR-10
dataset. It can reach an accuracy of around 70%.
- Model Complexity: Unlike VGG and ResNet, LeNet is a simple model
with low model complexity. It has fewer parameters, which makes it
less computationally intensive.
- Training Time: Compared with deeper architectures such as VGG and
ResNet, LeNet trains faster because of its simplicity and lower
model complexity.
Due Date: 15.04.24
3. Speech Recognition:
4. Sequence-to-Sequence Learning:
1. Image Processing:
2. Graph-Structured Data:
4. Document Understanding:
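import numpy as np
sequence_length, input_size = 10, 3  # assumed sizes; the original values were not preserved
np.random.seed(0)
data = np.random.randn(sequence_length, input_size)
learning_rate = 0.01
epochs = 1000
W = np.zeros((input_size, input_size))  # assumed stand-in model: linear next-step predictor
inputs, targets = data[:-1], data[1:]
for epoch in range(epochs):
    error = inputs @ W - targets
    loss = np.mean(error ** 2)
    W -= learning_rate * 2 * (inputs.T @ error) / error.size  # gradient descent on the MSE
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss}")
predicted_sequence = inputs @ W
print("\nPredicted sequence:")
for i, element in enumerate(predicted_sequence):
    print(f"Element {i+1}: {element}")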
The dataset contains audio files of music tracks along with their
corresponding genre labels. Each audio file is represented as a
sequence of audio samples, and each music track is associated
with a genre label indicating its genre category (e.g., rock, pop,
jazz, classical).
Implementation Steps:
1. Data Preprocessing:
- Load the audio files and their corresponding genre labels.
- Preprocess the audio data by extracting features such as Mel-
Frequency Cepstral Coefficients (MFCCs), spectrograms, or other
representations suitable for audio data.
- Divide the dataset into training, validation, and test sets.
2. Model Architecture:
- Design an LSTM-based neural network architecture for music
genre classification.
- The input to the LSTM network will be the sequence of audio
features extracted from the audio files.
- Add one or more LSTM layers followed by fully connected
layers with appropriate activation functions.
- The output layer will have units equal to the number of genre
categories, with a softmax activation function to output genre
probabilities.
3. Training:
- Train the LSTM model using the training dataset.
- Define an appropriate loss function, such as categorical
cross-entropy, and an optimizer, such as Adam or RMSprop.
- Monitor the model's performance on the validation set to
prevent overfitting by using techniques like early stopping or
dropout.
4. Evaluation:
- Evaluate the trained LSTM model on the test dataset to
measure its performance in classifying music genres.
- Calculate metrics such as accuracy, precision, recall, and F1-
score to assess the model's classification performance.
5. Deployment:
- Once the LSTM model achieves satisfactory performance,
deploy it in production environments for music genre
classification tasks.
- Integrate the model into applications or systems where music
genre classification is required, such as music streaming platforms
or recommendation systems.
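The model architecture described in step 2 can be sketched as a forward pass in plain NumPy. This is an untrained illustration only: the weights are random, the MFCC frames are random stand-ins for real audio features, and the sizes (13 coefficients, 32 hidden units, 50 frames) and function names are assumptions. In practice a framework with automatic differentiation would be used for training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())            # subtract the max for numerical stability
    return e / e.sum()

def init_params(n_features, n_hidden, n_genres, seed=0):
    rng = np.random.default_rng(seed)
    return {
        # One matrix holds the input, forget, output, and candidate gate weights.
        "W": 0.1 * rng.standard_normal((4 * n_hidden, n_features + n_hidden)),
        "b": np.zeros(4 * n_hidden),
        "W_out": 0.1 * rng.standard_normal((n_genres, n_hidden)),
        "b_out": np.zeros(n_genres),
    }

def lstm_classify(features, params):
    """Run one LSTM layer over the frames and return genre probabilities."""
    n_hidden = params["W_out"].shape[1]
    h = np.zeros(n_hidden)             # hidden state
    c = np.zeros(n_hidden)             # cell state
    for x_t in features:
        z = params["W"] @ np.concatenate([x_t, h]) + params["b"]
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g              # forget old memory, write new candidate
        h = o * np.tanh(c)
    # Fully connected output layer with softmax over the genre categories.
    return softmax(params["W_out"] @ h + params["b_out"])

genres = ["rock", "pop", "jazz", "classical"]
rng = np.random.default_rng(1)
mfcc_frames = rng.standard_normal((50, 13))   # stand-in for extracted MFCC features
params = init_params(n_features=13, n_hidden=32, n_genres=len(genres))
probs = lstm_classify(mfcc_frames, params)
print(dict(zip(genres, probs.round(3))))
```

Only the final hidden state feeds the output layer here; averaging the hidden states over all frames is a common alternative for long tracks.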