DL Practical

PRACTICAL-1

Aim: Creating a basic neural network and analyzing its performance.

Material Required:

● Python
● TensorFlow Library
● NumPy Library
● scikit-learn Library
● Seaborn Library
● Matplotlib Library
● MNIST Dataset (automatically downloaded via TensorFlow)

Methodology:

Step 1: Load and Preprocess the MNIST Dataset

● The MNIST dataset consists of 70,000 handwritten digits (0-9).


● The data is split into training (60,000 samples) and testing (10,000 samples).
● Normalize the pixel values to a range of 0 to 1 for better model performance.

Step 2: Define the Neural Network Architecture

● Create a simple feedforward neural network.


● Use a Flatten layer to convert the 28x28 images into a 1D array.
● Add two hidden layers with 128 and 64 neurons, respectively, using the ReLU activation
function.
● The output layer consists of 10 neurons for the 10 digit classes, using the softmax activation
function.

Step 3: Compile the Model

● Compile the model using the Adam optimizer.


● Use sparse categorical cross-entropy as the loss function and track accuracy as a metric.

Step 4: Train the Model

● Fit the model to the training data for a specified number of epochs (e.g., 10).
● Validate the model using the test data.

Step 5: Generate Predictions

● Use the trained model to make predictions on the test dataset.


Step 6: Calculate Performance Metrics

● Compute the accuracy, error rate, precision, and recall of the model using scikit-learn
metrics.

Step 7: Print the Results

● Display the calculated metrics.

Code:

import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.metrics import accuracy_score, precision_score, recall_score
import numpy as np

# Step 1: Load and Preprocess the MNIST dataset
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize the images to values between 0 and 1
X_train, X_test = X_train / 255.0, X_test / 255.0

# Step 2: Define the Neural Network Architecture
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),   # Flatten the 28x28 images into 1D
    layers.Dense(128, activation='relu'),   # Hidden layer with 128 units
    layers.Dense(64, activation='relu'),    # Hidden layer with 64 units
    layers.Dense(10, activation='softmax')  # Output layer for 10 classes
])

# Step 3: Compile the Model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Step 4: Train the Model
model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))

# Step 5: Generate Predictions
y_pred = np.argmax(model.predict(X_test), axis=1)

# Step 6: Calculate Accuracy, Error Rate, Precision, and Recall
accuracy = accuracy_score(y_test, y_pred)
error_rate = 1 - accuracy
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')

# Print the results
print(f'Accuracy: {accuracy}')
print(f'Error Rate: {error_rate}')
print(f'Precision: {precision}')
print(f'Recall: {recall}')

Output:
PRACTICAL-2

Aim: To deploy a confusion matrix for evaluating a trained neural network model and to simulate
overfitting by using a complex model architecture on the MNIST dataset.
Material Required:
● Python 3.x
● TensorFlow
● NumPy
● Matplotlib
● Seaborn
● Jupyter Notebook or any Python IDE
Theory:
Confusion Matrix
● Definition: A confusion matrix is a table that is often used to describe the performance of a
classification model. It allows the visualization of the performance of an algorithm.
● Components:
○ True Positives (TP): Correctly predicted positive cases.
○ True Negatives (TN): Correctly predicted negative cases.
○ False Positives (FP): Incorrectly predicted positive cases.
○ False Negatives (FN): Incorrectly predicted negative cases.
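To make these components concrete, here is a small hand-checkable sketch using scikit-learn (the labels below are made up for illustration):

from sklearn.metrics import confusion_matrix

# Toy binary example: 0 = negative class, 1 = positive class
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

# For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1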
Overfitting
● Definition: Overfitting occurs when a model learns the training data too well, including its
noise and outliers, resulting in poor generalization to new data.
● Indication: A significant gap between training accuracy and validation/test accuracy is a sign
of overfitting.
Experimental Procedure:
Step 1: Load and Preprocess the MNIST Dataset
import tensorflow as tf

# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize the images to values between 0 and 1
X_train, X_test = X_train / 255.0, X_test / 255.0

Step 2: Define the Neural Network Architecture


from tensorflow.keras import layers, models

# Define a complex Neural Network Architecture to simulate overfitting
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),   # Flatten the 28x28 images into 1D
    layers.Dense(512, activation='relu'),   # Hidden layer with 512 units
    layers.Dense(512, activation='relu'),   # Another hidden layer with 512 units
    layers.Dense(512, activation='relu'),   # Yet another hidden layer with 512 units
    layers.Dense(10, activation='softmax')  # Output layer for 10 classes
])

Step 3: Compile the Model


# Compile the Model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

Step 4: Train the Model


# Train the Model with more epochs to increase the chance of overfitting
history = model.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test))

Step 5: Evaluate the Model


# Evaluate the Model on the Test Data
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'Test Accuracy: {test_acc}')

Step 6: Generate Predictions


import numpy as np

# Generate Predictions
y_pred = np.argmax(model.predict(X_test), axis=1)

Step 7: Generate the Confusion Matrix


from sklearn.metrics import confusion_matrix

# Generate the Confusion Matrix
cm = confusion_matrix(y_test, y_pred)

Step 8: Plot the Confusion Matrix

import seaborn as sns
import matplotlib.pyplot as plt

# Plot the Confusion Matrix
plt.figure(figsize=(10, 7))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=range(10), yticklabels=range(10))
plt.xlabel('Predicted Label')
plt.ylabel('Actual Label')
plt.title('Confusion Matrix')
plt.show()

Step 9: Analyze Overfitting


# Check for overfitting: Compare train and test accuracy
train_acc = history.history['accuracy'][-1]
val_acc = history.history['val_accuracy'][-1]

if train_acc > val_acc:
    print("The model is likely overfitting.")
else:
    print("The model is not overfitting.")

Output:
PRACTICAL NO. 3

Aim: Visualizing a neural network


Introduction:
In deep learning, neural networks often contain numerous layers and a vast number of parameters,
making them complex to understand and interpret. Visualizing these networks is essential for gaining
insights into their structure and functioning, enabling better model debugging, tuning, and
communication.
Importance of Neural Network Visualization:
● Understanding Model Structure: Visualization helps in grasping the architecture of the network, including the arrangement of layers and connections between neurons.
● Identifying Issues: Visualization is key in diagnosing problems like vanishing gradients, dead neurons, and overfitting.
● Enhancing Communication: Visual representations make it easier to explain model behavior to non-experts, colleagues, and stakeholders.
Types of Visualizations:
● Architecture Visualization: Displays the entire structure of the neural network, showing the layers, neurons, and connections.
● Weight Visualization: Visualizes the weights assigned to connections between neurons, revealing how the network has learned during training.
● Gradient Flow: Illustrates the gradients calculated during backpropagation, highlighting learning patterns and potential issues like exploding or vanishing gradients.
Tools for Neural Network Visualization:
● TensorBoard: A comprehensive tool that provides visualizations of neural network graphs, training metrics, and more, integrated with TensorFlow (see the short sketch below).
● Matplotlib and Seaborn: Python libraries used to create custom visualizations of network components, such as weight distributions and activation patterns.
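As an illustration of the TensorBoard workflow, the Keras callback below records the graph, per-epoch metrics, and weight histograms that TensorBoard renders (a minimal sketch; the log directory name and the commented model.fit call are illustrative assumptions):

import tensorflow as tf

# Write logs that TensorBoard can render; histogram_freq=1 also logs weight histograms
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir='logs/fit', histogram_freq=1)

# Assuming a compiled `model` and training data already exist:
# model.fit(X_train, y_train, epochs=5, callbacks=[tensorboard_cb])
# Then launch the UI with:  tensorboard --logdir logs/fit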
Challenges in Visualization:
● Complexity: Large networks can be difficult to visualize in a way that is both comprehensive and comprehensible.
● Interpretability: Effective visualizations must be clear and avoid overwhelming users with too much information.
Code:
import matplotlib.pyplot as plt
import networkx as nx

def draw_neural_net(ax, left, right, bottom, top, layer_sizes):
    G = nx.DiGraph()
    positions = {}
    layer_width = (right - left) / (len(layer_sizes) - 1)
    layer_height = (top - bottom) / (max(layer_sizes) - 1)

    # Add a node for every neuron, positioned by layer (x) and index within the layer (y)
    for layer_idx, layer_size in enumerate(layer_sizes):
        layer_x = left + layer_idx * layer_width
        layer_top = top - (max(layer_sizes) - layer_size) * layer_height / 2  # center the layer
        for node_idx in range(layer_size):
            node = (layer_idx, node_idx)
            G.add_node(node)
            positions[node] = (layer_x, layer_top - node_idx * layer_height)

    # Fully connect each layer to the next
    for layer_idx in range(len(layer_sizes) - 1):
        for src in range(layer_sizes[layer_idx]):
            for dst in range(layer_sizes[layer_idx + 1]):
                G.add_edge((layer_idx, src), (layer_idx + 1, dst))

    # Draw nodes
    for node in G.nodes():
        x, y = positions[node]
        ax.plot(x, y, 'o', c='k', markersize=10)
        ax.text(x, y, f'{node}', fontsize=12, ha='right')

    # Draw edges
    for edge in G.edges():
        x_start, y_start = positions[edge[0]]
        x_end, y_end = positions[edge[1]]
        ax.plot([x_start, x_end], [y_start, y_end], 'k-', alpha=0.5)

    ax.set_xlim(left, right)
    ax.set_ylim(bottom, top)
    ax.axis('off')

# Example usage
fig, ax = plt.subplots(figsize=(12, 8))
draw_neural_net(ax, left=0, right=5, bottom=0, top=10, layer_sizes=[3, 5, 3])
plt.show()

Output:
PRACTICAL NO. 4

Aim: Object Detection with Pre-trained RetinaNet using Keras

Material Required:
● Python
● OpenCV (cv2) Library with the DNN module
● imutils Library
● NumPy Library
● Pre-trained Caffe model files (deploy prototxt and caffemodel weights)
● A webcam (for the live video stream)
Introduction
Object detection is a very important problem in computer vision. The model is tasked with localizing the objects present in an image and, at the same time, classifying them into different categories. Object detection models can be broadly classified into "single-stage" and "two-stage" detectors. Two-stage detectors are often more accurate, but at the cost of being slower. In this example we consider RetinaNet, a popular single-stage detector, which is accurate and runs fast. RetinaNet uses a feature pyramid network to efficiently detect objects at multiple scales and introduces a new loss, the focal loss function, to alleviate the problem of extreme foreground-background class imbalance.
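The focal loss reshapes the standard cross-entropy so that easy, well-classified examples are down-weighted by a factor of (1 - p_t)^γ, letting training focus on hard examples. A minimal NumPy sketch of the binary form (the α and γ values and the sample probabilities are illustrative):

import numpy as np

def focal_loss(y_true, p, alpha=0.25, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    p_t = np.where(y_true == 1, p, 1 - p)             # probability assigned to the true class
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)
    return np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))

# An easy, confident negative contributes far less than a hard, misclassified positive
print(focal_loss(np.array([0]), np.array([0.1])))  # easy example: tiny loss
print(focal_loss(np.array([1]), np.array([0.1])))  # hard example: much larger loss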
Code:
# import packages
from imutils.video import VideoStream
from imutils.video import FPS
import numpy as np
import argparse
import imutils
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
                help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
                help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
                help="minimum probability to filter weak predictions")
args = vars(ap.parse_args())

# Class labels the pre-trained Caffe model can detect
CLASSES = ["aeroplane", "background", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]
# Assigning random colors to each of the classes
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)

# FPS: used to compute the (approximate) frames per second
fps = FPS().start()

while True:
    # grab the frame from the threaded video stream (vs) and resize it
    # to have a maximum width of 400 pixels
    frame = vs.read()
    frame = imutils.resize(frame, width=400)

    # grab the frame dimensions, e.g. h = 225 and w = 400
    (h, w) = frame.shape[:2]

    # resize the frame to the 300x300 input the network expects
    resized_image = cv2.resize(frame, (300, 300))

    # convert the frame to a blob, set it as the network input, and run a forward pass
    blob = cv2.dnn.blobFromImage(resized_image, (1 / 127.5), (300, 300), 127.5, swapRB=True)
    net.setInput(blob)
    predictions = net.forward()

    # loop over the predictions
    for i in np.arange(0, predictions.shape[2]):
        confidence = predictions[0, 0, i, 2]

        # filter out weak predictions
        if confidence > args["confidence"]:
            idx = int(predictions[0, 0, i, 1])

            # compute the (x, y)-coordinates of the bounding box for the object
            box = predictions[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

            # Get the label with the confidence score
            label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
            print("Object detected: ", label)

            # Draw a rectangle across the boundary of the object
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                          COLORS[idx], 2)
            y = startY - 15 if startY - 15 > 15 else startY + 15
            cv2.putText(frame, label, (startX, y),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)

    # show the output frame
    cv2.imshow("Frame", frame)

    # press the 'q' key to break the loop
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):
        break

    # update the FPS counter
    fps.update()

fps.stop()
print("[INFO] Elapsed Time: {:.2f}".format(fps.elapsed()))
print("[INFO] Approximate FPS: {:.2f}".format(fps.fps()))

# Destroy windows and cleanup
cv2.destroyAllWindows()
# Stop the video stream
vs.stop()

Output:
PRACTICAL NO. 5

Aim: Neural Recommender System with Explicit Feedback

Material Required:
● Python
● TensorFlow Library
● NumPy Library
● scikit-learn Library
● Seaborn Library
● Matplotlib Library

Introduction:
A recommendation system employs a data-driven methodology to offer customers tailored recommendations. It uses user data and algorithms to forecast and suggest goods, services, or content that a user is likely to find interesting. These systems are essential in applications where users may become overwhelmed by large volumes of information, such as social media, streaming services, and e-commerce. Python is a common choice for building recommendation systems because of its modules and machine learning frameworks. The two main kinds are content-based filtering (which takes into account the characteristics of products and user profiles) and collaborative filtering (which generates recommendations based on user behaviour and preferences). Hybrid strategies that integrate the two approaches are also popular.

Code:
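With explicit feedback, the training signal is a numeric rating rather than a click. The sketch below is a minimal collaborative-filtering model in Keras: user and item embeddings whose dot product predicts the rating, trained with mean squared error. All sizes and the synthetic 1-5 ratings are illustrative assumptions, not data from this manual.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative sizes for a synthetic explicit-feedback dataset
num_users, num_items, embedding_dim = 500, 1000, 32

# Synthetic (user, item, rating) triples; ratings on a 1-5 scale
user_ids = np.random.randint(0, num_users, size=(10000, 1))
item_ids = np.random.randint(0, num_items, size=(10000, 1))
ratings = np.random.randint(1, 6, size=(10000, 1)).astype('float32')

# Matrix-factorization model: predicted rating = dot(user vector, item vector)
user_in = layers.Input(shape=(1,), name='user')
item_in = layers.Input(shape=(1,), name='item')
user_vec = layers.Flatten()(layers.Embedding(num_users, embedding_dim)(user_in))
item_vec = layers.Flatten()(layers.Embedding(num_items, embedding_dim)(item_in))
pred = layers.Dot(axes=1)([user_vec, item_vec])

model = models.Model([user_in, item_in], pred)
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Regression on the explicit ratings
model.fit([user_ids, item_ids], ratings, epochs=5, batch_size=64, validation_split=0.1)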
PRACTICAL NO. 6

Aim: Backpropagation in Neural Networks using Numpy

Introduction:
Backpropagation is a key algorithm used to train artificial neural networks by minimizing the error
between the predicted and actual outputs. It works through two main steps: forward propagation and
backward propagation. In forward propagation, input data moves through the network layers, where
weights, biases, and activation functions transform the data to produce an output. The error is
calculated by comparing this output to the actual target value. During backward propagation, the error
is propagated back through the network, adjusting the weights and biases using the gradient descent
method to reduce the error. This iterative process continues until the network's predictions become
more accurate.
Key Components:
● Neural Network Architecture: A feedforward neural network with multiple layers.
● Activation Function: A sigmoid function will be used to introduce non-linearity.
● Loss Function: Mean squared error (MSE) will be used to measure the difference between predicted and actual outputs.
● Optimization: Gradient descent will be used to update the weights and biases.
Implementation Steps
1. Define the Neural Network:
o Create classes for neurons and layers.
o Initialize weights and biases randomly.
2. Forward Pass:
o Calculate the weighted sum of inputs for each neuron.
o Apply the activation function to obtain the output.
o Propagate the output to the next layer.
3. Backward Pass:
o Calculate the error gradient for the output layer.
o Propagate the error gradient backward through the network, calculating gradients for
each layer's weights and biases.
4. Update Weights and Biases:
o Use gradient descent to update the weights and biases based on the calculated gradients.
5. Repeat:
o Iterate through the training data multiple times (epochs), updating weights and biases
in each iteration.
Applications of Backpropagation:
1. Image Recognition: Backpropagation is widely used in convolutional neural networks (CNNs)
for tasks like object detection, face recognition, and image classification.
2. Natural Language Processing (NLP): It powers language models for applications like
sentiment analysis, machine translation, and speech recognition.
3. Medical Diagnosis: Neural networks trained using backpropagation help in detecting diseases
from medical images, such as identifying tumors in MRI scans.
4. Financial Forecasting: Used to predict stock prices, market trends, and credit risk assessments
by learning from historical data.
5. Autonomous Systems: In robotics and self-driving cars, backpropagation enables learning
from environmental data, improving decision-making in real-time.

Code:
import numpy as np

# Sigmoid activation function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)

# Mean Squared Error (MSE) Loss function
def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Initialize the dataset (XOR problem)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # Input
y = np.array([[0], [1], [1], [0]])              # Expected output (XOR)

# Set the seed for reproducibility
np.random.seed(42)

# Network dimensions
input_size = 2
hidden_size = 2
output_size = 1

# Random initialization of weights and zero biases
W1 = np.random.randn(input_size, hidden_size)   # Weights between input and hidden layer
b1 = np.zeros((1, hidden_size))                 # Bias for hidden layer
W2 = np.random.randn(hidden_size, output_size)  # Weights between hidden and output layer
b2 = np.zeros((1, output_size))                 # Bias for output layer

# Training parameters
learning_rate = 0.1
epochs = 10000

# Training loop
for epoch in range(epochs):
    # Forward pass
    hidden_input = np.dot(X, W1) + b1             # Input to hidden layer
    hidden_output = sigmoid(hidden_input)         # Output from hidden layer
    final_input = np.dot(hidden_output, W2) + b2  # Input to output layer
    y_pred = sigmoid(final_input)                 # Output from output layer

    # Calculate the loss (MSE)
    loss = mse_loss(y, y_pred)

    # Backpropagation: compute the error in the output
    error_output = y_pred - y
    delta_output = error_output * sigmoid_derivative(y_pred)

    # Compute the error in the hidden layer
    error_hidden = delta_output.dot(W2.T)
    delta_hidden = error_hidden * sigmoid_derivative(hidden_output)

    # Update the weights and biases (gradient descent)
    W2 -= learning_rate * hidden_output.T.dot(delta_output)
    b2 -= learning_rate * np.sum(delta_output, axis=0, keepdims=True)
    W1 -= learning_rate * X.T.dot(delta_hidden)
    b1 -= learning_rate * np.sum(delta_hidden, axis=0, keepdims=True)

    # Print loss every 1000 epochs
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

# Testing the trained network
print("\nTesting the trained network:")
for i in range(len(X)):
    hidden_output = sigmoid(np.dot(X[i], W1) + b1)
    y_pred = sigmoid(np.dot(hidden_output, W2) + b2)
    print(f"Input: {X[i]}, Predicted Output: {y_pred}, Expected Output: {y[i]}")

Output:
Epoch 0, Loss: 0.2558
Epoch 1000, Loss: 0.2494
Epoch 2000, Loss: 0.2454
Epoch 3000, Loss: 0.2047
Epoch 4000, Loss: 0.1532
Epoch 5000, Loss: 0.1387
Epoch 6000, Loss: 0.1336
Epoch 7000, Loss: 0.1312
Epoch 8000, Loss: 0.1297
Epoch 9000, Loss: 0.1288

Testing the trained network:


Input: [0 0], Predicted Output: [[0.05300376]], Expected Output: [0]
Input: [0 1], Predicted Output: [[0.49554286]], Expected Output: [1]
Input: [1 0], Predicted Output: [[0.95091752]], Expected Output: [1]
Input: [1 1], Predicted Output: [[0.50319846]], Expected Output: [0]
PRACTICAL NO. 7

Aim: Backpropagation in Neural Networks using Numpy

Introduction:
Neural networks are powerful tools for solving complex problems by learning patterns from data. The
backpropagation algorithm is a key component of neural networks, allowing them to learn by adjusting
their internal weights based on the error between predicted and actual outputs. By iteratively updating
the weights in response to errors, the network becomes increasingly accurate in its predictions.
Backpropagation, or "backward propagation of errors," is an efficient way to compute gradients of the
loss function with respect to the weights in the network, using the chain rule of calculus. These
gradients are then used by optimization algorithms, such as gradient descent, to update the network
weights and minimize the loss.
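To see "gradients via the chain rule" in miniature, the analytic derivative of a single sigmoid neuron can be checked against a finite-difference estimate (a small sketch; the input, target, and weight values are arbitrary):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# One sigmoid neuron with squared-error loss: L = (sigmoid(w*x + b) - t)**2
x, t = 1.5, 0.0
w, b = 0.4, 0.1

# Analytic gradient via the chain rule: dL/dw = 2*(y - t) * y*(1 - y) * x
y = sigmoid(w * x + b)
grad_analytic = 2 * (y - t) * y * (1 - y) * x

# Finite-difference estimate: (L(w + eps) - L(w - eps)) / (2 * eps)
eps = 1e-6
loss = lambda w_: (sigmoid(w_ * x + b) - t) ** 2
grad_numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)

print(grad_analytic, grad_numeric)  # the two values agree to several decimal places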
Key Components:
● Neural Network Architecture: A feedforward neural network with multiple layers.
● Activation Function: A sigmoid function will be used to introduce non-linearity.
● Loss Function: Mean squared error (MSE) will be used to measure the difference between predicted and actual outputs.
● Optimization: Gradient descent will be used to update the weights and biases.
Implementation Steps:
1. Define the Neural Network:
o Create classes for neurons and layers.
o Initialize weights and biases randomly.
2. Forward Pass:
o Calculate the weighted sum of inputs for each neuron.
o Apply the activation function to obtain the output.
o Propagate the output to the next layer.
3. Backward Pass:
o Calculate the error gradient for the output layer.
o Propagate the error gradient backward through the network, calculating gradients for
each layer's weights and biases.
4. Update Weights and Biases:
o Use gradient descent to update the weights and biases based on the calculated gradients.
5. Repeat:
o Iterate through the training data multiple times (epochs), updating weights and biases
in each iteration.

Applications:
The backpropagation algorithm is widely used in various machine learning tasks, including:
● Image Classification: Training deep learning models like convolutional neural networks (CNNs) to classify images.
● Natural Language Processing (NLP): For tasks such as sentiment analysis, machine translation, and text classification.
● Recommendation Systems: To learn user-item interactions and make personalized recommendations.
● Financial Forecasting: Predicting stock prices or risk factors in financial markets.
● Medical Diagnostics: Helping to predict the likelihood of diseases based on patient data.

Code:
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Flatten, Lambda
from tensorflow.keras import backend as K  # Keras backend for tensor operations

def create_recommender_model(num_users, num_items, embedding_dim):
    # Input layers for the user (anchor) and the two items
    user_input = Input(shape=(1,), name='user_input')
    item_input = Input(shape=(1,), name='item_input')
    positive_item_input = Input(shape=(1,), name='positive_item_input')

    # Embedding layers for user and items
    user_embedding = Embedding(num_users, embedding_dim, name='user_embedding')(user_input)
    item_embedding = Embedding(num_items, embedding_dim, name='item_embedding')(item_input)
    positive_item_embedding = Embedding(num_items, embedding_dim,
                                        name='positive_item_embedding')(positive_item_input)

    # Flatten the embeddings
    user_embedding = Flatten()(user_embedding)
    item_embedding = Flatten()(item_embedding)
    positive_item_embedding = Flatten()(positive_item_embedding)

    # Stack the three embeddings into a single (batch, 3, embedding_dim) tensor so the
    # triplet loss can index the anchor (0), positive (1), and negative (2) along axis 1
    embeddings = Lambda(lambda t: K.stack(t, axis=1), name='stacked_embeddings')(
        [user_embedding, item_embedding, positive_item_embedding])

    # Define the model with the three inputs and the stacked embeddings as its output
    model = Model(inputs=[user_input, item_input, positive_item_input], outputs=embeddings)
    return model

def triplet_loss(y_true, y_pred):
    margin = 1.0  # You can adjust the margin parameter for the triplet loss

    # Calculate distances between anchor-positive and anchor-negative pairs
    anchor_positive_distance = K.sqrt(K.sum(K.square(y_pred[:, 0] - y_pred[:, 1]), axis=-1))
    anchor_negative_distance = K.sqrt(K.sum(K.square(y_pred[:, 0] - y_pred[:, 2]), axis=-1))

    # Triplet loss calculation
    return K.mean(K.maximum(anchor_positive_distance - anchor_negative_distance + margin, 0))

# Example usage
num_users = 1000
num_items = 5000
embedding_dim = 64

# Create the model
model = create_recommender_model(num_users, num_items, embedding_dim)
model.compile(optimizer='adam', loss=triplet_loss)

# Sample data
user_ids = np.random.randint(0, num_users, size=(100,))
item_ids = np.random.randint(0, num_items, size=(100,))
positive_item_ids = np.random.randint(0, num_items, size=(100,))

# Train the model (the zero targets are dummies; the triplet loss ignores y_true)
model.fit([user_ids, item_ids, positive_item_ids], np.zeros((100, 1)), epochs=10)

Output:
Epoch 1/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - loss: 1.0344
Epoch 2/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.9939
Epoch 3/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.9668
Epoch 4/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.9392
Epoch 5/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.9301
Epoch 6/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.8931
Epoch 7/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.8598
Epoch 8/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.8409
Epoch 9/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.8212
Epoch 10/10
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 0.7808
PRACTICAL NO. 8

Aim: Fully convolutional neural network.

Introduction:
Fully Convolutional Neural Networks (FCNNs) are a type of deep learning architecture that have
gained significant attention due to their ability to process input images of arbitrary size and produce
output maps of corresponding size. Unlike traditional convolutional neural networks (CNNs) that
typically end with fully connected layers, FCNNs consist entirely of convolutional layers, making them
more flexible and efficient for tasks like semantic segmentation and object detection.
Key characteristics of FCNNs:
● No fully connected layers: FCNNs replace fully connected layers with convolutional layers, allowing them to process input images of any size without requiring fixed-size input.
● Upsampling layers: FCNNs often use upsampling layers (e.g., transposed convolutions) to increase the spatial resolution of the feature maps, enabling them to generate output maps that match the size of the input image (see the size check after this list).
● Dense prediction: FCNNs produce dense predictions, meaning they assign a class label or probability to each pixel in the input image.
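The output size of a transposed convolution is out = (in - 1) * stride - 2 * padding + kernel_size. A short PyTorch check, with hyperparameters chosen to match the 32x upsampling head used in the code below:

import torch
import torch.nn as nn

# out = (in - 1) * stride - 2 * padding + kernel_size
up = nn.ConvTranspose2d(21, 21, kernel_size=64, stride=32, padding=16, bias=False)
x = torch.randn(1, 21, 1, 1)   # a 1x1 per-class score map
print(up(x).shape)             # torch.Size([1, 21, 32, 32]): (1 - 1)*32 - 2*16 + 64 = 32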
Applications of FCNNs:
● Semantic segmentation: Assigning a semantic label (e.g., car, person, road) to each pixel in an image.
● Object detection: Identifying and localizing objects within an image.
● Image generation: Generating new images or modifying existing ones.
● Medical image analysis: Analyzing medical images for tasks like tumor detection and segmentation.
Advantages of FCNNs:
● Flexibility: FCNNs can handle input images of arbitrary size, making them suitable for a wide range of applications.
● Efficiency: FCNNs can be more efficient than traditional CNNs, especially for large input images.
● End-to-end learning: FCNNs can learn feature extraction, classification, and localization in a single end-to-end process.
Code:
pip install torch torchvision;
import torch
import torch.nn as nn
import torchvision.models as models

# Define the FCN (Fully Convolutional Network) model


class FCN(nn.Module):
def __init__(self, num_classes):
super(FCN, self).__init__()

# Load pre-trained VGG16 model


vgg = models.vgg16(pretrained=True)

# Extract features from VGG16, excluding the fully connected layers


self.features = vgg.features

# Define the classifier as a convolutional layer (instead of fully connected layers)


self.conv1 = nn.Conv2d(512, 4096, kernel_size=7)
self.relu = nn.ReLU(inplace=True)
self.drop = nn.Dropout2d()

self.conv2 = nn.Conv2d(4096, 4096, kernel_size=1)


self.conv3 = nn.Conv2d(4096, num_classes, kernel_size=1)

# Transpose convolution for upsampling (deconvolution)


self.upsample = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=64, stride=32,
padding=16, bias=False)
def forward(self, x):
# Pass the input through the feature extractor (VGG16)
x = self.features(x)

# Convolutional layers replacing the fully connected ones


x = self.relu(self.conv1(x))
x = self.drop(x)
x = self.relu(self.conv2(x))
x = self.drop(x)

# Final convolution to get the desired number of output classes


x = self.conv3(x)

# Upsample the output to match the input size


x = self.upsample(x)

return x
# Initialize the FCN model
num_classes = 21 # For example, 21 classes in Pascal VOC dataset
model = FCN(num_classes)

# Print the model architecture


print(model)
# Example input
input_tensor = torch.randn(1, 3, 224, 224) # Batch size of 1, 3 channels (RGB), 224x224 image
output = model(input_tensor)
print(f"Output shape: {output.shape}") # Should output [1, num_classes, 224, 224]
Output:
FCN(
(features): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace=True)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace=True)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace=True)
(16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU(inplace=True)
(19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace=True)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace=True)
(23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(25): ReLU(inplace=True)
(26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(27): ReLU(inplace=True)
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(inplace=True)
(30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(conv1): Conv2d(512, 4096, kernel_size=(7, 7), stride=(1, 1))
(relu): ReLU(inplace=True)
(drop): Dropout2d(p=0.5, inplace=False)
(conv2): Conv2d(4096, 4096, kernel_size=(1, 1), stride=(1, 1))
(conv3): Conv2d(4096, 21, kernel_size=(1, 1), stride=(1, 1))
(upsample): ConvTranspose2d(21, 21, kernel_size=(64, 64), stride=(32, 32), padding=(16, 16),
bias=False)
)

Output shape: torch.Size([1, 21, 32, 32])
