
Experiment 1

Aim
To fit the polynomial y₁(x) = w₀ + w₁x + w₂x² + w₃x³ + w₄x⁴ to y₂(x) = 5 sin(x)
in the range x ∈ (−π, +π), and find the coefficients w₀, w₁, w₂, w₃, and w₄.

Software Used
Jupyter Notebook

Theory
Curve fitting is a process used to find a mathematical function that best fits a
set of data points.
Importing Libraries: The code begins by importing necessary libraries such as
numpy for numerical operations, scipy.optimize.curve_fit for curve fitting, and
matplotlib.pyplot for data visualization.
Defining Functions: Two functions are defined: y2(X), representing the target
function, and polynomial(X, w0, w1, w2, w3, w4), representing a polynomial
with coefficients w₀, w₁, w₂, w₃, and w₄.
Generating Data: An array X_range is created to represent the range of input
values, and the target values y2_values are generated by evaluating y2 over X_range.
Curve Fitting: The curve_fit function is used to fit the polynomial function to
the data. The initial guess for the coefficients is provided as initial_guess.
Extracting Coefficients: The fitted coefficients (w₀, w₁, w₂, w₃, w₄) are extracted
from the results of curve fitting.
Generating Fitted Values: Using the fitted coefficients, y1_values are generated
using the polynomial function.
Printing Coefficients: The fitted coefficients are printed to the console for
analysis.
Plotting: The target function (y2_values) and the fitted polynomial function
(y1_values) are plotted against the input values (X_range). Labels for axes and a
title for the plot are added.
Display Plot: Finally, the plot is displayed using plt.show().
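In least-squares terms, curve_fit chooses the coefficients that minimize the sum of squared residuals S(w₀, …, w₄) = Σᵢ [ y₂(xᵢ) − (w₀ + w₁xᵢ + w₂xᵢ² + w₃xᵢ³ + w₄xᵢ⁴) ]². Since 5 sin(x) is an odd function and the sample points are symmetric about 0, the even coefficients w₀, w₂, and w₄ should come out essentially zero, leaving w₁ and w₃ to play the role of the odd-power terms (compare the Taylor expansion 5 sin(x) ≈ 5x − (5/6)x³; the least-squares values differ somewhat because the fit spans the whole interval rather than just the origin).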
Code:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Target function y2(X)
def y2(X):
    return 5 * np.sin(X)

# Polynomial y1(X)
def polynomial(X, w0, w1, w2, w3, w4):
    return w0 + w1 * X + w2 * X**2 + w3 * X**3 + w4 * X**4

# x in the range (-pi, pi)
X_range = np.linspace(-np.pi, np.pi, 100)
y2_values = y2(X_range)

# Curve fitting
initial_guess = [1, 1, 1, 1, 1]
fit_params, covariance = curve_fit(polynomial, X_range, y2_values, p0=initial_guess)

# Extract the fitted coefficients
w0, w1, w2, w3, w4 = fit_params

# Generate y1 values using the fitted coefficients
y1_values = polynomial(X_range, w0, w1, w2, w3, w4)

print("Fitted Coefficients:")
print("w0:", w0)
print("w1:", w1)
print("w2:", w2)
print("w3:", w3)
print("w4:", w4)

# Plot the results
plt.figure(figsize=(8, 6))
plt.plot(X_range, y2_values, label=r'Target Function: $5 \sin(X)$', linestyle='dashed')
plt.plot(X_range, y1_values, label=r'Fitted Polynomial: $w_0 + w_1 X + w_2 X^2 + w_3 X^3 + w_4 X^4$')
plt.title('Polynomial Curve Fitting')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
# Show the plot
plt.show()

Output

Result
The polynomial y₁(x) = w₀ + w₁x + w₂x² + w₃x³ + w₄x⁴ has been successfully
fitted to y₂(x) = 5 sin(x) within the range x ∈ (−π, +π) through least-squares
regression. The coefficients w₀, w₁, w₂, w₃, and w₄ have been determined so that
the polynomial closely approximates the target function.
Experiment 2
Aim
To use linear regression analysis to predict the price of a car that is 12 years
old from the following data:
10 years: Rs. 30500
5 years: Rs. 58000
20 years: Rs. 14900
15 years: Rs. 20400
8 years: Rs. 37000
Plot the data points, the fitted line, and the predicted output.

Software Used
Jupyter Notebook

Theory
Linear regression is a statistical method used to model the relationship
between a dependent variable and one or more independent variables by
fitting a linear equation to observed data.
Importing Libraries: The code begins by importing necessary libraries such as
numpy for numerical operations, matplotlib.pyplot for data visualization, and
LinearRegression from sklearn.linear_model for building the regression model.
Data Preparation: The ages of the used cars and their corresponding prices
are represented as numpy arrays, and the ages are reshaped into a column
vector for compatibility with the linear regression model.
Model Initialization: A linear regression model object (model) is created using
the LinearRegression() constructor.
Model Fitting: The model is trained on the provided data using the fit()
method, where it learns the relationship between the age of the car and its
price.
Prediction: Using the trained model, the code predicts the price of a car that is
12 years old by calling the predict() method with the input [[12]].
Data Visualization: The data points are plotted on a scatter plot (plt.scatter)
with years on the x-axis and prices on the y-axis. Additionally, a line
representing the fitted regression model (plt.plot) is overlaid on the scatter
plot.
Result Highlight: The predicted price for a 12-year-old car is highlighted on the
plot with a green cross (plt.scatter) for visual clarity.
Plot Customization: Labels for the x-axis, y-axis, and title are added (plt.xlabel,
plt.ylabel, plt.title), and a legend is included to identify the data points, fitted
line, and predicted price.
Display Plot: Finally, the plot is displayed using plt.show().
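As a sanity check, the ordinary least-squares line can be worked out by hand. With mean age x̄ = 11.6 years and mean price ȳ = Rs. 32160, the slope is b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = −370280 / 141.2 ≈ −2622.4 (the price drops by about Rs. 2622 per year), and the intercept is a = ȳ − b·x̄ ≈ 62579.6. The predicted price at 12 years is therefore a + 12b ≈ Rs. 31111; LinearRegression in the code below should reproduce these values.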

Code:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Data
years = np.array([10, 5, 20, 15, 8])
prices = np.array([30500, 58000, 14900, 20400, 37000])

# Reshape the ages into the (n_samples, n_features) shape sklearn expects
years_reshaped = years.reshape(-1, 1)

# Create a linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(years_reshaped, prices)

# Predict the price for a car that is 12 years old
predicted_price = model.predict([[12]])
print(f"Predicted price for a 12-year-old car: Rs. {predicted_price[0]:.0f}")

# Plot the data points
plt.scatter(years, prices, color='blue', label='Data Points')

# Plot the fitted line
plt.plot(years, model.predict(years_reshaped), color='red', linewidth=2, label='Fitted Line')

# Highlight the predicted price for a 12-year-old car
plt.scatter([12], predicted_price, color='green', marker='x', s=100, label='Predicted Price (12 years)')

# Add labels and title
plt.xlabel('Years')
plt.ylabel('Price (Rs)')
plt.title('Linear Regression - Car Prices')
plt.legend()

# Show the plot
plt.show()
Output:

Result
Using linear regression analysis, the price of a 12-year-old car is predicted and
shown on the plot together with the data points and the fitted line.
Experiment 3
Aim
To design a decision tree classifier model for the Iris plants dataset using the
sklearn library. Print the training data and the decision tree model, and
demonstrate prediction using the model.

Software Used
Jupyter Notebook

Theory
Decision trees are a type of supervised machine learning algorithm used for
classification and regression tasks. They work by recursively partitioning the
data based on the features to create a tree-like structure of decisions.
Import Libraries: The code starts by importing necessary libraries from
scikit-learn: `load_iris` for loading the Iris dataset, `train_test_split` for splitting
the data into training and testing sets, `DecisionTreeClassifier` for creating a
decision tree model, and `export_text` for visualizing the decision tree rules.
Load and Split the Data: The Iris dataset is loaded using `load_iris()`. It consists
of features (X) representing sepal and petal measurements and target labels (y)
indicating the species of iris flowers. The dataset is then split into training and
testing sets using `train_test_split`.
Create and Train Decision Tree Classifier: A decision tree classifier
(`DecisionTreeClassifier`) is created without specifying any hyperparameters.
The model is trained using the training data (`X_train` and `y_train`) with the
`fit` method.
Print Training Data Information: The code prints information about the training
data, specifically the shapes of `X_train` and `y_train`. This step provides insight
into the dimensions of the training data.
Print Decision Tree Model: The decision tree model is visualized using the
`export_text` function, which generates a textual representation of the decision
tree rules. The printed rules include conditions based on feature values that
lead to different decision outcomes.
Demonstrate Prediction: A sample data point from the testing set (`X_test[0]`)
is reshaped to match the expected input format of the model. The model then
predicts the class for this sample using the `predict` method.
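By default, DecisionTreeClassifier chooses each split to minimize the Gini impurity of the resulting child nodes, G = 1 − Σₖ pₖ², where pₖ is the fraction of samples of class k at a node; a pure node has G = 0. For example, a node holding equal numbers of all three iris species has G = 1 − 3·(1/3)² = 2/3.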

Code:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Load the Iris dataset and split it into training and testing sets
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Create a decision tree classifier
clf = DecisionTreeClassifier()

# Train the model
clf.fit(X_train, y_train)

# Print the training data
print("Training Data:")
print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)

# Print the decision tree model
tree_rules = export_text(clf, feature_names=iris.feature_names)
print("\nDecision Tree Model:")
print(tree_rules)

# Demonstrate prediction using the model
sample_data = X_test[0].reshape(1, -1)
predicted_class = clf.predict(sample_data)[0]
print("\nSample Data for Prediction:")
print("Features:", sample_data)
print("Predicted Class:", predicted_class)
Output:

Result
The decision tree classifier model trained on the Iris plants dataset successfully
predicts the class labels based on the features provided.
Experiment 4
Aim
To write an ML program using the k-nearest neighbour algorithm to train on the
Iris plant dataset, and find the accuracy, precision, and recall of the model.

Software Used
Jupyter Notebook

Theory
k-Nearest Neighbors (k-NN) is a simple and versatile machine learning
algorithm used for both classification and regression tasks. It belongs to the
category of instance-based learning or lazy learning, where the algorithm
makes predictions based on the majority class (for classification) or the average
value (for regression) of the k-nearest data points in the feature space.
Data Preparation: Loads the Iris dataset using scikit-learn. Selects the first two
features (sepal length and sepal width) and their corresponding target labels.
Train-Test Split: Splits the dataset into training and testing sets, with 80% of the
data used for training and 20% for testing.
k-NN Classifier Training: Initializes a k-NN classifier with `n_neighbors=3`. Fits
the classifier to the training data (`X_train`, `y_train`).
Model Evaluation: Uses the trained k-NN classifier to predict labels for the test
set (`X_test`). Calculates and prints accuracy, precision, and recall scores to
evaluate the performance of the classifier.
Decision Boundaries Visualization: Creates a meshgrid of points covering the
feature space defined by sepal length and sepal width. Predicts the class labels
for each point in the meshgrid using the trained k-NN classifier. Visualizes the
decision boundaries by plotting the meshgrid with colored regions representing
different classes.
Scatter Plot of Data Points: Plots the training data points with markers and
colors corresponding to their classes. Additionally, plots the testing data points
using 'x' markers for better differentiation.
Plot Customization: Sets axis labels, a title, and a legend to enhance the
interpretability of the visualization.
Display the Visualization: Displays the final plot showing the decision
boundaries of the k-NN classifier on the Iris dataset.
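The distances underlying k-NN are Euclidean by default, d(a, b) = √[ Σᵢ (aᵢ − bᵢ)² ]. The reported metrics are accuracy = (correct predictions)/(total predictions), precision = TP/(TP + FP), and recall = TP/(TP + FN), computed per class and combined with average='weighted', i.e. weighted by the number of true instances of each class.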

Code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report

# Use the first two features (sepal length and sepal width)
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a k-NN classifier with k = 3
knn_classifier = KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(X_train, y_train)
y_pred = knn_classifier.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Build a meshgrid over the feature space for the decision boundaries
h = .02  # step size of the mesh
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Predict the class for every point in the mesh
Z = knn_classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot the decision regions, training points (dots), and test points (crosses)
plt.figure(figsize=(8, 6))
plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
scatter = plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cmap_bold, edgecolors='k', s=100)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cmap_bold, marker='x', s=200, linewidths=2)
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.title('Decision Boundaries of k-NN Classifier (Iris Dataset)')
plt.legend(*scatter.legend_elements(), title='Classes')
plt.show()
Output:

Result
The K Nearest Neighbor algorithm trained on the Iris plant dataset achieves
high accuracy, precision, and recall, indicating its effectiveness in classifying iris
plant species based on given features.
Experiment 5
Aim
K-means clustering: Using the K-means clustering algorithm, group the following
dataset into K = 3 clusters. Dataset: (2,10), (2,5), (8,4), (5,8), (7,5), (6,4), (1,2), and
(4,9).
1. Show the 3 clusters on a graph.
2. Illustrate the step-by-step formation of the clusters from the beginning by
creating an animation video.

Software Used
Jupyter Notebook

Theory
K-means Clustering: It’s an unsupervised machine learning algorithm
used to group unlabeled datasets into different clusters. The ‘K’ in
K-means represents the number of clusters you want to divide your data
into.
Objective: The main goal of K-means clustering is to partition the data
points into ‘K’ clusters such that the points within each cluster are as
similar as possible while being dissimilar from the points in other
clusters. This similarity is typically based on the Euclidean distance
between data points.

How it Works:
Initialization: K-means starts by initializing ‘K’ centroids randomly. These
centroids are the initial guesses for the locations of the cluster centers.
Assignment: Each data point is assigned to the nearest centroid, forming
‘K’ clusters.
Update: The centroids are recalculated as the mean of all points
assigned to that cluster.
Iteration: The assignment and update steps are repeated until the
centroids no longer change significantly, indicating that the clusters have
stabilized, and the algorithm has converged.
Centroids: These are the means of the points assigned to each cluster and
serve as the cluster ‘centers’.
Euclidean Distance: It’s the measure used to calculate the similarity
between data points, determining how they are grouped.
Convergence: The process is repeated for a set number of iterations or
until the centroids’ positions stabilize, meaning there’s no further
significant change in their location.
Result: The final output is a set of clusters with minimized intra-cluster
distances (variance) and maximized inter-cluster distances.
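Formally, K-means minimizes the within-cluster sum of squares J = Σⱼ₌₁ᴷ Σ_{x∈Cⱼ} ‖x − μⱼ‖², where μⱼ is the centroid of cluster Cⱼ. Each assignment step and each update step can only decrease J, which is why the iteration converges.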

Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Dataset
data = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]])

# Number of clusters
k = 3

# K-means clustering (an explicit n_init avoids a warning in newer scikit-learn)
kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
kmeans.fit(data)

# Assign clusters and get centroids
labels = kmeans.labels_
centroids = kmeans.cluster_centers_

# Visualize the clusters
plt.scatter(data[:, 0], data[:, 1], c=labels, cmap='viridis', edgecolors='k')

# Plot centroids
plt.scatter(centroids[:, 0], centroids[:, 1], marker='X', s=200, c='red', label='Centroids')

# Customize plot
plt.title('K-means Clustering')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()
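
The code above covers part 1 of the aim. For part 2, the sketch below animates the step-by-step formation of the clusters. It is a minimal sketch assuming matplotlib's FuncAnimation, a hand-rolled K-means loop, and the first k data points as the initial centroids (scikit-learn's KMeans uses k-means++ initialization and does not expose its intermediate steps):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

data = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]])
k = 3
centroids = data[:k].astype(float)  # initial guesses (assumption: first k points)

# Precompute the centroid positions and assignments at each iteration
history = []
for _ in range(6):  # a few iterations suffice for this small dataset
    # assign each point to its nearest centroid
    labels = np.argmin(np.linalg.norm(data[:, None] - centroids[None], axis=2), axis=1)
    history.append((centroids.copy(), labels.copy()))
    # recompute each centroid as the mean of its assigned points
    new_centroids = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

fig, ax = plt.subplots()

def draw(frame):
    ax.clear()
    c, lab = history[frame]
    ax.scatter(data[:, 0], data[:, 1], c=lab, cmap='viridis', edgecolors='k')
    ax.scatter(c[:, 0], c[:, 1], marker='X', s=200, c='red')
    ax.set_title(f'K-means iteration {frame}')

anim = FuncAnimation(fig, draw, frames=len(history), interval=1000)
anim.save('kmeans_steps.gif', writer='pillow')  # requires the pillow package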

Output:

Result
The K-means clustering algorithm successfully grouped the given dataset into
three distinct clusters, visualized on a graph together with their centroids, and
the step-by-step formation of the clusters is illustrated by the animation sketch.
Experiment 6
Aim
To implement the perceptron training algorithm for two features. Consider the
sepal length and petal length of the two classes of flowers, setosa and versicolor,
from the Iris dataset, and classify them using the perceptron. Show the input
scatter plot and the output decision surface for the classification, using 50
training data points for each class.

Software Used
Jupyter Notebook

Theory
The perceptron training algorithm is a supervised learning algorithm used for
binary classification tasks. It adjusts the weights of the input features iteratively
to minimize misclassifications. During each iteration, it updates the weights
based on the classification error, aiming to find a decision boundary that
separates the two classes. The process continues until convergence or a
predefined number of iterations is reached.
Loading the Iris dataset: The code starts by importing necessary libraries and
loading the Iris dataset using scikit-learn's load_iris function. We extract the
features (sepal length and petal length) and the target labels (species).
Selecting training data: We randomly select 50 data points for each of the two
classes (setosa and versicolor) from the dataset to use as training data.
Initializing weights and bias: We initialize the weights and bias for the
perceptron algorithm. These parameters will be updated during training to
classify the data.
Perceptron training algorithm: The perceptron training algorithm is
implemented next. This algorithm iteratively updates the weights and bias
based on misclassified points. It calculates the prediction for each data point
using the current weights and bias, compares it to the true label, and updates
the parameters accordingly.
Visualizing the input scatter plot: We create a scatter plot of the input data,
where the x-axis represents sepal length and the y-axis represents petal length.
Data points belonging to setosa and versicolor classes are plotted in different
colors.
Visualizing the decision surface: We plot the decision boundary learned by the
perceptron algorithm. This boundary separates the setosa and versicolor
classes in the feature space. It is represented by a line in the scatter plot.
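For each training sample (x, y) with label y ∈ {−1, +1}, the perceptron computes the prediction ŷ = sign(wᵀx + b) and applies the update w ← w + η(y − ŷ)x, b ← b + η(y − ŷ), where η is the learning rate. A correctly classified sample gives y − ŷ = 0 and changes nothing; a misclassification nudges the decision boundary toward the sample. Because setosa and versicolor are linearly separable in these two features, the algorithm converges to zero training errors.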

Code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.datasets import load_iris

# Load Iris dataset
iris = load_iris()
X = iris.data[:100, [0, 2]]  # sepal length and petal length
y = iris.target[:100]
y = np.where(y == 0, -1, 1)  # Convert labels to -1 (setosa) and 1 (versicolor)

# Perceptron training algorithm
class Perceptron(object):
    def __init__(self, eta=0.01, epochs=50):
        self.eta = eta          # learning rate
        self.epochs = epochs    # number of passes over the training set

    def train(self, X, y):
        self.w_ = np.zeros(1 + X.shape[1])  # weights; w_[0] is the bias
        self.errors_ = []
        for _ in range(self.epochs):
            errors = 0
            for xi, target in zip(X, y):
                update = self.eta * (target - self.predict(xi))
                self.w_[1:] += update * xi
                self.w_[0] += update
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self

    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def predict(self, X):
        return np.where(self.net_input(X) >= 0.0, 1, -1)

# Create a perceptron instance and train it
ppn = Perceptron(eta=0.1, epochs=10)
ppn.train(X, y)

# Plot the input scatter plot
plt.scatter(X[:50, 0], X[:50, 1], color='red', marker='o', label='setosa')
plt.scatter(X[50:100, 0], X[50:100, 1], color='blue', marker='x', label='versicolor')
plt.xlabel('Sepal length')
plt.ylabel('Petal length')
plt.legend(loc='upper left')
plt.show()

# Plot the output decision surface
def plot_decision_regions(X, y, classifier, resolution=0.02):
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    # plot the decision surface
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    # plot class samples
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                    alpha=0.8, color=cmap(idx),
                    marker=markers[idx], label=cl)

plot_decision_regions(X, y, classifier=ppn)
plt.xlabel('Sepal length [cm]')
plt.ylabel('Petal length [cm]')
plt.legend(loc='upper left')
plt.show()
Output:

Result
The perceptron training algorithm successfully classifies the Iris flower dataset
based on sepal length and petal length into two classes, Setosa and Versicolor,
visualized through an input scatter plot and a decision surface showing the
classification boundaries.
Experiment 7
Aim
To use Q-learning to navigate through a 2x2 maze.

Software Used
Jupyter Notebook

Theory
Importing Libraries: The code starts by importing the `numpy` library, which is
commonly used for numerical computations in Python.
Problem Definition:
● num_boxes: This variable defines the number of boxes in the
environment.
● reward_target_box: It specifies the reward obtained when reaching the
target box.
Q-Table Initialization: A Q-table `Q` is created as a NumPy array of zeros with
dimensions `(num_boxes, num_boxes)`. Each row corresponds to a state, and
each column corresponds to an action. Q-values represent the expected
cumulative rewards for taking a particular action in a particular state.
Learning Parameters:
● learning_rate: This parameter controls the magnitude of Q-value updates
and influences the agent's rate of learning.
● discount_factor: It determines the importance of future rewards
compared to immediate rewards. A discount factor closer to 1 values
long-term rewards more highly.
● num_episodes: The total number of episodes for training.
● max_steps_per_episode: The maximum number of steps the agent can
take in each episode before termination.
Action Selection: The `choose_action` function selects an action for a given
state based on an epsilon-greedy strategy. With probability `epsilon`, a random
action is chosen to explore the environment, and with probability `1 - epsilon`,
the action with the highest Q-value for the current state is selected to exploit
learned knowledge.
Q-Value Update: The `update_q_value` function updates Q-values based on
the observed transition from the current state to the next state and the reward
received. It employs the Q-learning update rule to adjust the Q-value for the
`(state, action)` pair towards the estimated optimal value, considering both the
immediate reward and the discounted future rewards.
Q-Learning Algorithm Execution: The main loop runs for `num_episodes`,
iterating through episodes of interaction between the agent and the
environment. Within each episode, the agent selects actions based on the
epsilon-greedy strategy and updates Q-values accordingly. The episode
terminates when the target box is reached or when the maximum number of
steps per episode is exceeded.
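The rule applied by update_q_value is the standard Q-learning update Q(s, a) ← Q(s, a) + α [ r + γ · maxₐ′ Q(s′, a′) − Q(s, a) ], where α is the learning rate, γ the discount factor, r the immediate reward, and s′ the next state. The bracketed term is the temporal-difference error: how much better or worse the observed transition was than the current estimate.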

Code:
import numpy as np

# Define the number of boxes
num_boxes = 6

# Define the reward for the target box
reward_target_box = 100

# Define the Q-table
Q = np.zeros((num_boxes, num_boxes))

# Define the learning parameters
learning_rate = 0.8
discount_factor = 0.9
num_episodes = 1000
max_steps_per_episode = 100

# Define the function to choose an action (epsilon-greedy)
def choose_action(state, epsilon):
    if np.random.random() < epsilon:
        return np.random.choice(num_boxes)  # explore
    else:
        return np.argmax(Q[state, :])       # exploit

# Define the function to update Q-values
def update_q_value(state, action, reward, next_state):
    best_next_action = np.argmax(Q[next_state, :])
    Q[state, action] = Q[state, action] + learning_rate * (
        reward + discount_factor * Q[next_state, best_next_action] - Q[state, action])

# Run the Q-learning algorithm
for episode in range(num_episodes):
    current_state = np.random.randint(0, num_boxes)
    epsilon = 0.5 / (episode + 1)  # Exploration-exploitation trade-off
    for step in range(max_steps_per_episode):
        action = choose_action(current_state, epsilon)
        reward = reward_target_box if action == num_boxes - 1 else 0
        next_state = action
        update_q_value(current_state, action, reward, next_state)
        current_state = next_state
        if action == num_boxes - 1:  # If the target box is reached
            break

# Print the learned Q-values
print("Learned Q-values:")
print(Q)
Output:

Result
The learned Q-values are printed, providing insight into the agent's learned
policy for selecting actions in different states.
