R18 B ML LAB Manual - Minor Degree

EXPERIMENT NO:1
The probability that it is Friday and that a student is absent is 3%. Since there are 5 school
days in a week, the probability that it is Friday is 20%. What is the probability that a student
is absent given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans: 15%)
Aim:
To find the probability that a student is absent given that today is Friday, from the given
data, using Bayes' rule in Python.
Theory:
Let A be the event "student is absent" and B the event "today is Friday". The given data are:
P(B) = P(today is Friday) = 0.2
P(A∩B) = P(it is Friday and student is absent) = 0.03
It is required to find P(A|B) = P(student is absent | today is Friday).
The conditional probability of event A, given that event B has occurred, is:
P(A|B) = P(A∩B) / P(B)
Thus the required probability is:
P(student is absent | today is Friday) = P(it is Friday ∩ student is absent) / P(today is Friday)
= 0.03 / 0.2 = 0.15
The answer is 0.15, i.e. 15%.
PROCEDURE / PROGRAMME
# calculate P(A|B) given P(A and B) and P(B)
def bayes_theorem(p_a_b, p_b):
    # P(A|B) = P(A and B) / P(B)
    p_a_given_b = p_a_b / p_b
    return p_a_given_b

# P(A and B): it is Friday and the student is absent
p_a_b = 0.03
# P(B): it is Friday
p_b = 0.20
# calculate P(A|B)
result = bayes_theorem(p_a_b, p_b)
# summarize as a percentage
print('P(A|B) = %.f%%' % (result * 100))
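The same calculation can be repeated with exact fractions to confirm that 0.03 / 0.2 is exactly 3/20 and not a floating-point approximation. This is only a sketch: the function name conditional_probability and its input checks are illustrative additions, not part of the manual's program.

```python
from fractions import Fraction

def conditional_probability(p_a_and_b, p_b):
    """Return P(A|B) = P(A and B) / P(B), rejecting impossible inputs."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be in (0, 1]")
    if not 0 <= p_a_and_b <= p_b:
        raise ValueError("P(A and B) must be in [0, P(B)]")
    return p_a_and_b / p_b

# 3% = 3/100 and 20% = 1/5, taken from the problem statement
p_absent_given_friday = conditional_probability(Fraction(3, 100), Fraction(1, 5))
print(p_absent_given_friday)         # 3/20
print(float(p_absent_given_friday))  # 0.15
```

The input checks reflect the fact that P(A and B) can never exceed P(B); a value outside that range would indicate a data-entry error.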

EXPERIMENT NO:2
Extract data from a database using Python
Aim:
To extract data from a database using Python
Theory:
1. Connect to MySQL from Python
Connect to the MySQL database from Python using the MySQL Connector module
(mysql.connector).
2. Define a SQL SELECT query
Next, prepare a SQL SELECT query to fetch rows from a table. You can select all rows or a
limited set, depending on your requirement. If a WHERE condition is used, it determines
which rows are fetched.
For example, SELECT col1, col2, …, colN FROM MySQL_table WHERE id = 10; returns the
row whose id is 10.
3. Get a cursor object from the connection
Next, use the connection.cursor() method to create a cursor object. This method creates a
new MySQLCursor object.
4. Execute the SELECT query using the execute() method
Execute the SELECT query using the cursor.execute() method.
5. Extract all rows from the result
After successfully executing the SELECT operation, use the fetchall() method of the cursor
object to get all rows from the query result. It returns a list of rows.
6. Iterate over the rows
Iterate over the row list using a for loop and access each row individually. (Access each row's
column data using a column name or index number.)
7. Close the cursor object and the database connection object
Use the cursor.close() and connection.close() methods to close open connections after your
work completes.
PROCEDURE / PROGRAMME
import mysql.connector
from mysql.connector import Error

connection = None
cursor = None
try:
    connection = mysql.connector.connect(host='localhost', database='employeeDB',
                                         charset='utf8', user='root', password='root')
    print("connected")
    sql_select_Query = "SELECT * FROM employee"
    cursor = connection.cursor()
    cursor.execute(sql_select_Query)
    records = cursor.fetchall()
    print("Total number of rows in employee is: ", cursor.rowcount)
    print("\nPrinting each employee record")
    for row in records:
        print("Id = ", row[0], "\n")
        print("Name = ", row[1], "\n")
        print("Address = ", row[2])
        print("Join date = ", row[3], "\n")
except Error as e:
    print("Error reading data from MySQL table", e)
finally:
    # close the cursor before the connection, even if the query failed
    if cursor is not None:
        cursor.close()
    if connection is not None and connection.is_connected():
        connection.close()
        print("MySQL connection is closed")
Python program to insert values
import mysql.connector
from mysql.connector import Error

mydb = None
try:
    mydb = mysql.connector.connect(host='localhost', database='employeeDB',
                                   charset='utf8', user='root', password='root')
    mycursor = mydb.cursor()
    sql = "INSERT INTO employee (id, Name, empaddress, edoj) VALUES (%s, %s, %s, %s)"
    val = [
        (2111, 'rubesh', 'Lowstreet 4', '2019-09-12'),
        (2121, 'siva', 'Apple st 652', '2019-09-12'),
    ]
    mycursor.executemany(sql, val)
    mydb.commit()
    print(mycursor.rowcount, "was inserted.")
except Error as e:
    print("Error inserting data into MySQL table", e)
finally:
    if mydb is not None and mydb.is_connected():
        mydb.close()
        print("MySQL connection is closed")
Output
connected
Total number of rows in employee is: 4
Printing each employee record
Id = 111
Name = siva
Address = madurai
Join date = 2015-12-17
Id = 112
Name = Ram
Address = Theni
Join date = 2016-12-18
Id = 2111
Name = rubesh
Address = Lowstreet 4
Join date = 2019-09-12
Id = 2121
Name = siva
Address = Apple st 652
Join date = 2019-09-12
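The seven steps in the Theory section can be tried without a running MySQL server. The sketch below swaps in Python's built-in sqlite3 module, which follows the same connect / cursor / execute / fetchall / close pattern. Note the assumptions: the in-memory database and the table definition are made up for illustration, and sqlite3 uses ? as its placeholder where MySQL Connector uses %s.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the MySQL server
connection = sqlite3.connect(":memory:")
try:
    cursor = connection.cursor()
    cursor.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, Name TEXT, empaddress TEXT, edoj TEXT)"
    )
    # Same two rows as the insert program above
    rows = [
        (2111, "rubesh", "Lowstreet 4", "2019-09-12"),
        (2121, "siva", "Apple st 652", "2019-09-12"),
    ]
    cursor.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", rows)
    connection.commit()

    # Step 2: parameterised SELECT with a WHERE condition; step 5: fetchall
    cursor.execute("SELECT id, Name, empaddress, edoj FROM employee WHERE id = ?", (2121,))
    records = cursor.fetchall()
    # Step 6: iterate over the rows and access columns by index
    for row in records:
        print("Id =", row[0], "Name =", row[1], "Address =", row[2], "Join date =", row[3])
    cursor.close()
finally:
    # Step 7: close the connection even if a statement failed
    connection.close()
```

Passing the id as a parameter rather than formatting it into the SQL string lets the driver handle quoting, which also guards against SQL injection.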

EXPERIMENT NO:3
Implement k-nearest neighbours classification using Python
Aim:
To implement k-nearest neighbours classification using Python
Theory:
• K-Nearest Neighbors (KNN) is one of the most basic yet essential classification algorithms in
Machine Learning. It belongs to the supervised learning domain and is widely used in
pattern recognition, data mining and intrusion detection.
• It is widely applicable in real-life scenarios since it is non-parametric, meaning that it does not
make any underlying assumptions about the distribution of the data.
• Algorithm
Input: Let m be the number of training data samples. Let p be an unknown point.
Method:
1. Store the training samples in an array of data points arr[]. This means each element
of this array represents a tuple (x, y).
2. for i = 0 to m-1
Calculate the Euclidean distance d(arr[i], p).
3. Make a set S of the K smallest distances obtained. Each of these distances corresponds to
an already classified data point.
4. Return the majority label among S.
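The four steps of the algorithm can be written out directly with NumPy. This is a minimal sketch, not the scikit-learn implementation used in the programme: the function name knn_predict and the six toy points are hypothetical and chosen only so that the expected labels are obvious.

```python
import numpy as np
from collections import Counter

def knn_predict(train_points, train_labels, p, k):
    """Classify point p by majority vote among its k nearest training points."""
    train_points = np.asarray(train_points, dtype=float)
    # Step 2: Euclidean distance from p to every training sample
    distances = np.sqrt(((train_points - np.asarray(p, dtype=float)) ** 2).sum(axis=1))
    # Step 3: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 4: majority label among those k neighbours
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups in the plane
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0), k=3))  # a
print(knn_predict(points, labels, (5.1, 5.0), k=3))  # b
```

Every prediction scans all m training points, which is why KNN needs no training phase but becomes slow on large datasets.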
PROCEDURE / PROGRAMME :
# import the required packages
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets

# Load dataset
iris = datasets.load_iris()
print("Iris Data set loaded...")
# Split the data into train and test samples (10% held out for testing)
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.1)
print("Dataset is split into training and testing...")
print("Size of training data and its label", x_train.shape, y_train.shape)
print("Size of testing data and its label", x_test.shape, y_test.shape)
# Print label numbers and their names
for i in range(len(iris.target_names)):
    print("Label", i, "-", str(iris.target_names[i]))
# Create object of KNN classifier
classifier = KNeighborsClassifier(n_neighbors=1)
# Perform training
classifier.fit(x_train, y_train)
# Perform testing
y_pred = classifier.predict(x_test)
# Display the results
print("Results of Classification using K-nn with K=1 ")
for r in range(0, len(x_test)):
    print(" Sample:", str(x_test[r]), " Actual-label:", str(y_test[r]),
          " Predicted-label:", str(y_pred[r]))
print("Classification Accuracy :", classifier.score(x_test, y_test))
# Optional: uncomment for a confusion matrix and per-class metrics
#from sklearn.metrics import classification_report, confusion_matrix
#print('Confusion Matrix')
#print(confusion_matrix(y_test,y_pred))
#print('Accuracy Metrics')
#print(classification_report(y_test,y_pred))
Output
Result-1
Iris Data set loaded...
Dataset is split into training and testing samples...
Size of training data and its label (135, 4) (135,)
Size of testing data and its label (15, 4) (15,)
Label 0 - setosa
Label 1 - versicolor
Label 2 - virginica
Results of Classification using K-nn with K=1
Sample: [4.4 3. 1.3 0.2] Actual-label: 0 Predicted-label: 0
Sample: [5.1 2.5 3. 1.1] Actual-label: 1 Predicted-label: 1
Sample: [6.1 2.8 4. 1.3] Actual-label: 1 Predicted-label: 1
Sample: [6. 2.7 5.1 1.6] Actual-label: 1 Predicted-label: 2
Sample: [6.7 2.5 5.8 1.8] Actual-label: 2 Predicted-label: 2
Classification Accuracy : 0.93
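To show what the accuracy figure and a confusion matrix measure, the sketch below recomputes both by hand from the five sample rows listed in Result-1. Since the full test set had 15 samples, the accuracy here covers only the listed rows and is not expected to equal 0.93. The variable names are illustrative.

```python
import numpy as np

# Labels copied from the five sample rows printed in Result-1
y_true = np.array([0, 1, 1, 1, 2])
y_hat = np.array([0, 1, 1, 2, 2])

# Accuracy = fraction of samples whose predicted label equals the actual label
accuracy = np.mean(y_true == y_hat)

# Confusion matrix: rows are actual labels, columns are predicted labels
n_classes = 3
confusion = np.zeros((n_classes, n_classes), dtype=int)
for actual, predicted in zip(y_true, y_hat):
    confusion[actual, predicted] += 1

print(accuracy)
print(confusion)
```

The single off-diagonal entry in the matrix is the versicolor sample (label 1) that was predicted as virginica (label 2).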
EXPERIMENT NO:4
Given the following data, which specify classifications for nine combinations of VAR1 and VAR2,
predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the result of
k-means clustering with 3 means (i.e., 3 centroids).
VAR1 VAR2 CLASS
1.713 1.586 0
0.180 1.786 1
0.353 1.240 1
0.940 1.566 0
1.486 0.759 1
1.266 1.106 0
1.540 0.419 1
0.459 1.799 1
0.773 0.186 1
Aim:
To predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the result of
kmeans clustering with 3 means and given data.
Theory:
Step 1: The Python 3 program in Step 6 demonstrates a simple K-Means clustering that
automatically divides the input data into groups based on the given features.
Step 2: A comma-separated (CSV) file is loaded first; it contains the three input columns
VAR1, VAR2 and cLASS.
Step 3: A K-Means clustering model is created from this input data. Afterwards, new data can
be classified using the predict() method based on the learned model.
Step 4: The scikit-learn and pandas libraries must be installed (pip install scikit-learn, pip
install pandas).
Step 5: Create the input file input_data.txt with the following contents:
VAR1,VAR2,cLASS
1.713,1.586,0
0.180,1.786,1
0.353,1.240,1
0.940,1.566,0
1.486,0.759,1
1.266,1.106,0
1.540,0.419,1
0.459,1.799,1
0.773,0.186,1
Step 6: Run the program
from sklearn.cluster import KMeans
import pandas as pd

# read csv input file
input_data = pd.read_csv("input_data.txt", sep=",")
print(input_data.to_string())
# initialize KMeans object specifying the number of desired clusters
kmeans = KMeans(n_clusters=3)
# learn the clustering from the input data
# (note: all three columns, including cLASS, are used as features here)
kmeans.fit(input_data.values)
# output the cluster labels for the input data
print(kmeans.labels_)
# predict the cluster for the given data sample; the third value (1) is an
# assumed cLASS value, needed because the model was fitted on three columns
predicted_class = kmeans.predict([[0.906, 0.606, 1]])
print(predicted_class)
Output
VAR1 VAR2 cLASS
0 1.713 1.586 0
1 0.180 1.786 1
2 0.353 1.240 1
3 0.940 1.566 0
4 1.486 0.759 1
5 1.266 1.106 0
6 1.540 0.419 1
7 0.459 1.799 1
8 0.773 0.186 1
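[1 0 0 1 2 1 2 0 2]
[2]
The cluster numbers are arbitrary and can differ between runs, because KMeans starts from
random centroids. In this run the new sample falls in cluster 2, the same cluster as rows 4, 6
and 8, all of which have CLASS 1, so the predicted classification is 1.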
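An alternative reading of the task is to cluster on VAR1 and VAR2 only, and then give the new case the majority CLASS of the cluster it falls into. The sketch below follows that reading. It is a variation, not the manual's program: the data are typed in directly instead of read from input_data.txt, and n_init=10 and random_state=0 are assumed settings chosen for repeatability.

```python
import numpy as np
from sklearn.cluster import KMeans

# The nine (VAR1, VAR2) points and their CLASS labels from the table
features = np.array([
    [1.713, 1.586], [0.180, 1.786], [0.353, 1.240],
    [0.940, 1.566], [1.486, 0.759], [1.266, 1.106],
    [1.540, 0.419], [0.459, 1.799], [0.773, 0.186],
])
classes = np.array([0, 1, 1, 0, 1, 0, 1, 1, 1])

# Cluster on the two features only; random_state is fixed for repeatability
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)

# Find which cluster the new case falls into
new_case = np.array([[0.906, 0.606]])
cluster = kmeans.predict(new_case)[0]

# Assign the majority CLASS of the training points in that cluster
members = classes[kmeans.labels_ == cluster]
predicted_class = np.bincount(members).argmax()
print("cluster:", cluster, "predicted CLASS:", predicted_class)
```

Leaving CLASS out of the features avoids having to guess a class value for the new case before predicting it.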
EXPERIMENT NO:5
The following training examples map descriptions of individuals onto high, medium and low
credit-worthiness.
medium skiing design single twenties no -> highRisk
high golf trading married forties yes -> lowRisk
low speedway transport married thirties yes -> medRisk
medium football banking single thirties yes -> lowRisk
high flying media married fifties yes -> highRisk
low football security single twenties no -> medRisk
medium golf media single thirties yes -> medRisk
medium golf transport married forties yes -> lowRisk
high skiing banking single thirties yes -> highRisk
low golf unemployed married forties yes -> highRisk
Input attributes are (from left to right) income, recreation, job, status, age-group, home-owner.
Find the unconditional probability of `golf' and the conditional probability of `single'
given `medRisk' in the dataset.
Aim:
To find the unconditional probability of `golf' and the conditional probability of `single' given
`medRisk' in the dataset.
Theory:
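Calculations of parts (10 records in total):
Let A be the event that risk is `medRisk' and B the event that recreation is `golf'. Counting
records gives 4 with neither, 2 with medRisk only, 3 with golf only and 1 with both.
P(A) = (2+1) / (4+2+3+1) = 0.3
P(B) = (3+1) / (4+2+3+1) = 0.4
P(A∩B) = 1 / (4+2+3+1) = 0.1
P(B) = 0.4 is the unconditional probability of `golf'.
Per the formula P(A|B) = P(A∩B) / P(B), putting it together:
P(A|B) = 0.1 / 0.4 = 0.25
So the conditional probability of `medRisk' given `golf' is 25%, which is the value the program
prints. The conditional probability of `single' given `medRisk' follows from the same formula:
of the 3 medRisk records, 2 are single, so P(single | medRisk) = 2/3 ≈ 0.67.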
import pandas as pd
import numpy as np

df = pd.read_csv('pd.csv')
print(len(df))
print(df.to_string())
# indicator columns: Arecreation = 1 for golf, Arisk = 1 for medRisk
df['Arecreation'] = np.where(df['recreation'] == 'golf', 1, 0)
df['Arisk'] = np.where(df['risk'] == 'medRisk', 1, 0)
df['count'] = 1
df = df[['Arecreation', 'Arisk', 'count']]
print(df.to_string())
# 2x2 contingency table: rows = Arecreation, columns = Arisk
table = pd.pivot_table(
    df,
    values='count',
    index=['Arecreation'],
    columns=['Arisk'],
    aggfunc='count',
    fill_value=0
)
print(table)
a0 = table.at[0, 0]   # neither golf nor medRisk
a1 = table.at[0, 1]   # medRisk, not golf
a2 = table.at[1, 0]   # golf, not medRisk
a3 = table.at[1, 1]   # golf and medRisk
total = a0 + a1 + a2 + a3
pa = (a1 + a3) / total        # P(A) = P(medRisk)
pb = (a2 + a3) / total        # P(B) = P(golf)
p_a_and_b = a3 / total        # P(A∩B)
p_a_gives_b = p_a_and_b / pb  # P(A|B) = P(medRisk | golf)
print(p_a_gives_b)
print('P(A|B) = %.f%%' % (p_a_gives_b * 100))

Output
10
income recreation job status age-group home-owner risk
0 medium skiing design single twenties no highRisk
1 high golf trading married forties yes lowRisk
2 low speedway transport married thirties yes medRisk
3 medium football banking single thirties yes lowRisk
4 high flying media married fifties yes highRisk
5 low football security single twenties no medRisk
6 medium golf media single thirties yes medRisk
7 medium golf transport married forties yes lowRisk
8 high skiing banking single thirties yes highRisk
9 low golf unemployed married forties yes highRisk
Arecreation Arisk count
0 0 0 1
1 1 0 1
2 0 1 1
3 0 0 1
4 0 0 1
5 0 1 1
6 1 1 1
7 1 0 1
8 0 0 1
9 1 0 1
Arisk        0  1
Arecreation
0            4  2
1            3  1
0.25
P(A|B) = 25%
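The two quantities named in the question, P(golf) and P(single | medRisk), can also be obtained by counting records directly. The sketch below does this with exact fractions. Only the recreation, status and risk columns are typed in, since the other attributes do not affect these two probabilities; the variable names are illustrative.

```python
from fractions import Fraction

# (recreation, status, risk) for the ten records in the table
records = [
    ("skiing", "single", "highRisk"),
    ("golf", "married", "lowRisk"),
    ("speedway", "married", "medRisk"),
    ("football", "single", "lowRisk"),
    ("flying", "married", "highRisk"),
    ("football", "single", "medRisk"),
    ("golf", "single", "medRisk"),
    ("golf", "married", "lowRisk"),
    ("skiing", "single", "highRisk"),
    ("golf", "married", "highRisk"),
]

# Unconditional probability: golf records / all records
n_golf = sum(1 for recreation, _, _ in records if recreation == "golf")
p_golf = Fraction(n_golf, len(records))

# Conditional probability: restrict to medRisk records, then count singles
med_risk = [r for r in records if r[2] == "medRisk"]
n_single = sum(1 for _, status, _ in med_risk if status == "single")
p_single_given_med_risk = Fraction(n_single, len(med_risk))

print("P(golf) =", p_golf)                                 # 2/5
print("P(single | medRisk) =", p_single_given_med_risk)    # 2/3
```

Restricting the list to medRisk records before counting is the same as dividing P(single ∩ medRisk) by P(medRisk), because the total record count cancels.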
EXPERIMENT NO:6
Implement linear regression using Python.
Aim:
To implement linear regression using Python
Theory:
Linear Regression (Python Implementation)
Linear regression is a statistical method for modelling the relationship between a dependent
variable and a given set of independent variables.
To provide a basic understanding of linear regression, we start with its most basic
version, i.e. simple linear regression.
Simple Linear Regression
Simple linear regression is an approach for predicting a response using a single feature.
It is assumed that the two variables are linearly related. Hence, we try to find a linear
function that predicts the response value (y) as accurately as possible as a function of the
feature or independent variable (x).
Let us consider a dataset where we have a value of response y for every feature x:
x: 0 1 2 3 4 5 6 7 8 9
y: 1 3 2 5 7 8 8 9 10 12
For generality, we define:
x as the feature vector, i.e. x = [x_1, x_2, …, x_n],
y as the response vector, i.e. y = [y_1, y_2, …, y_n]
for n observations (in the above example, n = 10).
[Figure: scatter plot of the above dataset]
Now, the task is to find the line that best fits the above scatter plot, so that we can predict
the response for any new feature value (i.e. a value of x not present in the dataset).
This line is called the regression line.
The equation of the regression line is represented as:
h(x_i) = b_0 + b_1 x_i
Here,
• h(x_i) represents the predicted response value for the ith observation.
• b_0 and b_1 are the regression coefficients and represent the y-intercept and slope of the
regression line respectively.
To create our model, we must "learn" or estimate the values of the regression coefficients b_0
and b_1. Once we have estimated these coefficients, we can use the model to predict
responses.
In this experiment, we use the principle of least squares.
Now consider:
y_i = b_0 + b_1 x_i + e_i = h(x_i) + e_i, so that e_i = y_i - h(x_i)
Here, e_i is the residual error in the ith observation.
So, our aim is to minimize the total residual error.
We define the squared error or cost function, J, as:
J(b_0, b_1) = \frac{1}{2n} \sum_{i=1}^{n} e_i^2
and our task is to find the values of b_0 and b_1 for which J(b_0, b_1) is minimum.
Without going into the mathematical details, we present the result here:
b_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}
where SS_xy is the sum of cross-deviations of y and x:
SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} y_i x_i - n \bar{x} \bar{y}
and SS_xx is the sum of squared deviations of x:
SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n \bar{x}^2
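The estimate_coef function in the program computes SS_xy and SS_xx from raw sums (np.sum(y*x) - n*m_y*m_x) and not from deviations about the means. The sketch below checks numerically, on the program's own ten observations, that the two forms agree. The variable names ss_xy_dev, ss_xy_raw and so on are illustrative.

```python
import numpy as np

# Same observations as the program in this experiment
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
n = x.size
m_x, m_y = x.mean(), y.mean()

# Deviation form: sum of products of deviations from the means
ss_xy_dev = np.sum((x - m_x) * (y - m_y))
ss_xx_dev = np.sum((x - m_x) ** 2)

# Computational form: raw sums minus n times the product of means
ss_xy_raw = np.sum(x * y) - n * m_x * m_y
ss_xx_raw = np.sum(x * x) - n * m_x * m_x

print(ss_xy_dev, ss_xy_raw)   # both 96.5
print(ss_xx_dev, ss_xx_raw)   # both 82.5
```

Dividing 96.5 by 82.5 gives the slope b_1 ≈ 1.1697 reported in the Output section.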
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \
    \nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
Output
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
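The estimated coefficients can be cross-checked against NumPy's built-in least-squares polynomial fit, and then used to predict the response for an unseen feature value. This is a sketch for verification only: np.polyfit is swapped in for the hand-written estimate_coef, and x_new = 10 is an arbitrary value chosen for illustration.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Degree-1 least-squares fit; polyfit returns the highest power first,
# so the slope comes before the intercept
b_1, b_0 = np.polyfit(x, y, 1)
print(b_0, b_1)

# Predict the response for a feature value not in the dataset
x_new = 10  # arbitrary illustrative value
y_new = b_0 + b_1 * x_new
print(y_new)
```

Agreement between the two methods, up to floating-point rounding, indicates that the closed-form expressions for b_0 and b_1 were coded correctly.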
MRCE
