
R18 B ML LAB Manual - Minor Degree


EXPERIMENT NO:1

The probability that it is Friday and that a student is absent is 3%. Since there are 5 school
days in a week, the probability that it is Friday is 20%. What is the probability that a student
is absent given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans: 15%)

Aim:
To find the probability that a student is absent given that today is Friday, from the given
data, using Bayes' rule in Python.

Theory:
Let A be the event that the student is absent and B the event that today is Friday.

P(B) = P(today is Friday) = 0.2
P(A∩B) = P(it is Friday ∩ student is absent) = 0.03

It is required to find P(A|B) = P(student is absent | today is Friday).
The conditional probability of event A, given that event B has occurred, is:

P(A|B) = P(A∩B) / P(B)

Thus the required probability is:
P(student is absent | today is Friday) = P(it is Friday ∩ student is absent) / P(today is Friday)
= 0.03 / 0.2 = 0.15

The answer is 0.15, i.e. 15%.

PROCEDURE / PROGRAMME

# calculate P(A|B) given P(A and B) and P(B)
def bayes_theorem(p_a_b, p_b):
    # calculate P(A|B) = P(A and B) / P(B)
    p_a_given_b = p_a_b / p_b
    return p_a_given_b

# P(A and B)
p_a_b = 0.03
# P(B)
p_b = 0.20
# calculate P(A|B)
result = bayes_theorem(p_a_b, p_b)
# summarize
print('P(A|B) = %.f%%' % (result * 100))
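
The same calculation can be written a little more defensively. The sketch below is only an illustration: the function name conditional_probability and its range checks are assumptions, not part of the original program. It rejects inputs that cannot be probabilities before dividing.

```python
def conditional_probability(p_a_and_b, p_b):
    """Return P(A|B) = P(A and B) / P(B), validating the inputs first."""
    # P(B) must be a non-zero probability, otherwise P(A|B) is undefined
    if not 0.0 < p_b <= 1.0:
        raise ValueError("P(B) must be in (0, 1]")
    # the joint probability can never exceed P(B)
    if not 0.0 <= p_a_and_b <= p_b:
        raise ValueError("P(A and B) must be in [0, P(B)]")
    return p_a_and_b / p_b

# values from the problem statement
p_absent_given_friday = conditional_probability(0.03, 0.20)
print("P(absent | Friday) = {:.0%}".format(p_absent_given_friday))  # prints 15%
```

Passing a joint probability larger than P(B), for example conditional_probability(0.3, 0.2), raises ValueError instead of silently returning a value above 1.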


EXPERIMENT NO:2
Extract data from a database using Python

Aim:
To extract data from a database using Python

Theory:

1. Connect to MySQL from Python

Connect to the MySQL database from Python using the MySQL Connector module
(mysql.connector.connect()).

2. Define a SQL SELECT Query

Next, prepare a SQL SELECT query to fetch rows from a table. You can select all rows or a
limited set, depending on your requirement. If a WHERE condition is used, it determines
which rows are fetched.
For example, SELECT col1, col2, …, colN FROM MySQL_table WHERE id = 10; returns
the row(s) whose id is 10.

3. Get Cursor Object from Connection

Next, use a connection.cursor() method to create a cursor object. This method creates a
new MySQLCursor object.

4. Execute the SELECT query using execute() method

Execute the select query using the cursor.execute() method.

5. Extract all rows from a result

After successfully executing a SELECT operation, use the fetchall() method of the cursor
object to get all rows from the query result. It returns a list of rows.

6. Iterate each row

Iterate over the row list using a for loop and access each row individually. (Access each
row's column data using a column name or index number.)

7. Close the cursor object and database connection object

Use the cursor.close() and connection.close() methods to close the open connections after
your work completes.
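
The seven steps above can be tried without a MySQL server. The following is a minimal sketch that swaps in Python's built-in sqlite3 module for MySQL Connector; the table layout and the two sample rows are hypothetical and exist only so that the query has something to return. Note that sqlite3 uses ? as the parameter placeholder, whereas MySQL Connector uses %s.

```python
import sqlite3

# Step 1: connect (an in-memory SQLite database stands in for the MySQL server)
connection = sqlite3.connect(":memory:")
connection.row_factory = sqlite3.Row  # allows column access by name as well as by index

# Hypothetical table and sample rows, created only for this illustration
connection.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, empaddress TEXT, edoj TEXT)"
)
connection.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(10, "siva", "madurai", "2015-12-17"), (11, "Ram", "Theni", "2016-12-18")],
)

# Step 2: define the SELECT query; the id is passed as a parameter,
# not pasted into the SQL string
query = "SELECT id, name, empaddress, edoj FROM employee WHERE id = ?"

# Steps 3 and 4: get a cursor and execute the query
cursor = connection.cursor()
cursor.execute(query, (10,))

# Step 5: fetch all matching rows as a list
records = cursor.fetchall()

# Step 6: iterate over the rows, reading columns by index and by name
for row in records:
    print("Id =", row[0], "Name =", row["name"])

# Step 7: close the cursor and the connection
cursor.close()
connection.close()
```

Passing the id as a parameter rather than formatting it into the query string lets the driver handle quoting, which is the usual way to avoid SQL injection.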
PROCEDURE / PROGRAMME

import mysql.connector
from mysql.connector import Error

connection = None
try:
    connection = mysql.connector.connect(host='localhost', database='employeeDB',
                                         charset='utf8', user='root', password='root')
    print("connected")
    sql_select_Query = "SELECT * FROM employee"
    cursor = connection.cursor()
    cursor.execute(sql_select_Query)
    records = cursor.fetchall()
    print("Total number of rows in employee is: ", cursor.rowcount)
    print("\nPrinting each employee record")
    for row in records:
        print("Id = ", row[0], "\n")
        print("Name = ", row[1], "\n")
        print("Address = ", row[2])
        print("Join date = ", row[3], "\n")
    # close the cursor once all rows have been read
    cursor.close()
except Error as e:
    print("Error reading data from MySQL table", e)
finally:
    # close the connection whether or not the query succeeded
    if connection is not None and connection.is_connected():
        connection.close()
        print("MySQL connection is closed")

Python program to insert values

import mysql.connector
from mysql.connector import Error

mydb = None
try:
    mydb = mysql.connector.connect(host='localhost', database='employeeDB',
                                   charset='utf8', user='root', password='root')
    mycursor = mydb.cursor()
    sql = "INSERT INTO employee (id, Name, empaddress, edoj) VALUES (%s, %s, %s, %s)"
    # rows to insert, one tuple per employee
    val = [
        (2111, 'rubesh', 'Lowstreet 4', '2019-09-12'),
        (2121, 'siva', 'Apple st 652', '2019-09-12'),
    ]
    mycursor.executemany(sql, val)
    # make the inserts permanent
    mydb.commit()
    print(mycursor.rowcount, "rows were inserted.")
    mycursor.close()
except Error as e:
    print("Error inserting data into MySQL table", e)
finally:
    if mydb is not None and mydb.is_connected():
        mydb.close()
        print("MySQL connection is closed")

Output

connected

Total number of rows in employee is: 4

Printing each employee record

Id = 111

Name = siva

Address = madurai

Join date = 2015-12-17

Id = 112

Name = Ram

Address = Theni

Join date = 2016-12-18

Id = 2111

Name = rubesh

Address = Lowstreet 4

Join date = 2019-09-12

Id = 2121

Name = siva

Address = Apple st 652

Join date = 2019-09-12
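
The batch insert with commit can also be tried without a MySQL server. The sketch below again uses the built-in sqlite3 module as a stand-in (an assumption made for portability, not the module used in this experiment) and adds a rollback in case the batch fails part-way. The table is created on the fly and is hypothetical.

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # stand-in for the MySQL server
connection.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, empaddress TEXT, edoj TEXT)"
)

# the same two rows that the insert program above adds
rows = [
    (2111, "rubesh", "Lowstreet 4", "2019-09-12"),
    (2121, "siva", "Apple st 652", "2019-09-12"),
]

try:
    cursor = connection.cursor()
    cursor.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", rows)
    connection.commit()         # make the inserts permanent
    inserted = cursor.rowcount  # total number of rows affected by executemany
except sqlite3.Error as e:
    connection.rollback()       # undo a partially applied batch
    inserted = 0
    print("Insert failed:", e)

total = connection.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(inserted, "rows inserted;", total, "rows in table")
connection.close()
```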


EXPERIMENT NO:3
Implement k-nearest neighbours classification using python
Aim:
To implement k-nearest neighbours classification using python

Theory:

• K-Nearest Neighbors is one of the most basic yet essential classification algorithms in
Machine Learning. It belongs to the supervised learning domain and is widely used in
pattern recognition, data mining and intrusion detection.
• It is widely applicable in real-life scenarios since it is non-parametric, meaning that it does not
make any underlying assumptions about the distribution of the data.
• Algorithm
Input: Let m be the number of training data samples. Let p be an unknown point.
Method:
1. Store the training samples in an array of data points arr[]. Each element
of this array represents a tuple (x, y).
2. for i = 0 to m-1
Calculate the Euclidean distance d(arr[i], p).
3. Make a set S of the K smallest distances obtained. Each of these distances corresponds to
an already classified data point.
4. Return the majority label among S.
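
The four steps above can be sketched directly in plain Python, without scikit-learn. This is only an illustration of the algorithm as listed: the function name knn_predict and the six training points are made up for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, p, k):
    """Classify point p by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from p to every training point
    distances = [(math.dist(point, p), label)
                 for point, label in zip(train_points, train_labels)]
    # Step 3: keep the k smallest distances, together with their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: return the majority label among those k neighbours
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Step 1: hypothetical training data, two well-separated groups
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.1, 1.0), k=3))  # prints a
print(knn_predict(train_points, train_labels, (5.1, 5.0), k=3))  # prints b
```

math.dist requires Python 3.8 or later. With an even k, ties are possible; most_common then returns whichever tied label was counted first.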

PROCEDURE / PROGRAMME :
# import the required packages
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets
# Load dataset
iris = datasets.load_iris()
print("Iris Data set loaded...")
# Split the data into train and test samples
# (the split is random, so the results vary from run to run)
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.1)
print("Dataset is split into training and testing samples...")
print("Size of training data and its label", x_train.shape, y_train.shape)
print("Size of testing data and its label", x_test.shape, y_test.shape)
# Print label numbers and their names
for i in range(len(iris.target_names)):
    print("Label", i, "-", str(iris.target_names[i]))

# Create object of KNN classifier
classifier = KNeighborsClassifier(n_neighbors=1)
# Perform training
classifier.fit(x_train, y_train)
# Perform testing
y_pred = classifier.predict(x_test)
# Display the results
print("Results of Classification using K-nn with K=1 ")
for r in range(0, len(x_test)):
    print(" Sample:", str(x_test[r]), " Actual-label:", str(y_test[r]),
          " Predicted-label:", str(y_pred[r]))
print("Classification Accuracy :", classifier.score(x_test, y_test))
#from sklearn.metrics import classification_report, confusion_matrix
#print('Confusion Matrix')
#print(confusion_matrix(y_test,y_pred))
#print('Accuracy Metrics')
#print(classification_report(y_test,y_pred))

Output

Result-1
Iris Data set loaded...
Dataset is split into training and testing samples...
Size of training data and its label (135, 4) (135,)
Size of testing data and its label (15, 4) (15,)
Label 0 - setosa
Label 1 - versicolor
Label 2 - virginica
Results of Classification using K-nn with K=1
Sample: [4.4 3. 1.3 0.2] Actual-label: 0 Predicted-label: 0
Sample: [5.1 2.5 3. 1.1] Actual-label: 1 Predicted-label: 1
Sample: [6.1 2.8 4. 1.3] Actual-label: 1 Predicted-label: 1
Sample: [6. 2.7 5.1 1.6] Actual-label: 1 Predicted-label: 2
Sample: [6.7 2.5 5.8 1.8] Actual-label: 2 Predicted-label: 2
Sample: [5.1 3.8 1.5 0.3] Actual-label: 0 Predicted-label: 0
Sample: [6.7 3.1 4.4 1.4] Actual-label: 1 Predicted-label: 1
Sample: [4.8 3.4 1.6 0.2] Actual-label: 0 Predicted-label: 0
Sample: [5.1 3.5 1.4 0.3] Actual-label: 0 Predicted-label: 0
Sample: [5.4 3.7 1.5 0.2] Actual-label: 0 Predicted-label: 0
Sample: [5.7 2.8 4.1 1.3] Actual-label: 1 Predicted-label: 1
Sample: [4.5 2.3 1.3 0.3] Actual-label: 0 Predicted-label: 0
Sample: [4.4 2.9 1.4 0.2] Actual-label: 0 Predicted-label: 0
Sample: [5.1 3.5 1.4 0.2] Actual-label: 0 Predicted-label: 0
Sample: [6.2 3.4 5.4 2.3] Actual-label: 2 Predicted-label: 2
Classification Accuracy : 0.93
EXPERIMENT NO:4

Given the following data, which specify classifications for nine combinations of VAR1 and VAR2,
predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the result of
k-means clustering with 3 means (i.e., 3 centroids).

VAR1    VAR2    CLASS
1.713   1.586   0
0.180   1.786   1
0.353   1.240   1
0.940   1.566   0
1.486   0.759   1
1.266   1.106   0
1.540   0.419   1
0.459   1.799   1
0.773   0.186   1

Aim:
To predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the result of
kmeans clustering with 3 means and given data.

Theory:
Step 1: The Python 3 program below implements a simple K-Means clustering that
automatically divides the input data into groups based on the given features.
Step 2: A comma-separated CSV file is loaded first; it contains the three input
columns.
Step 3: A K-Means clustering model is created from this input data. Afterwards, new data can
be classified using the predict() method based on the learned model.
Step 4: The scikit-learn and pandas libraries must be installed (pip install scikit-learn, pip install
pandas).
Step 5: Create the input file input_data.txt with the following contents:

VAR1,VAR2,cLASS
1.713,1.586,0
0.180,1.786,1
0.353,1.240,1
0.940,1.566,0
1.486,0.759,1
1.266,1.106,0
1.540,0.419,1
0.459,1.799,1
0.773,0.186,1
Step 6: Run the program below.

PROCEDURE / PROGRAMME

from sklearn.cluster import KMeans
import pandas as pd

# read csv input file
input_data = pd.read_csv("input_data.txt", sep=",")
print(input_data.to_string())
# initialize KMeans object specifying the number of desired clusters
kmeans = KMeans(n_clusters=3)
# learn the clustering from the input data
# (all three columns, including cLASS, are used as features here)
kmeans.fit(input_data.values)
# output the cluster labels for the input data
print(kmeans.labels_)
# predict the cluster for the given data sample; the model was fitted on three
# columns, so a cLASS value (here 1) has to be supplied along with VAR1 and VAR2
predicted_class = kmeans.predict([[0.906, 0.606, 1]])
print(predicted_class)

Output

    VAR1   VAR2  cLASS
0  1.713  1.586      0
1  0.180  1.786      1
2  0.353  1.240      1
3  0.940  1.566      0
4  1.486  0.759      1
5  1.266  1.106      0
6  1.540  0.419      1
7  0.459  1.799      1
8  0.773  0.186      1

[1 0 0 1 2 1 2 0 2]

[2]
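
The program above reports a cluster number, not a CLASS value, and it needs a CLASS value for the new sample before it can predict. One possible alternative, sketched below under stated assumptions, clusters on VAR1 and VAR2 only and then assigns the new sample the majority CLASS of the cluster it falls into. This is a plain Lloyd iteration written in pure Python, not scikit-learn's KMeans, and the choice of rows 0, 1 and 6 as starting centroids is an assumption made only to keep the run deterministic.

```python
import math
from collections import Counter

# (VAR1, VAR2) features and CLASS labels from the table in the problem statement
points = [(1.713, 1.586), (0.180, 1.786), (0.353, 1.240), (0.940, 1.566), (1.486, 0.759),
          (1.266, 1.106), (1.540, 0.419), (0.459, 1.799), (0.773, 0.186)]
classes = [0, 1, 1, 0, 1, 0, 1, 1, 1]

def nearest(centroids, p):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: math.dist(centroids[j], p))

def kmeans(points, centroids, max_iter=100):
    """Lloyd iteration: assign points, recompute means, stop when assignments are stable."""
    assignment = None
    for _ in range(max_iter):
        new_assignment = [nearest(centroids, p) for p in points]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

# Assumed starting centroids: rows 0, 1 and 6 of the table
centroids, assignment = kmeans(points, [points[0], points[1], points[6]])

# Map every cluster to the majority CLASS of its members
cluster_class = {
    j: Counter(c for c, a in zip(classes, assignment) if a == j).most_common(1)[0][0]
    for j in set(assignment)
}

new_point = (0.906, 0.606)
cluster = nearest(centroids, new_point)
predicted_class = cluster_class[cluster]
print("cluster:", cluster, "predicted CLASS:", predicted_class)
```

With these starting centroids the iteration settles on three clusters of three points each, and the new sample falls into the cluster whose members all have CLASS 1. Cluster numbers themselves are arbitrary and depend on the initialisation.
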
EXPERIMENT NO:5
The following training examples map descriptions of individuals onto high, medium and low
credit-worthiness.

Aim:
To find the unconditional probability of `golf' and the conditional probability of `single' given
`medRisk' in the dataset.

medium  skiing    design      single   twenties  no   -> highRisk
high    golf      trading     married  forties   yes  -> lowRisk
low     speedway  transport   married  thirties  yes  -> medRisk
medium  football  banking     single   thirties  yes  -> lowRisk
high    flying    media       married  fifties   yes  -> highRisk
low     football  security    single   twenties  no   -> medRisk
medium  golf      media       single   thirties  yes  -> medRisk
medium  golf      transport   married  forties   yes  -> lowRisk
high    skiing    banking     single   thirties  yes  -> highRisk
low     golf      unemployed  married  forties   yes  -> highRisk

Input attributes are (from left to right) income, recreation, job, status, age-group, home-
owner. Find the unconditional probability of `golf' and the conditional probability of `single'
given `medRisk' in the dataset?

Theory:

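Calculations of parts:
Let A be the event that the risk is `medRisk' and B the event that the recreation is `golf'.
The counts below are taken over the 10 training examples (4 + 2 + 3 + 1 = 10):
P(A) = (2+1) / (4+2+3+1) = 0.3
P(B) = (3+1) / (4+2+3+1) = 0.4
P(A∩B) = 1 / (4+2+3+1) = 0.1
Per the formula P(A|B) = P(A∩B) / P(B), putting it together:

P(A|B) = 0.1 / 0.4 = 0.25

The unconditional probability of `golf' is therefore P(B) = 0.4, and the conditional probability
of `medRisk' given `golf' is 25%.
The conditional probability of `single' given `medRisk' is obtained in the same way: 3 of the
10 examples are `medRisk' and 2 of those 3 are `single', so
P(single | medRisk) = 0.2 / 0.3 ≈ 0.67.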
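
Both quantities asked for in the question can be checked by direct counting. The sketch below is a minimal illustration in plain Python, with the ten training examples typed in from the table above; the helper name probability and the column-index constants are hypothetical.

```python
# The ten training examples:
# (income, recreation, job, status, age-group, home-owner, risk)
rows = [
    ("medium", "skiing", "design", "single", "twenties", "no", "highRisk"),
    ("high", "golf", "trading", "married", "forties", "yes", "lowRisk"),
    ("low", "speedway", "transport", "married", "thirties", "yes", "medRisk"),
    ("medium", "football", "banking", "single", "thirties", "yes", "lowRisk"),
    ("high", "flying", "media", "married", "fifties", "yes", "highRisk"),
    ("low", "football", "security", "single", "twenties", "no", "medRisk"),
    ("medium", "golf", "media", "single", "thirties", "yes", "medRisk"),
    ("medium", "golf", "transport", "married", "forties", "yes", "lowRisk"),
    ("high", "skiing", "banking", "single", "thirties", "yes", "highRisk"),
    ("low", "golf", "unemployed", "married", "forties", "yes", "highRisk"),
]
RECREATION, STATUS, RISK = 1, 3, 6  # column positions within each tuple

def probability(rows, event, given=lambda r: True):
    """Estimate P(event | given) as a ratio of row counts."""
    conditioned = [r for r in rows if given(r)]
    return sum(1 for r in conditioned if event(r)) / len(conditioned)

# unconditional probability of golf: 4 of 10 rows
p_golf = probability(rows, lambda r: r[RECREATION] == "golf")

# conditional probability of single given medRisk: 2 of the 3 medRisk rows
p_single_given_med = probability(
    rows,
    lambda r: r[STATUS] == "single",
    given=lambda r: r[RISK] == "medRisk",
)

print("P(golf) =", p_golf)
print("P(single | medRisk) =", round(p_single_given_med, 2))
```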

PROCEDURE / PROGRAMME

import pandas as pd
import numpy as np

df = pd.read_csv('pd.csv')
print(len(df))
print(df.to_string())
# indicator columns: 1 where recreation is golf / risk is medRisk, else 0
df['Arecreation'] = np.where(df['recreation'] == 'golf', 1, 0)
df['Arisk'] = np.where(df['risk'] == 'medRisk', 1, 0)
df['count'] = 1
df = df[['Arecreation', 'Arisk', 'count']]
print(df.to_string())
# 2x2 table of counts: rows are Arecreation, columns are Arisk
table = pd.pivot_table(
    df,
    values='count',
    index=['Arecreation'],
    columns=['Arisk'],
    aggfunc='sum',
    fill_value=0
)
print(table)
a0 = table.at[0, 0]
a1 = table.at[0, 1]
a2 = table.at[1, 0]
a3 = table.at[1, 1]
total = a0 + a1 + a2 + a3
# A = risk is medRisk, B = recreation is golf
pa = (a1 + a3) / total
pb = (a2 + a3) / total
p_a_and_b = a3 / total
p_a_given_b = p_a_and_b / pb
print(p_a_given_b)
print('P(A|B) = %.f%%' % (p_a_given_b * 100))


Output

10
   income recreation         job   status age-group home-owner      risk
0  medium     skiing      design   single  twenties         no  highRisk
1    high       golf     trading  married   forties        yes   lowRisk
2     low   speedway   transport  married  thirties        yes   medRisk
3  medium   football     banking   single  thirties        yes   lowRisk
4    high     flying       media  married   fifties        yes  highRisk
5     low   football    security   single  twenties         no   medRisk
6  medium       golf       media   single  thirties        yes   medRisk
7  medium       golf   transport  married   forties        yes   lowRisk
8    high     skiing     banking   single  thirties        yes  highRisk
9     low       golf  unemployed  married   forties        yes  highRisk
   Arecreation  Arisk  count
0            0      0      1
1            1      0      1
2            0      1      1
3            0      0      1
4            0      0      1
5            0      1      1
6            1      1      1
7            1      0      1
8            0      0      1
9            1      0      1
Arisk        0  1
Arecreation
0            4  2
1            3  1
0.25
P(A|B) = 25%
EXPERIMENT NO:6
Implement linear regression using python.

Aim:
To implement linear regression using python

Theory:

Linear Regression (Python Implementation)

Linear regression is a statistical method for modelling the relationship between a dependent
variable and a given set of independent variables.

In order to provide a basic understanding of linear regression, we start with the most basic
version of linear regression, i.e. Simple linear regression.

Simple Linear Regression


Simple linear regression is an approach for predicting a response using a single feature.
It is assumed that the two variables are linearly related. Hence, we try to find a linear
function that predicts the response value(y) as accurately as possible as a function of the
feature or independent variable(x).
Let us consider a dataset where we have a value of response y for every feature x (these are
the values used in the program of this experiment):

x:   0   1   2   3   4   5   6   7   8   9
y:   1   3   2   5   7   8   8   9  10  12

For generality, we define:

x as the feature vector, i.e. x = [x_1, x_2, …, x_n],
y as the response vector, i.e. y = [y_1, y_2, …, y_n]
for n observations (in the above example, n = 10).
[Figure: scatter plot of the above dataset]
Now, the task is to find the line which best fits the above scatter plot so that we can predict
the response for any new feature value (i.e. a value of x not present in the dataset).
This line is called the regression line.
The equation of the regression line is represented as:

$$h(x_i) = b_0 + b_1 x_i$$

Here,

• h(x_i) represents the predicted response value for the ith observation.
• b_0 and b_1 are regression coefficients and represent the y-intercept and slope of the
regression line respectively.
To create our model, we must “learn” or estimate the values of the regression coefficients b_0
and b_1. Once we have estimated these coefficients, we can use the model to predict
responses.
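In this experiment, we use the principle of Least Squares.
Now consider:

$$y_i = b_0 + b_1 x_i + e_i = h(x_i) + e_i \quad\Rightarrow\quad e_i = y_i - h(x_i)$$

Here, e_i is the residual error in the ith observation.
So, our aim is to minimize the total residual error.
We define the squared error or cost function, J, as:

$$J(b_0, b_1) = \frac{1}{2n} \sum_{i=1}^{n} e_i^2$$

and our task is to find the values of b_0 and b_1 for which J(b_0, b_1) is minimum.
Without going into the mathematical details, we present the result here:

$$b_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

where \bar{x} and \bar{y} are the means of x and y, and SS_xy is the sum of cross-deviations of y and x:

$$SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} y_i x_i - n \bar{x} \bar{y}$$

and SS_xx is the sum of squared deviations of x:

$$SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n \bar{x}^2$$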
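
As a worked example, the coefficients can be computed by hand for the ten observations used in the program of this experiment, x = 0, 1, …, 9 and y = 1, 3, 2, 5, 7, 8, 8, 9, 10, 12. The formulas are the ones implemented in estimate_coef; the symbols \bar{x} and \bar{y} for the means are notation assumed here. For this data, the sums are Σx = 45, Σy = 65, Σxy = 389 and Σx² = 285.

```latex
\begin{aligned}
\bar{x} &= \tfrac{45}{10} = 4.5, \qquad \bar{y} = \tfrac{65}{10} = 6.5 \\
SS_{xy} &= \sum x_i y_i - n\,\bar{x}\,\bar{y} = 389 - 10(4.5)(6.5) = 96.5 \\
SS_{xx} &= \sum x_i^2 - n\,\bar{x}^2 = 285 - 10(4.5)^2 = 82.5 \\
b_1 &= \frac{SS_{xy}}{SS_{xx}} = \frac{96.5}{82.5} \approx 1.1697 \\
b_0 &= \bar{y} - b_1\,\bar{x} \approx 6.5 - 1.1697(4.5) \approx 1.2364
\end{aligned}
```

These values agree with the coefficients printed by the program.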

PROCEDURE / PROGRAMME

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output

Estimated coefficients:

b_0 = 1.2363636363636363

b_1 = 1.1696969696969697
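
Once the coefficients are known, they can be used to predict the response for a new x and to check that they do minimise the cost. The sketch below uses plain Python without NumPy or matplotlib. The cost is taken here as J = (1/2n) Σ e_i², which is an assumed scaling (any positive scaling gives the same minimiser), and the helper names predict and cost are hypothetical.

```python
# observations from the program above
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]
n = len(x)
m_x = sum(x) / n
m_y = sum(y) / n

# least-squares estimates, same formulas as estimate_coef
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * m_x * m_y
ss_xx = sum(xi * xi for xi in x) - n * m_x * m_x
b_1 = ss_xy / ss_xx
b_0 = m_y - b_1 * m_x

def predict(x_new):
    """Predicted response h(x) = b_0 + b_1 * x for a new feature value."""
    return b_0 + b_1 * x_new

def cost(c_0, c_1):
    """J(c_0, c_1) = (1 / 2n) * sum of squared residuals for the line c_0 + c_1 * x."""
    return sum((yi - (c_0 + c_1 * xi)) ** 2 for xi, yi in zip(x, y)) / (2 * n)

print("prediction at x = 10:", round(predict(10), 3))
print("J at the estimate:   ", round(cost(b_0, b_1), 4))
print("J with b_0 + 0.1:    ", round(cost(b_0 + 0.1, b_1), 4))
print("J with b_1 - 0.1:    ", round(cost(b_0, b_1 - 0.1), 4))
```

Moving either coefficient away from its estimated value increases J, which is what being the least-squares solution means.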
MRCE
