Lab Manual

1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis
based on a given set of training data samples. Read the training data from a .CSV file.
FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
   For each attribute constraint ai in h
      If the constraint ai is satisfied by x
      Then do nothing
      Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
Training Examples:
Example Sky AirTemp Humidity Wind Water Forecast EnjoySport
1 Sunny Warm Normal Strong Warm Same Yes
2 Sunny Warm High Strong Warm Same Yes
3 Rainy Cold High Strong Warm Change No
4 Sunny Warm High Strong Cool Change Yes
import csv

num_attributes = 6
a = []

print("\n The Given Training Data Set \n")
with open('enjoysport.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Seed the hypothesis with the first training example (positive in this data set)
for j in range(0, num_attributes):
    hypothesis[j] = a[0][j]

print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(0, len(a)):
    # Only positive examples change the hypothesis
    if a[i][num_attributes] == 'yes':
        for j in range(0, num_attributes):
            # Generalize any attribute that disagrees with this example
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
            else:
                hypothesis[j] = a[i][j]
    print(" For Training instance No:{0} the hypothesis is ".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for a given Training Examples :\n")
print(hypothesis)

The Given Training Data Set
['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes']
['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes']
['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no']
['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']
The initial value of hypothesis:
['0', '0', '0', '0', '0', '0']
Find S: Finding a Maximally Specific Hypothesis
For Training Example No:0 the hypothesis is
['sunny', 'warm', 'normal', 'strong', 'warm', 'same']
For Training Example No:1 the hypothesis is
['sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training Example No:2 the hypothesis is
['sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training Example No:3 the hypothesis is
['sunny', 'warm', '?', 'strong', '?', '?']
The Maximally Specific Hypothesis for a given Training Examples:
['sunny', 'warm', '?', 'strong', '?', '?']
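
As a quick check on this result, the sketch below tests which of the four training examples the final hypothesis covers. The helper name `covers` is an assumption made for illustration and is not part of the lab program; '?' is treated as matching any value.

```python
def covers(hypothesis, example):
    """Return True if every constraint is '?' or equals the example's value."""
    return all(h == '?' or h == v for h, v in zip(hypothesis, example))

# Final hypothesis and training data copied from the listing above
hypothesis = ['sunny', 'warm', '?', 'strong', '?', '?']
training = [
    (['sunny', 'warm', 'normal', 'strong', 'warm', 'same'], 'yes'),
    (['sunny', 'warm', 'high', 'strong', 'warm', 'same'], 'yes'),
    (['rainy', 'cold', 'high', 'strong', 'warm', 'change'], 'no'),
    (['sunny', 'warm', 'high', 'strong', 'cool', 'change'], 'yes'),
]

results = [covers(hypothesis, x) for x, _ in training]
for (x, label), hit in zip(training, results):
    print(label, hit)
```

The hypothesis covers the three positive examples and rejects the negative one, which is what FIND-S should produce for this data.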
2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent
with the training examples.


The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses
from H that are consistent with an observed sequence of training examples.
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
• If d is a positive example
   • Remove from G any hypothesis inconsistent with d
   • For each hypothesis s in S that is not consistent with d
      • Remove s from S
      • Add to S all minimal generalizations h of s such that
        h is consistent with d, and some member of G is more general than h
      • Remove from S any hypothesis that is more general than another hypothesis in S

• If d is a negative example
   • Remove from S any hypothesis inconsistent with d
   • For each hypothesis g in G that is not consistent with d
      • Remove g from G
      • Add to G all minimal specializations h of g such that
        h is consistent with d, and some member of S is more specific than h
      • Remove from G any hypothesis that is less general than another hypothesis in G

CANDIDATE-ELIMINATION algorithm using version spaces
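
The algorithm relies on two tests: whether a hypothesis is consistent with an example, and whether one hypothesis is more general than another. A minimal sketch of both follows, assuming hypotheses are lists whose entries are either an attribute value or '?'. The function names are illustrative only and do not appear in the lab program.

```python
def matches(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(a == '?' or a == v for a, v in zip(h, x))

def consistent(h, x, label):
    """h is consistent with (x, label) when its prediction equals the label."""
    return matches(h, x) == (label == 'yes')

def more_general_or_equal(h1, h2):
    """True if every instance matched by h2 is also matched by h1."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

g = ['sunny', '?', '?', '?', '?', '?']
s = ['sunny', 'warm', '?', 'strong', '?', '?']
negative = ['rainy', 'cold', 'high', 'strong', 'warm', 'change']

print(more_general_or_equal(g, s))    # g sits above s in the ordering
print(more_general_or_equal(s, g))
print(consistent(g, negative, 'no'))  # g rejects the negative example
```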

Training Sky AirTemp Humidity Wind Water Forecast EnjoySport
1 Sunny Warm Normal Strong Warm Same Yes
2 Sunny Warm High Strong Warm Same Yes
3 Rainy Cold High Strong Warm Change No
4 Sunny Warm High Strong Cool Change Yes

import numpy as np
import pandas as pd

# pd.read_csv uses the first row of the file as column names,
# so this program assumes enjoysport.csv begins with a header row
data = pd.read_csv('enjoysport.csv')
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    # Start S from the first example (assumed positive) and G fully general
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(specific_h)
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "yes":
            # Positive example: generalize S wherever it disagrees
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "no":
            # Negative example: specialize G on the attributes where S disagrees
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print(" steps of Candidate Elimination Algorithm", i + 1)
        print(specific_h)
        print(general_h)
    # Drop the rows of G that are still fully general
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")

Data Set:

Sky AirTemp Humidity Wind Water Forecast EnjoySport
sunny warm normal strong warm same yes
sunny warm high strong warm same yes
rainy cold high strong warm change no
sunny warm high strong cool change yes

Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.

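import pandas as pd
import math
import numpy as np

data = pd.read_csv("Dataset/4-dataset.csv")
features = [feat for feat in data]
# "answer" is the target column, so it must not be offered as a split attribute
features.remove("answer")

Create a class named Node with four members: children, value, isLeaf and pred.
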
class Node:
    def __init__(self):
        self.children = []
        self.value = ""
        self.isLeaf = False
        self.pred = ""

Define a function called entropy to find the entropy of the dataset.

def entropy(examples):
    # Count positive and negative examples in the "answer" column
    pos = 0.0
    neg = 0.0
    for _, row in examples.iterrows():
        if row["answer"] == "yes":
            pos += 1
        else:
            neg += 1
    # A pure set has zero entropy (this also avoids log(0))
    if pos == 0.0 or neg == 0.0:
        return 0.0
    p = pos / (pos + neg)
    n = neg / (pos + neg)
    return -(p * math.log(p, 2) + n * math.log(n, 2))

Define a function named info_gain to find the gain of the attribute.

def info_gain(examples, attr):
    uniq = np.unique(examples[attr])
    gain = entropy(examples)
    # Subtract the size-weighted entropy of each subset produced by the split
    for u in uniq:
        subdata = examples[examples[attr] == u]
        sub_e = entropy(subdata)
        gain -= (float(len(subdata)) / float(len(examples))) * sub_e
    return gain
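
To make the two formulas concrete, the sketch below recomputes entropy and information gain from class counts alone. The counts (9 "yes" and 5 "no", split three ways by one attribute) are assumed for illustration, since the contents of Dataset/4-dataset.csv are not shown here. The function names are likewise illustrative.

```python
import math

def entropy_from_counts(pos, neg):
    """Binary entropy in bits, given the number of positive and negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # skip empty classes to avoid log(0)
            p = count / total
            result -= p * math.log2(p)
    return result

def gain_from_counts(parent, partitions):
    """Information gain of a split; partitions is a list of (pos, neg) pairs."""
    pos, neg = parent
    total = pos + neg
    remainder = sum((p + n) / total * entropy_from_counts(p, n)
                    for p, n in partitions)
    return entropy_from_counts(pos, neg) - remainder

# Assumed counts: 14 examples split into subsets of 5, 4 and 5
parent_entropy = entropy_from_counts(9, 5)
gain = gain_from_counts((9, 5), [(2, 3), (4, 0), (3, 2)])
print(round(parent_entropy, 3), round(gain, 3))
```

The pure subset (4, 0) contributes nothing to the remainder, which is why splits that isolate pure subsets tend to score a high gain.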

Define a function named ID3 to get the decision tree for the given dataset.

def ID3(examples, attrs):
    root = Node()

    # Choose the attribute with the highest information gain
    max_gain = 0
    max_feat = ""
    for feature in attrs:
        gain = info_gain(examples, feature)
        if gain > max_gain:
            max_gain = gain
            max_feat = feature
    root.value = max_feat

    # Create one branch per value of the chosen attribute
    uniq = np.unique(examples[max_feat])
    for u in uniq:
        subdata = examples[examples[max_feat] == u]
        if entropy(subdata) == 0.0:
            # Pure subset: attach a leaf holding the prediction
            newNode = Node()
            newNode.isLeaf = True
            newNode.value = u
            newNode.pred = np.unique(subdata["answer"])
            root.children.append(newNode)
        else:
            # Mixed subset: recurse on the remaining attributes
            dummyNode = Node()
            dummyNode.value = u
            new_attrs = attrs.copy()
            new_attrs.remove(max_feat)
            child = ID3(subdata, new_attrs)
            dummyNode.children.append(child)
            root.children.append(dummyNode)

    return root

Define a function named printTree to draw the decision tree.

def printTree(root: Node, depth=0):
    # Indent each node by its depth in the tree
    for i in range(depth):
        print("\t", end="")
    print(root.value, end="")
    if root.isLeaf:
        print(" -> ", root.pred)
    print()
    for child in root.children:
        printTree(child, depth + 1)

Define a function named classify to classify the new example.

def classify(root: Node, new):
    for child in root.children:
        if child.value == new[root.value]:
            if child.isLeaf:
                print("Predicted Label for new example", new, " is:", child.pred)
                return
            else:
                # Non-leaf branch: descend into the subtree below this value
                classify(child.children[0], new)

Finally, call the ID3, printTree and classify functions.

root = ID3(data, features)

print("Decision Tree is:")
printTree(root)
print("------------------")

new = {"outlook":"sunny", "temperature":"hot", "humidity":"normal", "wind":"strong"}

classify(root, new)


Decision Tree is:

overcast -> ['yes']

strong -> ['no']

weak -> ['yes']

high -> ['no']

normal -> ['yes']

Predicted Label for new example {'outlook': 'sunny', 'temperature': 'hot', 'humidity': 'normal', 'wind':
'strong'} is: ['yes']
Build an Artificial Neural Network by implementing the Back propagation algorithm and test the
same using appropriate data sets.

Training Examples:

Example Sleep Study Expected % in Exams
1 2 9 92
2 1 5 86
3 3 6 89

Normalize the input

Example Sleep Study Expected % in Exams
1 2/3 = 0.66666667 9/9 = 1 0.92
2 1/3 = 0.33333333 5/9 = 0.55555556 0.86
3 3/3 = 1 6/9 = 0.66666667 0.89
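
The normalization in the table divides each input column by its own maximum and the expected score by 100. A small pure-Python sketch of that arithmetic follows; the variable names are illustrative assumptions.

```python
# Raw training data from the table: [sleep, study] and expected % in exams
sleep_study = [[2, 9], [1, 5], [3, 6]]
scores = [92, 86, 89]

# Column-wise maxima: 3 for sleep, 9 for study
col_max = [max(col) for col in zip(*sleep_study)]

# Scale every input by its column maximum and every score by 100
X_norm = [[v / m for v, m in zip(row, col_max)] for row in sleep_study]
y_norm = [s / 100 for s in scores]

for row, target in zip(X_norm, y_norm):
    print(row, target)
```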


import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X, axis=0)  # maximum of X array longitudinally
y = y/100

# Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

# Derivative of Sigmoid Function (x is expected to be a sigmoid output)
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 5000             # Setting training iterations
lr = 0.1                 # Setting learning rate
inputlayer_neurons = 2   # number of features in data set
hiddenlayer_neurons = 3  # number of hidden layers neurons
output_neurons = 1       # number of neurons at output layer

# weight and bias initialization
# draws a random range of numbers uniformly of dim x*y
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    # how much hidden layer wts contributed to error
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    # dotproduct of nextlayererror and currentlayerop
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
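
One detail of the program above is easy to miss: derivatives_sigmoid is applied to values that have already passed through sigmoid (output and hlayer_act), not to the raw weighted sums. The sketch below checks that convention against a numerical derivative. It uses the math module instead of NumPy so that it runs on its own, and the test point z = 0.5 is an arbitrary assumption.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def derivatives_sigmoid(a):
    # a is assumed to be a sigmoid output, i.e. a = sigmoid(z)
    return a * (1 - a)

z = 0.5
eps = 1e-6

# Central-difference estimate of d(sigmoid)/dz at z
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

# Analytic derivative, evaluated on the activation rather than on z
analytic = derivatives_sigmoid(sigmoid(z))

print(abs(numeric - analytic) < 1e-8)
```

Passing z itself to derivatives_sigmoid would give z * (1 - z), which is not the slope of the sigmoid at z.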


Input:
[[0.66666667 1.]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
Write a program to implement k-Nearest Neighbor algorithm to classify the iris data set. Print both
correct and wrong predictions.

Data Set:

Iris Plants Dataset: Dataset contains 150 instances (50 in each of three classes)
Number of Attributes: 4 numeric, predictive attributes and the Class


from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn import datasets

""" Iris Plants Dataset, dataset contains 150 (50 in each of three classes)
Number of Attributes: 4 numeric, predictive attributes and the Class
"""
iris = datasets.load_iris()

""" The x variable contains the first four columns of the dataset (i.e. attributes) while y contains the
labels.
"""
x = iris.data
y = iris.target

print('sepal-length', 'sepal-width', 'petal-length', 'petal-width')
print(x)
print('class: 0-Iris-Setosa, 1- Iris-Versicolour, 2- Iris-Virginica')
print(y)

""" Splits the dataset into 70% train data and 30% test data. This means that out of total 150
records, the training set will contain
105 records and the test set contains 45 of those records"""
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

# Train the model using the 5 nearest neighbors
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)

# Make predictions on the test data
y_pred = classifier.predict(x_test)

""" For evaluating an algorithm, confusion matrix, precision, recall and f1 score are the most
commonly used metrics.
"""
print('Confusion Matrix')
print(confusion_matrix(y_test, y_pred))
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))
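
The task statement also asks for correct and wrong predictions to be printed, which the metrics above only summarize. The sketch below shows one way to list them, using a small hand-written k-NN on invented 2-D points rather than scikit-learn and the iris data. Every name and data value in it is an assumption made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to query."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Invented toy data: two well-separated classes
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.0), (3.2, 2.9), (2.8, 3.1)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(1.1, 1.0), (3.1, 3.0), (1.0, 0.9)]
test_y = [0, 1, 1]  # the last label deliberately disagrees with its neighbours

outcomes = []
for point, actual in zip(test_X, test_y):
    predicted = knn_predict(train_X, train_y, point)
    outcomes.append((predicted, actual, predicted == actual))
    status = 'CORRECT' if predicted == actual else 'WRONG'
    print(point, 'predicted:', predicted, 'actual:', actual, status)
```

With the scikit-learn program, the same loop can run over zip(x_test, y_test, y_pred) after the call to classifier.predict.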


sepal-length sepal-width petal-length petal-width

[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
. . . . .
. . . . .

[6.2 3.4 5.4 2.3]

[5.9 3. 5.1 1.8]]

class: 0-Iris-Setosa, 1- Iris-Versicolour, 2- Iris-Virginica

[0 0 0 ………0 0 1 1 1 …………1 1 2 2 2 ………… 2 2]

Confusion Matrix
[[20 0 0]
[0 10 0]
[0 1 14]]
Accuracy Metrics

precision recall f1-score support

0 1.00 1.00 1.00 20
1 0.91 1.00 0.95 10
2 1.00 0.93 0.97 15

avg / total 0.98 0.98 0.98 45

Experiment-9: Implement the non-parametric Locally Weighted Regression algorithm in order
to fit data points. Select appropriate data set for your experiment and draw graphs.


import numpy as np
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.io import push_notebook

def local_regression(x0, X, Y, tau):
    # add bias term
    x0 = np.r_[1, x0]  # Add one to avoid the loss in information
    X = np.c_[np.ones(len(X)), X]

    # fit model: normal equations with kernel
    xw = X.T * radial_kernel(x0, X, tau)  # XTranspose * W
    beta = np.linalg.pinv(xw @ X) @ xw @ Y  # @ Matrix Multiplication or Dot Product

    # predict value
    return x0 @ beta  # @ Matrix Multiplication or Dot Product for prediction

def radial_kernel(x0, X, tau):
    # Weight or Radial Kernel Bias Function
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))

n = 1000
# generate dataset
X = np.linspace(-3, 3, num=n)
print("The Data Set ( 10 Samples) X :\n", X[1:10])
Y = np.log(np.abs(X ** 2 - 1) + .5)
print("The Fitting Curve Data Set (10 Samples) Y :\n", Y[1:10])
# jitter X
X += np.random.normal(scale=.1, size=n)
print("Normalised (10 Samples) X :\n", X[1:10])

domain = np.linspace(-3, 3, num=300)
print(" Xo Domain Space(10 Samples) :\n", domain[1:10])

def plot_lwr(tau):
    # prediction through regression
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]
    # Bokeh 3.x renamed plot_width/plot_height to width/height
    plot = figure(plot_width=400, plot_height=400)
    plot.title.text = 'tau=%g' % tau
    plot.scatter(X, Y, alpha=.3)
    plot.line(domain, prediction, line_width=2, color='red')
    return plot

show(gridplot([
    [plot_lwr(10.), plot_lwr(1.)],
    [plot_lwr(0.1), plot_lwr(0.01)]]))
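
For a single input feature, the weighted normal equations used by local_regression reduce to a 2x2 system that can be solved by hand. The sketch below does this in plain Python as a cross-check of the idea. The function name and the toy data are assumptions, and the weights follow the same Gaussian kernel with bandwidth tau.

```python
import math

def local_linear_fit(x0, xs, ys, tau):
    """Predict y at x0 from a weighted straight-line fit centred on x0."""
    # Gaussian weights: points near x0 count more, tau sets the bandwidth
    w = [math.exp(-((x - x0) ** 2) / (2 * tau * tau)) for x in xs]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    # Solve the 2x2 weighted normal equations for intercept b0 and slope b1
    det = sw * swxx - swx * swx
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    return b0 + b1 * x0

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # exactly linear toy data
prediction = local_linear_fit(2.5, xs, ys, tau=1.0)
print(prediction)
```

On exactly linear data the local fit recovers the line whatever the value of tau. On curved data, a smaller tau follows local structure more closely and a larger tau approaches an ordinary straight-line fit, which is the effect the four plots compare.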
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to
perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy,
precision, and recall for your data set.
Data set:

Text Documents Label

1 I love this sandwich pos
2 This is an amazing place pos
3 I feel very good about these beers pos
4 This is my best work pos
5 What an awesome view pos
6 I do not like this restaurant neg
7 I am tired of this stuff neg
8 I can't deal with this neg
9 He is my sworn enemy neg
10 My boss is horrible neg
11 This is an awesome place pos
12 I do not like the taste of this juice neg
13 I love to dance pos
14 I am sick and tired of this place neg
15 What a great holiday pos
16 That is a bad locality to stay neg
17 We will have good fun tomorrow pos
18 I went to my enemy's house today neg


import pandas as pd

msg = pd.read_csv('naivetext.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)

# Map the text labels to numbers: pos -> 1, neg -> 0
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)

# splitting the dataset into train and test data
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print('\n The total number of Training Data :', ytrain.shape)
print('\n The total number of Test Data :', ytest.shape)

# output of count vectoriser is a sparse matrix
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
print('\n The words or Tokens in the text documents \n')
# get_feature_names_out() replaces get_feature_names(), removed in scikit-learn 1.2
print(count_vect.get_feature_names_out())
df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names_out())

# Training Naive Bayes (NB) classifier on training data
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

# printing accuracy, Confusion matrix, Precision and Recall
from sklearn import metrics
print('\n Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('\n Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('\n The value of Precision', metrics.precision_score(ytest, predicted))
print('\n The value of Recall', metrics.recall_score(ytest, predicted))


The dimensions of the dataset (18, 2)

0 I love this sandwich
1 This is an amazing place
2 I feel very good about these beers
3 This is my best work
4 What an awesome view
5 I do not like this restaurant
6 I am tired of this stuff
7 I can't deal with this
8 He is my sworn enemy
9 My boss is horrible
10 This is an awesome place
11 I do not like the taste of this juice
12 I love to dance
13 I am sick and tired of this place
14 What a great holiday
15 That is a bad locality to stay
16 We will have good fun tomorrow
17 I went to my enemy's house today

Name: message, dtype: object

0 1
1 1
2 1
3 1
4 1
5 0
6 0
7 0
8 0
9 0
10 1
11 0
12 1
13 0
14 1
15 0
16 1
17 0
Name: labelnum, dtype: int64

The total number of Training Data: (13,)

The total number of Test Data: (5,)

The words or Tokens in the text documents

['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'can', 'deal', 'do', 'enemy', 'feel',
'fun', 'good', 'great', 'have', 'he', 'holiday', 'house', 'is', 'like', 'love', 'my', 'not', 'of', 'place',
'restaurant', 'sandwich', 'sick', 'sworn', 'these', 'this', 'tired', 'to', 'today', 'tomorrow', 'very','view',
'we', 'went', 'what', 'will', 'with', 'work']

Accuracy of the classifier is 0.8
Confusion matrix
[[2 1]
[0 2]]
The value of Precision 0.6666666666666666
The value of Recall 1.0
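
The three reported figures can be recovered by hand from the confusion matrix. The sketch below does so, assuming the scikit-learn layout in which rows are actual classes and columns are predicted classes, with class 0 (neg) first and class 1 (pos) second.

```python
# Confusion matrix from the output above
cm = [[2, 1],
      [0, 2]]

tn, fp = cm[0]  # actual neg: predicted neg, predicted pos
fn, tp = cm[1]  # actual pos: predicted neg, predicted pos

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # share of predicted positives that are right
recall = tp / (tp + fn)     # share of actual positives that were found

print(accuracy, precision, recall)
```

One negative document was labelled positive, which lowers precision to 2/3, while both positive documents were found, so recall is 1.0.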
Experiment-11: Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and comment on the
quality of clustering. You can add Java/Python ML library classes/API in the program.

import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.cluster import KMeans
import sklearn.metrics as sm
import pandas as pd
import numpy as np

iris = datasets.load_iris()

X = pd.DataFrame(iris.data)
X.columns = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']

y = pd.DataFrame(iris.target)
y.columns = ['Targets']

model = KMeans(n_clusters=3)
model.fit(X)  # labels_ is only available after fitting


colormap = np.array(['red', 'lime', 'black'])

# Plot the Original Classifications

plt.subplot(1, 2, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')

# Plot the Models Classifications

plt.subplot(1, 2, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K Mean Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
print('The accuracy score of K-Mean: ',sm.accuracy_score(y, model.labels_))
print('The Confusion matrix of K-Mean: ',sm.confusion_matrix(y, model.labels_))

from sklearn import preprocessing

scaler = preprocessing.StandardScaler()
scaler.fit(X)  # learn the mean and standard deviation of each column first
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns = X.columns)

from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=3)
gmm.fit(xs)  # run EM on the scaled data before predicting
y_gmm = gmm.predict(xs)

plt.figure()  # new figure, so the GMM plot does not overlap the two subplots above
plt.subplot(1, 1, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_gmm], s=40)
plt.title('GMM Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')

print('The accuracy score of EM: ',sm.accuracy_score(y, y_gmm))

print('The Confusion matrix of EM: ',sm.confusion_matrix(y, y_gmm))
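
A caution on the accuracy scores printed above: cluster numbers from k-Means and EM are arbitrary, so cluster 0 need not correspond to class 0, and comparing them directly with the targets can understate the agreement. The sketch below relabels each cluster with the majority true class of its members before scoring. The function name and the toy label lists are assumptions for illustration.

```python
from collections import Counter

def align_labels(true_labels, cluster_ids):
    """Relabel each cluster with the most common true label among its members."""
    mapping = {}
    for c in set(cluster_ids):
        members = [t for t, k in zip(true_labels, cluster_ids) if k == c]
        mapping[c] = Counter(members).most_common(1)[0][0]
    return [mapping[k] for k in cluster_ids]

# Invented example: the clusters are good but numbered differently from the classes
true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
cluster_ids = [2, 2, 2, 0, 0, 1, 1, 1, 1]

aligned = align_labels(true_labels, cluster_ids)
raw_acc = sum(t == k for t, k in zip(true_labels, cluster_ids)) / len(true_labels)
aligned_acc = sum(t == a for t, a in zip(true_labels, aligned)) / len(true_labels)
print(raw_acc, aligned_acc)
```

In the program above the same helper could be applied to model.labels_ and y_gmm before calling sm.accuracy_score. Note that a majority vote may map two clusters to the same class when the clustering is poor.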
Write a Python program to construct a Bayesian network considering medical data. Use this
model to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set

Data Set:
Title: Heart Disease Databases

The Cleveland database contains 76 attributes, but all published experiments refer to using a
subset of 14 of them. In particular, the Cleveland database is the only one that has been used by
ML researchers to this date. The "Heartdisease" field refers to the presence of heart disease in the
patient. It is integer valued from 0 (no presence) to 4.
Database: 0 1 2 3 4 Total
Cleveland: 164 55 36 35 13 303
Attribute Information:

1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
   • Value 1: typical angina
   • Value 2: atypical angina
   • Value 3: non-anginal pain
   • Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
   • Value 0: normal
   • Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of
     > 0.05 mV)
   • Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
   • Value 1: upsloping
   • Value 2: flat
   • Value 3: downsloping
12. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
13. Heartdisease: It is integer valued from 0 (no presence) to 4.

#Learning CPDs using Maximum Likelihood Estimators
print('\n Learning CPD using Maximum likelihood estimators')

#Inferencing with Bayesian Network
print('\n Inferencing with Bayesian Network:')
HeartDiseasetest_infer = VariableElimination(model)

#computing the Probability of HeartDisease given restecg
print('\n 1.Probability of HeartDisease given evidence= restecg :1')

#computing the Probability of HeartDisease given cp
print('\n 2.Probability of HeartDisease given evidence= cp:2 ')
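
Maximum likelihood estimation of a conditional probability table amounts to counting: P(child | parent) is the number of records with that parent and child value divided by the number of records with that parent value. The sketch below shows this for a single parent, using a handful of invented (cp, heartdisease) records. It is not the network or the data used in this experiment, and the function name is an assumption.

```python
from collections import Counter

# Hypothetical (cp, heartdisease) records, invented purely for illustration
records = [(1, 0), (1, 0), (2, 0), (2, 1), (2, 1), (4, 1), (4, 1), (4, 0)]

def mle_conditional(records):
    """Estimate P(child | parent) as count(parent, child) / count(parent)."""
    joint = Counter(records)
    parent_totals = Counter(parent for parent, _ in records)
    return {(parent, child): n / parent_totals[parent]
            for (parent, child), n in joint.items()}

cpd = mle_conditional(records)

# Estimated probability of heartdisease = 1 given cp = 2 in the toy records
print(cpd[(2, 1)])
```

In a network with several parents per node the same counting is done per combination of parent values, and answering a query such as "HeartDisease given cp = 2" then also requires summing over the variables that are not observed, which is the job of variable elimination.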

