R18 B ML LAB Manual - Minor Degree
EXPERIMENT NO:1
The probability that it is Friday and that a student is absent is 3%. Since there are 5 school
days in a week, the probability that it is Friday is 20%. What is the probability that a student
is absent given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans: 15%)
Aim:
To find the probability that a student is absent given that today is Friday from the given
data, using Bayes' rule in Python.
Theory:
Let A be the event "student is absent" and B the event "today is Friday".
P(It is Friday and student is absent) = 0.03, i.e. P(A∩B) = 0.03
P(Today is Friday) = 0.2, i.e. P(B) = 0.2
It is required to find
P(student is absent | today is Friday), i.e. P(A|B).
The formula for obtaining the conditional probability of event A, given the event B has
occurred is as follows:
P(A|B)=P(A∩B)/P(B)
Thus the required probability is as follows:
P(student is absent| today is Friday)=P(It is Friday∩student is absent)/P(Today is Friday)
=0.03/0.2=0.15
PROCEDURE / PROGRAMME
def bayes_theorem(p_a_b, p_b):
    # compute P(A|B) from P(A and B) and P(B)
    p_a_given_b = p_a_b / p_b
    return p_a_given_b

# P(A and B)
p_a_b = 0.03
# P(B)
p_b = 0.20
# calculate P(A|B)
p_a_given_b = bayes_theorem(p_a_b, p_b)
# summarize
print('P(A|B) = %.2f%%' % (p_a_given_b * 100))
EXPERIMENT NO:2
Aim:
To extract data from a MySQL database using Python.
Theory:
Refer to the Python MySQL database connection guide to connect to a MySQL database from
Python using the MySQL Connector module.
Next, prepare a SQL SELECT query to fetch rows from a table. You can select all rows or a
limited number of rows based on your requirement; if a WHERE condition is used, it decides
which rows are fetched.
For example, SELECT col1, col2, …, colN FROM MySQL_table WHERE id = 10; returns only the
row whose id is 10.
Next, use the connection.cursor() method to create a cursor object. This method creates a
new MySQLCursor object.
After successfully executing the SELECT operation, use the fetchall() method of the cursor
object to get all rows from the query result; it returns a list of rows.
Iterate over the row list using a for loop and access each row individually (access each row's
column data using a column name or index number).
Use cursor.close() and connection.close() to close the open cursor and connection after your
work completes.
PROCEDURE / PROGRAMME
import mysql.connector
from mysql.connector import Error

try:
    connection = mysql.connector.connect(host='localhost', database='employeeDB',
                                         charset='utf8', user='root', password='root')
    print("connected")
    cursor = connection.cursor()
    # table name 'employee' is assumed; adjust it to your schema
    sql_select_Query = "SELECT * FROM employee"
    cursor.execute(sql_select_Query)
    records = cursor.fetchall()
    # print each row's columns by index
    for row in records:
        print("Id = ", row[0])
        print("Name = ", row[1])
        print("Address = ", row[2])
    cursor.close()
    connection.close()
except Error as e:
    print("Error while connecting to MySQL", e)
import mysql.connector
from mysql.connector import Error

mydb = None
try:
    mydb = mysql.connector.connect(host='localhost', database='employeeDB',
                                   charset='utf8', user='root', password='root')
    mycursor = mydb.cursor()
    # column names are assumed from the sample data
    sql = "INSERT INTO employee (Id, Name, Address, JoinDate) VALUES (%s, %s, %s, %s)"
    val = [
        (2111, 'rubesh', 'Lowstreet 4', '2019-09-12'),
        (2121, 'siva', 'Apple st 652', '2019-09-12'),
    ]
    mycursor.executemany(sql, val)
    mydb.commit()
except Error as e:
    print("Error while inserting into MySQL", e)
finally:
    if mydb is not None and mydb.is_connected():
        mycursor.close()
        mydb.close()
Output
connected
Id = 111
Name = siva
Address = madurai
Id = 112
Name = Ram
Address = Theni
Id = 2111
Name = rubesh
Address = Lowstreet 4
Id = 2121
Name = siva
Address = Apple st 652
EXPERIMENT NO:3
Aim:
To implement the k-Nearest Neighbour (k-NN) classification algorithm on the Iris dataset
using Python.
Theory:
• K-Nearest Neighbors is one of the most basic yet essential classification algorithms in
Machine Learning. It belongs to the supervised learning domain and finds intense
application in pattern recognition, data mining and intrusion detection.
• It is widely applicable in real-life scenarios since it is non-parametric, meaning it does not
make any underlying assumptions about the distribution of the data.
• Algorithm
Input: Let m be the number of training data samples. Let p be an unknown point.
Method:
1. Store the training samples in an array of data points arr[]. This means each element
of this array represents a tuple (x, y).
2. for i=0 to m
Calculate Euclidean distance d(arr[i], p).
3. Make a set S of the K smallest distances obtained. Each of these distances corresponds to
an already classified data point.
4. Return the majority label among S. (A minimal from-scratch sketch of these steps is given
below.)
PROCEDURE / PROGRAMME :
# import the required packages
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets

# Load dataset
iris = datasets.load_iris()
print("Iris Data set loaded...")

# Split the data into train and test samples
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.1)
print("Dataset is split into training and testing samples...")
print("Size of training data and its label", x_train.shape, y_train.shape)
print("Size of testing data and its label", x_test.shape, y_test.shape)

# Print label numbers and their names
for i in range(len(iris.target_names)):
    print("Label", i, "-", str(iris.target_names[i]))

# Train the classifier with K=1 and predict the test samples
classifier = KNeighborsClassifier(n_neighbors=1)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)

print("Results of Classification using K-nn with K=1")
for r in range(len(x_test)):
    print("Sample:", str(x_test[r]), "Actual-label:", str(y_test[r]),
          "Predicted-label:", str(y_pred[r]))
print("Classification Accuracy :", classifier.score(x_test, y_test))
Output
Result-1
Iris Data set loaded...
Dataset is split into training and testing samples...
Size of training data and its label (135, 4) (135,)
Size of testing data and its label (15, 4) (15,)
Label 0 - setosa
Label 1 - versicolor
Label 2 - virginica
Results of Classification using K-nn with K=1
Sample: [4.4 3. 1.3 0.2] Actual-label: 0 Predicted-label: 0
Sample: [5.1 2.5 3. 1.1] Actual-label: 1 Predicted-label: 1
Sample: [6.1 2.8 4. 1.3] Actual-label: 1 Predicted-label: 1
Sample: [6. 2.7 5.1 1.6] Actual-label: 1 Predicted-label: 2
Sample: [6.7 2.5 5.8 1.8] Actual-label: 2 Predicted-label: 2
Sample: [5.1 3.8 1.5 0.3] Actual-label: 0 Predicted-label: 0
Sample: [6.7 3.1 4.4 1.4] Actual-label: 1 Predicted-label: 1
Sample: [4.8 3.4 1.6 0.2] Actual-label: 0 Predicted-label: 0
Sample: [5.1 3.5 1.4 0.3] Actual-label: 0 Predicted-label: 0
Sample: [5.4 3.7 1.5 0.2] Actual-label: 0 Predicted-label: 0
Sample: [5.7 2.8 4.1 1.3] Actual-label: 1 Predicted-label: 1
Sample: [4.5 2.3 1.3 0.3] Actual-label: 0 Predicted-label: 0
Sample: [4.4 2.9 1.4 0.2] Actual-label: 0 Predicted-label: 0
Sample: [5.1 3.5 1.4 0.2] Actual-label: 0 Predicted-label: 0
Sample: [6.2 3.4 5.4 2.3] Actual-label: 2 Predicted-label: 2
Classification Accuracy : 0.93
EXPERIMENT NO:4
Given the following data, which specify classifications for nine combinations of VAR1 and VAR2,
predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the result of
k-means clustering with 3 means (i.e., 3 centroids).
VAR1 VAR2 CLASS
1.713 1.586 0
0.180 1.786 1
0.353 1.240 1
0.940 1.566 0
1.486 0.759 1
1.266 1.106 0
1.540 0.419 1
0.459 1.799 1
0.773 0.186 1
Aim:
To predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the result of
k-means clustering with 3 means on the given data.
Theory:
Step 1: The Python 3 code below demonstrates a simple K-Means clustering that automatically
divides the input data into groups based on the given features.
Step 2: A comma-separated CSV file is loaded first, which contains the three input columns.
Step 3: A K-Means clustering model is created from this input data. Afterwards, new data can
be classified using the predict() method based on the learned model.
Step 4: The scikit-learn and pandas libraries must be installed (pip install scikit-learn,
pip install pandas).
Step 5
input_data.txt
VAR1,VAR2,CLASS
1.713,1.586,0
0.180,1.786,1
0.353,1.240,1
0.940,1.566,0
1.486,0.759,1
1.266,1.106,0
1.540,0.419,1
0.459,1.799,1
0.773,0.186,1
Step 6
PROCEDURE / PROGRAMME
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans

# load the comma-separated input file
input_data = pd.read_csv('input_data.txt')
print(input_data.to_string())

# build a K-Means model with 3 clusters from the input data
kmeans = KMeans(n_clusters=3)
kmeans.fit(input_data.values)
print(kmeans.labels_)

# predict the cluster for the new case (class column value 1 kept as in the original call)
predicted_class = kmeans.predict([[0.906, 0.606, 1]])
print(predicted_class)
Output
VAR1 VAR2 CLASS
0 1.713 1.586 0
1 0.180 1.786 1
2 0.353 1.240 1
3 0.940 1.566 0
4 1.486 0.759 1
5 1.266 1.106 0
6 1.540 0.419 1
7 0.459 1.799 1
8 0.773 0.186 1
[1 0 0 1 2 1 2 0 2]
[2]
EXPERIMENT NO:5
The following training examples map descriptions of individuals onto high, medium and low
credit-worthiness.
Aim:
To find the unconditional probability of `golf' and the conditional probability of `single'
given `medRisk' in the dataset.
Input attributes are (from left to right) income, recreation, job, status, age-group,
home-owner. Find the unconditional probability of `golf' and the conditional probability of
`single' given `medRisk' in the dataset.
Theory:
Calculations of parts:
P(A) = (2+1) / (4+2+3+1) = 0.3
P(B) = (3+1) / (4+2+3+1) = 0.4
P(A∩B) = 1 / (4+2+3+1) = 0.1
Then, per the formula P(A|B) = P(A∩B) / P(B), putting it together gives
P(A|B) = 0.1 / 0.4 = 0.25.
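As a quick hand-check of this arithmetic (a minimal sketch using only the counts above; it is
separate from the full programme that follows):
total = 4 + 2 + 3 + 1
p_a = (2 + 1) / total          # P(A) = 0.3
p_b = (3 + 1) / total          # P(B) = 0.4
p_a_and_b = 1 / total          # P(A ∩ B) = 0.1
print(p_a_and_b / p_b)         # P(A|B) = 0.25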
PROCEDURE / PROGRAMME
import pandas as pd
import numpy as np
df = pd.read_csv('pd.csv')
df.head(10)
print(len(df))
print(df.to_string())
df['Arecreation'] = np.where(df['recreation']=='golf', 1, 0)
df['Arisk'] = np.where(df['risk']=='medRisk', 1, 0)
df['count'] = 1
df = df[['Arecreation','Arisk','count']]
df.head()
print(df.to_string())
table = pd.pivot_table(
    df,
    values='count',
    index=['Arecreation'],
    columns=['Arisk'],
    aggfunc=np.size,
    fill_value=0
)
print(table)
# counts from the 2x2 table: a0=(rec 0, risk 0), a1=(rec 0, risk 1), a2=(rec 1, risk 0), a3=(rec 1, risk 1)
a0 = table.at[0, 0]
a1 = table.at[0, 1]
a2 = table.at[1, 0]
a3 = table.at[1, 1]
# probabilities from the counts
pa = (a1 + a3) / (a0 + a1 + a2 + a3)
pb = (a2 + a3) / (a0 + a1 + a2 + a3)
p_a_and_b = a3 / (a0 + a1 + a2 + a3)
# conditional probability P(A|B) = P(A and B) / P(B)
p_a_gives_b = p_a_and_b / pb
print(p_a_gives_b)
Arecreation Arisk count
0 0 0 1
1 1 0 1
2 0 1 1
3 0 0 1
4 0 0 1
5 0 1 1
6 1 1 1
7 1 0 1
8 0 0 1
9 1 0 1
Arisk        0  1
Arecreation
0            4  2
1            3  1
0.25
P(A|B) = 25%
EXPERIMENT NO:6
Implement linear regression using Python.
Aim:
To implement linear regression using Python.
Theory:
In order to provide a basic understanding of linear regression, we start with its most basic
version, i.e. simple linear regression. Here, a line
    h(x_i) = b_0 + b_1 * x_i
is fitted to the observations (x_i, y_i), and our task is to find the values of b_0 and b_1
for which the cost
    J(b_0, b_1) = (1/2n) * Σ (y_i − h(x_i))²
is minimum. Without going into the mathematical details, we present the result here:
    b_1 = SS_xy / SS_xx and b_0 = m_y − b_1 * m_x,
where SS_xy = Σ (x_i − m_x)(y_i − m_y), SS_xx = Σ (x_i − m_x)², and m_x, m_y are the means
of x and y.
PROCEDURE / PROGRAMME
import numpy as np
import matplotlib.pyplot as plt
def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    m_x = np.mean(x)
    m_y = np.mean(y)
    # cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plot the actual points and the fitted regression line
    plt.scatter(x, y, color="m", marker="o", s=30)
    y_pred = b[0] + b[1]*x
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \nb_1 = {}".format(b[0], b[1]))
    plot_regression_line(x, y, b)
main()
Output
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
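As a quick cross-check (not part of the prescribed programme), the same coefficients can be
obtained with NumPy's built-in least-squares fit, assuming the same x and y arrays as above:
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
# np.polyfit returns the fitted line's coefficients, highest degree first
b_1, b_0 = np.polyfit(x, y, 1)
print(b_0, b_1)   # matches the b_0 and b_1 shown above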