
MACHINE

LEARNING LAB
Dharanya S
◦ Types of Naive Bayes Algorithm
◦ Gaussian Naive Bayes: When attribute values are continuous, we assume that the values associated with each class follow a Gaussian (normal) distribution. If an attribute x in our data contains continuous values, we first segment the data by class and then compute the mean and variance of x for each class.
◦ Multinomial Naive Bayes: Multinomial Naive Bayes is preferred for data that is multinomially distributed. It is one of the standard classic algorithms and is widely used in text categorization (classification), where each event represents the occurrence of a word in a document.
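Neither snippet below appears in the slides; this is a minimal scikit-learn sketch, with toy data invented purely for illustration, of how the two variants map onto continuous versus count-valued features.

# Hedged sketch: Gaussian vs. Multinomial Naive Bayes (toy data).
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Continuous features (e.g. measurements) -> Gaussian Naive Bayes
X_cont = [[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]]
y = [0, 0, 1, 1]
gnb = GaussianNB().fit(X_cont, y)
print(gnb.predict([[1.1, 2.0]]))   # -> [0]

# Count features (e.g. word occurrences) -> Multinomial Naive Bayes
X_counts = [[2, 0, 1], [1, 0, 2], [0, 3, 0], [0, 2, 1]]
mnb = MultinomialNB().fit(X_counts, y)
print(mnb.predict([[0, 2, 0]]))    # -> [1]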
Machine Learning
How does ML work?
Dataset
◦ A dataset is a collection of data arranged in some order. A dataset can contain anything from a simple array to a database table.
◦ A tabular dataset can be understood as a database table or matrix, where each column corresponds to a particular variable and each row corresponds to a record. The most widely supported file type for a tabular dataset is the Comma-Separated Values (CSV) file.
Country   Age   Salary   Purchased
India     38    48000    No
France    43    45000    Yes
Germany   30    54000    No
France    48    65000    No
Germany   40    —        Yes
Steps involved in ML projects
◦ Import Data (CSV files)
◦ Clean Data
◦ Removing duplicate, irrelevant, and incomplete data.
◦ Split the data into training/test sets
◦ For example, if you have 100 pictures of cats and dogs, we split 80 pictures for training and 20 for testing (see the sketch after this list).
◦ Create Model (Algorithm)
◦ Train Model
◦ Make predictions
◦ Evaluate and improve
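As a sketch of the 80/20 split mentioned above (the lists here are placeholders for the 100 pictures and their labels, invented for illustration):

from sklearn.model_selection import train_test_split

X = list(range(100))            # placeholder for 100 images
y = [0] * 50 + [1] * 50         # placeholder labels: 0 = cat, 1 = dog

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))   # 80 20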
Python
◦ Python is a high-level, interpreted, interactive and object-oriented scripting language.
◦ Environment
◦ Jupyter, via the Anaconda platform
◦ Colab:
◦ https://colab.research.google.com/?utm_source=scs-index
◦ What is Colab?
◦ allows you to write and execute Python in your browser, with
◦ Zero configuration required
◦ Access to GPUs free of charge
◦ Easy sharing
◦ Dataset:
◦ https://github.com/
Python Libraries
◦ NumPy – Numerical Python
◦ NumPy is a Python library used for working with arrays. It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
import numpy
arr = numpy.array([1, 2, 3, 4, 5])
print(arr)
◦ Pandas
◦ Used for data analysis and manipulation
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
◦ MatPlotLib
◦ Used for creating graphs and plots
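Following the pattern of the NumPy and Pandas snippets, a minimal example (the plotted numbers are invented):

import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [1, 4, 9, 16])   # simple line plot
plt.xlabel('x')
plt.ylabel('y')
plt.show()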
◦ Scikit-Learn
◦ Provides common algorithms such as decision trees and neural networks.
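A minimal sketch (toy data invented for illustration) of fitting one such algorithm, a decision tree:

from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [1, 1]]                 # toy training data
clf = DecisionTreeClassifier().fit(X, [0, 1])
print(clf.predict([[2, 2]]))         # -> [1]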
Thank You for
listening
Coding….
◦ Write a Python program to print your name!
Answer
# ANSI escape codes for coloured terminal output
class color:
    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'

print(color.YELLOW + 'Dharanya' + color.END)
Find-S Algorithm
◦ General Hypothesis
◦ A hypothesis, in general, is an explanation for something. The general hypothesis states the general relationship between the major variables. For example, a general hypothesis for ordering food would be: I want a burger.
◦ G = { ‘?’, ‘?’, ‘?’, ….., ‘?’ }
◦ Specific Hypothesis
◦ The specific hypothesis fills in all the important details about the variables given in the general hypothesis. A more specific version of the example above would be: I want a cheeseburger with a chicken pepperoni filling and a lot of lettuce.
◦ S = { ‘Φ’, ‘Φ’, ‘Φ’, ……, ‘Φ’ }
Find S Algorithm
◦ The Find-S algorithm follows the steps written below:
1. Initialize ‘h’ to the most specific hypothesis.
2. The Find-S algorithm only considers the positive examples and eliminates negative examples.
3. For each positive example, the algorithm checks for each attribute in the example.
4. If the attribute value is the same as the hypothesis value, the algorithm moves on without any changes.
5. But if the attribute value is different from the hypothesis value, the algorithm changes it to ‘?’.
Dataset

Time      Weather   Temperature   Company   Humidity   Wind     Goes
Morning   Sunny     Warm          Yes       Mild       Strong   Yes
Evening   Rainy     Cold          No        Mild       Normal   No
Morning   Sunny     Moderate      Yes       Normal     Normal   Yes
Evening   Sunny     Cold          Yes       High       Strong   Yes

import pandas as pd
import numpy as np

data = pd.read_csv('/content/StudentsPerformance.csv')
concepts = np.array(data)[:, :-1]   # attribute columns
target = np.array(data)[:, -1]      # class-label column

def train(con, tar):
    # Initialise the specific hypothesis from the first positive example.
    for i, val in enumerate(tar):
        if val == 'yes':
            specific_h = con[i].copy()
            break
    # Generalise: replace any attribute that disagrees with a
    # positive example by '?'.
    for i, val in enumerate(con):
        if tar[i] == 'yes':
            for x in range(len(specific_h)):
                if val[x] != specific_h[x]:
                    specific_h[x] = '?'
    return specific_h

print(train(concepts, target))
Supervised Learning
◦ Supervised learning, as the name indicates, involves a supervisor acting as a teacher. Basically, supervised learning is when we teach or train the machine using data that is well labelled, meaning some data is already tagged with the correct answer. After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (the set of training examples) and produces a correct outcome from the labelled data.
Candidate Elimination Algorithm
◦ Why?
◦ The Candidate Elimination learning algorithm addresses several of the limitations of Find-S.
◦ Although the Find-S algorithm outputs a hypothesis from H that is consistent with the training examples, it is just one of many hypotheses from H that might fit the training data equally well.
◦ What?
◦ The key idea in the Candidate Elimination algorithm is to output a description of the set of all hypotheses consistent with the training examples.
◦ At the end of the algorithm, we get both specific and general hypotheses as our final solution.
◦ For a positive example, we move from the most specific hypothesis to the most general hypothesis. 
◦ For a negative example, we move from the most general hypothesis to the most specific hypothesis. 
Continue..
◦ Unlike the Find-S algorithm, the Candidate Elimination algorithm considers not just positive but also negative examples. It relies on the concept of a version space.
◦ What is the Version Space?
◦ It sits between the most general and the most specific hypotheses: rather than committing to a single hypothesis, it describes the list of all feasible hypotheses consistent with the training data.
◦ With regard to hypothesis space H and training examples D, the version space, denoted VS(H,D), is the subset of hypotheses from H that are consistent with the training instances in D.
Algorithm
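The boundary updates can be sketched in Python. This is a hedged, simplified version for conjunctive hypotheses (members of G are specialized one attribute at a time, using the values recorded in S); the training examples at the end are assumed EnjoySport-style data, chosen so the result matches the final hypotheses in the worked example below.

# Simplified Candidate Elimination sketch. '?' matches anything;
# '0' (phi) matches nothing.

def consistent(h, x):
    # True if hypothesis h covers example x.
    return all(hv in ('?', xv) for hv, xv in zip(h, x))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = ['0'] * n                  # most specific boundary
    G = [['?'] * n]                # most general boundary
    for x, label in examples:
        if label == 'yes':         # positive example: generalize S
            for i in range(n):
                if S[i] == '0':
                    S[i] = x[i]
                elif S[i] != x[i]:
                    S[i] = '?'
            G = [g for g in G if consistent(g, x)]
        else:                      # negative example: specialize G
            new_G = []
            for g in G:
                if not consistent(g, x):
                    new_G.append(g)
                    continue
                for i in range(n):
                    # Replace a '?' by the value S commits to, provided
                    # that value excludes the negative example.
                    if g[i] == '?' and S[i] not in ('?', '0', x[i]):
                        h = g.copy()
                        h[i] = S[i]
                        new_G.append(h)
            G = new_G
    return S, G

data = [(['Sunny', 'warm', 'normal', 'strong', 'warm', 'same'], 'yes'),
        (['Sunny', 'warm', 'high', 'strong', 'warm', 'same'], 'yes'),
        (['Rainy', 'cold', 'high', 'strong', 'warm', 'change'], 'no'),
        (['Sunny', 'warm', 'high', 'strong', 'cool', 'change'], 'yes')]
S, G = candidate_elimination(data)
print(S)   # ['Sunny', 'warm', '?', 'strong', '?', '?']
print(G)   # [['Sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]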
Example
◦ Step 1:
◦ G0 = <?, ?, ?, ?, ?, ?>
◦ S0 = <ϕ, ϕ, ϕ, ϕ, ϕ, ϕ>

◦ Step 2:
◦ S1 = <‘Sunny’, ‘warm’, ‘normal’, ‘strong’, ‘warm’, ‘same’>
◦ G1 = G0 = <?, ?, ?, ?, ?, ?>

◦ Step 3:
◦ S2 = <‘Sunny’, ‘warm’, ‘?’, ‘strong’, ‘warm’, ‘same’>
◦ G2 = G1 = <?, ?, ?, ?, ?, ?>

◦ Step 4:
◦ G3 = < <‘Sunny’, ?, ?, ?, ?, ?>, <?, ‘warm’, ?, ?, ?, ?>, <?, ?, ‘Normal’, ?, ?, ?>,
         <?, ?, ?, ?, ‘Warm’, ?>, <?, ?, ?, ?, ?, ‘same’> >
◦ S3 = S2 = <‘Sunny’, ‘warm’, ‘?’, ‘strong’, ‘warm’, ‘same’>

◦ Step 5: Final Hypothesis
◦ G4 = < <‘Sunny’, ?, ?, ?, ?, ?>, <?, ‘warm’, ?, ?, ?, ?> >
◦ S4 = <‘Sunny’, ‘warm’, ?, ‘strong’, ?, ?>
Iterative Dichotomiser 3: ID3
◦ ID3 stands for Iterative Dichotomiser 3 and is named such because the algorithm iteratively (repeatedly) dichotomizes (divides) features into two or more groups at each step.
◦ Invented by Ross Quinlan, ID3 uses a top-down greedy approach to build a decision tree. In simple words, the top-down approach means that we start building the tree from the top, and the greedy approach means that at each iteration we select the best feature at the present moment to create a node.
◦ It is used to create the smallest possible decision tree.
Entropy
◦ Entropy measures the impurity of a collection of examples:

Entropy(S) = −p₊ log₂(p₊) − p₋ log₂(p₋)

where p₊ is the proportion of positive examples in S and p₋ is the proportion of negative examples in S.

◦ For the set X = {a, a, a, b, b, b, b, b}:
◦ Total instances: 8
◦ Instances of b: 5
◦ Instances of a: 3

Entropy(X) = −[(3/8) log₂(3/8) + (5/8) log₂(5/8)]
           = −[0.375 × (−1.415) + 0.625 × (−0.678)]
           = −(−0.531 − 0.424) = 0.954
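A quick numerical check of the worked example (assuming base-2 logarithms):

from math import log2

def entropy(counts):
    # Entropy of a collection given its class counts.
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([3, 5]))   # ~0.954 for X = {a,a,a,b,b,b,b,b}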
Information Gain
◦ Information gain is the expected reduction in entropy caused by partitioning the examples according to an attribute.
◦ The information gain Gain(S, A) of an attribute A, relative to a collection of examples S, is defined as

Gain(S, A) = Entropy(S) − Σv∈Values(A) (|Sv| / |S|) × Entropy(Sv)

where Values(A) is the set of possible values of A and Sv is the subset of S for which attribute A has value v.
Steps
1. Calculate the information gain of each feature.
2. If not all rows belong to the same class, split the dataset S into subsets using the feature for which the information gain is maximum.
3. Make a decision tree node using the feature with the maximum information gain.
4. If all rows belong to the same class, make the current node a leaf node with the class as its label.
5. Repeat for the remaining features until we run out of features, or the decision tree consists entirely of leaf nodes.
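A hedged sketch of steps 1 and 2, computing Gain(S, A) for a single candidate feature over a toy labelled dataset (the data is invented for illustration):

from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    base = entropy(labels)
    # Partition the labels by the feature's value, then subtract the
    # weighted average of the subset entropies.
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s)
                    for s in subsets.values())
    return base - remainder

rows = [['Sunny'], ['Sunny'], ['Rainy'], ['Rainy']]
labels = ['yes', 'yes', 'no', 'yes']
print(information_gain(rows, labels, 0))   # ~0.311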
Advantage/Disadvantage
Advantage:
◦ Understandable prediction rules are created from the training data.
◦ Builds the fastest tree.
◦ Builds a short tree.
◦ Only needs to test enough attributes until all data is classified.
◦ Finding leaf nodes enables test data to be pruned, reducing the number of tests.
Disadvantage:
◦ Data may be over-fitted or over-classified if a small sample is tested.
◦ Only one attribute at a time is tested for making a decision.
◦ Classifying continuous data may be computationally expensive, as many trees must be generated to see where to break the continuum.
Iterative Dichotomiser 3: ID3
◦ A decision tree is a structure that contains nodes and edges and is built from a dataset. The initial node is called the root node, the final nodes are called leaf nodes, and the rest are called intermediate or internal nodes. The root and intermediate nodes represent decisions, while the leaf nodes represent outcomes.
◦ Example:
◦ If a person is less than 30 years of age and doesn't eat junk food, then he is Fit;
◦ if a person is less than 30 years of age and eats junk food, then he is Unfit; and so on.
Happy Weekend…
Naïve Bayes Algorithm
◦ Naive Bayes uses Bayes' theorem to predict the probability of different classes based on various attributes. This algorithm is mostly used in text classification and in problems with multiple classes.
◦ The dataset is divided into two parts: the feature matrix and the response vector.
• The feature matrix contains all the vectors (rows) of the dataset, in which each vector consists of the values of the dependent features. In the Play Tennis weather dataset, the features are ‘Outlook’, ‘Temperature’, ‘Humidity’ and ‘Windy’.
• The response vector contains the value of the class variable (prediction or output) for each row of the feature matrix. In that dataset, the class variable name is ‘Play Tennis’.
Bayes’ Theorem
◦ Bayes’ theorem finds the probability of an event occurring given the probability of another event that has already occurred. Bayes’ theorem is stated mathematically as the following equation:

P(A|B) = P(B|A) × P(A) / P(B)

• Basically, we are trying to find the probability of event A, given that event B is true. Event B is also termed the evidence.
• P(A) is the prior probability of A (the probability of the event before the evidence is seen). The evidence is an attribute value of an unknown instance (here, event B).
• P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.
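Plugging invented numbers into the theorem makes the relationship concrete:

# Hypothetical probabilities, invented for illustration.
p_a = 0.3            # prior P(A)
p_b_given_a = 0.8    # likelihood P(B|A)
p_b = 0.5            # evidence P(B)

p_a_given_b = p_b_given_a * p_a / p_b   # Bayes' theorem
print(p_a_given_b)   # 0.48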
Text Classification
◦ Naive Bayes classifiers have been heavily used for text classification and text analysis machine learning problems.
◦ Text analysis is a major application field for machine learning algorithms. However, the raw data, a sequence of symbols (i.e. strings), cannot be fed directly to the algorithms themselves, as most of them expect numerical feature vectors of a fixed size rather than raw text documents of variable length.
◦ In order to address this, scikit-learn provides utilities for the most common ways to extract numerical features from
text content, namely:
• tokenizing strings and giving an integer id for each possible token, for instance by using white-spaces and
punctuation as token separators.
• counting the occurrences of tokens in each document.
◦ In this scheme, features and samples are defined as follows:
• each individual token occurrence frequency is treated as a feature.
• the vector of all the token frequencies for a given document is considered a multivariate sample.
Count Vectorizer
document = ["One Geek helps Two Geeks",
            "Two Geeks help Four Geeks",
            "Each Geek helps many other Geeks at GeeksforGeeks."]
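A minimal sketch of how scikit-learn's CountVectorizer turns these documents into token-count vectors (the exact columns depend on the learned vocabulary):

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(document)      # learn vocabulary, count tokens
print(vectorizer.get_feature_names_out())   # tokens found in the corpus
print(X.toarray())                          # one row of counts per document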
◦ A confusion matrix is an N × N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model.
◦ Precision is one indicator of a machine learning model's performance: the quality of a positive prediction made by the model. Precision is the number of true positives divided by the total number of positive predictions (i.e., the number of true positives plus the number of false positives).
◦ Recall is calculated as the ratio of positive samples correctly classified as positive to the total number of positive samples. Recall measures the model's ability to detect positive samples: the higher the recall, the more positive samples are detected.
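A hedged sketch tying the three ideas together with scikit-learn's metrics (labels invented for illustration):

from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels (invented)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (invented)

print(confusion_matrix(y_true, y_pred))   # [[3 1]
                                          #  [1 3]]
# precision = TP / (TP + FP), recall = TP / (TP + FN)
print(precision_score(y_true, y_pred))    # 0.75
print(recall_score(y_true, y_pred))       # 0.75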
