Machine Learning Lab
Dharanya S
◦ Types of Naive Bayes Algorithm
◦ Gaussian Naive Bayes: when attribute values are continuous, the values associated with each class are assumed to follow a Gaussian, i.e. normal, distribution. If an attribute, say "x", contains continuous data, we first segment the data by class and then compute the mean and variance of "x" for each class.
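The per-class mean/variance step can be sketched in a few lines. The dataset below is invented for illustration; a library implementation such as scikit-learn's `GaussianNB` would normally do this for you:

```python
import math

# Toy continuous attribute "x", already segmented by class (values invented)
data = {
    "yes": [25.0, 27.5, 30.0, 26.5],
    "no":  [15.0, 18.0, 16.5, 17.0],
}

# Step 1: compute the mean and variance of "x" for each class
params = {}
for label, xs in data.items():
    mean = sum(xs) / len(xs)
    var = sum((v - mean) ** 2 for v in xs) / (len(xs) - 1)  # sample variance
    params[label] = (mean, var)

# Step 2: likelihood P(x | class) from the normal density
def gaussian_likelihood(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

for label, (mean, var) in params.items():
    print(label, round(gaussian_likelihood(26.0, mean, var), 4))
```

The class whose Gaussian assigns the new value the highest likelihood (weighted by the class prior) wins.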
◦ Multinomial Naive Bayes: Multinomial Naive Bayes is preferred for data that is multinomially distributed. It is one of the standard classic algorithms used in text categorization (classification), where each event represents the occurrence of a word in a document.
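A minimal sketch of the multinomial idea, where each word occurrence is one event. The tiny spam/ham corpus and the equal-prior assumption are invented for illustration; scikit-learn's `MultinomialNB` is the production route:

```python
from collections import Counter

# Tiny toy corpus (invented): each document is labelled spam or ham
train = [("buy cheap pills now", "spam"),
         ("cheap pills cheap", "spam"),
         ("meeting agenda for monday", "ham"),
         ("monday project meeting", "ham")]

# Count word occurrences per class -- each occurrence is one "event"
word_counts = {"spam": Counter(), "ham": Counter()}
for text, label in train:
    word_counts[label].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def word_prob(word, label, alpha=1.0):
    """P(word | class) with Laplace smoothing."""
    counts = word_counts[label]
    return (counts[word] + alpha) / (sum(counts.values()) + alpha * len(vocab))

def score(text, label):
    """Unnormalized class score: product of per-word probabilities (equal priors assumed)."""
    p = 1.0
    for w in text.split():
        p *= word_prob(w, label)
    return p

doc = "cheap pills"
print("spam" if score(doc, "spam") > score(doc, "ham") else "ham")  # spam
```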
Machine Learning
How does ML work?
Dataset
◦ A dataset is a collection of data in which the data is arranged in some order. A dataset can contain anything from a series of arrays to a database table.
◦ A tabular dataset can be understood as a database table or matrix, where each column corresponds to a particular variable and each row corresponds to a record of the dataset. The most widely supported file type for a tabular dataset is the Comma-Separated Values (CSV) file.
Country   Age   Salary   Purchased
India     38    48000    No
France    43    45000    Yes
Germany   30    54000    No
France    48    65000    No
Germany   40             Yes
Steps involved in ML projects
◦ Import Data (CSV Files)
◦ Clean Data
◦ Removing duplicates, irrelevant and incomplete data.
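The import-and-clean steps can be sketched in pure Python; in practice you would load the CSV with `pandas.read_csv` and use `drop_duplicates()` / `dropna()`. The rows below are invented to show one duplicate and one incomplete record:

```python
# Minimal cleaning sketch (in practice: pandas.read_csv, drop_duplicates, dropna)
rows = [
    {"Country": "India",   "Age": 38, "Salary": 48000, "Purchased": "No"},
    {"Country": "France",  "Age": 43, "Salary": 45000, "Purchased": "Yes"},
    {"Country": "France",  "Age": 43, "Salary": 45000, "Purchased": "Yes"},  # duplicate
    {"Country": "Germany", "Age": 40, "Salary": None,  "Purchased": "Yes"},  # incomplete
]

seen, clean = set(), []
for row in rows:
    key = tuple(row.items())
    if key in seen:            # drop duplicate rows
        continue
    seen.add(key)
    if None in row.values():   # drop incomplete rows (missing Salary)
        continue
    clean.append(row)

print(len(clean))  # 2 rows survive
```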
◦ Step 2:
◦ S1 = <'Sunny', 'warm', 'normal', 'strong', 'warm', 'same'>
◦ G0 = G1 = <?, ?, ?, ?, ?, ?>
Entropy(S) = -p+ log2(p+) - p- log2(p-)
Where, p+ is the proportion of positive examples in S
p- is the proportion of negative examples in S.
◦ Example: 3 of the 8 instances are positive, so p+ = 0.375 and p- = 0.625:
Entropy(S) = -[0.375 * log2(0.375) + 0.625 * log2(0.625)]
= -[0.375 * (-1.415) + 0.625 * (-0.678)]
= -(-0.530 - 0.424) = 0.954
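The entropy calculation above can be reproduced directly; the `entropy` helper below is a sketch of the two-class formula, with 0 log 0 treated as 0:

```python
import math

def entropy(p_pos, p_neg):
    """Entropy(S) = -p+ log2(p+) - p- log2(p-), with 0 log 0 taken as 0."""
    total = 0.0
    for p in (p_pos, p_neg):
        if p > 0:
            total -= p * math.log2(p)
    return total

# The slide's example: 3 of 8 instances in one class
print(round(entropy(3 / 8, 5 / 8), 3))  # 0.954
```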
Information Gain
◦ Information gain is the expected reduction in entropy caused by partitioning the examples according to a given attribute.
◦ The information gain, Gain(S, A), of an attribute A relative to a collection of examples S, is defined as
Gain(S, A) = Entropy(S) - Σ v ∈ Values(A) (|Sv| / |S|) * Entropy(Sv)
where Values(A) is the set of possible values of attribute A, and Sv is the subset of S for which attribute A has value v.
Steps
1. Calculate the Information Gain of each feature.
2. Considering that all rows do not belong to the same class, split the dataset S into subsets using the feature for which the Information Gain is maximum.
3. Make a decision tree node using the feature with the maximum Information Gain.
4. If all rows belong to the same class, make the current node a leaf node with the class as its label.
5. Repeat for the remaining features until we run out of features or the decision tree consists only of leaf nodes.
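Step 1 above can be sketched as a small information-gain function. The toy rows and the "Windy" feature are invented for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Gain(S, A) = Entropy(S) - sum over values v of |Sv|/|S| * Entropy(Sv)."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Toy data (invented): does "Windy" help predict the class?
rows = [{"Windy": "yes"}, {"Windy": "yes"}, {"Windy": "no"}, {"Windy": "no"}]
labels = ["No", "No", "Yes", "Yes"]
print(information_gain(rows, labels, "Windy"))  # 1.0: the split separates the classes perfectly
```

ID3 would pick the feature with the largest such gain at each node.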
Advantage/Disadvantage
Advantage :
◦ Understandable prediction rules are created from the training data.
◦ Builds the fastest tree.
◦ Build a short tree.
◦ Only need to test enough attributes until all data is classified.
◦ Finding leaf nodes enables test data to be pruned, reducing the number of tests.
Disadvantage:
◦ Data may be over-fitted or over classified if a small sample is tested.
◦ Only one attribute at a time is tested for making a decision.
◦ Classifying continuous data may be computationally expensive, as many trees must be generated to see
where to break the continuum.
Iterative Dichotomiser 3: ID3
◦ A decision tree is a structure that contains nodes and
edges and is built from a dataset.
The initial node is called the root node, the final nodes are called the leaf nodes, and the rest of
the nodes are called intermediate or internal nodes.
The root and intermediate nodes represent the decisions while the leaf nodes represent the
outcomes.
◦ Example:
◦ if a person is less than 30 years of age and doesn’t eat junk food then he is Fit,
◦ if a person is less than 30 years of age and eats junk food then he is Unfit and so on.
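The example rules above can be written directly as a decision function. The branches for age 30 and over are not given on the slide ("and so on"), so they are left undecided here:

```python
def fitness(age, eats_junk_food):
    """Root node tests age; the next node tests junk food; leaves are the outcomes."""
    if age < 30:
        return "Unfit" if eats_junk_food else "Fit"
    # Branches for age >= 30 are not specified on the slide
    return "Unknown"

print(fitness(25, False))  # Fit
print(fitness(25, True))   # Unfit
```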
Naïve Bayes Algorithm
◦ Naive Bayes uses Bayes' theorem to predict the probability of different classes based on various attributes.
This algorithm is mostly used in text classification and in problems having multiple classes.
◦ The dataset is divided into two parts, namely, feature matrix and the response vector.
• The feature matrix contains all the vectors (rows) of the dataset, in which each vector consists of the values of the independent features. In the above dataset, the features are 'Outlook', 'Temperature', 'Humidity' and 'Windy'.
• The response vector contains the value of the class variable (prediction or output) for each row of the feature matrix. In the above dataset, the class variable name is 'Play Tennis'.
Bayes’ Theorem
◦ Bayes' Theorem finds the probability of an event occurring given the probability of another event that has already occurred.
Bayes' theorem is stated mathematically as the following equation:
P(A|B) = P(B|A) * P(A) / P(B)
• Basically, we are trying to find the probability of event A, given that event B is true. Event B is also termed the evidence.
• P(A) is the prior probability of A, i.e. the probability of the event before the evidence is seen. The evidence is an attribute
value of an unknown instance (here, event B).
• P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.
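A quick numeric illustration of the theorem; the probabilities below are invented for the example:

```python
# A worked numeric example of Bayes' theorem (numbers invented for illustration)
p_a = 0.3          # P(A): prior probability of the event
p_b_given_a = 0.8  # P(B|A): probability of the evidence given the event
p_b = 0.4          # P(B): overall probability of the evidence

# P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # 0.6
```

Seeing the evidence B raised the probability of A from the prior 0.3 to the posterior 0.6.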
Text Classification
◦ Naive Bayes classifiers have been heavily used for text classification and text analysis machine learning problems.
◦ Text analysis is a major application field for machine learning algorithms. However, the raw data, a sequence of
symbols (i.e. strings), cannot be fed directly to the algorithms themselves, as most of them expect numerical feature
vectors of a fixed size rather than raw text documents of variable length.
◦ In order to address this, scikit-learn provides utilities for the most common ways to extract numerical features from
text content, namely:
• tokenizing strings and giving an integer id to each possible token, for instance by using whitespace and
punctuation as token separators.
• counting the occurrences of tokens in each document.
◦ In this scheme, features and samples are defined as follows:
• each individual token occurrence frequency is treated as a feature.
• the vector of all the token frequencies for a given document is considered a multivariate sample.
Count Vectorizer
◦ document = ["One Geek helps Two Geeks",
              "Two Geeks help Four Geeks",
              "Each Geek helps many other Geeks at GeeksforGeeks."]
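Running scikit-learn's `CountVectorizer` on the document list above performs exactly the tokenize-and-count scheme described earlier (this assumes scikit-learn is installed):

```python
from sklearn.feature_extraction.text import CountVectorizer

document = ["One Geek helps Two Geeks",
            "Two Geeks help Four Geeks",
            "Each Geek helps many other Geeks at GeeksforGeeks."]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(document)  # 3 documents x vocabulary-size count matrix

print(sorted(vectorizer.vocabulary_))  # learned vocabulary (lowercased tokens)
print(matrix.toarray())                # token counts per document
```

Each row of the matrix is the multivariate sample for one document; each column is one token feature.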
◦ A Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where
N is the number of target classes. The matrix compares the actual target values with those predicted by the
machine learning model.
◦ Precision is one indicator of a machine learning model's performance – the quality of a positive prediction
made by the model. Precision refers to the number of true positives divided by the total number of positive
predictions (i.e., the number of true positives plus the number of false positives).
◦ Recall is calculated as the ratio of the number of positive samples correctly classified as
positive to the total number of positive samples. Recall measures the model's ability to detect positive
samples: the higher the recall, the more positive samples are detected.
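Precision and recall follow directly from the confusion-matrix counts; the counts below are invented for illustration:

```python
# Precision and recall from confusion-matrix counts (values invented)
tp, fp, fn, tn = 40, 10, 5, 45  # true/false positives, false/true negatives

precision = tp / (tp + fp)  # quality of the positive predictions
recall = tp / (tp + fn)     # ability to detect positive samples

print(precision)         # 0.8
print(round(recall, 3))  # 0.889
```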