Unit III Da Notes

Regression: Types
Types of Regression
 Linear Regression.
 Logistic Regression.
 Polynomial Regression.
 Support Vector Regression.
 Decision Tree Regression.
 Random Forest Regression.
 Ridge Regression.
 Lasso Regression:
Linear Regression
Linear regression is a type of supervised machine learning algorithm that computes the linear
relationship between the dependent variable and one or more independent features by fitting a
linear equation to observed data.
Simple Linear Regression
This is the simplest form of linear regression, and it involves only one independent variable and
one dependent variable. The equation for simple linear regression is:
y=β0+β1X
where:
 Y is the dependent variable
 X is the independent variable
 β0 is the intercept
 β1 is the slope
Multiple Linear Regression
This involves more than one independent variable and one dependent variable. The equation for
multiple linear regression is:
y=β0+β1X+β2X+………βnX
where:
 Y is the dependent variable
 X1, X2, …, Xp are the independent variables
 β0 is the intercept
 β1, β2, …, βn are the slopes

Positive Linear Relationship:
If the dependent variable increases on the Y-axis and independent variable increases on X-axis,
then such a relationship is termed as a Positive linear relationship.
Negative Linear Relationship:
If the dependent variable decreases on the Y-axis and independent variable increases on the X-
axis, then such a relationship is called a negative linear relationship.
Cost function for Linear Regression
The cost function or the loss function is nothing but the error or difference between the predicted
value Y^ Y^ and the true value Y.
In Linear Regression, the Mean Squared Error (MSE) cost function is employed, which
calculates the average of the squared errors between the predicted values yîyî and the actual
values yiyi. The purpose is to determine the optimal values for the intercept θ1θ1 and the
coefficient of the input feature θ2θ2 providing the best-fit line for the given data points. The
linear equation expressing this relationship is yî=θ1+θ2xiyî=θ1+θ2xi.
MSE function can be calculated as:
Cost function(J)=1/n∑in (yi^ − yi )2
Example Program LINEAR REGRESSION
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
data_set= pd.read_csv('Salary_Data.csv')
x= data_set.iloc[:, :-1].values
y= data_set.iloc[:, 1].values
# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state=0)
#Fitting the Simple Linear Regression model to the training dataset
from sklearn.linear_model import LinearRegression
regressor= LinearRegression()
regressor.fit(x_train, y_train)
LinearRegression().
#Prediction of Test and Training set result
y_pred= regressor.predict(x_test)
x_pred= regressor.predict(x_train)
mtp.scatter(x_train, y_train, color="green")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Training Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()
#visualizing the Test set results
mtp.scatter(x_test, y_test, color="blue")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Test Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()
Polynomial Regression
Polynomial Regression is a regression algorithm that models the relationship between a dependent(y) and
independent variable(x) as nth degree polynomial. The Polynomial Regression equation is given below:
y= b0+b1x1+ b2x12+ b2x13+...... bnx1n
 y is the dependent variable.
 x is the independent variable.
 n is the degree of the polynomial.
 It is also called the special case of Multiple Linear Regression in ML. Because we add
some polynomial terms to the Multiple Linear regression equation to convert it into
Polynomial Regression.
 It is a linear model with some modification in order to increase the accuracy.
 The dataset used in Polynomial regression for training is of non-linear nature.
Loss and Cost Function – Polynomial Regression
The Mean Squared Error may also be used as the Cost Function of Polynomial regression; however, the
equation will vary somewhat.
Types of Polynomial Regression
A quadratic equation is a general term for a second-degree polynomial equation. This degree, on the other
hand, can go up to nth values. Here is the categorization of Polynomial Regression:
1. Linear – if degree as 1
2. Quadratic – if degree as 2
3. Cubic – if degree as 3 and goes on, on the basis of degree.
Example Program for Polynomial Regression
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
datas = pd.read_csv('poly.csv')
datas
X = datas.iloc[:, 1:2].values
y = datas.iloc[:, 2].values
# Features and the target variables
X = datas.iloc[:, 1:2].values
y = datas.iloc[:, 2].values
# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
lin = LinearRegression()
lin.fit(X, y)
LinearRegression()
# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform(X)
poly.fit(X_poly, y)
lin2 = LinearRegression()
lin2.fit(X_poly, y)
# Visualising the Polynomial Regression results
plt.scatter(X, y, color='blue')
plt.plot(X, lin2.predict(poly.fit_transform(X)),
color='red')
plt.title('Polynomial Regression')
plt.xlabel('Temperature')
plt.ylabel('Pressure')
plt.show()
# Predicting a new result with Linear Regression
# after converting predict variable to 2D array
pred = 110.0
predarray = np.array([[pred]])
lin.predict(predarray)
array([0.20675333])
# Predicting a new result with Polynomial Regression
# after converting predict variable to 2D array
pred2 = 110.0
pred2array = np.array([[pred2]])
lin2.predict(poly.fit_transform(pred2array))
array([0.43295877])
Multivariate Analysis
Multivariate Analysis
Multivariate analysis is a set of techniques used for analysis of data sets that contain more than
one variable, and the techniques are especially valuable when working with correlated variables.
The techniques provide an empirical method for information extraction, regression, or
classification; some of these techniques have been developed quite recently because they require
the computational capacity of modern computers.
Multivariate analysis offers a more complete examination of data by looking at all possible
independent variables and their relationships to one another
Key points in Multivariate analysis:
1. Analysis Techniques:The ways to perform analysis on this data depends on the goals to
be achieved. Some of the techniques are regression analysis, principal component
analysis, path analysis, factor analysis and multivariate analysis of variance (MANOVA).
2. Goals of Analysis: The choice of analysis technique depends on the specific goals of the
study. For example, researchers may be interested in predicting one variable based on
others, identifying underlying factors that explain patterns, or comparing group means
across multiple variables.
3. Interpretation: Multivariate analysis allows for a more nuanced interpretation of

complex relationships within the data. It helps uncover patterns that may not be apparent
when examining variables individually.
Types of Multivariate Analysis Techniques
Multivariate methods can be subdivided according to different aspects.
Factor Analysis
Factor analysis is a technique to reduce many variables into fewer numbers of factors. Factorial
studies focus on different variables and are subdivided into principal component analysis and
correspondence analysis.
Example –
Company ABC is planning to understand the factors contributing to user engagement. It collects
data from a user population and tracks their activities, such as the number of posts they make, the
number of comments they leave, and the number of likes they give on the company’s posts.
Company ABC can use factor analysis to identify the underlying factors that explain the
variation in user engagement. The results can help the company understand its areas of
improvement, particularly the content that can provide more useful information to its users,
thereby boosting engagement.
Cluster Analysis
Cluster analysis is a statistical technique to identify groups or “clusters” of similar objects or

cases. The cases can be individuals, companies, or products, and the variables used to group
them can be qualitative or quantitative. The main objective of cluster analysis is to find patterns
or structures in the data that can help better understand the phenomenon being studied.
Example:
Streaming services like Netflix and Amazon Prime collect the following data about their
customers:
 Minutes watched per day
 Total viewing sessions per week
 Number of unique shows viewed per month

Clustering algorithms help these streaming platforms group people with similar interests and
suggest their interest-specific Sci-Fi movies and shows to users who have previously watched
many Sci-Fi movies.
Regression Analysis
Regression analysis is a statistical process that analyzes the relationship between two or more
variables, one dependent on the other variables used in mathematical calculation. In other words,
a regression analysis makes it possible to understand how independent variables directly affect
another variable that depends on them.
Example:
An e-commerce company wants to understand the factors influencing the number of items
customers purchase. It collects data from a sample of customers, tracking their shopping
behavior, such as the number of items they add to their cart, the number of items they check out,
or the items they purchase.
The company uses regression analysis to identify the factors that significantly impact the number
of items customers purchase. The regression analysis results may suggest that price, customer
reviews, and loyalty significantly impact the number of items customers purchase.
These results can help the company develop more effective marketing campaigns and product
offerings. It can focus on offering discounts on high-priced items, partnering with influencers to
generate positive product reviews, or developing loyalty programs to reward repeat customers.
Deviance Analysis
Deviance analysis determines the influence of several or individual group variables by
calculating statistical averages. Here you can compare variables within one or different groups,
depending on the assumption of the deviations.
Example:
A hospital wants to understand the factors contributing to patient readmissions. The hospital
collects data from a sample of patients, tracking their medical history, treatments, and
readmissions.
The hospital uses deviance analysis to identify the underlying factors that explain the variation in
patient readmissions. The deviance analysis results suggest chronic conditions are major factors
contributing to patient readmissions.
The deviance analysis results can help the hospital develop more effective interventions to
reduce patient readmissions. For example, the hospital can focus on providing better care for
patients with chronic conditions.
Discriminant Analysis
Used in the context of deviance analysis to differentiate between groups with similar or identical
characteristics.
Example
On the human resources front, discriminant analysis can help predict job success based on
predictors like educational background, experience and skill levels, and personality test scores.
The dependent variable could be a binary measure of job success or a multi-category measure
(low, medium, or high performer).
MANOVA (Multivariate Analysis of Variance)
MANOVA is a statistical test that analyzes the relationship between several response variables
and a common set of predictors. It requires continuous response variables and categorical
predictors. MANOVA has some important advantages compared to running multiple ANOVA
analyses, one response variable at a time.
Example
MANOVA algorithms can help online educational platforms group students based on their
learning styles, such as visual, auditory, and kinesthetic learners or identify factors that influence
student performance in online courses, such as prior knowledge, self-regulation, and technology
skills. This information can help improve online course design and provide more effective
support services to students.
Advantages of Multivariate Analysis
Here are some of the advantages of multivariate analysis:
 Multivariate analysis allows you to look at the relationships between multiple variables,
which can give you a more complete understanding of your data.
 Using multivariate analysis, you can make predictions about future events. For example,
it can predict customer behavior, the likelihood of a loan default, or the outcome of a
sporting event.
 Multivariate analysis can identify patterns in your data. You can understand how your
data is distributed and identify trends or anomalies.
 It can be used to understand the relationships between different variables in your data.
This can help you to understand how the variables affect each other and to identify any
causal relationships.
Support Vector Machine

Support Vector Machine (SVM) is a powerful machine learning algorithm used for linear or
nonlinear classification, regression, and even outlier detection tasks. SVMs can be used for a
variety of tasks, such as text classification, image classification, spam detection, handwriting
identification, gene expression analysis, face detection, and anomaly detection. SVMs are
adaptable and efficient in a variety of applications because they can manage high-dimensional
data and nonlinear relationships.
SVM algorithms are very effective as we try to find the maximum separating hyperplane
between the different classes available in the target feature.
 Support Vectors: These are the points that are closest to the hyperplane. A separating
line will be defined with the help of these data points.
 Margin: it is the distance between the hyperplane and the observations closest to the
hyperplane (support vectors). In SVM large margin is considered a good margin. There
are two types of margins hard margin and soft margin. I will talk more about these two
in the later section.
1. Hyperplane: Hyperplane is the decision boundary that is used to separate the data points
of different classes in a feature space. In the case of linear classifications, it will be a
linear equation i.e. wx+b = 0.
2. Support Vectors: Support vectors are the closest data points to the hyperplane, which
makes a critical role in deciding the hyperplane and margin.
3. Margin: Margin is the distance between the support vector and hyperplane. The main
objective of the support vector machine algorithm is to maximize the margin. The wider
margin indicates better classification performance.
4. Kernel: Kernel is the mathematical function, which is used in SVM to map the original
input data points into high-dimensional feature spaces, so, that the hyperplane can be
easily found out even if the data points are not linearly separable in the original input
space. Some of the common kernel functions are linear, polynomial, radial basis
function(RBF), and sigmoid.
5. Hard Margin: The maximum-margin hyperplane or the hard margin hyperplane is a
hyperplane that properly separates the data points of different categories without any
misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits
a soft margin technique. Each data point has a slack variable introduced by the soft-
margin SVM formulation, which softens the strict margin requirement and permits
certain misclassifications or violations. It discovers a compromise between increasing the
margin and reducing violations.
The equation for the linear hyperplane can be written as:

wTx+b=0
Advantages of SVM
 Effective in high-dimensional cases.
 Its memory is efficient as it uses a subset of training points in the decision function called
support vectors.
 Different kernel functions can be specified for the decision functions and its possible to
specify custom kernels.
Example Program
import pandas as pd
from sklearn.datasets import load_iris
iris=load_iris()
dir(iris)
iris.feature_names
df=pd.DataFrame(iris.data,columns=iris.feature_names)
df.head()
df['target']=iris.target
df.head()
iris.target_names
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
df[df.target==1].head()
df['flower_name']=df.target.apply(lambda x:iris.target_names[x])
df.head()
from matplotlib import pyplot as plt
%matplotlib inline
df0=df[df.target==0]
df0.head()
plt.scatter(df0['sepal length (cm)'],df0['sepal width (cm)'],color='green',marker='*')

plt.scatter(df1['sepal length (cm)'],df1['sepal width (cm)'],color='red',marker='.')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.scatter(df0['petal length (cm)'], df0['petal width (cm)'],color="green",marker='+')
plt.scatter(df1['petal length (cm)'], df1['petal width (cm)'],color="blue",marker='.')
x=df.drop(['target','flower_name'],axis='columns')
x
y=df.target
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2)
len(x_train)
120
len(x_test)
30
from sklearn.svm import SVC

model=SVC(C=5)
model.fit(x_train,y_train)
model.score(x_test,y_test)
1.0
Kernal Methods
Kernel Function is a method used to take data as input and transform it into the required form of
processing data. “Kernel” is used due to a set of mathematical functions used in Support Vector
Machine providing the window to manipulate the data. So, Kernel Function generally transforms
the training set of data so that a non-linear decision surface is able to transform to a linear
equation in a higher number of dimension spaces. Basically,
It returns the inner product between two points in a standard feature dimension.
The kernel trick is a mathematical technique that allows algorithms to operate in the high-
dimensional feature space without the computational cost. Instead of computing the dot product
in the high-dimensional space, we use a kernel function to compute the dot product directly in
the input space. Common kernel functions include:
 Linear Kernel: 𝐾(𝑥,𝑦)=𝑥𝑇𝑦K(x,y)=xTy
 Polynomial Kernel: 𝐾(𝑥,𝑦)=(𝑥𝑇𝑦+𝑐)𝑑K(x,y)=(xTy+c)d

 Gaussian (RBF) Kernel: 𝐾(𝑥,𝑦)=exp⁡(−∥𝑥−𝑦∥22𝜎2)K(x,y)=exp(−2σ2∥x−y∥2)
 Sigmoid Kernel: 𝐾(𝑥,𝑦)=tanh⁡(𝛼𝑥𝑇𝑦+𝑐)K(x,y)=tanh(αxTy+c)
Standard Kernel Function Equation :
Gaussian Kernel: It is used to perform transformation when there is no prior knowledge about
data.
Gaussian Kernel Radial Basis Function (RBF): Same as above kernel function, adding radial
basis method to improve the transformation.
Gaussian Kernel Graph
classifier = SVC(kernel ='rbf', random_state = 0)
# training set in x, y axis
classifier.fit(x_train, y_train)
classifier = SVC(kernel ='sigmoid')
classifier.fit(x_train, y_train) # training set in x, y axis
Practical Applications of Kernel Methods
1. Image Recognition: Kernel methods are widely used in image recognition tasks. For
example, in handwriting recognition, kernels can map pixel data to a high-dimensional
space, enabling more accurate classification. By using kernels like the RBF kernel, the
system can distinguish between different handwritten digits by finding complex patterns
in the pixel data.
2. Bioinformatics: In bioinformatics, kernel methods help analyze genetic data. By mapping
sequence data to a high-dimensional space, kernels facilitate the identification of patterns
and relationships that are not apparent in the original space. This is crucial for tasks such
as protein structure prediction and gene expression analysis.
3. Text Categorization: In natural language processing, kernels are used to classify

documents. The kernel trick enables the handling of sparse and high-dimensional text
data, improving the performance of text categorization algorithms. For instance, the
polynomial kernel can capture the relationship between different words and phrases,
leading to better document classification.
Choosing the Right Kernel
Selecting the appropriate kernel function is crucial for the success of kernel methods. There is no
one-size-fits-all approach, and the choice depends on the specific problem and data
characteristics. Some considerations include:
 Domain Knowledge: Understanding the problem domain can guide the choice of kernel.
For instance, the polynomial kernel may be suitable for data with polynomial
relationships.
 Cross-Validation: Experimenting with different kernels and using cross-validation helps

in selecting the kernel that provides the best performance.
 Parameter Tuning: Kernel parameters (like 𝜎σ in the RBF kernel) significantly impact
performance. Hyperparameter tuning techniques, such as grid search, can optimize these
parameters.
Implementing Kernel Methods in Python
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
# Load dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create and train the SVM model with RBF kernel
model = SVC(kernel='rbf', gamma='scale')
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
Rule Mining
Association rule mining is a technique used to uncover hidden relationships between variables in
large datasets. It is a popular method in data mining and machine learning and has a wide range
of applications in various fields, such as market basket analysis, customer segmentation, and
fraud detection.
Association rule mining is a technique used to identify patterns in large data sets. It involves
finding relationships between variables in the data and using those relationships to make
predictions or decisions. The goal of association rule mining is to uncover rules that describe the
relationships between different items in the data set.
Association rule mining is commonly used in a variety of applications, some common ones are:
Market Basket Analysis
One of the most well-known applications of association rule mining is in market basket analysis.
This involves analyzing the items customers purchase together to understand their purchasing
habits and preferences.
For example, a retailer might use association rule mining to discover that customers who
purchase diapers are also likely to purchase baby formula. We can use this information to
optimize product placements and promotions to increase sales.
Customer Segmentation
Association rule mining can also be used to segment customers based on their purchasing habits.
For example, a company might use association rule mining to discover that customers who
purchase certain types of products are more likely to be younger. Similarly, they could learn that
customers who purchase certain combinations of products are more likely to be located in
specific geographic regions.
Fraud Detection
You can also use association rule mining to detect fraudulent activity. For example, a credit card
company might use association rule mining to identify patterns of fraudulent transactions, such
as multiple purchases from the same merchant within a short period of time.
Social network analysis
Various companies use association rule mining to identify patterns in social media data that can
inform the analysis of social networks.
For example, an analysis of Twitter data might reveal that users who tweet about a particular
topic are also likely to tweet about other related topics, which could inform the identification of
groups or communities within the network.
Recommendation systems
Association rule mining can be used to suggest items that a customer might be interested in
based on their past purchases or browsing history. For example, a music streaming service might
use association rule mining to recommend new artists or albums to a user based on their listening
history.
Association Rule Mining Algorithms
There are several algorithms used for association rule mining. Some common ones are:
Apriori algorithm
The Apriori algorithm is one of the most widely used algorithms for association rule mining. It
works by first identifying the frequent itemsets in the dataset (itemsets that appear in a certain
number of transactions). It then uses these frequent itemsets to generate association rules, which
are statements of the form "if item A is purchased, then item B is also likely to be purchased."
The Apriori algorithm uses a bottom-up approach, starting with individual items and gradually
building up to more complex itemsets.
FP-Growth algorithm
The FP-Growth (Frequent Pattern Growth) algorithm is another popular algorithm for
association rule mining. It works by constructing a tree-like structure called a FP-tree, which
encodes the frequent itemsets in the dataset. The FP-tree is then used to generate association
rules in a similar manner to the Apriori algorithm. The FP-Growth algorithm is generally faster
than the Apriori algorithm, especially for large datasets.
ECLAT algorithm
The ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal) algorithm is a
variation of the Apriori algorithm that uses a top-down approach rather than a bottom-up
approach. It works by dividing the items into equivalence classes based on their support (the
number of transactions in which they appear). The association rules are then generated by
combining these equivalence classes in a lattice-like structure. It is a more efficient and scalable
version of the Apriori algorithm.
Apriori Algorithm
The apriori algorithm has become one of the most widely used algorithms for frequent itemset
mining and association rule learning. It has been applied to a variety of applications, including
market basket analysis, recommendation systems, and fraud detection, and has inspired the
development of many other algorithms for similar tasks.
Algorithm Details
The apriori algorithm starts by setting the minimum support threshold. This is the minimum
number of times an item must occur in the database in order for it to be considered a frequent
itemset. The algorithm then filters out any candidate itemsets that do not meet the minimum
support threshold.
The algorithm then generates a list of all possible combinations of frequent itemsets and counts
the number of times each combination appears in the database. The algorithm then generates a
list of association rules based on the frequent itemset combinations.
Metrics for Evaluating Association Rules

In association rule mining, several metrics are commonly used to evaluate the quality and
importance of the discovered association rules.
These metrics can be used to evaluate the quality and importance of association rules and to
select the most relevant rules for a given application. It is important to note that the appropriate
choice of metric will depend on the specific goals and requirements of the application.
Interpreting the results of association rule mining metrics requires understanding the meaning
and implications of each metric, as well as how to use them to evaluate the quality and
importance of the discovered association rules. Here are some guidelines for interpreting the
results of the main association rule mining metrics:
Support
Support is a measure of how frequently an item or itemset appears in the dataset. It is calculated
as the number of transactions containing the item(s) divided by the total number of transactions
in the dataset. High support indicates that an item or itemset is common in the dataset, while low
support indicates that it is rare.
Confidence
Confidence is a measure of the strength of the association between two items. It is calculated as
the number of transactions containing both items divided by the number of transactions
containing the first item. High confidence indicates that the presence of the first item is a strong
predictor of the presence of the second item.
Lift
Lift is a measure of the strength of the association between two items, taking into account the
frequency of both items in the dataset. It is calculated as the confidence of the association
divided by the support of the second item. Lift is used to compare the strength of the association
between two items to the expected strength of the association if the items were independent.
A lift value greater than 1 indicates that the association between two items is stronger than
expected based on the frequency of the individual items. This suggests that the association may
be meaningful and worth further investigation. A lift value less than 1 indicates that the
association is weaker than expected and may be less reliable or less significant.
An association rule is a statement of the form "if item A is present in a transaction, then item B is
also likely to be present". The strength of the association is measured using the confidence of the
rule, which is the probability that item B is present given that item A is present.
The algorithm then filters out any association rules that do not meet a minimum confidence
threshold. These rules are referred to as strong association rules. Finally, the algorithm then
returns the list of strong association rules as output.
Cluster Analysis
Clustering Methods:
The clustering methods can be classified into the following categories:
 Partitioning Method
 Hierarchical Method
 Density-based Method
 Grid-Based Method
 Model-Based Method
 Constraint-based Method
Partitioning Methods
This clustering method classifies the information into multiple groups based on the
characteristics and similarity of the data. Its the data analysts to specify the number of clusters
that has to be generated for the clustering methods. In the partitioning method when database(D)
that contains multiple(N) objects then the partitioning method constructs user-specified(K)
partitions of the data in which each partition represents a cluster and a particular region. There
are many algorithms that come under partitioning method some of the popular ones are K-Mean,
PAM(K-Medoids), CLARA algorithm (Clustering Large Applications) etc. In this article, we will
be seeing the working of K Mean algorithm in detail.
K-Mean (A centroid based Technique): The K means algorithm takes the input parameter K
from the user and partitions the dataset containing N objects into K clusters so that resulting
similarity among the data objects inside the group (intracluster) is high but the similarity of data
objects with the data objects from outside the cluster is low (intercluster). The similarity of the
cluster is determined with respect to the mean value of the cluster. It is a type of square error
algorithm. At the start randomly k objects from the dataset are chosen in which each of the
objects represents a cluster mean(centre). For the rest of the data objects, they are assigned to the
nearest cluster based on their distance from the cluster mean. The new mean of each of the
cluster is then calculated with the added data objects.
Method:
1. Randomly assign K objects from the dataset(D) as cluster centres(C)
2. (Re) Assign each object to which object is most similar based upon mean values.
3. Update Cluster means, i.e., Recalculate the mean of each cluster with the updated values.
4. Repeat Step 2 until no change occurs.
Figure – K-mean Clustering
Example: Suppose we want to group the visitors to a website using just their age as follows:
16, 16, 17, 20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66
Initial Cluster:
K=2
Centroid(C1) = 16 [16]
Centroid(C2) = 22 [22]
Note: These two points are chosen randomly from the dataset.
Iteration-1:
C1 = 16.33 [16, 16, 17]

C2 = 37.25 [20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-2:
C1 = 19.55 [16, 16, 17, 20, 20, 21, 21, 22, 23]
C2 = 46.90 [29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-3:
C1 = 20.50 [16, 16, 17, 20, 20, 21, 21, 22, 23, 29]
C2 = 48.89 [36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-4:
C1 = 20.50 [16, 16, 17, 20, 20, 21, 21, 22, 23, 29]
C2 = 48.89 [36, 41, 42, 43, 44, 45, 61, 62, 66]
No change Between Iteration 3 and 4, so we stop. Therefore we get the clusters (16-29) and (36-
66) as 2 clusters we get using K Mean Algorithm.
Hierarchical Methods
A Hierarchical clustering method works via grouping data into a tree of clusters. Hierarchical
clustering begins by treating every data point as a separate cluster. Then, it repeatedly executes
the subsequent steps:
1. Identify the 2 clusters which can be closest together, and
2. Merge the 2 maximum comparable clusters. We need to continue these steps until all the
clusters are merged together.
In Hierarchical Clustering, the aim is to produce a hierarchical series of nested clusters. A

diagram called Dendrogram (A Dendrogram is a tree-like diagram that statistics the sequences
of merges or splits) graphically represents this hierarchy and is an inverted tree that describes the
order in which factors are merged (bottom-up view) or clusters are broken up (top-down view).
What is Hierarchical Clustering?
Hierarchical clustering is a method of cluster analysis in data mining that creates a hierarchical
representation of the clusters in a dataset. The method starts by treating each data point as a
separate cluster and then iteratively combines the closest clusters until a stopping criterion is
reached. The result of hierarchical clustering is a tree-like structure, called a dendrogram, which
illustrates the hierarchical relationships among the clusters.
Hierarchical clustering has several advantages over other clustering methods
 The ability to handle non-convex clusters and clusters of different sizes and densities.
 The ability to handle missing data and noisy data.
 The ability to reveal the hierarchical structure of the data, which can be useful for
understanding the relationships among the clusters.
Drawbacks of Hierarchical Clustering
 The need for a criterion to stop the clustering process and determine the final number of
clusters.
 The computational cost and memory requirements of the method can be high, especially
for large datasets.
 The results can be sensitive to the initial conditions, linkage criterion, and distance metric
used.
In summary, Hierarchical clustering is a method of data mining that groups similar data
points into clusters by creating a hierarchical structure of the clusters.
 This method can handle different types of data and reveal the relationships among the
clusters. However, it can have high computational cost and results can be sensitive to
some conditions.
Types of Hierarchical Clustering
Basically, there are two types of hierarchical Clustering:
1. Agglomerative Clustering
2. Divisive clustering
1. Agglomerative Clustering
Initially consider every data point as an individual Cluster and at every step, merge the nearest
pairs of the cluster. (It is a bottom-up method). At first, every dataset is considered an individual
entity or cluster. At every iteration, the clusters merge with different clusters until one cluster is
formed.
The algorithm for Agglomerative Hierarchical Clustering is:
 Calculate the similarity of one cluster with all the other clusters (calculate proximity
matrix)
 Consider every data point as an individual cluster
 Merge the clusters which are highly similar or close to each other.
 Recalculate the proximity matrix for each cluster

 Repeat Steps 3 and 4 until only a single cluster remains.
Let’s say we have six data points A, B, C, D, E, and F.
Agglomerative Hierarchical clustering
 Step-1: Consider each alphabet as a single cluster and calculate the distance of one
cluster from all the other clusters.
 Step-2: In the second step comparable clusters are merged together to form a single
cluster. Let’s say cluster (B) and cluster (C) are very similar to each other therefore we
merge them in the second step similarly to cluster (D) and (E) and at last, we get the
clusters [(A), (BC), (DE), (F)]
 Step-3: We recalculate the proximity according to the algorithm and merge the two
nearest clusters([(DE), (F)]) together to form new clusters as [(A), (BC), (DEF)]
 Step-4: Repeating the same process; The clusters DEF and BC are comparable and
merged together to form a new cluster. We’re now left with clusters [(A), (BCDEF)].
 Step-5: At last, the two remaining clusters are merged together to form a single cluster
[(ABCDEF)].
2. Divisive Hierarchical clustering
We can say that Divisive Hierarchical clustering is precisely the opposite of Agglomerative
Hierarchical clustering. In Divisive Hierarchical clustering, we take into account all of the data
points as a single cluster and in every iteration, we separate the data points from the clusters
which aren’t comparable. In the end, we are left with N clusters.
Density Based Methods
Density-based clustering refers to a method that is based on local cluster criterion, such as
density connected points.
Density-Based Clustering refers to one of the most popular unsupervised learning methodologies
used in model building and machine learning algorithms. The data points in the region separated
by two clusters of low point density are considered as noise. The surroundings with a radius ε of
a given object are known as the ε neighborhood of the object. If the ε neighborhood of the object
comprises at least a minimum number, MinPts of objects, then it is called a core object.
Density-Based Clustering - Background

There are two different parameters to calculate the density-based clustering
EPS: It is considered as the maximum radius of the neighborhood.
MinPts: MinPts refers to the minimum number of points in an Eps neighborhood of that point.
NEps (i) : { k belongs to D and dist (i,k) < = Eps}
Directly density reachable:
A point i is considered as the directly density reachable from a point k with respect to Eps,
MinPts if
i belongs to NEps(k)
Core point condition:
NEps (k) >= MinPts
Density reachable:
A point denoted by i is a density reachable from a point j with respect to Eps, MinPts if there is a
sequence chain of a point i1,…., in, i1 = j, pn = i such that i i + 1 is directly density reachable
from ii.
Density connected:
A point i refers to density connected to a point j with respect to Eps, MinPts if there is a point o
such that both i and j are considered as density reachable from o with respect to Eps and MinPts.
Working of Density-Based Clustering
Suppose a set of objects is denoted by D', we can say that an object I is directly density reachable
form the object j only if it is located within the ε neighborhood of j, and j is a core object.
An object i is density reachable form the object j with respect to ε and MinPts in a given set of
objects, D' only if there is a sequence of object chains point i1,…., in, i1 = j, pn = i such that i i +
1 is directly density reachable from ii with respect to ε and MinPts.
Major Features of Density-Based Clustering
The primary features of Density-based clustering are given below.
o It is a scan method.
o It requires density parameters as a termination condition.
o It is used to manage noise in data clusters.
o Density-based clustering is used to identify clusters of arbitrary size.
Density-Based Clustering Methods

DBSCAN
DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise.

It depends on a density-based notion of cluster. It also identifies clusters of
arbitrary size in the spatial database with outliers.
Grid Based Methods
In the Grid-Based method a grid is formed using the object together,i.e, the object space is
quantized into a finite number of cells that form a grid structure. One of the major advantages of
the grid-based method is fast processing time and it is dependent only on the number of cells in
each dimension in the quantized space. The processing time for this method is much faster so it
can save time.
Statistical Information Grid(STING):

A STING is a grid-based clustering technique. It uses a multidimensional grid data structure that
quantifies space into a finite number of cells. Instead of focusing on data points, it focuses on the
value space surrounding the data points.
In STING, the spatial area is divided into rectangular cells and several levels of cells at different
resolution levels. High-level cells are divided into several low-level cells.
In STING Statistical Information about attributes in each cell, such as mean, maximum, and
minimum values, are precomputed and stored as statistical parameters. These statistical
parameters are useful for query processing and other data analysis tasks.
The statistical parameter of higher-level cells can easily be computed from the parameters of the
lower-level cells.
How STING Work:
Step 1: Determine a layer, to begin with.

Step 2: For each cell of this layer, it calculates the confidence interval or estimated range of
probability that this is cell is relevant to the query.
Step 3: From the interval calculate above, it labels the cell as relevant or not relevant.
Step 4: If this layer is the bottom layer, go to point 6, otherwise, go to point 5.
Step 5: It goes down the hierarchy structure by one level. Go to point 2 for those cells that form
the relevant cell of the high-level layer.
Step 6: If the specification of the query is met, go to point 8, otherwise go to point 7.
Step 7: Retrieve those data that fall into the relevant cells and do further processing. Return the
result that meets the requirement of the query. Go to point 9.
Step 8: Find the regions of relevant cells. Return those regions that meet the requirement of the
query. Go to point 9.
Step 9: Stop or terminate.
Advantages:
 Grid-based computing is query-independent because the statistics stored in each cell

represent a summary of the data in the grid cells and are query-independent.
 The grid structure facilitates parallel processing and incremental updates.
Disadvantage:
 The main disadvantage of Sting (Statistics Grid). As we know, all cluster boundaries are
either horizontal or vertical, so no diagonal boundaries are detected.
Model-Based Method
In the model-based method, all the clusters are hypothesized in order to find the data which is
best suited for the model. The clustering of the density function is used to locate the clusters for a
given model. It reflects the spatial distribution of data points and also provides a way to
automatically determine the number of clusters based on standard statistics, taking outlier or
noise into account. Therefore it yields robust clustering methods.
Clustering High-Dimensional Data

Clustering is basically a type of unsupervised learning method. An unsupervised learning method
is a method in which we draw references from datasets consisting of input data without labeled
responses.
Clustering is the task of dividing the population or data points into a number of groups such that
data points in the same groups are more similar to other data points in the same group and
dissimilar to the data points in other groups.
Challenges of Clustering High-Dimensional Data:
Clustering of the High-Dimensional Data return the group of objects which are clusters. It is
required to group similar types of objects together to perform the cluster analysis of high-
dimensional data, But the High-Dimensional data space is huge and it has complex data types
and attributes. A major challenge is that we need to find out the set of attributes that are present
in each cluster. A cluster is defined and characterized based on the attributes present in the
cluster. Clustering High-Dimensional Data we need to search for clusters and find out the space
for the existing clusters.
The High-Dimensional data is reduced to low-dimension data to make the clustering and search
for clusters simple. some applications need the appropriate models of clusters, especially the
high-dimensional data. clusters in the high-dimensional data are significantly small. the
conventional distance measures can be ineffective. Instead, To find the hidden clusters in high-
dimensional data we need to apply sophisticated techniques that can model correlations among
the objects in subspaces.
Subspace Clustering Methods:
There are 3 Subspace Clustering Methods:

 Subspace search methods
 Correlation-based clustering methods
 Biclustering methods
Subspace clustering approaches to search for clusters existing in subspaces of the given high-
dimensional data space, where a subspace is defined using a subset of attributes in the full space.
1. Subspace Search Methods: A subspace search method searches the subspaces for clusters.
Here, the cluster is a group of similar types of objects in a subspace. The similarity between the
clusters is measured by using distance or density features. CLIQUE algorithm is a subspace
clustering method. subspace search methods search a series of subspaces. There are two
approaches in Subspace Search Methods: Bottom-up approach starts to search from the low-
dimensional subspaces. If the hidden clusters are not found in low-dimensional subspaces then it
searches in higher dimensional subspaces. The top-down approach starts to search from the high-
dimensional subspaces and then search in subsets of low-dimensional subspaces. Top-down
approaches are effective if the subspace of a cluster can be defined by the local neighborhood
sub-space clusters.
2. Correlation-Based Clustering: correlation-based approaches discover the hidden clusters by

developing advanced correlation models. Correlation-Based models are preferred if is not
possible to cluster the objects by using the Subspace Search Methods. Correlation-Based
clustering includes the advanced mining techniques for correlation cluster analysis. Biclustering
Methods are the Correlation-Based clustering methods in which both the objects and attributes
are clustered.
3. Biclustering Methods:
Biclustering means clustering the data based on the two factors. we can cluster both objects and
attributes at a time in some applications. The resultant clusters are biclusters. To perform the
biclustering there are four requirements:
 Only a small set of objects participate in a cluster.
 A cluster only involves a small number of attributes.
 The data objects can take part in multiple clusters, or the objects may also include in any
cluster.
 An attribute may be involved in multiple clusters.
Objects and attributes are not treated in the same way. Objects are clustered according to their
attribute values. We treat Objects and attributes as different in biclustering analysis.
Predictive Analytics
Predictive analytics is a branch of advanced analytics that makes predictions about future
outcomes using historical data combined with statistical modeling, data mining techniques and
machine learning.
Types of predictive modeling
Predictive analytics models are designed to assess historical data, discover patterns, observe
trends, and use that information to predict future trends. Popular predictive analytics models
include classification, clustering, and time series models.
Classification models
Classification models fall under the branch of supervised machine learning models. These
models categorize data based on historical data, describing relationships within a given dataset.
For example, this model can be used to classify customers or prospects into groups for
segmentation purposes. Alternatively, it can also be used to answer questions with binary
outputs, such answering yes or no or true and false; popular use cases for this are fraud detection
and credit risk evaluation. Types of classification models include logistic regression, decision
trees, random forest, neural networks, and Naïve Bayes.
Clustering models
Clustering models fall under unsupervised learning. They group data based on similar attributes.
For example, an e-commerce site can use the model to separate customers into similar groups
based on common features and develop marketing strategies for each group. Common clustering
algorithms include k-means clustering, mean-shift clustering, density-based spatial clustering of
applications with noise (DBSCAN), expectation-maximization (EM) clustering using Gaussian
Mixture Models (GMM), and hierarchical clustering.
Time series models
Time series models use various data inputs at a specific time frequency, such as daily, weekly,
monthly, et cetera. It is common to plot the dependent variable over time to assess the data for
seasonality, trends, and cyclical behavior, which may indicate the need for specific
transformations and model types. Autoregressive (AR), moving average (MA), ARMA, and
ARIMA models are all frequently used time series models. As an example, a call center can use a
time series model to forecast how many calls it will receive per hour at different times of day.

Unit III Da Notes

Uploaded by

Copyright:

Available Formats

Unit III Da Notes

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Unit III Da Notes

Uploaded by

Copyright:

Available Formats

Regression: Types

 Support Vector Regression.

 Decision Tree Regression.

 Random Forest Regression.

Simple Linear Regression

 Y is the dependent variable

 X is the independent variable

 Y is the dependent variable

 X1, X2, …, Xp are the independent variables

 β1, β2, …, βn are the slopes

Negative Linear Relationship:

MSE function can be calculated as:

Cost function(J)=1/n∑in (yi^ − yi )2

Example Program LINEAR REGRESSION

import matplotlib.pyplot as mtp

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state=0)

#Fitting the Simple Linear Regression model to the training dataset

from sklearn.linear_model import LinearRegression

#Prediction of Test and Training set result

mtp.scatter(x_train, y_train, color="green")

mtp.plot(x_train, x_pred, color="red")

mtp.title("Salary vs Experience (Training Dataset)")

mtp.scatter(x_test, y_test, color="blue")

mtp.plot(x_train, x_pred, color="red")

mtp.title("Salary vs Experience (Test Dataset)")

y= b0+b1x1+ b2x12+ b2x13+...... bnx1n

 y is the dependent variable.

 x is the independent variable.

 n is the degree of the polynomial.

Loss and Cost Function – Polynomial Regression

3. Cubic – if degree as 3 and goes on, on the basis of degree.

Example Program for Polynomial Regression

import matplotlib.pyplot as plt

# Features and the target variables

# Fitting Linear Regression to the dataset

from sklearn.linear_model import LinearRegression

# Fitting Polynomial Regression to the dataset

from sklearn.preprocessing import PolynomialFeatures

# Visualising the Polynomial Regression results

# after converting predict variable to 2D array

# Predicting a new result with Polynomial Regression

# after converting predict variable to 2D array

Key points in Multivariate analysis:

3. Interpretation: Multivariate analysis allows for a more nuanced interpretation of

Types of Multivariate Analysis Techniques

Multivariate methods can be subdivided according to different aspects.

Cluster analysis is a statistical technique to identify groups or “clusters” of similar objects or

 Minutes watched per day

 Total viewing sessions per week

 Number of unique shows viewed per month

MANOVA (Multivariate Analysis of Variance)

Here are some of the advantages of multivariate analysis:

Support Vector Machine

The equation for the linear hyperplane can be written as:

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

plt.scatter(df0['sepal length (cm)'],df0['sepal width (cm)'],color='green',marker='*')

from sklearn.model_selection import train_test_split

from sklearn.svm import SVC

 Linear Kernel: 𝐾(𝑥,𝑦)=𝑥𝑇𝑦K(x,y)=xTy