Mid 2
• Classification
• Classifies data (constructs a model) based on the training set and the values (class
labels) in a classifying attribute, and uses that model to classify new data.
• Examples
• Credit/loan approval (a minimal sketch is given after this list):
• Numeric Prediction
• Applications
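A minimal sketch of the credit/loan approval example above, assuming a small made-up training set with two numeric attributes (income, debt) and a class label (approve/deny); the attribute names, values, and the use of scikit-learn's DecisionTreeClassifier are illustrative assumptions, not part of the original notes.

    # Minimal classification sketch: build a model from a training set and
    # use it to classify new data (credit/loan approval style example).
    # The data below is made up purely for illustration.
    from sklearn.tree import DecisionTreeClassifier

    # Training set: each row is (annual income in $1000s, existing debt in $1000s)
    X_train = [[60, 5], [25, 20], [80, 2], [30, 15], [90, 10], [20, 25]]
    # Class labels in the classifying attribute: "approve" or "deny"
    y_train = ["approve", "deny", "approve", "deny", "approve", "deny"]

    # Construct the model from the training set
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # Use the model to classify new (previously unseen) applicants
    print(model.predict([[70, 8], [22, 30]]))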
Ans: A Bayesian belief network is used in applications where the input attributes are mutually
dependent on each other.
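A minimal sketch of how such dependence is encoded, using a tiny made-up two-node network (Rain -> WetGrass); the variable names and probability values are illustrative assumptions only.

    # Tiny Bayesian belief network sketch: P(WetGrass | Rain) depends on Rain,
    # so the two attributes are not independent. All values are made up.
    P_rain = {True: 0.2, False: 0.8}                      # prior P(Rain)
    P_wet_given_rain = {True: {True: 0.9, False: 0.1},    # P(WetGrass | Rain=True)
                        False: {True: 0.2, False: 0.8}}   # P(WetGrass | Rain=False)

    # Joint probability factorizes along the network structure:
    # P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain)
    def joint(rain, wet):
        return P_rain[rain] * P_wet_given_rain[rain][wet]

    # P(WetGrass=True) by summing out Rain
    p_wet = joint(True, True) + joint(False, True)
    print(p_wet)   # 0.2*0.9 + 0.8*0.2 = 0.34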
Ans:
If we use a value of k that is too small, we get clusters that are too broad. Conversely, a value
of k that is too large results in clusters that are too specific.
It may require multiple runs to find the most suitable value of k, which can be time-consuming
and resource-intensive.
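A minimal sketch of such multiple runs, assuming synthetic data with three natural groups and comparing values of k by the within-cluster sum of squares (the common "elbow" heuristic); the dataset, the range of k, and the use of scikit-learn are illustrative assumptions.

    # Sketch: run K-means for several candidate values of k and compare the
    # within-cluster sum of squares (inertia), as in the elbow heuristic.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Three made-up groups centred at 0, 5 and 10
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])

    for k in range(1, 7):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        print(k, round(km.inertia_, 1))   # inertia drops sharply until k is about 3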
Inability to Handle Categorical Data
Another drawback of the K-means algorithm is its inability to handle categorical data. The
algorithm works with numerical data, where distances between data points can be calculated.
However, categorical data doesn’t have a natural notion of distance or similarity.
Using categorical data with the K-means algorithm therefore requires converting the categories
into numerical values, for example with one-hot encoding. One shortcoming of one-hot encoding is
that it treats each category independently and can degrade performance, since it can
significantly increase data dimensionality.
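A minimal sketch of one-hot encoding a categorical attribute before K-means, assuming a small made-up column of colour values; the data and the use of scikit-learn's OneHotEncoder and KMeans are illustrative assumptions.

    # Sketch: one-hot encode a categorical attribute so K-means can compute
    # distances on it. One column with 4 categories becomes 4 binary columns,
    # i.e. dimensionality grows with the number of categories.
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.cluster import KMeans

    colors = [["red"], ["blue"], ["green"], ["blue"], ["yellow"], ["red"]]
    X = OneHotEncoder().fit_transform(colors).toarray()
    print(X.shape)            # (6, 4): one original column -> 4 numeric columns

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(labels)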
Time Complexity
The time complexity of the algorithm is O(k · n · d · i), where k is the number of
clusters, n is the number of data points, d is the number of dimensions, and i is the
number of iterations.
So, even moderately large datasets can be challenging to handle if they’re high-
dimensional.
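A back-of-the-envelope illustration of the O(k · n · d · i) cost; the concrete numbers are assumptions chosen only to show how quickly the product grows with dimensionality.

    # Rough operation count for K-means, proportional to k * n * d * i.
    # Made-up but "moderate" numbers: 100k points, 10 clusters, 20 iterations,
    # compared at two different dimensionalities.
    n, k, i = 100_000, 10, 20
    for d in (10, 1_000):
        print(f"d={d}: ~{k * n * d * i:,} distance-coordinate operations")
    # d=10:   ~200,000,000
    # d=1000: ~20,000,000,000  -> high-dimensional data quickly becomes costly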
Once the nearest-neighbor list is obtained, take the majority vote of class labels among
the k nearest neighbors:
Majority vote: y' = argmax_v Σ_i I(v = y_i)
Where
– v is a class label
– y_i is the class label of one of the nearest neighbors
– I(·) is the indicator function that returns the value 1 if its argument is true and
0 otherwise
Weight the vote according to distance
– weight factor w_i = 1/d_i², giving the distance-weighted vote y' = argmax_v Σ_i w_i · I(v = y_i)
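A minimal sketch of both voting schemes, assuming a tiny made-up list of already-found nearest neighbors given as (distance, label) pairs; the data and the variable names are illustrative assumptions.

    # Sketch: majority vote vs. distance-weighted vote over the k nearest
    # neighbors. Each neighbor is a (distance, class_label) pair.
    from collections import defaultdict

    neighbors = [(0.5, "A"), (1.0, "A"), (0.8, "B"), (2.0, "B"), (2.5, "B")]

    # Plain majority vote: each neighbor contributes I(v = y_i) = 1 to its label
    votes = defaultdict(float)
    for _, label in neighbors:
        votes[label] += 1
    print(max(votes, key=votes.get))       # 'B' (3 votes vs 2)

    # Distance-weighted vote with w_i = 1 / d_i**2
    weighted = defaultdict(float)
    for d, label in neighbors:
        weighted[label] += 1.0 / d**2
    print(max(weighted, key=weighted.get)) # 'A' (closer neighbors count more)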
Ward's Method.
Text Clustering
• One popular text clustering algorithm is Ward's Minimum Variance method.
• It is an agglomerative hierarchical clustering technique, and it tends to generate very
compact clusters.
• We can take either the Euclidean metric or the Hamming distance as the measure of
dissimilarity between feature vectors.
• The clustering method begins with n clusters, one for each text.
• At each stage two clusters are merged to generate a new cluster.
• The clusters Ck and Ci are merged to get a new cluster Cki based on the following
criterion:
• The pair (Ck, Ci) whose merge produces the smallest increase in the sum-of-squares index E
is chosen, i.e. ΔE(Ck, Ci) = E(Cki) − E(Ck) − E(Ci) is minimized.
Ward's method starts with n clusters, each containing a single object. These n clusters are
successively combined until one cluster containing all objects remains. At each step, the
process forms the new cluster that gives the smallest increase in variance, measured by an
index called E (also called the sum-of-squares index).
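A minimal sketch of Ward's method on a few made-up feature vectors using SciPy's hierarchical clustering; the data and the choice to cut the hierarchy into two flat clusters are illustrative assumptions.

    # Sketch: agglomerative clustering with Ward's minimum-variance criterion.
    # Starts from one cluster per object and, at each step, merges the pair
    # that gives the smallest increase in the sum-of-squares index.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Made-up feature vectors (e.g. small text feature vectors)
    X = np.array([[1.0, 0.0], [1.1, 0.1], [0.9, 0.2],
                  [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])

    Z = linkage(X, method="ward")                     # full merge history
    labels = fcluster(Z, t=2, criterion="maxclust")   # cut into 2 flat clusters
    print(labels)                                     # e.g. [1 1 1 2 2 2]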
1. Agglomerative clustering
2. Divisive clustering
Agglomerative Clustering Example (Image taken from YouTube video on Hierarchical clustering by Edureka)
Divisive clustering works by first assigning all the data points to one
cluster. Then, it looks for ways to split this cluster into two or more
smaller clusters. This process continues until each data point is in its
own cluster. For example, consider the following image.
Divisive Clustering Example (Image taken from YouTube video on Hierarchical clustering by Edureka)
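A minimal sketch of the divisive (top-down) idea, splitting the largest cluster in two with a 2-means step until every point sits in its own cluster; the data, the splitting rule, and the stopping point are illustrative assumptions rather than the specific algorithm shown in the image.

    # Sketch of divisive (top-down) clustering: start with all points in one
    # cluster and repeatedly split a cluster in two until every point is alone.
    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0]])

    clusters = [list(range(len(X)))]       # one cluster holding every point
    while any(len(c) > 1 for c in clusters):
        c = max(clusters, key=len)         # pick the largest cluster to split
        clusters.remove(c)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[c])
        clusters.append([i for i, l in zip(c, labels) if l == 0])
        clusters.append([i for i, l in zip(c, labels) if l == 1])
        print(clusters)                    # watch the hierarchy unfold top-down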