
ML Mod 5


Q1.

EM ALGORITHM
Ans

The EM (Expectation-Maximization) algorithm works as follows:

1. Initialize the model parameters (for example, the mean, variance, and
weight of each cluster). This can be done randomly or using a heuristic
method.
2. E-step (Expectation step): Calculate the probability of each data point
belonging to each cluster, given the current model parameters. These are
soft assignments: a point can belong partly to several clusters.
3. M-step (Maximization step): Update the model parameters to maximize
the expected log-likelihood of the data, given the soft assignments
computed in the E-step.
4. Repeat steps 2 and 3 until the model parameters (or the log-likelihood)
stop changing, that is, until they converge.
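
The four steps above can be sketched in code. This is a minimal illustration, not a reference implementation: it assumes one-dimensional data and Gaussian clusters (the notes do not fix a model), and the names `em_gmm_1d`, `gaussian_pdf`, `max_iter`, and `tol` are made up for this example. Initialization uses a simple heuristic (evenly spaced quantiles of the data).

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, k=2, max_iter=200, tol=1e-6):
    """EM for a 1-D Gaussian mixture (assumed model, for illustration only)."""
    n = len(data)
    ordered = sorted(data)

    # Step 1: initialise parameters with a heuristic (evenly spaced quantiles).
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, 1e-6)] * k
    weights = [1.0 / k] * k

    prev_log_lik = -math.inf
    for _ in range(max_iter):
        # Step 2 (E-step): probability of each point belonging to each cluster.
        resp = []
        log_lik = 0.0
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint) + 1e-300  # guard against division by zero
            log_lik += math.log(total)
            resp.append([p / total for p in joint])

        # Step 3 (M-step): re-estimate parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # keep the variance strictly positive

        # Step 4: stop once the log-likelihood no longer improves noticeably.
        if abs(log_lik - prev_log_lik) < tol:
            break
        prev_log_lik = log_lik

    return weights, means, variances
```

Calling `em_gmm_1d(data, k=2)` on data drawn from two well-separated groups should return one mean near the centre of each group.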
Q2. Density-based clustering
Ans
Density-based clustering is a type of clustering algorithm that groups
data points based on their density in the data space. The most popular
density-based clustering algorithm is DBSCAN. DBSCAN stands for
Density-Based Spatial Clustering of Applications with Noise.

In short, DBSCAN works as follows:

1. For each data point, find all other data points within a certain
distance (ε-neighborhood).
2. If a data point has at least a minimum number of neighbors
(MinPts) within its ε-neighborhood, then it is considered a core point.
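3. Clusters are formed by connecting all core points that are directly or
indirectly connected to each other through a chain of core points. A
non-core point that lies within the ε-neighborhood of a core point joins
that cluster as a border point.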
4. Data points that are not core points and are not reachable from any
core point are considered noise.
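
The procedure above can be sketched as follows. This is a simplified, brute-force illustration rather than an optimized implementation. It assumes Euclidean distance and counts a point as part of its own ε-neighborhood; the names `dbscan`, `region_query`, and `NOISE` are chosen for this example only.

```python
import math

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Step 1: indices of all points within distance eps of points[i]."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE otherwise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        # Step 2: a point is a core point if its neighborhood is dense enough.
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # Step 4 (may later be relabelled as a border point)
            continue
        # Step 3: start a new cluster and grow it through chains of core points.
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # only core points extend the cluster
                queue.extend(j_neighbors)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point (50, 50) has no dense neighborhood and is not reachable from any core point, so it is labelled as noise.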

Advantages of density-based clustering:

● It can identify clusters of arbitrary shape and size.
● It is robust to outliers.
● It does not require the number of clusters to be specified in
advance.
● It is noise tolerant: points in low-density regions are labelled as
noise instead of being forced into a cluster.
● It is one of the most widely used clustering methods, alongside K-means.
Disadvantages of density-based clustering:

● It can be sensitive to the choice of ε and MinPts parameters.
● It can be computationally expensive for large datasets.
● It can be slow on high-dimensional data.
● It adapts poorly to variations in local density, because the same ε
and MinPts are applied to the whole dataset.
Q3. Explain the distance metrics used in clustering
Ans
Q. K-means Clustering
Ans
The k-means algorithm is a popular clustering algorithm used in machine
learning and data mining. It is an iterative algorithm that partitions a dataset
into k clusters, where each data point belongs to the cluster with the nearest
mean. The goal is to minimize the within-cluster variance or the sum of squared
distances between data points and their assigned cluster centroids.
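
The objective described above can be written as a formula. The symbols are assumptions introduced here for illustration: $C_j$ is the set of points assigned to cluster $j$, $\mu_j$ is its centroid (the mean of those points), and $J$ is the quantity being minimized.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```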

How does the K-Means Algorithm Work?

The working of the K-Means algorithm is explained in the steps below:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as the initial centroids.

Step-3: Assign each data point to its closest centroid, which will form the
predefined K clusters.

Step-4: Calculate the mean of the data points in each cluster and place the
new centroid of that cluster there.

Step-5: Repeat step 3, which means reassign each data point to the new
closest centroid.

Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.

Step-7: The model is ready.
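
The steps above can be sketched in code. This is a minimal illustration under a few assumptions: points are tuples of numbers, distance is Euclidean, and the initial centroids are drawn at random from the data. The names `kmeans`, `max_iter`, and `seed` are made up for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, assignments) for a list of coordinate tuples."""
    rng = random.Random(seed)
    # Step 2: pick K random data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 6: stop when no point changed cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 4: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```

On this small dataset the two centroids should end up at the centres of the two groups. Which group receives label 0 depends on the random initialization.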
