
Hierarchical Clustering in Machine Learning


Last Updated : 11 Mar, 2024

In data mining and statistics, hierarchical clustering analysis is a method of cluster analysis that seeks to build a hierarchy of clusters, i.e. a tree-like structure of nested groupings of the data.

In machine learning, clustering is an unsupervised learning technique that groups data based on the similarity between data points. There are several types of clustering algorithms in machine learning:

Connectivity-based clustering: builds clusters based on the connectivity between data points. Example: hierarchical clustering.

Centroid-based clustering: forms clusters around the centroids of the data points. Example: K-Means clustering, K-Mode clustering.

Distribution-based clustering: models clusters using statistical distributions. It assumes that the data points in a cluster are generated from a particular probability distribution, and the algorithm estimates the parameters of that distribution to group similar data points into clusters. Example: Gaussian Mixture Models (GMM).

Density-based clustering: groups together data points that lie in high-density regions and separates points in low-density regions. The basic idea is to identify regions of the data space with a high density of points and group those points into clusters. Example: DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
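For a quick side-by-side feel of these four families, here is a minimal sketch (not from the original article) that runs one representative algorithm from each family on the same toy dataset using scikit-learn; the parameter values (e.g. eps=2.5 for DBSCAN) are illustrative assumptions.

Python3

import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, DBSCAN
from sklearn.mixture import GaussianMixture

# toy dataset with two obvious groups (x = 1 vs x = 4)
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# connectivity-based: hierarchical (agglomerative) clustering
print(AgglomerativeClustering(n_clusters=2).fit_predict(X))
# centroid-based: K-Means
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))
# distribution-based: Gaussian Mixture Model
print(GaussianMixture(n_components=2, random_state=0).fit_predict(X))
# density-based: DBSCAN (eps chosen by hand for this toy data)
print(DBSCAN(eps=2.5, min_samples=2).fit_predict(X))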

In this article, we will discuss connectivity-based clustering algorithms, i.e. hierarchical clustering.

Hierarchical clustering
Hierarchical clustering is a connectivity-based clustering model that groups together data points that are close to each other, based on a measure of similarity or distance. The assumption is that data points that are close to each other are more similar or related than data points that are farther apart.

A dendrogram, a tree-like figure produced by hierarchical clustering, depicts the hierarchical relationships between groups. Individual data points are located at the bottom of the dendrogram, while the largest cluster, which includes all the data points, is located at the top. To generate different numbers of clusters, the dendrogram can be sliced at various heights.

The dendrogram is created by iteratively merging or splitting clusters based on a measure of similarity or distance between data points. Clusters are divided or merged repeatedly until all data points are contained within a single cluster, or until a predetermined number of clusters is reached.

To estimate a suitable number of clusters, we can look at the dendrogram and find the height at which its branches separate into distinct groups; slicing the dendrogram at that height yields the corresponding clusters.
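As a concrete illustration of slicing the dendrogram, the following sketch (our own, not part of the original article) builds a linkage matrix with SciPy and cuts it at a hand-picked height of 3.0 using fcluster; the cut height is an illustrative assumption.

Python3

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# build the merge tree (the data behind the dendrogram)
Z = linkage(X, method='ward')

# slice the dendrogram at height 3.0: every branch that sits
# entirely below that height becomes one cluster
labels = fcluster(Z, t=3.0, criterion='distance')
print(labels)   # one cluster label per data point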

Types of Hierarchical Clustering

There are two main types of hierarchical clustering:

1. Agglomerative clustering
2. Divisive clustering

Hierarchical Agglomerative Clustering


It is also known as the bottom-up approach or hierarchical agglomerative clustering (HAC). It produces a structure that is more informative than the unstructured set of clusters returned by flat clustering, and it does not require us to prespecify the number of clusters. Bottom-up algorithms treat each data point as a singleton cluster at the outset and then successively merge pairs of clusters until all clusters have been merged into a single cluster that contains all the data.

Algorithm:

given a dataset (d1, d2, d3, ..., dN) of size N

# compute the distance matrix
for i = 1 to N:
    # the distance matrix is symmetric about the primary
    # diagonal, so we compute only its lower triangle
    for j = 1 to i:
        dis_mat[i][j] = distance(di, dj)

each data point starts as a singleton cluster

repeat
    merge the two clusters having the minimum distance
    update the distance matrix
until only a single cluster remains
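The pseudocode above can be turned into a short, naive NumPy implementation. The sketch below is our own (not from the article) and uses single linkage, i.e. the minimum distance between cluster members; function and variable names are our choices.

Python3

import numpy as np

def naive_agglomerative(X, n_clusters=1):
    # each data point starts as a singleton cluster (lists of indices)
    clusters = [[i] for i in range(len(X))]
    # pairwise distance matrix between the original points
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # single-linkage distance between two clusters
    def cluster_distance(a, b):
        return min(dist[i][j] for i in a for j in b)

    while len(clusters) > n_clusters:
        # find the pair of clusters with the minimum distance ...
        pairs = [(cluster_distance(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        # ... and merge them
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
print(naive_agglomerative(X, n_clusters=2))   # e.g. [[0, 1, 2], [3, 4, 5]]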

Hierarchical Agglomerative Clustering

Steps:


Consider each alphabet (A to F) as a single cluster and calculate the distance of each cluster from all the other clusters.

In the second step, comparable clusters are merged together to form a single cluster. Say cluster (B) and cluster (C) are very similar to each other, so we merge them; similarly for clusters (D) and (E). We are left with the clusters [(A), (BC), (DE), (F)].

We recalculate the proximities according to the algorithm and merge the two nearest clusters, (DE) and (F), to obtain the clusters [(A), (BC), (DEF)].

Repeating the same process, clusters (DEF) and (BC) are comparable and are merged to form a new cluster. We are now left with [(A), (BCDEF)].

Finally, the two remaining clusters are merged to form a single cluster [(ABCDEF)].

Python implementation of the above algorithm using the scikit-learn library:

Python3

from sklearn.cluster import AgglomerativeClustering
import numpy as np

# randomly chosen dataset
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# specify the desired number of clusters
clustering = AgglomerativeClustering(n_clusters=2).fit(X)

# print the cluster label assigned to each data point
print(clustering.labels_)

Output:


[1, 1, 1, 0, 0, 0]

Hierarchical Divisive clustering

It is also known as the top-down approach. This algorithm also does not require prespecifying the number of clusters. Top-down clustering requires a method for splitting a cluster that contains the whole data, and it proceeds by splitting clusters recursively until individual data points have been split into singleton clusters.

Algorithm:

given a dataset (d1, d2, d3, ..., dN) of size N

at the top we have all the data in one cluster
the cluster is split using a flat clustering method, e.g. K-Means

repeat
    choose the best cluster among all the clusters to split
    split that cluster with the flat clustering algorithm
until each data point is in its own singleton cluster
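To make the top-down procedure concrete, here is a rough sketch (our own, not from the article) that uses K-Means with k = 2 as the flat clustering subroutine and always splits the largest remaining cluster; the splitting criterion is just one possible choice.

Python3

import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, n_clusters=2):
    # start with all data in a single cluster (stored as index arrays)
    clusters = [np.arange(len(X))]
    while len(clusters) < n_clusters:
        # choose a cluster to split -- here simply the largest one
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        if len(clusters[idx]) < 2:
            break   # nothing left to split
        members = clusters.pop(idx)
        # split it with a flat clustering method (K-Means, k = 2)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[members])
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
print(divisive_clustering(X, n_clusters=2))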

Hierarchical Divisive clustering

Computing Distance Matrix

While merging two clusters, we check the distance between every pair of clusters and merge the pair with the least distance (i.e. the most similarity). But how is that distance determined? There are different ways of defining the inter-cluster distance/similarity. Some of them are:

1. Min distance (single linkage): the distance between two clusters is the minimum distance between any point of one cluster and any point of the other.
2. Max distance (complete linkage): the distance between two clusters is the maximum distance between any point of one cluster and any point of the other.
3. Group average: the distance between two clusters is the average distance over all pairs of points taken one from each cluster.
4. Ward's method: the similarity of two clusters is based on the increase in squared error when the two clusters are merged.

For example, if we group the same data using different linkage methods, we may get different results:

Distance Matrix Comparison in Hierarchical Clustering
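The four criteria above correspond directly to SciPy's 'single', 'complete', 'average' and 'ward' linkage methods. The sketch below is our own: it runs each method on the same small dataset and cuts the resulting tree into two clusters. On such simple data the labels often agree, but the merge heights recorded in the linkage matrices differ.

Python3

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

for method in ['single', 'complete', 'average', 'ward']:
    Z = linkage(X, method=method)                     # merge history under this criterion
    labels = fcluster(Z, t=2, criterion='maxclust')   # force exactly two clusters
    print(method, labels)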

Implementation code

Python3

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# randomly chosen dataset
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# Perform hierarchical clustering
Z = linkage(X, 'ward')

# Plot dendrogram
dendrogram(Z)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Data point')
plt.ylabel('Distance')
plt.show()

Output:

Hierarchical Clustering Dendrogram

Hierarchical Agglomerative vs Divisive Clustering

Divisive clustering is more complex than agglomerative clustering: in the divisive case we need a flat clustering method as a "subroutine" to split each cluster until every data point sits in its own singleton cluster.

Divisive clustering is more efficient if we do not generate a complete hierarchy all the way down to individual data points. The time complexity of naive agglomerative clustering is O(n³), because we exhaustively scan the N × N distance matrix (dis_mat in the pseudocode above) for the lowest distance in each of the N − 1 iterations. Using a priority queue we can reduce this complexity to O(n² log n), and with further optimizations it can be brought down to O(n²). For divisive clustering, given a fixed number of top levels and an efficient flat algorithm such as K-Means, the running time is linear in the number of patterns and clusters.

Divisive algorithms can also be more accurate. Agglomerative clustering makes decisions by considering local patterns or neighboring points without initially taking the global distribution of the data into account, and these early decisions cannot be undone. Divisive clustering, in contrast, takes the global distribution of the data into consideration when making top-level partitioning decisions.

