AI With Python - Unsupervised Learning - Clustering
Unsupervised machine learning algorithms work without a supervisor: the training data carries no
labels that tell the algorithm what the correct output is. That is why some consider them closely
aligned with true artificial intelligence. With no correct answer and no teacher for guidance, the
algorithm has to discover the interesting patterns in the data on its own.
What is Clustering?
Clustering is a type of unsupervised learning method and a common technique for statistical data
analysis used in many fields. It is the task of dividing a set of observations into subsets, called
clusters, in such a way that observations in the same cluster are similar in some sense and
dissimilar to the observations in other clusters. In simple words, the main goal of clustering is
to group the data on the basis of similarity and dissimilarity.
For example, the following diagram shows similar kinds of data grouped into different clusters −
K-Means algorithm
The K-means clustering algorithm is one of the best-known algorithms for clustering data. It
assumes that the number of clusters is already known. This is also called flat clustering. It is an
iterative clustering algorithm. The steps given below need to be followed for this algorithm −
Step 2 − Fix the number of clusters and randomly assign each data point to a cluster. In other
words, we make an initial partition of the data into the chosen number of clusters.
https://www.tutorialspoint.com/artificial_intelligence_with_python/artificial_intelligence_with_python_unsupervised_learning_clustering.htm 1/12
7/31/22, 8:00 PM AI with Python - Unsupervised Learning: Clustering
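As this is an iterative algorithm, we need to update the locations of the K centroids with every
iteration until the centroids stop moving, that is, until the algorithm converges. Note that
K-means is only guaranteed to reach a local optimum, which depends on the initial assignment and
may not be the global optimum.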
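The iterative procedure described above can be sketched from scratch with the standard library only. This is a minimal illustration, not scikit-learn's implementation: the helper name kmeans_sketch, the toy points, and the use of a fixed (deliberately poor) initial assignment instead of a random one are assumptions made so that the run is reproducible. The sketch also assumes that no cluster ever becomes empty.

```python
import math

def kmeans_sketch(points, labels, max_iter=100):
    """Refine an initial cluster assignment until it stops changing.

    points: list of (x, y) tuples; labels: initial cluster index per point.
    Hypothetical helper for illustration; assumes no cluster becomes empty.
    """
    k = max(labels) + 1
    centroids = []
    for _ in range(max_iter):
        # Update step: each centroid becomes the mean of its assigned points.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            centroids.append((sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members)))
        # Assignment step: each point moves to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
    return labels, centroids

# Two visually obvious groups, started from a deliberately mixed-up assignment.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
labels, centroids = kmeans_sketch(points, [0, 1, 0, 1, 0, 1])
print(labels)
print(centroids)
```

On this toy data the assignment settles after two passes, with the three points near the origin in one cluster and the three points near (8.5, 8.8) in the other.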
The following code will help in implementing K-means clustering algorithm in Python. We are going
to use the Scikit-learn module.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
The following lines of code generate a two-dimensional dataset containing four blobs, using
make_blobs from the sklearn.datasets module.
plt.show()
Here, we initialize kmeans as a KMeans estimator, passing the number of clusters to find through
the n_clusters parameter.
kmeans = KMeans(n_clusters = 4)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
centers = kmeans.cluster_centers_
The code given below will help us plot and visualize the machine's findings based on our data, and
the fit according to the number of clusters that are to be found.
plt.show()
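The K-means walkthrough above can be put together as one runnable sketch. The dataset settings passed to make_blobs (n_samples, cluster_std, random_state) and the n_init and random_state arguments of KMeans are assumed values chosen for reproducibility; plotting is left out so the sketch also runs without a display.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumed dataset settings: 500 points around 4 centers.
X, y_true = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)

# n_init and random_state are assumptions added to make the run repeatable.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
kmeans.fit(X)                      # learn the centroids
y_kmeans = kmeans.predict(X)       # cluster index for every point
centers = kmeans.cluster_centers_  # one row per centroid

print(centers.shape)
print(sorted(set(y_kmeans.tolist())))
```

The array centers holds one two-dimensional centroid per cluster, and y_kmeans holds the cluster index assigned to each row of X.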
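Mean Shift algorithm
Mean Shift is another clustering algorithm. Unlike K-means, it does not need the number of clusters
to be fixed in advance. Its basic steps are as follows −
First of all, we start with every data point assigned to a cluster of its own.
Next, the algorithm computes the centroids and updates their locations.
By repeating this process, the centroids move closer to the peak of a cluster, i.e. towards the
region of higher density.
The algorithm stops when the centroids do not move anymore.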
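The steps above can be sketched from scratch with a flat (uniform) kernel. This is an illustrative simplification and not scikit-learn's implementation: the helper name mean_shift_sketch, the bandwidth value, the toy points, and the rule that merges candidates closer than half a bandwidth are all assumptions.

```python
import math

def mean_shift_sketch(points, bandwidth, tol=1e-6, max_iter=100):
    """Flat-kernel mean shift on 2-D points (hypothetical helper)."""
    centers = list(points)  # every point starts as its own candidate center
    for _ in range(max_iter):
        shifted = []
        for c in centers:
            # Points that fall inside the bandwidth window around the candidate.
            window = [p for p in points if math.dist(p, c) <= bandwidth]
            # Move the candidate to the mean of that window (towards higher density).
            shifted.append((sum(p[0] for p in window) / len(window),
                            sum(p[1] for p in window) / len(window)))
        moved = max(math.dist(a, b) for a, b in zip(centers, shifted))
        centers = shifted
        if moved < tol:  # stop once no candidate moves any more
            break
    # Candidates that ended up (almost) at the same place form one cluster.
    modes = []
    for c in centers:
        if all(math.dist(c, m) > bandwidth / 2 for m in modes):
            modes.append(c)
    return modes

points = [(1, 1), (1.2, 0.8), (0.8, 1.1), (5, 5), (5.1, 4.9), (4.9, 5.2)]
modes = mean_shift_sketch(points, bandwidth=2)
print(len(modes), "clusters found")
print(modes)
```

Note that the number of clusters is an output of the procedure here, not an input: it follows from the bandwidth and the data.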
With the help of the following code, we implement the Mean Shift clustering algorithm in Python,
using the Scikit-learn module.
import numpy as np
from sklearn.cluster import MeanShift
import matplotlib.pyplot as plt
from matplotlib import style
style.use("ggplot")
The following code generates a two-dimensional dataset containing three blobs, one around each of
the centers listed below, using make_blobs from the sklearn.datasets module.
centers = [[2,2],[4,5],[3,10]]
plt.scatter(X[:,0],X[:,1])
plt.show()
Now, we need to train the Mean Shift cluster model with the input data.
ms = MeanShift()
ms.fit(X)
labels = ms.labels_
cluster_centers = ms.cluster_centers_
The following code will print the cluster centers and the estimated number of clusters for the
input data −
print(cluster_centers)
n_clusters_ = len(np.unique(labels))
print("Estimated clusters:", n_clusters_)
[[ 3.23005036 3.84771893]
[ 3.02057451 9.88928991]]
Estimated clusters: 2
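The Mean Shift steps above can be combined into one runnable sketch. The make_blobs arguments n_samples, cluster_std and random_state are assumed values, so the centers and the cluster count printed by this sketch need not match the output shown above.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

centers = [[2, 2], [4, 5], [3, 10]]
# n_samples, cluster_std and random_state are assumptions for this sketch.
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1, random_state=0)

ms = MeanShift()   # with no bandwidth given, it is estimated from the data
ms.fit(X)
labels = ms.labels_
cluster_centers = ms.cluster_centers_

n_clusters_ = len(np.unique(labels))
print(cluster_centers)
print("Estimated clusters:", n_clusters_)
```

Because two of the three generating centers lie fairly close together relative to the blob spread, Mean Shift may merge them, which is one way an estimate lower than the number of generated blobs can arise.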
The code given below will help plot and visualize the machine's findings based on our data, and
the fit according to the number of clusters that are to be found.
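colors = 10*['r.','g.','b.','c.','k.','y.','m.']
for i in range(len(X)):
    # draw each point in the colour of the cluster it was assigned to
    plt.plot(X[i][0], X[i][1], colors[labels[i]])
# overlay the estimated cluster centers
plt.scatter(cluster_centers[:,0],cluster_centers[:,1])
plt.show()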
Silhouette Analysis
This method can be used to check the quality of a clustering by measuring how well separated the
clusters are. It provides a way to assess parameters such as the number of clusters by giving a
silhouette score. This score measures how close each point in one cluster is to the points in the
neighboring clusters, and it lies in the range [-1, 1].
Score of +1 − A score near +1 indicates that the sample is far away from the neighboring
cluster.
Score of 0 − A score of 0 indicates that the sample is on or very close to the decision boundary
between two neighboring clusters.
Score of -1 − A negative score indicates that the sample may have been assigned to the wrong
cluster.
The silhouette score of a data point is calculated as (p − q) / max(p, q).
Here, p is the mean distance to the points in the nearest cluster that the data point is not a part of.
And, q is the mean intra-cluster distance to all the other points in its own cluster.
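The definitions of p and q above can be turned into a small from-scratch computation for a single data point, using the standard silhouette definition (p - q) / max(p, q). The helper name silhouette_of_point and the toy coordinates are hypothetical and only serve as an illustration.

```python
import math

def silhouette_of_point(point, own_cluster, other_clusters):
    """Silhouette score (p - q) / max(p, q) for one data point.

    own_cluster: the other points that share the point's cluster.
    other_clusters: list of clusters (each a list of points) it is not part of.
    """
    def mean_dist(cluster):
        return sum(math.dist(point, other) for other in cluster) / len(cluster)

    q = mean_dist(own_cluster)                     # mean intra-cluster distance
    p = min(mean_dist(c) for c in other_clusters)  # mean distance to the nearest other cluster
    return (p - q) / max(p, q)

score = silhouette_of_point(
    point=(0, 0),
    own_cluster=[(0, 1), (1, 0)],                  # q = 1
    other_clusters=[[(3, 4), (4, 3)], [(10, 0)]],  # mean distances 5 and 10, so p = 5
)
print(score)
```

With q = 1 and p = 5 the score is (5 - 1) / 5 = 0.8, which is close to +1 because the point sits much nearer to its own cluster than to any other.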
To find the optimal number of clusters, we run the clustering algorithm for several candidate
values and compare their silhouette scores, which are computed with the metrics module from the
sklearn package. In the following example, we will run the K-means clustering algorithm to find
the optimal number of clusters −
import numpy as np
from sklearn import metrics
from sklearn.cluster import KMeans
With the help of the following code, we will generate a two-dimensional dataset containing four
blobs, using make_blobs from the sklearn.datasets module.
scores = []
We iterate over all the candidate values, building a K-means model for each one and training it on
the input data.
kmeans.fit(X)
Now, estimate the silhouette score for the current clustering model using the Euclidean distance
metric −
The following line of code will help in displaying the number of clusters as well as Silhouette score.
scores.append(score)
Number of clusters = 9
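The search described in this section can be sketched end to end as follows. The dataset settings, the candidate range of 2 to 9 clusters, and the n_init and random_state arguments are assumptions, and the name best_k is introduced here only for illustration.

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumed dataset and search range for this sketch.
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.40, random_state=0)
values = list(range(2, 10))   # candidate numbers of clusters: 2 to 9

scores = []
for num_clusters in values:
    kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
    kmeans.fit(X)
    # Mean silhouette score over all samples, using Euclidean distance.
    score = metrics.silhouette_score(X, kmeans.labels_, metric="euclidean")
    print("Number of clusters =", num_clusters)
    print("Silhouette score =", score)
    scores.append(score)

# The candidate with the highest silhouette score is taken as the best one.
best_k = values[scores.index(max(scores))]
print("Best number of clusters =", best_k)
```

Picking the maximum is a simple heuristic; when several candidates score almost the same, it is worth inspecting the clusters themselves before settling on a value.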
Finding Nearest Neighbors
Finding nearest neighbors is the process of finding the points in a given dataset that are closest
to an input point. The main use of the K-nearest neighbors (KNN) algorithm is to build
classification systems that classify a data point according to its proximity to the training
points of the various classes.
The Python code given below helps in finding the K-nearest neighbors of a given data set −
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors
A = np.array([[3.1, 2.3], [2.3, 4.2], [3.9, 3.5], [3.7, 6.4], [4.8, 1.9],
[8.3, 3.1], [5.2, 7.5], [4.8, 4.7], [3.5, 5.1], [4.4, 2.9],])
k=3
We also need to give the test data point whose nearest neighbors are to be found −
The following code can visualize and plot the input data defined by us −
plt.figure()
plt.title('Input data')
Now, we need to build the K-nearest neighbors model and fit it to the input data −
We can visualize the nearest neighbors along with the test data point −
plt.figure()
plt.title('Nearest neighbors')
plt.scatter(test_data[0], test_data[1])
plt.show()
Output
K Nearest Neighbors
1 is [ 3.1 2.3]
2 is [ 3.9 3.5]
3 is [ 4.4 2.9]
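The neighbor search above can also be sketched without scikit-learn, by sorting the points by their Euclidean distance to the query. The query point [3.3, 2.9] is an assumed value that does not appear in the text; with it, the sketch returns the same three neighbors as the output listed above.

```python
import math

A = [[3.1, 2.3], [2.3, 4.2], [3.9, 3.5], [3.7, 6.4], [4.8, 1.9],
     [8.3, 3.1], [5.2, 7.5], [4.8, 4.7], [3.5, 5.1], [4.4, 2.9]]
k = 3
test_data = [3.3, 2.9]  # assumed query point, chosen for illustration

# Sort the points by Euclidean distance to the query and keep the first k.
neighbors = sorted(A, key=lambda point: math.dist(point, test_data))[:k]

print("K Nearest Neighbors")
for rank, point in enumerate(neighbors, start=1):
    print(rank, "is", point)
```

A full sort is the simplest approach for ten points; for large datasets a library routine that uses a tree-based index is usually a better fit.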
Example
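We are building a KNN classifier to recognize digits. For this, we will use the handwritten digits
dataset that ships with scikit-learn and is loaded with load_digits. It consists of small 8x8-pixel
images and is similar in spirit to MNIST, but it is not the MNIST dataset itself. We will write
this code in the Jupyter Notebook.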
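import pandas as pd
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier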
The following function displays the image of a digit so that we can verify which image we are testing −
def Image_display(i):
    plt.imshow(digit['images'][i],cmap = 'Greys_r')
    plt.show()
Now, we need to load the digits dataset. There are 1797 images in total; we use the first 1600
images as the training sample and keep the remaining 197 for testing.
digit = load_digits()
digit_d = pd.DataFrame(digit['data'][0:1600])
Image_display(0)
Image of 0 is displayed as follows −
Image_display(9)
Image of 9 is displayed as follows −
digit.keys()
Now, we need to create the training data set and supply it to the KNN classifier. The remaining
images are kept aside as the testing data set.
train_x = digit['data'][:1600]
train_y = digit['target'][:1600]
KNN = KNeighborsClassifier(20)
KNN.fit(train_x,train_y)
Fitting returns the classifier itself. Its printed representation, truncated here, ends as follows −
..., weights = 'uniform')
We need to create the testing sample by choosing an arbitrary index of 1600 or greater (and below
1797), since the first 1600 images, at indices 0 to 1599, were used as the training samples.
test = np.array(digit['data'][1725])
test1 = test.reshape(1,-1)
Image_display(1725)
The test image, a handwritten 6, is displayed as follows −
KNN.predict(test1)
array([6])
digit['target_names']
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
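The example above checks a single test image. As a complementary sketch, the classifier can be scored on all 197 held-out images at once. The variable names test_x, test_y and accuracy are introduced here for illustration, and no particular accuracy value is claimed.

```python
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

digit = load_digits()   # 1797 handwritten digit images, each flattened to 64 features

# First 1600 images for training, the remaining 197 for testing.
train_x, train_y = digit["data"][:1600], digit["target"][:1600]
test_x, test_y = digit["data"][1600:], digit["target"][1600:]

KNN = KNeighborsClassifier(n_neighbors=20)
KNN.fit(train_x, train_y)

# Fraction of the held-out images whose digit is predicted correctly.
accuracy = KNN.score(test_x, test_y)
print("Held-out accuracy:", accuracy)
```

Scoring on images that were not used for fitting gives a fairer picture of the classifier than checking a single hand-picked sample.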