Python Special Assignment Solution Abhijeet
Abhijeet Singh
Powered by Crisp Analytics
Table Of Contents
Problem
Approach
Stage 2
Conclusion
Problem
What values, within a certain range or boundary, could be assigned to each letter of
the English alphabet (from A to Z), allowing all the alphabet's letters to be classified
into two clusters after being processed by a function?
Conditions
number.
1. (5,9)
2. (20, 11)
3. (8, 15)
Approach:
At first glance the problem statement seemed very straightforward, since clustering random
values is an easy task. Once I started working on it, however, the major issue I ran into was
how to give meaning to the clusters rather than merely assigning each letter a function value.
Initially I clustered the letters using just one function value, but when visualized this was
still only a mapping of random values through a function.
So, to give “MEANING” to the clusters and to the letters in them, I created a mathematical
vector space using two distinct functions. Each point in this space has a pre-defined pair of
values, given by the two functions that create the space, so any object placed in the space
acquires meaning through the unique vector assigned to it.
I placed the complete set of letters in this vector space, which gave each letter its own vector
and so created meaningful data points. I then applied the clustering algorithm to group these
points into clusters.
1. Selecting the range of letters and assigning each one a random value using the
random module in Python.
2. Passing these randomly generated values through mathematical functions to get a
meaningful set of numbers.
3. Surveying different mathematical functions and learning which of their properties
can help in clustering the data.
4. Generating an n-dimensional vector space and assigning a unique vector to each of the
letters using these function values. (Here I have used a 2-D vector space.)
5. Using a machine learning algorithm (here K-Means) to classify the letters into
multiple clusters. (Here the number of clusters is 2.)
Python Terminologies:
def: The def keyword is used to define a function; it is placed before the function name
chosen by the user to create a user-defined function.
Dictionary: A dictionary represents a mapping between two sets of things, where each key is
assigned a respective value. It is written as {'KEY': 'VALUE'}.
DataFrame: A DataFrame is a tabular data structure made up of rows, columns, and cells,
which holds the data entries in a readable format.
Clustering: Clustering is a set of techniques used to partition data into groups, or clusters.
Clusters are loosely defined as groups of data objects that are more similar to other objects in
their cluster than they are to data objects in other clusters.
1. Pandas
2. NumPy
3. Matplotlib.pyplot
4. Math
5. Tabulate
6. Scikit-learn (sklearn)
To start the coding part, I first imported all the required libraries and modules.
In the following step I defined a function, “assign_random_values”. This function takes two
inputs, ‘alphabet range’ and ‘value range’, and assigns a random value from the given range
to each individual letter, making sure that the values are unique for each letter.
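A minimal sketch of what such a function could look like is given here. The original implementation is not reproduced in this report, so the body is an assumption: it draws real-valued numbers (the report does not say whether the values are integers or floats), sorts the bounds so that a range written as (20, 11) still works, and adds a hypothetical `seed` parameter for reproducibility.

```python
import random
import string


def assign_random_values(alphabet_range, value_range, seed=None):
    """Map each letter to a distinct random float inside value_range.

    alphabet_range: iterable of letters, e.g. string.ascii_uppercase
    value_range: pair of bounds, in either order
    seed: hypothetical extra parameter, only for reproducible draws
    """
    low, high = sorted(value_range)
    if low == high:
        raise ValueError("value range must have a non-zero width")
    rng = random.Random(seed)
    assigned = {}
    used = set()
    for letter in alphabet_range:
        value = rng.uniform(low, high)
        # Redraw on the (very unlikely) event of a repeated value,
        # so that every letter ends up with a unique number.
        while value in used:
            value = rng.uniform(low, high)
        used.add(value)
        assigned[letter] = value
    return assigned


letter_values = assign_random_values(string.ascii_uppercase, (5, 9), seed=1)
print(len(letter_values))
```

The function returns a dictionary, so each letter acts as a key and its random value as the mapped value.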
After the values were assigned and mapped to each letter, the data looked as shown in the
table below.
After assigning these random values, I also assigned a unique index to each letter so that
the letters can be handled in numerical form.
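The indexing step could be sketched as follows. The exact scheme used in the original notebook is not shown, so the 1-based numbering (A -> 1, ..., Z -> 26) and the names `letter_index` and `index_letter` are assumptions made for illustration.

```python
import string

# Hypothetical indexing scheme: A -> 1, B -> 2, ..., Z -> 26.
letter_index = {
    letter: position
    for position, letter in enumerate(string.ascii_uppercase, start=1)
}

# The inverse mapping recovers the letter from its index.
index_letter = {position: letter for letter, position in letter_index.items()}

print(letter_index["A"], letter_index["Z"])
print(index_letter[3])
```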
Below is a visual representation of how these random values have been assigned to each
letter via its unique index.
In the next step, I passed the random values through several functions to obtain meaningful
function values, which are thereby mapped indirectly to each letter.
To choose these functions, I looked at their mathematical forms and at the properties that
would help the data be resolved and classified into multiple clusters.
Basic properties that I looked for while searching for the functions are as follows:
While searching, I had to keep the problem statement in mind, because a function can behave
very differently on the range we are working with than it does overall. A function may be
highly non-linear in general, but if our values lie on a part of it that has extremely limited
variation, the readability and the meaningfulness of the clusters will be hampered.
Initially I used a Sigmoid function and its derivative because of their mathematical
properties, which I have described below in detail.
Sigmoid Function
A sigmoid function is a bounded, differentiable, real function that is defined for all real
input values and has a non-negative derivative at each point and exactly one inflection
point. A sigmoid "function" and a sigmoid "curve" refer to the same object.
This function is also known as an “S” function because of the shape of its graph.
A sigmoid function is convex for values less than a particular point and concave for values
greater than that point; in many cases, that point is 0.
Since this function changes most around “x=0” and the ranges that we are taking here all lie
on the positive side of the axis, it is not the best choice for us, so let us look at a
different function.
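The saturation on positive inputs can be checked numerically. The sketch below assumes the common logistic form of the sigmoid, 1 / (1 + e^(-x)), and uses the range (5, 9) purely as an example of a positive range; the name `spread` is introduced here for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, assumed form: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# How much do the outputs vary across a positive range such as (5, 9)?
low, high = 5.0, 9.0
spread = sigmoid(high) - sigmoid(low)

print(f"sigmoid({low}) = {sigmoid(low):.5f}")
print(f"sigmoid({high}) = {sigmoid(high):.5f}")
print(f"spread = {spread:.5f}")
```

Across this range the outputs differ by less than 0.01, so letters with quite different random values end up with nearly identical sigmoid values.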
Below is the representation of the derivative function and its comparison with the Sigmoid
function.
1. As we can already see, the sigmoid function that we were using has only half of its
variation on the positive x-axis.
2. The derivative provides far better options for varying function values in the given ranges.
3. Since the derivative covers its whole range of values on the positive x-axis alone, the
relative difference between the values is much more noticeable than with the plain
sigmoid function in our case.
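The comparison in the list above can be illustrated with a short sketch. It again assumes the logistic sigmoid, whose derivative is sigmoid(x) * (1 - sigmoid(x)), and compares how strongly each function separates the two ends of the example range (5, 9); the variable names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, assumed form: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative of the logistic sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


low, high = 5.0, 9.0

# Ratio between the function values at the two ends of the range.
sigmoid_ratio = sigmoid(high) / sigmoid(low)
derivative_ratio = sigmoid_derivative(low) / sigmoid_derivative(high)

print(f"sigmoid ratio:    {sigmoid_ratio:.4f}")
print(f"derivative ratio: {derivative_ratio:.1f}")
```

The sigmoid values at the two ends are almost equal, while the derivative values differ by a factor of about 50, which is the relative difference that point 3 refers to.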
I then put these two function values into a table along with the other data, for
a better understanding of the data we have so far.
In the next step I used these function values (“Sigmoid values”, “Derivative of Sigmoid
values”) as two distinct features for each letter. Combining them, I created a vector space
that represents each letter by meaningful values, much like co-ordinates on a graph.
I used K-Means from the sklearn library to classify the letters into two clusters based on the
feature values in the vector space.
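A minimal sketch of this clustering step is shown here, assuming the logistic sigmoid and placeholder random values in the range (5, 9). The original notebook code is not reproduced in this report, so the variable names, the seed, and the `n_init` setting are illustrative choices rather than the settings actually used.

```python
import math
import random
import string

import numpy as np
from sklearn.cluster import KMeans


def sigmoid(x):
    """Logistic sigmoid, assumed form: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Placeholder random values standing in for the ones assigned earlier.
rng = random.Random(0)
values = {letter: rng.uniform(5, 9) for letter in string.ascii_uppercase}

# One 2-D feature vector per letter: (sigmoid value, derivative value).
features = np.array(
    [[sigmoid(v), sigmoid(v) * (1.0 - sigmoid(v))] for v in values.values()]
)

num_clusters = 2
model = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
labels = model.fit_predict(features)

# Map each letter to its cluster label and keep the fitted centroids.
clusters = {letter: int(label) for letter, label in zip(values, labels)}
centroids = model.cluster_centers_
print(clusters)
```

Changing `num_clusters` is the only adjustment needed to request a different number of clusters from the same feature matrix.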
I plotted the clusters formed from these two features to visualize the clustering.
Here the clustering has happened, but it is trivial because of the nature of the functions,
that is, the features, used to create the vector space. Both features change monotonically
with the same input value on the positive axis, so the points lie along a single curve and the
clusters simply split the range of values in two.
We can modify the mathematical form of the functions used here so as to obtain non-trivial
clusters for the range we are working with. We can also search for other functions that are
non-linear and varied over a very large domain of values, keeping the meaningfulness while
creating enough variation to form non-trivial clusters.
To make the data more varied and non-linear, I looked for a completely different function, one
that provides non-linearity in any given range with only slight modifications. The function
I used here is the “Exponentially decaying sinusoidal function.”
This function has a distinctive nature: it is oscillatory yet non-periodic, and its
exponentially decaying amplitude makes it a great option for this problem statement.
The mathematical function used here is f(t) = exp(-0.2×t) × sin(10×t). This function takes
“t” as input, and the values “0.2” and “10” inside the parentheses can be tuned
as needed.
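A small sketch of this function follows. It assumes the negative exponent implied by the name "exponentially decaying", and the parameter names `decay` and `frequency` are hypothetical labels for the two tunable constants 0.2 and 10.

```python
import math


def decaying_sinusoid(t, decay=0.2, frequency=10.0):
    """f(t) = exp(-decay * t) * sin(frequency * t)."""
    return math.exp(-decay * t) * math.sin(frequency * t)


# Sample the function over an example positive range, here (5, 9).
samples = [5.0 + 0.1 * step for step in range(41)]
outputs = [decaying_sinusoid(t) for t in samples]

# The exponential factor bounds the amplitude of the oscillation.
within_envelope = all(
    abs(y) <= math.exp(-0.2 * t) + 1e-12 for t, y in zip(samples, outputs)
)
print(within_envelope)
print(min(outputs) < 0 < max(outputs))
```

Unlike the sigmoid, this function takes both positive and negative values inside a narrow positive range, so nearby inputs can map to clearly different outputs.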
Some of the reasons why I have chosen this function particularly are:
I passed the random values through this function as well; the updated table is
shown below.
Further, I changed my vector space, using the derivative values as feature 1 and the
exponentially decaying sinusoid values as feature 2.
I then applied clustering to this new vector space, made up of (“Derivative of sigmoid
function”, “Exponentially decaying sinusoidal function”), and plotted the results.
This figure shows that the vector space formed using the decaying values exhibits more
non-linearity and that the clustering is better visualized in this space.
The red crosses in the figure are the centroids of the clusters; K-Means assigns each
data point to the cluster whose centroid is nearest.
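The role of the centroids can be illustrated with a plain-Python sketch of the assignment rule. The centroids and points below are made-up 2-D values chosen only for illustration, not results from this assignment, and `nearest_centroid` is a hypothetical helper rather than part of sklearn.

```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, centroid) for centroid in centroids]
    return distances.index(min(distances))


# Hypothetical 2-D centroids and data points, for illustration only.
centroids = [(0.0, 0.0), (1.0, 1.0)]
points = [(0.1, 0.2), (0.9, 0.8), (0.4, 0.3)]

labels = [nearest_centroid(point, centroids) for point in points]
print(labels)
```

K-Means alternates this assignment step with recomputing each centroid as the mean of its assigned points until the assignments stop changing.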
Since there are two more ranges in which the letters have to be clustered, I followed
the same procedure for them and have attached the plots obtained using the
second vector space.
STAGE – 2
To cluster into three clusters, I used the same approach as in the first part; the only
change is the parameter “num_clusters”, which is set to 3 instead of 2.
Conclusion:
In conclusion, this research pursued the task of clustering alphabets in a manner that extends
beyond mere functional value assignments. To overcome this challenge, a novel approach was
adopted, involving the construction of a mathematical vector space employing two distinct
functions.
This vector space facilitated the meaningful representation of each alphabet as a data point
with predefined values, imparting semantic significance to their positions. By subjecting the
entire alphabet set to a clustering algorithm, the study successfully revealed multiple clusters
that possess interpretative significance. These clusters provide valuable insights into the
underlying patterns and associations among the alphabets.
While this approach has shown promising results, the process of assigning contextual meaning
to the clusters still necessitates careful analysis and interpretation by researchers. As future
directions, further investigations may explore alternative data representations, clustering
techniques and better approaches to solve the assignment.