Python Special Assignment Solution Abhijeet

The document discusses solving a problem involving assigning random values to letters of the English alphabet such that the letters can be classified into two clusters after processing by a function. It describes the approach taken, which was to generate a 2D vector space using two distinct functions to give meaning to the clusters. Random values were assigned to letters and passed through functions to generate functional values, then a k-means clustering algorithm was used to classify the letters into two clusters.

Uploaded by

militantmaverick
Copyright
© All Rights Reserved

Powered by Crisp Analytics

PYTHON SPECIAL ASSIGNMENT


Documentation

Abhijeet Singh

Table Of Contents

Problem
Approach
Python terminologies
Functions and their need
Sigmoid function
Derivative of Sigmoid function
Solution for the Anomaly
Exponentially decaying sinusoidal function
Stage 2
Conclusion

Problem
What values, within a certain range or boundary, could be assigned to each letter of
the English alphabet (from A to Z), allowing all the alphabet's letters to be classified
into two clusters after being processed by a function?

Conditions

1. The values may be of float type.

2. No two letters may be assigned the same value/number.

3. No letter may be present in both clusters.

4. Any function f(x): X->Y may be considered.

Ranges within which the values must lie:

1. (5,9)

2. (20, 11)

3. (8, 15)

Approach:

At first glance the problem statement seemed very straightforward, since clustering random
values is an easy task. As I started working on the problem, however, the major issue that I
stumbled upon was how to give meaning to the clusters rather than just assigning each letter
a functional value.

Initially I clustered the letters using just one functional value, but when visualized it was
still just a mapping of random values through a function.

So, to give “MEANING” to the clusters and the letters in them, I created a mathematical
vector space using two distinct functions. Each point in this vector space has a pre-defined
pair of values, given by the two functions that create the space, which gives meaning to any
object in that space by assigning it a unique vector.

I placed the complete set of letters in this vector space, which created a vector for each
letter and hence meaningful data points in the space. I then applied the clustering algorithm
to classify the points into multiple meaningful clusters.

The steps I followed to solve this problem are listed below:

1. Selecting the range of letters and assigning them random values using the
random module in Python.
2. Passing these randomly generated values through mathematical functions to get a
meaningful set of numbers.
3. Exploring different mathematical functions and learning how their properties
can help in clustering the data.
4. Generating an n-dimensional vector space and assigning a unique vector to each
letter using these functional values. (Here I have used a 2-D vector space.)
5. Using a machine learning algorithm (here K-Means) to classify the letters into
multiple clusters. (Here the number of clusters is 2.)

Fig 1: Block diagram for the solution path

Python Terminologies:

Some common Python terms used in the code:

def: The def keyword is used to define a function. It is placed before the function name
chosen by the user to create a user-defined function.

Dictionary: A dictionary represents a mapping between two sets of things, where each key is
assigned a corresponding value. It is written as {'KEY': 'VALUE'}.

DataFrame: A DataFrame is a tabular form of data consisting of rows, columns, and cells
that hold the data entries in a readable format.

Clustering: Clustering is a set of techniques used to partition data into groups, or clusters.
Clusters are loosely defined as groups of data objects that are more similar to other objects in
their cluster than they are to data objects in other clusters.
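
The terms above can be illustrated with a minimal sketch. The function name `square`, the three sample letters, and their values are made up for illustration and are not taken from the assignment code.

```python
import pandas as pd


def square(x):
    """A user-defined function created with the def keyword."""
    return x * x


# Dictionary: each key (a letter) is mapped to a value (a number).
# These three values are illustrative only.
letter_values = {"A": 5.2, "B": 7.9, "C": 6.4}

# DataFrame: the same mapping laid out as rows and columns.
df = pd.DataFrame(
    {
        "Alphabet": list(letter_values.keys()),
        "Value": list(letter_values.values()),
    }
)

# Apply the user-defined function to a whole column.
df["Squared"] = df["Value"].apply(square)
print(df)
```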

Python modules and libraries used in the assignment are as follows:

1. Pandas
2. NumPy
3. Matplotlib.pyplot
4. math
5. tabulate
6. scikit-learn (sklearn)

To start the coding part, I first imported all the required libraries and modules.

In the next step I defined a function, “assign_random_values”. It takes two inputs, the
alphabet range and the value range, and assigns a random value from the given range to each
letter, making sure that the value is unique for each letter.
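
The original code is not reproduced in this document, so the following is only a minimal sketch of what such a function might look like. The function name comes from the text; the parameter names, the `seed` argument, and the choice to treat a reversed range such as (20, 11) as the interval from 11 to 20 are assumptions.

```python
import random
import string


def assign_random_values(alphabet_range, value_range, seed=None):
    """Map each letter to a unique random float inside value_range."""
    rng = random.Random(seed)
    # Assumption: a range written as (20, 11) means the interval 11..20.
    low, high = sorted(value_range)
    values = {}
    used = set()
    for letter in alphabet_range:
        value = rng.uniform(low, high)
        # Redraw in the very unlikely event of a duplicate float.
        while value in used:
            value = rng.uniform(low, high)
        used.add(value)
        values[letter] = value
    return values


letter_values = assign_random_values(string.ascii_uppercase, (5, 9), seed=42)
for letter, value in list(letter_values.items())[:3]:
    print(letter, round(value, 4))
```

Using a dictionary keeps the letter-to-value mapping explicit, and the `used` set enforces the condition that no two letters share a value.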

After assigning the values and mapping them to the letters, the data in tabular form is
given below.

Table 1: Mapping of alphabets to initial random values

After assigning these random values, I also assigned a unique index to each letter so that
the letters can be handled in numerical form.

Below is a visual representation of how these random values have been assigned to each
letter via its unique index.

Plot 1: Indices vs Random values

In the next step, I passed the random values through multiple functions to get meaningful
functional values, which are indirectly mapped to each letter.

Functions and their need:

To choose these functions, I looked at their mathematical forms and at the properties
that would help the data to be further resolved and classified into multiple
clusters.

The basic properties I looked for while searching for functions are as follows:

1. Their graphical representation
2. Domain of the function
3. Variation in slopes
4. Mathematical nature (increasing, decreasing, monotonically increasing/decreasing)
5. Derivatives of the function
6. Linearity and non-linearity of the function

Reason for searching specific functions:


While searching I had to keep the problem statement in mind, because a function can behave
very differently over the range we are working on. A function may be highly non-linear
overall, but if our range of values falls on a part of the function that has extremely
limited variation, the readability and the meaningfulness of the clusters will be
hampered.

Initially I used a Sigmoid function and the derivative of the Sigmoid function due to their
mathematical properties which I have shared below in detail.

Sigmoid Function

A sigmoid function is a bounded, differentiable, real function that is defined for all real
input values and has a non-negative derivative at each point and exactly one inflection
point. A sigmoid "function" and a sigmoid "curve" refer to the same object.

This function is also known as “S” function due to its graphical representation.

A sigmoid function is convex for values less than a particular point and concave for values
greater than that point; in many cases, that point is 0.

Fig 2: Sigmoid Function along with its mathematical equation


This function changes most around “x=0”, while all the ranges we are using lie well to the
positive side of the axis, where the curve is almost flat. That makes it a poor fit for our
case, so let us look at a different function.
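
The flatness can be checked numerically. The sketch below assumes the standard logistic form sigmoid(x) = 1 / (1 + exp(-x)), which is the usual definition but is not spelled out in the surviving text, and evaluates it at the ends of the range (5, 9).

```python
import math


def sigmoid(x):
    """Standard logistic sigmoid (assumed form)."""
    return 1.0 / (1.0 + math.exp(-x))


low = sigmoid(5)   # approximately 0.9933
high = sigmoid(9)  # approximately 0.9999
spread = high - low

print(f"sigmoid(5) = {low:.6f}")
print(f"sigmoid(9) = {high:.6f}")
print(f"spread over (5, 9) = {spread:.6f}")
```

Across the whole range the output moves by less than 0.01, so the sigmoid values of all 26 letters end up packed very close together.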

Derivative of Sigmoid function

Below is the representation of the derivative function and its comparison with the Sigmoid
function.

Fig 3: Comparison between sigmoid and its derivative

Reasons for picking the derivative function:

1. As we have seen, only half of the sigmoid function's variation lies on the positive
x axis, and in our ranges the sigmoid is already close to 1.
2. The derivative provides far better options for varying functional values in the given ranges.
3. The derivative is positive everywhere and shrinks rapidly as x grows, so the relative
difference between its values is much more noticeable than for the plain
sigmoid function in our case.
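
A short numerical comparison, assuming the standard logistic sigmoid and its well-known derivative sigmoid(x) × (1 − sigmoid(x)), shows how much larger the relative variation of the derivative is over the range (5, 9). The function names are illustrative.

```python
import math


def sigmoid(x):
    """Standard logistic sigmoid (assumed form)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Ratio between the largest and smallest value over the range (5, 9).
sigmoid_ratio = sigmoid(9) / sigmoid(5)
derivative_ratio = sigmoid_derivative(5) / sigmoid_derivative(9)

print(f"sigmoid ratio over (5, 9):    {sigmoid_ratio:.4f}")
print(f"derivative ratio over (5, 9): {derivative_ratio:.1f}")
```

The sigmoid changes by under 1% across the range, while the derivative changes by a factor of more than 50, although its absolute values are small (roughly 0.0066 at x = 5 and 0.00012 at x = 9).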

I then combined these two functional values with the other data in a table, for
a better understanding of the data we have so far.

Table 2: DataFrame with functional values

In the next step I used these functional values (“Sigmoid values”, “Derivative of
Sigmoid values”) as two distinct features for each letter. Combining them, I created a
vector space that represents each letter with a meaningful pair of values, much like
co-ordinates on a graph.

I have used K-Means from sklearn library to classify the alphabets into two clusters based on the
feature values of the vector space.
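
A minimal sketch of this step is given below. It assumes the standard logistic sigmoid, draws its own random values in the range (5, 9) with a fixed seed, and uses `KMeans` from `sklearn.cluster`; the variable names and the `n_init` and `random_state` settings are choices made for this sketch, not values taken from the assignment code.

```python
import math
import random
import string

import numpy as np
from sklearn.cluster import KMeans


def sigmoid(x):
    """Standard logistic sigmoid (assumed form)."""
    return 1.0 / (1.0 + math.exp(-x))


letters = list(string.ascii_uppercase)
rng = random.Random(0)
values = [rng.uniform(5, 9) for _ in letters]

# Feature 1: sigmoid value, feature 2: derivative of the sigmoid.
features = np.array(
    [[sigmoid(v), sigmoid(v) * (1.0 - sigmoid(v))] for v in values]
)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

# Group the letters by the cluster label K-Means gave them.
clusters = {0: [], 1: []}
for letter, label in zip(letters, labels):
    clusters[int(label)].append(letter)

print(clusters)
print(kmeans.cluster_centers_)
```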

I have plotted the clusters formed using these two features for better visualization of the
clustering.

Plot 2: Clustering using Sigmoid function and its derivative

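Here the clustering has happened, but it is trivial because of the nature of the functions,
or features, used to create the vector space: the derivative can be written as
sigmoid(x) × (1 − sigmoid(x)), so the second feature is fully determined by the first and
all the points lie along a single curve.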

Solution for this anomaly:

We can modify the mathematical form of the functions used here to get non-trivial
clusters for the range we are working on. We can also search for other functions that
are non-linear and varied over a very large domain of values, keeping the
meaningfulness while creating enough variation to form non-trivial clusters.

To make the data more varied and non-linear, I looked for a completely different function
that can provide non-linearity in any given range with very slight modifications. The
function I have used here is the “Exponentially decaying sinusoidal function.”

Exponentially decaying sinusoidal function:

This function has a distinctive nature: it oscillates, yet it is not periodic because its
amplitude decays exponentially, which makes it a great option for this problem statement.

The mathematical function used here is f(t)=exp(-0.2×t) × sin(10×t), where the negative
exponent provides the decay. The function takes “t” as input, and the values “0.2” and “10”
inside the parentheses can be tuned as needed.
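
A minimal sketch of this function follows. It assumes that "exponentially decaying" means a negative exponent, exp(-0.2×t); the function name and the parameter names `decay` and `frequency` are chosen for this sketch.

```python
import math


def decaying_sinusoid(t, decay=0.2, frequency=10.0):
    """exp(-decay * t) * sin(frequency * t); both parameters are tunable."""
    return math.exp(-decay * t) * math.sin(frequency * t)


# Sample the function across the range (5, 9): the sign flips repeatedly
# while the envelope exp(-decay * t) shrinks.
for t in (5.0, 6.0, 7.0, 8.0, 9.0):
    envelope = math.exp(-0.2 * t)
    print(f"t={t:.1f}  f(t)={decaying_sinusoid(t):+.4f}  envelope={envelope:.4f}")
```

Raising `frequency` packs more oscillations into a given range, and raising `decay` shrinks the values faster, which is how the function can be adapted to each of the three ranges.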

Graphical representation for this function has been shown below,

Fig 4: Exponentially decaying sin function

Some of the reasons why I chose this function in particular are:

1. It provides the required non-linearity.
2. The oscillatory nature of the function makes the functional values span a larger range.
3. It has multiple parameters through which the function can be tuned to suit a very
wide variety of ranges.

I passed the random values through this function as well; the updated table is
shown below.

Table 3: DataFrame with Decay functional values

Further, I changed the vector space to use the derivative values as feature 1 and the
exponentially decaying values as feature 2.

I then applied clustering to this new vector space, made up of (“Derivative of sigmoid
function”, “Exponentially decaying sinusoidal function”), and plotted the results.

Plot 3: Clustering using the Derivative and Decaying functions

This figure shows that the vector space formed using the decaying values has better
non-linearity and that the clustering is easier to see in this space.

The red crosses in the figure are the centroids of the clusters; K-Means assigns each
data point to the cluster whose centroid is nearest to it.
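
The assignment rule can be sketched with NumPy alone. The four points and two centroids below are toy values invented for illustration; they are not data from the plots.

```python
import numpy as np

# Toy 2-D feature vectors and two centroids (illustrative values only).
points = np.array([[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.85, 0.95]])
centroids = np.array([[0.12, 0.22], [0.88, 0.88]])

# distances[i, j] is the Euclidean distance from point i to centroid j.
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)

# Each point is assigned to the cluster whose centroid is nearest.
labels = distances.argmin(axis=1)
print(labels)
```

K-Means alternates this assignment step with recomputing each centroid as the mean of its assigned points until the assignments stop changing.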

Since we have two more ranges in which we have to cluster the alphabets, I have
followed the same procedure and I have attached the plots obtained by using the
second vector space.

Plot 4: Clustering for range (20,11)

Plot 5: Clustering for range (8,15)

STAGE – 2

What values, within any presumptive range or boundary, might be assigned
to each letter of the English alphabet (letters A through Z), allowing all
alphabet letters to be sorted into three groups after undergoing any
function?

Conditions - same as previous, except range could be pre-assumed but fixed.

For clustering into three different clusters, I have used the same approach as used in the first
part and the only change is the parameter “num_clusters” which has been updated to be 3
instead of 2.
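
A sketch of how the whole procedure might be wrapped so that only “num_clusters” changes between the two stages is shown below. Apart from the parameter name “num_clusters”, which comes from the text, every name here is hypothetical. The sketch assumes the standard sigmoid derivative, a negative decay exponent, and that a reversed range such as (20, 11) means the interval from 11 to 20.

```python
import math
import random
import string

import numpy as np
from sklearn.cluster import KMeans


def sigmoid_derivative(x):
    """Derivative of the standard logistic sigmoid (assumed form)."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def decaying_sinusoid(t, decay=0.2, frequency=10.0):
    """Exponentially decaying sinusoid, assuming a negative exponent."""
    return math.exp(-decay * t) * math.sin(frequency * t)


def cluster_letters(value_range, num_clusters, seed=0):
    """Assign random values to A..Z and cluster them in the 2-D feature space."""
    rng = random.Random(seed)
    low, high = sorted(value_range)  # assumption: (20, 11) means 11..20
    letters = list(string.ascii_uppercase)
    # Duplicate floats are extremely unlikely; a full version would check.
    values = [rng.uniform(low, high) for _ in letters]
    features = np.array(
        [[sigmoid_derivative(v), decaying_sinusoid(v)] for v in values]
    )
    model = KMeans(n_clusters=num_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(features)
    return dict(zip(letters, labels.tolist()))


assignment = cluster_letters((5, 9), num_clusters=3)
print(assignment)
```

Calling the same function with `num_clusters=2` corresponds to the first stage, so the three ranges and both stages share one code path.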

Plot 6: Clustering for range (5,9)


Plot 7: Clustering for range (20,11)

Plot 8: Clustering for range (8,15)


Conclusion:

In conclusion, this research pursued the task of clustering alphabets in a manner that extends
beyond mere functional value assignments. To overcome this challenge, a novel approach was
adopted, involving the construction of a mathematical vector space employing two distinct
functions.

This vector space facilitated the meaningful representation of each alphabet as a data point
with predefined values, imparting semantic significance to their positions. By subjecting the
entire alphabet set to a clustering algorithm, the study successfully revealed multiple clusters
that possess interpretative significance. These clusters provide valuable insights into the
underlying patterns and associations among the alphabets.

While this approach has shown promising results, the process of assigning contextual meaning
to the clusters still necessitates careful analysis and interpretation by researchers. As future
directions, further investigations may explore alternative data representations, clustering
techniques and better approaches to solve the assignment.
