Python Special Assignment Solution Abhijeet

The document discusses solving a problem involving assigning random values to letters of the English alphabet such that the letters can be classified into two clusters after processing by a function. It describes the approach taken, which was to generate a 2D vector space using two distinct functions to give meaning to the clusters. Random values were assigned to letters and passed through functions to generate functional values, then a k-means clustering algorithm was used to classify the letters into two clusters.

Powered by Crisp Analytics

PYTHON SPECIAL ASSIGNMENT

Documentation

Abhijeet Singh

Table Of Contents

Problem
Approach
Python terminologies
Functions and their need
Sigmoid function
Derivative of Sigmoid function
Solution for the Anomaly
Exponentially decaying sinusoidal function
Stage 2
Conclusion

Problem
What values, within a certain range or boundary, could be assigned to each letter of
the English alphabet (from A to Z), allowing all the alphabet's letters to be classified
into two clusters after being processed by a function?

Conditions

1. The values may be of float type.

2. No two letters may be assigned the same value.

3. No letter may be present in more than one cluster.

4. Any function f(x): X -> Y may be used.

Ranges within which the values must lie:

1. (5,9)

2. (20, 11)

3. (8, 15)

Approach:

At first glance the problem statement seemed very straightforward, since clustering random
values is an easy task. Once I started working on it, however, the major issue I stumbled
upon was how to give meaning to the clusters rather than just assign each letter a
functional value.

Initially I clustered the letters using just one functional value, but when visualized this was
still nothing more than a mapping of random values through a function.

So, to give “MEANING” to the clusters and to the letters in them, I created a mathematical
vector space using two distinct functions. Each point in this space has a pre-defined pair of
values given by the two functions, and this gives meaning to any object placed in the space
by assigning it a unique vector.

I placed the complete set of letters in this vector space, which produced a vector for each
letter and hence a set of meaningful data points. I then applied the clustering algorithm to
group these points into meaningful clusters.

The steps I followed to solve this problem are mentioned below:

1. Selecting the range of letters and assigning them random values using the
random module in Python.
2. Passing these randomly generated values through mathematical functions to get a
meaningful set of numbers.
3. Surveying different mathematical functions and studying how their properties
can help in clustering the data.
4. Generating an n-dimensional vector space and assigning a unique vector to each
letter using these functional values. (Here I have used a 2-D vector space.)
5. Using a machine learning algorithm (here K-Means) to classify the letters into
multiple clusters. (Here the number of clusters is 2.)

Fig 1: Block diagram for the solution path

Python Terminologies:

Some common Python terms used in the code:

def: The def keyword is used to define a function. It is placed before a function name
chosen by the user to create a user-defined function.

Dictionary: A dictionary represents a mapping between two sets of things, where each key is
assigned a corresponding value. It is written as {'KEY': 'VALUE'}.

DataFrame: A DataFrame is a tabular data structure consisting of rows, columns, and cells
that hold the data entries in an easily readable format.

Clustering: Clustering is a set of techniques used to partition data into groups, or clusters.
Clusters are loosely defined as groups of data objects that are more similar to other objects in
their cluster than they are to data objects in other clusters.

Python modules and libraries used in the assignment are as follows:

1. Pandas
2. NumPy
3. Matplotlib.pyplot
4. Math
5. Tabulate
6. scikit-learn (sklearn)

To start the coding part, I first imported all the required libraries and modules.

Next, I defined a function, “assign_random_values”. This function takes two inputs, an
alphabet range and a value range, and assigns each letter a random value from the given
range, making sure that the value is unique for each letter.
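
A minimal sketch of what such a function might look like is given below. The original code is not reproduced in this document, so the signature, the seed parameter, and the handling of the bounds are assumptions made for illustration. In particular, the bounds are sorted so that a range written as (20, 11) is read as 11 to 20.

```python
import random
import string


def assign_random_values(alphabet_range, value_range, seed=None):
    """Map each letter in alphabet_range to a unique random float in value_range."""
    rng = random.Random(seed)  # seed is a hypothetical parameter, for repeatability
    # Assumption: sort the bounds so (20, 11) is treated as the interval 11..20.
    low, high = sorted(value_range)
    start, end = alphabet_range
    first = string.ascii_uppercase.index(start)
    last = string.ascii_uppercase.index(end)
    letters = string.ascii_uppercase[first:last + 1]

    values = {}
    used = set()
    for letter in letters:
        value = rng.uniform(low, high)
        # Redraw in the very unlikely event that the same float comes up twice.
        while value in used:
            value = rng.uniform(low, high)
        used.add(value)
        values[letter] = value
    return values


letter_values = assign_random_values(("A", "Z"), (5, 9), seed=42)
```

The result is a dictionary with one entry per letter, which matches the {'KEY': 'VALUE'} representation described earlier.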

After the values have been assigned and mapped onto each letter, the data in tabular form
is given below.

Table 1: Mapping of letters to initial random values

After assigning these random values, I also assigned a unique index to each letter so that
each letter can be handled in numerical form.
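
As an illustration, the table and the indices could be built with pandas roughly as follows. The column names and the three sample values are hypothetical placeholders, not the values from the assignment.

```python
import pandas as pd

# Hypothetical sample; in the assignment these come from the random assignment step.
letter_values = {"A": 6.21, "B": 8.47, "C": 5.09}

df = pd.DataFrame(
    {
        "Alphabet": list(letter_values.keys()),
        "Random value": list(letter_values.values()),
    }
)
# Give the first letter index 1, the second index 2, and so on.
df["Index"] = range(1, len(df) + 1)
```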

Here is a visual representation of how the random values have been assigned to each
letter via its unique index.

Plot 1: Indices vs Random values

In the next step, I passed the random values through several functions to obtain
meaningful functional values, each of which is thereby mapped to one of the
letters.

Functions and their need:

To choose these functions, I looked at their mathematical forms and at the
properties that would help the data be resolved and classified into multiple
clusters.

The basic properties I looked for while searching for functions are as follows:

1. Their graphical representation

2. Domain of the function
3. Variation in slopes
4. Mathematical nature (increasing, decreasing, monotonically increasing/decreasing)
5. Derivatives of the function
6. Linearity and Non-Linearity of the function

Reason for searching specific functions:

While searching I had to keep the problem statement in mind, because a function can
behave very differently over the range we are working with. A function may be highly
non-linear overall, but if our values fall on a part of it that varies very little, the
readability and meaningfulness of the clusters will suffer.

Initially I used a sigmoid function and the derivative of the sigmoid function because of
their mathematical properties, which I describe in detail below.

Sigmoid Function

A sigmoid function is a bounded, differentiable, real function that is defined for all real
input values and has a non-negative derivative at each point and exactly one inflection
point. A sigmoid "function" and a sigmoid "curve" refer to the same object.

This function is also known as the “S” function because of its S-shaped graph.

A sigmoid function is convex for values less than a particular point and concave for values
greater than that point; in many cases, that point is 0.

Fig 2: Sigmoid Function along with its mathematical equation

Since most of this function's variation occurs around “x=0”, and the ranges we are working
with all lie on the positive side of the axis, the sigmoid alone is not the best choice for us.
So let us look at a different function.

Derivative of Sigmoid function

Below is the representation of the derivative function and its comparison with the Sigmoid
function.

Fig 3: Comparison between sigmoid and its derivative

Reasons for picking the derivative function:

1. As we have already seen, only half of the sigmoid function's variation lies on the
positive x axis, and in our ranges the sigmoid is already close to its upper bound.
2. The derivative provides far more variation in functional values over the given ranges.
3. Its values are all positive and, for positive x, fall steadily towards zero, so the relative
difference between the values is far more noticeable than with the plain sigmoid
function in our case.
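
The comparison can be checked numerically with a short sketch. It assumes the logistic form of the sigmoid, s(x) = 1 / (1 + exp(-x)), whose derivative is s(x) × (1 - s(x)); the function names are chosen here for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Evaluate both functions at the ends and the middle of the range (5, 9).
for x in (5.0, 7.0, 9.0):
    print(f"x={x}: sigmoid={sigmoid(x):.6f}, derivative={sigmoid_derivative(x):.6f}")
```

Between x = 5 and x = 9 the sigmoid changes by less than 0.01, while the derivative shrinks by roughly a factor of fifty, which is why the derivative separates the values much more clearly in relative terms.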

I then combined these two sets of functional values with the other data in a table
to give a better understanding of the data we have so far.

Table 2: DataFrame with functional values

In the next step I used these functional values (“Sigmoid values” and “Derivative of
Sigmoid values”) as two distinct features for each letter. Combining them, I created a
vector space that represents each letter by a pair of meaningful values, much like
co-ordinates on a graph.

I used K-Means from the sklearn library to classify the letters into two clusters based on the
feature values in this vector space.

I then plotted the clusters formed using these two features for better visualization of the
clustering.
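
A minimal sketch of this clustering step, assuming the logistic sigmoid and scikit-learn's KMeans, is given below. The six input values are hypothetical stand-ins for the random values of a few letters, and the n_init and random_state settings are choices made here for repeatability.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical input values standing in for the random values of a few letters.
values = np.array([5.2, 5.9, 6.4, 7.1, 7.8, 8.6])

# Feature 1: sigmoid value. Feature 2: derivative of the sigmoid.
sig = 1.0 / (1.0 + np.exp(-values))
features = np.column_stack([sig, sig * (1.0 - sig)])  # one 2-D vector per letter

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)  # cluster label (0 or 1) for each letter
centroids = kmeans.cluster_centers_    # one 2-D centroid per cluster
```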

Plot 2: Clustering using Sigmoid function and its derivative

Here the clustering has happened, but it is trivial because of the nature of the functions used
as features: over these ranges both are monotonic in the input, so the points fall along a
single curve in the vector space.

Solution for this anomaly:

We could modify the mathematical form of the functions used here to get non-trivial
clusters for the ranges we are working with. We could also search for other functions
that stay non-linear and varied over a very large domain of values, preserving
meaningfulness while creating enough variation to form non-trivial clusters.

To make the data more varied and non-linear, I looked for a completely different function
that provides non-linearity in any given range with only slight modifications. The
function I used here is the “Exponentially decaying sinusoidal function.”

Exponentially decaying sinusoidal function:

This function has a distinctive nature: it is oscillatory yet non-periodic, and its
exponentially decaying amplitude makes it a great option for this problem statement.

The function used here is f(t) = exp(-0.2×t) × sin(10×t). It takes “t” as its input, and the
values “0.2” (the decay rate) and “10” (the angular frequency) inside the parentheses can be
tuned as needed.

The graph of this function is shown below.

Fig 4: Exponentially decaying sine function

Some of the reasons why I chose this function in particular are:

1. It provides the required non-linearity.
2. Its oscillatory nature makes the functional values span a larger range.
3. It has multiple parameters through which the function can be tuned to suit a very
wide variety of ranges.
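
A small sketch of this function with its two tunable parameters is given below. It assumes a negative exponent, so that the envelope decays as the name of the function suggests; the parameter names decay and frequency are chosen here for illustration.

```python
import math


def decaying_sinusoid(t, decay=0.2, frequency=10.0):
    """Exponentially decaying sinusoid: exp(-decay * t) * sin(frequency * t)."""
    return math.exp(-decay * t) * math.sin(frequency * t)


# Evaluate the function at a few points inside the range (5, 9).
samples = [decaying_sinusoid(t) for t in (5.0, 6.0, 7.0, 8.0, 9.0)]
```

A larger decay flattens the values faster, while a larger frequency makes nearby inputs land on very different parts of the oscillation.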

I passed the random values through this function as well; the updated table is
shown below.

Table 3: DataFrame with Decay functional values

Further, I changed my vector space, using the derivative values as feature 1 and the
exponentially decaying values as feature 2.

I then applied clustering to this new vector space, made up of (“Derivative of sigmoid
function”, “Exponentially decaying sinusoidal function”), and plotted the results.

Plot 3: Clustering using the derivative and decaying functions

This figure shows that the vector space formed using the decaying values exhibits
better non-linearity, and that the clustering is easier to visualize in this space.

The red crosses in the figure are the centroids of the clusters; K-Means assigns
each data point to the cluster whose centroid is nearest.

Since there are two more ranges in which the letters have to be clustered, I
followed the same procedure for them; the plots obtained using the second vector
space are attached below.

Plot 4: Clustering for range (20,11)

Plot 5: Clustering for range (8,15)

STAGE – 2

What values, within any presumptive range or boundary, might be assigned
to each letter of the English alphabet (letters A through Z), allowing all
alphabet letters to be sorted into three groups after undergoing any
function?

Conditions: the same as before, except that the range may be assumed in advance but must stay fixed.

To cluster the letters into three clusters, I used the same approach as in the first
part; the only change is the parameter “num_clusters”, which is set to 3
instead of 2.
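
The sketch below shows how a single parameter can switch between two and three clusters. It is an illustration only: the function name cluster_letters is hypothetical, the decaying function is assumed to have a negative exponent, and the 26 input values are freshly generated rather than taken from the assignment.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_letters(values, num_clusters, decay=0.2, frequency=10.0):
    """Cluster values in the (sigmoid derivative, decaying sinusoid) feature space."""
    values = np.asarray(values, dtype=float)
    sig = 1.0 / (1.0 + np.exp(-values))
    derivative = sig * (1.0 - sig)                                   # feature 1
    decaying = np.exp(-decay * values) * np.sin(frequency * values)  # feature 2
    features = np.column_stack([derivative, decaying])
    model = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(features)
    return labels, model.cluster_centers_


# 26 random values in the range (5, 9), one per letter (generated for this sketch).
rng = np.random.default_rng(0)
values = rng.uniform(5, 9, size=26)

labels2, centers2 = cluster_letters(values, num_clusters=2)
labels3, centers3 = cluster_letters(values, num_clusters=3)
```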

Plot 6: Clustering for range (5,9)

Plot 7: Clustering for range (20,11)

Plot 8: Clustering for range (8,15)

Conclusion:

In conclusion, this research pursued the task of clustering alphabets in a manner that extends
beyond mere functional value assignments. To overcome this challenge, a novel approach was
adopted, involving the construction of a mathematical vector space employing two distinct
functions.

This vector space facilitated the meaningful representation of each alphabet as a data point
with predefined values, imparting semantic significance to their positions. By subjecting the
entire alphabet set to a clustering algorithm, the study successfully revealed multiple clusters
that possess interpretative significance. These clusters provide valuable insights into the
underlying patterns and associations among the alphabets.

While this approach has shown promising results, the process of assigning contextual meaning
to the clusters still necessitates careful analysis and interpretation by researchers. As future
directions, further investigations may explore alternative data representations, clustering
techniques and better approaches to solve the assignment.
