Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
6 views

A Review on Analysis of K-Nearest Neighbor Classification Machine Learning Algorithms based on Supervised Learning

This review article discusses the K-Nearest Neighbor (KNN) classification algorithm, a simple supervised learning method widely used for classification tasks in machine learning. It explains the principles, applications, and methodologies associated with KNN, highlighting its effectiveness in various fields such as finance, healthcare, and text classification. The paper also emphasizes the importance of labeled data in training machine learning models to predict outcomes based on input features.

Uploaded by

dipti.guptaa01
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views

A Review on Analysis of K-Nearest Neighbor Classification Machine Learning Algorithms based on Supervised Learning

This review article discusses the K-Nearest Neighbor (KNN) classification algorithm, a simple supervised learning method widely used for classification tasks in machine learning. It explains the principles, applications, and methodologies associated with KNN, highlighting its effectiveness in various fields such as finance, healthcare, and text classification. The paper also emphasizes the importance of labeled data in training machine learning models to predict outcomes based on input features.

Uploaded by

dipti.guptaa01
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

International Journal of Engineering Trends and Technology Volume 70 Issue 7, 43-48, July 2022

ISSN: 2231 – 5381 / https://doi.org/10.14445/22315381/IJETT-V70I7P205 © 2022 Seventh Sense Research Group®

Review Article

A Review on Analysis of K-Nearest Neighbor


Classification Machine Learning Algorithms based on
Supervised Learning
Manish Suyal1, Parul Goyal2
1,2
Department of CA & IT, SGRR University, Uttarakhand, India.
1suyal.manish922@gmail.com

Received: 22 May 2022 Revised: 21 June 2022 Accepted: 27 June 2022 Published: 18 July 2022

Abstract - Machine learning is a small part of artificial intelligence. Machine learning is one of the most trending
technologies in the world today. Whatever you search on Google, Google takes your data and uses machine learning to show
your advertisement and search results accordingly. In the same way the type of video you watch on YouTube, YouTube also
recommends the same type of video to you. A machine learning system works on the principle that it has to take input data,
learn something from it, and give output. In machine learning, a machine learning computer program is trained by giving
input data, and output is produced based on that input data. The paper aims to determine how the K-Nearest Neighbor (KNN)
machine learning classification algorithm is applied to the model dataset and how the given data is predicted by the model to
which class this given data will exist. K-Nearest Neighbor (KNN) is the simplest machine learning algorithm based on
supervised learning. The K-NN algorithm is mostly used in solving the classification problem. Supervised learning is a type of
machine learning algorithm. In supervised learning, input and output data are already provided to the machine, so the
training data is also called labeled data. When a new input is given to the machine, it will give the output only according to its
previous experience and data.

Keywords – Artificial Intelligence, K-Nearest Neighbor (KNN) Classification Algorithm, Machine learning, Supervised
Learning Algorithm, K-Nearest Neighbor (KNN) Classification Algorithm, Labeled Data.

1. Introduction
Today all human beings live in that era of technology • Machine learning is currently used in other areas,
where new things are making the work of human beings including the financial sector, social media, robots,
easier and faster every day. One of these topics is machine automation, gaming application, etc.
learning, whose name you have probably heard. With the • In daily life, human beings use social media many
help of machine learning, a machine completes tasks based times in which machine learning is used. Facebook
on its learning, understanding, and experience. Nowadays, and Google show relevant advertisements to users
many programs online are being guided with the help of based on their past search activity and influence
machine learning so that these machines recognize the habits, video results on YouTube.
likes, and dislikes of humans. Now let us understand the fact • Machine learning techniques save time and produce
with the help of an example. If you want to understand the better results even with limited resources.
camera of a smartphone that the object being captured in the • Many source programs help to increase the
camera is a book on multiple subjects in front of the usefulness of algorithms through machine learning.
Smartphone’s camera, and in doing so, the Smartphone’s • It can handle multidimensional or multi-variety
algorithm will understand that the object of this texture, even when no dynamic and favourable conditions
design, and shape is a book. In the future, if any book is exist.
placed in front of this Smartphone. In this case, the
Computer experts first created the idea of creating
Smartphone will understand based on its experience that this
human-like thinking and learning computers in the 1950s. In
object is the book. Still, on the contrary, if a pencil is held in
the effort, in 1950, the first computer game was developed
front of the camera, then the Smartphone algorithm will be
that could beat the world champion player. Deep Blue
unable to get the name or information of that object because
Computer is one of the best examples of machine learning
the Smartphone has never received information about the
techniques, which defeated world champion Garry Kasparov
pencil.
(Chess Champion Player).

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)


Manish Suyal & Parul Goyal / IJETT, 70(7), 43-48, 2022

In the paper, the K-Nearest Neighbour classification The K-NN algorithm is considered a well-known pattern
algorithm has been described. K-NN algorithm is based on recognition method and is one of the most prominent text
supervised learning. Supervised learning is a type of machine classification algorithms. The K-NN machine learning
learning. Supervised learning can be understood in such a algorithm is based on supervised learning, one of the
way that a supervisor in supervised learning can also call the classification's simplest machine learning algorithms. This
teacher. Supervised is such learning in which train the paper explains the ideas related to the K-NN algorithm,
machine from the supervisor data or labelled data. Then after principles, and implementation steps in detail [4].
the training, the machine can predict the correct output for
any other input. In supervised learning, input and output data Imandoust and Bolandraftar [5] have analyzed the K-NN
are already provided to the machine, called training data or classification method, which is studied extensively for
labelled data. economic forecasts. The models for financial crisis
forecasting have been one of the most attractive areas in
The most important method of supervised learning is financial research.
classification; classification is used to analyze data in
supervised learning. The classification method is used for the In the study [6], expected in the present scenario, stock
prediction. The classification methods or algorithms divide price prediction has become a challenging research topic;
the data sets into categories. Here the data sets are divided currently, the stock markets are considered a great trading
into classes based on different parameters. For example, G- area. The stock market prices keep on fluctuating. Therefore,
mail divides its mail data into categories like Email, Spam, there is no loss to the inventors in the stock market, so the K-
Advertisement, Promotion, etc. So if you want to output the NN algorithm has been applied to the data of the companies
data into categories, then the classification algorithm will be in the stock market so that the inventors have minimum
used using the K-NN algorithm and solve the classification losses.
type problem. K Nearest Neighbour (KNN) is a supervised
learning algorithm that can perform classification tasks using The machine learning algorithms are majorly used for
neighbors' numbers (K). K- Nearest Neighbour is the various purposes like image processing, predictive analytics,
simplest machine learning algorithm based on supervised and image processing. One of the main advantages of using a
learning. The KNN algorithm is mostly used in solving the machine learning algorithm is what to do with the data; after
classification problem. that, it automatically performs its function very well [7].

2. Related Work The study [8] describes the survey paper and explains
After studying the review papers of the last 10 years to the essential definitions, principles, and methodologies of the
understand the work done in the previous year related to the supervised machine learning algorithm very well.
supervised machine learning algorithm. Reddy, and Ravi Babu [9], explain in the paper that
The study [2] proposes Muhammad. I, Machine learning supervised machine learning algorithm is mainly used for
can be implemented in the form of association analysis. It various types of image classification, predictive modelling,
will require supervised learning, unsupervised, and data mining technique, etc.
reinforcement learning, but a better understanding of the Bijalwan and Pascual [10] analyzed the supervised
strengths and weaknesses of the supervised learning machine learning classification techniques that are prominent
classification algorithms. The main objective of supervised in automatically classifying a set of text documents into
learning is to create a concise model that gives class label different categories from a predetermined set.
distributions in predictor features. In supervised learning,
train the model in such a way that trains the machine with Soofi, Awan [11] describes that data science uses data
labelled data. After the training, the machine can predict the mining techniques to predict the group members, which
correct output for any other input. come under a classification method. Many such classification
Several machine learning classifiers such as Decision techniques of machine learning are used for classification
Tree (DT), Naïve Bayes (NB), Support Vector Machine purposes. These include decision tree classification
(SVM), and Random Forest (RF) algorithm have been algorithms, K-NN, and support vector machines.
studied to predict heart problems accurately. In the present
scenario, machine learning is used in various fields in The computer algorithm is studied in machine learning.
today’s scenario. Machine learning is used for handwriting This computer is trained from experience, and after learning
recognition, medical diagnosis, and biometric recognition. from experience, the computer automatically improves itself.
Machine learning plays an important role in identifying The paper uses machine learning techniques to build and
diseases based on patient characteristics in the medical field. compare the classifier models, including Bayes network,
Doctors use machine learning software to diagnose various Logistic Regression, Decision Stump, J48, Random Forest,
diseases like cancer, cardiac arrest, and many more [3]. and Random Tree applied to agricultural data [12].

44
Manish Suyal & Parul Goyal / IJETT, 70(7), 43-48, 2022

In today’s time, many newspapers give us a lot of second is labelled data. An algorithm is used to train the ML
information, which can be divided into many categories like model and determine what kind of output to predict. Suppose
sports, politics, world, technology, etc. Text classification you have to train the ML model on how to recognize
has become very important today because classified data on a mangoes. So far, you have to use a supervised learning
large scale everywhere has become a problem for everyone. algorithm. Now you have to tell that ML model what the
Convolutional Neural Networks (CNN) is a very important mango looks like; for this, you will give feature data to the
approach for deep learning, which currently stands out as a ML Data( such as the mango colour is yellow, the shape is
strong competitor to other classification algorithms like a round, and the taste is sweet).
decision tree and K-NN [13].
Table 1. Supervised Learning Dataset-I
Karthikeya and Shetty [14] describe the prediction for
the crops in agriculture as a big problem for farmers because Feature Data Label Data
agriculture is always full of uncertainty due to climate Colour Shape Taste Fruit
change, rainfall, soil type, and other factors. The population Yellow Round Sweet Mango
of people is increasing daily due to the great need to produce
agricultural products to meet the people's demands. These
problems can be easily solved with machine learning When you have to give feature data to the ML model,
algorithms. using supervised first, it has been told that if something is an
input appearance, colour is yellow, the shape is round, and
The main objective of the research is to help doctors and taste is sweet. The output prediction will be mango because
patients to predict diabetes as soon as possible through you used feature and label data to train the machine learning
machine learning techniques. The prediction of the diagnosis system. So the algorithm is called a supervised learning
of diabetes is based on these algorithms of machine learning. algorithm.
K Nearest Neighbours (KNN), Naive Bayes (NB), Support
Table 2. Supervised Learning Dataset-II
Vector Machine (SVM), Decision Tree (DT), Logistic
Regression (LR), and Random Forest (RF) are prominent in Feature Data Label Data
these algorithms [15].
X1 X2 Class
3. Classification Methodology 10 100 Square
The most important method of supervised learning is 2 4 Root
Classification; classification is used to analyze data in
supervised learning. The classification method is used for the
If the value of X1 is 10 and X2 is 100 given to the above data
prediction. The classification methods or algorithms divide
set (Table 2), then the prediction of the output will be square.
the data sets into categories [16].
3.1. K-Nearest Neighbor Classification Algorithm
In supervised learning, input and output data are already
In classification, the output variables are called labels
provided to the machine, also called training data or labelled
or categories because this output is in categorical for, means
data [17]. According to the data, the machine gives its
output gets divided into two or more categories such that an
output, which completely depends on the quality of the
email can be categorized in two ways ( spam or not spam),
training data. If the data quality is good, the machine's output
categorizing age group into three parts, children, young and
will also improve. When a new input is given to the machine,
old. K-Nearest Neighbour is the simplest machine learning
it will output only according to its previous experience and
algorithm based on supervised learning. The KNN algorithm
data. In this algorithm, the machine applies what is written in
is mostly used in solving the classification problem. The
its past to the new data using the labelled example to predict
KNN classification algorithm need can be understood from
future events.
this example. Suppose we have to be given two categories,
i.e., category A and Category B, and have a new data point
Supervised learning can be understood with an example.
X1 and want to see which data point will lie in which of
With a supervised learning algorithm, a computer program or
these categories. To solve this problem and need a K-NN
Machine learning model is given certain datasets (such as
algorithm.
apple colour is red, weight 20 gram, round shape, and height
5 cm). Using these datasets, the computer program predicts
The category or class of any particular dataset can be
the outputs. The two types of datasets are given to the
easily solved with the help of the K-NN algorithm.
Machine learning model. The first is feature data and the

45
Manish Suyal & Parul Goyal / IJETT, 70(7), 43-48, 2022

(1)
X2
1 Category B
Y

B(X2, Y2)

Y2

A(X1, Y1)
Category A
Y1

X1
Fig. 1 Before K-NN 1
X1
X
X2
Fig. 3 Euclidian distance between A1 and B1
1
Category B
4. Experiment and Evaluation
In the simplest terms, the result of the K-NN algorithm
is as follows. The paper given a student dataset below in
which maths and computer science marks are given, and the
New Data Point result based on marks is also associated.
assigned to
Category A Table 3. Student Data Set

Math Computer Result


Category A Science
4 3 Fail
6 7 Pass
X1 7 8 Pass
Fig. 2 After K-NN 5 5 Fail
1
The working steps of the K-NN algorithm can be 8 8 Pass
understood as follows.
Now have a query that says student X that X student got
Step-1: First of all, selects the number k of the 6 marks in maths and 8 marks in a computer science subject.
neighbours.
Step-2: Euclidean distance has to be calculated for the K Query Student (X)→ (Math=6, Computer Science=8)
number of neighbours.
Step-3: Take the K nearest neighbours per the calculated From the above query, determine whether this X student
Euclidean distance. passed or failed. To find this, I have to use the K-NN
Step-4: Among these K neighbours, count the number of algorithm's help.
data points in each category.
Step-5: New data points will be assigned to the category Suppose taken the value of K as 3. It means that
for which the number of neighbours is maximum. whatever the resulting value will come in this query should
Step-6: In this way, the K-NN classification model will be very much to 3 neighbours. The paper has to find that the
be ready. X query topples nearest to the neighbours of the tuple in the
dataset. For this, take the help of the Euclidian distance
The formula for finding the Euclidian distance is as formula.
follows.
Euclidian distance between A1 and B1= D= (2)

46
Manish Suyal & Parul Goyal / IJETT, 70(7), 43-48, 2022

Where is the first observed value, is the


second observed value of query tuple (new data point), and 5. Conclusion and Suggestions for Future Work
is the first actual value, is the second actual value of In the paper, supervised learning and the K-NN
query tuple (new data point). classification algorithm has been presented very easily. The
problem faced in understanding the supervised machine
From a distance calculated below and find the three learning and K-NN algorithm has been solved in this paper
nearest neighbours to X. For the given X → 4, 5, and 7 are by easy example methods. The paper mentioned that
the nearest neighbours, so consider these three neighbours supervised learning is that type of machine learning which
and not consider those neighbours (3, 6) who are distant works on labelled data. This type of machine is of a very
neighbours. basic type. In this, the data parameters have to be defined
with utmost care. In supervised learning, a small piece of
= 5.38 (3) data is trained and applied to large-scale data. In this paper, a
very small dataset has been used to explain the K-NN
= 1 (4) algorithm so that every user reading this paper can easily
know the working of the K-NN classification algorithm. In
= 1 (5) this paper, the K-NN classification algorithm is applied to
the student dataset in which the students' marks are given,
and their result is linked with the labeled data based on the
= 3.16 (6) pass and fail. Now give the marks of a new student to the
dataset. Then, the K-NN algorithm will determine whether
= 2 (7) the new student will pass or fail. The evaluation and
experimental results of the K-NN classification algorithm are
explained with a small dataset and effective technique. For
The result of the second (4) equation is passed, the third easy understanding of the K-NN algorithm, the K size is 3
(5) equation is passed, and similarly, the result of the fifth (7) (very small) in the K-NN classification algorithm. In the
equation is also passed in the dataset (Table 3). paper, the complete dataset is not given, which can show the
prediction of the result. In the paper, given the query of only
The result, which is in the dataset, shows two things pass one student X in the dataset, predict the result in which their
and fail. Either student X could have passed or the failure. math and computer science marks are given. The K-NN
classification algorithm easily detects whether student X will
In the context, I got 3 (Three) pass values from the pass or fail. This paper will work on supervised learning and
dataset (Table 3) based on the nearest neighbours value and its classification technique K-NN algorithm, and understand
got a 0 (Zero) fail value from the dataset. its impact on society. The paper implemented the K-NN
classification algorithm on labelled data in supervised
3 Pass > 0 Fail (8) learning. In the future, work on the unsupervised learning
and now work on the unlabeled data instead of working on
The meaning of equation (8) is the 3 (three) neighbours the labelled data.
had selected, all three (3) have passed, and 0 (Zero) failed.
The probability of passing is greater than that of failing. So
that statement will declare to student X that they will also
pass.

References
[1] N. Kola, M. Kumar, “Supervised Learning Algorithms of Machine Learning: Prediction of Brand Loyalty,” International Journal of
Innovative Technology and Exploring Engineering (IJITEE), vol. 8, no. 11, pp. 3886-3889, 2019.
[2] I. Muhammad, Z. Yan, “Machine Learning Approaches: A Survey,” ICTACT Journal on Soft Computing, vol. 5, no. 3, pp. 946-952,
2015.
[3] A. Juyal, C. Pande, “Performance Analysis of Supervised Machine Learning Algorithms on Medical Dataset,” International Journal of
Recent Technology and Engineering (IJRTE), vol. 8, no. 6, pp. 1637-1642, 2020.
[4] J. Sun, W. DU, “A Survey of KNN Algorithm,” Information Engineering and Applied Computing, vol. 8, no. 11, pp. 1-10, 2018.
[5] B. S. Imandoust, M. Bolandraftar, “Application of K-Nearest Neighbor (KNN) Approach for Predicting Economic Events: Theoretical
Background,” Int. Journal of Engineering Research and Applications, vol. 3, no. 5, pp. 605-610, 2013.
[6] K. Alkhatib, K. Najadat, “Stock Price Prediction Using K-Nearest Neighbor (kNN) Algorithm,” International Journal of Business,
Humanities and Technology, vol. 3, no. 3, pp. 32-44, 2013.

47
Manish Suyal & Parul Goyal / IJETT, 70(7), 43-48, 2022

[7] B. Mahesh, “Machine Learning Algorithms- A Review,” International Journal of Science and Research (IJSR), vol. 9, no. 1, pp. 381-
386, 2018.
[8] N. Burkart, F.M. Huber, “A Survey on the Explain ability of Supervised Machine Learning,’ Journal of Artificial Intelligence Research,
vol. 70, no. 6, pp. 245-317, 2019.
[9] K.V. Reddy, R.U. Babu, “A Review on Classification Techniques in Machine Learning,” International Journal of Advance Research in
Science and Engineering, vol. 7, no. 3, pp. 40-47, 2018.
[10] V. Bijalwan, V. Kumari, “Machine Learning Approach for Text and Document Mining,” Research Gate, vol. 4, no. 5, pp. 1-9, 2019.
[11] A. Soofi, A. Awan, “Classification Techniques in Machine Learning: Application and Issues,” International Journal of Basic & Applied
Sciences, vol. 4, no. 13, pp. 459-465, 2017.
[12] A. Chaudhary, A. Kolhe, “Machine Learning Classification Techniques: A Company Study,” International Journal on Advanced
Computer Theory and Engineering (IJACTE), vol. 2, no. 4, pp. 21-25, 2013.
[13] J. Sreedevi, J. Bai, “Newspaper Article Classification using Machine Learning Techniques,” International Journal of Innovative
Technology and Exploring Engineering (IJITEE), vol. 9, no. 5, pp. 872-877, 2019.
[14] K. Karthikeya, K.H. Sudarshan, “Prediction of Agriculture Crops using KNN Algorithm,” International Journal of Innovative Science
and Research Technology, vol. 5, no. 5, pp. 1422-1424, 2019.
[15] N. Krishnamoorthy, N. Umarani, “Diabetes Prediction in Healthcare Using KNN Algorithm,” International Journal of Multidisciplinary
Educational Research, vol. 10, no. 5, pp. 36-39, 2021.
[16] M. Suyal, P. Goyal, “An Efficient Classifier Model for Opinion Mining to Analyze Drugs Satisfaction Among Patients,” Communications
in Computer and Information Science (CCIS), Springer Nature Switzerland AG, vol.1591, pp. 30-38, 2022. https://doi.org/10.1007/978-3-
031-07012-9_3
[17] M. Suyal, P. Goyal, “A Two-Phase Classifier Model for Predicting the Drug Satisfaction of the Patients Based on Their Sentiments,”
Communications in Computer and Information Science (CCIS), Springer Nature Switzerland AG. vol. 1591, pp. 79-89, 2022.
https://doi.org/10.1007/978-3-031-07012-9_7

48

You might also like