A Review on Analysis of K-Nearest Neighbor Classification Machine Learning Algorithms based on Supervised Learning
A Review on Analysis of K-Nearest Neighbor Classification Machine Learning Algorithms based on Supervised Learning
Review Article
Received: 22 May 2022 Revised: 21 June 2022 Accepted: 27 June 2022 Published: 18 July 2022
Abstract - Machine learning is a small part of artificial intelligence. Machine learning is one of the most trending
technologies in the world today. Whatever you search on Google, Google takes your data and uses machine learning to show
your advertisement and search results accordingly. In the same way the type of video you watch on YouTube, YouTube also
recommends the same type of video to you. A machine learning system works on the principle that it has to take input data,
learn something from it, and give output. In machine learning, a machine learning computer program is trained by giving
input data, and output is produced based on that input data. The paper aims to determine how the K-Nearest Neighbor (KNN)
machine learning classification algorithm is applied to the model dataset and how the given data is predicted by the model to
which class this given data will exist. K-Nearest Neighbor (KNN) is the simplest machine learning algorithm based on
supervised learning. The K-NN algorithm is mostly used in solving the classification problem. Supervised learning is a type of
machine learning algorithm. In supervised learning, input and output data are already provided to the machine, so the
training data is also called labeled data. When a new input is given to the machine, it will give the output only according to its
previous experience and data.
Keywords – Artificial Intelligence, K-Nearest Neighbor (KNN) Classification Algorithm, Machine learning, Supervised
Learning Algorithm, K-Nearest Neighbor (KNN) Classification Algorithm, Labeled Data.
1. Introduction
Today all human beings live in that era of technology • Machine learning is currently used in other areas,
where new things are making the work of human beings including the financial sector, social media, robots,
easier and faster every day. One of these topics is machine automation, gaming application, etc.
learning, whose name you have probably heard. With the • In daily life, human beings use social media many
help of machine learning, a machine completes tasks based times in which machine learning is used. Facebook
on its learning, understanding, and experience. Nowadays, and Google show relevant advertisements to users
many programs online are being guided with the help of based on their past search activity and influence
machine learning so that these machines recognize the habits, video results on YouTube.
likes, and dislikes of humans. Now let us understand the fact • Machine learning techniques save time and produce
with the help of an example. If you want to understand the better results even with limited resources.
camera of a smartphone that the object being captured in the • Many source programs help to increase the
camera is a book on multiple subjects in front of the usefulness of algorithms through machine learning.
Smartphone’s camera, and in doing so, the Smartphone’s • It can handle multidimensional or multi-variety
algorithm will understand that the object of this texture, even when no dynamic and favourable conditions
design, and shape is a book. In the future, if any book is exist.
placed in front of this Smartphone. In this case, the
Computer experts first created the idea of creating
Smartphone will understand based on its experience that this
human-like thinking and learning computers in the 1950s. In
object is the book. Still, on the contrary, if a pencil is held in
the effort, in 1950, the first computer game was developed
front of the camera, then the Smartphone algorithm will be
that could beat the world champion player. Deep Blue
unable to get the name or information of that object because
Computer is one of the best examples of machine learning
the Smartphone has never received information about the
techniques, which defeated world champion Garry Kasparov
pencil.
(Chess Champion Player).
In the paper, the K-Nearest Neighbour classification The K-NN algorithm is considered a well-known pattern
algorithm has been described. K-NN algorithm is based on recognition method and is one of the most prominent text
supervised learning. Supervised learning is a type of machine classification algorithms. The K-NN machine learning
learning. Supervised learning can be understood in such a algorithm is based on supervised learning, one of the
way that a supervisor in supervised learning can also call the classification's simplest machine learning algorithms. This
teacher. Supervised is such learning in which train the paper explains the ideas related to the K-NN algorithm,
machine from the supervisor data or labelled data. Then after principles, and implementation steps in detail [4].
the training, the machine can predict the correct output for
any other input. In supervised learning, input and output data Imandoust and Bolandraftar [5] have analyzed the K-NN
are already provided to the machine, called training data or classification method, which is studied extensively for
labelled data. economic forecasts. The models for financial crisis
forecasting have been one of the most attractive areas in
The most important method of supervised learning is financial research.
classification; classification is used to analyze data in
supervised learning. The classification method is used for the In the study [6], expected in the present scenario, stock
prediction. The classification methods or algorithms divide price prediction has become a challenging research topic;
the data sets into categories. Here the data sets are divided currently, the stock markets are considered a great trading
into classes based on different parameters. For example, G- area. The stock market prices keep on fluctuating. Therefore,
mail divides its mail data into categories like Email, Spam, there is no loss to the inventors in the stock market, so the K-
Advertisement, Promotion, etc. So if you want to output the NN algorithm has been applied to the data of the companies
data into categories, then the classification algorithm will be in the stock market so that the inventors have minimum
used using the K-NN algorithm and solve the classification losses.
type problem. K Nearest Neighbour (KNN) is a supervised
learning algorithm that can perform classification tasks using The machine learning algorithms are majorly used for
neighbors' numbers (K). K- Nearest Neighbour is the various purposes like image processing, predictive analytics,
simplest machine learning algorithm based on supervised and image processing. One of the main advantages of using a
learning. The KNN algorithm is mostly used in solving the machine learning algorithm is what to do with the data; after
classification problem. that, it automatically performs its function very well [7].
2. Related Work The study [8] describes the survey paper and explains
After studying the review papers of the last 10 years to the essential definitions, principles, and methodologies of the
understand the work done in the previous year related to the supervised machine learning algorithm very well.
supervised machine learning algorithm. Reddy, and Ravi Babu [9], explain in the paper that
The study [2] proposes Muhammad. I, Machine learning supervised machine learning algorithm is mainly used for
can be implemented in the form of association analysis. It various types of image classification, predictive modelling,
will require supervised learning, unsupervised, and data mining technique, etc.
reinforcement learning, but a better understanding of the Bijalwan and Pascual [10] analyzed the supervised
strengths and weaknesses of the supervised learning machine learning classification techniques that are prominent
classification algorithms. The main objective of supervised in automatically classifying a set of text documents into
learning is to create a concise model that gives class label different categories from a predetermined set.
distributions in predictor features. In supervised learning,
train the model in such a way that trains the machine with Soofi, Awan [11] describes that data science uses data
labelled data. After the training, the machine can predict the mining techniques to predict the group members, which
correct output for any other input. come under a classification method. Many such classification
Several machine learning classifiers such as Decision techniques of machine learning are used for classification
Tree (DT), Naïve Bayes (NB), Support Vector Machine purposes. These include decision tree classification
(SVM), and Random Forest (RF) algorithm have been algorithms, K-NN, and support vector machines.
studied to predict heart problems accurately. In the present
scenario, machine learning is used in various fields in The computer algorithm is studied in machine learning.
today’s scenario. Machine learning is used for handwriting This computer is trained from experience, and after learning
recognition, medical diagnosis, and biometric recognition. from experience, the computer automatically improves itself.
Machine learning plays an important role in identifying The paper uses machine learning techniques to build and
diseases based on patient characteristics in the medical field. compare the classifier models, including Bayes network,
Doctors use machine learning software to diagnose various Logistic Regression, Decision Stump, J48, Random Forest,
diseases like cancer, cardiac arrest, and many more [3]. and Random Tree applied to agricultural data [12].
44
Manish Suyal & Parul Goyal / IJETT, 70(7), 43-48, 2022
In today’s time, many newspapers give us a lot of second is labelled data. An algorithm is used to train the ML
information, which can be divided into many categories like model and determine what kind of output to predict. Suppose
sports, politics, world, technology, etc. Text classification you have to train the ML model on how to recognize
has become very important today because classified data on a mangoes. So far, you have to use a supervised learning
large scale everywhere has become a problem for everyone. algorithm. Now you have to tell that ML model what the
Convolutional Neural Networks (CNN) is a very important mango looks like; for this, you will give feature data to the
approach for deep learning, which currently stands out as a ML Data( such as the mango colour is yellow, the shape is
strong competitor to other classification algorithms like a round, and the taste is sweet).
decision tree and K-NN [13].
Table 1. Supervised Learning Dataset-I
Karthikeya and Shetty [14] describe the prediction for
the crops in agriculture as a big problem for farmers because Feature Data Label Data
agriculture is always full of uncertainty due to climate Colour Shape Taste Fruit
change, rainfall, soil type, and other factors. The population Yellow Round Sweet Mango
of people is increasing daily due to the great need to produce
agricultural products to meet the people's demands. These
problems can be easily solved with machine learning When you have to give feature data to the ML model,
algorithms. using supervised first, it has been told that if something is an
input appearance, colour is yellow, the shape is round, and
The main objective of the research is to help doctors and taste is sweet. The output prediction will be mango because
patients to predict diabetes as soon as possible through you used feature and label data to train the machine learning
machine learning techniques. The prediction of the diagnosis system. So the algorithm is called a supervised learning
of diabetes is based on these algorithms of machine learning. algorithm.
K Nearest Neighbours (KNN), Naive Bayes (NB), Support
Table 2. Supervised Learning Dataset-II
Vector Machine (SVM), Decision Tree (DT), Logistic
Regression (LR), and Random Forest (RF) are prominent in Feature Data Label Data
these algorithms [15].
X1 X2 Class
3. Classification Methodology 10 100 Square
The most important method of supervised learning is 2 4 Root
Classification; classification is used to analyze data in
supervised learning. The classification method is used for the
If the value of X1 is 10 and X2 is 100 given to the above data
prediction. The classification methods or algorithms divide
set (Table 2), then the prediction of the output will be square.
the data sets into categories [16].
3.1. K-Nearest Neighbor Classification Algorithm
In supervised learning, input and output data are already
In classification, the output variables are called labels
provided to the machine, also called training data or labelled
or categories because this output is in categorical for, means
data [17]. According to the data, the machine gives its
output gets divided into two or more categories such that an
output, which completely depends on the quality of the
email can be categorized in two ways ( spam or not spam),
training data. If the data quality is good, the machine's output
categorizing age group into three parts, children, young and
will also improve. When a new input is given to the machine,
old. K-Nearest Neighbour is the simplest machine learning
it will output only according to its previous experience and
algorithm based on supervised learning. The KNN algorithm
data. In this algorithm, the machine applies what is written in
is mostly used in solving the classification problem. The
its past to the new data using the labelled example to predict
KNN classification algorithm need can be understood from
future events.
this example. Suppose we have to be given two categories,
i.e., category A and Category B, and have a new data point
Supervised learning can be understood with an example.
X1 and want to see which data point will lie in which of
With a supervised learning algorithm, a computer program or
these categories. To solve this problem and need a K-NN
Machine learning model is given certain datasets (such as
algorithm.
apple colour is red, weight 20 gram, round shape, and height
5 cm). Using these datasets, the computer program predicts
The category or class of any particular dataset can be
the outputs. The two types of datasets are given to the
easily solved with the help of the K-NN algorithm.
Machine learning model. The first is feature data and the
45
Manish Suyal & Parul Goyal / IJETT, 70(7), 43-48, 2022
(1)
X2
1 Category B
Y
B(X2, Y2)
Y2
A(X1, Y1)
Category A
Y1
X1
Fig. 1 Before K-NN 1
X1
X
X2
Fig. 3 Euclidian distance between A1 and B1
1
Category B
4. Experiment and Evaluation
In the simplest terms, the result of the K-NN algorithm
is as follows. The paper given a student dataset below in
which maths and computer science marks are given, and the
New Data Point result based on marks is also associated.
assigned to
Category A Table 3. Student Data Set
46
Manish Suyal & Parul Goyal / IJETT, 70(7), 43-48, 2022
References
[1] N. Kola, M. Kumar, “Supervised Learning Algorithms of Machine Learning: Prediction of Brand Loyalty,” International Journal of
Innovative Technology and Exploring Engineering (IJITEE), vol. 8, no. 11, pp. 3886-3889, 2019.
[2] I. Muhammad, Z. Yan, “Machine Learning Approaches: A Survey,” ICTACT Journal on Soft Computing, vol. 5, no. 3, pp. 946-952,
2015.
[3] A. Juyal, C. Pande, “Performance Analysis of Supervised Machine Learning Algorithms on Medical Dataset,” International Journal of
Recent Technology and Engineering (IJRTE), vol. 8, no. 6, pp. 1637-1642, 2020.
[4] J. Sun, W. DU, “A Survey of KNN Algorithm,” Information Engineering and Applied Computing, vol. 8, no. 11, pp. 1-10, 2018.
[5] B. S. Imandoust, M. Bolandraftar, “Application of K-Nearest Neighbor (KNN) Approach for Predicting Economic Events: Theoretical
Background,” Int. Journal of Engineering Research and Applications, vol. 3, no. 5, pp. 605-610, 2013.
[6] K. Alkhatib, K. Najadat, “Stock Price Prediction Using K-Nearest Neighbor (kNN) Algorithm,” International Journal of Business,
Humanities and Technology, vol. 3, no. 3, pp. 32-44, 2013.
47
Manish Suyal & Parul Goyal / IJETT, 70(7), 43-48, 2022
[7] B. Mahesh, “Machine Learning Algorithms- A Review,” International Journal of Science and Research (IJSR), vol. 9, no. 1, pp. 381-
386, 2018.
[8] N. Burkart, F.M. Huber, “A Survey on the Explain ability of Supervised Machine Learning,’ Journal of Artificial Intelligence Research,
vol. 70, no. 6, pp. 245-317, 2019.
[9] K.V. Reddy, R.U. Babu, “A Review on Classification Techniques in Machine Learning,” International Journal of Advance Research in
Science and Engineering, vol. 7, no. 3, pp. 40-47, 2018.
[10] V. Bijalwan, V. Kumari, “Machine Learning Approach for Text and Document Mining,” Research Gate, vol. 4, no. 5, pp. 1-9, 2019.
[11] A. Soofi, A. Awan, “Classification Techniques in Machine Learning: Application and Issues,” International Journal of Basic & Applied
Sciences, vol. 4, no. 13, pp. 459-465, 2017.
[12] A. Chaudhary, A. Kolhe, “Machine Learning Classification Techniques: A Company Study,” International Journal on Advanced
Computer Theory and Engineering (IJACTE), vol. 2, no. 4, pp. 21-25, 2013.
[13] J. Sreedevi, J. Bai, “Newspaper Article Classification using Machine Learning Techniques,” International Journal of Innovative
Technology and Exploring Engineering (IJITEE), vol. 9, no. 5, pp. 872-877, 2019.
[14] K. Karthikeya, K.H. Sudarshan, “Prediction of Agriculture Crops using KNN Algorithm,” International Journal of Innovative Science
and Research Technology, vol. 5, no. 5, pp. 1422-1424, 2019.
[15] N. Krishnamoorthy, N. Umarani, “Diabetes Prediction in Healthcare Using KNN Algorithm,” International Journal of Multidisciplinary
Educational Research, vol. 10, no. 5, pp. 36-39, 2021.
[16] M. Suyal, P. Goyal, “An Efficient Classifier Model for Opinion Mining to Analyze Drugs Satisfaction Among Patients,” Communications
in Computer and Information Science (CCIS), Springer Nature Switzerland AG, vol.1591, pp. 30-38, 2022. https://doi.org/10.1007/978-3-
031-07012-9_3
[17] M. Suyal, P. Goyal, “A Two-Phase Classifier Model for Predicting the Drug Satisfaction of the Patients Based on Their Sentiments,”
Communications in Computer and Information Science (CCIS), Springer Nature Switzerland AG. vol. 1591, pp. 79-89, 2022.
https://doi.org/10.1007/978-3-031-07012-9_7
48