Performance Analysis of Machine Learning Classifier For Predicting Chronic Kidney Disease
By
CHERUKURI JALAJAKSHI
(18A21F0002)
A.N.L.KUMAR
CERTIFICATE
This is to certify that this project work entitled “Performance Analysis of Machine
Learning Classifier for Predicting Chronic Kidney Disease” is the bonafide work of
MS. CHERUKURI JALAJAKSHI who carried out the work under my supervision,
and submitted in partial fulfilment of the requirements for the award of the degree,
Master of Computer Applications, during the academic year 2018-2021.
External Examiner
DECLARATION
I certify that:
• The work contained in this project is original and has been done by me under the guidance of my supervisor.
• The work has not been submitted to any other University for the award of any degree or diploma.
• The guidelines of the University are followed in writing this report.
Date: CH.JALAJAKSHI
(18A21F0002)
ACKNOWLEDGEMENT
I extend my heartfelt gratitude to the Almighty for giving me the strength to proceed with this project, titled “Performance Analysis of Machine Learning Classifier for Predicting Chronic Kidney Disease”.
I express my heartfelt gratitude to my parents for supporting me in all the ways in every
walk of my life.
I express my sincere thanks to Honorable Dr. S. Ramesh Babu, Secretary & Correspondent of our college, for making the necessary arrangements for doing this project.
I wish to express my gratitude to Dr. S. Suresh Kumar, principal of our college, for giving us
permission to carry out this project.
I express my sincere thanks to Mr. A.N.L. Kumar, Head of the Department of M.C.A. for
his learned suggestions and encouragement which made this project a successful one.
I convey my sincere thanks to my project guide Mr. A.N.L. Kumar, Associate Professor,
Department of MCA for his learned suggestions and encouragement which made this project
a success.
I express my sincere thanks to all the faculty members of our Department of MCA for their valuable support throughout this project work.
I wish to express my thanks to my friends for their enthusiasm, support and encouragement for the
completion of this project.
CH. JALAJAKSHI
Regd. No.18A21F0002
INDEX
S. NO CONTENTS
ACKNOWLEDGMENT
ABSTRACT
1 INTRODUCTION
2 LITERATURE SURVEY
3 SYSTEM ANALYSIS
3.1 Feasibility Study
3.2 Software Requirements Specifications
3.3 Functional Requirements
3.4 Non-Functional Requirements
3.5 Hardware Requirements
3.6 Software Requirements
3.7 Functional Model of the System
4 SYSTEM DESIGN
4.1 Architectural Design
4.2 Data Flow Diagram
4.3 UML Diagrams
4.4 Database Design
4.5 Life Cycle of Machine Learning
4.6 Algorithms
5 SYSTEM IMPLEMENTATION
5.1 Modules
6 CODING
7 SYSTEM TESTING
7.1 Techniques
7.2 Confusion Matrix
8 SCREENSHOTS
9 CONCLUSION
10 REFERENCE
ABSTRACT
Chronic Kidney Disease (CKD) is a chronic disease: it develops slowly over a period of time and persists for a long time thereafter. It is deadly at its end stage and can only be treated by kidney replacement or regular dialysis, an artificial filtering mechanism. It is important to identify CKD at an early stage so that the necessary treatments can be provided to prevent or cure the disease. The main focus of this paper is on classification techniques: decision tree, random forest, and logistic regression are analyzed. Different measures have been used to compare the algorithms on a dataset collected from the standard UCI repository.
1. INTRODUCTION
Chronic Kidney Disease (CKD) is a critical health condition worldwide and a major cause of adverse health outcomes, particularly in low- and middle-income countries, where millions die regularly due to lack of affordable treatment. As with any chronic disease, fatality is related to the stage the disease has reached before being treated. The high-risk factors of CKD are the increasing frequency of diabetes mellitus, hypertension, heart disease, and a family history of kidney failure. If CKD is left undetected, and therefore untreated, it can lead to hypertension and, in severe cases, to kidney failure. We procured a standard dataset for chronic kidney disease from the UCI Machine Learning Repository. CKD, if predicted early and accurately, can benefit patients in many ways: it increases the probability of a successful treatment while also adding years to the person's life. This work aims to predict kidney disease using selected machine learning algorithms and feature selection methods. The objective is to collect combinations of different features and use them as input to the machine learning algorithms. The algorithms have been implemented on the basis of the selected features, and their performances are then compared.
2. LITERATURE SURVEY
Automating the process of predicting diseases proves assistive and time-saving for practitioners in the field of medical diagnosis. The accurate prediction of any disease not only helps patients know about their health but also helps doctors with medication suggestions well in advance. In today's lifestyle, advance knowledge about health and proper care can add a number of living days to a patient's life. In this paper, the prediction of chronic kidney disease (CKD) is performed using individual and ensemble learners. The experiments are performed on the CKD dataset taken from the UCI repository. Three individual classifiers, namely Naive Bayes (NB), sequential minimal optimization (SMO) and J48, and three ensemble classifiers, namely Random Forest (RF), bagging and AdaBoost, are used for prediction. We have used the open-source Weka tool for all the experiments. The results are evaluated using accuracy, precision, recall, F-measure and ROC performance measures. The results suggest that the decision-tree-based individual learner (J48) and the random forest ensemble classifier perform better than the other classifiers.
The massive amount of data collected by the healthcare sector can be effective for analysis, diagnosis and decision making if it is mined properly. Hidden information extracted from the voluminous data can provide help and remedies to handle critical healthcare situations. Chronic kidney disease is a fatal illness of the kidney which can be prevented with early correct predictions and proper precautions. Data mining of the information collected from previously diagnosed patients has opened up a new phase of medical advancement. However, specific techniques must be executed to accomplish better results. In this manuscript, the capability of the Support Vector Machine, Decision Tree, Naive Bayes and K-Nearest Neighbor classification algorithms in analysing the chronic kidney disease dataset collected from the UCI repository was investigated to predict the presence of kidney disease. The dataset has been analyzed in terms of accuracy, Root Mean Squared Error, Mean Absolute Error and Receiver Operating Characteristic curve. In the present study, Decision Tree shows promising results when implemented through the WEKA data mining tool. A ranking algorithm provides vital improvements in classification with a proper number of attributes; 15 proved to be the best number of attributes for the given dataset, resulting in the highest improvement in accuracy.
Chronic kidney disease (CKD) is a hazardous disease affecting many people worldwide. Individuals with CKD are often unaware that the medical tests they undergo may provide useful information about CKD for other purposes, and this information may not be used effectively for disease diagnosis. The major problem with this disease is that it is hard to recognize until it reaches an advanced stage. In this paper, chronic kidney disease is predicted using machine learning algorithms such as decision tree, naïve Bayes classification, logistic regression (LR), support vector machine (SVM) and random forest. The disease was detected using the best suited method, and the most accurate result, 99.3%, was obtained using the random forest method.
Early detection and characterization are considered to be critical factors in the management
and control of chronic kidney disease. Herein, use of efficient data mining techniques is shown
to reveal and extract hidden information from clinical and laboratory patient data, which can
be helpful to assist physicians in maximizing accuracy for identification of disease
severity stage. The results of applying Probabilistic Neural Networks (PNN), Multilayer
Perceptron (MLP), Support Vector Machine (SVM) and Radial Basis Function (RBF)
algorithms have been compared, and our findings show that the PNN algorithm provides better
classification and prediction performance for determining severity stage in chronic kidney
disease.
Data mining has been a current trend for attaining diagnostic results. A huge amount of unmined data is collected by the healthcare industry in order to discover hidden information for effective diagnosis and decision making. Data mining is the process of extracting hidden information from a massive dataset, categorizing valid and unique patterns in the data. There are many data mining techniques, such as clustering, classification, association analysis and regression. The objective of our paper is to predict Chronic Kidney Disease (CKD) using classification techniques like Naive Bayes and Artificial Neural Network (ANN). The experimental results, implemented in the RapidMiner tool, show that Naive Bayes produces more accurate results than Artificial Neural Network.
3. SYSTEM ANALYSIS
Existing System:
The existing methodology combines ontology and machine learning for chronic kidney disease, using the versatile WEKA tool. It provides a chronic kidney disease decision-support instrument for handling errors and helps clinicians adequately distinguish acute kidney pain patients from those with other causes of kidney pain. Another machine learning procedure, a coronary artery disease method called the N2Genetic optimizer (a new genetic training approach), has also been presented in this methodology. Its outcomes are competitive and practically identical to the best results in the field.
Proposed System:
I worked on the chronic kidney disease dataset obtained from the UCI (University of California at Irvine) repository. The dataset contains attributes such as age, blood pressure, specific gravity, albumin, sugar, red blood cells, pus cell, pus cell clumps, bacteria, blood glucose random, blood urea, serum creatinine, sodium, potassium, haemoglobin, packed cell volume, white blood cell count, red blood cell count, hypertension, diabetes mellitus, coronary artery disease, appetite, pedal edema, anemia and label, with 400 instances. At the first level, the dataset is cleansed and processed using preprocessing techniques like data integration, data transformation, data reduction and data cleaning using the pandas tool. In the proposed framework, a total of 400 patient records were visualized. Data visualization techniques help the data scientist understand the feasibility of the dataset.
Advantages of Proposed System:
• The accuracy of the classifiers was calculated using the confusion matrix.
• The classifier which achieves the highest accuracy is determined to be the best classifier.
Feasibility Study:
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out. This is to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential. Three key considerations involved in the feasibility analysis are:
• Economical Feasibility
• Technical Feasibility
• Social Feasibility
Economical Feasibility:
This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, so the expenditures must be justified. The developed system is well within the budget, and this was achieved because most of the technologies used are freely available; only the customized products had to be purchased.
Technical Feasibility:
This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources; otherwise, high demands will be placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.
Social Feasibility:
This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users solely depends on the methods that are employed to educate the users about the system and to make them familiar with it. Their level of confidence must be raised so that they are also able to make some constructive criticism, which is welcomed, as they are the final users of the system.
Functional Requirements:
View previous data: This system allows patients to see their previous health data.
Predicted result: The system takes the input from the patient and predicts whether the patient has chronic kidney disease or not.

Non-Functional Requirements:
Non-functional requirements describe qualities of the system that apply across the modules, such as security, availability, usability and accuracy.
Security:
The system provides strong security to the users. Only authorized users can access the system; authorization is given by the admin to the users. The system has a priority role in keeping the secrecy of the user data: it helps in storing important files or documents, and nobody can directly access the documents.
Availability:
Availability describes how likely the system is to be accessible to a user at a given point in time. The required information will be available to the user instantly.
Usability:
The system should be easy for the user to learn and operate, and to prepare inputs and obtain outputs through interaction with it, so that the user can achieve the required goals effectively and efficiently.
Accuracy:
The level of accuracy is very high, and all operations are performed correctly. The system maintains its accuracy whether labeled or unlabeled data is used.
System Specification:
Hardware Requirements:
• RAM : 8 GB.
Software Requirements:
• Language : Python
• Database : SQLite.
3.7 Functional Model of the System:
1. Patient Registration:
Use case name: Patient Registration
Entry condition: The patient can register with valid details.
2. Login:
Entry condition: The patient and hospital superintendent must have a login id and password to enter into the system.
Flow of events:
Post condition: If the username and password are valid, the user successfully enters into the system.
3. Activate users:
Entry condition: The hospital superintendent enters the webpage and gives the correct email and password; then the hospital superintendent page opens.
Flow of Events:
4. Add data
Flow of Events:
Exceptional Flow of Events: If the patient enters invalid data, the data is not added.
Exit condition: The patient data is successfully added and the test result is obtained.
5. Predicted Results:
Entry condition: The patient and hospital superintendent enter the webpage and give the correct email and password; then the patient and hospital superintendent pages open.
Flow of Events: The patient's input is checked using the selected algorithms. The result is shown on a page along with the training accuracy.
Exit condition: The patient successfully learns the prediction result for the given input.
4. SYSTEM DESIGN
4.1 System Architecture:
A system architecture is the conceptual model that defines the structure, behaviour, and other views of a system. An architecture description is a formal description that shows the behaviour of the system. A system architecture contains the system components and the sub-system developments.
4.2 Data Flow Diagram:
The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on this data, and the output data generated by the system.

The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components: the system process, the data used by the process, the external entities that interact with the system, and the information flows in the system.

A DFD shows how information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output. A DFD may be used to represent a system at any level of abstraction, and may be partitioned into levels that represent increasing information flow and functional detail.
4.3 UML Diagrams:
UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling language in the field of object-oriented software engineering. The standard is managed, and was created by, the Object Management Group.

The goal is for UML to become a common language for creating models of object-oriented computer software. In its current form, UML comprises two major components: a meta-model and a notation. In the future, some form of method or process may also be added to, or associated with, UML.

The Unified Modeling Language is a standard language for specifying, visualizing, constructing and documenting the artifacts of software systems, as well as for business modeling and other non-software systems.

The UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems.

The UML is a very important part of developing object-oriented software and the software development process. The UML uses mostly graphical notations to express the design of software projects.
GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling language so that they can develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks, patterns and components.
7. Integrate best practices.
Class diagram:
In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of static structure diagram that describes the structure of a system by showing the system's classes, their attributes, operations (or methods), and the relationships among the classes. It shows which classes contain which information.
Class diagram
Sequence diagram:
Sequence diagram
Activity diagram:
Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modeling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of
components in a system. An activity diagram shows the overall flow of control.
Activity Diagram
4.4 Database Design
Normalization:
Normalization Rules:
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• BCNF

Second Normal Form (2NF):
To be in second normal form, a relation must be in first normal form and must not contain any partial dependency. A relation is in 2NF if it has no partial dependency, i.e., no non-prime attribute (an attribute which is not part of any candidate key) is dependent on any proper subset of any candidate key of the table. Partial dependency: if a proper subset of a candidate key determines a non-prime attribute, it is called a partial dependency.

Third Normal Form (3NF):
A relation is in third normal form if it is in second normal form and there is no transitive dependency for non-prime attributes. A relation is in 3NF if at least one of the following conditions holds for every non-trivial functional dependency X -> Y:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).

BCNF:
A relation R is in BCNF if R is in third normal form and, for every FD, the LHS is a super key.
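The 2NF rule above can be illustrated with a small Python sketch. The relation, candidate key and functional dependencies below are hypothetical examples chosen for illustration; they are not taken from the project's database design.

```python
# hypothetical relation: Enrollment(student, course, fee, grade)
# candidate key {student, course}; 'fee' depends on 'course' alone
candidate_key = {"student", "course"}
fds = [
    ({"course"}, "fee"),               # LHS is a proper subset of the key
    ({"student", "course"}, "grade"),  # LHS is the whole key, so it is fine
]

def partial_dependencies(candidate_key, fds):
    """Return FDs whose LHS is a proper subset of the candidate key and whose
    RHS is a non-prime attribute: the 2NF violation described above."""
    return [(lhs, rhs) for lhs, rhs in fds
            if lhs < candidate_key and rhs not in candidate_key]

print(partial_dependencies(candidate_key, fds))  # [({'course'}, 'fee')]
```

The flagged dependency course -> fee is exactly the partial dependency that would have to be moved to its own table to reach 2NF.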
Table:
4.5 Life Cycle of Machine Learning:
1. Understanding the objective
2. Data collection
3. Data preprocessing
4. Model selection
5. Model import
6. Model implementation
7. Data visualization
4.6 Algorithms:

Decision Tree:
Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome.
In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node. Decision
nodes are used to make any decision and have multiple branches, whereas Leaf nodes are the
output of those decisions and do not contain any further branches. The decisions or the test are
performed on the basis of features of the given dataset. It is a graphical representation for
getting all the possible solutions to a problem/decision based on given conditions.
It is called a decision tree because, similar to a tree, it starts with the root node, which expands on further branches and constructs a tree-like structure. To build a tree, we use the CART algorithm, which stands for Classification and Regression Tree algorithm. A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
Why use Decision Trees:
There are various algorithms in machine learning, so choosing the best algorithm for the given dataset and problem is the main point to remember while creating a machine learning model. Below are the main reasons for using the decision tree:
• Decision trees usually mimic the human thinking ability while making a decision, so they are easy to understand; the logic behind the decision tree can be easily understood because it shows a tree-like structure.
• It is simple to understand, as it follows the same process which a human follows while making any decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think about all the possible outcomes for a problem.
• There is less requirement for data cleaning compared to other algorithms.
Steps to implement the Decision Tree algorithm:
In a decision tree, for predicting the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the record's attribute and, based on the comparison, follows the corresponding branch and jumps to the next node. For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues the process until it reaches a leaf node of the tree. The complete process can be summarized by the following steps:
Step 1: For implementing any algorithm, we need a dataset, so during the first step of the decision tree we load the training as well as the test data.
Step 2: Begin the tree with the root node, say S, which contains the complete dataset, and find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step 3: Divide S into subsets that contain the possible values of the best attribute.
Step 4: Generate the decision tree node which contains the best attribute.
Step 5: Recursively make new decision trees using the subsets of the dataset created in Step 3. Continue this process until a stage is reached where the nodes cannot be classified further; these final nodes are called leaf nodes.
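The attribute-selection step (Step 2) can be sketched in plain Python using Gini impurity as the ASM. The toy records below are hypothetical and only illustrate the idea; they are not the project's actual implementation or data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_attribute(rows):
    """Step 2: choose the attribute with the lowest weighted Gini impurity."""
    n = len(rows)
    best, best_score = None, float("inf")
    for attr in rows[0][0]:
        groups = {}                 # label lists grouped by attribute value
        for features, label in rows:
            groups.setdefault(features[attr], []).append(label)
        score = sum(len(g) / n * gini(g) for g in groups.values())
        if score < best_score:
            best, best_score = attr, score
    return best

# hypothetical toy records: (features, class label)
rows = [
    ({"hypertension": "yes", "diabetes": "no"},  "ckd"),
    ({"hypertension": "yes", "diabetes": "yes"}, "ckd"),
    ({"hypertension": "no",  "diabetes": "yes"}, "notckd"),
    ({"hypertension": "no",  "diabetes": "no"},  "notckd"),
]
print(best_attribute(rows))  # hypertension separates the classes perfectly
```

The chosen attribute becomes the root node; recursing on each subset (Steps 3 to 5) grows the rest of the tree.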
Random Forest:
Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both classification and regression problems in ML. It is based on the concept of ensemble learning, which is a process of combining multiple classifiers to solve a complex problem and to improve the performance of the model.

As the name suggests, a random forest is a classifier that contains a number of decision trees built on various subsets of the given dataset. Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of the predictions, predicts the final output. A greater number of trees in the forest leads to higher accuracy and prevents the problem of overfitting.
The below diagram explains the working of the Random Forest algorithm:
Advantages of Random Forest:
• Random Forest is capable of performing both Classification and Regression tasks.
• It is capable of handling large datasets with high dimensionality.
• It enhances the accuracy of the model and prevents the overfitting issue.

Working of Random Forest:
• First, we randomly select ‘p’ features from the total ‘q’ features (where p << q).
• From the selected ‘p’ features, we calculate a node, referred to as ‘d’, using the best split point method.
• Using the best split, we split the node into daughter nodes.
• We repeat the above steps until a number ‘l’ of nodes is reached.
• A forest of ‘n’ trees is built by applying the above steps ‘n’ times.
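The bootstrap-and-majority-vote idea described above can be sketched with one-level "trees" in plain Python. This is an illustrative toy, not the project's actual Random Forest code, and the records and attribute names are hypothetical.

```python
import random
from collections import Counter

random.seed(0)  # deterministic sampling for the illustration

def bootstrap(rows):
    """Sample len(rows) records with replacement: one random subset per tree."""
    return [random.choice(rows) for _ in rows]

def train_stump(rows, attr):
    """A one-level 'tree': the majority label for each value of one feature."""
    by_value = {}
    for features, label in rows:
        by_value.setdefault(features[attr], []).append(label)
    return {v: Counter(ls).most_common(1)[0][0] for v, ls in by_value.items()}

def forest_predict(stumps, features):
    """Majority vote across all trees, as described above."""
    votes = [stump.get(features[attr], "notckd") for attr, stump in stumps]
    return Counter(votes).most_common(1)[0][0]

# hypothetical toy records (attribute names only mimic the CKD dataset)
rows = [
    ({"hypertension": "yes", "anemia": "yes"}, "ckd"),
    ({"hypertension": "yes", "anemia": "no"},  "ckd"),
    ({"hypertension": "no",  "anemia": "yes"}, "ckd"),
    ({"hypertension": "no",  "anemia": "no"},  "notckd"),
]

# each tree sees a bootstrap sample and one randomly selected feature
stumps = []
for _ in range(5):
    attr = random.choice(["hypertension", "anemia"])
    stumps.append((attr, train_stump(bootstrap(rows), attr)))

print(forest_predict(stumps, {"hypertension": "yes", "anemia": "yes"}))
```

A real random forest grows full decision trees rather than stumps, but the aggregation step, majority voting over trees trained on random subsets, is the same.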
Logistic Regression:
Logistic Regression is a supervised classification algorithm used when the target variable is categorical. In simple words, the dependent variable is binary in nature, having data coded as either 1 (stands for success/yes) or 0 (stands for failure/no).
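A minimal logistic-regression sketch in plain Python, fitting the 0/1 coding described above by gradient descent on a hypothetical one-feature toy dataset (the report's own implementation is not shown here):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# hypothetical toy data: a single feature (say, serum creatinine) with the
# binary coding described above: 1 = ckd, 0 = not ckd
xs = [0.8, 1.0, 1.2, 2.5, 3.0, 4.1]
ys = [0, 0, 0, 1, 1, 1]

# plain stochastic gradient descent on the log-loss
w, b = 0.0, 0.0
learning_rate = 0.5
for _ in range(2000):
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)        # predicted probability of class 1
        w -= learning_rate * (p - y) * x
        b -= learning_rate * (p - y)

def predict(x):
    """Classify as 1 when the predicted probability reaches 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

print([predict(x) for x in xs])
```

The sigmoid squashes the linear score w*x + b into a probability, and the 0.5 threshold turns that probability back into the 1/0 coding.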
5. IMPLEMENTATION
5.1 Modules:
• Patient
• Hospital superintendent
• Data Preprocessing
• Machine Learning
Modules Description:
Patient:
The patient registers first. While registering, a valid email and mobile number are required for further communications. Once the patient registers, the hospital superintendent can activate the user; only then can the patient log into our system. The patient can upload a dataset whose columns match our dataset. For algorithm execution, the data must be in int or float format. Here we took the UCI repository dataset for testing purposes. The patient can also add new data to the existing dataset through our Django application. The patient can click Data Preparations on the web page, so that the data cleaning process starts. The cleaned data and the required graphs will be displayed.
Hospital superintendent:
The hospital superintendent can log in with his login details and activate the registered users; only after activation can a patient log into our system. The hospital superintendent can view the overall data in the browser. He can also check each algorithm's ROC curve, confusion matrix and accuracy. The comparison accuracy bar graph is also displayed here. Once all algorithm executions are complete, the hospital superintendent can see the overall accuracy on the web page.
Data Preprocessing:
A dataset can be viewed as a collection of data objects, which are often also called records, points, vectors, patterns, events, cases, samples, observations, or entities. Data objects are described by a number of features that capture the basic characteristics of an object, such as the mass of a physical object or the time at which an event occurred. Features are often called variables, characteristics, fields, attributes, or dimensions. The data preprocessing in this work uses techniques like removal of noise in the data, removal of missing information, modifying default values where relevant, and grouping of attributes for prediction at various levels.
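A minimal sketch of the missing-value handling described above. The UCI CKD file marks missing entries with "?"; the mean-imputation choice and the toy rows here are assumptions for illustration, since the report does not specify which imputation it used.

```python
# hypothetical rows mimicking the UCI CKD file, which marks missing values "?"
rows = [
    {"age": "48", "bp": "80", "class": "ckd"},
    {"age": "?",  "bp": "70", "class": "ckd"},
    {"age": "62", "bp": "?",  "class": "notckd"},
]

def impute_mean(rows, column):
    """Replace '?' in a numeric column with the mean of the known values."""
    known = [float(r[column]) for r in rows if r[column] != "?"]
    mean = sum(known) / len(known)
    for r in rows:
        r[column] = float(r[column]) if r[column] != "?" else mean
    return rows

for column in ("age", "bp"):
    impute_mean(rows, column)

print(rows[1]["age"], rows[2]["bp"])  # 55.0 75.0 (the two imputed means)
```

After this pass every numeric column is in float format, which the report notes is required before the algorithms can run.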
Machine learning:
Based on the split criterion, the cleansed data is split into 60% training and 40% test, and the dataset is then subjected to three machine learning classifiers: Logistic Regression (LR) with a pipeline, Decision Tree (DT) and Random Forest (RF). The accuracy of the classifiers was calculated using the confusion matrix; the classifier which achieves the highest accuracy is determined to be the best classifier. For each algorithm, the confusion matrix, ROC curve and accuracy have been calculated and displayed in the results.
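The accuracy-from-confusion-matrix computation can be sketched as follows; the actual and predicted labels below are a hypothetical toy example, not the project's results.

```python
def confusion_matrix(actual, predicted, positive="ckd"):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)

# hypothetical test-set labels and one classifier's predictions
actual    = ["ckd", "ckd", "notckd", "ckd", "notckd"]
predicted = ["ckd", "notckd", "notckd", "ckd", "ckd"]

tp, fp, tn, fn = confusion_matrix(actual, predicted)
print(tp, fp, tn, fn)            # 2 1 1 1
print(accuracy(tp, fp, tn, fn))  # 3 of 5 correct -> 0.6
```

Running the same computation on each classifier's predictions and comparing the accuracies is exactly the selection rule described above.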
Python
Python is a general-purpose interpreted, interactive, object-oriented, high-level programming language. As an interpreted language, Python has a design philosophy that emphasizes code readability (notably using whitespace indentation to delimit code blocks rather than curly brackets or keywords), and a syntax that allows programmers to express concepts in fewer lines of code than might be used in languages such as C++ or Java. It provides constructs that enable clear programming on both small and large scales. Python interpreters are available for many operating systems. CPython, the reference implementation of Python, is open-source software and has a community-based development model, as do nearly all of its variant implementations. CPython is managed by the non-profit Python Software Foundation. Python features a dynamic type system and automatic memory management. It supports multiple programming paradigms, including object-oriented, imperative, functional and procedural, and has a large and comprehensive standard library.
Interactive Mode Programming
Invoking the interpreter without passing a script file as a parameter brings up the following
prompt −
$ python
>>>
Type the following text at the Python prompt and press Enter −

print "Hello, Python!"

If you are running a newer version of Python (Python 3), you need to use the print statement with parentheses, as in print("Hello, Python!"). However, in Python version 2.4.3, this produces the following result −

Hello, Python!
Invoking the interpreter with a script parameter begins execution of the script and continues
until the script is finished. When the script is finished, the interpreter is no longer active.
Let us write a simple Python program in a script. Python files have the extension .py. Type the following source code in a test.py file −

print "Hello, Python!"

We assume that you have the Python interpreter set in the PATH variable. Now, try to run this program as follows −

$ python test.py
Hello, Python!
Let us try another way to execute a Python script. Here is the modified test.py file −

#!/usr/bin/python
print "Hello, Python!"

We assume that you have the Python interpreter available in the /usr/bin directory. Now, try to run this program as follows −

$ ./test.py
Hello, Python!
Python Identifiers
A Python identifier is a name used to identify a variable, function, class, module or other object.
An identifier starts with a letter A to Z or a to z or an underscore (_) followed by zero or more
letters, underscores and digits (0 to 9).
Python does not allow punctuation characters such as @, $, and % within identifiers. Python is
a case sensitive programming language. Thus, Manpower and manpower are two different
identifiers in Python.
Class names start with an uppercase letter. All other identifiers start with a lowercase letter.
Starting an identifier with a single leading underscore indicates that the identifier is private.
Starting an identifier with two leading underscores indicates a strongly private identifier.
If the identifier also ends with two trailing underscores, the identifier is a language-defined
special name.
Reserved Words
The following list shows the Python keywords. These are reserved words and you cannot use
them as constant or variable or any other identifier names. All the Python keywords contain
lowercase letters only:
and, assert, break, class, continue, def, del, elif, else, except, exec, finally, for, from,
global, if, import, in, is, lambda, not, or, pass, print, raise, return, try, while, with, yield
DJANGO
Django is a high-level Python Web framework that encourages rapid development and
clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle
of Web development, so you can focus on writing your app without needing to reinvent the
wheel. It’s free and open source.
Django's primary goal is to ease the creation of complex, database-driven websites. Django
emphasizes reusability and "pluggability" of components, rapid development, and the principle
of don't repeat yourself. Python is used throughout, even for settings files and data models.
Django also provides an optional administrative create, read, update and delete (CRUD)
interface that is generated dynamically through introspection and configured via admin models.
Create a Project
Whether you are on Windows or Linux, just get a terminal or a cmd prompt, navigate to the
place you want your project to be created, and then use this command −
$ django-admin startproject myproject
This will create a "myproject" folder with the following structure −
myproject/
manage.py
myproject/
__init__.py
settings.py
urls.py
wsgi.py
The “myproject” folder is just your project container; it actually contains two elements −
manage.py − This file is a kind of local django-admin for interacting with your project via the
command line (start the development server, sync the db, ...). To get a full list of
commands accessible via manage.py, you can use this command −
$ python manage.py help
The “myproject” subfolder − This folder is the actual Python package of your project. It
contains four files −
__init__.py − Marks the folder as a Python package.
settings.py − Your project settings and configuration.
urls.py − All links of your project and the functions to call. A kind of table of contents of
your project.
wsgi.py − The entry point for WSGI-compatible web servers to serve your project.
Your project is configured in the subfolder myproject/settings.py. Following are some important
options you might need to set −
DEBUG = True
This option lets you set whether your project is in debug mode or not. Debug mode gives you
more information about your project's errors. Never set it to ‘True’ for a live project. However,
this has to be set to ‘True’ if you want the Django development server to serve static files. Do
it only in development mode.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'database.sql',
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '',
    }
}
The database is set in the 'DATABASES' dictionary. The example above is for the SQLite engine. As stated
earlier, Django also supports −
MySQL (django.db.backends.mysql)
PostGreSQL (django.db.backends.postgresql_psycopg2)
MongoDB (django_mongodb_engine)
Before setting any new engine, make sure you have the correct db driver installed.
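For instance, a MySQL configuration would look like the sketch below; the database name, credentials, host and port are placeholders to adapt to your own setup:

```python
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mydatabase',       # placeholder database name
        'USER': 'myuser',           # placeholder credentials
        'PASSWORD': 'mypassword',
        'HOST': 'localhost',
        'PORT': '3306',             # MySQL's default port
    }
}
```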
You can also set other options like TIME_ZONE, LANGUAGE_CODE, TEMPLATE…
Now that your project is created and configured, make sure it's working −
$ python manage.py runserver
You will get something like the following on running the above command −
Validating models...
0 errors found
A project is a sum of many applications. Every application has an objective and can be reused
in another project. For example, the contact form on a website can be an application that is
reused for other sites. See it as a module of your project.
6. CODING
urls.py
url(r'^admins/', admin.site.urls),
url(r'^logout/',fstapp.logout,name="logout"),
path('UploadCSVToDataBase/',user.UploadCSVToDataBase,name='UploadCSVToDataBase'),
path('BrowseCSV/',user.BrowseCSV,name='BrowseCSV'),
path('UserDataView/',user.UserDataView,name='UserDataView'),
path('UserAddData/',user.UserAddData,name='UserAddData'),
views.py:
import io
import csv
def UserRegister(request):
    form = UserRegistrationForm()
    return render(request, 'user/Register.html', {'form': form})

def UserRegisterAction(request):
    if request.method == 'POST':
        form = UserRegistrationForm(request.POST)
        if form.is_valid():
            print('Data is Valid')
            form.save()
            # return HttpResponseRedirect('./CustLogin')
            form = UserRegistrationForm()
        else:
            print("Invalid form")
    else:
        form = UserRegistrationForm()
    return render(request, 'user/Register.html', {'form': form})
def UserLogin(request):
    return render(request, 'user/UserLogin.html', {})

def UserLoginCheck(request):
    if request.method == "POST":
        loginid = request.POST.get('loginname')
        pswd = request.POST.get('pswd')
        try:
            # look up the registered user by login id and password
            check = UserRegistrationModel.objects.get(loginid=loginid, password=pswd)
            status = check.status
            if status == "activated":
                request.session['id'] = check.id
                request.session['loggeduser'] = check.name
                request.session['loginid'] = loginid
                request.session['email'] = check.email
                # template names follow the app's user/ pattern
                return render(request, 'user/UserHome.html', {})
            else:
                print('User account is not activated')
        except Exception as e:
            pass
    return render(request, 'user/UserLogin.html', {})
def UserDataView(request):
    data_list = livearDataModel.objects.all()
    page = request.GET.get('page', 1)
    # Paginator, PageNotAnInteger and EmptyPage come from django.core.paginator
    paginator = Paginator(data_list, 10)
    try:
        users = paginator.page(page)
    except PageNotAnInteger:
        users = paginator.page(1)
    except EmptyPage:
        users = paginator.page(paginator.num_pages)
    return render(request, 'user/UserDataView.html', {'users': users})
def BrowseCSV(request):
return render(request,'user/BrowseCsv.html',{})
def UploadCSVToDataBase(request):
    # declaring template
    template = "users/UserHomePage.html"
    data = HearDataModel.objects.all()
    # prompt is a context variable that can have different values depending on their context
    prompt = {
        'order': 'Order of the CSV should be name, email, address, phone, profile',
        'profiles': data
    }
    # a GET request just renders the upload page; the file arrives via POST
    if request.method == "GET":
        return render(request, template, prompt)
    csv_file = request.FILES['file']
    if not csv_file.name.endswith('.csv'):
        messages.error(request, 'This is not a csv file')
    data_set = csv_file.read().decode('UTF-8')
    # set up a stream so that, as we loop through each line, we can handle the data in a stream
    io_string = io.StringIO(data_set)
    next(io_string)  # skip the header row
    for column in csv.reader(io_string, delimiter=',', quotechar='"'):
        _, created = livearDataModel.objects.update_or_create(
            age=column[0],
            bp=column[1],
            sg=column[2],
            al=column[3],
            su=column[4],
            rbc=column[5],
            pc=column[6],
            pcc=column[7],
            ba=column[8],
            bgr=column[9],
            bu=column[10],
            sc=column[11],
            sod=column[12],
            pot=column[13],
            hemo=column[14],
            pcv=column[15],
            wbcc=column[16],
            rbcc=column[17],
            htn=column[18],
            dm=column[19],
            cad=column[20],
            appet=column[21],
            pe=column[22],
            ane=column[23],
            class1=column[24],
        )
    return render(request, template, {})
def UserAddData(request):
    if request.method == 'POST':
        form = livearDataModelForm(request.POST)
        if form.is_valid():
            print('Data is Valid')
            form.save()
            # return HttpResponseRedirect('./CustLogin')
            form = livearDataModelForm()
        else:
            print("Invalid form")
    else:
        form = livearDataModelForm()
    # template name follows the app's user/ pattern
    return render(request, 'user/UserAddData.html', {'form': form})
def knn(request):
    df = pd.read_csv('./chronic_kidney_disease_full.csv')
    data = df
    data.head()
    data['pcc'] = data['pcc'].map({'present': 1, 'notpresent': 0})
    data['class'].value_counts()
    # plt.figure(figsize=(19, 19))
    print(data.shape)
    print(data.isnull().sum())
    print(data.shape[0], data.dropna().shape[0])
    data.dropna(inplace=True)
    X = data.iloc[:, :-1]
    y = data['class']
    # split the data into training and test sets
    # (train_test_split from sklearn.model_selection)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    print("prediction", y_pred)
    # accuracy_score comes from sklearn.metrics
    acc = accuracy_score(y_test, y_pred)
    print(acc)
    # return render(request,'decision.html',{"acc":acc})
from sklearn.metrics import confusion_matrix, mean_absolute_error, mean_squared_error, f1_score, precision_score, recall_score
plt.show()
models.py:
class UserRegistrationModel(models.Model):
    name = models.CharField(max_length=100)
    loginid = models.CharField(unique=True, max_length=100)
    password = models.CharField(max_length=100)
    mobile = models.CharField(max_length=100)
    email = models.CharField(max_length=100)
    locality = models.CharField(max_length=100)
    address = models.CharField(max_length=1000)
    city = models.CharField(max_length=100)
    state = models.CharField(max_length=100)
    status = models.CharField(max_length=100)

    def __str__(self):
        return self.loginid

    class Meta:
        db_table = 'Users'
class HearDataModel(models.Model):
    age = models.IntegerField()
    sex = models.IntegerField()
    cp = models.IntegerField()
    trestbps = models.IntegerField()
    chol = models.IntegerField()
    fbs = models.IntegerField()
    restecg = models.IntegerField()
    thalach = models.IntegerField()
    exang = models.IntegerField()
    oldpeak = models.FloatField()
    slope = models.IntegerField()
    ca = models.IntegerField()
    thal = models.IntegerField()
    target = models.IntegerField()

    def __str__(self):
        return str(self.id)  # __str__ must return a string

    class Meta:
        db_table = 'HeartDatabase'
class livearDataModel(models.Model):
    age = models.CharField(max_length=50)
    bp = models.CharField(max_length=50)
    sg = models.CharField(max_length=50)
    al = models.CharField(max_length=50)
    su = models.CharField(max_length=50)
    rbc = models.CharField(max_length=50)
    pc = models.CharField(max_length=50)
    pcc = models.CharField(max_length=50)
    ba = models.CharField(max_length=50)
    bgr = models.CharField(max_length=50)
    bu = models.CharField(max_length=50)
    sc = models.CharField(max_length=50)
    sod = models.CharField(max_length=50)
    pot = models.CharField(max_length=50)
    hemo = models.CharField(max_length=50)
    pcv = models.CharField(max_length=50)
    wbcc = models.CharField(max_length=50)
    rbcc = models.CharField(max_length=50)
    htn = models.CharField(max_length=50)
    dm = models.CharField(max_length=50)
    cad = models.CharField(max_length=50)
    appet = models.CharField(max_length=50)
    pe = models.CharField(max_length=50)
    ane = models.CharField(max_length=50)
    class1 = models.CharField(max_length=50)

    def __str__(self):
        return str(self.id)  # __str__ must return a string

    class Meta:
        db_table = 'livearDatabase'
forms.py:
class UserRegistrationForm(forms.ModelForm):
    name = forms.CharField(widget=forms.TextInput(attrs={'pattern': '[a-zA-Z]+'}), required=True, max_length=100)
    loginid = forms.CharField(widget=forms.TextInput(attrs={'pattern': '[a-zA-Z]+'}), required=True, max_length=100)
    password = forms.CharField(widget=forms.PasswordInput(attrs={'pattern': '(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}', 'title': 'Must contain at least one number and one uppercase and lowercase letter, and at least 8 or more characters'}), required=True, max_length=100)
    mobile = forms.CharField(widget=forms.TextInput(attrs={'pattern': '[56789][0-9]{9}'}), required=True, max_length=100)
    email = forms.CharField(widget=forms.TextInput(attrs={'pattern': '[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$'}), required=True, max_length=100)
    city = forms.CharField(widget=forms.TextInput(attrs={'class': 'form-control', 'autocomplete': 'off', 'pattern': '[A-Za-z ]+', 'title': 'Enter Characters Only'}), required=True, max_length=100)
    state = forms.CharField(widget=forms.TextInput(attrs={'class': 'form-control', 'autocomplete': 'off', 'pattern': '[A-Za-z ]+', 'title': 'Enter Characters Only'}), required=True, max_length=100)

    class Meta:
        model = UserRegistrationModel
        fields = '__all__'
class livearDataModelForm(forms.ModelForm):
    age = forms.CharField(max_length=50)
    bp = forms.CharField(max_length=50)
    sg = forms.CharField(max_length=50)
    al = forms.CharField(max_length=50)
    su = forms.CharField(max_length=50)
    rbc = forms.CharField(max_length=50)
    pc = forms.CharField(max_length=50)
    pcc = forms.CharField(max_length=50)
    ba = forms.CharField(max_length=50)
    bgr = forms.CharField(max_length=50)
    bu = forms.CharField(max_length=50)
    sc = forms.CharField(max_length=50)
    sod = forms.CharField(max_length=50)
    pot = forms.CharField(max_length=50)
    hemo = forms.CharField(max_length=50)
    pcv = forms.CharField(max_length=50)
    wbcc = forms.CharField(max_length=50)
    rbcc = forms.CharField(max_length=50)
    htn = forms.CharField(max_length=50)
    dm = forms.CharField(max_length=50)
    cad = forms.CharField(max_length=50)
    appet = forms.CharField(max_length=50)
    pe = forms.CharField(max_length=50)
    ane = forms.CharField(max_length=50)
    class1 = forms.CharField(max_length=50)

    class Meta:
        model = livearDataModel
        fields = '__all__'
adminbase.html:
{% load static %}
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
</head>
<body>
<div class="container">
</a>
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarToggler">
</div>
</li>
</li>
</li>
</li>
</li>
</li>
</li>
</li>
<li class="nav-item">
</li>
</ul>
</div>
</div>
</div>
</nav>
<main class="bg-light">
<div class="hero-caption">
<div class="col-lg-6">
<h3 class="mb-4 fw-medium">Performance Analysis of Machine Learning Classifier for
Predicting Chronic Kidney Disease</h3>
</ol>
</nav>-->
</div>
</div>
</div>
</div>
</div><br><br>
{% block contents %}
{% endblock %}
<div class="container">
<div class="col-lg-8">
<div class="card-page">-->
<hr>
<p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et
accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus
est</p>
<p>Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam
voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren,
no sea takimata sanctus est Lorem ipsum dolor sit amet.
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
tempor</p>-->
<embed class="embed-video"
src="https://www.youtube.com/embed/k1D0_wFlXgo?list=PLl-
K7zZEsYLmnJ_FpMOZgyg6XcIGBu2OX">
</div>
<p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et
accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus
est Lorem ipsum dolor sit amet.</p>
</div>
<hr>
<div class="team-item">
<div class="team-avatar">
</div>
</div>
</div>
<div class="team-item">
<div class="team-avatar">
</div>
</div>
</div>
<div class="team-item">
<div class="team-avatar">
</div>
<h5 class="team-name">Ellysa</h5>
</div>
</div>
</div>
</div>
<h5 class="fg-primary">Partners</h5>
<hr>
<div class="p-3">
</div>
</div>
<div class="p-3">
</div>
</div>
<div class="p-3">
</div>
</div>
<div class="p-3">
<img src="../assets/img/clients/global_tv.png" alt="">
</div>
</div>
<div class="p-3">
</div>
</div>
<div class="p-3">
</div>
</div>
<div class="p-3">
</div>
</div>
<div class="p-3">
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="container">
<h5 class="mb-3">Pages</h5>
<ul class="menu-link">
</ul>
</div>
<h5 class="mb-3">Company</h5>
<ul class="menu-link">
<li><a href="#" class="">Leadership</a></li>
</ul>
</div>
<h5 class="mb-3">Contact</h5>
<ul class="menu-link">
</ul>
</div>
<h5 class="mb-3">Subscribe</h5>
<form method="POST">
<div class="input-group">
<div class="input-group-append">
</div>
</div>
</form>-->
</div>
</div>
</div>
</div>
<hr>
<div class="container">
<div class="row">
<!-- <p class="d-inline-block ml-2">Copyright © <a
href="https://www.macodeid.com/" class="fg-white fw-medium">MACode ID</a>. All
rights reserved</p>
</div>
</ul>
</div>
</div>
</div>
</div>-->
</body>
</html>
7. SYSTEM TEST
The purpose of testing is to discover errors. Testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the functionality
of components, subassemblies, assemblies and/or a finished product. It is the process of
exercising software with the intent of ensuring that the software system meets its requirements
and user expectations and does not fail in an unacceptable manner. There are various types of
test; each test type addresses a specific testing requirement.
7.1 Types of Tests
Unit testing
Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly and that program inputs produce valid outputs. All decision branches and
internal code flow should be validated. It is the testing of individual software units of the
application; it is done after the completion of an individual unit and before integration. This is
structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests
perform basic tests at the component level and test a specific business process, application,
and/or system configuration. Unit tests ensure that each unique path of a business process
performs accurately to the documented specifications and contains clearly defined inputs and
expected results.
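As a minimal sketch of the idea, a unit test exercises one unit in isolation and checks each path against its expected result; the function and test case below are illustrative, not part of the project code:

```python
import unittest

def classify_result(prediction):
    """Illustrative unit under test: map a numeric model output to a label."""
    return "ckd" if prediction == 1 else "notckd"

class ClassifyResultTest(unittest.TestCase):
    # Each test method validates one decision branch of the unit.
    def test_positive_class(self):
        self.assertEqual(classify_result(1), "ckd")

    def test_negative_class(self):
        self.assertEqual(classify_result(0), "notckd")

# Run the tests programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClassifyResultTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```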
Integration testing
Integration tests are designed to test integrated software components to determine whether they
actually run as one program. Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the components were
individually satisfactory, as shown by successful unit testing, the combination of components
is correct and consistent. Integration testing is specifically aimed at exposing the problems
that arise from the combination of components.
Functional test
Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input: identified classes of valid input must be accepted.
Output: identified classes of application outputs must be exercised.
System Test
System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration oriented system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.
Unit testing is usually conducted as part of a combined code and unit test phase of the software
lifecycle, although it is not uncommon for coding and unit testing to be conducted as two
distinct phases.
Field testing will be performed manually and functional tests will be written in detail.
Test objectives
Software integration testing is the incremental integration testing of two or more integrated
software components on a single platform to produce failures caused by interface defects.
The task of the integration test is to check that components or software applications, e.g.
components in a software system or – one step up – software applications at the company level
– interact without error.
Test Results:
All the test cases mentioned above passed successfully. No defects encountered.
Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant participation
by the end user. It also ensures that the system meets the functional requirements.
Test Results:
All the test cases mentioned above passed successfully. No defects encountered.
Test Cases

S.No | Test Case | Expected Result | Result | Remarks (If Fails)
1 | User Register | User registers successfully. | Pass | If the user's email already exists, registration fails.
2 | User Login | If the user name and password are correct, the user gets a valid page. | Pass | Unregistered users will not be logged in.
3 | User Adds Data | A new record is added to the dataset. | Pass | Per the UCI repository, the data must be integer fields; otherwise it fails.
4 | Data Cleaning | Data is cleaned. | Pass | The data must be in int or float format; otherwise the algorithm will not work.
5 | Age vs. target attribute box plot | A plot-based graph is generated. | Pass | The target class must be a positive or negative label; if not, it fails.
6 | User adds extra records for testing | User-added data is considered for testing purposes. | Pass | The data is added to the model's test data.
7 | Calculate Confusion Matrix | A confusion matrix is calculated for all our models. | Pass | The data is considered for testing.
8 | Accuracy Calculation | Accuracy is calculated for all five models. | Pass | Fails when the data is not in binary format.
9 | Admin Login | The admin can log in with his credentials; on success he gets his home page. | Pass | Invalid login details are not allowed.
10 | Admin activates registered users | The admin can activate a registered user id. | Pass | If the user id is not found, that user cannot log in.
7.2 Confusion Matrix
A confusion matrix is a table that is often used to describe the performance of a classification
model (or "classifier") on a given set of test data. It can only be determined if the true values
for the test data are known. The matrix itself is easy to understand, but the related terminology
may be confusing. Since it shows the errors in the model's performance in the form of a matrix,
it is also known as an error matrix. Some features of the confusion matrix are given below:
• For 2 prediction classes, the matrix is a 2*2 table; for 3 classes, it is a 3*3 table; and so on,
for n classes it is an n*n table.
• The matrix is divided into two dimensions, that are predicted values and actual values along
with the total number of predictions.
• Predicted values are those values, which are predicted by the model, and actual values are
the true values for the given observations.
True Negative: the model has predicted No, and the actual value was also No.
True Positive: the model has predicted Yes, and the actual value was also Yes.
False Negative: the model has predicted No, but the actual value was Yes; this is also called a
Type-II error.
False Positive: the model has predicted Yes, but the actual value was No; this is also called a
Type-I error.
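The four cells can be computed directly from paired lists of actual and predicted labels; a minimal sketch in plain Python (the toy labels below are illustrative):

```python
def confusion_counts(actual, predicted):
    # Count TP, TN, FP, FN for a binary problem with labels 1 (Yes) and 0 (No).
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn

actual    = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)  # 2 2 1 1
```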
We can perform various calculations for the model, such as the model's accuracy, using this
matrix. These calculations are given below:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
         = (10 + 30) / (10 + 30 + 0 + 0)
         = 1
The Mean Absolute Error (MAE) represents the average of the absolute difference between the
actual and predicted values in the dataset. It measures the average of the residuals in the dataset.

MAE = (1/N) * Σ |y_i − ŷ_i|, for i = 1 … N

where y_i is the true value and ŷ_i is the prediction.
Mean Square Error:
Mean Squared Error (MSE) represents the average of the squared difference between the original
and predicted values in the dataset. It measures the variance of the residuals.

MSE = (1/N) * Σ (y_i − ŷ_i)², for i = 1 … N

Root Mean Squared Error (RMSE) is the square root of the Mean Squared Error. It measures the
standard deviation of the residuals.

RMSE = √MSE = √[ (1/N) * Σ (y_i − ŷ_i)² ]

where y_i is the true value and ŷ_i is the prediction.
F1_score:
If two models have low precision and high recall or vice versa, it is difficult to compare them.
For this purpose, we can use the F-score, which helps us evaluate the recall and precision at
the same time. The F-score is maximum when the recall is equal to the precision. It can be
calculated using the formula below:

F1_score = (2 * Precision * Recall) / (Precision + Recall)
         = (2 * 1 * 1) / (1 + 1)
         = 1
Precision:
Precision is the fraction of the positive predictions made by the model that are actually correct,
i.e., out of all predicted positive classes, how many were actually true. It can be calculated
using the formula below:

Precision = TP / (TP + FP)
          = 10 / (10 + 0)
          = 1
Recall:
Recall is defined as, out of the total actual positive classes, how many our model predicted
correctly. The recall should be as high as possible.

Recall = TP / (TP + FN)
       = 10 / (10 + 0)
       = 1
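The worked values above (TP = 10, TN = 30, FP = 0, FN = 0) can be reproduced with a few lines of plain Python; the y_true/y_pred lists for the error metrics are small illustrative examples:

```python
import math

tp, tn, fp, fn = 10, 30, 0, 0

# Classification metrics from the confusion-matrix counts.
accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1_score  = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1_score)  # 1.0 1.0 1.0 1.0

# MAE, MSE and RMSE compare predicted values against true values.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 1, 1]  # a perfect prediction gives zero error
n = len(y_true)
mae  = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
mse  = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
rmse = math.sqrt(mse)
print(mae, mse, rmse)  # 0.0 0.0 0.0
```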
8. SCREEN SHOTS
Patient Registration:
Hospital superintendent homepage:
Patient Login page:
Patient having CKD:
Confusion matrix for Logistic Regression Algorithm:
Confusion matrix for Decision Tree Algorithm:
Confusion Matrix for Random Forest Algorithm:
9. CONCLUSION
We evaluated the performance of different ML algorithms on the chronic kidney disease dataset
taken from the UCI Machine Learning Repository. We preprocessed the dataset and then used
filter methods of feature selection, namely univariate selection and the correlation matrix, along
with feature importance, to find the best features from the dataset. The proposed algorithms,
Decision Tree, Random Forest and Logistic Regression, achieved accuracies of 98.48%, 94.16%
and 99.24% respectively, precisions of 100%, 95.12% and 98.82%, and recalls of 97.61%,
96.29% and 100%. The two feature-selection techniques are combined by leveraging the
strengths of each technique. On comparison, we find that Logistic Regression has the highest
accuracy and recall, while Decision Tree has the highest precision.
10. REFERENCES
1. R, G. Sasi, R. Sankar, and O. Deepa, “Decision Support system for diagnosis and prediction
of Chronic Renal Failure using Random Subspace Classification,” 2016, pp. 1287–1292.
2. D. S. Sisodia and A. Verma, “Prediction Performance of Individual and Ensemble learners
for Chronic Kidney Disease,” 2017, pp. 1027– 1031.
3. A. V Kshirsagar et al., “A Simple Algorithm to Predict Incident Kidney Disease,” ARCH
Intern Med, vol. 168, no. 22, pp. 2466– 2473, 2008.
4. A. K. Shrivas and S. Kumar Sahu, “Classification of Chronic Kidney Disease using Feature
Selection Techniques,” IJCSE, vol. 6, no. 5, pp. 649–653, 2018.
5. M. Kumar, “Prediction of Chronic Kidney Disease Using Random Forest Machine
Learning Algorithm,” Int. J. Comput. Sci. Mob. Comput., vol. 5, no. 2, pp. 24–33, 2016.
6. P. Yildirim, “Chronic Kidney Disease Prediction on Imbalanced Data by Multilayer
Perceptron: Chronic Kidney Disease Prediction,” in Proceedings - International Computer
Software and Applications Conference, 2017, vol. 2, pp. 193–198, doi:
10.1109/COMPSAC.2017.84.
7. V. Kunwar, K. Chandel, S. A. sai, and A. Bansal, “Chronic kidney disease analysis using
data mining classification techniques,” 2016, pp. 300–305.
8. E. H. A. Rady and A. S. Anwar, “Prediction of kidney disease stages using data mining
algorithms,” Informatics Med. Unlocked, vol. 15, pp. 1–7, Jan. 2019, doi:
10.1016/j.imu.2019.100178.
9. V. S and D. S, “Data Mining Classification Algorithms for Kidney Disease Prediction,”
Int. J. Cybern. Informatics, vol. 4, no. 4, pp. 13– 25, Aug. 2015, doi:
10.5121/ijci.2015.4402.
10. A. Nway Oo, “Classification of Chronic Kidney Disease (CKD) Using Rule based
Classifier and PCA,” Int. J. Adv. Manag. Technol. Eng. Sci., vol. 8, no. 4, pp. 728–733,
2018.