
COMPARATIVE ANALYSIS OF MACHINE LEARNING

ALGORITHMS USING DIABETES DATASET

REPORT SUBMITTED IN PARTIAL FULFILLMENT OF


THE REQUIREMENT FOR THE DEGREE OF

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

BY
PRATIKSHA DUTTA
Roll Number: 160103011

MR. KISHOR KASHYAP

ASSISTANT PROFESSOR

DEPARTMENT OF INFORMATION TECHNOLOGY


GAUHATI UNIVERSITY
GUWAHATI, INDIA, JULY 2020
DECLARATION

I, PRATIKSHA DUTTA, Roll No 160103011, B.Tech. student of the Department of Information Technology, Gauhati University, hereby declare that I have compiled this report reflecting all my work during the semester-long full-time project undertaken as part of my B.Tech curriculum.
I declare that I have included the descriptions of my own project work and that nothing has been copied or replicated from others' work. The facts, figures, analysis, results and claims presented in this thesis all relate to my full-time project work.
I also declare that neither this report nor any substantial portion of it has been submitted anywhere else as part of the requirements for any other degree or diploma.

(Pratiksha Dutta)
Branch: CSE
Date:
GAUHATI UNIVERSITY

DEPARTMENT OF INFORMATION TECHNOLOGY

Gopinath Bordoloi Nagar, Jalukbari Guwahati-781014

Date:

CERTIFICATE

This is to certify that Pratiksha Dutta, bearing Roll No 160103011, has carried out the project work "Comparative Analysis of Machine Learning Algorithms Using Diabetes Dataset" under my supervision and has compiled this report reflecting the candidate's work in the semester-long project. The candidate worked on this project full time throughout the semester under my supervision, and the analysis, results and claims are all related to her studies and work during the semester.

I recommend submission of this project report in partial fulfillment of the requirements for the degree of Bachelor of Technology in Information Technology/Computer Science & Engineering of Gauhati University.
(MR. KISHOR KASHYAP)
(Assistant Professor)
GAUHATI UNIVERSITY
DEPARTMENT OF INFORMATION TECHNOLOGY
Gopinath Bordoloi Nagar, Jalukbari Guwahati-781014

External Examiner's Certificate

This is to certify that Pratiksha Dutta, bearing Roll No 160103011, has delivered her project presentation. I have examined her report entitled "Comparative Analysis of Machine Learning Algorithms Using Diabetes Dataset" and recommend this project report in partial fulfillment of the requirements for the degree of Bachelor of Technology in Information Technology/Computer Science & Engineering of Gauhati University.

_ _
ACKNOWLEDGEMENT

A project is an opportunity for learning and self-development. It has been a pleasure to have so many wonderful people guide me through the completion of this project work.

I convey my sincere gratitude to Mr. Kishor Kashyap, who guided me throughout the project. Without his spirit of accommodation, willing disposition, timely clarification, frankness and, above all, faith in me, this study could not have been completed, let alone be fruitful. His readiness to discuss all important matters at work deserves special attention. I am indebted to him for his constant support, encouragement and interest at every step of this project.

I sincerely thank Dr. Vaskar Deka, Head of the Department of Information Technology, Gauhati University, for giving me the opportunity to carry out my project. I am highly obliged and grateful. I would also like to thank all the faculty members for their valuable suggestions and help.

Pratiksha Dutta

Branch: CSE
Abstract

Machine learning centres on designing systems that can learn and make predictions based on experience, which in the case of machines is data. It enables a system to act and make data-driven decisions rather than being explicitly programmed to perform a specific task. Such systems learn and improve over time as more and more new data is fed to them. Advances in this technology have brought a remarkable change to medical science: they help improve patient care, now and in the future, by processing huge datasets beyond human capability and by equipping doctors with more valuable information, such as early detection of diseases, for better diagnosis and treatment options. This project presents a comparative analysis of classification algorithms, predicting on and evaluating their performance against a medical dataset, the Pima Indian diabetes dataset. Since diabetes is a chronic lifestyle disease, its early detection is essential for efficient and accurate treatment, which makes it a natural candidate for machine learning tasks such as classification.

Keywords: Machine Learning, Classification, Diabetes Dataset.


Contents

Chapter 1: Introduction
1.1 Overview
1.2 Motivation and Objective
1.3 Problem Statement

Chapter 2: Background Study and Related Works

Chapter 3: Methods and Methodology

Chapter 4: Results and Discussion

Chapter 5: Conclusion and Future Scope

Chapter 6: References
List of Figures
Figure 1: Bar graph
Figure 2: Histogram
Figure 3: Heatmap
Figure 4: ROC-AUC curve of Naive Bayes classifier
Figure 5: ROC-AUC curve of Linear Discriminant classifier
Figure 6: ROC-AUC curve of KNN classifier
Figure 7: ROC-AUC curve of Random Forest classifier
Figure 8: ROC-AUC curve of Support Vector Machine classifier
Figure 9: ROC-AUC curve of Logistic Regression classifier
Figure 10: ROC-AUC curve of Decision Tree classifier

List of Tables
Table 1: Performance Evaluation of Machine Learning Algorithms
CHAPTER 1: INTRODUCTION

Machine learning is a subset of artificial intelligence that enables a machine to learn in a way loosely resembling the human brain. Through machine learning, the machine develops patterns from the information that is fed to it. The astounding development of machine learning as a tool to identify, describe, recognise, classify or generate complex information, and its rapid application in a wide range of fields including image recognition, face detection, traffic prediction, disease prediction, data analysis and even quantum systems, has demonstrated the tremendous potential of this technology, which is now gaining fresh momentum. Especially in the field of medical science, machine learning is one of the most effective techniques for modelling the broad range of variables associated with a disease.

Examining physiological information, environmental influences and hereditary factors allows medical experts to diagnose diseases earlier and more effectively. Machine learning also acts as a double check for findings a medical expert might miss. This project aims to study and compare the outcomes of machine learning algorithms on the Pima Indian diabetes dataset.

1.1 Overview

Machine learning is helping clinical experts reach conclusions more easily by bridging the gap between gigantic datasets and human knowledge. It enables us to study the underlying biological mechanisms associated with a disease, determine the impact of risk factors on its development, and help detect the group of people who are at risk of developing the disease.

1.2 Motivation and Objective

Diabetes is a chronic disease that affects the normal metabolism of the human body, resulting in increased blood sugar levels. Given the clinical information we can gather about individuals, we should be able to improve prediction of how likely an individual is to suffer the onset of diabetes, so that early treatment can be provided. Left untreated, diabetes can have many fatal complications. We can begin analysing the data and trying different machine learning algorithms that help us study the onset of diabetes in the Pima Indians, and then compare them to identify the most effective algorithm for diagnosing this ailment.
1.3 Problem Statement

The application of machine learning in the healthcare sector is benefitting the field with earlier diagnosis of diseases as well as smarter diagnostic techniques. This powerful subset of AI has proved to have life-impacting potential in health care, especially in the area of medical diagnosis.
The focus of this project is to identify the group of people who are highly likely to be diagnosed with diabetes using machine learning classification algorithms, and further to evaluate the various algorithms by comparing their performance on the given dataset.
CHAPTER 2: BACKGROUND STUDY AND RELATED WORKS

2.1 Classification in Machine Learning

Machine learning is one of the fastest developing areas of computer science, with far-reaching applications, and the field is expanding day by day. As a system gains more and more experience, its performance on a particular task is expected to improve. Based on the type of feedback the machine receives in order to learn, machine learning is broadly categorised into three types: supervised learning, unsupervised learning and reinforcement learning.

• In supervised learning, the learner is trained on labelled examples or instances; the ideal outputs for a problem are known ahead of time. The basic objective of the machine is to learn a function that maps the input to the output. The most common form of supervised learning is classification. In classification, a machine predicts the category of a set of instances, that is, it assigns features or attributes to their classes, having been given the class labels during training. Classification can be binary or multi-class.

• Unsupervised learning works on unlabelled instances. Here the system cannot explicitly learn from any correct answers. It attempts to discover intrinsic patterns, similarities and differences that can then be used to determine groups for the given instances. Clustering is a form of unsupervised learning, where the learner must explore underlying structures or correlations in the data to learn relationships rather than rules.

• In reinforcement learning, desired outputs are not directly provided. Each action of the learner has a different effect on the environment, and the environment provides feedback on the action in the form of rewards and punishments. The learner learns based on the rewards and punishments it receives from the environment.

• Classification is one of the significant tasks in machine learning: the process of assigning given data, in the form of instances or examples, to a class or category. The learning algorithm uses a set of examples to learn a classifier that is expected to correctly predict the class label of unseen (future) instances. The learnt classifier takes the values of the features or attributes of an object as input and produces one of the predefined class labels as output. The set of class labels is defined as part of the problem (by users).
A typical classification example is an email spam-catching system, which is important and necessary in real-world applications. Given a set of emails marked as "spam" and "non-spam", the learner will learn the characteristics of the spam emails, and the learnt classifier is then able to process future email messages and mark them as "spam" or "non-spam".

2.2 K-Nearest Neighbors

The most straightforward classifier in the collection of machine learning techniques is the nearest neighbour classifier. In this algorithm, the class of a query point is determined by its feature similarity to its nearest neighbours: the query point is assigned a class based on how closely it matches those neighbours.
An integer value of K, the number of nearest neighbours to consider, is chosen first. Then, for every test data point, a distance is calculated between the test point and each row of the training data; the training points are sorted by this distance, and the first K rows are selected from the sorted array. The test data point is assigned the most frequent class among these K rows.
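A minimal from-scratch sketch of this procedure is given below; the Euclidean distance and the default value of k are illustrative assumptions, not settings taken from the report.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Classify one query point by majority vote among its k nearest neighbours."""
    # Distance from the query point to every training row
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]
```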

2.3 Support Vector Machine

Support vector machines (SVMs) are a more recent supervised machine learning technique, related to classical multilayer perceptron neural networks. An SVM is based on the idea of a margin: the region on either side of a hyperplane that separates two data classes. Maximising the margin, and thereby creating the largest possible distance between the separating hyperplane and the instances on either side of it, has been shown to reduce an upper bound on the expected generalisation error.

2.4 Decision Tree Classifier

Decision trees (DT) are trees that classify instances by sorting them according to feature values. Each node of a decision tree represents a feature of the instance to be classified, and each branch represents a value that the node can take. Classification of an instance starts at the root node and proceeds by sorting the instance according to its feature values. A decision tree, as used in decision tree learning, data mining and machine learning, is a predictive model that maps observations about an item to conclusions about the item's target value. Such tree models are also known as classification trees or regression trees. Decision tree classifiers mostly use post-pruning techniques that evaluate the performance of the decision tree as it is pruned using a validation set: any node can be removed and assigned the most common class of the training instances sorted into it.

2.5 Naive Bayes

A naive Bayes classifier is a classification algorithm based on Bayes' theorem, with an assumption of independence among predictors. In simple terms, a naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
Even if the features depend on each other, all of these properties contribute to the probability independently. A naive Bayes model is easy to build and is particularly useful for relatively large datasets. Despite its simplicity, naive Bayes is known to outperform many other classification methods in machine learning. The Bayes theorem used to implement naive Bayes is as follows:

P(class | features) = P(features | class) * P(class) / P(features)

2.6 Logistic Regression

Logistic regression is the most commonly used machine learning algorithm for predicting binary classes, e.g. churn or not churn. The algorithm applies the sigmoid function to each input and outputs a probability between 0 and 1. A threshold is set at 0.5, above which an instance is categorised as 1 (churn) and below which it is categorised as 0 (not churn).
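A minimal sketch of this decision rule in NumPy; the weight vector and bias are placeholders assumed for illustration rather than learned parameters from the report.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, weights, bias, threshold=0.5):
    """Label an instance 1 if its predicted probability exceeds the threshold, else 0."""
    probabilities = sigmoid(X @ weights + bias)
    return (probabilities >= threshold).astype(int)
```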

2.7 Random Forest

A forest, as the name suggests, is a group of trees. Random forest is a simple and versatile supervised classification algorithm that combines the predictions of many decision trees, each built on a different data sample, and selects the final class by voting. It is an ensemble method that effectively averages the results of the individual decision trees and therefore helps avoid overfitting.
2.8 Linear Discriminant Analysis

The limitations of logistic regression, such as handling multi-class classification, instability with well-separated classes and instability with few instances, are overcome by another linear classification algorithm, linear discriminant analysis, which computes summary statistics for each class. For a single feature these are the class mean and variance; for multiple features they are the class means and the covariance matrix, under a multivariate Gaussian assumption.
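A rough from-scratch sketch of these per-class statistics and the resulting linear discriminant rule, assuming a shared covariance matrix estimated on mean-centred data (a simplification for illustration, not the report's implementation):

```python
import numpy as np

def fit_lda(X, y):
    """Estimate per-class means, class priors and the inverse of a pooled covariance matrix."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    priors = {c: np.mean(y == c) for c in classes}
    centered = np.vstack([X[y == c] - means[c] for c in classes])  # shared covariance
    cov_inv = np.linalg.inv(np.cov(centered, rowvar=False))
    return classes, means, priors, cov_inv

def predict_lda(x, classes, means, priors, cov_inv):
    """Assign x to the class with the largest linear discriminant score."""
    scores = [x @ cov_inv @ means[c] - 0.5 * means[c] @ cov_inv @ means[c] + np.log(priors[c])
              for c in classes]
    return classes[int(np.argmax(scores))]
```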

2.9 Related Works

● Ramana et al. [1] carried out an analysis of classification algorithms, namely Bagging, IBK, J48, JRip, Multilayer Perceptron (MP) and Naive Bayes (NB) classifiers, on several medical datasets: Breast Cancer Data, Chronic Kidney Disease, Cryotherapy, Hepatitis, Immunotherapy, Indian Liver Patient Dataset (ILPD) and Liver Disorders.
● Nindrea et al. [2] worked to predict the diagnostic accuracy of different machine learning algorithms for breast cancer risk calculation. They showed that SVM had better accuracy than the other machine learning algorithms.
● Radhika P R et al. [3] detected lung cancer using machine learning algorithms and compared them to determine their efficiency for early detection of the disease. They showed that SVM was the better algorithm in terms of accuracy for the lung cancer dataset.
● Orabi et al. [4] proposed a classification model for diabetes that can anticipate the likelihood of diabetes at a particular age. The proposed model is organised around the application of decision trees. The outcomes obtained were acceptable for predicting the disease at a particular age, with the highest accuracy achieved using the decision tree framework [11][3].
● Jakka et al. [5] carried out an investigation to assess the performance of classifiers in predicting the probability of diabetes in patients. The classifiers were evaluated and Logistic Regression (LR) was found to have the best accuracy, 77.6%, compared to the other algorithms.
● Mujumdar et al. [6] proposed a similar comparison of classifiers on two different diabetes datasets and found that Logistic Regression achieved the highest accuracy, 97.2%. The application of a pipeline resulted in AdaBoost being the best classification model, with an accuracy of 98.8%.
● Pradhan P. [7] trained and tested the Pima Indian Diabetes (PID) dataset from the UCI repository applying the Genetic Programming (GP) method.
● Yuvaraj et al. [8] introduced an application for diabetes prediction utilising three different ML classifiers: Random Forest, Decision Tree and Naive Bayes. They extracted the relevant features, choosing only eight features out of 13, and found that the random forest algorithm has an accuracy of 94%, higher than the other algorithms.
● Nongyao et al. [9] used the DT, SVM, LR and NB classifiers to determine the risk of diabetes mellitus. The experiment demonstrated that LR gives the best result among them.
CHAPTER 3: METHODS AND METHODOLOGY

3.1 Importing the data


The dataset is fetched in CSV format. The dataset used in this project is the publicly available Pima Indian diabetes database, obtainable from Kaggle. The dataset comprises several clinical indicator (independent) variables and one target (dependent) variable, Outcome. The number of pregnancies the patient has had, blood pressure, insulin level and glucose level are some of the independent variables. Some important open-source Python libraries are imported: NumPy, the universal standard for working with scientific numerical data; pandas, which provides high-performance, easy-to-handle data structures and data-analysis tools; seaborn for data visualisation; and matplotlib for static, animated and interactive visualisation.
After this, exploratory analysis is carried out on the data to learn its potential features and to check whether the data needs cleaning.
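A minimal sketch of this step; the file name diabetes.csv is assumed to be the Kaggle download placed in the working directory.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Pima Indian diabetes data from a local CSV download (assumed file name)
df = pd.read_csv("diabetes.csv")
print(df.head())
```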

3.2 Application of EDA

Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to:

1. maximise insight into a dataset;
2. uncover underlying structure;
3. extract important variables;
4. detect outliers and anomalies;
5. test underlying assumptions;
6. develop parsimonious models; and
7. determine optimal factor settings.

The specific graphical techniques employed in EDA are usually quite simple, consisting of various ways of:
● Plotting the raw data (such as data traces, histograms, bihistograms, probability plots, lag plots, block plots and Youden plots).
● Plotting simple statistics, such as mean plots, standard deviation plots, box plots and main effect plots of the raw data.
● Positioning such plots so as to maximise our natural pattern-recognition abilities, for example by using multiple plots per page.

Using head() and shape, the diabetes dataset was found to have 768 rows and 9 columns. The Outcome column is used to predict whether the patient has diabetes: if the outcome is zero (0) the patient does not have diabetes, and if it is one (1) the patient has diabetes.

After this, the dataset is checked for null values with the pandas isnull() function in preparation for data cleaning; however, it was observed that there were no null data points in the diabetes dataset obtained.
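A short sketch of these checks, continuing from the `df` loaded in the sketch above:

```python
# Dimensions of the dataset: expected (768, 9)
print(df.shape)

# Class balance of the target column (0 = no diabetes, 1 = diabetes)
print(df["Outcome"].value_counts())

# Count missing values per column; the report finds none
print(df.isnull().sum())
```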
In this project, the diabetes dataset is visualised through three kinds of plot: a bar plot, which is essentially used for count data or data showing accumulation or proportion; a histogram, a classical exploratory plot that shows the range and density of the data and the distribution of each variable, usually applied to numerical data; and a heatmap, which presents the correlation of every pair of variables in the dataset in both numerical and visual form for better understanding, with the darkest red indicating high correlation and the darkest blue indicating no or negative correlation.
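The plots described above can be produced with seaborn and matplotlib roughly as follows (a sketch, continuing from the `df` loaded earlier):

```python
# Bar plot of the two outcome classes
sns.countplot(x="Outcome", data=df)
plt.show()

# Histograms showing the distribution of every attribute
df.hist(figsize=(10, 8))
plt.show()

# Heatmap of pairwise correlations (red = high, blue = none or negative)
sns.heatmap(df.corr(), annot=True, cmap="RdBu_r")
plt.show()
```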

3.3 Splitting the data into test and train data

After the data is scaled and organised using exploratory data analysis, it is divided into a training set and a test set. The training set is the data on which the algorithm, or training procedure, is applied to build the model, while the test data is used to assess the model.
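A minimal sketch of this split using scikit-learn; the 75/25 ratio, the stratification and the random seed are illustrative assumptions.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df.drop("Outcome", axis=1)
y = df["Outcome"]

# Hold out 25% of the rows for testing; stratify to keep the class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scale features on the training set only, then apply the same transform to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```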

3.4 Application of Machine Learning Algorithms


The procedure by which a learning algorithm (classifier inducer) uses observations to learn a new classifier is known as the training process, and the procedure by which the learnt classifier is tried on unseen observations is known as the testing process. Machine learning algorithms are programs (mathematics and logic) that adjust themselves to perform better as they are exposed to more data. Just as humans change as they learn from the data they process, the "learning" in machine learning means that these algorithms change as they process data over time. So an algorithm is a program with a particular method for changing its own parameters, given feedback on its past performance in making predictions about a dataset. The algorithms most commonly used for classification on almost any dataset are:
● Linear Regression
● Logistic Regression
● Decision Tree
● SVM
● Naive Bayes
● kNN
● K-Means
● Random Forest
● Dimensionality Reduction Algorithms
● Gradient Boosting algorithms
○ GBM
○ XGBoost
○ LightGBM
○ CatBoost

In this project, an evaluation is drawn among the algorithms Logistic Regression, Decision Tree, SVM, Naive Bayes, KNN, Random Forest and Linear Discriminant Analysis on the diabetes dataset, comparing their performance on the same data.
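A sketch of how this comparison might be run with scikit-learn, continuing from the split above; the report does not state its hyperparameters, so library defaults are assumed.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": SVC(probability=True, random_state=42),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "LDA": LinearDiscriminantAnalysis(),
}

# Train each classifier on the same training split and score it on the same test split
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.4f}")
```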

3.5 Evaluation of Performance Measures

To determine which algorithm performs best on the dataset, the following performance measures are evaluated and compared.
ROC curve: the ROC curve is judged good or bad by its AUC (area under the curve), alongside other parameters derived from the confusion matrix. A confusion matrix is a table that is typically used to describe the performance of a classification model on a set of test data for which the true values are known.

True positives and true negatives are the desired, correct predictions, whereas false positives and false negatives occur when the actual class disagrees with the predicted class.
True Positives (TP) - the actual class is yes and the predicted class is also yes; correctly predicted positive values.
True Negatives (TN) - the actual class is no and the predicted class is also no; correctly predicted negative values.
False Positives (FP) - the actual class is no but the predicted class is yes.
False Negatives (FN) - the actual class is yes but the predicted class is no.

Accuracy - Accuracy is the most intuitive performance measure; it is simply the ratio of correctly predicted observations to the total observations. One might believe that high accuracy means the model is ideal. Accuracy is indeed a great measure, but only for symmetric datasets where the counts of false positives and false negatives are roughly the same. Therefore, other parameters are equally important when assessing the performance of a model.

Accuracy = (TP + TN) / (TP + FP + FN + TN)

Precision - Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. It measures how accurate the model's positive predictions are; high precision corresponds to a low false positive rate.

Precision = TP / (TP + FP)

Recall (Sensitivity) - Recall is the ratio of correctly predicted positive observations to all observations that are actually positive.

Recall = TP / (TP + FN)

F1 score - The F1 score is the harmonic mean of precision and recall; it takes both false positives and false negatives into account. An F1 score of 1 is best, while a score of 0 means the model is a total failure.

F1 Score = 2 * (Recall * Precision) / (Recall + Precision)
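These measures can be computed with scikit-learn roughly as follows (a sketch, continuing from the fitted `models` above; the weighted averaging is an assumption made to match the style of the table in Chapter 4):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_pred = models["LDA"].predict(X_test)
y_prob = models["LDA"].predict_proba(X_test)[:, 1]   # probability of the positive class

print(confusion_matrix(y_test, y_pred))              # [[TN, FP], [FN, TP]]
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="weighted"))
print("Recall   :", recall_score(y_test, y_pred, average="weighted"))
print("F1 score :", f1_score(y_test, y_pred, average="weighted"))
print("ROC AUC  :", roc_auc_score(y_test, y_prob))
```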


CHAPTER 4: RESULTS AND DISCUSSION

● EDA

Figure 1: This bar graph shows the positive and negative classes of the dataset. The blue bar with value 0 depicts the negative class, i.e. the number of patients without diabetes, and the orange bar with value 1 depicts the positive class, i.e. the number of patients with diabetes.
Figure 2: The histogram shows the distribution pattern of all the attributes in the dataset. These plots can also be interpreted as the probability distribution function (PDF) of each of the features.
Figure 3: A heatmap is used to collectively plot the correlation patterns of each of the features present in the dataset. This gives a clear idea of how each of the features is related to the others, and any redundant or duplicate features in the dataset can be spotted here. The denser the colour, the more correlated the data.
● EXPERIMENTAL RESULT

Various ML classifier models are assessed for the diagnosis of diabetes. The accuracy and F1-score of the classifiers are evaluated based on the number of correctly and incorrectly classified instances out of the total number of instances.

ALGORITHM                      ACCURACY   PRECISION   RECALL   F1-SCORE
K-Nearest Neighbors            0.7552     0.7506      0.7552   0.7524
Support Vector Machine         0.7812     0.7742      0.7812   0.7729
Logistic Regression            0.7917     0.7859      0.7917   0.7864
Naive Bayes                    0.7656     0.7572      0.7656   0.7575
Decision Tree                  0.7344     0.7432      0.7344   0.7378
Random Forest                  0.7865     0.7816      0.7865   0.7829
Linear Discriminant Analysis   0.8021     0.797       0.8021   0.7945
Table 1: Performance evaluation of the machine learning algorithms in terms of accuracy, precision, recall and F1 score.
As Table 1 shows, linear discriminant analysis has the best accuracy among all the algorithms, at about 80%. The F1-scores of the algorithms also show that LDA performs better than the others.

Figure 4: ROC-AUC curve of Naive Bayes classifier

Fig 5: ROC-AUC curve of Linear Discriminant Analysis classifier


Fig 6: ROC-AUC curve of KNN classifier

Fig 7: ROC-AUC curve of Random Forest classifier

Fig 8: ROC-AUC curve of SVC classifier


Fig 9: ROC-AUC curve of Logistic Regression classifier

Fig 10: ROC-AUC curve of Decision Tree classifier

The above curves give the following results:

Decision Tree: ROC AUC = 0.711
Logistic Regression: ROC AUC = 0.861
SVC: ROC AUC = 0.842
Naive Bayes: ROC AUC = 0.818
LDA: ROC AUC = 0.860
KNN: ROC AUC = 0.760
Random Forest: ROC AUC = 0.848

The higher the AUC value, the better the classifier. Logistic Regression and LDA have comparable AUC scores and are the best classifiers for this dataset.
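Curves like Figures 4 to 10 can be drawn with scikit-learn roughly as follows (a sketch reusing the fitted `models` dictionary and the held-out split assumed in Chapter 3):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

for name, model in models.items():
    y_prob = model.predict_proba(X_test)[:, 1]      # probability of the positive class
    fpr, tpr, _ = roc_curve(y_test, y_prob)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {roc_auc_score(y_test, y_prob):.3f})")

plt.plot([0, 1], [0, 1], linestyle="--", label="No-skill baseline")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```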

CHAPTER 5: CONCLUSION AND FUTURE SCOPE

In the medical field, early detection and proper diagnosis play a very important part in the treatment of a disease. For early detection, the doctor has to put the patient through numerous examinations, which is cumbersome. To shorten the time needed for early detection, machine learning is the need of the hour. In this project, several classification algorithms were evaluated on their classification performance in terms of accuracy, sensitivity, precision, specificity and ROC area. In terms of accuracy, Linear Discriminant Analysis achieves the highest value, while Logistic Regression and Linear Discriminant Analysis have comparable ROC areas.
CHAPTER 6: REFERENCES

[1] Bendi Venkata Ramana, Raja Sarath Kumar Boddu, "Performance Comparison of Classification Algorithms on Medical Datasets".
[2] Ricvan Dana Nindrea, Teguh Aryandono, Lutfan Lazuardi, Iwan Dwiprahasto, "Diagnostic Accuracy of Different Machine Learning Algorithms for Breast Cancer Risk Calculation: a Meta-Analysis".
[3] Radhika P R, Rakhi A S Nair, Veena G, "A Comparative Study of Lung Cancer Detection using Machine Learning Algorithms".
[4] Orabi, "Early Predictive System for Diabetes Mellitus Disease," in Industrial Conference on Data Mining, Springer, 2016, pp. 420–427.
[5] Aishwarya Jakka, Vakula Rani J, "Performance Evaluation of Machine Learning Models for Diabetes Prediction".
[6] Aishwarya Mujumdar, Dr. Vaidehi V., "Diabetes Prediction using Machine Learning Algorithms".
[7] Pradhan P, "Design of Classifier for Detection of Diabetes Mellitus Using Genetic Programming," in Advances in Intelligent Systems and Computing (AISC), 2014, vol. 1, pp. 763–770.
[8] Yuvaraj N, Sri Preetha K R, "Diabetes prediction in healthcare systems using machine learning algorithms on Hadoop cluster," Cluster Computing, 2017, 22, pp. 1–9.
[9] Nongyao Nai-arun, Punnee Sittidech, "Ensemble Learning Model for Diabetes Classification".
[10] Souad Larabi-Marie-Sainte, Linah Aburahmah, Rana Almohaini, Tanzila Saba, "Current Techniques for Diabetes Prediction: Review and Case Study".
[11] Nai-Arun, "Comparison of Classifiers for the Risk of Diabetes Prediction," in Procedia Computer Science, 2015, vol. 69, pp. 132–142.
[12] Dr. Saravana Kumar N M, Eswari T, Sampath P, Lavanya S, "Predictive Methodology for Diabetic Data Analysis in Big Data," 2nd International Symposium on Big Data and Cloud Computing, 2015.
[13] https://www.edureka.co/blog/classification-in-machine-learning/#:~:text=In%20machine%20learning%2C%20classification%20is,recognition%2C%20document%20classification%2C%20etc.
[14] https://machinelearningmastery.com/case-study-predicting-the-onset-of-diabetes-within-five-years-part-1-of-3/
[15] https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761
[16] https://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm
[17] https://towardsdatascience.com/machine-learning-workflow-on-diabetes-data-part-01-573864fcc6b8
