2023 the 5th International Conference on Control and Robotics

The Development of Mobile-Based Symptom Analysis for Early Detection of Diseases Using Hyper-Tuned C-Support Vector Classification Algorithm
Jefferson A. Costales
Eulogio “Amang” Rodriguez Institute of Science and Technology
Manila, Philippines
jacostales@earist.ph.education

Aivan Carlos B. Tuquero
Eulogio “Amang” Rodriguez Institute of Science and Technology
Manila, Philippines
tuquero.ac.bscs@gmail.com

Nelson V. Nolia
Eulogio “Amang” Rodriguez Institute of Science and Technology
Manila, Philippines
nolia.n.bscs@gmail.com
2023 5th International Conference on Control and Robotics (ICCR) | 979-8-3503-0762-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICCR60000.2023.10444876

Ma. Maila A. Martinez
Eulogio “Amang” Rodriguez Institute of Science and Technology
Manila, Philippines
martinez.mm.bscs@gmail.com

Tricia Anne M. Borcelis
Eulogio “Amang” Rodriguez Institute of Science and Technology
Manila, Philippines
borcelis.ta.bscs@gmail.com

Abstract—Healthcare is one of the fundamental human needs that everyone should be able to access, but most people are unable to do so for a variety of reasons, including remote locations, a lack of healthcare facilities, and a lack of financial assistance. In this study, the researchers aim to diagnose users' diseases based on their symptoms. To achieve this objective, the researchers built a predictive model on the Hyper-Tuned C-Support Vector Classification Algorithm. This model served as the cornerstone of the analysis, using machine learning to diagnose diseases accurately. The researchers gathered secondary data from Kaggle and used rigorous performance metrics to test and analyze the accuracy and effectiveness of the model. The study showed a promising result for contributing to the early detection of diseases using machine learning methods.

Keywords—Support Vector Machine (SVM), Disease, Hyper Tuning, Machine Learning, Predictive Model, Disease Detection, Healthcare, Diagnosis, Artificial Intelligence.

I. INTRODUCTION

Health care is one of the primary human needs that everyone should have access to; however, most people are unable to acquire healthcare services due to several factors such as remote locations, limited healthcare facilities, and a lack of financial support. These factors constrain individuals' ability to monitor their health and leave them unaware of their condition. According to Flores et al., about 50% of the population in the Philippines lacks access to basic healthcare services and does not have a healthcare facility within 30 minutes of their neighborhood [1]. Although the government has already passed historic legislation known as the Universal Health Care (UHC) Act of 2019 to address the issue, it will take quite some time before it officially accommodates individuals. Given that individuals might still confront one of the factors mentioned above, it is necessary that a person know the significance of having a medical check-up when symptoms occur, in order to detect the possible cause of the changes in their physical or psychological condition. In this way, individuals are able to prevent the continued development of the disease and avoid its more serious effects. As mentioned, not everyone can afford a medical check-up every time they experience such symptoms.

In this study, the researchers aim to develop a mobile-based diagnosis tool for early detection of diseases using the Hyper-tuned C-Support Vector Classification Algorithm. Support vector machines are a supervised learning approach used for classification and regression tasks.

II. RELATED WORKS

Battineni et al. [2] focused on machine learning algorithms for diagnosing different chronic diseases, highlighting the lack of standard methods in real-time and actual medical practice. Logistic regression, support vector machines, and clustering were the algorithms most used in disease classification and diagnosis.

Liu et al. [3] compared the diagnostic performance of various deep learning techniques with that of healthcare professionals, finding similar accuracy. However, the studies lacked externally validated results and suffered from poor reporting, calling for new reporting standards tailored to deep learning challenges.

Diwakar et al. [4] emphasized the importance of early and accurate disease diagnosis; the study focused on using image fusion and various machine learning techniques for the prediction of different heart diseases. The researchers proposed an effective algorithm with enhanced precision through feature selection in order to aid with difficulties in accessing healthcare or medical services.

Asaad [5] made a technical report that presents a novel approach for using machine learning techniques in classifying

Authorized licensed use limited to: Mapua University. Downloaded on March 06,2024 at 02:51:56 UTC from IEEE Xplore. Restrictions apply.
diabetes. Specifically, a Support Vector Machine (SVM) was used to categorize diabetes into two classes, and it was applied with various kernel functions such as polynomial, linear, and sigmoid. The performance evaluation of this approach incorporates a pre-processing stage and employs different standard criteria. The polynomial function yields the highest results, achieving an accuracy of 83.77%, a sensitivity of 86.07%, and a specificity of 81.97%. Additionally, the researchers compared their study with previous studies and demonstrated the superior classification ability of this approach in diagnosing diabetes.

Diabetes and tuberculosis are widespread and deadly diseases that often go undiagnosed. Artificial Intelligence (AI) offers a solution by enabling early disease detection. AI algorithms, such as Support Vector Machine with AIRS, have shown high accuracy in detecting tuberculosis (92-100%) and diabetes (83.0% in training, 76.9% in testing). This highlights the efficacy of AI in disease detection, particularly for tuberculosis [6].

Shah et al. [7] and Hema et al. [8] conducted separate studies involving machine learning for disease prediction. Shah et al. focused on cardiovascular disease prediction, utilizing the Cleveland heart disease dataset with 303 cases and 17 attributes from the UCI machine learning repository. They employed various supervised classification methods, including Naive Bayes, Decision Trees, Random Forest, and K-Nearest Neighbor (KNN). The KNN model exhibited an impressive 90.8% accuracy, emphasizing the potential of machine learning for cardiovascular disease prediction and the importance of model selection. On the other hand, Hema et al. concentrated on symptom-based disease prediction, offering a user-friendly graphical interface for symptom input. They employed machine learning algorithms such as Naive Bayes, Random Forest, and Decision Tree, with Naive Bayes achieving 93% accuracy, Random Forest 98%, and Decision Tree 94%. This research highlighted the practicality of machine learning for health assessment based on symptoms.

Lucas et al. [9] conducted a study using different machine learning algorithms for disease prediction modeling based on symptoms. The researchers made a website where users input their symptoms and receive a predicted disease. Throughout the study, the researchers trained the data and created a model. The algorithms used are Naive Bayes, Decision Tree, Random Forest, and K-Nearest Neighbor. The comparison of the algorithms resulted in an accuracy value of 97%.

Estonilo and Festijo [10] focused on developing a mobile application for predicting diabetes mellitus for its users. Deep learning was used in the study. To create the model for predicting the disease, the Sequential function of the Keras library was used. The prediction model resulted in an accuracy of 93%.

Dallas et al. conducted a study to enhance the accuracy of identifying diseases; it aimed to utilize machine learning methods for categorizing lower urinary tract symptoms (LUTS) based on the clinical data records of the patients. This approach aimed to create a new diagnostic tool for effectively identifying patients with voiding complaints. They developed a machine learning algorithm that relied solely on symptoms reported by the patients to classify their diagnoses. With the lack of physicians and the difficulties that patients face in accessing healthcare, the development of this digital technology has significant potential to enhance the identification and diagnosis of urologic diseases [11].

Heart diseases are one of the main causes of death worldwide, followed by diabetes, as established by the Centers for Medicare and Medicaid Services through thorough evaluation of data coming from different medical bodies all around the world. Although medical treatment has improved in curing these diseases, leveraging machine learning techniques is also considered to be one of the best ways to analyze and detect early symptoms of diseases using a model trained on one of the techniques [12].

Grampurohit and Sagarnal [13] conducted a study wherein they trained the data using three different algorithms: Random Forest, Decision Tree, and Naive Bayes. The dataset used consisted of a total of 41 diseases with a total of 132 symptoms. Considering 95 of the 132 symptoms, they also performed K-fold cross-validation (K=5). After training, they achieved accuracy scores of DT = 93.29%, RF = 93.29%, and NB = 93.61%, while in testing they achieved an accuracy of 95.12% for each model. The three models were able to classify 39 out of 41 diseases.

Rivera et al. presented an article on the development of a mobile application for kidney disease diagnosis. Their application utilizes data from literature studies and expert interviews to provide users with information about different kidney conditions. Using algorithms such as Fuzzy Logic and Decision Tree, the app accurately filters symptoms and aids in the early detection of kidney diseases. Evaluation results show that the mobile application achieved its objectives and received positive feedback from nephrologists, patients, and mobile developers, confirming its functionality and usability [14].

According to a study by Gamara et al., AI-based diagnostic tools that use machine learning algorithms for liver disease detection may be feasible for predicting the early stage of liver cancer. Liver cancer is the third leading cause of death in the Philippines, but there is currently no established detection method for identifying the early stage of the disease. The authors developed a model based on an artificial neural network that achieved an accuracy of 89% in predicting the early stage of liver cancer [15].

III. METHODOLOGY

The researchers aim to develop a model using the Hyper-tuned C-Support Vector Classification Algorithm. The initial step involved loading the required dataset, followed by data cleaning, which included detecting and counting any null values. In the second step, various symptoms were encoded into the dataset. In the third step, the dataset was split into two parts, a training dataset and a testing dataset. The majority of the dataset was used in training, while the remaining data was used in testing the model.

In the fourth step, the researchers employed different algorithms with cross-validation techniques to assess their performance on the dataset. Finally, the model was optimized using GridSearchCV to identify the best hyperparameters

available, ensuring accurate classification of diseases based on symptoms.

A. Support Vector Machine Algorithm

A Support Vector Machine (SVM) is a supervised machine learning approach that can be used to separate data that is linear or nonlinear. In contrast, Extreme Learning Machines stand out for their simplicity and quick training on big data, but they also have drawbacks that make them less than ideal for this study. Some of the drawbacks are less control over feature extraction, sensitivity to initialization, and challenges in interpreting model decisions. A neural network might also be a good choice due to its versatility and ability to handle complex tasks, but its high computational requirements could not be met with the limited resources available. For these reasons, the proposed algorithm was chosen for its capability and low cost.

In dealing with real-world data, the Support Vector Machine (SVM) is capable of handling the data by manipulating the kernel, also known as the kernel trick. The following is the SVM formula that makes use of the poly kernel:

B. Features Used

TABLE I. FEATURES USED

Features    Definition
Disease     Depicts the diseases that may be present.
Symptoms    These are manifestations of physical condition that indicate the possible presence of disease.

Table I lists the features used by the algorithm to predict possible diseases based on symptoms. Including these features helps establish a systematic and reliable approach to identifying potential ailments or health conditions.

C. Design

The initial phase of the study involved the acquisition of a dataset through a CSV file. The researchers obtained the dataset from Kaggle, specifically the "Disease Symptom Prediction" dataset. This dataset encompassed information on 41 distinct diseases. Additionally, there were separate CSV files containing descriptions and precautions associated with each disease. To establish relationships between various features, the researchers performed mapping operations, ensuring that the dataset had cohesive connections and comprehensive information.

After the dataset was obtained, the researchers proceeded with essential pre-processing steps. The researchers first shuffled the dataset to eliminate any inherent biases or patterns in the original order. Then the researchers conducted a thorough cleaning process, aiming to identify and address null values, as shown in Fig. 1, that could potentially lead to irregularities or inaccuracies in the analysis. To maintain the integrity of the dataset, the researchers assigned appropriate values to the null entries, ensuring that no critical information was overlooked or omitted. This procedure was crucial, as disregarding it could introduce inconsistencies and hinder the reliability of the dataset for subsequent analyses and predictions.

Fig. 1. Ratio of Null Values

Fig. 2. Procedures on Model Creation

Fig. 2 illustrates the sequential steps involved in creating a model using data from a symptoms and disease dataset extracted from Kaggle. The process begins with data preprocessing, where the incoming dataset is cleaned, checked for null values, and shaped to ensure data quality and integrity. Next, the data is divided into a 25% testing set and a 75% training set in order to facilitate model building. Different algorithms are compared using cross-validation techniques, and the model is fine-tuned using the GridSearchCV method to identify the optimal hyperparameters. The model is created and built using the selected algorithm and hyperparameters, and its performance is evaluated using performance metrics, specifically focusing on the C-Support Vector Classification algorithm. Finally, the

researchers saved the model for external use, allowing it to be utilized by the mobile application.

IV. RESULTS AND DISCUSSION

The dataset gathered from Kaggle consists of three tables. One of them is the symptom severity table, which has 132 symptoms with their corresponding weights, shown in Fig. 3. The main dataset, which lists every disease with its symptoms, is encoded as strings.

Fig. 3. Symptom Severity Table

Fig. 4. Cleaned Dataset

The researchers cleaned the secondary data from Kaggle and applied preprocessing. To express the strings in the disease dataset as numbers, the researchers mapped the symptom severity weight to every symptom in the disease dataset. The processed data frame shown in Fig. 4 contains the disease and every symptom associated with it; the columns extend up to the 18th column, for symptom 17. The researchers assessed the Jaccard similarity coefficient of the support vector machine classifier without any hyper tuning using the formula shown as follows:

Fig. 5. Jaccard Similarity Coefficient Formula

Using the formula shown in Fig. 5, the researchers calculated the SVM model's Jaccard score to be 88.3871%, which indicates that the model achieved a similarity of about 88.39% between its predicted outcomes and the actual outcomes. According to this result, the model's predictions were accurate in 88.39% of the cases. The researchers considered this score low for a sensitive and crucial model in the field of healthcare. Having determined that model enhancement was necessary, the researchers continued with hyperparameter tuning.

Fig. 6. Parameter Grid Setup

The researchers used cross-validation with GridSearchCV to find the best hyperparameters. GridSearchCV executes an exhaustive search over the predetermined range of hyperparameter values given in Fig. 6 and assesses the model's performance using cross-validation to identify the optimum hyperparameter combination.

Fig. 7. GridSearchCV Output Values

The researchers discovered in Fig. 7 that using 1000 as the value of the C regularization parameter, along with a polynomial kernel of degree 3 and scaled gamma, would give the model an optimal score of 0.996. The researchers proceeded with building the model using this hyperparameter setup.

Fig. 8. Learning Curve of Hypertuned Support Vector Machine (SVM)

The researchers used scikit-plot and matplotlib for plotting and visualization of hyper-tuned models as presented in Fig. 8

by plotting the training and cross-validation (test) scores with respect to the number of training examples. The researchers found that regardless of the size of the training set, the training score remains high. The test score, on the other hand, improves as the size of the training dataset rises, and it grows until it reaches an endpoint. When a model's generalization performance reaches such an endpoint, it may not be worthwhile to collect additional data to train the model because it will no longer improve its accuracy. However, collecting additional data, such as more diseases associated with symptoms, would be beneficial since it could add more prediction clauses.

Fig. 9. Different Algorithms Evaluation Metrics

In the course of conducting cross-validation on diverse algorithms (Fig. 9), it became evident that the Random Forest Classifier emerged as a standout performer, exhibiting an exceptional accuracy of 99.48%. While this accuracy rate aligned with the model developed in this study, it did not surpass it. Concurrently, an examination of alternative models, including the Support Vector Machine with a linear kernel, revealed a relatively inadequate accuracy of 78.25%. This outcome implies that the Support Vector Machine with a linear kernel may not be well suited to this study's dataset or that certain features may not be effectively linearly separable. By tuning its parameters, the researchers enabled the SVM to surpass the Random Forest Classifier.

V. CONCLUSION AND RECOMMENDATION

This study relates to the work of Ramezanpour and Mashaghi, entitled “Uncovering hidden disease patterns by simulating clinical diagnostic processes”, which builds probabilistic models that use signs and symptoms to simulate a diagnosis. In a similar way, this study used symptom patterns to simulate diagnosis based on the symptoms entered by the patients in the application [16]. Due to their dynamic nature, diseases can change and evolve over time. As SVM is flexible for identifying diseases based on symptoms and is well suited for analysis of high-dimensionality datasets, the researchers utilized it to identify diagnostic signs. In addition, the study introduces a new method for diagnosing diseases by incorporating mobile devices, potentially providing a more convenient experience for patients. Furthermore, the role of the algorithm in enhancing medical decision making is becoming increasingly promising as synergies between artificial intelligence and healthcare become more pronounced. The prospect of being able to discover the complex pattern of disease for individual patients represents a significant development in health care.

Fig. 10. Support Vector Machine (SVM) Performance Evaluation

The Hyper-tuned C-Support Vector Classification Algorithm's outstanding results in Fig. 10, based on the disease symptom dataset, demonstrate both its potency and prospective importance in the diagnosis and management of diseases. As the field of machine learning and data-driven medical research progresses, this algorithm's ability to contribute to improved patient care and medical decision-making is undoubtedly a promising prospect. The ability to accurately identify diseases from symptom data can significantly impact early diagnosis and timely treatment, ultimately leading to improved healthcare outcomes. However, the researchers recommend conducting further study to validate the model's performance on more diverse datasets to guarantee its generalizability and robustness. In addition, based on Fig. 8, adding more known data such as diseases already stated in the dataset would not lead to higher accuracy, but adding more unseen data such as diseases associated with symptoms that are not present in the dataset would lead to more predictive clauses and add more variety to the model.

ACKNOWLEDGMENT

The researchers would like to express their gratitude to Eulogio “Amang” Rodriguez Institute of Science and Technology for the support of this study.

REFERENCES

[1] L. J. Y. Flores, R. R. Tonato, G. A. dela Paz, and V. G. Ulep, “Optimizing health facility location for universal health care: A case study from the Philippines,” PLOS ONE, vol. 16, no. 9, p. e0256821, Sep. 2021, doi: https://doi.org/10.1371/journal.pone.0256821.
[2] G. Battineni, G. G. Sagaro, N. Chinatalapudi, and F. Amenta, “Applications of machine learning predictive models in the chronic disease diagnosis,” Journal of Personalized Medicine, vol. 10, no. 2, Mar. 2020, doi: https://doi.org/10.3390/jpm10020021.
[3] K. Balaskas, L. M. Bachmann, A. Bruynseels, A. K. Denniston, L. Faes, D. J. Fu, A. U. Kale, P. A. Keane, C. Kern, J. R. Ledsam, X. Liu, T. Mahendiran, G. Moraes, M. Shamdas, M. K. Schmid, E. J. Topol, and S. K. Wagner, “A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis,” The Lancet Digital Health, vol. 1, no. 6, pp. e271–e297, Oct. 2019, doi: https://doi.org/10.1016/s2589-7500(19)30123-2.

[4] M. Diwakar, A. Tripathi, K. Joshi, M. Memoria, P. Singh, and N. Kumar, “Latest trends on heart disease prediction using machine learning and image fusion,” Materials Today: Proceedings, vol. 37, pp. 3213–3218, Jan. 2021, doi: https://doi.org/10.1016/j.matpr.2020.09.078.
[5] R. Asaad, “Support vector machine classification learning algorithm for diabetes prediction,” vol. 2, no. 2, 2022, doi: https://doi.org/10.5281/zenodo.6975670.
[6] J. Co, C. Lameseria, and R. Paiton, “Comparative analysis of using Artificial Intelligence (AI) for diagnosis and treatment of Tuberculosis and Diabetes,” July 2020, Available: https://www.dlsu.edu.ph/wp-content/uploads/pdf/research/journals/jciea/vol-5-1/5co.pdf.
[7] D. Shah, S. Patel, and S. K. Bharti, “Heart disease prediction using machine learning techniques,” SN Computer Science, vol. 1, no. 6, Oct. 2020, doi: https://doi.org/10.1007/s42979-020-00365-y.
[8] P. Hema, N. Sunny, R. Venkata Naganjani, and A. Darbha, “Disease prediction using symptoms based on machine learning Algorithms,” IEEE Xplore, Apr. 2022, https://ieeexplore.ieee.org/document/9914945.
[9] S. Lucas, M. Desai, A. Khot, S. Harriet, and N. Narkar, “SmartCare: A symptoms based disease prediction model using machine learning approach,” International Journal for Research in Applied Science and Engineering Technology, vol. 10, no. 11, pp. 709–715, Nov. 2022, doi: https://doi.org/10.22214/ijraset.2022.47434.
[10] C. G. Estonilo and E. D. Festijo, “Development of deep learning-based mobile application for predicting Diabetes Mellitus,” Sep. 2021, doi: https://doi.org/10.1109/ic2ie53219.2021.9649235.
[11] K. Dallas, J. N. Chiang, A. Caron, J. T. Anger, M. R. Kaufman, and A. Lenore Ackerman, “Development and validation of a machine learning algorithm to classify Lower Urinary Tract symptoms,” Dec. 2022, doi: https://doi.org/10.1101/2022.12.25.22283168.
[12] P. Hamsagayathri and S. Vigneshwaran, “Symptoms based disease prediction using machine learning techniques,” Feb. 2021, doi: https://doi.org/10.1109/icicv50876.2021.9388603.
[13] S. Grampurohit and C. Sagarnal, “Disease prediction using machine learning algorithms,” 2020 International Conference for Emerging Technology (INCET), Belgaum, India, 2020, pp. 1-7, doi: 10.1109/INCET49848.2020.9154130.
[14] R. F. Rivera, R. A. Pagaduan, J. A. Caliwag, F. C. Reyes, and R. Castillo, “A mobile expert system using fuzzy logic for diagnosing kidney diseases,” Mar. 2019, doi: https://doi.org/10.1145/3322645.3322703.
[15] R. P. C. Gamara, A. T. Teologo, R. Q. Neyra, and A. A. Bandala, “AI-based diagnostic tool for liver disease using machine learning algorithms,” Dec. 2022, doi: https://doi.org/10.1109/hnicem57413.2022.10109367.
[16] A. Ramezanpour and A. Mashaghi, “Uncovering hidden disease patterns by simulating clinical diagnostic processes,” Sci Rep 8, 2436, 2018, doi: https://doi.org/10.1038/s41598-018-20826-y.
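
As a supplementary note to Section III-A, the polynomial kernel can be written in the form below. This is the common textbook form (also the one used by scikit-learn's SVC), not necessarily the exact expression printed in the paper. The symbols are assumptions for illustration: gamma is the kernel scale, r an offset term, d the degree, alpha_i the learned dual coefficients, and b the bias. The second line gives the resulting binary decision function.

```latex
K(\mathbf{x}_i, \mathbf{x}_j) = \left( \gamma \, \mathbf{x}_i^{\top} \mathbf{x}_j + r \right)^{d}

f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right)
```

With the tuned setup reported in Section IV, d = 3, and the "scale" option would set gamma from the number of features and the variance of the training data.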
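
The encoding and splitting steps described in Sections III-C and IV (mapping each symptom string to its severity weight, filling null slots, then a 75%/25% split) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the symptom names, weights, records, the choice of 0 as the fill value for nulls, and the helper names `encode_record` and `train_test_split_75_25` are all hypothetical.

```python
import random

# Hypothetical severity weights; the real table (Fig. 3) has 132 symptoms.
SEVERITY = {"itching": 1, "skin_rash": 3, "high_fever": 7, "cough": 4, "chills": 3}
MAX_SYMPTOMS = 17  # symptom columns per record, as in Fig. 4


def encode_record(symptoms, severity=SEVERITY, width=MAX_SYMPTOMS):
    """Replace symptom strings by their weights; pad missing (null) slots with 0."""
    weights = [severity.get(s.strip(), 0) for s in symptoms if s]
    return weights + [0] * (width - len(weights))


def train_test_split_75_25(rows, seed=42):
    """Shuffle the rows, then keep 75% for training and 25% for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * 0.75)
    return rows[:cut], rows[cut:]


# Hypothetical records: (disease, list of symptom strings).
records = [
    ("Fungal infection", ["itching", "skin_rash"]),
    ("Common Cold", ["cough", "chills", "high_fever"]),
    ("Malaria", ["chills", "high_fever"]),
    ("Allergy", ["cough", "chills"]),
]
encoded = [(disease, encode_record(symptoms)) for disease, symptoms in records]
train, test = train_test_split_75_25(encoded)
```

Padding with zeros keeps every record at the same width, which is what a classifier expecting a fixed number of feature columns needs; the paper only states that "appropriate values" were assigned to null entries, so the zero fill is a guess.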
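
The Jaccard similarity coefficient used in Section IV (Fig. 5) is the size of the intersection of two sets divided by the size of their union. For a classifier this becomes TP / (TP + FP + FN) per class. The sketch below computes it in plain Python; the macro (unweighted) averaging over classes and the example labels are assumptions, since the paper does not state which averaging was used.

```python
def jaccard_per_class(y_true, y_pred, label):
    """Jaccard index for one class: TP / (TP + FP + FN)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def jaccard_macro(y_true, y_pred):
    """Unweighted mean of the per-class Jaccard indices."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(jaccard_per_class(y_true, y_pred, c) for c in labels) / len(labels)


# Hypothetical labels, for illustration only.
y_true = ["Malaria", "Malaria", "Allergy", "Allergy", "Common Cold"]
y_pred = ["Malaria", "Allergy", "Allergy", "Allergy", "Common Cold"]
score = jaccard_macro(y_true, y_pred)
```

Note that a Jaccard score is generally lower than plain accuracy on the same predictions, because false positives and false negatives both enlarge the denominator.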
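
The tuning step in Section IV (a cross-validated grid search over C, kernel, degree, and gamma) could look roughly like the sketch below, assuming scikit-learn is installed. The data is synthetic stand-in data generated with `make_classification`, and the grid values are hypothetical; the paper's actual grid is the one shown in Fig. 6, so the best parameters and scores printed here will not match the reported 0.996.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the encoded symptom matrix (an assumption, not the Kaggle data).
X, y = make_classification(
    n_samples=300, n_features=17, n_informative=8, n_classes=3, random_state=0
)

# 75% training / 25% testing, as described in Section III-C.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid; "degree" is only used by the poly kernel and ignored by rbf.
param_grid = {
    "C": [1, 10, 100, 1000],
    "kernel": ["poly", "rbf"],
    "degree": [2, 3],
    "gamma": ["scale"],
}

# 5-fold cross-validation over every combination, then refit on the full training set.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)                 # best combination found on this data
print(round(search.best_score_, 3))        # mean cross-validated accuracy
print(round(search.score(X_test, y_test), 3))  # accuracy on the held-out 25%
```

Scoring the refit model on the held-out test split, rather than relying on the cross-validation score alone, gives a less optimistic estimate of how the tuned model generalizes.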
