The Development of Mobile-Based Symptom Analysis For Early Detection of Diseases Using Hyper-Tuned C-Support Vector Classification Algorithm
Abstract—Healthcare is one of the fundamental human needs that everyone should be able to access, but most people are unable to do so for a variety of reasons, including living in remote areas, a lack of healthcare facilities, and a lack of financial assistance. In this study, the researchers aim to diagnose users' diseases based on their symptoms. To achieve this objective, the researchers built a predictive model upon the Hyper-Tuned C-Support Vector Classification Algorithm. This model served as the cornerstone of the analysis, using machine learning to diagnose diseases accurately. The researchers gathered secondary data from Kaggle and used rigorous performance metrics to test and analyze the accuracy and effectiveness of the model. The study showed promising results for contributing to the early detection of diseases using machine learning methods.

Keywords—Support Vector Machine (SVM), Disease, Hyperparameter Tuning, Machine Learning, Predictive Model, Disease Detection, Healthcare, Diagnosis, Artificial Intelligence

I. INTRODUCTION
Health care is one of the primary human needs that everyone should have access to; however, most people do not have the capability to acquire healthcare services due to several factors, such as remote locations, limited healthcare facilities, and a lack of financial support. These factors constrain individuals' ability to monitor their health and leave them unaware of their condition. According to the study of Flores et al. [1], about 50% of the population in the Philippines lack access to basic healthcare services and do not have a healthcare facility within 30 minutes of their neighborhood. The government has already passed historic legislation to address the issue, the Universal Health Care (UHC) Act of 2019. Even so, it will take some time before the law fully benefits individuals. Individuals might still confront one of the factors mentioned above. It is therefore necessary that people know the significance of having a medical check-up when symptoms occur, in order to detect the possible cause of changes in their physical or psychological condition. By doing so, individuals can prevent the continued development of a disease and avoid its more serious effects. As mentioned, however, not everyone can afford a medical check-up every time they experience such symptoms.

In this study, the researchers aim to develop a mobile-based diagnosis tool for the early detection of diseases using the Hyper-Tuned C-Support Vector Classification Algorithm. The Support Vector Machine on which this algorithm is based is a supervised learning approach used for both classification and regression tasks.

II. RELATED WORKS
Battineni et al. [2] focused on machine learning algorithms for diagnosing different chronic diseases. They highlighted the lack of standard methods for real-time use in actual medical practice. Logistic regression, support vector machines, and clustering were the algorithms most used in disease classification and diagnosis.

Liu et al. [3] compared the diagnostic performance of various deep learning techniques with that of healthcare professionals and found similar accuracy. However, the studies lacked externally validated results and suffered from poor reporting. This prompted a call for new reporting standards tailored to the challenges of deep learning.

Diwakar et al. [4] emphasized the importance of early and accurate disease diagnosis. Their study focused on using image fusion and various machine learning techniques to predict different heart diseases. The researchers proposed an effective algorithm with enhanced precision through feature selection, in order to help address difficulties in accessing healthcare or medical services.

Asaad [5] wrote a technical report that presents a novel approach for using machine learning techniques in classifying
Authorized licensed use limited to: Mapua University. Downloaded on March 06,2024 at 02:51:56 UTC from IEEE Xplore. Restrictions apply.
available, ensuring accurate classification of diseases based on symptoms.

A. Support Vector Machine Algorithm
A Support Vector Machine (SVM) is a supervised machine learning approach that can separate both linearly and nonlinearly separable data. Extreme Learning Machines, by contrast, stand out for their simplicity and fast training on big data. However, they also have drawbacks that make them less than ideal here: less control over feature extraction, sensitivity to initialization, and difficulty in interpreting model decisions. A Neural Network might also be a good choice because of its versatility and its ability to handle complex tasks. Its high computational requirements, however, could not be met with the limited resources available. For these reasons, the SVM was chosen for its capability and its relatively low cost.

When dealing with real-world data, the SVM can handle data that is not linearly separable by using a kernel function. The kernel implicitly maps the data into a higher-dimensional feature space, a technique known as the kernel trick. The following is the SVM formula that makes use of the polynomial (poly) kernel:

inaccuracies in the analysis. To maintain the integrity of the dataset, the researchers assigned appropriate values to the null entries, ensuring that no critical information was overlooked or omitted. This procedure was crucial because disregarding it could introduce inconsistencies and reduce the reliability of the dataset for subsequent analyses and predictions.
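The null-value handling described in this section can be illustrated with a short sketch. The code below assumes a hypothetical layout of one row per disease record with columns `Disease`, `Symptom_1`, `Symptom_2`, and so on. It assumes that a missing symptom slot is filled with the placeholder `"0"`, meaning "no symptom". The paper does not state which values were actually assigned, so the column names, the toy rows, and the fill value are all illustrative. `null_ratios` computes the kind of per-column ratio that a plot such as Fig. 1 summarizes.

```python
from typing import Dict, List, Optional

# Hypothetical layout (assumption): one row per disease record; symptom slots may be empty.
Row = Dict[str, Optional[str]]


def is_missing(value: Optional[str]) -> bool:
    """Treat None and blank strings as null entries."""
    return value is None or value.strip() == ""


def null_ratios(rows: List[Row]) -> Dict[str, float]:
    """Fraction of null entries in each column."""
    columns = list(rows[0].keys())
    return {
        col: sum(is_missing(row.get(col)) for row in rows) / len(rows)
        for col in columns
    }


def fill_nulls(rows: List[Row], fill_value: str = "0") -> List[Dict[str, str]]:
    """Replace each null entry with `fill_value` (assumed to mean 'no symptom')
    and strip stray whitespace from the remaining strings."""
    return [
        {col: fill_value if is_missing(val) else val.strip() for col, val in row.items()}
        for row in rows
    ]


if __name__ == "__main__":
    # Toy rows for illustration only; not taken from the Kaggle dataset.
    rows = [
        {"Disease": "Fungal infection", "Symptom_1": "itching", "Symptom_2": " skin_rash", "Symptom_3": None},
        {"Disease": "Allergy", "Symptom_1": "continuous_sneezing", "Symptom_2": None, "Symptom_3": ""},
    ]
    print(null_ratios(rows))    # {'Disease': 0.0, 'Symptom_1': 0.0, 'Symptom_2': 0.5, 'Symptom_3': 1.0}
    print(fill_nulls(rows)[1])  # {'Disease': 'Allergy', 'Symptom_1': 'continuous_sneezing', 'Symptom_2': '0', 'Symptom_3': '0'}
```

A single sentinel value keeps every row the same width, which matters once the symptom strings are mapped to numeric features.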
Fig. 1. Ratio of Null Values
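For reference, the polynomial kernel mentioned in Section A is commonly written as K(x, x') = (γ⟨x, x'⟩ + r)^d. This is the parameterization used by scikit-learn's SVC, with parameters `gamma`, `coef0`, and `degree`. A trained binary SVM classifies a point by the sign of f(x) = Σ α_i y_i K(x_i, x) + b, summed over the support vectors. The sketch below evaluates that decision function from already-trained values, with several simplifications. The support vectors and coefficients in the demo are made up. `gamma` defaults to 1.0 for simplicity rather than scikit-learn's default `'scale'`. Training, which solves for each α_i under the C-SVC constraint 0 ≤ α_i ≤ C, is not shown. Choosing among many diseases would combine several binary classifiers; scikit-learn's SVC uses a one-vs-one scheme for this.

```python
from typing import Sequence

Vector = Sequence[float]


def poly_kernel(x: Vector, z: Vector, gamma: float = 1.0, coef0: float = 0.0, degree: int = 3) -> float:
    """Polynomial kernel K(x, z) = (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree


def decision_function(x: Vector, support_vectors: Sequence[Vector], dual_coefs: Sequence[float],
                      intercept: float, **kernel_params) -> float:
    """Binary SVM decision value f(x) = sum_i (alpha_i * y_i) * K(x_i, x) + b.

    `dual_coefs` holds the products alpha_i * y_i. In C-SVC, training bounds
    each alpha_i by 0 <= alpha_i <= C; that step is not implemented here."""
    total = sum(c * poly_kernel(sv, x, **kernel_params) for sv, c in zip(support_vectors, dual_coefs))
    return total + intercept


def predict(x: Vector, support_vectors: Sequence[Vector], dual_coefs: Sequence[float],
            intercept: float, **kernel_params) -> int:
    """Return +1 if f(x) >= 0, otherwise -1."""
    value = decision_function(x, support_vectors, dual_coefs, intercept, **kernel_params)
    return 1 if value >= 0 else -1


if __name__ == "__main__":
    # Made-up "trained" values for illustration only.
    svs = [[1.0, 0.0], [0.0, 1.0]]
    coefs = [0.5, -0.5]
    params = dict(gamma=1.0, coef0=1.0, degree=2)
    print(decision_function([2.0, 0.0], svs, coefs, 0.0, **params))  # 4.0
    print(predict([0.0, 2.0], svs, coefs, 0.0, **params))            # -1
```

The kernel replaces an explicit mapping into the higher-dimensional space. Only inner products between the query point and the support vectors are ever computed.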
B. Features Used
Features Definition
researchers saved the model for external use, allowing it to be utilized by the mobile application.

IV. RESULTS AND DISCUSSION
The dataset gathered from Kaggle consists of three tables. One of them is the symptom severity table, which lists 132 symptoms with their corresponding weights, as shown in Fig. 3. The main dataset, which lists every disease with its symptoms, is encoded as strings.

Using the formula shown in Fig. 5, the researchers calculated the SVM model's Jaccard score to be 88.3871%. This indicates that the model achieved a similarity of about 88.39% between its predicted outcomes and the actual outcomes. According to this result, the model's predictions were accurate in 88.39% of the cases. The researchers considered this score too low for a sensitive and critical model in the field of healthcare. Having determined that the model needed enhancement, they proceeded with hyperparameter tuning.

The researchers used scikit-plot and matplotlib to visualize the hyper-tuned models, as presented in Fig. 8,
by plotting the training and cross-validation (test) scores with respect to the number of training examples. The researchers found that the training score remains high regardless of the size of the training set. The test score, on the other hand, improves as the size of the training set increases, but only until it reaches an endpoint. Once a model's generalization performance reaches such an endpoint, it may not be worthwhile to collect more data of the same kind, because additional examples will no longer improve its accuracy. However, collecting new kinds of data, such as more diseases and their associated symptoms, would still be beneficial. Such data would add more classes (diseases) that the model can predict.

prospect of being able to discover the complex patterns of disease in individual patients represents a significant development in health care.
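The endpoint in the learning-curve analysis above can be located programmatically once mean cross-validation scores are available. Such scores can come, for example, from scikit-learn's `learning_curve`, which returns the training sizes together with per-fold training and test scores. The helper below is a hypothetical sketch, not the researchers' procedure. `plateau_point` and its tolerance `tol` are illustrative choices. The scores in the demo are invented numbers, not results from this study.

```python
from typing import List, Optional


def plateau_point(train_sizes: List[int], test_scores: List[float], tol: float = 0.005) -> Optional[int]:
    """Return the smallest training size after which every further step
    improves the mean cross-validation score by less than `tol`.

    Returns None if the curve is still improving by at least `tol` at its last step."""
    if len(train_sizes) != len(test_scores) or len(train_sizes) < 2:
        raise ValueError("need two or more matching (size, score) points")
    for i in range(len(train_sizes) - 1):
        # Gains of every remaining step after training size i.
        gains = [test_scores[j + 1] - test_scores[j] for j in range(i, len(test_scores) - 1)]
        if all(g < tol for g in gains):
            return train_sizes[i]
    return None


if __name__ == "__main__":
    # Invented mean test scores, for illustration only.
    sizes = [100, 200, 400, 800, 1600]
    scores = [0.70, 0.82, 0.90, 0.902, 0.903]
    print(plateau_point(sizes, scores))  # 400
```

With a tolerance of 0.005, the demo curve flattens from 400 examples onward. A stricter tolerance may report the plateau later or not at all.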
[4] M. Diwakar, A. Tripathi, K. Joshi, M. Memoria, P. Singh, and N. Kumar, "Latest trends on heart disease prediction using machine learning and image fusion," Materials Today: Proceedings, vol. 37, pp. 3213–3218, Jan. 2021, doi: 10.1016/j.matpr.2020.09.078.
[5] R. Asaad, "Support vector machine classification learning algorithm for diabetes prediction," vol. 2, no. 2, 2022, doi: 10.5281/zenodo.6975670.
[6] J. Co, C. Lameseria, and R. Paiton, "Comparative analysis of using Artificial Intelligence (AI) for diagnosis and treatment of Tuberculosis and Diabetes," July 2020. [Online]. Available: https://www.dlsu.edu.ph/wp-content/uploads/pdf/research/journals/jciea/vol-5-1/5co.pdf
[7] D. Shah, S. Patel, and S. K. Bharti, "Heart disease prediction using machine learning techniques," SN Computer Science, vol. 1, no. 6, Oct. 2020, doi: 10.1007/s42979-020-00365-y.
[8] P. Hema, N. Sunny, R. Venkata Naganjani, and A. Darbha, "Disease prediction using symptoms based on machine learning algorithms," IEEE Xplore, Apr. 2022. [Online]. Available: https://ieeexplore.ieee.org/document/9914945
[9] S. Lucas, M. Desai, A. Khot, S. Harriet, and N. Narkar, "SmartCare: A symptoms based disease prediction model using machine learning approach," International Journal for Research in Applied Science and Engineering Technology, vol. 10, no. 11, pp. 709–715, Nov. 2022, doi: 10.22214/ijraset.2022.47434.
[10] C. G. Estonilo and E. D. Festijo, "Development of deep learning-based mobile application for predicting Diabetes Mellitus," Sep. 2021, doi: 10.1109/ic2ie53219.2021.9649235.
[11] K. Dallas, J. N. Chiang, A. Caron, J. T. Anger, M. R. Kaufman, and A. Lenore Ackerman, "Development and validation of a machine learning algorithm to classify Lower Urinary Tract symptoms," Dec. 2022, doi: 10.1101/2022.12.25.22283168.
[12] P. Hamsagayathri and S. Vigneshwaran, "Symptoms based disease prediction using machine learning techniques," Feb. 2021, doi: 10.1109/icicv50876.2021.9388603.
[13] S. Grampurohit and C. Sagarnal, "Disease prediction using machine learning algorithms," in 2020 International Conference for Emerging Technology (INCET), Belgaum, India, 2020, pp. 1–7, doi: 10.1109/INCET49848.2020.9154130.
[14] R. F. Rivera, R. A. Pagaduan, J. A. Caliwag, F. C. Reyes, and R. Castillo, "A mobile expert system using fuzzy logic for diagnosing kidney diseases," Mar. 2019, doi: 10.1145/3322645.3322703.
[15] R. P. C. Gamara, A. T. Teologo, R. Q. Neyra, and A. A. Bandala, "AI-based diagnostic tool for liver disease using machine learning algorithms," Dec. 2022, doi: 10.1109/hnicem57413.2022.10109367.
[16] A. Ramezanpour and A. Mashagi, "Uncovering hidden disease patterns by simulating clinical diagnostic processes," Sci. Rep., vol. 8, Art. no. 2436, 2018, doi: 10.1038/s41598-018-20826-y.