
Article · September 2021



Journal of Information and Computational Science ISSN: 1548-7741

Prediction of Fatty Liver Disease using Machine Learning Algorithms

M. Sahaja¹, Dr. N. K. Kameswara Rao²

¹Student, Department of Computer Science, Sagi Rama Krishna Raju Engineering College, Bhimavaram, AP, India.
²Associate Professor, Department of Computer Science, Sagi Rama Krishna Raju Engineering College, Bhimavaram, AP, India.

Abstract: Fatty Liver Disease (FLD) is a serious disease that can affect a person of any age. Machine Learning (ML) is widely used in medical data analysis, and ML algorithms can uncover hidden information that supports the diagnosis of FLD. Many people currently suffer from liver disorders, and among these disorders fatty liver disease is one of the most complicated. The main aim of this paper is to predict fatty liver disease using different machine learning algorithms. The algorithms used in this work are Naive Bayes (NB), Support Vector Machines (SVM) and XGBoost (XGB), and a hybrid of XGBoost with an Artificial Neural Network (ANN) is developed. The accuracy score and the confusion matrix are used to compare these classification algorithms.

Keywords: SVM, NB, XGB, ANN.

Introduction
Fatty liver disease (FLD) is a common medical condition, and it is associated with high morbidity and mortality. FLD can eventually progress to non-cholestatic cirrhosis and hepatocellular carcinoma [1]. Furthermore, the prevalence of FLD has been increasing in parallel with the prevalence of diabetes, metabolic disease and obesity [2]. The rising prevalence of FLD has also become a significant economic burden. Consequently, accurate identification of people at risk and early recognition of FLD could provide great benefits for diagnosis, prevention and timely treatment. Over the past decade, liver biopsy has been used to classify patients and has been regarded as the diagnostic reference standard for assessing fatty infiltration of the liver. However, this method is highly invasive and expensive, and it may also cause side effects and errors in analysis. Although ultrasonography is used as a helpful tool for determining FLD with good accuracy, its diagnostic precision is highly operator-dependent [3].

Machine Learning (ML) is a field of computer science that uses algorithms to identify patterns in large datasets and helps to predict outcomes from data [4]. ML techniques have emerged as tools for prediction and decision making in a large number of studies [5]. Because medical records are widely available, ML has also been playing an essential role in medical decision making [6, 7]. Developing an ML model could serve as a valuable guide for identifying disease and making appropriate, timely clinical decisions. It could also allow better use of hospital resources by identifying, at an earlier stage, patients with many risk factors. Many recent studies have successfully investigated medical imaging techniques such as ultrasound (US), computed tomography (CT) and magnetic resonance imaging (MRI) for assessing the liver. Ultrasound imaging is noninvasive, inexpensive, easy to perform and convenient.

Volume 11 Issue 9 - 2021 636 www.joics.org

Literature Survey
Reddy et al. [10] proposed a convolutional neural network based computer-aided diagnosis algorithm for classifying ultrasound liver parenchyma texture into four classes. The proposed algorithm was evaluated on 1000 texture images, with 250 images belonging to each class. Performance analysis shows that the proposed framework classifies the texture with an accuracy of 93.5% when 80% and 20% of the data are used for training and testing, respectively.

Chao-Feng Chang et al. [11] studied nonalcoholic fatty liver disease (NAFLD) in relation to pancreatic cancer. NAFLD was an independent risk factor for pancreatic cancer according to adjusted multivariate logistic regression analysis (OR 2.63, 95% CI 1.24-5.58, p = 0.011). The Kaplan-Meier survival curve shows that patients without NAFLD survive longer than patients with NAFLD (p = 0.005, log-rank test). NAFLD is positively associated with pancreatic cancer, a result suggesting that NAFLD may increase the incidence and risk of pancreatic cancer. Patients with both pancreatic cancer and NAFLD have poorer overall survival than patients without NAFLD, possibly because a dysregulated cytokine status promotes the progression of pancreatic cancer. NAFLD may therefore be a prognostic factor for pancreatic cancer.
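The odds ratio, confidence interval and p-value reported for [11] are linked, so they can be cross-checked against each other. The sketch below assumes (this is not stated in the text) a Wald interval on the log-odds scale, OR = exp(beta) and CI = exp(beta ± 1.96·SE), which is the usual construction for logistic regression; the function name `wald_check` is hypothetical.

```python
import math

def wald_check(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover SE, z and a two-sided p-value from an OR and its 95% CI.

    Assumption: the CI was built as exp(beta +/- z_crit * SE) on the log-odds scale.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    # Under the Wald assumption the OR is the geometric mean of the CI bounds.
    implied_or = math.sqrt(ci_low * ci_high)
    return {"se": se, "z": z, "p": p_two_sided, "implied_or": implied_or}

result = wald_check(2.63, 1.24, 5.58)
print({name: round(value, 3) for name, value in result.items()})
```

With OR 2.63 and CI 1.24-5.58 this gives SE ≈ 0.38, z ≈ 2.52 and a two-sided p ≈ 0.012, in line with the reported p = 0.011 once rounding of the published bounds is allowed for.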

The study was set up as follows, with the inputs to the classifiers prepared for the base model [12]. The first step is to collect a data file. The data can be collected from the UCI repository, which provides a number of benchmark datasets. The simplest approach, for instance, is to use data supplied by domain experts. The dataset consists of raw measurements and feature values, so the preprocessing step is essential. Data preprocessing: its benefit is that it reduces computation and makes the data usable by the full range of learning algorithms. Feature subset selection: its benefit is that irrelevant and redundant attributes can be identified and removed [13].

Algorithm selection: the choice of learning algorithm is a critical step in supervised machine learning, and classifiers are evaluated on their predictions. There are three techniques that can be used to estimate classifier accuracy. For instance, two thirds of the data can be used for training and the remaining third for testing, or one sample at a time can be left out for testing (leave-one-out) [14]. When the error rate is high, we need to return to an earlier stage of the process, and various factors, such as the selected features and the chosen algorithm, should be examined and corrected.

Unsupervised learning: in unsupervised learning the data is unlabelled, and researchers use unsupervised algorithms to find hidden structure that is useful. Clustering is the most common form of unsupervised learning; it has a long history, and a large number of clustering algorithms have been developed for the problems faced in many different application areas. However, no single clustering algorithm performs equally well on every problem [15]. A. K. Jain divides clustering techniques into five categories: partitioning, hierarchical, density-based, grid-based and model-based approaches.

Reinforcement learning is another type of machine learning algorithm that allows software agents and machines to automatically determine the ideal behaviour within a specific context in order to maximize performance. Simple reward feedback is required for the agent to learn its behaviour.
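The holdout scheme (two thirds of the data for training, one third for testing) and the leave-one-out scheme mentioned above can be sketched with the standard library alone. The model `fit_majority` is a hypothetical placeholder that always predicts the most common training label; any real classifier with the same "fit returns a predict function" shape could be swapped in. scikit-learn offers ready-made equivalents (`train_test_split`, `LeaveOneOut`).

```python
import random
from collections import Counter

def fit_majority(X_train, y_train):
    """Hypothetical placeholder model: always predicts the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda row: label

def holdout_accuracy(X, y, fit, train_fraction=2 / 3, seed=0):
    """Train on a random ~2/3 of the rows and report accuracy on the remaining ~1/3."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * len(idx))
    train, test = idx[:cut], idx[cut:]
    predict = fit([X[i] for i in train], [y[i] for i in train])
    return sum(predict(X[i]) == y[i] for i in test) / len(test)

def leave_one_out_accuracy(X, y, fit):
    """Fit n models; each one is tested on the single row it did not see."""
    hits = 0
    for i in range(len(y)):
        rest = [j for j in range(len(y)) if j != i]
        predict = fit([X[j] for j in rest], [y[j] for j in rest])
        hits += predict(X[i]) == y[i]
    return hits / len(y)

X = [[v] for v in range(9)]  # hypothetical one-feature data
y = [1, 1, 1, 1, 1, 1, 0, 0, 0]
print(holdout_accuracy(X, y, fit_majority), leave_one_out_accuracy(X, y, fit_majority))
```

Leave-one-out trains one model per sample, so it is only practical for small datasets; the holdout split trains once but its estimate depends on which rows land in the test third.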

Figure 1: System Architecture

Figure 1 illustrates the steps for executing the algorithms that are used to detect fatty liver disease.

Pre-Processing Module

This is one of the most important steps in machine learning: it reduces noise and removes irrelevant data from the dataset. These factors affect the predicted output, and this preprocessing step also improves performance.
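The paper does not list its exact preprocessing operations. As an illustrative sketch only, the function below applies three common ones to a tabular liver dataset: dropping exact duplicate rows, imputing missing numeric values with the column median, and min-max scaling each column to [0, 1]. Representing the table as lists of numbers with `None` for missing values is an assumption; with pandas the same steps would typically use `drop_duplicates`, `fillna` and column arithmetic.

```python
import statistics

def preprocess(rows):
    """Deduplicate rows, median-impute None values, then min-max scale each column.

    rows: list of equal-length lists of numbers, with None marking a missing value.
    Assumes every column has at least one observed (non-None) value.
    """
    # 1. Drop exact duplicate rows while keeping the original order.
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))

    for j in range(len(unique[0])):
        # 2. Impute missing entries with the median of the observed values.
        med = statistics.median(r[j] for r in unique if r[j] is not None)
        for r in unique:
            if r[j] is None:
                r[j] = med
        # 3. Min-max scale to [0, 1]; a constant column becomes all zeros.
        lo = min(r[j] for r in unique)
        hi = max(r[j] for r in unique)
        span = hi - lo
        for r in unique:
            r[j] = (r[j] - lo) / span if span else 0.0
    return unique

print(preprocess([[1, 10], [1, 10], [3, None], [5, 30]]))
```

In a real pipeline the median and the min/max should be computed on the training split only and then reused on the test split, so that no test information leaks into training.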


Training

This step is the most critical one in detecting FLD. The observations in the training set form the experience that the algorithm uses to learn. In supervised learning problems, each observation consists of an observed output variable and one or more observed input variables.
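To make the input/output structure concrete, the sketch below separates records into input vectors `X` and the observed output `y`, then performs a stratified train/test split so that each class keeps a similar share in both parts. The 80/20 ratio mirrors the one reported for [10]; the paper does not state its own ratio, so it is an assumption here. The label key `diagnosis` follows the dataset description, while the other field names are hypothetical.

```python
import random
from collections import defaultdict

def split_xy(records, label_key="diagnosis"):
    """Separate dict records into input vectors X and the observed output y."""
    feature_keys = sorted(k for k in records[0] if k != label_key)
    X = [[rec[k] for k in feature_keys] for rec in records]
    y = [rec[label_key] for rec in records]
    return X, y, feature_keys

def stratified_split(X, y, test_fraction=0.2, seed=42):
    """80/20 split (by default) that keeps each class's share similar in both parts."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(test_fraction * len(indices))
        test_idx += indices[:n_test]
        train_idx += indices[n_test:]

    def pick(idx, seq):
        return [seq[i] for i in idx]

    return pick(train_idx, X), pick(test_idx, X), pick(train_idx, y), pick(test_idx, y)

# Hypothetical records: two input variables plus the "diagnosis" label.
records = [{"age": 30 + i, "albumin": 3.0 + 0.1 * i, "diagnosis": 2 if i < 10 else 1}
           for i in range(15)]
X, y, names = split_xy(records)
X_train, X_test, y_train, y_test = stratified_split(X, y)
print(names, len(X_train), len(X_test), sorted(y_test))
```

scikit-learn's `train_test_split(X, y, test_size=0.2, stratify=y)` performs the same kind of split.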

Variable Selection Module

In this process, we evaluated the weight of each variable using information gain ranking. This helped to assess the usefulness of the variables included in the training dataset. Only variables whose information gain score was > 0 were included in the final model. We used a forward selection model for variable reduction in the present study.
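Information gain ranking with the "> 0" cut-off, followed by forward selection, can be sketched as follows for discrete (or pre-binned) features. The `score` function passed to `forward_selection` is a hypothetical stand-in for whatever validation score the model uses (for example cross-validated accuracy), and the stopping rule (stop when no remaining feature improves the score) is an assumption, since the paper does not give one.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """H(y) - H(y | feature) for one discrete feature column."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def rank_features(X, y):
    """(feature_index, gain) pairs sorted by gain, keeping only gain > 0."""
    gains = [(j, information_gain([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted((g for g in gains if g[1] > 0), key=lambda g: g[1], reverse=True)

def forward_selection(candidates, score):
    """Greedily add the feature that most improves score(selected) until none helps."""
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        trial = max(remaining, key=lambda j: score(selected + [j]))
        trial_score = score(selected + [trial])
        if trial_score <= best:
            break
        selected.append(trial)
        best = trial_score
        remaining.remove(trial)
    return selected

X = [[0, 7, "a"], [0, 7, "a"], [1, 7, "b"], [1, 7, "a"]]  # hypothetical data
y = [0, 0, 1, 1]
ranked = rank_features(X, y)  # the constant feature 1 has zero gain and is dropped
print(ranked)
```

Feeding the surviving indices into `forward_selection` with a model-based score then trims the set further.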

Dataset Description

A total of 70000 cases with 13 attributes was collected for the liver dataset from Kaggle. The attribute "diagnosis" is the class label: a value of 2 means that the person has liver disease, and 1 means that the person does not have liver disease. The liver disease dataset has 10 attributes. The dataset has 416 "no" cases and 167 "yes" cases.
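With 416 "no" and 167 "yes" cases (583 in total) the classes are imbalanced, which matters when reading accuracy figures. The arithmetic below computes the accuracy of a trivial classifier that always predicts the majority class, and "balanced" class weights n / (k · n_c), the convention scikit-learn uses for `class_weight="balanced"`. Reweighting is shown only as one common remedy; the paper does not say whether it reweights classes.

```python
def class_balance_summary(counts):
    """Majority-class baseline accuracy and balanced class weights n / (k * n_c)."""
    n = sum(counts.values())
    k = len(counts)
    baseline = max(counts.values()) / n
    weights = {label: n / (k * c) for label, c in counts.items()}
    return baseline, weights

baseline, weights = class_balance_summary({"no": 416, "yes": 167})
print(f"majority-class baseline accuracy: {baseline:.2%}")
print({label: round(w, 3) for label, w in weights.items()})
```

On data with this class split, a classifier needs more than about 71.4% accuracy to do better than always predicting the majority class, which is why sensitivity and specificity are worth reporting alongside accuracy.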

Figure 2: Dataset Description

Methodology

Artificial Neural Networks (ANN) are computational models that imitate biological neural networks. They are very powerful nonlinear models that have already been shown to make accurate predictions in many clinical decision support (CDS) systems. The model consists of many artificial neural units called perceptrons. An ANN resembles the biological nerve cell, in which the signal is transmitted into the neuron through dendrites. It simulates signal transmission through an input layer to several hidden layers, and finally to an output layer. Each layer contains many perceptrons, and the perceptrons in adjacent layers are connected by weights that are adjusted while the algorithm is trained.
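The layer-by-layer signal flow described above (input layer, hidden layers, output layer, with weighted connections between perceptrons) can be written as a forward pass. This is a minimal sketch with sigmoid activations and hand-set, hypothetical weights; the paper does not report its network size, activations or weights, and in practice the weights would be learned during training (for example with Keras).

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def mlp_forward(x, layers):
    """Propagate input vector x through fully connected layers.

    layers: list of (W, b) where W[i][j] is the weight from input j to unit i
    and b[i] is the bias of unit i. Every unit applies a sigmoid.
    """
    activation = list(x)
    for W, b in layers:
        activation = [
            sigmoid(sum(w_ij * a_j for w_ij, a_j in zip(row, activation)) + b_i)
            for row, b_i in zip(W, b)
        ]
    return activation

# Hypothetical 3-input network: one hidden layer of 2 units, 1 output unit.
layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, -0.1]),  # hidden layer
    ([[1.2, -0.7]], [0.05]),                              # output layer
]
print(mlp_forward([1.0, 0.5, 2.0], layers))  # one value in (0, 1), read as P(FLD)
```

Training consists of adjusting every `W` and `b` (typically by backpropagation) so that these outputs match the observed labels.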

Advanced Feature Extraction

The features extracted from the Wavelet Packet Transform (WPT) coefficients are the median, the standard deviation and the interquartile range. The median of an image is the intensity value that separates the higher half of the pixel intensities in a window of the image from the lower half. The median intensity in ultrasound images of fatty liver is higher than in normal ones because of their increased echogenicity caused by fat accumulation, and this characteristic also exists in the WPT subbands of FLD data. The standard deviation measures the variation from the mean. The interquartile range measures dispersion and is the difference between the upper and lower quartiles. The standard deviation and the interquartile range of an image describe its uniformity or smoothness, so these two parameters may be good indicators for distinguishing focal from diffuse liver disease.
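The feature computation can be illustrated on a one-dimensional row of pixel intensities. The sketch below writes out a Haar wavelet packet decomposition by hand; a real pipeline would more likely decompose the 2-D image, for example with `pywt.WaveletPacket2D` from PyWavelets. The Haar filter, the depth of 2 and the 1-D simplification are assumptions, since the paper does not specify its wavelet or decomposition level. Unlike the ordinary wavelet transform, a wavelet packet splits both the approximation and the detail branch at every level, and the median, standard deviation and interquartile range are then taken per subband. `statistics.quantiles` requires Python 3.8 or later.

```python
import math
import statistics

def haar_step(signal):
    """One Haar analysis step: (approximation, detail), each half the length."""
    s = 1 / math.sqrt(2)
    approx = [(a + b) * s for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) * s for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail

def wavelet_packet(signal, depth):
    """Full Haar wavelet packet tree: both branches are split at every level.

    Returns {path: coefficients}, where path is a string of 'a'/'d' choices.
    The signal length must be divisible by 2 ** depth.
    """
    nodes = {"": list(signal)}
    for _ in range(depth):
        nodes = {
            path + tag: coeffs
            for path, data in nodes.items()
            for tag, coeffs in zip("ad", haar_step(data))
        }
    return nodes

def subband_features(coeffs):
    """Median, population standard deviation and interquartile range of one subband."""
    q1, _, q3 = statistics.quantiles(coeffs, n=4)
    return {"median": statistics.median(coeffs),
            "std": statistics.pstdev(coeffs),
            "iqr": q3 - q1}

row = [52, 55, 61, 59, 79, 61, 76, 61]  # hypothetical pixel intensities
for path, coeffs in sorted(wavelet_packet(row, depth=2).items()):
    print(path, subband_features(coeffs))
```

Concatenating the three statistics from every subband gives a fixed-length feature vector per image window that the classifiers can consume.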

Hybrid Machine Learning Algorithm for fatty liver disease detection

FLD is a highly unpredictable disease that can occur in people of any age, depending on their daily activities and habits. In this project, a hybrid algorithm is introduced to detect FLD at an early stage or at its present stage. It combines ANN and XGBoost (XGB) and is intended to work more efficiently than other existing algorithms. The proposed algorithm focuses on recognizing patterns and on classification in order to predict new data.
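The paper does not state how the ANN and XGBoost outputs are combined. One common option, shown here purely as an assumption, is weighted soft voting: average the two models' predicted probabilities of FLD and apply a threshold. The function name `hybrid_predict`, the weight `w_ann` and the 0.5 threshold are hypothetical and would normally be tuned on validation data; stacking (training a meta-classifier on both models' outputs) would be an alternative design.

```python
def hybrid_predict(p_ann, p_xgb, w_ann=0.5, threshold=0.5):
    """Weighted soft voting over two models' positive-class probabilities.

    p_ann, p_xgb: sequences of P(FLD) for the same samples from each model.
    Returns (combined probabilities, 0/1 predictions).
    """
    if len(p_ann) != len(p_xgb):
        raise ValueError("both models must score the same samples")
    if not 0.0 <= w_ann <= 1.0:
        raise ValueError("w_ann must lie in [0, 1]")
    combined = [w_ann * a + (1 - w_ann) * x for a, x in zip(p_ann, p_xgb)]
    labels = [int(p >= threshold) for p in combined]
    return combined, labels

# Hypothetical probabilities from an ANN and an XGBoost model for three patients.
probs, labels = hybrid_predict([0.2, 0.9, 0.55], [0.6, 0.7, 0.35])
print(probs, labels)
```

Soft voting helps most when the two models make different errors; if they agree on every sample, the combination cannot improve on either one.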

XGBoost is widely used to process complex datasets. It is an ensemble algorithm based on decision trees that adopts the gradient boosting framework. XGB is easy to use and accurate, and its implementation of gradient boosting machines improves computational efficiency and model performance. Applying the ANN alone to our dataset gives weaker performance, about 70% accuracy, which is not a good predictive model. XGBoost alone gives about 82% accuracy, which can still be improved. Hence, by using a hybrid of these algorithms, we obtain better performance and comparatively higher accuracy.
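To illustrate the gradient boosting framework that XGBoost builds on, the sketch below boosts depth-one decision trees (stumps) on the logistic loss: each round fits a stump to the negative gradient y - sigmoid(F) and adds a shrunken copy of it to the model. This is plain first-order gradient boosting, not XGBoost itself, which additionally uses second-order gradient information, regularized tree objectives and many engineering optimizations; in practice one would use the `xgboost` package (for example `xgboost.XGBClassifier`). The number of rounds and the learning rate are hypothetical.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_stump(X, r):
    """Least-squares decision stump on residuals r: (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # every feature is constant: fall back to the mean residual
        mean = sum(r) / len(r)
        return (0, float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv

def fit_gradient_boosting(X, y, n_rounds=50, lr=0.3):
    p = sum(y) / len(y)
    if not 0 < p < 1:
        raise ValueError("both classes must be present")
    f0 = math.log(p / (1 - p))  # initial prediction: log-odds of the positive class
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to F is y - sigmoid(F).
        r = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, r)
        stumps.append(stump)
        F = [fi + lr * predict_stump(stump, row) for fi, row in zip(F, X)]
    return f0, lr, stumps

def predict_proba(model, row):
    f0, lr, stumps = model
    return sigmoid(f0 + lr * sum(predict_stump(s, row) for s in stumps))

X_toy = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]  # hypothetical single feature
y_toy = [0, 0, 0, 1, 1, 1]
model = fit_gradient_boosting(X_toy, y_toy, n_rounds=20, lr=0.3)
print([round(predict_proba(model, row), 3) for row in X_toy])
```

The learning rate shrinks each stump's contribution, trading more rounds for a smoother, less overfit ensemble.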

Advantages of Proposed System

• Shows improved accuracy for detecting fatty liver disease.
• Reduces computation time.
• Can process complex and large datasets.


Experimental Results
The experiments were conducted using the Python programming language. Libraries such as Keras, pandas and scikit-learn (sklearn) were used to develop the models.

Algorithm          Accuracy   Sensitivity   Specificity   Recall
NB                 59.02      59.13         58.80         76.61
RF                 78.17      78.28         77.95         88.86
ANN                88.01      88.12         87.79         90.01
XG-Boost           84.73      84.84         84.51         90.16
Hybrid Algorithm   90.25      90.35         90.02         91.12

Table 1: Performance of Various Algorithms (values in %)
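The metrics in Table 1 are derived from the confusion matrix. The sketch below uses the standard definitions, treating FLD as the positive class: accuracy = (TP + TN) / N, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). Under these standard definitions, recall is the same quantity as sensitivity. The labels in the example are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FN, TN, FP) for a binary problem."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp

def classification_metrics(y_true, y_pred, positive=1):
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),  # also called recall
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]  # hypothetical labels (1 = FLD)
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1]
print(confusion_counts(y_true, y_pred))        # (TP, FN, TN, FP)
print(classification_metrics(y_true, y_pred))
```

scikit-learn's `confusion_matrix` and `recall_score` compute the same quantities; specificity is the recall of the negative class.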

Figure 3: Calculation of Accuracy

Figure 4: Calculation of Sensitivity


Figure 5: Calculation of Specificity

Figure 6: Calculation of Recall

Conclusion
From the analysis of the results, we conclude that the proposed hybrid model, which combines ANN and XGBoost, achieves higher performance for the detection of fatty liver disease. This method can help identify FLD effectively and support its diagnosis.

References
[1] M. Lazo, J.M. Clark, The epidemiology of nonalcoholic fatty liver disease: a global perspective, Seminars in Liver Disease 28 (2008) 339–350, Thieme Medical Publishers.
[2] M.H. Le, P. Devaki, N.B. Ha, D.W. Jun, H.S. Te, R.C. Cheung, M.H. Nguyen, Prevalence of
non-alcoholic fatty liver disease and risk factors for advanced fibrosis and mortality in the
United States, PLoS One 12 (2017) e0173499.
[3] Q.M. Anstee, G. Targher, C.P. Day, Progression of NAFLD to diabetes mellitus,
cardiovascular disease or cirrhosis, Nat. Rev. Gastroenterol. Hepatol. 10 (2013) 330–344.


[4] M. Motwani, D. Dey, D.S. Berman, G. Germano, S. Achenbach, M.H. Al-Mallah, D. Andreini, M.J. Budoff, F. Cademartiri, T.Q. Callister, Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: a 5-year multicentre prospective registry analysis, Eur. Heart J. 38 (2016) 500–507.
[5] A. Sani, Machine Learning for Decision Making, Université de Lille 1, 2015.
[6] W. Raghupathi, V. Raghupathi, Big data analytics in healthcare: promise and potential, Health Inf. Sci. Syst. 2 (2014) 3.
[7] P. Groves, B. Kayyali, D. Knott, S.V. Kuiken, The 'Big Data' Revolution in Healthcare: Accelerating Value and Innovation, 2016.
[8] A. Andrade, J.S. Silva, J. Santos, P. Belo-Soares, Classifier approaches for liver steatosis using ultrasound images, Procedia Technol. 5 (2012) 763–770.
[9] R. Ribeiro, J. Sanches, Fatty liver characterization and classification by ultrasound, in: Iberian Conference on Pattern Recognition and Image Analysis, Springer, 2009, pp. 354–361.
[10] D.S. Reddy, R. Bharath, P. Rajalakshmi, Classification of nonalcoholic fatty liver texture using convolution neural networks, in: 2018 IEEE 20th International Conference on e-Health Networking, Applications and Services (Healthcom), 2018, pp. 1–5.
[11] C.F. Chang, Y.C. Tseng, H.H. Huang, Y.L. Shih, T.Y. Hsieh, H.H. Lin, Exploring the relationship between nonalcoholic fatty liver disease and pancreatic cancer by computed tomographic survey, Intern. Emerg. Med. 13 (2) (2018) 191–197.
[12] Logistic regression, retrieved from: Https://WWW.SAEDSAYAD.COM/LOGISTIC_REGRESSION.HTM, last accessed: 5 October 2019.
[13] P. Sajda, Machine learning for detection and diagnosis of disease, Annu. Rev. Biomed. Eng. 8 (2006) 537–565.
[14] UCI Machine Learning Repository, ILPD (Indian Liver Patient Dataset) data set, https://archive.ics.uci.edu/ml/datasets/ILPD+(Indian+Liver+Patient+database).
[15] L. Breiman, Random forests, Machine Learning 45 (1) (2001) 5–32. https://doi.org/10.1023/A:1010933404324.
