IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 1, March 2024, pp. 1005~1013
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i1.pp1005-1013
Journal homepage: http://ijai.iaescore.com
Attention mechanism-based model for cardiomegaly recognition
in chest X-Ray images
Sara El Omary1, Souad Lahrache2, Rajae El Ouazzani1
1Image Laboratory, Higher School of Technology, Moulay Ismail University of Meknes, Meknes, Morocco
2LabSIV, Faculty of Sciences, Ibnou Zohr University of Agadir, Agadir, Morocco
Article Info
Article history:
Received Aug 15, 2022
Revised Mar 9, 2023
Accepted Aug 2, 2023
ABSTRACT
Cardiovascular diseases (CVDs) have recently become a rapidly growing
problem worldwide, especially in developing countries, which are
undergoing lifestyle changes that introduce new risk factors for heart
disease and call for urgent attention. Cardiomegaly is a sign of
cardiovascular disease associated with various conditions; it refers to
an enlargement of the heart that can be either transient or permanent.
Cardiomegaly is visible on imaging tests, including chest X-radiation
(X-Ray) images, which are among the most common tools used by
cardiologists to detect and diagnose many diseases. In this paper, we
propose an innovative deep learning (DL) model based on an attention
module and the MobileNet architecture to recognize Cardiomegaly
patients using the popular Chest X-Ray8 dataset. The attention module
captures the spatial relationship between the relevant regions in
Chest X-Ray images. The experimental results show that the proposed
model achieves an accuracy rate of 81%, which makes it suitable for
detecting cardiomegaly.
Keywords:
Attention
Cardiomegaly
Cardiovascular diseases
Chest X-Ray
Convolutional neural networks
MobileNet
This is an open access article under the CC BY-SA license.
Corresponding Author:
Sara El Omary
Image Laboratory, Higher School of Technology, Moulay Ismail University of Meknes
Marjane, Meknes 50050, Morocco
Email: elomarysr@gmail.com
1. INTRODUCTION
Cardiovascular diseases (CVDs) affect more than 23 million people around the world, which makes
heart disease a principal health problem [1]. In the United Kingdom, CVDs are among the primary causes of
sudden death and disability. CVDs refer to a variety of conditions that affect the heart and blood
vessels [1], [2]. There is no single cause of CVDs, but many factors can increase the risk of
developing them. High blood pressure is the most critical factor, as it can damage the blood vessels;
other factors include smoking, cholesterol, diabetes, alcohol, unhealthy food, physical inactivity,
obesity, and family medical history [3]. However, people may prevent CVDs by
adopting a healthy lifestyle and making adjustments that reduce the risk of heart disease.
Cardiomegaly is a type of CVD in which the heart becomes enlarged; it often goes unnoticed until
signs occur or a doctor orders imaging tests. Its symptoms do not appear until it has reached a
critical stage characterized by abnormal heartbeats, breathing problems, a sensation of instability,
fatigue, and swelling of certain parts of the body [4]. In addition,
these symptoms are caused by ventricular hypertrophy or dilatation; ventricular hypertrophy corresponds to a
thickening of the ventricular wall, and ventricular dilatation to a thinning of the ventricular wall [5]. Doctors
use various techniques to diagnose Cardiomegaly, including echocardiogram, Chest X-Ray, computerized
tomography (CT) scan, and electrocardiogram [4]. The clinical diagnosis of chest radiographs can be
challenging: reading and interpreting a Chest X-Ray image and extracting key information is a
difficult and time-consuming task that requires considerable clinical experience and can sometimes lead to
wrong results.
The existing approaches [6], [7] that use Chest X-Ray images to detect Cardiomegaly
have various limitations, such as ignoring the spatial relationship between regions of interest (ROI) in an image,
although analyzing the ROI could improve model performance. To overcome this restriction, we
design an attention-based deep learning (DL) model that uses an attention module with MobileNet
to capture critical regions, both at local and global levels, in Chest X-Ray images. The proposed
model integrates the attention mechanism into the MobileNet framework, with the aim of identifying
significant features more accurately and improving performance for Chest X-Ray image analysis.
In particular, the attention module captures the most relevant regions within the Chest X-Ray images,
while the MobileNet convolution module detects the activated regions through the rectified linear unit (ReLU)
function, employing a fixed kernel size. Moreover, the proposed model benefits from a reduced number of
trainable parameters, as it uses pre-trained weights from the MobileNet architecture. Additionally, its
streamlined design eliminates the need for the separate feature extraction and classification steps commonly seen
in traditional machine learning approaches. As a result, the model is more efficient to train and easier to
deploy in real-world applications.
The remainder of the paper is structured as follows. The second section reviews related
work. The third section presents the methodology, in particular the data preprocessing and the
convolutional neural network (CNN) architecture. The fourth section reports and discusses the results
obtained with the implemented CNN architecture. Finally, the fifth section
concludes the paper and outlines potential areas for future work.
2. RELATED WORK
Machine learning and DL have recently been widely investigated, as they have shown strong results
in many problems, including CVDs. For example, El Omary et al. [8] employed several CNNs
to detect cardiac arrhythmia from electrocardiogram (ECG) two-dimensional (2D) images; in addition,
they [9] used a variety of pre-trained CNN models to diagnose heart failure in radiograph images. Next,
Yang et al. [10] introduced a model for early heart failure diagnosis using a combination of Bayesian
principal component analysis (BPCA) and support vector machine (SVM), resulting in an accuracy rate of
74.4%. Afterwards, Miao et al. [11] employed DL to devise a system that enhances the dependability and
efficiency of CVD diagnosis. Their approach involved a multi-layer model, leading to a specificity of 72.86%
and a sensitivity of 93.51%. Furthermore, Son et al. [12] presented a model specifically designed for
early-stage diagnosis of heart failure in emergency rooms. They combined rough sets (RS) and
decision trees (DT) techniques and attained an accuracy value of 97.5%. Further, Bar et al. [13] used an
ImageNet-based CNN architecture to identify various pathologies in Chest X-Ray images and obtained an
accuracy rate of 89%. Then, Acharya et al. [14] suggested a CNN model using ECG
signals, which reached an accuracy rate of 98.97%. Finally, Rubin et al. [15] proposed a new network
called DualNet that analyzes frontal and lateral Chest X-Ray images from the MIMIC Chest X-Ray
(MIMIC-CXR) dataset, and they obtained an accuracy rate of 91%.
A single DL model may not be able to provide enough discriminative information for Chest
X-Ray image classification [16]. For this reason, many researchers have used ensemble learning
methods, which train a set of algorithms to form robust models. An ensemble method
uses a collection of models rather than a single model to improve experimental results [17].
There are three primary types of ensemble methods: bagging, boosting, and stacking [17].
Several studies have used ensemble methods. For example, Zhou et al. [18] employed a combination of various
artificial neural networks (ANNs) to recognize lung cancer cells. Next, Sasaki et al. [19] used an ensemble
model that can detect abnormalities in Chest X-Ray images. Meanwhile, Li et al. [20] used a variety of CNNs
with Chest X-Ray images of lung nodules to reduce the false-positive rate. Additionally, Islam et al. [21]
presented an ensemble model created by combining several different pre-trained DL models to detect lung
nodules as well. Finally, Chouhan et al. [22] suggested a model that combines ResNet-18, DenseNet-121,
AlexNet, GoogleNet, and Inception-V3 to diagnose pneumonia. However, ensemble learning methods still
have some weaknesses, such as overfitting due to the small amount of medical data. Moreover, ensemble
methods can be time- and memory-consuming, as they use a large number of parameters to extract key patterns
from input images.
3. METHODOLOGY
3.1. Data description
Chest X-Ray imaging is among the most common and economical medical imaging
procedures. The National Institutes of Health (NIH) ChestX-ray8 is a public dataset of Chest X-Ray
images [23]. It contains 112,120 Chest X-Ray images labeled with 14 diseases, including
Atelectasis, Consolidation, Infiltration, Pneumothorax, Edema, Emphysema,
Fibrosis, Effusion, Pneumonia, Pleural-thickening, and Cardiomegaly [22]. The images were
collected from 30,805 patients, and the dataset authors used natural language processing (NLP) tools to extract and
classify the diseases from the associated radiology reports [24]. To classify patients with Cardiomegaly, we create
two classes: the first one represents Cardiomegaly, and the second one groups all the other images under the
healthy class. Figure 1 shows Chest X-Ray images of two distinct cases: one of a healthy patient and
the other of a patient diagnosed with Cardiomegaly.
Figure 1. Healthy and Cardiomegaly patients' Chest X-Ray images [23]
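The two-class labeling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes that each image comes with its findings as a single pipe-separated string, and the function name `is_cardiomegaly` and the example strings are hypothetical rather than taken from the paper.

```python
def is_cardiomegaly(finding_labels: str) -> int:
    """Return 1 if 'Cardiomegaly' is among the findings, else 0.

    Assumption: findings are given as one pipe-separated string,
    e.g. "Cardiomegaly|Effusion".
    """
    findings = [label.strip() for label in finding_labels.split("|")]
    return int("Cardiomegaly" in findings)


# Hypothetical label strings used only for illustration.
examples = ["Cardiomegaly|Effusion", "Atelectasis", "No Finding"]
binary_labels = [is_cardiomegaly(s) for s in examples]
print(binary_labels)
```

Under this scheme, every image without a Cardiomegaly finding falls into the second class, whatever other findings it carries.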
3.2. Proposed architecture
We propose a model that combines the MobileNet and attention modules. MobileNet is
a simplified DL architecture that builds lightweight deep CNNs using depthwise separable
convolutions and offers efficient models suitable for mobile and embedded vision applications [25]. MobileNet
has many advantages, including reduced network size, fewer parameters, speed, and applicability to real-time
applications [25]. We chose the MobileNet model because it is among the five most accurate models
and has a small kernel size that allows extraction of low-level features, which is suitable for Chest X-Ray
images with fewer layers [26]. Moreover, MobileNet provides a strong feature extraction capability for
Chest X-Ray image classification. Figure 2 shows the construction of MobileNet using depthwise separable
filters [27].
Figure 2. The MobileNet architecture [27]
Depthwise separable convolution filters combine depthwise and pointwise convolution
filters [27], [28]. There are two primary types of separable convolutions: spatial separable convolutions
and depthwise separable convolutions [27]. First, the spatial separable convolution works mainly on the height
and width of the image, and the kernel is divided into smaller kernels [27]. For example, a 3×3 kernel
might be divided into a 3×1 kernel and a 1×3 kernel [27]. Then, the depthwise separable convolution takes its
name from the fact that it considers the depth dimension (the number of channels) in addition to the spatial
dimensions [27]. An RGB image, for instance, has three channels: red, green, and blue [27]. As in the spatial
separable case, a depthwise separable convolution divides a kernel into two distinct kernels that perform two
convolutions: the depthwise convolution and the pointwise convolution [27]. The pointwise convolution uses a
1×1 kernel that iterates through each point of the image and whose depth equals the number of channels of its
input [27]. Figure 3 illustrates (a) standard convolutional filters, (b) depthwise filters, and (c) point filters.
Standard convolutional filters are small matrices of numerical values that slide over an
input image and perform a convolution operation at each location to extract specific features from the image.
Depthwise filters perform the convolution independently on each channel of the input image, that is, one
convolution per input channel, while the point filters linearly mix the depthwise convolution results using 1×1
convolutions.
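To make the benefit of this factorization concrete, the following sketch compares the number of weights (biases ignored) of a standard convolution with that of a depthwise separable one, and shows a 3×3 kernel built from a 3×1 and a 1×3 kernel. The layer sizes (3×3 kernel, 32 input channels, 64 output channels) and the Sobel-like kernel values are illustrative assumptions, not values taken from the proposed model.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not from the paper).
k, c_in, c_out = 3, 32, 64
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(standard, separable, round(separable / standard, 3))

# Spatial separability: a 3x3 kernel written as the product of a 3x1
# column and a 1x3 row (a Sobel-like kernel, chosen as an example).
column = [1, 2, 1]
row = [-1, 0, 1]
kernel_3x3 = [[c * r for r in row] for c in column]
print(kernel_3x3)
```

With these sizes the separable layer needs 2,336 weights instead of 18,432, roughly an eighth of the standard layer, which is the kind of saving that makes MobileNet lightweight.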
In the following, we highlight the modules that compose the proposed architecture. The
architecture is composed of four building blocks: the convolutional module, the attention module,
the fully connected (FC) layers, and the classification layer. First, the convolutional module extracts the
main features of the input data using the convolutional layers of the MobileNet
model, and its output is passed to the attention module. Next, the attention module retains the spatial
relationship of the visual information contained in the Chest X-Ray images. Further, the FC layers concatenate the
features produced by the convolutional and attention blocks into a 1D representation. Finally, the last dense
layer uses the sigmoid function to classify the input image as healthy or as showing Cardiomegaly.
However, the global average pooling layer shown in Figure 2 is too simple for this task, because some
regions of the input images are more important than others. Thus, we designed an attention mechanism that turns pixels
on and off, and then rescales the results with a Lambda layer according to the number of pixels kept. Furthermore, the
attention layer weights the contribution of specific regions in the average pooling layer. Figure 4 and
Figure 5 give more details about the proposed architecture. Then, Figure 6 illustrates the layers of the entire
model, showing each layer with its name, input shape, and output shape, and how the components of the proposed
architecture are related to each other.
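The attention-weighted pooling described above can be illustrated with a small pure-Python sketch. It is one plausible reading of the text, not the authors' Keras implementation: a sigmoid mask plays the role of turning pixels on and off, and the masked average is divided by the average of the mask, which stands in for the Lambda rescaling step. The function name `attention_pool`, the nested-list layout, and the toy values are assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def attention_pool(features, scores, eps=1e-8):
    """Attention-weighted global average pooling (illustrative sketch).

    features: H x W x C nested lists (output of the convolutional module).
    scores:   H x W raw attention scores; a sigmoid maps them to [0, 1].
    Returns a list of C pooled values, rescaled by the mean mask value.
    """
    h, w, c = len(features), len(features[0]), len(features[0][0])
    mask = [[sigmoid(scores[i][j]) for j in range(w)] for i in range(h)]
    n = h * w
    mask_mean = sum(mask[i][j] for i in range(h) for j in range(w)) / n
    pooled = []
    for ch in range(c):
        masked_mean = sum(
            features[i][j][ch] * mask[i][j] for i in range(h) for j in range(w)
        ) / n
        pooled.append(masked_mean / (mask_mean + eps))  # rescaling step
    return pooled


# Toy 2 x 2 feature map with a single channel (hypothetical values).
features = [[[1.0], [3.0]], [[5.0], [7.0]]]
uniform = attention_pool(features, [[0.0, 0.0], [0.0, 0.0]])
focused = attention_pool(features, [[-20.0, 20.0], [-20.0, -20.0]])
print(uniform, focused)
```

With a uniform mask the result reduces to plain global average pooling, while a mask concentrated on one location returns essentially the feature at that location, which is the behavior the rescaling is meant to preserve.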
Figure 3. Illustration of different types of convolutional filters including: (a) standard convolutional filters, (b) depthwise filters, and (c) point filters [27]
Figure 4. Number of parameters in the proposed architecture
Figure 5. Summary of the proposed architecture
Figure 6. The detailed architecture of the proposed CNN
4. RESULTS AND DISCUSSION
In this section, we present and discuss the results achieved by the proposed architecture.
The proposed CNN model and all experiments were implemented in
Python using the TensorFlow and Keras libraries, which are open-source machine learning libraries
developed by Google for DL. Due to resource limitations, we trained our model on a
graphics processing unit (GPU) provided by Kaggle. To evaluate the performance of classification models,
different metrics are required to distinguish well-performing models from poorly performing ones. Thus, we
use accuracy, precision, recall, F1-score, and the ROC curve as performance metrics. In the following,
we present the equations used to calculate these metrics. We assume that TP are true positives, FP are false positives,
TN are true negatives, FN are false negatives, i is the class index, and S is the total number of classes.
The accuracy, the precision, and the recall can be calculated as (1)-(3) [8]:
๐ด๐‘๐‘๐‘ข๐‘Ÿ๐‘Ž๐‘๐‘ฆ =
1
๐‘†
โˆ‘๐‘†
๐‘–=1 (
๐‘‡๐‘ƒ๐‘–+๐‘‡๐‘๐‘–
๐‘‡๐‘ƒ๐‘–+๐‘‡๐‘๐‘–+๐น๐‘๐‘–+ ๐น๐‘๐‘–) (1)
๐‘ƒ๐‘Ÿ๐‘’๐‘๐‘–๐‘ ๐‘–๐‘œ๐‘› =
1
๐‘†
โˆ‘๐‘†
๐‘–=1 (
TPi
TP i + FPi) (2)
๐‘…๐‘’๐‘๐‘Ž๐‘™๐‘™ =
1
๐‘†
โˆ‘๐‘†
๐‘–=1 (
๐‘‡๐‘ƒ๐‘–
๐‘‡๐‘ƒ๐‘–+ ๐น๐‘๐‘–) (3)
The F1-score can be calculated by considering both recall and precision [8]:
๐น1 โˆ’ ๐‘†๐‘๐‘œ๐‘Ÿ๐‘’ = 2 โˆ—
๐‘ƒ๐‘Ÿ๐‘’๐‘๐‘–๐‘ ๐‘–๐‘œ๐‘› โˆ— ๐‘…๐‘’๐‘๐‘Ž๐‘™๐‘™
๐‘ƒ๐‘Ÿ๐‘’๐‘๐‘–๐‘ ๐‘œ๐‘› + ๐‘…๐‘’๐‘๐‘Ž๐‘™๐‘™
(4)
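As an illustration of (1)-(4), the sketch below computes the class-averaged metrics from per-class counts. The counts are hypothetical and serve only to exercise the formulas; the function names are likewise illustrative. The last line applies (4) to the precision and recall values reported in this section.

```python
def macro_metrics(counts):
    """Class-averaged accuracy, precision, and recall, following (1)-(3).

    counts: one dict per class with the keys TP, TN, FP, FN.
    Assumes every class has at least one predicted and one actual positive.
    """
    s = len(counts)
    accuracy = sum(
        (c["TP"] + c["TN"]) / (c["TP"] + c["TN"] + c["FP"] + c["FN"]) for c in counts
    ) / s
    precision = sum(c["TP"] / (c["TP"] + c["FP"]) for c in counts) / s
    recall = sum(c["TP"] / (c["TP"] + c["FN"]) for c in counts) / s
    return accuracy, precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, following (4)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a two-class problem (20 test images).
counts = [
    {"TP": 5, "TN": 10, "FP": 2, "FN": 3},   # Cardiomegaly class
    {"TP": 10, "TN": 5, "FP": 3, "FN": 2},   # healthy class
]
accuracy, precision, recall = macro_metrics(counts)
print(accuracy, precision, recall, f1_score(precision, recall))

# F1-score for the precision (66%) and recall (62.5%) reported in the text.
reported_f1 = f1_score(0.66, 0.625)
print(round(reported_f1, 2))
```

Applying (4) to a precision of 66% and a recall of 62.5% gives about 0.64, in agreement with the reported F1-score of 64%.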
The proposed model obtained an accuracy rate of 80%, a precision value of 66%, a recall rate of
62.5%, and an F1-score value of 64%. The precision value indicates that 66% of the images predicted as
positive were actually positive, and the recall value indicates that the model retrieved 62.5% of the actual
positive cases. Besides, the receiver operating characteristic (ROC) curve illustrates the trade-off between
specificity and sensitivity: the sensitivity is the true positive rate (TPR), while the false positive rate (FPR)
equals one minus the specificity, that is, the proportion of negative cases incorrectly classified as positive.
The ROC curve is constructed by plotting
the TPR against the FPR. A classifier is considered effective when the curve approaches the upper left corner
and the area under the ROC curve (AUC) is close to one. The proposed model reached an AUC of 0.74, which is
clearly above the 0.5 of a random classifier. Figure 7
shows the ROC curve of the proposed model.
Figure 7. ROC curve of the proposed model
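The construction of a ROC curve and its AUC can be sketched as follows. This is a minimal pure-Python illustration using a threshold sweep and the trapezoidal rule; the labels and scores are hypothetical, and the helper names `roc_points` and `auc` are not part of the experimental code.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold.

    labels: 1 for positive (Cardiomegaly), 0 for negative.
    scores: predicted probabilities of the positive class.
    Assumes at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical predictions for six test images.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
area = auc(points)
print(points[-1], round(area, 3))
```

In this toy example, eight of the nine positive/negative pairs are ranked correctly, so the AUC is 8/9; a classifier that ranks at random would score about 0.5.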
Comparing the proposed CNN model with other available algorithms and methods may not be
entirely fair, because of differences in the size of the X-Ray images used, the number of
classes employed for classification, and the dataset used. Additionally, the models may handle different data
characteristics. Nevertheless, the proposed model reached an accuracy rate of
81%, which exceeds the 71% obtained with Inception V3 and matches the 81% obtained with SqueezeNet
in the results published by Bougias et al. [29]. In addition, in terms of AUC value, our
proposed model achieved a value of 0.75 and surpassed the result obtained by Candemir et al. [30], who
achieved 0.61 using Inception V3. Additionally, Son et al. [12] used a large dataset named MIMIC-CXR and
achieved an accuracy of 89%, although a higher value might be expected given the size of that dataset.
Finally, we present the predictions obtained on the unseen data of the test set.
After training and evaluating the proposed model, we generated the performance scores to assess its
effectiveness. Then, we used matplotlib functions to visualize the model's predictions. Figure
8 shows some examples labeled as Cardiomegaly (True) or healthy (False), together with the attention map
behind each prediction. These visualizations illustrate the model's ability to discern between
Cardiomegaly and healthy cases and give further insight into its
performance and interpretability.
Figure 8. Examples of Cardiomegaly detection predicted using the proposed CNN model
5. CONCLUSION
Deep learning is a branch of artificial intelligence that enables machines to
learn on their own. Deep learning models imitate the learning process of the human brain and have many
applications in the medical domain. In this paper, we introduced a deep learning model that
classifies Chest X-Ray images of Cardiomegaly patients using an attention module with MobileNet.
The proposed model is composed of four main blocks: an attention module, a convolutional module, fully
connected layers, and a classification layer. According to the results, we reached a classification accuracy rate of 81%, a
precision rate of 66%, a recall rate of 62.5%, and an F1-score value of 64%. In the future, we plan to use
existing data augmentation techniques, including generative adversarial networks (GANs) and convolutional
autoencoders, to enhance the efficiency of the classification model. Data augmentation techniques
increase the number of images used in the model learning phase and reduce the risk of overfitting.
Subsequently, we can use models that have a small filter size to extract the relevant part of the Chest X-Ray
images. Furthermore, this approach has been tested on Cardiomegaly, but it can also be applied to
detect the other diseases in the Chest X-Ray8 dataset.
REFERENCES
[1] Nawsherwan, W. Bin, Z. Le, S. Mubarik, G. Fu, and Y. Wang, "Prediction of cardiovascular diseases mortality- and disability-adjusted life-years attributed to modifiable dietary risk factors from 1990 to 2030 among East Asian countries and the world," Frontiers in Nutrition, vol. 9, Oct. 2022, doi: 10.3389/fnut.2022.898978.
[2] L.-A. Bocancia-Mateescu, D. Stan, A.-C. Mirica, M. G. Ghita, D. Stan, and L. L. Ruta, "Nanobodies as Diagnostic and Therapeutic Tools for Cardiovascular Diseases (CVDs)," Pharmaceuticals, vol. 16, no. 6, p. 863, Jun. 2023, doi: 10.3390/ph16060863.
[3] D. Adhikary, S. Barman, R. Ranjan, and H. Stone, "A Systematic Review of Major Cardiovascular Risk Factors: A Growing Global Health Concern," Cureus, Oct. 2022, doi: 10.7759/cureus.30119.
[4] M.-P. S. T. S. Bhadauria, "Cardiomegaly: A brief review with basic and physiotherapeutic approach," Indian Journal of Physical Rehabilitation, vol. 2022. Available: https://www.researchgate.net/publication/363087892_Cardiomegaly_A_brief_review_with_basic_and_physiotherapeutic_approach (accessed Nov. 9, 2022).
[5] S. Baudet, "Hypertrophy and dilation: a TOTally new story?," Cardiovascular Research, vol. 46, no. 1, pp. 17–19, Apr. 2000, doi: 10.1016/S0008-6363(00)00015-8.
[6] A. Bouslama, Y. Laaziz, and A. Tali, "Diagnosis and precise localization of cardiomegaly disease using U-NET," Informatics in Medicine Unlocked, vol. 19, p. 100306, 2020, doi: 10.1016/j.imu.2020.100306.
[7] K. Almezhghwi, S. Serte, and F. Al-Turjman, "Convolutional neural networks for the classification of chest X-rays in the IoT era," Multimedia Tools and Applications, vol. 80, no. 19, pp. 29051–29065, Aug. 2021, doi: 10.1007/s11042-021-10907-y.
[8] S. El Omary, S. Lahrache, and R. El Ouazzani, "A Lightweight CNN to Identify Cardiac Arrhythmia Using 2D ECG Images," 2022, pp. 122–160. doi: 10.4018/978-1-6684-2304-2.ch005.
[9] S. El Omary, S. Lahrache, and R. El Ouazzani, "Detecting Heart Failure from Chest X-Ray Images Using Deep Learning Algorithms," in 2021 3rd IEEE Middle East and North Africa COMMunications Conference (MENACOMM), Dec. 2021, pp. 13–18. doi: 10.1109/MENACOMM50742.2021.9678291.
[10] Guiqiu Yang et al., "A heart failure diagnosis model based on support vector machine," in 2010 3rd International Conference on Biomedical Engineering and Informatics, Oct. 2010, pp. 1105–1108. doi: 10.1109/BMEI.2010.5639619.
[11] K. H. Miao and J. H., "Coronary Heart Disease Diagnosis using Deep Neural Networks," International Journal of Advanced Computer Science and Applications, vol. 9, no. 10, 2018, doi: 10.14569/IJACSA.2018.091001.
[12] C.-S. Son, W.-S. Kang, J.-H. Lee, and K. J. Moon, "Machine Learning to Identify Psychomotor Behaviors of Delirium for Patients in Long-Term Care Facility," IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 4, pp. 1802–1814, Apr. 2022, doi: 10.1109/JBHI.2021.3116967.
[13] Y. Bar, I. Diamant, L. Wolf, S. Lieberman, E. Konen, and H. Greenspan, "Chest pathology detection using deep learning with non-medical training," in 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), Apr. 2015, pp. 294–297. doi: 10.1109/ISBI.2015.7163871.
[14] U. R. Acharya, H. Fujita, S. L. Oh, Y. Hagiwara, J. H. Tan, and M. Adam, "Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals," Information Sciences, vol. 415–416, pp. 190–198, Nov. 2017, doi: 10.1016/j.ins.2017.06.027.
[15] J. Rubin, D. Sanghavi, C. Zhao, K. Lee, A. Qadir, and M. Xu-Wilson, "Large Scale Automated Reading of Frontal and Lateral Chest X-Rays using Dual Convolutional Neural Networks," 2018, [Online]. Available: http://arxiv.org/abs/1804.07839
[16] U. Srinivas, "Discriminative models for robust image classification," 2016, [Online]. Available: http://arxiv.org/abs/1603.02736
[17] T. G. Dietterich, "Ensemble Methods in Machine Learning," 2000, pp. 1–15. doi: 10.1007/3-540-45014-9_1.
[18] C. Zhou et al., "Final overall survival results from a randomised, phase III study of erlotinib versus chemotherapy as first-line treatment of EGFR mutation-positive advanced non-small-cell lung cancer (OPTIMAL, CTONG-0802)," Annals of Oncology, vol. 26, no. 9, pp. 1877–1883, Sep. 2015, doi: 10.1093/annonc/mdv276.
[19] Y. Sasaki, K. Abe, M. Tabei, S. Katsuragawa, A. Kurosaki, and S. Matsuoka, "Clinical usefulness of temporal subtraction method in screening digital chest radiography with a mobile computed radiography system," Radiological Physics and Technology, vol. 4, no. 1, pp. 84–90, Jan. 2011, doi: 10.1007/s12194-010-0109-7.
[20] C. Li, G. Zhu, X. Wu, and Y. Wang, "False-Positive Reduction on Lung Nodules Detection in Chest Radiographs by Ensemble of Convolutional Neural Networks," IEEE Access, vol. 6, pp. 16060–16067, 2018, doi: 10.1109/ACCESS.2018.2817023.
[21] S. R. Islam, S. P. Maity, A. K. Ray, and M. Mandal, "Automatic Detection of Pneumonia on Compressed Sensing Images using Deep Learning," in 2019 IEEE Canadian Conference of Electrical and Computer Engineering (CCECE), May 2019, pp. 1–4. doi: 10.1109/CCECE.2019.8861969.
[22] V. Chouhan et al., "A Novel Transfer Learning Based Approach for Pneumonia Detection in Chest X-ray Images," Applied Sciences, vol. 10, no. 2, p. 559, Jan. 2020, doi: 10.3390/app10020559.
[23] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, "ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases," Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017-January, pp. 3462–3471, 2017, doi: 10.1109/CVPR.2017.369.
[24] N. I. of H. C. X.-R. Dataset, "NIH Chest X-rays," NIH Chest X-rays, 2018.
[25] H. A. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," Computer Vision and Pattern Recognition, vol. 14, no. 2, pp. 53–57, 2009, doi: 10.48550/arXiv.1704.04861.
[26] W. Wang, Y. Li, T. Zou, X. Wang, J. You, and Y. Luo, "A Novel Image Classification Approach via Dense-MobileNet Models," Mobile Information Systems, vol. 2020, pp. 1–8, Jan. 2020, doi: 10.1155/2020/7602384.
[27] W. Wang, Y. Hu, T. Zou, H. Liu, J. Wang, and X. Wang, "A New Image Classification Approach via Improved MobileNet Models with Local Receptive Field Expansion in Shallow Layers," Computational Intelligence and Neuroscience, vol. 2020, pp. 1–10, Aug. 2020, doi: 10.1155/2020/8817849.
[28] G. Wang, G. Yuan, T. Li, and M. Lv, "An multi-scale learning network with depthwise separable convolutions," IPSJ Transactions on Computer Vision and Applications, vol. 10, no. 1, p. 11, Dec. 2018, doi: 10.1186/s41074-018-0047-6.
[29] H. Bougias, E. Georgiadou, C. Malamateniou, and N. Stogiannos, "Identifying cardiomegaly in chest X-rays: a cross-sectional study of evaluation and comparison between different transfer learning methods," Acta Radiologica, vol. 62, no. 12, pp. 1601–1609, Dec. 2021, doi: 10.1177/0284185120973630.
[30] S. Candemir, S. Rajaraman, G. Thoma, and S. Antani, "Deep Learning for Grading Cardiomegaly Severity in Chest X-Rays: An Investigation," in 2018 IEEE Life Sciences Conference (LSC), Oct. 2018, pp. 109–113. doi: 10.1109/LSC.2018.8572113.
BIOGRAPHIES OF AUTHORS
Sara El Omary received a B.Sc. degree in computer science from the Higher
School of Technology of Oujda, Morocco in 2016, then an M.Sc. degree in data science from
the Faculty of Science Semlalia of Marrakech, Morocco in 2020. She is currently a Ph.D.
candidate at the Moulay Ismail University of Meknes (Morocco). Her main areas of interest
include machine learning, deep learning, image preprocessing, and computer vision. She can
be contacted at email: elomarysr@gmail.com.
Souad Lahrache is a Professor at the Faculty of Science, University Ibnou Zohr,
Agadir, Morocco. She obtained a Ph.D. from the Faculty of Sciences of the University
Moulay Ismail of Meknes, Morocco. She has published several papers in peer-reviewed
journals and international conferences. Her research interests include image processing,
pattern recognition, computer vision, and machine learning. She can be contacted at email:
souadlahrache@gmail.com.
Rajae El Ouazzani received her master's degree in computer science and
telecommunication from the Mohammed V University of Rabat (Morocco) in 2006 and her
Ph.D. in image and video processing from the High National School of Computer Science and
Systems Analysis (Morocco) in 2010. Since 2011, she has been a Professor at the High School of
Technology of Meknes, Moulay Ismail University in Morocco. Since 2007, she has authored
several papers in international journals and conferences. Her domains of interest include
multimedia data processing and telecommunications. She can be contacted at email:
elouazzanirajae@gmail.com.

More Related Content

Attention mechanism-based model for cardiomegaly recognition in chest X-Ray images

  • 1. IAES International Journal of Artificial Intelligence (IJ-AI) Vol. 13, No. 1, March 2024, pp. 1005~1013 ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i1.pp1005-1013 ๏ฒ 1005 Journal homepage: http://ijai.iaescore.com Attention mechanism-based model for cardiomegaly recognition in chest X-Ray images Sara El Omary1 , Souad Lahrache2 , Rajae El Ouazzani1 1 Image Laboratory, Higher School of Technology, Moulay Ismail University of Meknes, Meknes, Morocco 2 LabSIV, Faculty of Sciences, Ibnou Zohr University of Agadir, Agadir, Morocco Article Info ABSTRACT Article history: Received Aug 15, 2022 Revised Mar 9, 2023 Accepted Aug 2, 2023 Recently, cardiovascular diseases (CVDs) have become a rapidly growing problem in the world, especially in developing countries. The latter are facing a lifestyle change that introduces new risk factors for heart disease, that requires a particular and urgent interest. Besides, cardiomegaly is a sign of cardiovascular diseases that refers to various conditions; it is associated with the heart enlargement that can be either transient or permanent depending on certain conditions. Furthermore, cardiomegaly is visible on any imaging test including Chest X-Radiation (X-Ray) images; which are one of the most common tools used by Cardiologists to detect and diagnose many diseases. In this paper, we propose an innovative deep learning (DL) model based on an attention module and MobileNet architecture to recognize Cardiomegaly patients using the popular Chest X-Ray8 dataset. Actually, the attention module captures the spatial relationship between the relevant regions in Chest X-Ray images. The experimental results show that the proposed model achieved interesting results with an accuracy rate of 81% which makes it suitable for detecting cardiomegaly disease. Keywords: Attention Cardiomegaly Cardiovascular diseases Chest X-Ray Convolutional neural networks MobileNet This is an open access article under the CC BY-SA license. 
Corresponding Author: Sara El Omary Image Laboratory, Higher School of Technology, Moulay Ismail University of Meknes Marjane, Meknes 50050, Morocco Email: elomarysr@gmail.com 1. INTRODUCTION Cardiovascular diseases (CVDs) affect more than 23 million people around the world, which makes heart disease a principal health problem [1]. In the United Kingdom, CVDs are among the primary causes of sudden death and disability. CVDs refer to a variety of conditions that affect the heart and the blood vessels [1], [2]. There is no single cause of CVDs, but many factors increase the risk of developing them. High blood pressure is the most critical factor, as it can damage the blood vessels; other factors include smoking, cholesterol, diabetes, alcohol, unhealthy food, physical inactivity, obesity, and family medical history [3]. However, people may prevent CVDs by adopting a healthy lifestyle and making adjustments that reduce the risk of heart disease. Cardiomegaly is a type of CVD in which the heart becomes enlarged; it often goes unnoticed until signs occur or a doctor orders imaging tests. Furthermore, the symptoms of cardiomegaly do not appear until it has reached a critical stage that is characterized by abnormal heartbeats, breathing problems, a sensation of instability, fatigue, and swelling of certain parts of the body [4]. In addition,
  • 2. these symptoms are caused by ventricular hypertrophy or dilatation; ventricular hypertrophy corresponds to a thickening of the ventricular wall, and ventricular dilatation to a thinning of the ventricular wall [5]. Doctors use various techniques to diagnose Cardiomegaly, including echocardiogram, Chest X-Ray, computerized tomography (CT) scan, and electrocardiogram [4]. The clinical diagnosis of chest radiographs can be challenging: reading and interpreting a Chest X-Ray image and extracting key information is a difficult and time-consuming task that requires considerable experience from the doctor and can sometimes lead to wrong results. Moreover, the existing approaches [6], [7] that have used Chest X-Ray images to detect Cardiomegaly have various limitations, such as ignoring the spatial relationship between regions of interest (ROI) in an image, although analyzing the ROI could improve model performance. To overcome this restriction, we design an innovative attention-based deep learning (DL) model that uses an attention module with MobileNet to capture critical regions, at both local and global levels, in Chest X-Ray images. The proposed DL model integrates the attention mechanism into the MobileNet framework, enabling more accurate identification of significant features and enhancing the model's performance for Chest X-Ray image analysis. In particular, the attention module captures the most relevant regions within the Chest X-Ray images, while the MobileNet convolution module detects the activated regions through the rectified linear unit (ReLU) function, employing a fixed kernel size.
Moreover, the proposed model benefits from a reduced number of trainable parameters, as it uses pre-trained weights from the MobileNet architecture. Additionally, its streamlined design eliminates the need for the separate feature extraction and classification steps commonly seen in traditional machine learning approaches. As a result, the model is more efficient and readily deployable for training and real-world applications. The rest of the paper is structured as follows. The second section reviews related work. The third section presents the methodology, in particular data preprocessing and the convolutional neural network (CNN) architecture. The fourth section provides and discusses the results obtained with the implemented CNN architecture. Finally, the fifth section concludes the paper and outlines potential areas for future work. 2. RELATED WORK Machine learning and DL have recently been widely investigated, as they have shown promising results in many problems, such as CVD diagnosis. For example, El Omary et al. [8] employed several CNNs to detect cardiac arrhythmia from electrocardiogram (ECG) two-dimensional (2D) images; in addition, they [9] used a variety of pre-trained CNN models to diagnose heart failure in radiograph images. Next, Yang et al. [10] introduced a model for early heart failure diagnosis using a combination of Bayesian principal component analysis (BPCA) and support vector machine (SVM), resulting in an accuracy rate of 74.4%. Afterwards, Miao et al. [11] employed DL to devise a system that enhances the dependability and efficiency of CVD diagnosis. Their approach involved a multi-layer model, leading to a specificity of 72.86% and a sensitivity of 93.51%. Furthermore, Son et al. [12] presented a model specifically designed for early-stage diagnosis of heart failure in emergency rooms.
They harnessed the potential of rough sets (RS) and decision trees (DT) techniques, and attained an accuracy value of 97.5%. Further, Bar et al. [13] utilized an ImageNet-based CNN architecture to identify various pathologies in Chest X-Ray images and obtained an accuracy rate of 89%. Then, Acharya et al. [14] suggested a CNN model using electrocardiogram (ECG) signals, which achieved an accuracy rate of 98.97%. Finally, Rubin et al. [15] proposed a network called DualNet that analyzes frontal and lateral Chest X-Ray images from the MIMIC Chest X-Ray (MIMIC-CXR) dataset, and they obtained an accuracy rate of 91%. However, a single DL model may not provide enough discriminative information for Chest X-Ray image classification [16]. Because of this issue, many researchers have used ensemble learning methods, which train a set of algorithms to form robust models. An ensemble method uses a collection of models rather than a single model to improve experimental results [17]. There are three primary types of ensemble methods: bagging, boosting, and stacking [17]. Several studies have used ensemble methods. For example, Zhou et al. [18] employed a combination of various artificial neural networks (ANNs) to recognize lung cancer cells. Next, Sasaki et al. [19] utilized an ensemble model that can detect abnormalities in Chest X-Ray images. Meanwhile, Li et al. [20] used a variety of CNNs with Chest X-Ray images of lung nodules to reduce the rate of false positives. Additionally, Islam et al. [21] presented an ensemble model that combines several different pre-trained DL models to detect lung nodules as well. Finally, Chouhan et al. [22] suggested a model that combines ResNet-18, DenseNet-121, AlexNet, GoogleNet, and Inception-V3 to diagnose pneumonia. However, ensemble learning methods still have some weaknesses, such as overfitting due to the small amount of medical data.
Moreover, ensemble
  • 3. methods can be time and memory consuming, as they use a large number of parameters to extract key patterns from input images. 3. METHODOLOGY 3.1. Data description Chest X-Ray imaging is among the most common and economical medical imaging procedures. National Institutes of Health (NIH) ChestX-ray8 is a public dataset of Chest X-Ray images [23]. It contains 112,120 Chest X-Ray images covering 14 diseases, including Atelectasis, Consolidation, Infiltration, Pneumothorax, Edema, Emphysema, Fibrosis, Effusion, Pneumonia, Pleural-thickening, and Cardiomegaly; see also Chouhan et al. [22]. The images come from 30,805 patients, and the dataset authors used natural language processing (NLP) tools to extract and classify diseases from the associated radiology reports [24]. To classify patients with Cardiomegaly, we create two classes: the first one represents Cardiomegaly, and the second one groups the other diseases under the healthy class. Figure 1 shows Chest X-Ray images of two distinct cases: a healthy patient and a patient diagnosed with Cardiomegaly. Figure 1. Healthy and Cardiomegaly patients' Chest X-Ray images [23] 3.2. Proposed architecture We propose a model that combines the MobileNet and attention modules. MobileNet is a simplified DL architecture that builds lightweight deep CNNs using depthwise separable convolutions and offers efficient models suitable for mobile and embedded vision applications [25]. MobileNet has many advantages, including reduced network size, fewer parameters, speed, and applicability to real-time applications [25].
In fact, the MobileNet model was chosen because it is among the five most accurate models and has a small kernel size that allows the extraction of low-level features, which is suitable for Chest X-Ray images with fewer layers [26]. Moreover, MobileNet provides an excellent feature extraction capability for Chest X-Ray image classification. Figure 2 shows the construction of MobileNet using depthwise separable filters [27]. Figure 2. The MobileNet architecture [27]
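The two-class labeling described in Section 3.1 can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that the findings of each image arrive as a single string with labels separated by `|` (an assumption about the label file layout), and the helper names `is_cardiomegaly` and `class_counts` are hypothetical.

```python
from collections import Counter


def is_cardiomegaly(finding_labels, separator="|"):
    """Return 1 if 'Cardiomegaly' is one of the findings, else 0.

    Assumption: findings are given as one string, e.g. "Cardiomegaly|Effusion".
    """
    findings = [label.strip() for label in finding_labels.split(separator)]
    return int("Cardiomegaly" in findings)


def class_counts(all_finding_labels):
    """Count how many images fall into each of the two classes."""
    return Counter(is_cardiomegaly(labels) for labels in all_finding_labels)


# Toy records, made up for illustration (not taken from the dataset).
records = ["Cardiomegaly|Effusion", "No Finding", "Atelectasis", "Cardiomegaly"]
binary_labels = [is_cardiomegaly(r) for r in records]
print(binary_labels)  # [1, 0, 0, 1]
print(class_counts(records))
```

Counting the two classes this way also makes any class imbalance visible before training.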
  • 4. Depthwise separable convolution filters combine depthwise and pointwise convolution filters [27], [28]. Two primary types of separable convolutions exist: spatial separable convolutions and depthwise separable convolutions [27]. First, the spatial separable convolution works mainly on the height and width of the image, and the kernel is divided into smaller elements [27]. For example, a 3×3 kernel might be divided into a 3×1 kernel and a 1×3 kernel [27]. Second, the depthwise separable convolution takes its name from the fact that it considers both the depth dimension (the number of channels) and the spatial dimensions [27]. For instance, an RGB image has 3 channels: red, green, and blue [27]. As with the spatial separable convolution, a depthwise separable convolution divides a kernel into two distinct kernels that perform two convolutions: the depthwise convolution and the pointwise convolution [27]. The pointwise convolution uses a 1×1 kernel that iterates through each point of the input, and its depth is equal to the number of channels in the input image [27]. Figure 3 illustrates these filters: Figure 3(a) standard convolutional filters, Figure 3(b) depthwise filters, and Figure 3(c) point filters. Standard convolutional filters are small matrices of numerical values that slide over an input image and perform a convolution operation at each location to extract specific features from the image. The depthwise convolution filter applies one convolution per input channel, independently of the other channels, while the point convolution filter linearly mixes the depthwise convolution results with 1×1 convolutions.
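To make the benefit of the depthwise/pointwise split concrete, the sketch below counts the weights of a standard convolution and of its depthwise separable counterpart, and runs both steps on a toy input. It is an illustrative, pure-Python example under simplifying assumptions (no bias, no padding, stride 1, cross-correlation as in most DL frameworks); the function names are hypothetical and the layer sizes are arbitrary, not taken from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights of a k x k depthwise step plus a 1 x 1 pointwise step."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 filters that mix the channels
    return depthwise + pointwise


def depthwise_conv(image, kernels):
    """Apply one k x k kernel to each channel independently.

    image:   list of channels, each a list of rows (H x W)
    kernels: one k x k kernel per channel
    """
    output = []
    for channel, kernel in zip(image, kernels):
        k = len(kernel)
        rows = len(channel) - k + 1
        cols = len(channel[0]) - k + 1
        output.append([
            [sum(channel[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(cols)]
            for i in range(rows)
        ])
    return output


def pointwise_conv(feature_maps, weights):
    """1 x 1 convolution: weights[o][c] mixes channel c into output o."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [[sum(w * fmap[i][j] for w, fmap in zip(out_weights, feature_maps))
          for j in range(cols)]
         for i in range(rows)]
        for out_weights in weights
    ]


print(standard_conv_params(3, 32, 64))        # 18432
print(depthwise_separable_params(3, 32, 64))  # 2336

ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
rgb = [ones, ones, ones]                      # toy 3-channel 3 x 3 image
dw = depthwise_conv(rgb, [ones, ones, ones])  # [[[9]], [[9]], [[9]]]
pw = pointwise_conv(dw, [[1, 1, 1]])          # [[[27]]]
```

For a 3×3 kernel with 32 input and 64 output channels, the separable form needs roughly eight times fewer weights, which is the source of MobileNet's small size.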
In the following, we highlight the modules that compose the proposed architecture. The architecture is composed of four building blocks: the convolutional module, the attention module, the fully connected (FC) layers, and the classification layer. First, the convolutional module extracts the main features present in our input data using the convolutional layers of the MobileNet model, and its output is given to the attention module. Next, the attention module is employed to retain the spatial relationship of the visual information contained in the Chest X-Ray images. Further, the FC layers concatenate the features produced by the convolutional and attention blocks into a 1D representation. Finally, the last dense layer classifies the input image as healthy or Cardiomegaly using the sigmoid function. However, the global average pooling layer shown in Figure 2 is oversimplified, because some regions of the input images are more important than others. Thus, we designed an attention mechanism that turns pixels on and off and then rescales the results using a Lambda layer based on the number of pixels. Furthermore, the attention layer is used to weight the processing of specific regions in the average pooling layer. Figure 4 and Figure 5 give more details about the proposed architecture. Then, Figure 6 illustrates the layers of the entire model, showing each layer with its name, input vector, and output vector, and how the components of the proposed architecture are related to each other. Figure 3. Illustration of different types of convolutional filters including: (a) standard convolutional filters, (b) depthwise filters, and (c) point filters [27] Figure 4. Number of parameters in the proposed architecture
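One possible reading of the attention-weighted pooling described above is sketched here in plain Python. It is an interpretation under stated assumptions, not the published implementation: the attention module is assumed to output one logit per spatial position, a sigmoid turns it into a soft on/off mask, and the masked average is rescaled by the total mask weight (the role attributed to the Lambda layer). The name `attention_weighted_pool` is hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def attention_weighted_pool(features, attention_logits):
    """Pool each feature channel with a soft spatial mask.

    features:         list of C channels, each H x W
    attention_logits: H x W scores (assumed output of the attention module)
    Returns C values: sum(mask * feature) / sum(mask), i.e. average pooling
    of the masked features divided by average pooling of the mask.
    """
    mask = [[sigmoid(v) for v in row] for row in attention_logits]
    mask_total = sum(sum(row) for row in mask)
    pooled = []
    for channel in features:
        weighted = sum(m * f
                       for mask_row, feat_row in zip(mask, channel)
                       for m, f in zip(mask_row, feat_row))
        pooled.append(weighted / mask_total)
    return pooled


features = [[[1.0, 2.0], [3.0, 4.0]]]     # one toy 2 x 2 channel

uniform = [[0.0, 0.0], [0.0, 0.0]]        # every pixel equally "on"
print(attention_weighted_pool(features, uniform))   # [2.5], the plain mean

focused = [[20.0, -20.0], [-20.0, -20.0]]  # only the top-left pixel is "on"
print(attention_weighted_pool(features, focused))   # close to [1.0]
```

With a uniform mask the result reduces to ordinary global average pooling; with a focused mask the pooled value is dominated by the attended region, which is the behavior the text attributes to the attention layer.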
  • 5. Figure 5. Summary of the proposed architecture Figure 6. The detailed architecture of the proposed CNN
  • 6. 4. RESULTS AND DISCUSSION In this section, we present the results achieved by our proposed architecture, along with a discussion of these results. The proposed CNN model and all experiments were implemented in Python using the TensorFlow and Keras libraries, which are open-source machine learning libraries developed by Google for DL. In addition, due to resource limitations, we trained our model in the Kaggle graphics processing unit (GPU) environment. To evaluate classification models, different metrics are required to differentiate between well-performing and poorly performing models. Thus, we use accuracy, precision, recall, F1-score, and the ROC curve as performance metrics. In the following, we present the equations used to calculate these metrics. We assume that TP are true positives, FP are false positives, TN are true negatives, FN are false negatives, i is the class index, and S is the total number of classes. The accuracy, the precision, and the recall can be calculated as (1)-(3) [8]:
$$\mathrm{Accuracy} = \frac{1}{S} \sum_{i=1}^{S} \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i} \tag{1}$$
$$\mathrm{Precision} = \frac{1}{S} \sum_{i=1}^{S} \frac{TP_i}{TP_i + FP_i} \tag{2}$$
$$\mathrm{Recall} = \frac{1}{S} \sum_{i=1}^{S} \frac{TP_i}{TP_i + FN_i} \tag{3}$$
The F1-score can be calculated by considering both recall and precision [8]:
$$\mathrm{F1\text{-}Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$
The proposed model obtained an accuracy rate of 80%, a precision value of 66%, a recall rate of 62.5%, and an F1-score value of 64%.
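Equations (1)-(4) can be checked numerically with a short script. The sketch below is illustrative only: the function names are hypothetical, the labels are toy values rather than the paper's predictions, and a zero denominator is mapped to 0.0 by convention (an assumption, since the text does not discuss that case).

```python
def confusion_counts(y_true, y_pred, label):
    """Return (TP, FP, TN, FN) when `label` is treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Division that returns 0.0 instead of failing on a zero denominator."""
    return numerator / denominator if denominator else 0.0


def macro_metrics(y_true, y_pred, labels):
    """Class-averaged accuracy, precision, recall (Eqs. 1-3) and F1 (Eq. 4)."""
    s = len(labels)
    acc = prec = rec = 0.0
    for label in labels:
        tp, fp, tn, fn = confusion_counts(y_true, y_pred, label)
        acc += safe_div(tp + tn, tp + tn + fp + fn)
        prec += safe_div(tp, tp + fp)
        rec += safe_div(tp, tp + fn)
    accuracy, precision, recall = acc / s, prec / s, rec / s
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels: 1 = Cardiomegaly, 0 = healthy (made up for illustration).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = macro_metrics(y_true, y_pred, labels=[0, 1])
print(scores)
```

On this toy input the accuracy is 0.75, and the class-averaged precision, recall, and F1-score are each about 0.733.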
Precision measures the proportion of predicted positives that are truly positive (66% here), and recall measures the proportion of actual positives that are correctly identified (62.5% here). Besides, the receiver operating characteristic (ROC) curve illustrates the trade-off between sensitivity and specificity: the sensitivity is the true positive rate (TPR), and the false positive rate (FPR) equals one minus the specificity, i.e., the proportion of negative cases that are incorrectly classified as positive. The ROC curve is constructed by plotting the TPR against the FPR. A classifier is considered effective when the curve approaches the upper left corner and the area under the ROC curve (AUC) is close to one. The proposed model obtains an AUC of 0.74, well above the 0.5 of a random classifier. Figure 7 shows the ROC curve of the proposed model. Figure 7. ROC curve of the proposed model
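The construction of the ROC curve and its AUC can be sketched in a few lines. This is a minimal, hedged example: the function names are hypothetical, the scores are toy values rather than the model's outputs, and it assumes binary labels with at least one positive and one negative sample.

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold.

    Assumes y_true contains only 0/1 and both classes are present.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(y_score), reverse=True):
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example: two healthy (0) and two Cardiomegaly (1) cases.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(y_true, y_score)
print(curve)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(curve))  # 0.75
```

A classifier that ranks every positive above every negative yields an AUC of 1.0, while random scoring tends toward 0.5.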
  • 7. Comparing the proposed CNN model with other available algorithms and methods may be inadequate because of differences in the size of the X-Ray images, the number of classes used for classification, and the dataset. Additionally, the models may handle different data characteristics, which makes the comparison unfair. Nevertheless, the proposed model reached an accuracy rate of 81%, which exceeds the 71% obtained with Inception V3 and equals the 81% obtained with SqueezeNet in the results published by Bougias et al. [29]. In terms of AUC, our proposed model achieved a value of 0.75 and surpassed the result of Candemir et al. [30], who achieved 0.61 using Inception V3. Additionally, Son et al. [12] used a large dataset named MIMIC-CXR and achieved an accuracy of 89%, although a higher value could be expected given the size of the dataset. Finally, we present the predictions on the unseen data of the test set. After training and evaluating the proposed model, we generated the performance scores to assess its effectiveness. Then, we used matplotlib functions to visualize the model's predictions. Figure 8 shows some examples labeled Cardiomegaly (True) or healthy (False), together with the attention map behind each prediction. These visualizations illustrate the model's ability to discern between Cardiomegaly and healthy cases and give better insight into its performance and interpretability. Figure 8. Examples of Cardiomegaly detection predicted using the proposed CNN model 5. CONCLUSION Deep learning is a branch of artificial intelligence that empowers machines to acquire the ability to learn on their own.
Deep learning models imitate the learning process of the human brain and have many applications in the medical domain. In this paper, we introduced an innovative Deep Learning model that classifies Chest X-Ray images of Cardiomegaly patients using an attention module with MobileNet. The proposed model is composed of four main blocks: a convolutional module, an attention module, fully connected layers, and a classification layer. According to the results, we reached a classification accuracy rate of 81%, a precision rate of 66%, a recall rate of 62.5%, and an F1-score value of 64%. In the future, we plan to use existing data augmentation techniques, including generative adversarial networks (GANs) and convolutional autoencoders, to enhance the efficiency of the classification model. Data augmentation increases the number of images used in the model learning phase and reduces the overfitting risk. Subsequently, we can use models that have a small filter size to extract the relevant part of the Chest X-Ray images. Furthermore, this approach has been tested on Cardiomegaly, but it can also be applied to detect the other diseases in the Chest X-Ray8 dataset. REFERENCES [1] Nawsherwan, W. Bin, Z. Le, S. Mubarik, G. Fu, and Y. Wang, "Prediction of cardiovascular diseases mortality- and disability-adjusted life-years attributed to modifiable dietary risk factors from 1990 to 2030 among East Asian countries and the world,"
  • 8. Frontiers in Nutrition, vol. 9, Oct. 2022, doi: 10.3389/fnut.2022.898978. [2] L.-A. Bocancia-Mateescu, D. Stan, A.-C. Mirica, M. G. Ghita, D. Stan, and L. L. Ruta, "Nanobodies as Diagnostic and Therapeutic Tools for Cardiovascular Diseases (CVDs)," Pharmaceuticals, vol. 16, no. 6, p. 863, Jun. 2023, doi: 10.3390/ph16060863. [3] D. Adhikary, S. Barman, R. Ranjan, and H. Stone, "A Systematic Review of Major Cardiovascular Risk Factors: A Growing Global Health Concern," Cureus, Oct. 2022, doi: 10.7759/cureus.30119. [4] M.-P. S. T. S. Bhadauria, "Cardiomegaly: A brief review with basic and physiotherapeutic approach," Indian Journal of Physical Rehabilitation, 2022. Available: https://www.researchgate.net/publication/363087892_Cardiomegaly_A_brief_review_with_basic_and_physiotherapeutic_approach (accessed Nov. 9, 2022). [5] S. Baudet, "Hypertrophy and dilation: a TOTally new story?," Cardiovascular Research, vol. 46, no. 1, pp. 17–19, Apr. 2000, doi: 10.1016/S0008-6363(00)00015-8. [6] A. Bouslama, Y. Laaziz, and A. Tali, "Diagnosis and precise localization of cardiomegaly disease using U-NET," Informatics in Medicine Unlocked, vol. 19, p. 100306, 2020, doi: 10.1016/j.imu.2020.100306. [7] K. Almezhghwi, S. Serte, and F. Al-Turjman, "Convolutional neural networks for the classification of chest X-rays in the IoT era," Multimedia Tools and Applications, vol. 80, no. 19, pp. 29051–29065, Aug. 2021, doi: 10.1007/s11042-021-10907-y. [8] S. El Omary, S. Lahrache, and R. El Ouazzani, "A Lightweight CNN to Identify Cardiac Arrhythmia Using 2D ECG Images," 2022, pp. 122–160. doi: 10.4018/978-1-6684-2304-2.ch005. [9] S. El Omary, S. Lahrache, and R. El Ouazzani, "Detecting Heart Failure from Chest X-Ray Images Using Deep Learning Algorithms," in 2021 3rd IEEE Middle East and North Africa COMMunications Conference (MENACOMM), Dec. 2021, pp. 13–18.
doi: 10.1109/MENACOMM50742.2021.9678291. [10] Guiqiu Yang et al., "A heart failure diagnosis model based on support vector machine," in 2010 3rd International Conference on Biomedical Engineering and Informatics, Oct. 2010, pp. 1105–1108. doi: 10.1109/BMEI.2010.5639619. [11] K. H. Miao and J. H., "Coronary Heart Disease Diagnosis using Deep Neural Networks," International Journal of Advanced Computer Science and Applications, vol. 9, no. 10, 2018, doi: 10.14569/IJACSA.2018.091001. [12] C.-S. Son, W.-S. Kang, J.-H. Lee, and K. J. Moon, "Machine Learning to Identify Psychomotor Behaviors of Delirium for Patients in Long-Term Care Facility," IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 4, pp. 1802–1814, Apr. 2022, doi: 10.1109/JBHI.2021.3116967. [13] Y. Bar, I. Diamant, L. Wolf, S. Lieberman, E. Konen, and H. Greenspan, "Chest pathology detection using deep learning with non-medical training," in 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), Apr. 2015, pp. 294–297. doi: 10.1109/ISBI.2015.7163871. [14] U. R. Acharya, H. Fujita, S. L. Oh, Y. Hagiwara, J. H. Tan, and M. Adam, "Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals," Information Sciences, vol. 415–416, pp. 190–198, Nov. 2017, doi: 10.1016/j.ins.2017.06.027. [15] J. Rubin, D. Sanghavi, C. Zhao, K. Lee, A. Qadir, and M. Xu-Wilson, "Large Scale Automated Reading of Frontal and Lateral Chest X-Rays using Dual Convolutional Neural Networks," 2018, [Online]. Available: http://arxiv.org/abs/1804.07839 [16] U. Srinivas, "Discriminative models for robust image classification," 2016, [Online]. Available: http://arxiv.org/abs/1603.02736 [17] T. G. Dietterich, "Ensemble Methods in Machine Learning," 2000, pp. 1–15. doi: 10.1007/3-540-45014-9_1. [18] C.
Zhou et al., "Final overall survival results from a randomised, phase III study of erlotinib versus chemotherapy as first-line treatment of EGFR mutation-positive advanced non-small-cell lung cancer (OPTIMAL, CTONG-0802)," Annals of Oncology, vol. 26, no. 9, pp. 1877–1883, Sep. 2015, doi: 10.1093/annonc/mdv276. [19] Y. Sasaki, K. Abe, M. Tabei, S. Katsuragawa, A. Kurosaki, and S. Matsuoka, "Clinical usefulness of temporal subtraction method in screening digital chest radiography with a mobile computed radiography system," Radiological Physics and Technology, vol. 4, no. 1, pp. 84–90, Jan. 2011, doi: 10.1007/s12194-010-0109-7. [20] C. Li, G. Zhu, X. Wu, and Y. Wang, "False-Positive Reduction on Lung Nodules Detection in Chest Radiographs by Ensemble of Convolutional Neural Networks," IEEE Access, vol. 6, pp. 16060–16067, 2018, doi: 10.1109/ACCESS.2018.2817023. [21] S. R. Islam, S. P. Maity, A. K. Ray, and M. Mandal, "Automatic Detection of Pneumonia on Compressed Sensing Images using Deep Learning," in 2019 IEEE Canadian Conference of Electrical and Computer Engineering (CCECE), May 2019, pp. 1–4. doi: 10.1109/CCECE.2019.8861969. [22] V. Chouhan et al., "A Novel Transfer Learning Based Approach for Pneumonia Detection in Chest X-ray Images," Applied Sciences, vol. 10, no. 2, p. 559, Jan. 2020, doi: 10.3390/app10020559. [23] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, "ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases," Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017-January, pp. 3462–3471, 2017, doi: 10.1109/CVPR.2017.369. [24] N. I. of H. C. X.-R. Dataset, "NIH Chest X-rays," NIH Chest X-rays, 2018. [25] H. A. Andrew G.
Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," Computer Vision and Pattern Recognition, vol. 14, no. 2, pp. 53–57, 2009, doi: 10.48550/arXiv.1704.04861. [26] W. Wang, Y. Li, T. Zou, X. Wang, J. You, and Y. Luo, "A Novel Image Classification Approach via Dense-MobileNet Models," Mobile Information Systems, vol. 2020, pp. 1–8, Jan. 2020, doi: 10.1155/2020/7602384. [27] W. Wang, Y. Hu, T. Zou, H. Liu, J. Wang, and X. Wang, "A New Image Classification Approach via Improved MobileNet Models with Local Receptive Field Expansion in Shallow Layers," Computational Intelligence and Neuroscience, vol. 2020, pp. 1–10, Aug. 2020, doi: 10.1155/2020/8817849. [28] G. Wang, G. Yuan, T. Li, and M. Lv, "An multi-scale learning network with depthwise separable convolutions," IPSJ Transactions on Computer Vision and Applications, vol. 10, no. 1, p. 11, Dec. 2018, doi: 10.1186/s41074-018-0047-6. [29] H. Bougias, E. Georgiadou, C. Malamateniou, and N. Stogiannos, "Identifying cardiomegaly in chest X-rays: a cross-sectional study of evaluation and comparison between different transfer learning methods," Acta Radiologica, vol. 62, no. 12, pp. 1601–1609, Dec. 2021, doi: 10.1177/0284185120973630. [30] S. Candemir, S. Rajaraman, G. Thoma, and S. Antani, "Deep Learning for Grading Cardiomegaly Severity in Chest X-Rays: An Investigation," in 2018 IEEE Life Sciences Conference (LSC), Oct. 2018, pp. 109–113. doi: 10.1109/LSC.2018.8572113.
  • 9. BIOGRAPHIES OF AUTHORS Sara El Omary received a B.Sc. degree in computer science from the Higher School of Technology of Oujda, Morocco in 2016, then an M.Sc. degree in data science from the Faculty of Science Semlalia of Marrakech, Morocco in 2020. She is currently a Ph.D. candidate at the Moulay Ismail University of Meknes (Morocco). Her main areas of interest include machine learning, deep learning, image preprocessing, and computer vision. She can be contacted at email: elomarysr@gmail.com. Souad Lahrache is a Professor at the Faculty of Science, University Ibnou Zohr, Agadir, Morocco. She obtained a Ph.D. from the Faculty of Sciences of the University Moulay Ismail of Meknes, Morocco. She has published several papers in peer-reviewed journals and international conferences. Her research interests include image processing, pattern recognition, computer vision, and machine learning. She can be contacted at email: souadlahrache@gmail.com. Rajae El Ouazzani received her master's degree in computer science and telecommunication from the Mohammed V University of Rabat (Morocco) in 2006 and her Ph.D. in image and video processing from the High National School of Computer Science and Systems Analysis (Morocco) in 2010. Since 2011, she has been a Professor at the High School of Technology of Meknes, Moulay Ismail University, Morocco. Since 2007, she has authored several papers in international journals and conferences. Her domains of interest include multimedia data processing and telecommunications. She can be contacted at email: elouazzanirajae@gmail.com.