1 Introduction

The coronavirus disease broke out in Wuhan, China at the end of 2019 and infected thousands within the first few weeks. Initially, it was named the Wuhan virus, but the disease was later titled COrona VIrus Disease 2019 (COVID-19) by the World Health Organization (WHO) Singhal (2020), Lai et al. (2020). COVID-19 is an ongoing pandemic that is spreading rapidly with every passing day, and cases had been reported from 213 countries and territories by the end of April 2020. Figure 1 shows the total number of confirmed cases (2,810,325) and deaths (193,825) in all the affected countries Organization (2020). The suspected cases are those with signs such as fever, sore throat, and cough, which later turn into severe pneumonia. The study Chen et al. (2020a) reports fever in 50% and cough in 38% of the initial patients from Wuhan. Other reported symptoms are dyspnea, headache, and rhinorrhea Di Gennaro et al. (2020). However, patients can be asymptomatic, showing no signs in the initial days while still carrying the disease.

Fig. 1 COVID-19 cases for all the countries; the left y-axis shows total confirmed cases in millions and the right y-axis shows the total number of deaths in millions Organization (2020)

The period from COVID-19 symptom onset to death ranges from 6 to 41 days, with a median of 14 days. The period depends on the age and immunity of the patient. For example, the study Wu et al. (2020) shows a shorter period for patients aged over 70 years than for those under 70. The suspected patients are initially diagnosed using real-time reverse transcription polymerase chain reaction (RT-PCR) on throat swab specimens and secretions of the lower respiratory tract Lippi et al. (2020). Chest Computed Tomography (CT) and chest X-ray are among the imaging modalities that can help detect abnormalities in the lungs of COVID-19 patients. The features extracted from CT and X-ray images are called radiological features.

Chest CT proves to be efficient for the detection of COVID-19 cases at an early stage, and various imaging findings are reported in Kanne (2020), Kim (2020). Previous studies show that radiological features can be very helpful for detecting COVID-19 patients, as patients have shown ground-glass opacity (GGO) lesions, bilateral patchy shadowing, and local patchy shadowing when CT is performed Huang et al. (2020a), Wang et al. (2020), Pan et al. (2020). Chest X-ray shows patchy shadowing, interstitial abnormalities, septal thickening, and a crazy-paving pattern in COVID-19 patients Tu et al. (2020). Although the imaging findings for COVID-19 patients are not conclusive, as 17.9% of non-severe and 2.9% of severe cases did not have any CT or X-ray abnormality Chen et al. (2020c), X-ray can still be used for early detection of patients.

With the exponential growth of the malady, the available medical staff is unable to keep pace with the efficient diagnosis of patients. Automated diagnosis of patients from radiological features can lessen the burden on medical staff and increase their efficiency. Deep learning techniques have proven fruitful for disease detection from chest CT Li et al. (2020), Shi et al. (2020). Chest X-rays, however, have not been used as often for COVID-19 detection. This study aims at devising a Convolutional Neural Network (CNN) based model that can classify patients into COVID-19, normal, and pneumonia classes using chest X-rays. In summary, the proposed system makes the following contributions:

  • A CNN based model is proposed that can accurately classify patients into COVID-19 and pneumonia classes based on their chest X-rays.

  • The proposed approach is tested for three scenarios which involve two (COVID-19, normal people), three (COVID-19, normal people, and virus pneumonia), and four (COVID-19, normal people, virus pneumonia, and bacterial pneumonia) class classification.

  • Image preprocessing steps are defined that can help in accurate edge detection and segmentation of the infected area and thus increase the accuracy of the proposed model.

  • An open dataset of COVID-19 patients containing chest X-rays is utilized to evaluate the performance of the proposed model and the results are compared with state-of-the-art deep learning-based approaches including VGG16 and AlexNet.

  • The lack of training images is compensated for with images generated by the augmented image data generator of Keras, giving a total of 10,000 images for training and testing the proposed approach.

The rest of the paper is organized in the following manner. Section 2 describes the research work related to the current study. The components and functionality of the proposed system are discussed in Sect. 3. Results are given in Sect. 4, while Sect. 5 contains the discussions and conclusion.

2 Related work

Deep learning is a well-known research area in artificial intelligence. It provides promising results with end-to-end modeling without manual feature engineering in medical image classification Umer et al. (2020), multi-label image classification He et al. (2020), text categorization Imtiaz et al. (2020), lung cancer detection Yamunadevi and Ranjani (2020), ECG classification Huang et al. (2020b), glaucoma diagnosis Ajesh et al. (2020), and athlete gesture tracking Long (2019). Since the start of COVID-19, researchers have focused on vaccine development, detection of SARS-CoV-2 using medical images, salivary specimen detection Bajaj et al. (2020), factors affecting the mortality of physicians and nurses Jackson et al. (2020), and clinical feature analysis Zhao et al. (2020).

Research Li et al. (2020) presents an approach that utilizes a CNN on chest CT for detecting COVID-19 patients. A deep learning model, the COVID-19 detection neural network (COVNet), is designed to extract visual features from chest CT. CT exams for community-acquired pneumonia and non-pneumonia CT exams are added to the dataset as well to evaluate the proposed model. Results show that the model sensitivity and specificity are 114 of 127 (90%) and 294 of 307 (96%), respectively, for detecting COVID-19 patients.

Segmentation is an important and pivotal step for machine learning-based approaches that aim to detect COVID-19 patients through imaging techniques. It delimits the infected areas, called regions of interest (ROIs), that can be used for further processing and analysis. Many research works have proposed deep learning-based approaches for CT segmentation for the quantification and prediction of COVID-19. The U-Net designed by Ronneberger et al. (2015) is a famous technique used in general-purpose segmentation. It has been adopted by many authors to segment COVID-19 patients' CT images. For example, authors in Zheng et al. (2020) used a pre-trained U-Net to segment lung regions of CT images of the patients. A total of 499 and 131 CT images are used for training and testing with the proposed DeCoVNet, which is a weakly-supervised deep learning model. The precision-recall area under the curve (PR AUC) value is 0.975 for the tested CT images, and the sensitivity and specificity values are larger than 0.9. Similarly, authors in Gozes et al. (2020) utilize deep learning approaches to classify COVID-19 and non-COVID-19 patients from CT images. Segmentation of ROIs is done using U-Net while the classification of patients is achieved through the ResNet-50 2D deep convolutional neural network He et al. (2016). Results are 0.996 AUC, 98.2% sensitivity, and 92.2% specificity.

Another similar work that uses CT images to distinguish between COVID-19 and non-COVID-19 patients is Chen et al. (2020b). The proposed deep learning approach makes use of CT images of 51 confirmed COVID-19 patients and 55 control patients with other diseases to train the model. Image segmentation is done using UNet++ Zhou et al. (2018), and later a CNN is trained for classification. The proposed approach shows accuracy comparable to that of radiologists and can considerably reduce the radiologists' reading time. Authors in Jin et al. (2020) design a system that automatically analyzes the features from CT images to detect COVID-19 pneumonia features and help physicians in the classification of patients. A training dataset comprising 1,136 CT images (723 positive for COVID-19) is used for this purpose. The 3D U-Net++ is leveraged for image segmentation while the classification is performed using ResNet He et al. (2016). The proposed approach achieves a sensitivity of 0.974 and a specificity of 0.922 for the used dataset.

The above-cited research works employ deep learning models on CT images for COVID-19 detection. CT images are high-quality 3D images obtained from tomography and contain hundreds of slices. It requires a substantial amount of time and computational resources to preprocess these images before they can be fed to training models. On the other hand, X-ray images are more common and easier to process than CT images. Hence, various researchers have proposed machine learning models that can work with X-ray images.

Authors in Narin et al. (2020) introduce three different models, i.e., ResNet50, InceptionV3, and Inception-ResNetV2, to classify COVID-19 patients from X-ray images. The models are trained on chest X-ray images of 50 COVID-19 patients and 50 normal people. The achieved accuracy is 98.0%, 97.0%, and 87.0% for ResNet50, InceptionV3, and Inception-ResNetV2, respectively. A similar ResNet approach is presented in Zhang et al. (2020), whereby the trained model is used to classify the patients and detect anomalies. Anomaly detection is used to improve COVID-19 classification, while classification is performed to separate COVID-19 patients from pneumonia patients. Results show a sensitivity of 96.0%, a specificity of 70.07%, and an AUC of 0.952. Another deep learning model is worked out by authors in Wang and Wong (2020) for COVID-19 patient classification. The model, i.e., COVID-Net, is based on a deep CNN and uses X-ray images of 1203 healthy people, 931 bacterial pneumonia patients, 660 patients with viral pneumonia, and 45 patients confirmed for COVID-19. The testing accuracy for COVID-19 is 83.50%.

The studies that utilize X-ray images to classify COVID-19 patients and healthy subjects train on small datasets of 45 to 70 images Shi et al. (2020). With such a limited number of X-ray images, the robustness and accuracy of the proposed approach cannot be determined conclusively, nor can the results be generalized. We, therefore, use the Keras ImageDataGenerator class to augment images and increase the number of X-ray images. Later, we work out an image preprocessing technique and a customized CNN model to increase the prediction accuracy for COVID-19 patients.

3 Materials and methods

This section provides the details of the proposed COVID-19 prediction approach, preprocessing phases, and the structure of the CNN used for prediction.

3.1 Description of dataset used for experiments

This study uses X-ray datasets from two sources. Dataset-1 is available at Dataset (2020) and contains 79 images each for virus and bacterial pneumonia. Dataset-2 is available at Kaggle (2020) and contains 78 X-ray images of COVID-19 patients and 28 images of normal people. Figure 2 shows a few sample images from the dataset. The size of the training dataset can substantially influence the performance of deep learning models: deep learning is a data-intensive approach and requires a large amount of data for training, so the small dataset of X-ray images is not appropriate to produce generalized results. Among the widely used augmentation methods are the Keras ImageDataGenerator class and the Generative Adversarial Network (GAN). Authors in Shorten and Khoshgoftaar (2019) analyze augmentation techniques for image data and point out that ImageDataGenerator is preferable to GAN, as it also avoids overfitting. Therefore, further images are generated from Dataset-2 using the ImageDataGenerator class from Keras, giving a total of 10,000 images.

Fig. 2 Sample images from the X-ray dataset. X-ray images come in different sizes

In light of the results given in Shorten and Khoshgoftaar (2019), this study uses the ImageDataGenerator class to generate more images Documentation (2018). Keras provides this image generator class to define the configuration for image augmentation. Its capabilities include random rotations, shifts, shears and flips, whitening, and dimension reordering. Table 1 provides the names and values of the parameters used in the current study.
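As an illustration, the sketch below shows how such a generator might be configured and applied; the parameter values and directory names are placeholders, as the actual values used are those listed in Table 1.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# A minimal sketch of Keras image augmentation; the parameter values
# below are illustrative placeholders, not the ones from Table 1.
datagen = ImageDataGenerator(
    rotation_range=15,        # random rotation in degrees
    width_shift_range=0.1,    # horizontal shift as a fraction of width
    height_shift_range=0.1,   # vertical shift as a fraction of height
    shear_range=0.1,          # shear intensity
    horizontal_flip=True,     # random horizontal flips
)

# Write augmented copies of the images in a directory to disk until the
# desired dataset size is reached (directory names are assumptions).
flow = datagen.flow_from_directory(
    "data/covid_xrays", target_size=(120, 120), batch_size=32,
    save_to_dir="data/augmented", save_format="png")
next(flow)  # each call writes one augmented batch to 'data/augmented'
```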

Table 1 Parameters used for ’ImageDataGenerator’ to augment images
Fig. 3 The architecture of the proposed approach. The proposed approach uses image preprocessing on the X-ray images to remove noise; later, a CNN is trained on the processed images for COVID-19 prediction

3.2 Proposed approach

The proposed system utilizes X-ray images from the dataset. The architecture of the proposed approach is shown in Fig. 3. The proposed approach comprises two modules: image preprocessing and CNN. These modules are described here in detail.

3.2.1 Image preprocessing

The preprocessing aims at removing the noise in X-ray images to improve the training process of the CNN. Predominantly, input images are large, which increases the training time. The X-ray images in the dataset come in different sizes, as shown in Fig. 2. In the first step, we therefore reduce the size of each image to 120 \(\times \) 120 \(\times \) 3, as shown in Fig. 4a. For edge detection, a value-based filter ([0,-1,0],[-1,6,-1],[0,-1,0]) is applied to the images, which results in images with edges as shown in Fig. 4b. In the third step, the Blue Green Red (BGR) image is converted to the luma component, blue projection, and red projection (YUV) color space. This reduces the resolution of the U and V channels but keeps Y at full resolution; because luminance is more important than color, reducing the U and V channels allows the size of the CNN to be reduced substantially. Figure 4c shows the result of the BGR to YUV conversion. As the last step, we transform the YUV images back to BGR, which performs histogram normalization and smooths the edges, as shown in Fig. 4d.
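A minimal sketch of this pipeline with OpenCV is given below; the explicit histogram equalization on the Y channel is our assumption, since the text above only states that the final conversion normalizes the histogram.

```python
import cv2
import numpy as np

def preprocess_xray(path):
    # Step 1: load (OpenCV reads in BGR order) and resize to 120 x 120 x 3
    img = cv2.resize(cv2.imread(path), (120, 120))

    # Step 2: apply the value-based edge-detection kernel quoted above
    kernel = np.array([[0, -1, 0],
                       [-1, 6, -1],
                       [0, -1, 0]], dtype=np.float32)
    img = cv2.filter2D(img, -1, kernel)

    # Step 3: BGR -> YUV; Y (luminance) carries most of the information
    yuv = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)
    yuv[:, :, 0] = cv2.equalizeHist(yuv[:, :, 0])  # assumed normalization step

    # Step 4: convert back to BGR for input to the CNN
    return cv2.cvtColor(yuv, cv2.COLOR_YUV2BGR)
```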

Fig. 4 Image preprocessing steps followed in the experiments: (a) image size is reduced, (b) kernel is applied for edge detection, (c) BGR image is converted to YUV to get \(Y_0\), and (d) YUV image is converted back to BGR

3.2.2 The architecture of the proposed CNN

Deep learning-based approaches have shown superior performance compared to traditional machine learning approaches. Owing to their significant accuracy, deep learning-based models have attracted considerable attention in recent years. They have been applied in a large variety of domains such as object detection, scene recognition, and scene analysis. Convolutional neural networks (CNN) have been specifically utilized for computer vision tasks. A CNN comprises a large number of convolutional, pooling, and fully connected layers, each layer performing a different task. For example, the convolutional layer uses a fixed-size filter called a kernel to extract local features from the input image.

A new convolved image is obtained each time a convolution is applied. Each convolved image contains features that have been extracted from the image of the previous step. Let I(x, y) be a 2D input image and let f(x, y) be the 2D kernel applied for convolution; then the convolution is Nielsen (2015)

$$\begin{aligned} y(x,y)=(I*f)(x,y)=\sum _{u=-\infty }^{\infty }\sum _{v=-\infty }^{\infty } I(x-u,y-v)f(u,v) \end{aligned}$$
(1)

When the convolution is applied, the pixel values at the edges can be ignored or padding can be applied. The output of the convolution can be transformed using a nonlinear activation Patterson and Gibson (2017):

$$\begin{aligned} sigmoid(x)=\frac{1}{1+e^{-x}} \end{aligned}$$
(2)
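For concreteness, a direct sketch of Eqs. (1) and (2) in plain NumPy follows; it ignores the edge pixels (no padding), as discussed above.

```python
import numpy as np

def conv2d(I, f):
    """Eq. (1): 2D convolution of image I with kernel f (no padding)."""
    kh, kw = f.shape
    out_h, out_w = I.shape[0] - kh + 1, I.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    f_flipped = f[::-1, ::-1]  # flipping the kernel gives true convolution
    for x in range(out_h):
        for y in range(out_w):
            out[x, y] = np.sum(I[x:x + kh, y:y + kw] * f_flipped)
    return out

def sigmoid(x):
    """Eq. (2): elementwise nonlinear activation."""
    return 1.0 / (1.0 + np.exp(-x))

# Example: convolve a random 'image' with the paper's edge kernel
I = np.random.rand(120, 120)
f = np.array([[0, -1, 0], [-1, 6, -1], [0, -1, 0]], dtype=float)
activated = sigmoid(conv2d(I, f))
```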

Other than the convolutional layers, a CNN contains pooling and fully connected layers. The pooling layer is used to summarize the local patches of convolutional layers. It subsamples the convolutional layer to reduce the size of the feature map. The pooling layer calculates the maximum or average function over the convolutional layer output; the operations are called max pooling and average pooling with respect to the function they perform. The spacing between successive pooling windows over the image pixels is called the stride. There is no activation function in the pooling layers; the rectified linear unit (ReLU) is applied in the convolutional layers instead. The pooling average for each convolutional layer can be calculated by Zhu et al. (2017):

$$\begin{aligned} X_{ij}^{\left[ l\right] }=\frac{1}{MN}\sum _{m}^{M}\sum _{n}^{N} X_{iM+m,jN+n}^{\left[ l-1\right] } \end{aligned}$$
(3)

where i and j show the positions of the output map, while M and N are the pooling sample sizes.
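The following NumPy sketch implements Eq. (3) directly; the window size M = N = 2 is illustrative only.

```python
import numpy as np

def average_pool(X, M=2, N=2):
    """Eq. (3): average pooling of feature map X with an M x N window."""
    H, W = X.shape
    out = np.zeros((H // M, W // N))
    for i in range(H // M):           # i, j index the output map
        for j in range(W // N):
            patch = X[i * M:(i + 1) * M, j * N:(j + 1) * N]
            out[i, j] = patch.mean()  # 1/(MN) * sum over the patch
    return out

# A 4x4 feature map pooled with a 2x2 window yields a 2x2 summary
print(average_pool(np.arange(16, dtype=float).reshape(4, 4)))
```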

Fig. 5 The architecture of the proposed convolutional neural network for COVID-19 patient classification

Besides convolutional and pooling layers, fully connected layers are added to the CNN to perform classification. The features from the convolutional layers are given to the fully connected layers for classification. Fully connected layers have a different weight associated with each link and require substantial computing resources. Figure 5 shows the architecture of the proposed CNN used for classification.

ReLU is used as the activation with the convolutional layers. Average pooling with a stride of 3 is used after the third convolutional layer. Dropout layers are used to prevent complex co-adaptations on the training data and avoid overfitting of the model. Originally, dropout layers were used with fully connected layers by Hinton et al. (2012); however, they have been used with convolutional layers as well Park and Kwak (2016). The last fully-connected layer uses the sigmoid function to map the prediction to the range between 0 and 1. The standard sigmoid function Bishop (2006) is given as

$$\begin{aligned} S(t)=\frac{1}{1+e^{-t}} \end{aligned}$$
(4)

The details of the parameters used in the proposed CNN are given in Table 2.
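The Keras sketch below assembles a network of this shape; the filter counts, kernel sizes, and dropout rates are illustrative assumptions, since the exact values are those listed in Table 2.

```python
from tensorflow.keras import layers, models

def build_model(num_classes=1):
    # Filter counts and kernel sizes below are assumptions; see Table 2
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu",
                      input_shape=(120, 120, 3)),   # preprocessed image size
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.AveragePooling2D(pool_size=(3, 3), strides=3),  # stride of 3
        layers.Dropout(0.25),      # guards against co-adaptation/overfitting
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="sigmoid"),  # Eq. (4) output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

For the three- and four-class scenarios, the output layer width and the loss would be adjusted accordingly (e.g., categorical cross-entropy).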

Table 2 Detail of the layers structure used in the proposed CNN model

3.3 Performance evaluation metrics

This study uses accuracy, precision, recall, F-score, AUC, sensitivity, and specificity as the performance evaluation metrics. These metrics are based on four terms, i.e., True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN). TP refers to patients who have the disease and test positive, while FP refers to patients who do not have the disease but test positive. Similarly, TN refers to patients who do not have the disease and test negative, and FN refers to patients with the disease who test negative. Based on these terms, sensitivity is calculated as

$$\begin{aligned} Sensitivity = \frac{TP}{TP+FN} \end{aligned}$$
(5)

A 100% sensitivity indicates the classifier has correctly classified all the patients with the disease Lalkhen and McCluskey (2008). High sensitivity is important for detecting a serious disease. Specificity is calculated as

$$\begin{aligned} Specificity = \frac{TN}{TN+FP} \end{aligned}$$
(6)

Both sensitivity and specificity do not consider a cut-off point for the test. A cut-off point affects the number of false negatives and false positives. For example, a higher cut-off value results in more false negatives, while a lower cut-off value raises the number of false positives; the former indicates a test that is highly specific but less sensitive, while the latter indicates a test that is highly sensitive but less specific. We have used the AUC, i.e., the area under the receiver operating characteristic (ROC) curve, to measure the discriminatory capability of the proposed model Jiménez-Valverde (2012). The model is considered to have better discriminatory capacity if the AUC value is higher than 0.5 Krzanowski and Hand (2009). Accuracy is one of the most widely used metrics to evaluate a classifier's performance and is calculated using

$$\begin{aligned} Accuracy = \frac{TP+TN}{TP+TN+FP+FN} \end{aligned}$$
(7)

Precision and recall are among the commonly used metrics for classifier performance evaluation. Precision measures the proportion of predicted positive cases that are actually positive, while recall is equivalent to sensitivity as given in Eq. (5). Precision is calculated using

$$\begin{aligned} Precision = \frac{TP}{TP+FP} \end{aligned}$$
(8)

Besides the above-mentioned metrics, the F-score is measured as well. The F-score is a statistical measure used in classification. It combines the precision and recall of a model/classifier to compute a value between 0 and 1, indicative of the classifier's lowest to highest performance Mining (2006). The F-score is calculated as

$$\begin{aligned} F=2\times \frac{Precision\times Recall}{Precision+Recall} \end{aligned}$$
(9)
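As a worked example of Eqs. (5)-(9), the snippet below computes all five quantities from a hypothetical binary confusion matrix; the counts are illustrative only.

```python
# Hypothetical confusion-matrix counts, for illustration only
tp, fp, tn, fn = 95, 5, 90, 10

sensitivity = tp / (tp + fn)                 # Eq. (5), identical to recall
specificity = tn / (tn + fp)                 # Eq. (6)
accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (7)
precision = tp / (tp + fp)                   # Eq. (8)
f_score = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (9)

print(f"sensitivity={sensitivity:.3f}, specificity={specificity:.3f}, "
      f"accuracy={accuracy:.3f}, precision={precision:.3f}, F={f_score:.3f}")
```

The AUC, in contrast, is computed from the raw classifier scores rather than hard labels, e.g., with scikit-learn's roc_auc_score.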

4 Results and discussions

The proposed model is tested with three scenarios, each involving a different number of classes:

  • Scenario 1—Training and testing are performed with two classes, i.e., COVID-19 and Normal.

  • Scenario 2—Three classes are used for training and testing, i.e., COVID-19, Normal, and Virus Pneumonia.

  • Scenario 3—Training and testing are performed with four classes, i.e., COVID-19, Normal, Virus Pneumonia, and Bacterial Pneumonia.

4.1 Performance analysis of the proposed model

The proposed model is trained and tested with 10,000 X-ray images of COVID-19 patients and normal people combined for the two-class problem. For the three- and four-class problems, 79 X-ray images each of virus and bacterial pneumonia are added. Training and validation use 70% and 10% of the data, while the remaining 20% is used for testing. Training is performed using a Tesla K80 Graphics Processing Unit (GPU) available at Google Colab, with 16 GB of Random Access Memory (RAM) and 128 GB of disk space. Training took 1.5 h to run 12 epochs on the dataset for two classes. Figure 6a shows the accuracy of training and validation, while Fig. 6b shows the curves for training and validation loss.
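A minimal sketch of such a training run is given below; the directory layout and the validation_split value (10/80 = 0.125 of the non-test pool) are assumptions consistent with the 70/10/20 split described above.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed layout: 'data/train' holds the 80% train+validation pool and
# 'data/test' the held-out 20%; 0.125 of the pool (= 10% overall) is
# reserved for validation, matching the 70/10/20 split.
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.125)
train_gen = datagen.flow_from_directory(
    "data/train", target_size=(120, 120), class_mode="binary",
    subset="training")
val_gen = datagen.flow_from_directory(
    "data/train", target_size=(120, 120), class_mode="binary",
    subset="validation")

model = build_model()  # the CNN sketched in Sect. 3.2.2
history = model.fit(train_gen, validation_data=val_gen, epochs=12)
```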

Fig. 6 Training and validation curves of the proposed approach: (a) training and validation accuracy, (b) loss curves for training and validation

Figure 7 shows the confusion matrix for the proposed approach. Besides two-class prediction, the proposed approach is tested with three and four classes as well to evaluate its performance. Table 3 shows the results for accuracy, precision, recall, and F-score, while Table 4 shows sensitivity, specificity, and AUC values for two, three, and four classes using the proposed approach. Results show that the accuracy with two classes is very good, i.e., 0.9721; however, it is reduced when we add X-ray images for more classes. Results for two-class classification indicate that the proposed approach can discriminate COVID-19 patients from normal people with high accuracy. The adopted image preprocessing strategy helps in training the customized CNN to achieve this. Other indicators like precision, recall, and F-score are equally good. Accuracy for three-class classification is reduced to 0.8986. However, it is still good considering the fact that X-ray images from COVID-19 and virus pneumonia patients may have similarities. Additionally, the ribs, overlying soft tissues, and low contrast can make the classification very challenging Zhang et al. (2020).

Fig. 7 Confusion matrix for the proposed approach

Precision, recall, and F-score values for the three-class scenario are still higher than 0.91. The four-class discrimination scenario has an accuracy of 0.8476, which is lower than that of the other two scenarios. The customized CNN extracts biomarker features from X-ray images for training and classification; similarities among the X-ray images of COVID-19, virus pneumonia, and bacterial pneumonia complicate this process and reduce the accuracy. Even then, the accuracy is higher than that of other research works that used four classes with deep learning approaches. For example, the accuracy of the CNN with four classes is 83.50% in Wang and Wong (2020).

Table 3 Statistics for the performance of the proposed approach
Table 4 Sensitivity, specificity and AUC of the proposed approach

4.2 Comparison of models' performance with VGG16 and AlexNet

Two deep learning-based classification models, VGG16 and AlexNet, are selected against which the performance of the proposed model is compared. AlexNet outperformed previous models and won the ImageNet Large Scale Visual Recognition Competition (ILSVRC) in 2012 Krizhevsky et al. (2012). Its structure is similar to LeNet Lecun et al. (1998), but it is deeper, having more filters with stacked convolutional layers. Approximately 60 million parameters and 650,000 neurons are trained in AlexNet to perform image classification when the input image belongs to one of 1000 different classes. Comprising 11\(\times \)11, 5\(\times \)5, and 3\(\times \)3 convolutions, max pooling, dropout, and data augmentation, it attaches ReLU to every convolutional and fully-connected layer, showing that the nonlinear ReLU enables faster training of deep CNNs than tanh or sigmoid. VGG16 is a CNN model that won ILSVRC in 2014 Simonyan and Zisserman (2015). Contrary to other models that focus on a large number of hyper-parameters, VGG16 adopts a different approach. It focuses on convolutional layers of 3\(\times \)3 filters with a stride of 1, always using the same padding, and max pool layers of 2\(\times \)2 filters with a stride of 2. Two fully-connected layers are put at the end, followed by a softmax for output. The 16 in VGG16 indicates 16 layers that have weights. Improvements are made through multiple stacked 3\(\times \)3 filters.

Fig. 8 Confusion matrices for the techniques used for performance analysis: (a) confusion matrix for VGG16, and (b) confusion matrix for AlexNet

Figure 8 shows the confusion matrices for VGG16 and AlexNet. The performance of the proposed approach is competitive with VGG16 for the two- and four-class problems and identical when three classes are used for classification. AlexNet shows poor performance when used on the X-ray images of COVID-19 patients and normal people.

Table 5 Statistics for the performance metrics for the classifiers

Table 5 shows the performance comparison of the proposed approach with VGG16 and AlexNet. VGG16 achieves the highest accuracy when training data from COVID-19 patients and normal people is used; the accuracy of the proposed model is marginally lower. AlexNet performs poorly with two classes. The accuracy of AlexNet with two classes is much lower than that of the other classifiers because it starts overfitting when it reaches an accuracy of 85% during model training; its testing accuracy then keeps falling until it reaches 66% at the end of 12 epochs. An important point worth mentioning here is that the proposed image preprocessing is used with VGG16 and AlexNet as well; if the preprocessing is changed, the results for VGG16 and AlexNet degrade. Precision, recall, and F-score for the proposed approach are slightly lower than VGG16 but better than AlexNet. The accuracy of the proposed approach on the three-class problem is the same as that of VGG16 and AlexNet, and so are the values for precision, recall, and F-score.

When trained with X-ray images from COVID-19 patients, normal people, virus pneumonia, and bacterial pneumonia, the accuracy of the proposed approach, as well as VGG16 and AlexNet, decreases considerably. The accuracy of VGG16 and AlexNet is 0.85714, while that of the proposed approach is 0.00452 lower, i.e., 0.85262. The proposed approach thus shows performance similar to VGG16 and AlexNet for three- and four-class classification. Specificity, sensitivity, and AUC values are suggestive of good performance as well.

Table 6 Comparison of sensitivity, specificity and AUC for the classifiers

Table 6 demonstrates the results for sensitivity, specificity, and AUC for the proposed approach as well as VGG16 and AlexNet. AUC is an important factor to analyze the capability of a classifier to accurately discriminate between "diseased" and "non-diseased" patients Hajian-Tilaki (2013). The AUC value of the proposed approach for two classes is marginally lower than that of VGG16 and higher than that of AlexNet. For three-class classification, it is identical to VGG16 but lower than AlexNet. The AUC value of the proposed approach for the four-class problem is indicative of better performance than both VGG16 and AlexNet.

Table 7 Statistics for required training time for classifiers

Tables 5 and 6 indicate that the accuracy of VGG16 is slightly higher than that of the proposed approach; however, we also need to consider the complex architecture of VGG16 and the training time it requires when comparing its performance with the proposed approach. Table 7 shows the training time required for the proposed approach in comparison to VGG16 and AlexNet. Considering the trade-off between training time and accuracy, we can say that the proposed approach performs better than VGG16 and AlexNet at discriminating among healthy people and patients with COVID-19, virus pneumonia, and bacterial pneumonia.

5 Conclusion

This study presents a convolutional neural network to discriminate COVID-19 patients from normal people using X-ray images. Deep learning is a data-intensive approach, but the dataset of COVID-19 patients is small, which makes it very difficult to evaluate the robustness and generalizability of deep learning-based models. To overcome this issue, the Keras ImageDataGenerator class is used to augment the X-ray images. An image preprocessing pipeline is proposed which helps in the segmentation of the infected area in the X-ray images. Testing is performed with two, three, and four classes, i.e., COVID-19 patients, normal people, virus pneumonia, and bacterial pneumonia. Results indicate that the accuracy of the proposed approach is 0.97, 0.90, and 0.85 for two, three, and four classes, respectively. Precision, recall, and F-score values are also very good. The proposed approach has a sensitivity and specificity of 0.98994 and 0.92190, respectively, and an AUC value of 0.5948 when four classes are used for training and testing. The AUC values of VGG16 and AlexNet are 0.57280 and 0.57180, which are lower than that of the proposed approach for four-class classification.

The performance of the proposed approach is compared with VGG16 and AlexNet. The comparison indicates that the results of the proposed approach are marginally lower than those of VGG16 for the two- and four-class problems and equal for three-class classification. AlexNet does not perform well for the two-class problem but proves equally good for three- and four-class classification. Both VGG16 and AlexNet show this performance only when used with the proposed image preprocessing strategy. The architectures of both VGG16 and AlexNet are complex and require higher training times than the proposed approach. The proposed approach is robust and produces good accuracy. If a higher number of X-ray images of COVID-19 patients becomes available, the results are expected to improve further.