Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1], the highly mutative virus that causes CoVID-19, brought much of human activity to a standstill in its initial surge and claimed millions of lives around the world. Its adaptive variants drove the second wave, with India among the worst hit: a cumulative total of about 383,490 deaths and 29.70 million positive cases as of 17 June 2021 (https://ourworldindata.org/). Because of the higher potency of these variants and the lack of infrastructure to curb the exponential rise in infections, patients are dying without even being tested properly. Although the testing curve shows a fairly high rate, it has saturated relative to the sharp upsurge in positive cases.

At present, the most viable testing resource is the Real-Time Reverse Transcription Polymerase Chain Reaction test [2], abbreviated as RT-PCR. Although repeated testing improves its sensitivity, its one-shot sensitivity remains below par [3]. Furthermore, an RT-PCR test is still slow, taking at least a day to return a confident result (https://www.healthline.com/). The Rapid Antigen Test (RAT) has been introduced as a faster alternative, but although its reports are available within minutes, its sensitivity is extremely low (around 30.2%) [4], which makes it a poor choice for testing. As a result, many works argue for validation through radiographic examination [5, 6].

In the search for radiographic alternatives, computed tomography (CT) [7] and chest X-ray (CXR) [8] have been examined in numerous works. CT scans, in particular, showed excellent sensitivity and specificity in the detection of CoVID-19 [9]. From a practical perspective, however, CT imaging raises several difficulties. First, it requires expensive machinery, a cost that outweighs its advantages. Since the countries hit hardest by the second wave are developing countries with tight budgets, installing enough CT machines to carry out testing at scale is impractical. Second, CT scans require suspected patients to come into contact with large portions of the machine, so sanitization is needed after each test, which makes the procedure very time-consuming. Third, CT machines are immobile: they are large, need to be installed in a spacious location, and lack the mobility that other radiographic modalities offer. Finally, expert radiologists are needed to interpret the outputs. In this dire situation, health workers are scarce, and requiring a radiologist to constantly check the results adds further inefficiency in the use of human resources during the pandemic.

Therefore, the most realistic alternative is chest X-ray imaging for the detection of CoVID-19. It turns each disadvantage of CT into an advantage. Cost-effective: X-ray machines are considered basic equipment for any medical facility, so almost every hospital, nursing home, and clinic has one. A developing country hit by a surge in cases therefore does not need to commit all its resources to testing and can direct them elsewhere, for example to manufacturing effective vaccines. Less time-consuming: CXR does not require direct contact between suspected patients and the machinery. Instead, the patient can lie in a dedicated bed while the machine captures the CXR from a distance. This reduces the effort and time needed for sanitization and shortens the entire process. Furthermore, CXR results are readily available on-screen, and hard copies can be handed out in minutes; because patients learn their condition almost instantly, the chance of spreading the infection is reduced. Mobility: mobile X-ray machines are widely available and support door-to-door service. This opens the door to aggressive testing to disrupt the spread of the virus. Human resources: X-ray machines are easy to operate and do not require a trained radiologist to be present at all times to look at the images. Inexperienced personnel can be trained in just a couple of hours. This frees experienced health workers for other tasks, helping to curb the infection as much as possible.

Motivated by these advantages, many works have built CoVID-19 detectors on CXR images (as described in “Related Work”) and report excellent results. Even so, the sensitivity rate remains below par. Moreover, there has been a trend of using imbalanced datasets without applying any technique to address the class imbalance.

Moreover, post-CoVID complications such as pneumonia have been disastrous, with lives being lost even after negative CoVID tests [10]. This has become a serious concern.

This paper addresses all of the aforementioned points. To achieve higher sensitivity and general stability, a novel framework named CoWarriorNet is introduced (shown in Fig. 1), which consists of ResiDense modules (shown in Fig. 2). To further enhance the capability of the network, a novel pooling layer, Alpha Trimmed Average Pooling, is introduced. Furthermore, a neural network is not perfect, and in delicate scenarios such as the detection of CoVID-19 it requires human intervention from time to time. Our network therefore reports a confidence score along with the predicted output class to indicate when such intervention is needed. Finally, to address post-CoVID pneumonia, considerable attention is also given to the sensitivity rate for pneumonia cases.

Fig. 1 The proposed novel CoWarriorNet

Fig. 2 Our proposed ResiDense module

The subsequent “Related Work” section reviews related work in this field of study. It is followed by “Organization of the Paper”, which provides a roadmap of the paper for easier navigation.

Related Work

Hussain et al. [11] proposed CoroDet, a 22-layer model based on the Convolutional Neural Network (CNN) [12], to perform CoVID-19 detection on the CoVID-R dataset. However, this model suffers from vanishing and exploding gradient problems. Ismael et al. [13] extended this work with pre-trained deep CNNs (VGG16, VGG19, ResNet18, ResNet50, and ResNet101) for feature extraction. They then used a Support Vector Machine for classification with several kernels, namely Linear, Cubic, Quadratic, and Gaussian. Since these were ImageNet pre-trained models, the evaluation metrics varied because the models lacked learned features specific to identifying CoVID-19 [14] cases, which pushed them to overfit. Basu et al. [15] proposed domain extension transfer learning (DETL) with a deep CNN [16] to classify the data into four classes based on their respective Class Activation Maps. This model suffers from the same drawbacks. In addition, due to the use of Grad-CAM, the model has exponential time complexity. Jain et al. [17] analyzed deep learning techniques (ResNeXt, Inception V3, and Xception) on a dataset containing 6432 images and achieved an accuracy of 97.97%. A major drawback of this work is that it relied on augmented images, so a large amount of time was consumed by data augmentation. Nayak et al. [18] performed a comparative evaluation of eight CNN models (AlexNet, VGG16, GoogleNet, ResNet-34, MobileNet-V2, SqueezeNet, ResNet-50, and Inception-V3) to showcase their potential in CoVID-19 classification. The study showed that ResNet-34 achieved the highest accuracy, 98.33%. It also showed that all the applied models overfit, again mainly because the pre-trained models had learned unrelated features.

Hemdan et al. [19] derived the CoVID X-Net model, which consists of seven different architectures, including the Visual Geometry Group Network (VGG19) and Google MobileNet-V2, to classify chest X-ray images into positive and negative classes (i.e., binary classification). The model is very bulky for a binary classification task. Toğaçar et al. [20] used fuzzy colors with a stacking procedure and a deep learning model, SqueezeNet, together with Social Mimic Optimization for multiclass classification. Ouchicha et al. [21] proposed a novel architecture, CVDNet, based on the Residual Neural Network, which combines local and global features to classify chest X-ray images (CoVID-19, normal, and viral pneumonia). According to [22], ResNet has a very high inference time per image (best case 8.90 ms and worst case 84.52 ms), which degrades the average response time of the designed architecture. Chaudhary et al. [23] used a deep CNN to classify the CoVID CXR images in the CoVIDx dataset. However, the model appears not to be properly trained, resulting in constant variation of the statistical parameters; the results reported in that paper indicate an underfit model. Tang et al. [24] ensembled multiple snapshots of the CoVID Net design and obtained an accuracy of 93.5%, but with a very low F1 score.

Ramirez et al. [25] proposed uncertainty estimation through Monte-Carlo dropout and Softmax scores in a MixMatch semi-supervised framework. They used the Jensen–Shannon distance to evaluate performance, whereas PSNR, SSIM, and FSIM were not evaluated. They later extended this work in [31], using scarce labeled data to overcome the aforementioned drawbacks. Jain et al. [26] used the YoloV3 model to detect positive cases from chest X-ray images, with Gaussian blur and data augmentation as necessary pre-processing steps. On average, YoloV3 takes 18 h to train on a Google Colab GPU, so the model is not very time-efficient. Haghanifar et al. [27] implemented CheXNet, which comprised a U-Net to extract the ROI through image segmentation, followed by DenseNet-121 as the backbone architecture for detection. Although the model achieved a higher accuracy, segmentation and detection form a sequential process that has not been applied in a real-time scenario. Luz et al. [28] used a pre-trained EfficientNet model evaluated on the CoVIDx dataset, which suffered from the aforementioned problems of pre-trained models along with data augmentation. Karakanis et al. [29] used a Generative Adversarial Network with ResNet8 as the discriminator, which relied on transfer learning for real-time weight transfer and thereby incurred exponential space complexity. Ibrahim et al. [30] used the highly popular AlexNet to distinguish viral pneumonia from CoVID-19 in chest X-rays. Sakib et al. [32] proposed DL-CRC, which consisted of a Generative Adversarial Network and a deep CNN to perform binary classification. It has the same drawbacks as mentioned for [29].

Panwar et al. [33] used a deep transfer learning algorithm with Grad-CAM color visualization to separate the classes based on their activation maps. A better alternative could have been t-SNE, which can provide a better view of the feature distribution in less time. Kamal et al. [34] also used a deep CNN to solve the aforementioned problem; however, it suffered from vanishing and exploding gradient problems. Gomes et al. [45] proposed an AI framework for texture analysis of CoVID-19 chest X-ray images, although it is not the best approach for diagnosis. They therefore gradually shifted to pseudo-convolutional machines, using RT-PCR results to characterize the virus sequences [46]. Meanwhile, Ismael et al. [47] published a survey showing the performance of multiresolution approaches on CoVID-19 chest X-ray images, which indicates that image resolution can be a major issue for detection accuracy. Lastly, Singh et al. [35] proposed Gen-ProtoPNet, a novel design based on an NP-complete problem, which ensembles 30 prototypes of different CNN architectures for CoVID-19 chest X-ray detection. It is the most recent architecture to date.

From the above discussion, it can be concluded that most deep learning models use conventional neural networks with transfer learning for fine-tuning. Some use GANs for data augmentation, and others extract the ROI through image segmentation. These state-of-the-art approaches face problems of overfitting, underfitting, vanishing gradients, exploding gradients, space complexity, and exponential time complexity. To overcome these, a novel architecture is proposed in the following sections.

Organization of the Paper

The rest of the paper is organized as follows. The proposed architecture is discussed in “Proposed Methodology”. The results are discussed in “Experimental Results”, which examines the efficiency of the architecture and how well it adapts to detecting CoVID-19 in CXR images. Finally, “Conclusion and Future Scope” concludes the paper and points to improvements that can be adopted in the future for better results.

Proposed Methodology

The proposed methodology comprises three modular parts, each serving a specific purpose, and integrates them to devise a robust CoVID-19 CXR detection algorithm. The modules are discussed below. Figure 3 shows the workflow of the proposed architecture.

Fig. 3 Workflow of the CoWarriorNet

According to Fig. 3, the chest X-ray images are first fed into the derived UNET model to perform feature mapping. The mapped features are then passed through the ResiDense module, which performs feature filtering, and the added Global Average Pooling converts the features into a 1-D vector. This vector is fed into the Terminal Networks, which consist of two parallel networks: the Classification Network and the Confidence Network. The Classification Network is responsible for the multiclass classification, and the Confidence Network indicates how reliable the obtained classification is.

Derived UNET for Feature Mapping

The proposed architecture begins with a derived version of the highly efficient UNET [36], which takes the input image and produces a map that indicates “where to look” in the image. The map is refined through gradients computed by backpropagation and is the essential building block for efficient output classification. One major difference from the original UNET architecture is the use of Alpha Trimmed Average Pooling (discussed in detail in the latter half of this section) in place of Max-Pooling [37]. The Derived UNET has nine Identity blocks in total: four down-sample blocks, four up-sample blocks, and one bridge block joining the two halves. Each of the nine blocks has two convolutional layers with 3 × 3 kernels, a stride of one, and one pixel of padding along the border, followed by Batch Normalization [38] and a Rectified Linear Unit (ReLU) [37] for non-linearity. The down-sampling and up-sampling blocks differ in that each down-sampling block is followed by a pooling layer, whereas each up-sampling block is followed by a transposed convolution. Similar to the original UNET paper, our derived module concatenates the feature maps from each down-sampling block with the corresponding up-sampling block to provide soft attention to the image feature map. The final feature map has the same spatial dimensions as the input and, unlike in the original UNET architecture, has the input image concatenated to it.
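
The statement that a 3 × 3 kernel with stride one and one pixel of padding preserves the spatial size follows from the standard convolution output-size formula. The short sketch below checks this arithmetic and traces the sizes through the four down-sample blocks. The 2 × 2 pooling window with stride 2 and the 256-pixel input are assumptions made only for illustration, since neither is stated in the text, and the function names are hypothetical.

```python
from typing import List


def conv_out_size(n: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def encoder_sizes(n: int, levels: int = 4, pool: int = 2) -> List[int]:
    """Spatial size at the input and after each of the down-sample blocks.

    Assumes a pool x pool window with stride `pool` (hypothetical; the
    pooling window size is not given in the text).
    """
    sizes = [n]
    for _ in range(levels):
        n = conv_out_size(conv_out_size(n))  # two 3x3 convolutions keep the size
        n = n // pool                        # the pooling layer shrinks it
        sizes.append(n)
    return sizes


print(conv_out_size(256))   # 256: the convolution preserves the size
print(encoder_sizes(256))   # [256, 128, 64, 32, 16]
```

Under these assumptions the bridge block operates on a map 16 times smaller per side than the input, and the four transposed convolutions restore the original size, which is consistent with the final map matching the input dimensions.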

ResiDense Module

The output of the Derived UNET is fed into two ResiDense modules, which compress the features before they are fed into the classification network. The ResiDense module combines residual and dense connections, and this hybrid of the two connection types keeps the computation efficient. Each ResiDense module consists of five convolutional layers (as shown in Fig. 2), each followed by Batch Normalization and a ReLU activation function. Between the first two layers, an additive residual skip connection is added, which helps avoid accuracy degradation via the following equation:

$$h\left(x\right)= \rho \left(x\right)+x,$$
(1)

where \(x\) is the input, \(\rho \left(x\right)\) is the residue, and \(h\left(x\right)\) is the prediction of the layer with a shortcut. The next three layers are connected by dense concatenation. The module is designed this way because the initial convolutions change the features only slightly, so a simple and computationally inexpensive residual connection is sufficient for good feature recognition there. The later layers capture much deeper structure and therefore require a more adaptive dense connection. The proposed module not only counters the deterioration of accuracy but also helps gradients flow smoothly during backpropagation for parameter adjustment.
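
The difference between the two connection types can be illustrated with a toy sketch in which plain Python lists stand in for feature maps and simple functions stand in for learned layers. This is not the ResiDense implementation; all names are hypothetical. It only shows that the residual connection of Eq. (1) adds element-wise and keeps the size fixed, whereas a dense connection concatenates and lets the feature size grow.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def residual(x: Vector, rho: Layer) -> Vector:
    """Eq. (1): h(x) = rho(x) + x, an element-wise sum (size unchanged)."""
    return [r + xi for r, xi in zip(rho(x), x)]


def dense(x: Vector, layers: List[Layer]) -> Vector:
    """Dense connectivity: every layer sees all earlier outputs concatenated."""
    features = list(x)
    for layer in layers:
        features = features + layer(features)  # concatenation, so the size grows
    return features


def halve(v: Vector) -> Vector:
    """Stand-in for a learned layer that keeps the input size."""
    return [0.5 * t for t in v]


def summarise(v: Vector) -> Vector:
    """Stand-in for a learned layer that emits one new feature."""
    return [sum(v)]


print(residual([1.0, 2.0], halve))                                 # [1.5, 3.0]
print(dense([1.0, 2.0], [summarise, summarise, summarise]))        # [1.0, 2.0, 3.0, 6.0, 12.0]
```

The growing concatenation is what makes dense connections more expressive but also more expensive, which matches the stated rationale for reserving them for the later layers.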

Terminal Networks

The output feature map from the ResiDense module is transformed through Global Average Pooling [39] into a 1-D vector whose size equals the number of channels of the map. In the proposed architecture, this vector is passed to two parallel networks: one predicts the output classification (Classifier Network) and the other produces a confidence score (Confidence Network [40]). The proposed Classifier Network is a single-layer perceptron (shown in Fig. 4a) that outputs three probability scores, one for each class of the dataset. The Confidence Network adds nuance to the use of Deep Learning in the highly sensitive medical field: through it, the proposed model tries to predict how confident its predictions are. The tricky part is how to obtain this score. Using the highest predicted probability as the score is a viable option. However, the final output is, in most cases, simply the class with the highest predicted probability, so that value by itself says little about whether the prediction is correct. For this reason, during training, the Confidence Network learns, through a series of layers, the true-class probability, that is, the probability the model assigns to the actual target class. This in turn yields the Confidence Score during evaluation. The proposed Confidence Network is quite simple: it is a five-layer Multilayer Perceptron that terminates in a single neuron giving an output score, as shown in Fig. 4b.

Fig. 4 Terminal Networks. a Proposed classification network, b proposed confidence network

The detailed depiction of the model can be found in Table 1, with each module network listed separately for easier re-implementation. The entire network is trained end-to-end, with each terminal network alternately frozen while the other is being trained. The network’s error is driven by two loss functions, one for each terminal network. For the classifier network, since the task is a multiclass problem, the Cross-Entropy Loss (given in the following equation) was selected:

$$L\left(y , \widehat{y}\right)=-\frac{1}{m}\sum_{i=1}^{m}{y}^{\left(i\right)}\mathrm{log}\left({\widehat{y}}^{\left(i\right)}\right),$$
(2)

where \(y\) is the actual target label, \(\widehat{y}\) is the corresponding probability predicted by our network, and \(m\) is the total number of samples in the dataset.
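
As a minimal sketch of the cross-entropy loss in its usual negative log-likelihood form (with a leading minus sign so that the loss is non-negative), the function below assumes one-hot target vectors and per-class probability vectors. The function name and the small clipping constant `eps` are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List


def cross_entropy(targets: List[List[float]], probs: List[List[float]],
                  eps: float = 1e-12) -> float:
    """Mean cross-entropy over m samples with one-hot targets.

    `eps` clips probabilities away from zero to avoid log(0); it is an
    implementation convenience, not part of Eq. (2).
    """
    m = len(targets)
    total = 0.0
    for y, y_hat in zip(targets, probs):
        total += sum(t * math.log(max(p, eps)) for t, p in zip(y, y_hat))
    return -total / m


# Two samples, three classes; each true class receives probability 0.5.
targets = [[1, 0, 0], [0, 1, 0]]
probs = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
print(cross_entropy(targets, probs))  # equals log(2), about 0.693
```

Only the probability assigned to the true class contributes to each term, so the loss falls to zero as that probability approaches one.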

Table 1 Ablation study of CoWarriorNet

The confidence terminal network predicts a single value and is, therefore, aligned with the Mean Squared Error loss function (given in the following equation) to enhance its optimization:

$$L\left(y , \widehat{y}\right)=\frac{1}{m}\sum_{i=1}^{m}{\left({y}^{\left(i\right)}-{\widehat{y}}^{\left(i\right)}\right)}^{2}.$$
(3)
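
The training target of the confidence head described in “Terminal Networks” is the probability that the classifier assigns to the true class, and Eq. (3) measures how far the predicted score is from that target. The sketch below illustrates both steps on plain lists; the function names are hypothetical and the numbers are made up for illustration.

```python
from typing import List


def confidence_targets(probs: List[List[float]], labels: List[int]) -> List[float]:
    """Target for the confidence head: probability given to the true class."""
    return [p[label] for p, label in zip(probs, labels)]


def mse(targets: List[float], preds: List[float]) -> float:
    """Eq. (3): mean squared error over m samples."""
    m = len(targets)
    return sum((t - p) ** 2 for t, p in zip(targets, preds)) / m


# Sample 0 is classified correctly (true class 0 gets 0.7).
# Sample 1 is misclassified (true class 1 gets only 0.3).
probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
labels = [0, 1]
targets = confidence_targets(probs, labels)
print(targets)                    # [0.7, 0.3]
print(mse(targets, [0.5, 0.5]))   # about 0.04
```

In the second sample the highest softmax value is 0.6 even though the prediction is wrong, while the true-class target is only 0.3, which is why the true-class probability is the more informative quantity to learn.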

Introducing Alpha Trimmed Average Pooling

One of the highlights of the proposed network is a novel pooling layer named Alpha Trimmed Average Pooling. X-ray images in general tend to contain salt-and-pepper noise. Such noise takes either very low values (pepper noise) or very high values (salt noise). As the image passes through the architecture, these artifacts are picked up by the convolutional layers, and the commonly preferred Max-Pooling then diminishes overall performance because it selects the largest value in each window, which salt noise tends to be. Alpha Trimmed Average Pooling is therefore adopted: it removes a percentage of the highest and the lowest values (after sorting) and averages the remaining ones to produce the output. The percentage of values to be removed is a hyperparameter, d.

Suppose Alpha Trimmed Average Pooling is applied to k values. First, a value is chosen for d, which determines how many of the k values are to be removed, using the following equation:

$$r=\frac{d}{100}\,k.$$
(4)

After this, the r/2 highest and r/2 lowest values are removed from the k values, and the remaining k − r values are averaged using the following equation:

$$m= \frac{1}{k-r}\sum_{i=r/2+1}^{k-r/2}{M}_{\mathrm{sorted}}(i),$$
(5)

where \({M}_{\mathrm{sorted}}(i)\) is the \(i\)-th smallest of the k values (indexed from 1) and m is the output of the pooling layer. A clearer pictorial depiction is given in Fig. 5, and an algorithmic depiction for use with a convolutional layer is given in Algorithm 1.

Fig. 5 Working principle of the proposed Alpha Trimmed Average Pooling

Algorithm 1

Begin

Step 1: Select a pool size of (k * k).

Step 2: Put this kernel on top of the image.

Step 3: Pick the values of the image under the selected pool, M = [\({m}_{1}, {m}_{2 },{m}_{3},\dots .{m}_{{k}^{2}}\)].

Step 4: Sort M as Msorted.

Step 5: Pick a value for the hyperparameter d (in percentage).

Step 6: Calculate r = d% * len (Msorted).

Step 7: Remove the r/2 lowest values and the r/2 highest values from Msorted.

Step 8: Calculate the output, \(m\), using Eq. (5).

End
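
Steps 1 to 8 can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, r is rounded down to an integer and split symmetrically (an assumption, since the text does not say how fractional or odd r is handled), and the pooling stride is assumed equal to the pool size.

```python
from typing import List


def alpha_trimmed_mean(values: List[float], d: float) -> float:
    """Trim d percent of the values (half from each end) and average the rest."""
    if not 0 <= d < 100:
        raise ValueError("d must satisfy 0 <= d < 100")
    ordered = sorted(values)                      # Step 4
    r = int(d * len(ordered) / 100)               # Step 6, rounded down (assumption)
    half = r // 2                                 # symmetric trim (assumption)
    kept = ordered[half:len(ordered) - half]      # Step 7
    return sum(kept) / len(kept)                  # Step 8


def alpha_trimmed_pool2d(image: List[List[float]], k: int, d: float) -> List[List[float]]:
    """Non-overlapping k x k pooling; stride equal to k is an assumption."""
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(0, rows - k + 1, k):
        out_row = []
        for j in range(0, cols - k + 1, k):
            window = [image[i + a][j + b] for a in range(k) for b in range(k)]  # Step 3
            out_row.append(alpha_trimmed_mean(window, d))
        out.append(out_row)
    return out


# A 3 x 3 window containing one pepper pixel (0) and one salt pixel (255).
window = [0, 10, 12, 11, 255, 9, 10, 11, 12]
print(max(window))                      # 255: max-pooling returns the salt pixel
print(alpha_trimmed_mean(window, 25))   # 75 / 7, about 10.71
```

With d = 25 and nine values, two values are removed (one from each end), so both noise pixels are discarded and the output stays close to the typical intensity of the window.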

Experimental Results

This section demonstrates the effectiveness of our network through the results obtained in detecting CoVID-19 from CXR images.

Implementation Details

The entire model is trained on a Kaggle Kernel with GPU acceleration. The kernel has a total of 13 GB RAM and 16 GB of GPU memory (VRAM). The proposed model runs entirely in a Jupyter Notebook environment with Python 3.7.10 and PyTorch version 1.7.

The model is tested and evaluated on the publicly available “Chest X-ray (CoVID-19 and Pneumonia)” dataset from Kaggle [48]. The dataset has a total of 5144 training data points and 1288 testing data points, divided into three classes, namely Normal, Pneumonia, and CoVID-19. In addition, the CoVID-19 chest X-ray Dataset Initiative [49] and a Synthetic CoVID-19 chest X-ray dataset [50] have been used for performance evaluation and the comparative study. The class distributions of the training and testing sets (Kaggle) are depicted in Fig. 6a and b, respectively, and a detailed data description is reported in Table 2. Because the classes are unevenly distributed, a batch-wise weight-based sampler is used, which feeds in images in batches with even class distributions. The weight-based sampling and all the code and results are documented in Python notebooks and will be made available once the paper has been accepted.
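
The text does not describe how the weight-based sampler is built. One common approach, sketched here as an assumption, is to weight each sample by the inverse of its class frequency and draw batches with replacement, so that every class is equally likely to appear. The class counts below are made up for illustration and the function names are hypothetical; in PyTorch, `torch.utils.data.WeightedRandomSampler` provides comparable behaviour.

```python
import random
from collections import Counter
from typing import List


def inverse_frequency_weights(labels: List[int]) -> List[float]:
    """Per-sample weight = 1 / (number of samples in that sample's class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_batch(labels: List[int], batch_size: int, rng: random.Random) -> List[int]:
    """Draw sample indices with replacement so that classes are equally likely."""
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=batch_size)


# Hypothetical imbalanced label list: 80 / 15 / 5 samples in classes 0 / 1 / 2.
labels = [0] * 80 + [1] * 15 + [2] * 5
batch = sample_batch(labels, 3000, random.Random(0))
print(Counter(labels[i] for i in batch))  # roughly 1000 draws per class
```

Because the weights of each class sum to one, the minority classes are oversampled rather than the majority class being discarded, at the cost of repeating minority images within an epoch.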

Fig. 6 a Train dataset class distribution. b Test dataset class distribution

Table 2 Dataset description

From Table 2, it can be seen that the experiments are designed to minimize bias: the Kaggle dataset is split in a 3:1 ratio, whereas the synthetic dataset is split in a 1:1 ratio. In addition, the entire Initiative dataset has been used for testing purposes only. This also helps reduce the risk of overfitting and makes the proposed model better suited to general use.

Evaluation Metrics

The proposed model is evaluated using the standard metrics of Accuracy, Precision, Recall, and F1-Score. A detailed description of each is given below.

Accuracy

Accuracy is the fraction of all predictions that the hypothesized model gets correct and is given by the following equation:

$$\mathrm{Accuracy }=\frac{\mathrm{len}({h}_{\mathrm{correct}})}{\mathrm{len}({h}_{\mathrm{total}})},$$
(6)

where h represents the prediction of our model, and \(\mathrm{len}({h}_{\mathrm{correct}})\) and \(\mathrm{len}({h}_{\mathrm{total}})\) refer to the number of correct predictions and the total number of predictions, respectively.

The accuracy curve of the proposed model can be seen in Fig. 7. The training accuracy increased steadily over the course of training, reaching a final value of 97.8%. On the validation set, despite initial fluctuations, the accuracy finally stabilized at around 93.05%. The gap is partly due to the small number of images in the validation set. This becomes more evident from the fact that the model performs better on the test set than on the validation set, with an accuracy of 94.11%.

Fig. 7 Accuracy curve of the proposed model

Precision

Precision is the ratio of the number of correct positive predictions (\({h}_{{\mathrm{class}}^{i}}== {y}_{{\mathrm{class}}^{i}}\)) given by the hypothesized model to the total number of positive predictions made by the model (\({h}_{{\mathrm{class}}^{i}}\)), as given in the following equation:

$$\mathrm{Precision }=\frac{\mathrm{len}({h}_{{\mathrm{class}}^{i}}== {y}_{{\mathrm{class}}^{i}})}{\mathrm{len}({h}_{{\mathrm{class}}^{i}})}.$$
(7)

Because there are multiple classes, precision is computed for each class separately. This is done by taking the predictions, selecting one class at a time as the “positive” class, and computing the precision using Eq. (7).

The precision of the model’s predictions for each class is given in Fig. 8. In each case the validation curve fluctuates considerably, but, as with the accuracy curve, the validation precision eventually levels off near its optimum. The most important objective, good results for covid, was achieved, with both the training and the validation sets exceeding 0.99 precision for the covid class. The model also performed well on pneumonia cases, with a precision of 0.97 and 0.94 on the training and validation sets, respectively. The precision for normal cases is somewhat below par on the validation set, yet it still reached 0.88 on the validation set and 0.97 on the training set. On the test set the model fared well, with covid, normal, and pneumonia precision of 0.97, 0.94, and 0.93, respectively.

Fig. 8 Precision curve of the proposed model

Recall/Sensitivity

Complementary to precision, Recall/Sensitivity is the ratio of the number of correct positive predictions (\({h}_{{\mathrm{class}}^{i}}== {y}_{{\mathrm{class}}^{i}}\)) given by the hypothesized model to the total number of actual positives in the dataset (\({y}_{{\mathrm{class}}^{i}}\)), as given in the following equation:

$$\mathrm{Recall}/\mathrm{Sensitivity }=\frac{\mathrm{len}({h}_{{\mathrm{class}}^{i}}== {y}_{{\mathrm{class}}^{i}})}{\mathrm{len}({y}_{{\mathrm{class}}^{i}})}.$$
(8)

The main objective in building this model is to obtain a high sensitivity score. In this respect, the proposed model fares very well, as shown in Fig. 9. For both covid and pneumonia cases, the model achieved values higher than 0.99 on the training and validation sets. For normal cases, on the other hand, the training sensitivity is 0.97, but on the validation set it dropped to 0.87. This lower score is of less importance, since the model is designed primarily to obtain better Recall on covid cases. On the test set, the architecture achieved a Recall of 0.93, 0.83, and 0.98 for covid, normal, and pneumonia cases, respectively.

Fig. 9 Recall curve of the proposed model

F1-Score

Precision and Recall each give a biased view when considered in isolation. A more balanced evaluation metric, the F1-Score, is therefore used, which combines Precision and Recall into a single score, as given in the following equation:

$$\text{F1-Score}=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}.$$
(9)

For this model, the F1-Score values at each epoch are given in Fig. 10. As with Precision and Recall, the values flatten off at the end, indicating that the weights have converged. The F1-Score for covid cases lay above 0.99 for both the training and validation sets, which meets the main objective. The normal cases earned an F1-Score of 0.97 and 0.87 on the training and validation sets, respectively, whereas the pneumonia cases achieved an F1-Score of 0.97 and 0.92. On the test set, the covid, normal, and pneumonia cases achieved F1-Scores of 0.95, 0.88, and 0.95, respectively.
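
The one-vs-rest evaluation of Eqs. (7) to (9) can be sketched as below, followed by a consistency check that recomputes the test-set F1-Scores from the test-set Precision and Recall values reported in the text. The function names and the tiny label lists are hypothetical; only the reported metric values come from the paper.

```python
from typing import Dict, List


def f1_score(precision: float, recall: float) -> float:
    """Eq. (9): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_metrics(y_true: List[str], y_pred: List[str], positive: str) -> Dict[str, float]:
    """One-vs-rest precision, recall and F1 with `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    predicted = sum(1 for p in y_pred if p == positive)  # denominator of Eq. (7)
    actual = sum(1 for t in y_true if t == positive)     # denominator of Eq. (8)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return {"precision": precision, "recall": recall, "f1": f1_score(precision, recall)}


# Reported test-set values: class -> (precision, recall, F1-Score).
reported = {
    "covid": (0.97, 0.93, 0.95),
    "normal": (0.94, 0.83, 0.88),
    "pneumonia": (0.93, 0.98, 0.95),
}
for name, (p, r, f1) in reported.items():
    print(name, round(f1_score(p, r), 2), f1)
```

The recomputed values round to 0.95, 0.88, and 0.95, so the reported test-set F1-Scores are consistent with the reported Precision and Recall.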

Fig. 10 The F1-Score curve of the proposed model

Model Predictions: The Rights and the Wrongs for Kaggle Dataset

The proposed model takes a distinctive approach when producing its outputs. The two terminal networks together address how much confidence can be placed in the model. The confidence network acts as a sanity check on the classification head by giving out a confidence value. The main aim is to make the model’s predictions more trustworthy, and the results shown in Figs. 11 and 12 support that notion.

Fig. 11 A few samples with correct class labels predicted by the proposed model, along with the respective confidence scores

Fig. 12 A few samples with wrong class labels predicted by the proposed model, along with the respective confidence scores

Fig. 11 shows some of the correct predictions made by the model on the test set. The figure also shows that the correct predictions carry a high confidence score (> 0.98), indicating that the model has indeed learned a good deal about the CoVID CXR images.

On the other hand, as shown in Fig. 12, some of the model’s predictions are wrong. For these wrong predictions, however, the confidence scores are considerably lower (< 0.80) than those of the correct predictions. These confidence scores provide the reliability needed to deploy the method in real-time use, since they indicate when human intervention is required because of a likely error by the model.
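
One way the confidence score could be used to trigger human intervention is a simple threshold rule, sketched below. The 0.80 cutoff is taken from the gap observed between wrong and correct predictions in Figs. 11 and 12, but using it as a decision threshold is an assumption made for illustration, as are the class and function names.

```python
from typing import List, NamedTuple, Tuple


class Prediction(NamedTuple):
    label: str          # class predicted by the classification network
    confidence: float   # score from the confidence network


def needs_review(pred: Prediction, threshold: float = 0.80) -> bool:
    """Flag a prediction for a human reader when its confidence is low."""
    return pred.confidence < threshold


def triage(preds: List[Prediction],
           threshold: float = 0.80) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into automatically accepted and flagged for review."""
    accepted = [p for p in preds if not needs_review(p, threshold)]
    flagged = [p for p in preds if needs_review(p, threshold)]
    return accepted, flagged


preds = [Prediction("covid", 0.99), Prediction("normal", 0.62), Prediction("pneumonia", 0.985)]
accepted, flagged = triage(preds)
print([p.label for p in accepted])  # ['covid', 'pneumonia']
print([p.label for p in flagged])   # ['normal']
```

In practice the threshold would need to be tuned on held-out data, trading the number of cases sent to a radiologist against the number of wrong predictions that pass unreviewed.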

Performance Evaluation on the Synthetic Dataset

The previous section reported the performance on the Kaggle dataset. However, performance evaluation on the primary dataset alone is not enough to prove the robustness of the developed architecture. It must also perform well on the synthetic dataset, which contains images from various databases. Figure 13 gives a brief idea of what the synthetic dataset looks like.

Fig. 13

Synthetic database

In Fig. 13, the first row shows the CoVID-19 chest X-ray images, whereas the second and third rows show viral pneumonia and normal images, respectively. In addition, to show the robustness, two-dimensional UMAP embeddings are presented in Fig. 14.

Fig. 14

Two-dimensional UMAP embeddings. a Normal vs CoVID-19, b Normal vs CoVID-19 vs Pneumonia, c Normal vs CoVID-19 vs Pneumonia vs Mixed Classes

From Fig. 14, it can be concluded that it is very difficult to obtain a well-separated distribution for the CoVID-19 class in the UMAP embedding, since raw X-ray images are not readily discriminative for CoVID-19. This illustrates the complexity of the problem tackled by CoWarriorNet. In addition, a mixed class without ground truth has been generated in the synthetic dataset to show the effectiveness of the proposed architecture in a real-time scenario.

Although the proposed model obtained a high accuracy (~ 98%), some failure cases have still been reported. To examine them, the region heatmap activations of all three classes are shown in Fig. 15.

Fig. 15

Heatmap activation of CoWarriorNet: a CoVID-19, b Normal, c Pneumonia

From Fig. 15, it can be observed that there is only a negligible difference among the heatmaps of the three classes, yet the failure cases remain below about 2%. This shows the necessity of the derived UNET and residence module, as they performed the feature mapping very effectively. With the analyses on the synthetic dataset completed, the classification performance of CoWarriorNet on the synthetic dataset is reported in Fig. 16.

Fig. 16

Performance evaluation on synthetic dataset

As depicted in Fig. 16, CoWarriorNet performs well on the synthetic dataset too. A detailed comparative study is presented in the following subsection to check whether the proposed architecture outperforms the existing ones.

A Comparative Study with the Other Architectures

No study is complete without a comparison of the proposed architecture with other contemporary models. Here, the proposed framework has been compared with Ismael et al. [13], Abbas et al. [41], Minaee et al. [42], Wang et al. [43], and Tabik et al. [44] based on their confusion matrices (Fig. 17) and the Precision, Recall, and F1-Score of each of the three classes.

Fig. 17

The confusion matrices of a Ismael et al. [13], b Abbas et al. [41], c Minaee et al. [42], d Wang et al. [43], e Tabik et al. [44], and f CoWarriorNet on Kaggle data

From Fig. 17, it can be concluded that the methods [13, 41, 42, 43] suffer from overfitting, whereas [44] underfits. CoWarriorNet, on the other hand, is less prone to overfitting. To demonstrate its effectiveness, the Precision, Recall, and F1-Score for the CoVID-19, Normal, and Pneumonia classes are reported in Tables 3, 4, and 5, respectively.

Table 3 Comparative study of state-of-the-art approaches on CoVID-19 classes
Table 4 Comparative study of state-of-the-art approaches on Normal classes
Table 5 Comparative study of state-of-the-art approaches on Pneumonia classes
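For reference, the per-class Precision, Recall, and F1-Score compared in these tables can be derived directly from a confusion matrix. The following is a minimal sketch, assuming the convention that row i is the true class and column j is the predicted class; the function name `per_class_metrics` is hypothetical, and the counts in the usage note are illustrative only and are not taken from Fig. 17.

```python
from typing import Dict, List, Tuple


def per_class_metrics(cm: List[List[int]],
                      labels: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Compute (precision, recall, F1) for every class.

    Assumes cm[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(labels)
    metrics = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted as k, but another class
        fn = sum(cm[k]) - tp                       # class k, but predicted as another
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = (precision, recall, f1)
    return metrics
```

As a usage example with made-up counts, `per_class_metrics([[8, 1, 1], [2, 6, 2], [0, 1, 9]], ["CoVID-19", "Normal", "Pneumonia"])` gives a precision and recall of 0.8 for the CoVID-19 class, because 8 of the 10 samples predicted as CoVID-19 and 8 of the 10 true CoVID-19 samples lie on the diagonal.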

In Table 3, CoWarriorNet consistently performs better than the state-of-the-art approaches. These approaches are mainly trained on RGB images, whereas X-ray images are mainly greyscale, so the feature information of X-ray images does not match that of the RGB ones.

As depicted in Table 4, CoWarriorNet outperforms the existing approaches, which rely on pre-trained models through transfer learning, whereas CoWarriorNet has been trained from scratch without any help from auxiliary models.

As CoWarriorNet outperforms the state-of-the-art approaches in both Precision and Recall, it follows that it outperforms them in F1-Score too.
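The link between Precision, Recall, and F1-Score can be made explicit with the standard definition of the F1-Score as the harmonic mean of Precision P and Recall R (the symbols P and R are introduced here only for illustration). Both partial derivatives are non-negative, so a method that is at least as good in both P and R on a given class cannot have a lower F1-Score on that class:

```latex
F_1 = \frac{2PR}{P + R}, \qquad
\frac{\partial F_1}{\partial P} = \frac{2R^2}{(P + R)^2} \ge 0, \qquad
\frac{\partial F_1}{\partial R} = \frac{2P^2}{(P + R)^2} \ge 0 .
```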

From the above discussion, it can be concluded that the proposed methodology provides more accurate results in less time than the state-of-the-art approaches. It has the potential to overcome the drawbacks highlighted in the literature review and to make a notable advancement in the classification of CoVID-19 chest X-ray images.

Conclusion and Future Scope

In this paper, we have built a new deep convolutional network that can detect CoVID-19 from chest X-ray images. Apart from a novel architecture, the paper also introduces a new pooling layer, which has contributed to the excellent results that the model has shown. Future work along this line can proceed in two directions.

The first is the advancement of further deep learning work to detect CoVID-19, with the proposed model as the platform. This can be done with images of much better quality or through the artificial generation of a much more diverse set of images. Furthermore, our work uses very little pre-processing. In the future, researchers can try newer pre-processing techniques to obtain better outcomes. Apart from that, attention-based models built on top of the novel architecture can be explored to see the difference in the outcomes.

The second is the use of this network module in other fields of study. The novel architecture can be applied to other classification and detection problems, and the novel alpha trimmed average pooling can likewise be evaluated in other areas to establish its usefulness. All in all, the proposed model has performed well in detecting CoVID-19 cases from chest X-ray images and provides a new benchmark for future work.