Empowering Tuberculosis Screening with Explainable Self-Supervised Deep Neural Networks

Neel Patel
Department of Mechanical and Mechatronics Engineering
University of Waterloo
Waterloo, ON N2L 3G1, Canada
Alexander Wong
Department of Systems Design Engineering
University of Waterloo
Waterloo, ON N2L 3G1, Canada
Ashkan Ebadi
Digital Technologies Research Centre
National Research Council Canada
Toronto, ON M5T 3J1, Canada
ashkan.ebadi@nrc-cnrc.gc.ca

Abstract

Tuberculosis persists as a global health crisis, especially in resource-limited populations and remote regions, with more than 10 million individuals newly infected annually. It stands as a stark symbol of inequity in public health. Tuberculosis affects roughly a quarter of the global population, and eight countries account for two-thirds of all tuberculosis infections. Although severe, tuberculosis is both curable and manageable, provided that at-risk populations are screened and cases are detected early. Chest x-ray is the predominant imaging technique in tuberculosis screening, but reading x-rays requires skilled radiologists, a resource that is often scarce, particularly in remote regions with limited resources. Consequently, there is a pressing need for artificial intelligence (AI)-powered systems that support clinicians and healthcare providers in swift screening. Training a reliable AI model, however, requires large-scale high-quality data, which can be difficult and costly to acquire. Motivated by these challenges, we introduce an explainable self-supervised self-train learning network tailored for tuberculosis case screening. The network achieves an overall accuracy of 98.14% and recall and precision of 95.72% and 99.44%, respectively, in identifying tuberculosis cases, while capturing clinically significant features.

Keywords Tuberculosis $\cdot$ Deep learning $\cdot$ Explainable neural network $\cdot$ Rapid screening $\cdot$ Radiology

1 Introduction

Tuberculosis (TB), caused by the transmission of the bacillus Mycobacterium tuberculosis through airborne particles expelled by individuals with the illness [1], is estimated to have infected approximately a quarter of the world’s population [2], especially in regions grappling with poverty and economic hardship [1]. TB is a disease that can typically be prevented and cured. However, in 2022, it stood as the second most fatal infectious disease globally, following the coronavirus disease (COVID-19), and accounted for nearly double the number of fatalities compared to human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) [1]. After infection, the highest risk of TB disease occurs within the initial two years ( $\approx 5\%$ ), decreasing significantly thereafter [2]. Some individuals clear the infection completely [3]. Among those who develop TB annually, around 90% of cases occur in adults, with a greater prevalence observed among men [1]. Although tuberculosis is treatable, with about 85% of infections effectively cured using a six-month antibiotic regimen [4], the mortality rate from untreated TB disease remains notably high at around 50% [5].

Screening high-risk populations and early disease detection are vital steps in TB treatment [4], but tuberculosis continues to be either underdiagnosed or underreported to national authorities [1]. The situation worsened during the COVID-19 pandemic: after substantial increases from 2017 to 2019, the global number of newly diagnosed and officially reported TB cases fell by 18% between 2019 and 2020, from 7.1 million to 5.8 million, with a partial recovery to 6.4 million in 2021 [1]. In 2022, the global count of newly diagnosed and officially notified TB cases reached 7.5 million, the highest count since the World Health Organization (WHO) began global TB monitoring in 1995 [1]. Moreover, TB-related deaths decreased globally by 19% from 2015 to 2022, falling considerably short of the WHO’s End TB Strategy target of a 75% reduction by 2025 [1].

Whether it is improved medication and care or more effective resistance management, the cornerstone of success in TB treatment lies in early and accurate diagnostics [4, 6]. As the predominant modality utilized in tuberculosis screening, chest x-ray (CXR) imaging has proven to be highly efficient and cost-effective [7, 8]. However, it poses further challenges due to the presence of atypical radiographic presentation and shortages of radiologists [9], especially in resource-limited settings, since it necessitates skilled human readers or trained clinicians/technicians for interpretation [4].

Due to the global scarcity of experienced individuals for interpreting CXR images in tuberculosis screening, there has been a notable surge in interest in artificial intelligence (AI)-driven TB screening solutions, e.g., [10, 11, 4]. AI-driven systems hold promise in healthcare by offering efficient analysis of vast amounts of medical data, aiding in diagnosis, treatment planning, and personalized care, ultimately improving patient outcomes and streamlining healthcare delivery [12]. AI-powered TB screening can assist clinicians by providing efficient and reliable analysis of imaging data, thereby optimizing resource utilization and enabling early detection of the disease. Indeed, the latest report from the WHO highlights further products being considered for review, including point-of-care TB tests and computer-aided detection (CAD) for digital chest radiography in individuals under 15 years of age, among other potential applications [1].

Inspired by the urgent demand in resource-limited settings and low-income populations and the WHO’s recent endorsement of CAD for tuberculosis screening, we present an explainable self-supervised deep neural network tailored for tuberculosis case screening. We leveraged a framework named distillation for self-supervision and self-train learning (DISTL) [13] that is inspired by the learning process of radiologists. DISTL incorporates both self-supervision and self-training through knowledge distillation, enabling the model to learn from unlabeled data and iteratively improve its performance. The approach overcomes the need for large-scale high-quality data in building reliable medical AI models by utilizing limited labelled and extensive unlabeled data, mirroring the teacher-student learning paradigm. We also conducted explainability analysis to ensure the network effectively identifies disease-related patterns and indicators in the CXR images. This research holds significant promise in addressing current challenges and we hope it could revolutionize TB diagnosis through its innovative blend of explainable self-supervised learning and deep neural networks.

The structure of the article is outlined as follows: Section 2 elaborates on the data and methodology. Section 3 showcases the study’s findings, while Section 4 offers conclusions and discussions on broader implications. Lastly, Section 5 outlines the study’s limitations and suggests future directions.

2 Data and Methods

This section provides a comprehensive overview of the data and methods utilized in the study, detailing data preparation, network architecture, and the validation process driven by explainability analysis.

2.1 Data Collection and Preparation

We used a comprehensive chest x-ray (CXR) dataset comprising four distinct sources to ensure diversity and robustness. Tuberculosis-positive images ( $n=2,141$ ) were sourced from the Montgomery ( $n=58$ ) [14], Shenzhen ( $n=336$ ) [14], Belarus ( $n=1,047$ ) [15], and Rahman et al. ( $n=700$ ) [10] datasets. The normal images ( $n=3,500$ ) were collected exclusively from the Rahman et al. [10] dataset.

After data collection and integration, we proceeded with several preprocessing steps. Initially, we employed the U-Net model [16] to extract the lung region from the CXR images. The U-Net architecture is specifically designed to handle biomedical images and has gained widespread recognition in medical image segmentation [17]. For instance, in [18], a U-Net-based image segmentation approach demonstrated superior performance compared to traditional methods in segmenting lung, heart, and clavicle structures in chest radiographs. U-Net effectively manages the spatial complexities of medical images by employing a contracting path to capture contextual information and a symmetric expanding path to enable precise localization. The U-Net model predicted the lung regions, producing segmentation masks. These masks were then resized to match the original dimensions of CXRs, using nearest-neighbor interpolation, to preserve the aspect ratio of the lung regions. The x-ray images were cropped using the two largest contours identified. The segmentation step ensured that subsequent analyses focused solely on the relevant lung regions, effectively excluding any extraneous artifacts, text, or markers present in the original images. The cropped images were then resized to 225x225 pixels.
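The cropping step described above can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: it assumes the U-Net output is already available as a binary mask, uses 4-connected components as a stand-in for the contours mentioned in the text, and all function names (`resize_nearest`, `connected_components`, `lung_bounding_box`, `crop`) are hypothetical.

```python
from collections import deque


def resize_nearest(mask, out_h, out_w):
    """Resize a 2-D binary mask (list of lists) with nearest-neighbour sampling."""
    in_h, in_w = len(mask), len(mask[0])
    return [[mask[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def connected_components(mask):
    """Return 4-connected foreground components as lists of (row, col) pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            components.append(pixels)
    return components


def lung_bounding_box(mask):
    """Inclusive (top, left, bottom, right) box around the two largest components."""
    largest = sorted(connected_components(mask), key=len, reverse=True)[:2]
    if not largest:
        raise ValueError("mask contains no foreground pixels")
    rows = [r for comp in largest for r, _ in comp]
    cols = [c for comp in largest for _, c in comp]
    return min(rows), min(cols), max(rows), max(cols)


def crop(image, box):
    """Crop a 2-D image (list of lists) to an inclusive bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy low-resolution mask: two "lungs" plus a small spurious blob (bottom-left).
small_mask = [
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
full_mask = resize_nearest(small_mask, 8, 12)  # back to the "original" size
box = lung_bounding_box(full_mask)             # the small blob is ignored
```

Keeping only the two largest regions is what discards small spurious detections such as text or markers; a single-lung or badly misplaced box is the kind of failure the manual review in the next step is meant to catch.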

Next, we conducted a manual review to identify and exclude images exhibiting poor segmentation outcomes, such as those depicting only one lung or misidentified non-lung regions. This review led to the removal of 516 normal and 218 tuberculosis images that did not meet quality standards. The final dataset consisted of $1,923$ tuberculosis-positive and $2,984$ normal images. The dataset was subsequently partitioned into training and test subsets, with 10% of the images reserved for testing and evaluating the model’s performance, leaving the remaining 90% for training purposes. Upon partitioning the dataset, the training subset was further divided into four portions. Specifically, 10% of the training data was designated as labeled data, while the remaining 90% was designated as unlabeled data and evenly distributed among three subsets. Figure 1 depicts sample normal and tuberculosis images, original and segmented, in our dataset.
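The partitioning scheme can be sketched as follows. This is a minimal illustration under stated assumptions: the source does not say whether the split was stratified by class or how fractions were rounded, so a simple shuffled split with `round` is assumed here, and the function name `partition` and the seed are hypothetical.

```python
import random


def partition(items, test_frac=0.10, labeled_frac=0.10, n_unlabeled=3, seed=0):
    """Split items into test, labeled, and several unlabeled subsets.

    Assumption: a plain shuffled split; the paper's exact procedure is not given.
    """
    items = list(items)
    random.Random(seed).shuffle(items)

    n_test = round(len(items) * test_frac)
    test, train = items[:n_test], items[n_test:]

    n_labeled = round(len(train) * labeled_frac)
    labeled, unlabeled = train[:n_labeled], train[n_labeled:]

    # Deal the unlabeled pool into near-equal subsets, one per training run.
    subsets = [unlabeled[i::n_unlabeled] for i in range(n_unlabeled)]
    return test, labeled, subsets


# 1,923 TB-positive + 2,984 normal images = 4,907 image identifiers.
test, labeled, subsets = partition(range(1923 + 2984))
sizes = (len(test), len(labeled), [len(s) for s in subsets])
```

Under these assumptions the sketch yields 491 test images, 442 labeled training images, and unlabeled subsets of 1,325, 1,325, and 1,324 images; the authors' actual subset sizes may differ.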

Figure 1: Sample normal and tuberculosis images, original and segmented. (a) Normal case, original.

2.2 Network Architecture

Our tuberculosis screening network adopts the distillation for self-supervision and self-train learning (DISTL) framework [13]. DISTL draws inspiration from the learning process of radiologists, enhancing the performance of vision transformers by incorporating self-supervision and self-training techniques through knowledge distillation simultaneously [13]. In the self-training approach [19], a learner (referred to as the teacher), initially trained with limited labeled data, continues to label a large pool of unlabeled data, creating pseudo-labels. These pseudo-labels are then utilized to train a new model (referred to as the student) with an expanded dataset. This teacher-student learning framework is commonly known as knowledge distillation [13].

In our self-supervised self-train learning framework, both the teacher and student models utilize the ViT small model, as the backbone of the network, with the student model benefiting from a drop path for improved regularization. The ViT small model [13] that we employed was pre-trained on the CheXpert dataset [20], tailored for CXR analysis. Specifically, the model was pre-trained on five radiological categories: lung opacity, consolidation, edema, pneumonia, and pleural effusion. These categories were used for identifying various manifestations of infectious diseases. Through training on these specific categories and radiological markers, the model has been finely tuned to improve its performance and adaptability, effectively managing a diverse range of patient conditions and varying imaging settings.

Images are partitioned into $8\times 8$ pixel patches, converting each patch into a $384$ -dimensional embedding vector to capture local features within chest x-rays. This high-dimensional embedding is essential for capturing complex patterns indicative of pathological changes. Additionally, the network consists of 12 transformer layers with 6 attention heads each, augmenting its ability to detect dependencies among different regions of the image. Layer normalization ( $\epsilon=1\text{e-}6$ ) was employed to ensure stability. Subsequently, the models were encapsulated within a multi-crop wrapper (MCW) to accommodate inputs of varying resolutions.
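The backbone hyperparameters listed above can be collected in a small configuration object, which also makes the derived quantities explicit. This is an illustrative sketch only: the class name `ViTSmallConfig` is hypothetical, and flooring the patch count for image sizes that are not a multiple of the patch size is an assumption about how a strided patch embedding would behave, not something stated in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTSmallConfig:
    """Hyperparameters quoted in the text for the ViT small backbone."""
    patch_size: int = 8          # 8 x 8 pixel patches
    embed_dim: int = 384         # embedding dimension per patch
    depth: int = 12              # transformer layers
    num_heads: int = 6           # attention heads per layer
    layer_norm_eps: float = 1e-6

    @property
    def head_dim(self) -> int:
        """Channels handled by each attention head."""
        return self.embed_dim // self.num_heads

    def num_patches(self, image_size: int) -> int:
        """Patch tokens for a square input (assumption: remainder pixels are dropped)."""
        per_side = image_size // self.patch_size
        return per_side * per_side


config = ViTSmallConfig()
tokens_224 = config.num_patches(224)  # 28 patches per side
```

With these values each head operates on 64 channels, and a 224-pixel square input produces 784 patch tokens. Smaller local crops produce fewer tokens, which is why a wrapper that accepts varying resolutions is needed.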

The network incorporates two distinct multilayer perceptron (MLP) heads: 1) The self-distillation with no labels (DINO) head [21]: Integrated as a feature generation head within the DISTL framework for self-supervised learning, the DINO head utilizes batch normalization and the Gaussian error linear unit (GELU) activation to produce normalized feature vectors. These features support contrastive loss calculations, promoting robust, label-independent learning by enhancing feature discriminability and stability across different perspectives. 2) The classifier (CLS) head: It is attached directly to the ViT model and is tasked with binary classification of the processed images, determining the presence or absence of tuberculosis. The classification module comprises sequential linear layers followed by rectified linear unit (ReLU) activations, mapping the dense transformer outputs to the final classification task. The high-level conceptual flow of the framework is depicted in Figure 2.

2.3 Network Training and Performance Evaluation

Feature extraction was conducted using the DINO head, and tuberculosis detection was performed using the CLS head. We adapted the weights from the CheXpert pre-trained model and made minor adjustments to ensure compatibility with our training environment. We applied various data augmentation techniques to the original dataset, including random resizing, cropping, color jittering, rotation, auto contrast, equalization, and blurring, to prepare it for robust training across diverse conditions. The student model learned from labeled data by minimizing the binary cross entropy (BCE) with logits loss (BCEWithLogitsLoss) between its predictions and the labels derived from the transformed images. The loss function measured the error between the student’s predictions and the actual labels, directing backpropagation and parameter adjustments through the AdamW optimizer, which dynamically adjusts the weights based on cosine annealing schedules for both the learning rate and weight decay. Mixed precision training was utilized to boost both performance and efficiency.
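For readers unfamiliar with the two ingredients named above, the following sketch writes out the per-sample binary cross entropy on a raw logit (the quantity that PyTorch's `BCEWithLogitsLoss` averages over a batch) and a cosine annealing schedule. The function names and the start and end values used in the example are hypothetical; the source does not report the learning rate or weight decay ranges.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable BCE on a raw logit x and target y in [0, 1].

    Equivalent to -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))], rewritten as
    max(x, 0) - x*y + log(1 + exp(-|x|)) to avoid overflow for large |x|.
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def cosine_schedule(step: int, total_steps: int, start: float, end: float) -> float:
    """Cosine interpolation from `start` (step 0) to `end` (last step)."""
    progress = step / max(1, total_steps - 1)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))


# An undecided prediction (logit 0, probability 0.5) costs log(2) per sample.
undecided = bce_with_logits(0.0, 1.0)

# Hypothetical ranges, for illustration only.
learning_rates = [cosine_schedule(s, 100, 1e-4, 1e-6) for s in range(100)]
weight_decays = [cosine_schedule(s, 100, 0.04, 0.4) for s in range(100)]
```

The same schedule function serves both quantities; only the direction differs, since a schedule may anneal the learning rate downwards while moving weight decay in the opposite direction.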

For unlabeled training, both teacher and student models were initialized with matching weights extracted from the saved state dictionary of the student model, ensuring structural uniformity. This method guarantees that both models begin with a resilient, pre-trained base encompassing learned patterns specific to tuberculosis. A three-tiered augmentation strategy was employed comprising two global and one local augmentation methods. This configuration aimed to offer diverse perspectives of the input images, fostering robust feature learning during the self-supervised phase of training. The global part ranged from minimal to extensive augmentation (e.g., rotation, auto contrast) to emulate various viewing conditions and imaging variances observed in clinical scenarios. Conversely, the local part included intensive augmentation to capture intricate details, which are crucial for discerning subtle pathological characteristics in CXR images.

There were a total of three runs utilizing the unlabeled training data portions, with each new run incorporating an additional subset of unlabeled data. As training progressed through these subsequent runs, the student model was continually initialized from its updated state dictionary. In contrast, the teacher model was loaded from its own state dictionary, potentially containing more generalized and stable knowledge accumulated over multiple iterations. This enabled the teacher model to serve as a consistent and comprehensive guide, assisting the student model in stabilizing its learning amidst the increasing complexity of the data (Figure 2). In other words, with each subsequent training run, a larger portion of the unlabeled dataset was utilized. This gradual expansion of data exposure ensured that the models were not overwhelmed prematurely and allowed them to learn from intricate, unlabeled inputs, thereby enhancing their performance.

The training process incorporated two primary loss functions: 1) DINOLoss: This loss function plays a crucial role in the self-supervised learning aspect of the DISTL framework. In this setup, the teacher model processes only the global views of the images, which are expected to capture more general features, while the student model receives both global and local views (see Figure 2). This design encourages the student to learn from a richer, more varied context. DINOLoss is designed to minimize the difference in responses between the teacher and the student on the global views, thus encouraging the student model to internalize and replicate the semantic features perceived in the global crops. This mechanism aims to enhance the student’s ability to generalize from broad visual cues without relying directly on labeled data. 2) BCEWithLogitsLoss: It was employed for the self-training part of the training process. This loss function was used to align the student’s classifications (computed from both global and local views) with the teacher’s predictions (derived from the global views). This loss was calculated by comparing the transformed outputs from the teacher with the corresponding outputs from the student, adjusting the student’s understanding to be more in line with the teacher’s perspective. This alignment helps refine the student’s predictive capabilities on the task at hand. Both losses were weighted and combined to achieve a balance between self-supervised and supervised learning, enabling flexible adjustment of learning priorities throughout training epochs.
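A compact way to see how the two losses interact is to write them out on plain Python lists. The sketch below follows the DINO formulation of [21] (cross-entropy between a centered, sharpened teacher distribution and the student distribution, averaged over view pairs). It is not the authors' code: the temperatures, the loss weights, the function names, and the convention that the student's global views come first in `student_views` are all assumptions made for illustration.

```python
import math


def softmax(xs, temperature):
    """Temperature-scaled softmax, shifted by the maximum for stability."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(xs, temperature):
    """Temperature-scaled log-softmax via the log-sum-exp trick."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def dino_loss(teacher_views, student_views, center,
              teacher_temp=0.04, student_temp=0.1):
    """Mean cross-entropy H(teacher, student) over all cross-view pairs.

    teacher_views: head outputs for the global views only.
    student_views: head outputs for global views first, then local views.
    Temperatures are illustrative values, not taken from the source.
    """
    total, pairs = 0.0, 0
    for ti, t_out in enumerate(teacher_views):
        centered = [t - c for t, c in zip(t_out, center)]
        p_teacher = softmax(centered, teacher_temp)
        for si, s_out in enumerate(student_views):
            if si == ti:  # skip the pair where both models see the same view
                continue
            log_p_student = log_softmax(s_out, student_temp)
            total -= sum(p * lp for p, lp in zip(p_teacher, log_p_student))
            pairs += 1
    return total / pairs


def combined_loss(dino, bce, w_dino=1.0, w_bce=1.0):
    """Weighted sum of the self-supervised and self-training terms (weights hypothetical)."""
    return w_dino * dino + w_bce * bce


teacher_views = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # two global views
student_views = [[0.0, 0.0, 0.0, 0.0]] * 3                    # two global + one local
loss_value = dino_loss(teacher_views, student_views, center=[0.0] * 4)
```

In this toy case the student outputs a uniform distribution over four dimensions, so every pair contributes log(4) regardless of what the teacher predicts; the loss can only fall below that once the student starts to agree with the teacher.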

Parameters were updated as follows. The student model was updated by backpropagation, using gradients computed from the combined loss. The teacher model was updated through an exponential moving average (EMA) of the student model’s parameters, integrating refined student parameters over time to stabilize learning. In addition, every 500 iterations the student model was checked against the labeled data: BCEWithLogitsLoss was computed between the student’s classification outputs and the ground-truth labels. This periodic supervised adjustment fine-tuned the student model, which was crucial for maintaining accuracy.
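The update rule for the teacher can be illustrated with a few lines of Python operating on dictionaries of parameter lists. This is a sketch, not the authors' implementation: the momentum value and the names `ema_update` and `is_supervised_step` are hypothetical, while the period of 500 iterations is taken from the text.

```python
def ema_update(teacher, student, momentum=0.996):
    """In-place EMA: teacher <- momentum * teacher + (1 - momentum) * student.

    Both arguments map parameter names to lists of floats.
    The momentum value is an illustrative assumption.
    """
    for name, student_values in student.items():
        teacher_values = teacher[name]
        for i, s in enumerate(student_values):
            teacher_values[i] = momentum * teacher_values[i] + (1.0 - momentum) * s


def is_supervised_step(iteration, period=500):
    """True on iterations where the labeled-data correction is applied."""
    return iteration > 0 and iteration % period == 0


teacher = {"w": [1.0, 1.0]}
student = {"w": [0.0, 2.0]}
ema_update(teacher, student, momentum=0.9)  # teacher["w"] moves 10% toward the student
```

Because the teacher receives no gradients and only drifts slowly toward the student, its predictions change smoothly between iterations, which is what makes it usable as a stable source of targets.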

The model performance was evaluated using several metrics including recall, precision, and accuracy scores. These metrics assessed the diagnostic performance of the model in distinguishing TB-positive and -negative cases. Furthermore, we compared the performance of our model against four fine-tuned convolutional neural network (CNN)-based baseline models, namely a custom vanilla CNN, VGG16, ResNet18, and ResNet50. All experiments and evaluations were carried out within an environment with PyTorch version 2.2.1+cu118, Pillow version 9.0.1, OpenCV-Python version 4.9.0.80, and Scikit-Learn version 1.0.2. These components were operated on a system equipped with CUDA 11.1 and an NVIDIA RTX 4090 GPU, running Python version 3.8.0.
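The metrics are standard functions of the confusion matrix, as the short sketch below shows. The counts in the example are hypothetical: they were chosen here for illustration because they round to the TB-class precision, recall, and overall accuracy reported for the proposed model, but the source does not report a confusion matrix or the exact test-set size. The function name `binary_metrics` is likewise an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 (positive class) and overall accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts with TB as the positive class (not reported in the source).
tb = binary_metrics(tp=179, fp=1, fn=8, tn=297)
as_percent = {name: round(100 * value, 2) for name, value in tb.items()}
# as_percent == {"precision": 99.44, "recall": 95.72, "f1": 97.55, "accuracy": 98.14}
```

For screening, recall on the TB class is the figure to watch, because each false negative is a missed case, whereas a false positive usually leads only to a confirmatory test.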

2.4 Explainability Analysis

DISTL provides a simpler localization of lesions through the model’s attention mechanism. We adopted the methodology outlined in [13] to evaluate whether the network effectively identifies disease-related patterns or indicators in the CXR images. It is argued that ViT achieves superior localization compared to the CNN due to its direct attention mechanism, as opposed to the CNN’s indirect attention, e.g., through Gradient-weighted Class Activation Mapping (GradCAM) [13]. Hence, we assessed the localization performance with model attention. The predictions generated by model attention were derived by applying threshold values post-normalization to localize the target lesions. Given the ViT model’s multiple heads available for visualization, the best-performing head was chosen for evaluation and visualization purposes.
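The thresholding step can be sketched as follows, assuming the attention of one head has already been reshaped into a 2-D map over image patches. The min-max normalization, the threshold of 0.5, and the function name `attention_to_mask` are assumptions for illustration; the source states only that thresholds were applied after normalization.

```python
def attention_to_mask(attention, threshold=0.5):
    """Min-max normalize a 2-D attention map and binarize it.

    Returns a map of 0/1 values marking patches whose normalized
    attention is at or above `threshold` (an illustrative value).
    """
    values = [v for row in attention for v in row]
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:  # flat map: nothing stands out
        return [[0] * len(row) for row in attention]
    return [[1 if (v - lowest) / span >= threshold else 0 for v in row]
            for row in attention]


head_attention = [
    [0.1, 0.2],
    [0.4, 0.9],
]
lesion_mask = attention_to_mask(head_attention)  # [[0, 0], [0, 1]]
```

Repeating this for each attention head and comparing the resulting masks with reference annotations is one way a best-performing head could be selected.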

3 Results

We assessed the effectiveness of our network in detecting TB cases from CXR images through two approaches. First, we evaluated the network’s quantitative performance and compared it against various baseline models. Subsequently, we examined its decision-making process by conducting an explainability performance validation. Further analysis of the results will be elaborated upon in this section.

3.1 Performance Analysis

Table 1 compares our self-supervised self-trained network with four baseline models. All five models were trained and tested on identical datasets. As expected, the vanilla CNN model, a straightforward architecture comprising two convolutional layers and a fully connected layer, exhibits the lowest recall for TB detection (86.63%). The VGG and ResNet architectures improve on the vanilla CNN: VGG16 achieves an accuracy of 96.07%, whereas ResNet18 and ResNet50 attain 96.49% and 97.31%, respectively. Accuracy and F1 score rise steadily across these models, with ResNet50 surpassing the other baselines in TB recall and F1 score. Our model, employing image segmentation and self-attention techniques, matches or outperforms every baseline across all classes and performance metrics; the only tie is normal-class recall, where ResNet18 also reaches 99.66%. It achieves an overall accuracy of 98.14% and demonstrates high precision and recall in both normal and tuberculosis classes. The process of distilling knowledge through self-supervised learning and self-training, despite the lack of lesion-specific information, is argued to foster a robust correlation between attention and the lesion, which can potentially enhance the model’s diagnostic accuracy [13].

Table 1: Model performance comparison.

Model	Class	Precision	Recall	F1 score	Accuracy
Vanilla CNN	N	92.11%	98.32%	95.11%	93.80%
Vanilla CNN	TB	97.01%	86.63%	91.53%	93.80%
VGG16	N	94.84%	98.99%	96.87%	96.07%
VGG16	TB	98.28%	91.44%	94.74%	96.07%
ResNet18	N	94.87%	99.66%	97.21%	96.49%
ResNet18	TB	99.42%	91.44%	95.26%	96.49%
ResNet50	N	96.71%	98.99%	97.84%	97.31%
ResNet50	TB	98.33%	94.65%	96.46%	97.31%
Our model	N	97.37%	99.66%	98.50%	98.14%
Our model	TB	99.44%	95.72%	97.55%	98.14%
^aN: Normal, TB: Tuberculosis.
^bHighest values of performance metrics for normal and tuberculosis classes are highlighted in bold green and red, respectively.

3.2 Explainability Analysis

Radiologists rely on several critical indicators to detect tuberculosis in CXR images. Specific abnormalities observed on chest x-rays, such as upper lobe infiltrates or consolidation, cavity formation, rounded densities in lung parenchyma, pleural effusion, and bilateral hilar lymphadenopathy, strongly indicate active tuberculosis [22]. These indicators, along with clinical history and other diagnostic tests, aid radiologists in the accurate detection and diagnosis of tuberculosis from CXR images.

In addition to the quantitative performance evaluation, a thorough explainability analysis was carried out on the proposed network. Figure 3 illustrates two samples of TB-positive patient cases, highlighting the critical factors identified. It is evident that the model primarily relies on clinically relevant areas of the lung in the CXR images to guide its decision-making process.

TB case 1: As depicted in Figure 3(b), the heatmap highlights the superior lobe in the left lung and the middle and inferior lobes in the right lung. Abnormal densities in the lung fields, such as infiltrates or consolidations, are also visible and identified. These are consistent with observations in clinical studies [23] and underscore the network’s ability to recognize patterns indicative of tuberculosis.

TB case 2: As seen in Figure 3(d), the heatmap demonstrates a pronounced emphasis on the superior lobe in the left lung, indicating a possible consolidation or cavitation, which are radiographic indicators of tuberculosis [23]. This attention pattern reaffirms the model’s concordance with established radiological knowledge, where the presence of cavitary lesions and consolidation typically indicates tuberculosis.

4 Discussion and Conclusion

Many AI models have been proposed in the literature for medical imaging with most of them being heavily dependent on the availability of massive labeled data of high quality [13]. Although vast amounts of medical imaging data are accumulated annually, leveraging this data with common supervised learning approaches is hindered by label scarcity [13]. In this work, we utilized a self-supervised, self-trained deep neural network architecture tailored for tuberculosis case screening and detection from CXR images. Our evaluation encompassed a thorough performance assessment, incorporating an explainability validation to scrutinize and authenticate the decision-making processes of the network. In experimental results, it is evident that our model excels in detecting tuberculosis cases with high performance while also displaying clinically relevant behavior.

As indicated in Table 1, our model attained an overall accuracy of 98.14%. Moreover, it exhibited high recall and precision rates of 95.72% and 99.44% in detecting TB cases and 99.66% and 97.37% in identifying normal cases. Compared to the baseline models, the proposed model matched or exceeded them on every performance metric and class, with ResNet18 tying it only on normal-class recall. This difference in performance can be attributed to the architectural advantages of the self-supervised self-trained model. Unlike the baseline models, which rely on convolutional layers, ours interprets inputs as arrays of patches, employing self-attention mechanisms to capture interdependencies across the image. This approach is especially advantageous in medical imaging, where a holistic grasp of entire images is critical for accurate diagnostics. Furthermore, we leveraged multiple techniques such as multi-crop strategies to enhance the model’s exposure to diverse image presentations during training. Additionally, we conducted a thorough explainability analysis to verify the network’s ability to capture clinically relevant indicators of tuberculosis. As illustrated in the examples presented in Figure 3, the network primarily relies on clinically relevant lung areas in CXR images to inform its decision-making process and avoids relying on erroneous visual indicators and imaging artifacts.

In conclusion, our study presents a promising framework for tuberculosis screening, demonstrating robust performance and reliable decision-making behavior. Our team is continuously working to further validate these findings across diverse patient populations and clinical settings, paving the way for broader adoption and impact in the fight against tuberculosis. We aspire for this work to contribute to the advancement of this field, aiding researchers and clinicians in addressing the global public health crisis effectively. Additionally, we hope for this research to enhance healthcare quality for individuals experiencing poverty and economic hardship, thereby addressing significant resource limitations they encounter.

5 Limitations and Future Work

This study represents an ongoing research effort, with avenues for further exploration and refinement in addressing the limitations outlined in this section. Our main objective in this work was to advance research in combating tuberculosis as a worldwide public health emergency, and it is crucial to emphasize that the proposed network is not yet a fully deployable and production-ready solution. Patient demographics and detailed chest x-ray characteristics were unavailable to us during the study, as the data were acquired from publicly available sources. We employed 1,923 TB-positive and 2,984 normal CXRs, dividing them into small labeled and large unlabeled subsets. Larger datasets could be employed to obtain more conclusive results, suggesting a direction for future research. We conducted an explainability analysis to verify the model’s identification of clinically relevant features. Moving forward, we intend to enhance the evaluation by seeking validation from certified radiologists.

References

[1] World Health Organization. Global tuberculosis report 2023, 2023.
[2] United Nations. Sustainable development goals, 2022.
[3] J. C. Emery, A. S. Richards, K. D. Dale, C. F. McQuaid, R. G. White, J. T. Denholm, and R. M. Houben. Self-clearance of mycobacterium tuberculosis infection: implications for lifetime risk and population at-risk of tuberculosis disease. Proceedings of the Royal Society B, 288:20201635, 2021.
[4] A. Wong, J. R. H. Lee, H. Rahmat-Khah, A. Sabri, A. Alaref, and H. Liu. Tb-net: a tailored, self-attention deep convolutional neural network design for detection of tuberculosis cases from chest x-ray images. Frontiers in Artificial Intelligence, 5:827299, 2022.
[5] E. W. Tiemersma, M. J. van der Werf, M. W. Borgdorff, B. G. Williams, and N. J. Nagelkerke. Natural history of tuberculosis: duration and fatality of untreated pulmonary tuberculosis in hiv negative patients: a systematic review. PloS one, 6:e17601, 2011.
[6] V. Nema. Tuberculosis diagnostics: Challenges and opportunities. Lung India, 29:259–266, 2012.
[7] M. Herrera Diaz, M. Haworth-Brockman, and Y. Keynan. Review of evidence for using chest x-rays for active tuberculosis screening in long-term care in canada. Frontiers in Public Health, 8:497453, 2020.
[8] J. Li, B. H. Yip, C. Leung, W. Chung, K. O. Kwok, E. Y. Chan, E. Yeoh, and P. Chung. Screening for latent and active tuberculosis infection in the elderly at admission to residential care homes: a cost-effectiveness analysis in an intermediate disease burden area. PloS one, 13:e0189531, 2018.
[9] P. Rajpurkar, C. O’Connell, A. Schechter, N. Asnani, J. Li, A. Kiani, R. L. Ball, M. Mendelson, G. Maartens, D. J. van Hoving, and R. Griesel. Chexaid: deep learning assistance for physician diagnosis of tuberculosis using chest x-rays in patients with hiv. NPJ digital medicine, 3:115, 2020.
[10] T. Rahman, A. Khandakar, M. A. Kadir, K. R. Islam, K. F. Islam, R. Mazhar, T. Hamid, M. T. Islam, S. Kashem, Z. B. Mahbub, and M. A. Ayari. Reliable tuberculosis detection using chest x-ray with deep learning, segmentation and visualization. IEEE Access, 8:191586–191601, 2020.
[11] N. Singh and S. Hamde. Tuberculosis detection using shape and texture features of chest x-rays. In Innovations in Electronics and Communication Engineering: Proceedings of the 7th ICIECE 2018, Springer Singapore, pages 43–50, 2019.
[12] S. Sunarti, F. F. Rahman, M. Naufal, M. Risky, K. Febriyanto, and R. Masnina. Artificial intelligence in healthcare: opportunities and risk for future. Gaceta sanitaria, 35:s67–s70, 2021.
[13] S. Park, G. Kim, Y. Oh, J. B. Seo, S. M. Lee, J. H. Kim, S. Moon, J. K. Lim, C. M. Park, and J. C. Ye. Self-evolving vision transformer for chest x-ray diagnosis through knowledge distillation. Nature communications, 13:3848, 2022.
[14] S. Jaeger, S. Candemir, S. Antani, Y. X. Wáng, P. X. Lu, and G. Thoma. Two public chest x-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg, 4:475–477, 2014.
[15] TB Portals Program. Drug resistant tuberculosis x-rays.
[16] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, 18:234–241, 2015.
[17] S. Asgari Taghanaki, K. Abhishek, J. P. Cohen, J. Cohen-Adad, and G. Hamarneh. Deep semantic segmentation of natural and medical images: a review. Artificial Intelligence Review, 54:137–178, 2021.
[18] C. Wang. Segmentation of multiple structures in chest radiographs using multi-task fully convolutional networks. In Image Analysis: 20th Scandinavian Conference (SCIA), 20:282–289, 2017.
[19] Q. Xie, M. T. Luong, E. Hovy, and Q. V. Le. Self-training with noisy student improves imagenet classification. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10687–10698, 2020.
[20] J. Irvin, P. Rajpurkar, M. Ko, Y. Yu, S. Ciurea-Ilcus, C. Chute, H. Marklund, B. Haghgoo, R. Ball, K. Shpanskaya, and J. Seekins. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In Proceedings of the AAAI conference on artificial intelligence, 33:590–597, 2019.
[21] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9650–9660, 2021.
[22] S. M. Kamran, J. Ahmad, T. Ejaz, Y. Jamal, and S. A. Satti. Frequency of active and latent pulmonary tuberculosis in apparently healthy asymptomatic young patients having subtle non-specific x ray chest abnormalities. Pakistan Armed Forces Medical Journal, 70:373–378, 2020.
[23] W. J. Koh, Y. J. Jeong, O. J. Kwon, H. J. Kim, E. H. Cho, W. J. Lew, and K. S. Lee. Chest radiographic findings in primary pulmonary tuberculosis: observations from high school outbreaks. Korean journal of radiology, 11:612, 2010.