2002 09334 PDF
Pneumonia
Xiaowei Xu1, MD; Xiangao Jiang2, MD; Chunlian Ma3, MD; Peng Du4;
Xukun Li4; Shuangzhi Lv5, MD; Liang Yu1, MD; Yanfei Chen1, MD;
Junwei Su1, MD; Guanjing Lang1, MD; Yongtao Li1, MD; Hong Zhao1,
MD; Kaijin Xu1, PhD MD; Lingxiang Ruan5, MD; Wei Wu1, PhD MD
Innovation Center for Diagnosis and Treatment of Infectious Diseases, the First
Correspondence to Wei Wu, PhD, MD
E-mail: 1198042@zju.edu.cn
Key Points
patients from their computed tomography (CT) images and what is the diagnostic
FINDINGS: In this multi-center case study, the overall accuracy of the deep learning
models was 86.7% for three groups: COVID-19, Influenza-A viral pneumonia, and
healthy cases.
Abstract
reaction (RT-PCR) detection of viral RNA from sputum or nasopharyngeal swab has
a relatively low positive rate in the early stage to determine COVID-19 (named by the
imaging of COVID-19 had their own characteristics, which are different from other
doctors call for another early diagnostic criterion for this new type of pneumonia as
soon as possible.
COVID-19 pneumonia from Influenza-A viral pneumonia and healthy cases with
DESIGN: The candidate infection regions were first segmented out using a
3-dimensional deep learning model from pulmonary CT image set. These separated
images were then categorized into COVID-19, Influenza-A viral pneumonia and
using a location-attention classification model. Finally, the infection type and total
confidence score of this CT case were calculated with the Noisy-or Bayesian function.
PARTICIPANTS: A total of 618 CT samples were collected: 219 from 110 patients
with COVID-19, 224 CT samples from 224 patients with Influenza-A viral
pneumonia, and 175 CT samples from healthy people.
RESULTS: The experimental results on the benchmark dataset showed that the overall
CONCLUSIONS: The deep learning models established in this study were effective
1. Introduction
At the end of 2019, the novel coronavirus disease 2019 pneumonia (COVID-19)
occurred in the city of Wuhan, China.1-4 On January 24, 2020, Huang et al.5
the common onset symptoms were fever, cough, myalgia, or fatigue. All these 41
patients had pneumonia and their chest CT examination showed abnormalities. The
complications included acute respiratory distress syndrome, acute heart injury, and
secondary infections. Thirteen (32%) patients were admitted to the intensive care unit
(ICU), and six (15%) died. The Kok-KH6 team at the University of Hong Kong
time.
high ICU admission and mortality. The current clinical experience for treating these
patients revealed that the RT-PCR detection of viral RNA from sputum or
nasopharyngeal swab had a low positive rate in the early stage. However, a high
proportion of abnormal chest CT images were obtained from patients with this disease.
for replacing nucleic acid testing with lung CT as one of the early diagnostic criteria
With the rapid development of computer technology, digital image processing
technology has been widely applied in the medical field, including organ
segmentation and image enhancement and repair, providing support for subsequent
network (CNN) with the strong ability of nonlinear modeling, have extensive
In this study, multiple CNN models were used to classify CT image datasets and
calculate the infection probability of COVID-19. The findings might greatly assist in
2. Method
219 from 110 patients with COVID-19 from the First Affiliated Hospital of Zhejiang
University, the No.6 People’s Hospital of Wenzhou, and the No.1 People’s Hospital of
Wenling, from Jan 19 to Feb 14, 2020. All three hospitals are designated COVID-19
hospitals in Zhejiang Province. Every COVID-19 patient was confirmed with an RT-PCR
testing kit, and cases that had no image manifestations on the
chest CT images were excluded. In addition, there was a gap of at least two days between CT datasets if
taken from the same patient to ensure the diversity of samples. The remaining 399 CT
samples were collected from the First Affiliated Hospital of Zhejiang University as
the control group. Among them, 224 CT samples were from 224
patients with Influenza-A viral pneumonia, including H1N1, H3N2, H5N1, H7N9, etc.,
and 175 CT samples were from healthy people. There were 198 (90.4%) COVID-19 and
196 (86.6%) Influenza-A cases from the early or progressive stages, and the remaining 9.6% and
13.4% of cases, respectively, were from the severe stage (P > 0.05). Moreover, Influenza-A viral
pneumonia CT cases were used because it was most critical to distinguish them from suspected
Zhejiang University approved this study and all research was performed in
A total of 528 CT samples (85.4%) were used for training and validation sets,
including 189 samples of patients with COVID-19, 194 samples from patients with
Influenza-A viral pneumonia, and 145 samples from healthy people. The remaining
Influenza-A viral pneumonia, and 30 healthy cases. Furthermore, the test CT cases
were selected from people who had not been included in the training stage.
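The patient-level separation described here can be illustrated with a short sketch. It is not the authors' code: the `(patient_id, sample_id)` record layout, the helper name `split_by_patient`, and the random choice of held-out patients are all assumptions made for illustration. The paper only states that test cases came from people not included in the training stage.

```python
import random
from collections import defaultdict

def split_by_patient(samples, n_test_patients, seed=0):
    """Split (patient_id, sample_id) records so no patient spans both sets.

    Hypothetical helper: the record layout and random selection are assumptions.
    """
    by_patient = defaultdict(list)
    for patient_id, sample_id in samples:
        by_patient[patient_id].append(sample_id)

    patients = sorted(by_patient)
    rng = random.Random(seed)
    test_patients = set(rng.sample(patients, n_test_patients))

    # Every CT sample follows its patient into exactly one of the two sets.
    train = [(p, s) for p in patients if p not in test_patients for s in by_patient[p]]
    test = [(p, s) for p in patients if p in test_patients for s in by_patient[p]]
    return train, test

# Toy data: patient "A" has two CT samples, the others have one each.
samples = [("A", "ct1"), ("A", "ct2"), ("B", "ct3"), ("C", "ct4"), ("D", "ct5")]
train, test = split_by_patient(samples, n_test_patients=2)
```

Grouping by patient before splitting matters here because some COVID-19 patients contributed more than one CT sample (219 samples from 110 patients), so a sample-level split could leak a patient into both sets.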
(a) (b) (c)
CT image.
2.2 Process
Figure 2 shows the whole process of COVID-19 diagnostic report generation in this
study. First, the CT images were preprocessed to extract effective pulmonary regions.
Second, a 3D CNN model was used to segment multiple candidate image cubes. The
center image, together with the two neighbors of each cube, was collected for further
steps. Third, an image classification model was used to categorize all the image
irrelevant-to-infection. Image patches from the same cube voted for the type and
confidence score of the candidate as a whole. Finally, the overall analysis report for
[Figure 2: process flow diagram. Example confidence labels shown in the figure: COVID-19 83.1%, COVID-19 73.1%, Influenza-A-viral-pneumonia 57.1%; COVID-19 68.4%, Influenza-A-viral-pneumonia 31.6%.]
Method
This study was expedited by using the same method and models in data preprocessing and
structures and types, including miliary, infiltrative, caseous, tuberculoma, and cavitary
etc. Although the VNET20-based segmentation model VNET-IR-RPN17 was trained
for pulmonary tuberculosis, it was verified to be still good enough to separate
was used for both segmentation and classification. Only the segmentation-related
bounding box regression part was preserved, regardless of the classification results,
because only the former task was required at this stage in this study.
2.4 Image data processing and augmentation
A large number of non-infection regions irrelevant to this study were also separated
Influenza-A-viral-pneumonia.
candidate cubes were generated from the 3D segmentation model. Only the territory
close to the middle of this cube contained maximum information about this focus of
infection. Hence, only the center image together with the two neighbors of each cube
was collected to represent this region for future classification steps. Next, all image
patches were manually classified by two professional radiologists into two types:
irrelevant-to-infection and pneumonia. The images in the latter category were
A total of 11,871 image patches were acquired from the aforementioned steps,
validation sets had 528 CT samples, equivalent to 10,161 (85.6%) images, including
irrelevant-to-infection images. The remaining 1,710 (14.4%) images were reserved for
the test dataset.
so as to reduce the influence of the uneven distribution of different image types on the
present dataset. At the same time, generic data expansion mechanisms, such as
performed on the samples to increase the number of training samples and prevent
overfitting.
The work of Kanne21 and Chung et al.22 showed at least three
distribution along with the pleura, and usually more than one independent focus of
The models were optimized based on these findings. The image classification
model was designed to distinguish the appearance and structure of different infections.
Moreover, the relative distance-from-edge was used as an extra weight so that the model could
learn the relative location information of the patch on the pulmonary image. The focus
COVID-19.
1) Measure the minimum distance from the mask to the center of this patch
3) Then the relative distance-from-edge achieved by the distance obtained from step
minimum distance from the mask to the center of this patch (double-headed
pulmonary image.
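The relative distance-from-edge computation can be sketched as follows. Step 2 of the procedure is missing from the extracted text, so the normalizer used here (the diagonal of the lung mask's bounding box) is an assumption, as are the function name `relative_distance_from_edge` and the toy square mask. Only step 1 (the minimum distance from the mask edge to the patch center) and the idea of dividing it by a reference length come from the source.

```python
import numpy as np

def relative_distance_from_edge(lung_mask, center_row, center_col):
    """Distance from a patch center to the nearest lung-mask edge pixel,
    divided by the diagonal of the mask's bounding box.

    The normalizer is an assumption; the extracted text omits step 2.
    """
    mask = np.asarray(lung_mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    # A mask pixel is interior if all four direct neighbors are also mask pixels.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    edge_rows, edge_cols = np.nonzero(mask & ~interior)
    min_dist = np.hypot(edge_rows - center_row, edge_cols - center_col).min()

    rows, cols = np.nonzero(mask)
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    return float(min_dist / np.hypot(height, width))

# Toy example: a 10 x 10 square "lung" inside a 20 x 20 image.
mask = np.zeros((20, 20), dtype=bool)
mask[5:15, 5:15] = True
near_center = relative_distance_from_edge(mask, 10, 10)
on_edge = relative_distance_from_edge(mask, 5, 8)
```

A patch centered on the mask edge gets a value of 0, and the value grows as the patch moves toward the interior of the lung.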
another model was designed based on the first network structure by concatenating the
accuracy rate.
The classical ResNet-18 network structure was used for image feature
extraction. Pooling operations were also used for the dimensional reduction of data to
The output of the convolution layer was flattened to a 256-dimensional feature vector
was first normalized to the same order of magnitude and then concatenated to this
output the final classification result together with the confidence score.
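A minimal NumPy sketch of the classification head may help make the concatenation step concrete. The layer widths (256, 16, 16+1, 8, 3) are read from the labels in Figure 4; the ReLU activations, the random weights, and the function name `location_attention_head` are assumptions for illustration, and the ResNet-18 backbone is replaced by a random stand-in feature vector.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def location_attention_head(features, rel_distance, params):
    """Map a 256-d feature vector plus one location scalar to 3 class scores."""
    h = relu(params["w1"] @ features + params["b1"])   # 256 -> 16
    h = np.concatenate([h, [rel_distance]])            # 16 -> 17 (append location)
    h = relu(params["w2"] @ h + params["b2"])          # 17 -> 8
    logits = params["w3"] @ h + params["b3"]           # 8 -> 3
    return softmax(logits)

rng = np.random.default_rng(0)
params = {  # untrained placeholder weights, for shape checking only
    "w1": rng.normal(size=(16, 256)) * 0.1, "b1": np.zeros(16),
    "w2": rng.normal(size=(8, 17)) * 0.1, "b2": np.zeros(8),
    "w3": rng.normal(size=(3, 8)) * 0.1, "b3": np.zeros(3),
}
features = rng.normal(size=256)  # stand-in for the flattened ResNet-18 output
probs = location_attention_head(features, rel_distance=0.28, params=params)
```

The three outputs correspond to COVID-19, Influenza-A-viral-pneumonia, and irrelevant-to-infection; the largest one would serve as the patch's confidence score.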
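[Figure 4 diagram: a ResNet-18 backbone with feature maps labeled 30×30×32, 15×15×16, and 8×8×4, flattened to 256 features; fully connected layers labeled 16, 16+1 (after appending the relative distance-from-edge), 8, and 3; output classes are COVID-19, Influenza-A-viral-pneumonia, and irrelevant-to-infection.]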
Figure 4. Network structure of the location-attention oriented model.
technology, one candidate region was represented by three image patches: the center
image together with its two neighbors. These three images voted for this whole region
1) If at least two images were categorized into the same type, then the image with the
2) Otherwise, the image with the maximum confidence score was picked (no type
dominated).
Regions voted as irrelevant-to-infection were ignored in the next step.
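The voting rule can be sketched in a few lines. Rule 1 is truncated in the extracted text, so picking the highest-confidence patch within the majority type is an assumed completion; the function name `vote_region` and the example scores are hypothetical.

```python
from collections import Counter

def vote_region(patch_predictions):
    """Pick one (label, confidence) pair to represent a candidate region.

    patch_predictions: list of three (label, confidence) tuples, one per patch.
    """
    counts = Counter(label for label, _ in patch_predictions)
    top_label, top_count = counts.most_common(1)[0]
    if top_count >= 2:
        # Majority type: keep its most confident patch (assumed completion of rule 1).
        candidates = [p for p in patch_predictions if p[0] == top_label]
    else:
        # No type dominates: fall back to the single most confident patch.
        candidates = patch_predictions
    return max(candidates, key=lambda p: p[1])

majority = vote_region([("COVID-19", 0.62), ("COVID-19", 0.81), ("IAVP", 0.90)])
no_majority = vote_region([("COVID-19", 0.55), ("IAVP", 0.70), ("ITI", 0.60)])
```

In the first call the region is labeled COVID-19 with a score of 0.81 even though the single IAVP patch is more confident, because two of the three patches agree.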
One of the remarkable features of COVID-19 is the presence of more than one independent focus of
infection in one CT case. It is reasonable that the overall probability is much larger
than 50% if a patient has two COVID-19 regions, each with a 50% probability.
Accordingly, the total infection confidence score (P) for one infection type was
calculated using the probability formula of the Noisy-or Bayesian function as follows:
$$ P = 1 - \prod_{i} (1 - P_i) \tag{1} $$
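The example in the text (two regions at 50% each) can be checked with a short sketch of Equation (1). The function name `noisy_or` is chosen here for illustration and is not taken from the paper.

```python
import math

def noisy_or(region_probs):
    """Combine per-region probabilities: P = 1 - prod(1 - P_i)."""
    return 1.0 - math.prod(1.0 - p for p in region_probs)

two_regions = noisy_or([0.5, 0.5])  # 1 - 0.5 * 0.5 = 0.75
no_regions = noisy_or([])           # empty product is 1, so P = 0.0
```

Two independent 50% regions combine to 75%, which matches the statement that the overall probability should be well above 50%.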
The confidence scores of the two types, $P_{\text{COVID-19}}$ and $P_{\text{Influenza-A-viral-pneumonia}}$, were
deduced accordingly, and this CT sample was then categorized into the corresponding group
Moreover, the following strategies were used to output the confidence score of an
no-infection-found case
2. If one of the P values was equal to 0, then the other P value was exported
3. Otherwise, the softmax function was used to generate two confidence scores.
$$ S_i = \frac{e^{P_i}}{\sum_{j} e^{P_j}} \tag{2} $$
as the confidence score for each type of infection. The softmax operation normalized
the sum of Si to 100% and did not alter the judgment result of infection types.
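The three output strategies can be sketched as one function. Rule 1 is truncated in the extracted text, so treating "both P values equal 0" as the no-infection-found case is an assumption; the function name `case_confidence` and the abbreviated label `IAVP` are also chosen here for illustration.

```python
import math

def case_confidence(p_covid, p_iavp):
    """Return (label, {type: score}) for one CT case.

    Rule 1 is truncated in the extracted text; treating "both scores are 0"
    as the no-infection-found case is an assumption.
    """
    if p_covid == 0 and p_iavp == 0:
        return "no-infection-found", {}
    if p_iavp == 0:
        return "COVID-19", {"COVID-19": p_covid}  # rule 2: export the other P
    if p_covid == 0:
        return "IAVP", {"IAVP": p_iavp}
    # Rule 3: softmax over the two noisy-or scores, as in Equation (2).
    e_covid, e_iavp = math.exp(p_covid), math.exp(p_iavp)
    total = e_covid + e_iavp
    scores = {"COVID-19": e_covid / total, "IAVP": e_iavp / total}
    return max(scores, key=scores.get), scores

label, scores = case_confidence(0.9, 0.6)
```

Because both inputs lie between 0 and 1, the softmax scores stay within roughly 27% to 73%, and the larger input always keeps the larger score.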
3. Results
An Intel i7-8700k CPU together with NVIDIA GPU GeForce GTX 1080ti was used
as the testing server. The processing time largely depended on the number of image
layers in one CT set. On average, it was less than 30s for a CT set with 70 layers from
3.2 Training process
As one of the most classical loss functions used in classification models, cross-entropy
was used in this study. When the number of training epochs increased to
more than 1000, the loss value no longer decreased or increased noticeably, suggesting that
the models converged well to a relatively optimal state without distinct overfitting. The
training curves of the loss value and the accuracy for two classification models were
better performance on the training dataset compared with the original ResNet.
Figure 5. Training curve of loss and accuracy for the two classification
models.
3.3 Performance on test dataset
Accuracy measures how many of all predictions are correct. Precision
measures how many of the positive predictions are
correct. Recall shows how many of the actual positive cases are discovered. F1-score uses a
following equations show how to calculate these values, where TP, TN, FP and FN are
true positive, true negative, false positive, and false negative respectively.
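$$ \text{accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{3} $$

$$ \text{precision} = \frac{TP}{TP + FP} \tag{4} $$

$$ \text{recall} = \frac{TP}{TP + FN} \tag{5} $$

$$ \text{f1-score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{6} $$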
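Equations (3) to (6) translate directly into code. The sketch below uses made-up counts purely to show the arithmetic; the function name `classification_metrics` is hypothetical, and the code assumes non-zero denominators.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from the four outcome counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts, not taken from the paper.
m = classification_metrics(tp=8, tn=80, fp=2, fn=10)
```

With these counts, accuracy is 0.88 while recall is only about 0.44, which shows why accuracy alone can hide missed positive cases when the classes are imbalanced.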
3.3.2 Segmentation
Influenza-A-viral-pneumonia and healthy) for the test set, following the rule that this
person (owner of this CT) had not been included in the previous training stage.
normal regions could be included in. One CT case from COVID-19 group that had no
infections could barely be noticed by the human eye and seemed too tenuous to be
Figure 6. All CT images from a single CT case. The foci of infection are
A total of 1,710 image patches were acquired from 90 CT samples, including 357
(ground truth). To determine which was the optimal approach, the design of each
methodology was assessed using a confusion matrix. Two network structures were
and Table 2.
                          Predicted result
Actual result    COVID-19 (M1/M2)    IAVP (M1/M2)    ITI (M1/M2)
COVID-19         260/273             47/32           50/52
IAVP             55/46               276/280         59/64
ITI              75/77               81/82           807/804
Table 1. The confusion matrix of COVID-19,
mechanism model.
Table 2. The recall, precision and f1-score of two classification models for
The average f1-scores for the two models were 0.750 and 0.764, which indicated that the
on average. Therefore, this model was used for the rest of this study.
Each image patch voted to represent this entire candidate region. A total of 570
The confusion matrix of the voting result and the corresponding recall, precision, and f1-score
are shown in Table 3 and Table 4. The average f1-score for the three categories was
                          Predicted result
Actual result    COVID-19    IAVP    ITI
COVID-19         97          15      7
IAVP             18          98      14
ITI              5           2       314
Table 4. The recall, precision and f1-score after the voting process for each
irrelevant-to-infection (ITI).
Noisy-or Bayesian function was used to identify the dominant infection types. Three
counted by the Bayesian function, we only compare the average f1-score for the first
two items. They were 0.806 and 0.843, respectively, which showed an improvement of
4.7%. Moreover, the overall classification accuracy for all three groups was 86.7%.
                          Predicted result
Actual result    COVID-19    IAVP    NIF
COVID-19         26          3       1
IAVP             4           25      1
NIF              2           1       27
Table 6. The recall, precision and f1-score from the output of the Bayesian
no-infection-found (NIF).
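The reported averages can be reproduced from the confusion matrices. The sketch below transcribes the counts from the voting-stage matrix (COVID-19, IAVP, ITI) and the case-level matrix (COVID-19, IAVP, NIF), with rows as actual classes and columns as predicted classes; the helper names are hypothetical.

```python
def per_class_f1(matrix):
    """F1 per class from a square confusion matrix (rows = actual, columns = predicted)."""
    scores = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        actual = sum(matrix[k])                    # TP + FN
        predicted = sum(row[k] for row in matrix)  # TP + FP
        scores.append(2 * tp / (actual + predicted))
    return scores

def overall_accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)

voting_matrix = [[97, 15, 7], [18, 98, 14], [5, 2, 314]]  # region level
case_matrix = [[26, 3, 1], [4, 25, 1], [2, 1, 27]]        # CT-case level

# Average F1 over the first two classes (COVID-19 and IAVP) only.
f1_voting = sum(per_class_f1(voting_matrix)[:2]) / 2
f1_case = sum(per_class_f1(case_matrix)[:2]) / 2
accuracy = overall_accuracy(case_matrix)
```

Rounded to three decimals, these give 0.806, 0.843, and 0.867, in agreement with the averages and the 86.7% overall accuracy stated in the text (78 of 90 test cases on the diagonal).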
4. Discussion
COVID-19, which was first detected in Wuhan, China, has caused serious public
health safety problems and hence has become a global concern.25-27 The severe situation
puts forward new requirements for the prevention and control strategy. A large
number of patients with viral pneumonia had been detected in Wuhan city. The
RT-PCR test of 2019-nCoV RNA can make a definite diagnosis of COVID-19 from
Influenza-A viral pneumonia patients. However, nucleic acid testing has some
defects, such as time lag, a relatively low detection rate, and short supply. In the
early stage of COVID-19, some patients may already have positive pulmonary
imaging findings but have no sputum and test negative on RT-PCR of nasopharyngeal
swabs. These patients are not diagnosed as suspected or confirmed cases.
Thus, they are not isolated or treated in time, making them potential
sources of infection.
the "halo sign" of surrounding ground glass shadow in both lungs, mesh shadows and
bronchiectasis and inflating signs inside the lesions, and multiple consolidation of
different sizes and grid-shaped high-density shadows. However, it is not objective and
accurate to distinguish COVID-19 from other diseases only with human eyes. In
comparison, deep learning-based screening models revealed more specific and
reliable results by digitizing and standardizing the image information. Hence, they can
assist physicians in making quick clinical decisions more accurately, which would
In this study, deep learning technology was used to design a classification
terms of the network structure, the classical ResNet was used for feature extraction. It
was compared with the network model with and without the added location-attention
mechanism. The experiment showed that the aforementioned mechanism could better
to combine the patients’ contact history, travel history, first symptoms and laboratory
examination. In this study, the number of model samples was limited. Hence, the
number of training and test samples should be expanded to improve the accuracy in
the future. More multi-center clinical studies should be conducted to cope with the
classification model. A better exclusive model should be designed for training, the
segmentation and classification accuracy of the model should be improved, and the
generalization performance of this algorithm should be verified with a larger data set.
5. Conclusion
In this multi-center case study, we presented a novel method that could screen
radiography with an overall accuracy rate of 86.7% and could be a promising
References
10.1056/NEJMoa2001017.
10.1056/NEJMoa2001316.
10.2807/1560-7917.ES.2020.25.3.2000045.
S0140-6736(20)30183-5. doi:10.1016/S0140-6736(20)30183-5.
6. Chan JF, Yuan S, Kok KH, et al. A familial cluster of pneumonia associated with
10.1016/S0140-6736(20)30154-9.
7. Liu X, Guo S, Yang B, et al. Automatic Organ Segmentation for CT Scans Based
2018, 31(6).
8. Gharbi M, Chen J, Barron JT, et al. Deep Bilateral Learning for Real-Time
32(8).
12. Zhu W, Huang Y, Zeng L, et al. AnatomyNet: Deep learning for fast and fully
13. Huang P, Park S, Yan R, et al. Added Value of Computer-aided CT Image Features
for Early Lung Cancer Diagnosis with Small Pulmonary Nodules: A Matched
10.1148/radiol.2017162725.
14. Ardila D, Kiraly AP, Bharadwaj S, et al. End-to-end lung cancer screening with
cancer with deep neural networks. Nature. 2017 Feb 2;542(7639):115-118. doi:
arXiv:1910.02285, 2019.
2016.
Infections from Wuhan, China: Key Points for the Radiologist. Radiology. 2020.
23. He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition.
2015.
25. Wang C, Horby PW, Hayden FG, et al. A novel coronavirus outbreak of global
doi:10.1016/S0140-6736(20)30185-9.
26. Holshue ML, DeBolt C, Lindquist S, et al. First Case of 2019 Novel Coronavirus
10.1056/NEJMoa2001191.
doi:10.1016/S0140-6736(20)30211-7.
Acknowledgments
This study was supported by the China National Science and Technology Major
Competing interests
Author contributions
Wei Wu and Xiaowei Xu initiated the project and provided clinical expertise and
guidance on the study design. Xukun Li and Peng Du designed the network
analysis. Xiaowei Xu, Wei Wu, Xukun Li and Peng Du wrote the manuscript.
Xiangao Jiang, Chunlian Ma, Shuangzhi Lv, Liang Yu, Yanfei Chen, Junwei Su,
Guanjing Lang, Yongtao Li, Hong Zhao, Kaijin Xu and Lingxiang Ruan collected the
datasets and interpreted the data. Xiaowei Xu, Xiangao Jiang, Chunlian Ma and