Accuracy and Efficiency of Deep-Learning-Based Automation of Dual Stain Cytology in Cervical Cancer Screening

JNCI J Natl Cancer Inst (2021) 113(1): djaa066
doi: 10.1093/jnci/djaa066
First published online June 25, 2020
Article
Nicolas Wentzensen, MD (1,*,‡), Bernd Lahrmann, PhD (2,‡), Megan A. Clarke, PhD (1), Walter Kinney, MD (3),
Diane Tokugawa, MD (4), Nancy Poitras, BS (4), Alex Locke, MD (4), Liam Bartels, BS (5,6), Alexandra Krauthoff, BS (5,6), Joan Walker, MD (7), Rosemary Zuna, MD (7), Kiranjit K. Grewal, MS (4), Patricia E. Goldhoff, MD (4), Julie D. Kingery, MD (4), Philip E. Castle, PhD (8), Mark Schiffman, MD (1), Thomas S. Lorey, MD (4), Niels Grabe, PhD (2,5,6)
Affiliations of authors: (1) Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA; (2) Steinbeis Transfer Center for Medical Systems Biology, Heidelberg, Germany; (3) Global Coalition Against Cervical Cancer, Arlington, VA, USA; (4) Kaiser Permanente TPMG Regional Laboratory, Berkeley, CA, USA; (5) Hamamatsu Tissue Imaging and Analysis Center (TIGA), BIOQUANT, University Heidelberg, Heidelberg, Germany; (6) National Center of Tumor Diseases, Medical Oncology, University Hospital Heidelberg, Heidelberg, Germany; (7) University of Oklahoma, Oklahoma City, OK, USA; and (8) Albert Einstein College of Medicine, Bronx, NY, USA
‡Authors contributed equally to this work.
*Correspondence to: Nicolas Wentzensen, MD, PhD, MS, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Room 6E-448, Bethesda, MD 20892-9774, USA (e-mail: wentzenn@mail.nih.gov).
Abstract
Background: With the advent of primary human papillomavirus testing followed by cytology for cervical cancer screening,
visual interpretation of cytology slides remains the last subjective analysis step and suffers from low sensitivity and
reproducibility. Methods: We developed a cloud-based whole-slide imaging platform with a deep-learning classifier for p16/
Ki-67 dual-stained (DS) slides trained on biopsy-based gold standards. We compared it with conventional Pap and manual DS
in 3 epidemiological studies of cervical and anal precancers from Kaiser Permanente Northern California and the University
of Oklahoma comprising 4253 patients. All statistical tests were 2-sided. Results: In independent validation at Kaiser
Permanente Northern California, artificial intelligence (AI)-based DS had lower positivity than cytology (P < .001) and manual
DS (P < .001) with equal sensitivity and substantially higher specificity compared with both Pap (P < .001) and manual DS
(P < .001), respectively. Compared with Pap, AI-based DS reduced referral to colposcopy by one-third (41.9% vs 60.1%, P < .001).
At a higher cutoff, AI-based DS had similar performance to high-grade squamous intraepithelial lesions cytology, indicating a
risk high enough to allow for immediate treatment. The classifier was robust, showing comparable performance in 2 cytology
systems and in anal cytology. Conclusions: Automated DS evaluation removes the remaining subjective component from
cervical cancer screening and delivers consistent quality for providers and patients. Moving from Pap to automated DS
substantially reduces the number of colposcopies and also achieves excellent performance in a simulated fully vaccinated
population. Through cloud-based implementation, this approach is globally accessible. Our results demonstrate that AI not
only provides automation and objectivity but also delivers a substantial benefit for women by reduction of unnecessary
colposcopies.
Received: November 5, 2019; Revised: March 18, 2020; Accepted: April 30, 2020
© The Author(s) 2020. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Advances in digital imaging and machine learning can revolutionize cancer screening, diagnosis, and treatment by improving accuracy and reproducibility of image assessment and streamlining clinical workflow (1–4). With its requirement for high throughput and fast turnaround and its dependence on microscopic and visual technologies, automation can play a major role in improving the efficiency of cervical cancer screening. Many countries are currently switching from Pap cytology to high-risk human papillomavirus (HPV) screening (5–7). Although a negative HPV test provides great reassurance of low cervical cancer risk over the next decade (8–10), only a small subset of women with a positive HPV test requires further evaluation. To avoid overburdening the system with HPV-positive women, additional triage is required for colposcopy referral (11, 12). Current triage strategies include partial HPV genotyping and Pap cytology (7, 13). The limited sensitivity and
reproducibility of cytology require laborious quality control procedures and frequent retesting (14, 15). Improving the efficiency of cervical cancer screening is particularly important for vaccinated populations due to lower disease prevalence and higher demands for screening test performance.

A promising triage strategy is concomitant detection of p16 and Ki-67 in the same cell (p16/Ki-67 dual stain [DS]), 2 markers that are closely linked to cervical carcinogenesis and HPV oncoprotein actions. The HPV oncoprotein E7 interrupts cell cycle control by releasing E2F, activating p16 expression. The coexpression of p16 and Ki-67, a cell proliferation marker, in the same cell is specific to HPV-related carcinogenesis. DS has shown greater accuracy for detection of HPV-related precancers compared with cytology (16–21). Currently, artificial intelligence (AI) algorithms mostly try to match manual reading accuracy to improve automation but do not offer a substantial improvement for patients. Automated scanning and deep-learning evaluation of DS slides can improve throughput, reproducibility, and accuracy of the assay for better risk stratification and a direct benefit to women (8, 22, 23). To achieve this, we developed the CYTOREADER system, which combines whole-slide scanning with automated evaluation of DS cytology slides. Cloud-based evaluation provides ample computational capacity and storage space and can provide diagnostic procedures where sufficient personnel, expertise, or infrastructure is lacking. We evaluated the clinical performance of CYTOREADER in 4253 slides from 3 epidemiological studies of HPV-positive cervical and anal precancers.

Methods

General Approach

CYTOREADER uses whole-slide scanners (Hamamatsu Nanozoomers HT, XR, and S360) for imaging of ThinPrep (Hologic) or SurePath (Becton Dickinson, BD) slides, 2 widely used liquid-based cytology technologies. CYTOREADER is a cloud-based system (Google Cloud Platform) that can also run as a local installation. Deep-learning algorithms for automated DS evaluation were trained on small areas (tiles) of whole slides, each containing a single epithelial cell or a small number of epithelial cells. For training, tiles from training slides were manually evaluated for DS-positive cells by 3 observers (Supplementary Figure 1, available online).

Deep Learning

Two deep-learning approaches (a convolutional neural network with 4 layers [CNN4] and Inception-v3 with 48 layers [IncV3]) were developed sequentially as shown in Figure 1 and described in Supplementary Methods (available online). The algorithms determine the number of DS-positive cells on a slide by counting the tiles whose likelihood exceeds a threshold, and a slide is considered positive if the number of DS-positive cells on the slide reaches a prespecified cutoff. Training and validation were conducted on both the tile level and the slide level. First, a training set from 450 patients was selected for which the clinical endpoint, cervical intraepithelial neoplasia grade 3 or greater (CIN3+), was unblinded. Tiles were split for initial training (80%) and validation (20%) of the algorithm. The deep-learning network outputs a likelihood for each tile, and a tile is considered positive above a likelihood threshold (0.5 for CNN4 and 0.4 for IncV3). The resulting candidate CNN was then applied at the slide level to the training slides, using a cutoff of positive tiles to determine slide positivity (3 positive tiles for CNN4 and 2 for IncV3). From misclassified slides, false-positive and false-negative tiles were extracted and fed back into the original CNN training to optimize classification accuracy. A final locked CNN was applied at the patient level to the blinded validation set comprising 3803 slides. CNN4 performed well on ThinPrep slides but not on SurePath slides; a second algorithm (IncV3) was therefore trained specifically for SurePath slides (Supplementary Methods, available online). We published a GitHub repository and created a web page at https://github.com/stcmedhub/dual_stain_dl with a source code description of the models and installation instructions.

Study Populations

The Biopsy Study is a population-based study of women aged 18 years or older referred to colposcopy at the University of Oklahoma Health Sciences Center between 2009 and 2011 (24). We included DS slides from 602 women as previously described (19). The study population was split into a representative training set (193 slides with 741 DS-positive and 953 DS-negative tiles) and a validation set of 409 slides (Figure 1). This study was approved by the University of Oklahoma and National Cancer Institute (NCI) institutional review boards (IRBs); written informed consent was obtained from all participants before study enrollment.

The Anal Cancer Screening Study (ACSS) was based at the San Francisco Kaiser Permanente Northern California (KPNC) Anal Cancer Screening Clinic. HIV-positive men who have sex with men, aged 18 years or older, were enrolled at KPNC between 2009 and 2010. DS slides from 318 men were generated as previously described (25). From 19 training slides, 445 DS-positive and 532 DS-negative tiles were used for training (Figure 1). This study was approved by the KPNC and NCI IRBs; written informed consent was obtained from all participants before study enrollment.

At KPNC, DS was evaluated for triage of HPV-positive women between 2012 and 2015 in a population of women aged 25 years and older who were undergoing routine cervical cancer screening (16). From a screening population of more than 300 000 women in a year, 3333 slides from HPV-positive women were included. From 238 training slides, 8215 DS-positive and 9739 DS-negative tiles were used for training (Figure 1). The study was approved by the KPNC IRB and was exempted from institutional review at the NCI by the Office of Human Subjects Research. Patient consent was waived because deidentified discarded specimens were used in this study.

Clinical Endpoints

All studies followed routine clinical practice at the respective institutions. Cytology was classified by the Bethesda System: negative for intraepithelial lesions or malignancy, atypical squamous cells of undetermined significance, low-grade squamous intraepithelial lesions, and high-grade squamous intraepithelial lesions (HSIL) (26). Final diagnosis was established by histopathology classified according to the cervical intraepithelial neoplasia (CIN) scale for cervical endpoints, which indicates the extent of dysplastic cells in the cervical epithelium: no indication for biopsy, normal, CIN grade 1 (CIN1), grade 2 (CIN2), grade 3 (CIN3), and cancer. We grouped adenocarcinoma in situ
with CIN3. For anal disease endpoints, the comparable anal intraepithelial neoplasia (AIN) nomenclature was used.

[Figure 1 image not reproduced. Panels: developing the AI algorithm on the tile level (80% of tiles for training, 20% for validation); improving it on the slide level (candidate CNN, slide-level prediction vs ground truth CIN3+ histology); applying the locked CNN on the tile level and evaluating on the slide level. Training: Biopsy Study 193 patients (1694 tiles, 44% DS+), Anal Screening Study 19 patients (977 tiles, 46% DS+), Kaiser Screening Study 238 patients (17 954 tiles, 46% DS+), 450 patients in total. Validation: 409, 299, and 3095 patients (3803 in total) from 602 women at colposcopy, 318 men at anoscopy, and 3333 HPV-positive women.]
Figure 1. Study design. AI = artificial intelligence; CNN = convolutional neural network; CIN3+ = cervical intraepithelial neoplasia grade 3 or worse; DS = dual stain.

p16/Ki-67 Staining and Evaluation

For the Biopsy Study and ACSS, slides were prepared from residual PreservCyt material using a T2000 processor (Hologic, Bedford, MA). For the KPNC study, slides were prepared from residual SurePath tubes according to the manufacturer's instructions (BD, Sparks, MD). Immunostaining of cervical cytology slides for p16/Ki-67 was performed using the CINtec Plus Kit (Roche, Tucson, AZ) according to the manufacturer's instructions. DS-trained cytotechnologists reviewed all slides; a slide was considered positive if 1 or more cervical epithelial cells stained both with a brown cytoplasmic stain (p16) and a red nuclear stain (Ki-67), irrespective of morphologic abnormalities. Slides from the Biopsy Study and ACSS were stained and evaluated at Roche mtm laboratories AG, Heidelberg, Germany, whereas slides from the Kaiser DS study were stained and evaluated at KPNC. HPV testing with partial genotyping (HPV16 and HPV18) at KPNC was based on the cobas assay (Roche, Pleasanton, CA).

Statistical Analysis

We created boxplots and calculated medians to show the distribution of DS-positive cells in cytology and histology categories. We compared differences in DS cell counts across ordinal cytology and histology categories using 1-way analysis of variance. The primary endpoint for the Biopsy Study and the Kaiser Study was CIN3 or greater (CIN3+). For ACSS, the primary endpoint was AIN2 or AIN3 (AIN2+). Receiver operating characteristic (ROC) curve analysis was conducted for the number of DS-positive cells against the primary endpoints, and the area under the curve (AUC) was calculated. Sensitivity and specificity coordinates for manual DS evaluation and cytology were plotted on the ROC curve for comparison. We calculated percentage positivity, sensitivity, specificity, and Youden's index in the Biopsy Study and ACSS for the cutoff determined by CNN4 and for manual DS evaluation. In the Kaiser Study, with a representative population of women who underwent routine screening, we calculated percentage positivity, sensitivity, specificity, and positive and negative predictive values for automated and manual DS. Differences in positivity, sensitivity, and specificity were evaluated using an exact McNemar's test, and differences in predictive values were evaluated using the R package DTComPair, using the generalized score statistic (27). To evaluate the clinical efficiency of each strategy, we estimated the number of CIN3+ detected for different cutoffs of DS-positive cells and the ratio of the number of tests and colposcopies per case of CIN3+ detected. We also evaluated the theoretical performance of automated DS in a fully vaccinated population by excluding all women who were positive for HPV16 and/or HPV18 from the analysis. Analyses were performed in SPSS, Stata, and R. All statistical tests were 2-sided, and P less than .05 was considered statistically significant.

Results

Automated Detection of DS-Positive Cells in Colposcopy and Anoscopy Populations

We developed a deep-learning algorithm for automated detection of DS-positive cells on ThinPrep slides from 2 referral populations (Biopsy Study and ACSS), including 212 training slides with 1186 DS-positive and 1485 DS-negative tiles (Figure 1). We evaluated the algorithm in independent validation slides from
both studies (Figure 1). In both studies, we observed an increase in the number of DS-positive cells with increasing severity of cytology and histology, with higher absolute DS-positive cell numbers in ACSS (P < .001 for all comparisons; Supplementary Figure 2, available online).

In the Biopsy Study validation set with 53 CIN3+, the AUC for detecting CIN3+ based on automated DS using CNN4 was 0.74 (Figure 2). At a cutoff of 3 DS-positive cells, the CNN4 algorithm had marginally lower positivity than manual DS (58% vs 63%, respectively, P = .06) with comparable sensitivity (P = 1.0) and marginally higher specificity (45.7% vs 40.6%, respectively, P = .07) (Table 1).

In the ACSS validation set with 69 AIN2+, the AUC for detecting AIN2+ based on automated evaluation of DS slides with CNN4 was 0.77 (Figure 2). At a cutoff of 3 DS-positive cells, the positivity of the CNN4 algorithm was lower than that of manual DS (63% vs 71%, respectively, P = .001) with comparable sensitivity (P = 1.0) and higher specificity (46.1% vs 36.1%, respectively, P = .001) (Table 1).

Figure 2. Receiver operating characteristic curve analysis of the number of dual stain (DS)-positive cells detected by CYTOREADER for detection of cervical precancer in the Biopsy Study and anal precancer in the Anal Cancer Screening Study. AUC = area under the curve; AIN2+ = anal intraepithelial neoplasia grade 2 or worse; CIN3+ = cervical intraepithelial neoplasia grade 3 or worse.

Table 1. Accuracy for cervical and anal precancer based on manual and automated detection of DS-positive cells on ThinPrep slides in a colposcopy population (Biopsy Study, N = 409) and an anoscopy population (ACSS, N = 299)

Evaluation | Positive, % | P(a) | AUC | Sensitivity, % (95% CI) | P(a) | Specificity, % (95% CI) | P(a) | Youden's index
Biopsy Study validation set (CIN3+)
Manual | 63.1 | Ref | | 87.0 (75.6 to 93.6) | Ref | 40.5 (35.6 to 45.7) | Ref | 0.27
CNN4(b) | 57.9 | .06 | 0.74 | 87.0 (75.6 to 93.6) | 1.0 | 45.6 (40.5 to 50.8) | .07 | 0.33
ACSS validation set (AIN2+)
Manual | 71.0 | Ref | | 92.8 (82.2 to 96.5) | Ref | 36.1 (30.3 to 42.4) | Ref | 0.29
CNN4(b) | 62.9 | <.001 | 0.77 | 91.3 (80.2 to 95.4) | 1.0 | 46.1 (40.0 to 52.6) | <.001 | 0.37

(a) Two-sided McNemar's test. ACSS = Anal Cancer Screening Study; AIN2+ = anal intraepithelial neoplasia grade 2 or worse; AUC = area under the curve; CIN3+ = cervical intraepithelial neoplasia grade 3 or worse; DS = dual stain; CNN = convolutional neural network; Ref = referent.
(b) CNN4 cutoff for Biopsy: 3 or more cells; CNN4 cutoff for Anal: 3 or more cells.

Automated Detection of DS-Positive Cells in an HPV Screening Population

We developed the deep-learning algorithm for SurePath slides using a training set of 238 slides from the Kaiser study with 8215 DS-positive and 9739 DS-negative tiles and applied it to an independent validation set of slides from 3095 women. We observed an increase of DS-positive cells with increasing severity of cytology and histology (Supplementary Figure 3, available online).

In the Kaiser validation study including 218 CIN3+, the AUC for detecting CIN3+ based on automated evaluation of DS slides was 0.82 (Figure 3). At a cutoff of 2 cells, the positivity of the
algorithm was statistically significantly lower than that of manual DS (42% vs 50%, respectively, P < .001), with similar sensitivity (P = .4) but statistically significantly higher specificity (61.5% vs 52.6%, respectively, P < .001). At a cutoff of 100 cells, accuracy approached that of HSIL cytology, which allows for immediate treatment according to current management guidelines (Figure 3). Automated DS provided better risk stratification than Pap cytology and manual DS (Figure 4): more women were reassured of a lower risk than with the other strategies (58% for automated DS vs 50% for manual DS and 40% for cytology), and risk among positives was higher.
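The slide-level decision rule from the Deep Learning methods, and the cutoff tailoring reported here, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CYTOREADER code (the authors' models are in the GitHub repository cited in the Methods): the tile likelihoods and CIN3+ labels are invented toy values, `TILE_THRESHOLD = 0.4` mirrors the IncV3 tile threshold, and `Slide`, `count_ds_positive`, and `evaluate_cutoff` are hypothetical names. Each tile above the threshold is counted as one DS-positive cell, and a slide is called positive when that count reaches the cutoff.

```python
from dataclasses import dataclass

# Assumption: IncV3 tile threshold as reported in the Methods (CNN4 used 0.5).
TILE_THRESHOLD = 0.4


@dataclass
class Slide:
    tile_likelihoods: list[float]  # hypothetical per-tile DS-positive likelihoods
    cin3_plus: bool                # ground truth from histology


def count_ds_positive(slide: Slide, threshold: float = TILE_THRESHOLD) -> int:
    """Count tiles whose likelihood is above the threshold (one tile ~ one DS-positive cell)."""
    return sum(p > threshold for p in slide.tile_likelihoods)


def evaluate_cutoff(slides: list[Slide], cutoff: int) -> dict[str, float]:
    """Call a slide positive if its DS-positive count is >= cutoff, then summarize accuracy."""
    tp = fp = tn = fn = 0
    for s in slides:
        positive = count_ds_positive(s) >= cutoff
        if positive and s.cin3_plus:
            tp += 1
        elif positive:
            fp += 1
        elif s.cin3_plus:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "positivity": (tp + fp) / len(slides),
        "sensitivity": sens,
        "specificity": spec,
        "youden": sens + spec - 1,  # Youden's index
    }


# Toy slides for illustration only (NOT study data).
slides = [
    Slide([0.9, 0.8, 0.7, 0.2], cin3_plus=True),  # 3 tiles above threshold
    Slide([0.95, 0.1], cin3_plus=True),           # 1 tile
    Slide([0.5, 0.3], cin3_plus=False),           # 1 tile
    Slide([0.1, 0.2, 0.3], cin3_plus=False),      # 0 tiles
]

for cutoff in (1, 2, 100):
    print(cutoff, evaluate_cutoff(slides, cutoff))
```

With these toy slides, raising the cutoff from 1 to 2 cells lowers positivity and sensitivity while raising specificity, and a very high cutoff such as 100 calls no slide positive. The same trade-off is what allows a low cutoff to serve triage and a high cutoff to approach HSIL-level risk.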
Clinical Efficiency of Automated DS Evaluation
We compared the clinical efficiency of Pap cytology, manual DS, and automated DS at 2 cutoffs (2 or more cells and 1 or more cells) for triage of HPV-positive women (Table 2). All DS strategies achieved equal or better sensitivity for detection of CIN3+ compared with Pap cytology while reducing unnecessary colposcopic referrals. Automated DS reduced overall referral to colposcopy by approximately one-third at the primary automated cutoff of 2 cells (41.9% for automated DS vs 60.1% for cytology, P < .001).
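The referral percentages, the roughly one-third relative reduction in referral, and the colposcopies-per-CIN3+ ratios in Table 2 are simple ratios of the published counts, as the Python check below shows. It uses only numbers from Table 2 (N = 3095) and the approximate 30 000 HPV-positive women per year from the Kaiser extrapolation in this section; the dictionary layout and function names are illustrative. As an assumption about the text's arithmetic, the extrapolation applies the rounded referral percentages, which reproduces the 12 570 referrals quoted for automated DS.

```python
# Counts from Table 2 (Kaiser validation set): (colposcopy referrals, CIN3+ detected)
N_WOMEN = 3095
table2 = {
    "Pap cytology": (1860, 188),
    "Manual DS": (1536, 197),
    "Automated DS, 2 cells": (1298, 192),
    "Automated DS, 1 cell": (1741, 201),
}


def referral_pct(referrals: int) -> float:
    """Percentage of all tested women referred to colposcopy."""
    return 100 * referrals / N_WOMEN


def colposcopies_per_case(referrals: int, detected: int) -> float:
    """Colposcopies performed per CIN3+ case detected (lower is more efficient)."""
    return referrals / detected


for name, (referrals, detected) in table2.items():
    print(f"{name}: {referral_pct(referrals):.1f}% referred, "
          f"{colposcopies_per_case(referrals, detected):.1f} colposcopies per CIN3+")

# Relative reduction in referral, automated DS (2 cells) vs Pap cytology
relative_reduction = 1 - table2["Automated DS, 2 cells"][0] / table2["Pap cytology"][0]
print(f"Relative reduction in referral: {relative_reduction:.1%}")  # 30.2%

# Extrapolation to roughly 30 000 HPV-positive women per year,
# applying the rounded referral percentage (assumed to match the text's arithmetic)
HPV_POSITIVE_PER_YEAR = 30_000
extrapolated = {
    name: round(HPV_POSITIVE_PER_YEAR * round(referral_pct(referrals), 1) / 100)
    for name, (referrals, _) in table2.items()
}
print(extrapolated["Pap cytology"], extrapolated["Automated DS, 2 cells"])
```

Using unrounded fractions instead gives slightly different totals (about 12 580 for automated DS), so the headline numbers should be read as approximate.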
Automated DS at a cutoff of 2 or more cells had similar sensitivity but statistically significantly higher specificity compared with manual DS evaluation (61.5% vs 52.6%, P < .001).
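This comparison is paired (the same slides were read by both methods), which is why the Statistical Analysis section specifies an exact McNemar's test. The exact test uses only the discordant pairs: under the null hypothesis, each discordant slide is equally likely to favor either method, so the smaller discordant count follows a Binomial(b + c, 0.5) distribution. The standard-library Python sketch below implements that generic test; it is not the authors' SPSS, Stata, or R code, and the discordant counts in the example are hypothetical because the paper does not report them.

```python
from math import comb


def exact_mcnemar(b: int, c: int) -> float:
    """Two-sided exact McNemar P value.

    b: discordant pairs where only method A is positive (or negative)
    c: discordant pairs where only method B is positive (or negative)
    Under H0, min(b, c) ~ Binomial(b + c, 0.5); the one-sided tail is doubled.
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts among women without CIN3+ (specificity comparison):
# 40 slides negative only by automated DS, 15 negative only by manual DS.
print(exact_mcnemar(40, 15))
```

Because concordant pairs drop out of the statistic, the P value depends on how lopsided the disagreements between the two readings are, not on the total number of slides alone.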
Automated DS detection at a cutoff of 1 or more cells achieved the highest sensitivity of all strategies, with statistically significantly higher specificity and lower colposcopy referral compared with Pap cytology. Automated DS at a cutoff of 2 or more cells had the most favorable ratio of colposcopies per CIN3+ detected, whereas the current standard, Pap cytology, had the least favorable ratio (6.8 vs 9.9, respectively). Extrapolating this to the full Kaiser screening population, out of 300 000 women screened annually, approximately 30 000 would test
HPV-positive and more than 18 000 would be referred to colposcopy using the current approach with Pap cytology, whereas only 12 570 would be referred to colposcopy using automated DS. We also estimated the performance of automated DS in a fully vaccinated population. Similar to the overall evaluation, automated DS showed equal sensitivity and lower colposcopy referral compared with Pap cytology, with even higher specificity (Supplementary Table 1, available online).

Figure 3. Receiver operating characteristic curve analysis of the number of dual stain (DS)-positive cells detected by CYTOREADER for detection of cervical precancer in a human papillomavirus screening population in Kaiser Permanente Northern California. ASC-US = atypical squamous cells of undetermined significance; AUC = area under the curve; HSIL = high-grade squamous intraepithelial lesions.

Figure 4. Absolute risk of precancer for Pap cytology, manual dual stain (DS), and automated DS. ASCUS+ = positive for atypical squamous cells of undetermined significance or greater cytology results. The dotted lines show clinical action risk thresholds for colposcopy referral (4% risk) and immediate treatment (50% risk).

Table 2. Accuracy for cervical precancer based on Pap cytology and manual and automated detection of DS-positive cells on SurePath slides in the Kaiser Validation Study (N = 3095)

Evaluation | Colposcopy referral, No. (%) | P(a) | Sensitivity, % (95% CI) | P(a) | Specificity, % (95% CI) | P(a) | PPV, % (95% CI) | P(b) | NPV, % (95% CI) | P(b) | Colposcopies per CIN3+ detected, No. (95% CI)
Pap cytology (188 CIN3+) | 1860 (60.1) | Ref | 85.8 (81.2 to 90.5) | Ref | 41.9 (40.1 to 43.7) | Ref | 10.1 (8.7 to 11.5) | Ref | 97.5 (96.6 to 98.3) | Ref | 9.9 (8.7 to 11.3)
Manual DS (197 CIN3+) | 1536 (49.6) | <.001/Ref | 90.0 (86.0 to 93.9) | .2/Ref | 52.6 (50.8 to 54.5) | <.001/Ref | 12.6 (11.0 to 14.3) | <.001/Ref | 98.6 (98.0 to 99.2) | .02/Ref | 7.8 (6.8 to 8.9)
Automated DS, 2 cells (192 CIN3+) | 1298 (41.9) | <.001/<.001 | 88.1 (82.5 to 91.7) | .6/.4 | 61.5 (59.7 to 63.3) | <.001/<.001 | 14.8 (12.9 to 16.8) | <.001/<.001 | 98.5 (97.8 to 99.0) | .01/.5 | 6.8 (5.9 to 7.7)
Automated DS, 1 cell (201 CIN3+) | 1741 (56.3) | .007/<.001 | 91.8 (87.3 to 95.1) | .05/.5 | 46.5 (44.6 to 48.3) | .06/<.001 | 11.5 (10.1 to 13.1) | .002/<.001 | 98.7 (97.9 to 99.2) | .03/.8 | 8.7 (7.6 to 9.9)

P values are given as comparisons with cytology/manual DS.
(a) Two-sided McNemar's test. CIN3+ = cervical intraepithelial neoplasia grade 3 or worse; DS = dual stain; NPV = negative predictive value; PPV = positive predictive value; Ref = referent.
(b) Two-sided generalized score statistic.

Discussion

Using a rigorous study design, we developed a novel deep-learning–based image analysis platform for automated evaluation of DS cytology. In a large population of women undergoing HPV-based cervical cancer screening, we show that automated evaluation of DS slides dramatically increases the efficiency of cervical cancer screening by substantially reducing unnecessary colposcopies compared with current standards, and it achieves similarly excellent performance in a simulated fully vaccinated population. Thus, CYTOREADER exceeds human diagnostic accuracy and serves as an example of AI achieving improvements beyond the automation of a human standard.

Our results demonstrate how automation and machine learning can transform cervical cancer screening, which is currently undergoing major changes. HPV testing for cervical cancer screening is an objective and reliable approach directly linked to the carcinogenic process (28). HPV-negative women are at very low risk of developing precancer or cancer over the next decade, and screening intervals can be extended (8–10). Yet most HPV infections are transient, and women require additional tests to decide who needs further evaluation or treatment (11, 12). Pap cytology is recommended and approved for triage of HPV-positive women but suffers from subjectivity, lack of reproducibility, and relatively low sensitivity (14). Our previous study comparing manual DS with cytology, together with the current results, demonstrates that automated DS evaluation can supplant and improve on the role of Pap cytology for triage of HPV-positive women and should also be evaluated for postcolposcopy and posttreatment surveillance (16). Compared with Pap cytology, manual DS has higher accuracy and can provide longer reassurance against disease when a test is negative, while the risks to patients do not differ from Pap cytology because the same sample type is used (17, 21). We previously showed that the few DS-negative CIN3s are more likely to have no HPV16/18 and no high-grade cytology, suggesting that these cases are less likely to progress (16). Automated DS evaluation can provide a completely objective cervical cancer–screening approach, improving efficiency and reducing harms and costs related to false-positive screening results. Furthermore, by demonstrating that AI-based DS detection works for anal cytology, we show the robustness of the imaging and analysis platform. Importantly, our approach is also suited for vaccinated populations, where it may achieve even higher specificity and counterbalance the lower disease prevalence in vaccinated women (29).

Automated DS evaluation immediately quantifies the number of DS-positive cells on a slide, allowing positivity cutoffs to be tailored to specific clinical decisions. Current guidelines offer the option of immediate treatment for women with HSIL cytology, who have a very high probability of underlying CIN3+ (30). A higher cutoff of DS-positive cells could be used to guide treatment decisions. Moving forward, additional criteria can be developed to expand slide assessment; for example, the presence of abnormal glandular cells could be used to identify adenocarcinoma precursors, which are a particular challenge for Pap cytology (31).

Digitization of glass slides paired with automated evaluation in the cloud can provide high-throughput triage of HPV-positive women with inherent objectivity. Furthermore, the functionality of CYTOREADER can provide an assisted diagnostics mode
for evaluating DS slides. The automatic algorithm can be used to present all DS-positive cells found on a slide, ranked by the likelihood that a cell is DS-positive, to accelerate slide evaluation. Similarly, CYTOREADER can be used for quality control of a program that is based on manual DS evaluation.

Successful implementation of CYTOREADER requires an infrastructure for high-quality staining, full-slide scanning, and running the machine-learning algorithm. However, slide preparation, scanning, and slide evaluation can be geographically separated, providing high-quality cervical cancer screening and triage in locations that currently lack the infrastructure and training to achieve reliable DS evaluation, provided that a reliable courier system is available. Compared with manual evaluation of DS slides, automated evaluation requires access to scanning infrastructure but may require a smaller cytotechnology workforce. Scanners are increasingly available in pathology laboratories and can process large batches of slides with limited need for a skilled operator (22, 23). Studies are warranted to evaluate whether DS is amenable to self-collected specimens, a sampling strategy that is important for low-resource settings. Future efforts also need to evaluate how long a negative automated DS result provides reassurance against precancer and how automated DS can be used in women undergoing surveillance.

We conducted a large, well-powered study to evaluate the performance of automated DS for triage of HPV-positive women. However, some limitations should be noted. In contrast to the large KPNC study on HPV triage based on SurePath slides, the 2 studies using ThinPrep slides were comparatively small, and they were conducted in colposcopy/anoscopy populations. Future studies need to evaluate automated DS in larger HPV screening populations using ThinPrep slides. Also, the positivity and sensitivity of cytology at KPNC are much higher than in other settings, which may affect the comparison of clinical efficiency estimates.

Our approach to train and validate on both the tile level and the slide level with ground truth disease endpoints sets our work apart from other deep-learning approaches in digital pathology that focus on replicating a subjective evaluation. We recognize that there is substantial subjectivity underlying histologic endpoints of cervical disease (15). In our study, we minimized the impact by relying on the most reproducible correlate of cervical precancer, CIN3, as our primary endpoint for evaluation of triage of HPV-positive women. Our work also emphasizes the importance of integrating epidemiology and AI with the availability of population-based studies to improve medical diagnostics beyond automation. It has long been proposed that "digital pathology" will become an important cornerstone of future health care. Despite this vision, image analysis currently does not contribute substantially to routine clinical practice and to the benefit of the patient. The automated evaluation of DS cytology slides has substantially improved accuracy

Notes

Role of the funder: The funder had no role in the design of the study; the collection, analysis, and interpretation of the data; the writing of the manuscript; and the decision to submit the manuscript for publication.

Conflict of interest: Drs. Wentzensen and Schiffman are employed by the National Cancer Institute (NCI), which has received cervical cancer screening assays in-kind or at reduced cost from BD and Roche for studies that Drs. Wentzensen and Schiffman are conducting. Dr Goldhoff reported receiving grants from the National Institutes of Health (NIH)/NCI during the conduct of the study. Dr Castle reported receiving cervical screening tests and diagnostics from Roche, Becton Dickinson, Cepheid, and Arbor Vita Corp at a reduced cost or no cost for research. Dr Kingery reported receiving grants from the NIH and the NCI during the conduct of the study and receiving grants from the NIH and the NCI outside the submitted work. Dr Grewal reported receiving grants from the NIH and the NCI during the conduct of the study and grants from the NIH and the NCI outside the submitted work. Dr Lorey reported receiving grants from the NIH and the NCI during the conduct of the study and grants from the NIH and the NCI outside the submitted work. No other disclosures were reported.

Data accessibility statement: The code is available at: https://github.com/stcmedhub/dual_stain_dl.

Author contributions: All authors contributed substantially to the conception and design of the study, the acquisition of data, or the analysis and interpretation. All authors drafted or provided critical revision of the article and provided final approval of the version to publish.

References
1. Bi WL, Hosny A, Schabath MB, et al. Artificial intelligence in cancer imaging: clinical challenges and applications. CA Cancer J Clin. 2019;69(2):127–157.
2. Hinton G. Deep learning-a technology with the potential to transform health care. JAMA. 2018;320(11):1101–1102.
3. Shouval R, Labopin M, Bondi O, et al. Prediction of allogeneic hematopoietic stem-cell transplantation mortality 100 days after transplantation using a machine learning algorithm: a European Group for blood and marrow transplantation acute leukemia working party retrospective data mining study. J Clin Oncol. 2015;33(28):3144–3151.
4. Stead WW. Clinical implications and challenges of artificial intelligence and deep learning. JAMA. 2018;320(11):1107–1108.
5. Wentzensen N, Arbyn M, Berkhof J, et al. Eurogin 2016 roadmap: how HPV knowledge is changing screening practice. Int J Cancer. 2017;140(10):2192–2200.
6. Schiffman M, Wentzensen N, Wacholder S, et al. Human papillomavirus testing in the prevention of cervical cancer. J Natl Cancer Inst. 2011;103(5):368–383.
7. Curry SJ, Krist AH, Owens DK, et al.; US Preventive Services Task Force. Screening for cervical cancer: US Preventive Services Task Force recommendation statement. JAMA. 2018;320(7):674–686.
8. Gage JC, Schiffman M, Katki HA, et al. Reassurance against future risk of pre-
cancer and cancer conferred by a negative human papillomavirus test. J Natl
and efficiency compared with Pap cytology and serves as an im-
Cancer Inst. 2014;106(8):dju153.
portant example for introducing digital pathology and deep 9. Dillner J, Rebolj M, Birembaut P, et al. Long term predictive values of cytology
learning into clinical practice. This approach has the potential and human papillomavirus testing in cervical cancer screening: joint
European cohort study. BMJ. 2008;337(oct13 1):a1754.
to substantially improve screening program performance, po-
10. Katki HA, Kinney WK, Fetterman B, et al. Cervical cancer risk for women un-
tentially affecting millions of women testing HPV-positive in dergoing concurrent testing for human papillomavirus and cervical cytology:
cervical cancer screening each year. a population-based study in routine clinical practice. Lancet Oncol. 2011;12(7):
663–672.
11. Cuschieri K, Ronco G, Lorincz A, et al. Eurogin roadmap 2017: triage strategies
Funding for the management of HPV-positive women in cervical screening programs.
Int J Cancer. 2018;143(4):735–745.
This work was supported by the Intramural Research 12. Wentzensen N, Schiffman M, Palmer T, et al. Triage of HPV positive women
Program of the US National Cancer Institute, National in cervical cancer screening. J Clin Virol. 2016;76(Suppl 1):S49–S55.
13. Huh WK, Ault KA, Chelmow D, et al. Use of primary high-risk human papillo-
Institutes of Health, Department of Health and Human mavirus testing for cervical cancer screening: interim clinical guidance. J Low
Services. Genit Tract Dis. 2015;19(2):91–96.
14. Wright TC Jr, Stoler MH, Behrens CM, et al. Interlaboratory variation in the performance of liquid-based cytology: insights from the ATHENA trial. Int J Cancer. 2014;134(8):1835–1843.
15. Stoler MH, Schiffman M. Interobserver reproducibility of cervical cytologic and histologic interpretations: realistic estimates from the ASCUS-LSIL Triage Study. JAMA. 2001;285(11):1500–1505.
16. Wentzensen N, Clarke MA, Bremer R, et al. Clinical evaluation of HPV screening with p16/Ki-67 dual stain triage in a large organized cervical cancer screening program. JAMA Intern Med. 2019;179(7):881.
17. Clarke MA, Cheung LC, Castle PE, et al. Five-year risk of cervical precancer following p16/Ki-67 dual-stain triage of HPV-positive women. JAMA Oncol. 2019;5(2):181.
18. Wentzensen N, Fetterman B, Castle PE, et al. p16/Ki-67 dual stain cytology for detection of cervical precancer in HPV-positive women. J Natl Cancer Inst. 2015;107(12):djv257.
19. Wentzensen N, Schwartz L, Zuna RE, et al. Performance of p16/Ki-67 immunostaining to detect cervical cancer precursors in a colposcopy referral population. Clin Cancer Res. 2012;18(15):4154–4162.
20. Wright TC Jr, Behrens CM, Ranger-Moore J, et al. Triaging HPV-positive women with p16/Ki-67 dual-stained cytology: results from a sub-study nested into the ATHENA trial. Gynecol Oncol. 2017;144(1):51–56.
21. Carozzi F, Gillio-Tos A, Confortini M, et al. Risk of high-grade cervical intraepithelial neoplasia during follow-up in HPV-positive women according to baseline p16-INK4A results: a prospective analysis of a nested substudy of the NTCC randomised controlled trial. Lancet Oncol. 2013;14(2):168–176.
22. Lahrmann B, Valous NA, Eisenmann U, et al. Semantic focusing allows fully automated single-layer slide scanning of cervical cytology slides. PLoS One. 2013;8(4):e61441.
23. Grabe N, Lahrmann B, Pommerencke T, et al. A virtual microscopy system to scan, evaluate and archive biomarker enhanced cervical cytology slides. Cell Oncol. 2010;32(1-2):109–119.
24. Wentzensen N, Walker JL, Gold MA, et al. Multiple biopsies and detection of cervical cancer precursors at colposcopy. J Clin Oncol. 2015;33(1):83–89.
25. Wentzensen N, Follansbee S, Borgonovo S, et al. Human papillomavirus genotyping, human papillomavirus mRNA expression, and p16/Ki-67 cytology to detect anal cancer precursors in HIV-infected MSM. AIDS. 2012;26(17):2185–2192.
26. Solomon D, Davey D, Kurman R, et al. The 2001 Bethesda system: terminology for reporting results of cervical cytology. JAMA. 2002;287(16):2114–2119.
27. Leisenring W, Alonzo T, Pepe MS. Comparisons of predictive values of binary medical diagnostic tests for paired designs. Biometrics. 2000;56(2):345–351.
28. Schiffman M, Doorbar J, Wentzensen N, et al. Carcinogenic human papillomavirus infection. Nat Rev Dis Primers. 2016;2(1):16086.
29. Franco EL, Cuzick J. Cervical cancer screening following prophylactic human papillomavirus vaccination. Vaccine. 2008;26(Suppl 1):A16–A23.
30. Massad LS, Einstein MH, Huh WK, et al.; 2012 ASCCP Consensus Guidelines Conference. 2012 updated consensus guidelines for the management of abnormal cervical cancer screening tests and cancer precursors. J Low Genit Tract Dis. 2013;17(5 Suppl 1):S1–S27.
31. Conrad RD, Liu AH, Wentzensen N, et al. Cytologic patterns of cervical adenocarcinomas with emphasis on factors associated with underdiagnosis. Cancer Cytopathol. 2018;126(11):950–958.