Computational Identification of Preneoplastic Cells Displaying High Stemness and Risk of Cancer Progression
ABSTRACT
Evidence points toward the differentiation state of cells as a marker of cancer risk and progression. Measuring the differentiation state of single cells in a preneoplastic population could thus enable novel strategies for early detection and risk prediction. Recent maps of somatic mutagenesis in normal tissues from young

cancer. Spatial transcriptomics and whole-genome bisulfite sequencing demonstrated that differentiation activity of tissue-specific TFs was decreased in cancer cells compared with the basal cell-of-origin layer and established that differentiation state correlated with differential DNA methylation at the promoters of these
1Department of Etiology and Carcinogenesis, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China. 2Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University (PKU), Beijing, China. 3Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, China. 4CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China. 5School of Life Sciences, Tsinghua-Peking Center for Life Sciences, Tsinghua University, Beijing, China. 6Department of Health Toxicology, Key Laboratory for Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Sciences and Technology, Wuhan, Hubei, China. 7Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China. 8UCL Cancer Institute, University College London, London, United Kingdom. 9CAMS Oxford Institute (COI), Chinese Academy of Medical Sciences, Beijing, China. 10CAMS key Laboratory of Cancer Genomic Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

T. Liu, X. Zhao, Y. Lin, Q. Luo, and S. Zhang contributed equally to this article.

Corresponding Authors: Chen Wu, Department of Etiology and Carcinogenesis, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China. Phone: 8601-0877-87395; E-mail: chenwu@cicams.ac.cn; Andrew E. Teschendorff, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China. Phone: 8618-3170-47442; E-mail: andrew@picb.ac.cn; Dongxin Lin, Department of Etiology and Carcinogenesis, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China. Phone: 8601-0877-88491; E-mail: lindx@cicams.ac.cn; and Jiang Chang, Department of Health Toxicology, Key Laboratory for Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Sciences and Technology, Wuhan 430030, Hubei, China. Phone: 8618-6940-68151; E-mail: changjiang815@hust.edu.cn

Cancer Res 2022;82:2520–37
doi: 10.1158/0008-5472.CAN-22-0668
©2022 American Association for Cancer Research
AACRJournals.org | 2520
Preneoplastic Cells of High Stemness and Cancer Risk
evidence pointing toward an epigenetic silencing mechanism (11, 19, 20). However, the precise role and timing of these putative silencing/inactivation events in carcinogenesis remain unclear, and have not yet been explored at single-cell resolution.

With single-cell technology (21), it is in principle now possible to explore the heterogeneity of differentiation states within a cell population, including preneoplastic and cancer cells, a critically important task that could help identify the least differentiated and more stem-like cells that are believed to underpin cancer risk and drive cancer progression, thus paving the way for novel cancer risk and early detection strategies. Inferring the differentiation state of individual preneoplastic or cancer cells from single-cell omic data is, however, challenging since traditional differentiation markers may no longer be valid (22). While a number of computational methods for measuring stemness and differentiation state from single-cell RNA sequencing (scRNA-seq) data have been proposed (23–25), each of these methods is based on a measure of global transcriptional entropy that does not

low-grade intraepithelial neoplasias (LGIN), 9 high-grade intraepithelial neoplasias (HGIN), and 14 invasive cancers (ICA). Medical records were reviewed to collect clinical data from each patient, including age, gender, smoking, and drinking behavior.

Sample handling and tissue processing
Tissue samples were placed in RPMI1640 medium (Corning, catalog no. 10–040-CV) with 20% FBS (Cell Signaling Technologies, catalog no. 30070.03) immediately after surgical resection. Tissue was processed for scRNA-seq following a previously described protocol (26, 29), with a portion being cryosectioned and hematoxylin and eosin stained to confirm the histologic staging. Briefly, tissues were rinsed with cold 10% FBS PBS, cut into small pieces on ice, and digested in RPMI1640 medium containing 2 mg/mL collagenase IV (Gibco, catalog no. 17104–019) and 0.5 mg/mL hyaluronidase (Sigma Aldrich, catalog no. 7326–33–3) for 1 hour at 37°C. The digested cell suspension was subsequently filtered through a 70-μm cell strainer (Falcon, catalog no. 352350) before centrifuging at 560 × g for 6 minutes at 4°C.
clusters. These clusters were annotated on the basis of the expression of known markers. For epithelial cells, the marker genes included EPCAM, SFN, KRT5, and KRT14, resulting in 5,070 epithelial cells, including 215 from tissue of normal or inflammatory esophageal epithelium, 44 from LGIN, 1,456 from HGIN, and 3,355 from ICA (657, 1,540, 1,137, and 21 for stage I, II, III, and IV, respectively). The mean number of genes detected in each epithelial cell was 2,023 and the average UMI count per cell was 8,202. The mean mitochondrial gene content was 4.3% of all UMI counts.

Processing of epithelial cells from 14 ESCC patients (Cohort 1)
The 5,070 epithelial cells were then rerun through a Seurat analysis at a higher level of stringency, where we only retained cells (i) expressing at least 200 genes, (ii) expressing less than 6,000 genes, (iii) with a DoubletScore < 1 using the doubletCells function from scran R-package (33) and (iv) a mitochondrial percentage < 5%. This resulted in 3,178 cells: 95 from normal/inflammatory

depths with an Illumina NovaSeq 6000 (Illumina, Inc.). Tissue spots were visually inspected and annotated by aligning the scanned histologic images using Loupe Browser (version 4.1.0). Raw ST sequences were mapped to hg38 genome using Spaceranger (version 1.0.0), and reached an average of 202,743 reads per tissue covered spot (mean reads of 231,137, 186,996, 190,097 for LZE7, LZE8, and LZE22 tissue blocks, respectively). The ST sequencing data encompassed a total of 8,679 spots with an average of 3,322 genes detected. A total of 4,208 epithelium/carcinoma spots (Epi spots) were manually selected with Loupe Browser, including 477 NOR, 945 INF, 243 LGIN, 527 HGIN, and 2,016 ICA Epi spots. Specifically, NOR/INF basal Epi spots were recognized as located in basal layers of epithelium or near papillae based on histologic characters (n = 621). Each Epi spot covered an area of 55 μm diameter encompassing 10–20 epithelial cells. ST data were analyzed with Seurat following standard procedure with the same quality control, standardization, and clustering parameters, as mentioned above. Briefly, raw data were imported into R using Seurat
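
The four retention criteria stated for the Cohort 1 epithelial cells (at least 200 genes, fewer than 6,000 genes, DoubletScore < 1, mitochondrial percentage < 5%) can be illustrated with a minimal Python sketch. This is not the authors' Seurat/scran code: the `CellQC` container, its field names, and the helper functions are hypothetical, and the doublet score is assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC summary (hypothetical container, not a Seurat object)."""
    cell_id: str
    n_genes: int          # number of genes detected in the cell
    doublet_score: float  # precomputed doublet score (assumed available)
    pct_mito: float       # mitochondrial share of UMI counts, in percent


def passes_qc(cell, min_genes=200, max_genes=6000,
              max_doublet=1.0, max_pct_mito=5.0):
    """Apply the four thresholds quoted in the text to one cell."""
    return (cell.n_genes >= min_genes          # (i) at least 200 genes
            and cell.n_genes < max_genes       # (ii) fewer than 6,000 genes
            and cell.doublet_score < max_doublet    # (iii) DoubletScore < 1
            and cell.pct_mito < max_pct_mito)       # (iv) mitochondrial % < 5


def filter_cells(cells):
    """Return only the cells that satisfy every criterion."""
    return [c for c in cells if passes_qc(c)]
```

Keeping the thresholds as keyword arguments makes it easy to relax them, as the text later describes for the Cohort 2 normal epithelial cells.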
bulk-tissue samples, the overexpression analysis is carried out again by comparing the given tissue to blood and separately again to blood vessels, which we found to be a very effective procedure (35). Tissue-specific TFs are then defined as those overexpressed in the given tissue (in our case esophagus, n = 686 samples) compared with all other tissue types (n = 7,869) as well as when compared with blood (n = 511) and blood vessels (n = 689). Independently from this, SCIRA also applies a greedy 2-step partial correlation framework to the same GTEX dataset to infer regulons for these TFs. To generate the full esophageal network, we ran the following commands:

reg.o <- sciraInfReg(data.m, tfEID.v, sdth = 0.25, sigth = 1e-6, spTH = 0.01, pcorth = 0.2, minNtgts = 10, ncores = 4)
net.o <- sciraSelReg(reg.o, tissue.v, toi = c("Esophagus"), cft = c("Blood","Blood Vessel"), degth.v = rep(0.05,3), lfcth.v = c(log2(1.5), log2(1), log2(1)));

Power calculation
The calculation of SCIRA’s sensitivity (SE) to detect highly expressed cell type–specific TFs in a given tissue type from the bulk-tissue GTEX dataset is described in detail in ref. 35. Briefly, the main parameters affecting the power estimate include the relative sample sizes of the two groups being compared (n1 and n2), the average expression effect size e (in effect the average expression fold-change) of the cell type–specific TFs compared with all other cell types, which will depend on the proportion of the cell type (w) within the tissue of interest. Assuming that a given TF is more highly expressed in a cell type that makes up only a proportion w of the cells in the tissue of interest, then $e = \log_2[\mathrm{FC} \cdot w + 1 \cdot (1 - w)]/s$, where FC is the average fold change and s is a pooled SD. To estimate the average expression fold-change FC for top DEGs between single-cell types in a tissue, we analyzed expression data from purified FACS sorted luminal and basal cells from the mammary epithelium (40), as described in detail in ref. 35. Because FACS-sorted cell populations are still heterogeneous,
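
The effect-size expression in the power calculation, e = log2[FC × w + 1 × (1 − w)]/s, can be evaluated directly. The Python sketch below is only a worked illustration of that formula; the function name and the input checks are assumptions, and the full power calculation is the one described in ref. 35.

```python
import math


def effect_size(fold_change, w, pooled_sd):
    """Effect size e = log2[FC*w + 1*(1 - w)] / s for a TF that is
    overexpressed (fold change FC) only in a cell type making up a
    proportion w of the tissue; s is the pooled SD."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be a proportion between 0 and 1")
    if pooled_sd <= 0:
        raise ValueError("pooled_sd must be positive")
    # Bulk expression is a mixture: FC in the cell type of interest,
    # baseline (1) in the remaining fraction of cells.
    mixture_fold_change = fold_change * w + 1.0 * (1.0 - w)
    return math.log2(mixture_fold_change) / pooled_sd
```

For example, a three-fold change in a cell type that makes up half of the tissue gives a bulk-level fold change of 2, so with s = 1 the effect size is 1; as w shrinks toward 0 the effect size shrinks toward 0, which is why the proportion of the cell type matters for power.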
TFA-matrix is defined over a relatively small number of features (the tissue-specific TFs), that no dimensional reduction is necessary prior to application of diffusion maps. The aim of this diffusion map analysis is to ascertain the existence of a bifurcation, with one branch defining invasion/cancer and the other representing a non-cancer fate (e.g., differentiation). To estimate pseudotime, we use the following procedure to obtain a root-cell, from which the two tip points (cancer vs. noncancer) are then identified. From the Markov transition matrix $M$ defined over all cells, we define the submatrix $\tilde{M}$ by only selecting cells in the normal + inflammatory state. This submatrix defines a weighted subgraph, which is not necessarily connected. To identify the main modules within this subgraph we use the walk-trap community detection algorithm (44), to subsequently select the largest community. This defines the root-state and the root-cell is obtained as the cell that minimizes the median absolute deviation in diffusion component space.

To estimate the cancer risk score, we compute the Pearson corre-

otherwise, and with $A_{ii} = 0$). CCAT is a much faster and scalable proxy of differentiation potency than SCENT. The reason why CCAT measures potency is that a cell of higher stemness tends to overexpress network hubs, with many of these network hubs encoding ribosomal proteins (23), a result we have validated across over 2 million cells and 28 scRNA-seq studies (45). The association between ribosomal gene expression and differentiation potency has been observed across different species and is independent of cell proliferation (46, 47). It is important to observe that the three single-cell measures we compute within the CancerStemID framework, that is, the stemness index CCAT, the TFIL, and the cancer risk score, are all independent from each other, and that any associations between them are nontrivial.

Calculation of cell-cycle scores
To identify single cells in either the G1–S or G2–M phases of the cell-cycle we followed the procedure described in Tirosh and colleagues (48). Briefly, we used genes whose expression is reflective of G1–S or G2–M phase. A given normalized scRNA-seq data matrix is
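
The argument that high-stemness cells overexpress network hubs can be made concrete with a small Python sketch. It assumes that a CCAT-like score is the Pearson correlation between a cell's expression profile and the degree of each gene in a network with adjacency matrix A (with zero diagonal). The definition of CCAT is truncated in this section and given in full in ref. 45, so the functions below are an illustrative approximation, not the published implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def node_degrees(adjacency):
    """Degree of each gene: row sums of a 0/1 adjacency matrix (A_ii = 0)."""
    return [sum(row) for row in adjacency]


def ccat_like_score(expression, adjacency):
    """Assumed CCAT-like score: correlation between expression and degree.
    Higher values mean hubs are more highly expressed in this cell."""
    return pearson(expression, node_degrees(adjacency))
```

Under this assumption, a cell whose most highly expressed genes are network hubs scores close to 1, and a cell that preferentially expresses peripheral genes scores close to −1.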
esophageal-specific TFs were extracted for analysis. The overall comparison of promoter methylation was performed with paired Student t test using the averaged DNA methylation (DNAm) levels across all promoter CpGs. For CpG-specific differential methylation analysis, we used the Wald test as implemented in the dss R-package (version 2.38.0; ref. 53). Differentially methylated CpGs between ESCC and paired normal tissue (n = 26 pairs) were defined by requiring a significant Wald test P < 0.01 and a difference in average DNAm (delta) of at least 0.1. To assess statistical significance over the whole promoter, we used a paired t test comparing the mean DNAm value over all DMLs in the promoter between the 26 ESCCs and 26 matched normals. Of note, the latter test requires directional DNAm changes to be more consistent to attain statistical significance and is therefore more stringent.

Analysis of genomic alterations in Cohort 2 ESCC patients
Somatic mutation and copy-number variation (CNV) profiles of

study profiling malignant and nonmalignant colon epithelial cells from 11 patients. We processed these data as described previously (35). Briefly, we downloaded the normal mucosa and tumor epithelial cell FPKM files from GEO under accession number GSE81861. In total, there were 160 and 272 normal and tumor epithelial cells.

Data availability
The raw sequencing data of our human Cohort 1 scRNA-seq data are available from the Genome Sequence Archive of Beijing Institute of Genomics, Chinese Academy of Sciences (https://ngdc.cncb.ac.cn/gsa/) with accession number HRA000776 (GSA-Human subAccession number). The raw sequencing data of the human Cohort 2 scRNA-seq data are available from GSA (https://bigd.big.ac.cn/gsa) under accession number HRA000195. The gene-by-cell count matrices of Cohorts 1 and 2 are available from GEO under accession numbers GSE199654 and GSE160269. Gene expression matrix of ESCC and paired adjacent normal samples is available
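
The two-stage promoter methylation test described in the DNA methylation analysis (CpG-level selection by Wald P < 0.01 and delta of at least 0.1, followed by a paired t test over the selected loci) can be sketched in Python as follows. The dictionary keys and function names are hypothetical, the delta threshold is assumed to apply to the absolute difference, and the Wald P values are taken as precomputed inputs rather than derived with the dss package.

```python
import math
from statistics import mean, stdev


def select_dmls(cpgs, p_threshold=0.01, min_delta=0.1):
    """Keep CpGs with Wald P < 0.01 and |mean tumor - mean normal| >= 0.1.
    Each CpG is a dict with hypothetical keys 'wald_p', 'tumor_mean'
    and 'normal_mean' (average DNAm per group)."""
    return [c for c in cpgs
            if c["wald_p"] < p_threshold
            and abs(c["tumor_mean"] - c["normal_mean"]) >= min_delta]


def paired_t_statistic(tumor, normal):
    """Paired t statistic for per-patient mean DNAm over a promoter's DMLs.
    tumor[i] and normal[i] must come from the same patient."""
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [t - n for t, n in zip(tumor, normal)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("t statistic undefined for constant differences")
    # Consistent, same-direction differences give a small SD and a large |t|.
    return mean(diffs) / (sd / math.sqrt(len(diffs)))
```

Because the statistic divides the mean paired difference by its standard error, patients whose DNAm changes point in opposite directions inflate the SD and shrink |t|, which is the sense in which the promoter-level test is more stringent.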
[Figure 1 image: schematic of TFs displaying reduced differentiation activity across hyperplasia, dysplasia, carcinoma in situ, and invasive cancer. Panel C shows the transcription factor differentiation activity (TFA) map with noncancer and cancer fates, distinguishing cells at low cancer risk (low TFIL/stemness) from cells at high cancer risk (high TFIL/stemness).]
Figure 1.
Rationale and the CancerStemID algorithm. A, Focusing on normal development and differentiation, tissue-specific TFs exhibit increased differentiation activity
(TFA) as cells differentiate from adult stem cells to multi-or-unipotent progenitors and finally to fully differentiated cells, as shown. B, Given a population of
preneoplastic cells, these cells exhibit heterogeneity in terms of their TFA profiles. The underlying hypothesis is that those preneoplastic cells with a TFA profile more
similar to that of the adult or progenitor states of the tissue are more likely to be selected for during cancer progression, in line with the Cancer Stem Cell hypothesis.
C, CancerStemID is a computational framework applicable to scRNA-seq data generated from different stages in cancer progression, aimed at identifying the
preneoplastic cells that are under positive selection, i.e., at highest risk of cancer progression. The CancerStemID algorithm first estimates transcription factor
differentiation activity (TFA) for tissue-specific TFs across all single cells in order to identify the TFs that exhibit reduced differentiation activity during cancer
progression. For each cell, we also independently estimate a (i) differentiation potency (dedifferentiation) score using the CCAT/SCENT algorithm, (ii) a TFIL
representing the number of tissue-specific TFs that are inactivated in a given cell, and (iii) a cancer progression (or cancer risk) score. The cancer progression score is
derived by applying diffusion maps to the TFA matrix, so as to infer lineage trajectories that map to cancer and noncancer fates, estimating for each preneoplastic cell
a relative probability of diffusing to the cancer fate, thus defining a cancer progression score. The main hypothesis is that a preneoplastic cell with a higher TFIL is
associated with an increased stemness and cancer progression score.
a higher stemness and cancer risk, reflecting the cell-of-origin that undergoes positive selection during cancer progression (Fig. 1B). The CancerStemID framework thus involves two steps: (i) identification of the key TFs and inference of their differentiation activity (TFA) in single-cells, and (ii) quantification of the overall level of dedifferentiation, which we posit identifies cellular states that progress to the invasive cancer stage (Fig. 1C). To identify the tissue-specific TFs and to estimate their TFA values we use the SCIRA algorithm (35), a machine-learning method that infers TFs and associated regulons from a large and appropriately powered multi-tissue gene expression dataset while adjusting for cell type heterogeneity. Differentiation activity of TFs in single cells is then derived using the regulon set of each TF. As shown by a number of studies (35, 59, 60), this regulon-based approach leads to improved inference of differentiation activity in the context of scRNA-seq data, mainly due to the high dropout rate of such data, which prevents reliable inference of TF regulatory activity from measured TF expression levels. In the second step, we quantify the overall degree of differentiation activity of a cell by direct comparison of the inferred TFA values relative to an appropriate normal state. In effect, the number of tissue-specific TFs displaying low differentiation activity relative to this normal state, a quantity we call TFIL, is a direct proxy of the dedifferentiation state of the cell (Fig. 1C).

Construction and validation of an esophageal-specific regulatory network
To test CancerStemID in ESCC, we first aimed to identify esophageal-specific TFs and their regulons. To this end, we applied SCIRA (35) to the large multi-tissue GTEX expression dataset (61), encompassing 8,555 samples and 29 tissue types, including 686 normal esophageal tissue specimens, while adjusting for the variation in immune-cell infiltration between samples and tissues (Materials and Methods). Our power calculation indicated more than 90% sensitivity to detect esophageal-epithelial specific TFs (Materials and Methods; Supplementary Fig. S1A). SCIRA inferred a regulatory network consisting of 43 esophageal-specific TFs and 1,136 target/regulon genes (Fig. 2A; Supplementary Data File S1; Materials and Methods), with an average of 42 regulon-genes per TF. Several of the identified TFs (e.g., TP63, KLF5, SOX2, FOXE1, PAX9, EHF) have established roles in squamous epithelial differentiation of the esophagus (62–64).
[Figure 2 image: A, SCIRA on GTEx (686 esophagus samples vs. 7,869 other-tissue samples) yielding the esophageal-specific regulatory network (43 TFs + 1,136 regulon genes). B, UMAP plots (UMAP-1 vs. UMAP-2) of esophageal cell types (basal, suprabasal, stratified, and upper epithelial cells; lymphatic and vascular endothelial cells; gland duct; mucous gland; B cells, T cells, NKT/CD8+ CTL, dendritic cells, monocytes/macrophages, mast cells; fibroblasts; muscle), colored by cluster and by TFA. C, Violin plots of TFA for ELF3 and EHF (P < 10−100 for each). D, Diagram of differential TFA and expression for the esophageal TFs.]
Figure 2.
Construction and validation of the esophageal-specific regulatory network. A, We applied the SCIRA algorithm to the large multi-tissue GTEX expression dataset,
encompassing 686 esophagus and more than 7,500 samples from other tissue-types, to infer an esophageal-specific regulatory network consisting of 43
esophageal-specific TFs (black squares) and their regulon genes (red circles). The regulon associated with each TF is depicted with a distinct background color, with
the regulon genes representing direct binding and indirect downstream targets. The regulons are then used to estimate regulatory activity of the TFs (TFA) in an
independent sample (bulk or single-cell RNA-seq profile). B, Validation of the esophageal-specific TF regulons in the 10× scRNA-seq esophageal tissue dataset from
the HCA. Left, UMAP depicts the clusters representing different cell types in the human esophagus. Right, UMAP colors the cells according to the average TFA over the
43 esophageal TFs. C, Violin plots for two of the esophageal TFs (ELF3, EHF) displaying their estimated TFA levels across all cells from the human esophagus stratified
according to whether the cell is epithelial, an immune cell, a fibroblast, or an endothelial cell. P value derived from a one-tailed Wilcoxon rank sum test comparing
epithelial with the other cell types. D, Diagram displaying for each of the 43 esophageal TFs if they are inactivated/downregulated (DN) or activated/overexpressed
(UP) according to differential TFA or differential expression (DE). In the case of differential expression, P values were derived from a Wilcoxon rank sum test. In the
case of TFA values, because these do not have dropouts, we used a t test to estimate P values.
We validated the 43 esophageal-specific TFs and regulons in two independent multi-bulk tissue expression datasets (Supplementary Fig. S1B and S1C; refs. 36, 37), using ChIP-seq data from the ChIP-seq Atlas (Supplementary Fig. S1D; ref. 38), and in 10× scRNA-seq data of normal esophageal tissue (50,000 cells and 19 cell types) generated as part of the Human Cell Atlas (HCA; Fig. 2B–D; see Materials and Methods; ref. 39). By estimating differentiation activity of the 43 esophageal TFs in this normal esophagus HCA set, we verified that the average differentiation activity (TFA) was highest in the epithelial clusters, and that 81% (i.e., 35) of our TFs displayed a significantly higher activity in epithelial cells (Fig. 2B–D). Within the epithelial compartment, the average TFA correlated with differentiation state, being lowest and highest for cells in the basal and upper epithelium layers, respectively (Fig. 2B; Supplementary Fig. S2A). To benchmark this association with differentiation state, we separately estimated potency of each cell using CCAT (45), a model of single-cell potency

and Methods; Fig. 3D and E). We observed good agreement between the cancer versus normal differential activity patterns derived from the two independent cohorts (Fisher one-tailed test P = 0.006; Fig. 3F). Of note, these skews toward lower differentiation activity were not observed at the level of TF expression, consistent with previous demonstrations that regulons improve the sensitivity to detect differentiation activity changes as compared with TF expression (Fig. 3E; ref. 35). In support of this, we note that tumor versus normal differential activity patterns derived from the scRNA-seq data were more consistent than differential expression, when compared with the differential expression patterns seen in corresponding bulk tissue RNA-seq datasets (Supplementary Fig. S3C and S3D; Supplementary Table S2).

Of note, some of the TFs displaying reduced differentiation activity (e.g., TRIM29, EHF, PAX9) have been implicated as tumor suppressors in squamous cell carcinoma including ESCC (65–67). Other TFs like TP63 and SOX2, which have been implicated as oncogenes in ESCC (68–72), displayed increased expression in
[Figure 3 image: A, Multi-region tissue collection (Cohort 1), single-cell transcriptomic analysis, and cell type annotation. B, tSNE plots (tSNE_1 vs. tSNE_2) of epithelial subclusters EC1–EC5 and of disease stage (NOR, INF, LGIN, HGIN, ICA). C, PCA scatterplot (PC1 vs. PC2) with density plot. D, TFA heatmaps (low to high) of the 43 esophageal TFs for Cohort 1 (14 ESCC; N/INF, LGIN, HGIN, ICA) and Cohort 2 (60 ESCC; N, ICA), with t(TFA) color bars. E, Barplots of #SigTF (UP and DN) for TFA and DE in Cohorts 1 and 2. F, Scatterplot of t(TFA: ICA − N) in Cohort 1 versus Cohort 2 (P = 0.006). Significance key as printed: *, P < 0.05; **, P < 10−5.]
Figure 3.
Reduced differentiation activity of esophageal-specific TFs precedes cancer development. A, scRNA-seq profiling on tumor and normal adjacent tissue from
14 patients with ESCC. UMAP diagram depicts 115,930 cells with clusters annotated to different cell types. B, The first tSNE-plot displays six different epithelial
subclusters. The second tSNE plot colors cells by disease stage. C, PCA scatterplot (PC1 vs. PC2), as derived by applying PCA to the transcription factor
regulatory activity (TFA) matrix of 43 esophageal TFs and a total of 3,178 epithelial cells. Cells are colored by disease stage. Density plot beneath
PC1 axis depicts the distribution of cells of each disease stage according to PC-1 weight. P value is from a Pearson correlation test between PC1 and disease
stage (1 = N/INF, 2 = LGIN, 3 = HGIN, 4 = ICA). D, Heatmaps of TFA for the 43 esophageal TFs across the four main disease stages in Cohorts 1 and 2 as shown.
For each disease stage, the TFA over all cells in that stage were averaged. Color bar labeled “t(TFA)” displays the t statistic of a linear regression between TFA
and disease stage (encoded as an ordinal variable, 1 = N/INF, 2 = LGIN, 3 = HGIN, 4 = ICA), and P values shown derive from this t test. In the case of Cohort 2,
there were only two disease stages (1 = N, 2 = ICA). E, Barplots displaying the number of significantly inactivated/downregulated (DN) and activated/
overexpressed (UP) TFs in Cohorts 1 and 2 according to differential TFA or differential expression. F, Scatterplot of the t statistics of differential TFA between
ICA and N for Cohort 1 versus Cohort 2. P value is from a linear regression. The number of TFs significantly inactivated in both Cohorts 1 and 2 is displayed in blue.
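
The t(TFA) statistic described in the caption, a linear regression of TFA on disease stage encoded as an ordinal variable, can be sketched in plain Python. This is an illustrative ordinary least-squares computation for a single TF; the function name is an assumption and the authors' own analysis code is not reproduced here.

```python
import math


def regression_t_statistic(stage, tfa):
    """t statistic of the slope in the simple regression tfa ~ stage.
    stage: ordinal codes per cell (e.g. 1 = N/INF, 2 = LGIN, 3 = HGIN, 4 = ICA)
    tfa:   TFA value of one TF in the same cells."""
    n = len(stage)
    if n != len(tfa) or n < 3:
        raise ValueError("need at least three matched observations")
    mx, my = sum(stage) / n, sum(tfa) / n
    sxx = sum((x - mx) ** 2 for x in stage)
    if sxx == 0:
        raise ValueError("stage must take at least two distinct values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(stage, tfa))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(stage, tfa))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("t statistic undefined for a perfect fit")
    return slope / se
```

A negative statistic indicates that TFA decreases with advancing stage, which corresponds to the inactivated/downregulated (DN) direction in the figure.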
[Figure image: heatmaps of TFA for the 43 esophageal-specific TFs across spatial transcriptomic (10× Visium; LZE8, LZE22) and scRNA-seq ESCC datasets, with color scale Sign(t)*(-Log10 P) and a color bar giving the number of studies (0–6); histograms of Fraction (%) versus the number of inactivated TFs shared by all 6 studies (P = 2 × 10−8) and by at least 5 studies (P = 5 × 10−9), comparing observed (Obs) and null (Null) values.]

distribution of TFA across NOR, HGIN, and ICA spots (n = 141, 313, and 613, respectively). P values were computed with an unpaired Student t test. C,

ciation between differentiation activity (TFA) and cancer progression, for the 43 esophageal-specific TFs across six independent scRNA-seq studies with the 10× Visium data results displayed separately for each of the 3 patients. For the 10× Visium data we display the results for each patient separately because for the 10× Visium data we had enough normal epithelial spots for the comparison within each patient to be meaningful. The values in this heatmap represent the sign of the t-statistic multiplied by -log10(P), where P is the associated P value. Blue colors denote reduced TFA during ESCC progression. The color bar to the right labels the number of studies in which the TF displays reduced TFA. E, Plots compare the number of TFs observed to exhibit reduced differentiation activity in all 6 studies (left) and in at least 5 studies (right) with the corresponding binomial null distributions. Green vertical line indicates the observed numbers and the P value is from a one-tailed binomial test.
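
Two quantities in this caption lend themselves to a short worked example: the heatmap value sign(t) × −log10(P), and the one-tailed binomial test of the number of TFs showing reduced activity across studies. The Python sketch below illustrates both. The per-TF null probability `p_null` is a hypothetical input, since the caption does not state how the null distribution was parameterized.

```python
import math


def signed_log_p(t_stat, p_value):
    """Heatmap value: sign of the t statistic times -log10(P).
    Negative values correspond to reduced TFA."""
    sign = (t_stat > 0) - (t_stat < 0)
    return sign * -math.log10(p_value)


def binomial_upper_tail(k_obs, n, p_null):
    """One-tailed binomial P value: probability of observing k_obs or more
    successes among n TFs if each succeeds independently with p_null."""
    if not 0.0 <= p_null <= 1.0:
        raise ValueError("p_null must be a probability")
    return sum(math.comb(n, k) * p_null ** k * (1.0 - p_null) ** (n - k)
               for k in range(k_obs, n + 1))
```

As an illustration only, if a TF had an independent 50% chance of showing reduced activity in each of 6 studies, the chance of reduction in all 6 would be 0.5 ** 6, and `binomial_upper_tail(k, 43, 0.5 ** 6)` would give the probability of seeing at least k such TFs among the 43.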
Reduced differentiation activity is observed relative to the basal epithelium
Given that normal esophageal basal cells displayed much lower TFA compared with normal cells from the differentiated upper epithelium (Fig. 2B; Supplementary Fig. S2A), we reasoned that the lower differentiation activity displayed by esophageal-specific TFs during cancer progression could reflect the increased enrichment of the basal cell-of-origin population. To explore this, we reran the differential TFA-analysis using only a subset of normal cells that we could confidently classify as basal. This was done in our human ESCC Cohort 2 for which using less stringent quality control thresholds, a sufficient number of normal epithelial cells (n = 183) were obtained. On the basis of four well-known esophageal basal markers (TP63, KRT5, KRT14, KRT15), we identified 36 basal cells, which reassuringly displayed a significantly higher potency than the 147 nonbasal ones, thus validating our assignments (Supplementary Fig. S6A and S6B). Despite the relatively small number of basal cells, esophageal-TFs still displayed a clear trend toward reduced differentiation activity in preneoplastic and cancer cells (Supplementary Fig. S6C; binomial test, P < 0.0001). To
[Figure 5 image: A, differential TFA analysis between N/INF (n = 95) and LG/HGIN (n = 1,081) cells, with a binary inactivation heatmap per TF and a TFIL bar. B, Stemness (CCAT) in N (n = 95) versus ICA (n = 2,002) in Cohort 1 (P = 3 × 10−19). C, Stemness (CCAT) versus TFIL (0, 1, 2, ≥3; n = 590, 247, 127, 117; P = 8 × 10−88). D, Stemness (CCAT) versus cycle score (P = 2 × 10−190) and versus TFIL in noncycling cells (n = 412, 144, 34, 16; P = 4 × 10−24). E, Diffusion map (DC1, DC2, DC3) with noncancer and cancer fates. Remaining panels plot the cancer progression score against TFIL and cycle score.]
Figure 5.
Transcription factor inactivation load correlates with stemness and cancer risk. A, A differential TFA analysis was performed between epithelial cells from the normal/
inflammatory stage and cells from the LGIN/HGIN (Cohort 1). Heatmap displays a binary matrix [black, inactivation event; gray, not significant (n.s.)] depicting the
inactivation events for each cell and TF. For a given LGIN/HGIN cell, inactivation of a TF is defined by a significantly lower activity in that cell compared with all N/INF
cells using a Bonferroni-adjusted P < 0.05 threshold, and where the P value is computed from a linear model. TFA is ranked in increasing order of TFIL, where TFIL
is defined as the number of TFs displaying an inactivation event in that cell. TFs labeled in blue are those exhibiting a significantly lower activity in LGIN/HGIN
compared with normal/inflammatory stage. B, Violin plots display the estimated stemness scores using the CCAT measure for epithelial cells in the normal (N) and
ICA for Cohort 1. P values derived from a one-tailed Wilcoxon rank sum test. C, Violin plots displaying the estimated stemness score (CCAT) against the TFIL in the
LGIN/HGIN cells from Cohort 1. P values derived from a linear regression between CCAT and TFIL. D, Smoothed scatterplot of CCAT versus the computed cell-cycle
score for the LGIN/HGIN cells. P values are from a linear regression between CCAT and cell-cycle score. Violin plot to the right is as in C but now only using noncycling
cells, that is, cells with a negative cell-cycle score. E, Three-dimensional diffusion map inferred by applying the diffusion maps algorithm to the TFA-matrix defined
over the 43 esophageal TFs and 3,178 epithelial cells (Cohort 1). Cells are colored according to disease stage, as shown. Black box contains the root state, that is, a cell
from the normal stage that has highest centrality. Red boxes denote the two inferred tipping points, labeling cancer-free and cancer endpoints. Below the diffusion
map, we display a two-dimensional density plot encompassing all LGIN/HGIN cells, and a cancer risk score was obtained for each of these cells by their proximity to
the cancer fate. F, Violin plot displays the estimated cancer progression score for epithelial cells in LGIN/HGIN stage as a function of TFIL. P values derived from a
linear regression. G, Smoothed scatterplot displays the relation between the cancer progression and cell-cycle scores. P values derived from a linear regression. Right
panel is like F, but now using only noncycling cells, defined as cells with a cell-cycle score less than 0.
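The TFIL definition in the caption of panel A can be illustrated with a minimal sketch. It assumes a TFA matrix stored as plain lists (rows are cells, columns are TFs). The function name `tfil_per_cell`, the toy data, the normal approximation used in place of the linear-model test, and the choice to apply the Bonferroni correction over all cell × TF tests are assumptions made for illustration; they are not taken from the authors' implementation.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tfil_per_cell(reference, query, alpha=0.05):
    """Count, for each query cell, the TFs with significantly lower activity
    than in the reference (N/INF) cells.

    reference, query: lists of rows (cells), each row a list of TF activities.
    """
    n_ref = len(reference)
    n_tf = len(reference[0])
    columns = list(zip(*reference))
    mu = [mean(col) for col in columns]   # per-TF mean in reference cells
    sd = [stdev(col) for col in columns]  # per-TF sample SD in reference cells
    n_tests = len(query) * n_tf           # assumption: Bonferroni over all tests
    std_normal = NormalDist()
    loads = []
    for cell in query:
        load = 0
        for j in range(n_tf):
            # Standardized difference of one cell from the reference mean;
            # the sqrt term accounts for uncertainty in the estimated mean.
            z = (cell[j] - mu[j]) / (sd[j] * sqrt(1.0 + 1.0 / n_ref))
            p_lower = std_normal.cdf(z)   # one-sided P value (lower activity)
            if min(1.0, p_lower * n_tests) < alpha:
                load += 1                 # one inactivation event
        loads.append(load)
    return loads


# Toy data (hypothetical): 50 reference cells, 3 TFs, 3 query cells.
reference = [[(i % 5 - 2) * 0.5] * 3 for i in range(50)]
query = [[0.0, 0.0, 0.0], [-5.0, 0.0, 0.0], [-5.0, -5.0, 0.2]]
print(tfil_per_cell(reference, query))  # prints [0, 1, 2]
```

With many reference cells the normal approximation is close to a t test; with few reference cells a t distribution would be the more appropriate choice.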
[Figure 6, panels A–F; plot residue removed. A, barplot of the frequency of promoter methylation changes (hypermethylation, hypomethylation) per TF, above a heatmap of genomic alterations (mutation, deletion, amplification); legend entries: not available, not significant. TFs in plotted order: PAX9, ELF3, STON1, FOXN1, TRIM29, ZNF219, FOXA1, HDAC1, TEAD1, OVOL1, RARG, ELK3, RCOR1, BNC1, FOXE1, DES, TP63, KLF3, FOXQ1, RREB1, TFAP2A, TEAD3, SOX2, GRHL2, KLF8, BNC2, YBX2, BARX2, HES2, DTX2, HR, TRIP10, TFAP2C, MYC, IRF6, SOX15, ZNF185, KLF5, TRIM16, TRIOBP, MYCBP, AFDN, EHF. C, TFA of PAX9, EHF, and ELF3 in N, UC, and MC cells. D, t(TFA)[MC-UC]. E, stemness (CCAT) versus TFIL (0 to 6); n = 28,841, 10,218, 3,420, 1,301, 512, 188, 67; P < 10−500. F, stemness (CCAT) and fraction (%) of cells per TFIL category (0, 1, 2, ≥3) by mutation status; NOTCH1: WT n = 37,497, MT n = 7,050; TP53: WT n = 23,946, MT n = 20,601.]
Figure 6.
Differential TFA of esophageal-specific TFs is associated with differential DNAm. A, Top, the y-axis of the barplot represents for each TF, the fraction of
promoter CpGs that display significant differential methylation between the 26 ESCCs and their 26 matched normals (Cohort 2) as assessed using a Wald test
(P < 0.01). TFs with at least one significantly hypermethylated promoter CpG site are displayed to the left of the dashed line (n = 19). Significant differences at
the level of each TF promoter were assessed using a paired Student t test (P < 0.01) comparing the mean DNAm values over the promoter. Significant TFs are
shown by annotating the TF names in purple or yellow depending on whether it is hyper- or hypomethylated, respectively. Overall significance of differences in
DNAm levels for all 43 TFs was calculated with a paired Student t test (P = 7.2 × 10−16). Bottom, heatmap displays the frequency of nonsynonymous somatic
mutations and gene copy number variations across the ESCC patients. B, Heatmap displays the methylation profiles of CpGs mapping to promoters with
significant hypermethylation (red) or hypomethylation (blue) in at least five patients. C, Violin plots display the TFA levels of PAX9, EHF, and ELF3 for
epithelial cells derived from normal esophageal tissue (N), tumor cells from patients with no significant promoter hypermethylation (UC), and tumor cells from
patients with significant promoter hypermethylation (MC). The number of single cells in each category is indicated. P values were computed with an unpaired
Student t test. (Continued on the following page.)
validate and strengthen these findings with increased cell numbers, we performed ST with the 10× Visium platform on normal, squamous dysplasia and invasive cancer samples from three patients with ESCC of Cohort 1 (see Materials and Methods). Across all three patients, this revealed a total of 4,208 epithelial spots (“Epi spots”), distributed as 477 normal, 945 inflammatory, 243 LGIN, 527 HGIN, and 2,016 ICA (Fig. 4A; Supplementary Fig. S6D and S6E). From the normal/inflammatory stages, we confidently identified by histology (three separate pathologists working independently with a 20× microscope) a total of 621 basal spots located in the vicinity of the basal membrane or papillae (Supplementary Fig. S7A and S7B), which we subsequently confirmed by ST expression of basal-specific markers (Supplementary Fig. S8). Unsupervised clustering of annotated epithelial, stromal, and immune-cell spots validated our assignments, revealing clear separability, thus confirming high purity of our epithelial spots (Supplementary Fig. S9A–S9E). Estimating TFA values in the

tiation activity. Of note, the CCAT potency measure also exhibited a strong association with cell proliferation, yet critically, the association is nonlinear, indicating that noncycling cells can also exhibit moderate to high potency (Fig. 5D). We verified that the association between stemness and TFIL is independent of cell proliferation (Supplementary Table S4), and in line with this, noncycling cells with a high TFIL exhibited a higher stemness than noncycling low TFIL ones (Fig. 5D). To test whether the TFIL and stemness are associated with cancer progression, we independently estimated, for each of the noncancerous cells, a cancer progression score, reflecting the closeness of the cell’s position to the cancer state in the differentiation activity (TFA) phase space, which we inferred by applying diffusion maps (see Materials and Methods; Fig. 5E; refs. 42, 43). We note that the diffusion map naturally predicted a bifurcation with cancer cells clustering almost exclusively at one end of diffusion component-1 (DC1) and with noncancer cells distributed more evenly (Fig. 5E). As with the stemness measure itself, the cancer progression score increased with the
(Continued.) D, Heatmap displaying the significance of differential TFA between tumor cells from patients with and without significant promoter
hypermethylation and for the 19 TFs displaying significant hypermethylation in ESCC compared with normal adjacent tissue (i.e., the significant TFs in
barplot of A). E, Boxplots displaying correlation between the CCAT potency/stemness index (y-axis) and TFIL (x-axis) in the cancer cells from ESCC Cohort 2.
P value is from a linear regression. F, Violin plots compare the CCAT potency/stemness values with NOTCH1 and TP53 mutation status as assessed in ESCC
Cohort 2. Note that somatic mutations were only assessed at the bulk tissue level within each ESCC patient, hence for patients carrying mutations, we assigned
all corresponding single cells as “MT,” with patients not carrying mutations assigned the status of wild-type (WT). P values are from a one-tailed Wilcoxon rank
sum test. Barplots compare the relative proportions of cells with varying TFILs between NOTCH1 mutant and wild-type patients, and similarly for TP53. P value
derives from a χ² test. In the case of TP53, relative proportions do not change in a consistent manner, so the P value is not shown.
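The χ² comparison of TFIL proportions between mutant and wild-type patients (panel F) can be sketched as a test of independence on a mutation status × TFIL category table. The function name `chi_square_independence` and the counts below are hypothetical toy values chosen for illustration; they are not the study's data, and this is not presented as the authors' code.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an
    r x c table of counts (rows: groups, columns: categories)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Toy counts (hypothetical): rows are WT and MT, columns are TFIL 0, 1, 2, >=3.
toy_table = [
    [40, 30, 20, 10],  # wild-type
    [10, 20, 30, 40],  # mutant
]
statistic, dof = chi_square_independence(toy_table)
print(statistic, dof)  # prints 40.0 3
```

A P value then follows from the upper tail of the χ² distribution with `dof` degrees of freedom. Because all cells from a patient share that patient's bulk mutation status, cells are not independent observations, which is worth keeping in mind when interpreting such a test.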
displayed sufficient read coverage at regulon genes and that according to our previous SCIRA-based analysis were inactivated in ESCC. Of these 11, a total of 4 (SOX2, RCOR1, ELK3, TEAD1) displayed significantly lower TFA in ESCC, while the remaining 7 did not display differential TFA (Supplementary Fig. S11). Thus, for a small fraction of TFs, their lower TFA in ESCC is associated with hypermethylation of TF-target promoters. Next, we decided to explore whether the correlation of TFA with dedifferentiation is independent of underlying NOTCH1 and TP53 mutations, two key mutations in ESCC development. Whilst our CCAT stemness/dedifferentiation index displayed a very strong and highly significant association with the TFIL derived from the TFA profiles (as assessed over the single cells from Cohort 2; Fig. 6E), we only observed a much milder and no association with NOTCH1 and TP53 mutation status, respectively (Fig. 6F). Thus, these data support the view that changes in differentiation activity of the esophageal-specific TFs are mirrored at the level of the DNA methylome and that these changes provide a closer proxy to the dedifferentiation/stemness index of cancer

Supplementary Fig. S13C and S13D; ref. 56). Thus, these data establish that tissue-specific TFs display lower differentiation activity in corresponding single cancer cells, and across different cancer types.

Discussion
Here we have devised a computational method to dissect the heterogeneity of a preneoplastic epithelial cell population, identifying a subpopulation of cells with a high TFIL that is independently associated with high stemness and that is found enriched at the invasive cancer stage. Underlying this result is the important observation that the number of tissue-specific TFs displaying reduced differentiation activity increases during cancer progression, consistent with the progressive selection of a dedifferentiated stem-like state. Given that differentiation within the esophageal epithelium proceeds via a unipotent lineage driven by the stem and progenitor cells located in the basal layer, our observations are entirely consistent with a
repressive histone marks, and not by promoter hypermethylation (84). In line with this, promoter hypermethylation of tissue-specific TFs is observed in normal cells exposed to cancer risk factors, including age (85), and has been proposed to be a cancer hallmark (3, 19). However, we cannot exclude the possibility that other epigenetic mechanisms, for instance somatic mutations affecting epigenetic enzymes, could drive DNAm changes affecting tissue-specific TFs. It will be important for future work to generate scRNA-seq data jointly with scATAC-seq (86), histone modifications (87) or DNAm data (88), in the same cells, as this could establish direct relationships between TFIL and changes to chromatin accessibility.
Overall, we acknowledge that our study and the conclusions drawn from it are subject to several limitations. First, our in silico predictions would require experimental validation. To establish experimentally if the preneoplastic cells we have identified represent those at highest cancer risk would require advanced in vivo lineage tracing techniques (89) that have not yet been developed. Second, how to epige-

cells identifies dedifferentiated stem-like cells that appear to be selected for during cancer progression. These novel insights and the computational CancerStemID framework presented herein, could facilitate the development of the much-needed early detection and cancer risk prediction markers for deadly cancers such as ESCC, or alternatively, to help assess the efficacy of cancer prevention trials (90).

Authors’ Disclosures
No disclosures were reported.

Authors’ Contributions
T. Liu: Data curation, formal analysis, visualization, writing–review and editing. X. Zhao: Data curation, formal analysis, investigation. Y. Lin: Formal analysis, visualization. Q. Luo: Formal analysis, visualization. S. Zhang: Formal analysis, investigation, visualization, writing–review and editing. Y. Xi: Data curation. Y. Chen: Data curation, formal analysis. L. Lin: Data curation. W. Fan: Data curation. J. Yang: Data curation. Y. Ma: Data curation. A.K. Maity: Formal analysis. Y. Huang: Validation, methodology. J. Wang: Validation, methodology. J. Chang:

References
1. Tirosh I, Venteicher AS, Hebert C, Escalante LE, Patel AP, Yizhak K, et al. Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma. Nature 2016;539:309–13.
2. Feinberg AP, Ohlsson R, Henikoff S. The epigenetic progenitor origin of human cancer. Nat Rev Genet 2006;7:21–33.
3. Baylin SB, Ohm JE. Epigenetic gene silencing in cancer - a mechanism for early oncogenic pathway addiction? Nat Rev Cancer 2006;6:107–16.
4. Schedl A, Hastie N. Multiple roles for the Wilms’ tumour suppressor gene, WT1 in genitourinary development. Mol Cell Endocrinol 1998;140:65–9.
5. Tao Y, Kang B, Petkovich DA, Bhandari YR, In J, Stein-O’Brien G, et al. Aging-like spontaneous epigenetic silencing facilitates Wnt activation, stemness, and Braf(V600E)-induced tumorigenesis. Cancer Cell 2019;35:315–28.
6. Xie W, Kagiampakis I, Pan L, Zhang YW, Murphy L, Tao Y, et al. DNA methylation patterns separate senescence from transformation potential and indicate cancer risk. Cancer Cell 2018;33:309–21.
7. Maegawa S, Gough SM, Watanabe-Okochi N, Lu Y, Zhang N, Castoro RJ, et al. Age-related epigenetic drift in the pathogenesis of MDS and AML. Genome Res 2014;24:580–91.
8. Issa JP. Epigenetic variation and cellular Darwinism. Nat Genet 2011;43:724–6.
9. Winslow MM, Dayton TL, Verhaak RG, Kim-Kiselak C, Snyder EL, Feldser DM, et al. Suppression of lung adenocarcinoma progression by Nkx2-1. Nature 2011;473:101–4.
10. Zhao W, Hisamuddin IM, Nandan MO, Babbin BA, Lamb NE, Yang VW. Identification of Kruppel-like factor 4 as a potential tumor suppressor gene in colorectal cancer. Oncogene 2004;23:395–402.
11. Teschendorff AE, Zheng SC, Feber A, Yang Z, Beck S, Widschwendter M. The multi-omic landscape of transcription factor inactivation in cancer. Genome Med 2016;8:89.
12. Chen Y, Widschwendter M, Teschendorff AE. Systems-epigenomics inference of transcription factor activity implicates aryl-hydrocarbon-receptor inactivation as a key event in lung cancer development. Genome Biol 2017;18:236.
13. Moore L, Leongamornlert D, Coorens THH, Sanders MA, Ellis P, Dentro SC, et al. The mutational landscape of normal human endometrial epithelium. Nature 2020;580:640–6.
14. Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, et al. The repertoire of mutational signatures in human cancer. Nature 2020;578:94–101.
15. Lee-Six H, Olafsson S, Ellis P, Osborne RJ, Sanders MA, Moore L, et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 2019;574:532–7.
16. Brunner SF, Roberts ND, Wylie LA, Moore L, Aitken SJ, Davies SE, et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature 2019;574:538–42.
17. Martincorena I, Fowler JC, Wabik A, Lawson ARJ, Abascal F, Hall MWJ, et al. Somatic mutant clones colonize the human esophagus with age. Science 2018;362:911–7.
18. Li R, Di L, Li J, Fan W, Liu Y, Guo W, et al. A body map of somatic mutagenesis in morphologically normal human tissues. Nature 2021;597:398–403.
19. Ohm JE, McGarvey KM, Yu X, Cheng L, Schuebel KE, Cope L, et al. A stem cell-like chromatin pattern may predispose tumor suppressor genes to DNA hypermethylation and heritable silencing. Nat Genet 2007;39:237–42.
20. Schlesinger Y, Straussman R, Keshet I, Farkash S, Hecht M, Zimmerman J, et al. Polycomb-mediated methylation on Lys27 of histone H3 pre-marks genes for de novo methylation in cancer. Nat Genet 2007;39:232–6.
21. Tang F, Lao K, Surani MA. Development and applications of single-cell transcriptome analysis. Nat Methods 2011;8:S6–11.
22. Teschendorff AE, Feinberg AP. Statistical mechanics meets single-cell biology.
41. Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 2004;3:Article3.
42. Angerer P, Haghverdi L, Buttner M, Theis FJ, Marr C, Buettner F. destiny: diffusion maps for large-scale single-cell data in R. Bioinformatics 2016;32:1241–3.
43. Haghverdi L, Buttner M, Wolf FA, Buettner F, Theis FJ. Diffusion pseudotime robustly reconstructs lineage branching. Nat Methods 2016;13:845–8.
44. Pons P, Latapy M. Computing communities in large networks using random walks. Berlin, Heidelberg: Springer; 2005.
45. Teschendorff AE, Maity AK, Hu X, Weiyan C, Lechner M. Ultra-fast scalable estimation of single-cell differentiation potency from scRNA-Seq data. Bioinformatics 2021;37:1528–34.
46. Athanasiadis EI, Botthof JG, Andres H, Ferreira L, Lio P, Cvejic A. Single-cell RNA-sequencing uncovers transcriptional states and fate decisions in haematopoiesis. Nat Commun 2017;8:2045.
47. Shi J, Teschendorff AE, Chen W, Chen L, Li T. Quantifying Waddington’s epigenetic landscape: a comparison of single-cell potency measures. Briefings Bioinf 2018.
48. Tirosh I, Izar B, Prakadan SM, Wadsworth MH II, Treacy D, Trombetta JJ, et al.
65. Yanagi T, Watanabe M, Hata H, Kitamura S, Imafuku K, Yanagi H, et al. Loss of TRIM29 alters keratin distribution to promote cell invasion in squamous cell carcinoma. Cancer Res 2018;78:6795–806.
66. Smirnov A, Lena AM, Cappello A, Panatta E, Anemona L, Bischetti S, et al. ZNF185 is a p63 target gene critical for epidermal differentiation and squamous cell carcinoma development. Oncogene 2019;38:1625–38.
67. Xiong Z, Ren S, Chen H, Liu Y, Huang C, Zhang YL, et al. PAX9 regulates squamous cell differentiation and carcinogenesis in the oro-oesophageal epithelium. J Pathol 2018;244:164–75.
68. Watanabe H, Ma Q, Peng S, Adelmant G, Swain D, Song W, et al. SOX2 and p63 colocalize at genetic loci in squamous cell carcinomas. J Clin Invest 2014;124:1636–45.
69. Wu Z, Zhou J, Zhang X, Zhang Z, Xie Y, Liu JB, et al. Reprogramming of the esophageal squamous carcinoma epigenome by SOX2 promotes ADAR1 dependence. Nat Genet 2021;53:881–94.
70. Jiang Y, Jiang YY, Xie JJ, Mayakonda A, Hazawa M, Chen L, et al. Co-activation of super-enhancer-driven CCAT1 by TP63 and SOX2 promotes squamous cancer progression. Nat Commun 2018;9:3619.
71. Jiang YY, Jiang Y, Li CQ, Zhang Y, Dakle P, Kaur H, et al. TP63, SOX2, and KLF5 establish a core regulatory circuitry that controls epigenetic and transcription
78. The Cancer Genome Atlas Research Network. Integrated genomic characterization of oesophageal carcinoma. Nature 2017;541:169–75.
79. Gao YB, Chen ZL, Li JG, Hu XD, Shi XJ, Sun ZM, et al. Genetic landscape of esophageal squamous cell carcinoma. Nat Genet 2014;46:1097–102.
80. Yokoyama A, Kakiuchi N, Yoshizato T, Nannya Y, Suzuki H, Takeuchi Y, et al. Age-related remodelling of oesophageal epithelia by mutated cancer drivers. Nature 2019;565:312–7.
81. Tomasetti C, Poling J, Roberts NJ, London NR Jr, Pittman ME, Haffner MC, et al. Cell division rates decrease with age, providing a potential explanation for the age-dependent deceleration in cancer incidence. Proc Nat Acad Sci USA 2019;116:20482–8.
82. Yamashita S, Kishino T, Takahashi T, Shimazu T, Charvat H, Kakugawa Y, et al. Genetic and epigenetic alterations in normal tissues have differential impacts on cancer risk among tissues. Proc Nat Acad Sci USA 2018;115:1328–33.
83. Yoshida K, Gowers KHC, Lee-Six H, Chandrasekharan DP, Coorens T, Maughan EF, et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 2020;578:266–72.
84. Meissner A, Mikkelsen TS, Gu H, Wernig M, Hanna J, Sivachenko A, et al.