International Journal of Radiation Biology
ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/irab20
Improved radiation expression profiling in blood
by sequential application of sensitive and specific
gene signatures
Eliseos J. Mucaki, Ben C. Shirley & Peter K. Rogan
To cite this article: Eliseos J. Mucaki, Ben C. Shirley & Peter K. Rogan (2021): Improved radiation
expression profiling in blood by sequential application of sensitive and specific gene signatures,
International Journal of Radiation Biology, DOI: 10.1080/09553002.2021.1998709
To link to this article: https://doi.org/10.1080/09553002.2021.1998709
Published online: 12 Nov 2021.
ORIGINAL ARTICLE
Eliseos J. Mucaki (a), Ben C. Shirley (b), and Peter K. Rogan (a,b)
(a) Department of Biochemistry, University of Western Ontario, London, Canada; (b) CytoGnomix Inc., London, Canada
ABSTRACT
Purpose: Combinations of expressed genes can discriminate radiation-exposed from normal control blood samples by machine learning (ML) based signatures (with 8–20% misclassification rates).
These signatures can quantify therapeutically relevant as well as accidental radiation exposures.
The prodromal symptoms of acute radiation syndrome (ARS) overlap those present in influenza
and dengue fever infections. Surprisingly, these human radiation signatures misclassified gene
expression profiles of virally infected samples as false positive exposures. The present study investigates these and other confounders, and then mitigates their impact on signature accuracy.
Methods: This study investigated recall by previous and novel radiation signatures independently
derived from multiple Gene Expression Omnibus datasets on common and rare non-neoplastic
blood disorders and blood-borne infections (thromboembolism, S. aureus bacteremia, malaria,
sickle cell disease, polycythemia vera, and aplastic anemia). Normalized expression levels of signature genes are used as input to ML-based classifiers to predict radiation exposure in other hematological conditions.
Results: Except for aplastic anemia, these blood-borne disorders modify the normal baseline
expression values of genes present in radiation signatures, leading to false-positive misclassification of radiation exposures in 8–54% of individuals. Shared changes, predominantly in DNA damage response and apoptosis-related gene transcripts in radiation and confounding hematological
conditions, compromise the utility of these signatures for radiation assessment. These confounding
conditions (sickle cell disease, thrombosis, S. aureus bacteremia, malaria) induce neutrophil extracellular traps, initiated by chromatin decondensation, DNA damage response and fragmentation
followed by programmed cell death or extrusion of DNA fragments. Riboviral infections (e.g. influenza or dengue fever) have been proposed to bind and deplete host RNA binding proteins, inducing R-loops in chromatin. R-loops that collide with incoming replication forks can result in
incompletely repaired DNA damage, inducing apoptosis and releasing mature virus. To mitigate
the effects of confounders, we evaluated predicted radiation-positive samples with novel gene
expression signatures derived from radiation-responsive transcripts encoding secreted blood
plasma proteins whose expression levels are unperturbed by these conditions.
Conclusions: This approach identifies and eliminates misclassified samples with underlying hematological or infectious conditions, leaving only samples with true radiation exposures. Diagnostic
accuracy is significantly improved by selecting genes that maximize both sensitivity and specificity
in the appropriate tissue using combinations of the best signatures for each of these classes
of signatures.
Received 8 July 2021
Revised 19 October 2021
Accepted 21 October 2021
KEYWORDS
Biodosimetry; gene expression; false positive reactions; DNA damage response (DDR); radiation; hematological disease
Introduction
One of the most promising approaches to quantify absorbed
ionizing radiation is based on levels of different gene expression responses in blood (Dressman et al. 2007; Paul and
Amundson 2008; Ding et al. 2013; Lu et al. 2014).
Combinations of gene expression levels, termed signatures,
can predict ionizing radiation exposure in humans and mice
from publicly available microarray gene expression levels
(Zhao et al. 2018a). Several groups (Boldt et al. 2012;
Budworth et al. 2012; Knops et al. 2012; Ghandhi et al.
2015) have also used signatures to determine radiation exposures. Our approach uses supervised machine learning (ML)
with genes previously implicated or established from genetic
evidence and biochemical pathways that are altered in
response to these exposures (Zhao et al. 2018a).
Biochemically inspired ML is a robust approach to derive
diagnostic gene signatures for radiation and chemotherapy
(Dorman et al. 2016; Mucaki et al. 2016, 2019; Bagchee-Clark et al. 2020). Given the limited sample sizes of typical
datasets, appropriate ML methods for deriving gene signatures have included support vector machines, random forest
classifiers, decision trees, simulated annealing, and artificial
neural networks (Boldrini et al. 2019; Rogan 2019).
CONTACT Peter K. Rogan, progan@uwo.ca, Departments of Biochemistry and Oncology, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON N6A 2C1 Canada.
Supplemental data for this article can be accessed online.
Copyright © 2021 Taylor & Francis Group LLC.
Before performing ML, we ranked a set of curated radiation-response genes in radiation-exposed samples by minimum redundancy maximum relevance (mRMR; Ding and
Peng 2005), which orders genes based on mutual information (MI) between their expression and whether the sample
was irradiated (MR) and the degree to which their expression profile is dissimilar from previously selected genes
(mR). A support vector machine (SVM) model, or signature, is generated from the top ranked gene set by adding and removing
genes that minimize either model misclassification or log loss (Zhao et al. 2018a). ML alone is not sufficient to derive
useful signatures, since underlying biochemical pathways
and disease mechanisms also contribute to selecting relevant
and non-redundant gene features (Bagchee-Clark et al.
2020). Signatures were evaluated either by a traditional validation approach or by stratified k-fold validation (which
splits the same dataset into k groups reserved for testing and
training). The traditional model-centric approach uses a
normalized training set which is then used to predict outcome of normalized independent test data. While more susceptible to batch effects than k-fold validation, this approach
can incorporate smaller datasets or heterogeneous
data sources.
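The stratified k-fold scheme mentioned above can be illustrated with a minimal sketch. It is not the authors' software: the function name `stratified_k_fold`, the round-robin assignment, and the absence of shuffling are all assumptions made for illustration.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Yield (train_indices, test_indices) pairs for k stratified folds.

    Samples of each class are dealt round-robin into the k folds, so every
    fold keeps roughly the class proportions of the full dataset.
    No shuffling is done; that is an illustrative simplification.
    """
    folds = [[] for _ in range(k)]
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(test)


# Hypothetical labels: 1 = irradiated, 0 = unirradiated control.
labels = [0, 0, 0, 0, 1, 1]
splits = list(stratified_k_fold(labels, k=2))
```

Each sample is held out exactly once, and every test fold contains both classes. The traditional approach instead trains on one dataset and tests on an independent one.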
The present study is concerned with the selection of
‘normal’ controls for training and testing signatures.
Selection of appropriate controls has been a debated subject
in other medical fields (Lipsitch et al. 2010). In individuals
with clinically overlapping diagnoses, this has a critical but
underappreciated impact on the accuracy of molecular diagnostic testing. Previous studies have revealed gene expression changes in blood from unirradiated individuals with
underlying metabolic or other confounders, including smokers
(Paul and Amundson 2011) and modulators of inflammation such as lipopolysaccharide of bacterial origin or curcumin (Cruz-Garcia et al. 2018). There is also evidence of
interactions between hematological comorbidities and radiation exposure. Radiodermatitis has been associated with S.
aureus infections (Hill et al. 2004). Radiation therapy has
also been contraindicated in individuals with venous
thromboembolism (Guy et al. 2017).
While investigating the possibility of using radiation gene
signatures to differentiate the prodromal phase of acute radiation syndrome (ARS) from early-stage influenza, we found that riboviral
infections induced expression changes similar to those seen
in gamma-irradiated samples (Rogan et al. 2021). Radiation
signatures (derived in Zhao et al. 2018a; designated M1–M4 in Rogan et al. 2020) misclassified some unirradiated blood samples
from infected individuals as radiation-exposed.
False positive (FP) radiation exposure predictions of unirradiated samples from individuals diagnosed with influenza
have also been noted by others (Jacobs et al. 2020,
Supplementary data). The M1–M4 signatures consisted
predominantly of genes with roles in DNA damage
response, programmed cell death, and inflammation.
However, many of these genes were also similarly dysregulated in blood from influenza and dengue virus-infected
samples. The expression of the DNA damage response gene
DDB2, for example, increases with radiation, but was also
induced in a significant number of virus-infected samples,
which were then classified incorrectly as radiation exposed
(Figure 5 of Rogan et al. 2021). DDB2 is present in many
other radiation gene signatures or developed radiation-exposure assays (Paul and Amundson 2008; Lu et al. 2014;
Jacobs et al. 2020).
While these validation procedures rigorously estimate the sensitivity
of signatures to radiation, the same rigor has not been applied
to determining specificity, which we suspected could be
impacted by comorbidities in the general population. This
study considers other hematological conditions that alter the
normal expression of the same genes in blood that are often
selected for assessment of radiation exposure and suggests
an approach for addressing this issue.
We evaluated gene expression data from other individuals
with blood-borne conditions (infections, inherited and idiopathic hematological disorders) to determine whether previous and novel gene signatures could discriminate radiation
exposure from these phenotypes. We also investigated whether
these effects are reproducible using newly derived signatures
from independent datasets derived from irradiated blood
samples (Figure 1).
Methods
Datasets evaluated
Expression data from the Gene Expression Omnibus (GEO;
https://www.ncbi.nlm.nih.gov/geo/) and ArrayExpress
(https://www.ebi.ac.uk/arrayexpress/) databases were
required to contain the same genes in each signature in
both training and independent validation radiation datasets
(Zhao et al. 2018a). These datasets were: GSE1725
[Designated: RadLymphCL-1], GSE6874 [GPL4782;
Designated: RadTBI-2], GSE10640 [GPL6522; Designated:
RadTBI-3] and GSE701 [Designated: RadLymphCL-4]
(Table 1). Several well known radiation responsive genes
which have appeared in other radiation gene signatures
(Paul and Amundson 2008; Oh et al. 2014; Port et al. 2017;
Tichy et al. 2018; Jacobs et al. 2020) were previously not
considered and were not present in any of our former ML
models. Genes were excluded either because they were: (1)
absent from one or more datasets (e.g. FDXR, RPS27L, AEN
were missing from RadTBI-3); (2) mislabeled in the dataset
with a legacy name leading to a mismatch between datasets
(e.g. PARP1 appears as ADPRT in RadTBI-2); (3) secondary
RNAs, such as micro- or long non-coding-RNA derived
from the same gene (e.g. BBC3 probes also detected multiple
microRNAs in RadLymphCL-1; POU2AF1 probes in
RadLymphCL-4 indicated as LOC101928620); or (4) missing
from the set of curated radiation response genes (e.g.
PHPT1, VWCE, WNT3).
To address the possibility that inclusion of genes missing
from signatures based on RadLymphCL-1, RadTBI-2 or
RadTBI-3 might improve radiation response prediction, we
developed novel signatures from more recent radiation datasets, including GEO: GSE26835, GSE85570, GSE102971 and
ArrayExpress: E-TABM-90 (Designated RadLymphCL-5,
RadBloodpost-6, RadBlood-7, and RadBloodpost-8, respectively; Table 1). Previous radiation gene signatures (Zhao
et al. 2018a) were derived from GSE6874[GPL4782] and
GSE10640[GPL6522] (RadTBI-2 and RadTBI-3, respectively;
bracketed accession numbers refer to the subset of samples
belonging to a corresponding GEO SuperSeries). Exposure
levels were at a minimum of 2 Gy (including total body
irradiation (TBI)) and were carried out between 4 and
24 hours post-irradiation for all radiation-derived signatures.
Figure 1. Evaluation of conditions confounding radiation gene signatures. The traditional validation approach was used to evaluate unirradiated datasets for hematological conditions to assess the performance of radiation gene signatures derived in Zhao et al. (2018a). With this approach, we can identify and then reject models with high rates of FP radiation diagnosis in confounding conditions while identifying what confounders could make individuals ineligible for a radiation gene
signature assay. Alternatively, new radiation gene signatures could be derived that show improved FP rates in both controls and test subjects.
To investigate whether other disorders could also confound radiation signatures, we assessed performance of radiation signatures utilized in this study with available gene
expression datasets for other blood-borne diseases. These
datasets include GEO: GSE117613 (cerebral malaria and
severe malarial anemia; Designated: BBD-Malaria [BBD –
blood borne disease]), GSE35007 (sickle cell disease in children; Designated: BBD-Sickle), GSE47018 (polycythemia
vera; Designated: BBD-Polycyt), GSE19151 (single and
recurrent venous thrombosis; Designated: BBD-Thromb),
GSE30119 (Staphylococcus [S.] aureus infection; Designated:
BBD-Saureus), and GSE16334 (aplastic anemia; Designated:
BBD-Aanemia) (Table 1). Gene expression was measured by
microarray in each dataset (qRT-PCR validation data was
not available). An idiopathic portal hypertension dataset
(GSE69601) was excluded due to insufficient numbers of
samples (N = 6).
Data preprocessing
Microarray data from each dataset were pre-processed as
described in Zhao et al. (2018a). Briefly, missing gene
expression values were imputed (gene feature was removed
if values were missing from >5% samples) from nearest
neighbors, the expression values of patient replicates were
averaged, and gene expression of all genes was z-score normalized. Genes previously implicated in the radiation
response (N = 998) were analyzed, along with 13 additional
radiation genes described in other studies: CD177,
DAGLA, HIST1H2BD, MAMDC4, PHPT1, PLA2G16, PRF1,
SLC4A11, STAT4, VWCE, WLS, WNT3, and ZNF541
(N = 1011 genes total).
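The preprocessing steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not the pipeline of Zhao et al. (2018a): the function name `preprocess`, the choice of genes as the "neighbors", the root-mean-square distance, the default `k`, and the use of the population standard deviation are all hypothetical. Replicate averaging is omitted for brevity.

```python
import math
from statistics import mean, pstdev


def preprocess(expression, max_missing=0.05, k=3):
    """expression maps gene -> list of values per sample (None = missing)."""
    n_samples = len(next(iter(expression.values())))
    # 1. Remove gene features missing in more than 5% of samples.
    kept = {
        gene: values for gene, values in expression.items()
        if sum(v is None for v in values) / n_samples <= max_missing
    }

    def distance(a, b):
        # Root-mean-square difference over samples observed in both genes.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    # 2. Impute each remaining gap from the k most similar genes
    #    (assumes at least one other gene has a value for that sample).
    imputed = {}
    for gene, values in kept.items():
        row = list(values)
        for j, value in enumerate(values):
            if value is None:
                donors = sorted(
                    (distance(values, other), other[j])
                    for name, other in kept.items()
                    if name != gene and other[j] is not None
                )[:k]
                row[j] = mean(v for _, v in donors)
        imputed[gene] = row

    # 3. z-score normalize each gene across samples.
    normalized = {}
    for gene, row in imputed.items():
        mu, sd = mean(row), pstdev(row)
        normalized[gene] = [(v - mu) / sd if sd else 0.0 for v in row]
    return normalized
```

Filtering before imputation keeps sparsely measured genes from acting as donors, which is one reasonable reading of the order of operations described in the text.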
Derivation of radiation gene signatures
mRMR ranking was determined for curated, expressed radiation-related genes in the presence or absence of radiation.
mRMR first selects the gene with the highest MI (Cover and
Thomas 2006; Zeng 2015) between its expression level and
the radiation exposure status of each sample. For a set of irradiated
and unirradiated samples, the MI of a gene ranges from 0 to 1 bit
and measures the mutual dependence between the radiation exposure status and the expression
of that gene within the same dataset. With MI = 1 bit,
expression levels and radiation exposure are perfectly correlated. Expression levels of a gene that distinguish some but
not all irradiated and unirradiated samples produce MI values between 0 and 1 bit. An MI value at or near 0 bits indicates
that expression levels are uncorrelated or only weakly correlated with the
radiation phenotype. mRMR feature selection then minimizes redundant expression patterns among the chosen genes
by prioritizing the candidate gene with the largest difference between its MI with radiation status and its average MI with all previously
selected genes, with the candidate gene treated as a probability vector (Ding and Peng 2005; Mucaki et al. 2016). Minimizing
redundancy results in some subsequent selected gene(s) with
orthologous expression patterns relative to the preceding
gene(s). These may exhibit significantly lower MI, typical of
a weak radiation response. Nevertheless, higher ranked gene
features exhibit larger MI values in general. Gene rankings
by mRMR and the computed MI for each radiation gene in
each of the datasets evaluated are provided in Suppl.
Table S1.

Table 1. Characteristics of datasets analyzed.

Radiation-exposed
| [Designation] Gene expression dataset | Count: individuals/controls | Exposure Gy (time points)^c | Phenotype of samples^d | Array platform | Reference(s) |
|---|---|---|---|---|---|
| [RadLymphCL-1] GSE1725^a | 57/57 | 5 (4 h) | Lymph. CL | Af. U95 V2 | Rieger et al. (2004) |
| [RadTBI-2] GSE6874 [GPL4782]^a,e | 27/51 | 2 (6 h) | TBI | Operon V3.0.2 | Dressman et al. (2007) |
| [RadTBI-3] GSE10640 [GPL6522]^a,e | 10/75 | 2 (6 h) | TBI | Operon V4.0 | Meadows et al. (2008) |
| [RadLymphCL-4] GSE701^a | 10/1 | 3, 10 (1, 2, 6, 12, 24 h) | Lymph. CL | Af. U95A | Jen and Cheung (2003) |
| [RadLymphCL-5] GSE26835^a | 362/362^f | 10 (2, 6 h) | Lymph. CL | Af. HG-U133A 2.0 | Smirnov et al. (2012) |
| [RadBloodpost-6] GSE85570^a | 220/220 | 2 (24 h) | Blood-Prostate cancer, 2 year post-RT | Af. HT HG-U133+ | van Oorschot et al. (2017) |
| [RadBlood-7] GSE102971^a | 80/20 | 2, 5, 6, 7 (24 h) | Blood-Healthy | Ag. 4x44K v2 | Park et al. (2017) |
| [RadBloodpost-8] E-TABM-90^b | 50/50 | 2 (24 h) | Blood-Prostate cancer, 2 year post-RT | Af. HG-U133A | Svensson et al. (2006) |

Other blood-borne diseases^g
| [Designation] Gene expression dataset | Count: individuals/controls | Phenotype of samples^d | Array platform | Reference(s) |
|---|---|---|---|---|
| [BBD-Malaria] GSE117613^a | 34/12 | Malaria | Il. HT-12 V4.0 | Nallandhighal et al. (2019) |
| [BBD-Sickle] GSE35007^a | 250/61 | Sickle cell | Il. HT-12 V4.0 | Quinlan et al. (2014) |
| [BBD-Polycyt] GSE47018^a | 20/7 | Polycythemia vera | Af. HG-U133A | Spivak et al. (2014) |
| [BBD-Thromb] GSE19151^a | 70/63 | Thrombosis | Af. HG-U133A 2.0 | Lewis et al. (2011) |
| [BBD-Saureus] GSE30119^a | 99/44 | S. aureus | Il. HT-12 V3.0 | Banchereau et al. (2012) |
| [BBD-Aanemia] GSE16334^a | 21/11 | Aplastic anemia | Af. HG-U133A | Vanderwerf et al. (2009) |
| [BBD-Flu85] GSE29385^a | 71/84 | Influenza | Il. HT-12 V4.0 | NA |
| [BBD-Flu50] GSE82050^a | 24/15 | Influenza | Ag. SurePrint G3 GE v3 | Tang et al. (2017) |
| [BBD-Flu28] GSE50628^a | 10/10 | Influenza | Af. HG U133+ 2.0 | Tsuge et al. (2014) |
| [BBD-Flu21] GSE61821^a | 238/164 | Influenza | Il. HT-12 v4.0 | Hoang et al. (2014) |
| [BBD-Flu31] GSE27131^a | 7/14 | Influenza | Af. HG 1.0 ST | Berdal et al. (2011) |
| [BBD-Deng61] GSE97861^a | 27/3 | Dengue fever | RNASeq | Tian et al. (2017) |
| [BBD-Deng62] GSE97862^a | 20/24 | Dengue fever | RNASeq | Tian et al. (2017) |
| [BBD-Deng08] GSE51808^a | 28/9 | Dengue fever | Af. HT HG-U133+ | Kwissa et al. (2014) |
| [BBD-Deng78] GSE58278^a | 12/6 | Dengue fever | Il. HT-12 v4.0 | Olagnier et al. (2014) |

BBD: blood borne disease; NA: not available; CL: cell line; Lymph.: lymphoblastoid cell lines; pts: patients; RT: radiation therapy; Af.: Affymetrix; Ag.: Agilent; Il: Illumina; RNASeq: normalized expression from RNA sequencing.
^a Gene Expression Omnibus.
^b Array Express.
^c All radiated samples were exposed ex vivo (except datasets RadTBI-2 and RadTBI-3, which were patients undergoing total body irradiation (TBI)). Other controls were obtained from healthy individuals.
^d All exposed and unirradiated control samples were from blood, except from datasets RadLymphCL-1, RadLymphCL-4 and RadLymphCL-5, which were lymphoblastoid cell line cultures irradiated in vitro.
^e Study utilized multiple array platforms.
^f Samples taken 2- and 6-hours post-radiation (N = 362 each) were evaluated independently.
^g All confounder datasets are sourced from blood except for BBD-Aanemia (bone marrow cells), BBD-Deng61 (T cells) and BBD-Deng78 (in vitro infected monocyte-derived dendritic cells).
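The mRMR ranking described in this section can be illustrated with a small sketch. It assumes that expression is binarized at the per-gene median before MI is computed and that redundancy is subtracted from relevance (the difference form of mRMR). Both are simplifications chosen for illustration, and the names `mutual_information`, `binarize`, and `mrmr_rank` are hypothetical rather than taken from the authors' software.

```python
import math
from collections import Counter
from statistics import median


def mutual_information(xs, ys):
    """MI in bits between two equal-length sequences of discrete labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (count / n) * math.log2(count * n / (px[x] * py[y]))
        for (x, y), count in joint.items()
    )


def binarize(values):
    """Assumed discretization: 1 if above the gene's median, else 0."""
    cut = median(values)
    return [int(v > cut) for v in values]


def mrmr_rank(expression, exposed, n_select):
    """Rank genes by relevance to exposure minus mean redundancy."""
    states = {gene: binarize(values) for gene, values in expression.items()}
    relevance = {gene: mutual_information(s, exposed) for gene, s in states.items()}
    ranked = []
    while len(ranked) < min(n_select, len(states)):
        def score(gene):
            if not ranked:
                return relevance[gene]  # first pick: highest MI with exposure
            redundancy = sum(
                mutual_information(states[gene], states[chosen]) for chosen in ranked
            ) / len(ranked)
            return relevance[gene] - redundancy
        remaining = [gene for gene in states if gene not in ranked]
        ranked.append(max(remaining, key=score))
    return ranked, relevance
```

With two classes, MI is bounded by 1 bit, matching the range stated above. A duplicate of an already selected gene scores poorly because its redundancy term is large, which is why a lower-MI but non-redundant gene can outrank it.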
SVM-based gene signatures were derived by greedy feature selection, including forward sequential feature selection
(FSFS), backward sequential feature selection (BSFS) and
complete sequential feature selection (CSFS; Zhao et al.
2018a). Our software for biochemically inspired ML is available in a Zenodo archive (Zhao et al. 2018b). Both FSFS and
BSFS models were derived from the top 50 ranked mRMR
genes, in addition to other published radiation responsive
genes: AEN, BAX, BCL2, DDB2, FDXR, PCNA, POU2AF1,
and WNT3. SVMs were derived with a Gaussian radial basis
function kernel by iterating over box-constraint (C) and kernel-scale (σ) parameters and gene features, minimizing
either misclassification or log loss by cross-validation (Zhao
et al. 2018a; Bagchee-Clark et al. 2020). Gene signatures
were then assessed with a validation dataset and reevaluated
(by misclassification rates, log loss, Matthews correlation
coefficient, or goodness of fit). This study primarily reports
misclassification rates to simplify comparisons of results
between radiation-exposed and disease confounder datasets.
Signatures with high misclassification rates in validated radiation
datasets (>50%) have been excluded. Variation in radiation
signature composition among different source datasets can
be attributed to distinct microarray platforms, other batch
effects, and inter-individual variation in gene expression
which cannot be fully mitigated by normalization. These
contribute to MI variability, which both alters mRMR rank
and features selected during signature derivation.
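Forward sequential feature selection, one of the greedy strategies named above, can be sketched as follows. To keep the example self-contained, a leave-one-out nearest-centroid classifier stands in for the Gaussian-kernel SVM, and no C or kernel-scale search is performed. That substitution is an assumption of this sketch, as are the function names `loo_misclassification` and `forward_selection`.

```python
def loo_misclassification(X, y, features):
    """Leave-one-out error of a nearest-centroid classifier (SVM stand-in).

    X is a list of dicts mapping gene -> normalized expression; y holds labels.
    Assumes every class has at least two samples.
    """
    errors = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
        point = [X[i][f] for f in features]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        errors += predicted != y[i]
    return errors / len(X)


def forward_selection(X, y, candidates):
    """Add the gene that most lowers the error until no gene improves it."""
    selected, best_error = [], 1.0
    improved = True
    while improved:
        improved = False
        for gene in candidates:
            if gene in selected:
                continue
            error = loo_misclassification(X, y, selected + [gene])
            if error < best_error:
                best_error, best_gene, improved = error, gene, True
        if improved:
            selected.append(best_gene)
    return selected, best_error
```

Backward selection would start from the full candidate list and remove genes by the same criterion. Log loss could replace the misclassification rate if the classifier returned probabilities.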
The quality of each dataset was assessed based on the
dynamic range in responses by signature genes to radiation.
This was based on the premise that these responses among
different confounding datasets might alter expression of
some of the same radiation-responsive genes. MI between
gene expression and radiation dose is indicated in Suppl.
Table S1. Datasets RadBloodpost-6 and RadBlood-7 both
exhibit high MI with radiation exposure (maximum MI
>0.7 bits for both datasets; 77 and 115 genes with >0.2 bits
MI, respectively). By contrast, datasets RadLymphCL-5 and
RadBloodpost-8 both exhibited low MI for top ranked genes
with radiation exposure. Of the top 50 ranked genes in
dataset RadBloodpost-8, 13 genes had MI values <10% of
the MI of the top ranked gene (0.3 bits), and 856 of 860
genes in the complete dataset had MI <0.2 bits. The maximum MI for RadLymphCL-5 was 0.25 bits; 40 of 50 top
ranked genes exhibited <10% MI of this value, and the radiation response genes DDB2, PCNA, FDXR, AEN, and BAX
had unexpectedly low rankings (>100; 2 h and 6 h post-exposure) and MI <0.15 bits. The low MIs across all eligible
genes indicate that the response to radiation was nearly
random in both datasets. These datasets failed to satisfy
minimum quality criteria and were excluded from further
analyses. Radiation toxicity in dataset RadBloodpost-8 and
cell line immortalization in RadLymphCL-5 appear to compromise their radiation response.
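The dataset-level MI statistics quoted above (maximum MI, number of genes above 0.2 bits, and number of top-50 genes below 10% of the top gene's MI) can be computed with a helper like the one below. The function name `mi_quality_summary` and its parameters are hypothetical. The text does not state explicit pass/fail cutoffs, so the sketch only reports the counts.

```python
def mi_quality_summary(mi_by_rank, top_n=50, weak_fraction=0.10, strong_bits=0.2):
    """Summarize MI for one dataset.

    mi_by_rank: list of (gene, MI in bits) tuples in mRMR rank order, so the
    first entry is the top ranked (highest MI) gene.
    """
    top_mi = mi_by_rank[0][1]
    return {
        "max_mi": top_mi,
        # Genes anywhere in the dataset with MI above the 'strong' level.
        "n_strong": sum(mi > strong_bits for _, mi in mi_by_rank),
        # Top-ranked genes whose MI is below 10% of the top gene's MI.
        "n_weak_in_top": sum(
            mi < weak_fraction * top_mi for _, mi in mi_by_rank[:top_n]
        ),
    }


# Hypothetical values for illustration only.
summary = mi_quality_summary(
    [("A", 0.8), ("B", 0.05), ("C", 0.3), ("D", 0.01)], top_n=3
)
```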
Radiation gene signatures derived from expressed genes
encoding secreted factors originating from blood cells
Only genes that encoded proteins present in blood plasma
were used to derive an alternate set of gene expression signatures. The initial sets of plasma proteins from the Human
Protein Atlas ‘Human Secretome’ (http://www.proteinatlas.org/humanproteome/secretome) and the Plasma Protein
Database (http://www.plasmaproteomedatabase.org) were
cross-referenced to create a list of 1377 shared proteins. The
Genotype-Tissue Expression (GTEx) Portal (https://gtexportal.org/home/) was used to determine which genes encoding
these secreted proteins are expressed in either leukocytes or
transformed lymphoblasts at detectable levels (where
Transcripts Per Million (TPM) >1; N = 682). Expressed
genes present in radiation datasets RadTBI-2 (N = 428) and
RadTBI-3 (N = 325) were used to derive ML models of
genes encoding human plasma proteins using CSFS,
BSFS, and FSFS.
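The filtering cascade described above amounts to a set intersection followed by an expression cutoff. The sketch below assumes the three resources have already been reduced to gene symbols and TPM tables. The function name `plasma_signature_candidates`, the argument layout, and the toy gene names are hypothetical.

```python
def plasma_signature_candidates(secretome, plasma_db, tpm_by_tissue,
                                dataset_genes, min_tpm=1.0):
    """Return genes eligible for plasma-protein signatures in one dataset.

    secretome, plasma_db: iterables of gene symbols from the two protein lists.
    tpm_by_tissue: dict tissue -> {gene: TPM}, e.g. leukocytes and lymphoblasts.
    dataset_genes: gene symbols measured in the radiation dataset.
    """
    # Proteins shared by both plasma/secretome resources.
    shared = set(secretome) & set(plasma_db)
    # Keep genes detectably expressed (TPM > 1) in at least one tissue.
    expressed = {
        gene for gene in shared
        if any(tpm.get(gene, 0.0) > min_tpm for tpm in tpm_by_tissue.values())
    }
    # Restrict to genes actually present in the radiation dataset.
    return sorted(expressed & set(dataset_genes))


# Toy identifiers for illustration only.
candidates = plasma_signature_candidates(
    secretome={"G1", "G2", "G3"},
    plasma_db={"G2", "G3", "G4"},
    tpm_by_tissue={"leukocytes": {"G2": 5.0, "G3": 0.2},
                   "lymphoblasts": {"G3": 0.9}},
    dataset_genes={"G2", "G3"},
)
```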
Evaluating specificity of radiation gene signatures with
expression of genes in confounding
hematological conditions
Radiation gene signatures derived in this study (M5–M20)
and in Zhao et al. (2018a) (M1–M4, KM3–KM7) were used
to evaluate datasets consisting of independent expression
measurements of the same gene features in samples derived
from hematological disorders and controls (Table 1).
Traditional ML validation was performed using available
software (‘regularValidation_multiclassSVM.m’; Zhao et al.
2018b). This software performs quantile normalization to
the features of the training and test set together (making the
distributions of the two datasets statistically identical) before
fitting a model to the training data which is then used to
predict exposure based on the normalized expression of the
test set. We evaluated how often these unirradiated individuals were misclassified as radiation-exposed by these models.
Significant differences between FP blood-borne cases and
controls for the same signature were determined with the
Mantel–Haenszel chi square and Mid-P exact tests, using a
threshold of p = .05. Assessing unirradiated expression datasets featuring patients with hematological or infectious
conditions using radiation signatures will not yield any true
positive (TP) or false negative (FN) cases. This is because
positive (TP) or false negative (FN) cases. This is because
our experimental design did not merge radiation and confounder datasets by joint normalization of expression values.
Normalization often does not adequately account for variations due to batch effects or between different microarray
platforms. Furthermore, we cannot exclude that irradiated
samples from healthy individuals may mask underlying
blood-based phenotypes that were not documented in published studies. For these reasons, radiation and hematological disorder datasets were evaluated separately with the
same signatures for FP and TN levels only, rather than by
positive or negative predictive values. We determined the
impact of genes in each signature on misclassification by
iteratively removing individual gene features, rederiving signatures with these genes, and redetermining misclassification
rates using expression data from hematological or infectious
diseases (Zhao et al. 2018a; Mucaki et al. 2019). Expression
levels of radiation responsive genes in correctly vs. incorrectly
classified samples from confounder datasets were contrasted using
violin plots. These display weighted distributions of the normalized gene expression in each confounder dataset
for samples that the radiation gene signatures classified either correctly as unirradiated (true negative (TN)) or
incorrectly as irradiated (FP)
(created in R [i386 v4.0.3] with ggplot2). Counts
of confounder sub-phenotypes were stratified using Sankey
diagrams (SankeyMATIC; http://sankeymatic.com/build/)
which display the distribution of misclassified samples by
phenotype. This analysis delineates FP and TN predictions
(at the individual level) of groups of diseased patients and
controls from these datasets according to predictions of the
designated, specific radiation gene signature.
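The joint normalization step and the FP tally described in this section can be sketched as follows. This is a generic quantile normalization across pooled training and test samples, written under the assumption that it approximates what the cited software does. The names `quantile_normalize` and `false_positive_rate` are hypothetical, and tied values are not averaged.

```python
def quantile_normalize(samples):
    """Give every sample the same expression distribution.

    samples: list of equal-length expression vectors, training and test
    samples pooled together. Each value is replaced by the mean, across all
    samples, of the values holding the same within-sample rank.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_genes)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda g: s[g])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized


def false_positive_rate(predictions):
    """Fraction of unirradiated samples predicted as radiation-exposed.

    predictions: booleans, True = classified as exposed. Every sample is
    unirradiated, so each True is an FP and each False a TN.
    """
    predictions = list(predictions)
    return sum(predictions) / len(predictions)


# Hypothetical training and test vectors (three genes each).
pooled = quantile_normalize([[1.0, 5.0, 3.0], [10.0, 30.0, 20.0]])
```

After this step a model is fit on the normalized training samples only and applied to the normalized test samples. Comparing `false_positive_rate` for patients and for controls from the same confounder dataset gives the contrast that the significance tests evaluate.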
Results
Initial evaluation of candidate genes in radiation gene
expression datasets for machine learning
We derived new gene expression signatures by leave-one-out
and k-fold cross-validation from microarray data based on
more recent comprehensive gene datasets (RadBloodpost-6
and RadBlood-7), in addition to those we previously reported
(Zhao et al. 2018a). Only some of the 1011 curated genes
were present on these microarray platforms, including 864
genes in RadBloodpost-6 and 971 genes in RadBlood-7.
After normalization, gene rankings by mRMR between datasets RadBloodpost-6 and RadBlood-7 were similar (Suppl.
Table S1). In RadBloodpost-6, FDXR was ranked first,
while AEN was top ranked in RadBlood-7 (FDXR was
ranked 38th). DDB2 was top ranked in datasets RadTBI-2
and RadTBI-3 (Zhao et al. 2018a), datasets which lacked
expression for FDXR and AEN, respectively. Radiation-response genes among the top 50 ranked genes present in
all four datasets included BAX, CCNG1, CDKN1A, DDB2,
GADD45A, PPM1D and TRIM22.
ERCC1 was selected by mRMR to be the second-ranked
gene in dataset RadBlood-7, even though its MI was 31-fold
lower than the top ranked gene, AEN (Suppl. Table S1). MI
of the second-ranked genes in dataset RadTBI-2 (RAD17)
was 7-fold lower than the first (DDB2), while dataset
RadTBI-3 (CD8A) showed a fourfold difference. Six of the
top 50 genes in RadBlood-7 exhibited <10% of the MI of
AEN (three genes for RadTBI-2; none of the top 50 in
RadTBI-3 and RadBloodpost-6 were <10% of the top
ranked gene). Selection of low MI genes by ML feature
selection likely reduces accuracy of gene signatures during
validation steps. In the future, signature derivation will set a
minimum MI threshold for ranking genes by mRMR.

Table 2. Traditional and k-fold validated radiation gene expression signatures M1–M4 and KM1–KM7.

(a) Derived from radiation dataset RadLymphCL-1 (GSE1725)
| Signature | Genes | FS algorithm | Accuracy^a |
|---|---|---|---|
| KM1 | GADD45A DDB2 | FSFS | 93% |
| KM2 | PPM1D DDB2 CCNF CDKN1A PCNA GADD45A PRKAB1 TOB1 TNFRSF10B MYC CCNB2 PTP4A1 BAX CCNA2 ATF3 LIG1 CCNG1 FHL2 PPP1R2 MBD4 RASGRP2 UBC NINJ1 TRIM22 IL2RB TP53BP1 PTPRCAP EEF1D PTPRE RAD23B EIF2B4 STX11 PTPN6 STK10 PSMD1 BTG3 MLH1 RNPEP HSPD1 UNG PTPRC PTPRA BCL2 GSS SH3BP5 TPP2 IDH3B CCNH STK11 EIF4EBP2 HSPA4 FADS2 RPA3 GZMK ANXA4 ICAM1 PPID LMO2 PPIE NUDT1 FUS POLR2A LY9 RPA1 PTS TNFRSF4 RPA2 PSMD8 GCDH MAN2C1 PTPN2 RUVBL1 ATP5H GK CD79B MAP4K4 POLE3 PRKCH AKT2 MOAP1 CCNG2 ALDOA SRD5A1 HAT1 XRCC1 EIF2S3 RAD1 UBE2A ZFP36L1 CD8A TALDO1 GPX4 SSBP2 ERCC3 ATP5O PEPD EIF4G2 ACO2 HEXB UBE3A ARPC1A PSMD10 PRCP PPIB ZNF337 CETN2 RPL29 | CSFS | 93% |

(b) Derived from radiation dataset RadTBI-3 (GSE10640[GPL6522])
| Signature | Genes | FS algorithm | Accuracy^a |
|---|---|---|---|
| M1 | DDB2 HSPD1 MAP4K4 GTF3A PCNA MDH2 | FSFS | 86% |
| M2 | DDB2 GTF3A TNFRSF10B | FSFS | 80% |
| KM3 | DDB2 RAD17 PSMD9 LY9 PPIH PCNA MDH2 MOAP1 TP53BP1 PPM1D ATP5G1 BCL2L2 ENO2 PTP4A1 PSMD8 LIG1 FDPS OGDH CCNG1 PSMD1 | BSFS | 95% |
| KM4 | DDB2 HSPD1 ICAM1 PTP4A1 GTF3A LY9 | FSFS | 92% |
| KM5 | RAD17 TNFRSF10B PSMD9 LY9 PPIH PCNA ZNF337 MDH2 TP53BP1 PPM1D ZFP36L1 ATP5G1 ALDOA BCL2L2 ENO2 GADD45A PTP4A1 PSMD8 LIG1 ATP5O FDPS OGDH PSMD1 | BSFS | 95% |

(c) Derived from radiation dataset RadTBI-2 (GSE6874[GPL4782])
| Signature | Genes | FS algorithm | Accuracy^a |
|---|---|---|---|
| M3 | DDB2 CD8A TALDO1 PCNA EIF4G2 LCN2 CDKN1A PRKCH ENO1 PPM1D | BSFS | 88% |
| M4 | DDB2 CD8A TALDO1 PCNA LCN2 CDKN1A PRKCH ENO1 GTF3A IL2RB NINJ1 BAX TRIM22 PRKDC GADD45A MOAP1 ARPC1B LY9 LMO2 STX11 TPP2 CCNG1 GABARAP BCL2 GSS FTH1 | BSFS | 92% |
| KM6 | DDB2 PRKDC PRKCH IGJ | FSFS | 98% |
| KM7 | DDB2 PRKDC TPP2 PTPRE GADD45A | FSFS | 98% |

FS: feature selection metrics.
^a Performance metrics for these radiation gene signatures were previously reported in Zhao et al. (2018a) Table 2 (k-fold validated signatures (KM1–KM7)) and Table 3 (traditionally validated signatures (M1–M4)).
The overall levels of MI for top ranked genes in datasets
RadBloodpost-6 (0.72 bits for AEN) and RadBlood-7
(0.82 bits for FDXR) were comparable (Suppl. Table S1). In
RadBlood-7, the genes with the highest MI were AEN,
DDB2, FDXR, PCNA and TNFRSF10B (closely followed by
BAX). Although each was found in the top 50 ranked genes,
some rankings were decreased to minimize redundant information (FDXR and AEN are ranked #38 and #41 in the
RadBlood-7 dataset, respectively). MI for the top ranked
genes in datasets RadTBI-2 and RadTBI-3 were lower by
comparison (0.31 and 0.47 bits for DDB2, respectively); the
depressed maximum MI values in these datasets may, in
part, be related to reduced numbers of eligible genes on
these microarray platforms.
Radiation gene signature performance in blood-borne diseases
The specificity of previously derived radiation signatures
selected after k-fold validation (KM1–KM7) and traditional
validation (M1–M4; Zhao et al. 2018a) was assessed with
normalized expression data of patients with unrelated hematological conditions, rather than evaluating unirradiated
healthy controls. Signatures M1 and M2 (derived from dataset RadTBI-3; Table 2) and M3 and M4 (derived from
RadTBI-2; Table 2) were assessed with multiple expression
datasets from influenza A (BBD-Flu##) and dengue fever
(BBD-Deng##) blood infections (Rogan et al. 2021; Table 1).
FPs for radiation exposure were defined as instances where
the misclassification rates of individuals with the disease
diagnosis exceeded normal controls. A clear bias toward FP
predictions of infected samples relative to controls was evident with all of these radiation gene signatures (Rogan et al.
2020; extended data – Section 1 Table 7). Dissection of the
ML features responsible implicated 10 genes contributing to
misclassification, including BCL2, DDB2 and PCNA. These
other conditions also confound predictions by radiation signatures derived by k-fold validation (KM1–KM7; Table 2).
High levels of FP misclassification of viral infections were
also evident with these signatures (Suppl. Table S2A). KM6
and KM7 (derived from dataset RadTBI-2) misclassify all
influenza and most dengue fever (BBD-Deng61, BBD-Deng08 and BBD-Deng78) datasets of patients at higher
rates than uninfected controls. KM3–KM5 exhibited low FP
rates in influenza relative to other models, but dengue virus
datasets BBD-Deng62, BBD-Deng08 and BBD-Deng78
exhibited higher FP rates in infected samples relative to
uninfected controls (Suppl. Table S2A). Interestingly, KM5
is the only gene signature in which DDB2 is not present,
and this gene contributes to high FP rates (Rogan et al.
2021). KM1 and KM2, which were derived from a third
radiation dataset (RadLymphCL-1), often misclassified virus
infected samples relative to controls (KM1 only: BBD-Deng61; KM2 only: BBD-Flu50, BBD-Flu31, BBD-Deng62
and BBD-Flu28; both KM1 and KM2: BBD-Deng08, BBD-Deng78, and BBD-Flu21). However, in some datasets, these
models also demonstrated high FP rates in controls.
Expression levels in patients with latent influenza A and
dengue fever stabilize at levels similar to uninfected controls
Figure 2. Performance of traditionally validated radiation signatures on confounders stratified by sub-phenotypes. Sankey diagrams delineate what fraction of disease patients and controls were properly (TN) and improperly classified (FP) by a radiation gene signature. (A) The radiation signature M4 incorrectly classified 53%
of dengue-infected patients as irradiated; however, all convalescent patients were properly classified (0% FP). (B) Similarly, the FP rate of M1 decreased considerably
after patient recovery (27% FP rate (N = 19) against samples <3 days after symptoms; 3% (N = 2) after 2–5 weeks). (C) The FP rate of M3 was higher for severe malarial anemia patients versus those with cerebral malaria, suggesting that the differential expression caused by the two infection types may diverge in a way
that is measurable by M3. (D) Conversely, the FP rate of thrombosis patients by M4 was not influenced by whether the disease was recurrent.
either after convalescence or at the end stage of infection. For
example, M4 exhibited a 54% FP rate in dengue-infected
individuals 2–9 days after onset of symptoms (BBD-Deng08), but samples were correctly classified as unirradiated >4 weeks after initial diagnosis (Figure 2(A)). The
influenza gene expression dataset BBD-Flu85 longitudinally
sampled infected patients after initial symptoms at
<72 hours (T1), 3–7 days (T2), and 2–5 weeks (T3). FPs
were significantly decreased at T3 for nearly all models
tested (19 infected samples misclassified by M1 at T1 was
reduced to two cases at T3; Figure 2(B)). These results
clearly implicate these viral infections as the source of the
transcriptional changes that parallel the effects of radiation
on these signature genes.
Specificity of radiation signatures using datasets of
other hematological conditions
We investigated whether radiation gene signature accuracy
was compromised by the presence of other blood borne
infections and noninfectious, nonmalignant hematological
pathologies with publicly available expression data on
patients with adequate sample sets (>10 individuals with
corresponding control samples except for aplastic anemia).
These included thromboembolism, S. aureus bacteremia,
malaria, sickle cell disease, polycythemia vera, and aplastic
anemia. We then determined recall levels for signatures
M1–M4 and KM3–KM7 evaluated with these datasets, with
the expectation that these models would predict all potential
confounders as unirradiated (Suppl. Table S2B).
Each radiation gene signature was confounded by some,
but not all, blood-borne disorders and infections. S. aureus
infected samples were frequently misclassified as FPs by all
signatures, except KM7 (Figure 3). High FP rates were
observed for: M1 and KM5 – sickle cell and S. aureus; M2 –
S. aureus; M3 and M4 – malaria, sickle cell, thromboembolism, polycythemia vera and S. aureus; KM3 and KM4 – malaria, sickle cell and S. aureus; KM6 – thromboembolism,
polycythemia vera, S. aureus; KM7 – malaria, thromboembolism and polycythemia vera. We compared differences
between FPs in patients and controls for each dataset using
the Mantel–Haenszel chi square and mid-P exact statistical
tests (Figure 3 and Suppl. Table S2B). Predictions of radiation
exposure by model M4 were significantly confounded
by all conditions tested (polycythemia vera was
only significant with the mid-P exact test), while the FP rate
of KM6 was significantly higher in patients with either
thrombosis or S. aureus infection (p values indicated in
Suppl. Table S2B). The malaria dataset stratified patients
with either cerebral malaria or severe malarial anemia
(Nallandhighal et al. 2019). The severe malarial anemia subset contains the majority of the FPs (Figure 2(C)). The
thromboembolism dataset (Lewis et al. 2011), which categorized patient diagnoses as either single or recurrent
thromboembolism, exhibited similar FP rates for both subsets (Figure 2(D)). Predictions by M3, M4, KM6 and KM7
were confounded by transcriptional changes resulting from
different blood-borne conditions, while M2 and KM5 are
the least influenced by these conditions. Aplastic anemia did
not increase FP rates compared to controls for any of the
Figure 3. Misclassification rates of radiation models in unirradiated blood-borne disorders. Radiation gene signatures M1–M4 and KM3–KM7 performed well when
predicting radiation exposure (80% overall accuracy; Table 2). However, many of these models falsely predicted individuals with blood-borne disorders (thrombosis (A) and sickle cell disease (C)) and infectious diseases (S. aureus (B) and malaria (D)) as irradiated (%FP provided for individuals with the indicated disease
(dark gray; top value), and controls (light gray; bottom value)). Asterisks indicate when the differences between the FP counts in controls and diseased individuals
were significant with both Mantel–Haenszel chi square and mid-P exact tests (one-tailed; p<.05). In general, the FP rate was high for all traditionally validated
(M1–M4) and most k-fold validated models (KM4, KM6 and KM7). Models KM3 and KM5 had a low FP rate across all datasets tested.
signatures, consistent with our previous findings (Rogan
et al. 2021).
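The two tests named above can be written out for a single 2×2 table of FP and TN counts in patients and controls. This is a minimal sketch using only the Python standard library, not the software used for the reported p values. The Mantel–Haenszel statistic is taken in its single-stratum form, (N − 1)(ad − bc)² divided by the product of the four margins, and the mid-P value counts half the probability of the observed table plus all more extreme tables. The function names and example counts are hypothetical, and the chi-square p value returned here is two-sided.

```python
import math

def mantel_haenszel_chi2(a, b, c, d):
    """Single 2x2 table: a, b = FP, TN in patients; c, d = FP, TN in controls.
    Returns (chi-square statistic, two-sided p value with 1 degree of freedom)."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if min(r1, r2, c1, c2) == 0:
        return 0.0, 1.0
    chi2 = (n - 1) * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    # Survival function of chi-square with 1 df equals erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def mid_p_one_tailed(a, b, c, d):
    """One-tailed mid-P exact test that patients have more FPs than controls."""
    r1, c1, n = a + b, a + c, a + b + c + d

    def hyper(k):
        # Hypergeometric probability of k FPs among patients, margins fixed.
        return math.comb(c1, k) * math.comb(n - c1, r1 - k) / math.comb(n, r1)

    upper = min(r1, c1)
    return 0.5 * hyper(a) + sum(hyper(k) for k in range(a + 1, upper + 1))

# Hypothetical counts: 8 of 10 patients and 2 of 10 controls called irradiated.
chi2, p_two_sided = mantel_haenszel_chi2(8, 2, 2, 8)
mid_p = mid_p_one_tailed(8, 2, 2, 8)
```

A dataset would be flagged as a significant confounder in this sketch when both p values fall below the chosen cutoff.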
Predictions of radiation exposure by signatures M4, KM6
and KM7 were confounded by multiple viral, blood-borne
infections and noninfectious blood disorders (Suppl. Table
S2A and S2B). The genes responsible for the high sensitivity
of these signatures were evident from comparison of expression
levels in correctly (TN) vs. incorrectly (FP) classified samples.
The normalized gene expression distributions of TN and FP
samples in malaria, S. aureus, sickle cell disease, thromboembolism, influenza A and dengue fever were visualized as
violin plots (Figure 4 and Mucaki and Rogan 2021). The
shared distributions in gene expression in FP confounders
and radiation-exposed individuals can be observed without
the need for advanced statistical measurements; however,
differences between TN and FP expression levels for the
same gene were frequently also statistically significant. For
example, expression of BCL2 in sickle cell disease, and S.
aureus and malaria infected samples was significantly lower
in FP samples relative to TNs with M4 (p<.05 with
Student’s t-test, assuming two-tailed distribution and equal
variance; Figure 4(A) (left)), similar to the effect of radiation
exposure on expression of this gene (Figure 4(A) (right)).
These same FP individuals have significantly higher DDB2
expression in both S. aureus and sickle cell disease (Figure
4(B)). Increased DDB2 expression was also observed for FPs
using KM6 and KM7. For both genes, differences in expression in TN and FP samples were congruent with the
changes observed in the radiation exposure datasets. Genes
that may also contribute to misclassification include
GADD45A in M4 (higher expression in diseased individuals
vs. controls and induced by radiation exposure), and
PRKCH and PRKDC, respectively, in KM6 and KM7
(decreased expression in FPs and in response to radiation).
BAX, which is induced by radiation, is similarly expressed in
FP and TN samples, and probably does not contribute to
misclassification by M4.
To determine the extent to which each gene contributes
to the FP rates in each signature, gene features were
removed individually, the radiation signature was rederived
by biochemically inspired ML, and misclassification rates
were reassessed for each confounding condition (Suppl.
Tables S3A (M1–M4) and S3B (KM3–KM7)). Removing any
gene from gene signatures M1, M3, M4, KM3, KM5 and
KM7 did not significantly alter the observed misclassification rates. Elimination of PRKDC (DNA double stranded
break repair and recombination) and IL2RB (innate immunity/inflammation) reduced FP rates in thromboembolism
patients by 10% and 5% for M4, respectively (Figure 5(A)),
which still exceeded the FP rates of controls. Removal of
these genes did not improve the FP rate of M4 in S. aureus-infected samples (Figure 5(B)). Thus, no single gene feature
dominated the predictions by these signatures or could
account for the misclassified samples. Removal of DDB2,
GTF3A or HSPD1 from KM4 significantly decreased its FP
rate in the malaria dataset (18% to 0–3%; Suppl. Table S3B).
Similarly, removal of DDB2 from M2 and KM6 led to the
complete elimination of FPs in both patients and controls.
However, the removal of DDB2 from these models was previously shown to severely reduce the TP rate in irradiated
Figure 4. DDB2 and BCL2 expression in hematological disorders with radiation-exposed control datasets. Normalized distribution of gene expression of confounder
datasets (VTE: venous thromboembolism; SAu: S. aureus; Sic: sickle cell; and Mal: malaria) for the genes (A) DDB2 and (B) BCL2 is presented as violin plots, where
the expression of individuals with these conditions is divided into those predicted as irradiated (FP; left) or unirradiated (TN; right) by signature M4. Control expression of radiation-exposed (Irr.) and unirradiated (Non.) individuals is indicated by distributions labeled with R2 (dataset RadTBI-2) and R3 (dataset RadTBI-3) on the
right side of each panel. All expression differences between FP and TN samples (predicted with signature M4) found to be significant by Student’s t-test (assuming
two-tailed distribution and equal variance) are indicated by brackets above the corresponding pair of predictions.
samples (Zhao et al. 2018a); these genes cannot be eliminated without affecting the sensitivity of these signatures to
accurately identify radiation exposed samples.
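The feature removal analysis described above follows a simple loop: drop one gene, rederive the model, and re-measure the FP rate on unirradiated confounder samples. The sketch below shows that loop with a nearest-centroid classifier as a stand-in, which is an assumption made for brevity and is not the ML model used to derive the signatures. The gene names are borrowed from the text only as labels; all expression values are invented.

```python
def train_centroids(samples, labels, genes):
    """Stand-in classifier (NOT the study's model): per-class mean expression."""
    centroids = {}
    for cls in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        centroids[cls] = {g: sum(r[g] for r in rows) / len(rows) for g in genes}
    return centroids

def predict(centroids, sample, genes):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(cls):
        return sum((sample[g] - centroids[cls][g]) ** 2 for g in genes)
    return min(sorted(centroids), key=dist)

def fp_rate(centroids, unirradiated, genes):
    """Fraction of unirradiated samples called irradiated (class 1)."""
    calls = [predict(centroids, s, genes) for s in unirradiated]
    return sum(1 for c in calls if c == 1) / len(calls)

def leave_one_gene_out(train, labels, confounders, genes):
    """FP rate on confounder samples for the full model and after removing each gene."""
    results = {"(full)": fp_rate(train_centroids(train, labels, genes),
                                 confounders, genes)}
    for g in genes:
        reduced = [x for x in genes if x != g]
        model = train_centroids(train, labels, reduced)
        results[g] = fp_rate(model, confounders, reduced)
    return results

# Invented values: class 0 = unirradiated, class 1 = irradiated.
genes = ["DDB2", "BAX"]
train = [{"DDB2": 1.0, "BAX": 1.0}, {"DDB2": 1.2, "BAX": 0.8},
         {"DDB2": 3.0, "BAX": 3.0}, {"DDB2": 2.8, "BAX": 3.2}]
labels = [0, 0, 1, 1]
# Unirradiated patients; the first has disease-elevated DDB2 only.
confounders = [{"DDB2": 3.5, "BAX": 1.2}, {"DDB2": 1.1, "BAX": 0.9}]
results = leave_one_gene_out(train, labels, confounders, genes)
```

In this toy case, removing DDB2 eliminates the false positive while removing BAX does not. A complete analysis would also re-measure the TP rate on irradiated samples after each removal, since that is the cost noted above for DDB2.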
The contributions of individual signature genes can be
assessed by evaluating their impact on overall model predictions for different patients. Expression changes were incrementally introduced to computationally determine the
expression level required to change the outcome of the ML
model (i.e. the inflection point of the prediction that distinguishes exposed from unirradiated samples). The threshold
is visualized in the context of the expression value in the
individual superimposed over a histogram of the distribution
of expression for all confounders in the dataset. Individual
expression values close to this threshold can indicate lower
confidence in either the radiation exposure prediction or of
misclassification by the model. Expression levels and thresholds of DDB2, IL2RB, PCNA and PRKDC for three individuals with thromboembolism (BBD-Thromb) predicted as
irradiated by M4 (GSM474819, GSM474822, and
GSM474828) are indicated in Suppl. Figure S1. Reduction of
DDB2 expression corrected misclassification for all patients,
as did decreasing PCNA expression in GSM474822 and
GSM474828. Increasing IL2RB and PRKDC expression in
these two patients also corrected their misclassification.
These results correspond to the effects of radiation on the
expression of these genes in the RadTBI-2 and RadTBI-3
datasets, e.g. induction of DDB2 and PCNA, repression of
IL2RB and PRKDC (Mucaki and Rogan 2021). The expression changes required for DDB2, PCNA and PRKDC in
these patients were nominal relative to the dynamic range of
the entire dataset but were sufficient to alter predictions of
the signatures. Conversely, changes in expression of PCNA,
IL2RB or PRKDC were unable to modify the prediction of
M4 in GSM474819. Only a large decrease in DDB2 expression to levels below those of nearly all other thromboembolism patients was able to switch the classification of this
individual. This reinforces previous observations about the
strong impact of DDB2 expression levels on prediction
accuracy (Zhao et al. 2018a). Nevertheless, the combined
expression of most of the genes which constitute the signature determines the classification result for each sample.
Incorrect classifications where expression values are close to
the model’s predictive inflection point are relevant when
assessing misclassification accuracy. Generally, expression
levels of most samples analyzed deviated significantly from
these thresholds, leading to robust classifications by the
M4 model.
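The incremental perturbation used to locate the inflection point can be sketched as a one-dimensional scan. The example below assumes only that the trained signature is available as a function returning a class label; the linear rule used here, its weights, and the expression values are invented for illustration and do not correspond to M4.

```python
def find_flip_threshold(predict, sample, gene, lo, hi, steps=200):
    """Scan `gene` from its observed value toward lo and toward hi. Return the
    nearest scanned value at which predict() changes, or None if it never does."""
    original = predict(sample)
    observed = sample[gene]
    best = None
    for bound in (lo, hi):
        for i in range(1, steps + 1):
            value = observed + (bound - observed) * i / steps
            if predict(dict(sample, **{gene: value})) != original:
                if best is None or abs(value - observed) < abs(best - observed):
                    best = value
                break  # first flip in this direction is the nearest one
    return best

def toy_predict(sample):
    # Invented linear rule standing in for a trained signature:
    # 1 = predicted irradiated, 0 = predicted unirradiated.
    return 1 if sample["DDB2"] - 0.5 * sample["IL2RB"] > 1.0 else 0

# Hypothetical unirradiated patient who is misclassified as irradiated.
patient = {"DDB2": 2.0, "IL2RB": 1.0, "PCNA": 1.0}
ddb2_flip = find_flip_threshold(toy_predict, patient, "DDB2", 0.0, 4.0)
il2rb_flip = find_flip_threshold(toy_predict, patient, "IL2RB", 0.0, 4.0)
pcna_flip = find_flip_threshold(toy_predict, patient, "PCNA", 0.0, 4.0)
```

The distance between the observed value and the returned threshold, relative to the spread of that gene across the dataset, gives a rough sense of how robust the individual call is. A result of None corresponds to a gene whose expression alone cannot change the prediction for that individual.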
Misclassification of confounders with radiation gene
signatures derived in this study
FSFS- and BSFS-based radiation gene signatures were
derived from the top 50 ranked genes of datasets
RadBloodpost-6 and RadBlood-7. RadBlood-7 contained sets
of 20 samples, each irradiated at different absorbed energy
levels (0 vs. 2, 5, 6, and 7 Gy). Different ML models were
derived either utilizing the full dataset or based on a combination of 2 and 5 Gy samples. The models derived from
either subset of RadBlood-7 also exhibited very low misclassification (0–1 samples) and log-loss (<0.01). Common
genes selected from signatures derived from RadBlood-7
included AEN, BAX, TNFRSF10B, RPS27L, ZMAT3 and
BCL2 (Table 3(a) and Suppl. Table S4A). Genes selected in
RadBloodpost-6-based signatures included BAX, FDXR,
Figure 5. Multiple genes contribute to misclassification of confounding datasets. Accuracy of M4 (Zhao et al. 2018a (Table 3(b))) was significantly influenced by
hematological confounders such as thrombosis (top) and S. aureus infection (bottom). M4 misclassified diseased individuals (circles) far more often than controls
(squares). Feature removal analysis of M4 determines if a particular gene was contributing to the %FP rate by observing how accuracy changes when a gene is
removed. While M4 accuracy improved with the removal of PRKDC, IL2RB and LCN2, no individual gene restored misclassification back to control levels, suggesting
that multiple genes are confounded by these diseases.
Table 3. Radiation gene expression signatures derived from RadBloodpost-6 and RadBlood-7 radiation datasets.
Columns Thrombosis through Malaria give the fraction of FP by confounder (disease/controls).

Signature | Genes | FS. algorithm | FS. misclass | FS. log loss | Thrombosis | S. aureus | Sickle cell | Malaria
(a) Derived from radiation dataset RadBlood-7 (GSE102971)
M5 | AEN BCL2 | FSFS^a | – | 8.1E–15 | 0.57/0.40^b | 0.48/0.41 | 0.49/0.43 | 0.41/0.08^b
M6 | RPS27L ZMAT3 | FSFS | – | 5.2E–15 | 0.78/0.69 | 0.49/0.46 | 0.50/0.43 | 0.71/0.25^b
M7 | AEN ERCC1 BAX | CSFS | 0% | – | 0.60/0.83^b | 0.71/0.66 | 0.66/0.72 | 0.68/0.42
M8 | AEN TNFRSF10B | FSFS | – | 5.2E–15 | 0.00/0.00 | 0.00/0.00 | 0.56/0.26^b | 1.00/1.00
(b) Derived from radiation dataset RadBloodpost-6 (GSE85570)
M9 | BAX FDXR | FSFS | 0% | – | 0.64/0.27^b | 0.46/0.64^b | 0.52/0.33^b | 0.44/0.75^b
M10 | BAX FDXR XPC | FSFS | 0% | – | 0.80/0.46^b | 0.84/0.65^b | 0.67/0.69 | 0.68/1.00^b
M11 | BAX DDB2 | FSFS | 0% | – | 0.34/0.76^b | 0.52/0.48 | 0.48/0.49 | 0.59/0.17^b
M12 | BAX DDB2 SLC7A6 | FSFS | 0% | – | 0.51/0.31^b | 0.40/0.23^b | 0.41/0.47 | 0.44/0.20
M13 | RPS27L DDB2 ARL6IP1 TRIM32 | FSFS | 0% | – | 0.77/0.40^b | 0.63/0.55 | 0.54/0.67^b | 0.68/0.25^b

FS: feature selection metrics (by leave-one-out cross-validation).
^a Gene signatures derived using 0 Gy, 2 Gy and 5 Gy samples only (excludes 6 Gy and 7 Gy samples from RadBlood-7).
^b Difference in FPs between controls and test subjects significant by the Mantel–Haenszel chi square and mid-P exact tests (p < .05); additional models can be
found in Suppl. Tables S4A and S4B.
XPC, DDB2 and TRIM32 (Table 3(b) and Suppl. Table S4B).
All signatures from this dataset exhibited low misclassification rates (<0.5% by cross-validation).
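For reference, the two feature selection metrics reported in Table 3 can be computed from predicted class probabilities as follows. This is a generic sketch of binary log-loss and misclassification rate, not the implementation used in this study; the clipping constant and the 0.5 probability cutoff are assumptions, and the example probabilities are invented.

```python
import math

def log_loss(y_true, p_irradiated, eps=1e-15):
    """Mean binary cross-entropy; p_irradiated holds predicted P(class 1)."""
    total = 0.0
    for y, p in zip(y_true, p_irradiated):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def misclassification_rate(y_true, p_irradiated, cutoff=0.5):
    """Fraction of samples whose thresholded call disagrees with the label."""
    wrong = sum(1 for y, p in zip(y_true, p_irradiated)
                if (p >= cutoff) != bool(y))
    return wrong / len(y_true)

# Invented example: 1 = irradiated, 0 = unirradiated.
y = [0, 1, 1, 0]
p = [0.1, 0.9, 0.8, 0.3]
loss = log_loss(y, p)
errors = misclassification_rate(y, p)
```

Log-loss penalizes low-confidence correct calls as well as errors, so two signatures with identical misclassification rates can still differ in log-loss.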
The radiation gene signatures with the lowest misclassification rates from these datasets were evaluated against the
blood-borne disease confounder datasets that compromised
the accuracies of the M1–M4 and KM3–KM7 signatures
(Zhao et al. 2018a). Misclassification rates were estimated
using confounder datasets containing the largest numbers of
samples, including thromboembolism, S. aureus infection,
sickle cell disease and malaria. The signature designated M5
(consisting of AEN and BCL2; Table 3(a)) showed a significantly elevated FP rate over controls in blood samples from
individuals with thromboembolism (18%) and malaria infection (33%; Suppl. Table S4A). Misclassification by M5 was
increased by 6% in sickle cell disease, which also exhibited a
Table 4. Radiation gene signatures including only genes encoding secreted factors derived from RadTBI-2 and RadTBI-3 datasets.

Signature | Genes | FS. algorithm | Validation misclassification, k-fold^a | Validation misclassification, traditional
(a) Derived from radiation dataset RadTBI-2 (GSE6874[GPL4782]) and validated on RadTBI-3 (GSE10640[GPL6522])
SM1 | PDE7A FBXW7 CLCF1 ALB IDUA USP3 SLPI COASY MFAP4 LTBP1 VPS37B VEGFA IRAK3 | CSFS | 0.12 | 0.25
SM2 | PDE7A FBXW7 CLCF1 ALB IDUA USP3 SLPI COASY MFAP4 LTBP1 VPS37B VEGFA IRAK3 MZB1 DHH GRN AEBP1 CNPY3 NUCB1 RDH11 CXCL3 POFUT1 CST1 ARCN1 PLA2G12A ERAP2 GOLM1 B3GAT3 ADAMTS9 FKBP9 ALDH9A1 LY86 HARS2 PRSS21 RETN C1GALT1 MGAT2 FUCA1 TTC19 MANF LUM GALNT15 APOM NME1 ATMIN GPX4 POLL LY6H SMARCA2 | BSFS | 0.12 | 0.27
(b) Derived from radiation dataset RadTBI-3 (GSE10640[GPL6522]) and validated on RadTBI-2 (GSE6874[GPL4782])
SM3 | TRIM24 TOR1A GRN HP RBP4 PFN1 FN1 | FSFS | 0.32 | 0.49
SM4 | XCL1 CDC40 PTGS2 DHX8 NENF PTX3 WNT1 CTSW TINF2 AOAH VPS51 TOR1A HINT2 CRTAP SUCLG1 TF EDEM2 LAMA5 AGPS TFPI WFDC2 SRGN SIL1 PPOX AMY2A NUBPL GARS LRPAP1 VPS37B PNP C3orf58 HP SPOCK2 NME1 GRN TRIM24 MRPL34 SRP14 THOC3 RNASE6 RBP4 MSRB2 RNASET2 TGFBI PRDX4 GLA GLB1 PFN1 GDF15 VCAN TRIM28 TAGLN2 TIMP1 IPO9 CPVL MANBA CEP57 RNF146 PF4 RETN HCCS DPP7 RNASE2 QPCT AHSG CTSC LYZ B2M EMILIN2 STOML2 LCN2 | CSFS | 0.39 | 0.32
SM5 | TRIM24 IRAK3 PPP1CA MTX2 FBXW7 PFN1 SDHB CTSC MSRB2 | FSFS^b | 0.33 | 0.38

Additional metrics for these signatures can be found in Suppl. Table S6A.
FS: feature selection metrics.
^a Tested using k-fold validation methods (where k = 5).
^b Derived from the top 50 genes by ranked mRMR (Suppl. Table S1).
significantly higher FP rate with another RadBlood-7-derived signature containing AEN (M8; Table 3(a);
29%; p<.05). Removal of genes from M8 significantly
increased the FP rate for both controls and diseased individuals, which is a limitation of models based on small numbers of genes (Suppl. Table S5A). M9 (Table 3(b)) includes
BAX and FDXR, and exhibited significantly increased FP
rates in thromboembolism relative to controls (34–38%
increased FP). Interestingly, M13 shows a significant
increase in FPs of individuals with thrombosis (similar to
M1–M4), while M11 does not (Table 3(b)), despite both signatures containing DDB2. Removing any of the genes from
these models did not substantially alter misclassification,
except for a large decrease in FP upon removal of RPS27L
from M13 (Suppl. Table S5B). Both M11 and M13 exhibited
significantly high FP rates in malaria samples. BSFS models
derived from dataset RadBloodpost-6 contained FDXR, BAX
and DDB2, and showed high FP in S. aureus, sickle cell disease and malaria samples (significant by statistical analysis;
Suppl. Table S4B). These confounders adversely affect the
accuracy of gene signatures containing radiation response
genes (such as FDXR and AEN) present in both these and
other recently derived signatures in the published literature.
Mitigating expression changes arising from confounding
blood disorders with gene signatures comprising
secreted factors originating in blood
Highly specific gene expression signatures that identify radiation exposed blood samples should also minimize inclusion
of genes whose expression is altered by other hematological
conditions. Predicted FPs in unexposed patients with confounding conditions may be the result of changes in expression of DNA damage and apoptotic genes that are shared
with radiation responses. We derived ML-based gene signatures that exclude DNA damage or apoptotic genes, which
we anticipated would be less prone to misclassifying individuals with confounding blood disorders.
Changes in transcript levels of extracellular blood plasma
proteins resulting from radiation exposure might exclude
those associated with DNA damage or apoptosis response
(e.g. FLT3 ligand (FLT3LG) and amylase (AMY; AMY1A,
AMY2A); Barrett et al. 1982; Bertho et al. 2001; Tapio
2013). This idea is predicated on observations that global
protein synthesis significantly increases 4–8 h after initial
radiation exposures (Braunstein et al. 2009), with some profile changes detectable weeks to months later (Pernot et al.
2012; Hall et al. 2017). Radiation signatures in blood have
been derived from proteins secreted into plasma (Wang
et al. 2020) and expressed by multiple cell lineages (Ostheim
et al. 2021). Radiation-induced short-term changes in the
abundance of mRNAs encoding plasma proteins (that correspond to protein concentration changes) could allow
steady state mRNA expression to be used as a surrogate for
plasma protein levels. Significant correlations between
mRNA and protein expression have been shown when the
data have been transformed to normal distributions
(Greenbaum et al. 2001, 2002). This approach was adopted
to derive mRNA signatures from radiation responsive genes
in blood encoding secreted factors.
Genes which encode secreted proteins were used to
derive new radiation gene expression signatures using our
previously described methods (Zhao et al. 2018a). The
plasma protein-encoding gene GM2A had the highest MI
with radiation in dataset RadTBI-2 (MI = 0.31), while
TRIM24 was highest in dataset RadTBI-3 (MI = 0.27;
Suppl. Table S1). GM2A is absent from dataset RadTBI-3.
MI of TRIM24 was low in RadTBI-2 (MI = 0.05) resulting
in it being ranked second to last (Suppl. Table S1) and it
was not differentially expressed in this dataset (p value
>.05 by t-test; Suppl. Table S6C). Other top 50 ranked
genes by mRMR in both datasets include ACYP1,
B4GALT5, FBXW7, IRAK3, MSRB2, NBL1, PRF1, SPOCK2,
and TOR1A.
We derived five independent radiation gene signatures
encoding proteins secreted by blood cells (e.g. blood
Figure 6. Radiation gene signatures derived from transcripts encoding secreted factors reduce misclassification in unirradiated confounder phenotypes. Radiation
signatures which consist exclusively of genes encoding plasma-secreted proteins were derived following the same basic approach of Zhao et al. (2018a). These
models showed generally favorable performance when tested against an independent radiation dataset by k-fold validation (Table 4). Five blood secretome radiation signatures were derived consisting of 7–75 genes (SM1–SM5). Two models (SM3 and SM5) show high specificity across all hematological conditions tested
(thrombosis (A), S. aureus (B), sickle cell disease (C) and malaria (D)).
secretome models) that showed the lowest cross-validation
misclassification or log-loss by various feature
selection strategies (labeled SM1–SM5 (secretome model
1–5) in Table 4 and Suppl. Table S6A). SM5 feature selection was limited to the top 50 genes ranked by mRMR. This
pre-selection step was not applied when deriving SM2 and
SM3, whereas SM1 and SM4 were derived by CSFS feature
selection which obtains genes sequentially by mRMR rank
order without applying a threshold. Significantly upregulated
genes (and the models containing them) included SLPI (SM1, SM2), TRIM24
(SM3, SM4, SM5), TOR1A (SM3, SM4), GLA (SM4), SIL1
(SM4), NUBPL (SM4), NME1 (SM4), IPO9 (SM4), IRAK3
(SM5), MTX2 (SM5), and FBXW7 (SM5). Downregulated
genes included CLCF1 (SM1, SM2), USP3 (SM1, SM2),
TTC19 (SM2), PFN1 (SM3, SM4, SM5), CDC40 (SM4),
SPOCK2 (SM4), CTSC (SM4), GLS (SM4), and PPP1CA
(SM5; Suppl. Table S6C). The models exhibited 12–39%
misclassification (by k-fold validation) when validated
against the alternative radiation dataset. The RadTBI-2 dataset was not suitable for signature derivation or validation,
since data for LCN2, ERP44, FN1, GLS, and HMCN1 were
missing; these genes are present in models SM3 and/or SM4
(Table 4). The performance of the derived signatures was
also assessed by inclusion of FLT3 or AMY, either individually or in combination. These genes did not improve model
accuracy beyond the levels of the best performing signatures
that we derived.
The specificity of signatures derived from genes encoding
secreted factors was then evaluated with expression data
from unirradiated individuals with blood-borne diseases and
infections (Suppl. Table S6B). SM3 and SM5 correctly
classified nearly all samples in each dataset as unirradiated
and maintained an FP rate <5% in all datasets (Figure 6).
SM3 and SM5 contain <10 genes, were derived from dataset
RadTBI-3 and share the genes TRIM24 and PFN1 (ranked
#1 and #21 by mRMR). Both genes are significantly differentially expressed after radiation exposure, as is TOR1A in
SM3 and IRAK3, PPP1CA, MTX2, FBXW7 and CTSC in
SM5 (Suppl. Table S6C). SM3 and SM5 have the highest
fraction of genes found significant by Student’s t-test, which
may explain their superior specificity relative to the other
blood secretome signatures. Thromboembolism could only
be evaluated with SM3 and SM5 due to missing genes from
the SM1, SM2 and SM4 signatures. Conversely, SM1, SM2
and SM4 accuracy was compromised by expression changes
of genes in one or more blood-borne diseases. Malaria
(28%) and S. aureus (19%) infected patients were misclassified by SM1 as FPs (with 0.5% and 6.1% FP in controls,
respectively), indicating that SM1 accuracy was significantly
affected by these underlying infections (Suppl. Table S6B).
SM4 accuracy was also impacted by S. aureus infection and
sickle cell disease. Since the predictions of SM3 and SM5
were not influenced by the confounding conditions evaluated here, we suggest that these models are also likely
to be useful for excluding misclassification caused by other confounding conditions. Ultimately, it will be necessary to evaluate
these signatures over a wide spectrum of other potential
confounder phenotypes.
SM3 and SM5 exhibited high specificity for radiation
exposure (low false positivity in all confounding datasets)
but were less sensitive than M1–M4 and KM3–KM7 (Table
4). Accurate identification of radiation exposed individuals
should be feasible with a sequential strategy that first evaluates blood samples with suspected radiation exposures with
signatures known to exhibit high sensitivity (e.g. M4; 88%
accuracy to radiation exposure), followed by identification of
FPs among predicted positives with SM3 and/or SM5 (which
were not influenced by confounders; Suppl. Table S6B and
S6D). By identifying and removing misclassified, unirradiated samples with the blood secretome-based radiation signatures, sequential application of both sets of signatures
would predict predominantly TP samples.
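The sequential strategy can be summarized as a two-stage decision rule. The sketch below is schematic: each signature is represented by a callable that returns True when it predicts irradiation, and the threshold rules standing in for the sensitive and specific signatures are invented. Requiring every specific signature to agree is one possible reading of "SM3 and/or SM5" and is an assumption of this example.

```python
def sequential_call(sample, sensitive_model, specific_models):
    """Two-stage call: a sensitive signature screens, specific signatures confirm.
    Each model is a callable returning True when it predicts irradiation."""
    if not sensitive_model(sample):
        return "unirradiated"
    if all(model(sample) for model in specific_models):
        return "irradiated"
    return "probable false positive"

# Invented stand-ins for a sensitive signature and a blood secretome signature.
def toy_sensitive(sample):
    return sample["DDB2"] > 2.0

def toy_specific(sample):
    return sample["TRIM24"] > 1.5

exposed = {"DDB2": 3.0, "TRIM24": 2.0}   # responds in both signatures
infected = {"DDB2": 3.0, "TRIM24": 1.0}  # DDB2 elevated by a confounder only
healthy = {"DDB2": 1.0, "TRIM24": 1.0}

calls = [sequential_call(s, toy_sensitive, [toy_specific])
         for s in (exposed, infected, healthy)]
```

Only samples that pass the first stage are examined by the second, so the lower sensitivity of the specific signatures affects confirmation of predicted positives rather than the initial screen.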
Discussion
We demonstrated high misclassification rates of radiation
gene expression signatures in unirradiated individuals with
either infections or blood borne disorders relative to normal
controls. This was confirmed with a second set of k-fold
validated radiation signatures from our previous study
(Zhao et al. 2018a). Similar results were obtained with
expression data from unirradiated individuals exhibiting
other hematological conditions, which extended the spectrum of other abnormalities misclassified as exposed to radiation. Some of the same genes that are induced or repressed
by radiation exhibit similar changes in direction and magnitude in infections and hematological conditions (e.g. DDB2,
BCL2). Signatures derived from more recent microarray
platforms that contain key radiation response genes missing
in our previous study (e.g. FDXR, AEN) were also prone to
misclassifying hematological confounders as FPs. By assessing the performance of each model and rejecting signatures
with a high rate of false radiation diagnoses in confounding
conditions, many individuals with these comorbidities might
be ineligible for these radiation gene signature assays.
The symptoms of prodromal influenza and ARS significantly overlap. During influenza outbreaks, this could
impact accurate and timely diagnosis of ARS. Expression-based bioassays might not improve this diagnostic accuracy,
since traditional radiation signatures maximize sensitivity
without accounting for the diminished specificity due to
underlying hematological conditions. Other highly specific
tests for radiation exposure, such as the dicentric chromosome assay, can be more accurate and less variable than
expression-based assays, but require more time in the
laboratory despite recent improvements in the speed of these
analyses (Rogan et al. 2016; Liu et al. 2017; Shirley et al.
2017; Li et al. 2019; Shirley et al. 2020). Existing gene
expression assays will need to address the FP results
obtained for individuals with hematological conditions
before they can be used in general populations, who may
not have a history of these conditions or who may have
been prescreened as a precondition to military or
space deployment.
Use of matched, unirradiated controls provides a measure
of sensitivity and dynamic range of the derived radiation
gene signature. ML models for the same datasets can consist
of different gene sets and are based on different C and σ
values, which can lead to differences in their ability to predict radiation exposures under different biological
conditions. Nevertheless, genes with high MI with radiation consistently show differences in
the distribution of TNs and FPs among confounders (Figure 4; Mucaki and
Rogan 2021). That is, the models tend to unambiguously
classify individual samples (Suppl. Figure S1). Given the shared
responses of different hematopathologies by leukocytes, the
specificity of the signature for radiation exposure would,
under ideal circumstances, be expected to exclude detection
of other pathologies. Negative controls do not exhibit disease symptoms. In a nuclear incident or accident, the
exposed population will include many individuals with
underlying comorbidities. Application of radiation signatures
derived by maximizing sensitivity in this population could
lead to inappropriate diagnosis, and possibly treatment for
ARS. The sequential gene signature assay design should
improve the specificity of radiation gene expression assays
in these individuals, and across the general population.
These confounders are not
rare, especially influenza, which affected approximately 11%
of the US population during the 2019–2020 season (11,575
per 100,000; Disease Burden of Influenza 2021). The frequency of dengue fever was also high in the Caribbean
(2510 per 100,000), Southeast Asia (2940 per 100,000) and
in South Asia (3546 per 100,000; based on cases from 2017
(Zeng et al. 2021)). The annual prevalence of S. aureus bacteremia in the US is 38.2–45.7 per 100,000 person-years (El
Atrouni et al. 2009; Rhee et al. 2015), but is higher among
specific populations, such as hemodialysis patients. There
are between 350,000 and 600,000 cases (200 per 100,000) of
deep vein thromboembolism and pulmonary embolism that
occur in the US every year (Anderson et al. 1991).
Furthermore, there are over 100,000 individuals with sickle
cell in the US (33.3 per 100,000; Hassell 2010). Malaria is
also common in sub-Saharan Africa in 2018 (21,910 per
100,000; Global Malaria Programme, World Health
Organization 2020). The prevalence of these diseases makes
it clear that they could very well have a severe impact on
assessment in a population-scale radiation exposure event.
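The arithmetic behind this concern can be made explicit. The following minimal sketch converts the US rates quoted above into expected numbers of affected individuals in a screened cohort, and then into expected FP calls. The cohort size and the fraction of affected individuals misclassified as irradiated (`fp_fraction`) are hypothetical inputs chosen for illustration, not results of this study. The rates also mix incidence and prevalence and ignore overlap between conditions, so the output is at best an order-of-magnitude guide.

```python
# Rates per 100,000 quoted in the text (US figures only).
US_RATES_PER_100K = {
    "influenza, 2019-2020 season": 11575,
    "S. aureus bacteremia (upper estimate)": 45.7,
    "deep vein thrombosis / pulmonary embolism": 200,
    "sickle cell disease": 33.3,
}


def expected_cases(rate_per_100k, cohort_size):
    """Expected number of affected individuals in a cohort of the given size."""
    return rate_per_100k * cohort_size / 100_000


def expected_false_positives(rates, cohort_size, fp_fraction):
    """Expected FP calls per condition.

    fp_fraction is a HYPOTHETICAL fraction of affected, unirradiated
    individuals that a sensitive signature labels as irradiated.
    """
    return {
        condition: expected_cases(rate, cohort_size) * fp_fraction
        for condition, rate in rates.items()
    }


if __name__ == "__main__":
    # Hypothetical scenario: 100,000 people screened, half of the affected
    # individuals misclassified as irradiated.
    scenario = expected_false_positives(US_RATES_PER_100K, 100_000, 0.5)
    for condition, n in scenario.items():
        print(f"{condition}: about {n:.0f} expected FP calls")
```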
Exploring the basis of these confounding disorders could
facilitate strategies that minimize FPs suggesting radiation
exposures. Common elements among their molecular etiologies may provide insight into their high misclassification
rates. Despite their different clinical presentations, the
underlying mechanisms of all of these conditions and radiation exposure appear to culminate in overwhelming
chromosomal damage, degradation and cell lysis. Riboviral
infections have been proposed to sequester host RNA binding proteins, leading to R-loop formation, DNA damage
responses, and apoptosis (Rogan et al. 2021). This study suggested that expression of some key radiation signature genes
appears to be altered by such infections. We also suggest that
neutrophil extracellular traps (or NETs; Qi et al. 2020) may
activate biochemical pathways that are present in early radiation responses. An early step in the formation of these
structures is chromosome decondensation, followed by the
fragmentation of DNA, which forms extracellular fibers
that bind pathogens (such as S. aureus) in a process similar to autophagy in neutrophils (NETosis). This process
would likely induce DNA damage in neutrophils, and some
of the same DNA damage response genes that are activated
(DDB2, PCNA, GADD45A) and repressed (BCL2) after radiation exposure are similarly regulated after infections
such as S. aureus. It has also been proposed that NET formation affects the severity of malaria infections (Boeltz et al.
2017). NETosis also contributes to the pathogenesis of
numerous noninfectious diseases such as thromboembolism
(Demers and Wagner 2014; Collison 2019) and sickle cell
disease (Hounkpe et al. 2020), in addition to autoimmune
disease (He et al. 2018) and general inflammation (Delgado-Rizo et al. 2017). If the origin of the FPs is confined to this
lineage, then comparing the predictions of our traditionally validated signatures on data from the granulocyte vs. lymphocyte lineages in individuals with these
conditions should reveal whether NETosis is the likely etiology of the confounder expression phenotypes, and possibly
of those in radiation-treated cells. Doing this for radiation-exposed
cells would require RNASeq data from these isolated cell populations (Ostheim et al. 2021). We would
expect FPs in the confounder populations using signatures
derived from myeloid-derived lineages, which include
neutrophils.
Figure 7. Sequential application of radiation-responsive and blood secretome gene signatures identifies exposed individuals. False positive predictions due to differential expression caused by confounding conditions could be mitigated by following a sequential approach where samples are evaluated with both a highly sensitive radiation gene signature and a second signature with high specificity. M4, for example, is highly sensitive when validated against radiation dataset RadTBI-3
(88% accuracy), where all incorrect classifications were due to FP predictions (zero false negatives (FN)). Predicted irradiated samples could then be evaluated with
a highly specific model such as SM3, which would identify and remove any misclassified unirradiated samples remaining in the set and leave only TPs.
The discovery of blood-borne conditions which lead to
high FPs raises the question of whether other hematological
conditions could also increase misclassification by radiation
gene signatures. Such datasets are either unavailable or not
suitable for analysis. Some studies consist of too few samples (e.g. GSE69601 (idiopathic portal hypertension) has six
samples total) or lack the control samples necessary to perform a proper comparison (e.g. GSE33812 (aplastic
anemia)). Although the available gene expression datasets
covered a broad range of hematopathologies, additional testing of the sequential gene signatures will be required to
exclude FPs due to underlying changes in gene expression
from other confounders.
Confounding conditions will affect the precision of other
assays and biomarkers that are routinely used to assess radiation exposure. Elevated levels of γ-H2AX, a marker of
DNA damage, occur in cancer (Banath et al. 2004; Warters
et al. 2005; Sedelnikova and Bonner 2006; Yu et al. 2006), in
ulcerative colitis (Risques et al. 2008) and zinc depletion/
restriction (Mah et al. 2010). γ-H2AX has been suggested as
an early cancer screening and cancer therapy biomarker
(Sedelnikova and Bonner 2006). Besides its application for
radiation assessment, the cytokinesis block micronucleus
assay (CBMN; Fenech 2010) is also a multi-target endpoint
for genotoxic stress from exogenous chemical agents
(Kirsch-Volders et al. 2011; Fenech et al. 2016; Kirsch-Volders et al. 2018) and deficiency of micronutrients
required for DNA synthesis and/or repair (folate, zinc;
Beetstra et al. 2005; Sharif et al. 2012). The specificity of
radiation testing may therefore also be reduced for patients with cancer
when using the γ-H2AX assay, and for patients under genotoxic stress
or with nutrient deficiencies when using the CBMN assay.
Many radiation responsive genes were frequently selected
as features for multiple signatures, including genes with
roles in DNA damage response (CDKN1A, DDB2,
GADD45A, LIG1, PCNA), apoptosis (AEN, CCNG1, LY9,
PPM1D, TNFRSF10B), metabolism (FDXR), cell proliferation
(PTP4A1) and the immune system (LY9 and TRIM22). In
general, the removal of these genes did not significantly alter
the FP rate against confounder data. However, the removal
of LIG1, PCNA, PPM1D, PTP4A1, TNFRSF10B, and
TRIM22 could partially decrease misclassification of influenza samples in some models, as well as DDB2 for dengue
(in addition to S. aureus and polycythemia vera). Many of
these genes in our models are also present in other published radiation gene signatures and assays (Paul and
Amundson 2008; Lu et al. 2014; Oh et al. 2014; Port et al.
2017; Tichy et al. 2018; Jacobs et al. 2020). Paul and
Amundson (2008) developed a 74-gene radiation signature
that includes 16 genes present in the human signatures
reported in Zhao et al. (2018a), including CDKN1A, DDB2
and PCNA. Similarly, three of the five biomarkers implicated
in Tichy et al. (2018) were also commonly selected (CCNG1,
CDKN1A, and GADD45A), as were five of the 13 genes in
the radiation assay described in Jacobs et al. (2020) (BAX,
CDKN1A, DDB2, MYC and PCNA). While we cannot determine the impact of confounders on the accuracy of their signatures, it is evident that some genes that are included in
these and other gene signatures (such as DDB2) can have a
profound impact on the misclassification of individuals with
confounding conditions.
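One way to probe the contribution of individual genes, in the spirit of the removal experiments described above, is a leave-one-gene-out comparison of FP rates on unirradiated confounder samples. The sketch below is an illustration under stated assumptions: it substitutes a simple nearest-centroid classifier for the models used in this study, and the gene names (`geneA` to `geneC`) and expression values are synthetic placeholders rather than measured data.

```python
from statistics import mean


def train_centroids(samples, labels, genes):
    """Mean expression of each gene per class (0 = control, 1 = irradiated)."""
    centroids = {}
    for cls in (0, 1):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        centroids[cls] = {g: mean(r[g] for r in rows) for g in genes}
    return centroids


def predict(centroids, sample, genes):
    """Return 1 (irradiated) if the sample is closer to the irradiated centroid."""
    def sq_dist(cls):
        return sum((sample[g] - centroids[cls][g]) ** 2 for g in genes)
    return 1 if sq_dist(1) < sq_dist(0) else 0


def fp_rate(centroids, unirradiated, genes):
    """Fraction of unirradiated samples that are called irradiated."""
    calls = [predict(centroids, s, genes) for s in unirradiated]
    return sum(calls) / len(calls)


def leave_one_gene_out(train_x, train_y, confounders, genes):
    """FP rate on confounder samples with all genes, then with each gene removed."""
    full_model = train_centroids(train_x, train_y, genes)
    results = {"all genes": fp_rate(full_model, confounders, genes)}
    for removed in genes:
        kept = [g for g in genes if g != removed]
        model = train_centroids(train_x, train_y, kept)  # retrain without the gene
        results[removed] = fp_rate(model, confounders, kept)
    return results


# Synthetic placeholder data (NOT measured expression values).
GENES = ["geneA", "geneB", "geneC"]
TRAIN_X = [
    {"geneA": 0.8, "geneB": 1.0, "geneC": 1.1},  # control
    {"geneA": 1.2, "geneB": 1.0, "geneC": 0.9},  # control
    {"geneA": 5.1, "geneB": 2.0, "geneC": 2.1},  # irradiated
    {"geneA": 4.9, "geneB": 2.0, "geneC": 1.9},  # irradiated
]
TRAIN_Y = [0, 0, 1, 1]
# Unirradiated confounder samples in which only geneA mimics irradiation.
CONFOUNDERS = [
    {"geneA": 5.0, "geneB": 1.0, "geneC": 1.0},
    {"geneA": 4.8, "geneB": 1.1, "geneC": 0.9},
]

RESULTS = leave_one_gene_out(TRAIN_X, TRAIN_Y, CONFOUNDERS, GENES)
```

With these synthetic values, removing `geneA` eliminates the FP calls whereas removing either of the other genes does not, which is the pattern such a comparison is designed to expose.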
The proposed sequential approach that combines highly
sensitive predictors (affected by confounders) with high-specificity signatures could improve the accuracy of predicting
TP exposures (Figure 7). After assessment with a sensitive
signature (e.g. M4), all predicted positive samples would be
reevaluated with a high specificity signature (e.g. SM3 or
SM5) to remove misclassified FP samples, resulting in a
higher performance assay that predominantly or exclusively
labels truly irradiated samples. Datasets derived from different post-exposure times and exposures (RadLymphCL-1,
RadTBI-2, RadTBI-3, RadBloodpost-6, and RadBlood-7)
could generate ML models which can be used to assess the
extent to which hematological confounders influence radiation exposure predictions for a variety of exposures and
post-irradiation time constraints. A signature can be derived
and selected which best fits the circumstances of the radiation exposure profile of a potentially exposed individual.
The high specificity signatures, SM3 and SM5, were not as
sensitive to misclassification as M1–M4 and KM1–KM7. It
is conceivable that a proportion of radiation-exposed individuals could be misclassified as FNs if samples were evaluated solely with these signatures. Besides radiation
exposure, the application of sequential gene signatures, each
optimized respectively to maximize sensitivity and specificity, may turn out to be a general strategy for improving
accuracy of molecular diagnoses for a wide spectrum of disease pathologies.
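The decision logic of the sequential design (Figure 7) can be sketched in a few lines. This is a minimal illustration, not the implementation used in this study: `sensitive_model` and `specific_model` are hypothetical threshold functions standing in for trained signatures such as M4 and SM3, and the scores and labels are synthetic.

```python
def sequential_call(sample, sensitive_model, specific_model):
    """Call a sample irradiated only if both stages predict exposure."""
    if not sensitive_model(sample):
        return False                     # stage 1 negative: reported as unexposed
    return bool(specific_model(sample))  # stage 2 re-evaluates stage 1 positives


def confusion_counts(samples, truths, classifier):
    """Tally TP, FP, TN and FN for a classifier that returns True/False."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for sample, truth in zip(samples, truths):
        call = bool(classifier(sample))
        if call and truth:
            counts["TP"] += 1
        elif call and not truth:
            counts["FP"] += 1
        elif not call and truth:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical stand-ins for trained signatures (simple score thresholds).
def sensitive_model(sample):
    return sample["stage1_score"] > 0.5


def specific_model(sample):
    return sample["stage2_score"] > 0.5


# Synthetic samples: (stage 1 score, stage 2 score, truly irradiated?)
TOY = [
    (0.9, 0.8, True), (0.7, 0.9, True),    # irradiated
    (0.8, 0.2, False), (0.6, 0.1, False),  # unirradiated, confounding condition
    (0.1, 0.1, False), (0.2, 0.3, False),  # unirradiated, healthy
]
SAMPLES = [{"stage1_score": a, "stage2_score": b} for a, b, _ in TOY]
TRUTHS = [t for _, _, t in TOY]

STAGE1_ONLY = confusion_counts(SAMPLES, TRUTHS, sensitive_model)
SEQUENTIAL = confusion_counts(
    SAMPLES, TRUTHS, lambda s: sequential_call(s, sensitive_model, specific_model)
)
```

In this toy example the second stage converts the two stage 1 FPs into TNs without losing any TPs. With real signatures, the second stage could also discard truly irradiated samples, which is why the high specificity signatures are applied only to samples already predicted positive rather than on their own.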
Differential molecular diagnoses based on gene signatures
would evaluate predicted radiation positive samples by individual gene signatures, each trained on different confounders (e.g. one model for influenza infection, another for
thromboembolism, etc.). This approach would explicitly
exclude FPs for radiation response while identifying the
underlying condition. Separate signatures for sensitivity and
specificity might also be avoided by training adversarial networks (Goodfellow et al. 2014) that contrast radiation-exposed samples with one or more datasets for confounding
conditions, with emphasis on samples predicted to be FPs
from the current radiation signatures. These resultant signatures would select radiation responsive genes which are
resistant to the effects of confounders. Finally, ensuring that
both the positive test and negative control samples in training sets properly account for the population frequencies of
confounding diagnoses would also be expected to improve
the performance of radiation gene signatures.
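The differential diagnosis idea described above can be outlined as a routing step applied to samples that a radiation signature has already called positive. The sketch below is hypothetical throughout: the model functions, score names and thresholds are placeholders, and reporting the first matching confounder instead of radiation is an assumption of this sketch (a sample from an irradiated individual who also has a confounding condition would be reported as the confounder).

```python
def differential_call(sample, radiation_model, confounder_models):
    """Return 'unexposed', 'radiation', or the name of a matching confounder.

    confounder_models maps a condition name to a callable returning True/False.
    Confounder models are consulted only for predicted radiation positives.
    """
    if not radiation_model(sample):
        return "unexposed"
    for condition, model in confounder_models.items():
        if model(sample):
            return condition  # explained by a confounder, so excluded as an FP
    return "radiation"


# Hypothetical stand-ins for trained signatures (simple score thresholds).
def radiation_model(sample):
    return sample["radiation_score"] > 0.5


CONFOUNDER_MODELS = {
    "influenza": lambda s: s["influenza_score"] > 0.5,
    "thromboembolism": lambda s: s["thrombo_score"] > 0.5,
}

EXAMPLE = {"radiation_score": 0.9, "influenza_score": 0.8, "thrombo_score": 0.1}
EXAMPLE_CALL = differential_call(EXAMPLE, radiation_model, CONFOUNDER_MODELS)
```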
Disclosure statement
Ben C. Shirley is an employee and Peter K. Rogan is a cofounder of
CytoGnomix Inc. This work is patent pending.
Funding
This work was supported by the University of Western Ontario and
CytoGnomix Inc. The authors thank Drs. Ruth Wilkins and Joan Knoll
for their constructive comments.
Notes on contributors
Eliseos J. Mucaki, M.Sc., is a Technologist in the Department of
Biochemistry, University of Western Ontario, Canada.
Ben C. Shirley, M.Sc., is the Chief Software Architect, CytoGnomix
Inc. Canada.
Peter K. Rogan, Ph.D., is a Professor of Biochemistry and Oncology,
Schulich School of Medicine and Dentistry, University of Western
Ontario, Canada, and President, CytoGnomix Inc.
ORCID
Peter K. Rogan
http://orcid.org/0000-0003-2070-5254
Data availability statement
A Zenodo data repository has been created for this study (DOI:
doi.org/10.5281/zenodo.5009007). This archive provides additional violin plots which illustrate the expression of genes in models M1–M4,
KM3–KM7 and SM1–SM5 for patients with a bloodborne condition or
RNA viral infection. This archive also provides each ML model utilized
in this manuscript (M1-M20, KM1-KM7, SM1-SM5) as MATLAB formatted data (MAT files) with usage documentation.
References
Anderson FA Jr, Wheeler HB, Goldberg RJ, Hosmer DW, Patwardhan
NA, Jovanovic B, Forcier A, Dalen JE. 1991. A population-based
perspective of the hospital incidence and case-fatality rates of deep
vein thrombosis and pulmonary embolism. The Worcester DVT
Study. Arch Intern Med. 151(5):933–938.
Bagchee-Clark AJ, Mucaki EJ, Whitehead T, Rogan PK. 2020. Pathway-extended gene expression signatures integrate novel biomarkers that
improve predictions of patient responses to kinase inhibitors.
MedComm. 1(3):311–327.
Banath JP, Macphail SH, Olive PL. 2004. Radiation sensitivity, H2AX
phosphorylation, and kinetics of repair of DNA strand breaks in
irradiated cervical cancer cell lines. Cancer Res. 64(19):7144–7149.
Banchereau R, Jordan-Villegas A, Ardura M, Mejias A, Baldwin N, Xu
H, Saye E, Rossello-Urgell J, Nguyen P, Blankenship D, et al. 2012.
Host immune transcriptional profiles reflect the variability in clinical
disease manifestations in patients with Staphylococcus aureus infections. PLoS One. 7(4):e34390.
Barrett A, Jacobs A, Kohn J, Raymond J, Powles RL. 1982. Changes in
serum amylase and its isoenzymes after whole body irradiation. Br
Med J (Clin Res Ed). 285(6336):170–171.
Beetstra S, Thomas P, Salisbury C, Turner J, Fenech M. 2005. Folic
acid deficiency increases chromosomal instability, chromosome 21
aneuploidy and sensitivity to radiation-induced micronuclei. Mutat
Res. 578(1–2):317–326.
Berdal JE, Mollnes TE, Waehre T, Olstad OK, Halvorsen B, Ueland T,
Laake JH, Furuseth MT, Maagaard A, Kjekshus H, et al. 2011.
Excessive innate immune response and mutant D222G/N in severe
A (H1N1) pandemic influenza. J Infect. 63(4):308–316.
Bertho JM, Demarquay C, Frick J, Joubert C, Arenales S, Jacquet N,
Sorokine-Durm I, Chau Q, Lopez M, Aigueperse J, et al. 2001. Level
of Flt3-ligand in plasma: a possible new bio-indicator for radiation-induced aplasia. Int J Radiat Biol. 77(6):703–712.
Boeltz S, Muñoz LE, Fuchs TA, Herrmann M. 2017. Neutrophil extracellular traps open the pandora’s box in severe malaria. Front
Immunol. 8:874.
Boldrini L, Bibault JE, Masciocchi C, Shen Y, Bittner MI. 2019. Deep
learning: a review for the radiation oncologist. Front Oncol. 9:977.
Boldt S, Knops K, Kriehuber R, Wolkenhauer O. 2012. A frequency-based gene selection method to identify robust biomarkers for radiation dose prediction. Int J Radiat Biol. 88(3):267–276.
Braunstein S, Badura ML, Xi Q, Formenti SC, Schneider RJ. 2009.
Regulation of protein synthesis by ionizing radiation. Mol Cell Biol.
29(21):5645–5656.
Budworth H, Snijders AM, Marchetti F, Mannion B, Bhatnagar S,
Kwoh E, Tan Y, Wang SX, Blakely WF, Coleman M, et al. 2012.
DNA repair and cell cycle biomarkers of radiation exposure and
inflammation stress in human blood. PLoS One. 7(11):e48619.
Collison J. 2019. Preventing NETosis to reduce thrombosis. Nat Rev
Rheumatol. 15(6):317.
Cover TM, Thomas JA. 2006. Elements of information theory. 2nd ed.
New York (NY): John Wiley & Sons.
Cruz-Garcia L, O’Brien G, Donovan E, Gothard L, Boyle S, Laval A,
Testard I, Ponge L, Wozniak G, Miszczyk L, et al. 2018. Influence of
confounding factors on radiation dose estimation using in vivo validated transcriptional biomarkers. Health Phys. 115(1):90–101.
Delgado-Rizo V, Martínez-Guzmán MA, Iñiguez-Gutiérrez L, García-Orozco A, Alvarado-Navarro A, Fafutis-Morris M. 2017. Neutrophil
extracellular traps and its implications in inflammation: an overview.
Front Immunol. 8:81.
Demers M, Wagner DD. 2014. NETosis: a new factor in tumor progression and cancer-associated thrombosis. Semin Thromb Hemost.
40(3):277–283.
Ding C, Peng H. 2005. Minimum redundancy feature selection from
microarray gene expression data. J Bioinform Comput Biol. 3(2):
185–205.
Ding LH, Park S, Peyton M, Girard L, Xie Y, Minna JD, Story MD.
2013. Distinct transcriptome profiles identified in normal human
bronchial epithelial cells after exposure to γ-rays and different elemental particles of high Z and energy. BMC Genomics. 14:372.
Disease Burden of Influenza. 2021. Centers for Disease Control and
Prevention, National Center for Immunization and Respiratory
Diseases (NCIRD); [accessed 2021 Apr 9]. https://www.cdc.gov/flu/
about/burden.
Dorman SN, Baranova K, Knoll JHM, Urquhart BL, Mariani G,
Carcangiu ML, Rogan PK. 2016. Genomic signatures for paclitaxel
and gemcitabine resistance in breast cancer derived by machine
learning. Mol Oncol. 10(1):85–100.
Dressman HK, Muramoto GG, Chao NJ, Meadows S, Marshall D,
Ginsburg GS, Nevins JR, Chute JP. 2007. Gene expression signatures
that predict radiation exposure in mice and humans. PLoS Med.
4(4):e106.
El Atrouni WI, Knoll BM, Lahr BD, Eckel-Passow JE, Sia IG, Baddour
LM. 2009. Temporal trends in the incidence of Staphylococcus aureus bacteremia in Olmsted County, Minnesota, 1998 to 2005: a
population-based study. Clin Infect Dis. 49(12):e130–e138.
Fenech M. 2010. The lymphocyte cytokinesis-block micronucleus
cytome assay and its application in radiation biodosimetry. Health
Phys. 98(2):234–243.
Fenech M, Knasmueller S, Bolognesi C, Bonassi S, Holland N, Migliore
L, Palitti F, Natarajan AT, Kirsch-Volders M. 2016. Molecular mechanisms by which in vivo exposure to exogenous chemical genotoxic
agents can lead to micronucleus formation in lymphocytes in vivo
and ex vivo in humans. Mutat Res Rev Mutat Res. 770(Pt A):12–25.
Ghandhi SA, Smilenov LB, Elliston CD, Chowdhury M, Amundson
SA. 2015. Radiation dose-rate effects on gene expression for human
biodosimetry. BMC Med Genomics. 8:22.
Global Malaria Programme, World Health Organization. 2020. World
Malaria Report 2020. ISBN 978-92-4-001579-1. https://www.who.int/
publications/i/item/9789240015791
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D,
Ozair S, Courville A, Bengio Y. 2014. Generative adversarial nets.
arXiv:1406.2661.
Greenbaum D, Luscombe NM, Jansen R, Qian J, Gerstein M. 2001.
Interrelating different types of genomic data, from proteome to
secretome: ’oming in on function. Genome Res. 11(9):1463–1468.
Greenbaum D, Jansen R, Gerstein M. 2002. Analysis of mRNA expression and protein abundance data: an approach for the comparison
of the enrichment of features in the cellular population of proteins
and transcripts. Bioinformatics. 18(4):585–596.
Guy JB, Bertoletti L, Magne N, Rancoule C, Mahe I, Font C, Sanz O,
Martín-Antorán JM, Pace F, Vela JR, et al. 2017. Venous thromboembolism in radiation therapy cancer patients: findings from the
RIETE registry. Crit Rev Oncol Hematol. 113:83–89.
Hall J, Jeggo PA, West C, Gomolka M, Quintens R, Badie C, Laurent
O, Aerts A, Anastasov N, Azimzadeh O, et al. 2017. Ionizing radiation biomarkers in epidemiological studies – an update. Mutat Res.
771:59–84.
Hassell KL. 2010. Population estimates of sickle cell disease in the U.S.
Am J Prev Med. 38(4 Suppl.):S512–S521.
He Y, Yang FY, Sun EW. 2018. Neutrophil extracellular traps in autoimmune diseases. Chin Med J. 131(13):1513–1519.
Hill A, Hanson M, Bogle MA, Duvic M. 2004. Severe radiation dermatitis is related to Staphylococcus aureus. Am J Clin Oncol. 27(4):
361–363.
Hoang LT, Tolfvenstam T, Ooi EE, Khor CC, Naim AN, Ho EX, Ong
SH, Wertheim HF, Fox A, Van Vinh Nguyen C, et al. 2014. Patient-based transcriptome-wide analysis identify interferon and ubiquination pathways as potential predictors of influenza A disease severity.
PLoS One. 9(11):e111640.
Hounkpe BW, Chenou F, Domingos IF, Cardoso EC, Costa Sobreira
MJV, Araujo AS, Lucena-Araújo AR, da Silva Neto PV, Malheiro A,
Fraiji NA, et al. 2020. Neutrophil extracellular trap regulators in
sickle cell disease: modulation of gene expression of PADI4, neutrophil elastase, and myeloperoxidase during vaso-occlusive crisis. Res
Pract Thromb Haemost. 16(1):204–210.
Jacobs AR, Guyon T, Headley V, Nair M, Ricketts W, Gray G, Wong
JYC, Chao N, Terbrueggen R. 2020. Role of a high throughput biodosimetry test in treatment prioritization after a nuclear incident.
Int J Radiat Biol. 96(1):57–66.
Jen KY, Cheung VG. 2003. Transcriptional response of lymphoblastoid
cells to ionizing radiation. Genome Res. 13(9):2092–2100.
Kirsch-Volders M, Plas G, Elhajouji A, Lukamowicz M, Gonzalez L,
Vande Loock K, Decordier I. 2011. The in vitro MN assay in 2011:
origin and fate, biological significance, protocols, high throughput
methodologies and toxicological relevance. Arch Toxicol. 85(8):
873–899.
Kirsch-Volders M, Fenech M, Bolognesi C. 2018. Validity of the
lymphocyte cytokinesis-block micronucleus assay (L-CBMN) as biomarker for human exposure to chemicals with different modes of
action: a synthesis of systematic reviews. Mutat Res Genet Toxicol
Environ Mutagen. 836(Pt A):47–52.
Knops K, Boldt S, Wolkenhauer O, Kriehuber R. 2012. Gene expression
in low- and high-dose-irradiated human peripheral blood lymphocytes: possible applications for biodosimetry. Radiat Res. 178(4):
304–312.
Kwissa M, Nakaya HI, Onlamoon N, Wrammert J, Villinger F, Perng
GC, Yoksan S, Pattanapanyasat K, Chokephaibulkit K, Ahmed R,
et al. 2014. Dengue virus infection induces expansion of a
CD14(+)CD16(+) monocyte population that stimulates plasmablast
differentiation. Cell Host Microbe. 16(1):115–127.
Lewis DA, Stashenko GJ, Akay OM, Price LI, Owzar K, Ginsburg GS,
Chi JT, Ortel TL. 2011. Whole blood gene expression analyses in
patients with single versus recurrent venous thromboembolism.
Thromb Res. 128(6):536–540.
Li Y, Shirley BC, Wilkins RC, Norton F, Knoll JHM, Rogan PK. 2019.
Radiation dose estimation by completely automated interpretation of
the dicentric chromosome assay. Radiat Protect Dosim. 186(1):
42–47.
Lipsitch M, Tchetgen Tchetgen E, Cohen T. 2010. Negative controls: a
tool for detecting confounding and bias in observational studies
[published correction appears in Epidemiology. 2010 Jul;21(4):589].
Epidemiology. 21(3):383–388.
Liu J, Li Y, Wilkins R, Flegal F, Knoll JHM, Rogan PK. 2017. Accurate
cytogenetic biodosimetry through automated dicentric chromosome
curation and metaphase cell selection. F1000Res. 6:1396.
Lu TP, Hsu YY, Lai LC, Tsai MH, Chuang EY. 2014. Identification of
gene expression biomarkers for predicting radiation exposure. Sci
Rep. 4:6293.
Mah LJ, El-Osta A, Karagiannis TC. 2010. gammaH2AX: a sensitive
molecular marker of DNA damage and repair. Leukemia. 24(4):
679–686.
Meadows SK, Dressman HK, Muramoto GG, Himburg H, Salter A,
Wei Z, Ginsburg GS, Ginsburg G, Chao NJ, Nevins JR, et al. 2008.
Gene expression signatures of radiation response are specific, durable and accurate in mice and humans. PLoS One. 3(4):e1912.
Mucaki EJ, Baranova K, Pham HQ, Rezaeian I, Angelov D, Ngom A,
Rueda L, Rogan PK. 2016. Predicting outcomes of hormone and
chemotherapy in the Molecular Taxonomy of Breast Cancer
International Consortium (METABRIC) Study by biochemically-inspired machine learning. F1000Res. 5:2124.
Mucaki EJ, Zhao J, Lizotte DJ, Rogan PK. 2019. Predicting responses
to platin chemotherapy agents with biochemically-inspired machine
learning. Signal Transduct Target Ther. 4:1.
Mucaki EJ, Rogan PK. 2021. Zenodo Archive for “Improved radiation
gene expression profiles with sequentially applied, sensitive and specific gene signatures”. Zenodo.
Nallandhighal S, Park GS, Ho YY, Opoka RO, John CC, Tran TM.
2019. Whole-blood transcriptional signatures composed of erythropoietic and NRF2-regulated genes differ between cerebral malaria
and severe malarial anemia. J Infect Dis. 219(1):154–164.
Oh DS, Cheang MC, Fan C, Perou CM. 2014. Radiation-induced gene
signature predicts pathologic complete response to neoadjuvant
chemotherapy in breast cancer patients. Radiat Res. 181(2):193–207.
Olagnier D, Peri S, Steel C, van Montfoort N, Chiang C, Beljanski V,
Slifker M, He Z, Nichols CN, Lin R, et al. 2014. Cellular oxidative
stress response controls the antiviral and apoptotic programs in dengue virus-infected dendritic cells. PLoS Pathog. 10(12):e1004566.
Ostheim P, Don Mallawaratchy A, Müller T, Schüle S, Hermann C,
Popp T, Eder S, Combs SE, Port M, Abend M. 2021. Acute radiation syndrome-related gene expression in irradiated peripheral
blood cell populations. Int J Radiat Biol. 97(4):474–484.
Park JG, Paul S, Briones N, Zeng J, Gillis K, Wallstrom G, LaBaer J,
Amundson SA. 2017. Developing human radiation biodosimetry
models: testing cross-species conversion approaches using an ex vivo
model system. Radiat Res. 187(6):708–721.
Paul S, Amundson SA. 2008. Development of gene expression signatures for practical radiation biodosimetry. Int J Radiat Oncol Biol
Phys. 71(4):1236–1244.
Paul S, Amundson SA. 2011. Gene expression signatures of radiation
exposure in peripheral white blood cells of smokers and non-smokers. Int J Radiat Biol. 87(8):791–801.
Pernot E, Hall J, Baatout S, Benotmane MA, Blanchardon E, Bouffler
S, El Saghire H, Gomolka M, Guertler A, Harms-Ringdahl M, et al.
2012. Ionizing radiation biomarkers for potential use in epidemiological studies. Mutat Res. 751(2):258–286.
Port M, Herodin F, Valente M, Drouet M, Lamkowski A, Majewski M,
Abend M. 2017. Gene expression signature for early prediction of
late occurring pancytopenia in irradiated baboons. Ann Hematol.
96(5):859–870.
Qi J-L, He J-R, Liu C-B, Jin S-M, Gao R-Y, Yang X, Bai H-M, Ma Y-B.
2020. Pulmonary Staphylococcus aureus infection regulates breast
cancer cell metastasis via neutrophil extracellular traps (NETs) formation. MedComm. 1(2):188–201.
Quinlan J, Idaghdour Y, Goulet JP, Gbeha E, de Malliard T, Bruat V,
Grenier JC, Gomez S, Sanni A, Rahimy MC, et al. 2014. Genomic
architecture of sickle cell disease in West African children. Front
Genet. 5:26.
Rhee Y, Aroutcheva A, Hota B, Weinstein RA, Popovich KJ. 2015.
Evolving epidemiology of Staphylococcus aureus bacteremia. Infect
Control Hosp Epidemiol. 36(12):1417–1422.
Rieger KE, Hong WJ, Tusher VG, Tang J, Tibshirani R, Chu G. 2004.
Toxicity from radiation therapy associated with abnormal transcriptional responses to DNA damage. Proc Natl Acad Sci USA. 101(17):
6635–6640.
Risques RA, Lai LA, Brentnall TA, Li L, Feng Z, Gallaher J, Mandelson
MT, Potter JD, Bronner MP, Rabinovitch PS. 2008. Ulcerative colitis
is a disease of accelerated colon aging: evidence from telomere attrition and DNA damage. Gastroenterology. 135(2):410–418.
Rogan PK, Li Y, Wilkins RC, Flegal FN, Knoll JH. 2016. Radiation
dose estimation by automated cytogenetic biodosimetry. Radiat Prot
Dosimetry. 172(1–3):207–217.
Rogan PK. 2019. Multigene signatures of responses to chemotherapy
derived by biochemically-inspired machine learning. Mol Genet
Metab. 128(1–2):45–52.
Rogan PK, Mucaki EJ, Shirley BC. 2020. Characteristics of human and
viral RNA binding sites and site clusters recognized by SRSF1 and
RNPS1. Zenodo. http://www.doi.org/10.5281/zenodo.3737089.
Rogan PK, Mucaki EJ, Shirley BC. 2021. A proposed molecular mechanism for pathogenesis of severe RNA-viral pulmonary infections.
F1000Res. 9:943.
Sedelnikova OA, Bonner WM. 2006. GammaH2AX in cancer cells: a
potential biomarker for cancer diagnostics, prediction and recurrence. Cell Cycle. 5(24):2909–2913.
Sharif R, Thomas P, Zalewski P, Fenech M. 2012. Zinc deficiency or
excess within the physiological range increases genome instability
and cytotoxicity, respectively, in human oral keratinocyte cells.
Genes Nutr. 7(2):139–154.
Shirley B, Li Y, Knoll JHM, Rogan PK. 2017. Expedited radiation biodosimetry by automated dicentric chromosome identification
(ADCI) and dose estimation. J Vis Exp. (127):56245.
Shirley BC, Knoll JHM, Moquet J, Ainsbury E, Pham ND, Norton F,
Wilkins RC, Rogan PK. 2020. Estimating partial-body ionizing radiation exposure by automated cytogenetic biodosimetry. Int J Radiat
Biol. 96(11):1492–1503.
Smirnov DA, Brady L, Halasa K, Morley M, Solomon S, Cheung VG.
2012. Genetic variation in radiation-induced cell death. Genome
Res. 22(2):332–339.
Spivak JL, Considine M, Williams DM, Talbot CC Jr, Rogers O,
Moliterno AR, Jie C, Ochs MF. 2014. Two clinical phenotypes in
polycythemia vera. N Engl J Med. 371(9):808–817.
Svensson JP, Stalpers LJ, Esveldt-van Lange RE, Franken NA, Haveman
J, Klein B, Turesson I, Vrieling H, Giphart-Gassler M. 2006.
Analysis of gene expression using gene sets discriminates cancer
patients with and without late radiation toxicity. PLoS Med. 3(10):
e422.
Tang BM, Shojaei M, Parnell GP, Huang S, Nalos M, Teoh S,
O’Connor K, Schibeci S, Phu AL, Kumar A, et al. 2017. A novel
immune biomarker IFI27 discriminates between influenza and bacteria in patients with suspected respiratory infection. Eur Respir J.
49(6):1602098.
Tapio S. 2013. Ionizing radiation effects on cells, organelles and tissues
on proteome level. In: Leszczynski D, editor. Radiation proteomics.
Advances in experimental medicine and biology. Vol. 990.
Dordrecht: Springer; p. 37–48.
Tian Y, Babor M, Lane J, Schulten V, Patil VS, Seumois G, Rosales SL,
Fu Z, Picarda G, Burel J, et al. 2017. Unique phenotypes and clonal
expansions of human CD4 effector memory T cells re-expressing
CD45RA. Nat Commun. 8(1):1473.
Tichy A, Kabacik S, O’Brien G, Pejchal J, Sinkorova Z, Kmochova A,
Sirak I, Malkova A, Beltran CG, Gonzalez JR, et al. 2018. The first
in vivo multiparametric comparison of different radiation exposure
biomarkers in human blood. PLoS One. 13(2):e0193412.
Tsuge M, Oka T, Yamashita N, Saito Y, Fujii Y, Nagaoka Y, Yashiro
M, Tsukahara H, Morishima T. 2014. Gene expression analysis in
children with complex seizures due to influenza A(H1N1)pdm09 or
rotavirus gastroenteritis. J Neurovirol. 20(1):73–84.
van Oorschot B, Uitterhoeve L, Oomen I, Ten Cate R, Medema JP,
Vrieling H, Stalpers LJ, Moerland PD, Franken NA. 2017.
Prostate cancer patients with late radiation toxicity exhibit
reduced expression of genes involved in DNA double-strand
break repair and homologous recombination. Cancer Res. 77(6):
1485–1491.
Vanderwerf SM, Svahn J, Olson S, Rathbun RK, Harrington C, Yates J,
Keeble W, Anderson DC, Anur P, Pereira NF, et al. 2009. TLR8-dependent TNF-(alpha) overexpression in Fanconi anemia group C
cells. Blood. 114(26):5290–5298.
Wang Q, Lee Y, Shuryak I, Pujol Canadell M, Taveras M, Perrier
JR, Bacon BA, Rodrigues MA, Kowalski R, Capaccio C, et al.
2020. Development of the FAST-DOSE assay system for high-throughput biodosimetry and radiation triage. Sci Rep. 10(1):
12716.
Warters RL, Adamson PJ, Pond CD, Leachman SA. 2005. Melanoma
cells express elevated levels of phosphorylated histone H2AX foci.
J Invest Dermatol. 124(4):807–817.
Yu T, MacPhail SH, Banath JP, Klokov D, Olive PL. 2006. Endogenous
expression of phosphorylated histone H2AX in tumors in relation to DNA
double-strand breaks and genomic instability. DNA Repair. 5(8):935–946.
Zeng G. 2015. A unified definition of mutual information with applications in machine learning. Math Probl Eng. 2015:1–12.
Zeng Z, Zhan J, Chen L, Chen H, Cheng S. 2021. Global, regional, and
national dengue burden from 1990 to 2017: a systematic analysis
based on the global burden of disease study 2017.
EClinicalMedicine. 32:100712.
Zhao JZL, Mucaki EJ, Rogan PK. 2018a. Predicting ionizing radiation
exposure using biochemically-inspired genomic machine learning.
F1000Res. 7:233.
Zhao JZL, Mucaki EJ, Rogan PK. 2018b. Matlab code for “Predicting
exposure to ionizing radiation by biochemically-inspired genomic
machine learning”. Zenodo.