Tumour gene expression signature in primary melanoma predicts long-term outcomes

Garg, Manik; Couturier, Dominique-Laurent; Nsengimana, Jérémie; Fonseca, Nuno A.; Wongchenko, Matthew; Yan, Yibing; Lauss, Martin; Jönsson, Göran B.; Newton-Bishop, Julia; Parkinson, Christine; Middleton, Mark R.; Bishop, D. Timothy; McDonald, Sarah; Stefanos, Nikki; Tadross, John; Vergara, Ismael A.; Lo, Serigne; Newell, Felicity; Wilmott, James S.; Thompson, John F.; Long, Georgina V.; Scolyer, Richard A.; Corrie, Pippa; Adams, David J.; Brazma, Alvis; Rabbie, Roy

doi:10.1038/s41467-021-21207-2

Download PDF

Article
Open access
Published: 18 February 2021

Tumour gene expression signature in primary melanoma predicts long-term outcomes

Nature Communications volumeÂ 12, ArticleÂ number:Â 1137 (2021) Cite this article

16k Accesses
60 Altmetric
Metrics details

Subjects

An Author Correction to this article was published on 17 May 2022

This article has been updated

Abstract

Adjuvant systemic therapies are now routinely used following resection of stage III melanoma, however accurate prognostic information is needed to better stratify patients. We use differential expression analyses of primary tumours from 204 RNA-sequenced melanomas within a large adjuvant trial, identifying a 121 metastasis-associated gene signature. This signature strongly associated with progression-free (HRâ=â1.63, pâ=â5.24 Ã 10^â5) and overall survival (HRâ=â1.61, pâ=â1.67 Ã 10^â4), was validated in 175 regional lymph nodes metastasis as well as two externally ascertained datasets. The machine learning classification models trained using the signature genes performed significantly better in predicting metastases than models trained with clinical covariates (p_AUROC = 7.03 Ã 10^â4), or published prognostic signatures (p_AUROC < 0.05). The signature score negatively correlated with measures of immune cell infiltration (Ïâ=ââ0.75, pâ<â2.2 Ã 10^â16), with a higher score representing reduced lymphocyte infiltration and a higher 5-year risk of death in stage II melanoma. Our expression signature identifies melanoma patients at higher risk of metastases and warrants further evaluation in adjuvant clinical trials.

An integrated approach to biomarker discovery reveals gene signatures highly predictive of cancer progression

Article Open access 04 December 2020

A genomic signature for accurate classification and prediction of clinical outcomes in cancer patients treated with immune checkpoint blockade immunotherapy

Article Open access 25 November 2020

Gene expression analysis in formalin fixed paraffin embedded melanomas is associated with density of corresponding immune cells in those tissues

Article Open access 27 October 2020

Introduction

Cutaneous melanoma (CM) accounts for 75% of skin cancer-related deaths, and the incidence has been increasing worldwide¹. Most patients present with primary tumours and the majority will be cured by local surgery. Outcomes for patients with metastatic melanoma have improved radically over the past 10 years with the introduction of new systemic therapies², although median survival of patients has remained at ~3 years. Importantly, of those patients who ultimately die of melanoma a significant proportion originally presented with early-stage disease³, suggesting that there is a subgroup of these patients who have aggressive tumours. Thus, optimal management of early melanoma is key to improving outcomes.

Patients with resected AJCC stage III melanoma are now eligible for adjuvant immune checkpoint inhibitors, as well as BRAF-targeted therapies, based on randomised trials confirming a reduction in the risk of relapse and improved overall survival (OS)^4,5,6,7. Clinical trials are underway to evaluate similar therapies in resected stage IIB/C patients⁸, whose outcomes reflect that of untreated stage IIIA/B melanoma⁹. As such, the number of patients eligible for the treatment with adjuvant therapies over the coming years is expected to increase substantially. These modern anti-cancer drugs are high cost and carry a risk of both life-changing and life-threatening toxicities, so there is a growing desire to more accurately predict those patients at high risk of recurrence in whom intervention is expected to be beneficial so-as-to avoid over-treating patients likely to have been cured of their disease by surgery alone.

Gene expression signatures have the potential to improve the prediction of the biological behaviour of melanoma by objectively defining âhigh riskâ on a molecular level¹⁰. Previous transcriptomic analyses of CM identified patterns of gene expression associated with survival independent of AJCC stage¹¹. Building on these data, Gerami et al. first reported a proprietary prognostic gene expression profile (GEP) test utilising a 27-gene panel for use in patients with CM (Gerami_27)¹². The test uses quantitative reverse transcriptase polymerase chain reaction technology to measure the expression of individual genes from formalin-fixed paraffin-embedded (FFPE) primary melanomas to provide a binary classification of low (class 1) or high (class 2) risk for developing metastases within 5 years of diagnosis (with A and B subclasses to further stratify risk)¹³. The signatureâs performance has since been evaluated in a number of retrospective clinical studies evaluating recurrence-free survival^14,15. A further recent unsupervised clustering analysis based on 677 primary melanoma transcriptomes (generated using the Illumina DASL array platform) embedded within a population-controlled cohort study from the Leeds Melanoma Cohort (LMC) identified a six-class 150 gene prognostic signature (herein referred to as LMC_150)¹⁶. The signature uniquely demonstrated prognostic relevance (melanoma-specific survival; MSS) in patients with stage I primary melanoma and further predicted poor outcomes in patients undergoing immunotherapy¹⁶. Heterogeneity in key aspects of the aforementioned studies (including varying trial design, sample type, sequencing platforms and primary outcome measures) may partially explain the small number of overlapping genes (nâ=â4) between both sets of signatures. Furthermore, owing to a lack of prospective data proving the clinical utility of such prognostic molecular tools¹⁷, there are currently no established prognostic biomarkers able to accurately identify truly high-risk patients.

Using patient samples and long-term clinical outcome data from one of the largest adjuvant melanoma trials^18,19, we undertook RNA sequencing of the primary tumour matched with robust prospective clinical data to uncover a molecular signature that could be used to predict patient outcomes. This was then validated in two externally ascertained datasets.

Results

Prognostic signature generated using covariate-corrected differential expression

The structure of the datasets and analyses are depicted in Supplementary Fig.Â S1. Principal component analysis (PCA) showed that primary CMs (nâ=â204) and melanoma spread to local lymph nodes (LNs; nâ=â175) clustered separately, suggesting an impact of the microenvironment on tumour gene expression (Supplementary Fig.Â S2; see âMethodsâ section âVisualization of inherent distribution of samplesâ). We therefore decided to treat these as separate datasets, focussing our analyses on the primary melanoma samples followed by a validation of our results in the regional LN metastases from this dataset. We conducted a differential expression analysis, identifying differences in gene expression levels in primary tumours between those patients with and without distant metastasis over a minimum of 6 years follow up, while controlling for a key set of variables that were independently associated with distant metastases, including Stage (AJCC 7th edition²⁰, herein referred to as âstageâ), Breslow thickness, ECOG; Eastern Cooperative Oncology Group Performance Status and the experimental adjuvant therapy (Supplementary Fig.Â S3a and Supplementary TableÂ S1; see âMethodsâ sections âClinical covariate selectionâ and âDifferential expression analysisâ). Our analyses revealed 197 significantly differentially expressed genes (DEGs, FDR-adjusted p value <0.1) associated with metastases (Supplementary Figs.Â S1 and S3b). These DEGs were further filtered to remove pseudogenes (nâ=â39) and those genes not identified within the LMC DASL array (nâ=â37)¹⁶ to enable external validation of our signature (Supplementary Fig.Â S1). We were therefore left with 121 DEGs, which made up our core prognostic signature herein referred to as âCam_121â (Supplementary DataÂ 1).

Signature added incremental prognostic value when combined with conventional clinical staging

In order to explore the relationship between Cam_121 gene expression and prognosis, we first performed univariate Cox regression using the weighted Cam_121 expression score (see âMethodsâ section âSurvival analysesâ) as a predictor. We found that the Cam_121 signature significantly associated with both OS (hazard ratio (HR)â=â1.64 (95% CI 1.30â2.07), pâ=â3.56âÃâ10^â5) and progression-free survival (PFS; HRâ=â1.63 (95% CI 1.31â2.02), pâ=â8.92âÃâ10^â6; Fig.Â 1). In order to evaluate whether the signature score contributed independent prognostic information while controlling for conventional clinical covariates, multivariate Cox regression analyses were performed (Fig.Â 1c). The signature score was significantly associated with both OS (HRâ=â1.61 (95% CI 1.26â2.07), pâ=â0.000167) and PFS (HRâ=â1.63 (95% CI 1.29â2.07), pâ=â5.24âÃâ10^â5) in multivariate Cox regression models.

**Fig. 1: The Cam_121 gene expression signature is strongly associated with survival in uni- and multivariate Cox regression analyses (AVAST-M primary melanoma cohort; nâ=â194).**

In order to avoid the risk of overfitting, we further tested the performance of Cam_121 in an (entirely separate) set of samples from regional LN metastasis embedded within this dataset (nâ=â143). We found that the weighted signature score was also significantly associated with both OS (HRâ=â1.72 (95% CI 1.37â2.14), pâ=â1.53âÃâ10^â6) and PFS (HRâ=â1.75 (95% CI 1.43â2.16), pâ=â1.10âÃâ10^â7) in multivariate Cox regression models (Fig.Â 2aâc). Thereby indicating that Cam_121 could also be relevant as a prognostic tool after the resection of stage III (regional LN positive) melanoma.

**Fig. 2: Validation of the Cam_121 in further datasets.**

We further tested the signaturesâ prognostic power in four externally acquired independent validation datasets from; (i) the LMC (nâ=â677)¹⁶; (ii) The Cancer Genome Atlas (TCGA-SKCM²¹; skinâ=â159 and LNâ=â216); (iii) the Lund Primary Melanoma Cohort²² (nâ=â223) and (iv) the Australian Melanoma Genome Project²³. Validation within the LMC confirmed that Cam_121 was associated with melanoma-specific survival in both univariate (HRâ=â1.49 (95% CI 1.27â1.74), pâ=â5âÃâ10^â7) and multivariate Cox regression models (HRâ=â1.7 (95% CI), pâ=â0.001, Fig.Â 2d, e). Owing to a lack of power when considering true primary melanomas within TCGA-SKCM dataset (nâ=â87), samples from primary tumours were considered together with regional cutaneous relapsed tumours (defined herein as âSkin TCGA-SKCMâ (nâ=â159)) and were tested separately from the regional LN samples (nâ=â216). Cam_121 was associated with OS in both univariate (skin: HRâ=â1.50, 95% CIâ=â(1.15, 1.96), pâ=â0.00273; LN: HRâ=â1.28, 95% CIâ=â(1.06, 1.56), pâ=â0.0109) and multivariate survival analyses (skin: HRâ=â1.59, 95% CIâ=â(1.17, 2.17), pâ=â0.00348; LN: HRâ=â1.28, 95% CIâ=â(1.03, 1.59), pâ=â0.0256) in these two external datasets (Fig.Â 2fâi). A third external validation was attempted using the Lund Primary Melanoma Cohort²² (nâ=â223); however (owing to the use of less sensitive sequencing technologies within the study), only 24 of the Cam_121 genes were identified within this dataset (Supplementary Fig.Â S5b). Nonetheless, this 24-gene signature was significantly correlated with PFS (HRâ=â1.67 (95% CI 1.06â2.62), pâ=â0.03 univariate Cox regression analyses), but not with OS (pâ=â0.32) (Supplementary Fig.Â S5a). A final validation was attempted using the Australia Melanoma Genome Project dataset²³, in which only 55 samples from a mixture of tissue sites (including primary tumours, regional LNs, distant metastases, in-transit metastases and others) were available for analysis. Cox regression parameter estimates showed the same trend as observed above when comparing the âhighâ/âlowâ risk Cam_121 cohorts in this dataset (Supplementary Fig.Â S4a), however, significance was not achieved. Our power calculations revealed that the sample size was too small to statistically detect the effect of interest with a high probability (Supplementary Fig.Â S4b; see power analysis in âMethodsâ section âPower calculation for the external validation datasetsâ).

The published signature from Gerami et al.¹² (Gerami_27; nâ=â27 genes) was not associated with OS in multivariate models in the AVAST-M primary melanoma dataset. The signature from Thakur et al.¹⁶ (LMC_150; nâ=â150 genes) was associated with both OS and PFS, though the wide confidence intervals may in part be reflective of a higher proportion of stage III patients in the AVAST-M dataset (Supplementary Fig.Â S6).

Cam_121 predicts metastasis better than both clinical covariates and published prognostic signatures

We further sought to determine whether the Cam_121 gene expression signature outperforms key clinical covariates in predicting whether a primary melanoma would ultimately metastasise to distant body sites or not. For this, we developed separate machine learning (ML) classification models using the Cam_121 gene expression values as features, as well as using clinical covariates as features, with the aim of maximising the area under the sensitivity vs (1-specificity) curve (herein referred to as âAUROCâ (area under the receiver operating characteristic curve)) as both the metrics are important in this case (see âMethodsâ section âMachine learning analysisâ; Supplementary Fig.Â S15). Note that these clinical covariates were selected independently based on their level of association with distant metastases (Supplementary TableÂ S1a). We found that models trained with the Cam_121 gene expression signature as features significantly outperformed the models trained with the clinical covariates as features (p_AUROCâ=â2.27âÃâ10^â3, p_sensitivityâ=â1.79âÃâ10^â3, p_specificityâ=â0.46; Supplementary TableÂ S2a), and this remained consistent across all ML classifiers (Fig.Â 3a, Supplementary Figs.Â S7 andÂ S8, and Supplementary TableÂ S2a). The classifier giving the highest AUROC with the prognostic signature gave better results across all three performance metrics than the classifier giving the highest AUROC with the clinical covariates alone: AUROC (0.67âÂ±â0.12 with the clinical covariates alone vs 0.83âÂ±â0.09 with the Cam_121 alone), sensitivity (0.58âÂ±â0.16 vs 0.75âÂ±â0.13) and specificity (0.71âÂ±â0.16 vs 0.73âÂ±â0.14; Fig.Â 3a and Supplementary Fig.Â S8).

**Fig. 3: The Cam_121 gene expression signature is predictive of metastases across multiple machine learning classification models and in an internal validation dataset (nâ=â143).**

In order to reduce the risk of bias that might result from feature selection (of both Cam_121 and clinical covariates) and training/testing from the same dataset, we went on to validate our findings in an entirely independent dataset represented by the regional LN samples from within the AVAST-M dataset (nâ=â143). In this model, the classifiers giving the highest AUROC for each set of features on the training data (AVAST-M primary melanoma data) were selected for further validation within regional LN samples. We found that the classification model developed using the Cam_121 gene expression signature as features (AUROCâ=â0.67) significantly outperformed the classification model developed using the clinical covariates alone (AUROCâ=â0.54; DeLongâs test p valueâ=â0.02, zâ=â2.05; Fig.Â 3b and Supplementary TableÂ S2b). In particular, adding the signature to the clinical covariates (Cam_121â+âclinical covariates) correctly predicted an additional three overlapping cases that were missed out by the model trained on the signature alone (49 overlapping cases in Fig.Â 3c vs 46 overlapping cases in Fig.Â 3d).

In order to test the performance of the published prognostic signatures from Gerami et al.¹² (Gerami_27; nâ=â27 genes) and Thakur et al.¹⁶ (LMC_150; nâ=â150 genes) in predicting metastases in an unbiased way, we repeated the training and testing of classification models in the entirely independent AVAST-M LN dataset (nâ=â143) from which no feature selection has been undertaken. We found that Cam_121 significantly outperformed the baseline clinical covariates (Cam_121 vs clinical covariates: p_AUROCâ=â7.03âÃâ10^â4, z_AUROCâ=â4.44), as well as these two published signatures at the 5% significance level (Cam_121 vs LMC_150: p_AUROCâ=â0.02, z_AUROCâ=â2.30; Cam_121 vs Gerami_27: p_AUROCâ=â0.012, z_AUROCâ=â2.45; Fig.Â 3e and Supplementary TableÂ S2c, d).

Cam_121 gene signature score performed significantly better than genes selected at random in predicting overall and progression-free survival

In light of reports suggesting that randomly selected genes may perform equally well in predicting prognosis as published signatures²⁴, we further tested the performance of Cam_121 against a signature of 121 randomly selected genes (from a pool of 19,434 protein-coding genes in our dataset and repeated 1000 times; see âMethodsâ section âTesting signature performance against randomly selected genesâ). Cam_121 significantly outperformed random signatures across all measures of clinical efficacy including: OS (pâ=â0.001) and PFS (pâ=â0.001) in multivariate Cox regression models (Supplementary Fig.Â S9).

Stage II patients with a âhigh-riskâ signature demonstrated a 33% risk of death at 5 years, a threshold for which adjuvant therapy could be considered

We envisage that one of the central clinical applications of a prognostic GEP might be to identify those patients with stage II melanoma who may be at higher risk of relapse or death and for whom adjuvant systemic therapies may be considered. In order to compare our data with the registration adjuvant melanoma trials^4,5,6,7, we measured the absolute risk of death at 5 years (calculated as the proportion of patients who died due to melanoma within 5 years from diagnosis; see âMethodsâ section âDetermination of the weighted expression score cut-off to define âhighâ and âlowâ absolute risk of death at 5 yearsâ).

Analyses within the LMC cohort (where there was a higher preponderance of early-stage patients) revealed that stage II patients had a 27% (76/279) baseline absolute risk of death at 5 years. This risk rose to 33% (64/192) in those stage II patients with a high-risk-weighted Cam_121 expression score profile and dropped to 14% (12/87) in those stage II patients with a low-risk profile (TableÂ 1). The stratification of high/low-risk cohorts in this context was based on a 0.33 quantile cut off of the weighted Cam_121 expression score and subsequent references to high/low Cam_121 risk groups refer to these weighted expression groups.

Table 1 Clinical utility of GEP test 5-year melanoma-specific survival (Leeds Melanoma Cohort Data).

Full size table

Per-gene analyses

In order to determine the relative influence of each gene and each baseline clinical covariate within the ML model, we analysed their feature importance scores and found that no single feature dominated the performance of the model (Supplementary Fig.Â S10), suggesting that it is the combination of all the features that yielded improved performance over the baseline clinical covariates (Fig.Â 3a, and Supplementary Figs.Â S7 and S8). In keeping with this, none of the Cam_121 genes proved significant in the per-gene multivariate survival analyses after correcting for multiple testing (p value <0.05), further confirming that it is the combination of all the Cam_121 signature genes that provide the improved performance in predicting OS and PFS (Supplementary Figs.Â S11 and S12).

We undertook further multivariate Cox regression analyses for all protein-coding genes in this dataset (nâ=â19,427) and found no single gene was significantly associated with either PFS or OSÂ after correcting for multiple testing (p valueâ<â0.05) (Supplementary DataÂ 2 and Supplementary DataÂ 3).

A high weighted Cam_121 score reflected a lymphocyte depleted tumour with worse clinical outcomes

In order to identify the biological processes reflected by the signature, we ran pre-ranked gene set enrichment analyses on genes ordered by their shrunken log-fold change from the covariate-corrected differential expression analysis (see âMethodsâ section âGene set enrichment analysesâ). In doing this, we found that the top five significantly (FDR-corrected p value <0.05) downregulated hallmark gene sets resulting from this analysis included interferon (IFN) gamma response, IFN alpha response, allograft rejection, inflammatory response and IL6-JAK-STAT3 signalling (Supplementary Fig.Â S13). Interestingly, when we ran gene set enrichment on DEGs derived from the (entirely separate) LN samples (nâ=â143; Supplementary Fig.Â S14), we observed significant (FDR-corrected p value <0.01) downregulation of the exact same immune-related processes (Supplementary Fig.Â S14c). Therefore, indicating that the differential expression analyses (with a predominance of downregulated genes in both the primary melanoma and regional LN datasets) also reflect a significant downregulation of key immune-mediated processes in the samples from patients that developed metastases (Supplementary Figs.Â S3b, S13a and S14c).

We next used the Angelova dataset²⁵ to deconvolute the expression of immune cell subtypes within each sample (see âMethodsâ section âImmune cell correlation analysisâ). We found a negative correlation between the weighted signature score and multiple immune cell types (with the highest correlation found with activated B cells (Ï_{Distant-Metastases=No}â=ââ0.8, exact two-sided p_{Distant-Metastases=No}â<â2.2âÃâ10^â16; Fig.Â 4a), T cells (Ï_{Distant-Metastases=No}â=ââ0.73, p_{Distant-Metastases=No}â<â2.2âÃâ10^â16; Fig.Â 4c), as well the overall immune cell expression score (Ï_{Distant-Metastases=No}â=ââ0.75, p_{Distant-Metastases=No}â<â2.2âÃâ10^â16; Fig.Â 4e). We also found that samples with a high weighted Cam_121 expression score were more likely to develop metastases than samples with a low weighted Cam_121 expression score (Fig.Â 4b, d, f and Supplementary TableÂ S3). Although nine of the Cam_121 signature genes were common with the Angelova immune marker genes (TUBB, AIM2, CASQ1, NTRK1, FASLG, CCR3, P2RY14, PRF1 and CCR5), we were able to demonstrate that samples segregated based on overall immune cell score, with low immune cell expression clustering with high weighted Cam_121 gene expression scores (Fig.Â 4g), using PCA.

Fig. 4: Weighted Cam_121 score negatively correlates with immune cell expression scores, indicating that a lower weighted signature expression score is associated with a richer immune microenvironment and better prognosis.

We further explored the relationship between the weighted Cam_121 gene expression score and histopathologically assessed tumour-infiltrating lymphocyte (TIL) scores (applying independent scoring criteria with both the Clark²⁶ and Melanoma Institute Australia (MIA) scores²⁷, see âMethodsâ section âTumour infiltrating lymphocyte analysisâ). This further confirmed a significant negative correlation between the Cam_121 signature and TIL scores, such that a higher signature score equated to a relatively immune-deprived tumour with consequent worse clinical outcomes (Fig.Â 5). It is nonetheless important to point out that the Cam_121 signature retained its prognostic influence even following correction for pathologically assessed TIL scores, and this remained consistent both within the AVAST-M and the external validation dataset from the LMC (Figs.Â 1c and 2e, respectively).

**Fig. 5: Cam_121 negatively correlates with tumour-infiltrating lymphocyte counts.**

Discussion

The ability to identify primary melanoma patients at risk for disease recurrence is an important unmet need and effective prognostic biomarkers that could serve to guide adjuvant therapy are lacking. The 31-GEP assay (Gerami_27) has been developed in an attempt to address this clinical dilemma, however, the majority of published studies evaluating its performance have been retrospective or prospective cohort studies without a comparator group²⁸, and its use has not been advocated in established clinical guidelines¹⁷. We sought to identify whether the expression of genes in a primary melanoma tumour could predict for distant metastasis and survival, analysing data acquired from one of the largest phase III prospective adjuvant melanoma clinical trials associated with long-term patient outcome data¹⁹. We used covariate-corrected differential expression analyses to identify 121 genes significantly associated with distant metastases, which made up our signature, and found that this added prognostic value in both the prediction of metastasis and survival. The prognostic relevance was further confirmed in two independent external validation cohorts. Immune cell deconvolution analyses revealed that the weighted Cam_121 expression score negatively correlated with multiple measures of lymphocyte infiltration, with a high weighted signature score reflecting a relatively cold tumour immune microenvironment with worse long-term prognosis. These findings were cross-validated using pathologically assessed TIL scores, as well as gene set enrichment analyses, the latter showing that differential expression analyses in both primary melanoma (nâ=â194) and LN (nâ=â143) datasets reflected downregulation of the same key immune-mediated processes in association with metastases. That this conclusion was reached using unbiased differential expression, reaffirms the central importance of the immune system in this context.

The melanoma microenvironment consists of multiple immune and stromal cells, which play a critical role in regulating both the initiation and development of disease. Several studies have demonstrated the association of lymphocyte infiltration with longer survival^29,30,31, as well as an inverse relationship between TIL grade and the presence of LN metastases^27,32, implying that evaluating the tumour microenvironment landscape may hold promise for prognostic biomarkers. However, only a limited number of studies have investigated the immune landscape in primary melanomas. A transcriptomic analysis of primary melanomas identified six distinct subgroups based on their expression of immune-related, keratin and beta-catenin pathway genes³³. In this study, patients with low immune but high beta-catenin score (CIC4) had the poorest OS³³. A recent study utilising high-throughput sequencing of T-cell receptor beta-chain in T2âT4 primary melanomas (nâ=â199) indicated that the T-cell fraction accurately predicted PFS and was independent of other key clinico-pathologic covariates³⁴. Although in our study it was difficult to discern specific immune cell subtypes using bulk RNA sequencing, given that the weighted Cam_121 score was strongly negatively correlated with B cells, T cells and all immune cells (Ï_{Distant-Metastases=No}â=ââ0.8, â0.73 and â0.75, respectively, with exact two-sided p_{Distant-Metastases=No}â<â2.2âÃâ10^â16) and that IFN pathways dominated gene set enrichment, we regard this as further evidence that a successful immune-mediated cytotoxic anti-tumour response exists in primary melanoma. Critically, we found that the signature retained its prognostic power even after correcting for TIL score, and it is our opinion that quantifying the expression of these key immune-mediated genes could potentially provide a more standardised and reproducible measure of immune activity. Furthermore, the prognostic relevance in both primary melanoma and LN datasets attests to the signaturesâ robustness. The challenge over the coming years will be to identify and validate a clinically relevant measure of lymphocytic abundance of relevance to primary CM, that can be easily implemented in real-life clinical practice. These studies will also need to consider aspects of cost-effectiveness, which have not been explored in this analysis.

Interrogating the LMC, we found that the GEP-designated high/low risk could be used to separate patients with â¥33% risk of death at 5 years; a risk threshold for which we believe adjuvant systemic therapies could be considered. Conversely, it is envisaged that âlow-riskâ GEP profiles could also be used to âdownstageâ stage III patients for whom treatment might be unnecessary. There is substantial evidence supporting the importance of pre-treatment immune cell infiltration in eliciting anti-tumour responses with immunotherapy³⁵, however, it remains to be seen whether Cam_121 expression can predict therapeutic responses in this setting. Future well-designed prospective clinical trials will ultimately be required to examine whether Cam_121 can be used to better tailor adjuvant therapy for early-stage melanoma patients.

The present study has a number of advantages over previous analyses. First, the large sample size linked to a well-conducted prospective clinical trial enabled an objective assessment of the risk of distant metastases, in addition to the key survival measures of interest. Furthermore, the long duration of follow up (minimum of 6 years) in a cohort of patients predating modern approved adjuvant systemic therapies provided a unique insight into the ânatural historyâ of primary CM. Finally, to our knowledge, this is the first large-scale biomarker analysis in primary melanoma to make use of data from comprehensive RNA sequencing. That such unbiased genome-wide assessment uncovered the dominance of immune-mediated genes reaffirms the central role of the host immune systemâs ability to respond to the tumour, resulting in immune editing or in some immune control.

It is important to point out that high-quality evidence guiding the best practice use of gene expression predictors, particularly in the context of early-stage CM are lacking. Future trials evaluating adjuvant therapies should examine both primary and locoregional melanoma samples using full RNA-sequencing technologies, to better characterise a molecular subtype/signature that could ultimately be used in conjunction with existing CM staging parameters and tailor future interventions more specifically to the individual. We believe that measures of lymphocytic infiltration should also be assessed. Ultimately such studies will need to show that randomising early-stage melanoma patients based on a high-risk Cam_121 GEP to an intervention (or a change in surveillance) leads to improved outcomes³⁶.

Our results indicate that the Cam_121 signature score complements conventional melanoma staging by contributing prognostically relevant information and could potentially be used to select early-stage melanoma patients at higher risk of relapse or death. Further carefully designed prospective clinical trials will help guide how molecular features can be incorporated with traditional clinico-pathologic features to best estimate individual risk and guide the optimal clinical use of molecular biomarkers.

Methods

AVAST-M melanoma cohort

This study made use of individual patient-level and transcriptomic data from the phase III adjuvant AVAST-M study, investigating the role of the angiogenesis inhibitor bevacizumab vs placebo in high-risk primary CM^18,19. One thousand three hundred forty-three stage American Joint Committee on Cancer stage IIB (T3bN0M0 and T4aN0M0), IIC (T4bN0M0) and III (TxN1â3M0) cutaneous melanoma (seventh edition AJCC²⁰) were recruited to the study over the period July 18, 2007âMarch 29, 2012, as previously described. The study (including the collection of DNA and RNA) was ethically approved in accordance with the Declaration of Helsinki (REC reference number 07/Q1606/15, 16th March 2007). Participants provided written informed consent to sampling of their tumour blocks during study recruitment (and prior to the investigational systemic therapy).

All study participants underwent a sentinel LN node biopsy, and if positive proceeded to a completion LN clearance as per the study protocol. Demographic (including gender, age, centre, as well as pathologic data (site of primary, Breslow depth, ulceration, LN involvement and BRAF/NRAS mutation by pyrosequencing)) was collected at the time of randomisation. Data were also collected on the timing, presence/absence and site/s of distant metastases (according to the findings from CT scanning). Data on overall and progression-free survival were collected with a minimum of 6 years follow up.

RNA-sequencing data was available on 204 primary melanoma samples of which 10 samples were removed from the downstream analyses owing to lack of data on all the clinical covariates, and 175 regional LN samples of which 32 samples removed from the downstream analyses owing to a lack of data on all clinical covariates (Supplementary Fig. S1).

Leeds Melanoma Cohort

A primary melanoma transcriptomic dataset from the LMC study (LMC DASL array) was used as independent replication. This represents a population-controlled cohort study, as previously described¹⁶. This study recorded data on MSS in 677 patients, calculated from the time of diagnosis to the time of last follow up or time of death from melanoma, whichever occurred first. The regression coefficient (beta) for each gene (reflecting differential expression in AVAST-M dataset) was used to generate a weighted signature score in the new dataset. Hence for further analysis, a per-sample weighted gene expression score for our Cam_121 gene signature was calculated by multiplying the expression value of each gene by its corresponding beta coefficient (Eq.Â 1) followed by z-score normalisation (zero meanâunit variance).

$${{{\mathrm{Weighted}}}}\,{{{\mathrm{signature}}}}\,{{{\mathrm{score}}}} = \mathop {\sum}\limits_{i = 1}^n {\beta _i \times {{{\mathrm{gene}}}}_i}$$

(1)

where i ranges from 1 to number of genes in the signature and Î² corresponds to the beta coefficient of the respective gene obtained from DESeq2 analysis on AVAST-M melanoma cohort.

Lund Melanoma Cohort

Gene expression data on 223 primary tumours was generated using the Illumina DASL platform, as previously described²². Data on relapse-free survival as well as OS were collected. The DASL platform analysed 7752 genes and only 24 of the Cam_121 genes were present. Validation was undertaken using weighted signature scores as outlined above (Eq.Â 1).

The Cancer Genome Atlas-SKCM Cohort

The clinical and gene expression data from TCGA-SKCM²¹, was downloaded from the cBioPortal³⁷. The TCGA-SKCM dataset included only CMs, in particular any melanomas within these datasets from acral, mucosal and other rarer sites were excluded. Samples were filtered to a single sample per patient giving a total of 375 samples from 375 patients (including 87 primaries, 72 cutaneous relapses and 216 regional LNs).

Australia Melanoma Genome Project Cohort

All fresh frozen and FFPE samples were obtained in a method that was compliant with the relevant ethical regulations for work with human participants. The fresh-frozen tissue from the biospecimen bank of MIA²³. All tissues and bloods form part of prospective collections of fresh-frozen samples accrued with written informed patient consent. The study was approved by the Sydney Local Health District RPAH zone ethics committee (Protocol No. X15-0454âpreviously X11-0289 and HREC/11/RPAH/444; Protocol No X17-0312âpreviously X11-0023 and HREC/11/RPAH/32; and Protocol No X15-0311âpreviously X10-0300 and HREC/10/RPAH/530). All samples were independently reviewed by expert melanoma pathologists to confirm the origin of each tumour from cutaneous skin.

Total RNA was extracted from fresh-frozen tissue using a mirVana miRNA Isolation Kit (Applied Biosystems, AM1560). RNA quality and presence of a small RNA fraction were measured using the Agilent 2100 RNA 6000 Nano and small RNA kits. RNA sequencing was performed using 1âÎ¼g of total RNA, which was converted into messenger RNA libraries using an Illumina mRNA TruSeq kit. RNA sequencing was performed using 2âÃâ75âbp paired-end reads on an Illumina Hiseq2000. Small RNA sequencing was performed using 1âÎ¼g of total RNA, which was converted into a small RNA libraries, size selection range 145â160âbp (RNA of 18â33 nucleotides) using Illuminaâs TruSeq Small RNA Library Preparation Kit and sequenced on an Illumina Hiseq2000 using 50âbp single-read sequencing with 1% control spiked in. RNA sequence reads were aligned to transcripts corresponding to ensemble 70 annotations using RSEM, raw sequences are available under study accession EGAS00001001552. Data from 55 samples from a mixture of tissue sites (including primary tumours, regional LNs, distant metastases, in-transit metastases and others) were available for this analysis.

mRNA extraction

Histopathological assessments of hematoxylin and eosin (H&E) stained slides were used to facilitate tumour sampling. Samples were consistently extracted from the least inflamed, least stromal regions of the invasive front of the tumour. RNA was extracted using the Roche High Pure FFPE RNA Micro Kit (cat# 04823125001; Genentech Biosciences) according to the manufacturerâs recommendations. RNA quantity and quality were assessed using Agilentâs 2100 bioanalyzer.

Expression data generation

Extracted RNA was sequenced on the Illumina exome-capture sequencing platform, using 50 base-pair paired-end sequencing. Quality control (QC) was performed using fastq_utils (https://github.com/nunofonseca/fastq_utils; v0.14.7) and FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/; v0.11.7). The reads that passed QC were aligned to the reference genome (GRCh38) using TopHat2 (ref. ³⁸). Aligned reads were quantified using HTSeq³⁹. Only those genes with more than five reads, as reported by HTSeq, in at least one sample were selected for further analysis. The sequencing data was of good quality with a median of ~50 million read-pairs/sample. A total of 446 tumour transcriptomes were profiled which included samples from: primary tumour (nâ=â204); LN (nâ=â175); local/distant relapse (nâ=â58) and uncategorised samples (nâ=â9). However, due to the clinical value of primary tumours in facilitating stratification at the earliest disease timepoint, we chose to focus our analyses on samples from cutaneous primaries (nâ=â204) and used the LN samples as an internal validation.

Clinical covariate selection

Firstly, the association between distant relapse (yes/no) and clinical covariates was studied both with or without controlling for length of follow up (defined as the time from diagnosis to last follow up) and for treatment (yes/no). When ignoring length of follow up and treatment, generalised Cochran-Mantel-Haenszel tests (R-package coin⁴⁰ v1.3-1) were used for nominal clinical predictors as they have the Pearsonâs Chi-square tests and Cochran-Armitage trend tests as special cases, when respectively considering the clinical covariate of interest as categorical or ordinal. For ordinal clinical covariates, we reported the ânominal/nominalâ association results when the ânominal/ordinalâ one was found less significant (as it is likely a sign that the assumption of linearity required by the ordinal test was not met). MannâWhitneyâWilcoxon tests were used for continuous clinical covariates. When controlling for length of follow up and treatment, likelihood ratio tests comparing the fits of logistic regression models with and without the clinical predictor of interest were used. The p values were corrected for multiple testing using the HolmâBonferroni method (Supplementary TableÂ S1a). Note that, as the five-level stage variable was highly related to Nclass (Spearman correlation coefficient over 0.85), we picked the one with the lowest number of levels.

The variables two-level Breslow staging and two-level ECOG were significantly associated with relapse. The variable two-level treatment was found to be related to relapse, but is kept as control and the two-level EventMet was the variable of interest indicating whether the patient relapsed or not. Therefore, the covariates Stage, Breslow thickness, ECOG and treatment were accounted for in the design formula of DESeq2 (ref. ⁴¹) without interactions and for further downstream analysis.

Secondly, the association between clinical covariates and OS (calculated from the time of diagnosis to the time of last follow up or death) and PFS (calculated from the time of diagnosis to the time of last follow up or death/progression to metastatic disease, whichever occurred first) was assessed by means of Cox proportional hazard models (R-package survival⁴²). Both outcomes were considered as left-truncated due to delayed patient enrolment and right-censored due to loss of follow up or alive at the time of the end of the study. Six years was chosen as the minimum cut off for these analyses based on the original trial design.

Stage, sex, age and Nclass were significantly associated with both OS and PFS (pâ<â0.05; Supplementary TableÂ S1b, c). The state of distant relapse (âEventMetâ) was the most important variable but was not of our interest, hence dropped. ECOG is a good predictor. Treatment was not significant (pâ>â0.05), but was kept in the analysis. Also for PFS, ulceration (Ulc) was borderline at 5% level and was dropped from further analysis. Therefore, stage, sex, age, Nclass, ECOG and treatment were corrected for in subsequent gene-level survival analyses. The AVAST-M primary melanoma dataset was also corrected for TIL counts (Clark Score, see also section âTumour-infiltrating lymphocyte analysisâ).

Differential expression analysis

Differential expression analysis between primary tumours that became metastatic vs those that remained non-metastatic over the 6-year study period was performed using the package DESeq2 (ref. ⁴¹; v1.18; R v3.6.1). The negative binomial models we considered controlled for the clinical covariates Stage, Breslow thickness, ECOG and treatment, as well as for the library size (offset). Raw read counts were provided as the input, with each column representing a sample and each row representing a gene, along with the categorical clinical information about each sample as colData. Samples with missing information for any of these four covariates were removed from the analysis, leaving 194 samples in total. The adjusted p value cut off (FDR) was set to 0.1 using the alpha parameter in DESeq2 results function and genes with FDR <0.1 were considered significantly differentially expressed.

Log-fold change shrinkage was applied using the lfcshrink function with apeglm method from the apeglm package⁴³ (v1.6.0; R v3.6). For visualisation and other downstream analysis, variance stabilising transformation (vst) was used by means of the DESeq2âs varianceStabilizingTransformation function with option blindâ=âFALSE.

Machine learning analysis

This section explains the steps followed to develop a ML classifier for each signature and to evaluate their performance in predicting relapse (yes/no). A summary of the pipeline is outlined in Supplementary Fig.Â S15. The following steps were conducted using the following packages; caret⁴⁴ v6.0-86, DESeq2 v1.28.1; R v4.0.2, snakemake⁴⁵ v5.17.0. The AVAST-M primary melanoma dataset (nâ=â194) was used for classification model development and the AVAST-M LN dataset (nâ=â143) was used to test the performance of the final model (internal validation). In an independent analysis, we used only the AVAST-M LN dataset for both training/testing (âMethodsâ section âModel development and selectionâ).

Dataset preparation and pre-processing

The AVAST-M primary melanoma (training) dataset was prepared such that each column/feature contained information about all the patients/samples. These features could either be the expression values of the genes within the signature and/or the categorical clinical covariate metadata. In the latter case, the categorical clinical covariates were converted to numeric dummy variables using one-hot encoding. The clinical outcome data (relapse vs non-relapse) were used as labels for the analyses. The AVAST-M LN dataset (testing) dataset was pre-processed in the same way, such that the order of the features was preserved as in the training dataset and in case where the clinical covariates were used as features, they were converted to dummy variables from the train clinical covariates using the predict function in R.

In case of the gene expression data, vst transformation was applied to both training and testing dataset. To apply vst transformation on the testing dataset, mean-dispersion estimates learnt on the training dataset were used.

Next, the features corresponding to near-zero variance were removed using the default parameters of the trainControl function (R-package caret; i.e., freqCutâ=â95/5, uniqueCutâ=â10). The same feature(s) were removed from the testing dataset before evaluating the performance of the fully trained model.

Model development and selection

The aim of this analysis was to critically assess whether the Cam_121 gene expression signature (with or without clinical covariates as features) could outperform clinical covariates alone in predicting relapse (yes/no). We also compared this to the predictive power of two published prognostic signatures (LMC_150 (ref. ¹⁶) and Gerami_27¹²) in an independent analysis. In this model, the training/testing was carried out on the AVAST-M LN dataset instead of AVAST-M primary melanoma dataset on which feature selection was performed for Cam_121 and clinical covariates. This was undertaken to reduce the risk of over-optimistic results that might arise from feature selection and testing from the same dataset. In developing a classification model for each of these five signatures of interest, seven different ML classifiers were considered, including; Bayesian generalised linear model⁴⁶ (bayesglm), Lasso and elastic-net regularised generalised linear model⁴⁷ (glmnet), k-nearest neighbour⁴⁸ (knn), linear discriminant analysis⁴⁹ (lda), random forest⁵⁰ (rf) and support vector machine⁵¹ with linear (svmLinear) and radial kernel (svmRadial).

To avoid overfitting, repeated tenfold cross validation (repeatsâ=â1000) was performed for model development and evaluation. Leave-one-out cross validation method (Supplementary Fig.Â S15) and bootstrap resampling method (Fig.Â 3e; repeats = 1000) were also tested to see if the choice of resampling method altered our results. This was implemented using the trainControl function (R-package caret).

At each training step, a random search was performed using 100 random (combinations of) hyperparameter(s) and the set of hyperparameter(s) leading to the largest maximum AUROC estimate on the training dataset was selected. Using this approach, we obtained 14 different classification models (7 classifiersâÃâ2 resampling methods) for each of the five different signatures. To select the final best performing classification model for each signature, the model giving the highest AUROC value was selected for testing on the AVAST-M LN dataset.

When training the random forest classifier on the feature set Cam_121 with clinical covariates, we estimated the average feature importance score based on repeated tenfold CV and displayed them by means of boxplots using geom_boxplot function in R.

Gene expression vs covariate performance comparison per patient

In order to compare the performance of the signatures: âCam_121â+âclinical covariatesâ, âCam_121â and âclinical covariatesâ on the AVAST-M LN dataset (nâ=â143) on a per-patient basis, the labels predicted by their respective best performing classifiers were extracted (tenfold CVâ+âsvmRadial, tenfold CVâ+âsvmRadial and tenfold CVâ+âsvmLinear respectively). Venn diagrams obtained by means of the venn.diagram function from the VennDiagram package⁵² (v1.6.20) in R were used to visualise overlaps.

Statistical analyses

To check whether the Cam_121 gene expression signature performed better in predicting relapse (yes/no) than models built on clinical covariates alone across multiple (nâ=â7) classifiers, Welch Two Sample t tests were used (R function t.test with option var.equalâ=âFALSE, pairedâ=âFALSE and alternativeâ=ââgreaterâ) for each performance metric and each combination of signature and clinical covariate at the 5% level. The null hypothesis was that the true difference in mean performance across seven classifiers between both conditions (âclinical covariates aloneâ vs âsignature with/without clinical covariatesâ) equals 0, while the alternative hypothesis was that the true difference in means is >0.

To compare the AUROC obtained on the testing dataset, DeLongâs tests⁵³ were used using the roc.test function with alternative âgreaterâ from the pROC package⁵² v1.16.2 in R, where the null hypothesis is two AUROC obtained from the model trained on gene expression as features and the model trained on clinical covariates as features are equal, while the alternate hypothesis is that the model trained on gene expression as features performs better than the model trained on clinical covariates as features. The p values and the z decision threshold values from the test were reported.

Determination of the weighted expression score cut-off to define âhighâ and âlowâ absolute risk of death at 5 years

Data from the LMC were used to calculate the absolute risk of death at 5 years (this dataset was chosen for this analysis due to the preponderance of early-stage patients; stage Iâ=â194 samples; stage IIâ=â279 samples; stage IIIâ=â76 samples). Five-year MSS was calculated such that those patients who died due to melanoma within 5 years of follow up were assigned eventâ=ââYesâ and those that did not were assigned eventâ=ââNoâ. Those patients who did not yet die and were followed up for <5 years were removed from the analysis due to inadequate follow up.

The quantile cut offs 0.25, 0.33 and 0.5 were used to divide patients into high/low groups based on their corresponding weighted Cam_121 expression score. Absolute risk of death at 5 years was calculated as the ratio of patients where eventâ=ââYesâ to the total number of patients within each stage (IâIII). The cut off giving the maximal separation (of absolute risk of death) between high/low groups was selected. This was achieved using a 0.33 quantile cut off of the weighed Cam_121 expression score and subsequent references to high/low Cam_121 risk groups refer to these high/low weighted stratification cohorts.

Survival analyses

For each sample, a vector of weighted signature expression scores was calculated by using Eq.Â 1 on the vst normalised gene expression data. The standardised scores were then used as a continuous predictor in Cox regression models fitted by means of the coxph function of the survival package⁴² (v3.1-12) in R (v3.6.3). The HR (95% CI) and p values corresponding to the signature were reported in both univariate and multivariate analyses. Note that in case of the two published signatures, median gene expression scores were used instead of the weighted gene expression scores.

In order to display KaplanâMeier (KâM) survival curves, samples were divided into âhighâ/âlowâ signature expression groups based on the 0.33 quantile cut off which we obtained from the absolute 5-year risk assessment in âMethodsâ section âDetermination of the weighted expression score cut off to define âhighâ and âlowâ absolute risk of death at 5 yearsâ. Samples with weighted signature expression score greater than this cut off were assigned to the âhighâ group and those with weighted signature expression score lower than this cut off were assigned to the âlowâ group. Of note, we found that the 0.33 quantile cut off of the weighted gene expression scores were remarkably consistent across all the external datasets (data not shown). The survival distribution of both groups was finally compared by means of logrank tests using survfit function from the survival package (v3.1-12) in R (v3.6.3). The parallel processing was conducted using snakemake⁴⁵ v5.17.0. The KâM curves were plotted using plot function from the R-package graphics⁵⁴ v3.6.3 and ggsurvplot function from R-package survminer⁵⁵ v0.4.7.

Tumour immune microenvironment analysis

Sample-level gene expression data from the AVAST-M primary melanoma cohort was deconvoluted into infiltrating immune cell scores using the Angelova dataset²⁵. This dataset reports 812 marker genes corresponding to 31 immune cell subtypes. Out of these 812 genes, 53 genes were missing from our 38,690 gene list. Therefore, two immune cell subtypes MDSC (myeloid-derived suppressor cells) and NK56_bright (natural-killer CD56^bright cells), with >1% of missing marker genes were removed from further analysis, leaving 719 marker genes corresponding to 29 cell types.

Immune cell correlation analysis

To perform correlation analysis for each cell type, the median of the corresponding marker genesâ expression for each sample (y-axis) was plotted against the weighted Cam_121 expression score for that sample (x-axis). To calculate the overall immune score, the median of all 719 marker genes was used for the analysis.

A regression line was fitted through these points using the geom_smooth function (methodâ=ââlmâ) of the ggplot2 package⁵⁶ (v3.3.0) in R (v3.6.2). The Pearsonâs correlation coefficient (Ï) between the immune cell score and the signature, as well as the p value of the corresponding test of association were estimated by means of the function stat_cor of the ggpubr package⁵⁷ (v0.2.5) in R (v3.6.2). Samples were coloured according to their metastatic status. With red points indicating the samples obtained from patients who later metastasised (Yes) and blue points indicating the samples obtained from patients who didnât (No). The density plots were made using the geom_density_2d function of R-package ggplot2 (ref. ⁵¹; v 3.3.0).

To confirm the relationship between the signature and immune cells, a PCA analysis was performed. For each cell type, the corresponding marker genesâ expression was projected into the principal component space and the first two principal components explaining the maximum variance were plotted against each other. These samples were then coloured by the âhighâ/âlowâ weighted Cam_121 expression score groups obtained during the survival analysis and shaped by the âhighâ/âlowâ groups based on marker genesâ median expression corresponding to that particular cell type. Here, to divide the samples into two independent categories based on their marker genesâ expression, the median cut off was used, where the sample with marker genesâ median expression value above its overall median value was assigned to the âhighâ group and that with marker genesâ median expression value below its overall median value was assigned to the âlowâ group.

Gene set enrichment analyses

Preranked GSEA (GSEA-P) was implemented using the GSEAPreranked tool of the GSEA software from Broad Institute^58,59 (v4.0.2). Hallmark gene sets were downloaded from the MSigDB database⁶⁰ (v6.2.0). The genes were preranked according to their shrinked log-fold change values obtained in the differential expression analysis and the GSEAPreranked tool was run with default parameters with the enrichment statistic set to âclassicâ.

Tumour-infiltrating lymphocyte analysis

H&E slides corresponding to each of the 194 samples were digitally scanned to 40Ã magnification using the Vectra Polaris scanner from AKOYA biosciences. TIL scores were double blindly evaluated by two experienced pathologists. Two different scoring methods were used including; (i) the Clark scoring²⁶ and (ii) the MIA system²⁷. This resulted in an agreement of 56% and 40%, respectively. This lack of consistency in scoring (particularly within the ânon-briskâ group) has been previously noted in the literature^61,62. We used a third independent expert pathologist to assess those slides where the two pathologists failed to agree. After removing the slides with poor scan quality, we had Clark TIL scores for 133 primary tumours and MIA TIL scores for 135 primary tumours.

Once the scores were obtained, a violin plot was plotted between the scores and the standardised weighted Cam_121 score using the geom_violin function of ggplot2 package. The p values for pairwise comparisons were obtained using t test and the global p value was computed using ANOVA. This was implemented using the stat_compare_means function from the R-package ggpubr (v0.2.5)⁵⁷.

Visualisation of inherent distribution of samples

To visualise if the samples cluster by their metastatic status, PCA was performed on the primary melanoma samples (nâ=â204) and LN samples (nâ=â175) using 1000 most variable genes. This analysis was performed using the prcomp function from R-package stats⁵⁴ (v3.6.2), plotted using the qplot function from ggplot2 package⁵⁶ (v 3.2.1) and the scree plot was generated using the screeplot function, also using the R-package stats⁵⁴ (v3.6.2). The samples were further coloured by whether they metastasised or not and shaped by their tissue of origin.

Testing signature performance against randomly selected genes

In order to test the performance of our signature against randomly selected genes, random genes of the 121 gene length were selected from 19,434 protein-coding genes. The analysis was repeated using the exact same pipeline to compare its performance against our signature. This process was repeated 1000 times without replacement using 1000 different seeds and the p value testing the significance of our signature was defined as a left-tailed event for predicting OS/PFS survival. This analysis was inspired from the SigCheck package⁶³ (v2.14.0) in R (v3.5.1), whereby the plotting function sigCheckPlotSurvival was modified to accept scores generated from our analysis.

Power calculation for the external validation datasets

To assess whether significant validation of our signature was likely in the external validation datasets, we performed a simulation-based power analysis considering Râ=â2500 Monte Carlo samples. Simulation parameters, like the proportion of events for both outcomes, the hazard ratios and the predictor and right-censored time-to-event distributions, were based on the AVAST-M study. The log-normal and exponential distributions were respectively chosen to model time-to-relapse and time-to-death from time-to-relapse. The normal distribution was selected to model censoring for both outcomes.

Reporting summary

Further information on research design is available in theÂ Nature Research Reporting Summary linked to this article.

Data availability

The raw RNA-sequencing data (forward and reverse fastq files) has been made available at the European Genome-Phenome Archive at the EBI under the following dataset accession ID: EGAD00001006401. The source data underlying this paper are also available through the GitHub repository: Manikgarg/MelanomaTranscriptomics [https://github.com/Manikgarg/MelanomaTranscriptomics/tree/master/Source_Data)⁶⁴. The clinical and gene expression data from The Cancer Genome Atlas (TCGA-SKCM), can be downloaded from the cBioPortal³⁷. Data from the Leeds Melanoma Cohort¹⁶, Lund Melanoma Cohort²² and the Australia Melanoma Genome Project²³ are available from the source publications.The MSigDB database⁶⁰ gene set collections are available for download from http://www.gsea-msigdb.org/gsea/downloads.jsp#msigdb.Â Source data are provided with this paper.

Code availability

The code to reproduce the results is available at the GitHub repository: Manikgarg/MelanomaTranscriptomics (https://github.com/Manikgarg/MelanomaTranscriptomics/tree/master/scripts)⁶⁴.

Change history

17 May 2022
A Correction to this paper has been published: https://doi.org/10.1038/s41467-022-30365-w

References

Siegel, R. L., Miller, K. D. & Jemal, A. Cancer statistics, 2020. CA Cancer J. Clin. 70, 7â30 (2020).
ArticleÂ PubMedÂ Google ScholarÂ
Spain, L., Larkin, J. & Turajlic, S. New survival standards for advanced melanoma. Br J Cancer 122, 1275â1276 (2020).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Whiteman, D. C., Green, A. C. & Olsen, C. M. The growing burden of invasive melanoma: projections of incidence rates and numbers of new cases in six susceptible populations through 2031. J. Investig. Dermatol. 136, 1161â1171 (2016).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Tarhini, A. A. et al. Phase III study of adjuvant ipilimumab (3 or 10 mg/kg) versus high-dose interferon alfa-2b for resected high-risk melanoma: North American Intergroup E1609. J. Clin. Oncol. 38, 567â575 (2020).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Robert, C. et al. Pembrolizumab versus ipilimumab in advanced melanoma (KEYNOTE-006): post-hoc 5-year results from an open-label, multicentre, randomised, controlled, phase 3 study. Lancet Oncol. 20, 1239â1251 (2019).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Maio, M. et al. Adjuvant vemurafenib in resected, BRAF(V600) mutation-positive melanoma (BRIM8): a randomised, double-blind, placebo-controlled, multicentre, phase 3 trial. Lancet Oncol. 19, 510â520 (2018).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Long, G. V. et al. Adjuvant dabrafenib plus trametinib in stage III BRAF-mutated melanoma. N. Engl. J. Med. 377, 1813â1823 (2017).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Luke, J. J. et al. KEYNOTE-716: Phase III study of adjuvant pembrolizumab versus placebo in resected high-risk stage II melanoma. Future Oncol. 16, 4429â4438 (2020).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Poklepovic, A. S. & Luke, J. J. Considering adjuvant therapy for stage II melanoma. Cancer 126, 1166â1174 (2019).
Bhutiani, N., Egger, M. E. & McMasters, K. M. Optimizing follow-up assessment of patients with cutaneous melanoma. Ann. Surg. Oncol. 24, 861â863 (2017).
ArticleÂ PubMedÂ Google ScholarÂ
Jonsson, G. et al. Gene expression profiling-based identification of molecular subtypes in stage IV melanomas with different clinical outcome. Clin. Cancer Res. 16, 3356â3367 (2010).
ArticleÂ PubMedÂ Google ScholarÂ
Gerami, P. et al. Development of a prognostic genetic signature to predict the metastatic risk associated with cutaneous melanoma. Clin Cancer Res 21, 175â183 (2015).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Zager, J. S. et al. Performance of a prognostic 31-gene expression profile in an independent cohort of 523 cutaneous melanoma patients. BMC Cancer 18, 130 (2018).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Gastman, B. R. et al. Identification of patients at risk for metastasis using a prognostic 31-gene expression profile in subpopulations of melanoma patients with favorable outcomes by standard criteria. J. Am. Acad. Dermatol. 80, 149â157 (2018).
Vetto, J. T. et al. Guidance of sentinel lymph node biopsy decisions in patients with T1-T2 melanoma using gene expression profiling. Future Oncol. 15, 1207â1217 (2019).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Thakur, R. et al. Transcriptomic analysis reveals prognostic molecular signatures of stage I melanoma. Clin. Cancer Res. 25, 7424â7435 (2019).
Kovarik, C. L., Chu, E. Y. & Adamson, A. S. Gene expression profile testing for thin melanoma: evidence to support clinical use remains thin. JAMA Dermatol. 156, 837â838 (2020).
ArticleÂ PubMedÂ Google ScholarÂ
Corrie, P. G. et al. Adjuvant bevacizumab in patients with melanoma at high risk of recurrence (AVAST-M): preplanned interim results from a multicentre, open-label, randomised controlled phase 3 study. Lancet Oncol. 15, 620â630 (2014).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Corrie, P. G. et al. Adjuvant bevacizumab for melanoma patients at high risk of recurrence: survival analysis of the AVAST-M trial. Ann. Oncol. 29, 1843â1852 (2018).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Balch, C. M. et al. Final version of 2009 AJCC melanoma staging and classification. J. Clin. Oncol. 27, 6199â6206 (2009).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Genomic Classification of Cutaneous Melanoma. Cell 161, 1681â1696 (2015).
ArticleÂ Google ScholarÂ
Harbst, K. et al. Molecular profiling reveals low- and high-grade forms of primary melanoma. Clin. Cancer Res. 18, 4026â4036 (2012).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Hayward, N. K. et al. Whole-genome landscapes of major melanoma subtypes. Nature 545, 175â180 (2017).
ArticleÂ ADSÂ CASÂ PubMedÂ Google ScholarÂ
Venet, D., Dumont, J. E. & Detours, V. Most random gene expression signatures are significantly associated with breast cancer outcome. PLoS Comput. Biol. 7, e1002240 (2011).
Angelova, M. et al. Characterization of the immunophenotypes and antigenomes of colorectal cancers reveals distinct tumor escape mechanisms and novel targets for immunotherapy. Genome Biol. 16, 64 (2015).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Clark, W. H. Jr. et al. Model predicting survival in stage I melanoma based on tumor progression. J. Natl Cancer Inst. 81, 1893â1904 (1989).
ArticleÂ PubMedÂ Google ScholarÂ
Azimi, F. et al. Tumor-infiltrating lymphocyte grade is an independent predictor of sentinel lymph node status and survival in patients with cutaneous melanoma. J. Clin. Oncol. 30, 2678â2683 (2012).
ArticleÂ PubMedÂ Google ScholarÂ
Berman, B. et al. Appropriate use criteria for the integration of diagnostic and prognostic gene expression profile assays into the management of cutaneous malignant melanoma: an expert panel consensus-based modified delphi process assessment. SKIN J. Cutan. Med. 3, 291â306 (2019).
ArticleÂ Google ScholarÂ
van Houdt, I. S. et al. Favorable outcome in clinically stage II melanoma patients is associated with the presence of activated tumor infiltrating T-lymphocytes and preserved MHC class I antigen expression. Int. J. Cancer 123, 609â615 (2008).
ArticleÂ PubMedÂ Google ScholarÂ
Krynitz, B., Rozell, B. L., Lyth, J., Smedby, K. E. & Lindelof, B. Cutaneous malignant melanoma in the Swedish organ transplantation cohort: a study of clinicopathological characteristics and mortality. J. Am. Acad. Dermatol. 73, 106â113.e2 (2015).
ArticleÂ PubMedÂ Google ScholarÂ
Thomas, N. E. et al. Tumor-infiltrating lymphocyte grade in primary melanomas is independently associated with melanoma-specific survival in the population-based genes, environment and melanoma study. J. Clin. Oncol. 31, 4252â4259 (2013).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Donizy, P. et al. Paucity of tumor-infiltrating lymphocytes is an unfavorable prognosticator and predicts lymph node metastases in cutaneous melanoma patients. Anticancer Res. 35, 351â358 (2015).
PubMedÂ Google ScholarÂ
Nsengimana, J. et al. beta-Catenin-mediated immune evasion pathway frequently operates in primary cutaneous melanomas. J. Clin. Investig. 128, 2048â2063 (2018).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Pruessmann, W. et al. Molecular analysis of primary melanoma T cells identifies patients at risk for metastatic recurrence. Nat. Cancer 1, 197â209 (2020).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Vilain, R. E. et al. Dynamic changes in PD-L1 expression and immune infiltrates early during treatment predict response to PD-1 blockade in melanoma. Clin. Cancer Res. 23, 5024â5033 (2017).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Grossman, D. et al. Prognostic gene expression profiling in melanoma: necessary steps to incorporate into clinical practice. Melanoma Manag. 6, Mmt32 (2019).
Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401â404 (2012).
ArticleÂ PubMedÂ Google ScholarÂ
Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Anders, S., Pyl, P. T. & Huber, W. HTSeqâa Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166â169 (2015).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Hothorn, T., Hornik, K., van de Wiel, M. A. & Zeileis, A. Lego system for conditional inference. Am. Stat. 60, 257â263 (2006).
ArticleÂ MathSciNetÂ Google ScholarÂ
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Therneau, T. A package for survival analysis in R. R package version 3.1-12 https://CRAN.R-project.org/package=survival (2015).
Zhu, A., Ibrahim, J. G. & Love, M. I. Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. Bioinformatics 35, 2084â2092 (2019).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Max Kuhn. Contributions from Jed Wing, S.W., Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem, Luca Scrucca, Yuan Tang, Can Candan and Tyler Hunt. caret: Classification and Regression Training. R package version 6.0-84. https://CRAN.R-project.org/package=caret (2019).
KÃ¶ster, J. & Rahmann, S. Snakemakeâa scalable bioinformatics workflow engine. Bioinformatics 28, 2520â2522 (2012).
ArticleÂ PubMedÂ Google ScholarÂ
Gelman, A., Jakulin, A., Pittau, M. G. & Su, Y. S. A weakly informative default prior distribution for logistic and other regression models. Ann. Appl. Stat. 2, 1360â1383 (2009).
MathSciNetÂ MATHÂ Google ScholarÂ
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1â22 (2010).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Thomas, C. & Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. theory 13, 21â27 (1967).
ArticleÂ MATHÂ Google ScholarÂ
Fisher, R. A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 7, 179â188 (1936).
ArticleÂ Google ScholarÂ
Breiman, L. Random forests. Mach. Learn. 45, 5â32 (2001).
ArticleÂ MATHÂ Google ScholarÂ
Boser, B., Guyon, I. & Vapnik, V. An algorithm for optimal margin classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 144â152 (ACM, Pittsburgh, PA, 1992).
Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 77 (2011).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837â845 (1988).
ArticleÂ CASÂ PubMedÂ MATHÂ Google ScholarÂ
R Core Team. R: a Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2020).
Kassambara, A., Kosinski, M. & Biecek, P. survminer: Drawing Survival Curves using âggplot2â. https://cran.r-project.org/web/packages/survminer/index.html (2020).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2016).
Kassambara, A. ggpubr: âggplot2â Based Publication Ready Plots Vol. R package version 0.2.5 https://cran.r-project.org/web/packages/ggpubr/index.html (2020).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545â15550 (2005).
ArticleÂ ADSÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Mootha, V. K. et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34, 267â273 (2003).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417â425 (2015).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Saldanha, G., Flatman, K., Teo, K. W. & Bamford, M. A novel numerical scoring system for melanoma tumor-infiltrating lymphocytes has better prognostic value than standard scoring. Am. J. Surg. Pathol. 41, 906â914 (2017).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Park, C. K. & Kim, S. K. Clinicopathological significance of intratumoral and peritumoral lymphocytes and lymphocyte score based on the histologic subtypes of cutaneous melanoma. Oncotarget 8, 14759â14769 (2017).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Stark, R. & Norden, J. SigCheck: check a gene signatureâs prognostic performance against random signatures, known signatures, and permuted data/metadata. R package version 2.18.0. https://rdrr.io/bioc/SigCheck/ (2019).
Garg, M. Melanoma transcriptomics, v1.0.0 edn https://doi.org/10.5281/zenodo.4420033 (2020).

Download references

Acknowledgements

We would like to thank Neera Maroo, Leticia Campo and the Translational Histopathology Laboratory at the Oxford Department of Oncology for sample biobanking and processing, as well as digitally scanning the primary tumour H&E slides. Vivek Iyer for cross-checking the differential expression analyses. Wolfgang Huber and Janet M. Thornton for advice on machine learning analyses. Michael I. Love for advice on data normalisation for machine learning. Nicholas K. Hayward for sharing RNA-sequencing data from the Australia Melanoma Genome Project. Danish Memon for expert input on immune cell deconvolution and Andrea Manrique-Rincon for her help on the critical interpretation of these analyses. Alastair Droop for critical review of the manuscript. This work was supported by Cancer Research UK and the Wellcome Trust. Bevacizumab was supplied by Genentech pharmaceuticals. R.R. was partially supported by the 2019 Cycle for Survivalâs Equinox Innovation Award. N.A.F. was partially supported by the European Unionâs Horizon 2020 research and innovation programme under grant agreement no. 668981. The Leeds Melanoma Cohort research was carried out with research funding CR-UK; C588/A19167, C8216/A6129 and C588/A10721 and NIH; CA83115. M.R.M. is supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health. R.A.S. supported by a National Health and Medical Research Council of Australia (NHMRC) Program Grant and Practitioner Fellowship. G.V.L. is supported by the University of Sydney Medical Foundation. Support from the Ainsworth Foundation, the Cameron Family, as well as from colleagues at Melanoma Institute Australia and Royal Prince Alfred Hospital are also gratefully acknowledged.

Author information

These authors contributed equally: David J. Adams, Alvis Brazma, Roy Rabbie.

Authors and Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, UK
Manik GargÂ &Â Alvis Brazma
Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, UK
Dominique-Laurent Couturier
University of Leeds School of Medicine, Leeds, United Kingdom
JÃ©rÃ©mie Nsengimana,Â Julia Newton-BishopÂ &Â D. Timothy Bishop
Biostatistics Research Group, Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
JÃ©rÃ©mie Nsengimana
CIBIO/InBIO-Centro de InvestigaÃ§Ã£o em Biodiversidade e Recursos GenÃ©ticos, Universidade do Porto, Rua Padre Armando Quintas, 4485-601, VairÃ£o, Portugal
Nuno A. Fonseca
Oncology Biomarker Development, Genentech Inc., 1 DNA Way, South San Francisco, CA, 94080, USA
Matthew WongchenkoÂ &Â Yibing Yan
Lund University Cancer Center, Lund University, Lund, Sweden
Martin LaussÂ &Â GÃ¶ran B. JÃ¶nsson
Cambridge Cancer Centre, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
Christine Parkinson,Â Pippa CorrieÂ &Â Roy Rabbie
Oxford NIHR Biomedical Research Centre and Department of Oncology, University of Oxford, Oxford, UK
Mark R. Middleton
Department of Pathology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
Sarah McDonald,Â Nikki StefanosÂ &Â John Tadross
Melanoma Institute Australia, The University of Sydney, North Sydney, NSW, Australia
Ismael A. Vergara,Â Serigne Lo,Â James S. Wilmott,Â John F. Thompson,Â Georgina V. LongÂ &Â Richard A. Scolyer
Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
Ismael A. Vergara,Â Serigne Lo,Â James S. Wilmott,Â John F. Thompson,Â Georgina V. LongÂ &Â Richard A. Scolyer
Institute for Research and Medical Consultations (IRMC), Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
Serigne Lo
QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
Felicity Newell
Discipline of Surgery, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
John F. Thompson
Royal North Shore and Mater Hospitals, Sydney, Australia
Georgina V. Long
Tissue Pathology and Diagnostic Oncology, Royal Prince Alfred Hospital and New South Wales Health Pathology, Sydney, NSW, Australia
Richard A. Scolyer
Experimental Cancer Genetics, The Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
David J. AdamsÂ &Â Roy Rabbie

Authors

Manik Garg
View author publications
You can also search for this author in PubMedÂ Google Scholar
Dominique-Laurent Couturier
View author publications
You can also search for this author in PubMedÂ Google Scholar
Nuno A. Fonseca
View author publications
You can also search for this author in PubMedÂ Google Scholar
Matthew Wongchenko
View author publications
You can also search for this author in PubMedÂ Google Scholar
Yibing Yan
View author publications
You can also search for this author in PubMedÂ Google Scholar
Martin Lauss
View author publications
You can also search for this author in PubMedÂ Google Scholar
GÃ¶ran B. JÃ¶nsson
View author publications
You can also search for this author in PubMedÂ Google Scholar
Julia Newton-Bishop
View author publications
You can also search for this author in PubMedÂ Google Scholar
Christine Parkinson
View author publications
You can also search for this author in PubMedÂ Google Scholar
Mark R. Middleton
View author publications
You can also search for this author in PubMedÂ Google Scholar
D. Timothy Bishop
View author publications
You can also search for this author in PubMedÂ Google Scholar
Sarah McDonald
View author publications
You can also search for this author in PubMedÂ Google Scholar
Nikki Stefanos
View author publications
You can also search for this author in PubMedÂ Google Scholar
John Tadross
View author publications
You can also search for this author in PubMedÂ Google Scholar
Ismael A. Vergara
View author publications
You can also search for this author in PubMedÂ Google Scholar
Serigne Lo
View author publications
You can also search for this author in PubMedÂ Google Scholar
Felicity Newell
View author publications
You can also search for this author in PubMedÂ Google Scholar
James S. Wilmott
View author publications
You can also search for this author in PubMedÂ Google Scholar
John F. Thompson
View author publications
You can also search for this author in PubMedÂ Google Scholar
Georgina V. Long
View author publications
You can also search for this author in PubMedÂ Google Scholar
Richard A. Scolyer
View author publications
You can also search for this author in PubMedÂ Google Scholar
Pippa Corrie
View author publications
You can also search for this author in PubMedÂ Google Scholar
David J. Adams
View author publications
You can also search for this author in PubMedÂ Google Scholar
Alvis Brazma
View author publications
You can also search for this author in PubMedÂ Google Scholar
Roy Rabbie
View author publications
You can also search for this author in PubMedÂ Google Scholar

Contributions

R.R., A.B. and D.J.A. conceived the project. MRM co-ordinated the sample biobanking. M.W. and Y.Y. co-ordinated the mRNA extraction and RNA sequencing from the AVAST-M trial. R.R. co-ordinated the sequencing and clinical data extraction. N.A.F. derived the RNA-seq counts from raw sequencing reads. R.R., D.J.A., A.B. and M.G. developed the clinical questions and experiments and M.G. carried out the analysis. D.L.C. ran the statistical analyses. J.N., T.B. and J.N.B. validated the findings in the Leeds Melanoma Cohort. M.L. and G.B.J. validated the findings in the Lund Melanoma Cohort. P.C., M.R.M. and C.P. provided senior input on the translational scope of the project. S.M., N.S. and J.T. annotated histopathological slides with the tumour-infiltrating scores. I.V., S.L., F.N., J.S.W., J.F.T., G.V.L. and R.A.S. facilitated validation within the Australian Melanoma Genome Project. M.G. and R.R. wrote the manuscript. A.B. and D.J.A. provided overall study supervision. All authors approved the final manuscript.

Corresponding author

Correspondence to Roy Rabbie.

Ethics declarations

Competing interests

P.C. is an advisory board member at Roche. M.W./Y.Y. are employees and stockholders at Roche/Genentech. D.J.A. is a consultant at Microbiotica. R.A.S. has received fees for professional services from Qbiotics, Novartis, MSD Sharp & Dohme, NeraCare, AMGEN Inc., Bristol-Myers Squibb, Myriad Genetics, GlaxoSmithKline. G.V.L. is consultant advisor for Aduro Biotech Inc, Amgen Inc, Array Biopharma inc, Boehringer Ingelheim International GmbH, Bristol-Myers Squibb, Highlight Therapeutics S.L., Merck Sharpe & Dohme, Novartis Pharma AG, QBiotics Group Limited, Regeneron Pharmaceuticals Inc, SkylineDX B.V. J.F.T. has received honoraria for advisory board participation from BMS Australia, MSD Australia, GSK and Provectus Inc, and travel support from GlaxoSmithKline and Provectus Inc. All other authors report no conflicts of interest.

Additional information

Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisherâs note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Updated Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the articleâs Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the articleâs Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Garg, M., Couturier, DL., Nsengimana, J. et al. Tumour gene expression signature in primary melanoma predicts long-term outcomes. Nat Commun 12, 1137 (2021). https://doi.org/10.1038/s41467-021-21207-2

Download citation

Received: 10 March 2020
Accepted: 15 January 2021
Published: 18 February 2021
DOI: https://doi.org/10.1038/s41467-021-21207-2