Next generation pan-cancer blood proteome profiling using proximity extension assay

Emma Lundin


Nature Communications


Article
https://doi.org/10.1038/s41467-023-39765-y
Nature Communications | (2023)14:4308

Received: 26 October 2022
Accepted: 27 June 2023

María Bueno Álvez 1, Fredrik Edfors 1, Kalle von Feilitzen 1, Martin Zwahlen 1, Adil Mardinoglu 1,2, Per-Henrik Edqvist 3, Tobias Sjöblom 3, Emma Lundin 3, Natallia Rameika 3, Gunilla Enblad 3, Henrik Lindman 3, Martin Höglund 4, Göran Hesselager 4, Karin Stålberg 5, Malin Enblad 6, Oscar E. Simonson 6, Michael Häggman 6, Tomas Axelsson 4, Mikael Åberg 7, Jessica Nordlund 4, Wen Zhong 8, Max Karlsson 1, Ulf Gyllensten 3, Fredrik Ponten 3, Linn Fagerberg 1 & Mathias Uhlén 1,9

1 Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden.
2 Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King’s College London, London SE1 9RT, UK.
3 Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden.
4 Department of Medical Sciences, Uppsala University, Uppsala, Sweden.
5 Department of Women’s and Children’s Health, Uppsala University, Uppsala, Sweden.
6 Department of Surgical Sciences, Uppsala University, Uppsala, Sweden.
7 Department of Medical Sciences, Clinical Chemistry and SciLifeLab Affinity Proteomics, Uppsala University, Uppsala, Sweden.
8 Science for Life Laboratory, Department of Biomedical and Clinical Sciences (BKV), Linköping University, Linköping, Sweden.
9 Department of Neuroscience, Karolinska Institutet, Stockholm, Sweden.
e-mail: mathias.uhlen@scilifelab.se

A comprehensive characterization of blood proteome profiles in cancer patients can contribute to a better understanding of the disease etiology, resulting in earlier diagnosis, risk stratification and better monitoring of the different cancer subtypes. Here, we describe the use of next generation protein profiling to explore the proteome signature in blood across patients representing many of the major cancer types. Plasma profiles of 1463 proteins from more than 1400 cancer patients are measured in minute amounts of blood collected at the time of diagnosis and before treatment. An open access Disease Blood Atlas resource allows the exploration of the individual protein profiles in blood collected from the individual cancer patients. We also present studies in which classification models based on machine learning have been used for the identification of a set of proteins associated with each of the analyzed cancers. The implication for cancer precision medicine of next generation plasma profiling is discussed.

Cancer is a highly heterogeneous disease in need of accurate and non-invasive diagnostic tools. Cancer Precision Medicine aims to enable high-resolution individualized diagnosis by the use of molecular tools such as genomics, proteomics and metabolomics, with subsequent optimized treatment and monitoring of cancer patients. Of particular importance is the possibility to identify cancers early, allowing initiation of treatment and thereby improving patient outcome by avoiding tumor progression, metastasis, and emergence of treatment-resistant tumors. When cancers are detected at an earlier stage, treatment is more effective and survival is drastically improved [1]. As an example, according to US-based statistics [2], the five-year survival for breast cancer is 99% when detected at an early stage (localized), whereas survival decreases to only 30% when detected at later stages (metastasized). Similarly, the corresponding survival for ovarian cancer is 93% at early stage and 31% when detected at later stage [2]. Based on this, several population screening programs have been initiated to identify cancer before symptoms arise, including screening for prostate cancer using PSA protein level [3], colorectal cancer by detecting blood in feces [4], and breast cancer using mammography [5].

The main focus of Cancer Precision Medicine in the past decade has been to use genomics, involving next-generation sequencing to explore the genetic make-up of individual cancers. Huge efforts have been made to gain genetic insight into tumors from patients, including The Cancer Genome Atlas (TCGA) [6,7]; the International Cancer Genome Consortium (ICGC) [8]; and the Pan-Cancer Analysis of Whole Genomes (PCAWG) consortium [9]. Although invaluable insights regarding the biology of individual cancers have been gained by these efforts, the genomics information has not led to substantial changes in therapeutic regimes or facilitated screening for cancer in the population. Therefore, a move towards a multi-omics analysis has been suggested [10], including functional analysis and alternative assay platforms, such as proteomics using either dissected tumor biopsies or non-invasive body fluids [11].

An interesting approach in Cancer Precision Medicine is thus to use protein profiling to allow for liquid biopsy assays from minute amounts of blood. An attractive vision would be to allow multiple cancer types to be screened and detected using a single multiplex protein assay. However, the staggering dynamic range in concentrations of blood proteins, spanning at least ten orders of magnitude with concentrations as low as pg/ml for cytokines, makes multiplex analysis involving even a handful of protein targets difficult. This has hampered the development of multiplex blood protein assays during the last few decades.

This situation has now changed with the recent development of high-throughput platforms for sensitive proteomics assays in blood, such as Somascan [12] and Proximity Extension Assay (PEA) [13]. These platforms allow thousands of target proteins to be analyzed simultaneously using a few microliters of blood, with sensitivity to detect and quantify proteins present in low femtomolar amounts. This means that even proteins well below the detection level for mass spectrometry can now be accurately quantified and used for population screening.

Here, we describe a strategy for pan-cancer analysis in which the plasma profiles of patients with different types of cancer are compared to find cancer-specific signatures that can distinguish each type of cancer from other cancer types. Next Generation Blood Profiling [14], combining the antibody-based PEA with next-generation sequencing, has been used to quantify protein concentrations in multiple cancer types. Samples of more than 1400 cancer patients from a standardized biobank collection have been analyzed, along with a wealth of clinical metadata [15]. Altogether 12 cancer types, including the most prevalent types such as colorectal, breast, lung, and prostate cancer, have been studied. The data is presented in the Disease Blood Atlas resource, which is available without restrictions (open access) to allow researchers both from academia and industry to explore the individual blood protein profiles from cancer patients. We also present initial studies in which classification models based on machine learning have been used to identify a panel of proteins associated with each of the analyzed cancers.

Results

The pan-cancer cohort

In this study, we have characterized the plasma proteome of a pan-cancer cohort from the Uppsala-Umeå Comprehensive Cancer Consortium (U-CAN) biobank [15], comprising 1477 patients from twelve cancer types, including acute myeloid leukemia (AML) (n = 50), chronic lymphocytic leukemia (CLL) (n = 48), diffuse large B-cell lymphoma (DLBCL) (n = 55), myeloma (n = 38), colorectal cancer (n = 221), lung cancer (n = 268), glioma (n = 145), breast cancer (n = 152), cervical cancer (n = 102), endometrial cancer (n = 101), ovarian cancer (n = 134), and prostate cancer (n = 163). Plasma samples were collected at the time of diagnosis and before treatment was initiated. Summary statistics for the cancer cohorts regarding age, sex, grade, and stage distribution are available in Suppl. data 1. A summary of the age distribution of the cancer patients is shown in Fig. 1a, and the clinical metadata (age, sex, diagnosis, and cancer stage or grade) for the cancer samples are available in Suppl. data 2.

The open access Human Disease Blood Atlas resource

The Human Disease Blood Atlas resource has been created as part of the Human Protein Atlas (v22.proteinatlas.org). This section contains more than 2 million data points representing the individual blood level for target proteins in 1477 cancer patients. The individual protein levels in blood are presented across these cancer patients, characterized using the Olink Explore 1536 Proximity Extension Assay (PEA) technology, allowing the quantification of 1463 proteins using less than 3 microliters of plasma [13]. The Olink Explore has been shown to be a robust platform [13], and we here report on the coefficient of variation (CV) with an average IntraCV of 13.3% and average InterCV of 21.1% (Fig. S1a), and a high interpanel correlation for assays used as technical controls (r = 0.97 for IL6, r = 0.96 for CXCL8 and r = 0.91 for TNF) (Fig. S1b). Several upregulated and downregulated proteins in specific cancer types can be observed, as exemplified in Fig. 1b. Some of these potential biomarkers are cancer-specific, such as Fms-related receptor tyrosine kinase 3 (FLT3) in AML and SLAM family member 7 (SLAMF7) in myeloma, while others are found to be elevated in two or more cancers, such as lymphocyte antigen 9 (LY9) with higher expression in both CLL and myeloma. Interestingly, the B lymphocyte antigen receptor CD79b molecule (CD79B) exhibits elevated plasma levels in all four immune cell-related cancers. Figure 1c shows an overview of our workflow used to identify cancer-associated proteins based on both differential expression analysis and classification models.
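As a quick arithmetic check on the numbers above (an illustration added here, not part of the study's analysis), the twelve cohort sizes sum to 1477, and 1477 patients times 1463 proteins gives roughly 2.16 million values. That product is an upper bound that assumes every protein was measured in every sample; the figure captions report fewer than 1477 samples for some proteins.

```python
# Cohort sizes as reported in the text for the twelve cancer types.
cohort_sizes = {
    "AML": 50, "CLL": 48, "DLBCL": 55, "Myeloma": 38,
    "Colorectal": 221, "Lung": 268, "Glioma": 145, "Breast": 152,
    "Cervical": 102, "Endometrial": 101, "Ovarian": 134, "Prostate": 163,
}
N_PROTEINS = 1463  # proteins quantified by the assay, per the text

n_patients = sum(cohort_sizes.values())
# Upper bound on data points: assumes no missing measurements (an assumption).
max_data_points = n_patients * N_PROTEINS

print(n_patients)       # 1477
print(max_data_points)  # 2160851
```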
Identification of cancer-specific proteins using differential expression

To investigate the cancer-specific proteome profiles, differential expression analyses were performed where each cancer was compared to all other cancers (Fig. 1c). For the male and female cancers, only samples with the same sex were compared. The up- and downregulated proteins in each cancer are summarized by volcano plots (Fig. 2a and Fig. S2a). For glioma, the significantly upregulated proteins include the glial fibrillary acidic protein (GFAP), a protein with enriched expression in astrocytes according to the Human Protein Atlas (v22.proteinatlas.org), and for AML, the most significant protein is FLT3, a protein with elevated expression in lymphoid tissues. FKBP prolyl isomerase 1B, a protein shown by HPA to be elevated in regulatory T-cells, is upregulated in colorectal cancer, while progesterone associated endometrial protein (PAEP), a protein secreted in the female reproductive tissues according to HPA, is significantly upregulated in ovarian cancer.

The results for all 12 cancer types can be found on the interactive Disease Blood Atlas resource with links to the underlying blood levels for all analyzed proteins. In Fig. 2b, the numbers of up- and downregulated proteins are shown across the 12 cancers. The results show that a large fraction of the analyzed proteins is differentially expressed. The overlap between proteins upregulated in more than one cancer type is shown in Fig. 2c. As expected, there is a large number of upregulated proteins shared by the four immune cell-related cancers (AML, CLL, lymphoma, and myeloma), in many cases consisting of proteins with immune-related functions. However, the largest number of overlapping proteins is observed for lung and colorectal cancer. This observation might reflect common features between these two cancer types, such as adenocarcinoma origin and a high fraction of high-grade tumors with likely similar host inflammatory response.

A functional gene ontology (GO) analysis was also performed for the upregulated proteins for each of the cancer types (Fig. S2b).
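The Fig. 2 caption states that the differential expression p-values come from a two-sided t-test followed by Benjamini-Hochberg correction. The sketch below illustrates only the correction step in plain Python. The function name, the toy p-values, and the 0.05 cutoff are hypothetical choices made for illustration, and the t-test itself is left to a statistics library; this is not the authors' code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per protein, for a single cancer-vs-rest comparison.
toy_p = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(toy_p)
# Assumed significance cutoff of 0.05 on the adjusted values.
significant = [q < 0.05 for q in adjusted]
```

In a real analysis the correction would be applied once per cancer across all 1463 protein-level tests.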

As expected, the upregulated proteins in the immune cell-related cancers (AML, CLL, and lymphoma) are related to immune processes, while breast, endometrial, and prostate cancer have an over-representation of cell adhesion proteins, and both lung and colorectal cancer have an over-representation of apoptosis-related proteins.

[Fig. 1 panels: a, age and number of samples per cancer, colored by sex (male, female); b, NPX levels of FLT3, SLAMF7, LY9 and CD79B across cancers; c, workflow. Workflow labels in panel c: sample collection (U-CAN cohort, 1,477 cancer patients, 12 cancer types; Wellness cohort, 74 healthy patients); Proximity Extension Assay (1,463 protein targets); data preprocessing; differential expression (compare each cancer to the rest); cancer-specific classification (classify each cancer against the rest, machine learning models based on glmnet); pan-cancer multiclassification (identify a pan-cancer protein panel, evaluate classification based on the pan-cancer panel, ROC AUC); assessment to healthy individuals (classify each cancer from a healthy cohort based on cancer-specific panel proteins).]

Cancer-specific classification models

To identify proteins relevant for each cancer type, a disease classification model was built for each cancer, using all measured proteins as input (n = 1463) and 70% of the cancer patients as the training set (Fig. 1c). To build the models, the machine learning algorithm glmnet [16], which is based on regularized generalized linear models, was selected.
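A minimal sketch of how the samples for one "cancer versus rest" model might be assembled: a 70/30 split, with controls drawn from the other cancers and subsampled to a similar size, as the text describes. Everything else is an assumption made for illustration: the function name, the exact 1:1 case-to-control ratio, subsampling before splitting, and the omission of sex matching. The model fit itself (glmnet in the study) is not shown.

```python
import random


def build_split(labels, target, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) for a 'target vs. rest' model.

    labels: list with one cancer-type label per sample.
    Controls are subsampled to the number of cases (assumed 1:1 ratio),
    then cases and controls are each split by train_fraction.
    """
    rng = random.Random(seed)
    cases = [i for i, lab in enumerate(labels) if lab == target]
    controls = [i for i, lab in enumerate(labels) if lab != target]
    # Subsample controls so both groups have a similar size.
    controls = rng.sample(controls, min(len(cases), len(controls)))

    train_idx, test_idx = [], []
    for group in (cases, controls):
        rng.shuffle(group)
        cut = round(train_fraction * len(group))
        train_idx.extend(group[:cut])
        test_idx.extend(group[cut:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 10 samples of cancer "A", 40 samples of other cancers.
toy_labels = ["A"] * 10 + ["B"] * 25 + ["C"] * 15
train_idx, test_idx = build_split(toy_labels, target="A", seed=1)
```

Splitting cases and controls separately keeps the class balance the same in the training and test sets.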
The control group in each model was composed of all the other cancer samples and was subsampled to include a similar number of patients to the modeled cancer. For the male and female cancers, only samples with the same sex were used as controls.

Fig. 1 | Overview of the pan-cancer study. a Age distribution and number of patients included for each cancer and the healthy cohort. b Examples of protein levels for four example proteins across the 12 cancer types. Boxplots summarize the median value, with lower and upper hinges corresponding to the first and third quartiles, and whiskers indicating the minimum and maximum values within 1.5 times the IQR. Individual data points are presented for each cancer group, with n = 1462, n = 1402, n = 1462, and n = 1399 independent samples for CD79B, FLT3, LY9, and SLAMF7, respectively. c Schematic representation of the workflow used in this study. Blood plasma from 1477 cancer patients and 74 healthy individuals was analyzed using Proximity Extension Assay. Differential expression analysis and classification models were used to compare one cancer to all other cancers and identify cancer-associated proteins. The models for cancer classification were generated using machine learning techniques (70% of the data in training set). The resulting pan-cancer protein panel was used in a pan-cancer multiclassification strategy, and the performance tested against a test set (30% of the data) and ultimately compared against healthy individuals. Source data are provided as a Source data file. AML acute myeloid leukemia, CLL chronic lymphocytic leukemia, DLBCL diffuse large B-cell lymphoma.

[Fig. 2 panels: a, volcano plots with NPX_difference on the x-axis and −log10(p.adjusted) on the y-axis, colored by significance (not significant, significant down, significant up); b, number of proteins per cancer by significance; c, upset plot with intersection size and set size.]

Fig. 2 | Differential expression analysis. a Volcano plots summarizing the differential expression results for AML, colorectal, glioma, and ovarian cancer. Corresponding results for all 12 cancers are shown in Fig. S2. P-values are calculated using a two-sided t-test, with Benjamini-Hochberg multiple hypothesis correction. b Barplot showing the number of proteins significantly upregulated, significantly downregulated, or with no significant differential expression for all cancer types. c Upset plot showing the number of upregulated proteins shared by the different cancer types. The top barplot shows the total number of upregulated proteins per cancer. Source data are provided as a Source data file. AML acute myeloid leukemia, CLL chronic lymphocytic leukemia, DLBCL diffuse large B-cell lymphoma.

The training of a glmnet model results in an estimation of the overall importance of each protein to a model (ranging from 0 to 100%), revealing how many proteins are relevant to the specific classification problem and to what extent. In Fig. 3a, the number of proteins contributing to each cancer classification model is shown. Note that many proteins have a relatively high importance score for some of the cancers, including colorectal and lung cancers, while for other cancers, such as the hematological cancers and glioma, relatively few proteins contribute to the classification model.
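To make the importance profiles concrete, the sketch below summarizes a set of importance scores the way Fig. 3 does: how many proteins have a positive score, how many exceed a cutoff, and which rank highest. The function name, the placeholder protein names, and the toy scores are hypothetical; the 25% cutoff is the value the text uses for a "high" score. How the scores themselves are derived from a fitted model is not shown and is not assumed here.

```python
def summarize_importance(scores, threshold=25.0, top_n=10):
    """Summarize importance scores (protein name -> percent, 0 to 100)."""
    # Keep only proteins that contribute to the model at all.
    positive = {p: s for p, s in scores.items() if s > 0}
    # Rank contributing proteins from highest to lowest score.
    ranked = sorted(positive, key=lambda p: positive[p], reverse=True)
    return {
        "n_positive": len(positive),
        "n_above_threshold": sum(1 for s in positive.values() if s > threshold),
        "top": ranked[:top_n],
    }


# Toy scores for placeholder proteins P1..P4 (not real results).
toy_scores = {"P1": 100.0, "P2": 40.0, "P3": 12.5, "P4": 0.0}
summary = summarize_importance(toy_scores)
print(summary["n_positive"])         # 3
print(summary["n_above_threshold"])  # 2
print(summary["top"])                # ['P1', 'P2', 'P3']
```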
This suggests that some of the cancers require a higher number of proteins to be included in the model to classify the cancer samples from the controls. For some cancers, such as glioma, one protein (GFAP) is given a high score with considerably lower scores for the other proteins (<50%), while in other cancers there is a continuum of importance scores, such as AML or colorectal cancer. In Fig. S3a, a heatmap visualization shows the importance score for the 486 proteins that scored high Nature Communications | (2023)14:4308 4 Article https://doi.org/10.1038/s41467-023-39765-y b Colorectal 0 473 proteins 500 0 −2 Lung 367 proteins 500 5.0 2.5 0.0 −2.5 Breast 328 proteins 500 4 3 2 1 0 −1 Cervical 315 proteins 500 GLO1 LTA4H ICAM4 CPB1 PLXNB2 IL1RAP MUC16 GLO1 CD33 CHRDL2 LY75 6 4 2 Prostate Prostate 0 220 proteins 500 DNER PRDX6 DNER BGN IL20 FAP CXCL6 CD34 CDH17 CRTAC1 IL18RAP 2 0 −2 Endometrial Endometrial 203 proteins 500 PLAT PSG1 PLAT VEGFD DNAJB8 HS6ST1 TNFSF10 WFDC2 SERPINA11 DPT ARNT 5.0 2.5 0.0 Glioma NPX 0 Glioma 0 55 proteins 500 GFAP BCAN IGFBP1 ADAMTS13 LAG3 CALCOCO1 ADAMTS16 FKBP4 LAMP2 DLK1 −2.5 GFAP 8 4 0 DLBCL DLBCL 0 43 proteins 500 CXCL9 KLK12 CXCL9 CXCL13 DCXR SERPINA9 NME3 WFIKKN2 ICOSLG LSM1 CELA3A 6 3 0 AML AML 0 33 proteins 500 CD244 5 4 3 2 1 0 −1 CD244 FLT3LG FLT3 MMP9 LCN2 PGLYRP1 TNFSF13B TNFRSF10C RGMA FCGR3B Ovarian Ovarian 0 22 proteins 500 PAEP PAEP CDH3 ICAM3 SSC5D VTCN1 ICAM2 LAMP2 PSPN ADA PRDX6 2.5 0.0 −2.5 −5.0 CLL CLL 0 20 proteins 500 TCL1A 8 TCL1A STC1 FCRL2 CD22 FCER2 CD6 FCRL1 NPM1 APEX1 CD5 4 0 Myeloma Myeloma 500 100 2 0 −2 0 25 50 75 100 Protein importance (%) C ol Protein importance (%) 4 (>25% importance) in at least one of the cancer types by glmnet. Moreover, several proteins scored high (>25% importance) in more than one cancer, as shown in the network visualization revealing relationships between the potential biomarkers in the different cancer types (Fig. S3b). In Fig. 
3b, the ten proteins with the highest importance score using the glmnet algorithm are shown for each cancer, with examples of boxplots of upregulated proteins for each cancer in Fig. 3c. The importance scores for each protein across the 12 cancer types are found in Suppl. data 3.

Fig. 3 | Estimation of protein importance by the cancer classification models. a Protein importance rank profiles for each cancer model. For each cancer, the first 500 proteins in the importance rank are included (y-axis), and the corresponding importance score is shown (x-axis). The total number of proteins with a positive score is indicated for each of the cancers. b Lollipop chart showing the top ten scoring proteins in each cancer model, with the exception of myeloma with only nine positive proteins. c Selected examples of upregulated proteins for each of the cancer types. The colored boxes indicate the cancer type where the protein is upregulated, and gray shading indicates the absence of upregulation. Boxplots summarize the median value, with lower and upper hinges corresponding to the first and third quartiles, and whiskers indicating the minimum and maximum values within 1.5 times the IQR. Individual data points are presented for each cancer group, with n = 1462, n = 1402, n = 1457, n = 1413, n = 1432, n = 1476, n = 1402, n = 1432, n = 1462, n = 1389, n = 1389, and n = 1477, for PRDX5, CEACAM5, PRTG, GLO1, DNER, PLAT, GFAP, CXCL9, CD244, PAEP, TCL1A, and CNTN5, respectively. Source data are provided as a Source data file. AML acute myeloid leukemia, CLL chronic lymphocytic leukemia, DLBCL diffuse large B-cell lymphoma.

Evaluation of cancer-specific classification models
The performance of the cancer classification models was subsequently evaluated using the 30% of the data excluded from the model training. In Fig. 4a, the classification probabilities for each of the cancer models are summarized. For each cancer model, we show the probability that a plasma sample in the test set comes from the specific cancer type. We found that the machine learning models can separate the samples of each specific cancer from the controls with an area under the curve (AUC)16 ranging between 0.8 and 1 (Fig. 4b). Particularly high confidence was observed for three of the immune cell-related cancers: AML, CLL, and myeloma, all having an AUC of 0.99–1. To investigate the sensitivity and specificity further, a confusion matrix17 was created based on the probabilities estimated on the test set (Fig. 4c), with a probability cutoff calculated according to the Youden method18. The results suggest relatively high specificity and sensitivity across all cancers, with the largest number of false positives for lung, endometrial, and breast cancers. However, the generally low sample size in the test set reinforces the need to validate the classification models in larger cohorts in the future.

Fig. 4 | Performance of the classification models for each cancer on the test set. a Cancer probabilities for samples in the test set per cancer. The optimal probability cutoffs are indicated with a dashed gray line. b ROC curves and corresponding AUC. The sensitivity and specificity corresponding to the optimal probability cutoff are marked with an x. c Confusion matrices summarizing the classification results for each cancer at the given probability cutoff. The optimal probability cutoff was calculated using the Youden method. Source data are provided as a Source data file. AML acute myeloid leukemia, CLL chronic lymphocytic leukemia, DLBCL diffuse large B-cell lymphoma.

In this analysis, all proteins were used as input to the model to classify the cancer types. However, to investigate the impact of using fewer proteins, we analyzed the classification power using different numbers of proteins as input data to the model. In Fig. S4a, the receiver operating characteristic (ROC) plots for each cancer using all proteins as input (n = 1463) were compared with using only the most important proteins for each cancer, including 3, 10, 50, and 200 proteins. The AUC and accuracy for each of the 12 cancers differ considerably, as summarized in the radar plots (Fig.
S4b, c), demonstrating much higher AUC when using 50 or more proteins as input to the classification models for most of the cancers, although some cancers, such as AML, myeloma, and glioma, only need a few proteins to obtain high AUC scores. Additional performance scores are available in Suppl. data 4. In conclusion, this demonstrates the value of including many proteins in the classification model to gain higher confidence for some of the cancers.

Selection of a panel with cancer-specific proteins
Combining the previous results, we sought to identify a panel of proteins based on the ranking from the glmnet models and relevant to each of the analyzed cancers. The following inclusion criteria were used: (i) proteins with more than 50% overall importance as indicated by the cancer classification models, (ii) proteins identified as upregulated by differential expression analysis, and (iii) at least three proteins per cancer, which for three cancers (glioma, myeloma, and ovarian cancer) resulted in the inclusion of one or two proteins below the 50% cutoff. Based on these criteria, we ended up with a panel of 83 proteins (Fig. 5a), which are listed in Suppl. data 5 along with the results from the classification models and differential expression. Lung and prostate cancer contributed the largest numbers of proteins to the panel, 18 and 14, respectively, whereas only three protein targets each were selected for AML, glioma, myeloma, and ovarian cancer. In Fig. 5b, the average plasma levels of the 83 selected protein members of the panel are visualized across all cancer types. Most of the selected proteins had a higher level in only one cancer, while some had high protein levels in multiple cancers. For example, CXADR-like membrane protein (CLMP), selected to identify endometrial cancer, also showed elevated plasma levels in myeloma patients.
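The three inclusion criteria can be illustrated with a short sketch. It is written in Python for illustration only (the study itself used R), and the function name `select_panel`, the protein labels, and the scores are hypothetical. It also assumes that the proteins added to reach the minimum of three are the next-ranked upregulated proteins, which the text does not state explicitly.

```python
from typing import Dict, List, Set


def select_panel(importance: Dict[str, float],
                 upregulated: Set[str],
                 cutoff: float = 50.0,
                 min_proteins: int = 3) -> List[str]:
    """Pick panel proteins for one cancer.

    importance: protein -> importance score (0-100) from the classifier.
    upregulated: proteins called upregulated by differential expression.
    """
    # Criterion (ii): only upregulated proteins, ranked by decreasing importance.
    ranked = sorted(
        (p for p in importance if p in upregulated),
        key=lambda p: importance[p],
        reverse=True,
    )
    # Criterion (i): keep proteins above the importance cutoff.
    panel = [p for p in ranked if importance[p] > cutoff]
    # Criterion (iii): top up to the minimum size with the next-ranked
    # proteins, even if they fall below the cutoff (assumption, see lead-in).
    for p in ranked:
        if len(panel) >= min_proteins:
            break
        if p not in panel:
            panel.append(p)
    return panel


# Hypothetical scores, for illustration only (not values from the study).
example_importance = {"P1": 100.0, "P2": 48.0, "P3": 31.0, "P4": 60.0, "P5": 12.0}
example_upregulated = {"P1", "P2", "P3", "P5"}
example_panel = select_panel(example_importance, example_upregulated)
```

In this toy input only P1 passes both the importance and the upregulation criteria, so the two next-ranked upregulated proteins are added to reach three, mirroring the situation described for glioma, myeloma, and ovarian cancer.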
Only two of the proteins were given a high importance score (>50%) by the classification model in more than one cancer. Both FKBP prolyl isomerase 1B (FKBP1B) and peroxiredoxin 5 (PRDX5) had higher plasma levels in lung and colorectal cancer as compared to all the other cancers and were also selected independently by the models for both of these cancer types. Interestingly, FKBP1B is involved in immunoregulation and protein folding and has previously been linked to colorectal cancer19 but not to lung cancer. Similarly, PRDX5 has an antioxidant function in normal and inflammatory conditions, and several other proteins of the peroxiredoxin family have been linked to lung and colorectal cancers in transcriptomics analysis of cancer cell lines20,21.

Classification of the pan-cancer cohort based on the selected protein panel
Next, we aimed to assess whether a multiclass classification model based on the selected protein panel could result in an accurate classification of samples of the different cancer types. Here, a glmnet model was built using all previous cancer samples from the training set and the performance was estimated on all cancer samples in the test set, looking at the ability of the model to score each sample with a probability of belonging to each of the cancer types. In order to explore the impact of including different numbers of proteins, we built four different multiclass classification models based on different selections of proteins: (i) all proteins (n = 1463), (ii) those selected in the panel (n = 83), (iii) the three most important proteins per cancer (n = 36), and (iv) the single most important protein per cancer (n = 12), and we evaluated the performance in each setting. Comparative ROC analyses were performed for each cancer type in which the specificity/sensitivity measured as AUC was determined for different numbers of proteins (Fig. S5). The results (Fig.
5c) show that the panel of 83 proteins can identify the right cancer with relatively high selectivity and sensitivity, with AUC ranging between 0.93 and 1 for all cancer types. The analysis using all proteins gave only slightly better results, while the use of only the top 3 proteins in each cancer gave somewhat less reliable results. The lowest performance scores were obtained when using only the top protein for each of the 12 cancers. Additional performance scores for the different protein numbers are summarized for each of the cancers in Suppl. data 6. The results demonstrate that a panel with only a small number of protein markers can achieve similar classification reliability as using all proteins. Although based on a small sample size in the test cohort, the results suggest that a panel of fewer than one hundred proteins yields highly promising results (AUC) for simultaneous identification of all 12 cancer types. As shown in Fig. 5d, there is some overlap in the classification results for some of the cancers, such as lung and colorectal cancer, while for other cancers, such as glioma and immune-related cancers, the samples have a high probability of being correctly classified.

Comparative analyses between healthy individuals and patients with cancer
An important question is how well the protein signature identified in the pan-cancer study can distinguish cancer patients from healthy individuals. To investigate this, for each of the 12 cancer types, a cancer classification model was built, but this time including 74 healthy individuals previously studied as part of a wellness study14,22,23 as the control group instead of all of the other cancers. As described above, each of the cancers contributed a different number of proteins to the panel (3–18), and these models were based only on these specific proteins, i.e., the AML model was based on the three AML-specific proteins included in the panel.
We again used 70% of the cancer and healthy samples as the training set and the remaining 30% to test the performance of the model, with the cancer samples in the training and test sets being the same as before. The results for four of the cancers are shown in Fig. 6a–d and for all cancers in Fig. S6. For CLL (Fig. 6a), the model can distinguish cancer patients from healthy controls perfectly (AUC = 1) using the six proteins selected for CLL. Similarly, the same analysis for colorectal (Fig. 6b), ovarian (Fig. 6c), and lung cancer (Fig. 6d) shows high accuracy, with all AUC results of 0.83 or higher when using the corresponding proteins, demonstrating that the selected cancer signatures can distinguish cancer patients from healthy individuals with relatively high accuracy. Additional performance metrics are provided for all models in Suppl. data 7. These results suggest that the protein panel is suitable to classify patients with the analyzed cancer types from each other as well as to distinguish cancer patients from healthy individuals (without a cancer diagnosis). However, caution is required since the wellness cohort was sampled and analyzed in a separate study, and thus sample bias cannot be ruled out.

Stratification of patients with cancers of different stages
An important goal in the field of Cancer Precision Medicine is to help clinicians determine the stage of the cancer. For some cancers in this study, a relatively large number of patients had stage data available, and we therefore investigated whether the protein panel could stratify patients into stages for these cancer types. In Fig.
6e, we show four examples of proteins where we find an association between the plasma levels and disease stage, including (i) CD22 used to identify CLL patients; (ii) galectin 4 (LGALS4) in colorectal cancer patients; (iii) abhydrolase domain containing 14B (ABHD14B) in lung cancer patients; and (iv) the ovarian cancer biomarker progestagen-associated endometrial protein (PAEP). These examples demonstrate the possibility of performing stage stratification simply by analyzing selected plasma protein levels, but further analyses in additional cohorts are needed to demonstrate the validity of the protein panel for cancer stage stratification.

Classification of early-stage cancer samples
One of the most important objectives in the field of cancer precision medicine is to identify cancer at an early stage to provide successful therapeutic intervention and to improve patient survival. To assess the ability of the protein panel to distinguish early-stage cancer from healthy individuals, we stratified the ROC analysis into the early (stage 1 and 2) and advanced (stage 3 and 4) stages for colorectal and lung cancer, where we have the largest sample sizes for patients across stages (Fig. S7 and Fig. 6f, g). In Fig. 6f (top), the cancer probability score for colorectal cancer patients across stages is compared with the corresponding score for healthy individuals. A clear difference in score is shown for most samples, and the AUC score (Fig. 6f, bottom) for separating early-stage colorectal cancer patients from healthy individuals is 0.80. Similarly, for the early-stage lung cancer patients, a clear difference in the estimated probabilities is observed between early-stage cancer and healthy samples by the protein panel model (Fig. 6g, top), and the corresponding AUC score (Fig. 6g, bottom) is 0.79. In both cases, there is no significant difference between the model performance on early and advanced stage cancer patients.
This highlights the potential of the selected biomarker panel to identify early-stage colorectal and lung cancer patients, although more in-depth analysis in independent cohorts is warranted.

Fig. 5 | Pan-cancer protein panel and multiclassification of the pan-cancer test cohort. a Network visualization of proteins included in the panel. Protein nodes are colored according to the importance score in the specific cancer. b Summarized expression profiles of panel proteins across the cancer types. For each protein, the scaled expression is calculated as the average NPX per cancer, which is rescaled between 0 and 1. c Summary of the AUC for the different cancers based on models run with four different protein selections. “Top 1” and “top 3” refer to the one or three proteins with the highest importance scores for each of the individual 12 cancer models run in the previous step, respectively, resulting in sets of 12 and 36 proteins as input to the multiclassification model. d Cancer probabilities for samples in the test set in the pan-cancer classification model using the panel of 83 proteins. Source data are provided as a Source data file. AML acute myeloid leukemia, CLL chronic lymphocytic leukemia, DLBCL diffuse large B-cell lymphoma.
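The scaled expression described for Fig. 5b can be sketched as follows. This is a minimal Python illustration, assuming that "rescaled between 0 and 1" means min-max scaling of the per-cancer averages for each protein; the caption does not specify the exact rescaling, and the function name and input values are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def scaled_expression(npx_by_cancer: Dict[str, List[float]]) -> Dict[str, float]:
    """Average NPX per cancer for one protein, rescaled to the range 0-1."""
    averages = {cancer: mean(values) for cancer, values in npx_by_cancer.items()}
    lo, hi = min(averages.values()), max(averages.values())
    if hi == lo:
        # All cancers share the same average; return 0 to avoid dividing by zero.
        return {cancer: 0.0 for cancer in averages}
    # Min-max scaling (an assumption, see lead-in).
    return {cancer: (avg - lo) / (hi - lo) for cancer, avg in averages.items()}


# Hypothetical NPX values for one protein, for illustration only.
example = scaled_expression({
    "cancer_A": [1.0, 3.0],  # mean 2.0
    "cancer_B": [0.0, 0.0],  # mean 0.0
    "cancer_C": [4.0, 4.0],  # mean 4.0
})
```

With this scaling, the cancer with the highest average level of a protein gets the value 1 and the cancer with the lowest gets 0, which makes proteins with very different absolute NPX levels comparable in one heatmap.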
Fig. 6 | Classification of cancer samples against a healthy cohort based on the selected protein panel. Model results showing the cancer probability for cancer and healthy individuals from the test set (top) and the ROC curve with AUC score (bottom) for a CLL, b colorectal cancer, c ovarian cancer, d lung cancer. e Protein levels of four different proteins for cancer samples stratified into early (stage 1–2) or advanced (stage 3–4) stages as well as the healthy cohort. Boxplots summarize the median value, with lower and upper hinges corresponding to the first and third quartiles, and whiskers indicating the minimum and maximum values within 1.5 times the IQR. Individual data points are presented for each cancer group, with n = 327, n = 114, n = 289, and n = 200, for ABHD14B, CD22, LGALS4, and PAEP, respectively. P-values are calculated using a two-sided t-test to compare the group means. Model results showing the cancer probability for cancer samples stratified by stage (early or advanced) and healthy individuals (top) and the ROC curve with AUC score (bottom) for f colorectal cancer and g lung cancer. The p-values are calculated using unpaired DeLong’s test. Source data are provided as a Source data file. CLL chronic lymphocytic leukemia.

Discussion
Here, we describe a strategy based on next-generation plasma profiling to explore the cancer proteome signatures by comprehensively exploring the protein levels in patients representing most major cancer types. The study describes and compares the plasma proteome across all major cancers using a multiplex assay platform. The platform allows thousands of proteins to be quantitatively analyzed using only a few microliters of blood, opening up new opportunities for Precision Cancer Medicine. The plasma levels of each individual protein have been determined for more than 1400 cancer patients representing 12 different cancer types, and the results for the individual protein targets are presented in the open access Human Disease Blood Atlas (v22.proteinatlas.org/humanproteome/disease). We have used the data to identify a set of proteins associated with each of the cancers studied using machine learning. A classification model based on a restricted set of 83 upregulated proteins was built and the accuracy of the classification of pan-cancer samples was evaluated in a separate test cohort. It is interesting to observe the dramatic increase in classification performance when using the protein panel (n = 83) as compared to the use of only the top protein marker for each cancer.
This demonstrates the added advantage of using a panel of blood proteins, as exemplified by patients with breast cancer, for which individual markers are relatively unselective, but the classification model using multiple proteins gave a potentially much more accurate classification. The panel allowed the stratification of plasma samples from most cancer types with high sensitivity and specificity, and it was also able to detect patients with early disease, as exemplified by early-stage patients in lung and colorectal cancers. However, in this context it is important to point out that the test cohorts used for the various cancer validations were relatively small and additional validation cohorts are needed to confirm the validity of each protein in the classification model. For example, in two earlier studies of blood from glioma patients24,25, only a few upregulated proteins were found and none of these were significantly upregulated here. This demonstrates the importance of several independent studies before establishing a pan-cancer protein panel. The performance of the classification model and the utility of the protein panel need to be validated in independent cohorts before consideration for clinical use. Of particular importance is validation in a large background of non-diseased individuals to establish the breadth of false positives. It is also desirable to have the results validated by independent technical platforms, such as sandwich26, mass spectrometry27, or Somascan12 assays. The proteins used in the classification models include well-known markers for some of the cancers, but also proteins with, to our knowledge, no previous connection to cancer. It is noteworthy that the cancer-specific elevation of the panel proteins in blood plasma could reflect several underlying causes, such as increased leakage or secretion from the tumor or surrounding tissue itself, or the bodily response to the tumor.
However, a more in-depth analysis is needed to explain the causal relationship between the proteins and the respective cancer types. As mentioned above, it is noteworthy that individual variation of protein plasma levels in both healthy and disease states calls for validation of potential biomarkers using an independent assay platform as well as independent patient cohorts. Since even a highly selective assay could still generate a large number of false positives when millions of individuals are screened for the presence of cancer, it is particularly important to rule out false positives, which could cause considerable and unnecessary stress for the individual. It is thus important for any screening procedure to be followed up by independent validation, such as mammography for breast cancer, blood in feces and/or colonoscopy for colorectal cancer, and radiological examination and/or tissue-based analysis of biopsies for many other cancers. This makes it possible to combine initial and broad population screening with less cost-effective assay platforms to establish the diagnosis of patients with cancer. It is of course interesting to expand the analysis presented here to add other frequent and important cancers to the pan-cancer strategy, such as liver, kidney, and pancreatic cancers. Similarly, it is also valuable to compare the cancer profiles reported here with plasma profiles from patients having other diseases. Our aim in the near future is to be able to report such studies as part of the open access Human Disease Blood Atlas resource for patients in the field of cardiovascular, autoimmune, neurological, and infectious disease, among others. It is also interesting to add more protein targets to the analysis, and such larger panels are now available for exploration by both the PEA13 technology, which currently can analyze 3000 targets, and the Somascan platform12, including 7000 targets.
In summary, we describe a strategy for exploration of protein profiles in blood with the ultimate objective to allow simultaneous identification of cancers using a few microliters of blood. Since the analytical platform used here can be combined with simple sample collection formats such as dried blood spots, cost-effective pan-cancer population screening can be foreseen in which a panel of proteins is used to identify multiple cancer types in a single assay. Such population screenings could be organized to allow the discovery of cancers early and thus help clinicians to start treatment of cancer patients at earlier stages. It is our hope that the data in the open access Human Disease Blood Atlas will be a valuable resource for such future efforts in the field of Cancer Precision Medicine.

Methods
The research complies with all relevant ethical regulations. The pan-cancer study was approved by the Swedish Ethical Review Authority (EPM dnr 2019-00222). The research was in line with donor consents in U-CAN (28631533, EPN Uppsala 2010-198 with amendments), and all participants provided written informed consent. The Wellness healthy cohort study was approved by the Ethical Review Board of Göteborg, Sweden (registration number 407-15), and all participants provided written informed consent. The study protocol conforms to the ethical guidelines of the 1975 Declaration of Helsinki.

The pan-cancer study cohort
Plasma samples from 1477 cancer patients were obtained from the U-CAN biobank, which collects samples from consenting patients diagnosed at the Akademiska hospital in Uppsala as part of the clinical routine and with a high degree of standardization15. Plasma samples were taken from treatment-naïve patients around the time of their diagnosis.
Plasma was prepared from whole blood by centrifugation at 2400 × g for seven minutes at room temperature, after which the plasma was aliquoted into several 220 µl vials and immediately frozen for long-term storage at −80 °C. Exclusion criteria included any concurrent or previous cancer within the last five years, and arm-to-freezer time exceeding 360 min. Diagnosis, stage, age, sex, and other variables were obtained from the U-CAN database and the patient’s clinical records.

The Wellness healthy cohort
Plasma samples from healthy individuals (39 males and 35 females) were selected from the first sampling time point of the Swedish SciLifeLab SCAPIS Wellness Profiling (S3WP) study as described previously22,23. The selection process aimed to include individuals with the most complete data available for all sampling time points across multiple datasets. The S3WP program includes longitudinal samples from 101 healthy individuals aged 50–64, recruited from the prospective observational Swedish CArdioPulmonary bioImage Study (SCAPIS) and sampled at six different time points during a 2-year period.

Measurement of protein levels
The protein levels of all 1477 cancer samples were measured in plasma using the Olink Explore PEA technology13, which uses antibody-binding capabilities to detect the levels of 1463 targets in plasma coupled with next-generation sequencing (NGS) readout. The Wellness healthy cohort had previously been analyzed with Olink Explore as described in Zhong et al.14, and 16 samples from this study were included in the cancer study to allow for bridging between the results for the two cohorts.
The Olink Explore 1536 platform includes four different panels: the Olink Explore 384 Cardiometabolic Reagent Kit (Panel lot number: B04413, Product number: 97700/97300), the Olink Explore 384 Inflammation Reagent Kit (Panel lot number: B04411, Product number: 97500/97100), the Olink Explore 384 Oncology Reagent Kit (Panel lot number: B04412, Product number: 97600/97200), and the Olink Explore 384 Neurology Reagent Kit (Panel lot number: B04414, Product number: 97800/97400). A total of 1472 proteins were targeted using specific antibodies, including 1463 unique proteins as well as controls. Each antibody was conjugated separately with two complementary probes, and distributed in four separate 384-plex panels, focused on the four disease areas: cardiovascular, inflammation, neurology, and oncology.

In brief, the PEA workflow started with an overnight incubation to allow the conjugated antibodies to bind to the corresponding proteins in the samples. The incubation was followed by an extension and preamplification step in which the hybridization and extension of complementary probes takes place. The extended DNA was then amplified by PCR and further indexed to allow the preparation of libraries, which were then sequenced using Illumina’s NovaSeq platform. The counts obtained from the sequencing run were subjected to a quality control and normalization procedure. Here, internal controls introduced at different steps were used to reduce intra-assay variability. These include an incubation control consisting of a non-human antigen measured with the same technology, an extension control consisting of an antibody conjugated to a unique pair of probes which are in proximity and are expected to produce a positive signal, and an amplification control consisting of a double-stranded DNA sequence which is expected to produce a positive signal independent of the amplification step.
Additionally, external controls such as a negative control (buffer sample) and plate controls (pool of plasma) were used to establish a limit of detection (LOD) and adjust levels between plates, respectively. Finally, two known samples acted as sample controls to calculate the precision of the measurements. After quality control and normalization, the data were provided in the relative protein quantification unit Normalized Protein eXpression (NPX), which is on a log2 scale. The NPX score is calculated based on matched counts from the sequencing data, and a high NPX value can be interpreted as a high protein level. All measurements that failed the internal quality control and were thus reported with a warning were excluded from the dataset. Three of the protein assays (IL6, CXCL8, and TNF) were included in all four panels for quality assurance purposes and were used as technical controls to investigate the quality of the samples using the inter-panel correlation between all NPX values above the given limit of detection (LOD)13. In addition, the coefficient of variation (CV) of each assay was calculated as a measure of the technical variance within a plate (IntraCV) and across several plates (InterCV), based on the pooled plasma sample run in duplicate on each plate in the Olink Explore setup, following the procedure presented in Wik et al.13.

Differential expression analysis
The differential protein expression was assessed using a two-sided t-test coupled with Benjamini-Hochberg multiple hypothesis correction28, with a significance threshold of 0.05 for adjusted p-values. The adjusted p-values and difference in average expression per group were summarized in volcano plots for each of the analyzed cancers. Enrichment analysis of upregulated protein sets was performed using the clusterProfiler package (version 3.18.1)29.
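The multiple hypothesis correction and the three significance classes shown in the volcano plots can be sketched as follows. This is a plain-Python illustration of the standard Benjamini-Hochberg adjustment, not the authors' R code; the function names and input values are hypothetical, and the t-test p-values are assumed to be already computed.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def classify(npx_difference: float, adj_p: float, alpha: float = 0.05) -> str:
    """Label a protein by adjusted p-value and the sign of the NPX difference."""
    if adj_p >= alpha:
        return "not significant"
    return "significant up" if npx_difference > 0 else "significant down"


# Hypothetical inputs, for illustration only (not values from the study).
p_raw = [0.01, 0.2, 0.03, 0.005]
npx_diff = [1.2, -0.3, 0.8, -1.5]
p_adj = benjamini_hochberg(p_raw)
labels = [classify(d, p) for d, p in zip(npx_diff, p_adj)]
```

Because NPX is on a log2 scale, an NPX difference of 1 between two groups corresponds to a two-fold difference in relative protein level.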
The enricher() function in clusterProfiler was used to perform overrepresentation analysis against the biological annotations from Gene Ontology (GO) biological processes (BP)30, with subsequent p-value adjustment using the Benjamini-Hochberg method28 and using adjusted p-value < 0.05 as threshold for significance.

Disease classification models
Classification models were built in three different settings: (1) to classify patients with one cancer from patients with other cancers, (2) to classify all cancers simultaneously, and (3) to classify patients with a specific cancer from healthy samples. All models were built using the caret R package (v 6.0.90)31. First, the cancer and wellness data were split into 70% for training purposes and 30% for testing purposes using the createDataPartition() function in caret, generating a training and a testing pool of samples. For all models described, the training and test sets were composed of subsets of the training and testing pools, respectively, to avoid data leakage32,33. In the first setting, the training set for the classification of a specific cancer was composed of all samples from that cancer in the training pool and a balanced, equally sized subset of samples from all other cancers acting as controls. In the same manner, the test set was composed of all samples from that cancer in the testing pool and a matching number of controls representing all other cancers. For cancers consisting of male or female samples exclusively, only samples from the same sex were used as controls. In the multiclassification setting, all cancer samples in the training and testing pools, respectively, were combined into two large sets of samples used for training and testing. 
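The pooling scheme for the first setting can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R/caret code: `stratified_split` and `balanced_one_vs_rest` are hypothetical helpers, the 70/30 split is done per class, and controls are drawn by simple random sampling from all other cancers. Sex matching for sex-specific cancers is left out for brevity.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices so each class keeps about train_fraction in training."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return sorted(train), sorted(test)


def balanced_one_vs_rest(pool, labels, target, seed=0):
    """All target samples in the pool plus an equally sized draw of other classes."""
    rng = random.Random(seed)
    cases = [i for i in pool if labels[i] == target]
    others = [i for i in pool if labels[i] != target]
    # Never request more controls than are available in the pool.
    controls = rng.sample(others, min(len(cases), len(others)))
    return cases, controls


# Toy cohort: 10 samples of cancer "A" and 20 of cancer "B".
labels = ["A"] * 10 + ["B"] * 20
train_pool, test_pool = stratified_split(labels)
train_cases, train_controls = balanced_one_vs_rest(train_pool, labels, "A")
test_cases, test_controls = balanced_one_vs_rest(test_pool, labels, "A")
```

Because the cases and controls for each model are drawn only from their own pool, no sample used for testing can appear in any training set, which is the leakage safeguard the text describes.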
Finally, when classifying patients from one cancer against the healthy cohort, all samples from that cancer and healthy patients were used, with samples in the training pool being used for training the model and samples in the testing pool being used for testing. Again, only male or female samples were used as controls for male- and female-specific cancers, respectively. Before the model training, missing values due to failed quality control were imputed using the preProcess() function in caret with the “knnImpute” method. Batch correction using the removeBatchEffect() function in the limma package (version 3.46.0)34 was performed to correct for potential batch effects between the cancer and healthy samples. The cancer prediction models were built on the selected training sets using the function train() in caret, and glmnet was used as the classification algorithm16. A 5-fold cross-validation scheme and built-in parameter tuning were applied to the models. The contribution of each protein to the model was retrieved using the varImp() function in the caret package. When indicated, the data used as input to the model was restricted to a subset of proteins, which was guided by the feature importance ranking obtained when training the model using all proteins and thus based solely on training data. The predict() function in caret was used to estimate the class probabilities for the samples in the test set, which were not part of the training of any of the models and allowed an unbiased estimation of model performance. ROC analyses were performed to assess the sensitivity and specificity of the classification, summarized as AUC scores. The pROC R package (v 1.18.0) was used for binary classifications and multiROC (v 1.1.1) was used for multiclass classification. Statistical significance for differences in AUC was calculated using the DeLong test35 implementation in the pROC package, using p-value < 0.05 as the threshold for significance. 
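To make the AUC summary concrete, the sketch below computes the area under the ROC curve for a binary classifier directly from predicted probabilities, using its rank interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen control. It is a Python illustration only; `roc_auc` is a hypothetical helper and not the pROC implementation used in the study, and the scores are made-up values.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels use 1 for cases and 0 for controls; tied scores count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up class probabilities for two cases (1) and two controls (0).
example_auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; dedicated packages use sorting-based computations and also provide confidence intervals and tests such as DeLong's.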
Additionally, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), precision, recall, and F1 scores were calculated. For the binary classifications, these metrics were based on a probability threshold estimated using the coords() function in pROC with the Youden index18.

Data visualization
Data visualization was performed in R (version 4.0.3)36, using the ggplot2 (version 3.3.5)37, ggbeeswarm (version 0.6.0)38, ggpubr (version 0.5.0)39, ggraph (version 2.0.5)40, ggrepel (version 0.9.1)41, ggridges (version 0.5.3)42, ggplotify (version 0.1.0)43, igraph (version 1.2.6)44, pheatmap (version 1.0.12)45, patchwork (version 1.1.1)46, tidygraph (version 1.2.0)47, and UpSetR (version 1.4.0)48 packages. For the heatmap visualization, data was rescaled to a 0–1 scale and hierarchical clustering was performed using the “ward.D2” method. The limma R package (version 3.46.0)34 was used to correct for batch differences for the comparison between the U-CAN cancer cohorts and the Wellness healthy cohorts. The figures were assembled in Affinity Designer (version 1.10.0.1127).

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The normalized U-CAN proteomics data generated in this study have been deposited in the BioStudies database under accession code S-BSST935, as well as on the Human Protein Atlas data publication page [https://www.proteinatlas.org/about/publicationdata]. All proteins are also visualized on the individual protein summary pages of the Human Disease Blood Atlas. For the Wellness healthy cohort, the Olink Explore participant-level datasets have been deposited with the Swedish National Data Service [https://snd.gu.se/sv/catalogue/study/preview/88efa94d-39b3-4a50-8b3b-87b1abedefd4], and the data have been previously published14. 
Due to patient consent and confidentiality agreements, the datasets can be made available only for validation purposes by contacting snd@snd.gu.se. Data access will be evaluated according to Swedish legislation. Data access for research-related questions in the S3WP program can be made available by contacting the corresponding author. Source data are provided with this paper.

Code availability
All code necessary for the data analysis and visualization49 is available at https://github.com/buenoalvezm/Pan-cancer-profiling.

References
1. Crosby, D. et al. Early detection of cancer. Science 375, eaay9040 (2022).
2. Cronin, K. A. et al. Annual report to the nation on the status of cancer, part 1: National cancer statistics. Cancer 128, 4251–4284 (2022).
3. Ilic, D. et al. Prostate cancer screening with prostate-specific antigen (PSA) test: a systematic review and meta-analysis. BMJ 362, k3519 (2018).
4. Ladabaum, U., Dominitz, J. A., Kahi, C. & Schoen, R. E. Strategies for colorectal cancer screening. Gastroenterology 158, 418–432 (2020).
5. Yala, A. et al. Optimizing risk-based breast cancer screening policies with reinforcement learning. Nat. Med. 28, 136–143 (2022).
6. Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
7. Hutter, C. & Zenklusen, J. C. The Cancer Genome Atlas: creating lasting value beyond its data. Cell 173, 283–285 (2018).
8. Zhang, J. et al. The International Cancer Genome Consortium Data Portal. Nat. Biotechnol. 37, 367–369 (2019).
9. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
10. Friedman, A. A., Letai, A., Fisher, D. E. & Flaherty, K. T. Precision medicine for cancer with next-generation functional diagnostics. Nat. Rev. Cancer 15, 747–756 (2015).
11. Akbani, R. et al. A pan-cancer proteomic perspective on The Cancer Genome Atlas. Nat. Commun. 5, 3887 (2014).
12. Gold, L., Walker, J. J., Wilcox, S. K. & Williams, S. Advances in human proteomics at high scale with the SOMAscan proteomics platform. N. Biotechnol. 29, 543–549 (2012).
13. Wik, L. et al. Proximity extension assay in combination with next-generation sequencing for high-throughput proteome-wide analysis. Mol. Cell Proteom. 20, 100168 (2021).
14. Zhong, W. et al. Next generation plasma proteome profiling to monitor health and disease. Nat. Commun. 12, 2493 (2021).
15. Glimelius, B. et al. U-CAN: a prospective longitudinal collection of biomaterials and clinical information from adult cancer patients in Sweden. Acta Oncol. 57, 187–194 (2018).
16. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
17. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874 (2006).
18. Schisterman, E. F., Perkins, N. J., Liu, A. & Bondell, H. Optimal cut-point and its corresponding Youden Index to discriminate individuals using pooled blood samples. Epidemiology 16, 73–81 (2005).
19. Wang, L., Jiang, X., Zhang, X. & Shu, P. Prognostic implications of an autophagy-based signature in colorectal cancer. Med. (Baltim.) 100, e25148 (2021).
20. Kim, M. K. et al. Patients with ERCC1-negative locally advanced esophageal cancers may benefit from preoperative chemoradiotherapy. Clin. Cancer Res. 14, 4225–4231 (2008).
21. Lu, W. et al. Peroxiredoxin 2 is upregulated in colorectal cancer and contributes to colorectal cancer cells’ survival by protecting cells from oxidative stress. Mol. Cell Biochem. 387, 261–270 (2014).
22. Bergstrom, G. et al. The Swedish CArdioPulmonary BioImage Study: objectives and design. J. Intern Med 278, 645–659 (2015).
23. Tebani, A. et al. Integration of molecular profiles in a longitudinal wellness profiling cohort. Nat. Commun. 11, 4487 (2020).
24. Holst, C. B. et al. Plasma IL-8 and ICOSLG as prognostic biomarkers in glioblastoma. Neurooncol. Adv. 3, vdab072 (2021).
25. Jaksch-Bogensperger, H. et al. Proseek single-plex protein assay kit system to detect sAxl and Gas6 in serological material of brain tumor patients. Biotechnol. Rep. 18, e00252 (2018).
26. Engvall, E. & Perlmann, P. Enzyme-linked immunosorbent assay, Elisa. 3. Quantitation of specific antibodies by enzyme-labeled anti-immunoglobulin in antigen-coated tubes. J. Immunol. 109, 129–135 (1972).
27. Kotol, D. et al. Targeted proteomics analysis of plasma proteins using recombinant protein standards for addition only workflows. Biotechniques 71, 473–483 (2021).
28. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. 57, 289–300 (1995).
29. Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation 2, 100141 (2021).
30. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
31. Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 28, 1–26 (2008).
32. Desaire, H. How (not) to generate a highly predictive biomarker panel using machine learning. J. Proteome Res. 21, 2071–2074 (2022).
33. Palmblad, M. et al. Interpretation of the DOME Recommendations for Machine Learning in Proteomics and Metabolomics. J. Proteome Res. 21, 1204–1207 (2022).
34. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
35. DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
36. R Core Team R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org/ (2014).
37. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag New York, 2016).
38. Clarke, E. & Sherrill-Mix, S. ggbeeswarm: Categorical Scatter (Violin Point) Plots (2017).
39. Kassambara, A. ggpubr: ‘ggplot2’ Based Publication Ready Plots (2022).
40. Pedersen, T. L. ggraph: an Implementation of Grammar of Graphics for Graphs and Networks (2021).
41. Slowikowski, K. ggrepel: Automatically Position Non-Overlapping Text Labels with ‘ggplot2’ (2021).
42. Wilke, C. O. ggridges: Ridgeline Plots in ‘ggplot2’ (2021).
43. Yu, G. ggplotify: Convert Plot to ‘grob’ or ‘ggplot’ Object (2021).
44. Csardi, G. & Nepusz, T. The igraph software package for complex network research (2006).
45. Kolde, R. pheatmap: Pretty Heatmaps (2019).
46. Pedersen, T. L. patchwork: The Composer of Plots (2020).
47. Pedersen, T. L. tidygraph: A Tidy API for Graph Manipulation (2020).
48. Gehlenborg, N. UpSetR: A More Scalable Alternative to Venn and Euler Diagrams for Visualizing Intersecting Sets (2019).
49. Bueno Álvez, M. buenoalvezm/Pan-cancer-profiling: pan-cancer-profiling (Version v2). Zenodo. https://doi.org/10.5281/zenodo.8012430 (2023).

Additional information
Peer review information Nature Communications thanks Thomas Kislinger, Maximilian Strauss and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Acknowledgements
We thank the entire staff of the Human Protein Atlas program and the Science for Life Laboratory (SciLifeLab) for their valuable contributions. We thank Per Eriksson and Lena Beckman for analysis of the Olink data and Camilla Jysky and Lina Dahlberg for collection of clinical samples. This work was supported by a WCPR grant from the Knut and Alice Wallenberg Foundation, the SciLifeLab & Wallenberg Data Driven Life Science Program (grant: KAW 2020.0239, M.U.), and Swedish Research Council Grant 2020-06175 (M.U.). 
The computations and data handling were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at UPPMAX, partially funded by the Swedish Research Council through grant agreement no. 2018-05973 (M.U.).

Author contributions
M.U. conceived and designed the study. F.E., P.E., T.S., F.P., G.E., H.L., Ma.H., G.H., K.S., M.E., O.S., and Mi.H. collected and contributed samples to the study. M.B.A., M.K., F.E., A.M., W.Z., L.F., and M.U. performed the data analysis. E.L., N.R., T.A., M.Å., J.N., and U.G. processed the samples and performed the PEA analysis. Kv.F. and M.Z. created the database portal. M.U., L.F., and M.B.A. drafted the manuscript. All authors read and approved the final manuscript.

Funding
Open access funding provided by Royal Institute of Technology.

Competing interests
The authors declare no competing interests.

Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41467-023-39765-y.

Correspondence and requests for materials should be addressed to Mathias Uhlén.

Reprints and permissions information is available at http://www.nature.com/reprints

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2023

Nature Communications | (2023)14:4308
