Unsupervised versus Supervised Identification of Prognostic Factors in Patients with Localized Retroperitoneal Sarcoma: A Data Clustering and Mahalanobis Distance Approach

Rita De Sanctis; Alessandro Viganò; Alessandro Giuliani; Alessandro Gronchi; Antonino De Paoli; Pierina Navarria; Vittorio Quagliuolo; Armando Santoro; Alfredo Colosimo

Hindawi, BioMed Research International, Volume 2018, Article ID 2786163, 11 pages
https://doi.org/10.1155/2018/2786163

Research Article

Rita De Sanctis (1,2), Alessandro Viganò (2,3), Alessandro Giuliani (4), Alessandro Gronchi (5), Antonino De Paoli (6), Pierina Navarria (7), Vittorio Quagliuolo (8), Armando Santoro (1,9), and Alfredo Colosimo (2)

1 Department of Medical Oncology and Hematology, Humanitas Cancer Center and Research Hospital, IRCCS, Rozzano, Milan, Italy
2 Molecular and Cellular Networks Lab, Department of Anatomy, Histology, Forensic Medicine and Orthopaedics, “Sapienza” University of Rome, Rome, Italy
3 Department of Neurology and Psychiatry, “Sapienza” University of Rome, Rome, Italy
4 Department of Environment and Health, Istituto Superiore di Sanità, Rome, Italy
5 Department of Surgery, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
6 Department of Radiation Oncology, Centro di Riferimento Oncologico, National Cancer Institute, Aviano, Italy
7 Department of Radiotherapy and Radiosurgery, Humanitas Cancer Center and Research Hospital, IRCCS, Rozzano, Milan, Italy
8 Department of Surgery, Humanitas Cancer Center and Research Hospital, IRCCS, Rozzano, Milan, Italy
9 Humanitas University, Rozzano, Milan, Italy

Correspondence should be addressed to Rita De Sanctis; rita.desanctis@yahoo.it

Received 28 September 2017; Accepted 12 March 2018; Published 23 April 2018

Academic Editor: Maria C. De Rosa

Copyright © 2018 Rita De Sanctis et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The aim of this report is to unveil specific prognostic factors for retroperitoneal sarcoma (RPS) patients by univariate and multivariate statistical techniques. A phase I-II study on localized RPS treated with high-dose ifosfamide and radiotherapy followed by surgery (ISG-STS 0303 protocol) demonstrated that chemo/radiotherapy was safe and increased the 3-year relapse-free survival (RFS) with respect to historical controls. Of 70 patients, 26 developed local, 10 distant, and 5 combined relapse. Median disease-free interval (DFI) was 29.47 months. According to a discriminant function analysis, DFI, histology, relapse pattern, and the first treatment approach at relapse had a statistically significant prognostic impact. Based on scientific literature and clinical expertise, clinicopathological data were analyzed using both a supervised and an unsupervised classification method to predict the prognosis, with similar sample sizes (66 and 65, resp., in the casewise approach and 70 in the mean-substitution one). This is the first attempt to predict patients’ prognosis by means of multivariate statistics, and in this light, it is noteworthy that (i) some clinical data have a well-defined prognostic value, (ii) the unsupervised model produced results comparable to those of the supervised one, and (iii) the appropriate combination of both models appears fruitful and easily extensible to different clinical contexts.

1. Introduction

Retroperitoneal sarcomas (RPS) are a peculiar soft tissue sarcoma (STS) subgroup including 4 or 5 subtypes and an expected incidence of less than 1 case per 100,000 inhabitants/year [1–4]. Although surgery is the standard treatment for localized lesions [5–13], the role of radiation therapy (RT) in RPS is not yet fully defined. In order to complement surgery, the use of preoperative and intraoperative RT is under increasing investigation.
In addition, chemotherapy (CT), concurrent to RT, could have a radiosensitizing effect and a precautionary role in eradicating micrometastases, thus increasing the therapeutic index of RT alone [14, 15]. The analysis of prognostic factors and risk stratification, in order to decide the best therapeutic approach, is still pivotal in patients with rare tumors; moreover, the usefulness of the few tools available to predict good or poor prognosis is not completely defined [16]. In this contribution, we present an exploratory univariate analysis of possible disease predictors as well as a multivariate prognostic model for RPS based on discriminant function analysis (DFA), Mahalanobis distance (MD), and decision trees (DT) classification. An unsupervised approach in the analytical strategy showed an efficiency comparable to that of a supervised one, which suggests their possible combined use in clinical practice.

2. Materials and Methods

2.1. Clinical Data. Clinicopathological data about RPS were obtained from patients treated at three Italian referral centers according to the protocol Italian Sarcoma Group-Soft Tissue Sarcoma Trial 0303 (ISG-STS0303; EudraCT number: ITASARC ∗ II 2004 003) between December 2003 and December 2010. At the end of this protocol, patients underwent further follow-up, which provided additional information concerning (i) pattern of relapse (local and/or distant), (ii) disease-free interval (DFI), and (iii) postrelapse outcome (type of treatment at recurrence, response rate according to RECIST (Response Evaluation Criteria in Solid Tumors) criteria [17], and postrelapse survival).
Institutional databases from the 3 main enrolling centers (Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy; Humanitas Research Hospital, IRCCS, Rozzano, Milan, Italy; and CRO, IRCCS, Aviano, Italy) allowed the follow-up to be updated for 70 operated patients, with a median follow-up time from surgery of 91.7 months (interquartile range (IQR): 72.5, 111.3 months). Patients lost to follow-up were excluded from the analysis. Male/female ratio was 1.25 : 1. Median age was 58.5 years (range: 27–75). The patients’ distribution among six relevant clinical descriptors is reported in Table 1. Notice that, for the 70 patients considered in Table 1, the following information was also available: type of resection (macroscopically complete or incomplete, multivisceral surgery), adjuvant therapies, percentage of necrosis of the surgical specimen, DFI, relapse pattern (local and/or distant), first type of treatment at recurrence (first-line chemotherapy regimen, best response and RT), and further treatments (second-line chemotherapy regimen, best response, and further CT lines).

Table 1: Distribution of patients by six relevant clinical parameters. FNCLCC (French Fédération Nationale des Centres de Lutte Contre le Cancer) grading, proliferative index (ki67 expression), and CT-RT preoperative chemo/radiotherapy treatment. For the other acronyms, see the Abbreviations section.

Feature                                  N       %
Age, median (range)                      58.5    (27–75)
Sex
  Female                                 32      45.7
  Male                                   38      54.3
Tumor size (cm), median (range)          15      (5–45)
STS histology
  WDLPS                                  16      22.9
  DDLPS                                  22      31.4
  LMS                                    15      21.4
  Others                                 17      24.3
FNCLCC grading (missing data = 1)
  G1                                     19      27.2
  G2                                     32      45.7
  G3                                     18      25.7
Response to CT-RT (missing data = 3)
  SD                                     50      71.4
  PR                                     8       11.4
  PD                                     9       12.9

Forty-one of the 70 patients included in the study developed local (n = 26) or distant (n = 10) or both local and distant (n = 5) relapse. Main histological relapsed subtypes were dedifferentiated liposarcoma (14 out of 41, 34%), leiomyosarcoma (10, 24%), well-differentiated liposarcoma (6, 14%), and NOS (not otherwise specified) sarcoma (6, 14%). Median DFI was 29.47 months. Patients with any local relapse (with or without distant disease) presented an infield (23/31), outfield (4/31), or mixed (4/31) recurrence. Among patients with only local relapse, 16/26 (61.5%) received surgery (in two cases after preoperative chemotherapy), 9/26 (34%) received chemotherapy, and 1 (4%) received RT. After metastatic relapse, 2/15 (13%) patients underwent surgery, 9/15 (60%) first-line chemotherapy, and 4/15 (27%) both. In addition, four (27%) of these patients received palliative RT. Of the 41 relapsed patients, 16 (39%) and 4 (9.7%) received ≥2 and ≥3 lines of chemotherapy, respectively. The most commonly used agents included anthracyclines, trabectedin, and gemcitabine. The response rate was 23% to first-line chemotherapy and 10% to subsequent lines.

In order to test the significance of any statistical model based upon the various diagnostic–therapeutic–prognostic parameters arising from clinical practice, a first and most important step was the recoding of those parameters on homogeneous scales so that quantitative classification and comparison become possible. An example of this recoding is illustrated in Figure 1, where, with the aim of checking a therapeutic approach, three clinical parameters, namely, DFI, tumor size, and histology, were numerically scaled on a 4-point (0–3) rating scale, in which 0 corresponded to a null risk and 3 to the highest risk of poor prognosis (see Table 2).

2.2. Statistical Methods

2.2.1. Setup of a Data (Cases/Variables) Matrix. From the available clinicopathological information, some relevant variables endowed with good and similar prognostic value were selected and tested in order to optimize their prognostic value.
Chemotherapy regimens and second-line therapies, as effective modifiers and potential confounders, respectively, were eliminated. As a matter of fact, different chemotherapeutic agents, potentially active in specific histologies, may lead to different patient outcomes. On the other hand, second-line therapies could have an uncertain impact on the outcome: a patient must live long enough to undergo a second-line therapy, but second-line therapies are offered to progressive patients with evidence of disease. Variables a priori considered as possible confounders of the exposure–outcome association(s) and also possible modifiers of the size or even the direction of the association between exposure and outcome were filtered out. Thus, the resulting data set included 9 main variables for each patient.

Table 2: Clinical descriptors and prognostic reclassification. Each variable was reclassified according to the prognostic information deriving from literature data and clinical expertise. Each reclassification group could range from 0 (better prognosis) to 3 (worst prognosis). For more details on the recoding, see the Appendix.

Clinical characteristics                    Values                            Reclassification
Age (years)                                 <50                               1
                                            51–60                             1.5
                                            61–65                             2
                                            66–70                             2.5
                                            70–75                             3
Histology                                   WDLPS, SFT                        1
                                            Myxoid LPS                        1.5
                                            Pleomorphic LPS, fibrosarcoma     2
                                            LMS, DDLPS                        2.5
                                            MPNST, NOS sarcoma                3
FNCLCC grading                              G1                                1
                                            G2                                2
                                            G3                                3
Size (cm)                                   <10                               1
                                            11–15                             1.5
                                            16–20                             2
                                            20–30                             2.5
                                            30–45                             3
Response after chemo- and radiotherapy      PR                                1
                                            SD                                2
                                            PD                                3
Type of resection                           No resection                      1
                                            R0/R1                             2
                                            R2                                3
DFI (months)                                Never relapsed                    0
                                            >36                               0.5
                                            24–36                             1
                                            12–24                             1.5
                                            6–12                              2
                                            0–6                               2.5
                                            Never NED                         3
Relapse pattern                             No relapse                        0
                                            Local                             1
                                            Distant                           2
                                            Local + distant                   3
First type of retreatment at recurrence     Surgery, surgery + RT             1
                                            CT followed by surgery            1.5
                                            CT or RT                          2
                                            Debulking surgery, no surgery     3
[Figure 1, panels (a) Histology, (b) Size, (c) DFI.]

Figure 1: Distribution of histology (a), size (b), and disease-free interval or DFI (c) among the patients considered in this work, according to the recoding parameters of Table 2. In histology (a), group 1 indicates WDLPS and SFT, group 1.5 myxoid LPS, group 2 pleomorphic LPS and fibrosarcoma, group 2.5 LMS and DDLPS, and group 3 MPNST and NOS sarcoma. In size (b), group 1 (<10 cm), group 1.5 (11–15 cm), group 2 (16–20 cm), group 2.5 (20–30 cm), and group 3 (30–45 cm) correspond to different sizes in the largest diameter of the tumor. In DFI (c), group 0 corresponds to never relapsed patients, group 0.5 stands for patients with a DFI longer than 36 months, group 1 is for a DFI from 24 to 36 months, group 1.5 is for a DFI ranging from 12 to 24 months, group 2 is for a DFI of 6–12 months, group 2.5 is for a DFI shorter than 6 months, and group 3 is for patients with no eradication of the disease (never NED). Notice that (i) liposarcoma and leiomyosarcoma accounted for 75% of all patients, (ii) the variable size was normally distributed, and (iii) the majority of patients treated on the protocol ISG-STS 03.03 were in “no relapse” (0 group), confirming the potential effectiveness of the therapeutic approach.

If necessary, relevant data were numerically recorded and reclassified in groups at a 0.5-unit resolution (for the rescale parameters, see Table 2). These groups ranged from 0 to 3, where 0 corresponded to the better outcome while 3 corresponded to the worst prognosis.
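The recoding onto the 0–3 scale can be sketched in a few lines of Python. This is only an illustration of the DFI and relapse-pattern rows of Table 2. The function name `recode_dfi` and its arguments `relapsed` and `ever_ned` are hypothetical. The treatment of the interval boundaries (upper bound inclusive, so that 36 months falls in the 24–36 group) is an assumption, because Table 2 lists overlapping endpoints.

```python
# Hypothetical helper illustrating the DFI row of Table 2 (not the authors' code).
def recode_dfi(months, relapsed=True, ever_ned=True):
    """Return the 0-3 risk code for a disease-free interval given in months."""
    if not ever_ned:       # disease never eradicated ("never NED")
        return 3.0
    if not relapsed:       # patient never relapsed
        return 0.0
    # Assumption: each interval includes its upper bound, e.g. (24, 36] -> 1.
    if months > 36:
        return 0.5
    if months > 24:
        return 1.0
    if months > 12:
        return 1.5
    if months > 6:
        return 2.0
    return 2.5


# Relapse-pattern codes as listed in Table 2 (dictionary keys are illustrative labels).
RELAPSE_PATTERN = {"none": 0, "local": 1, "distant": 2, "local+distant": 3}

median_dfi_code = recode_dfi(29.47)   # the reported median DFI falls in group 1
```

Under these assumptions, a never-relapsed patient is coded 0 regardless of the `months` argument, and a patient who was never disease-free is coded 3.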
As an example, for the first type of retreatment at recurrence, an adequate treatment comprising radical surgery with or without radiotherapy was considered as the best approach (and reclassified as 1) in the prognosis, while a debulking surgery, which is known to be detrimental in the management of retroperitoneal sarcomas, was considered as the worst possible therapeutic approach (and, therefore, it was reclassified as 3).

2.2.2. Data Analysis (Univariate and Multivariate Methods). The association of the survival status with clinical variables was analyzed by univariate analysis, and the multivariate analysis was conducted at different levels of supervision applied to the independent variables. On the basis of the scientific literature and of the clinical expertise of specialists in the field, the multivariate approach focused on the supervised clustering, by discriminant function analysis (DFA) and Mahalanobis distance, of clinicopathological profiles endowed with similar prognostic impact. It is worth stressing the particular meaning we associate here with the term “supervised.” From a purely statistical (syntactic) perspective, a discriminant analysis is by definition a supervised approach, even if we do not attach an a priori weight to the intervening variables, given that the system optimizes the fitting to a known outcome. Here, we adopt a “semantic” definition of the term, implying the a priori setting of the “weight of evidence” of each variable, rather than letting this weight emerge only a posteriori from the least-squares optimization. We could use the term “Bayesian” for this approach, but we prefer “supervised” given that Bayesian approaches imply a particular mathematical computation of a posteriori probabilities that we do not apply here. DFA estimates the linear combinations of descriptors maximizing the separability among subjects according to their survival status [18].
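For two groups (alive/dead), the linear combination that maximizes separability is commonly written in Fisher's form. The expression below is a sketch under that assumption, since the source does not spell out the formulation used by the software. The symbols are introduced here for illustration only: x is a patient's vector of descriptors, μ_a and μ_d are the alive and dead group means, and S_W is the pooled within-group covariance matrix.

```latex
\[
  w \;=\; \arg\max_{w}\;
  \frac{\bigl(w^{\top}(\mu_{a}-\mu_{d})\bigr)^{2}}{w^{\top} S_{W}\, w}
  \;\propto\; S_{W}^{-1}\,(\mu_{a}-\mu_{d}),
  \qquad
  z(x) \;=\; w^{\top} x .
\]
```

Here z(x) is the discriminant score of a patient. With two groups there is a single such function, in line with the general rule that the number of discriminant functions cannot exceed the number of groups minus one.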
DFA and Mahalanobis distance classification were performed on the clinical regressors (previous medical knowledge and data fitting). DFA allowed building a model able to predict the group (alive/dead) that each patient belongs to, through a forward stepwise optimization paradigm. Structural classification was derived from the whole dataset of variables to see if individuals could be grouped into any natural system of groups. The Mahalanobis distance is a measure of the distance between a point P and a distribution D, introduced by Mahalanobis in 1936 [19]. It is a generalization of the Euclidean distance that takes into consideration the mutual empirical correlation, allowing the estimation of the distance of a unit (in our case a patient) from his/her reference population. This distance is zero if P is at the mean of its group (the mean being defined as a vector of k components corresponding to the means of the k variables) and grows as P moves away from the mean. In the case of two-class discrimination, the Mahalanobis distance of a patient P from the centers of the two classes is computed and P is assigned to the nearest group. The main outcome of univariate and multivariate analyses was the assignment of each patient to one of the two alive/dead groups, and results were considered significant at p < 0.05, after correction. The statistical analyses were carried out with JMP version 13 and STATISTICA version 7.

3. Results

3.1. Univariate Analysis. Taking the vital status of patients as the main outcome in the univariate analysis, the association of the outcome (survival status) with each clinical variable of interest was examined, regardless of all other variables. We found that histology, grading, response to preoperative treatment, disease-free interval (DFI), pattern of relapse, and first treatment approach at relapse showed a relatively better prognostic impact (Table 3).
Notice that, in spite of the relatively fewer cases considered in the unfiltered condition, the overall association performance of the abovementioned variables was not improved in the “filtered” homogeneous condition, which entitled us to focus on the latter in the following analyses.

Table 3: Univariate DFA over filtered and unfiltered cases. The association with the survival status was independently tested for each of the remaining 9 variables. In the “unfiltered” condition, the valid cases depended upon the presence of missing cases (to be excluded) on that variable. In the “filtered” condition, all cases having any type of missing values (9 in total) were excluded.

Unfiltered
Variable                     Valid cases   # wrong   % correct   R-squared   −2 log prob.
Age                          70            32        54.3        0.00037     96.95
Histology                    70            27        61.4        0.078       89.40
Grading                      69            25        63.8        0.131       83.01
Size                         68            35        48.5        −0.0023     94.25
Response to CTRT             67            26        61.2        0.016       91.38
Resection                    70            32        54.3        0.02        95.05
DFI                          70            11        84.3        0.49        49.62
Relapse pattern              66            12        81.8        0.24        69.39
First type of retreatment    68            15        77.9        0.12        83.36

Filtered
Variable                     Valid cases   # wrong   % correct   R-squared   −2 log prob.
Age                          61            27        55.7        −0.003      84.4
Histology                    61            25        59.0        0.060       78.7
Grading                      61            23        62.3        0.114       74.6
Size                         61            31        49.2        −0.002      84.28
Response to CTRT             61            35        42.6        −0.004      84.52
Resection                    61            32        47.5        0.007       83.6
DFI                          61            11        82.0        0.458       45.61
Relapse pattern              61            11        82.0        0.219       65.71
First type of retreatment    61            14        77.0        0.120       74.045

3.2. Multivariate Analysis. Since the multivariate version of DFA is particularly sensitive to the number of cases, we recovered as many cases as possible by both “mean-substitution” and “casewise” validation procedures. In the mean-substitution procedure, missing data were substituted by their respective means. In the casewise method, those patients with too many missing values were automatically excluded from the analysis: out of 9 patients with missing values (Table 4), 4 and 5 patients were excluded from the supervised and unsupervised multivariate analysis, respectively.
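The two ways of handling missing data described above can be sketched as follows. This is a minimal illustration on toy values, not the procedure implemented in the statistical packages used in the study. The function names and the `max_missing` parameter are hypothetical, and the threshold that counts as “too many” missing values is not given in the source, so the default below (drop any incomplete case) is an assumption.

```python
from statistics import mean


def casewise_deletion(rows, max_missing=0):
    """Keep the cases (rows) with at most `max_missing` missing values (None)."""
    return [row for row in rows if sum(v is None for v in row) <= max_missing]


def mean_substitution(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = [
        mean(row[j] for row in rows if row[j] is not None)
        for j in range(n_cols)
    ]
    return [
        [col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


# Toy data (illustrative only, not patient data): columns = (grading, size, DFI)
toy = [
    [1, 2.0, 0.5],
    [3, None, 3.0],
    [2, 1.0, None],
    [2, 1.5, 1.0],
]
complete = casewise_deletion(toy)   # only the fully observed cases survive
imputed = mean_substitution(toy)    # all cases kept, gaps filled with column means
```

The sketch makes the trade-off visible: casewise deletion shrinks the sample, while mean substitution keeps every case at the cost of inserting values that may be meaningless for categorical descriptors.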
Under both conditions, variables were stepwise added to the discriminant function with an Enter F of 1.00, an Exit F of 0.0, and a tolerance of 0, through a forward stepwise method, selecting at each step the variable that made the most significant contribution to the discrimination. Casewise DFA (n = 66 patients) showed that DFI, age, and histology were endowed with a relevant predictive value, while the mean-substitution (n = 70 patients) method found DFI, relapse pattern, and histology as the best predictors. In both cases, DFI and histology were the most and the least important variables, respectively (see Table 5). For each variable, Wilks’ lambda and its p value have been reported. Wilks’ lambda is a test statistic used in multivariate analysis of variance (like a t-test in the multivariate setting) to test whether there are differences between the means of the samples on a combination of dependent variables. In the present case, Wilks’ lambda indicated the unique contribution of the respective variable to the discrimination.

3.3. Multivariate Analysis: Supervised and Unsupervised Methods. In the unsupervised analysis, the rescaling of each variable was not performed and, most importantly, the variables were not a priori selected according to clinical expertise or literature data. All the known and independent variables were possible candidates for inclusion into the model. Under these conditions, the 16 variables initially found independent of each other (gender, histology, grading, tumor size, response to neoadjuvant CT-RT, multivisceral surgery, adjuvant therapies, margins, DFI, relapse, local recurrence, distant metastases, surgery for relapsed disease, first-line CT, and second-line CT) were submitted to the further constraint of the minimum possible number of missing values and finally reduced to 9.
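To give a concrete feel for the Wilks' lambda statistic mentioned in Section 3.2, the sketch below computes it for a single variable and two groups, where it reduces to the ratio of the within-group to the total sum of squares (in the multivariate case the same ratio is taken between the determinants of the within-group and total scatter matrices). Values near 0 indicate well-separated groups and values near 1 indicate no separation. The function name and the toy values are hypothetical. The per-variable lambdas reported by a stepwise DFA package may be computed differently, so this is not a reproduction of Table 5.

```python
def wilks_lambda_1d(group_a, group_b):
    """Wilks' lambda for one variable and two groups: SS_within / SS_total."""
    def sum_sq(values, center):
        return sum((v - center) ** 2 for v in values)

    pooled = list(group_a) + list(group_b)
    grand_mean = sum(pooled) / len(pooled)
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)

    ss_within = sum_sq(group_a, mean_a) + sum_sq(group_b, mean_b)
    ss_total = sum_sq(pooled, grand_mean)
    return ss_within / ss_total


# Toy recoded DFI values (illustrative only, not the study data)
alive = [0.0, 0.0, 0.5, 0.5, 1.0]
dead = [1.5, 2.0, 2.5, 3.0, 3.0]
lam = wilks_lambda_1d(alive, dead)   # small value: the two groups barely overlap
```

For two groups with identical values the function returns 1, since all of the variability is then within groups.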
In the unsupervised analysis, the discriminant function showed that DFI had a statistically significant predictive value in both casewise (n = 65 patients) and mean-substitution (n = 70 patients) methods (Table 5).

Table 5: Multivariate analysis of clinical descriptors by stepwise DFA. The association with the vital status by stepwise discriminant function analysis (DFA) was carried out according to the casewise or the mean-substitution methods.

Variable           Casewise, Wilks’ λ (p level)   Mean substitution, Wilks’ λ (p level)
DFI                0.60 (<0.001)                  0.58 (<0.001)
Histology          0.45 (0.04)                    0.43 (0.03)
Age                0.45 (0.04)                    0.42 (0.18)
Relapse pattern    0.43 (0.10)                    0.43 (0.04)
Size               0.43 (0.17)                    0.41 (0.11)

3.4. Canonical Analysis. A canonical analysis is generally used to get some orthogonal (independent of each other) discriminant functions through the computation of orthogonal discriminant roots. Computationally, a canonical correlation analysis determines the successive discriminant functions and canonical roots, corresponding to the eigenvalues associated with the respective canonical function. The maximum number of functions cannot exceed the number of groups used in the classification minus one. Thus, in the present case, one discriminant function was estimated, which provided the best overall discrimination between alive and dead patients: in Figures 2(a) and 2(b), alive patients (red dots) are concentrated in the left or the right part, respectively, of the scatterplot.

[Figure 2, panels (a) and (b): scatterplots of alive and dead patients; axes: Canonical 1 and Canonical 2.]

Figure 2: Scatterplots in monodimensional canonical space. (a) Supervised condition (cases # = 64 (dead = 23, alive = 30); wrong = 11; % wrong = 17.2; R² = 0.51). (b) Unsupervised condition (cases # = 56 (dead = 25, alive = 25); wrong = 6; % wrong = 10.7; R² = 0.55).

Table 4: Supervised features of the 70 patients considered in this work. The minus sign indicates cases with missing values, excluded from the analysis. Values are listed by variable, in patient order; missing values are not shown.

Patients 1–51
Patient #: 1 2 (−) 3 4 5 6 (−) 7 (−) 8 9 10 (−) 11 (−) 12 (−) 13 14 15 16 17 18 (−) 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 (−) 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
Age: 3 1 2 2 1.5 1.5 1 2.5 1.5 1 1.5 2 1.5 2.5 1.5 1.5 3 2 2 1.5 1 1.5 2.5 1.5 3 1.5 2 1.5 1 1.5 1 1.5 2 3 2 1.5 2.5 1 1 1.5 1 2.5 1 2.5 2.5 2 3 2 1.5 1.5 2.5
Histology: 1 3 2.5 2.5 1 3 2.5 3 3 2 3 1 1 1 1 1 2 2.5 3 2.5 2.5 2.5 2.5 3 2 2.5 3 2.5 2.5 1 2.5 1 1 2.5 1 1 3 2.5 2 1 3 2.5 1.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5
Grading: 1 3 1 2 1 2 2 2 2 3 2 1 1 2 1 3 2 2 2 3 3 2 3 3 3 3 3 3 1 3 1 1 3 3 1 2 2 1 2 3 2 1 2 2 2 2 3 2 2 2
Size: 2 1.5 1 1 2 2.5 1 2 2.5 2 2.5 3 2 2 2.5 1.5 1 1.5 1.5 2.5 1 2 1.5 1 1 2 3 2.5 2 2.5 2.5 1 2 2 1.5 1 1 1 1.5 1.5 2 2.5 2.5 2 1.5 2 1 1.5 1.5
CTRT response: 2 3 2 1 1 2 2 2 2 3 3 2 2 2 2 2 3 3 2 2 2 3 2 2 2 1 3 2 2 2 2 2 3 2 2 2 2 1 2 2 1 2 2 2 2 2 2 2 2
Resection: 2 1 1 1 1 1 1 1 1 1 3 2 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
DFI: 0.5 3 0.5 1.5 0.5 1.5 0 1.5 0.5 0.5 3 3 0 1 0 0 0 3 0.5 1 1 1 2 1 1.5 1.5 1.5 2.5 0.5 0 0 0 0 1.5 2.5 0 0 0 0.5 0.5 0 0 0.5 1.5 0 0.5 1 2 0.5 0 2
Relapse pattern: 3 2 1 2 1 0 1 1 1 0 1 0 0 0 1 1 1 1 2 1 1 3 1 3 1 0 0 0 0 1 2 0 0 0 2 2 0 0 1 1 0 1 3 3 1 0 2
Retreatment (I): 1 1.5 1 1 1 2 2 2 1 2 3 1 3 3 3 2 1 1 1 1 2 3 3 3 1 2 2 3 3 3 3 2 2 3 3 3 2 1 3 3 1.5 2 3 2 1.5 2 1.5 3 2
Status: Alive Dead Alive Dead Dead Dead Alive Dead Dead Alive Dead Dead Alive Alive Alive Alive Alive Dead Dead Dead Dead Dead Dead Dead Dead Dead Dead Dead Dead Alive Dead Alive Alive Dead Dead Alive Alive Alive Dead Alive Alive Alive Dead Dead Alive Dead Dead Dead Alive Alive Dead

Table 4: Continued. Patients 52–70
Patient #: 52 53 54 55 56 57 (−) 58 59 60 61 62 63 64 65 66 67 68 69 70
Age: 1.5 2.5 1 1.5 3 1.5 2.5 2.5 1.5 1 3 1.5 2.5 1 1 2 2 1 2.5
Histology: 2.5 2.5 2.5 2.5 2.5 2.5 1 2.5 2.5 1 2.5 1 2.5 1 2.5 1 2.5 2.5 2.5
Grading: 1 2 1 2 2 2 1 2 2 1 2 1 2 1 2 1 3 3 2
Size: 1 2.5 2 1.5 2 2.5 1.5 2 2 2.5 1.5 2.5 2.5 3 2 2.5 1.5 1.5 2
CTRT response: 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 3 1 1
Resection: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
DFI: 0 0 0.5 0 2 1 0.5 1.5 0 0.5 0.5 0 0 0.5 0 0 0 0 1.5
Relapse pattern: 0 0 2 0 1 3 1 2 0 1 2 0 0 1 0 0 0 0 1
Retreatment (I): 3 3 2 3 2 2 1 2 3 1 2 3 3 2 3 3 3 3 2
Status: Alive Alive Alive Alive Dead Dead Alive Dead Alive Alive Alive Alive Alive Dead Alive Alive Alive Alive Alive

Since the main goal of any DFA method is to classify cases, a distance estimator in the multivariate space defined by the model’s variables is needed. In such a context, Mahalanobis distance has been selected since it takes into account the existing correlation structure of the data, generating a well-conditioned metric for the system at hand. In brief, points are classified as belonging to distinct groups by measuring the distance of each point P from the multidimensional mean (centroid) of a distribution according to the covariance of the same distribution, so as to scale the actual Euclidean distance in the multivariate space by the mutual correlation of the intervening variables. Briefly, such distances can be considered as weighted Euclidean distances; the model derived from a representative sample of the population at hand could be easily extended and applied to new clinical cases. If the distance from the centroid is higher than a certain threshold, the point is no longer considered as belonging to the group.
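The distance-based assignment described above can be sketched in pure Python for the simplest case of two descriptors. The squared Mahalanobis distance of a point x from a group with centroid μ and covariance matrix S is (x − μ)ᵀ S⁻¹ (x − μ), and the point is assigned to the group with the smallest distance. All function names and the toy coordinates are hypothetical. Using each group's own covariance matrix follows the verbal description above and is an assumption, since discriminant analysis software may use a pooled matrix instead.

```python
def mean_vector(points):
    """Centroid of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def covariance_2x2(points):
    """Sample covariance matrix (n - 1 denominator) of a list of 2-D points."""
    n = len(points)
    mx, my = mean_vector(points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def mahalanobis_sq(x, points):
    """Squared Mahalanobis distance of x from the distribution of `points`."""
    mx, my = mean_vector(points)
    (sxx, sxy), (_, syy) = covariance_2x2(points)
    det = sxx * syy - sxy * sxy   # must be non-zero (non-degenerate group)
    dx, dy = x[0] - mx, x[1] - my
    # (dx, dy) S^-1 (dx, dy)^T, using the closed-form inverse of a 2x2 matrix
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def classify(x, groups):
    """Assign x to the group whose centroid is nearest in Mahalanobis distance."""
    return min(groups, key=lambda name: mahalanobis_sq(x, groups[name]))


# Toy groups (illustrative only): each point is (recoded DFI, recoded histology)
groups = {
    "alive": [(0.0, 1.0), (0.5, 1.0), (0.0, 2.5), (0.5, 2.0), (1.0, 1.5)],
    "dead": [(1.5, 2.5), (2.0, 3.0), (3.0, 2.5), (2.5, 2.0), (3.0, 3.0)],
}
new_patient = (0.5, 1.5)
predicted = classify(new_patient, groups)
```

A threshold on `mahalanobis_sq` could additionally be used to flag points that are too far from every centroid to be assigned with confidence, as the last sentence of the paragraph above suggests.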
The classification method based on Mahalanobis distance showed, in the unsupervised method, prediction accuracies of 86.6% and 83.3% for patients’ outcome in the casewise and mean-substitution methods, respectively (Table 6, columns 4 and 5), and, in the supervised method, accuracies of 85.3% and 84.7% in discriminating alive from dead patients in the casewise and mean-substitution methods, respectively (Table 6, columns 2 and 3). Such accuracy values are quite high, even bearing in mind that they come from post hoc classifications.

Table 6: Summary of results. In the casewise method, 4 and 5 cases were excluded because of too many missing values in the supervised and unsupervised methods, respectively. In the mean substitution, no cases were excluded since missing values were substituted by their respective means. Accuracy was computed considering the number of misclassified cases on the total of analyzed cases (68, 67, or 72 in supervised casewise, unsupervised casewise, or mean substitution, resp.).

                             Supervised                       Unsupervised
                             Casewise   Mean substitution     Casewise   Mean substitution
Incorrect classification     10         11                    9          12
Missing values               4          /                     5          /
% accuracy                   85.3       84.7                  86.6       83.3

3.5. Partitioning Classes (Groups) by Decision Trees Methods. Partitioning multivariate data according to a relationship between the predictors and response values creates an empirical decision tree useful for exploring relationships in the absence of a good prior model. Figure 3 shows how a decision tree can visually and explicitly represent our database in a typical diagnostic or prognostic context.

[Figure 3, panels (a)–(d): decision trees with splits on relapse pattern, grading, and DFI.]

Figure 3: Decision tree picture of the supervised- and unsupervised-based partitioning. Panels (a) and (c) depict the first three branches (splits) of the decision tree obtained by the numeric, supervised coding (scales reported in Table 2) of the 5 best performing variables in Table 3 (histology, grading, DFI, relapse pattern, and 1st-type treatment at recurrence). Panels (b) and (d) refer to the same data coded as alphanumeric symbols, hence losing any quantitative specificity assigned by supervisors. The rectangular boxes in panels (c) and (d) contain the R² values, namely, an indication of the % of explained variability. Ideally, repeated partitioning should eventually produce a total R² = 1. Modeling has been carried out by the Partition Platform of JMP, version 13.

A classic application of partitioning is to create a diagnostic heuristic for a disease. Moreover, given symptoms and outcomes for a number of subjects, partitioning can be used to generate a hierarchy of questions helpful for new patients’ prognosis. Figures 3(a) and 3(b) depict the first three partitions (decision trees) identifying the probable alive and dead subjects on the basis of five selected variables (see the legend to the figure).
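A single partitioning step of the kind described above can be sketched as a search for the cut point of one predictor that best separates the two outcome groups. The sketch below is only an illustration: the function names and toy values are hypothetical, and the splitting criterion (size-weighted Gini impurity) is an assumption that is not necessarily the criterion applied by the JMP Partition platform used in the study.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = dead, 0 = alive)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, impurity) for the split `value >= threshold` that
    minimises the size-weighted Gini impurity of the two child nodes."""
    n = len(values)
    best = (None, gini(labels))   # no split: impurity of the parent node
    # Skip the smallest value: splitting there would leave the left child empty.
    for threshold in sorted(set(values))[1:]:
        left = [y for v, y in zip(values, labels) if v < threshold]
        right = [y for v, y in zip(values, labels) if v >= threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Toy example (illustrative only): recoded DFI and vital status of 8 hypothetical patients
dfi = [0.0, 0.0, 0.5, 0.5, 1.0, 1.5, 2.0, 3.0]
dead = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, impurity = best_split(dfi, dead)
```

Applying the same search recursively to each child node, over all candidate predictors, yields a hierarchy of yes/no questions of the type shown in the tree diagrams.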
Such variables were coded by numeric and alphanumeric symbols, in order to reproduce supervised and unsupervised learning, respectively. The diagrams in Figures 3(c) and 3(d) correspond to Figures 3(a) and 3(b), respectively. In spite of their intimidating aspect, such diagrams are a direct application of simple logical rules for the probable association of each single subject with the “alive” or “dead” group on the basis of the predictor variables, and they actually open the door to the prognostic application of the model to new subjects. It is important to note, however, that the enrichment of the currently available database, needed to check the model by the traditional approach of splitting the cases into two groups (to be used in a “learning” and a “test” phase), is now under way.

4. Discussion

Recently, concomitant chemo- and radiotherapy proved to be a safe and promising treatment in RPS, leading to an increase of the 3-year relapse-free survival (3y-RFS) with respect to historical controls. However, no significant prognostic factors were found by the classical Cox proportional hazards model, except for a trend toward a better RFS and overall survival (OS) in patients with nonliposarcoma histology [14]. To date, only nomogram-based prognostic tools are available for RPS [16], and thus the present study is, to our knowledge, the first application of univariate and multivariate methods to the prediction of clinical outcome in STS patients. In univariate analysis, several variables, chosen among the more important clinicopathological parameters, showed a significant prognostic role in the cohort of RPS patients treated on ISG-STS 03.03: DFI and relapse patterns appeared to be significantly associated with survival status, while size and age failed to reach statistical significance.
The lack of significance of age is probably due to the epidemiology of the disease, mainly affecting young adults, and the inefficacy of size may be attributed to its anatomical peculiarity. The retroperitoneal site in fact, since it is not always directly in contact with abdominal organs, requires a highly specialized surgery which may limit the local invasiveness of the disease. The effectiveness of the multivariate approach was tested in the context of both a supervised and an unsupervised method. Since, according to several sources [20, 21], DFA is very sensitive to outliers, we carried out the supervised analysis first, in order to reduce the variance of the variables’ distribution and the number of outliers. In the supervised classification model for DFA, the input variables were chosen and stratified by physicians on the basis of their clinical expertise in identifying lower and higher risk classes correlated with a proportional risk of death. Thus, the “relapse pattern” variable was defined as 0 if the patient did not experience a recurrence, 1 if the patient developed a local relapse, 2 if metastases were detected, and 3 if both local and distant relapse were diagnosed. Quite interestingly, the longest DFI, classified as 0, or null risk of progression and death, showed a statistically significant positive prognostic role. In addition, in the supervised method, gender was not included since, at present, there is no suggestion of a prognostic role of gender in STS and, finally, surgical margins (R0, R1, and R2) and relapse (yes/no) were reported in a more synthetic form. In the unsupervised model, we entered all relevant variables into the model without any a priori selection or rescaling, but with the minimum possible number of missing values.
Since every patient with missing values would have to be excluded from the analysis and, at the same time, a very limited sample size would thwart any statistics, 8 of the initial 24 variables, biased by several missing values, were excluded from the analysis. In both supervised and unsupervised learning methods, we carried out the DFA following either the mean-substitution or the casewise procedure. The results of the supervised and unsupervised models were fairly similar, with misclassification rates of 15.3% and 16.7%, respectively, when missing values were substituted by their respective means, and of 14.7% and 13.4%, respectively, in the casewise setting. However, in some cases, the substitution of the missing values by the means could be inappropriate, such as for histology or response to preoperative chemoradiotherapy.

In the supervised model, DFI and histology were the best predictors in both casewise and mean-substitution approaches. Age was significant in casewise analysis and relapse pattern was significant in mean substitution. Indeed, the main factors influencing patients’ outcomes are expected to be grading, with a poorer prognosis for G3 histologies, and histotype, with a greater metastatic potential for leiomyosarcomas and MPNST. The role of DFI as a predictor of outcome seems straightforward, since a longer time interval to relapse is intuitively a good prognostic factor and it could by itself incorporate other variables, such as adequate surgery, response to preoperative treatments, or a less aggressive disease. It is worth noticing that, in the unsupervised model, only DFI was significant in both casewise and mean-substitution approaches.

Concerning the results presented in Figure 3 on the decision tree methods, the following points deserve attention.

(1) The R² quantitative indicators of the obtained stratification are the same in the supervised and unsupervised procedure.
In spite of the small number of clinically homogeneous subjects, which could impair the stability of our statistical model, the emerging indication is that the two approaches are not incompatible with each other. A necessary prerequisite for an optimal diagnostic/prognostic performance remains, in any case, a solid expertise in the appropriate management of clinical or statistical information (ideally both, but at least one).

(2) The diagrams of the type in panels (c) and (d), once properly mastered, may play a crucial role in extending the decision tree method to any clinical condition of similar or different pathology.

(3) On a more theoretical ground, it is worth noting that, in the multivariate approach (both supervised and unsupervised), we rely on a “configuration of symptoms” and not on the simple additivity of single prognostic factors. This is particularly evident for Mahalanobis distances, where the classification arises from the distance computed on the whole k-dimensional space. This implies that even information that per se does not have a relevant prognostic power can contribute to the classification when considered altogether.

5. Conclusions

All in all, it seems fair to conclude that unsupervised and supervised analyses produced fairly similar results and a fair outcome prediction in retroperitoneal sarcoma patients treated on the ISG-STS 0303 protocol. No clinically significant differences were observed between the two methods, even if the supervised one was mainly based on the a priori medical knowledge of the disease. In particular, DFA allowed obtaining a good evaluation of single cases in terms of Mahalanobis distances, which can possibly be converted into probabilities. This methodological approach may well be included in the decision-making process in oncology and, more in general, in medicine, in a sort of “from bed to bioinformatic bench and back” strategy.
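As an illustration of how Mahalanobis distances may be turned into class probabilities, the following sketch computes the squared distance of a new subject from the centroid of each outcome group and converts the distances into posterior probabilities. It is a minimal example under stated assumptions, not the software procedure used in the study: it assumes a pooled within-group covariance matrix, multivariate normality, and priors proportional to group size. The toy data are invented for the example and all function names are hypothetical.

```python
import numpy as np


def mean_substitute(X):
    """Replace each missing value (NaN) with the overall mean of its variable."""
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)


def casewise_delete(X, y):
    """Drop every subject that has at least one missing value."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]


def fit_groups(X, y):
    """Return classes, centroids, priors and the inverse pooled covariance."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    # Residuals of each subject from the centroid of its own group.
    residuals = X - centroids[np.searchsorted(classes, y)]
    pooled = residuals.T @ residuals / (len(X) - len(classes))
    return classes, centroids, priors, np.linalg.pinv(pooled)


def classify(x, centroids, priors, inv_cov):
    """Squared Mahalanobis distances and posterior probabilities for x."""
    diff = centroids - x
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    # Subtracting the minimum keeps the exponentials numerically stable.
    weights = priors * np.exp(-0.5 * (d2 - d2.min()))
    return d2, weights / weights.sum()


# Toy data (invented): two predictors, outcome 0 = alive, 1 = dead.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [np.nan, 2],
              [4, 4], [5, 4], [4, 5], [5, 5]], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1])

# Casewise setting: the subject with a missing value is excluded.
X_cw, y_cw = casewise_delete(X, y)
classes, centroids, priors, inv_cov = fit_groups(X_cw, y_cw)
d2, posterior = classify(np.array([1.0, 1.0]), centroids, priors, inv_cov)
```

Replacing `casewise_delete(X, y)` with `mean_substitute(X), y` gives the mean-substitution variant of the same computation. Because the distance is evaluated on all predictors at once, a variable with little prognostic weight of its own can still shift the classification of a subject.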
Appendix

Detailed Prognostic Reclassification of Clinical Descriptors in Table 2

Age cut-offs were defined on the basis of the nomogram for retroperitoneal sarcomas [16]. Histology reclassification derived from expertise and specific literature data [22–24]. R0/R1 resection was considered equivalent since there is no radical resection in the retroperitoneal space, by definition. Chemotherapy followed by surgery was reclassified as 1.5 since a chemotherapeutic treatment upfront is usually offered to locally advanced or rapidly growing disease; chemotherapy was evaluated as 2 since it had the objective of controlling the disease but not of eradicating it; no surgery or debulking surgery was classified equally as a 3-point factor (worst prognostic therapeutic approach, since debulking surgery is not effective in RPS and sometimes is also detrimental).

Abbreviations

3y-RFS: 3-year relapse-free survival
CT: Chemotherapy
DDLPS: Dedifferentiated liposarcoma
DFA: Discriminant function analysis
DFI: Disease-free interval
DT: Decision trees
FNCLCC: French Fédération Nationale des Centres de Lutte Contre le Cancer
ISG-STS 0303 protocol: Italian Sarcoma Group-Soft Tissue Sarcoma 0303 Protocol
LMS: Leiomyosarcoma
LPS: Liposarcoma
MANOVA: Multivariate analysis of variance
MD: Mahalanobis distance
MPNST: Malignant peripheral nerve sheath tumors
NED: Not evident disease
NOS: Not otherwise specified
OS: Overall survival
PD: Progressive disease
PR: Partial response
R0, R1, and R2 surgical margins: R0, no residual tumor; R1, microscopic residual tumor; R2, macroscopic residual tumor
RECIST: Response Evaluation Criteria in Solid Tumors
Relapse Y/N: Yes/no
RPS: Retroperitoneal sarcoma
RT: Radiation therapy
SD: Stable disease
SFT: Solitary fibrous tumor
STS: Soft tissue sarcomas
WDLPS: Well-differentiated liposarcoma

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.
References

[1] A. Ferrari, I. Sultan, T. T. Huang et al., “Soft tissue sarcoma across the age spectrum: a population-based study from the surveillance epidemiology and end results database,” Pediatric Blood & Cancer, vol. 57, no. 6, pp. 943–949, 2011.
[2] G. A. Porter, N. N. Baxter, and P. W. T. Pisters, “Retroperitoneal sarcoma: a population-based analysis of epidemiology, surgery, and radiotherapy,” Cancer, vol. 106, no. 7, pp. 1610–1616, 2006.
[3] J. J. Lewis, D. Leung, J. M. Woodruff, and M. F. Brennan, “Retroperitoneal soft-tissue sarcoma: Analysis of 500 patients treated and followed at a single institution,” Annals of Surgery, vol. 228, no. 3, pp. 355–365, 1998.
[4] I. Hassan, S. Z. Park, J. H. Donohue et al., “Operative management of primary retroperitoneal sarcomas: a reappraisal of an institutional experience,” Annals of Surgery, vol. 239, no. 2, pp. 244–250, 2004.
[5] K. Cardona, R. Williams, and S. Movva, “Multimodality therapy for advanced or metastatic sarcoma,” Current Problems in Cancer, vol. 37, no. 2, pp. 74–86, 2013.
[6] D. P. Jaques, D. G. Coit, S. I. Hajdu, and M. F. Brennan, “Management of primary and recurrent soft-tissue sarcoma of the retroperitoneum,” Annals of Surgery, vol. 212, no. 1, pp. 51–59, 1990.
[7] The ESMO/European Sarcoma Network Working Group, “Soft tissue and visceral sarcomas: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up,” Annals of Oncology, vol. 25, supplement 3, pp. iii102–iii112, 2014.
[8] A. Gronchi, S. Lo Vullo, M. Fiore et al., “Aggressive surgical policies in a retrospectively reviewed single-institution case series of retroperitoneal soft tissue sarcoma patients,” Journal of Clinical Oncology, vol. 27, no. 1, pp. 24–30, 2009.
[9] S. Bonvalot, R. Miceli, M. Berselli et al., “Aggressive surgery in retroperitoneal soft tissue sarcoma carried out at high-volume centers is safe and is associated with improved local control,” Annals of Surgical Oncology, vol. 17, no. 6, pp. 1507–1514, 2010.
[10] P. W. T. Pisters and B. O’Sullivan, “Retroperitoneal sarcomas: combined modality treatment approaches,” Current Opinion in Oncology, vol. 14, no. 4, pp. 400–405, 2002.
[11] C. Colombo, R. L. Randall, R. H. Andtbacka, and A. Gronchi, “Surgery in soft tissue sarcoma: more conservative in extremities, more extended in the retroperitoneum,” Expert Review of Anticancer Therapy, vol. 12, no. 8, pp. 1079–1087, 2012.
[12] C. Mussi, P. Colombo, A. Bertuzzi et al., “Retroperitoneal sarcoma: Is it time to change the surgical policy?” Annals of Surgical Oncology, vol. 18, no. 8, pp. 2136–2142, 2011.
[13] A. Gronchi, R. Miceli, C. Colombo et al., “Frontline extended surgery is associated with improved survival in retroperitoneal low- to intermediate-grade soft tissue sarcomas,” Annals of Oncology, vol. 23, no. 4, pp. 1067–1073, 2012.
[14] A. Gronchi, A. De Paoli, C. Dani et al., “Preoperative chemoradiation therapy for localised retroperitoneal sarcoma: A phase I-II study from the Italian Sarcoma Group,” European Journal of Cancer, vol. 50, no. 4, pp. 784–792, 2014.
[15] W. O. Lane, C. K. Cramer, D. P. Nussbaum et al., “Analysis of perioperative radiation therapy in the surgical treatment of primary and recurrent retroperitoneal sarcoma,” Journal of Surgical Oncology, vol. 112, no. 4, pp. 352–358, 2015.
[16] A. Gronchi, R. Miceli, E. Shurell et al., “Outcome prediction in primary resected retroperitoneal soft tissue sarcoma: Histology-specific overall survival and disease-free survival nomograms built on major sarcoma center data sets,” Journal of Clinical Oncology, vol. 31, no. 13, pp. 1649–1655, 2013.
[17] E. A. Eisenhauer, P. Therasse, J. Bogaerts et al., “New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1),” European Journal of Cancer, vol. 45, no. 2, pp. 228–247, 2009.
[18] P. A. Lachenbruch and M. Goldstein, “Discriminant analysis,” Biometrics, vol. 35, no. 1, pp. 69–85, 1979.
[19] P. C. Mahalanobis, “On the generalised distance in statistics,” Proceedings of the National Institute of Sciences of India, vol. 2, no. 1, pp. 49–55, 1936.
[20] H. E. A. Tinsley and A. D. Brown, Handbook of Applied Multivariate Statistics and Mathematical Modeling, Academic Press, 2000.
[21] P. Lewicki and T. Hill, Statistics: Methods and Applications, A Comprehensive Reference for Science, Industry, and Data Mining, StatSoft, 2006.
[22] M. Toulmonde, S. Bonvalot, P. Méeus et al., “Retroperitoneal sarcomas: patterns of care at diagnosis, prognostic factors and focus on main histological subtypes: a multicenter analysis of the French Sarcoma Group,” Annals of Oncology, vol. 25, no. 3, Article ID mdt577, pp. 735–742, 2014.
[23] I. Alldinger, Q. Yang, C. Pilarsky, H.-D. Saeger, W. T. Knoefel, and M. Peiper, “Retroperitoneal soft tissue sarcomas: prognosis and treatment of primary and recurrent disease in 117 patients,” Anticancer Research, vol. 26, no. 2 B, pp. 1577–1581, 2006.
[24] X. D. Zhao, P. Li, X. H. Huang, L. Chen, N. Liu, and Y. G. She, “Prognostic factors predicting the postoperative survival period following treatment for primary retroperitoneal liposarcoma,” Chinese Medical Journal, vol. 128, no. 1, pp. 85–90, 2015.