Principal component-based clinical aging clocks identify signatures of healthy aging and targets for clinical intervention

Fong, Sheng; Pabis, Kamil; Latumalea, Djakim; Dugersuren, Nomuundari; Unfried, Maximilian; Tolwinski, Nicholas; Kennedy, Brian; Gruber, Jan

doi:10.1038/s43587-024-00646-8

Article
Open access
Published: 19 June 2024

Nature Aging (2024)

Abstract

Clocks that measure biological age should predict all-cause mortality and give rise to actionable insights to promote healthy aging. Here we applied dimensionality reduction by principal component analysis to clinical data to generate a clinical aging clock (PCAge) identifying signatures (principal components) separating healthy and unhealthy aging trajectories. We found signatures of metabolic dysregulation, cardiac and renal dysfunction and inflammation that predict unsuccessful aging, and we demonstrate that these processes can be impacted using well-established drug interventions. Furthermore, we generated a streamlined aging clock (LinAge), based directly on PCAge, which maintains equivalent predictive power but relies on substantially fewer features. Finally, we demonstrate that our approach can be tailored to individual datasets, by re-training a custom clinical clock (CALinAge), for use in the Comprehensive Assessment of Long-term Effects of Reducing Intake of Energy (CALERIE) study of caloric restriction. Our analysis of CALERIE participants suggests that 2 years of mild caloric restriction significantly reduces biological age. Altogether, we demonstrate that this dimensionality reduction approach, through integrating different biological markers, can provide targets for preventative medicine and the promotion of healthy aging.

Main

Although prevention is proverbially better than cure, current clinical recommendations promoting healthy aging focus on specific diseases and react to symptoms and signs of disease rather than focusing on organismal age¹. Biological age (BA) is the most important risk factor determining individual risk of morbidity and mortality, with true BA of individuals generally different from chronological age (CA)². Attempts to construct biological aging clocks, inferring BA from observable physical features (biomarkers), have a long history^2,3,4. BA clocks have been constructed based on different classes of biological features, including clinical parameters^{5,6,7,8,9,10,11,12}, DNA methylation (DNAm)^{13,14,15,16,17,18,19,20} and many types of -omics data^{21,22,23,24,25,26}.

In addition to the underlying feature space, the operational definition of BA differs between approaches. Historically, BA is defined as the age at which the test subject's physiology (as determined by its position in feature space) would be approximately normal for the reference cohort^27,28,29. First-generation DNAm clocks follow this approach^13,16. Although such clocks have attained impressive accuracy in determining CA, they are not optimized to predict future morbidity and mortality^18,30.

Second-generation BA clocks aim to directly predict future mortality from biological parameters^{17,18,31,32,33}. These clocks define true BA as 'Gompertz age', or the age commensurate with an individual's future risk of dying from all intrinsic causes¹⁸. Second-generation clocks share some similarities with traditional clinical risk markers, such as the atherosclerosis cardiovascular disease (ASCVD) score³⁴, but differ in that they predict all-cause mortality, better reflecting the high degree of interconnectivity between organ systems and disease etiology^{9,17,18,24,31,32,33}. Successful aging is more than the absence of specific diseases. Unlike existing clinical risk markers, BA clocks can identify individuals likely to remain free from age-dependent dysfunction, morbidity and mortality for years to come. BA clocks can, therefore, provide normative targets for clinical intervention and individual guidance to promote healthy aging.
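
For intuition, the 'Gompertz age' definition can be illustrated numerically. The sketch below is not the conversion used for PCAge (that procedure is described in the Methods, which are not reproduced here). It assumes a pure Gompertz mortality law in which the hazard doubles every 8 years, a commonly quoted approximate value for adult humans; the function name `gompertz_age` and the parameter `mrdt_years` are hypothetical.

```python
import math

def gompertz_age(ca, hazard_ratio, mrdt_years=8.0):
    """Age at which a reference subject would carry the same mortality hazard.

    Assumes hazard(t) = A * exp(gamma * t), with gamma set by an assumed
    mortality rate doubling time (mrdt_years). hazard_ratio is the subject's
    hazard relative to a reference subject of the same chronological age (ca).
    """
    gamma = math.log(2) / mrdt_years
    return ca + math.log(hazard_ratio) / gamma

# A hazard ratio of 2 adds one doubling time; a ratio of 0.5 subtracts one.
older = gompertz_age(60, 2.0)
younger = gompertz_age(60, 0.5)
```

Under these assumptions, a 60-year-old with twice the reference hazard is assigned a biological age one doubling time above their chronological age, and one with half the reference hazard is assigned an age one doubling time below it.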

Second-generation BA clocks require large-scale cohort data comprising data on biological features combined with long disease and mortality follow-up^35,36. For standard clinical chemistry and physiological features, datasets meeting these criteria are available, enabling construction of second-generation 'clinical clocks' (CCs), designed to predict future mortality and morbidity directly from clinical features and biomarkers^{15,17,18,31,37,38,39}. Unfortunately, equivalent historic data are not yet available for most -omics data, including DNAm. Current second-generation DNAm clocks have, therefore, been trained to either approximate BA predictions of existing CCs or approximate levels of the underlying biomarkers themselves^18,19,20,31.

In settings where the relevant clinical features and blood markers are readily accessible, CCs have distinct advantages. The features on which CCs are built often have intrinsic well-established biological and pathophysiological meaning, making their findings comparatively easy to interpret and act upon clinically. The development and validation of more powerful CCs, as well as tools facilitating their clinical interpretation and application, should, therefore, be a priority. Extracting aging patterns from patient data can be challenging because many statistical and machine learning techniques require datasets with more examples (for example, subjects) than there are features⁴⁰. Unfortunately, the cost and difficulty of generating such datasets scale with the number of samples and mortality follow-up time. As a result, the dimensionality of the feature space (the number of features collected for each subject) is often large compared to the number of subjects. One approach to address this challenge is dimensionality reduction, or transformation of data from a high-dimensional feature space into an approximately equivalent, lower-dimensional space⁴¹.

Principal component analysis (PCA) is a commonly used dimensionality reduction technique, based on singular value decomposition (SVD)⁴¹. SVD results in a transformation of the coordinate system of feature space into an equal number of 'principal components' (PCs). The transformation into PC space is a linear transformation (for example, rotation), mapping the original coordinate system of feature space to the new PC system, such that the coordinate axes align with directions in feature space along which the covariance (or correlation) between features is maximal across samples (subjects)⁴¹. PCA is widely used to extract insights from high-dimensional data. Because aging typically is the major source of variance in datasets derived from aging cohorts, this approach is especially appropriate for the extraction of aging patterns^42,43,44. Importantly, because SVD/PCA is an analytical matrix factorization technique, it involves no model fitting, loss of data or algorithmic optimization, unlike regression methods, feature selection and many nonlinear techniques. PCA is, therefore, not subject to hyper-parameter selection and can be applied to smaller datasets. PCA can be used to reduce dimensionality and compress data by omitting contributions from higher PCs (directions in feature space) that explain smaller amounts of the overall variance, although this approach may sometimes result in loss of useful information⁴⁵. When constructing biomarkers of aging, morbidity and frailty, PCA can be employed as a feature selection/compression method and may increase the robustness of predictive models based on high-dimensional biomedical data^46,47,48,49. The usefulness of this approach in the construction of BA clocks was previously explored by Nakamura et al.⁵. More recently, Higgins-Chen et al.⁴⁵ demonstrated that DNAm clocks constructed from PC-transformed data exhibit increased reliability and reproducibility.
In the present study, we explored the construction of second-generation CCs using large clinical datasets and dimensionality reduction by PCA.
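
The PCA-by-SVD procedure described above can be sketched in a few lines. This is a minimal illustration on synthetic data, assuming NumPy is available; the function name `pca_by_svd`, the synthetic matrix and the 95% variance threshold are illustrative choices, not taken from the study.

```python
import numpy as np

def pca_by_svd(X):
    """Z-score each feature, then factorize the z-scored matrix with SVD.

    Returns the z-scored data, the right singular vectors (rows of Vt,
    one per principal component) and the fraction of variance per PC.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    Z = (X - mu) / sigma
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Z, Vt, explained

# Synthetic stand-in for a subjects x features table (200 subjects, 6 features)
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 6))

Z, Vt, explained = pca_by_svd(X)
scores = Z @ Vt.T  # coordinates of each subject in PC space (a rotation of Z)

# Smallest number of leading PCs whose cumulative variance reaches 95%
k = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)
```

Because the factorization is analytical, the only free choice in this sketch is how many leading PCs to retain.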

Results

PCAge predicts biological age

We used a training dataset extracted from the National Health and Nutrition Examination Survey (NHANES) IV 1999–2000 cohort. The training cohort was initially composed of 1,476 males and 1,536 females aged 40–84 years with a feature set comprising data from medical examination, physiological and laboratory measurements. Using health-related questionnaire data, we also generated three derived indices (comorbidity index, self-health index and healthcare use index (see Methods for details on these derived scores)). The complete set of 165 clinical parameters included the derived scores; data from medical examination, morphology and body composition; and clinical laboratory and blood chemistry (Supplementary Table 1). Individuals with missing values in any of these parameters were removed, resulting in a final training dataset comprising 923 males and 852 females. We next converted parameters into z-scores before calculating the SVD for the training cohort. We then used PCA for linear dimensionality reduction, retaining only the first 18 singular vectors (PCs), accounting for 99% of the overall variance in the data (see Supplementary Fig. 1 for scree plot). Loadings of the 165 parameters for these 18 PCs were visualized using heatmaps (Supplementary Fig. 2), and these PCs, together with CA, were selected as covariates for Cox proportional hazard models predicting mortality separately for males and females (Supplementary Fig. 3). Hazard ratios were converted into BA as outlined below (Methods), yielding separate BA clocks for males and females (PCAge). We then tested PCAge in a testing cohort, comprising a separate set of subjects, extracted from the NHANES IV 2001–2002 recruitment wave. This cohort initially comprised 1,619 males and 1,631 females aged 40–84 years, with complete data available for 1,094 males and 942 females. The characteristics of the study participants are shown in Supplementary Table 2.
Feature coordinates of subjects from the testing cohort were transformed into PC coordinates by projection using the right singular vectors of the training cohort before BA values were calculated.
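
The train-then-project step can be sketched as follows. This is an illustration on random data, assuming NumPy; scaling the testing cohort with the training cohort's means and standard deviations is an assumption of this sketch (the text only states that parameters were converted into z-scores), and the cohort sizes, feature count and `n_pcs` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
train = rng.normal(loc=5.0, scale=2.0, size=(300, 8))  # hypothetical training cohort
test = rng.normal(loc=5.0, scale=2.0, size=(100, 8))   # hypothetical testing cohort

# Fit the scaling and the principal axes on the training cohort only
mu = train.mean(axis=0)
sigma = train.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd((train - mu) / sigma, full_matrices=False)

n_pcs = 3            # number of leading PCs retained (arbitrary here)
V_k = Vt[:n_pcs].T   # right singular vectors as columns: features x n_pcs

# Project both cohorts onto the training cohort's PCs
train_pcs = ((train - mu) / sigma) @ V_k
test_pcs = ((test - mu) / sigma) @ V_k
```

Nothing is re-estimated on the testing cohort, so its PC coordinates are directly comparable to those of the training cohort and can be passed to a risk model fitted on the training data.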

As expected, PCAge was highly correlated with CA in both males and females (Fig. 1a) but with significant residuals (PCAge Deltas). We next asked if the residuals between PCAge and CA encode information regarding individual aging trajectories, that is, if large negative or positive residuals were indicative of more or less successful aging. To address this question, we determined the correlation of PCAge, CA and their residuals with parameters of molecular aging (telomere length), cognitive performance (digit symbol substitution test) and physical function (gait speed). Although CA and PCAge were both significantly negatively correlated with telomere length (Supplementary Fig. 4a,d), cognitive performance (Supplementary Fig. 4b,e) and gait speed (Supplementary Fig. 4c,f), PCAge was more predictive of these parameters than CA alone (Supplementary Fig. 4). Subjects with negative PCAge residuals (biologically younger than their CA) had significantly longer telomeres (Supplementary Fig. 4g) and better preserved cognitive performance (Supplementary Fig. 4h), and they walked faster (Supplementary Fig. 4i), than expected based on their CA. By contrast, subjects with positive PCAge residuals (biologically older than their CA) had shorter telomeres (Supplementary Fig. 4g) and worse cognitive performance (Supplementary Fig. 4h), and they walked slower (Supplementary Fig. 4i), than expected for their CA. These data demonstrate that PCAge, despite originally being trained on survival only, is more predictive of molecular and physiological parameters expected to depend on BA than CA alone.

**Fig. 1: PCAge predicts BA in males and females.**

We next tested the performance of PCAge in predicting survival in unknown subjects by selecting subjects in the test cohort within the best (lowest) 25% and worst (highest) 25% quartiles for BA (PCAge). Across all age categories, when compared to subjects with BA within 25% of their CA (mean CA), male subjects in the best 25% quartiles (PCAge Low) experienced significantly lower mortality over the 20-year follow-up, whereas male subjects in the worst 25% quartiles (PCAge High) experienced significantly higher mortality (Fig. 1b,d,f). Similarly, when compared to mean CA, significant survival differences were observed in females, although this did not reach statistical significance for PCAge Low in the 55–64-year age category (Fig. 1c,e,g). Several of the biological features used to calculate PCAge have known associations with clinical disorders and disease risk. To directly test the performance of PCAge in predicting survival relative to a known clinical risk marker, we compared its predictive power against the ASCVD score, a widely used metric to predict the 10-year risk of cardiovascular disease (CVD) or stroke³⁴. Unlike the ASCVD score, we found that PCAge effectively predicts survival and mortality in both males and females aged 45–74 years (Fig. 1b–e and Extended Data Fig. 1). Subjects with large (positive) PCAge Deltas of at least 20 years were also significantly more likely to suffer from age-dependent diseases and died significantly faster (Fig. 2a).

**Fig. 2: Testing PCAge for robustness and precision.**

PCA increases robustness to random errors

To further explore the meaning of residuals between PCAge and CA, we next compared PCAge to a well-validated clinical BA clock, PhenoAge¹⁸. We found that PCAge and PhenoAge were highly correlated (Pearson correlation coefficient (PCC) = 0.91, R² = 0.83, P < 0.001) (Fig. 2b). Despite strong correlation, there were significant residuals between both clocks (Fig. 2b and Supplementary Fig. 5). One explanation for these differences is sensitivity to random errors. Many clinical measurements are subject to measurement errors and significant day-to-day variations. For clinical parameters, typical variability has been estimated to be around 7–10% (https://www.westgard.com/biodatabase1.htm). We, therefore, compared the relative sensitivity of the ASCVD score, PhenoAge and PCAge to noise and found that the ASCVD score was impacted most significantly (Fig. 2c). By contrast, random errors largely average out across PCs, and, therefore, the relative error distribution for PCAge was the narrowest (Fig. 2c). The magnitude of the relative errors for PhenoAge was between that of the ASCVD score and PCAge. This may be because PhenoAge uses significantly fewer (nine) parameters compared to PCAge (165 parameters).
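
The averaging argument can be illustrated with a toy simulation. The sketch below does not reproduce the analysis behind Fig. 2c: it assumes an equally weighted linear score, independent 10% multiplicative noise on every input and NumPy for the random draws, and the function name `relative_error_spread` is hypothetical. The real clocks weight their inputs unequally, so the scaling will differ in practice.

```python
import numpy as np

def relative_error_spread(n_features, noise_sd=0.10, n_trials=5000, seed=0):
    """Standard deviation of the relative error of an equally weighted score
    when each input carries independent multiplicative noise."""
    rng = np.random.default_rng(seed)
    true_values = np.ones(n_features)
    weights = np.full(n_features, 1.0 / n_features)
    true_score = weights @ true_values
    noisy = true_values * (1.0 + noise_sd * rng.normal(size=(n_trials, n_features)))
    scores = noisy @ weights
    return float(np.std((scores - true_score) / true_score))

spread_few = relative_error_spread(9)     # as many inputs as PhenoAge uses
spread_many = relative_error_spread(165)  # as many inputs as PCAge uses
```

In this idealized setting the relative error shrinks roughly with the square root of the number of inputs, which is the intuition behind the narrower error distribution of a clock built on many parameters.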

We next asked if these additional parameters enabled PCAge to capture meaningful biology and whether the residuals between PhenoAge and PCAge encoded additional biological information. To answer this question, we carried out a sequential sorting procedure, age-binning subjects not based on their CA but based on one of the clocks, before attempting to predict future survival using the other clock. We evaluated the ability of PCAge to predict future survival in subjects pre-selected (binned) according to their PhenoAge and vice versa. Across PhenoAge bins, we found that PCAge was able to further stratify survival in subjects binned by PhenoAge (Fig. 2d and Extended Data Fig. 2a–c), but the opposite was not true, with PhenoAge Deltas providing no further stratification in subjects binned by PCAge (Fig. 2e and Extended Data Fig. 2d–f). Our findings suggest that the additional parameters captured by PCAge enabled the identification of additional healthy aging and at-risk individuals, beyond those identified by PhenoAge.

PCs map mechanisms of aging and age-related disease(s)

Individual PCs comprise sets of correlated features (Supplementary Table 3). To explore the biological meaning of PC coordinates, we first selected the top nine PCs, based on their predictive value within PCAge (Supplementary Table 4 and Supplementary Fig. 3). We then combined training and testing cohorts, clustering all 2,017 male and 1,794 female subjects using k-means clustering based on their location along these nine PCs (Fig. 3). k-means clustering maximizes the separation between clusters⁵⁰. The clustering algorithm assigns individuals to the same cluster who are similar to each other in the space spanned by the top nine PCs of PCAge. Members of the same cluster, therefore, share similarities in biological features that impact their future mortality. CA was not part of the data used by the clustering algorithm, and no significant differences in CA were detected between any of the male clusters (Supplementary Table 5) nor for most of the female clusters (Supplementary Table 5). To learn more about the subjects comprising each cluster, we next characterized clusters using demographic and clinical data (Supplementary Table 5). We identified three unique themes, including healthy aging (green clusters), a cardio-metabolic axis formed by three separate clusters (purple, orange and red clusters) and an additional 'multi-morbidity' group (yellow clusters).
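
The clustering step can be sketched with a plain implementation of Lloyd's k-means algorithm, which alternates between assigning points to the nearest center and recomputing each center as the mean of its points. This is an illustration only, assuming NumPy: the synthetic nine-dimensional data, the deterministic farthest-point initialization and the function names are assumptions, and the study's actual implementation and number of clusters are not reproduced here.

```python
import numpy as np

def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest
    from all centers chosen so far (an illustrative choice)."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d))])
    return np.array(centers)

def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm: assign to nearest center, then update centers."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated synthetic groups in a nine-dimensional "PC space"
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(50, 9))
blob_b = rng.normal(loc=10.0, scale=1.0, size=(50, 9))
pcs = np.vstack([blob_a, blob_b])
labels, centers = kmeans(pcs, k=2)
```

Chronological age is deliberately absent from the input matrix here, mirroring the statement that CA was not part of the data used by the clustering algorithm.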

Across sex and CA bins, subjects from the 'healthy aging' (green) clusters were biologically significantly younger, had a slower cluster-specific aging rate and had significantly higher survival over the 20-year follow-up period compared to other clusters (Figs. 3 and 4a,b, Extended Data Figs. 3 and 4 and Supplementary Table 5). Subjects from the cardio-metabolic axis, comprising a spectrum across the 'mild cardio-metabolic' (purple) to the 'major cardio-metabolic' (orange) and 'cardio-metabolic failure' (red) clusters, exhibited increasingly positive PCAge Delta, progressive decline in survival and overall faster cluster-specific aging rates, with a majority suffering and dying from CVD (Figs. 3 and 4a,b, Extended Data Figs. 3 and 4 and Supplementary Table 5). Along this cardio-metabolic axis, subjects became increasingly obese, sedentary and frail (Supplementary Table 5). Members of the 'multi-morbid' (yellow) cluster also failed to age successfully and formed a distinct group of subjects, outside the cardio-metabolic axis, with median PCAge Deltas significantly higher than 'healthy agers' and 'mild cardio-metabolic' clusters but lower than the 'major cardio-metabolic' and 'cardio-metabolic failure' clusters (Supplementary Table 5). Although males and females from the 'multi-morbid' clusters differed in terms of education and socioeconomic factors (Supplementary Table 5), members of both sexes had significantly more current smokers and subjects with alcohol use disorder, and their members had the lowest body mass index (BMI) among all the clusters (Supplementary Table 5). When we compared centenarians across all clusters, we found that centenarians had significantly lower mean PCAge Delta than matched controls (Fig. 3). For females, there were statistically significantly more centenarians in the 'healthy aging' cluster than expected based on the dataset as a whole (P = 0.03) (Supplementary Table 5).
Interestingly, 'healthy agers' used the healthcare system more proactively and effectively than members of other clusters (Supplementary Table 5 and Supplementary Fig. 6), suggesting that early and proactive treatment of risk factors and age-related disease(s) are determinants of more successful aging trajectories. Many members of the 'multi-morbid' clusters suffered from, and died of, a variety of chronic, non-cardiovascular diseases (Supplementary Table 5). When treatment was indicated, there were significantly fewer members from the 'multi-morbid' clusters who received the required chronic medications at an earlier age, and significantly more relatively younger members who required treatment were missed (Supplementary Table 5). In general, male members of the 'multi-morbid' cluster accessed healthcare less frequently, and those who did relied more on emergency treatment (Supplementary Table 5). Taken together, these results suggest that lack of early, preventative and proactive treatment of age-related disease(s) and associated risk factors contributes to unsuccessful aging later in life (see Supplementary Note 1 in the Supplementary Information for detailed cluster analysis).

**Fig. 4: PCs to extract mechanisms of aging and age-related disease(s).**

The cluster analysis shows that individuals separated in feature space along the major PCs selected by PCAge differ not only by life expectancy but also by socioeconomic, lifestyle and behavioral factors, even though none of these factors was originally included in the model. We next asked if membership of the 'healthy aging' cluster was associated with specific PC coordinates and if this could be exploited to extract pathways of healthy aging and inform intervention strategies aimed at moving subjects into the 'healthy aging' cluster. We found that PC2, followed by PC4, resulted in the greatest separations between the 'healthy aging' cluster and other clusters (Supplementary Table 4). When we sorted the clinical measures within PC2 by absolute magnitude and direction of their weights, we found that the clinical measures with highly weighted coefficients were mainly body composition and fat (Supplementary Table 6). In terms of interventions, the implications were obvious and expected, suggesting that improved exercise and diet would impact PC2 and result in more successful aging.

When we applied the same approach to PC4, we found that PC4 was substantially lower in the 'healthy aging' cluster (Supplementary Table 4) compared to all other clusters. Moreover, PC4 was significantly positively correlated with CA in all but one cluster (female 'cardio-metabolic failure') (Fig. 4c and Extended Data Fig. 4b). It is noteworthy that, despite the differences in body composition and disease spectrum that separate these two groups, high PC4 values were associated with less successful aging in both the 'cardio-metabolic' and 'multi-morbid' clusters, suggesting that PC4 captures a common feature of all unsuccessful aging. To identify underlying mechanisms, we built a partial correlation network including only the top 10% (by PC4 absolute weight) of clinical parameters (Fig. 4d). We categorized these measures into biomedical categories related to body composition, physiological functions and responses, finding that PC4 encodes important information on pathways relating to cardiac function, renal function, inflammation and immunity, glucose regulation and iron storage and erythropoiesis (Fig. 4d). Elevated values in PC4 appear to capture abnormal clinical measures, thereby reflecting dysregulation in these pathways. Interestingly, the PC4 network gives substantial weight to markers of inflammation, a process known to play a central role in many age-dependent diseases.
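
A partial correlation network links two parameters only by the association that remains after accounting for all other parameters in the set. One standard way to obtain partial correlations, sketched below, is from the inverse of the correlation matrix. How the network in Fig. 4d was actually estimated is not stated in this section, so this is an assumption-laden illustration on synthetic data, assuming NumPy; the function name is hypothetical.

```python
import numpy as np

def partial_correlations(X):
    """Partial correlation of each pair of columns given all other columns,
    computed from the inverse (precision) of the correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    precision = np.linalg.inv(corr)
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Synthetic chain a -> b -> c: a and c are correlated only through b
rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)
c = b + 0.5 * rng.normal(size=n)
X = np.column_stack([a, b, c])

marginal_ac = float(np.corrcoef(a, c)[0, 1])      # strong marginal correlation
partial_ac = float(partial_correlations(X)[0, 2])  # close to zero given b
```

In the synthetic chain, the marginal correlation between the first and last variable is strong, but their partial correlation is near zero, so a network built on partial correlations would draw no direct edge between them.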

Angiotensin-converting enzyme inhibitors/angiotensin receptor blockers normalize PC4 to reduce mortality risk and BA

Microalbuminuria, which is often secondary to chronic hypertension and/or longstanding diabetes mellitus, is an early manifestation of chronic kidney disease, associated with increased cardiovascular risk⁵¹. Microalbuminuria is considered clinically significant when the urine albumin-to-creatinine ratio (ACR) is ≥30 mg g⁻¹ (ref. ⁵¹). We first matched (1) healthy subjects with normal urine ACR and without hypertension, hyperlipidemia or diabetes mellitus; (2) subjects with high urine ACR and not on treatment; and (3) subjects treated with angiotensin-converting enzyme inhibitors (ACE-Is) or angiotensin receptor blockers (ARBs) who had normal urine ACR (successfully treated). Subjects were matched by CA, sex, smoking status (serum cotinine) and BMI (n = 140 per group). We then compared the PC4 network of untreated subjects with high urine ACR against healthy subjects (Fig. 5a). In untreated subjects with high urine ACR, we found statistically significant increases in urine albumin (P < 0.001), N-terminal pro-brain natriuretic peptide (NT-proBNP) (P = 0.0074), globulin (P < 0.001), C-reactive protein (CRP) (P = 0.047), glycohemoglobin (HbA1c) (P < 0.001) and glucose (P < 0.001). Our results show that untreated subjects with high urine ACR also had dysregulated pathways involving renal and cardiac function, inflammation and glucose regulation. These findings are expected and consistent with known associations and outcomes of albuminuria, but we also found increased inflammation for untreated subjects with high urine ACR. Compared to healthy subjects, untreated subjects with high urine ACR had statistically significantly higher median PC4 value (P < 0.001) (Fig. 5c), higher positive median PCAge Delta (P < 0.001) (Fig. 5d) and higher mortality (P < 0.001) (Fig. 5e).

**Fig. 5: ACE-I/ARBs normalize modifiable clinical parameters, involved in renal function, cardiac function and inflammation, within PC4 space to reduce mortality risk and BA.**

Given these findings, treatments to normalize urine ACR using best clinical practice, such as an ACE-I or ARB^51,52, might be expected to normalize PC4 values and lower PCAge. Apart from their reno-protective effects, ACE-I/ARBs have additional effects of lowering blood pressure and are cardio-protective, preventing heart failure⁵³. When we compared the PC4 network of ACE-I/ARB-treated subjects against healthy subjects, we found that there were no longer any significant differences in urine albumin, serum creatinine and NT-proBNP (Fig. 5b). Surprisingly, successful treatment with ACE-I/ARBs was also associated with lower CRP (Fig. 5b), which suggests that treatment with ACE-I/ARBs resulted in additional anti-inflammatory effects, either directly or through effects on general systemic function. ACE-I/ARB-treated subjects had statistically significantly lower median PCAge Delta (P = 0.0027), resulting in an overall negative (PCAge lower than CA) PCAge Delta (Fig. 5d). Consistent with both the disease-specific benefits of ACE-I/ARBs and the normalization of PCAge Deltas, treated subjects had better survival over the 20-year follow-up period (P = 0.003), with no remaining statistically significant differences in survival between ACE-I/ARB-treated and healthy subjects (Fig. 5e).

When we compared ACE-I/ARB-treated to untreated subjects with high urine ACR, we not only found lower urine albumin (P < 0.001) and NT-proBNP (P = 0.047) levels, as expected, but also statistically significantly lower levels of inflammatory markers, including serum globulin (P = 0.011), CRP (P = 0.047), fibrinogen (P = 0.025), ferritin (P = 0.03) and lactate dehydrogenase (LDH) (P < 0.001) in ACE-I/ARB-treated subjects. Taken together, our data suggest that treatment of microalbuminuria with ACE-Is/ARBs reduces mortality risk and BA by normalizing modifiable clinical parameters, involved in renal function, cardiac function and inflammation, moving subjects along the PC4 axis in feature space.

The reduced clinical clock (LinAge) recapitulates PCAge

It is impractical to measure all 165 parameters included in PCAge. We, therefore, developed a reduced BA clock derived from PCAge but using a minimal set of parameters (LinAge). Using sensitivity analysis, we selected a subset of clinical parameters for inclusion to retain the predictive power of PCAge. LinAge includes only parameters from the complete blood count, renal function tests, liver function tests, iron panel and lipid panel in addition to vitamin B12, folate, CRP, fibrinogen, LDH, NT-proBNP, uric acid, glucose, HbA1c, urine ACR, blood pressure, pulse rate, BMI, smoking status and medical history (Supplementary Tables 1 and 7). All parameters used in LinAge can be measured in most standard clinical laboratories (see Methods, Code and Supplementary Files in the Supplementary Information for more details on 'custom clocks').

By design, LinAge is highly correlated with PCAge (Fig. 6a). To directly compare the performance of LinAge to PCAge in predicting survival, we first compared LinAge and PCAge predicted 20-year survival for all subjects in our test cohort. We also compared both clocks against CA, the ASCVD score and a widely used and well-validated measure of frailty in the clinic: the Clinical Frailty Scale (CFS)⁵⁴ (Fig. 6c). Both LinAge and PCAge outperformed CA, the ASCVD score and CFS, with no significant difference in the areas under the curve (AUCs) between PCAge and LinAge (Fig. 6c), suggesting that, despite the reduction in the number of parameters, PCAge and LinAge performed similarly in the NHANES IV test cohort. We also compared LinAge directly to PhenoAge using receiver operating characteristic (ROC) analysis (Fig. 6c) and in individual age bins (Fig. 6e–j and Extended Data Fig. 5). LinAge has a statistically significantly larger AUC when predicting future survival in the NHANES IV test cohort across all ages (Fig. 6c). LinAge also outperformed PhenoAge in some individual bins, although they performed similarly in other age and sex bins, especially in very old individuals (Fig. 6e–j and Extended Data Fig. 5). Finally, we compared LinAge's and PhenoAge's ability to predict specific causes of death, finding that LinAge overall outperformed PhenoAge in predicting 20-year CVD and non-CVD-related and cancer-related mortality. This advantage was more pronounced for non-CVD deaths (Supplementary Fig. 7b–d).
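
For readers less familiar with ROC analysis, the AUC used in these comparisons can be read as the probability that a randomly chosen subject who died within follow-up was assigned a higher score (for example, a higher BA) than a randomly chosen survivor. The sketch below computes it by direct pairwise counting on made-up values; the function name `roc_auc` is hypothetical, and the statistical test used in the study to compare AUCs is not shown.

```python
def roc_auc(scores, died):
    """AUC by pairwise comparison: the fraction of (deceased, survivor) pairs
    in which the deceased subject has the higher score; ties count as half."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one deceased and one surviving subject")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up scores: perfect separation, then one misordered pair out of four
auc_perfect = roc_auc([50, 60, 70, 80], [0, 0, 1, 1])
auc_partial = roc_auc([1, 2, 3, 4], [0, 1, 0, 1])
```

An AUC of 0.5 corresponds to a score that carries no information about survival, and 1.0 to a score that ranks every deceased subject above every survivor.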

**Fig. 6: LinAge recapitulates PCAge in BA prediction.**

PCs can be strongly affected by outliers, thresholding and batch effects⁴¹. When extracting LinAge parameters, we collapsed both the projection into the PC coordinate system of the training dataset and the linear risk model into a single set of parameters. LinAge was trained in the NHANES IV 1999–2000 recruitment wave and performed well when tested in the separate NHANES IV 2001–2002 wave (Fig. 6a). Although none of the subjects of the training cohort was part of the testing cohort, both cohorts were recruited as part of NHANES IV. It could be argued that comparison between different waves does not adequately test the potential impact of batch effects and methodological differences that would be expected when applying LinAge to subjects from an independent trial. To address this concern, we used data from NHANES III, which was a study preceding NHANES IV that ran from 1988 to 1994. Both studies differed in key experimental aspects (Supplementary Table 8). Because NHANES III was initiated a decade earlier, linked mortality data for NHANES III are also substantially longer than for NHANES IV. Despite these differences, when applied to NHANES III data, LinAge could successfully stratify survival in NHANES III subjects without batch or thresholding corrections for most parameters and without the need for re-training (Fig. 6b and Supplementary Fig. 8). Although LinAge was only trained with up to 20-year follow-up, it performed equally well in predicting survival when applied to the longer follow-up time (25 years) in NHANES III (Fig. 6d).
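
Collapsing the PC projection and a linear risk model into a single set of parameters works because both steps are linear. The sketch below demonstrates the algebra on random data, assuming NumPy; the per-PC coefficients `beta_pc`, the data and all dimensions are made up, and the exact procedure used to derive LinAge is described in the Supplementary Information rather than here.

```python
import numpy as np

rng = np.random.default_rng(2)
train = rng.normal(loc=10.0, scale=3.0, size=(400, 12))  # hypothetical training data
mu = train.mean(axis=0)
sigma = train.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd((train - mu) / sigma, full_matrices=False)

V_k = Vt[:4].T                             # 12 features x 4 retained PCs
beta_pc = np.array([0.8, -0.3, 0.5, 0.1])  # hypothetical per-PC risk coefficients

def risk_two_step(x):
    """Z-score, project into PC space, then apply the linear risk model."""
    return ((x - mu) / sigma) @ V_k @ beta_pc

# Collapsed form: one weight per original feature plus an intercept
w = (V_k @ beta_pc) / sigma
b0 = -mu @ w

def risk_collapsed(x):
    """Same linear risk score computed directly from raw feature values."""
    return x @ w + b0

new_subjects = rng.normal(loc=10.0, scale=3.0, size=(5, 12))
```

After collapsing, a new subject's score needs only the raw feature values, the per-feature weights and the intercept, with no explicit PCA step at prediction time.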

Caloric-restricted subjects have lower aging rates

One important application for aging clocks is to evaluate the impact of intervention strategies on BA. The CALERIE phase 2 randomized controlled trial was designed to test the effects of moderate (25%) calorie restriction (CR). A cohort of 220 healthy non-obese volunteers between the ages of 20 years and 50 years were randomly assigned to either CR or ad libitum (AL) control groups and followed over 2 years⁵⁵. Although, in practice, subjects from the CR group achieved only relatively moderate CR (12%), this nevertheless resulted in a significant reduction in several known CVD risk factors⁵⁶, with reduction in BA estimates based on three different algorithms (Klemera–Doubal BA, PhenoAge and homeostatic dysregulation)^28,57. However, a post hoc analysis using DNAm clocks found that only one DNAm clock (DunedinPACE) was able to identify significant effects, with no changes reported by several other DNAm clocks, including PhenoAge and GrimAge⁵⁸. Unfortunately, the parameters reported for CALERIE do not overlap sufficiently with LinAge to apply it directly. However, the PCA approach can be used to re-train custom clocks based on different subsets of the feature space.

To test this approach, we determined the set of parameters reported for both NHANES IV and CALERIE (Supplementary Table 1) and then applied the same procedure outlined for PCAge/LinAge (see Code and Supplementary Files in the Supplementary Information) to train and validate a mortality clock in NHANES IV (Fig. 7a). We next confirmed that the resulting ‘CALinAge’ custom clock could predict mortality differences in the NHANES IV test cohort within the age range relevant for CALERIE (Fig. 7b,c). CALinAge (AUC = 0.8282) has a statistically significantly larger AUC than CA (AUC = 0.7910, P < 0.001) when predicting future survival in the NHANES IV test cohort. We then applied CALinAge to CR and AL subjects, comparing the change in BA from baseline between CR and AL (Fig. 7d), adopting an approach similar to that reported previously^57,58. Using this approach, we calculated a CALinAge aging rate of 1.54 years per calendar year over the course of two calendar years for the AL group (95% confidence interval: 0.46–2.61, n = 55) and a significantly (P = 0.0022, comparison of linear models by two-way ANOVA) lower aging rate of 0.11 years per calendar year for the CR group (95% confidence interval: −0.58 to 0.79, n = 97) (Fig. 7d). Given the high inter-individual variability, the 95% confidence interval of the aging rate for CR subjects includes zero, indicating that the average aging rate in this group was not significantly different from zero. These data suggest that CR, under the conditions realized in CALERIE, was able to significantly reduce biological aging as evaluated by CALinAge.

**Fig. 7: Caloric-restricted subjects have significantly lower aging rates.**

Discussion

In this study, we constructed CCs using dimensionality reduction by PCA to generate BA estimates from clinical parameters. We developed and validated a CC (PCAge) that estimates BA using linear dimensionality reduction in a large clinical feature space, followed by Cox proportional hazards regression against mortality. Based on data from a single survey timepoint, often decades before death, PCAge showed significant predictive efficacy over 20 years and across a wide range of ages, illustrating the power of CCs in characterizing individual future aging trajectories, well before the onset of any specific pathology.

An advantage of CCs is that they are constructed from parameters that can be linked directly to the pathophysiology of specific diseases, making it easier to translate CC residuals into useful insights. Although PCA can facilitate both extraction and interpretation of feature space trajectories associated with organismal aging^{5,42,43,44,59}, working in PCA/SVD coordinates comes with a tradeoff in terms of abstraction. This is because PCA coordinates are linear combinations of the original features and can be more difficult to interpret^41,60. Nevertheless, individual aging PCs capture sets of features that exhibit correlated change during aging. By mapping to specific themes or aging processes captured by the data, analysis of individual PCs may, therefore, aid interpretation of feature space trajectories. Care must be taken when interpreting PCs in this way because they can be sensitive to outliers and thresholding effects and are generally not efficient in isolating distinct pathways. This tension between interpretability and efficacy is common to many machine learning techniques, and there are approaches to overcome it^25,61.

Here we show that analysis of aging PCs can aid interpretation, leading to the identification of mechanisms of age-dependent failure and of potential interventions against them. By clustering subjects based on their location in the lower-dimensional space spanned by those PCs with significant weights in PCAge, we found that cluster membership was systematically associated with how successfully subjects aged. A ‘healthy aging’ cluster comprised subjects who were biologically younger and aged more successfully. Parameter values defining subjects from this cluster can be interpreted as normative values, defining features of healthy physiology at all ages. Overall distance from this cluster was associated with less successful aging. We found that, although there are different ways to age unsuccessfully, these are all associated with moving away from the main ‘healthy aging’ cluster along different directions in PC space. This suggests that healthy agers are all alike, but unhealthy subjects are unhealthy in their own way.

Although our analysis demonstrates that PCA is a useful technique for dimensionality reduction and for the identification of patterns in high-dimensional aging datasets, it is, of course, far from the only such approach. For example, nonlinear generalizations of PCA, such as variational autoencoders⁴³, can be used to identify more complex patterns from high-dimensional data and may provide superior models, although this increased power comes at the cost of making interpretation more challenging.

Unfortunately, cross-sectional datasets, such as NHANES, do not allow comparison of subjects before and after interventions. This limits our ability to interpret the causality of the observation that ACE-Is/ARBs impact aging parameters. Furthermore, PCAge and LinAge include several parameters that are risk markers for age-dependent pathologies. Specifically, both include all the parameters found in the ASCVD score. However, BA clocks and clinical risk markers differ, both in goal and in approach. Clinical risk scores, by design, are hypothesis driven and organ/disease specific and aim to predict and detect specific pathologies or proximity to specific disease attractors. By contrast, CCs are data driven and disease agnostic, aiming to extract predictors of all-cause mortality from a collection of biological parameters, essentially quantifying the degree of an individual’s deviation from an optimal healthy aging trajectory. PCAge and LinAge are sensitive to a more complete set of mortality causes, and, unsurprisingly, they generally outperform the ASCVD score in predicting overall future mortality. This advantage is most obvious for individuals who are aging unusually well (whose BA is lower than their CA), because low cardiovascular risk alone does not guarantee healthy aging, but healthy aging is incompatible with substantially elevated cardiovascular risk.

The geroscience approach aims to practice preventative medicine by understanding and intervening in fundamental processes of aging or modulating the biological process(es) that drive the shift from healthy functioning toward systemic aging and the eventual manifestations of age-related disease(s). Geroscience shares many of the goals of traditional preventative medicine but seeks to push the boundaries of prevention to earlier ages, long before disease or overt abnormalities are detectable. Interpretable second-generation CCs can aid this goal. By analyzing factors related to elevated BA, we may be able to identify mechanisms separating successful from less successful aging. For example, subjects in the ‘healthy aging’ cluster had significantly lower values along PC4. Analysis of parameter weights for PC4 revealed that it encoded information on disease pathways relating to systemic inflammation and immunity, impaired cardiac and renal function, glucose regulation, iron storage and erythropoiesis. This observation is consistent with the importance of organism-level aging patterns and the role of chronic sterile inflammation as a key driver of age-dependent decline, morbidity and mortality²⁵.

When interpreting CALERIE outcomes with reference to a clock trained in NHANES IV, we faced a common issue: two datasets that have some, but not all, features in common. To address this challenge, we used dimensionality reduction by PCA to train a custom clock based only on features present in both datasets (see Code and Supplementary Files in the Supplementary Information for an example dataset and a more detailed explanation of how to generate custom PC clocks). As our analysis of CALERIE suggests, CCs can aid rapid intervention testing; here, they show that mild CR significantly reduces biological aging.

Finally, aging clocks are not replacements for disease-specific risk markers or differential diagnosis. They differentiate subjects who are aging well from those who are aging poorly, helping us to define the former and pointing to interventions to help the latter. Early and proactive modification of known risk factors, using primary disease prevention approaches as well as existing pharmacological interventions, can play an important role in maintaining subjects on optimal aging trajectories, delaying manifestations of aging, including age-related disease, and, in turn, extending healthy lifespan. A key goal of geroscience is to intervene proactively at a time when interventions are most efficacious, years or decades before any overt pathology is present. In this sense, BA clocks are to geroscience what clinical risk scores are to traditional primary prevention. Mature BA clocks will allow healthcare providers and governments to navigate the complexities of the risk–benefit analysis essential for adding years to healthy lifespan.

Methods

NHANES IV study design and participants

The continuous NHANES IV is an ongoing cohort study, by the National Center for Health Statistics, designed to assess the health and nutritional status of a nationally representative population of adults in the United States³⁵. The study involves a series of cross-sectional surveys that include demographic, socioeconomic and dietary information; responses to health-related questions; medical and physiological measurements; and results of laboratory tests. NHANES IV is approved by the National Center for Health Statistics Research Ethics Review Board. All study participants are de-identified, and data from NHANES IV are publicly available³⁵. In the present study, we included adults aged 40–84 years, recruited for the 1999–2000 and 2001–2002 cohorts. Linked mortality data were obtained from the National Death Index⁶² and are available from 1 January 1999 until 31 December 2019.

The entire NHANES 1999–2002 dataset was initially composed of 5,700 participants and 186 clinical parameters, which included data from health-related questions and physiological and laboratory measurements. Using the health-related questions, we generated three derived indices (a comorbidity index, a self-health index and a healthcare use index) as follows.

The comorbidity index included data on 22 comorbidities (hypertension, diabetes mellitus, renal impairment, asthma, anemia, arthritis, coronary heart disease, angina, previous myocardial infarction, previous stroke, emphysema, thyroid disease, obesity, chronic bronchitis, liver disease, malignancy, osteoporosis, previous hip fracture, previous wrist fracture, previous spine fracture, cognitive impairment and overnight hospitalization). The index was calculated as the number of comorbidities reported divided by the maximum number possible (22). The self-health index was calculated based on two questions reporting on a subject’s general health condition and on their current health compared to 1 year ago. Options for the general health question were: ‘good general health’ (or better), ‘fair general health’ or ‘poor general health’. Options for the question regarding current health compared to 1 year ago were: ‘better current health’, ‘about the same’ or ‘worse current health’. Affirmative answers were scored as 1, and negative answers were scored as 0. An aggregate index was generated according to the following formula: self-health index = ((‘fair general health’ × 2) + (‘poor general health’ × 4)) × (1 − (‘better current health’ × 0.5) + (‘worse current health’)). Therefore, subjects who became more ill would receive twice the penalty for current health status, whereas subjects who reported recovery would receive a modifier of 0.5. The healthcare use index is the number of times a subject received healthcare over the past year as coded by the NHANES variable ‘HUQ050’³⁵. These three indices were included directly as clinical parameters without normalization.
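The two questionnaire-derived indices can be written out as a short sketch. The Python below is illustrative only (the authors' own code is in R); the function and argument names are assumptions, and the arithmetic simply follows the formula quoted in the text.

```python
def comorbidity_index(reported, n_possible=22):
    """Fraction of the listed comorbidities that a subject reports.

    `reported` is a sequence of 0/1 flags, one per comorbidity.
    """
    if len(reported) != n_possible:
        raise ValueError("expected one flag per comorbidity")
    return sum(reported) / n_possible


def self_health_index(fair, poor, better, worse):
    """Self-health index from 0/1 answers, following the formula in the text.

    fair, poor    : general health reported as fair / poor
    better, worse : current health compared to 1 year ago
    """
    severity = fair * 2 + poor * 4
    # Modifier is 1 for 'about the same', 0.5 for 'better' and 2 for 'worse'.
    modifier = 1 - better * 0.5 + worse
    return severity * modifier
```

Under this reading, a subject in poor general health whose health has worsened scores 4 × 2 = 8, whereas a subject in fair health who reports improvement scores 2 × 0.5 = 1.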

After excluding any parameters with more than 10% missing observations and all subjects with incomplete records, the resulting dataset was reduced to 3,811 participants and 165 clinical parameters. Our final training cohort, composed of the NHANES IV 1999–2000 study participants, included 923 males and 852 females, and our testing cohort, composed of the NHANES IV 2001–2002 study participants, included 1,094 males and 942 females (see Supplementary Table 2 for baseline characteristics).

It was shown previously that data are, in some cases, not missing completely at random. For example, older subjects may be more likely to have missing data in some variables⁴⁸. This is a potential concern, as removal of subjects with missing values may affect some subsets of the cohort more than others. We, therefore, opted to remove features with more than 10% missing values. Because of its biological importance, NT-proBNP was included despite being missing in more than 10% of subjects (14.4%).
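The two-step exclusion described above (first drop sparse parameters, then drop incomplete subjects) can be sketched as follows. This is an assumption-laden illustration rather than the authors' pipeline: records are represented as dictionaries with `None` for missing values, and `always_keep` is a hypothetical argument standing in for the NT-proBNP exception.

```python
def filter_missing(rows, max_missing_frac=0.10, always_keep=()):
    """Drop parameters with too many missing values, then incomplete subjects.

    rows        : list of dicts mapping parameter name -> value or None
    always_keep : parameters retained regardless of missingness
    Returns (kept_parameters, complete_rows).
    """
    n = len(rows)
    kept = []
    for name in rows[0]:
        missing_frac = sum(r[name] is None for r in rows) / n
        if name in always_keep or missing_frac <= max_missing_frac:
            kept.append(name)
    # Keep only subjects with a value for every retained parameter.
    complete = [
        {name: r[name] for name in kept}
        for r in rows
        if all(r[name] is not None for name in kept)
    ]
    return kept, complete


# Toy data: 'b' is missing for 10% of subjects (kept), 'c' for 20% (dropped).
toy = [
    {"a": i, "b": None if i == 0 else i, "c": None if i < 2 else i}
    for i in range(10)
]
kept_params, complete_rows = filter_missing(toy)
```

Note that a parameter forced in through `always_keep` still costs subjects in the second step, which is the trade-off the text describes for NT-proBNP.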

BA: definition used and determination from hazard ratio

We know that individual BA typically differs from CA². Some approaches define the BA of an individual as the age at which that individual’s position in feature space would be approximately normal for a reference cohort^27,28,29. A different approach uses mortality risk as the dependent variable, directly building models to predict future mortality from biological parameters^{17,18,31,32,33}. In this scheme, the true BA of an individual is defined as the ‘Gompertz age’ of the reference cohort at which subjects of the reference cohort have the same all-cause mortality risk as the individual in question^29,63. This definition of BA addresses the problem that different clocks (for example, based on different feature spaces or using different mathematical/machine learning methods) often do not agree with each other, sometimes producing vastly different BA estimates for the same individual. This might occur because the appearance of an individual may be different when viewed through the lens of different feature spaces and compared to different training cohorts. However, clocks trained to predict ‘Gompertz’ BA can be objectively compared by testing them directly against the ground truth of historically observed all-cause mortality. Here, we adopt ‘Gompertz’ BA, following the approach by Levine et al.¹⁸.

We first generated two Cox proportional hazards models for the training cohort⁶⁴. The NULL models for males and females were fitted to predict the mortality hazard (h₀) of dying over the follow-up period, based on CA and sex alone. This NULL model also yields the sex-specific mortality rate doubling time (MRDT_sex) for the training cohort. A second Cox model was then constructed by taking into consideration the covariates (PCs) for each subject of the training cohort. The PCs to be included were selected based on the percentage of variance explained (see below). The final model was then used to predict the hazard of dying as a function of an individual’s position in PC-transformed feature space (h_pc). Finally, the difference in ‘Gompertz age’ that results in an equivalent relative hazard ratio h_pc/h₀ is calculated, thereby converting the hazard ratio into a ‘Gompertz age’ correction (Δage) as follows:

$$\Delta {\rm{age}}=\frac{\mathrm{ln}\left(\frac{{h}_{{pc}}}{{h}_{0}}\right)}{\mathrm{ln}(2)}\bullet {MRD}{T}_{{sex}}$$

The final BA was then calculated by adding this age correction to the subject actual CA:

${\rm{BA}}={\rm{CA}}+\Delta {\rm{age}}$
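As a worked illustration of the two equations above, the following sketch converts a hazard ratio into an age correction. The function names and the numerical values (including the 8-year doubling time) are assumptions chosen for illustration, not values taken from the fitted models.

```python
import math


def delta_age(h_pc, h_0, mrdt_years):
    """Age correction: number of hazard doublings times the doubling time."""
    return math.log(h_pc / h_0) / math.log(2) * mrdt_years


def biological_age(ca, h_pc, h_0, mrdt_years):
    """BA = CA + delta age."""
    return ca + delta_age(h_pc, h_0, mrdt_years)


# Hypothetical subject: CA of 60 years, twice the hazard predicted by the
# NULL model, and an assumed sex-specific MRDT of 8 years.
example_ba = biological_age(60.0, h_pc=2.0, h_0=1.0, mrdt_years=8.0)
```

With these assumed inputs, a doubling of the hazard corresponds to exactly one MRDT, so the subject is 8 years biologically older than their CA; a hazard ratio of 1 leaves BA equal to CA.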

SVD/PCA: motivation and construction

When building models based on only a small subset of features, errors within these selected features can have a large impact on the resulting model predictions. An alternative approach, therefore, is to employ dimensionality reduction techniques, that is, transformation of data from a high-dimensional feature space into an approximately equivalent, lower-dimensional space. Using this lower-dimensional space as the basis of the model simplifies model building (reduces the number of parameters that need to be fitted), but, because the original features are still included in determining each subject’s position in the reduced feature space, errors in individual parameters have a less pronounced effect on the overall model. Here, we explored the potential of PCA as a linear dimensionality reduction tool. For the construction of PCAge and LinAge, we first normalized the clinical parameters of the training dataset into z-scores, before transforming them into PC coordinates using the SVD function of R version 4.2.0 (https://www.R-project.org/). For the testing and validation cohorts, we used the singular vectors derived for the training cohort to project each subject’s normalized feature space coordinates into the same PC coordinate system of the training set.
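The key point of the projection step is that both the normalization constants and the singular vectors come from the training cohort only. The sketch below illustrates this with a toy two-feature dataset; it assumes the singular vectors have already been computed elsewhere (for example, with an SVD routine), and all names and numbers are hypothetical.

```python
import math
import statistics


def fit_scaler(train_rows):
    """Per-feature mean and sample SD, estimated on the training cohort only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return means, sds


def to_z(row, means, sds):
    """z-score one subject using the training means and SDs."""
    return [(x - m) / s for x, m, s in zip(row, means, sds)]


def project(z_row, train_vectors):
    """Dot product of a z-scored subject with each training singular vector."""
    return [sum(z * v for z, v in zip(z_row, vec)) for vec in train_vectors]


train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
means, sds = fit_scaler(train)

r = 1 / math.sqrt(2)
train_vectors = [[r, r], [r, -r]]  # assumed output of an SVD on the z-scored training data

new_subject = [3.0, 10.0]  # a subject from a test or validation cohort
pc_coords = project(to_z(new_subject, means, sds), train_vectors)
```

Here the new subject is never used to estimate means, SDs or singular vectors, so training and test subjects share one coordinate system.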

PCAge and LinAge development and validation

For PCAge, the first 18 PCs of the training data, which accounted for 99% of the overall variability in that dataset, together with each subject’s CA, were selected as covariates in the Cox proportional hazards regression model, trained against data from 20-year mortality follow-up. The hazard ratio for each individual was then converted into an age correction as outlined above. Separate BA clocks were trained for males and females, exclusively using the data for the NHANES IV 1999–2000 recruitment wave and tested against the NHANES IV 2001–2002 wave.

For the construction of LinAge, we selected 61 parameters that are routinely measured clinically or can be extracted from clinical records. The relevant parameters are listed in Supplementary Tables 1 and 7. With the exception of CA, basophil number, smoking status and morbidity indices, each parameter was normalized with reference to its median value obtained from the ‘healthy aging’ clusters of PCAge (Fig. 3a,b). Parameters were normalized separately for males and females by subtraction of the median and division by the median absolute deviation (MAD) in the ‘healthy aging’ clusters for males or females, respectively (Supplementary Table 7). Smoking status was determined by binning the serum cotinine levels according to known cutoffs demonstrated in previous studies to correspond with qualitative smoking status⁶⁵: 0 to <10 ng ml^−1 (non-smokers = 0), 10 to <100 ng ml^−1 (light smokers = 1), 100 to <200 ng ml^−1 (moderate smokers = 2) and ≥200 ng ml^−1 (heavy smokers = 3). This approach is subject to less recall bias when compared to using questionnaire data on smoking status. However, in cases where data on cotinine are not available, this score can also be populated directly from data obtained by questionnaire without any change to LinAge. As for PCAge, feature space coordinates were transformed into PC space, and models were optimized separately for males and females based on the mortality follow-up for the training cohort.
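The cotinine binning and the median/MAD normalization can be sketched as below. This is a minimal illustration with hypothetical function names; in particular, it uses the raw (unscaled) MAD, which is an assumption, because the text does not state whether a consistency constant was applied.

```python
import statistics


def smoking_score(cotinine_ng_ml):
    """Bin serum cotinine (ng/ml) into the 0-3 smoking score from the text."""
    if cotinine_ng_ml < 0:
        raise ValueError("cotinine concentration cannot be negative")
    if cotinine_ng_ml < 10:
        return 0  # non-smoker
    if cotinine_ng_ml < 100:
        return 1  # light smoker
    if cotinine_ng_ml < 200:
        return 2  # moderate smoker
    return 3      # heavy smoker


def median_and_mad(reference_values):
    """Median and raw median absolute deviation of a reference group."""
    med = statistics.median(reference_values)
    mad = statistics.median([abs(v - med) for v in reference_values])
    return med, mad


def robust_normalize(value, ref_median, ref_mad):
    """Distance from the reference median in units of the reference MAD."""
    return (value - ref_median) / ref_mad


# Hypothetical 'healthy aging' reference values for one parameter and one sex.
ref_median, ref_mad = median_and_mad([1.0, 2.0, 3.0, 4.0, 100.0])
z_example = robust_normalize(5.0, ref_median, ref_mad)
```

Using the median and MAD rather than the mean and SD keeps the reference values stable in the presence of outliers such as the value 100 in the toy data.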

For LinAge, PCs were selected for inclusion in the final model using regularized Cox regression with ‘glmnet’ version 4.1-7 (refs. ^66,67) and the ‘survival’ package version 3.5-8 (ref. ⁶⁸) in R, with 10-fold cross-validation and alpha values of 1, 0.75 and 0.5 (Supplementary Fig. 9). PCs bearing non-zero weights, which were identified in a minimum of five instances out of the 100 iterations and consistently selected across all models, were selected for inclusion in the final Cox model. Individual proportional hazard ratios were transformed into age deltas and added to CA for each subject as described above. For LinAge, we also extracted a version of the clock parametrized in the original (non-PC) feature space by multiplying the SVD-derived coordinate transformation matrix with the weight matrix in PC space to obtain discrete parameter weights for each of the 61 parameters included in LinAge. These individual, parameter-level weights for the clinical parameters can be found in Supplementary Table 7 and enable LinAge to be calculated directly from parameter values using only a spreadsheet.
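Because both the projection and the risk model are linear, they can be collapsed into a single weight per feature. The sketch below illustrates this with a toy 2 × 2 example; the loading matrix, PC weights and function names are hypothetical, and PCs that were not selected would simply carry a weight of zero.

```python
def collapse_weights(loadings, pc_weights):
    """Per-feature weights: loading matrix times the vector of PC weights.

    loadings[j][k] : loading of feature j on PC k
    pc_weights[k]  : model weight of PC k (0 for PCs that were not selected)
    """
    return [sum(l * w for l, w in zip(row, pc_weights)) for row in loadings]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


loadings = [[0.6, 0.8],
            [0.8, -0.6]]
pc_weights = [0.5, -0.25]
z = [1.0, 2.0]  # one subject's normalized features

# Route 1: project into PC space, then apply the PC-level weights.
pcs = [dot(z, [row[k] for row in loadings]) for k in range(len(pc_weights))]
score_via_pcs = dot(pcs, pc_weights)

# Route 2: apply the collapsed feature-level weights directly.
feature_weights = collapse_weights(loadings, pc_weights)
score_direct = dot(z, feature_weights)
```

Both routes give the same score, which is why a table of feature-level weights is sufficient to evaluate the clock. The conversion from log hazard ratio to years is a further linear scaling and can be absorbed into the same weights.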

Equation for BA:

$${\rm{LinAge}}(\bar{X}\,)={\beta }_{{CA}}\bullet {CA}+\mathop{\sum }\limits_{i=1}^{n}{\beta }_{i}\bullet {X}_{i}^{\,z}+{C}_{0}$$

where:

$\bar{X}=\left\{{X}_{i}\right\}$, vector of n = 61 parameters used for LinAge (for given subject)

${\bar{X}}^{\,z}=\left\{{X}_{i}^{\,z}\right\}$, ${X}_{i}^{\,z}=\frac{{X}_{i}-{{median}}_{i}}{{{MAD}}_{i}}$, normalized parameters (for given subject)

median_i = sex-specific median value for i-th parameter over ‘healthy aging’ cluster (Supplementary Table 7)

MAD_i = sex-specific MAD value for i-th parameter over ‘healthy aging’ cluster (Supplementary Table 7).
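The LinAge equation amounts to a weighted sum that could equally be evaluated in a spreadsheet. The following sketch shows the arithmetic with two hypothetical parameters; every number and name below is a placeholder, and the actual sex-specific weights, medians and MADs are those listed in Supplementary Table 7.

```python
def linage(ca, values, medians, mads, betas, beta_ca, c0):
    """Evaluate beta_CA * CA + sum_i beta_i * z_i + C_0 for one subject.

    values, medians, mads and betas are dicts keyed by parameter name.
    """
    total = beta_ca * ca + c0
    for name, beta in betas.items():
        z = (values[name] - medians[name]) / mads[name]
        total += beta * z
    return total


# Placeholder inputs, for illustration only.
example = linage(
    ca=60.0,
    values={"param_a": 5.0, "param_b": 10.0},
    medians={"param_a": 4.0, "param_b": 8.0},
    mads={"param_a": 0.5, "param_b": 2.0},
    betas={"param_a": 2.0, "param_b": -1.0},
    beta_ca=1.0,
    c0=0.5,
)
```

In this toy case, the normalized values are 2 and 1, so the result is 60 + 0.5 + 4 − 1 = 63.5 years.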

Further LinAge validation

Validation in independent NHANES III cohort

The NHANES III study ran from 1988 to 1994 to assess the health and nutritional status of the United States’ civilian, non-institutionalized population⁶⁹. Cross-sectional survey data included demographic, socioeconomic and dietary information; responses to health-related questions; medical and physiological measurements; and results of clinical laboratory tests. NHANES III was approved by the National Center for Health Statistics Research Ethics Review Board. Anonymized data from NHANES III were obtained from a publicly available source⁶⁹. Linked mortality data were obtained from the National Death Index⁶² and are available from 18 October 1988 until 31 December 2019. Our final NHANES III external validation cohort comprised 715 males and 819 females aged 40–89 years for whom most LinAge parameters were available.

Of all LinAge parameters, only one (NT-proBNP) was not recorded as part of NHANES III and was, therefore, missing from the entire NHANES III dataset. We addressed this issue by setting the weights associated with NT-proBNP to zero in the LinAge model. This approach is conservative with respect to evaluating LinAge in a different cohort, as LinAge would perform worse in NHANES IV if the information encoded in NT-proBNP was not used. We next compared individual parameters between NHANES III and NHANES IV to identify major batch effects. We deliberately did not carry out a formal batch or range correction procedure, as this would not generally be feasible, for example, when applying LinAge to a new patient cohort for which exact parameter ranges may not be known. However, we identified technical issues with the CRP and bicarbonate parameters. For technical reasons, CRP values in NHANES III had a lower detection limit of 0.21, whereas NHANES IV had a lower detection limit of 0.01. This means that the lowest possible value in NHANES III would be considered above the detection limit and within the modeled range for NHANES IV. We addressed this issue by replacing CRP values for this group of subjects with the median log(CRP) value from NHANES IV, thereby ensuring that, for this subset of subjects, CRP contributed a zero value to the LinAge Delta age. Finally, the distribution of bicarbonate values in NHANES III was systematically and substantially higher than in NHANES IV (Supplementary Fig. 10a). Although some other parameters also showed systematic differences between NHANES III and NHANES IV, we chose to correct batch effects for bicarbonate alone, re-centering the NHANES III bicarbonate distribution on NHANES IV by multiplying the NHANES III bicarbonate values by the ratio between the mean bicarbonate values of the ‘healthy aging’ clusters in the NHANES IV 1999–2002 cohorts and NHANES III (Supplementary Fig. 10b). No other corrections were applied.
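The bicarbonate re-centering is a single multiplicative correction. The sketch below illustrates it with made-up values; the function name and all numbers are assumptions, and the only step taken from the text is the scaling by the ratio of ‘healthy aging’ cluster means.

```python
import statistics


def recentre_on_reference(values, own_healthy, reference_healthy):
    """Scale values by the ratio of reference to own 'healthy aging' means."""
    ratio = statistics.fmean(reference_healthy) / statistics.fmean(own_healthy)
    return [v * ratio for v in values]


# Made-up bicarbonate values for the 'healthy aging' clusters of each study.
nhanes3_healthy = [26.0, 28.0]  # mean 27, systematically higher
nhanes4_healthy = [23.0, 25.0]  # mean 24

nhanes3_values = [27.0, 30.0]
corrected = recentre_on_reference(nhanes3_values, nhanes3_healthy, nhanes4_healthy)
```

After correction, a subject who sat at the NHANES III healthy mean sits at the NHANES IV healthy mean, while relative differences between subjects are preserved.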

Evaluation of the CALERIE intervention trial

Details of the CALERIE phase 2 multi-center randomized controlled trial were reported and can be found in the original publications^55,70,71. The CALERIE trial is registered on ClinicalTrials.gov as NCT00427193. CALERIE received ethics approval at three clinical centers (Washington University School of Medicine, Pennington Biomedical Research Center and Tufts University) and at the coordinating center at Duke University. Data from CALERIE are publicly available (https://calerie.duke.edu), and all study participants are de-identified.

When contemplating applying LinAge to CALERIE, we discovered that several of the clinical parameters used in LinAge were not present in the CALERIE dataset. We applied our pipeline to re-train a version of LinAge based on a subset of the feature space for which parameters were available within both NHANES IV and CALERIE. This clock included only features found in both the NHANES IV 1999–2002 waves and the CALERIE trial (Supplementary Table 1). The resulting CALinAge clock was trained in subjects aged 20–70 years in the NHANES IV 1999–2000 cohort (n = 1,516), before being validated in subjects aged 20–70 years in the separate NHANES IV 2001–2002 cohort (n = 1,683) and finally applied to the CALERIE trial subjects (n = 159). Our final CALERIE external validation cohort comprised 101 CR and 58 AL subjects.

Clustering analyses

We performed k-means clustering, using the ‘cluster’⁷² (version 2.1.4) and ‘factoextra’⁷³ (version 1.0.7) R packages, by Euclidean distance in PC coordinates for 2,017 male and 1,794 female participants from the entire NHANES IV 1999–2002 dataset. We selected PC numbers 2, 3, 4, 7, 10, 11, 13, 17 and 18 for clustering, based on their significant weights in the PCAge model (Supplementary Fig. 3). We generated five distinct clusters each for males and females. The optimal number of clusters was determined to minimize the degree of overlap in information/themes between clusters. A larger number of clusters resulted in additional separation along the cardio-metabolic axis, whereas selecting fewer clusters resulted in merger along the same axis. Clusters were visualized using the ‘fviz_cluster’ function of ‘factoextra’. This method projects subjects into a two-dimensional (2D) plane, which forms the x–y plane of the cluster diagrams (Fig. 3a,b). The third (z) axis of the three-dimensional (3D) cluster diagram is time, in this case the age of each subject at the time of final mortality status update (31 December 2019). If this update is a record of death (subject deceased), a small marker (sphere) is drawn. If the final record is a record of survival to the end of the follow-up period, then the age at that timepoint is used as the z coordinate, and a large marker is drawn.

PC interpretation and partial correlation network analysis

Partial correlation network analysis of PC4 was performed by selecting the top 10% of clinical measures by absolute magnitude of weights within PC4. For these parameters, we generated partial correlations using the ‘ppcor’⁷⁴ (version 1.1) R package. Edges below a hard threshold of 0.1 were set to zero, and the remaining edges were used as edge weights to construct a network using the ‘igraph’^75,76 (version 2.0.3) R package. Edges indicating positive correlation are colored in blue, and negative edges are colored in red. Clinical parameters were categorized by body composition, physiological functions and physiological responses, based on domain knowledge to aid interpretation of the partial correlation network.

PhenoAge, ASCVD and CFS scores

PhenoAge¹⁸ and the ASCVD score³⁴ were constructed, and functions to calculate them from the data matrix were implemented, based on the equations provided in the original publications. CFS scores were determined as previously reported⁵⁴.

Dealing with missing values

Missing data are a common challenge with clinical datasets. When variables are missing from all or a large fraction of subjects, it may be necessary to remove that variable and train a new (‘custom’) clock. An example workflow for creating a custom PCA clock is illustrated in Code and Supplementary Files (Supplementary Information). A similar toolkit for clocks based on individual parameters is already available as an R package²⁸.

If features are missing randomly for only a small fraction of subjects, missing values in continuous datasets can be imputed, for example, using mean imputation, random forest imputation or PCA-based imputation algorithms. PCA-based imputation takes advantage of the fact that the PC coordinates are linearly uncorrelated. For PCA-based imputation, the original data are projected into the PC coordinate system, initially using randomly imputed values for the missing data. The missing values are then deduced by reverse projecting the reduced data back onto the original space. This cycle is repeated, iteratively refining the missing values. We explored PCA imputation using the iterative PCA algorithm implemented in the ‘missMDA’⁷⁷ (version 1.18) R package with 10,000 iterations. However, when imputing data, care needs to be taken to ensure that variables are missing completely at random, with the probability of being missing not related to parameter values. For our final analysis (both PCAge and LinAge), we elected not to impute missing values and, instead, removed subjects with missing values from the analysis completely.

For LinAge itself, a zero z-score can be substituted for missing values after normalization, thereby setting the impact of the missing value on LinAge to zero. The magnitude of error introduced by this substitution (in years) is of the order of the individual weight (Supplementary Table 7) for the parameter in question.
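The zero-substitution rule can be illustrated with a small sketch. The names and numbers are hypothetical; the point is only that a missing parameter contributes nothing to the age estimate, so the error equals the contribution the true value would have made.

```python
def z_or_zero(value, median, mad):
    """Normalized value, or 0.0 when the measurement is missing (None)."""
    if value is None:
        return 0.0
    return (value - median) / mad


def contribution_years(value, median, mad, weight):
    """Contribution of one parameter to the age estimate, in years."""
    return weight * z_or_zero(value, median, mad)


# Hypothetical parameter with a weight of 1.5 years per MAD.
with_value = contribution_years(5.0, median=4.0, mad=0.5, weight=1.5)
when_missing = contribution_years(None, median=4.0, mad=0.5, weight=1.5)
```

For a subject whose true value lies about one MAD from the reference median, the error introduced by the substitution is about one weight, consistent with the order-of-magnitude statement above.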

Statistics and reproducibility

For the NHANES IV cohort, we excluded (1) participants top-coded at age 85 years, as we could not ascertain the exact CAs of these adults; (2) participants who died from accidental deaths, as these were deemed to be not age related; and (3) physiological and laboratory measurements with significant missing data, defined as more than 10% of the training dataset. For the NHANES III cohort, we excluded (1) participants top-coded at age 90 years, as we could not ascertain the exact CAs of these adults; (2) participants who died from accidental deaths, as these were deemed to be not age related; and (3) subjects for whom laboratory measurements needed to calculate LinAge were missing. For the CALERIE cohort, participants with missing data for whom CALinAge could not be calculated were excluded. For all three cohorts, no statistical method was used to pre-determine sample size.

Correlation analyses were performed using linear regression, and the strength of correlation was determined using PCC. Two-sided t-tests were used to compare the delta telomere lengths, delta digit symbol substitution test scores and delta gait speeds between groups and to compare the PCAge Deltas of centenarians to non-centenarians. Two-way ANOVA was used to compare aging rates between the AL and CR groups in the CALERIE trial. Survival analyses were performed using log-rank tests. Kruskal–Wallis tests were performed on continuous variables during cluster characterization. Post-test pairwise comparisons using Wilcoxon rank-sum tests with continuity correction were performed between clusters. Hypergeometric probability distributions were used to compare categorical variables during cluster characterization. Kruskal–Wallis tests were used to compare clinical parameters between multiple groups involving healthy subjects, untreated subjects with high urine ACR and ACE-I/ARB-treated subjects. Post hoc analyses were performed using Dunn’s test. ROC curves were compared using DeLong’s test. All statistical analyses were performed using R version 4.2.0 (https://www.R-project.org/).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All datasets used are publicly available online at https://wwwn.cdc.gov/nchs/nhanes/Default.aspx, https://wwwn.cdc.gov/nchs/nhanes/nhanes3/datafiles.aspx#core and https://calerie.duke.edu. There were no restrictions on data availability. This study was reported according to STROBE guidelines for cohort studies⁷⁸.

Code availability

The code for PCAge and LinAge, as well as code to generate a custom PC clock, is available as zipped R archives. The contents of these archives and the readme files are described in the Supplementary Information.

References

1. Kennedy, B. K. et al. Geroscience: linking aging to chronic disease. Cell 159, 709–713 (2014).
2. Ingram, D. K. Toward the behavioral assessment of biological aging in the laboratory mouse: concepts, terminology, and objectives. Exp. Aging Res. 9, 225–238 (1983).
3. Comfort, A. Test-battery to measure ageing-rate in man. Lancet 2, 1411–1414 (1969).
4. Ferrucci, L. et al. Measuring biological aging in humans: a quest. Aging Cell 19, e13080 (2020).
5. Nakamura, E., Miyao, K. & Ozeki, T. Assessment of biological age by principal component analysis. Mech. Ageing Dev. 46, 1–18 (1988).
6. Drewelies, J. et al. Using blood test parameters to define biological age among older adults: association with morbidity and mortality independent of chronological age validated in two separate birth cohorts. Geroscience 44, 2685–2699 (2022).
7. Park, J., Cho, B., Kwon, H. & Lee, C. Developing a biological age assessment equation using principal component analysis and clinical biomarkers of aging in Korean men. Arch. Gerontol. Geriatr. 49, 7–12 (2009).
8. Pyrkov, T. V. et al. Quantitative characterization of biological age and frailty based on locomotor activity records. Aging (Albany NY) 10, 2973–2990 (2018).
9. Tian, Y. E. et al. Heterogeneous aging across multiple organ systems and prediction of chronic disease and mortality. Nat. Med. 29, 1221–1231 (2023).
10. Nakamura, E. & Miyao, K. A method for identifying biomarkers of aging and constructing an index of biological age in humans. J. Gerontol. A Biol. Sci. Med. Sci. 62, 1096–1105 (2007).
11. Zhong, X. et al. Estimating biological age in the Singapore Longitudinal Aging Study. J. Gerontol. A Biol. Sci. Med. Sci. 75, 1913–1920 (2020).
12. Hastings, W. J., Shalev, I. & Belsky, D. W. Comparability of biological aging measures in the National Health and Nutrition Examination Study, 1999–2002. Psychoneuroendocrinology 106, 171–178 (2019).
13. Horvath, S. DNA methylation age of human tissues and cell types. Genome Biol. 14, R115 (2013).
14. Belsky, D. W. et al. Quantification of the pace of biological aging in humans through a blood test, the DunedinPoAm DNA methylation algorithm. eLife 9, e54870 (2020).
15. Belsky, D. W. et al. Quantification of biological aging in young adults. Proc. Natl Acad. Sci. USA 112, E4104–E4110 (2015).
16. Hannum, G. et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol. Cell 49, 359–367 (2013).
17. Levine, M. E. Modeling the rate of senescence: can estimated biological age predict mortality more accurately than chronological age? J. Gerontol. A Biol. Sci. Med. Sci. 68, 667–674 (2013).
18. Levine, M. E. et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging (Albany NY) 10, 573–591 (2018).
19. Lu, A. T. et al. DNA methylation GrimAge version 2. Aging (Albany NY) 14, 9484–9549 (2022).
20. Lu, A. T. et al. DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging (Albany NY) 11, 303–327 (2019).
21. Hwangbo, N. et al. A metabolomic aging clock using human cerebrospinal fluid. J. Gerontol. A Biol. Sci. Med. Sci. 77, 744–754 (2022).
22. Robinson, O. et al. Determinants of accelerated metabolomic and epigenetic aging in a UK cohort. Aging Cell 19, e13149 (2020).
23. Unfried, M. et al. LipidClock: a lipid-based predictor of biological age. Front. Aging 3, 828239 (2022).
24. Nie, C. et al. Distinct biological ages of organs and systems identified from a multi-omics study. Cell Rep. 38, 110459 (2022).
25. Sayed, N. et al. An inflammatory aging clock (iAge) based on deep learning tracks multimorbidity, immunosenescence, frailty and cardiovascular aging. Nat. Aging 1, 598–615 (2021).
26. Kristic, J. et al. Glycans are a novel biomarker of chronological and biological ages. J. Gerontol. A Biol. Sci. Med. Sci. 69, 779–789 (2014).
27. Klemera, P. & Doubal, S. A new approach to the concept and computation of biological age. Mech. Ageing Dev. 127, 240–248 (2006).
28. Kwon, D. & Belsky, D. W. A toolkit for quantification of biological age from blood chemistry and organ function test data: BioAge. Geroscience 43, 2795–2808 (2021).
29. Moqri, M. et al. Biomarkers of aging for the identification and evaluation of longevity interventions. Cell 186, 3758–3775 (2023).
30. Pyrkov, T. V. et al. Extracting biological age from biomedical data via deep learning: too much of a good thing? Sci. Rep. 8, 5210 (2018).
31. Liu, Z. et al. A new aging measure captures morbidity and mortality risk across diverse subpopulations from NHANES IV: a cohort study. PLoS Med. 15, e1002718 (2018).
32. McCrory, C. et al. GrimAge outperforms other epigenetic clocks in the prediction of age-related clinical phenotypes and all-cause mortality. J. Gerontol. A Biol. Sci. Med. Sci. 76, 741–749 (2021).
33. Zhang, Y. et al. DNA methylation signatures in peripheral blood strongly predict all-cause mortality. Nat. Commun. 8, 14617 (2017).
34. Goff, D. C. Jr. et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation 129, S49–S73 (2014).
35. Centers for Disease Control and Prevention (CDC). National Center for Health Statistics (NCHS). Continuous National Health and Nutrition Examination Survey (NHANES). https://wwwn.cdc.gov/nchs/nhanes/
36. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
37. Li, S. et al. Genetic and environmental causes of variation in the difference between biological age based on DNA methylation and chronological age for middle-aged women. Twin Res. Hum. Genet. 18, 720–726 (2015).
38. Sebastiani, P. et al. Biomarker signatures of aging. Aging Cell 16, 329–338 (2017).
39. Ferrucci, L., Hesdorffer, C., Bandinelli, S. & Simonsick, E. Frailty as a nexus between the biology of aging, environmental conditions and clinical geriatrics. Public Health Rev. 32, 475–488 (2010).
40. Jain, K. & Chandrasekaran, B. In Handbook of Statistics, Vol. 2 (eds Krishnaiah, P. R. & Kanal, L. N.) 835–855 (North-Holland Publishing Company, 1982).
41. Strang, G. Introduction to Linear Algebra 5th edn (Cambridge Univ. Press, 2016).
42. Tarkhov, A. E. et al. A universal transcriptomic signature of age reveals the temporal scaling of Caenorhabditis elegans aging trajectories. Sci. Rep. 9, 7368 (2019).
43. Avchaciov, K. et al. Unsupervised learning of aging principles from longitudinal data. Nat. Commun. 13, 6529 (2022).
44. Pyrkov, T. V. & Fedichev, P. O. In Biomarkers of Human Aging (ed Moskalev, A.) 23–36 (Springer, 2019).
45. Higgins-Chen, A. T. et al. A computational solution for bolstering reliability of epigenetic clocks: implications for clinical trials and longitudinal tracking. Nat. Aging 2, 644–661 (2022).
46. Bae, C. Y., Kim, I. H., Kim, B. S., Kim, J. H. & Kim, J. H. Predicting the incidence of age-related diseases based on biological age: the 11-year national health examination data follow-up. Arch. Gerontol. Geriatr. 103, 104788 (2022).
47. Chan, M. S. et al. A biomarker-based biological age in UK Biobank: composition and prediction of mortality and hospital admissions. J. Gerontol. A Biol. Sci. Med. Sci. 76, 1295–1302 (2021).
48. Pridham, G., Rockwood, K. & Rutenberg, A. Efficient representations of binarized health deficit data: the frailty index and beyond. Geroscience 45, 1687–1711 (2023).
49. Doherty, T. et al. A comparison of feature selection methodologies and learning algorithms in the development of a DNA methylation-based telomere length estimator. BMC Bioinformatics 24, 178 (2023).
50. MacQueen, J. B. Some methods for classification and analysis of multivariate observations. Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 281–297 https://digitalassets.lib.berkeley.edu/math/ucb/text/math_s5_v1_article-17.pdf (1967).
51. KDIGO 2012 Clinical Practice Guideline for the Evaluation and Management of Chronic Kidney Disease. Kidney Int. 3, 1–150 (2013).
52. KDIGO 2022 Clinical Practice Guideline for Diabetes Management in Chronic Kidney Disease. Kidney Int. 102, S1–S127 (2022).
53. Heidenreich, P. A. et al. 2022 AHA/ACC/HFSA Guideline for the Management of Heart Failure: a report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. J. Am. Coll. Cardiol. 79, e263–e421 (2022).
54. Rockwood, K. et al. A global clinical measure of fitness and frailty in elderly people. CMAJ 173, 489–495 (2005).
55. Rochon, J. et al. Design and conduct of the CALERIE study: comprehensive assessment of the long-term effects of reducing intake of energy. J. Gerontol. A Biol. Sci. Med. Sci. 66, 97–108 (2011).
56. Kraus, W. E. et al. 2 years of calorie restriction and cardiometabolic risk (CALERIE): exploratory outcomes of a multicentre, phase 2, randomised controlled trial. Lancet Diabetes Endocrinol. 7, 673–683 (2019).
57. Belsky, D. W., Huffman, K. M., Pieper, C. F., Shalev, I. & Kraus, W. E. Change in the rate of biological aging in response to caloric restriction: CALERIE biobank analysis. J. Gerontol. A Biol. Sci. Med. Sci. 73, 4–10 (2017).
58. Waziry, R. et al. Effect of long-term caloric restriction on DNA methylation measures of biological aging in healthy adults from the CALERIE trial. Nat. Aging 3, 248–257 (2023).
59. Hofecker, G., Skalicky, M., Kment, A. & Niedermuller, H. Models of the biological age of the rat. I. A factor model of age parameters. Mech. Ageing Dev. 14, 345–359 (1980).
60. Bafei, S. E. C. & Shen, C. Biomarkers selection and mathematical modeling in biological age estimation. NPJ Aging 9, 13 (2023).
61. Qiu, W., Chen, H., Kaeberlein, M. & Lee, S. I. ExplaiNAble BioLogical Age (ENABL Age): an artificial intelligence framework for interpretable biological age. Lancet Healthy Longev. 4, e711–e723 (2023).
62. Centers for Disease Control and Prevention (CDC). National Center for Health Statistics (NCHS). National Death Index. https://www.cdc.gov/nchs/ndi/index.htm
63. Gompertz, B. On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. https://doi.org/10.1098/rspl.1815.0271 (1825).
64. Cox, D. R. Regression models and life-tables. J. R. Stat. Soc. B 34, 187–220 (1972).
65. O'Connor, R. J. et al. Changes in nicotine intake and cigarette use over time in two nationally representative cross-sectional samples of smokers. Am. J. Epidemiol. 164, 750–759 (2006).
66. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
67. Simon, N., Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for Cox's proportional hazards model via coordinate descent. J. Stat. Softw. 39, 1–13 (2011).
68. Therneau, T. M. A package for survival analysis in R. https://cran.r-project.org/web/packages/survival/vignettes/survival.pdf (2024).
69. Centers for Disease Control and Prevention (CDC). National Center for Health Statistics (NCHS). National Health and Nutrition Examination Survey (NHANES) III. https://wwwn.cdc.gov/nchs/nhanes/nhanes3/default.aspx
70. Rickman, A. D. et al. The CALERIE Study: design and methods of an innovative 25% caloric restriction intervention. Contemp. Clin. Trials 32, 874–881 (2011).
71. Ravussin, E. et al. A 2-year randomized controlled trial of human caloric restriction: feasibility and effects on predictors of health span and longevity. J. Gerontol. A Biol. Sci. Med. Sci. 70, 1097–1104 (2015).
72. Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M. & Hornik, K. cluster: cluster analysis basics and extensions. R package version 2.1.4. (2022).
73. Kassambara, A. & Mundt, F. factoextra: extract and visualize the results of multivariate data analyses. R package version 1.0.7. https://cran.r-project.org/web/packages/factoextra/index.html (2020).
74. Kim, S. ppcor: an R package for a fast calculation to semi-partial correlation coefficients. Commun. Stat. Appl. Methods 22, 665–674 (2015).
75. Csardi, G. & Nepusz, T. The igraph software package for complex network research. InterJournal 1695 (2006).
76. Csardi, G. et al. igraph for R: R interface of the igraph library for graph theory and network analysis. https://doi.org/10.5281/zenodo.8046777 (2023).
77. Josse, J. & Husson, F. missMDA: a package for handling missing values in multivariate data analysis. J. Stat. Softw. 70, 1–31 (2016).
78. Vandenbroucke, J. P. et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. PLoS Med. 4, e297 (2007).


Acknowledgements

We thank the National Health and Nutrition Examination Survey and the Comprehensive Assessment of Long-term Effects of Reducing Intake of Energy (CALERIE) trial participants and staff who made this study possible. We thank C. Chen for her careful reading of the paper. This research was funded by the Ministry of Education in Singapore, grant number IG21-SG103, to N.T., and grants IG21-SG007 and A-0007215-00-00, to J.G. S.F. is supported by the Research Training Fellowship (MOH-001294-00) from the National Medical Research Council Singapore. This work was supported by the Lien Foundation.

Author information

Authors and Affiliations

Department of Geriatric Medicine, Singapore General Hospital, Singapore, Singapore
Sheng Fong
Clinical and Translational Sciences PhD Program, Duke-NUS Medical School, Singapore, Singapore
Sheng Fong
Healthy Longevity Translational Research Program, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Kamil Pabis, Djakim Latumalea, Maximilian Unfried, Brian Kennedy & Jan Gruber
Center for Healthy Longevity, National University Health System, Singapore, Singapore
Kamil Pabis, Djakim Latumalea, Maximilian Unfried, Brian Kennedy & Jan Gruber
Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Kamil Pabis, Djakim Latumalea, Maximilian Unfried, Brian Kennedy & Jan Gruber
Science Division, Yale-NUS College, Singapore, Singapore
Nomuundari Dugersuren, Nicholas Tolwinski & Jan Gruber
Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore, Singapore
Nicholas Tolwinski

Authors

Sheng Fong
Kamil Pabis
Djakim Latumalea
Nomuundari Dugersuren
Maximilian Unfried
Nicholas Tolwinski
Brian Kennedy
Jan Gruber

Contributions

S.F., B.K. and J.G. conceived, conceptualized and designed the study. S.F., K.P., D.L., N.D. and J.G. analyzed and interpreted the data. S.F., K.P., D.L., M.U., N.T., B.K. and J.G. wrote the first draft of the paper. S.F., D.L., N.T. and J.G. wrote the revised versions of the paper.

Corresponding author

Correspondence to Jan Gruber.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Aging thanks Daniel Belsky, Albert Higgins-Chen and Wolfgang Wagner for their contribution to the peer review of this work.

Additional information

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 PCAge also predicts BA in chronologically 45–54 year old males and females.

Kaplan-Meier survival curves over a 20-year follow-up period for 45–54 year old males and females for mean CA (Chronological Age, black), biologically younger males/females in the best 25% quartile for BA per CA category (PCAge Low, cyan), biologically older males/females in the worst 25% quartile for BA per CA category (PCAge High, blue), biologically younger males/females in the best 25% quartile for ASCVD score per CA category (CVD risk Low, orange), and biologically older males/females in the worst 25% quartile for ASCVD score per CA category (CVD risk High, red). a, Compared to mean CA (black), male subjects in the best 25% quartile, with younger PCAges relative to their CAs (PCAge Low, cyan), had a shallower decline in survival (P = 0.002), whereas male subjects in the worst 25% quartile, with older PCAges relative to their CAs (PCAge High, blue), had a steeper decline in survival (P = 0.03). b, Compared to mean CA (black), female subjects in the worst 25% quartile, with older PCAges relative to their CAs (PCAge High, blue), had a steeper decline in survival (P = 0.001), although there was no statistically significant difference between mean CA (black) and female subjects in the best 25% quartile (PCAge Low, cyan) (P = 0.08). For both sexes, there were no statistically significant differences between the ASCVD score and PCAge in the ability to predict survival in the 45–54 age category. Survival analyses were performed using log-rank tests. Areas shaded in color in each panel indicate 95% error bands for lines of the same color.

Extended Data Fig. 2 Kaplan-Meier survival curves by PhenoAge and PCAge categories for males and females.

We compared PhenoAge's and PCAge's ability to stratify the test cohort by either first selecting individuals based on their PhenoAge before predicting survival based on their PCAge (a-c), or by first selecting based on PCAge before predicting survival based on PhenoAge (d-f). a-c, Across all PhenoAge categories, we found that PCAge could further predict survival within the PhenoAge selection, as evidenced by the statistically significantly wider degree of separation in the survival curves between the best 25% and worst 25% quartiles (P = 0.004 for PCAge Low (green) versus PCAge High (orange) and P = 0.1 for PhenoAge Low (purple) versus PhenoAge High (cyan) in the 45–54 PhenoAge category, P < 0.001 for PCAge Low (green) versus PCAge High (orange) and P < 0.001 for PhenoAge Low (purple) versus PhenoAge High (cyan) in the 65–74 PhenoAge category, P < 0.001 for PCAge Low (green) versus PCAge High (orange) and P = 0.05 for PhenoAge Low (purple) versus PhenoAge High (cyan) in the 75–84 PhenoAge category). d-f, However, when we evaluated the performance of PhenoAge in survival prediction in subjects selected according to their PCAge instead, we found significant differences only in the 65–74 (P < 0.001) and 75–84 (P = 0.004) PCAge categories. Our findings therefore suggest that PCAge in many cases could identify additional healthy aging and other at-risk individuals beyond that predicted by PhenoAge. Survival analyses were performed using log-rank tests. Areas shaded in color in a-f indicate 95% error bands for lines of the same color.

Extended Data Fig. 3 Kaplan-Meier survival curves by CA category for male and female clusters.

a, Survival curves for male clusters in the 55–64 CA category. Log-rank tests were statistically significant for all individual curve comparisons, except between the 'major cardio-metabolic' (orange) and 'multi-morbid' (yellow) clusters (P = 0.6), and between the 'healthy aging' (green) and 'mild cardio-metabolic' (purple) clusters (P = 0.5). b, Refer to Fig. 4a. c, Survival curves for male clusters in the 75–84 CA category. Log-rank tests were statistically significant for all individual curve comparisons, except between the 'major cardio-metabolic' (orange) and 'multi-morbid' (yellow) clusters (P = 0.7), 'major cardio-metabolic' (orange) and 'mild cardio-metabolic' (purple) clusters (P = 0.05), and between the 'healthy aging' (green) and 'mild cardio-metabolic' (purple) clusters (P = 0.6). d, Survival curves for female clusters in the 55–64 CA category. Log-rank tests were statistically significant only for individual curve comparisons between the 'mild cardio-metabolic' (purple) and 'multi-morbid' (yellow) clusters (P = 0.02), 'mild cardio-metabolic' (purple) and 'cardio-metabolic failure' (red) clusters (P = 0.001), 'mild cardio-metabolic' (purple) and 'major cardio-metabolic' (orange) clusters (P = 0.04), and between the 'healthy aging' (green) and 'cardio-metabolic failure' (red) clusters (P = 0.02). e, Survival curves for female clusters in the 65–74 CA category. Log-rank tests were statistically significant only for individual curve comparisons between the 'healthy aging' (green) and 'multi-morbid' (yellow) clusters (P = 0.01), 'mild cardio-metabolic' (purple) and 'cardio-metabolic failure' (red) clusters (P = 0.03), and between the 'healthy aging' (green) and 'cardio-metabolic failure' (red) clusters (P = 0.01). f, Survival curves for female clusters in the 75–84 CA category. Log-rank tests were statistically significant only for individual curve comparisons between the 'mild cardio-metabolic' (purple) and 'multi-morbid' (yellow) clusters (P = 0.008), and between the 'healthy aging' (green) and 'multi-morbid' (yellow) clusters (P = 0.006). Subjects from the 'healthy aging' clusters experienced the shallowest declines in survival across all CA categories, except for 55–64 year old females.

Extended Data Fig. 4 Cluster-specific aging rates and PC4 rates for females.

a, Scatter plot and linear regression of CA versus PCAge for each of the five female clusters: 'healthy aging' (green), 'mild cardio-metabolic' (purple), 'major cardio-metabolic' (orange), 'cardio-metabolic failure' (red), and 'multi-morbid' (yellow). Females in the 'healthy aging' cluster (green) had the slowest cluster-specific aging rate, biologically aging on average 1.04 years per calendar year (slope = 1.04, R² = 0.86, P < 0.001 for females). Females from the cardio-metabolic axis had progressively faster cluster-specific aging rates (slope = 1.04, R² = 0.86, P < 0.001 for 'mild cardio-metabolic' (purple), and slope = 1.12, R² = 0.80, P < 0.001 for 'major cardio-metabolic' (orange)), with the highest cluster-specific aging rate seen in the 'cardio-metabolic failure' (red) females (slope = 1.31, R² = 0.66, P < 0.001). Females from the 'multi-morbid' cluster (yellow) had intermediate cluster-specific aging rates (slope = 1.10, R² = 0.82, P < 0.001). b, Scatter plot and linear regression of CA versus PC4 for each of the five female clusters. Although dispersion was high, PC4 increased with age for the 'healthy aging' (green) (slope = 0.012, R² = 0.007, P = 0.038), 'mild cardio-metabolic' (purple) (slope = 0.02, R² = 0.023, P < 0.001), 'major cardio-metabolic' (orange) (slope = 0.023, R² = 0.013, P = 0.02), and 'multi-morbid' (yellow) (slope = 0.04, R² = 0.058, P < 0.001) clusters. There was no statistically significant increase with age in the already high PC4 values in the 'cardio-metabolic failure' (red) (slope = 0.04, R² = 0.058, P = 0.19) cluster.

Extended Data Fig. 5 LinAge and PhenoAge in chronologically 45–54 year old males and females.

a-b, Kaplan-Meier survival curves showing actual survival in the NHANES 2001–2002 test cohort over a 20-year follow-up period for both sexes comparing biologically younger subjects in the bottom 25% quartile for LinAge Delta (LinAge Low, cyan) and PhenoAge Delta (PhenoAge Low, orange) with biologically older subjects in the upper 25% quartile for LinAge Delta (LinAge High, blue) and PhenoAge Delta (PhenoAge High, red) in the 45–54 CA category. Mortality is overall low in this CA bin and both clocks perform similarly with no statistically significant differences between LinAge Low (cyan) and PhenoAge Low (orange), as well as between LinAge High (blue) and PhenoAge High (red), in either sex. Areas shaded in color in each panel indicate 95% error bands for lines of the same color.

Supplementary information

Supplementary Figs. 1–10, Supplementary Tables 1–8, Cluster Analysis, Code and Supplementary Files

Reporting Summary

Supplementary Data 1

PCAge: Cleaned and merged dataset derived from NHANES IV 99/00 and 01/02 recruitment waves. Includes mortality linkage. This dataset is sufficient to train the PCAge clock example.

Supplementary Data 2

PCAge: Codebook file lists features (columns in NHANES datafile), linking NHANES variable names to human readable names. Codebook also contains columns used to include and force-include variables in clock.

Supplementary Code 1

PCAge: Example code (R script) constructing and displaying a PCAge clock. Needs codeBook.csv and nhanesMerged.csv in the working directory.

Supplementary Data 3

linAge_XLS: Excel spreadsheet to calculate LinAge from clinical parameters.

Supplementary Data 4

linAge_XLS: Readme file with instructions on how to use the linAge_Example.xls spreadsheet.

Supplementary Data 5

linAge_Rscript: Data matrix extracted from NHANES IV 99/00 and 01/02 required to run linAge.R example script.

Supplementary Data 6

linAge_Rscript: Data matrix (normalized) extracted from NHANES IV 99/00 and 01/02 required to run linAge.R example script.

Supplementary Data 7

linAge_Rscript: Data matrix extracted from NHANES IV 99/00 and 01/02, including demographic and questionnaire data, required to run linAge.R example script.

Supplementary Data 8

linAge_Rscript: Data file containing LinAge parameters and normalization parameters for LinAge. Required to run linAge.R example script.

Supplementary Software 2

linAge_Rscript: R script illustrating LinAge calculation from parameter and data files.

Supplementary Data 9

customClock: Readme file explaining purpose and flow of customClock_script.R.

Supplementary Software 3

customClock: Main custom clock R script, illustrating construction of LinAge using a subset of variables (see README_2.txt).

Supplementary Data 10

customClock: Data matrix extracted from NHANES IV 99/00 and 01/02 required to run customClock_script.R example script (see README_2.txt).

Supplementary Data 11

customClock: Codebook file lists features (columns in NHANES datafile), linking NHANES variable names to human readable names. Codebook also contains columns used to include and force-include variables in clock. Needed to run customClock_script.R (see README_2.txt).

Supplementary Data 12

customClock: File containing modifiable parameters for customClock_script.R (see README_2.txt).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Fong, S., Pabis, K., Latumalea, D. et al. Principal component-based clinical aging clocks identify signatures of healthy aging and targets for clinical intervention. Nat Aging (2024). https://doi.org/10.1038/s43587-024-00646-8


Received: 26 July 2023
Accepted: 08 May 2024
Published: 19 June 2024
DOI: https://doi.org/10.1038/s43587-024-00646-8