Key Points
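A novel comprehensive health assessment risk model (CHARM) was developed and internally validated in older patients to predict nonrelapse mortality (NRM) and survival after allogeneic transplantation.
CHARM performed better than the HCT-comorbidity index (HCT-CI) for NRM by decision curve analysis (DCA).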
Abstract
Allogeneic hematopoietic cell transplantation (allo-HCT) is potentially curative for older adults with hematologic malignancies, but concerns about nonrelapse mortality (NRM) limit its use in this population. We conducted a prospective, observational study, BMT-CTN 1704 (Blood and Marrow Transplant Clinical Trials Network), enrolling allo-HCT recipients aged ≥60 years at 49 centers in the United States. We analyzed associations between 13 measures of older adult health and NRM within 1 year to construct a comprehensive health assessment risk model (primary-CHARM) using a multivariate Fine-Gray model with grouped penalized variable selection. Two machine learning (ML) models (Cox and pseudo-value boosting) were also explored. Model performance was compared using the area under the curve (AUC), with bootstrap and cross-validation resampling to correct for optimism, as well as decision curve analysis (DCA), calibration, and Brier scores. Among 1105 allo-HCT recipients with a median age of 67 years (range, 60-82), 1-year NRM was 14.4% and 1-year overall survival (OS) was 71.7%. Factors selected for inclusion in primary-CHARM were higher comorbidity burden, lower albumin, higher C-reactive protein, older age, higher percentage weight loss, lower patient-reported performance score, and cognitive impairment. Primary-CHARM scores were independently associated with higher NRM (hazard ratio [HR], 2.72; P < .0001) and worse OS (HR, 2.09; P < .0001). The bootstrap bias–corrected AUC for primary-CHARM was 0.591. Compared with the HCT-comorbidity index and 2 ML-CHARM models, primary-CHARM was favored by calibration, Brier score, and DCA. Primary-CHARM, built mostly from simple and readily available parameters, risk stratifies older adults for allo-HCT. Adopting primary-CHARM in practice may promote broader use of HCT by quantifying risk and may inform the design of strategies to improve outcomes. This trial was registered at www.ClinicalTrials.gov as #NCT03992352.
Introduction
Hematologic malignancies, such as acute myeloid leukemia, are more frequent among adults 60 years of age or older.1 Outcomes of these older adults are generally worse than those of younger patients.2 Allogeneic hematopoietic cell transplantation (allo-HCT) provides a potential cure for hematologic malignancies,3 with continued evidence of improving allo-HCT outcomes in older adults.4 Yet only a small fraction of older adults with hematologic malignancies are offered allo-HCT,5 reflecting uncertainty about its benefit in this population.6,7 Older age is one of the largest barriers to referral for allo-HCT.8 One way to address this uncertainty is to optimize prognostic evaluation for nonrelapse mortality (NRM).
Before this study, the HCT-specific comorbidity index (HCT-CI) was widely used to risk stratify patients.9 The 2014 Blood and Marrow Transplant Clinical Trials Network (BMT-CTN) State of the Science Symposium highlighted that optimal care for older allo-HCT recipients should include a comprehensive prognostic assessment that considers other potentially important patient-specific risk factors.10 Parameters included in geriatric assessments have been linked to HCT outcomes,6,11 and such assessments are, in general, strongly recommended by international geriatric associations for older adults with cancer.12 Specifically, physical function, cognition, and gait speed have been suggested to affect mortality.6,13-15 Serum laboratory biomarkers such as C-reactive protein (CRP) and albumin are also independently associated with NRM.16,17
Here, we report the results of the first large prospective national study, BMT-CTN 1704, conducted to build and validate a novel comprehensive health assessment risk model (CHARM) from relevant prognostic parameters. The aim is to improve prediction of NRM and other outcomes among older allo-HCT recipients, which could enhance patient counseling and improve outcomes.
Methods
We followed reporting guidelines from the Enhancing the QUAlity and Transparency of Health Research (EQUATOR) Network,18 specifically the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) criteria for observational studies.19 We also followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) guidelines.20
Study design, setting, and participants
This was a multicenter (n = 49 centers), prospective, observational clinical trial (ClinicalTrials.gov identifier: NCT03992352) enrolling potential allo-HCT recipients aged ≥60 years.
Primary objective
To determine the set of assessments and biomarkers that could together constitute a robust and valid composite health risk model for accurate personalized estimation of NRM. We report on this outcome here.
Secondary objectives
To determine the association of CHARM with overall survival, frailty-free survival, disability, skilled-facility admission, quality of life, acute and chronic graft-versus-host disease (GVHD), serious organ toxicities, and survival after acute GVHD. Only overall survival is reported here.
The trial was approved by the National Marrow Donor Program institutional review board. All participants provided written informed consent.
Inclusion criteria were (1) age ≥60 years; (2) a diagnosis of a hematologic malignancy; (3) eligibility for allo-HCT per institutional standards; (4) ability to speak and read English, Spanish, and/or Mandarin; and (5) willingness and ability to provide informed consent. The exclusion criterion was previous HCT.
There were no racial, ethnic, or gender requirements for enrollment in this study. There were also no restrictions on the choice of conditioning regimen, stem cell source, GVHD prophylaxis regimen, or donor type.
Additional details are provided in the supplemental Material and supplemental Tables 1-3.
Variables, data sources/measurement, quantitative and categorical variables
Primary end point
The primary end point was 1-year NRM, defined as death without relapse or progression of the primary hematologic malignancy. Relapse or progression was treated as a competing risk.
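Because relapse or progression competes with NRM, 1-year NRM is a cumulative incidence rather than 1 minus a Kaplan-Meier survival estimate; treating relapses as censored would overstate NRM. The sketch below shows the Aalen-Johansen estimator under assumed conventions: time in days from transplant and events coded 0 = censored, 1 = NRM, 2 = relapse or progression. The function name and coding are hypothetical, not the trial's data dictionary, and confidence intervals are omitted.

```python
from collections import defaultdict

def cumulative_incidence(times, events, cause=1, horizon=365.0):
    """Aalen-Johansen cumulative incidence of `cause` by `horizon`.

    times:  follow-up time per patient (assumed days from transplant)
    events: 0 = censored, 1 = NRM, 2 = relapse/progression (hypothetical coding)
    """
    # Count each event type (including censoring, code 0) at each distinct time.
    counts = defaultdict(lambda: defaultdict(int))
    for t, e in zip(times, events):
        counts[t][e] += 1

    at_risk = len(times)   # patients still event free and uncensored
    event_free = 1.0       # all-cause event-free survival just before time t
    cif = 0.0
    for t in sorted(counts):
        if t > horizon:
            break
        n_here = sum(counts[t].values())
        d_any = n_here - counts[t][0]     # NRM or relapse at time t
        d_cause = counts[t][cause]        # events of the cause of interest
        # The CIF jumps by S(t-) x cause-specific hazard at t.
        cif += event_free * d_cause / at_risk
        event_free *= 1.0 - d_any / at_risk
        at_risk -= n_here
    return cif
```

With no censoring before the horizon, the estimate reduces to the proportion of patients with NRM by that time; calling the same function with `cause=2` gives the cumulative incidence of relapse.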
Secondary end point
The secondary end point was 1-year OS. The event was death from any cause, and time to event was the interval between the date of transplant and death; patients were censored at last follow-up or at 1 year, whichever came first.
Exposure
Exposure was receiving allo-HCT regardless of type of conditioning regimen, donor, donor-recipient HLA matching, stem cell source, or disease status. All patients receiving allo-HCT were considered evaluable participants for analyses.
The variables considered for CHARM and studies supporting the rationale of their use
The 13 variables were: patient age at allo-HCT21-24; HCT-CI score25,26; serum albumin level16,17; CRP17; cognition level per the Montreal Cognitive Assessment (MoCA)6,27,28; percentage of weight loss over the preceding year29,30; patient-reported Karnofsky performance status (KPS) score31-33; 4-meter gait speed6,34-36; Patient-Reported Outcomes Measurement Information System (PROMIS) physical function scale13,37,38; Instrumental Activities of Daily Living36,39; number of falls in the preceding 6 months40; PROMIS Depression scale score6,36,37,41,42; and number of prescribed medications.40,43 Information about all CHARM variables was collected within the 2 to 3 weeks before the start of conditioning for allo-HCT. Additional details, including handling of missing data, are provided in the supplemental Material.
Before allo-HCT, a survey modeled on a previously used questionnaire44 asked the treating HCT physicians to estimate each of their patients' chances of 1-year survival.
Statistical methods
Study size and sample calculation
Study size was based on a target events-per-variable ratio (EPV) of 12, using an EPV of 10 to 15 as guidance for building prediction models with time-to-event outcomes.45 The EPV of 12 was projected using a 1-year NRM rate of 22%, based on historical data reported to the Center for International Blood and Marrow Transplant Research (CIBMTR) for patients aged ≥60 years receiving allo-HCT from 2012 to 2016. Planning for 16 variables in the NRM model (13 health variables and 3 adjustment variables) required a sample size of 880 subjects to meet the EPV target. With this sample size, a binary predictor with a frequency of 0.25 would have at least 80% power to detect a hazard ratio (HR) of 1.75. To account for potential dropout before allo-HCT (estimated at ∼20%), the target was inflated to 1100 subjects with complete data on each CHARM variable. Because of substantial missing data (especially for CRP) among the first 1100 patients, accrual was extended by an additional 126 patients. Additional details are provided in the supplemental Material.
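The design arithmetic can be checked directly. The Methods do not state which power formula was used, so the sketch below assumes the Schoenfeld approximation for a proportional hazards model with a two-sided α of 0.05; applying it to a Fine-Gray model is itself an approximation. All other inputs are taken from the paragraph above, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def events_needed(hr, prevalence, alpha=0.05, power=0.80):
    """Schoenfeld approximation (assumed): events needed to detect `hr` for a
    binary covariate with the given prevalence, two-sided test."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2 / (
        prevalence * (1 - prevalence) * math.log(hr) ** 2)

def approx_power(events, hr, prevalence, alpha=0.05):
    """Approximate power of the same test for a given number of events."""
    nd = NormalDist()
    signal = math.sqrt(events * prevalence * (1 - prevalence)) * abs(math.log(hr))
    return nd.cdf(signal - nd.inv_cdf(1 - alpha / 2))

# Design inputs stated in the Methods.
n_variables = 16        # 13 health variables + 3 adjustment variables
nrm_rate = 0.22         # historical 1-year NRM, CIBMTR, age >= 60
n_analyzable = 880      # sample size meeting the EPV target
dropout = 0.20          # anticipated loss before allo-HCT

expected_events = n_analyzable * nrm_rate        # expected NRM events
epv = expected_events / n_variables              # events per variable
n_target = n_analyzable / (1 - dropout)          # inflated accrual target
needed = events_needed(1.75, 0.25)               # events needed for HR 1.75
power = approx_power(expected_events, 1.75, 0.25)

print(f"events={expected_events:.1f} EPV={epv:.1f} target={n_target:.0f}")
print(f"events needed={needed:.0f} power={power:.2f}")
```

Under these assumptions, about 194 NRM events are expected (EPV of about 12) against roughly 134 needed, giving approximate power near 0.92, which is consistent with the stated "at least 80%"; dividing 880 by 0.8 recovers the 1100-patient target.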
Primary outcome
We fit a multivariate Fine-Gray model for the subdistribution hazard of NRM to build the primary-CHARM model. A penalized variable selection strategy with a smoothly clipped absolute deviation (SCAD) penalty was used to identify variables to retain in the model.46 This was the primary analytic approach per protocol, and the resulting model is referred to as primary-CHARM. The 13 covariates were analyzed as continuous variables whenever possible, with linear and quadratic terms. In sensitivity analysis, the model was adjusted for donor type and HLA matching, donor/recipient cytomegalovirus status, and conditioning regimen intensity.47 Generalized cross-validation was used to choose the penalty parameter λ. The primary-CHARM score was constructed as a linear predictor based on the log subdistribution HRs. Predictive performance was summarized using the time-dependent area under the receiver operating characteristic (ROC) curve48 for prediction of NRM at 1 year, as implemented in the R package timeROC, with patients experiencing competing risks considered "controls." Because the primary-CHARM model was trained on the full data set, bias-corrected measures of predictive performance49 based on bootstrap50 and cross-validation resampling were used to correct for optimism (overestimation of the performance metrics). To describe clinical effects, primary-CHARM scores and outcomes were displayed by tertiles.
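The competing-risk AUC definition above can be illustrated with a simplified sketch. Cases are patients with NRM by the horizon; controls are everyone else, including patients who relapsed first, following the "controls" convention described above. The sketch assumes no patient is censored before the horizon; the R package timeROC handles censoring with inverse-probability-of-censoring weights, which are omitted here. The function name and event coding (0 = censored, 1 = NRM, 2 = relapse) are hypothetical.

```python
def time_dependent_auc(times, events, marker, horizon, cause=1):
    """Cumulative/dynamic AUC at `horizon` under competing risks.

    Assumes no censoring before `horizon` (real analyses need IPCW weights).
    Cases:    event of interest (`cause`) at or before `horizon`.
    Controls: everyone else, including competing events before `horizon`.
    """
    cases, controls = [], []
    for t, e, m in zip(times, events, marker):
        if t <= horizon and e == cause:
            cases.append(m)
        elif t <= horizon and e == 0:
            raise ValueError("censored before horizon: IPCW weights needed")
        else:
            controls.append(m)
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Concordance over all case-control pairs; ties count one half.
    score = 0.0
    for mc in cases:
        for mk in controls:
            if mc > mk:
                score += 1.0
            elif mc == mk:
                score += 0.5
    return score / (len(cases) * len(controls))
```

Counting competing-event patients as controls asks whether the score ranks NRM cases above all patients who did not die of NRM by 1 year, which matches the clinical question of identifying those at risk of NRM specifically.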
Other considerations
We performed exploratory machine learning (ML) modeling using boosting applied to pseudovalues,51 an approach that directly models the cumulative incidence of NRM at 1 year. We also fit a Cox boosting ML model, which models the cause-specific hazard. The pseudovalue boosting ML model provides the predicted probabilities required for the model performance metrics; these predicted probabilities were not directly available from the Cox boosting approach without also modeling the cause-specific hazard of relapse.
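Pseudovalues convert a censored cumulative incidence into one number per patient that an ordinary regression or boosting learner can use as a response. Below is a minimal sketch of the usual jackknife construction, θ_i = n·θ̂ − (n − 1)·θ̂_(−i), where θ̂ is the Aalen-Johansen estimate of 1-year NRM from all n patients and θ̂_(−i) leaves out patient i. The names and event coding (0 = censored, 1 = NRM, 2 = relapse) are hypothetical, the boosting step is not shown, and the leave-one-out loop is computationally naive, which is acceptable for a sketch.

```python
def cif(times, events, horizon, cause=1):
    """Aalen-Johansen cumulative incidence of `cause` by `horizon`
    (event codes: 0 = censored, 1 = NRM, 2 = relapse; hypothetical)."""
    at_risk, event_free, total = len(times), 1.0, 0.0
    for t in sorted(set(times)):
        if t > horizon:
            break
        here = [e for ti, e in zip(times, events) if ti == t]
        d_cause = sum(1 for e in here if e == cause)
        d_any = sum(1 for e in here if e != 0)
        total += event_free * d_cause / at_risk
        event_free *= 1.0 - d_any / at_risk
        at_risk -= len(here)
    return total

def pseudo_values(times, events, horizon=365.0, cause=1):
    """Jackknife pseudovalues of the cumulative incidence at `horizon`."""
    times, events = list(times), list(events)
    n = len(times)
    full = cif(times, events, horizon, cause)
    values = []
    for i in range(n):
        # Re-estimate the CIF with patient i left out.
        loo = cif(times[:i] + times[i + 1:], events[:i] + events[i + 1:],
                  horizon, cause)
        values.append(n * full - (n - 1) * loo)
    return values
```

Without censoring, each pseudovalue equals the patient's own 1-year NRM indicator, which is a useful sanity check; with censoring, the pseudovalues carry the information a learner needs to target the cumulative incidence directly.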
In addition to AUC comparisons, model performance was assessed with calibration measures, including the slope and intercept of the calibration plot for 1-year NRM; the ideal slope is 1 and the ideal intercept is 0. Apparent performance is calculated on the training sample and is therefore optimistically biased. Bootstrap bias correction with 200 bootstrap samples was the primary approach to correct for this bias, and cross-validation (10-fold, with 200 replications) was also performed to confirm the bias-corrected results.
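The bootstrap bias correction follows the familiar optimism-correction recipe: refit the model in each bootstrap sample, measure how much better it scores on its own bootstrap sample than on the original data, and subtract the average of that optimism from the apparent performance. The sketch below assumes a binary 1-year outcome with no censoring and uses a deliberately simple, hypothetical toy scorer (`fit_mean_difference`) in place of the penalized Fine-Gray model. In a full analysis, every modeling step, including variable selection, would be repeated in each resample.

```python
import random

def auc(scores, y):
    """Concordance between scores and a binary outcome; ties count one half."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    hits = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return hits / (len(pos) * len(neg))

def fit_mean_difference(X, y):
    """Toy model (hypothetical): weight each feature by its case-minus-control mean."""
    cases = [x for x, yi in zip(X, y) if yi == 1]
    ctrls = [x for x, yi in zip(X, y) if yi == 0]
    return [sum(r[j] for r in cases) / len(cases) - sum(r[j] for r in ctrls) / len(ctrls)
            for j in range(len(X[0]))]

def predict(w, X):
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in X]

def optimism_corrected_auc(X, y, n_boot=200, seed=1):
    """Return (apparent AUC, bootstrap optimism-corrected AUC)."""
    rng = random.Random(seed)
    apparent = auc(predict(fit_mean_difference(X, y), X), y)
    n, optimism = len(y), []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # a resample needs both outcome classes to fit and score
        w = fit_mean_difference(Xb, yb)   # refit the whole procedure
        # Optimism = performance on own bootstrap sample minus on original data.
        optimism.append(auc(predict(w, Xb), yb) - auc(predict(w, X), y))
    return apparent, apparent - sum(optimism) / n_boot
```

On pure-noise predictors, the apparent AUC is inflated by overfitting, and the corrected AUC should be pulled back toward 0.5; the same loop applies to calibration slope, intercept, or Brier score by swapping the metric.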
We also assessed overall fit using the Brier score and the scaled Brier score (R2) for prediction of 1-year NRM.52,53 A lower Brier score indicates smaller differences between observed and predicted outcomes and is favorable. The scaled Brier score (R2) indicates improvement over a model with no covariates, with higher percentages being favorable. The Cox boosting model was not included because it models only the cause-specific hazard of NRM and does not directly provide the cumulative incidence predictions needed to assess overall fit.
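For a binary 1-year outcome with complete follow-up, both fit measures reduce to simple averages: the Brier score is the mean squared difference between predicted probability and observed outcome, and the scaled version compares it with a covariate-free model that predicts the overall event rate for everyone. The sketch below assumes no censoring before 1 year (with censoring, inverse-probability-of-censoring weighting is typically added); the function names are hypothetical.

```python
def brier_score(y, p):
    """Mean squared difference between observed outcome (1/0) and predicted risk."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

def scaled_brier(y, p):
    """1 - Brier(model) / Brier(null), where the null model predicts the
    observed event rate for every patient; higher is better."""
    prevalence = sum(y) / len(y)
    null = brier_score(y, [prevalence] * len(y))
    return 1.0 - brier_score(y, p) / null
```

A scaled Brier score of 0 means no improvement over predicting the cohort's average NRM for everyone, which is why it is reported as a percentage improvement.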
In addition, decision curve analysis (DCA) was performed to evaluate the potential clinical benefit of using CHARM vs other models to identify patients at high risk for NRM. DCA54-60 evaluates the value of a predictive model for clinical decision-making. Three strategies were compared: selecting all patients for intervention or allo-HCT (ie, treat all), selecting no patients (ie, treat none), and selecting patients based on the models (Figure 1). A high-performing model shows higher net benefit across the targeted range of decision thresholds for predicted NRM incidence.
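Net benefit at threshold probability p_t counts true positives and penalizes false positives by the odds p_t/(1 − p_t), the exchange rate implied by choosing that threshold: NB = TP/n − (FP/n)·p_t/(1 − p_t). A decision curve plots NB over a range of p_t for each model alongside "treat all" and "treat none." The sketch assumes complete 1-year follow-up (no censoring) and uses hypothetical function names; the curves in Figure 1 were additionally bias-corrected with cross-validated predictions.

```python
def net_benefit(y, p, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold.

    y: observed 1-year NRM (1/0); p: predicted 1-year NRM probability.
    """
    n = len(y)
    flagged = [(yi, pi) for yi, pi in zip(y, p) if pi >= threshold]
    tp = sum(1 for yi, _ in flagged if yi == 1)
    fp = len(flagged) - tp
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_all(y, threshold):
    """Net benefit of flagging everyone ('treat all')."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# "Treat none" flags nobody, so its net benefit is 0 at every threshold.
```

Evaluating both functions over a grid of thresholds, together with the zero line for "treat none," produces the layout of a decision curve; a model is useful at a given threshold when its curve lies above both reference strategies.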
DCA plot of net benefit vs threshold probability for NRM. All curves are bias-corrected, using cross-validation to generate predicted probabilities. CHARM showed higher net benefit than a "treat all" or "treat none" strategy, and higher net benefit than the HCT-CI, across a wide range of threshold probabilities for NRM.
To explore whether an ML-CHARM including all 13 health covariates could yield a better AUC for 1-year NRM, we used a Shapley value plot to summarize the contribution and rank of individual factors in the ML-CHARM for NRM.
Shapley values measure the contribution of a particular variable (eg, a CHARM variable) to the prediction for each patient. Here, they were applied to predictions from the Cox boosting ML model. Each patient has a Shapley value for each CHARM variable. The summary plot shows the variability in Shapley values across patients, from the most important variable at the top to the least important at the bottom. The Shapley value is expressed as the change in the log-hazard prediction with vs without a particular variable included in the model.
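Shapley values distribute the difference between a patient's prediction and a reference prediction across the input variables so that the contributions add up exactly. For a small number of variables they can be computed exactly by averaging each variable's marginal contribution over all subsets of the other variables. The sketch below assumes that "absent" variables are set to a single baseline (for example, cohort means); the paper does not specify the reference distribution, and practical tools usually rely on faster approximations. For a Cox boosting model, `f` would return the log-hazard prediction; the function name is hypothetical.

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, x, baseline):
    """Exact Shapley values of model `f` for one patient `x`.

    Variables outside a coalition are set to `baseline` values (a single
    reference point, an assumption; other reference choices are common).
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {j}) - value(s))
        phi.append(contribution)
    return phi
```

The values always sum to f(x) − f(baseline), and an interaction between two variables is split equally between them. With 13 CHARM variables, the exact sum covers 2^12 subsets per variable, which remains tractable for a single patient.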
Secondary outcome
OS was analyzed using Cox proportional hazards regression. Stepwise variable selection was used to select demographic and baseline characteristics for risk adjustment, before adding primary-CHARM scores to the model. OS by primary-CHARM tertiles was summarized using Kaplan-Meier estimates.
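The Kaplan-Meier estimate of 1-year OS, with follow-up truncated at 1 year as in the end point definition, can be sketched as follows. Times are assumed to be in days from transplant, and the function name is hypothetical; tertile-specific estimates would apply the same function to each primary-CHARM tertile separately.

```python
def km_overall_survival(times, died, horizon=365.0):
    """Kaplan-Meier overall survival at `horizon`.

    Follow-up beyond `horizon` is censored at `horizon`, mirroring the rule
    'censored at last follow-up or 1 year, whichever is first'.
    """
    data = [(min(t, horizon), d and t <= horizon) for t, d in zip(times, died)]
    at_risk = len(data)
    surv = 1.0
    for t in sorted({t for t, _ in data}):
        deaths = sum(1 for ti, d in data if ti == t and d)
        leaving = sum(1 for ti, _ in data if ti == t)
        surv *= 1.0 - deaths / at_risk   # product-limit step
        at_risk -= leaving               # deaths and censorings leave the risk set
    return surv
```

Because all-cause death is the event, no competing-risk adjustment is needed here, unlike the NRM end point.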
Handling of missing data and loss to follow-up
Supplemental Table 6 describes the completeness of variables used in these analyses. Censored-data methods were used to handle missing NRM and OS outcomes due to loss to follow-up. Multiple imputation with the R package "mice"61 was used to address missing CHARM variables. We implemented group variable selection with the R package "gcrrp" to perform consistent variable selection for the NRM model across multiply imputed data sets.62 Once the variables were selected, we refit a standard Fine-Gray model for NRM to each imputed data set and applied Rubin's rules for inference. Similarly, after adjustment variables were selected, Cox regression for OS was fit to the multiply imputed data sets, and the results were combined using Rubin's rules.
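Rubin's rules combine a coefficient estimated separately in each imputed data set into one estimate whose variance reflects both sampling error and imputation uncertainty. A minimal sketch, assuming coefficients are pooled on the log-HR scale and exponentiated afterward; the function name is hypothetical, and packages such as mice provide their own pooling routines.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine one coefficient across m imputed data sets (Rubin's rules).

    estimates: per-imputation coefficients (e.g., log subdistribution HRs)
    variances: their squared standard errors
    Returns pooled estimate, total variance, and degrees of freedom.
    """
    m = len(estimates)
    q_bar = mean(estimates)                # pooled point estimate
    within = mean(variances)               # average within-imputation variance
    between = variance(estimates)          # between-imputation variance (m - 1)
    total = within + (1 + 1 / m) * between
    if between == 0:
        df = math.inf
    else:
        r = (1 + 1 / m) * between / within  # relative increase in variance
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, total, df
```

A pooled HR and CI would then be obtained by exponentiating the pooled coefficient plus or minus a t quantile (with the returned degrees of freedom) times the square root of the total variance.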
Results
Participant characteristics and exposure
The study enrolled 1226 patients, of whom 121 dropped out, most commonly because they did not proceed to allo-HCT (n = 74; 61%). The CONSORT (Consolidated Standards of Reporting Trials) diagram is shown in Figure 2. There were no significant differences in age, sex, race/ethnicity, or primary disease between patients who dropped out and those who remained in the study (supplemental Table 4). Among the 74 patients who did not proceed to allo-HCT, disease relapse was the most common reason (41%). Table 1 shows the distribution of baseline CHARM variables for the primary analysis population of 1105 allo-HCT recipients. Supplemental Table 5 shows baseline characteristics of the study cohort (n = 1105), and supplemental Table 6 shows the completeness of baseline CHARM variables. Supplemental Figure 1 shows actual vs projected enrollment by quarter for 2019 to 2022. The median age was 67 years (range, 60-82). Most participants received reduced-intensity conditioning regimens (68%); 13% received high-dose and 19% received nonmyeloablative regimens. Additional details are provided in the supplemental Material.
Health variables analyzed to develop CHARM, primary analysis population
Characteristic | N (%) |
---|---|
No. of patients | 1105 |
Transplant centers–reported domains | |
Age at HCT (y) | |
Evaluable | 1105 |
Mean (SD) | 67.7 (4.54) |
Median (25th-75th percentile) | 67.5 (64.1-71.2) |
Min-max | 60.0-82.2 |
60-64 | 350 (31.7%) |
65-69 | 399 (36.1%) |
70-74 | 295 (26.7%) |
≥75 | 61 (5.5%) |
Walk speed, 4 m (m/s) | |
N | 1062 |
Mean (SD) | 1.0 (0.31) |
Median (25th-75th percentile) | 1.0 (0.8-1.1) |
Min-max | 0.1-2.7 |
<0.8 m/s | 244 (23.0%) |
≥0.8 m/s | 818 (77.0%) |
Weight loss percentage in previous year, n (%) | |
N | 1008 |
Median (25th-75th percentile) | 3.2% (0%-8.1%) |
Min-max | 0%-37.3% |
No weight loss | 354 (32) |
<10% | 457 (41) |
≥10% (unintentional) | 169 (15) |
≥10% (intentional) | 26 (2) |
≥10% (intention unknown) | 2 (0) |
Not reported | 97 (9) |
Cognition by MoCA | |
N | 1058 |
Mean (SD) | 26 (2.92) |
Median (25th-75th percentile) | 26 (24-28) |
Min-max | 12-30 |
<26 | 409 (38.7%) |
≥26 | 649 (61.3%) |
CRP (mg/L) | |
N | 948 |
Mean (SD) | 6.1 (12.03) |
Median (25th-75th percentile) | 2.0 (0.9-6.0) |
Min-max | 0.04-169 |
≤10 | 808 (85.2%) |
>10 | 140 (14.8%) |
Serum albumin, g/dL | |
N | 1101 |
Mean (SD) | 4.0 (0.42) |
Median (25th-75th percentile) | 4.0 (3.7-4.2) |
Min-max | 2.4-5.3 |
<3.5 | 122 (11.1%) |
≥3.5 | 979 (88.9%) |
Comorbidity by HCT-CI, n (%) | |
0 | 195 (18) |
1 | 179 (16) |
2 | 168 (15) |
3 | 175 (16) |
4 | 149 (13) |
5 | 91 (8) |
6 | 75 (7) |
7 | 38 (3) |
8 | 20 (2) |
9 | 7 (1) |
10 | 6 (1) |
11 | 2 (0) |
Patient-reported domains | |
Karnofsky performance score, n (%) | |
40-Disabled, requires special care and assistance | 5 (0) |
50-Requires considerable assistance and frequent medical care | 8 (1) |
60-Requires occasional assistance but is able to care for most of his/her needs | 65 (6) |
70-Cares for self, unable to carry on normal activity or to do active work | 112 (10) |
80-Normal activity with effort; some signs or symptoms of disease | 126 (11) |
90-Able to carry on normal activity; minor signs or symptoms of disease | 343 (31) |
100-Normal, no complaints, no evidence of disease | 313 (28) |
Not reported | 133 (12) |
PROMIS Physical Function | |
N | 976 |
Mean (SD) | 44.7 (8.38) |
Median (25th-75th percentile) | 44.0 (39.0-50.0) |
Min-max | 21.0-67.0 |
<40 | 264 (27.0%) |
≥40 | 712 (73.0%) |
Instrumental activities of daily living score | |
N | 982 |
Mean (SD) | 13.1 (1.63) |
Median (25th-75th percentile) | 14.0 (13.0-14.0) |
Min-max | 4.0-14.0 |
<14 | 359 (36.6%) |
≥14 | 623 (63.4%) |
No. of falls in last 6 mo, no. (%) | |
0 | 820 (74) |
1 | 122 (11) |
≥2 | 35 (3) |
Not reported | 128 (12) |
PROMIS Depression | |
N | 975 |
Mean (SD) | 45 (7.17) |
Median (25th-75th percentile) | 44 (38-50) |
Min-max | 34-66 |
<60 | 949 (97.3%) |
≥60 | 26 (2.7%) |
Number of prescribed medications | |
N | 867 |
Mean (SD) | 6 (3.54) |
Median (25th-75th percentile) | 5 (3-7) |
Min-max | 0-27 |
≤4 | 386 (44.5%) |
>4 | 481 (55.5%) |
Physician-reported survey | |
Physician prognostic evaluation: alive at 1 y, n (%) | |
Very good (>90%) | 34 (3) |
Good (75%-90%) | 268 (24) |
Better than 50/50 (50%-74%) | 433 (39) |
Worse than 50/50 (25%-49%) | 178 (16) |
Bad (10%-24%) | 17 (2) |
Very bad (<10%) | 1 (0) |
Not reported | 174 (16) |
Max, maximum; Min, minimum; SD, standard deviation.
Primary outcome
NRM at 1 year was 14.4% (95% confidence interval [CI], 12.4-16.5). Estimates were 12.5% (95% CI, 9.9-15.3) for 2019 to 2020 and 16.6% (95% CI, 13.5-19.9) for 2021 to 2022. Supplemental Table 7 compares univariate outcomes between study participants and a control group of older patients who received allo-HCT during the same period and were reported to CIBMTR. Supplemental Table 8 presents the primary causes of death.
CHARM variables
Supplemental Figure 2 shows histograms of baseline CHARM variables. The primary-CHARM model for NRM (Table 2) included higher comorbidity burden, CRP, weight loss, and age and lower albumin, patient-reported performance score, and cognitive score, each of which had an independent association with the risk for NRM.
Primary analysis: multivariate penalized Fine-Gray model analysis of CHARM variables
Variable | Coefficient (log HR scale)∗ | P value | Coefficient (adjusted for non-CHARM variables)† | Median | Quartile with elevated NRM risk | Subdistribution HR‡ | 95% CI lower | 95% CI upper |
---|---|---|---|---|---|---|---|---|
HCT-CI | 0.1296686 | <.0001 | 0.1357042 | 3 | 4 | 1.138 | 1.069 | 1.212 |
Log(CRP)∗ | 0.1118738 | .0493 | 0.1141597 | 2 | 6 | 1.131 | 1.000 | 1.278 |
Albumin | −0.5989142 | .0008 | −0.5920071 | 4 | 3.7 | 1.197 | 1.078 | 1.329 |
Weight loss² | 0.001110624 | .0089 | 0.001190834 | 3.2% | 8.1% | 1.063 | 1.016 | 1.114 |
PROkps² | −0.0000645035 | .1115 | −0.0000674881 | 90 | 80 | 1.116 | 0.975 | 1.277 |
Age at HCT | 0.03748089 | .0317 | 0.04346934 | 67.5 | 71.2 | 1.149 | 1.012 | 1.304 |
MoCA | −0.04649406 | .1106 | −0.04011149 | 26 | 24 | 1.097 | 0.979 | 1.230 |
Final CHARM score = 0.1296686 × HCT-CI + 0.1118738 × log(CRP) − 0.5989142 × albumin + 0.001110624 × weight loss² − 0.0000645035 × KPS² + 0.03748089 × age at HCT − 0.04649406 × MoCA. An online CHARM calculator is available at: https://cibmtr.org/CIBMTR/OffNav/DevSandbox/CHARM-Risk-NRM-Calculator.
log(CRP), natural log of C-reactive protein; PROkps², patient-reported KPS squared; weight loss², squared percentage of weight loss, comparing weight 1 year before allogeneic HCT with weight at the time of study evaluation.
∗Natural log; positive coefficients indicate higher risk with higher variable values; negative coefficients indicate higher risk with lower variable values.
†The model was adjusted for conditioning intensity, donor/recipient cytomegalovirus serology matching, and donor type/HLA matching, forcing these covariates into the model to assess whether the effects of the primary-CHARM health variables differed after adjustment.
‡The subdistribution HR from the Fine-Gray model refers to the relative change in the instantaneous rate of the event (NRM) among subjects who are event free or who have experienced a competing event (relapse). Subdistribution HRs and associated CIs compare the elevated-risk quartile with the median value of each CHARM variable, to account for the widely varying scales of the measurements. Quartiles and medians are expressed on the original scale rather than the transformed (log or squared) scale. The elevated-risk quartile is defined as the 75th percentile for measures where higher values are associated with higher NRM risk (HCT-CI, log(CRP), weight loss, and age) and the 25th percentile for measures where lower values are associated with higher NRM risk (albumin, PROkps, and MoCA).
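Because each subdistribution HR in Table 2 contrasts one variable's elevated-risk quartile with its median, the score formula can be checked against the table: exponentiating the change in the linear predictor when a single variable moves from its median to its elevated-risk quartile should reproduce the reported HR. Below is a minimal sketch with hypothetical names. Inputs use Table 1 units (CRP in mg/L, albumin in g/dL, weight loss in percent, eg, 8.1 rather than 0.081), and the check reproduces the tabulated HRs with those units. The score is on the log subdistribution-hazard scale; converting it into an absolute 1-year NRM risk, as the online calculator does, requires a baseline cumulative incidence that is not reported here, so this sketch is not a substitute for the calculator.

```python
import math

# Unadjusted coefficients from Table 2 (log subdistribution-hazard scale).
COEF = {
    "hct_ci": 0.1296686,
    "log_crp": 0.1118738,           # natural log of CRP (mg/L)
    "albumin": -0.5989142,          # g/dL
    "weight_loss_sq": 0.001110624,  # (percent weight loss)^2, e.g., 8.1 not 0.081
    "kps_sq": -0.0000645035,        # (patient-reported KPS)^2
    "age": 0.03748089,              # years at HCT
    "moca": -0.04649406,
}

def charm_score(hct_ci, crp, albumin, weight_loss_pct, kps, age, moca):
    """Primary-CHARM linear predictor; higher means higher NRM hazard."""
    return (COEF["hct_ci"] * hct_ci
            + COEF["log_crp"] * math.log(crp)
            + COEF["albumin"] * albumin
            + COEF["weight_loss_sq"] * weight_loss_pct ** 2
            + COEF["kps_sq"] * kps ** 2
            + COEF["age"] * age
            + COEF["moca"] * moca)

# Medians and elevated-risk quartiles from Table 2 (original scales).
MEDIAN = dict(hct_ci=3, crp=2.0, albumin=4.0, weight_loss_pct=3.2,
              kps=90, age=67.5, moca=26)
ELEVATED = dict(hct_ci=4, crp=6.0, albumin=3.7, weight_loss_pct=8.1,
                kps=80, age=71.2, moca=24)

def subdistribution_hr(variable):
    """HR for one variable at its elevated-risk quartile vs its median,
    holding the others at their medians."""
    shifted = dict(MEDIAN, **{variable: ELEVATED[variable]})
    return math.exp(charm_score(**shifted) - charm_score(**MEDIAN))

for name in MEDIAN:
    print(f"{name}: {subdistribution_hr(name):.3f}")
```

The printed values (1.138, 1.131, 1.197, 1.063, 1.116, 1.149, 1.097) match the Subdistribution HR column of Table 2, confirming that CRP enters on the natural-log scale and that weight loss and KPS enter as squared terms in their original units.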
Table 2 provides the coefficients for each of the 7 variables; after adjustment for clinical factors, the coefficients were largely unchanged. Patients in the low, intermediate, and high CHARM score tertiles had 1-year NRM rates of 8.1% (95% CI, 5.6-11.1), 12.1% (95% CI, 9.1-15.7), and 23.3% (95% CI, 19.0-27.7), respectively (Figure 3).
The apparent AUC of primary-CHARM was 0.627. The cross-validation bias–corrected AUC was 0.592 for primary-CHARM vs 0.580 for the HCT-CI (supplemental Table 9). In comparison, the ML-CHARM1 and ML-CHARM2 models had bias-corrected AUCs of 0.577 and 0.606, respectively. To better elucidate the contribution of variables other than the HCT-CI, we evaluated a CHARM model excluding the HCT-CI, which had an apparent AUC of 0.607 (vs 0.627 for primary-CHARM).
Calibration slope, intercept, and Brier score R2 analyses were used to compare the performance of primary-CHARM, ML-CHARM1, ML-CHARM2, and the HCT-CI for predicting NRM (supplemental Figure 3; supplemental Tables 10 and 11). Overall, these analyses indicated better performance of primary-CHARM than the HCT-CI and performance similar to that of the 2 ML-CHARM models.
Shapley value plots (supplemental Figure 4) ranked the variables by Shapley value in descending order as follows: HCT-CI, CRP, serum albumin, walk speed, age, KPS, PROMIS Depression, MoCA, PROMIS Physical Function, percent weight loss, number of prescribed medications, Instrumental Activities of Daily Living, and number of falls.
DCA (Figure 1; supplemental Figure 5) showed that primary-CHARM had higher net benefit (ie, correct identification of patients who would experience NRM by 12 months) than a "treat all" or "treat none" approach across a wide range of threshold probabilities for NRM. Its net benefit was also higher than that of the HCT-CI alone (Figure 1) and comparable to that of the pseudo-value ML-CHARM (supplemental Figure 5).
Subgroup analyses (supplemental Table 12; supplemental Figure 6) confirmed associations between primary-CHARM scores and risk of NRM across subgroups, including patients who received post-transplant cyclophosphamide (HR, 3.522; 95% CI, 2.353-5.270) or other regimens (HR, 2.701; 95% CI, 1.807-4.036) for GVHD prophylaxis; patients aged <70 years (HR, 2.679; 95% CI, 1.951-3.680) or ≥70 years (HR, 4.353; 95% CI, 2.424-7.819); and patients with low-intermediate disease risk index (DRI) (HR, 3.140; 95% CI, 2.181-4.520) or high-very high DRI (HR, 2.703; 95% CI, 1.349-5.419).
Secondary outcome (1-year OS)
A total of 313 patients died within 1 year after allo-HCT, for a 1-year OS of 71.7% (95% CI, 68.2-75.1). Primary-CHARM tertiles stratified 1-year OS, with Kaplan-Meier estimates of 81.2%, 73.8%, and 59.6% for the low-, intermediate-, and high-risk tertiles, respectively (Figure 4A), but did not stratify relapse (Figure 4B). In a multivariate Cox model, primary-CHARM (HR, 2.09; P < .0001) and high/very high DRI (HR, 1.73; P = .0025) were the only factors independently associated with OS (supplemental Table 13). In an additional multivariate Cox model of OS, the physician estimate of OS was forced into the model (Table 3). Primary-CHARM scores (HR, 2.06; P < .0001) and high/very high DRI (HR, 1.537; P = .0025) were again the only factors independently associated with OS, whereas the physician estimate of OS was not (overall P = .056) (Table 3). Physician estimates of OS failed to stratify outcomes except for the highest-risk group, which comprised only 2% of patients (supplemental Figure 7).
Stratification of outcomes per primary CHARM tertiles. Primary-CHARM tertiles stratifying for (A) overall survival and (B) relapse.
Multivariate model for overall survival through 1 year
Variable | HR | 95% CI lower | 95% CI upper | P value |
---|---|---|---|---|
Primary-CHARM scores | 2.060 | 1.676 | 2.531 | <.0001 |
DRI low/intermediate (reference) | 1.000 | | | Overall .0057 |
DRI high/very high | 1.537 | 1.164 | 2.031 | .0025 |
DRI unknown | 1.495 | 0.935 | 2.391 | .0931 |
PhysQ: 75%-90% (reference) | 1.000 | | | Overall .056 |
PhysQ: 90%-100% | 0.874 | 0.418 | 1.830 | .7218 |
PhysQ: 50%-74% | 1.033 | 0.758 | 1.408 | .8357 |
PhysQ: 25%-49% | 1.058 | 0.730 | 1.534 | .7664 |
PhysQ: 0%-24% | 2.532 | 1.339 | 4.788 | .0043 |
PhysQ, physician questionnaire about patient survival.
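The Kaplan-Meier tertile stratification of 1-year OS reported above can be illustrated with a minimal Python sketch. It is not the study's analysis code: it assumes follow-up measured in months, a 12-month horizon, and tertile cut points taken at the 33rd and 67th percentiles of the score via `statistics.quantiles`; the function names and any input data are hypothetical.

```python
from statistics import quantiles


def km_survival_at(times, events, horizon):
    """Kaplan-Meier (product-limit) estimate of survival at `horizon`.

    times  -- follow-up time per patient (e.g., months to death or censoring)
    events -- 1 if death was observed at that time, 0 if censored
    """
    surv = 1.0
    # distinct observed death times up to the horizon, in increasing order
    death_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def km_by_tertile(scores, times, events, horizon=12.0):
    """Split patients into score tertiles and estimate survival at `horizon` in each.

    Hypothetical helper: cut points are the 33rd and 67th percentiles of `scores`.
    """
    low_cut, high_cut = quantiles(scores, n=3)
    groups = {"low": ([], []), "intermediate": ([], []), "high": ([], [])}
    for score, t, e in zip(scores, times, events):
        if score <= low_cut:
            key = "low"
        elif score <= high_cut:
            key = "intermediate"
        else:
            key = "high"
        groups[key][0].append(t)
        groups[key][1].append(e)
    return {k: km_survival_at(ts, es, horizon) for k, (ts, es) in groups.items()}
```

Censored patients leave the risk set without counting as deaths, which is why the product-limit estimate is used rather than a simple proportion of survivors.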
Discussion
This US-based, multisite, prospective, observational, longitudinal study investigated the prognostic impact of 13 health variables on the risk of NRM among patients aged ≥60 years who received allo-HCT. We built and internally validated a novel, comprehensive prognostic measure of NRM risk for this population. Age, HCT-CI, albumin, CRP, weight loss, patient-rated KPS, and cognition by MoCA were the 7 variables that constituted the primary-CHARM model. The first 6 of these are readily available in the clinic. Administering the MoCA requires additional effort (10 minutes); however, screening for cognitive impairment not only holds value but is also advised by national guidelines for older adult cancer care.63 Encouragingly, in multivariate analysis, the primary-CHARM score and disease risk as assessed by the refined DRI were the only factors prognostic for OS. This fits our study framework of quantifying patient vulnerability to gauge NRM separately from disease risk tools. An online primary-CHARM calculator, which provides the risk of 1-year NRM based on the primary-CHARM score, is available on the CIBMTR website (https://cibmtr.org/CIBMTR/OffNav/DevSandbox/CHARM-Risk-NRM-Calculator).
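How the online calculator converts a score into a 1-year NRM risk is not described here. As a hedged sketch only: under a Fine-Gray model, a linear predictor x maps to the cumulative incidence at a fixed horizon t through F(t | x) = 1 - (1 - F0(t))^exp(βx), where F0(t) is the baseline cumulative incidence. The Python below assumes the primary-CHARM score enters as such a linear predictor; the function name, the baseline incidence, and the coefficient are hypothetical placeholders, not study estimates.

```python
import math


def fine_gray_cif(score, baseline_cif, log_shr):
    """Cumulative incidence at a fixed horizon under a Fine-Gray model.

    score        -- linear predictor for the patient (assumed: the risk score itself)
    baseline_cif -- hypothetical cumulative incidence at the horizon when score == 0
    log_shr      -- hypothetical log subdistribution hazard ratio per unit of score
    Implements F(t | x) = 1 - (1 - F0(t)) ** exp(beta * x).
    """
    if not 0.0 <= baseline_cif < 1.0:
        raise ValueError("baseline_cif must lie in [0, 1)")
    return 1.0 - (1.0 - baseline_cif) ** math.exp(log_shr * score)
```

For example, `fine_gray_cif(score=0.5, baseline_cif=0.10, log_shr=1.0)` uses invented inputs; with a positive coefficient, predicted risk rises monotonically with the score and always stays below 1.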
Our results highlight several important points: (1) given the high completion rates across 49 centers, comprehensive health assessment merging patient-reported and objective parameters is feasible (supplemental Table 6); (2) prognostic assessment of HCT outcomes is better served by multidimensional tools than by a single health domain64; (3) health assessment measures outperform subjective physician prognostication,64 justifying the effort required to collect this information in the clinic; (4) nutritional and inflammatory biomarkers, captured by weight loss, serum albumin, and CRP, influence outcomes65; (5) impaired cognition, as a feature of physiologic aging, can adversely affect transplant outcomes; (6) the HCT-CI performed well in this older patient population in the modern era, being selected by both the Fine-Gray model and Shapley values as one of the most important predictors of NRM, contrary to results from retrospective analyses66; and, last, (7) older age and patient-reported KPS are prognostic of NRM even after accounting for multiple other objective measures. In addition, we built 2 exploratory ML models, which showed that primary-CHARM performs at least comparably and provided confidence in the completeness of our model development approach.
Our study has the advantage of describing data from a large number (n = 1105) of patients treated at many (n = 49) transplant centers across the United States with broad eligibility criteria, which increases the generalizability of the results. It includes a comprehensive assessment of geriatric syndromes, comorbidities, and readily available biomarkers to maximize the chance of capturing relevant outcome predictors. The identification of older adults with low NRM risk (ie, the lowest tertile, with 8.1% NRM and 81.2% OS at 1 year) should strongly encourage consideration of HCT when there is an appropriate disease indication, reducing the current bias against offering allo-HCT to older patients. This will increase the chance of cure in the patient population most frequently affected by hematologic malignancies. Conversely, identification of patients in the highest tertile of primary-CHARM scores, with 1-year NRM of 23.3% and OS of 59.6%, sets the stage for future trials exploring novel approaches to reduce the morbidity and mortality of HCT. Of note, most CHARM variables are meant to reflect patients’ health status within the 2 to 3 weeks immediately before transplant. This is thought to be the most appropriate timeframe to evaluate transplant eligibility, while the primary cancer is under appropriate control or stable.
The 1-year NRM of 14.4% in this study is, to our knowledge, the lowest reported for older (aged 60-82 years) HCT recipients, and the 1-year OS of 71.7% is the highest. This reinforces the improvement in transplant outcomes reported previously.4,67 Reasons include enhancements in supportive care, better methods of preventing GVHD,68 and better-tolerated induction therapies and conditioning regimens.69 Our findings are applicable to modern-day HCT practice nationwide, with 21% of patients receiving an HLA-haploidentical donor graft and 39% receiving post-transplant cyclophosphamide for GVHD prophylaxis.
The aim of this study was to create a model, designed and validated in older recipients of allogeneic HCT, that transplant physicians can use to counsel patients about the risks and benefits of HCT. Future trials could incorporate modifications to conditioning regimens or GVHD prophylaxis based on the CHARM score. We recognize that the model was not developed to decide whether a patient should receive allogeneic HCT. Such a model would require a different study design, in which data are collected soon after remission of the primary cancer is achieved and before the decision whether to refer a patient for allogeneic HCT. Nonetheless, investigating the value of CHARM scores earlier in patients’ treatment history, ideally months before potential HCT, would be of great future interest.
We acknowledge that the cross-validation bias–corrected AUC of 0.591 for primary-CHARM was modest and that there is room for further improvement in predictive accuracy. We likewise found modest AUCs for the HCT-CI and for the more sophisticated ML-CHARM models. The AUC, as a measure of a model's discriminative capacity, has its own limitations.70 Reassuringly, the other assessments currently recommended by the TRIPOD guidelines (Brier scores and DCA)57-59 showed better predictive performance for primary-CHARM than for the current standard, the HCT-CI. Moreover, primary-CHARM strongly predicted OS, and its calibration was good (supplemental Figure 3; supplemental Table 10). Removing the HCT-CI from CHARM still yielded an acceptable apparent AUC (ie, without bias correction) of 0.607. Our results also suggest that primary-CHARM performs at least comparably to the 2 exploratory ML-CHARM models. Of note, the bootstrap bias–corrected AUC was higher for ML-CHARM2, probably reflecting an unusually high apparent AUC caused by overfitting of the ML model. DCA is a more recent technique for comparing models in terms of clinical utility. DCA suggests that primary-CHARM yields a higher net benefit than the HCT-CI alone, relative to “treat all” or “treat none” approaches, across a wide range of threshold probabilities of 1-year NRM.
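For readers less familiar with the measures named above, the Python sketch below computes the DCA net benefit at a threshold probability p_t, NB = TP/n - (FP/n) × p_t/(1 - p_t), the “treat all” reference (the “treat none” reference is always 0), and the Brier score. It assumes every patient has a known binary 1-year NRM status; time-to-event data with censoring and competing risks, as in this study, require adjusted estimators that this sketch does not implement. The function names are hypothetical.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    n = len(outcomes)
    flagged = [(r >= threshold, y) for r, y in zip(risks, outcomes)]
    tp = sum(1 for f, y in flagged if f and y == 1)
    fp = sum(1 for f, y in flagged if f and y == 0)
    # false positives are weighted by the odds of the threshold probability
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy that flags everyone ('treat none' always scores 0)."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def brier_score(risks, outcomes):
    """Mean squared difference between predicted risk and observed 0/1 outcome."""
    return sum((r - y) ** 2 for r, y in zip(risks, outcomes)) / len(outcomes)
```

In a decision curve, `net_benefit` is evaluated over a grid of thresholds and plotted against `net_benefit_treat_all` and the zero line; a model adds value at thresholds where its curve lies above both. A lower Brier score indicates better overall accuracy of the predicted probabilities.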
We recognize study limitations. Given the large patient sample needed to develop the model and the declining NRM (and thus fewer events), a parallel external validation cohort was impractical and likely inefficient. However, bootstrap-corrected internal validation and cross-validation are widely accepted approaches in epidemiology,71 and both were robust across multiple sensitivity analyses, thereby confirming the value of CHARM.71 Furthermore, unlike biomarker or tumor marker discovery studies, for which an external validation sample may be necessary,72 primary-CHARM, as a model of established health factors, follows TRIPOD guidelines that support internal validation of a model in the population in which it is intended to be used.20 Adoption of CHARM in practice should enable future validation, similar to the real-world validation of the HCT-CI.26 The contribution of patients from under-represented minority groups was modest in this study despite broad eligibility and support for 3 languages; however, the distribution of race and ethnicity was similar to that of US allo-HCT recipients aged ≥60 years in CIBMTR data. This suggests that diminished access to HCT in general may be the main factor contributing to the low numbers of minority patients enrolled in trials. Studying barriers to clinical trial access is an important future goal of the BMT-CTN. Only then can we know how well a model such as CHARM captures risks across different races, ethnicities, and languages. Finally, the study was conducted in US centers; whether CHARM applies equally well in other countries should be tested.
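The bootstrap-corrected internal validation mentioned above can be sketched as a Harrell-style optimism correction, assumed here for illustration; the study's exact resampling scheme and its cross-validation are not reproduced. The apparent AUC is reduced by the average optimism, defined as the AUC of each bootstrap-refitted model on its own bootstrap sample minus its AUC on the original data. The "model" is a deliberately simple, hypothetical one that selects the single predictor with the highest AUC, and censoring is ignored.

```python
import random


def auc(scores, labels):
    """Concordance (Mann-Whitney) AUC; tied pairs count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def fit_best_single_predictor(X, y):
    """Toy 'model fitting': pick the column with the highest AUC.

    This data-driven selection step is what makes the apparent AUC optimistic.
    """
    return max(range(len(X[0])), key=lambda j: auc([row[j] for row in X], y))


def optimism_corrected_auc(X, y, n_boot=200, seed=1):
    """Return (apparent AUC, bootstrap optimism-corrected AUC)."""
    rng = random.Random(seed)
    n = len(y)
    j = fit_best_single_predictor(X, y)
    apparent = auc([row[j] for row in X], y)
    optimisms = []
    while len(optimisms) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:  # resample lacks one class; draw again
            continue
        Xb = [X[i] for i in idx]
        jb = fit_best_single_predictor(Xb, yb)                 # refit on bootstrap sample
        auc_boot = auc([row[jb] for row in Xb], yb)            # performance where it was fit
        auc_orig = auc([row[jb] for row in X], y)              # same model on original data
        optimisms.append(auc_boot - auc_orig)
    return apparent, apparent - sum(optimisms) / n_boot
```

With a real model, `fit_best_single_predictor` would be replaced by the full model-building procedure, including any variable selection, so that the correction reflects every data-driven step.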
In summary, in a first-of-its-kind study, we prospectively designed and validated an easily implemented composite health risk assessment model, inclusive of geriatric assessment, biomarkers, patient-reported outcomes, and comorbidities, to better risk stratify older recipients of allo-HCT. CHARM should improve decision-making and the selection of the best transplant strategy by weighing risks against benefits, allow calibration of data across trials and institutions, and ensure that appropriate older patients are not excluded from curative-intent allo-HCT. Intervention trials focusing on comorbidities, nutritional deficiencies, inflammation, and impaired cognition could further improve transplant outcomes. Future efforts to improve prognostic capacity may require artificial-intelligence modeling integrating broader clinical, sociodemographic, and/or biologic data.
Acknowledgments
Support for this study was provided by grants U10HL069294 and U24HL138660 to the Blood and Marrow Transplant Clinical Trials Network from the National Heart, Lung, and Blood Institute and the National Cancer Institute.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Authorship
Contribution: A.S.A. and M.L.S. initiated the conception of the study; A.S.A., B.L., M.L.S., M.M.H., and W.S. contributed to study design; all authors contributed to acquisition of data; A.B., A.S.A., B.L., M.L.S., M.M.H., and N.G. contributed to data analysis; A.B., A.S.A., B.L., J.M.M., M.L.S., M.M.H., N.G., and W.W. contributed to interpretation of data; M.L.S. drafted the article; A.S.A. contributed significantly to article drafting; A.S.A., B.L., J.M., M.M.H., N.G., R.O., S.M.D., S.R.M., S.A.W., W.S., and V.R.B. contributed to critical revision of important intellectual content; all authors provided final approval of the article; and A.S.A., B.L., and M.L.S. agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the article are appropriately investigated and resolved.
Conflict-of-interest disclosure: M.L.S. reports consultancy for and honoraria from Jazz Pharmaceuticals for giving educational talks, and research funding from BlueNote. W.W. reports receiving research support from Pfizer and Genentech; having equity and providing consulting to Koneksa Health; and providing consulting for Teladoc Health, Quantum Health, and American Society of Hematology Research Collaborative. A.M. reports receiving grant support from Novartis. P.H.I. reports receiving research support from Janssen. V.R.B. reports participating in the Safety Monitoring Committee for Protagonist; serving as an Associate Editor for the journal, Current Problems in Cancer; serving as a contributor for BMJ Best Practice; providing consultancy for Imugene, Sanofi, and Taiho; receiving research support from MEI Pharma, Actinium Pharmaceutical, Sanofi U.S. Services, AbbVie, Pfizer, Incyte, Jazz, and National Marrow Donor Program; and receiving drug support (institutional) from Chimerix for a trial. R.O. reports receiving research support from Cellectis and providing consulting for Servier and Rigel. J.M. reports receiving research support from Gilead, Atara, CRISPR, Precision Biosciences, Scripps Research Institute, VOR Bio, and Affimed. S.A.W. reports serving on the speaker bureau for Sobi. A.S.A. reports having an advisory role for AstraZeneca and Magenta Therapeutics and providing consulting to AbbVie. The remaining authors declare no competing financial interests.
Correspondence: Mohamed L. Sorror, Clinical Research Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave North, Seattle, WA 98109-1024; email: msorror@fredhutch.org.
References
Author notes
This is not an interventional clinical trial. There is not a specific data-sharing plan for the study because the requirements by the International Committee of Medical Journal Editors are not applicable. However, all Blood and Marrow Transplant Clinical Trials Network trial data are deposited in the BioLINCC within 30 days of publication of the primary manuscript and made publicly available according to BioLINCC processes.
The full-text version of this article contains a data supplement.