• A Manhattan risk model based solely on clinical symptoms identifies a low-risk group suitable for testing treatment-minimization strategies.

• MAGIC composite scores using both clinical and biomarker parameters further enlarge the low-risk group and most accurately predict outcomes.

Abstract

Acute graft-versus-host disease (GVHD) grading systems that use only clinical symptoms at treatment initiation, such as the Minnesota risk system, identify standard- and high-risk categories but lack a low-risk category suitable for testing strategies that minimize immunosuppression. We developed a new grading system that includes a low-risk stratum based on clinical symptoms alone and determined whether incorporating biomarkers would improve the model's prognostic accuracy. We randomly divided 1863 patients in the Mount Sinai Acute GVHD International Consortium (MAGIC) who were treated for GVHD into training and validation cohorts. Patients in the training cohort were divided into 14 groups with similar clinical symptoms and similar nonrelapse mortality (NRM); we then used a classification and regression tree (CART) algorithm to create 3 Manhattan risk groups, which produced a significantly higher area under the receiver operating characteristic curve (AUC) for 6-month NRM than the Minnesota risk classification in the validation cohort (0.69 vs 0.64, P = .009). Using patients with available serum samples, we integrated serum GVHD biomarker scores with Manhattan risk and again used a CART algorithm to establish 3 MAGIC composite scores that significantly improved prediction of NRM compared with Manhattan risk (AUC, 0.76 vs 0.70, P = .010). Each increase in MAGIC composite score also corresponded to a significant decrease in day 28 treatment response (80% vs 63% vs 30%, P < .001). We conclude that the MAGIC composite score predicts response to therapy and long-term outcomes more accurately than systems based on clinical symptoms alone and may help guide clinical decisions and trial design.

Acute graft-versus-host disease (GVHD) remains a substantial cause of morbidity and nonrelapse mortality (NRM) and a major obstacle to successful outcomes after allogeneic hematopoietic cell transplantation (HCT) despite advances in prophylaxis.1-7 High doses of systemic steroids are used as first-line treatment for acute GVHD,8-10 but ∼30% of patients develop steroid-refractory GVHD and experience poor outcomes.1,11-15 The long-term outcomes of patients who initially respond to steroid therapy can vary and be complicated by GVHD flares.16 Thus, steroid treatment courses tend to be long, resulting in significant morbidities including increased infection risk.17-20 Treatment for GVHD may thus lead to both undertreatment of some patients and overtreatment of others.

The maximum severity of acute GVHD correlates with survival outcomes,21-24 but it can only be determined retrospectively and therefore cannot be used to guide treatment in real time. The Minnesota risk system, the only validated risk stratification modeled on GVHD symptoms at the initiation of treatment, has 2 strata (standard and high),25,26 but lacks the low-risk stratum necessary for treatment minimization. Several groups, including ours, have reported that GVHD biomarkers predict outcomes independently of clinical parameters.2,16,27-35 The Mount Sinai Acute GVHD International Consortium (MAGIC) has validated the MAGIC algorithm probability (MAP), a single value that incorporates the weighted serum concentrations of 2 biomarkers: suppression of tumorigenicity 2 (ST2) and regenerating islet–derived protein 3-α (REG3α). The MAP can be considered a liquid biopsy of GVHD damage to intestinal crypts36 and accurately predicts long-term outcomes before, during, and after therapy for acute GVHD.2,16,27,28,30,31,37 No studies have validated a model integrating both clinical and laboratory parameters at treatment onset. We hypothesized that combining clinical and biomarker values could create 3 separate acute GVHD grades with distinct prognoses. We used the MAGIC database and biorepository to develop and validate a grading system with 3 strata based solely on clinical symptoms and then developed new MAGIC composite scores that integrate both clinical and biomarker parameters with improved prognostic accuracy.

Patient selection

We obtained clinical data and serum samples from the MAGIC database and biorepository, which encompasses 23 HCT centers in North America, Europe, and Asia. Participating centers collected clinical information that focused on acute GVHD using a prospective-specimen-collection, retrospective-blinded-evaluation (PRoBE) study design,38,39 and provided longitudinal serum samples. Patients were prospectively monitored weekly for acute GVHD symptoms according to institutional frequency. Informed consent was obtained from all participants in accordance with the Declaration of Helsinki under an institutional review board–approved protocol.

We included both pediatric and adult patients who received a first HCT between 2014 and 2021 and who received systemic treatment for acute GVHD. We excluded patients who developed primary relapse of malignancy or who received donor lymphocyte infusion or a second HCT before systemic GVHD treatment. Acute GVHD was diagnosed and staged according to published criteria.39 Minnesota risk, HCT-specific comorbidity index scores, intensity of conditioning regimens, and disease risk were classified as previously reported.2,25,40,41 A complete response (CR) was defined as complete resolution of acute GVHD manifestations without secondary treatment. A partial response (PR) was defined as a decrease of at least 1 organ stage without worsening in other organs and without the need for secondary treatment, provided that the improvement was less than a CR.11 Overall response rate (ORR) was defined as CR or PR at day 28 after the start of systemic treatment.
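The response definitions above can be expressed as a small decision procedure. The sketch below is illustrative only: the function names, the dictionary layout of organ stages (integers 0 to 4, with UGI as 0 or 1), and the "NR" (no response) label are assumptions, not part of the study protocol.

```python
"""Hypothetical sketch of the day 28 response definitions (CR, PR, ORR)."""
from typing import Dict

ORGANS = ("skin", "liver", "ugi", "lgi")  # assumed organ keys


def classify_response(baseline: Dict[str, int], day28: Dict[str, int],
                      secondary_treatment: bool) -> str:
    """Return 'CR', 'PR', or 'NR' at day 28 from organ stages."""
    if secondary_treatment:
        # Both CR and PR require that no secondary treatment was needed.
        return "NR"
    if all(day28[o] == 0 for o in ORGANS):
        # Complete resolution of all acute GVHD manifestations.
        return "CR"
    improved = any(day28[o] < baseline[o] for o in ORGANS)
    worsened = any(day28[o] > baseline[o] for o in ORGANS)
    # PR: at least 1 organ improved by a stage and no other organ worsened.
    if improved and not worsened:
        return "PR"
    return "NR"


def is_overall_response(label: str) -> bool:
    """ORR counts patients with either CR or PR at day 28."""
    return label in ("CR", "PR")
```

For example, a patient starting with stage III skin and stage I LGI GVHD who has stage I skin and unchanged stage I LGI GVHD at day 28, without secondary treatment, is classified as PR.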

Serum samples

Serial serum samples were collected prospectively, cryopreserved, and shipped to a central laboratory. Serum concentrations of ST242 and REG3α43 were measured by enzyme-linked immunosorbent assay, as previously reported. The MAP was calculated as a single value between 0.001 and 0.999 according to the formula log[−log(1 − MAP)] = −11.263 + 1.844 × log10(ST2) + 0.577 × log10(REG3α).28,30 We calculated Ann Arbor (AA) scores using previously validated MAP thresholds (AA1, MAP < 0.141; AA2, 0.141 ≤ MAP < 0.291; AA3, MAP ≥ 0.291).2,16,28,44
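The MAP formula and AA thresholds above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the outer logarithms in log[−log(1 − MAP)] are taken to be natural logarithms (a complementary log-log link), concentrations are assumed to be in the units used by the original assays, and clipping to 0.001 to 0.999 is our reading of the stated range. The function names are hypothetical.

```python
import math

# Ann Arbor thresholds on the MAP scale, as stated in the text.
AA1_UPPER = 0.141
AA2_UPPER = 0.291


def map_probability(st2: float, reg3a: float) -> float:
    """Hypothetical MAP calculation from ST2 and REG3alpha concentrations.

    Assumes natural logs on the left-hand side (complementary log-log) and
    clips the result to the 0.001-0.999 range reported in the text.
    """
    eta = -11.263 + 1.844 * math.log10(st2) + 0.577 * math.log10(reg3a)
    # Invert log(-log(1 - p)) = eta  =>  p = 1 - exp(-exp(eta)).
    p = 1.0 - math.exp(-math.exp(eta))
    return min(max(p, 0.001), 0.999)


def ann_arbor_score(map_value: float) -> int:
    """Map a MAP value onto Ann Arbor score 1, 2, or 3."""
    if map_value < AA1_UPPER:
        return 1
    if map_value < AA2_UPPER:
        return 2
    return 3
```

Because both coefficients are positive, a higher concentration of either biomarker always yields a higher MAP and therefore an equal or higher AA score.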

Statistical analysis

The start of systemic treatment served as time zero in all analyses. The primary end point was 6-month NRM, and outcomes were censored at 6 months. We estimated and plotted the cumulative incidence of NRM according to the Gray method, treating relapse and second allogeneic HCT as competing risks. We used the Kaplan-Meier method and the log-rank test to estimate and compare overall survival (OS). We compared categorical variables using the Fisher exact test and continuous variables using the Mann-Whitney U test. We used the area under the receiver operating characteristic curve (AUC) and the DeLong test to compare the prognostic value of the different models. The ΔAUC and its 95% confidence interval (CI) were calculated from 1000 bootstrap resamples.32,45
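The bootstrap ΔAUC described above can be illustrated with a paired percentile bootstrap. This is a simplified sketch, not the authors' R code: it assumes a binary 6-month NRM label per patient (ignoring censoring and competing risks), treats each model's risk stratum (for example 1 to 3) as its score, and uses hypothetical function names.

```python
import random
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the Mann-Whitney probability; tied scores count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_delta_auc(score_a: Sequence[float], score_b: Sequence[float],
                        labels: Sequence[int], n_boot: int = 1000,
                        seed: int = 1) -> Tuple[float, Tuple[float, float]]:
    """Observed AUC(a) - AUC(b) with a paired 95% percentile bootstrap CI."""
    rng = random.Random(seed)
    m = len(labels)
    observed = auc(score_a, labels) - auc(score_b, labels)
    deltas: List[float] = []
    while len(deltas) < n_boot:
        # Resample patients with replacement; both models see the same sample.
        idx = [rng.randrange(m) for _ in range(m)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < m:  # skip resamples lacking events or non-events
            deltas.append(auc([score_a[i] for i in idx], ys)
                          - auc([score_b[i] for i in idx], ys))
    deltas.sort()
    lo = deltas[(25 * n_boot) // 1000]
    hi = deltas[(975 * n_boot) // 1000 - 1]
    return observed, (lo, hi)
```

Resampling patients rather than scores keeps the two models paired, which is what makes the interval a CI for the difference rather than for each AUC separately.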

We developed algorithms to predict 6-month NRM as follows. First, we randomly divided patients into training and validation cohorts in a 7:3 ratio, aiming to maximize both the size of the validation cohort and the representation of patients with uncommon clinical presentations in the training cohort.46 Second, we created groups with a minimum of 20 patients per group according to clinical similarities at the time of treatment and in 6-month NRM. Third, we used a classification and regression tree (CART) algorithm47 to create 3 groups according to the risk of 6-month NRM after treatment onset. The criteria to separate groups included a maximum depth of 2 levels with a complexity parameter of 0.2 and at least 30 observations in each terminal node.48 We also applied a K-means approach with the Lloyd algorithm as a sensitivity analysis for the accuracy of aggregation.49 The performance of each model was then evaluated in the validation cohort.
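The K-means sensitivity analysis with the Lloyd algorithm can be sketched as follows. The text does not state which features entered the clustering, so this sketch assumes a single feature (each category's 6-month NRM) weighted by category size, with a deterministic spread-out initialization; the function name and these choices are assumptions. Lloyd's algorithm alternates an assignment step and an update step and converges to a local, not necessarily global, optimum.

```python
from typing import List, Sequence, Tuple


def lloyd_kmeans_1d(values: Sequence[float], weights: Sequence[float],
                    k: int = 3, max_iter: int = 100) -> Tuple[List[int], List[float]]:
    """Weighted Lloyd k-means on one feature (k >= 2).

    Returns cluster labels 0..k-1 ordered by center (low, mid, high) and the
    sorted cluster centers.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Deterministic start: spread initial centers across the sorted values.
    centers = [values[order[(j * (len(values) - 1)) // (k - 1)]] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the weighted mean of its members.
        for c in range(k):
            members = [i for i, lab in enumerate(new_labels) if lab == c]
            if members:
                total = sum(weights[i] for i in members)
                centers[c] = sum(values[i] * weights[i] for i in members) / total
        if new_labels == labels:
            break
        labels = new_labels
    # Relabel clusters in increasing order of center.
    rank = {c: r for r, c in enumerate(sorted(range(k), key=lambda c: centers[c]))}
    return [rank[lab] for lab in labels], sorted(centers)
```

In this framing, agreement between the K-means labels and the CART groups is what the sensitivity analysis checks.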

All statistical tests were 2-sided, and a P value <.05 was considered statistically significant. Statistical analyses were performed with R version 4.2.2 (R Foundation for Statistical Computing, Vienna, Austria) or EZR version 1.61 (Jichi Medical University Saitama Medical Center, Saitama, Japan).50 

Patient characteristics

We randomly divided 1863 patients who fulfilled all the inclusion criteria into training (n = 1306) and validation (n = 557) cohorts (supplemental Figure 1, available on the Blood website). There were no significant differences in baseline characteristics between cohorts except for donor source (Table 1). Severity of GVHD and organ involvement at the time of treatment were also similar between the training and validation cohorts (supplemental Table 1). Treatment practices varied in this real-world study; notably, >20% of patients treated for grade 1 or 2 acute GVHD received low-dose systemic steroids (<0.5 mg/kg methylprednisolone) (supplemental Table 2). The median follow-up of survivors after treatment initiation was 22 months (range, 1-58) in the training cohort and 23 months (range, 1-37) in the validation cohort.

Table 1.

Patient characteristics

| Characteristic | Training (n = 1306) | Validation (n = 557) | P value |
|---|---|---|---|
| Median age at HCT, y (range) | 56 (0-79) | 54 (0-79) | .206 |
| Recipient age category, y | | | .077 |
| <18 | 146 (11.2) | 83 (14.9) | |
| 18-54 | 474 (36.3) | 199 (35.7) | |
| ≥55 | 686 (52.5) | 275 (49.4) | |
| Sex mismatch | | | .349 |
| Female to male | 219 (16.8) | 103 (18.6) | |
| Other | 1087 (83.2) | 452 (81.4) | |
| Race | | | .362 |
| White | 1108 (84.8) | 462 (82.9) | |
| Black | 70 (5.4) | 25 (4.5) | |
| Asian | 46 (3.5) | 25 (4.5) | |
| Others | 6 (0.5) | 5 (0.9) | |
| Unknown | 76 (5.8) | 40 (7.2) | |
| Primary disease | | | .349 |
| Acute leukemia | 677 (51.8) | 299 (53.7) | |
| MDS/MPN | 341 (26.1) | 147 (26.4) | |
| Malignant lymphoma | 117 (9.0) | 36 (6.5) | |
| Other | 171 (13.1) | 75 (13.5) | |
| Disease risk | | | .058 |
| Standard | 1059 (81.1) | 430 (77.2) | |
| High | 247 (18.9) | 127 (22.8) | |
| Donor type | | | .058 |
| HLA matched related | 267 (20.4) | 111 (19.9) | |
| HLA matched unrelated | 714 (54.7) | 277 (49.7) | |
| HLA mismatched related | 8 (0.6) | 4 (0.7) | |
| HLA mismatched unrelated | 122 (9.3) | 65 (11.7) | |
| Haploidentical | 148 (11.3) | 65 (11.7) | |
| Umbilical cord blood | 47 (3.6) | 35 (6.3) | |
| GVHD prophylaxis | | | .744 |
| CNI and MTX based | 691 (52.9) | 292 (52.4) | |
| CNI and MMF based | 305 (23.4) | 143 (25.7) | |
| PTCy | 219 (16.8) | 82 (14.7) | |
| Ex vivo T-cell depletion | 38 (2.9) | 17 (3.1) | |
| Other | 53 (4.1) | 23 (4.1) | |
| HCT-CI | | | .706 |
| 0-2 | 884 (67.7) | 372 (66.8) | |
| ≥3 | 422 (32.3) | 185 (33.2) | |
| In vivo T-cell depletion | | | .795 |
| No | 809 (61.9) | 341 (61.2) | |
| Yes | 497 (38.1) | 216 (38.8) | |
| Donor source | | | .020 |
| Bone marrow | 252 (19.3) | 117 (21.0) | |
| Peripheral blood | 1007 (77.1) | 405 (72.7) | |
| Umbilical cord blood | 47 (3.6) | 35 (6.3) | |
| Conditioning | | | .923 |
| MAC (TBI <8 Gy) | 532 (40.7) | 226 (40.6) | |
| MAC (TBI ≥8 Gy) | 204 (15.6) | 91 (16.3) | |
| RIC | 570 (43.6) | 240 (43.1) | |
| Sample available at Tx | | | .168 |
| No | 256 (19.6) | 125 (22.4) | |
| Yes | 1050 (80.4) | 432 (77.6) | |
| Median year of HCT (range) | 2018 (2014-2021) | 2017 (2014-2021) | .117 |

Values are n (%) unless otherwise indicated. CNI, calcineurin inhibitor; HCT-CI, hematopoietic cell transplantation–specific comorbidity index; MAC, myeloablative conditioning; MDS/MPN, myelodysplastic syndromes/myeloproliferative neoplasms; MMF, mycophenolate mofetil; MTX, methotrexate; PTCy, posttransplant cyclophosphamide; RIC, reduced-intensity conditioning; TBI, total body irradiation; Tx, treatment.

Manhattan risk system

We first sorted all 76 distinct combinations of GVHD target organ severity that included at least 1 case in the training cohort (supplemental Table 3) into 24 groups based on similarities in individual organ severity at the time of treatment (Table 2). We then merged groups with similar clinical characteristics and similar 6-month NRM into 14 categories of at least 20 patients each (Table 2). Using the CART algorithm, we further reduced these to 3 categories (low, intermediate, and high risk), which we termed the Manhattan risk model. A sensitivity analysis using an unsupervised K-means clustering algorithm assigned every category to the same risk group as CART, confirming the accuracy of aggregation (Table 2).

Table 2.

GVHD organ involvement categories in the training cohort

| Organ involvement | n (24 categories) | 6-mo NRM, % (24 categories) | n (14 categories) | 6-mo NRM, % (14 categories) | Day 28 ORR, % | Glucksberg | Minnesota risk | Manhattan risk (CART) | Manhattan risk (K-means) |
|---|---|---|---|---|---|---|---|---|---|
| Isolated stage I skin | 135 | 5.3 | 135 | 5.3 | 63.0 | Grade 1 | Standard | Low | Low |
| Isolated stage II skin | 259 | 7.8 | 259 | 7.8 | 76.1 | Grade 1 | Standard | Low | Low |
| Isolated UGI | 112 | 8.1 | 112 | 8.1 | 73.2 | Grade 2 | Standard | Low | Low |
| Stage I skin + UGI | 35 | 5.7 | 35 | 5.7 | 77.1 | Grade 2 | Standard | Low | Low |
| Stage II skin + UGI | 33 | 21.2 | 33 | 21.2 | 81.8 | Grade 2 | Standard | Intermediate | Intermediate |
| Stage I LGI ± UGI | 151 | 14.0 | 151 | 14.0 | 66.2 | Grade 2 | Standard | Intermediate | Intermediate |
| Stage I skin + stage I LGI ± UGI | 47 | 14.9 | 83 | 14.6 | 71.1 | Grade 2 | Standard | Intermediate | Intermediate |
| Stage II skin + stage I LGI ± UGI | 20 | 14.8 | | | | Grade 2 | | | |
| Stage III skin + stage I LGI ± UGI | 16 | 12.5 | | | | Grade 2 | | | |
| Stage III skin ± UGI | 222 | 11.8 | 222 | 11.8 | 72.5 | Grade 2 | Standard | Intermediate | Intermediate |
| Stage II LGI ± UGI | 63 | 15.9 | 63 | 15.9 | 58.7 | Grade 3 | Standard | Intermediate | Intermediate |
| Stage I liver ± other organ involvement | 20 | 35.5 | 59 | 46.6 | 40.7 | Grade 2-4 | Standard/High | High | High |
| Stage II liver ± other organ involvement | 23 | 37.8 | | | | Grade 3-4 | Standard/High | High | High |
| Stage III liver ± other organ involvement | 12 | 73.1 | | | | Grade 3-4 | Standard/High | High | High |
| Stage IV liver ± other organ involvement | 4 | 75.0 | | | | Grade 4 | Standard/High | High | High |
| Stage I skin + stage II LGI ± UGI | 15 | 26.7 | 34 | 29.4 | 64.7 | Grade 3 | High | High | High |
| Stage II skin + stage II LGI ± UGI | 10 | 40.0 | | | | Grade 3 | | | |
| Stage III skin + stage II LGI ± UGI | 9 | 22.2 | | | | Grade 3 | | | |
| Stage III LGI ± UGI | 53 | 34.3 | 53 | 34.3 | 43.4 | Grade 3 | High | High | High |
| Stage I skin + stage III LGI ± UGI | | 33.3 | 21 | 49.1 | 52.4 | Grade 3 | High | High | High |
| Stage II skin + stage III LGI ± UGI | | 65.7 | | | | Grade 3 | | | |
| Stage III skin + stage III LGI ± UGI | | 60.0 | | | | Grade 3 | | | |
| Stage IV skin ± LGI ± UGI | 8 | 25.0 | 46 | 32.6 | 47.8 | Grade 4 | High | High | High |
| Stage IV LGI ± skin ± UGI | 38 | 34.2 | | | | Grade 3 | | | |

UGI, upper gastrointestinal; LGI, lower gastrointestinal.

Manhattan risk differed from Minnesota risk in 2 important subsets. First, approximately half of Minnesota standard-risk patients were reclassified as low risk; these patients had isolated stage I or II skin, isolated upper gastrointestinal (UGI), or stage I skin plus UGI GVHD. Second, whereas the Minnesota criteria classify patients who have liver GVHD with stage I to III skin GVHD as standard risk, the Manhattan risk system classifies patients with any liver involvement as high risk.
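The Manhattan risk assignments in Table 2 can be restated as a few ordered rules. The sketch below is our reading of Table 2 only; it is not the authors' public calculator, which additionally provides specific guidance for staging gastrointestinal symptoms. The function name and argument conventions (stages 0 to 4, UGI present or absent) are assumptions.

```python
def manhattan_risk(skin: int, liver: int, ugi: bool, lgi: int) -> str:
    """Hypothetical helper: Manhattan risk from organ stages at treatment onset.

    Rules are read off Table 2; stages run from 0 (none) to 4.
    """
    for name, stage in (("skin", skin), ("liver", liver), ("lgi", lgi)):
        if not 0 <= stage <= 4:
            raise ValueError(f"{name} stage must be 0-4, got {stage}")
    if not (skin or liver or ugi or lgi):
        raise ValueError("at least one organ must be involved")
    # High risk: any liver involvement, stage IV skin, stage III-IV LGI,
    # or stage II LGI combined with any skin involvement.
    if liver >= 1 or skin == 4 or lgi >= 3 or (lgi == 2 and skin >= 1):
        return "high"
    # Intermediate risk: remaining LGI involvement (stage I with or without
    # skin, or stage II without skin), stage III skin, or stage II skin + UGI.
    if lgi >= 1 or skin == 3 or (skin == 2 and ugi):
        return "intermediate"
    # Low risk: isolated stage I-II skin, isolated UGI, or stage I skin + UGI.
    return "low"
```

For example, `manhattan_risk(2, 0, True, 0)` returns `"intermediate"`, reflecting the higher NRM of stage II skin plus UGI GVHD in Table 2, whereas `manhattan_risk(1, 0, True, 0)` returns `"low"`.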

The Glucksberg classification (grades 1/2 vs 3/4)22 and a recently proposed principal component–derived grading system24 had AUCs similar to that of the Minnesota risk system25 for the prediction of 6-month NRM (supplemental Figure 2). In the validation cohort (supplemental Table 4), the AUC of the Manhattan model for 6-month NRM was significantly higher than that of the Minnesota model (0.69 vs 0.64; P = .009; ΔAUC, 0.057 [95% CI, 0.016-0.101]) (supplemental Figure 3). The Manhattan risk model did not predict relapse; thus, differences in OS between groups were driven by differences in NRM (supplemental Figure 4). The Manhattan model classified 40% of patients as low risk, and the 3 Manhattan strata had distinctly different 6-month NRM in both the training and validation cohorts (Figure 1A-B). Risk categories by organ involvement are compared for the 2 models in supplemental Table 5.

Figure 1.

NRM in the clinical risk models. Six-month cumulative incidence of NRM by Minnesota (left) and Manhattan (right) risk strata. (A) Training cohort. Minnesota standard risk: 10.2% (95% CI, 8.5-12.2); Minnesota high risk: 36.8% (95% CI, 30.5-43.0); Manhattan low risk: 7.1% (95% CI, 5.1-9.5); Manhattan intermediate risk: 13.9% (95% CI, 11.1-16.9); Manhattan high risk: 37.8% (95% CI, 31.2-44.4). (B) Validation cohort. Minnesota standard risk: 11.0% (95% CI, 8.3-14.1); Minnesota high risk: 34.4% (95% CI, 25.3-43.6); Manhattan low risk: 7.0% (95% CI, 4.2-10.8); Manhattan intermediate risk: 14.9% (95% CI, 10.6-19.9); Manhattan high risk: 35.8% (95% CI, 26.4-45.4). Pie charts depict the percentage of patients in each clinical risk stratum. ∗P values for pairwise comparisons were adjusted using the Bonferroni method.

To evaluate the robustness of the Manhattan model, we analyzed 2 subsets of the whole cohort: patients with Glucksberg grade 2 to 4 acute GVHD and patients treated with ≥0.5 mg/kg methylprednisolone. The AUCs of the Manhattan risk model remained superior to those of the Minnesota model in both subsets (0.67 vs 0.65; P = .024; ΔAUC, 0.024 [95% CI, 0.003-0.044] and 0.68 vs 0.65; P = .005; ΔAUC, 0.035 [95% CI, 0.011-0.061], respectively). Within each risk category, the risk of NRM was similar in these subsets (supplemental Tables 6 and 7).

MAGIC composite scores

We hypothesized that including serum biomarkers at the onset of treatment would further improve the performance of the Manhattan clinical risk model, particularly for intermediate-risk patients, whose 6-month NRM was only about 7 percentage points higher than that of low-risk patients. Serum samples at treatment onset were available for 80% (1050/1306) of the training cohort and 78% (432/557) of the validation cohort (Table 1; supplemental Figure 1). The 6-month NRM did not differ between patients with and without samples in either cohort (16% vs 13%; P = .296 and 16% vs 12%; P = .329, respectively). As expected, AA scores independently stratified the risk of NRM within each Manhattan risk group, and the risk of NRM for each AA score increased with escalating Manhattan risk, further demonstrating that combining clinical and biomarker assessments improves outcome prediction (supplemental Figure 5). We again applied a CART analysis to the 9 combinations of Manhattan risk and AA score in the training cohort, which produced a new composite scoring system of 3 strata that we termed the MAGIC composite scores (Table 3). In analyses not presented here, we tested models containing 4 to 9 categories, but none provided significantly greater AUCs than the 3-category model, which was used in all subsequent analyses. An unsupervised K-means clustering algorithm produced the same assignments as CART for all 9 combinations, confirming the performance of the new model (Table 3).

Table 3.

Algorithm assignment to MAGIC composite scores of 9 categories determined by the Manhattan risk and AA scores in the training cohort

| Manhattan risk | AA score | n (%) | 6-mo NRM (%) | MCS (CART) | MCS (K-means) |
|---|---|---|---|---|---|
| Low | AA1 | 296 (28.2) | 3.1 | MCS1 | MCS1 |
| Low | AA2 | 99 (9.4) | 12.1 | MCS1 | MCS1 |
| Low | AA3 | 36 (3.4) | 27.8 | MCS2 | MCS2 |
| Intermediate | AA1 | 247 (23.5) | 8.7 | MCS1 | MCS1 |
| Intermediate | AA2 | 125 (11.9) | 18.4 | MCS2 | MCS2 |
| Intermediate | AA3 | 74 (7.0) | 29.7 | MCS2 | MCS2 |
| High | AA1 | 50 (4.8) | 20.4 | MCS2 | MCS2 |
| High | AA2 | 52 (5.0) | 29.1 | MCS2 | MCS2 |
| High | AA3 | 71 (6.8) | 56.3 | MCS3 | MCS3 |

MCS, MAGIC composite scores.
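Because the CART and K-means columns of Table 3 agree, the MAGIC composite score reduces to a lookup over the 9 Manhattan risk × AA score combinations. The sketch below encodes Table 3 directly; the names `MCS_TABLE`, `TRAINING_N`, and `magic_composite_score` are hypothetical. As a usage example, it tallies the training-cohort patients per composite score from the Table 3 counts.

```python
# Hypothetical lookup mirroring Table 3 (CART and K-means assignments agree).
MCS_TABLE = {
    ("low", 1): 1, ("low", 2): 1, ("low", 3): 2,
    ("intermediate", 1): 1, ("intermediate", 2): 2, ("intermediate", 3): 2,
    ("high", 1): 2, ("high", 2): 2, ("high", 3): 3,
}

# Training-cohort n for each combination, copied from Table 3.
TRAINING_N = {
    ("low", 1): 296, ("low", 2): 99, ("low", 3): 36,
    ("intermediate", 1): 247, ("intermediate", 2): 125, ("intermediate", 3): 74,
    ("high", 1): 50, ("high", 2): 52, ("high", 3): 71,
}


def magic_composite_score(manhattan: str, aa: int) -> int:
    """Return MAGIC composite score 1-3 for a Manhattan risk and AA score."""
    try:
        return MCS_TABLE[(manhattan, aa)]
    except KeyError:
        raise ValueError(f"unknown combination: {manhattan!r}, AA{aa}") from None


# Tally training-cohort patients per composite score.
totals = {1: 0, 2: 0, 3: 0}
for combo, n in TRAINING_N.items():
    totals[magic_composite_score(*combo)] += n
print(totals)  # {1: 642, 2: 337, 3: 71}
```

In the training cohort, 642 of 1050 patients (about 61%) fall into MAGIC composite score 1.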

The incidence of NRM within 6 months increased with each increase in MAGIC composite score, whereas the incidence of relapse did not change, resulting in large differences in OS between groups in both the training and validation cohorts (Figure 2A-C; supplemental Figure 6A-C). In the total population with available samples, 24% (356/1482) of patients were Manhattan intermediate risk but had a MAGIC composite score of 1, with a 6-month NRM of only 8%. Conversely, 3% (46/1482) of patients were Manhattan low risk but moved up 1 stratum to a MAGIC composite score of 2, with a 6-month NRM of 28%, and 12% (147/1428) were Manhattan high risk but moved down 1 stratum to a MAGIC composite score of 2, with a 6-month NRM of 26%.
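The NRM and relapse incidences above are competing-risk cumulative incidences: a patient who relapses can no longer experience NRM. As a minimal sketch of the standard nonparametric (Aalen-Johansen) estimator, assuming time in months from the start of systemic treatment and event codes 0 = censored, 1 = NRM, 2 = competing event (for example relapse), the calculation looks like the following. The function name is hypothetical, and this is not the authors' R or EZR code.

```python
from typing import Sequence


def cumulative_incidence(times: Sequence[float], events: Sequence[int],
                         cause: int = 1, horizon: float = 6.0) -> float:
    """Aalen-Johansen cumulative incidence of `cause` by `horizon`.

    events: 0 = censored, 1 = NRM, 2 = competing event (e.g., relapse).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0  # probability of being free of any event just before t
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d_cause = d_any = n_at_t = 0
        # Group all observations sharing the same time point.
        while i < len(data) and data[i][0] == t:
            n_at_t += 1
            if data[i][1] != 0:
                d_any += 1
                if data[i][1] == cause:
                    d_cause += 1
            i += 1
        # Cause-specific hazard at t, weighted by event-free probability.
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
        at_risk -= n_at_t
    return cif
```

Unlike 1 minus a Kaplan-Meier estimate that censors relapses, this estimator does not overstate NRM when relapse is common.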

Figure 2.

NRM and AUC of the MAGIC composite scores. (A) Six-month cumulative incidence of NRM. MAGIC composite score 1: 5.7% (95% CI, 3.3-8.9); composite score 2: 28.8% (95% CI, 21.2-36.8); composite score 3: 51.5% (95% CI, 33.1-67.2). (B) Six-month cumulative incidence of relapse. MAGIC composite score 1: 8.3% (95% CI, 5.4-12.0); composite score 2: 10.8% (95% CI, 6.2-16.9); composite score 3: 6.7% (95% CI, 1.1-19.7). (C) Probability of OS at 6 months; MAGIC composite score 1: 90.6% (95% CI, 86.4-93.5); composite score 2: 64.3% (95% CI, 55.3-71.9); composite score 3: 42.4% (95% CI, 25.6-58.3). Pie charts depict the percentage of each composite score. ∗P values for pairwise comparisons were adjusted using the Bonferroni method. (D) Time-dependent area under the receiver operating characteristic curve for NRM from the time of systemic treatment.

Using 6-month NRM as the outcome, the AUC of the MAGIC composite score model was significantly higher than that of the Manhattan model in both the training (0.73 vs 0.69; P = .019; ΔAUC, 0.042 [95% CI, 0.007-0.076]) and validation cohorts (0.76 vs 0.70; P = .010; ΔAUC, 0.064 [95% CI, 0.018-0.112]) (supplemental Figure 7). We next assessed the prognostic performance of each model at several time points during the first year after the start of GVHD treatment; the MAGIC composite scores were consistently superior to both the Manhattan and Minnesota risk models (Figure 2D). The Akaike information criterion (AIC) for predicting 6-month NRM was also substantially lower for the MAGIC composite scores (763.9) than for the Manhattan (1006.9) or Minnesota (1046.7) risk model.
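The text does not specify which regression model underlies the reported AIC values. As an illustration only, assuming a logistic model for binary 6-month NRM with one parameter per risk stratum (ignoring censoring), the AIC has a closed form because the fitted event probability in each stratum is the observed proportion. The function name and the counts in the usage example are hypothetical, not study data.

```python
import math
from typing import Dict, Tuple


def aic_categorical(groups: Dict[str, Tuple[int, int]]) -> float:
    """AIC of a logistic model with one categorical predictor.

    `groups` maps stratum -> (patients with the event, patients in stratum).
    With one parameter per stratum, the maximum-likelihood event probability
    is the observed proportion, so the log-likelihood has a closed form.
    """
    loglik = 0.0
    for events, n in groups.values():
        p = events / n
        if events > 0:
            loglik += events * math.log(p)
        if events < n:
            loglik += (n - events) * math.log(1.0 - p)
    k = len(groups)  # number of estimated parameters
    return 2 * k - 2 * loglik


# Hypothetical counts: a stratified model versus a single pooled group.
pooled = aic_categorical({"all": (30, 200)})
stratified = aic_categorical({"low": (5, 100), "high": (25, 100)})
```

A lower AIC indicates a better trade-off between fit and the number of parameters; AIC values are directly comparable only between models fitted to the same patients.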

In addition to NRM, response to primary treatment is a key end point for evaluating predictive tests. We therefore assessed how well each model predicted day 28 ORR, the standard end point for treatment response in clinical trials. In both the training and validation cohorts, ORR differed significantly between each MAGIC composite score, but not between the low- and intermediate-Manhattan-risk groups (Figure 3). These differences in ORR by MAGIC composite score align with the improved prediction of 6-month NRM and support the use of MAGIC composite scores to guide first-line therapy. Interestingly, although AA biomarker scores alone predicted 6-month NRM as well as the composite scores did, the composite scores were better predictors of day 28 ORR (supplemental Table 8): the 17-percentage-point difference between MAGIC composite scores 1 and 2 was highly significant (80% vs 63%; P < .001), whereas the smaller 11-percentage-point difference between AA1 and AA2 was not (80% vs 69%; P = .068).
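Pairwise ORR comparisons like these are 2 × 2 tables (group × responder). Assuming they use the Fisher exact test that the Methods name for categorical variables, with the Bonferroni adjustment noted in the Figure 3 legend, a self-contained sketch follows. The function names are hypothetical, and the counts in the usage lines are illustrative, not study data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are groups; columns are responders and non-responders. Sums the
    hypergeometric probabilities of all tables with the observed margins that
    are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance keeps numerically tied tables in the sum.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bonferroni(p: float, m: int) -> float:
    """Bonferroni adjustment for m pairwise comparisons, capped at 1."""
    return min(1.0, p * m)


# Hypothetical counts: 80/100 vs 63/100 responders, 3 pairwise comparisons.
p_raw = fisher_exact_two_sided(80, 20, 63, 37)
p_adj = bonferroni(p_raw, 3)
```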

Figure 3.

Day 28 ORR. Day 28 ORR by the Minnesota risk (left), Manhattan risk (middle), and MAGIC composite scores (right). (A) Training cohort. Minnesota standard risk: 71.5%; Minnesota high risk: 47.0%; Manhattan low risk: 72.3%; Manhattan intermediate risk: 69.6%; Manhattan high risk: 47.9%; MAGIC composite score 1: 74.8%; MAGIC composite score 2: 63.2%; MAGIC composite score 3: 35.2%. (B) Validation cohort. Minnesota standard risk: 73.3%; Minnesota high risk: 49.5%; Manhattan low risk: 77.0%; Manhattan intermediate risk: 69.7%; Manhattan high risk: 48.5%; MAGIC composite score 1: 79.8%; MAGIC composite score 2: 62.9%; MAGIC composite score 3: 30.3%. The error bars represent standard errors. ∗P values for pairwise comparisons were adjusted using the Bonferroni method.

Using the whole cohort, we next evaluated the robustness of the MAGIC composite score model in 2 key subsets: patients with Glucksberg grade 2 to 4 acute GVHD and patients treated with ≥0.5 mg/kg methylprednisolone. The MAGIC composite score model produced significantly higher AUCs than the Manhattan risk model in both subsets (0.73 vs 0.67; P < .001; ΔAUC, 0.054 [95% CI, 0.024-0.086] and 0.75 vs 0.69; P = .009; ΔAUC, 0.054 [95% CI, 0.025-0.083], respectively). Each MAGIC composite score category had a similar 6-month NRM within these subsets (supplemental Tables 6 and 7).

Black patients comprised 5% (95/1863) and pediatric patients (<18 years old) comprised 12% (229/1863) of the total population (Table 1). As shown in supplemental Figures 8 and 9, both the Manhattan and MAGIC composite score models performed well in these small subgroups, although the numbers were too small to detect statistically significant differences between strata. In these subgroups, the Manhattan risk model divided patients into approximately equal portions, with small differences in NRM between the low- and intermediate-risk groups and a large difference between the intermediate- and high-risk groups. The MAGIC composite scores correctly recategorized some higher-risk patients as lower risk, so that most patients had a composite score of 1 and a very low NRM rate, whereas NRM was higher in the smaller group of patients with a score of 2.

Finally, we evaluated the models in another key subset: patients who developed acute GVHD after receiving prophylaxis containing posttransplantation cyclophosphamide (n = 301). The overall 6-month NRM in this group was low (12%), and NRM did not differ significantly among the Manhattan risk groups (11% vs 11% vs 19%; P = .336). Incorporating biomarkers into the MAGIC composite score model effectively stratified the risk of NRM in these patients (8% vs 16% vs 27%; P = .026) (supplemental Figure 10). Similar patterns were observed when recipients of haploidentical and nonhaploidentical donors were evaluated separately (supplemental Table 9).

High initial doses of corticosteroids followed by gradual tapers lasting months have been the recommended treatment for GVHD for decades.8,10 Recent advances in GVHD prophylaxis, however, have reduced the overall incidence of severe GVHD, and mild to moderate symptoms are now the dominant clinical phenotype.5,6 In this study, the observed NRM for patients with standard Minnesota risk (∼11%) was approximately half that reported in previous publications.25,26 This reflects a trend toward lower NRM from GVHD that may be due to improved GVHD prophylaxis, anti-infective therapy, and supportive care.5,6,19 We first validated the Manhattan risk system, which uses only clinical organ severity. It identified a substantial number of patients with mild GVHD in a low-risk stratum encompassing ∼40% of patients. These data confirm an important finding of a recent retrospective study by Nikiforow et al51, which demonstrated that patients with isolated UGI disease experienced low NRM (Table 2). In the current study, however, UGI symptoms did not increase the NRM of patients with stage I skin GVHD, but they did increase the NRM of patients with stage II skin GVHD fourfold, elevating the risk of this latter group from low to intermediate. With the incorporation of biomarker values, the low-risk stratum (MAGIC composite score 1) increased significantly to >60% of patients. The incidence of 6-month NRM for these patients (∼6%) is almost half that of patients with Minnesota standard risk (∼11%), which may represent a clinically important difference in outcomes. Patients with MAGIC composite score 1 thus have a very low risk of NRM, a finding that may help guide individual treatment strategies that minimize steroid exposure18 (NCT05090384). We have created a public website with a calculator that combines the GVHD stages of individual organs (with specific guidance for gastrointestinal symptoms) to generate Manhattan risk and integrates biomarker values to generate MAGIC composite scores.52

Unusual presentations and subtle manifestations of GVHD can complicate its accurate diagnosis and staging, particularly in ethnic minority populations for whom historical data are limited. When clinical findings alone are not definitive, physicians often consider both clinical symptoms and laboratory findings when deciding how to treat individual patients. By integrating biomarkers with clinical symptoms into a 3-tier score, the MAGIC composite scores leverage the prognostic accuracy of the MAGIC serum biomarkers2,27,28,30,31 and resolve the dilemma clinicians face when the severity of clinical and laboratory parameters does not align. The MAGIC composite score model produced statistically significant improvements in AUC compared with the clinical risk models. In addition, the integration of AA scores with Manhattan risk produced clinically meaningful changes in the risk of NRM for 2 subsets of patients. First, nearly one-quarter of all patients were classified as Manhattan intermediate risk but had the lowest biomarker risk (AA1). These patients experienced a very low 6-month NRM of 8% and were therefore assigned MAGIC composite score 1. Second, a small group (<5% of all patients) with Manhattan low risk but the highest biomarker risk (AA3) had a 6-month NRM of 28% and was therefore assigned MAGIC composite score 2. The high risk of NRM in this small group is important to consider in treatment decisions.
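The two reclassifications described above can be read as cells of a two-way table indexed by Manhattan risk and Ann Arbor (AA) biomarker score. The Python sketch below, which uses hypothetical names, encodes only those two cells as stated in this section. It is not the full MAGIC composite score algorithm and deliberately returns None for every other combination. The complete mapping is provided by the public calculator (reference 52).

```python
from typing import Dict, Optional, Tuple

# Hypothetical partial lookup: only the two (Manhattan risk, AA score) cells
# stated in the text are filled in; no other cell is assumed.
KNOWN_CELLS: Dict[Tuple[str, str], int] = {
    ("intermediate", "AA1"): 1,  # 6-month NRM 8%  -> composite score 1
    ("low", "AA3"): 2,           # 6-month NRM 28% -> composite score 2
}


def composite_score_if_stated(manhattan_risk: str, aa_score: str) -> Optional[int]:
    """Return the composite score for a stated cell, or None if not stated here."""
    key = (manhattan_risk.strip().lower(), aa_score.strip().upper())
    return KNOWN_CELLS.get(key)


print(composite_score_if_stated("Intermediate", "AA1"))  # 1
print(composite_score_if_stated("high", "AA3"))          # None: not stated in this section
```

A lookup table makes the reclassification explicit: each composite score is a function of two discrete inputs, so discordant clinical and biomarker severity maps to a single category.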

Increasing numbers of patients now receive posttransplantation cyclophosphamide–based GVHD prophylaxis for human leukocyte antigen (HLA)–matched, HLA-mismatched, and haploidentical HCT.5,53-57 The Manhattan risk model, which uses clinical symptoms alone, did not distinguish between low and intermediate risk in these patients. MAGIC composite scores, however, successfully stratified them into 3 groups by risk of NRM, further demonstrating the additive value of biomarker scores to clinical phenotypes.

When biomarker values are not readily available, the Manhattan risk model offers several advantages over the Minnesota risk model. First, given the superior survival and response rates of patients with Manhattan low-risk GVHD, these patients may be considered for clinical trials designed to minimize immunosuppressive treatment. Second, because patients with Manhattan low risk respond well to standard treatment and have low NRM, it may be desirable to exclude them from trials of treatments intended to improve response rates. Third, assigning all patients with liver GVHD to the high-risk group32 regardless of other organ involvement resolves an anomaly of the Minnesota risk system, which categorized some patients with both skin and liver GVHD as standard risk rather than high risk.

Our study has several limitations. First, although our training cohort was large, certain groups of patients, such as ethnic minorities and individuals with unusual combinations of GVHD manifestations, were small. We addressed the latter issue by merging groups on the basis of similarity of symptoms and NRM rates so that each group contained at least 20 patients before applying CART analysis. GVHD presenting with a rare constellation of symptoms may therefore not be ideally classified. Second, carefully designed clinical trials are required to determine whether infectious deaths in patients whose GVHD has resolved can be prevented by less immunosuppressive (but equally effective) therapy. In this regard, the ability of the MAGIC composite scores to predict day 28 ORR is encouraging. Third, we included patients with systemically treated Glucksberg grade 1 acute GVHD, although systemic treatment is not uniformly recommended for this group.10 It is therefore reassuring that the Manhattan and MAGIC composite score models performed well in subset analyses that excluded patients treated with low doses of corticosteroids or those with Glucksberg grade 1 GVHD. Fourth, the initial dose of steroids varied among participating centers, reflecting the heterogeneity of real-world practice.58 These findings must therefore be confirmed with data from prospective clinical trials that apply strict inclusion and exclusion criteria and use consistent, homogeneous treatments.
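The merging step described above, which combined sparse groups until each held at least 20 patients before CART, can be done in several ways. One hypothetical greedy rule is sketched below in Python. It orders groups by 6-month NRM and repeatedly pools the smallest undersized group with whichever adjacent group has the closer NRM. The authors also weighed similarity of clinical symptoms, which this sketch does not model, so it is an illustration rather than their procedure. The class, function, and toy counts are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Group:
    label: str
    n: int       # patients in the group
    deaths: int  # patients with nonrelapse death by 6 months

    @property
    def nrm(self) -> float:
        return self.deaths / self.n


def merge_small_groups(groups: List[Group], min_size: int = 20) -> List[Group]:
    """Greedily merge undersized groups into their nearest neighbour by NRM.

    Groups are kept ordered by NRM. While any group is smaller than min_size,
    the smallest one is pooled with whichever adjacent group has the closer
    NRM rate (hypothetical rule; symptom similarity is not modelled).
    """
    gs = sorted(groups, key=lambda g: g.nrm)
    while len(gs) > 1:
        small = [i for i, g in enumerate(gs) if g.n < min_size]
        if not small:
            break
        i = min(small, key=lambda k: gs[k].n)
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(gs)]
        j = min(neighbours, key=lambda k: abs(gs[k].nrm - gs[i].nrm))
        a, b = sorted((i, j))
        pooled = Group(f"{gs[a].label}+{gs[b].label}",
                       gs[a].n + gs[b].n, gs[a].deaths + gs[b].deaths)
        gs[a:b + 1] = [pooled]  # a and b are adjacent, so this replaces both
        gs.sort(key=lambda g: g.nrm)
    return gs


# Invented toy groups, not study data.
toy = [Group("A", 50, 2), Group("B", 10, 1), Group("C", 40, 6), Group("D", 30, 15)]
for g in merge_small_groups(toy):
    print(g.label, g.n, round(g.nrm, 3))
```

Pooling counts rather than averaging rates keeps each merged group's NRM weighted by its patients. The trade-off noted in the limitations still applies: a rare symptom pattern absorbed into a larger group inherits that group's risk estimate.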

In summary, a new Manhattan risk system based on clinical symptoms alone at the initiation of systemic treatment and new MAGIC composite scores that incorporate biomarkers are more accurate than current risk classification systems. Verification of both models by a second statistical approach lends confidence to the accuracy of their categorizations. However, the number of ethnic minority patients in this report is small, which may limit the models’ application to these and other underrepresented populations. These improved models could more accurately identify patients with both low- and high-risk disease who might benefit from personalized primary treatment strategies.

The authors greatly appreciate the patients, their families, the medical staff, and the data managers in the Mount Sinai Acute GVHD International Consortium centers.

This work was supported by the National Institutes of Health, National Cancer Institute (grants P01 CA039542 and P30 CA196521), the National Pediatric Cancer Foundation, and the German José Carreras Leukaemia Foundation (grants DJCLS 01 GVHD 2016 and DJCLS 01 GVHD 2020). Y.A. is a recipient of the Japan Society for the Promotion of Science Postdoctoral Fellowship for Research Abroad.

Contribution: Y.A. designed the study, collected the clinical data, conducted the statistical analysis, and wrote the manuscript; N.S. collected the clinical data, advised on statistical methods, and reviewed and revised the manuscript; D.W., P.A.-H., F.A., C.C., H.K.C., M.E., A.M.E., S.A.G., E.O.H., W.J.H., C.L.K., S. Kraus, M.M.A.M., P.M., M.Q., R.R., T.S., E.U., I.V., M.W., R.Z., Y.-B.C., and R.N. collected the clinical data, and reviewed and revised the manuscript; J.B., G.E., S.G., N.K., and R.Y. collected and reviewed the clinical data; R.B., S. Kowalyk, and G.M. performed the laboratory analysis; J.E.L. and J.L.M.F. designed the study, interpreted data, advised on methods, reviewed and revised the manuscript, and organized this project; and all authors contributed to the writing of the report and approved the final version of the manuscript.

Conflict-of-interest disclosure: M.W. received consulting fees from Amgen, Germany and speaker’s fees from Novartis, Germany. J.E.L. and J.L.M.F. report research support from Equillium, Incyte, MaaT Pharma, and Mesoblast, and consulting fees from Editas, Equillium, Kamada, and Mesoblast. J.E.L. reports additional consulting fees from Sanofi, bluebird bio, Inhibrx, and X4 Pharmaceuticals. J.L.M.F. reports additional consulting fees from Alexion, Realta, Medpace, Viracor, AlloVir, and Physicians’ Education Resource. The remaining authors declare no competing financial interests.

Correspondence: John E. Levine, The Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place/Box 1410, New York, NY 10029; email: john.levine@mssm.edu; and James L. M. Ferrara, The Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029; email: james.ferrara@mssm.edu.

1. Martin PJ. How I treat steroid-refractory acute graft-versus-host disease. Blood. 2020;135(19):1630-1638.
2. Akahoshi Y, Spyrou N, Hogan WJ, et al. Incidence, clinical presentation, risk factors, outcomes, and biomarkers in de novo late acute GVHD. Blood Adv. 2023;7(16):4479-4491.
3. Greinix HT, Eikema DJ, Koster L, et al. Improved outcome of patients with graft-versus-host disease after allogeneic hematopoietic cell transplantation for hematologic malignancies over time: an EBMT mega-file study. Haematologica. 2022;107(5):1054-1063.
4. Khoury HJ, Wang T, Hemmer MT, et al. Improved survival after acute graft-versus-host disease diagnosis in the modern era. Haematologica. 2017;102(5):958-966.
5. Bolanos-Meade J, Hamadani M, Wu J, et al. Post-transplantation cyclophosphamide-based graft-versus-host disease prophylaxis. N Engl J Med. 2023;388(25):2338-2348.
6. Watkins B, Qayed M, McCracken C, et al. Phase II trial of costimulation blockade with abatacept for prevention of acute GVHD. J Clin Oncol. 2021;39(17):1865-1877.
7. Akahoshi Y, Igarashi A, Fukuda T, et al. Impact of graft-versus-host disease and graft-versus-leukemia effect based on minimal residual disease in Philadelphia chromosome-positive acute lymphoblastic leukemia. Br J Haematol. 2020;190(1):84-92.
8. Martin PJ, Rizzo JD, Wingard JR, et al. First- and second-line systemic treatment of acute graft-versus-host disease: recommendations of the American Society of Blood and Marrow Transplantation. Biol Blood Marrow Transplant. 2012;18(8):1150-1163.
9. Malard F, Holler E, Sandmaier BM, Huang H, Mohty M. Acute graft-versus-host disease. Nat Rev Dis Primers. 2023;9(1):27.
10. Penack O, Marchetti M, Aljurf M, et al. Prophylaxis and management of graft-versus-host disease after stem-cell transplantation for haematological malignancies: updated consensus recommendations of the European Society for Blood and Marrow Transplantation. Lancet Haematol. 2024;11(2):e147-e159.
11. MacMillan ML, DeFor TE, Weisdorf DJ. The best endpoint for acute GVHD treatment trials. Blood. 2010;115(26):5412-5417.
12. Levine JE, Logan B, Wu J, et al. Graft-versus-host disease treatment: predictors of survival. Biol Blood Marrow Transplant. 2010;16(12):1693-1699.
13. Saliba RM, Couriel DR, Giralt S, et al. Prognostic value of response after upfront therapy for acute GVHD. Bone Marrow Transplant. 2012;47(1):125-131.
14. Inamoto Y, Martin PJ, Storer BE, Mielcarek M, Storb RF, Carpenter PA. Response endpoints and failure-free survival after initial treatment for acute graft-versus-host disease. Haematologica. 2014;99(2):385-391.
15. Biavasco F, Ihorst G, Wasch R, et al. Therapy response of glucocorticoid-refractory acute GVHD of the lower intestinal tract. Bone Marrow Transplant. 2022;57(10):1500-1506.
16. Akahoshi Y, Spyrou N, Hoepting M, et al. Flares of acute graft-versus-host disease: a Mount Sinai Acute GVHD International Consortium analysis. Blood Adv. 2024;8(8):2047-2057.
17. El Jurdi N, Rayes A, MacMillan ML, et al. Steroid-dependent acute GVHD after allogeneic hematopoietic cell transplantation: risk factors and clinical outcomes. Blood Adv. 2021;5(5):1352-1359.
18. Etra A, Capellini A, Alousi A, et al. Effective treatment of low-risk acute GVHD with itacitinib monotherapy. Blood. 2023;141(5):481-489.
19. Akahoshi Y, Kimura SI, Tada Y, et al. Cytomegalovirus gastroenteritis in patients with acute graft-versus-host disease. Blood Adv. 2022;6(2):574-584.
20. Akahoshi Y, Kimura SI, Inamoto Y, et al. Effect of cytomegalovirus reactivation with or without acute graft-versus-host disease on the risk of nonrelapse mortality. Clin Infect Dis. 2021;73(3):e620-e628.
21. Weisdorf DJ, Hurd D, Carter S, et al. Prospective grading of graft-versus-host disease after unrelated donor marrow transplantation: a grading algorithm versus blinded expert panel review. Biol Blood Marrow Transplant. 2003;9(8):512-518.
22. Przepiorka D, Weisdorf D, Martin P, et al. 1994 consensus conference on acute GVHD grading. Bone Marrow Transplant. 1995;15(6):825-828.
23. Cahn JY, Klein JP, Lee SJ, et al. Prospective evaluation of 2 acute graft-versus-host (GVHD) grading systems: a joint Societe Francaise de Greffe de Moelle et Therapie Cellulaire (SFGM-TC), Dana Farber Cancer Institute (DFCI), and International Bone Marrow Transplant Registry (IBMTR) prospective study. Blood. 2005;106(4):1495-1500.
24. Bayraktar E, Graf T, Ayuk FA, et al. Data-driven grading of acute graft-versus-host disease. Nat Commun. 2023;14(1):7799.
25. MacMillan ML, Robin M, Harris AC, et al. A refined risk score for acute graft-versus-host disease that predicts response to initial therapy, survival, and transplant-related mortality. Biol Blood Marrow Transplant. 2015;21(4):761-767.
26. MacMillan ML, DeFor TE, Holtan SG, Rashidi A, Blazar BR, Weisdorf DJ. Validation of Minnesota acute graft-versus-host disease risk score. Haematologica. 2020;105(2):519-524.
27. Levine JE, Braun TM, Harris AC, et al. A prognostic score for acute graft-versus-host disease based on biomarkers: a multicentre study. Lancet Haematol. 2015;2(1):e21-e29.
28. Hartwell MJ, Ozbek U, Holler E, et al. An early-biomarker algorithm predicts lethal graft-versus-host disease and survival. JCI Insight. 2018;3(16):e124015.
29. Holtan SG, DeFor TE, Panoskaltsis-Mortari A, et al. Amphiregulin modifies the Minnesota acute graft-versus-host disease risk score: results from BMT CTN 0302/0802. Blood Adv. 2018;2(15):1882-1888.
30. Etra A, Gergoudis S, Morales G, et al. Assessment of systemic and gastrointestinal tissue damage biomarkers for GVHD risk stratification. Blood Adv. 2022;6(12):3707-3715.
31. Spyrou N, Akahoshi Y, Ayuk F, et al. The utility of biomarkers in acute GVHD prognostication. Blood Adv. 2023;7(17):5152-5155.
32. Robin M, Porcher R, Michonneau D, et al. Prospective external validation of biomarkers to predict acute graft-versus-host disease severity. Blood Adv. 2022;6(16):4763-4772.
33. McCurdy SR, Radojcic V, Tsai HL, et al. Signatures of GVHD and relapse after posttransplant cyclophosphamide revealed by immune profiling and machine learning. Blood. 2022;139(4):608-623.
34. Luft T, Benner A, Jodele S, et al. EASIX in patients with acute graft-versus-host disease: a retrospective cohort analysis. Lancet Haematol. 2017;4(9):e414-e423.
35. Socie G, Niederwieser D, von Bubnoff N, et al. Prognostic value of blood biomarkers in steroid-refractory or steroid-dependent acute graft-versus-host disease: a REACH2 analysis. Blood. 2023;141(22):2771-2779.
36. Ferrara JLM, Chaudhry MS. GVHD: biology matters. Blood Adv. 2018;2(22):3411-3417.
37. Srinagesh HK, Ozbek U, Kapoor U, et al. The MAGIC algorithm probability is a validated response biomarker of treatment of acute graft-versus-host disease. Blood Adv. 2019;3(23):4034-4042.
38. Pepe MS, Feng Z, Janes H, Bossuyt PM, Potter JD. Pivotal evaluation of the accuracy of a biomarker used for classification or prediction: standards for study design. J Natl Cancer Inst. 2008;100(20):1432-1438.
39. Harris AC, Young R, Devine S, et al. International, multicenter standardization of acute graft-versus-host disease clinical data collection: a report from the Mount Sinai Acute GVHD International Consortium. Biol Blood Marrow Transplant. 2016;22(1):4-10.
40. Bacigalupo A, Ballen K, Rizzo D, et al. Defining the intensity of conditioning regimens: working definitions. Biol Blood Marrow Transplant. 2009;15(12):1628-1633.
41. Sorror ML, Storer B, Storb RF. Validation of the hematopoietic cell transplantation-specific comorbidity index (HCT-CI) in single and multiple institutions: limitations and inferences. Biol Blood Marrow Transplant. 2009;15(6):757-758.
42. Zhang J, Ramadan AM, Griesenauer B, et al. ST2 blockade reduces sST2-producing T cells while maintaining protective mST2-expressing T cells during graft-versus-host disease. Sci Transl Med. 2015;7(308):308ra160.
43. Zhao D, Kim YH, Jeong S, et al. Survival signal REG3alpha prevents crypt apoptosis to control acute gastrointestinal graft-versus-host disease. J Clin Invest. 2018;128(11):4970-4979.
44. Al Malki MM, London K, Baez J, et al. Phase 2 study of natalizumab plus standard corticosteroid treatment for high-risk acute graft-versus-host disease. Blood Adv. 2023;7(17):5189-5198.
45. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77.
46. Gholamy A, Kreinovich V, Kosheleva O. Why 70/30 or 80/20 relation between training and testing sets: a pedagogical explanation. Int J Intell Technol Appl Stat. 2018;11(2):105-111.
47. Bzdok D, Krzywinski M, Altman N. Points of significance: machine learning: a primer. Nat Methods. 2017;14(12):1119-1120.
48. Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and Regression Trees. 1st ed. Taylor & Francis; 1984.
49. Hartigan JA, Wong MA. A K-means clustering algorithm. J R Stat Soc C Appl Stat. 1979;28(1):100-108.
50. Kanda Y. Investigation of the freely available easy-to-use software 'EZR' for medical statistics. Bone Marrow Transplant. 2013;48(3):452-458.
51. Nikiforow S, Wang T, Hemmer M, et al. Upper gastrointestinal acute graft-versus-host disease adds minimal prognostic value in isolation or with other graft-versus-host disease symptoms as currently diagnosed and treated. Haematologica. 2018;103(10):1708-1719.
52. Manhattan Risk and MAGIC Composite Score Calculator. Accessed 17 July 2024. https://gvhdmagic.com/
53. Luznik L, O'Donnell PV, Symons HJ, et al. HLA-haploidentical bone marrow transplantation for hematologic malignancies using nonmyeloablative conditioning and high-dose, posttransplantation cyclophosphamide. Biol Blood Marrow Transplant. 2008;14(6):641-650.
54. Meybodi MA, Cao W, Luznik L, et al. HLA-haploidentical vs matched-sibling hematopoietic cell transplantation: a systematic review and meta-analysis. Blood Adv. 2019;3(17):2581-2585.
55. Bolanos-Meade J, Reshef R, Fraser R, et al. Three prophylaxis regimens (tacrolimus, mycophenolate mofetil, and cyclophosphamide; tacrolimus, methotrexate, and bortezomib; or tacrolimus, methotrexate, and maraviroc) versus tacrolimus and methotrexate for prevention of graft-versus-host disease with haemopoietic cell transplantation with reduced-intensity conditioning: a randomised phase 2 trial with a non-randomised contemporaneous control group (BMT CTN 1203). Lancet Haematol. 2019;6(3):e132-e143.
56. D'Souza A, Fretham C, Lee SJ, et al. Current use of and trends in hematopoietic cell transplantation in the United States. Biol Blood Marrow Transplant. 2020;26(8):e177-e182.
57. Rimando J, McCurdy SR, Luznik L. How I prevent GVHD in high-risk patients: posttransplant cyclophosphamide and beyond. Blood. 2023;141(1):49-59.
58. Mielcarek M, Furlong T, Storer BE, et al. Effectiveness and safety of lower dose prednisone for initial treatment of acute graft-versus-host disease: a randomized controlled trial. Haematologica. 2015;100(6):842-848.

Author notes

J.E.L. and J.L.M.F. contributed equally to this study.

Data are available from authors John E. Levine (john.levine@mssm.edu), James L. M. Ferrara (james.ferrara@mssm.edu), or Yu Akahoshi (akahoshiu@gmail.com) upon reasonable request.

The online version of this article contains a data supplement.

There is a Blood Commentary on this article in this issue.

The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
