Background

Recent advances in next-generation sequencing (NGS) and transcriptomic profiling have refined the molecular classification of B-cell precursor acute lymphoblastic leukemia (BCP-ALL), leading to the identification of novel subtypes with distinct biological features and clinical implications. Among these, the BCR::ABL1-like subtype is characterized by a stem/progenitor-cell gene expression profile, poor response to conventional therapy, higher levels of measurable residual disease (MRD), and an increased risk of relapse. Accurate and timely identification of this subtype is crucial for therapeutic decision-making, including the use of tyrosine kinase inhibitors and intensified treatment protocols.

Aims

Our primary objective was to develop artificial intelligence (AI) models for rapid, automated classification of BCR::ABL1-like BCP-ALL using gene expression data obtained from our custom targeted panel. In addition, we aimed to assess whether gene expression profiles combined with clinical variables available at diagnosis could predict adverse outcomes such as relapse or death.

Methods

RNA-seq data from 179 BCP-ALL patients were quantified with Salmon and normalized across samples, and the normalized expression values were used to train the predictive models. The dataset was randomly divided into training (80%) and test (20%) subsets. For classification, the BCR::ABL1-like subtype was encoded as 1 and all other subtypes as 0. Clinical outcome data (relapse and death) were available for a subset of 82 patients. Two machine learning (ML) approaches were applied: LightGBM (LGBM; Microsoft®) and a neural network (NN) built with Keras, with hyperparameter tuning performed for both. Classification performance was assessed by prediction accuracy, and outcome prediction by the area under the receiver operating characteristic curve (ROC-AUC), both measured on the test set.
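The evaluation protocol described in the Methods (binary target encoding, a random 80/20 split, accuracy for subtype classification and ROC-AUC for outcome prediction) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it does not train LightGBM or the Keras network, the function names (`encode_subtype`, `split_indices`, `accuracy`, `roc_auc`) and the default seed are hypothetical, and the labels and scores in the usage lines are made up. ROC-AUC is computed through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
import random
from typing import Sequence


def encode_subtype(subtypes: Sequence[str], positive: str = "BCR::ABL1-like") -> list[int]:
    """Binary target: 1 for the BCR::ABL1-like subtype, 0 for every other subtype."""
    return [1 if s == positive else 0 for s in subtypes]


def split_indices(n: int, test_frac: float = 0.2, seed: int = 42) -> tuple[list[int], list[int]]:
    """Randomly assign sample indices to training (80%) and test (20%) subsets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seed is an arbitrary, hypothetical choice
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Usage with made-up labels and scores (not study data).
labels = encode_subtype(["BCR::ABL1-like", "ETV6::RUNX1", "High hyperdiploid", "BCR::ABL1-like"])
print(labels)  # [1, 0, 0, 1]
train_idx, test_idx = split_indices(179)
print(len(train_idx), len(test_idx))  # 143 36
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

For a 179-patient cohort this split gives 143 training and 36 test samples. In practice, library routines such as scikit-learn's `roc_auc_score` would typically replace the hand-written metric; the explicit pairwise version is shown only to make the definition visible.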
Feature importance and SHAP (SHapley Additive exPlanations) values were computed to interpret the LGBM models.

Results

Both AI models showed strong classification performance, with test-set accuracies of 0.96 (LGBM) and 0.98 (NN), and no overfitting was observed (difference between training and test accuracy < 0.02). For outcome prediction, the LGBM model yielded ROC-AUC values of 0.68 for death and 0.83 for relapse. Based on LGBM feature importance, the top genes associated with relapse were JCHAIN (8), CD99 (11), and SHOC2 (19); for death, the top features were SHOC2 (17), RBM47 (17), and LDB3 (14). SHAP analysis suggested a protective role for SHOC2 against both relapse (SHAP: −0.70) and death (−0.40), whereas RBM47 (+0.60) and LDB3 (+0.70) were associated with increased mortality risk.
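To make the sign convention behind these SHAP values concrete, the sketch below computes exact Shapley values for a small scoring function by enumerating every feature coalition. It rests on simplifying assumptions: the function `toy_model`, its features `gene_a`, `gene_b` and `gene_c`, and its coefficients are hypothetical, and "absent" features are set to a single baseline vector, whereas SHAP implementations such as TreeSHAP for LightGBM measure contributions relative to an expected model output over training or background data.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Features in the coalition keep their observed value; the others are set to the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # k = size of a coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for members in combinations(others, k):
                coalition = frozenset(members)
                # Marginal contribution of feature i when added to this coalition.
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


# Hypothetical scoring function (features gene_a, gene_b, gene_c) with an
# interaction between gene_a and gene_c; not a fitted model.
def toy_model(z: Sequence[float]) -> float:
    gene_a, gene_b, gene_c = z
    return -1.0 * gene_a + 0.5 * gene_b + 0.4 * gene_a * gene_c


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 6) for p in phi])  # [-0.8, 1.0, 0.2]
print(round(sum(phi), 6), toy_model(x) - toy_model(baseline))  # 0.4 0.4
```

Here `gene_a` receives a negative value: it pushes the score below the reference, which is the same reading applied to the negative SHAP values reported for SHOC2. The interaction term is split equally between `gene_a` and `gene_c`, and the contributions sum to the difference between the prediction and the baseline prediction, the additivity property that SHAP values share. Because the number of coalitions grows exponentially with the number of features, exhaustive enumeration is only practical for a handful of features.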

Summary/Conclusions

AI models trained on targeted RNA-seq gene expression data can reliably classify BCR::ABL1-like BCP-ALL, supporting their potential role in rapid, automated molecular diagnosis. In addition, baseline gene expression at diagnosis shows promise for predicting clinical outcomes, although larger cohorts are needed to improve the performance of the prognostic models. These preliminary findings highlight SHOC2 as a potential protective biomarker against relapse and mortality in BCP-ALL, whereas RBM47, LDB3, and CD99 warrant further investigation as markers of poor prognosis.
