Chimeric antigen receptor (CAR) T cell therapies have revolutionized treatment for hematological malignancies, yet over 50% of patients fail to achieve a durable response. While clinical outcome variability stems from multiple factors, the cellular composition and functional states within infusion products represent important determinants of therapeutic success and are an active area of research. We hypothesized that distinct cell states within CAR T infusion products differentially contribute to treatment efficacy, and that modeling these states could enable response prediction to guide rational therapy improvements. We assembled a single-cell RNA-sequencing (scRNA-seq) atlas of axicabtagene ciloleucel (axi-cel) infusion products from 64 patients with relapsed/refractory large B cell lymphoma (LBCL) across three U.S. institutions, combining internal and publicly available cohorts. We found that CD8+ T cells were enriched in patients who achieved an overall response (OR) at 3 months (P = 0.015). In a DESeq2 differential expression analysis controlling for batch effects, we found that SOCS3 expression was significantly enriched in non-responders (adj. P = 0.047). To reduce technical variability, we applied SCENIC transcription factor activity analysis, significantly reducing batch effects (P = 2.87×10⁻⁵) and generating 164 robust transcription factor (TF) activity scores per cell. Differential analysis of SCENIC TF scores identified RORC activity as enriched in non-responders (adj. P = 0.038). With the goal of leveraging machine learning to identify characteristics associated with OR and to potentially nominate specific genetic perturbations for rational design, we developed tcellMIL, a novel attention-based multiple instance learning framework that identifies therapeutically important cells within infusion products and leverages their gene expression states to predict overall response at 3 months after CAR T cell therapy. 
This approach models each patient's infusion product as a collection of cells, allowing the model to identify which cellular subsets drive therapeutic outcomes. In leave-one-patient-out cross-validation, tcellMIL significantly outperformed all comparator models, including pseudobulk classifiers, foundation models, and baseline MIL approaches (accuracy: 0.72 vs. 0.53-0.69; F1 score: 0.74 vs. 0.58-0.73). Permutation testing found that CD8+ T cells (P = 3.4×10⁻¹⁵) and CD4+ T cells (P = 2.4×10⁻⁷) were significantly enriched among influential cell populations relative to all other cells within the infusion products, consistent with their critical role as the primary effector cells in CAR T therapy. Our model's attention mechanism revealed that cells with high JUND activity contribute minimally to treatment predictions, suggesting this population may be less relevant clinically, whereas FOSB activity was important for patient outcome prediction. Further, cells with higher TF activity of NFYB and the E2F family contributed more to predictions. In silico perturbation screens using our model nominated TBX21 (T-bet) overexpression, among others such as FOXM1, as a beneficial target, improving predicted response probabilities across all analyses. Our results are consistent with prior experimental studies demonstrating enhanced in vitro and in vivo efficacy with T-bet overexpression in CAR T cells for lymphoma (Gacerez et al., 2017) and CD19-low leukemia (Cimons et al., 2025), supporting our approach. Our findings demonstrate that cellular heterogeneity within CAR T infusion products predicts therapeutic outcomes, with specific T cell states serving as key determinants of treatment success. The tcellMIL framework provides a generalizable platform for leveraging primary scRNA-seq data to predict CAR T therapy responses and to identify rational engineering targets.
This approach offers a pathway toward precision CAR T cell therapy design through predictive modeling and evidence-based cellular engineering strategies.
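
For readers unfamiliar with attention-based multiple instance learning, the idea of treating one infusion product as a bag of cells can be illustrated with a minimal sketch. This is a generic attention-pooling forward pass, not the tcellMIL architecture itself, which the abstract does not specify. The function name, weight shapes, attention dimension, and random inputs are all assumptions made for illustration; the feature count of 164 is chosen only to echo the number of TF activity scores mentioned above.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_mil_forward(cells, V, w, clf_w, clf_b):
    """Generic attention-MIL forward pass for one bag (hypothetical helper).

    cells : (n_cells, n_features) per-cell features for one patient
    V, w  : attention parameters (assumed shapes, untrained here)
    clf_w, clf_b : linear classifier applied to the pooled bag embedding
    """
    hidden = np.tanh(cells @ V)      # (n_cells, d_attn)
    scores = hidden @ w              # one unnormalized score per cell
    attn = softmax(scores)           # per-cell weights, sum to 1
    bag = attn @ cells               # attention-weighted bag embedding
    logit = bag @ clf_w + clf_b
    prob = 1.0 / (1.0 + np.exp(-logit))  # predicted response probability
    return prob, attn

# Synthetic example: random data and random (untrained) weights.
rng = np.random.default_rng(0)
n_cells, n_features, d_attn = 200, 164, 16
cells = rng.normal(size=(n_cells, n_features))
V = rng.normal(scale=0.1, size=(n_features, d_attn))
w = rng.normal(scale=0.1, size=d_attn)
clf_w = rng.normal(scale=0.1, size=n_features)
clf_b = 0.0

prob, attn = attention_mil_forward(cells, V, w, clf_w, clf_b)
top_cells = np.argsort(attn)[::-1][:10]  # indices of the 10 highest-weight cells
```

In a trained model of this kind, the per-cell weights in `attn` are what would be inspected to ask which cells contribute most to a patient-level prediction; here they carry no biological meaning because the weights are random.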
