Figure 2.
Associations of proteomic and clinical features with incident myeloma. (A) Box plots of the normalized protein expression (NPX) values of each of the top 10 proteomic features at baseline (enrolment into UK Biobank [UKB]), stratified by incident myeloma status. The plots are arranged in order of SHAP importance. The line in each box represents the median NPX value, the box edges represent the first and third quartiles, and the whiskers extend to 1.5× the interquartile range, with dots marking outliers beyond this range. To account for intra- and interbatch variability, the Olink data were scaled and normalized around a median of zero; thus, half of the values are negative, and the data represent relative rather than absolute protein concentrations. (B) A forest plot of the hazard ratios estimated from the 3 Cox models developed. The proteomics model is shown in orange, the clinical model in blue, and the combined model in gray. Dots represent the hazard ratio point estimates, and whiskers represent the 95% confidence intervals. (C) A comparison of the performance of the 3 Cox models on the training and held-out test data sets. The line plot displays the time-dependent area under the receiver operating characteristic curve (AUROC) for each of the 3 models at years 4, 8, 12, and 16 after enrolment in the UKB; the mean AUROCs and overall C indexes in the training and test data are summarized beneath the plot. The table presents the number of participants in the training and test data sets at risk at years 4, 8, 12, and 16, and the number of events (myeloma diagnoses) that occurred between years 0 and 4, 4 and 8, 8 and 12, and 12 and 16.
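The overall C index reported in panel C summarizes how well each Cox model ranks participants by predicted risk. The caption does not state which concordance estimator was used, so the following is only a minimal sketch assuming Harrell's definition, with a higher score meaning higher predicted risk (for example, a Cox linear predictor). The function name and toy data are hypothetical, and pairs with tied follow-up times are simply skipped.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data (sketch).

    times: follow-up time for each participant
    events: 1 if the event (e.g. a myeloma diagnosis) was observed, 0 if censored
    risk_scores: model output; higher values mean higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if participant i had the event
            # strictly before participant j's follow-up ended.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event was assigned higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: 4 participants, the third one censored.
print(harrell_c_index([2, 5, 8, 10], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1]))  # 0.8
```

A C index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. This double loop is O(n²), so at cohort scale a library implementation with a faster algorithm would be preferable.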
