Construction of a machine learning model to predict fungal infection in children with leukemia based on clinical and imaging features
Ge Peng, Zhao Lian, Qian Jing, Li Guohui
1. Department of Radiology, Children’s Hospital of Soochow University, Jiangsu, Suzhou 215000, China; 2. Department of Hematology, Children’s Hospital of Soochow University, Jiangsu, Suzhou 215000, China
Objective To explore the application of machine learning methods in evaluating the clinical and imaging features of fungal infections in children with leukemia, and to establish an effective predictive model.
Methods A retrospective study was conducted. Forty children with leukemia and fungal infection hospitalized in the Department of Hematology, Children's Hospital of Soochow University from January 2021 to January 2023 were enrolled as the fungal infection group, and 150 children with non-fungal infection admitted to the same department during the same period were randomly selected using systematic sampling as the non-fungal infection group. The clinical and imaging features of the two groups were compared. Features with statistically significant differences were used to build predictive models with four machine learning algorithms: logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost). The performance of the four models was evaluated using the area under the receiver operating characteristic curve (AUC). A feature importance matrix plot and Shapley additive explanations (SHAP) values were used to assess the importance of the features and to visualize the results. Statistical analysis was performed using the independent samples t-test, Mann-Whitney U test, χ² test, and Fisher's exact test.
Results The platelet count in the fungal infection group was lower than that in the non-fungal infection group [88.50 (39.25, 260.75) × 10⁹/L vs 191.00 (88.00, 267.25) × 10⁹/L, Z=-2.628, P=0.009], while the levels of C-reactive protein (CRP) [28.00 (4.28, 80.13) mg/L vs 4.37 (0.67, 9.46) mg/L, Z=-4.978, P<0.001] and procalcitonin (PCT) [0.28 (0.08, 0.44) μg/L vs 0.11 (0.05, 0.22) μg/L, Z=-3.027, P=0.002], and the proportions of stem cell transplantation (47.5% vs 16.0%, χ²=17.895, P<0.001), reticular/linear opacities (75.0% vs 34.7%, χ²=20.941, P<0.001), pulmonary nodules (62.5% vs 16.7%, χ²=34.211, P<0.001), air bronchogram sign (37.5% vs 10.7%, χ²=16.653, P<0.001), bronchiectasis (17.5% vs 3.3%, χ²=10.711, P=0.004), ground-glass opacity (GGO) (77.5% vs 38.0%, χ²=19.816, P<0.001), cavitation (25.0% vs 4.0%, χ²=18.058, P<0.001), air crescent sign (12.5% vs 0%, P<0.001), mediastinal lymphadenopathy (45.0% vs 6.0%, χ²=39.399, P<0.001), pleural effusion (32.5% vs 8.7%, χ²=15.186, P<0.001), and pleural thickening (52.5% vs 7.3%, χ²=45.997, P<0.001) were all significantly higher in the fungal infection group than in the non-fungal infection group. Among the four machine learning models, the RF model had the highest AUC (0.910), outperforming the XGBoost (AUC=0.906), LR (AUC=0.887), and SVM (AUC=0.880) models. Based on SHAP values, pleural thickening, CRP, and mediastinal lymphadenopathy were the three most important features in the RF model.
Conclusion The RF model can be used to predict the risk of fungal infection in children with leukemia. The most important influencing factors of the model are pleural thickening, CRP, and mediastinal lymphadenopathy.
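The χ² statistics reported in the Results can be checked from the group sizes and percentages alone. The sketch below is illustrative only and is not the authors' analysis code: the function name `chi_square_2x2` is hypothetical, the cell counts are back-calculated from the reported percentages (for example, 47.5% of 40 is taken as 19 children), and it assumes a Pearson χ² test without continuity correction.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    a, b: feature present / absent in the fungal infection group
    c, d: feature present / absent in the non-fungal infection group
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Counts are assumptions inferred from the reported percentages (n = 40 and n = 150).
# Stem cell transplantation: 47.5% of 40 = 19; 16.0% of 150 = 24.
stem_cell = chi_square_2x2(19, 21, 24, 126)
# Pleural thickening: 52.5% of 40 = 21; 7.3% of 150 is taken as 11.
pleural = chi_square_2x2(21, 19, 11, 139)

print(round(stem_cell, 3), round(pleural, 3))  # prints: 17.895 45.997
```

Under these assumptions both values agree with the reported χ²=17.895 and χ²=45.997, which suggests that these two comparisons were made without continuity correction.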