Data source and study population
This retrospective cohort study was conducted using the Medical Information Mart for Intensive Care IV (MIMIC-IV) open-source clinical database, which contains records of more than 40,000 patients admitted to the ICU of Beth Israel Deaconess Medical Center between 2008 and 2019 (20). The MIMIC-IV database can be freely used after successful application and ethical approval from the Institutional Review Boards of both Beth Israel Deaconess Medical Center (Boston, MA, USA) and the Massachusetts Institute of Technology (Cambridge, MA, USA).
SAE was defined as sepsis with a Glasgow Coma Scale (GCS) score ≤ 14 or delirium (ICD-9 codes 2930 and 2931). Delirium caused by alcohol or drug abuse, dementia, mental disorders, or neurological diseases was excluded. The GCS was considered an important determinant for characterizing SAE and distinguishing it from sepsis (14). Our study included patients according to the Third International Consensus Definitions for Sepsis (Sepsis-3) (21): (i) patients with infection confirmed by positive microbial cultures and (ii) a Sequential Organ Failure Assessment (SOFA) score ≥ 2. Patients were excluded if they (14): (i) had a primary brain injury (traumatic brain injury, ischemic stroke, hemorrhagic stroke, epilepsy, or intracranial infection); (ii) had pre-existing liver or kidney failure affecting consciousness; (iii) had severe burns or trauma; (iv) had recently received cardiac resuscitation; (v) had chronic alcohol or drug abuse; (vi) had severe electrolyte imbalances or blood glucose disturbances, including hyponatremia (< 120 mmol/l), hyperglycemia (> 180 mg/dl), or hypoglycemia (< 54 mg/dl); (vii) died or left the ICU within 24 hours of admission; (viii) had no GCS evaluation; or (ix) were < 17 years of age. Eligible patients were enrolled into the final cohort, and the inclusion process is illustrated in Fig. 1.
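The labelling rule and a subset of the numeric exclusion criteria above can be sketched in code. This is illustrative only: the column names (`gcs_min`, `icd9_code`, `age`, `icu_los_hours`, `sodium`, `glucose`) are assumptions, not the actual MIMIC-IV table identifiers.

```python
import pandas as pd

# ICD-9 delirium codes named in the SAE definition.
DELIRIUM_ICD9 = {"2930", "2931"}

def label_sae(row: pd.Series) -> int:
    """Label a sepsis patient as SAE (1) if GCS <= 14 or a delirium
    ICD-9 code (2930/2931) is recorded, otherwise non-SAE (0)."""
    has_delirium = row["icd9_code"] in DELIRIUM_ICD9
    return int(row["gcs_min"] <= 14 or has_delirium)

def apply_exclusions(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the numeric exclusion criteria from the text: age < 17,
    ICU stay < 24 h, and electrolyte/glucose disturbances."""
    keep = (
        (df["age"] >= 17)
        & (df["icu_los_hours"] >= 24)
        & (df["sodium"] >= 120)             # exclude hyponatremia < 120 mmol/l
        & (df["glucose"].between(54, 180))  # exclude hypo-/hyperglycemia
    )
    return df[keep]
```

The comorbidity-based exclusions (brain injury, liver or kidney failure, and so on) would be applied analogously from diagnosis codes.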
A total of 56 features were extracted for all patients, including categorical variables such as comorbidities, mechanical ventilation, and the first care unit within 24 h of ICU admission, along with continuous variables such as laboratory tests, vital signs, and demographic characteristics. The completeness of the selected features was above 80%, and multiple imputation (22) was used to fill in missing values. Categorical variables were encoded as 0/1 indicators in advance; these included gender, ethnicity, first care unit, comorbidities, microorganisms, mechanical ventilation, and vasopressor use, as listed in Table 1. For example, patients with hypertension were coded as 1 and patients without hypertension as 0, so that categorical variables could be handled before entering the model. All variables were normalized to the 0–1 range before modeling. For indicators measured more than once per day, the mean, maximum, and minimum values were calculated to capture patient information in more detail.
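The preprocessing steps above can be sketched as follows, using scikit-learn's `IterativeImputer` (a chained-equations imputer) as a stand-in for the multiple-imputation step; the column names are hypothetical.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import MinMaxScaler

def aggregate_daily(measurements: pd.DataFrame) -> pd.DataFrame:
    """Collapse repeated daily measurements into mean/max/min per patient."""
    return measurements.groupby("patient_id")["value"].agg(["mean", "max", "min"])

def preprocess(features: pd.DataFrame, binary_cols: list) -> pd.DataFrame:
    out = features.copy()
    for col in binary_cols:          # binary categoricals -> 0/1 indicators
        out[col] = out[col].astype(int)
    cont = [c for c in out.columns if c not in binary_cols]
    # Chained-equations imputation as a stand-in for multiple imputation.
    out[cont] = IterativeImputer(random_state=0).fit_transform(out[cont])
    # Min-max normalisation to the 0-1 range before modelling.
    return pd.DataFrame(MinMaxScaler().fit_transform(out),
                        columns=out.columns, index=out.index)
```

In practice one imputed dataset per chain would be pooled; a single `IterativeImputer` pass is shown here for brevity.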
Classification model and model interpretation
The overall experimental design is shown in Fig. 2. First, according to the inclusion criteria, the corresponding data were extracted and cleaned. These features were then fed into different machine learning classifiers to choose the best model. We randomly split the SAE and non-SAE data in a 7:2:1 ratio for training, internal validation, and testing, respectively: 10% of the data were set aside for final testing, and tenfold cross-validation was applied to the remaining 90%. Six machine learning classifiers were employed to predict the occurrence of SAE: Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Light Gradient Boosting Machine (LightGBM), Decision Tree (DT), and Support Vector Machine (SVM). The performance of the classifiers was compared by the area under the receiver operating characteristic curve (AUC-ROC). To identify features potentially relevant to the occurrence of SAE and make the model interpretable, Shapley additive explanations (SHAP) (23) were used to analyze feature importance and cut-off values, and finally to make interpretable predictions for a single sample. SHAP is based on game theory and decomposes the model prediction into a sum of contributions from all features, so the effect of each feature on the final prediction can be measured by its SHAP value. The SHAP package and the machine learning model packages were imported into a Python 3.7 environment; documentation is available at https://shap.readthedocs.io/en/latest/api.html.
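The model-selection loop described above can be sketched as follows, on synthetic data. XGBoost and LightGBM are omitted to keep the example dependency-free; they plug into the same loop, and the SHAP step on the winning model is indicated in a comment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the extracted SAE / non-SAE feature matrix.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hold out 10% for final testing; tenfold CV runs on the remaining 90%.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0)

classifiers = {
    "GBDT": GradientBoostingClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),
}
# Compare models by mean cross-validated AUC-ROC.
scores = {name: cross_val_score(clf, X_dev, y_dev, cv=10,
                                scoring="roc_auc").mean()
          for name, clf in classifiers.items()}
best_name = max(scores, key=scores.get)
# Feature attributions for the winning tree model would then be computed
# with shap.TreeExplainer(best_model).shap_values(X_test).
```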
Data are presented in Table 1 according to the type and distribution of each variable. Normality of continuous variables was assessed with the Shapiro–Wilk test. Normally distributed continuous variables, non-normally distributed continuous variables, and categorical variables were expressed as mean ± standard deviation, quartiles, and count (percentage), respectively; group differences were tested with the two-sample independent t-test, the rank-sum test, and the Pearson chi-square test, respectively. SPSS software for Windows (version 25.0, SPSS Inc., Chicago, IL, USA) was used for the statistical analyses, with an alpha level of 0.05 for statistical significance.
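The per-variable comparison routine described above can be sketched with SciPy, using synthetic data in place of the SAE and non-SAE groups.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sae = rng.normal(60, 10, 200)       # e.g. age in the SAE group (synthetic)
non_sae = rng.normal(58, 10, 200)   # e.g. age in the non-SAE group (synthetic)

# 1. Shapiro-Wilk normality check decides which two-sample test to use.
normal = (stats.shapiro(sae).pvalue > 0.05
          and stats.shapiro(non_sae).pvalue > 0.05)

# 2. Normally distributed -> independent t-test; otherwise -> rank-sum test.
if normal:
    stat, p = stats.ttest_ind(sae, non_sae)
else:
    stat, p = stats.mannwhitneyu(sae, non_sae)

# 3. Categorical variables -> chi-square test on a contingency table
#    (hypothetical 2x2 counts, e.g. hypertension yes/no by group).
table = np.array([[120, 80], [90, 110]])
chi2, p_cat, dof, _ = stats.chi2_contingency(table)
```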
Model performance evaluation method
We used AUC-ROC, AUC-PR, sensitivity, specificity, and the F1 score, metrics commonly used in machine learning, to evaluate and compare model performance. SHAP was used to explain the model predictions. To further evaluate the interpretability of the model, we invited six neurosurgeons and ICU physicians to score its prediction results. The physicians scored the cut-off points of the significant indicators from Fig. 4c and then offered values based on their own medical judgment. By comparing the model's interpretation with the physicians' evaluations, we could make an objective clinical assessment of the interpretable model.
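The evaluation metrics named above can be computed as follows on a toy set of labels and predicted probabilities; a 0.5 threshold is assumed for the threshold-dependent metrics.

```python
import numpy as np
from sklearn.metrics import (average_precision_score, confusion_matrix,
                             f1_score, roc_auc_score)

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6])
y_pred = (y_prob >= 0.5).astype(int)          # threshold the probabilities

auc_roc = roc_auc_score(y_true, y_prob)       # area under the ROC curve
auc_pr = average_precision_score(y_true, y_prob)  # area under the PR curve
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                  # recall on the positive class
specificity = tn / (tn + fp)
f1 = f1_score(y_true, y_pred)
# Here auc_roc = 0.875 and sensitivity = specificity = f1 = 0.75.
```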
The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The data used are from publicly available datasets.
The Institutional Review Board at Beth Israel Deaconess Medical Center waived the requirement for informed consent because the project did not impact clinical care and all protected health information was deidentified. The study conformed to the provisions of the Declaration of Helsinki (as revised in 2013). The study protocol was approved by the Beijing Institute of Technology.