Abstract
Patients with multiple comorbidities and those undergoing complex cardiac surgery may experience extubation failure and reintubation. The aim of this study was to establish an extubation prediction model using explainable machine learning and to identify the most important predictors of extubation failure in patients undergoing cardiac surgery. Data from 776 adult patients who underwent cardiac surgery and were intubated for more than 24 h were obtained from the Medical Information Mart for Intensive Care (MIMIC)-IV database. The primary endpoint was extubation failure according to the WIND criteria, which 205 patients experienced. The data were split into a training set (80%) and a test set (20%). XGBoost performed best (AUC 0.793, mean precision 0.700, Brier score 0.150), outperforming logistic regression (AUC 0.766, mean precision 0.553, Brier score 0.173) and random forest (AUC 0.791, mean precision 0.510, Brier score 0.181). The most important predictor of extubation failure was the mean anion gap in the 24 h before extubation. The other main features included ventilator parameters and blood gas indicators. By applying machine learning to large datasets, we developed a new method for predicting extubation failure after cardiac surgery in critically ill patients. Based on the predictive factors analyzed, internal environmental indicators and ventilation characteristics were important predictors of extubation failure. Therefore, these predictive factors should be considered when determining extubation readiness.
Introduction
Mechanical ventilation is a common treatment option for patients undergoing heart surgery. According to a previous study, the prevalence of prolonged mechanical ventilation after cardiac surgery is approximately 6.2% [1]; however, long-term mechanical ventilation can lead to complications, such as ventilator-associated pneumonia, tracheostomy, and muscle weakness [2,3,4]. Therefore, early extubation after cardiac surgery is beneficial for patients and can improve survival and reduce complications [5]. However, elderly patients with multiple comorbidities and those undergoing complex cardiac surgery may not be suitable for early extubation and may experience extubation failure and reintubation [1]. Extubation failure can occur shortly after or within days of extubation and can lead to significantly prolonged mechanical ventilation, prolonged hospital length of stay, and increased mortality [6].
Currently, extubation failure after cardiac surgery can be predicted using traditional indicators such as respiratory rate (RR), minute ventilation, tidal volume (VT), and rapid shallow breathing index (RSBI) [7,8,9,10]. To date, all these factors have been assessed separately, but individual parameters rarely provide sufficient accuracy to guide decision-making [11,12]. Recently, several studies have utilized artificial intelligence (AI), particularly machine learning (ML), to predict extubation failure or success [11,13,14,15,16,17,18]. However, mechanical ventilator data were usually missing, and these studies did not focus on patients who underwent cardiac surgery.
Therefore, we aimed to establish an extubation prediction model using explainable machine learning and identify the most important predictors of extubation failure in patients undergoing cardiac surgery.
Methods
Data source
The data used in this study came from the Medical Information Mart for Intensive Care (MIMIC)-IV database.
MIMIC-IV is a publicly available dataset produced by the Laboratory for Computational Physiology. It contains deidentified health data for thousands of patients who were admitted to the intensive care unit at Beth Israel Deaconess Medical Centre (BIDMC) between 2008 and 2019 [19,20]. The dataset is used extensively by researchers and professionals worldwide and has facilitated advances in clinical informatics, epidemiology, and related disciplines.
Patients
All patients who underwent cardiac surgery and were mechanically ventilated for more than 24 h were eligible for inclusion in the study.
Outcomes
The primary endpoint was extubation failure according to the WIND criteria. Under these criteria, extubation is considered successful when the patient is extubated without reintubation or death within 7 days, or is discharged from the ICU without the need for invasive mechanical ventilation within 7 days [21]; extubation failure is the absence of such success. Noninvasive ventilation was not taken into account in this definition. Because the causes of extubation were not routinely recorded, we did not distinguish between accidental and intentional extubation.
Predictive variables
Predictive variables for the model were identified through a comprehensive review of the relevant literature, clinical guidelines, and exploration of the dataset [11,13,22,23].
To facilitate interpretation of the model, each extracted variable was summarized as its mean value in the 24 h before extubation. The variables included patient demographics (age, body mass index, surgery type), laboratory studies (hemoglobin, platelets, white blood cells, partial pressure of oxygen, partial pressure of carbon dioxide, PaO2/FiO2 ratio, pH, lactate, anion gap, bicarbonate, blood urea nitrogen, calcium, chloride, creatinine, glucose, sodium, potassium), ventilator parameters (respiratory rate, minute ventilation volume, tidal volume, plateau pressure, positive end-expiratory pressure, fraction of inspired oxygen), and comorbidities. Variables with more than 30% missing values were excluded from the analysis to ensure the quality of the dataset.
Modeling
We compared three machine learning models, namely logistic regression, random forest, and XGBoost, for predicting extubation failure. We chose these models because the importance of predictive factors is easy to determine for each of them. Hyperparameter tuning was performed using grid search with five-fold cross-validation on the training set to optimize model performance. The specific configurations for each model are detailed in Supplementary Table S1. The sample comprised 776 adult patients, which was considered sufficient for model development. The dataset was first randomly partitioned into a training set (80%) and a test set (20%). This split was performed before any imputation or handling of missing data so that no information from the test set could influence the imputation process or the training of the models. Missing data in the training set were imputed with Multivariate Imputation by Chained Equations (MICE), implemented in the "mice" package in R (version 4.1.3); imputation was performed within the training set only.
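The ordering described above (split first, then fit the imputation on the training rows alone) can be illustrated with a minimal Python sketch. It is not the study's pipeline: simple mean imputation stands in for MICE, and the toy records, the function names, and the choice to reuse training-set fill values on the test rows are all assumptions made for illustration.

```python
import random
from statistics import mean

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Randomly partition rows into (train, test) before any imputation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_mean_imputer(train_rows, features):
    """Learn one fill value per feature from the training rows only.
    Assumes every feature has at least one observed training value."""
    return {
        f: mean(r[f] for r in train_rows if r[f] is not None)
        for f in features
    }

def apply_imputer(rows, fill_values):
    """Fill missing entries with the values learned on the training set."""
    return [
        {k: fill_values[k] if v is None and k in fill_values else v
         for k, v in r.items()}
        for r in rows
    ]

# Hypothetical toy records; None marks a missing measurement.
records = [
    {"anion_gap": None if i % 4 == 0 else 10.0 + i, "peep": 5.0 + i % 3}
    for i in range(10)
]

# 1. Split first, so the test rows cannot influence the imputation.
train_rows, test_rows = split_train_test(records)
# 2. Fit the imputer on the training rows only.
fill_values = fit_mean_imputer(train_rows, ["anion_gap"])
train_imputed = apply_imputer(train_rows, fill_values)
# Assumption: test rows reuse the training-set statistics unchanged.
test_imputed = apply_imputer(test_rows, fill_values)
```

Because the fill values are computed before the test rows are touched, changing any test measurement cannot change what the model sees during training, which is the property the split-before-imputation design is meant to guarantee.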
MICE generates multiple plausible imputations for each missing value based on the observed data, thus reducing bias. The test set was not used in any part of data preparation, including imputation, so no data leakage occurred between the training and test sets. This preserves the integrity of the evaluation metrics and prevents the model from having prior knowledge of the test set.
The training set underwent five-fold cross-validation to mitigate the bias that may arise from the random partitioning of the dataset. Model performance was evaluated with three metrics: the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (or mean precision), and the Brier score together with the calibration curve. Performance was reported as the mean and standard deviation of these metrics across the five internal validations, providing a more robust evaluation than a single partition.
The Shapley Additive Explanation (SHAP) framework was employed to assess the significance of predictive variables. SHAP values represent the marginal contribution of a predictive factor to the overall prediction [24] and are a state-of-the-art approach to machine learning interpretability. Apart from the multiple imputation of missing data in R described above, all analyses in this study were performed using Python (version 3.8.5).
Ethical consideration
This study was conducted in compliance with the World Medical Association Declaration of Helsinki.
This retrospective study did not require informed consent, and approval for this project was received from BIDMC and the institutional review boards of the Massachusetts Institute of Technology (MIT) (certification number: 9322422).
Results
Patient characteristics
A total of 776 adult patients who underwent cardiac surgery in the past 10 years were reviewed. Among them, 205 patients (26.4%) experienced extubation failure. The specific selection process is shown in Fig. 1. The characteristics of the populations with successful and failed extubation are summarized in Table 1.
Fig. 1 Study patient selection and exclusion process.
Table 1 Patient characteristics.
Modeling
The performance of the three machine learning algorithms is shown in Table 2. XGBoost performed best (AUC 0.793 ± 0.012, mean precision 0.700 ± 0.060, Brier score 0.150 ± 0.023), outperforming logistic regression (AUC 0.766 ± 0.014, mean precision 0.553 ± 0.055, Brier score 0.173 ± 0.022) and random forest (AUC 0.791 ± 0.015, mean precision 0.510 ± 0.059, Brier score 0.181 ± 0.025). In addition, Fig. 2 summarizes the predictive performance indicators of the XGBoost algorithm.
Table 2 Model performance.
Fig. 2 Model performance metrics for the XGBoost models. (a) Receiver operating characteristic curves. (b) Precision-recall curves. (c) Calibration curves and calculated Brier scores. AUC: area under the receiver operating characteristic curve; AUPRC: area under the precision-recall curve; B.S: Brier scores.
Predictor importance
The XGBoost model was used to calculate the importance of predictive variables because it achieved the highest performance. The SHAP values are shown in Fig. 3, which helps clarify the contribution of each feature in the model to the prediction results.
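SHAP attributions are built on Shapley values, which average a feature's marginal contribution over every order in which features can be added to the model. The minimal Python sketch below computes exact Shapley values for a hypothetical three-feature risk score. It is not the SHAP library and not the study's XGBoost model: the scoring function, its coefficients, the feature values, and the baseline are all invented for illustration.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to baseline.

    Features not yet added to a coalition keep their baseline value.
    Cost grows factorially with the number of features, so this is
    only practical for very small examples.
    """
    features = list(x)
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for f in order:
            current[f] = x[f]                # add feature f to the coalition
            value = predict(current)
            totals[f] += value - previous    # marginal contribution of f
            previous = value
    return {f: totals[f] / len(orderings) for f in features}

def toy_risk(p):
    """Hypothetical risk score with an interaction term (illustrative only)."""
    return (0.02 * p["anion_gap"]
            + 0.01 * p["peep"]
            + 0.001 * p["anion_gap"] * p["peep"])

# Hypothetical patient and reference ("average") patient.
patient = {"anion_gap": 18.0, "peep": 10.0, "tidal_volume": 450.0}
baseline = {"anion_gap": 12.0, "peep": 5.0, "tidal_volume": 450.0}

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

In this toy setting the contributions sum to the difference between the patient's score and the baseline score, and a feature the score ignores (here `tidal_volume`) receives a contribution of zero. A positive contribution pushes the prediction toward higher risk and a negative one toward lower risk.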
The most important predictor of extubation failure was the mean value of the anion gap in the 24 h before extubation. The other main features included ventilator parameters and blood gas indicators.
Fig. 3 SHAP illustration of the extubation prediction model at the feature level. Each dot represents the Shapley value of one sample for that feature. A feature's Shapley value represents the association of that feature with the risk score, with positive values indicating an association with a higher risk of extubation failure, and negative values indicating an association with a lower risk of extubation failure. The location of the dot on the x-axis represents its Shapley value, and its color represents the feature's value. SHAP: Shapley Additive Explanation.
Discussion
In this study, we used a large clinical database to train three machine-learning models to predict extubation failure in mechanically ventilated patients after cardiac surgery and found that XGBoost exhibited the best predictive ability. The most important predictors of extubation failure were mean anion gap, creatinine level, and ventilation parameters within 24 h before extubation. These risk factors may help critical care professionals choose the optimal time for extubation.
This study is unique, and while our findings provide important insights into predicting extubation failure, further steps, including external validation across different patient populations and healthcare settings, are necessary before the model can be used to guide extubation decisions in clinical practice. The model's performance in external validation will be critical in determining its generalizability and applicability in real-world clinical settings. In addition, this prediction model is designed for deployment as part of a clinical decision support system.
By integrating with electronic health record systems, the model could provide clinicians with real-time predictions and recommendations regarding extubation readiness. This integration could enhance decision-making efficiency and accuracy and might reduce extubation failure rates. Furthermore, external validation across various clinical environments is essential for ensuring the model's robustness and adaptability to different patient demographics, institutional protocols, and resource settings. Such validation will be a key determinant of the model's readiness for widespread deployment.
In the previous literature, we found that most studies have focused on fast-track extubation in children [25,26,27,28,29], with little attention paid to predictive models, especially for critically ill patients who require ventilation for more than 24 h and who are at a higher risk of extubation failure. These prior studies considered only risk factors, which makes it difficult to provide guidance for extubation practices and the assessment of extubation readiness.
In our study, the mean anion gap in the 24 h before extubation was the most important predictor of extubation failure, along with other important factors, including creatinine and pH. These biochemical and blood gas parameters reflect the internal environmental status of the body. Previous studies [18,30,31] have shown that biochemical and blood gas indicators can be used to guide extubation. The importance of the anion gap is likely due to its association with shifts in the internal environment, particularly metabolic acidosis, which can impair the patient's respiratory compensation mechanisms. Elevated anion gap levels may indicate underlying conditions such as renal dysfunction or lactic acidosis, which can exacerbate respiratory failure and hinder the successful transition from mechanical ventilation to spontaneous breathing.
It is important to note that creatinine levels in patients with chronic kidney disease, such as those undergoing dialysis, may not accurately reflect their acute renal status or respiratory readiness for extubation. In patients with preoperative renal dysfunction, elevated creatinine levels may be indicative of chronic disease rather than acute kidney injury. Therefore, the interpretation of creatinine levels in our model should be considered in the context of the patient's preoperative renal status. In such cases, additional clinical markers, such as blood urea nitrogen and electrolyte imbalances, should be incorporated to provide a more comprehensive assessment of extubation readiness.
Other important predictive factors were the ventilator parameters. Among these indicators, the two most important predictive factors for a higher probability of extubation failure were positive end-expiratory pressure (PEEP) and plateau pressure (Pplat). One possible explanation is that these two indicators reflect the degree of lung function impairment, with high values indicating severe disease. In addition, patients with high PEEP and Pplat levels may experience increased lung injury induced by mechanical power [32,33,34], which may increase the risk of unsuccessful extubation. Similarly, a high tidal volume was also associated with extubation failure.
This retrospective study had some limitations. First, the data in this study came from the MIMIC database, which contains single-center data; this may limit the applicability of our models to other clinical environments. In the future, multicenter studies of diverse patient populations are required to determine the generalizability of our findings. Second, this study did not distinguish between accidental and intentional extubation. This is a major limitation, as these two types of extubation failure may have different underlying causes and could affect the generalizability of the findings.
Future studies should aim to differentiate between accidental and intentional extubation failure to better understand the factors influencing extubation outcomes. Finally, the relationship identified in this study was an association and not a causal relationship.
Conclusion
By applying machine learning to large datasets, we developed a new method for predicting extubation failure after cardiac surgery in critically ill patients. Based on the predictive factors analyzed, internal environmental indicators and ventilation characteristics were important predictors of extubation failure. Therefore, these predictive factors should be considered when determining extubation readiness.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
MIMIC: Medical Information Mart for Intensive Care
SHAP: Shapley Additive Explanation
RR: Respiratory rate
VT: Tidal volume
RSBI: Rapid shallow breathing index
ML: Machine learning
BIDMC: Beth Israel Deaconess Medical Centre
MIT: Massachusetts Institute of Technology
AUC: Area under the receiver operating characteristic curve
PEEP: Positive end-expiratory pressure
Pplat: Plateau pressure
References
Sharma, V. et al. A derived and validated score to predict prolonged mechanical ventilation in patients undergoing cardiac surgery. J. Thorac. Cardiovasc. Surg. 153, 108–115. https://doi.org/10.1016/j.jtcvs.2016.08.020 (2017).
Bigatello, L. M., Stelfox, H. T., Berra, L., Schmidt, U. & Gettings, E. M. Outcome of patients undergoing prolonged mechanical ventilation after critical illness. Crit. Care Med. 35, 2491–2497. https://doi.org/10.1097/01.Ccm.0000287589.16724.B2 (2007).
Roquilly, A. et al. Pathophysiological role of respiratory dysbiosis in hospital-acquired pneumonia. Lancet Respiratory Med. 7, 710–720. https://doi.org/10.1016/s2213-2600(19)30140-7 (2019).
Weissman, C. Pulmonary complications after cardiac surgery. Semin. Cardiothorac. Vasc. Anesth. 8, 185–211. https://doi.org/10.1177/108925320400800303 (2004).
Martin, S., Jackson, K., Anton, J. & Tolpin, D. A. Pro: Early extubation (< 1 hour) after cardiac surgery is a useful, safe, and cost-effective method in select patient populations. J. Cardiothorac. Vasc. Anesth. 36, 1487–1490. https://doi.org/10.1053/j.jvca.2021.12.004 (2022).
Beverly, A., Brovman, E. Y., Malapero, R. J., Lekowski, R. W. & Urman, R. D. Unplanned reintubation following cardiac surgery: incidence, timing, risk factors, and outcomes. J. Cardiothorac. Vasc. Anesth. 30, 1523–1529. https://doi.org/10.1053/j.jvca.2016.05.033 (2016).
Baptistella, A. R. et al. Predictive factors of weaning from mechanical ventilation and extubation outcome: A systematic review. J. Crit. Care. 48, 56–62. https://doi.org/10.1016/j.jcrc.2018.08.023 (2018).
Luo, L. et al. Different effects of cardiac and diaphragm function assessed by ultrasound on extubation outcomes in difficult-to-wean patients: a cohort study. BMC Pulm. Med. 17, 161. https://doi.org/10.1186/s12890-017-0501-8 (2017).
Rady, M. Y. & Ryan, T. Perioperative predictors of extubation failure and the effect on clinical outcome after cardiac surgery. Crit. Care Med. 27, 340–347. https://doi.org/10.1097/00003246-199902000-00041 (1999).
Sanson, G., Sartori, M., Dreas, L., Ciraolo, R. & Fabiani, A. Predictors of extubation failure after open-chest cardiac surgery based on routinely collected data. The importance of a shared interprofessional clinical assessment. Eur. J. Cardiovasc. Nurs. 17, 751–759. https://doi.org/10.1177/1474515118782103 (2018).
Fleuren, L. M. et al. Predictors for extubation failure in COVID-19 patients using a machine learning approach. Crit. Care. (London, England). 25, 448. https://doi.org/10.1186/s13054-021-03864-3 (2021).
Heunks, L. M. & van der Hoeven, J. G. Clinical review: the ABC of weaning failure–a structured approach. Crit. Care. (London, England). 14, 245. https://doi.org/10.1186/cc9296 (2010).
Fabregat, A. et al. A machine learning decision-making tool for extubation in intensive care unit patients. Comput. Methods Programs Biomed. 200, 105869. https://doi.org/10.1016/j.cmpb.2020.105869 (2021).
Hsieh, M. H. et al. An artificial neural network model for predicting successful extubation in intensive care units. J. Clin. Med. 7. https://doi.org/10.3390/jcm7090240 (2018).
Hsieh, M. H. et al. Predicting weaning difficulty for planned extubation patients with an artificial neural network. Medicine 98, e17392. https://doi.org/10.1097/md.0000000000017392 (2019).
Kuo, H. J. et al. Improvement in the prediction of ventilator weaning outcomes by an artificial neural network in a medical ICU. Respir. Care. 60, 1560–1569. https://doi.org/10.4187/respcare.03648 (2015).
Lin, M. Y. et al. Explainable machine learning to predict successful weaning among patients requiring prolonged mechanical ventilation: A retrospective cohort study in central Taiwan. Front. Med. 8, 663739. https://doi.org/10.3389/fmed.2021.663739 (2021).
Tsai, T. L., Huang, M. H., Lee, C. Y. & Lai, W. W. Data science for extubation prediction and value of information in surgical intensive care unit. J. Clin. Med. 8. https://doi.org/10.3390/jcm8101709 (2019).
Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A. & Mark, R. MIMIC-IV (version 2.2). PhysioNet (2023).