nature.com

Machine learning models to predict osteoporosis in patients with chronic kidney disease stage 3–5 and end-stage kidney…

Abstract

Chronic kidney disease-mineral bone disorder is a common complication in patients with chronic kidney disease (CKD) and end-stage kidney disease (ESKD), and it increases the risk of osteoporosis and fractures. This study aimed to develop predictive machine-learning (ML) models to identify osteoporosis risk in patients with CKD stages 3–5 and ESKD. We retrospectively analyzed a de-identified osteoporosis database from a Taiwanese hospital, including 6614 patients with CKD stages 3–5 and ESKD who underwent bone mineral density (BMD) scans between January 2011 and June 2022. Nine ML algorithms were applied to predict osteoporosis: logistic regression, XGBoost, LightGBM, CatBoost, SVM, decision tree, random forest, k-nearest neighbors, and an artificial neural network (ANN). The ANN model achieved the highest predictive performance, with an area under the curve (AUC) of 0.940 on the validation and 0.930 on the test datasets. The receiver operating characteristic curve, confusion matrix, and predictive probability histogram revealed that the ANN model performed well in terms of discrimination. Calibration and decision curve analyses further demonstrated the reliability and applicability of the ANN model. The ANN model demonstrated the potential for clinical implementation in screening high-risk patients for osteoporosis.

Introduction

Chronic kidney disease-mineral bone disorder (CKD-MBD) is one of the most common complications in patients with chronic kidney disease (CKD) and end-stage kidney disease (ESKD). The impact of CKD-MBD is increasingly being recognized with the global rise in CKD1. Skeletal abnormalities associated with CKD-MBD lead to a significantly higher risk of osteoporosis and fractures. Studies show that 18–32% of patients with CKD have osteoporosis2, with a fracture risk over 2.5 times higher than in those without CKD. For patients with ESKD, the risk is more than four times higher. Fragility fractures in patients with advanced CKD and ESKD lead to increased morbidity, mortality, and healthcare costs2,3,4,5,6.

Bone biopsy is considered the gold standard for diagnosing CKD-MBD; however, its invasiveness and cost limit its use. The 2017 Kidney Disease Improving Global Outcomes (KDIGO) guidelines recommend bone densitometry for assessing fracture risk in patients with CKD, as disruptions in bone and mineral metabolism caused by CKD increase fracture risk7. Osteoporosis is diagnosed by a T-score ≤ −2.5 at the lumbar spine, femoral neck, or total hip, using bone mineral density (BMD) testing via dual-energy X-ray absorptiometry (DXA)8. DXA is the most widely used method to measure BMD and diagnose osteoporosis. However, in clinical practice, many patients with CKD do not undergo BMD testing. In Taiwan, DXA scan access is limited by healthcare coverage restrictions9, and machines are not available in all clinical settings because of their cost and space requirements. Therefore, tools are required to identify which patients with CKD should undergo DXA scanning.

Artificial intelligence has been widely employed to assist in disease diagnosis and outcome prediction. Machine learning (ML) models are mathematical algorithms used to identify patterns in data10,11.
Several studies have explored ML applications for predicting osteoporosis, focusing on diverse populations, including the general population12,13,14,15,16,17,18,19,20, postmenopausal women21,22,23,24, and patients with breast cancer25, rheumatoid arthritis26, and type 2 diabetes mellitus27,28. However, no studies have employed multiple ML models to predict osteoporosis in patients with CKD stages 3–5 and ESKD. Therefore, our study aimed to apply various ML models to predict osteoporosis in this population and develop a one-year predictive ML model for osteoporosis risk.

Methods

Study participants

Our hospital has established a de-identified osteoporosis database that includes all participants who have undergone BMD scans. A total of 35,670 BMD scans were recorded in the database between January 1, 2011, and June 30, 2022. Demographic information, comorbidities, laboratory data, and BMD scan data were retrospectively extracted from the database. The study population selected from the database comprised participants with CKD stages 3–5 and ESKD who had undergone at least one BMD scan. We enrolled participants with CKD and ESKD who had at least two tests, separated by at least 3 months, showing an estimated glomerular filtration rate (eGFR) < 60 mL/min/1.73 m². Additionally, one of the two qualifying eGFR records had to be within one year prior to the BMD scan. We excluded participants who did not have a renal function test within one year prior to the BMD scan. A total of 6614 eligible participants were included in our cohort for further analysis. This study was approved by the Institutional Review Board of the Taichung Veterans General Hospital (No. SE24210B). All methods were carried out following the relevant institutional guidelines and regulations.
The Institutional Review Board of the Taichung Veterans General Hospital approved a waiver of informed consent, as all protected health information was de-identified, and the study was retrospective in nature.

Data extraction

All extracted patient information was de-identified. The demographic features used for the machine-learning models included age, sex, height, weight, and body mass index (BMI). Laboratory data included eGFR, serum creatinine (Cre), serum albumin, serum corrected calcium (Ca), serum phosphorus, serum alkaline phosphatase (ALP), serum intact parathyroid hormone (i-PTH), white blood cell count (WBC), hemoglobin (HGB), fasting glucose, and glycated hemoglobin. Comorbidities were identified using ICD-9 or ICD-10 codes and included osteoarthritis, rheumatoid arthritis, diabetes mellitus, hypertension, cerebrovascular accident, cataracts, and a history of previous fractures.

Additionally, we extracted data on the comorbidities related to secondary osteoporosis, including type 1 diabetes mellitus, osteogenesis imperfecta in adults, hyperthyroidism, hypogonadism, premature ovarian insufficiency, chronic malnutrition, and chronic liver disease. The comorbidities related to secondary osteoporosis were extracted from the Fracture Risk Assessment Tool (FRAX®)29 evaluations of these patients, which consider disorders associated with secondary osteoporosis. The assessment was conducted by our nurse case managers. Patients were classified as having chronic malnutrition if they had a diagnosis of ICD-10 codes E40–E46 or ICD-9 codes 260–263, or if our nurse case managers determined that they met the criteria for protein-energy wasting (PEW). The diagnosis of PEW was based on the definition established by the International Society of Renal Nutrition and Metabolism30. History of steroid use was also extracted if the patient had been taking oral glucocorticoids for more than 3 months at a daily dose of at least 5 mg of prednisolone.
eGFR was calculated using the 2021 Chronic Kidney Disease Epidemiology Collaboration creatinine equation31,32. Corrected calcium levels in the serum were estimated using the following formula: corrected calcium (mg/dL) = serum calcium (mg/dL) + 0.8 × (4 − serum albumin [g/dL])33.

Study design and label definition

Figure 1 illustrates the study design, label definitions, and architectures of the prediction models. According to the World Health Organization’s operational definition, the diagnosis of osteoporosis is based on a clinical history of fragility fractures or a T-score ≤ −2.5 at the lumbar spine, femoral neck, or total hip, as determined by BMD testing using DXA8. Participants were labeled as the “osteoporosis” group if their BMD scan results indicated osteoporosis. Conversely, participants were labeled as the “control” group if their BMD results did not show osteoporosis.

Fig. 1 Study design, label definitions, and the architecture of our prediction models.

We used a stratified random splitting strategy to divide the study cohort into 80% training and validation data and 20% test data. This strategy ensured that the proportion of osteoporosis cases was the same in both the training and test sets34. Next, we applied the same stratified random splitting strategy to further divide the training and validation data into 80% training and 20% validation subsets. To identify the optimal classifier, we performed five-fold cross-validation on the training and validation subsets. The optimal classifier, independent of the training/validation data, was used to predict the outcomes for each participant in the test dataset. This provided an unbiased final performance metric for the model35.

Data preprocessing and feature selection

Table 1 presents the missing data of the study cohort. We conducted Little’s missing completely at random (MCAR) test, which resulted in a P-value < 0.05, indicating that the missing data pattern did not meet the MCAR assumption36.
In other words, the data are not missing completely at random and may instead be missing at random or missing not at random. For example, the probability of missing BMI data is related to the height and weight features37. We conducted a pilot study using three different methods to impute missing values: mean imputation, multiple imputation, and K-nearest neighbors (KNN) imputation38. The results showed that multiple imputation yielded fair performance; therefore, we selected it as the final method for handling missing values. Continuous variables were standardized using standard scaling prior to training. Categorical variables were processed using one-hot encoding prior to training.

Table 1 Missing data in our study cohort.

Next, we applied recursive feature elimination with five-fold cross-validation (RFECV) to determine the optimal number of features39. The RFECV results indicated that 16 features were optimal. We reviewed the literature on ML applications for predicting osteoporosis12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28 and applied three methods to select features and gradually reduce the number of features. Table 2 outlines the three feature-selection methods and the features chosen in our study. In the first method, features were selected based on their importance17,18,23,24. In the second method, we used recursive feature elimination (RFE) with a random forest classifier to identify important features. RFE is a backward feature selection technique that optimizes the model performance by iteratively removing the least important features based on their predictive values20,24. In the third method, key features were selected from those identified in methods 1 and 2 through group discussions.
Additionally, during the feature-selection process in method 3, we considered the accessibility and practicality of the selected features for clinical use12,13.

Table 2 Feature selection methods and selected features in our study.

Machine learning models and performance evaluation

We applied nine ML algorithms to predict osteoporosis in patients with CKD stages 3–5 and ESKD: logistic regression (LR), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), category boosting (CatBoost), support vector machine (SVM), decision tree (DT), random forest (RF), KNN, and artificial neural networks (ANN). We evaluated the performance of each model using accuracy, precision, recall, F1 score, and area under the curve (AUC). In this study, the AUC refers to the area under the receiver operating characteristic (ROC) curve. The 95% confidence interval (CI) for each metric was computed using the bootstrap method, with 1000 resamples drawn with replacement. We also used the ROC curve, calibration curve, and decision curve analysis to assess the discrimination, reliability, and applicability of the predictive ML models. Additionally, we calculated the Shapley additive explanations (SHAP) values to assess feature importance and explore the relationship between outcomes and features40. All analyses, including data preprocessing, feature selection, ML modeling, and performance evaluation, were conducted using Python (version 3.12.4)41.

In this study, we employed different hyperparameter tuning approaches based on the characteristics of each machine learning model. For the ANN model, we performed grid search with threefold cross-validation to determine the optimal hyperparameters, exploring batch sizes (32, 64, 128, 256), epochs (50, 100, 200, 300), dropout rates (0.05, 0.1, 0.2, 0.3), and optimizers (Adam, RMSprop).
Early stopping was implemented to monitor validation loss and halt training if no improvement was observed for 10 consecutive epochs. For models such as XGBoost, LightGBM, and random forest, we used randomized search with fivefold cross-validation to explore the hyperparameter space efficiently. For models such as logistic regression, decision tree, KNN, and SVM, we relied on default scikit-learn settings to ensure standard implementation and baseline comparisons. The final hyperparameter settings for each model are summarized in Supplemental Table S1.

When comparing the demographic and clinical characteristics of the participants, normality was assessed using the Kolmogorov–Smirnov test. Tests for statistical significance were conducted using the Mann–Whitney U test for continuous variables and the chi-square test or Fisher’s exact test for categorical variables. The level of significance was set at a two-tailed value of P < 0.05. Statistical analyses were performed using MedCalc for Windows, version 22.303 (MedCalc Software; Ostend, Belgium).

Results

Demographic and clinical characteristics of the participants

A total of 6614 participants with stages 3–5 CKD and ESKD were enrolled to predict osteoporosis. Among them, 4526 participants were classified into the osteoporosis group, while 2088 participants were in the control group. Table 3 presents the baseline demographic and clinical characteristics of participants.
The osteoporosis group, compared to the control group, was significantly older, with a higher proportion of females, and with shorter height, lower weight, lower BMI, lower eGFR, lower serum Cre levels, lower serum albumin levels, higher ALP levels, lower corrected Ca levels, lower serum phosphorus levels, lower i-PTH levels, higher WBCs, lower HGB levels, higher fasting glucose levels, and a higher proportion of participants with a history of osteoporosis, cerebrovascular accidents, cataracts, hypogonadism, malnutrition, or chronic liver disease. Although higher Cre levels are typically associated with lower eGFR, eGFR is calculated using age, sex, and serum Cre levels. Therefore, despite the lower serum Cre levels in the osteoporosis group, the eGFR was lower in this group because of their older age and higher proportion of females.

Table 3 Comparison of demographic and clinical characteristics of the participants.

Performance evaluation of the prediction models

Figure 2 presents the AUC results of the three feature selection methods for both the validation and test datasets. Across all three methods, we tested sets containing 32, 16, 15, 10, 9, 8, 7, and 6 features to evaluate the AUC performance of the nine ML models. In method 3, we found that selecting eight features and using the ANN model to predict osteoporosis consistently produced the best results (AUC = 0.94, validation dataset; AUC = 0.93, test dataset), as shown in both the validation dataset (Fig. 2A) and the test dataset (Fig. 2B). The eight features selected by method 3 were Cre, age, weight, height, albumin, i-PTH, glucose, and HGB (Table 2).

Fig. 2 Results of AUC in three different feature selection methods on (A) the validation and (B) the test datasets.

Table 4 presents the performance metrics for the models predicting osteoporosis using method 3 with the eight selected features.
In addition to achieving the best AUC, the ANN model demonstrated the highest accuracy, precision, recall, and F1 scores for both the validation and test datasets. The CatBoost model ranked second in terms of accuracy, precision, recall, F1 score, and AUC.

Table 4 Performance metrics for nine ML models using method 3 with eight selected features.

Figure 3 shows the ROC curves for different ML models predicting osteoporosis, using method 3 with eight selected features, on the validation (A) and test (B) datasets. These models included ANN, CatBoost, LightGBM, XGBoost, random forest, SVM, KNN, logistic regression, and decision trees. The AUC values for each model are shown in the figure legend. The ANN model achieved the highest AUC for both the validation and test datasets, while the CatBoost model ranked second in terms of AUC performance.

Fig. 3 ROC curve comparison for different models on (A) the validation and (B) the test datasets. The y-axis represents the true-positive rate, whereas the x-axis represents the false-positive rate. The dashed green line represents the no-skill classifier.

Confusion matrix and predictive probabilities histograms

Figure 4 presents the performance of the ANN model for osteoporosis prediction using method 3 with eight selected features on the (A) validation dataset and (B) test dataset. The ANN model could distinguish the osteoporosis group from the control group in both the validation dataset (Fig. 4A) and the test dataset (Fig. 4B).

Fig. 4 Confusion matrices and predictive probabilities histograms for the ANN model on (A) the validation and (B) the test datasets. The confusion matrices on the left indicate the number of correct and incorrect classifications, where 0 represents the control group (no osteoporosis) and 1 represents the osteoporosis group. The middle panels display histograms of the predicted probabilities for the “not osteoporosis” and “osteoporosis” classes separately.
The rightmost panels combine the predicted probabilities of both classes and show the overall distribution. The x-axis represents the predicted probability, and the y-axis represents the sample count, reflecting the confidence of the model in its predictions.

Calibration curve and decision curve analysis

To plot the calibration curve and perform a decision curve analysis, we selected five models with an AUC greater than 0.85: ANN, CatBoost, LightGBM, Random Forest, and XGBoost. Figure 5 compares the performances of the five selected models in terms of the calibration curve (Fig. 5A) and decision curve analysis (Fig. 5B). Calibration and decision curve analyses further demonstrated the reliability and applicability of the ANN model.

Fig. 5 Calibration curve (A) and decision curve analysis (B) for five selected models in osteoporosis prediction. For the calibration curve, the perfectly calibrated model is represented by the dashed diagonal. Deviations from this line indicate calibration issues, with models that either overestimate or underestimate the probabilities. For the decision curve analysis, the dashed gray line represents the “ALL” strategy, which assumes that every patient has osteoporosis, while the black line represents the “NONE” strategy, which assumes that no patient has osteoporosis. The x-axis represents the threshold probability, and the y-axis represents the net benefit. The models closer to the top of the graph provide better clinical utility across different threshold probabilities.

The calibration curve (Fig. 5A) was used to evaluate the agreement between the predicted probabilities and actual outcomes. The XGBoost and ANN models were relatively close to the perfect calibration line, indicating strong calibration, whereas Random Forest, CatBoost, and LightGBM showed some deviations, indicating that they may slightly over- or under-predict the true probability in certain ranges.

The decision curve analysis (Fig. 5B) compared the net benefits of each model across different threshold probabilities. The ANN model exhibited the highest net benefit for most threshold probabilities, indicating that it provided the most clinically useful predictions across a wide range of decision thresholds. Other models followed closely, with varying performance across thresholds, but generally offered a net benefit above the “ALL” and “NONE” strategies. The net benefit of XGBoost tended to drop more steeply at higher thresholds, indicating that it may not perform as well in these cases.

Feature importance

Figure 6 illustrates the feature importance for osteoporosis prediction, based on the eight features selected by method 3 (age, i-PTH, weight, height, HGB, glucose, Cre, and albumin) for the ANN and CatBoost models. The SHAP Summary Plot ranks features based on the variability of SHAP values, highlighting their overall impact on the model. In contrast, the SHAP Importance Ranking Plot ranks features by the mean absolute SHAP values, representing their average influence. Differences in rankings between these two plots are expected, as they arise from different calculation methods.

Fig. 6 SHAP summary plot and variable importance ranking plot for (A) ANN and (B) CatBoost models using method 3 with eight selected features. The upper panels display the SHAP summary plots, where the x-axis represents the SHAP values, reflecting the impact of each feature on the model output. The color scale represents the feature values, with blue indicating lower values and pink indicating higher values. A positive SHAP value indicates that the feature increases the likelihood of predicting osteoporosis, whereas a negative SHAP value reduces this likelihood.
The lower panels present the variable importance ranking plots, showing the mean absolute SHAP values for each feature, which represent their overall contribution to the model’s predictions.

Age remains the most critical factor for predicting osteoporosis in both models, with higher age generally increasing the model’s predicted likelihood of osteoporosis. In the ANN model, CKD-related features such as i-PTH and weight are among the top contributors. Similarly, in the CatBoost model, weight and i-PTH also play significant roles, though their relative importance varies slightly. Despite these differences, the overall ranking of feature importance is consistent between the two models, highlighting the significant role of age, weight, and i-PTH in osteoporosis prediction.

Discussion

Our study demonstrated that the ANN model, utilizing eight selected features (age, i-PTH, weight, height, HGB, glucose, Cre, and albumin), achieved the highest predictive performance for osteoporosis in patients with stages 3–5 CKD and ESKD. The model yielded an AUC of 0.940 for the validation set and 0.930 for the test set, outperforming other ML models, including logistic regression, XGBoost, LightGBM, CatBoost, SVM, decision trees, random forests, and KNN. In addition, the ANN model exhibited strong calibration and decision curve performance, highlighting its clinical reliability and applicability. These findings suggest that ML, especially the ANN model, has significant potential for improving osteoporosis screening in patients with CKD stages 3–5 and ESKD.

Patients with stages 3–5 CKD and ESKD often experience CKD-MBD, including vascular calcification, secondary hyperparathyroidism, and renal osteodystrophy. For patients at risk for CKD-MBD or osteoporosis, BMD testing is recommended if it may influence treatment decisions7.
Although DXA can overestimate bone strength and underestimate fracture risk2, particularly in advanced CKD, it remains useful because these patients have higher fracture rates than the general population, with hip fractures leading to significant morbidity and mortality. BMD testing is valuable if it results in interventions, such as fall prevention or osteoporosis treatment. However, many patients with CKD do not undergo BMD testing because of limited access to DXA scanning, highlighting the need for alternative clinical measures to predict osteoporosis and identify patients who should undergo DXA scanning. Kuang et al.42 developed nomogram-based models to predict osteoporosis risk in 326 pre-dialysis patients categorized according to sex and renal function stage. By using multivariate logistic regression of clinical and laboratory data, these models offer a graphical tool that links patient variables to event probabilities. The nomogram models varied according to sex and CKD stage. The AUCs for these models ranged from 0.839 to 0.933, with the best results similar to ours. One difference between this study and theirs is that we extended our analysis by incorporating calibration and decision curve analyses, further demonstrating the reliability and applicability of the ANN model.

Several studies have explored the use of ML to predict osteoporosis, focusing on various populations, including the general population12,13,14,15,16,17,18,19,20, postmenopausal women21,22,23,24, patients with breast cancer25, patients with rheumatoid arthritis26, and those with type 2 diabetes mellitus27,28. However, to our knowledge, no previous studies have employed multiple ML models to predict osteoporosis in patients with CKD stages 3–5 and ESKD.
We summarize these studies in Supplemental Tables S2 and S3, which detail the target population, sample size, feature selection methods, number of selected features, best-performing models, and AUC scores for osteoporosis prediction across different populations, such as postmenopausal women and patients with breast cancer. The AUCs in these studies ranged from 0.710 to 0.961. In comparison, our study, which focused on CKD stages 3–5 and ESKD, achieved an AUC of 0.930 using the ANN model, which was higher than those of many previous studies. A likely reason for this is our targeted selection of key CKD-related features such as parathyroid hormone levels, which may not be significant in non-CKD populations. In contrast, studies by Inui et al.23 (AUC of 0.961 using LightGBM in elderly women) and Kwon et al.24 (AUC of 0.921 using AdaBoost in postmenopausal women) reported competitive results but targeted different populations and used different methodologies.

Our study enhances the understanding of osteoporosis prediction in patients with CKD by emphasizing the importance of CKD-related features such as intact parathyroid hormone, which has not been as prominently highlighted in previous models for the general population. These findings suggest that tailored models for specific patient populations, such as patients with CKD, can substantially improve predictive accuracy. Furthermore, the strong performance of our ANN model emphasizes the potential of ML to guide clinical decision-making, offering a noninvasive and cost-effective method to identify patients at high risk of osteoporosis in CKD populations. This could optimize the allocation of resources for bone density scans and improve patient outcomes.
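
The AUC values compared above are point estimates. The Methods describe 95% confidence intervals obtained by bootstrapping with 1000 resamples, and that idea can be sketched in plain Python on synthetic scores. The function names `roc_auc` and `bootstrap_auc_ci`, the percentile interval, and the simulated data are assumptions made for illustration; they are not taken from the study's code.

```python
import random


def roc_auc(y_true, y_score):
    """AUC via the rank-sum (Mann-Whitney) identity; tied scores get average ranks."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j
    n_pos = sum(y_true)
    n_neg = n - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, y_score, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement, recompute the AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # a resample needs both classes for the AUC to exist
            aucs.append(roc_auc(yt, [y_score[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_resamples)]
    upper = aucs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Synthetic example (not study data): positives score higher on average.
rng = random.Random(42)
y_true = [1] * 200 + [0] * 200
y_score = [rng.gauss(1.5, 1.0) for _ in range(200)] + [rng.gauss(0.0, 1.0) for _ in range(200)]

point = roc_auc(y_true, y_score)
lower, upper = bootstrap_auc_ci(y_true, y_score)
print(f"AUC = {point:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

In a scikit-learn workflow, `sklearn.metrics.roc_auc_score` would normally replace the hand-written `roc_auc`; the resampling loop stays the same.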

In this study, we addressed the class imbalance between the osteoporosis group (4526 patients) and the control group (2088 patients) using several strategies to enhance model robustness and minimize bias. First, we conducted a pilot study using three data imbalance handling techniques (Downsampling, SMOTE, and Tomek Links) on the entire dataset. The impact of these techniques on the performance of nine different models is summarized in Supplemental Table S4. However, none of these techniques significantly improved model performance compared to the original dataset. Subsequently, we employed stratified random sampling during dataset splitting to maintain the original class distribution across the training, validation, and test sets, ensuring adequate representation of the control group. To comprehensively evaluate model performance, we used multiple metrics beyond accuracy, including F1 score, AUC, and recall. Furthermore, we utilized SHAP values to analyze feature importance, confirming that predictions were driven by clinically relevant variables rather than artifacts of imbalance. These strategies collectively ensured that our models effectively predicted osteoporosis in patients with CKD stages 3–5 and ESKD.
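
The two-stage stratified split described in the Methods (80% training and validation versus 20% test, then 80% training versus 20% validation) can be illustrated with a small standard-library sketch. The helper `stratified_split`, the seeds, and the synthetic label vector are assumptions for illustration only; the label vector reuses the cohort's class counts but contains no patient data.

```python
import random


def stratified_split(indices, labels, holdout_fraction, seed):
    """Split indices into (kept, held_out), sampling each class separately
    so that both parts keep roughly the same class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    kept, held_out = [], []
    for cls in sorted(by_class):
        members = by_class[cls][:]
        rng.shuffle(members)
        n_held = round(len(members) * holdout_fraction)
        held_out.extend(members[:n_held])
        kept.extend(members[n_held:])
    return kept, held_out


# Synthetic labels with the cohort's class counts (1 = osteoporosis, 0 = control).
labels = [1] * 4526 + [0] * 2088
all_idx = list(range(len(labels)))

# Stage 1: hold out 20% as the test set.
train_val_idx, test_idx = stratified_split(all_idx, labels, 0.20, seed=0)
# Stage 2: hold out 20% of the remainder as the validation set.
train_idx, val_idx = stratified_split(train_val_idx, labels, 0.20, seed=1)


def positive_rate(idx):
    """Share of osteoporosis labels among the given indices."""
    return sum(labels[i] for i in idx) / len(idx)


for name, idx in [("train", train_idx), ("validation", val_idx), ("test", test_idx)]:
    print(f"{name:<10} n={len(idx):5d}  osteoporosis share={positive_rate(idx):.3f}")
```

With scikit-learn, the same effect is usually obtained by calling `train_test_split(..., stratify=y)` twice.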

This study has several limitations. First, the retrospective design limits causal inference and may have introduced selection bias. Notably, the prevalence of osteoporosis in our cohort (68.4%) was significantly higher than the 18–32% reported in other CKD studies. This discrepancy is likely due to Taiwan’s National Health Insurance policies, which restrict DXA screening to patients with CKD stages 3–5 and those with ESKD who are at higher risk for osteoporosis. Consequently, our study population may not fully represent the general CKD population, potentially limiting the generalizability of our findings. This selection bias is further reflected in Table 3, where 51.09% of participants had a history of fractures. Future research with a larger CKD population and less restricted screening criteria is required to validate our findings.

Second, this study used missing data imputation to handle incomplete variables. Among the eight selected features, three (height, weight, and i-PTH) had high missing rates (height: 34.59%, weight: 30.95%, i-PTH: 57.51%). To ensure the robustness of our results, we conducted a sensitivity analysis using various missing data handling methods, including multiple imputation (iterative imputer), mean imputation, KNN imputation, and complete-case analysis (drop missing). The results demonstrated that our model’s predictive performance remained consistent across different imputation strategies (Supplemental Table S5). While multiple imputation yielded fair performance, the overall findings were not significantly affected by the choice of imputation method, confirming the robustness of our predictive model. However, the high percentage of missing values in key variables remains a limitation, and future studies should aim to collect more complete datasets for further validation.
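
To make the imputation strategies named above concrete, the sketch below applies complete-case analysis, mean imputation, and a simple KNN imputation to a toy table. Everything here is an illustrative assumption: the five rows are invented (columns stand for height, weight, and i-PTH), the helper names are hypothetical, and the KNN variant is a simplified hand-written version rather than the study's implementation. Multiple imputation is omitted because it needs an iterative model; in scikit-learn the corresponding classes are `SimpleImputer`, `KNNImputer`, and the experimental `IterativeImputer`.

```python
import math
from statistics import mean

# Toy rows: [height_cm, weight_kg, i_pth]; None marks a missing value.
rows = [
    [160.0, 55.0, 120.0],
    [158.0, None, 300.0],
    [172.0, 70.0, None],
    [None, 62.0, 250.0],
    [165.0, 60.0, 180.0],
]


def complete_cases(data):
    """Complete-case analysis: drop every row that has any missing value."""
    return [r for r in data if all(v is not None for v in r)]


def mean_impute(data):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(data[0])
    col_means = [mean(r[c] for r in data if r[c] is not None) for c in range(n_cols)]
    return [[col_means[c] if r[c] is None else r[c] for c in range(n_cols)] for r in data]


def knn_impute(data, k=2):
    """Replace each missing value with the mean of that column over the k nearest
    rows that observed it; distance uses only columns observed in both rows."""
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    filled = []
    for i, r in enumerate(data):
        new_row = list(r)
        for c, v in enumerate(r):
            if v is None:
                donors = sorted(
                    (distance(r, other), other[c])
                    for j, other in enumerate(data)
                    if j != i and other[c] is not None
                )[:k]
                new_row[c] = mean(val for _, val in donors)
        filled.append(new_row)
    return filled


print("complete cases:", len(complete_cases(rows)), "of", len(rows), "rows")
print("mean-imputed  :", mean_impute(rows))
print("KNN-imputed   :", knn_impute(rows, k=2))
```

In this toy the distances are computed on raw units, so the column with the widest range dominates; standardizing the features first would be the usual choice. A sensitivity analysis like the one described would then refit the model on each imputed table and compare the resulting metrics.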

Third, a cross-sectional survey was employed in the study, which may not have been entirely representative, as it only provided a moment-in-time view of the population. Fourth, only internal validation was performed; external validation using different datasets is necessary. Fifth, the study had a relatively small sample size and was conducted at a single hospital, highlighting the need for larger multicenter, multinational studies to confirm the robustness of our model. Because the selected features are commonly measured in patients with CKD stages 3–5 and ESKD in Taiwan, external validation using data from other hospitals could be considered. Additionally, the majority of participants were Taiwanese, and osteoporosis risk factors may vary across different ethnic groups and environments. Therefore, the applicability of our machine-learning models should be tested in diverse populations worldwide. These limitations may have affected the generalizability of the prediction model. Future research should address these issues by integrating more data sources or developing advanced preprocessing techniques to manage biased data effectively.

In conclusion, we aimed to develop ML models to predict osteoporosis in patients with CKD stages 3–5 and ESKD. By applying various models, we found that the ANN model, using eight selected features (age, creatinine, height, weight, albumin, glucose, intact parathyroid hormone, and hemoglobin), achieved the highest predictive performance with an AUC of 0.940 on the validation dataset and 0.930 on the test dataset. These results suggest that the ANN model has the potential for clinical application in screening high-risk patients with CKD and ESKD for osteoporosis. Our findings address this research question by providing a predictive tool that can help optimize resource allocation for BMD scanning in clinical settings. Although our study is limited by its retrospective nature, the strong performance of the ANN model provides a foundation for future research on predicting osteoporosis, particularly in populations with CKD. Future studies should focus on external validation using larger and more diverse datasets to enhance the generalizability of the model. The implications of this research suggest that ML, particularly ANN models, can improve osteoporosis screening in patients with CKD. We hypothesize that this would further enhance the management of osteoporosis in these patients, ultimately improving outcomes and reducing healthcare costs.

Data availability

All data generated or analyzed during this study are included in this article. The data and code are publicly available on GitHub at https://github.com/jaten000/OsteoporosisPredictionInCKD.


Acknowledgements

The authors thank the Osteoporosis Prevention and Treatment Center of the Taichung Veterans General Hospital for providing the osteoporosis database.

Author information

Authors and Affiliations

Department of Industrial Engineering and Enterprise Information, Tunghai University, Taichung, 407224, Taiwan
Chia-Tien Hsu & Chin-Yin Huang

Division of Nephrology, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, 407219, Taiwan
Chia-Tien Hsu, Cheng-Hsu Chen & Ming-Ju Wu

School of Medicine, National Yang Ming Chiao Tung University, Taipei, 112304, Taiwan
Chia-Tien Hsu & Shih-Yi Lin

Center for Osteoporosis Prevention and Treatment, Taichung Veterans General Hospital, Taichung, 407219, Taiwan
Chia-Tien Hsu, Ya-Lian Deng & Shih-Yi Lin

Department of Post-Baccalaureate Medicine, College of Medicine, National Chung Hsing University, Taichung, 40227, Taiwan
Cheng-Hsu Chen & Ming-Ju Wu

Department of Nursing, Taichung Veterans General Hospital, Taichung, 407219, Taiwan
Ya-Lian Deng

Center for Geriatrics and Gerontology, Taichung Veterans General Hospital, Taichung, 407219, Taiwan
Shih-Yi Lin

Contributions

CT. H., CH. C., SY. L., MJ. W. and CY. H. made substantial contributions to the conception and design of the study and interpretation of data. CT. H., YL. D. and SY. L. gathered, analyzed, and interpreted the data. CT. H. wrote the original draft of the manuscript. CH. C., MJ. W., and CT. H. supervised the data interpretation and substantively revised the manuscript. All the authors have read and approved the final version of the manuscript.

Corresponding author

Correspondence to Chin-Yin Huang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Hsu, CT., Huang, CY., Chen, CH. et al. Machine learning models to predict osteoporosis in patients with chronic kidney disease stage 3–5 and end-stage kidney disease. Sci Rep 15, 11391 (2025). https://doi.org/10.1038/s41598-025-95928-5

Received: 04 February 2025

Accepted: 25 March 2025

Published: 03 April 2025

DOI: https://doi.org/10.1038/s41598-025-95928-5

Keywords

Machine learning; Osteoporosis; Chronic kidney disease; End-stage kidney disease; Bone mineral density
