A Comparative analysis of Ensemble Machine Learning Techniques for Early Predicting the Risk of Chronic Kidney Disease
Abstract
Introduction: Chronic Kidney Disease (CKD) is a condition marked by the gradual deterioration of kidney function. Timely detection and effective treatment can enhance the chances of a positive outcome.
Objectives: This research aims to predict CKD status using a novel ensemble machine learning framework for early and effective identification of the disease.
Methods: The proposed approach combines data pre-processing, attribute selection, and six boosting ensemble machine learning classifiers. Attributes are selected using standard Lasso, Lasso with Cross-Validation (Lasso CV), and Multitask Lasso, together with a combined set of the features selected by these three methods. The classifiers explored are the Gradient Boosting, Histogram Gradient Boosting, AdaBoost, XGBoost, CatBoost, and LightGBM classifiers, applied to a dataset taken from Kaggle to enable early diagnosis of CKD.
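The feature selection and classification steps described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the synthetic data from `make_classification` stands in for the Kaggle CKD dataset, the `alpha` values are arbitrary, "combined" is assumed to mean the union of the three selected feature sets, and only the Gradient Boosting classifier is shown. The helper `nonzero_features` and the `selected` and `scores` dictionaries are hypothetical names introduced for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import Lasso, LassoCV, MultiTaskLasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data stands in for the pre-processed CKD dataset.
X, y = make_classification(n_samples=400, n_features=24,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Lasso penalties are scale-sensitive, so standardize using training statistics.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)


def nonzero_features(coef):
    """Return the indices of features with a non-zero coefficient in any task."""
    coef = np.atleast_2d(coef)
    return {int(i) for i in np.flatnonzero(np.any(coef != 0, axis=0))}


# Standard Lasso and Lasso with cross-validated alpha on the 0/1 label.
lasso = Lasso(alpha=0.01).fit(X_train_s, y_train)
lasso_cv = LassoCV(cv=5, random_state=0).fit(X_train_s, y_train)

# MultiTaskLasso needs a 2-D target; assumption: one indicator column per class.
Y_train = np.column_stack([1 - y_train, y_train])
mt_lasso = MultiTaskLasso(alpha=0.01).fit(X_train_s, Y_train)

selected = {
    "lasso": nonzero_features(lasso.coef_),
    "lasso_cv": nonzero_features(lasso_cv.coef_),
    "multitask_lasso": nonzero_features(mt_lasso.coef_),
}
# Assumption: the combined set is the union of the three selections.
selected["combined"] = set().union(*selected.values())

# Fit one boosting classifier per feature set and record test accuracy.
scores = {}
for name, cols in selected.items():
    if not cols:  # skip a selection that kept no features
        continue
    idx = sorted(cols)
    clf = GradientBoostingClassifier(random_state=0)
    clf.fit(X_train_s[:, idx], y_train)
    scores[name] = clf.score(X_test_s[:, idx], y_test)

for name, acc in scores.items():
    print(f"{name}: {len(selected[name])} features, accuracy={acc:.3f}")
```

The other five classifiers could be swapped in at the `clf = ...` line; the accuracies printed here reflect the synthetic data only and say nothing about the CKD results reported in the study.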
Results: Performance is evaluated using a confusion matrix, and the efficiency of each method is measured with metrics derived from it, including accuracy and sensitivity. The overall results show that the Gradient Boosting Classifier gives the highest accuracy, 99%, compared with the other five classifiers used in this research work.
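As an illustration of how such metrics follow from a confusion matrix, the sketch below computes accuracy, sensitivity, specificity, and precision from true and predicted labels. The function names `confusion_counts` and `metrics_from_counts`, the label encoding (1 = CKD, 0 = not CKD), and the toy labels are assumptions made for this example; they are not data or code from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def metrics_from_counts(tp, fp, tn, fn):
    """Derive common evaluation metrics from the four confusion-matrix cells."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity (recall): share of actual CKD cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of non-CKD cases correctly ruled out.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Precision: share of predicted CKD cases that were truly CKD.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Toy labels for illustration only (1 = CKD, 0 = not CKD).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
results = metrics_from_counts(*counts)
print(counts)   # (tp, fp, tn, fn)
print(results)
```

For screening applications, sensitivity is often read alongside accuracy, because a missed CKD case (a false negative) is typically costlier than a false alarm.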
Conclusions: This study examines the performance of Boosting classifiers in predicting chronic kidney disease (CKD) outcomes. The results show that Gradient Boosting achieved the highest accuracy, 99%, across all four feature selection categories, while Histogram Gradient Boosting and CatBoost performed better with Lasso CV, Multitask Lasso, and the combined feature selection. The AdaBoost classifier performed better with Multitask Lasso and the combined features, and both the XGBoost and LightGBM classifiers performed better with the combined feature selection. The combined feature set yielded 99% accuracy across all classifiers. Overall, the findings demonstrate that the Gradient Boosting classifier achieved the highest accuracy and sensitivity in identifying CKD patients, highlighting its potential for early detection in clinical settings compared with the other classifiers.