A Hybrid Stacking Ensemble Approach for Diabetes Classification on Imbalanced Clinical Data
Abstract
Diabetes mellitus is a common chronic disease that afflicts people worldwide, and its associated complications pose major challenges for medical care. It is therefore important to diagnose and treat diabetes so that every diabetic has the chance of a healthy life. The Peidemo Diabetes Prediction System described in this paper is a stacked ensemble learning model: it uses XGBoost, LightGBM, and AdaBoost as base classifiers, with logistic regression as the meta-learner. The synthetic minority over-sampling technique combined with edited nearest neighbors (SMOTEENN) is applied to the dataset to correct class imbalance so that the model generalizes better. The Pima Indian Diabetes dataset underwent complete pre-processing, including data cleaning, class balancing, and feature scaling. The model was evaluated with an 80%/20% training/test split and achieved a predictive accuracy of 93.7% and an area under the receiver operating characteristic curve (AUC) of 0.97. These results indicate that combining base classifiers in a stacked ensemble can capture complex feature interactions and effectively address class imbalance in diabetes prediction. The methodology shows promise as a clinical decision support tool. It could also help companies provide their employees with quality healthcare at an affordable cost, and even an equal opportunity to develop skills for the future.
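The stacking mechanism summarized in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation, and several parts are assumptions made purely for illustration: one-feature threshold rules stand in for the XGBoost, LightGBM, and AdaBoost base classifiers, the data are synthetic rather than the Pima Indian Diabetes dataset, and the SMOTEENN resampling and feature scaling steps are omitted. All function and variable names are hypothetical. What the sketch does show is the core idea: base classifiers produce out-of-fold predictions on the training portion of an 80/20 split, and a logistic regression meta-learner is trained on those predictions.

```python
import math
import random
from functools import partial

def fit_stump(X, y, feature):
    """Stand-in base learner: predict 1 when one feature exceeds a threshold."""
    best_err, best_t = None, None
    for t in sorted({row[feature] for row in X}):
        err = sum((row[feature] > t) != (label == 1) for row, label in zip(X, y))
        if best_err is None or err < best_err:
            best_err, best_t = err, t
    return lambda row: 1.0 if row[feature] > best_t else 0.0

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def fit_logistic(Z, y, lr=0.5, epochs=300):
    """Meta-learner: logistic regression trained by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, label in zip(Z, y):
            err = sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - label
            grad_b += err
            for j, zj in enumerate(z):
                grad_w[j] += err * zj
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda z: sigmoid(b + sum(wi * zi for wi, zi in zip(w, z)))

def out_of_fold(X, y, fitters, k=5):
    """Build meta-features so each row is scored by models that never saw it."""
    meta = [[0.0] * len(fitters) for _ in X]
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held = [i for i in range(len(X)) if i % k == fold]
        for j, fit in enumerate(fitters):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in held:
                meta[i][j] = model(X[i])
    return meta

# Synthetic, mildly imbalanced stand-in data (assumption: NOT the Pima dataset).
rng = random.Random(0)
y_all = [1 if rng.random() < 0.35 else 0 for _ in range(200)]
X_all = [[rng.gauss(1.5 * label, 1.0) for _ in range(3)] for label in y_all]

# 80/20 training/test partition, as in the abstract.
X_train, y_train = X_all[:160], y_all[:160]
X_test, y_test = X_all[160:], y_all[160:]

# Level 0: three base learners, one per feature (stand-ins for the boosted models).
fitters = [partial(fit_stump, feature=f) for f in range(3)]
meta_train = out_of_fold(X_train, y_train, fitters)

# Level 1: the meta-learner is fitted on out-of-fold base predictions only.
meta_model = fit_logistic(meta_train, y_train)

# For inference, base learners are refitted on the full training portion.
base_models = [fit(X_train, y_train) for fit in fitters]
test_probs = [meta_model([m(row) for m in base_models]) for row in X_test]
accuracy = sum((p >= 0.5) == (label == 1)
               for p, label in zip(test_probs, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for rows the base learners were fitted on, keeps it from simply rewarding base learners that have memorized the training data. In a setup like the one the abstract describes, any resampling such as SMOTEENN would normally be applied to the training portion only, so that the held-out 20% keeps its original class distribution.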