Exploring Highly Influential Features and Model Comparison for Predicting Coronary Heart Disease: A Novel Approach Using SHAP
Abstract
Predicting coronary heart disease (CHD) is essential for identifying individuals at risk and mitigating severe health outcomes. Many prior studies have analyzed complete datasets without isolating the most influential features, which hinders the interpretability and efficacy of classification models. This research aims to address this gap by identifying key features that have been underexplored in past research on heart disease and by evaluating multiple classification models on these significant features. Using SHapley Additive exPlanations (SHAP), the most impactful features were extracted from a dataset comprising 1888 instances with 14 attributes and two outcome classes (Yes and No) indicating CHD risk. Key features such as 'thal', chest pain type ('cp'), the slope of the ST segment at maximal exertion ('slope'), 'ca', and 'oldpeak' were prioritized for further analysis. Classification models, namely XGBoost, Decision Tree, K-Nearest Neighbours (KNN), and Logistic Regression, were then assessed using this reduced feature set. Among these, XGBoost achieved the highest performance, with 90.21% accuracy, 90.05% precision, 90.53% recall, and a 90.29% F1 score. KNN, with an accuracy of 89.68%, came in second, while the Decision Tree also yielded strong results with 90.21% accuracy. By leveraging a focused subset of critical features, this study demonstrates how classification models can be made both easier to interpret and more effective. These findings pave the way for the development of a prototype tool to aid in heart disease detection, providing value to both research and clinical applications in cardiovascular diagnostics.
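To make the feature-ranking idea concrete, the following minimal sketch computes exact Shapley values in pure Python and ranks features by their mean absolute value, which is one common way of summarizing SHAP output. It is an illustration only, not the study's pipeline: the model `toy_model`, its weights, the rows in `ROWS`, and the extra `noise` column are all hypothetical, and absent features are simply replaced by their column means. The ranking it produces reflects those made-up values and says nothing about the real dataset.

```python
from itertools import combinations
from math import factorial


def column_means(rows):
    """Per-feature mean, used as the baseline for 'absent' features."""
    n = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(n)]


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating all coalitions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def rank_features(predict, rows, names):
    """Rank features by mean absolute Shapley value across all rows."""
    baseline = column_means(rows)
    totals = [0.0] * len(names)
    for row in rows:
        for j, value in enumerate(shapley_values(predict, row, baseline)):
            totals[j] += abs(value)
    mean_abs = [t / len(rows) for t in totals]
    return sorted(zip(names, mean_abs), key=lambda pair: pair[1], reverse=True)


# Feature names follow the abstract; "noise" is a hypothetical extra column.
NAMES = ["thal", "cp", "slope", "ca", "oldpeak", "noise"]


def toy_model(x):
    """Hypothetical scoring function with arbitrary weights (not a fitted model)."""
    thal, cp, slope, ca, oldpeak, noise = x
    return 2.0 * thal + 1.0 * cp * slope + 0.5 * ca + 0.8 * oldpeak + 0.0 * noise


# Made-up rows purely for illustration; not taken from the study's dataset.
ROWS = [
    [3.0, 0.0, 1.0, 0.0, 1.2, 5.0],
    [2.0, 2.0, 2.0, 1.0, 0.0, 1.0],
    [1.0, 3.0, 1.0, 2.0, 2.5, 9.0],
    [3.0, 1.0, 0.0, 3.0, 0.6, 4.0],
]

if __name__ == "__main__":
    for name, score in rank_features(toy_model, ROWS, NAMES):
        print(f"{name}: {score:.3f}")
```

Exact enumeration grows exponentially with the number of features, so it is practical only for small feature sets like this one; for tree ensembles such as XGBoost, a dedicated SHAP implementation would normally be used instead. Once a ranking is available, the top features can be kept and the classifiers retrained and compared on that reduced set.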