Predictive Modeling in High-Dimensional Structured Data
Abstract
The rapid growth of data-driven technologies has produced an unprecedented volume of high-dimensional structured data, posing new challenges for predictive modeling in terms of scalability, interpretability, and overfitting. This study presents a comprehensive framework that integrates dimensionality reduction, regularization, and advanced machine learning algorithms to build accurate and interpretable predictive models for high-dimensional datasets. A multi-phase methodology was adopted, comprising data preprocessing, feature selection using Lasso and Elastic Net, dimensionality reduction via PCA and t-SNE, and model development using Support Vector Machine (SVM), Random Forest, XGBoost, and Deep Neural Network (DNN) models. The models were evaluated with both classification metrics (Accuracy and F1-score) and regression metrics (Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R²), with cross-validation used to assess robustness. XGBoost and DNN outperformed the other models, achieving accuracies above 0.94 and R² values up to 0.96, which indicates that they adapt better to complex, non-linear feature interactions. Feature importance analysis identified Age, Blood Pressure, and Cholesterol Level as the most influential predictors, and SHAP (SHapley Additive exPlanations) values were used to make the models' predictions more transparent and explainable. By combining ensemble learning, deep learning, and explainable AI techniques, the framework achieved high predictive accuracy while preserving interpretability. Overall, the study establishes a scalable, interpretable, and high-performing predictive modeling framework applicable to high-dimensional structured data in healthcare, finance, and other analytical domains.
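The evaluation step combines classification metrics (Accuracy, F1-score) with regression metrics (MSE, RMSE, R²) under cross-validation. As a minimal, dependency-free sketch of how these quantities are defined, the functions below compute each metric from paired lists of true and predicted values and generate k-fold train/test index splits. The function names, the binary-only F1, and the contiguous (unshuffled) fold scheme are illustrative assumptions, not the study's implementation; in practice a library such as scikit-learn would typically be used, with shuffled or stratified folds.

```python
import math
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that exactly match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 (assumed binary task): harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Square root of MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, test) pairs.

    The first n % k folds receive one extra sample so all n indices are used.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds
```

For example, with `y_true = [1, 0, 1, 1, 0]` and `y_pred = [1, 0, 0, 1, 1]`, accuracy is 0.6 and F1 is 2/3; for regression targets `[3.0, 5.0, 7.0]` predicted as `[2.5, 5.0, 7.5]`, R² is 0.9375. Under cross-validation, each metric would be computed on every test fold and averaged across folds.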