A Multi-Model Machine Learning Framework for Performance Prediction and Educational Quality Assessment in Primary and Secondary Education Across School Stakeholders
Abstract
Primary and secondary education systems generate large volumes of multi-stakeholder data, yet schools often lack practical, data-driven mechanisms for jointly assessing learner risk, teaching effectiveness, leadership quality, and infrastructure status. This study proposes a multi-model machine learning framework that predicts performance outcomes and supports educational quality assessment across school stakeholders. A unified pipeline is applied to every dataset: numeric conversion and cleaning, median imputation of missing values, z-score standardization, and outcome construction. For classification, a composite score is converted into binary labels by median thresholding and modeled with Logistic Regression, SVM (RBF kernel), and Random Forest. For regression, the continuous composite score is modeled with Ridge Regression, SVR (RBF kernel), and Random Forest Regressor. Experiments are conducted on the MP Education Survey dataset, which comprises five aligned respondent tables linked through school and location identifiers: primary students (n=500), secondary students (n=500), teachers (n=500), headmasters (n=500), and school observers (n=500). Results show consistently high predictive performance across datasets: Random Forest yields the strongest classification performance (reaching near-perfect accuracy on the student datasets), while Ridge Regression achieves the lowest RMSE in composite-score prediction owing to the linear structure of the constructed indices. Performance is comparatively lower for the observer data, indicating greater heterogeneity in external infrastructure assessments.
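The preprocessing and outcome-construction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the composite score is formed, so the unweighted mean of standardized items used here is an assumption, as are the strict "above the median" labelling rule, the function names, and the toy item columns `item_a` and `item_b`.

```python
import statistics


def to_numeric(value):
    """Convert a raw survey cell to float; return None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def preprocess_column(raw_values):
    """Numeric conversion, median imputation, then z-score standardization."""
    nums = [to_numeric(v) for v in raw_values]
    observed = [x for x in nums if x is not None]
    med = statistics.median(observed)
    # Replace every missing or unparseable cell with the column median.
    filled = [med if x is None else x for x in nums]
    mean = statistics.fmean(filled)
    std = statistics.pstdev(filled)  # population standard deviation
    if std == 0:
        return [0.0 for _ in filled]  # constant column carries no signal
    return [(x - mean) / std for x in filled]


def composite_scores(columns):
    """ASSUMPTION: composite = unweighted mean of standardized items per row."""
    return [statistics.fmean(row) for row in zip(*columns)]


def median_threshold_labels(scores):
    """ASSUMPTION: label 1 if the score is strictly above the median, else 0."""
    cutoff = statistics.median(scores)
    return [1 if s > cutoff else 0 for s in scores]


# Hypothetical toy table: two survey items for six respondents.
raw = {
    "item_a": ["3", "4", "", "5", "2", "4"],
    "item_b": [1, 2, 3, None, 5, "n/a"],
}
standardized = [preprocess_column(col) for col in raw.values()]
scores = composite_scores(standardized)   # regression target
labels = median_threshold_labels(scores)  # classification target
```

In this sketch `scores` plays the role of the continuous regression target and `labels` the binary classification target. In a full pipeline the same steps would more likely be delegated to a library; for example, scikit-learn's `SimpleImputer(strategy="median")` and `StandardScaler` perform the imputation and standardization, and `StandardScaler` likewise uses the population standard deviation.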