Using machine learning to predict student mathematics performance in six East Asian countries: evidence from PISA 2022
Introduction
In the contemporary era of artificial intelligence and rapidly evolving knowledge systems, mathematics performance has become a critical competency for younger generations. Although the mathematical achievement of students in East Asian countries has attracted increasing scholarly attention, studies that use machine learning techniques to examine the combined determinants of their success remain limited.
Methods
This study evaluates six machine learning models (Random Forest, LightGBM, XGBoost, AdaBoost, Elastic Net, and Linear Regression) to identify the most accurate algorithm for predicting mathematics performance among students from six high-performing East Asian countries/economies that participated in the Programme for International Student Assessment (PISA) 2022. A sample of 26,969 fifteen-year-old students was analyzed. After model selection, a post hoc feature selection procedure retained the 24 most influential of the initial 62 predictors, keeping the model parsimonious while preserving its predictive performance. SHapley Additive exPlanations (SHAP) values and SHAP interaction analyses were used to quantify the magnitude, direction, and heterogeneity of each predictor's contribution at the individual level, including nonlinear relationships.
Results
XGBoost emerged as the optimal model, with the best predictive accuracy of the six (R² = 0.5758, RMSE = 65.06), explaining approximately 57.03% of the variance in mathematics achievement. Mathematics self-efficacy was the dominant predictor, contributing substantially more than any other variable, followed by participation in extracurricular activities before school and weekly mathematics instructional time.
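Model selection here rests on two metrics, R² and RMSE. The Python sketch below shows how both are computed and how the highest-R² model would be picked, assuming every candidate has already produced predictions for the same held-out students (the excerpt does not describe the evaluation split). The helper names `rmse`, `r2`, and `select_best` are hypothetical, and all scores and predictions are invented toy values, not the study's code or data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of the outcome (here, score points)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot, the share of variance explained."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def select_best(
    y_true: Sequence[float], predictions: Dict[str, List[float]]
) -> Tuple[str, Dict[str, Tuple[float, float]]]:
    """Score every model on the same held-out data; pick the highest R^2 (ties: first seen)."""
    scores = {name: (r2(y_true, pred), rmse(y_true, pred)) for name, pred in predictions.items()}
    best = max(scores, key=lambda name: scores[name][0])
    return best, scores


# Invented toy data: five held-out students and two models' predictions.
y_test = [420.0, 480.0, 510.0, 560.0, 600.0]
preds = {
    "Linear Regression": [450.0, 470.0, 500.0, 540.0, 570.0],
    "XGBoost": [430.0, 485.0, 505.0, 550.0, 590.0],
}

best, scores = select_best(y_test, preds)
for name, (r2_val, rmse_val) in scores.items():
    print(f"{name}: R2 = {r2_val:.4f}, RMSE = {rmse_val:.2f}")
print("best:", best)
```

Because R² = 1 − SS_res/SS_tot, on a single evaluation set RMSE = SD(y)·√(1 − R²), where SD is the population standard deviation of the outcome. If the reported R² = 0.5758 and RMSE = 65.06 were computed on the same set, they imply an outcome SD of about 65.06 / √0.4242 ≈ 99.9 score points. Under this definition R² is itself the share of variance explained, so the separately reported 57.03% presumably refers to a different evaluation setting, which the excerpt does not specify.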
Affective, behavioral, and instructional factors consistently outperformed structural and socioeconomic variables in predictive importance.
Discussion and conclusion
These findings underscore the central role of student-proximal determinants of mathematics achievement in Confucian Heritage Culture educational contexts. Interpreted through the lens of self-determination theory, the results carry important implications for educational policy and practice, particularly for prioritizing self-efficacy development, optimizing instructional time, and promoting equitable learning environments. The study also makes a theoretical contribution and offers directions for future research.
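The importance rankings in the Results, and the reduction from 62 to 24 predictors, build on per-student SHAP values. The Python sketch below computes exact Shapley values by enumerating every feature coalition. A feature "absent" from a coalition is set to a single baseline (the column means), a simplification of SHAP explainers, which typically average over a background sample; for tree models such as XGBoost, the specialised TreeSHAP algorithm is normally used instead because enumeration is exponential in the number of features. Features are then ranked by mean |SHAP| and the top k kept. That selection criterion is an assumption for illustration, since the excerpt does not state how "most influential" was defined. The model, feature names, and data are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one student.

    A feature absent from a coalition takes its baseline value, so
    value(S) = model(x on S, baseline elsewhere). Cost is exponential in len(x).
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        return model([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_importance(
    model: Model, X: Sequence[Sequence[float]], baseline: Sequence[float]
) -> List[float]:
    """Global importance per feature: mean |Shapley value| across students."""
    rows = [shapley_values(model, x, baseline) for x in X]
    return [sum(abs(r[j]) for r in rows) / len(rows) for j in range(len(baseline))]


def top_k_features(names: Sequence[str], importance: Sequence[float], k: int) -> List[str]:
    """Keep the k features with the largest importance (cf. 24 of 62 in the study)."""
    order = sorted(range(len(names)), key=lambda j: importance[j], reverse=True)
    return [names[j] for j in order[:k]]


# Hypothetical toy model and features; the 4*z0*z1 term is a two-way interaction.
names = ["self_efficacy", "math_time", "escs"]


def toy_model(z: Sequence[float]) -> float:
    return 500 + 40 * z[0] + 10 * z[1] + 5 * z[2] + 4 * z[0] * z[1]


X = [[1.0, 0.5, -1.0], [-1.0, 1.0, 0.5], [0.0, -1.5, 0.5]]
baseline = [sum(col) / len(X) for col in zip(*X)]  # column means

phi_first = shapley_values(toy_model, X[0], baseline)
importance = mean_abs_importance(toy_model, X, baseline)
print("student 1:", [round(v, 3) for v in phi_first])
print("mean |phi|:", [round(v, 3) for v in importance])
print("top 2:", top_k_features(names, importance, k=2))
```

Each student's values sum to `toy_model(x) - toy_model(baseline)` (the efficiency property), and the interaction term 4·z0·z1 is split equally between the two features involved. The signed values show direction, as in the study's individual-level analysis, while the mean absolute values give the global ranking.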