An explainable machine learning analysis of technical and tactical indicators associated with CSL match outcomes
To examine the analytical value of football technical and tactical indicators in post-match outcome analysis, this study used post-match data from 240 matches in the 2024 Chinese Super League (CSL) season and constructed an explainable analytical framework integrating machine learning and SHAP (Shapley Additive Explanations). Taking the home-team perspective, the study used model comparison and SHAP-based interpretation to identify the relative contribution magnitude, class-specific contribution direction, and potential nonlinear contribution patterns of technical, tactical, contextual, and player-attribute indicators in classifying matches as home win, draw, or home loss. Based on indicator attributes and the logic of football match performance, the variables were grouped into seven dimensions: player attributes, match context, attacking performance, possession and passing performance, duels and contests, defensive performance, and disciplinary performance. Multiple machine learning models were compared on the three-class match outcome classification task, and XGBoost was selected as the base model for the subsequent SHAP analysis because of its comparatively balanced validation performance and tree-based interpretability. The global SHAP results showed that xG (expected goals) and Value had the highest model contributions, indicating that chance quality and squad-resource-related information were the primary signals the model used to distinguish match outcome classes. Yellow_C, Round, Long_B, Throw_ins, Blocked_S, GK_Keep_Save, and Recoveries also ranked highly, suggesting that match outcome classification drew on multidimensional information related to match context, possession progression, defensive events, goalkeeper involvement, and disciplinary performance.
The class-specific SHAP results showed that xG and Value exhibited opposite contribution directions between the home win and home loss classes, whereas SHAP values for the draw class were more concentrated around zero, indicating that no individual variable dominated the identification of draw outcomes. The SHAP dependence plots further showed that several variables exhibited potential nonlinear contribution patterns and model-derived transition points in the home win class. Overall, this study provides an explainable post-match analytical framework for identifying how technical, tactical, contextual, and player-attribute indicators contribute to CSL match outcome classification, offering a data-driven reference for contextual video review, performance diagnosis, and training-priority development.
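To make the distinction between global contribution magnitude and class-specific contribution direction concrete, the following minimal sketch computes exact Shapley values for a small toy scoring function. It is an illustration under stated assumptions, not the study's pipeline: `toy_scores`, its coefficients, the standardized feature values in `MATCHES`, and the all-zero `BASELINE` are hypothetical, and the study itself applied SHAP to a fitted XGBoost model rather than to a hand-written function. Only the feature names xG and Value are taken from the text.

```python
from itertools import combinations
from math import factorial

CLASSES = ("home_win", "draw", "home_loss")
FEATURES = ("xG", "Value")

# Hypothetical standardized inputs; these are NOT data from the study.
BASELINE = {"xG": 0.0, "Value": 0.0}
MATCHES = [
    {"xG": 1.0, "Value": 0.5},
    {"xG": -0.5, "Value": 1.0},
    {"xG": 0.2, "Value": -0.3},
]


def toy_scores(x):
    """Hypothetical per-class raw scores standing in for a fitted classifier."""
    xg, value = x["xG"], x["Value"]
    strength = 1.2 * xg + 0.8 * value + 0.5 * xg * value  # includes an interaction
    return {
        "home_win": strength,
        "draw": 0.1 * xg - 0.1 * value,
        "home_loss": -strength,
    }


def shapley_values(model, x, baseline, cls):
    """Exact Shapley values for one sample and one class.

    Features outside a coalition are replaced by their baseline value,
    so the contributions sum to model(x)[cls] - model(baseline)[cls].
    """
    n = len(FEATURES)

    def coalition_value(subset):
        mixed = {f: (x[f] if f in subset else baseline[f]) for f in FEATURES}
        return model(mixed)[cls]

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (coalition_value(s | {f}) - coalition_value(s))
        phi[f] = total
    return phi


def global_importance(model, samples, baseline):
    """Mean absolute Shapley value per feature over all samples and classes."""
    sums = {f: 0.0 for f in FEATURES}
    for x in samples:
        for cls in CLASSES:
            phi = shapley_values(model, x, baseline, cls)
            for f in FEATURES:
                sums[f] += abs(phi[f])
    count = len(samples) * len(CLASSES)
    return {f: sums[f] / count for f in FEATURES}


if __name__ == "__main__":
    for cls in CLASSES:
        print(cls, shapley_values(toy_scores, MATCHES[0], BASELINE, cls))
    print("global:", global_importance(toy_scores, MATCHES, BASELINE))
```

In this toy setting the contributions of xG and Value for the first hypothetical match are positive for the home win class and negative for the home loss class, while the draw-class contributions are much closer to zero, mirroring the qualitative pattern described above. The enumeration over all coalitions is only feasible for a handful of features; for tree ensembles such as XGBoost, SHAP implementations rely on specialized algorithms instead.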