Multiverse analysis of machine learning: classification between groups defined by suicidal ideation screening status using acoustic features in college students
Article excerpt
Background
Suicide is a major global public health crisis and a leading cause of unnatural death among college students. Assessment of suicidal ideation currently relies mainly on self-report questionnaires and structured interviews. These methods are vulnerable to response bias and cannot support continuous monitoring. There is an urgent need for objective and non-invasive correlates of suicidal ideation. Speech is a promising source of such correlates, as acoustic features reflect emotional and cognitive states related to suicidal ideation. Although speech-based machine learning models have shown encouraging predictive performance, most studies rely on a single analytical pipeline. Consequently, the robustness and generalizability of reported acoustic correlates across analytical choices remain an open question for clinical translation.

Methods
A comprehensive multiverse analysis was conducted across 1,764 distinct analytical pipelines using speech data from 96 Chinese university students (48 individuals who screened positive for suicidal ideation on the SIOSS and C-SSRS, and 48 matched controls who screened negative). The pipelines varied in preprocessing strategies, acoustic feature sets, dimensionality reduction methods, and machine learning models. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC). Feature importance was aggregated across all pipelines to identify the top 10 core acoustic features. These features were subsequently examined within a new multiverse analysis framework to assess their robustness across analytical specifications.

Results
Predictive performance was highly sensitive to analytical choices, with AUC values ranging from near chance (0.508) to high discriminative performance (0.856).
Despite this variability, a core subset of acoustic features, including fundamental frequency (F0), F0 envelope, and Mel-frequency cepstral coefficients (MFCCs), showed robust and stable differences between the group screening positive for suicidal ideation and the screening-negative control group. These features remained statistically significant in 237 of 240 eligible specifications (98.8%).

Conclusion
Speech-based computational prediction of group status defined by suicidal ideation screening measures is highly dependent on analytical decisions, yet the discriminative acoustic features derived from machine learning remain remarkably stable. The observed acoustic differences nonetheless likely reflect a combination of suicidal ideation, depression, anxiety, and general distress rather than a single underlying construct.
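
The multiverse logic described in the Methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the option lists in `GRID` are hypothetical placeholders (the abstract names the four decision axes but not their levels), the helper names are invented for this sketch, and the labels, scores, and p-values are toy values rather than study data. AUC is computed here from its rank-based definition, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from itertools import product

# Hypothetical option lists: the abstract names the four decision axes but not
# their levels, so these placeholders do not reproduce the 1,764 pipelines.
GRID = {
    "preprocessing": ["none", "zscore"],
    "feature_set": ["set_a", "set_b", "set_c"],
    "reduction": ["none", "pca"],
    "model": ["logreg", "svm", "random_forest"],
}


def enumerate_pipelines(grid):
    """Return one dict per combination of analytical choices."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*grid.values())]


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as one half (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def share_significant(p_values, alpha=0.05):
    """Fraction of specifications whose p-value falls below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


pipelines = enumerate_pipelines(GRID)
print(len(pipelines))  # 2 * 3 * 2 * 3 = 36 placeholder pipelines

# Toy labels and scores, for illustration only (not study data).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(auc(toy_labels, toy_scores))

# Placeholder p-values that match only the reported counts (237 of 240).
toy_p_values = [0.01] * 237 + [0.20] * 3
print(share_significant(toy_p_values))  # 237 / 240 = 0.9875
```

In a full analysis each enumerated pipeline would be fitted and scored under cross-validation, and the spread of the resulting AUC values is what reveals sensitivity to analytical choices. The last line shows where the reported 98.8% comes from: 237 / 240 = 0.9875, which rounds to 98.8%.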