Introduction: Online social networking (SN) data form a contextually and temporally rich stream that has shown promise in predicting suicidal thoughts and behaviors. Despite the clear advantages of this digital medium, predictive modeling of acute suicidal ideation (SI) remains underdeveloped. SN data, in conjunction with robust machine learning algorithms, may offer a promising way forward.
Methods: We applied an ensemble machine learning model to a previously published dataset of adolescents on Instagram with a lifetime history of SI (N = 52) to predict SI within the past month. Using predictors that capture language use and activity within this SN, we evaluated the out-of-sample, cross-validated performance of our model against previous efforts and used a model explainer to probe relative predictor importance and subject-level phenomenology.
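As an illustration of what out-of-sample, cross-validated evaluation of an ensemble involves, the following is a minimal sketch rather than the model used in this study. The abstract does not specify the ensemble, so bagged decision stumps stand in for it; the data are synthetic, and every function name, the fold count, and the number of base models are assumptions made for the example. The model explainer step is not shown.

```python
import random

def fit_stump(X, y):
    """Pick the (feature, threshold, polarity) rule with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [1 if polarity * (row[j] - thr) >= 0 else 0 for row in X]
                errors = sum(p != t for p, t in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, thr, polarity)
    return best[1:]

def predict_stump(stump, row):
    j, thr, polarity = stump
    return 1 if polarity * (row[j] - thr) >= 0 else 0

def fit_bagged_ensemble(X, y, n_models, rng):
    """Train each stump on a bootstrap resample of the training rows."""
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def ensemble_score(models, row):
    """Fraction of base models voting for the positive class."""
    return sum(predict_stump(m, row) for m in models) / len(models)

def cross_validated_scores(X, y, k, n_models, seed):
    """Score every row with an ensemble that never saw that row during training."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    scores = [None] * len(X)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        models = fit_bagged_ensemble(
            [X[i] for i in train], [y[i] for i in train], n_models, rng
        )
        for i in held_out:
            scores[i] = ensemble_score(models, X[i])
    return scores

# Synthetic data only: feature 0 is weakly informative, feature 1 is noise.
data_rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[label + data_rng.gauss(0, 0.5), data_rng.gauss(0, 1)] for label in y]
scores = cross_validated_scores(X, y, k=5, n_models=25, seed=1)
```

Because each subject's score comes from an ensemble trained without that subject, the scores can be thresholded and summarized without the optimism of in-sample evaluation.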
Results: Linguistic and SN data predicted acute SI with an accuracy of 0.702 (sensitivity = 0.769, specificity = 0.654, AUC = 0.775). Model introspection showed a higher proportion of SN-derived predictors with substantial impact on prediction compared with linguistic predictors from structured interviews. Further analysis of subject-specific predictor importance uncovered potentially informative trends for future acute SI risk prediction.
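For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity, specificity, and AUC are conventionally computed from binary labels and model scores. The labels and scores are toy values, not study data, and the function names and the 0.5 decision threshold are assumptions made for the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy values for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why both kinds of metric are reported.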
Conclusion: Applying ensemble learning methods to SN data may mitigate the complexities and modeling challenges of predicting SI at acute time scales. Future work with larger, more heterogeneous populations is needed to refine digital biomarkers and test external validity more robustly.