TY - JOUR
T1 - Applied machine learning to identify differential risk groups underlying externalizing and internalizing problem behaviors trajectories
T2 - A case study using a cohort of Asian American children
AU - Adhikari, Samrachana
AU - You, Shiying
AU - Chen, Alan
AU - Cheng, Sabrina
AU - Huang, Keng Yen
N1 - Publisher Copyright:
© 2023 Adhikari et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2023/3
Y1 - 2023/3
N2 - Background: Internalizing and externalizing problems account for over 75% of the mental health burden in children and adolescents in the US, with a higher burden among minority children. While complex interactions of multilevel factors are associated with these outcomes and may enable early identification of children at higher risk, prior research has been limited by data and by the application of traditional analysis methods. In this case example focused on Asian American children, we address the gap by applying data-driven statistical and machine learning methods to study clusters of mental health trajectories among children, investigate optimal prediction of children in the high-risk cluster, and identify key early predictors. Methods: Data from the US Early Childhood Longitudinal Study 2010–2011 were used. Multilevel information provided by children, families, teachers, schools, and care providers was considered as predictors. An unsupervised machine learning algorithm was applied to identify groups of internalizing and externalizing problem trajectories. For prediction of the high-risk group, an ensemble algorithm, Superlearner, was implemented by combining several supervised machine learning algorithms. Performance of Superlearner and the candidate algorithms, including logistic regression, was assessed using discrimination and calibration metrics via cross-validation. Variable importance measures along with partial dependence plots were used to rank and visualize key predictors. Findings: We found two clusters, suggesting high- and low-risk groups, for both externalizing and internalizing problem trajectories. While Superlearner had the best overall discrimination performance, logistic regression had comparable performance for externalizing problems but worse performance for internalizing problems. Predictions from logistic regression were not as well calibrated as those from Superlearner; however, they were still better than those from a few candidate algorithms.
Important predictors identified were a combination of test scores, child factors, teacher-rated scores, and contextual factors, which showed non-linear associations with predicted probabilities. Conclusions: We demonstrated the application of a data-driven analytical approach to predict mental health outcomes among Asian American children. Findings from the cluster analysis can inform the critical age for early intervention, while the prediction analysis has the potential to inform intervention programming prioritization decisions. However, to better understand the external validity, replicability, and value of machine learning in broader mental health research, more studies applying a similar analytical approach are needed.
UR - http://www.scopus.com/inward/record.url?scp=85149313581&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149313581&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0282235
DO - 10.1371/journal.pone.0282235
M3 - Article
C2 - 36867610
AN - SCOPUS:85149313581
SN - 1932-6203
VL - 18
JO - PloS one
JF - PloS one
IS - 3
M1 - e0282235
ER -