TY - JOUR
T1 - Deep significance clustering
T2 - A novel approach for identifying risk-stratified and predictive patient subgroups
AU - Huang, Yufang
AU - Liu, Yifan
AU - Steel, Peter A.D.
AU - Axsom, Kelly M.
AU - Lee, John R.
AU - Tummalapalli, Sri Lekha
AU - Wang, Fei
AU - Pathak, Jyotishman
AU - Subramanian, Lakshminarayanan
AU - Zhang, Yiye
N1 - Publisher Copyright: © 2021 The Author(s) 2021.
PY - 2021/12/1
Y1 - 2021/12/1
AB - Objective: Deep significance clustering (DICE) is a self-supervised learning framework that identifies clinically similar and risk-stratified subgroups, which neither unsupervised clustering algorithms nor supervised risk prediction algorithms alone are guaranteed to generate. Materials and Methods: Enabled by an optimization process that enforces statistical significance between the outcome and subgroup membership, DICE jointly trains 3 components (representation learning, clustering, and outcome prediction) while providing interpretability to the deep representations. DICE also allows unseen patients to be predicted into trained subgroups for population-level risk stratification. We evaluated DICE using electronic health record datasets derived from 2 urban hospitals. The outcomes and patient cohorts were discharge disposition to home among heart failure (HF) patients and acute kidney injury among COVID-19 (Cov-AKI) patients. Results: Compared to baseline approaches including principal component analysis, DICE demonstrated superior performance in the cluster purity metrics: Silhouette score (0.48 for HF, 0.51 for Cov-AKI), Calinski-Harabasz index (212 for HF, 254 for Cov-AKI), and Davies-Bouldin index (0.86 for HF, 0.66 for Cov-AKI), and in the prediction metric: area under the receiver operating characteristic (ROC) curve (0.83 for HF, 0.78 for Cov-AKI). Clinical evaluation of DICE-generated subgroups revealed more meaningful distributions of member characteristics across subgroups and higher risk ratios between subgroups. Furthermore, DICE-generated subgroup membership alone was moderately predictive of outcomes. Discussion: DICE addresses a gap in current machine learning approaches, where predicted risk may not lead directly to actionable clinical steps. Conclusion: DICE demonstrated potential for application in heterogeneous populations, where having the same quantitative risk does not equate to having a similar clinical profile.
KW - Machine learning
KW - Predictive clustering
KW - Risk stratification
UR - http://www.scopus.com/inward/record.url?scp=85121281098&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121281098&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocab203
DO - 10.1093/jamia/ocab203
M3 - Article
C2 - 34571540
AN - SCOPUS:85121281098
SN - 1067-5027
VL - 28
SP - 2641
EP - 2653
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 12
ER -