Supervised Bayesian latent class models for high-dimensional data

Stacia M. DeSantis, E. Andrés Houseman, Brent A. Coull, Catherine L. Nutt, Rebecca A. Betensky

Research output: Contribution to journal › Article › peer-review


High-grade gliomas are the most common primary brain tumors in adults and are typically diagnosed using histopathology. However, these diagnostic categories are highly heterogeneous and do not always correlate well with survival. In an attempt to refine these diagnoses, we make several immunohistochemical measurements of YKL-40, a gene previously shown to be differentially expressed between diagnostic groups. We propose two latent class models for classification and variable selection in the presence of high-dimensional binary data, fit using Bayesian Markov chain Monte Carlo techniques. Penalization and model selection are incorporated in this setting via prior distributions on the unknown parameters. The methods provide valid parameter estimates under conditions in which standard supervised latent class models do not, and they outperform two-stage approaches to variable selection and parameter estimation in a variety of settings. We study the properties of these methods in simulations and apply them to the glioma study, for which identifiable three-class parameter estimates cannot be obtained without penalization. With penalization, the resulting latent classes correlate well with clinical tumor grade and offer additional information on survival prognosis that is not captured by clinical diagnosis alone. The inclusion of YKL-40 features also increases the precision of survival estimates. Fitting models with and without YKL-40 highlights a subgroup of patients who have a glioblastoma (GBM) diagnosis but appear to have a better prognosis than the typical GBM patient.
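The abstract does not give the model equations, so the following is only a minimal sketch, under stated assumptions, of the general idea of fitting a latent class model to binary items by MCMC with priors acting as a penalty. It is a generic, unsupervised K-class model fit by Gibbs sampling; the function name `gibbs_latent_class` and its parameters (`a`, `b`, `alpha`) are hypothetical. Conjugate Beta priors shrink each class-conditional item probability toward `a / (a + b)`, a simple stand-in for prior-based penalization. This is not the authors' supervised model and not their ridge-type priors.

```python
import numpy as np


def gibbs_latent_class(Y, K, n_iter=500, a=1.0, b=1.0, alpha=1.0, seed=0):
    """Gibbs sampler for a K-class latent class model on binary data.

    Hypothetical illustration, not the paper's supervised model.
    Y     : (n, J) array of 0/1 item responses.
    a, b  : Beta(a, b) prior on each class-conditional item probability;
            larger a + b shrinks estimates harder toward a / (a + b).
    alpha : symmetric Dirichlet prior on the class weights.
    Returns the final draw of (labels z, item probabilities theta, weights pi).
    """
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    n, J = Y.shape
    z = rng.integers(K, size=n)            # random initial class labels
    theta = np.empty((K, J))
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # 1) Class weights | labels: conjugate Dirichlet update.
        counts = np.bincount(z, minlength=K)
        pi = rng.dirichlet(alpha + counts)
        # 2) Item probabilities | labels: conjugate Beta update per class.
        for k in range(K):
            s = Y[z == k].sum(axis=0)      # number of 1s per item in class k
            theta[k] = rng.beta(a + s, b + counts[k] - s)
        theta = np.clip(theta, 1e-12, 1 - 1e-12)  # guard the logs below
        # 3) Labels | parameters: categorical draw from posterior class probs.
        logp = Y @ np.log(theta).T + (1 - Y) @ np.log1p(-theta).T + np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)   # stabilize before exp
        p = np.exp(logp)
        p /= p.sum(axis=1, keepdims=True)
        u = rng.random((n, 1))                     # inverse-CDF sampling
        z = np.minimum((p.cumsum(axis=1) < u).sum(axis=1), K - 1)
    return z, theta, pi
```

Class labels are only identified up to permutation (label switching), so draws should be compared or summarized after aligning labels. Returning only the last draw keeps the sketch short; a real analysis would store draws after burn-in and summarize posterior means and intervals.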

Original language: English (US)
Pages (from-to): 1342-1360
Number of pages: 19
Journal: Statistics in Medicine
Issue number: 13
State: Published - Jun 15 2012


Keywords

  • Cancer
  • Glioma
  • Latent class
  • Penalization
  • Ridge
  • Supervised
  • Variable selection

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability

