Abstract
High-grade gliomas are the most common primary brain tumors in adults and are typically diagnosed using histopathology. However, these diagnostic categories are highly heterogeneous and do not always correlate well with survival. In an attempt to refine these diagnoses, we make several immunohistochemical measurements of YKL-40, a gene previously shown to be differentially expressed between diagnostic groups. We propose two latent class models for classification and variable selection in the presence of high-dimensional binary data, fit by using Bayesian Markov chain Monte Carlo techniques. Penalization and model selection are incorporated in this setting via prior distributions on the unknown parameters. The methods provide valid parameter estimates under conditions in which standard supervised latent class models do not, and outperform two-stage approaches to variable selection and parameter estimation in a variety of settings. We study the properties of these methods in simulations, and apply these methodologies to the glioma study for which identifiable three-class parameter estimates cannot be obtained without penalization. With penalization, the resulting latent classes correlate well with clinical tumor grade and offer additional information on survival prognosis that is not captured by clinical diagnosis alone. The inclusion of YKL-40 features also increases the precision of survival estimates. Fitting models with and without YKL-40 highlights a subgroup of patients who have glioblastoma (GBM) diagnosis but appear to have better prognosis than the typical GBM patient.
Original language | English (US) |
---|---|
Pages (from-to) | 1342-1360 |
Number of pages | 19 |
Journal | Statistics in Medicine |
Volume | 31 |
Issue number | 13 |
DOIs | |
State | Published - Jun 15 2012 |
Keywords
- Cancer
- Glioma
- Latent class
- Penalization
- Ridge
- Supervised
- Variable selection
ASJC Scopus subject areas
- Epidemiology
- Statistics and Probability